DETAILED ACTION
Claims 1-20 are pending in this application. Claim 11 has been amended.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
35 U.S.C. 102
Applicant’s arguments (see Remarks filed 1/21/2026) regarding the rejection of claim 11 under 35 U.S.C. 102 have been fully considered by the examiner and are persuasive in view of the amendments made to the claims. However, given the change in scope of claim 11, the examiner has presented a new ground of rejection over Meyerson in view of Berlin, as fully discussed below.
Further, applicant’s arguments (see Remarks filed 1/21/2026) regarding the rejections of claims 1 and 15 under 35 U.S.C. 102 have been fully considered by the examiner but are not persuasive. Applicant argues that Meyerson fails to teach using an activation of a first decoder to train a second decoder; the examiner respectfully disagrees. Meyerson teaches in [0053] that each decoder has an activation layer, and in [0067] and Figure 3 that training of the decoders to perform the classification tasks takes place as a consecutive passing of information from encoder-decoder pair 1 to a following, separate set of decoders. The decoder of encoder-decoder pair 1 is being interpreted as the first decoder, and the decoders in the decoder set following this pair are being interpreted as the at least second decoder. Given that all of the decoders have activation layers (see Meyerson [0053]) and that Meyerson teaches forward propagation of training from an encoder-decoder pair to a following set of decoders, one of ordinary skill in the art would understand this as analogous to using activations of one decoder to train a second decoder. Further, given that forward propagation is used to pass the input data as shown in Figure 3, one of ordinary skill in the art would further understand that data is passed from encoder-decoder pair 1 to decoder set 1, and therefore during training the activation layer of the first decoder must process the input data before the second decoder receives that data. Therefore, for at least the reasons discussed above, the examiner respectfully maintains the rejections of claims 1 and 15 under 35 U.S.C. 102.
[Image: Meyerson, [0067]]
[Image: Meyerson, Figure 3]
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 2, 5, 7-10, 15-16 and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Meyerson (US 20190244108 A1).
Regarding claim 1, Meyerson discloses; A method of training a machine learning model, the method comprising (Meyerson, [0010] Figure 2 shows a method of jointly training multiple pairs of encoders and decoders):
training a decoder of a primary part of a composite neural network to identify a primary segment of a person in a training image (Meyerson, [0010] Figure 2 shows a method of jointly training multiple pairs of encoders and decoders, indicating at least a first, second and third encoder-decoder pair (parts of the composite network) and at least three decoders being trained, [0061] the network is being trained to perform facial recognition, so it is identifying a face of a person in a training image);
freezing the decoder of the primary part of the composite neural network after training the decoder of the primary part of the composite neural network (Meyerson, [0097] the weights of multiple decoders (at least 3, see figures 2 and 3) are frozen while updating weights of other decoders in the composite network, [0088]-[0096] detail the training steps that precede the freezing of the decoder);
and after freezing the decoder of the primary part of the composite neural network (Meyerson, [0097] the weights of multiple decoders (at least 3, see figures 2 and 3) are frozen while updating weights of other decoders in the composite network),
training, using an activation of the decoder of the primary part of the composite neural network indicating a location in the training image of the primary segment (Meyerson, [0053] each decoder has an activation layer, [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs [0061] the classification tasks include identifying different facial features (subsegments of the face which is the primary segment) appearing in one image),
a decoder of a secondary part of the composite neural network to identify a first subsegment of the person or a feature of the first subsegment of the person within the location in the training image (Meyerson, [0053] each decoder has an activation layer, [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs [0061] the classification tasks include identifying different facial features (subsegments of the face which is the primary segment) appearing in one image, [0062]-[0066] the multiple encoder decoder pairs (at least a first, second and third decoder, indicating a primary, secondary and tertiary part of the model) are trained to complete classification tasks to identify multiple features appearing in the same facial image of a person, for example the task can be identifying if the person is wearing glasses or not, the features of the images being the subsegments of the primary segment of the image being the person’s face identified),
wherein the first subsegment of the person is a part of the primary segment (Meyerson, [0053] each decoder has an activation layer, [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs [0061] the classification tasks include identifying different facial features (subsegments of the face which is the primary segment) appearing in one image, [0062]-[0066] the multiple encoder decoder pairs (at least a first, second and third decoder, indicating a primary, secondary and tertiary part of the model) are trained to complete classification tasks to identify multiple features appearing in the same facial image of a person, for example the task can be identifying if the person is wearing glasses or not, the features of the images being the subsegments of the primary segment of the image being the person’s face identified).
Regarding claim 2, Meyerson discloses; The method of Claim 1, wherein the primary segment of the person comprises a portion of the person appearing in the training image (Meyerson, [0053] each decoder has an activation layer, [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs [0061] the classification tasks include identifying different facial features (subsegments of the face which is the primary segment) appearing in one image),
and wherein the first subsegment of the person comprises a sub-portion of the portion of the person appearing in the training image (Meyerson, [0053] each decoder has an activation layer, [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs [0061] the classification tasks include identifying different facial features (subsegments of the face which is the primary segment) appearing in one image, [0062]-[0066] the multiple encoder decoder pairs (at least a first, second and third decoder, indicating a primary, secondary and tertiary part of the model) are trained to complete classification tasks to identify multiple features appearing in the same facial image of a person, for example the task can be identifying if the person is wearing glasses or not, the features of the images being the subsegments of the primary segment of the image being the person’s face identified).
Regarding claim 5, Meyerson discloses; The method of Claim 1, wherein training the decoder of the primary part of the composite neural network sets a plurality of weights of the decoder of the primary part of the composite neural network (Meyerson, [0068]-[0069] the weights of multiple decoders (at least a first, second and third) are updated (set)), and wherein freezing the decoder of the primary part of the composite neural network comprises freezing the plurality of weights of the decoder of the primary part of the composite neural network (Meyerson, [0097] the weights of multiple decoders (at least 3, see figures 2 and 3) are frozen while updating weights of other decoders in the composite network).
Regarding claim 7, Meyerson discloses; The method of Claim 1, wherein the first subsegment of the person is contained within the primary segment of the person (Meyerson, [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs [0061] the classification tasks include identifying different facial features (subsegments of the face which is the primary segment) appearing in one image, the face of the person is identified, then multiple features/subsegments are identified which are contained within the face of the person).
Regarding claim 8, Meyerson discloses; The method of Claim 1, wherein training the decoder of the secondary part of the composite neural network is further based on the training image (Meyerson, [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs (at least a first and a second) [0061] the classification tasks include identifying different facial features (subsegments of the face which is the primary segment) appearing in one image, [0062]-[0066] the multiple encoder-decoder pairs (at least a first, second and third decoder, indicating a primary, secondary and tertiary part of the model) are trained to complete classification tasks to identify multiple features appearing in the same facial image of a person, for example the task can be identifying if the person is wearing glasses or not, the features of the images being the subsegments of the primary segment of the image being the person’s face identified).
Regarding claim 9, Meyerson discloses; The method of Claim 1, further comprising: freezing the decoder of the secondary part of the composite neural network after training the decoder of the secondary part of the composite neural network (Meyerson, [0097] the weights of multiple decoders (at least 3, see figures 2 and 3) are frozen while updating weights of other decoders in the composite network, [0088]-[0096] detail the training steps that precede the freezing of the decoder);
and after freezing the decoder of the secondary part of the composite neural network (Meyerson, [0097] the weights of multiple decoders (at least 3, see figures 2 and 3) are frozen while updating weights of other decoders in the composite network), training, using activations of the decoder of the secondary part of the composite neural network (Meyerson, [0053] each decoder has an activation layer, where there is at least a first, second and third decoder of the model [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs (at least a first, second and third)),
a decoder of a tertiary part of the composite neural network to identify a second subsegment of the person or a feature of the second subsegment of the person in the training image (Meyerson, [0053] each decoder has an activation layer, where there is at least a first, second and third decoder of the model [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs (at least a first, second and third) [0061] the classification tasks include identifying different facial features (a plurality of subsegments of the face which is the primary segment) appearing in one image), wherein the second subsegment of the person is a subset of the first subsegment of the person (Meyerson, [0061] the classification tasks include identifying different facial features (a plurality of subsegments of the face which is the primary segment) appearing in one image, this paragraph also details at least two features of the face being identified, where the face is the primary segment, and the features are subsegments of the face and included within the face).
Regarding claim 10, Meyerson discloses; The method of Claim 9, further comprising: freezing the decoder of the tertiary part of the composite neural network after training the decoder of the tertiary part of the composite neural network (Meyerson, [0097] the weights of multiple decoders (at least 3, see figures 2 and 3) are frozen while updating weights of other decoders in the composite network, [0088]-[0096] detail the training of the model preceding freezing of the at least three (primary, secondary and tertiary) decoders/parts of the model);
and after freezing the decoder of the tertiary part of the composite neural network (Meyerson, [0097] the weights of multiple decoders (at least 3 decoders, or a first, second and third part of the network with a decoder, see figures 2 and 3) are frozen while updating weights of other decoders in the composite network), training, using activations of the decoder of the secondary part of the composite neural network (Meyerson, [0053] each of the plurality of decoders has an activation layer, where there is at least a first, second and third decoder of the model having a first, second and third activation layer [0060] the model is trained to perform classification tasks using multiple of the 400 encoder-decoder pairs (at least a first, second and third being trained)),
a decoder of the quaternary part of the composite neural network to identify a third subsegment of the person or a feature of the third subsegment of the person in the training image (Meyerson, [0053] each of the plurality of decoders has an activation layer, where there is at least a first, second, third and fourth decoder of the model having a first, second, third and fourth activation layer [0060] the model is trained to perform classification tasks using multiple of the 400 encoder-decoder pairs (at least a first, second, third and fourth being trained), Meyerson, [0061] the classification tasks include identifying different facial features (a plurality of subsegments of the face which is the primary segment) appearing in one image, this paragraph also details at least two features of the face being identified),
wherein the third subsegment of the person is a subset of the first subsegment of the person (Meyerson, [0061] the classification tasks include identifying different facial features (a plurality of subsegments of the face which is the primary segment) appearing in one image, this paragraph also details at least two features of the face being identified, where the face is the primary segment, and the features are subsegments of the face and included within the face).
Regarding claim 15, Meyerson discloses; A method of training a machine learning model, the method comprising:
training a first channel of a decoder of a primary part of a composite neural network to identify a primary segment of a person in a training image (Meyerson, [0010] Figure 2 shows a method of jointly training multiple pairs of encoders and decoders, each of which have a set of channels within them, indicating at least a first, second and third channel [0061] the network is being trained to perform facial recognition, so it is identifying a face of a person in a training image);
while training the first channel of the decoder of the primary part of the composite neural network, training, using the training image (Meyerson, [0010] Figure 2 shows a method of jointly training multiple pairs of encoders and decoders, each of which have a set of channels within them, indicating at least a first, second and third channel [0061] the network is being trained to perform facial recognition, so it is identifying a face of a person in a training image),
a second channel of the decoder of the neural network to identify a subsegment of the person or a feature of the first subsegment of the person in the training image (Meyerson, [0061] the network is being trained to perform facial recognition, so it is identifying a face of a person in a training image, the system has multiple (400) encoder and decoder pairs additionally the encoder-decoder pairs which include channels are trained in identifying different facial features (a plurality of subsegments of the face which is the primary segment) appearing in one image, this paragraph also details at least two features of the face being identified, where the face is the primary segment, and the features are subsegments of the face and included within the face),
wherein the first subsegment of the person is positioned within the primary segment of the person (Meyerson, [0061] the network is being trained to perform facial recognition, so it is identifying a face of a person in a training image, additionally the encoder-decoder pairs which include channels are trained in identifying different facial features (a plurality of subsegments of the face which is the primary segment) appearing in one image, this paragraph also details at least two features of the face being identified, where the face is the primary segment, and the features are subsegments of the face and included within the face);
freezing the decoder of the primary part of the composite neural network after training the first channel and the second channel of the decoder of the primary part of the composite neural network (Meyerson, [0097] the weights of multiple decoders (at least 3, see figures 2 and 3) are frozen while updating weights of other decoders in the composite network, [0088]-[0096] detail the training steps that precede the freezing of the decoder);
and after freezing the decoder of the primary part of the composite neural network (Meyerson, [0097] the weights of multiple decoders (at least 3, see figures 2 and 3) are frozen while updating weights of other decoders in the composite network), training, using the training image and an activation of the decoder of the primary part of the composite neural network indicating locations in the training image of the primary segment and the first subsegment (Meyerson, [0053] each decoder has an activation layer, where there is at least a first, second and third decoder of the model [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs (at least a first, second and third)),
a third channel of a decoder of a secondary part of the composite neural network to identify a second subsegment of the person or a feature of the second subsegment of the person in the training image (Meyerson, [0053] each decoder has channels within it, where there is at least a first, second and third decoder of the model and therefore at least a first, second and third channel [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs (at least a first, second and third) [0061] the classification tasks include identifying different facial features (a plurality of subsegments of the face which is the primary segment) appearing in one image),
wherein the second subsegment of the person is positioned within the first subsegment of the object (Meyerson, [0061] the classification tasks include identifying different facial features (a plurality of subsegments of the face which is the primary segment) appearing in one image, this paragraph also details at least two features of the face being identified, where the face is the primary segment, and the features are subsegments of the face and included within the face).
Regarding claim 16, Meyerson discloses; The method of Claim 15, wherein the primary segment of the person comprises a portion of the person appearing in the training image (Meyerson, [0053] each decoder has an activation layer, [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs [0061] the classification tasks include identifying different facial features (subsegments of the face which is the primary segment) appearing in one image), and wherein the first subsegment of the person comprises a sub-portion of the portion (Meyerson, [0053] each decoder has an activation layer, [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs [0061] the classification tasks include identifying different facial features (subsegments of the face which is the primary segment) appearing in one image, [0062]-[0066] the multiple encoder-decoder pairs (at least a first, second and third decoder, indicating a primary, secondary and tertiary part of the model) are trained to complete classification tasks to identify multiple features appearing in the same facial image of a person, for example the task can be identifying if the person is wearing glasses or not, the features of the images being the subsegments of the primary segment of the image being the person’s face identified).
Regarding claim 19, Meyerson discloses; The method of Claim 15, wherein training the first channel and the second channel of the decoder of the primary part of the composite neural network sets a plurality of weights of the decoder of the primary part of the composite neural network (Meyerson, [0068]-[0069] the weights of multiple decoders (at least a first, second and third) are updated (set)), and wherein freezing the decoder of the primary part of the composite neural network comprises freezing the plurality of weights of the decoder of the primary part of the composite neural network (Meyerson, [0097] the weights of multiple decoders (at least 3, see figures 2 and 3) are frozen while updating weights of other decoders in the composite network).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 3-4 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Meyerson (US 20190244108 A1) in view of Morzhakov (US 20200349347 A1).
Regarding claim 3, Meyerson does not teach; The method of Claim 2, wherein the primary segment of the person comprises an upper body of the person appearing in the training image, and wherein the first subsegment of the person comprises a head of the person appearing in the training image.
However, in the same field of endeavor of using neural networks for object detection, Morzhakov teaches; wherein the primary segment of the person comprises an upper body of the person appearing in the training image (Morzhakov, [0017]-[0019] the posture and movement of a person is tracked to detect changes in their pose, therefore the person’s body, including their upper body, must also be detected in the image as shown by the stick figures in figure 2), and wherein the first subsegment of the person comprises a head of the person appearing in the training image (Morzhakov, [0017]-[0019] the posture and movement of a person is tracked to detect changes in their pose, [0007] identity of a person may be recognized using the face of the person).
The combination of Meyerson and Morzhakov would have been obvious to one of ordinary skill in the art before the effective filing date of the presently claimed invention. The motivation for the combination lies in that identification of people, including features of the face or body, may be helpful in identifying individuals shown in video or images (Morzhakov [0023]-[0032]).
Regarding claim 4, the combination of Meyerson and Morzhakov teaches; The method of Claim 1, further comprising converting, using an encoder of the primary part of the composite neural network, the training image into a latent space (Morzhakov, [0076] the encoder generates the input in latent space to train an autoencoder, which the examiner is interpreting as requiring conversion of the training image into latent space), wherein the decoder of the primary part of the composite neural network is trained using the latent space (Morzhakov, [0063] the autoencoder includes a decoder, [0076] the autoencoder is trained using the encoder input).
The combination of Meyerson and Morzhakov would have been obvious to one of ordinary skill in the art before the effective filing date of the presently claimed invention. The motivation for the combination lies in that encoders and decoders may share data using latent space, and this may aid with self-training (Morzhakov [0077]).
Regarding claim 17, the combination of Meyerson and Morzhakov teaches; The method of Claim 16, wherein the primary segment of the person comprises an upper body of the person appearing in the training image (Morzhakov, [0017]-[0019] the posture and movement of a person is tracked to detect changes in their pose, therefore the person’s body, including their upper body, must also be detected in the image as shown by the stick figures in figure 2), and wherein the first subsegment of the person comprises a head of the person appearing in the training image (Morzhakov, [0017]-[0019] the posture and movement of a person is tracked to detect changes in their pose, [0007] identity of a person may be recognized using the face of the person).
The combination of Meyerson and Morzhakov would have been obvious to one of ordinary skill in the art before the effective filing date of the presently claimed invention. The motivation for the combination lies in that identification of people, including features of the face or body, may be helpful in identifying individuals shown in video or images (Morzhakov [0023]-[0032]).
Regarding claim 18, the combination of Meyerson and Morzhakov teaches; The method of Claim 15, further comprising converting, using an encoder of the primary part of the composite neural network (Morzhakov, [0076] the encoder generates the input in latent space to train an autoencoder, which the examiner is interpreting as requiring conversion of the training image into latent space), the training image into a latent space, wherein the first channel and the second channel are trained using the latent space (Morzhakov, [0076]-[0077] multiple autoencoders may be trained using the data, which the examiner is interpreting as analogous to training multiple channels since each autoencoder reconstructs a different input).
The combination of Meyerson and Morzhakov would have been obvious to one of ordinary skill in the art before the effective filing date of the presently claimed invention. The motivation for the combination lies in that encoders and decoders may share data using latent space, and this may aid with self-training (Morzhakov [0077]).
Claims 6 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Meyerson (US 20190244108 A1) in view of Shih (US 20210064925 A1).
Regarding claim 6, Meyerson fails to disclose; The method of Claim 1, wherein the activation of the decoder of the primary part of the composite neural network comprises a heatmap indicating probabilities where the primary segment of the person appears at the location.
However, in the same field of endeavor of using neural networks for object detection, Shih teaches; wherein the activation of the decoder of the primary part of the composite neural network comprises a heatmap indicating probabilities where the primary segment of the person appears at the location (Shih, [0063] the model generates Gaussian heatmaps which can be fed into a decoder or series of decoders, further, the heatmaps contain human pose information, a Gaussian heatmap is a confidence map created from the training image).
The combination of Meyerson and Shih would have been obvious to one of ordinary skill in the art before the effective filing date of the presently claimed invention. The motivation for the combination lies in that the use of a heatmap to show the probability of an object’s location would improve the system by providing a visualization of the object’s location (Shih [0063]).
Regarding claim 20, the combination of Meyerson and Shih teaches; The method of Claim 15, wherein the activation of the decoder of the primary part of the composite neural network comprises a heatmap indicating probabilities where the primary segment of the person and the first subsegment of the person appear at locations in the training image (Shih, [0063] the model generates Gaussian heatmaps which can be fed into a decoder or series of decoders, further, the heatmaps contain human pose information, a Gaussian heatmap is a confidence map created from the training image).
The combination of Meyerson and Shih would have been obvious to one of ordinary skill in the art before the effective filing date of the presently claimed invention. The motivation for the combination lies in that the use of a heatmap to show the probability of an object’s location would improve the system by providing a visualization of the object’s location (Shih [0063]).
Claims 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Meyerson (US 20190244108 A1) in view of Berlin (US 20230049729 A1).
Regarding claim 11, Meyerson discloses; A method of training a machine learning model, the method comprising: training a first channel of a decoder of a neural network to identify a primary segment of a person in a training image (Meyerson, [0010] Figure 2 shows a method of jointly training multiple pairs of encoders and decoders, each of which have a set of channels within them, indicating at least a first, second and third channel [0061] the network is being trained to perform facial recognition, so it is identifying a face of a person in a training image);
[and while training the first channel of the decoder, training, using the training image, a second channel of the decoder to identify a subsegment of the person or a feature of the subsegment of the person in the training image such that detection of the primary segment and detection of the subsegment feature are jointly learned,
wherein the subsegment of the person is positioned within the primary segment of the person.]
Meyerson does not disclose;
and while training the first channel of the decoder, training, using the training image, a second channel of the decoder to identify a subsegment of the person or a feature of the subsegment of the person in the training image such that detection of the primary segment and detection of the subsegment feature are jointly learned,
wherein the subsegment of the person is positioned within the primary segment of the person.
However, in the same field of endeavor, Berlin teaches; and while training the first channel of the decoder (Berlin, [0218]-[0221] figure 14 shows the training of the encoder/decoder pairs, where multiple channels (input layer shown with inputs I0-In, indicating at least 3 channels) are trained with different feature inputs from the training image; [0213] the system may use channel-wise attention on network branches to train for different features or groups of features in the image, where channel-wise attention, as one of ordinary skill in the art would understand, is a means of training channels to learn different features/segments and adjusting the weights applied to each channel accordingly),
training, using the training image (Berlin, [0227] a decoder is trained for a specific target face (training image)), a second channel of the decoder to identify a subsegment of the person or a feature of the subsegment of the person in the training image (Berlin, [0218]-[0221] figure 14 shows the training of the encoder/decoder pairs, where multiple channels (input layer shown with inputs I0-In, indicating at least 3 channels) are trained with different feature inputs from the training image; [0213] the system may use channel-wise attention on network branches to train for different features or groups of features in the image, where channel-wise attention, as one of ordinary skill in the art would understand, is a means of training channels to learn different features/segments and adjusting the weights applied to each channel accordingly; further, [0227]-[0229] state that when training the decoder in this manner, the system may use multiple feature maps from the face, where the features are part of the face (primary segment) and each feature is learned as a different input vector and is input into a different channel to be learned jointly, as shown in figure 14 below) such that detection of the primary segment and detection of the subsegment feature are jointly learned (Berlin, [0227]-[0229] state that when training the decoder in this manner, the system may use multiple feature maps from the face, where the features are part of the face (primary segment) and each feature is learned as a different input vector and is input into a different channel to be learned jointly, as shown in figure 14 below),
[Image: Berlin, Figure 14 (media_image3.png), emphasis added]
wherein the subsegment of the person is positioned within the primary segment of the person (Berlin, [0229] the decoder learns an image of a person’s face (primary segment of a person) and features (subsegments) of the face to predict an output face image).
The combination of Meyerson and Berlin would have been obvious to one of ordinary skill in the art prior to the effective filing date of the presently claimed invention. Meyerson teaches a training method for training a composite network to identify a person in an image; however, it does not teach channel-wise training of a decoder. Berlin cures this deficiency, and the motivation for adding this feature to the method of Meyerson is that channel-wise training methods can improve the learning of different features (Berlin, [0213]).
Regarding claim 12, the combination of Meyerson and Berlin teaches; The method of Claim 11, wherein the primary segment of the person comprises a portion of the person appearing in the training image (Meyerson, [0053] each decoder has an activation layer; [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs; [0061] the classification tasks include identifying different facial features (subsegments of the face, which is the primary segment) appearing in one image), and wherein the subsegment of the object (Meyerson, [0053] each decoder has an activation layer; [0060] the model is trained to perform classification tasks using 400 encoder-decoder pairs; [0061] the classification tasks include identifying different facial features (subsegments of the face, which is the primary segment) appearing in one image; [0062]-[0066] the multiple encoder-decoder pairs (at least a first, second, and third decoder, indicating a primary, secondary, and tertiary part of the model) are trained to complete classification tasks to identify multiple features appearing in the same facial image of a person, for example the task can be identifying whether the person is wearing glasses or not, the features of the images being the subsegments of the primary segment, the primary segment being the person’s face identified).
Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Meyerson (US 20190244108 A1) in view of Berlin (US 20230049729 A1) and further in view of Morzhakov (US 20200349347 A1).
Regarding claim 13, the combination of Meyerson and Berlin fails to teach; The method of Claim 12, wherein the primary segment of the person comprises an upper body of the person appearing in the training image, and wherein the subsegment of the person comprises a head of the person appearing in the training image.
However, in the same field of endeavor, Morzhakov teaches; wherein the primary segment of the person comprises an upper body of the person appearing in the training image (Morzhakov, [0017]-[0019] the posture and movement of a person are tracked to detect changes in their pose; therefore the person’s body, including their upper body, must also be detected in the image, as shown by the stick figures in figure 2), and wherein the subsegment of the person comprises a head of the person appearing in the training image (Morzhakov, [0017]-[0019] the posture and movement of a person are tracked to detect changes in their pose; [0007] the identity of a person may be recognized using the face of the person).
The combination of Meyerson, Berlin, and Morzhakov would have been obvious to one of ordinary skill in the art prior to the effective filing date of the presently claimed invention. The motivation for the combination lies in that identification of people, including features of the face or body, may be helpful in the identification of individuals shown in video or images (Morzhakov, [0023]-[0032]).
Regarding claim 14, the combination of Meyerson, Berlin, and Morzhakov teaches; The method of Claim 11, further comprising converting, using an encoder of the neural network, the training image into a latent space (Morzhakov, [0076] the encoder generates the input in latent space to train an autoencoder, which the examiner is interpreting as requiring conversion of the training image into a latent space), wherein the first channel and the second channel are trained using the latent space (Morzhakov, [0076]-[0077] multiple autoencoders may be trained using the data, which the examiner is interpreting as analogous to training multiple channels, since each autoencoder reconstructs a different input).
The combination of Meyerson, Berlin, and Morzhakov would have been obvious to one of ordinary skill in the art prior to the effective filing date of the presently claimed invention. The motivation for the combination lies in that encoders and decoders may share data using a latent space, and this may aid with self-training (Morzhakov, [0077]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure and is cited on the attached PTO-892 form.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JORDAN M ELLIOTT whose telephone number is (703)756-5463. The examiner can normally be reached M-F 8AM-5PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emily Terrell can be reached on (571) 270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.M.E./Examiner, Art Unit 2666
/EMILY C TERRELL/Supervisory Patent Examiner, Art Unit 2666