Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Detailed Action
The following action is in response to the communication(s) received on 02/11/2026.
As of the claims filed 02/11/2026:
Claims 1, 5, and 9 have been amended.
Claims 1-12 are pending.
Claims 1 and 9 are independent claims.
Response to Arguments
Applicant’s arguments filed 08/14/2025 have been fully considered, but are not fully persuasive.
With respect to the indefiniteness rejection under 35 USC § 112:
Applicant asserts that the amendments to the claims have rendered the indefiniteness rejections of claims 5-7 moot. Examiner respectfully submits that the original rejections of claim 5 have been withdrawn in view of the amendment, but new indefiniteness rejections have been raised; please see the new rejection below. Additionally, claims 6 and 7, which do not depend on claim 5, have not been amended; thus, the rejections of claims 6 and 7 are maintained.
With respect to the art rejection under 35 USC § 103:
Applicant asserts that Liu does not disclose “the dual feature extraction paths, the claimed feature-point-to-feature-point difference calculation between those paths, and the claimed coefficient updating and identification re-training that are expressly based on that feature-point difference.” (p.8 ¶2) Examiner respectfully submits that the argued dual feature extraction paths, feature-point-to-feature-point difference calculations, and coefficients are not recited in the claims, and thus cannot be read into the claims.
Applicant further asserts that Liu “does not disclose a "pre-trained feature extraction unit" that (i) produces a "feature point of the simulation image" as a reference output at the re-training time and (ii) is then paired with a distinct re-training feature extraction unit such that the two feature-point outputs are compared by a dedicated error calculation unit. Rao does not cure these deficiencies.” (p.9 ¶1). Examiner respectfully disagrees. Regarding the “pre-trained feature extraction unit,” in Liu, Fig. 3, step 2, Generator B corresponds to the pre-trained feature extraction unit. Each frame of the simulated annotation for instance segmentation and tracking corresponds to a feature point of the simulation image, thus satisfying (i). Additionally, a requirement that the re-training feature extraction unit be distinct is not recited in the claims and cannot be read into them; thus, (ii) is moot.
Applicant further asserts that Liu and Rao in combination do not teach the “re-training feature extraction unit” that outputs a "feature point of the artificial site image" and is retrained in the claimed architecture that culminates in feature-point-to-feature-point comparison across two extractor paths. (p.10 ¶1) Examiner respectfully submits that in Rao, p.3, right column, last ¶, the output of the Sim2Real generator G(x) corresponds to the output of the “re-training feature extraction unit”; F(G(x)) corresponds to the output of the pre-trained feature extraction unit; and the cycle consistency loss corresponds to the distance between the feature point outputs, thus providing the comparison across the two extractor paths (F and G).
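For clarity, the cycle consistency loss referenced above follows the standard CycleGAN formulation, shown below in generic notation (an illustrative restatement rather than a verbatim reproduction of Rao's equations; Rao's RL-CycleGAN adds further consistency terms on top of this base loss):

$$\mathcal{L}_{\mathrm{cyc}}(G,F) = \mathbb{E}_{x \sim p_{\mathrm{sim}}}\bigl[\lVert F(G(x)) - x \rVert_{1}\bigr] + \mathbb{E}_{y \sim p_{\mathrm{real}}}\bigl[\lVert G(F(y)) - y \rVert_{1}\bigr]$$

where G translates simulation-domain images to the real (site) domain and F translates real-domain images back to the simulation domain; minimizing the first term drives F(G(x)) toward x, which is the feature-point-to-feature-point distance relied upon in the mapping above.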
Applicant further asserts that Liu/Rao does not disclose a unit that calculates a difference between feature-point outputs of two different feature extraction units. (p.11 ¶1) Examiner respectfully submits that, as noted above, the asserted distinct feature extraction units are not recited in the claims, and thus cannot be read into the claims.
Applicant further asserts that the rejection does not disclose “the claimed feature-point-driven retraining cascade” (p.12 ¶1). However, the specific feature-point-driven retraining cascade is not recited in the claims with sufficient specificity to distinguish over the prior art of record. Thus, Liu/Rao (Liu, fig. 3; [p.5 left ¶2]; [p.8 left ¶2]) continues to teach this limitation: the embedding-based instance segmentation and tracking model is trained by the preceding stages and identifies and tracks the elements in the video, thus corresponding to the re-training identification unit, and is based on the coefficient update unit, since the input to the RSHN is based on the output from Generator B.
Applicant further asserts that Rao’s cycle consistency loss does not teach the image translation pipeline (p.12 ¶2). Examiner respectfully submits that the references must be considered in combination rather than individually; the image translation pipeline is taught by Liu, while the specific consistency loss function is further taught by Rao.
Applicant further asserts that Payer does not cure the alleged deficiencies with respect to the features recited in claims 1 and 9 as applied to claims 3, 5-7, and 11 (p.13 ¶2). Examiner respectfully submits that those features remain taught by Liu/Rao for the reasons given above.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed for Application No. JP2021-095798, filed on 06/08/2021.
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Such claim limitation(s) is/are:
Claims 1, 9: “an image conversion unit configured to …; a pre-trained feature extraction unit configured to…; a re-training feature extraction unit configured to…; an error calculation unit for feature extraction unit configured to…; a coefficient update unit for feature extraction unit configured to…; a re-training identification unit configured to”
Claims 3, 10: “an accuracy determination unit configured to”
Claims 4, 12: “a switching unit configured to”
Claim 5: “an error calculation unit configured to”
Claims 6, 7: “an error calculation unit for identification unit configured to”
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 5-7 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 6 and 7 each recite:
an error calculation unit for identification unit…
a coefficient update unit for identification unit…
It is unclear whether “identification unit” refers to the “re-training identification unit” of parent claim 1 or to a new identification unit. It is also unclear whether these limitations refer back to the same error calculation unit and coefficient update unit recited in claim 1 or to entirely new units.
Additionally, Claim 5 recites:
the coefficient update unit for identification unit…
It is unclear whether the coefficient update unit for identification unit in claim 5 refers to the coefficient update unit recited earlier in claim 5 or to a new coefficient update unit.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 4, 8, 9, 10, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al., "ASIST: ANNOTATION-FREE SYNTHETIC INSTANCE SEGMENTATION AND TRACKING BY ADVERSARIAL SIMULATIONS" (hereinafter Liu) in view of Rao et al., "RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real" (hereinafter Rao).
Regarding Claim 1, Liu teaches:
A training recognition device comprising: an image conversion unit configured to input a simulation image and an actual site image into a generative adversarial network and convert the simulation image into an artificial site image, the artificial site image being an image imitating a site image including an article to be recognized; (Liu, fig. 3 step 2,
[Liu, Fig. 3 reproduced]
) (Note: the simulated annotation corresponds to the simulation image; the synthesized video corresponds to the artificial site image; the synthesized video is recognized by the RSHN, and thus corresponds to an image imitating a site image including an article to be recognized)
a pre-trained feature extraction unit configured to input the simulation image to a trained deep neural network trained using the simulation image and annotation data for the simulation image and output a feature point of the simulation image at time of re-training; (Liu, fig. 3 step 2 right “synthesized microvilli video” = simulation image; [p.5 left ¶2] 3. Methods
Our study has three steps: unsupervised image-annotation synthesis, video synthesis and instance segmentation and tracking (Fig. 3)…
[p.8 left ¶2] 3.4. Instance segmentation and tracking
From the above stages, the synthetic videos and corresponding annotations are achieved frame-by-frame. The next step is to train our instance segmentation and tracking model.) (Note: the embedding based instance segmentation and tracking model (in step 3 of fig.3 and section 3.4) is trained by the above stages and identifies and tracks the elements in the video via the RSHN, thus corresponding to the time of re-training; each frame of the simulated annotation for instance segmentation and tracking corresponds to a feature point of the simulation image; the RSHN corresponds to the trained deep neural network)
a re-training feature extraction unit configured to input the artificial site image to a deep neural network for re-training, re-train a difference between the simulation image and the artificial site image, and output a feature point of the artificial site image; (Liu, fig.3 step 1; bottom,
[Liu, Fig. 3 reproduced]
Generator B corresponds to the re-training feature extraction unit; step 1 trains Generator B over epochs (see Fig. 5 below), thus re-training the difference between the simulation image and the artificial site image; each frame of the synthesized (microvilli) video corresponds to the feature point of the artificial site image; the RSHN, which corresponds to the deep neural network for re-training, takes the synthesized video as input, thus inputting the artificial site image to the deep neural network for re-training.
[Liu, Fig. 5 reproduced]
)
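For clarity, the adversarial, epoch-wise training of Generator B referenced above follows the standard GAN objective, shown below in generic notation (an illustrative restatement only, not a verbatim reproduction of Liu's equations), where G is the generator, D_Y the discriminator for the target (real-image) domain, X the source domain, and Y the target domain:

$$\mathcal{L}_{\mathrm{GAN}}(G, D_Y) = \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\bigl[\log D_Y(y)\bigr] + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log\bigl(1 - D_Y(G(x))\bigr)\bigr]$$

Training proceeds by alternating updates that minimize this objective over G and maximize it over D_Y across epochs, consistent with the epoch-wise training of Generator B noted above.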
Liu does not explicitly teach, but Rao further teaches:
an error calculation unit for feature extraction unit configured to calculate a difference between the feature point output by the re-training feature extraction unit and the feature point output by the pre-trained feature extraction unit; (Rao, p.3 right last ¶,
[Rao, p.3, right column, last ¶ reproduced]
) (Note: the Sim2Real generator G(x) corresponds to the output of the re-training feature extraction unit; F(G(x)) corresponds to the output of the pre-trained feature extraction unit; the cycle consistency loss corresponds to the distance between the feature point outputs)
a coefficient update unit for feature extraction unit configured to update a coefficient of the re-training feature extraction unit used for re-training based on the difference; (Rao, p.3 right last ¶,
[Rao, p.3, right column, last ¶ reproduced]
) (Note: the output of Sim2Real generator G(x) corresponds to the output of the re-training feature extraction unit; the output of F(G(x)) corresponds to the output of the pre-trained feature extraction unit; the cycle consistency loss corresponds to the distance between the feature point outputs and the coefficient of the re-training feature extraction unit)
Rao and Liu are analogous to the present invention because both are from the same field of endeavor of CycleGAN-based training methods. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement Rao’s loss function for training the generators in Liu’s GAN-based synthesized annotation method. The motivation would be, as Rao states: “Incorporating this loss into unsupervised domain translation, we obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning… offers a substantial improvement over a number of prior methods for sim-to-real transfer, attaining excellent real-world performance with only a modest number of real-world observations.” (Rao, abstract).
Liu, via Liu/Rao, further teaches:
and a re-training identification unit configured to re-train a method for identifying an article based on a feature point output from the deep neural network for re-training of the coefficient updated by the coefficient update unit for feature extraction unit. (Liu, fig. 3; the synthesized video corresponds to the feature point output based on the coefficient updated for the feature extraction unit
[Liu, Fig. 3 reproduced]
Last, an embedding based instance segmentation and tracking algorithm is trained using synthetic training data…
[p.5 left ¶2] 3. Methods
Our study has three steps: unsupervised image-annotation synthesis, video synthesis and instance segmentation and tracking (Fig. 3)…
[p.8 left ¶2] 3.4. Instance segmentation and tracking
From the above stages, the synthetic videos and corresponding annotations are achieved frame-by-frame. The next step is to train our instance segmentation and tracking model.) (Note: the embedding based instance segmentation and tracking model is trained by the above stages and identifies and tracks the elements in the video, thus corresponding to the re-training identification unit, and is based on the coefficient update unit, since the input to the RSHN is based on the output from Generator B.)
Regarding Claim 2, Liu/Rao teaches and incorporates the limitations and rejections of Claim 1 as set forth above. Liu, via Liu/Rao, further teaches:
The training recognition device according to claim 1, wherein the trained deep neural network includes a plurality of training layers, (Liu, p.8 left ¶2, “We used the recurrent stacked hourglass network (RSHN)… as the instance segmentation and tracking backbone to encode the embedding vectors of each pixel. The RSHN is a stacked hourglass network with a convolutional gated recurrent unit to process temporal information.”) (Note: the stacked hourglass network corresponds to a plurality of training layers)
the deep neural network for re-training includes a plurality of re-training layers, (Liu, Fig. 3, “Last, an embedding based instance segmentation and tracking algorithm is trained using synthetic training data”; [p.8 left ¶2] “We used the recurrent stacked hourglass network (RSHN)… as the instance segmentation and tracking backbone to encode the embedding vectors of each pixel. The RSHN is a stacked hourglass network with a convolutional gated recurrent unit to process temporal information.”) (Note: the RSHN corresponds to the deep neural network for re-training)
Rao, via Liu/Rao, further teaches:
the pre-trained feature extraction unit performs the training in order from a preceding layer among the plurality of training layers, (Rao, p.3 right last ¶,
[Rao, p.3, right column, last ¶ and Fig. 5 reproduced]
) (Note: F corresponds to the pre-trained feature extraction unit.)
and the re-training feature extraction unit performs the re-training in order from a preceding layer among the plurality of re-training layers. (Rao, p.3 right last ¶; Fig.5) (Note: the Sim2Real generator G corresponds to the re-training feature extraction unit)
Regarding Claim 4, Liu/Rao teaches and incorporates the limitations and rejections of Claim 2 as set forth above. Liu, via Liu/Rao, further teaches:
The training recognition device according to claim 2, wherein the trained deep neural network and the deep neural network for re-training are constituted by a shared deep neural network, (Liu, fig. 3 bottom, each frame of the simulated annotation of instance segmentation and tracking corresponds to each feature point of the simulation image; the RSHN corresponds to the trained deep neural network; “Last, an embedding based instance segmentation and tracking algorithm is trained using synthetic training data”;
[Liu, Fig. 3 reproduced]
[p.8 left ¶2] “We used the recurrent stacked hourglass network (RSHN)… as the instance segmentation and tracking backbone to encode the embedding vectors of each pixel. The RSHN is a stacked hourglass network with a convolutional gated recurrent unit to process temporal information.”) (Note: the RSHN corresponds to the deep neural network for re-training; the same network is used across these steps and is thus shared)
and the training recognition device further includes a switching unit configured to switch the shared deep neural network by a time division operation. (Liu [p.8 left ¶2] From the above stages, the synthetic videos and corresponding annotations are achieved frame-by-frame.) (Note: the ASIST method processing each frame’s annotation in turn corresponds to a time division operation.)
Regarding Claim 8, Liu/Rao teaches and incorporates the limitations and rejections of Claim 1 as set forth above. Liu, via Liu/Rao, further teaches:
The training recognition device according to claim 1, wherein the image conversion unit stops the conversion from the simulation image by the generative adversarial network to the artificial site image within a period in which the re- training is not performed. (Liu, fig. 3 step 3,
[Liu, Fig. 3 reproduced]
) (Note: the simulated annotation corresponds to the simulation image; the synthesized video corresponds to the artificial site image; since step 3 occurs after the image is generated by Generator B, re-training of the generative adversarial network is not performed in step 3.)
Regarding Claim 9, Liu teaches:
A training recognition device comprising: an image conversion unit configured to input a first site image, which is an existing actual site image, and a second site image, which is another actual site image, to a generative adversarial network, (Liu, fig. 3 steps 1-2,
[Liu, Fig. 3 reproduced]
) (Note: step 1, “microvilli” corresponds to the first site image; the synthesized video corresponds to the second site image)
and convert the first site image into an artificial site image, the artificial site image being an image imitating a site image including an article to be recognized; (Liu, Fig. 1 middle top right,
[Liu, Fig. 1 reproduced]
fig. 3 step 2,
[Liu, Fig. 3 reproduced]
) (Note: the simulated annotation corresponds to the simulation image; the synthesized video corresponds to the artificial site image; the synthesized video is recognized by the RSHN, and thus corresponds to an image imitating a site image including an article to be recognized)
a pre-trained feature extraction unit configured to input the first site image to a trained deep neural network trained using the first site image and annotation data for the first site image and output a feature point of the first site image at time of re-training; (Liu, fig. 3 bottom, each frame of the simulated annotation of instance segmentation and tracking corresponds to each feature point of the simulation image; the RSHN corresponds to the trained deep neural network)
a re-training feature extraction unit configured to input the artificial site image to a deep neural network for re-training, re-train a difference between the first site image and the artificial site image, and output a feature point of the artificial site image; (Liu, fig.3 bottom, each frame of the synthesized (microvilli) video corresponds to the feature point of the artificial site image)
Liu does not explicitly teach, but Rao further teaches:
an error calculation unit for feature extraction unit configured to calculate a difference between the feature point output by the re-training feature extraction unit and the feature point output by the pre-trained feature extraction unit; (Rao, p.3 right last ¶,
[Rao, p.3, right column, last ¶ reproduced]
) (Note: the Sim2Real generator G(x) corresponds to the output of the re-training feature extraction unit; F(G(x)) corresponds to the output of the pre-trained feature extraction unit; the cycle consistency loss corresponds to the distance between the feature point outputs)
a coefficient update unit for feature extraction unit configured to update a coefficient of the re-training feature extraction unit used for re-training based on the difference; (Rao, p.3 right last ¶,
[Rao, p.3, right column, last ¶ reproduced]
) (Note: the output of Sim2Real generator G(x) corresponds to the output of the re-training feature extraction unit; the output of F(G(x)) corresponds to the output of the pre-trained feature extraction unit; the cycle consistency loss corresponds to the distance between the feature point outputs)
Rao and Liu are analogous to the present invention because both are from the same field of endeavor of CycleGAN-based training methods. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement Rao’s loss function for training the generators in Liu’s GAN-based synthesized annotation method. The motivation would be, as Rao states: “Incorporating this loss into unsupervised domain translation, we obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning… offers a substantial improvement over a number of prior methods for sim-to-real transfer, attaining excellent real-world performance with only a modest number of real-world observations.” (Rao, abstract).
Liu, via Liu/Rao, further teaches:
and a re-training identification unit configured to re-train a method for identifying an article based on a feature point output from the deep neural network for re-training of the coefficient updated by the coefficient update unit for feature extraction unit. (Liu, fig. 3,
[Liu, Fig. 3 reproduced]
Last, an embedding based instance segmentation and tracking algorithm is trained using synthetic training data) (Note: the embedding based instance segmentation and tracking algorithm identifies and tracks the elements in the video and thus corresponds to the re-training identification unit)
Claims 10 and 12, where claim 10 depends on claim 9 and claim 12 depends on claim 10, recite limitations substantially similar to those of Claims 2 and 4, respectively. Thus, Claims 10 and 12 are rejected for the reasons set forth with respect to Claims 2 and 4, respectively.
Claims 3, 5-7, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Liu/Rao in view of Payer et al., "Instance Segmentation and Tracking with Cosine Embeddings and Recurrent Hourglass Networks" (hereinafter Payer).
Regarding Claim 3, Liu/Rao teaches and incorporates the limitations and rejections of Claim 2 as set forth above. Liu/Rao does not teach, but Payer further teaches:
The training recognition device according to claim 2, further comprising: an accuracy determination unit configured to determine an accuracy of the deep neural network for re-training, wherein the re-training feature extraction unit terminates re-training based on an accuracy determination result by the accuracy determination unit. (Payer, p.3, “2.1 Recurrent Stacked Hourglass Network… We apply the loss function on the outputs of both hourglasses, while we only use the outputs of the second hourglass for the clustering of embeddings… [p.4]
[Payer, p.4 excerpt reproduced]
) (Note: Recurrent Stacked Hourglass Network (RSHN) corresponds to the deep neural network for retraining; L corresponds to the determined accuracy of the deep neural network for retraining; minimizing L corresponds to terminating the updating based on the accuracy determination result)
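For clarity, a cosine-embedding loss of the type relied upon above can be sketched in generic notation as follows (an illustrative form only, not a verbatim reproduction of Payer's equation), where e_p is the embedding of pixel p, S_i is the set of pixels of instance i, its complement is the remaining pixels, and ē_i is the mean embedding of instance i:

$$\mathcal{L} = \sum_{i}\Biggl[\frac{1}{\lvert S_i\rvert}\sum_{p \in S_i}\bigl(1 - \cos(\mathbf{e}_p, \bar{\mathbf{e}}_i)\bigr) + \frac{1}{\lvert \bar{S}_i\rvert}\sum_{p \notin S_i}\cos^{2}(\mathbf{e}_p, \bar{\mathbf{e}}_i)\Biggr]$$

Minimizing this loss pulls within-instance embeddings toward their instance mean while pushing all other embeddings away, consistent with the accuracy/termination mapping noted above.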
Payer and Liu/Rao are analogous to the present invention because both are from the same field of endeavor of methods for training a deep neural network. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the cosine embedding loss function from Payer into Liu/Rao’s RSHN-based deep neural network. The motivation would be to train the network “such that the network predicts unique embeddings for every instance throughout videos.” (Payer, abstract).
Regarding Claim 5, Liu/Rao teaches and incorporates the limitations and rejections of Claim 1 as set forth above. Liu/Rao does not teach, but Payer further teaches:
The training recognition device according to claim 1, further comprising: an error calculation unit configured to calculate a difference between output data of the re-training identification unit and the annotation data; (Payer, p.4 ¶1,
[Payer, p.4 excerpt reproduced]
…
Since our embedding loss allows same embeddings for different instances that are far apart, we use both image coordinates and value of the embeddings as data points for the clustering algorithm. After identifying the embedding clusters with HDBSCAN and filtering clusters that are smaller than tsize, the final segmented instances for each frame pair are obtained.
For merging the segmented instances in overlapping frame pairs, we identify same instances by the highest intersection over union (IoU) between each segmented instance in the overlapping frame. The resulting segmentations are then upsampled back to the original image size, generating the final segmented and tracked instances.)
and a coefficient update unit configured to update the coefficient used by the re-training identification unit based on the difference, wherein the re-training identification unit re-trains the method for identifying the article based on the feature point output from the deep neural network for re-training of the coefficient updated by the coefficient update unit for identification unit. (Payer, p.4 ¶1,
[Payer, p.4 excerpt reproduced]
) (Note: minimizing L corresponds to updating the loss coefficient used by the re-training identification unit)
Payer and Liu/Rao are analogous to the present invention because both are from the same field of endeavor of methods for training a deep neural network. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the cosine embedding loss function from Payer into Liu/Rao’s RSHN-based deep neural network. The motivation would be to train the network “such that the network predicts unique embeddings for every instance throughout videos.” (Payer, abstract).
Regarding Claim 6, Liu/Rao teaches and incorporates the limitations and rejections of Claim 2 as set forth above. Liu/Rao does not teach, but Payer further teaches:
The training recognition device according to claim 1, further comprising: an error calculation unit for identification unit configured to calculate a difference between output data of the re-training identification unit and the annotation data; (Payer, p.4 ¶1,
[Payer, p.4 excerpt reproduced]
…
Since our embedding loss allows same embeddings for different instances that are far apart, we use both image coordinates and value of the embeddings as data points for the clustering algorithm. After identifying the embedding clusters with HDBSCAN and filtering clusters that are smaller than tsize, the final segmented instances for each frame pair are obtained.
For merging the segmented instances in overlapping frame pairs, we identify same instances by the highest intersection over union (IoU) between each segmented instance in the overlapping frame. The resulting segmentations are then upsampled back to the original image size, generating the final segmented and tracked instances.)
and a coefficient update unit for identification unit configured to update the coefficient used by the re-training identification unit based on the difference, wherein the re-training identification unit re-trains the method for identifying the article based on the feature point output from the deep neural network for re-training of the coefficient updated by the coefficient update unit for identification unit. (Payer, p.4 ¶1,
[Payer, p.4 excerpt reproduced]
) (Note: minimizing L corresponds to updating the loss coefficient used by the re-training identification unit)
Payer and Liu/Rao are analogous to the present invention because both are from the same field of endeavor of methods for training a deep neural network. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the cosine embedding loss function from Payer into Liu/Rao’s RSHN-based deep neural network. The motivation would be to train the network “such that the network predicts unique embeddings for every instance throughout videos.” (Payer, abstract).
Claim 7, which depends on claim 4, recites limitations substantially similar to those of Claim 6. Thus, Claim 7 is rejected for the reasons set forth with respect to Claim 6.
Claim 11, which depends on claim 10, recites limitations substantially similar to those of Claim 3. Thus, Claim 11 is rejected for the reasons set forth with respect to Claim 3.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSEP HAN whose telephone number is (703)756-1346. The examiner can normally be reached Mon-Fri 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.H./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122