Prosecution Insights
Last updated: April 19, 2026
Application No. 18/402,528

OBJECT DETECTION SYSTEMS AND METHODS INCLUDING AN OBJECT DETECTION MODEL USING A TAILORED TRAINING DATASET

Non-Final OA: §103, §112, §DP
Filed: Jan 02, 2024
Examiner: KAUR, JASPREET
Art Unit: 2662
Tech Center: 2600 (Communications)
Assignee: Tyco Fire & Security GmbH
OA Round: 1 (Non-Final)
Grant Probability: 81% (Favorable)
Expected OA Rounds: 1-2
Median Time to Grant: 2y 8m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 81% (13 granted / 16 resolved; +19.3% vs TC avg, above average)
Interview Lift: +30.0% (strong; measured across resolved cases with interview)
Typical Timeline: 2y 8m avg prosecution; 31 applications currently pending
Career History: 47 total applications across all art units
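
The headline numbers are simple arithmetic on the examiner's record. A minimal sketch of how they appear to be derived; the additive, capped-at-99% treatment of the interview lift is an assumption that happens to reproduce the displayed figure, not something the page states:

```python
# Hypothetical reconstruction of the dashboard arithmetic; the cap rule
# for the with-interview figure is an assumption, not documented here.
granted, resolved = 13, 16
allow_rate = granted / resolved              # 0.8125 -> shown as 81%
interview_lift = 0.30                        # "+30.0% Interview Lift"

# Assumed rule: add the lift to the base rate and cap at 99%, which
# reproduces the displayed "99% With Interview".
with_interview = min(allow_rate + interview_lift, 0.99)

print(f"Career allow rate: {allow_rate:.0%}")      # 81%
print(f"With interview:    {with_interview:.0%}")  # 99%
```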

Statute-Specific Performance

§101: 17.2% (-22.8% vs TC avg)
§103: 53.2% (+13.2% vs TC avg)
§102: 7.4% (-32.6% vs TC avg)
§112: 15.3% (-24.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 16 resolved cases.
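
Each delta reads as the examiner's rate minus the Tech Center average, so the implied baseline can be recovered from the displayed pairs. A short sketch under that assumption:

```python
# Recover the implied Tech Center average from each statute's displayed
# rate and delta, assuming delta = examiner_rate - tc_average.
stats = {  # statute: (examiner_rate_pct, delta_vs_tc_pct)
    "§101": (17.2, -22.8),
    "§103": (53.2, +13.2),
    "§102": (7.4, -32.6),
    "§112": (15.3, -24.7),
}
for statute, (rate, delta) in stats.items():
    tc_avg = rate - delta
    print(f"{statute}: examiner {rate:.1f}% vs TC avg {tc_avg:.1f}%")
```

Notably, all four deltas imply the same 40.0% baseline, consistent with a single Tech Center average (the original chart's black line).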

Office Action

Grounds: §103, §112, Double Patenting
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgement is made of Applicant's claim of priority: this application is a continuation-in-part of Application No. 17/468,175, filed on September 09, 2021.

Information Disclosure Statement

The information disclosure statements ("IDS") filed on 06/24/2024, 07/31/2024, and 05/29/2025 have been reviewed and the listed references have been considered.

Status of Claims

Claims 1-20 are pending.

Drawings

The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character not mentioned in the description: 210 of Figure 2. Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment of the specification to add the reference character to the description in compliance with 37 CFR 1.121(b), are required in reply to this Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediately prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either "Replacement Sheet" or "New Sheet" pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification

The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) as indefinite. Independent claims 1, 9, and 17, and dependent claims 3, 7, 11, and 15, recite a negative limitation. Regarding claim 1, the recitation "detecting the classification model did not output the first class for the second image frame as predicted" renders the claim indefinite because it is unclear what was predicted to be detected in the second frame. The limitation is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. One could reasonably assume that cases exist where the second frame does not contain the object identified in the previous frame, in which case the first class would not be output. Therefore, for examination purposes, the examiner has interpreted the claims, under the broadest reasonable interpretation, as adding the second image frame to the training dataset only when the classification model misclassifies objects within the image. Independent claims 9 and 17 are similarly rejected for containing the same negative limitation. Therefore, dependent claims 2-8, 10-16, and 18-20, which depend from claims 1, 9, and 17, respectively, are also rejected under 35 U.S.C. 112(b) as indefinite.
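
To make the disputed limitation concrete, the following minimal sketch (all names hypothetical; not code of record in the application) contrasts the two situations the examiner says the claim language conflates: the object simply absent from the second frame versus present but misclassified.

```python
# Illustrative sketch only; function names and structure are hypothetical,
# not taken from the application.
def should_add_to_training_set(frame2, predicted_class, classifier, tracker):
    """Decide whether frame2 belongs in the retraining dataset."""
    detected = tracker.object_present(frame2)   # classification tracking model
    output_class = classifier.classify(frame2)  # classification model

    if not detected:
        # Case the examiner highlights: the object simply left the scene,
        # so the first class is not output, yet nothing was misclassified.
        return False
    # Examiner's broadest reasonable interpretation: add the frame only
    # when the object is present but the model outputs the wrong class.
    return output_class != predicted_class
```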
Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory obviousness-type double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b). The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1-20 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims of U.S. Patent No. 11,893,084. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the instant application are obvious variants of the corresponding ones in the patent in view of Zhou et al. (US 2019/0130188 A1). This is a nonstatutory obviousness-type double patenting rejection because the patentably indistinct claims have not in fact been patented.

[Claim chart comparing claim 1 of instant application 18/402,528 with claim 1 of U.S. Patent No. 11,893,084 omitted.]

Although U.S. Patent 11,893,084 discloses a "ROI detection model" and "ROI tracking model" that can be interpreted as the "classification model" and the "classification tracking model" of the instant application, it does not disclose "a first class of a first object depicted in a first image frame" and "predict, using a classification tracking model, that the classification model will output the first class for the second image frame in response to detecting the first object in the second image frame."

However, in an analogous field of endeavor, Zhou teaches "receive, from a classification model, a first class of a first object depicted in a first image frame" (Zhou Figure 1 and paragraph [0122]: "deep learning-based detector that can be used to detect or classify objects in video frames includes the You only look once (YOLO) detector"), where the blob detection system is the classification model, and "predict, using a classification tracking model, that the classification model will output the first class for the second image frame in response to detecting the first object in the second image frame, wherein the classification tracking model is configured to detect whether one or more objects in one image exist in another image" (Zhou paragraph [0228]: "plurality of classification requests include one or more classification requests generated for the object tracker in the current video frame, and also one or more classification requests generated for one or more object trackers in one or more previous video frames"; Zhou paragraph [0229]: "the one or more characteristics associated with an object tracker from the subset of object trackers include a state change of the object tracker from a first state to a second state. For example, a classification request can be generated for the object tracker when a state of the object tracker is changed from the first state (e.g., the state of the object tracker in a previous video frame) to the second state in the current video frame").

[Zhou Figure 1 image omitted.]

It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention of the instant application, to modify the ROI detection and tracking model of claim 1 of U.S. Patent 11,893,084 to include classifying detected objects in each image frame as taught by Zhou. The suggestion/motivation for doing so would have been: "Systems and methods are described herein for improving video analytics by introducing the classification functionality into a video analytics system based on conventional motion object (blob) detection and tracking. The systems and methods described herein provide high accuracy classification results provided by object classification, while eliminating or greatly reducing some of the issues that result from object classification. For example, object detection and tracking can be performed along with object classification in real-time (e.g., at a rate of at least 30 fps when a camera device is operating at a 30 fps frame rate) and with low complexity," as noted by the Zhou disclosure in paragraph [0128]. Therefore, it would have been obvious to combine the disclosure of U.S. Patent 11,893,084 with the Zhou disclosure to obtain the invention as specified in instant application claim 1, as there is a reasonable expectation of success and/or because doing so merely combines prior art elements according to known methods to yield predictable results. Claims 2-20 are similarly rejected under nonstatutory obviousness-type double patenting as being unpatentable over claims of U.S. Patent 11,893,084 in view of Zhou.

Claims 1-20 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims of U.S. Patent No. 12,147,501. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the instant application are obvious variants of the corresponding ones in the patent in view of Zhou et al.
This is a nonstatutory obviousness-type double patenting rejection because the patentably indistinct claims have not in fact been patented.

[Claim chart comparing claim 1 of instant application 18/402,528 with claim 1 of U.S. Patent No. 12,147,501 omitted.]

Although U.S. Patent 12,147,501 discloses a "ROI detection model" and "ROI tracking model" that can be interpreted as the "classification model" and the "classification tracking model" of the instant application, it does not disclose "a first class of a first object depicted in a first image frame" and "predict, using a classification tracking model, that the classification model will output the first class for the second image frame in response to detecting the first object in the second image frame, wherein the classification tracking model is configured to detect whether one or more objects in one image exist in another image; detect whether the classification model outputs the first class for the second image frame."

However, in an analogous field of endeavor, Zhou teaches "receive, from a classification model, a first class of a first object depicted in a first image frame" (Zhou Figure 1 and paragraph [0122], quoted above), where the blob detection system is the classification model; "predict, using a classification tracking model, that the classification model will output the first class for the second image frame in response to detecting the first object in the second image frame, wherein the classification tracking model is configured to detect whether one or more objects in one image exist in another image" (Zhou paragraphs [0228] and [0229], quoted above); and "detect whether the classification model outputs the first class for the second image frame" (Zhou paragraph [0144]: "The RECOVER state change is determined when a lost or hidden tracker is detected and output again (with an old tracker label or ID) in the current frame. For example, a blob being tracked by a tracker may not be detected in a first frame, at which point the tracker is transitioned to a lost state. At a second frame, the blob may again be detected (in which case a recover state change RECOVER occurs), at which point the tracker can be output again with the same tracker ID. In such an example, a classification invocation request can be generated for the tracker at the second frame (e.g., at block 914)").

[Zhou Figure 1 image omitted.]

It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention of the instant application, to modify the ROI detection and tracking model of claim 1 of U.S. Patent 12,147,501 to include classifying detected objects in each image frame as taught by Zhou. The suggestion/motivation for doing so would have been the same improvement to video analytics noted by the Zhou disclosure in paragraph [0128], quoted above. Therefore, it would have been obvious to combine the disclosure of U.S. Patent 12,147,501 with the Zhou disclosure to obtain the invention as specified in instant application claim 1, as there is a reasonable expectation of success and/or because doing so merely combines prior art elements according to known methods to yield predictable results. Claims 2-20 are similarly rejected under nonstatutory obviousness-type double patenting as being unpatentable over claims of U.S. Patent 12,147,501 in view of Zhou.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering the patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-4, 7-12, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 2019/0130188 A1) in view of Forman et al. (US 7,792,353 B2).
Regarding claim 1, Zhou teaches "An apparatus for object detection" (Zhou paragraph [0064]: "A video analytics system can obtain a sequence of video frames from a video source and can process the video sequence to perform a variety of tasks"), "comprising: at least one memory" (Zhou paragraph [0246]: "The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data"), "and at least one hardware processor coupled with the at least one memory and configured, individually or in combination" (Zhou paragraph [0063]: "embodiments may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer-program product) may be stored in a computer-readable or machine-readable medium. A processor(s) may perform the necessary tasks"), "to: receive, from a classification model, a first class of a first object depicted in a first image frame" (Zhou Figure 1 and paragraph [0122], quoted above), "receive a second image frame that is a subsequent frame to the first image frame in a video" (Zhou Figure 2 and paragraph [0072]: "when the blob tracker is updated in the previous frame (after being associated with the previous blob in the previous frame), updated information for the blob tracker can include the tracking information for the previous frame and also prediction of a location of the blob tracker in the next frame (which is the current frame in this example)"), "predict, using a classification tracking model, that the classification model will output the first class for the second image frame in response to detecting the first object in the second image frame, wherein the classification tracking model is configured to detect whether one or more objects in one image exist in another image" (Zhou paragraphs [0228] and [0229], quoted above), and "detect whether the classification model outputs the first class for the second image frame" (Zhou paragraph [0144], quoted above; see also Zhou paragraph [0127]: "While the purpose of object classification is to assign a class type for a blob, there are cases in which false classification results are output. When such false classification results are given, the tracking system may not be able to determine if an event has been changed, in which case the tracking system has no chance to update the class type of the object (with a wrong type). Further, when a blob associated with a tracker has a small size, the classification results are not reliable. In such cases, false classification labels can be assigned to the trackers").

However, Zhou is not relied on to teach "determine that the second image frame should be added to a training dataset for the classification model" and "re-train the classification model, to define a re-trained classification model, using the training dataset comprising the second image frame in response to determining that the second image frame should be added to the training dataset."

In the same field of endeavor, Forman teaches "determine that the second image frame should be added to a training dataset" (Forman column 8, lines 5-13: "In step 134, samples are selected 175 from among both training set 45 and prediction set 171, e.g., by evaluating samples in both such sets and selecting one or more based on a specified selection criterion. As illustrated in FIG. 9, this step is accomplished in the present embodiment by using a criterion that attempts to maximize the expected benefit from multiple different considerations, including but not limited to evaluation of: consistency 177 (e.g., questioning training label outliers)") and "re-train the classification model, to define a re-trained classification model, using the training dataset comprising the second image frame in response to determining that the second image frame should be added to the training dataset" (Forman column 10, lines 16-19: "Finally, in step 138 the training set 45 is modified based on the labels received in step 137, the classifier 3 is retrained based on the modified training set 45, and at least some of the samples 2 and 7 are reprocessed using classifier 3").

It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention of the instant application, to combine the object detection, classification, and tracking across frames of a video as taught by Zhou with the system of training the models with sample frames acquired during processing as taught by Forman. The suggestion/motivation for doing so would have been that it is known in the field of object detection and classification that training sets help improve model accuracy: "While conventional techniques generally assume that the assigned classification labels 8 are correct, in the preferred embodiments of the present invention such labels 8 are repeatedly questioned and some of them may be submitted for confirmation/re-labeling if they do not appear to conform to the underlying model used by classifier 3, as discussed in more detail below," as noted by the Forman disclosure in column 2, lines 37-42.
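
Read as a whole, the claim 1 pipeline that the rejection maps onto Zhou and Forman is a predict-compare-retrain loop. A hedged sketch under assumed names (not the application's actual code, which is not of record):

```python
# Hypothetical sketch of the claim 1 pipeline as characterized in this
# rejection; all names are illustrative, not from the application.
def update_on_new_frame(frame1, frame2, classifier, tracker, training_set):
    first_class = classifier.classify(frame1)   # "classification model"

    # "classification tracking model": predicts the classifier will output
    # the first class again if the object is detected in frame2.
    predicted = tracker.object_present(frame2)

    actual = classifier.classify(frame2)        # detect the actual output
    if predicted and actual != first_class:
        # Mapped to Forman: select the mispredicted frame into the
        # training set and retrain the classifier on the modified set.
        training_set.append((frame2, first_class))
        classifier.retrain(training_set)
    return classifier
```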
Therefore, it would have been obvious to combine the disclosure of Zhou with the Forman disclosure to obtain the invention as specified in claim 1, as there is a reasonable expectation of success and/or because doing so merely combines prior art elements according to known methods to yield predictable results.

Claim 9 recites a method with steps corresponding to the apparatus elements recited in claim 1. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements of apparatus claim 1. Additionally, the rationale and motivation to combine the Zhou and Forman references presented in the rejection of claim 1 apply to this claim.

Claim 17 recites a computer readable medium including computer executable instructions corresponding to the elements of the apparatus recited in claim 1. Therefore, the recited instructions of the computer readable medium of claim 17 are mapped to the proposed combination in the same manner as the corresponding elements of apparatus claim 1. Additionally, the rationale and motivation to combine Zhou and Forman presented in the rejection of claim 1 apply to this claim.

Regarding claim 2 (similarly claims 10 and 18), the combination of Zhou and Forman teaches "The apparatus of claim 1, wherein the at least one hardware processor is configured to: execute the re-trained classification model, wherein the re-trained classification model outputs the first class for the first object in any subsequently inputted image frame depicting the first object" (Zhou paragraph [0104]: "Once the association between the blob trackers 410A and blobs 408 has been completed, the blob tracker update engine 416 can use the information of the associated blobs, as well as the trackers' temporal statuses, to update the status (or states) of the trackers 410A for the current frame. Upon updating the trackers 410A, the blob tracker update engine 416 can perform object tracking using the updated trackers 410N, and can also provide the updated trackers 410N for use in processing a next frame").

Regarding claim 3 (similarly claims 11 and 19), the combination of Zhou and Forman teaches "The apparatus of claim 1, wherein the at least one hardware processor is configured to: determine that the second image frame should not be added to the training dataset for the classification model" (Forman column 8, lines 5-13, quoted above) "in response to detecting that the classification model did output the first class in the second image frame as predicted" (Zhou paragraph [0144], quoted above), "and re-train the classification model, to define the re-trained classification model, using the training dataset not comprising the second image frame in response to determining that the second image frame should not be added to the training dataset" (Forman column 10, lines 16-19, quoted above). The proposed combination, as well as the motivation for combining the Zhou and Forman references presented in the rejection of claim 1, applies to claim 3. Finally, the apparatus recited in claim 3 is met by Zhou and Forman.

Regarding claim 4 (similarly claims 12 and 20), the combination of Zhou and Forman teaches "The apparatus of claim 1, wherein the at least one hardware processor is configured to determine that the second image frame should be added to the training dataset by: determining whether more than a threshold number of images" (Forman column 10, lines 26-33: "5-20 samples are selected 175 for labeling, sorted by their prediction strength (e.g. probability of belonging to the positive class according to the current classifier), and presented to the user 57 in a single screen. If the classifier 3 is reasonably accurate, the positive samples will be mostly gathered together, making it easier for the user 57 to group-select them and label them positive (same for the negative samples), with a few individual clicks to treat the exceptions") "in the training dataset are labelled with the first class" (Forman column 6, paragraph 10-12: "In any event, in the preferred embodiments the selected training samples 77 provided for confirmation/re-labeling are presented in groups of related samples"), "and adding the second image frame to the training dataset in response to determining that less than the threshold number of images in the training dataset are labelled with the first class" (Forman column 10, lines 26-33, quoted above). The proposed combination, as well as the motivation for combining the Zhou and Forman references presented in the rejection of claim 1, applies to claim 4. Finally, the apparatus recited in claim 4 is met by Zhou and Forman.

Regarding claim 7 (similarly claim 15), the combination of Zhou and Forman teaches "The apparatus of claim 1, wherein the at least one hardware processor is configured to detect that the classification model did not output the first class for the second image frame by determining that the classification model output a second class for the first object, wherein the second class does not match the first class" (Zhou paragraph [0127], quoted above). The proposed combination, as well as the motivation for combining the Zhou and Forman references presented in the rejection of claim 1, applies to claim 7. Finally, the apparatus recited in claim 7 is met by Zhou and Forman.

Regarding claim 8 (similarly claim 16), the combination of Zhou and Forman teaches "The apparatus of claim 1, wherein the classification tracking model is configured to identify, using image metadata" (Zhou paragraph [0109]: "normal tracker (e.g., including certain status data of the normal tracker, the motion model for the normal tracker, or other information related to the normal tracker) can be output as part of object metadata. The metadata, including the normal tracker, can be output from the video analytics system (e.g., an IP camera running the video analytics system) to a server or other system storage. The metadata can then be analyzed for event detection (e.g., by a rule interpreter). A tracker that is not promoted as a normal tracker can be removed (or killed), after which the tracker can be considered as dead"), "attributes of the one or more objects in the one image and predict that the attributes will be present in the another image in response to detecting the one or more objects in the another image" (Zhou paragraph [0072], quoted above), "wherein the attributes comprises classifications associated with the one or more objects" (Zhou Figure 26 and paragraph [0215]: "Each cell also predicts a class for each bounding box. For example, a probability distribution over all the possible classes is provided. Any number of classes can be detected, such as a bicycle, a dog, a cat, a person, a car, or other suitable object class. The confidence score for a bounding box and the class prediction are combined into a final score that indicates the probability that that bounding box contains a specific type of object"). The proposed combination, as well as the motivation for combining the Zhou and Forman references presented in the rejection of claim 1, applies to claim 8. Finally, the apparatus recited in claim 8 is met by Zhou and Forman.

Claims 5-6 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou and Forman in view of Brown et al. (US 2012/0170805 A1).

Regarding claim 5 (similarly claim 13), the combination of Zhou and Forman teaches "The apparatus of claim 1, wherein the first object is a person and a first ROI boundary around the first object has an occluded view of the person" (Zhou Figures 21A-21C and: "a blob is fed to a deep learning classification network, one or more shallow layers might learn simple geometrical objects, such as lines and/or other objects, that signify the object to be classified. The deeper layers will learn much more abstract, detailed features about the objects, such as sets of lines that define shapes or other detailed features, and then eventually sets of the shapes from the earlier layers that make up the shape of the object that is being classified (e.g., a person, a car, an animal, or any other object). Further details of the structure and function of neural networks are described below with respect to FIG. 17-FIG. 21C"), "and wherein the at least one hardware processor is configured to determine that the second image frame should be added to the training dataset by: determining whether more than a threshold number of images in the training dataset" (Forman column 10, lines 26-33, quoted above) "include the occluded view of the person" (Zhou Figures 21A-21C, quoted above). However, the combination of Zhou and Forman is not relied on to teach "adding the second image frame to the training dataset." Brown teaches "adding the second image frame to the training dataset" (Brown paragraph [0057]: "At step 525, the synthetically generated occluded images are added to the set of training data images").
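
Claim 4 (and the occluded-view and lighting-variation variants of claims 5 and 6) reduces the "should be added" decision to a counting rule over the training set. A minimal sketch with hypothetical names:

```python
# Illustrative sketch of the claim 4 threshold rule; names are hypothetical.
def add_if_class_underrepresented(frame, first_class, training_set, threshold):
    """Add `frame` only if fewer than `threshold` images carry `first_class`."""
    count = sum(1 for _, label in training_set if label == first_class)
    if count < threshold:
        training_set.append((frame, first_class))
        return True
    return False
```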
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention of the instant application, to modify the object detection, classification, and tracking across frames of a video as taught by Zhou and Forman to include occluded image frames as taught by Brown. The suggestion/motivation for doing so would have been: "High volumes of activity data, different weather conditions, crowded scenes, partial occlusions, lighting effects such as shadows and reflections, and many other factors cause serious issues in real system deployments, making the problem very challenging. Traditional methods based on background modeling generally fail under these difficult conditions, as illustrated in FIGS. 1 and 2. FIG. 1 shows a typical crowded urban scene 100 to which event detection may be applied. FIG. 2 shows corresponding foreground blobs 200 obtained through background subtraction according to conventional event detection. Note that the prior art approach clusters groups of vehicles into the same blob," as noted by the Brown disclosure in paragraph [0006]. Therefore, it would have been obvious to combine the disclosure of Zhou and Forman with the Brown disclosure to obtain the invention as specified in claim 5, as there is a reasonable expectation of success and/or because doing so merely combines prior art elements according to known methods to yield predictable results.

Regarding claim 6 (similarly claim 14), the combination of Zhou, Forman, and Brown teaches "The apparatus of claim 1, wherein the second image frame depicts a scene with a given light setting, background, or environment" (Brown Figure 7 and paragraph [0069]: "However we can see in FIG. 7 that we capture samples including a huge amount of appearance variation such as, for example, different lighting, weather conditions, vehicle models, and so forth. This is useful for training the appearance-based detector"), "and wherein the at least one hardware processor is configured to determine that the second image frame should be added to the training dataset by: determining whether more than a threshold number of images in the training dataset" (Forman column 10, lines 26-33, quoted above) "include the given light setting, background, or environment" (Brown Figure 7 and paragraph [0069], quoted above), "and adding the second image frame to the training dataset in response to determining that less than the threshold number of images in the training dataset" (Forman column 10, lines 26-33, quoted above) "include the given light setting, background, or environment" (Brown Figure 7 and paragraph [0069], quoted above). The proposed combination, as well as the motivation for combining the Zhou, Forman, and Brown references presented in the rejection of claim 5, applies to claim 6. Finally, the apparatus recited in claim 6 is met by Zhou, Forman, and Brown.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASPREET KAUR, whose telephone number is (571) 272-5534. The examiner can normally be reached Monday - Friday, 7:30 am - 4:00 pm PST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Amandeep Saini, can be reached at (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JASPREET KAUR/
Examiner, Art Unit 2662

/AMANDEEP SAINI/
Supervisory Patent Examiner, Art Unit 2662

Prosecution Timeline

Jan 02, 2024
Application Filed
Mar 04, 2026
Non-Final Rejection — §103, §112, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596301
RETICLE INSPECTION AND PURGING METHOD AND TOOL
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12555199
IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM, WITH SYNTHESIS OF TWO INFERENCE RESULTS ABOUT AN IDENTICAL FRAME AND WITH INITIALIZING OF RECURRENT INFORMATION
Granted Feb 17, 2026 (2y 5m to grant)
Patent 12513319
END-TO-END INSTANCE-SEPARABLE SEMANTIC-IMAGE JOINT CODEC SYSTEM AND METHOD
Granted Dec 30, 2025 (2y 5m to grant)
Patent 12427606
SYSTEMS AND METHODS FOR NON-DESTRUCTIVELY TESTING STATOR WELD QUALITY AND EPOXY THICKNESS
Granted Sep 30, 2025 (2y 5m to grant)
Patent 12421641
LAUNDRY TREATMENT APPLIANCE AND METHOD OF USING THE SAME ACCORDING TO MATCHED LAUNDRY LOADS
Granted Sep 23, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 81%
With Interview: 99% (+30.0%)
Median Time to Grant: 2y 8m
PTA Risk: Low
Based on 16 resolved cases by this examiner. Grant probability derived from career allow rate.
