DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1-3 and 5-16 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Sarkis et al (US 20160133022 A1) in view of Iter et al "Target Tracking with Kalman Filtering, KNN and LSTMs".
Referring to claim 1:
Sarkis et al disclose an electronic device (102), comprising:
a processor (par. 34) to:
provide a first set of images to an object tracker to output a bounding shape that represents an object in the first set of images (par. 37: “in initial frames . . . the object detection module 106 may determine a bounding box. The bounding box may be utilized to measure the one or more landmark positions.”; and par. 29: “A landmark is a location or point on an object or shape [and] a set of landmarks may be defined for a particular object [i.e., define an object shape].”); and
estimate a size and location of the bounding shape in a second set of images in response to the object tracker losing track of the object in the second set of images (par. 36-37: “the object detection module 106 may determine a location (e.g., a center point) and size of a bounding box that indicates the position and size of an object” and “in frames where tracking is lost, the object detection module 106 may determine a bounding box . . . [which] may be utilized to measure the one or more landmark positions”).
While Sarkis et al did not consider estimating the location of the bounding shape using an average location of the bounding shape in the first set of images, this is taught by Iter et al in a case where the object being tracked follows common patterns (sec. 4.3: “it is reasonable to use K-Nearest Neighbors as a baseline for predicting target motion. Specifically, we find the K most similar patterns that we’ve seen in our test set and use their weighted average to predict the new location of the target in the next frame.”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Sarkis et al in view of Iter et al, when the object being tracked follows common patterns, to estimate the location of the bounding shape using an average location of the bounding shape in the first set of images, thereby providing a simple and expedient way to predict the location of an object, e.g., after losing track of the object, and continue tracking the object.
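For illustration only, the averaging rationale relied on above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Sarkis et al or Iter et al: the `Box` type, the `estimate_location` helper, and the optional weights are hypothetical names, and the average is taken directly over the bounding shape centers from the first set of images.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding shape: center (cx, cy) plus width and height."""
    cx: float
    cy: float
    w: float
    h: float


def estimate_location(history: Sequence[Box],
                      weights: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Return the (optionally weighted) average center of the boxes in `history`."""
    if not history:
        raise ValueError("history must contain at least one box")
    if weights is None:
        weights = [1.0] * len(history)  # plain average when no weights are given
    if len(weights) != len(history):
        raise ValueError("weights and history must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    cx = sum(wt * b.cx for wt, b in zip(weights, history)) / total
    cy = sum(wt * b.cy for wt, b in zip(weights, history)) / total
    return cx, cy


# Boxes output by the tracker in the first set of images (made-up values).
first_set = [Box(10.0, 20.0, 4.0, 8.0), Box(12.0, 22.0, 4.0, 8.0), Box(14.0, 24.0, 4.0, 8.0)]
plain = estimate_location(first_set)                             # unweighted average
recent = estimate_location(first_set, weights=[1.0, 1.0, 2.0])   # favors the latest frame
```

The weighted variant loosely mirrors the "weighted average" language quoted from Iter et al; how the weights would be chosen is not specified in this action and is left as an assumption.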
Referring to claim 2:
As indicated above, Sarkis et al disclose the bounding shape output by the object tracker comprises size information and location information for the bounding shape.
Referring to claim 7:
Sarkis et al, as modified in view of Iter et al, disclose the processor suspending bounding shape estimation in response to the object tracker tracking the object in a third set of images (par. 36-37: in frames where tracking is not lost, the need for the object detection module 106 to determine a location and size of a bounding box that indicates the position and size of an object is suspended).
Claims 1-3 and 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al (US 20180286199 A1) in view of Iter et al "Target Tracking with Kalman Filtering, KNN and LSTMs".
Referring to claim 1:
Chen et al disclose an electronic device 100, comprising:
a processor (par. 10, 66) to:
provide a first set of images to an object tracker to output a bounding shape that represents an object in the first set of images (abstract, par. 72: “video analytics system 100 receives video frames 102 from a video source 130 . . . [and] video frames 102 can be part of one or more video sequences”; par. 75: “blob detection system 104 can detect one or more blobs in video frames of a video sequence, and the object tracking system 106 can track the one or more blobs [objects] across the frames of the video sequence” . . . “bounding region of a blob or tracker can include a bounding box, a bounding circle, a bounding ellipse, or any other suitably-shaped region representing a tracker or blob”; and par. 80-81: “video analytics system 100 can perform blob generation and detection for each frame or picture of a video sequence”); and
estimate a size and location of the bounding shape in a second set of images in response to the object tracker losing track of the object in the second set of images (par. 110: “there may be trackers that are temporarily lost (e.g., when a blob the tracker was tracking is no longer detected), in which case the locations of such trackers also need to be predicted (e.g., by a Kalman filter)” . . . “Prediction of the bounding box location helps not only to maintain certain level of tracking for lost and/or merged bounding boxes, but also to give more accurate estimation of the initial position of the trackers so that the association of the bounding boxes and trackers can be made more precise.”, wherein the size of the predicted bounding boxes would then need to be estimated to conform or adapt to the size of blobs [objects] being tracked).
While Chen et al did not consider estimating the location of the bounding shape using an average location of the bounding shape in the first set of images, this is taught by Iter et al in a case where the object being tracked follows common patterns (sec. 4.3: “it is reasonable to use K-Nearest Neighbors as a baseline for predicting target motion. Specifically, we find the K most similar patterns that we’ve seen in our test set and use their weighted average to predict the new location of the target in the next frame.”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen et al in view of Iter et al, when the object being tracked follows common patterns, to estimate the location of the bounding shape using an average location of the bounding shape in the first set of images, thereby providing a simple and expedient way to predict the location of an object, e.g., after losing track of the object, and continue tracking the object.
Referring to claim 2:
As indicated above, Chen et al disclose the bounding shape output by the object tracker comprises size information and location information for the bounding shape.
Referring to claim 3:
Chen et al, as modified in view of Iter et al, disclose the processor estimating the location of the bounding shape comprises the processor determining that the object is static in the first set of images and using location information for the bounding shape output by the object tracker in the first set of images to determine the location of the bounding shape in the second set of images (par. 77: “prediction of the location of the blob tracker in the current frame can be based on the location of the blob in the previous frame [and] a history or motion model can be maintained for a blob tracker, including a history of various states, a history of the velocity, and a history of location, of continuous frames”; and par. 110-111: “tracker's location can be further used to update the tracker's motion model and predict its location in the next frame”).
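For illustration only, the static-object case discussed above can be sketched as follows. This is a minimal sketch under stated assumptions: `is_static`, `static_estimate`, and the pixel tolerance `tol` are hypothetical and do not come from Chen et al, which describes a location history and motion model rather than this specific test.

```python
import math
from typing import Optional, Sequence, Tuple

Center = Tuple[float, float]


def is_static(centers: Sequence[Center], tol: float = 1.0) -> bool:
    """Treat the object as static if no center strays more than `tol` from the first."""
    if not centers:
        raise ValueError("centers must contain at least one point")
    x0, y0 = centers[0]
    return all(math.hypot(x - x0, y - y0) <= tol for x, y in centers)


def static_estimate(centers: Sequence[Center], tol: float = 1.0) -> Optional[Center]:
    """Reuse the last tracked location for a static object; None if it was moving."""
    return centers[-1] if is_static(centers, tol) else None


# Made-up centers from the first set of images.
parked = static_estimate([(10.0, 10.0), (10.2, 9.9), (10.1, 10.1)])
moving = static_estimate([(0.0, 0.0), (5.0, 0.0)])
```

A `None` result signals that a different estimator, such as a motion filter, would be needed for the moving case.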
Referring to claim 5:
Chen et al, as modified in view of Iter et al, disclose the processor estimating the location of the bounding shape by determining that the object is moving in the first set of images, and applying a filter to predict the location of the bounding shape in the second set of images based on location information for the bounding shape output by the object tracker in the first set of images (par. 110-111: “the location of a blob tracker in a current frame may be predicted based on information from a previous frame [and] one method for performing a tracker location update is using a Kalman filter” and “a blob tracker can employ a Kalman filter to measure its trajectory as well as predict its future location(s) . . . the Kalman filter relies on the measurement of the associated blob(s) to correct the motion model for the blob tracker and to predict the location of the object tracker in the next frame”).
Referring to claim 6:
Chen et al, as modified in view of Iter et al, disclose the processor applying a filter to predict the size and location of the bounding shape in the second set of images based on size information and location information for the bounding shape output by the object tracker in the first set of images (par. 110-111: “the location of a blob tracker in a current frame may be predicted based on information from a previous frame [and] one method for performing a tracker location update is using a Kalman filter”, wherein the size of the predicted bounding boxes would then need to be estimated to conform or adapt to the size of blobs [objects] being tracked).
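For illustration only, filter-based prediction of both size and location can be sketched as follows. This is a minimal sketch under stated assumptions and is not the filter of Chen et al: it runs four independent constant-velocity Kalman filters (one each for center x, center y, width, and height), and the class names and the noise parameters `q`, `r`, and `p0` are hypothetical.

```python
from typing import List, Sequence, Tuple


class Kalman1D:
    """Constant-velocity Kalman filter for one scalar: state is [value, rate]."""

    def __init__(self, q: float = 0.01, r: float = 1.0, p0: float = 1000.0):
        self.x: List[float] = [0.0, 0.0]       # state estimate
        self.P = [[p0, 0.0], [0.0, p0]]        # state covariance (large = uncertain)
        self.q = q                             # assumed process noise (diagonal)
        self.r = r                             # assumed measurement noise variance

    def predict(self) -> float:
        """Advance one frame (dt = 1) and return the predicted value."""
        (a, b), (c, d) = self.P
        self.x = [self.x[0] + self.x[1], self.x[1]]
        # P = F P F^T + Q with F = [[1, 1], [0, 1]]
        self.P = [[a + b + c + d + self.q, b + d],
                  [c + d, d + self.q]]
        return self.x[0]

    def update(self, z: float) -> None:
        """Correct the state with a measured value z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r                         # innovation variance
        k0, k1 = a / s, c / s                  # Kalman gain
        y = z - self.x[0]                      # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


class BoxFilter:
    """Four independent filters over (cx, cy, w, h); a simplifying assumption."""

    def __init__(self) -> None:
        self.filters = [Kalman1D() for _ in range(4)]
        self.started = False

    def observe(self, box: Sequence[float]) -> None:
        """Feed one box output by the tracker while tracking is intact."""
        for f, z in zip(self.filters, box):
            if self.started:
                f.predict()
            f.update(z)
        self.started = True

    def predict(self) -> Tuple[float, ...]:
        """Estimate the next box (size and location) once tracking is lost."""
        return tuple(f.predict() for f in self.filters)


# Made-up first set of images: the box drifts right and down at constant speed.
box_filter = BoxFilter()
for k in range(10):
    box_filter.observe((5.0 + 2.0 * k, 10.0 + 1.0 * k, 4.0, 8.0))
predicted = box_filter.predict()  # should land near (25, 20, 4, 8)
```

Filtering width and height alongside the center is one way to read the statement that the predicted box size must also be estimated; a joint state with cross-covariances would be an equally plausible design.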
Referring to claim 7:
Chen et al, as modified in view of Iter et al, disclose the processor suspending bounding shape estimation in response to the object tracker tracking the object in a third set of images (par. 110: for trackers that are not temporarily lost (e.g., when a blob the tracker is tracking continues to be detected), predicting or estimating the tracker / bounding box location, and the size of the predicted bounding box needed to conform or adapt to the size of the blob [object] being tracked, is not required, and hence is suspended).
Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al in view of Iter et al and Tang et al (“Detection and Tracking of Occluded People”).
Referring to claim 9:
Chen et al disclose an electronic device 100, comprising:
a processor (par. 10, 66) to:
generate, via an object tracker, a first bounding shape that represents a first person and a second bounding shape that represents a second person in a first set of images (abstract; par. 75: “blob detection system 104 can detect one or more blobs in video frames of a video sequence, and the object tracking system 106 can track the one or more blobs across the frames of the video sequence” . . . “bounding region of a blob or tracker can include a bounding box, a bounding circle, a bounding ellipse, or any other suitably-shaped region representing a tracker or blob”; par. 80-81: “video analytics system 100 can perform blob generation and detection for each frame or picture of a video sequence”; and par. 315: “[first] person tracked with shape-adapted bounding box 2602a and [second] person tracked with shape-adapted bounding box 2604a”; par. 316: “[first] person tracked with shape-adapted bounding box 2702a and [second] person tracked with shape-adapted bounding box 2704a”; par. 319: “[first] person tracked with shape-adapted bounding box 2902a and [second] person tracked with shape-adapted bounding box 2904a”; or par. 320: “[first] person tracked with shape-adapted bounding box 3002a and [second] person tracked with shape-adapted bounding box 3004a”);
lose track, via the object tracker, of the first person in a second set of images (par. 110: “there may be trackers that are temporarily lost (e.g., when a blob the tracker was tracking is no longer detected)”);
estimate, via a bounding shape estimator, a size and location of the first bounding shape in the second set of images in response to losing track of the first person in the second set of images (par. 110-111: “the locations of such [lost] trackers also need to be predicted (e.g., by a Kalman filter)” . . . “Prediction of the bounding box location helps not only to maintain certain level of tracking for lost and/or merged bounding boxes, but also to give more accurate estimation of the initial position of the trackers so that the association of the bounding boxes and trackers can be made more precise.”, wherein the size of the predicted bounding boxes would then need to be estimated to conform or adapt to the blobs representing the people – see end of par. 315, 316, 319, 320); and
suspend the filter of the bounding shape estimator in response to the first person moving into view and no longer lost (this is implied if not inherent as there is no longer a reason to estimate a size and location of the first bounding shape in the second set of images when tracking of the first person is no longer lost).
While Chen et al did not explicitly disclose determining tracked persons that have become obscured (i.e., a first person becoming obscured by a second person), Tang et al teach using the advanced tracker of Pirsiavash et al (2011), which incorporates occlusion handling (i.e., determining tracked persons that have become obscured) by allowing tracks that skip several consecutive frames with low detection likelihood (under sec. 5: Multi-person Tracking, see p. 67, col. 2, lines 8-11).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen et al in view of Tang et al to incorporate in the object tracker occlusion handling such as that of the advanced tracker in Pirsiavash et al, in order to improve processing speed and efficiency while also helping to maintain tracking accuracy. In the modification of Chen et al in view of Tang et al, the implied suspension of the filter of the bounding shape estimator is in response to the first person moving into view from behind the second person.
Referring to claim 10:
Chen et al, as modified in view of Iter et al, disclose the processor estimating the size and location of the first bounding shape in response to determining that the first person is obscured by the second person (par. 315, 316, 319, 320: “[when] two people are close together, the initial bounding boxes (that are not shape adjusted) associated with the merged blob containing the people are largely overlapping with one another” and “shape adaptation [bounding shape estimation] is performed when splitting the merged objects, [so then] the two people are separately tracked with [a respective] bounding box”).
Referring to claim 11:
Chen et al, as modified in view of Iter et al, disclose the processor suspending bounding shape estimation in response to the processor determining that an estimated location of the first person is outside of a location of the second person (par. 160-161: “[if it is determined] that the blob should not be split, the whole blob splitting process can be terminated for the current blob”, i.e., depending on the area and overlap conditions of the bounding boxes, “the splitting process for the blob [can be ended]” and therefore the shape adaption [bounding shape estimation] for the split merged object is suspended; and par. 215: “since the distance between the bounding boxes of the objects merged into the merged blob can become larger . . . , the merge association optimization process may become irrelevant since the objects may no longer be merged into a merged blob during blob detection”).
Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sarkis et al or Chen et al in view of Iter et al as applied to claim 1 above, and further in view of well-known prior art (MPEP 2144.03).
Referring to claims 8 and 16:
While neither Sarkis et al nor Chen et al, as modified in view of Iter et al, disclose the processor, upon the size and location of the bounding shape being estimated, starting a timeout period, and suspending the estimation of the size and location of the bounding shape in the second set of images in response to an expiration of the timeout period, the feature of suspending a computational process after a timeout period is old and well known in the prior art in order to manage processing resources and provide smoother operations. Therefore, for at least those reasons, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have included in Sarkis et al or Chen et al, as modified in view of Iter et al, in view of the well-known prior art, such a function of suspending bounding shape estimation in response to expiration of a timeout period.
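For illustration only, suspending estimation upon expiration of a timeout period can be sketched as follows. This is a minimal sketch under stated assumptions: the `TimedEstimation` class, its method names, and the two-second timeout are hypothetical and are not drawn from any cited reference; the clock is injectable so that the behavior can be checked deterministically.

```python
import time
from typing import Callable, Optional


class TimedEstimation:
    """Hypothetical gate that permits estimation only until a timeout expires."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.started_at: Optional[float] = None  # None means estimation is suspended

    def start(self) -> None:
        """Start the timeout period when estimation begins (tracking lost)."""
        if self.started_at is None:
            self.started_at = self.clock()

    def suspend(self) -> None:
        """Suspend estimation (timeout expired or tracking resumed)."""
        self.started_at = None

    def active(self) -> bool:
        """Return True while estimation may continue; suspend on expiration."""
        if self.started_at is None:
            return False
        if self.clock() - self.started_at >= self.timeout_s:
            self.suspend()
            return False
        return True


# Drive the gate with a fake clock instead of waiting in real time.
fake_now = [0.0]
gate = TimedEstimation(timeout_s=2.0, clock=lambda: fake_now[0])
gate.start()
states = []
for t in (0.0, 1.5, 2.0, 2.5):
    fake_now[0] = t
    states.append(gate.active())
```

In this sketch the estimator would be consulted only while `gate.active()` returns True, so no estimation work is done after the timeout period ends.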
Claims 12-15 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al in view of Tang et al as applied to claim 9 above, and further in view of well-known prior art (MPEP 2144.03).
Referring to claim 12:
According to what was indicated above with respect to the functions recited below, Chen et al in view of Tang et al disclose (except what is underlined below) a non-transitory tangible computer-readable medium comprising instructions that, when executed, cause a processor of an electronic device to (par. 65-66):
provide a first set of images to a machine-learning (ML) model of an object tracker to output [generate] a bounding shape that represents a first person in the first set of images and a second bounding shape that represents a second person in the first set of images;
determine that the first person is obscured by the second person;
determine that the ML model of the object tracker loses track of the first person in a second set of images;
determine that the first person [object] was in motion in the first set of images;
activate [apply] a filter of a bounding shape estimator to estimate a size and a location of the bounding shape in the second set of images in response to [based on] determining that the first person [object] was in motion [object location] in the first set of images; and
suspend the filter of the bounding shape estimator in response to the first person moving into view from behind the second person.
While Chen et al, as modified in view of Tang et al, do not disclose using a machine-learning (ML) model to output or generate the disclosed bounding shape that represents a first person in the first set of images, using ML models to generate bounding boxes for tracking people is routine because such models offer a robust and accurate way to localize and identify individuals in video or image data. Therefore, for at least those reasons, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have included instructions that, when executed, cause the processor in Chen et al, as modified in view of Tang et al, in view of the well-known prior art, to provide a first set of images to a machine-learning (ML) model to output a bounding shape that represents a first person in the first set of images and to determine that the ML model loses track of the first person in a second set of images.
Referring to claims 13-14:
Chen et al, as modified in view of Tang et al, and further in view of the well-known prior art as indicated above, would include instructions that, when executed, cause the processor to: determine that the first person is within a threshold distance from a second person when the ML model lost track of the first person; and activate the filter (a Kalman filter) to estimate the size and location of the bounding shape in response to determining that the first person is within the threshold distance from the second person (see par. 110-111 and 315-320 as indicated above in the same context).
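For illustration only, the threshold-distance condition for activating the filter can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names, the use of Euclidean distance between bounding shape centers, and the pixel threshold are hypothetical and are not taken from Chen et al or Tang et al.

```python
import math
from typing import Tuple

Center = Tuple[float, float]


def center_distance(a: Center, b: Center) -> float:
    """Euclidean distance between two bounding shape centers."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def should_activate_filter(track_lost: bool, first_person: Center,
                           second_person: Center, threshold: float) -> bool:
    """Activate the filter only if tracking was lost near the second person."""
    return track_lost and center_distance(first_person, second_person) <= threshold


# Made-up last known centers, in pixels.
near = should_activate_filter(True, (0.0, 0.0), (3.0, 4.0), threshold=5.0)
far = should_activate_filter(True, (0.0, 0.0), (30.0, 40.0), threshold=5.0)
tracked = should_activate_filter(False, (0.0, 0.0), (3.0, 4.0), threshold=5.0)
```

Proximity at the moment tracking is lost serves here as a stand-in for likely occlusion by the second person; other distance measures, such as bounding box overlap, could fill the same role.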
Referring to claim 15:
Chen et al, as modified in view of Tang et al, and further in view of the well-known prior art as indicated above, would include instructions that, when executed, cause the processor to suspend the bounding shape estimation in response to the ML model resuming tracking of the first person (see par. 160-161 as indicated above in the same context).
Cited Art
The prior art and other references made of record and not relied upon are considered pertinent to applicant's disclosure.
Takahashi (US 12131485 B2) discloses an object tracking device that stores, for each site of an object, information indicating an area image of the object in a first frame image of frame images included in a moving image, and stores a conspicuous site of the object. In response to the object being lost, for each maximum value position in a conspicuity map for a second frame image with the lost object, a recovery unit in the object tracking device (1) estimates an area with each site of the object when the conspicuous site is hypothetically at the maximum value position and (2) calculates a similarity score based on a similarity between the area image of each site in the first frame image and the area image of each site in the second frame image, and determines that the object is at a maximum value position in the second frame image with a maximum similarity score.
Sommer et al (US 12417637 B2) disclose that tracking the bounding box across subsequent images in the group of images in the continuous stream of image data can include: generating a first bounding box around the person at a first time in the first image of the group of images, generating a second bounding box around the person at a second time in a second image of the group of images, determining a change in pixel values between the first bounding box and the second bounding box, determining a velocity of the person based on the change in pixel values, the velocity indicating a change in movement and directionality of the person between the first image and the second image in the group of images, projecting a location of the person in a third image of the group of images at a third time based on the determined velocity of the person, and generating a third bounding box around the person at the third time in the third image at the projected location of the person (summary).
Sohn et al (US 20250371883 A1) disclose an image processing apparatus and a method for controlling the image processing apparatus. The image processing apparatus according to an embodiment of the present disclosure may identify an object from an acquired image, determine whether the object is hidden by another object by using an aspect ratio of a bounding box of the detected object, and based on the object being hidden, estimate an entire length of the object based on coordinate information of the bounding box. Accordingly, the size information of the hidden object may be efficiently identified while a large amount of database is applied or resources of the apparatus is minimized. The present disclosure may be in connection with a surveillance camera, an automotive driving vehicle, an artificial intelligence module of at least one of a user terminal or a server, a robot, an augmented reality (AR) device, a virtual reality (VR) device, a device related to a 5G service, and the like.
Xu et al (“Non-linear target trajectory prediction for robust visual tracking”) propose a novel occlusion awareness algorithm, which can both address the occlusion issue and the same semantic information false identification issues. In addition, a novel generative adversarial training and long short term memory (LSTM) based target trajectory prediction algorithm is proposed to predict the possible direction of the target in the following frames. The proposed trajectory prediction algorithm can deal with complicated tracking situations more robustly than the traditional algorithms, e.g. Kalman filter. To further improve the occlusion awareness ability of the proposed algorithm, an occlusion supervision based training strategy is proposed, which can improve the robustness of the occlusion awareness ability of the proposed occlusion awareness model. In addition, for accurate estimation of the target bounding box, a distance intersection over union (DIOU) loss for regression training is adopted. A comprehensive evaluation is performed on OTB2015, VOT2016, and VOT2018 to evaluate the effectiveness of the proposed algorithm. The experiment results demonstrate that the proposed algorithms perform well and can largely alleviate the tracking failure issue of the siamese network-based tracker caused by occlusion and the same semantic information target identification.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Scott Rogers whose telephone number is 571-272-7467. The examiner can normally be reached 8 am to 7 pm flex.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abderrahim Merouan, can be reached on 571-270-5254. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Scott A Rogers/
Primary Examiner, Art Unit 2683
14 January 2026