DETAILED ACTION
Response to Amendment
Applicant’s reply filed on 23 December 2025 has been entered. No claims have been amended, canceled, or added. Hence, claims 1-20 remain pending in the application, with claims 1 and 12 being independent.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 6-12, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over CRONVALL et al. (US 20250157053 A1), referred to herein as CRONVALL, in view of MOHAMED et al. (US 20240404199 A1), referred to herein as MOHAMED.
Regarding Claim 1, CRONVALL in view of MOHAMED teaches a method comprising:
for each image in a set of images (CRONVALL [0002] The system may detect the object in each of the image frames; [0033] In general system 100 may obtain images 104, detect objects in images 104, and track the objects; [0051] motion compensator 324 (or another system, device, or module) may determine intrinsic and/or extrinsic parameters of the camera; [0052] Motion compensator 324 (or another system, device, or module) may determine positions of detected objects (e.g., in world coordinates or relative to the camera));
CRONVALL discloses parameters of the camera, but does not explicitly disclose storing a set of parameter values for a set of parameters of a first function. However, MOHAMED teaches storing a set of parameter values for a set of parameters of a first function (MOHAMED [0045] the memory 204 may be configured to store the 3D object detection model 110 and the regression model 112. The memory 204 may further store the acquired 3D data, the generated variations of the 3D data, the generated 3D object detection results; [0026] The 3D object detection model 110 may be defined by its hyper-parameters, for example, activation function(s), a number of weights, a cost function, a regularization function, an input size, a number of layers, and the like).
CRONVALL in view of MOHAMED further teaches:
training a neural network based on the set of images and the set of parameter values of each image (CRONVALL [0044] a corpus of training data may include a number of images and a number of bounding boxes associated with objects in each of the number of images… Parameters (e.g., weights) of each of CNN 304, transformer encoder 308, and transformer decoder 314, and/or queries 312 may be adjusted based on the difference. The parameters may be adjusted such that in future iterations of the training procedure (e.g., based on further images of the corpus of training data) the output embeddings result in bounding boxes that are more similar to the provided bounding boxes);
after training the neural network, inputting an image into the neural network (CRONVALL [0043] CNN 304 may be trained to generate features (e.g., features 306) based on images (e.g., image 302); [0111] The first layer of the CNN 700 can be the convolutional hidden layer 704. The convolutional hidden layer 704 can analyze image data of the input layer 702. Each node of the convolutional hidden layer 704 is connected to a region of nodes (pixels) of the input image called a receptive field);
based on inputting the image into the neural network, generating an output (CRONVALL [0121] the output from the output layer 710 can include an M-dimensional vector (in the prior example, M=10). M indicates the number of classes that the CNN 700 has to choose from when classifying the object in the image. Other example outputs can also be provided. Each number in the M-dimensional vector can represent the probability the object is of a certain class) that comprises a set of output parameter values of a particular object depicted in the image (MOHAMED [0029] one or more parameters of each node of the 3D object detection model 110 may be updated based on whether an output of the final layer for a given input (from the training dataset) matches a correct result based on a loss function for the 3D object detection model 110);
wherein the method is performed by one or more computing devices (CRONVALL [0122] FIG. 8 illustrates an example computing-device architecture 800 of an example computing device which can implement the various techniques described herein).
MOHAMED discloses an electronic apparatus and method for visualization of AI-generated predictions from 3D data. MOHAMED is analogous art to the present application.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified CRONVALL to incorporate the teachings of MOHAMED, and to apply MOHAMED's 3D object detection model, which corresponds to a mathematical function with a set of parameters, to CRONVALL's end-to-end CNN training procedure for tracking objects.
Doing so would provide a method for explaining 3D object detection model predictions that is both understandable to non-experts and provides a comprehensive understanding of the model's behavior.
Regarding Claim 2, CRONVALL in view of MOHAMED teaches the method of Claim 1, and further teaches wherein the object is a track boundary (CRONVALL [0023] The bounding boxes may be indicative of image coordinates associated with objects in the given image frame; MOHAMED [0053] bounding box coordinates of the 3D data 302A which may define the boundaries of each data block of the set of data blocks 304A).
Regarding Claim 6, CRONVALL in view of MOHAMED teaches the method of Claim 1, and further teaches the method further comprising:
based on the set of parameter values, determining a lateral position of a moving object that is associated with the image (MOHAMED [0081] The 3D object 508 may be at a first position in a 3D coordinate space (in the 3D environment) in the first 3D frame 504A, at a second position in the 3D coordinate space in the second 3D frame 504B, and at a third position in the 3D coordinate space in the third 3D frame 504C; [0082] Each 3D object detection result of the set of 3D object detection results 506A-506C may include a 3D bounding box for the 3D object 508 if the 3D object 508 is detected, by the 3D object detection model 110, in a corresponding input 3D frame of the set of input 3D frames 504A-504C. The first result 506A may include a first 3D bounding box 510A and the third result 506C may include a second 3D bounding box 510B).
Regarding Claim 7, CRONVALL in view of MOHAMED teaches the method of Claim 6, and further teaches wherein determining the lateral position of the moving object comprises:
determining a position of the particular object based on the set of output parameter values (CRONVALL [0052] Motion compensator 324 (or another system, device, or module) may determine positions of detected objects (e.g., in world coordinates or relative to the camera); [0054] Regardless of the source of the predicted positions of the objects, motion compensator 324 may modify output embeddings 316 based on the predicted positions of the objects);
generating a difference between a current position of the moving object and the position of the particular object (MOHAMED [0035] Each 3D point cloud frame of the set of 3D point cloud frames may be generated based on a set of images or depth map(s) that may be captured by one or more sensors (such as an image sensor or a depth sensor) in a static or mobile state. In these or other embodiments, the position of the image sensors may remain same or may vary at different time-instants; [0081] Since the 3D environment is dynamic, positions of the 3D object 508 may be different in each input 3D frame. The 3D object 508 may be at a first position in a 3D coordinate space (in the 3D environment) in the first 3D frame 504A, at a second position in the 3D coordinate space in the second 3D frame 504B, and at a third position in the 3D coordinate space in the third 3D frame 504C).
Regarding Claim 8, CRONVALL in view of MOHAMED teaches the method of Claim 1, and further teaches wherein training the neural network comprises minimizing a cost function that is based on a distance between a predicted position of the object in said each image and an actual position of the object in said image (MOHAMED [0032] The regression model 112 may be defined by its hyper-parameters, for example, a number of weights, a cost function, an input size, a number of layers, and the like. The parameters of the regression model 112 may be tuned and weights (i.e., the weight values) may be updated so as to move towards a global minima of a cost function for the regression model 112).
Regarding Claim 9, CRONVALL in view of MOHAMED teaches the method of Claim 1, and further teaches wherein the neural network comprises an embedding layer, a set of convolution layers, and a set of fully connected layers (CRONVALL [0109] Neural network 600 can include any suitable deep network. One example includes a convolutional neural network (CNN), which includes an input layer and an output layer, with multiple hidden layers between the input and out layers).
Regarding Claim 10, CRONVALL in view of MOHAMED teaches the method of Claim 1, and further teaches the method further comprising, prior to training the neural network:
for each image in the set of images:
generating a set of points that describe the object in said each image (MOHAMED [0034] In operation, the electronic apparatus 102 may be configured to receive a user input from the user 114. The user input may trigger an acquisition of an input 3D data frame 116 (i.e., 3D data). The 3D data may be, for example, a 3D point cloud; [0035] the 3D data may correspond to a dynamic environment in which the 3D object 118 may be in a mobile state. The received user input may include a set of 3D point cloud frames, each of which may include the 3D object 118 (or multiple 3D objects)),
generating the set of parameter values of the first function by applying a parametric fitting to the set of points (MOHAMED [0026] The 3D object detection model 110 may be a machine learning model, which may be trained on an object detection task to detect 3D objects in 3D data (e.g., a 3D point cloud frame or a point cloud sequence). The 3D object detection model 110 may be defined by its hyper-parameters, for example, activation function(s), a number of weights, a cost function, a regularization function, an input size, a number of layers, and the like).
Regarding Claim 11, CRONVALL in view of MOHAMED teaches the method of Claim 10, and further teaches the method further comprising:
defining a size of a bounding box (CRONVALL [0045] Each of queries 312 may focus on a certain aspect of the object type of interest, such as, different appearances or sizes. Each of queries 312 may be, or may include, a vector which is fixed across frames… Each of output embeddings 316 may be, or may include, an embedding vector than may be decoded (e.g., by bounding-box decoder 320) into object coordinates and scores);
for each image in the set of images:
projecting the bounding box onto a moving object that is associated with said each image (CRONVALL [0035] Image 202 is overlaid with a bounding box 204. Bounding box 204 is an example of image coordinates corresponding to person 206 in image 202);
wherein generating the set of points is based on a coordinate space that is defined by the bounding box (MOHAMED [0061] The object detection result may include a bounding box (for example, the 3D bounding box 120) for the 3D object 302B. The circuitry 202 may determine, from a plurality of activation nodes of the 3D object detection model 110, an activation node that may be responsible for the object detection result. Thereafter, a node anchor plugin corresponding to the activation node may be selected as a reference point).
Regarding Claims 12 and 15-20, CRONVALL in view of MOHAMED teaches one or more non-transitory storage media storing instructions. The metes and bounds of the limitations of the claims substantially correspond to the elements set forth in claims 1 and 6-11; thus they are rejected on similar grounds and rationale as their corresponding limitations.
Claims 3-5, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over CRONVALL et al. (US 20250157053 A1), referred to herein as CRONVALL, in view of MOHAMED et al. (US 20240404199 A1), referred to herein as MOHAMED, and further in view of Kim et al. (US 20240013564 A1), referred to herein as Kim.
Regarding Claim 3, CRONVALL in view of MOHAMED teaches the method of Claim 2, and further teaches wherein the track boundary is a first track boundary (CRONVALL [0023] The bounding boxes may be indicative of image coordinates associated with objects in the given image frame; MOHAMED [0053] bounding box coordinates of the 3D data 302A which may define the boundaries of each data block of the set of data blocks 304A). However, the cited prior art in view of Kim further teaches the method further comprising:
storing, for each image in the set of images, a second set of parameter values for a second set of parameters of a second function that describes a second track boundary in said each image (Kim [0052] Block 206 may comprise a determination of parameters that at least in part define first and second mappings defined and/or executed in blocks 204 and 206, respectively. In a particular example implementation, block 206 may comprise determining parameters for neural networks that implement processing paths 120 and 122; [0080] define bounding boxes over different objects detected. Cascade mask R-CNN 630 may also configure one or more detectors to solve a classification problem to, for example, classify one or more objects detected to be within defined bounding boxes. In a particular implementation, parameters of detectors of cascade mask R-CNN 630 may be configured from one or more neural network layers that may be determined and/or tuned using training operations);
wherein training the neural network is also based on the second set of parameter values of each image (Kim [0025] applying a supervised operation to further train parameters of the encoder and the decoder trained in the self-supervised operation based, at least in, in part, on a second loss function based, at least in part, on a computed loss associated with detection of objects; [0110] a second training operation at block 754 may further comprise determining parameters of an extractor (e.g., an instance of extractor 604) to map the content signal to the samples of the content signal. Such the parameters of the extractor comprise parameters of one or more second neural networks. In a particular implementation, the input tensor of the one or more first neural networks may be further populated with intermediate states of the one or more second neural networks).
Kim discloses machine-learning devices that implement one or more encoding and/or decoding techniques. Kim is analogous art to the present application.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified CRONVALL to incorporate the teachings of Kim, and to apply Kim's second views, encoders, and decoders of a content signal to CRONVALL's end-to-end CNN training procedure for tracking objects.
Doing so would provide self-supervised learning (SSL) that yields visual representations whose accuracy approaches that of visual representations obtained from fully supervised learning on large downstream computer vision tasks.
Regarding Claim 4, CRONVALL in view of MOHAMED teaches the method of Claim 1, and further teaches wherein the first function is for a first dimension in a multi-dimensional space (CRONVALL [0034] object detections 108 may include image coordinates defining bounding boxes which may define pixels of images 104 that represent the objects… Object tracker 110 determine relative positions 112 of objects (e.g., relative to camera 102) based on object detections 108, for example, using three-dimensional geometry). However, the cited prior art in view of Kim further teaches the method further comprising:
for each image in the set of images, storing a second set of parameter values for a second set of parameters of a second function that describes the object in said each image (Kim [0052] Block 206 may comprise a determination of parameters that at least in part define first and second mappings defined and/or executed in blocks 204 and 206, respectively. In a particular example implementation, block 206 may comprise determining parameters for neural networks that implement processing paths 120 and 122; [0080] define bounding boxes over different objects detected. Cascade mask R-CNN 630 may also configure one or more detectors to solve a classification problem to, for example, classify one or more objects detected to be within defined bounding boxes. In a particular implementation, parameters of detectors of cascade mask R-CNN 630 may be configured from one or more neural network layers that may be determined and/or tuned using training operations);
wherein the second function is for a second dimension, in the multi-dimensional space, that is different than the first dimension (MOHAMED [0059] Each 3D object detection result of the generated set of 3D object detection results 310A . . . 310N may include a 3D bounding box for the 3D object 302B. For example, the object detection result 310A may include coordinates of a first 3D bounding box on the set of data blocks 304A; [0084] The circuitry 202 may be further configured to predict a position of a 3D bounding box 512A based on a position of the first 3D bounding box 510A in the first 3D frame 504A and a position of the second 3D bounding box 510B in the third 3D frame 504C).
The same motivation set forth for claim 3 applies here.
Regarding Claim 5, CRONVALL in view of MOHAMED teaches the method of Claim 4, and further teaches wherein the first function is a first polynomial function and the second function is a second polynomial function (CRONVALL [0060] In the general case (e.g., when B is not a linear function), δ can still be determined. But a more expensive optimization method may need to be used, for example, gradient descent).
Regarding Claims 13 and 14, CRONVALL in view of MOHAMED teaches the one or more storage media of Claim 12. The metes and bounds of the limitations of the claims substantially correspond to the elements set forth in claims 3 and 4; thus they are rejected on similar grounds and rationale as their corresponding limitations.
Response to Arguments
Applicant's arguments filed on 23 December 2025 with respect to the rejections under 35 U.S.C. 103 have been fully considered, but they are not persuasive.
On page 3 of Applicant's Remarks, with respect to claim 1, Applicant stated “The Office Action concedes that Cronvall fails to disclose the bolded storing portion of Claim 1.” Examiner reasserts this statement. However, although Cronvall does not explicitly teach the storing portion, the object features extracted by CNN 304 teach parameters that describe an object (see FIG. 3 and [0041]). The adjustable parameters of Cronvall, generated through iterations over the training data (see [0044]), also teach parameters that describe an object. Further, the 3D data used for 3D object detection in MOHAMED likewise teaches parameters that describe an object, as claimed. The cited 3D object detection model, defined by its hyper-parameters, corresponds to the first function and is trained to detect an object from images. Regarding the first argument, it is respectfully noted that CRONVALL in view of MOHAMED teaches storing a set of parameter values for a set of parameters of a first function that describes an object in said each image, as claimed.
On page 5 of Applicant's Remarks, with respect to claim 1, Applicant argued “In neither Cronvall nor Mohamed is there a set of parameter values (of an object depicted in an image) that a model outputs.” Examiner respectfully disagrees with this second argument. Element 508 of FIG. 5 of Cronvall discloses “Detect the object in the second image based on the modified output embedding,” and paragraph [0044] further discloses “The parameters may be adjusted such that in future iterations of the training procedure (e.g., based on further images of the corpus of training data) the output embeddings result in bounding boxes that are more similar to the provided bounding boxes.” Therefore, the output data of Cronvall corresponds to the output parameter values of an object, as claimed. The one or more parameters of each node of MOHAMED's 3D object detection model are derived from input 3D training data and serve as input/output data of the layers of functions within the 3D object detection model. Regarding the second argument, it is respectfully noted that CRONVALL in view of MOHAMED teaches generating an output that comprises a set of output parameter values of a particular object depicted in the image, as claimed.
On page 5 of Applicant's Remarks, with respect to claim 2, Applicant argued “bounding boxes which are not depicted in the recited image.” Examiner respectfully disagrees with this third argument. Paragraph [0034] of Cronvall discloses “bounding box 204 may define pixels in image 202 that represent person 206”; therefore, the bounding box represents/describes the object in the image, but need not necessarily be visible. Regarding the third argument, it is respectfully noted that CRONVALL in view of MOHAMED teaches wherein the object is a track boundary, as claimed.
On page 6 of Applicant's Remarks, with respect to claim 4, Applicant argued “those paragraphs refer to coordinates and positions, not to functions.” Examiner respectfully disagrees with this fourth argument. Paragraph [0034] of Cronvall discloses “Object tracker 110 determine relative positions 112 of objects (e.g., relative to camera 102) based on object detections 108, for example, using three-dimensional geometry”; therefore, the object tracker of Cronvall and the 3D object detection model of MOHAMED both perform the function of tracking an object’s 3D position, which teaches a first dimension in a multi-dimensional space, as claimed. Regarding the fourth argument, it is respectfully noted that CRONVALL teaches wherein the first function is for a first dimension in a multi-dimensional space, as claimed.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Samantha (Yuehan) Wang whose telephone number is (571)270-5011. The examiner can normally be reached Monday-Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon, can be reached on (571)272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Samantha (YUEHAN) WANG/
Primary Examiner
Art Unit 2617