Prosecution Insights
Last updated: April 19, 2026
Application No. 18/272,849

EXTRACTING FEATURES FROM SENSOR DATA

Final Rejection §103
Filed
Jul 18, 2023
Examiner
LIU, XIAO
Art Unit
2664
Tech Center
2600 — Communications
Assignee
Five AI Limited
OA Round
2 (Final)
89%
Grant Probability
Favorable
3-4
OA Rounds
2y 9m
To Grant
99%
With Interview

Examiner Intelligence

Grants 89% — above average
89%
Career Allow Rate
257 granted / 290 resolved
+26.6% vs TC avg
+11.5%
Interview Lift
Moderate lift (~12%); based on resolved cases with interview
Typical timeline
2y 9m
Avg Prosecution
44 currently pending
Career history
334
Total Applications
across all art units

Statute-Specific Performance

§101: 8.8% (-31.2% vs TC avg)
§103: 50.9% (+10.9% vs TC avg)
§102: 17.0% (-23.0% vs TC avg)
§112: 17.4% (-22.6% vs TC avg)
Deltas measured against estimated Tech Center averages • Based on career data from 290 resolved cases

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 10/10/2025 has been considered by the examiner.

Response to Amendment

Applicant’s amendments to the claims, filed on 11/10/2025, have overcome the claim objections, the claim rejections under 35 U.S.C. 112(b), and the prior art rejections as previously set forth in the Non-Final Rejection Office Action mailed on 08/08/2025.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 10-11, 13-14, 16, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 2021/0327029 A1), hereinafter Chen, in view of Beker et al. (arXiv:2009.14524v1, 2020), hereinafter Beker, and further in view of Zakharov et al. (CVPR 2020), hereinafter Zakharov.

Regarding claim 1, Chen discloses a computer-implemented method of training ([0005]-[0007]; [0043]) an encoder to extract features (FIG. 2A, base encoder, representations h_i, h_j; FIG. 2B) from sensor data (FIGS. 11B-11C, sensor(s)), the method comprising (Abstract; FIGS. 1-11C): training a machine learning (ML) system based on a self-supervised loss function applied to a training set (FIG. 2A; [0027], “contrastive learning … contrastive self-supervised learning algorithm”; [0037], “maximizing agreement between differently augmented views of the same data example via a contrastive loss in the latent space”; [0042]), the ML system comprising the encoder (FIG. 2A, base encoder); wherein the training set comprises first data representations and corresponding second data representations (FIG. 2A, images 212 and 222), wherein the encoder extracts features from each first and second data representation (FIG. 2A, intermediate representations 214 and 224), and wherein the self-supervised loss function is configured to cause the ML system to learn feature representations such that each first data representation is more similar to its corresponding second data representation, based on their respective features extracted by the encoder, than to non-corresponding data representations (FIG. 2A, base encoder 204, maximize agreement for final representations 216 and 226; [0043], “similarity … loss function”; Equation (1); [0044]; Algorithm 1); wherein each first data representation and its corresponding second data representation represent a common set of sensor data (FIG. 2A, images 202, 212 and 222; [0038], “A stochastic data augmentation module (shown generally at 203) that transforms any given data example (e.g., an input image x shown at 202) randomly resulting in two correlated views of the same example, denoted x̃_i, x̃_j, which are shown at 212 and 222, respectively”; FIGS. 11B-11C).
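For orientation, the contrastive objective the examiner maps to Chen's Equation (1) is commonly formulated as an NT-Xent loss over pairs of augmented views. Below is a minimal PyTorch-style sketch, assuming a SimCLR-like setup; the function and variable names are illustrative and not taken from Chen.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive (NT-Xent-style) loss over a batch of paired views.

    z1, z2: (N, D) projected features of the first and second data
    representations of the same N sets of sensor data. Each row of z1
    is pulled toward the corresponding row of z2 and pushed away from
    all non-corresponding rows (illustrative sketch, not Chen's exact
    Equation (1)).
    """
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit norm
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # exclude self-pairs
    # Row i's positive is row i+N (and vice versa).
    targets = torch.arange(2 * n, device=z.device).roll(n)
    return F.cross_entropy(sim, targets)
```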
Chen does not disclose applying a two-dimensional (2D) object detector to an image other than the first and second data representations, wherein the image contains or is associated with the common set of sensor data, and transforming the common set of sensor data based on one or more objects detected in the image, the second data representation representing the transformed sensor data.

In the same field of endeavor, Beker teaches a self-supervised method for textured shape reconstruction and pose estimation of rigid objects with the help of strong shape priors and 2D instance masks (Beker: Abstract; FIGS. 1-7). Beker further teaches applying a 2D object detector to an image other than the first and second data representations, wherein the image contains or is associated with the common set of sensor data, and transforming the common set of sensor data based on one or more objects detected in the image, the second data representation representing the transformed sensor data (Beker: FIG. 1, left side, rendering; Page 2, 2nd paragraph, “Starting from a 2D detector that produces 2D boxes with associated instance masks”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Chen with the teaching of Beker by using a transformed, object-detector-based representation for self-supervised learning in order to generate training data for object detection in autonomous driving.

Chen in view of Beker does not teach the common set of sensor data comprising point cloud data or other non-image data. However, it is known that a 2D image can be obtained by projecting a three-dimensional (3D) point cloud. Zakharov is analogous art pertinent to the problem to be solved in this application and teaches a method to recover 3D shapes from pre-trained off-the-shelf 2D detectors and sparse LIDAR data (Zakharov: Abstract; FIGS. 1-9). Zakharov further teaches an image patch projected from a common set of sensor data, 3D LiDAR points (Zakharov: FIG. 1; Page 1225, 1st Col., Sec. 3.3.1, 1st paragraph, “project the 3D LIDAR points l = {l_1 … l_k} … onto the patch …”). Zakharov also teaches a first data representation and a second data representation with a common set of sensor data, 3D LiDAR points, and applying a 2D object detector to the image patch other than the first and second data representations (Zakharov: FIG. 1). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen in view of Beker with the teaching of Zakharov by using a common set of 3D LIDAR data in order to generate data for 3D model training.
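The projection step Zakharov is cited for, mapping 3D LiDAR points onto a 2D image patch, is a standard pinhole-camera operation. A minimal sketch follows, assuming the points are already in the camera frame and an intrinsic matrix K is available; all names are illustrative.

```python
import numpy as np

def project_lidar_to_image(points_cam, K):
    """Project 3D LiDAR points (already in the camera frame) onto the
    image plane with a pinhole model, as in the projection step the
    examiner cites from Zakharov (illustrative sketch only).

    points_cam: (N, 3) array of [x, y, z] points, z = depth along the optical axis.
    K: (3, 3) camera intrinsic matrix.
    Returns (M, 2) pixel coordinates and the mask of points in front of the camera.
    """
    in_front = points_cam[:, 2] > 0          # keep only points with positive depth
    pts = points_cam[in_front]
    uvw = pts @ K.T                          # homogeneous image coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]            # perspective divide
    return uv, in_front
```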
Regarding claim 18, Chen discloses a computer system comprising (Abstract; FIGS. 1-11C): at least one memory configured to store computer-readable instructions (FIG. 11A; [0086]); and at least one hardware processor coupled to the at least one memory and configured to execute the computer-readable instructions (FIG. 11A; [0086]), which upon execution cause the at least one hardware processor to perform operations including training a machine learning (ML) system based on a self-supervised loss function applied to a training set (FIG. 2A; [0027]; [0037]; [0038]; [0042]-[0044]; Equation (1); Algorithm 1; FIGS. 11B-11C), with the training-set, encoder, and self-supervised-loss limitations mapped as set forth above for claim 1. Chen does not disclose the 2D-object-detector and sensor-data-transformation limitations, nor the common set of sensor data comprising point cloud data or other non-image data; Beker and Zakharov teach those limitations, and the same combination rationale applies, as set forth above with respect to claim 1.

Regarding claim 19, Chen discloses a non-transitory medium embodying computer-readable instructions configured, when executed on one or more computer hardware processors (FIG. 11A; [0086]-[0087]), to cause the one or more hardware processors to perform operations including (Abstract; FIGS. 1-11C): training a machine learning (ML) system based on a self-supervised loss function applied to a training set (FIG. 2A, base encoder, representations h_i, h_j; FIG. 2B; [0027]; [0037]; [0042]), the ML system comprising an encoder (FIG. 2A, base encoder); wherein the training set comprises first data representations and corresponding second data representations (FIG. 2A, images 212 and 222), wherein the encoder extracts features from each first and second data representation (FIG. 2A, intermediate representations 214 and 224), and wherein the self-supervised loss function encourages the ML system to associate each first data representation with its corresponding second data representation based on their respective features (FIG. 2A, maximize agreement for final representations 216 and 226; [0043], “similarity … loss function”; Equation (1); [0044]; Algorithm 1), wherein each first data representation and its corresponding second data representation represent a common set of sensor data (FIG. 2A, images 202, 212 and 222; [0038]; FIGS. 11B-11C). Chen does not disclose the 2D-object-detector and sensor-data-transformation limitations, nor point cloud or other non-image sensor data; Beker and Zakharov teach those limitations, and the same combination rationale applies, as set forth above with respect to claim 1.
Regarding claim 10, Chen in view of Beker, and further in view of Zakharov, teaches the method of claim 1. Chen does not disclose wherein the common set of sensor data is transformed by removing or distorting background sensor data that does not belong to any detected object. Beker teaches this limitation (Beker: FIG. 1, left side, rendering; Page 2, 2nd paragraph, “Starting from a 2D detector that produces 2D boxes with associated instance masks”; Page 5, 2nd paragraph, “clear foreground/background separation”; Page 8, 2nd paragraph, “the background region is masked out”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Chen with the teaching of Beker for the same reasons set forth with respect to claim 1.
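The background-removal transform at issue in claim 10 (and refined via 2D bounding boxes in claim 11, below) can be pictured with a short sketch: keep sensor data inside the detected 2D boxes and remove or distort everything outside them. This is a generic illustration, not code from Beker; the box format and the noise option are assumptions.

```python
import numpy as np

def mask_background(image, boxes, noise=False, seed=0):
    """Zero out (or replace with random noise) pixels outside every
    detected 2D bounding box: an illustrative version of the background
    removal/distortion discussed for claim 10.

    image: (H, W, C) array; boxes: iterable of integer (x0, y0, x1, y1) pixel boxes.
    """
    keep = np.zeros(image.shape[:2], dtype=bool)
    for x0, y0, x1, y1 in boxes:
        keep[y0:y1, x0:x1] = True            # foreground = union of detection boxes
    out = image.copy()
    if noise:
        rng = np.random.default_rng(seed)
        out[~keep] = rng.integers(0, 256, size=(int((~keep).sum()), image.shape[2]))
    else:
        out[~keep] = 0                       # remove background entirely
    return out
```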
Regarding claim 11, Chen in view of Beker, and further in view of Zakharov, teaches the method of claim 10. Chen does not disclose wherein the 2D object detector computes a 2D bounding box for each detected object, wherein the background sensor data is identified as sensor data contained in or associated with a background region of the image outside of any 2D bounding box. Beker teaches this limitation (Beker: FIG. 1; Page 2, 2nd paragraph, “Starting from a 2D detector that produces 2D boxes with associated instance masks”; Page 5, 2nd paragraph, “clear foreground/background separation … first obtain a foreground mask, and then invert it.”; Page 8, 2nd paragraph, “the background region is masked out”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Chen with the teaching of Beker for the same reasons set forth with respect to claim 1.

Regarding claim 13, Chen in view of Beker, and further in view of Zakharov, teaches the method of claim 1. The combination further teaches wherein the ML system comprises a trainable projection component which projects the features from a feature space into a projection space, the self-supervised loss being defined on the projected features, wherein the trainable projection component is trained simultaneously with the encoder (Chen: FIG. 2A).
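Claim 13's “trainable projection component” corresponds, in the examiner's mapping, to the projection head in Chen's FIG. 2A, which maps encoder features into the space where the contrastive loss is computed. A minimal sketch follows, assuming a SimCLR-style two-layer MLP head; the architecture details are illustrative.

```python
import torch.nn as nn

class EncoderWithProjection(nn.Module):
    """Encoder plus trainable projection head, trained jointly: the
    self-supervised loss is computed on the projected features z, not
    on the encoder features h (illustrative of the claim 13 mapping).
    """
    def __init__(self, encoder, feat_dim=2048, proj_dim=128):
        super().__init__()
        self.encoder = encoder                       # e.g., a CNN backbone
        self.projection = nn.Sequential(             # assumed 2-layer MLP head
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.encoder(x)                          # features used downstream
        z = self.projection(h)                       # features used by the loss
        return h, z
```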
Regarding claim 14, Chen in view of Beker, and further in view of Zakharov, teaches the method of claim 1. Chen does not disclose wherein each set of sensor data captures a static or dynamic driving scene. Beker teaches this limitation (Beker: FIG. 1). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Chen with the teaching of Beker for the same reasons set forth with respect to claim 1.

Regarding claim 16, Chen in view of Beker, and further in view of Zakharov, teaches the method of claim 1. Chen does not disclose wherein the 2D object detector is a trained machine learning (ML) 2D object detector. Beker teaches wherein the 2D object detector is a trained ML 2D object detector, whereby knowledge learned in the training of the 2D ML object detector is transferred to the encoder during the training based on the self-supervised loss function (Beker: FIG. 1, left side, rendering; Page 2, 2nd paragraph, “Starting from a 2D detector that produces 2D boxes with associated instance masks”; Page 6, 4th paragraph, “a pre-trained … estimator that is trained with self-supervision”; Sec. 3.1, loss function). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Chen with the teaching of Beker for the same reasons set forth with respect to claim 1.

Claims 2-3, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Beker, and further in view of Zakharov, as applied above, and further in view of Chen et al. (2017 CVPR), hereinafter Chen1.

Regarding claims 2 and 20, Chen in view of Beker, and further in view of Zakharov, teaches the method of claim 1 and the system of claim 18. The combination does not teach wherein the common set of sensor data comprises a point cloud encoded in a depth channel of the image and thus represented in a 2D image plane of the image, wherein the first and second data representations represent the point cloud in a 2D plane other than the image plane of the image. However, the combination does teach that the first and second data representations can be transformed representations of any given data representation (Chen: [0038]); in other words, point cloud data can serve as the common set of data as well. Chen1 is analogous art pertinent to the problem to be solved in this application and teaches an object detection method for autonomous driving using light detection and ranging (LIDAR) point clouds and color images (Chen1: Abstract; FIGS. 1-6). Chen1 further teaches representing the point cloud in a 2D plane other than the image plane of the image (Chen1: Page 1909, 1st Col., Sec. 3.1, 2nd paragraph, “discretize the projected point cloud into a 2D grid with resolution of 0.1m”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen in view of Beker and Zakharov with the teaching of Chen1 by using a discretized point cloud representation in a 2D grid in order to perform self-supervised training and generate training data for object detection in autonomous driving.

Regarding claim 3, Chen in view of Beker, and further in view of Zakharov, in view of Chen1, teaches the method of claim 2. Chen in view of Beker and Zakharov does not teach discretized image representations of the point cloud in the 2D plane that optionally include respective height channels. However, the combination does teach that the first and second data representations can be transformed representations of any given data representation (Chen: [0038]).
Chen1 further teaches discretized image representations of the point cloud in the 2D plane that optionally include respective height channels (Chen1: Page 1909, 1st Col., Sec. 3.1, 2nd paragraph, “discretize the projected point cloud into a 2D grid with resolution of 0.1m … For each cell, the height feature is computed as the maximum height of the points in the cell. To encode more detailed height information, the point cloud is divided equally into M slices. A height map is computed for each slice, thus we obtain M height maps”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen in view of Beker and Zakharov with the teaching of Chen1 by using a discretized point cloud representation in a 2D grid, for the reasons set forth with respect to claim 2.

Regarding claim 15, Chen in view of Beker, and further in view of Zakharov, teaches the method of claim 1. The combination does not teach wherein the common set of sensor data comprises three-dimensional (3D) spatial data, or 2D spatial data in a 2D plane other than an image plane of the image. However, it does teach that the first and second data representations can be transformed representations of any given data representation (Chen: [0038]); in other words, point cloud data can serve as the common set of data as well. Chen1 teaches wherein the common set of sensor data comprises 3D spatial data, or 2D spatial data in a 2D plane other than an image plane of the image (Chen1: Page 1909, 1st Col., Sec. 3.1, 2nd paragraph, “discretize the projected point cloud into a 2D grid with resolution of 0.1m”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen in view of Beker and Zakharov with the teaching of Chen1, for the reasons set forth with respect to claim 2.
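The Chen1 passage quoted above, discretizing a projected point cloud into a 2D grid and encoding per-cell height features over M slices, translates directly into code. A minimal bird's-eye-view sketch follows; the 0.1 m resolution and per-slice max-height maps track the quotation, while the coordinate ranges are assumptions.

```python
import numpy as np

def bev_height_maps(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                    z_range=(-2.0, 1.0), resolution=0.1, num_slices=4):
    """Discretize a LiDAR point cloud into a 2D bird's-eye-view grid and
    compute a max-height map per vertical slice, in the spirit of the
    Chen1 passage quoted above (illustrative sketch; ranges assumed).

    points: (N, 3) array of [x, y, z] LiDAR points.
    Returns (num_slices, H, W) height maps.
    """
    h = int((x_range[1] - x_range[0]) / resolution)
    w = int((y_range[1] - y_range[0]) / resolution)
    maps = np.zeros((num_slices, h, w), dtype=np.float32)
    slice_h = (z_range[1] - z_range[0]) / num_slices
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]
                and z_range[0] <= z < z_range[1]):
            continue                                      # drop out-of-range points
        i = int((x - x_range[0]) / resolution)            # grid row
        j = int((y - y_range[0]) / resolution)            # grid column
        s = int((z - z_range[0]) / slice_h)               # vertical slice index
        maps[s, i, j] = max(maps[s, i, j], z - z_range[0])  # per-cell max height
    return maps
```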
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Beker, and further in view of Zakharov, as applied above, and further in view of Bongio Karrman et al. (US 2021/0096241 A1), hereinafter Bongio Karrman.

Regarding claim 5, Chen in view of Beker, and further in view of Zakharov, teaches the method of claim 1. The combination does not teach wherein the common set of sensor data comprises a point cloud encoded in a depth channel of the image and thus represented in a 2D image plane of the image, wherein the first and second data representations represent the point cloud in three-dimensional (3D) space. However, Chen in view of Beker does teach that the first and second data representations can be transformed representations of any given data representation (Chen: [0038]); in other words, point cloud data can serve as the common set of data as well. Bongio Karrman is analogous art pertinent to the problem to be solved in this application and teaches a perception system that captures data about an environment proximate to a vehicle and updates data operations for the vehicle (Bongio Karrman: Abstract; FIGS. 1-6). Bongio Karrman also teaches that object detection can be performed in a self-supervised manner (Bongio Karrman: [0012], “a machine learning model … may be trained … to identify sensor data representing objects in an environment … training may be performed in … a self-supervised manner”). Bongio Karrman further teaches a sensor data representation that represents the point cloud in 3D space (Bongio Karrman: FIGS. 1, 2, 4, 6; [0013]-[0014], “a registration on the radar-based point cloud … discretized point cloud update”; [0024]; [0033]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen in view of Beker and Zakharov with the teaching of Bongio Karrman by using a perception component configured to use the extracted features to interpret an input sensor data representation that represents the point cloud in 3D space, in order to provide transformed data for self-supervised training and generate training data for object detection in autonomous driving.

Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Beker, and further in view of Zakharov, as applied above, and further in view of Wu et al. (US 2021/0406674 A1), hereinafter Wu.

Regarding claim 7, Chen in view of Beker, and further in view of Zakharov, teaches the method of claim 1. The combination does not teach wherein the image has been captured substantially simultaneously with the common set of sensor data, the sensor data being of a non-image modality, wherein each detected object is matched with a corresponding subset of the common set of sensor data in order to transform the common set of sensor data. Wu is analogous art pertinent to the problem to be solved in this application and teaches a method to fuse raw data generated by multi-modal (e.g., radar, lidar, and camera), multi-view, multi-sensor systems for object detection, classification, and segmentation (Wu: Abstract; FIGS. 1-7). Wu further teaches this limitation (Wu: FIGS. 1, 2-5A). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen in view of Beker and Zakharov with the teaching of Wu by using cross-modality data in order to provide transformed data for self-supervised training and generate training data for object detection in autonomous driving.

Regarding claim 8, Chen in view of Beker, and further in view of Zakharov, in view of Wu, teaches the method of claim 7. Chen in view of Beker and Zakharov does not teach wherein the common set of sensor data comprises a point cloud not encoded in the image. Wu teaches this limitation (Wu: FIGS. 1, 2-5A; [0025], “radar point cloud”). The same rationale for combining with Wu applies.

Regarding claim 9, Chen in view of Beker, and further in view of Zakharov, in view of Wu, teaches the method of claim 8. Chen in view of Beker and Zakharov does not teach wherein the point cloud has a non-image modality. Wu teaches this limitation (Wu: FIGS. 1, 2-5A; [0024], “radar, LiDAR”; [0025], “point cloud”). The same rationale for combining with Wu applies.
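Claims 7-9 turn on matching each detected object with the subset of non-image sensor data (for example, a radar or LiDAR point cloud) captured at substantially the same time. One common way to do this, projecting the points into the image and grouping them by the detection box they fall in, is sketched below; this is a generic illustration, not Wu's method.

```python
import numpy as np

def match_points_to_detections(uv, boxes):
    """Group projected point indices by the 2D detection box they fall in,
    matching each detected object with its subset of the common sensor
    data (generic illustration of the claim 7 limitation, not Wu's method).

    uv: (N, 2) pixel coordinates of projected points (e.g., from
        project_lidar_to_image above); boxes: list of (x0, y0, x1, y1).
    Returns a list holding, per box, the indices of the points inside it.
    """
    matches = []
    for x0, y0, x1, y1 in boxes:
        inside = ((uv[:, 0] >= x0) & (uv[:, 0] < x1) &
                  (uv[:, 1] >= y0) & (uv[:, 1] < y1))
        matches.append(np.flatnonzero(inside))
    return matches
```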
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Beker, and further in view of Zakharov, in view of Bongio Karrman, as applied above, and further in view of Yang et al. (2018 CVPR), hereinafter Yang.

Regarding claim 6, Chen in view of Beker, and further in view of Zakharov, in view of Bongio Karrman, teaches the method of claim 5. The combination does not teach wherein the first and second data representations are discretized voxel representations of the point cloud in 3D space, or non-discretized representations of the point cloud in 3D space. Yang is analogous art pertinent to the problem to be solved in this application and teaches a method for real-time 3D object detection from point clouds (Yang: Abstract; FIGS. 1-6). Yang further teaches a sensor data representation represented by discretized voxel representations of the point cloud in 3D space, or non-discretized representations of the point cloud in 3D space (Yang: Page 7652, 2nd Col., 2nd paragraph, “3D voxel grid transforms the point cloud into a regularly spaced 3D grid”; Page 7654, 1st Col., last paragraph). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen in view of Beker, Zakharov, and Bongio Karrman with the teaching of Yang by interpreting an input sensor data representation that represents the point cloud in 3D space, in order to provide transformed data for self-supervised training and generate training data for object detection in autonomous driving.
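The voxel representation Yang is cited for, transforming the point cloud into a regularly spaced 3D grid, is a one-step discretization. A minimal occupancy-grid sketch follows; the origin, voxel size, and grid shape are assumptions chosen for illustration.

```python
import numpy as np

def voxelize(points, origin=(0.0, -40.0, -2.0), voxel_size=0.2,
             grid_shape=(350, 400, 15)):
    """Discretize a point cloud into a regularly spaced 3D occupancy
    grid, in the spirit of the Yang passage quoted above (illustrative
    sketch; origin, voxel size, and grid shape are assumptions).

    points: (N, 3) array of [x, y, z] points.
    Returns a boolean (X, Y, Z) occupancy grid.
    """
    grid = np.zeros(grid_shape, dtype=bool)
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    # Keep only points whose voxel index lies inside the grid.
    ok = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    xi, yi, zi = idx[ok].T
    grid[xi, yi, zi] = True
    return grid
```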
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Beker, and further in view of Zakharov, as applied above, and further in view of Velic et al. (US 2017/0304732 A1), hereinafter Velic.

Regarding claim 12, Chen in view of Beker, and further in view of Zakharov, teaches the method of claim 10. The combination does not teach wherein the background sensor data is fully or partially removed and replaced with random noise. However, this is a common practice in the field for object recognition and detection. Velic is analogous art pertinent to the problem to be solved in this application and teaches a method for object recognition based on invariance to rotation, size, scale, illumination, or background change (Velic: Abstract; FIGS. 1-7). Velic further teaches wherein the background sensor data is fully or partially removed and replaced with random noise (Velic: [0048], “performing a noise-introducing operation on respective background portions (outside the corresponding object portions) of the first and second processed versions of the captured image, e.g. a blurring operation or a process that adds a noise component to the background portion or even replaces the background with noise (i.e. random pixel values)”; [0156]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen in view of Beker and Zakharov with the teaching of Velic by performing a noise-introducing operation on the respective background of the sensor data, in order to provide transformed data for self-supervised training, generate training data, and improve the performance of object detection.

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Beker, and further in view of Zakharov, in view of Chen1, as applied above, and further in view of Bongio Karrman.

Regarding claim 21, Chen in view of Beker, and further in view of Zakharov, in view of Chen1, teaches the system of claim 20. The combination does not teach wherein the perception component is configured to use the extracted features to interpret the input sensor data representation. Bongio Karrman is analogous art pertinent to the problem to be solved in this application and teaches a perception system that captures data about an environment proximate to a vehicle and updates data operations for the vehicle (Bongio Karrman: Abstract; FIGS. 1-6). Bongio Karrman further teaches wherein the perception component is configured to use the extracted features to interpret the input sensor data representation (Bongio Karrman: FIGS. 1, 2, 4, 6; [0014]; [0028]; [0033]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen in view of Beker, Zakharov, and Chen1 with the teaching of Bongio Karrman by using a perception component configured to use the extracted features to interpret the input sensor data representation, in order to provide transformed data for self-supervised training and generate training data for object detection in autonomous driving.

Allowable Subject Matter

Claim 4 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Response to Arguments

Applicant’s arguments with respect to claims 1-16 and 18-21, dated 11/10/2025, have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion

Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAO LIU, whose telephone number is (571) 272-4539. The examiner can normally be reached Monday-Thursday and alternate Fridays, 8:30-4:30. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/XIAO LIU/
Primary Examiner, Art Unit 2664

Prosecution Timeline

Jul 18, 2023
Application Filed
Aug 06, 2025
Non-Final Rejection — §103
Nov 05, 2025
Examiner Interview Summary
Nov 05, 2025
Applicant Interview (Telephonic)
Nov 10, 2025
Response Filed
Jan 14, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603972
WIRELESS TRANSMITTER IDENTIFICATION IN VISUAL SCENES
2y 5m to grant Granted Apr 14, 2026
Patent 12592069
OBJECT RECOGNITION METHOD AND APPARATUS, AND DEVICE AND MEDIUM
2y 5m to grant Granted Mar 31, 2026
Patent 12579834
Information Extraction Method and Apparatus for Text With Layout
2y 5m to grant Granted Mar 17, 2026
Patent 12576873
SYSTEM AND METHOD OF CAPTIONS FOR TRIGGERS
2y 5m to grant Granted Mar 17, 2026
Patent 12573175
TARGET TRACKING METHOD, TARGET TRACKING SYSTEM AND ELECTRONIC DEVICE
2y 5m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
89%
Grant Probability
99%
With Interview (+11.5%)
2y 9m
Median Time to Grant
Moderate
PTA Risk
Based on 290 resolved cases by this examiner. Grant probability derived from career allow rate.
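For reference, the headline figures follow from the career data above: 257 granted ÷ 290 resolved ≈ 88.6%, displayed as 89%, and applying the +11.5% interview lift multiplicatively gives roughly 89% × 1.115 ≈ 99%. The exact formula is not stated on this page, so the multiplicative lift calculation is an assumption.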
