Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Applicant's Remarks/Arguments
The specification objection has been withdrawn in light of the spelling corrections made in the present amendment.
The 35 U.S.C. 112 rejection has been withdrawn in light of applicant's amendments.
Regarding applicant's arguments that Zhang does not teach training a point cloud processing neural network, the examiner respectfully disagrees. Zhang teaches ([0059] “deep CCA uses two deep neural network models f=f(X;W.sub.f) and g=g(Y;W.sub.g) to learn the non-linear structures of X and Y,” and [0060] “to optimize weights of deep CCA.” Learning and optimizing the weights of a deep CCA model are understood by the examiner to be the same as the claimed training a neural network, in light of the instant specification, page 9).
Regarding applicant's arguments that Zhang does not teach training the point cloud neural network based on differences, the examiner agrees and conducted a further search based on amended claim 1. The examiner found previously cited pertinent art Srinivasan et al. (US20250095173) teaching training the point cloud processing neural network ([0027] “point could features may be extracted from a point cloud…In order to train the neural network…”) based on differences between the respective target features for the points in the training point clouds and the respective features for the points in the training point clouds ([0027] “a loss function may be used that calculates differences between the generated depth maps and the ground truth dense depth map, and values calculated using the loss function may be used to update a neural network model for the neural network.” This is understood to be the same as the claimed training a neural network based on differences between…target features…and the features for the points, in light of the instant specification, page 14).
Regarding new claims 21-24, the examiner found previously cited prior art Xue teaching the claimed subject matter; Xue is cited in the § 103 rejections below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 6, 12, 14, 17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US20220366681, hereinafter “Zhang”) in view of Srinivasan et al. (US20250095173, hereinafter “Srinivasan”).
Claims 1, 6, 17 and 19 are addressed below.
Claim 12. (Currently Amended) Zhang teaches A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations ([0039] “A terminal device includes: a memory, a processor and computer programs stored on the memory and capable of running on the processor, wherein the processor implements the method”) comprising: obtaining a training data set ([0029] “selecting data of a public data set KITTI as a training set,”) comprising a plurality of training point clouds ([0029] “wherein the training set includes … point cloud data;”) and, for each training point cloud, a corresponding set of images; ([0029] “wherein the training set includes RGB images” The KITTI public dataset is known to have training point cloud data corresponding to a set of images: https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d)
for each of the training point clouds: generating a respective target feature for each of a plurality of points from the training point cloud by processing the corresponding set of images using the pre-trained computer vision neural network; ([0051] “Step 4, designing a CCA module for the fusion of laser point cloud PC and image I according to the multi-source information input, and extracting features of the two source data by using a convolutional neural network, respectively.”)
and processing the training point cloud using the point cloud processing neural network to generate respective features of each of the plurality of points; ([0065] “point cloud features generated by the network are extracted through a PointNet algorithm,” PointNet is a type of neural network for processing point clouds https://medium.com/@itberrios6/introduction-to-point-net-d23f43aa87d2) and
training the point cloud processing neural network ([0059] “deep CCA uses two deep neural network models f=f(X;W.sub.f) and g=g(Y;W.sub.g) to learn the non-linear structures of X and Y,” and [0060] “to optimize weights of deep CCA.” Learning and optimizing the weights of a deep CCA model are understood by the examiner to be the same as the claimed training a neural network, in light of the instant specification, page 9)
Zhang does not explicitly teach training the point cloud processing neural network based on differences between the respective target features for the points in the training point clouds and the respective features for the points in the training point clouds.
Srinivasan teaches training the point cloud processing neural network ([0027] “point could features may be extracted from a point cloud…In order to train the neural network…”) based on differences between the respective target features for the points in the training point clouds and the respective features for the points in the training point clouds ([0027] “a loss function may be used that calculates differences between the generated depth maps and the ground truth dense depth map, and values calculated using the loss function may be used to update a neural network model for the neural network.” This is understood to be the same as the claimed training a neural network based on differences between…target features…and the features for the points, in light of the instant specification, page 14).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang to train the neural network based on differences between target features for the points and the features for the points, as taught by Srinivasan, to arrive at the claimed invention discussed above. The motivation for the proposed modification would have been “to produce accurate, dense depth maps” (Srinivasan [0017]).
Claim 14. (Original) Zhang and Srinivasan teach The system of claim 12,
Zhang teaches wherein the point cloud processing neural network is further configured to process the respective features for each of the plurality of points ([0014] “inputting the fused point cloud features”) to generate a task prediction for a machine learning task. ([0014] “to achieve object detection.” This is understood to be the same as the claimed task prediction, in light of the instant specification, page 10)
Claim 1. (Currently Amended) The method herein is performed by the system of claim 12 and is likewise rejected.
Claim 6. (Original) The method herein is performed by the system of claim 14 and is likewise rejected.
Claim 17. (Currently Amended) The non-transitory computer storage medium herein recites the operations performed by the system of claim 12 and is likewise rejected.
Claim 19. (Currently Amended) Zhang and Srinivasan teach The non-transitory computer storage media of claim 17, wherein the point cloud processing neural network is further configured to
Zhang teaches process the respective features for each of the plurality of points ([0014] “inputting the fused point cloud features”) to generate a task prediction for a machine learning task. ([0014] “to achieve object detection.” This is understood to be the same as the claimed task prediction, in light of the instant specification, page 10)
Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US20220366681, hereinafter “Zhang”) in view of Srinivasan et al. (US20250095173, hereinafter “Srinivasan”), and further in view of Wu et al. (US20220383513, hereinafter “Wu”).
Claim 7. (Original) The method herein is performed by the system of claim 15 (below) and is likewise rejected.
Claim 15. (Original) Zhang and Srinivasan teach The system of claim 14, wherein the operations further comprise: after training the point cloud processing neural network on the training data set,
Zhang and Srinivasan do not explicitly teach training the point cloud processing neural network on training data for the machine learning task.
Wu teaches training the point cloud ([0062] “point cloud”) processing neural network ([0062] “training a convolutional neural network”) on training data ([0062] “based on sample data”) for the machine learning task. ([0062] “to generate a transparent object detection model”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Zhang and Srinivasan to train a point cloud neural network on training data for object detection, as taught by Wu, to arrive at the claimed invention discussed above. The motivation for the proposed modification would have been so that the “position where the … object is located can be estimated accurately” (Wu [0048]).
Claims 9, 16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US20220366681, hereinafter “Zhang”) in view of Srinivasan et al. (US20250095173, hereinafter “Srinivasan”), and further in view of Xu et al. (US20200005485, hereinafter “Xu”).
Claim 9. (Currently Amended) The method herein is performed by the system of claim 16 (below) and is likewise rejected.
Claim 16. (Currently Amended) Zhang and Srinivasan teach The system of claim 12, wherein generating a respective target feature for each of a plurality of points from the training point cloud by processing the corresponding set of images using a pre-trained computer vision neural network comprises:
Zhang and Srinivasan do not explicitly teach processing each image in the corresponding set of images using the computer vision neural network to generate, for each corresponding image, a respective patch feature for each of a plurality of patches in the image; determining, for each of the plurality of points, a corresponding patch from a particular one of the images in the corresponding set; and
for each of the plurality of points, using, as the target feature for the point, the patch feature for the corresponding patch in the particular image.
Xu teaches processing each image in the corresponding set of images using the computer vision neural network ([0033] “PointNet”) to generate, for each corresponding image, ([0033] “attributes (e.g., pixels) in the image 304.”) a respective patch feature for each of a plurality of patches in the image; ([0033] “A feature patch around the point may be extracted from an intermediate layer of the image”)
determining, for each of the plurality of points, a corresponding patch from a particular one of the images in the corresponding set; ([0048] “associating a portion of the cropped image with each point.”) and
for each of the plurality of points, using, as the target feature for the point, the patch feature for the corresponding patch in the particular image. ([0048] “the feature patch may be concatenated with the per-point feature vector obtained at 508 prior to processing with the global geometric feature vector for the entire point cloud and the appearance feature vector for the cropped image.”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Zhang and Srinivasan to have a patch of each image correspond to points for feature detection, as taught by Xu, to arrive at the claimed invention discussed above. The motivation for the proposed modification would have been to “provide improved simplicity, functionality and/or reliability” (Xu [0014]).
Claim 20. (Original) The non-transitory computer storage medium herein recites the operations performed by the system of claim 16 and is likewise rejected.
Claims 3-5, 8 and 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US20220366681, hereinafter “Zhang”) in view of Srinivasan et al. (US20250095173, hereinafter “Srinivasan”), and further in view of Xue et al. (US20240312128, hereinafter “Xue”).
Claim 3. (Original) Zhang and Srinivasan teach The method of claim 1, wherein the pre-trained computer vision neural network
Zhang and Srinivasan do not explicitly teach has been pre-trained using both images and text.
Xue teaches the computer vision neural network has been pre-trained ([0034] “neural network based 3D visual understanding module 130 and one or more of its submodules 131-133 may be trained”) using both images and text. ([0034] “with the text representations and image representations.”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Zhang and Srinivasan to have both images and text used for pre-training a neural network, as taught by Xue, to arrive at the claimed invention discussed above. The motivation for the proposed modification would have been so that the “model may be improved by the learning of unified representations” (Xue [0025]).
Claim 4. (Original) Zhang, Srinivasan and Xue teach The method of claim 3,
Zhang and Srinivasan do not explicitly teach wherein the pre-trained computer vision neural network has been pre-trained jointly with a text processing neural network.
Xue teaches wherein the pre-trained computer vision neural network has been pre-trained ([0034] “neural network based 3D visual understanding module 130 and one or more of its submodules 131-133 may be trained”) jointly with a text processing neural network. ([0034] “with the text representations and image representations.”)
Claim 5. (Original) Zhang and Srinivasan teach The method of claim 1,
Zhang and Srinivasan do not explicitly teach wherein the computer vision neural network is a text-prompted image segmentation neural network.
Xue teaches wherein the computer vision neural network is a text-prompted image segmentation neural network. ([0097] “neural network…text representations 1008 of the category candidates and the 3D representations 1012 are determined… The category (e.g., “piano”) that introduces the smallest distance is selected”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Zhang and Srinivasan to have a text-prompted image segmentation neural network, as taught by Xue, to arrive at the claimed invention discussed above. The motivation for the proposed modification would have been so that the “model may be improved by the learning of unified representations” (Xue [0025]).
Claim 8. (Original) Zhang and Srinivasan teach The method of claim 6,
Zhang and Srinivasan do not explicitly teach wherein the training data set comprises, for each of the plurality of point clouds, a respective ground truth output for the machine learning task, and wherein the training the point cloud processing neural network on the training data set comprises training using (i) the target features and (ii) the respective ground truth outputs.
Xue teaches wherein the training data set comprises, for each of the plurality of point clouds, ([0023] “features from 3D point cloud”) a respective ground truth output ([0060] “expected output (e.g., a “ground-truth” such as the corresponding correct answer for an input question) from the training data,”) for the machine learning task, ([0025] “recognition ability”) and wherein the training the point cloud processing neural network on the training data set comprises training using (i) the target features ([0023] “pretrained on…features from 3D point cloud”) and (ii) the respective ground truth outputs. ([0060] “expected output (e.g., a “ground-truth” …from the training data”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Zhang and Srinivasan to train the neural network on point cloud features and ground truth, as taught by Xue, to arrive at the claimed invention discussed above. The motivation for the proposed modification would have been so that the “model may be improved by the learning of unified representations” (Xue [0025]).
Claim 21. (New) Zhang and Srinivasan teach The system of claim 12,
Zhang and Srinivasan do not explicitly teach wherein the pre-trained computer vision neural network has been pre-trained using both images and text.
Xue teaches wherein the pre-trained computer vision neural network (Abstract “neural network”) has been pre-trained using both images and text. ([0023] “A vision language model that is pre-trained on massive image-text pairs”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Zhang and Srinivasan to have both images and text used for pre-training a neural network, as taught by Xue, to arrive at the claimed invention discussed above. The motivation for the proposed modification would have been so that the “model may be improved by the learning of unified representations” (Xue [0025]).
Claim 22. (New) Zhang and Srinivasan teach The system of claim 21,
Zhang and Srinivasan do not explicitly teach wherein the pre-trained computer vision neural network has been pre-trained jointly with a text processing neural network.
Xue teaches wherein the pre-trained computer vision neural network has been pre-trained jointly with a text processing neural network. ([0023] “A vision language model that is pre-trained on massive image-text pairs” and [0072] “language model (e.g., BLIP-2)”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Zhang and Srinivasan to have both images and text used for pre-training a neural network, as taught by Xue, to arrive at the claimed invention discussed above. The motivation for the proposed modification would have been so that the “model may be improved by the learning of unified representations” (Xue [0025]).
Claim 23. (New) Zhang and Srinivasan teach The non-transitory computer storage media of claim 17,
Zhang and Srinivasan do not explicitly teach wherein the pre-trained computer vision neural network has been pre-trained using both images and text.
Xue teaches wherein the pre-trained computer vision neural network has been pre-trained using both images and text. ([0023] “A vision language model that is pre-trained on massive image-text pairs”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Zhang and Srinivasan to have both images and text used for pre-training a neural network, as taught by Xue, to arrive at the claimed invention discussed above. The motivation for the proposed modification would have been so that the “model may be improved by the learning of unified representations” (Xue [0025]).
Claim 24. (New) Zhang, Srinivasan and Xue teach The non-transitory computer storage media of claim 23,
Zhang and Srinivasan do not explicitly teach wherein the pre-trained computer vision neural network has been pre-trained jointly with a text processing neural network.
Xue teaches wherein the pre-trained computer vision neural network has been pre-trained jointly with a text processing neural network. ([0023] “A vision language model that is pre-trained on massive image-text pairs” and [0072] “language model (e.g., BLIP-2)”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Zhang and Srinivasan to have both images and text used for pre-training a neural network, as taught by Xue, to arrive at the claimed invention discussed above. The motivation for the proposed modification would have been so that the “model may be improved by the learning of unified representations” (Xue [0025]).
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US20220366681, hereinafter “Zhang”) in view of Srinivasan et al. (US20250095173, hereinafter “Srinivasan”), further in view of Xu et al. (US20200005485, hereinafter “Xu”), and further in view of Bui et al. (US20220028088, hereinafter “Bui”).
Claim 10. (Original) Zhang, Srinivasan and Xu teach The method of claim 9, further comprising
Zhang, Srinivasan and Xu do not explicitly teach down sampling the patch features in the particular image.
Bui teaches down sampling the patch features in the particular image. ([0011] “performs downsampling on the divided image patches,”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Zhang, Srinivasan and Xu to down sample the patch features in an image, as taught by Bui, to arrive at the claimed invention discussed above. The motivation for the proposed modification would have been so that the “output segmentation map is gradually improved” (Bui [0060]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
Corral-Soto et al. (US20200151512) teaches fusing images with 3D data points for feature/object detection.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OWAIS MEMON whose telephone number is (571)272-2168. The examiner can normally be reached M-F (7:00am - 4:00pm) CST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Gregory Morse can be reached at (571) 272-3838. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OWAIS IQBAL MEMON/Examiner, Art Unit 2663
/GREGORY A MORSE/Supervisory Patent Examiner, Art Unit 2698