DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is in response to the communication filed on 2/2/2026. Claims 1-20 are pending in this application.
Response to Arguments
Applicant's arguments filed 2/2/2026 have been fully considered but they are not persuasive.
Response to Remarks
Regarding claim 17, the applicant notes that the claim has been rejected twice (over Wang in view of Yu and over the combination of Wang and Yu in view of Zakharov). The examiner agrees that the rejection of claim 17 over Wang in view of Yu is in error. The rejection of claim 17 over the combination of Wang and Yu in view of Zakharov remains.
Regarding claim 1, applicant asserts that Wang fails to teach cropping the training image around the object and training a normalized coordinate model using the training image and ground truth information (Remarks page 8).
Examiner respectfully disagrees.
Image cropping is the removal of unwanted areas from the periphery of an image. Wang teaches proposing regions of interest in an image containing objects using a Mask R-CNN (section 5.1.1). A Mask R-CNN uses instance segmentation to generate a mask for each region of interest (see https://developers.arcgis.com/python/latest/guide/how-maskrcnn-works/), essentially preserving the area around an object and removing unwanted areas. This is interpreted to perform the same function as cropping. The output is a NOCS map (section 5.1). The NOCS map from a ground truth object category is used during training (section 5.1.1) and is interpreted to be the claimed ground truth information. It would be necessary to train a CNN model (section 1) using training images. Therefore Wang teaches cropping the training image around the object and training a normalized coordinate model using the training image and ground truth information.
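The functional equivalence asserted above (a per-object mask preserves the area around the object and removes unwanted peripheral areas, as cropping does) can be sketched as follows. This is purely an illustrative computation; the array shapes, mask values, and function name are hypothetical and are not drawn from Wang:

```python
import numpy as np

def crop_to_mask(image, mask):
    """Zero out pixels outside the mask, then slice the image to the
    mask's tight bounding box -- keeping the area around the object
    and removing the unwanted periphery, as cropping does."""
    masked = np.where(mask[..., None], image, 0)  # keep only masked pixels
    ys, xs = np.nonzero(mask)                     # pixel coords inside the mask
    return masked[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

# A 6x6 "image" with an object occupying rows 2-3, cols 2-4.
image = np.arange(6 * 6 * 3).reshape(6, 6, 3)
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 2:5] = True

roi = crop_to_mask(image, mask)
print(roi.shape)  # (2, 3, 3): only the region of interest remains
```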
Regarding claim 9, applicant asserts that Wang does not teach a training dataset with differing levels of annotation including different domains having different degrees of annotation (Remarks page 9).
Examiner respectfully disagrees.
It is noted that neither claim 1 nor claim 9 defines the domains and degrees of annotation. With that said, Wang teaches generating large amounts of fully annotated mixed reality data (abstract) and providing a fully annotated real-world dataset with large environment and instance variation (abstract). The generated dataset (fig. 4) comprises 31 widely varying indoor scenes as backgrounds (section 4.1, Real Scenes), which are interpreted to be different domains, with realistic-looking objects rendered into them (section 4.1, Synthetic Objects). Because different scenes contain different numbers of objects (section 4.1 describes 6 object categories and a distractor category, and the previously mentioned instance variation is interpreted to include differing numbers of objects), it would be obvious to have a different degree of annotation for each image. For instance, an image with six objects may carry six annotations whereas an image with nine objects may carry nine. Therefore Wang teaches generating a dataset with differing levels of annotation, including different domains having different degrees of annotation, to be used for training (abstract).
Regarding claim 10, applicant asserts that there is not a prima facie case for obviousness with respect to omitting undesired data from the dataset to reduce storage requirements (Remarks page 9).
It is noted that the cited portion of Wang does not explicitly disclose location and orientation annotation (section 1, We also present a real-world dataset for training and testing with 18 different scenes and ground truth 6D pose and size annotations for 6 object categories). Therefore it would be obvious to one of ordinary skill in the art that only the disclosed annotations would be included in a domain. Furthermore, omitting undesired data falls under the “Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results” KSR rationale (see MPEP 2143). Omitting data is interpreted to be a technique used in the image analysis art to reduce processing requirements by focusing on only the relevant data, and to reduce storage requirements by storing only desired data, both of which are predictable results. Also, Wang teaches selecting certain data (section 4.1, Real Scenes, which recites collecting 553 images for the 31 scenes, 4 of which were set aside for validation). This suggests that not all of the data is necessary to generate fully annotated datasets. Therefore Wang suggests omitting undesired data from the dataset.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 3, 5, 6, 8-11, 13, 15, 16, and 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. ("Normalized object coordinate space for category-level 6D object pose and size estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, pages 2642-2651, retrieved from the Internet on 9/19/2025) in view of Yu et al. ("pixelNeRF: Neural radiance fields from one or few images." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, pages 4578-4587, retrieved from the Internet on 9/19/2025).
Regarding claim 1, Wang teaches a computer-implemented method for training a model (abstract, Our region-based neural network is then trained to directly infer the correspondence from observed pixels to this shared object representation (NOCS) along with other object information such as class label and instance mask), comprising:
performing two-dimensional object detection on a training image to identify an object (fig. 1; section 5.1);
cropping the training image around the object (section 3, Mask R-CNN; section 5.1.1, For each proposed region of interest (ROI), the output of a head is of size 28×28×N, where N is the number of categories and each category containing the x (or y, z) coordinates for all detected objects in that category. It would be obvious to only use the ROI);
training a normalized coordinate model using the training image and ground truth information (abstract, Our region-based neural network is then trained to directly infer the correspondence from observed pixels to this shared object representation (NOCS); section 5.1.1, During training, only the NOCS map component from the ground truth object category is used in the loss function).
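As background on the normalized object coordinate space that the cited NOCS map encodes, the normalization itself can be sketched as follows. This is an illustrative computation under the common convention that the object's tight bounding-box diagonal is scaled to 1 inside the unit cube; the point values and function name are hypothetical and are not drawn from Wang:

```python
import numpy as np

def normalize_to_nocs(points):
    """Map an object's 3D points into a normalized coordinate space:
    center the tight bounding box at the origin, scale so its diagonal
    has length 1, then shift the result into the [0, 1]^3 unit cube."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    center = (lo + hi) / 2.0
    diagonal = np.linalg.norm(hi - lo)   # length of the bounding-box diagonal
    return (points - center) / diagonal + 0.5

pts = np.array([[0.0, 0.0, 0.0],
                [2.0, 1.0, 2.0]])       # hypothetical object extremes
nocs = normalize_to_nocs(pts)
print(nocs)  # both points lie within [0, 1]^3
```

Because every object category is mapped into the same unit cube, a network can regress these coordinates for previously unseen instances of a known category.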
Wang fails to teach generating a category-level shape reconstruction using a neural radiance field (NeRF) model; and
training a model using information from the category-level shape reconstruction.
However Yu teaches generating a category-level shape reconstruction using a neural radiance field (NeRF) model (section 1; Our experiments show that pixelNeRF can generate novel views from a single image input for both category-specific and category-agnostic settings, even in the case of unseen object categories; section 2, PixelNeRF operates in view-space, which has been shown to allow better reconstruction of unseen object categories; section 4.1); and
training a model using information from the category-level shape reconstruction (section 5.1.2, During both training and evaluation, a random view is selected as the input view for each object; section 5.2, As seen in Fig. 8, the network trained on synthetic data effectively infers shape and texture of the real cars).
Therefore taking the combined teachings of Wang and Yu as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Yu into the method of Wang. The motivation to combine Yu and Wang would be to achieve noticeably superior results (section 5.1.1 of Yu).
Regarding claim 3, the modified invention of Wang teaches a method wherein training the normalized coordinate model includes training a neural network model using a deep learning process (abstract of Wang, Our region-based neural network).
Regarding claim 5, the modified invention of Wang teaches a method wherein cropping the image excludes information from the training image outside of a bounding box determined by the object detection (section 3 and fig. 3 of Wang).
Regarding claim 6, the modified invention of Wang teaches a method further comprising determining a three-dimensional pose of the object based on normalized coordinates from the normalized coordinate model and the estimated object size (section 5.2 and 7 of Wang).
Regarding claim 8, the modified invention of Wang teaches a method further comprising training a size estimation model to generate the estimated object size responsive to the training image (fig. 3 and section 5 of Wang, Figure 3 shows our method for 6D pose and size estimation of multiple previously unseen objects from an RGB-D image. A CNN predicts class labels, masks, and NOCS maps of objects. It would be necessary to train the CNN).
Regarding claim 9, the modified invention of Wang teaches a method wherein the training dataset is derived from multiple different domains having differing degrees of annotation (abstract and section 4.1 of Wang).
Regarding claim 10, the modified invention of Wang teaches a method wherein at least one domain of the training dataset has object size annotation (section 1 of Wang, We also present a real-world dataset for training and testing with 18 different scenes and ground truth 6D pose and size annotations for 6 object categories).
Wang fails to teach wherein at least one domain of the training dataset lacks location and orientation annotation. However, it would be obvious to omit undesired data from the dataset to reduce storage requirements, as discussed in the Response to Arguments above.
Regarding claim 11, the claim recites similar subject matter as claim 1 and is rejected for the same reasons as stated above.
Regarding claim 13, the claim recites similar subject matter as claim 3 and is rejected for the same reasons as stated above.
Regarding claim 15, the claim recites similar subject matter as claim 5 and is rejected for the same reasons as stated above.
Regarding claim 16, the claim recites similar subject matter as claim 6 and is rejected for the same reasons as stated above.
Regarding claim 18, the claim recites similar subject matter as claim 8 and is rejected for the same reasons as stated above.
Regarding claim 19, the claim recites similar subject matter as claim 9 and is rejected for the same reasons as stated above.
Regarding claim 20, the claim recites similar subject matter as claim 10 and is rejected for the same reasons as stated above.
Claim(s) 2 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. ("Normalized object coordinate space for category-level 6D object pose and size estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, pages 2642-2651, retrieved from the Internet on 9/19/2025) and Yu et al. ("pixelNeRF: Neural radiance fields from one or few images." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, pages 4578-4587, retrieved from the Internet on 9/19/2025) in view of Cohen et al. (US20240005261).
Regarding claim 2, the modified invention of Wang teaches a method wherein the training image is of a navigable environment (abstract of Wang, we also provide a fully annotated real-world dataset with large environment and instance variation) and the object is a navigation obstacle (fig. 1 and section 5.1 of Wang).
Wang fails to teach wherein the navigable environment is a healthcare facility.
However Cohen teaches a training image of a healthcare facility (para. [0068], In various embodiments, a physical environment such as a warehouse, hospital, or office may be provisioned with sensors, emitters, or other data collection and/or data producing systems during training. This apparatus may be used to create a richer set of training data that can be used as inputs and/or labels for training a ML-based model).
Therefore taking the combined teachings of Wang and Yu with Cohen as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Cohen into the method of Wang and Yu. The motivation to combine Cohen, Yu, and Wang would be to create a richer set of training data (para. [0068] of Cohen).
Regarding claim 12, the claim recites similar subject matter as claim 2 and is rejected for the same reasons as stated above.
Claim(s) 7 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. ("Normalized object coordinate space for category-level 6D object pose and size estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, pages 2642-2651, retrieved from the Internet on 9/19/2025) and Yu et al. ("pixelNeRF: Neural radiance fields from one or few images." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, pages 4578-4587, retrieved from the Internet on 9/19/2025) in view of Zakharov et al. (US20220300770).
Regarding claim 7, the modified invention of Wang fails to teach a method further comprising using the normalized coordinates and the three-dimensional pose of the object in an autonomous vehicle to navigate through an environment.
However Zakharov teaches using normalized coordinates (para. [0032]) and the three-dimensional pose of an object (para. [0034]) in an autonomous vehicle to navigate through an environment (para. [0048], [0073]).
Therefore taking the combined teachings of Wang and Yu with Zakharov as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Zakharov into the method of Wang and Yu. The motivation to combine Zakharov, Yu, and Wang would be to improve transferability to real-world datasets (para. [0001] of Zakharov).
Regarding claim 17, the claim recites similar subject matter as claim 7 and is rejected for the same reasons as stated above.
Allowable Subject Matter
Claims 4 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEON VIET Q NGUYEN whose telephone number is (571)270-1185. The examiner can normally be reached Mon-Fri 11AM-7PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Gregory Morse can be reached at 571-272-3838. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LEON VIET Q NGUYEN/Primary Examiner, Art Unit 2663