DETAILED ACTION
This Action is in response to Applicant’s response filed on 02/20/2026. Claims 1-20 are still pending in the present application. This Action is made FINAL.
Response to Arguments
Applicant's arguments have been considered but are moot in view of the new ground(s) of rejection based on Hori (US 2021/0247201 A1).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim(s) 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "scaling factor" in line 9. There is insufficient antecedent basis for this limitation in the claim, as no earlier recitation of a scaling factor appears in claim 1. Independent claims 9 and 14 are rejected for similar reasons. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li (US 2025/0022278 A1) in view of Cai (US 2024/0177329 A1) and further in view of Hori (US 2021/0247201 A1).
Regarding claims 1, 9 and 14, Li discloses a depth system/method, comprising:
one or more processors; (figure 2: processor 210)
a memory communicably coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the one or more processors to: (figure 2: memory 214)
[claim 9: A non-transitory computer-readable medium including instructions that, when executed by one or more processors, cause the one or more processors to: (paragraph 82)]
acquire an image depicting surrounding objects present in an environment; (figure 3, step 302; paragraph 46: At step 302, the data processing system can obtain an image. For example, an autonomous vehicle may be driving along a roadway while on a route to a destination. The autonomous vehicle may use one or more cameras or other sensors to capture images of a surrounding environment around the autonomous vehicle.)
select a salient object (unknown objects on the road such as tumbleweeds are “salient” for driving) from the surrounding objects; (paragraphs 48-51, 62-65 and figures 5-7: Li uses panoptic segmentation and masks to identify objects on the roadway. It classifies pixels into road, objects and background, determines which objects are surrounded by road surface pixels, and extracts 2D bounding boxes for those objects.)
determine characteristics of the salient object according to a language model. (paragraphs 45, 47-48, 60-61: Li uses language learning models with text prompts (e.g., “tumbleweed”) for zero-shot object detection. The models take the image and prompts as input and identify instances of objects, which yields characteristics like object class and location (segmentation mask, bounding box).)
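For illustration only, a minimal sketch of the mapped selection step, in which object instances surrounded by roadway pixels are kept as salient; the class id, the dilation-ring test, and the 90% threshold are assumptions for illustration, not Li's disclosure:

    import numpy as np
    from scipy import ndimage

    ROAD_CLASS = 1  # hypothetical class id for roadway pixels

    def select_salient_objects(class_map, instance_masks):
        """Keep instances whose immediate surroundings are roadway pixels."""
        salient = []
        for mask in instance_masks:
            # One-pixel ring just outside the instance mask.
            ring = ndimage.binary_dilation(mask) & ~mask
            if ring.any() and (class_map[ring] == ROAD_CLASS).mean() > 0.9:
                salient.append(mask)
        return salient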
Li estimates 3D bounding boxes for unknown objects using LiDAR (paragraphs 56-59, 73-74), but this is not a learned depth model; therefore, Li fails to specifically disclose dimensions of the salient object to use as the scaling factor and adapting a depth model according to the characteristics.
In related art, Cai discloses a scaling factor (paragraphs 5-8 and 34-37: Cai teaches scaling the predicted depth map using depth values, describes that monocular depth is scale-ambiguous and correct only up to a multiplicative factor, and provides post-hoc scaling that yields scale-correct depth prediction values) and adapting a depth model (a self-supervised monocular depth network) according to the characteristics. (paragraphs 137-143, 147-152: Cai discloses a depth model whose predicted depth map is scaled using additional information. The system computes a predicted depth map, obtains sparse metric depth values from a camera tracker, computes a scaling factor, and multiplies the predicted depth map by that factor to obtain a scale-correct depth map. Cai also proposes regional scaling, in which different scalars are applied to different regions or objects based on detected regions.)
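For illustration only, a sketch of the post-hoc scaling described above; the use of the median as the representative value, and all names, are assumptions for illustration rather than Cai's disclosure:

    import numpy as np

    def scale_correct(pred_depth: np.ndarray, sparse_depth: np.ndarray) -> np.ndarray:
        """Rescale a scale-ambiguous predicted depth map using sparse metric depth.
        sparse_depth is NaN wherever no metric measurement exists."""
        valid = ~np.isnan(sparse_depth)
        # Monocular depth is correct up to a multiplicative factor, so one
        # scalar computed at the sparse measurement locations suffices.
        s = np.median(sparse_depth[valid]) / np.median(pred_depth[valid])
        return s * pred_depth  # scale-correct depth map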
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the depth model and scaling of Cai into Li's system to obtain accurate metric depth for the same salient unknown objects Li identifies on the roadway. The combined system would acquire an image, select a salient (unknown) object, determine characteristics of that object using a language model, and adapt a depth model (the scaling of its depth map output) according to those object characteristics.
Furthermore, in related art, Hori discloses dimensions of the salient object. (paragraphs 47, 83, 104 and 122: Hori teaches detecting salient objects, representing them with bounding boxes, and using bounding box coordinates to compute area as a saliency measure. Hori's object attribute extraction includes color, distance, and size, and its list of attributes includes the size of the entire salient object and the size of a visible portion.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the teachings of Hori into the teachings of Li and Cai to effectively detect multiple salient objects from each image.
Regarding claims 2, 10 and 15, Li, as modified by Cai and Hori, discloses the claimed invention wherein the instructions to adapt the depth model include instructions to train the depth model by using the characteristics to derive a scaling factor as a loss value that is part of a loss function, and wherein the depth model performs monocular depth estimation and is trained according to self-supervised structure-from-motion (SfM) training. (Cai: paragraphs 34, 68-70 and 90-97: Cai discloses a self-supervised monocular depth model trained via structure-from-motion.)
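For illustration only, a sketch of how a scaling factor could enter a self-supervised SfM loss as claimed; this composition (an L1 photometric term plus a log-scale penalty) is an assumption for illustration, not Cai's or Applicant's exact loss:

    import numpy as np

    def total_loss(target_img, reprojected_img, est_size, known_size, weight=0.1):
        # Photometric term: SfM self-supervision compares the target frame
        # against a source frame warped through predicted depth and pose.
        photometric = np.abs(target_img - reprojected_img).mean()
        # Scale term: an object's known physical size (e.g., obtained via a
        # language model) implies a scaling factor; penalize its deviation
        # from 1 so the depth model learns metric scale.
        scale_factor = known_size / est_size
        scale_loss = abs(np.log(scale_factor))
        return photometric + weight * scale_loss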
Regarding claims 3, 11 and 16, Li, as modified by Cai and Hori, discloses the claimed invention wherein the instructions to adapt the depth model include instructions to use the characteristics to define a scaling factor for adapting depth values generated by the depth model during inference. (Cai: paragraphs 137-143: Cai discloses computing a scaling factor during inference to adapt the predicted depth map: a scale factor is computed from representative values of sparse depth and predicted depth, and then used to scale the predicted depth map. Cai also discloses regional scaling: dividing the frame into a grid or into object, foreground and background regions and computing different scale factors for different regions or objects. The regions or objects are chosen based on object detection or segmentation outputs.)
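For illustration only, a sketch of the regional scaling Cai describes, in which different scale factors are applied to different regions or objects; the region map and the median aggregation are assumptions for illustration:

    import numpy as np

    def regional_scale(pred_depth, sparse_depth, region_ids):
        """region_ids: HxW integer map assigning each pixel to a region
        (grid cell, detected object, foreground, or background)."""
        out = pred_depth.copy()
        valid = ~np.isnan(sparse_depth)
        for r in np.unique(region_ids):
            sel = (region_ids == r) & valid
            if sel.any():
                # Per-region scalar from sparse metric depth in that region.
                s = np.median(sparse_depth[sel]) / np.median(pred_depth[sel])
                out[region_ids == r] *= s
        return out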
Regarding claims 4, 12 and 17, Li, as modified by Cai and Hori, discloses the claimed invention wherein the instructions to select the salient object include instructions to: i) identify the surrounding objects according to a semantic model, and ii) segment the salient object from the surrounding objects according to whether a class of the surrounding objects is one of a group of salient classifications, wherein the salient object is a standardized object having pre-defined physical dimensions accessible by the language model. (Li: paragraphs 48-50 and 62-63: Li discloses identifying surrounding objects according to a semantic model: it uses panoptic segmentation and class labels such as roadway, tumbleweed, etc. Pixels are class-labeled, and instances are enumerated. Li then selects unknown objects on the roadway by determining which object instances are surrounded by roadway pixels and are of certain classes, and extracts 2D bounding boxes only for those.)
Regarding claims 5 and 13, Li, as modified by Cai and Hori, discloses the claimed invention wherein the instructions to determine the characteristics of the salient object include instructions to provide a representation of the salient object from the image to the language model that uses information about the salient object to determine the characteristics indicating at least a size of the salient object. (Li: paragraphs 45, 47-48, 60-61: Li uses language learning models with text prompts (e.g., "tumbleweed") for zero-shot object detection. The models take the image and prompts as input and identify instances of objects, which yields characteristics such as object class and location (segmentation mask, bounding box). Li also discloses deriving 2D bounding boxes (paragraphs 50-51) and 3D bounding boxes using LiDAR and ground-plane estimation (paragraphs 58-59). Cai: paragraphs 32-37: Cai discloses generating metric depth maps and using them to obtain metric-correct spatial information about objects in the scene; the depth map, together with image coordinates, allows extracting an object's metric size.)
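For illustration only, the pinhole-camera relationship by which a metric depth map plus image coordinates yields an object's metric size; the function and variable names are hypothetical:

    def metric_width(depth_m: float, bbox_width_px: float, focal_px: float) -> float:
        """Similar triangles: real width = depth * pixel width / focal length."""
        return depth_m * bbox_width_px / focal_px

    # Example: an object 20 m away spanning 150 px under a 1200 px focal
    # length is roughly 20 * 150 / 1200 = 2.5 m wide.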
Regarding claims 6 and 19, Li, as modified by Cai and Hori, discloses the claimed invention wherein the language model is one of a large language model (LLM) and a visual language model (VLM), and wherein the depth model performs monocular depth estimation on monocular images to generate depth data for the environment. (Cai: paragraphs 32-37, 60, 90-97 and 146-149: Cai discloses a monocular depth model (a self-supervised monocular depth network) that processes single camera images to generate a predicted depth map for the image)
Regarding claims 7 and 20, Li, as modified by Cai and Hori, discloses the claimed invention wherein providing the depth model includes integrating the depth model in a perception pipeline of an autonomous vehicle to facilitate control of the autonomous vehicle. (Cai: paragraphs 44 and 137-143: Cai teaches a depth system that processes images to produce scale-correct depth maps.)
Regarding claim 8, Li, as modified by Cai and Hori, discloses the claimed invention wherein the depth system is embedded within a vehicle to perceive depth in the environment. (Cai: paragraphs 32-37: Cai discloses the depth system can be implemented in devices including vehicles and autonomous driving systems)
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Li, Cai and Hori, and further in view of Mikhailiuk (US 2025/0077794 A1).
Regarding claim 18, Li, as modified by Cai and Hori, discloses wherein the instructions to determine the characteristics of the salient object include instructions to provide a representation of the salient object from the image to the language model that uses information about the salient object to determine the characteristics indicating at least a size of the salient object. (See the mapping of claims 5 and 13 above: Li, paragraphs 45, 47-48, 50-51 and 58-61; Cai, paragraphs 32-37.)
Li, as modified by Cai and Hori, fails to disclose wherein determining the characteristics includes generating a textual query to the language model requesting the physical dimensions of the salient object based on a semantic classification of the salient object.
In related art, Mikhailiuk discloses determining the characteristics includes generating a textual query to the language model requesting the physical dimensions of the salient object based on a semantic classification of the salient object. (paragraphs 27-30, 117-127 and 143-147)
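For illustration only, a sketch of generating such a textual query from a semantic classification; ask_language_model is a placeholder, not an API from any cited reference:

    def query_dimensions(object_class: str, ask_language_model):
        """Build a textual query requesting physical dimensions for a class."""
        prompt = (
            f"What are the typical physical dimensions, in meters "
            f"(height, width, length), of a {object_class}?"
        )
        return ask_language_model(prompt)

    # e.g., query_dimensions("stop sign", llm) for a detected stop sign.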
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the teachings of Mikhailiuk into the teachings of Li, Cai and Hori to effectively interpret non-textual data and weave such interpretations into conversational context.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BOBBAK SAFAIPOUR whose telephone number is (571)270-1092. The examiner can normally be reached Monday - Friday, 8:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Koziol can be reached at (408) 918-7630. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BOBBAK SAFAIPOUR/Primary Examiner, Art Unit 2665