DETAILED ACTION
Response to Amendment
Applicant’s amendments filed on 4 December 2025 have been entered. Claims 1, 3, and 5-9 have been amended. Claims 1-9 are still pending in this application, with claims 1 and 7-9 being independent.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6, 8, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over ZHOU et al. (US 20220164595 A1), referred to herein as ZHOU, in view of Laradji et al. (US 20210241034 A1), referred to herein as Laradji, and Grabner et al. (US 20200388071 A1), referred to herein as Grabner.
Regarding Claim 1, ZHOU in view of Laradji and Grabner teaches a method for ascertaining a descriptor image for an image of an object, comprising the following steps (ZHOU Abst: a computing device obtains an image descriptor map corresponding to a captured image):
training the machine learning model (ZHOU [0060] the computing device 120 may input the captured image 130 into a trained machine learning model and then gain the image descriptor map 160 at the output of the machine learning model; [0071] After acquiring the reference image 140 of the external environment 105, the computing device 120 may obtain the set of spatial coordinates 145 and the set of reference descriptors 147 corresponding to the set of keypoints 143 in the reference image 140);
ZHOU does not explicitly teach, but Laradji teaches,
training a plurality of machine learning model instances, each machine learning model instance corresponding to an object class of a plurality of object classes, and for each machine learning model instance, training the machine learning model instance to map images of objects of the object class (Laradji [0014] generating, for each image-level labelled image of the set of image-level labelled images, based on the respective CAM and the respective set of region proposals, a respective pseudo mask of the respective object indicative of pixels in the image-level labelled image corresponding to the respective object class, and generating the set of training images to be provided for training the instance segmentation MLA, each training image comprising: a respective object class of the respective object, and the respective pseudo mask of the respective object having the respective object class).
ZHOU in view of Laradji further teaches
and storing reference descriptors output by the machine learning model instance for one or more objects of the object class (Laradji [0216] According to processing step 806, during training, the training server 220 generates a class activation map (CAM) 322 of the object class 318 in the given labelled image 312. The CAM 322 of the object class 318 is indicative of the discriminative image regions used by the PRM generation MLA 260 to identify the object class 318. The CAM 322 specifies a classification confidence for a given object class 318 at each image location in the given labelled image 312; ZHOU [0071] the computing device 120 or another entity (for example, another computing device) may have generated and stored a set of keypoints, a set of reference descriptors and a set of spatial coordinates in association for each reference image in the set of reference images of the external environment 105).
receiving an image of an object (ZHOU [0045] the captured image 130 captured by the imaging device of the vehicle 110 presents road boundaries, lane markings, trees, a traffic light, a vehicle in front of the vehicle 110, clouds in the sky, and other objects);
generating, by each machine learning model instance, a respective descriptor image for the object by mapping the received image to a descriptor image using the machine learning model instance trained for the respective object class (ZHOU [0059] the image descriptor map 160 may include descriptors of respective image points in the captured image 130. For example, in the image descriptor map 160, a position corresponding to an image point in the captured image 130 records a descriptor of the image point; Laradji [0216] According to processing step 806, during training, the training server 220 generates a class activation map (CAM) 322 of the object class 318 in the given labelled image 312. The CAM 322 of the object class 318 is indicative of the discriminative image regions used by the PRM generation MLA 260 to identify the object class 318. The CAM 322 specifies a classification confidence for a given object class 318 at each image location in the given labelled image 312);
evaluating, for each object class, a distance between the reference descriptors stored for the object class of the respective machine learning model instance and the descriptors of the descriptor image generated for the object class by the respective machine learning model instance (ZHOU [0088] Referring back to FIG. 2, at block 240, the computing device 120 may determine a plurality of similarities 170 between the plurality of sets of image descriptors 165 and the set of reference descriptors 147. In other words, for a set of image descriptors among the plurality of sets of image descriptors 165, the computing device 120 may determine a similarity between the set of image descriptors and the set of reference descriptors 147, thereby determining a similarity of the plurality of similarities 170; [0091] Using the L2 distance between descriptors to represent a difference between descriptors); and
ZHOU does not explicitly teach, but Grabner teaches,
assigning the descriptor image to the object as the descriptor image of the object generated for that object class for which the distance between the reference descriptors stored for the object class of the respective machine learning model instance and the descriptors of the descriptor image generated for the object class by the respective machine learning model instance was rated to be smallest (Grabner [0097] and minimizes the distance D(f_i, c_{y_i}) between a location field descriptor f_i and its corresponding center descriptor c_i. In this case, y_i is the index of the corresponding 3D model and N denotes the number of samples; [0105] the descriptor matching engine 614 can output the top matching 3D model having the center location field descriptor with the shortest distance to the location field descriptor 624 of the chair object in the image 602).
Laradji discloses a method and a system for generating training images for training an instance segmentation machine learning algorithm. Laradji is analogous art to the claimed invention.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified ZHOU to incorporate the teachings of Laradji by applying Laradji's method of training an instance segmentation machine learning model from image-level labelled images to ZHOU's method in the technical fields of autonomous driving, electronic maps, deep learning, and image processing.
Doing so would not only increase the performance of models performing instance segmentation, but would also broaden the types of applications of such models.
Grabner discloses techniques for matching one or more three-dimensional models to one or more objects depicted in an image. Grabner is analogous art to the claimed invention.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified ZHOU to incorporate the teachings of Grabner by applying Grabner's descriptor matching engine and algorithm to ZHOU's method in the technical fields of autonomous driving, electronic maps, deep learning, and image processing.
Doing so would enable artificial virtual objects that do not exist in reality, or computer-generated copies of actual objects or structures of the user's natural surroundings, to be added to an AR environment.
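For illustration only, the following sketch shows one way the combined per-class descriptor-matching scheme mapped above could operate, assuming L2 distance between descriptors (ZHOU [0091]) and smallest-distance assignment (Grabner [0105]); all identifiers are hypothetical and do not appear in the cited references:

```python
# Illustrative sketch only; not code from ZHOU, Laradji, or Grabner.
# Assumptions: one trained model instance per object class, a store of
# per-class reference descriptors (ZHOU [0071]), L2 distance between
# descriptors (ZHOU [0091]), and smallest-distance assignment (Grabner [0105]).
import numpy as np

def nearest_ref_distance(refs, descriptor_map):
    """Mean L2 distance from each stored reference descriptor to its
    closest descriptor in the generated descriptor image."""
    flat = descriptor_map.reshape(-1, descriptor_map.shape[-1])  # (H*W, D)
    return float(np.mean([np.linalg.norm(flat - r, axis=1).min() for r in refs]))

def assign_descriptor_image(image, instances, reference_store):
    """instances: {class: model}; reference_store: {class: list of D-dim refs}."""
    scores, maps = {}, {}
    for cls, model in instances.items():
        maps[cls] = model(image)           # per-class H x W x D descriptor image
        scores[cls] = nearest_ref_distance(reference_store[cls], maps[cls])
    best = min(scores, key=scores.get)     # class whose distance is smallest
    return best, maps[best]                # descriptor image assigned to object
```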
Regarding Claim 2, ZHOU in view of Laradji and Grabner teaches the method according to claim 1, and further teaches wherein the distance between the reference descriptors and the descriptor image is evaluated by assigning each of the reference descriptors to a descriptor of the descriptor image and averaging the distances between the reference descriptors and their assigned descriptors (Grabner [0062] a single CNN to map RGB images and RGB renderings to an embedding space that is optimized using a Euclidean distance-based lifted structure loss. At test time, the distances between an embedding of an RGB image and embeddings of multi-view RGB renderings can be averaged to compensate for the unknown object pose; [0104] For each unseen 3D model, the location field rendering engine 610 can render a certain number of location fields (e.g., 100 location fields) under different 3D poses, the LFD-CNN 612 can compute the center descriptor embeddings and can average the center descriptors to obtain a new center descriptor).
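Expressed formally as a sketch (the symbols are illustrative, not the references' notation), the claim 2 evaluation for a class c with N_c stored reference descriptors r_i and generated descriptors f_j assigns each reference descriptor to its nearest descriptor in the descriptor image and averages the resulting distances (cf. Grabner [0062], [0104]):

$$d(c) \;=\; \frac{1}{N_c}\sum_{i=1}^{N_c}\ \min_{j}\ \bigl\lVert r_i^{(c)} - f_j^{(c)} \bigr\rVert_2$$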
Regarding Claim 3, ZHOU in view of Laradji and Grabner teaches the method according to claim 1, and further teaches wherein at least two machine learning model instances of the plurality of machine learning model instances share a sub-model that is used as part of training the at least two machine learning model instances (Laradji [0248] performing instance segmentation by generating instance segmentation training data from image-level training data; ZHOU [0088] for a set of image descriptors among the plurality of sets of image descriptors 165, the computing device 120 may determine a similarity between the set of image descriptors and the set of reference descriptors 147, thereby determining a similarity of the plurality of similarities 170. For example, referring to FIG. 1, for the first set of image descriptors 165-1 among the plurality of sets of image descriptors 165, the computing device 120 may determine a first similarity 170-1 between the first set of image descriptors 165-1 and the set of reference descriptors 147).
Regarding Claim 4, ZHOU in view of Laradji and Grabner teaches the method according to claim 3, and further teaches training the sub-model using training data containing objects from all of the object classes (ZHOU [0088] For other sets of image descriptors among the plurality of sets of image descriptors 165, the computing device 120 may determine likewise the similarities between them and the set of reference descriptors 147 to finally obtain the plurality of similarities 170).
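A minimal sketch of the shared sub-model arrangement of claims 3 and 4, assuming a convolutional backbone shared by two per-class descriptor heads; the architecture, descriptor dimension, and class names are illustrative assumptions:

```python
# Hedged sketch of claims 3-4: a sub-model (backbone) shared by at least two
# model instances, trained on images of all object classes; each instance adds
# a per-class head that outputs the descriptor image.
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class DescriptorInstance(nn.Module):
    """Per-class model instance: shared backbone plus a 1x1 projection head."""
    def __init__(self, backbone, descriptor_dim=16):
        super().__init__()
        self.backbone = backbone                 # shared sub-model (claim 3)
        self.head = nn.Conv2d(64, descriptor_dim, kernel_size=1)

    def forward(self, x):
        return self.head(self.backbone(x))       # B x D x H x W descriptor image

backbone = SharedBackbone()                       # train on all classes (claim 4)
instances = {cls: DescriptorInstance(backbone) for cls in ("class_a", "class_b")}
descriptor_image = instances["class_a"](torch.randn(1, 3, 64, 64))
```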
Regarding Claims 8 and 9, ZHOU in view of Laradji and Grabner teaches a control unit configured to ascertain a descriptor image for an image of an object, and a non-transitory computer-readable medium on which are stored instructions for ascertaining a descriptor image for an image of an object, the instructions being for controlling a robot to pick up or process an object (ZHOU Abst: a method, an apparatus, an electronic device and a storage medium for vehicle localization, which relates to the technical fields of autonomous driving, electronic map, deep learning, image processing, and the like… a computing device obtains an image descriptor map corresponding to a captured image; Fig. 8). The metes and bounds of these claims substantially correspond to the limitations set forth in claim 1; thus, they are rejected on similar grounds and rationale as their corresponding limitations.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over ZHOU et al. (US 20220164595 A1), referred to herein as ZHOU, in view of Laradji et al. (US 20210241034 A1), referred to herein as Laradji, Grabner et al. (US 20200388071 A1), referred to herein as Grabner, and WILLIAMS et al. (US 20220405363 A1), referred to herein as WILLIAMS.
Regarding Claim 5, ZHOU in view of Laradji and Grabner teaches the method according to claim 1. However, in view of WILLIAMS, the prior art further teaches wherein, for each object class, the respective machine learning model instance is trained using a training data set that contains images of objects of the object class, wherein the objects of the object class are overrepresented in the training data set (WILLIAMS [0014] In the example of speaker authentication, using an overly large data set can lead to a neural network overfitting to the training data, especially if certain categories of speakers are overrepresented in a training data set).
WILLIAMS discloses a method of generating a biometric signature of a user for use in authentication using a neural network. WILLIAMS is analogous art to the claimed invention.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified ZHOU to incorporate the teachings of WILLIAMS by applying training data in which certain categories are overrepresented to ZHOU's method in the technical fields of autonomous driving, electronic maps, deep learning, and image processing.
Doing so would allow the composition of the training data set to be controlled and thereby improve the performance of the trained neural networks.
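By way of illustration, a sketch of constructing a per-class training data set in which the objects of one class are overrepresented, as recited in claim 5; the duplication factor and all names are assumptions:

```python
# Sketch of claim 5's training-set construction: the objects of the instance's
# own class are overrepresented relative to the other classes.
import random

def build_training_set(samples, target_class, oversample_factor=3):
    """samples: list of (image, class_label) pairs."""
    target = [s for s in samples if s[1] == target_class] * oversample_factor
    others = [s for s in samples if s[1] != target_class]
    training_set = target + others          # target class is overrepresented
    random.shuffle(training_set)
    return training_set
```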
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over ZHOU et al. (US 20220164595 A1), referred to herein as ZHOU, in view of Laradji et al. (US 20210241034 A1), referred to herein as Laradji, Grabner et al. (US 20200388071 A1), referred to herein as Grabner, and Shrivastava et al. (US 20230186587 A1), referred to herein as Shrivastava.
Regarding Claim 7, ZHOU in view of Laradji, Grabner and Shrivastava teaches a method for controlling a robot to pick up or process an object, comprising the following steps (ZHOU Abst: a computing device obtains an image descriptor map corresponding to a captured image; Shrivastava [0012] a deep neural network (DNN) can be trained and then used to determine objects in image data acquired by sensors in systems including vehicle guidance, robot operation, security, manufacturing, and product tracking). The metes and bounds of this claim substantially correspond to the limitations set forth in claim 1; thus, it is rejected on similar grounds and rationale as its corresponding limitations.
ZHOU in view of Laradji, Grabner and Shrivastava further teaches
ascertaining a position of a location or a pose for picking up or processing the object in a current control scenario from the ascertained descriptor image (ZHOU [0059] the image descriptor map 160 may include descriptors of respective image points in the captured image 130. For example, in the image descriptor map 160, a position corresponding to an image point in the captured image 130 records a descriptor of the image point); and
controlling the robot to pick up or process the object according to the ascertained position or location or according to the ascertained pose (Shrivastava [0012] Robot guidance can include guiding a robot end effector, for example a gripper, to pick up a part and orient the part for assembly in an environment that includes a plurality of parts).
Shrivastava discloses a deep neural network that can be trained and then used to determine objects in image data acquired by sensors in systems including vehicle guidance, robot operation, security, manufacturing, and product tracking (Shrivastava [0012]). Shrivastava is analogous art to the claimed invention.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified ZHOU to incorporate the teachings of Shrivastava by applying the deep neural network trained based on object class prediction loss functions to ZHOU's method in the technical fields of autonomous driving, electronic maps, deep learning, and image processing.
Doing so can improve the accuracy and robustness of the visual localization and would enable robot guidance, such as guiding a gripper end effector to pick up a part and orient the part for assembly (Shrivastava [0012]).
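For illustration, a sketch of how an ascertained descriptor image could yield a pick-up location and drive a robot, as in claim 7; the stored grasp reference descriptor, the pixel-to-pose conversion, and the robot controller API are hypothetical placeholders, not taught by the references:

```python
# Hedged sketch of claim 7: locate a pick-up point from the ascertained
# descriptor image and control the robot accordingly.
import numpy as np

def locate_pickup_pixel(descriptor_map, grasp_ref):
    """(row, col) of the descriptor closest to the stored grasp-point
    reference descriptor, using L2 distance (cf. ZHOU [0091])."""
    dists = np.linalg.norm(descriptor_map - grasp_ref, axis=-1)   # H x W
    return np.unravel_index(np.argmin(dists), dists.shape)

def pick_up(robot, descriptor_map, grasp_ref, pixel_to_pose):
    row, col = locate_pickup_pixel(descriptor_map, grasp_ref)
    pose = pixel_to_pose(row, col)    # e.g., via depth and camera extrinsics
    robot.move_to(pose)               # hypothetical controller calls
    robot.close_gripper()
```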
Response to Arguments
Applicant's arguments filed on 24 November 2025 with respect to the rejections under 35 U.S.C. 103 have been fully considered but are moot in view of the new grounds of rejection.
Examiner notes that independent claims 1 and 7-9 have been amended to include new limitations. Examiner finds these limitations to be unpatentable, as discussed in the Detailed Action above.
On pages 10-12 of Applicant’s Remarks, the Applicant argues that the dependent claims are not taught by the prior art inasmuch as they depend from claims that are not taught by the prior art. Examiner respectfully disagrees with these arguments for the reasons discussed above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Samantha (Yuehan) Wang whose telephone number is (571)270-5011. The examiner can normally be reached Monday-Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon can be reached on (571)272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Samantha (YUEHAN) WANG/
Primary Examiner
Art Unit 2617