Last updated: May 29, 2026
Application No. 18/466,779
VOXEL SEARCH USING MULTI-MODAL EMBEDDINGS

Final Rejection (§103, §112)
Filed: Sep 13, 2023
Examiner: CHANDRASIRI, UPUL PRIYADARSHAN
Art Unit: 3665
Tech Center: 3600 (Transportation & Electronic Commerce)
Assignee: GM Cruise Holdings LLC
OA Round: 2 (Final)
Interview: Optional. The interview lift is -16.7%, below the 15.0% threshold, so a written response is recommended. (Based on 16 resolved cases, 2023–2026.)

Examiner Intelligence: CHANDRASIRI, UPUL PRIYADARSHAN
Career allowance rate: 12% (2 granted / 16 resolved), -39.5% vs TC avg
Interview lift: -16.7%
Average prosecution: 2y 12m
Currently pending: 26
Statute-specific performance:
§103: 91.2% (+51.2% vs TC avg)
§102: 8.9% (-31.1% vs TC avg)
Tech Center averages are estimates; statistics are based on career data from 16 resolved cases.
Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed 11/11/2025 has been entered. Claims 1, 2, 6, 8, 9, 13, 15, 16, and 20 are amended. Claims 1-20 are pending and are rejected as detailed below. This action is made final, as necessitated by amendment.

Amendments to the Drawing
The replacement sheets for FIGS. 1 and 4 are accepted. However, the replacement sheets for FIGS. 3 and 5 are not accepted and are objected to because they do not comply with 37 CFR 1.84: the left margin is incorrect. Therefore, the objection to FIGS. 3 and 5 is maintained.

Response to Arguments
Claim Rejections under 35 U.S.C. §103 
Applicant argues that the claims, as amended herein, are not disclosed by the combined teachings of the cited art because none of the references, individually or in combination, teach or suggest the specific process of generating voxel representations from point cloud data and then producing embeddings from those voxel representations for the purpose of cross-modal comparison with an embedding derived from a natural language text string. The cited art may disclose the generic tasks of generating feature vectors or embeddings from visual or spatial data, and may separately discuss the use of textual labels or identifiers, but they do not disclose generating embeddings from both a spatial data modality and a natural language modality and then performing a direct comparison between these two types of embeddings to identify a matching object. The amended claim requires a data processing sequence that begins with spatial point cloud data, transforms it into voxel representations, generates embeddings from those voxels, and then compares those embeddings to an embedding generated from natural language input. This cross-modal semantic matching process, which bridges spatial sensor data and natural language, is not taught or suggested by the cited art and represents a technical distinction over the prior disclosures. 
In view of the foregoing, Applicants submit that the References fail to teach or suggest each and every element of the claimed invention, either arranged as claimed or arranged so as to perform as the claimed invention performs; are therefore wholly inadequate in their teaching of the claimed invention as a whole; fail to motivate one skilled in the art to do what the patent Applicants have done; fail to teach a modification to a primary reference that does not render the modified reference unsuitable for its intended purpose; fail to teach a modification to the prior art absent the use of hindsight; and disclose a substantially different invention from the claimed invention.
Applicant’s arguments with respect to the rejections of claims 1, 8, and 15, as amended, under 35 U.S.C. §103 have been fully considered but are not persuasive. Schwindt teaches point cloud data having a spatial data transformation (paragraph [0043]) and a voxel representation having spatial data (paragraph [0030]). Furthermore, Schwindt teaches a text string having a natural language expression (via text) (paragraph [0038]). The amendments to claims 1, 8, and 15 are addressed in the present Office action.

Drawings
The replacement drawings were received on 11/11/2025. The sheets containing FIG. 3 and FIG. 5 are not acceptable and are objected to because they do not comply with 37 CFR 1.84: the left side margin is incorrect.
37 CFR 1.84 Standards for drawings.
	(g) Margins. The sheets must not contain frames around the sight (i.e., the usable surface), but should have scan target points (i.e., cross-hairs) printed on two catercorner margin corners. Each sheet must include a top margin of at least 2.5 cm. (1 inch), a left side margin of at least 2.5 cm. (1 inch), a right side margin of at least 1.5 cm. (5/8 inch), and a bottom margin of at least 1.0 cm. (3/8 inch), thereby leaving a sight no greater than 17.0 cm. by 26.2 cm. on 21.0 cm. by 29.7 cm. (DIN size A4) drawing sheets, and a sight no greater than 17.6 cm. by 24.4 cm. (6 15/16 by 9 5/8 inches) on 21.6 cm. by 27.9 cm. (8 1/2 by 11 inch) drawing sheets.
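For illustration only, the margin arithmetic in 1.84(g) can be checked with a short Python sketch: subtracting the minimum margins from each sheet size reproduces the maximum sight dimensions quoted above, and a sheet with a too-narrow left margin is flagged. The helper names and the 2.0 cm measurement are hypothetical assumptions, not a USPTO tool.

```python
# Sketch: check a drawing sheet's margins against the minimums quoted
# from 37 CFR 1.84(g). All values are in centimeters.
MIN_MARGINS_CM = {"top": 2.5, "left": 2.5, "right": 1.5, "bottom": 1.0}

# Sheet sizes named in 1.84(g): (width, height) in cm.
SHEETS_CM = {"A4": (21.0, 29.7), "letter": (21.6, 27.9)}


def max_sight_cm(sheet: str) -> tuple[float, float]:
    """Largest usable area (width, height) left after the minimum margins."""
    width, height = SHEETS_CM[sheet]
    sight_w = width - MIN_MARGINS_CM["left"] - MIN_MARGINS_CM["right"]
    sight_h = height - MIN_MARGINS_CM["top"] - MIN_MARGINS_CM["bottom"]
    # Round to one decimal place to absorb floating-point noise.
    return round(sight_w, 1), round(sight_h, 1)


def margin_violations(margins_cm: dict[str, float]) -> list[str]:
    """Return the names of margins that fall below the 1.84(g) minimums."""
    return [side for side, minimum in MIN_MARGINS_CM.items()
            if margins_cm.get(side, 0.0) < minimum]


print(max_sight_cm("A4"))      # (17.0, 26.2)
print(max_sight_cm("letter"))  # (17.6, 24.4)
# Hypothetical measurements of a sheet whose left margin is too narrow.
print(margin_violations({"top": 2.5, "left": 2.0, "right": 1.5, "bottom": 1.0}))  # ['left']
```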
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 2, 6, 8, 9, 13, 15, 16, and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter that was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 1, 2, 6, 8, 9, 13, 15, 16, and 20 recite the term “modality.” However, the specification provides no further description or explanation of the term “modality.”

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-4, 6-11, 13-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Schwindt (US 20240190461 A1) in view of ALON (US 20210383115 A1).

Regarding claim 1, Schwindt teaches (Currently Amended) An apparatus (Schwindt, at least one para. 0001; “a system and method for an autonomous vehicle with a plurality of sensors.”) comprising:
at least one memory (Schwindt, at least one para. 0039; “FIG. 2 illustrates an example of the electronic controller 14, which includes an electronic processor 34 (for example, a microprocessor, application specific integrated circuit, etc.), a memory 36, and an input/output interface 38. The memory 36 may be made up of one or more non-transitory computer-readable media and includes at least a program storage area and a data storage area. ”); and
at least one processor coupled to the at least one memory, the at least one processor configured to (Schwindt, at least one para. 0039; “FIG. 2 illustrates an example of the electronic controller 14, which includes an electronic processor 34 (for example, a microprocessor, application specific integrated circuit, etc.), a memory 36, and an input/output interface 38. The memory 36 may be made up of one or more non-transitory computer-readable media and includes at least a program storage area and a data storage area. ”):
receive sensor data, wherein the sensor data represents a real-world environment encountered by an autonomous vehicle (AV) (Schwindt, at least one para. 0043; “During operation of the autonomous vehicle 10, the sensor 22 measures features of the environment surrounding the autonomous vehicle 10, and outputs sensor information. At block 62, the electronic processor 34 receives the sensor information from the sensor 22.”) and wherein the sensor data comprises point cloud data representing (Schwindt, at least one para. 0049; “FIG. 4 schematically illustrates an example configuration of the sensors 22 for which the method 50 may be implemented. The plurality of sensors 22 may include one or more lidar sensors 104 configured to output lidar data”) a plurality of objects (Schwindt, at least one para. 0048; “The electronic processor 34 may use known techniques to classify detected objects”), the point cloud data having a spatial data modality (Schwindt, at least one para. 0043; “Processing the sensor information with the sensor plugin 44 may further include spatially transforming the sensor information based on the determined mounting position of the sensor 22.”, for compact prosecution, examiner interprets the term “modality” as a particular form or mode);
generate, based at least in part on the point cloud data having the spatial data modality, a voxel representation for the plurality of objects (Schwindt, at least one para. 0030; “Autonomous vehicles and driver assistance functions may rely on grid maps storing probabilistic information relating to features of an environment in order to control vehicle movement in the environment. Grid maps include a set of cells associated with respective positions in the environment surrounding a vehicle. The cells, which may be 2D cells or 3D voxels, include occupancy information relating to static or dynamic features of the environment.”), the voxel representation having the spatial data modality (Schwindt, at least one para. 0030; “The occupancy information may include sensor measurements of the features, occupancy probabilities of corresponding positions in the environment, and free-space probabilities of corresponding positions in the environment.”, for compact prosecution, examiner interprets the term “modality” as a particular form or mode);
generate, based on the voxel representation for the plurality of objects and the spatial data modality, a corresponding set of voxel-derived and spatial-data-modality-derived first embeddings; 
receive a text string having a natural language modality (Schwindt, at least one para. 0038; “In some instances, the electronic controller 14 controls aspects of the autonomous vehicle 10 based on commands received from the user interface 26. The user interface 26 provides an interface between the components of the autonomous vehicle 10 and an occupant (for example, a driver) of the autonomous vehicle 10. The user interface 26 is configured to receive input from the occupant, receive indications of vehicle status from the system's controllers (for example, the electronic controller 14), and provide information to the driver based on the received indications. The user interface 26 provides visual output, such as, for example, graphical indicators (for example, fixed or animated icons), lights, colors, text, images, combinations of the foregoing, and the like.”; a text message is inherently an expression of language, and for compact prosecution, examiner interprets the term “modality” as a particular form or mode);
generate a natural-language-modality-derived second embedding corresponding to the text string and the natural language modality; and
identify a matching object among the plurality of objects based on a comparison of the set of voxel-derived and spatial-data-modality-derived first embeddings and the natural-language-modality-derived second embedding. 
Schwindt does not explicitly teach: generate, based on the voxel representation for the plurality of objects and the spatial data modality, a corresponding set of voxel-derived and spatial-data-modality-derived first embeddings;
generate a natural-language-modality-derived second embedding corresponding to the text string and the natural language modality; and
identify a matching object among the plurality of objects based on a comparison of the set of voxel-derived and spatial-data-modality-derived first embeddings and the natural-language-modality-derived second embedding. 
However, ALON, in the same field of endeavor (ALON, at least one para. 0018; “In many cases a robot may interact with an environment to perform a variety of operations. For example, a robot cleaner may move around a room for cleaning purposes. As another example, a robot lawn mower may travel around a lawn or outdoor area for the purpose of mowing grass. In yet another example, an autonomous vehicle may be used to perform various operations at a work site or an industrial location”) teaches generate, based on the voxel representation for the plurality of objects and the spatial data modality, a corresponding set of voxel-derived and spatial-data-modality-derived first embeddings (ALON, at least one para. 0164; “Segmenting may include mapping large numbers of points or polygons (e.g., hundreds of thousands of polygons) to a plurality of discrete components in the still image. Segmenting may include implementing a segmenting algorithm, including implementing object recognition algorithms and/or machine-learning models, to map basic image elements (e.g. voxels) to one or more discrete components.”);
generate a natural-language-modality-derived second embedding corresponding to the text string and the natural language modality (ALON, at least one para. 0199; “In some embodiments, an object image identifier may include text generated by an algorithm. For example, an advertiser may provide text describing an object, and a text-parsing system may extract relevant keywords from the text for use in an object image identifier. In some embodiments, an object image identifier may include information based on results of a classification model. For example, an advertiser may provide one or more images of a product, and a 3D- or 2D-matching algorithm may identify similar objects in a game scene. As an example of a 2D-matching algorithm, a neural network may segment an image into one or more objects and classify the object.”); and
identify a matching object among the plurality of objects based on a comparison of the set of voxel-derived and spatial-data-modality-derived first embeddings and the natural-language-modality-derived second embedding (ALON, at least one para. 0257; “At step 412, visual input reconstruction system 120 may determine a matching object, consistent with disclosed embodiments. Determining a matching object may be based on output of an algorithm or model (e.g., a machine learning model).”).
Schwindt and ALON are both considered to be analogous to the claimed invention because both are in the same field as the claimed invention, namely collecting sensor data to navigate an autonomous vehicle. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the collected sensor data of Schwindt to identify a matching object based on the teaching of ALON. One of ordinary skill in the art would have been motivated to make this modification so that a product can be accurately identified according to the input data (ALON, para. 059).
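For illustration only, the following minimal Python sketch walks through the data flow recited in claim 1 up to the first embeddings: point cloud points are bucketed into cubic voxels, and each object's voxels are reduced to a fixed-length vector. It is not the applicant's, Schwindt's, or ALON's implementation. The helper names (`voxelize`, `voxel_embedding`, `embed_objects`), the 0.5 voxel size, the per-point object labels (assumed to come from an earlier segmentation step), and the hand-crafted features (occupied-voxel count plus bounding-box extent) are all hypothetical; a practical system would more likely use a learned encoder.

```python
import math
from collections import defaultdict

Point = tuple[float, float, float]
Voxel = tuple[int, int, int]


def voxelize(points: list[Point], voxel_size: float) -> set[Voxel]:
    """Map each point to the integer (i, j, k) index of the cubic voxel containing it."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}


def voxel_embedding(voxels: set[Voxel]) -> list[float]:
    """Hypothetical fixed-length descriptor of one object's voxels:
    [occupied-voxel count, x extent, y extent, z extent] (extents in voxels)."""
    xs, ys, zs = zip(*voxels)
    return [
        float(len(voxels)),
        float(max(xs) - min(xs) + 1),
        float(max(ys) - min(ys) + 1),
        float(max(zs) - min(zs) + 1),
    ]


def embed_objects(labeled_points: list[tuple[str, Point]],
                  voxel_size: float) -> dict[str, list[float]]:
    """Group labeled points by object, voxelize each object, and embed it."""
    by_object: dict[str, list[Point]] = defaultdict(list)
    for obj_id, point in labeled_points:
        by_object[obj_id].append(point)
    return {obj_id: voxel_embedding(voxelize(pts, voxel_size))
            for obj_id, pts in by_object.items()}


# Hypothetical LiDAR returns, already labeled by a prior segmentation step.
points = [
    ("cone", (0.1, 0.1, 0.1)), ("cone", (0.1, 0.1, 0.6)), ("cone", (0.1, 0.1, 1.1)),
    ("car", (0.0, 0.0, 0.0)), ("car", (1.9, 0.2, 0.3)), ("car", (3.9, 0.4, 0.2)),
]
print(embed_objects(points, voxel_size=0.5))
# {'cone': [3.0, 1.0, 1.0, 3.0], 'car': [3.0, 8.0, 1.0, 1.0]}
```

Using `math.floor` rather than `int` keeps points with negative coordinates in the correct voxel.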

Regarding claim 2, the combination of Schwindt and ALON teaches the limitations of claim 1, upon which the instant claim depends, as discussed supra. Further, ALON teaches (Currently Amended) The apparatus of claim 1, wherein the comparison is based on a set of distances between the natural-language-modality-derived second embedding and the set of voxel-derived and spatial-data-modality-derived first embeddings (ALON, at least one para. 0297; “Comparing components may include determining a similarity metric indicating a degree of similarity between the matched-component and stored image data. A similarity metric may be based on shape data, color data, and/or any other data. A similarity metric may be based on a statistical similarity such as a covariance, a least-squares distance.”).

Regarding claim 3, the combination of Schwindt and ALON teaches the limitations of claim 2, upon which the instant claim depends, as discussed supra. Further, ALON teaches (Original) The apparatus of claim 2, wherein the matching object is a closest Euclidean distance in the set of distances (ALON, at least one para. 0297; “Comparing components may include determining a similarity metric indicating a degree of similarity between the matched-component and stored image data. A similarity metric may be based on shape data, color data, and/or any other data. A similarity metric may be based on a statistical similarity such as a covariance, a least-squares distance.”).
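For illustration only, the following minimal Python sketch shows one way the comparison of claims 2 and 3 could be carried out: compute the Euclidean distance from the text embedding to each object embedding, then select the object at the smallest distance. It assumes both embeddings already lie in a shared vector space of the same dimension; no text encoder is implemented, and the function name, object names, and vector values are hypothetical.

```python
import math


def closest_object(text_embedding: list[float],
                   object_embeddings: dict[str, list[float]]) -> tuple[str, float]:
    """Return (object_id, distance) for the object whose embedding has the
    smallest Euclidean distance to the text embedding."""
    # One distance per object: the "set of distances" of claim 2.
    distances = {obj_id: math.dist(text_embedding, emb)
                 for obj_id, emb in object_embeddings.items()}
    # The match is the object at the closest Euclidean distance (claim 3).
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical vectors; a real system would obtain text_embedding from a
# text encoder that maps into the same space as the object embeddings.
objects = {"cone": [3.0, 1.0, 1.0, 3.0], "car": [3.0, 8.0, 1.0, 1.0]}
text_embedding = [3.0, 7.0, 1.0, 1.0]
print(closest_object(text_embedding, objects))  # ('car', 1.0)
```

`math.dist` raises `ValueError` when the two vectors differ in length, which surfaces embeddings produced by mismatched encoders instead of silently comparing them.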

Regarding claim 4, the combination of Schwindt and ALON teaches the limitations of claim 1, upon which the instant claim depends, as discussed supra. Further, Schwindt teaches (Original) The apparatus of claim 1, wherein the at least one processor is further configured to: navigate the AV to a location corresponding to the matching object (Schwindt, at least one para. 0036; “The vehicle control systems 18 may include controllers, actuators, and the like for controlling operation of the autonomous vehicle 10 (for example, acceleration, braking, shifting gears, and the like). The vehicle control systems 18 communicate with the electronic controller 14 via the bus 30.”).

Regarding claim 6, the combination of Schwindt and ALON teaches the limitations of claim 1, upon which the instant claim depends, as discussed supra. Further, ALON teaches (Currently Amended) The apparatus of claim 1, wherein each embedding in the set of voxel-derived and spatial-data-modality-derived first embeddings comprises a vector representing characteristics of the corresponding object (ALON, at least one para. 0165; “Comparing objects and/or images may include comparing feature vectors associated with the objects and/or images.”; in other words, the comparison step can be executed because the first embedding comprises a vector representation).

Regarding claim 7, the combination of Schwindt and ALON teaches the limitations of claim 1, upon which the instant claim depends, as discussed supra. Further, Schwindt teaches (Original) The apparatus of claim 1, wherein the point cloud data is generated by a Light Detection and Ranging (LiDAR) sensor (Schwindt, at least one para. 0049; “FIG. 4 schematically illustrates an example configuration of the sensors 22 for which the method 50 may be implemented. The plurality of sensors 22 may include one or more lidar sensors 104 configured to output lidar data”).

Regarding claim 8, Schwindt teaches (Currently Amended) A computer-implemented method (Schwindt, at least one para. 0001; “a system and method for an autonomous vehicle with a plurality of sensors.”) comprising:
receiving sensor data, wherein the sensor data represents a real-world environment encountered by an autonomous vehicle (AV) (Schwindt, at least one para. 0043; “During operation of the autonomous vehicle 10, the sensor 22 measures features of the environment surrounding the autonomous vehicle 10, and outputs sensor information. At block 62, the electronic processor 34 receives the sensor information from the sensor 22.”) and wherein the sensor data comprises point cloud data representing (Schwindt, at least one para. 0049; “FIG. 4 schematically illustrates an example configuration of the sensors 22 for which the method 50 may be implemented. The plurality of sensors 22 may include one or more lidar sensors 104 configured to output lidar data”) a plurality of objects (Schwindt, at least one para. 0048; “The electronic processor 34 may use known techniques to classify detected objects”), the point cloud data having a spatial data modality (Schwindt, at least one para. 0043; “Processing the sensor information with the sensor plugin 44 may further include spatially transforming the sensor information based on the determined mounting position of the sensor 22.”, for compact prosecution, examiner interprets the term “modality” as a particular form or mode);
generating, based at least in part on the point cloud data having the spatial data modality, a voxel representation for the plurality of objects (Schwindt, at least one para. 0030; “Autonomous vehicles and driver assistance functions may rely on grid maps storing probabilistic information relating to features of an environment in order to control vehicle movement in the environment. Grid maps include a set of cells associated with respective positions in the environment surrounding a vehicle. The cells, which may be 2D cells or 3D voxels, include occupancy information relating to static or dynamic features of the environment.”), the voxel representation having the spatial data modality (Schwindt, at least one para. 0030; “The occupancy information may include sensor measurements of the features, occupancy probabilities of corresponding positions in the environment, and free-space probabilities of corresponding positions in the environment.”, for compact prosecution, examiner interprets the term “modality” as a particular form or mode);
generating, based on the voxel representation for the plurality of objects and the spatial data modality, a corresponding set of voxel-derived and spatial-data-modality-derived first embeddings;
receiving a text string having a natural language modality (Schwindt, at least one para. 0038; “In some instances, the electronic controller 14 controls aspects of the autonomous vehicle 10 based on commands received from the user interface 26. The user interface 26 provides an interface between the components of the autonomous vehicle 10 and an occupant (for example, a driver) of the autonomous vehicle 10. The user interface 26 is configured to receive input from the occupant, receive indications of vehicle status from the system's controllers (for example, the electronic controller 14), and provide information to the driver based on the received indications. The user interface 26 provides visual output, such as, for example, graphical indicators (for example, fixed or animated icons), lights, colors, text, images, combinations of the foregoing, and the like.”; a text message is inherently an expression of language, and for compact prosecution, examiner interprets the term “modality” as a particular form or mode);
generating a natural-language-modality-derived second embedding corresponding to the text string and the natural language modality; and
identifying a matching object among the plurality of objects based on a comparison of the set of voxel-derived and spatial-data-modality-derived first embeddings and the natural-language-modality-derived second embedding. 
Schwindt does not explicitly teach: generating, based on the voxel representation for the plurality of objects and the spatial data modality, a corresponding set of voxel-derived and spatial-data-modality-derived first embeddings;
generating a natural-language-modality-derived second embedding corresponding to the text string and the natural language modality; and
identifying a matching object among the plurality of objects based on a comparison of the set of voxel-derived and spatial-data-modality-derived first embeddings and the natural-language-modality-derived second embedding. 
However, ALON, in the same field of endeavor (ALON, at least one para. 0018; “In many cases a robot may interact with an environment to perform a variety of operations. For example, a robot cleaner may move around a room for cleaning purposes. As another example, a robot lawn mower may travel around a lawn or outdoor area for the purpose of mowing grass. In yet another example, an autonomous vehicle may be used to perform various operations at a work site or an industrial location”) teaches generating, based on the voxel representation for the plurality of objects and the spatial data modality, a corresponding set of voxel-derived and spatial-data-modality-derived first embeddings (ALON, at least one para. 0164; “Segmenting may include mapping large numbers of points or polygons (e.g., hundreds of thousands of polygons) to a plurality of discrete components in the still image. Segmenting may include implementing a segmenting algorithm, including implementing object recognition algorithms and/or machine-learning models, to map basic image elements (e.g. voxels) to one or more discrete components.”);
generating a natural-language-modality-derived second embedding corresponding to the text string and the natural language modality (ALON, at least one para. 0199; “In some embodiments, an object image identifier may include text generated by an algorithm. For example, an advertiser may provide text describing an object, and a text-parsing system may extract relevant keywords from the text for use in an object image identifier. In some embodiments, an object image identifier may include information based on results of a classification model. For example, an advertiser may provide one or more images of a product, and a 3D- or 2D-matching algorithm may identify similar objects in a game scene. As an example of a 2D-matching algorithm, a neural network may segment an image into one or more objects and classify the object.”); and
identifying a matching object among the plurality of objects based on a comparison of the set of voxel-derived and spatial-data-modality-derived first embeddings and the natural-language-modality-derived second embedding (ALON, at least one para. 0257; “At step 412, visual input reconstruction system 120 may determine a matching object, consistent with disclosed embodiments. Determining a matching object may be based on output of an algorithm or model (e.g., a machine learning model).”).
Schwindt and ALON are both considered to be analogous to the claimed invention because both are in the same field as the claimed invention, namely collecting sensor data to navigate an autonomous vehicle. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the collected sensor data of Schwindt to identify a matching object based on the teaching of ALON. One of ordinary skill in the art would have been motivated to make this modification so that a product can be accurately identified according to the input data (ALON, para. 059).

Regarding claim 9, the combination of Schwindt and ALON teaches the limitations of claim 8, upon which the instant claim depends, as discussed supra. Further, ALON teaches (Currently Amended) The computer-implemented method of claim 8, wherein the comparison is based on a set of distances between the natural-language-modality-derived second embedding and the set of voxel-derived and spatial-data-modality-derived first embeddings (ALON, at least one para. 0297; “Comparing components may include determining a similarity metric indicating a degree of similarity between the matched-component and stored image data. A similarity metric may be based on shape data, color data, and/or any other data. A similarity metric may be based on a statistical similarity such as a covariance, a least-squares distance.”).

Regarding claim 10, the combination of Schwindt and ALON teaches the limitations of claim 9, upon which the instant claim depends, as discussed supra. Further, ALON teaches (Original) The computer-implemented method of claim 9, wherein the matching object is a closest Euclidean distance in the set of distances (ALON, at least one para. 0297; “Comparing components may include determining a similarity metric indicating a degree of similarity between the matched-component and stored image data. A similarity metric may be based on shape data, color data, and/or any other data. A similarity metric may be based on a statistical similarity such as a covariance, a least-squares distance.”).

Regarding claim 11, the combination of Schwindt and ALON teaches the limitations of claim 8, upon which the instant claim depends, as discussed supra. Further, Schwindt teaches (Original) The computer-implemented method of claim 8, further comprising: navigating the AV to a location corresponding to the matching object (Schwindt, at least one para. 0036; “The vehicle control systems 18 may include controllers, actuators, and the like for controlling operation of the autonomous vehicle 10 (for example, acceleration, braking, shifting gears, and the like). The vehicle control systems 18 communicate with the electronic controller 14 via the bus 30.”).

Regarding claim 13, the combination of Schwindt and ALON teaches the limitations of claim 8, upon which the instant claim depends, as discussed supra. Further, ALON teaches (Currently Amended) The computer-implemented method of claim 8, wherein each embedding in the set of voxel-derived and spatial-data-modality-derived first embeddings comprises a vector representing characteristics of the corresponding object (ALON, at least para. 0165; “Comparing objects and/or images may include comparing feature vectors associated with the objects and/or images.”, in other words, the comparison step is executable because the first embedding comprises a vector representation).

Regarding claim 14, the combination of Schwindt and ALON teaches the limitations of claim 8, upon which the instant claim depends, as discussed supra. Further, Schwindt teaches (Original) The computer-implemented method of claim 8, wherein the point cloud data is generated by a Light Detection and Ranging (LiDAR) sensor (Schwindt, at least para. 0049; “FIG. 4 schematically illustrates an example configuration of the sensors 22 for which the method 50 may be implemented. The plurality of sensors 22 may include one or more lidar sensors 104 configured to output lidar data”).

Regarding claim 15, Schwindt teaches (Currently Amended) A non-transitory computer-readable storage medium comprising at least one instruction for causing a computer or processor to (Schwindt, at least para. 0039; “FIG. 2 illustrates an example of the electronic controller 14, which includes an electronic processor 34 (for example, a microprocessor, application specific integrated circuit, etc.), a memory 36, and an input/output interface 38. The memory 36 may be made up of one or more non-transitory computer-readable media and includes at least a program storage area and a data storage area.”):
receive sensor data, wherein the sensor data represents a real-world environment encountered by an autonomous vehicle (AV) (Schwindt, at least para. 0043; “During operation of the autonomous vehicle 10, the sensor 22 measures features of the environment surrounding the autonomous vehicle 10, and outputs sensor information. At block 62, the electronic processor 34 receives the sensor information from the sensor 22.”) and wherein the sensor data comprises point cloud data representing (Schwindt, at least para. 0049; “FIG. 4 schematically illustrates an example configuration of the sensors 22 for which the method 50 may be implemented. The plurality of sensors 22 may include one or more lidar sensors 104 configured to output lidar data”) a plurality of objects (Schwindt, at least para. 0048; “The electronic processor 34 may use known techniques to classify detected objects”), the point cloud data having a spatial data modality (Schwindt, at least para. 0043; “Processing the sensor information with the sensor plugin 44 may further include spatially transforming the sensor information based on the determined mounting position of the sensor 22.”, for compact prosecution, the examiner interprets the term “modality” as a particular form or mode);
generate, based at least in part on the point cloud data having the spatial data modality, a voxel representation for the plurality of objects (Schwindt, at least para. 0030; “Autonomous vehicles and driver assistance functions may rely on grid maps storing probabilistic information relating to features of an environment in order to control vehicle movement in the environment. Grid maps include a set of cells associated with respective positions in the environment surrounding a vehicle. The cells, which may be 2D cells or 3D voxels, include occupancy information relating to static or dynamic features of the environment.”), the voxel representation having the spatial data modality (Schwindt, at least para. 0030; “The occupancy information may include sensor measurements of the features, occupancy probabilities of corresponding positions in the environment, and free-space probabilities of corresponding positions in the environment.”, for compact prosecution, the examiner interprets the term “modality” as a particular form or mode);
generate, based on the voxel representation for the plurality of objects and the spatial data modality, a corresponding set of voxel-derived and spatial-data-modality-derived first embeddings;
receive a text string having a natural language modality (Schwindt, at least para. 0038; “In some instances, the electronic controller 14 controls aspects of the autonomous vehicle 10 based on commands received from the user interface 26. The user interface 26 provides an interface between the components of the autonomous vehicle 10 and an occupant (for example, a driver) of the autonomous vehicle 10. The user interface 26 is configured to receive input from the occupant, receive indications of vehicle status from the system's controllers (for example, the electronic controller 14), and provide information to the driver based on the received indications. The user interface 26 provides visual output, such as, for example, graphical indicators (for example, fixed or animated icons), lights, colors, text, images, combinations of the foregoing, and the like.”, it is inherent that a text message is an expression of language, and for compact prosecution, the examiner interprets the term “modality” as a particular form or mode);
generate a natural-language-modality-derived second embedding corresponding to the text string and the natural language modality; and
identify a matching object among the plurality of objects based on a comparison of the set of voxel-derived and spatial-data-modality-derived first embeddings and the natural-language-modality-derived second embedding.
Schwindt does not explicitly teach generate, based on the voxel representation for the plurality of objects and the spatial data modality, a corresponding set of voxel-derived and spatial-data-modality-derived first embeddings;
generate a natural-language-modality-derived second embedding corresponding to the text string and the natural language modality; and
identify a matching object among the plurality of objects based on a comparison of the set of voxel-derived and spatial-data-modality-derived first embeddings and the natural-language-modality-derived second embedding.
	However, ALON, in the same field of endeavor (ALON, at least para. 0018; “In many cases a robot may interact with an environment to perform a variety of operations. For example, a robot cleaner may move around a room for cleaning purposes. As another example, a robot lawn mower may travel around a lawn or outdoor area for the purpose of mowing grass. In yet another example, an autonomous vehicle may be used to perform various operations at a work site or an industrial location”), teaches generate, based on the voxel representation for the plurality of objects and the spatial data modality, a corresponding set of voxel-derived and spatial-data-modality-derived first embeddings (ALON, at least para. 0164; “Segmenting may include mapping large numbers of points or polygons (e.g., hundreds of thousands of polygons) to a plurality of discrete components in the still image. Segmenting may include implementing a segmenting algorithm, including implementing object recognition algorithms and/or machine-learning models, to map basic image elements (e.g. voxels) to one or more discrete components.”);
generate a natural-language-modality-derived second embedding corresponding to the text string and the natural language modality (ALON, at least para. 0199; “In some embodiments, an object image identifier may include text generated by an algorithm. For example, an advertiser may provide text describing an object, and a text-parsing system may extract relevant keywords from the text for use in an object image identifier. In some embodiments, an object image identifier may include information based on results of a classification model. For example, an advertiser may provide one or more images of a product, and a 3D- or 2D-matching algorithm may identify similar objects in a game scene. As an example of a 2D-matching algorithm, a neural network may segment an image into one or more objects and classify the object.”); and
identify a matching object among the plurality of objects based on a comparison of the set of voxel-derived and spatial-data-modality-derived first embeddings and the natural-language-modality-derived second embedding (ALON, at least para. 0257; “At step 412, visual input reconstruction system 120 may determine a matching object, consistent with disclosed embodiments. Determining a matching object may be based on output of an algorithm or model (e.g., a machine learning model).”).
Schwindt and ALON are both considered to be analogous to the claimed invention because both are in the same field as the claimed invention, namely collecting sensor data to navigate an autonomous vehicle. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have used the collected sensor data of Schwindt to identify a matching object based on the teaching of ALON. One of ordinary skill in the art would have been motivated to make this modification so that a product can be accurately identified according to the inputted data (ALON; 059).
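To illustrate what "generating a voxel representation" from point cloud data can mean in its simplest form, the following minimal Python sketch quantizes (x, y, z) points into a sparse grid keyed by integer voxel indices and counts the points in each voxel. It is an assumption-laden illustration, not the method of the application, Schwindt, or ALON: Schwindt's grid maps store probabilistic occupancy information, whereas this sketch only counts points. The function name `voxelize` and the 0.5 m voxel size in the usage example are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Iterable


def voxelize(
    points: Iterable[tuple[float, float, float]],
    voxel_size: float,
) -> Counter:
    """Map each 3D point to the integer index of the voxel containing it.

    Returns a Counter from (i, j, k) voxel index to the number of points
    that fell inside that voxel. Empty voxels are omitted, so the result
    is a sparse occupancy representation.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    grid: Counter = Counter()
    for x, y, z in points:
        # math.floor (rather than int) keeps negative coordinates in the
        # correct voxel: -0.1 m with 0.5 m voxels maps to index -1, not 0.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        grid[key] += 1
    return grid


if __name__ == "__main__":
    cloud = [(0.1, 0.2, 0.0), (0.3, 0.4, 0.1), (1.2, 0.1, 0.0), (-0.1, 0.0, 0.0)]
    grid = voxelize(cloud, voxel_size=0.5)
    print(sorted(grid.items()))
    # prints: [((-1, 0, 0), 1), ((0, 0, 0), 2), ((2, 0, 0), 1)]
```

A sparse mapping is used because most voxels around a vehicle are empty; a learned encoder producing per-object embeddings from such a grid is beyond this sketch.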

Regarding claim 16, the combination of Schwindt and ALON teaches the limitations of claim 15, upon which the instant claim depends, as discussed supra. Further, ALON teaches (Currently Amended) The non-transitory computer-readable storage medium of claim 15, wherein the comparison is based on a set of distances between the natural-language-modality-derived second embedding and the set of voxel-derived and spatial-data-modality-derived first embeddings (ALON, at least para. 0297; “Comparing components may include determining a similarity metric indicating a degree of similarity between the matched-component and stored image data. A similarity metric may be based on shape data, color data, and/or any other data. A similarity metric may be based on a statistical similarity such as a covariance, a least-squares distance.”).

Regarding claim 17, the combination of Schwindt and ALON teaches the limitations of claim 16, upon which the instant claim depends, as discussed supra. Further, ALON teaches (Original) The non-transitory computer-readable storage medium of claim 16, wherein the matching object is a closest Euclidean distance in the set of distances (ALON, at least para. 0297; “Comparing components may include determining a similarity metric indicating a degree of similarity between the matched-component and stored image data. A similarity metric may be based on shape data, color data, and/or any other data. A similarity metric may be based on a statistical similarity such as a covariance, a least-squares distance.”).

Regarding claim 18, the combination of Schwindt and ALON teaches the limitations of claim 15, upon which the instant claim depends, as discussed supra. Further, Schwindt teaches (Original) The non-transitory computer-readable storage medium of claim 15, wherein the at least one instruction is further configured to: navigate the AV to a location corresponding to the matching object (Schwindt, at least para. 0036; “The vehicle control systems 18 may include controllers, actuators, and the like for controlling operation of the autonomous vehicle 10 (for example, acceleration, braking, shifting gears, and the like). The vehicle control systems 18 communicate with the electronic controller 14 via the bus 30.”).

Regarding claim 20, the combination of Schwindt and ALON teaches the limitations of claim 15, upon which the instant claim depends, as discussed supra. Further, ALON teaches (Currently Amended) The non-transitory computer-readable storage medium of claim 15, wherein each embedding in the set of voxel-derived and spatial-data-modality-derived first embeddings comprises a vector representing characteristics of the corresponding object (ALON, at least para. 0165; “Comparing objects and/or images may include comparing feature vectors associated with the objects and/or images.”, in other words, the comparison step is executable because the first embedding comprises a vector representation).

Claim(s) 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Schwindt (US 20240190461 A1) and ALON (US 20210383115 A1), and further in view of Liang (US 20200298891 A1).

Regarding claim 5, the combination of Schwindt and ALON teaches the limitations of claim 1, upon which the instant claim depends, as discussed supra. Further, Schwindt teaches (Original) The apparatus of claim 1, wherein the text string is received from a remote assistance (RA) (Schwindt, at least para. 0038; “In some instances, the electronic controller 14 controls aspects of the autonomous vehicle 10 based on commands received from the user interface 26.”).
Schwindt does not explicitly teach wherein the text string is received from a remote assistance (RA).
	However, Liang, in the same field of endeavor (Liang, at least para. 0002; “The present disclosure relates generally to determining the state of objects and predicting their motion through an environment.”), teaches wherein the text string is received from a remote assistance (RA) (Liang, at least para. 0087; “The one or more remote computing devices 106 can communicate (e.g., send and/or receive data and/or signals) with one or more devices including the operations computing system 104 and the vehicle 108 via the communications network 102. For example, the one or more remote computing devices 106 can request the location of the vehicle 108 or the state of one or more objects detected by the one or more sensors 114 of the vehicle 108, via the communications network 102.”).
Schwindt, ALON, and Liang are considered to be analogous to the claimed invention because all of them are in the same field as the claimed invention, namely collecting sensor data to navigate an autonomous vehicle. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Schwindt to include the remote computing device based on the teaching of Liang. One of ordinary skill in the art would have been motivated to make this modification because the substitution of one known element for another would have yielded predictable results to one of ordinary skill in the art.

Regarding claim 12, the combination of Schwindt and ALON teaches the limitations of claim 8, upon which the instant claim depends, as discussed supra. Further, Schwindt teaches (Original) The computer-implemented method of claim 8, wherein the text string is received from a remote assistance (RA) (Schwindt, at least para. 0038; “In some instances, the electronic controller 14 controls aspects of the autonomous vehicle 10 based on commands received from the user interface 26.”).
Schwindt does not explicitly teach wherein the text string is received from a remote assistance (RA).
	However, Liang, in the same field of endeavor (Liang, at least para. 0002; “The present disclosure relates generally to determining the state of objects and predicting their motion through an environment.”), teaches wherein the text string is received from a remote assistance (RA) (Liang, at least para. 0087; “The one or more remote computing devices 106 can communicate (e.g., send and/or receive data and/or signals) with one or more devices including the operations computing system 104 and the vehicle 108 via the communications network 102. For example, the one or more remote computing devices 106 can request the location of the vehicle 108 or the state of one or more objects detected by the one or more sensors 114 of the vehicle 108, via the communications network 102.”).
Schwindt, ALON, and Liang are considered to be analogous to the claimed invention because all of them are in the same field as the claimed invention, namely collecting sensor data to navigate an autonomous vehicle. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Schwindt to include the remote computing device based on the teaching of Liang. One of ordinary skill in the art would have been motivated to make this modification because the substitution of one known element for another would have yielded predictable results to one of ordinary skill in the art.

Regarding claim 19, the combination of Schwindt and ALON teaches the limitations of claim 15, upon which the instant claim depends, as discussed supra. Further, Schwindt teaches (Original) The non-transitory computer-readable storage medium of claim 15, wherein the text string is received from a remote assistance (RA) (Schwindt, at least para. 0038; “In some instances, the electronic controller 14 controls aspects of the autonomous vehicle 10 based on commands received from the user interface 26.”).
Schwindt does not explicitly teach wherein the text string is received from a remote assistance (RA).
	However, Liang, in the same field of endeavor (Liang, at least para. 0002; “The present disclosure relates generally to determining the state of objects and predicting their motion through an environment.”), teaches wherein the text string is received from a remote assistance (RA) (Liang, at least para. 0087; “The one or more remote computing devices 106 can communicate (e.g., send and/or receive data and/or signals) with one or more devices including the operations computing system 104 and the vehicle 108 via the communications network 102. For example, the one or more remote computing devices 106 can request the location of the vehicle 108 or the state of one or more objects detected by the one or more sensors 114 of the vehicle 108, via the communications network 102.”).
Schwindt, ALON, and Liang are considered to be analogous to the claimed invention because all of them are in the same field as the claimed invention, namely collecting sensor data to navigate an autonomous vehicle. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Schwindt to include the remote computing device based on the teaching of Liang. One of ordinary skill in the art would have been motivated to make this modification because the substitution of one known element for another would have yielded predictable results to one of ordinary skill in the art.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to UPUL P CHANDRASIRI whose telephone number is (703)756-5823. The examiner can normally be reached M-F 8:30 am to 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christian Chace, can be reached at 571-272-4190. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/U.P.C./Examiner, Art Unit 3665
/CHRISTIAN CHACE/Supervisory Patent Examiner, Art Unit 3665
Prosecution Timeline

Sep 13, 2023: Application Filed
Aug 11, 2025: Non-Final Rejection mailed (§103, §112)
Oct 31, 2025: Interview Requested
Nov 11, 2025: Response Filed
Feb 11, 2026: Final Rejection mailed (§103, §112)
Mar 25, 2026: Interview Requested
Apr 10, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology:

18/149,308 (Patent 12391240): VEHICLE DRIVING ASSIST DEVICE; 2y 7m to grant; granted Aug 19, 2025
18/023,207 (Patent 12325421): Method for Holding a Two-Track Motor Vehicle; 2y 3m to grant; granted Jun 10, 2025
Based on the 2 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 12%
With Interview: -4% (-16.7%)
Median Time to Grant: 2y 12m (~3m remaining)
PTA Risk: Moderate

Based on 16 resolved cases by this examiner. Grant probability derived from career allowance rate.