DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 11/24/2025 has been entered. Claims 1-4, 6, 7, 9, and 16 are amended. Claims 5, 12, 13, 18, and 20 are canceled. Claims 21-25 are new. Claims 1-4, 6-11, 14-17, 19, and 21-25 are pending and are rejected as detailed below.
Response to Arguments
Yang does not Qualify as Prior Art for the purposes of 35 U.S.C. § 103 under 35 U.S.C. § 102(b)(2)(C)
Applicant respectfully submits that Yang, as cited by the Office, does not qualify as prior art for the purposes of 35 U.S.C. § 103, per the 35 U.S.C. § 102(b)(2)(C) exception. Per MPEP § 2141.01, an "obviousness rejection is ordinarily based on a disclosure that qualifies as prior art under 35 U.S.C. 102... If it is established that a disclosure does not qualify as prior art under an appropriate section of 35 U.S.C. 102, then the disclosure is also not prior art that can be used in an obviousness rejection." See MPEP § 2141.01; see also MPEP § 717.02(b) ("it is also important to recognize that the 35 U.S.C. 102(b)(2)(C) exception applies when the rejection is under 35 U.S.C. 102(a)(2) (anticipation) or 35 U.S.C. 103 (obviousness)").
The pending application has an effective filing date of April 21, 2023, and is assigned to Zoox, Inc. The assignment of the pending application was recorded with the USPTO on May 4, 2023, and given reel/frame 063534/0223. The Yang document was filed on August 17, 2021, and is assigned to Amazon Technologies, Inc. Zoox, Inc. is a wholly owned subsidiary of Amazon Technologies, Inc. Yang was recorded with the USPTO on August 17, 2021, and given reel/frame 57205/0103.
Yang was published on January 28, 2025, after the April 21, 2023 filing date of the instant application, and therefore does not qualify as prior art under § 102(a)(1). Yang might be considered prior art under § 102(a)(2) as a published patent application with an effective filing date prior to that of the pending application. However, Yang is excepted under § 102(b)(2)(C) because, as of the filing date of the pending application (April 21, 2023), both the Yang document and the pending application were subject to an obligation of assignment to the same parent entity, Amazon Technologies, Inc. Accordingly, under at least 35 U.S.C. § 102(b)(2)(C), the Yang document is not available as prior art against the instant claims, and Applicant requests that the Office withdraw the § 103 rejections of claims 2, 3, 4, 6, 9-11, 15, 17, and 19 that at least partially rely on Yang.
Applicant’s arguments with respect to the rejections of the claims under 35 U.S.C. § 103 in view of Yang have been fully considered and are persuasive. The rejection has therefore been withdrawn. However, upon further consideration, a new ground(s) of rejection for the dependent claims is made over Liu (US 20210258611 A1), and further in view of Bojarski (US 20200324795 A1), Arditi (US 20230132889 A1), LIU (US 20220215558 A1), COIMBRA DE ANDRADE (US 20220067485 A1), and QIN (US 20210110234 A1). In particular, claims 1-20 are addressed in the instant office action.
Independent Claim 1
Applicant argues that Liu fails to disclose "determining... based at least in part on determining a dot product associated with the first embedding and the second embedding, an attention score indicating a relationship between the first portion of the sensor data and the first portion of the map data," as recited by amended claim 1. That is, the "conditional entropy model" of Liu is configured to "model the correlation" between the two images "to reduce the joint entropy" and the "joint bitrate of the two image codes," but fails to disclose or suggest using "a dot product associated with the first embedding and the second embedding" to determine "an attention score indicating a relationship between the first portion of the sensor data and the first portion of the map data." For at least the reasons presented herein, Liu does not disclose all of the features of amended claim 1. Accordingly, Applicant submits that Liu does not anticipate claim 1 and respectfully requests that the Office withdraw the § 102 rejection of claim 1.
Applicant’s arguments, as amended herein, with respect to the rejection of claim 1 under 35 U.S.C. § 102 have been fully considered but are not persuasive. Liu discloses that machine-learned models that include an end-to-end deep architecture can be used instead of the conditional entropy model [paras. 0165 and 0036], and the dot product is an inherent feature of the linear models of machine learning and of the machine-learned image compression model that includes the end-to-end deep architecture. In particular, as amended herein, claim 1 is addressed in the instant office action.
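For context only, the dot-product attention score at issue in claim 1 can be sketched as follows. This sketch is illustrative and is not taken from the claims, the Liu reference, or the record; the function name, embedding shapes, and the scaled-softmax normalization are assumptions.

```python
import numpy as np

def attention_scores(sensor_emb, map_emb):
    """Scaled dot-product attention scores between two sets of embeddings.

    sensor_emb: (n, d) embeddings of sensor-data portions (queries).
    map_emb:    (m, d) embeddings of map-data portions (keys).
    Returns an (n, m) matrix whose entry [i, j] indicates the strength of
    the relationship between sensor portion i and map portion j.
    """
    d = sensor_emb.shape[-1]
    logits = sensor_emb @ map_emb.T / np.sqrt(d)  # pairwise dot products, scaled
    # Softmax over the map dimension so each row of scores sums to 1.
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)
```

Under this sketch, a high entry in the returned matrix would indicate a strong relationship between a portion of the sensor data and a portion of the map data, which is the role the claim assigns to the attention score.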
Independent Claim 7
Applicant argues that Liu fails to disclose all of the features of amended claim 7.
For example, Liu describes a "machine-learned image compression model configured to generate compressed image data in response to image data associated with at least two image sensors having at least partially overlapping fields of view." Liu, para. [0005]. Liu further discusses using a "conditional entropy model" "to model the correlation between the two image codes of the two images to reduce the joint entropy, and hence the joint bitrate, of the two image codes." Liu, para. 0069. Liu, however, fails to disclose at least, "determining a confidence score associated with the output" and then "controlling a vehicle based at least in part on the output and the confidence score," as recited by amended claim 7. For at least the reasons presented herein, Liu does not disclose all of the features of claim 7. Accordingly, Applicant submits that Liu does not anticipate claim 7 and respectfully requests that the Office withdraw the § 102 rejection of claim 7.
Applicant’s arguments, as amended herein, with respect to the rejection of claim 7 under 35 U.S.C. § 102 have been fully considered but are not persuasive, as Liu discloses that the cost volume can be seen as a confidence measure [para. 0099]. In particular, as amended herein, claim 7 is addressed in the instant office action.
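For context only, the cited passage of Liu (para. 0099) describes applying a softmax along the disparity dimension of a cost volume so that each value can be read as a confidence measure. A minimal sketch of that normalization follows; the function name, array shapes, and the lower-cost-means-higher-confidence sign convention are assumptions for illustration, not details taken from Liu.

```python
import numpy as np

def cost_volume_confidence(cost_volume):
    """Normalize a matching cost volume into per-pixel confidence values.

    cost_volume: (d, h, w) array of matching costs, one slice per disparity.
    Returns an array of the same shape: a softmax over the disparity axis,
    so each (h, w) location distributes confidence across the d disparities.
    """
    logits = -cost_volume  # assume lower cost means higher confidence
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=0, keepdims=True)
```

Each value in the normalized volume then behaves as a probability/confidence of the correct disparity at that coordinate, which is the reading the rejection gives to "confidence score."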
Dependent Claims 13 and 14
Applicant argues that Claim 14 ultimately depends from independent claim 7. As discussed above, claim 7 is not anticipated by Liu, and is therefore allowable over the cited document. Therefore, claim 14 is allowable over the cited document of record for at least its dependency from an allowable base claim, and also for the additional features that it recites.
Accordingly, Applicant respectfully requests that the Office withdraw the § 102 rejection of claim 14.
Applicant’s arguments with respect to the rejection of claim 14 under 35 U.S.C. § 102 have been fully considered but are not persuasive, as Liu discloses that the cost volume can be seen as a confidence measure [para. 0099]. In particular, claim 14 is addressed in the instant office action.
Independent Claim 16
Applicant argues that Liu fails to disclose all of the features of amended claim 16.
For example, Liu describes a "machine-learned image compression model configured to generate compressed image data in response to image data associated with at least two image sensors having at least partially overlapping fields of view." Liu, para. [0005]. Liu further discusses using a "conditional entropy model" "to model the correlation between the two image codes of the two images to reduce the joint entropy, and hence the joint bitrate, of the two image codes." Liu, para. 0069. Liu, however, fails to disclose at least, "determining, based at least in part on the first embedding and the second embedding, an output comprising at least one of a semantic segmentation associated with the sensor data, an object detection indicating a detection of an object represented in the sensor data, a depth to the object, a localization error, or a false positive indication" where "the output is associated with a confidence score," as recited by amended claim 16. For at least the reasons presented herein, Liu does not disclose all of the features of claim 16. Accordingly, Applicant submits that Liu does not anticipate claim 16 and respectfully requests that the Office withdraw the § 102 rejection of claim 16.
Applicant’s arguments, as amended herein, with respect to the rejection of claim 16 under 35 U.S.C. § 102 have been fully considered but are not persuasive. Liu discloses that machine-learned models that include an end-to-end deep architecture can be used instead of the conditional entropy model [paras. 0165 and 0036], and the dot product is an inherent feature of the linear models of machine learning and of the machine-learned image compression model that includes the end-to-end deep architecture. Furthermore, Liu discloses that the cost volume can be seen as a confidence measure [para. 0099]. In particular, as amended herein, claim 16 is addressed in the instant office action.
Claims 2, 8, and 17 Would Not Have Been Obvious over the Cited Documents.
Applicant argues that Claims 2, 8, and 17 were rejected under 35 U.S.C. § 103 as allegedly being obvious over a combination of Liu in view of Yang. Claims 2, 8, and 17 ultimately depend from independent claims 1, 7, and 16. As discussed above, claims 1, 7, and 16 are allowable over Liu. The Office cites Yang as allegedly teaching the additional features of dependent claims 2, 8, and 17. Without conceding the propriety of the combination of cited documents, the Office has failed to show that the combination teaches or suggests one or more features of independent claims 1, 7, and 16. Therefore, claims 2, 8, and 17 are allowable over the cited documents of record for at least their dependency from an allowable base claim, and also for the additional features that each recites. Accordingly, Applicant respectfully requests that the Office withdraw the § 103 rejection of claims 2, 8, and 17.
Applicant’s arguments, as amended herein, with respect to the rejections of claims 2, 8, and 17 under 35 U.S.C. § 103 have been fully considered but are not persuasive insofar as they rely on the allowability of the base claims, because Liu anticipates claims 1, 7, and 16. Applicant’s arguments with respect to the rejections of claims 2, 8, and 17 under 35 U.S.C. § 103 over Liu and Yang have been fully considered and are persuasive; that rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection for claims 2, 8, and 17 is made over Liu (US 20210258611 A1), and further in view of COIMBRA DE ANDRADE (US 20220067485 A1). In particular, claims 2, 8, and 17 are addressed in the instant office action.
Claims 3, 4, 6, 9, 10, 12, and 19 Would Not Have Been Obvious over the Cited Documents
Applicant argues that Claims 3, 4, 6, 9, 10, 12, and 19 were rejected under 35 U.S.C. § 103 as allegedly being obvious over a combination of Liu in view of Yang and further in view of Arditi. Claim 12 is canceled herein without prejudice or disclaimer of the subject matter, rendering the rejection of this claim moot. Claims 3, 4, 6, 9, 10, and 19 ultimately depend from one of independent claims 1, 7, and 16. As discussed above, claims 1, 7, and 16 are allowable over Liu. The Office cites Yang and Arditi as allegedly teaching the additional features of dependent claims 3, 4, 6, 9, 10, and 19. Without conceding the propriety of the combination of cited documents, the Office has failed to show that the combination teaches or suggests one or more features of independent claims 1, 7, and 16. Therefore, claims 3, 4, 6, 9, 10, and 19 are allowable over the cited documents of record for at least their dependency from an allowable base claim, and also for the additional features that each recites. Accordingly, Applicant respectfully requests that the Office withdraw the § 103 rejection of claims 3, 4, 6, 9, 10, and 19.
Applicant’s arguments, as amended herein, with respect to the rejections of claims 3, 4, 6, 9, 10, and 19 under 35 U.S.C. § 103 have been fully considered but are not persuasive insofar as they rely on the allowability of the base claims, because Liu anticipates claims 1, 7, and 16. Applicant’s arguments with respect to the rejections of claims 3, 4, 6, 9, 10, and 19 under 35 U.S.C. § 103 over Liu, Yang, and Arditi have been fully considered and are persuasive; that rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection for claims 3, 4, 6, 9, 10, and 19 is made over Liu in view of COIMBRA DE ANDRADE, and further in view of Arditi. In particular, claims 3, 4, 6, 9, 10, and 19 are addressed in the instant office action.
Claims 5 and 11 Would Not Have Been Obvious over the Cited Documents
Applicant argues that Claims 5 and 11 were rejected under 35 U.S.C. § 103 as allegedly being obvious over a combination of Liu in view of Yang and further in view of Bojarski. Claim 5 is canceled herein without prejudice or disclaimer of the subject matter, rendering the rejection of this claim moot. Claim 11 ultimately depends from independent claim 7. As discussed above, claim 7 is allowable over Liu. The Office cites Yang and Bojarski as allegedly teaching the additional features of dependent claim 11. Without conceding the propriety of the combination of cited documents, the Office has failed to show that the combination teaches or suggests one or more features of independent claim 7. Therefore, claim 11 is allowable over the cited documents of record for at least its dependency from an allowable base claim, and also for the additional features that it recites. Accordingly, Applicant respectfully requests that the Office withdraw the § 103 rejection of claim 11.
Applicant’s arguments, as amended herein, with respect to the rejection of claim 11 under 35 U.S.C. § 103 have been fully considered but are not persuasive insofar as they rely on the allowability of the base claim, because Liu anticipates claims 1, 7, and 16. Applicant’s arguments with respect to the rejection of claim 11 under 35 U.S.C. § 103 over Liu, Yang, and Bojarski have been fully considered and are persuasive; that rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection for claim 11 is made over Liu in view of COIMBRA DE ANDRADE, and further in view of Bojarski. In particular, claim 11 is addressed in the instant office action.
Claims 15 and 20 Would Not Have Been Obvious over the Cited Documents
Applicant argues that Claims 15 and 20 were rejected under 35 U.S.C. § 103 as allegedly being obvious over a combination of Liu in view of Yang and further in view of Luyan LIU. Claim 20 is canceled herein without prejudice or disclaimer of the subject matter, rendering the rejection of this claim moot. Claim 15 ultimately depends from independent claim 7. As discussed above, claim 7 is allowable over Liu. The Office cites Yang and Luyan LIU as allegedly teaching the additional features of dependent claim 15. Without conceding the propriety of the combination of cited documents, the Office has failed to show that the combination teaches or suggests one or more features of independent claim 7. Therefore, claim 15 is allowable over the cited documents of record for at least its dependency from an allowable base claim, and also for the additional features that it recites. Accordingly, Applicant respectfully requests that the Office withdraw the § 103 rejection of claim 15.
Applicant’s arguments, as amended herein, with respect to the rejection of claim 15 under 35 U.S.C. § 103 have been fully considered but are not persuasive insofar as they rely on the allowability of the base claim, because Liu anticipates claims 1, 7, and 16. Applicant’s arguments with respect to the rejection of claim 15 under 35 U.S.C. § 103 over Liu, Yang, and Luyan LIU have been fully considered and are persuasive; that rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection for claim 15 is made over Liu in view of COIMBRA DE ANDRADE, and further in view of Luyan LIU. In particular, claim 15 is addressed in the instant office action.
Claim 18 Would Not Have Been Obvious over the Cited Documents
Applicant argues that Claim 18 was rejected under 35 U.S.C. § 103 as allegedly being obvious over a combination of Liu in view of Yang in view of Arditi and further in view of Bojarski. Claim 18 is canceled herein without prejudice or disclaimer of the subject matter, rendering the rejection of this claim moot. Accordingly, Applicant respectfully requests that the Office withdraw the § 103 rejection of claim 18.
Applicant canceled claim 18; accordingly, no rejection of claim 18 is maintained.
New Claims 21-25
Applicant adds new claims 21-25. Claims 21-25 ultimately depend from one of independent claims 1, 7, or 16. As discussed above, claims 1, 7, and 16 are allowable over the cited documents. Therefore, claims 21-25 are allowable over the cited documents of record for at least their dependency from an allowable base claim, and also for the additional features that each recites. Accordingly, Applicant respectfully requests expeditious allowance of claims 21-25.
Applicant’s arguments, as amended herein, with respect to the rejections of claims 21-25 under 35 U.S.C. § 103 have been fully considered but are not persuasive, as Liu anticipates claims 1, 7, and 16. In particular, claims 21-25 are addressed in the instant office action.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim(s) 1, 7, 14, and 16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Liu (US 20210258611 A1).
Regarding claim 1, Liu teaches (Currently Amended) A system (Liu, at least one para. 0002; “The present disclosure relates generally to improving the ability of computing devices to compress image data.”) comprising:
one or more processors (Liu, at least one para. 0044; “one or more processors”); and
non-transitory memory storing processor-executable instructions that, when executed by the one or more processors, cause the system to perform operations (Liu, at least one para. 0044; “An autonomous vehicle according to an example aspect of the present disclosure can include a plurality of vehicle sensors including a first image sensor and a second image sensor, one or more processors, and one or more tangible, non-transitory, computer readable media that collectively store instructions that when executed by the one or more processors cause the one or more processors to perform operations.”) comprising:
receiving sensor data associated with an environment surrounding a vehicle (Liu, at least one para. 0054; “The vehicle sensor(s) 116 can be configured to acquire sensor data 118. This can include sensor data associated with the surrounding environment of the vehicle 102. ”);
determining map data associated with the environment based (Liu, at least one para. 0055; “In addition to the sensor data 118, the autonomy computing system 130 can retrieve or otherwise obtain map data 132. The map data 132 can provide static world representations about the surrounding environment of the vehicle 102. For example, in some implementations, a vehicle 102 can exploit prior knowledge about the static world by building very detailed maps (HD maps) that represent not only the roads, buildings, bridges, and landmarks, but also traffic lanes, signs, and lights to centimeter accurate three-dimensional representations.”) at least in part on a first pose of the vehicle and a second pose of a sensor associated with the vehicle (Liu, at least one para. 0054; “The sensor data 118 can include image data, radar data, LIDAR data, and/or other data acquired by the vehicle sensor(s) 116. For example, the vehicle sensor(s) 116 can include a front-facing RGB camera mounted on top of the vehicle 102 and the sensor data 118 can include an RGB image depicting the surrounding environment of the vehicle 102.”);
determining, by a first encoder based at least in part on at least a first portion of the sensor data, a first embedding (Liu, at least one para. 0067; “FIG. 2 depicts an example computing environment including a machine-learned image compression system 200 according to example embodiments of the present disclosure. Image compression system 200 includes a machine-learned image compression model 210 that is configured to jointly compress two or more inputs such as a first image 202 and a second image 204 having at least partially overlapping fields of view.”);
determining, by a second encoder based at least in part on a first portion of the map data, a second embedding, wherein the first portion of the sensor data and the first portion of the map data are associated with a region of the environment (Liu, at least one para. 0068; “The set of one or more parametric skip function(s) 228 can be provided in order to propagate information from feature maps 224 generated by the encoder 212 from the first image 202. In some examples, a parametric skip function can be provided for each encoding layer 214 except for a first encoding layer of encoder 212. The propagated information from the feature maps 224 is provided to encoder 242.”);
determining, by a transformer-based machine-learned model comprising the first encoder and the second encoder and based at least in part on determining a dot product associated with the first embedding and the second embedding (Liu, at least one para. 0165; “the machine-learned models 1010 can be or can otherwise include various machine-learned models such as, for example, neural networks (e.g., deep neural networks or other types of models including linear models and/or non-linear models. Example neural networks include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks, or other forms of neural networks.”) and (Liu, at least one para. 0036; “In some examples, the machine-learned image compression model can include an end-to-end deep architecture where implicit depth estimation and compression are performed jointly.”, wherein the dot product is an inherent feature of the linear models of machine learning and of the machine-learned image compression model that includes the end-to-end deep architecture), an attention score (Liu, at least one para. 0101; “In Equation 2, C.sub.d,i represents the cost of disparity d at pixel i. The pixel index that is d pixels to the right of pixel i is represented by (i, d). The volumetric warping provides a warped feature map g.sub.2.sup.t−1 which better aligns with the feature map of the second image. This can also be seen as an attention mechanism for each pixel i into the first image's feature map within a disparity range”) indicating a relationship between the first portion of the sensor data and the first portion of the map data (Liu, at least one para. 0069; “A conditional entropy model can be used to model the correlation between the two image codes of the two images to reduce the joint entropy, and hence the joint bitrate, of the two image codes.”);
determining, based at least in part on the attention score, an output comprising at least one of a semantic segmentation associated with the sensor data, an object detection indicating a detection of an object represented in the sensor data (Liu, at least one para. 0166; “the computing system 1002 can implement the machine-learned model(s) 1010 to generate uncertainty data for object detections, predictions, and motion plan generation based on sensor data.”), a depth to the object (Liu, at least one para. 0041; “In some examples, an end to end deep architecture for multiple image compression (e.g., stereo image compression) is provided. The architecture can provide implicit depth estimation and compression that are performed jointly in the machine learned image compression model”), or a false positive dynamic object indication; and
controlling the vehicle based at least in part on the output (Liu, at least one para. 0058; “For example, the autonomy computing system 130 can obtain the sensor data 118 from the vehicle sensor(s) 116, process the sensor data 118 (and/or other data) to perceive its surrounding environment, predict the motion of objects within the surrounding environment, and generate an appropriate motion plan through such surrounding environment. The autonomy computing system 130 can communicate with the one or more vehicle control systems 120 to operate the vehicle 102 according to the motion plan.”).
Regarding claim 7, Liu teaches (Currently Amended) One or more non-transitory computer-readable media storing processor- executable instructions that, when executed by one or more processors (Liu, at least one para. 0044; “An autonomous vehicle according to an example aspect of the present disclosure can include a plurality of vehicle sensors including a first image sensor and a second image sensor, one or more processors, and one or more tangible, non-transitory, computer readable media that collectively store instructions that when executed by the one or more processors cause the one or more processors to perform operations.”), perform operations comprising:
receiving sensor data (Liu, at least one para. 0054; “The vehicle sensor(s) 116 can be configured to acquire sensor data 118. This can include sensor data associated with the surrounding environment of the vehicle 102.”);
receiving map data associated with a portion of an environment (Liu, at least one para. 0055; “In addition to the sensor data 118, the autonomy computing system 130 can retrieve or otherwise obtain map data 132. The map data 132 can provide static world representations about the surrounding environment of the vehicle 102. For example, in some implementations, a vehicle 102 can exploit prior knowledge about the static world by building very detailed maps (HD maps) that represent not only the roads, buildings, bridges, and landmarks, but also traffic lanes, signs, and lights to centimeter accurate three-dimensional representations.”) associated with the sensor data (Liu, at least one para. 0054; “The sensor data 118 can include image data, radar data, LIDAR data, and/or other data acquired by the vehicle sensor(s) 116. For example, the vehicle sensor(s) 116 can include a front-facing RGB camera mounted on top of the vehicle 102 and the sensor data 118 can include an RGB image depicting the surrounding environment of the vehicle 102.”);
determining, by a first machine-learned model based at least in part on the sensor data, a first embedding (Liu, at least one para. 0067; “FIG. 2 depicts an example computing environment including a machine-learned image compression system 200 according to example embodiments of the present disclosure. Image compression system 200 includes a machine-learned image compression model 210 that is configured to jointly compress two or more inputs such as a first image 202 and a second image 204 having at least partially overlapping fields of view.”);
determining, by a second machine-learned model based at least in part on the map data, a second embedding (Liu, at least one para. 0068; “The set of one or more parametric skip function(s) 228 can be provided in order to propagate information from feature maps 224 generated by the encoder 212 from the first image 202. In some examples, a parametric skip function can be provided for each encoding layer 214 except for a first encoding layer of encoder 212. The propagated information from the feature maps 224 is provided to encoder 242.”);
determining, based at least in part on the first embedding and the second embedding (Liu, at least one para. 0165; “the machine-learned models 1010 can be or can otherwise include various machine-learned models such as, for example, neural networks (e.g., deep neural networks or other types of models including linear models and/or non-linear models. Example neural networks include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks, or other forms of neural networks.”), an output comprising at least one of a semantic segmentation associated with the sensor data, an object detection indicating a detection of an object represented in the sensor data (Liu, at least one para. 0166; “the computing system 1002 can implement the machine-learned model(s) 1010 to generate uncertainty data for object detections, predictions, and motion plan generation based on sensor data.”), a depth to the object (Liu, at least one para. 0041; “In some examples, an end to end deep architecture for multiple image compression (e.g., stereo image compression) is provided. The architecture can provide implicit depth estimation and compression that are performed jointly in the machine learned image compression model”), a localization error, or a false positive indication; and
determining a confidence score associated with the output (Liu, at least one para. 0099; “A softmax layer can be applied to ensure the cost is normalized along the disparity dimension per pixel. Each value in the cost volume can be seen as a probability/confidence measure of the correct disparity at that coordinate.”); and
controlling a vehicle based at least in part on the output and the confidence score (Liu, at least one para. 0058; “For example, the autonomy computing system 130 can obtain the sensor data 118 from the vehicle sensor(s) 116, process the sensor data 118 (and/or other data) to perceive its surrounding environment, predict the motion of objects within the surrounding environment, and generate an appropriate motion plan through such surrounding environment. The autonomy computing system 130 can communicate with the one or more vehicle control systems 120 to operate the vehicle 102 according to the motion plan.”).
Regarding claim 14, Liu teaches the limitations of claim 7, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Original) The one or more non-transitory computer-readable media of claim 7, wherein the map data comprises geometric data and a third embedding associated with the geometric data and the operations (Liu, at least one para. 0056; “the map data 132 can provide the vehicle 102 relative positions of the elements of a surrounding environment of the vehicle 102. The vehicle 102 can identify its position within the surrounding environment (e.g., across six axes, etc.) based at least in part on the map data 132.”) further comprise:
receiving training data indicating ground truth associated with the output (Liu, at least one para. 0176; “In particular, the model trainer 1060 can train a machine-learned model 1010 and/or 1040 based on a set of training data 1062. The training data 1062 can include, for example, ground truth data including annotations for sensor data portions and/or vehicle state data.”);
determining a loss based at least in part on a difference between the ground truth and the output (Liu, at least one para. 0175; “The model trainer 1060 can train the machine-learned models 1010 and/or 1040 using one or more training or learning algorithms. One example training technique is backwards propagation of errors.”); and
altering the third embedding to reduce the loss (Liu, at least one para. 0175; “The model trainer 1060 can perform a number of generalization techniques to improve the generalization capability of the models being trained. Generalization techniques include weight decays, dropouts, or other techniques.”).
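As an editorial illustration of the claimed training pattern (determine a loss from the difference between output and ground truth, then alter an embedding to reduce the loss), the following sketch applies one hand-derived gradient-descent step to a toy embedding. The model, names, and dimensions are hypothetical and do not reflect Liu's disclosed trainer.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding = rng.normal(size=4)    # the embedding being altered during training
weights = rng.normal(size=4)      # fixed model parameters for this sketch
ground_truth = 1.0                # training data indicating ground truth

def forward(emb):
    # Toy model: a linear readout of the embedding.
    return float(weights @ emb)

def loss(emb):
    # Squared error between the output and the ground truth.
    return (forward(emb) - ground_truth) ** 2

# One gradient-descent step on the embedding (gradient of the squared error).
lr = 0.01
grad = 2.0 * (forward(embedding) - ground_truth) * weights
updated = embedding - lr * grad
```

After the step, the loss evaluated at `updated` is lower than at `embedding`, which is the sense in which the embedding is "altered to reduce the loss."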
Regarding claim 16, Liu teaches (Currently Amended) A method (Liu, at least one para. 0002; “The present disclosure relates generally to improving the ability of computing devices to compress image data.”) comprising:
receiving sensor data (Liu, at least one para. 0054; “The vehicle sensor(s) 116 can be configured to acquire sensor data 118. This can include sensor data associated with the surrounding environment of the vehicle 102. ”);
receiving map data associated with a portion of an environment (Liu, at least one para. 0055; “In addition to the sensor data 118, the autonomy computing system 130 can retrieve or otherwise obtain map data 132. The map data 132 can provide static world representations about the surrounding environment of the vehicle 102. For example, in some implementations, a vehicle 102 can exploit prior knowledge about the static world by building very detailed maps (HD maps) that represent not only the roads, buildings, bridges, and landmarks, but also traffic lanes, signs, and lights to centimeter accurate three-dimensional representations.”) associated with the sensor data (Liu, at least one para. 0054; “The sensor data 118 can include image data, radar data, LIDAR data, and/or other data acquired by the vehicle sensor(s) 116. For example, the vehicle sensor(s) 116 can include a front-facing RGB camera mounted on top of the vehicle 102 and the sensor data 118 can include an RGB image depicting the surrounding environment of the vehicle 102.”);
determining, by a first machine-learned model based at least in part on the sensor data, a first embedding (Liu, at least one para. 0067; “FIG. 2 depicts an example computing environment including a machine-learned image compression system 200 according to example embodiments of the present disclosure. Image compression system 200 includes a machine-learned image compression model 210 that is configured to jointly compress two or more inputs such as a first image 202 and a second image 204 having at least partially overlapping fields of view.”);
determining, by a second machine-learned model based at least in part on the map data, a second embedding (Liu, at least one para. 0068; “The set of one or more parametric skip function(s) 228 can be provided in order to propagate information from feature maps 224 generated by the encoder 212 from the first image 202. In some examples, a parametric skip function can be provided for each encoding layer 214 except for a first encoding layer of encoder 212. The propagated information from the feature maps 224 is provided to encoder 242.”);
determining, based at least in part on the first embedding and the second embedding (Liu, at least one para. 0165; “the machine-learned models 1010 can be or can otherwise include various machine-learned models such as, for example, neural networks (e.g., deep neural networks or other types of models including linear models and/or non-linear models. Example neural networks include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks, or other forms of neural networks.”), an output comprising at least one of a semantic segmentation associated with the sensor data, an object detection indicating a detection of an object represented in the sensor data (Liu, at least one para. 0166; “the computing system 1002 can implement the machine-learned model(s) 1010 to generate uncertainty data for object detections, predictions, and motion plan generation based on sensor data.”), a depth to the object (Liu, at least one para. 0041; “In some examples, an end to end deep architecture for multiple image compression (e.g., stereo image compression) is provided. The architecture can provide implicit depth estimation and compression that are performed jointly in the machine learned image compression model”), a localization error, or a false positive indication, the output associated with a confidence score (Liu, at least one para. 0099; “A softmax layer can be applied to ensure the cost is normalized along the disparity dimension per pixel. Each value in the cost volume can be seen as a probability/confidence measure of the correct disparity at that coordinate.”); and
controlling a vehicle based at least in part on the output (Liu, at least one para. 0058; “For example, the autonomy computing system 130 can obtain the sensor data 118 from the vehicle sensor(s) 116, process the sensor data 118 (and/or other data) to perceive its surrounding environment, predict the motion of objects within the surrounding environment, and generate an appropriate motion plan through such surrounding environment. The autonomy computing system 130 can communicate with the one or more vehicle control systems 120 to operate the vehicle 102 according to the motion plan.”).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 2, 8, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Liu, and further in view of COIMBRA DE ANDRADE (US 20220067485 A1).
Regarding claim 2, Liu teaches the limitations of claim 1, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Currently Amended) The system of claim 1, wherein determining the attention score (Liu, at least one para. 0069; “A conditional entropy model can be used to model the correlation between the two image codes of the two images to reduce the joint entropy, and hence the joint bitrate, of the two image codes.”) and (Liu, at least one para. 0101; “In Equation 2, C.sub.d,i represents the cost of disparity d at pixel i. The pixel index that is d pixels to the right of pixel i is represented by (i, d). The volumetric warping provides a warped feature map g.sub.2.sup.t−1 which better aligns with the feature map of the second image. This can also be seen as an attention mechanism for each pixel i into the first image's feature map within a disparity range”) comprises:
determining a query vector based at least in part on multiplying the first embedding with a first set of learned weights;
determining a key vector based at least in part on multiplying the second embedding with a second set of learned weights; and
determining a first dot product between the query vector and the key vector.
Liu does not explicitly teach determining a query vector based at least in part on multiplying the first embedding with a first set of learned weights;
determining a key vector based at least in part on multiplying the second embedding with a second set of learned weights; and
determining a first dot product between the query vector and the key vector.
However, COIMBRA DE ANDRADE in the same field of endeavor (COIMBRA DE ANDRADE, at least one para. 0008; “A machine learning model may be trained to process data to perform a task. For example, a machine learning model may be included in an autonomous driving system of a vehicle. The machine learning model trained to process an image and/or to detect an object (e.g., a vehicle, a traffic sign, and/or the like) depicted in the image and/or the video. The autonomous driving system may utilize an output of the machine learning model (e.g., information indicating the detected object) to control an operation of the vehicle (e.g., to cause the vehicle to stop based on the machine learning model detecting a stop sign in the image and/or the video).”) teaches determining a query vector based at least in part on multiplying the first embedding (COIMBRA DE ANDRADE, at least one para. 0042; “The episodic DNN model may determine a similarity between the query vector and the key vector”) with a first set of learned weights (COIMBRA DE ANDRADE, at least one para. 0035; “As shown in FIG. 1C, and by reference number 125, the training system 110 processes the input data, with a convolutional neural network (CNN) model, to determine output data from the input data. The CNN model may include a residual neural network (ResNet) model, a GoogLeNet model, and/or the like. The CNN model may include multiple layers of bi-dimensional convolutional filters. The bi-dimensional convolutional filters may be associated with a set of weights. The CNN model may determine values for the weights based on processing the input data as part of a training process for training the CNN model.”);
determining a key vector based at least in part on multiplying the second embedding (COIMBRA DE ANDRADE, at least one para. 0042; “The episodic DNN model may determine a similarity between the query vector and the key vector”) with a second set of learned weights (COIMBRA DE ANDRADE, at least one para. 0035; “As shown in FIG. 1C, and by reference number 125, the training system 110 processes the input data, with a convolutional neural network (CNN) model, to determine output data from the input data. The CNN model may include a residual neural network (ResNet) model, a GoogLeNet model, and/or the like. The CNN model may include multiple layers of bi-dimensional convolutional filters. The bi-dimensional convolutional filters may be associated with a set of weights. The CNN model may determine values for the weights based on processing the input data as part of a training process for training the CNN model.”); and
determining a first dot product between the query vector and the key vector (COIMBRA DE ANDRADE, at least one para. 0042; “The episodic DNN model may determine a similarity between the query vector and the key vector. In some implementations, the episodic DNN model may perform a dot product operation with the query vector and the key vector to determine a similarity score associated with the sample. The episodic DNN model may determine a similarity score for each sample included in the set of input data in a similar manner.”).
Liu and COIMBRA DE ANDRADE are both considered analogous to the claimed invention because both are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the determination of the scoring of Liu with the teaching of COIMBRA DE ANDRADE. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art. Furthermore, the results would have been obvious: the semantic DNN model may process the output data to determine one or more feature sets associated with the output data, generate a feature vector that includes a series of information associated with the determined feature sets, and predict a classification associated with the samples based on the determined feature sets (COIMBRA DE ANDRADE; 0043).
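For illustration only, the query/key/dot-product limitation mapped to COIMBRA DE ANDRADE above is the standard attention-score computation. The sketch below uses hypothetical names and dimensions and is not drawn from either reference: each embedding is multiplied by its own set of learned weights to form a query and a key, and their dot product (commonly scaled by the square root of the dimension in transformer implementations) gives the score.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # embedding dimension (illustrative)

first_embedding = rng.normal(size=d)   # e.g., derived from sensor data
second_embedding = rng.normal(size=d)  # e.g., derived from map data
W_q = rng.normal(size=(d, d))          # first set of learned weights
W_k = rng.normal(size=(d, d))          # second set of learned weights

# Query and key vectors: each embedding multiplied by learned weights.
query = first_embedding @ W_q
key = second_embedding @ W_k

# The (unnormalized) attention score is the dot product of query and key,
# here scaled by sqrt(d) as is common in transformer implementations.
score = float(query @ key / np.sqrt(d))
```

The same three steps recur verbatim in claims 8 and 17, so a single sketch suffices for all three mappings.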
Regarding claim 8, Liu teaches the limitations of claim 7, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Original) The one or more non-transitory computer-readable media of claim 7,
wherein:
the operations further comprise determining, by a transformer-based machine-learned model and based at least in part on the first embedding and the second embedding (Liu, at least one para. 0165; “the machine-learned models 1010 can be or can otherwise include various machine-learned models such as, for example, neural networks (e.g., deep neural networks or other types of models including linear models and/or non-linear models. Example neural networks include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks, or other forms of neural networks.”), a score indicating a relationship between the sensor data and the map data (Liu, at least one para. 0069; “A conditional entropy model can be used to model the correlation between the two image codes of the two images to reduce the joint entropy, and hence the joint bitrate, of the two image codes.”);
determining the output is based at least in part on the score (Liu, at least one para. 0166; “the computing system 1002 can implement the machine-learned model(s) 1010 to generate uncertainty data for object detections, predictions, and motion plan generation based on sensor data.”); and
determining the score comprises:
determining a query vector based at least in part on multiplying the first embedding with a first set of learned weights;
determining a key vector based at least in part on multiplying the second embedding with a second set of learned weights; and
determining a first dot product between the query vector and the key vector.
Liu does not explicitly teach determining the score comprises:
determining a query vector based at least in part on multiplying the first embedding with a first set of learned weights;
determining a key vector based at least in part on multiplying the second embedding with a second set of learned weights; and
determining a first dot product between the query vector and the key vector.
However, COIMBRA DE ANDRADE in the same field of endeavor (COIMBRA DE ANDRADE, at least one para. 0008; “A machine learning model may be trained to process data to perform a task. For example, a machine learning model may be included in an autonomous driving system of a vehicle. The machine learning model trained to process an image and/or to detect an object (e.g., a vehicle, a traffic sign, and/or the like) depicted in the image and/or the video. The autonomous driving system may utilize an output of the machine learning model (e.g., information indicating the detected object) to control an operation of the vehicle (e.g., to cause the vehicle to stop based on the machine learning model detecting a stop sign in the image and/or the video).”) teaches determining the score comprises: determining a query vector based at least in part on multiplying the first embedding (COIMBRA DE ANDRADE, at least one para. 0042; “The episodic DNN model may determine a similarity between the query vector and the key vector”) with a first set of learned weights (COIMBRA DE ANDRADE, at least one para. 0035; “As shown in FIG. 1C, and by reference number 125, the training system 110 processes the input data, with a convolutional neural network (CNN) model, to determine output data from the input data. The CNN model may include a residual neural network (ResNet) model, a GoogLeNet model, and/or the like. The CNN model may include multiple layers of bi-dimensional convolutional filters. The bi-dimensional convolutional filters may be associated with a set of weights. The CNN model may determine values for the weights based on processing the input data as part of a training process for training the CNN model.”);
determining a key vector based at least in part on multiplying the second embedding (COIMBRA DE ANDRADE, at least one para. 0042; “The episodic DNN model may determine a similarity between the query vector and the key vector”) with a second set of learned weights (COIMBRA DE ANDRADE, at least one para. 0035; “As shown in FIG. 1C, and by reference number 125, the training system 110 processes the input data, with a convolutional neural network (CNN) model, to determine output data from the input data. The CNN model may include a residual neural network (ResNet) model, a GoogLeNet model, and/or the like. The CNN model may include multiple layers of bi-dimensional convolutional filters. The bi-dimensional convolutional filters may be associated with a set of weights. The CNN model may determine values for the weights based on processing the input data as part of a training process for training the CNN model.”); and
determining a first dot product between the query vector and the key vector (COIMBRA DE ANDRADE, at least one para. 0042; “The episodic DNN model may determine a similarity between the query vector and the key vector. In some implementations, the episodic DNN model may perform a dot product operation with the query vector and the key vector to determine a similarity score associated with the sample. The episodic DNN model may determine a similarity score for each sample included in the set of input data in a similar manner.”).
Liu and COIMBRA DE ANDRADE are both considered analogous to the claimed invention because both are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the determination of the scoring of Liu with the teaching of COIMBRA DE ANDRADE. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art. Furthermore, the results would have been obvious: the semantic DNN model may process the output data to determine one or more feature sets associated with the output data, generate a feature vector that includes a series of information associated with the determined feature sets, and predict a classification associated with the samples based on the determined feature sets (COIMBRA DE ANDRADE; 0043).
Regarding claim 17, Liu teaches the limitations of claim 16, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Original) The method of claim 16, wherein: the method further comprises determining, by a transformer-based machine-learned model and based at least in part on the first embedding and the second embedding (Liu, at least one para. 0165; “the machine-learned models 1010 can be or can otherwise include various machine-learned models such as, for example, neural networks (e.g., deep neural networks or other types of models including linear models and/or non-linear models. Example neural networks include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks, or other forms of neural networks.”), a score indicating a relationship between the sensor data and the map data (Liu, at least one para. 0069; “A conditional entropy model can be used to model the correlation between the two image codes of the two images to reduce the joint entropy, and hence the joint bitrate, of the two image codes.”);
determining the output is based at least in part on the score (Liu, at least one para. 0166; “the computing system 1002 can implement the machine-learned model(s) 1010 to generate uncertainty data for object detections, predictions, and motion plan generation based on sensor data.”); and
determining the score comprises:
determining a query vector based at least in part on multiplying the first embedding with a first set of learned weights;
determining a key vector based at least in part on multiplying the second embedding with a second set of learned weights; and
determining a first dot product between the query vector and the key vector.
Liu does not explicitly teach determining a query vector based at least in part on multiplying the first embedding with a first set of learned weights;
determining a key vector based at least in part on multiplying the second embedding with a second set of learned weights; and
determining a first dot product between the query vector and the key vector.
However, COIMBRA DE ANDRADE in the same field of endeavor (COIMBRA DE ANDRADE, at least one para. 0008; “A machine learning model may be trained to process data to perform a task. For example, a machine learning model may be included in an autonomous driving system of a vehicle. The machine learning model trained to process an image and/or to detect an object (e.g., a vehicle, a traffic sign, and/or the like) depicted in the image and/or the video. The autonomous driving system may utilize an output of the machine learning model (e.g., information indicating the detected object) to control an operation of the vehicle (e.g., to cause the vehicle to stop based on the machine learning model detecting a stop sign in the image and/or the video).”) teaches determining a query vector based at least in part on multiplying the first embedding (COIMBRA DE ANDRADE, at least one para. 0042; “The episodic DNN model may determine a similarity between the query vector and the key vector”) with a first set of learned weights (COIMBRA DE ANDRADE, at least one para. 0035; “As shown in FIG. 1C, and by reference number 125, the training system 110 processes the input data, with a convolutional neural network (CNN) model, to determine output data from the input data. The CNN model may include a residual neural network (ResNet) model, a GoogLeNet model, and/or the like. The CNN model may include multiple layers of bi-dimensional convolutional filters. The bi-dimensional convolutional filters may be associated with a set of weights. The CNN model may determine values for the weights based on processing the input data as part of a training process for training the CNN model.”);
determining a key vector based at least in part on multiplying the second embedding (COIMBRA DE ANDRADE, at least one para. 0042; “The episodic DNN model may determine a similarity between the query vector and the key vector”) with a second set of learned weights (COIMBRA DE ANDRADE, at least one para. 0035; “As shown in FIG. 1C, and by reference number 125, the training system 110 processes the input data, with a convolutional neural network (CNN) model, to determine output data from the input data. The CNN model may include a residual neural network (ResNet) model, a GoogLeNet model, and/or the like. The CNN model may include multiple layers of bi-dimensional convolutional filters. The bi-dimensional convolutional filters may be associated with a set of weights. The CNN model may determine values for the weights based on processing the input data as part of a training process for training the CNN model.”); and
determining a first dot product between the query vector and the key vector (COIMBRA DE ANDRADE, at least one para. 0042; “The episodic DNN model may determine a similarity between the query vector and the key vector. In some implementations, the episodic DNN model may perform a dot product operation with the query vector and the key vector to determine a similarity score associated with the sample. The episodic DNN model may determine a similarity score for each sample included in the set of input data in a similar manner.”).
Liu and COIMBRA DE ANDRADE are both considered analogous to the claimed invention because both are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the determination of the scoring of Liu with the teaching of COIMBRA DE ANDRADE. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art. Furthermore, the results would have been obvious: the semantic DNN model may process the output data to determine one or more feature sets associated with the output data, generate a feature vector that includes a series of information associated with the determined feature sets, and predict a classification associated with the samples based on the determined feature sets (COIMBRA DE ANDRADE; 0043).
Claim(s) 3-4, 6, 9-10, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Liu and COIMBRA DE ANDRADE, and further in view of Arditi (US 20230132889 A1) and QIN (US 20210110234 A1).
Regarding claim 3, the combination of Liu and COIMBRA DE ANDRADE teaches the limitations of claim 2, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Currently Amended) The system of claim 2, wherein the output (Liu, at least one para. 0166; “the computing system 1002 can implement the machine-learned model(s) 1010 to generate uncertainty data for object detections, predictions, and motion plan generation based on sensor data.”) comprises the semantic segmentation and determining the semantic segmentation based at least in part on the attention score comprises at least one of:
determining to associate a semantic label with the first portion of the sensor data based at least in part on determining that the attention score meets or exceeds a threshold score;
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the attention score and the value matrix; and
determining, by a decoder or a threshold value and based at least in part on the context vector, the semantic label to associate with the first portion of the sensor data.
The combination of Liu and COIMBRA DE ANDRADE does not explicitly teach the semantic segmentation and determining the semantic segmentation based at least in part on the attention score comprises at least one of:
determining to associate a semantic label with the first portion of the sensor data based at least in part on determining that the attention score meets or exceeds a threshold score;
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the attention score and the value matrix; and
determining, by a decoder or a threshold value and based at least in part on the context vector, the semantic label to associate with the first portion of the sensor data.
However, Arditi in the same field of endeavor (Arditi, at least one para. 0015; “Subject matter described herein is generally directed to generating and updating HD maps using data from different, heterogeneous sources. As described above, HD maps may be used by autonomous vehicles for driving and navigation, and as such, HD maps need to include highly accurate and detailed information to ensure the safe operation of autonomous vehicles.”) teaches the semantic segmentation and determining the semantic segmentation based at least in part on the attention score (Arditi, at least one para. 0021; “A training sample may further be associated with a known, target output, which in particular embodiments may be existing HD map data 480 at that particular location (x, y), which may include labeled segments or bounding boxes that indicate particular types of objects (e.g., curbs, walls, dividers, buildings, structures, etc.) in the HD map data. As an example, a labeled segment or bounding box may indicate that a known lane divider, for example, is within a boundary of a particular three-dimensional region in the HD map.”) comprises at least one of:
determining to associate a semantic label with the first portion of the sensor data based at least in part on determining that the attention score meets or exceeds a threshold score (Arditi, at least one para. 0048; “for each detected object, the system may check whether that object exists in the map data. In particular embodiments, the system may generate a confidence score representing the likelihood of the detected object being accounted for in the map data. The confidence score may be based on, for example, a similarity comparison of the measured size, dimensions, classification, and/or location of the detected object with known objects in the map data. At step 745, if the comparison results in a determination that the detected object(s) exists or is known in the HD map (e.g., the confidence score in the object existing in the map is higher than a threshold)”);
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the attention score and the value matrix; and
determining, by a decoder or a threshold value and based at least in part on the context vector, the semantic label to associate with the first portion of the sensor data (Arditi, at least one para. 0049; “The map-updating operation may begin at step 750, where the system may send sensor data and associated data gathered at that particular location to a server.”).
The combination of Liu and COIMBRA DE ANDRADE, along with Arditi, is considered analogous to the claimed invention because all of them are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the semantic segmentation of Arditi. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art so that the system can determine whether the map-updating operation is necessary (Arditi; 0048).
The combination of Liu, COIMBRA DE ANDRADE, and Arditi does not explicitly teach determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the attention score and the value matrix; and
However, QIN in the same field of endeavor (QIN, at least one para. 0004; “Embodiments of the disclosure provide a computer-implemented method for executing an activation function of a neural network. The method can include: receiving a plurality of input vectors of input data; generating, among the plurality of input vectors, an estimation value associated with a subset of an input vector based on a weight vector of the activation function; determining whether the estimation value associated with the subset of the input vector satisfies a threshold condition; and determining an output of the activation function based on the estimation value.”) teaches determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights (QIN, at least one para. 0070; “In some embodiments, the neural network can perform the dot production on the second subset of weight elements and the second subset of input elements to generate a second dot product, and generate the output by adding the estimation value and the second dot product.”);
determining a context vector based at least in part on determining a second dot product of the attention score and the value matrix (QIN, at least one para. 0070; “In some embodiments, the neural network can perform the dot production on the second subset of weight elements and the second subset of input elements to generate a second dot product, and generate the output by adding the estimation value and the second dot product.”); and
The combination of Liu, COIMBRA DE ANDRADE, Arditi, and QIN is considered analogous to the claimed invention because all of the references are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the teaching of QIN. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can save computational resources (QIN; 0070).
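For clarity of the record, the value-matrix and context-vector limitations addressed above follow the form of a standard dot-product attention computation. The sketch below is illustrative only; the variable names, array shapes, and softmax normalization are assumptions of this sketch and are not drawn from Liu, COIMBRA DE ANDRADE, Arditi, or QIN:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: one query embedding and four key/value embeddings.
d_model, n_keys = 8, 4
first_embedding = rng.standard_normal(d_model)             # query source
second_embedding = rng.standard_normal((n_keys, d_model))  # key/value source

# Learned weight matrices (first, second, and third sets of learned weights).
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))  # "third set of learned weights"

query = first_embedding @ W_q
keys = second_embedding @ W_k

# First dot product: attention score between the query and each key,
# scaled and normalized with a softmax.
attention_score = keys @ query / np.sqrt(d_model)
attention_score = np.exp(attention_score) / np.exp(attention_score).sum()

# Value matrix: the second embedding multiplied by the third set of weights.
value_matrix = second_embedding @ W_v

# Second dot product: the attention score against the value matrix
# yields the context vector.
context_vector = attention_score @ value_matrix

print(context_vector.shape)  # (8,)
```

A downstream decoder would then consume the context vector to produce the claimed output (e.g., an object detection or semantic label).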
Regarding claim 4, the combination of Liu and COIMBRA DE ANDRADE teaches the limitations of claim 2, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Currently Amended) The system of claim 2, wherein the output comprises the detection and determining the detection based at least in part on the attention score (Liu, at least one para. 0166; “the computing system 1002 can implement the machine-learned model(s) 1010 to generate uncertainty data for object detections, predictions, and motion plan generation based on sensor data.”) comprises at least one of:
determining that the first dot product does not meet a threshold;
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the attention score and the value matrix; and
determining, by a decoder based at least in part on the context vector, the object detection.
The combination of Liu and COIMBRA DE ANDRADE does not explicitly teach the attention score comprises at least one of:
determining that the first dot product does not meet a threshold;
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the attention score and the value matrix; and
determining, by a decoder based at least in part on the context vector, the object detection.
However, Arditi in the same field of endeavor (Arditi, at least one para. 0015; “Subject matter described herein is generally directed to generating and updating HD maps using data from different, heterogeneous sources. As described above, HD maps may be used by autonomous vehicles for driving and navigation, and as such, HD maps need to include highly accurate and detailed information to ensure the safe operation of autonomous vehicles.”) teaches the attention score comprises at least one of:
determining that the first dot product does not meet a threshold (Arditi, at least one para. 0048; “for each detected object, the system may check whether that object exists in the map data. In particular embodiments, the system may generate a confidence score representing the likelihood of the detected object being accounted for in the map data. The confidence score may be based on, for example, a similarity comparison of the measured size, dimensions, classification, and/or location of the detected object with known objects in the map data. At step 745, if the comparison results in a determination that the detected object(s) exists or is known in the HD map (e.g., the confidence score in the object existing in the map is higher than a threshold)”); or
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the attention score and the value matrix; and
determining, by a decoder based at least in part on the context vector, the object detection (Arditi, at least one para. 0049; “In particular embodiments, the server may also perform a comparison of the received data (and any objects detected therefrom) with a server-copy of the HD map to determine whether a mismatch exists.”).
The combination of Liu, COIMBRA DE ANDRADE, and Arditi is considered analogous to the claimed invention because all of the references are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the teaching of Arditi. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can determine whether a map-updating operation is necessary (Arditi; 0048).
The combination of Liu, COIMBRA DE ANDRADE, and Arditi does not explicitly teach
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the attention score and the value matrix; and
However, QIN in the same field of endeavor (QIN, at least one para. 0004; “Embodiments of the disclosure provide a computer-implemented method for executing an activation function of a neural network. The method can include: receiving a plurality of input vectors of input data; generating, among the plurality of input vectors, an estimation value associated with a subset of an input vector based on a weight vector of the activation function; determining whether the estimation value associated with the subset of the input vector satisfies a threshold condition; and determining an output of the activation function based on the estimation value.”) teaches determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights (QIN, at least one para. 0070; “In some embodiments, the neural network can perform the dot production on the second subset of weight elements and the second subset of input elements to generate a second dot product, and generate the output by adding the estimation value and the second dot product.”);
determining a context vector based at least in part on determining a second dot product of the attention score and the value matrix (QIN, at least one para. 0070; “In some embodiments, the neural network can perform the dot production on the second subset of weight elements and the second subset of input elements to generate a second dot product, and generate the output by adding the estimation value and the second dot product.”); and
The combination of Liu, COIMBRA DE ANDRADE, Arditi, and QIN is considered analogous to the claimed invention because all of the references are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the teaching of QIN. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can save computational resources (QIN; 0070).
Regarding claim 6, the combination of Liu and COIMBRA DE ANDRADE teaches the limitations of claim 2, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Currently Amended) The system of claim 2, wherein the output comprises the depth and determining the depth based at least in part on the attention score (Liu, at least one para. 0041; “In some examples, an end to end deep architecture for multiple image compression (e.g., stereo image compression) is provided. The architecture can provide implicit depth estimation and compression that are performed jointly in the machine learned image compression model”) comprises:
determining that the first dot product meets or exceeds a threshold;
determining a surface associated with the first portion of the map data; and
associating a distance from a position of the sensor to the surface with the first portion of the sensor data as the depth.
The combination of Liu and COIMBRA DE ANDRADE does not explicitly teach determining that the first dot product meets or exceeds a threshold;
determining a surface associated with the first portion of the map data; and
associating a distance from a position of the sensor to the surface with the first portion of the sensor data as the depth.
However, Arditi in the same field of endeavor (Arditi, at least one para. 0015; “Subject matter described herein is generally directed to generating and updating HD maps using data from different, heterogeneous sources. As described above, HD maps may be used by autonomous vehicles for driving and navigation, and as such, HD maps need to include highly accurate and detailed information to ensure the safe operation of autonomous vehicles.”) teaches determining that the first dot product meets or exceeds a threshold (Arditi, at least one para. 0048; “for each detected object, the system may check whether that object exists in the map data. In particular embodiments, the system may generate a confidence score representing the likelihood of the detected object being accounted for in the map data. The confidence score may be based on, for example, a similarity comparison of the measured size, dimensions, classification, and/or location of the detected object with known objects in the map data. At step 745, if the comparison results in a determination that the detected object(s) exists or is known in the HD map (e.g., the confidence score in the object existing in the map is higher than a threshold), then the system may not perform any map-updating operation and return to obtaining sensor data (e.g., step 710)”);
determining a surface associated with the first portion of the map data (Arditi, at least one para. 0049; “the server may prioritize autonomous vehicles that are in the region (e.g., within a threshold distance) of where the new object is detected or have trajectories that would result in the vehicles being in that region in the near future.”, wherein the autonomous vehicle is seen as the associated surface with the first portion of the map data); and
associating a distance from a position of the sensor to the surface with the first portion of the sensor data as the depth (Arditi, at least one para. 0049; “the server may prioritize autonomous vehicles that are in the region (e.g., within a threshold distance) of where the new object is detected or have trajectories that would result in the vehicles being in that region in the near future.”, wherein the region within a threshold distance is seen as the distance from a position of a sensor to the surface with the first portion of the sensor data as the depth).
The combination of Liu, COIMBRA DE ANDRADE, and Arditi is considered analogous to the claimed invention because all of the references are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the teaching of Arditi. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can identify any mismatch between the existing HD map and the current sensor measurement of the world (Arditi; 0049).
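For clarity of the record, the depth limitations addressed above (determining a surface associated with the map data and associating the sensor-to-surface distance as the depth) can be illustrated by a simple ray-to-plane distance computation. The sketch below is illustrative only; the function name and the planar-surface assumption are assumptions of this sketch and are not drawn from any cited reference:

```python
import numpy as np

def depth_to_surface(sensor_pos, ray_dir, plane_point, plane_normal):
    """Distance along a ray from the sensor position to a planar map surface.

    Returns None when the ray is parallel to the surface or the surface
    lies behind the sensor, in which case no depth is associated.
    """
    ray_dir = ray_dir / np.linalg.norm(ray_dir)
    denom = ray_dir @ plane_normal
    if abs(denom) < 1e-9:
        return None  # ray parallel to surface
    t = ((plane_point - sensor_pos) @ plane_normal) / denom
    return t if t >= 0 else None

# Sensor at the origin looking along +x toward a wall at x = 5.
depth = depth_to_surface(np.zeros(3), np.array([1.0, 0.0, 0.0]),
                         np.array([5.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
print(depth)  # 5.0
```

The returned distance would then be associated with the corresponding portion of the sensor data as its depth.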
Regarding claim 9, the combination of Liu and COIMBRA DE ANDRADE teaches the limitations of claim 8, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Currently Amended) The one or more non-transitory computer-readable media of claim 8, wherein the output (Liu, at least one para. 0166; “the computing system 1002 can implement the machine-learned model(s) 1010 to generate uncertainty data for object detections, predictions, and motion plan generation based on sensor data.”) comprises the semantic segmentation and determining the semantic segmentation based at least in part on the score comprises at least one of:
determining to associate a semantic label with the sensor data based at least in part on determining that the score meets or exceeds a threshold score;
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the score and the value matrix; and
determining, by a decoder or a threshold value and based at least in part on the context vector, the semantic label to associate with the sensor data.
The combination of Liu and COIMBRA DE ANDRADE does not explicitly teach the semantic segmentation and determining the semantic segmentation based at least in part on the score comprises at least one of:
determining to associate a semantic label with the sensor data based at least in part on determining that the score meets or exceeds a threshold score;
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the score and the value matrix; and
determining, by a decoder or a threshold value and based at least in part on the context vector, the semantic label to associate with the sensor data.
However, Arditi in the same field of endeavor (Arditi, at least one para. 0015; “Subject matter described herein is generally directed to generating and updating HD maps using data from different, heterogeneous sources. As described above, HD maps may be used by autonomous vehicles for driving and navigation, and as such, HD maps need to include highly accurate and detailed information to ensure the safe operation of autonomous vehicles.”) teaches the semantic segmentation and determining the semantic segmentation based at least in part on the score (Arditi, at least one para. 0021; “A training sample may further be associated with a known, target output, which in particular embodiments may be existing HD map data 480 at that particular location (x, y), which may include labeled segments or bounding boxes that indicate particular types of objects (e.g., curbs, walls, dividers, buildings, structures, etc.) in the HD map data. As an example, a labeled segment or bounding box may indicate that a known lane divider, for example, is within a boundary of a particular three-dimensional region in the HD map.”) comprises at least one of:
determining to associate a semantic label with the sensor data based at least in part on determining that the score meets or exceeds a threshold score (Arditi, at least one para. 0048; “for each detected object, the system may check whether that object exists in the map data. In particular embodiments, the system may generate a confidence score representing the likelihood of the detected object being accounted for in the map data. The confidence score may be based on, for example, a similarity comparison of the measured size, dimensions, classification, and/or location of the detected object with known objects in the map data. At step 745, if the comparison results in a determination that the detected object(s) exists or is known in the HD map (e.g., the confidence score in the object existing in the map is higher than a threshold)”);
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the score and the value matrix; and
determining, by a decoder or a threshold value and based at least in part on the context vector, the semantic label to associate with the sensor data (Arditi, at least one para. 0049; “The map-updating operation may begin at step 750, where the system may send sensor data and associated data gathered at that particular location to a server.”).
The combination of Liu, COIMBRA DE ANDRADE, and Arditi is considered analogous to the claimed invention because all of the references are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the semantic segmentation of Arditi. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can determine whether a map-updating operation is necessary (Arditi; 0048).
The combination of Liu, COIMBRA DE ANDRADE, and Arditi does not explicitly teach determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the score and the value matrix; and
However, QIN in the same field of endeavor (QIN, at least one para. 0004; “Embodiments of the disclosure provide a computer-implemented method for executing an activation function of a neural network. The method can include: receiving a plurality of input vectors of input data; generating, among the plurality of input vectors, an estimation value associated with a subset of an input vector based on a weight vector of the activation function; determining whether the estimation value associated with the subset of the input vector satisfies a threshold condition; and determining an output of the activation function based on the estimation value.”) teaches determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights (QIN, at least one para. 0070; “In some embodiments, the neural network can perform the dot production on the second subset of weight elements and the second subset of input elements to generate a second dot product, and generate the output by adding the estimation value and the second dot product.”);
determining a context vector based at least in part on determining a second dot product of the score and the value matrix (QIN, at least one para. 0070; “In some embodiments, the neural network can perform the dot production on the second subset of weight elements and the second subset of input elements to generate a second dot product, and generate the output by adding the estimation value and the second dot product.”); and
The combination of Liu, COIMBRA DE ANDRADE, Arditi, and QIN is considered analogous to the claimed invention because all of the references are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the teaching of QIN. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can save computational resources (QIN; 0070).
Regarding claim 10, the combination of Liu and COIMBRA DE ANDRADE teaches the limitations of claim 8, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Currently Amended) The one or more non-transitory computer-readable media of claim 8, wherein the output comprises the detection and determining the detection based at least in part on the score (Liu, at least one para. 0166; “the computing system 1002 can implement the machine-learned model(s) 1010 to generate uncertainty data for object detections, predictions, and motion plan generation based on sensor data.”) comprises at least one of:
determining that the first dot product does not meet a threshold;
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the score and the value matrix; and
determining, by a decoder based at least in part on the context vector, the object detection.
The combination of Liu and COIMBRA DE ANDRADE does not explicitly teach the score comprises at least one of:
determining that the first dot product does not meet a threshold;
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the score and the value matrix; and
determining, by a decoder based at least in part on the context vector, the object detection.
However, Arditi in the same field of endeavor (Arditi, at least one para. 0015; “Subject matter described herein is generally directed to generating and updating HD maps using data from different, heterogeneous sources. As described above, HD maps may be used by autonomous vehicles for driving and navigation, and as such, HD maps need to include highly accurate and detailed information to ensure the safe operation of autonomous vehicles.”) teaches the score comprises at least one of:
determining that the first dot product does not meet a threshold (Arditi, at least one para. 0048; “for each detected object, the system may check whether that object exists in the map data. In particular embodiments, the system may generate a confidence score representing the likelihood of the detected object being accounted for in the map data. The confidence score may be based on, for example, a similarity comparison of the measured size, dimensions, classification, and/or location of the detected object with known objects in the map data. At step 745, if the comparison results in a determination that the detected object(s) exists or is known in the HD map (e.g., the confidence score in the object existing in the map is higher than a threshold)”);
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the score and the value matrix; and
determining, by a decoder based at least in part on the context vector, the object detection (Arditi, at least one para. 0049; “In particular embodiments, the server may also perform a comparison of the received data (and any objects detected therefrom) with a server-copy of the HD map to determine whether a mismatch exists.”).
The combination of Liu, COIMBRA DE ANDRADE, and Arditi is considered analogous to the claimed invention because all of the references are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the teaching of Arditi. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can determine whether a map-updating operation is necessary (Arditi; 0048).
The combination of Liu, COIMBRA DE ANDRADE, and Arditi does not explicitly teach
determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights;
determining a context vector based at least in part on determining a second dot product of the score and the value matrix; and
However, QIN in the same field of endeavor (QIN, at least one para. 0004; “Embodiments of the disclosure provide a computer-implemented method for executing an activation function of a neural network. The method can include: receiving a plurality of input vectors of input data; generating, among the plurality of input vectors, an estimation value associated with a subset of an input vector based on a weight vector of the activation function; determining whether the estimation value associated with the subset of the input vector satisfies a threshold condition; and determining an output of the activation function based on the estimation value.”) teaches determining a value matrix based at least in part on multiplying the second embedding with a third set of learned weights (QIN, at least one para. 0070; “In some embodiments, the neural network can perform the dot production on the second subset of weight elements and the second subset of input elements to generate a second dot product, and generate the output by adding the estimation value and the second dot product.”);
determining a context vector based at least in part on determining a second dot product of the score and the value matrix (QIN, at least one para. 0070; “In some embodiments, the neural network can perform the dot production on the second subset of weight elements and the second subset of input elements to generate a second dot product, and generate the output by adding the estimation value and the second dot product.”); and
The combination of Liu, COIMBRA DE ANDRADE, Arditi, and QIN is considered analogous to the claimed invention because all of the references are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the teaching of QIN. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can save computational resources (QIN; 0070).
Regarding claim 19, the combination of Liu and COIMBRA DE ANDRADE teaches the limitations of claim 17, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Original) The method of claim 17, wherein the output comprises the depth and determining the depth based at least in part on the score (Liu, at least one para. 0041; “In some examples, an end to end deep architecture for multiple image compression (e.g., stereo image compression) is provided. The architecture can provide implicit depth estimation and compression that are performed jointly in the machine learned image compression model”) comprises:
determining that the first dot product meets or exceeds a threshold;
determining a surface associated with the map data; and
associating a distance from a position of a sensor to the surface with the sensor data as the depth.
The combination of Liu and COIMBRA DE ANDRADE does not explicitly teach determining that the first dot product meets or exceeds a threshold;
determining a surface associated with the map data; and
associating a distance from a position of a sensor to the surface with the sensor data as the depth.
However, Arditi in the same field of endeavor (Arditi, at least one para. 0015; “Subject matter described herein is generally directed to generating and updating HD maps using data from different, heterogeneous sources. As described above, HD maps may be used by autonomous vehicles for driving and navigation, and as such, HD maps need to include highly accurate and detailed information to ensure the safe operation of autonomous vehicles.”) teaches determining that the first dot product meets or exceeds a threshold (Arditi, at least one para. 0048; “for each detected object, the system may check whether that object exists in the map data. In particular embodiments, the system may generate a confidence score representing the likelihood of the detected object being accounted for in the map data. The confidence score may be based on, for example, a similarity comparison of the measured size, dimensions, classification, and/or location of the detected object with known objects in the map data. At step 745, if the comparison results in a determination that the detected object(s) exists or is known in the HD map (e.g., the confidence score in the object existing in the map is higher than a threshold), then the system may not perform any map-updating operation and return to obtaining sensor data (e.g., step 710)”);
determining a surface associated with the map data (Arditi, at least one para. 0049; “the server may prioritize autonomous vehicles that are in the region (e.g., within a threshold distance) of where the new object is detected or have trajectories that would result in the vehicles being in that region in the near future.”, wherein the autonomous vehicle is seen as the associated surface with the first portion of the map data); and
associating a distance from a position of a sensor to the surface with the sensor data as the depth (Arditi, at least one para. 0049; “the server may prioritize autonomous vehicles that are in the region (e.g., within a threshold distance) of where the new object is detected or have trajectories that would result in the vehicles being in that region in the near future.”, wherein the region within a threshold distance is seen as the distance from a position of a sensor to the surface with the first portion of the sensor data as the depth).
The combination of Liu, COIMBRA DE ANDRADE, and Arditi is considered analogous to the claimed invention because all of the references are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the teaching of Arditi. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can identify any mismatch between the existing HD map and the current sensor measurement of the world (Arditi; 0049).
Claim(s) 11 is rejected under 35 U.S.C. 103 as being unpatentable over Liu and COIMBRA DE ANDRADE, and further in view of Bojarski (US 20200324795 A1).
Regarding claim 11, the combination of Liu and COIMBRA DE ANDRADE teaches the limitations of claim 8, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Original) The one or more non-transitory computer-readable media of claim 8, wherein: the operations further comprise receiving a dynamic object detection associated with a first sensor, the dynamic object detection indicating existence of a dynamic object in the environment based on first sensor data received from the first sensor (Liu, at least one para. 0054; “The vehicle sensor(s) 116 can be configured to acquire sensor data 118. This can include sensor data associated with the surrounding environment of the vehicle 102.”);
the output comprises the false positive indication indicating that the object detection is a false positive dynamic object; and
determining the false positive indication comprises determining that the first dot product meets or exceeds a threshold.
The combination of Liu and COIMBRA DE ANDRADE does not explicitly teach that the output comprises the false positive indication indicating that the object detection is a false positive dynamic object; and
determining the false positive indication comprises determining that the first dot product meets or exceeds a threshold.
However, Bojarski in the same field of endeavor (Bojarski, at least one para. 0019; “Systems and methods are disclosed related to leveraging map information for generating ground truth data for training neural networks for autonomous machine applications.”) teaches the output comprises the false positive indication indicating that the object detection is a false positive dynamic object (Bojarski, at least one para. 0101; “The DLA may be used to run any type of network to enhance control and driving safety, including for example, a neural network that outputs a measure of confidence for each object detection. Such a confidence value may be interpreted as a probability, or as providing a relative “weight” of each detection compared to other detections. This confidence value enables the system to make further decisions regarding which detections should be considered as true positive detections rather than false positive detections.”); and
determining the false positive indication comprises determining that the first dot product meets or exceeds a threshold (Bojarski, at least one para. 0101; “the system may set a threshold value for the confidence and consider only the detections exceeding the threshold value as true positive detections.”; in other words, the system can identify a false positive when the first dot product meets or exceeds the threshold value).
The combination of Liu and COIMBRA DE ANDRADE, along with Bojarski, is considered to be analogous to the claimed invention because all of them are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the object detection of Liu with the teaching of Bojarski. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can eliminate undesirable driving conditions (Bojarski; 0101).
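As a purely illustrative aside (not part of the claim mapping above), the thresholded dot-product determination recited in claim 11 can be sketched as follows; the function name, the example embeddings, and the threshold value of 0.5 are all hypothetical and are not taken from Liu, COIMBRA DE ANDRADE, or Bojarski:

```python
import numpy as np

def is_false_positive(first_embedding, second_embedding, threshold=0.5):
    # Compute the first dot product of the two embeddings and flag the
    # detection as a false positive when the product meets or exceeds
    # the threshold (hypothetical decision rule for illustration).
    dot = float(np.dot(first_embedding, second_embedding))
    return dot >= threshold

# Embeddings that agree strongly yield a large dot product.
a = np.array([0.6, 0.8])
b = np.array([0.6, 0.8])
print(is_false_positive(a, b))  # dot = 1.0, meets the 0.5 threshold -> True
```

Orthogonal embeddings (dot product 0.0) would fall below the same threshold and would not be flagged.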
Claim(s) 15 is rejected under 35 U.S.C. 103 as being unpatentable over Liu and COIMBRA DE ANDRADE, and further in view of Luyan LIU (US 20220215558 A1).
Regarding claim 15, the combination of Liu and COIMBRA DE ANDRADE teaches the limitations of claim 14, upon which the instant claim depends, as discussed supra. Further, Liu teaches (Previously Presented) The one or more non-transitory computer-readable media of claim 14, wherein:
the first machine-learned model comprises a first encoder (Liu, at least one para. 0029; “According to example aspects of the present disclosure, a machine-learned image compression model can include a first encoder configured to encode a first image into an image code such as a first latent image code in a latent space and a second encoder configured to encode a second image into a second image code such as a second latent image code in the latent space.”);
the second machine-learned model comprises a second encoder (Liu, at least one para. 0029; “According to example aspects of the present disclosure, a machine-learned image compression model can include a first encoder configured to encode a first image into an image code such as a first latent image code in a latent space and a second encoder configured to encode a second image into a second image code such as a second latent image code in the latent space.”);
a third encoder determines the third embedding, the operations further comprising a pre-training stage that comprises:
determining, by the third encoder based at least in part on a portion of the geometric data and a feature associated therewith, a training embedding;
determining, by a training decoder based at least in part on the training embedding,
a reconstruction of the portion of the geometric data and the feature;
determining a second loss based at least in part on a difference between the reconstruction and the geometric data and the feature; and
altering at least one of the third encoder, the training embedding, or the training decoder to reduce the second loss.
The combination of Liu and COIMBRA DE ANDRADE does not explicitly teach that a third encoder determines the third embedding, the operations further comprising a pre-training stage that comprises:
determining, by the third encoder based at least in part on a portion of the geometric data and a feature associated therewith, a training embedding;
determining, by a training decoder based at least in part on the training embedding,
a reconstruction of the portion of the geometric data and the feature;
determining a second loss based at least in part on a difference between the reconstruction and the geometric data and the feature; and
altering at least one of the third encoder, the training embedding, or the training decoder to reduce the second loss.
However, Luyan LIU in the same field of endeavor (Luyan LIU, at least one para. 0032; “Specifically, the computer device may process each two-dimensional slice of the three-dimensional image by using a method of performing object detection on a two-dimensional image, to obtain the two-dimensional object detection result of each two-dimensional slice of the three-dimensional image; and process each two-dimensional slice of the three-dimensional image by using an algorithm of performing edge detection on the two-dimensional image, to obtain the two-dimensional edge detection result of each two-dimensional slice of the three-dimensional image.”) teaches a third encoder determines the third embedding, the operations further comprising a pre-training stage that comprises (Luyan LIU, at least one para. 0096; “Specifically, the computer device may respectively pre-train the object detection model and the edge detection model under supervision. After pre-training, the two models are connected by using the mutual learning module, to obtain a mutual object and edge detection network, which is then further trained.”):
determining, by the third encoder based at least in part on a portion of the geometric data and a feature associated therewith, a training embedding (Luyan LIU, at least one para. 0097; “The object detection model and the edge detection model obtained through pre-training are used to obtain an initial two-dimensional detection result (a two-dimensional initial object detection result and a two-dimensional initial edge detection result) of a two-dimensional image according to the two-dimensional image.”);
determining, by a training decoder based at least in part on the training embedding,
a reconstruction of the portion of the geometric data and the feature (Luyan LIU, at least one para. 0097; “The mutual object and edge detection network obtained through further training is used to obtain a two-dimensional detection result of the two-dimensional image according to the two-dimensional image. The two-dimensional detection result is used to be stacked into a three-dimensional detection result, to be used in steps such as S106 and S108.”);
determining a second loss based at least in part on a difference between the reconstruction and the geometric data and the feature (Luyan LIU, at least one para. 0113; “In a specific embodiment, the loss function trained under supervision may be a binary classification category cross-entropy loss function”); and
altering at least one of the third encoder, the training embedding, or the training decoder to reduce the second loss (Luyan LIU, at least one para. 0112; “Specifically, a computer device may train the object detection model under supervision according to a training sample (a two-dimensional image) and a training label (an object detection label) of the training sample and by constructing a loss function.”).
The combination of Liu and COIMBRA DE ANDRADE, along with Luyan LIU, is considered to be analogous to the claimed invention because all of them are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the object detection of Liu with the teaching of Luyan LIU. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the accuracy of the detection results can be improved (Luyan LIU; 0082).
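Solely as illustrative context for the pre-training stage recited in claim 15 (an encoder produces a training embedding, a training decoder produces a reconstruction, a second loss is computed from the reconstruction difference, and the encoder, embedding, or decoder is altered to reduce that loss), a minimal sketch follows; the toy data, the linear encoder/decoder, and the learning rate are all hypothetical and are not drawn from Liu, COIMBRA DE ANDRADE, or Luyan LIU:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))           # toy stand-in for geometric data and a feature
W_enc = 0.1 * rng.normal(size=(4, 2))  # hypothetical "third encoder" (linear)
W_dec = 0.1 * rng.normal(size=(2, 4))  # hypothetical training decoder (linear)

lr = 0.05
losses = []
for _ in range(200):
    Z = X @ W_enc                           # training embedding
    X_hat = Z @ W_dec                       # reconstruction of the data and feature
    diff = X_hat - X
    losses.append(float((diff ** 2).mean()))  # second (reconstruction) loss
    g = 2.0 * diff / diff.size              # gradient of the mean-squared loss
    grad_dec = Z.T @ g                      # alter the decoder ...
    grad_enc = X.T @ (g @ W_dec.T)          # ... and the encoder to reduce the loss
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(losses[0] > losses[-1])  # the reconstruction loss decreases -> True
```

Gradient descent on the reconstruction loss is only one way to "alter ... to reduce the second loss"; the claim language does not specify the optimization method.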
Claim(s) 21 is rejected under 35 U.S.C. 103 as being unpatentable over Liu, and further in view of QIN (US 20210110234 A1).
Regarding claim 21, Liu teaches the limitations of claim 1, upon which the instant claim depends, as discussed supra. Further, Liu teaches (New) The system of claim 1, wherein determining the output (Liu, at least one para. 0166; “the computing system 1002 can implement the machine-learned model(s) 1010 to generate uncertainty data for object detections, predictions, and motion plan generation based on sensor data.”) further comprises comparing the attention score to a threshold score.
Liu does not explicitly teach comparing the attention score to a threshold score.
However, QIN in the same field of endeavor (QIN, at least one para. 0004; “Embodiments of the disclosure provide a computer-implemented method for executing an activation function of a neural network. The method can include: receiving a plurality of input vectors of input data; generating, among the plurality of input vectors, an estimation value associated with a subset of an input vector based on a weight vector of the activation function; determining whether the estimation value associated with the subset of the input vector satisfies a threshold condition; and determining an output of the activation function based on the estimation value.”) teaches comparing the attention score to a threshold score (QIN, at least one para. 0054; “Estimator 302 can further determine whether the estimation value associated with the input vector is less than a given threshold. For example, when the estimation value is greater than or equal to the given threshold, it can indicate that it is very likely the dot product associated with the input vector is positive and can be used as the output. Otherwise, when the estimation value is less than the given threshold, it is indicated that it is very likely the dot product associated with the input vector can be set to zero by a rectifier.”).
The combination of Liu and QIN is considered to be analogous to the claimed invention because both of them are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the teaching of QIN. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can save computational resources (QIN; 0055).
Claim(s) 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Liu, and further in view of Bojarski (US 20200324795 A1).
Regarding claim 22, Liu teaches the limitations of claim 1, upon which the instant claim depends, as discussed supra. Further, Liu teaches (New) The system of claim 1, wherein determining the attention score (Liu, at least one para. 0101; “In Equation 2, C.sub.d,i represents the cost of disparity d at pixel i. The pixel index that is d pixels to the right of pixel i is represented by (i, d). The volumetric warping provides a warped feature map g.sub.2.sup.t−1 which better aligns with the feature map of the second image. This can also be seen as an attention mechanism for each pixel i into the first image's feature map within a disparity range.”) further comprises:
determining, based at least in part on scaling the attention score according to a function, a scaled attention score; and
determining, based at least in part on applying a softmax function to the scaled attention score, the attention score.
Liu does not explicitly teach determining, based at least in part on scaling the attention score according to a function, a scaled attention score; and
determining, based at least in part on applying a softmax function to the scaled attention score, the attention score.
However, Bojarski in the same field of endeavor (Bojarski, at least one para. 0019; “Systems and methods are disclosed related to leveraging map information for generating ground truth data for training neural networks for autonomous machine applications.”) teaches determining, based at least in part on scaling the attention score according to a function, a scaled attention score (Bojarski, at least one para. 0041; “Although input layers, convolutional layers, pooling layers, ReLU layers, and fully connected layers are discussed herein with respect to the DNN 116, this is not intended to be limiting. For example, additional or alternative layers may be used in the DNN 116, such as normalization layers, SoftMax layers, and/or other layer types.”); and
determining, based at least in part on applying a softmax function to the scaled attention score, the attention score (Bojarski, at least one para. 0041; “Although input layers, convolutional layers, pooling layers, ReLU layers, and fully connected layers are discussed herein with respect to the DNN 116, this is not intended to be limiting. For example, additional or alternative layers may be used in the DNN 116, such as normalization layers, SoftMax layers, and/or other layer types.”).
The combination of Liu and Bojarski is considered to be analogous to the claimed invention because both of them are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the attention score of Liu with the teaching of Bojarski. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can prevent vanishing gradients and gain numerical stability during the machine-learning process.
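For context only, scaling raw attention scores and normalizing them with a softmax function, as recited in claim 22, is conventionally sketched as follows; the example scores and the dimension d_k are hypothetical values chosen for illustration and are not taken from Liu or Bojarski:

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def normalized_attention(scores, d_k):
    # Scale the raw scores by 1/sqrt(d_k), then apply softmax so the
    # scaled scores form a probability distribution.
    return softmax(scores / np.sqrt(d_k))

weights = normalized_attention(np.array([2.0, 1.0, 0.1]), d_k=4)
print(round(float(weights.sum()), 6))  # 1.0 — softmax output sums to one
```

The 1/sqrt(d_k) scaling keeps the softmax inputs in a range where gradients remain usable, which is the conventional rationale for the combination noted above.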
Regarding claim 23, Liu teaches the limitations of claim 1, upon which the instant claim depends, as discussed supra. Further, Liu teaches (New) The one or more non-transitory computer-readable media of claim 7, wherein the confidence score (Liu, at least one para. 0099; “A softmax layer can be applied to ensure the cost is normalized along the disparity dimension per pixel. Each value in the cost volume can be seen as a probability/confidence measure of the correct disparity at that coordinate.”) represents a likelihood that the at least one of the semantic segmentation, the object detection, the depth to the object, the localization error, or the false positive indication is accurate.
Liu does not explicitly teach a likelihood that the at least one of the semantic segmentation, the object detection, the depth to the object, the localization error, or the false positive indication is accurate.
However, Bojarski in the same field of endeavor (Bojarski, at least one para. 0019; “Systems and methods are disclosed related to leveraging map information for generating ground truth data for training neural networks for autonomous machine applications.”) teaches a likelihood that the at least one of the semantic segmentation, the object detection, the depth to the object, the localization error, or the false positive indication is accurate (Bojarski, at least one para. 0101; “The DLA may be used to run any type of network to enhance control and driving safety, including for example, a neural network that outputs a measure of confidence for each object detection. Such a confidence value may be interpreted as a probability, or as providing a relative “weight” of each detection compared to other detections. This confidence value enables the system to make further decisions regarding which detections should be considered as true positive detections rather than false positive detections.”).
The combination of Liu and Bojarski is considered to be analogous to the claimed invention because both of them are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the object detection of Liu with the teaching of Bojarski. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can eliminate undesirable driving conditions (Bojarski; 0101).
Regarding claim 24, Liu teaches the limitations of claim 1, upon which the instant claim depends, as discussed supra. Further, Liu teaches (New) The one or more non-transitory computer-readable media of claim 7, wherein the output (Liu, at least one para. 0166; “the computing system 1002 can implement the machine-learned model(s) 1010 to generate uncertainty data for object detections, predictions, and motion plan generation based on sensor data.”) comprises the false positive indication and, the operations further comprising: determining the false positive indication based at least in part on determining that an attention score associated with the first embedding and the second embedding meets or exceeds a threshold.
Liu does not explicitly teach the false positive indication and, the operations further comprising: determining the false positive indication based at least in part on determining that an attention score associated with the first embedding and the second embedding meets or exceeds a threshold.
However, Bojarski in the same field of endeavor (Bojarski, at least one para. 0019; “Systems and methods are disclosed related to leveraging map information for generating ground truth data for training neural networks for autonomous machine applications.”) teaches the false positive indication and, the operations further comprising: determining the false positive indication based at least in part on determining that an attention score associated with the first embedding and the second embedding meets or exceeds a threshold (Bojarski, at least one para. 0101; “The DLA may be used to run any type of network to enhance control and driving safety, including for example, a neural network that outputs a measure of confidence for each object detection. Such a confidence value may be interpreted as a probability, or as providing a relative “weight” of each detection compared to other detections. This confidence value enables the system to make further decisions regarding which detections should be considered as true positive detections rather than false positive detections.”).
The combination of Liu and Bojarski is considered to be analogous to the claimed invention because both of them are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the object detection of Liu with the teaching of Bojarski. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can eliminate undesirable driving conditions (Bojarski; 0101).
Claim(s) 25 is rejected under 35 U.S.C. 103 as being unpatentable over Liu, and further in view of QIN (US 20210110234 A1) and Bojarski (US 20200324795 A1).
Regarding claim 25, Liu teaches the limitations of claim 17, upon which the instant claim depends, as discussed supra. Further, Liu teaches (New) The method of claim 17, wherein the score represents an attention score, and wherein determining the attention score (Liu, at least one para. 0101; “In Equation 2, C.sub.d,i represents the cost of disparity d at pixel i. The pixel index that is d pixels to the right of pixel i is represented by (i, d). The volumetric warping provides a warped feature map g.sub.2.sup.t−1 which better aligns with the feature map of the second image. This can also be seen as an attention mechanism for each pixel i into the first image's feature map within a disparity range.”) comprises:
determining a value matrix based at least in part on multiplying the second embedding with a set of weights;
scaling the value matrix by a scaling factor and applying a softmax function to determine a normalized attention score;
determining a context vector based at least in part on determining a dot product of the normalized attention score and the value matrix; and
determining, based at least in part on the context vector, the attention score indicating a correlation strength between a region of the sensor data and the region of the map data.
Liu does not explicitly teach determining a value matrix based at least in part on multiplying the second embedding with a set of weights;
scaling the value matrix by a scaling factor and applying a softmax function to determine a normalized attention score;
determining a context vector based at least in part on determining a dot product of the normalized attention score and the value matrix; and
determining, based at least in part on the context vector, the attention score indicating a correlation strength between a region of the sensor data and the region of the map data.
However, QIN in the same field of endeavor (QIN, at least one para. 0004; “Embodiments of the disclosure provide a computer-implemented method for executing an activation function of a neural network. The method can include: receiving a plurality of input vectors of input data; generating, among the plurality of input vectors, an estimation value associated with a subset of an input vector based on a weight vector of the activation function; determining whether the estimation value associated with the subset of the input vector satisfies a threshold condition; and determining an output of the activation function based on the estimation value.”) teaches determining a value matrix based at least in part on multiplying the second embedding with a set of weights (QIN, at least one para. 0070; “In some embodiments, the neural network can perform the dot production on the second subset of weight elements and the second subset of input elements to generate a second dot product, and generate the output by adding the estimation value and the second dot product. ”);
scaling the value matrix by a scaling factor and applying a softmax function to determine a normalized attention score;
determining a context vector based at least in part on determining a dot product of the normalized attention score and the value matrix; and
determining, based at least in part on the context vector, the attention score indicating a correlation strength between a region of the sensor data and the region of the map data (QIN, at least one para. 0070; “In some embodiments, the neural network can perform the dot production on the second subset of weight elements and the second subset of input elements to generate a second dot product, and generate the output by adding the estimation value and the second dot product. ”).
The combination of Liu and QIN is considered to be analogous to the claimed invention because both of them are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the output of Liu with the teaching of QIN. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can save computational resources (QIN; 0070).
The combination of Liu and QIN does not explicitly teach scaling the value matrix by a scaling factor and applying a softmax function to determine a normalized attention score; and
determining a context vector based at least in part on determining a dot product of the normalized attention score and the value matrix.
However, Bojarski in the same field of endeavor (Bojarski, at least one para. 0019; “Systems and methods are disclosed related to leveraging map information for generating ground truth data for training neural networks for autonomous machine applications.”) teaches scaling the value matrix by a scaling factor and applying a softmax function to determine a normalized attention score (Bojarski, at least one para. 0041; “Although input layers, convolutional layers, pooling layers, ReLU layers, and fully connected layers are discussed herein with respect to the DNN 116, this is not intended to be limiting. For example, additional or alternative layers may be used in the DNN 116, such as normalization layers, SoftMax layers, and/or other layer types.”);
determining a context vector based at least in part on determining a dot product of the normalized attention score and the value matrix (Bojarski, at least one para. 0052; “The method 400, at block B408, includes generating, based at least in part on the HD map data and the localizing, ground truth data corresponding to the sensor data. For example, the ground truth generator 112 may generate the ground truth data corresponding to the instance of the sensor data 102 using the map data. In some embodiments, this may include transforming or shifting a coordinate system of the HD map 104 to a coordinate system of the vehicle 500, correlating the map data with the sensor data 102, and/or other processes described herein with respect to the coordinate transformer 108, the correlator 110, and/or the ground truth generator 112.”).
The combination of Liu, QIN, and Bojarski is considered to be analogous to the claimed invention because all of them are in the same field of image processing through machine learning as the claimed invention. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the attention score of Liu with the teaching of Bojarski. One of ordinary skill in the art would have been capable of applying a known technique to a known device (method, or product) that was ready for improvement, and the results would have been predictable to one of ordinary skill in the art, so that the system can prevent vanishing gradients and gain numerical stability during the machine-learning process.
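Solely as illustrative context for the sequence of operations recited in claim 25 (a value matrix from an embedding and weights, scaling, softmax normalization, a dot product of the normalized scores with the values, and a resulting context vector), a conventional scaled dot-product attention sketch follows; the shapes, the random weights, and the sensor/map embeddings are hypothetical and are not taken from Liu, QIN, or Bojarski:

```python
import numpy as np

rng = np.random.default_rng(1)

def attention_context(sensor_emb, map_emb, W_v, d_k):
    V = map_emb @ W_v                            # value matrix (embedding x weights)
    raw = sensor_emb @ map_emb.T                 # raw attention scores
    scaled = raw / np.sqrt(d_k)                  # scale by the scaling factor
    e = np.exp(scaled - scaled.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)  # softmax -> normalized scores
    context = weights @ V                        # dot product of scores and values
    return weights, context                      # context vector per sensor region

sensor = rng.normal(size=(3, 4))  # 3 hypothetical sensor-data regions
mapped = rng.normal(size=(5, 4))  # 5 hypothetical map-data regions
W_v = rng.normal(size=(4, 4))
weights, context = attention_context(sensor, mapped, W_v, d_k=4)
print(weights.shape, context.shape)  # (3, 5) (3, 4)
```

Each row of the normalized score matrix can be read as a correlation strength between one sensor-data region and each map-data region, consistent with the final limitation of the claim.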
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to UPUL P CHANDRASIRI, whose telephone number is (703) 756-5823. The examiner can normally be reached M-F, 8:30 am to 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christian Chace can be reached at 571-272-4190. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/U.P.C./Examiner, Art Unit 3665 /CHRISTIAN CHACE/Supervisory Patent Examiner, Art Unit 3665