DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is in response to the application filed on 3/15/2024 for application 18/606,329. Claims 1 – 20 are pending and have been examined.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy was filed on 4/29/2024.
Should applicant desire to obtain the benefit of foreign priority under 35 U.S.C. 119(a)-(d) prior to declaration of an interference, a certified English translation of the foreign application must be submitted in reply to this action. 37 CFR 41.154(b) and 41.202(e).
Failure to provide a certified translation may result in no benefit being accorded for the non-English application.
No action by applicant is required at this time.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 3/15/2024, 5/15/2024, and 9/17/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 4 – 9 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 4 – 6 recite the limitation "the evaluative feedback layer". There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the term is interpreted as “the evaluation feedback layer”.
Dependent claims 7 – 9 are rejected for the same reason.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1 – 3, 19 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, “Multitask Learning” in view of Yang, CN112861811.
Claim 1. Caruana discloses: An autonomous driving method implemented by using an automatic driving model, wherein the autonomous driving model comprises a multimodal encoding layer and a decision control layer, the multimodal encoding layer and the decision control layer are connected to form an end-to-end neural network model (Fig. 1.1 – 1.2 & sec. 1.2, “A learner that learns many related tasks at the same time can use these tasks as inductive bias for each other and thus better learn the domain's regularities”; in Fig. 1.2, the middle layer transforms/encodes the input data into a latent/implicit representation of the input data, and each task of the top layer (tasks 1 – 4) decodes/determines/decides the output value of the task) such that the decision control layer obtains autonomous driving strategy information based directly on the output of the multimodal encoding layer (sec. 2.1, “The principal task in both 1D-ALVINN and 2D-ALVINN is to predict steering direction (driving strategy information)”), and wherein the method comprises:
obtaining first input information of the multimodal encoding layer, wherein the first input information comprises navigation information of a target vehicle and perception information for surrounding environment of the target vehicle obtained by using one or more sensors (sec. 2.1 & fig. 2.1, “generates synthetic road images based on a number of user defined parameters such as road width, number of lanes, angle and field of view of the camera (sensor)”; “input pixels”; i.e., the inputs of the model are pictures which represent the surroundings of the vehicle during navigation, thus navigation information and perception information; the input simulates data obtained by a camera (sensor)), … inputting the first input information into the multimodal encoding layer to obtain an implicit representation corresponding to the first input information output by the multimodal encoding layer (refer to the mapping above & fig. 1.2; the outputs from the middle layer are the implicit representation of the input); and
inputting second input information including the implicit representation into the decision control layer to obtain target autonomous driving strategy information output by the decision control layer (refer to the mapping above & fig. 1.2; the outputs of the middle layer are input to the top layer, which outputs the steering direction (driving strategy)).
Caruana does not explicitly teach:
the perception information comprises current perception information and historical perception information for the surrounding environment of the target vehicle during vehicle driving process;
Yang, in the same field of endeavor, explicitly teaches:
the perception information comprises current perception information and historical perception information for the surrounding environment of the target vehicle during vehicle driving process (translation page 5, “applied to an automatic driving scene”; “carrying out target identification on the target frame signal based on the forward sequence arrangement result and the reverse sequence arrangement result to obtain a target identification result of the target frame signal”; “the signal acquisition module is used for acquiring a signal sequence to be processed; the signal sequence comprises a current target frame signal to be identified, which is acquired by a target sensor, and a plurality of historical frame signals acquired by the target sensor before the target frame signal;”);
Caruana and Yang both teach machine learning applications for autonomous driving and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, with a reasonable likelihood of success, to further include the current and historical sequence in the input data as taught by Yang in the system of Caruana’s teaching to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification “so that a more accurate and reliable target identification result can be obtained” (Yang translation page 7).
Claim 2. The combination of Caruana and Yang renders obvious all the limitations of Claim 1. The combination further teaches: the autonomous driving model further comprises a future prediction layer, and wherein the method further comprises: inputting the implicit representation into the future prediction layer to obtain future prediction information for the surrounding environment of the target vehicle output by the future prediction layer (Caruana, sec. 4.5, “using future to predict the present”, “If output k refers to the prediction for the time series task at time Tk, this net makes predictions for the same task at K different times. Often, good performance is obtained if the output used for prediction is the middle output (temporally) so that there are tasks earlier and later than it trained on the net.”; the combination of Caruana and Yang teaches a multi-task application of detecting surroundings and controlling a vehicle (sec. 2.1), and Caruana further teaches adding prediction tasks in order to obtain better results), wherein the inputting the second input information including the implicit representation into the decision control layer to obtain the target autonomous driving strategy information output by the decision control layer comprises: inputting the second input information including at least a portion of the future prediction information and the implicit representation into the decision control layer to obtain the target autonomous driving strategy information output by the decision control layer (Examiner notes that the claimed limitation “including at least a portion of future prediction information and the implicit representation”, within the BRI, includes inputting only the implicit representation, since the implicit representation is a portion of the data in the recited list. As pointed out in the mapping of Claim 1 & Caruana fig. 1.2, the Caruana reference fulfills this limitation).
Claim 3. The combination of Caruana and Yang renders obvious all the limitations of Claim 2. The combination further teaches: the autonomous driving model further comprises a perception detection layer, and wherein the method further comprises: inputting the implicit representation into the perception detection layer to obtain target detection information for the surrounding environment of the target vehicle output by the perception detection layer (Caruana, sec. 2.1 & fig. 1.2, “eight additional tasks were used: whether the road is one or two lanes; location of centerline (2-lane roads only); location of left edge of road location of right edge of road; location of road center; intensity of road surface; intensity of region bordering road; intensity of centerline (2-lane roads only)”; i.e., these tasks are perception detection, thus a perception detection layer), wherein the target detection information comprises current detection information and historical detection information (refer to the mapping in Claim 1 & Yang, translation pages 9 – 10, “obtain a positive sequence recognition result of the current signal frame”; i.e., recognition/detection in both the current and historical frames), the current detection information comprises types and current state information of a plurality of obstacles and road surface elements in the surrounding environment of the target vehicle (refer to the mapping above; the road center and edge of road are obstacles, and the road surface and centerline are the road surface elements), and the historical detection information comprises types and historical state information of a plurality of obstacles in the surrounding environment of the target vehicle (refer to the mapping above; the combination teaches that the system recognizes the road center and edge of road (different types of obstacles) and location (state) in the current frame and historical frames), wherein the inputting the second input information including the implicit representation
into the decision control layer to obtain the target autonomous driving strategy information output by the decision control layer comprises: inputting the second input information including at least a portion of the target detection information and the implicit representation into the decision control layer to obtain the target autonomous driving strategy information output by the decision control layer (Examiner notes that the claimed limitation “including at least a portion of target detection information and the implicit representation”, within the BRI, includes inputting only the implicit representation, since the implicit representation is a portion of the data in the recited list. As pointed out in the mapping of Claim 1 & Caruana fig. 1.2, the Caruana reference fulfills this limitation).
Claim 19 is the corresponding electronic device claim of Claim 1. Yang further teaches: one or more processors (Yang, translation page 3, “at least one processor”) and memory storing one or more programs configured to be executed by the one or more processors (Yang, translation page 6, “a memory communicatively coupled to the at least one processor; wherein, the memory stores instructions executable by the at least one processor”). Claim 19 is rejected for the same reasons as Claim 1.
Claim 20 is the corresponding non-transitory computer-readable storage medium claim of Claim 19. Claim 20 is rejected for the same reasons.
Claim(s) 4 – 6 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, “Multitask Learning” in view of Yang, CN112861811 as applied to Claim 3 above, and further in view of Wang CN120792954.
Claim 4. The combination of Caruana and Yang renders obvious all the limitations of Claim 3. The combination does not explicitly teach: the autonomous driving model further comprises an evaluation feedback layer, and wherein the method further comprises: inputting the implicit representation into the evaluative feedback layer to obtain evaluative feedback information for the target autonomous driving strategy information output by the evaluative feedback layer.
Wang, in the same field of endeavor, when combined with Caruana and Yang, renders obvious the following limitation:
the autonomous driving model further comprises an evaluation feedback layer (Wang, translation page 2, “the invention provides a steer-by-wire angle error correction (evaluation feedback) system based on neural network compensation”, “The system can predict probability distribution of future errors, actively detect environment according to the predicted uncertainty and correct own model on line, thereby realizing quick, accurate and robust correction of steering angle errors”; translation page 4, “This architecture not only provides great freedom for interior design of the vehicle, but also paves the way for integrating advanced driving assistance systems and automatic driving functions.”), and wherein the method further comprises: inputting the implicit representation into the evaluative feedback layer to obtain evaluative feedback information for the target autonomous driving strategy information output by the evaluative feedback layer (Wang, translation page 5, “The prediction module is connected with the acquisition module and is characterized in that the core of the prediction module is a time sequence error prediction integrated network … The real-time state vectors are input to the integrated network, and each independent time-series neural network independently outputs a predicted sequence of angular errors over a period of time in the future. By employing an integrated structure, the present system is able to predict the future from multiple perspectives, which is the basis for quantifying uncertainty.”; translation page 7, “The main controller acquires key information such as longitudinal speed, yaw rate, lateral acceleration, etc. of the vehicle from other control units”; Wang teaches using sensor data and navigation information (speed, yaw rate, lateral acceleration, etc.)
as input for the error correction; Caruana teaches using a common encoding layer of related tasks to learn and generate a better model; the combination renders obvious using a shared encoding layer for the control model and the correction model.)
Caruana (in view of Yang) and Wang both teach machine learning applications for autonomous driving and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, with a reasonable likelihood of success, to further include the feedback/correction model/task as taught by Wang in the multitask model of Caruana (in view of Yang)’s teaching to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification so that “the steering error is corrected rapidly, accurately and robustly, and the self-adaptive capacity and safety of the system under complex and variable working conditions are improved remarkably” (Wang translation page 1).
Claim 5. The combination of Caruana, Yang and Wang renders obvious all the limitations of Claim 4. The combination further teaches: the inputting the implicit representation into the evaluative feedback layer to obtain the evaluative feedback information for the target autonomous driving strategy information output by the evaluative feedback layer comprises: inputting at least a portion of one or both of the future prediction information and the target detection information, and the implicit representation into the evaluation feedback layer to obtain the evaluative feedback information for the target autonomous driving strategy information output by the evaluative feedback layer (Examiner notes that the claimed limitation “inputting at least a portion of one or both of the future prediction information and the target detection information and the implicit representation”, within the BRI, includes inputting only the implicit representation, since the implicit representation is a portion of the data in the recited list. As pointed out in the mapping of Claim 1 & Caruana fig. 1.2, the Caruana reference fulfills this limitation).
Claim 6. The combination of Caruana, Yang and Wang renders obvious all the limitations of Claim 4. The combination further teaches: the inputting the implicit representation into the evaluative feedback layer to obtain the evaluative feedback information for the target autonomous driving strategy information output by the evaluative feedback layer comprises: inputting the implicit representation and the target autonomous driving strategy information into the evaluation feedback layer to obtain the evaluative feedback information for the target autonomous driving strategy information output by the evaluative feedback layer (refer to the mapping in Claim 4 & Wang, translation page 5 & fig. 1, “The real-time status vector may include information such as commanded steering angle”; the commanded steering angle, in this case, is the driving strategy information generated by the control layer. Caruana, sec. 6.3.3 & fig. 6.4, “in Figure 6.4 separate nets were used to learn the models for the extra tasks before the main net was trained on the main task. In other words, the models for the extra tasks were learned with STL. Instead, we can use MTL to learn the models for the extra tasks. MTL should yield better predictions on average for the extra task signals and should yield a more useful hidden layer in the MTL net learned for the extra tasks. This level is a straightforward application of MTL to feature nets”; i.e., Caruana suggests using multitask learning on the extra tasks to learn implicit features and combining the implicit features with the outputs from the extra tasks to improve the model. See illustration below:
[media_image1.png (greyscale): illustration of combining the implicit representation with the extra-task outputs]
The combination of Caruana, Yang and Wang renders this limitation obvious.)
Claim(s) 7 – 9 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, “Multitask Learning” in view of Yang, CN112861811 and Wang CN120792954 as applied to Claim 4 above, and further in view of Jin “Adapt: Action-aware Driving Caption Transformer”.
Claim 7. The combination of Caruana, Yang and Wang renders obvious all the limitations of Claim 4. The combination does not explicitly teach: the autonomous driving model further comprises an interpretation layer, and wherein the method further comprises: inputting the implicit representation into the interpretation layer to obtain interpretation information for the target autonomous driving strategy information output by the interpretation layer, wherein the interpretation information can represent a decision category of the target autonomous driving strategy information.
Jin, in the same field of endeavor, explicitly teaches:
the autonomous driving model further comprises an interpretation layer (Jin, sec. III & Fig. 2(a), “multi-task learning”, “The purpose of the text generation head (interpretation layer) is to generate … sentences that describe … the action of the vehicle”), and wherein the method further comprises: inputting the implicit representation into the interpretation layer to obtain interpretation information for the target autonomous driving strategy information output by the interpretation layer (Jin, sec. III & fig. 2, “encode video frames into video feature tokens”; fig. 2 shows that the encoded video tokens (implicit representation) are the input to both the control head and the text generation head), wherein the interpretation information can represent a decision category of the target autonomous driving strategy information (refer to the mapping above; the actions of the vehicle, for example slowing down or speeding up, are the decision categories).
Caruana (in view of Yang and Wang) and Jin both teach machine learning applications for autonomous driving and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, with a reasonable likelihood of success, to further include the autonomous action interpretation model as taught by Jin in the multitask model of Caruana (in view of Yang and Wang)’s teaching to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification to reduce “the lack of transparency and interpretability of the automatic decision making process” which “hinders its industrial adoption in practice.” (Jin, Abs.).
Claim 8. The combination of Caruana, Yang, Wang and Jin renders obvious all the limitations of Claim 7. The combination further teaches: the inputting the implicit representation into the interpretation layer to obtain the interpretation information for the target autonomous driving strategy information output by the interpretation layer comprises: inputting at least a portion of one or both of future prediction information and target detection information, and the implicit representation into the interpretation layer to obtain the interpretation information for the target autonomous driving strategy information output by the interpretation layer (Examiner notes that the claimed limitation “inputting at least a portion of one or both of the future prediction information and the target detection information and the implicit representation”, within the BRI, includes inputting only the implicit representation, since the implicit representation is a portion of the data in the recited list. As pointed out in the mapping of Claim 7, the combination of references fulfills this limitation).
Claim 9. The combination of Caruana, Yang, Wang and Jin renders obvious all the limitations of Claim 7. The combination further teaches: the inputting the implicit representation into the interpretation layer to obtain the interpretation information for the target autonomous driving strategy information output by the interpretation layer comprises: inputting the implicit representation and the target autonomous driving strategy information into the interpretation layer to obtain the interpretation information for the target autonomous driving strategy information output by the interpretation layer (Jin, sec. IV.D, “can we simply pass the control signals to the multi-modal transformer to get the final caption prediction? So we create such an architecture that takes video tokens, control signal tokens (generated by a learnable embedding layer) and masked text tokens as input and generates predictions of the masked tokens, which is referred to as ‘Single+’”; i.e., both the implicit representation and the control output are input to the interpretation layer).
Claim(s) 10 – 14 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, “Multitask Learning” in view of Yang, CN112861811 as applied to Claim 1 above, and further in view of Ashish US20240211647.
Claim 10. The combination of Caruana and Yang renders obvious all the limitations of Claim 1. The combination further teaches: the multimodal encoding layer and the decision control layer of the automatic driving model are obtained by performing a first training process for training on an initial multimodal encoding layer and an initial decision control layer (Caruana, sec. 1.2, “Backpropagation is applied to these net by training each net”), and wherein the first training process comprises:
obtaining first sample input information and first autonomous driving strategy information corresponding to the first sample input information, wherein the first sample input information comprises first sample navigation information of a first sample vehicle and sample perception information for surrounding environment of the first sample vehicle, and the sample perception information comprises current sample perception information and historical sample perception information for the surrounding environment of the first sample vehicle (refer to the mapping in Claim 1; the model’s input includes current and historical perception information. Caruana, sec. 1.1 & eq. 1.1 – 1.2 show that the training data includes the model inputs and the predicted control output values y);
inputting the first sample input information into the initial multimodal encoding layer to obtain a first sample implicit representation output by the initial multimodal encoding layer; inputting intermediate sample input information including the first sample implicit representation into the initial decision control layer to obtain first prediction autonomous driving strategy information output by the initial decision control layer; and adjusting one or more parameters of the initial multimodal encoding layer and the initial decision control layer based on at least the first prediction autonomous driving strategy information and the first real autonomous driving strategy information (Caruana, sec. 1.1 & eq. 1.2, “supervised learning”; the claim recites generic backpropagation supervised learning. Based on the network structure of Fig. 1.2, the inputs are fed into the middle (multimodal encoding) layer, the encoded data (implicit representation) are fed into the upper (decision control) layer, and the model output is backpropagated to each layer to adjust the model parameters of each layer of the initial model).
The combination of Caruana and Yang does not explicitly teach: real autonomous driving strategy.
Ashish, in the same field of endeavor, explicitly teaches: real autonomous driving strategy (Ashish, 0004, “a method of training an autonomous vehicle control system model includes accessing three dimensional multi-sensor data associated with a plurality of real-world drives in a sensor equipped vehicle”; i.e., the training of the autonomous system is based on real-world driving and sensing).
Caruana (in view of Yang) and Ashish both teach machine learning applications for autonomous driving and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, with a reasonable likelihood of success, to further use real-world driving data and strategies for training as taught by Ashish in the multitask model of Caruana’s teaching to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification so that the autonomous driving system “may then mimic that good driving behavior in the future” (Ashish, 0002) and the reaction would be appropriate (Ashish, 0038).
Claim 11. The combination of Caruana, Yang, and Ashish renders obvious all the limitations of Claim 10. The combination further teaches: before the first training process, performing an offline pre-training on the initial multimodal encoding layer and the initial decision control layer such that the autonomous driving model can obtain the first prediction autonomous driving strategy information based on the first sample input information; wherein the first training process further comprises: performing a first autonomous driving using the autonomous driving model obtained by the offline pre-training; and obtaining the first sample input information and the first real autonomous driving strategy information corresponding to the first sample input information during the first autonomous driving (Ashish, 0038, “the behavior of the autonomous driving system 120 is inappropriate may indicate a need for further training”; i.e., if a trained (pre-trained) model performs inappropriately, it needs to be trained again; 0072 – 0083, “simulating operation of an autonomous vehicle control system”, “training an autonomous vehicle control system model”; i.e., the model is trained offline rather than during real-world driving).
Claim 12. The combination of Caruana, Yang, and Ashish renders obvious all the limitations of Claim 11. The combination further teaches: the autonomous driving model further comprises a perception detection layer and a future prediction layer (refer to the mapping in Claims 1 – 3 & Caruana, sec. 2.1 & 4.5; Caruana suggests a multitask learning system including a detection task (layer) and a prediction task (layer) for the detection task), and performing the offline pre-training on the initial multimodal encoding layer comprises:
obtaining second sample input information as well as first real detection information and first future real information for surrounding environment of a second sample vehicle corresponding to the second sample input information, wherein the second sample input information comprises second sample navigation information of the second sample vehicle and sample perception information for the surrounding environment of the second sample vehicle, the first real detection information comprises types, real current state information and real history state information of a plurality of real sample obstacles in the surrounding environment of the second sample vehicle, and types and real current state information of a plurality of real sample road surface elements, and the first future real information comprises real detection information at a future moment (refer to the mapping in Claims 1 – 3 & Claim 11. The claimed limitations refer to the training data of a supervised learning process for model training of the detection task and the future prediction task. Caruana teaches a supervised learning process as mapped in Claim 11, with a detection task and a future prediction task as mapped in Claims 2 – 3. The claimed second sample vehicle recites crowd-sourced training samples collected by more than one vehicle. Ashish, 0034, “Systems and methods as described herein are configured to modify, annotate, or otherwise augment the multi-sensor data captured by vehicles to make that data amenable to location by search for driving system training and verification.” Thus, the combination teaches the claimed limitations.);
inputting the second sample input information into the initial multimodal encoding layer to obtain a second sample implicit representation corresponding to the second sample input information output by the initial multimodal encoding layer; inputting the second sample implicit representation into the perception detection layer to obtain first prediction detection information output by the perception detection layer, wherein the first prediction detection information comprises types, prediction current state information and prediction history state information of a plurality of prediction sample obstacles, and types and prediction current state information of a plurality of prediction sample road surface elements in the surrounding environment of the second sample vehicle (refer to the mapping above and Claims 1 – 3 & 11; the claimed limitations describe the supervised training steps and data of a perception detection layer in multitask model training. The combination of Caruana, Yang, and Ashish renders these limitations obvious);
inputting the second sample implicit representation into the future prediction layer to obtain first future prediction information output by the future prediction layer; adjusting one or more parameters of the initial multimodal encoding layer based on the first real detection information and the first prediction detection information, as well as the first future real information and the first future prediction information; adjusting one or more parameters of the perception detection layer based on the first real detection information and the first prediction detection information; and adjusting one or more parameters of the future prediction layer based on the first future real information and the first future prediction information (refer to the mapping above and Claims 1 – 3 & 10 – 11; the claimed limitations describe the supervised training steps and data of a future prediction layer in multitask model training. The combination of Caruana, Yang, and Ashish renders these limitations obvious.)
Claim 13. The combination of Caruana, Yang, and Ashish renders obvious all the limitations of Claim 11. The combination further teaches: the autonomous driving model further comprises a future prediction layer (refer to the mapping in Claims 10 – 11 and Claims 1 – 3; the multitask model of Caruana includes a future prediction task (layer)), and performing the offline pre-training on the initial multimodal encoding layer and the initial decision control layer comprises:
obtaining third sample input information as well as second future real information and second real autonomous driving strategy information for surrounding environment of a third sample vehicle corresponding to the third sample input information, wherein the third sample input information comprises third sample navigation information of the third sample vehicle and sample perception information for the surrounding environment of the third sample vehicle (refer to the mapping above & Claims 1 – 3 and 10 – 11; the limitations recite the training data of a supervised learning model for the future prediction task as described in Claim 2. The combination renders obvious the third set of sample data as pointed out in Claims 10 – 11);
inputting the third sample input information into the initial multimodal encoding layer to obtain a third sample implicit representation corresponding to the third sample input information output by the initial multimodal encoding layer; inputting the third sample implicit representation into the future prediction layer to obtain second future prediction information output by the future prediction layer; inputting a sample intermediate representation including the third sample implicit representation into the initial decision control layer to obtain second prediction autonomous driving strategy information output by the initial decision control layer (refer to the mapping above & Claims 1 – 3 and 10 – 11; the limitations recite the supervised training steps for the future prediction task (layer) and decision control task (layer) in a multitask model as described in Claim 2. Caruana secs. 1.1 – 1.2 & eqs. 1.1 – 1.2, “we simultaneously train a net to recognize object outlines, shapes, … etc., it may learn better to recognize complex objects in the real world. We call this approach to learning Multitask Learning (MTL).”, “learning multiple tasks in parallel while using a shared representation”; sec. 3.3, “The mechanism that allows backpropagation to benefit from these relationships is the summing of error gradient terms at the hidden layer from different task outputs.” i.e., the multiple tasks are trained together and thus use the same implicit representation as input);
adjusting one or more parameters of the future prediction layer based on the second future real information and the second future prediction information; adjusting one or more parameters of the initial multimodal encoding layer based on the second real autonomous driving strategy information and the second prediction autonomous driving strategy information, as well as the second future real information and the second future prediction information; and adjusting one or more parameters of the initial decision control layer based on the second real autonomous driving strategy information and the second prediction autonomous driving strategy information (refer to the mapping above; the limitations describe the backpropagation steps of parallel training and the parameter adjustment from multiple heads to the shared layers. Tuning model parameters through backpropagation from the output layer to the input layer is well known to one of ordinary skill in the art; such an understanding can easily be found online, for example WatElectronics, “What is Backpropagation Neural Network & Its Working”. Caruana further teaches in sec. 3.3 combining the adjustments from the multiple heads to the shared layer, and the combined references render the limitations obvious).
Claim 14. The combination of Caruana, Yang, and Ashish renders obvious all the limitations of Claim 13. The combination further teaches: the performing the offline pre-training on the initial multimodal encoding layer and the initial decision control layer comprises: inputting the third sample input information into a driving strategy prediction model to obtain second autonomous driving strategy real information output by the driving strategy prediction model (Caruana, sec. 4.5, “using future to predict the present”, “If output k refers to the prediction for the time series task at time Tk, this net makes predictions for the same task at K different times. Often, good performance is obtained if the output used for prediction is the middle output (temporally) so that there are tasks earlier and later than it trained on the net.”; i.e., Caruana teaches adding a future prediction model, in this case a driving strategy prediction model, which may improve the performance of the current driving strategy output by the decision control layer).
Claim(s) 15 – 16 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, “Multitask Learning”, in view of Yang, CN112861811, and Ashish, US20240211647, as applied to Claim 11 above, and further in view of Wang, CN120792954.
Claim 15. The combination of Caruana, Yang, and Ashish renders obvious all the limitations of Claim 11. The combination further teaches: performing the offline pre-training on the initial multimodal encoding layer and the initial decision control layer further comprises: obtaining fourth sample input information and third real autonomous driving strategy information corresponding to the fourth sample input information, wherein the fourth sample input information comprises fourth sample navigation information of the fourth sample vehicle and sample perception information for the surrounding environment of the fourth sample vehicle; inputting the fourth sample input information into the initial multimodal encoding layer to obtain a fourth sample implicit representation corresponding to the fourth sample input information output by the initial multimodal encoding layer; inputting intermediate sample input information including the fourth sample implicit representation into the initial decision control layer to obtain third prediction autonomous driving strategy information output by the initial decision control layer (the limitations describe the offline pre-training data and training steps of a multitask model; refer to the mapping in Claims 10 – 11. The combined references render obvious the claimed limitations of the fourth samples as pointed out in Claims 10 – 11.)
The combination does not explicitly teach:
the autonomous driving model further comprises an evaluation feedback layer
inputting the fourth sample implicit representation into the evaluation feedback layer to obtain sample evaluation feedback information for the third prediction autonomous driving strategy information output by the evaluation feedback layer; adjusting one or more parameters of the initial multimodal encoding layer and the initial decision control layer based on the sample evaluation feedback information for the third prediction autonomous driving strategy information, the third prediction autonomous driving strategy information and the third real autonomous driving strategy information.
Wang, in the same field of endeavor, explicitly teaches:
the autonomous driving model further comprises an evaluation feedback layer (Wang, translation page 2, “the invention provides a steer-by-wire angle error correction (evaluation feedback) system based on neural network compensation”, “The system can predict probability distribution of future errors, actively detect environment according to the predicted uncertainty and correct own model on line, thereby realizing quick, accurate and robust correction of steering angle errors”; translation page 4, “This architecture not only provides great freedom for interior design of the vehicle, but also paves the way for integrating advanced driving assistance systems and automatic driving functions.”)
inputting the fourth sample implicit representation into the evaluation feedback layer to obtain sample evaluation feedback information for the third prediction autonomous driving strategy information output by the evaluation feedback layer; adjusting one or more parameters of the initial multimodal encoding layer and the initial decision control layer based on the sample evaluation feedback information for the third prediction autonomous driving strategy information, the third prediction autonomous driving strategy information and the third real autonomous driving strategy information (the limitations recite the training of a multitask model including a decision control task/layer and a feedback/error-correction task/layer; refer to the mapping above & Claims 1, 10 – 11. Wang teaches adding an error detection neural network to the autonomous control system, and Caruana teaches training a multitask model to improve the performance on each task. The combination renders the limitations obvious).
Caruana (in view of Yang and Ashish) and Wang both teach machine learning applications for autonomous driving and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, with a reasonable expectation of success, to further include the feedback/correction model/task as taught by Wang in the multitask model of Caruana (in view of Yang and Ashish) to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification so that “the steering error is corrected rapidly, accurately and robustly, and the self-adaptive capacity and safety of the system under complex and variable working conditions are improved remarkably” (Wang, translation page 1).
Claim 16. The combination of Caruana, Yang, Ashish, and Wang renders obvious all the limitations of Claim 15. The combination further teaches: the training process of the evaluation feedback layer comprises: obtaining fifth sample input information and real evaluation feedback information for the fifth sample input information, wherein the fifth sample input information comprises fifth sample navigation information of the fifth sample vehicle and sample perception information for the surrounding environment of the fifth sample vehicle (refer to the mapping in Claims 10 – 11; the limitations describe the training data and training steps of a multitask model. The combined references render obvious the claimed limitations of the fifth samples as pointed out in Claims 10 – 11);
inputting the fifth sample input information into the initial multimodal encoding layer to obtain a fifth sample implicit representation corresponding to the fifth sample input information output by the initial multimodal encoding layer; inputting the fifth sample implicit representation into the evaluation feedback layer to obtain prediction evaluation feedback information for the fifth sample input information output by the evaluation feedback layer; and adjusting one or more parameters of the initial multimodal encoding layer and the evaluation feedback layer based on the real evaluation feedback information and the prediction evaluation feedback information (refer to the mapping above and Claims 1, 4 & 10 – 11; the claimed limitations describe the supervised training steps and data for the error-correction/evaluation feedback task (layer) in multitask model training. The combination of Caruana, Yang, Ashish, and Wang renders these limitations obvious.).
Claim(s) 17 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana, “Multitask Learning” in view of Yang, CN112861811, Ashish US20240211647, Wang CN120792954 as applied to Claim 15 above, and further in view of Unger, CN113168763.
Claim 17. The combination of Caruana, Yang, Ashish, and Wang renders obvious all the limitations of Claim 15. The combination further teaches: the first sample input information comprises an intervention identifier, … and the first training process further comprises: inputting the first sample implicit representation into the evaluation feedback layer to obtain sample evaluation feedback information for the first prediction autonomous driving strategy information output by the evaluation feedback layer, and wherein the adjusting one or more parameters of the initial multimodal encoding layer and the initial decision control layer based on at least the first prediction autonomous driving strategy information and the first real autonomous driving strategy information comprises: adjusting one or more parameters of the initial multimodal encoding layer and the initial decision control layer based on the sample evaluation feedback information, the intervention identifier, the first prediction autonomous driving strategy information and the first real autonomous driving strategy information (The claimed limitation of a human intervention identifier, as described on pages 34 – 35 of the description of the instant application, refers to the tagging of events (dangerous/anomalous) associated with the collected data and is used for selecting unfavorable/negative examples.
Ashish, 0032 – 0033, “Autonomous driving systems are often overtrained on uneventful data, while those same systems may be undertrained on less common scenarios”, “The performance of those driving systems may be improved by providing a greater number of training scenarios representative of those anomaly conditions”, “If the driver is not flagging (with identifier) or otherwise annotating driving events as they occur, an activity that might have safety implications of its own, identifying driving scenarios of interest for training and simulation after the fact may be very difficult”, “Systems and methods as described herein are configured to modify, annotate, or otherwise augment the multi-sensor data captured by vehicles to make that data amenable to location by search for driving system training and verification”; 0037, “The training/simulation module 114 may be configured to augment network link weights in the autonomous driving system based on the identified training scenarios 112 associated with the state pattern of the search criteria 110 so that the autonomous driving system 116 may better handle similar occurrences in the real world.” i.e., the system of Ashish flags collected training data with scenarios and updates model parameters based on the identified scenarios.).
The combination does not explicitly teach:
the intervention identifier can represent whether the first real autonomous driving strategy information is autonomous driving strategy information with human intervention
Unger, in the same field of endeavor, explicitly teaches:
the intervention identifier can represent whether the first real autonomous driving strategy information is autonomous driving strategy information with human intervention (Unger, translation page 2, “In the method, intervention data are provided by a plurality of vehicles, wherein the intervention data each represent an at least semiautonomous driving mode of a human intervention vehicle. The intervention data includes sensor data at a point in time of the intervention and location data for the location at the point in time of the intervention. A location at which an increased number of vehicle records (registratierens) are manually intervened is determined from the intervention data for the plurality of vehicles”, “it is now possible in a simple manner to determine at which locations an at least semi-autonomous driving operation has presumably been carried out by mistake”; i.e., the system of Unger adds a human intervention identifier to the recorded data, which can be used to identify anomalies in the autonomous driving behavior. Thus, the combination with Unger renders the claimed limitation obvious.)
Caruana (in view of Yang, Ashish, and Wang) and Unger both teach autonomous driving and data recording applications and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, with a reasonable expectation of success, to further include the human intervention tagging taught by Unger in the autonomous driving system of Caruana (in view of Yang, Ashish, and Wang) to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order to “further improve the driving operation” (Unger, translation page 4).
Claim 18. The combination of Caruana (in view of Yang, Ashish, and Wang) and Unger renders obvious all the limitations of Claim 17. The combination further teaches: the multimodal encoding layer and the decision control layer of the automatic driving model are obtained by further performing a second training process, and wherein the second training process comprises: performing a second autonomous driving by using the autonomous driving model obtained by the first training process, and obtaining sixth sample input information and fourth real autonomous driving strategy information corresponding to the sixth sample input information during the second autonomous driving, wherein the sixth sample input information comprises sixth sample navigation information of the sixth sample vehicle and sample perception information for the surrounding environment of the sixth sample vehicle; obtaining fourth prediction autonomous driving strategy information output by the autonomous driving model based on the sixth sample input information; and adjusting the one or more parameters of the initial multimodal encoding layer and the initial decision control layer again based on at least the fourth real autonomous driving strategy information and the fourth prediction autonomous driving strategy information (The recited second training process describes retraining after the first training process. Refer to the mapping in Claims 10 – 11 & Ashish 0038, “the behavior of the autonomous driving system 120 is inappropriate may indicate a need for further training”; i.e., Ashish teaches verifying the trained model and retraining the model when needed. The recited sixth samples and sixth sample vehicle recite crowd-sourced training data; refer to the mapping in Claim 12 & Ashish, 0034; Ashish teaches that data may be collected from crowd sources. The rest of the limitations refer to the training of the decision control task/layer, which has been mapped in Claim 10.).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571)272-9354. The examiner can normally be reached Monday- Friday 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, HELAL ALGAHAIM can be reached on (571) 270-5227. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHIEN MING CHOU/Examiner, Art Unit 3666
/HELAL A ALGAHAIM/SPE, Art Unit 3666