Prosecution Insights
Last updated: April 19, 2026
Application No. 18/608,250

EFFICIENT CLOUD-BASED DYNAMIC MULTI-VEHICLE BEV FEATURE FUSION FOR EXTENDED ROBUST COOPERATIVE PERCEPTION

Non-Final OA: §102, §103, §112
Filed: Mar 18, 2024
Examiner: ALFONSO, DENISE G
Art Unit: 2662
Tech Center: 2600 — Communications
Assignee: Qualcomm Incorporated
OA Round: 1 (Non-Final)
Grant Probability: 74% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 1m
Grant Probability with Interview: 94%

Examiner Intelligence

Career Allow Rate: 74% (76 granted / 103 resolved), +11.8% vs TC avg, above average
Interview Lift: +19.8% across resolved cases with interview (a strong, roughly +20% lift)
Typical Timeline: 3y 1m average prosecution; 31 applications currently pending
Career History: 134 total applications across all art units
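
The headline allow rate checks out against the underlying counts (assuming the dashboard rounds to whole percentage points):

$$\frac{76\ \text{granted}}{103\ \text{resolved}} \approx 0.738 \approx 74\%$$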

Statute-Specific Performance

§101: 8.3% (-31.7% vs TC avg)
§103: 59.8% (+19.8% vs TC avg)
§102: 19.4% (-20.6% vs TC avg)
§112: 8.1% (-31.9% vs TC avg)
Tech Center averages shown for comparison are estimates. Based on career data from 103 resolved cases.

Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (“IDS”) filed on 06/18/2024 and 08/19/2025 were reviewed and the listed references were noted.

Drawings

The 14-page drawings have been considered and placed on record in the file.

Status of Claims

Claims 1-30 are pending.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph: (A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; (B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and (C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 1 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. The phrase “grid-free” renders the claim indefinite because it is unclear what the term “grid-free” means in terms of vehicle data. For the purpose of furthering prosecution, Examiner has interpreted “grid-free” as data that does not have any grids shown optically on the data. Claims 27 and 30 are rejected for the same reason as above.

Claims 7-8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. The phrase “grid-free kernels” renders the claim indefinite because it is unclear what the term “grid-free kernels” means in terms of vehicle data. For the purpose of furthering prosecution, Examiner has interpreted “grid-free kernels” as a region of the data that does not have any grids shown optically. Claims 28-29 are rejected for the same reason as above.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention. (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 4-7, 9, 12, 15-20, 22, and 24-30 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chang et al., "BEV-V2X: Cooperative Birds-Eye-View Fusion and Grid Occupancy Prediction via V2X-Based Data Sharing" (2023), hereinafter referred to as Chang.

Claim 1

Chang discloses a system for processing data from a plurality of vehicles (Chang, Fig. 2), the system comprising: one or more memories (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) for storing vehicle data from each of the plurality of vehicles (Chang, Fig. 2, Vehicle No. 1, Vehicle No. 2, and Vehicle No. N), the vehicle data being grid-free (Chang, Fig. 2, Single Vehicle BEV Image, the BEV image in Fig. 2 is grid-free); and one or more processors in communication with the one or more memories (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”), the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090”) configured to: determine one or more first features from first vehicle data of the vehicle data (Chang, Fig. 2, Single Vehicle BEV Image from Vehicle No. 1, Section III.A, “With the help of various sensors, such as cameras and Lidar, the single vehicle perceives the surrounding environment. Then, the vehicle system converts the raw sensory data, such as images and point clouds into BEV space, and generates the local BEV centered on its own coordinates. BEV is a semantically composite data structure, which uses matrices to represent the occupancy of scenario elements within a certain spatial area. Each matrix element corresponds to the occupancy probability or state of each grid in the driving environment, which can be further summarized and displayed as RGB image.”, Section III.A.1.a, “At each grid location of the BEV, the occupying objects may include both vehicles and road elements, and they are not in conflict with each other.”), the first vehicle data being from a first vehicle of the plurality of vehicles (Chang, Fig. 2, Single Vehicle BEV Image from Vehicle No. 1); determine one or more second features from second vehicle data of the vehicle data (Chang, Fig. 2, Single Vehicle BEV Image from Vehicle No. 2, Section III.A.1, “With the help of various sensors, such as cameras and Lidar, the single vehicle perceives the surrounding environment. Then, the vehicle system converts the raw sensory data, such as images and point clouds into BEV space, and generates the local BEV centered on its own coordinates. BEV is a semantically composite data structure, which uses matrices to represent the occupancy of scenario elements within a certain spatial area. Each matrix element corresponds to the occupancy probability or state of each grid in the driving environment, which can be further summarized and displayed as RGB image.”, Section III.A.1.a, “At each grid location of the BEV, the occupying objects may include both vehicles and road elements, and they are not in conflict with each other.”), the second vehicle data being from a second vehicle of the plurality of vehicles (Chang, Fig. 2, Single Vehicle BEV Image from Vehicle No. 2); fuse the one or more first features and the one or more second features to generate fused features (Chang, Section III.A.2, “The roadside unit collects the local BEV information of all CAVs in the control area, and periodically extracts data in the historical time horizon. Combined with the internal pre-stored grid map data of the control area, the system calls the deployed BEV-V2X fusion and prediction model, and obtains the cooperative BEV occupancy of the global scenario in the future.”, Abstract, “connected and automated vehicles (CAVs)”); and generate a bird’s-eye-view (BEV) representation based on the fused features (Chang, Section III.A.2c, “The model outputs the BEV occupancy grid estimate for the whole control area in future time horizon F. The output [Pc(x,y)]C×HO×WO is the occupancy probability of C elements in the global grid network with the size of HO×WO. Further, the occupancy state representation and visual image of the global CBEV are generated by the transformation rules in Formula (1) and (2), which constitute the final fusion and prediction information”).

Claim 2

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) are further configured to send the BEV representation to at least one vehicle of the plurality of vehicles (Chang, Section I, “By extracting the single vehicle BEV data in the historical time horizons, we can integrate the perception information of different CAVs and predict the global BEV occupancy grid map in the future time horizons. This article focuses on BEV fusion and prediction. The fusion and prediction results can help achieve accurate environment perception and strengthen the understanding of the global scenario. Based on the results, the system can provide real-time driving risk warning, formulate the corresponding planning scheme, and send the messages to vehicles in the control area.”, Section 1, “Each connected and automated vehicle (CAV) regularly reports its own information to other vehicles or roadside units. By aggregating and fusing the data information from different CAVs, we can get a more accurate understanding of the global scenario”).

Claim 4

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein at least one processor of the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) that is configured to generate the BEV representation is located outside of the first vehicle and the second vehicle (Chang, Fig. 1, roadside unit, Section 1, “The roadside unit collects the local BEV of all CAVs in the control area, and periodically extracts the historical data.”, “In fact, there are basically two modes of vehicle-to-vehicle (V2V) communication and vehicle-to-infrastructure (V2I) communication to achieve BEV fusion via V2X technique. We recommend that the corresponding models be deployed at the roadside or cloud center.”).

Claim 5

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) are further configured to at least one of receive a first indication of the one or more first features or receive a second indication of the one or more second features (Chang, Section III.A.1.a, “At each grid location of the BEV, the occupying objects may include both vehicles and road elements, and they are not in conflict with each other. Therefore, we divide the scenario elements into different categories, i.e., dynamic traffic participants such as vehicles and pedestrians, and static road environment information such as drivable areas, lanes, traffic infrastructures, channelization, etc. We apply the symbol Pc(x,y) to denote the probability that the BEV position (x,y) is occupied by category c. [Pc(x,y)]C×H×W is the occupancy probability matrix of C elements in the grid network with the size of H × W.”).

Claim 6

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) are further configured to at least one of send a first indication of the one or more first features or receive a second indication of the one or more second features (Chang, Section I, “By extracting the single vehicle BEV data in the historical time horizons, we can integrate the perception information of different CAVs and predict the global BEV occupancy grid map in the future time horizons. This article focuses on BEV fusion and prediction. The fusion and prediction results can help achieve accurate environment perception and strengthen the understanding of the global scenario. Based on the results, the system can provide real-time driving risk warning, formulate the corresponding planning scheme, and send the messages to vehicles in the control area.”, Section 1, “Each connected and automated vehicle (CAV) regularly reports its own information to other vehicles or roadside units. By aggregating and fusing the data information from different CAVs, we can get a more accurate understanding of the global scenario”).

Claim 7

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the first vehicle data comprises first grid-free kernels (Chang, Fig. 2, Vehicle No. 1, Section III.A.1, “Each matrix element corresponds to the occupancy probability or state of each grid in the driving environment, which can be further summarized and displayed as RGB image.”, the grids are removed and it is displayed as an RGB image), wherein the second vehicle data comprises second grid-free kernels (Chang, Fig. 2, Vehicle No. 2, Section III.A.1, “Each matrix element corresponds to the occupancy probability or state of each grid in the driving environment, which can be further summarized and displayed as RGB image.”, the grids are removed and it is displayed as an RGB image), and wherein as part of fusing the one or more first features and the one or more second features, the one or more processors are configured to fuse the first grid-free kernels and the second grid-free kernels (Chang, Section III.A.2, “The roadside unit collects the local BEV information of all CAVs in the control area, and periodically extracts data in the historical time horizon. Combined with the internal pre-stored grid map data of the control area, the system calls the deployed BEV-V2X fusion and prediction model, and obtains the cooperative BEV occupancy of the global scenario in the future.”, Abstract, “connected and automated vehicles (CAVs)”).

Claim 9

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the vehicle data comprises at least one of vehicle pose, vehicle location, or vehicle trajectory (Chang, Fig. 2, the BEV data shows the location and pose of the vehicle), and wherein the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) are further configured to determine to not fuse a third one or more features from third vehicle data (Chang, Fig. 1, the roadside unit only fuses the data from vehicles A, B, and D, and not C and E), the third vehicle data being from a third vehicle of the plurality of vehicles (Chang, Fig. 1, Vehicles C or E, Section I, “After raw perception data are aggregated to the BEV, the corresponding grid position and associated confidence of vehicle C in the local BEV of A and B are also different. It is a critical issue to collect and fuse the local BEV of different vehicles to obtain global BEV with higher reliability and more comprehensive scenario understanding”) as part of generating the BEV representation based on the at least one of vehicle pose, vehicle location, or vehicle trajectory (Chang, Section III.A, “The roadside unit collects the local BEV of all CAVs in the control area, and periodically extracts the historical data. Using the data, the system calls the deep learning model to fuse the information from different vehicles, and predict the future cooperative BEV occupancy grid map.”).

Claim 12

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the BEV representation is a unified BEV representation for the plurality of vehicles (Chang, Section V, “The third is the method adopted by our model, which applies roadside unit or cloud center to collect all CAVs’ information in the control area in a unified and centralized manner for global fusion and prediction.”).

Claim 15

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) are configured to obtain vehicle data generated over time (Chang, Section III.A, “CAV sends its real-time SBEV data package to the roadside unit in a timer-trigger style”), wherein the vehicle data comprises information indicative of at least one of a change in pose of the first vehicle or a change in pose of the second vehicle (Chang, Section III.A, “We apply the symbol Pc(x,y) to denote the probability that the BEV position (x,y) is occupied by category c. [Pc(x,y)]C×H×W is the occupancy probability matrix of C elements in the grid network with the size of H × W.”), and wherein the change in pose of the first vehicle or the change in pose of the second vehicle comprise a change in at least one of rotation or translation (Chang, Section IV, “The simulation experiment of BEV fusion and prediction requires naturalistic driving scenario data, which contain the movement information of traffic participants in a certain spatial area and a continuous time range, as well as the environment information”).

Claim 16

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein a first feature of the one or more first features (Chang, Fig. 2, Single Vehicle BEV Image from Vehicle No. 1, Section III.A, “With the help of various sensors, such as cameras and Lidar, the single vehicle perceives the surrounding environment. Then, the vehicle system converts the raw sensory data, such as images and point clouds into BEV space, and generates the local BEV centered on its own coordinates. BEV is a semantically composite data structure, which uses matrices to represent the occupancy of scenario elements within a certain spatial area. Each matrix element corresponds to the occupancy probability or state of each grid in the driving environment, which can be further summarized and displayed as RGB image.”, Section III.A.1.a, “At each grid location of the BEV, the occupying objects may include both vehicles and road elements, and they are not in conflict with each other.”) corresponds to a second feature of the one or more second features (Chang, Fig. 2, Single Vehicle BEV Image from Vehicle No. 2, Section III.A.1, “With the help of various sensors, such as cameras and Lidar, the single vehicle perceives the surrounding environment. Then, the vehicle system converts the raw sensory data, such as images and point clouds into BEV space, and generates the local BEV centered on its own coordinates. BEV is a semantically composite data structure, which uses matrices to represent the occupancy of scenario elements within a certain spatial area. Each matrix element corresponds to the occupancy probability or state of each grid in the driving environment, which can be further summarized and displayed as RGB image.”, Section III.A.1.a, “At each grid location of the BEV, the occupying objects may include both vehicles and road elements, and they are not in conflict with each other.”), and wherein the first feature is represented with a different level of distortion than the second feature (Chang, Section III.A, “we divide the scenario elements into different categories, i.e., dynamic traffic participants such as vehicles and pedestrians, and static road environment information such as drivable areas, lanes, traffic infrastructures, channelization,”, the dynamic and static features will have different levels of distortion).

Claim 17

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein a first feature of the one or more first features is representative of a dynamic object (Chang, Section III.A, “we divide the scenario elements into different categories, i.e., dynamic traffic participants such as vehicles and pedestrians, and static road environment information such as drivable areas, lanes, traffic infrastructures, channelization”).

Claim 18

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein at least a portion of the system resides in a cloud-computing environment (Chang, Section V, “The third is the method adopted by our model, which applies roadside unit or cloud center to collect all CAVs’ information in the control area in a unified and centralized manner for global fusion and prediction.”).

Claim 19

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) are further configured to transmit BEV feature configuration information (Chang, Section III.A, “The model outputs the BEV occupancy grid estimate for the whole control area in future time horizon F. The output [Pc(x,y)]C×HO×WO is the occupancy probability of C elements in the global grid network with the size of HO×WO. Further, the occupancy state representation and visual image of the global CBEV are generated by the transformation rules in Formula (1) and (2), which constitute the final fusion and prediction information.”) to the first vehicle and the second vehicle for configuring BEV processing of the first vehicle and the second vehicle (Chang, Section I, “By extracting the single vehicle BEV data in the historical time horizons, we can integrate the perception information of different CAVs and predict the global BEV occupancy grid map in the future time horizons. This article focuses on BEV fusion and prediction. The fusion and prediction results can help achieve accurate environment perception and strengthen the understanding of the global scenario. Based on the results, the system can provide real-time driving risk warning, formulate the corresponding planning scheme, and send the messages to vehicles in the control area.”, Section 1, “Each connected and automated vehicle (CAV) regularly reports its own information to other vehicles or roadside units. By aggregating and fusing the data information from different CAVs, we can get a more accurate understanding of the global scenario”), wherein the BEV feature configuration information comprises at least one of BEV feature vector size, a model index for a model to transform two-dimensional camera images to BEV feature vectors, or the model (Chang, Section III.A, “The model outputs the BEV occupancy grid estimate for the whole control area in future time horizon F. The output [Pc(x,y)]C×HO×WO is the occupancy probability of C elements in the global grid network with the size of HO×WO. Further, the occupancy state representation and visual image of the global CBEV are generated by the transformation rules in Formula (1) and (2), which constitute the final fusion and prediction information.”).

Claim 20

Chang discloses the system of claim 19 (Chang, Fig. 2), wherein the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) are further configured to receive, from the first vehicle (Chang, Section III.A, “CAV sends its real-time SBEV data package to the roadside unit in a timer-trigger style”), a BEV feature vector in accordance with the BEV feature vector size, wherein the BEV feature vector comprises a raw BEV feature vector or a compressed BEV feature vector (Chang, Section III.A, “With the help of various sensors, such as cameras and Lidar, the single vehicle perceives the surrounding environment. Then, the vehicle system converts the raw sensory data, such as images and point clouds into BEV space, and generates the local BEV centered on its own coordinates. BEV is a semantically composite data structure, which uses matrices to represent the occupancy of scenario elements within a certain spatial area. Each matrix element corresponds to the occupancy probability or state of each grid in the driving environment, which can be further summarized and displayed as RGB image.”).

Claim 22

Chang discloses the system of claim 20 (Chang, Fig. 2), wherein the compressed BEV feature vector comprises soft BEV features or hard BEV features (Chang, Section III.A, “With the help of various sensors, such as cameras and Lidar, the single vehicle perceives the surrounding environment. Then, the vehicle system converts the raw sensory data, such as images and point clouds into BEV space, and generates the local BEV centered on its own coordinates. BEV is a semantically composite data structure, which uses matrices to represent the occupancy of scenario elements within a certain spatial area. Each matrix element corresponds to the occupancy probability or state of each grid in the driving environment, which can be further summarized and displayed as RGB image.”), wherein the soft BEV features comprise probabilities or likelihoods of object presence or object attributes, and wherein hard BEV features comprise binary or categorical representations of object presence or attributes (Chang, Section III.A, “At each grid location of the BEV, the occupying objects may include both vehicles and road elements, and they are not in conflict with each other. Therefore, we divide the scenario elements into different categories, i.e., dynamic traffic participants such as vehicles and pedestrians, and static road environment information such as drivable areas, lanes, traffic infrastructures, channelization, etc. We apply the symbol Pc(x,y) to denote the probability that the BEV position (x,y) is occupied by category c. [Pc(x,y)]C×H×W is the occupancy probability matrix of C elements in the grid network with the size of H × W.”).

Claim 24

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the system comprises an intermediate collaboration system (Chang, Fig. 1), wherein the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) are further configured to: receive, from the first vehicle (Chang, Fig. 2, Vehicle No. 1), an indication of the one or more first features (Chang, Section III.A, “At each grid location of the BEV, the occupying objects may include both vehicles and road elements, and they are not in conflict with each other. Therefore, we divide the scenario elements into different categories, i.e., dynamic traffic participants such as vehicles and pedestrians, and static road environment information such as drivable areas, lanes, traffic infrastructures, channelization, etc. We apply the symbol Pc(x,y) to denote the probability that the BEV position (x,y) is occupied by category c. [Pc(x,y)]C×H×W is the occupancy probability matrix of C elements in the grid network with the size of H × W.”); and receive, from the second vehicle (Chang, Fig. 2, Vehicle No. 2), an indication of the one or more second features (Chang, Section III.A, “At each grid location of the BEV, the occupying objects may include both vehicles and road elements, and they are not in conflict with each other. Therefore, we divide the scenario elements into different categories, i.e., dynamic traffic participants such as vehicles and pedestrians, and static road environment information such as drivable areas, lanes, traffic infrastructures, channelization, etc. We apply the symbol Pc(x,y) to denote the probability that the BEV position (x,y) is occupied by category c. [Pc(x,y)]C×H×W is the occupancy probability matrix of C elements in the grid network with the size of H × W.”).

Claim 25

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein as part of generating the BEV representation (Chang, Section III.A.2c, “The model outputs the BEV occupancy grid estimate for the whole control area in future time horizon F. The output [Pc(x,y)]C×HO×WO is the occupancy probability of C elements in the global grid network with the size of HO×WO. Further, the occupancy state representation and visual image of the global CBEV are generated by the transformation rules in Formula (1) and (2), which constitute the final fusion and prediction information”), the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) are configured to discretize grid-free kernels for at least one vehicle of the plurality of vehicles (Chang, Fig. 2, Vehicle No. 2, Section III.A.1, “Each matrix element corresponds to the occupancy probability or state of each grid in the driving environment, which can be further summarized and displayed as RGB image.”, the grids are removed and it is displayed as an RGB image).

Claim 26

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”) are further configured to, prior to or as part of generating the BEV representation, align grids associated with the first vehicle and the second vehicle (Chang, Section III.B, “Since the map information is pre-stored on the roadside, the system only needs to take the predicted grid occupancy state of vehicles and pedestrians, and then concatenate the tensors with the pre-stored standard map occupancy state as the final results.”).

Claims 27, 28, and 30

Claims 27 and 28 are rejected for similar reasons as those described in claims 1-2 and 7. The additional element in claims 27 and 28 that Chang discloses is a method for processing data from a plurality of vehicles (Chang, Fig. 2). Claim 30 is rejected for similar reasons as those described in claims 1-2. The additional element in claim 30 that Chang discloses is a system for processing data from a plurality of vehicles (Chang, Fig. 2).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.
Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 3, 11, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Chang in view of Ren et al., "Collaborative Perception for Autonomous Driving: Current Status and Future Trend" (2022), hereinafter referred to as Ren.

Claim 3

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein at least one processor of the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”). Chang does not explicitly disclose that generating the BEV representation is located in the first vehicle or the second vehicle. However, Ren teaches that generating the BEV representation (Ren, Fig. 5, shows the bird’s eye view collaboration detection) is located in the first vehicle or the second vehicle (Ren, Fig. 5, “The collaborative object detection at ego-vehicle”). Chang and Ren are both considered to be analogous to the claimed invention because they are in the same field of multiple vehicle data sharing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system as taught by Chang to incorporate the teachings of Ren that generating the BEV representation is located in the first vehicle or the second vehicle. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to improve the accuracy of environmental perception as well as robustness and safety of transportation systems (Ren, Section 1).

Claim 11

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein at least one processor of the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”). Chang does not explicitly disclose to generate a mask based on overlapping fields of view of at least one sensor system of the first vehicle and at least one sensor system of the second vehicle; and apply the mask to a plurality of first features as part of determining the one or more first features and apply the mask to a plurality of second features as part of determining the one or more second features. However, Ren teaches to generate a mask based on overlapping fields of view of at least one sensor system of the first vehicle and at least one sensor system of the second vehicle; and apply the mask to a plurality of first features as part of determining the one or more first features and apply the mask to a plurality of second features as part of determining the one or more second features (Ren, Fig. 4, collaborative perception mask). Chang and Ren are both considered to be analogous to the claimed invention because they are in the same field of multiple vehicle data sharing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system as taught by Chang to incorporate the teachings of Ren to generate a mask based on overlapping fields of view of at least one sensor system of the first vehicle and at least one sensor system of the second vehicle; and apply the mask to a plurality of first features as part of determining the one or more first features and apply the mask to a plurality of second features as part of determining the one or more second features. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to improve the accuracy of environmental perception as well as robustness and safety of transportation systems (Ren, Section 1).

Claim 14

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the first vehicle data is based on first sensor data from a first plurality of sensor systems of the first vehicle (Chang, Fig. 2, Vehicle No. 1, Section III.A, “With the help of various sensors, such as cameras and Lidar, the single vehicle perceives the surrounding environment. Then, the vehicle system converts the raw sensory data, such as images and point clouds into BEV space, and generates the local BEV centered on its own coordinates. BEV is a semantically composite data structure, which uses matrices to represent the occupancy of scenario elements within a certain spatial area. Each matrix element corresponds to the occupancy probability or state of each grid in the driving environment, which can be further summarized and displayed as RGB image.”) and the second vehicle data is based on second sensor data from a second plurality of sensor systems of the second vehicle (Chang, Fig. 2, Vehicle No. 2, Section III.A, “With the help of various sensors, such as cameras and Lidar, the single vehicle perceives the surrounding environment. Then, the vehicle system converts the raw sensory data, such as images and point clouds into BEV space, and generates the local BEV centered on its own coordinates. BEV is a semantically composite data structure, which uses matrices to represent the occupancy of scenario elements within a certain spatial area. Each matrix element corresponds to the occupancy probability or state of each grid in the driving environment, which can be further summarized and displayed as RGB image.”). Chang does not explicitly disclose wherein at least one sensor system of the first plurality of sensor systems is of a different type than each sensor system of the second plurality of sensor systems. However, Ren teaches wherein at least one sensor system of the first plurality of sensor systems is of a different type than each sensor system of the second plurality of sensor systems (Ren, Section 4.2, “Collaborative semantic segmentation of 3D scenes targets to produce semantic segmentation masks for each agent given observations (images, LIDAR point clouds, etc.) of 3D scenes from several agents”). Chang and Ren are both considered to be analogous to the claimed invention because they are in the same field of multiple vehicle data sharing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system as taught by Chang to incorporate the teachings of Ren wherein at least one sensor system of the first plurality of sensor systems is of a different type than each sensor system of the second plurality of sensor systems. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to improve the accuracy of environmental perception as well as robustness and safety of transportation systems (Ren, Section 1).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Chang in view of Cheng et al. (US 2022/0036098 A1), hereinafter referred to as Cheng.

Claim 13

Chang discloses the system of claim 1 (Chang, Fig. 2). Chang does not explicitly disclose wherein the first vehicle data is based on first sensor data from a first plurality of sensor systems of the first vehicle and the second vehicle data is based on second sensor data from a second plurality of sensor systems of the second vehicle, wherein the first sensor data and the second sensor data have a different resolution. However, Cheng teaches wherein the first vehicle data (Cheng, Fig. 6, vehicle 602A) is based on first sensor data from a first plurality of sensor systems of the first vehicle (Cheng, [0073], “The vehicle 602A can use one or more sensors 240 (such as one or more cameras 246) to acquire first environment data of at least a portion 608A of the external environment of the vehicle 602A”) and the second vehicle data (Cheng, Fig. 6, vehicle 602B) is based on second sensor data from a second plurality of sensor systems of the second vehicle (Cheng, [0073], “The vehicle 602B can use one or more sensors 240 (such as one or more cameras 246) to acquire second environment data of at least a portion 608B of the external environment of the vehicle 602B.”), wherein the first sensor data and the second sensor data have a different resolution (Cheng, [0076], “The vehicle 602A can identify the first environment data that is located in the common region 610, such as the first environment data that includes the person 606. The vehicle 602A can reduce the resolution of the first portion of the first environment data that is located in the common region 610. As an example, the vehicle 602A can downsample the first portion using a Canny edge detector. In some instances, the vehicle 602A can also compress the first portion using a neural network encoder/decoder and/or any suitable compression method.”). Chang and Cheng are both considered to be analogous to the claimed invention because they are in the same field of multiple vehicle data sharing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system as taught by Chang to incorporate the teachings of Cheng wherein the first vehicle data is based on first sensor data from a first plurality of sensor systems of the first vehicle and the second vehicle data is based on second sensor data from a second plurality of sensor systems of the second vehicle, wherein the first sensor data and the second sensor data have a different resolution. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to reduce redundant sensor data being transferred from the ego vehicle to the other vehicle (Cheng, [0016]).

Claims 10, 21, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Chang in view of Xu et al., "CoBEVT: Cooperative Bird’s Eye View Semantic Segmentation with Sparse Transformers", hereinafter referred to as Xu.

Claim 10

Chang discloses the system of claim 9 (Chang, Fig. 2). Chang does not explicitly disclose wherein as part of determining to not fuse the third one or more features, the one or more processors are configured to determine that the third one or more features based on a determination that the third vehicle will be outside a neighborhood in less than or equal to a predetermined threshold amount of time, the neighborhood comprising a geographical area including the plurality of vehicles. However, Xu teaches wherein as part of determining to not fuse the third one or more features, the one or more processors are configured to determine that the third one or more features based on a determination that the third vehicle will be outside a neighborhood in less than or equal to a predetermined threshold amount of time, the neighborhood comprising a geographical area including the plurality of vehicles (Xu, Section 4.2, “We assume all the AVs have a 70m communication range following, and all the vehicles out of this broadcasting radius of ego vehicle will not have any collaboration. For the OPV2V camera-track, we choose ResNet34 [52] as the image feature extractor in SinBEVT. The transmitted BEV intermediate representation has a resolution of 32 × 32 × 128. For the multi-agent fusion, our FuseBEVT component has 3 encoded layers and a window size of 8 for both local and global attention”). Chang and Xu are both considered to be analogous to the claimed invention because they are in the same field of multiple vehicle data sharing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system as taught by Chang to incorporate the teachings of Xu wherein as part of determining to not fuse the third one or more features, the one or more processors are configured to determine that the third one or more features based on a determination that the third vehicle will be outside a neighborhood in less than or equal to a predetermined threshold amount of time, the neighborhood comprising a geographical area including the plurality of vehicles. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to achieve better accuracy (Xu, Section 2.2).

Claim 21

Chang discloses the system of claim 20 (Chang, Fig. 2). Chang does not explicitly disclose wherein the compressed BEV feature vector is compressed using at least one of quantization, pruning, hashing, or transformation. However, Xu teaches wherein the compressed BEV feature vector is compressed (Xu, Fig. 1, Compressed BEV features) using at least one of quantization, pruning, hashing, or transformation (Xu, Section 1, “Each AV computes its own BEV representation from its camera rigs with the SinBEVT Transformer and then transmits it to others after compression. The receiver (i.e. other AVs) transforms the received BEV features onto its coordinate system, and employs the proposed FuseBEVT for BEV-level aggregation. The core ingredient of these two transformers is a novel fused axial attention (FAX) module, which can search over the whole BEV or camera image space across all agents or camera views via local and global spatial sparsity.”). Chang and Xu are both considered to be analogous to the claimed invention because they are in the same field of multiple vehicle data sharing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system as taught by Chang to incorporate the teachings of Xu wherein the compressed BEV feature vector is compressed using at least one of quantization, pruning, hashing, or transformation. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to achieve better accuracy (Xu, Section 2.2).

Claim 23

Chang discloses the system of claim 1 (Chang, Fig. 2), wherein the one or more processors (Chang, Section 4.A, “the CPU of the machine is Intel 10900X, and the GPU is RTX 3090. Our operating system is Ubuntu 18.04LTS with 128GB RAM”). Chang does not explicitly disclose wherein prior to fusing the one or more first features and the one or more second features, compress the one or more first features and the one or more second features using at least one of quantization, pruning, hashing, or transformation. However, Xu teaches wherein prior to fusing the one or more first features and the one or more second features (Xu, Fig. 2, aggregated BEV Features), compress the one or more first features and the one or more second features (Xu, Fig. 1, Compressed BEV features) using at least one of quantization, pruning, hashing, or transformation (Xu, Section 1, “Each AV computes its own BEV representation from its camera rigs with the SinBEVT Transformer and then transmits it to others after compression. The receiver (i.e. other AVs) transforms the received BEV features onto its coordinate system, and employs the proposed FuseBEVT for BEV-level aggregation. The core ingredient of these two transformers is a novel fused axial attention (FAX) module, which can search over the whole BEV or camera image space across all agents or camera views via local and global spatial sparsity.”). Chang and Xu are both considered to be analogous to the claimed invention because they are in the same field of multiple vehicle data sharing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system as taught by Chang to incorporate the teachings of Xu wherein prior to fusing the one or more first features and the one or more second features, compress the one or more first features and the one or more second features using at least one of quantization, pruning, hashing, or transformation. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to achieve better accuracy (Xu, Section 2.2).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENISE G ALFONSO whose telephone number is (571) 272-1360. The examiner can normally be reached Monday - Friday, 7:30 - 5:30. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amandeep Saini, can be reached at (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DENISE G ALFONSO/
Examiner, Art Unit 2662

/AMANDEEP SAINI/
Supervisory Patent Examiner, Art Unit 2662
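
The §102 mapping above leans repeatedly on Chang's occupancy representation, the C × H × W probability tensor [Pc(x,y)], and on claim 22's soft-versus-hard feature distinction. A minimal sketch of that data structure, assuming nothing beyond the quoted passages (Chang's actual transformation rules, Formulas (1) and (2), are not reproduced in the Office Action, so a plain threshold/argmax stands in for them here):

```python
import numpy as np

# Occupancy probability tensor [Pc(x, y)] with shape C x H x W:
# channel c holds, per BEV grid cell, the probability that
# category c (vehicle, pedestrian, drivable area, ...) occupies it.
C, H, W = 3, 64, 64
rng = np.random.default_rng(0)
soft_bev = rng.random((C, H, W)).astype(np.float32)  # "soft" BEV features

# "Hard" BEV features in claim 22's sense: binary or categorical
# occupancy states derived from the probabilities (stand-in rules,
# not Chang's Formulas (1) and (2)).
hard_binary = soft_bev > 0.5                # per-category binary occupancy
hard_categorical = soft_bev.argmax(axis=0)  # dominant category per cell

print(soft_bev.shape, hard_binary.dtype, hard_categorical.shape)
```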
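A companion sketch of the fusion pipeline the claims and the §103 combinations gesture at: align a second vehicle's BEV feature grid into the ego frame by a rotation and translation (claims 15 and 26), compress features by quantization before transmission (claims 21 and 23), and fuse at the feature level (claim 1). The nearest-neighbor warp, uint8 quantization, and mean fusion here are generic illustrative stand-ins, not the applicant's method and not what Chang, Xu, or the other cited references actually implement:

```python
import numpy as np

def quantize(feat, n_bits=8):
    """Uniform quantization: one simple instance of the 'quantization'
    compression named in claims 21 and 23."""
    lo, hi = float(feat.min()), float(feat.max())
    scale = (hi - lo) / (2 ** n_bits - 1) or 1.0
    return np.round((feat - lo) / scale).astype(np.uint8), lo, scale

def dequantize(q, lo, scale):
    return q.astype(np.float32) * scale + lo

def align_to_ego(feat, yaw, tx, ty):
    """Nearest-neighbor warp of an H x W feature grid by a relative pose
    (rotation yaw in radians, translation in grid cells) into the ego frame."""
    H, W = feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    c, s = np.cos(-yaw), np.sin(-yaw)   # inverse rotation for sampling
    sx = c * (xs - cx) - s * (ys - cy) + cx - tx
    sy = s * (xs - cx) + c * (ys - cy) + cy - ty
    sx = np.clip(np.round(sx).astype(int), 0, W - 1)
    sy = np.clip(np.round(sy).astype(int), 0, H - 1)
    return feat[sy, sx]

rng = np.random.default_rng(1)
ego_feat = rng.random((64, 64)).astype(np.float32)    # ego vehicle's BEV features
other_feat = rng.random((64, 64)).astype(np.float32)  # second vehicle's BEV features

q, lo, scale = quantize(other_feat)         # "compressed BEV feature vector"
received = dequantize(q, lo, scale)         # decoded at the cloud/roadside unit
aligned = align_to_ego(received, yaw=0.1, tx=3.0, ty=-2.0)
fused = 0.5 * (ego_feat + aligned)          # naive feature-level fusion
print(fused.shape)                          # (64, 64) unified BEV feature grid
```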

Prosecution Timeline

Mar 18, 2024: Application Filed
Feb 21, 2026: Non-Final Rejection under §102, §103, and §112 (current)
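
The roughly 23-month gap from filing to this first action can be checked directly from the timeline dates (a quick sketch; the 30.44-day average month is an assumption used only to convert days to months):

```python
from datetime import date

filed = date(2024, 3, 18)        # Application Filed
first_oa = date(2026, 2, 21)     # Non-Final Rejection mailed
elapsed_days = (first_oa - filed).days
print(elapsed_days / 30.44)      # ~23.2 months to first office action
```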

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586352: IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD AND STORAGE MEDIUM. Granted Mar 24, 2026 (2y 5m to grant).
Patent 12579693: ELECTRONIC SHELF LABEL MANAGING SERVER, DISPLAY DEVICE AND CONTROLLING METHOD THEREOF. Granted Mar 17, 2026 (2y 5m to grant).
Patent 12555371: VISION TRANSFORMER FOR MOBILENET SIZE AND SPEED. Granted Feb 17, 2026 (2y 5m to grant).
Patent 12541980: METHOD FOR DETERMINING OBJECT INFORMATION RELATING TO AN OBJECT IN A VEHICLE ENVIRONMENT, CONTROL UNIT AND VEHICLE. Granted Feb 03, 2026 (2y 5m to grant).
Patent 12541941: A Method for Testing an Embedded System of a Device, a Method for Identifying a State of the Device and a System for These Methods. Granted Feb 03, 2026 (2y 5m to grant).
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 74%
With Interview: 94% (+19.8%)
Median Time to Grant: 3y 1m
PTA Risk: Low
Based on 103 resolved cases by this examiner. Grant probability derived from career allow rate.
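
The with-interview projection follows from the career rate if the interview lift is read as additive percentage points (an assumption; the dashboard does not state its formula):

$$74\% + 19.8\ \text{pp} = 93.8\% \approx 94\%$$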
