Prosecution Insights
Last updated: April 19, 2026
Application No. 18/586,720

System and Method for Radar Object Detection and Tracking Using Cross-Frame Spatial-Temporal Relationality

Non-Final OA: §101, §102, §103, §112
Filed: Feb 26, 2024
Examiner: WAHEED, NAZRA NUR
Art Unit: 3648
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Mitsubishi Electric Research Laboratories Inc.
OA Round: 1 (Non-Final)
Grant Probability: 83% (Favorable)
OA Rounds: 1-2
To Grant: 2y 11m
With Interview: 94%

Examiner Intelligence

Career Allow Rate: 83% (above average; 184 granted / 221 resolved; +31.3% vs TC avg)
Interview Lift: +11.2% (moderate), measured across resolved cases with interview
Avg Prosecution: 2y 11m typical timeline; 37 applications currently pending
Career History: 258 total applications across all art units

Statute-Specific Performance

§101: 4.1% (-35.9% vs TC avg)
§102: 22.8% (-17.2% vs TC avg)
§103: 46.5% (+6.5% vs TC avg)
§112: 23.6% (-16.4% vs TC avg)
Tech Center averages are estimates. Based on career data from 221 resolved cases.

Office Action

Rejections: §101, §102, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

Claims 1-21 are currently pending and have been examined.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 02/26/2024 has been considered by the examiner and an initialed copy of the IDS is hereby attached.

Claim Objections

Claims 3 and 16 are objected to because of the following informalities: Claim 3 recites, “with each shifted of the set of shifted windows”, which appears to be an incomplete limitation and is grammatically incorrect. Claim 16 recites, “to indicative” in “positional vectors to indicative of a positional difference associated with the one or more objects in the sequence of the radar images”, which renders the limitation grammatically incorrect. Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b): (b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention. The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph: The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 8 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Regarding claim 8, the phrase "may correspond to at least one of" renders the claim indefinite because it is unclear whether the limitation(s) following the phrase are part of the claimed invention. See MPEP § 2173.05(d).

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more. The claim(s) recite(s) judicial exceptions as explained in the Step 2A, Prong 1 analysis below. The judicial exceptions are not integrated into a practical application as explained in the Step 2A, Prong 2 analysis below. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception as explained in the Step 2B analysis below.
Claim 1: A radar system for detecting and tracking one or more objects in a scene, the radar system comprising: a processor; and a memory having instructions stored thereon that, when executed by the processor, cause the radar system to: collect features of each radar image in a sequence of radar images indicative of radar measurements of the scene at different consecutive instances of time to form a spatiotemporal pool of features collected across space and time, wherein the sequence of radar images includes a plurality of radar images; process, using a neural network employing an attention mechanism, the spatiotemporal pool of features to generate a spatiotemporal pool of selected features; process, using the neural network employing a window shifting mechanism, the spatiotemporal pool of selected features to generate discrete spatiotemporal patches; process, using the neural network employing the attention mechanism, the discrete spatiotemporal patches to generate an enhanced spatiotemporal pool of features; and determine at least one property of the one or more objects in the scene based on the enhanced spatiotemporal pool of features.

Step 1: Statutory Category? Yes. The claim recites a system and therefore is an apparatus and is eligible for further analysis.

2A - Prong 1: Judicial Exception Recited (i.e., mathematical concepts, certain methods of organizing human activities such as a fundamental economic practice, or mental processes)? Yes. The claim recites the limitations of: “process, using a neural network employing an attention mechanism, the spatiotemporal pool of features to generate a spatiotemporal pool of selected features”; “process, using the neural network employing a window shifting mechanism, the spatiotemporal pool of selected features to generate discrete spatiotemporal patches”; “process, using the neural network employing the attention mechanism, the discrete spatiotemporal patches to generate an enhanced spatiotemporal pool of features”; and “determine at least one property of the one or more objects in the scene based on the enhanced spatiotemporal pool of features.” These limitations, as drafted, are processes that, under their broadest reasonable interpretation, can be performed in the human mind and are simply mathematical manipulation of data. Thus, the claim recites an abstract idea.

2A - Prong 2: Integrated into a Practical Application? No. The claim does not recite any additional elements that would integrate the judicial exception into a practical application. The recitation of the limitation “collect features of each radar image in a sequence of radar images indicative of radar measurements of the scene at different consecutive instances of time to form a spatiotemporal pool of features collected across space and time, wherein the sequence of radar images includes a plurality of radar images;” amounts to mere data gathering and is considered an insignificant extra-solution activity to the judicial exception.

2B: Claim provides an Inventive Concept? No. Step 2B considers whether the claim provides limitations which amount to “significantly more” than the recited judicial exception. The claim as a whole does not provide any meaningful limitations which amount to significantly more than the mental process of claim 1.
For example, “a processor; and a memory having instructions stored thereon that, when executed by the processor” in claim 1 are elements which are well understood, routine, and conventional in the field to gather and process data. Furthermore, “a neural network” has been claimed at a high level of generality and therefore does not provide significantly more than the recited judicial exception. Therefore, the claim is ineligible. Independent claims 20 and 21 are also rejected under 35 U.S.C. 101 due to the same analysis and rationale as independent claim 1 above, where claim 20 is a method claim and claim 21 is a system claim. Dependent claims 2-19 do not recite any further limitations that cause the claims to be patent eligible. Rather, the limitations of the dependent claims are directed toward additional aspects of the judicial exception and/or well-understood, routine and conventional additional elements that do not integrate the judicial exception into a practical application. Specifically, the claims only recite limitations further defining the mental process and recite further data gathering and the mathematical manipulation of the gathered data. These limitations are considered mental process steps and additional steps that amount to necessary data gathering, data processing and data output. These additional elements fail to integrate the abstract idea into a practical application because they do not impose meaningful limits on the claimed invention. As such, the additional elements individually and in combination do not amount to significantly more than the abstract idea. Therefore, when considering the combination of elements and the claimed invention as a whole, claims 1-21 are not patent eligible.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention. (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-10, 13-17, 20 and 21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Park et al. (US 20240062520 A1), hereinafter Park.

Regarding claim 1, Park discloses A radar system for detecting and tracking one or more objects in a scene (see vehicle system 200 which includes radar sensors, further see paragraph 0044, “Autonomous system 202 includes a sensor suite that includes one or more devices such as cameras 202a, LiDAR sensors 202b, radar sensors 202c, and microphones 202d.
In some embodiments, autonomous system 202 can include more or fewer devices and/or different devices (e.g., ultrasonic sensors, inertial sensors, GPS receivers (discussed below), odometry sensors that generate data associated with an indication of a distance that vehicle 200 has traveled, and/or the like). In some embodiments, autonomous system 202 uses the one or more devices included in autonomous system 202 to generate data associated with environment 100, described herein. The data generated by the one or more devices of autonomous system 202 can be used by one or more systems described herein to observe the environment (e.g., environment 100) in which vehicle 200 is located.”), the radar system comprising: a processor; and a memory having instructions stored thereon (see paragraph 0063, “In some embodiments, device 300 performs one or more processes described herein. Device 300 performs these processes based on processor 304 executing software instructions stored by a computer-readable medium, such as memory 306 and/or storage component 308. A computer-readable medium (e.g., a non-transitory computer readable medium) is defined herein as a non-transitory memory device. A non-transitory memory device includes memory space located inside a single physical storage device or memory space spread across multiple physical storage devices.”) that, when executed by the processor, cause the radar system to: collect features of each radar image in a sequence of radar images indicative of radar measurements of the scene at different consecutive instances of time to form a spatiotemporal pool of features collected across space and time, wherein the sequence of radar images includes a plurality of radar images (see Fig. 5A, images 502, further see paragraphs 0100-0101, “The images 502 for a particular scene (also referred to herein as a set of images 502) may include image data from one or more sensors in a sensor suite. The images 502 may include different types of images corresponding to the sensor or device used to generate them. For example, the images 502 may be camera images generated from one or more cameras, such as cameras 202a, or lidar images generated from one or more lidar sensors, such as lidar sensors 202b. Other image types can be used, such as radar images generated from one or more radar sensors (e.g., generated from radar sensors 202c)…Moreover, the images 502 of a set of images may be generated at approximately the same time and may form part of a stream of different images. As such, the images 502 may represent the scene of a vehicle at a particular time. As the perception system 402 uses the images to generate bounding boxes 512 and navigate a vehicle, it will be understood that the perception system 402 may process the images 502 in real-time or near real-time to generate the bounding boxes 512”, further see paragraph 0155, “The images 502 may correspond to images taken at the same (or approximately same) time (e.g., within milliseconds of each other). In this way, the images may correspond to the same scene for the vehicle. Moreover, the perception system 402 may repeatedly receive images and perform the functions described herein multiple times per second as new images are received. Accordingly, it will be understood that the perception system 402 may operate in real-time or near real-time to generate bounding boxes 512 from the images 502.”); process, using a neural network employing an attention mechanism (see Fig. 
5A, where the perception system 402 implements a neural network with an attention stage 506, further see paragraph 0074, “In some embodiments, perception system 402, planning system 404, localization system 406, and/or control system 408 implement at least one machine learning model (e.g., at least one multilayer perceptron (MLP), at least one convolutional neural network (CNN), at least one recurrent neural network (RNN), at least one autoencoder, at least one transformer, and/or the like).”), the spatiotemporal pool of features to generate a spatiotemporal pool of selected features (see paragraph 0109, “The attention stage 506 may be used to enrich feature maps and object queries. In certain cases, the attention stage 506 may enrich feature maps and object queries using self-attention and/or cross-attention techniques.”); process, using the neural network employing a window shifting mechanism, the spatiotemporal pool of selected features to generate discrete spatiotemporal patches (see Figs. 5A and 5B, further see paragraph 0162, “In addition, the groups of grid cells (e.g., windows) used for self-attending grid cells in one layer may differ from the groups of grid cells in another layer. For example, the groups of grid cells used by the multi-view stage 516 in a first layer may differ from the groups of grid cells used in a second layer. In some cases, the windows used by the multi-view stage 516 in a second layer may be shifted in one or more directions relative to windows used by the multi-view stage 516 in the first layer. Subsequent layers may include additional shifts or oscillate between the placement of windows in the first layer and the placement of windows in the second layer. In certain cases, windows in different layers of the multi-view stage 516 may be shifted differently than the windows in different layers of the ROI stage 518. For example, windows in different layers of the multi-view stage 516 may be shifted in one direction (e.g., horizontally) and windows in different layers of the ROI stage 518 may be shifted in multiple directions (e.g., vertically and horizontally).”, where the processing requires digitized data and therefore the patches of data are indeed “discrete spatiotemporal patches”, further see Fig. 4C for support where the input image is digitized to create “discrete spatiotemporal patches” at the first convolution layer); process, using the neural network employing the attention mechanism, the discrete spatiotemporal patches to generate an enhanced spatiotemporal pool of features (see paragraph 0124, “The multi-view stage 516 may enrich feature maps by comparing and/or correlating features from different grid cells of the feature maps. In some cases, the multi-view stage 516 uses the features from grid cells in a group of grid cells to update each other (also referred to herein as self-attention). For example, the multi-view stage 516 may use features of a group of grid cells in one or more feature maps to enrich or modify features of a particular grid cell in the group of grid cells.”, further see paragraph 0134, “By offsetting windows (or changing groups) in different layers, the attention stage 506 may improve the enrichment of the (grid cells of the) feature maps. For example, as grid cells within window 552a are compared and used to enrich each other at one layer, those enrichments may be propagated to grid cells within the window 554a in a subsequent layer. 
In this way enrichments may be propagated across the feature maps/images.”); and determine at least one property of the one or more objects in the scene based on the enhanced spatiotemporal pool of features (see paragraph 0190, “As described herein, the perception system 402 may use the set of second enriched semantic data to enrich object queries. For example, using features of the object queries, the perception system 402 may identify grid cells from the enriched feature maps that correspond to the object queries (e.g., using a linear layer matrix). The identified grid cells may be used to enrich or modify the features of the object queries. Moreover, the perception system 402 may perform a self-attention function on the object queries that it has generated to cross relate and/or correlate the features of the object queries. Similar to the enrichment of the feature maps, the perception system 402 may use multiple layers of enriched feature maps to (further) enrich object queries. The perception system 402 may use the resulting enriched object queries to generate bounding boxes.”, further see Fig. 6, generating bounding boxes at 512). Regarding claim 2, Park further discloses The radar system of claim 1, wherein the processor is further configured to: partition, using the neural network employing the window shifting mechanism, the spatiotemporal pool of selected features to generate a set of shifted windows (see Fig. 5B, further see paragraph 0127, “FIG. 5B is a diagram illustrating an example of rows of windows 552 (individual windows referred to as 552a, 552b, 552c, etc.) applied to feature maps 551 (individual feature maps referred to as 551a-551f) corresponding to different images (received from different images sensors). In the illustrated example, the regions are equally sized and each row is aligned with the row above and below. However, it will be understood that different sized windows may be used and/or the row may be offset from each other. In some cases, each row may be offset from another row. In certain cases, alternating rows may be aligned with intermediate rows offset (e.g., like rows of bricks). In addition, as illustrated in FIG. 5B, the windows 552 may overlap across multiple feature maps. For example, the window 552a includes grid cells from the feature map 551a and the feature map 551f.”); and generate the discrete spatiotemporal patches based on temporally indexed radar images associated with each shifted window of the set of shifted windows (see Fig. 5B, where the patches are generated based on indexed radar imaged associated with each shifted window of the set of shifted windows). Regarding claim 3, Park further discloses The radar system of claim 2, wherein the processor is further configured to: generate a first discrete spatiotemporal patch of the discrete spatiotemporal patches based on one temporally indexed radar images of the temporally indexed radar images associated with each shifted window of the set of shifted windows (see Figs. 5B and 5C, further see paragraph 0131, “In some cases, the attention stage 506 may include multiple layers of the multi-view stage 516. In some such cases, the groups (e.g., windows) in the different layers of the multi-view stage 516 may be different. In some cases, the windows may be sized and/or positioned differently. For example, as illustrated by the windows 554 (individual windows referred to as 554a, 554b, 554c, etc.) on feature maps 551 of FIG. 
5B, the windows 554 in a second layer of an attention stage 506 may be offset from the windows 552 in the first layer of the attention stage 506.”); and generate a second discrete spatiotemporal patch of the discrete spatiotemporal patches based on another temporally indexed radar images of the temporally indexed radar images associated with each shifted of the set of shifted windows (see Figs. 5B and 5C, further see paragraphs 0131-0132, “In certain cases, where the windows in different layers are offset from each other, alternating layers may use the same or different positions. For example, in an attention stage 506 with six layers, the odd numbered layers (e.g., layers one, three, and five) may be aligned similar to the rows of windows 552 shown in FIG. 5B and the even numbered layers (e.g., layers two, four, and six) may be aligned similar to the rows of windows 554 (individual windows referred to as 554a, 554b, 554c, etc.) shown in FIG. 5B.”). Regarding claim 4, Park further discloses The radar system of claim 3, wherein the processor is further configured to process, using the neural network employing the attention mechanism iteratively, the first discrete spatiotemporal patch of the discrete spatiotemporal patches and the second discrete spatiotemporal patch of the discrete spatiotemporal patches to generate a spatiotemporal pool of updated features (see paragraph 0134, “By offsetting windows (or changing groups) in different layers, the attention stage 506 may improve the enrichment of the (grid cells of the) feature maps. For example, as grid cells within window 552a are compared and used to enrich each other at one layer, those enrichments may be propagated to grid cells within the window 554a in a subsequent layer. In this way enrichments may be propagated across the feature maps/images.”). Regarding claim 5, Park further discloses The radar system of claim 4, wherein the processor is further configured to apply a window merging operation corresponding to overlapped positions of each shifted window associated the spatiotemporal pool of updated features to generate the enhanced spatiotemporal pool of features (see paragraph 0136, “In certain cases, the ROI stage 518 may group grid cells based on objects (e.g., group grid cells that appear to correspond to the same object or to an outline of the same object). In some cases, the ROI stage 518 may group grid cells by dividing a feature map into multiple regions or windows and/or assigning different grid cells of a feature map to different regions or windows, similar to the multi-view stage 516 but using differently shaped or differently sized regions or windows. In certain cases, the different regions or windows in a layer of the ROI stage 518 may be mutually exclusive (e.g., a grid cell of a feature map may be assigned to only one region or window). In certain cases, the ROI stage 518 may divide the feature map into multiple rows or columns of regions or windows. Some or all of the regions or window may be equally (or differently) sized, and one or more of the regions may overlap with multiple feature maps corresponding to different images. The rows or columns may be aligned or offset from each other.”). 
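As a concrete picture of the shifted-window behavior cited from Park in the claim 2-5 analysis above (windows in one attention layer offset from the windows of the previous layer, then folded back together and merged), here is a minimal NumPy sketch. The (T, H, W, C) tensor layout, the window size, the cyclic-shift implementation, and the mean used for merging are illustrative assumptions; they are not taken from Park, from the application, or from any other cited reference.

```python
import numpy as np

def partition_windows(features, win, shift=0):
    """Split a (T, H, W, C) spatiotemporal feature map into square spatial windows.

    A non-zero `shift` rolls the spatial grid before partitioning; this is the
    'shifted window' trick, so windows in this layer straddle the window
    boundaries of the previous (unshifted) layer. Shapes are assumptions.
    """
    t, h, w, c = features.shape
    shifted = np.roll(features, shift=(-shift, -shift), axis=(1, 2))
    return (shifted
            .reshape(t, h // win, win, w // win, win, c)
            .transpose(1, 3, 0, 2, 4, 5)          # (nH, nW, T, win, win, C)
            .reshape(-1, t * win * win, c))       # one group of tokens per window

def merge_windows(patches, shape, win, shift=0):
    """Undo partition_windows, returning a (T, H, W, C) feature map."""
    t, h, w, c = shape
    grid = (patches
            .reshape(h // win, w // win, t, win, win, c)
            .transpose(2, 0, 3, 1, 4, 5)
            .reshape(t, h, w, c))
    return np.roll(grid, shift=(shift, shift), axis=(1, 2))

# Toy usage: 4 radar frames, an 8x8 feature grid, 16 channels, 4x4 windows.
feats = np.random.rand(4, 8, 8, 16).astype(np.float32)
layer1 = partition_windows(feats, win=4, shift=0)   # aligned windows
layer2 = partition_windows(feats, win=4, shift=2)   # shifted windows
assert np.allclose(merge_windows(layer1, feats.shape, 4, 0), feats)  # round trip

# A window merging operation (here a mean, standing in for mean/max/sum) could
# combine what the aligned and shifted layers produce for the same positions.
combined = np.mean(np.stack([merge_windows(layer1, feats.shape, 4, 0),
                             merge_windows(layer2, feats.shape, 4, 2)]), axis=0)
print(layer1.shape, layer2.shape, combined.shape)   # (4, 64, 16) (4, 64, 16) (4, 8, 8, 16)
```

Because the shifted layer's windows straddle the aligned layer's window boundaries, attention applied within the shifted windows lets feature enrichment propagate across those boundaries, which is the propagation effect Park describes in paragraph 0134.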
Regarding claim 6, Park further discloses The radar system of claim 5, wherein the window merging operation include at least one of: a maximization operation, a summation operation (see paragraph 0136, where the feature of “group grid cells based on objects” is a summation operation as you are grouping by adding different cells together, where such a feature fulfills the BRI of “a summation operation”), and a mean operation. Regarding claim 7, Park further discloses The radar system of claim 1, wherein the attention mechanism corresponds to a masked cross attention mechanism (see Fig. 4C, where the attention mechanisms corresponds to a Boolean masked image (i.e. which employs a masked cross attention mechanism to generate the Boolean image)). Regarding claim 8, Park further discloses The radar system of claim 1, wherein the determined at least one property of the one or more objects may correspond to at least one of: object center coordinates, an object width, an object length, an object height, an object orientation, and object offsets (see Fig. 6, where generating the bounding boxes at 512 corresponds to object center coordinates, object width, object length, object height, object orientation and object offsets). Regarding claim 9, Park further discloses The radar system of claim 1, wherein the processor is further configured to output, based on each enhanced feature of the enhanced spatiotemporal pool of features, one or more visual indicators to indicate the determined at least one property of the one or more objects in the scene (see Fig. 6, where generating the bounding boxes at 512 is an output which depicts visual indicators of the properties of the one or more objects in the scene). Regarding claim 10, Park further discloses The radar system of claim 9, wherein the one or more visual indicators correspond to one or more bounding boxes (see Fig. 6, where generating the bounding boxes at 512 is an output which depicts visual indicators of the properties of the one or more objects in the scene). Regarding claim 13, Park further discloses The radar system of claim 1, wherein the processor is further configured to generate a set of temporal windows by applying a window grouping operation on the radar images in the sequence of radar images (see paragraph 0096, “By grouping grid cells into subsets of feature maps, the autonomous vehicle can cross-relate and correlate the features of the grid cells using fewer compute resources, which can increase the speed of a self-attention process and the speed of enriching feature maps.”); permute an order associated with the radar images to generate concatenated sequences of permuted radar images corresponding to each temporal window of the set of temporal windows (see paragraph 0125, “In certain cases, the multi-view stage 516 may group grid cells based on objects (e.g., group grid cells that correspond (or appear to correspond) to the same object or to an outline of the same object). In some cases, the multi-view stage 516 may group grid cells by dividing a feature map into multiple regions (also referred to herein as windows) and/or assign different grid cells of a feature map to the different regions or windows. In certain cases, the different regions or windows of the feature map may be mutually exclusive (e.g., a grid cell may be assigned to only one region or window). In certain cases, the multi-view stage 516 may divide the feature map into multiple rows or columns of regions or windows. 
Some or all of the regions or windows may have the same (or different) sized (e.g., width and height), and one or more of the regions may overlap with multiple feature maps corresponding to different images. The rows of windows may be aligned or offset from each other.”); collect features for the concatenated sequences of permutated radar images corresponding to each temporal window of the set of temporal windows to generate the spatiotemporal pool of features (see Figs. 5A and 5B, further see paragraphs 0125-0128); and process, using the neural network employing the attention mechanism, the spatiotemporal pool of features within each temporal window to generate a first set of enhanced spatiotemporal patches of features corresponding to each radar image in the sequence of radar images (see paragraph 0129, “In certain cases, the multi-view stage 516 may cross correlate some or all of the features of the various grid cells within a group (e.g., within a particular region) to each other. In this way, the multi-view stage 516 may enrich some or all of the grid cells within the particular group. Moreover, the multi-view stage 516 may repeat the comparison for each of the groups (e.g., windows) of a feature map and/or across some or all of the feature maps such that some or all grid cells of the feature maps are compared/updated based on comparisons with features from other grid cells in the same group (e.g., window or region).”). Regarding claim 14, Park further discloses The radar system of claim 13, wherein the processor is further configured to: partition each enhanced spatiotemporal patch of the first set of enhanced spatiotemporal patches of features to generate a spatiotemporal pool of subset features within each temporal window (see paragraphs 0135-0136, “The ROI stage 518 may enrich feature maps by comparing and/or correlating features from different grid cells of the feature maps. In some cases, the ROI stage 518 uses the features from grid cells in a group of grid cells (such as grid cells in a window) to update each other (also referred to herein as self-attention). For example, the ROI stage 518 may use features of a group of grid cells in one or more feature maps to enrich or modify features of a particular grid cell in the group of grid cells…In certain cases, the ROI stage 518 may group grid cells based on objects (e.g., group grid cells that appear to correspond to the same object or to an outline of the same object). In some cases, the ROI stage 518 may group grid cells by dividing a feature map into multiple regions or windows and/or assigning different grid cells of a feature map to different regions or windows, similar to the multi-view stage 516 but using differently shaped or differently sized regions or windows. In certain cases, the different regions or windows in a layer of the ROI stage 518 may be mutually exclusive (e.g., a grid cell of a feature map may be assigned to only one region or window). In certain cases, the ROI stage 518 may divide the feature map into multiple rows or columns of regions or windows. Some or all of the regions or window may be equally (or differently) sized, and one or more of the regions may overlap with multiple feature maps corresponding to different images. 
The rows or columns may be aligned or offset from each other.”); generate an updated set of temporal windows by applying a re-grouping operation on each subset feature of the spatiotemporal pool of subset features (see paragraph 0135, “The ROI stage 518 may enrich feature maps by comparing and/or correlating features from different grid cells of the feature maps. In some cases, the ROI stage 518 uses the features from grid cells in a group of grid cells (such as grid cells in a window) to update each other (also referred to herein as self-attention). For example, the ROI stage 518 may use features of a group of grid cells in one or more feature maps to enrich or modify features of a particular grid cell in the group of grid cells.”); and process, using the neural network employing the attention mechanism, each subset features within each updated temporal window of the set of updated temporal windows to generate a second set of enhanced spatiotemporal patches of subset features for each temporal window of the updated set of temporal windows (see paragraph 0139, “The ROI stage 518 may compare semantic data of groups of grid cells (e.g., different grid cells within a particular window or region) with each other. Based on the comparison, the ROI stage 518 may modify the semantic data of the different grid cells. For example, the ROI stage 518 may compare certain features of a grid cell (e.g., color, reflectivity, shape, edge, etc.) with corresponding features of a different grid cell in the same group (e.g., compare features of a grid cell within window 556a with corresponding features of a different grid cell within the window 556a). Based on a similarity, the ROI stage 518 may determine a probabilistic relationship between the grid cells in the group (e.g., probability that the grid cells are part of the same object, such as a vehicle, bicycle, pedestrian, construction cone, etc.). For example, one grid cell may be updated to indicate that it is the middle portion of an object, and another grid cells may be updated to indicate that it is the beginning of the same object. As another non-limiting example, one grid may be updated to indicate that it is moving 60 m/s, and another grid cells may be updated to indicate that it is moving 10 m/s, etc.”). Regarding claim 15, Park further discloses The radar system of claim 14, wherein the processor is further configured to: process, each spatiotemporal patch of the second set of enhanced spatiotemporal patches with a reverse re-grouping operation to generate a set of updated spatiotemporal sub patches (see paragraph 0141, “As described above, with reference to the self-attention of object queries, in some cases, the ROI stage 518 may generate a matrix that includes some or all of the grid cells within a group. The ROI stage 518 may then determine a weight or probabilistic relationship between the grid cells and include the weight in the matrix. The ROI stage 518 may use the weights/relationships in the matrix (indicative of a relationship or weight between grid cells) to calculate updated values for the features of the different grid cells. For example, the ROI stage 518 may update a particular value of a particular grid cell using corresponding weighted values of some or all of the other grid cells in the group. An example of such a matrix and calculation (but for object queries) is described herein with reference to the self-attention of object queries. 
Moreover, this process may be repeated across some or all of the groups of grid cells of a feature map and across some or all of the feature maps. For example, the image feature extractor 504 may generate multiple feature maps for each image with each feature map corresponding to one or more detected characteristics of the image. In some such cases, the windows (or other form of grouping) may be applied to some or all of the feature maps and the grid cells of the feature maps updated as described herein.”); and process, each updated spatiotemporal sub patch of the set of updated spatiotemporal sub patches with the window merging operation to generate the enhanced spatiotemporal pool of features (see paragraph 0141, “As described above, with reference to the self-attention of object queries, in some cases, the ROI stage 518 may generate a matrix that includes some or all of the grid cells within a group. The ROI stage 518 may then determine a weight or probabilistic relationship between the grid cells and include the weight in the matrix. The ROI stage 518 may use the weights/relationships in the matrix (indicative of a relationship or weight between grid cells) to calculate updated values for the features of the different grid cells. For example, the ROI stage 518 may update a particular value of a particular grid cell using corresponding weighted values of some or all of the other grid cells in the group. An example of such a matrix and calculation (but for object queries) is described herein with reference to the self-attention of object queries. Moreover, this process may be repeated across some or all of the groups of grid cells of a feature map and across some or all of the feature maps. For example, the image feature extractor 504 may generate multiple feature maps for each image with each feature map corresponding to one or more detected characteristics of the image. In some such cases, the windows (or other form of grouping) may be applied to some or all of the feature maps and the grid cells of the feature maps updated as described herein.”). Regarding claim 16, Park further discloses The radar system of claim 15, wherein the processor is further configured to determine, based on each enhanced feature of the enhanced spatiotemporal pool of features, positional vectors to indicative of a positional difference associated with the one or more objects in the sequence of the radar images (see paragraph 0171, “In some cases, the enriched object queries 608 (or object queries 604) may also be referred to as floating queries (or a first type of representation queries) as their determined location may change. For example, as the feature values of the object queries (or corresponding tensors) change due to cross-attention and/or self-attention (or other modifications), the combination of the modified object queries with the (same) linear layer matrix may result in a different location being determined. Moreover, as different linear layer matrices may be used to determine a corresponding location of an object query (e.g., at different layers of the attention stage 506), the combination of a different linear layer matrix with an (same or modified) object query may result in a different location being associated with the object query. As such, the location of an object query 604 or enriched object query 608 may vary or “float” to different grid cells of the feature maps 602 and/or enriched feature maps 606.”). 
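The positional vectors recited in claim 16, and the vector-pair similarity discussed for claim 17 immediately below, can also be pictured with a short sketch: per-object displacement between consecutive radar frames, scored against a predicted heading with a cosine-style similarity. The (N, 2) center format, the row-wise object pairing, and the cosine score are assumptions for illustration only, not the applicant's or Park's formulation.

```python
import numpy as np

def positional_vectors(centers_prev, centers_curr):
    """Per-object displacement between object centers detected in two
    consecutive radar frames (frame t-1 -> frame t). Centers are (N, 2)
    arrays of x/y coordinates; pairing by row order is an assumption."""
    return centers_curr - centers_prev

def direction_similarity(pseudo_obs, forward_dir, eps=1e-9):
    """Cosine similarity between a pseudo-observation vector (e.g. a measured
    displacement) and a forward-direction vector (e.g. a predicted heading),
    standing in for the claimed 'similarity factor'."""
    num = np.dot(pseudo_obs, forward_dir)
    den = np.linalg.norm(pseudo_obs) * np.linalg.norm(forward_dir) + eps
    return num / den

# Toy usage: two objects tracked across two frames.
prev = np.array([[10.0, 5.0], [3.0, 8.0]])
curr = np.array([[11.0, 5.5], [3.0, 9.0]])
disp = positional_vectors(prev, curr)            # [[1.0, 0.5], [0.0, 1.0]]
predicted_heading = np.array([1.0, 0.4])         # hypothetical filter prediction
print(direction_similarity(disp[0], predicted_heading))  # ~0.997, a consistent track
```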
Regarding claim 17, Park further discloses The radar system of claim 16, wherein the processor is further configured to determine pairs of vectors based on the positional vectors, each pair of the pairs of vectors include a pseudo-observation vector and forward direction vector (see paragraph 0120, “] In some cases, the object query self-attention stage 514 compares the features of a particular object query with the features of some or all of the other object queries (or some or all of the object features of a group of object features) to determine a correlation or similarity between the particular object query and the other object queries. In some cases, the correlation or similarity can be represented as a probability or weight. Using the correlation between the particular object query and the other object queries, the features of the object queries (include the particular object query) may be weighted and the weighted features may be used to calculate a new (or modified) value for the respective features of the particular object query. For example, a first feature of some or all of the object queries may be weighted (relative to the particular object query) and the weighted values used to determine a value for the first feature of the particular object query. Similarly, the other features of the particular object query may be updated (e.g., using the same or a different weighting). In some cases, the object query self-attention stage 514 may update the features of some or all of the object queries in this way. In certain cases, the object query self-attention stage 514 may determine a matrix to indicate the relationship (or weight) between the features of the various object queries and use the matrix to update the features of some or all of the object queries.”); determine, based on the pairs of vectors, a first similarity factor to indicate an association between the pairs of vectors (see paragraph 0120 as noted above); and generate, based on a similarity factor, a set of control commands to the one or more objects in the scene (see paragraph 0120, “In certain cases, the object query self-attention stage 514 may determine a matrix to indicate the relationship (or weight) between the features of the various object queries and use the matrix to update the features of some or all of the object queries.”, further see paragraph 0123, “Based on the determined relationship or weighting, the object query self-attention stage 514 may update the values for the features of the object queries as follows:..”). Regarding claim 20, the same cited section and rationale as claim 1 is applied. Regarding claim 21, the same cited section and rationale as claim 1 is applied. Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. Claim(s) 11 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US 20240062520 A1) in view of Lin et al. (US 20240005547 A1), hereinafter Lin. Regarding claim 11, Park discloses [Note: what Park fails to clearly disclose is strike-through] The radar system of claim 10, Lin discloses, wherein the processor is further configured to train the neural network based on an output of a bounding box loss function indicative of a bounding box loss associated with the one or more bounding boxes, wherein the bounding box loss function includes a linear combination of regression functions (see paragraph 0092 which discloses using a heatmap loss function and an offset loss function for the bounding box loss function include linear combination of regression functions, further see paragraph 0557 for support, further note that claim 12 below indicates an offset loss function and heatmap loss functions indeed “includes a linear combination of regression functions”). It would have been obvious to someone with ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the features as disclosed by Lin into the invention of Park. Both references are considered analogous arts to the claimed invention as they both disclose the use of machine learning to analyze sensor data and create bounding boxes around areas of interest in a vehicular environment. The combination would be obvious with a reasonable expectation of success in order to efficiently train the neural network using loss data for accurate object detection. Regarding claim 12, the combination of Park and Lin discloses [Note: what Park fails to clearly disclose is strike-through] The radar system of claim 11, Park fails to discloses: Lin discloses, wherein the linear combination of regression functions includes at least two of: a heatmap loss function, a width and length loss function, an orientation loss function, and an offset loss function (see paragraph 0092 which discloses using a heatmap loss function and an offset loss function for the bounding box loss function). It would have been obvious to someone with ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the features as disclosed by Lin into the invention of Park. 
Both references are considered analogous arts to the claimed invention as they both disclose the use of machine learning to analyze sensor data and create bounding boxes around areas of interest in a vehicular environment. The combination would be obvious with a reasonable expectation of success in order to efficiently train the neural network using loss data for accurate object detection. Claim(s) 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US 20240062520 A1) in view of Cennamo et al. (US 20220120858 A1), hereinafter Cennamo. Regarding claim 18, Park discloses [Note: what Park fails to clearly disclose is strike-through] The radar system of claim 17, wherein the processor is further configured to: Cennamo discloses, perform a generalized intersection over union operation on bounding boxes associated with predicted trajectory observations and actual trajectory observations to determine a trajectory similarity factor (see paragraph 0073, “In order to estimate the bounding box confidence score 55, ground truth bounding boxes need to be defined, and for all bounding boxes for which the bounding box parameters 27 are determined, a so called Intersection over Union score (IoU score) is calculated. The Intersection over Union score compares a predicted bounding box with a ground truth bounding box and is defined as the ratio of the intersection or overlap of the predicted and ground truth bounding boxes with respect to their union. A detection of a bounding box is regarded as positive if the IoU score is above a given threshold, e.g. 0.35. In order to avoid confusion in the network, bounding boxes having an IoU score slightly below this threshold (e.g. in a range from 0.2 to 0.35) are masked out.”); perform a rotation operation on a Bayesian filter predicted state to determine an angular similarity factor (see paragraph 0074, “For estimating the bounding box confidence score 55 for bounding boxes which are not aligned, an empirical approximation is used. First, the center of a predicted bounding box and the center of a ground truth bounding box are determined. Around these centers, bounding boxes having an estimated size or estimated dimensions are formed, and both bounding boxes are rotated by the ground truth yaw angle. Then the intersection for these aligned boxes is computed. The value of the intersection is weighted by the absolute cosine value of the angle error and rescaled to fit in a range from 0.5 to 1. The final bounding box confidence score or IoU score is then computed by dividing the approximated intersection by the union area, i.e. the sum of the areas of the two boxes, i.e. the ground truth bounding box and the predicted bounding box, minus the approximated intersection.”, where bounding boxes determined using neural networks use Kalman filters which are a type of Bayesian filters used to determine the predicted state); and determine the similarity factor based on the trajectory similarity factor and the angular similarity factor (see paragraph 0074, “The value of the intersection is weighted by the absolute cosine value of the angle error and rescaled to fit in a range from 0.5 to 1. The final bounding box confidence score or IoU score is then computed by dividing the approximated intersection by the union area, i.e. the sum of the areas of the two boxes, i.e. the ground truth bounding box and the predicted bounding box, minus the approximated intersection.”). 
It would have been obvious to someone with ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the features as disclosed by Cennamo into the invention of Park. Both references are considered analogous arts to the claimed invention as they both disclose the use of machine learning to analyze sensor data and create bounding boxes around areas of interest in a vehicular environment. The combination would be obvious with a reasonable expectation of success in order to efficiently train the neural network using loss data for accurate object detection. Claim(s) 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US 20240062520 A1) in view of King et al. (US 12271790 B1) hereinafter King. Regarding claim 19, Park discloses [Note: what Park fails to clearly disclose is strike-through] The radar system of claim 17, King discloses, wherein the processor is further configured to train the neural network based on a direction estimation loss function indicative of a direction estimation loss associated with an anticipated positional difference of the one or more objects in the sequence of the radar images (see Col. 21, lines 17-26, “In some implementations, the track adjustment system (e.g., system 519) may calculate loss of an adjustment, e.g., difference between the predicted track and ground truth track, which includes at least one of translation loss, angle loss, or extent loss. In some implementations, the adjustment loss may be used by the training engine 214 (see FIG. 2) to train a machine learning model (e.g., MLP 670). The training engine can use the adjustment loss to evaluate how well the machine learning model fits the given training samples so that the weights of the model can be updated to reduce the loss on the next evaluation.”, where the samples are from radar data of objects detected around the vehicle). It would have been obvious to someone with ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the features as disclosed by King into the invention of Park. Both references are considered analogous arts to the claimed invention as they both disclose the use of machine learning to analyze sensor data and create bounding boxes around areas of interest in a vehicular environment. The combination would be obvious with a reasonable expectation of success in order to efficiently train the neural network using loss data for accurate object detection. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Sun et al. (US 12315083 B2) is considered close pertinent art to the claimed invention as it discloses a neural network employing a self-attention mechanism with a SWIN transformer for object detection. Irshad et al. (US 20250225721 A1) is considered close pertinent art to the claimed invention as it discloses a neural network system which employs a window shifting mechanism to generate enhanced images. Singh et al. (US 20240127596 A1) is considered close pertinent art to the claimed invention as it discloses a neural network system which employs a window shifting mechanism to generate enhanced images and bounding boxes for object detection. Liu, Ze, et al. "Swin transformer: Hierarchical vision transformer using shifted windows." Proceedings of the IEEE/CVF international conference on computer vision. 2021. 
is considered close pertinent art to the claimed invention as it discloses a Swin transformer and how it implements window shifting functions. Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAZRA N. WAHEED whose telephone number is (571)272-6713. The examiner can normally be reached M-F (8 AM - 4:30 PM). Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vladimir Magloire can be reached at (571)270-5144. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /NAZRA NUR WAHEED/Examiner, Art Unit 3648
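The §103 rejections above turn on two ideas that are easy to make concrete: a bounding-box loss built as a linear combination of regression terms (heatmap, offset, width/length, orientation, per Lin) and a generalized intersection-over-union comparison of predicted and ground-truth boxes (per Cennamo). The sketch below shows both for axis-aligned boxes; the (x1, y1, x2, y2) box format, the specific L1 terms, and the weights are illustrative assumptions rather than any reference's actual formulation.

```python
import numpy as np

def generalized_iou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2).
    GIoU = IoU - (enclosing-box area not covered by the union) / (enclosing-box
    area); it ranges from -1 to 1 and penalizes non-overlapping boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclose - union) / enclose

def box_loss(pred, target, weights=(1.0, 1.0, 0.5)):
    """Illustrative linear combination of regression terms: center offset,
    width/length, and orientation, each as an L1 term. `pred`/`target` are
    dicts with 'center' (2,), 'size' (2,), 'yaw' (scalar); weights are made up."""
    w_off, w_size, w_yaw = weights
    return (w_off  * np.abs(pred['center'] - target['center']).sum()
          + w_size * np.abs(pred['size']   - target['size']).sum()
          + w_yaw  * abs(pred['yaw'] - target['yaw']))

# Toy usage.
print(generalized_iou((0, 0, 4, 4), (2, 2, 6, 6)))   # ~ -0.08 (IoU 0.14 minus enclosure penalty)
pred   = {'center': np.array([2.1, 2.0]), 'size': np.array([4.0, 1.9]), 'yaw': 0.05}
target = {'center': np.array([2.0, 2.0]), 'size': np.array([4.2, 2.0]), 'yaw': 0.00}
print(box_loss(pred, target))                         # 0.425 with these made-up weights
```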

Prosecution Timeline

Feb 26, 2024: Application Filed
Feb 09, 2026: Non-Final Rejection — §101, §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12578459
OBJECT DETECTION FROM SYNTHETIC APERTURE RADAR USING A COMPLEX-VALUED CONVOLUTIONAL NEURAL NETWORK
2y 5m to grant; granted Mar 17, 2026
Patent 12546641
HYGIENIC GUIDED WAVE LEVEL MEASUREMENT WITH SHEATH
2y 5m to grant; granted Feb 10, 2026
Patent 12535574
Radar Device and Method of Operating a Radar Device
2y 5m to grant; granted Jan 27, 2026
Patent 12529779
RADAR SYSTEM AND INSPECTION METHOD
2y 5m to grant; granted Jan 20, 2026
Patent 12523755
SENSOR AND CONTROL METHOD
2y 5m to grant; granted Jan 13, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview (+11.2%): 94%
Median Time to Grant: 2y 11m
PTA Risk: Low
Based on 221 resolved cases by this examiner. Grant probability derived from career allow rate.
