DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim(s) 1, 5, 9-11, 15, 19, 20, 24, 26 and 29 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Mahjourian et al. (US 2022/0301182).
Regarding claim 1, Mahjourian et al. discloses an apparatus for processing image data, the apparatus comprising:
a memory for storing the image data (“one or more memory devices for storing instructions and data” at paragraph 0125, line 8); and
processing circuitry in communication with the memory (“a central processing unit will receive instructions and data from a read-only memory or a random access memory” at paragraph 0125, line 4), wherein the processing circuitry is configured to:
extract features from a respective image from each camera (“Each group of camera sensor measurements can be represented as an image patch, e.g., an RGB image patch” at paragraph 0029, last sentence; “Once the sensor subsystems 130 classify one or more groups of raw sensor measurements as being measures of respective other agents, the sensor subsystems 130 can compile the raw sensor measurements into a set of raw data 132, and send the raw data 132 to a data representation system 140” at paragraph 0030) of a plurality of cameras (“The on-board system 110 includes one or more sensor subsystems 130. The sensor subsystems 130 include a combination of components that receive reflections of electromagnetic radiation, e.g., lidar systems that detect reflections of laser light, radar systems that detect reflections of radio waves, and camera systems that detect reflections of visible light” at paragraph 0027); and
fuse the features into a fused image having a grid structure (“The motion prediction system 150 processes the scene data 142 to generate features of a top-down representation of the environment. The top-down representation of the environment is a grid that includes a plurality of grid cells that each represent a region of the scene in the environment” at paragraph 0034, line 1), wherein to fuse the features, the processing circuitry is configured to:
determine a contribution of the respective image from each of the plurality of cameras to a respective cell of the grid structure (“The system can then interpolate between the occupancy values by performing an interpolation, e.g., a bilinear sampling (bilinear interpolation), based on the position and the coordinates of each of the neighbor grid cells to generate a respective weight for each neighbor grid cell.” at paragraph 0117, line 1), and
aggregate, based on the contribution to the respective cell and a respective set of learnable parameters for each cell, the features from each of the respective images to each respective cell of the fused image to generate aggregated features (“The system can then compute the occupancy value for the grid cell as a weighted sum of the occupancy values for the neighbor grid cells in the preceding flow-warped occupancy, i.e., weighted by the weight for the corresponding neighbor grid cell” at paragraph 0117, line 5).
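Examiner's note: the following is a minimal illustrative sketch (in Python with NumPy) of the mapped fusion operations, i.e., determining each camera's contribution to a grid cell by bilinear interpolation and aggregating the contributions as a weighted sum, consistent with paragraph 0117 of Mahjourian et al. All names, and the assumption that each camera's features project to a continuous grid coordinate, are the examiner's own illustration, not an implementation from the reference.

import numpy as np

def bilinear_weights(x, y):
    """Return the four neighbor grid cells of the continuous position
    (x, y) together with their bilinear interpolation weights."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return [((y0, x0), (1 - dx) * (1 - dy)),
            ((y0, x0 + 1), dx * (1 - dy)),
            ((y0 + 1, x0), (1 - dx) * dy),
            ((y0 + 1, x0 + 1), dx * dy)]

def fuse_cameras(cam_features, cam_grid_positions, grid_shape, feat_dim):
    """Scatter each camera's feature vector into the grid, weighting its
    contribution to each neighbor cell by the bilinear weight, and
    aggregate the contributions at each cell as a weighted sum."""
    fused = np.zeros((*grid_shape, feat_dim))
    for feats, (x, y) in zip(cam_features, cam_grid_positions):
        for (row, col), w in bilinear_weights(x, y):
            if 0 <= row < grid_shape[0] and 0 <= col < grid_shape[1]:
                fused[row, col] += w * feats
    return fused

# Two cameras contributing to an 8x8 grid of 16-dimensional features.
grid = fuse_cameras([np.random.rand(16), np.random.rand(16)],
                    [(2.3, 4.7), (2.6, 4.2)], (8, 8), 16)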
Regarding claim 11, Mahjourian et al. discloses a method for processing image data, the method comprising:
extracting features from a respective image from each camera (“Each group of camera sensor measurements can be represented as an image patch, e.g., an RGB image patch” at paragraph 0029, last sentence; “Once the sensor subsystems 130 classify one or more groups of raw sensor measurements as being measures of respective other agents, the sensor subsystems 130 can compile the raw sensor measurements into a set of raw data 132, and send the raw data 132 to a data representation system 140” at paragraph 0030) of a plurality of cameras (“The on-board system 110 includes one or more sensor subsystems 130. The sensor subsystems 130 include a combination of components that receive reflections of electromagnetic radiation, e.g., lidar systems that detect reflections of laser light, radar systems that detect reflections of radio waves, and camera systems that detect reflections of visible light” at paragraph 0027); and
fusing the features into a fused image having a grid structure (“The motion prediction system 150 processes the scene data 142 to generate features of a top-down representation of the environment. The top-down representation of the environment is a grid that includes a plurality of grid cells that each represent a region of the scene in the environment” at paragraph 0034, line 1), wherein fusing the features comprises:
determining a contribution of the respective image from each of the plurality of cameras to a respective cell of the grid structure (“The system can then interpolate between the occupancy values by performing an interpolation, e.g., a bilinear sampling (bilinear interpolation), based on the position and the coordinates of each of the neighbor grid cells to generate a respective weight for each neighbor grid cell.” at paragraph 0117, line 1), and
aggregating, based on the contribution to the respective cell and a respective set of learnable parameters for each cell, the features from each of the respective images to each respective cell of the fused image to generate aggregated features (“The system can then compute the occupancy value for the grid cell as a weighted sum of the occupancy values for the neighbor grid cells in the preceding flow-warped occupancy, i.e., weighted by the weight for the corresponding neighbor grid cell” at paragraph 0117, line 5).
Regarding claim 20, Mahjourian et al. discloses a non-transitory computer-readable storage medium storing instructions (“a central processing unit will receive instructions and data from a read-only memory or a random access memory” at paragraph 0125, line 4) that, when executed, cause one or more processors to:
extract features from a respective image from each camera (“Each group of camera sensor measurements can be represented as an image patch, e.g., an RGB image patch” at paragraph 0029, last sentence; “Once the sensor subsystems 130 classify one or more groups of raw sensor measurements as being measures of respective other agents, the sensor subsystems 130 can compile the raw sensor measurements into a set of raw data 132, and send the raw data 132 to a data representation system 140” at paragraph 0030) of a plurality of cameras (“The on-board system 110 includes one or more sensor subsystems 130. The sensor subsystems 130 include a combination of components that receive reflections of electromagnetic radiation, e.g., lidar systems that detect reflections of laser light, radar systems that detect reflections of radio waves, and camera systems that detect reflections of visible light” at paragraph 0027); and
fuse the features into a fused image having a grid structure (“The motion prediction system 150 processes the scene data 142 to generate features of a top-down representation of the environment. The top-down representation of the environment is a grid that includes a plurality of grid cells that each represent a region of the scene in the environment” at paragraph 0034, line 1), wherein to fuse the features, the instructions further cause the one or more processors to:
determine a contribution of the respective image from each of the plurality of cameras to a respective cell of the grid structure (“The system can then interpolate between the occupancy values by performing an interpolation, e.g., a bilinear sampling (bilinear interpolation), based on the position and the coordinates of each of the neighbor grid cells to generate a respective weight for each neighbor grid cell.” at paragraph 0117, line 1), and
aggregate, based on the contribution to the respective cell and a respective set of learnable parameters for each cell, the features from each of the respective images to each respective cell of the fused image to generate aggregated features (“The system can then compute the occupancy value for the grid cell as a weighted sum of the occupancy values for the neighbor grid cells in the preceding flow-warped occupancy, i.e., weighted by the weight for the corresponding neighbor grid cell” at paragraph 0117, line 5).
Regarding claim 26, Mahjourian et al. discloses an apparatus for processing image data, the apparatus comprising:
means for extracting features from a respective image from each camera (“Each group of camera sensor measurements can be represented as an image patch, e.g., an RGB image patch” at paragraph 0029, last sentence; “Once the sensor subsystems 130 classify one or more groups of raw sensor measurements as being measures of respective other agents, the sensor subsystems 130 can compile the raw sensor measurements into a set of raw data 132, and send the raw data 132 to a data representation system 140” at paragraph 0030) of a plurality of cameras (“The on-board system 110 includes one or more sensor subsystems 130. The sensor subsystems 130 include a combination of components that receive reflections of electromagnetic radiation, e.g., lidar systems that detect reflections of laser light, radar systems that detect reflections of radio waves, and camera systems that detect reflections of visible light” at paragraph 0027); and
means for fusing the features into a fused image having a grid structure (“The motion prediction system 150 processes the scene data 142 to generate features of a top-down representation of the environment. The top-down representation of the environment is a grid that includes a plurality of grid cells that each represent a region of the scene in the environment” at paragraph 0034, line 1), wherein the means for fusing the features comprises:
means for determining a contribution of the respective image from each of the plurality of cameras to a respective cell of the grid structure (“The system can then interpolate between the occupancy values by performing an interpolation, e.g., a bilinear sampling (bilinear interpolation), based on the position and the coordinates of each of the neighbor grid cells to generate a respective weight for each neighbor grid cell.” at paragraph 0117, line 1), and
means for aggregating, based on the contribution to the respective cell and a respective set of learnable parameters for each cell, the features from each of the respective images to each respective cell of the fused image to generate aggregated features (“The system can then compute the occupancy value for the grid cell as a weighted sum of the occupancy values for the neighbor grid cells in the preceding flow-warped occupancy, i.e., weighted by the weight for the corresponding neighbor grid cell” at paragraph 0117, line 5).
Regarding claims 5, 15, 24 and 29, Mahjourian et al. discloses an apparatus, method and medium wherein, for each respective cell of the grid structure, the respective set of learnable parameters includes a weight for each of the plurality of cameras that contribute to the respective cell (“The system can then compute the occupancy value for the grid cell as a weighted sum of the occupancy values for the neighbor grid cells in the preceding flow-warped occupancy, i.e., weighted by the weight for the corresponding neighbor grid cell” at paragraph 0117, line 5).
Regarding claims 9 and 19, Mahjourian et al. discloses an apparatus and method wherein the processing circuitry and the memory are part of an advanced driver assistance system (“the vehicle 102 can have an advanced driver assistance system (ADAS) that assists a human driver of the vehicle 102 in driving the vehicle 102 by detecting potentially unsafe situations and alerting the human driver or otherwise responding to the unsafe situation” at paragraph 0026, second to last sentence).
Regarding claim 10, Mahjourian et al. discloses an apparatus wherein the apparatus further comprises: the plurality of cameras (“The on-board system 110 includes one or more sensor subsystems 130. The sensor subsystems 130 include a combination of components that receive reflections of electromagnetic radiation, e.g., lidar systems that detect reflections of laser light, radar systems that detect reflections of radio waves, and camera systems that detect reflections of visible light” at paragraph 0027).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 2-4, 12-14, 21-23, 27 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Mahjourian et al. and Mueller et al. (US 2020/0388005).
Regarding claims 2, 12, 21 and 27, Mahjourian et al. discloses an apparatus, method and medium as described above.
Mahjourian et al. does not explicitly disclose means for determining, from a lookup table, the contribution of the respective image from each of the plurality of cameras to the respective cell of the grid structure.
Mueller et al. teaches an apparatus, method and medium in the same field of endeavor of surrounding vehicle image fusion, comprising means for determining, from a lookup table, the contribution of the respective image from each of the plurality of cameras to the respective cell of the grid structure (“The conceptual blending table has the weights given to the camera images and tells the synthesizer 411 how much each frame in the image data 915 may be associated with the surround view image 908” at paragraph 0105, second to last sentence).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize a lookup table as taught by Mueller et al. to store the weight parameters of Mahjourian et al., thereby providing a simple data structure from which to reference the proper image fusion parameters without recalculating them each time.
Regarding claims 3, 13, 22 and 28, the Mahjourian et al. and Mueller et al. combination discloses an apparatus, method and medium wherein the lookup table indicates the contribution of the respective image from each of the plurality of cameras to the respective cell of the grid structure based on one or more of a configuration of the plurality of cameras (“The calibrated viewpoint look-up table 902 and the conceptual blending look-up table 904 may be associated with a single camera and saved as a set with respect to a virtual viewpoint” Mueller et al. at paragraph 0105, line 13; therefore the associated weight for each frame from each camera is established by that camera’s position within the overall configuration used to generate the surround view image) or a type of the plurality of cameras.
Regarding claims 4, 14 and 23, the Mahjourian et al. and Mueller et al. combination discloses an apparatus, method and medium wherein the processing circuitry is further configured to: generate the lookup table based on one or more of a configuration of the plurality of cameras (“The calibrated viewpoint look-up table 902 and the conceptual blending look-up table 904 may be associated with a single camera and saved as a set with respect to a virtual viewpoint” Mueller et al. at paragraph 0105, line 13; therefore the associated weight for each frame from each camera is established by that camera’s position within the overall configuration used to generate the surround view image) or a type of the plurality of cameras.
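Examiner's note: the following Python sketch illustrates the combination's lookup-table rationale, i.e., per-cell, per-camera blending weights precomputed once from the camera configuration and referenced at run time rather than recalculated. The table contents and the uniform-gain construction are the examiner's own placeholder assumptions, not taken from Mueller et al.

import numpy as np

def build_blending_lut(num_cells, camera_gains):
    """Build a (num_cells x num_cameras) table of normalized per-cell
    camera weights; camera_gains stands in for the configuration-derived
    influence of each camera."""
    lut = np.tile(np.asarray(camera_gains, dtype=float), (num_cells, 1))
    return lut / lut.sum(axis=1, keepdims=True)  # weights sum to 1 per cell

def aggregate_cell(lut, cell_idx, per_camera_feats):
    """Weighted sum of camera features at one cell, with the weights
    simply read from the lookup table rather than recomputed."""
    return lut[cell_idx] @ np.stack(per_camera_feats)

lut = build_blending_lut(num_cells=64, camera_gains=[0.7, 0.3])
cell_feats = aggregate_cell(lut, 10, [np.random.rand(16), np.random.rand(16)])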
Claim(s) 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Mahjourian et al. and Singh et al. (US 2024/0127596).
Mahjourian et al. discloses the elements of claims 1 and 11 as described above.
Mahjourian et al. does not explicitly disclose applying, to the fused image, an object detection decoder to generate a set of bounding boxes that indicate a location of one or more objects within the fused image.
Singh et al. teaches an apparatus and method in the same field of endeavor of surrounding vehicle image fusion wherein the processing circuitry is further configured to: apply, to the fused image (see paragraph 0197, which describes the enriched object queries that result from fusion of the sensor data), an object detection decoder to generate a set of bounding boxes that indicate a location of one or more objects within the fused image (“At block 812, the perception system 402 generates at least one bounding box based on the enriched object queries. As described herein, the perception system 402 may use one or more decoders to identify bounding boxes for objects in an image based on the enriched object queries. In some cases, the perception system 402 may use a feed forward network to process the enriched object queries. The perception system may generate an object classification associated with the bounding box” at paragraph 0198, line 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the bounding box generation as taught by Singh et al. on the fused data of Mahjourian et al. to “improve the autonomous vehicle's ability to identify objects within the autonomous vehicle's scene” (Singh et al. at paragraph 0024, line 7).
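Examiner's note: purely for illustration, the following toy Python sketch shows a detection head applied to fused grid features to produce bounding boxes and class scores, in the spirit of the decoder and feed-forward processing Singh et al. describe at paragraph 0198. The random linear weights are placeholders, not a trained model or an implementation from the reference.

import numpy as np

def detect(fused_grid, num_queries=4, num_classes=3, seed=0):
    """Map fused grid features to (cx, cy, w, h) boxes and class logits
    via a toy linear head over globally pooled features."""
    rng = np.random.default_rng(seed)
    feat_dim = fused_grid.shape[-1]
    pooled = fused_grid.reshape(-1, feat_dim).mean(axis=0)  # global pooling
    w_box = rng.standard_normal((num_queries, 4, feat_dim))
    w_cls = rng.standard_normal((num_queries, num_classes, feat_dim))
    boxes = 1.0 / (1.0 + np.exp(-(w_box @ pooled)))  # normalized box params
    scores = w_cls @ pooled                          # per-query class logits
    return boxes, scores

boxes, scores = detect(np.random.rand(8, 8, 16))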
Claim(s) 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Mahjourian et al. and Boyraz et al. (US 2021/0405638).
Mahjourian et al. discloses the elements of claims 1 and 11 as described above.
Mahjourian et al. does not explicitly disclose applying, to the fused image, a segmentation decoder to identify types of objects in the fused image.
Boyraz et al. teaches an apparatus and method in the same field of endeavor of vehicle obstacle detection, wherein the processing circuitry is further configured to: apply, to the fused image, a segmentation decoder to identify types of objects in the fused image (“In a multi-resolution setup, encoder features from both RGB sensor and LIDAR sensor modalities at different resolutions may be pulled and combined together before being passed to the decoder network of the segmentation module 312” at paragraph 0061, last sentence).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize a segmentation module as taught by Boyraz et al. on the fused data of Mahjourian et al. as a way to leverage the fused sensor data to better determine where true obstacles are located.
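Examiner's note: the following toy Python sketch illustrates a per-cell segmentation decoder over fused features, standing in for the encoder-decoder segmentation module Boyraz et al. describe at paragraph 0061. The random projection is a placeholder, not a trained network or an implementation from the reference.

import numpy as np

def segment(fused_grid, num_classes=4, seed=0):
    """Assign each grid cell an object-type label by projecting its
    feature vector to class logits and taking the argmax."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((fused_grid.shape[-1], num_classes))
    logits = fused_grid @ w        # (H, W, num_classes)
    return logits.argmax(axis=-1)  # (H, W) label per cell

labels = segment(np.random.rand(8, 8, 16))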
Allowable Subject Matter
Claims 6, 16, 25 and 30 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: the prior art does not teach or disclose calculating the aggregated features according to a function:
aggregated_features_i = W_i[1]·F_i1 + W_i[2]·F_i2 + … + W_i[N_i]·F_iN_i,
wherein i is the respective cell, aggregated_features_i are the aggregated features at cell i, W_i[1] is the weight of a first camera of the plurality of cameras that contributes to cell i, F_i1 is the respective features of the first camera of the plurality of cameras at cell i, W_i[2] is the weight of a second camera of the plurality of cameras that contributes to cell i, F_i2 is the respective features of the second camera of the plurality of cameras at cell i, W_i[N_i] is the weight of an Nth camera of the plurality of cameras that contributes to cell i, and F_iN_i is the respective features of the Nth camera of the plurality of cameras at cell i, as required by claims 6, 16, 25 and 30.
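Examiner's note: for clarity, the claimed aggregation function reduces, at each cell i, to a weighted sum over the N_i contributing cameras, as in the following Python sketch (variable names follow the claim's notation; the numeric values are illustrative only).

import numpy as np

def aggregated_features(W_i, F_i):
    """W_i: the N_i scalar weights at cell i; F_i: the N_i per-camera
    feature vectors at cell i. Returns W_i[1]*F_i1 + ... + W_i[N_i]*F_iNi."""
    return sum(w * f for w, f in zip(W_i, F_i))

agg = aggregated_features([0.5, 0.3, 0.2],
                          [np.random.rand(16) for _ in range(3)])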
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATRINA R FUJITA whose telephone number is (571) 270-1574. The examiner can normally be reached Monday through Friday, 9:30 am - 5:30 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at 571-272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KATRINA R FUJITA/Primary Examiner, Art Unit 2672