Last updated: April 19, 2026

Application No. 18/516,590

MULTIMODAL 3D OBJECT DETECTION USING TEMPORAL AND STRUCTURE CONSISTENCY IN VOXEL FEATURE SPACE

Non-Final OA §112

Filed

Nov 21, 2023

Examiner

DUONG, JOHNNYKHOI BAO

Art Unit

2667

Tech Center

2600 — Communications

Assignee

Qualcomm Incorporated

OA Round

1 (Non-Final)

Interview Optional

+32.8% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 56 resolved cases, 2023–2026

Examiner Intelligence

DUONG, JOHNNYKHOI BAO

Grants 66% — above average

Career Allow Rate

37 granted / 56 resolved

+4.1% vs TC avg

Strong +33% interview lift

Interview Lift: +32.8% (without vs. with interview)

Typical timeline

3y 8m

Avg Prosecution

10 currently pending

Career history

Total Applications

across all art units

Statute-Specific Performance

§101: 5.6% (-34.4% vs TC avg)

§103: 50.9% (+10.9% vs TC avg)

§102: 36.3% (-3.7% vs TC avg)

§112: 4.4% (-35.6% vs TC avg)

Based on career data from 56 resolved cases

Office Action

§112

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 26 June 2024 was filed and is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Specification
The disclosure is objected to because of the following informalities:
Possible typos: [0018, line 1] recites “may b”, which should read “maybe” or “may be” depending on intent; and [0058, line 3] recites “into a confidence store”, which may be intended to read “confidence score”.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that use the word “means” or “step” but are nonetheless not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph because the claim limitation(s) recite(s) sufficient structure, materials, or acts to entirely perform the recited function.  Such claim limitation(s) is/are: 
“means for forming a first voxel representation” in claim 20, described in paragraphs [0005-0008, 0062-0064] and implemented on hardware disclosed in paragraphs [0040, 0042-0043, 0084].
“means for determining a first set of features” in claim 20, described in paragraphs [0005-0008, 0037-0038, 0043-0046, 0073] and implemented on hardware disclosed in paragraphs [0040, 0042-0043, 0084].
“means for forming a second voxel representation” in claim 20, described in paragraphs [0005-0008, 0062-0064] and implemented on hardware disclosed in paragraphs [0040, 0042-0043, 0084].
“means for determining a second set of features” in claim 20, described in paragraphs [0005-0008, 0037-0038, 0043-0046, 0073] and implemented on hardware disclosed in paragraphs [0040, 0042-0043, 0084].
“means for determining correspondences between the voxels” in claim 20, described in paragraphs [0005-0008, 0044-0046] and implemented on hardware disclosed in paragraphs [0040, 0042-0043, 0084].
“means for determining positions of objects” in claim 20, described in paragraphs [0005-0008, 0043-0049] and implemented on hardware disclosed in paragraphs [0040, 0042-0043, 0084].
Because this/these claim limitation(s) is/are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.
If applicant intends to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to remove the structure, materials, or acts that performs the claimed function; or (2) present a sufficient showing that the claim limitation(s) does/do not recite sufficient structure, materials, or acts to perform the claimed function.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 1 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. As claims 2-9 depend upon claim 1, claims 2-9 are rejected for similar reasons.
Claim 1 recites the limitation "the unit" in line 11.  There is insufficient antecedent basis for this limitation in the claim. In the interest of compact prosecution, claim 1 will be interpreted as having similar limitations as claims 10 and 20 that have “a unit” in the first limitation. 

Allowable Subject Matter
Claims 10-20 are allowed. Claims 1-9 would be allowed after amendments to overcome the 35 USC § 112 rejection of claim 1.
The following is an examiner’s statement of reasons for allowance:
Regarding claims 10 and 20: The cited prior art fails to disclose, teach, or suggest: “A method of processing media data, the method comprising: forming a first voxel representation of a three-dimensional space at a first time using a first image of the three-dimensional space captured by a camera of a moving object having a first pose and a first point cloud of the three-dimensional space captured by a sensor of the moving object; determining a first set of features for voxels in the first voxel representation, the first set of features representing visual characteristics of the corresponding voxels; forming a second voxel representation of the three-dimensional space at a second time using a second image of the three-dimensional space captured by the camera of the moving object having a second pose and a second point cloud of the three-dimensional space captured by the unit of the moving object; determining a second set of features for voxels in the second voxel representation; determining correspondences between the voxels in the first voxel representation and the voxels in the second voxel representation according to similarities between the first set of features and the second set of features; and determining positions of objects in the three-dimensional space relative to the moving object according to the first pose, the second pose, and the correspondences between the voxels in the first voxel representation and the voxels in the second voxel representation, the objects being represented by the voxels in the first voxel representation and the voxels in the second voxel representation” in the context of the claim as a whole.
Regarding claims 11-19: the claims depend directly or indirectly from claims 10 and 20 and are therefore allowed for at least similar reasons.
The cited prior art includes:
Chandler (US 2025/0078402 A1, 2022) discloses first and second images at first and second times (interpreted from structure from motion using images for a time sequence) of a 3D space captured by a camera of a moving object having first and second poses (interpreted from camera pose for each image, [0008]) and first and second point clouds (interpreted from point clouds associated with each frame, [0067]); first and second feature extraction for visual characteristics (interpreted from semantic keypoints of 3D models); and determining positions of objects in the 3D space relative to the moving object according to the first and second poses (interpreted from initial object pose and new object pose). However, Chandler does not appear to specifically teach “determining correspondences between the voxels in the first voxel representation and the voxels in the second voxel representation according to similarities between the first set of features and the second set of features” (Chandler instead generates a new model with pose rather than determining correspondences, as the “structure from motion” method in Chandler determines correspondences using the 2D images instead of voxel representations, [0135-0137]); and does not appear to specifically teach “determining positions of objects in the three-dimensional space relative to the moving object according to … the correspondences between the voxels in the first voxel representation and the voxels in the second voxel representation” (Chandler broadly speaks of voxel representations, but is broader in usage than the instant application). Further, there appears to be no motivation to combine with the cited prior art.
Tasse (US 2024/0144595 A1, 2022) discloses correspondence between first and second images of first and second time points (interpreted from “structure from motion”) with first and second point clouds (interpreted from point clouds associated with video) and a first pose. However, Tasse does not appear to explicitly teach a second pose (which may be inferred from Figure 12A, but is broader than the instant application) or “determining correspondences between the voxels in the first voxel representation and the voxels in the second voxel representation according to similarities between the first set of features and the second set of features”. While Tasse does determine positions of objects in 3D space (required for creating augmented reality), Tasse does not explicitly teach using the first pose, the second pose, and correspondences between voxel representations to do so. Further, there appears to be no motivation to combine with the cited prior art.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Tauber (US 2020/0195904 A1, 2020) discloses a depth-based 3D reconstruction method using structure from motion and stereo images (first and second images, but not necessarily at first and second times). However, Tauber does not appear to explicitly teach a second voxel representation (which can be broadly inferred as part of structure from motion, but is broader than the instant application), point clouds (Tauber is broader than the instant application), or correspondence (broadly inferred from structure from motion, and so broader than the instant application). Further, there appears to be no motivation to combine with the cited prior art.
Dupuis et al. (US 2020/0342653 A1, 2020) discloses first and second voxel representations using images with associated point clouds and pose (interpreted from position and orientation of objects) and correspondence between voxel representations (in medical image analysis, image registration would be the equivalent of correspondence). However, Dupuis does not appear to specifically teach a camera of a moving object (as Dupuis is for medical images) or poses for determining object position (Dupuis appears to store the known positions already). Further, there appears to be no motivation to combine with the cited prior art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNNY B DUONG whose telephone number is (571)272-1358. The examiner can normally be reached Monday - Thursday 10a-9p (ET).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached at (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/J.B.D./Examiner, Art Unit 2667
/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667

Prosecution Timeline

Nov 21, 2023

Application Filed

Jan 22, 2026

Non-Final Rejection — §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/514,059

Patent 12586187

LESION LINKING USING ADAPTIVE SEARCH AND A SYSTEM FOR IMPLEMENTING THE SAME

2y 5m to grant Granted Mar 24, 2026

17/715,145

Patent 12525024

ELECTRONIC DEVICE, METHOD, AND COMPUTER READABLE STORAGE MEDIUM FOR DETECTION OF VEHICLE APPEARANCE

2y 5m to grant Granted Jan 13, 2026

17/731,769

Patent 12518510

MACHINE LEARNING FOR VECTOR MAP GENERATION

2y 5m to grant Granted Jan 06, 2026

17/571,677

Patent 12498556

Microscopy System and Method for Evaluating Image Processing Results

2y 5m to grant Granted Dec 16, 2025

17/403,017

Patent 12488438

DEEP LEARNING-BASED IMAGE QUALITY ENHANCEMENT OF THREE-DIMENSIONAL ANATOMY SCAN IMAGES

2y 5m to grant Granted Dec 02, 2025

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

66%

Grant Probability

99%

With Interview (+32.8%)

3y 8m

Median Time to Grant

Low

PTA Risk

Based on 56 resolved cases by this examiner. Grant probability derived from career allow rate.
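
The projection figures above appear to follow from simple arithmetic on the examiner's record. A minimal sketch, assuming the interview lift is added in percentage points to the career allow rate and capped at 100% (the page does not state its exact formula, and the variable names are illustrative only):

```python
# Figures copied from the page: 37 grants out of 56 resolved cases,
# and an interview lift displayed as +32.8%.
granted = 37
resolved = 56
interview_lift_pts = 32.8  # treated here as percentage points (assumption)

# Career allow rate, expressed as a percentage.
allow_rate_pct = 100 * granted / resolved

# Assumption: the "With Interview" figure is the allow rate plus the lift,
# capped at 100%. The page does not confirm this formula.
with_interview_pct = min(100.0, allow_rate_pct + interview_lift_pts)

print(f"Career allow rate: {allow_rate_pct:.0f}%")
print(f"With interview:    {with_interview_pct:.0f}%")
```

Under that assumption the rounded results match the 66% and 99% shown on the page, which suggests the "With Interview" figure is an additive estimate and not a separately measured outcome.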