Prosecution Insights
Last updated: April 19, 2026
Application No. 18/419,417

LEARNABLE SENSOR SIGNATURES TO INCORPORATE MODALITY-SPECIFIC INFORMATION INTO JOINT REPRESENTATIONS FOR MULTI-MODAL FUSION

Non-Final OA: §102, §103
Filed: Jan 22, 2024
Examiner: MAIDEN, MICHAEL KIM
Art Unit: 2665
Tech Center: 2600 — Communications
Assignee: Qualcomm Incorporated
OA Round: 1 (Non-Final)
Grant Probability: 93% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 93%, above average (67 granted / 72 resolved; +31.1% vs TC avg)
Interview Lift: +8.9% (moderate), based on resolved cases with interview
Typical Timeline: 2y 11m average prosecution; 16 applications currently pending
Career History: 88 total applications across all art units

Statute-Specific Performance

§101: 9.8% (-30.2% vs TC avg)
§103: 52.1% (+12.1% vs TC avg)
§102: 29.0% (-11.0% vs TC avg)
§112: 8.0% (-32.0% vs TC avg)

Deltas are relative to the Tech Center average estimate. Based on career data from 72 resolved cases.

Office Action

§102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Status

Claim(s) 1-2 and 12-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Muehlenstaedt (US 20230075425 A1).

Claim(s) 3-6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Muehlenstaedt (US 20230075425 A1) in view of Hong (US 20240069176 A1).

Claim(s) 7-8, 10-11, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Muehlenstaedt (US 20230075425 A1) in view of Chiu (US 20200357143 A1).

Claims 9 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-2 and 12-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Muehlenstaedt (US 20230075425 A1).
Regarding claims 1 and 14, Muehlenstaedt discloses:

[Claim 1: An apparatus for wireless communication (¶48 “System interface 360 is configured to facilitate wired or wireless communications to and from external devices”) at a user equipment (UE), comprising: (¶22 “Such image-based machine learning systems can be implemented in robotic systems (e.g., autonomous vehicles and articulating arms).”) at least one memory; and (¶48 “a memory 312 connected to and accessible by other portions of computing device 300”) at least one processor coupled to the at least one memory, the at least one processor, individually or in any combination, is configured to: (¶16 “An “electronic device” or a “computing device” refers to a device that includes a processor and memory”)]

[Claim 14: A method of data processing, comprising: (¶16 “An “electronic device” or a “computing device” refers to a device that includes a processor and memory”)]

extract a set of features from each sensor of multiple sensors; (¶2 “The vehicle also comprises cameras, radars and LiDAR sensors for detecting objects in proximity thereto.” ¶22 “The robotic systems may use the machine learning models and/or algorithms for various purposes such as feature extraction using multi-camera views…”)

map a vector to each feature in the set of features extracted from each sensor, wherein the vector is related to at least one of: (¶63 “Thus, the term “feature embedding” as used herein refers to a vector representation of visual and spatial features extracted from an image.”) positioning information or a set of intrinsic parameters associated with each sensor of the multiple sensors; (¶40 “Operational parameter sensors that are common to both types of mobile platforms include, for example: a position sensor 236”)

concatenate sets of features from the multiple sensors with their corresponding embedded vectors; and (¶63 “Thus, the term “feature embedding” as used herein refers to a vector representation of visual and spatial features extracted from an image.”)

train a machine learning (ML) model to identify relationships between different sensors in the multiple sensors based on the concatenated sets of features and the corresponding embedded vectors; or output the concatenated sets of features and the corresponding embedded vectors for training of the ML model for identification of the relationships between the different sensors in the multiple sensors. (¶35 “The training data set 126 is then stored in datastore 112 (e.g., a database) and/or used by the computing device 110 during a training process to train the machine learning model(s)/algorithm(s) 128 to, for example, facilitate scene perception by another mobile platform using loss functions that iteratively process training examples over multiple cycles. The scene perception can be achieved via feature extraction using multi-camera views, object detection using the extracted features and/or object prediction”)

Regarding claims 2 and 15, Muehlenstaedt discloses wherein [Claim 2: to map the vector to each feature in the set of features extracted from each sensor, the at least one processor, individually or in any combination, is configured to: (¶63 “Thus, the term “feature embedding” as used herein refers to a vector representation of visual and spatial features extracted from an image.”)] [Claim 15: mapping the vector to each feature in the set of features extracted from each sensor comprises: (¶63 “Thus, the term “feature embedding” as used herein refers to a vector representation of visual and spatial features extracted from an image.”)] embed the vector into each feature in the set of features extracted from each sensor. (¶63 “Thus, the term “feature embedding” as used herein refers to a vector representation of visual and spatial features extracted from an image.”)

Regarding claim 12, Muehlenstaedt discloses wherein the at least one processor, individually or in any combination, is further configured to: (Muehlenstaedt: ¶16 “An “electronic device” or a “computing device” refers to a device that includes a processor and memory”) output an indication of the trained ML model. (¶36 “Once trained, the machine learning model(s)/algorithm(s) 128 is(are) deployed on the other mobile platforms such as vehicle 102”)

Regarding claim 13, Muehlenstaedt discloses wherein to output the indication of the trained ML model, the at least one processor, individually or in any combination, is configured to: (¶36 “Once trained, the machine learning model(s)/algorithm(s) 128 is(are) deployed on the other mobile platforms such as vehicle 102”) transmit the indication of the trained ML model; or (¶36 “Once trained, the machine learning model(s)/algorithm(s) 128 is(are) deployed on the other mobile platforms such as vehicle 102”) store the indication of the trained ML model. (¶36 “Once trained, the machine learning model(s)/algorithm(s) 128 is(are) deployed on the other mobile platforms such as vehicle 102”)
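Read together, the limitations mapped above describe a concrete pipeline: per-sensor feature extraction, a learnable "sensor signature" vector mapped onto each feature (tied to positioning information or intrinsic parameters), and concatenation of features with signatures for ML training. A minimal sketch of that pipeline in PyTorch; the class name, tensor shapes, and the learnable-signature design are illustrative assumptions of this write-up, not taken from the application or from Muehlenstaedt:

    import torch
    import torch.nn as nn

    class SensorSignatureFusion(nn.Module):
        """Concatenates each sensor's features with a learnable per-sensor signature."""

        def __init__(self, num_sensors: int, sig_dim: int):
            super().__init__()
            # One learnable "sensor signature" vector per sensor; in the claims this
            # vector relates to positioning information or intrinsic parameters.
            self.signatures = nn.Parameter(torch.randn(num_sensors, sig_dim))

        def forward(self, features: list) -> torch.Tensor:
            # features[i]: (batch, n_feats_i, feat_dim) tensor extracted from sensor i;
            # a common feat_dim across sensors is assumed here.
            joined = []
            for i, f in enumerate(features):
                sig = self.signatures[i].expand(f.shape[0], f.shape[1], -1)
                # Map the signature onto every feature from this sensor, then concatenate.
                joined.append(torch.cat([f, sig], dim=-1))
            # Concatenate the per-sensor sets along the token axis for joint training.
            return torch.cat(joined, dim=1)

Under this framing, a downstream model can learn cross-sensor relationships from the joint token set because every feature still carries its modality's identity after concatenation.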
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 3-6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Muehlenstaedt (US 20230075425 A1) in view of Hong (US 20240069176 A1).

Regarding claims 3 and 16, Muehlenstaedt discloses wherein the multiple sensors include different types of sensors, (Muehlenstaedt: ¶2 “The vehicle also comprises cameras, radars and LiDAR sensors for detecting objects in proximity thereto” ¶70 “This detection is made based on sensor data output from a camera (e.g., camera 262 of FIG. 2) of the mobile platform and/or LiDAR datasets generated by a LiDAR system (e.g., LiDAR system 264 of FIG. 2) of the mobile platform.”)

Muehlenstaedt fails to specifically disclose and wherein vectors mapped to features extracted from the different types of sensors are associated with different embedding dimensions. In related art, Hong discloses and wherein vectors mapped to features extracted from the different types of sensors are associated with different embedding dimensions. (Hong: ¶4 “a visible light RGB camera provides two-dimensional (2D) positions and color information, and a LiDAR provides distance information, an object may be visualized in three dimensions by mapping the information of the camera and the LiDAR.”)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the different types of sensors producing position information in different dimensions, as disclosed by Hong, into the method of feature extraction and feature vector embedding utilizing output from cameras and LiDAR sensors disclosed by Muehlenstaedt, to embed each feature vector according to the dimensionality of its corresponding imaging modality.

Regarding claim 4, Muehlenstaedt, as modified by Hong, discloses wherein the different types of sensors include at least one of camera sensors, light detection and ranging (Lidar) sensors, or camera-Lidar sensors. (Muehlenstaedt: ¶2 “The vehicle also comprises cameras, radars and LiDAR sensors for detecting objects in proximity thereto” ¶70 “This detection is made based on sensor data output from a camera (e.g., camera 262 of FIG. 2) of the mobile platform and/or LiDAR datasets generated by a LiDAR system (e.g., LiDAR system 264 of FIG. 2) of the mobile platform.”)

Regarding claim 5, Muehlenstaedt, as modified by Hong, discloses wherein the at least one processor, individually or in any combination, is further configured to: (Muehlenstaedt: ¶16 “An “electronic device” or a “computing device” refers to a device that includes a processor and memory”) select an embedding dimension for each type of sensor in the different types of sensors based on corresponding extracted features. (Hong: ¶4 “a visible light RGB camera provides two-dimensional (2D) positions and color information, and a LiDAR provides distance information, an object may be visualized in three dimensions by mapping the information of the camera and the LiDAR.”)

Regarding claim 6, Muehlenstaedt, as modified by Hong, discloses wherein the corresponding extracted features include scene properties or environmental properties. (Muehlenstaedt: ¶41 “The mobile platform also will include various sensors that operate to gather information about the environment in which the mobile platform is traveling.”)
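Claims 3-6 and 16 add that signatures for different sensor types carry different embedding dimensions, with the dimension selected per sensor type. A minimal sketch of one way that could be realized; the dimension values and the projection to a common width are assumptions of this write-up, not taken from Muehlenstaedt or Hong:

    import torch
    import torch.nn as nn

    # Hypothetical per-modality signature dimensions (not values from the OA or references).
    SIG_DIMS = {"camera": 16, "lidar": 32, "camera_lidar": 48}

    class ModalitySignatures(nn.Module):
        """Signatures with per-modality embedding dimensions, projected to a common width."""

        def __init__(self, common_dim: int = 64):
            super().__init__()
            self.signatures = nn.ParameterDict(
                {m: nn.Parameter(torch.randn(d)) for m, d in SIG_DIMS.items()}
            )
            self.projections = nn.ModuleDict(
                {m: nn.Linear(d, common_dim) for m, d in SIG_DIMS.items()}
            )

        def forward(self, modality: str) -> torch.Tensor:
            # Project the modality's native-dimension signature to the shared width.
            return self.projections[modality](self.signatures[modality])

Projecting each native-dimension signature to a shared width keeps later concatenation well-defined while still letting, say, LiDAR carry a wider signature than a camera.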
Claim(s) 7-8, 10-11, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Muehlenstaedt (US 20230075425 A1) in view of Chiu (US 20200357143 A1).

Regarding claims 7 and 17, Muehlenstaedt discloses wherein [Claim 7: the at least one processor, individually or in any combination, is further configured to: (Muehlenstaedt: ¶16 “An “electronic device” or a “computing device” refers to a device that includes a processor and memory”)]. Muehlenstaedt fails to specifically disclose apply an attention mechanism to the concatenated sets of features to obtain a set of attended features associated with the multiple sensors; and fuse the set of attended features.

In related art, Chiu discloses apply an attention mechanism to the concatenated sets of features to obtain a set of attended features associated with the multiple sensors; and (Chiu: ¶36 “In some embodiments, the attention module 130 of the present principles predicts attention at different spatial locations independently for the appearance (visual) and semantic feature maps.”) fuse the set of attended features. (Chiu: ¶37 “the modality fusion module 205 of the attention module 130 aligns the feature maps of mid-level appearance features, App.sub.mid, and the feature maps of high-level semantic features, Sem.sub.high, by, in some embodiments, first projecting the appearance features and the semantic features into a common embedding space and then adding the features together.” Chiu discloses fusing the features that have undergone the attention module 130.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the attention module and modality fusion module disclosed by Chiu into the method of feature extraction and feature vector embedding utilizing output from cameras and LiDAR sensors disclosed by Muehlenstaedt, to focus on informative regions of the image for feature extraction from the multiple imaging modalities prior to fusing the set of gathered features.

Regarding claims 8 and 18, Muehlenstaedt, as modified by Chiu, discloses wherein to train the ML model to identify the relationships between the different sensors in the multiple sensors based on the concatenated sets of features and the corresponding embedded vectors, [Claim 8: the at least one processor, individually or in any combination, is configured to:] (Muehlenstaedt: ¶35 “The training data set 126 is then stored in datastore 112 (e.g., a database) and/or used by the computing device 110 during a training process to train the machine learning model(s)/algorithm(s) 128 to, for example, facilitate scene perception by another mobile platform using loss functions that iteratively process training examples over multiple cycles. The scene perception can be achieved via feature extraction using multi-camera views, object detection using the extracted features and/or object prediction”) train the ML model to identify the relationships between the different sensors in the multiple sensors (Muehlenstaedt: ¶35, cited above) based on the fused set of attended features. (Muehlenstaedt: ¶22 “The robotic systems may use the machine learning models and/or algorithms for various purposes such as feature extraction using multi-camera views to perform perception feature fusion for cuboid association using loss functions that iteratively process data points over multiple cycles.” Muehlenstaedt discloses performing perception feature fusion. Chiu: ¶36 “In some embodiments, the attention module 130 of the present principles predicts attention at different spatial locations independently for the appearance (visual) and semantic feature maps.”)

Regarding claim 10, Muehlenstaedt, as modified by Chiu, discloses wherein to fuse the set of attended features, the at least one processor, individually or in any combination, is configured to: (Muehlenstaedt: ¶22 “The robotic systems may use the machine learning models and/or algorithms for various purposes such as feature extraction using multi-camera views to perform perception feature fusion for cuboid association using loss functions that iteratively process data points over multiple cycles.”) fuse the set of attended features with a multilayer perceptron (MLP) to integrate information across the multiple sensors. (Muehlenstaedt: ¶22, cited above; ¶57 “the machine learning model(s)/algorithm(s) is(are) trained to output a feature embedding that can be used for object-cuboid associations. This can be done by passing the intermediate features and camera calibration information in the region used for a cuboid's prediction through additional layers of computation (e.g., 2D convolutions and/or fully connected layers).”)

Regarding claim 11, Muehlenstaedt, as modified by Chiu, discloses wherein to output the concatenated sets of features and the corresponding embedded vectors for the training of the ML model, the at least one processor, individually or in any combination, is configured to: (Muehlenstaedt: ¶58 “During the training process of FIG. 4, the intermediate feature embedding output by the machine learning model(s)/algorithm(s) can be passed into a loss function that encourages the embedding to have a small distance to embeddings for the same object from a different view,” Muehlenstaedt discloses outputting embedded vectors used for training the machine learning model through the loss function) output the fused set of attended features (Chiu: ¶36 “In some embodiments, the attention module 130 of the present principles predicts attention at different spatial locations independently for the appearance (visual) and semantic feature maps.”) for the training of the ML model. (Muehlenstaedt: ¶58, cited above)

Allowable Subject Matter

Claims 9 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
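Claims 7, 10, and 17-18 recite attending over the concatenated feature set and then fusing the attended features, with claim 10 naming a multilayer perceptron (MLP) as the fusion mechanism. A minimal sketch using standard self-attention; the head count, layer widths, and mean pooling are assumptions of this write-up, not drawn from Chiu or Muehlenstaedt:

    import torch
    import torch.nn as nn

    class AttendAndFuse(nn.Module):
        """Self-attention over concatenated multi-sensor tokens, then MLP fusion."""

        def __init__(self, dim: int, num_heads: int = 4):
            super().__init__()
            # Attention over the concatenated feature set yields "attended features".
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            # An MLP fuses the attended features across the multiple sensors.
            self.mlp = nn.Sequential(
                nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim)
            )

        def forward(self, concat_feats: torch.Tensor) -> torch.Tensor:
            # concat_feats: (batch, total_tokens, dim), i.e. features plus signatures
            attended, _ = self.attn(concat_feats, concat_feats, concat_feats)
            # Pool over tokens and fuse into one joint multi-modal representation.
            return self.mlp(attended.mean(dim=1))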
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

Park (US 20240020953 A1) discloses, in various examples, that feature values corresponding to a plurality of views are transformed into feature values of a shared orientation or perspective to generate a feature map—such as a Bird's-Eye-View (BEV), top-down, orthogonally projected, and/or other shared perspective feature map type. Feature values corresponding to a region of a view may be transformed into feature values using a neural network. The feature values may be assigned to bins of a grid and values assigned to at least one same bin may be combined to generate one or more feature values for the feature map. To assign the transformed features to the bins, one or more portions of a view may be projected into one or more bins using polynomial curves. Radial and/or angular bins may be used to represent the environment for the feature map.

Kavulya (US 11423570 B2) discloses technologies for performing sensor fusion that include a compute device. The compute device includes circuitry configured to obtain detection data indicative of objects detected by each of multiple sensors of a host system. The detection data includes camera detection data indicative of a two- or three-dimensional image of detected objects and lidar detection data indicative of depths of detected objects. The circuitry is also configured to merge the detection data from the multiple sensors to define final bounding shapes for the objects.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL KIM MAIDEN, whose telephone number is (703) 756-1264. The examiner can normally be reached Monday - Friday, 7:30 am - 5:00 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Stephen Koziol, can be reached at 408-918-7630. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHAEL KIM MAIDEN/
Examiner, Art Unit 2665

/Stephen R Koziol/
Supervisory Patent Examiner, Art Unit 2665
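The Park reference cited above as pertinent art describes transforming multi-view feature values into a shared Bird's-Eye-View (BEV) feature map by assigning them to grid bins and combining values that land in the same bin. A minimal sketch of that bin-assignment idea; the coordinate extent, grid size, and mean pooling are assumptions of this write-up, not Park's actual method:

    import torch

    def scatter_to_bev(points_xy: torch.Tensor, feats: torch.Tensor,
                       grid: int = 128, extent: float = 50.0) -> torch.Tensor:
        """Assign per-point features to BEV grid bins, mean-pooling collisions.

        points_xy: (N, 2) ground-plane coordinates in meters, assumed in [-extent, extent]
        feats:     (N, C) feature values for each point
        """
        # Map metric coordinates to integer bin indices in [0, grid).
        ij = ((points_xy + extent) / (2 * extent) * grid).long().clamp(0, grid - 1)
        flat = ij[:, 0] * grid + ij[:, 1]
        bev = torch.zeros(grid * grid, feats.shape[1])
        count = torch.zeros(grid * grid, 1)
        # Combine values assigned to the same bin (here: average them).
        bev.index_add_(0, flat, feats)
        count.index_add_(0, flat, torch.ones(feats.shape[0], 1))
        return (bev / count.clamp(min=1)).reshape(grid, grid, -1)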

Prosecution Timeline

Jan 22, 2024
Application Filed
Jan 09, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications with similar technology granted by the same examiner

Patent 12597290
THREE-DIMENSIONAL (3D) FACIAL FEATURE TRACKING FOR AUTOSTEREOSCOPIC TELEPRESENCE SYSTEMS
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12592058
DATA GENERATING METHOD, LEARNING METHOD, ESTIMATING METHOD, DATA GENERATING DEVICE, AND PROGRAM
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12579654
INTERFACE DETECTION IN RECIPROCAL SPACE
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12579830
COMBINING BRIGHTFIELD AND FLUORESCENT CHANNELS FOR CELL IMAGE SEGMENTATION AND MORPHOLOGICAL ANALYSIS IN IMAGES OBTAINED FROM AN IMAGING FLOW CYTOMETER
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12561944
POINT CLOUD DATA PROCESSING APPARATUS, POINT CLOUD DATA PROCESSING METHOD, AND PROGRAM
Granted Feb 24, 2026 (2y 5m to grant)
Study what changed to get these applications past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 93%
With Interview: 99% (+8.9%)
Median Time to Grant: 2y 11m
PTA Risk: Low
Based on 72 resolved cases by this examiner. Grant probability derived from career allow rate.
