Prosecution Insights
Last updated: April 19, 2026
Application No. 18/746,002

CLIP SEARCH WITH MULTIMODAL QUERIES

Non-Final OA §103 §112
Filed
Jun 17, 2024
Examiner
UDDIN, MOHAMMED R
Art Unit
2161
Tech Center
2100 — Computer Architecture & Software
Assignee
Tesla Inc.
OA Round
3 (Non-Final)
78%
Grant Probability
Favorable
3-4
OA Rounds
3y 3m
To Grant
99%
With Interview

Examiner Intelligence

Grants 78% — above average
78%
Career Allow Rate
564 granted / 726 resolved
+22.7% vs TC avg
Strong +31% interview lift
+30.8%
Interview Lift
resolved cases with interview
Typical timeline
3y 3m
Avg Prosecution
23 currently pending
Career history
749
Total Applications
across all art units

Statute-Specific Performance

§101
22.4%
-17.6% vs TC avg
§103
51.9%
+11.9% vs TC avg
§102
5.4%
-34.6% vs TC avg
§112
8.8%
-31.2% vs TC avg
Tech Center average is an estimate. Based on career data from 726 resolved cases.
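The statute figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming each "vs TC avg" value is a percentage-point difference (examiner rate minus Tech Center average); the page does not state how the deltas are computed, and the names `STATUTE_STATS` and `implied_tc_average` are illustrative only. Under that assumption, all four statutes back out to the same 40.0% baseline, which fits the note that the Tech Center average is an estimate.

```python
# Statute-specific figures as listed on this page:
# statute -> (examiner rate in %, delta vs TC avg in percentage points).
# Assumption: delta = examiner rate - Tech Center average.
STATUTE_STATS = {
    "101": (22.4, -17.6),
    "103": (51.9, 11.9),
    "102": (5.4, -34.6),
    "112": (8.8, -31.2),
}


def implied_tc_average(rate, delta):
    """Back out the Tech Center average implied by a rate and its delta."""
    return round(rate - delta, 1)


implied = {
    statute: implied_tc_average(rate, delta)
    for statute, (rate, delta) in STATUTE_STATS.items()
}

for statute, average in implied.items():
    print(f"§{statute}: implied Tech Center average {average}%")
```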

Office Action

§103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is in response to the communication filed on January 14, 2026.

Response to Amendment

Applicant’s amendment filed on January 14, 2026 with respect to claims 1-20 has been received, entered into the record and considered. As a result of the amendment, claims 1, 10 and 19 have been amended. Claims 1-20 remain pending in this office action.

Information Disclosure Statement

The information disclosure statement (IDS) was submitted on 02/04/2026. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 14, 2026 has been entered.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.--The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 10 and 19 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The amended claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Claims 1, 10 and 19 include the amended limitation wherein the shared latent space is generated by training a text encoder and an image encoder on paired text and labeled images such that embeddings from the text encoder and the image encoder are correlated in the shared latent space. However, nowhere does the specification disclose that the shared latent space is generated by training a text encoder and an image encoder on paired text and labeled images. In the remarks, applicants also did not specify where the specification supports this newly added limitation. Therefore, this newly amended limitation introduces new matter.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Danna (US 2021/0403036 A1), in view of Schulter et al (US 2023/0281999 A1).

As per claim 1, Danna discloses:
- a method, comprising (method and system for encoding and searching, Para [0001]),
- obtaining, by at least one processor, data associated with an input representing a query, the query comprising one or more semantic elements (receiving a query with a scenario (i.e., semantic element) and obtaining data associated with the query, Para [0003], Para [0051], [0059], Fig. 6 and 7, by a processor, Fig. 12, item 1202),
- extracting, by the at least one processor, an embedding representing the one or more semantic elements based on the query (embedding for the scenario (i.e., semantic element) is generated based on the query, Para [0003], [0041], [0053], [0059], Fig. 4, item 410, by a processor, Fig. 12, item 1202), the embedding corresponding to a first location that is associated with the query in a shared latent space (embedding reflects the location or scenario in the vector space (i.e., latent space), Para [0003], [0004], [0041], [0044], [0045], [0053], Fig. 4, item 414, 412, 418, Fig. 6A-6B),
- comparing, by the at least one processor, the embedding to at least one predetermined embedding of a set of predetermined embeddings (comparing the set of embeddings with other embeddings and finding similarity in different scenarios or locations in a vector space (i.e., latent space), Para [0045], [0053], by a processor, Fig. 12, item 120, Para [0054], Fig. 2, item 210, Para [0005], [0013], [0050], Fig. 4, item 412, 416, Para [0060], and Para [0102]: the machine-learning models may be trained using any suitable training algorithm, including supervised learning based on labeled training data, unsupervised learning based on unlabeled training data, and semi-supervised learning based on a mixture of labeled and unlabeled training data),
- selecting, by the at least one processor, the at least one predetermined embedding based on a degree of similarity between the embedding and the at least one predetermined embedding (selecting embedding based on degree of similarity, Para [0053] – [0054], by a processor, Fig. 12, item 1202),
- and providing, by the at least one processor, data associated with a graphical user interface (GUI) to cause a display device to display the GUI representing a set of images comprising an image corresponding to the at least one predetermined embedding (representing (i.e., displaying) set of images corresponding to embedding, Para [0041], [0045], [0059], [0061], by a processor, communication interface (i.e., GUI), Fig. 12, item 1202),
- wherein the one or more semantic elements at least in part correspond to one or more objects represented by the image (semantic element corresponds to an object, Para [0056], [0110]).

Danna does not explicitly disclose comparing embedding in the shared latent space, wherein the shared latent space is generated by training a text encoder and an image encoder on paired text and labeled images such that embeddings from the text encoder and the image encoder are correlated in the shared latent space.

However, in the same field of endeavor, Schulter in an analogous art discloses comparing embedding in the shared latent space (comparing embedding in a latent space, Fig. 4, item 410, 414, Fig. 5, item 506, Para [0049], [0044], [0056], claim 3, Para [0025]), wherein the shared latent space is generated by training a text encoder and an image encoder on paired text and labeled images such that embeddings from the text encoder and the image encoder are correlated in the shared latent space (Para [0024], [0025], [0026], [0035], training pair of named or labeled (i.e., labeled) object such as chair, sofa, couch, in the shared latent space and relating based on semantic relationship).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the comparing of image and text embeddings in a latent space as taught by Schulter et al as the means to process semantic embedding in an input query and find the similarity between embeddings in Danna (Danna, Para [0003], Para [0051], [0059], Fig. 6 and 7, Schulter, Fig. 4, item 410, 414, Fig. 5, item 506, Para [0049]).

Danna and Schulter are analogous prior art since they both deal with processing image and text embeddings and finding similarity in embeddings. A person of ordinary skill in the art would have been motivated to make the aforementioned modification to improve safety while driving a vehicle. This is because one aspect of the Danna invention is to enable the vehicle to determine its surroundings so that it may safely navigate to target destinations or assist a human driver, as described at least in Para [0040].
Comparing image and semantic query embedding in a shared latent space is part of this process. However, Danna doesn’t specify any particular manner in which embeddings are compared in a shared latent space. This would have led one of ordinary skill in the art to seek and recognize that embeddings are compared in a shared latent space as taught by Schulter. Schulter describes how their panoptic image segmentation improves the safety of operations at a traffic intersection when driving a vehicle, as described at least in Para [0018].

As per claim 2, rejection of claim 1 is incorporated, and further Danna discloses:
- providing, by the at least one processor, the data associated with the input to a text encoder to cause the text encoder to generate the embedding (scenario encoder (i.e., text encoder) to generate embedding, Para [0003], Fig. 2, item 208, Fig. 4, item 410, Para [0044], [0056]).

As per claim 3, rejection of claim 1 is incorporated, and further Danna discloses:
- obtaining, by the at least one processor, the set of predetermined embeddings from a database, the set of predetermined embeddings generated based on images corresponding to the predetermined embeddings and an image encoder (retrieving (i.e., obtaining) embedding and image encoder data from data store 220 (i.e., database), Para [0048], [0053], Fig. 2, item 220, by a processor, Fig. 12, item 1202).

As per claim 4, rejection of claim 3 is incorporated, and further Danna discloses:
- wherein the image encoder is configured to receive data associated with images generated by at least one sensor supported by at least one vehicle as input and provide embeddings associated with a latent space as output (image encoder receives image data generated by a sensor in a vehicle, Para [0051]).
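The query flow recited in claims 1 through 4 (embed a text query, compare it with stored image embeddings, select by similarity) can be illustrated with a toy sketch. Everything here is assumed for illustration: the three-dimensional vectors, the image ids, and the `encode_text` lookup stand in for trained encoders, and cosine similarity is an assumed measure that the office action does not name. It is not the method of the application, Danna, or Schulter.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical stand-in for a text encoder: query text -> embedding.
TOY_TEXT_EMBEDDINGS = {
    "pedestrian crossing": [0.9, 0.1, 0.0],
    "truck on highway": [0.1, 0.9, 0.1],
}

# Hypothetical precomputed image embeddings (the "predetermined embeddings"),
# assumed to live in the same latent space as the text embeddings.
IMAGE_INDEX = {
    "img_001": [0.8, 0.2, 0.1],
    "img_002": [0.0, 1.0, 0.2],
    "img_003": [0.7, 0.0, 0.3],
}


def encode_text(query):
    """Look up the toy embedding for a query string."""
    return TOY_TEXT_EMBEDDINGS[query]


def search(query, index, top_k=2):
    """Return the ids of the top_k images most similar to the query."""
    query_embedding = encode_text(query)
    scored = [
        (cosine_similarity(query_embedding, embedding), image_id)
        for image_id, embedding in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [image_id for _, image_id in scored[:top_k]]


results = search("pedestrian crossing", IMAGE_INDEX)
print(results)
```

In a real system the ranked ids would be handed to the GUI layer for display; that step is omitted here.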
As per claim 5, rejection of claim 1 is incorporated, and further Danna discloses:
- selecting, by the at least one processor, at least one second predetermined embedding of the set of predetermined embeddings based on a second degree of similarity between the at least one second predetermined embedding and other predetermined embeddings of the set of predetermined embeddings (second embedding based on second similarity, Para [0013], [0046], second level similarity, Para [0061]).

As per claim 6, rejection of claim 1 is incorporated, and further Danna discloses:
- obtaining, by the at least one processor, data associated with a second input, the second input indicating selection of a different image represented by the GUI (query with the second scenario (i.e., second input), Para [0053], [0054]),
- determining, by the at least one processor, at least one second predetermined embedding based on the selection of the different image represented by the GUI (second embedding based on second scenario, Para [0053], [0054]),
- and selecting, by the at least one processor, at least one third predetermined embedding based on a degree of similarity between the at least one second predetermined embedding and embeddings of the set of predetermined embeddings (selecting third embedding based on similarity between embeddings, Para [0046], [0053], [0054]).

As per claim 7, rejection of claim 6 is incorporated, and further Danna discloses:
- wherein the second input indicates selection of the different image that corresponds to a different embedding of the set of predetermined embeddings (first query scenario and second query scenario are different, Para [046], [0053], [0065]).

As per claim 8, rejection of claim 1 is incorporated, and further Danna discloses:
- wherein the embedding and the set of predetermined embeddings comprise vector representations corresponding to one or more features in a shared latent space (embedding represented in a vector space, Para [0004], [0013], [0053]).
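The follow-up search recited in claim 6, where a user picks a displayed image and that image's stored embedding seeds a second similarity search, can be sketched the same way. This is a toy illustration under stated assumptions: the vectors and image ids are invented, cosine similarity is an assumed measure, and the helper name `similar_to_selected` is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical precomputed image embeddings keyed by image id.
IMAGE_INDEX = {
    "img_001": [0.8, 0.2, 0.1],
    "img_002": [0.0, 1.0, 0.2],
    "img_003": [0.7, 0.0, 0.3],
}


def similar_to_selected(selected_id, index, top_k=1):
    """Rank the other images by similarity to the selected image."""
    anchor = index[selected_id]  # embedding of the image the user selected
    scored = [
        (cosine_similarity(anchor, embedding), image_id)
        for image_id, embedding in index.items()
        if image_id != selected_id  # do not return the selection itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [image_id for _, image_id in scored[:top_k]]


neighbors = similar_to_selected("img_003", IMAGE_INDEX)
print(neighbors)
```

Because the seed is an image embedding rather than a text embedding, this step only makes sense if both kinds of embedding occupy one shared space, which is the point in dispute in the rejection.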
As per claim 9, rejection of claim 1 is incorporated, and further Danna discloses:
- determining that the degree of similarity between the embedding and the at least one predetermined embedding satisfies a similarity threshold (embedding similarity satisfies threshold, Para [0005], [0006]),
- and selecting the at least one predetermined embedding based on the degree of similarity satisfying the similarity threshold (satisfying similarity threshold, Para [0054], [0060]).

As per claims 10-18, these are system claims corresponding to method claims 1-9 respectively and are rejected for the same reasons set forth in the rejection of claims 1-9 above.

As per claims 19-20, these are computer readable medium claims corresponding to method claims 1-2 respectively and are rejected for the same reasons set forth in the rejection of claims 1-2 above.

Response to Arguments

Applicant’s arguments with respect to claims 1-20 have been considered but they are not deemed to be persuasive.

In response to the applicant’s argument on pages 8-9, applicants argued that Schulter relates to systems and methods for "[embedding images] using a segmentation model that includes an image branch having an image embedding layer that embeds images into a joint latent space and a text branch having a text embedding layer that embeds text into the joint latent space… throughout Schulter, "masks" are extracted (corresponding to "predicted masks and ground truth masks/boxes") and used to generate the embeddings… This is different from the subject matter of independent claim 1, as a label is associated with the image itself (not a portion thereof) and combining Danna and Schulter would necessarily involve using only a portion of a given image to generate a corresponding embedding … that Danna and Schulter could be combined (which Applicant does not concede), the resulting latent space used for comparisons would be different from the claimed shared latent space, as only masked portions thereof would ultimately be comparable with corresponding text embeddings.

The examiner disagrees and respectfully responds as follows. First, the amended limitation does not have any support in the specification. Therefore, this amended limitation introduces new matter. Second, Schulter teaches wherein the shared latent space is generated by training a text encoder and an image encoder on paired text and labeled images such that embeddings from the text encoder and the image encoder are correlated in the shared latent space in Para [0024], [0025], [0026], [0035], training pair of named or labeled (i.e., labeled) object such as chair, sofa, couch, in the shared latent space and relating based on semantic relationship, as claimed. Therefore, the examiner firmly believes that Danna and Schulter, alone or in combination, reasonably teach the argued limitation and claims 1, 10 and 19 as claimed.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMED R UDDIN whose telephone number is (571)270-3138. The examiner can normally be reached M-F: 9:00 AM-5:00 PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Beausoliel Robert can be reached on 571-272-3645. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MOHAMMED R UDDIN/
Primary Examiner, Art Unit 2167

Prosecution Timeline

Jun 17, 2024
Application Filed
Apr 05, 2025
Non-Final Rejection — §103, §112
Jul 03, 2025
Examiner Interview Summary
Jul 03, 2025
Applicant Interview (Telephonic)
Jul 10, 2025
Response Filed
Oct 23, 2025
Final Rejection — §103, §112
Dec 22, 2025
Response after Non-Final Action
Jan 14, 2026
Request for Continued Examination
Jan 26, 2026
Response after Non-Final Action
Feb 21, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602432
SUMMARY GENERATION FOR A DISTRIBUTED GRAPH DATABASE
2y 5m to grant Granted Apr 14, 2026
Patent 12596676
RECORDS RETENTION MANAGEMENT
2y 5m to grant Granted Apr 07, 2026
Patent 12596960
MISUSE INDEX FOR EXPLAINABLE ARTIFICIAL INTELLIGENCE IN COMPUTING ENVIRONMENTS
2y 5m to grant Granted Apr 07, 2026
Patent 12585890
System and Method for Image Generation Using Neuroscience-Inspired Prompt Strategy
2y 5m to grant Granted Mar 24, 2026
Patent 12566800
EFFICIENT AND SCALABLE DATA PROCESSING AND MODELING
2y 5m to grant Granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
78%
Grant Probability
99%
With Interview (+30.8%)
3y 3m
Median Time to Grant
High
PTA Risk
Based on 726 resolved cases by this examiner. Grant probability derived from career allow rate.
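The headline grant probability can be reproduced from the counts shown on this page. The following is a minimal sketch, assuming the allow rate is simply granted cases divided by resolved cases and that the pending cases account for the rest of the total; the variable names are illustrative, and the page's interview-adjusted figure is not modeled because its derivation is not stated.

```python
# Counts as shown on this page.
GRANTED = 564
RESOLVED = 726
PENDING = 23
TOTAL_APPLICATIONS = 749

# Assumption: career allow rate = granted / resolved.
allow_rate = GRANTED / RESOLVED
allow_rate_percent = round(allow_rate * 100)

# Sanity check: resolved plus pending should account for every application.
accounted_for = RESOLVED + PENDING

print(f"Career allow rate: {allow_rate:.1%} (displayed as {allow_rate_percent}%)")
print(f"Resolved + pending = {accounted_for} of {TOTAL_APPLICATIONS} applications")
```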
