Prosecution Insights
Last updated: April 19, 2026
Application No. 18/802,734

IMAGE QUERY PROCESSING USING LARGE LANGUAGE MODELS

Status: Final Rejection under §103
Filed: Aug 13, 2024
Examiner: TOUGHIRY, ARYAN D
Art Unit: 2165
Tech Center: 2100 — Computer Architecture & Software
Assignee: Google LLC
OA Round: 2 (Final)
Grant Probability: 68% (Favorable)
OA Rounds: 3-4
Time to Grant: 3y 1m
Grant Probability with Interview: 88%
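The 88% figure is consistent with applying the examiner's +19.9% interview lift (reported in the next section) to the baseline probability. A minimal sketch of that arithmetic, assuming the lift is simply additive; the tool's actual model is not disclosed:

```python
# Illustrative arithmetic only; assumes the interview lift is additive.
baseline_grant_probability = 0.68   # Grant Probability above
interview_lift = 0.199              # examiner's +19.9% interview lift (next section)

with_interview = baseline_grant_probability + interview_lift
print(f"{with_interview:.0%}")      # ~88%, matching "With Interview" above
```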

Examiner Intelligence

Career Allow Rate: 68% (128 granted / 189 resolved), +12.7% vs TC avg; grants above average
Interview Lift: +19.9% higher allowance rate for resolved cases with an interview (strong, roughly +20%)
Avg Prosecution: 3y 1m typical timeline; 17 applications currently pending
Career History: 206 total applications across all art units
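For clarity on how these career figures are typically derived, the sketch below computes the allowance rate and interview lift. Only the 128/189 totals come from this report; the per-group interview counts are hypothetical placeholders.

```python
# Career allowance rate uses the report's totals.
granted, resolved = 128, 189
career_allow_rate = granted / resolved                     # ~0.677, shown as 68%

# Interview lift: allowance rate among resolved cases that had an examiner
# interview minus the rate among those that did not. The split below is a
# made-up placeholder to show the arithmetic, not data from this report.
with_interview    = {"granted": 40, "resolved": 48}        # hypothetical
without_interview = {"granted": granted - 40, "resolved": resolved - 48}

lift = (with_interview["granted"] / with_interview["resolved"]
        - without_interview["granted"] / without_interview["resolved"])
print(f"allow rate {career_allow_rate:.1%}, interview lift {lift:+.1%}")
```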

Statute-Specific Performance

Statute   Rate     vs TC avg
§101      7.0%     -33.0%
§103      64.4%    +24.4%
§102      14.9%    -25.1%
§112      7.0%     -33.0%

Tech Center averages are estimates; based on career data from 189 resolved cases.
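As a quick consistency check, each "vs TC avg" delta is simply the examiner's figure minus the Tech Center baseline, so the baseline can be back-derived from the table above:

```python
# Back-derive the Tech Center baseline implied by the table: tc_avg = rate - delta.
rate  = {"101": 7.0, "103": 64.4, "102": 14.9, "112": 7.0}      # percent
delta = {"101": -33.0, "103": 24.4, "102": -25.1, "112": -33.0}

tc_avg = {s: round(rate[s] - delta[s], 1) for s in rate}
print(tc_avg)   # every statute backs out to the same 40.0% baseline estimate
```

All four statutes imply the same 40.0% figure, which is consistent with the single estimated baseline noted above.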

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed 10/31/2025 have been fully considered.

35 U.S.C. § 102 and 35 U.S.C. § 103, regarding Applicant's argument (pages 8-11), Examiner's response: In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Zhang teaches "processing ... to generate one or more natural language descriptors ... descriptive of one or more properties of the input image" (see paragraphs 15 and 57 and figure 4). Secondary prior art Tandon teaches "generating, using an explication model and based on the input text query, one or more explicit text queries that explicate one or more of the implicit queries in the input text query" (see paragraphs 21, 40 and 44 and figure 1). It is important to note that this rejection is one of obviousness and not one of anticipation; elements from one reference can therefore be combined onto the foundation of another, separate reference, and obviousness conclusions can be reached in mapping the prior art's teachings onto the instant application's claim limitations. Here, the one or more natural language descriptors of Zhang are integrated with the explication model established by secondary prior art Tandon. The input prompt from Tandon (paragraphs 39-42 and FIG. 1) is likewise applied to primary prior art Zhang's already established input query/image (see paragraphs 15 and 32 and FIG. 4).

As to the new amendments to claim 17, Zhang, in the context of the obviousness rejection, teaches "obtaining that at least one of one or more of the images responsive to the image search request, including extracting a first text extract, of the text extracts" (see paragraphs 16, 32, 56 and 77 and figure 4), and secondary prior art Abrams teaches "from a subset of a first resource of the web resources and extracting a second text extract, of the text extracts, from a subset of a second resource of the web resources" (see paragraphs 25 and 31 and figure 1).

The examiner recommends further elaborating on the "explication model" in the independent claims. The examiner believes amendments directed toward the parameters/factors involved in the explication model will help overcome the current prior art and push the application toward allowance. If the applicant would like further guidance on overcoming the prior art, please call the examiner at 571-272-5212.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
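For readers less familiar with the claimed subject matter, the following is a minimal, purely illustrative sketch of the claim 1 pipeline the examiner maps above: implicit-query explication, multi-modal descriptor generation, prompt assembly, and LLM response. Every name in it is a hypothetical placeholder; it is not the applicant's implementation, nor anything disclosed by Zhang or Tandon.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class InputQuery:
    image_bytes: bytes   # the input image the text query refers to
    text: str            # input text query, possibly containing implicit queries

def explicate(text: str, explication_model: Callable[[str], List[str]]) -> List[str]:
    """Generate explicit text queries that explicate implicit queries in the text."""
    return explication_model(text)

def describe(image_bytes: bytes, explicit_queries: List[str],
             vqa_model: Callable[[bytes, str], str]) -> List[str]:
    """Use a multi-modal (visual question answering) model to produce natural
    language descriptors of the image, responsive to each explicit query."""
    return [vqa_model(image_bytes, q) for q in explicit_queries]

def build_prompt(text: str, explicit_queries: List[str], descriptors: List[str]) -> str:
    """Complete a pre-defined template with the descriptors and queries."""
    context = "\n".join(f"- {d}" for d in descriptors)
    return (f"Image context:\n{context}\n\n"
            f"Clarified questions: {'; '.join(explicit_queries)}\n"
            f"User query: {text}\nAnswer:")

def answer(query: InputQuery, explication_model, vqa_model, llm) -> str:
    explicit_queries = explicate(query.text, explication_model)
    descriptors = describe(query.image_bytes, explicit_queries, vqa_model)
    prompt = build_prompt(query.text, explicit_queries, descriptors)
    return llm(prompt)   # the response is then rendered at the client device
```

Claims 9-14 add a search step (an image search request whose returned resources supply text extracts for the prompt); in a sketch like this, that step would slot in between describe() and build_prompt().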
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. Claims 1-14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over US 20240004924 A1; Zhang; Zhifei et al. (hereinafter Zhang) in view of US 20240354376 A1; Tandon; Abhas et al. (hereinafter Tandon). Regarding claim 1, Zhang teaches A method implemented by one or more processors, the method comprising: receiving an input query associated with a client device, the input query comprising an input image and an input text query, wherein the input text query refers to the input image (Zhang [0015] FIG. 9 illustrates a diagram for utilizing an embedding-based search engine to conduct an image search using a multi-modal search input that includes a text query and an image query in accordance with one or more embodiments; [0027] an image search engine and a text search engine in responses to receiving a search query having textual and visual components. Further, in some implementations, the search-based editing system utilizes the search results to modify one or more attributes-such as color, tone, or texture—of an input digital image, bridging the search and editing processes. [0032] In some cases, the search-based editing system generates input embeddings for a multi-modal search input within a common embedding space. To illustrate, in some implementations, the search-based editing system receives a multi-modal search input, such as a search input having a text query and an image query. The search-based editing system generates, within a common embedding space (e.g., a text-image embedding space) a text embedding for the text query and an image embedding for the image query. The search-based editing system further retrieves digital images to return as the search results using the embeddings within the common embedding space. 
In some cases, the search-based editing system determines a weighted combination of the various components of the multi-modal search input (e.g., a weighted combination of the text query and the image query) and retrieves the search results using the weighted combination [56-63] elaborate on the matter [FIG.4] shows receiving an input query associated with a client device, the input query comprising an input image and an input text query, wherein the input text query refers to the input image and comprises one or more implicit queries) processing, using a multi-modal image processing model, the input image and the one or more explicit text queries to generate one or more natural language descriptors, the one or more natural language descriptors descriptive of one or more properties of the input image, wherein the one or more natural language descriptors are responsive to the one or more explicit text queries; (Zhang [0015] FIG. 9 illustrates a diagram for utilizing an embedding-based search engine to conduct an image search using a multi-modal search input that includes a text query and an image query in accordance with one or more embodiments; [0018] FIGS. 12A-12B each illustrate the search-based editing system utilizing a multi-modal embedding neural network to generate an input embedding for a sketch query in accordance with one or more embodiments;[0034] To provide an example, in some cases, the search-based editing system receives a multi-modal search input that includes multiple visual (e.g., sketch, brush, or image) and/or textual components that provide semantic and layout information to consider when conducting the image search. The search-based editing system further utilizes a multi-modal embedding neural network to generate an input embedding that represents the semantic and layout information from the multi-modal search input. In some cases, the multi-modal embedding neural network determines segment-level semantic and layout information from the multi-modal search input and generates the input embedding based on this segment-level information. [0057] In some cases, the search-based editing system 106 additionally or alternatively utilizes an image search engine 116 to conduct the image search using the search input 202. In one or more embodiments, an image search engine includes a search engine that conducts an image search using search input that includes visual input (e.g., an image query, a sketch query, or a local query, such as a cropped region or a semantic region of a digital image). For example, in some cases, an image search engine identifies visual features of a visual input and searches for and retrieves digital images that incorporate one or more of those visual features...uses text input to conduct the image search (e.g., text input provided in connection with visual input, such as text input provided as part of a multi-modal canvas search query). [76-78 & 138-140] elaborate on the matter [FIG.4] shows a visual of the system) generating, based on the one or more natural language descriptors and the input text query and/or the one or more explicit text queries, (Zhang [0032] The search-based editing system generates, within a common embedding space (e.g., a text-image embedding space) a text embedding for the text query and an image embedding for the image query. The search-based editing system further retrieves digital images to return as the search results using the embeddings within the common embedding space. 
In some cases, the search-based editing system determines a weighted combination of the various components of the multi-modal search input (e.g., a weighted combination of the text query and the image query) and retrieves the search results using the weighted combination. [0108] In particular, the encoder 602, in one or more implementations, comprises convolutional layers that generate a feature vector in the form of a feature map. To detect objects within the digital image 616, the object detection machine learning model 608 processes the feature map utilizing a convolutional layer in the form of a small network that is slid across small windows of the feature map. The object detection machine learning model 608 then maps each sliding window to a lower-dimensional feature. The object detection machine learning model 608 then processes this feature using two separate detection heads that are fully connected layers. In particular, the first head can comprise a box-regression layer that generates the detected object and an object-classification layer that generates the object label. [119] utilizes the feature pyramids and feature maps to identify objects within the digital image 616 and based on user input 612 generates segmentation masks via the masking head[0142] As further shown, the search-based editing system 106 utilizes an image embedding model 906 to generate an image embedding 908 from the image query 902. The image embedding 908 represents one or more visual features from the image query 902 (e.g., image characteristics or other patent or latent features of the image query 902, such as semantic information and/or layout information). Similarly, the search-based editing system 106 utilizes a text embedding model 910 to generate a text embedding 912 from the text query 904. The text embedding 912 represents one or more textual features from the text query 904 (e.g., patent or latent features of the text query 904, such as semantic information and/or layout information represented by the language, words, or structure of the text query 904). In one or more embodiments, the search-based editing system 106 respectively utilizes, as the image embedding model 906 and the text embedding model 910, the image encoder and the text encoder described in U.S. patent application Ser. No. 17/652,390 filed on Feb. 24, 2022, entitled GENERATING ARTISTIC CONTENT FROM A TEXT PROMPT OR A STYLE IMAGE UTILIZING A NEURAL NETWORK MODEL, the contents of which are expressly incorporated herein by reference in their entirety ....[179-184] elaborates on the matter [FIG.4] shows corresponding visual) generating… and using the LLM, a response to the input query; and causing the response to the input query to be rendered at the client device. (Zhang [FIG.4] shows generating using the LLM, a response to the input query; and causing the response to the input query to be rendered at the client device. [0098] As also illustrated in FIG. 5B, embodiments of the neural network appearance encoder 514 include five layers. For example, the neural network appearance encoder 514 includes a convolutional ConvBlock layer with a 7×7 kernel, in addition to four ConvBlock layers with 3×3 kernels, each with their own respective resolutions for input and output.[0107] As shown in FIG. 6, the object detection machine learning model 608 includes lower neural network layers and higher neural network layers. 
In general, the lower neural network layers collectively form the encoder 602 and the higher neural network layers collectively form the detection heads 604 (e.g., decoder). In one or more embodiments, the encoder 602 includes convolutional layers that encodes digital images into feature vectors, which are outputted from the encoder 602 and provided as input to the detection heads 604. In various implementations, the detection heads 604 comprise fully connected layers that analyze the feature vectors and output the detected objects (potentially with approximate boundaries around the objects [0129] FIG. 7B illustrates the search-based editing system 106 performing a WCT color transfer operation using a neural network in accordance with one or more embodiments. As shown in FIG. 7B, the search-based editing system 106 provides a content image 722 (e.g., an input digital image) and a style image 724 (e.g., a reference image) to a neural network 720 to generate an output 726 (e.g., a modified digital image). As indicated in FIG. 7B, in one or more embodiments, the neural network 720 includes an encoder 728. In particular, as shown, in one or more embodiments, the neural network 720 includes a visual geometry group (VGG) neural network, such as a VGG-19 network, as the encoder 728. As further shown in FIG. 7B, the neural network 720 includes a decoder 730. In one or more embodiments, the decoder 730 includes a symmetric decoder that inverts the features of the encoder 728 (e.g., the VGG-19 features) and output the modified digital image.) the combination lacks explicitly and orderly teaching comprises one or more implicit queries; generating, using an explication model and based on the input text query, one or more explicit text queries that explicate one or more of the implicit queries in the input text query; an input prompt for a large language model, LLM; However Tandon teaches comprises one or more implicit queries; generating, using an explication model and based on the input text query, one or more explicit text queries that explicate one or more of the implicit queries in the input text query; (Tandon [FIG.1] shows a system which comprises one or more implicit queries and generating, using an explication model and based on the input text query, one or more explicit text queries that explicate one or more of the implicit queries in the input text query[0021] use these implicit attributes to generate the...[0040] As shown in FIG. 1, prompt generation component 160 can include a prompt feedback component 168. For example, prompt generation component 160 generates an initial prompt using implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122... [0044] output of the generative machine learning model 108 can be... based on the attribute selections 122 and user inputs 120 used to generate the task descriptions (e.g., prompts) to which the generative machine learning model 108 is applied. For example, if a particular job title is common to many users of the online system, a prompt can be configured based on the implicit and explicit attributes associated with that job title so that the generative machine learning model 108 generates recommendation text pertaining to the job title. 
Since users have the ability to select the applicable implicit and explicit attributes, over time recommendation generation system 100 learns the best attributes for specific combinations of explicit attribute data 104 and is able to suggest applicable attributes. Additionally, since users do not typically write about their personal qualities in their profiles, generative system for writing entity recommendations 105 is able to provide recommendations that discuss such implicit attributes [0087] In some embodiments, as shown in FIG. 11, prompt generation component 160 generates adjectives to accompany the explicit attributes in explicit attributes section 625. For example, prompt generation component 160 generates qualifiers for a variety of different proficiency levels (e.g., familiar, proficient, and master) and uses them with the explicit attributes from profile 505. In some embodiments, the qualifiers are generated randomly. In other embodiments, the qualifiers are generated in response to information from profile 505. For example, qualifiers for higher proficiency levels are generated for a user with more recommendations for that particular explicit attribute.... [89-97] go into further details) an input prompt for a large language model, LLM; (Tandon [0039] Prompt generation component 160 receives implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122 and creates prompt 114... [0040] As shown in FIG. 1, prompt generation component 160 can include a prompt feedback component 168. For example, prompt generation component 160 generates an initial prompt using implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122. Prompt generation component 160 uses these prompt inputs and a set of instructions to create prompt 114. In some embodiments, prompt generation component 160 generates the set of instructions. For example, prompt generation component 160 generates the set of instructions based on one or more of explicit attribute data 104, implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122. In other embodiments, the set of instructions is prestored and extracted from a data store (such as data store 240 of FIG. 2). In still other embodiments, an initial set of instructions is prestored and extracted from the data store and prompt generation component 160 uses the initial set of instructions to generate the set of instructions used for creating prompt 114. For example, prompt generation component 160 uses the initial set of instructions and one or more of explicit attribute data 104, implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122 to generate the set of instructions used for creating prompt 114. The term set of instructions as used in this disclosure can be a single instruction or multiple instructions. Further details about the set of instructions are discussed with reference to FIG. 4. [0042] Prompt generation component 160 creates prompt 114, x, based on the implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122. In some embodiments, prompt generation component 160 creates more than one prompt. As shown in FIG. 4, prompt 114 can include instructions 410, prompt input 420, and examples 440. 
Although illustrated as including instructions 410, prompt input 420, and examples 440, prompt 114 can include different combinations of one or more of these as well as include further components. Further details about prompt generation component 160 are described with reference to FIG. 4...[FIG.1] shows system which is generating input prompt based on the one or more natural language descriptors and the input text query and/or the one or more explicit text queries) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to take all prior methods and make the addition of Tandon in order to efficiently create a more accurate output generative learning model methods (Tandom [0044] By reducing the barrier of entry to recommending other users, the recommendation system can grow and become more dynamic allowing the models (e.g., language model 305, domain specific language model 310, classifier language model 315, and generative machine learning model 108) to more accurately generate recommendations. Further details with regards to language model 305, domain specific language model 310, and classifier language model 315 are described with reference to FIG. 3. [0084] generate increasingly accurate suggestions for implicit attributes section 620 and explicit attributes section 625. [FIG.1] shows corresponding visual) Regarding claim 2, Zhang and Tandon teach the method of claim 1, wherein the multi-modal image processing model is a visual query answering model. (Zhang [0015] FIG. 9 illustrates a diagram for utilizing an embedding-based search engine to conduct an image search using a multi-modal search input that includes a text query and an image query in accordance with one or more embodiments; [0018] FIGS. 12A-12B each illustrate the search-based editing system utilizing a multi-modal embedding neural network to generate an input embedding for a sketch query in accordance with one or more embodiments;[0034] To provide an example, in some cases, the search-based editing system receives a multi-modal search input that includes multiple visual (e.g., sketch, brush, or image) and/or textual components that provide semantic and layout information to consider when conducting the image search. The search-based editing system further utilizes a multi-modal embedding neural network to generate an input embedding that represents the semantic and layout information from the multi-modal search input. In some cases, the multi-modal embedding neural network determines segment-level semantic and layout information from the multi-modal search input and generates the input embedding based on this segment-level information. [0057] In some cases, the search-based editing system 106 additionally or alternatively utilizes an image search engine 116 to conduct the image search using the search input 202. In one or more embodiments, an image search engine includes a search engine that conducts an image search using search input that includes visual input (e.g., an image query, a sketch query, or a local query, such as a cropped region or a semantic region of a digital image). 
For example, in some cases, an image search engine identifies visual features of a visual input and searches for and retrieves digital images that incorporate one or more of those visual features...uses text input to conduct the image search (e.g., text input provided in connection with visual input, such as text input provided as part of a multi-modal canvas search query). [76-78 & 138-140] elaborate on the matter [FIG.4] shows a visual of the system) Regarding claim 3, Zhang and Tandon teach The method of claim 1, wherein generating the input prompt for the LLM comprises: completing one or more pre-defined strings using the one or more natural language descriptors. (Tandon [0032] In the example of FIG. 1, a recommendation generation system 100 includes a generative system for writing entity recommendations 105 including implicit attribute generation component 150, a prompt generation component 160, and generative machine learning model 108. The generative system for writing entity recommendations 105 interfaces with one or more components of an application software system (such as application software system 230 of FIG. 2) that create, edit, and store entity profiles, network activity data, and related data such as rankings, scores, and labels. For example, in FIG. 1, a profile 102 has been created and stored by an online system, such as a professional social network system or another type of application software system. Profile 102 contains explicit attribute data 104 including descriptors of the skills and capabilities of the user or entity associated with profile 102. These descriptors...[0039] Prompt generation component 160 receives implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122 and creates prompt 114. In some embodiments, prompt generation component 160 generates prompt 114 using user inputs 120 and attribute selections 122. User inputs 120 are inputs received by the application software system from user system 110 in response to a user interaction with recommendation interface 115 and/or user interface 112 of user system 110. For example, as explained in further detail with references to FIGS. 6-12, an interaction with recommendation interface 115 and/or user interface 112 causes user system 110 to send user inputs 120 and attribute selections[0050] In some embodiments, prompt feedback component 168 includes a trained inference machine learning model which is trained on sentence pairs and uses logical rules about language modeling to generate a performance parameter for the entity recommendation suggestion 116. For example, the inference machine learning model is trained to determine whether sentences are redundant and/or contradictory. The inference machine learning model can be, for example, a Multi-Genre Natural Language Inference (MNLI) model or an Adversarial Natural Language Inference (ANLI) model....[FIG.1] shows overall visual [84-90] show wherein generating the input prompt for the LLM comprises: completing one or more pre-defined strings using the one or more natural language descriptors ) Regarding claim 4, Zhang and Tandon teach The method of claim 1, wherein the method further comprises: processing, using one or more unimodal image processing models, the input image to generate one or more query independent properties of the input image, wherein generating the input prompt for the LLM is further based on the one or more one or more query independent properties of the input image. 
(Tandon [FIG.1] shows the overall system which using one or more unimodal image processing models, the input to generate one or more query independent properties of the input, wherein generating the input prompt for the LLM is further based on the one or more one or more query independent properties of the input [0032] In the example of FIG. 1, a recommendation generation system 100 includes a generative system for writing entity recommendations 105 including implicit attribute generation component 150, a prompt generation component 160, and generative machine learning model 108. The generative system for writing entity recommendations 105 interfaces with one or more components of an application software system (such as application software system 230 of FIG. 2) that create, edit, and store entity profiles, network activity data, and related data such as rankings, scores, and labels. For example, in FIG. 1, a profile 102 has been created and stored by an online system, such as a professional social network system or another type of application software system. Profile 102 contains explicit attribute data 104 including descriptors of the skills and capabilities of the user or entity associated with profile 102. These descriptors include, [0042] Prompt generation component 160 creates prompt 114, x, based on the implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122. In some embodiments, prompt generation component 160 creates more than one prompt. As shown in FIG. 4, prompt 114 can include instructions 410, prompt input 420, and examples 440. Although illustrated as including instructions 410, prompt input 420, and examples 440, prompt 114 can include different combinations of one or more of these as well as include further components. Further details about prompt generation component 160 are described with reference to FIG. 4.[0050] In some embodiments, prompt feedback component 168 includes a trained inference machine learning model which is trained on sentence pairs and uses logical rules about language modeling to generate a performance parameter for the entity recommendation suggestion 116. For example, the inference machine learning model is trained to determine whether sentences are redundant and/or contradictory. The inference machine learning model can be, for example, a Multi-Genre Natural Language Inference (MNLI) model or an Adversarial Natural Language Inference (ANLI) model. Prompt feedback component 168 includes the inference machine learning model which uses sentences of entity recommendation suggestion 116 as inputs and determines the performance parameter...[86-92] elaborates on the matter) It is important to note that as shown in the independent claim 1 above primary prior art Zhangestablishes the input image and image processing embodiments (as mapped above and shown in for example figure 4) and secondary prior art Tandon establishing the obviousness addition of generating a prompt and corresponding steps (as mapped above and shown in for example figure 1) Regarding claim 5, Zhang and Tandon teach The method of claim 4, wherein generating the one or more explicit text queries further comprises: processing, using the explication model, the one or more query independent properties of the input image. (Tandon [0032] In the example of FIG. 
1, a recommendation generation system 100 includes a generative system for writing entity recommendations 105 including implicit attribute generation component 150, a prompt generation component 160, and generative machine learning model 108. The generative system for writing entity recommendations 105 interfaces with one or more components of an application software system (such as application software system 230 of FIG. 2) that create, edit, and store entity profiles, network activity data, and related data such as rankings, scores, and labels. For example, in FIG. 1, a profile 102 has been created and stored by an online system, such as a professional social network system or another type of application software system. Profile 102 contains explicit attribute data[0037] Implicit attribute generation component 150 generates implicit attribute suggestions 106 using explicit attribute data 104. In one embodiment, implicit attribute generation component 150 uses explicit attribute data 104 as an input to a machine learning model that outputs attribute suggestions including implicit attribute suggestions 106 and explicit attribute suggestions including classifiers identifying whether an attribute is an implicit attribute (e.g., belongs to implicit attribute suggestions 106) or whether the attribute is an explicit attribute (e.g., belongs to explicit attribute suggestions 107). For example, prompt generation component 160 inputs explicit attribute data 104 including a job title for profile 102 into a machine learning model which determines implicit attribute suggestions 106 based on the job title as well as explicit attribute suggestions based on the job title and the rest of explicit attribute data 104. The machine learning model also classifies the implicit attribute suggestions 106 and explicit attribute suggestions 107 based on whether the attributes [0050] prompt feedback component 168 includes a trained inference machine learning model which is trained on sentence pairs and uses logical rules about language modeling to generate a performance parameter for the entity recommendation suggestion 116. For example, the inference machine learning model is trained to determine whether sentences are redundant and/or contradictory. The inference machine learning model can be, for example, a Multi-Genre Natural Language Inference (MNLI) model or an Adversarial Natural Language Inference (ANLI) model. Prompt feedback component 168 includes the inference machine learning model which uses sentences of entity recommendation suggestion 116 as inputs and determines the performance parameter by labeling pairs of sentences of entity recommendation suggestion 116 as contradictions and/or redundancies. Prompt feedback component 168 determines the performance parameter based on the outputs of the inference machine learning model. [0052] prompt generation component 160 to determine that the performance parameter does not meet a threshold. In some embodiments, prompt feedback component 168 generates training data using feedback 118 and prompt 114 to train a prompt generation machine learning model. For example, prompt feedback component 168 trains a domain specific language model (such as domain specific language model 310 of FIG. 3) using prompts and their associated labels. In some embodiments, implicit attribute generation component 150 and prompt generation component 160 use the prompt generation machine learning model to generate their respective outputs.) 
[86-92] elaborates on the matter) It is important to note that as shown in the independent claim 1 above primary prior art Zhang establishes the input image and image processing embodiments (as mapped above and shown in for example figure 4) and secondary prior art Tandon establishing the obviousness addition of generating a prompt and corresponding steps (as mapped above and shown in for example figure 1) Regarding claim 6, Zhang and Tandon teach The method of claim 4, wherein generating the input prompt for the LLM comprises: completing one or more pre-defined string using the one or more query independent properties of the input image. (Tandon [0032] In the example of FIG. 1, a recommendation generation system 100 includes a generative system for writing entity recommendations 105 including implicit attribute generation component 150, a prompt generation component 160, and generative machine learning model 108. The generative system for writing entity recommendations 105 interfaces with one or more components of an application software system (such as application software system 230 of FIG. 2) that create, edit, and store entity profiles, network activity data, and related data such as rankings, scores, and labels. For example, in FIG. 1, a profile 102 has been created and stored by an online system, such as a professional social network system or another type of application software system. Profile 102 contains explicit attribute data[0037] Implicit attribute generation component 150 generates implicit attribute suggestions 106 using explicit attribute data 104. In one embodiment, implicit attribute generation component 150 uses explicit attribute data 104 as an input to a machine learning model that outputs attribute suggestions including implicit attribute suggestions 106 and explicit attribute suggestions including classifiers identifying whether an attribute is an implicit attribute (e.g., belongs to implicit attribute suggestions 106) or whether the attribute is an explicit attribute (e.g., belongs to explicit attribute suggestions 107). For example, prompt generation component 160 inputs explicit attribute data 104 including a job title for profile 102 into a machine learning model which determines implicit attribute suggestions 106 based on the job title as well as explicit attribute suggestions based on the job title and the rest of explicit attribute data 104. The machine learning model also classifies the implicit attribute suggestions 106 and explicit attribute suggestions 107 based on whether the attributes [0050] prompt feedback component 168 includes a trained inference machine learning model which is trained on sentence pairs and uses logical rules about language modeling to generate a performance parameter for the entity recommendation suggestion 116. For example, the inference machine learning model is trained to determine whether sentences are redundant and/or contradictory. The inference machine learning model can be, for example, a Multi-Genre Natural Language Inference (MNLI) model or an Adversarial Natural Language Inference (ANLI) model. Prompt feedback component 168 includes the inference machine learning model which uses sentences of entity recommendation suggestion 116 as inputs and determines the performance parameter by labeling pairs of sentences of entity recommendation suggestion 116 as contradictions and/or redundancies. Prompt feedback component 168 determines the performance parameter based on the outputs of the inference machine learning model. 
[0052] prompt generation component 160 to determine that the performance parameter does not meet a threshold. In some embodiments, prompt feedback component 168 generates training data using feedback 118 and prompt 114 to train a prompt generation machine learning model. For example, prompt feedback component 168 trains a domain specific language model (such as domain specific language model 310 of FIG. 3) using prompts and their associated labels. In some embodiments, implicit attribute generation component 150 and prompt generation component 160 use the prompt generation machine learning model to generate their respective outputs.) [86-92] elaborates on the matter ) It is important to note that as shown in the independent claim 1 above primary prior art Zhang establishes the input image and image processing embodiments (as mapped above and shown in for example figure 4) and secondary prior art Tandon establishing the obviousness addition of generating a prompt and corresponding steps (as mapped above and shown in for example figure 1) Regarding claim 7, Zhang and Tandon teach The method of claim 4, wherein the one or more unimodal image processing models comprises: an object detection model; an entity recognition model; a captioning model; an optical character recognition model; and/or an image segmentation model. (Tandon [0032] In the example of FIG. 1, a recommendation generation system 100 includes a generative system for writing entity recommendations 105 including implicit attribute generation component 150, a prompt generation component 160, and generative machine learning model 108. The generative system for writing entity recommendations 105 interfaces with one or more components of an application software system (such as application software system 230 of FIG. 2) that create, edit, and store entity profiles, network activity data, and related data such as rankings, scores, and labels. For example, in FIG. 1, a profile 102 has been created and stored by an online system, such as a professional social network system or another type of application software system. Profile 102 contains explicit attribute data[0037] Implicit attribute generation component 150 generates implicit attribute suggestions 106 using explicit attribute data 104. In one embodiment, implicit attribute generation component 150 uses explicit attribute data 104 as an input to a machine learning model that outputs attribute suggestions including implicit attribute suggestions 106 and explicit attribute suggestions including classifiers identifying whether an attribute is an implicit attribute (e.g., belongs to implicit attribute suggestions 106) or whether the attribute is an explicit attribute (e.g., belongs to explicit attribute suggestions 107). For example, prompt generation component 160 inputs explicit attribute data 104 including a job title for profile 102 into a machine learning model which determines implicit attribute suggestions 106 based on the job title as well as explicit attribute suggestions based on the job title and the rest of explicit attribute data 104. The machine learning model also classifies the implicit attribute suggestions 106 and explicit attribute suggestions 107 based on whether the attributes [0050] prompt feedback component 168 includes a trained inference machine learning model which is trained on sentence pairs and uses logical rules about language modeling to generate a performance parameter for the entity recommendation suggestion 116. 
For example, the inference machine learning model is trained to determine whether sentences are redundant and/or contradictory. The inference machine learning model can be, for example, a Multi-Genre Natural Language Inference (MNLI) model or an Adversarial Natural Language Inference (ANLI) model. Prompt feedback component 168 includes the inference machine learning model which uses sentences of entity recommendation suggestion 116 as inputs and determines the performance parameter by labeling pairs of sentences of entity recommendation suggestion 116 as contradictions and/or redundancies. Prompt feedback component 168 determines the performance parameter based on the outputs of the inference machine learning model. [0052] prompt generation component 160 to determine that the performance parameter does not meet a threshold. In some embodiments, prompt feedback component 168 generates training data using feedback 118 and prompt 114 to train a prompt generation machine learning model. For example, prompt feedback component 168 trains a domain specific language model (such as domain specific language model 310 of FIG. 3) using prompts and their associated labels. In some embodiments, implicit attribute generation component 150 and prompt generation component 160 use the prompt generation machine learning model to generate their respective outputs.) [86-92] elaborates on the matter) It is important to note that as shown in the independent claim 1 above primary prior art Zhang establishes the input image and image processing embodiments (as mapped above and shown in for example figure 4) and secondary prior art Tandon establishing the obviousness addition of generating a prompt and corresponding steps (as mapped above and shown in for example figure 1) Regarding claim 8, Zhang and Tandon teach The method of claim 1, wherein the input prompt for the LLM comprises contextual information indicative of contents of the image, wherein the contextual information is based on the one or more natural language descriptors. (Zhang [0015] FIG. 9 illustrates a diagram for utilizing an embedding-based search engine to conduct an image search using a multi-modal search input that includes a text query and an image query in accordance with one or more embodiments; [0018] FIGS. 12A-12B each illustrate the search-based editing system utilizing a multi-modal embedding neural network to generate an input embedding for a sketch query in accordance with one or more embodiments;[0034] To provide an example, in some cases, the search-based editing system receives a multi-modal search input that includes multiple visual (e.g., sketch, brush, or image) and/or textual components that provide semantic and layout information to consider when conducting the image search. The search-based editing system further utilizes a multi-modal embedding neural network to generate an input embedding that represents the semantic and layout information from the multi-modal search input. In some cases, the multi-modal embedding neural network determines segment-level semantic and layout information from the multi-modal search input and generates the input embedding based on this segment-level information. [0057] In some cases, the search-based editing system 106 additionally or alternatively utilizes an image search engine 116 to conduct the image search using the search input 202. 
In one or more embodiments, an image search engine includes a search engine that conducts an image search using search input that includes visual input (e.g., an image query, a sketch query, or a local query, such as a cropped region or a semantic region of a digital image). For example, in some cases, an image search engine identifies visual features of a visual input and searches for and retrieves digital images that incorporate one or more of those visual features...uses text input to conduct the image search (e.g., text input provided in connection with visual input, such as text input provided as part of a multi-modal canvas search query). [76-78 & 138-140] elaborate on the matter [FIG.4] shows a visual of the system) Regarding claim 9, Zhang and Tandon teach The method of claim 1, further comprising: generating, based on the input image, a search request for a search engine; transmitting, to the search engine, the search request; receiving, from the search engine and in response to the search request, a search response, (Zhang [FIG.4] shows generating, based on the input image, a search request for a search engine transmitting, to the search engine, the search request receiving, from the search engine and in response to the search request, a search response, [0015] FIG. 9 illustrates a diagram for utilizing an embedding-based search engine to conduct an image search using a multi-modal search input that includes a text query and an image query in accordance with one or more embodiments; [0027] an image search engine and a text search engine in responses to receiving a search query having textual and visual components. Further, in some implementations, the search-based editing system utilizes the search results to modify one or more attributes-such as color, tone, or texture—of an input digital image, bridging the search and editing processes. [0032] In some cases, the search-based editing system generates input embeddings for a multi-modal search input within a common embedding space. To illustrate, in some implementations, the search-based editing system receives a multi-modal search input, such as a search input having a text query and an image query. The search-based editing system generates, within a common embedding space (e.g., a text-image embedding space) a text embedding for the text query and an image embedding for the image query. The search-based editing system further retrieves digital images to return as the search results using the embeddings within the common embedding space. In some cases, the search-based editing system determines a weighted combination of the various components of the multi-modal search input (e.g., a weighted combination of the text query and the image query) and retrieves the search results using the weighted combination [56-63] elaborate on the matter) wherein generating the input prompt for the LLM is further based on the search response. (Tandon [0039] Prompt generation component 160 receives implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122 and creates prompt 114... [0040] As shown in FIG. 1, prompt generation component 160 can include a prompt feedback component 168. For example, prompt generation component 160 generates an initial prompt using implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122. Prompt generation component 160 uses these prompt inputs and a set of instructions to create prompt 114. 
In some embodiments, prompt generation component 160 generates the set of instructions. For example, prompt generation component 160 generates the set of instructions based on one or more of explicit attribute data 104, implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122. In other embodiments, the set of instructions is prestored and extracted from a data store (such as data store 240 of FIG. 2). In still other embodiments, an initial set of instructions is prestored and extracted from the data store and prompt generation component 160 uses the initial set of instructions to generate the set of instructions used for creating prompt 114. For example, prompt generation component 160 uses the initial set of instructions and one or more of explicit attribute data 104, implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122 to generate the set of instructions used for creating prompt 114. The term set of instructions as used in this disclosure can be a single instruction or multiple instructions. Further details about the set of instructions are discussed with reference to FIG. 4. [0042] Prompt generation component 160 creates prompt 114, x, based on the implicit attribute suggestions 106, explicit attribute suggestions 107, user inputs 120, and attribute selections 122. In some embodiments, prompt generation component 160 creates more than one prompt. As shown in FIG. 4, prompt 114 can include instructions 410, prompt input 420, and examples 440. Although illustrated as including instructions 410, prompt input 420, and examples 440, prompt 114 can include different combinations of one or more of these as well as include further components. Further details about prompt generation component 160 are described with reference to FIG. 4...[FIG.1] shows system which is generating input prompt based on the one or more natural language descriptors and the input text query and/or the one or more explicit text queries) Regarding claim 10, Zhang and Tandon teach The method of claim 9, wherein the search request is based on the one or more natural language descriptors and/or the one or more explicit text queries. (Zhang [0015] FIG. 9 illustrates a diagram for utilizing an embedding-based search engine to conduct an image search using a multi-modal search input that includes a text query and an image query in accordance with one or more embodiments; [0018] FIGS. 12A-12B each illustrate the search-based editing system utilizing a multi-modal embedding neural network to generate an input embedding for a sketch query in accordance with one or more embodiments;[0034] To provide an example, in some cases, the search-based editing system receives a multi-modal search input that includes multiple visual (e.g., sketch, brush, or image) and/or textual components that provide semantic and layout information to consider when conducting the image search. The search-based editing system further utilizes a multi-modal embedding neural network to generate an input embedding that represents the semantic and layout information from the multi-modal search input. In some cases, the multi-modal embedding neural network determines segment-level semantic and layout information from the multi-modal search input and generates the input embedding based on this segment-level information. 
[0057] In some cases, the search-based editing system 106 additionally or alternatively utilizes an image search engine 116 to conduct the image search using the search input 202. In one or more embodiments, an image search engine includes a search engine that conducts an image search using search input that includes visual input (e.g., an image query, a sketch query, or a local query, such as a cropped region or a semantic region of a digital image). For example, in some cases, an image search engine identifies visual features of a visual input and searches for and retrieves digital images that incorporate one or more of those visual features...uses text input to conduct the image search (e.g., text input provided in connection with visual input, such as text input provided as part of a multi-modal canvas search query). [76-78 & 138-140] elaborate on the matter [FIG.4] shows a visual of the system) Regarding claim 11, Zhang and Tandon teach The method of claim 9, wherein the search response comprises one or more text extracts associated with one or more images returned by the search engine in response to the search request. (Zhang [0029] the search-based editing system conducts the image search using search input. In particular, the search-based editing system utilizes one or more search queries to identify and retrieve the digital images included in the search results. The search-based editing system utilizes search queries of various types in different embodiments. For example, in some implementations, the search-based editing system uses a text query, an image query, a sketch query, or a local query (e.g., a cropped region or a semantic region of a digital image) in retrieving the search results. In some instances, the search-based editing system utilizes a multi-modal search input in retrieving the search results. [0041] The search-based editing system operates with improved flexibility when compared to conventional systems. For instance, the search-based editing system flexibly provides features for retrieving a reference image to use in modifying an input digital image. Indeed, by retrieving search results in response to a search input and modifying an input digital image using the search results, the search-based editing system flexibly bridges search and editing. Further, the search-based editing system provides more flexible search engines. For instance, the search-based editing system implements search engines that can retrieve search results in response to multi-modal search inputs, such as those providing spatial or other layout information for the image search (e.g., inputs including multiple brush, sketch, or image/crop components). Further, the search-based editing system provides more flexible control over how components of a multi-modal search input are utilized in conducting an image search. Indeed, as previously indicated, some embodiments of the search-based editing system provide an option for selecting a weight to be used in combining the components of a multi-modal search input, such as a text query and an image query. Thus, the search-based editing system flexibly adapts to interactions with the option, potentially retrieving different search results in response to similar text and image query combinations. [0056] In one or more embodiments, the search-based editing system 106 utilizes a text search engine 114 to conduct the image search using the search input 202. 
In one or more embodiments, a text search engine includes a search engine that conducts an image search using search input that includes text input (e.g., a text query). In particular, in some embodiments, a text search engine includes a search engine that utilizes text input to retrieve image search results. For example, in some cases, a text search engine identifies textual features of a text input and searches for and retrieves digital images that incorporate one or more of those textual features. [224-233] elaborate on the matter [FIG.4] shows overall visual of the system ) Regarding claim 12, Zhang and Tandon teach The method of claim 9, wherein: the search request is an image search request requesting similar images to the input image; and the search response comprises text from one or more resources in which at least one of the images responsive to the image search request are incorporated. (Zhang [0031] the search-based editing system utilizes an embedding-based search engine. For instance, in some cases, the search-based editing system generates one or more input embeddings from the search input and identifies the digital images to return as the search results using the input embedding(s). For example, in some embodiments, the search-based editing system generates the input embedding(s) within an embedding space and identifies digital images for the search results based on distances between embeddings corresponding to the digital images and the input embedding(s) within the embedding space. [0041] Thus, the search-based editing system flexibly adapts to interactions with the option, potentially retrieving different search results in response to similar text and image query combinations. [0143] As further shown in FIG. 9, the search-based editing system 106 generates the image embedding 908 and the text embedding 912 within a text-image embedding space 914. In one or more embodiments, a text-image embedding space includes a common embedding space for input embeddings that correspond to text queries and image queries (i.e., text embeddings and image embeddings, respectively). Accordingly, in some cases, the search-based editing system 106 positions the image embedding 908 and the text embedding 912 within the text-image embedding space 914 based on their respective image and text features. [226] image having a similarity to the text query that is higher than a similarity to the image query based on weighing the text query higher than the image query. Similarly, in some embodiments, determining the weighted combination of the text query and the image query comprises weighing the image query higher than the text query. Accordingly, retrieving the one or more digital images utilizing the weighted combination comprises retrieving at least one digital image having a similarity to the image query that is higher than a similarity to the text query based on weighing the image query higher than the text query. [179-182] elaborate on the matter [FIG.4] shows corresponding visual) Regarding claim 13, Zhang and Tandon teach The method of claim 12, wherein: the search response comprises the one or more resources in which the images responsive to the image search request are incorporated; and the method further comprises extracting the text from the one or more resources in which at least one of the images responsive to the image search request are incorporated. (Zhang [0031] the search-based editing system utilizes an embedding-based search engine. 
For instance, in some cases, the search-based editing system generates one or more input embeddings from the search input and identifies the digital images to return as the search results using the input embedding(s). For example, in some embodiments, the search-based editing system generates the input embedding(s) within an embedding space and identifies digital images for the search results based on distances between embeddings corresponding to the digital images and the input embedding(s) within the embedding space. [0041] Thus, the search-based editing system flexibly adapts to interactions with the option, potentially retrieving different search results in response to similar text and image query combinations. [0143] As further shown in FIG. 9, the search-based editing system 106 generates the image embedding 908 and the text embedding 912 within a text-image embedding space 914. In one or more embodiments, a text-image embedding space includes a common embedding space for input embeddings that correspond to text queries and image queries (i.e., text embeddings and image embeddings, respectively). Accordingly, in some cases, the search-based editing system 106 positions the image embedding 908 and the text embedding 912 within the text-image embedding space 914 based on their respective image and text features. [226] image having a similarity to the text query that is higher than a similarity to the image query based on weighing the text query higher than the image query. Similarly, in some embodiments, determining the weighted combination of the text query and the image query comprises weighing the image query higher than the text query. Accordingly, retrieving the one or more digital images utilizing the weighted combination comprises retrieving at least one digital image having a similarity to the image query that is higher than a similarity to the text query based on weighing the image query higher than the text query. [179-182] elaborate on the matter [FIG.4] shows corresponding visual) Regarding claim 14, Zhang and Tandon teach The method of claim 12, wherein the text from the one or more resources in which at least one of the images responsive to the image search request are incorporated comprises: text of one or more webpages in which at least one of the images responsive to the image search request are incorporated; text of one or more captions of at least one of the images responsive to the image search request; one or more tags of at least one of the images responsive to the image search request; and/or one or more sets of metadata of at least one of the images responsive to the image search request. (Zhang [0027] the search-based editing system utilizes an image search engine in response to receiving a multi-modal canvas search query. In some instances, the search-based editing system utilizes an image search engine and a text search engine in responses to receiving a search query having textual and visual components. Further, in some implementations, the search-based editing system utilizes the search results to modify one or more attributes-such as color, tone, or texture—of an input digital image, bridging the search and editing processes. [0031] the search-based editing system utilizes an embedding-based search engine. For instance, in some cases, the search-based editing system generates one or more input embeddings from the search input and identifies the digital images to return as the search results using the input embedding(s). 
For example, in some embodiments, the search-based editing system generates the input embedding(s) within an embedding space and identifies digital images for the search results based on distances between embeddings corresponding to the digital images and the input embedding(s) within the embedding space. [0041] Thus, the search-based editing system flexibly adapts to interactions with the option, potentially retrieving different search results in response to similar text and image query combinations. [0143] As further shown in FIG. 9, the search-based editing system 106 generates the image embedding 908 and the text embedding 912 within a text-image embedding space 914. In one or more embodiments, a text-image embedding space includes a common embedding space for input embeddings that correspond to text queries and image queries (i.e., text embeddings and image embeddings, respectively). Accordingly, in some cases, the search-based editing system 106 positions the image embedding 908 and the text embedding 912 within the text-image embedding space 914 based on their respective image and text features. [226] image having a similarity to the text query that is higher than a similarity to the image query based on weighing the text query higher than the image query. Similarly, in some embodiments, determining the weighted combination of the text query and the image query comprises weighing the image query higher than the text query. Accordingly, retrieving the one or more digital images utilizing the weighted combination comprises retrieving at least one digital image having a similarity to the image query that is higher than a similarity to the text query based on weighing the image query higher than the text query. [179-182] elaborate on the matter [FIG.4] shows corresponding visual) Regarding claim 16, Zhang and Tandon teach The method of claim 1, wherein the explication model comprises the LLM or a further LLM. (Zhang [0034] To provide an example, in some cases, the search-based editing system receives a multi-modal search input that includes multiple visual (e.g., sketch, brush, or image) and/or textual components that provide semantic and layout information to consider when conducting the image search. The search-based editing system further utilizes a multi-modal embedding neural network to generate an input embedding that represents the semantic and layout information from the multi-modal search input. In some cases, the multi-modal embedding neural network determines segment-level semantic and layout information from the multi-modal search input and generates the input embedding based on this segment-level information. [0080] In one or more embodiments, the search-based editing system 106 utilizes a neural network to implement the editing operation 416d. In one or more embodiments, a neural network includes a type of machine learning model, which can be tuned (e.g., trained) based on inputs to approximate unknown functions used for generating the corresponding outputs. In particular, in some embodiments, a neural network includes a model of interconnected artificial neurons (e.g., organized in layers) that communicate and learn to approximate complex functions and generate outputs based on a plurality of inputs provided to the model. In some instances, a neural network includes one or more machine learning algorithms. [0174] As further shown in FIG. 
12B, the search-based editing system 106 utilizes a transformer neural network 1210 of the multi-modal embedding neural network 1200 to generate a semantic embedding 1212 and a layout embedding 1214 from the segment-level semantic embeddings 1206 and the segment-level layout embeddings 1208. For instance, in some cases, the search-based editing system 106 utilizes the transformer neural network 1210 to generate the semantic embedding 1212 from the segment-level semantic embeddings 1206 and generate the layout embedding 1214 from the segment-level layout embeddings 1208. In one or more embodiments, the search-based editing system 106 utilizes, as the transformer neural network 1210, the vision transformer model described by Alexey Dosovitskiy et al., An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale, ICLR, 2021, arXiv:2010.11929v2, which is incorporated herein by reference in its entirety. In some cases, rather than using the transformer neural network 1210, the search-based editing system 106 utilizes a convolutional neural network to generate the semantic embedding 1212 and the layout embedding 1214. [210-219] elaborates on the matter) Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over US 20240004924 A1; Zhang; Zhifei et al. (hereinafter Zhang) in view of US 20240354376 A1; Tandon; Abhas et al. (hereinafter Tandon) and US 20170324752 A1; Todasco; Michael Charles et al. (hereinafter Todasco). Regarding claim 15, Zhang and Tandon teach The method of claim 1 However the combination lacks explicitly and orderly teaching receiving a conversation history comprising a summary of previous user interactions with the client device, wherein generating the one or more explicit text queries is further based on the conversation history. However Todasco teaches receiving a conversation history comprising a summary of previous user interactions with the client device, wherein generating the one or more explicit text queries is further based on the conversation history. (Todasco [0021] generate another query for the user where the user fails to correctly answer the authentication query. The next query may be generally generated based on the user's user history as before. However, in other embodiments, the next query may instead be based on the user's incorrect response to the previous query, as well as incorrect responses to other previous queries. For example, if the user consistently fails to correctly respond to audio queries based on a song heard earlier that day by the user, the service provider may choose not to utilize audio based queries, or may utilize such queries more infrequently. Similarly, if the user answers image based queries easily and quickly, the service provider may favor image based queries [0038] Service provider server 140 may be maintained, for example, by an online service provider, which may provide authentication services for the user associated with communication device 110, as well as other entities where the other entities are requesting increased authentication security using one or more processes of service provider server 140. In this regard, service provider server 140 includes one or more processing applications which may be configured to interact with communication device 110, user history source 130, and/or another device/server to facilitate authenticating a user through an authentication query generated by service provider server 140 using a user history for the user. 
In one example, service provider server 140 may be provided by PAYPAL®, Inc. of San Jose, Calif., USA. However, in other embodiments, service provider server 140 may be maintained by or include a financial service provider, social networking service, email or messaging service, media sharing service, and/or other service provider, which may provide authentication services, for example, for the use of a provider account.[FIG.1] shows corresponding visual) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to take all prior methods and make the addition of Todasco's query creation methods in order to improve the system output and security (Todasco [0013] In order to provide increased security for an authentication attempt (e.g., to authenticate the user for use of an account, such as a login attempt, or to validate the identity of the user), the service provider may generate an authentication query for the user based on historical events that the user experienced, observed, or otherwise knows. For example, the service provider may receive, access, or determine a user history for the user, which may include real-life and/or real-world events experienced by or otherwise associated with the user. In this regard, during a time frame tracked in the user's history (e.g., previous hour, day, week, month, etc.) the user may perform various real-life or real-world actions, for example, movement between locations, observation and/or interaction with events, co-locating with other users, or other types of physical actions by the user. [0038] Service provider server 140 may be maintained, for example, by an online service provider, which may provide authentication services for the user associated with communication device 110, as well as other entities where the other entities are requesting increased authentication security using one or more processes of service provider server 140. In this regard, service provider server 140 includes one or more processing applications which may be configured to interact with communication device 110, user history source 130, and/or another device/server to facilitate authenticating a user through an authentication query generated by service provider server 140 using a user history for the user. [FIG.1] shows corresponding visual) Claims 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 20240004924 A1; Zhang; Zhifei et al. (hereinafter Zhang) in view of US 20240256841 A1; ABRAMS; Bradley Moore et al. (hereinafter Abrams). Regarding claim 17, Zhang teaches A method implemented by one or more processors, the method comprising: receiving an input query associated with a client device, the input query comprising an input image; generating, based on the input image, an image search request for a search engine; (Zhang [0015] FIG. 9 illustrates a diagram for utilizing an embedding-based search engine to conduct an image search using a multi-modal search input that includes a text query and an image query in accordance with one or more embodiments; [0027] an image search engine and a text search engine in responses to receiving a search query having textual and visual components. Further, in some implementations, the search-based editing system utilizes the search results to modify one or more attributes-such as color, tone, or texture—of an input digital image, bridging the search and editing processes. 
[0032] In some cases, the search-based editing system generates input embeddings for a multi-modal search input within a common embedding space. To illustrate, in some implementations, the search-based editing system receives a multi-modal search input, such as a search input having a text query and an image query. The search-based editing system generates, within a common embedding space (e.g., a text-image embedding space) a text embedding for the text query and an image embedding for the image query. The search-based editing system further retrieves digital images to return as the search results using the embeddings within the common embedding space. In some cases, the search-based editing system determines a weighted combination of the various components of the multi-modal search input (e.g., a weighted combination of the text query and the image query) and retrieves the search results using the weighted combination [56-63] elaborate on the matter [FIG.4] shows receiving an input query associated with a client device, the input query comprising an input image and generating, based on the input image, an image search request for a search engine ) transmitting, to the search engine, the image search request; receiving, from the search engine and in response to the image search request, a search response (Zhang [FIG.4] transmitting, to the search engine, the image search request [0015] FIG. 9 illustrates a diagram for utilizing an embedding-based search engine to conduct an image search using a multi-modal search input that includes a text query and an image query in accordance with one or more embodiments; [0027] an image search engine and a text search engine in responses to receiving a search query having textual and visual components. Further, in some implementations, the search-based editing system utilizes the search results to modify one or more attributes-such as color, tone, or texture—of an input digital image, bridging the search and editing processes. [0032] In some cases, the search-based editing system generates input embeddings for a multi-modal search input within a common embedding space. To illustrate, in some implementations, the search-based editing system receives a multi-modal search input, such as a search input having a text query and an image query. The search-based editing system generates, within a common embedding space (e.g., a text-image embedding space) a text embedding for the text query and an image embedding for the image query. The search-based editing system further retrieves digital images to return as the search results using the embeddings within the common embedding space. In some cases, the search-based editing system determines a weighted combination of the various components of the multi-modal search input (e.g., a weighted combination of the text query and the image query) and retrieves the search results using the weighted combination [56-63] elaborate on the matter) containing that at least one of one or more of the images responsive to the image search request, including extracting a first text extract, of the text extracts, (Zhang [0016] FIGS. 10A-10E each illustrate image search results retrieved in response to weighted combinations of a text query and an image query in accordance with one or more embodiments; [0032] determines a weighted combination of the various components of the multi-modal search input (e.g., a weighted combination of the text query and the image query) and retrieves the search results using the weighted combination. 
[0056] a text search engine identifies textual features of a text input and searches for and retrieves digital images that incorporate one or more of those textual features. As will be discussed in more detail below, in some cases, a text search engine conducts the image search using embeddings (e.g., an embedding representing the text input and/or embeddings representing the digital images that are searched [0077] As shown in FIG. 4, the search-based editing system 106 determines the search modal to use for conducting the image search based on the search input received. For instance, in response to receiving one or more of the search inputs 406a-406c, the search-based editing system 106 determines to use the search modal 412a (i.e., a textual-visual search modal). [224-233] elaborate on the matter [FIG.4] shows overall visual of the system ) generating … using the LLM, a response to the input query and causing the response to the input query to be rendered at a client device. (Zhang [FIG.4] shows using the LLM, a response to the input query and causing the response to the input query to be rendered at a client device. [0098] As also illustrated in FIG. 5B, embodiments of the neural network appearance encoder 514 include five layers. For example, the neural network appearance encoder 514 includes a convolutional ConvBlock layer with a 7×7 kernel, in addition to four ConvBlock layers with 3×3 kernels, each with their own respective resolutions for input and output.[0107] As shown in FIG. 6, the object detection machine learning model 608 includes lower neural network layers and higher neural network layers. In general, the lower neural network layers collectively form the encoder 602 and the higher neural network layers collectively form the detection heads 604 (e.g., decoder). In one or more embodiments, the encoder 602 includes convolutional layers that encodes digital images into feature vectors, which are outputted from the encoder 602 and provided as input to the detection heads 604. In various implementations, the detection heads 604 comprise fully connected layers that analyze the feature vectors and output the detected objects (potentially with approximate boundaries around the objects [0129] FIG. 7B illustrates the search-based editing system 106 performing a WCT color transfer operation using a neural network in accordance with one or more embodiments. As shown in FIG. 7B, the search-based editing system 106 provides a content image 722 (e.g., an input digital image) and a style image 724 (e.g., a reference image) to a neural network 720 to generate an output 726 (e.g., a modified digital image). As indicated in FIG. 7B, in one or more embodiments, the neural network 720 includes an encoder 728. In particular, as shown, in one or more embodiments, the neural network 720 includes a visual geometry group (VGG) neural network, such as a VGG-19 network, as the encoder 728. As further shown in FIG. 7B, the neural network 720 includes a decoder 730. 
In one or more embodiments, the decoder 730 includes a symmetric decoder that inverts the features of the encoder 728 (e.g., the VGG-19 features) and output the modified digital image.[FIG.1] shows the corresponding visual) Zhang lacks explicitly and orderly teaching comprising one or more web resources containing at least one of one or more images responsive to the image search request; extracting one or more text extracts from the one or more web resources; from a subset of a first resource of the web resources and extracting a second text extract, of the text extracts, from a subset of a second resource of the web resources; generating, based on the one or more text extracts, an input prompt for a large language model, LLM and corresponding methods based around the generated input prompts one or more web resources containing at least one of one or more images responsive to the image search request; extracting one or more text extracts from the one or more web resources; (Abrams [0043] The search engine 110 can obtain information about “public” pages, where public pages are those represented in a search engine index. When indexing a webpage, for example, the search engine 110 can identify topics discussed in the webpage, entities referenced in the web page, sentiment of text in the web page (e.g., positive, negative, neutral), and so forth. At 202, public page information is provided by the search engine 110 to the interface module 113. While the search engine 110 is depicted in FIG. 2 as providing the public page information prior to the web browser 136 loading a webpage, it is understood that the interface module 113 can obtain information for a webpage in response to receiving an indication that the webpage has been loaded by the web browser 136. [0044] At 204, the web browser 136 requests the webpage 144 from the web server 142, and at 206 the web browser 136 obtains the webpage 144 from the web server 142. At 208, the web browser 136 transmits an information request to the interface module 113, where the information request is for the public page information for the webpage...[0045] the query and identifies search results, where the search results can include webpages identified by the search engine 110 as being relevant to the query, an instant answer, a knowledge card, and so forth. At 220, the search engine 110 provides the interface module 113 with at least some of the search results identified by the search engine 110. At 222, the interface module 113 provides the generative model 112 with additional context, where the additional context includes at least some of the search results identified by the search engine 110 (formatted in a manner that can be consumed by the generative model 112) [0048] Referring now to FIG. 3, a schematic that depicts a GUI 300 of the web browser 136 is shown. In the example depicted in FIG. 3, the web browser 136 has retrieved a webpage 144 that includes textual content...[0051-52] elaborates on the matter) from a subset of a first resource of the web resources and extracting a second text extract, of the text extracts, from a subset of a second resource of the web resources (Abrams [0025] The technologies described herein relate to integrating a generative model with an application (such as a web browser) and/or an operating system (OS). In an example, a client computing device executes a web browser, where the web browser receives user input to retrieve a webpage. The web browser is configured to communicate with a generative model (such as a GLM). 
For example, upon receipt of an indication that the user desires to interact with the generative model, a side panel is presented by the web browser. The side panel can overlay a portion of the webpage being displayed by the web browser. In another example, content of the webpage is resized to accommodate the screen real estate consumed by the side panel. When the webpage is a public page (e.g., indexed by a search engine), the web browser can cause content of the webpage to be provided to the generative model. In addition, the web browser can cause other information pertaining to the web browser to be provided to the generative model, such as uniform resource locators (URLs) of webpages loaded in tabs of the browser, titles of such webpages, times when the webpages were accessed, and so forth. [0031] The computing system 101 includes a processor 106 and memory 108, where the memory 108 includes instructions that are executed by the processor 106. More specifically, the memory 108 includes a search engine 110, a generative model 112, and an interface module 113 that, as will be described in greater detail below, acts as an interface between an application executing on the client computing device 102 (and/or an operating system of the client computing device 102), the search engine 110, and the generative model 112. Operations of the search engine 110, the generative model 112, and the interface module 113 are described in greater detail below. The computing system 106 also includes data stores 114-122, where the data stores 114-122 store data that is accessed by the search engine 110 and/or the generative model 112. With more particularity, the data stores 114-122 include a web index data store 114, an instant answers data store 116, a knowledge graph data store 118, a supplemental content data store 120, and a dialog history data store 122. The web index data store 114 includes a web index that indexes webpages by keywords included in or associated with the webpages. The instant answers data store 116 includes an index of instant answers that are indexed by queries, query terms, and/or terms that are semantically similar or equivalent to the queries and/or query terms. For example, the instant answer “2.16 meters” can be indexed by the query “height of Shaquille O'Neal” (and queries that are semantically similar or equivalent, such as “how tall is Shaquille O'Neal” [43-51] further elaborates [FIG.1] shows overall visual) generating, based on the text extracts, including the first text extract and the second text extract, an input prompt for a large language model, LLM; and corresponding methods based around the generated input prompt; (Abrams [0053] The interface module 113 constructs a prompt at 518 based upon the information received by the interface module 113 at 514 and 516. The generative model 112 generates model output based upon the prompt received at 518. The model output can be conversational output, a summary of content shown on the webpage 144, and so forth. The generative model 112 transmits the model output to the interface module 113 at 520, and at 522 the interface module 113 transmits the model output to the web browser 136, whereupon the web browser 136 presents the model output together with content of the webpage 144. [0054] It is noted that the search engine 110 is not represented in the communications diagram 500. 
In an example, however, the generative model 112 can generate a query based upon the prompt received at 518 and can cause such query to be provided to the search engine 110 (e.g., by way of the interface module 113). Accordingly, the prompt used by the generative model 112 to generate the model output can include information identified by the search engine 110. Moreover, as indicated previously, at least a portion of the generative model 112 may be included in the web browser 136.[0061] constructs a prompt for provision to the generative model 112 based upon such input. The generative model 112 generates model output based upon the prompt and provides the model output to the operating system 802 of the client computing device 102 by way of the interface module 113. Additionally, as described above, the generative model 112 can generate queries based upon input received from the interface module 113 and can cause such queries to be presented to the search engine 110, which conducts searches and provides at least a portion of search results identified based upon the searches to the generative model 112. The generative model 112 can generate output based upon these identified search results [74-77] go into further detail [FIG.1] shows the corresponding visual) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to take all prior methods and make the addition of Zhang in order to create a more accurate output via unique generative model methods (Abrams [0003] In addition, GLMs are not well-suited to provide accurate answers to user input that pertains to recent information. For instance, a GLM, upon receipt of input “what is the current weather in Chicago,” is unable to provide accurate output, as the GLM generates textual output based upon training data which tends to be at least somewhat stale (i.e., it is impractical to retrain a GLM every minute with updated information about weather, sporting events, stock markets, news events, and so forth).[0029] Referring now to FIG. 1, a functional block diagram of a computing environment 100 is illustrated. The computing environment 100 includes a computing system 101. While illustrated as a single system, it is to be understood that the computing system 101 can include several different server computing devices, can be distributed across data centers, etc. The computing system 101 is configured to facilitate interaction between a user and a generative model by way of a web browser (or other suitable application). [0040] Operation of the generative model 112 is improved due to the generative model 112 generating output based upon a prompt that includes content from the webpage 144 (or information about the webpage 144 determined by the search engine 110). In contrast, conventionally, generative models generate output based solely upon user input and dialog history. There are numerous use cases where the generative model 112 provides functionality that conventional generative models are unable to provide, and several of such use cases are set forth below [0056] the generative model 112 can generate output that accurately addresses the input (identifying the holidays of the user in the month of May). 
Conventionally, a generative model is unable to appropriately respond to such input, as the generative model does not have access to the information requested by the user [0059] Recent history of all main page content across multiple browser tabs can be accessible to the generative model 112 to allow very richly grounded conversations over time. Reduced representations may be stored to save memory/bandwidth and control prompt size. History can be compressed further by models that embed the data into vector representations. More contextual weight can be given to main page content that is currently scrolled into view and or where the user has spent more time viewing. [FIG.1] shows the corresponding visual) Regarding claim 18, Zhang and Abrams teach The method of claim 17, wherein the text extracts from the one or more web resources in which one or more of the images responsive to the image search request are incorporated: text of one or more webpages in which at least one of the images responsive to the image search request are incorporated; text of one or more captions of at least one of the images responsive to the image search request; one or more tags of at least one of the images responsive to the image search request; and/or one or more sets of metadata at least one of the images responsive to the image search request. (Abrams [0025] a web browser, where the web browser receives user input to retrieve a webpage. The web browser is configured to communicate with a generative model (such as a GLM). For example, upon receipt of an indication that the user desires to interact with the generative model, a side panel is presented by the web browser. The side panel can overlay a portion of the webpage being displayed by the web browser. In another example, content of the webpage is resized to accommodate the screen real estate consumed by the side panel. When the webpage is a public page (e.g., indexed by a search engine), the web browser can cause content of the webpage to be provided to the generative model. In addition, the web browser can cause other information pertaining to the web browser to be provided to the generative model, such as uniform resource locators (URLs) of webpages loaded in tabs of the browser, titles of such webpages, times when the webpages were accessed, and so forth. [0027] text in fields of the webpage, summarizing the text, and so forth. Further, the generative model can reason over content of the webpage (public or private) and provide output. Therefore, if a webpage includes content about statistics of Babe Ruth, and with respect to the conversational input “how many home runs did Babe Ruth hit before turning 30”, the generative model can reason over content of the webpage and generate output based upon content of such web page—e.g., “Babe Ruth hit 284 home runs before turning 30”. [0031] the search engine 110 and/or the generative model 112. With more particularity, the data stores 114-122 include a web index data store 114, an instant answers data store 116, a knowledge graph data store 118, a supplemental content data store 120, and a dialog history data store 122. The web index data store 114 includes a web index that indexes webpages by keywords included in or associated with the webpages. The instant answers data store 116 includes an index of instant answers that are indexed by queries, query terms, and/or terms that are semantically similar or equivalent to the queries and/or query terms. 
For example, the instant answer “2.16 meters” can be indexed by the query “height of Shaquille O'Neal” (and queries that are semantically similar or equivalent, such as “how tall is Shaquille O'Neal” [0059] a webpage, is provided with the HTML of the webpage and/or a rendered image of the webpage. The generative model 112 can be provided with the main page body as clean text. In another example, the generative model 112 is provided with information selected by the user (e.g., when the user highlights a portion of a webpage). In another example, entity extraction is undertaken on the page and the generative model 112 is provided with named entities. In connection with performing such tasks, the HTML (or image) can be converted to text and/or other models can be applied, such as object character recognition, object classifiers, image embedding models [FIG.1] shows corresponding visual) Regarding claim 19, Zhang and Tandon teach The method of claim 17, wherein: the input query further comprises an input text query; and the search request and/or the input prompt is further based on the input text query. (Zhang [FIG.4] shows using the LLM, a response to the input query and causing the response to the input query to be rendered at a client device. [0098] As also illustrated in FIG. 5B, embodiments of the neural network appearance encoder 514 include five layers. For example, the neural network appearance encoder 514 includes a convolutional ConvBlock layer with a 7×7 kernel, in addition to four ConvBlock layers with 3×3 kernels, each with their own respective resolutions for input and output.[0107] As shown in FIG. 6, the object detection machine learning model 608 includes lower neural network layers and higher neural network layers. In general, the lower neural network layers collectively form the encoder 602 and the higher neural network layers collectively form the detection heads 604 (e.g., decoder). In one or more embodiments, the encoder 602 includes convolutional layers that encodes digital images into feature vectors, which are outputted from the encoder 602 and provided as input to the detection heads 604. In various implementations, the detection heads 604 comprise fully connected layers that analyze the feature vectors and output the detected objects (potentially with approximate boundaries around the objects [0129] FIG. 7B illustrates the search-based editing system 106 performing a WCT color transfer operation using a neural network in accordance with one or more embodiments. As shown in FIG. 7B, the search-based editing system 106 provides a content image 722 (e.g., an input digital image) and a style image 724 (e.g., a reference image) to a neural network 720 to generate an output 726 (e.g., a modified digital image). As indicated in FIG. 7B, in one or more embodiments, the neural network 720 includes an encoder 728. In particular, as shown, in one or more embodiments, the neural network 720 includes a visual geometry group (VGG) neural network, such as a VGG-19 network, as the encoder 728. As further shown in FIG. 7B, the neural network 720 includes a decoder 730. 
In one or more embodiments, the decoder 730 includes a symmetric decoder that inverts the features of the encoder 728 (e.g., the VGG-19 features) and output the modified digital image.[FIG.1] shows the corresponding visual) Regarding claim 20, Zhang and Abrams teach The method of claim 17, wherein the method further comprises: processing, using one or more unimodal image processing models, the input image to generate one or more query independent properties of the input image, wherein generating the image search request for a search engine and/or generating the input prompt for the LLM is further based on the one or more one or more query independent properties of the input image. (Abrams [0053] The interface module 113 constructs a prompt at 518 based upon the information received by the interface module 113 at 514 and 516. The generative model 112 generates model output based upon the prompt received at 518. The model output can be conversational output, a summary of content shown on the webpage 144, and so forth. The generative model 112 transmits the model output to the interface module 113 at 520, and at 522 the interface module 113 transmits the model output to the web browser 136, whereupon the web browser 136 presents the model output together with content of the webpage 144. [0054] It is noted that the search engine 110 is not represented in the communications diagram 500. In an example, however, the generative model 112 can generate a query based upon the prompt received at 518 and can cause such query to be provided to the search engine 110 (e.g., by way of the interface module 113). Accordingly, the prompt used by the generative model 112 to generate the model output can include information identified by the search engine 110. Moreover, as indicated previously, at least a portion of the generative model 112 may be included in the web browser 136.[0061] constructs a prompt for provision to the generative model 112 based upon such input. The generative model 112 generates model output based upon the prompt and provides the model output to the operating system 802 of the client computing device 102 by way of the interface module 113. Additionally, as described above, the generative model 112 can generate queries based upon input received from the interface module 113 and can cause such queries to be presented to the search engine 110, which conducts searches and provides at least a portion of search results identified based upon the searches to the generative model 112. The generative model 112 can generate output based upon these identified search results.[74-77] go into further detail [FIG.1] shows the corresponding visual) It is important to note that as shown in the independent claim 17 above primary prior art Zhang establishes the input image and image processing embodiments (as mapped above and shown in for example figure 4) and secondary prior art Abrams establishing the obviousness addition of generating a prompt and corresponding steps (as mapped above and shown in for example figure 1) Regarding claim 21, Zhang and Abrams teach The method of claim 20, wherein the one or more unimodal image processing models comprises: an object detection model; an entity recognition model; a captioning model; an optical character recognition model; and/or an image segmentation model. (Zhang [0049] To provide an example implementation, in some embodiments, the search-based editing system 106 on the server(s) 102 supports the search-based editing system 106 on the client device 110n. 
For instance, in some cases, the search-based editing system 106 on the server(s) 102 learns parameters for a text search engine 114, an image search engine 116, and/or one or more models for modifying digital images. The search-based editing system 106 then, via the server(s) 102, provides the text search engine 114, the image search engine 116, and/or the one or more models for modifying digital images to the client device 110n. In other words, the client device 110n obtains (e.g., downloads) text search engine 114, the image search engine 116, and/or the one or more models for modifying digital images with the learned parameters from the server(s) 102. Once downloaded, the search-based editing system 106 on the client device 110n utilizes the text search engine 114 and/or the image search engine 116 to search for digital images independent from the server(s) 102. Further, the search-based editing system 106 on the client device 110n utilizes the one or more models for modifying digital images [0080] the search-based editing system 106 utilizes a neural network to implement the editing operation 416d. In one or more embodiments, a neural network includes a type of machine learning model, which can be tuned (e.g., trained) based on inputs to approximate unknown functions used for generating the corresponding outputs. In particular, in some embodiments, a neural network includes a model of interconnected artificial neurons (e.g., organized in layers) that communicate and learn to approximate complex functions and generate outputs based on a plurality of inputs provided to the model. [0102] Although FIG. 6 illustrates the search-based editing system 106 utilizing the detection-masking neural network 600, in one or more implementations, the search-based editing system 106 utilizes different machine learning models to detect and/or generate segmentation masks for objects. For instance, in one or more implementations, the search-based editing system 106 utilizes, as the object detection machine learning model, one of the machine learning models or neural networks described in U.S. patent application Ser. No. 17/158,527, entitled “Segmenting Objects In Digital Images Utilizing A Multi-Object Segmentation Model Framework,” filed on Jan. 26, 2021; or U.S. patent application Ser. No. 16/388,115, entitled “Robust Training of Large-Scale Object Detectors with Noisy Data,” filed on Apr. 8, 2019; or U.S. patent application Ser. No. 16/518,880, entitled “Utilizing Multiple Object Segmentation Models ... [0105] the detection-masking neural network 600 includes the object detection machine learning model 608 and the object segmentation machine learning model 610. In one or more implementations, the object detection machine learning model 608 includes both the encoder 602 and the detection heads 604 shown in FIG. 6. While the object segmentation machine learning model 610 includes both the encoder 602 and the masking head 606. Furthermore, the object detection machine learning model) Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. 
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ARYAN D TOUGHIRY whose telephone number is (571)272-5212. The examiner can normally be reached Monday - Friday, 9 am - 5 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Aleksandr Kerzhner, can be reached at (571) 270-1760. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ARYAN D TOUGHIRY/
Examiner, Art Unit 2165

/ALEKSANDR KERZHNER/
Supervisory Patent Examiner, Art Unit 2165
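Editor's note on the cited technology: the rejection leans heavily on Zhang's embedding-based search, in which a text query and an image query are embedded into a common text-image embedding space, combined according to a user-selectable weight, and candidate images are ranked by similarity to the combined embedding (the passages cited from Zhang [0031]-[0032], [0041], and [0143]). The sketch below is only an editorial illustration of that retrieval idea for readers mapping the citations to the claims; the toy encoders, dimensions, function names, and weighting are placeholders we supply and are not taken from Zhang's disclosure, the application, or any Google implementation.

```python
# Editorial sketch: weighted multi-modal (text + image) retrieval in a shared
# embedding space. All encoders and names below are hypothetical placeholders.
import hashlib
import numpy as np

DIM = 64  # assumed embedding width for this toy example


def _toy_embed(token_bytes: bytes) -> np.ndarray:
    """Deterministic stand-in for a real text/image encoder."""
    seed = int.from_bytes(hashlib.sha256(token_bytes).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    v = rng.normal(size=DIM)
    return v / np.linalg.norm(v)


def embed_text(query: str) -> np.ndarray:
    return _toy_embed(query.encode("utf-8"))


def embed_image(image_bytes: bytes) -> np.ndarray:
    return _toy_embed(image_bytes)


def combined_query(text_emb: np.ndarray, image_emb: np.ndarray, text_weight: float) -> np.ndarray:
    """Weighted combination of the text and image query embeddings
    (the selectable weight the office action cites from Zhang)."""
    q = text_weight * text_emb + (1.0 - text_weight) * image_emb
    return q / np.linalg.norm(q)


def rank_images(query_emb: np.ndarray, index: dict[str, np.ndarray], top_k: int = 3) -> list[str]:
    """Rank indexed image embeddings by cosine similarity to the combined query."""
    scored = sorted(index.items(), key=lambda kv: float(query_emb @ kv[1]), reverse=True)
    return [image_id for image_id, _ in scored[:top_k]]


if __name__ == "__main__":
    index = {f"img_{i}": _toy_embed(str(i).encode()) for i in range(10)}
    q = combined_query(embed_text("red brick house"), embed_image(b"<image bytes>"), text_weight=0.7)
    print(rank_images(q, index))
```

Weighting the text query higher retrieves images closer to the text embedding, and vice versa, which is the behavior the examiner reads onto the weighted-combination limitations.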

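For claim 17, the examiner combines Zhang's image search with Abrams' practice of extracting page content and folding it into a prompt for a generative model. As a reading aid only, the sketch below shows that pipeline shape: an input image is submitted to an image search, text extracts are pulled from the web resources that host the returned images, and the extracts are assembled into an LLM prompt whose output is returned to the client. The search client, extraction rules, and LLM call here are stand-ins invented for illustration; they are not the applicant's claimed method and not either reference's actual API.

```python
# Editorial sketch of the claim 17 pipeline shape: image search, text extraction
# from the hosting web resources, and LLM prompt construction. Every client and
# helper below is a hypothetical stand-in, not an actual API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImageResult:
    image_url: str
    page_url: str          # web resource in which the returned image is incorporated
    caption: str = ""
    alt_text: str = ""


def extract_text(result: ImageResult, fetch_page: Callable[[str], str], max_chars: int = 400) -> str:
    """Pull a short text extract from the resource hosting a returned image
    (caption, alt text, and a slice of the page body in this toy version)."""
    body = fetch_page(result.page_url)
    pieces = [p for p in (result.caption, result.alt_text, body[:max_chars]) if p]
    return " ".join(pieces)


def build_prompt(user_query: str, extracts: list[str]) -> str:
    """Assemble an LLM input prompt from the user's query plus per-resource extracts."""
    numbered = "\n".join(f"[{i + 1}] {text}" for i, text in enumerate(extracts))
    return (
        "Answer the user's question about their image using only the context below.\n"
        f"Context:\n{numbered}\n\nQuestion: {user_query}\nAnswer:"
    )


def answer_image_query(
    user_query: str,
    image_bytes: bytes,
    image_search: Callable[[bytes], list[ImageResult]],
    fetch_page: Callable[[str], str],
    llm: Callable[[str], str],
) -> str:
    results = image_search(image_bytes)                            # image search request/response
    extracts = [extract_text(r, fetch_page) for r in results[:2]]  # first and second resources
    return llm(build_prompt(user_query, extracts))                 # response rendered to the client


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end without any external service.
    fake_results = [
        ImageResult("https://example.com/a.jpg", "https://example.com/a", caption="A red brick house"),
        ImageResult("https://example.com/b.jpg", "https://example.com/b", alt_text="Victorian brick facade"),
    ]
    print(
        answer_image_query(
            "what style is this house?",
            b"<image bytes>",
            image_search=lambda _: fake_results,
            fetch_page=lambda url: f"Placeholder page text for {url}.",
            llm=lambda prompt: f"(model output for a {len(prompt)}-character prompt)",
        )
    )
```

Amendments that tie the claimed extraction or prompt construction to specific parameters (for example, how the explication model conditions the extracts) are the kind of detail the examiner's interview guidance suggests would distinguish over this combination.
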
Prosecution Timeline

Aug 13, 2024
Application Filed
Jul 14, 2025
Non-Final Rejection — §103
Oct 31, 2025
Response Filed
Dec 15, 2025
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602374
DATA ACQUISITION METHOD AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIUM
2y 5m to grant · Granted Apr 14, 2026
Patent 12596596
USER-SPACE PARALLEL ACCESS CHANNEL FOR TRADITIONAL FILESYSTEM USING CAPI TECHNOLOGY
2y 5m to grant · Granted Apr 07, 2026
Patent 12579141
GENERATING QUERY ANSWERS FROM A USER'S HISTORY
2y 5m to grant · Granted Mar 17, 2026
Patent 12572390
SYSTEMS AND METHODS FOR ADAPTIVE WEIGHTING OF MACHINE LEARNING MODELS
2y 5m to grant · Granted Mar 10, 2026
Patent 12573292
VEHICLE IDENTIFICATION USING ADVANCED DRIVER ASSISTANCE SYSTEMS (ADAS)
2y 5m to grant · Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
68%
Grant Probability
88%
With Interview (+19.9%)
3y 1m
Median Time to Grant
Moderate
PTA Risk
Based on 189 resolved cases by this examiner. Grant probability derived from career allow rate.
