Detailed Action
This communication is in response to the Application filed on 7/15/2024.
Claims 1-20 are pending and have been examined.
Claims 1-20 are rejected.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 7/15/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding independent claim 1, the claim recites,
“1. A method for media processing, comprising:
receiving a text prompt including an entity phrase; [This relates to a human receiving a text prompt using vision or auditory processes.]
marking the entity phrase within the text prompt to obtain a revised prompt; [This relates to a human identifying and marking the phrase using pen and paper.]
generating, using a language generation model, a replacement phrase by performing autoregressive token generation based on a sequence of tokens from the revised prompt wherein the replacement phrase comprises a variant of the entity phrase;
[This relates to a human generating a replacement phrase in the human mind or using pen and paper.]
and generating an augmented prompt that includes the replacement phrase. [This relates to a human generating an augmented prompt in the human mind or using pen and paper.]
The claim does not include additional limitations that integrate the abstract idea into a practical application or cause the claim as a whole to amount to significantly more than the underlying abstract idea.
Regarding independent claim 12, the claim recites,
“A method of training a machine learning model, the method comprising:
obtaining a training set including a training text prompt and a training replacement phrase, [This relates to a human obtaining a training set using visual and auditory processes.] wherein the training text prompt includes a training entity phrase surrounded by a first tag and a second tag, and the training replacement phrase comprises a ground-truth variant of the training entity phrase; [This relates to a human performing training using visual and auditory processes.]
and training, using the training set, a language generation model to generate a replacement phrase based on a text prompt, wherein the replacement phrase comprises a variant of an entity phrase in the text prompt. [This relates to a human training using a training set using pen and paper or auditory processes.]
Regarding independent claim 16, the claim recites,
“A system for media processing, comprising: at least one memory; at least one processor executing instructions stored in the at least one memory; an entity marking model comprising entity marking parameters stored in the at least one memory,
the entity marking model trained to mark the entity phrase within a text prompt to obtain a revised prompt; and [This relates to a human marking a phrase using pen and paper.]
a language generation model comprising text generation parameters stored in the at least one memory, the language generation model trained to generate a replacement phrase based on the revised prompt, wherein the replacement phrase comprises a variant of the entity phrase. [This relates to a human generating a replacement phrase using pen and paper.]
This judicial exception is not integrated into a practical application. In particular, claim 16 recites the additional elements of a “memory” and a “processor.” For example, paragraph [0153] of the as-filed specification describes that memory is used to store computer-readable, computer-executable software including instructions that, when executed, cause at least one processor of processor unit 1305 to perform various functions described herein. Paragraph [0152] describes that, in some cases, processor unit 1305 is configured to operate a memory array using a memory controller; in other cases, a memory controller is integrated into processor unit 1305; and in some cases, processor unit 1305 is configured to execute computer-readable instructions stored in memory unit 1310. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of a processor and a memory amount to no more than a generic computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the additional limitations in the claims noted above are directed toward insignificant extra-solution activity. The claims are not patent eligible.
Dependent claim 2 recites,
“2. The method of claim 1, further comprising:
identifying, using a natural language processing model, the entity phrase from the text prompt. [This relates to a human identifying an entity phrase in the human mind.] No additional limitations present.
Dependent claim 3 recites,
“3. The method of claim 1, further comprising:
generating a plurality of replacement phrases including the replacement phrase; and receiving a user input selecting the replacement phrase from among the plurality of replacement phrases, wherein the augmented prompt is generated based on the user input. [This relates to a human generating a plurality of replacement phrases using pen and paper.] No additional limitations present.
Dependent claim 4 recites,
“4. The method of claim 1, further comprising:
identifying an additional entity phrase in the text prompt; and generating an additional replacement phrase for the additional entity phrase, wherein the augmented prompt includes the additional replacement phrase. [This relates to a human identifying an additional entity phrase in the human mind.] No additional limitations present.
Dependent claim 5 recites,
“5. The method of claim 4, wherein: the additional replacement phrase is generated based on the replacement phrase. [This relates to a human generating a replacement phrase using pen and paper] No additional limitations present.
Dependent claim 6 recites,
“6. The method of claim 1, further comprising:
displaying the entity phrase; [This relates to a human displaying the entity phrase using pen and paper.]
receiving a selection of the entity phrase; [This relates to a human receiving a selection of the entity phrase using vision or auditory systems.]
and displaying the replacement phrase in response to the selection. [This relates to a human displaying the replacement phrase using pen and paper.] No additional limitations present.
Dependent claim 7 recites,
“7. The method of claim 1, further comprising:
generating, using an image generation model, a synthetic image based on the augmented prompt, wherein the synthetic image depicts an entity described by the replacement phrase. [This relates to a human generating synthetic image based on the augmented prompt using pen and paper.] No additional limitations present.
Dependent claim 8 recites,
“8. The method of claim 1, further comprising:
retrieving a media item from a database based on the augmented prompt. [This relates to a human retrieving a media item from a database using logic and reasoning] No additional limitations present.
Dependent claim 9 recites,
9. The method of claim 1, further comprising: receiving a refresh command; [This relates to a human receiving a refresh command using visual or auditory processes]
and generating an additional replacement phrase based on the refresh command. [This relates to a human generating an additional replacement phrase using pen and paper] No additional limitations present.
Dependent claim 10 recites,
10. The method of claim 1, wherein marking the entity phrase comprises:
inserting a first tag before the entity phrase and a second tag after the entity phrase. [This relates to a human inserting a first tag before the entity phrase and a second tag after the entity phrase using pen and paper.] No additional limitations present.
Dependent claim 11 recites,
11. The method of claim 1, wherein: the language generation model is trained to generate the replacement phrase using a training set including a training text prompt and a training replacement phrase. [This relates to a human generating a replacement phrase using pen and paper] No additional limitations present.
Dependent claim 13 recites,
13. The method of claim 12, wherein obtaining the training set comprises:
identifying the training entity phrase in the training text prompt; [This relates to a human identifying a phrase using visual systems and the human mind]
and inserting the first tag before the training entity phrase and the second tag after the training entity phrase. [This relates to a human inserting a first tag before the entity phrase and a second tag after the entity phrase using pen and paper.] No additional limitations present.
Dependent claim 14 recites,
“14. The method of claim 12, wherein training the language generation model comprises:
generating, using the language generation model, a training output based on the training text prompt; [This relates to a human generating an output using pen and paper]
computing a loss function based on the training output and the training replacement phrase; [This relates to a human computing a loss function using pen and paper]
and updating parameters of the language generation model based on the loss function. [This relates to a human updating parameters using pen and paper] No additional limitations present.
Dependent claim 15 recites,
15. The method of claim 12, wherein obtaining the training set comprises:
obtaining an additional replacement phrase comprising an additional variant of the training entity phrase. [This relates to a human obtaining an additional replacement phrase using visual or auditory systems] No additional limitations present.
Dependent claim 17 recites,
17. The system of claim 16, the system further comprising: an augmentation component configured to generate an augmented prompt that includes the replacement phrase. [This relates to a human generating an augmented prompt using pen and paper.] No additional limitations present.
Dependent claim 18 recites,
18. The system of claim 16, the system further comprising:
an image generation model comprising image generation parameters stored in the at least one memory, the image generation model configured to generate an image based on the replacement phrase. [This relates to a human generating an image based on the replacement phrase using pen and paper.] No additional limitations present.
Dependent claim 19 recites,
19. The system of claim 16, the system further comprising: a retrieval component configured to retrieve a media item from a database based on the replacement phrase. [This relates to a human retrieving a media item from a database using logic and reasoning in the human mind.] No additional limitations present.
Dependent claim 20 recites,
20. The system of claim 16, the system further comprising: a user interface configured to receive a selection of the entity phrase [This relates to a human receiving a selection of the entity phrase using visual or auditory systems.]
and display the replacement phrase in response to the selection. [This relates to a human displaying the replacement phrase using pen and paper.] No additional limitations present.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tambi (U.S. Patent Application Publication No. US 20240104131 A1), in view of Patterson (U.S. Patent Application Publication No. US 20120197885 A1).
Regarding Claim 1, Tambi teaches 1. A method for media processing, comprising: receiving a text prompt including an entity phrase; (see Tambi [0022] “According to some embodiments, the query processing apparatus identifies a target phrase in an original query. The target phrase is a phrase to be replaced in the original query. The query processing apparatus then replaces the target phrase with a mask token to obtain a modified query and generates an alternative query using the masked language model. The query processing apparatus then retrieves a search result (e.g., images related to the alternative query).”) generating, using a language generation model, a replacement phrase by performing autoregressive token generation based on a sequence of tokens from the revised prompt, (see Tambi [0025] “In FIGS. 1-4, an apparatus and method for query processing are described. One or more aspects of the apparatus and method include at least one processor; at least one memory including instructions executable by the processor; an MLM configured to generate a plurality of candidate alternative phrases based on a modified query, wherein the modified query comprises a mask token in place of a target phrase in an original query; an embedding model configured to encode the target phrase to obtain a target phrase embedding and to encode the plurality of candidate alternative phrases to obtain a plurality of candidate alternative phrase embeddings; and a query generation component configured to select an alternative phrase from the plurality of candidate alternative phrases based on the target phrase embedding and the plurality of candidate alternative phrase embeddings and to generate an alternative query by replacing the target phrase with the alternative phrase.”) wherein the replacement phrase comprises a variant of the entity phrase; and generating an augmented prompt that includes the replacement phrase. (see Tambi [0019] “The present disclosure describes systems and methods for query processing. Embodiments of the present disclosure include a query processing apparatus configured to generate alternative queries based on an original query (i.e., to retrieve more varied search results). The present disclosure involves creating a modified query by using a mask token in place of a target phrase. A masked language model (MLM) generates candidate alternative phrases based on the modified query by filling the mask token with nearest neighbors, respectively. One or more alternative queries are then selected by comparing candidate phrase embedding to the target phrase embedding. Accordingly, the query processing apparatus provides a search result related to an alternative query such as images depicting the query.”)
Tambi does not specifically teach marking the entity phrase within the text prompt to obtain a revised prompt; However, Patterson does teach this limitation (See Patterson, [0032] The phrase identification operation of the indexing system 110 identifies "good" and "bad" phrases in the document collection that are useful to indexing and searching documents. In one aspect, good phrases are phrases that tend to occur in more than certain percentage of documents in the document collection, and/or are indicated as having a distinguished appearance in such documents, such as delimited by markup tags or other morphological, format, or grammatical markers. Another aspect of good phrases is that they are predictive of other good phrases, and are not merely sequences of words that appear in the lexicon. For example, the phrase "President of the United States" is a phrase that predicts other phrases such as "George Bush" and "Bill Clinton." However, other phrases are not predictive, such as "fell down the stairs" or "top of the morning," "out of the blue," since idioms and colloquisms like these tend to appear with many other different and unrelated phrases. Thus, the phrase identification phase determines which phrases are good phrases and which are bad (i.e., lacking in predictive power).”)
Tambi and Patterson are in the same field of endeavor of signal processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for media processing of Tambi to incorporate marking the entity phrase within the text prompt to obtain a revised prompt, as taught by Patterson. This allows indexing, document annotation, searching, ranking, and other areas of document analysis and processing, as recognized by Patterson [0014].
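The masked-language-model query alteration described in Tambi [0019] and [0022] can be summarized with a brief illustrative sketch. The sketch below is provided for explanation only and is not part of the record; the particular model (bert-base-uncased via the Hugging Face transformers library), the example query, and the variable names are assumptions rather than details taken from Tambi.

from transformers import pipeline

# Illustrative only: mask the target phrase, let a masked language model
# propose candidate replacement tokens, and form alternative queries.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

original_query = "grunge background with geometric shapes"
target_phrase = "grunge"

# Modified query with a mask token in place of the target phrase (cf. Tambi [0022]).
modified_query = original_query.replace(target_phrase, fill_mask.tokenizer.mask_token)

# The MLM fills the mask; each candidate yields an alternative query.
candidates = fill_mask(modified_query, top_k=5)
alternative_queries = [c["sequence"] for c in candidates]
print(alternative_queries)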
As to Claim 2, Tambi in view of Patterson teaches 2. The method of claim 1,
Furthermore, Tambi teaches further comprising: identifying, using a natural language processing model, the entity phrase from the text prompt. (see Tambi, [0022] “According to some embodiments, the query processing apparatus identifies a target phrase in an original query. The target phrase is a phrase to be replaced in the original query. The query processing apparatus then replaces the target phrase with a mask token to obtain a modified query and generates an alternative query using the masked language model. The query processing apparatus then retrieves a search result (e.g., images related to the alternative query).”)
As to Claim 3, Tambi in view of Patterson teaches 3. The method of claim 1,
Furthermore, Tambi teaches further comprising: generating a plurality of replacement phrases including the replacement phrase; (see Tambi [0019] “The present disclosure describes systems and methods for query processing. Embodiments of the present disclosure include a query processing apparatus configured to generate alternative queries based on an original query (i.e., to retrieve more varied search results). The present disclosure involves creating a modified query by using a mask token in place of a target phrase. A masked language model (MLM) generates candidate alternative phrases based on the modified query by filling the mask token with nearest neighbors, respectively. One or more alternative queries are then selected by comparing candidate phrase embedding to the target phrase embedding. Accordingly, the query processing apparatus provides a search result related to an alternative query such as images depicting the query.”) and receiving a user input selecting the replacement phrase from among the plurality of replacement phrases, wherein the augmented prompt is generated based on the user input. (see Tambi, [0031] “User device 105 may be a personal computer, laptop computer, mainframe computer, palmtop computer, personal assistant, mobile device, or any other suitable processing apparatus. In some examples, user device 105 includes software that incorporates a query processing application. In some examples, the query processing application on user device 105 may include functions of query processing apparatus 110. In some examples, user device 105 includes a user interface that displays one or more alternative queries to user 100. The user interface receives the original query from user 100 and receives a user input indicating a target phrase (e.g., “grunge”) of the original query.”)
Regarding Claim 4, Tambi in view of Patterson teaches 4. The method of claim 1,
Furthermore, Tambi teaches further comprising: identifying an additional entity phrase in the text prompt; and generating an additional replacement phrase for the additional entity phrase, wherein the augmented prompt includes the additional replacement phrase. (See Tambi [0043] According to some embodiments, machine learning model 220 identifies a set of candidate alternative phrases based on the filtered set of replacement tokens. In some examples, machine learning model 220 compares the target phrase embedding to the set of candidate alternative phrase embeddings. Machine learning model 220 is an example of, or includes aspects of, the corresponding element described with reference to FIGS. 9 and 10.”)(see Tambi [0046-0047] According to some embodiments, masked language model 225 generates an alternative query based on the modified query using a masked language model (MLM), where the alternative query includes an alternative phrase in place of the target phrase that is consistent with a context of the target phrase. In some examples, masked language model 225 generates a set of replacement tokens based on the modified query. [0047] In some examples, masked language model 225 generates an additional alternative phrase based on the modified query. Masked language model 225 generates an additional alternative query that includes the additional alternative phrase in place of the target phrase. In some examples, masked language model 225 generates an additional alternative query based on the additional target phrase, where the additional alternative query includes an additional alternative phrase in place of the additional target phrase.”)
Regarding Claim 5, Tambi in view of Patterson teaches 5. The method of claim 4,
Furthermore, Tambi teaches wherein: the additional replacement phrase is generated based on the replacement phrase. (See Tambi [0043] According to some embodiments, machine learning model 220 identifies a set of candidate alternative phrases based on the filtered set of replacement tokens. In some examples, machine learning model 220 compares the target phrase embedding to the set of candidate alternative phrase embeddings. Machine learning model 220 is an example of, or includes aspects of, the corresponding element described with reference to FIGS. 9 and 10.”)(see Tambi [0046-0047] According to some embodiments, masked language model 225 generates an alternative query based on the modified query using a masked language model (MLM), where the alternative query includes an alternative phrase in place of the target phrase that is consistent with a context of the target phrase. In some examples, masked language model 225 generates a set of replacement tokens based on the modified query. [0047] In some examples, masked language model 225 generates an additional alternative phrase based on the modified query. Masked language model 225 generates an additional alternative query that includes the additional alternative phrase in place of the target phrase. In some examples, masked language model 225 generates an additional alternative query based on the additional target phrase, where the additional alternative query includes an additional alternative phrase in place of the additional target phrase.”)
Regarding Claim 6, Tambi in view of Patterson teaches 6. The method of claim 1,
Furthermore, Tambi teaches further comprising: displaying the entity phrase; receiving a selection of the entity phrase; and displaying the replacement phrase in response to the selection. (see Tambi, [0031] “User device 105 may be a personal computer, laptop computer, mainframe computer, palmtop computer, personal assistant, mobile device, or any other suitable processing apparatus. In some examples, user device 105 includes software that incorporates a query processing application. In some examples, the query processing application on user device 105 may include functions of query processing apparatus 110. In some examples, user device 105 includes a user interface that displays one or more alternative queries to user 100. The user interface receives the original query from user 100 and receives a user input indicating a target phrase (e.g., “grunge”) of the original query.”)
Regarding Claim 7, Tambi in view of Patterson teaches 7. The method of claim 1,
Furthermore, Tambi teaches further comprising: generating, using an image generation model, a synthetic image based on the augmented prompt, wherein the synthetic image depicts an entity described by the replacement phrase. (see Tambi, [0059] According to some embodiments, search interface 255 provides an image based on the alternative query via searching a database of candidate images. In some examples, an image generation model may be used to generate or synthesize a set of images based on the alternative query. Search interface 255 provides the set of images. Search interface 255 is an example of, or includes aspects of, the corresponding element described with reference to FIGS. 6 and 7.”)
Regarding Claim 8, Tambi in view of Patterson teaches 8. The method of claim 1,
Furthermore, Tambi teaches further comprising: retrieving a media item from a database based on the augmented prompt. (see Tambi, [0030] “Query processing apparatus 110 replaces the target phrase with a mask token to obtain a modified query and generates an alternative query based on the modified query using a masked language model, where the alternative query includes an alternative phrase in place of the target phrase that is consistent with a context of the target phrase. For example, the alternative phrase is “scratched background with geometric shapes”. Query processing apparatus 110 retrieves a search result (e.g., images related to the alternative phrase) from database 120. The alternative query and the search result is returned to user 100 via cloud 115 and user device 105.”)
Regarding Claim 9, Tambi in view of Patterson teaches 9. The method of claim 1,
Furthermore Tambi teaches further comprising: receiving a refresh command; and generating an additional replacement phrase based on the refresh command. (see Tambi, Figure 5, see Tambi [0091] “At operation 505, the user provides an original query. In some cases, the operations of this step refer to, or may be performed by, a user as described with reference to FIG. 1. In some examples, the user uploads the original query via a query upload element of a user interface. The original query recites “grunge background with geometric shapes.” The word “grunge” is underscored to indicate it is a target phrase selected by the user to be replaced.”)
Regarding Claim 10, Tambi in view of Patterson teaches 10. The method of claim 1,
Furthermore, Patterson teaches wherein marking the entity phrase comprises: inserting a first tag before the entity phrase and a second tag after the entity phrase. (see Patterson, [0119] In one embodiment, the related phrase information is a related phase bit vector. This bit vector may be characterized as a "bi-bit" vector, in that for each related phrase g.sub.k there are two bit positions, g.sub.k-1, g.sub.k-2. The first bit position stores a flag indicating whether the related phrase g.sub.k is present in the document d (i.e., the count for g.sub.k in document d is greater than 0). The second bit position stores a flag that indicates whether a related phrase g.sub.j of g.sub.k is also present in document d. The related phrases g.sub.l of a related phrase g.sub.k of a phrase g.sub.j are herein called the "secondary related phrases of g.sub.j" The counts and bit positions correspond to the canonical order of the phrases in R (sorted in order of decreasing information gain). This sort order has the effect of making the related phrase g.sub.k that is most highly predicted by g.sub.j associated with the most significant bit of the related phrase bit vector, and the related phrase g.sub.l that is least predicted by g.sub.j associated with the least significant bit.”)
Tambi and Patterson are in the same field of endeavor of signal processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of the combination of Tambi and Patterson to incorporate marking the entity phrase by inserting a first tag before the entity phrase and a second tag after the entity phrase, as taught by Patterson. This allows indexing, document annotation, searching, ranking, and other areas of document analysis and processing, as recognized by Patterson [0014].
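For clarity of the claimed marking step, a minimal sketch of inserting a first tag before an entity phrase and a second tag after it is shown below. It is illustrative only; the tag strings and function name are hypothetical and are not drawn from the application or the cited references.

# Illustrative only: surround an entity phrase with a first and a second tag
# to obtain a revised prompt, as recited in claim 10. Tag strings are hypothetical.
def mark_entity_phrase(text_prompt, entity_phrase, first_tag="<ent>", second_tag="</ent>"):
    # Replace only the first occurrence of the entity phrase with the tagged form.
    return text_prompt.replace(entity_phrase, first_tag + entity_phrase + second_tag, 1)

revised_prompt = mark_entity_phrase("a grunge background with geometric shapes", "grunge")
# revised_prompt == "a <ent>grunge</ent> background with geometric shapes"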
Regarding Claim 11, Tambi in view of Patterson teaches 11. The method of claim 1,
Furthermore, Tambi teaches wherein: the language generation model is trained to generate the replacement phrase using a training set including a training text prompt and a training replacement phrase. (see Tambi [0051] According to some embodiments, embedding model 230 encodes the target phrase to obtain a target phrase embedding. In some examples, embedding model 230 encodes a set of candidate alternative phrases to obtain a set of candidate alternative phrase embeddings. In some examples, embedding model 230 encodes the alternative query to obtain an alternative query embedding. Embedding model 230 compares the alternative query embedding to one or more image embeddings. Embedding model 230 is an example of, or includes aspects of, the corresponding element described with reference to FIG. 3.”) (see Tambi [0052] “A word embedding is a learned representation for text where words that have the same meaning have a similar representation. GloVe and Word2vec are examples of systems for obtaining a vector representation of words. GloVe is an unsupervised algorithm for training a network using on aggregated global word-word co-occurrence statistics from a corpus. Similarly, a Word2vec model may include a shallow neural network trained to reconstruct the linguistic context of words. GloVe and Word2vec models may take a large corpus of text and produces a vector space as output. In some cases, the vector space may have a large number of dimensions. Each word in the corpus is assigned a vector in the space. Word vectors are positioned in the vector space in a manner such that similar words are located nearby in the vector space. In some”)
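The embedding comparison discussed in Tambi [0051]-[0052] (encoding phrases as vectors and selecting candidates whose embeddings are closest to the target phrase embedding) can be illustrated with the short sketch below. The toy vectors and candidate words are invented for illustration; a real system would use embeddings produced by a trained model such as GloVe or Word2vec.

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings for illustration only.
target_embedding = np.array([0.9, 0.1, 0.3])             # e.g., "grunge"
candidate_embeddings = {
    "distressed": np.array([0.8, 0.2, 0.35]),
    "floral": np.array([0.1, 0.9, 0.2]),
}

# Select the candidate phrase whose embedding is closest to the target phrase embedding.
best_candidate = max(candidate_embeddings,
                     key=lambda w: cosine_similarity(target_embedding, candidate_embeddings[w]))
print(best_candidate)  # distressed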
As to independent Claim 12, Tambi teaches
A method of training a machine learning model, the method comprising: obtaining a training set including a training text prompt and a training replacement phrase, (see Tambi [0098] “FIG. 7 shows an example of a user interface according to aspects of the present disclosure. The example shown includes search interface 700, original query 705, target phrase 710, candidate alternative phrases 715, and search result 720.”) and the training replacement phrase comprises a ground-truth variant of the training entity phrase; (see Tambi [0022] “According to some embodiments, the query processing apparatus identifies a target phrase in an original query. The target phrase is a phrase to be replaced in the original query. The query processing apparatus then replaces the target phrase with a mask token to obtain a modified query and generates an alternative query using the masked language model. The query processing apparatus then retrieves a search result (e.g., images related to the alternative query).”) and training, using the training set, a language generation model to generate a replacement phrase based on a text prompt, (see Tambi [0025] “In FIGS. 1-4, an apparatus and method for query processing are described. One or more aspects of the apparatus and method include at least one processor; at least one memory including instructions executable by the processor; an MLM configured to generate a plurality of candidate alternative phrases based on a modified query, wherein the modified query comprises a mask token in place of a target phrase in an original query; an embedding model configured to encode the target phrase to obtain a target phrase embedding and to encode the plurality of candidate alternative phrases to obtain a plurality of candidate alternative phrase embeddings; and a query generation component configured to select an alternative phrase from the plurality of candidate alternative phrases based on the target phrase embedding and the plurality of candidate alternative phrase embeddings and to generate an alternative query by replacing the target phrase with the alternative phrase.”) wherein the replacement phrase comprises a variant of an entity phrase in the text prompt. (see Tambi [0019] “The present disclosure describes systems and methods for query processing. Embodiments of the present disclosure include a query processing apparatus configured to generate alternative queries based on an original query (i.e., to retrieve more varied search results). The present disclosure involves creating a modified query by using a mask token in place of a target phrase. A masked language model (MLM) generates candidate alternative phrases based on the modified query by filling the mask token with nearest neighbors, respectively. One or more alternative queries are then selected by comparing candidate phrase embedding to the target phrase embedding. Accordingly, the query processing apparatus provides a search result related to an alternative query such as images depicting the query.”)
Tambi does not specifically teach wherein the training text prompt includes a training entity phrase surrounded by a first tag and a second tag, However, Patterson does teach this limitation (See Patterson, [0032] The phrase identification operation of the indexing system 110 identifies "good" and "bad" phrases in the document collection that are useful to indexing and searching documents. In one aspect, good phrases are phrases that tend to occur in more than certain percentage of documents in the document collection, and/or are indicated as having a distinguished appearance in such documents, such as delimited by markup tags or other morphological, format, or grammatical markers. Another aspect of good phrases is that they are predictive of other good phrases, and are not merely sequences of words that appear in the lexicon. For example, the phrase "President of the United States" is a phrase that predicts other phrases such as "George Bush" and "Bill Clinton." However, other phrases are not predictive, such as "fell down the stairs" or "top of the morning," "out of the blue," since idioms and colloquisms like these tend to appear with many other different and unrelated phrases. Thus, the phrase identification phase determines which phrases are good phrases and which are bad (i.e., lacking in predictive power).”)
Tambi and Patterson are in the same field of endeavor of signal processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for media processing of Tambi such that the training text prompt includes a training entity phrase surrounded by a first tag and a second tag, as taught by Patterson. This allows indexing, document annotation, searching, ranking, and other areas of document analysis and processing, as recognized by Patterson [0014].
Regarding Claim 13, Tambi in view of Patterson teaches 13. The method of claim 12,
Furthermore, Patterson teaches wherein obtaining the training set comprises: identifying the training entity phrase in the training text prompt; and inserting the first tag before the training entity phrase and the second tag after the training entity phrase. (see Patterson [0119] “In one embodiment, the related phrase information is a related phase bit vector. This bit vector may be characterized as a "bi-bit" vector, in that for each related phrase g.sub.k there are two bit positions, g.sub.k-1, g.sub.k-2. The first bit position stores a flag indicating whether the related phrase g.sub.k is present in the document d (i.e., the count for g.sub.k in document d is greater than 0). The second bit position stores a flag that indicates whether a related phrase g.sub.j of g.sub.k is also present in document d. The related phrases g.sub.l of a related phrase g.sub.k of a phrase g.sub.j are herein called the "secondary related phrases of g.sub.j" The counts and bit positions correspond to the canonical order of the phrases in R (sorted in order of decreasing information gain). This sort order has the effect of making the related phrase g.sub.k that is most highly predicted by g.sub.j associated with the most significant bit of the related phrase bit vector, and the related phrase g.sub.l that is least predicted by g.sub.j associated with the least significant bit.”)
Tambi and Patterson are in the same field of endeavor of signal processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for media processing of Tambi to incorporate obtaining the training set by identifying the training entity phrase in the training text prompt and inserting the first tag before the training entity phrase and the second tag after the training entity phrase, as taught by Patterson. This allows indexing, document annotation, searching, ranking, and other areas of document analysis and processing, as recognized by Patterson [0014].
Regarding Claim 14, Tambi in view of Patterson teaches 14. The method of claim 12,
Furthermore, Tambi teaches wherein training the language generation model comprises: generating, using the language generation model, a training output based on the training text prompt; computing a loss function based on the training output and the training replacement phrase; and updating parameters of the language generation model based on the loss function. (see Tambi [0045] “During the training process, the parameters and weights of machine learning model 220 are adjusted to increase the accuracy of the result (i.e., by attempting to minimize a loss function which corresponds to the difference between the current result and the target result). The weight of an edge increases or decreases the strength of the signal transmitted between nodes. In some cases, nodes have a threshold below which a signal is not transmitted at all. In some examples, the nodes are aggregated into layers. Different layers perform different transformations on their inputs. The initial layer is known as the input layer and the last layer is known as the output layer. In some cases, signals traverse certain layers multiple times.”)
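The generic training procedure described in Tambi [0045] (adjusting parameters to minimize a loss between the current output and the target) corresponds to a standard supervised training step. The sketch below is illustrative only, assuming a toy PyTorch model and made-up data; it does not reflect the actual language generation model of the application or of Tambi.

import torch
import torch.nn as nn

# Illustrative only: a toy stand-in for a language generation model.
model = nn.Linear(8, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

training_prompt = torch.randn(4, 8)   # stand-in for encoded training text prompts
training_target = torch.randn(4, 8)   # stand-in for encoded training replacement phrases

# One training step: generate a training output, compute the loss against the
# training target, and update the model parameters based on that loss.
training_output = model(training_prompt)
loss = loss_fn(training_output, training_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))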
Regarding Claim 15, Tambi in view of Patterson teaches 15. The method of claim 12,
Furthermore, Tambi teaches wherein obtaining the training set comprises: obtaining an additional replacement phrase comprising an additional variant of the training entity phrase. (See Tambi [0043] According to some embodiments, machine learning model 220 identifies a set of candidate alternative phrases based on the filtered set of replacement tokens. In some examples, machine learning model 220 compares the target phrase embedding to the set of candidate alternative phrase embeddings. Machine learning model 220 is an example of, or includes aspects of, the corresponding element described with reference to FIGS. 9 and 10.”)(see Tambi [0046-0047] According to some embodiments, masked language model 225 generates an alternative query based on the modified query using a masked language model (MLM), where the alternative query includes an alternative phrase in place of the target phrase that is consistent with a context of the target phrase. In some examples, masked language model 225 generates a set of replacement tokens based on the modified query. [0047] In some examples, masked language model 225 generates an additional alternative phrase based on the modified query. Masked language model 225 generates an additional alternative query that includes the additional alternative phrase in place of the target phrase. In some examples, masked language model 225 generates an additional alternative query based on the additional target phrase, where the additional alternative query includes an additional alternative phrase in place of the additional target phrase.”)
As to independent Claim 16, Tambi teaches A system for media processing, comprising: at least one memory; at least one processor executing instructions stored in the at least one memory; (see Tambi [0006] An apparatus and method for query processing are described. One or more embodiments of the apparatus and method include at least one processor; at least one memory including instructions executable by the processor”) and a language generation model comprising text generation parameters stored in the at least one memory, (see Tambi [0025] “In FIGS. 1-4, an apparatus and method for query processing are described. One or more aspects of the apparatus and method include at least one processor; at least one memory including instructions executable by the processor; an MLM configured to generate a plurality of candidate alternative phrases based on a modified query, wherein the modified query comprises a mask token in place of a target phrase in an original query; an embedding model configured to encode the target phrase to obtain a target phrase embedding and to encode the plurality of candidate alternative phrases to obtain a plurality of candidate alternative phrase embeddings; and a query generation component configured to select an alternative phrase from the plurality of candidate alternative phrases based on the target phrase embedding and the plurality of candidate alternative phrase embeddings and to generate an alternative query by replacing the target phrase with the alternative phrase.”) the language generation model trained to generate a replacement phrase based on the revised prompt, wherein the replacement phrase comprises a variant of the entity phrase. (see Tambi [0019] “The present disclosure describes systems and methods for query processing. Embodiments of the present disclosure include a query processing apparatus configured to generate alternative queries based on an original query (i.e., to retrieve more varied search results). The present disclosure involves creating a modified query by using a mask token in place of a target phrase. A masked language model (MLM) generates candidate alternative phrases based on the modified query by filling the mask token with nearest neighbors, respectively. One or more alternative queries are then selected by comparing candidate phrase embedding to the target phrase embedding. Accordingly, the query processing apparatus provides a search result related to an alternative query such as images depicting the query.”)
Tambi does not specifically teach an entity marking model comprising entity marking parameters stored in the at least one memory, the entity marking model trained to mark the entity phrase within a text prompt to obtain a revised prompt; However, Patterson does teach this limitation (See Patterson, [0032] The phrase identification operation of the indexing system 110 identifies "good" and "bad" phrases in the document collection that are useful to indexing and searching documents. In one aspect, good phrases are phrases that tend to occur in more than certain percentage of documents in the document collection, and/or are indicated as having a distinguished appearance in such documents, such as delimited by markup tags or other morphological, format, or grammatical markers. Another aspect of good phrases is that they are predictive of other good phrases, and are not merely sequences of words that appear in the lexicon. For example, the phrase "President of the United States" is a phrase that predicts other phrases such as "George Bush" and "Bill Clinton." However, other phrases are not predictive, such as "fell down the stairs" or "top of the morning," "out of the blue," since idioms and colloquisms like these tend to appear with many other different and unrelated phrases. Thus, the phrase identification phase determines which phrases are good phrases and which are bad (i.e., lacking in predictive power).”)
Tambi and Patterson are in the same field of endeavor of signal processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system for media processing of Tambi to incorporate an entity marking model comprising entity marking parameters stored in the at least one memory, the entity marking model trained to mark the entity phrase within a text prompt to obtain a revised prompt, as taught by Patterson. This allows indexing, document annotation, searching, ranking, and other areas of document analysis and processing, as recognized by Patterson [0014].
Regarding Claim 17, Tambi in view of Patterson teaches 17. The system of claim 16,
Furthermore, Tambi teaches the system further comprising: an augmentation component configured to generate an augmented prompt that includes the replacement phrase. (see Tambi, [0031] “User device 105 may be a personal computer, laptop computer, mainframe computer, palmtop computer, personal assistant, mobile device, or any other suitable processing apparatus. In some examples, user device 105 includes software that incorporates a query processing application. In some examples, the query processing application on user device 105 may include functions of query processing apparatus 110. In some examples, user device 105 includes a user interface that displays one or more alternative queries to user 100. The user interface receives the original query from user 100 and receives a user input indicating a target phrase (e.g., “grunge”) of the original query.”)
Regarding Claim 18, Tambi in view of Patterson teaches 18. The system of claim 16,
Furthermore, Tambi teaches the system further comprising: an image generation model comprising image generation parameters stored in the at least one memory, the image generation model configured to generate an image based on the replacement phrase. (see Tambi, [0059] According to some embodiments, search interface 255 provides an image based on the alternative query via searching a database of candidate images. In some examples, an image generation model may be used to generate or synthesize a set of images based on the alternative query. Search interface 255 provides the set of images. Search interface 255 is an example of, or includes aspects of, the corresponding element described with reference to FIGS. 6 and 7.”)
Regarding Claim 19, Tambi in view of Patterson teaches 19. The system of claim 16,
Furthermore, Tambi teaches the system further comprising: a retrieval component configured to retrieve a media item from a database based on the replacement phrase. (see Tambi Figure 7 Element 720)
Regarding Claim 20, Tambi in view of Patterson teaches 20. The system of claim 16,
Furthermore, Tambi teaches the system further comprising: a user interface configured to receive a selection of the entity phrase and display the replacement phrase in response to the selection. (see Tambi, [0031] “User device 105 may be a personal computer, laptop computer, mainframe computer, palmtop computer, personal assistant, mobile device, or any other suitable processing apparatus. In some examples, user device 105 includes software that incorporates a query processing application. In some examples, the query processing application on user device 105 may include functions of query processing apparatus 110. In some examples, user device 105 includes a user interface that displays one or more alternative queries to user 100. The user interface receives the original query from user 100 and receives a user input indicating a target phrase (e.g., “grunge”) of the original query.”)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KRISTEN MICHELLE MASTERS whose telephone number is (703)756-1274. The examiner can normally be reached M-F 8:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Louis Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KRISTEN MICHELLE MASTERS/Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659