DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to Application No. 18/216,031 filed on 06/29/2023.
Claims 1-20 have been examined and are pending in this application.
Information Disclosure Statement
The information disclosure statement (IDS), submitted on 06/29/2023, is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim(s) 1, 4-6, 9, and 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Klein et al. (US 2022/0067486; Hereinafter “Klein”) in view of Lu et al. (US 2020/0065389; Hereinafter “Lu”).
Regarding claim 1, Klein teaches a method, comprising:
receiving, by a machine learning model, a first target question, a second target question, and a context (Klein: Para. [0044], The fine-tuning may include the first machine learning model 115a performing a conditional generation of questions given an annotated answer. For example, during this training phase, the first machine learning model 115a may be provided a question context c along with an l quantity of answer-question tuples (a.sub.i, q.sub.i), wherein the value of l may vary from context to context, a.sub.i may denote the ground truth answer, and q.sub.i may denote the ground truth question. [quantity of questions may include 2]);
outputting, by a first decoder of the machine learning model, a first representation of first candidate words to follow one or more words of the first target question based on the context and the one or more words of the first target question (Klein: Para. [0048], The first machine learning model 115a may be trained collaboratively with the second machine learning model 115b to perform the question generation task by at least generating a question for a given context. The context may be endowed with the question generated by the first machine learning model 115a (without answer annotation) before being given to the second machine learning model 115b as a basis for the question answering task.);
Klein does not explicitly teach outputting, by a second decoder of the machine learning model, a second representation of second candidate words to follow one or more words of the second target question based on the context and the one or more words of the second target question; and training the machine learning model to generate diverse questions for a given context based on a diversity loss that captures a degree of variance between the first representation and the second representation.
In an analogous art, Lu teaches outputting, by a second decoder of the machine learning model, a second representation of second candidate words to follow one or more words of the second target question based on the context and the one or more words of the second target question (Lu: Para. [0005], inputting the sentence vector of the sample sentence into a first decoder model corresponding to each context sentence of the sample sentence, to obtain a first identifier corresponding to the context sentence. The method also includes inputting the sentence vector of the sample sentence into a second decoder model corresponding to each word of the sample sentence, to obtain a second identifier corresponding to the word;); and
training the machine learning model to generate diverse questions for a given context based on a diversity loss that captures a degree of variance between the first representation and the second representation (Lu: Para. [0005], and obtaining a first probability corresponding to the first identifier according to the first decoder models, obtaining a second probability corresponding to the second identifier according to the second decoder models, and determining a value of a target function, the value of the target function being used for indicating that the sentence vector of the sample sentence represents a semantic accuracy degree. Further, the method includes performing parameter training on the encoder model according to the value of the target function; and inputting a word vector of each word in a test sentence into the trained encoder model, to obtain a sentence vector representing semantics of the test sentence.).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lu with the system and method of Klein to include outputting, by a second decoder of the machine learning model, a second representation of second candidate words to follow one or more words of the second target question based on the context and the one or more words of the second target question; and training the machine learning model to generate diverse questions for a given context based on a diversity loss that captures a degree of variance between the first representation and the second representation because this functionality provides training of a machine learning model to produce the semantics of a sentence (Lu: Para. [0004]).
Regarding claim 4, Klein, in combination with Lu, teaches the method of claim 1, wherein the machine learning model is a pre-trained natural language processing model, and the training includes fine-tuning the pre-trained natural language processing model to generate diverse questions for the given context (Klein: Para. [0036], The transformer decoder network and the transformer encoder network may be fine-tuned in tandem in an end-to-end manner including by adjusting the weights applied by the transformer decoder network when generating questions in order to minimize the errors in the corresponding answers output by the transformer encoder network.).
Regarding claim 5, Klein, in combination with Lu, teaches the method of claim 1, wherein the machine learning model is a transformer having an encoder-decoder architecture (Klein: Para. [0011], the first machine learning model may be a transformer decoder network and the second machine learning model may be a transformer encoder network. Para. [0036], The transformer decoder network and the transformer encoder network may be fine-tuned in tandem in an end-to-end manner including by adjusting the weights applied by the transformer decoder network when generating questions in order to minimize the errors in the corresponding answers output by the transformer encoder network.).
Regarding claim 6, Klein, in combination with Lu, teaches the method of claim 1, wherein the machine learning model is a transformer having a decoder architecture (Klein: Para. [0011], the first machine learning model may be a transformer decoder network and the second machine learning model may be a transformer encoder network.).
Regarding claim 9, claim 9 is rejected under the same rationale as claim 1.
Regarding claims 12-14, claims 12-14 are rejected under the same rationale as claims 4-6, respectively.
Claim(s) 2-3 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Klein et al. (US 2022/0067486; Hereinafter “Klein”) in view of Lu et al. (US 2020/0065389; Hereinafter “Lu”) and further in view of Zhou et al. (US 2021/0319188; Hereinafter “Zhou”).
Regarding claim 2, Klein, in combination with Lu, teaches the method of claim 1. Klein, in combination with Lu, does not explicitly teach further comprising: converting the first representation into a first probability distribution including probabilities that individual words of the first candidate words follow the one or more words of the first target question; and converting the second representation into a second probability distribution including probabilities that individual words of the second candidate words follow the one or more words of the second target question.
In an analogous art, Zhou teaches further comprising: converting the first representation into a first probability distribution including probabilities that individual words of the first candidate words follow the one or more words of the first target question (Zhou: Para. [0012], receiving one of one or more candidate words of a probability distribution and a current state of a decoder in a recurrent decoding process of the language generation logic; evaluating, based on the current state of the decoder, the received candidate word to determine if the word is likely to lead to a grammatically correct output; assigning a numerical value indicative of a level of grammatical correctness to the received candidate word; and outputting the assigned numerical value.); and
converting the second representation into a second probability distribution including probabilities that individual words of the second candidate words follow the one or more words of the second target question (Zhou: Para. [0012], Para. [0013], receive the current state and the one or more candidate words of the probability distribution of the decoder in the recurrent decoding process of the language generation logic; feed, separately, each of the one or more candidate words to the classifier module; and create a vector of acceptable words including each of the candidate words to which the classifier module assigned a numerical value indicative of a level of acceptable grammatical correctness. This may allow for a more efficient process at the classifier module, e.g. efficient transfer of data to the classifier module and reduced processing cost at the classifier module.).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Zhou with the system and method of Klein and Lu to include further comprising: converting the first representation into a first probability distribution including probabilities that individual words of the first candidate words follow the one or more words of the first target question; and converting the second representation into a second probability distribution including probabilities that individual words of the second candidate words follow the one or more words of the second target question because this functionality provides generation of candidate words that are likely to lead to a logical sentence (Zhou: Para. [0012]).
Regarding claim 3, Klein, in combination with Lu and Zhou, teaches the method of claim 2, wherein the training includes training the machine learning model to generate questions that are relevant to the given context based on a first probability assigned to a respective one of the first candidate words that matches a next word of the first target question, and a second probability assigned to a respective one of the second candidate words that matches a next word of the second target question (Zhou: Claim 11, receiving an example candidate word of a probability distribution of one decoder step of said language generation logic; and training the classifier module using imitation learning, based on the example candidate word and either an expert policy configured to identify words which have a high probability of leading to sensical sentences or the classifier module when partially trained, to determine the likelihood a candidate word will lead to a sentence which has an acceptable level of grammatical correctness and to assigning a numerical value to the candidate word indicative of the level of grammatical correctness.).
Regarding claims 10-11, claims 10-11 are rejected under the same rationale as claims 2-3, respectively.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Klein et al. (US 2022/0067486; Hereinafter “Klein”) in view of Lu et al. (US 2020/0065389; Hereinafter “Lu”) and further in view of Lin (US 2020/0265192).
Regarding claim 7, Klein, in combination with Lu, teaches the method of claim 1. Klein, in combination with Lu, does not explicitly teach wherein the first decoder and the second decoder each include a plurality of layers, respective subsequent layers of the first decoder updating the first representation based on the context and the first representation as output from respective previous layers of the first decoder, and respective subsequent layers of the second decoder updating the second representation based on the context and the second representation as output from respective previous layers of the second decoder.
In an analogous art, Lin teaches wherein the first decoder and the second decoder each include a plurality of layers, respective subsequent layers of the first decoder updating the first representation based on the context and the first representation as output from respective previous layers of the first decoder, and respective subsequent layers of the second decoder updating the second representation based on the context and the second representation as output from respective previous layers of the second decoder (Lin: Para. [0111], obtaining a character included in a target text sequentially, and decoding the character according to a first-layer long short-term memory (LSTM) structure inputted into a LSTM model sequentially to obtain a sequence composed of hidden states, wherein the LSTM model is a LSTM neural network; inputting the sequence composed of hidden states into a second-layer LSTM structure of the LSTM model and decoding the sequence composed of hidden states to obtain a word sequence of a summary; inputting the word sequence of the summary into the first-layer LSTM structure of the LSTM model and decoding the word sequence of the summary to obtain an updated sequence composed of hidden states; obtaining a context vector corresponding to a contribution value of a hidden state of a decoder according to the contribution value of the hidden state of the decoder in the updated sequence composed of hidden states; and obtaining a probability distribution of a word in the updated sequence composed of hidden states according to the updated sequence composed of hidden states and the context vector, and outputting the most probable word in the probability distribution of the word as a summary of the target text.).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lin with the system and method of Klein and Lu to include wherein the first decoder and the second decoder each include a plurality of layers, respective subsequent layers of the first decoder updating the first representation based on the context and the first representation as output from respective previous layers of the first decoder, and respective subsequent layers of the second decoder updating the second representation based on the context and the second representation as output from respective previous layers of the second decoder because this functionality provides for automatic text summarization (Lin: Para. [0002]).
Claim(s) 15 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Klein et al. (US 2022/0067486; Hereinafter “Klein”) in view of Lu et al. (US 2020/0065389; Hereinafter “Lu”) and further in view of Linck et al. (US 2022/0269938; Hereinafter “Linck”).
Regarding claim 15, Klein teaches one or more computer-readable storage media storing instructions that, responsive to execution by a processing device, causes the processing device to perform operations including:
receiving, by a machine learning model, a context (Klein: Para. [0044], The fine-tuning may include the first machine learning model 115a performing a conditional generation of questions given an annotated answer. For example, during this training phase, the first machine learning model 115a may be provided a question context c along with an l quantity of answer-question tuples (a.sub.i, q.sub.i), wherein the value of l may vary from context to context, a.sub.i may denote the ground truth answer, and q.sub.i may denote the ground truth question.);
generating, by the machine learning model, a first question and a second question based on the context (Klein: Para. [0044], The fine-tuning may include the first machine learning model 115a performing a conditional generation of questions given an annotated answer. For example, during this training phase, the first machine learning model 115a may be provided a question context c along with an l quantity of answer-question tuples (a.sub.i, q.sub.i), wherein the value of l may vary from context to context, a.sub.i may denote the ground truth answer, and q.sub.i may denote the ground truth question. [quantity of questions may include 2]),
Klein does not explicitly teach the machine learning model trained to diversify the first question and the second question by employing the machine learning model to generate representations of candidate words for inclusion in training questions and updating the machine learning model based on differences between the generated representations.
In an analogous art, Lu teaches the machine learning model trained to diversify the first question and the second question by employing the machine learning model to generate representations of candidate words for inclusion in training questions and updating the machine learning model based on differences between the generated representations (Lu: Para. [0005], inputting the sentence vector of the sample sentence into a first decoder model corresponding to each context sentence of the sample sentence, to obtain a first identifier corresponding to the context sentence. The method also includes inputting the sentence vector of the sample sentence into a second decoder model corresponding to each word of the sample sentence, to obtain a second identifier corresponding to the word; Para. [0005], and obtaining a first probability corresponding to the first identifier according to the first decoder models, obtaining a second probability corresponding to the second identifier according to the second decoder models, and determining a value of a target function, the value of the target function being used for indicating that the sentence vector of the sample sentence represents a semantic accuracy degree. Further, the method includes performing parameter training on the encoder model according to the value of the target function; and inputting a word vector of each word in a test sentence into the trained encoder model, to obtain a sentence vector representing semantics of the test sentence.).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Lu with the system and method of Klein to include the machine learning model trained to diversify the first question and the second question by employing the machine learning model to generate representations of candidate words for inclusion in training questions and updating the machine learning model based on differences between the generated representations because this functionality provides training of a machine learning model to produce the semantics of a sentence (Lu: Para. [0004]).
Klein, in combination with Lu, does not explicitly teach outputting, by the machine learning model, the first question and the second question.
In an analogous art, Linck teaches outputting, by the machine learning model, the first question and the second question (Linck: Para. [0029], generating the question may further include processing, at a third machine learning model, the most contested claim and the paragraph including the most contested claim using bidirectional long short term memory (LSTM) to generate LSTM encoder output data; concatenating, by the third machine learning model, the LSTM encoder output data to generate LSTM decoder input data; and processing, by the third machine learning model, the LSTM decoder input data and a context vector to generate question data corresponding to the question, wherein the context vector is the sum of a weighted average of encoder hidden states.).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Linck with the system and method of Klein and Lu to include outputting, by the machine learning model, the first question and the second question because this functionality provides training of a machine learning model to produce the semantics of a sentence (Linck: Para. [0004]).
Regarding claim 17, Klein, in combination with Lu and Linck, teaches the one or more computer-readable storage media of claim 15, wherein the context is retrieved from a product listing of a digital marketplace, the context including a title of the product listing (Linck: Para. [0024], determining one or more claims for the media product using a first machine learning model; identifying a plurality of related media products having respective one or more related sentences based at least on a topic, title, summary text, or topical information of the media product. The title, summary text, or topical information of the media product may be determined by processing, using a trained classifier, the one or more sentences of the media product to identify text corresponding to the title, summary text, or topical information.).
Regarding claim 18, Klein, in combination with Lu and Linck, teaches the one or more computer-readable storage media of claim 17, wherein the outputting includes communicating the first question and the second question to a client device of a publisher of the product listing (Linck: Para. [0029], generating the question may further include processing, at a third machine learning model, the most contested claim and the paragraph including the most contested claim using bidirectional long short term memory (LSTM) to generate LSTM encoder output data; concatenating, by the third machine learning model, the LSTM encoder output data to generate LSTM decoder input data; and processing, by the third machine learning model, the LSTM decoder input data and a context vector to generate question data corresponding to the question, wherein the context vector is the sum of a weighted average of encoder hidden states.).
Regarding claim 19, Klein, in combination with Lu and Linck, teaches the one or more computer-readable storage media of claim 17, wherein the outputting includes automatically inserting the first question and the second question into a comments section of the product listing (Linck: Para. [0047], Further, the one or more processors may determine that the user has read the media product if the user clicks on a user selectable icon displayed on user interface 122 that corresponds to the media product, wherein the user selectable icon may correspond to user feedback (e.g., like, comment, etc.) or user action (e.g., share, save, repost, etc.) to the media product. Para. [0029]).
Regarding claim 20, Klein, in combination with Lu and Linck, teaches the one or more computer-readable storage media of claim 17, wherein the machine learning model is a transformer having an encoder-decoder architecture or a decoder architecture (Klein: Para. [0011], the first machine learning model may be a transformer decoder network and the second machine learning model may be a transformer encoder network. Para. [0036], The transformer decoder network and the transformer encoder network may be fine-tuned in tandem in an end-to-end manner including by adjusting the weights applied by the transformer decoder network when generating questions in order to minimize the errors in the corresponding answers output by the transformer encoder network.).
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Klein et al. (US 2022/0067486; Hereinafter “Klein”) in view of Lu et al. (US 2020/0065389; Hereinafter “Lu”) in view of Linck et al. (US 2022/0269938; Hereinafter “Linck”) and further in view of Zhou et al. (US 2021/0319188; Hereinafter “Zhou”).
Regarding claim 16, Klein, in combination with Lu and Linck, teaches the one or more computer-readable storage media of claim 15. Klein, in combination with Lu and Linck, does not explicitly teach wherein the machine learning model is further trained to relate the first question and the second question to the context by assigning probabilities to the candidate words indicating measures of likelihood that the candidate words correspond to words of the training questions and updating the machine learning model based on respective ones of the probabilities assigned to the candidate words that match the words of the training questions.
In an analogous art, Zhou teaches wherein the machine learning model is further trained to relate the first question and the second question to the context by assigning probabilities to the candidate words indicating measures of likelihood that the candidate words correspond to words of the training questions and updating the machine learning model based on respective ones of the probabilities assigned to the candidate words that match the words of the training questions (Zhou: Para. [0012], receiving one of one or more candidate words of a probability distribution and a current state of a decoder in a recurrent decoding process of the language generation logic; evaluating, based on the current state of the decoder, the received candidate word to determine if the word is likely to lead to a grammatically correct output; assigning a numerical value indicative of a level of grammatical correctness to the received candidate word; and outputting the assigned numerical value. Para. [0013], receive the current state and the one or more candidate words of the probability distribution of the decoder in the recurrent decoding process of the language generation logic; feed, separately, each of the one or more candidate words to the classifier module; and create a vector of acceptable words including each of the candidate words to which the classifier module assigned a numerical value indicative of a level of acceptable grammatical correctness. This may allow for a more efficient process at the classifier module, e.g. efficient transfer of data to the classifier module and reduced processing cost at the classifier module. 
Claim 11, receiving an example candidate word of a probability distribution of one decoder step of said language generation logic; and training the classifier module using imitation learning, based on the example candidate word and either an expert policy configured to identify words which have a high probability of leading to sensical sentences or the classifier module when partially trained, to determine the likelihood a candidate word will lead to a sentence which has an acceptable level of grammatical correctness and to assigning a numerical value to the candidate word indicative of the level of grammatical correctness.).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Zhou with the system and method of Klein, Lu, and Linck to include wherein the machine learning model is further trained to relate the first question and the second question to the context by assigning probabilities to the candidate words indicating measures of likelihood that the candidate words correspond to words of the training questions and updating the machine learning model based on respective ones of the probabilities assigned to the candidate words that match the words of the training questions because this functionality provides generation of candidate words that are likely to lead to a logical sentence (Zhou: Para. [0012]).
Allowable Subject Matter
Claim 8 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an Examiner’s statement of reasons for allowance:
The closest prior art includes Klein et al. (US 2022/0067486; Hereinafter “Klein”), Lu et al. (US 2020/0065389; Hereinafter “Lu”), and Lin (US 2020/0265192). However, none of Klein, Lu, and Lin teaches or suggests, alone or in combination, the particular combination of steps or elements as recited in claim 8. For example, none of the cited prior art teaches or suggests the steps of “wherein the training includes calculating the diversity loss by applying a difference metric to respective word representation pairs, the respective word representation pairs including the first representation and the second representation as output by corresponding layers of the first decoder and the second decoder.” As a result, the claims are allowable over the cited prior art.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
U.S. Patent Application Publication No. US 2022/0138267 by Otsuka et al.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Nelson Giddins whose telephone number is (571)272-7993. The examiner can normally be reached on Monday - Friday, 9:00 AM - 5:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Linglan Edwards can be reached at (571) 270-5440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NELSON S. GIDDINS/ Primary Examiner, Art Unit 2408