DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Claims 1, 3, 6-7, 9, 14, 17-18 are amended. Claims 2, 4-5, 13, 15-16, and 20 are cancelled. Claims 21-26 are newly added. Claims 1, 3, 6-12, 14, 17-19 and 21-26 are presented for examination.
Response to Arguments
Applicant's arguments filed on 2/9/2026 have been reviewed. The responses follow:
35 U.S.C. 103 Rejections
Applicant's arguments with respect to claims 1, 3, 6-12, 14, 17-19 and 21-26 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
35 U.S.C. 101 Rejection
In light of the amendments, the rejection under 35 U.S.C. 101 is withdrawn.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3, 8-10, 14, 19, 21-22 and 25-26 are rejected under 35 U.S.C. 103 as being unpatentable over Aman (Self-Refine: Iterative Refinement with Self-Feedback) in view of Zheng (Judging LLM-as-a-judge with MT-Bench and Chatbot Arena).
Regarding claim 1, Aman teaches a method implemented by one or more processors, the method comprising: receiving natural language (NL) based input associated with a client device (code or dialog context, Pages 30-31); generating, using a large language model (LLM), an LLM response based on processing the NL based input (final response, Fig 13), wherein generating the LLM response comprises: obtaining a set of response evaluation criteria for evaluating a plurality of candidate LLM responses (criteria/feedback, Under L.1 and M.1), wherein obtaining the set of response evaluation criteria for evaluating the plurality of candidate LLM responses comprises: generating a request for the LLM to generate the set of response evaluation criteria based on the NL input and processing the request to generate the set of response evaluation criteria (the response criteria are based on the input; an example is given for the code – L.1; "Some of these criteria are difficult to quantify, and are a matter of human preference. As with other modules, we leverage the superior instruction following capabilities of modern LLMs to instead provide a few demonstrations of each task.", Under Q Acronym generation); generating the candidate LLM responses based on processing the NL based input (generating a response, Fig 13); generating, for a candidate LLM response, a corresponding critique response based on processing of candidate LLM responses and the set of response evaluation criteria using the LLM, wherein the corresponding critique responses are indicative of an extent to which a respective one of the plurality of candidate LLM responses complies with the set of response evaluation criteria (scoring based on each criterion, Fig 13); and causing the LLM response to be rendered at the client device (final response, Fig 13).
Aman does not explicitly teach generating the plurality of candidate LLM responses based on processing the NL based input; generating, for each of the plurality of candidate LLM responses, a corresponding critique response based on processing each of the plurality of candidate LLM responses and the set of response evaluation criteria using the LLM, wherein each of the corresponding critique responses is indicative of an extent to which a respective one of the plurality of candidate LLM responses complies with the set of response evaluation criteria; and selecting, based on the corresponding critique response associated with one of the plurality of candidate LLM responses indicating a threshold compliance with the set of response evaluation criteria, the one of the plurality of candidate LLM responses as the LLM response.
However, Zheng teaches generating the plurality of candidate LLM responses based on processing the NL based input (LLM responses, Fig 1); generating, for each of the plurality of candidate LLM responses, a corresponding critique response based on processing each of the plurality of candidate LLM responses and the set of response evaluation criteria using the LLM (judged against adherence to instructions, Under Introduction), wherein each of the corresponding critique responses is indicative of an extent to which a respective one of the plurality of candidate LLM responses complies with the set of response evaluation criteria (clear vs. clearer and accurate, e.g., Fig 12-14); and selecting, based on the corresponding critique response associated with one of the plurality of candidate LLM responses indicating a threshold compliance with the set of response evaluation criteria, the one of the plurality of candidate LLM responses as the LLM response (selecting a response, Fig 12-14).
It would have been obvious to one of ordinary skill in the art, having the teachings of Aman, to further include the concept of Zheng before the effective filing date, since Zheng reveals that strong LLM judges like GPT-4 can match both controlled and crowdsourced human preferences well, achieving over 80% agreement, the same level of agreement as between humans (Under Introduction, Zheng).
Regarding claim 3, Aman modified by Zheng as above in claim 1 teaches wherein the indication of the extent to which a respective one of the plurality of candidate LLM responses complies with the set of response evaluation criteria comprises a comparison measure, the comparison measure being generated, for each of the plurality of candidate LLM responses, based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM (Fig 13, metrics and the comparison measure, Aman; refer to Fig 14 – both of them are accurate; however, B's response is clearer, Zheng).
Regarding claim 8, Aman modified by Zheng as above in claim 1 teaches wherein generating a corresponding critique response for a given candidate LLM response comprises: generating a request for the LLM to determine which of the set of response evaluation criteria the given candidate LLM response complies with (feedback, L.1, M.1; "Some of these criteria are difficult to quantify, and are a matter of human preference. As with other modules, we leverage the superior instruction following capabilities of modern LLMs to instead provide a few demonstrations of each task.", Under Q Acronym generation); and processing the request using the LLM to generate the corresponding critique response (Fig 3, Aman).
Regarding claim 9, arguments analogous to claim 1 are applicable. In addition, Zheng teaches generating training data for fine-tuning a large language model (LLM), wherein generating the training data comprises the steps of claim 1 and storing, as an instance of the training data, the NL based input along with the LLM response that is selected from among the plurality of candidate LLM responses (fine-tuning the judge model, Under 3.4).
Regarding claim 10, Zheng as above in claim 9 teaches fine-tuning the LLM based on the training data (fine-tuning the judge model, Under 3.4).
Regarding claim 26, Aman modified by Zheng as above in claim 21 teaches wherein the instructions further cause the at least one processor to be operable to: store, as an instance of training data, the NL based input along with the LLM response that is selected from among the plurality of candidate LLM responses; and fine-tune, based on the stored instance of the training data, the LLM (fine-tuning the judge model, Under 3.4, Zheng).
Regarding claim 14, arguments analogous to claim 3 are applicable.
Regarding claim 19, arguments analogous to claim 8 are applicable.
Regarding claim 21, arguments analogous to claim 1 are applicable.
Regarding claim 22, arguments analogous to claim 3 are applicable.
Regarding claim 25, arguments analogous to claim 8 are applicable.
Claims 6, 17 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Aman (Self-Refine: Iterative Refinement with Self-Feedback) in view of Zheng (Judging LLM-as-a-judge with MT-Bench and Chatbot Arena), and further in view of Jin (When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities).
Regarding claim 6, Aman modified by Zheng as above in claim 1 mentions determining the set of response evaluation criteria based on the user information (future benchmark based on human preferences, Under 6); however, it does not explicitly teach wherein obtaining the set of response evaluation criteria comprises: obtaining user information associated with the user of the client device.
Jin teaches wherein obtaining the set of response evaluation criteria comprises: obtaining user information associated with the user of the client device (user context information, Fig 3); and determining the set of response evaluation criteria based on the user information (the answer is provided with the set of evaluation criteria, e.g., preference for film, Fig 3).
It would have been obvious to one of ordinary skill in the art, having the teachings of Aman and Zheng, to further include the concept of Jin before the effective filing date to provide personalization and thus improve performance (Abstract, Jin).
Regarding claim 17, arguments analogous to claim 6 are applicable.
Regarding claim 23, arguments analogous to claim 6 are applicable.
Claims 7, 18 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Aman (Self-Refine: Iterative Refinement with Self-Feedback) in view of Zheng (Judging LLM-as-a-judge with MT-Bench and Chatbot Arena), and further in view of Le (US 20240346251).
Regarding claim 7, Aman modified by Zheng as above in claim 1 does not explicitly teach wherein obtaining the set of response evaluation criteria comprises: obtaining information indicative of a set of response evaluation criteria associated with a third party (3P); and determining the set of response evaluation criteria based on the obtained information.
In the same field of endeavor, Le teaches obtaining information indicative of a set of response evaluation criteria associated with a third party (3P); and determining the set of response evaluation criteria based on the obtained information ("a third party may leverage a data management application 514 to make training data 516 available to one or more machine learning models. Subsequent to training a model with enterprise (e.g., third-party) and application specific training data, the model may be hosted via the messaging service platform, such that the model can be used by the topic evaluation engine 502 in processing and routing messages based on custom defined topics, custom defined message intent values, and custom defined message context values, that may be specific to a particular add-on software app", Para 0047-0049, Fig 5-6).
It would have been obvious to one of ordinary skill in the art, having the teachings of Aman and Zheng, to further include the teachings of Le before the effective filing date so that the topic evaluation engine can process and route messages based on custom defined topics, custom defined message intent values, and custom defined message context values that may be specific to a particular add-on software app (Para 0047, Le).
Regarding claim 18, arguments analogous to claim 7 are applicable.
Regarding claim 24, arguments analogous to claim 7 are applicable.
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Aman (Self-Refine: Iterative Refinement with Self-Feedback) in view of Zheng (Judging LLM-as-a-judge with MT-Bench and Chatbot Arena), and further in view of Liu (US 20240346254).
Regarding claim 11, Zheng as above in claim 10 teaches generating, for each of the plurality of candidate LLM responses, and based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM, a corresponding comparison measure (e.g., Fig 14 – which model is better), wherein fine-tuning the LLM comprises fine-tuning the LLM with reinforcement learning (RL) using the reward model (reinforcement learning, Under Introduction).
Aman modified by Zheng does not teach training a reward model based on the selected one of the plurality of candidate LLM responses and the corresponding comparison measure.
However, Liu teaches training a reward model based on the selected one of the plurality of candidate LLM responses and the corresponding comparison measure (the reward model 120 of the reinforcement learning protocol 118 can configure the natural language generation system 104 to seek actions that maximize the output score 124, Para 0033, 0044, 0052), wherein fine-tuning the LLM comprises fine-tuning the LLM with reinforcement learning (RL) using the reward model (adapting the RL model, i.e., fine-tuning, Para 0033).
It would have been obvious to one of ordinary skill in the art, having the teachings of Aman and Zheng, to further include the concept of Liu before the effective filing date, since the RL model is a known model, especially for policies and rules, and fine-tuning the RL model will optimize the output results (Para 0032-0033, Liu).
Regarding claim 12, Zheng as above in claim 10 mentions fine-tuning and further using a fine-tuned model (Under 3.4); however, it does not explicitly teach subsequent to fine-tuning the LLM: receiving an NL based input associated with a client device; generating an LLM response based on processing the NL based input associated with the client device using the LLM; and causing the LLM response to be rendered at the client device.
Liu teaches subsequent to fine-tuning the LLM: receiving an NL based input associated with a client device; generating an LLM response based on processing the NL based input associated with the client device using the LLM; and causing the LLM response to be rendered at the client device (Fig 4; after the feedback, the model responds based on the adapting).
It would have been obvious to one of ordinary skill in the art, having the teachings of Aman and Zheng, to further include the concept of Liu before the effective filing date, since the RL model is a known model, especially for policies and rules, and fine-tuning the RL model will optimize the output results (Para 0032-0033, Liu).
Second Ground of Rejection
Claims 1, 3, 8-10, 14, 19, 21-22 and 25-26 are, in the alternative, rejected under 35 U.S.C. 103 as being unpatentable over Imani (US 20240362417) in view of Zheng (Judging LLM-as-a-judge with MT-Bench and Chatbot Arena).
Regarding claim 1, Imani teaches a method implemented by one or more processors, the method comprising: receiving natural language (NL) based input associated with a client device (input that includes natural language, S402, Fig 4); generating, using a large language model (LLM), an LLM response based on processing the NL based input (S408, presenting the LLM response, Para 0047), wherein generating the LLM response comprises: obtaining a set of response evaluation criteria for evaluating a plurality of candidate LLM responses (the readability model 108 creates a feature vector 16 based on the evaluation of the readability metrics, Para 0020, Fig 3), wherein obtaining the set of response evaluation criteria for evaluating the plurality of candidate LLM responses comprises: generating a request for the LLM to generate the set of response evaluation criteria based on the NL based input and processing the request to generate the set of response evaluation criteria (readability metrics based on the input, Para 0039, Step 404, Fig 1 and Fig 4); generating the candidate LLM responses based on processing the NL based input (generating a response, Para 0030-0034, 0047); generating, for a candidate LLM response, a corresponding critique response based on processing of candidate LLM responses and the set of response evaluation criteria using the LLM, wherein the corresponding critique responses are indicative of an extent to which a respective one of the plurality of candidate LLM responses complies with the set of response evaluation criteria (insight for a response, Para 0030-0034, 0047; confidence score based on features such as measures and metrics (e.g., readability), Para 0015, 0025-0026, 0030); and causing the LLM response to be rendered at the client device (presenting the response along with the score and insight, Para 0047).
Imani does not explicitly teach generating the plurality of candidate LLM responses based on processing the NL based input; generating, for each of the plurality of candidate LLM responses, a corresponding critique response based on processing each of the plurality of candidate LLM responses and the set of response evaluation criteria using the LLM, wherein each of the corresponding critique responses is indicative of an extent to which a respective one of the plurality of candidate LLM responses complies with the set of response evaluation criteria; and selecting, based on the corresponding critique response associated with one of the plurality of candidate LLM responses indicating a threshold compliance with the set of response evaluation criteria, the one of the plurality of candidate LLM responses as the LLM response.
However, Zheng teaches generating the plurality of candidate LLM responses based on processing the NL based input (LLM responses, Fig 1); generating, for each of the plurality of candidate LLM responses, a corresponding critique response based on processing each of the plurality of candidate LLM responses and the set of response evaluation criteria using the LLM (judged against adherence to instructions, Under Introduction), wherein each of the corresponding critique responses is indicative of an extent to which a respective one of the plurality of candidate LLM responses complies with the set of response evaluation criteria (clear vs. clearer and accurate, e.g., Fig 12-14); and selecting, based on the corresponding critique response associated with one of the plurality of candidate LLM responses indicating a threshold compliance with the set of response evaluation criteria, the one of the plurality of candidate LLM responses as the LLM response (selecting a response, Fig 12-14).
It would have been obvious to one of ordinary skill in the art, having the teachings of Imani, to further include the concept of Zheng before the effective filing date, since Zheng reveals that strong LLM judges like GPT-4 can match both controlled and crowdsourced human preferences well, achieving over 80% agreement, the same level of agreement as between humans (Under Introduction, Zheng).
Regarding claim 3, Imani modified by Zheng as above in claim 1 teaches wherein the indication of the extent to which a respective one of the plurality of candidate LLM responses complies with the set of response evaluation criteria comprises a comparison measure, the comparison measure being generated, for each of the plurality of candidate LLM responses, based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM (measure or metric, Fig 2A-2D, Fig 3, Imani; refer to Fig 14 – both of them are accurate; however, B's response is clearer, Zheng).
Regarding claim 8, Imani modified by Zheng as above in claim 1 teaches wherein generating a corresponding critique response for a given candidate LLM response comprises: generating a request for the LLM to determine which of the set of response evaluation criteria the given candidate LLM response complies with (the LLM outputs the feature, Para 0024-0025); and processing the request using the LLM to generate the corresponding critique response (Fig 3, Imani).
Regarding claim 9, arguments analogous to claim 1 are applicable. In addition, Imani teaches generating training data for fine-tuning a large language model (LLM), wherein generating the training data comprises the steps of claim 1 and storing, as an instance of the training data, the NL based input along with the LLM response that is selected from among the plurality of candidate LLM responses (used as feedback to further train the model, Para 0030, 0048, Imani; fine-tuning is an intended use here and hence is not given patentable weight; moreover, it is obvious that if Imani is using the insight as feedback, it will help further train the model).
Regarding claim 10, Zheng as above in claim 9 teaches fine-tuning the LLM based on the training data (fine-tuning the judge model, Under 3.4).
Regarding claim 26, Imani modified by Zheng as above in claim 21 teaches wherein the instructions further cause the at least one processor to be operable to: store, as an instance of training data, the NL based input along with the LLM response that is selected from among the plurality of candidate LLM responses; and fine-tune, based on the stored instance of the training data, the LLM (used as feedback to further train the model, Para 0030, 0048, Imani; fine-tuning is an intended use here and hence is not given patentable weight; moreover, it is obvious that if Imani is using the insight as feedback, it will help further train the model; fine-tuning the judge model, Under 3.4, Zheng).
Regarding claim 14, arguments analogous to claim 3 are applicable.
Regarding claim 19, arguments analogous to claim 8 are applicable.
Regarding claim 21, arguments analogous to claim 1 are applicable.
Regarding claim 22, arguments analogous to claim 3 are applicable.
Regarding claim 25, arguments analogous to claim 8 are applicable.
Claims 6, 17 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Imani (US 20240362417) in view of Zheng (Judging LLM-as-a-judge with MT-Bench and Chatbot Arena), and further in view of Jin (When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities).
Regarding claim 6, Imani modified by Zheng as above in claim 1 mentions determining the set of response evaluation criteria based on the user information (future benchmark based on human preferences, Under 6); however, it does not explicitly teach wherein obtaining the set of response evaluation criteria comprises: obtaining user information associated with the user of the client device.
Jin teaches wherein obtaining the set of response evaluation criteria comprises: obtaining user information associated with the user of the client device (user context information, Fig 3); and determining the set of response evaluation criteria based on the user information (the answer is provided with the set of evaluation criteria, e.g., preference for film, Fig 3).
It would have been obvious to one of ordinary skill in the art, having the teachings of Imani and Zheng, to further include the concept of Jin before the effective filing date to provide personalization and thus improve performance (Abstract, Jin).
Regarding claim 17, arguments analogous to claim 6 are applicable.
Regarding claim 23, arguments analogous to claim 6 are applicable.
Claims 7, 18 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Imani (US 20240362417) in view of Zheng (Judging LLM-as-a-judge with MT-Bench and Chatbot Arena), and further in view of Le (US 20240346251).
Regarding claim 7, Imani modified by Zheng as above in claim 1 does not explicitly teach wherein obtaining the set of response evaluation criteria comprises: obtaining information indicative of a set of response evaluation criteria associated with a third party (3P); and determining the set of response evaluation criteria based on the obtained information.
In the same field of endeavor, Le teaches obtaining information indicative of a set of response evaluation criteria associated with a third party (3P); and determining the set of response evaluation criteria based on the obtained information ("a third party may leverage a data management application 514 to make training data 516 available to one or more machine learning models. Subsequent to training a model with enterprise (e.g., third-party) and application specific training data, the model may be hosted via the messaging service platform, such that the model can be used by the topic evaluation engine 502 in processing and routing messages based on custom defined topics, custom defined message intent values, and custom defined message context values, that may be specific to a particular add-on software app", Para 0047-0049, Fig 5-6).
It would have been obvious to one of ordinary skill in the art, having the teachings of Imani and Zheng, to further include the teachings of Le before the effective filing date so that the topic evaluation engine can process and route messages based on custom defined topics, custom defined message intent values, and custom defined message context values that may be specific to a particular add-on software app (Para 0047, Le).
Regarding claim 18, arguments analogous to claim 7 are applicable.
Regarding claim 24, arguments analogous to claim 7 are applicable.
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Imani (US 20240362417) in view of Zheng (Judging LLM-as-a-judge with MT-Bench and Chatbot Arena), and further in view of Liu (US 20240346254).
Regarding claim 11, Zheng as above in claim 10 teaches generating, for each of the plurality of candidate LLM responses, and based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM, a corresponding comparison measure (e.g., Fig 14 – which model is better), wherein fine-tuning the LLM comprises fine-tuning the LLM with reinforcement learning (RL) using the reward model (reinforcement learning, Under Introduction).
Imani modified by Zheng does not teach training a reward model based on the selected one of the plurality of candidate LLM responses and the corresponding comparison measure.
However, Liu teaches training a reward model based on the selected one of the plurality of candidate LLM responses and the corresponding comparison measure (the reward model 120 of the reinforcement learning protocol 118 can configure the natural language generation system 104 to seek actions that maximize the output score 124, Para 0033, 0044, 0052), wherein fine-tuning the LLM comprises fine-tuning the LLM with reinforcement learning (RL) using the reward model (adapting the RL model, i.e., fine-tuning, Para 0033).
It would have been obvious to one of ordinary skill in the art, having the teachings of Imani and Zheng, to further include the concept of Liu before the effective filing date, since the RL model is a known model, especially for policies and rules, and fine-tuning the RL model will optimize the output results (Para 0032-0033, Liu).
Regarding claim 12, Zheng as above in claim 10 mentions fine-tuning and further using a fine-tuned model (Under 3.4); however, it does not explicitly teach subsequent to fine-tuning the LLM: receiving an NL based input associated with a client device; generating an LLM response based on processing the NL based input associated with the client device using the LLM; and causing the LLM response to be rendered at the client device.
Liu teaches subsequent to fine-tuning the LLM: receiving an NL based input associated with a client device; generating an LLM response based on processing the NL based input associated with the client device using the LLM; and causing the LLM response to be rendered at the client device (Fig 4; after the feedback, the model responds based on the adapting).
It would have been obvious to one of ordinary skill in the art, having the teachings of Imani and Zheng, to further include the concept of Liu before the effective filing date, since the RL model is a known model, especially for policies and rules, and fine-tuning the RL model will optimize the output results (Para 0032-0033, Liu).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Jiang (US 11586814)
Korganyan (US 20240378390)
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Richa Sonifrank whose telephone number is (571)272-5357. The examiner can normally be reached M-T 7AM - 5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Phan Hai, can be reached at (571)272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Richa Sonifrank/Primary Examiner, Art Unit 2654