DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claims 1, 12, and 21-25 are rejected under 35 U.S.C. 103 as being unpatentable over Williams et al. (US 20240333728 A1), hereinafter referred to as Williams, in view of Seth (US 20210110035 A1), hereinafter referred to as Seth.
Regarding Claim 1, Williams discloses A system, comprising: a processor; a prompt library, the prompt library associating prompts with areas of expertise, each prompt being associated with at least one area of expertise and, for the at least one area of expertise, a respective score; [paragraph 0070, The prompt 222 may be previously unknown to the SFT ML model 215, e.g., the labelers may generate new prompt data, the prompt 222 may include testing data stored on database 126, and/or any other suitable prompt data] [paragraph 0062, the chatbot may be capable of receiving and understanding prompts for network security vulnerability testing code…In another aspect, the chatbot may be capable of receiving and understanding prompts for abnormal network traffic detection code. In another aspect, the chatbot may be capable of receiving and understanding privacy policies and prompts for privacy enforcement code – teaches chatbots, i.e., “expert systems,” that evaluate specific items of interest, i.e., “areas of expertise”]
and memory storing instructions that, when executed by the processor, cause the system to: evaluate first responses generated by a large language model to a set of prompts selected from the prompt library using expert systems associated with the areas of expertise for the set of prompts against second responses generated by a modified version of the large language model to the set of prompts, [paragraph 0071, The data labelers may provide feedback via the server 204 on the responses 224A, 224B, 224C, 224D when ranking 226 them from best to worst based upon the prompt-response pairs]
determine that the evaluation indicates a degradation criterion is met, [paragraph 0072, the scalar reward 225 may include a value numerically representing a human preference for the best and/or most expected response to a prompt…Inputting a “losing” prompt-response pair data to the same reward model 220 may generate a losing reward]
Williams does not explicitly teach and initiating remedial action in response to determining that the evaluation indicates a degradation criterion is met.
Seth teaches and initiating remedial action in response to determining that the evaluation indicates a degradation criterion is met. [paragraph 0069, Once the model is validated, the status of the ML package may be changed to indicate that the package is ready for deployment (e.g., including the status “UNDEPLOYED”). If validation fails, the status of the ML package may be changed to indicate that failure occurred (e.g., including the status “VALIDATION_FAILED”). The platform may then prevent deployment of the failed package (e.g., the model is not deployed by the conductor)]
Before the effective filing date of the claimed invention, it would have been obvious to one with ordinary skill in the art to combine the teachings of Seth with the disclosure of Williams. The motivation or suggestion would have been “for validation of machine learning (ML) models for robotic process automation (RPA) before deployment.” (Abstract)
Regarding Claim 12, Williams does not explicitly teach wherein the remedial action prevents the modified version of the large language model from being put into a production environment.
Seth teaches wherein the remedial action prevents the modified version of the large language model from being put into a production environment. [paragraph 0069, Once the model is validated, the status of the ML package may be changed to indicate that the package is ready for deployment (e.g., including the status “UNDEPLOYED”). If validation fails, the status of the ML package may be changed to indicate that failure occurred (e.g., including the status “VALIDATION_FAILED”). The platform may then prevent deployment of the failed package (e.g., the model is not deployed by the conductor)]
Before the effective filing date of the claimed invention, it would have been obvious to one with ordinary skill in the art to combine the teachings of Seth with the disclosure of Williams. The motivation or suggestion would have been “for validation of machine learning (ML) models for robotic process automation (RPA) before deployment.” (Abstract)
Regarding Claim 21, Williams discloses A method comprising: selecting, from a prompt collection, a set of prompts identified as quality backstop prompts; [paragraph 0070, The prompt 222 may be previously unknown to the SFT ML model 215, e.g., the labelers may generate new prompt data, the prompt 222 may include testing data stored on database 126, and/or any other suitable prompt data]
using at least a first expert system to evaluate responses, generated by a modified version of a large language model to prompts in the set of prompts; [paragraph 0071, The data labelers may provide feedback via the server 204 on the responses 224A, 224B, 224C, 224D when ranking 226 them from best to worst based upon the prompt-response pairs]
determining that at least one response fails to meet a quality threshold; [paragraph 0072, the scalar reward 225 may include a value numerically representing a human preference for the best and/or most expected response to a prompt…Inputting a “losing” prompt-response pair data to the same reward model 220 may generate a losing reward]
Williams does not explicitly teach and preventing the modified version of the large language model from being put into a production environment.
Seth teaches and preventing the modified version of the large language model from being put into a production environment. [paragraph 0069, Once the model is validated, the status of the ML package may be changed to indicate that the package is ready for deployment (e.g., including the status “UNDEPLOYED”). If validation fails, the status of the ML package may be changed to indicate that failure occurred (e.g., including the status “VALIDATION_FAILED”). The platform may then prevent deployment of the failed package (e.g., the model is not deployed by the conductor)]
Before the effective filing date of the claimed invention, it would have been obvious to one with ordinary skill in the art to combine the teachings of Seth with the disclosure of Williams. The motivation or suggestion would have been “for validation of machine learning (ML) models for robotic process automation (RPA) before deployment.” (Abstract)
Regarding Claim 22, Williams discloses wherein the prompts in the set of prompts are identified in the prompt collection as quality backstop prompts for an area of expertise of the first expert system. [paragraph 0062, the chatbot may be capable of receiving and understanding prompts for network security vulnerability testing code]
Regarding Claim 23, Williams discloses further comprising using a second expert system to evaluate responses generated by the modified version of the large language model to prompts associated with a second area of expertise, the second expert system corresponding to the second area of expertise. [paragraph 0062, In another aspect, the chatbot may be capable of receiving and understanding prompts for abnormal network traffic detection code. In another aspect, the chatbot may be capable of receiving and understanding privacy policies and prompts for privacy enforcement code – teaches other chatbots, i.e., “expert systems,” that evaluate specific items of interest, i.e., “areas of expertise”]
Regarding Claim 24, Williams discloses wherein using the first expert system to evaluate a response generated for a prompt in the set of prompts includes obtaining a score for the response using the first expert system. [paragraph 0092, the ML model 310 may use a regression model to determine a severity score associated with an identified security vulnerability based upon the security vulnerability documents, which may be a preferred model in situations involving scoring output data. In one aspect, the ML model 310 may rank the identified security vulnerabilities 350 based upon the severity scores]
Regarding Claim 25, Williams discloses wherein the first expert system is a knowledge engine and the score represents at least one of a topicality score for the response or a veracity score for the response. [paragraph 0062, the chatbot may be capable of receiving and understanding prompts for network security vulnerability testing code] [paragraph 0092, the ML model 310 may use a regression model to determine a severity score associated with an identified security vulnerability based upon the security vulnerability documents, which may be a preferred model in situations involving scoring output data. In one aspect, the ML model 310 may rank the identified security vulnerabilities 350 based upon the severity scores]
Allowable Subject Matter
Claims 2-11 and 13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for allowance:
Regarding Claims 2-11 and 13, the closest prior art of record does not explicitly teach or suggest in detail the limitations of these claims in view of the other limitations of the intervening claims.
Thus, the prior art of record, taken singly or in combination, does not teach or suggest the above-stated limitations taken wholly in combination with all the elements of each independent claim.
Claims 14-20 are allowed.
The following is an examiner’s statement of reasons for allowance:
Regarding Claim 14, although the closest prior art of record (such as Williams et al., (US 20240333728 A1) and Seth (US 20210110035 A1)) teaches A method comprising: using at least a first expert system to evaluate first responses, generated by a large language model in response to prompts in a set of prompts selected from a prompt collection, against second responses generated by a modified version of the large language model in response to the prompts in the set of prompts, the evaluation including for each prompt in the set of prompts: obtaining a first score for a response, of the first responses, generated for the prompt, obtaining a second score for a response, of the second responses, generated for the prompt, and preventing the modified version of the large language model from being put into a production environment.
However, none of the prior art, alone or in combination, teaches and identifying the prompt as a flagged prompt in response to determining that the second score represents a predetermined drop in quality from the first score; determining a ratio of flagged prompts to prompts in the set of prompts; determining that the ratio represents an unacceptable ratio in view of the other limitations of the independent claims.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW J STEINLE whose telephone number is (571)272-9923. The examiner can normally be reached M-F 10am-6pm CT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Eleni Shiferaw can be reached at (571) 272-3867. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW J STEINLE/Primary Examiner, Art Unit 2497