DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2024/0135116 A1 to Duy Vu et al. (hereinafter Vu) in view of the Non-Patent Literature "HSimCSE: Improving Contrastive Learning of Unsupervised Sentence Representation with Adversarial Hard Positive and Dual Hard Negatives" to Bo Xu et al. (hereinafter Xu).
Regarding claim 1, Vu teaches a method of training a neural network model, the method comprising: (Vu teaches training a multi-lingual multi-task model wherein at least a portion of the machine learning model is a neural network. Vu at ¶¶ [0140] - [0143] and [0153] - [0156].)
receiving, via a data interface, a first plurality of data samples; (Vu teaches accessing, obtaining, or generating a plurality of datasets that differ from each other in one or more differentiation attributes. (i.e., a plurality of data samples). Vu at ¶¶ [0159] - [0160].)
generating a plurality of batches using the first plurality of data samples, wherein a first batch includes data samples associated with a single first task, and wherein a second batch includes data samples associated with a single second task; (Vu teaches generating training batches used for ABSA and SLSA training tasks (i.e., a first batch for a first task and a second batch for a second task.) Vu at ¶¶ [0160] - [0167].)
and performing a first training process to the neural network model using the plurality of batches, wherein the performing the first training process includes: (Vu teaches training a multi-lingual multi-task model wherein at least a portion of the machine learning model is a neural network. Vu at ¶¶ [0140] - [0143] and [0153] - [0156]. Further, Vu teaches training the neural network using the ABSA and SLSA training batches. Vu at ¶¶ [0160] - [0167].)
generating a first loss objective function for the first batch based on the first task; generating a second loss objective function for the second batch based on the second task; (Vu teaches, using the training batches, minimizing a cross-entropy loss function and updating model parameters based on the loss function. Vu at ¶¶ [0375] - [0380]. The training batches are associated with specific tasks, and because training is performed for each task, wherein a loss is minimized based upon the training batches used for that task, then this constitutes generating a first loss objective function for the first batch based on the first task. Further, the training batches apply to multiple tasks (ABSA and SLSA) therefore a new loss is used for each task (i.e., a second task, a second loss function, and a second batch based on the second task). Vu at ¶¶ [0160] - [0167] and [0375] - [0380].)
computing a first loss based on the first loss objective function; computing a second loss based on the second loss objective function; (Vu teaches minimizing a loss based on each task at hand as a process of training for the task. Vu at ¶¶ [0375] - [0380].)
and updating parameters of the neural network model based on the first loss and the second loss via backpropagation; (Vu teaches updating parameters of a machine learning model based on the loss functions used for training. Vu at ¶¶ [0375] - [0380]. Further, Vu teaches using backpropagation to train a machine learning model (i.e., the model trained by updating parameters). Vu at ¶¶ [0140] - [0141].)
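For purposes of illustrating the claim mapping above only, and not as a reproduction of any disclosure of Vu, the recited per-batch loss computation and joint parameter update can be sketched in Python. The linear model, random batches, and learning rate below are arbitrary placeholders standing in for Vu's multi-task training arrangement.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, X, y):
    """Loss objective for one batch: mean cross-entropy of a shared linear model."""
    p = softmax(X @ W)
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

def grad(W, X, y):
    """Gradient of the batch loss with respect to the shared parameters W."""
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0
    return X.T @ p / len(y)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))                      # shared model parameters
batch_a = (rng.normal(size=(8, 4)), rng.integers(0, 3, 8))  # batch for the first task
batch_b = (rng.normal(size=(8, 4)), rng.integers(0, 3, 8))  # batch for the second task

history = []
for _ in range(200):
    loss_a = cross_entropy(W, *batch_a)   # first loss from the first objective
    loss_b = cross_entropy(W, *batch_b)   # second loss from the second objective
    history.append(loss_a + loss_b)
    # update the shared parameters based on both losses
    W -= 0.1 * (grad(W, *batch_a) + grad(W, *batch_b))
```

Each pass computes a separate loss for each task-specific batch and updates a single set of shared parameters from both, mirroring the claimed per-batch objectives and joint backpropagation-style update.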
Vu, however, does not alone teach "and wherein the neural network model trained by the first training process is used to perform a text retrieval task based on text embedding."
In a similar field of endeavor (e.g., training neural networks for performing specific tasks including text retrieval/question answering wherein embeddings are used), Xu teaches and wherein the neural network model trained by the first training process is used to perform a text retrieval task based on text embedding. (Xu teaches sentence (i.e., text) embeddings being used for question answering (i.e., text retrieval) wherein training is performed to achieve such a task. Xu at sections I. Introduction, II. Overview, and III. Method, subsections A - D.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Vu with the teachings of Xu to provide the limitations of claim 1. Doing so would have improved the model of Vu over traditional models by using hard positive and hard negative sampling to enhance model performance, as recognized by Xu at section IV. Experiment, B. Performance Comparison, and C. Ablation Study.
Regarding claim 2, Vu in view of Xu (hereinafter Vu-Xu) teaches all the limitations of claim 1. Further, Xu teaches the method of claim 1, wherein prior to the first training process, the neural network model is trained using a second training process using a second plurality of data samples. (Xu teaches the sentence encoder is a pre-trained model (i.e., trained using a second training process with different data, prior to the training performed by Xu). Xu at Section II. Overview, A. Problem Formulation.)
Regarding claim 3, Vu-Xu teaches all the limitations of claim 1 as laid out above. Further, Xu teaches the method of claim 1, wherein the neural network model includes a pre-trained generative large language model (LLM). (Xu teaches the sentence encoder is a pre-trained model (i.e., RoBERTa, which is a commonly known pre-trained generative transformer model and large language model.) Xu at Section II. Overview, A. Problem Formulation.)
Regarding claim 4, Vu-Xu teaches all the limitations of claim 1 as laid out above. Further Xu teaches the method of claim 1, wherein the text retrieval task is different from the first task and the second task. (Xu teaches that the sentence encoder, and specifically learning sentence representations themselves, is used in downstream applications (i.e., tasks that are not part of the training process) that includes question answering (i.e., text retrieval). Xu at Section I. Introduction. Therefore, the text-retrieval task is a different task than the training tasks as it is a downstream task, not a training task.)
Regarding claim 5, Vu-Xu teaches all the limitations of claim 1 as laid out above. Further, Xu teaches the method of claim 1, wherein the first loss objective function includes a first contrastive loss customized to the first task, and wherein the second loss objective function includes a second contrastive loss customized to the second task. (Xu teaches minimizing the normalized temperature-scaled cross-entropy loss NT-Xent, which is recognized as the "core of contrastive learning". Xu at section III. Method, A. Backbone. Further, Xu teaches additionally using a contrastive quadruplet loss function to train the sentence encoder (i.e., a second loss separate from the first cross-entropy-based loss). Xu at section III. Method, sections A, B, and D.)
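For context on the NT-Xent loss relied upon in the mapping above, a minimal numerical sketch follows. This is an illustrative re-implementation of the standard normalized temperature-scaled cross-entropy formula, not code from Xu; the batch size, embedding dimension, and temperature are arbitrary placeholders.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.05):
    """NT-Xent: z2[i] is the positive for z1[i]; every other row of z2 is a negative."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = (z1 @ z2.T) / tau                                        # scaled cosine similarities
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                             # -log softmax at each positive

rng = np.random.default_rng(1)
z = rng.normal(size=(8, 16))
aligned = nt_xent(z, z)                   # correct positive pairs: low loss
misaligned = nt_xent(z, z[::-1].copy())   # mismatched positive pairs: high loss
```

Minimizing this quantity pulls each embedding toward its positive and away from the in-batch negatives, which is the contrastive-learning behavior the mapping above attributes to Xu's backbone.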
Regarding claim 6, Vu-Xu teaches all the limitations of claim 1 as laid out above. Further, Xu teaches the method of claim 1, wherein the performing the first training process includes: generating a plurality of hard negatives for the first task; (Xu teaches retrieving the closest samples to anchor samples as hard-negative samples (i.e., generating a plurality of hard negatives.) Xu at section III. Method, C. Dual Negative Sample Selection Module.)
selecting a predetermined number of hard negatives from the plurality of hard negatives for the first task; (Xu teaches selecting the top-K closest samples as the hard-negatives, then removing the top-1 sample to avoid the false-negative problem. Xu at section III. Method, C. Dual Negative Sample Selection Module.)
and updating the first batch using the selected predetermined number of hard negatives. (Xu teaches using the Faiss library to solve the false-negative problem by removing the top-1 closest negative sample and obtaining a mixed global negative sample from the remaining top-K hard negative samples (i.e., updating the batch based on the predetermined number). Xu at section III. Method, C. Dual Negative Sample Selection Module.)
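The top-K hard-negative selection relied upon in the mapping of claim 6 can be illustrated with a brute-force nearest-neighbor sketch standing in for Xu's Faiss-based retrieval. This is illustrative Python only, not code from Xu; the pool size, dimensionality, and K are arbitrary placeholders.

```python
import numpy as np

def select_hard_negatives(anchor, pool, k=3):
    """Rank pool embeddings by cosine similarity to the anchor, drop the single
    closest one (the likely false negative), and keep the next k as hard negatives."""
    a = anchor / np.linalg.norm(anchor)
    p = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    order = np.argsort(-(p @ a))   # indices sorted most- to least-similar
    return order[1:k + 1]          # skip top-1, return the next k

rng = np.random.default_rng(2)
anchor = rng.normal(size=8)
pool = rng.normal(size=(10, 8))
pool[0] = anchor + 0.001 * rng.normal(size=8)  # near-duplicate: a likely false negative
hard_idx = select_hard_negatives(anchor, pool, k=3)
```

The near-duplicate at index 0 ranks first and is discarded, and the next-closest samples are returned, matching the select-then-remove-top-1 behavior attributed to Xu.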
Regarding claim 7, Vu-Xu teaches all the limitations of claim 6 as laid out above. Further, Xu teaches the method of claim 6, wherein a pre-trained second neural network model is used to generate the plurality of hard negatives for the first task. (Xu teaches using RoBERTa to encode sentences then retrieving the top-K closest negative samples from the resulting encodings (i.e., RoBERTa is used as a secondary model specifically for the generation of the hard-negative samples.) Xu at section III. Method, C. Dual Negative Sample Selection Module.)
Regarding claim 8, Vu teaches a system for providing a trained neural network, the system comprising: a memory that stores a neural network model and a plurality of processor-executable instructions; (Vu teaches the system implemented on computer readable memory connected to processors. Vu at ¶ [0158].)
a communication interface that receives a first plurality of data samples; (Vu teaches the system implemented with communication mechanisms. Vu at ¶ [0158]. Further, Vu teaches accessing, obtaining, or generating a plurality of datasets that differ from each other in one or more differentiation attributes (i.e., a plurality of data samples). Vu at ¶¶ [0159] - [0160].)
and one or more hardware processors that read and execute the plurality of processor-executable instructions from the memory to perform operations comprising: (Vu teaches multiple processors that read and execute the instructions implementing the system. Vu at ¶ [0158].)
generating a plurality of batches using the first plurality of data samples, wherein a first batch includes data samples associated with a single first task, and wherein a second batch includes data samples associated with a single second task; (Vu teaches generating training batches used for ABSA and SLSA training tasks (i.e., a first batch for a first task and a second batch for a second task.) Vu at ¶¶ [0160] - [0167].)
and performing a first training process to the neural network model using the plurality of batches, wherein the performing the first training process includes: (Vu teaches training a multi-lingual multi-task model wherein at least a portion of the machine learning model is a neural network. Vu at ¶¶ [0140] - [0143] and [0153] - [0156]. Further, Vu teaches training the neural network using the ABSA and SLSA training batches. Vu at ¶¶ [0160] - [0167].)
generating a first loss objective function for the first batch based on the first task; generating a second loss objective function for the second batch based on the second task; (Vu teaches, using the training batches, minimizing a cross-entropy loss function and updating model parameters based on the loss function. Vu at ¶¶ [0375] - [0380]. The training batches are associated with specific tasks, and because training is performed for each task, wherein a loss is minimized based upon the training batches used for that task, then this constitutes generating a first loss objective function for the first batch based on the first task. Further, the training batches apply to multiple tasks (ABSA and SLSA) therefore a new loss is used for each task (i.e., a second task, a second loss function, and a second batch based on the second task). Vu at ¶¶ [0160] - [0167] and [0375] - [0380].)
computing a first loss based on the first loss objective function; computing a second loss based on the second loss objective function; (Vu teaches minimizing a loss based on each task at hand as a process of training for the task. Vu at ¶¶ [0375] - [0380].)
and updating parameters of the neural network model based on the first loss and the second loss via backpropagation; (Vu teaches updating parameters of a machine learning model based on the loss functions used for training. Vu at ¶¶ [0375] - [0380]. Further, Vu teaches using backpropagation to train a machine learning model (i.e., the model trained by updating parameters). Vu at ¶¶ [0140] - [0141].)
Vu, however, does not alone teach "and wherein the neural network model trained by the first training process is used to perform a text retrieval task based on text embedding."
In a similar field of endeavor (e.g., training neural networks for performing specific tasks including text retrieval/question answering wherein embeddings are used), Xu teaches and wherein the neural network model trained by the first training process is used to perform a text retrieval task based on text embedding. (Xu teaches sentence (i.e., text) embeddings being used for question answering (i.e., text retrieval) wherein training is performed to achieve such a task. Xu at sections I. Introduction, II. Overview, and III. Method, subsections A - D.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Vu with the teachings of Xu to provide the limitations of claim 8. Doing so would have improved the model of Vu over traditional models by using hard positive and hard negative sampling to enhance model performance, as recognized by Xu at section IV. Experiment, B. Performance Comparison, and C. Ablation Study.
Regarding claim 9, Vu-Xu teaches all the limitations of claim 8 as laid out above. Further, Xu teaches the system of claim 8, wherein prior to the first training process, the neural network model is trained using a second training process using a second plurality of data samples. (Xu teaches the sentence encoder is a pre-trained model (i.e., trained using a second training process with different data, prior to the training performed by Xu). Xu at Section II. Overview, A. Problem Formulation.)
Regarding claim 10, Vu-Xu teaches all the limitations of claim 8 as laid out above. Further, Xu teaches the system of claim 8, wherein the neural network model includes a pre-trained generative large language model (LLM). (Xu teaches the sentence encoder is a pre-trained model (i.e., RoBERTa, which is a commonly known pre-trained generative transformer model and large language model.) Xu at Section II. Overview, A. Problem Formulation.)
Regarding claim 11, Vu-Xu teaches all the limitations of claim 8 as laid out above. Further, Xu teaches the system of claim 8, wherein the text retrieval task is different from the first task and the second task. (Xu teaches that the sentence encoder, and specifically learning sentence representations themselves, is used in downstream applications (i.e., tasks that are not part of the training process) that include question answering (i.e., text retrieval). Xu at Section I. Introduction. Therefore, the text-retrieval task is a different task than the training tasks as it is a downstream task, not a training task.)
Regarding claim 12, Vu-Xu teaches all the limitations of claim 8 as laid out above. Further, Xu teaches the system of claim 8, wherein the first loss objective function includes a first contrastive loss customized to the first task, and wherein the second loss objective function includes a second contrastive loss customized to the second task. (Xu teaches minimizing the normalized temperature-scaled cross-entropy loss NT-Xent, which is recognized as the "core of contrastive learning". Xu at section III. Method, A. Backbone. Further, Xu teaches additionally using a contrastive quadruplet loss function to train the sentence encoder (i.e., a second loss separate from the first cross-entropy-based loss). Xu at section III. Method, sections A, B, and D.)
Regarding claim 13, Vu-Xu teaches all the limitations of claim 8 as laid out above. Further, Xu teaches the system of claim 8, wherein the performing the first training process includes: generating a plurality of hard negatives for the first task; (Xu teaches retrieving the closest samples to anchor samples as hard-negative samples (i.e., generating a plurality of hard negatives.) Xu at section III. Method, C. Dual Negative Sample Selection Module.)
selecting a predetermined number of hard negatives from the plurality of hard negatives for the first task; (Xu teaches selecting the top-K closest samples as the hard-negatives, then removing the top-1 sample to avoid the false-negative problem. Xu at section III. Method, C. Dual Negative Sample Selection Module.)
and updating the first batch using the selected predetermined number of hard negatives. (Xu teaches using the Faiss library to solve the false-negative problem by removing the top-1 closest negative sample and obtaining a mixed global negative sample from the remaining top-K hard negative samples (i.e., updating the batch based on the predetermined number). Xu at section III. Method, C. Dual Negative Sample Selection Module.)
Regarding claim 14, Vu-Xu teaches all the limitations of claim 13 as laid out above. Further, Xu teaches the system of claim 13, wherein a pre-trained second neural network model is used to generate the plurality of hard negatives for the first task. (Xu teaches using RoBERTa to encode sentences then retrieving the top-K closest negative samples from the resulting encodings (i.e., RoBERTa is used as a secondary model specifically for the generation of the hard-negative samples.) Xu at section III. Method, C. Dual Negative Sample Selection Module.)
Regarding claim 15, Vu teaches a non-transitory machine-readable medium comprising a plurality of machine-executable instructions which, when executed by one or more processors, are adapted to cause the one or more processors to perform operations comprising: (Vu teaches the system implemented on computer readable memory connected to processors. Vu at ¶ [0158].)
receiving, via a data interface, a first plurality of data samples; (Vu teaches accessing, obtaining, or generating a plurality of datasets that differ from each other in one or more differentiation attributes. (i.e., a plurality of data samples). Vu at ¶¶ [0159] - [0160].)
generating a plurality of batches using the first plurality of data samples, wherein a first batch includes data samples associated with a single first task, and wherein a second batch includes data samples associated with a single second task; (Vu teaches generating training batches used for ABSA and SLSA training tasks (i.e., a first batch for a first task and a second batch for a second task.) Vu at ¶¶ [0160] - [0167].)
and performing a first training process to the neural network model using the plurality of batches, wherein the performing the first training process includes: (Vu teaches training a multi-lingual multi-task model wherein at least a portion of the machine learning model is a neural network. Vu at ¶¶ [0140] - [0143] and [0153] - [0156]. Further, Vu teaches training the neural network using the ABSA and SLSA training batches. Vu at ¶¶ [0160] - [0167].)
generating a first loss objective function for the first batch based on the first task; generating a second loss objective function for the second batch based on the second task; (Vu teaches, using the training batches, minimizing a cross-entropy loss function and updating model parameters based on the loss function. Vu at ¶¶ [0375] - [0380]. The training batches are associated with specific tasks, and because training is performed for each task, wherein a loss is minimized based upon the training batches used for that task, then this constitutes generating a first loss objective function for the first batch based on the first task. Further, the training batches apply to multiple tasks (ABSA and SLSA) therefore a new loss is used for each task (i.e., a second task, a second loss function, and a second batch based on the second task). Vu at ¶¶ [0160] - [0167] and [0375] - [0380].)
computing a first loss based on the first loss objective function; computing a second loss based on the second loss objective function; (Vu teaches minimizing a loss based on each task at hand as a process of training for the task. Vu at ¶¶ [0375] - [0380].)
and updating parameters of the neural network model based on the first loss and the second loss via backpropagation; (Vu teaches updating parameters of a machine learning model based on the loss functions used for training. Vu at ¶¶ [0375] - [0380]. Further, Vu teaches using backpropagation to train a machine learning model (i.e., the model trained by updating parameters). Vu at ¶¶ [0140] - [0141].)
Vu, however, does not alone teach "and wherein the neural network model trained by the first training process is used to perform a text retrieval task based on text embedding."
In a similar field of endeavor (e.g., training neural networks for performing specific tasks including text retrieval/question answering wherein embeddings are used), Xu teaches and wherein the neural network model trained by the first training process is used to perform a text retrieval task based on text embedding. (Xu teaches sentence (i.e., text) embeddings being used for question answering (i.e., text retrieval) wherein training is performed to achieve such a task. Xu at sections I. Introduction, II. Overview, and III. Method, subsections A - D.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Vu with the teachings of Xu to provide the limitations of claim 15. Doing so would have improved the model of Vu over traditional models by using hard positive and hard negative sampling to enhance model performance, as recognized by Xu at section IV. Experiment, B. Performance Comparison, and C. Ablation Study.
Regarding claim 16, Vu-Xu teaches all the limitations of claim 15 as laid out above. Further, Xu teaches the non-transitory machine-readable medium of claim 15, wherein prior to the first training process, the neural network model is trained using a second training process using a second plurality of data samples. (Xu teaches the sentence encoder is a pre-trained model (i.e., trained using a second training process with different data, prior to the training performed by Xu). Xu at Section II. Overview, A. Problem Formulation.)
Regarding claim 17, Vu-Xu teaches all the limitations of claim 15 as laid out above. Further, Xu teaches the non-transitory machine-readable medium of claim 15, wherein the neural network model includes a pre-trained generative large language model (LLM). (Xu teaches the sentence encoder is a pre-trained model (i.e., RoBERTa, which is a commonly known pre-trained generative transformer model and large language model.) Xu at Section II. Overview, A. Problem Formulation.)
Regarding claim 18, Vu-Xu teaches all the limitations of claim 15 as laid out above. Further, Xu teaches the non-transitory machine-readable medium of claim 15, wherein the text retrieval task is different from the first task and the second task. (Xu teaches that the sentence encoder, and specifically learning sentence representations themselves, is used in downstream applications (i.e., tasks that are not part of the training process) that include question answering (i.e., text retrieval). Xu at Section I. Introduction. Therefore, the text-retrieval task is a different task than the training tasks as it is a downstream task, not a training task.)
Regarding claim 19, Vu-Xu teaches all the limitations of claim 15 as laid out above. Further, Xu teaches the non-transitory machine-readable medium of claim 15, wherein the first loss objective function includes a first contrastive loss customized to the first task, and wherein the second loss objective function includes a second contrastive loss customized to the second task. (Xu teaches minimizing the normalized temperature-scaled cross-entropy loss NT-Xent, which is recognized as the "core of contrastive learning". Xu at section III. Method, A. Backbone. Further, Xu teaches additionally using a contrastive quadruplet loss function to train the sentence encoder (i.e., a second loss separate from the first cross-entropy-based loss). Xu at section III. Method, sections A, B, and D.)
Regarding claim 20, Vu-Xu teaches all the limitations of claim 15 as laid out above. Further, Xu teaches the non-transitory machine-readable medium of claim 15, wherein the performing the first training process includes: generating a plurality of hard negatives for the first task; (Xu teaches retrieving the closest samples to anchor samples as hard-negative samples (i.e., generating a plurality of hard negatives.) Xu at section III. Method, C. Dual Negative Sample Selection Module.)
selecting a predetermined number of hard negatives from the plurality of hard negatives for the first task; (Xu teaches selecting the top-K closest samples as the hard-negatives, then removing the top-1 sample to avoid the false-negative problem. Xu at section III. Method, C. Dual Negative Sample Selection Module.)
and updating the first batch using the selected predetermined number of hard negatives. (Xu teaches using the Faiss library to solve the false-negative problem by removing the top-1 closest negative sample and obtaining a mixed global negative sample from the remaining top-K hard negative samples (i.e., updating the batch based on the predetermined number). Xu at section III. Method, C. Dual Negative Sample Selection Module.)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAMERON KENNETH YOUNG whose telephone number is (703)756-1527. The examiner can normally be reached Mon - Fri, 9:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CAMERON KENNETH YOUNG/Examiner, Art Unit 2655
/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655