DETAILED ACTION
Acknowledgement
This non-final Office action is in response to the claims filed on 12/17/2024.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/29/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claims 3 and 14 are objected to because of the following informalities:
Claims 3 and 14 recite “…processing a second training datasets using the base model to obtain a third output results, wherein the second training sample comprises data of the target business type; processing the second training sample…”. “A second training datasets” should recite “a second training sample” for proper antecedent basis, consistent with claim 9.
Appropriate correction is required.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Fukuda et al. (US 2022/0414448 A1).
As per claims 1, 7, and 12, Fukuda teaches a method (Fukuda e.g. A method for training a neural network includes training language-specific teacher models using different respective source language datasets [0003]. FIG. 2 is a block/flow diagram showing a method of training a target model [0008]. FIG. 6 is a block/flow diagram of a method for training, deploying, and using a language model, in accordance with an embodiment of the present invention [0012].); an electronic device, comprising: at least one processor; and a memory communicatively connected with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform a method; and a non-transitory computer readable storage medium with computer instructions stored thereon, wherein the computer instructions are used for causing a method for generating a target business model based on large model, comprising (Fukuda e.g. A system for training a neural network includes a hardware processor and a memory that stores a computer program product. When executed by the hardware processor, the computer program product causes the hardware processor to…[0004]. Cross-lingual knowledge transfer learning may be performed for languages with few training resources available, making use of large amounts of multilingual resources to train a language independent part of the model that may be transferred to a language-specific model using a teacher/student learning framework [0016]. FIG. 5 is an exemplary system for training and using a language model. A model training system 500 performs training for a target language model 502, for example as described above, using a set of training data 504. The trained target language model 502 may then be transmitted to a deployed edge device 510 [0032]. The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention [0037].):
Fukuda teaches performing knowledge distillation on at least two pre-trained large models to obtain a base model of a target scenario, each pre-trained large model corresponding to one of at least two business types included in the target scenario; (Fukuda e.g. During training of a language-independent part, a multilingual student model is trained with multiple languages, each of which may have large amounts of training data, using a set of single-language teacher models [0017]. Language-specific teacher models 102 are trained using respective language datasets 104, which may include multiple different respective languages. It is specifically contemplated that these languages may include languages that have a large amount of available training data, such as English, Japanese, and German, but any appropriate languages may be used. The source language data 104 may include, for example, speech data that is labeled according to particular phonemes. The source language data 104 may also include multilingual training datasets, for example including speech from multiple different languages in a single combined dataset (Fig. 1 and [0020]). The teacher models 102 may include an artificial neural network (ANN) that has multiple distinct sections. Each of the teacher models 102 may be separately trained with its respective language data 104. In some examples, the teacher models 102 may be trained to generate phones, senones, or tied quinphone states from speech data inputs [0021]. Knowledge distillation may thus be used to enhance phoneme classification abilities for each language in the student model. In such a framework, instead of training models in a single step, training may be split into two steps, including training a complex teacher neural network, followed by training a relatively simple student network using soft outputs generated by the teacher model [0031]. The Examiner submits that the different types of languages could represent and/or be analogous to different business types with no change in function of the models.)
Fukuda teaches performing knowledge distillation on the base model to obtain a target business model of a target business type among the at least two business types, the target business model being used for processing data of the target business type. (Fukuda e.g. FIG. 1 is a block diagram of relationships between teacher models, a multi-language student model, and a target model for training the target model using language independent layers generated in the training of the multi-language student model, in accordance with an embodiment of the present invention [0007]. FIG. 2 is a block/flow diagram showing a method of training a target model using language-independent layers generated in the training of a multi-language student model, in accordance with an embodiment of the present invention [0008]. Weights of language-independent layers of the student model are copied to a language-independent layers of a target model to initialize language-independent layers of the target model. The target model is trained with a target language dataset [0003]. Cross-lingual knowledge transfer learning may be performed for languages with few training resources available, making use of large amounts of multilingual resources to train a language independent part of the model that may be transferred to a language-specific model using a teacher/student learning framework [0016]. The learned parameters of the language-independent layers within the student model may then be used to initialize a corresponding part of a language-specific target model. During training of the language-specific target model, only a target language is used to train the entire target model, including the initialized part [0018]. Knowledge distillation may thus be used to enhance phoneme classification abilities for each language in the student model. In such a framework, instead of training models in a single step, training may be split into two steps, including training a complex teacher neural network, followed by training a relatively simple student network using soft outputs generated by the teacher model [0031].)
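For illustration only, the following is a minimal PyTorch sketch of the general two-stage teacher/student distillation arrangement mapped above (multiple pre-trained models distilled into a base model, then the base model distilled into a target business model). All module names, sizes, and data are hypothetical placeholders; the KL-divergence criterion is simply one common way of "minimizing the differences between the student and teacher distributions" and is not drawn from the claims or from Fukuda's code.

```python
# Illustrative sketch only: two-stage knowledge distillation with hypothetical
# module and variable names.  Stage 1 distills two pre-trained "teacher" models
# (one per business type) into a shared base (student) model; stage 2 distills
# the base model into a smaller target business model.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_step(student, teachers, batch, optimizer, temperature=2.0):
    """One distillation update: match the student's softened outputs to the
    averaged softened outputs of one or more teacher models (soft targets)."""
    with torch.no_grad():
        teacher_probs = torch.stack(
            [F.softmax(t(batch) / temperature, dim=-1) for t in teachers]
        ).mean(dim=0)
    student_log_probs = F.log_softmax(student(batch) / temperature, dim=-1)
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Stage 1: distill the two pre-trained large models into one base model using
# a batch containing data of both business types.
teacher_a = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
teacher_b = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
base_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
opt_base = torch.optim.Adam(base_model.parameters(), lr=1e-3)
mixed_batch = torch.randn(8, 16)     # stand-in for data of both business types
distill_step(base_model, [teacher_a, teacher_b], mixed_batch, opt_base)

# Stage 2: distill the base model into a target business model using data of
# the target business type only.
target_model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 10))
opt_target = torch.optim.Adam(target_model.parameters(), lr=1e-3)
target_batch = torch.randn(8, 16)    # stand-in for target-business-type data
distill_step(target_model, [base_model], target_batch, opt_target)
```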
As per claims 2, 8, and 13, Fukuda teaches the method according to claim 1, the electronic device according to claim 7, and the non-transitory computer readable storage medium according to claim 12, wherein performing knowledge distillation on at least two pre-trained large models to obtain a base model of a target scenario comprises:
Fukuda teaches processing a first training sample using each pre-trained large model to obtain a first output result, wherein the first training sample comprises: data from the at least two business types; (Fukuda e.g. Language-specific teacher models 102 are trained using respective language datasets 104, which may include multiple different respective languages. It is specifically contemplated that these languages may include languages that have a large amount of available training data, such as English, Japanese, and German, but any appropriate languages may be used. The source language data 104 may include, for example, speech data that is labeled according to particular phonemes. The source language data 104 may also include multilingual training datasets, for example including speech from multiple different languages in a single combined dataset (Fig. 1 and [0020]). The teacher models 102 may include an artificial neural network (ANN) that has multiple distinct sections. Each of the teacher models 102 may be separately trained with its respective language data 104. In some examples, the teacher models 102 may be trained to generate phones, senones, or tied quinphone states from speech data inputs [0021]. Knowledge distillation may thus be used to enhance phoneme classification abilities for each language in the student model. In such a framework, instead of training models in a single step, training may be split into two steps, including training a complex teacher neural network, followed by training a relatively simple student network using soft outputs generated by the teacher model [0031].)
Fukuda teaches processing the first training sample using the base model to obtain at least two second output results, wherein each second output result corresponds to a business type; (Fukuda e.g. During training of a language-independent part, a multilingual student model is trained with multiple languages, each of which may have large amounts of training data, using a set of single-language teacher models. The student model may include some layers which are language-dependent and some layers that are language-independent. Training of the multilingual student model may include shuffling of the language inputs and shuffling of trained language-dependent layer parameters to diminish the amount of language dependence within the language-independent layers [0017]. The outputs of the teacher models 102 may be used to train a student model in block 110, which may also be implemented as an ANN. The student model may include a set of language-independent layers 116 and language-dependent layers 118, where the language-dependent layers 118 may correspond to the respective different language data sets 104 (Fig. 1 and [0022]). During training, the source language data 104 is used to train the model, including both the language independent layers 116 and the language dependent layers 118 in conjunction. The outputs of the teacher models 102 may be used to provide a baseline for the training,… [0022]. Training may include shuffling of the language input data, as will be described in greater detail below. Such shuffling may include, for example, cross-training the language-dependent layers 118 with language training datasets 104 other than their native language [0023]. Knowledge distillation may thus be used to enhance phoneme classification abilities for each language in the student model. In such a framework, instead of training models in a single step, training may be split into two steps, including training a complex teacher neural network, followed by training a relatively simple student network using soft outputs generated by the teacher model [0031]. In the first step, a relatively complex model based on BLSTM layers, convolutional neural networks (CNNs), and/or residual neural networks is initially trained using hard targets. The student network is then trained using the soft outputs of the teachers, using a training criterion that minimizes the differences between the student and teacher distributions [0031].)
Fukuda teaches constructing a first loss function based on the first output results and the second output results; adjusting model parameters of the base model based on the first loss function. (Fukuda e.g. The outputs of the teacher models 102 may be used to provide a baseline for the training, with differences of the generated output of the student model from the expected outputs of the teacher models 102 being used to generate an error signal that provides updates to the weights of the student model (Fig. 1 and [0022]). Thus, training the student model 110 seeks to minimize this loss function by adjusting the parameters of the language-independent layers 116 and the language-dependent layers 118 [0031].)
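For illustration only, a minimal sketch of a first loss of the kind recited, built from the pre-trained models' outputs (the "first output results") and the base model's per-business-type outputs (the "second output results"). The function and tensor names are hypothetical, and the temperature-scaled KL-divergence form is merely one conventional distillation criterion consistent with Fukuda's stated goal of minimizing the differences between student and teacher distributions [0031].

```python
# Illustrative sketch only (hypothetical names): a first loss over per-business-type
# teacher/base-model output pairs, followed by backpropagation.
import torch
import torch.nn.functional as F

def first_loss(first_outputs, second_outputs, temperature=2.0):
    """Each list holds one logits tensor per business type."""
    total = 0.0
    for teacher_logits, base_logits in zip(first_outputs, second_outputs):
        teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
        base_log_probs = F.log_softmax(base_logits / temperature, dim=-1)
        total = total + F.kl_div(base_log_probs, teacher_probs, reduction="batchmean")
    return total / len(first_outputs)

# Example with two business types, a batch of 4 samples, and 10 output classes.
firsts = [torch.randn(4, 10), torch.randn(4, 10)]
seconds = [torch.randn(4, 10, requires_grad=True), torch.randn(4, 10, requires_grad=True)]
loss = first_loss(firsts, seconds)
loss.backward()  # in practice, gradients would flow to the base model's parameters
```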
As per claims 3, 9, and 14, Fukuda teaches the method according to claim 1, the electronic device according to claim 7, and the non-transitory computer readable storage medium according to claim 12, wherein performing knowledge distillation on the base model to obtain a target business model of a target business type among the at least two business types comprises:
Fukuda teaches processing a second training datasets using the base model to obtain a third output results, wherein the second training sample comprises data of the target business type; (Fukuda e.g. Once student model training 110 is complete, target model training 120 may be performed, using language data specific to the target language. Any language dataset may be used for the target language data 122...[0024]. As with the student model, the target model may include language independent layers 116 and target-language—specific layers 124, each of which may be made up of a respective series of fully connected layers [0024]. Target model training 120 imports the weight parameters of the language independent layers 116 that were generated during the student model training 110. These weight parameters are used to initialize the language independent layers 116 of the target model before training on the target language data 122 [0025].)
Fukuda teaches processing the second training sample using the target business model to obtain a fourth output result; (Fukuda e.g. Training then updates both the language-independent layers 116 and target-language—specific layers 124 to train the target model on the target language [0025]. During training of the target model 120, the language-independent layers 116 of the target model may be initialized 222 using the values learned during training of the student model 116. The resulting target model, including language-independent initialization, may provide superior results to one that is trained from scratch, for example using zero-initialized or randomly initialized values for these layers [0030].)
Fukuda teaches constructing a second loss function based on the third output result and the fourth output result; adjusting model parameters of the target business model based on the second loss function (Fukuda e.g. The teacher models, student model, and target language model may be implemented as ANNs [0048]. Referring now to FIG. 7, a generalized diagram of a neural network is shown [0049]. This represents a “feed-forward” computation, where information propagates from input neurons 702 to the output neurons 706 [0051]. Upon completion of a feed-forward computation, the output is compared to a desired output available from training data. The error relative to the training data is then processed in “backpropagation” computation, where the hidden neurons 704 and input neurons 702 receive information regarding the error propagating backward from the output neurons 706. Once the backward error propagation has been completed, weight updates are performed, with the weighted connections 708 being updated to account for the received error [0051]. To train an ANN, training data can be divided into a training set and a testing set. The training data includes pairs of an input and a known output. During training, the inputs of the training set are fed into the ANN using feed-forward propagation. After each input, the output of the ANN is compared to the respective known output. Discrepancies between the output of the ANN and the known output that is associated with that particular input are used to generate an error value, which may be backpropagated through the ANN, after which the weight values of the ANN may be updated. This process continues until the pairs in the training set are exhausted [0052].)
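For illustration only, a minimal sketch of a second loss comparing the base model's output on the second training sample (the "third output result") with the target business model's output (the "fourth output result"), followed by a backpropagation-style update of the target business model's parameters, in line with the feed-forward/backpropagation scheme Fukuda describes at [0051]-[0052]. All names, shapes, and the specific loss form are hypothetical.

```python
# Illustrative sketch only (hypothetical names): second distillation stage.
import torch
import torch.nn as nn
import torch.nn.functional as F

base_model = nn.Linear(16, 10)       # stands in for the (frozen) base model
target_model = nn.Linear(16, 10)     # stands in for the target business model
optimizer = torch.optim.SGD(target_model.parameters(), lr=0.01)

second_sample = torch.randn(8, 16)   # stand-in for data of the target business type
with torch.no_grad():
    third_output = base_model(second_sample)      # base model's output
fourth_output = target_model(second_sample)       # target business model's output

second_loss = F.kl_div(                            # loss built from both outputs
    F.log_softmax(fourth_output, dim=-1),
    F.softmax(third_output, dim=-1),
    reduction="batchmean",
)
optimizer.zero_grad()
second_loss.backward()               # backpropagate the error
optimizer.step()                     # adjust the target business model's parameters
```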
As per claims 4, 10, and 15, Fukuda teaches the method according to claim 1, the electronic device according to claim 7, and the non-transitory computer readable storage medium according to claim 12, further comprising:
Fukuda teaches in response to determining that at least some of the pre-trained large models have been updated, performing knowledge distillation on the updated pre-trained large models to obtain an updated base model; performing knowledge distillation on the updated base model to obtain an updated target business model. (Fukuda e.g. A system for training a neural network includes a hardware processor and a memory that stores a computer program product. When executed by the hardware processor, the computer program product causes the hardware processor to train language-specific teacher models using different respective source language datasets, train a student model, using the different respective source language datasets and soft labels generated by the language-specific teacher models, including shuffling the source language datasets and shuffling weights of language-dependent layers in language-specific parts of the student model, copy weights of language-independent layers of the student model to a language-independent layers of a target model to initialize language-independent layers of the target model, and train the target model using a target language dataset [0004]. Cross-lingual knowledge transfer learning may be performed for languages with few training resources available, making use of large amounts of multilingual resources to train a language independent part of the model that may be transferred to a language-specific model using a teacher/student learning framework [0016]. During training of a language-independent part, a multilingual student model is trained with multiple languages, each of which may have large amounts of training data, using a set of single-language teacher models. The student model may include some layers which are language-dependent and some layers that are language-independent. Training of the multilingual student model may include shuffling of the language inputs and shuffling of trained language-dependent layer parameters to diminish the amount of language dependence within the language-independent layers [0017]. The learned parameters of the language-independent layers within the student model may then be used to initialize a corresponding part of a language-specific target model. During training of the language-specific target model, only a target language is used to train the entire target model, including the initialized part [0018]. In this way, parameters of a language-independent part may be trained with a large amount of source language data. When a target language with relatively few training resources is used, this cross-lingual transfer learning provides a significant improvement, and diminishes the need for target-language—specific data collection. For example, the language-independent part may be used to train the phoneme discrimination abilities of a speech recognition network, which may be relatively consistent across different languages [0019]. Referring now to FIG. 1, an exemplary relationship between training teacher models 100, training a student model 110, and training a target model 120 is shown [0020]. FIG. 2 is a block/flow diagram showing a method of training a target model using language-independent layers generated in the training of a multi-language student model, in accordance with an embodiment of the present invention [0008]. The teacher models, student model, and target language model may be implemented as ANNs (Fig. 7 and [0048]). 
Knowledge distillation may thus be used to enhance phoneme classification abilities for each language in the student model. In such a framework, instead of training models in a single step, training may be split into two steps, including training a complex teacher neural network, followed by training a relatively simple student network using soft outputs generated by the teacher model [0031]. In the first step, a relatively complex model based on BLSTM layers, convolutional neural networks (CNNs), and/or residual neural networks is initially trained using hard targets. The student network is then trained using the soft outputs of the teachers, using a training criterion that minimizes the differences between the student and teacher distributions [0031]. The Examiner submits that teacher, student, and target models include shared/interconnected layers. Thus an update in one model (e.g. teacher) propagates an update to the other models (e.g. student and target).)
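For illustration only, a minimal sketch of re-running both distillation stages when one of the pre-trained large models has been updated, reusing the hypothetical distill_step helper from the sketch following the claims 1, 7, and 12 analysis. The update check is an assumed placeholder flag; nothing here is taken from the claims or from Fukuda's code.

```python
# Illustrative sketch only (hypothetical names): when any teacher has been updated,
# refresh the base model by re-distilling from the teachers, then refresh the
# target business model by re-distilling from the updated base model.
def refresh_models(teachers, base_model, target_model, mixed_batches, target_batches,
                   opt_base, opt_target, teachers_updated):
    if not teachers_updated:             # assumed external update check
        return base_model, target_model
    for batch in mixed_batches:          # stage 1: updated teachers -> base model
        distill_step(base_model, teachers, batch, opt_base)
    for batch in target_batches:         # stage 2: updated base -> target model
        distill_step(target_model, [base_model], batch, opt_target)
    return base_model, target_model
```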
As per claims 5, 11, and 16, Fukuda teaches the method according to claim 1, the electronic device according to claim 7, and the non-transitory computer readable storage medium according to claim 12, further comprising:
Fukuda teaches processing data of the target business type using the target business model to obtain a data processing result; (Fukuda e.g. FIG. 5 is an exemplary system for training and using a language model [0032]. The trained target language model 502 may be transmitted from the model training system 500 to the deployed edge device 510 by any appropriate communications medium [0033]. After receiving the target language model 502, the deployed edge device may use a sensor 512 to perform a language processing function. For example, such a sensor 512 may include a microphone or other device that is capable of recording speech information. The recorded speech information may then be used as an input to the target language model 502, for example to perform speech recognition or any other appropriate task [0034]. FIG. 6 is a method of training, deploying, and using a language model [0036].)
Fukuda teaches in response to determining that the data processing result meets a preset evolution condition, evolving the target business model to obtain an evolved target business model. (Fukuda e.g. In some cases, information that is recorded by the sensor 512 may be transmitted back to the model training system 500. Such returned information may be annotated and used to update the training of the target language model 502 [0034]. The teacher models, student model, and target language model may be implemented as ANNs [0048]. Referring now to FIG. 7, a generalized diagram of a neural network is shown [0049]. This represents a “feed-forward” computation, where information propagates from input neurons 702 to the output neurons 706 [0051]. Upon completion of a feed-forward computation, the output is compared to a desired output available from training data. The error relative to the training data is then processed in “backpropagation” computation, where the hidden neurons 704 and input neurons 702 receive information regarding the error propagating backward from the output neurons 706. Once the backward error propagation has been completed, weight updates are performed, with the weighted connections 708 being updated to account for the received error [0051]. After the training has been completed, the ANN may be tested against the testing set, to ensure that the training has not resulted in overfitting. If the ANN can generalize to new inputs, beyond those which it was already trained on, then it is ready for use. If the ANN does not accurately reproduce the known outputs of the testing set, then additional training data may be needed, or hyperparameters of the ANN may need to be adjusted [0053].)
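For illustration only, a minimal sketch of an "evolution" check of the general kind recited: the target business model's processing result is evaluated against a preset condition (assumed here to be evaluation accuracy falling below a threshold), and only when that condition is met is the model further trained ("evolved") on newly collected, annotated data, analogous to Fukuda's update of the target model with annotated sensor data [0034]. All names, thresholds, and the accuracy criterion are hypothetical.

```python
# Illustrative sketch only (hypothetical names/threshold): evolve the target
# business model when a preset condition on its processing result is met.
import torch
import torch.nn.functional as F

def maybe_evolve(target_model, eval_inputs, eval_labels, new_inputs, new_labels,
                 accuracy_threshold=0.9, steps=10, lr=1e-3):
    with torch.no_grad():
        predictions = target_model(eval_inputs).argmax(dim=-1)
        accuracy = (predictions == eval_labels).float().mean().item()
    if accuracy >= accuracy_threshold:   # preset evolution condition not met
        return target_model
    optimizer = torch.optim.Adam(target_model.parameters(), lr=lr)
    for _ in range(steps):               # "evolve": continue training on new data
        loss = F.cross_entropy(target_model(new_inputs), new_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return target_model
```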
As per claim 6, Fukuda teaches a data processing method, comprising (Fukuda e.g. FIG. 5 is an exemplary system for training and using a language model [0032]. FIG. 6 is a method of training, deploying, and using a language model [0036].):
Fukuda teaches obtaining target data of a target business type; processing the target data using a target business model corresponding to the target business type to obtain a processing result; wherein the target business model is generated using the method according to claim 1. (Fukuda e.g. FIG. 5 is an exemplary system for training and using a language model [0032]. The trained target language model 502 may be transmitted from the model training system 500 to the deployed edge device 510 by any appropriate communications medium [0033]. After receiving the target language model 502, the deployed edge device may use a sensor 512 to perform a language processing function. For example, such a sensor 512 may include a microphone or other device that is capable of recording speech information. The recorded speech information may then be used as an input to the target language model 502, for example to perform speech recognition or any other appropriate task [0034]. FIG. 6 is a method of training, deploying, and using a language model. Block 600 trains the target language model 502, as described above. Block 610 then deploys the trained target language model 502 to the deployed edge device 510. Block 620 performs the language processing task, using the trained target language model 502 and data from the sensor 512 [0036].)
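For illustration only, a minimal sketch of the data processing step mapped above: target data of the target business type is obtained and processed with the corresponding target business model to produce a processing result. The model, data, and shapes are hypothetical placeholders standing in for a deployed model such as Fukuda's target language model on an edge device.

```python
# Illustrative sketch only (hypothetical names): inference with a deployed target
# business model on target data of the target business type.
import torch
import torch.nn as nn

target_model = nn.Linear(16, 10)   # stands in for the generated target business model
target_model.eval()
target_data = torch.randn(1, 16)   # stand-in for, e.g., recorded speech features
with torch.no_grad():
    processing_result = target_model(target_data).softmax(dim=-1)
print(processing_result.argmax(dim=-1).item())  # e.g. predicted class index
```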
Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure: FOR: Chen, Yan (CN-111198940-B), “FAQ Method, Question-Answer Search System, Electronic Device, And Storage Medium”; and NPL: S. Orihashi et al., "Hierarchical Knowledge Distillation for Dialogue Sequence Labeling," 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Cartagena, Colombia, 2021, pp. 433-440.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ayanna Minor whose telephone number is (571) 272-3605. The examiner can normally be reached M-F, 9 am-5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jerry O'Connor can be reached at 571-272-6787. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/A.M./Examiner, Art Unit 3624
/Jerry O'Connor/Supervisory Patent Examiner, Group Art Unit 3624