DETAILED ACTION
Remarks
The instant application, having Application Number 18/218,405 and filed on July 5, 2023, has a total of 15 claims pending in the application; there are 4 independent claims and 11 dependent claims, all of which are presented for examination by the examiner.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Examiner Notes
The examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.
The examiner requests that, in response to this Office action, support be shown for language added to any original claims on amendment and for any new claims. That is, indicate support for newly added claim language by specifically pointing to the page(s) and line number(s) in the specification and/or the drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this Office action, the applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Information Disclosure Statement
As required by MPEP § 609(C), the applicant’s submissions of the Information Disclosure Statements dated July 5, 2023 and July 8, 2024 are acknowledged by the examiner, and the cited references have been considered in the examination of the claims now pending. As required by MPEP § 609(C)(2), a copy of the PTOL-1449, initialed and dated by the examiner, is attached to the instant Office action.
Drawings
The drawings submitted by the applicant are acceptable for examination purposes.
Claim Objections
Claims 1-15 are objected to because they include the reference characters “ML”, which are not enclosed within parentheses.
Reference characters corresponding to elements recited in the detailed description of the drawings and used in conjunction with the recitation of the same element or group of elements in the claims should be enclosed within parentheses so as to avoid confusion with other numbers or characters which may appear in the claims. See MPEP § 608.01(m).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6 and 10-15 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US Patent Publication No. 2022/0207431 A1, hereafter ‘Park’) in view of Zhang et al. (Chinese Patent Publication No. CN 110837761 A, hereafter ‘Zhang’).
Regarding claim 1. Park teaches a system for knowledge distillation between machine learning, ML, models (Park [0026-0027] and Fig. 1), the system comprising:
a pre-trained teacher ML model, trained using a first training dataset, the pre-trained teacher ML model comprising first model parameters (Park [0028-0029]);
a pre-trained student ML model, trained using a second training dataset, where the second training dataset is a subset of the first training dataset or is a different training dataset, the pre-trained student ML model comprising second model parameters (Park [0028-0029], [0031]);
Park does not teach
a condenser machine learning, ML, model parameterized by a set of parameters; and
at least one processor coupled to memory configured to:
input, into the condenser ML model, a third training dataset, the third training dataset comprising the first model parameters, the second model parameters, the first training dataset and the second training dataset; and
train the condenser ML model, using the third training dataset, to learn a parameter mapping function that models a relationship between the first model parameters and the second model parameters, and to output the second model parameters from an input comprising the first model parameters.
However, Zhang teaches
a condenser machine learning, ML, model parameterized by a set of parameters (Zhang, page 5, lines 10-14); and
at least one processor coupled to memory (Zhang, page 3, lines 52-60, page 9, lines 35-60) configured to:
input, into the condenser ML model, a third training dataset, the third training dataset comprising the first model parameters, the second model parameters, the first training dataset and the second training dataset (Zhang, page 3, lines 47-48, page 4, lines 2-9); and
train the condenser ML model, using the third training dataset, to learn a parameter mapping function that models a relationship between the first model parameters and the second model parameters, and to output the second model parameters from an input comprising the first model parameters (Zhang, page 3, lines 13-48, page 4, lines 2-9).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Park and Zhang before him or her, to modify Park with Zhang’s teaching of a multi-model knowledge distillation method and device, electronic equipment, and storage medium. One would have been motivated to do so because different feature representations in the training data can be obtained by utilizing the plurality of sub-models in the teacher model, and the student model can learn features of the teacher model through knowledge distillation, so that the problem of the limited expression capability of a single model is solved and the model precision of the student model is improved (Zhang, Abstract).
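For the convenience of the applicant, and purely as illustration rather than as a characterization of the teachings of Park or Zhang, the claimed condenser arrangement may be sketched in code. The following is a minimal sketch assuming a PyTorch environment; the names (CondenserMLP, flatten_params, train_condenser), the architecture, and the mean-squared-error objective are hypothetical choices, and the sketch is simplified to the parameter-to-parameter portion of the claimed mapping (the claimed third training dataset also comprises the first and second training datasets).

```python
# Hypothetical sketch: a "condenser" model that learns a parameter mapping
# function from first (teacher) model parameters to second (student) model
# parameters. All names, shapes, and the loss are illustrative assumptions.
import torch
import torch.nn as nn

class CondenserMLP(nn.Module):
    """Maps a flattened teacher parameter vector to a student parameter vector."""
    def __init__(self, teacher_dim: int, student_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(teacher_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, student_dim),
        )

    def forward(self, teacher_params: torch.Tensor) -> torch.Tensor:
        return self.net(teacher_params)

def flatten_params(model: nn.Module) -> torch.Tensor:
    # Concatenate all parameters of a pre-trained model into one vector.
    return torch.cat([p.detach().flatten() for p in model.parameters()])

def train_condenser(condenser, teacher_vec, student_vec, steps=1000, lr=1e-3):
    # Fit the condenser so that, given the teacher's parameters as input,
    # it outputs the student's parameters.
    opt = torch.optim.Adam(condenser.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(condenser(teacher_vec), student_vec)
        loss.backward()
        opt.step()
    return condenser
```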
Regarding claim 2. Park as modified teaches wherein training the condenser ML model comprises training a first submodel of the condenser ML model using:
the first model parameters, wherein the first model parameters comprise parameters of parameter mapping functions that map parameters of the pre-trained teacher ML model to parameters of the pre-trained student ML model (Zhang, page 3, lines 47-48, page 4, lines 2-9); and
the second model parameters, wherein the second model parameters comprise parameters of parameter mapping functions that map parameters of the pre-trained student ML model to parameters of the pre-trained teacher ML model (Zhang, page 3, lines 13-48, page 4, lines 2-9, page 5, lines 42-47).
Regarding claim 3. Park as modified teaches wherein the parameters mapped by the parameter mapping functions comprise at least one of ML model weights, parameters or variables of graphical models, parameters of kernel machines, and variables of regression functions (Zhang, page 2, lines 54-55).
Regarding claim 4. Park as modified teaches wherein training the condenser ML model comprises training a second submodel of the condenser ML model using:
the first model parameters, wherein the first model parameters comprise parameters of feature mapping functions that map features of the pre-trained teacher ML model to features of the pre-trained student ML model (Park [0029], [0031]); and
the second model parameters, wherein the second model parameters comprise parameters of feature mapping functions that map features of the pre-trained student ML model to features of the pre-trained teacher ML model (Park [0029], [0031]).
Regarding claim 5. Park as modified teaches wherein the at least one processor is further configured to: generate a new student ML model using the pre-trained teacher ML model and the learned parameter mapping function (Park [0034]).
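Again purely as illustration, a learned mapping of this kind could be used to instantiate a new student ML model from the teacher, in the manner recited in claim 5. This continues the hypothetical sketch above; unflatten_into and generate_new_student are assumed helper names, not functions disclosed by any reference of record.

```python
# Hypothetical continuation of the sketch above: map the teacher's parameters
# through the trained condenser and load the result into a student architecture.
import torch
import torch.nn as nn

def unflatten_into(model: nn.Module, flat: torch.Tensor) -> nn.Module:
    # Write a flat parameter vector back into a model's parameter tensors.
    offset = 0
    with torch.no_grad():
        for p in model.parameters():
            n = p.numel()
            p.copy_(flat[offset:offset + n].view_as(p))
            offset += n
    return model

def generate_new_student(condenser, teacher, student_template):
    # Produce a new student model from the teacher and the learned mapping.
    teacher_vec = torch.cat([p.detach().flatten() for p in teacher.parameters()])
    student_vec = condenser(teacher_vec).detach()
    return unflatten_into(student_template, student_vec)
```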
Regarding claim 6. Park as modified teaches wherein the first training dataset comprises at least one of images and videos (Zhang, page 5, lines 16-19).
Regarding claim 10. Park teaches a system for knowledge distillation between machine learning, ML, models that perform object recognition (Park [0026-0027] and Fig. 1), the system comprising:
a pre-trained teacher ML model, trained using a first training dataset comprising a plurality of images of objects, the pre-trained teacher ML model comprising first model parameters (Park [0028-0029]);
a pre-trained student ML model, trained using a second training dataset, where the second training dataset is a subset of the first training dataset or is a different training dataset, the pre-trained student ML model comprising second model parameters (Park [0028-0029], [0031]);
Park does not teach
a condenser machine learning, ML, model parameterized by a set of parameters; and
at least one processor coupled to memory configured to:
input, into the condenser ML model, a third training dataset, the third training dataset comprising the first model parameters, the second model parameters, the first training dataset and the second training dataset; and
train the condenser ML model, using the third training dataset, to learn a parameter mapping function that models a relationship between the first model parameters and the second model parameters, and to output the second model parameters from an input comprising the first model parameters.
However, Zhang teaches
a condenser machine learning, ML, model parameterized by a set of parameters (Zhang, page 5, lines 10-14); and
at least one processor coupled to memory (Zhang, page 3, lines 52-60, page 9, lines 35-60) configured to:
input, into the condenser ML model, a third training dataset, the third training dataset comprising the first model parameters, the second model parameters, the first training dataset and the second training dataset (Zhang, page 3, lines 47-48, page 4, lines 2-9); and
train the condenser ML model, using the third training dataset, to learn a parameter mapping function that models a relationship between the first model parameters and the second model parameters, and to output the second model parameters from an input comprising the first model parameters (Zhang, page 3, lines 13-48, page 4, lines 2-9).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Park and Zhang before him or her, to modify Park with Zhang’s teaching of a multi-model knowledge distillation method and device, electronic equipment, and storage medium. One would have been motivated to do so because different feature representations in the training data can be obtained by utilizing the plurality of sub-models in the teacher model, and the student model can learn features of the teacher model through knowledge distillation, so that the problem of the limited expression capability of a single model is solved and the model precision of the student model is improved (Zhang, Abstract).
Regarding claim 11. Park teaches a system for knowledge distillation between machine learning, ML, models that perform speech recognition (Park [0026-0027] and Fig. 1), the system comprising:
a pre-trained teacher ML model, trained using a first training dataset comprising a plurality of audio files, each audio file comprising speech, the pre-trained teacher ML model comprising first model parameters (Park [0028-0029], [0031]);
a pre-trained student ML model, trained using a second training dataset, where the second training dataset is a subset of the first training dataset or is a different training dataset, the pre-trained student ML model comprising second model parameters (Park [0028-0029], [0031]);
Park does not teach
a condenser machine learning, ML, model parameterized by a set of parameters; and
at least one processor coupled to memory configured to:
input, into the condenser ML model, a third training dataset, the third training dataset comprising the first model parameters, the second model parameters, the first training dataset and the second training dataset; and
train the condenser ML model, using the third training dataset, to learn a parameter mapping function that models a relationship between the first model parameters and the second model parameters, and to output the second model parameters from an input comprising the first model parameters.
However, Zhang teaches
a condenser machine learning, ML, model parameterized by a set of parameters (Zhang, page 5, lines 10-14); and
at least one processor coupled to memory (Zhang, page 3, lines 52-60, page 9, lines 35-60) configured to:
input, into the condenser ML model, a third training dataset, the third training dataset comprising the first model parameters, the second model parameters, the first training dataset and the second training dataset (Zhang, page 3, lines 47-48, page 4, lines 2-9); and
train the condenser ML model, using the third training dataset, to learn a parameter mapping function that models a relationship between the first model parameters and the second model parameters, and to output the second model parameters from an input comprising the first model parameters (Zhang, page 3, lines 13-48, page 4, lines 2-9).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Park and Zhang before him or her, to modify Park with Zhang’s teaching of a multi-model knowledge distillation method and device, electronic equipment, and storage medium. One would have been motivated to do so because different feature representations in the training data can be obtained by utilizing the plurality of sub-models in the teacher model, and the student model can learn features of the teacher model through knowledge distillation, so that the problem of the limited expression capability of a single model is solved and the model precision of the student model is improved (Zhang, Abstract).
Regarding claims 12-15, the method steps of claims 12-15 substantially encompass the limitations of the system recited in claims 1-4. Therefore, claims 12-15 are rejected for at least the same reasons as claims 1-4 above.
Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US Patent Publication No. 2022/0207431 A1, hereafter ‘Park’) in view of Zhang et al. (Chinese Patent Publication No. CN 110837761 A, hereafter ‘Zhang’), and further in view of Kim et al. (US Patent No. 11,244,671 B2, hereafter ‘Kim’).
Regarding claim 7. Park and Zhang do not teach wherein the pre-trained teacher ML model is trained to perform a computer vision task, wherein the computer vision task comprises at least one of object recognition, object detection, object tracking, scene analysis, pose estimation, image or video segmentation, image or video synthesis, and image or video enhancement.
However, Kim teaches wherein the pre-trained teacher ML model is trained to perform a computer vision task (Kim, Col 5, line 28 – Col 6, line 6),
wherein the computer vision task comprises at least one of object recognition, object detection, object tracking, scene analysis, pose estimation, image or video segmentation, image or video synthesis, and image or video enhancement (Park [0028], [0087]).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Park, Zhang, and Kim before him or her, to further modify Park with Kim’s teaching of a model training method and apparatus. One would have been motivated to do so for the benefit of maximizing the recognition rate of the artificial neural network (ANN) while miniaturizing the size of the ANN (Kim, Abstract; Col 1, lines 46-48).
Regarding claim 8. Park as modified teaches wherein the first training dataset comprises audio files (Kim, Col 5, lines 28-35).
Regarding claim 9. Park as modified teaches wherein the pre-trained teacher ML model is trained to perform an audio analysis task, wherein the audio analysis task comprises at least one of audio recognition, audio classification, speech synthesis, speech processing, speech enhancement, speech-to-text, and speech recognition (Kim, Col 5, lines 12-43).
Conclusion
The prior art made of record, listed on form PTO-892, and not relied upon, if any, is considered pertinent to applicant’s disclosure.
Chen et al. (Chinese Patent Publication No. CN 117726011 A) discloses, in the technical field of artificial intelligence, a model distillation method for natural language processing, a model distillation device for natural language processing, a computer storage medium, and an electronic device. The model distillation method comprises: obtaining a training sample set and pre-training a multi-task teacher model using the training sample set to obtain a pre-trained multi-task teacher model; integrally distilling and training the pre-trained multi-task teacher model and a to-be-trained student model using the training sample set to obtain a trained multi-task teacher model and a student model after primary distillation; and carrying out distillation training on the student model after primary distillation, using the trained multi-task teacher model together with a target subset matched with the natural language processing task corresponding to that student model, to obtain a distilled student model. The invention can improve the distillation effect of the student model.
Fukuda (US Patent Publication No. 2022/0188643 A1) discloses a method of training a student neural network. The method includes feeding a data set including a plurality of input vectors into a teacher neural network to generate a plurality of output values, and converting two of the plurality of output values from the teacher neural network for two corresponding input vectors into two corresponding soft labels. The method further includes combining the two corresponding input vectors to form a synthesized data vector, and forming a masked soft label vector from the two corresponding soft labels. The method further includes feeding the synthesized data vector into the student neural network, using the masked soft label vector to determine an error for modifying weights of the student neural network, and modifying the weights of the student neural network.
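For illustration only, the input-combining and soft-label-masking steps that Fukuda describes might be sketched as follows; the mixing weight, temperature, and masking rule shown here are hypothetical assumptions, not particulars disclosed by Fukuda.

```python
# Hypothetical sketch of soft-label synthesis: two inputs are combined into a
# synthesized data vector, and their teacher soft labels are merged and masked
# to form the target used to train the student network.
import torch
import torch.nn.functional as F

def synthesize(x1, x2, teacher_logits1, teacher_logits2, alpha=0.5, temp=2.0):
    x_mix = alpha * x1 + (1.0 - alpha) * x2           # synthesized data vector
    s1 = F.softmax(teacher_logits1 / temp, dim=-1)    # soft label for x1
    s2 = F.softmax(teacher_logits2 / temp, dim=-1)    # soft label for x2
    soft = alpha * s1 + (1.0 - alpha) * s2            # combined soft labels
    mask = soft > soft.mean()                         # assumed masking rule
    return x_mix, soft * mask                         # masked soft-label vector
```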
Jandial et al. (US Patent Publication No. 2024/0296335 A1) discloses training a student model based on a teacher model and a past student model. For example, a first set of labels is generated by a teacher model based on training data, a subset of the labels is replaced with labels generated by a past student model based on the training data, and a student model is trained based on these labels and the training data.
Na et al. (US Patent No. 11,488,013 B2) discloses a model training method and apparatus, where the model training method acquires a recognition result of a teacher model and a recognition result of a student model for an input sequence and trains the student model such that the recognition result of the teacher model and the recognition result of the student model are not distinguished from each other.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HASANUL MOBIN, whose telephone number is (571) 270-1289. The examiner can normally be reached Monday through Friday, 9:00 AM to 6:00 PM ET.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached at 571-272-4085. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HASANUL MOBIN/
Primary Examiner, Art Unit 2168