Response to Arguments
Applicant’s arguments with respect to claims 1-4, 6 and 8 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
Claim 2 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Claim 2 recites “teacher data,” but this term is not mentioned in the specification.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 1-4, 6 and 8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 6 and 8 recite, in substance, that the first model comprises the “((N-n)+1)-th layer to an N-th layer” and that the second model comprises the “(N-n+1)-th layer to the N-th layer…” These ranges identify the same layers, yet they are said to make up two different models, which does not make sense. Because ((N-n)+1) and (N-n+1) are arithmetically equal, moving the parentheses does not change which layers are included in either group.
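For illustration only, the following Python sketch enumerates the 1-indexed layers selected by each recited range and checks that they coincide for every N > n ≥ 1 tested. The helper names are hypothetical and do not appear in the claims or the cited art.

```python
def first_model_layers(N: int, n: int) -> list:
    # Layers "from an ((N-n)+1)-th layer to an N-th layer" (1-indexed, inclusive)
    return list(range((N - n) + 1, N + 1))


def second_model_layers(N: int, n: int) -> list:
    # Layers "from the (N-n+1)-th layer to the N-th layer" (1-indexed, inclusive)
    return list(range(N - n + 1, N + 1))


# Every integer pair with N > n >= 1, up to N = 24, yields identical layer sets.
for N in range(2, 25):
    for n in range(1, N):
        assert first_model_layers(N, n) == second_model_layers(N, n)

print(first_model_layers(12, 3))   # [10, 11, 12]
print(second_model_layers(12, 3))  # [10, 11, 12]
```

Under this reading, the two recited groups are the same set of layers, which is the source of the indefiniteness noted above.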
Claims 1, 6 and 8 recite, in substance, “wherein N and n are integers equal to or greater than 1, and satisfies N >n, and in the third model…” The final phrase, “and in the third model,” is unclear.
Claims 2, 3 and 4 each recite “the task.” The preceding claims recite both a “multi-task” and a “predetermined task,” so it is unclear which task is claimed.
Claim 4 recites “token sequence in which a part of the sentence represented by the third data is masked and a segment id of all 0s.” This clause is unclear. Does Applicant mean that the token sequence has a segment id of all 0s?
Claim 8 recites “where the first comprises upper encoding layers…” Applicant likely means “the first model,” but as written the limitation is unclear.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al., “Multi-Task Deep Neural Networks for Natural Language Understanding” (hereinafter Liu), and He et al., US20190073568A1 (hereinafter He).
Regarding claims 1, 6 and 8, Liu teaches the following limitations of the information processing apparatus:
share encoding layers from a first layer to a (N-n)-th layer, the shared encoding layers having parameters trained in advance and being in common to a first model and a second model; and (Liu fig. 1, see below.)
[Figure: Liu, Fig. 1]
train parameters of a combined model including the first model and the second model as a third model through multi-task training including training of the first model and retraining of the second model for a predetermined task, (Liu p. 4 sec. 3.1 “In the multi-task learning stage, we use minibatch based stochastic gradient descent (SGD) to learn the parameters of our model (i.e., the parameters of all shared layers and task-specific layers)…”)
wherein the first model comprises upper encoding layers from an ((N-n)+1)-th layer to an N-th layer, (The task-specific layers are four separate models shown in fig. 1.)
the second model comprises upper encoding layers from the ((N-n+1)-th layer to the N-th layer, and (The task-specific layers are four separate models shown in fig. 1.)
the shared encoding layers from the first layer to the (N-n)-th layer are updated during the multi-task training; (Liu p. 4 sec. 3.1 “In the multi-task learning stage, we use minibatch based stochastic gradient descent (SGD) to learn the parameters of our model (i.e., the parameters of all shared layers and task-specific layers)…”)
wherein N and n are integers equal to or greater than 1, and satisfies N >n, and in the third model, and (Examiner interprets this to mean that the ((N-n)+1)-th to N-th layers are the third model, which is a combination of at least two other models. This idea is shown by the task-specific layers in Liu fig. 1, reproduced below.)
[Figure: Liu, Fig. 1, task-specific layers]
encoding layers from an ((N-n)+1)-th layer to an N-th layer having parameters trained in advance are divided into the first model and the second model. (Examiner interprets this to mean that the parameters are trained in advance of being used to pre. Liu p. 4 sec. 3.1 “In the multi-task learning stage, we use minibatch based stochastic gradient descent (SGD) to learn the parameters of our model (i.e., the parameters of all shared layers and task-specific layers)…”)
Liu does not explicitly teach a generic computer.
However, He teaches an information processing apparatus comprising:
a processor; and a memory storing computer executable instructions, which, when executed by the processor, cause the information processing apparatus to: (He para 46 “one or more processors, which executes instructions from a memory medium.”)
wherein, during inference, the processor is configured to receive input and provide an output corresponding to a result of processing by the trained combined model to an output device, and (He para 77 “the neural network is shown as being used in a fully trained mode (i.e., the neural network has already been trained) with the inputs that may be provided to the neural network for runtime or production mode. (In all of the drawings shown herein, the elements that are shaded (e.g., inputs, outputs, layers) are those that are not being used for a particular application. In contrast, the elements that are not shaded (e.g., inputs, outputs, layers) are those that are being used for a particular application of the neural network.)”)
He, Liu and the claims all involve pretrained neural network (NN) layers. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to implement Liu on a computer system because Liu describes several values as being “computed” (Liu p. 4).
Liu teaches claim 2. The information processing apparatus according to claim 1, wherein the processor is configured to:
use an error between outputs from training data of the task and teacher data in the training data to update parameters of the encoding layers from the first layer to the (N-n)-th layer shared by the first model and the second model and parameters of the encoding layers from the ((N-n)+1)-th layer to the N-th layer of the first model, and (Liu p. 4 “The training procedure of MT-DNN consists of two stages: pretraining and multi-task learning.”)
use an error between output by of the second model using third data as input and corresponding training data to update the parameters of the encoding layers from the first layer to the (N-n)-th layer shared by the first model and the second model and parameters of the encoding layers from the ((N-n)+1)-th layer to the N-th layer of the second model. (Liu p. 4 “The training procedure of MT-DNN consists of two stages: pretraining and multi-task learning.”)
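The update scheme recited in claim 2 can be illustrated with a minimal sketch, offered only under stated assumptions: a single linear layer (`shared`) stands in for the shared encoding layers from the first layer to the (N-n)-th layer, one weight vector per model (`first`, `second`) stands in for each model's upper encoding layers, and squared error stands in for the recited error. All names and data are hypothetical. Each error updates the shared parameters and only the upper parameters of the model that produced the output, consistent with Liu's statement that SGD learns the parameters of all shared layers and task-specific layers.

```python
import copy
import random

random.seed(0)
D_IN, D_HID = 4, 3  # hypothetical input and hidden sizes


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


# Hypothetical parameters: "shared" stands in for layers 1..(N-n);
# "first" and "second" stand in for each model's layers ((N-n)+1)..N.
params = {
    "shared": [rand_vec(D_IN) for _ in range(D_HID)],  # D_HID x D_IN matrix
    "first": rand_vec(D_HID),
    "second": rand_vec(D_HID),
}


def forward(params, model, x):
    """Shared lower layer followed by the selected model's upper layer."""
    h = [sum(w * xi for w, xi in zip(row, x)) for row in params["shared"]]
    out = sum(w * hi for w, hi in zip(params[model], h))
    return h, out


def sgd_step(params, model, x, target, lr=0.01):
    """One SGD step on the squared error 0.5 * (out - target) ** 2.

    Only params["shared"] and params[model] are updated; the other
    model's upper-layer parameters receive no gradient.
    """
    h, out = forward(params, model, x)
    err = out - target
    head = params[model]  # pre-update upper-layer weights
    # dLoss/dshared[i][j] = err * head[i] * x[j]
    params["shared"] = [
        [s - lr * err * head[i] * x[j] for j, s in enumerate(row)]
        for i, row in enumerate(params["shared"])
    ]
    # dLoss/dhead[i] = err * h[i]
    params[model] = [w - lr * err * hi for w, hi in zip(head, h)]
    return 0.5 * err ** 2


x_train, teacher = rand_vec(D_IN), 1.0   # hypothetical training example and teacher value
x_third, target3 = rand_vec(D_IN), -1.0  # hypothetical third-data example and its target

before = copy.deepcopy(params)
sgd_step(params, "first", x_train, teacher)   # error of the first model's output
after_first = copy.deepcopy(params)
sgd_step(params, "second", x_third, target3)  # error of the second model's output
```

After the first step, `second` is unchanged while `shared` and `first` have moved; the second step then moves `shared` and `second` but leaves `first` alone.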
Liu teaches claim 3. The information processing apparatus according to wherein the training data is in a first domain as a source domain, the third data is in a second domain as a target domain of the task, and the third data is data distinct from the first. (Liu p. 4 “The training procedure of MT-DNN consists of two stages: pretraining and multi-task learning.” and p. 6 shows the different tasks/domains that the second layer is trained on.)
Liu teaches claim 4. The information processing apparatus according to claim 2, wherein the task is a machine reading comprehension task, and the encoding layers are transformer layers of BERT, (Liu p. 4 “The pretraining stage follows that of the BERT model (Devlin et al., 2018). The parameters of the lexicon encoder and Transformer encoder are learned using two unsupervised prediction tasks: masked language modeling and next sentence prediction.”)
the training data as input comprise a token sequence including a question sentence and a document as an answer, and a segment id of 0 in the question sentence and 1 in the document, and the corresponding training data includes a token sequence in which a part of the sentence represented by the third data is masked and a segment id of all 0s. (Liu p. 1 “For example, BERT is based on a multi-layer bidirectional Transformer, and is trained on plain text for masked word prediction and next sentence prediction tasks.”)
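For reference, the two input formats recited in claim 4 can be sketched as below. The sketch assumes BERT's [CLS], [SEP] and [MASK] special tokens and uses pre-split word lists in place of a real WordPiece tokenizer; the function names, example sentences and masked position are hypothetical. Under this reading, the question-document input carries segment ids 0 and 1, while the masked single-sentence input carries a segment id of 0 at every position.

```python
def qa_input(question, document):
    """[CLS] question [SEP] document [SEP]; segment 0 for the question part, 1 for the document part."""
    tokens = ["[CLS]"] + question + ["[SEP]"] + document + ["[SEP]"]
    segment_ids = [0] * (len(question) + 2) + [1] * (len(document) + 1)
    return tokens, segment_ids


def masked_input(sentence, mask_positions):
    """[CLS] sentence [SEP] with selected tokens replaced by [MASK]; all segment ids are 0."""
    body = ["[MASK]" if i in mask_positions else tok for i, tok in enumerate(sentence)]
    tokens = ["[CLS]"] + body + ["[SEP]"]
    return tokens, [0] * len(tokens)


q_tokens, q_segments = qa_input("who wrote it ?".split(), "alice wrote it .".split())
m_tokens, m_segments = masked_input("the model reads text".split(), {1})
print(q_tokens, q_segments)
print(m_tokens, m_segments)
```

The masked example yields ["[CLS]", "the", "[MASK]", "reads", "text", "[SEP]"] with six segment ids of 0, which is one possible reading of “a segment id of all 0s.”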
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Austin Hicks whose telephone number is (571)270-3377. The examiner can normally be reached Monday - Thursday 8-4 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AUSTIN HICKS/Primary Examiner, Art Unit 2124