Last updated: May 29, 2026
Application No. 17/679,001
TRAINING METHOD, STORAGE MEDIUM, AND TRAINING DEVICE

Final Rejection §103
Filed
Feb 23, 2022
Priority
Aug 30, 2019 — continuation of PCTJP2019034198
Examiner
LANE, THOMAS BERNARD
Art Unit
2142
Tech Center
2100 — Computer Architecture & Software
Assignee
Fujitsu Limited
OA Round
2 (Final)
Interview Optional

— +22.2% interview lift. Examiner has a relatively high allowance rate (85%); +22.2% interview lift. A written response may suffice.
Based on 13 resolved cases, 2023–2026
Examiner Intelligence

LANE, THOMAS BERNARD View full profile →
Grants 85% — above average
Career Allowance Rate
11 granted / 13 resolved
+29.6% vs TC avg
Strong +22% interview lift
Without
With
+22.2%
Interview Lift
resolved cases with interview
Typical timeline
3y 9m
Avg Prosecution
6 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
4.1%
-35.9% vs TC avg
§103
91.8%
+51.8% vs TC avg
§102
2.0%
-38.0% vs TC avg
Black line = Tech Center average estimate • Based on career data from 13 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . This office action is in response to the Application filed on 02/23/2022. Claims 1-12 are presented for examination.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. PCT/JP2019/034198, filed on August 30, 2019.

Response to Arguments
Applicant's arguments filed 11/07/2025 regarding the rejection under 35 USC 101 have
been fully considered and they are persuasive.
The rejection under 35 USC 101 is withdrawn.
Applicant's arguments filed 11/07/2025 regarding the rejection under 35 USC 102 and 103 have been fully considered and they are persuasive.
Applicant argues, see especially page 6-8, “As evidenced by the above, at least the above-noted features provide a distinction over each of Huang, Ye and Wang alone or in combination. Huang discloses a method for adapting a pre-trained DNN for speech recognition. Its focus is on "adaptation" of an already trained model to new acoustic conditions. In contrast, amended claim 1 is directed to a specific and different training paradigm in the field of natural language processing.  Amended claim 1 recites concurrently training a word prediction model (a pre-training task) and a named entity extraction model (a fine-tuning task). Huang does not disclose or suggest this concurrent training process where the model for a general, unsupervised task (pre-training) continues to be trained simultaneously with the model for a specific, supervised task (fine-tuning). Huang's method involves taking a fully pre-trained model and then adapting it. An inventive concept of the present invention, as recited in amended claim 1, lies in this unique concurrent training, which prevents the degradation of the pre-trained knowledge during fine-tuning. This key limitation is not taught or suggested by Huang.  Since the primary reference (Huang) does not teach this foundational element, the combination with Ye (which applies MTL to relation extraction) cannot render the claimed invention obvious. Merely applying a different, general framework to a new field does not make the specific, novel framework of the present invention obvious.” Examiner respectfully disagrees. Huang teaches in section 3.1 “By adding the auxiliary architecture, the new DNN will have more than one output layers. This kind of multi-output-layer DNN can be trained using the MTL scheme. MTL is a machine learning scheme letting a classifier learn more than one related tasks at a time [22].” Huang uses a Multi-task learning model (MTL) to learn multiple output layers that do different tasks (i.e. models) at the same time (i.e. concurrently). Huang further teaches throughout Section 3 the use of a pretrained deep neural network as a base model (i.e. Pretraining) that is then used with the MTL model to train multiple output layers to do different tasks using the same common layers (i.e. fine tuning). 
Further Applicant argues, see especially pages 8-9, “(1) a specialized multi-task model is first created by concurrently training a word prediction model and a biotechnology named entity extraction model; and (2) this specialized model is then used as a foundation for transfer learning to adapt to a chemistry field task.  Wang teaches a general concept of cross-type MTL, i.e., mixing data from different domains (biomedical and chemical) for a single training process. It does not suggest the Applicant's specific, sequential workflow of first building a highly effective, specialized model via concurrent pre-training/fine-tuning in one domain, and then leveraging that entire trained model for adaptation to a second domain. This workflow is a non-obvious choice that yields superior results in transfer learning, and it would not have been obvious to a person of ordinary skill in the art from a mere combination of the references. The Examiner's reasoning relies on improper hindsight.” Examiner respectfully disagrees. In response to applicant's argument that the examiner's conclusion of obviousness is based upon improper hindsight reasoning, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning.  But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper.  See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971). 
Further Applicant argues, see especially page 7, “the first model and the second model sharing at least an input layer and an intermediate layer, the first model including a first inputlayer coupled to the intermediate layer and being configured to perform a word prediction task using unlabeled text data from the first field, and the second model including a second output layer, distinct from the first output layer and coupled to the intermediate layer and being configured to perform an extraction of a named entity using labeled text data from the first field” Examiner respectfully disagrees. Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 5, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Huang et al. “Rapid Adaptation for Deep Neural Networks through Multi-Task Learning” in view of Yc et al. "Exploiting Entity BIO Tag Embeddings and Multi-task Learning for Relation Extraction with Imbalanced Data", Wang et al. "Cross-type biomedical named entity recognition with deep multi-task learning", and Pentina et al. “Multi-task Learning with Labeled and Unlabeled Tasks”.
Regarding Claim 1 Huang teaches A training method for a computer to execute a process comprising: generating a trained multi-task model by concurrently training a first model and a second model within a single neural network architecture (Huang, pages 3626-3627, section 3. Teaches a Multi-task learning model (MTL) to learn multiple output layers that do different tasks (i.e. models) at the same time (i.e. concurrently).) using training data that belongs to a first field, (Huang, page 3627, section 4.1, teaches the training of a baseline deep neural network that was trained using training data from a specified data set belonging to a first field.)  the first model and the second model sharing at least an input layer and an intermediate layer, the first model including a first input layer coupled to the intermediate layer (Huang, page 3626, section 3, teaches the use of an input layer, an intermediate layer that are utilized by both the first and second models that are distinguished by different output layers. Huang, page 3627, section 4.1-4.2, teaches the use of an input layer, an intermediate layer, and an output layer that is connected to said intermediate layer. These layers are present in both the base model that is trained and all the models of the multi-task model that is trained. The output layer being exclusive to the task of the first model (i.e., first output layer))…1, and the second model including a second output layer, distinct from the first output layer and coupled to the intermediate layer (Huang, page 3626, section 3, teaches the use of an input layer, an intermediate layer that are utilized by both the first and second models that are distinguished by different output layers. Huang, page 3627, section 4.1, teaches the training of a multi-task model that has at least two output layers meaning it has at least two models. The one multi-task model having at least two output layers represents two models withing the multi-task model that are trained using the base model and their own model training data. (i.e., trained a first model and a second model by training a multi-task learning model that includes the first model and the second model using training data)) …2
generating an objective model in which a new third output layer is coupled to the intermediate layer, the third output layer being configured for a task in a second field (Huang, page 3627, section 4.1, teaches the training of a multi-task model that has at least two output layers meaning it is able to have three output layers and in turn three models. The one multi-task model having at least two output layers represents three models withing the multi-task model that are trained using the base model and their own model training data. (i.e., the generating the trained model includes generating a third model that includes the input layer and the intermediate layer in the trained multi-task model and a third output layer) …3
Huang does not teach …1and being configured to perform a word prediction task …4 from the first field… However, Ye in analogous art teaches this limitation (Ye, page 3, section 2.3 and Figure 1, teaches the prediction of words and word relation in one of the models in the multi-task model, in regards to the BIO tag information that is being input into the model. the BIO tag information is biotechnology and used in the field of biotechnology.)
Further, Huang does not teach …2 and being configured to perform an extraction of a named entity using …5 from the first field;… However, Ye in analogous art teaches this limitation (Ye, page 3, section 2.3 and Figure 1, teaches the extraction of named entities and relationships in one of the models in the multi-task model, in regards to the BIO tag information that is being input into the model. the BIO tag information is biotechnology and used in the field of biotechnology.))
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Ye’s teaching of multi-task models using biotechnology data with Huang’s teaching of the training of multi-task models. The motivation to do so would be to would be to allow the multi-task model to be trained on biotechnological data and be able to use a base model on multiple types of data from different fields.
The combination of Huang and Ye does not teach …3that is related to but distinct from the first field; and training the objective model by using training data that belongs to a second field. (Wang, page 1747-1749, section 4, teaches the use of the combining of a chemical task and data set with a biomedical (i.e., biotechnology) multi-task model and shows that chemical tasks can be used in tandem with biotechnology tasks in these models showing that a model can be trained to do a task distinct from the initial trained task using a second set of data. Further Wang teaches the training of an output layer on the chemical using the chemical training data set for the chemical task which is different from the biomedical task.)
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Wang’s teaching of a multi-task model using chemical and biotechnical data with the combination of Huang and Ye’s teaching of a multi-task model. The motivation to do so would be to allow the model to use both chemical and biotechnological data to do chemical and biotechnical tasks along with the ability to use the chemical data to fill in any gaps in the biotechnological data and vice versa. 
The combination of Huang, Ye, and Wang does not teach …4 using unlabeled text data……5 labeled text data… However, Pentina in analogous art teaches this limitation (Pentina, page 3 – 6, sections 2-3, teaches the ability of a multi-task model being able to have a task that is trained for unlabeled data and a task that is trained for labeled data that are connected to the dame model.)
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Pentina’s teaching of a multi-task model using labeled and unlabeled data for different output layers with the combination of Huang Ye and Wang’s teaching of a multi-task model. The motivation to do so would be to allow the model to use both labeled and unlabeled data to train the models connected to the multi-task model in order for the labeled data to help fill in the gaps that are missed through the unlabeled data and do different tasks depending on the data type. 
Regarding Claim 5 Huang teaches A training device comprising: one or more memories; and one or more processors coupled to the one or more memories and the one or more processors configured to: generate a trained multi-task model by concurrently training a first model and a second model within a single neural network architecture (Huang, pages 3626-3627, section 3. Teaches a Multi-task learning model (MTL) to learn multiple output layers that do different tasks (i.e. models) at the same time (i.e. concurrently).) using training data that belongs to a first field, (Huang, page 3627, section 4.1, teaches the training of a baseline deep neural network that was trained using training data from a specified data set belonging to a first field.)  the first model and the second model sharing at least an input layer and an intermediate layer, the first model including a first input layer coupled to the intermediate layer (Huang, page 3626, section 3, teaches the use of an input layer, an intermediate layer that are utilized by both the first and second models that are distinguished by different output layers. Huang, page 3627, section 4.1-4.2, teaches the use of an input layer, an intermediate layer, and an output layer that is connected to said intermediate layer. These layers are present in both the base model that is trained and all the models of the multi-task model that is trained. The output layer being exclusive to the task of the first model (i.e., first output layer))…1, and the second model including a second output layer, distinct from the first output layer and coupled to the intermediate layer (Huang, page 3626, section 3, teaches the use of an input layer, an intermediate layer that are utilized by both the first and second models that are distinguished by different output layers. Huang, page 3627, section 4.1, teaches the training of a multi-task model that has at least two output layers meaning it has at least two models. The one multi-task model having at least two output layers represents two models withing the multi-task model that are trained using the base model and their own model training data. (i.e., trained a first model and a second model by training a multi-task learning model that includes the first model and the second model using training data)) …2
generate an objective model in which a new third output layer is coupled to the intermediate layer, the third output layer being configured for a task in a second field (Huang, page 3627, section 4.1, teaches the training of a multi-task model that has at least two output layers meaning it is able to have three output layers and in turn three models. The one multi-task model having at least two output layers represents three models withing the multi-task model that are trained using the base model and their own model training data. (i.e., the generating the trained model includes generating a third model that includes the input layer and the intermediate layer in the trained multi-task model and a third output layer) …3
Huang does not teach …1and being configured to perform a word prediction task …4 from the first field… However, Ye in analogous art teaches this limitation (Ye, page 3, section 2.3 and Figure 1, teaches the prediction of words and word relation in one of the models in the multi-task model, in regards to the BIO tag information that is being input into the model. the BIO tag information is biotechnology and used in the field of biotechnology.)
Further, Huang does not teach …2 and being configured to perform an extraction of a named entity using …5 from the first field;… However, Ye in analogous art teaches this limitation (Ye, page 3, section 2.3 and Figure 1, teaches the extraction of named entities and relationships in one of the models in the multi-task model, in regards to the BIO tag information that is being input into the model. the BIO tag information is biotechnology and used in the field of biotechnology.))
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Ye’s teaching of multi-task models using biotechnology data with Huang’s teaching of the training of multi-task models. The motivation to do so would be to would be to allow the multi-task model to be trained on biotechnological data and be able to use a base model on multiple types of data from different fields.
The combination of Huang and Ye does not teach …3that is related to but distinct from the first field; and training the objective model by using training data that belongs to a second field. (Wang, page 1747-1749, section 4, teaches the use of the combining of a chemical task and data set with a biomedical (i.e., biotechnology) multi-task model and shows that chemical tasks can be used in tandem with biotechnology tasks in these models showing that a model can be trained to do a task distinct from the initial trained task using a second set of data. Further Wang teaches the training of an output layer on the chemical using the chemical training data set for the chemical task which is different from the biomedical task.)
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Wang’s teaching of a multi-task model using chemical and biotechnical data with the combination of Huang and Ye’s teaching of a multi-task model. The motivation to do so would be to allow the model to use both chemical and biotechnological data to do chemical and biotechnical tasks along with the ability to use the chemical data to fill in any gaps in the biotechnological data and vice versa. 
The combination of Huang, Ye, and Wang does not teach …4 using unlabeled text data……5 labeled text data… However, Pentina in analogous art teaches this limitation (Pentina, page 3 – 6, sections 2-3, teaches the ability of a multi-task model being able to have a task that is trained for unlabeled data and a task that is trained for labeled data that are connected to the dame model.)
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Pentina’s teaching of a multi-task model using labeled and unlabeled data for different output layers with the combination of Huang Ye and Wang’s teaching of a multi-task model. The motivation to do so would be to allow the model to use both labeled and unlabeled data to train the models connected to the multi-task model in order for the labeled data to help fill in the gaps that are missed through the unlabeled data and do different tasks depending on the data type. 
Regarding Claim 9 Huang teaches A non-transitory computer-readable storage medium storing a training program that causes at least one computer to execute a process, the process comprising: generating a trained multi-task model by concurrently training a first model and a second model within a single neural network architecture (Huang, pages 3626-3627, section 3. Teaches a Multi-task learning model (MTL) to learn multiple output layers that do different tasks (i.e. models) at the same time (i.e. concurrently).) using training data that belongs to a first field, (Huang, page 3627, section 4.1, teaches the training of a baseline deep neural network that was trained using training data from a specified data set belonging to a first field.)  the first model and the second model sharing at least an input layer and an intermediate layer, the first model including a first input layer coupled to the intermediate layer (Huang, page 3626, section 3, teaches the use of an input layer, an intermediate layer that are utilized by both the first and second models that are distinguished by different output layers. Huang, page 3627, section 4.1-4.2, teaches the use of an input layer, an intermediate layer, and an output layer that is connected to said intermediate layer. These layers are present in both the base model that is trained and all the models of the multi-task model that is trained. The output layer being exclusive to the task of the first model (i.e., first output layer))…1, and the second model including a second output layer, distinct from the first output layer and coupled to the intermediate layer (Huang, page 3626, section 3, teaches the use of an input layer, an intermediate layer that are utilized by both the first and second models that are distinguished by different output layers. Huang, page 3627, section 4.1, teaches the training of a multi-task model that has at least two output layers meaning it has at least two models. The one multi-task model having at least two output layers represents two models withing the multi-task model that are trained using the base model and their own model training data. (i.e., trained a first model and a second model by training a multi-task learning model that includes the first model and the second model using training data)) …2
generating an objective model in which a new third output layer is coupled to the intermediate layer, the third output layer being configured for a task in a second field (Huang, page 3627, section 4.1, teaches the training of a multi-task model that has at least two output layers meaning it is able to have three output layers and in turn three models. The one multi-task model having at least two output layers represents three models withing the multi-task model that are trained using the base model and their own model training data. (i.e., the generating the trained model includes generating a third model that includes the input layer and the intermediate layer in the trained multi-task model and a third output layer) …3
Huang does not teach …1and being configured to perform a word prediction task …4 from the first field… However, Ye in analogous art teaches this limitation (Ye, page 3, section 2.3 and Figure 1, teaches the prediction of words and word relation in one of the models in the multi-task model, in regards to the BIO tag information that is being input into the model. the BIO tag information is biotechnology and used in the field of biotechnology.)
Further, Huang does not teach …2 and being configured to perform an extraction of a named entity using …5 from the first field;… However, Ye in analogous art teaches this limitation (Ye, page 3, section 2.3 and Figure 1, teaches the extraction of named entities and relationships in one of the models in the multi-task model, in regards to the BIO tag information that is being input into the model. the BIO tag information is biotechnology and used in the field of biotechnology.))
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Ye’s teaching of multi-task models using biotechnology data with Huang’s teaching of the training of multi-task models. The motivation to do so would be to would be to allow the multi-task model to be trained on biotechnological data and be able to use a base model on multiple types of data from different fields.
The combination of Huang and Ye does not teach …3that is related to but distinct from the first field; and training the objective model by using training data that belongs to a second field. (Wang, page 1747-1749, section 4, teaches the use of the combining of a chemical task and data set with a biomedical (i.e., biotechnology) multi-task model and shows that chemical tasks can be used in tandem with biotechnology tasks in these models showing that a model can be trained to do a task distinct from the initial trained task using a second set of data. Further Wang teaches the training of an output layer on the chemical using the chemical training data set for the chemical task which is different from the biomedical task.)
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Wang’s teaching of a multi-task model using chemical and biotechnical data with the combination of Huang and Ye’s teaching of a multi-task model. The motivation to do so would be to allow the model to use both chemical and biotechnological data to do chemical and biotechnical tasks along with the ability to use the chemical data to fill in any gaps in the biotechnological data and vice versa. 
The combination of Huang, Ye, and Wang does not teach …4 using unlabeled text data……5 labeled text data… However, Pentina in analogous art teaches this limitation (Pentina, page 3 – 6, sections 2-3, teaches the ability of a multi-task model being able to have a task that is trained for unlabeled data and a task that is trained for labeled data that are connected to the dame model.)
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Pentina’s teaching of a multi-task model using labeled and unlabeled data for different output layers with the combination of Huang Ye and Wang’s teaching of a multi-task model. The motivation to do so would be to allow the model to use both labeled and unlabeled data to train the models connected to the multi-task model in order for the labeled data to help fill in the gaps that are missed through the unlabeled data and do different tasks depending on the data type. 

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS B LANE whose telephone number is (571)272-1872. The examiner can normally be reached M-Th: 7am-5pm; F: Out of Office.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MARIELA REYES can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/THOMAS BERNARD LANE/Examiner, Art Unit 2142                                                                                                                                                                                                        

/HAIMEI JIANG/Primary Examiner, Art Unit 2142
Read full office action
Prosecution Timeline

Feb 23, 2022
Application Filed
Aug 07, 2025
Non-Final Rejection mailed — §103
Nov 07, 2025
Response Filed
Feb 09, 2026
Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/939,778
Patent 12619893
GENERATION APPARATUS, GENERATION METHOD, AND RECORDING MEDIUM
3y 8m to grant Granted May 05, 2026
18/031,065
Patent 12619857
Methods and Apparatuses for Bottleneck Stages in Neural-Network Processing
3y 0m to grant Granted May 05, 2026
17/536,230
Patent 12561398
VALIDATION PROCESSING FOR CANDIDATE RETRAINING DATA
4y 2m to grant Granted Feb 24, 2026
17/457,698
Patent 12541572
ACCELERATING DECISION TREE INFERENCES BASED ON COMPLEMENTARY TENSOR OPERATION SETS
4y 2m to grant Granted Feb 03, 2026
17/838,342
Patent 12468921
PIPELINING AND PARALLELIZING GRAPH EXECUTION METHOD FOR NEURAL NETWORK MODEL COMPUTATION AND APPARATUS THEREOF
3y 5m to grant Granted Nov 11, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
85%
Grant Probability
99%
With Interview (+22.2%)
3y 9m (~0m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 13 resolved cases by this examiner. Grant probability derived from career allowance rate.