Prosecution Insights
Last updated: April 19, 2026
Application No. 17/903,044

MACHINE LEARNING METHOD AND INFORMATION PROCESSING DEVICE OF TRAINING MODEL USING ADVERSARIAL NEURAL NETWORK

Status: Final Rejection (§101, §103)
Filed: Sep 06, 2022
Examiner: DAY, ROBERT N
Art Unit: 2122
Tech Center: 2100 — Computer Architecture & Software
Assignee: Fujitsu Limited
OA Round: 2 (Final)

Grant probability: 23% (At Risk)
Expected OA rounds: 3-4
Expected time to grant: 4y 3m
Grant probability with interview: 46%

Examiner Intelligence

Career allow rate: 23% (5 granted / 22 resolved; -32.3% vs TC avg)
Interview lift: +23.2% among resolved cases with interview
Typical timeline: 4y 3m average prosecution; 38 currently pending
Career history: 60 total applications across all art units

Statute-Specific Performance

§101: 32.6% (-7.4% vs TC avg)
§103: 35.3% (-4.7% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 18.3% (-21.7% vs TC avg)

Black line = Tech Center average estimate • Based on career data from 22 resolved cases
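The headline figures above are simple ratios over the examiner's resolved cases. As a sanity check, a short Python sketch (the function names are ours, purely illustrative, not part of any analytics product) reproduces the 23% career allow rate from the 5 granted / 22 resolved counts:

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Career allow rate: share of resolved applications that were granted."""
    return granted / resolved

def delta_vs_average(rate: float, tc_average: float) -> float:
    """Percentage-point gap between this examiner and the Tech Center average."""
    return rate - tc_average

# 5 granted out of 22 resolved cases -> ~22.7%, displayed as 23% above
print(round(allowance_rate(5, 22) * 100))  # 23
```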

Office Action

Rejections: §101, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This action is in response to the amendments filed 27 October 2025. Claims 1-4 and 6-9 are amended. Claim 5 is cancelled. Claims 1-4 and 6-9 are pending and have been examined.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 22 August 2025 is being considered by the examiner.

Response to Arguments

Applicant's arguments, see page 8, filed 27 October 2025, with respect to the objection to the title of the invention have been fully considered and are persuasive. The objection to the title of the invention has been withdrawn.

APPLICANT'S ARGUMENT: Applicant argues (page 8, paragraph 6) that "The title is amended herein. Accordingly, withdrawal of the objection to the title is respectfully requested."

EXAMINER'S RESPONSE: Examiner agrees. The objection has been withdrawn in light of arguments and/or amendments.

Applicant's arguments, see pages 8-15, filed 27 October 2025, with respect to the rejections of Claims 1-9 under 35 U.S.C. 101 have been fully considered and are persuasive. The rejections of Claims 1-9 under 35 U.S.C. 101 have been withdrawn.

APPLICANT'S ARGUMENT: Applicant argues (page 9, paragraph 3) that "the Examiner characterized the acts of 'generating second input data,' 'discriminating a rewritten portion,' 'generating correct answer information,' and calculating error information as mere 'mental processes.' This characterization appears to be inconsistent with MPEP guidance for claims involving complex computational processes, especially in the context of neural networks." Applicant argues (page 10, paragraph 2) that "This language unequivocally limits the claimed invention to a specific type of computing technology: a neural network model comprising a generator and a discriminator operating as a machine learning model. 
The claims delineate a series of computational steps executed within this specific framework. The operations ... are all functions of immense scale and profound complexity, thus demanding vast computational resources. Such tasks cannot practically be performed in the human mind by observation, evaluation, judgment, or opinion." Applicant argues (page 11, continued paragraph) that "While 'adversarial loss' conceptually involves mathematical operations, its description in the claim ... falls more squarely into the 'merely involve' category rather than explicitly 'reciting' a specific mathematical formula in isolation or abstract concept identifiable by name."

EXAMINER'S RESPONSE: The rejection under 35 U.S.C. 101 of the prior Office Action has been withdrawn in light of arguments and/or amendments.

APPLICANT'S ARGUMENT: Applicant argues (page 11, paragraph 3) that "The amended Claim 1 integrates any potential abstract idea into a practical application by directly providing a technical solution to a specific technical problem and unequivocally improving the functioning of a computer and relevant technology in the field of machine learning." Applicant argues (page 12, continued paragraph) that "The claimed invention addresses a well-defined technical problem and offers a concrete technical solution within the specialized field of machine learning. ... That is, the claimed invention addresses the critical technical challenge of attaining and maintaining high accuracy in machine learning pre-training, particularly by overcoming a conventional drawback in adversarial network architectures where the discriminator's learning efficiency can degrade due to the generator's increasing proficiency over time." Applicant argues (page 12, paragraph 2) that "The amended Claim 1 articulates a specific method ... not a mere recitation of an abstract idea; it is a precisely defined technical mechanism designed to overcome the identified problem of discriminator learning degradation. ¶ This 'adversarial loss' functions by compelling the generator to produce data that is increasingly difficult for the discriminator to classify, thereby continuously challenging and improving the discriminator's detection capabilities. This bespoke training objective directly addresses and mitigates the problem of declining discriminator learning efficiency, which was a specific technical flaw in the prior art." Applicant argues (page 13, paragraph 3) that "the claimed training methodology yields a tangible improvement in the accuracy of the machine learning model itself and thus constitutes a clear technological improvement in the functioning of a computer-implemented system. ¶ The amended Claim 1 therefore meticulously details a specific and innovative computational method for training a neural network model that fundamentally enhances the training process, leading to a demonstrably more accurate and robust machine learning model. ... The sophisticated interaction ... embodies a specific, non-conventional application of machine learning principles integrated into a potent technical solution. ¶ ... [I]t sets forth a novel computational methodology that inherently cannot be performed by humans, and provides a specific technical solution to a specific technical problem within the complex domain of machine learning. This undoubtedly constitutes an improvement in the operational efficacy of a computer system within the specialized AI/machine learning field."

EXAMINER'S RESPONSE: The rejection under 35 U.S.C. 101 of the prior Office Action has been withdrawn in light of arguments and/or amendments.

Applicant's arguments, see pages 15-18, filed 27 October 2025, with respect to the rejections of Claims 1-9 under 35 U.S.C. 102(a)(1) and 35 U.S.C. 103 have been fully considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. 
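For context on the "adversarial loss" debated in the arguments above: the generator-deceives-discriminator objective the applicant describes is commonly written as the non-saturating GAN generator loss. A minimal pure-Python sketch of that general technique (our illustration only, not the claimed formulation):

```python
import math

def generator_adversarial_loss(d_on_generated):
    """Non-saturating GAN generator loss, -E[log D(G(z))].

    The loss falls as the discriminator scores generated samples closer to
    'real' (D -> 1), i.e., the generator is trained to deceive the
    discriminator, which in turn keeps challenging the discriminator."""
    return -sum(math.log(d) for d in d_on_generated) / len(d_on_generated)

# Samples that fool the discriminator more (D closer to 1) incur a lower loss:
print(generator_adversarial_loss([0.9, 0.8]) < generator_adversarial_loss([0.2, 0.1]))  # True
```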
APPLICANT'S ARGUMENT: Applicant argues (page 15, paragraph 3) that "The assertion that 'MLM loss (L_MLM)' in Clark corresponds to the claimed 'first error information' and that the combined loss shown in Clark's figure implies joint minimization based on the discriminator result. However, this interpretation fundamentally misapprehends the nature of both the 'first error information' as now specifically defined in the amended Claim 1 and the role of Clark's MLM loss ... This direct statement from Clark unequivocally teaches that its generator is not trained using an adversarial loss to fool the discriminator ... ¶ ... This adversarial objective is fundamentally different from the maximum likelihood objective for the generator in Clark."

EXAMINER'S RESPONSE: Applicant's arguments pertain to newly claimed limitations and are now moot. Amended Claim 1 is now rejected over Clark in view of Chen. Chen is relied on to teach error information being an adversarial loss. Amended dependent Claims 2 and 6-9 are now rejected over Clark in view of Chen. Amended dependent Claims 3 and 4 are now rejected over Clark in view of Chen, further in view of Zhu.

Specification

The objection to the title of the invention is withdrawn in light of arguments and/or amendments.

The abstract of the disclosure is objected to for being deficient with respect to the proper content of an abstract of the disclosure. A patent abstract is a concise statement of the technical disclosure of the patent and should include that which is new in the art to which the invention pertains. The abstract should be in narrative form. See MPEP § 608.01(b) for guidelines for the preparation of patent abstracts.

Claim Rejections - 35 USC § 101

The rejections of Claims 1-4 and 6-9 under 35 U.S.C. 101 are withdrawn in light of arguments and/or amendments.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 2, and 6-9 are rejected under 35 U.S.C. 
103 as being unpatentable over Clark, et al., "Electra: Pre-training text encoders as discriminators rather than generators" (hereinafter "Clark") in view of Chen, et al., "Adding a filter based on the discriminator to improve unconditional text generation" (hereinafter "Chen").

Regarding Claim 1, Clark teaches:

A non-transitory computer-readable recording medium storing a program of training a machine learning model (Clark, p. 2, 1 Introduction: "We call our approach ELECTRA for 'Efficiently Learning an Encoder that Classifies Token Replacements Accurately.' ... we build an ELECTRA-Small model that can be trained on 1 GPU in 4 days" and footnote 1: "Code and pre-trained weights will be released at https://github.com/google-research/electra," where a non-transitory medium is inherent in storing the program from the referenced repository used for training and evaluation) being a neural network model including a generator and a discriminator, the program including instructions for causing a computer to execute a training process (Clark, p. 3, 2 Method: "Our approach trains two neural networks, a generator G and a discriminator D"), the training process comprising:

acquiring training data including first input data (Clark, p. 4, 3.1 Experimental Setup: "For most experiments we pre-train on the same data as BERT, which consists of 3.3 Billion tokens from Wikipedia and BooksCorpus .... However, for our Large model we pre-trained on the data used for XLNet," where Clark's pre-training includes the input data to the generator, corresponding to the instant first input data, as in p. 4, 2 Method: "After pre-training, we throw out the generator and fine-tune the discriminator on downstream tasks");

inputting the first input data included in the training data to the generator to cause the generator to generate second input data (Clark, p. 3, Fig. 
2, "The generator can be any model that produces an output distribution over tokens, but we usually use a small masked language model that is trained jointly with the discriminator," depicting the generator receiving input and producing second input data, as in p. 3, 2 Method: "Our approach trains two neural networks, a generator G and a discriminator D. Each one primarily consists of an encoder (e.g., a Transformer network) that maps a sequence of input tokens x = [x_1, …, x_n] into a sequence of contextualized vector representations h(x) = [h_1, …, h_n]. For a given position t, (in our case only positions where x_t = [MASK]), the generator outputs a probability for generating a particular token x_t with a softmax layer") in which a part of the first input data is rewritten in response to the input of the first input data (Clark, p. 3, 2 Method: "The generator is trained to perform masked language modeling (MLM). Given an input x = [x_1, …, x_n], MLM first selects a random set of positions (integers between 1 and n) to mask out m = [m_1, …, m_k]. The tokens in the selected positions are replaced with a [MASK] token: we denote this as x^masked = REPLACE(x, m, [MASK]). The generator then learns to predict the original identities of the masked-out tokens");

inputting the generated second input data from the generator to the discriminator (Clark, p. 3, Fig. 2, "we usually use a small masked language model that is trained jointly with the discriminator," depicting the discriminator receiving generator output as training input) to cause the discriminator to output a discrimination result of discriminating a rewritten portion in the second input data in response to the input of the second input data (Clark, p. 3, 2 Method: "For a given position t, the discriminator predicts whether the token x_t is 'real,' i.e., that it comes from the data rather than the generator distribution, with a sigmoid output layer. ... 
The discriminator is trained to distinguish tokens in the data from tokens that have been replaced by generator samples. More specifically, we create a corrupted example x^corrupt by replacing the masked-out tokens with generator samples and train the discriminator to predict which tokens in x^corrupt match the original input x");

generating correct answer information, based on the first input data of the training data and the second input data generated by the generator (Clark, p. 3, 2 Method: "we create a corrupted example x^corrupt by replacing the masked-out tokens with generator samples and train the discriminator to predict which tokens in x^corrupt match the original input x. ... [M]odel inputs are constructed according to

m_i ~ unif{1, n} for i = 1 to k
x̂_i ~ p_G(x_i | x^masked) for i ∈ m
x^masked = REPLACE(x, m, [MASK])
x^corrupt = REPLACE(x, m, x̂)

and the loss functions are ... L_Disc(x, θ_D) = E[ Σ_{t=1}^{n} −1(x_t^corrupt = x_t) log D(x^corrupt, t) − 1(x_t^corrupt ≠ x_t) log(1 − D(x^corrupt, t)) ]," where Clark's x, x^corrupt, and the sum term of L_Disc correspond to the instant first input data, second input data, and correct answer information); and

updating parameters of the generator and the discriminator in the machine learning model to minimize a total loss derived from first error information and second error information (Clark, p. 4, 2 Method: "We minimize the combined loss min_{θ_G, θ_D} Σ_{x∈X} [ L_MLM(x, θ_G) + λ L_Disc(x, θ_D) ] over a large corpus X of raw text," where Clark's MLM [masked language modeling, i.e., generator] and Disc [discriminator] losses correspond to the instant first and second error information), ..., the first error information being ... loss used to train the generator to deceive the discriminator (Clark, p. 
3, 2 Method: "For a given position t, the discriminator predicts whether the token x_t is 'real', i.e., that it comes from the data rather than the generator distribution"), the second error information being obtained based on the discrimination result output by the discriminator and the correct answer information (Clark, p. 3, 2 Method: "For a given position t, the discriminator predicts whether the token x_t is 'real,' i.e., that it comes from the data rather than the generator distribution with a sigmoid output layer: D(x, t) = sigmoid(w^T h_D(x)_t) ... [T]he loss functions are L_Disc(x, θ_D) = E[ Σ_{t=1}^{n} −1(x_t^corrupt = x_t) log D(x^corrupt, t) − 1(x_t^corrupt ≠ x_t) log(1 − D(x^corrupt, t)) ]," where Clark's D(x, t) corresponds to the instant discrimination result).

Clark teaches the first error information being loss used to train the generator to deceive the discriminator. Clark does not explicitly teach the first error information being obtained based on the second input data generated by the generator and the discrimination result output by the discriminator, the first error information being an adversarial loss. However, Chen teaches:

the first error information being obtained based on the second input data generated by the generator and the discrimination result output by the discriminator (Chen, p. 3, Figure 1, "Our filtering mechanism versus GAN. A filter S_ω(x) is added to determine the acceptance probability of a sample which is generated by G_θ. This acceptance probability is computed according to feedback signals from discriminator D_φ," depicting the generator G_f comprising the filter S_ω as a component, with generator output p_(ω,θ) being input to discriminator D_φ, and filter S_ω is parameterized by the discriminator result according to p. 4, 2.4 Implement the Filter by Sampling, Eq. 
7, and where the filter has learned parameters ω, as in "S_ω is a learnable function (a neural network)"), the first error information being an adversarial loss (Chen, p. 4, 2.4 Implement the Filter by Sampling: "Within this GAN framework, both the discriminator D_φ and the filter S_ω are updated in adversarial learning").

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Clark regarding the first error information being loss used to train the generator to deceive the discriminator with those of Chen regarding the first error information being obtained based on the second input data generated by the generator and the discrimination result output by the discriminator, the first error information being an adversarial loss. The motivation to do so would be to facilitate producing training results of the generator of the GAN with improved quality and diversity (Chen, p. 1, Abstract: "The autoregressive language model (ALM) trained with maximum likelihood estimation (MLE) is widely used in unconditional text generation. Due to exposure bias, the generated texts still suffer from low quality and diversity. ... To alleviate the exposure bias, generative adversarial networks (GAN) use the discriminator to update the generator’s parameters directly, but they fail by being evaluated precisely. ... We propose a novel mechanism by adding a filter which has the same input as the discriminator. ... Thus, the original generative distribution is revised to reduce the discrepancy").

Regarding Claim 8, Clark teaches:

A machine learning method ... of training a machine learning model being a neural network model including a generator and a discriminator (Clark, p. 1, 1 Introduction: "Instead of masking, our method corrupts the input by replacing some tokens with samples from a proposal distribution, which is typically the output of a small masked language model. ... 
We then pre-train the network as a discriminator that predicts for every token whether it is an original or a replacement," where Clark's masked language model corresponds to the instant generator) implemented by a computer (Clark, p. 2, 1 Introduction: "We call our approach ELECTRA for 'Efficiently Learning an Encoder that Classifies Token Replacements Accurately.' ... we build an ELECTRA-Small model that can be trained on 1 GPU in 4 days" and footnote 1: "Code and pre-trained weights will be released at https://github.com/google-research/electra") ..., the machine learning method comprising: precisely those steps recited by Claim 1.

Claim 8 is rejected under the same rationale as Claim 1.

Regarding Claim 9, Clark teaches:

An information processing device of training a machine learning model ..., the information processing device comprising: a memory; and a processor coupled to the memory (Clark, p. 2, 1 Introduction: "We call our approach ELECTRA for 'Efficiently Learning an Encoder that Classifies Token Replacements Accurately.' ... we build an ELECTRA-Small model that can be trained on 1 GPU in 4 days" and footnote 1: "Code and pre-trained weights will be released at https://github.com/google-research/electra," where a non-transitory medium is inherent in storing the program from the referenced repository used for training and evaluation) ... being a neural network model including a generator and a discriminator (Clark, p. 1, 1 Introduction: "Instead of masking, our method corrupts the input by replacing some tokens with samples from a proposal distribution, which is typically the output of a small masked language model. ... We then pre-train the network as a discriminator that predicts for every token whether it is an original or a replacement," where Clark's masked language model corresponds to the instant generator), the processor being configured to perform training processing comprising: precisely those steps recited by Claim 1. 
Claim 9 is rejected under the same rationale as Claim 1.

Regarding Claim 2, the rejection of Claim 1 is incorporated. The Clark/Chen combination teaches:

wherein the first input data is first document data including a plurality of words (Clark, p. 4, 3.1 Experimental Setup: "For most experiments we pre-train on the same data as BERT, which consists of 3.3 Billion tokens from Wikipedia and BooksCorpus .... However, for our Large model we pre-trained on the data used for XLNet," where Clark's pre-training includes the input data to the generator, corresponding to the instant first input data, as in p. 4, 2 Method: "After pre-training, we throw out the generator");

the generator of the machine learning model generates, as the second input data, second document data in which some words in first document data are replaced with other words, in response to an input of the first document data (Clark, p. 3, 2 Method: "The generator is trained to perform masked language modeling (MLM). Given an input x = [x_1, …, x_n], MLM first selects a random set of positions (integers between 1 and n) to mask out m = [m_1, …, m_k]. The tokens in the selected positions are replaced with a [MASK] token... The generator then learns to predict the original identities of the masked-out tokens. ... [I]f the generator happens to generate the correct token, that token is considered 'real' instead of 'fake'," where Clark's input x after randomly masking tokens corresponds to the instant first document data and input x with predicted tokens corresponds to the instant second document data); and

the discriminator of the machine learning model outputs, as the discrimination result, a result of executing discrimination as to whether each of the words in the second document data is any of the words replaced by the generator, in response to an input of the second document data generated by the generator (Clark, p. 
3, Figure 2: "An overview of replaced token detection," depicting original, masked, predicted, and discriminated tokens, and p. 3, 2 Method: "The generator then learns to predict the original identities of the masked-out tokens. The discriminator is trained to distinguish tokens in the data from tokens that have been replaced by generator samples," where Clark's tokens replaced by generator samples correspond to the instant second document data).

Regarding Claim 6, the rejection of Claim 1 is incorporated. The Clark/Chen combination teaches:

the program further including instructions for causing the computer to execute a fine-tuning process (Clark, p. 4, 2 Method: "After pre-training, we throw out the generator and fine-tune the discriminator on downstream tasks"), the fine-tuning process comprising:

inputting supervised training data, which includes the correct answer information, to the discriminator of the machine learning model on which the training process has been executed (Clark, p. 14, B Fine-Tuning Details: "For WNLI, we follow the trick ... where we extract candidate antecedents for the pronoun using rules and train a model to score the correct antecedent highly. ... [W]e fine-tune ELECTRA’s discriminator so it assigns high scores to the tokens of the correct antecedent when the correct antecedent replaces the pronoun," where Clark's tokens of the correct antecedent correspond to the instant supervised training data including correct answers); and

updating parameters of the discriminator (Clark, p. 14, B Fine-Tuning Details: "For WNLI, we ... train a model to score the correct antecedent highly. ... [W]e fine-tune ELECTRA’s discriminator so it assigns high scores," where Clark's fine-tuning by training reasonably suggests updating discriminator model parameters, as in L_Disc(x, θ_D)) such that an error ... is minimized ... (Clark, p. 4, 2 Method: "We minimize the combined loss ... over a large corpus X of raw text. ... 
After pre-training, we throw out the generator and fine-tune the discriminator on downstream tasks," reasonably suggesting that the discriminator is trained according to L_Disc(x, θ_D), rather than jointly, during fine-tuning) between the discrimination result output by the discriminator in response to an input of the supervised training data and the correct answer information (Clark, p. 14, B Fine-Tuning Details: "For WNLI, we follow the trick ... where we extract candidate antecedents for the pronoun using rules and train a model to score the correct antecedent highly. ... [W]e fine-tune ELECTRA’s discriminator so it assigns high scores to the tokens of the correct antecedent when the correct antecedent replaces the pronoun," where Clark's correct antecedent corresponds to the correct answer information).

Regarding Claim 7, the rejection of Claim 6 is incorporated. The Clark/Chen combination teaches:

the program further including instructions for causing the computer to execute an operation process, the operation process comprising:

inputting target document data, which is targeted for discrimination and contains a plurality of words, to the discriminator of the machine learning model (Clark, p. 4, 3 Experiments, 3.1 Experimental Setup: "We evaluate on the General Language Understanding Evaluation (GLUE) benchmark... ¶ ... For fine-tuning on GLUE, we add simple linear classifiers on top of ELECTRA. ... Some of our evaluation datasets are small, which means accuracies of fine-tuned models can vary substantially depending on the random seed," where Clark's evaluation dataset corresponds to the instant target document data) trained by using the supervised training data in the fine-tuning process (Clark, p. 4, 2 Method: "After pre-training, we throw out the generator and fine-tune the discriminator on downstream tasks" and p. 14, B Fine-Tuning Details: "For WNLI, we follow the trick ... 
where we extract candidate antecedents for the pronoun using rules and train a model to score the correct antecedent highly. ... [W]e fine-tune ELECTRA’s discriminator so it assigns high scores to the tokens of the correct antecedent when the correct antecedent replaces the pronoun," where Clark's correct antecedent scores correspond to the instant supervised training data); and

discriminating words that have been altered among the plurality of words in the target document data, based on an output result of the discriminator (Clark, p. 14, B Fine-Tuning Details: "For WNLI, we follow the trick ... where we extract candidate antecedents for the pronoun using rules and train a model to score the correct antecedent highly. ... [W]e fine-tune ELECTRA’s discriminator so it assigns high scores to the tokens of the correct antecedent when the correct antecedent replaces the pronoun," where Clark's replaced token scores correspond to the instant altered words).

Claims 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Clark, et al., "Electra: Pre-training text encoders as discriminators rather than generators" (hereinafter "Clark") in view of Chen, et al., "Adding a filter based on the discriminator to improve unconditional text generation" (hereinafter "Chen"), further in view of Zhu, et al., "Unpaired image-to-image translation using cycle-consistent adversarial networks" (hereinafter "Zhu").

Regarding Claim 3, the rejection of Claim 2 is incorporated. The Clark/Chen combination has been shown to teach:

the updating of the parameters includes updating the parameters of the generator, the discriminator ... in the machine learning model, by using the first error information, the second error information ... obtained based on the first document data ..., to minimize a total loss derived from the first error information, the second error information (as recited in the rejection of Claim 1, Clark, p. 
4, 2 Method: "We minimize the combined loss min_{θ_G, θ_D} Σ_{x∈X} [ L_MLM(x, θ_G) + λ L_Disc(x, θ_D) ] over a large corpus X of raw text," where Clark's model parameters θ_G and θ_D and losses correspond to the instant generator and discriminator parameters and error information, respectively).

The Clark/Chen combination teaches a machine learning model that includes a generator and a discriminator and executing training of the machine learning model by using first error information and second error information. The Clark/Chen combination does not explicitly teach wherein the machine learning model further includes: a restorer that generates third document data obtained to restore the first document data, in response to an input of the second document data generated by the generator and the updating of the parameters includes updating the parameters of ... the restorer in the machine learning model, by using ... third error information obtained based on ... the third document data generated by the restorer, to minimize a total loss derived from ... the third error information. However, Zhu teaches:

wherein the machine learning model further includes: a restorer that generates third document data obtained to restore the first document data, in response to an input of the second document data generated by the generator (Zhu, p. 4, 3.2. Cycle Consistency Loss: "for each image x from domain X, the image translation cycle should be able to bring x back to the original image, i.e., x → G(x) → F(G(x)) ≈ x. We call this forward cycle consistency. ... We incentivize this behavior using a cycle consistency loss", where Zhu's F corresponds to the instant restorer, G(x) corresponds to the instant second document data, and F(G(x)) ≈ x corresponds to the instant third document data), and the updating of the parameters includes updating the parameters of ... the restorer in the machine learning model, by using ... third error information obtained based on ... 
the third document data generated by the restorer, to minimize a total loss derived from ... the third error information (Zhu, p. 4, 3.3. Full Objective: "We aim to solve: G*, F* = arg min_{G,F} max_{D_X, D_Y} L(G, F, D_X, D_Y) (4) ... Such a setup can also be seen as a special case of 'adversarial autoencoders' [34], which use an adversarial loss to train the bottleneck layer of an autoencoder to match an arbitrary target distribution," where the minimization min_{G,F} applies to Zhu's cycle consistency loss term of Eq. 3, L_cyc(G, F), which corresponds to the instant third error information).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Clark regarding a machine learning model that includes a generator and a discriminator with those of Zhu regarding wherein the machine learning model further includes: a restorer that generates third document data obtained to restore the first document data, in response to an input of the second document data generated by the generator and the updating of the parameters includes updating the parameters of ... the restorer in the machine learning model, by using ... third error information obtained based on ... the third document data generated by the restorer, to minimize a total loss derived from ... the third error information. The motivation to do so would be to facilitate more efficient training by reducing the space of learned functions (Zhu, p. 4, 3.2. Cycle Consistency Loss: "with large enough capacity, a network can map the same set of input images to any random permutation of images in the target domain, where any of the learned mappings can induce an output distribution that matches the target distribution. Thus, adversarial losses alone cannot guarantee that the learned function can map an individual input x_i to a desired output y_i. 
To further reduce the space of possible mapping functions, we argue that the learned mapping functions should be cycle-consistent"). Regarding Claim 4, the rejection of Claim 3 is incorporated. The Clark/Chen/Zhu combination teaches: the training process further comprising: generating, as the first error information, error information that uses a first loss function configured to train the generator such that the second document data is not discriminated by the discriminator (Clark, p. 3, 2 Method: "Given an input x ..., model inputs are constructed according to m_i ~ unif{1, n} for i = 1 to k; x̂_i ~ p_G(x_i | x^masked) for i ∈ m; x^masked = REPLACE(x, m, [MASK]); x^corrupt = REPLACE(x, m, x̂); and the loss functions are L_MLM(x, θ_G) = E[ Σ_{i∈m} −log p_G(x_i | x^masked) ]," where Clark's L_MLM corresponds to the instant first error information); generating, as the second error information, error information that uses a second loss function configured to train the discriminator such that an error between the discrimination result and the correct answer information becomes smaller (Clark, p. 3, 2 Method: "Given an input x ..., model inputs are constructed according to m_i ~ unif{1, n} for i = 1 to k; x̂_i ~ p_G(x_i | x^masked) for i ∈ m; x^masked = REPLACE(x, m, [MASK]); x^corrupt = REPLACE(x, m, x̂); and the loss functions are ... L_Disc(x, θ_D) = E[ Σ_{t=1}^{n} −1(x_t^corrupt = x_t)·log D(x^corrupt, t) − 1(x_t^corrupt ≠ x_t)·log(1 − D(x^corrupt, t)) ]" and p. 4, 2 Method: "We minimize the combined loss min_{θ_G, θ_D} Σ_{x∈X} [ L_MLM(x, θ_G) + λ·L_Disc(x, θ_D) ] over a large corpus X of raw text," where Clark's jointly minimized discriminator loss L_Disc(x, θ_D) is trained according to the correct answer information given by the indicator x_t^corrupt ≠ x_t).
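For context on the objective quoted from Clark above, the following is a minimal numerical sketch of how L_MLM(x, θ_G) + λ·L_Disc(x, θ_D) combines generator and discriminator error information. All function names and probability values here are illustrative assumptions, not Clark's implementation; λ = 50 follows the weighting reported in the ELECTRA paper.

```python
import math

def l_mlm(gen_probs_at_masked):
    # Generator loss: sum over masked positions i of -log p_G(x_i | x_masked)
    return sum(-math.log(p) for p in gen_probs_at_masked)

def l_disc(disc_scores, is_original):
    # Discriminator loss over all n positions: for tokens the generator left
    # unchanged (is_original=True) penalize -log D; for replaced tokens
    # penalize -log(1 - D), i.e. binary cross-entropy on "was this replaced?"
    total = 0.0
    for d, same in zip(disc_scores, is_original):
        total += -math.log(d) if same else -math.log(1.0 - d)
    return total

def combined_loss(gen_probs_at_masked, disc_scores, is_original, lam=50.0):
    # The jointly minimized objective quoted from Clark:
    #   min over (theta_G, theta_D) of  L_MLM + lambda * L_Disc
    return l_mlm(gen_probs_at_masked) + lam * l_disc(disc_scores, is_original)

# Toy sequence of n=4 tokens with k=2 masked positions; position 2 was
# replaced by the generator (is_original=False there).
loss = combined_loss(
    gen_probs_at_masked=[0.7, 0.4],
    disc_scores=[0.9, 0.8, 0.6, 0.7],
    is_original=[True, True, False, True],
)
```

Here D(x^corrupt, t) is read as the modeled probability that the token at position t is the original one, matching the indicator structure in Clark's L_Disc.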
Zhu further teaches: generating, as the third error information, error information that uses a third loss function configured to train the restorer such that an error between the first document data and the third document data becomes smaller (Zhu, p. 4, 3.2. Cycle Consistency Loss: "We incentivize this behavior using a cycle consistency loss: L_cyc(G, F) = E_{x∼p_data(x)}[ ‖F(G(x)) − x‖_1 ] + E_{y∼p_data(y)}[ ‖G(F(y)) − y‖_1 ] (2)," and p. 4, 3.3. Full Objective: "Our full objective is L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ·L_cyc(G, F), where λ controls the relative importance of the two objectives. We aim to solve: G*, F* = arg min_{G,F} max_{D_X,D_Y} L(G, F, D_X, D_Y) (4) Notice that our model can be viewed as training two 'autoencoders'," where the term min_{G,F} from Zhu's objective corresponds to the instant training of the restorer such that an error between the first document data and the third document data becomes smaller, and Zhu's F corresponds to the instant restorer). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Clark/Chen/Zhu combination regarding generating, as the first error information, error information that uses a first loss function configured to train the generator such that the second document data is not discriminated by the discriminator, and generating, as the second error information, error information that uses a second loss function configured to train the discriminator such that an error between the discrimination result and the correct answer information becomes smaller with the further teachings of Zhu regarding generating, as the third error information, error information that uses a third loss function configured to train the restorer such that an error between the first document data and the third document data becomes smaller.
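The cycle-consistency mechanism the rejection maps onto the claimed restorer can be sketched in a few lines. This is a toy illustration of Zhu's Eq. 2 under assumed stand-in mappings G and F, not the CycleGAN networks themselves; λ = 10 follows the weighting used in Zhu's experiments.

```python
def l1(a, b):
    # L1 norm ||a - b||_1 between two equal-length vectors
    return sum(abs(p - q) for p, q in zip(a, b))

def cycle_loss(G, F, xs, ys):
    # Zhu Eq. 2: E_x ||F(G(x)) - x||_1 + E_y ||G(F(y)) - y||_1,
    # with expectations approximated by sample means
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward

def full_objective(gan_loss_G, gan_loss_F, cyc, lam=10.0):
    # Zhu's full objective: L_GAN(G, D_Y) + L_GAN(F, D_X) + lambda * L_cyc;
    # generators minimize this while discriminators maximize the GAN terms
    return gan_loss_G + gan_loss_F + lam * cyc

# Toy "translator" G doubles each coordinate; the restorer F halves it,
# so the cycle F(G(x)) reproduces x exactly and the third error is zero.
G = lambda v: [2.0 * p for p in v]
F = lambda v: [p / 2.0 for p in v]
xs = [[1.0, 2.0], [3.0, 4.0]]
ys = [[2.0, 2.0]]
assert cycle_loss(G, F, xs, ys) == 0.0
```

A lossy restorer (e.g. one that adds a constant offset) yields a strictly positive L_cyc, which is the "third error information" the rejection reads onto Zhu.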
The motivation to do so would be to facilitate more efficient training by reducing the space of learned functions (Zhu, p. 4, 3.2. Cycle Consistency Loss: "with large enough capacity, a network can map the same set of input images to any random permutation of images in the target domain, where any of the learned mappings can induce an output distribution that matches the target distribution. Thus, adversarial losses alone cannot guarantee that the learned function can map an individual input x_i to a desired output y_i. To further reduce the space of possible mapping functions, we argue that the learned mapping functions should be cycle-consistent"). Conclusion THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT N DAY whose telephone number is (703)756-1519. The examiner can normally be reached M-F 9-5. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /R.N.D./Examiner, Art Unit 2122 /KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122

Prosecution Timeline

Sep 06, 2022
Application Filed
Jul 21, 2025
Non-Final Rejection — §101, §103
Oct 27, 2025
Response Filed
Feb 06, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12406181
METHOD, DEVICE, AND COMPUTER PROGRAM PRODUCT FOR UPDATING MODEL
2y 5m to grant Granted Sep 02, 2025
Patent 12229685
MODEL SUITABILITY COEFFICIENTS BASED ON GENERATIVE ADVERSARIAL NETWORKS AND ACTIVATION MAPS
2y 5m to grant Granted Feb 18, 2025
Study what changed to get past this examiner. Based on 2 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
23%
Grant Probability
46%
With Interview (+23.2%)
4y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 22 resolved cases by this examiner. Grant probability derived from career allow rate.
