Prosecution Insights
Last updated: April 19, 2026
Application No. 18/348,759

COMPUTER-READABLE RECORDING MEDIUM STORING MACHINE LEARNING PROGRAM, MACHINE LEARNING METHOD, AND INFORMATION PROCESSING APPARATUS FOR PERFORMING EFFICIENT TRAINING ON LANGUAGE MODEL BY INCORPORATING NON-FUNCTIONAL PERFORMANCE IN LOSS FUNCTION

Non-Final OA §103
Filed: Jul 07, 2023
Examiner: WITHEY, THEODORE JOHN
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Fujitsu Limited
OA Round: 3 (Non-Final)
Grant Probability: 44% (Moderate)
OA Rounds: 3-4
To Grant: 2y 11m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 44% (grants 44% of resolved cases; 10 granted / 23 resolved; -18.5% vs TC avg)
Interview Lift: +46.9% for resolved cases with interview (strong)
Typical Timeline: 2y 11m avg prosecution; 39 currently pending
Career History: 62 total applications across all art units

Statute-Specific Performance

§101: 22.0% (-18.0% vs TC avg)
§103: 48.6% (+8.6% vs TC avg)
§102: 17.1% (-22.9% vs TC avg)
§112: 12.0% (-28.0% vs TC avg)
Tech Center averages are estimates • Based on career data from 23 resolved cases

Office Action

§103
DETAILED ACTION

This office action is in response to Applicant’s Request for Continued Examination (RCE), received on 01/20/2026. Claims 1, 6, and 7 have been amended. Claims 1, 3-7 are pending and have been considered.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/20/2026 has been entered.

Response to Arguments

Regarding Applicant’s statement concerning foreign priority (see remarks entered 08/25/2025), there is no entry dated 01/06/2024 (or any other date) corresponding to a retrieved foreign priority document. The examiner notes that Applicant makes reference to PAIR, which was retired before the requested retrieval date of 01/06/2024. The office attempted to retrieve the foreign priority document through an internal request by the examiners, but was unable to complete the retrieval without a required access code associated with the foreign document. As such, foreign priority acknowledgement is still made; however, a certified copy of the foreign priority document has still not been filed or retrieved.

Applicant’s arguments, see pgs. 6-12, filed 01/20/2026, with respect to “Claim Rejections Under 35 U.S.C. 101” have been fully considered and are persuasive. The rejections of claims 1, 3-7 under 35 U.S.C. 101 have been withdrawn. The examiner notes that the amended claims have incorporated a cited improvement, namely, training/updating a machine learning language model parameter set based on a loss function which considers non-functional performance of the language model, avoiding prolonged development times due to repeated generation and training (see [0041]-[0042] of instant app).

Applicant’s arguments, see pgs. 12-16, filed 01/20/2026, with respect to the rejection(s) of claim(s) 1, 6, and 7 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Rafferty et al. (US-20210297376-A1), hereinafter Rafferty. Rafferty discloses “Systems and methods for processing user data are disclosed. A system may include a memory storing instructions and at least one processor configured to execute the instructions to perform operations. The operations may include receiving, from a sensor, first user data associated with a client device; receiving, from a filter model, feature data corresponding to the first user data, the feature data comprising at least one of workflow information, system messages, or email addresses; training a meta-model to predict the first user data based on the feature data; generating a meta-model output based on the filter model and the feature data; updating the filter model based on the meta-model output; and transmitting the updated filter model to the client device.” (abstract). See updated rejections below.
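The improvement the examiner credits here, updating language-model parameters with a loss that folds in a measured non-functional performance, can be pictured with a short sketch. This is a minimal illustration only: the function names and the ratio-style weight are assumptions for this page, not the applicant's actual implementation or any reference's code.

import torch
import torch.nn as nn

# Hypothetical sketch: combine the usual prediction loss (correct-answer
# data vs. prediction) with a weight derived from a measured non-functional
# performance value, e.g. execution time of a generated program.
cross_entropy = nn.CrossEntropyLoss()

def weighted_loss(logits, targets, measured_perf, target_perf):
    """logits: (batch, vocab) predictions for the second portion.
    targets: (batch,) token ids of the correct-answer data.
    measured_perf, target_perf: non-functional measurement and its goal
    (assumed units: seconds)."""
    loss_term = cross_entropy(logits, targets)   # functional (prediction) loss
    weight = measured_perf / target_perf         # non-functional weight term
    return weight * loss_term

# Toy usage: a sample measured at 50 s against a 35 s target is weighted
# more heavily, steering updates toward better non-functional performance.
logits = torch.randn(4, 100, requires_grad=True)
targets = torch.randint(0, 100, (4,))
loss = weighted_loss(logits, targets, measured_perf=50.0, target_perf=35.0)
loss.backward()  # gradients flow to parameters as in ordinary training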
Priority

Acknowledgment is made of applicant's claim for foreign priority based on an application filed in Japan on 07/14/2022. It is noted, however, that applicant has not filed a certified copy of the JP2022-113423 application as required by 37 CFR 1.55.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 3-7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lester et al. (US-20230325725-A1), hereinafter Lester, in view of Rafferty et al. (US-20210297376-A1), hereinafter Rafferty, further in view of Peleg et al. (US-20220198136-A1), hereinafter Peleg.

Regarding claim 1, Lester discloses: a non-transitory computer-readable recording medium ([0066] one or more non-transitory computer-readable storage mediums) storing a machine learning program ([0095] Each application contains its own machine learning library and machine-learned model(s)) of performing training on a language model that is a machine learning model having at least a plurality of parameters to be trained on a machine learning processing using, as a training data set, a corpus that is language resources ([0028] a large pre-trained language model, [0079] the model trainer 160 can train the pre-trained machine-learned models, [A large language model is machine learning having a plurality, i.e. large amount, of parameters, wherein a language model is indicated to be trained on language]), the machine learning program comprising instructions which, when executed by a computer ([0072] instructions 138 which are executed by the processor 132 to cause the server computing system 130 to perform operations), cause the computer to execute processing comprising: measuring, for each of a plurality of pieces of data included in the corpus ([0079] the training data 162 can include a plurality of training examples and a plurality of respective labels, [0085] machine-learned model(s) can process the text or natural language data to generate an output), a non-functional performance that represents a performance for a requirement that excludes a function of each of the plurality of pieces of data ([0031] the prompt gradient can be determined by evaluating a loss function that is evaluated based on a difference between the training output and the one or more training labels. The loss function can include a perceptual loss or another loss function. In some implementations, the labels can include ground truth outputs for the respective training examples [Determining perceptual loss, i.e. accuracy, between training examples and ground truth output indicates a non-functional performance measure, see [0057] of the instant application, wherein prediction accuracy excludes functionality, i.e. masking or other prediction method. The examiner asserts that this is ]).

Lester does not disclose: the non-functional performance excluding objective accuracy for prediction task.

Rafferty discloses: the non-functional performance excluding objective accuracy for prediction task ([0061] A training criterion may include a number of epochs, a training time, [A training time tracks to a non-functional performance, see [0022] of instant application “program execution speed”, wherein the training is in the context of prediction, see abstract]).

Lester and Rafferty are considered analogous art within user data prediction. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lester to incorporate the teachings of Rafferty, because of the novel way to develop a filtering management system configured to determine an optimal time to deliver incoming notifications based on the intensity of the current user’s task, improving the accuracy and applicability of the filtering management system (Rafferty, [0007]).

Lester in view of Rafferty does not disclose: performing, based on using divided data obtained by dividing each of the plurality of pieces of data into a first portion of the data and a second portion that is correct answer data as training data.

Peleg discloses: performing, based on using divided data obtained by dividing each of the plurality of pieces of data into a first portion of the data and a second portion that is correct answer data as training data ([0129] In order to synthesize text from at least the first and second text passages, the writing assistant may change the order of content in the text passages, merge sentences, split sentences… [0158] Such capabilities may be provided by training a model to predict text within a document from a large corpus conditioned upon the preceding text [In view of the training examples and ground truth outputs for text prediction of Lester, indicating that the text splitting of Peleg could be used to determine accuracy of predictions using the second portion of split sentences based on preceding, i.e. first portion, as correct answer data in view of the loss function comparing ground truth to expected output for predictions of Lester which indicates a correct answer data to compare the prediction to for determining accuracy]).

Lester, Rafferty, and Peleg are considered analogous art within textual prediction analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lester in view of Rafferty to incorporate the teachings of Peleg, because of the novel way to generate text predictions based on context and/or word sense in addition to just words themselves, improving the rate of meaningful text prediction generation (Peleg, [0002]).
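The divided-data limitation mapped to Peleg, splitting each piece of corpus data into a first portion used as model input and a second portion used as correct-answer data, amounts to a simple preprocessing step. Below is a minimal sketch; the function name and the word-level split point are assumptions for illustration, and real splitting strategies (sentence boundaries, token-level masking) vary by reference.

def divide_corpus(corpus, split_ratio=0.5):
    """Split each piece of corpus data into (first_portion, second_portion);
    the second portion serves as the correct-answer data. Hypothetical
    sketch -- the split-point choice is illustrative only."""
    pairs = []
    for text in corpus:
        words = text.split()
        cut = max(1, int(len(words) * split_ratio))
        first_portion = " ".join(words[:cut])    # model input
        second_portion = " ".join(words[cut:])   # prediction target
        pairs.append((first_portion, second_portion))
    return pairs

# e.g. "the sky is blue" -> ("the sky", "is blue"): the model is trained to
# predict the second portion given the first, as in Lester's [0030] example.
print(divide_corpus(["the sky is blue"]))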
Lester further discloses: the machine learning processing on the language model to predict the second portion of the data in response to an input of the first portion of the data ([0030] the pre-trained machine-learned model can include a model adapted to generate a text prediction output for text that follows an input text (e.g., the input text can include “the sky is” and the output can be “blue”). Alternatively and/or additionally, the pre-trained machine-learned model may have been trained with text masking (e.g., the input text can include “The man old” and the output can be “is”) [In view of the text splitting of Peleg, further in view of the ground truth outputs and training examples of Lester, this indicates a system that is predicting ends of sentences, i.e. second portions of data, based on opening statements, i.e. first portions of data, using the text splitting of Peleg, wherein a ground truth output of Lester would be “the sky is blue” or “blue” for the split segment “the sky is” as would be determined in Peleg. Further, in view of the loss function of Lester ([0054]), there is an indication that there is an original undivided sentence so loss between target and prediction can be determined, further indicating division at some point to make the prediction]), wherein the machine learning processing includes updating the plurality of parameters included in the language model based on using a loss function ([0077] a loss can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function)), the loss function includes: as a loss term, a difference between the correct answer data and a prediction result ([0054] The model's prediction can be compared to the target to calculate a loss, and the error can be back-propagated to calculate gradients, however the system may only apply these gradient updates to our new learnable vectors [Wherein a target tracks to correct answer data in view of the ground truth outputs for training examples disclosed in [0031], [0117]]).

Rafferty further discloses: as a weight term, a parameter determined according to a measurement result of the non-functional performance that is the parameter that indicates a ratio of reflecting the non-functional performance in the language model ([0061] In some embodiments, training of a model may terminate when a training criterion is satisfied. A training criterion may include… a performance metric (e.g., an estimate of accuracy in reproducing test data, a loss function metric), or the like. Model trainer 436 may be configured to adjust model parameters during training. Model parameters may include weights, coefficients, offsets, or the like, [In view of the previously disclosed training time criterion of Rafferty, which is a measure of non-functional performance. Considering the plurality of training criteria disclosed in Rafferty, in view of the loss function of Lester which can include a variety of losses ([0035]), each disclosed training criterion of Rafferty could be implemented using the multi-level loss function of Lester. Further, representing non-functional performance as a ratio reflecting the non-functional performance, i.e. time, is a form of loss function, i.e. 50/35, 50 seconds taken over the target criterion 35 seconds, giving a loss of 15 seconds. The ratio will not be satisfied until the training criterion is. Further, representing time as something which “indicates a ratio” is a generic configuration which does not add patentable weight to the claims and does not necessarily have to be a ratio itself]).

Regarding claim 3, Lester in view of Rafferty, further in view of Peleg discloses: the non-transitory computer-readable recording medium of claim 1. Lester further discloses: wherein as the loss term of the loss function for the machine learning processing, the loss term based on an appearance probability of a superficial character of each of the plurality of pieces of data is used ([0035] the prompt training can include training the model conditioned by the prompt to output the most probable label. Training can involve a perceptual loss and/or a variety of other losses… [0134] Instead of modeling classification as the probability of an output class given some input, p(y|X), where X is a series of tokens and y is a single class label, the systems and methods can model the function as conditional generation, where Y is a sequence of tokens that represent a class label [Training based on probabilities of predicted labels, wherein training also considers loss (consider the loss function between training output and training example labels [0117]), indicates a loss term to be minimized based on probability of appearance, i.e. superficially, of a label and the associated loss based on accuracy/probability of the selected label, comprising word(s), e.g. comprised of superficial characters, in a prediction, in view of the plurality of pieces of data of Lester as previously disclosed. Further, determining a most probable label indicates analysis of words and/or characters (superficially) of preceding text to know what is most probable to come next]).

Regarding claim 4, Lester in view of Rafferty, further in view of Peleg discloses: the non-transitory computer-readable recording medium according to claim 1. Lester further discloses: wherein the measuring measures, for each of a plurality of programs ([0037] The prompt database can include a plurality of prompts associated with a plurality of different tasks [Differing prompts and associated tasks indicates the programs are different, i.e. the outputs will be vastly different dependent on the prompt/task]), the non-functional performance that excludes a function that defines an operation of each of the plurality of programs ([0031] the prompt gradient can be determined by evaluating a loss function that is evaluated based on a difference between the training output and the one or more training labels. The loss function can include a perceptual loss or another loss function. In some implementations, the labels can include ground truth outputs for the respective training examples [Determining perceptual loss, i.e. accuracy, between training examples and ground truth output indicates a non-functional performance measure, see [0057] of the instant application, wherein prediction accuracy excludes functionality, i.e. masking or other prediction method]), and, the executing the machine learning processing executes, through machine learning that uses divided data obtained by dividing each of the plurality of programs into a head portion and a subsequent portion that is correct answer data as training data ([Fig. 1A, Training Computing System 150, Data 156], [0030] In some implementations, the pre-trained machine-learned model can include a model adapted to generate a text prediction output for text that follows an input text (e.g., the input text can include “the sky is” and the output can be “blue”) … [0054] The model's prediction can be compared to the target to calculate a loss [Wherein “the sky is” represents a head portion and “blue” represents a subsequent portion. In view of the target/prediction comparison to calculate loss, there is an indication that “the sky is blue” is an original training data divided with “blue” removed as a correct answer target so a loss between the prediction and target can be accurately determined. Further, consider the previously disclosed text splitting of Peleg in view of the target/predictions of Lester]), machine learning processing of training the language model that predicts the subsequent portion of the program according to an input of the head portion of the program ([0030] the input text can include “The man old” and the output can be “is” [In this example, the input head portion represents “The man old” with a predicted subsequent portion “is”, wherein Lester further discloses predicting additional text, [0112]]).

Regarding claim 5, Lester in view of Rafferty, further in view of Peleg discloses: the non-transitory computer-readable recording medium according to claim 1. Lester further discloses: wherein the measuring measures, for each of a plurality of pieces of document data ([0037] The prompt database can include a plurality of prompts associated with a plurality of different tasks [Differing prompts and associated tasks indicates the documents/text are different, i.e. the outputs will be vastly different dependent on the prompt/task]), the non-functional performance that indicates evaluation for an indirect function from a direct function of each of the plurality of pieces of document data ([0031] the prompt gradient can be determined by evaluating a loss function that is evaluated based on a difference between the training output and the one or more training labels. The loss function can include a perceptual loss or another loss function. In some implementations, the labels can include ground truth outputs for the respective training examples [Determining perceptual loss, i.e. accuracy, between training examples and ground truth output indicates a non-functional performance measure, see [0057] of the instant application, wherein prediction accuracy is an indirect function based on the direct function of sentence/word masking/prediction/etc.]), that excludes a function that defines a direct usage when each of the plurality of pieces of document data is used ([0217] The prediction can then be used in order to evaluate a loss function (e.g., the loss function may be evaluated by comparing the prediction and a respective label for the example [Determining loss based solely on prediction accuracy indicates the direct usage, i.e. word/sentence masking/predicting method, is excluded from the measuring]); and, the executing the machine learning processing executes, through machine learning that uses divided data obtained by dividing each of the plurality of pieces of document data into the first portion and the second portion that is correct answer data as training data ([Fig. 1A, Training Computing System 150, Data 156], [0030] In some implementations, the pre-trained machine-learned model can include a model adapted to generate a text prediction output for text that follows an input text (e.g., the input text can include “the sky is” and the output can be “blue”) … [0054] The model's prediction can be compared to the target to calculate a loss [Wherein “the sky is” represents a first portion and “blue” represents a second portion. In view of the target/prediction comparison to calculate loss, there is an indication that “the sky is blue” is an original training data divided with “blue” removed as a correct target answer so a loss between the prediction and target can be accurately determined. Further, consider the previously disclosed text splitting of Peleg in view of the target/predictions of Lester]), machine learning processing of training the language model that predicts the second portion according to an input of the first portion of the document data ([0030] the input text can include “The man old” and the output can be “is” [In this example, the input first portion represents “The man old” with a predicted second portion “is”, wherein Lester further discloses predicting additional text, [0112]]).

Regarding claim 6, Lester discloses: a computer-implemented machine learning method ([0095] Each application contains its own machine learning library and machine-learned model(s), [Machine learning indicates a computer implementation as the machine]) of performing training on a language model that is a machine learning model having at least a plurality of parameters to be trained on a machine learning processing using ([0028] a large pre-trained language model), as a training data set, a corpus that is language resources, ([0079] the model trainer 160 can train the pre-trained machine-learned models, [A large language model is machine learning having a plurality, i.e. large amount, of parameters, wherein a language model is indicated to be trained on language]), the machine learning method comprising: measuring, for each of a plurality of pieces of data included in the corpus ([0079] the training data 162 can include a plurality of training examples and a plurality of respective labels, [0085] machine-learned model(s) can process the text or natural language data to generate an output), a non-functional performance that represents a performance for a requirement that excludes a function of each of the plurality of pieces of data ([0031] the prompt gradient can be determined by evaluating a loss function that is evaluated based on a difference between the training output and the one or more training labels. The loss function can include a perceptual loss or another loss function. In some implementations, the labels can include ground truth outputs for the respective training examples [Determining perceptual loss, i.e. accuracy, between training examples and ground truth output indicates a non-functional performance measure, see [0057] of the instant application, wherein prediction accuracy excludes functionality, i.e. masking or other prediction method. The examiner asserts that this is ]).

Lester does not disclose: the non-functional performance excluding objective accuracy for prediction task.

Rafferty discloses: the non-functional performance excluding objective accuracy for prediction task ([0061] A training criterion may include a number of epochs, a training time, [A training time tracks to a non-functional performance, see [0022] of instant application “program execution speed”, wherein the training is in the context of prediction, see abstract]).

Lester and Rafferty are considered analogous art within user data prediction. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lester to incorporate the teachings of Rafferty, because of the novel way to develop a filtering management system configured to determine an optimal time to deliver incoming notifications based on the intensity of the current user’s task, improving the accuracy and applicability of the filtering management system (Rafferty, [0007]).

Lester in view of Rafferty does not disclose: performing, based on using divided data obtained by dividing each of the plurality of pieces of data into a first portion of the data and a second portion that is correct answer data as training data.

Peleg discloses: performing, based on using divided data obtained by dividing each of the plurality of pieces of data into a first portion of the data and a second portion that is correct answer data as training data ([0129] In order to synthesize text from at least the first and second text passages, the writing assistant may change the order of content in the text passages, merge sentences, split sentences… [0158] Such capabilities may be provided by training a model to predict text within a document from a large corpus conditioned upon the preceding text [In view of the training examples and ground truth outputs for text prediction of Lester, indicating that the text splitting of Peleg could be used to determine accuracy of predictions using the second portion of split sentences based on preceding, i.e. first portion, as correct answer data in view of the loss function comparing ground truth to expected output for predictions of Lester which indicates a correct answer data to compare the prediction to for determining accuracy]).

Lester, Rafferty, and Peleg are considered analogous art within textual prediction analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lester in view of Rafferty to incorporate the teachings of Peleg, because of the novel way to generate text predictions based on context and/or word sense in addition to just words themselves, improving the rate of meaningful text prediction generation (Peleg, [0002]).

Lester further discloses: the machine learning processing on the language model to predict the second portion of the data in response to an input of the first portion of the data ([0030] the pre-trained machine-learned model can include a model adapted to generate a text prediction output for text that follows an input text (e.g., the input text can include “the sky is” and the output can be “blue”). Alternatively and/or additionally, the pre-trained machine-learned model may have been trained with text masking (e.g., the input text can include “The man old” and the output can be “is”) [In view of the text splitting of Peleg, further in view of the ground truth outputs and training examples of Lester, this indicates a system that is predicting ends of sentences, i.e. second portions of data, based on opening statements, i.e. first portions of data, using the text splitting of Peleg, wherein a ground truth output of Lester would be “the sky is blue” or “blue” for the split segment “the sky is” as would be determined in Peleg. Further, in view of the loss function of Lester ([0054]), there is an indication that there is an original undivided sentence so loss between target and prediction can be determined, further indicating division at some point to make the prediction]), wherein the machine learning processing includes updating the plurality of parameters included in the language model based on using a loss function ([0077] a loss can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function)), the loss function includes: as a loss term, a difference between the correct answer data and a prediction result ([0054] The model's prediction can be compared to the target to calculate a loss, and the error can be back-propagated to calculate gradients, however the system may only apply these gradient updates to our new learnable vectors [Wherein a target tracks to correct answer data in view of the ground truth outputs for training examples disclosed in [0031], [0117]]).

Rafferty further discloses: as a weight term, a parameter determined according to a measurement result of the non-functional performance that is the parameter that indicates a ratio of reflecting the non-functional performance in the language model ([0061] In some embodiments, training of a model may terminate when a training criterion is satisfied. A training criterion may include… a performance metric (e.g., an estimate of accuracy in reproducing test data, a loss function metric), or the like. Model trainer 436 may be configured to adjust model parameters during training. Model parameters may include weights, coefficients, offsets, or the like, [In view of the previously disclosed training time criterion of Rafferty, which is a measure of non-functional performance. Considering the plurality of training criteria disclosed in Rafferty, in view of the loss function of Lester which can include a variety of losses ([0035]), each disclosed training criterion of Rafferty could be implemented using the multi-level loss function of Lester. Further, representing non-functional performance as a ratio reflecting the non-functional performance, i.e. time, is a form of loss function, i.e. 50/35, 50 seconds taken over the target criterion 35 seconds, giving a loss of 15 seconds. The ratio will not be satisfied until the training criterion is. Further, representing time as something which “indicates a ratio” is a generic configuration which does not add patentable weight to the claims and does not necessarily have to be a ratio itself]).
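Written out symbolically, the loss structure recited in claims 1 and 6 (and again in claim 7 below), a prediction-loss term scaled by a weight that reflects the measured non-functional performance, could take a form like the following. The notation is illustrative only and is not drawn from the application or the references:

$$
\mathcal{L}(\theta) \;=\; \lambda(m)\,\ell\bigl(f_\theta(x_1),\, x_2\bigr),
\qquad
\lambda(m) \;=\; \frac{m}{m_{\mathrm{target}}}
$$

where $x_1$ is the first portion of a piece of data, $x_2$ is the correct-answer second portion, $f_\theta$ is the language model with parameters $\theta$, $\ell$ is the loss term (the difference between prediction and correct answer), and $m$ is the measured non-functional performance. Plugging in the examiner's example, a 50-second measurement against a 35-second criterion gives $\lambda = 50/35 \approx 1.43$, i.e. 15 seconds over target.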
Regarding claim 7, Lester discloses: an information processing apparatus (Abstract, Systems and methods for natural language processing) of performing training on a language model that is a machine learning model having at least a plurality of parameters to be trained on a machine learning processing ([0028] a large pre-trained language model, [Large language models necessarily have a plurality of parameters]) using, as a training data set, a corpus that is language resources, ([0079] the model trainer 160 can train the pre-trained machine-learned models), the information processing apparatus comprising: a memory ([0066] a memory 114); and, a processor coupled to the memory ([0066] one or more processors 112 [In view of the system of Fig. 1A, where the processors and memories are clearly coupled]) and configured to: measure, for each of a plurality of pieces of data included in the corpus ([0079] the training data 162 can include a plurality of training examples and a plurality of respective labels, [0085] machine-learned model(s) can process the text or natural language data to generate an output), a non-functional performance that represents a performance for a requirement that excludes a function of each of the plurality of pieces of data ([0031] the prompt gradient can be determined by evaluating a loss function that is evaluated based on a difference between the training output and the one or more training labels. The loss function can include a perceptual loss or another loss function. In some implementations, the labels can include ground truth outputs for the respective training examples [Determining perceptual loss, i.e. accuracy, between training examples and ground truth output indicates a non-functional performance measure, see [0057] of the instant application, wherein prediction accuracy excludes functionality, i.e. masking or other prediction method. The examiner asserts that this is ]).

Lester does not disclose: the non-functional performance excluding objective accuracy for prediction task.

Rafferty discloses: the non-functional performance excluding objective accuracy for prediction task ([0061] A training criterion may include a number of epochs, a training time, [A training time tracks to a non-functional performance, see [0022] of instant application “program execution speed”, wherein the training is in the context of prediction, see abstract]).

Lester and Rafferty are considered analogous art within user data prediction. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lester to incorporate the teachings of Rafferty, because of the novel way to develop a filtering management system configured to determine an optimal time to deliver incoming notifications based on the intensity of the current user’s task, improving the accuracy and applicability of the filtering management system (Rafferty, [0007]).

Lester in view of Rafferty does not disclose: performing, based on using divided data obtained by dividing each of the plurality of pieces of data into a first portion of the data and a second portion that is correct answer data as training data.

Peleg discloses: performing, based on using divided data obtained by dividing each of the plurality of pieces of data into a first portion of the data and a second portion that is correct answer data as training data ([0129] In order to synthesize text from at least the first and second text passages, the writing assistant may change the order of content in the text passages, merge sentences, split sentences… [0158] Such capabilities may be provided by training a model to predict text within a document from a large corpus conditioned upon the preceding text [In view of the training examples and ground truth outputs for text prediction of Lester, indicating that the text splitting of Peleg could be used to determine accuracy of predictions using the second portion of split sentences based on preceding, i.e. first portion, as correct answer data in view of the loss function comparing ground truth to expected output for predictions of Lester which indicates a correct answer data to compare the prediction to for determining accuracy]).

Lester, Rafferty, and Peleg are considered analogous art within textual prediction analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lester in view of Rafferty to incorporate the teachings of Peleg, because of the novel way to generate text predictions based on context and/or word sense in addition to just words themselves, improving the rate of meaningful text prediction generation (Peleg, [0002]).

Lester further discloses: the machine learning processing on the language model to predict the second portion of the data in response to an input of the first portion of the data ([0030] the pre-trained machine-learned model can include a model adapted to generate a text prediction output for text that follows an input text (e.g., the input text can include “the sky is” and the output can be “blue”). Alternatively and/or additionally, the pre-trained machine-learned model may have been trained with text masking (e.g., the input text can include “The man old” and the output can be “is”) [In view of the text splitting of Peleg, further in view of the ground truth outputs and training examples of Lester, this indicates a system that is predicting ends of sentences, i.e. second portions of data, based on opening statements, i.e. first portions of data, using the text splitting of Peleg, wherein a ground truth output of Lester would be “the sky is blue” or “blue” for the split segment “the sky is” as would be determined in Peleg. Further, in view of the loss function of Lester ([0054]), there is an indication that there is an original undivided sentence so loss between target and prediction can be determined, further indicating division at some point to make the prediction]), wherein the machine learning processing includes updating the plurality of parameters included in the language model based on using a loss function ([0077] a loss can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function)), the loss function includes: as a loss term, a difference between the correct answer data and a prediction result ([0054] The model's prediction can be compared to the target to calculate a loss, and the error can be back-propagated to calculate gradients, however the system may only apply these gradient updates to our new learnable vectors [Wherein a target tracks to correct answer data in view of the ground truth outputs for training examples disclosed in [0031], [0117]]).

Rafferty further discloses: as a weight term, a parameter determined according to a measurement result of the non-functional performance that is the parameter that indicates a ratio of reflecting the non-functional performance in the language model ([0061] In some embodiments, training of a model may terminate when a training criterion is satisfied. A training criterion may include… a performance metric (e.g., an estimate of accuracy in reproducing test data, a loss function metric), or the like. Model trainer 436 may be configured to adjust model parameters during training. Model parameters may include weights, coefficients, offsets, or the like, [In view of the previously disclosed training time criterion of Rafferty, which is a measure of non-functional performance. Considering the plurality of training criteria disclosed in Rafferty, in view of the loss function of Lester which can include a variety of losses ([0035]), each disclosed training criterion of Rafferty could be implemented using the multi-level loss function of Lester. Further, representing non-functional performance as a ratio reflecting the non-functional performance, i.e. time, is a form of loss function, i.e. 50/35, 50 seconds taken over the target criterion 35 seconds, giving a loss of 15 seconds. The ratio will not be satisfied until the training criterion is. Further, representing time as something which “indicates a ratio” is a generic configuration which does not add patentable weight to the claims and does not necessarily have to be a ratio itself]).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Nakata et al. (US-20230196142-A1) discloses “An information processing method includes: obtaining a first inference model serving as a reference; computing a second inference model that is larger than the first inference model in model size, based on the first inference model; quantizing the second inference model computed to generate a third inference model; training the third inference model, using machine learning; determining whether a performance of the third inference model trained satisfies a condition; and outputting the third inference model trained, when the performance satisfies the condition” (abstract). Specifically, [0075] discloses a loss function relating to inference accuracy and inference speed, though in the context of imaging.

Zhang et al. (“Text Entry Throughput: Towards Unifying Speed and Accuracy in a Single Performance Metric”) discloses “we introduce a text entry method-independent throughput metric based on Shannon information theory (1948). To explore the practical usability of the metric, we conducted an experiment in which 16 participants typed with a laptop keyboard using different cognitive sets, i.e., speed-accuracy biases. Our results show that as a performance metric, text entry throughput remains relatively stable under different speed-accuracy conditions” (abstract). See entire document.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to THEODORE JOHN WITHEY whose telephone number is (703) 756-1754. The examiner can normally be reached Monday - Friday, 8am-5pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/THEODORE WITHEY/
Examiner, Art Unit 2655

/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655

Prosecution Timeline

Jul 07, 2023
Application Filed
May 19, 2025
Non-Final Rejection — §103
Aug 25, 2025
Response Filed
Oct 14, 2025
Final Rejection — §103
Jan 20, 2026
Request for Continued Examination
Jan 29, 2026
Response after Non-Final Action
Feb 24, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591744: METHOD FOR TRAINING SEMANTIC REPRESENTATION MODEL, DEVICE AND STORAGE MEDIUM
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12536994: APPARATUS FOR CLASSIFYING SOUNDS BASED ON NEURAL CODE IN SPIKING NEURAL NETWORK AND METHOD THEREOF
Granted Jan 27, 2026 (2y 5m to grant)

Patent 12475330: METHOD FOR IDENTIFYING NOISE SAMPLES, ELECTRONIC DEVICE, AND STORAGE MEDIUM
Granted Nov 18, 2025 (2y 5m to grant)

Patent 12417759: SPEECH RECOGNITION USING CADENCE PATTERNS
Granted Sep 16, 2025 (2y 5m to grant)

Patent 12412580: Sound Extraction System and Sound Extraction Method
Granted Sep 09, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 44%
With Interview: 90% (+46.9%)
Median Time to Grant: 2y 11m
PTA Risk: High
Based on 23 resolved cases by this examiner. Grant probability derived from career allow rate.
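The figures above track the examiner's career counts directly. A quick sketch of the apparent arithmetic, assuming the grant probability is simply the career allow rate and the interview lift is additive in percentage points; the dashboard's actual model may differ:

granted, resolved = 10, 23                     # examiner's career data above
allow_rate = granted / resolved                # 0.4348 -> displayed as 44%
interview_lift = 0.469                         # +46.9 points with interview
with_interview = allow_rate + interview_lift   # 0.9038 -> displayed as 90%
print(f"{allow_rate:.1%} base, {with_interview:.1%} with interview")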
