Prosecution Insights
Last updated: April 19, 2026
Application No. 17/383,860

SUPERLOSS: A GENERIC LOSS FOR ROBUST CURRICULUM LEARNING

Non-Final OA: §101, §103, §112
Filed: Jul 23, 2021
Examiner: ZECHER, CORDELIA P K
Art Unit: 2100
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Naver Corporation
OA Round: 4 (Non-Final)
Grant Probability: 50% (Moderate)
Expected OA Rounds: 4-5
Median Time to Grant: 3y 8m
Grant Probability With Interview: 76%

Examiner Intelligence

Career Allow Rate: 50% (grants 50% of resolved cases; 253 granted / 509 resolved; -5.3% vs TC avg)
Interview Lift: strong, +25.8% for resolved cases with an interview
Avg Prosecution: 3y 8m typical timeline; 287 applications currently pending
Total Applications: 796 across all art units (career history)

Statute-Specific Performance

§101: 19.0% (-21.0% vs TC avg)
§103: 46.8% (+6.8% vs TC avg)
§102: 13.1% (-26.9% vs TC avg)
§112: 16.0% (-24.0% vs TC avg)

Tech Center averages are estimates. Based on career data from 509 resolved cases.

Office Action

Grounds: §101, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In view of the Pre-Appeal Conference Decision of December 15, 2025, PROSECUTION IS HEREBY REOPENED, and new grounds of rejection are set forth below. According to the paper filed November 5, 2024, claims 1-20 are pending for examination, with an October 9, 2020 priority date under 35 U.S.C. §119(a)-(d) or (f). By way of the present Amendment, claim 20 is amended; no claim is added or canceled.

Claim Rejections - 35 U.S.C. §112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

The term "dictating the extent" in claims 1 and 20 is a relative term which renders the claims indefinite. The term is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For examination purposes, "dictating the extent" is construed as "updating the extent" in the present Office action and cited accordingly until further clarification is provided.

The following is a quotation of 35 U.S.C. 112(d):

(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:

Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claim 18 is rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Claim 18 recites method steps identical to those of its parent claim 1. Applicant may cancel the claim, amend it to place it in proper dependent form, rewrite it in independent form, or present a sufficient showing that the dependent claim complies with the statutory requirements.

Claim Rejections - 35 U.S.C. §101
35 U.S.C. §101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. §101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: Claims 1-20 are directed to a computer-implemented method or, in the case of claim 19, a system; each claim is therefore directed to one of the four statutory categories of patent-eligible subject matter.

Step 2A, Prong 1: Claim 1 recites:

"for each data sample of a set of labeled data samples: by a first loss function for the data processing task, computing a first loss for that data sample": computing a loss can be carried out by a human in the mind or with pen and paper, and is thus a mental process;

"by a second loss function, automatically computing a weight value for the data sample based on the first loss": computing a weight value can be carried out by a human in the mind or with pen and paper, and is thus a mental process;

"the weight value indicative of a reliability of a label of the data sample predicted by the neural network for the data sample and dictating the extent to which that data sample impacts training of the neural network": evaluating the reliability of a label and its impact on training can be carried out by a human in the mind or with pen and paper, and is thus a mental process;

"training the neural network with the set of labelled data samples according to their respective weight value": training with labelled data samples can be carried out by a human in the mind or with pen and paper, and is thus a mental process.

Claim 4 recites "computing the threshold value based on a running average of the first loss"; claim 5 recites "computing the threshold value based on an exponential running average of the first loss and using a smoothing parameter"; and claim 6 recites "wherein the threshold value is a fixed predetermined value." In each case, computing the threshold value can be carried out by a human in the mind or with pen and paper, and is thus a mental process.

Claim 7 recites "automatically computing the weight value includes, by the second loss function, automatically computing the weight value further based on a regularization hyperparameter and a threshold value"; claim 8 recites "automatically computing the weight value includes, by the second loss function, setting the weight value one of (a) based on and (b) equal to, a minimum one of: …"; and claim 12 recites "computing the confidence value of the data sample…". Computing the weight value or confidence value can be carried out by a human in the mind or with pen and paper, and is thus a mental process.
Claims 13-15 each recite "automatically computing the weight value includes, by the second loss function, …": computing the weight value can be carried out by a human in the mind or with pen and paper, and is thus a mental process.

Claim 19 recites "using a first loss function for the data processing task, computing a first loss for that data sample" and "using a second loss function, automatically computing a weight value for the data sample based on the first loss, the weight value indicative of a reliability of a label of the data sample predicted by the neural network for the data sample": computing these values can be carried out by a human in the mind or with pen and paper, and is thus a mental process.

Claim 20 recites "for each data sample of a set of labeled data samples: by a first loss function for the data processing task, computing a first loss for that data sample"; "by a second loss function, automatically computing a weight value for the data sample based on the first loss, the weight value indicative of a reliability of a label of the data sample predicted by the neural network for the data sample and dictating the extent to which that data sample impacts training of the neural network"; and "training the neural network with the set of labelled data samples with impacts defined by their respective weight values." Computing the loss and weight values, evaluating the training impact, and training with labelled data samples can each be carried out by a human in the mind or with pen and paper, and are thus mental processes.

Step 2A, Prong 2: The judicial exception is not integrated into a practical application because the additional elements are as follows. Claim 19 recites "one or more processors; memory including instructions that, when executed by the one or more processors, train a neural network to perform a data processing task by, for each data sample of a set of labeled data samples" and "selectively updating a trainable parameter of the neural network based on the weight value"; these limitations amount to nothing more than applying an abstract idea on a generic computer, per MPEP 2106.05(f).

Step 2B: The claims do not include additional elements sufficient to amount to significantly more than the judicial exception. Claim 19 recites "selectively updating a trainable parameter of the neural network based on the weight value"; this limitation amounts to well-understood, routine, conventional and insignificant extra-solution activity, per MPEP 2106.05(d) and MPEP 2106.05(g).

Dependent Claims: Claims 2-3, 9-11, and 16-18 are also rejected under 35 U.S.C. §101 for the following reasons. Claim 2 recites "automatically computing the weight value for the data sample includes increasing the weight value for the data sample if the first loss is less than a threshold value," and claim 3 recites "automatically computing the weight value for the data sample includes decreasing the weight value for the data sample if the first loss is greater than the threshold value"; these limitations amount to well-understood, routine, and conventional activity per MPEP 2106.05(d) and therefore fail to integrate the judicial exception into a practical application or to amount to significantly more, per MPEP 2106.05(f). Claim 9 ("automatically computing the weight value further based on a confidence value of the data sample"), claim 10 ("computing the confidence value of the data sample based on the first loss"), claim 11 ("computing the confidence value based on minimizing the second loss function for the first loss"), claim 16 ("the second loss function is a monotonically increasing concave function"), claim 17 ("the second loss function is a homogeneous function"), and claim 18 ("the neural network of claim 1 trained according to the method of claim 1") each recite limitations that amount to nothing more than a field of use, per MPEP 2106.05(h).
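For technical context, the per-sample weighting scheme recited in claims 1-8 can be sketched as follows. This is a minimal illustration only, assuming an exponential running average threshold (claim 5) and a simple exponential weighting that raises the weight when the first loss is below the threshold and lowers it when above (claims 2-3); the names, the specific weighting function, and the use of PyTorch are our assumptions, not the application's actual implementation.

    import torch

    def weighted_training_step(model, optimizer, base_loss_fn, batch, state,
                               alpha=0.1, lam=1.0):
        """One loss-weighted training step (illustrative sketch, not the claimed method).

        alpha: smoothing parameter for the exponential running average threshold.
        lam:   scale controlling how sharply weights react to the loss gap.
        """
        inputs, labels = batch
        outputs = model(inputs)
        # First loss: per-sample task loss (claim 1).
        first_loss = base_loss_fn(outputs, labels)            # shape: (batch,)
        # Threshold: exponential running average of the first loss (claim 5).
        batch_mean = first_loss.detach().mean()
        state["tau"] = alpha * batch_mean + (1 - alpha) * state.get("tau", batch_mean)
        # Second loss / weight: larger when loss < tau, smaller when loss > tau
        # (claims 2-3); an exponential form is one simple, assumed choice.
        weights = torch.exp(-(first_loss.detach() - state["tau"]) / lam)
        # Train according to the respective weight values (claim 1).
        loss = (weights * first_loss).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Here base_loss_fn is assumed to return one loss per sample (e.g., torch.nn.CrossEntropyLoss(reduction="none")), and state is a plain dict carrying the running threshold tau across steps.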
Claim Rejections - 35 U.S.C. §103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. §102 and §103 (or as subject to pre-AIA 35 U.S.C. §102 and §103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. §103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors.
In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention, in order for the examiner to consider the applicability of 35 U.S.C. §102(b)(2)(C) for any potential 35 U.S.C. §102(a)(2) prior art against the later invention.

Claims 1-7, 9-11, and 18-20 are rejected under 35 U.S.C. §103 as being unpatentable over Luong et al. ("Applications of Deep Reinforcement Learning in Communications and Networking," IEEE), hereinafter Luong, in view of Jegou et al. (US 2021/0216874), hereinafter Jegou, and Jung (US 11,741,341), hereinafter Jung.

Claim 1

"training a neural network to perform a data processing task, comprising: for each data sample of a set of labeled data samples: by a first loss function for the data processing task, computing a first loss for that data sample": Jegou [0056] teaches training a neural network model to provide an indication to an originator of the dataset, and data labels.

"by a second loss function, automatically computing a weight value for the data sample based on the first loss": Luong p. 3138 teaches an ANN, a computational nonlinear model able to learn to perform tasks such as classification and prediction, using a gradient descent optimization algorithm to adjust the weights of neurons by calculating the gradient of the loss function; Luong p. 3140 further teaches a Double Deep Q-Learning (DDQL) model using a Double Deep Q-Network to update the loss function, the "updated" loss function indicating the claimed second loss function.

"the weight value indicative of a reliability of a label of the data sample predicted by the neural network for the data sample and dictating the extent to which that data sample impacts training of the neural network": Luong p. 3166 teaches labeled and unlabeled data of the DRL scheme; Luong p. 3143 teaches a DQL technique based on SARSA learning to determine optimal policies in an online fashion and a label vector related to at least a state-action value and reward used by the algorithm; Luong p. 3164 specifically spells out that the "reward can be defined as a weighted sum of spectrum efficiency and QoE"; and "impacts training of the neural network" is taught in Luong p. 3142. However, Luong and Jegou fail to clearly spell out "the weight value indicative of a reliability of a label of the data sample" as recited; said feature is disclosed in Jung. Jung claim 1 recites "… training each randomly initialized clustering model comprises evaluating a reliability of the randomly initialized clustering model, wherein evaluating the reliability of each randomly initialized clustering model comprises computing a weight associated the randomly initialized clustering model based on matching rate between estimated labels outputted by the randomly initialized clustering model …".
"training the neural network with the set of labelled data samples according to their respective weight value": Jegou [0027] teaches that training neural networks includes assigning different weights.

Luong, Jegou, and Jung disclose analogous art: Jegou is in the field of radioactive data generation with datasets having a plurality of classes of data, and Jung is in the field of detecting anomalies in high-dimensional sensor data associated with one or more machines. Luong does not spell out the "data sample" as recited above; it is taught in Jegou ([0056]: training a neural network model to provide an indication to an originator of the dataset, and data labels). It would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate said feature of Jegou into Luong to enhance its neural system training functions with respect to data originator and labels. Further, Luong fails to spell out the "weight values indicative of the reliability of data labels" as recited above; it is taught in Jung (claim 1, quoted above). It would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate said feature of Jung into Luong to enhance its neural system training functions by indicating data label reliability via the weight values.

Claim 2

"wherein automatically computing the weight value for the data sample includes increasing the weight value for the data sample if the first loss is less than a threshold value": Jegou [0043] teaches assigning multiplicative weights, which can be calibrated or "trained" to produce the proper system output.

Claim 3

"wherein automatically computing the weight value for the data sample includes decreasing the weight value for the data sample if the first loss is greater than the threshold value": Jegou [0043] teaches assigning multiplicative weights, which can be calibrated or "trained" to produce the proper system output.

Claim 4

"computing the threshold value based on a running average of the first loss": Jegou [0057] teaches a defined threshold value, and Jegou [0086] teaches a first loss value and a second loss value.

Claim 5

"computing the threshold value based on an exponential running average of the first loss and using a smoothing parameter": Jegou [0086], [0090] teaches defined marker data and a first loss value and a second loss value; referring to the equations at [00004]-[00005], the log(pi) has an exponential distribution.

Claim 6

"wherein the threshold value is a fixed predetermined value": Jegou [0057] teaches a defined threshold value.

Claim 7

"wherein automatically computing the weight value includes, by the second loss function, automatically computing the weight value further based on a regularization hyperparameter and a threshold value": Jegou [0027] teaches that tuning the neural network 114 can include setting different parameters 128 for each neural network 114 … or assigning different weights (e.g., hyperparameters, or learning rates).
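For reference, the threshold computations recited in claims 4-5 correspond to standard running-average updates. In the usual notation (our symbols, not the application's), with per-sample loss \ell and smoothing parameter \alpha:

\bar{\ell}_t = \frac{1}{t}\sum_{i=1}^{t} \ell_i \quad \text{(running average, claim 4)}, \qquad \tau_t = \alpha\,\ell_t + (1-\alpha)\,\tau_{t-1} \quad \text{(exponential running average with smoothing parameter } \alpha\text{, claim 5)}.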
Claim 9

"wherein automatically computing the weight value includes, by the second loss function, automatically computing the weight value further based on a confidence value of the data sample": Luong p. 3168 teaches confidence of the DRL framework.

Claim 10

"computing the confidence value of the data sample based on the first loss": Jegou [0080] teaches determining one or more characteristics of the neural network, such as weight values and confidence scores, where the characteristics include a first loss value from applying first data.

Claim 11

"wherein computing the confidence value of the data sample includes computing the confidence value based on minimizing the second loss function for the first loss": Luong p. 3140 teaches Algorithm 2, which selects random samples and optimizes the weights of the neural network using stochastic gradient descent with respect to the network parameter θ to minimize the loss; the "computing of confidence value" is taught in Jegou [0080], which teaches determining one or more characteristics of neural networks, the characteristics including a first loss value from applying first data.
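As context for claim 11 ("computing the confidence value based on minimizing the second loss function for the first loss"): in the published SuperLoss formulation that this application's title echoes, the minimizing confidence has a closed form via the Lambert W function. A sketch under that assumption follows; the notation (tau for the threshold, lam for the regularization hyperparameter) is ours, and this is not necessarily the claimed formulation.

    import numpy as np
    from scipy.special import lambertw

    def confidence_star(first_loss, tau, lam):
        """Confidence minimizing (l - tau) * sigma + lam * log(sigma)**2 over sigma.

        Setting the derivative to zero gives sigma* = exp(-W(beta / 2)) with
        beta = (l - tau) / lam, where W is the principal Lambert W branch;
        beta is clamped at -2/e so that the W argument stays real.
        """
        beta = (np.asarray(first_loss) - tau) / lam
        z = 0.5 * np.maximum(beta, -2.0 / np.e)
        return np.exp(-lambertw(z).real)

Samples with first loss below the threshold (easy samples) get confidence above 1, and hard samples get confidence below 1, which is what makes the scheme a curriculum.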
Claim 18

Claim 18 is rejected for the rationale given for claim 1, since it claims an identical training method.

Claim 19

"one or more processors; memory including instructions that, when executed by the one or more processors, train a neural network to perform a data processing task by, for each data sample of a set of labeled data samples": Luong p. 3166 teaches labeled and unlabeled data, and Jegou [0053] teaches processors and memory. The "first loss," "second loss function," and "weight value indicative of a reliability of a label … dictating the extent to which that data sample impacts training" limitations are taught by Jegou [0056], Luong pp. 3138, 3140, 3142-3143, 3164, and 3166, and Jung claim 1, for the same reasons given for claim 1 above. "selectively updating a trainable parameter of the neural network based on the weight value": Jegou [0039] teaches updated parameters.

Claim 20

"one or more processors; memory including instructions that, when executed by the one or more processors, train a neural network to perform a data processing task by, for each data sample of a set of labeled data samples": Luong p. 3166 teaches labeled and unlabeled data, and Jegou [0053] teaches processors and memory. The remaining limitations, including "training the neural network using the set of labelled data samples with impacts defined by their respective weight values" (Jegou [0027]: training neural networks includes assigning different weights), are taught by the combination of Luong, Jegou, and Jung for the same reasons given for claim 1 above.

Claims 16 and 17 are rejected under 35 U.S.C. §103 as being unpatentable over Luong, in view of Jegou and Jung, and further in view of Olabiyi et al. (US 10,510,003), hereinafter Olabiyi.
Claim 16

"wherein the second loss function is a monotonically increasing concave function": Olabiyi col. 9, lines 36-58 teaches that, for a typical dialogue dataset, the target sequence entropy versus token position is concave, with the beginning and end of the sequence having lower entropy than the middle, and that the system may switch to normal maximum likelihood loss; the loss increases as the model's prediction deviates from the target, but the rate of increase slows as the deviation grows.

Luong, Jegou, Jung, and Olabiyi disclose analogous art, and the rationale for combining Luong with Jegou and Jung is the same as set forth for claim 1 above. Still further, Luong fails to spell out "a monotonically increasing concave function" as recited above; it is taught in Olabiyi (col. 9, lines 36-58, as quoted above). It would have been obvious to one of ordinary skill in the art at the time the present invention was made to incorporate said feature of Olabiyi into Luong to enhance its neural system training functions by adding a monotonically increasing concave function.

Claim 17

"wherein the second loss function is a homogeneous function": Olabiyi col. 7, lines 25-30 teaches an equation where λ is the similarity measure between predicted output K and the ground truth γ*, and col. 8, lines 25-39 further teaches an equation that provides a reinforcement signal scaling the average negative output loss.
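As a mathematical aside on claims 16-17 (our illustration, not drawn from the application or the cited art): the square root is one function that is simultaneously monotonically increasing, concave, and homogeneous, satisfying both claimed properties at once. For \ell, t > 0:

g(\ell) = \sqrt{\ell}, \qquad g'(\ell) = \frac{1}{2\sqrt{\ell}} > 0 \ \text{(monotonically increasing)}, \qquad g''(\ell) = -\frac{1}{4}\,\ell^{-3/2} < 0 \ \text{(concave)}, \qquad g(t\ell) = t^{1/2}\,g(\ell) \ \text{(homogeneous of degree 1/2)}.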
Allowable Subject Matter

Claims 8 and 12-15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Response to Arguments

Applicant's arguments filed June 13, 2025 have been fully considered, but they are not persuasive.

Luong reference: Applicant argues that "Luong therefore involves learning/training by adjusting the internal weights of the neural network itself. … weight in Luong adjust the impact of neurons in the neural network as explicitly state in Luong … whereas weights in claim 1 adjust the impact of respective training data samples," and further argues that "Luong, is silent as to for each data sample of a set of labeled data samples: by a second loss function, automatically computing a weight value for the data sample based on the first loss, the weight value indicative of a reliability of a label of the data sample predicted by the neural network for the data sample and dictating the extent to which that data sample impacts training of the neural network." This argument is not persuasive. "Weight" is recited in the pending claims; there is no recitation of internal versus external weights. Even assuming, for argument's sake, that Luong discloses internal weights, those are still "weights" as recited in the pending claims. Applicant emphasizes each data sample with a bolded "each"; however, Luong never teaches "part of" or "partial" data samples, nor are any particular data sample selection criteria disclosed. It is reasonable to conclude, under the broadest reasonable interpretation, that all features recited in the present application and in the cited prior art references apply to each data sample unless otherwise specified.

The Luong reference does not clearly spell out "the weight value indicative of a reliability of a label of the data sample predicted by the neural network for the data sample," as argued. Accordingly, a newly cited reference, Jung (US 11,741,341), is applied in the present Office action, and the claim rejection citations are amended as well. With respect to the "dictating the extent to which that data sample impacts training of the neural network" feature, "data sample impacts training of the neural network" is cited in the present Office action, and Applicant is directed to the "dictating the extent" rejection under 35 U.S.C. §112(b). The "impacts" as argued is construed as "impacting the noise" of the neural network training in light of the disclosure of the Specification of the present application; accordingly, the claim rejection citations are amended in the present Office action to include page 3142 of Luong. Applicant is herein advised that applicant is required to review each cited reference in its entirety, not only the cited exemplary paragraphs.

Further, Applicant argues that "Luong is also silent as to training the neural network with the set of labelled data samples according to their respective weight values." This argument is not persuasive because both labeled and unlabeled data samples are taught in Luong, and Jegou teaches assigning weights to data samples in training a neural network.
Hence, "training the neural network with the set of labelled data samples according to their respective weight values" is construed and cited as "training the neural network with the set of labelled data samples according to their weight values."

Jegou & Olabiyi references: Applicant traverses the Jegou and Olabiyi references with the mere statement that "Jegou does not remedy the deficiencies of Luong … and Olabiyi also do not remedy the deficiencies of Jegou and Luong," without submitting any substantive discussion.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RUAY HO, whose telephone number is (571) 272-6088. The examiner can normally be reached Monday to Friday, 9am-5pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, David Yi, can be reached at 571-270-7519. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/Ruay Ho/
Patent Examiner, Art Unit 2126

Prosecution Timeline

Jul 23, 2021: Application Filed
Aug 19, 2024: Non-Final Rejection (§101, §103, §112)
Nov 01, 2024: Applicant Interview (Telephonic)
Nov 01, 2024: Examiner Interview Summary
Nov 05, 2024: Response Filed
Nov 22, 2024: Final Rejection (§101, §103, §112)
Jan 16, 2025: Applicant Interview (Telephonic)
Jan 16, 2025: Examiner Interview Summary
Jan 21, 2025: Notice of Allowance
Jan 21, 2025: Response after Non-Final Action
Mar 05, 2025: Response after Non-Final Action
Mar 10, 2025: Non-Final Rejection (§101, §103, §112)
Jun 05, 2025: Applicant Interview (Telephonic)
Jun 05, 2025: Examiner Interview Summary
Jun 13, 2025: Response after Non-Final Action
Jun 13, 2025: Notice of Allowance
Dec 09, 2025: Response after Non-Final Action
Dec 17, 2025: Non-Final Rejection (§101, §103, §112) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12583466: VEHICLE CONTROL MODULES INCLUDING CONTAINERIZED ORCHESTRATION AND RESOURCE MANAGEMENT FOR MIXED CRITICALITY SYSTEMS (granted Mar 24, 2026; 2y 5m to grant)
Patent 12578751: DATA PROCESSING CIRCUITRY AND METHOD, AND SEMICONDUCTOR MEMORY (granted Mar 17, 2026; 2y 5m to grant)
Patent 12561162: AUTOMATED INFORMATION TECHNOLOGY INFRASTRUCTURE MANAGEMENT (granted Feb 24, 2026; 2y 5m to grant)
Patent 12536291: PLATFORM BOOT PATH FAULT DETECTION ISOLATION AND REMEDIATION PROTOCOL (granted Jan 27, 2026; 2y 5m to grant)
Patent 12393641: METHODS FOR UTILIZING SOLVER HARDWARE FOR SOLVING PARTIAL DIFFERENTIAL EQUATIONS (granted Aug 19, 2025; 2y 5m to grant)
Based on this examiner's 5 most recent grants; study what changed in each case to get past this examiner.


Prosecution Projections

Expected OA Rounds: 4-5
Grant Probability: 50%; 76% with interview (50% baseline + 25.8% interview lift ≈ 76%)
Median Time to Grant: 3y 8m
PTA Risk: High

Based on 509 resolved cases by this examiner. Grant probability derived from career allow rate.
