Prosecution Insights
Last updated: April 19, 2026
Application No. 17/420,357

TRAINING TRAINABLE MODULES USING LEARNING DATA, THE LABELS OF WHICH ARE SUBJECT TO NOISE

Status: Non-Final Office Action (§103)
Filed: Jul 01, 2021
Examiner: WU, NICHOLAS S
Art Unit: 2148
Tech Center: 2100 — Computer Architecture & Software
Assignee: Robert Bosch GmbH
OA Round: 3 (Non-Final)

Grant Probability: 47% (Moderate)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 3y 9m
Grant Probability With Interview: 90%

Examiner Intelligence

Career Allow Rate: 47% (18 granted / 38 resolved; -7.6% vs TC avg)
Interview Lift: +43.1% on resolved cases with an interview
Typical Timeline: 3y 9m average prosecution; 44 applications currently pending
Career History: 82 total applications across all art units

Statute-Specific Performance

§101: 26.7% (-13.3% vs TC avg)
§103: 52.6% (+12.6% vs TC avg)
§102: 3.1% (-36.9% vs TC avg)
§112: 17.4% (-22.6% vs TC avg)

Comparisons are against the Tech Center average estimate; based on career data from 38 resolved cases.

Office Action (§103)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 09/02/2025 has been entered.

Response to Arguments

Applicant's arguments filed 09/02/2025 have been fully considered but they are not fully persuasive. Regarding the 101 rejections, after further consideration, applicant’s arguments and amendments to the independent claims are persuasive and overcome the previous 101 rejections. Specifically, applicant’s amended limitations provide a technical improvement to the field of artificial intelligence because weighting a learning data set in a cost function based on an accuracy assessment limits the negative impact of learning samples on the model. See pg. 12 of “Remarks”:

“[0020] In one particularly advantageous embodiment of the present invention, adaptable parameters which characterize the behavior of the trainable module are optimized with the goal of improving the value of a cost function. In an ANN, these parameters include, for example, the weights with which the inputs supplied to one neuron are offset for an activation of this neuron. The cost function measures to what extent the trainable module maps the learning input variable values contained in learning data sets on the associated learning output variable values. In conventional training of trainable modules, all learning data sets are equal in this aspect, i.e., the cost function measures how well the learning output variable values are reproduced on average. In this process, the ascertained assessment is introduced in such a way that the weighting of at least one learning data set in the cost function is dependent on its assessment.

[0021] For example, a learning data set may be weighted less the worse its assessment is. This may go up to the point that in response to the assessment of a learning data set meeting a predefined criterion, this learning data set drops out of the cost function entirely, i.e., is no longer used at all for the further training of the trainable module. The finding underlies this that the additional benefit provided by the consideration of a further learning data set may be entirely or partially compensated, or even overcompensated, by the contradictions resulting in the training process from an inaccurate or incorrect learning output variable value. No information may thus be better than spurious information.

[0060] This assessment 2b is a measure of the extent to which the association of learning output variable values 13a with learning input variable values 11a, thus the labeling of learning data set 2, is accurate in learning data set 2.

These blurbs establish that the claimed weighting improves the accuracy of the training process for an ANN because it is based on the assessment of the learning data set, in which the assessment measures the extent to which the association of learning output variable values with learning input variable values is accurate. Consequently, training a trainable module in accordance with a weighting determined in this manner is the sort of ‘additional element’ under Prong 2 that Section 2106.04(d)(1) of the MPEP regards as ‘demonstrating that the claim as a whole integrates the exception into a practical application’ by ‘improv[ing] the functioning of a computer’ in the context of an ANN (applicant emphasis added).”

Applicant’s amendments and corresponding arguments that the claimed invention provides a technical improvement to the field of artificial intelligence by weighting learning samples in a cost function based on an assessment are persuasive. Therefore, the 101 rejections are withdrawn.

Regarding the 103 rejections, applicant's arguments filed with respect to the prior art rejections have been fully considered but they are moot. Applicant has amended the claims to recite new combinations of limitations in the form of a modified, now canceled, claim 22. Applicant's arguments are directed at the amendment. Please see below for new grounds of rejection, necessitated by Amendment.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21, 24-26, 33, and 36-37 are rejected under 35 U.S.C.
103 as being unpatentable over Feng, et al., Non-Patent Literature “Class noise removal and correction for image classification using ensemble margin” (“Feng”) in view of Yang, et al., Non-Patent Literature “An Ensemble Classification Algorithm for Convolutional Neural Network based on AdaBoost” (“Yang”) and further in view of Merhav, et al., US Pre-Grant Publication 2017/0300811A1 (“Merhav”). Regarding claim 21 and analogous claims 36 and 37, Feng discloses: A computer-implemented method for training a trainable module, which converts one or multiple input variables into one or multiple output variables, the training being with the aid of learning data sets which contain learning input variable values and associated learning output variable values, (Feng, pg. 4698 col. 1, “In supervised learning [A computer-implemented method for training a trainable module, which converts one or multiple input variables into one or multiple output variables,], the training set is an essential element of the learning process [1] [the training being with the aid of learning data sets which contain learning input variable values and associated learning output variable values,]. But the real data to be classified often include a certain amount of noise which can be mainly of two types: class noise (mislabeled data) and attribute noise [2]. Effective noise handling is one of the most difficult problems in inductive machine learning [3].”). at least the learning input variable values including measured data, which were obtained by: (i) a physical measuring process, and/or (ii) a partial or complete simulation of the measuring process, and/or (iii) a partial or complete simulation of a technical system observable using the measuring process, (Feng, pg. 4700 col.
1, “We applied the class noise removal and correction methods on 5 image data sets from UCI Machine Learning repository [25] (table 1) [at least the learning input variable values including measured data, which were obtained by: (i) a physical measuring process, and/or (ii) a partial or complete simulation of the measuring process, and/or (iii) a partial or complete simulation of a technical system observable using the measuring process,]. Each data set has been divided into three parts: training set, validation set and test set, as shown on table 1. We randomly chose a subset of 20% from the whole sets of training set and validation set respectively. The class label values of these selected examples were randomly labeled to another label.”). the method comprising the following steps: pretraining, at least using a subset of the learning data sets, each of a plurality of modifications of the trainable module, which differ from one another so that the modifications are not congruently merged into one another with progressive learning; (Feng, pg. 4698 col. 1-2, “The ensemble approach is a popular method to filter out mislabeled instances [5, 6, 11, 13, 14, 15]. It detects the mislabeled instances by considering the vote of each base classifier in the ensemble to each instance [9] [the method comprising the following steps: pretraining, at least using a subset of the learning data sets,]. A typical approach is the majority vote filter. In this method [6], if more than half of all the base classifiers; an ensemble is interpreted as multiple different models combining their outputs to produce a prediction therefore each base classifier is interpreted as a modification (i.e. each of a plurality of modifications of the trainable module, which differ from one another so that the modifications are not congruently merged into one another with progressive learning;) of the ensemble classify an instance incorrectly, then this instance is tagged as mislabeled.”). 
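For readers unfamiliar with the ensemble filtering technique the examiner maps to the "plurality of modifications" limitation, the majority-vote filter described in the quoted Feng passage can be sketched in a few lines. This is an illustrative reading of the quote, not code from any cited reference; the function and variable names are hypothetical.

```python
def majority_vote_filter(predictions_per_classifier, labels):
    """Flag an instance as likely mislabeled when more than half of the
    base classifiers in the ensemble disagree with its given label,
    per the majority-vote filter described in the quoted Feng passage.

    predictions_per_classifier: one list of predicted labels per base
    classifier (an assumed encoding of the ensemble's votes).
    """
    n_classifiers = len(predictions_per_classifier)
    flagged = []
    for i, label in enumerate(labels):
        # Count base classifiers whose prediction contradicts the label.
        wrong = sum(1 for preds in predictions_per_classifier if preds[i] != label)
        if wrong > n_classifiers / 2:
            flagged.append(i)
    return flagged
```

Under the examiner's mapping, each base classifier in such an ensemble corresponds to one "modification" of the trainable module.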
supplying, as input variable, learning input variable values of at least one of the learning data sets to all of the modifications; (Feng, pg. 4698 col. 2, “The ensemble margin [16] is a fundamental concept in ensemble learning. Several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples [1, 16]. A good margin distribution means that most examples have large margins [17]. The decision by an ensemble for each instance is made by voting.” [supplying, as input variable, learning input variable values of at least one of the learning data sets to all of the modifications;]). ascertaining, from a deviation from one another of output variable values, into which the modifications each convert the learning input variable values, a measure of the uncertainty of the output variable values, (Feng, pg. 4698 col. 2, “The decision by an ensemble for each instance is made by voting. The ensemble margin can be calculated as a difference between the votes [9] according to two different well-known definitions [18] in both supervised [16] and unsupervised [19, 20] ways.”; an ensemble margin is interpreted as a measure of uncertainty as it is a comparison of votes between different models, or modifications, for an output value (i.e. ascertaining, from a deviation from one another of output variable values, into which the modifications each convert the learning input variable values, a measure of the uncertainty of the output variable values,)). and associating with the at least one of the learning data sets as a measure of an uncertainty of the at least one of the learning data sets; (Feng, pg. 4699 col. 2, “Let us consider an ensemble classifier C, and a set of n training data denoted as S = {(x1, y1), . . . ,(xn, yn)}, where xi is a vector with feature values and yi is the value of the class label. 
The mislabeled instance ordering approach, introduced in [11], simply relies on an ensemble margin’s definition as a class noise evaluation function; each xi and yi pair is interpreted as a learning data set therefore, the ensemble margin for each xi/yi pair is associating a measure of uncertainty for a learning data set (i.e. and associating with the at least one of the learning data sets as a measure of an uncertainty of the at least one of the learning data sets;), slightly modified here, defined as (5).”). and based on the uncertainty, ascertaining an assessment of the at least one learning data set, which is a measure of an extent to which the association of the learning output variable values with the learning input variable values in the at least one learning data set is accurate; (Feng, pg. 4699 col. 2, “The higher N(xi), the higher the probability of xi being mislabeled. Relying on the margin-based noise evaluation function, the ordering-based mislabeled instance elimination algorithm consists of the following steps [11]: 1. Constructing an ensemble classifier C with all the n training data (xi , yi) ∈ S. 2. Computing the margin of each training instance xi . 3. Ordering all the training instances xi , that have been misclassified, according to their noise evaluation values N(xi), in descending order.”; the noise evaluation function is interpreted as an assessment as it is based on the ensemble margin and determines whether a sample is mislabeled (i.e. and based on the uncertainty, ascertaining an assessment of the at least one learning data set, which is a measure of an extent to which the association of the learning output variable values with the learning input variable values in the at least one learning data set is accurate;)). 
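The margin-based noise evaluation and ordering steps quoted above can be sketched as follows. Feng's exact margin definition is its equation (5), which the Office Action does not reproduce; the normalized vote difference used here is an assumption, and all names are hypothetical.

```python
def order_by_noise_evaluation(votes_per_instance, labels, m_remove):
    """Order misclassified training instances by a margin-based noise
    evaluation score (descending) and return the indices of the top
    `m_remove` candidates for elimination, mirroring steps 2-4 of the
    ordering-based algorithm quoted from Feng.

    votes_per_instance: for each instance, a dict mapping class label
    to the number of ensemble votes it received (assumed encoding).
    """
    scored = []
    for i, (votes, label) in enumerate(zip(votes_per_instance, labels)):
        total_votes = sum(votes.values())
        top_class = max(votes, key=votes.get)
        if top_class != label:  # only misclassified instances are ordered
            # Assumed margin: normalized gap between the winning class's
            # votes and the votes for the assigned (true) label.
            margin = (votes[top_class] - votes.get(label, 0)) / total_votes
            scored.append((margin, i))
    scored.sort(reverse=True)  # most likely mislabeled first
    return [i for _, i in scored[:m_remove]]
```

An instance whose assigned label is heavily outvoted receives a high score and is eliminated first (or, in the correction variant Feng also describes, relabeled to the predicted class).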
wherein a distribution of the uncertainties is ascertained based on a plurality of the learning data sets and the assessment is ascertained based on the distribution, the distribution being modeled as a superposition of multiple parameterized contributions, which each originate from those of the learning data sets having identical or similar assessment, (Feng, pg. 4699 col. 2, “1. Constructing an ensemble classifier C with all the n training data (xi , yi) ∈ S. 2. Computing the margin of each training instance xi. 3. Ordering all the training instances xi , that have been misclassified, according to their noise evaluation values N(xi), in descending order; ordering all the training instances based on their probability of being mislabeled is interpreted as a distribution of uncertainties from a plurality of learning data sets (i.e. wherein a distribution of the uncertainties is ascertained based on a plurality of the learning data sets). 4. Eliminating the first M most likely mislabeled instances xi [and the assessment is ascertained based on the distribution,] to form a new cleaner training set; the ordered ranking is interpreted as a superposition distribution of multiple parametrized contributions because the noise, or mislabeled, assessment for each training sample is a combination of multiple classifier models votes/predictions (i.e. the distribution being modeled as a superposition of multiple parameterized contributions, which each originate from those of the learning data sets having identical or similar assessment,). 5. Evaluating the cleaned training set by classification performance, on a validation set. 6. Selecting the best filtered training set.”). and parameters of the contributions being optimized in such a way that a deviation of the distribution from the ascertained superposition is minimized to ascertain the contributions. (Feng, pg. 4699 col. 
2, “Noise removal can discard some useful data, so we also attempt to automatically correct the training instances that have been identified as mislabeled (highest absolute margin misclassified instances); samples with the highest margins are interpreted as deviations from the distribution as they are identified as mislabeled samples compared to all other samples (i.e. that a deviation of the distribution from the ascertained superposition is minimized to ascertain the contributions.). Noise correction has been shown to give better results than simply removing the noise from the data set in some cases [12]. In a data correction scheme, the noisy instances are identified, but instead of removing these instances out, they are repaired by replacing corrupted values with more appropriate ones [12]; replacing the mislabeled samples with new values is interpreted as the parameters of the contributions being optimized as the noise assessment on the sample, after correction, changes (i.e. and parameters of the contributions being optimized in such a way). The labels of the most likely mislabeled instances are changed to the predicted classes.”). Feng does not explicitly disclose: pretraining wherein adaptable parameters, which characterize a behavior of the trainable module, are optimized, with a goal of improving a value of a cost function, the cost function measuring an extent to which the trainable module maps the learning input variable values contained in the at least one learning data set on the associated learning output variable values, a weighting of the at least one learning data set in the cost function being a function of the assessment of the learning data set; and training the trainable module in accordance with the weighting of the at least one learning data set; Yang teaches pretraining (Yang, pg. 403-404 and Figure 3, “The purpose of pre-training phase is to obtain some base classifier with different vote weights [pretraining]. 
We hope that each base classifier has a different strong class by introducing class weights assignment method. The end condition of each training is each class’s error rate less than 0.5. That can reach after several epochs. SLC represent sample learning coefficient, BCW represent base classifier weight, same as the following.”). Feng and Yang are both in the same field of endeavor (i.e. ensemble learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Feng and Yang to teach the above limitation(s). The motivation for doing so is that pre-training the base classifiers with specific sample learning coefficients improves diversity (cf. Yang, pg. 403 col. 2, “The purpose of pre-training phase is to obtain some base classifier with different vote weights. We hope that each base classifier has a different strong class”). Feng in view of Yang does not explicitly teach: wherein adaptable parameters, which characterize a behavior of the trainable module, are optimized, with a goal of improving a value of a cost function, the cost function measuring an extent to which the trainable module maps the learning input variable values contained in the at least one learning data set on the associated learning output variable values, a weighting of the at least one learning data set in the cost function being a function of the assessment of the learning data set; and training the trainable module in accordance with the weighting of the at least one learning data set; Merhav teaches: wherein adaptable parameters, which characterize a behavior of the trainable module, are optimized, with a goal of improving a value of a cost function, the cost function measuring an extent to which the trainable module maps the learning input variable values contained in the at least one learning data set on the associated learning output variable values, (Merhav, ⁋56, “Back propagation involves calculating a
gradient of a loss function (defined later) in a loss layer 414, with respect to a number of weights in the DCNN 400 [wherein adaptable parameters, which characterize a behavior of the trainable module, are optimized,]. The gradient is then fed to a method that updates the weights for the next iteration of the training of the DCNN 400 in an attempt to minimize the loss function [with a goal of improving a value of a cost function,])."”, and Merhav, ⁋62, “For classification problems, the possible N output categories may be enumerated as integers, and the desired output may be represented as a binary feature vector, such as (0, 1, 0 . . . 0) to represent output label l=2. Thus, for classification problems, the DCNN is trained to output a vector which represents the probability of every class, and some probabilistic loss function [the cost function measuring an extent to which the trainable module maps the learning input variable values contained in the at least one learning data set on the associated learning output variable values,]”). a weighting of the at least one learning data set in the cost function being a function of the assessment of the learning data set; and training the trainable module in accordance with the weighting of the at least one learning data set; (Merhav, ⁋93-94, “In other example embodiments, however, the dynamic loss function [the cost function] may be more nuanced, applying statistical tests. For example, a Gaussian distribution of errors may be assumed, and the samples weighted by their chances of violating the Gaussian assumption [a weighting of the at least one learning data set in the cost function being a function of the assessment of the learning data set;]. This means the mean μ and standard deviation σ in the current batch, and the measurements may be normalized as follows: z=(loss−μ)/σ and the samples may be weighted by their probability of not belonging to the error statistics: k(loss, rank)=loss*(1−erf(z)).
The result is that the loss function is dynamically updated in each stage of the DCNN based on statistical analysis of which sample images showed the most deviation between their assigned professionalism score and an expected professionalism score [and training the trainable module in accordance with the weighting of the at least one learning data set;].”). Feng, in view of Yang, and Merhav are both in the same field of endeavor (i.e. sample noise). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Feng, in view of Yang, and Merhav to teach the above limitation(s). The motivation for doing so is that weighting samples in a loss function based on an error measure prevents outliers from negatively affecting the model’s performance (cf. Merhav, see ⁋86-94). Regarding claim 24, Feng in view of Yang and Merhav teaches the method as recited in claim 21. Feng further teaches wherein in response to the assessment of a learning data set of the at least one learning data set meeting a predefined criterion, an update of at least one learning output variable value contained in the learning data set is requested. (Feng, pg. 4699 col. 2, “Noise removal can discard some useful data, so we also attempt to automatically correct the training instances that have been identified as mislabeled [wherein in response to the assessment of a learning data set of the at least one learning data set meeting a predefined criterion,] (highest absolute margin misclassified instances). Noise correction has been shown to give better results than simply removing the noise from the data set in some cases [12]. In a data correction scheme, the noisy instances are identified, but instead of removing these instances out, they are repaired by replacing corrupted values with more appropriate ones [an update of at least one learning output variable value contained in the learning data set is requested.] [12]. 
The labels of the most likely mislabeled instances are changed to the predicted classes.”). Regarding claim 25, Feng in view of Yang and Merhav teaches the method as recited in claim 21. Feng further teaches further comprising: ascertaining based on the deviation of the distribution from the superposition whether only learning data sets having identical or similar assessments have contributed to the distribution. (Feng, pg. 4699 col. 2, “The higher N(xi), the higher the probability of xi being mislabeled [ascertaining based on the deviation of the distribution from the superposition]. Relying on the margin-based noise evaluation function, the ordering-based mislabeled instance elimination algorithm consists of the following steps [11]: 1. Constructing an ensemble classifier C with all the n training data (xi , yi) ∈ S. 2. Computing the margin of each training instance xi . 3. Ordering all the training instances xi, that have been misclassified, according to their noise evaluation values N(xi), in descending order. 4. Eliminating the first M most likely mislabeled instances xi; eliminating the first M most likely mislabeled samples is interpreted as similar assessments having a negative contribution to the distribution as they are mislabeled (i.e. whether only learning data sets having identical or similar assessments have contributed to the distribution.) to form a new cleaner training set.”). Regarding claim 26, Feng in view of Yang and Merhav teaches the method as recited in claim 21. Feng further teaches wherein various contributions to the superposition are modeled using identical parameterized functions, but using parameters independent from one another. (Feng, pg. 4699 col. 
2, “The mislabeled instance ordering approach, introduced in [11], simply relies on an ensemble margin’s definition as a class noise evaluation function, slightly modified here, defined as (5).”; the noise evaluation function is used for each sample but each sample has a different input/output pair and ensemble margin thus the functions are identical but the parameters are different for each sample or contribution (i.e. wherein various contributions to the superposition are modeled using identical parameterized functions, but using parameters independent from one another.)). Regarding claim 33, Feng in view of Yang and Merhav teaches the method as recited in claim 21. Feng further teaches wherein a Kullback-Liebler divergence, and/or a Hellinger distance, and/or a Levy distance, and/or a Levy-Prochorov metric, and/or a Wasserstein metric, and/or a Jensen-Shannon divergence, and/or another scalar measure of an extent to which the contributions differ from one another is ascertained from the contributions. (Feng, pg. 4699 col. 2, “The higher N(xi), the higher the probability of xi being mislabeled. Relying on the margin-based noise evaluation function, the ordering-based mislabeled instance elimination algorithm consists of the following steps [11]: 1. Constructing an ensemble classifier C with all the n training data (xi , yi) ∈ S. 2. Computing the margin of each training instance xi . 3. Ordering all the training instances xi , that have been misclassified, according to their noise evaluation values N(xi), in descending order.”; ranking the samples based on their noise evaluation values is interpreted as a scalar measure of their contributions as a higher probability of being mislabeled is interpreted as a worse contribution (i.e. 
The method as recited in claim 21, wherein a Kullback-Liebler divergence, and/or a Hellinger distance, and/or a Levy distance, and/or a Levy-Prochorov metric, and/or a Wasserstein metric, and/or a Jensen-Shannon divergence, and/or another scalar measure of an extent to which the contributions differ from one another is ascertained from the contributions.)). Claims 27-31 are rejected under 35 U.S.C. 103 as being unpatentable over Feng, et al., Non-Patent Literature “Class noise removal and correction for image classification using ensemble margin” (“Feng”) in view of Yang, et al., Non-Patent Literature “An Ensemble Classification Algorithm for Convolutional Neural Network based on AdaBoost” (“Yang”) and further in view of Merhav, et al., US Pre-Grant Publication 2017/0300811A1 (“Merhav”) and Bootkrajang, et al., Non-Patent Literature “A generalised label noise model for classification in the presence of annotation errors” (“Bootkrajang”). Regarding claim 27, Feng in view of Yang and Merhav teaches the method as recited in claim 21. However, the combination does not explicitly teach wherein at least one of the parameterized contributions is modeled as a statistical distribution. Bootkrajang teaches wherein at least one of the parameterized contributions is modeled as a statistical distribution. (Bootkrajang, pg. 62 col. 1, “We do this by expressing label flipping probabilities by a parametric function. We employ the probability density function of the exponential distribution [wherein at least one of the parameterized contributions is modeled as a statistical distribution.] to model the likelihood of label flipping. This function is chosen in order to capture noises in a scenario where points that live closer to the decision boundary have relatively higher chance of being mislabelled than those that live further away.”). Feng, in view of Yang and Merhav, and Bootkrajang are both in the same field of endeavor (i.e. label noise). 
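Bootkrajang's exponential label-flip model, quoted in the rejection of claim 27 above, can be illustrated with a short sketch. The parameterization below (`rate`, `max_flip`) is hypothetical, chosen for illustration rather than taken from the paper.

```python
import math

def label_flip_probability(distance_to_boundary, rate=1.0, max_flip=0.3):
    """Model the likelihood that a sample's label was flipped using the
    probability density function of the exponential distribution over
    the sample's distance to the decision boundary, so that points near
    the boundary are more likely to be mislabeled than distant points
    (the behavior described in the quoted Bootkrajang passage).
    `rate` and `max_flip` are illustrative, assumed parameters.
    """
    # Exponential decay in |distance|, scaled so the flip probability
    # peaks at `max_flip` exactly on the decision boundary.
    return max_flip * math.exp(-rate * abs(distance_to_boundary))
```

Points on the boundary get the maximum flip probability; the probability decays exponentially with distance, matching the quoted rationale that boundary-adjacent points have a relatively higher chance of being mislabelled.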
It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Feng, in view of Yang and Merhav, and Bootkrajang to teach the above limitation(s). The motivation for doing so is to reduce the negative effects of label noise (cf. Bootkrajang, pg. 62 col. 1, “Experiments show that the proposed method is able to counter the negative effect of the label noise while maintaining the computational feasibility of learning the model.”). Regarding claim 28, Feng in view of Yang, Merhav, and Bootkrajang teaches the method as recited in claim 27. Bootkrajang further teaches wherein the statistical distribution is a normal distribution, and/or an exponential distribution, and/or a gamma distribution, and/or a chi- square distribution, and/or a beta distribution, and/or an exponential Weibull distribution, and/or a Dirichlet distribution. (Bootkrajang, pg. 62 col. 1, “We do this by expressing label flipping probabilities by a parametric function. We employ the probability density function of the exponential distribution [wherein the statistical distribution is…an exponential distribution,] to model the likelihood of label flipping. This function is chosen in order to capture noises in a scenario where points that live closer to the decision boundary have relatively higher chance of being mislabelled than those that live further away.”). Regarding claim 29, Feng in view of Yang and Merhav teaches the method as recited in claim 21. However, the combination does not explicitly teach wherein the parameters of the contributions are optimized according to a likelihood method and/or according to a Bayesian method. Bootkrajang teaches wherein the parameters of the contributions are optimized according to a likelihood method and/or according to a Bayesian method. (Bootkrajang, pg. 63 col. 
1, “Putting everything together, the objective function of the generalised robust Logistic Regression (gLR) is a penalised log-likelihood [according to a likelihood method]: L = Σ_{n=1}^{N} ( ỹ_n log[ω_n^{11} P_n^{1} + ω_n^{01} P_n^{0}] + (1 − ỹ_n) log[ω_n^{00} P_n^{0} + ω_n^{10} P_n^{1}] ) − Σ_{i=1}^{m} α_i |w_i| + Σ_{j=0}^{1} λ_j log(γ_j−1) (7) … To optimise the objective, we use the gradient-descent method to update w, γ_0 and γ_1. [wherein the parameters of the contributions are optimized]”). Feng, in view of Yang and Merhav, and Bootkrajang are both in the same field of endeavor (i.e. label noise). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Feng, in view of Yang and Merhav, and Bootkrajang to teach the above limitation(s). The motivation for doing so is to reduce the negative effects of label noise (cf. Bootkrajang, pg. 62 col. 1, “Experiments show that the proposed method is able to counter the negative effect of the label noise while maintaining the computational feasibility of learning the model.”). Regarding claim 30, Feng in view of Yang and Merhav teaches the method as recited in claim 21. However, the combination does not explicitly teach wherein the parameters of the contributions are optimized using an expectation maximization algorithm, and/or using an expectation/conditional maximization algorithm, and/or using an expectation conjugate gradient algorithm, and/or using a Riemann batch algorithm, and/or using a Newton-based method, and/or using a Markov chain Monte Carlo-based method, and/or using a stochastic gradient algorithm.
Bootkrajang teaches wherein the parameters of the contributions are optimized using an expectation maximization algorithm, and/or using an expectation/conditional maximization algorithm, and/or using an expectation conjugate gradient algorithm, and/or using a Riemann batch algorithm, and/or using a Newton-based method, and/or using a Markov chain Monte Carlo-based method, and/or using a stochastic gradient algorithm. (Bootkrajang, pg. 63 col. 1, “To optimise the objective, we use the gradient-descent method [wherein the parameters of the contributions are optimized using… a stochastic gradient algorithm.] to update $w$, $\gamma_0$ and $\gamma_1$. We adopt an effective smooth approximation, $\lvert w_i\rvert \approx (w_i^2+\eta)^{1/2}$, originally proposed by [34] to take care of the discontinuity of the objective at the origin caused by the L1 regularisation.”).

Feng, in view of Yang and Merhav, and Bootkrajang are both in the same field of endeavor (i.e. label noise). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Feng, in view of Yang and Merhav, and Bootkrajang to teach the above limitation(s). The motivation for doing so is to reduce the negative effects of label noise (cf. Bootkrajang, pg. 62 col. 1, “Experiments show that the proposed method is able to counter the negative effect of the label noise while maintaining the computational feasibility of learning the model.”).

Regarding claim 31, Feng in view of Yang and Merhav teaches the method as recited in claim 21. However, the combination does not explicitly teach wherein the assessment of the at least one learning data set is ascertained based on a local probability density, which outputs at least one contribution to the superposition when the uncertainty of the at least one learning data set is supplied to it as an input, and/or based on a ratio of the local probability densities.
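As an illustrative sketch of the two gLR ingredients quoted from Bootkrajang above — an exponential density over distance to the decision boundary as a label-flip likelihood, and plain gradient descent on a smoothed L1 penalty — the following assumes a simple logistic model; all data, names, and hyperparameters are hypothetical, and the flip-probability weights $\omega$ of the full gLR objective are omitted:

```python
import numpy as np

def flip_likelihood(X, w, rate=1.0):
    """Exponential pdf of each point's distance to the boundary w.x = 0:
    points near the boundary get a higher mislabelling likelihood."""
    dist = np.abs(X @ w) / np.linalg.norm(w)
    return rate * np.exp(-rate * dist)

def smooth_abs(w, eta=1e-8):
    """Smooth surrogate |w_i| ~ (w_i^2 + eta)^(1/2), differentiable at 0."""
    return np.sqrt(w ** 2 + eta)

def grad_step(w, X, y, lr=0.5, alpha=0.05, eta=1e-8):
    """One gradient-descent step on the L1-penalised logistic loss,
    using the smooth surrogate so plain gradient descent applies."""
    p = 1.0 / (1.0 + np.exp(-X @ w))        # sigmoid predictions
    grad = X.T @ (p - y) / len(y)           # logistic-loss gradient
    grad += alpha * w / smooth_abs(w, eta)  # d/dw of (w_i^2 + eta)^(1/2)
    return w - lr * grad

# Hypothetical data: labels come from a linear rule on the first two features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([2.0, -1.5, 0.0]) > 0).astype(float)

w = np.zeros(3)
for _ in range(300):
    w = grad_step(w, X, y)

# Samples close to the learned boundary score a higher flip likelihood
probs = flip_likelihood(X, w)
```

This only illustrates the distance-based flip model and the smoothed-L1 gradient update the passage quotes, not the full penalised log-likelihood of Eq. (7).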
Bootkrajang teaches wherein the assessment of the at least one learning data set is ascertained based on a local probability density, which outputs at least one contribution to the superposition when the uncertainty of the at least one learning data set is supplied to it as an input, and/or based on a ratio of the local probability densities. (Bootkrajang, pg. 62 col. 1, “We do this by expressing label flipping probabilities by a parametric function. We employ the probability density function of the exponential distribution to model the likelihood of label flipping. This function is chosen in order to capture noises in a scenario where points that live closer to the decision boundary [points being close to a boundary are interpreted as a local assessment, i.e. wherein the assessment of the at least one learning data set is ascertained based on a local probability density] have relatively higher chance of being mislabelled than those that live further away.”; identifying whether a sample is mislabeled is interpreted as outputting a contribution to the superposition, or distribution (i.e. which outputs at least one contribution to the superposition when the uncertainty of the at least one learning data set is supplied to it as an input)).

Feng, in view of Yang and Merhav, and Bootkrajang are both in the same field of endeavor (i.e. label noise). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Feng, in view of Yang and Merhav, and Bootkrajang to teach the above limitation(s). The motivation for doing so is to reduce the negative effects of label noise (cf. Bootkrajang, pg. 62 col. 1, “Experiments show that the proposed method is able to counter the negative effect of the label noise while maintaining the computational feasibility of learning the model.”).

Claim 23 is rejected under 35 U.S.C.
103 as being unpatentable over Feng, et al., Non-Patent Literature “Class noise removal and correction for image classification using ensemble margin” (“Feng”) in view of Yang, et al., Non-Patent Literature “An Ensemble Classification Algorithm for Convolutional Neural Network based on AdaBoost” (“Yang”) and further in view of Merhav, et al., US Pre-Grant Publication 2017/0300811A1 (“Merhav”) and Ma, et al., Non-Patent Literature “Penalized feature selection and classification in bioinformatics” (“Ma”).

Regarding claim 23, Feng in view of Yang and Merhav teaches the method as recited in claim 22. Feng in view of Yang and Merhav teaches the assessment as seen in claim 22. However, the combination does not explicitly teach wherein in response to the assessment of a learning data set of the at least one learning data set meeting a predefined criterion, the learning data set is no longer taken into consideration in the cost function.

Ma teaches wherein in response to the assessment of a learning data set of the at least one learning data set meeting a predefined criterion, the learning data set is no longer taken into consideration in the cost function. (Ma, pg. 395 col. 2, “With penalization methods, feature selection and classifier construction are achieved simultaneously by computing $\hat{\beta}$, estimate of $\beta$ that minimizes a penalized objective function. Classification rule can be defined as $Y = 1$ if $X^{T}\hat{\beta} > c$ for a cutoff $c$; the cutoff is interpreted as meeting a predefined criterion (i.e. wherein in response to the assessment of a learning data set of the at least one learning data set meeting a predefined criterion,). With properly tuned penalties, estimated $\beta$ can have components exactly equal to zero. Feature selection is thus achieved, since only variables with nonzero coefficients will be used in the classifier; this means that the inputs that do not meet the cutoff are not considered in the objective function or cost function (i.e.
the learning data set is no longer taken into consideration in the cost function.). Specifically, we define $\hat{\beta}$ as $\hat{\beta} = \arg\min_{\beta}\{m(\beta; D) + \lambda \times \mathrm{pen}(\beta)\}$, where $D$ represents the dataset consisting of $(x_1, y_1), \ldots, (x_n, y_n)$. In [1], $m$ is referred to as the ‘classification objective function’.”).

Feng, in view of Yang and Merhav, and Ma are both in the same field of endeavor (i.e. classification). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Feng, in view of Yang and Merhav, and Ma to teach the above limitation(s). The motivation for doing so is that feature selection using penalties can exclude negative effects from noisy features (cf. Ma, pg. 392 col. 1-2, “feature selection can help to (i) provide more insights into the underlying causal relationships by focusing on a smaller number of features; (ii) generate more reliable estimates by excluding noises and (iii) provide faster and more efficient models”).

Claim 35 is rejected under 35 U.S.C. 103 as being unpatentable over Feng, et al., Non-Patent Literature “Class noise removal and correction for image classification using ensemble margin” (“Feng”) in view of Yang, et al., Non-Patent Literature “An Ensemble Classification Algorithm for Convolutional Neural Network based on AdaBoost” (“Yang”) and further in view of Merhav, et al., US Pre-Grant Publication 2017/0300811A1 (“Merhav”) and Rebhan, US Pre-Grant Publication 2015/0057907A1 (“Rebhan”).

Regarding claim 35, the claim is analogous to claim 21 and Feng in view of Yang and Merhav teaches the analogous limitations. Feng further teaches a classification system (Feng, see title).
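The exact-zero behaviour Ma attributes to properly tuned penalties, quoted in the claim 23 rejection above, can be illustrated with the soft-thresholding (proximal) operator of the L1 penalty; the coefficient values here are hypothetical:

```python
import numpy as np

def soft_threshold(beta, lam):
    """Proximal operator of lam*|.|: any coefficient with |beta_i| <= lam
    becomes exactly zero, so its feature drops out of the classifier."""
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

beta = np.array([2.3, -0.4, 0.05, -1.8])
beta_hat = soft_threshold(beta, lam=0.5)
# beta_hat == [1.8, 0.0, 0.0, -1.3]: only features with nonzero
# coefficients remain in the classification rule X^T beta_hat > c
```

The two small coefficients are driven exactly to zero rather than merely shrunk, which is the mechanism by which penalized feature selection removes inputs from the objective function.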
However, the combination does not teach the additional limitations: operating the trainable module by supplying to the trainable module first input variable values, the first input variable values including measured data, which were obtained by: (i) a physical measuring process, and/or (ii) a partial or complete simulation of the measuring process, and/or (iii) a partial or complete simulation of a technical system observable using the measuring process; and as a function of output variable values supplied by the trainable module, activating a classification system using an activation signal.

Rebhan teaches: operating the trainable module by supplying to the trainable module first input variable values, the first input variable values including measured data, which were obtained by: (i) a physical measuring process, and/or (ii) a partial or complete simulation of the measuring process, and/or (iii) a partial or complete simulation of a technical system observable using the measuring process; (Rebhan, ¶3, “Relying on sensor equipment such as radar (radio detection and ranging), lidar (light detection and ranging), cameras for imaging, etc., for providing data of a host vehicle's environment, different functions related to driving or maneuvering can be implemented.” [operating the trainable module by supplying to the trainable module first input variable values, the first input variable values including measured data, which were obtained by: (i) a physical measuring process,]).
and as a function of output variable values supplied by the trainable module, activating a classification system using an activation signal (Rebhan, ¶24, “In a further aspect of the invention, there is provided a driver assistance system for controlling a vehicle [activating a classification system], the driver assistance system comprising at least one sensor means configured to acquire sensor data, at least one actuating means configured to perform a control action for the vehicle, and a control means, wherein the control means comprises first evaluation means configured to generate a decision signal from the sensor data [and as a function of output variable values supplied by the trainable module,] acquired by the sensor means, and an activation decision means configured to generate an activation signal for the control action [using an activation signal.] when the decision signal exceeds a signal threshold”).

Feng, in view of Yang and Merhav, and Rebhan are both in the same field of endeavor (i.e. classification). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Feng, in view of Yang and Merhav, and Rebhan to teach the above limitation(s). The motivation for doing so is that an activation signal based on the model outputs improves system stability (cf. Rebhan, ¶16-17).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Guo, et al., “Building an ensemble classifier using ensemble margin. Application to image classification” teaches using ensemble bagging margins to identify mislabeled data samples for elimination.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS S WU whose telephone number is (571)270-0939. The examiner can normally be reached Monday - Friday 8:00 am - 4:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold, can be reached on 571-431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/N.S.W./
Examiner, Art Unit 2148

/MICHELLE T BECHTOLD/
Supervisory Patent Examiner, Art Unit 2148
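Aside: the activation decision Rebhan describes in the claim 35 rejection above (an activation signal generated when the decision signal exceeds a signal threshold) reduces to a simple comparison. The function name and values below are hypothetical:

```python
def activation_signal(decision_signal: float, threshold: float) -> bool:
    """Activate the downstream (e.g. classification) system only when the
    module's decision signal exceeds the configured signal threshold."""
    return decision_signal > threshold

assert activation_signal(0.92, threshold=0.8)      # above threshold: fires
assert not activation_signal(0.45, threshold=0.8)  # below threshold: stays off
```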

Prosecution Timeline

Jul 01, 2021
Application Filed
Aug 29, 2024
Non-Final Rejection — §103
Dec 05, 2024
Response Filed
Mar 20, 2025
Final Rejection — §103
Sep 02, 2025
Request for Continued Examination
Sep 08, 2025
Response after Non-Final Action
Nov 19, 2025
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12488244
APPARATUS AND METHOD FOR DATA GENERATION FOR USER ENGAGEMENT
2y 5m to grant Granted Dec 02, 2025
Patent 12423576
METHOD AND APPARATUS FOR UPDATING PARAMETER OF MULTI-TASK MODEL, AND STORAGE MEDIUM
2y 5m to grant Granted Sep 23, 2025
Patent 12361280
METHOD AND DEVICE FOR TRAINING A MACHINE LEARNING ROUTINE FOR CONTROLLING A TECHNICAL SYSTEM
2y 5m to grant Granted Jul 15, 2025
Patent 12354017
ALIGNING KNOWLEDGE GRAPHS USING SUBGRAPH TYPING
2y 5m to grant Granted Jul 08, 2025
Patent 12333425
HYBRID GRAPH NEURAL NETWORK
2y 5m to grant Granted Jun 17, 2025
Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
47%
Grant Probability
90%
With Interview (+43.1%)
3y 9m
Median Time to Grant
High
PTA Risk
Based on 38 resolved cases by this examiner. Grant probability derived from career allow rate.
