Last updated: May 29, 2026

Application No. 17/657,041

MORE ROBUST TRAINING FOR ARTIFICIAL NEURAL NETWORKS

Non-Final OA §103

Filed

Mar 29, 2022

Priority

Apr 13, 2021 — DE 10 2021 109 168.3

Examiner

HAN, KYU HYUNG

Art Unit

2123

Tech Center

2100 — Computer Architecture & Software

Assignee

Robert Bosch GmbH

OA Round

3 (Non-Final)

Interview Optional

— +7.1% interview lift. Interview lift (+7.1%) is below the 15.0% threshold. A written response is recommended.

Based on 11 resolved cases, 2023–2026

Examiner Intelligence

HAN, KYU HYUNG View full profile →

Grants 46% of resolved cases

Career Allowance Rate

5 granted / 11 resolved

-9.5% vs TC avg

Moderate +7% lift

Without

With

+7.1%

Interview Lift

resolved cases with interview

Typical timeline

4y 1m

Avg Prosecution

15 currently pending

Career history

Total Applications

across all art units

Statute-Specific Performance

§101

4.6%

-35.4% vs TC avg

§103

95.4%

+55.4% vs TC avg

Black line = Tech Center average estimate • Based on career data from 11 resolved cases

Office Action

§103

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 01/29/2026 has been entered.

Response to Remarks
Claim Rejections – 35 U.S.C. 103
Applicant’s prior art arguments have been fully considered and they are not persuasive.
Applicant argues (pg. 5-6) that neither Keller or Duyck teach the amended limitation: “wherein a variability in the learning input quantity values is amplified by an influence of the sequence of the quasi-random numbers on the multiplicity of processing units”. Particularly, Applicant asserts that Duyck does not teach the amended limitation because it instead discloses modifications to dropout that is based on a paper by Baldi and Sadowski, where it is shown that the expectation of the dropout gradient is approximately equal to a regularization of the gradient of the network weighted.
Examiner respectfully disagrees. First, note that Applicant’s characterization of Duyck is not mutually exclusive to it teaching the amended limitation – Duyck’s dropout is merely based on a paper by Baldi and Sadowski; additionally, it also incorporates elements regarding quasi-random number sequences. Indeed, Duyck states in Page 4, Paragraph 6: “If a network can be sampled using low-discrepancy sequences, the equally distributed nature of low-discrepancy sequences allow greater diversity in the set of networks sampled. Each network sampled through dropout can be expressed as a binary vector, in which each element of the vector corresponds to each node in the network. For each node, if the corresponding value in the vector is 1, the node is retained, and if it is 0, the node is dropped out. To get a sequence of binary vectors, we use Sobol and Halton sequences” Duyck teaches that the low-discrepancy sequences, which are also known as quasi-random sequences, allow greater diversity in the set of networks sampled. Duyck uses Sobol and Halton sequences to achieve this. Furthermore, Duyck states in Page 2, Paragraph 4: “probabilistically finds the optimal dropout rate for a given node based on the results of the previous layers.” Duyck teaches that the deactivation is based on the output of the previous layers, which is necessarily based on the quantity values of the nodes in the previous layers as well as the output of such nodes (both input and output quantity values). This shows that the quasi-random sequences, in effect, enhance the variability of the input and output quantity values to choose the set of processing units, via dropout.
The foregoing applies to all independent claims and their dependent claims. 

Claim Rejections – 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5-13 are rejected under 35 U.S.C. 103 as being unpatentable over Keller et al. (US 20190294972 A1) hereinafter known as Keller in view of Duyck et al. (“Modified Dropout for Training Neural Network”) hereinafter known as Duyck.

Regarding independent claim 1, Keller teaches:
A method for training an artificial neural network (ANN), which includes a multiplicity of processing units, the method comprising: optimizing parameters that characterize a behavior of the ANN a according to a cost function; (Keller ¶ [0113]: “If the neural network does not correctly label the input, then errors between the correct label and the predicted label are analyzed, and the weights are adjusted for each feature during a backward propagation phase until the DNN correctly labels the input and other inputs in a training dataset.” Keller teaches that the parameters are adjusted according to a simple loss function that indicates whether there is a discrepancy between the correct label and the predicted label.)

Keller does not explicitly teach:
and deactivating, depending on outputs determined from learning input quantity values and on learning output quantity values, an output of at least one selected processing unit, and selection of the selected processing unit being achieved using a sequence of quasi-random numbers, 
wherein a variability in the learning input quantity values is amplified by an influence of the sequence of the quasi-random numbers on the multiplicity of processing units.

However, Duyck teaches:
and deactivating, depending on outputs determined from learning input quantity values and on learning output quantity values, an output of at least one selected processing unit, and selection of the selected processing unit being achieved using a sequence of quasi-random numbers, (Duyck [Page 4, Paragraph 6]: “A similar concept can be applied to dropout. If a network can be sampled using low-discrepancy sequences, the equally distributed nature of low-discrepancy sequences allow greater diversity in the set of networks sampled. Each network sampled through dropout can be expressed as a binary vector, in which each element of the vector corresponds to each node in the network. For each node, if the corresponding value in the vector is 1, the node is retained, and if it is 0, the node is dropped out. To get a sequence of binary vectors, we use Sobol and Halton sequences.” Duyck teaching deactivating the neurons based on quasi-random sequences that correspond with the neurons in the neural network. Deactivating the neurons, which are used in to output a result, is effectively deactivating the output of a result. This is clear in the particular case of where a sufficiently large number of neurons in a layer is deactivated and the result for that layer is not represented, and thus is deactivated, in the output.)
wherein a variability in the learning input quantity values is amplified by an influence of the sequence of the quasi-random numbers on the multiplicity of processing units. (Duyck [Page 4, Paragraph 6]: “For each node, if the corresponding value in the vector is 1, the node is retained, and if it is 0, the node is dropped out. To get a sequence of binary vectors, we use Sobol and Halton sequences.” Duyck teaches quasi-random sequences to choose the multiplicity of processing units by dropout. Duyck [Page 2, Paragraph 4]: “probabilistically finds the optimal dropout rate for a given node based on the results of the previous layers.” Indeed, Duyck shows that the deactivation is based on the output of the previous layers, which is necessarily based on the quantity values of the nodes in the previous layers as well as the output of such nodes (both input and output quantity values) This shows that the quasi-random sequences, in effect, enhance the variability of the input and output quantity values to choose the set of processing units, via dropout.)

Keller and Duyck are in the same field of endeavor as the present invention, as the
references are directed to deactivating (or effectively deactivating via sampling) neurons based on quasi-random sequences. It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine optimizing parameters of the artificial neural network (ANN) according to a cost function as taught in Keller with deactivating neurons based on a sequence of quasi-random numbers as taught in Duyck. Duyck provides this additional functionality. As such, it would have been obvious to one of ordinary skill in the art to modify the teachings of Keller to include teachings of Duyck because the combination would allow for ANNs to be trained based on a function even while deactivating a portion of neurons in the ANN. This has the potential benefit of speeding up the computation of the ANN, as fewer neurons are needed to be computed.

Regarding dependent claim 2, Keller and Duyck teach:
The method as recited in claim 1, wherein the sequence of quasi-random numbers is initialized using a random value. (Keller [¶ 0132 - ¶ 0133] “A deterministic choice would be low discrepancy sequences and a simple randomized alternative is jittered equidistant sampling… for one realization of the random variable…” Keller teaches that the quasi-random (low discrepancy) sequence is initialized based on a random value.)

The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 5, Keller and Duyck teach:
The method as recited in claim 1, wherein a specifiable proportion of the processing units of the ANN is selected and deactivated. (Keller [¶ 0176] “Assigning a neural unit to exactly one partition p … out of …  partitions uses fewer pseudo-random number generator calls and guarantees that all neural units are considered.” Keller also teaches a partition in addition to a dropout, where a particular partition of the processing units are selected and the remaining are deactivated.

The reasons to combine are substantially similar to those of claim 1.


Regarding dependent claim 6, Keller and Duyck teach:
The method as recited in claim 1, wherein the sequence of quasi-random numbers is one of the following sequences: Halton sequence, Hammersley sequence, Niederreiter sequence, Kronecker sequence, Sobol sequence, Van der Corput sequence. (Duyck [Page 4, Paragraph 6]: “A similar concept can be applied to dropout. If a network can be sampled using low-discrepancy sequences, the equally distributed nature of low-discrepancy sequences allow greater diversity in the set of networks sampled. Each network sampled through dropout can be expressed as a binary vector, in which each element of the vector corresponds to each node in the network. For each node, if the corresponding value in the vector is 1, the node is retained, and if it is 0, the node is dropped out. To get a sequence of binary vectors, we use Sobol and Halton sequences.” Duyck teaches that the Sobol and Halton sequences are used.)

The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 7, Keller and Duyck teach:
The method as recited in claim 1, wherein the ANN is configured as a classifier.
The reasons to combine are substantially similar to those of claim 1. (Keller [¶ 0023] “In another embodiment, the output data may include one or more of a classification” Keller teaches that the ANN may be configured as a classifier.)

The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 8, Keller and Duyck teach:
The method as recited in claim 7, wherein the ANN is configured as a classifier of image data and/or audio data. (Keller [¶ 0023] “In yet another embodiment, the input data may include environmental data (e.g., recorded image data of an environment surrounding an automobile, etc.), and the output data may include an identification/classification of one or more objects within the environmental data (such as cars, cyclists, pedestrians, etc.).” Keller teaches that the ANN may be configured as a classifier of image data.)

The reasons to combine are substantially similar to those of claim 1.

Regarding independent claim 9, Keller and Duyck teach:
Claim 9 is substantially similar to claim 1, but has the additional elements:
A non-transitory machine-readable storage medium on which is stored a computer program … the computer program, when executed by a computer, causing the computer to perform the following steps: (Keller [¶ 0090] “Computer programs, or computer control logic algorithms, may be stored in the main memory 440 and/or the secondary storage. Such computer programs, when executed, enable the system 465 to perform various functions. The memory 440, the storage, and/or any other storage are possible examples of computer-readable media.” Keller teaches storage to store the computer program.)

The reasons to combine are substantially similar to those of claim 1.


Regarding independent claim 10, Keller and Duyck teach:
Claim 10 is substantially similar to claim 1, but has the additional elements:
A training device configure … the training device configured to: (Keller [¶ 0091] “For example, the system 465 may take the form of a desktop computer, a laptop computer, a tablet computer, servers, supercomputers, a smart-phone (e.g., a wireless, hand-held device), personal digital assistant (PDA), a digital camera, a vehicle, a head mounted display, a hand-held electronic device, a mobile phone device, a television, workstation, game consoles, embedded system, and/or any other type of logic”) Keller teaches a system/device.)

The reasons to combine are substantially similar to those of claim 1.

Regarding independent claim 11, Keller and Duyck teach:
The method as recited in claim 1, wherein the processing units corresponds to a plurality of neurons of the ANN, and wherein a length of the sequence of quasi-random numbers is greater than a number of the neurons of the ANN that are to be deactivated. (Duyck [Page 4, Paragraph 4]: “low-discrepancy sequence is a deterministic sequence that is more equidistributed in a given space compared to pseudo-random sequences. Formally, for a given space, low-discrepancy sequences minimize the difference between the proportion of points in a subspace and the proportion of the volume of the subspace to the volume of a space, for any subspace. Thus, the points are evenly distributed inside the space” Duyck teaches that the neurons are sampled using a distribution called a low-discrepancy sequence. Since there are necessarily more possible values in the sequence than is being sampled (the definition of sampling), the length of the sequence of quasi-random numbers is greater than the length of numbers sampled corresponding to the deactivated neurons.)

The reasons to combine are substantially similar to those of claim 1.

Claim 12 is rejected on the same grounds under 35 U.S.C. 103 as claim 11, as they are substantially similar. Mutatis mutandis.

Claim 13 is rejected on the same grounds under 35 U.S.C. 103 as claim 11, as they are substantially similar. Mutatis mutandis.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Keller in view of Duyck in view of CodyBugstein (“Sum a series of series where each value increments by one” https://math.stackexchange.com/questions/829443/sum-a-series-of-series-where-each-value-increments-by-one) hereinafter known as CodyBugstein. 

Regarding dependent claim 4, Keller and Duyck teach:
The method as recited in claim 3, 

Keller and Duyck do not explicitly teach:
wherein the change in the initialization is performed by a specifiable increment. 

However, CodyBugstein teaches:
wherein the change in the initialization is performed by a specifiable increment. (CodyBugstein [Page 1, Paragraph 1]: “Can anyone suggest an elegant way to sum a series of numbers like this: (1, 2, 3, 4) (2, 3, 4, 5) (3, 4, 5, 6) (4, 5, 6, 7).” CodyBugstein teaches a series of sequences where each value increments by one in the subsequent series. As shown in page 8-9 in the specification of the present invention, the pseudocode of the algorithm shows that for each iteration, the matching criteria is incremented by 1 in each loop. Since the h function described in the specification (page 8) is effectively a normalization function and using the modulo by the number of neurons has the effect of keeping the index in bounds, the concept of incrementing the offset is analogous to incrementing any sequence of numbers. CodyBugstein teaches incrementing this sequence of numbers.)
CodyBugstein is in the same field of endeavor as the present invention, since it is directed to determining computations based on a series of sequences where a subsequent sequence is incremented by some number.  It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine deactivating a portion of neurons using a criterion taught in Keller as modified by Duyck with incorporating a simple increment as taught in CodyBugstein. CodyBugstein provides this additional functionality.  As such, it would have been obvious to one of ordinary skill in the art to modify the teachings of Keller as modified by Duyck to include teachings of CodyBugstein because the combination would allow for the criterion for deactivating a neuron to be changed by a simple scalar addend in subsequent training loops. This has the potential benefit of being able to deactivate different portions of neurons in different training loops, which may improve the generalizability of the model.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Keller in view of Duyck in view of CodyBugstein in view of Golkar at al. (“Continual Learning via Neural Pruning”) hereinafter known as Golkar. 

Regarding dependent claim 3, Keller, Duyck, and CodyBugstein teach:
The method as recited in claim 2, wherein the initialization of the sequence of random numbers is changed … (CodyBugstein [Page 1, Paragraph 1]: “Can anyone suggest an elegant way to sum a series of numbers like this: (1, 2, 3, 4) (2, 3, 4, 5) (3, 4, 5, 6) (4, 5, 6, 7).” CodyBugstein teaches a series of sequences where each value increments by one in the subsequent series. As shown in page 8-9 in the specification of the present invention, the pseudocode of the algorithm shows that for each iteration, the matching criteria is incremented by 1 in each loop. Since the h function described in the specification (page 8) is effectively a normalization function and using the modulo by the number of neurons has the effect of keeping the index in bounds, the concept of incrementing the offset is analogous to incrementing any sequence of numbers. CodyBugstein teaches incrementing this sequence of numbers.)

Keller, Duyck, and CodyBugstein do not explicitly teach:
… after each training pass has been carried out.

However, Golkar teaches:
… after each training pass has been carried out. (Golkar [Page 5, Paragraph 4] “This step is referred to as fine-tuning and is done by retraining the network for a few epochs while only updating the weights which survive sparsification. This causes the model to regain some of its lost performance because of pruning. To achieve a yet higher level of sparsity, one can iterate the pruning and fine-tuning steps multiple times.” Golkar teaches that multiple rounds of training can be done to prune (or effectively deactivate) the neurons.)

Golkar is in the same field of endeavor as the present invention since it is directed to training machine learning models over many training passes while pruning (or effectively deactivating) neurons.  It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine incrementing a sequence of numbers (effectively incrementing an addend in a series of sequences) taught in CodyBugstein with multiple training passes of deactivation of neurons taught in Golkar.  As such, it would have been obvious to one of ordinary skill in the art to modify the teaching of Keller as modified by Duyck as modified by CodyBugstein to include the teachings of Golkar because the combination would allow for multiple training passes to deactivate different portions of neurons because the incrementation of the offset addend allows for a different matching criterion of neuron to deactivate. This has the potential benefit of increasing the efficiency of the neural network model as the number of active nodes may be decreased, increasing the speed of the operation.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYU HYUNG HAN whose telephone number is (703) 756-5529.  The examiner can normally be reached on MF 9-5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571) 270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Kyu Hyung Han/
Examiner
Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123

Read full office action

Prosecution Timeline

Mar 29, 2022

Application Filed

Feb 18, 2025

Non-Final Rejection mailed — §103

Jul 18, 2025

Response Filed

Oct 29, 2025

Final Rejection mailed — §103

Jan 29, 2026

Request for Continued Examination

Feb 08, 2026

Response after Non-Final Action

Mar 31, 2026

Non-Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/332,295

Patent 12585928

HARDWARE ARCHITECTURE FOR INTRODUCING ACTIVATION SPARSITY IN NEURAL NETWORK

4y 10m to grant Granted Mar 24, 2026

17/317,300

Patent 12387101

SYSTEMS AND METHODS FOR PRUNING BINARY NEURAL NETWORKS GUIDED BY WEIGHT FLIPPING FREQUENCY

4y 3m to grant Granted Aug 12, 2025

Study what changed to get past this examiner. Based on 2 most recent grants.

Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.

Typically takes 5-10 seconds — AI-generated, attorney review required before filing

Prosecution Projections

3-4

Expected OA Rounds

46%

Grant Probability

53%

With Interview (+7.1%)

4y 1m (~0m remaining)

Median Time to Grant

High

PTA Risk

Based on 11 resolved cases by this examiner. Grant probability derived from career allowance rate.