Prosecution Insights
Last updated: April 19, 2026
Application No. 17/394,780

MORE FLEXIBLE ITERATIVE OPERATION OF ARTIFICIAL NEURAL NETWORKS

Final Rejection (§103, §112)
Filed
Aug 05, 2021
Examiner
THAI, JASMINE THANH
Art Unit
2129
Tech Center
2100 — Computer Architecture & Software
Assignee
Robert Bosch GmbH
OA Round
4 (Final)
Grant Probability: 25% (At Risk)
OA Rounds: 5-6
To Grant: 4y 0m
With Interview: 81%

Examiner Intelligence

Career Allow Rate: 25% (6 granted / 24 resolved; -30.0% vs TC avg)
Interview Lift: +56.3% (resolved cases with interview)
Avg Prosecution: 4y 0m (typical timeline; 30 currently pending)
Total Applications: 54 (across all art units)

Statute-Specific Performance

§101: 23.6% (-16.4% vs TC avg)
§103: 37.2% (-2.8% vs TC avg)
§102: 14.6% (-25.4% vs TC avg)
§112: 21.8% (-18.2% vs TC avg)
Tech Center average is an estimate; based on career data from 24 resolved cases.
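The per-statute deltas quoted above can be tallied arithmetically. The sketch below uses only the figures reported in this section; the 40.0% Tech Center average is an inference from those figures (each delta is consistent with examiner rate minus 40.0), not an official USPTO number.

```python
# Sketch: recompute the statute-specific deltas quoted in this report.
# The 40.0 TC average is an assumption inferred from the quoted deltas.
examiner_allow = {"101": 23.6, "103": 37.2, "102": 14.6, "112": 21.8}
tc_avg = 40.0

deltas = {s: round(r - tc_avg, 1) for s, r in examiner_allow.items()}
print(deltas)  # {'101': -16.4, '103': -2.8, '102': -25.4, '112': -18.2}

# Career allow rate: 6 granted of 24 resolved cases
print(round(6 / 24 * 100, 1))  # 25.0
```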

Office Action

§103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed 01/01/2026 have been fully considered but they are not persuasive.

Regarding applicant’s remarks directed to the rejection of claims under 35 USC § 103:

Alleged: No teaching of “an energy cost of the ANN increases linearly”

In Remarks pp. 10-11, Applicant contends that Hemmat does not disclose a constant slope across iterations for the normalized power (energy cost): “the values in Table II show normalized power changing nonlinearly. That is to say, if the normalized power changed linearly in Table II, it would have exhibited a constant slop [sic] across iterations, but that is not the case.” The relevant claim limitations appear to be “wherein as the portion of the parameters increases over the iterations, a classification accuracy of the ANN increases nonlinearly and an energy cost of the ANN increases linearly” in claim 1. As noted in the previous Office Action, Hemmat teaches (emphasis added) wherein as the portion of the parameters increases over the iterations, a classification accuracy of the ANN increases nonlinearly and an energy cost of the ANN increases linearly. Examiner interprets the claim in light of the specification of the published application (“[0066] Certain energy costs C also accrue when no parameters 15 c are changed. Starting with this basic amount, energy costs C increase linearly with the number of changed parameters 15 c′. Classification accuracy A, however, increases non-linearly. It increases drastically already if only a few parameters 15 c′ are changed. This growth weakens with the increasing number of changed parameters 15 c′ and at some point reaches a state of saturation.
It is therefore advantageous to exploit for a small price of additional energy costs C the initially large increase in classification accuracy A.”)

[image: media_image1.png]

(Hemmat, Figure 2, pg. 6 Col 1, “In this experiment we show the tradeoff between the classification accuracy and percentage of the included weights of our procedure during incremental inference per input. The percentage of included weights directly relates to the percentage of the DRAM accesses which is the dominant source of power consumption as we reported in Table I. The results are shown in Table II for different CNNs. The “base” row is when no clustering is applied so it has the maximum classification accuracy. The fraction of fetched weights is reported in column 2. Note, this fraction is constant per iteration [wherein as the portion of the parameters increases over the iterations, a classification accuracy of the ANN increases nonlinearly (see annotated table 2 wherein there is a big jump between iteration 1 and 2 of each dataset) and an energy cost of the ANN increases linearly ie constant with the fraction of weights] regardless of the received input as clustering is done once offline and the number of clusters and subsequently the fraction of weights in each cluster will not change in runtime. Power consumption (normalized to the base row) is reported in column 3. Here the normalized power is computed over all the testing input dataset. Note, the normalized power in one row (iteration) corresponds to (normalized) sum over all inputs for which the weights were fetched in that iteration. In other words, if the procedure terminated with fewer iterations for an input, then that input is not counted towards power consumption in future iterations. Column 4 measures the classification accuracy using the testing dataset for each CNN.
To compute the classification accuracy at a particular row (iteration), we only considered the inputs for which our procedure terminated up to that iteration. The classification accuracy for the rest of the inputs was set to 0 because our algorithm had not yet terminated for them. As we can see from the table, increase in number of iterations results in increase in the classification accuracy because the percentage of inputs which are classified by our algorithm increases. We also observe that, for example in LeNet300-100, with only 3 iterations, the CNN reaches almost the same accuracy as the base case. This is achieved with only a small fraction of weights (=0.16+0.05+0.01=0.22). (The number of DRAM accesses are decreased with the same fraction). The total normalized power after 3 iterations is also significant (=0.08+0.07+0.01=0.16) in LeNet300-100. In summary with 3 iterations, and using 22% of the weights we achieve almost the same classification accuracy as the base CNN while consuming only 16% of power. We observe similar trend for the other two CNNs which are shown in the table.”); In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., linearly meaning constantly) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Further, Examiner previously looked to the specification for how linear energy cost should be interpreted (“[0066] Certain energy costs C also accrue when no parameters 15 c are changed. Starting with this basic amount, energy costs C increase linearly with the number of changed parameters 15 c′. Classification accuracy A, however, increases non-linearly. It increases drastically already if only a few parameters 15 c′ are changed. 
This growth weakens with the increasing number of changed parameters 15 c′ and at some point reaches a state of saturation. It is therefore advantageous to exploit for a small price of additional energy costs C the initially large increase in classification accuracy A.”) The specification discloses that such energy cost C increases even without a change in parameters. Thus, Examiner notes that the BRI of linearly increasing across measured data points (energy cost) is a trend. Thus, after careful consideration, applicant's arguments are unpersuasive as Hemmat teaches an increasing energy cost that can be recognized as a linearly increasing trend. The examiner refers to the rejection under 35 USC § 103 in the current office action for more details.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-9, 13-15 and 20-23 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 and claims 13-15 recite “feeding, once the iterations of the iterative block are completed, wherein the output supplied by the iterative block is input to a layer of the ANN following the iterative block or is provided as an output of the ANN… supplying the output of the ANN to an actuator of a vehicle; and controlling a physical action of the vehicle by the actuator based on the output of the ANN”. Examiner respectfully points out that, based on the conditional “or” clause recited in the claim, only one proposition can be true:

Branch 1: “wherein the output supplied by the iterative block is input to a layer of the ANN following the iterative block”. In this branch, there is only the output supplied by the iterative block and no recitation of an output of the ANN.

Branch 2: wherein the output of the iterative block “is provided as an output of the ANN”. In this branch, the output of the iterative block is the output of the ANN.

Thus, considering branch 1 as the BRI of the claim, the subsequent limitations reciting “the output of the ANN” lack antecedent basis. For examination purposes, Examiner interprets these limitations as the output of the ANN being supplied by the layer of the ANN following the last iterative block. Claims 2-9 and 20-23 are further rejected by virtue of their dependency on claim 1 or claims 13-15.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1-5, 7-8, 10, 12-13, and 20-23 is/are rejected under 35 U.S.C. 103 as being unpatentable over US Pub. No. US20190005330A1 to Uhlenbrock et al. (“Uhlenbrock”) in view of Pinheiro, P. & Collobert, R. (2014), Recurrent Convolutional Neural Networks for Scene Labeling, Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):82-90 (“Pinheiro”), in further view of Hemmat et al., "Dynamic Reconfiguration of CNNs for Input-Dependent Approximation," [2019] ("Hemmat").

Regarding claim 1 and analogous claims 14 and 15, Uhlenbrock teaches A method for operating an artificial neural network (ANN), which processes inputs in a sequence of layers to form outputs, (Uhlenbrock, “[0009] This disclosure provides a system for scene classification. In various embodiments, the system includes one or more processors and a memory.
The memory is a non-transitory computer-readable medium having executable instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform several operations, including operating at least two parallel, independent processing pipelines on an image or video to generate independent results; fusing the independent results of the at least two parallel, independent processing pipelines to generate a fused scene class; and electronically controlling machine behavior based on the fused scene class of the image or video.”) Uhlenbrock teaches and feeding, once the iterations of the iterative block are completed, the output supplied by the iterative block input to a layer of the ANN following the iterative block or is provided as an output of the ANN; For examination purposes, Examiner interprets these limitations as the output of the ANN is supplied by the layer of the ANN following the last iterative block. (See 112(b) rejection) (Uhlenbrock, “[0061] In class-level fusion and as shown in FIG. 6, information is combined at the class probability level such that the scene class 600 from the entity pipeline is fused with the scene class 602 from the whole image pipeline. More specifically, two classifiers 604 and 604′ are trained separately for the entities and whole image features. Each classifier 604 and 604′ produces a class probability distribution over the scene types (i.e., scene classes 600 and 602). These distributions are combined 606 (e.g., multiplied and renormalized) to produce the final classifier result 608 (fused scene class) [feeding, once the iterations of the iterative block are completed, the output supplied by the iterative block input to a layer of the ANN following the iterative block ie 606 (see fig. 
6)].”; wherein a CNN can be replaced by the recurrent network of Pinheiro)

[image: media_image2.png]

Uhlenbrock teaches supplying the output of the ANN to an actuator of a vehicle; and controlling a physical action of the vehicle by the actuator based on the output of the ANN. (Uhlenbrock, “[0062] Based on the fused scene class, a number of actions can be taken by the associated system. For example, the system can electronically control machine behavior based on the fused scene class of the image or video, such as labeling data associated with the image or video with the fused scene class, displaying the fused scene class with the image or video, controlling vehicle performance (such as causing a mobile platform (e.g., vehicle, drone, UAV, etc.) to move or maneuver to or away from an identified scene class (such as away from a building, or to a person, etc.)), or controlling processor performance (e.g., increasing processing speed when the class is a busy road, yet decrease speed when the class is an open ocean to conserve processing capacity and power, etc.) [supplying the output of the ANN to an actuator of a vehicle; and controlling a physical action of the vehicle by the actuator based on the output of the ANN].
As another example, the system can be caused to display the image or video (on a display) with a label that includes the fused scene class.”) However, Uhlenbrock does not explicitly teach the method comprising the following steps: establishing, within the ANN, at least one iterative block, made up of one or of multiple layers, which is to be implemented multiple times; establishing a number J of iterations, for which the iterative block is at most to be implemented; mapping, by the iterative block, an input of the iterative block onto an output; feeding the output to the iterative block as input and again mapping by the iterative block onto a new output, wherein the input corresponds to measured data, and wherein each of the output and the new output corresponds to a respective classification score; wherein a portion of parameters, which characterize a behavior of the layers in the iterative block, is changed during switches between the iterations, for which the iterative block is implemented; wherein as the portion of the parameters increases over the iterations, a classification accuracy of the ANN increases nonlinearly and an energy cost of the ANN increases linearly Pinheiro teaches the method comprising the following steps: establishing, within the ANN, at least one iterative block, made up of one or of multiple layers, which is to be implemented multiple times; (Pinheiro, Section 3.2, “The recurrent architecture [within the ANN] (see Figure 2) consists of the composition of P instances of the “plain” convolutional network f(·) [at least one iterative block, made up of one or of multiple layers, which is to be implemented multiple times] introduced in Section 3.1. Each instance has identical (shared) trainable parameters (W, b). 
For clarity, we drop the (W, b) notation in subsequent paragraphs.”) Pinheiro teaches establishing a number J of iterations, for which the iterative block is at most to be implemented; mapping, by the iterative block, an input of the iterative block onto an output; feeding the output to the iterative block as input and again mapping by the iterative block onto a new output, wherein the input corresponds to measured data, and wherein each of the output and the new output corresponds to a respective classification score; (Pinheiro, Section 3.2, “The p th instance of the network (1 ≤ p ≤ P) [establishing a number J of iterations, for which the iterative block is at most to be implemented; wherein the instance is performed for a max of P times] is fed with an input “image” F p of N + 3 features maps [equation image: media_image3.png] which are the output label planes of the previous instance, and the scaled2 version of the raw RGB squared patch surrounding the pixel at location (i, j) of the training image k. Note that the first network instance takes 0 label maps as previous label predictions.”)

[image: media_image4.png]

(Pinheiro, Section 3.2, “Figure 2. System considering one (f), two (f ◦ f) and three (f ◦ f ◦ f) instances of the network. In all three cases, the architecture produces labels (1 × 1 output planes) corresponding to the pixel at the center of the input patch [mapping, by the iterative block, an input of the iterative block onto an output]. Each network instance is fed with the previous label predictions [feeding the output to the iterative block as input and again mapping by the iterative block onto a new output, wherein the input corresponds to measured data ie image data measured by a camera, and wherein each of the output and the new output corresponds to a respective classification score], as well as a RGB patch surrounding the pixel of interest.
For space constraints, we do not show the label maps of the first instances, as they are zero maps. Adding network instances increases the context patch size seen by the architecture (both RGB pixels and previous predicted labels).”; see annotated figure 2 (with inserted figure 1)) Wherein Pinheiro teaches a label as the class of the object a pixel belongs to (Pinheiro, Section 1, “In the computer vision field, scene labeling is the task of fully labeling an image pixel-by-pixel with the class of the object each pixel belongs to.”) Pinheiro teaches wherein a portion of parameters, which characterize a behavior of the layers in the iterative block, is changed during switches between the iterations, for which the iterative block is implemented; (Pinheiro, Section 3.1, “More specifically, the parameters (W, b) of the network f(·) are learned in an end-to-end supervised way, by minimizing the negative log-likelihood over the training set: [equation image: media_image5.png] where li,j,k is the correct pixel label class at position (i, j) in image Ik. The minimization is achieved with the Stochastic Gradient Descent (SGD) algorithm with a fixed learning rate λ: [equation image: media_image6.png] [wherein a portion of parameters, which characterize a behavior of the layers in the iterative block, is changed during switches between the iterations, for which the iterative block is implemented]”) Hemmat teaches wherein as the portion of the parameters increases over the iterations, a classification accuracy of the ANN increases nonlinearly and an energy cost of the ANN increases linearly. Examiner interprets the claim in light of the specification of the published application (“[0066] Certain energy costs C also accrue when no parameters 15 c are changed. Starting with this basic amount, energy costs C increase linearly with the number of changed parameters 15 c′. Classification accuracy A, however, increases non-linearly.
It increases drastically already if only a few parameters 15 c′ are changed. This growth weakens with the increasing number of changed parameters 15 c′ and at some point reaches a state of saturation. It is therefore advantageous to exploit for a small price of additional energy costs C the initially large increase in classification accuracy A.”)

[image: media_image1.png]

(Hemmat, Figure 2, pg. 6 Col 1, “In this experiment we show the tradeoff between the classification accuracy and percentage of the included weights of our procedure during incremental inference per input. The percentage of included weights directly relates to the percentage of the DRAM accesses which is the dominant source of power consumption as we reported in Table I. The results are shown in Table II for different CNNs. The “base” row is when no clustering is applied so it has the maximum classification accuracy. The fraction of fetched weights is reported in column 2. Note, this fraction is constant per iteration [wherein as the portion of the parameters increases over the iterations, a classification accuracy of the ANN increases nonlinearly (see annotated table 2 wherein there is a big jump between iteration 1 and 2 of each dataset) and an energy cost of the ANN increases linearly ie constant with the fraction of weights] regardless of the received input as clustering is done once offline and the number of clusters and subsequently the fraction of weights in each cluster will not change in runtime. Power consumption (normalized to the base row) is reported in column 3. Here the normalized power is computed over all the testing input dataset. Note, the normalized power in one row (iteration) corresponds to (normalized) sum over all inputs for which the weights were fetched in that iteration. In other words, if the procedure terminated with fewer iterations for an input, then that input is not counted towards power consumption in future iterations.
Column 4 measures the classification accuracy using the testing dataset for each CNN. To compute the classification accuracy at a particular row (iteration), we only considered the inputs for which our procedure terminated up to that iteration. The classification accuracy for the rest of the inputs was set to 0 because our algorithm had not yet terminated for them. As we can see from the table, increase in number of iterations results in increase in the classification accuracy because the percentage of inputs which are classified by our algorithm increases. We also observe that, for example in LeNet300-100, with only 3 iterations, the CNN reaches almost the same accuracy as the base case. This is achieved with only a small fraction of weights (=0.16+0.05+0.01=0.22). (The number of DRAM accesses are decreased with the same fraction). The total normalized power after 3 iterations is also significant (=0.08+0.07+0.01=0.16) in LeNet300-100. In summary with 3 iterations, and using 22% of the weights we achieve almost the same classification accuracy as the base CNN while consuming only 16% of power. We observe similar trend for the other two CNNs which are shown in the table.”); Uhlenbrock and Pinheiro are both considered to be analogous to the claimed invention because they are in the same field of scene labeling utilizing convolutional neural networks. Pinheiro is further in the same field of recurrent neural architectures and Uhlenbrock is further in the same field of controlling autonomous vehicles with neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Uhlenbrock to incorporate the teachings of Pinheiro in order to provide a recurrent convolutional neural network to allow for a large input context while limiting the capacity of the model. (Pinheiro, Abstract, “The goal of the scene labeling task is to assign a class label to each pixel in an image. 
To ensure a good visual coherence and a high class accuracy, it is essential for a model to capture long range (pixel) label dependencies in images. In a feed-forward architecture, this can be achieved simply by considering a sufficiently large input context patch, around each pixel to be labeled. We propose an approach that consists of a recurrent convolutional neural network which allows us to consider a large input context while limiting the capacity of the model. Contrary to most standard approaches, our method does not rely on any segmentation technique nor any task-specific features. The system is trained in an end-to-end manner over raw pixels, and models complex spatial dependencies with low inference cost. As the context size increases with the built-in recurrence, the system identifies and corrects its own errors. Our approach yields state-of-the-art performance on both the Stanford Background Dataset and the SIFT Flow Dataset, while remaining very fast at test time.”) Hemmat is considered to be analogous to the claimed invention because it is in the same field of limiting the number of weights used in a CNN to improve power consumption, and of iterative frameworks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Uhlenbrock and Pinheiro to incorporate the teachings of Hemmat in order to achieve power saving through a novel framework which enables dynamic reconfiguration (Hemmat, Abstract, “In this work, we propose a novel framework which enables dynamic reconfiguration of an already-trained Convolutional Neural Network (CNN) in hardware during inference. The reconfiguration enables input-dependent approximation of the CNN to achieve power saving without much degradation in its classification accuracy at run-time.
For each input, our framework uses only a fraction of the CNN's edge weights based on that input (with the rest remaining 0) to conduct the inference. Consequently, power saving is possible due to fewer number of fetches from off-chip memory as well as fewer multiplications for majority of the inputs. To achieve per-input approximation, we use clustering algorithm which groups similar weights in the CNN based on their importance, and design an iterative framework which decides how many clusters of weights should be fetched from off-chip memory for each individual input. We also propose new hardware structures to implement our framework on top of a recently-proposed FPGA-based CNN accelerator. In our experiments with popular CNNs, we show significant power saving with almost no degradation in classification accuracy due to doing inference with only a fraction of the edge weights for the majority of the inputs.”)

Regarding claim 2, Uhlenbrock and Pinheiro and Hemmat teach The method as recited in claim 1. Hemmat teaches wherein, starting from at least one iteration of the iterative block, a proportion of between 1% and 20% of the parameters which characterize the behavior of the layers in the iterative block, are changed during the switch to a next iteration.

[image: media_image7.png]

(Hemmat, Figure 2, pg. 6 Col 1, “In this experiment we show the tradeoff between the classification accuracy and percentage of the included weights of our procedure during incremental inference per input. The percentage of included weights directly relates to the percentage of the DRAM accesses which is the dominant source of power consumption as we reported in Table I. The results are shown in Table II for different CNNs. The “base” row is when no clustering is applied so it has the maximum classification accuracy. The fraction of fetched weights is reported in column 2.
Note, this fraction is constant per iteration [a proportion [of between 1% and 20%] of the parameters which characterize the behavior of the layers in the iterative block, are changed during the switch to a next iteration] regardless of the received input as clustering is done once offline and the number of clusters and subsequently the fraction of weights in each cluster will not change in runtime. Power consumption (normalized to the base row) is reported in column 3. Here the normalized power is computed over all the testing input dataset. Note, the normalized power in one row (iteration) corresponds to (normalized) sum over all inputs for which the weights were fetched in that iteration. In other words, if the procedure terminated with fewer iterations for an input, then that input is not counted towards power consumption in future iterations. Column 4 measures the classification accuracy using the testing dataset for each CNN. To compute the classification accuracy at a particular row (iteration), we only considered the inputs for which our procedure terminated up to that iteration. The classification accuracy for the rest of the inputs was set to 0 because our algorithm had not yet terminated for them. As we can see from the table, increase in number of iterations results in increase in the classification accuracy because the percentage of inputs which are classified by our algorithm increases. We also observe that, for example in LeNet300-100, with only 3 iterations, the CNN reaches almost the same accuracy as the base case. This is achieved with only a small fraction of weights (=0.16+0.05+0.01=0.22). (The number of DRAM accesses are decreased with the same fraction). The total normalized power after 3 iterations is also significant (=0.08+0.07+0.01=0.16) [of between 1% and 20% wherein each iteration only used less than 9% of the parameters] in LeNet300-100. 
In summary with 3 iterations, and using 22% of the weights we achieve almost the same classification accuracy as the base CNN while consuming only 16% of power. We observe similar trend for the other two CNNs which are shown in the table.”); Hemmat discloses that minimizing the percentage of weights used in each iteration trends toward lower power consumption. It would, therefore, have been obvious to optimize the proportion of changed parameters to any appropriate value, including one between 1% and 20%, in order to achieve a particular desired power consumption.

Regarding claim 3, Uhlenbrock and Pinheiro and Hemmat teach The method according to claim 2. Hemmat teaches wherein the proportion is between 1% and 15%. (Hemmat, Figure 2, pg. 6 Col 1, “In this experiment we show the tradeoff between the classification accuracy and percentage of the included weights of our procedure during incremental inference per input. The percentage of included weights directly relates to the percentage of the DRAM accesses which is the dominant source of power consumption as we reported in Table I. The results are shown in Table II for different CNNs. The “base” row is when no clustering is applied so it has the maximum classification accuracy. The fraction of fetched weights is reported in column 2. Note, this fraction is constant per iteration regardless of the received input as clustering is done once offline and the number of clusters and subsequently the fraction of weights in each cluster will not change in runtime. Power consumption (normalized to the base row) is reported in column 3. Here the normalized power is computed over all the testing input dataset. Note, the normalized power in one row (iteration) corresponds to (normalized) sum over all inputs for which the weights were fetched in that iteration. In other words, if the procedure terminated with fewer iterations for an input, then that input is not counted towards power consumption in future iterations.
Column 4 measures the classification accuracy using the testing dataset for each CNN. To compute the classification accuracy at a particular row (iteration), we only considered the inputs for which our procedure terminated up to that iteration. The classification accuracy for the rest of the inputs was set to 0 because our algorithm had not yet terminated for them. As we can see from the table, increase in number of iterations results in increase in the classification accuracy because the percentage of inputs which are classified by our algorithm increases. We also observe that, for example in LeNet300-100, with only 3 iterations, the CNN reaches almost the same accuracy as the base case. This is achieved with only a small fraction of weights (=0.16+0.05+0.01=0.22). (The number of DRAM accesses are decreased with the same fraction). The total normalized power after 3 iterations is also significant (=0.08+0.07+0.01=0.16) [wherein the proportion is between 1% and 15%, wherein each iteration only used less than 9% of the parameters] in LeNet300-100. In summary with 3 iterations, and using 22% of the weights we achieve almost the same classification accuracy as the base CNN while consuming only 16% of power. We observe similar trend for the other two CNNs which are shown in the table.”); Hemmat discloses that minimizing the percentage of weights used in each iteration trends toward lower power consumption. It would, therefore, have been obvious to optimize the proportion of changed parameters to any appropriate value, including one between 1% and 15%, in order to achieve a particular desired power consumption. 
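The arithmetic in the passage above (weight fractions 0.16 + 0.05 + 0.01 = 0.22 and normalized power 0.08 + 0.07 + 0.01 = 0.16 for LeNet300-100) can be checked with a short sketch. This is only an illustrative reconstruction of the cumulative sums Hemmat reports in Table II; the helper function and variable names are not from any cited reference.

```python
# Illustrative check (not from the cited references) of the cumulative
# per-iteration figures Hemmat reports for LeNet300-100 in Table II.
def cumulative(per_iteration):
    """Running totals over iterations, rounded to suppress float noise."""
    totals, running = [], 0.0
    for value in per_iteration:
        running += value
        totals.append(round(running, 2))
    return totals

weight_fractions = [0.16, 0.05, 0.01]   # fraction of weights fetched per iteration
normalized_power = [0.08, 0.07, 0.01]   # normalized power per iteration

print(cumulative(weight_fractions))  # [0.16, 0.21, 0.22] -> "using 22% of the weights"
print(cumulative(normalized_power))  # [0.08, 0.15, 0.16] -> "only 16% of power"
```

The running totals match the quoted summary: after three iterations the procedure has used 22% of the weights while consuming 16% of the base power.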
Regarding claim 4, Uhlenbrock, Pinheiro, and Hemmat teach the method as recited in claim 1. Hemmat teaches wherein a first portion of the parameters is changed during a first switch between iterations and a second portion of the parameters is changed during a second switch between iterations, the second portion not being congruent with the first portion (Hemmat, pg. 3 Col. 1, “The above procedure approximates the CNN per input with as few non-zero (original) weights as possible. Each iteration of our procedure [switch between iterations] can be viewed as applying a level of approximation for the considered input, with the first iteration having the most approximation [wherein a first portion of the parameters is changed] and the final one [a second portion of the parameters] having the least [the second portion not being congruent with the first portion]. A separate inference is done per iteration so the procedure may be viewed as performing incremental inference with more [portion of parameters is changed] number of (non-zero) weights [parameters, which characterize a behavior of the layers in the iterative block] per iterations [between the iterations].”); Regarding claim 5, Uhlenbrock, Pinheiro, and Hemmat teach the method as recited in claim 1. Hemmat teaches wherein the parameters are stored in a memory, in which each write operation physically acts upon memory locations of multiple parameters and, starting from at least one iteration, all parameters, whose memory locations are acted upon by at least one write operation, are changed during the switch to the next iteration. (Hemmat, Figure 3, pg. 4 Col 1, “As shown in Figure 3, in the base CNN accelerator [13], the weights and input data are stored in off-chip memory and are transferred to on-chip buffers through a DDR3 interface. In addition, each layer of the network is implemented separately and has its own input/output/weight buffers and computational units, as shown in Figure 4. 
During inference [starting from at least one iteration, all parameters, whose memory locations are acted upon by at least one write operation, are changed during the switch to the next iteration], the weights and input data are read from the main memory and are stored in the on-chip buffers [the parameters are stored in a memory] according to the instructions provided by a Read Weight Controller unit [in which each write operation physically acts upon memory locations of multiple parameters and].”); Regarding claim 7, Uhlenbrock, Pinheiro, and Hemmat teach the method as recited in claim 1. Pinheiro teaches wherein the ANN is selected, which processes inputs initially including multiple convolution layers and ascertains from a result obtained with at least one further layer as output at least one classification score relating to a predefined classification, and the iterative block is established in such a way that the iterative block includes at least a portion of the convolution layers. [Image: Pinheiro, Figure 1] (Pinheiro, Figure 1, “Figure 1. A simple convolutional network. Given an image patch providing a context around a pixel to classify (here blue), a series of convolutions and pooling operations (filters slid through input planes) are applied (here, five 4 × 4 convolutions, followed by one 2 × 2 pooling, followed by two 2 × 2 convolutions. Each 1 × 1 output plane is interpreted as a score for a given class.”) Regarding claim 8, Uhlenbrock, Pinheiro, and Hemmat teach the method as recited in claim 7. Uhlenbrock teaches wherein image data and/or time series data are selected as the inputs of the ANN. 
(Uhlenbrock, “[0054] For further understanding, upon receiving a whole image 300 (from a sensor (e.g., camera), database, video stream, etc.), the system performs convolution 302 by convolving the image 300 with various filters.”) Regarding claim 10, Uhlenbrock teaches a method for training an artificial neural network (ANN),… and providing an actuator of a vehicle configured to receive the outputs of the ANN, wherein the actuator is configured to control a physical action of the vehicle based on the outputs of the ANN. (Uhlenbrock, “[0009] This disclosure provides a system for scene classification. In various embodiments, the system includes one or more processors and a memory. The memory is a non-transitory computer-readable medium having executable instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform several operations, including operating at least two parallel, independent processing pipelines on an image or video to generate independent results; fusing the independent results of the at least two parallel, independent processing pipelines to generate a fused scene class; and electronically controlling machine behavior based on the fused scene class of the image or video [providing an actuator of a vehicle configured to receive the outputs of the ANN, wherein the actuator is configured to control a physical action of the vehicle based on the outputs of the ANN].”) However, Uhlenbrock does not explicitly teach comprising the following steps: providing learning inputs and associated learning outputs onto which the ANN is to map in each case the learning inputs; mapping the learning inputs by the ANN onto outputs; assessing a deviation of the outputs from the learning outputs, using a predefined loss function, wherein the learning inputs correspond to measured data, and wherein the outputs correspond to a respective classification score; and optimizing parameters which characterize a behavior of layers in an 
iterative block of the ANN, including changes of the parameters during switches between iterations of the iterative block, to the extent that during further processing of learning inputs by the ANN, the assessment is improved using the loss function; wherein: the changes of the parameters during switches between iterations of the iterative block includes changing a portion of the parameters during the switches, and as the portion of the parameters increases over the iterations, a classification accuracy of the ANN increases nonlinearly and an energy cost of the ANN increases linearly. Pinheiro teaches comprising the following steps: providing learning inputs and associated learning outputs onto which the ANN is to map in each case the learning inputs; mapping the learning inputs by the ANN onto outputs; [Image: Pinheiro, Figure 2] (Pinheiro, Section 3.2, “Figure 2. System considering one (f), two (f ◦ f) and three (f ◦ f ◦ f) instances of the network. In all three cases, the architecture produces labels (1 × 1 output planes) corresponding to the pixel at the center of the input patch [mapping, by the iterative block, an input of the iterative block onto an output]. Each network instance is fed with the previous label predictions, as well as a RGB patch surrounding the pixel of interest. For space constraints, we do not show the label maps of the first instances, as they are zero maps. 
Adding network instances increases the context patch size seen by the architecture (both RGB pixels and previous predicted labels).”; see annotated figure 2 (with inserted figure 1)) Pinheiro teaches a label as the class of the object a pixel belongs to (Pinheiro, Section 1, “In the computer vision field, scene labeling is the task of fully labeling an image pixel-by-pixel with the class of the object each pixel belongs to [mapping the learning inputs by the ANN onto outputs].”) Pinheiro teaches assessing a deviation of the outputs from the learning outputs, (Pinheiro, Section 3.2, “As shown in Figure 2, the size of the input patch Ii,j,k needed to label one pixel increases with the number of compositions of f. However, the capacity of the system remains constant, since the parameters of each network instance are shared. The system is trained by maximizing the likelihood [likelihood equation image] where L(f) is a shorthand for the likelihood introduced in (5) in the case of the plain CNN, and ◦ p denotes the composition operation performed p times. This way, we ensure that each network instance is trained to output the correct label at location (i, j). In that respect, the system is able to learn to correct its own mistakes (made by earlier instances) [assessing a deviation of the outputs from the learning outputs]. 
It can also learn label dependencies, as an instance receives as input the label predictions made by the previous instance around location (i, j) (see Figure 2)”) Pinheiro teaches using a predefined loss function, wherein the learning inputs correspond to measured data, and wherein the outputs correspond to a respective classification score; and optimizing parameters which characterize a behavior of layers in an iterative block of the ANN, including changes of the parameters during switches between iterations of the iterative block, to the extent that during further processing of learning inputs by the ANN, the assessment is improved using the loss function; (Pinheiro, Section 3.1, “The network is trained by transforming the scores fc(Ii,j,k; (W, b)) (for each class of interest c ∈ {1, ..., N}) into conditional probabilities, by applying a softmax function: [softmax equation image] and maximizing the likelihood of the training data. More specifically, the parameters (W, b) of the network f(·) are learned in an end-to-end supervised way, by minimizing the negative log-likelihood over the training set: [negative log-likelihood equation image] [using a predefined loss function, wherein the learning inputs correspond to measured data, and wherein the outputs correspond to a respective classification score] where li,j,k is the correct pixel label class at position (i, j) in image Ik. 
The minimization is achieved with the Stochastic Gradient Descent (SGD) algorithm with a fixed learning rate λ: [optimizing parameters which characterize a behavior of layers in an iterative block of the ANN, including changes of the parameters during switches between iterations of the iterative block] [SGD update equation image] ”) Hemmat teaches wherein: the changes of the parameters during switches between iterations of the iterative block includes changing a portion of the parameters during the switches, and as the portion of the parameters increases over the iterations, a classification accuracy of the ANN increases nonlinearly and an energy cost of the ANN increases linearly. Examiner interprets the claim in light of the specification of the published application (“[0066] Certain energy costs C also accrue when no parameters 15 c are changed. Starting with this basic amount, energy costs C increase linearly with the number of changed parameters 15 c′. Classification accuracy A, however, increases non-linearly. It increases drastically already if only a few parameters 15 c′ are changed. This growth weakens with the increasing number of changed parameters 15 c′ and at some point reaches a state of saturation. It is therefore advantageous to exploit for a small price of additional energy costs C the initially large increase in classification accuracy A.”) [Image: Hemmat Figure 2] (Hemmat, Figure 2, pg. 6 Col 1, “In this experiment we show the tradeoff between the classification accuracy and percentage of the included weights of our procedure during incremental inference per input. The percentage of included weights directly relates to the percentage of the DRAM accesses which is the dominant source of power consumption as we reported in Table I. The results are shown in Table II for different CNNs. The “base” row is when no clustering is applied so it has the maximum classification accuracy. 
The fraction of fetched weights [the changes of the parameters during switches between iterations of the iterative block includes changing a portion of the parameters during the switches] is reported in column 2. Note, this fraction is constant per iteration [wherein as the portion of the parameters increases over the iterations, a classification accuracy of the ANN increases nonlinearly (see annotated table 2 wherein there is a big jump between iteration 1 and 2 of each dataset) and an energy cost of the ANN increases linearly, i.e., constant with the fraction of weights] regardless of the received input as clustering is done once offline and the number of clusters and subsequently the fraction of weights in each cluster will not change in runtime. Power consumption (normalized to the base row) is reported in column 3. Here the normalized power is computed over all the testing input dataset. Note, the normalized power in one row (iteration) corresponds to (normalized) sum over all inputs for which the weights were fetched in that iteration. In other words, if the procedure terminated with fewer iterations for an input, then that input is not counted towards power consumption in future iterations. Column 4 measures the classification accuracy using the testing dataset for each CNN. To compute the classification accuracy at a particular row (iteration), we only considered the inputs for which our procedure terminated up to that iteration. The classification accuracy for the rest of the inputs was set to 0 because our algorithm had not yet terminated for them. As we can see from the table, increase in number of iterations results in increase in the classification accuracy because the percentage of inputs which are classified by our algorithm increases. We also observe that, for example in LeNet300-100, with only 3 iterations, the CNN reaches almost the same accuracy as the base case. This is achieved with only a small fraction of weights (=0.16+0.05+0.01=0.22). 
(The number of DRAM accesses are decreased with the same fraction). The total normalized power after 3 iterations is also significant (=0.08+0.07+0.01=0.16) in LeNet300-100. In summary with 3 iterations, and using 22% of the weights we achieve almost the same classification accuracy as the base CNN while consuming only 16% of power. We observe similar trend for the other two CNNs which are shown in the table.”); Uhlenbrock and Pinheiro are both considered to be analogous to the claimed invention because they are in the same field of scene labeling utilizing convolutional neural networks. Pinheiro is further in the same field of recurrent neural architectures and Uhlenbrock is further in the same field of controlling autonomous vehicles with neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Uhlenbrock to incorporate the teachings of Pinheiro in order to provide a recurrent convolutional neural network to allow for a large input context while limiting the capacity of the model. (Pinheiro, Abstract, “The goal of the scene labeling task is to assign a class label to each pixel in an image. To ensure a good visual coherence and a high class accuracy, it is essential for a model to capture long range (pixel) label dependencies in images. In a feed-forward architecture, this can be achieved simply by considering a sufficiently large input context patch, around each pixel to be labeled. We propose an approach that consists of a recurrent convolutional neural network which allows us to consider a large input context while limiting the capacity of the model. Contrary to most standard approaches, our method does not rely on any segmentation technique nor any task-specific features. The system is trained in an end-to-end manner over raw pixels, and models complex spatial dependencies with low inference cost. 
As the context size increases with the built-in recurrence, the system identifies and corrects its own errors. Our approach yields state-of-the-art performance on both the Stanford Background Dataset and the SIFT Flow Dataset, while remaining very fast at test time.”) Hemmat is considered to be analogous to the claimed invention because it is in the same field of limiting the number of weights used in a CNN to improve power consumption, and of iterative frameworks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Uhlenbrock and Pinheiro to incorporate the teachings of Hemmat in order to achieve power saving through a novel framework which enables dynamic reconfiguration (Hemmat, Abstract, “In this work, we propose a novel framework which enables dynamic reconfiguration of an already-trained Convolutional Neural Network (CNN) in hardware during inference. The reconfiguration enables input-dependent approximation of the CNN to achieve power saving without much degradation in its classification accuracy at run-time. For each input, our framework uses only a fraction of the CNN's edge weights based on that input (with the rest remaining 0) to conduct the inference. Consequently, power saving is possible due to fewer number of fetches from off-chip memory as well as fewer multiplications for majority of the inputs. To achieve per-input approximation, we use clustering algorithm which groups similar weights in the CNN based on their importance, and design an iterative framework which decides how many clusters of weights should be fetched from off-chip memory for each individual input. We also propose new hardware structures to implement our framework on top of a recently-proposed FPGA-based CNN accelerator. 
In our experiments with popular CNNs, we show significant power saving with almost no degradation in classification accuracy due to doing inference with only a fraction of the edge weights for the majority of the inputs.”) Regarding claim 12, Uhlenbrock, Pinheiro, and Hemmat teach the method as recited in claim 10. Uhlenbrock teaches wherein simultaneously to and/or in alternation with the parameters, which characterize the behavior of the layers in the iterative block, further parameters, which characterize behavior of further neurons and/or of other processing units of the ANN outside the iterative block, (Uhlenbrock, “[0061] In class-level fusion and as shown in FIG. 6, information is combined at the class probability level such that the scene class 600 from the entity pipeline is fused with the scene class 602 from the whole image pipeline. More specifically, two classifiers 604 and 604′ are trained separately [wherein simultaneously to and/or in alternation with the parameters, which characterize the behavior of the layers in the iterative block, further parameters, which characterize behavior of further neurons and/or of other processing units of the ANN outside the iterative block; i.e., parameters of the other classifier] for the entities and whole image features. Each classifier 604 and 604′ produces a class probability distribution over the scene types (i.e., scene classes 600 and 602). These distributions are combined 606 (e.g., multiplied and renormalized) to produce the final classifier result 608 (fused scene class).”) However, Uhlenbrock does not explicitly teach a loss function. Pinheiro teaches the further parameters are also optimized using the loss function for a likely better assessment. 
(Pinheiro, Section 3.1, “The network is trained by transforming the scores fc(Ii,j,k; (W, b)) (for each class of interest c ∈ {1, ..., N}) into conditional probabilities, by applying a softmax function: [softmax equation image] and maximizing the likelihood of the training data. More specifically, the parameters (W, b) of the network f(·) are learned in an end-to-end supervised way, by minimizing the negative log-likelihood over the training set: [negative log-likelihood equation image] [are also optimized using the loss function for a likely better assessment] where li,j,k is the correct pixel label class at position (i, j) in image Ik. The minimization is achieved with the Stochastic Gradient Descent (SGD) algorithm with a fixed learning rate λ: [SGD update equation image] ”) Uhlenbrock and Pinheiro are both considered to be analogous to the claimed invention because they are in the same field of scene labeling utilizing convolutional neural networks. Pinheiro is further in the same field of recurrent neural architectures and Uhlenbrock is further in the same field of controlling autonomous vehicles with neural networks. 
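The training recipe quoted from Pinheiro Section 3.1, softmax over per-class scores, negative log-likelihood of the correct label, and SGD with a fixed learning rate, can be sketched in miniature. The toy scores, label, and learning rate below are assumptions for illustration; this is a generic rendering of those three steps, not code from Pinheiro.

```python
import math

# Generic sketch of the three training steps Pinheiro describes:
# softmax -> negative log-likelihood -> one fixed-rate SGD step.
def softmax(scores):
    m = max(scores)                               # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nll(scores, label):
    """Negative log-likelihood of the correct class."""
    return -math.log(softmax(scores)[label])

def sgd_step(scores, label, lr=0.5):
    # Gradient of the NLL w.r.t. each score is p_c - 1{c == label};
    # updating the scores directly stands in for updating (W, b).
    p = softmax(scores)
    grad = [p[c] - (1.0 if c == label else 0.0) for c in range(len(scores))]
    return [s - lr * g for s, g in zip(scores, grad)]

scores, label = [1.0, 0.2, -0.5], 0  # assumed toy class scores and correct label
loss_before = nll(scores, label)
loss_after = nll(sgd_step(scores, label), label)
# one SGD step lowers the loss for the correct label
```

Each step pushes the correct class's probability up and the others down, which is the sense in which the quoted passage says each network instance "is trained to output the correct label."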
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Uhlenbrock to incorporate the teachings of Pinheiro in order to provide a generic loss function to train the network, as doing so would serve to maximize the likelihood of the training data (Pinheiro, Section 3.1, “maximizing the likelihood of the training data”). Regarding claim 13, Uhlenbrock teaches a control unit for a vehicle, comprising: an input interface, which is connectable to one or to multiple sensors of the vehicle; (Uhlenbrock, “[0054] For further understanding, upon receiving a whole image 300 (from a sensor (e.g., camera), database, video stream, etc.), the system performs convolution 302 by convolving the image 300 with various filters.”) Uhlenbrock teaches an output interface, which is connectable to one or to multiple actuators of the vehicle; an artificial neural network (ANN) configured to be involved in processing of measured data obtained via the input interface from the one or more sensors to form an activation signal for the output interface (Uhlenbrock, “[0043] The computer system 100 is configured to utilize one or more data storage units. The computer system 100 may include a volatile memory unit 106 (e.g., random access memory (“RAM”), static RAM, dynamic RAM, etc.) coupled with the address/data bus 102, wherein a volatile memory unit 106 is configured to store information and instructions for the processor 104. The computer system 100 further may include a non-volatile memory unit 108 (e.g., read-only memory (“ROM”), programmable ROM (“PROM”), erasable programmable ROM (“EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory, etc.) coupled with the address/data bus 102, wherein the non-volatile memory unit 108 is configured to store static information and instructions for the processor 104. 
Alternatively, the computer system 100 may execute instructions retrieved from an online data storage unit such as in “Cloud” computing. In an aspect, the computer system 100 also may include one or more interfaces, such as an interface 110, coupled with the address/data bus 102. The one or more interfaces are configured to enable the computer system 100 to interface with other electronic devices and computer systems. The communication interfaces implemented by the one or more interfaces may include wireline (e.g., serial cables, modems, network adaptors, etc.) and/or wireless (e.g., wireless modems, wireless network adaptors, etc.) communication technology.”) (Uhlenbrock, “[0050] As can be appreciated by those skilled in the art, there are many practical applications for the system as described herein. For example, the system can be employed in autonomous driving wherein entities (such as pedestrians, cyclists, vehicles, etc.) are being detected and the scene classification (e.g., freeway vs. parking lot vs. dirt road) could provide context for various control modes. For example, if the system determines an automobile is on a dirt road, it may automatically adjust the suspension of the vehicle, such as releasing air from the tires or loosening the shocks or suspension. Another application is for scene recognition for UAV surveillance and autonomous vehicles. For example, if a UAV is tasked with tracking a vehicle, it may limit its computing power to tracking only when a scene is determined to be one in which vehicles may be tracked, such as on roadways, whereas when flying over a forest the vehicle tracking operations may be disabled.”) The rest of the steps of claim 13 are rejected under the same rationale as the analogous steps in claim 1, as claim 13 is substantially similar. Uhlenbrock and Pinheiro are both considered to be analogous to the claimed invention because they are in the same field of scene labeling utilizing convolutional neural networks. 
Pinheiro is further in the same field of recurrent neural architectures and Uhlenbrock is further in the same field of controlling autonomous vehicles with neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Uhlenbrock to incorporate the teachings of Pinheiro in order to provide a recurrent convolutional neural network to allow for a large input context while limiting the capacity of the model. (Pinheiro, Abstract, “The goal of the scene labeling task is to assign a class label to each pixel in an image. To ensure a good visual coherence and a high class accuracy, it is essential for a model to capture long range (pixel) label dependencies in images. In a feed-forward architecture, this can be achieved simply by considering a sufficiently large input context patch, around each pixel to be labeled. We propose an approach that consists of a recurrent convolutional neural network which allows us to consider a large input context while limiting the capacity of the model. Contrary to most standard approaches, our method does not rely on any segmentation technique nor any task-specific features. The system is trained in an end-to-end manner over raw pixels, and models complex spatial dependencies with low inference cost. As the context size increases with the built-in recurrence, the system identifies and corrects its own errors. Our approach yields state-of-the-art performance on both the Stanford Background Dataset and the SIFT Flow Dataset, while remaining very fast at test time.”) Hemmat is considered to be analogous to the claimed invention because it is in the same field of limiting the number of weights used in a CNN to improve power consumption, and of iterative frameworks. 
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Uhlenbrock and Pinheiro to incorporate the teachings of Hemmat in order to achieve power saving through a novel framework which enables dynamic reconfiguration (Hemmat, Abstract, “In this work, we propose a novel framework which enables dynamic reconfiguration of an already-trained Convolutional Neural Network (CNN) in hardware during inference. The reconfiguration enables input-dependent approximation of the CNN to achieve power saving without much degradation in its classification accuracy at run-time. For each input, our framework uses only a fraction of the CNN's edge weights based on that input (with the rest remaining 0) to conduct the inference. Consequently, power saving is possible due to fewer number of fetches from off-chip memory as well as fewer multiplications for majority of the inputs. To achieve per-input approximation, we use clustering algorithm which groups similar weights in the CNN based on their importance, and design an iterative framework which decides how many clusters of weights should be fetched from off-chip memory for each individual input. We also propose new hardware structures to implement our framework on top of a recently-proposed FPGA-based CNN accelerator. In our experiments with popular CNNs, we show significant power saving with almost no degradation in classification accuracy due to doing inference with only a fraction of the edge weights for the majority of the inputs.”) Regarding claim 20 and analogous claims 21-23, Uhlenbrock, Pinheiro, and Hemmat teach the method as recited in claim 1. Uhlenbrock teaches wherein a first convolution layer to which the input of the ANN is fed is not a part of the iterative block. 
(Uhlenbrock, “[0054] For further understanding, upon receiving a whole image 300 (from a sensor (e.g., camera), database, video stream, etc.), the system performs convolution 302 [a first convolution layer to which the input of the ANN is fed] by convolving the image 300 with various filters. Each filter convolution generates a 2D “image” of network activations. Pooling 304 is then performed on those images of activations, resulting in smaller images. A rectification function is applied at each element of the pooled images. The rectified activations can then be in the input to the next convolution layer and so on. After some number of these convolution-pooling-rectification stages, the activations are input to a fully-connected layer 308, a type of neural network layer where each output is a function of all the inputs. It can be thought of also as a convolution layer with filters the same size as the input. One or two fully-connected layers 308 are typically used at the end of the convolutional neural network. The output of the final fully-connected layer 308 is the extracted feature 310… [Image: Uhlenbrock FIG. 6] [0061] In class-level fusion and as shown in FIG. 6, information is combined at the class probability level such that the scene class 600 from the entity pipeline is fused with the scene class 602 from the whole image pipeline. More specifically, two classifiers 604 and 604′ are trained separately for the entities and whole image features [not a part of the iterative block]. Each classifier 604 and 604′ produces a class probability distribution over the scene types (i.e., scene classes 600 and 602). These distributions are combined 606 (e.g., multiplied and renormalized) to produce the final classifier result 608 (fused scene class).”) Claim(s) 6 and 9 are rejected under 35 U.S.C. 
103 as being unpatentable over Uhlenbrock and Pinheiro in view of Hemmat, and further in view of Yao et al., "Fully hardware-implemented memristor convolutional neural network" ("Yao"). Regarding claim 6, Uhlenbrock, Pinheiro, and Hemmat teach the method as recited in claim 1. Yao teaches wherein the parameters are coded in electrical resistance values of memristors or of other memory elements, whose electrical resistance values are changeable in a non-volatile manner using a programming voltage or a programming current. (Yao, Fig. 2, pg. 643 col. 1-2, “Realizing memristor-based [memristors or of other memory elements, whose electrical resistance values are changeable in a non-volatile manner using a programming voltage or a programming current; wherein memristors are non-volatile] convolutional operations requires performing sliding operations with various kernels. Memristor arrays are highly efficient in achieving parallel MACs under shared inputs for different kernels22. Figure 2b shows a typical convolution example at a particular slipping step, and Fig. 2c reveals the associated events in the 1T1R memristor array. The input value is encoded by the pulse number according to its quantized bit number (Extended Data Fig. 2). A signed kernel weight is mapped to the differential conductance of a pair of memristors. In this manner, all the weights of a kernel are mapped to two conductance rows: one row for positive weights with positive pulse inputs and the other for negative weights with equivalent negative pulse inputs [wherein the parameters are coded in electrical resistance values]. After inputting the encoded pulses into the bit lines, the output currents through the two differential source lines are sensed and accumulated. The differential current is the weighted sum corresponding to the input patch and the chosen kernel. 
Different kernels with different weights are mapped to different pairs of differential rows, and the entire memristor array operates MACs in parallel under the same inputs. All the desired weighted-sum results are obtained concurrently.”); Yao is considered to be analogous to the claimed invention because both are in the same field of image classification using neural networks, specifically convolutional neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Uhlenbrock, Pinheiro, and Hemmat to incorporate the teachings of Yao. One of ordinary skill in the art would have been motivated to make this modification in order to reduce latency and substantially improve CNN efficiency. (Yao, pg. 642 Col 1, “In this study, a complete five-layer mCNN for MNIST digit image recognition was successfully demonstrated. The optimized material stacks enabled reliable and uniform analogue switching behaviours in 2,048 one-transistor–one-memristor (1T1R) arrays. With the proposed hybrid-training scheme, the experimental recognition accuracy reached 96.19% for the entire test dataset. Furthermore, replication of the convolutional kernels to three parallel memristor convolvers was implemented to reduce the mCNN latency roughly by a factor of 3. Our highly integrated neuromorphic system provides a feasible solution to substantially improve the CNN efficiency by closing the throughput gap between memristor-based convolutional computation and fully connected VMM.”) Regarding claim 9, Uhlenbrock, Pinheiro, and Hemmat teach the method as recited in claim 1. Hemmat teaches wherein the mapping of the input of the iterative block onto the output includes adding up [using analog electronics], inputs in a weighted manner, which are fed to neurons and/or to other processing units in the iterative block.
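Read together, the Hemmat and Yao excerpts relied on above describe the same arithmetic: each neuron accumulates weight-times-input products and passes the sum through an activation function, and Yao realizes the signed weights in hardware as the conductance difference of a pair of memristors. A minimal sketch of that arithmetic follows; all names are illustrative and are not drawn from any cited reference or from the claims.

```python
def weighted_sum(inputs, weights, bias=0.0):
    """Multiply each input by its weight and accumulate the products,
    as in Hemmat's multiply-and-accumulate (MAC) description."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def relu(v):
    """One possible linear/non-linear activation function."""
    return max(0.0, v)

def neuron(inputs, weights, bias=0.0):
    """Map weighted, accumulated inputs through the activation to
    produce one output neuron."""
    return relu(weighted_sum(inputs, weights, bias))

def differential_rows(weights):
    """Yao-style signed-weight encoding: split each signed weight into
    a pair of non-negative 'conductances' (positive row, negative row)."""
    pos = [max(w, 0.0) for w in weights]
    neg = [max(-w, 0.0) for w in weights]
    return pos, neg

x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
pos, neg = differential_rows(w)
# The differential of the two row currents equals the ordinary signed
# weighted sum, which is the property the Yao excerpt relies on.
assert abs((weighted_sum(x, pos) - weighted_sum(x, neg)) - weighted_sum(x, w)) < 1e-9
```

The closing assertion checks the equivalence Yao's differential-pair encoding depends on: subtracting the negative-row accumulation from the positive-row accumulation reproduces the ordinary signed weighted sum.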
Hemmat teaches wherein the mapping of the input of the iterative block onto the output includes adding up [using analog electronics], inputs in a weighted manner, which are fed to neurons and/or to other processing units in the iterative block. (Hemmat, pg. 4 Col 1, “The main functions of this phase are multiplication and accumulation operations in each layer of the network in which each weight is multiplied by its corresponding input neuron [inputs in a weighted manner]. Next, the results are accumulated using adder trees and are passed through a linear/non-linear activation function [which are fed to neurons and/or to other processing units in the iterative block] to generate output neurons [the mapping of the input of the iterative block onto the output]”) Uhlenbrock and Pinheiro and Hemmat do not explicitly teach [wherein the mapping of the input of the iterative block onto the output includes adding up] using analog electronics, [inputs in a weighted manner, which are fed to neurons and/or to other processing units in the iterative block.] Yao teaches [wherein the mapping of the input of the iterative block onto the output includes adding up] using analog electronics, [inputs in a weighted manner, which are fed to neurons and/or to other processing units in the iterative block.] (Yao, Fig. 1, pg. 642 Col 1, “Here we propose a versatile memristor-based computing architecture for neural networks, shown in Fig. 1a. The memristor cell uses a material stack of TiN/TaOx/HfOx/TiN, and shows continuous conductance-tuning capability (see Supplementary Information) in both potentiation (SET) and depression (RESET) by modulating the electric field and heat29. The materials and fabrication process (see Methods for details) are compatible with the conventional CMOS (complementary metal–oxide semiconductor) process, so that the memristor arrays can be conveniently built in the back end of line in a silicon fab to reduce process variations and achieve high reproducibility. 
The fabricated crossbar arrays exhibit uniform analogue switching behaviours [using analog electronics] under identical programming conditions.”) Yao is considered to be analogous to the claimed invention because both are in the same field of image classification using neural networks, specifically convolutional neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Uhlenbrock, Pinheiro, and Hemmat to incorporate the teachings of Yao. One of ordinary skill in the art would have been motivated to make this modification in order to reduce latency and substantially improve CNN efficiency. (Yao, pg. 642 Col 1, “In this study, a complete five-layer mCNN for MNIST digit image recognition was successfully demonstrated. The optimized material stacks enabled reliable and uniform analogue switching behaviours in 2,048 one-transistor–one-memristor (1T1R) arrays. With the proposed hybrid-training scheme, the experimental recognition accuracy reached 96.19% for the entire test dataset. Furthermore, replication of the convolutional kernels to three parallel memristor convolvers was implemented to reduce the mCNN latency roughly by a factor of 3. Our highly integrated neuromorphic system provides a feasible solution to substantially improve the CNN efficiency by closing the throughput gap between memristor-based convolutional computation and fully connected VMM.”) Claim(s) 11 is rejected under 35 U.S.C. 103 as being unpatentable over Uhlenbrock in view of Pinheiro and Hemmat, and further in view of Wang et al., "Recurrent U-Net for Resource-Constrained Segmentation," [2019] ("Wang").
Regarding claim 11, Uhlenbrock, Pinheiro, and Hemmat teach the method as recited in claim 10. Wang teaches wherein the loss function contains a contribution, which is a function of the number of the parameters changed during the switch between iterations, of a rate of change of the changed parameters and/or of an absolute or relative change across all parameters (Wang, Fig. 1, pg. 1 Col 2, “Figure 1: Speed vs accuracy. Each circle represents the performance of a model in terms frames-per-second and mIoU accuracy on our Keyboard Hand Dataset using a Titan X (Pascal) GPU. The radius of each circle denotes the models’ number of parameters [contribution, which is a function of the number of the parameters]. For our recurrent approach, we plot these numbers after 1, 2, and 3 iterations, and we show the corresponding segmentations in the bottom row [changed during the switch between iterations]. The performance of our approach is plotted in red [a rate of change of the changed parameters and/or of an absolute or relative change across all parameters; wherein the plotting of the bubble size, i.e., the number of parameters, over the iterations is the rate of relative change across all parameters] and the other acronyms are defined in Section 4.2. ICNet [45] is slightly faster than us but at the cost of a significant accuracy drop, whereas RefineNet [17] and DeepLab [6] are both slower and less accurate on this dataset, presumably because there are not enough training samples to learn their many parameters.”); Wang is considered to be analogous to the claimed invention because both are in the same field of recurrent neural architectures and analyzing the number of parameters in terms of performance speed.
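For context on the claim 11 limitation, a loss contribution of the kind recited, one that depends on the number of parameters changed between iterations and on the absolute change across all parameters, could be sketched as follows. This is an illustrative reading of the claim language only; the function names and penalty weights are hypothetical and are not drawn from Wang or from the application.

```python
def change_penalty(prev_params, curr_params, tol=1e-8):
    """Count how many parameters changed between two iterations and
    sum the absolute change across all parameters."""
    n_changed = sum(1 for p, c in zip(prev_params, curr_params) if abs(c - p) > tol)
    total_abs_change = sum(abs(c - p) for p, c in zip(prev_params, curr_params))
    return n_changed, total_abs_change

def total_loss(task_loss, prev_params, curr_params, lam_count=0.01, lam_change=0.1):
    """Task loss plus a contribution that is a function of the number of
    changed parameters and of the absolute change across all parameters."""
    n_changed, abs_change = change_penalty(prev_params, curr_params)
    return task_loss + lam_count * n_changed + lam_change * abs_change

prev = [0.2, -0.5, 1.0, 0.0]
curr = [0.2, -0.4, 1.3, 0.0]  # two of four parameters changed
n, delta = change_penalty(prev, curr)
assert n == 2 and abs(delta - 0.4) < 1e-9
```

Minimizing such a loss would discourage changing many parameters per iteration, which is one plausible way to read the recited "contribution" on top of an ordinary task loss.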
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Uhlenbrock and Pinheiro to incorporate the teachings of Wang in order to provide a visual of measuring the change in the number of parameters in terms of performance speed, as doing so would provide insights into the performance of the model compared to state-of-the-art benchmarks (Wang, Figure 1, “Figure 1: Speed vs accuracy. Each circle represents the performance of a model in terms frames-per-second and mIoU accuracy on our Keyboard Hand Dataset using a Titan X (Pascal) GPU. The radius of each circle denotes the models’ number of parameters. For our recurrent approach, we plot these numbers after 1, 2, and 3 iterations, and we show the corresponding segmentations in the bottom row. The performance of our approach is plotted in red and the other acronyms are defined in Section 4.2. ICNet [41] is slightly faster than us but at the cost of a significant accuracy drop, whereas RefineNet [15] and DeepLab [6] are both slower and less accurate on this dataset, presumably because there are not enough training samples to learn their many parameters.”)

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASMINE THAI whose telephone number is (703)756-5904. The examiner can normally be reached M-F 8-4. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /J.T.T./Examiner, Art Unit 2129 /MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129

Prosecution Timeline

Aug 05, 2021
Application Filed
Oct 01, 2024
Non-Final Rejection — §103, §112
Jan 06, 2025
Response Filed
Jan 15, 2025
Final Rejection — §103, §112
May 28, 2025
Request for Continued Examination
May 30, 2025
Response after Non-Final Action
Jul 16, 2025
Non-Final Rejection — §103, §112
Jan 21, 2026
Response Filed
Mar 09, 2026
Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561603
SYSTEM FOR TIME BASED MONITORING AND IMPROVED INTEGRITY OF MACHINE LEARNING MODEL INPUT DATA
2y 5m to grant Granted Feb 24, 2026
Patent 12555000
GENERATION OF CONVERSATIONAL TASK COMPLETION STRUCTURE
2y 5m to grant Granted Feb 17, 2026
Patent 12462154
METHOD AND SYSTEM FOR ASPECT-LEVEL SENTIMENT CLASSIFICATION BY MERGING GRAPHS
2y 5m to grant Granted Nov 04, 2025
Patent 12395590
REDUCTION AND GEO-SPATIAL DISTRIBUTION OF TRAINING DATA FOR GEOLOCATION PREDICTION USING MACHINE LEARNING
2y 5m to grant Granted Aug 19, 2025
Patent 12380361
Federated Machine Learning Management
2y 5m to grant Granted Aug 05, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
25%
Grant Probability
81%
With Interview (+56.3%)
4y 0m
Median Time to Grant
High
PTA Risk
Based on 24 resolved cases by this examiner. Grant probability derived from career allow rate.
