Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Remarks
This Office Action is in response to applicant’s amendment filed on December 4, 2025, under which claims 1-8, 10-15, 17-20, and 22-24 are pending and under consideration.
Response to Arguments
Applicant’s arguments have been fully considered but are not persuasive in distinguishing over the cited references. The prior art rejections have been updated to account for the amended claim language, but the claims remain rejected over the previously applied references.
Applicant argues:
…To advance prosecution, applicant further amends claim 1 to recite "each path representing a forward, backward, or bi-directional connection corresponding to a respective time period of a same node."
…
…In sum, Asad merely performs aggregate collection of statistics across time steps for the purpose of selecting a single, common quantization format for all instances of a tensor, thereby enforcing time invariance of number formats across steps. Asad's process combines per-step data only to derive one unified format.
In contrast, amended claim 1 recites "each path representing a forward, backward, or bi-directional connection corresponding to a respective time period of a same node," which reflects that the claimed temporal profiling is applied to the same node across multiple time periods, rather than aggregating statistics for all time steps. Accordingly, the claimed method performs quantization using activation ranges determined for respective time periods, rather than applying a single uniform format across time.
(Applicant’s response, pages 10-11).
This argument is not persuasive because the new limitation of “each path representing a forward, backward, or bi-directional connection corresponding to a respective time period of a same node” merely refers to conventional features of an LSTM and is not necessarily related to the aspect of temporal profiling. Note that this limitation only defines the paths of the node; it does not further define the output of the node that is the subject of temporal profiling. The claim also does not require the output to be of any of the specific paths. In other words, the concepts of “paths” and “output” have no specific dependency or interrelation in the current claim language. Therefore, contrary to applicant’s remarks, the new limitation of “each path representing a forward, backward, or bi-directional connection corresponding to a respective time period of a same node” is independent of any concept of applying temporal profiling to the same node across multiple time periods rather than aggregating statistics for all time steps.
The Examiner also notes that because the new limitation recites “a forward, backward, or bi-directional connection,” it does not require an RNN with a backward or bi-directional connection; the term “or” makes those alternatives optional when the RNN already has multiple forward connections. Since a forward connection is a conventional feature of RNNs, the limitation in question does not recite any specific distinguishing feature.
Next, applicant argues:
In particular, Asad fails to disclose or suggest "obtaining, via temporal profiling that records and analyzes temporal variations of node outputs across multiple time periods within the quantization process," as amended claim 1 recites. The Office cites paragraphs [0075], [0176], and [0189] of Asad as allegedly disclosing "a neural network 'configured for temporal profiling over a plurality of nodes,' obtaining node outputs 'via temporal profiling,' and determining statistic properties 'based on temporal profiling.'" Office Action, pp. 8-9. Applicant respectfully disagrees.
Paragraph [0176] of Asad describes that the RNN is executed "for a predetermined number of one or more time steps so as to generate the statistics at each time step" and paragraph [0189] of Asad describes that a format-selection algorithm is "applied to the statistics captured over all of the predefined number of time steps." Asad merely describes collecting per-step statistics and averaging or combining them to determine a single, common quantization format across all time steps. Nothing in Asad discloses recording or analyzing temporal variations of node outputs for any given node across multiple time periods. Instead, Asad's process eliminates temporal distinctions by enforcing a unified, time-invariant quantization range. In contrast, amended claim 1 recites "temporal profiling that records and analyzes temporal variations of node outputs across the multiple time periods within the quantization process." Asad's approach does not record, compare, or analyze such temporal changes in node behavior and therefore does not disclose, teach or suggest at least this feature as amended claim 1 recites.
(Applicant’s response, page 12).
This argument is not persuasive because the new limitation of “records and analyzes temporal variations of node outputs” merely refers to concepts claimed at a high degree of generality, without any specific methodology that distinguishes over Asad.
Asad’s teaching of “collecting per-step statistics and averaging or combining them to determine a single, common quantization format across all time steps” (as characterized in applicant’s response) in fact qualifies as recording statistics of different time steps, because the collection of per-step statistics also records them. Furthermore, this method in Asad also constitutes analyzing those temporal variations, because Asad teaches in [0196] that “if a certain tensor behaves differently at a given time step to the previous timesteps resulting in different number formats at those time steps, this approach has the potential to generalise this format to all other timesteps before and after it in the unrolled graph, meaning that those tensor values can be handled correctly should the unusual behaviour occur at a different point in the sequence.” This constitutes analysis of the temporal variations because it determines that tensors behave differently (and thus have variation) at different time steps, resulting in “different number formats”; the determination of “different number formats” is itself an analysis of the temporal variations. Furthermore, by obtaining a common number format over all of the time steps, the temporal variations across them are analyzed in terms of the median, minimum, maximum, or mean of different (i.e., varying) statistics. See Asad, [0192].
Similarly, the term “across the multiple time periods” does not have any specificity as to what steps are involved in the recording and analysis of the temporal variations. Since Asad teaches collecting statistics for different time steps, this constitutes “across the multiple time periods.”
Based on applicant’s arguments, the Examiner believes that applicant may have intended a more complex manner of recording and analyzing temporal variations than what is in Asad. However, based on the Examiner’s review of the specification, it is not apparent whether such a more specific methodology is disclosed, especially since the term “temporal variation” does not explicitly appear in the specification. Since “temporal variation” is not used in any specific manner in the specification, this term is interpreted broadly to refer to any changes in the statistics across the time steps, and the analysis of such temporal variations is interpreted broadly to cover any analysis based on those changes.
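For illustration only, the combination of per-time-step number-format parameters described in Asad at [0189] and [0192] (median, minimum, maximum, or mean) can be sketched as follows; the values and names are the editor’s assumptions, not Asad’s implementation.

    # Hypothetical sketch of combining per-time-step number-format parameters
    # into a common format, in the manner described in Asad [0192].
    import statistics

    per_step_exponents = [4, 4, 5, 4, 7, 4]  # assumed per-step format exponents

    common_median = round(statistics.median(per_step_exponents))  # "median"
    common_min = min(per_step_exponents)                          # "minimum"
    common_max = max(per_step_exponents)                          # "maximum"
    common_mean = round(statistics.mean(per_step_exponents))      # integer closest to the "mean"

Under any of these combinations, the per-step statistics vary across the time steps while the resulting common format is single-valued, consistent with the discussion above.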
Next, applicant argues:
Furthermore, Asad does not disclose, teach or suggest "determining, based on temporal profiling, statistical properties of the node outputs across the multiple time periods," as claim 1 recites. In Asad, statistical properties are computed independently for each time step and then aggregated to select a uniform number format, rather than being determined based on the temporal evolution of the same node's outputs. In contrast, amended claim 1 requires statistical properties derived from the recorded temporal variations of the node outputs across multiple time periods. Asad's statistics are static and per-step; they are not temporally correlated, not derived from temporal profiling, and not used to identify time-dependent activation ranges as required by claim 1. Li fails to cure the deficiencies of Asad.
…
As discussed above, Asad applies the format-selection algorithm across multiple time steps to identify a common number format that remains constant across all instances of a given tensor. Thus, Asad's "temporal profiling" merely aggregates statistical data over time to determine a single shared format, and it does not determine or apply distinct activation ranges or quantization parameters for each time period of a node. Therefore, Asad fails to disclose, teach or suggest at least the above features as amended claim 1 recites. Li fails to cure the deficiencies of Asad.
(Applicant’s response, page 12) (emphasis added).
This argument is not persuasive because the claim does not have any specific claim language that excludes statistics that are “static” or “per-step” (as characterized in applicant’s remarks). Furthermore, the statistics in Asad are temporally correlated because they are collected for each time step (see [0189]: “statistics (1105) captured at each time step”), and each time step is correlated with a time (hence the term “time step”). The claim merely recites “temporal profiling” at a high degree of generality, without specific limitations that distinguish over Asad.
Furthermore, regarding applicant’s statement that Asad’s statistics are “not used to identify time-dependent activation ranges as required by claim 1,” the Examiner notes that the claim does not require the use of the statistics in a particular way to determine the activation ranges. Instead, the claim recites “determining…activation ranges of the node outputs based on at least the first set of statistic properties of the node outputs…and a second set of statistic properties.” Here, the term “based on” does not require any specific relationship or manner of use, but instead is a broad term that is met when the statistics are used in some way that affects the determination of the activation ranges. In the combination of references, this is met because the statistics in Asad are used to determine the number format, which affects the operation and numerical outputs of the neural network, and hence also activation ranges in general.
Regarding applicant’s statement that Asad “does not determine or apply distinct activation ranges or quantization parameters for each time period of a node,” the Examiner notes that the claim does not require these features as distinctions over the art. The claim does not require that an activation range be determined for each time period; it only requires activation ranges determined “based on” statistics at different time periods. Thus, the claim does not require a one-to-one correspondence between activation ranges and time periods. Moreover, the claim does not recite “quantization parameters for each time period.”
Finally, in regards to dependent claim 22, applicant argues:
…In sum, Asad uses one shared format for all time steps of the same tensor. Asad, however, does not track or use changes over time, instead, it removes any time variation by applying a single fixed format to every instance. Further, paragraph [0196] acknowledges that different time steps may behave differently, but explains that the purpose of Asad's approach is to "generalise this format to all other timesteps before and after it in the unrolled graph, meaning that those tensor values can be handled correctly should the unusual behaviour occur at a different point in the sequence." Thus, Asad describes generalization and uniformity, not per-time-step adaptation based on detected changes.
Therefore, Asad fails to disclose, teach or suggest "the neural network is a recurrent neural network and temporal profiling comprises obtaining outputs of a same node during multiple sequential time periods of the quantization process and analyzing the outputs to determine changes in statistical properties and activation ranges over time for use in quantizing the recurrent neural network" as claim 22 recites.
(Applicant’s response, page 14).
In response, the Examiner notes that the instant claim does not distinguish over the generalization and uniformity concept and does not explicitly recite a feature of “per-time-step adaptation.” Furthermore, contrary to applicant’s characterization, Asad does teach using changes over time, since the common number format is based on collected statistics that are different for each time step (and thus change over time), as discussed above.
Therefore, applicant’s arguments are not persuasive, and the claims remain rejected over the previously applied references.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
1. Claims 1, 4, 6, 8, 11, 13, 15, 18, 20, and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al., “On the Quantization of Recurrent Neural Networks,” arXiv:2101.05453v1 [cs.LG] 14 Jan 2021 (“Li”) in view of Asad et al. (US 2022/0044096 A1) (“Asad”).
As to claim 1, Li teaches a method for neural network quantization, [The preamble phrase “for neural network quantization” states a purpose or intended use for the invention. Thus, this preamble expression is not a claim limitation. See MPEP § 2111.02(II). Nonetheless, Li also teaches that its method is “for neural network quantization.” See abstract: “In this work, we present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies.”] comprising:
obtaining a neural network […], wherein the plurality of nodes [Abstract: “In this work, we present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies, which themselves are the foundation of many production ML systems.” The LSTM as a whole has multiple layers and units. See § 5, paragraph 1: “All models have the same RNN Transducer (RNN-T) architecture with 10 layers of LSTM and each of them contains 2048 hidden units.” Note that in the context of LSTMs, a unit includes a cell with a plurality of gates and an output h, as defined in § 2. Therefore, multiple units (2048 hidden units) correspond to multiple nodes.] comprises a first node connected to different paths across multiple time periods [The limitation of a “first node connected to different paths across multiple time periods” is met because this is a feature of LSTMs, as also exemplified in the instant application as a neural network with such a feature. For example, paragraph 37 of the specification teaches that g_t, m_t, and y_t represent different paths in this neural network layer. These paths are conventional features of LSTMs and are taught in Li, § 2, where those same paths are represented as z_t, m_t, and h_t. In regards to multiple time periods, the description of “x_t is the input at time t” (below equation 7) means that the cell is connected to different inputs x at different values of t. Similarly, equations (1)-(3) indicate that at different time periods t, the cell is connected to different recurrent inputs R_i h_(t-1).] within a quantization process [As noted above, Li also teaches that its method is for neural network quantization. See abstract: “In this work, we present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies.” Thus, the method of Li in general, including the time steps of the LSTM that are executed, is part of a quantization process.], each path representing a forward, backward, or bi-directional connection corresponding to a respective time period of a same node [In the above examples, z_t, m_t, and h_t are forward paths. Note that the instant claim language does not require the existence of all three types of paths, since the claim recites an “or”-delimited list of alternatives.];
obtaining […] node outputs for the first node across the multiple time periods, wherein the different multiple periods comprise at least a first time period and a second time period in temporal sequence, [§ 2, equation (7) teaches obtaining the node output h_t (and intermediate outputs included in h_t) for time period t. Since the LSTM is executed for actual data sets with t-indexed sequences, the limitation of inputs for different values of t (i.e., “across the multiple time periods” including “a first time period and a second time period”) is implicitly disclosed, where, for example, one value of t would correspond to a first time period while another value of t would correspond to a second time period. See § 5, paragraph 1: “In table 1, we reproduce [21] the accuracy for speech recognization on 3 anonymized private benchmark speech datasets on VoiceSearch, YouTube and Telephony.” In other words, by running the model on speech datasets, it is understood that it is run on data that is time-indexed. Additionally, Li teaches that its method accounts for error accumulation in the time dimension in § 1, paragraph 6: “However, the stateful nature of RNN makes its quantization numerically challenging since the quantization error can accumulate in both depth dimension as well as the sequence (e.g. time) dimension.” For example, in equations (1)-(7) in § 2, t-1 and t form a temporal sequence with corresponding node outputs h_t and h_(t-1), and it is understood that t increments as per the operation of the LSTM.] and wherein the node outputs comprise at least a first node output at the first time period and a second node output at the second time period; [Referring to equations (1)-(7) in § 2, the outputs of the i, f, and o gates, as well as the cell output h, are time-indexed (i.e., have an index of t). Thus, on the basis of how the LSTM operates, there is an output at one time t and an output at a second time t.]
determining, […], statistic properties of the node outputs across the multiple time periods [§ 4, paragraph 1: “For activations, there are two ways of collecting maximum and minimum values: Post Training [25] and Quantization Aware Training (QAT) [7, 22]. Post Training runs inference on a representative data set and collect statistics; QAT collects tensor statistics during training and additionally fine tunes model weight by simulating quantization noise in the training process.” Note that § 4, paragraph 2 refers to “activations” as “recurrent and input activations,” i.e., referring to the output of the LSTM cell. See also Appendix A, Table 2 caption: “range is max(|x_i|) − min(|x_i|).” The Examiner notes that the maximum and minimum values as described here are considered to read on the instant limitation of “statistic properties” and are a maximum and minimum for the node outputs in general, where such node outputs include those at different time periods since an LSTM computes equations (1)-(7) for successive values of t.] […] and
determining activation ranges of the node outputs […] [§ 4, paragraph 1: “As shown in equation 8, in order to calculate the scale, maximum and minimum values of the tensor are needed… For activations, there are two ways of collecting maximum and minimum values…” As shown in equation (8), the maximum and minimum values are used to compute a difference max(T) − min(T), which corresponds to the range of an output (activation). See also Appendix A, Table 2 caption: “range is max(|x_i|) − min(|x_i|).” Furthermore, as shown in this table, a “range” is calculated for the recurrent activation h and other activations used by the cell to produce the output (e.g., the input activation x and the output of gate o as shown in the table), i.e., corresponding to a plurality of “activation ranges of the node outputs.” See also FIG. 16 caption: “Input activation and recurrent activations have different scales.”];
determining additional activation ranges for remaining nodes of the plurality of nodes in the neural network […]; [In general, Li teaches that the entire model is quantized, as described in § 3, paragraph 1: “quantization strategy for LSTMs, which covers the transformation of floating point values to integer, as well as the necessary rewrite of the computation for the execution to take place entirely in the integer domain.” Furthermore, Li teaches that model has multiple layers and units in § 5, paragraph 1: “All models have the same RNN Transducer (RNN-T) architecture with 10 layers of LSTM and each of them contains 2048 hidden units.” Note that in the context of LSTMs, a unit includes a cell with a plurality of nodes (gates), both as defined in § 2. Therefore, the same technique is applied to multiple nodes.] and
quantizing […] the neural network by quantizing each layer in the neural network and respectively quantizing each node output based on respective activation range. [As noted above, Li teaches quantizing the entire LSTM, including multiple layers and nodes, and the same technique is applied to all nodes being quantized.]
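For illustration of the mappings above, the following is a common textbook formulation of the LSTM cell, of which Li’s equations (1)-(7) are understood to be a variant (Li’s exact notation and any projection terms may differ):

    i_t = \sigma(W_i x_t + R_i h_{t-1} + b_i)
    f_t = \sigma(W_f x_t + R_f h_{t-1} + b_f)
    z_t = \tanh(W_z x_t + R_z h_{t-1} + b_z)
    c_t = f_t \odot c_{t-1} + i_t \odot z_t
    o_t = \sigma(W_o x_t + R_o h_{t-1} + b_o)
    h_t = o_t \odot \tanh(c_t)

Each time step t thus produces a distinct node output h_t, consistent with the mapping of the “first node output” and “second node output” above. Similarly, a generic asymmetric quantization scale of the kind discussed with respect to Li’s equation (8), whose exact form may differ, can be written as

    s = \frac{\max(T) - \min(T)}{2^{b} - 1}, \qquad q = \mathrm{round}\!\left(\frac{x - \min(T)}{s}\right),

where b is the bit width and \max(T) − \min(T) is the activation range referenced above.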
Li does not explicitly teach:
The neural network being “configured for temporal profiling over a plurality of nodes,” the obtaining of the outputs being “via temporal profiling that records and analyzes temporal variations of node outputs across the multiple time periods within the quantization process,” and the similar limitations of “using temporal profiling,” “based on temporal profiling,” and “using temporal profiling” for the remaining steps of determining statistic properties, determining activation ranges, and quantizing, respectively;
“wherein the statistic properties of the node outputs comprise at least: a first set of statistic properties of the first node output of the first node at the first time period, wherein the first set of statistic properties comprises a first maximum value and a first minimum value, and a second set of statistic properties of the second node output of the first node at the second time period, wherein the second set of statistic properties comprises a second maximum value and a second minimum value”; and
The related limitation that the determining of the activation ranges is “based on at least the first set of statistic properties of the node outputs at the first time period and a second set of statistic properties of the node outputs at the second time period.”
Asad, which generally pertains to RNN quantization (see title and [0053]-[0154]), teaches part of the above limitations.
In particular, Asad teaches a neural network “configured for temporal profiling over a plurality of nodes,” obtaining node outputs “via temporal profiling that records and analyzes temporal variations of node outputs across the multiple time periods within the quantization process,” and determining statistic properties “based on temporal profiling” [[0176]: “The RNN is executed for a predetermined number of one or more time steps so as to generate the statistics at each time step which are required by the number format selection algorithm.” This constitutes recording the output across the multiple time periods. See also [0075]: “At each time step t the RNN provides an output o(t) 105. By operating the RNN on the input at each timestep of an input sequence x(t), the RNN generates a respective sequence of outputs o(t).” That is, statistics are recorded during execution of the RNN, which also corresponds to obtaining the outputs of those nodes (e.g., cells) of the RNN. This corresponds to the limitation of “records…temporal variations.” Additionally, see also [0189]: “the format selection algorithm is independently applied to the statistics (1105) captured at each time step (or a subset of the time steps for which statistics are captured) so as to identify a number format for each instance of a network value at each (or those) time step(s). In other examples, the format selection algorithm is (e.g. simultaneously) applied to the statistics captured over all of the predefined number of time steps for which the RNN is performed (1105) so as to identify a common number format for a network value over all of the time steps (i.e. every instance of the network value) over which the RNN is performed (1106).” See also [0192]: “For example, integer parameters expressing the number formats established for the instances of a tensor may be combined by identifying a median, minimum, maximum, or mean (e.g. the integer value closest to the mean) integer value which may then be used as the respective parameter of the common number format.” See also [0196]: “For example, if a certain tensor behaves differently at a given time step to the previous timesteps resulting in different number formats at those time steps, this approach has the potential to generalise this format to all other timesteps before and after it in the unrolled graph, meaning that those tensor values can be handled correctly should the unusual behaviour occur at a different point in the sequence.” In regards to the limitation of “analyzes…temporal variations,” the above paragraph teaches that tensors are determined to behave differently at different timesteps. This constitutes analysis of the temporal variations because it determines that tensors behave differently (and thus have variation) at different time steps, resulting in “different number formats”; the determination of “different number formats” is itself an analysis of the temporal variations. Furthermore, by obtaining a common number format over all of the time steps, the temporal variations across them are analyzed in terms of the median, minimum, maximum, or mean of different (i.e., varying) statistics. The instant claim does not recite any specific methodology of analyzing temporal variations that distinguishes over this cited reference.]
Asad further teaches “wherein the statistic properties of the node outputs comprise at least: a first set of statistic properties of the first node output of the first node at the first time period, wherein the first set of statistic properties comprises a first maximum value and a first minimum value, and a second set of statistic properties of the second node output of the first node at the second time period, wherein the second set of statistic properties comprises a second maximum value and a second minimum value” [[0176]: “The RNN is executed for a predetermined number of one or more time steps so as to generate the statistics at each time step which are required by the number format selection algorithm…The statistics (e.g. data values, maximums/minimums, histogram data) generated at the RNN and/or logic associated with the RNN (e.g. at format selection unit 344) may be captured in any suitable manner. For example, in the case that the RNN is implemented in software running at CPU 902 in FIG. 9, the statistics may be stored at memory 304 for concurrent or subsequent processing by the format selection unit 344 (which may also be running at the CPU). In some examples, at least some of the statistics comprise intermediate data values generated at the RNN (e.g. between stacked RNN cells and/or operations of an RNN cell).” [0189]: “the format selection algorithm is independently applied to the statistics (1105) captured at each time step (or a subset of the time steps for which statistics are captured) so as to identify a number format for each instance of a network value at each (or those) time step(s).” [0190]: “For example, the statistics captured on running the RNN on sample input data may comprise capturing at each time step the maximum absolute values of a set of two or more values of the RNN.” [0169]: “Such statistics may be one or more of network values, mean/variance of network values, minimum/maximum network values.” That is, “each time step” here corresponds to the limitations of “first time period” and “second time period.” Furthermore, the concept of “network value” refers to the output value of an RNN (see [0155]), such as “input values…state values…and intermediate tensors representing values between operations of the network,” thereby being analogous to the limitation of “first node output” and “second node output” in the context that there are values at different time steps. The outputs are of “operations of an RNN cell” ([0176], part quoted above), and are thus analogous to outputs “of the first node” (see also [0171]: “Different state tensors (e.g. h1 and h2) may have different number formats and the inputs to different RNN cells (e.g. RNN cells 102 and 103) may have different number formats” which teaches that different quantization formats can be applied to the state and inputs of individual cells). Furthermore, the limitations of minimum and maximum are taught as statistics. 
The use of both minimum and maximum together is taught in, e.g., [0024] (“the statistics may comprise one or more of…a minimum or maximum”), [0164] (teaching “a linear interpolation factor between the minimum and maximum representable numbers”), and [0180] (teaching that “minimum/maximum” refers to the “MinMax” or “full range method,” which is understood as using both the minimum and maximum), and the reference generally teaches “any combination of two or more such features” ([0220]); alternatively, the teaching of “histogram data” in the sections quoted above also includes a minimum and a maximum.]
Asad further teaches “using temporal profiling” and “based on temporal profiling” for operations analogous to determining activation ranges of the node outputs and remaining nodes and quantizing the neural network [[0187]: “As has been described, the RNN is operated at step 1104 on sample input data over a predefined number of time steps without any (or minimal) quantisation of its network values in order to capture the statistics at each time step which are required by the format selection method. The format selection method is applied 1105 to the statistics captured at each time step of the RNN to select optimal number formats for the network values of the RNN. The number format selection algorithm may be chosen and/or configured so as to identify a block-configurable type of number format for each network value for which a number format is to be determined.” See also [0173]-[0177]. That is, the statistics are used by the format selection algorithm, which itself is a quantization process (see [0153]) and is analogous to the quantization method in the base reference Li. Furthermore, “temporal profiling” in the context of this reference refers to the recording of information over the time steps, and that recording forms the basis of the further analysis of the neural network in general, including in the combination with Li.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Li with the teachings of Asad by implementing the temporal profiling and statistics collection technique of Asad, so as to arrive at the claimed invention, including the limitations not taught by Li. The motivation would have been to collect statistics in a manner that enables selection of number formats that is robust as to behaviors at certain time steps (see Asad, [0196]: “the method described herein also makes the selected formats more robust because information is pooled from across multiple time steps of the RNN. For example, if a certain tensor behaves differently at a given time step to the previous timesteps resulting in different number formats at those time steps, this approach has the potential to generalise this format to all other timesteps before and after it in the unrolled graph, meaning that those tensor values can be handled correctly should the unusual behaviour occur at a different point in the sequence.”).
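For illustration only, the following sketch shows the general kind of per-time-step statistics capture described in Asad at [0176] and [0189], applied to a toy recurrent cell. All names, shapes, and values are the editor’s assumptions for demonstration and are not attributed to Asad or Li.

    # Hypothetical sketch of temporal profiling: record per-step min/max of a
    # node output while the RNN runs, in the manner described in Asad [0176].
    import numpy as np

    def profile_rnn(step_fn, x_seq, h0):
        """Run a recurrent cell over a sequence, recording per-step output statistics."""
        h, stats = h0, []
        for t, x_t in enumerate(x_seq):
            h = step_fn(x_t, h)  # node output at time step t
            stats.append({"t": t, "min": float(h.min()), "max": float(h.max())})
        return stats

    rng = np.random.default_rng(0)
    W, R = rng.normal(size=(8, 4)), rng.normal(size=(8, 8))
    cell = lambda x, h: np.tanh(W @ x + R @ h)  # toy cell purely for demonstration
    stats = profile_rnn(cell, [rng.normal(size=4) for _ in range(5)], np.zeros(8))

    # Per-step activation ranges; a common number format could then be derived
    # from these per-step statistics (e.g., per Asad [0189] and [0192]).
    ranges = [(s["min"], s["max"]) for s in stats]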
As to claim 4, the combination of Li and Asad teaches the method of claim 1, wherein the neural network is one of following neural networks: a Long Short-Term Memory (LSTM), a LSTM with recurrent project layer (LSTMP), or a Gated Recurrent Unit (GRU). [Li, abstract: “In this work, we present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies, which themselves are the foundation of many production ML systems”]
As to claim 6, the combination of Li and Asad teaches the method of claim 1, as set forth above.
Asad further teaches “wherein the statistic properties comprise one or a combination of following properties: a mean estimate, a histogram, a probability density function, a variance estimate, an entropy, a cross entropy, or a Kullback-Leiber Divergence.” [[0176]: “The statistics (e.g. data values, maximums/minimums, histogram data) generated at the RNN and/or logic associated with the RNN (e.g. at format selection unit 344) may be captured in any suitable manner.” [0169]: “Such statistics may be one or more of network values, mean/variance of network values, minimum/maximum network values.” [0024]: “The RNN may comprise a plurality of values including at least the two or more values, and the statistics may comprise one or more of: a mean of at least some of the plurality of values; a variance of at least some of the plurality of values; a minimum or maximum of at least some of the plurality of values; one or more histograms summarising at least some of the plurality of values; and gradients calculated with respect to an RNN output or a measure of error based on an RNN output over at least some of the plurality of values.” That is, the alternatives of “mean,” “variance,” and “histogram” are disclosed.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of Li with the teachings of Asad so as to arrive at the claimed invention of this dependent claim with respect to any one or more of mean, variance, and histogram being included in the statistic properties. Since these teachings of Asad discussed above are part of the teachings discussed in the rejection of the parent claim, the motivation for doing so is the same as the one given for Asad in the rejection of the parent claim.
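For illustration only, the alternative statistics named in the paragraphs of Asad quoted above (mean, variance, histogram) can be computed over a captured set of node-output values as in the following sketch; the data and names are hypothetical.

    import numpy as np

    values = np.random.default_rng(0).normal(size=1000)  # hypothetical captured node outputs
    mean_est, var_est = values.mean(), values.var()      # mean / variance estimates
    hist, bin_edges = np.histogram(values, bins=32)      # histogram summary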
As to claims 8, 11, and 13, these claims are directed to an apparatus for performing operations that are the same or substantially the same as those of claims 1, 4, and 6. Therefore, the rejections made for claims 1, 4, and 6 are applied to claims 8, 11, and 13, respectively.
Furthermore, Li teaches “An apparatus for implementing a neural network, comprising: one or more processors; and a memory configured to store instructions executable by the one or more processors.” [§ 4, paragraph 1: “Currently LSTM quantization is enabled in the post-training approach in TensorFlow.” § 7: “We have demonstrated that integer RNN is accurate and meaningfully faster on CPU.” Therefore, the limitation of a processor is taught in the form of a CPU. Furthermore, TensorFlow refers to a well-known software library that runs on a computer. Since Li teaches that the quantization method is performed on a computer using such software, and also teaches that the RNN runs on a CPU, it is implicitly disclosed and understood by one of ordinary skill in the art that the processor executes instructions stored in a memory, which is a generic component of a computer. See MPEP § 2144.01 (Implicit Disclosure): “[I]n considering the disclosure of a reference, it is proper to take into account not only specific teachings of the reference but also the inferences which one skilled in the art would reasonably be expected to draw therefrom.”]
As to claims 15, 18, and 20, these claims are directed to a non-transitory computer readable storage medium for performing operations that are the same or substantially the same as those of claims 1, 4, and 6. Therefore, the rejections made for claims 1, 4, and 6 are applied to claims 15, 18, and 20, respectively.
Furthermore, Li teaches “A non-transitory computer readable storage medium, comprising instructions stored therein to implement a neural network, wherein, upon execution of the instructions by one or more processors, the instructions cause the one or more processors to perform acts.” [§ 4, paragraph 1: “Currently LSTM quantization is enabled in the post-training approach in TensorFlow.” § 7: “We have demonstrated that integer RNN is accurate and meaningfully faster on CPU.” Therefore, the limitation of a processor is taught in the form of a CPU. Furthermore, TensorFlow refers to a well-known software library that runs on a computer. Since Li teaches that the quantization method is performed on a computer using such software, and also teaches that the RNN runs on a CPU, it is implicitly disclosed and understood by one of ordinary skill in the art that the processor executes instructions stored in a non-transitory computer readable storage medium, which is a generic component of a computer. See MPEP § 2144.01 (Implicit Disclosure): “[I]n considering the disclosure of a reference, it is proper to take into account not only specific teachings of the reference but also the inferences which one skilled in the art would reasonably be expected to draw therefrom.”]
As to claim 22, the combination of Li and Asad teaches the method of claim 1, wherein the neural network is a recurrent neural network [Li, § 7, last paragraph: “Unidirectional-RNN/LSTM and bidirectional-RNN/LSTM have loops on top of LSTM cell and the quantization strategy described in this work can be directly applied.”].
Asad further teaches “temporal profiling comprises obtaining outputs of a same node during multiple sequential time periods of the quantization process” [[0176]: “The RNN is executed for a predetermined number of one or more time steps so as to generate the statistics at each time step which are required by the number format selection algorithm… In some examples, at least some of the statistics comprise intermediate data values generated at the RNN (e.g. between stacked RNN cells and/or operations of an RNN cell).” [0189]: “the format selection algorithm is independently applied to the statistics (1105) captured at each time step (or a subset of the time steps for which statistics are captured) so as to identify a number format for each instance of a network value at each (or those) time step(s).” Furthermore, as discussed in the rejection of the parent claim, the outputs in Asad are of “operations of an RNN cell” ([0176], part quoted above), and are thus outputs of a same node. See also [0171]: “Different state tensors (e.g. h1 and h2) may have different number formats and the inputs to different RNN cells (e.g. RNN cells 102 and 103) may have different number formats,” which teaches that different quantization formats can be applied to the state and inputs of individual cells.] “and analyzing the outputs to determine changes in statistical properties and activation ranges over time for use in quantizing the recurrent neural network.” [Initially, the Examiner notes that the instant specification does not explicitly describe determining changes in statistical properties and activation ranges, and instead only implies that the statistics and the minimum and maximum outputs being determined happen to vary over the time periods. In Asad, a change in statistical properties is implied because the statistics are generated at each time step, and it is understood that the statistical properties may vary, particularly since [0196] states: “For example, if a certain tensor behaves differently at a given time step to the previous timesteps resulting in different number formats at those time steps, this approach has the potential to generalise this format to all other timesteps before and after it in the unrolled graph, meaning that those tensor values can be handled correctly should the unusual behaviour occur at a different point in the sequence.” See also [0002]: “the operation of an RNN is influenced by the historical processing performed by the network and the same input could produce a different output depending on the previous inputs in the sequence provided to the RNN.” That is, the behavior and thus the statistics change over the time steps. Similarly, an activation range is disclosed in the form of the set of maximum and minimum values when applied to the state/intermediate tensors (see [0150]), which is also understood to be changing (see [0102]-[0103]).]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of Li with the teachings of Asad so as to arrive at the claimed invention of this dependent claim. Since these teachings of Asad discussed above are part of the teachings discussed in the rejection of the parent claim, the motivation for doing so is the same as the one given for Asad in the rejection of the parent claim.
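For illustration of the claim 22 discussion only, determining changes in per-step statistics can be as simple as comparing successive (min, max) pairs, as in the following hypothetical sketch (the values are the editor’s assumptions, not data from the references):

    per_step = [(-0.9, 0.8), (-0.7, 0.95), (-1.2, 0.6)]  # assumed per-step (min, max)
    changes = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(per_step, per_step[1:])]
    # changes[t] gives the step-to-step variation in the activation-range endpoints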
As to claims 23 and 24, the further features recited in these claims are the same or substantially the same as those of claim 22. Therefore, the rejection made to claim 22 is applied to claims 23 and 24.
2. Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Asad, and further in view of Sriram et al. (US 2022/0044114 A1) (“Sriram”).
As to claim 3, the combination of Li and Asad teaches the method of claim 1, wherein the neural network is a recurrent neural network for automatic speech recognition; [As discussed in the rejection of a parent claim, Li teaches an LSTM, which is a type of RNN. See also § 2, paragraph 1: “The long short-term memory cell is one of the most complex and widely used RNN topologies.” In regards to speech recognition, see § 5: “In table 1, we reproduce [21] the accuracy for speech recognization on 3 anonymized private benchmark speech datasets on VoiceSearch, YouTube and Telephony. VoiceSearch and Telephony dataset have average utterance length of 4.7 seconds while YouTube dataset contains longer utterances, averaging 16.5 min per utterance. All models have the same RNN Transducer (RNN-T) architecture with 10 layers of LSTM and each of them contains 2048 hidden units.”]
Li as modified thus far does not explicitly teach the method further comprising: “implementing the recurrent neural network in an edge computing device after quantizing all neural network layers in the recurrent neural network.”
Sriram teaches “implementing the recurrent neural network in an edge computing device after quantizing all neural network layers in the recurrent neural network” [[0060]: “This compute budget may be acceptable, at training, as most DNNs are trained in data centers or in the cloud with GPUs that have significantly large compute capability and much larger power budgets. However, during deployment, these models are most often required to run on edge devices with much smaller computing resources and lower power budgets. Running a DNN inference using the full 32-bit representation may not always (or ever) be practical for real-time analysis given the compute, memory, and power constraints of edge devices.” Note that RNNs are mentioned as an example of a neural network in [0182], and “speech recognition” is mentioned in [0110]. Furthermore, the “after quantizing” aspect is taught because the method in Sriram quantizes the network in order for it to run efficiently on an edge device.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Sriram by performing the further operation of “implementing the recurrent neural network in an edge computing device after quantizing all neural network layers in the recurrent neural network.” Doing so would have been a combination of prior art elements according to known methods to yield predictable results (MPEP § 2143(I)(A)). Specifically, the prior art included each element claimed (as discussed above, Li teaches a quantized model and Sriram teaches deployment on an edge device after quantization); one of ordinary skill in the art could have combined the elements as claimed by known methods, and in combination, each element merely performs the same function as it does separately (since an edge device is a known type of platform suitable for running a quantized model, and the model is suitable for operation on a variety of hardware, including edge hardware (see Li, § 7, paragraph 1, bullet 1)); and one of ordinary skill in the art would have recognized that the results of the combination were predictable (specifically, the use of a known model on a known type of platform, namely edge devices).
As to claims 10 and 17, the further features recited in these claims are the same or substantially the same as those of claim 3. Therefore, the rejection made to claim 3 is applied to claims 10 and 17.
3. Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Asad, and further in view of Chan et al. (US 2021/0406266 A1) (“Chan”).
As to claim 5, the combination of Li and Asad teaches the method of claim 1, wherein obtaining node outputs for the first node across the multiple time periods further comprises:
multiplying input vectors of the first node across the multiple time periods with weight matrices to obtain weighted matrices; [Li, § 2, paragraph 1, equations (1)-(3), which teach the input vectors x_t being multiplied by the weight matrices W_i to obtain weighted matrices W_i x_t.]
Li as modified thus far does not explicitly teach the further operation of “concatenating the weighted matrices for further processing to obtain the node outputs.”
Chan teaches “concatenating the weighted matrices for further processing to obtain the node outputs.” [[0004]: “derive feature vectors representing individual cells, rows, and/or columns of the table and concatenate some or all of this information into another feature vector that aggregates these values. In this way, embodiments can use some or all of the “context” or values contained in various feature vectors representing some or all of a single table as signals or factors to consider when generating a decision statistic (e.g., a classification prediction)”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Chan by modifying the combination so that obtaining node outputs further includes “concatenating the weighted matrices for further processing to obtain the node outputs.” The motivation would have been to aggregate information to facilitate generating a decision statistic, as suggested by Chan ([0004]: “In this way, embodiments can use some or all of the ‘context’ or values contained in various feature vectors representing some or all of a single table as signals or factors to consider when generating a decision statistic (e.g., a classification prediction)”).
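For illustration only, a common implementation pattern consistent with the claim 5 language is to form the per-gate weighted products and concatenate them for further processing; the following sketch uses assumed names and shapes and is not attributed to Li, Asad, or Chan.

    import numpy as np

    rng = np.random.default_rng(0)
    H, D = 8, 4
    Wi, Wf, Wz, Wo = (rng.normal(size=(H, D)) for _ in range(4))
    x_t = rng.normal(size=D)  # input vector at time period t

    weighted = [W @ x_t for W in (Wi, Wf, Wz, Wo)]  # weighted products per gate
    fused = np.concatenate(weighted)                # concatenated for further processing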
As to claims 12 and 19, the further features recited in these claims are the same or substantially the same as those of claim 5. Therefore, the rejection made to claim 5 is applied to claims 12 and 19.
4. Claims 7, 14, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Asad, and further in view of Ullah et al., “Action Recognition in Video Sequences using Deep Bi-Directional LSTM With CNN Features,” in IEEE Access, vol. 6, pp. 1155-1166, 2018 (“Ullah”).
As to claim 7, the combination of Li and Asad teaches the method of claim 1, wherein the neural network is a recurrent neural network […]. [As discussed in the rejection of a parent claim, Li teaches an LSTM, which is a type of RNN. See also § 2, paragraph 1: “The long short-term memory cell is one of the most complex and widely used RNN topologies.”].
The combination of references thus far does not teach the limitation that the recurrent neural network is “for video recognition.”
Ullah teaches an RNN that is “for video recognition” [Abstract: “In this paper, we propose a novel action recognition method by processing the video data using convolutional neural network (CNN) and deep bidirectional LSTM (DB-LSTM) network… The proposed method is capable of learning long term sequences and can process lengthy videos by analyzing features for a certain time interval.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Ullah by implementing the recurrent neural network to be for video recognition. The motivation would have been to apply the model techniques of Li for the application of action recognition in video sequences, as suggested by Ullah (see abstract: “action recognition method by processing the video data using convolutional neural network (CNN) and deep bidirectional LSTM (DB-LSTM) network,” which teaches that action recognition is a known application of LSTMs).
As to claims 14 and 21, the further features recited in these claims are the same or substantially the same as those of claim 7. Therefore, the rejection made to claim 7 is applied to claims 14 and 21.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following document depicts the state of the art.
Strobelt et al., “LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks,” arXiv:1606.07461v2 [cs.CL] 30 Oct 2017 teaches temporal profiling of LSTMs.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 9:00 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Y.D.H./Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124