Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Detailed Action
This action is in response to the claims filed 4/24/2023:
Claims 1-20 are pending.
Claims 1, 13, and 20 are independent.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claims 1, 13, and 20, "programming the hardware device to implement the computational graph based on the updated linearized metric" is grammatically indefinite. It is unclear what is "based on the updated linearized metric": syntactically, the "programming," the "to implement," or "the computational graph" could each be modified by this phrase. Because the limitation is underspecified, it is unclear which of these elements is based on the updated linearized metric, or how. Because these interpretations are mutually contradictory, the scope of the claim cannot be reasonably determined. In the interest of further examination, the claim is interpreted as "programming the hardware device to implement the computational graph, wherein the computational graph is based on the updated linearized metric".
The remaining claims are rejected based on their dependence from the rejected claims.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to non-statutory subject matter.
Regarding Claim 1: Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which falls within the statutory category of a process.
Step 2A Prong One Analysis: Claim 1, under its broadest reasonable interpretation, recites a series of mental processes. For example, but for the generic computer component language, the following limitations in the context of this claim encompass graph processing that can be performed in the human mind:
obtaining an algorithm for a computational graph (observation, evaluation, and judgement);
computing an initial linearized metric for a performance parameter for performing the algorithm using a hardware device (observation, evaluation, and judgement); and
computing an updated linearized metric based on the initial linearized metric and a non-linear constraint on the performance parameter (observation, evaluation, and judgement).
Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis: Claim 1 recites the additional element “programming the hardware device to implement the computational graph based on the updated linearized metric”. However, this additional element is a computer component recited at a high level of generality, such that it amounts to no more than mere instructions to apply the judicial exception using a generic computer component. An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application (see MPEP 2106.05(f)). Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis: Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
For the reasons above, claim 1 is rejected as being directed to patent-ineligible subject matter under 35 U.S.C. § 101. This rejection applies equally to independent claims 13 and 20, which recite an apparatus and a computer program product, respectively, as well as to dependent claims 2-12 and 14-19. The additional limitations of the dependent claims are addressed briefly below:
Dependent claim 2 recites additional observation, evaluation, and judgement “identifying a plurality of layers of the computational graph”, “grouping the plurality of layers into a plurality of sequences”, and “performing a dynamic programming process to identify a subset of sequences from the plurality of sequences, wherein the initial linearized metric is based on the subset of sequences, and wherein the dynamic programming process is based on a linearity constraint on the performance parameter”.
Dependent claim 3 recites additional observation, evaluation, and judgement “identifying a tiling of the plurality of layers” as well as additional instructions to apply the judicial exception using generic computer components “wherein the hardware device is programmed based on the tiling.”
Dependent claim 4 recites additional observation, evaluation, and judgement “performing an additional dynamic programming process to identify an updated subset of sequences from the plurality of sequences, wherein the updated linearized metric is based on the updated subset of sequences” as well as additional instructions to apply the judicial exception using generic computer components “wherein the hardware device is programmed based on the updated subset of sequences”.
Dependent claim 5 recites additional observation, evaluation, and judgement “determining an initial weight for the performance parameter; and computing an initial score for the subset of sequences based on the initial linearized metric and the initial weight”.
Dependent claim 6 recites additional observation, evaluation, and judgement “computing a weighted sum of a plurality of linearized metrics including the initial linearized metric based on a plurality of initial weights including the initial weight, wherein the initial score is based on the weighted sum”.
Dependent claim 7 recites additional observation, evaluation, and judgement “determining an updated weight for the performance parameter based on the initial score; and computing an updated score for the subset of sequences based on the updated weight and the updated linearized metric, wherein the subset of sequences is selected based on the updated score”.
Dependent claim 8 recites additional observation, evaluation, and judgement “computing scores for a plurality of subsets of sequences, respectively, wherein the subset of sequences is selected from the plurality of subsets of sequences based on the scores”.
Dependent claim 9 recites additional observation, evaluation, and judgement “computing a non-linear term based on the initial weight and the non-linear constraint, wherein the initial score is based on the non-linear term.”
Dependent claim 10 recites additional observation, evaluation, and judgement “compiling instructions for performing the algorithm based on the updated linearized metric, wherein the hardware device is programmed based on the compiled instructions”.
Dependent claim 11 recites additional observation, evaluation, and judgement “modifying a design for the hardware device based on the updated linearized metric, wherein the hardware device is programmed based on the modified design”.
Dependent claim 12 recites additional observation, evaluation, and judgement “modifying an algorithm for the computational graph based on the updated linearized metric, wherein the hardware device is programmed based on the modified algorithm”.
Therefore, when the additional elements are considered separately and in combination, they do not amount to significantly more than the judicial exception. Accordingly, claims 1-20 are rejected under 35 U.S.C. § 101.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-6, 8-18, and 20 are rejected under 35 U.S.C. § 102(a)(1) as being anticipated by Gondimalla (“Occam: Optimal Data Reuse for Convolutional Neural Networks”, 2022).
Regarding claim 1, Gondimalla teaches A method comprising: obtaining an algorithm for a computational graph;([p. 2] "we describe CNNs in terms of access patterns" [p. 14 §4] "We implement Occam in CUDA, which is used prevalently to implement CNNs." CNN interpreted as an algorithm for a computational graph in light of the instant specification at least at ([¶0059] "a neural network computational graph (e.g., neural network graph 500-a, neural network graph 500-b, etc.) is a graphical representation of operations and data flow in a neural network (e.g., such as in an ANN, CNN, etc.).").)
computing an initial linearized metric for a performance parameter for performing the algorithm using a hardware device;([p. 2] "CNNs are amenable to prefetching and multithreading, the problem is memory bandwidth" [p. 11] "If single-layer spans (SPAN(i,i + 1)) fit in the cache, we initialize OP[i,i + 1], 0 ≤ i < l" [p. 14] "we carefully scale the simulated GPU to match a slice of the above system that performs a single inference out of the 32-image mini-batch. This scaling reduces the compute bandwidth" [p. 11] "X, the optimal number of transfers" base span/initial number of off-chip feature map transfers OP[i,j].X interpreted as initial linearized metric for a performance parameter (total off-chip traffic (the data moved between the hardware chip to off-chip hardware memory)) for performing the algorithm using a hardware device [p. 1] “Occam’s partitions reside on different chips, forming a pipeline so that a partition’s filters and dependence closure remain on-chip as different images pass through (i.e., each partition incurs off-chip traffic only for its inputs and outputs).”)
computing an updated linearized metric based on the initial linearized metric and a non-linear constraint on the performance parameter; and ([p. 11] "The above table update step accurately tracks the number of off-chip transfers (i.e., X) of each of the two resulting spans" [p. 11] "If single-layer spans (SPAN(i,i + 1)) fit in the cache, we initialize OP[i,i + 1], 0 ≤ i < l […] If even a single layer does not fit, then chip-residence for filters is not possible and weights must be included in the transfers " See also Eqn. 5 which shows the linearized metric update. If span fits in cache interpreted as non-linear constraint on the performance parameter. As mentioned above OP[i,j].X interpreted as initial linearized metric. Gondimalla explicitly discloses that if span does not fit in cache (the non-linear constraint) then OP[i,j].X is updated (as shown in FIG. 5).)
programming the hardware device to implement the computational graph based on the updated linearized metric. ([p. 12] "The DP algorithm is used to optimize partitions offline (like compiler optimizations) […] Occam captures the enormous inter-image filter reuse by placing each partition – its filters and dependence closure – in a separate chip (e.g., a GPU, TPU, or FPGA)").
Regarding claim 2, Gondimalla teaches The method of claim 1, further comprising: identifying a plurality of layers of the computational graph; (Gondimalla [p. 2] "the large number of layers (e.g., 34 in ResNet), and filters per layers […] result in heavy compute and large intermediate data" See also FIG. 1)
grouping the plurality of layers into a plurality of sequences; and (Gondimalla [p. 9] "We define a SPAN (i, j) as the convolution computations starting with Li as the input and ending with Lj as the output" [p. 10] "the layers must be partitioned into spans such that (1) each SPAN (i, j) satisfies the capacity constraint that the dependence closure" SPAN interpreted as sequence of layers)
performing a dynamic programming process to identify a subset of sequences from the plurality of sequences, wherein the initial linearized metric is based on the subset of sequences, and wherein the dynamic programming process is based on a linearity constraint on the performance parameter. (Gondimalla [p. 12] "The DP algorithm is used to optimize partitions offline (like compiler optimizations) […] Occam captures the enormous inter-image filter reuse by placing each partition – its filters and dependence closure – in a separate chip (e.g., a GPU, TPU, or FPGA)" DP interpreted as dynamic programming.).
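For clarity of the record, the examiner's mapping of the cited DP over spans can be illustrated with a minimal sketch (function names and the simplified initialization are hypothetical, not from Gondimalla's source): the table entry OP[i][j] tracks the fewest off-chip transfers for SPAN(i, j), initialized from single-layer spans and combined at the best split point per Eqn. 5, subject to the cache-capacity constraint.

```python
def optimal_partitions(num_layers, span_fits_in_cache, base_transfers):
    """Sketch of the interval DP cited from Gondimalla.

    span_fits_in_cache(i, j) -> bool models the capacity constraint;
    base_transfers(i, j) -> int models the transfer count X when
    SPAN(i, j) is chip-resident. Both are hypothetical callables.
    """
    INF = float("inf")
    # OP[i][j] = fewest off-chip transfers for layers i..j-1 (SPAN(i, j))
    OP = [[INF] * (num_layers + 1) for _ in range(num_layers + 1)]
    for i in range(num_layers):
        # Initialize single-layer spans OP[i, i+1] (simplified: assumes
        # every single layer fits; Gondimalla adds weights when it does not)
        OP[i][i + 1] = base_transfers(i, i + 1)
    for length in range(2, num_layers + 1):
        for i in range(num_layers - length + 1):
            j = i + length
            if span_fits_in_cache(i, j):
                OP[i][j] = base_transfers(i, j)  # whole span chip-resident
            for p in range(i + 1, j):
                # Eqn. 5: pick the split point p yielding the fewest
                # transfers in the two resulting sub-spans
                OP[i][j] = min(OP[i][j], OP[i][p] + OP[p][j])
    return OP[0][num_layers]
```

With three layers where only single-layer spans fit, the optimum is the sum of the per-layer transfers; when every span fits, a single chip-resident span is optimal.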
Regarding claim 3, Gondimalla teaches The method of claim 2, further comprising: identifying a tiling of the plurality of layers, wherein the hardware device is programmed based on the tiling. (Gondimalla [p. 14] "We implement our DP algorithm as a stand-alone JavaScript application that takes as input the network parameters and produces the optimal partitions and tile dimensions" [p. 15] "we use Occam’s optimal partition algorithm for Layer Fusion to compute the partitions with the largest square tile whose dependence closure for a given partition would fit in the cache (a different tile size for each partition). Because the tiles are suboptimal even though the partitions are optimal, Layer Fusion does not capture full reuse. We verified the functional correctness of these schemes by comparing against an unmodified ConvNet" See also FIG. 6).
Regarding claim 4, Gondimalla teaches The method of claim 2, further comprising: performing an additional dynamic programming process to identify an updated subset of sequences from the plurality of sequences, wherein the updated linearized metric is based on the updated subset of sequences, wherein the hardware device is programmed based on the updated subset of sequences. (Gondimalla [p. 16 §5.1] "In Table 2, we present the optimal partitions and tile dimensions for our networks for 3-MB onchip memory. While the tile dimensions for Occam and Layer Fusion are different, we use Occam’s partition algorithm to derive partitions for Layer Fusion (see Section 4). The partitions are shown using the start layer for each" See Table 2 which shows a plurality of additional DP processes, each corresponding to a different model).
Regarding claim 5, Gondimalla teaches The method of claim 4, further comprising: determining an initial weight for the performance parameter; and (Gondimalla [p. 10] "weights (i.e., Σj−1 k=i |Wk |) of the span must fit in the cache" Wk interpreted as initial weight for the performance parameter)
computing an initial score for the subset of sequences based on the initial linearized metric and the initial weight. (Gondimalla [p. 10] "(|DC(pi ,pi+1) | + Σpi+1−1 k=pi |Wk |)" Interpreted as initial score for the subset of sequences (defined by pi, pi+1) based on the initial linearized metric and the initial weight).
Regarding claim 6, Gondimalla teaches The method of claim 5, further comprising: computing a weighted sum of a plurality of linearized metrics including the initial linearized metric based on a plurality of initial weights including the initial weight, wherein the initial score is based on the weighted sum. (Gondimalla [p. 11] "OP[i,j].X = OP[i,popt].X +OP[popt,j].X" See Eqn. 5 and Eqn. 7).
Regarding claim 8, Gondimalla teaches The method of claim 5, further comprising: computing scores for a plurality of subsets of sequences, respectively, wherein the subset of sequences is selected from the plurality of subsets of sequences based on the scores. (Gondimalla [p. 2] "CNNs are amenable to prefetching and multithreading, the problem is memory bandwidth" [p. 11] "If single-layer spans (SPAN(i,i + 1)) fit in the cache, we initialize OP[i,i + 1], 0 ≤ i < l" [p. 14] "we carefully scale the simulated GPU to match a slice of the above system that performs a single inference out of the 32-image mini-batch. This scaling reduces the compute bandwidth" [p. 11] "X, the optimal number of transfers" [p. 11] "we consider every possible partition of SPAN (i, j) and pick the partition point p that yields the fewest transfers in the two resulting sub-spans, SPAN (i,p) and SPAN (p, j)." X of OP[i,j].X interpreted as score for subset SPAN(i,j) which is selected based on X.).
Regarding claim 9, Gondimalla teaches The method of claim 5, further comprising: computing a non-linear term based on the initial weight and the non-linear constraint, wherein the initial score is based on the non-linear term.(Gondimalla [p. 9] "DC(i,j) defines the dependence closure of one row-plane of the output feature map in Lj extending back to the feature map of layer Li").
Regarding claim 10, Gondimalla teaches The method of claim 1, further comprising: compiling instructions for performing the algorithm based on the updated linearized metric, wherein the hardware device is programmed based on the compiled instructions. (Gondimalla [p. 12] "The DP algorithm is used to optimize partitions offline (like compiler optimizations) […] Occam captures the enormous inter-image filter reuse by placing each partition – its filters and dependence closure – in a separate chip (e.g., a GPU, TPU, or FPGA)").
Regarding claim 11, Gondimalla teaches The method of claim 1, further comprising: modifying a design for the hardware device based on the updated linearized metric, wherein the hardware device is programmed based on the modified design.(Gondimalla [p. 14] "We implement our DP algorithm as a stand-alone JavaScript application that takes as input the network parameters and produces the optimal partitions and tile dimensions. We then feed these outputs to ConvNet." See FIG. 5).
Regarding claim 12, Gondimalla teaches The method of claim 1, further comprising: modifying an algorithm for the computational graph based on the updated linearized metric, wherein the hardware device is programmed based on the modified algorithm.(Gondimalla [p. 1] "we propose a dynamic programming algorithm that optimally partitions a given CNN to guarantee the least off-chip traffic at the partition boundaries for a given on-chip capacity" Partitioning the CNN using dynamic programming interpreted as synonymous with modifying an algorithm (the CNN) for the computational graph based on the updated linearized metric, wherein the hardware device is programmed based on the modified algorithm.).
Regarding claim 13, claim 13 is directed towards an apparatus for performing the method of claim 1. Therefore, the rejection applied to claim 1 also applies to claim 13.
Claim 13 also recites additional elements a processor and a memory storing instructions and in electronic communication with the processor, the processor being configured to execute the instructions to (Gondimalla [p. 3] "Finally, the input-stationary approach [11] holds on-chip input/output maps and fetches filters from off-chip, ignoring filter reuse across images (e.g., TPU). Occam’s partitions reside on different chips (e.g., GPU, TPU, or FPGA) available in a multi-accelerator environment such as data centers, forming a pipeline so that a partition’s filters and dependence closure remain on-chip as different images pass through (i.e., each partition incurs off-chip traffic only for its inputs and outputs)").
Similarly, regarding claims 14-18, claims 14-18 are directed towards an apparatus for performing the method of claims 2-6, therefore, the rejections applied to claims 2-6 also apply to claims 14-18.
Regarding claim 20, claim 20 is directed towards a non-transitory computer readable medium storing code, the code comprising instructions executable by a processor for performing the method of claim 1. Therefore, the rejection applied to claim 1 also applies to claim 20.
A non-transitory computer readable medium storing code, the code comprising instructions executable by a processor to (Gondimalla [p. 3] "Finally, the input-stationary approach [11] holds on-chip input/output maps and fetches filters from off-chip, ignoring filter reuse across images (e.g., TPU). Occam’s partitions reside on different chips (e.g., GPU, TPU, or FPGA) available in a multi-accelerator environment such as data centers, forming a pipeline so that a partition’s filters and dependence closure remain on-chip as different images pass through (i.e., each partition incurs off-chip traffic only for its inputs and outputs)").
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 7 and 19 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Gondimalla and Hu (“Learning Anytime Predictions in Neural Networks via Adaptive Loss Balancing”, 2019).
Regarding claim 7, Gondimalla teaches The method of claim 5.
However, Gondimalla does not explicitly teach determining an updated weight for the performance parameter based on the initial score; and computing an updated score for the subset of sequences based on the updated weight and the updated linearized metric, wherein the subset of sequences is selected based on the updated score.
Hu, in the same field of endeavor, teaches determining an updated weight for the performance parameter based on the initial score; and computing an updated score for the subset of sequences based on the updated weight and the updated linearized metric, wherein the subset of sequences is selected based on the updated score. ([p. 4] "we form the following joint optimization over θ and Bi for general losses without probability models:" See Eqn. 3, where li is a performance parameter, the initial score is L for i=0, Bi is the updated weight for i+1, and the updated score is L for i+1.)
Gondimalla and Hu are both directed towards neural network optimization and are therefore reasonably pertinent analogous art. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the AdaLoss function of Hu with the Occam system of Gondimalla by equating B with W and l with X. Hu provides additional motivation for the combination ([p. 2] “we find the joint optimization equivalent to optimizing the geometric mean of the expected training losses, an objective that treats the relative improvement of each loss equally. Empirically, we show on multiple models and visual recognition data-sets that the proposed adaptive weights outperform natural, non-adaptive weighting schemes”).
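For illustration of the mapping above, adaptive loss weighting in the spirit of the cited AdaLoss can be sketched as follows (function names, the inverse-weighting rule, and the epsilon guard are the examiner's hypothetical simplification, not Hu's exact formulation): each weight Bi is updated from a running estimate of its loss li, so the weighted score treats the relative improvement of each loss equally.

```python
def update_weights(running_losses, eps=1e-8):
    # Updated weight for each performance parameter based on its score:
    # a smaller running loss receives a larger relative weight, so no
    # single term dominates the joint objective (inverse weighting).
    return [1.0 / (loss + eps) for loss in running_losses]

def weighted_score(losses, weights):
    # Updated score: weighted sum of the current losses, analogous to
    # combining linearized metrics under per-parameter weights.
    return sum(w * loss for w, loss in zip(weights, losses))
```

Under this sketch, doubling a term's running loss halves its weight, keeping each weighted term near the same scale.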
Regarding claim 19, claim 19 is directed towards an apparatus for performing the method of claim 7. Therefore, the rejection applied to claim 7 also applies to claim 19.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Harlap (“PipeDream: Fast and Efficient Pipeline Parallel DNN Training”, 2018) is directed towards a hardware aware dynamic programming process for neural networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SIDNEY VINCENT BOSTWICK/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124