DETAILED ACTION
This non-final Office action is responsive to Application No. 18/105,396, filed 03 February 2023.
Claims 1-20 are currently pending and under examination; claims 1, 13, and 19 are independent.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55. The application has an effective filing date of 10/06/2022.
Information Disclosure Statement
As required by MPEP 609(c), the applicant's submission of the Information Disclosure Statement dated 02/03/2023 is acknowledged by the examiner, and the cited references have been considered in the examination of the claims now pending. As required by MPEP 609 C(2), a copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant Office action.
Specification
The specification is objected to because the title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed, see MPEP 606.01. The following title is suggested: Method and Device with Checkpointing of Neural Networks.
Claim Objections
Claims 10 and 13 are objected to because of the following informalities:
Claim 10 recites "operation of a another"; the grammar should read "operation of another".
Independent claim 13 recites "an ANN model" without first introducing the acronym as "artificial neural network"; claim 1, which introduces the acronym, is proper in this regard.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 4-6 and 15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. In determining whether the claims are subject matter eligible, the examiner applies guidance set forth under MPEP 2106.
Claims 4-6 depend from claim 1, which does not itself recite an abstract idea. However, claims 4-6 further introduce "determining" limitations that can be performed as mental determinations, i.e., the abstract idea. The same applies to claim 15, which depends from claim 13 in a similar manner and recites a limitation corresponding to that of claim 4.
Under Step 1, each of the claims falls within one of the four statutory categories: claims 4-6 recite a process/method and claim 15 recites an electronic device/machine, so the analysis proceeds per MPEP 2106.03.
Under Step 2A, prong one: an abstract idea is recited under the broadest reasonable interpretation. In particular, the claims recite the following limitations, which can be mental processes under MPEP 2106.04(a)(2):
Claim 4: “determining whether a performing of a checkpointing of a result of performing an operation iteration is completed at a first time point at which a weight update operation of a subsequent operation iteration starts”
Claim 5: “determination that the performing of the checkpointing of the result of performing the operation iteration is not completed at the first time point”
Claim 6: “determining a storage path through the current storage location and the checkpointing based on a target location for storing the information about the state”
Claim 15: “determine whether a performing of a checkpointing of a result of performing an operation iteration is completed at a first time point at which a weight update operation of a subsequent operation iteration starts”
The above-identified limitations further describe checkpointing wherein the recited determinations become the focus of the claim. Such determinations do not preclude mental performance and may be carried out by a human, for example as a judgment or an evaluation of observations against an ad hoc rule. Accordingly, these claims are found to recite mental processes as the abstract idea.
Under Step 2A, prong two: the additional elements do not integrate the judicial exception into a practical application. In particular, the additional elements trace back, through the claims' dependency, to the learned ANN model performing operations for information to be stored, as well as to the processor implementation. The learned ANN model falls under MPEP 2106.05(h), generally linking the use of the judicial exception to a particular technological environment or field of use. Further, storing information is insignificant extra-solution activity under MPEP 2106.05(g). Finally, the processor/device implementation amounts to mere use of a computer to perform the abstract idea under MPEP 2106.05(f). These additional elements are recited at a high level of generality such that the claim as a whole is no more than a drafting effort designed to monopolize the exception, with a severe risk of pre-emption. Dependent claims 5 and 6 include further additional elements: claim 6 embellishes storage and claim 5 includes stopping the weight update operation. These fall under MPEP 2106.05(g) as insignificant extra-solution activity, similar to that already discussed. As noted under MPEP 2106.04(a)(2), "a claim that requires a computer may still recite a mental process." Accordingly, the claims remain directed to the abstract idea, and the additional elements do not integrate the judicial exception into a practical application.
Under Step 2B: the additional elements do not amount to significantly more. As already discussed, the additional elements are identified under MPEP 2106.05 and do not reveal an inventive concept. In particular, the learned ANN model generally links the abstract idea to a particular technological environment or field of use under MPEP 2106.05(h). The specification gives non-limiting examples of the ANN model, such as a CNN [0045]-[0046], or another model that can be off-the-shelf rather than newly developed; it merely serves as the object to be checkpointed. Further, the limitation of storing information is a well-understood, routine, and conventional (WURC) activity under MPEP 2106.05(d)(II)(iv), as identified by the courts. Finally, the processor/device implementation does not qualify as a particular machine under MPEP 2106.05(b). Where the claim language provides only a result-oriented solution, with insufficient detail for how a computer accomplishes it, the claims do not contain an inventive concept. Taken alone, the additional elements do not amount to significantly more than the judicial exception. Viewing the limitations as an ordered combination does not elevate the claim as a whole to eligibility: there is no indication that the combination of elements improves the functioning of a computer or any other technology, and their collective functions merely provide a conventional computer implementation. For at least the foregoing reasons, claims 4-6 and 15 are found to be patent ineligible.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-7, 12-15 and 18-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by:
Mohan et al., “CheckFreq: Frequent, Fine-Grained DNN Checkpointing,” hereinafter “Mohan,” together with supplemental slides from the accompanying video presentation.
With respect to claim 1, Mohan teaches:
A processor-implemented method with checkpointing {Mohan Fig 3 introduced [P.203 Last¶] “We present CheckFreq, a fine-grained checkpointing framework for DNN training” and uses GPU-CPU processors of Server configuration [P.210 Sect5.1]}, the method comprising:
performing an operation for learning of an artificial neural network (ANN) model
{Instant specification [0045]-[0046] “ANN (e.g., a CNN 20)” encompasses deep/multilayer networks. Mohan Fig 3 DNN training job, [P.207 Last2¶] “DNN learning… learnable model”; learning/training is the claimed operation and may use SGD; known models are listed at [P.210 Sect5.1]: ResNet, VGG16, BERT}; and
performing a checkpointing to store information about a state of the ANN model, simultaneously with performing the operation for the learning of the ANN model {Mohan discloses [P.207 Rt.Col] checkpointing includes a snapshot() phase where “model state is captured in memory, so that it can be written out to storage asynchronously”, continued at [P.208 ¶3] “asynchronous checkpointing”; the asynchrony allows simultaneous operation, as with [P.211 Last2¶] “parallel training” or concurrent jobs. The operations are pipelined as shown in Fig 4(c)}.
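Purely as a technical illustration of the snapshot()/persist() split cited above (not itself evidence of record): a minimal Python sketch, with a hypothetical toy training step and file names, in which persistence runs on a background thread so that checkpointing overlaps the following iteration.

```python
import copy
import pickle
import threading

def snapshot(model_state):
    # snapshot() phase: capture the model state in memory so training can
    # resume immediately (a deepcopy stands in for Mohan's GPU-to-CPU copy).
    return copy.deepcopy(model_state)

def persist(snap, path):
    # persist() phase: write the captured state to storage asynchronously.
    with open(path, "wb") as f:
        pickle.dump(snap, f)

model_state = {"weights": [0.0] * 4}
writers = []
for i in range(3):
    # Hypothetical toy training iteration.
    model_state["weights"] = [w + 0.1 for w in model_state["weights"]]
    snap = snapshot(model_state)
    t = threading.Thread(target=persist, args=(snap, f"ckpt_{i}.pkl"))
    t.start()          # persistence overlaps the next training iteration
    writers.append(t)
for t in writers:
    t.join()
```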
With respect to claim 2, Mohan teaches the method of claim 1, wherein
the operation for the learning of the ANN model comprises a plurality of operation iterations {Mohan Fig 3 shows Iterator for DNN training, [P.206 Sect4.2] “perform n iterations per epoch” and/or [P.209 Sect4.3.2] “every k iterations (called the checkpointing frequency)”}, and
each of the plurality of operation iterations comprises a forward propagation operation, a backward propagation operation, and a weight update operation. {Mohan [Slide 3], shown below, illustrates the forward pass, backward pass, and weight update of each iteration, where an epoch is a complete pass for DNN training; this is introduced at [P.205 ¶2] and reflected in Fig 4(c), where the legend identifies the forward, backward, and weight update blocks by color}
[media_image1.png, greyscale: reproduction of Mohan Slide 3, showing the forward pass, backward pass, and weight update within each training iteration]
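For orientation only, the three phases identified in the slide can be expressed as a minimal one-parameter SGD loop; this sketch is purely illustrative and is not drawn from Mohan.

```python
# Minimal example: fit w so that w * x approximates y.
w = 0.0
lr = 0.1
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

for epoch in range(2):                 # an epoch is a complete pass over the data
    for x, y in data:                  # each sample drives one operation iteration
        pred = w * x                   # forward propagation operation
        grad = 2.0 * (pred - y) * x    # backward propagation: d/dw of (pred - y)**2
        w -= lr * grad                 # weight update operation
```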
With respect to claim 3, Mohan teaches the method of claim 1, wherein the performing the checkpointing comprises
storing information about a state of the ANN model for a result of performing an operation iteration simultaneously with performing either one or both of a forward propagation operation and a backward propagation operation of a subsequent operation iteration {Mohan Fig 3 Storage, described at [P.207 Sect4.3.1 Rt.Col] “checkpointing… model state is captured in memory”, and Fig 3 Iterator, e.g. [P.208 ¶1] “iteration i+1”, the +1 denoting the subsequent iteration; iterations perform forward/backward passes [Slide 3], Fig 4(c). The simultaneous operation is performed asynchronously [P.207 Rt.Col – P.208 Left.Col]}.
With respect to claim 4, Mohan teaches the method of claim 1, wherein the performing of the checkpointing comprises
determining whether a performing of a checkpointing of a result of performing an operation iteration is completed at a first time point at which a weight update operation of a subsequent operation iteration starts {Mohan [P.209 Sect4.3.2] “determine the checkpointing frequency”, the frequency being defined in iterations [P.209 Sect4.3.2], optionally via Algorithm 1. Further, [P.210 ¶2-3] CheckFreq maintains a “completed checkpoint” and “determines the initial checkpoint”, as well as [P.208 ¶3] a “subsequent checkpoint”, where the weight update operation is shown in Fig 4(c) and the iterator in Fig 5}.
With respect to claim 5, Mohan teaches the method of claim 4, wherein the performing of the checkpointing comprises
stopping the weight update operation of the subsequent operation iteration based on a determination that the performing of the checkpointing of the result of performing the operation iteration is not completed at the first time point {Mohan [P.208 ¶1,3] “If snapshot() does not complete by then, then iteration i+1 waits… If the persist() has not completed, then the compute process waits”; waiting/pausing is stopping, the compute here being the weight update of training, stopped upon an if/then determination that the checkpointing is not complete; see the checkpoint ‘stall’ of Fig 4(c)}.
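Purely as a technical illustration of the claim 4/claim 5 mapping (not evidence of record): a minimal Python sketch of the determination-and-stall behavior, with a hypothetical timed snapshot standing in for Mohan's snapshot().

```python
import threading
import time

snapshot_done = threading.Event()

def snapshot():
    # Hypothetical in-memory capture of model state that takes some time.
    time.sleep(0.05)
    snapshot_done.set()

threading.Thread(target=snapshot).start()

# ... forward and backward passes of iteration i+1 would run here ...

# First time point: the weight update of iteration i+1 is about to start.
if not snapshot_done.is_set():   # the "determining" of claims 4/15
    snapshot_done.wait()         # the stop/stall of claim 5 (Mohan Fig 4(c))

# The weight update of the subsequent iteration proceeds only now.
```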
With respect to claim 6, Mohan teaches the method of claim 1, wherein the performing of the checkpointing comprises:
obtaining a current storage location of the information about the state of the ANN model {Mohan [P.207 Sect4.3.1] “model state resides in GPU memory”}; and
determining a storage path through the current storage location and the checkpointing based on a target location for storing the information about the state of the ANN model {Mohan [P.207 Sect4.3.1] “snapshot() involves copying the model parameters from GPU to CPU memory… critical path” similar at [P.208 ¶2] “model state from GPU to CPU… GPU to CPU copy in the critical path” thus the current storage location is GPU memory and the target location for storing is CPU memory}.
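As a technical illustration of the mapped storage path (not evidence of record): a minimal PyTorch sketch, assuming torch is installed and falling back to CPU where no GPU is present; the tensor and file names are hypothetical.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
params = {"layer1.weight": torch.randn(4, 4, device=device)}

# Current storage location: device (GPU) memory.
# Target location: a checkpoint file, reached through CPU memory, so the
# storage path is GPU memory -> CPU memory -> disk.
cpu_copy = {name: t.detach().to("cpu") for name, t in params.items()}
torch.save(cpu_copy, "ckpt.pt")
```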
With respect to claim 7, Mohan teaches the method of claim 1, wherein the information about the state of the ANN model comprises
any one or any combination of a parameter and an optimizer of the ANN model {Mohan [P.213 ¶1] “parameters and optimizer state” parameters include weights Fig 4, [P.205 ¶2-3], [P.207 Sect4.3.1]}.
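As a technical illustration only: in PyTorch terms, the combination of a parameter and an optimizer state can be captured as below; the model and file name are hypothetical.

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# The checkpointed state combines the model parameters and the optimizer
# state (e.g., SGD momentum buffers).
state = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}
torch.save(state, "state.pt")
```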
With respect to claim 12, Mohan teaches the method of claim 1, further comprising
a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, configure the processor to perform the method of claim 1 {Mohan [P.210 Sect4.4 ¶1] “We implement CheckFreq as a pluggable module for PyTorch” and [P.210 Sect5.1 ¶2, Tbl.1] “We use two ML server SKUs; each with 24 CPU cores, 500GB DRAM, and 8 GPUs…”; the hardware and software implementation and setup are described, and a GitHub source-code link is provided [Slides 9,44]}.
With respect to claim 13, the rejection of claim 1 is incorporated. The difference in scope is that claim 13 recites an electronic device comprising a processor to perform the limitations of method claim 1. Mohan discloses [P.210 Sect5.1] server configurations with GPU and CPU processors. The remainder of the claim is rejected under the same rationale as claim 1.
With respect to claim 14, Mohan teaches the electronic device of claim 13 and further teaches the limitation of claim 3. Therefore, the rejection of claim 3 is applied to claim 14.
With respect to claim 15, Mohan teaches the electronic device of claim 13 and further teaches the limitation of claim 4. Therefore, the rejection of claim 4 is applied to claim 15.
With respect to claim 18, Mohan teaches the electronic device of claim 13 and further teaches the limitation of claim 12. Therefore, the rejection of claim 12 is applied to claim 18.
With respect to claim 19, the rejection of claims 1-3 is incorporated. Mohan teaches a processor-implemented method similar to claim 1 and further teaches limitations similar to those of claims 2-3. Therefore, the rejection of claims 1-3 is applied to claim 19.
With respect to claim 20, Mohan teaches the method of claim 19, wherein the performing of the checkpointing operation comprises
ending the checkpointing operation prior to a start of a weight update operation of the second ANN learning operation iteration {Mohan [P.209] Alg.1 “end if” of the checkpointing-frequency algorithm, the frequency being defined in iterations [P.209 ¶3], such that [P.210 ¶3] the process “waits on the ongoing snapshot() to ensure that a copy of the model state is completed before it is updated by the next iteration”; similarly at [P.208 ¶1] “weight update of iteration i+1 …iteration i+1 waits until the ongoing snapshot successfully completes as shown in Figure 4c”}.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 8-9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Mohan in view of
Liao et al., “Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU” hereinafter Liao (arXiv: 2209.02478v1).
With respect to claim 8, Mohan teaches the method of claim 1, wherein the performing of the checkpointing comprises the limitation below. Mohan discloses ResNet and VGG-16, which are known multilayer CNN models, but does not fairly detail checkpointing at the level of the layers, which is met by Liao:
performing the checkpointing of a layer, in which a weight update of an operation iteration is completed, in the unit of layer {Liao Fig 12 “checkpointed layer” or Fig 6-right checkpoint plan for layers [P.4 Sect4.1 ¶2] “layer in a given DL model” e.g. convolution, pooling and linear operators [P.5 Last2¶] to “update the model parameters” [P.3 ¶1] Fig 1 training}.
Liao is directed to checkpointing for training models and is thus analogous art. A person having ordinary skill in the art would have considered it obvious, prior to the effective filing date, to perform checkpointing of layers per Liao in combination with Mohan's multilayer models (ResNet, VGG16), with the motivation of a “checkpointing plan based on per-layer memory prediction and applies it to training progress on the fly” [Abst]; see the key contributions [P.2 Sect.1 Last2¶].
With respect to claim 9, the combination of Mohan and Liao teaches the method of claim 1, wherein the performing of the checkpointing comprises
checkpointing in a unit of layer of the ANN model {Liao Fig 12 “checkpointed layer” Fig 6-right checkpoint plan for layers being [P.4 Sect4.1 ¶2] “layer in a given DL model” e.g. convolution, pooling and linear operators [P.5 Last2¶]. An exemplary model is BERT-based XLNet [P.8 Sect6.1 ¶2] Table 1 and/or ResNet Fig 10-b}.
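As a technical illustration of checkpointing in the unit of layer (not evidence of record, and not Liao's implementation): a minimal PyTorch sketch with a hypothetical multilayer model, persisting each parameterized layer separately.

```python
import torch

# Hypothetical multilayer model standing in for ResNet/VGG16/XLNet.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)

# Checkpointing in the unit of layer: each layer that carries trainable
# weights is persisted to its own file (layers without parameters, such
# as ReLU, are skipped).
for idx, layer in enumerate(model):
    if any(p.requires_grad for p in layer.parameters()):
        torch.save(layer.state_dict(), f"layer_{idx}.pt")
```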
With respect to claim 16, Mohan teaches the electronic device of claim 13, and the further combination with Liao teaches the limitation of claim 8. Therefore, the rejection of claim 8, with equal motivation, is applied to claim 16.
Claims 10-11 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mohan in view of Narayanan et al “Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM” hereinafter Narayanan.
With respect to claim 10, Mohan teaches the method of claim 1, wherein the performing of the operation for the learning of the ANN model comprises
while performing a backward propagation operation of a layer of an operation iteration, performing a weight update operation of a another layer of the operation simultaneously {Narayanan [Sect2.2 ¶2] “weight updates in the backward pass” with [Sect3.2 ¶2] “backward pass for each layer”; illustratively, Figs 2-4 transformer layers compute weight parameters with pipeline model parallelism, the backward pass shown in green}.
Narayanan is directed to training machine learning models with checkpointing and is thus analogous art. A person having ordinary skill in the art would have considered it obvious, prior to the effective filing date, to employ the teaching of Narayanan to arrive at the invention as claimed, as the application of known techniques to known methods ready for improvement to yield predictable results, and/or with the motivation that it “improves efficiency” [P.2 ¶1] and/or serves “to achieve high aggregate throughput (502 petaFLOP/s) while training large models with a trillion parameters. This facilitates end-to-end training in reasonable times” [Sect.7 ¶1].
With respect to claim 11, the combination of Mohan and Narayanan teaches the method of claim 10, wherein the performing of the checkpointing comprises,
while performing the backward propagation operation of the layer of the operation iteration, performing a checkpointing of a another layer of the operation iteration simultaneously {Narayanan [Sect3.5 ¶2] “checkpointing every 1 or 2 transformer layers”, shown in the Fig 2 layers and the Fig 3-4 backward pass; the checkpoints are loaded and saved per [Sect5.10]}. The motivation for the combination is applied as in claim 10.
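Purely as a technical illustration of the overlap mapped in claims 10-11 (not evidence of record, and not Narayanan's pipeline implementation): a minimal Python sketch with a hypothetical two-parameter model, in which the weight update of one layer runs on a worker thread while the backward computation of another layer is still in flight; the pre-update value is saved first, as a pipelined implementation would save activations/weights for backward. The same worker could equally snapshot the updated layer, per claim 11.

```python
import threading

# Toy two-layer model, scalars for clarity: y = w2 * (w1 * x).
w = {"w1": 0.5, "w2": 1.5}
lr, x, target = 0.1, 1.0, 2.0

h = w["w1"] * x                  # forward, layer 1
y = w["w2"] * h                  # forward, layer 2
dy = 2.0 * (y - target)          # gradient of the loss (y - target)**2

# Backward for layer 2 finishes first; its weight update is launched on a
# worker while the backward pass for layer 1 continues (claim 10).
grad_w2 = dy * h
w2_saved = w["w2"]               # value saved for backward, pre-update

def update(name, grad):
    w[name] -= lr * grad         # this worker could also checkpoint the layer

worker = threading.Thread(target=update, args=("w2", grad_w2))
worker.start()                   # layer-2 update overlaps layer-1 backward

dh = dy * w2_saved               # backward for layer 1 uses the saved value
grad_w1 = dh * x
w["w1"] -= lr * grad_w1
worker.join()
```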
With respect to claim 17, Mohan teaches the electronic device of claim 13, and the further combination with Narayanan teaches the limitation of claim 10. Therefore, the rejection of claim 10, with equal motivation, is applied to claim 17.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chase P Hinckley whose telephone number is (571)272-7935. The examiner can normally be reached M-F 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda M. Huang can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHASE P. HINCKLEY/Examiner, Art Unit 2124