Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
The objections to Drawings and Specification are withdrawn based on the amendment filed on 12/16/2025.
Applicant’s arguments in light of the amendment filed on 12/16/2025, with respect to the rejection under 35 U.S.C. 112, have been fully considered and are persuasive. The 35 U.S.C. 112 rejection has been withdrawn.
Applicant’s arguments in light of the amendment filed on 12/16/2025, with respect to the prior art rejections, have been fully considered and are persuasive. The 35 U.S.C. 102 rejection and 35 U.S.C. 103 rejection have been withdrawn.
Applicant’s arguments (starting on pg. 14 of the Remarks filed on 12/16/2025) with regard to the newly added claims 44-49 are not persuasive. Specifically, Examiner disagrees that claims 44 and 47 are identical in scope to allowable original claim 11 per the previous Office action. Claim 11 from the original claim set recites, inter alia, three distinct calculations, explicit application of a lossy compression, and a gradual increase of the compression rate of the data. Applicant has broadened the scope in claims 44 and 47, where these aspects of original claim 11 are no longer required. Applicant further attempts to bring some aspects of the original claims into dependent claims 45, 46, 48 and 49. However, differences from original claim 11 remain; for instance, there is no clear correlation between the first, second, and third processes recited in the original claims and the processes recited in these new claims. Thus, new claims 45, 46, 48 and 49 are also of different scope than original claim 11.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 44, 45, 47 and 48 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “POSTER: A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression” by Jin et al. (hereinafter Jin).
Per claim 44, Jin discloses a method for processing data (Abstract…a method for processing training data through a neural network, “we propose a novel memory-driven high performance CNN training framework that leverages error-bounded lossy compression to significantly reduce the memory requirement for training”), the method comprising:
executing, by at least one processor (Section 3…“Our experiment platform is the TACC Longhorn system, of which each GPU node is equipped with 4 Nvidia Tesla V100 GPUs per node”), a forward process of a neural network to generate intermediate data (Abstract…“the intermediate activation data must be saved in the memory during forward propagation”; Figure 1…the “Forward” step generates “Activation Data” at each convolutional layer);
compressing, by the at least one processor, the intermediate data to generate compressed data (Section 2…Adaptive Compression, “we deploy the lossy compression with our optimized configuration to the corresponding convolutional layers. We use the GPU version of SZ lossy compression”; Figure 1…the “SZ Compression” block receives activation data, e.g., the intermediate data and outputs “Compressed Data”);
executing, by the at least one processor, a backward process of the neural network based on the compressed data (Abstract…“then restored for backward propagation”; Figure 1…the “SZ Decompression” block decompresses the stored compressed data for use in the “Backward” step, which computes gradients based on the decompressed activation data); and
executing, by the at least one processor, an update process of parameters of the neural network after executing the backward process (Figure 1…the backward step produces “Gradient” and “Momentum” values which are used to update the network parameters; Section 2…Parameter Collection, “Our framework mainly collects two types of parameters: (1) offline parameters in CNN architecture, and (2) semi-online parameters including activation data samples, gradient, and momentum ”; Section 3…“the learning rate only matters when updating the weights”);
wherein the method is repeatedly executed through a plurality of iterations (Section 2…“We iteratively repeat the process shown in Figure 1 for each convolutional layer in every iteration”; Section 2, Parameter Collection…“we only extract semi-online parameters every W iterations…”), and
wherein a compression rate used in the compressing is varied in at least one of the plurality of iterations (Section 2, Activation Assessment…“We dynamically configure the lossy compression for activation data based on the gradient assessment in the previous phase and the collected parameters” with error bound of eb = σ/(aL̅√(NR)) where L̅ (average loss) and R (sparsity ratio) change during training, causing the compression configuration to vary across iterations; Section 3…“In the early stage of the training, compression ratio can be slightly unstable because of the relatively large change to the model. Note that the compression ratio will change slightly when the learning rate changes”; Figure 2…the average compression ratio varies from approximately 5x in early iterations to approximately 20x in later iterations).
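For illustration only (this sketch is not part of the claim mapping or the record), the sequence of steps mapped to claim 44 above can be rendered as the following training loop. The functions `lossy_compress` and `lossy_decompress` are hypothetical stand-ins for Jin's GPU SZ error-bounded compressor, and the one-parameter linear model is a toy substitute for a CNN, chosen only so the example is self-contained:

```python
# Hypothetical sketch of the claim 44 training loop: forward process,
# compression of intermediate data, backward process on the restored data,
# and a parameter update, repeated over iterations with a varying
# compression rate. Not an implementation from Jin.

def lossy_compress(value, error_bound):
    # Uniform quantization: reconstruction error is at most error_bound / 2.
    return round(value / error_bound)

def lossy_decompress(code, error_bound):
    return code * error_bound

def train(samples, w=0.0, lr=0.05, iterations=20):
    """samples: list of (x, y) pairs for the toy model y ~= w * x."""
    for it in range(iterations):
        # Compression rate varied across iterations: the error bound is
        # loosened gradually, increasing the effective compression rate.
        eb = 0.001 * (1 + it)
        grad = 0.0
        for x, y in samples:
            a = w * x                           # forward process: intermediate data
            code = lossy_compress(a, eb)        # compress intermediate data
            a_hat = lossy_decompress(code, eb)  # restore for backward process
            grad += (a_hat - y) * x             # backward process on restored data
        w -= lr * grad / len(samples)           # update process after backward
    return w
```

The loop mirrors the claimed ordering: the gradient is computed from the decompressed activation rather than the original, and the weight update occurs only after the backward step.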
Per claim 45, Jin discloses claim 44, further disclosing generating, by the at least one processor, the compressed data by applying lossy compression to the intermediate data (Abstract…“leverages error-bounded lossy compression to significantly reduce the memory requirement for training”; Section 2, Adaptive Compression…“we deploy the lossy compression with our optimized configuration to the corresponding convolutional layers”).
Claims 47 and 48 are substantially similar in scope and spirit to claims 44 and 45. Therefore, the rejections of claims 44 and 45 are applied accordingly. Jin further discloses a data processing device comprising at least one processor and at least one memory storing instructions that implement the method (Section 3…“Our experiment platform is the TACC Longhorn system, of which each GPU node is equipped with 4 Nvidia Tesla V100 GPUs per node”).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 46 and 49 are rejected under 35 U.S.C. 103 as being unpatentable over Jin in view of “To Prune, or not to prune: exploring the efficacy of pruning for model compression” by Zhu et al. (hereinafter Zhu).
Per claim 46, Jin discloses claim 44. Jin further discloses executing repeatedly the plurality of iterations each including the forward process and the backward process (Section 2…"We iteratively repeat the process shown in Figure 1 for each convolutional layer in every iteration"; Figure 1…each iteration includes Forward and Backward steps). Jin discloses that the compression ratio varies across training iterations (Section 3…"the compression ratio will change slightly when the learning rate changes"; Figure 2…compression ratio changes over iterations), but does not explicitly characterize this variation as gradually increasing the compression rate.
Jin does not expressly disclose, but Zhu does teach: while gradually increasing the compression rate of the intermediate data (Zhu: Section 3…"We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value si (usually 0) to a final sparsity value sf over a span of n pruning steps, starting at training step t0 and with pruning frequency Δt"; Equation 1…st = sf + (si − sf)(1 − (t − t0)/(nΔt))^3; Section 3…"The binary weight masks are updated every Δt steps as the network is trained to gradually increase the sparsity of the network while allowing the network training steps to recover from any pruning-induced loss in accuracy"; Figure 1…the sparsity function shows a gradual increase from 0 to target sparsity over pruning steps). A person having ordinary skill in the art would have recognized that Zhu's gradual sparsity increase is a form of gradually increasing compression, such that as sparsity increases, a greater proportion of network parameters are compressed to zero, directly increasing the effective compression rate of the network data. The same scheduling principle applies to any form of training-time compression, including the lossy compression of intermediate activation data in Jin's framework, because both weight pruning and activation compression remove information from the training pipeline and the network's tolerance for such information loss increases as training progresses toward convergence.
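For illustration only, Zhu's gradual sparsity schedule (Equation 1 as quoted above) can be written as a short function. Parameter names follow the quoted equation; the default values are hypothetical and not taken from Zhu:

```python
def gradual_sparsity(t, s_i=0.0, s_f=0.9, t0=0, n=100, dt=1):
    """Zhu's gradual pruning schedule (Equation 1):
    s_t = s_f + (s_i - s_f) * (1 - (t - t0) / (n * dt))**3,
    with progress clamped to [0, 1] so sparsity holds at s_f
    after the n pruning steps complete."""
    progress = min(max((t - t0) / (n * dt), 0.0), 1.0)
    return s_f + (s_i - s_f) * (1.0 - progress) ** 3
```

The cubic term makes the sparsity rise quickly at first and flatten near the final value, matching the gradual increase shown in Zhu's Figure 1.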
Jin and Zhu are analogous art because they are both within the same field of endeavor, specifically memory-efficient and computationally-efficient deep neural network training. They address the same problem solving area of reducing memory consumption and computational cost during the training of deep neural networks. Jin proposes compressing intermediate activation data to reduce GPU memory consumption during training (Jin: Abstract…"significantly reduce the memory requirement for training in order to allow training larger neural networks"), while Zhu proposes gradually increasing sparsity, e.g., compression, during training to reduce model size (Zhu: Abstract…"propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning and can be seamlessly incorporated within the training process"). Both references address the fundamental tradeoff between compression aggressiveness and training accuracy.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify Jin's adaptive activation compression framework to gradually increase the compression rate over training iterations following a schedule, as taught by Zhu. This combination represents the application of a known technique (Zhu's gradual compression scheduling) to a known method (Jin's adaptive activation compression framework) ready for improvement, to yield the predictable result of a training framework with a controlled, gradually increasing activation compression rate. Additionally, Jin's own experimental results demonstrate that the compression ratio naturally trends upward as training progresses (Jin: Figure 2…compression ratio increases from approximately 5x to 20x over training iterations), establishing that the combination follows a pattern already suggested by the primary reference to Jin. A person having ordinary skill in the art would have recognized that the principle underlying Zhu's gradual pruning approach, namely that neural networks are more sensitive to aggressive compression in the early stages of training, when parameters are far from convergence, and can tolerate more aggressive compression as training progresses and the model stabilizes, applies equally to the compression of intermediate activation data in Jin's framework, because the fidelity requirements for both stored weights and stored activations are governed by the same training convergence dynamics.
The suggestion/motivation for doing so would have been that Jin's own experimental results demonstrate that the compression ratio naturally increases as training progresses (Jin: Figure 2…compression ratio increases from approximately 5x to 20x over training iterations), and Zhu provides an explicit, principled schedule for gradually increasing compression during training that "allows the network training steps to recover from any pruning-induced loss in accuracy" (Zhu: Section 3). A person having ordinary skill in the art would have been motivated to apply Zhu's gradual increase schedule to Jin's activation compression to provide a more controlled and predictable compression trajectory, thereby ensuring training stability while maximizing memory savings. The combination of Jin's activation compression with Zhu's gradual compression schedule would have yielded predictable results, namely a training framework that gradually increases the activation compression rate, since both techniques operate on well-understood principles of neural network training dynamics.
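For illustration only, the proposed combination can be sketched by letting a Zhu-style cubic ramp drive the error bound of the activation compression; the specific scaling (a 1x to 10x loosening of the bound) is a hypothetical assumption for the sketch and is not disclosed by either reference:

```python
def scheduled_error_bound(t, eb_base=0.01, n=100):
    # Hypothetical combination of Jin and Zhu: a Zhu-style cubic ramp
    # gradually loosens the compression error bound over n training steps,
    # so the activation compression rate gradually increases.
    progress = min(max(t / n, 0.0), 1.0)
    ramp = 1.0 - (1.0 - progress) ** 3   # 0 at t = 0, 1 once t >= n
    return eb_base * (1.0 + 9.0 * ramp)  # bound grows 1x -> 10x over training
```

A looser error bound lets the compressor discard more information per activation value, which is the sense in which the compression rate "gradually increases" in the combined framework.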
Claim 49 is substantially similar in scope and spirit to claim 46. Therefore, the rejection of claim 46 is applied accordingly.
Allowable Subject Matter
Claims 1, 21-30, and 31-43 are allowed.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALAN CHEN whose telephone number is (571)272-4143. The examiner can normally be reached M-F 10-7.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALAN CHEN/Primary Examiner, Art Unit 2125