DETAILED ACTION
This Action is responsive to Claims filed 04/03/2023.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Drawings
Receipt of Drawings filed 04/03/2023 is acknowledged. These Drawings are acceptable.
Status of the Claims
Claims 1-20 are currently pending.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 12 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 12 recites “wherein the operations further comprise determining, while performing inference, the memory capacity available for performing inference; wherein the selecting is performed in response to a change in memory capacity available for performing inference; and wherein the retrieving is performed in response to selecting model metadata corresponding to a different model than currently used for performing inference.” This limitation, read in light of the Applicant’s Specification [0078], appears to pertain to whether a given model remains loaded or a new model is retrieved. The verbiage of the claim implies that this determination is made during inference, and that a new model is selected during inference. It is therefore unclear from the claim language what action is being taken, and when, by these limitations.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more, and because the claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than the abstract idea; see Alice Corporation Pty. Ltd. v. CLS Bank International, et al., 573 U.S. 208 (2014). In determining whether the claims are subject matter eligible, the Examiner applies the 2019 USPTO Patent Eligibility Guidelines. (2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50, Jan. 7, 2019.)
Claims 1, 11, and 13:
Step 1:
Claims 1-12 recite a non-transitory computer-readable medium, which falls under the statutory category of a manufacture. Claims 13-20 recite a method, which falls under the statutory category of a process.
Claims 1 and 13:
Step 2A – Prong 1:
Claim 1 recites an abstract idea, law of nature, or natural phenomenon. The limitations of “masking at least one edge among a plurality of edges of a trained model to produce a masked model;”, “detecting, from among a plurality of channels of the masked model, each channel among the plurality of channels including a set of edges among the plurality of edges, at least one zero channel in which each edge among the set of edges is masked;”, “determining, from among a plurality of nodes of the masked model, each node corresponding to two channels among the plurality of channels, at least one removable node in which the corresponding two channels are zero channels;”, and “pruning the masked model to remove the removable nodes from the masked model, resulting in a pruned model.” under the broadest reasonable interpretation, cover a mental process including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper. These limitations therefore fall within the mental process group.
Generically masking an edge of a pretrained model is practically performed within the human mind or with the aid of pen and paper. Generically detecting a zero channel of said model is practically performed within the human mind or with the aid of pen and paper. Generically determining one or more removable nodes based on the channels, and pruning said nodes from the generic model, is practically performed within the human mind or with the aid of pen and paper.
Step 2A – Prong 2:
The additional elements of claim 1 do not integrate the abstract idea into a practical application. The claim recites the additional elements “A non-transitory computer-readable medium including instructions executable by a processor to cause the processor to perform operations comprising”, which are recognized as generic computer components recited at a high level of generality. Although the medium stores, and a processor executes, instructions to perform the abstract idea, this does not serve to integrate the abstract idea into a practical application, as it merely amounts to instructions to "apply it." (See MPEP 2106.05(f), indicating that mere instructions to apply an abstract idea do not amount to integrating the abstract idea into a practical application).
The additional elements “model”, “edge”, and “node” are recognized as non-generic computer components; however, they merely generally link the abstract idea to a particular technological field (See MPEP 2106.05(h)).
The additional elements recited in the limitations “initializing the masked model;” and “training the masked model;” amount to mere instructions to apply the abstract idea mental process step(s) of producing the masked model, as the initializing and training are recited highly generally (See MPEP 2106.05(f)).
Step 2B:
The only limitation on the performance of the described method is a limitation reciting “A non-transitory computer-readable medium including instructions executable by a processor to cause the processor to perform operations comprising”. These elements are insufficient to transform a judicial exception into a patent-eligible invention because the recited elements are considered insignificant extra-solution activity (a generic computer system and processing resources that link the judicial exception to a particular technological environment). The claim thus recites computing components only at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components; mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (see MPEP 2106.05(f)).
The additional elements “model”, “edge”, and “node” are recognized as non-generic computer components; however, they merely generally link the abstract idea to a particular technological field (See MPEP 2106.05(h)).
The additional elements recited in the limitations “initializing the masked model;” and “training the masked model;” amount to mere instructions to apply the abstract idea mental process step(s) of producing the masked model, as the initializing and training are recited highly generally (See MPEP 2106.05(f)).
Taken alone or as an ordered combination, these additional elements do not amount to significantly more than the above-identified abstract idea. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation.
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claim 13.
Claim 13 recites limitations similar to those of Claim 1, with the exception of “A method comprising:” (generic computer components); therefore, both claims are similarly rejected.
Claim 11:
Step 2A – Prong 1:
Claim 11 recites an abstract idea, law of nature, or natural phenomenon. The limitations of “determining a memory capacity available for performing inference;” and “selecting a model metadata based on the accuracy from among model metadata representing memory capacity required during inference that is less than or equal to the memory capacity available for performing inference;” under the broadest reasonable interpretation, cover a mental process including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper. These limitations therefore fall within the mental process group.
Generically determining a memory capacity is practically performed within the human mind or with the aid of pen and paper. Generically selecting a model based on a threshold available memory capacity is practically performed within the human mind or with the aid of pen and paper.
Step 2A – Prong 2:
The additional elements of claim 11 do not integrate the abstract idea into a practical application. The claim recites the additional elements “A non-transitory computer-readable medium including instructions executable by a processor to cause the processor to perform operations comprising:”, “a server”, “a network”, and “memory”, which are recognized as generic computer components recited at a high level of generality. Although the medium stores, and a processor executes, instructions to perform the abstract idea, this does not serve to integrate the abstract idea into a practical application, as it merely amounts to instructions to "apply it." (See MPEP 2106.05(f), indicating that mere instructions to apply an abstract idea do not amount to integrating the abstract idea into a practical application).
The additional element “model” is recognized as a non-generic computer component; however, it merely generally links the abstract idea to a particular technological field (See MPEP 2106.05(h)).
The additional elements recited in the limitations “receiving a plurality of model metadata from a server through a network, each model metadata among the plurality of model metadata representing an accuracy and a memory capacity required during inference of a corresponding model in a model portfolio;” and “retrieving a model corresponding to the selected model metadata from the server;” are recited highly generally and amount to mere pre- or post-solution extra-solution activity or data transmittal steps (See MPEP 2106.05(g)).
The additional element recited in the limitation “performing inference using the model.” amounts to mere instructions to apply the abstract idea mental process step(s) of the claim, as the “performing inference” is recited highly generally (See MPEP 2106.05(f)).
Step 2B:
The only limitation on the performance of the described method is a limitation reciting “A non-transitory computer-readable medium including instructions executable by a processor to cause the processor to perform operations comprising:”, “a server”, “a network”, and “memory”. These elements are insufficient to transform a judicial exception into a patent-eligible invention because the recited elements are considered insignificant extra-solution activity (a generic computer system and processing resources that link the judicial exception to a particular technological environment). The claim thus recites computing components only at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components; mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (see MPEP 2106.05(f)).
The additional element “model” is recognized as a non-generic computer component; however, it merely generally links the abstract idea to a particular technological field (See MPEP 2106.05(h)).
The additional elements recited in the limitations “receiving a plurality of model metadata from a server through a network, each model metadata among the plurality of model metadata representing an accuracy and a memory capacity required during inference of a corresponding model in a model portfolio;” and “retrieving a model corresponding to the selected model metadata from the server;” are recited highly generally and amount to well-understood, routine, and conventional activity (See MPEP 2106.05(d)(II)(i)).
The additional element recited in the limitation “performing inference using the model.” amounts to mere instructions to apply the abstract idea mental process step(s) of the claim, as the “performing inference” is recited highly generally (See MPEP 2106.05(f)).
Dependent Claims:
Claim 2 (claim 14) recites refinements to the process of Claim 1. The limitations included therein or as steps referenced therein are interpreted the same as the above analysis of Claim 1.
Claim 3 (claim 15) recites refinements to the abstract idea mental process steps of Claim 1. The limitation “each subsequent iteration further comprises increasing the threshold weight value.” is interpretable as an abstract idea mental process because increasing a threshold is practically performed within the human mind or with the aid of pen and paper.
Claim 4 (claim 16) recites abstract idea mental process step “grouping pruned models among the plurality of pruned models into a plurality of groups based on memory capacity required during inference.” Grouping pruned models based on a metric is practically performed within the human mind or with the aid of pen and paper.
Claim 5 (claim 17) recites instructions to apply the abstract idea mental process steps used to generate pruned model(s) in the limitation “testing an accuracy of each pruned model among the plurality of pruned models;” (See MPEP 2106.05(f)) and an abstract idea mental process step in the limitation “and adding a most accurate model among pruned models of each group among the plurality of groups to a model portfolio.”
Claim 6 (claim 18) recites the limitations “transmitting a plurality of model metadata to a computation device, each model metadata among the plurality of model metadata representing the accuracy and the memory capacity required during inference of a pruned model added to the model portfolio; receiving a request for a pruned model among the plurality of pruned models added to the model portfolio corresponding to a selected model metadata of the request from the computation device; and transmitting the pruned model corresponding to the selected model metadata to the computation device.” These limitations have been evaluated under Step 2A – Prong 2 and reevaluated under Step 2B and found to be pre- or post-solution extra-solution activity and/or well-understood, routine, and conventional activity (See MPEP 2106.05(g) and 2106.05(d)(II)(i), respectively).
Claim 7 (claim 19) recites an abstract idea mental process step in the limitation “selecting a pruned model among the plurality of pruned models added to the model portfolio corresponding to an accuracy requirement;” and instructions to apply the abstract idea mental process step in the limitation “and instructing the cloud server to perform inference of the pruned model corresponding to the accuracy requirement.” (See MPEP 2106.05(f)). The limitation “transmitting the pruned model corresponding to the accuracy requirement to a cloud server;” has been evaluated under Step 2A – Prong 2 and reevaluated under Step 2B and found to be pre- or post-solution extra-solution activity and/or well-understood, routine, and conventional activity (See MPEP 2106.05(g) and 2106.05(d)(II)(i), respectively).
Claim 8 (claim 20) recites abstract idea mental process steps “determining a decrease in accuracy between the accuracy of the masked model and a preceding accuracy of a preceding masked model of a preceding iteration, and the iterations are performed until the decrease in accuracy exceeds a threshold accuracy change value.” Iteratively performing the abstract idea mental process step(s) of Claim 1 and 2 until an accuracy threshold is reached is practically performed within the human mind or with the aid of pen and paper.
Claim 9 recites abstract idea mental process steps “restoring initialized parameters of an untrained model previously trained to become the trained model.”
Claim 10 recites abstract idea mental process step “reformatting each layer among a plurality of layers of the masked model that includes at least one removable node.”
Claim 12 recites abstract idea mental process steps “determining, while performing inference, the memory capacity available for performing inference; wherein the selecting is performed in response to a change in memory capacity available for performing inference;” and refinements to the data transmittal additional elements of Claim 11 in the limitation “the retrieving is performed in response to selecting model metadata corresponding to a different model than currently used for performing inference.”
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim(s) 1-2, 9-10, and 13-14 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Liu et al. (Learning Efficient Convolutional Networks through Network Slimming, 2017), hereinafter Liu.
In regards to claim 1: The present invention recites: “A non-transitory computer-readable medium including instructions executable by a processor to cause the processor to perform operations comprising: masking at least one edge among a plurality of edges of a trained model to produce a masked model;” Liu teaches “We associate a scaling factor (reused from a batch normalization layer) with each channel in convolutional layers.” (Figure 1, mapping the introduction of a scaling factor onto the output of a neuron as a mask). See also Page 3, right column and Page 4, left column, for further description of the scaling factor’s application to determine channels to prune.
“initializing the masked model; training the masked model;” Liu teaches “Our idea is introducing a scaling factor γ for each channel, which is multiplied to the output of that channel. Then we jointly train the network weights and these scaling factors, with sparsity regularization imposed on the latter.” (Page 3, right column and Figure 2). See Section 4.3 (Page 6) for discussion of initialization of the model with the scaling factors (mapped to masks).
“detecting, from among a plurality of channels of the masked model, each channel among the plurality of channels including a set of edges among the plurality of edges, at least one zero channel in which each edge among the set of edges is masked;” Liu teaches “After training under channel-level sparsity-induced regularization, we obtain a model in which many scaling factors are near zero (see Figure 1).” While Liu does not necessarily make the channels exactly zero, only near zero, see Page 2, right column, for Liu’s discussion of current technology utilizing zero masks and/or the drawbacks of using such methods, indicating that a person of ordinary skill in the art at the time of Liu’s writing and the Applicant’s filing would have been aware of such a method.
“determining, from among a plurality of nodes of the masked model, each node corresponding to two channels among the plurality of channels, at least one removable node in which the corresponding two channels are zero channels;” Liu teaches “After training under channel-level sparsity-induced regularization, we obtain a model in which many scaling factors are near zero (see Figure 1).” (Page 4, right column). See also Figure 1 for the scaled outputs near zero, indicating the “zero” channels.
“and pruning the masked model to remove the removable nodes from the masked model, resulting in a pruned model.” Liu teaches “Then we can prune channels with near-zero scaling factors, by removing all their incoming and outgoing connections and corresponding weights.” (Page 4, right column). See also “It can be applied to any typical CNNs or fully connected networks (treat each neuron as a channel), and the resulting network is essentially a “thinned” version of the unpruned network, which can be efficiently inferenced on conventional CNN platforms.” (Page 3, right column) (mapping the pruning of a channel, which may be individual neurons, with a near-zero scale factor, to pruning the removable nodes of the masked model).
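Purely as an illustrative aid (forming no part of the claims or of Liu), the mapping above can be sketched as follows. The function names, the threshold value, and the data layout are hypothetical; the sketch only shows the general pattern of identifying near-zero channels and then nodes whose adjacent channels are all prunable:

```python
def prune_near_zero_channels(scaling_factors, threshold=1e-3):
    """Return indices of channels whose learned scaling factor is near zero.

    Illustrates the pattern Liu describes: after sparsity-regularized
    training, channels with |gamma| below a small threshold are prunable.
    """
    return [i for i, g in enumerate(scaling_factors) if abs(g) < threshold]


def removable_nodes(node_channels, zero_channels):
    """Return nodes whose two adjacent channels are both zero channels.

    Parallels the claim's determining step: a node is removable only when
    the corresponding two channels are zero channels.
    """
    zero = set(zero_channels)
    return [n for n, (c_in, c_out) in node_channels.items()
            if c_in in zero and c_out in zero]
```

For example, with hypothetical scaling factors [0.5, 1e-5, 0.0, 0.3], channels 1 and 2 would be flagged, and only a node adjacent to both of those channels would be removable.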
In regards to claim 2: The present invention claims: “wherein the operations further comprise producing a plurality of masked models by performing iterations of the masking, the initializing, and the training; wherein the trained model of each subsequent iteration is the masked model after the training of a preceding iteration; and wherein the detecting, determining, and restructuring is performed for each masked model among the plurality of masked models, resulting in a plurality of pruned models.” Liu teaches “We can also extend the proposed method from single-pass learning scheme (training with sparsity regularization, pruning, and fine-tuning) to a multipass scheme. Specifically, a network slimming procedure results in a narrow network, on which we could again apply the whole training procedure to learn an even more compact model. This is illustrated by the dotted-line in Figure 2. Experimental results show that this multi-pass scheme can lead to even better results in terms of compression rate.” (Page 4, right column). See also the Pruning and Fine-tuning Sections on Page 6 for multiple, narrower models being made.
In regards to claim 9: The present invention claims: “wherein the initializing includes restoring initialized parameters of an untrained model previously trained to become the trained model.” Liu teaches “The weight initialization introduced by [13] is adopted. Our optimization settings closely follow the original implementation at [10]. In all our experiments, we initialize all channel scaling factors to be 0.5, since this gives higher accuracy for the baseline models compared with default setting (all initialized to be 1) from [10].” (Page 6, left column) and “After the pruning we obtain a narrower and more compact model, which is then fine-tuned. On CIFAR, SVHN and MNIST datasets, the fine-tuning uses the same optimization setting as in training.” (Page 6, right column, mapping to use of same/similar parameters).
In regards to claim 10: The present invention claims: “wherein the restructuring includes reformatting each layer among a plurality of layers of the masked model that includes at least one removable node.” Liu teaches “The pruning process is implemented by building a new narrower model and copying the corresponding weights from the model trained with sparsity.” (Page 6).
In regards to claims 13-14: Claims 13 and 14 recite similar limitations to Claims 1-2, with the exception of “A method comprising:” of Claim 13; therefore, both sets of Claims are similarly rejected.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 3, 8, 15, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Liu as applied to claims 1-2 and 13-14 above, and further in view of Frankle et al. (THE LOTTERY TICKET HYPOTHESIS: FINDING SPARSE, TRAINABLE NEURAL NETWORKS, 2019), hereinafter Frankle.
In regards to claim 3: While Liu teaches masking channel outputs with a scaling value and iteratively pruning/training a model, Liu fails to explicitly teach “wherein the masking includes masking edges having a weight value less than a threshold weight value, and each subsequent iteration further comprises increasing the threshold weight value.” However, Frankle, in a similar field of endeavor, teaches “We use a simple layer-wise pruning heuristic: remove a percentage of the weights with the lowest magnitudes within each layer (as in Han et al. (2015)). Connections to outputs are pruned at half of the rate of the rest of the network. We explore other hyperparameters in Appendix G, including learning rates, optimization strategies (SGD, momentum), initialization schemes, and network sizes.” (Page 3). Page 4 (Iterative Pruning) also describes pruning a percentage of the lowest-magnitude weights each iteration, which necessarily raises the minimum surviving weight value.
Frankle Page 4 and Figures 3-4 demonstrate the speed and accuracy of learning gained by finding a winning lottery ticket subnetwork of a neural network. The benefits of Liu’s and Frankle’s methods would have been known to a person skilled in the art at the time of the Applicant’s filing, and a combination of the two, utilizing Frankle’s increasing percentage threshold with Liu’s masking scaling value, would have been obvious to one of ordinary skill in the art in order to realize the benefits of finding such a winning lottery ticket subnetwork.
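Purely as an illustrative aid (forming no part of the claims or of Frankle), percentage-based iterative magnitude pruning can be sketched as follows; all names and values are hypothetical. The recorded per-iteration cutoffs rise across iterations, which is the basis for mapping Frankle's fixed-percentage pruning onto an increasing threshold weight value:

```python
def iterative_magnitude_prune(weights, prune_fraction, iterations):
    """Iteratively remove the lowest-magnitude surviving weights.

    Each round prunes a fixed fraction of the remaining weights and records
    the largest magnitude removed that round; because only larger weights
    survive, this effective threshold grows over iterations.
    """
    surviving = list(weights)
    thresholds = []
    for _ in range(iterations):
        surviving.sort(key=abs)                 # smallest magnitudes first
        k = int(len(surviving) * prune_fraction)
        if k == 0:
            break
        thresholds.append(abs(surviving[k - 1]))  # largest magnitude pruned
        surviving = surviving[k:]                 # keep the larger weights
    return surviving, thresholds
```

Running this on a hypothetical weight list with a 50% prune fraction yields a strictly non-decreasing sequence of cutoff magnitudes, i.e., an increasing threshold each subsequent iteration.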
In regards to claim 8: The present invention claims: “wherein each iteration includes testing an accuracy of the masked model, and determining a decrease in accuracy between the accuracy of the masked model and a preceding accuracy of a preceding masked model of a preceding iteration, and the iterations are performed until the decrease in accuracy exceeds a threshold accuracy change value.” Frankle Page 4, Iterative Pruning discusses early iteration stop conditions as accuracy diminishes with iterative pruning.
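Purely as an illustrative aid (forming no part of the claims or of Frankle), the stop condition recited above can be sketched as follows, where the accuracy sequence and function name are hypothetical:

```python
def iterations_before_stop(accuracies, max_drop):
    """Count pruning iterations performed before the stop condition fires.

    Mirrors the recited condition: iterate while the decrease in accuracy
    between the preceding masked model and the current one does not exceed
    a threshold accuracy change value.
    """
    for i in range(1, len(accuracies)):
        drop = accuracies[i - 1] - accuracies[i]
        if drop > max_drop:
            return i  # stop once the drop exceeds the threshold
    return len(accuracies)
```

With hypothetical per-iteration accuracies [0.95, 0.94, 0.93, 0.88, 0.80] and an allowed drop of 0.02, iteration stops when the 0.93-to-0.88 decrease exceeds the threshold.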
In regards to claims 15 and 20: Claims 15 and 20 recite similar limitations to Claims 3 and 8, with the exception of “A method comprising:” of Claim 13; therefore, both sets of Claims are similarly rejected.
Claim(s) 4-7, 11-12, and 16-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Liu and Frankle as applied to claims 1 and 13 above, and further in view of Duong et al. (Paying more Attention to Snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation, 2020), hereinafter Duong; and Fuez et al. (US 11,537,931 B2), hereinafter Fuez.
In regards to claim 4: While the combination of Liu and Frankle teach iteratively pruning a model, the combination fails to explicitly teach “wherein the operations further comprise grouping pruned models among the plurality of pruned models into a plurality of groups based on memory capacity required during inference.” However, Duong, in a similar field of endeavor of model pruning, teaches “we argue that conventional methods for retraining pruned networks (i.e., using small, fixed learning rate) are inadequate as they completely ignore the benefits from snapshots of iterative pruning. In this work, we show that strong ensembles can be constructed from snapshots of iterative pruning, which achieve competitive performance and vary in network structure.” (Abstract, mapping the formation of a group or ensemble of model snapshots to the formation of grouping(s) of pruned models).
Fuez, also in a similar field of endeavor, teaches “Thus, the on-device platform represents a centralized system that enables discovery of machine-learned models that are available for access. Further, the on-device platform can handle and facilitate communications between applications and their corresponding models.” (Column 4, Lines 35-39) and “The on-device machine learning platform 122 may be in the form of one or more computer programs stored locally on the computing device 102 (e.g., a smartphone or tablet), which are configured, when executed by the device 102, to perform machine learning management operations which enable performance of on-device machine learning functions on behalf of one or more locally-stored applications 120a-c or other local clients. In some implementations, the on-device machine learning platform 122 can include a context manager 126 that securely injects context features into model invocations that include application-provided input data used to generate predictions/inferences. In some implementations, the context features can be grouped or otherwise categorized according to a number of different context types. In general, each context type can specify or include a set of context features with well-known names and well-known types. One example context type is device information which includes the following example context features: audio state, network state, power connection, etc.” (Column 10, Lines 27-47, mapping a combination of Duong and the storage of multiple machine learning models as in Fuez to reasonably include grouping based on model memory usage).
Duong highlights the benefits of using model snapshots while iteratively pruning to leverage the power of intermediate states of the model (Abstract, Page 2) and Fuez highlights the need to quickly and efficiently select models most appropriate for machine learning requests in a system (Background, Column 1). It would have been obvious to a person of ordinary skill in the art at the time of the Applicant’s filing to leverage the benefits of maintaining model snapshots in a combination of Liu, Frankle, and Duong, as well as store or group said models for real-time, efficient, and appropriate access as a combination with Fuez would realize.
In regards to claim 5: The present invention claims: “wherein the operations further comprise: testing an accuracy of each pruned model among the plurality of pruned models; and adding a most accurate model among pruned models of each group among the plurality of groups to a model portfolio.” See above how a combination of Liu, Frankle, Duong, and Fuez reads on the formation of a group(s) or ensemble of snapshot models, accessible via a storage medium and arranged based on structure/memory usage/speed/etc. It would have been obvious to one of ordinary skill in the art at the time of the Applicant’s filing to store higher performing or more accurate models in said storage medium.
In regards to claim 6: The present invention claims: “wherein the operations further comprise: transmitting a plurality of model metadata to a computation device, each model metadata among the plurality of model metadata representing the accuracy and the memory capacity required during inference of a pruned model added to the model portfolio; receiving a request for a pruned model among the plurality of pruned models added to the model portfolio corresponding to a selected model metadata of the request from the computation device; and transmitting the pruned model corresponding to the selected model metadata to the computation device.” See Fuez Figures 1, 4A, and 4B, in which the machine learning model storage medium receives a request from another device or application to utilize one of the stored models, and the selected model performs the requesting device’s inference task. It would have been obvious to a person of ordinary skill in the art combining Liu, Frankle, Duong, and Fuez to select a stored version of the pruned model for inference tasks based on the requesting device.
In regards to claim 7: The present invention claims: “wherein the operations further comprise: selecting a pruned model among the plurality of pruned models added to the model portfolio corresponding to an accuracy requirement; transmitting the pruned model corresponding to the accuracy requirement to a cloud server; and instructing the cloud server to perform inference of the pruned model corresponding to the accuracy requirement.” See Fuez Figure 3 for the involvement of a cloud service in the Fuez method when processing a request.
In regards to claim 11: The present invention claims: “A non-transitory computer-readable medium including instructions executable by a processor to cause the processor to perform operations comprising: receiving a plurality of model metadata from a server through a network, each model metadata among the plurality of model metadata representing an accuracy and a memory capacity required during inference of a corresponding model in a model portfolio; determining a memory capacity available for performing inference; selecting a model metadata based on the accuracy from among model metadata representing memory capacity required during inference that is less than or equal to the memory capacity available for performing inference; retrieving a model corresponding to the selected model metadata from the server; and performing inference using the model.” See above how a combination of Liu, Frankle, Duong, and Fuez would read on a system containing multiple subnetwork snapshots of a pruned model for selection based on a request by an additional device or application to perform an inference task. Fuez Column 6 teaches supplementing a request with “context information” regarding a machine learning model and/or request to use it. A person of ordinary skill in the art would reasonably include the memory requirements to utilize a given model as part of the selection process.
In regards to claim 12: The present invention claims: “wherein the operations further comprise determining, while performing inference, the memory capacity available for performing inference; wherein the selecting is performed in response to a change in memory capacity available for performing inference; and wherein the retrieving is performed in response to selecting model metadata corresponding to a different model than currently used for performing inference.” Based on Applicant’s Specification [0078], this claim pertains merely to retrieving a new model versus loading a previously accessed model from memory. A person of ordinary skill in the art at the time of the Applicant’s filing combining the methods of Liu, Frankle, Duong, and Fuez would have reasonably managed the memory footprint of loading a previous model or receiving a new one, especially as Fuez directly references memory footprint management (Column 1, Background).
In regards to claims 16-19: Claims 16-19 recite limitations similar to those of Claims 4-7, differing only in their dependence from the method of Claim 13; therefore, Claims 16-19 are rejected on the same grounds as Claims 4-7.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GRIFFIN T BEAN whose telephone number is (703)756-1473. The examiner can normally be reached M - F 7:30 - 4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GRIFFIN TANNER BEAN/ Examiner, Art Unit 2121
/Li B. Zhen/ Supervisory Patent Examiner, Art Unit 2121