DETAILED ACTION
This Action is responsive to the claims filed 04/14/2023.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/14/2023 was filed before the mailing of the first action on the merits. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
Receipt of Drawings filed 04/14/2023 is acknowledged. These Drawings are acceptable.
Status of the Claims
Claims 2-11 and 15-20 were preliminarily amended. Claims 1-20 are currently pending.
Claim Objections
Claims 19 and 20 are objected to because of the following informalities:
Claims 19 and 20 recite a statutory category different from that of the claim from which they depend. Claim 1 recites a method, Claim 19 recites a computer system, and Claim 20 recites computer readable storage media. Matching the statutory categories or amending Claims 19 and 20 to be independent claims would improve clarity.
Appropriate correction is required.
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
Claim 18 is objected to under 37 CFR 1.75(c) as being in improper form because a multiple dependent claim should refer to other claims in the alternative only. See MPEP § 608.01(n):
“A multiple dependent claim may refer in the alternative to only one set of claims. A claim such as "A device as in claims 1, 2, 3, or 4, made by a process of claims 5, 6, 7, or 8" is improper.”
Claim 18 refers to independent Claim 1 as well as to independent Claim 14, rather than referring to a single set of claims in the alternative. Accordingly, the claim has not been further treated on the merits.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-17 and 19-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more, and because the claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than the abstract idea. See Alice Corporation Pty. Ltd. v. CLS Bank International, et al., 573 U.S. 208 (2014). In determining whether the claims are subject matter eligible, the Examiner applies the 2019 USPTO Patent Eligibility Guidelines (2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50, Jan. 7, 2019).
Step 1: All Claims
Claims 1-13 recite a method, which falls under the statutory category of a process. Claims 14-18 recite a method, which falls under the statutory category of a process. Claim 19 recites a computer system, which falls under the statutory category of a machine. Claim 20 recites one or more computer readable storage media, which falls under the statutory category of a manufacture.
Step 2A – Prong 1: Claim 1
Claim 1 recites a judicial exception, namely an abstract idea. Under the broadest reasonable interpretation, the limitations “selecting a mini-batch from a plurality of mini-batches, a training data set for a task being grouped into the plurality of mini-batches and each of the plurality of mini-batches comprising a plurality of instances;”, “stochastically selecting a plurality of network architectures of the neural network for the selected mini-batch;”, “obtaining a loss for each instance of the selected mini-batch by applying the instance to one of the plurality of network architectures;”, and “updating shared weights of the neural network based on the loss for each instance of the selected mini-batch” cover a mental process, i.e., an observation, evaluation, judgment, or opinion that could be performed in the human mind or with the aid of pencil and paper.
Selecting mini-batches of instances, selecting architectures, obtaining a loss, and updating weights are each practically performed within the human mind or with the aid of pen and paper.
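For illustration only, the recited steps may be rendered at toy scale as follows (a minimal Python sketch with hypothetical names and values; it is not Applicant’s disclosed implementation):

    import random

    # Toy-scale rendering of the recited steps, consistent with the
    # pencil-and-paper analysis above. All names/values are hypothetical.
    mini_batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]  # (x, y) instances
    architectures = ["arch_a", "arch_b"]      # names of candidate network architectures
    shared_w, lr = 0.5, 0.1                   # a single shared weight, for illustration

    batch = random.choice(mini_batches)       # "selecting a mini-batch"
    grad_sum = 0.0
    for x, y in batch:
        arch = random.choice(architectures)   # "stochastically selecting" an architecture
        y_hat = shared_w * x                  # apply the instance (both toy architectures
                                              # here share the same single weight)
        loss = (y_hat - y) ** 2               # "obtaining a loss for each instance"
        grad_sum += 2 * (y_hat - y) * x
    shared_w -= lr * grad_sum / len(batch)    # "updating shared weights" based on the losses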
Step 2A – Prong 2: Claim 1
The additional elements of claim 1 do not integrate the abstract idea into a practical application. The additional elements “A method”, “a mini-batch”, and “data” are recognized as generic computer components recited at a high level of generality. Although these elements carry out the abstract idea itself, this does not serve to integrate the abstract idea into a practical application, as it merely amounts to instructions to “apply it” (See MPEP 2106.04(d), indicating that mere instructions to apply an abstract idea do not amount to integrating the abstract idea into a practical application).
The additional elements of “neural network”, “training data”, “network architecture”, and “shared weights” are recognized as non-generic computer components, but are recited at a high level of generality and are found to generally link the abstract idea to a particular technological environment or field of use (See MPEP 2106.05(h)).
The additional elements recited in the limitation “training a weight-sharing neural network with stochastic architectures” are found to be mere instructions to apply the abstract idea(s) to the dataset(s) (See MPEP 2106.05(f), indicating that mere instructions to apply an abstract idea do not amount to integrating the abstract idea into a practical application).
Step 2B: Claim 1
The only limitation on the performance of the described method is a limitation reciting “A method”, “a mini-batch”, and “data”. These elements are insufficient to transform a judicial exception into a patent-eligible invention because the recited elements are considered insignificant extra-solution activity (a generic computer system and processing resources that link the judicial exception to a particular technological environment). The claim thus recites computing components only at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components; mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (See MPEP 2106.05(f)).
The additional elements of “neural network”, “training data”, “network architecture”, and “shared weights” are recognized as non-generic computer components, but are recited at a high level of generality and are found to generally link the abstract idea to a particular technological environment or field of use (See MPEP 2106.05(h)).
The additional elements recited in the limitation “training a weight-sharing neural network with stochastic architectures” are found to be mere instructions to apply the abstract idea (See MPEP 2106.05(f), indicating that mere instructions to apply an abstract idea do not provide significantly more).
Taken alone or as an ordered combination, these additional elements do not amount to significantly more than the above-identified abstract idea. There is no indication that the combination of elements improves the functioning of a computer or any other technology. Their collective functions merely provide conventional computer implementation.
For the reasons above, claim 1 is rejected as being directed to patent-ineligible subject matter under 35 U.S.C. § 101.
Step 2A – Prong 1: Claim 14
Claim 14 recites a judicial exception, namely an abstract idea. Under the broadest reasonable interpretation, the limitations “randomly selecting one or more network architectures of the neural network;”, “inferring one or more output data by the selected one or more network architectures respectively based on the input data;”, and “obtaining a final inference data based on the one or more output data” cover a mental process, i.e., an observation, evaluation, judgment, or opinion that could be performed in the human mind or with the aid of pencil and paper.
Selecting architectures, inferring output data, and obtaining a final result based on the inferred data are each practically performed within the human mind or with the aid of pen and paper.
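For illustration only, a minimal Python sketch of the recited inference steps (hypothetical architectures and values; not Applicant’s disclosed implementation):

    import random

    def arch_a(x):            # hypothetical architecture drawing on shared weights
        return 2 * x

    def arch_b(x):            # second hypothetical architecture
        return 2 * x + 1

    x = 3.0                                           # "receiving an input data"
    selected = random.sample([arch_a, arch_b], k=2)   # "randomly selecting" architectures
    outputs = [f(x) for f in selected]                # "inferring one or more output data"
    final = sum(outputs) / len(outputs)               # "obtaining a final inference data"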
Step 2A – Prong 2: Claim 14
The additional elements of claim 14 do not integrate the abstract idea into a practical application. The additional elements “A method” and “data” are recognized as generic computer components recited at a high level of generality. Although these elements carry out the abstract idea itself, this does not serve to integrate the abstract idea into a practical application, as it merely amounts to instructions to “apply it” (See MPEP 2106.04(d), indicating that mere instructions to apply an abstract idea do not amount to integrating the abstract idea into a practical application).
The additional elements of “neural network”, “training data”, “network architecture”, and “shared weights” are recognized as non-generic computer components, but are recited at a high level of generality and are found to generally link the abstract idea to a particular technological environment or field of use (See MPEP 2106.05(h)).
The additional element “receiving an input data;” is found to be insignificant extra-solution activity, i.e., a mere data-gathering step (See MPEP 2106.05(g)).
Step 2B: Claim 14
The only limitation on the performance of the described method is a limitation reciting “A method” and “data”. These elements are insufficient to transform a judicial exception into a patent-eligible invention because the recited elements are considered insignificant extra-solution activity (a generic computer system and processing resources that link the judicial exception to a particular technological environment). The claim thus recites computing components only at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components; mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (See MPEP 2106.05(f)).
The additional elements of “neural network”, “training data”, “network architecture”, and “shared weights” are recognized as non-generic computer components, but are recited at a high level of generality and are found to generally link the abstract idea to a particular technological environment or field of use (See MPEP 2106.05(h)).
The additional element “receiving an input data;” is found to be well-understood, routine, and conventional activity (See MPEP 2106.05(d)(II)(i)).
Taken alone or as an ordered combination, these additional elements do not amount to significantly more than the above-identified abstract idea. There is no indication that the combination of elements improves the functioning of a computer or any other technology. Their collective functions merely provide conventional computer implementation.
For the reasons above, claim 14 is rejected as being directed to patent-ineligible subject matter under 35 U.S.C. § 101.
Dependent Claims:
Claims 2-4, 7-9, 11, and 15-17 each recite additional elements recognized as non-generic computer components; however, these elements are recited at a high level of generality and are found to generally link the abstract idea to a particular technological environment or field of use (See MPEP 2106.05(h)).
Claims 5 and 10 recite the abstract-idea mental process steps “calculating…” and “updating…”.
Claim 6 recites additional elements recognized as non-generic computer components that are likewise recited at a high level of generality and generally link the abstract idea to a particular technological environment or field of use (See MPEP 2106.05(h)); Claim 6 also recites the abstract-idea mental process step “updating…”.
Claims 12 and 13 recite the abstract-idea mental process step “repeating…”.
Although Claim 18 is not to be treated on the merits (See Objection above), Claim 18 merely recites instructions to apply the abstract idea mental process steps of Claim 1 (See MPEP 2106.05(f)).
Claim 19 recites generic computer components applying the abstract idea mental process steps of Claim 1.
Claim 20 recites generic computer components applying the abstract idea mental process steps of Claim 1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-8, 10-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Pham et al. (Efficient Neural Architecture Search via Parameter Sharing, 2018), hereinafter Pham, in view of Li et al. (Random Search and Reproducibility for Neural Architecture Search, 2019), hereinafter Li.
In regards to claim 1: The present invention claims: “A method for training a weight-sharing neural network…comprising:” Pham teaches “In ENAS, there are two sets of learnable parameters: the parameters of the controller LSTM…and the shared parameters of the child models” (Page 3, left column).
“selecting a mini-batch from a plurality of mini-batches, a training data set for a task being grouped into the plurality of mini-batches and each of the plurality of mini-batches comprising a plurality of instances;” Pham teaches “The first phase trains…the shared parameters of the child models, on a whole pass through the training data set. For our Penn Treebank experiments…is trained for about 400 steps, each on a minibatch of 64 examples” (Page 3, left column).
“obtaining a loss for each instance of the selected mini-batch by applying the instance to one of the plurality of network architectures;” Pham teaches “In this step, we fix the controller’s policy…and perform stochastic gradient descent (SGD)…to minimize the expected loss function… Here, L(m; ω) is the standard cross-entropy loss, computed on a minibatch of training data, with a model m sampled from π(m; θ).” (Page 3, left column).
“and updating shared weights of the neural network based on the loss for each instance of the selected mini-batch.” Pham teaches “Nevertheless – and this is perhaps surprising – we find that M = 1 works just fine, i.e. we can update [the shared parameters of the child models] using the gradient from any single model m sampled from π(m; θ). As mentioned, we train [the shared parameters of the child models] during the entire pass through the training data.” (Page 3, left column).
Pham fails to explicitly teach “…with stochastic architectures,” and “stochastically selecting a plurality of network architectures of the neural network for the selected mini-batch;” However, Li, in a similar field of endeavor, teaches “In order to combine random search with weight-sharing, we simply use randomly sampled architectures to train the shared weights. Shared weights are updated by selecting a single architecture for a given minibatch and updating the shared weights by back-propagating through the network with only the edges and operations as indicated by the architecture activated. Hence, the number of architectures used to update the shared weights is equivalent to the total number of minibatch training iterations.” (Page 7).
Li teaches “Leveraging these observations, we evaluate both random search with early-stopping and a novel random search with weight-sharing algorithm on two standard NAS benchmarks—PTB and CIFAR-10. Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS [41], a leading NAS method, on both benchmarks. Additionally, random search with weight-sharing outperforms random search with early-stopping, achieving a state-of-the-art NAS result on PTB and a highly competitive result on CIFAR-10.” (Abstract, Page 1). It would have been obvious to one of ordinary skill in the art at the time of the Applicant’s filing to leverage the known benefits of a system such as Li’s, which randomly or stochastically samples architectures, in combination with the elements of Pham’s ENAS.
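For illustration only, a minimal Python sketch of the combination as described in the quoted Li passage (hypothetical names and toy values; one randomly sampled architecture per minibatch, back-propagating only through the activated edges/operations):

    import random

    shared_w = [0.5, -0.3, 0.8]          # shared weights across all candidate edges/ops
    archs = [[0], [1], [0, 2]]           # each architecture activates a subset of edges
    lr = 0.1

    minibatches = [(1.0, 1.0), (2.0, 2.5), (3.0, 3.5)]   # one toy (x, y) pair per minibatch
    for x, y in minibatches:
        active = random.choice(archs)                    # sample a single architecture
        y_hat = sum(shared_w[i] * x for i in active)     # forward pass over active edges only
        err = y_hat - y
        for i in active:                                 # back-propagate only through the
            shared_w[i] -= lr * 2 * err * x              # activated edges/operations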
In regards to claim 2: The present invention claims: “wherein the neural network comprises a set of nodes and a set of edges, each of the nodes representing at least one operation, each of the edges connecting two of the nodes, each network architecture of the neural network being represented as a directed graph of nodes connected by edges.” Pham teaches “Intuitively, ENAS’s DAG is the superposition of all possible child models in a search space of NAS, where the nodes represent the local computations and the edges represent the flow of information. The local computations at each node have their own parameters, which are used only when the particular computation is activated.” (Pages 1-2 and Figure 2).
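For illustration only, a minimal sketch of such a directed-graph representation (hypothetical names; not Pham’s or Applicant’s actual data structure):

    # A child architecture expressed as a directed graph of nodes (operations)
    # connected by edges, per claim 2 and Pham's DAG. Names are hypothetical.
    nodes = {"n1": "conv3x3", "n2": "relu", "n3": "avg_pool"}  # node -> local computation
    edges = [("n1", "n2"), ("n2", "n3")]                       # directed information flow
    architecture = {"nodes": nodes, "edges": edges}            # one child model in the DAG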
In regards to claim 3: The present invention claims: “wherein the shared weights of the neural network comprises at least part of operations of the nodes.” See Pham Section 2.2 (Page 3) and “The shared parameters of the child models ω are trained using SGD with a learning rate of 20.0, decayed by a factor of 0.96 after every epoch starting at epoch 15, for a total of 150 epochs.” (Page 5).
In regards to claim 4: The present invention claims: “wherein the at least part of comprises convolution operations.” Sections 2.3 and 2.4 of Pham pertain directly to performing convolutional operations.
In regards to claim 5: The present invention claims: “wherein updating the shared weights of the neural network based on the loss for each instance of the selected mini-batch further comprises: calculating gradients for the shared weights of the neural network by back-propagating mean loss of the loss for each instance of the selected mini-batch along the selected plurality of network architectures respectively or by back-propagating the loss for each instance of the selected mini-batch along a corresponding one of the selected plurality of network architectures respectively;” Pham teaches “The first phase trains ω, the shared parameters of the child models, on a whole pass through the training data set. For our Penn Treebank experiments, ω is trained for about 400 steps, each on a minibatch of 64 examples, where the gradient ∇ω is computed using back-propagation through time, truncated at 35 time steps. Meanwhile, for CIFAR-10, ω is trained on 45,000 training images, separated into minibatches of size 128, where ∇ω is computed using standard back-propagation.” (Page 3, left column).
“and updating the shared weights of the neural network by using an accumulation or average of the gradients for each of the shared weights.” The portion of Pham containing the above citation on Page 3 (“Training the shared parameters ω of the child models”) teaches the shared parameters being updated.
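For illustration only, a minimal Python sketch contrasting the two recited alternatives (hypothetical names, a single toy weight, and a toy squared loss, under which the two routes coincide):

    batch = [(1.0, 2.0), (2.0, 4.0)]   # (x, y) instances of one mini-batch
    w, lr = 0.5, 0.1                   # a shared weight and learning rate

    # (a) back-propagating the mean loss: one gradient from the averaged loss
    grad_mean = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

    # (b) back-propagating each instance's loss separately, then
    #     accumulating/averaging the per-instance gradients
    grads = [2 * (w * x - y) * x for x, y in batch]
    grad_avg = sum(grads) / len(grads)

    w -= lr * grad_avg   # for this toy model the two routes coincide;
                         # the claim recites them as alternatives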
In regards to claim 6: The present invention claims: “wherein the neural network further comprises architecture specific weights for each network architecture of the neural network, and the method further comprises: updating the architecture specific weights for each of the selected plurality of network architectures based on the loss for each instance of the selected mini-batch.” Li teaches “Shared weights are updated by selecting a single architecture for a given minibatch and updating the shared weights by back-propagating through the network with only the edges and operations as indicated by the architecture activated. Hence, the number of architectures used to update the shared weights is equivalent to the total number of minibatch training iterations.” (Page 7)
In regards to claim 7: Claim 7 recites similar limitations to Claim 2, therefore both claims are similarly rejected.
In regards to claim 8: Claim 8 recites similar limitations to Claims 3 and/or 4, therefore both claims are similarly rejected.
In regards to claim 10: Claim 10 recites similar limitations to Claim 5, therefore both claims are similarly rejected.
In regards to claim 11: The present invention claims: “wherein the neural network comprises a main chain which comprises the set of nodes connected in series by edges, each network architecture of the neural network comprises the main chain.” See Pham Figures 1-5 and Li Figure 2, which show a DAG comprising nodes of which each architecture is a part.
In regards to claim 12: The present invention claims: “repeating the steps of claim 1 until all of the plurality of mini-batches have been selected for one time.” Li teaches “(1) Training epochs. Increasing the number of training epochs while keeping all other parameters the same increases the total number of minibatch updates and hence, the number of architectures used to update the shared weights. Intuitively, training with more architectures should help the shared weights generalize better to what are likely unseen architectures in the evaluation step. Unsurprisingly, more epochs increase the computational time required for architecture search.
(2) Batch size. Decreasing the batch size while keeping all other parameters the same also increases the number of minibatch updates but at the cost of noisier gradient update. Hence, we expect reducing the batch size to have a similar effect as increasing the number of training epochs but may necessitate adjusting other meta-hyperparameters to account for the noisier gradient update. Intuitively, more minibatch updates increase the computational time required for architecture search.” (Pages 7-8), which broadly reads on training with each mini-batch at least once.
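For illustration only, the quoted relationship (architectures used = total minibatch updates = epochs times batches per epoch) can be made concrete with Pham’s CIFAR-10 figures cited in the rejection of claim 5 above (45,000 images, minibatches of 128, 150 epochs); a minimal Python sketch:

    import math

    n_images, batch_size, epochs = 45_000, 128, 150      # Pham's CIFAR-10 figures above
    updates_per_epoch = math.ceil(n_images / batch_size) # 352 minibatch updates per epoch
    total_architectures = updates_per_epoch * epochs     # one sampled architecture per update
    print(updates_per_epoch, total_architectures)        # -> 352 52800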
In regards to claim 13: The present invention claims: “repeating the repeating step of claim 12 until a convergence condition is met.” Pham teaches “The main contribution of this work is to improve the efficiency of NAS by forcing all child models to share weights to eschew training each child model from scratch to convergence.” (Introduction) and “To prevent premature convergence, we also use a tanh constant of 2.5 and a temperature of 5.0 for the sampling logits…” (Page 5). Section 3.3 also goes into training the architectures until convergence.
In regards to claim 14: The present invention claims: “A method for inferencing by using a weight-sharing neural network, comprising: receiving an input data; randomly selecting one or more network architectures of the neural network; inferring one or more output data by the selected one or more network architectures respectively based on the input data; and obtaining a final inference data based on the one or more output data.” See above regarding how the combination of Pham and Li reads on a weight-sharing network that randomly selects architectures. Pham teaches “Third, as shown in Figure 6, the output of our ENAS cell is an average of 6 nodes. This behavior is similar to that of Mixture of Contexts (MoC) (Yang et al., 2018). Not only does ENAS independently discover MoC, but it also learns to balance between i) the number of contexts to mix, which increases the model’s expressiveness, and ii) the depth of the recurrent cell, which learns more complex transformations (Zilly et al., 2017).” (Page 6 and Figure 6).
In regards to claims 15-17: Claims 15-17 recite similar limitations to those found in claims 1-6, therefore both sets of claims are similarly rejected.
Although Claim 18 is not to be treated on the merits (See Objection above), it would merely inherit the rejection of the limitations of claim 1.
In regards to claim 19: Claim 19 merely recites a computer system performing the limitations of Claim 1, therefore both claims are similarly rejected.
In regards to claim 20: Claim 20 merely recites a computer readable storage media with instructions for performing the limitations of Claim 1, therefore both claims are similarly rejected.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Pham and Li as applied to claim 1 above, and further in view of Xie et al. (SNAS: Stochastic Neural Architecture Search, published Apr. 2020), hereinafter Xie.
In regards to claim 9: The present invention claims: “wherein the part of operations comprises batch normalization (BN) operations.” While both Pham and Li make reference to “normal” or “normalization” (Pham, Page 6, right column and Li, Page 5, Section 2.2), the combination fails to explicitly teach the limitations of Claim 9. However, Xie, in a similar field of endeavor, teaches “We employ the following techniques in our experiments: centrally padding the training images to 40 x 40 and then randomly cropping them back to 32 x 32; randomly flipping the training images horizontally; normalizing the training and validation images by subtracting the channel mean and dividing by the channel standard deviation.” (Page 16).
Pham (Section 3.2), Li (Page 5, Section 2.2), and Xie (Page 16) make reference to “normal” or “normalization” operations in the context of performing benchmarking with CIFAR-10. It would have been obvious to one of ordinary skill in the art at the time of the Applicant’s filing to leverage known methods to achieve known outcomes with the use of batch normalization operations in a system combining aspects of Pham, Li, and Xie.
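For illustration only, a minimal Python sketch of a per-channel batch normalization (BN) operation of the kind recited in claim 9 (hypothetical shapes; note that the quoted Xie passage describes channel-wise normalization of input images, to which a BN layer applies the analogous normalization over activations):

    import numpy as np

    x = np.random.rand(8, 3, 32, 32)                 # (batch, channels, height, width)
    mean = x.mean(axis=(0, 2, 3), keepdims=True)     # per-channel mean over the batch
    std = x.std(axis=(0, 2, 3), keepdims=True)       # per-channel standard deviation
    gamma, beta = 1.0, 0.0                           # learnable scale/shift parameters
    x_bn = gamma * (x - mean) / (std + 1e-5) + beta  # normalized activations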
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GRIFFIN T BEAN whose telephone number is (703)756-1473. The examiner can normally be reached M - F 7:30 - 4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GRIFFIN TANNER BEAN/Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121