DETAILED ACTION
This action is responsive to the Application filed on 3/21/2023. Claims 1-12 are pending in the case. Claims 1, 11 and 12 are independent claims.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-12 are rejected under 35 U.S.C. 101 because the claims are directed to an abstract idea without significantly more.
Regarding Claims 1, 11 and 12
Under step 1, Claim 1 is directed to a processing apparatus for optimizing a structure of a neural network, which is directed to a machine, one of the statutory categories.
Under step 1, Claim 11 is directed to a processing method for optimizing a structure of a neural network, which is directed to a process, one of the statutory categories.
Under step 1, Claim 12 is directed to a non-transitory computer readable storage medium storing a computer executable program for optimizing a structure of a neural network which is directed to an article of manufacture, one of the statutory categories.
Under Step 2A Prong 1, the claim recites the following limitations which are considered mental evaluations:
“generate a plurality of candidates for an edge of the neural network…
calculate a loss of the neural network based on a specified candidate number which is the number of candidates to be selected from the plurality of candidates, and on the inference result…
update the weight coefficient for each of the plurality of candidates based on the loss;
select candidates from the plurality of candidates based on the corresponding updated weight coefficient.”
Each of these limitations describes mental steps which can be performed in the human mind. Updating, selecting, and generating candidates are decisions about selecting and organizing abstract data. No details in the claim confine these steps to processes which cannot be performed in the mind. Further, calculating a loss based on a number to be selected and a result is a decision about an abstract idea. The loss being “of a neural network” does not tie the calculation to a computer-confined step.
Therefore, the claim recites an abstract idea.
Under Step 2A Prong 2, the claim recites the following additional element(s):
“a candidate generation unit configured to… a loss calculation unit configured to… an updating unit configured to… a selection unit configured to… with a weight coefficient set to each of the plurality of candidates for the edge… An information processing apparatus configured to learn an architecture for optimizing a structure of a neural network… An information processing method which is executed by an information processing apparatus configured to learn an architecture for optimizing a structure of a neural network (from claim 11)… A non-transitory computer-readable storage medium storing a computer-executable program for causing a computer to perform a method which is executed by an information processing apparatus configured to learn an architecture for optimizing a structure of a neural network (from claim 12)” (which amounts to mere instructions to apply a computer technology to an abstract idea, see MPEP 2106.05(f))
“with a weight coefficient set to each of the plurality of candidates for the edge” (which amounts to generally linking the use of the judicial exception to a particular technological environment or field of use, see MPEP 2106.05(h))
And “an inference unit configured to obtain an inference result by inputting learning data to the neural network” (which amounts to adding insignificant extra-solution activity to the judicial exception (see MPEP 2106.05(g)) because it does not impose meaningful limits on the claim and amounts to mere data gathering.)
The claim does not recite any further additional elements; therefore, the claim is directed to the recited abstract idea.
Under step 2B, the claim recites the additional element of “an inference unit configured to obtain an inference result by inputting learning data to the neural network”, which is an insignificant extra-solution activity that is considered a well-understood, routine, conventional activity. Collecting information and sending information to a device amounts to “transmitting or receiving data over a network”, which is well-understood, routine, and conventional activity (see MPEP 2106.05(d)(II)(i)).
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claims 2-7
Each of the limitations described in these claims, under Step 2A Prong 1, only serves to describe the abstract ideas addressed in the independent claim; in particular, the limitations describe mental evaluations.
Regarding Claim 8
The claim recites the following limitation, which further describes features of the previously recited abstract idea: “into which the loss for the inference result of the neural network and the loss related to the neural network architecture are integrated, as the loss of the neural network”. This feature merely describes the loss, which may be calculated in the mind.
The claim recites the following additional element(s), in addition to those already identified in the parent claim:
“the loss calculation unit acquires a loss” (which amounts to adding insignificant extra-solution activity to the judicial exception (see MPEP 2106.05(g)) because it does not impose meaningful limits on the claim and amounts to mere data gathering.)
Under step 2B, the claim recites the additional element of “the loss calculation unit acquires a loss”, which is an insignificant extra-solution activity that is considered a well-understood, routine, conventional activity. Collecting information and sending information to a device amounts to “transmitting or receiving data over a network”, which is well-understood, routine, and conventional activity (see MPEP 2106.05(d)(II)(i)).
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 9
Each of the limitations described in the claim, under Step 2A Prong 1, only serves to describe the abstract ideas addressed in the independent claim; in particular, the limitations describe mental evaluations.
Regarding Claim 10
The claim does not recite further abstract ideas to consider beyond those recited in the parent claim.
The claim recites the following additional element(s), in addition to those already identified in the parent claim:
“wherein the neural network is a neural network for detecting a detection target or tracking a tracking target in an image” (which amounts to generally linking the use of the judicial exception to a particular technological environment or field of use, see MPEP 2106.05(h))
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 3, 6, 7 and 8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 3 recites the limitation “a difference between a weight of individual candidates with the specified candidate number largest weights”. There is insufficient antecedent basis for the underlined portion of this limitation in the claim. Examiner notes that while there is basis for the “specified candidate number”, it is not clear from the present claim form what “specified candidate number largest weights” corresponds to. For the purposes of compact prosecution, the claim is interpreted to describe a difference between a weight of an individual candidate and a largest weight value of a plurality of candidates.
Claim 7 recites the limitation “the loss calculation unit calculates the loss based on…”. There is insufficient antecedent basis for the underlined portion of this limitation in the claim. Examiner notes that the claim recites three different losses: a loss of the neural network (from parent claim 1), a loss for the inference result of the neural network, and a loss related to a neural network architecture. It is understood that “the loss” recited here may refer to any of these three losses.
Claim 8 is rejected by virtue of dependency on claim 7.
Claim Interpretation
Examiner highlights that claims 2-4 and 6 introduce “Contingent Limitations” as noted in MPEP 2111.04:
“The broadest reasonable interpretation of a system (or apparatus or product) claim having structure that performs a function, which only needs to occur if a condition precedent is met, requires structure for performing the function should the condition occur. The system claim interpretation differs from a method claim interpretation because the claimed structure must be present in the system regardless of whether the condition is met and the function is actually performed"
That is to say, for example, that under the BRI of claim 2, the cited art should describe the structure for performing the claimed function “the loss calculation unit calculates the loss so that a value of the loss increases”. The claim merely requires a loss calculation unit capable of calculating the loss so that its value increases should the condition precedent (i.e., “there is a difference…”) occur.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chen, “Stabilizing Differentiable Architecture Search via Perturbation-based Regularization”.
Claim 1
Chen teaches, An information processing apparatus configured to learn an architecture for optimizing a structure of a neural network, the information processing apparatus comprising (pg 1 abstract “Differentiable architecture search (DARTS) is a prevailing NAS solution to identify architectures… DARTS learns a differentiable architecture weight and largely reduces the search cost… we propose a perturbation based regularization- SmoothDARTS (SDARTS)” pg 7 Table 1 “Recorded on a single GTX 1080Ti GPU” the neural network systems described were trained and recorded on physical GPU hardware. Examiner notes that the rejection frequently cites the “related works” section of the cited art. The cited art explicitly describes an extension, SDARTS, of the original DARTS method. These extensions are understood to perform all of the same principal steps of the DARTS algorithm but further modify the loss functions. Nevertheless, all of the methods are implemented separately in the experiments section.) a candidate generation unit configured to generate a plurality of candidates for an edge of the neural network; (pg 2 “Within a cell, there are N nodes organized as a DAG (Figure 2), where every node x(i) is a latent representation and every edge (i,j) is associated with a certain operation o(i,j)…. As a solution, DARTS constructs a mixed operation ō(i,j) on every edge…
[image: DARTS mixed-operation equation, ō^(i,j)(x) = Σ_{o∈O} (exp(α_o^(i,j)) / Σ_{o′∈O} exp(α_{o′}^(i,j))) · o(x)]
…where O is the candidate operation corpus” pg 4 algorithm 1
[image: Algorithm 1 from pg 4 of the cited art]
the algorithm describes first generating a mixed operation which is a plurality of candidate operations for every edge of the neural network DAG) an inference unit configured to obtain an inference result by inputting learning data to the neural network with a weight coefficient set to each of the plurality of candidates for the edge; (pg 2-3 “where O is the candidate operation corpus and α(i,j) denotes the corresponding architecture weight for operation o on edge (i,j)… And the architecture search is relaxed to learning a continuous architecture weight A = [α(i,j)]… DARTS formulates a bi-level optimization objective:…
[image: bi-level optimization objective, min_A L_val(w*(A), A) s.t. w*(A) = argmin_w L_train(w, A)]
… Then, A and w are updated via gradient descent alternately” pg 8 Section 5.5 “We perform a thorough comparison on 4 simplified search spaces…across 3 datasets… Results in Table 4 are obtained by running every method 4 independent times and pick the final architecture” in the context of neural network training, “running” using “SGD” means that the given dataset is input into the model to receive an output used to update the parameters while minimizing the given loss function, corresponding to the claim limitation. Examiner notes that the specification similarly describes updating via SGD.) a loss calculation unit configured to calculate a loss of the neural network based on a specified candidate number which is the number of candidates to be selected from the plurality of candidates, and on the inference result (pg 4
[image: update equations from pg 4 of the cited art]
updating the weight and architecture is based on the calculation of the losses, which, in gradient-descent-style training, is based on the training data provided to generate an inference result, indicated by L_train. Pg 6 Section 5.2 “We employ SDARTS-RS and SDARTS-ADV to search CNN cells on CIFAR-10 following the search space (with 7 possible operations)” the loss is based in part on the specified candidate number of possible operations; here it is 7.) an updating unit configured to update the weight coefficient for each of the plurality of candidates based on the loss; (pg 3 “DARTS formulates a bi-level optimization objective:…
[image: bi-level optimization objective from pg 3 of the cited art]
…Then, A and w are updated via gradient descent alternately,”) and a selection unit configured to select candidates from the plurality of candidates based on the corresponding updated weight coefficient. (pg 3 “After search, DARTS simply prunes out operations on every edge except the one with the largest architecture weight when evaluation” pruning is after training or search, thus based on the updated weight coefficient.)
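For context only, the DARTS-style steps mapped above (a softmax-weighted mixed operation over candidate operations, updating the architecture weights against a loss, and selecting the candidate with the largest weight) can be sketched as follows. This is an independent illustrative sketch, not code from the cited art; the toy candidate operations, loss, and finite-difference update are hypothetical stand-ins for the art's operation corpus and gradient-descent training:

```python
import numpy as np

# Hypothetical candidate operations for a single edge (stand-ins for
# conv/pool/skip operations in the cited art's operation corpus O).
candidates = [lambda x: x,                     # identity
              lambda x: np.maximum(x, 0.0),    # ReLU-like op
              lambda x: 0.5 * x]               # scaling op
alpha = np.zeros(len(candidates))              # architecture weights for this edge

def mixed_op(x, alpha):
    """Softmax-weighted sum over candidate operations (the mixed operation)."""
    w = np.exp(alpha - alpha.max())
    w /= w.sum()
    return sum(wi * op(x) for wi, op in zip(w, candidates))

# Toy "inference + loss": the target happens to equal 0.5 * x, so the
# scaling candidate should end up with the largest architecture weight.
x = np.array([1.0, -2.0, 3.0])
target = 0.5 * x
lr = 0.5
for _ in range(100):
    # Finite-difference gradient on alpha (a stand-in for backpropagation).
    base = np.mean((mixed_op(x, alpha) - target) ** 2)
    grad = np.zeros_like(alpha)
    for i in range(len(alpha)):
        a = alpha.copy()
        a[i] += 1e-4
        grad[i] = (np.mean((mixed_op(x, a) - target) ** 2) - base) / 1e-4
    alpha -= lr * grad                         # update the weight coefficients

# "Selection unit": keep the candidate with the largest architecture weight,
# mirroring DARTS-style pruning after search.
selected = int(np.argmax(alpha))
```

After the toy search, `selected` identifies the scaling candidate, illustrating how selection is based on the updated weight coefficients rather than on any per-step decision.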
Claim 2
Chen teaches claim 1
Chen teaches, in a case where there is a difference between the specified candidate number and the number of candidates to be selected by the selection unit, the loss calculation unit calculates the loss so that a value of the loss increases. (pg 5 Results “Also, the trajectory (mean ± std) of the spectral norm of
∇²_A L_valid is shown in Figure 5.
[image: Figure 5, trajectory (mean ± std) of the spectral norm of the Hessian ∇²_A L_valid]
The spiky curve, which is the spectral norm of the Hessian of the loss value, indicates that the loss value is increasing and decreasing, corresponding to the claim. This is a calculation of a loss value and thus corresponds to the structure of the claim; the BRI does not require the condition to occur. Nevertheless, the cited art describes the condition: Pg 6-7 Section 5.2 “We employ SDARTS-RS and SDARTS-ADV to search CNN cells on CIFAR-10 following the search space (with 7 possible operations) in DARTS” pg 8 “All simplified search spaces only contain a portion of candidate operations…Results in Table 4 are obtained by running every method 4 independent times and pick the final architecture based on the validation accuracy” pg 3 “After search, DARTS simply prunes out operations on every edge except the one with the largest architecture weight when evaluation” the selected architecture prunes out every candidate except one; thus there is a difference between the specified candidate number and the selected candidate number.)
Claim 3
Chen teaches claim 1
Chen teaches, in a case where a difference between a weight of individual candidates with the specified candidate number largest weights, out of the plurality of candidates, and weights of the other candidates is smaller than a predetermined threshold value, the loss calculation unit calculates the loss so that a value of the loss increases. (pg 5 Results “Also, the trajectory (mean ± std) of the spectral norm of
∇²_A L_valid is shown in Figure 5.
[image: Figure 5, trajectory (mean ± std) of the spectral norm of the Hessian ∇²_A L_valid]
The spiky curve, which is the spectral norm of the Hessian of the loss value, indicates that the loss value is increasing and decreasing, corresponding to the claim. This is a calculation of a loss value and thus corresponds to the structure of the claim; the BRI does not require the condition to occur. Nevertheless, the cited art describes the condition: Pg 6-7 Section 5.2 “We employ SDARTS-RS and SDARTS-ADV to search CNN cells on CIFAR-10 following the search space (with 7 possible operations) in DARTS” pg 8 “All simplified search spaces only contain a portion of candidate operations…Results in Table 4 are obtained by running every method 4 independent times and pick the final architecture based on the validation accuracy” pg 3 “After search, DARTS simply prunes out operations on every edge except the one with the largest architecture weight when evaluation” the selected architecture prunes out every candidate except one; thus a difference between the largest weight value and another weight value may be smaller than some threshold value.)
Claim 4
Chen teaches claim 1
Chen teaches, wherein, in a case where, with the plurality of candidates sorted in descending order of weights thereof, a difference between the K-th and the (K + 1)-th largest weights for candidates is smaller than a predetermined threshold value, the loss calculation unit calculates the loss so that a value of the loss increases, and wherein K is the specified candidate number. (pg 5 Results “Also, the trajectory (mean ± std) of the spectral norm of
∇²_A L_valid is shown in Figure 5.
[image: Figure 5, trajectory (mean ± std) of the spectral norm of the Hessian ∇²_A L_valid]
The spiky curve, which is the spectral norm of the Hessian of the loss value, indicates that the loss value is increasing and decreasing, corresponding to the claim. This is a calculation of a loss value and thus corresponds to the structure of the claim; the BRI does not require the condition to occur. Nevertheless, the cited art describes the condition: Pg 6-7 Section 5.2 “We employ SDARTS-RS and SDARTS-ADV to search CNN cells on CIFAR-10 following the search space (with 7 possible operations) in DARTS” pg 8 “All simplified search spaces only contain a portion of candidate operations…Results in Table 4 are obtained by running every method 4 independent times and pick the final architecture based on the validation accuracy” pg 3 “After search, DARTS simply prunes out operations on every edge except the one with the largest architecture weight when evaluation” K is the number of candidates to be selected by the system. When sorted, there is a difference between the K-th and (K+1)-th candidates which is smaller than an existing threshold number. Examiner notes the claim does not strictly require sorting. At most, the limitation recites that a difference between the two largest weights is smaller than a threshold. For example, for the vector [5, 2, 7, 2] there exists a difference between the K-th and (K+1)-th elements, when sorted in descending order, which is smaller than a threshold value.)
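For illustration only, the claimed check on the gap between the K-th and (K+1)-th largest candidate weights can be sketched as below. The function name, the fixed penalty scheme, and the threshold values are hypothetical, not drawn from the claim or the cited art:

```python
def gap_penalty(weights, k, threshold, penalty=1.0):
    """Hypothetical sketch of the claimed check: if the gap between the K-th
    and (K+1)-th largest candidate weights is below a threshold, the loss is
    increased by a penalty term."""
    ordered = sorted(weights, reverse=True)   # descending order of weights
    gap = ordered[k - 1] - ordered[k]         # K-th minus (K+1)-th largest
    return penalty if gap < threshold else 0.0

# Using the example vector [5, 2, 7, 2] with K = 1: sorted descending gives
# [7, 5, 2, 2], so the gap between the 1st and 2nd largest weights is 2.
extra = gap_penalty([5, 2, 7, 2], k=1, threshold=3)   # gap 2 < threshold 3
```

Here `extra` is the penalty value because the gap (2) falls below the threshold (3); with a threshold of 1 the same weights would incur no penalty.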
Claim 5
Chen teaches claim 1
Chen teaches, wherein the loss calculation unit calculates the loss based on a maximal value of the specified candidate number (pg 5 “SDARTS-ADV ensures that the validation loss is small under the worst-case perturbation of A. If we assume the Hessian matrix is roughly constant within-ball, then adversarial training implicitly minimizes…
[image: implicit minimization objective from pg 5 of the cited art]
” as previously stated, A is the set of candidate architectures; the loss calculation above is based in part on a maximal value of the specified number of candidate architectures, A.)
Claim 6
Chen teaches claim 5
Chen teaches, wherein, in a case where the number of candidates having a weight exceeding a predetermined threshold value exceeds the maximal value of the specified candidate number, the loss calculation unit calculates the loss so that a value of the loss increases. (pg 5 Results “Also, the trajectory (mean ± std) of the spectral norm of
∇²_A L_valid is shown in Figure 5.
[image: Figure 5, trajectory (mean ± std) of the spectral norm of the Hessian ∇²_A L_valid]
The spiky curve, which is the spectral norm of the Hessian of the loss value, indicates that the loss value is calculated as increasing and decreasing, corresponding to the claim. This is a calculation of a loss value and thus corresponds to the structure of the claim; the BRI does not require the condition to occur. Nevertheless, the cited art describes the condition: Pg 6-7 Section 5.2 “We employ SDARTS-RS and SDARTS-ADV to search CNN cells on CIFAR-10 following the search space (with 7 possible operations) in DARTS” pg 8 “All simplified search spaces only contain a portion of candidate operations…Results in Table 4 are obtained by running every method 4 independent times and pick the final architecture based on the validation accuracy” pg 3 “After search, DARTS simply prunes out operations on every edge except the one with the largest architecture weight when evaluation” the number of candidates having a weight exceeding a threshold value is greater than the maximal value of specified candidates. In this case the maximal value is 1 per edge.)
Claim 7
Chen teaches claim 1
Chen teaches, wherein the loss calculation unit calculates a loss for the inference result of the neural network and a loss related to a neural network architecture, (pg 3 Section 3.2 “This leads to the following two versions of SDARTS by redefining w(A):
[image: equations defining the two versions of SDARTS from Section 3.2]
” pg 4 Section 3.3 “Similar to DARTS, our algorithm is based on alternating minimization between A and w” as shown in the equations, the loss is calculated by alternating between the loss related to a neural network architecture, L_val, and the loss for the inference result, L_train) and wherein, in the calculation of the loss related to the neural network architecture, the loss calculation unit calculates the loss based on the specified candidate number and the inference result (the above describes bi-level optimization between two objectives; the architecture loss is thus based on both the number of architectures in A and on the training loss, which is based on the inference result.)
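For context, the alternating bi-level updates quoted above (“alternating minimization between A and w”) can be sketched on a toy problem. The quadratic losses, learning rate, and iteration count here are hypothetical stand-ins, not the cited art's L_train or L_val; the sketch only illustrates the alternating gradient-descent structure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy bi-level setup: w stands in for model weights, A for architecture weights.
w = rng.normal(size=3)
A = np.zeros(3)

def train_loss(w, A):
    # Hypothetical stand-in for L_train(w, A): ties w to the current A.
    return np.sum((w - A) ** 2) + 0.1 * np.sum(w ** 2)

def val_loss(w, A):
    # Hypothetical stand-in for L_val(w*(A), A): pulls A toward a target.
    return np.sum((A - 1.0) ** 2) + 0.1 * np.sum((w - A) ** 2)

lr = 0.1
for _ in range(200):
    # Alternate gradient steps: first w on the training loss, then A on the
    # validation loss, mirroring "A and w are updated via gradient descent
    # alternately".
    w -= lr * (2 * (w - A) + 0.2 * w)          # d(train_loss)/dw
    A -= lr * (2 * (A - 1.0) - 0.2 * (w - A))  # d(val_loss)/dA
```

On this toy problem the alternating updates converge to a joint fixed point where both losses are small, which is the behavior the bi-level formulation aims for.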
Claim 8
Chen teaches claim 7
Chen teaches, wherein the loss calculation unit acquires a loss into which the loss for the inference result of the neural network and the loss related to the neural network architecture are integrated, as the loss of the neural network. ( pg 3 Section 3.2 “This leads to the following two versions of SDARTS by redefining w(A):
[image: equations defining the two versions of SDARTS from Section 3.2]
” these losses are integrated as noted in the mathematical shorthand “s.t.”)
Claim 9
Chen teaches claim 1
Chen teaches, wherein a weight of each candidate is a weight coefficient indicating an importance. (pg 2 Section 2.1 “where O is the candidate operation corpus and α(i,j) o denotes the corresponding architecture weight for operation o on edge (i,j).” by definition, in the context of neural networks, weights indicate the importance of an edge.)
Claim 10
Chen teaches claim 1
Chen teaches, wherein the neural network is a neural network for detecting a detection target or tracking a tracking target in an image. (Section 5.3 pg 7 “We test the transferability of our discovered cells on ImageNet” Table 2 pg 8 caption “Comparison with state-of-the-art image classifiers on ImageNet in the mobile setting” the neural network cells are for image classification, thus detecting a detection target)
Claim 11
Chen teaches, An information processing method which is executed by an information processing apparatus configured to learn an architecture for optimizing a structure of a neural network, the information processing method comprising: (pg 1 abstract “Differentiable architecture search (DARTS) is a prevailing NAS solution to identify architectures… DARTS learns a differentiable architecture weight and largely reduces the search cost… we propose a perturbation based regularization- SmoothDARTS (SDARTS)” pg 7 Table 1 “Recorded on a single GTX 1080Ti GPU” the neural network systems described were trained and recorded on physical GPU hardware)
The remaining limitations are rejected for the reasons set forth in the rejection of claim 1.
Claim 12
Chen teaches, A non-transitory computer-readable storage medium storing a computer-executable program for causing a computer to perform a method which is executed by an information processing apparatus configured to learn an architecture for optimizing a structure of a neural network, (pg 1 abstract “Differentiable architecture search (DARTS) is a prevailing NAS solution to identify architectures… DARTS learns a differentiable architecture weight and largely reduces the search cost… we propose a perturbation based regularization- SmoothDARTS (SDARTS)” pg 7 Table 1 “Recorded on a single GTX 1080Ti GPU” the neural network systems described were trained and recorded on physical GPU hardware)
The remaining limitations are rejected for the reasons set forth in the rejection of claim 1.
Conclusion
Prior art made of record but not relied upon:
Li et al. “SGAS: Sequential Greedy Architecture Search” describes ranking and sorting neural network architectures during the neural architecture search.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 9:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.R.G./
Examiner, Art Unit 2122
/KAKALI CHAKI/ Supervisory Patent Examiner, Art Unit 2122