Prosecution Insights
Last updated: April 19, 2026
Application No. 18/208,157

BLOCK-WISE NEURAL ARCHITECTURE SEARCH USING GUIDED SEARCH ALGORITHM

Non-Final OA: §101, §102, §103, §112
Filed
Jun 09, 2023
Examiner
ILES, TYLER EDWARD
Art Unit
2122
Tech Center
2100 — Computer Architecture & Software
Assignee
NXP B.V.
OA Round
1 (Non-Final)
67%
Grant Probability
Favorable
1-2
OA Rounds
3y 3m
To Grant
99%
With Interview

Examiner Intelligence

Grants 67% — above average
67%
Career Allow Rate
2 granted / 3 resolved
+11.7% vs TC avg
Strong +50% interview lift
+50.0%
Interview Lift
resolved cases with vs. without interview
Typical timeline
3y 3m
Avg Prosecution
21 currently pending
Career history
24
Total Applications
across all art units

Statute-Specific Performance

§101: 29.5% (-10.5% vs TC avg)
§103: 42.6% (+2.6% vs TC avg)
§102: 15.6% (-24.4% vs TC avg)
§112: 12.3% (-27.7% vs TC avg)
TC averages are estimates • Based on career data from 3 resolved cases

Office Action

§101 §102 §103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is in response to a patent application filed on June 9, 2023. Claims 1-20 are pending in the current application.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 4, 10 and 17 recite the limitation "the second initial seed network". There is insufficient antecedent basis for this limitation in the claims, which are therefore rejected under 35 U.S.C. 112(b). To examine the claims on their merits, the limitation is interpreted as "the initial seed network".

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, under Step 1 of the Subject Matter Eligibility Test for Products and Processes, the claim is directed towards a machine, which is one of the four statutory categories. Next, under the Step 2A Prong 1 analysis, the claim recites the following limitations, each of which is interpreted, under the broadest reasonable interpretation, to be an abstract idea (a mental process):

- dividing an initial seed network into a plurality of blocks to form a network search space, wherein the network search space includes a plurality of a candidate neural architectures;
- defining, for each block in the plurality of blocks, a plurality of sample-based search spaces, wherein each sample-based search space includes a plurality of candidate block configurations, and the plurality of candidate block configurations are determined by determining candidate block configurations that minimize a block-wise knowledge distillation loss;
- determining a first set of block configurations that are Pareto optimal block configurations from the plurality of candidate block configurations in the plurality of sample-based search spaces;
- determining a plurality of sub-super-net search spaces for each block configuration in the first set of block configurations; and
- determining an optimized neural architecture by determining a first trained candidate model of the plurality of trained candidate models that minimizes a knowledge distillation loss of the plurality of trained candidate models.

The claim is therefore examined under Step 2A Prong 2, which considers the additional elements within the claim. The claim's additional elements are: one or more processors; one or more non-transitory computer-readable media that stores instructions; and training, using a set of input training data, each of the plurality of sub-super-net search spaces to generate a plurality of trained candidate models. The limitations, as drafted, merely recite instructions to apply a judicial exception: the claim instructs to use one or more processors and media as tools to perform the abstract idea, and to use a set of input training data to train each of the plurality of sub-super-net search spaces to then generate a plurality of models. (See MPEP 2106.05(f).) Therefore, these additional elements do not integrate the abstract idea into a practical application, and the claim is directed to an abstract idea. Under the Step 2B analysis, the claim's additional elements do not amount to significantly more than the judicial exception, as explained above under Step 2A Prong 2. Therefore, the claim is ineligible.

Regarding claim 6, under Step 1 of the Subject Matter Eligibility Test for Products and Processes, the claim is directed towards a machine, which is one of the four statutory categories. Next, under the Step 2A Prong 1 analysis, the claim recites the following limitations, each of which is interpreted, under the broadest reasonable interpretation, to be an abstract idea (a mental process):

- determining a network search space that includes a plurality of a candidate neural architectures based on an initial seed network;
- defining, for each block in the plurality of blocks, a plurality of sample-based search spaces, wherein each sample-based search space includes a plurality of candidate block configurations;
- determining a plurality of sub-super-net search spaces for each block configuration in a first set of block configurations of the plurality of candidate block configurations; and
- determining an optimized neural architecture by determining a first trained candidate model of the plurality of trained candidate models, wherein the first trained candidate model minimizes a knowledge distillation loss of the plurality of trained candidate models.

The claim is therefore examined under Step 2A Prong 2. The claim's additional elements are: one or more processors; one or more non-transitory computer-readable media that stores instructions; and training, using a set of input training data, each of the plurality of sub-super-net search spaces to generate a plurality of trained candidate models. The limitations, as drafted, merely recite instructions to apply a judicial exception: the claim instructs to use one or more processors and media as tools to perform the abstract idea, and to use a set of input training data to train each of the plurality of sub-super-net search spaces to then generate a plurality of models. (See MPEP 2106.05(f).) Therefore, these additional elements do not integrate the abstract idea into a practical application, and the claim is directed to an abstract idea. Under the Step 2B analysis, the claim's additional elements do not amount to significantly more than the judicial exception. Therefore, the claim is ineligible.

Regarding claim 14, under Step 1 of the Subject Matter Eligibility Test for Products and Processes, the claim is directed towards a process, which is one of the four statutory categories.
Next, under the Step 2A Prong 1 analysis, the claim recites the following limitations, each of which is interpreted, under the broadest reasonable interpretation, to be an abstract idea (a mental process):

- determining a network search space that includes a plurality of a candidate neural architectures based on an initial seed network;
- defining, for each block in the plurality of blocks, a plurality of sample-based search spaces, wherein each sample-based search space includes a plurality of candidate block configurations;
- determining a plurality of sub-super-net search spaces for each block configuration in a first set of block configurations in the plurality of candidate block configurations; and
- determining an optimized neural architecture by determining a first trained candidate model of the plurality of trained candidate models, wherein the first trained candidate model minimizes a knowledge distillation loss of the plurality of trained candidate models.

The claim is therefore examined under Step 2A Prong 2, which considers the additional elements within the claim. The claim's additional elements are: one or more processors; one or more non-transitory computer-readable media that stores instructions; and training, using a set of input training data, each of the plurality of sub-super-net search spaces to generate a plurality of trained candidate models. The limitations, as drafted, merely recite instructions to apply a judicial exception: the claim instructs to use one or more processors and media as tools to perform the abstract idea, and to use a set of input training data to train each of the plurality of sub-super-net search spaces to then generate a plurality of models. (See MPEP 2106.05(f).) Therefore, these additional elements do not integrate the abstract idea into a practical application, and the claim is directed to an abstract idea. Under the Step 2B analysis, the claim's additional elements do not amount to significantly more than the judicial exception. Therefore, the claim is ineligible.

Regarding claims 2, 7 and 15, the claims recite "storing information describing each block in the plurality of blocks of the initial seed network into a database." The limitation, as drafted, is considered to be insignificant extra-solution activity (see MPEP 2106.05(g)), and is additionally considered to be well-understood, routine, and conventional, as it amounts to storing and retrieving information in memory (see MPEP 2106.05(d)(II)). Therefore, the claims are rejected on the same basis as claims 1, 6, and 14.

Regarding claims 3, 8, and 16, the claims recite "the information describing each block includes at least one of an input resolution, an output resolution, a position in the seed network, and a size of each block." The limitation, as drafted, merely describes the particular technological environment and field of use, and "generally links" the input resolution, output resolution, position in the seed network, and size of each block to the information describing each block (see MPEP 2106.05(h)). Therefore, the claims are rejected on the same basis as claims 2, 7, and 15.
Regarding claims 4, 10, and 17, the claims recite "dividing an initial seed network into a second plurality of blocks, wherein the second initial seed network includes a second plurality of a candidate neural architectures; and retrieving, from the database, information describing at least one block of the second plurality of blocks." The "dividing an initial seed network into a second plurality of blocks, wherein the second initial seed network includes a second plurality of a candidate neural architectures" is considered to be, under the broadest reasonable interpretation, a mental process, which is a grouping of abstract ideas. The "retrieving, from the database, information describing at least one block of the second plurality of blocks" is a limitation that, as drafted, is considered to be insignificant extra-solution activity (see MPEP 2106.05(g)), and is additionally considered to be well-understood, routine, and conventional, as it amounts to storing and retrieving information in memory (see MPEP 2106.05(d)(II)). Therefore, the claims are rejected on the same basis as claims 2, 7, and 15.

Regarding claims 5, 11, and 18, the claims recite "the step of retrieving, from the database, information describing at least one block of the second plurality of blocks is performed before defining, for each block in the second plurality of blocks, a second plurality of sample-based search spaces, wherein each sample-based search space in the second plurality of sample-based search spaces includes a second plurality of candidate block configurations, and the second plurality of candidate block configurations are determined by determining candidate block configurations that minimize the block-wise knowledge distillation loss." The limitations, as drafted, are interpreted to be, under the broadest reasonable interpretation, mental processes, which are a grouping of abstract ideas. Therefore, the claims are rejected on the same basis as claims 4, 10, and 17.

Regarding claims 12 and 19, the claims recite "determining the first set of block configurations includes determining the first set of block configurations are Pareto optimal block configurations." The limitation, as drafted, is considered to be, under the broadest reasonable interpretation, a mental process, which is a grouping of abstract ideas. Therefore, the claims are rejected on the same basis as claims 6 and 14.

Regarding claims 13 and 20, the claims recite "training, using a set of input training data, each of the plurality of sub-super-net search spaces to generate the plurality of trained candidate models." The limitations, as drafted, merely recite instructions to apply a judicial exception, as they instruct to use a set of input training data to train each of the plurality of sub-super-net search spaces to then generate a plurality of models (see MPEP 2106.05(f)). Therefore, the claims are rejected on the same basis as claims 6 and 14.
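The §101 discussion above enumerates the claimed steps: dividing a seed network into blocks, building per-block sample-based search spaces that minimize a block-wise knowledge-distillation loss, filtering Pareto-optimal configurations, training sub-super-net spaces, and selecting the minimum-loss model. For orientation, a minimal Python sketch of that pipeline follows; every name and the toy loss function are illustrative assumptions, not the applicant's implementation or the cited art's code.

```python
# Purely illustrative sketch of the claimed pipeline; all names and the
# toy loss are hypothetical assumptions, not the applicant's implementation.
from itertools import product

def divide_into_blocks(seed_layers, num_blocks):
    """Split the seed network's layers into contiguous blocks."""
    size = max(1, len(seed_layers) // num_blocks)
    return [seed_layers[i:i + size] for i in range(0, len(seed_layers), size)]

def sample_search_space(block_idx):
    """Enumerate candidate configurations for one block (assumed axes:
    depth and width multiplier)."""
    return [{"block": block_idx, "depth": d, "width": w}
            for d, w in product([2, 3, 4], [0.5, 1.0, 1.5])]

def block_kd_loss(cfg):
    """Toy stand-in for the block-wise knowledge-distillation loss."""
    return abs(cfg["depth"] - 3) + abs(cfg["width"] - 1.0)

def search(seed_layers, num_blocks=4):
    blocks = divide_into_blocks(seed_layers, num_blocks)
    # Per-block sample-based search spaces, keeping low-KD-loss configurations.
    spaces = [sorted(sample_search_space(i), key=block_kd_loss)[:3]
              for i in range(len(blocks))]
    # A real system would filter Pareto-optimal configurations here and build
    # sub-super-net search spaces around them; we just take the best per block.
    first_set = [space[0] for space in spaces]
    # Stand-in for training each sub-super-net and picking the candidate
    # model with the smallest knowledge-distillation loss.
    return min(first_set, key=block_kd_loss)

print(search([f"layer{i}" for i in range(8)]))
```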
Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 6, 13, 14 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Changlin Li et al. (herein referred to as Li) (Block-wisely Supervised Neural Architecture Search with Knowledge Distillation) (as cited in the IDS).

Regarding claim 6, Li teaches a computing system, comprising: one or more processors; and one or more non-transitory computer-readable media that stores instructions that, when executed by the one or more processors, cause the computing system to implement a neural architecture search (while not explicitly disclosed in Li, one would implicitly need these components to run the NAS with knowledge distillation of Li) by performing the steps of:

determining a network search space that includes a plurality of a candidate neural architectures based on an initial seed network ("We consider a network architecture has several blocks, conceptualized as analogous to the ventral visual blocks… As Fig. 1 shows, we find that different blocks of an existing architecture have different knowledge in extracting different patterns of an image", Figure 1; pg. 2, left column, third paragraph) (Figure 1 shows candidate neural architectures.)

defining, for each block in the plurality of blocks, a plurality of sample-based search spaces, wherein each sample-based search space includes a plurality of candidate block configurations ("we propose to modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained; this reduces the representation shift caused by the shared parameters and leads to the correct rating of the candidates.", pg. 1, Abstract) (The search spaces are divided into blocks.)

determining a plurality of sub-super-net search spaces for each block configuration in a first set of block configurations of the plurality of candidate block configurations ("To improve the accuracy of the evaluation, we divide the supernet into blocks of smaller sub-space.", pg. 3, right column, under "Block-wise NAS.")

training, using a set of input training data, each of the plurality of sub-super-net search spaces to generate a plurality of trained candidate models ("Illustration of our DNA. The teacher's previous feature map is used as input for both teacher and student block. Each cell of the supernet is trained independently to mimic the behavior of the corresponding teacher block by minimizing the l2-distance between their output feature maps. The dotted lines indicate randomly sampled paths in a cell… As shown in Figure 2, in each training step, the teacher's previous feature map is first fed to several cells (as suggested by the solid line), and one of the candidate operations of each layer in the cell is randomly chosen to form a path (as suggested by the dotted line).", pg. 4, Figure 2; pg. 5, left column, second paragraph)

and determining an optimized neural architecture by determining a first trained candidate model of the plurality of trained candidate models, wherein the first trained candidate model minimizes a knowledge distillation loss of the plurality of trained candidate models. ("We tested two progressive block-wise distillation strategy and compare their effectiveness with ours by experiments. All the three strategy is performed block by block by minimizing the MSE loss between feature maps of student supernet and the teacher.", pg. 7, right column, under "4.4. Ablation Study"; see also Tables 4 and 5 on pg. 8) (A student model is selected with minimal distillation loss and high accuracy.)

Regarding claim 14, Li teaches a method, comprising:

determining a network search space that includes a plurality of a candidate neural architectures based on an initial seed network ("We consider a network architecture has several blocks, conceptualized as analogous to the ventral visual blocks… As Fig. 1 shows, we find that different blocks of an existing architecture have different knowledge in extracting different patterns of an image", Figure 1; pg. 2, left column, third paragraph) (Figure 1 shows candidate neural architectures.)

defining, for each block in the plurality of blocks, a plurality of sample-based search spaces, wherein each sample-based search space includes a plurality of candidate block configurations ("we propose to modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained; this reduces the representation shift caused by the shared parameters and leads to the correct rating of the candidates.", pg. 1, Abstract) (The search spaces are divided into blocks.)

determining a plurality of sub-super-net search spaces for each block configuration in a first set of block configurations in the plurality of candidate block configurations ("To improve the accuracy of the evaluation, we divide the supernet into blocks of smaller sub-space.", pg. 3, right column, under "Block-wise NAS.")

training, using a set of input training data, each of the plurality of sub-super-net search spaces to generate a plurality of trained candidate models ("Illustration of our DNA. The teacher's previous feature map is used as input for both teacher and student block. Each cell of the supernet is trained independently to mimic the behavior of the corresponding teacher block by minimizing the l2-distance between their output feature maps. The dotted lines indicate randomly sampled paths in a cell… As shown in Figure 2, in each training step, the teacher's previous feature map is first fed to several cells (as suggested by the solid line), and one of the candidate operations of each layer in the cell is randomly chosen to form a path (as suggested by the dotted line).", pg. 4, Figure 2; pg. 5, left column, second paragraph)

and determining an optimized neural architecture by determining a first trained candidate model of the plurality of trained candidate models, wherein the first trained candidate model minimizes a knowledge distillation loss of the plurality of trained candidate models. ("We tested two progressive block-wise distillation strategy and compare their effectiveness with ours by experiments. All the three strategy is performed block by block by minimizing the MSE loss between feature maps of student supernet and the teacher.", pg. 7, right column, under "4.4. Ablation Study"; see also Tables 4 and 5 on pg. 8) (A student model is selected with minimal distillation loss and high accuracy.)

Regarding claims 13 and 20, Li teaches the computing system and method of claims 6 and 14, respectively, as well as training, using a set of input training data, each of the plurality of sub-super-net search spaces to generate the plurality of trained candidate models. ("We evaluated our method on ImageNet [11], a large-scale classification dataset that has been used to evaluate various NAS methods. During the architecture search, we randomly select 50 images from each class of the original training set to form a 50k-image validation set for the rating step of the NAS and use the remainder as the supernet training set. After that, all of our searched architectures are retrained from scratch on the original training set without supervision from the teacher network and tested on the original validation set.", pg. 6, left column, under "4.1 Setups")
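The training passage Li is cited for describes per-block supervision: each student block learns to mimic the corresponding teacher block's output feature map by minimizing an l2 (MSE) distance. A minimal PyTorch sketch of that objective, using randomly initialized toy modules as stand-ins for real teacher and student blocks (shapes, learning rate, and step count are assumptions):

```python
# Hedged sketch of the quoted per-block distillation objective; module
# shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher_block = nn.Conv2d(16, 16, kernel_size=3, padding=1)  # stand-in teacher block
student_block = nn.Conv2d(16, 16, kernel_size=3, padding=1)  # stand-in student block
optimizer = torch.optim.SGD(student_block.parameters(), lr=0.1)

# Per the quote, the teacher's *previous* feature map feeds both blocks.
prev_feature_map = torch.randn(8, 16, 32, 32)

for step in range(10):
    with torch.no_grad():
        target = teacher_block(prev_feature_map)   # teacher block's output
    output = student_block(prev_feature_map)       # student block's output
    loss = F.mse_loss(output, target)              # l2 / MSE distillation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Li's method additionally samples a random path through each cell's candidate operations at every step; the sketch fixes a single student block for brevity.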
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Yuanzheng Ci et al. (herein referred to as Ci) (Evolving Search Space for Neural Architecture Search).

Regarding claim 1, Li teaches one or more processors; and one or more non-transitory computer-readable media that stores instructions that, when executed by the one or more processors, cause the computing system to implement a neural architecture search (while not explicitly disclosed in Li, one would implicitly need these components to run the NAS with knowledge distillation of Li) by performing the steps of:

dividing an initial seed network into a plurality of blocks to form a network search space, wherein the network search space includes a plurality of a candidate neural architectures ("We consider a network architecture has several blocks, conceptualized as analogous to the ventral visual blocks… As Fig. 1 shows, we find that different blocks of an existing architecture have different knowledge in extracting different patterns of an image", Figure 1; pg. 2, left column, third paragraph) (Figure 1 shows candidate neural architectures.)

defining, for each block in the plurality of blocks, a plurality of sample-based search spaces ("we propose to modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained; this reduces the representation shift caused by the shared parameters and leads to the correct rating of the candidates.", pg. 1, Abstract; see also Figure 1) (The search spaces are divided into blocks.)

wherein each sample-based search space includes a plurality of candidate block configurations, and the plurality of candidate block configurations are determined by determining candidate block configurations that minimize a block-wise knowledge distillation loss ("All the three strategy is performed block by block by minimizing the MSE loss between feature maps of student supernet and the teacher", pg. 7, right column, bottom paragraph) (The MSE loss between teacher and student corresponds to a distillation loss.)

determining a plurality of sub-super-net search spaces for each block configuration in the first set of block configurations ("To improve the accuracy of the evaluation, we divide the super-net into blocks of smaller sub-space.", pg. 3, right column, under "Block-wise NAS.")

training, using a set of input training data, each of the plurality of sub-super-net search spaces to generate a plurality of trained candidate models ("Illustration of our DNA. The teacher's previous feature map is used as input for both teacher and student block. Each cell of the supernet is trained independently to mimic the behavior of the corresponding teacher block by minimizing the l2-distance between their output feature maps. The dotted lines indicate randomly sampled paths in a cell… As shown in Figure 2, in each training step, the teacher's previous feature map is first fed to several cells (as suggested by the solid line), and one of the candidate operations of each layer in the cell is randomly chosen to form a path (as suggested by the dotted line).", pg. 4, Figure 2; pg. 5, left column, second paragraph)

and determining an optimized neural architecture by determining a first trained candidate model of the plurality of trained candidate models that minimizes a knowledge distillation loss of the plurality of trained candidate models. ("We tested two progressive block-wise distillation strategy and compare their effectiveness with ours by experiments. All the three strategy is performed block by block by minimizing the MSE loss between feature maps of student supernet and the teacher.", pg. 7, right column, under "4.4. Ablation Study"; see also Tables 4 and 5 on pg. 8) (A student model is selected with minimal distillation loss and high accuracy.)

However, Li does not explicitly teach determining a first set of block configurations that are Pareto optimal block configurations from the plurality of candidate block configurations in the plurality of sample-based search spaces. Ci teaches determining a first set of block configurations that are Pareto optimal block configurations from the plurality of candidate block configurations in the plurality of sample-based search spaces ("During the iterative process, instead of keeping a single architecture as the intermediate result, we combine all architectures on the Pareto front found by a supernet trained with One-Shot [2] method to obtain an optimized search space, which will be inherited to the next round of search.", pg. 2, left column, bottom paragraph).

Therefore, it would have been considered obvious to one of ordinary skill in the art, prior to the filing date of the current application, to combine the block-wise network architecture of Li with the Pareto optimization of Ci. One would be motivated to combine the two teachings because Pareto optimal architectures help obtain optimal search spaces, as disclosed in Ci ("After Pareto front retrieval, we take the union of operations from all P Pareto-optimal architectures to get the optimized search space Â_s. Mathematically, we denote e_l^p = {op_n^l | g_n^l = 1, n ∈ K_l} as the selected operations of the l-th layer for the p-th Pareto-optimal architecture a_p, and denote Ê_l^s as the optimized search space subset of the layer l in Â_s", pg. 5, right column, under "Aggregation.").

Regarding claims 12 and 19, Li teaches the computing system of claim 6 and the method of claim 14, respectively, but does not explicitly teach determining the first set of block configurations are Pareto optimal block configurations. Ci teaches determining the first set of block configurations are Pareto optimal block configurations ("During the iterative process, instead of keeping a single architecture as the intermediate result, we combine all architectures on the Pareto front found by a supernet trained with One-Shot [2] method to obtain an optimized search space, which will be inherited to the next round of search.", pg. 2, left column, bottom paragraph). Therefore, it would have been considered obvious to one of ordinary skill in the art, prior to the filing date of the current application, to combine the block-wise network architecture of Li with the Pareto optimization of Ci, for the same motivation given above with respect to claim 1.
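Ci's quoted Pareto-front retrieval keeps every non-dominated architecture rather than a single winner. A generic non-dominated filter over two assumed objectives (distillation loss and latency, both lower-is-better) illustrates the idea; the objective names and sample values are hypothetical:

```python
# Generic non-dominated filter; objective names and sample values are
# hypothetical, standing in for whatever metrics a NAS system tracks.
def pareto_front(candidates):
    """Return the candidates not dominated on (loss, latency); lower is better."""
    front = []
    for c in candidates:
        dominated = any(
            o["loss"] <= c["loss"] and o["latency"] <= c["latency"]
            and (o["loss"] < c["loss"] or o["latency"] < c["latency"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

configs = [
    {"name": "cfg-a", "loss": 0.20, "latency": 5.0},
    {"name": "cfg-b", "loss": 0.25, "latency": 3.0},
    {"name": "cfg-c", "loss": 0.30, "latency": 6.0},  # dominated by cfg-a
]
print([c["name"] for c in pareto_front(configs)])  # ['cfg-a', 'cfg-b']
```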
Claims 2, 4 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Ci and in further view of Jiaheng Liu et al. (herein referred to as Liu) (Block Proposal Neural Architecture Search).

Regarding claim 2, Li, as modified by Ci, teaches the computing system of claim 1, but does not explicitly teach storing information describing each block in the plurality of blocks of the initial seed network into a database. Liu teaches storing information describing each block in the plurality of blocks of the initial seed network into a database. ("Since the block search space is too large, we cannot measure the latencies of all possible blocks. To solve this problem, we use the sum of latencies from all paths in each block to estimate the latency of this block. This approximation strategy works well for most devices. We enumerate all possible input image resolutions and the number of channels, then separately measure the latency of each possible path, and store them in a lookup table.", pgs. 4-5) (The latency of the blocks, which corresponds to information describing each block, is stored in a lookup table, the lookup table corresponding to a database, teaching the limitation.) Therefore, it would have been considered obvious to one of ordinary skill in the art, prior to the filing date of the current application, to combine the system of Li, as modified by Ci, with the storing of information as disclosed in Liu. One would be motivated to combine the two teachings because doing so allows stored data to be retrieved at a later time, as described in Liu. ("We stack the super blocks described in Section III-C-1 to construct the supernet by using the backbone network shown in Fig. 2. We adopt the lookup table to produce the latency of the i-th block proposal at the l-th layer, which is represented as LATENCY(block_i^l), and use the corresponding sampling probability P(i;l) to estimate the latency of the whole supernet a", pg. 5, right column, bottom paragraph)

Regarding claim 4, Li, as modified by Ci and Liu, teaches dividing an initial seed network into a second plurality of blocks, wherein the second initial seed network includes a second plurality of a candidate neural architectures ("After that, all of our searched architectures are retrained from scratch on the original training set without supervision from the teacher network and tested on the original validation set.", pg. 6, left column, under "Choice of dataset and teacher model" (Li)) (The retrained architectures correspond to a second plurality of blocks and candidate architectures.) and retrieving, from the database, information describing at least one block of the second plurality of blocks. ("Let d_i denote the depth of the i-th block and C denote the number of the candidate operations in each layer. Then the size of the search space of the i-th block is C^(d_i), ∀i ∈ [1, N]; the size of the search space A is ∏_{i=0}^{N} C^(d_i)", pg. 3, right column, bottom paragraph; see also Algorithm 2 on pg. 2 (Li)) (Li teaches information describing at least one block, as evidenced by the block-size information used in Algorithm 2, implying that this information is retrieved from somewhere. Although a database is never explicitly taught by Li, it would be straightforward to configure the block data to be stored in and retrieved from a database, such as the database of Liu.)

Regarding claim 5, Li, as modified by Ci and Liu, teaches the computing system of claim 4, wherein the step of retrieving, from the database, information describing at least one block of the second plurality of blocks is performed before defining, for each block in the second plurality of blocks, a second plurality of sample-based search spaces, wherein each sample-based search space in the second plurality of sample-based search spaces includes a second plurality of candidate block configurations ("we propose to modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained; this reduces the representation shift caused by the shared parameters and leads to the correct rating of the candidates.", pg. 1, Abstract) (With the retrieval of data in Li, it would be straightforward to configure the retrieval of information describing at least one block to happen before that information is used to define a second plurality of blocks and search spaces.) and the second plurality of candidate block configurations are determined by determining candidate block configurations that minimize the block-wise knowledge distillation loss. ("All the three strategy is performed block by block by minimizing the MSE loss between feature maps of student supernet and the teacher." pg. 7, right column, bottom paragraph (Li))

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Ci, in further view of Liu, and in further view of Jiemin Fang et al. (herein referred to as Fang) (Densely Connected Search Space for More Flexible Neural Architecture Search) (as cited in the IDS).

Regarding claim 3, Li, as modified by Ci and Liu, teaches the computing system of claim 2, wherein the information describing each block includes at least one of a position in the seed network and a size of each block. ("Let d_i denote the depth of the i-th block and C denote the number of the candidate operations in each layer. Then the size of the search space of the i-th block is C^(d_i), ∀i ∈ [1, N]; the size of the search space A is ∏_{i=0}^{N} C^(d_i).", pg. 3, right column, second to last paragraph) However, the combination does not explicitly teach that the information describing each block includes at least one of an input resolution and an output resolution. Fang teaches the information describing each block includes at least one of an input resolution, an output resolution ("As shown in Fig. 2, the input tensors from these routing blocks differ in terms of width and spatial resolution. Each input tensor is transformed to a same size by the corresponding branch of shape-alignment layers in B_i.", pg. 5, left column, first paragraph) (As shown in Figure 2, and described on pages 4 and 5, the blocks have input and output resolutions.) Therefore, it would have been considered obvious to one of ordinary skill in the art, prior to the filing date of the current application, to combine the system of Li, as modified by Ci and Liu, with the resolution of blocks as disclosed in Fang. One would be motivated to combine the two teachings because spatial resolution helps provide better connections between routing blocks, as disclosed in Fang. ("We define the connection between the routing block B_i and its subsequent routing block B_j (j > i) as C_ij. The spatial resolutions of B_i and B_j are H_i × W_i and H_j × W_j respectively (normally H_i = W_i and H_j = W_j). We set some constraints on the connections to avoid the stride of the spatial down-sampling exceeding 2. Specifically, C_ij only exists when j − i ≤ M and H_i/H_j ≤ 2.", pg. 4, right column, first paragraph)
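Claims 2-5 recite storing per-block descriptors (input/output resolution, position, size) in a database and retrieving them when a second seed network is divided, which the rejection maps onto Liu's latency lookup table. A minimal sketch using Python's standard-library sqlite3 module; the schema and values are entirely hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real system might persist to disk
conn.execute("""
    CREATE TABLE blocks (
        position INTEGER PRIMARY KEY,   -- position in the seed network
        in_res   INTEGER,               -- input resolution
        out_res  INTEGER,               -- output resolution
        size     INTEGER                -- e.g. parameter count of the block
    )
""")
conn.executemany(
    "INSERT INTO blocks VALUES (?, ?, ?, ?)",
    [(0, 224, 112, 40_000), (1, 112, 56, 110_000), (2, 56, 28, 260_000)],
)

# Retrieval step along the lines of claims 4/10/17: fetch a stored
# descriptor for reuse when a second seed network is divided into blocks.
row = conn.execute("SELECT * FROM blocks WHERE position = ?", (1,)).fetchone()
print(row)  # (1, 112, 56, 110000)
```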
Claims 7, 10, 11, 15, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Liu.

Regarding claims 7 and 15, Li teaches the computing system and method of claims 6 and 14, respectively, but does not explicitly teach storing information describing each block in the plurality of blocks of the initial seed network into a database. Liu teaches storing information describing each block in the plurality of blocks of the initial seed network into a database. ("Since the block search space is too large, we cannot measure the latencies of all possible blocks. To solve this problem, we use the sum of latencies from all paths in each block to estimate the latency of this block. This approximation strategy works well for most devices. We enumerate all possible input image resolutions and the number of channels, then separately measure the latency of each possible path, and store them in a lookup table.", pgs. 4-5) (The latency of the blocks, which corresponds to information describing each block, is stored in a lookup table, teaching the limitation.) Therefore, it would have been considered obvious to one of ordinary skill in the art, prior to the filing date of the current application, to combine the system of Li with the storing of information as disclosed in Liu. One would be motivated to combine the two teachings because doing so allows stored data to be retrieved at a later time, as described in Liu. ("We stack the super blocks described in Section III-C-1 to construct the supernet by using the backbone network shown in Fig. 2. We adopt the lookup table to produce the latency of the i-th block proposal at the l-th layer, which is represented as LATENCY(block_i^l), and use the corresponding sampling probability P(i;l) to estimate the latency of the whole supernet a", pg. 5, right column, bottom paragraph)

Regarding claims 10 and 17, Li, as modified by Liu, teaches the computing system and method of claims 6 and 14, respectively, as well as dividing an initial seed network into a second plurality of blocks, wherein the second initial seed network includes a second plurality of a candidate neural architectures ("After that, all of our searched architectures are retrained from scratch on the original training set without supervision from the teacher network and tested on the original validation set.", pg. 6, left column, under "Choice of dataset and teacher model" (Li)) (The retrained architectures correspond to a second plurality of blocks and candidate architectures.) and retrieving, from the database, information describing at least one block of the second plurality of blocks. ("Let d_i denote the depth of the i-th block and C denote the number of the candidate operations in each layer. Then the size of the search space of the i-th block is C^(d_i), ∀i ∈ [1, N]; the size of the search space A is ∏_{i=0}^{N} C^(d_i)", pg. 3, right column, bottom paragraph; see also Algorithm 2 on pg. 2 (Li)) (Li teaches information describing at least one block, as evidenced by the block-size information used in Algorithm 2, implying that this information is retrieved from somewhere. Although a database is never explicitly taught by Li, it would be straightforward to configure the block data to be stored in and retrieved from a database, such as the database of Liu.)

Regarding claims 11 and 18, Li, as modified by Liu, teaches the computing system and method of claims 10 and 17, respectively, as well as the step of retrieving, from the database, information describing at least one block of the second plurality of blocks being performed before defining, for each block in the second plurality of blocks, a second plurality of sample-based search spaces, wherein each sample-based search space in the second plurality of sample-based search spaces includes a second plurality of candidate block configurations ("we propose to modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained; this reduces the representation shift caused by the shared parameters and leads to the correct rating of the candidates.", pg. 1, Abstract) (With the retrieval of data in Li, it would be straightforward to configure the retrieval of information describing at least one block to happen before that information is used to define a second plurality of blocks and search spaces.) and the second plurality of candidate block configurations being determined by determining candidate block configurations that minimize the block-wise knowledge distillation loss. ("All the three strategy is performed block by block by minimizing the MSE loss between feature maps of student supernet and the teacher." pg. 7, right column, bottom paragraph (Li))

Claims 8, 9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Liu and in further view of Fang.

Regarding claims 8 and 16, Li, as modified by Liu, teaches the computing system and method of claims 7 and 15, respectively, as well as the information describing each block including at least a position in the seed network and a size of each block. ("Let d_i denote the depth of the i-th block and C denote the number of the candidate operations in each layer. Then the size of the search space of the i-th block is C^(d_i), ∀i ∈ [1, N]; the size of the search space A is ∏_{i=0}^{N} C^(d_i).", pg. 3, right column, second to last paragraph) However, the combination does not explicitly teach that the information describing each block includes at least one of an input resolution and an output resolution. Fang teaches the information describing each block includes at least one of an input resolution, an output resolution ("As shown in Fig. 2, the input tensors from these routing blocks differ in terms of width and spatial resolution. Each input tensor is transformed to a same size by the corresponding branch of shape-alignment layers in B_i.", pg. 5, left column, first paragraph) (As shown in Figure 2, and described on pages 4 and 5, the blocks have input and output resolutions.) Therefore, it would have been considered obvious to one of ordinary skill in the art, prior to the filing date of the current application, to combine the system of Li, as modified by Liu, with the resolution of blocks as disclosed in Fang. One would be motivated to combine the two teachings because spatial resolution helps provide better connections between routing blocks, as disclosed in Fang. ("We define the connection between the routing block B_i and its subsequent routing block B_j (j > i) as C_ij. The spatial resolutions of B_i and B_j are H_i × W_i and H_j × W_j respectively (normally H_i = W_i and H_j = W_j). We set some constraints on the connections to avoid the stride of the spatial down-sampling exceeding 2. Specifically, C_ij only exists when j − i ≤ M and H_i/H_j ≤ 2.", pg. 4, right column, first paragraph)

Regarding claim 9, Li, as modified by Liu and Fang, teaches the computing system of claim 8, wherein the information describing each block includes each of the input resolution, the output resolution, the position in the seed network, and the size of each block. ("Let d_i denote the depth of the i-th block and C denote the number of the candidate operations in each layer. Then the size of the search space of the i-th block is C^(d_i), ∀i ∈ [1, N]; the size of the search space A is ∏_{i=0}^{N} C^(d_i).", pg. 3, right column, second to last paragraph (Li)) (Li teaches the position and size of each block.) ("As shown in Fig. 2, the input tensors from these routing blocks differ in terms of width and spatial resolution. Each input tensor is transformed to a same size by the corresponding branch of shape-alignment layers in B_i.", pg. 5, left column, first paragraph (Fang)) (Fang teaches the resolution of each block.)

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tyler E. Iles, whose telephone number is (571) 272-5442. The examiner can normally be reached 9:00am - 5:00pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/T.E.I./
Patent Examiner, Art Unit 2122

/KAKALI CHAKI/
Supervisory Patent Examiner, Art Unit 2122

Prosecution Timeline

Jun 09, 2023
Application Filed
Mar 06, 2026
Non-Final Rejection — §101, §102, §103, §112 (current)

Prosecution Projections

1-2
Expected OA Rounds
67%
Grant Probability
99%
With Interview (+50.0%)
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
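These figures appear to compose from the examiner statistics above; a sketch of the apparent arithmetic, assuming the interview lift multiplies the base allow rate and the displayed probability is capped at 99%:

```python
# Assumed derivation; the page states grant probability comes from the
# career allow rate, and reports a +50% interview lift and a 99% figure.
base = 2 / 3                            # 2 granted of 3 resolved ≈ 67%
with_interview = min(base * 1.5, 0.99)  # +50% lift, capped (assumption)
print(f"{base:.0%} base, {with_interview:.0%} with interview")  # 67%, 99%
```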
