DETAILED ACTION
Claims 1-20 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 03/31/2023, 08/22/2024 and 01/15/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Claims 1-7 recite a method. Claims 8-14 recite an apparatus comprising a memory and a processor. Claims 15-20 recite a non-transitory machine-readable media. Therefore, claims 1-7 are directed to a process, claims 8-14 are directed to a machine, and claims 15-20 are directed to a manufacture.
With respect to claims 1, 8 and 15:
2A Prong 1: The claim recites a judicial exception.
constructing a supernetwork based on the search space information and…, the supernetwork comprising a plurality of sub-networks (mental process – evaluation or judgement, a user can manually construct/design a supernetwork comprising sub-networks)
searching a sub-network from the trained supernetwork based on the search indicator, to obtain a target network for performing the target task (mental process – evaluation or judgement, a user can manually search a sub-network based on the indicator)
2A Prong 2: The judicial exception is not integrated into a practical application.
(claim 8) a memory operable to store computer-readable instructions; and a processor circuitry operable to read the computer-readable instructions, the processor circuitry when executing the computer-readable instructions is configured to (claim 15) having instructions stored on the machine-readable media, the instructions configured to, when executed, cause a machine to (mere instructions to apply an exception – MPEP 2106.05(f), factor (2), whether the claim invokes computers or other machinery merely as a tool to perform an existing process; generic computer components)
obtaining network construction information corresponding to a target task, the network construction information comprising search space information, sample data, and a search indicator (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting)
training the supernetwork based on the sample data (mere instructions to apply an exception – MPEP 2106.05(f), (3) The particularity or generality of the application of the judicial exception; high level recitation of training the supernetwork)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
(claim 8) a memory operable to store computer-readable instructions; and a processor circuitry operable to read the computer-readable instructions, the processor circuitry when executing the computer-readable instructions is configured to (claim 15) having instructions stored on the machine-readable media, the instructions configured to, when executed, cause a machine to (mere instructions to apply an exception – MPEP 2106.05(f), factor (2), whether the claim invokes computers or other machinery merely as a tool to perform an existing process; generic computer components)
obtaining network construction information corresponding to a target task, the network construction information comprising search space information, sample data, and a search indicator (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 - MPEP 2106.05(d)(II)(i))
training the supernetwork based on the sample data (mere instructions to apply an exception – MPEP 2106.05(f), (3) The particularity or generality of the application of the judicial exception; high level recitation of training the supernetwork)
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
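For orientation only, the combination of steps recited in claims 1, 8 and 15 may be paraphrased by the following sketch; every function name and value below is a hypothetical placeholder, not the applicant's disclosed implementation:

```python
# Hypothetical paraphrase of the recited steps; the stub bodies are placeholders.
def obtain_construction_info(target_task):
    # network construction information: search space, sample data, search indicator
    search_space = {"depths": [2, 4], "widths": [16, 32]}
    sample_data = [([0.0] * 8, [0.0] * 4)]
    search_indicator = {"latency_ms": 10}
    return search_space, sample_data, search_indicator

def construct_supernetwork(search_space):
    return {"space": search_space, "weights": None}  # contains all sub-networks

def train_supernetwork(supernet, sample_data):
    supernet["weights"] = "trained"  # stands in for actual weight updates

def search_subnetwork(supernet, indicator):
    return {"depth": 2, "width": 16}  # stands in for constraint-guided search

def claimed_method(target_task):
    space, data, indicator = obtain_construction_info(target_task)
    supernet = construct_supernetwork(space)       # constructing the supernetwork
    train_supernetwork(supernet, data)             # training on the sample data
    return search_subnetwork(supernet, indicator)  # searching a sub-network
```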
With respect to claims 2, 9 and 16:
2A Prong 1: The claim recites a judicial exception.
the constructing the supernetwork based on the search space information comprises (mental process – evaluation or judgement, a user can manually construct/design a supernetwork)
extracting, from the search space information, a branch construction form corresponding to each of the modules (mental process – evaluation or judgement, a user can manually extract a branch form)
constructing, for each of the modules, branches of the module according to a branch construction form corresponding to the module (mental process – evaluation or judgement, a user can manually construct/design branches according to a branch form)
connecting the branches of the module in parallel to construct the module; and connecting the modules in series to construct the supernetwork (mental process – evaluation or judgement, a user can manually connect the branches and the modules)
2A Prong 2: The judicial exception is not integrated into a practical application.
wherein the search space information comprises branch construction forms corresponding to a plurality of modules used to construct the supernetwork, and (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting; Claim 1 recites “obtaining network construction information… comprising search space information,” which is insignificant extra-solution activity. Specifying that the search space information comprises branch construction forms does not cause the limitation to integrate the exception into a practical application)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
wherein the search space information comprises branch construction forms corresponding to a plurality of modules used to construct the supernetwork, and (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 - MPEP 2106.05(d)(II)(i)) (Claim 1 recites “obtaining network construction information… comprising search space information,” which is insignificant extra-solution activity. Specifying that the search space information comprises branch construction forms does not cause the limitation to be significantly more than the judicial exception)
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
With respect to claims 3, 10 and 17:
2A Prong 1: The claim recites a judicial exception.
wherein the connecting the modules in series to construct the supernetwork comprises: connecting the modules in series and connecting respective inputs and outputs of the modules to construct the supernetwork (mental process – evaluation or judgement, a user can manually connect the modules and respective inputs and outputs of the modules)
With respect to claims 4, 11 and 18:
2A Prong 1: The claim recites a judicial exception.
selecting a sub-network from the supernetwork, and (mental process – evaluation or judgement, a user can manually select a sub-network)
performing backpropagation on the selected sub-network based on the data outputted by the selected sub-network and the reference sample data (mathematical calculations, backpropagation based on the output data and the reference data)
2A Prong 2: The judicial exception is not integrated into a practical application.
wherein the sample data comprises training data, the training data comprises training sample data and reference sample data corresponding to the training sample data, and (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting; Claim 1 recites “obtaining network construction information… comprising… sample data,” which is insignificant extra-solution activity. Specifying that the sample data comprises training data with training/reference sample data does not cause the limitation to integrate the exception into a practical application)
the training the supernetwork based on the sample data comprises (mere instructions to apply an exception – MPEP 2106.05(f), (3) The particularity or generality of the application of the judicial exception; high level recitation of training the supernetwork)
inputting the training sample data into the selected sub-network for forward calculation, to obtain data outputted by the selected sub-network (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
wherein the sample data comprises training data, the training data comprises training sample data and reference sample data corresponding to the training sample data, and (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 - MPEP 2106.05(d)(II)(i)) (Claim 1 recites “obtaining network construction information… comprising… sample data,” which is insignificant extra-solution activity. Specifying that the sample data comprises training data with training/reference sample data does not cause the limitation to be significantly more than the judicial exception)
the training the supernetwork based on the sample data comprises (mere instructions to apply an exception – MPEP 2106.05(f), (3) The particularity or generality of the application of the judicial exception; high level recitation of training the supernetwork)
inputting the training sample data into the selected sub-network for forward calculation, to obtain data outputted by the selected sub-network (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 - MPEP 2106.05(d)(II)(i))
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
With respect to claims 5, 12 and 19:
2A Prong 1: The claim recites a judicial exception.
wherein the supernetwork comprises a plurality of modules connected in series, each of the plurality of modules comprises a plurality of branches connected in parallel, and (mental process – evaluation or judgement, claim 1 recites “constructing a supernetwork” which is an abstract idea. Specifying the supernetwork comprising modules with branches does not change the scope of the claim)
the selecting a sub-network from the supernetwork comprises: selecting a branch from each of the plurality of modules of the supernetwork, and connecting, in series, branches selected from the modules to form a sub-network (mental process – evaluation or judgement, a user can manually select a branch and connect branches)
With respect to claims 6, 13 and 20:
2A Prong 1: The claim recites a judicial exception.
… the searching the sub-network from the trained supernetwork based on the search indicator to obtain the target network comprises (mental process – evaluation or judgement, a user can manually search the sub-network)
selecting, from the sub-networks, a sub-network with a best performance and satisfying the search indicator as the target network based on the performance parameter (mental process – evaluation or judgement, a user can manually select a sub-network)
2A Prong 2: The judicial exception is not integrated into a practical application.
wherein the sample data comprises test data, and (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting; Claim 1 recites “obtaining network construction information… comprising… sample data,” which is insignificant extra-solution activity. Specifying that the sample data comprises test data does not cause the limitation to integrate the exception into a practical application)
searching for sub-networks from the trained supernetwork based on a search algorithm (mere instructions to apply an exception – MPEP 2106.05(f), (3) The particularity or generality of the application of the judicial exception; high level recitation of searching based on an algorithm)
performing a performance test on the sub-networks by inputting the test data into the sub-networks obtained through search, to obtain a performance parameter for each of the sub-networks (mere instructions to apply an exception – MPEP 2106.05(f), (3) The particularity or generality of the application of the judicial exception; high level recitation of performing a performance test on the sub-networks)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
wherein the sample data comprises test data, and (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 - MPEP 2106.05(d)(II)(i)) (Claim 1 recites “obtaining network construction information… comprising… sample data,” which is insignificant extra-solution activity. Specifying that the sample data comprises test data does not cause the limitation to be significantly more than the judicial exception)
searching for sub-networks from the trained supernetwork based on a search algorithm (mere instructions to apply an exception – MPEP 2106.05(f), (3) The particularity or generality of the application of the judicial exception; high level recitation of searching based on an algorithm)
performing a performance test on the sub-networks by inputting the test data into the sub-networks obtained through search, to obtain a performance parameter for each of the sub-networks (mere instructions to apply an exception – MPEP 2106.05(f), (3) The particularity or generality of the application of the judicial exception; high level recitation of performing a performance test on the sub-networks)
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
With respect to claims 7 and 14:
2A Prong 2: The judicial exception is not integrated into a practical application.
wherein the method further comprises: training the target network based on the sample data (mere instructions to apply an exception – MPEP 2106.05(f), (3) The particularity or generality of the application of the judicial exception; high level recitation of training the network based on the data)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
wherein the method further comprises: training the target network based on the sample data (mere instructions to apply an exception – MPEP 2106.05(f), (3) The particularity or generality of the application of the judicial exception; high level recitation of training the network based on the data)
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 4, 7-8, 11, 14-15 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Cai ("Once for all: Train one network and specialize it for efficient deployment" 20200429) in view of Roth (US 20210374502 A1, filed on 20200601).
In regard to claims 1, 8 and 15, Cai teaches: A search method, comprising: obtaining network construction information corresponding to a target task, (Cai, p. 1 Abstract "In this work, we propose to train a once-for-all (OFA) network that supports diverse architectural settings… OFA is the winning solution for the 3rd Low Power Computer Vision Challenge (LPCVC), DSP classification track and the 4th LPCVC, both classification track and detection track. [e.g. a target task]"; see the next limitation for network construction information)
the network construction information comprising search space information, sample data, and a search indicator; (Cai, p. 4, Architecture space [search space information] "Our once-for-all network provides one model but supports many sub-networks of different sizes, covering four important dimensions of the convolutional neural networks (CNNs) architectures, i.e., depth, width, kernel size, and resolution... We allow each unit to use arbitrary numbers of layers (denoted as elastic depth); For each layer, we allow to use arbitrary numbers of channels (denoted as elastic width) and arbitrary kernel sizes (denoted as elastic kernel size)...") (Cai, p. 7, 4 "we first apply the progressive shrinking algorithm to train the once-for-all network on ImageNet [e.g. sample data]") (Cai, p. 2, 1 Introduction "Given the target hardware and constraint, [a search indicator] a predictor-guided architecture search... is conducted to get a specialized sub-network")
constructing a supernetwork based on the search space information and training the supernetwork based on the sample data, the supernetwork comprising a plurality of sub-networks; and (Cai, p. 5, Progressive Shrinking "The once-for-all network comprises many sub-networks of different sizes where small sub-networks are nested in large sub-networks. [the supernetwork comprising a plurality of sub-networks]… where we start with training the largest neural network with the maximum kernel size (e.g., 7), depth (e.g., 4), and width (e.g., 6). [constructing a supernetwork based on the search space information] Next, we progressively fine-tune the network to support smaller sub-networks by gradually adding them into the sampling space...") (Cai, p. 7, 4 Experiments "we first apply the progressive shrinking algorithm to train the once-for-all network on ImageNet [training the supernetwork based on the sample data]")
searching a sub-network from the trained supernetwork based on the search indicator, to obtain a target network for performing the target task. (Cai, p. 6, 3.4 Specialized model deployment with once-for-all network "Having trained a once-for-all network, [from the trained supernetwork] the next stage is to derive the specialized sub-network [searching a sub-network... to obtain the target network] for a given deployment scenario. The goal is to search for a neural network that satisfies the efficiency (e.g., latency, energy) constraints [based on the search indicator] on the target hardware while optimizing the accuracy...")
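As context for the passage cited above, the deployment-time search Cai describes can be sketched as follows; the accuracy and latency functions are toy stand-ins, not Cai's trained accuracy predictor or measured latency table:

```python
import random

# Toy stand-ins for Cai's accuracy predictor and latency lookup table.
def predicted_accuracy(arch):
    return 0.70 + 0.05 * arch["depth"] - 0.001 * arch["kernel"]

def predicted_latency_ms(arch):
    return 0.25 * arch["depth"] * arch["width"]

def derive_specialized_subnetwork(latency_budget_ms, n_samples=16000):
    best = None
    for _ in range(n_samples):
        arch = {"depth": random.choice([2, 3, 4]),    # elastic depth
                "width": random.choice([3, 4, 6]),    # elastic width
                "kernel": random.choice([3, 5, 7])}   # elastic kernel size
        if predicted_latency_ms(arch) > latency_budget_ms:
            continue  # violates the efficiency constraint (search indicator)
        if best is None or predicted_accuracy(arch) > predicted_accuracy(best):
            best = arch  # best predicted accuracy within the constraint
    return best

print(derive_specialized_subnetwork(latency_budget_ms=3.0))
```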
Cai teaches the concept of a supernetwork as an OFA network (comprising many sub-networks). Cai does not explicitly teach the term ‘supernetwork,’ but Roth teaches: constructing a supernetwork… (Roth, [0063] "neural network 104 is a supernet, which may also be referred to as a supernetwork, model architecture, and/or a neural network comprising a plurality of neural networks.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Cai to incorporate the teachings of Roth by using Cai's OFA as Roth's supernet and by including data adaptation process. Doing so would effectively adapt a model to a target domain and result in an optimal path that defines a locally adapted sub-network. (Roth, [0069] "once supernet 104 is trained, a sub-network s0 is found, at each client 106, 108 through supernet 104, effectively adapting a model to a target domain… during adaptation, model parameters ϕ stay fixed and only path weights are optimized... this results in an optimal path... that defines a locally adapted sub-network s0 ∈ S.")
Claims 8 and 15 recite substantially the same limitations as claim 1; therefore, the rejection applied to claim 1 also applies to claims 8 and 15. In addition, Cai teaches: (claim 8) A search apparatus, comprising: a memory operable to store computer-readable instructions; and a processor circuitry operable to read the computer-readable instructions, the processor circuitry when executing the computer-readable instructions is configured to: (claim 15) A non-transitory machine-readable media, having instructions stored on the machine-readable media, the instructions configured to, when executed, cause a machine to: (Cai, p. 7, 4 EXPERIMENTS "Training Details… The full network is trained for 180 epochs with batch size 2048 on 32 GPUs... further fine-tune the full network. The whole training process takes around 1,200 GPU hours on V100 GPUs"; the disclosed GPU-based training environment inherently teaches the recited generic computer components)
In regard to claims 4, 11 and 18, Cai teaches: the training the supernetwork based on the sample data comprises: (Cai, p. 7, 4 Experiments "we first apply the progressive shrinking algorithm to train the once-for-all network on ImageNet [training the supernetwork based on the sample data]")
Cai does not teach, but Roth teaches: wherein the sample data comprises training data, the training data comprises training sample data and reference sample data corresponding to the training sample data, and (Roth, [0070] "g_i is a ground truth label map [reference sample data] at a given voxel i. [training sample data]... during adaptation, model parameters ϕ stay fixed and only path weights are optimized for one epoch on a local validation set. [training data]")
selecting a sub-network from the supernetwork, and inputting the training sample data into the selected sub-network for forward calculation, to obtain data outputted by the selected sub-network; and (Roth, [0069] "a Dice loss is applied as a loss function, which works well in segmentation tasks with an unbalance in an amount of foreground/background regions: min(L_Dice) = (… p_i g_i ...) (2)… p_i is a predicted probability from a final sigmoid-activated output layer of supernet f(X) [forward calculation, to obtain data outputted by the selected sub-network] and g_i is a ground truth label map at a given voxel i. [training sample data] In at least one embodiment, once supernet 104 is trained, a sub-network s0 is found, [selecting a sub-network] at each client 106, 108 through supernet 104, effectively adapting a model to a target domain.")
performing backpropagation on the selected sub-network based on the data outputted by the selected sub-network and the reference sample data. (Roth, [0069] "a Dice loss is applied as a loss function... min(L_Dice) = (… p_i g_i ...) (2)… p_i is a predicted probability from a final sigmoid-activated output layer of supernet f(X) [the data outputted by the selected sub-network] and g_i is a ground truth label map [the reference sample data] at a given voxel i... during adaptation, model parameters ϕ stay fixed and only path weights are optimized for one epoch on a local validation set. In at least one embodiment, this results in an optimal path... that defines a locally adapted sub-network s0 ∈ S."; [0077] "parameters of new sub-networks are updated during gradient back-propagation.")
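For readability, the Dice loss quoted above with internal elisions commonly takes the following soft form (this reconstruction is the standard segmentation Dice loss, not necessarily Roth's exact variant, which may, e.g., square the denominator terms):

```latex
L_{\mathrm{Dice}} = 1 - \frac{2 \sum_{i} p_i \, g_i}{\sum_{i} p_i + \sum_{i} g_i}
```

where p_i is the predicted probability and g_i the ground-truth label at voxel i.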
The rationale for combining the teachings of Cai and Roth is the same as set forth in the rejection of claim 1.
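A minimal runnable sketch of the mapped training step (select a sub-network, forward-calculate on training sample data, backpropagate against the reference sample data); this illustrates the cited technique only and is neither the applicant's nor Roth's code:

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy supernetwork: two "modules", each holding two parallel branches.
modules = nn.ModuleList([
    nn.ModuleList([nn.Linear(8, 8), nn.Linear(8, 8)]),
    nn.ModuleList([nn.Linear(8, 4), nn.Linear(8, 4)]),
])
opt = torch.optim.SGD(modules.parameters(), lr=0.01)

x = torch.randn(32, 8)   # training sample data
g = torch.randn(32, 4)   # reference sample data (ground truth)

path = [random.randrange(2) for _ in modules]  # select a sub-network
out = x
for module, idx in zip(modules, path):
    out = torch.relu(module[idx](out))         # forward calculation
loss = F.mse_loss(out, g)                      # output data vs. reference data
opt.zero_grad()
loss.backward()   # backpropagation reaches only the selected branches
opt.step()
```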
In regard to claims 7 and 14, Cai teaches: wherein the method further comprises: training the target network based on the sample data. (Cai, p. 8, Table 1 "'#25' denotes the specialized sub-networks are fine-tuned [training the target network based on data] for 25 epochs after grabbing weights from the once-for-all network"; p. 8, Comparison with NAS on Mobile Devices "We can further improve the top1 accuracy to 76.4% by fine-tuning the specialized sub-network for 25 epochs and to 76.9% by fine-tuning for 75 epochs")
Claims 2-3, 5, 9-10, 12, 16-17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Cai and Roth as applied to claims 1, 8 and 15 above, and further in view of Bender ("Understanding and Simplifying One-Shot Architecture Search" 2018).
In regard to claims 2, 9 and 16, Cai and Roth do not teach, but Bender teaches: wherein the search space information comprises branch construction forms corresponding to a plurality of modules used to construct the supernetwork, and the constructing the supernetwork based on the search space information comprises: (Bender, p. 3, 3.1. Search Space Design "This approach is applied to a much larger model as shown in Figure 3. Following Zoph et al. (2017), our network [the supernetwork] is composed of several identical cells which are stacked on top of each other. Each cell [modules] is divided into a fixed number of choice blocks. [branch construction forms]"; see Figure 3, cell is [module])
extracting, from the search space information, a branch construction form corresponding to each of the modules; (Bender, p. 3, 3.1. Search Space Design "The number of choice blocks within each cell, N_choice, is a hyper-parameter of the search space. In our experiments, we set N_choice = 4. [extracting/selecting a branch construction form (choice blocks)]")
constructing, for each of the modules, branches of the module according to a branch construction form corresponding to the module; (Bender, p. 3, 3.1. Search Space Design "Each choice block can consume the outputs of the two most recent cells in the network… Each choice block can select up to two operations from a menu of seven possible options…")
connecting the branches of the module in parallel to construct the module; and (Bender, p. 4, Figure 3 "Diagram of the one-shot architecture used in our experiments. Solid lines indicate components that are present in every architecture"; see Figure 3, choice 1, 2 and 3 are in parallel)
connecting the modules in series to construct the supernetwork. (Bender, p. 3, 3.1. Search Space Design "our network is composed of several identical cells which are stacked on top of each other."; see Figure 3, the leftmost diagram, all the blocks are connected from the input to output)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Cai and Roth to incorporate the teachings of Bender by including the search space proposed in Bender. Doing so would provide a search space that is large and expressive enough to capture a diverse set of interesting candidate architectures. (Bender, p. 2, 3.1. Search Space Design "the search space should be large and expressive enough to capture a diverse set of interesting candidate architectures.")
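To illustrate the cited cell/choice-block structure, a simplified toy version follows (not Bender's actual search space or operations):

```python
import torch
import torch.nn as nn

class Cell(nn.Module):
    """A 'module': several candidate branches (choice blocks) in parallel."""
    def __init__(self, dim, n_choices=3):
        super().__init__()
        self.choices = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_choices))

    def forward(self, x):
        # One-shot training: all parallel branches contribute to the sum.
        return torch.relu(sum(choice(x) for choice in self.choices))

class SuperNet(nn.Module):
    """Cells stacked in series form the supernetwork."""
    def __init__(self, dim=8, n_cells=4):
        super().__init__()
        self.cells = nn.ModuleList(Cell(dim) for _ in range(n_cells))

    def forward(self, x):
        for cell in self.cells:  # series connection: each output feeds the next
            x = cell(x)
        return x

net = SuperNet()
print(net(torch.randn(2, 8)).shape)  # torch.Size([2, 8])
```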
In regard to claims 3, 10 and 17, Cai and Roth do not teach, but Bender teaches: wherein the connecting the modules in series to construct the supernetwork comprises: connecting the modules in series and connecting respective inputs and outputs of the modules to construct the supernetwork. (Bender, p. 3, 3.1. Search Space Design "our network is composed of several identical cells which are stacked on top of each other."; see Figure 3, the leftmost diagram, all the blocks are connected from the input, stem…, cell…, output; also see prior art Zoph "Figure 2. Scalable architectures for image classification consist of two repeated motifs termed Normal Cell and Reduction Cell")
The rationale for combining the teachings of Cai, Roth and Bender is the same as set forth in the rejection of claim 2.
In regard to claims 5, 12 and 19, Cai and Roth do not teach, but Bender teaches: wherein the supernetwork comprises a plurality of modules connected in series, each of the plurality of modules comprises a plurality of branches connected in parallel, and the selecting a sub-network from the supernetwork comprises: (Bender, p. 3, 3.1. Search Space Design "This approach is applied to a much larger model as shown in Figure 3. Following Zoph et al. (2017), our network [the supernetwork] is composed of several identical cells which are stacked on top of each other. Each cell [modules] is divided into a fixed number of choice blocks. [branch construction forms]"; see Figure 3, cell is [module])
selecting a branch from each of the plurality of modules of the supernetwork, and connecting, in series, branches selected from the modules to form a sub-network. (Bender, p. 3, 3.1. Search Space Design "Each choice block can consume the outputs of the two most recent cells in the network. This means that each choice block can select from up to five possible inputs: two from previous cells and up to three from previous choice blocks within the same cell."; p. 4, Figure 3 "-----> Edge selected by architecture search… dashed lines indicate optional components that are part of the search space."; see the dashed lines [a branch] in the cell)
The rationale for combining the teachings of Cai, Roth and Bender is the same as set forth in the rejection of claim 2.
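Using the toy SuperNet class sketched after the discussion of claim 2 above (torch already imported there), selecting one branch per module and connecting the selections in series yields a sub-network:

```python
import random

def sample_subnetwork(net, x):
    # Pick one choice block per cell; only the selected branches are executed.
    path = [random.randrange(len(cell.choices)) for cell in net.cells]
    for cell, idx in zip(net.cells, path):
        x = torch.relu(cell.choices[idx](x))  # selected branch only
    return path, x

path, out = sample_subnetwork(net, torch.randn(2, 8))
print(path, out.shape)
```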
Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Cai and Roth as applied to claims 1, 8 and 15 above, and further in view of Li ("Random Search and Reproducibility for Neural Architecture Search" 20190730).
In regard to claims 6, 13 and 20, Cai teaches: the searching the sub-network from the trained supernetwork based on the search indicator to obtain the target network comprises: (Cai, p. 6, 3.4 Specialized model deployment with once-for-all network "Having trained a once-for-all network, [from the trained supernetwork] the next stage is to derive the specialized sub-network [searching the sub-network... to obtain the target network] for a given deployment scenario. The goal is to search for a neural network that satisfies the efficiency (e.g., latency, energy) constraints [based on the search indicator] on the target hardware while optimizing the accuracy...")
… selecting, from the sub-networks, a sub-network with a best performance and satisfying the search indicator as the target network based on the performance parameter. (Cai, p. 2, Figure 1 "Left:… Given a deployment scenario, a specialized sub-network is directly selected [selecting... a sub-network as the target network] from the once-for-all network without training."; p. 6, 3.4 Specialized model deployment with once-for-all network "Having trained a once-for-all network, the next stage is to derive the specialized sub-network for a given deployment scenario. The goal is to search for a neural network that satisfies the efficiency (e.g., latency, energy) constraints [satisfying the search indicator] on the target hardware while optimizing the accuracy. [a best performance]... we randomly sample 16K sub-networks [the sub-networks] with different architectures and input image sizes... These [architecture, accuracy] pairs are used to train an accuracy predictor to predict the accuracy of a model [the performance parameter] given its architecture and input image...")
Cai and Roth do not teach, but Li teaches: wherein the sample data comprises test data, and (Li, p. 7, 3 METHODOLOGY "After training the shared weights for a certain number of epochs, we use these trained shared weights to evaluate the performance of a number of randomly sampled architectures on a separate held out dataset. [test data]"; a held-out dataset is deliberately separated from the original dataset, i.e., the original sample data comprises a held-out dataset [test data])
searching for sub-networks from the trained supernetwork based on a search algorithm; (Li, p. 7, 3 METHODOLOGY "In order to combine random search [a search algorithm] with weight-sharing, we simply use randomly sampled architectures [sub-networks] to train the shared weights. Shared weights are updated by selecting a single architecture for a given minibatch... the number of architectures used to update the shared weights is equivalent to the total number of minibatch training iterations."; p. 14, 4.2.2 Impact of Meta-Hyperparameters "each version of random search with weight-sharing...")
performing a performance test on the sub-networks by inputting the test data into the sub-networks obtained through search, to obtain a performance parameter for each of the sub-networks; and (Li, p. 7, 3 METHODOLOGY "After training the shared weights for a certain number of epochs, we use these trained shared weights to evaluate the performance [performing a performance test] of a number of randomly sampled architectures [the sub-networks obtained through search] on a separate held out dataset. [test data]"; p. 14, 4.2.2 Impact of Meta-Hyperparameters "In stage (1), we train the shared weights and use them to evaluate a given number of randomly sampled architectures on the test set."; p. 4, Evaluation Method "For each hyperparameter configuration considered by a search method... subsequently measuring its quality, e.g., its predictive accuracy")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Cai and Roth to incorporate the teachings of Li by including a random search algorithm. Doing so would achieve a state-of-the-art result. (Li, p. 1, Abstract "a novel random search with weight-sharing algorithm on two standard NAS benchmarks—PTB and CIFAR-10…. random search with weight-sharing outperforms random search with early-stopping, achieving a state-of-the-art NAS result on PTB and a highly competitive result on CIFAR-10.")
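A compact sketch of the cited random-search procedure (sample architectures, score each on held-out test data, keep the best one satisfying the search indicator); the scoring function and values are toy stand-ins, not Li's shared-weight evaluation:

```python
import random

def held_out_score(arch, test_data):
    # Toy proxy for evaluating a shared-weight sub-network on the test set.
    return sum(test_data) / len(test_data) - 0.01 * arch["depth"]

def random_search(n_archs, test_data, max_size):
    best_arch, best_score = None, float("-inf")
    for _ in range(n_archs):
        arch = {"depth": random.choice([2, 3, 4]),   # randomly sampled
                "width": random.choice([16, 32])}    # architecture
        if arch["depth"] * arch["width"] > max_size:
            continue  # fails the search indicator
        score = held_out_score(arch, test_data)      # performance parameter
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

print(random_search(100, test_data=[0.80, 0.90, 0.85], max_size=100))
```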
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zoph ("Learning Transferable Architectures for Scalable Image Recognition" 20180411) teaches scalable architectures.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG whose telephone number is (408)918-7519. The examiner can normally be reached Monday - Thursday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SU-TING CHUANG/Examiner, Art Unit 2146