Last updated: May 29, 2026
Application No. 17/514,840
METHOD AND APPARATUS FOR ANALYZING NEURAL NETWORK PERFORMANCE

Final Rejection §103
Filed
Oct 29, 2021
Priority
Sep 29, 2020 — EU 20199106.4 +1 more
Examiner
LEE, MICHAEL CHRISTOPHER
Art Unit
2128
Tech Center
2100 — Computer Architecture & Software
Assignee
Samsung Electronics Co., Ltd.
OA Round
4 (Final)
Interview Optional

— +26.0% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 61% grant rate with +26.0% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 144 resolved cases, 2023–2026
Examiner Intelligence

LEE, MICHAEL CHRISTOPHER View full profile →
Grants 61% of resolved cases
Career Allowance Rate
88 granted / 144 resolved
+6.1% vs TC avg
Strong +26% interview lift
Without
With
+26.0%
Interview Lift
resolved cases with interview
Typical timeline
3y 3m
Avg Prosecution
19 currently pending
Career history
195
Total Applications
across all art units
Statute-Specific Performance

§101
15.9%
-24.1% vs TC avg
§103
79.7%
+39.7% vs TC avg
§102
0.8%
-39.2% vs TC avg
§112
3.3%
-36.7% vs TC avg
Black line = Tech Center average estimate • Based on career data from 144 resolved cases
Office Action

§103
DETAILED ACTION
Notice of AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Response to Amendment
Applicant’s Amendment and remarks submitted on 1/20/2026 have been considered.  Claims 2-3 have been cancelled by Applicant.  Claims 1 and 4-15 are pending.
Response to Arguments
On page 13 of Applicant’s 1/20/2026 Amendment and remarks, Applicant asserts that page 10 of the instant specification provides sufficient written description support for the claim amendments.
The examiner agrees that page 10 of the instant specification provides sufficient written description support for the claim amendments.

On pages 13-16 of Applicant’s 1/20/2026 Amendment and remarks, with respect to the rejections under 35 U.S.C. 103, Applicant argues that as amended, the prior art of record does not teach the “wherein the hardware arrangement is selected as being a single-chip device comprising the plurality of interconnected components or devices or a system comprising the plurality of interconnected components or devices; wherein: based on the hardware arrangement being selected as the single-chip device, the feature vector comprises at least one of a component type or a bandwidth; and  based on the hardware arrangement being selected as the system, the feature vector comprises at least one of a processor type, a device type, a clock frequency, a memory size, or a bandwidth” limitations.
The examiner agrees that the prior art of record does not teach at least the “wherein the hardware arrangement is selected as being a single-chip device comprising the plurality of interconnected components or devices or a system comprising the plurality of interconnected components or devices” limitation.  The previous rejections under 35 U.S.C. 103 are hereby withdrawn.  However, new grounds of rejection, in view of the JIANG, SHI, SCANLON, and COHEN references, are provided below, where such new grounds of rejection are necessitated by Applicant’s amendments to independent claim 1.

On page 15 of Applicant’s 1/20/2026 Amendment and remarks, with respect to the rejection of claim 1 under 35 U.S.C. 103, Applicant argues with respect to the COHEN reference:

    PNG
    media_image1.png
    178
    640
    media_image1.png
    Greyscale

In response to Applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).  The rejections are based on the combination of the JIANG, SHI, SCANLON, and COHEN references, where JIANG teaches graphical models representing hardware arrangements and SHI teaches the concept of extracting feature matrices and vectors from a graph structure, and together with COHEN, teaches that features relating to hardware can include features such as component or processor type, bandwidth, etc.

On page 16 of Applicant’s 1/20/2026 Amendment and remarks, Applicant argues that dependent claims 4-15 should be allowed for the same reasons argued with respect to claim 1.
The examiner respectfully disagrees for the same reasons explained with respect to claim 1.

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 4-7, and 9-15 are rejected under 35 U.S.C. 103 as being unpatentable over Jiang, Weiwen, et al. "Hardware/Software Co-Exploration of Neural Architectures." arXiv preprint arXiv:1907.04650 (2019), pp. 1-10, hereinafter referenced as JIANG, in view of Shi, Han, et al. "Efficient Sample-Based Neural Architecture Search with Learnable Predictor." arXiv preprint arXiv:1911.09336v2 (March 5, 2020), hereinafter referenced as SHI, and further in view of US 20200286490 A1, hereinafter referenced as SCANLON, and further in view of US 20140372347 A1, hereinafter referenced as COHEN.

Regarding Claim 1
	JIANG teaches:
A computer-implemented method using a trained predictor for predicting performance of a neural network model on a hardware arrangement, ... the computer-implemented method comprising:  (JIANG, pp. 1-2, section I: “Interestingly, the hardware design space is tightly coupled with the architecture search space, i.e., the best neural architecture depends on the hardware (hardware-aware NAS), and the best hardware depends on the neural architecture. It is therefore best to jointly explore both spaces to push forward the Pareto frontier between hardware efficiency and test accuracy for better design tradeoffs. ... As such, we use accuracy and pipeline efficiency to guide the exploration of the neural architecture space and hardware design space respectively, while satisfying a given throughput specifications (e.g., ≥30FPS for the ordinary camera). Experimental results show that the co-exploration approach can significantly push forward the Pareto frontier. On ImageNet, the proposed co-exploration framework can identify architecture and hardware pairs to achieve the same accuracy, 35.42% higher throughput, and 54.05% higher energy efficiency with the reduced search time, compared with the hardware-aware NAS.”;
JIANG, p. 4, section III.A: “Figure 3 shows the HW/SW co-exploration framework. The
framework contains a RNN based controller and two levels of explorations.”;
JIANG, p. 6, section IV: “we build the distributed GPU training environment
on top of Uber Horovod” [40].  Training settings are similar to those for CIFAR-10”;
Examiner’s Note (EN): JIANG discloses a computer-implemented (in a GPU training environment) HW/SW co-exploration framework (corresponding to recited “trained predictor”) that predicts the accuracy of a neural network architecture on specific hardware)
obtaining the hardware arrangement comprising a plurality of interconnected components or devices; (JIANG, p. 1, Fig. 1b (see below); JIANG, p. 2, section II.A: “the main contribution of this work is to propose a framework to co-explore the architecture search space and the hardware design space, as shown in Figure 1(b). More specifically, this framework determines the best hardware during the search process, which is tailor-made for the candidate architectures. In this way, the framework can obtain a set of superior architecture and hardware design pairs on the Pareto frontier in terms of accuracy and hardware efficiency tradeoffs.”;

    PNG
    media_image2.png
    290
    504
    media_image2.png
    Greyscale

Examiner’s Note (EN): In Fig. 1b, the Designs 1 and 2 in the Hardware Design Space corresponds to the “hardware arrangement”, where the connected FPGAs correspond to the recited “interconnected components or devices”) 
obtaining the neural network model which is to be implemented on the hardware arrangement, the neural network model comprising a plurality of operations; (JIANG, p. 1, Fig. 1b (see below); JIANG, p. 2, section II.A: “the main contribution of this work is to propose a framework to co-explore the architecture search space and the hardware design space, as shown in Figure 1(b). More specifically, this framework determines the best hardware during the search process, which is tailor-made for the candidate architectures. In this way, the framework can obtain a set of superior architecture and hardware design pairs on the Pareto frontier in terms of accuracy and hardware efficiency tradeoffs.”

    PNG
    media_image2.png
    290
    504
    media_image2.png
    Greyscale

JIANG, p. 3, Fig. 2; JIANG, p. 3, section II.B: “Figure 2 demonstrates one such example, in which a 5-layer network is partitioned into 3 pipeline stages, and each pipeline stage is mapped to a certain FPGA in an available pool.”; 
JIANG, p. 4, section II.C: “From the software perspective, first, the proposed framework can handle neural networks with residual connections by integrating techniques in [34] to partition DAG-based child network; second, it can explore different operations (e.g., group convolutions, depthwise separable convolution, etc.) for each node in a child network by adding one additional parameter in parai to determine a specific operation for the node.;
Examiner’s Note (EN): In Fig. 1b, the NN1 and NN2 in the Architecture Search Space corresponds to the recited “neural network model” and as shown in Fig. 2, a child network which is integrated onto one or more FPGAs (e.g. as an integrated version of NN1 of Fig. 1), is a direct acyclic graph (DAG) that has operations including group convolutions or depthwise separable convolutions)
 obtaining a first graphical model representing the hardware arrangement as a plurality of connected nodes, each of the plurality of interconnected components or devices being represented by one of the plurality of connected nodes; (JIANG, Fig. 2: 

    PNG
    media_image3.png
    284
    512
    media_image3.png
    Greyscale

(EN): As shown in Fig. 2, each of FPGA nodes (U1, U2, U3) corresponds to a different FPGA device, and therefore, the design in (4) corresponds to the recited “first graphical model” and the FPGAs (U1, U2, U3) are each of the “plurality of interconnected components or devices being represented by one of the plurality of connected nodes”)
obtaining a second graphical model representing the neural network model as a plurality of interconnected nodes, each of the plurality of operations being represented by one of the plurality of interconnected nodes; (JIANG, Fig. 2: 

    PNG
    media_image3.png
    284
    512
    media_image3.png
    Greyscale

(EN): As shown in Fig. 2, in (1) there are 5 layer nodes for the child network (l1, l2, l3, l4, l5) (corresponding to recited “second graphical model” representing the operations of the child network)
outputting the performance of the neural network model on the hardware arrangement. (JIANG, Table II:

    PNG
    media_image4.png
    214
    498
    media_image4.png
    Greyscale

(EN): the performance is output and summarized in Table II of JIANG).

	However, JIANG fails to explicitly teach:
the trained predictor comprising a first feature extractor and a second feature extractor, the first feature extractor and the second feature extractor comprising a respective graph convolutional network comprising a plurality of layers,
converting the first graphical model to a first adjacency matrix and a first initial feature matrix and converting the second graphical model to a second adjacency matrix and a second initial feature matrix, the first adjacency matrix and the second adjacency matrix respectively indicating which nodes are connected in the first graphical model and the second graphical model, and the first initial feature matrix and the second initial feature matrix respectively encapsulating node parameters of the first graphical model and the second graphical model;
extracting, using the first feature extractor and the second feature extractor, respectively, a first graphical representation of the first graphical model from the first adjacency matrix and the first initial feature matrix, and a second graphical representation of the second graphical model from the second adjacency matrix and the second initial feature matrix, the first graphical representation and the second graphical representation respectively comprising feature vector representations for each node of the first graphical model and the second graphical model, a first graph convolutional network and a second graph convolutional network respectively comprising first weights and second weights obtained during training of the trained predictor, the first graph convolutional network and the second graph convolutional network respectively extracting the first graphical representation and the second graphical representation using a layer wise propagation rule;
predicting, the performance of the neural network model on the hardware arrangement by using the first graphical representation of the first graphical model and the second graphical representation of the second graphical model as an input to a fully connected layer of the trained predictor, wherein the fully connected layer maps the input to one or more performance metrics comprising at least one of an accuracy, a latency, an energy consumption, thermals and memory utilization, wherein the trained predictor has been trained with measurements of the one or more performance metrics when running neural networks on hardware arrangements; and
wherein the extracting of the first graphical representation comprises extracting a feature vector for each node of the plurality of connected nodes, and
wherein:
based on the hardware arrangement being a single-chip device comprising the plurality of interconnected components or devices, the feature vector comprises at least one of a component type or a bandwidth; and 
based on the hardware arrangement being a system comprising the plurality of interconnected components or devices, the feature vector comprises at least one of a processor type, a device type, a clock frequency, a memory size, or a bandwidth.

	However, in a related field of endeavor (neural architecture search), SHI teaches:
the trained predictor comprising a first feature extractor and a second feature extractor, the first feature extractor and the second feature extractor comprising a respective graph convolutional network comprising a plurality of layers, (SHI, p. 2, section 1: “In the Embedding Extractor, we use a graph convolutional network (GCN) to produce embeddings for neural architectures.”; 
SHI, p. 3, section 2: “For a L-layer GCN...”
(EN): the JIANG-SHI combination now uses the GCNs of SHI to extract features, where a first GCN is used for a first extractor and a second GCN is used for a second extractor, and where the GCN has L layers (corresponding to the recited “plurality of layers”); the examiner further notes that pursuant to MPEP 2144.04 VI.B, “mere duplication of parts has no patentable significance unless a new and unexpected result is produced”, and it would have been obvious to one of ordinary skill to have 2 GCNs act as extractors)
converting the first graphical model to a first adjacency matrix and a first initial feature matrix and converting the second graphical model to a second adjacency matrix and a second initial feature matrix, the first adjacency matrix and the second adjacency matrix respectively indicating which nodes are connected in the first graphical model and the second graphical model, and the first initial feature matrix and the second initial feature matrix respectively encapsulating node parameters of the first graphical model and the second graphical model; (SHI, p. 3, section 3.1: “Specifically, graph connectivity is encoded by the adjacency matrix A, which can be obtained from the graph structure directly. Individual operations are encoded as one-hot vectors, and then aggregated to form the feature matrix X.”; 
SHI, Fig. 2: 

    PNG
    media_image5.png
    250
    478
    media_image5.png
    Greyscale

(EN): the JIANG-SHI combination now takes the first and second graphical models (relating to hardware arrangement and NN architecture, respectively), and to each, creates adjacency matrices and feature matrices as disclosed by SHI, where the adjacency matrix encodes connectivity of nodes, and the feature matrix relates to the individual operations (corresponding to recited “node parameters”); the examiner further notes that pursuant to MPEP 2144.04 VI.B, “mere duplication of parts has no patentable significance unless a new and unexpected result is produced”, and it would have been obvious to one of ordinary skill to create adjacency matrices and feature matrices for both the first and second graphical models)
extracting, using the first feature extractor and the second feature extractor, respectively, a first graphical representation of the first graphical model from the first adjacency matrix and the first initial feature matrix, and a second graphical representation of the second graphical model from the second adjacency matrix and the second initial feature matrix, the first graphical representation and the second graphical representation respectively comprising feature vector representations for each node of the first graphical model and the second graphical model, (SHI, p. 3, section 2.3:

    PNG
    media_image6.png
    200
    474
    media_image6.png
    Greyscale

Examiner’s Note: the broadest reasonable interpretation of “extracting ... a first graphical representation of the first graphical model from the first adjacency matrix and the first initial feature matrix, and a second graphical representation of the second graphical model from the second adjacency matrix and the second initial feature matrix” includes using a layer-wise propagation rule that operates on the adjacency matrix and features matrix as explained on p. 11, lines 6-13 of the instant specification, and SHI similarly teaches such a layer-wise propagation that operates on the adjacency matrix (A) and the initial feature matrix (H(0)) to result in a feature matrix H(l); the JIANG-SHI combination now modifies JIANG to utilize the layer-wise propagation of SHI in order to extract the first and second graphical representations (H(l)); the examiner further notes that pursuant to MPEP 2144.04 VI.B, “mere duplication of parts has no patentable significance unless a new and unexpected result is produced”, and it would have been obvious to one of ordinary skill to create first and second graphical representations (H(l)) for the hardware architecture and NN architecture co-exploration paths)
a first graph convolutional network and a second graph convolutional network respectively comprising first weights and second weights obtained during training of the trained predictor, (SHI, p. 3, section 2.3: “The graph convolutional network (GCN) is a model for graph-structured data, which utilizes localized spectral filters to extract an useful embedding of each node”; 
SHI, p. 3, section 3.1: “To train the GCN, we feed its output (cell embedding) to a standard regressor. In this paper, we use a single-hidden layer neural network. Using the NAS-Bench data sets, the target output is the actual accuracy of the network constructed by stacking this particular cell. This regressor is then trained end-to-end with the GCN”
Examiner’s Note: The GCN is trained and the weights are reflected as W(l) weight matrices as shown in Equation (3) (the layer-wise propagation rule); the examiner further notes that pursuant to MPEP 2144.04 VI.B, “mere duplication of parts has no patentable significance unless a new and unexpected result is produced”, and it would have been obvious to one of ordinary skill to train first and second GCNs for the hardware architecture and NN architecture co-exploration paths, respectively)
the first graph convolutional network and the second graph convolutional network respectively extracting the first graphical representation and the second graphical representation using a layer wise propagation rule; (SHI, p. 3, section 2.3:

    PNG
    media_image6.png
    200
    474
    media_image6.png
    Greyscale

Examiner’s Note: the JIANG-SHI combination now modifies JIANG to utilize the layer-wise propagation of SHI in order to extract the first and second graphical representations (H(l)); the examiner further notes that pursuant to MPEP 2144.04 VI.B, “mere duplication of parts has no patentable significance unless a new and unexpected result is produced”, and it would have been obvious to one of ordinary skill to create first and second graphical representations (H(l)) for the hardware architecture and NN architecture co-exploration paths)
predicting, the performance of the neural network model on the hardware arrangement by using the first graphical representation of the first graphical model and the second graphical representation of the second graphical model as an input to a fully connected layer of the trained predictor, (SHI, p. 4, section 3.2:

    PNG
    media_image7.png
    674
    480
    media_image7.png
    Greyscale

Examiner’s Note: the Bayesian linear regressor (BLR) corresponds to the recited trained predictor, and by inputting the updated features matrices (X) that correspond to the recited “first and second graphical representations”, the BLR outputs prediction variances as shown in Fig. 5, which uses a last fully-connected layer (corresponding to recited “fully-connected layer”); the JIANG-SHI combination now modifies JIANG to use the prediction methods of SHI)

wherein the fully connected layer maps the input to one or more performance metrics comprising at least one of an accuracy, a latency, an energy consumption, thermals and memory utilization, (SHI, p. 3, section 3.1: “Using the NAS-Bench data sets, the target output is the actual accuracy of the network constructed by stacking this particular cell”)
wherein the trained predictor has been trained with measurements of the one or more performance metrics when running neural networks on hardware arrangements; (SHI, p. 4, section 3.4: 


    PNG
    media_image8.png
    512
    462
    media_image8.png
    Greyscale

Examiner’s Note: the predictor model is trained using predicted and ground truth accuracies using the loss metric of equation (6); the JIANG-SHI combination now modifies JIANG to use the trained predictor of SHI with respect to the accuracy metric)
wherein the extracting of the first graphical representation comprises extracting a feature vector for each node of the plurality of connected nodes, and (SHI, p. 3, section 2.3: “The graph convolutional network (GCN) is a model for graph-structured data, which utilizes localized spectral filters to extract an useful embedding of each node”

    PNG
    media_image6.png
    200
    474
    media_image6.png
    Greyscale

SHI, p. 3, section 3.1: “Specifically, graph connectivity is encoded by the adjacency matrix A, which can be obtained from the graph structure directly. Individual operations are encoded as one-hot vectors, and then aggregated to form the feature matrix X.”; 
SHI, Fig. 2: 

    PNG
    media_image5.png
    250
    478
    media_image5.png
    Greyscale

Examiner’s Note: SHI teaches a layer-wise propagation that operates on the adjacency matrix (A) and the initial feature matrix (H(0)) to result in a feature matrix H(l), and as shown in Fig. 2, the structure of the feature matrix has a one-hot encoded row vector for each of the 8 connected nodes, where each row vector corresponds to the recited “feature vector” of the feature matrix; the JIANG-SHI combination now modifies JIANG to utilize the layer-wise propagation of SHI in order to extract the first graphical representation (H(l)) which has the format of SHI such that each row is a row feature vector corresponding to a node as disclosed by SHI)

	Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of JIANG with SHI as explained above.  As disclosed by SHI, one of ordinary skill would have been motivated to do so because SHI teaches, for neural architecture search, the “proposed BONAS” (Bayesian Optimized Neural Architecture Search”) is 123.7x more efficient than random search within a sample space.  (p. 2, section 1).  One of ordinary skill would further understand the benefit of making performance predictions directly from a graph, as disclosed by SHI, instead of having to train each model, which saves considerable resources.  (see p. 2, section 1).

	However, JIANG and SHI fail to explicitly teach:
wherein the hardware arrangement is selected as being a single-chip device comprising the plurality of interconnected components or devices or a system comprising the plurality of interconnected components or devices
wherein:
based on the hardware arrangement being selected as the single-chip device, the feature vector comprises at least one of a component type or a bandwidth; and 
based on the hardware arrangement being selected as the system, the feature vector comprises at least one of a processor type, a device type, a clock frequency, a memory size, or a bandwidth.

However, in a related field of endeavor (designing a system having multiple components, see para. 0097), SCANLON teaches and makes obvious:
wherein the hardware arrangement is selected as being a single-chip device comprising the plurality of interconnected components or devices or a system comprising the plurality of interconnected components or devices (SCANLON, para. 0097: “The designer of a system may choose to implement the functionality on a single processor or the functionality may be distributed across different devices and systems. Within a single device it is a matter of choice as to whether a single processor or multiple processors, including dedicated chips for e.g. audio processing, are used.”;
Examiner’s Note: the JIANG-SHI-SCANLON combination now makes the selection of a single-chip device vs. a multi-component system a design choice as taught by SCANLON)

	Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of JIANG with SHI and SCANLON as explained above.  One of ordinary skill would understand that a single-chip implementation has certain advantages, such as a smaller form factor, whereas a multiple-component system may be more powerful. 

However, JIANG and SHI and SCANLON fail to explicitly teach:
wherein:
based on the hardware arrangement being selected as the single-chip device, the feature vector comprises at least one of a component type or a bandwidth; and 
based on the hardware arrangement being selected as the system, the feature vector comprises at least one of a processor type, a device type, a clock frequency, a memory size, or a bandwidth.

	However, in a related field of endeavor (analyzing multi-component hardware systems and associated performance, see paras. 0001-0002), COHEN teaches:
wherein:
based on the hardware arrangement being selected as the single-chip device, the feature vector comprises at least one of a component type or a bandwidth; and (COHEN, para. 0060: “In the following, an example for computing a feature vector X is illustrated. As shown in FIG. 8, step 802 may include (i) a sub-step 804 in which a component type feature vector X.sub.t is computed using the set of metrics, and (ii) a sub-step 806 in which an instance feature vector X.sub.ins is computed for each component type.”
COHEN, para. 0089: “Embodiments can be realized in any computer-readable media for use by or in connection with an instruction execution system such as a computer/processor based system or an ASIC (Application Specific Integrated Circuit) or other system that can fetch or obtain the logic from computer-readable media and execute the instructions contained therein.”; 
Examiner’s Note: the JIANG-SHI-SCANLON-COHEN combination now modifies JIANG with respect to the feature matrices of SHI such that the feature vectors include information about a component type, including an ASIC, as in COHEN)
based on the hardware arrangement being selected as the system, the feature vector comprises at least one of a processor type, a device type, a clock frequency, a memory size, or a bandwidth. (COHEN, para. 0014: “Other examples include CPU utilization, memory utilization, disk utilization and bandwidth, or queries received/processed by a database. These features, for example, are related to the measured performance of cloud resources.”
COHEN, para. 0023: “Multiple actions may be performed for changing the specific configuration of resources in cloud 100 supporting execution of an application, such as any of the following: (a) adding a component type; (b) removing a component type; (c) re-allocating resources of cloud 100 realizing an instance of a component; (d) re-starting an instance of a component; or (e) changing the configuration of an instance of a component (e.g., allocate more memory or a higher CPU to an instance). Actions a) and b) are actions performed on a component type; actions c) to e) are actions on instances of components. It will be understood that this list of actions is not exhaustive. There is a vast variety of actions that may be performed for changing a configuration of cloud resources supporting execution of an application.”
COHEN, para. 0060: “In the following, an example for computing a feature vector X is illustrated. As shown in FIG. 8, step 802 may include (i) a sub-step 804 in which a component type feature vector X.sub.t is computed using the set of metrics, and (ii) a sub-step 806 in which an instance feature vector X.sub.ins is computed for each component type.”
COHEN, para. 0089: “Embodiments can be realized in any computer-readable media for use by or in connection with an instruction execution system such as a computer/processor based system or an ASIC (Application Specific Integrated Circuit) or other system that can fetch or obtain the logic from computer-readable media and execute the instructions contained therein.”; 
Examiner’s Note: the JIANG-SHI-SCANLON-COHEN combination now modifies JIANG with respect to the feature matrices of SHI such that the feature vectors include information about a processor type, including an ASIC, as in COHEN, and can further include information regarding different CPUs, more memory, and bandwidth metrics as taught by COHEN)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to modify the teachings of JIANG, SHI, and COHEN as explained above.  As disclosed by COHEN, one of ordinary skill would have been motivated to do so because COHEN teaches modifying components and component types for scaling execution of an application. (para. 0022).  One of ordinary skill would understand the benefit of encoding aspects related to a component type for consideration by the neural network.

Regarding Claim 4:
	JIANG, SHI, SCANLON, and COHEN teach the method of claim 1 as explained above.  However, JIANG does not explicitly teach:
wherein the extracting of the second graphical representation comprises extracting a feature vector for each node of the plurality of interconnected nodes. 

However, in a related field of endeavor (neural architecture search), SHI teaches:
wherein the extracting of the second graphical representation comprises extracting a feature vector for each node of the plurality of interconnected nodes. (SHI, p. 3, section 2.3:

    PNG
    media_image6.png
    200
    474
    media_image6.png
    Greyscale

SHI, p. 3, section 3.1: “Specifically, graph connectivity is encoded by the adjacency matrix A, which can be obtained from the graph structure directly. Individual operations are encoded as one-hot vectors, and then aggregated to form the feature matrix X.”; 
SHI, Fig. 2: 

    PNG
    media_image5.png
    250
    478
    media_image5.png
    Greyscale

Examiner’s Note: SHI teaches a layer-wise propagation that operates on the adjacency matrix (A) and the initial feature matrix (H(0)) to result in a feature matrix H(l), and as shown in Fig. 2, the structure of the feature matrix has a one-hot encoded row vector for each of the 8 connected nodes, where each row vector corresponds to the recited “feature vector” of the feature matrix; the JIANG-SHI-SCANLON-COHEN combination now modifies JIANG to utilize the layer-wise propagation of SHI in order to extract the second graphical representation (H(l)) which has the format of SHI such that each row is a row feature vector corresponding to a node as disclosed by SHI)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to modify the teachings of JIANG with SHI, SCANLON, and COHEN as explained above.  As disclosed by SHI, one of ordinary skill would have been motivated to do so because SHI teaches, for neural architecture search, the “proposed BONAS” (Bayesian Optimized Neural Architecture Search”) is 123.7x more efficient than random search within a sample space.  (p. 2, section 1).  One of ordinary skill would further understand the benefit of making performance predictions directly from a graph, as disclosed by SHI, instead of having to train each model, which saves considerable resources.  (see p. 2, section 1).

Regarding Claim 5:
	JIANG, SHI, SCANLON, and COHEN teach the method of claim 4 as explained above.  However, JIANG does not explicitly teach:
wherein the feature vector comprises at least one of an input, an output, a 3x3 convolutional layer, a 1x1 convolutional layer, or an averaging operation. 

However, in a related field of endeavor (neural architecture search), SHI teaches:
wherein the feature vector comprises at least one of an input, an output, a 3x3 convolutional layer, a 1x1 convolutional layer, or an averaging operation. (SHI, Fig. 2: 

    PNG
    media_image5.png
    250
    478
    media_image5.png
    Greyscale

Examiner’s Note: As shown in Fig. 2, for input node 1, the first row in the feature matrix is [1,0,0,0,0,0] where the “1” corresponds to the input (0) operation)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to modify the teachings of JIANG with SHI, SCANLON, and COHEN as explained above.  As disclosed by SHI, one of ordinary skill would have been motivated to do so because SHI teaches, for neural architecture search, the “proposed BONAS” (Bayesian Optimized Neural Architecture Search”) is 123.7x more efficient than random search within a sample space.  (p. 2, section 1).  One of ordinary skill would further understand the benefit of making performance predictions directly from a graph, as disclosed by SHI, instead of having to train each model, which saves considerable resources.  (see p. 2, section 1).

Regarding Claim 6:
	JIANG, SHI, SCANLON, and COHEN teach the method of claim 1 as explained above.  JIANG further teaches:
wherein the predicting of the performance comprises at least one of: 
predicting individual performances of each of the first plurality of interconnected components or devices; or 
predicting overall performance of the first hardware arrangement. (JIANG, p. 7, section V.B: “As shown in Figure 1(a), in the search loop, the controller will first predict a neural architecture; second, the framework tests the hardware efficiency of the predicted architecture on FPGAs; third, it trains architecture to get its accuracy; finally, it utilizes hardware efficiency and accuracy to update the controller.”; (EN): the examiner notes that this limitation is claimed in the alternative, so the broadest reasonable interpretation is only one of “individual performances” and “overall performance” is required, and as explained above, JIANG at least discloses the “overall performance” alternative)

Regarding Claim 7:
	JIANG, SHI, SCANLON, and COHEN teach the method of claim 6 as explained above.  However, JIANG does not explicitly teach:
the first graphical model comprises a global node; and 
the predicting of the overall performance of the first hardware arrangement is based on the global node.

However, in a related field of endeavor (neural architecture search), SHI teaches:
the first graphical model comprises a global node; and 

    PNG
    media_image5.png
    250
    478
    media_image5.png
    Greyscale

Examiner’s Note: As shown in Fig. 2, node 8 is the “global node”)
the predicting of the overall performance of the first hardware arrangement is based on the global node. (SHI, p. 4, section 3.2:

    PNG
    media_image7.png
    674
    480
    media_image7.png
    Greyscale

Examiner’s Note: the BLR outputs prediction variances as shown in Fig. 5, which is based on the graph that is based on a global node; the JIANG-SHI-COHEN combination now modifies JIANG to use the prediction methods of SHI)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to modify the teachings of JIANG with SHI, SCANLON, and COHEN as explained above.  As disclosed by SHI, one of ordinary skill would have been motivated to do so because SHI teaches, for neural architecture search, the “proposed BONAS” (Bayesian Optimized Neural Architecture Search”) is 123.7x more efficient than random search within a sample space.  (p. 2, section 1).  One of ordinary skill would further understand the benefit of making performance predictions directly from a graph, as disclosed by SHI, instead of having to train each model, which saves considerable resources.  (see p. 2, section 1).

Regarding Claim 9:
	JIANG, SHI, SCANLON, and COHEN teach the method of claim 1 as explained above.  JIANG further teaches:
obtaining another hardware arrangement comprising another plurality of interconnected components or devices; (JIANG, p. 1, Fig. 1b (see below); JIANG, p. 2, section II.A: “the main contribution of this work is to propose a framework to co-explore the architecture search space and the hardware design space, as shown in Figure 1(b). More specifically, this framework determines the best hardware during the search process, which is tailor-made for the candidate architectures. In this way, the framework can obtain a set of superior architecture and hardware design pairs on the Pareto frontier in terms of accuracy and hardware efficiency tradeoffs.”;

    PNG
    media_image2.png
    290
    504
    media_image2.png
    Greyscale

Examiner’s Note (EN): In Fig. 1b, the Design 2 in the Hardware Design Space corresponds to the “another hardware arrangement”, where the connected FPGAs correspond to the recited “interconnected components or devices”) 
obtaining a third graphical model representing the another hardware arrangement as a another plurality of connected nodes, each of the another plurality of interconnected components or devices being represented by one of the another plurality of connected nodes; (JIANG, Fig. 2: 

    PNG
    media_image3.png
    284
    512
    media_image3.png
    Greyscale

(EN): As shown in Fig. 2, each of FPGA nodes (U1, U2, U3) corresponds to a different FPGA device, and therefore, the design in (4) for the next iteration of child networks in the sample space corresponds to the recited “third graphical model” and the FPGAs (U1, U2, U3) are each of the “plurality of interconnected components or devices being represented by one of the another plurality of connected nodes”)
comparing the performance of the hardware arrangement and the performance of the another hardware arrangement, (JIANG, p. 5, section III.B: “According to the generated pipeline structure, we then reorganize the controller and iteratively update the controller to generate child networks with higher hardware utilization. Our goal is to maximize the average hardware utilization, which is equivalent to maximize the utilization of each hardware”; (EN): JIANG discloses iterative updates using successive child networks with higher hardware utilization, where a comparison is performed to determine if the hardware utilization is higher for a successive child network (e.g., the another hardware arrangement as compared to the hardware arrangement))
wherein the outputting of the performance of the neural network model on the hardware arrangement comprises outputting an indication of the performance the hardware arrangement relative to the performance of the another hardware arrangement. (JIANG, p. 3, section II.C: “The child network is the bridge between the architecture search space and the hardware design space. Specifically, in each iteration, the controller RNN will predict child networks from the architecture search space, and then determine their implementations in the hardware design space.”; 
JIANG, p. 5, section III.B: “According to the generated pipeline structure, we then reorganize the controller and iteratively update the controller to generate child networks with higher hardware utilization. Our goal is to maximize the average hardware utilization, which is equivalent to maximize the utilization of each hardware”; (EN): JIANG discloses iterative updates using successive child networks with higher hardware utilization, where a comparison is performed to determine if the hardware utilization is higher for a successive child network (e.g., the another hardware arrangement as compared to the hardware arrangement), meaning that the output shows that the “another hardware arrangement” is relatively better than the “hardware arrangement”)

However, JIANG fails to explicitly teach:
extracting, based on the third graphical model, a third graphical representation of the another hardware arrangement; 
predicting, based on the third graphical representation of the another hardware arrangement, performance of the another hardware arrangement; and 

However, in a related field of endeavor (neural architecture search), SHI teaches:
extracting, based on the third graphical model, a third graphical representation of the another hardware arrangement; (SHI, p. 3, section 2.3:

    PNG
    media_image6.png
    200
    474
    media_image6.png
    Greyscale

Examiner’s Note: SHI teaches a layer-wise propagation that operates on the adjacency matrix (A) and the initial feature matrix (H(0)) to result in a feature matrix H(l); the JIANG-SHI combination now modifies JIANG to utilize the layer-wise propagation of SHI in order to extract the third graphical representation (H(l)))
predicting, based on the third graphical representation of the another hardware arrangement, performance of the another hardware arrangement; and (SHI, p. 4, section 3.2:

    PNG
    media_image7.png
    674
    480
    media_image7.png
    Greyscale

Examiner’s Note: the Bayesian linear regressor (BLR) corresponds to the recited trained predictor, and by inputting the updated features matrix (X) that corresponds to the recited “third graphical representation”, the BLR outputs prediction variances as shown in Fig. 5; the JIANG-SHI-SCANON-COHEN combination now modifies JIANG to use the prediction methods of SHI)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to modify the teachings of JIANG with SHI, SCANLON, and COHEN as explained above.  As disclosed by SHI, one of ordinary skill would have been motivated to do so because SHI teaches, for neural architecture search, the “proposed BONAS” (Bayesian Optimized Neural Architecture Search”) is 123.7x more efficient than random search within a sample space.  (p. 2, section 1).  One of ordinary skill would further understand the benefit of making performance predictions directly from a graph, as disclosed by SHI, instead of having to train each model, which saves considerable resources.  (see p. 2, section 1).

Regarding Claim 10:
	JIANG, SHI, SCANLON, and COHEN disclose the method of claim 1 as explained above.  JIANG further teaches:
obtaining a another neural network model comprising a another plurality of operations; (JIANG, p. 1, Fig. 1b (see below); JIANG, p. 2, section II.A: “the main contribution of this work is to propose a framework to co-explore the architecture search space and the hardware design space, as shown in Figure 1(b). More specifically, this framework determines the best hardware during the search process, which is tailor-made for the candidate architectures. In this way, the framework can obtain a set of superior architecture and hardware design pairs on the Pareto frontier in terms of accuracy and hardware efficiency tradeoffs.”

    PNG
    media_image2.png
    290
    504
    media_image2.png
    Greyscale

JIANG, p. 3, Fig. 2; JIANG, p. 3, section II.B: “Figure 2 demonstrates one such example, in which a 5-layer network is partitioned into 3 pipeline stages, and each pipeline stage is mapped to a certain FPGA in an available pool.”; 
JIANG, p. 4, section II.C: “From the software perspective, first, the proposed framework can handle neural networks with residual connections by integrating techniques in [34] to partition DAG-based child network; second, it can explore different operations (e.g., group convolutions, depthwise separable convolution, etc.) for each node in a child network by adding one additional parameter in parai to determine a specific operation for the node.;
Examiner’s Note (EN): In Fig. 1b, the NN2 in the Architecture Search Space corresponds to the recited “another neural network model” and as shown in Fig. 2, a child network which is integrated onto one or more FPGAs (e.g. as an integrated version of NN2 of Fig. 1), is a direct acyclic graph (DAG) that has operations including group convolutions or depthwise separable convolutions)

obtaining a third graphical model representing the another neural network model as a another plurality of interconnected nodes of the another plurality of operations being represented by one of the another plurality of interconnected nodes; (JIANG, Fig. 2: 

    PNG
    media_image3.png
    284
    512
    media_image3.png
    Greyscale

(EN): As shown in Fig. 2, in (1) there are 5 layer nodes for the child network (l1, l2, l3, l4, l5) (corresponding to recited “third graphical model” representing the operations of the child network for the next iteration of the child network in the search space)
comparing the performance of the neural network model and the performance of the another neural network model, (JIANG, p. 5, section III.B: “According to the generated pipeline structure, we then reorganize the controller and iteratively update the controller to generate child networks with higher hardware utilization. Our goal is to maximize the average hardware utilization, which is equivalent to maximize the utilization of each hardware”; (EN): JIANG discloses iterative updates using successive child networks with higher hardware utilization, where a comparison is performed to determine if the hardware utilization is higher for a successive child network (e.g., the another neural network model as compared to the neural network model))
wherein the outputting of the performance of the neural network model on the hardware arrangement comprises outputting an indication of the performance the neural network model relative to the performance of the another neural network model. (JIANG, p. 3, section II.C: “The child network is the bridge between the architecture search space and the hardware design space. Specifically, in each iteration, the controller RNN will predict child networks from the architecture search space, and then determine their implementations in the hardware design space.”; 
JIANG, p. 5, section III.B: “According to the generated pipeline structure, we then reorganize the controller and iteratively update the controller to generate child networks with higher hardware utilization. Our goal is to maximize the average hardware utilization, which is equivalent to maximize the utilization of each hardware”; (EN): JIANG discloses iterative updates using successive child networks with higher hardware utilization, where a comparison is performed to determine if the hardware utilization is higher for a successive child network, meaning that the output shows that the “another neural network model” is relatively better than the “neural network model”)

	However, JIANG fails to explicitly teach:
extracting, based on the third graphical model, a third graphical representation of the another neural network model; 
predicting, based on the third graphical representation of the another neural network model, performance of the another neural network model; and 

However, in a related field of endeavor (neural architecture search), SHI teaches:
extracting, based on the third graphical model, a third graphical representation of the another neural network model; (SHI, p. 3, section 2.3:

    PNG
    media_image6.png
    200
    474
    media_image6.png
    Greyscale

Examiner’s Note: SHI teaches a layer-wise propagation that operates on the adjacency matrix (A) and the initial feature matrix (H(0)) to result in a feature matrix H(l); the JIANG-SHI-COHEN combination now modifies JIANG to utilize the layer-wise propagation of SHI in order to extract the third graphical representation (H(l)))
predicting, based on the third graphical representation of the another neural network model, performance of the another neural network model; (SHI, p. 4, section 3.2:

    PNG
    media_image7.png
    674
    480
    media_image7.png
    Greyscale

Examiner’s Note: the Bayesian linear regressor (BLR) corresponds to the recited trained predictor, and by inputting the updated features matrix (X) that corresponds to the recited “third graphical representation”, the BLR outputs prediction variances as shown in Fig. 5; the JIANG-SHI-SCANLON-COHEN combination now modifies JIANG to use the prediction methods of SHI)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to modify the teachings of JIANG with SHI, SCANLON, and COHEN as explained above.  As disclosed by SHI, one of ordinary skill would have been motivated to do so because SHI teaches, for neural architecture search, the “proposed BONAS” (Bayesian Optimized Neural Architecture Search”) is 123.7x more efficient than random search within a sample space.  (p. 2, section 1).  One of ordinary skill would further understand the benefit of making performance predictions directly from a graph, as disclosed by SHI, instead of having to train each model, which saves considerable resources.  (see p. 2, section 1).

Regarding Claim 11:
	JIANG, SHI, SCANLON, and COHEN disclose the method of claim 1 as explained above.  JIANG further teaches:
a first paired combination comprises the hardware arrangement and the neural network model; (JIANG, p. 2, section II.A: “the main contribution of this work is to propose a framework to co-explore the architecture search space and the hardware design space, as shown in Figure 1(b). More specifically, this framework determines the best hardware during the search process, which is tailor-made for the candidate architectures. In this way, the framework can obtain a set of superior architecture and hardware design pairs on the Pareto frontier in terms of accuracy and hardware efficiency tradeoffs.”)
the obtaining of the first graphical model comprises obtaining a first hardware graphical model corresponding to the hardware arrangement and a first network graphical model corresponding to the neural network model; (JIANG, Fig. 1b; (EN): In Fig. 1b, the Design 1 in the Hardware Design Space corresponds to the “first hardware arrangement” and the NN1 in the Architecture Search Space corresponds to the “first neural network model”)
the extracting of the first graphical representation comprises extracting a first hardware graphical representation of the hardware arrangement and a first network graphical representation of the neural network model; and (JIANG, p. 1, Fig. 1b (see below), and Section I: “Interestingly, the hardware design space is tightly coupled with the architecture search space, i.e., the best neural architecture depends on the hardware (hardware-aware NAS), and the best hardware depends on the neural architecture. It is therefore best to jointly explore both spaces to push forward the Pareto frontier between hardware efficiency and test accuracy for better design tradeoffs.”: 
JIANG, p. 2, section II.A: “the main contribution of this work is to propose a framework to co-explore the architecture search space and the hardware design space, as shown in Figure 1(b). More specifically, this framework determines the best hardware during the search process, which is tailor-made for the candidate architectures.”; JIANG, p. 3, section II.B: “This paper will employ FPGA as a vehicle to study how to co-explore neural architectures and hardware designs.” 
JIANG, p. 6, section IV: “Hardware Design Space: The hardware design space is composed of up to three Xilinx FPGAs (XC7Z015), each of which contains 74K logic cells, 4.9Mb on-chip memory, and 150 DSP Slices. ... In the implementation, the child network is partitioned into pipeline stages, and each stage is mapped to one FPGA.”; (EN): In Fig. 1b, the Design 2 in the Hardware Design Space corresponds to the “first hardware arrangement”)
the computer-implemented method further comprises:
obtaining a second paired combination comprising a second hardware arrangement and a second neural network model; (JIANG, p. 2, section II.A: “the main contribution of this work is to propose a framework to co-explore the architecture search space and the hardware design space, as shown in Figure 1(b). More specifically, this framework determines the best hardware during the search process, which is tailor-made for the candidate architectures. In this way, the framework can obtain a set of superior architecture and hardware design pairs on the Pareto frontier in terms of accuracy and hardware efficiency tradeoffs.”)
obtaining a second hardware graphical model corresponding to the second hardware arrangement and a second network graphical model corresponding to the second neural network model; (JIANG, Fig. 2: (EN): As shown in Fig. 2, there are 3 nodes for the pipelined FPGAs (U1, U2, U3), corresponding to the “second plurality of nodes corresponding to the obtained first hardware arrangement”)
predicting, based on the first hardware graphical representation and the first network graphical representation, performance of the first paired combination; (JIANG, p. 7, section V.B: “As shown in Figure 1(a), in the search loop, the controller will first predict a neural architecture; second, the framework tests the hardware efficiency of the predicted architecture on FPGAs; third, it trains architecture to get its accuracy; finally, it utilizes hardware efficiency and accuracy to update the controller.”; (EN):  for the first pair of hardware + neural network architecture, the hardware efficiency and accuracy (corresponding to the recited “performance”) is predicted and measured.)
predicting, based on the second hardware graphical representation and the second network graphical representation, performance of the second paired combination; (JIANG, p. 7, section V.B: “As shown in Figure 1(a), in the search loop, the controller will first predict a neural architecture; second, the framework tests the hardware efficiency of the predicted architecture on FPGAs; third, it trains architecture to get its accuracy; finally, it utilizes hardware efficiency and accuracy to update the controller.”; (EN):  for the second pair of hardware + neural network architecture, the hardware efficiency and accuracy (corresponding to the recited “performance”) is predicted and measured.)
comparing the performance of the first paired combination and the performance of the second paired combination; and (JIANG, p. 5, section III.B: “According to the generated pipeline structure, we then reorganize the controller and iteratively update the controller to generate child networks with higher hardware utilization. Our goal is to maximize the average hardware utilization, which is equivalent to maximize the utilization of each hardware”; (EN): JIANG iteratively updates using successive child networks with higher hardware utilization, where a comparison is performed to determine if the hardware utilization is higher for a successive child network (e.g., the second hardware arrangement with second neural network architecture as compared to the first hardware arrangement with first neural network architecture))
outputting a relative performance of the first paired combination compared to the second paired combination. (JIANG, p. 3, section II.C: “The child network is the bridge between the architecture search space and the hardware design space. Specifically, in each iteration, the controller RNN will predict child networks from the architecture search space, and then determine their implementations in the hardware design space.”; JIANG, p. 5, section III.B: “According to the generated pipeline structure, we then reorganize the controller and iteratively update the controller to generate child networks with higher hardware utilization. Our goal is to maximize the average hardware utilization, which is equivalent to maximize the utilization of each hardware”; (EN): JIANG iteratively updates using successive child networks, where each iterance is a performance improvement to the previous iteration, meaning that the output shows that the “second hardware arrangement” paired with the second neural network architecture is relatively better than the “first hardware arrangement” paired with the first neural network architecture.

	However, JIANG fails to explicitly teach:
extracting, based on the second hardware graphical model and the second network graphical model, a second hardware graphical representation of the second hardware arrangement and a second network graphical representation of the second neural network model;

However, in a related field of endeavor (neural architecture search), SHI teaches:
extracting, based on the second hardware graphical model and the second network graphical model, a second hardware graphical representation of the second hardware arrangement and a second network graphical representation of the second neural network model; (SHI, p. 3, section 2.3:

    PNG
    media_image6.png
    200
    474
    media_image6.png
    Greyscale

Examiner’s Note: SHI teaches a layer-wise propagation that operates on the adjacency matrix (A) and the initial feature matrix (H(0)) to result in a feature matrix H(l); the JIANG-SHI-SCANLON-COHEN combination now modifies JIANG to utilize the layer-wise propagation of SHI in order to extract the second graphical representation (H(l)))

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to modify the teachings of JIANG with SHI, SCANLON, and COHEN as explained above.  As disclosed by SHI, one of ordinary skill would have been motivated to do so because SHI teaches, for neural architecture search, the “proposed BONAS” (Bayesian Optimized Neural Architecture Search”) is 123.7x more efficient than random search within a sample space.  (p. 2, section 1).  One of ordinary skill would further understand the benefit of making performance predictions directly from a graph, as disclosed by SHI, instead of having to train each model, which saves considerable resources.  (see p. 2, section 1).

Regarding Claim 12:
	JIANG, SHI, SCANLON, and COHEN teach the method of claim 1 as explained above.  JIANG further teaches:
obtaining a plurality of hardware arrangements; (JIANG, Fig. 1(b); (EN): In Fig. 1b, the Design 1 and Design 2 in the Hardware Design Space correspond to the “plurality of hardware arrangements”)
predicting a performance of the neural network model on each hardware arrangement of the plurality of hardware arrangements; (JIANG, p. 1, Fig. 1b (see below); JIANG, p. 2, section II.A: “the main contribution of this work is to propose a framework to co-explore the architecture search space and the hardware design space, as shown in Figure 1(b).)
comparing performances for each hardware arrangement of the plurality of hardware arrangements; and (JIANG, p. 5, section III.B: “According to the generated pipeline structure, we then reorganize the controller and iteratively update the controller to generate child networks with higher hardware utilization. Our goal is to maximize the average hardware utilization, which is equivalent to maximize the utilization of each hardware”; (EN): JIANG iteratively updates using successive child networks with higher hardware utilization, where a comparison is performed to determine if the hardware utilization is higher for a successive child network (e.g., the second hardware arrangement with second neural network architecture as compared to the first hardware arrangement with first neural network architecture)
identifying, based on a predetermined performance criteria, a selected hardware arrangement from among the plurality of hardware arrangements. (JIANG, p. 2, section 1: “The optimization objectives in the hardware design space can be varied according to the design specifications, such as area, monetary cost, energy efficiency, reliability, resource utilization, etc.”; (EN): the design specifications (corresponding to the recited “predetermined performance criteria”) are used to identify an optimal hardware arrangement + neural network architecture pair)

Regarding Claim 13:
	JIANG, SHI, SCANLON, and COHEN teach the method of claim 1 as explained above.  JIANG further teaches:
obtaining a plurality of neural network models; (JIANG, Fig. 1(b); (EN): In Fig. 1b, NN1 and NN2 in the Architecture Search Space correspond to the “plurality of neural network models”)
predicting a performance of the hardware arrangement on each neural network model of the plurality of neural network models; (JIANG, p. 1, Fig. 1b (see below); JIANG, p. 2, section II.A: “the main contribution of this work is to propose a framework to co-explore the architecture search space and the hardware design space, as shown in Figure 1(b).)
comparing performances of each neural network model of the plurality of neural network models; and (JIANG, p. 5, section III.B: “According to the generated pipeline structure, we then reorganize the controller and iteratively update the controller to generate child networks with higher hardware utilization. Our goal is to maximize the average hardware utilization, which is equivalent to maximize the utilization of each hardware”; (EN): JIANG iteratively updates using successive child networks with higher hardware utilization, where a comparison is performed to determine if the hardware utilization is higher for a successive child network (e.g., the second hardware arrangement with second neural network architecture as compared to the first hardware arrangement with first neural network architecture)
identifying, based on a predetermined performance criteria, a selected neural network model from among the plurality of neural network models. (JIANG, p. 2, section 1: “The optimization objectives in the hardware design space can be varied according to the design specifications, such as area, monetary cost, energy efficiency, reliability, resource utilization, etc.”; (EN): the design specifications (corresponding to the recited “predetermined performance criteria”) are used to identify an optimal hardware arrangement + neural network architecture pair)

Regarding Claim 14:
	JIANG teaches:
A server comprising: a memory storing at least one instruction; (JIANG, p. 6, section IV: “we build the distributed GPU training environment on top of Uber Horovod”; (EN): a distributed GPU training environment corresponds to a “server” and because this is done via a computer, memory is necessarily required to store the instructions for performing the co-exploration system; p. 7, footnote 1 provides a GitHub link to source code, which means that that source code (corresponding to “instructions”) is utilized on the GPUs having memory)
a trained predictor configured to predict a performance of a neural network model on a hardware arrangement, ... (JIANG, pp. 1-2, section I: “Interestingly, the hardware design space is tightly coupled with the architecture search space, i.e., the best neural architecture depends on the hardware (hardware-aware NAS), and the best hardware depends on the neural architecture. It is therefore best to jointly explore both spaces to push forward the Pareto frontier between hardware efficiency and test accuracy for better design tradeoffs. ... As such, we use accuracy and pipeline efficiency to guide the exploration of the neural architecture space and hardware design space respectively, while satisfying a given throughput specifications (e.g., ≥30FPS for the ordinary camera). Experimental results show that the co-exploration approach can significantly push forward the Pareto frontier. On ImageNet, the proposed co-exploration framework can identify architecture and hardware pairs to achieve the same accuracy, 35.42% higher throughput, and 54.05% higher energy efficiency with the reduced search time, compared with the hardware-aware NAS.”; Examiner’s Note (EN): JIANG discloses a HW/SW co-exploration framework (corresponding to recited “trained predictor”) that predicts the accuracy of a neural network architecture on specific hardware)
and at least one processor configured to execute the at least one instruction to: (JIANG, p. 6, section IV: “we build the distributed GPU training environment on top of Uber Horovod”)
obtain the hardware arrangement comprising a plurality of interconnected components or devices; (JIANG, p. 1, Fig. 1b (see below); JIANG, p. 2, section II.A: “the main contribution of this work is to propose a framework to co-explore the architecture search space and the hardware design space, as shown in Figure 1(b). More specifically, this framework determines the best hardware during the search process, which is tailor-made for the candidate architectures. In this way, the framework can obtain a set of superior architecture and hardware design pairs on the Pareto frontier in terms of accuracy and hardware efficiency tradeoffs.”;

    PNG
    media_image2.png
    290
    504
    media_image2.png
    Greyscale

Examiner’s Note (EN): In Fig. 1b, the Designs 1 and 2 in the Hardware Design Space corresponds to the “hardware arrangement”, where the connected FPGAs correspond to the recited “interconnected components or devices”) 
obtaining the neural network model which is to be implemented on the hardware arrangement, the neural network model comprising a plurality of operations; (JIANG, p. 1, Fig. 1b (see below); JIANG, p. 2, section II.A: “the main contribution of this work is to propose a framework to co-explore the architecture search space and the hardware design space, as shown in Figure 1(b). More specifically, this framework determines the best hardware during the search process, which is tailor-made for the candidate architectures. In this way, the framework can obtain a set of superior architecture and hardware design pairs on the Pareto frontier in terms of accuracy and hardware efficiency tradeoffs.”

    PNG
    media_image2.png
    290
    504
    media_image2.png
    Greyscale

JIANG, p. 3, Fig. 2; JIANG, p. 3, section II.B: “Figure 2 demonstrates one such example, in which a 5-layer network is partitioned into 3 pipeline stages, and each pipeline stage is mapped to a certain FPGA in an available pool.”; 
JIANG, p. 4, section II.C: “From the software perspective, first, the proposed framework can handle neural networks with residual connections by integrating techniques in [34] to partition DAG-based child network; second, it can explore different operations (e.g., group convolutions, depthwise separable convolution, etc.) for each node in a child network by adding one additional parameter in parai to determine a specific operation for the node.;
Examiner’s Note (EN): In Fig. 1b, the NN1 and NN2 in the Architecture Search Space corresponds to the recited “neural network model” and as shown in Fig. 2, a child network which is integrated onto one or more FPGAs (e.g. as an integrated version of NN1 of Fig. 1), is a direct acyclic graph (DAG) that has operations including group convolutions or depthwise separable convolutions)	
obtain a first graphical model representing the hardware arrangement as a plurality of connected nodes being represented by one of the plurality of connected nodes; (JIANG, Fig. 2: 

    PNG
    media_image3.png
    284
    512
    media_image3.png
    Greyscale

(EN): As shown in Fig. 2, each of FPGA nodes (U1, U2, U3) corresponds to a different FPGA device, and therefore, the design in (4) corresponds to the recited “first graphical model” and the FPGAs (U1, U2, U3) are each of the “plurality of interconnected components or devices being represented by one of the plurality of connected nodes”)
obtain a second graphical model representing the neural network model as a plurality of interconnected nodes, each of the plurality of operations being represented by one of the plurality of interconnected nodes; (JIANG, Fig. 2: 

    PNG
    media_image3.png
    284
    512
    media_image3.png
    Greyscale

(EN): As shown in Fig. 2, in (1) there are 5 layer nodes for the child network (l1, l2, l3, l4, l5) (corresponding to recited “second graphical model” representing the operations of the child network)
output the performance of the neural network model on the hardware arrangement. (JIANG, Table II:

    PNG
    media_image4.png
    214
    498
    media_image4.png
    Greyscale

(EN): the performance is output and summarized in Table II of JIANG).

	However, JIANG fails to explicitly teach:
and comprising a first feature extractor and a second feature extractor, the first feature extractor and the second feature extractor comprising a respective graph convolutional network comprising a plurality of layers;
convert the first graphical model to a first adjacency matrix and a first initial feature matrix and converting the second graphical model to a second adjacency matrix and a second initial feature matrix, the first adjacency matrix and the second adjacency matrix respectively indicating which nodes are connected in the first graphical model and the second graphical model, and the first initial feature matrix and the second initial feature matrix respectively encapsulating node parameters of the first graphical model and the second graphical model;
extract, using the first feature extractor and the second feature extractor, respectively, a first graphical representation of the first graphical model from the first adjacency matrix and the first initial feature matrix, and a second graphical representation of the second graphical model from the second adjacency matrix and the second initial feature matrix, the first graphical representation and the second graphical representation respectively comprising feature vector representations for each node of the first graphical model and the second graphical model, a first graph convolutional network and a second graph convolutional network respectively comprising first weights and second weights obtained during training of the trained predictor, the first graph convolutional network and the second graph convolutional network respectively extracting the first graphical representation and the second graphical representation using a layer wise propagation rule;
predict, the performance of the neural network model on the hardware arrangement by using the first graphical representation of the first graphical model and the second graphical representation of the second graphical model as an input to a fully connected layer of the trained predictor, wherein the fully connected layer maps the input to one or more performance metrics comprising at least one of an accuracy, a latency, an energy consumption, thermals and memory utilization, wherein the trained predictor has been trained with measurements of the one or more performance metrics when running neural networks on hardware arrangements; and
wherein the extracting of the first graphical representation comprises extracting a feature vector for each node of the plurality of connected nodes, and
wherein:
based on the hardware arrangement being a single-chip device comprising the plurality of interconnected components or devices, the feature vector comprises at least one of a component type or a bandwidth; and 
based on the hardware arrangement being a system comprising the plurality of interconnected components or devices, the feature vector comprises at least one of a processor type, a device type, a clock frequency, a memory size, or a bandwidth.

However, in a related field of endeavor (neural architecture search), SHI teaches:
and comprising a first feature extractor and a second feature extractor, the first feature extractor and the second feature extractor comprising a respective graph convolutional network comprising a plurality of layers; (SHI, p. 2, section 1: “In the Embedding Extractor, we use a graph convolutional network (GCN) to produce embeddings for neural architectures.”; 
SHI, p. 3, section 2: “For a L-layer GCN...”
(EN): the JIANG-SHI combination now uses the GCNs of SHI to extract features, where a first GCN is used for a first extractor and a second GCN is used for a second extractor, and where the GCN has L layers (corresponding to the recited “plurality of layers”); the examiner further notes that pursuant to MPEP 2144.04 VI.B, “mere duplication of parts has no patentable significance unless a new and unexpected result is produced”, and it would have been obvious to one of ordinary skill to have 2 GCNs act as extractors)
convert the first graphical model to a first adjacency matrix and a first initial feature matrix and converting the second graphical model to a second adjacency matrix and a second initial feature matrix, the first adjacency matrix and the second adjacency matrix respectively indicating which nodes are connected in the first graphical model and the second graphical model, and the first initial feature matrix and the second initial feature matrix respectively encapsulating node parameters of the first graphical model and the second graphical model; (SHI, p. 3, section 3.1: “Specifically, graph connectivity is encoded by the adjacency matrix A, which can be obtained from the graph structure directly. Individual operations are encoded as one-hot vectors, and then aggregated to form the feature matrix X.”; 
SHI, Fig. 2: 

    PNG
    media_image5.png
    250
    478
    media_image5.png
    Greyscale

(EN): the JIANG-SHI combination now takes the first and second graphical models (relating to hardware arrangement and NN architecture, respectively), and to each, creates adjacency matrices and feature matrices as disclosed by SHI, where the adjacency matrix encodes connectivity of nodes, and the feature matrix relates to the individual operations (corresponding to recited “node parameters”); the examiner further notes that pursuant to MPEP 2144.04 VI.B, “mere duplication of parts has no patentable significance unless a new and unexpected result is produced”, and it would have been obvious to one of ordinary skill to create adjacency matrices and feature matrices for both the first and second graphical models)
extract, using the first feature extractor and the second feature extractor, respectively, a first graphical representation of the first graphical model from the first adjacency matrix and the first initial feature matrix, and a second graphical representation of the second graphical model from the second adjacency matrix and the second initial feature matrix, the first graphical representation and the second graphical representation respectively comprising feature vector representations for each node of the first graphical model and the second graphical model, (SHI, p. 3, section 2.3:

    PNG
    media_image6.png
    200
    474
    media_image6.png
    Greyscale

Examiner’s Note: the broadest reasonable interpretation of “extracting ... a first graphical representation of the first graphical model from the first adjacency matrix and the first initial feature matrix, and a second graphical representation of the second graphical model from the second adjacency matrix and the second initial feature matrix” includes using a layer-wise propagation rule that operates on the adjacency matrix and features matrix as explained on p. 11, lines 6-13 of the instant specification, and SHI similarly teaches such a layer-wise propagation that operates on the adjacency matrix (A) and the initial feature matrix (H(0)) to result in a feature matrix H(l); the JIANG-SHI combination now modifies JIANG to utilize the layer-wise propagation of SHI in order to extract the first and second graphical representations (H(l)); the examiner further notes that pursuant to MPEP 2144.04 VI.B, “mere duplication of parts has no patentable significance unless a new and unexpected result is produced”, and it would have been obvious to one of ordinary skill to create first and second graphical representations (H(l)) for the hardware architecture and NN architecture co-exploration paths)
a first graph convolutional network and a second graph convolutional network respectively comprising first weights and second weights obtained during training of the trained predictor, (SHI, p. 3, section 2.3: “The graph convolutional network (GCN) is a model for graph-structured data, which utilizes localized spectral filters to extract an useful embedding of each node”; 
SHI, p. 3, section 3.1: “To train the GCN, we feed its output (cell embedding) to a standard regressor. In this paper, we use a single-hidden layer neural network. Using the NAS-Bench data sets, the target output is the actual accuracy of the network constructed by stacking this particular cell. This regressor is then trained end-to-end with the GCN”
Examiner’s Note: The GCN is trained and the weights are reflected as W(l) weight matrices as shown in Equation (3) (the layer-wise propagation rule); the examiner further notes that pursuant to MPEP 2144.04 VI.B, “mere duplication of parts has no patentable significance unless a new and unexpected result is produced”, and it would have been obvious to one of ordinary skill to train first and second GCNs for the hardware architecture and NN architecture co-exploration paths, respectively)
the first graph convolutional network and the second graph convolutional network respectively extracting the first graphical representation and the second graphical representation using a layer wise propagation rule; (SHI, p. 3, section 2.3:

    PNG
    media_image6.png
    200
    474
    media_image6.png
    Greyscale

Examiner’s Note: the JIANG-SHI combination now modifies JIANG to utilize the layer-wise propagation of SHI in order to extract the first and second graphical representations (H(l)); the examiner further notes that pursuant to MPEP 2144.04 VI.B, “mere duplication of parts has no patentable significance unless a new and unexpected result is produced”, and it would have been obvious to one of ordinary skill to create first and second graphical representations (H(l)) for the hardware architecture and NN architecture co-exploration paths)
predict, the performance of the neural network model on the hardware arrangement by using the first graphical representation of the first graphical model and the second graphical representation of the second graphical model as an input to a fully connected layer of the trained predictor, (SHI, p. 4, section 3.2:

    PNG
    media_image7.png
    674
    480
    media_image7.png
    Greyscale

Examiner’s Note: the Bayesian linear regressor (BLR) corresponds to the recited trained predictor, and by inputting the updated features matrices (X) that correspond to the recited “first and second graphical representations”, the BLR outputs prediction variances as shown in Fig. 5, which uses a last fully-connected layer (corresponding to recited “fully-connected layer”); the JIANG-SHI combination now modifies JIANG to use the prediction methods of SHI)
wherein the fully connected layer maps the input to one or more performance metrics comprising at least one of an accuracy, a latency, an energy consumption, thermals and memory utilization, (SHI, p. 3, section 3.1: “Using the NAS-Bench data sets, the target output is the actual accuracy of the network constructed by stacking this particular cell”)
wherein the trained predictor has been trained with measurements of the one or more performance metrics when running neural networks on hardware arrangements; and (SHI, p. 4, section 3.4: 

    PNG
    media_image8.png
    512
    462
    media_image8.png
    Greyscale

Examiner’s Note: the predictor model is trained using predicted and ground truth accuracies using the loss metric of equation (6); the JIANG-SHI combination now modifies JIANG to use the trained predictor of SHI with respect to the accuracy metric)
wherein the extracting of the first graphical representation comprises extracting a feature vector for each node of the plurality of connected nodes, and (SHI, p. 3, section 2.3: “The graph convolutional network (GCN) is a model for graph-structured data, which utilizes localized spectral filters to extract an useful embedding of each node”

    PNG
    media_image6.png
    200
    474
    media_image6.png
    Greyscale

SHI, p. 3, section 3.1: “Specifically, graph connectivity is encoded by the adjacency matrix A, which can be obtained from the graph structure directly. Individual operations are encoded as one-hot vectors, and then aggregated to form the feature matrix X.”; 
SHI, Fig. 2: 

    PNG
    media_image5.png
    250
    478
    media_image5.png
    Greyscale

Examiner’s Note: SHI teaches a layer-wise propagation that operates on the adjacency matrix (A) and the initial feature matrix (H(0)) to result in a feature matrix H(l), and as shown in Fig. 2, the structure of the feature matrix has a one-hot encoded row vector for each of the 8 connected nodes, where each row vector corresponds to the recited “feature vector” of the feature matrix; the JIANG-SHI combination now modifies JIANG to utilize the layer-wise propagation of SHI in order to extract the first graphical representation (H(l)) which has the format of SHI such that each row is a row feature vector corresponding to a node as disclosed by SHI)

	Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to modify the teachings of JIANG with respect to a co-exploration of hardware and neural network architectures, to use the teachings of SHI with respect to using GCNs to extract features from adjacency and feature matrices, which are used to predict the accuracy of the model.  As disclosed by SHI, one of ordinary skill would have been motivated to do so because SHI teaches, for neural architecture search, the “proposed BONAS” (Bayesian Optimized Neural Architecture Search”) is 123.7x more efficient than random search within a sample space.  (p. 2, section 1).  One of ordinary skill would further understand the benefit of making performance predictions directly from a graph, as disclosed by SHI, instead of having to train each model, which saves considerable resources.  (see p. 2, section 1).

	However, JIANG and SHI fail to explicitly teach:
wherein the hardware arrangement is selected as being a single-chip device comprising the plurality of interconnected components or devices or a system comprising the plurality of interconnected components or devices
wherein:
based on the hardware arrangement being selected as the single-chip device, the feature vector comprises at least one of a component type or a bandwidth; and 
based on the hardware arrangement being selected as the system, the feature vector comprises at least one of a processor type, a device type, a clock frequency, a memory size, or a bandwidth.

However, in a related field of endeavor (designing a system having multiple components, see para. 0097), SCANLON teaches and makes obvious:
wherein the hardware arrangement is selected as being a single-chip device comprising the plurality of interconnected components or devices or a system comprising the plurality of interconnected components or devices (SCANLON, para. 0097: “The designer of a system may choose to implement the functionality on a single processor or the functionality may be distributed across different devices and systems. Within a single device it is a matter of choice as to whether a single processor or multiple processors, including dedicated chips for e.g. audio processing, are used.”;
Examiner’s Note: the JIANG-SHI-SCANLON combination now makes the selection of a single-chip device vs. a multi-component system a design choice as taught by SCANLON)

	Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of JIANG with SHI and SCANLON as explained above.  One of ordinary skill would understand that a single-chip implementation has certain advantages, such as a smaller form factor, whereas a multiple-component system may be more powerful. 

However, JIANG and SHI and SCANLON fail to explicitly teach:
wherein:
based on the hardware arrangement being selected as the single-chip device, the feature vector comprises at least one of a component type or a bandwidth; and 
based on the hardware arrangement being selected as the system, the feature vector comprises at least one of a processor type, a device type, a clock frequency, a memory size, or a bandwidth.

	However, in a related field of endeavor (analyzing multi-component hardware systems and associated performance, see paras. 0001-0002), COHEN teaches:
wherein:
based on the hardware arrangement being selected as the single-chip device, the feature vector comprises at least one of a component type or a bandwidth; and (COHEN, para. 0060: “In the following, an example for computing a feature vector X is illustrated. As shown in FIG. 8, step 802 may include (i) a sub-step 804 in which a component type feature vector X.sub.t is computed using the set of metrics, and (ii) a sub-step 806 in which an instance feature vector X.sub.ins is computed for each component type.”
COHEN, para. 0089: “Embodiments can be realized in any computer-readable media for use by or in connection with an instruction execution system such as a computer/processor based system or an ASIC (Application Specific Integrated Circuit) or other system that can fetch or obtain the logic from computer-readable media and execute the instructions contained therein.”; 
Examiner’s Note: the JIANG-SHI-SCANLON-COHEN combination now modifies JIANG with respect to the feature matrices of SHI such that the feature vectors include information about a component type, including an ASIC, as in COHEN)
based on the hardware arrangement being selected as the system, the feature vector comprises at least one of a processor type, a device type, a clock frequency, a memory size, or a bandwidth. (COHEN, para. 0014: “Other examples include CPU utilization, memory utilization, disk utilization and bandwidth, or queries received/processed by a database. These features, for example, are related to the measured performance of cloud resources.”
COHEN, para. 0023: “Multiple actions may be performed for changing the specific configuration of resources in cloud 100 supporting execution of an application, such as any of the following: (a) adding a component type; (b) removing a component type; (c) re-allocating resources of cloud 100 realizing an instance of a component; (d) re-starting an instance of a component; or (e) changing the configuration of an instance of a component (e.g., allocate more memory or a higher CPU to an instance). Actions a) and b) are actions performed on a component type; actions c) to e) are actions on instances of components. It will be understood that this list of actions is not exhaustive. There is a vast variety of actions that may be performed for changing a configuration of cloud resources supporting execution of an application.”
COHEN, para. 0060: “In the following, an example for computing a feature vector X is illustrated. As shown in FIG. 8, step 802 may include (i) a sub-step 804 in which a component type feature vector X.sub.t is computed using the set of metrics, and (ii) a sub-step 806 in which an instance feature vector X.sub.ins is computed for each component type.”
COHEN, para. 0089: “Embodiments can be realized in any computer-readable media for use by or in connection with an instruction execution system such as a computer/processor based system or an ASIC (Application Specific Integrated Circuit) or other system that can fetch or obtain the logic from computer-readable media and execute the instructions contained therein.”; 
Examiner’s Note: the JIANG-SHI-SCANLON-COHEN combination now modifies JIANG with respect to the feature matrices of SHI such that the feature vectors include information about a processor type, including an ASIC, as in COHEN, and can further include information regarding different CPUs, more memory, and bandwidth metrics as taught by COHEN)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to modify the teachings of JIANG, SHI, and COHEN as explained above.  As disclosed by COHEN, one of ordinary skill would have been motivated to do so because COHEN teaches modifying components and component types for scaling execution of an application. (para. 0022).  One of ordinary skill would understand the benefit of encoding aspects related to a component type for consideration by the neural network.

Regarding Claim 15
	Claim 15 recites: “A non-transitory machine-readable medium containing instructions that, when executed, cause at least one processor of an apparatus to perform operations corresponding to the computer-implemented method of claim 1.”
	The examiner notes that claim 14 claims a server having a memory (e.g., a non-transitory machine-readable medium containing instructions) and corresponds to the method of claim 1, from which claim 15 depends.  Therefore, claim 15 is rejected for the same reasons explained above with respect to claim 14.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over JIANG in view of SHI, SCANLON, and COHEN and further in view of US 20200410337 A1, hereinafter referenced as HUANG.

Regarding Claim 8:
	JIANG and SHI and COHEN the method of claim 1 as explained above.  However, JIANG, SHI, SCANLON, and COHEN do not explicitly teach:
wherein the predicting of the performance comprises predicting individual performances of each of the plurality of operations.

	However, in a related field of endeavor (neural networks), HUANG teaches:
wherein the predicting of the performance of the obtained first neural network model comprises predicting individual performances of each of the first plurality of operations. (HUANG, para. 0119: “a tensor operation, such as a convolution operation, can be divided into multiple sub-operations, where each sub-operation may be performed by a computing engine to generate a portion of the output feature maps, and the results of the sub-operations may be used individually or in combination to make an earlier prediction or decision.”; (EN): the JIANG-SHI-SCANLON-COHEN-HUANG combination now modifies the hardware and NN architecture co-exploration system of JIANG to now track the individual operation performances (e.g., 1x1 conv, 3x3 conv, etc) at an individual level as in HUANG when determining the performance)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to modify the teachings of JIANG with SHI, SCANLON, COHEN, and HUANG as explained above.  As disclosed by HUANG, one of ordinary skill would be motivated to do so because “it may be desirable to make a prediction or decision as soon as possible in some applications, such as some applications where the prediction or decision may be used for real-time control or other real-time operations.”  (HUANG, para. 0119).  One of ordinary skill would understand that keeping account of individual operation performance could allow for more granular decision making and predictions.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Zimmer, Brian, et al. "A 0.32–128 TOPS, scalable multi-chip-module-based deep neural network inference accelerator with ground-referenced signaling in 16 nm." IEEE Journal of Solid-State Circuits 55.4 (April 2020): 920-932. “Because there is only one unique chip in the system, the architecture must be efficient for both small single-chip configurations and huge 36-chip configurations.” (p. 921, section I).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C LEE whose telephone number is (571)272-4933. The examiner can normally be reached M-F 12:00 pm - 8:00 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL C. LEE/Examiner, Art Unit 2128                                                                                                                                                                                                        

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128
Read full office action
Prosecution Timeline

Show 3 earlier events
May 07, 2025
Examiner Interview Summary
Jun 10, 2025
Response Filed
Jul 10, 2025
Final Rejection mailed — §103
Sep 10, 2025
Request for Continued Examination
Sep 18, 2025
Response after Non-Final Action
Oct 23, 2025
Non-Final Rejection mailed — §103
Jan 20, 2026
Response Filed
Feb 10, 2026
Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/475,724
Patent 12603081
METHOD AND SERVER FOR A TEXT-TO-SPEECH PROCESSING
4y 7m to grant Granted Apr 14, 2026
17/732,871
Patent 12602605
QUANTUM COMPUTER ARCHITECTURE BASED ON MULTI-QUBIT GATES
3y 11m to grant Granted Apr 14, 2026
17/207,554
Patent 12591915
METHODS AND SYSTEMS FOR DETERMINING RECOMMENDATIONS BASED ON REAL-TIME OPTIMIZATION OF MACHINE LEARNING MODELS
5y 0m to grant Granted Mar 31, 2026
18/885,396
Patent 12585743
INTERFACE ACCESS PROCESSING METHOD, COMPUTER DEVICE AND STORAGE MEDIUM
1y 6m to grant Granted Mar 24, 2026
17/486,877
Patent 12568935
AI-BASED LIVESTOCK MANAGEMENT SYSTEM AND LIVESTOCK MANAGEMENT METHOD THEREOF
4y 5m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

5-6
Expected OA Rounds
61%
Grant Probability
87%
With Interview (+26.0%)
3y 3m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 144 resolved cases by this examiner. Grant probability derived from career allowance rate.