Prosecution Insights
Last updated: April 19, 2026
Application No. 18/160,680

DEPLOYING NEURAL NETWORK MODELS ON RESOURCE-CONSTRAINED DEVICES

Status: Non-Final OA (§103)
Filed: Jan 27, 2023
Examiner: AKBARI, FARAZ TIMA
Art Unit: 2196
Tech Center: 2100 — Computer Architecture & Software
Assignee: Sony Group Corporation
OA Round: 1 (Non-Final)
Grant Probability: 0% (At Risk)
Estimated OA Rounds: 1-2
Estimated Time to Grant: 3y 3m
Grant Probability with Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 2 resolved; -55.0% vs TC avg) — this examiner has granted none of their resolved cases
Interview Lift: +0.0% (minimal lift, with vs. without interview; based on resolved cases with interview)
Typical Timeline: 3y 3m average prosecution
Career History: 38 total applications across all art units; 36 currently pending

Statute-Specific Performance

§101: 13.0% (-27.0% vs TC avg)
§103: 71.2% (+31.2% vs TC avg)
§102: 1.1% (-38.9% vs TC avg)
§112: 14.7% (-25.3% vs TC avg)
Baseline is the Tech Center average estimate. Based on career data from 2 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA. This Office action is in response to the claims filed 1/27/2023. Claims 1-20 are pending.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Paek et al. (US 11175898 B2) in view of Banitalebi Dehkordi et al. (US 20220414432 A1), hereinafter referred to as Paek and Banitalebi, respectively.

Regarding Claim 1, Paek discloses:

A method, comprising: storing, on a persistent storage of a first electronic device, a model file that includes a neural network model (Col. 4, Line 67: "A memory 240 includes neural network model document files 244." A memory 240 including neural network model document files 244 corresponds to Applicant's storing, on a persistent storage of a first electronic device, a model file that includes a neural network model, as it is known in the art that a memory of an electronic device may include a persistent storage.);

determining constraint information associated with a deployment of the neural network model on the first electronic device (Col. 2, Lines 26-30: "when deploying a given deep neural network for execution on a target platform and/or target processor on the target platform, depending on the available hardware, resource constraints (e.g., memory and/or computing)." Considering the resource constraints of a target processor when deploying a deep neural network corresponds to Applicant's determining constraint information associated with a deployment of the neural network model on the first electronic device.);

receiving an input associated with a machine learning task (Col. 6, Lines 58-60: "a neural network (NN) is a computing model that uses a collection of connected nodes to process input data based on machine learning techniques." The neural network model processing input data based on machine learning techniques corresponds to Applicant's receiving an input associated with a machine learning task.);

a second operation to generate an intermediate result by an application of the sub-model on the input (Col. 11, Lines 57-58: "intermediate data layer 408 uses the output of intermediate data layer 406." The output of an intermediate data layer corresponds to Applicant's second operation to generate an intermediate result by an application of the sub-model on the input, as the layers that are inherent in the sub-models later disclosed by Banitalebi operate on inputs.);

a third operation to unload the sub-model from the working memory of the first electronic device (Col. 12, Lines 7-10: "a memory allocation may be required to hold the output until whatever intermediate data layer needs it has used the output." Holding the output until the intermediate data layer that needs it has used it corresponds to Applicant's third operation to unload the sub-model from the working memory of the first electronic device, as there must inherently be a means by which the memory is un-allocated once it is no longer needed.);

repeating the execution of the first set of operations for a next sub-model of the plurality of sub-models to generate an output, wherein the intermediate result is the input for the next sub-model (Col. 11, Lines 46-60: "Convolutional neural network 400 also illustrates the dependencies between different intermediate data layers. Thus, intermediate data layer 404 and intermediate data layer 406 both use the output of intermediate data layer 402; intermediate data layer 408 uses the output of intermediate data layer 406; and intermediate data layer 410 uses the output of intermediate data layer 408 and intermediate data layer 404." Dependencies between subsequent intermediate data layers, where, for example, the output of layer 406 is used by layer 408, correspond to Applicant's repeating the execution of the first set of operations for a next sub-model of the plurality of sub-models to generate an output, wherein the intermediate result is the input for the next sub-model, since, as later disclosed by Banitalebi, each sub-model consists of layers and therefore may implement this intermediate-layer pipeline.);

and controlling a first display device to render the output (Col. 17, Lines 17-19: "The output device interface 806 may enable, for example, the display of images generated by electronic system 800." Displaying images generated by electronic system 800 via the output device interface 806 corresponds to Applicant's controlling a first display device to render the output.).
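For orientation, the claimed "first set of operations" (load a sub-model into working memory, apply it to the current input, unload it, then repeat with the intermediate result as the next sub-model's input) can be illustrated with a minimal sketch. This is purely illustrative and not taken from the record; the `SubModel` class, the toy layers, and the two-part partition are all hypothetical.

```python
# Illustrative sketch only (not from the application or the cited art):
# sequentially executing sub-models of a partitioned network so that at
# most one sub-model occupies working memory at a time.

from typing import Callable, List

class SubModel:
    """Hypothetical sub-model: a contiguous run of layers."""
    def __init__(self, layers: List[Callable[[float], float]]):
        self.layers = layers
        self.loaded = False

    def load(self) -> None:      # first operation: load into working memory
        self.loaded = True

    def apply(self, x: float) -> float:  # second operation: intermediate result
        assert self.loaded, "sub-model must be loaded before use"
        for layer in self.layers:
            x = layer(x)
        return x

    def unload(self) -> None:    # third operation: free working memory
        self.loaded = False

def run_partitioned(sub_models: List[SubModel], x: float) -> float:
    # Repeat the first set of operations for each sub-model; the
    # intermediate result of one sub-model is the input to the next.
    for sm in sub_models:
        sm.load()
        x = sm.apply(x)
        sm.unload()
    return x

# Toy stand-ins for NN layers.
parts = [SubModel([lambda v: v + 1, lambda v: v * 2]),
         SubModel([lambda v: v - 3])]
print(run_partitioned(parts, 5))  # ((5 + 1) * 2) - 3 = 9
```

The point of the sketch is only the control flow: load/apply/unload per sub-model, with the intermediate result carried forward, which is the pipeline the rejection maps onto Paek's intermediate data layers.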
Paek does not explicitly disclose: determining a partition of the neural network model based on the constraint information and the model file; extracting a plurality of sub-models from the neural network model based on the partition; or executing a first set of operations for a sub-model of the plurality of sub-models, wherein the first set of operations comprises a first operation to load the sub-model in a working memory of the first electronic device.

However, Banitalebi discloses:

determining a partition of the neural network model based on the constraint information and the model file ([0014]: "In one or more of the preceding aspects, the selecting may be further based on a memory constraint for the first device."; [0040]: "An deep learning model splitting module 10 (hereinafter splitting module 10) is configured to receive, as an input a trained deep learning model for an inference task, and automatically process the trained deep learning model to divide (i.e. split) it into first and second deep learning models." Dividing or splitting a deep learning model into first and second deep learning models corresponds to Applicant's determining a partition of the neural network model based on the constraint information and the model file; it is known to one of ordinary skill in the art that a deep learning model is a variant of a neural network model, and this splitting is inherently performed based on the previously mentioned input model file and memory constraints.);

extracting a plurality of sub-models from the neural network model based on the partition ([0040], quoted above. The first and second deep learning models created as a result of the split correspond to Applicant's extracting a plurality of sub-models from the neural network model based on the partition.);

executing a first set of operations for a sub-model of the plurality of sub-models, wherein the first set of operations comprises: a first operation to load the sub-model in a working memory of the first electronic device ([0041]: "the deep learning model that is provided as input to the splitting module 10 is a trained DNN 11, and the resulting first and second deep learning models that are generated by the splitting module 10 are an edge DNN 30 that is configured to for deployment on a target edge device 88 and a cloud DNN 40 that is configured for deployment on a target cloud device 86." The resulting split models being configured for deployment corresponds to Applicant's first operation to load the sub-model, i.e., the split model, in a working memory of the first electronic device, i.e., deploying it.).

Paek and Banitalebi are analogous to the claimed invention because both are in the field of managing neural networks under computing-resource constraints. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Paek's system, which stores a model file, determines constraint information, receives input for a machine learning task, generates intermediate results from intermediate layers that use previous outputs as inputs, and unloads memory once it is no longer needed, to incorporate the teachings of Banitalebi: partitioning the neural network model based on the constraint information and the model file, extracting sub-models based on the partition, and executing a set of operations including loading a sub-model into working memory. Doing so would allow for improved performance via decreased latency and more flexible deployment, as described in Banitalebi.

Regarding Claim 2, Paek-Banitalebi teaches the method of Claim 1 as described above, and Banitalebi further discloses wherein each sub-model of the plurality of sub-models includes a subset of a set of NN layers of the neural network model ([0010]: "identifying a first set of one or more neural network layers from the trained neural network for inclusion in the first neural network and a second set of one or more neural network layers from the trained neural network for inclusion in the second neural network." Identifying corresponding sets of neural network layers for inclusion in the split neural network models corresponds to Applicant's sub-models each including a subset of a set of NN layers of the neural network model.).

Regarding Claim 3, Paek-Banitalebi teaches the method of Claim 1 as described above, and Paek further discloses wherein the first set of operations further comprises a fourth operation to store the intermediate result in the persistent storage (Col. 11, Lines 57-58: "intermediate data layer 408 uses the output of intermediate data layer 406"; Col. 15, Lines 50-52: "different memory allocation portions are designated for different intermediate data layers." Memory allocation being performed for intermediate data layers that have outputs corresponds to Applicant's fourth operation to store the intermediate result in the persistent storage.).

Regarding Claim 4, Paek-Banitalebi teaches the method of Claim 1 as described above, and Paek further discloses wherein the constraint information includes at least one of: a size of the working memory of the first electronic device, a processing capability of the first electronic device to perform a count of multiply-accumulate (MAC) operations per second, a network communication capability indicative of a transmission bandwidth of the first electronic device and a reception bandwidth of the first electronic device, and an indication that the input includes personal or sensitive data (Col. 12, Lines 44-47: "the total amount of memory available for allocation may be determined based at least in part on an amount of available memory of a given target device." This corresponds to Applicant's constraint information including a size of the working memory of the first electronic device. Because the claim recites "at least one of" the listed items, this teaching satisfies the claim.).

Regarding Claim 19, Paek discloses A first electronic device, comprising: a memory configured to store a model file that includes a neural network model (Col. 4, Line 67- A memory 240 includes neural network model document files 244.
A memory 240 including neural network model document files 244 corresponds to Applicant's memory configured to store a model file that includes a neural network model, as it is known in the art that a memory of an electronic device may include a persistent storage.); and circuitry configured to perform the recited operations (Col. 6, Lines 33-37: "specialized (e.g., dedicated) hardware has been developed that is optimized for performing particular operations from a given NN. A given electronic device may include a neural processor, which can be implemented as circuitry that performs various machine learning operations." A neural processor implemented as circuitry that performs machine learning operations corresponds to Applicant's circuitry.).

The remaining limitations of Claim 19 (determining constraint information; receiving an input associated with a machine learning task; the second operation to generate an intermediate result; the third operation to unload the sub-model; repeating the first set of operations for a next sub-model, wherein the intermediate result is the input for the next sub-model; and controlling a first display device to render the output) parallel the limitations of Claim 1 and are taught by Paek on the same citations and for the same reasons given above for Claim 1. Likewise, Paek does not explicitly disclose determining the partition, extracting the plurality of sub-models, or the first operation of loading a sub-model into working memory; Banitalebi teaches these limitations as set forth for Claim 1, the references are analogous art for the reasons given above, and the same rationale for combining Paek and Banitalebi applies to Claim 19.

Regarding Claim 20, Paek discloses A non-transitory computer-readable medium having stored thereon computer-executable instructions that, when executed by an electronic device, cause the electronic device to perform operations, the operations comprising (Col. 19, Lines 23-30- The tangible computer-readable storage medium also can be non-transitory in nature.
The computer-readable storage medium can be any storage medium that can be read, written, or otherwise accessed by a general purpose or special purpose computing device, including any processing electronics and/or processing circuitry capable of executing instructions. Such a non-transitory computer-readable storage medium readable by a computing device to execute instructions corresponds to Applicant's non-transitory computer-readable medium having stored thereon computer-executable instructions that, when executed by an electronic device, cause the electronic device to perform operations.).

The operations recited in Claim 20 mirror the method of Claim 1. Paek teaches the storing, constraint-determining, input-receiving, second-operation, third-operation, repeating, and display-control limitations on the same citations and for the same reasons given above for Claim 1; Paek does not explicitly disclose the partition, extraction, and loading limitations, which Banitalebi teaches on the citations given above for Claim 1. The references are analogous art in the field of managing neural networks under computing-resource constraints.
Therefore, it would have been obvious to someone of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Paek to incorporate the teachings of Banitalebi to modify the neural network model system that stores a model file, determines constraint information, receives input for a machine learning task, generates an intermediate result with intermediate layers that use previous output as input, and unloads from memory once completed to partition the neural network model based on the constraint information and model file, extract sub-models based on the partition, and execute a set of operations including loading the sub-model into working memory, allowing for improved performance via decreased latency and more flexible deployment, as described in Banitalebi. Claims 5-18 are rejected under 35 U.S.C. 103 as being unpatentable over Paek et al. (US 11175898 B2) in view of Banitalebi Dehkordi et al. (US 20220414432 A1), and further in view of Muthusamy et al. (US 20200349413 A1), hereinafter referred to as Paek, Banitalebi, and Muthusamy, respectively. Regarding Claim 5, Paek-Banitalebi as described in Claim 4, Paek further discloses determining a memory footprint of each NN layer of a set of NN layers of the neural network model, wherein the memory footprint is indicative of a memory required to load a corresponding NN layer on the working memory of the first electronic device as part of a sub-model of the plurality of sub-models (Col. 7, Lines 55-58- the memory allocations 344 correspond to code for allocating memory portions based on a determined size of each layer of the NN and/or based on an amount of memory available at the target device. 
Please note that determining the size of each layer of the NN and allocating memory accordingly at the target device corresponds to Applicant’s determining a memory footprint of each NN layer of a set of NN layers of the neural network model, with the memory footprint being indicative of a memory required to load a corresponding NN layer on the working memory of the first electronic device as part of a sub-model of the plurality of sub-models. This is because, as previously stated by Banitalebi in the combination, the sub-models are each comprised of NN layers.); Paek-Banitalebi does not explicitly disclose and grouping adjoining NN layers of the set of NN layers into a plurality of subsets of NN layers based on the determined memory footprint of each NN layer, wherein a memory footprint of each subset is less than or equal to the size of the working memory of the first electronic device, and the partition of the neural network model is further determined based on the grouping of the adjoining NN layers of the set of NN layers. However, Muthusamy discloses and grouping adjoining NN layers of the set of NN layers into a plurality of subsets of NN layers based on the determined memory footprint of each NN layer, wherein a memory footprint of each subset is less than or equal to the size of the working memory of the first electronic device, and the partition of the neural network model is further determined based on the grouping of the adjoining NN layers of the set of NN layers ([0029] a monolithic grouping strategy that has constraints on total size (e.g. Cloud Functions). In the monolithic grouping strategy, all model outputs are available every time when running the models and there is a high inferencing throughput. The dotted boxes in FIGS. 4-6 represent groups, which are the minimum units of deployment granularity. All layers in a group must be deployed together to one compute unit.
Please note that a grouping strategy to have layers in a group corresponds to Applicant’s grouping adjoining NN layers of the set of NN layers based on the determined memory footprint of each NN layer, wherein a memory footprint of each subset is less than or equal to the size of the working memory of the first electronic device, since the group is deployed together to a compute unit based on constraints on total size, so each subset inherently has a memory footprint less than or equal to the size of the working memory. Furthermore, as they are all deployed together when grouped, this corresponds to them being adjoining, and there is inherently a partition of the neural network model when the NN layers of the set of NN layers are grouped to separate them from other groups.). Paek-Banitalebi and Muthusamy are both considered to be analogous to the claimed invention because they are in the same field of managing resources for neural network models. Therefore, it would have been obvious to someone of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Paek-Banitalebi to incorporate the teachings of Muthusamy to modify the system as described in Claim 4 that determines a memory footprint of each NN layer to group adjoining NN layers into subsets based on the memory footprint of each layer, allowing for improved memory management and resource usage, as described in Muthusamy.
Regarding Claim 6, Paek-Banitalebi-Muthusamy as described in Claim 5, Banitalebi further discloses partitioning the neural network model based on the plurality of subsets of NN layers, wherein each subset of the plurality of subsets of NN layers corresponds to a sub-model of the plurality of sub-models, and the plurality of sub-models is extracted further based on the partitioning ([0010] identifying a first set of one or more neural network layers from the trained neural network for inclusion in the first neural network and a second set of one or more neural network layers from the trained neural network for inclusion in the second neural network. Please note that identifying corresponding sets of neural network layers for inclusion in the split neural network models corresponds to Applicant’s partitioning the neural network model based on the plurality of subsets of NN layers, wherein each subset of the plurality of subsets of NN layers corresponds to a sub-model of the plurality of sub-models, and the plurality of sub-models is extracted further based on the partitioning. This is because, as previously stated by Muthusamy, each subset of the plurality of subsets belongs to a model, which could include the previously mentioned sub-models of Banitalebi, and therefore the partitions between subsets would also allow for the extraction of sub-models that they correspond to.). Regarding Claim 7, Paek-Banitalebi-Muthusamy as described in Claim 5, Paek further discloses wherein the memory footprint of each NN layer is determined based on a size of the corresponding NN layer (Col. 7, Lines 55-58- the memory allocations 344 correspond to code for allocating memory portions based on a determined size of each layer of the NN and/or based on an amount of memory available at the target device. 
Please note that determining the size of each layer of the NN and allocating memory accordingly at the target device corresponds to Applicant’s determining a memory footprint of each NN layer based on a size of the corresponding NN layer.), a size of an input to be received by the corresponding NN layer, a size of an output to be generated by the corresponding NN layer (Col. 5, Lines 5-8- information including descriptions of input and output feature(s), […] may be included in a given neural network model document file. Please note that information including descriptions of input and output features corresponds to Applicant’s input and output sizes for corresponding NN layers that may be considered in determining the memory footprint.), and a size of a buffer to be allocated to the corresponding NN layer (Col. 12, Lines 7-10- a memory allocation may be required to hold the output until whatever intermediate data layer needs it has used the output. Please note that the memory allocation holding the output until the intermediate data layer that needs it has used it corresponds to the buffer to be allocated to the corresponding NN layer, and in making the allocation, there must necessarily be a size.), and the memory footprint of each subset of the plurality of subsets of NN layers is a sum of memory footprints of adjoining NN layers of the set of NN layers that are grouped into a corresponding subset of the plurality of subsets (Col. 7, Lines 55-58- the memory allocations 344 correspond to code for allocating memory portions based on a determined size of each layer of the NN and/or based on an amount of memory available at the target device.
Please note that as the size of each layer of the NN is determined and memory is allocated accordingly at the target device, a set of layers as grouped in the subset disclosed by Muthusamy would be able to have the memory footprint of each subset of the plurality of subsets of NN layers determined via the sum of footprints of adjoining layers of the set that are grouped; the sum of memory footprints of the adjoining grouped layers of the subset would be obvious as a measure of the memory footprint of the subset.). Regarding Claim 8, Paek-Banitalebi-Muthusamy as described in Claim 5, Muthusamy further discloses wherein the determined memory footprint of each NN layer of the set of NN layers of the neural network model is less than or equal to the size of the working memory of the first electronic device ([0029] a monolithic grouping strategy that has constraints on total size (e.g. Cloud Functions). In the monolithic grouping strategy, all model outputs are available every time when running the models and there is a high inferencing throughput. The dotted boxes in FIGS. 4-6 represent groups, which are the minimum units of deployment granularity. All layers in a group must be deployed together to one compute unit. Please note that the constraints of total size being considered when deploying layers in a group to one compute unit corresponds to Applicant’s wherein the determined memory footprint of each NN layer of the set of NN layers of the neural network model is less than or equal to the size of the working memory of the first electronic device, since the group of NN layers is deployed together to a compute unit based on constraints on total size, so each layer of the subset inherently has a memory footprint less than or equal to the size of the working memory.).
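The claim 5-8 mapping above describes, in effect, a packing procedure: compute a per-layer memory footprint from the layer, input, output, and buffer sizes, then group adjoining layers while the running sum fits the working memory. As a minimal illustrative sketch only (not part of the record; the names, the greedy strategy, and the example figures are all hypothetical), such a grouping could look like:

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    weights: int    # bytes of parameters ("size of the corresponding NN layer")
    input_sz: int   # bytes of the layer's input
    output_sz: int  # bytes of the layer's output
    buffer_sz: int  # bytes of scratch buffer allocated to the layer

def footprint(layer: Layer) -> int:
    # Claim 7: the per-layer footprint combines the layer size with
    # its input, output, and buffer sizes.
    return layer.weights + layer.input_sz + layer.output_sz + layer.buffer_sz

def group_layers(layers: list[Layer], working_memory: int) -> list[list[Layer]]:
    """Greedily group adjoining layers so each subset's summed
    footprint fits the working memory (claims 5, 7, and 8)."""
    subsets: list[list[Layer]] = []
    current: list[Layer] = []
    total = 0
    for layer in layers:
        fp = footprint(layer)
        # Claim 8 premise: every single layer already fits on its own.
        assert fp <= working_memory
        if current and total + fp > working_memory:
            subsets.append(current)
            current, total = [], 0
        current.append(layer)
        total += fp
    if current:
        subsets.append(current)
    return subsets
```

Because only adjacent layers are merged, the order of layers is preserved across subsets, which is what makes each subset a contiguous partition of the model.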
Regarding Claim 9, Paek-Banitalebi as described in Claim 4, Paek further discloses determining a count of MAC operations associated with each NN layer of a set of NN layers of the neural network model (Col. 6, Lines 33-43- specialized (e.g., dedicated) hardware has been developed that is optimized for performing particular operations from a given NN. A given electronic device may include a neural processor, which can be implemented as circuitry that performs various machine learning operations based on computations including multiplication, adding and accumulation. Such computations may be arranged to perform, for example, convolution of input data. A neural processor, in an example, is specifically configured to perform machine learning algorithms, typically by operating on predictive models such as NNs. Please note that the neural processor circuitry performing machine learning operations based on computations including multiplication, adding, and accumulating, by operating on NN predictive models corresponds to Applicant’s determining a count of MAC operations associated with each NN layer of a set of NN layers of the neural network model, because, as will be disclosed, the NN model consists of NN layers, and since it must follow processing resource constraints, it would be obvious that each layer of the set that performs MAC operations factors into the determination of meeting the constraints.); wherein a count of MAC operations associated with each subset is less than or equal to the processing capability of the first electronic device (Col. 2, Lines 26-31- when deploying a given deep neural network for execution on a target platform and/or target processor on the target platform, depending on the available hardware, resource constraints (e.g., memory and/or computing) can be encountered that may limit the execution of a given neural network.
Please note that the constraints of computing resource availability being considered when deploying layers in a group to one compute unit corresponds to Applicant’s wherein the count of MAC operations associated with each subset is less than or equal to the processing capability of the first electronic device, since each subset is deployed together to a compute unit based on constraints on processing, so each subset inherently has a processing requirement less than or equal to the processing capability of the target platform’s hardware.). Paek-Banitalebi does not explicitly disclose and grouping adjoining NN layers of the set of NN layers into a plurality of subsets of NN layers based on the determined count of MAC operations associated with each NN layer, and the partition of the neural network model is further determined based on the plurality of subsets of NN layers. However, Muthusamy discloses and grouping adjoining NN layers of the set of NN layers into a plurality of subsets of NN layers based on the determined count of MAC operations associated with each NN layer, and the partition of the neural network model is further determined based on the plurality of subsets of NN layers ([0029] a monolithic grouping strategy that has constraints on total size (e.g. Cloud Functions). In the monolithic grouping strategy, all model outputs are available every time when running the models and there is a high inferencing throughput. The dotted boxes in FIGS. 4-6 represent groups, which are the minimum units of deployment granularity. All layers in a group must be deployed together to one compute unit.
Please note that a grouping strategy to have layers in a group corresponds to Applicant’s grouping adjoining NN layers of the set of NN layers based on the determined count of MAC operations of each NN layer, since the group is deployed together to a compute unit based on constraints on total size, and the constraint on size could be based upon the maximum processing capacity for a deployed model as previously disclosed by Paek. Furthermore, as they are all deployed together when grouped, this corresponds to them being adjoining, and there is inherently a partition of the neural network model when the NN layers of the set of NN layers are grouped to separate them from other groups.). Paek-Banitalebi and Muthusamy are both considered to be analogous to the claimed invention because they are in the same field of managing resources for neural network models. Therefore, it would have been obvious to someone of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Paek-Banitalebi to incorporate the teachings of Muthusamy to modify the system as described in Claim 4 that determines a count of MAC operations associated with each NN layer where each count of operations associated with each subset is less than or equal to the processing capability of the device to group adjoining NN layers into subsets based on the determined MAC operations of each layer, allowing for improved computing resource management, as described in Muthusamy.
Regarding Claim 10, Paek-Banitalebi-Muthusamy as described in Claim 9, Banitalebi further discloses partitioning the neural network model based on the plurality of subsets of NN layers, wherein each subset of the plurality of subsets of NN layers corresponds to a sub-model of the plurality of sub-models, and the plurality of sub-models is extracted further based on the partitioning ([0010] identifying a first set of one or more neural network layers from the trained neural network for inclusion in the first neural network and a second set of one or more neural network layers from the trained neural network for inclusion in the second neural network. Please note that identifying corresponding sets of neural network layers for inclusion in the split neural network models corresponds to Applicant’s partitioning the neural network model based on the plurality of subsets of NN layers, wherein each subset of the plurality of subsets of NN layers corresponds to a sub-model of the plurality of sub-models, and the plurality of sub-models is extracted further based on the partitioning. This is because, as previously stated by Muthusamy, each subset of the plurality of subsets belongs to a model, which could include the previously mentioned sub-models of Banitalebi, and therefore the partitions between subsets would also allow for the extraction of sub-models that they correspond to.). Regarding Claim 11, Paek-Banitalebi-Muthusamy as described in Claim 9, Paek further discloses wherein the count of MAC operations associated with each subset is a sum of counts of MAC operations associated with adjoining NN layers of the set of NN layers that may be grouped into a corresponding subset of the plurality of subsets of NN layers (Col. 7, Lines 55-58- the memory allocations 344 correspond to code for allocating memory portions based on a determined size of each layer of the NN and/or based on an amount of memory available at the target device.
Please note that as the size of each layer of the NN may be determined according to constraints such as the previously mentioned compute resource constraints, a set of layers as grouped in the subset disclosed by Muthusamy would be able to have the count of MAC operations of each subset of the plurality of subsets of NN layers determined via the sum of counts of MAC operations of adjoining layers of the set that are grouped; the sum of MAC operations of the adjoining grouped layers of the subset would be obvious as a measure of the MAC operations of the subset.). Regarding Claim 12, Paek-Banitalebi-Muthusamy as described in Claim 9, Paek further discloses wherein the determined count of MAC operations associated with each NN layer of the set of NN layers of the neural network model is less than or equal to the processing capability of the first electronic device (Col. 2, Lines 26-31- when deploying a given deep neural network for execution on a target platform and/or target processor on the target platform, depending on the available hardware, resource constraints (e.g., memory and/or computing) can be encountered that may limit the execution of a given neural network. Please note that the constraints of computing resource availability being considered when deploying layers in a group to one compute unit corresponds to Applicant’s wherein the determined count of MAC operations associated with each NN layer of the set of NN layers of the neural network model is less than or equal to the processing capability of the first electronic device, since the group of NN layers is deployed together to a compute unit based on constraints on processing, so each layer of the subset inherently has a processing requirement less than or equal to the processing capability of the target platform’s hardware.).
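The claim 9-12 mapping applies the same grouping idea to compute rather than memory: sum the MAC counts of adjoining layers and keep each subset within the device's processing capability. A hedged sketch (hypothetical names and figures; the per-layer MAC formula shown is the standard one for a fully connected layer and is not taken from the cited references):

```python
def dense_macs(in_features: int, out_features: int) -> int:
    # A fully connected layer performs one multiply-accumulate
    # operation per weight per inference.
    return in_features * out_features

def group_by_macs(layer_macs: list[int], capability: int) -> list[list[int]]:
    """Group adjoining layers so each subset's summed MAC count stays
    within the device's processing capability (claims 9, 11, and 12)."""
    subsets: list[list[int]] = []
    current: list[int] = []
    total = 0
    for macs in layer_macs:
        # Claim 12 premise: each individual layer fits on its own.
        assert macs <= capability
        if current and total + macs > capability:
            subsets.append(current)
            current, total = [], 0
        current.append(macs)
        total += macs
    if current:
        subsets.append(current)
    return subsets
```

The structure mirrors the memory-footprint grouping: only the per-layer cost metric and the budget differ, which is why the examiner maps both limitations to the same grouping disclosure.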
Regarding Claim 13, Paek-Banitalebi as described in Claim 4, Banitalebi further discloses determining a size of a working memory of a second electronic device ([0041] (i) Edge device constraints 22: one or more parameters that define the computational abilities (e.g., memory size, CPU bit processing size) of the target edge device 88 that will be used to implement the edge DNN 30. Please note that the memory size of the target edge device corresponds to Applicant’s determining the size of a working memory of a second electronic device.); determining a network communication capability indicative of a transmission bandwidth of the second electronic device and a reception bandwidth of the second electronic device ([0010] optimize, within an accuracy constraint, an overall latency of: […] transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device.; [0041] (iv) Network constraints 28: one or more parameters that specify information about the communication network links that exist between the cloud device 86 and the edge device 88.
Please note that optimizing the overall latency of the second device and considering network constraints 28 about communication network links that exist between the cloud device 86 and the edge device 88 corresponds to Applicant’s determining a network communication capability indicative of a transmission bandwidth of the second electronic device and a reception bandwidth of the second electronic device, as it is known in the art that the specified parameters about the network link may include bandwidth for transmission and reception.); Paek-Banitalebi does not explicitly disclose and determining a subset of adjoining NN layers of a set of NN layers of the neural network model by grouping the adjoining NN layers based on at least one of the size of the working memory of the second electronic device, the network communication capability of the first electronic device, and the network communication capability of the second electronic device, wherein the determined subset of the adjoining NN layers is a sub-model of the plurality of sub-models. However, Muthusamy discloses and determining a subset of adjoining NN layers of a set of NN layers of the neural network model by grouping the adjoining NN layers based on at least one of the size of the working memory of the second electronic device, the network communication capability of the first electronic device, and the network communication capability of the second electronic device, wherein the determined subset of the adjoining NN layers is a sub-model of the plurality of sub-models ([0029] a monolithic grouping strategy that has constraints on total size (e.g. Cloud Functions). In the monolithic grouping strategy, all model outputs are available every time when running the models and there is a high inferencing throughput. The dotted boxes in FIGS. 4-6 represent groups, which are the minimum units of deployment granularity. All layers in a group must be deployed together to one compute unit. 
Please note that a grouping strategy to have layers in a group corresponds to Applicant’s grouping adjoining NN layers of the set of NN layers based on the size of the working memory of the second electronic device, as the group is deployed together to a compute unit based on constraints on total size, i.e., the size of the working memory of the second electronic device. Furthermore, as they are all deployed together when grouped, this corresponds to them being adjoining, and there is inherently a partition of the neural network model when the NN layers of the set of NN layers are grouped to separate them from other groups. As the claim states “at least one of” the determinations of subsets of adjoining NN layers, this is interpreted as fulfilling the requirements of the claim.). Paek-Banitalebi and Muthusamy are both considered to be analogous to the claimed invention because they are in the same field of managing resources for neural network models. Therefore, it would have been obvious to someone of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Paek-Banitalebi to incorporate the teachings of Muthusamy to modify the system as described in Claim 4 that determines a size of the working memory of the second electronic device and determines its transmission and reception bandwidth to group adjoining NN layers into subsets based on the size of the working memory of the second electronic device, allowing for improved memory management and resource usage, as described in Muthusamy. Regarding Claim 14, Paek-Banitalebi-Muthusamy as described in Claim 13, Paek further discloses wherein a memory footprint of the determined subset is a sum of memory footprints of the adjoining NN layers of the set of NN layers, and a memory footprint of the subset is less than or equal to the size of the working memory of the second electronic device (Col.
7, Lines 55-58- the memory allocations 344 correspond to code for allocating memory portions based on a determined size of each layer of the NN and/or based on an amount of memory available at the target device. Please note that as the size of each layer of the NN is determined and memory is allocated accordingly at the target device, a set of layers as grouped in the subset disclosed by Muthusamy would be able to have the memory footprint of each subset of the plurality of subsets of NN layers determined via the sum of footprints of adjoining layers of the set that are grouped; the sum of memory footprints of the adjoining grouped layers of the subset would be obvious as a measure of the memory footprint of the subset. Furthermore, as it is also based on the amount of memory available at the target device, this corresponds to the subset memory footprint being less than or equal to the size of the working memory of the second electronic device.). Regarding Claim 15, Paek-Banitalebi-Muthusamy as described in Claim 13, Muthusamy further discloses detecting personal or sensitive data in the input ([0073] Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. Please note that the security providing protection for data throughout the system corresponds to Applicant’s detecting personal or sensitive data in the input.); Banitalebi further discloses and determining, based on the detection, one or more NN layers of the set of NN layers that receive the input, wherein the adjoining NN layers in the determined subset are subsequent to each of the determined one or more NN layers of the set of NN layers ([0041] divide the trained DNN 11 into edge DNN 30 and cloud DNN 40 based on a set of constraints 20 that are received by the splitting module 10 as inputs. 
These constraints may include, for example: […] (ii) Cloud device constraints 24: one or more parameters that define the computational abilities of the target cloud device 86 that will be used to implement the cloud DNN 40. Please note that, since the splitting may occur based on cloud device constraints 24 of parameters of the target cloud device, such as the protected data of the cloud disclosed by Muthusamy that may be processed as input by the system, this corresponds to determining NN layers of the set of NN layers that receive the input based on the detection, as in dividing each layer of the neural network based on the constraints the system would necessarily be able to identify which receive protected data as input to satisfy the constraint. Additionally, since the layers are adjoining within the previously obtained subsets of Muthusamy, this corresponds to the adjoining NN layers in the determined subset being subsequent to each of the determined NN layers of the set of NN layers.).
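The limitations of claims 13, 14, and 16 reduce to two feasibility checks before offloading work to a second device: the sub-model's footprint must fit that device's working memory, and the bandwidth the transfer requires must not exceed either the sender's transmission bandwidth or the receiver's reception bandwidth. A minimal sketch (all names, and the deadline-based bandwidth model, are illustrative assumptions, not from the record):

```python
def can_offload(submodel_bytes: int, intermediate_bytes: int,
                submodel_footprint: int, second_working_memory: int,
                first_tx_bw: float, second_rx_bw: float,
                deadline_s: float) -> bool:
    """Feasibility check in the style of claims 13/14/16 before sending a
    sub-model and its intermediate result to a second device."""
    # Claim 14 style check: the sub-model's footprint must fit within
    # the second device's working memory.
    if submodel_footprint > second_working_memory:
        return False
    # Claim 16 style check: the bandwidth the transfer requires must not
    # exceed the sender's transmission bandwidth or the receiver's
    # reception bandwidth (here modeled as payload size over a deadline).
    required_bw = (submodel_bytes + intermediate_bytes) / deadline_s
    return required_bw <= first_tx_bw and required_bw <= second_rx_bw
```

A scheduler would call such a predicate per candidate partition point and fall back to local execution when it returns False.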
Regarding Claim 16, Paek-Banitalebi-Muthusamy as described in Claim 13, Banitalebi further discloses transmitting the extracted sub-model and the intermediate result to the second electronic device, wherein a bandwidth required for the transmission is less than or equal to the transmission bandwidth of the first electronic device, and a bandwidth required for a reception of the extracted sub-model and the intermediate result, by the second electronic device, is less than or equal to the reception bandwidth of the second electronic device ([0010] optimize, within an accuracy constraint, an overall latency of: […] transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device.; [0041] (iv) Network constraints 28: one or more parameters that specify information about the communication network links that exist between the cloud device 86 and the edge device 88. Please note that transmitting the feature map output from the first device to the second device corresponds to Applicant’s transmitting the extracted sub-model and the intermediate result to the second electronic device. Furthermore, it is obvious to one of ordinary skill in the art that optimizing the overall latency of this process between the devices corresponds to Applicant’s bandwidth required for the transmission being less than or equal to the transmission bandwidth of the first electronic device, and a bandwidth required for a reception of the extracted sub-model and the intermediate result, by the second electronic device, being less than or equal to the reception bandwidth of the second electronic device.
This is because an “optimized latency” would inherently have the bandwidth required for the transmission being less than or equal to the transmission bandwidth of the first device, i.e., up to its maximum, and the bandwidth required for reception of the feature map output corresponding to the extracted sub-model and intermediate result by the second device to be less than or equal to the reception bandwidth of the second electronic device, i.e., up to its respective maximum.). Regarding Claim 17, Paek-Banitalebi-Muthusamy as described in Claim 13, Banitalebi further discloses controlling the second electronic device to execute a second set of operations for the received sub-model, wherein the second set of operations comprises: a fifth operation to load the sub-model in a working memory of the second electronic device ([0041] a cloud DNN 40 that is configured for deployment on a target cloud device 86. Please note that a cloud DNN 40 being configured for deployment on a target cloud device 86 corresponds to Applicant’s controlling the second electronic device to execute a second set of operations for the received sub-model, wherein the second set of operations comprises: a fifth operation to load the sub-model in a working memory of the second electronic device. This is because an inherent aspect of deployment of a neural network model is being loaded into memory.); a sixth operation to generate a result by an application of the sub-model on the output ([0088] execution of the second neural network on the second device to generate an inference output. Please note that generating an inference output on the second device by a second neural network corresponds to Applicant’s sixth operation to generate a result by an application of the sub-model on the output.); Paek further discloses and an eighth operation to render the result on a second display device (Col.
17, Lines 17-20- The output device interface 806 may enable, for example, the display of images generated by electronic system 800. Output devices that may be used with the output device interface 806. Please note that displaying images generated by electronic system 800 on the output device interface 806, where there may possibly be multiple output devices used with the interface, corresponds to Applicant’s eighth operation to render the result on a second display device, i.e., on a device distinct from the first.). A seventh operation to unload the sub-model from the working memory of the second electronic device (Col. 3, Lines 4-6- deallocation techniques, which are often performed during running of the neural network model. Please note that deallocation performed during running of the neural network model corresponds to Applicant’s seventh operation to unload the sub-model from the working memory of the second electronic device.); Regarding Claim 18, Paek-Banitalebi-Muthusamy as described in Claim 17, Banitalebi further discloses the second set of operations further comprises a ninth operation to transmit the result to the first electronic device, a bandwidth required for the transmission is less than or equal to the transmission bandwidth of the second electronic device, and a bandwidth required for a reception of the result, by the first electronic device, is less than or equal to the reception bandwidth of the first electronic device ([0010] optimize, within an accuracy constraint, an overall latency of: […] transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device.; [0041] (iv) Network constraints 28: one or more parameters that specify information about the communication network links that exist between the cloud device 86 and the edge device 88.
Please note that execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device corresponds to Applicant’s transmitting the result to the first electronic device. Furthermore, it is obvious to one of ordinary skill in the art that optimizing the overall latency of this process between the devices corresponds to Applicant’s bandwidth required for the transmission being less than or equal to the transmission bandwidth of the second electronic device, and a bandwidth required for a reception of the result, by the first electronic device, being less than or equal to the reception bandwidth of the first electronic device. This is because an “optimized latency” would inherently have the bandwidth required for the transmission being less than or equal to the transmission bandwidth of the second device, i.e., up to its maximum, and the bandwidth required for reception of the inference output corresponding to the result by the first device to be less than or equal to the reception bandwidth of the first electronic device, i.e., up to its respective maximum.).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Seok et al. (US 20240169201 A1) discloses performing ML tasks on resource-constrained devices, a pipeline in which results are stored before feeding them to the next stage, storing weight data and intermediate input and output data, a maximum throughput, and multiply and accumulate operations (see [0006, 0010-0011, 0014, 0039, 0055, 0060-0061]). Kierat et al. (US 20240119267 A1) discloses storage to store forward and output weight and I/O data for a neural network and for each layer, loading the neural network into processors, an intermediate representation of a model, transmitting neural networks, and storing intermediate data in buffers (see [00179-0182, 0198, 0313, 0433]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARAZ T AKBARI, whose telephone number is (571) 272-4166. The examiner can normally be reached Monday-Thursday, 9:30am-7:30pm ET.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, April Blair, can be reached at (571) 270-1014. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/FARAZ T AKBARI/
Examiner, Art Unit 2196

/APRIL Y BLAIR/
Supervisory Patent Examiner, Art Unit 2196

Prosecution Timeline

Jan 27, 2023
Application Filed
Jan 15, 2026
Non-Final Rejection — §103 (current)


Prosecution Projections

1-2
Expected OA Rounds
0%
Grant Probability
0%
With Interview (+0.0%)
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 2 resolved cases by this examiner. Grant probability derived from career allow rate.
