Prosecution Insights
Last updated: April 19, 2026
Application No. 17/902,632

METHOD AND SYSTEM FOR SPLITTING AND BIT-WIDTH ASSIGNMENT OF DEEP LEARNING MODELS FOR INFERENCE ON DISTRIBUTED SYSTEMS

Status: Final Rejection (§101, §102, §103)
Filed: Sep 02, 2022
Examiner: GALVIN-SIEBENALER, PAUL MICHAEL
Art Unit: 2147
Tech Center: 2100 — Computer Architecture & Software
Assignee: Huawei Cloud Computing Technologies Co. Ltd.
OA Round: 2 (Final)

Grant Probability: 25% (At Risk)
OA Rounds: 3-4
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 25% (1 granted / 4 resolved; -30.0% vs TC avg); grants only 25% of cases
Interview Lift: -25.0% (minimal lift, based on resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline); 39 currently pending
Total Applications: 43 across all art units (career history)

Statute-Specific Performance

§101: 29.8% (-10.2% vs TC avg)
§103: 36.8% (-3.2% vs TC avg)
§102: 19.0% (-21.0% vs TC avg)
§112: 14.5% (-25.5% vs TC avg)
Tech Center averages shown for comparison are estimates; based on career data from 4 resolved cases.

Office Action

§101 §102 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is in response to the amendment filed on November 26, 2025. The amendments are linked to the original application filed on September 2, 2022.

Response to Amendment

The Examiner thanks the applicant for the remarks, edits, and arguments.

Regarding Claim Rejections – 35 U.S.C. 101

Applicant Remarks: The applicant states that the claims have been amended to recite a practical application and therefore now recite patent-eligible subject matter. In particular, the applicant states that the limitation "assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers" recites a process that is not routine or conventional and therefore would not be considered a generic process. According to the applicant, the specification provides support showing that this process is novel and not generic.

Next, the applicant argues that the specification at [0008] recites a technical problem with current edge-cloud computing architectures: some edge-cloud computing models have reduced accuracy and increased latency, and it is difficult to implement an arbitrary model on such an architecture, which requires specific programming. According to the applicant, the solution to this problem is recited in the specification and in the claims, and claim 1 is cited as an example of a technical improvement addressing the problem. Accordingly, the applicant restates that claim 1 recites a practical application and provides a non-conventional, non-generic process for accomplishing it.

Next, the applicant states that the claimed invention also provides evidence of improving computational operations and efficiency. The applicant points to the specification at [0092], which recites a process that improves computation, optimization, and accuracy by properly selecting where to split a neural network between two different devices. To further this argument, the applicant points to the Appeals Review Panel (ARP) decision in Ex Parte Desjardins et al., which recites, among other things, that "reducing storage requirements and preserving task performance across sequential training" is an improvement over conventional systems and is considered a technical solution to a stated problem. The applicant asserts that, similar to that case, the current claims recite improved methods and systems for edge-cloud computing and therefore provide sufficient evidence that the claimed subject matter is integrated into a practical application.

Next, the applicant argues that the claims recite elements that are integrated into a practical application and that the recited process is not conventional or routine. Therefore, with the amendments made, the applicant requests that the 101 rejections be withdrawn.

Examiner Response: The applicant states that the element "assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers" recites a process that is not routine or conventional.
As the claim is interpreted by the examiner, this process consists of determining a bit-width for the weights and feature maps of the layers of a neural network, which is very similar to the process of quantization. That process evaluates a network and typically adjusts or converts its values to a different precision level, and it is seen in the proposed art Liu as well. The examiner therefore disagrees and does not consider this process novel or unconventional.

Next, the applicant argues that the claims recite a technical improvement to a technical problem stated in the specification. The examiner notes that the problem statement in the specification is defined; however, the claims fail to recite a technical solution to that problem. As stated in the specification, the process of splitting a neural network between an edge device and a cloud server can decrease model accuracy and increase latency. The claims do recite the use of an accuracy constraint that is used to optimize overall latency, but they fail to show how this process will improve a model's accuracy and decrease latency. As the claims are interpreted, accuracy is considered in claims 8 and 19, but those claims fail to disclose how inaccuracies are handled. As understood by the examiner, the most accurate splitting solution is determined after reviewing the potential split models. This fails to teach how the process improves the overall accuracy of the machine learning model; it merely appears to select the most accurate model. The claims treat the performance and latency of the model in the same way. Claim 16 discloses that the selected model is determined by taking the transmission cost into consideration, which teaches that the model is determined by selecting the best option out of a set of models. This fails to disclose how the overall latency of the model would be improved. The examiner's interpretation is that the best model is selected based on these constraints, which fails to disclose an improvement; it is merely selecting the best model.

Next, the applicant argues that the specification recites an improvement to computational operations and efficiency. The examiner must once again look at the claims to see whether this is reflected in them. For the reasons stated in the previous paragraphs, the examiner believes that the claims fail to recite a process which improves computational operations and efficiency. The claims recite a process for splitting a neural network between an edge device and a cloud server. This problem is a common one, and decreased accuracy and increased latency are common problems in the field. The claims fail to disclose how this process provides a solution to that problem. The claims recite a process and system that is able to process a neural network and then determine how to split that network; they again fail to recite how this improves computation and efficiency, and instead state that the most efficient model is selected after an evaluation. This does not teach how computation time and efficiency are improved.

Next, the applicant argues that the claims as a whole are integrated into a practical application. When evaluating the amended claims under the Alice/Mayo test, each and every limitation is considered. Any abstract ideas are identified and evaluated, and if any abstract ideas are located, the Alice/Mayo test is used to further evaluate the remaining limitations.
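For context on the bit-width assignment discussed at the start of the examiner's response above, the process the examiner likens to quantization can be illustrated with a minimal sketch. This is an illustrative symmetric uniform quantizer only; the function name, the quantization scheme, and the example bit-widths are assumptions for illustration and are not taken from the application, from Liu, or from the Office Action.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, bit_width: int) -> np.ndarray:
    """Symmetric uniform quantization of a tensor to the given bit-width.

    Values are mapped to integer levels representable in `bit_width` bits
    and then de-quantized back to floats, so the returned array has the
    same shape but carries the precision loss of the chosen bit-width.
    """
    q_max = 2 ** (bit_width - 1) - 1               # e.g. 127 for 8 bits
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x)
    step = max_abs / q_max
    q = np.clip(np.round(x / step), -q_max, q_max)
    return q * step

# Assigning bit-widths per layer: one precision for the layer's weights and
# a (possibly different) precision for the feature map the layer emits.
weights = np.random.randn(64, 64).astype(np.float32)
feature_map = np.random.randn(1, 64).astype(np.float32)
w_q = quantize_uniform(weights, bit_width=4)
f_q = quantize_uniform(feature_map, bit_width=8)
print("weight quantization MSE:", float(np.mean((weights - w_q) ** 2)))
```

Lower bit-widths shrink the feature map that must be transmitted between the two devices at the cost of larger quantization error, which is the trade-off the accuracy constraint in the claims is directed to.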
While reviewing the amended claims, the examiner located and identified possible abstract ideas, so further evaluation and consideration were needed. The remaining limitations were evaluated to determine whether they are more than routine or conventional and whether they integrate the claims as a whole into a practical application. In that review, the examiner located and evaluated limitations that disclose routine or conventional processes, steps, or extra-solution activity. After further review of the amended claims, it was found that, when considered as a whole, the claims disclose processes which are routine, well-understood, or conventional; examples can be viewed in the 101 rejection below. Therefore, the amended claims have been evaluated, it was determined that they recite patent-ineligible material, and the rejection under 35 U.S.C. 101 is upheld.

Regarding Claim Rejections – 35 U.S.C. 102

Applicant Remarks: The applicant argues that the Campos reference fails to teach key limitations of the claimed invention. The applicant states that Campos fails to teach partitioning a neural network by layers, argues that this is a key element of the claims, and notes that it is not explicitly stated in Campos. Next, the applicant states that Campos fails to teach "assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers" and instead only mentions an expanded neural network with given bit-widths. Accordingly, Campos does not explicitly teach assigning weight bit-widths as disclosed in claim 1. Next, the applicant argues that Campos fails to teach "the identifying and the assigning being performed to optimize, within an accuracy constraint, an overall latency of: the execution of the first neural network on the first device to generate a feature map output based on input data, transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device" because Campos does not teach optimizing overall latency using an accuracy constraint. Next, the applicant argues that Campos fails to teach the identification and assignment of the weight and feature map bit-widths, and further states that Campos fails to teach each and every element of the independent claims and fails to teach the optimization recited in those claims. Next, the applicant argues that Campos fails to teach claim 2, the subject matter of which has now been amended into the independent claims; again, the applicant points out that Campos fails to teach a splitting solution based on an accuracy constraint. Finally, the applicant states that Campos fails to teach a splitting solution which is able to identify layers and set weight and feature map bit-widths as recited in the now-amended independent claims. For the reasons stated above, the applicant believes that Campos fails to explicitly teach the claimed subject matter and that the rejection under 35 U.S.C. 102 should therefore be withdrawn.

Examiner Response: After each amendment the examiner must review the previous rejection together with the amended claims and remarks. After further review of the claims, it was found that Campos does not fully teach the amended claims.
The art does teach elements of the claims but fails to explicitly disclose all limitations of the independent claims. A further search was performed, and no single reference was found that would properly teach the elements of the proposed amended claims. Therefore, the examiner has withdrawn the 102 rejection.

Regarding Claim Rejections – 35 U.S.C. 103

Applicant Remarks: The applicant argues that Campos fails to teach the core elements of the independent claims. Further, the Li reference fails to teach the dependent claims because it cannot cure the deficiencies of Campos. Since Li cannot overcome the deficiencies of Campos, the combined art fails to teach the elements of the claimed invention. Therefore, since neither reference, alone or in combination, teaches the claimed amendments, the applicant believes the rejection under 35 U.S.C. 103 should be withdrawn.

Examiner Response: As stated above, Campos fails to explicitly teach each element of the amended independent claims. However, per the examiner's interpretation of the claims, Campos still teaches key elements of the claims. After each amendment a full and complete search is conducted, and in searching, the examiner found art that better matches the interpretation of the claims and is similar to Campos. According to the examiner's interpretation of the claims, the combination of Campos, Li, and Liu teaches the elements of the proposed claims. Therefore, with the addition of the new art, the 103 rejection is upheld; see the 103 rejection below.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 3-13, and 15-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims follows the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 ("2019 PEG").

Claim 1 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? Claim 1 recites "A method for splitting a trained neural network into a first neural network for execution on a first device and a second neural network for execution on a second device, comprising"; therefore it is directed to the statutory category of a process. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites, inter alia: "identifying a first set of one or more neural network layers from the trained neural network for inclusion in the first neural network and a second set of one or more neural network layers from the trained neural network for inclusion in the second neural network; and" Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate a neural network and judge where to place a partition between the layers. The limitation merely applies an abstract idea on a generic computer system. See MPEP 2106.04(a)(2)(III)(c).
"the identifying and the assigning being performed to optimize, within an accuracy constraint, an overall latency of: the execution of the first neural network on the first device to generate a feature map output based on input data, transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device," Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate a network for accuracy. The limitation merely applies an abstract idea on a generic computer system. See MPEP 2106.04(a)(2)(III)(c). "wherein the identifying and the assigning comprise: selecting, from among a plurality of potential splitting solutions for splitting the trained neural network into the first set of one or more neural network layers and the second set of one or more neural network layers, a set of one or more feasible solutions that fall within the accuracy constraint, wherein each feasible solution identifies: (i) a splitting point that indicates the layers from the trained neural network that are to be included in the first set of one or more layers; (ii) a set of weight bit-widths for the weights that configure the first set of one or more neural network layers; and (iii) a set of feature map bit-widths for the feature maps that are generated by the first set of one or more neural network layers." Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to observe and evaluate a network in order to perform actions on it. The limitation merely applies an abstract idea on a generic computer system. See MPEP 2106.04(a)(2)(III)(c). Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional element "assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;" which amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent), or no more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element "assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;" amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent), or no more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 2 (Cancelled) Claim 3 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A process, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites, inter alia: "selecting an implementation solution from the set of one or more feasible solutions;" Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate and select a potential solution. The limitation merely applies an abstract idea on a generic computer system. See MPEP 2106.04(a)(2)(III)(c). Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional element "generating, in accordance with the implementation solution, first neural network configuration information that defines the first neural network and second neural network configuration information that defines the second neural network; and" which amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent), or no more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). The limitation "providing the first neural network configuration information to the first device and the first second neural network configuration information to the second device." is an insignificant extra-solution activity required for any use of the mental processes (see MPEP § 2106.05(g)). As such, the claim is ineligible. Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element "generating, in accordance with the implementation solution, first neural network configuration information that defines the first neural network and second neural network configuration information that defines the second neural network; and" amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent), or no more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). The limitation "providing the first neural network configuration information to the first device and the first second neural network configuration information to the second device." is an insignificant extra-solution activity required for any use of abstract ideas (see MPEP § 2106.05(g)), and is a well-understood, routine, conventional activity (see MPEP § 2106.05(d)(i): "Receiving or transmitting data over a network, e.g., using the Internet to gather data").
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 4 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A process, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites, inter alia: “wherein the selecting is further based on a memory constraint for the first device.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to make an evaluate based on observing data and constraints. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? This claim does not recite any additional limitations which integrate the abstract idea into a practical application. Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea and thus the claim is subject-matter ineligible. Claim 5 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A process, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites, inter alia: “determining the plurality of potential splitting solutions is based on identifying transmission costs associated with different possible splitting points that are lower than a transmission cost associated with having all layers of the trained neural network included in the second neural network.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate a model and make a determination or judgement from that evaluation. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? This claim does not recite any additional limitations which integrate the abstract idea into a practical application. Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea and thus the claim is subject-matter ineligible. Claim 6 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A process, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites the abstract ideas of the preceding claims from which it depends. Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? 
The claim recites the additional elements, “computing quantization errors for the combined performance of the first neural network and the second neural network for different weight bit-widths and feature map bit-widths for each of the plurality of potential solutions, wherein the selecting the set of one or more feasible solutions is based on selecting weight bit-widths and feature map bit-widths that result in computed quantization errors that fall within the accuracy constraint.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional elements, “computing quantization errors for the combined performance of the first neural network and the second neural network for different weight bit-widths and feature map bit-widths for each of the plurality of potential solutions, wherein the selecting the set of one or more feasible solutions is based on selecting weight bit-widths and feature map bit-widths that result in computed quantization errors that fall within the accuracy constraint.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 7 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A process, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites, inter alia: “wherein the different weight bit-widths and feature map bit-widths for each of the plurality of potential solutions are uniformly selected from sets of possible weight bit-widths and feature map bit-widths, respectively.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate data and make a judgement form that evaluation to determine potential solutions. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? This claim does not recite any additional limitations which integrate the abstract idea into a practical application. Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea and thus the claim is subject-matter ineligible. 
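As an aside on the technical substance, the limitation addressed above for claims 6 and 17 describes searching over candidate weight and feature-map bit-widths and keeping those whose computed quantization error falls within the accuracy constraint. The sketch below illustrates that kind of filtering; the candidate grid, the error figures, and the function name are hypothetical and are not taken from the application, from the cited art, or from the Office Action.

```python
from itertools import product

# Hypothetical quantization-error estimates for each (weight bit-width,
# feature-map bit-width) pair at a single candidate splitting point.
# In practice these would come from evaluating the split network on
# calibration data; the numbers here are made up for illustration.
est_quant_error = {
    (8, 8): 0.1, (8, 4): 0.6, (4, 8): 0.9, (4, 4): 2.3, (2, 8): 4.0,
}

def feasible_bit_widths(error_tolerance: float) -> list[tuple[int, int]]:
    """Return bit-width pairs whose estimated quantization error stays
    within the tolerance implied by the accuracy constraint."""
    candidates = product((8, 4, 2), (8, 4))        # uniform candidate grid
    return [bw for bw in candidates
            if est_quant_error.get(bw, float("inf")) <= error_tolerance]

print(feasible_bit_widths(error_tolerance=1.0))    # -> [(8, 8), (8, 4), (4, 8)]
```

In the claimed framing, each surviving pair, together with a splitting point, would correspond to one "feasible solution" from which an implementation solution is later selected.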
Claim 8 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A process, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites the abstract ideas of the preceding claims from which it depends. Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional elements, “wherein the accuracy constraint comprises a defined accuracy drop tolerance threshold for combined performance of the first neural network and the second neural network relative to performance of the trained neural network.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional elements, “wherein the accuracy constraint comprises a defined accuracy drop tolerance threshold for combined performance of the first neural network and the second neural network relative to performance of the trained neural network.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 9 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A process, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites the abstract ideas of the preceding claims from which it depends. Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional elements, “wherein the first device has lower memory capabilities than the second device.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional elements, “wherein the first device has lower memory capabilities than the second device.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). 
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 10 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A process, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites the abstract ideas of the preceding claims from which it depends. Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional elements, “wherein the first device is an edge device and the second device is a cloud based computing platform.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional elements, “wherein the first device is an edge device and the second device is a cloud based computing platform.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 11 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A process, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites the abstract ideas of the preceding claims from which it depends. Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional elements, “wherein the trained neural network is an optimized trained neural network represented as a directed acyclic graph.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional elements, “wherein the trained neural network is an optimized trained neural network represented as a directed acyclic graph.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). 
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 12 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A process, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites the abstract ideas of the preceding claims from which it depends. Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional elements, “wherein the first neural network is a mixed-precision network comprising at least some layers that have different weight and feature map bit-widths than other layers.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional elements, “wherein the first neural network is a mixed-precision network comprising at least some layers that have different weight and feature map bit-widths than other layers.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 13 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? Claim 13 recites “A non-transient computer readable medium storing computer implementable instructions that configured to a computer system to perform a method for splitting a trained neural network into a first neural network for execution on a first device and a second neural network for execution on a second device, comprising:” therefore it is directed to the statutory category of a machine. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites, inter alia: “identifying a first set of one or more neural network layers from the trained neural network for inclusion in the first neural network and a second set of one or more neural network layers from the trained neural network for inclusion in the second neural network; and” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate a neural network and judge where to place partition between the layers. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). 
“the identifying and the assigning being performed to optimize, within an accuracy constraint, an overall latency of: the execution of the first neural network on the first device to generate a feature map output based on input data, transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device,” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate a network for accuracy. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). “wherein the identifying and the assigning comprise: selecting, from among a plurality of potential splitting solutions for splitting the trained neural network into the first set of one or more neural network layers and the second set of one or more neural network layers, a set of one or more feasible solutions that fall within the accuracy constraint, wherein each feasible solution identifies: (i) a splitting point that indicates the layers from the trained neural network that are to being included in the first set of one or more layers; (ii) a set of weight bit-widths for the weights that configure the first set of one or more neural network layers; and (iii) a set of feature map bit-widths for the feature maps that are generated by the first set of one or more neural network layers.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to observe and evaluate a network in order to perform actions on it. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional elements, “assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional elements, “assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;” amounts to generic computer components used as a tool to perform an existing process. 
Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 14 (Cancelled) Claim 15 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A machine, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites, inter alia: “wherein the method comprises selecting an implementation solution from the set of one or more feasible solutions;” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate and select a potential solution. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional elements, “generating, in accordance with the implementation solution, first neural network configuration information that defines the first neural network and second neural network configuration information that defines the second neural network; and” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). “providing the first neural network configuration information to the first device and the first second neural network configuration information to the second device.” is an insignificant extra-solution activity required for any uses of the mental processes (see MPEP § 2106.05(g)) As such, the claim is ineligible. Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional elements, “generating, in accordance with the implementation solution, first neural network configuration information that defines the first neural network and second neural network configuration information that defines the second neural network; and” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). 
“providing the first neural network configuration information to the first device and the first second neural network configuration information to the second device.” is an insignificant extra-solution activity required for any uses of abstract ideas (see MPEP § 2106.05(g)), and is a well-understood, routine, conventional activity (see MPEP § 2106.05(d)(i); “Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 16 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A machine, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites, inter alia: “prior to the selecting the set of one or more feasible solutions, determining the plurality of potential splitting solutions is based on identifying transmission costs associated with different possible splitting points that are lower than a transmission cost associated with having all layers of the trained neural network included in the second neural network.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate a model and make a determination or judgement from that evaluation. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? This claim does not recite any additional limitations which integrate the abstract idea into a practical application. Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea and thus the claim is subject-matter ineligible. Claim 17 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A machine, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites the abstract ideas of the preceding claims from which it depends. Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional elements, “wherein the selecting comprises: computing quantization errors for the combined performance of the first neural network and the second neural network for different weight bit-widths and feature map bit-widths for each of the plurality of potential solutions, wherein the selecting the set of one or more feasible solutions is based on selecting weight bit-widths and feature map bit-widths that result in computed quantization errors that fall within the accuracy constraint.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). 
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional elements, “wherein the selecting comprises: computing quantization errors for the combined performance of the first neural network and the second neural network for different weight bit-widths and feature map bit-widths for each of the plurality of potential solutions, wherein the selecting the set of one or more feasible solutions is based on selecting weight bit-widths and feature map bit-widths that result in computed quantization errors that fall within the accuracy constraint.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 18 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A machine, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites, inter alia: “wherein the different weight bit-widths and feature map bit-widths for each of the plurality of potential solutions are uniformly selected from sets of possible weight bit-widths and feature map bit-widths, respectively.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate data and make a judgement form that evaluation to determine potential solutions. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? This claim does not recite any additional limitations which integrate the abstract idea into a practical application. Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea and thus the claim is subject-matter ineligible. Claim 19 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? A machine, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites the abstract ideas of the preceding claims from which it depends. Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional elements, “wherein the accuracy constraint comprises a defined accuracy drop tolerance threshold for combined performance of the first neural network and the second neural network relative to performance of the trained neural network.” amounts to generic computer components used as a tool to perform an existing process. 
Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional elements, “wherein the accuracy constraint comprises a defined accuracy drop tolerance threshold for combined performance of the first neural network and the second neural network relative to performance of the trained neural network.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 20 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? Claim 20, recites “A non-transient computer readable medium storing computer implementable instructions that configured to a computer system to perform a method for splitting a trained neural network into a first neural network for execution on a first device and a second neural network for execution on a second device, comprising:” therefore it is directed to the statutory category of a machine. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites, inter alia: “identifying a first set of one or more neural network layers from the trained neural network for inclusion in the first neural network and a second set of one or more neural network layers from the trained neural network for inclusion in the second neural network; and” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate a neural network and judge where to place partition between the layers. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). “the identifying and the assigning being performed to optimize, within an accuracy constraint, an overall latency of: the execution of the first neural network on the first device to generate a feature map output based on input data, transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device,” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to evaluate a network for accuracy. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). 
“wherein the identifying and the assigning comprise: selecting, from among a plurality of potential splitting solutions for splitting the trained neural network into the first set of one or more neural network layers and the second set of one or more neural network layers, a set of one or more feasible solutions that fall within the accuracy constraint, wherein each feasible solution identifies: (i) a splitting point that indicates the layers from the trained neural network that are to be included in the first set of one or more layers; (ii) a set of weight bit-widths for the weights that configure the first set of one or more neural network layers; and (iii) a set of feature map bit-widths for the feature maps that are generated by the first set of one or more neural network layers.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, which is an evaluation or observation that is practically capable of being performed in the human mind with the assistance of pen and paper. A human is able to observe and evaluate a network in order to perform actions on it. The limitation is merely applying an abstract idea on generic computer system. See MPEP 2106.04(a)(2)(III)(c). Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional elements, “assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional elements, “assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 21 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? Claim 21 is a dependent claim of claim 1, therefore it is directed to a process, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites the abstract ideas of the preceding claims from which it depends. Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? 
The claim recites the additional element, “wherein each layer in the first set of one or more neural network layers and the second set of one or more neural network layers represents an entire layer from the trained neural network.” which amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or no more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element, “wherein each layer in the first set of one or more neural network layers and the second set of one or more neural network layers represents an entire layer from the trained neural network.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or no more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim 22 Step 1 – Is the claim to a process, machine, manufacture or composition of matter? Claim 22 is a dependent claim of claim 13, therefore it is directed to a machine, as above. Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? The claim recites the abstract ideas of the preceding claims from which it depends. Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? The claim recites the additional element, “wherein each layer in the first set of one or more neural network layers and the second set of one or more neural network layers represents an entire layer from the trained neural network.” which amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or no more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception? Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element, “wherein each layer in the first set of one or more neural network layers and the second set of one or more neural network layers represents an entire layer from the trained neural network.” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent) or no more than mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)). Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible. Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1, 3-5, 7, 9, 11-13, 15, 16, 18, and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Campos de Oliveira et al., (Campos de Oliveira et al., "Partitioning Convolutional Neural Networks to Maximize the Inference Rate on Constrained IoT Devices", Sept. 29th, 2019, pp. 1-30, hereinafter "Campos") in view of Liu et al., (Liu et al., “Auto-Tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge”, 2018, pp. 401-411, hereinafter “Liu”). Regarding claim 1, Campos discloses, “A method for splitting a trained neural network into a first neural network for execution on a first device and a second neural network for execution on a second device, comprising:” (Table 4, pp. 16; This table contains the number of devices used in each of the experiments. The first column discloses how many devices of a specific device model were used, based on its available memory. This article discloses a method that can partition a DNN across multiple devices, disclosing at least 2 devices using the STM32F469xx device model.) “identifying a first set of one or more neural network layers from the trained neural network for inclusion in the first neural network and a second set of one or more neural network layers from the trained neural network for inclusion in the second neural network; and” (Proposed Deep Neural Networks Partitioning for Constrained IoT Devices (DN2PCIoT), pp. 9; "DN2PCIoT accepts a dataflow graph as the input for the neural network, in which the vertices represent the neural network neurons (input data, operations, or output data), and the edges represent data transfers between the vertices. This same approach is used in SCOTCH and METIS. DN2PCIoT also receives a target graph, which contains information about the devices (the number of them in the system, computational power, communication performance, and system topology) in a way similar to SCOTCH." The system disclosed in this article shows that a dataflow graph of a neural network is input into the system.
After this model is input, the system will begin the partitioning process disclosed in algorithm 1) Campos fails to explicitly disclose, “assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;”, “the identifying and the assigning being performed to optimize, within an accuracy constraint, an overall latency of: the execution of the first neural network on the first device to generate a feature map output based on input data, transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device,”, and “wherein the identifying and the assigning comprise: selecting, from among a plurality of potential splitting solutions for splitting the trained neural network into the first set of one or more neural network layers and the second set of one or more neural network layers, a set of one or more feasible solutions that fall within the accuracy constraint, wherein each feasible solution identifies: (i) a splitting point that indicates the layers from the trained neural network that are to be included in the first set of one or more layers: (ii) a set of weight bit-widths for the weights that configure the first set of one or more neural network layers: and (iii) a set of feature map bit-widths for the feature maps that are generated by the first set of one or more neural network layers.”. However, Liu discloses, “assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;” (Neural Network Quantization, pp. 404; “In order to accelerate inference and compress the size of DNN models, many network quantization methods are proposed. Some studies focus on scalar and vector quantization [4,7], while others center on fixed-point quantization [18,19]. In this paper, we are mainly interested in scalar quantization of INT8, which is supported by many advanced computing libraries such as Google’s gemmlowp [1] and NVIDIA’s cuDNN [2].” The method proposed in this article is able to split a neural network and quantize layers of that network. This method will evaluate the network and determine where to split the network and what to quantize. This will quantize the input, weights and output on the first set of layers, which is designed to be executed on the edge device.) “the identifying and the assigning being performed to optimize, within an accuracy constraint, an overall latency of: the execution of the first neural network on the first device to generate a feature map output based on input data, transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device,” (Introduction, pp. 403; “In this paper, we propose an auto-tuning neural network quantization framework as shown in Fig. 1. During deployment, the framework profiles the operators of DNNs on edge devices and generates the candidate layers as partition points. When the neural network is ready to be used, the framework starts auto-tuning for network partition. 
In the time of inference, the first part of the network is quantized and executed on the edge devices, and the second part of the network is executed in the cloud servers. On the edge, we use quantized neural network to reduce storage and computation. In the cloud, we use original full-precision network to achieve high accuracy.” The model in this article will evaluate a neural network and determine where to split the network. This will run the Auto-Tuning portioning algorithm and determine the best place to split a neural network to reduce storage and overall computations.) And (Table 3, pp. 408; As seen in the table, the accuracy of the models is taken into consideration. During the experiment the accuracy of the models are stored and evaluated. This would show that this model does operate with accuracy in mind or at least recorded and evaluated.) “wherein the identifying and the assigning comprise: selecting, from among a plurality of potential splitting solutions for splitting the trained neural network into the first set of one or more neural network layers and the second set of one or more neural network layers, a set of one or more feasible solutions that fall within the accuracy constraint, wherein each feasible solution identifies: (i) a splitting point that indicates the layers from the trained neural network that are to be included in the first set of one or more layers: (ii) a set of weight bit-widths for the weights that configure the first set of one or more neural network layers: and (iii) a set of feature map bit-widths for the feature maps that are generated by the first set of one or more neural network layers.” (Candidate Network Partition Points, pp. 405; “In general, a deep neural network contains many kinds of layers such as convolution layers, fully-connected layers and activation layers. We analyze the characteristics of different network layers and decide how to select candidate layers as reasonable partition points. The set of candidate layers, Rule = {L1, L2, . . . , Ln}, is based on the results of the following analysis.” The proposed method in this article will evaluate a neural network and determine the best location to split the network. This will evaluate the network with a set of rules and partition it accordingly. The model will also quantize a portion of the network so it can execute on an edge device. Once the partition point it located and tested, one portion of the model will be sent to the edge device for execution and the other half of the model will remain on the cloud server for processing. As stated above the portion of the model sent to the edge device will be quantized using another algorithm. And (Introduction, pp. 403; “In this paper, we propose an auto-tuning neural network quantization framework as shown in Fig. 1. During deployment, the framework profiles the operators of DNNs on edge devices and generates the candidate layers as partition points. When the neural network is ready to be used, the framework starts auto-tuning for network partition. In the time of inference, the first part of the network is quantized and executed on the edge devices, and the second part of the network is executed in the cloud servers. On the edge, we use quantized neural network to reduce storage and computation. In the cloud, we use original full-precision network to achieve high accuracy.” As stated above, this model will determine a partition point of a neural network and execute the partitions on an edge device and a cloud server. 
This model is able to perform mixed-precision co-computation of a neural network.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Campos and Liu. Campos teaches an edge-cloud system which is able to perform actions of a neural network together using collaborative inference and quantization. Liu teaches an edge-cloud system that is able to partition a neural network and perform one partition on the edge device and the other partition on the cloud server using collaborative inference and quantization. One of ordinary skill would have had motivation to combine systems that use similar edge-cloud collaboration and quantization to perform actions of a neural network on two different devices, “Table 3 summarizes the results of our framework. We tested AlexNet, VGG16, ResNet-18 and GoogLeNet in different wireless network environments. For each neural network, the framework gives the best partition point and the fastest partition point. According to the inference time and the speed-up in the table, we can see that sometimes the speed of collaborative inference is faster than that of the cloud inference only. This is due to the large transmission overhead in the low-bandwidth wireless environments. In collaborative inference, we only need to download the parameters required by the edge inference, which can significantly reduce the size of download data. If users need to achieve the fastest inference speed, the fastest partition point should be selected. If users need to avoid privacy disclosure, the best partition point should be selected. In addition, quantized neural networks do not lead to a significant drop in accuracy (usually less than 1%).” (Liu, Experimental Results, pp. 408.) Regarding claim 3, Liu discloses, “selecting an implementation solution from the set of one or more feasible solutions;” (Auto-Tuning Partition, pp. 407; “According to the candidate rule Rule, the framework performs auto-tuning partition for cloud-edge collaborative inference, as described in Algorithm 1. The input of the algorithm contains candidate layer rules and a neural network. Firstly, candidate rules are used to select candidate partition points in the neural network (lines 1–2). Secondly, all candidate partition networks are tested, and the information of performance is recorded in P (lines 3–9).” As seen in the algorithm, different partitioning points are taken into consideration. In lines 11-13 the best partition point is determined and is saved as p_best, which is then returned by the system.) “generating, in accordance with the implementation solution, first neural network configuration information that defines the first neural network and second neural network configuration information that defines the second neural network; and” (Auto-Tuning Partition, pp. 407; “The function of PredictPerformance can predict the performance of collaborative inference based on the results of off-line profiling. Finally, we find the best partition point in P for collaborative inference of mixed-precision neural network (lines 10–14).” The model will evaluate all the potential partition points for the network. Once the model makes the determination, the proposed point is returned to the system for execution. Lines 3-9 disclose the different points to be evaluated and line 14 returns the best point after an evaluation.)
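The auto-tuning flow the examiner summarizes from Liu's Algorithm 1 (select candidate partition points, record the performance of each candidate in P, return the best point p_best) can be sketched roughly as follows. This is a paraphrase for illustration only, not Liu's actual code; the helper names and the toy performance model are assumptions:

# Rough paraphrase of the described flow; hypothetical helpers, not the reference's code.
def auto_tune_partition(network, candidate_rule, predict_performance):
    # Select candidate partition points from the candidate layer rules (cf. lines 1-2).
    candidates = [i + 1 for i, layer in enumerate(network) if layer["type"] in candidate_rule]
    # Profile every candidate partition and record its predicted performance in P (cf. lines 3-9).
    P = {p: predict_performance(network[:p], network[p:]) for p in candidates}
    # Return the best partition point p_best (cf. lines 10-14).
    return min(P, key=P.get)

net = [{"type": "conv"}, {"type": "pool"}, {"type": "conv"}, {"type": "fc"}]
# Toy performance model: pretend edge layers cost twice as much as cloud layers.
print(auto_tune_partition(net, {"pool", "fc"}, lambda edge, cloud: 2 * len(edge) + len(cloud)))

Here predict_performance stands in for the PredictPerformance function mentioned in the quoted passage, which in the reference is driven by off-line profiling results.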
“providing the first neural network configuration information to the first device and the second neural network configuration information to the second device.” (Algorithm 1 Auto-Tuning Partition, pp. 407; This model is designed to have a neural network execute on an edge device and cloud server. After the different partition points are evaluated, the algorithm will return the best partition point, as seen in line 14. This will then be sent to the computing system and the model will be partitioned accordingly. This process is seen in the experiments and shows that the partitioned model is sent to the edge device and cloud server for execution.) Regarding claim 4, Campos discloses, “wherein the selecting is further based on a memory constraint for the first device.” (Greedy: A Greedy Algorithm for Communication Reduction, pp. 17; "Next, if there is any space left in the device and the layer type is convolution, pooling, or input, then a two-dimensional number of vertices (width and height) that fit the rest of the memory of this device are assigned to it or, if the layer is fully connected, then a number of vertices that fit the rest of the memory of this device are assigned to it. After that or if there is any space left in the device, the next layer or the rest of the current layer is assigned to the next device and the process goes on until all the vertices are assigned to a device." The approach disclosed in this article teaches that the device's memory constraints are considered during the partitioning process. This algorithm will be used to find the optimal partition based on the memory limits of a device. This will partition across multiple devices until each vertex is assigned to a device.) Regarding claim 5, Campos discloses, “determining the plurality of potential splitting solutions is based on identifying transmission costs associated with different possible splitting points that are lower than a transmission cost associated with having all layers of the trained neural network included in the second neural network.” (Proposed Deep Neural Networks Partitioning for Constrained IoT Devices (DN2PCIoT), pp. 10; "Finally, we designed DN2PCIoT to produce partitionings that maximize the neural network inference rate or reduce the amount of transferred data per inference. Other objective functions can be easily employed in DN2PCIoT due to its design." The cost of transmission is taken into account by the partitioning algorithm to reduce the amount of transmission between devices. This teaches that it is able to identify the transmission cost and determine partitions accordingly.) Regarding claim 7, Liu discloses, “wherein the different weight bit-widths and feature map bit-widths for each of the plurality of potential solutions are uniformly selected from sets of possible weight bit-widths and feature map bit-widths, respectively.” (Neural Network Quantization, pp. 404-405; [Off-line Quantization] Step 1. Find quantization thresholds (T_min and T_max) for calculating scale factors of Input, Weights and Output; Step 2. Quantize Input and Weights according to the following formula: (see formula (1)) where: Range_LP is the range of low-precision values (e.g. 255 for INT8), V_low-precision is the set of low-precision values, Data(x) is the original value, Data_Q(x) is the quantized value.” The model in this article will determine the quantization threshold when performing quantization of a layer.
This will determine the bit-widths for the input, weight and output of a partition of a neural network. As stated above, the edge model will be quantized according to the selected precision level.) Regarding claim 9, Campos discloses, “wherein the first device has lower memory capabilities than the second device.” (Dataflow Graphs and Neural Network Models, pp. 4; "Figure 1b shows the same dataflow graph partitioned for distributed execution on two fictional devices: device A, which can perform 18 FLOP/second (FLOP/s) and provide 20 B of memory and device B, which can perform 18 FLOP/s and provide 52 B of memory. Additionally, the communication link between these devices can transfer 4 B per second. The amount of transferred data per inference in this partitioning is 8 B because, although six edges are crossing the partitions, they represent the data transfer of only 8 B." This figure shows an example of two different devices and their memory capabilities. The figures also disclose how a neural network might be partitioned and applied to the separate devices. In this example device A would be comparable to the first device, which has 20 bytes, and the second device would be comparable to device B, which has 52 bytes.) Regarding claim 11, Campos discloses, “wherein the trained neural network is an optimized trained neural network represented as a directed acyclic graph.” (Dataflow Graphs and Neural Network Models, pp. 4; "Some important concepts need to be defined before proceeding with the related work in ML, IoT, and partitioning tools. Neural networks can be modeled as a dataflow graph. Dataflow graphs are composed of a directed acyclic graph that models the computation of a program through its data flow [29]. In a dataflow graph, vertices represent computations and may send/receive data to/from other vertices in the graph. In our approach, a vertex represents one or more neural network neurons and may also require an amount of memory to store the intermediate (layer) results and the neural network parameters required by the respective neurons it represents. Dataflow graph edges may contain weights to represent different amounts of data that are sent to other vertices." This system is able to take in and interpret dataflow graphs. A dataflow graph consists of a directed acyclic graph.) Regarding claim 12, Campos discloses, “wherein the first neural network is a mixed-precision network comprising at least some layers that have different weight and feature map bit-widths than other layers.” (Methodology, pp. 14; "Figure 3 shows the dataflow graph of each LeNet version with the following per-layer data: the number of vertices in height, width, and depth, the layer type, and the amount of transferred data in byte required by each edge in each layer. In Figure 3, the cubes represent the original LeNet neurons and the circles and ellipses represent the dataflow graph vertices." One of the experiments run in this article discloses a NN which contains many different layers. As seen in figure 3, each layer contains different feature maps, weights and dimensions.)
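For illustration, the two off-line quantization steps quoted above in connection with claim 7 (find thresholds T_min and T_max, then map values onto the low-precision range Range_LP, e.g. 255 for INT8) can be sketched roughly as follows. This is a paraphrase of the described steps, not Liu's exact formula or implementation, and the helper names are assumptions:

# Paraphrased threshold-based scalar quantization; illustrative only.
import numpy as np

def quantize(data, n_bits=8):
    # Step 1: find quantization thresholds T_min and T_max for the scale factor.
    t_min, t_max = float(data.min()), float(data.max())
    range_lp = 2 ** n_bits - 1          # range of low-precision values, e.g. 255 for INT8
    scale = range_lp / (t_max - t_min) if t_max > t_min else 1.0
    # Step 2: map the original values onto the low-precision range.
    q = np.round((data - t_min) * scale).astype(np.uint8)
    return q, scale, t_min

def dequantize(q, scale, t_min):
    return q.astype(np.float32) / scale + t_min

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, t_min = quantize(weights)
print(float(np.abs(dequantize(q, scale, t_min) - weights).max()))  # small reconstruction error

Choosing n_bits per layer is what the claims describe as assigning weight and feature map bit-widths; in the cited reference this kind of quantization is applied to the edge-side partition of the network.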
Regarding claim 13, Campos discloses, “A computer system comprising one or more processing devices and one or more non-transient storages storing computer implementable instructions for execution by the one or more processing devices, wherein execution of the computer implementable instructions configures the computer system to perform a method for splitting a trained neural network into a first neural network for execution on a first device and a second neural network for execution on a second device, comprising:” (Conclusion, pp. 27; "In this work, we partitioned a Convolutional Neural Network for distributed inference into constrained Internet-of-Things devices using nine different approaches and we propose Deep Neural Networks Partitioning for Constrained loT Devices (DN2PCloT), an algorithm that partitions graphs representing Deep Neural Network for distributed execution on multiple constrained loT devices aiming for inference rate maximization or communication reduction. This algorithm adequately treats the memory required by the shared parameters and biases of CNNs so that DN 2PCloT can produce valid partitioning’s for constrained devices. Additionally, DN2PCloT makes it easy to use other objective functions as well." This article discloses a method for partitioning a NN onto edge devices. This system was designed for use on a generic computer which is a system containing memory devices, processors, I/0 devices and produces an output to many different Internet-of-Things devices.) “identifying a first set of one or more neural network layers from the trained neural network for inclusion in the first neural network and a second set of one or more neural network layers from the trained neural network for inclusion in the second neural network; and” (Proposed Deep Neural Networks Partitioning for Constrained loT Devices (DN2PCloT), pp. 9; "DN 2PCloTaccepts a dataflow graph as the input for the neural network, in which the vertices represent the neural network neurons (input data, operations, or output data), and the edges represent data transfers between the vertices. This same approach is used in SCOTCH and METIS. DN2PCloT also receives a target graph, which contains information about the devices (the number of them in the system, computational power, communication performance, and system topology) in a way similar to SCOTCH." The system disclosed in this article shows that a dataflow graph of a neural network is input into the system. After this model is input, the system will begin the partitioning process disclosed in algorithm 1.) 
Campos fails to explicitly disclose, “assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;”, “the identifying and the assigning being performed to optimize, within an accuracy constraint, an overall latency of: the execution of the first neural network on the first device to generate a feature map output based on input data, transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device,”, and “wherein the identifying and the assigning comprise: selecting, from among a plurality of potential splitting solutions for splitting the trained neural network into the first set of one or more neural network layers and the second set of one or more neural network layers, a set of one or more feasible solutions that fall within the accuracy constraint, wherein each feasible solution identifies: (i) a splitting point that indicates the layers from the trained neural network that are to being included in the first set of one or more layers; (ii) a set of weight bit-widths for the weights that configure the first set of one or more neural network layers; and (iii) a set of feature map bit-widths for the feature maps that are generated by the first set of one or more neural network layers.”. However, Liu discloses, “assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;” (Neural Network Quantization, pp. 404; “In order to accelerate inference and compress the size of DNN models, many network quantization methods are proposed. Some studies focus on scalar and vector quantization [4,7], while others center on fixed-point quantization [18,19]. In this paper, we are mainly interested in scalar quantization of INT8, which is supported by many advanced computing libraries such as Google’s gemmlowp [1] and NVIDIA’s cuDNN [2].” The method proposed in this article is able to split a neural network and quantize layers of that network. This method will evaluate the network and determine where to split the network and what to quantize. This will quantize the input, weights and output on the first set of layers, which is designed to be executed on the edge device.) “the identifying and the assigning being performed to optimize, within an accuracy constraint, an overall latency of: the execution of the first neural network on the first device to generate a feature map output based on input data, transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device,” (Introduction, pp. 403; “In this paper, we propose an auto-tuning neural network quantization framework as shown in Fig. 1. During deployment, the framework profiles the operators of DNNs on edge devices and generates the candidate layers as partition points. When the neural network is ready to be used, the framework starts auto-tuning for network partition. 
In the time of inference, the first part of the network is quantized and executed on the edge devices, and the second part of the network is executed in the cloud servers. On the edge, we use quantized neural network to reduce storage and computation. In the cloud, we use original full-precision network to achieve high accuracy.” The model in this article will evaluate a neural network and determine where to split the network. This will run the Auto-Tuning portioning algorithm and determine the best place to split a neural network to reduce storage and overall computations.) And (Table 3, pp. 408; As seen in the table, the accuracy of the models is taken into consideration. During the experiment the accuracy of the models are stored and evaluated. This would show that this model does operate with accuracy in mind or at least recorded and evaluated.) “wherein the identifying and the assigning comprise: selecting, from among a plurality of potential splitting solutions for splitting the trained neural network into the first set of one or more neural network layers and the second set of one or more neural network layers, a set of one or more feasible solutions that fall within the accuracy constraint, wherein each feasible solution identifies: (i) a splitting point that indicates the layers from the trained neural network that are to being included in the first set of one or more layers; (ii) a set of weight bit-widths for the weights that configure the first set of one or more neural network layers; and (iii) a set of feature map bit-widths for the feature maps that are generated by the first set of one or more neural network layers.” (Candidate Network Partition Points, pp. 405; “In general, a deep neural network contains many kinds of layers such as convolution layers, fully-connected layers and activation layers. We analyze the characteristics of different network layers and decide how to select candidate layers as reasonable partition points. The set of candidate layers, Rule = {L1, L2, . . . , Ln}, is based on the results of the following analysis.” The proposed method in this article will evaluate a neural network and determine the best location to split the network. This will evaluate the network with a set of rules and partition it accordingly. The model will also quantize a portion of the network so it can execute on an edge device. Once the partition point it located and tested, one portion of the model will be sent to the edge device for execution and the other half of the model will remain on the cloud server for processing. As stated above the portion of the model sent to the edge device will be quantized using another algorithm. And (Introduction, pp. 403; “In this paper, we propose an auto-tuning neural network quantization framework as shown in Fig. 1. During deployment, the framework profiles the operators of DNNs on edge devices and generates the candidate layers as partition points. When the neural network is ready to be used, the framework starts auto-tuning for network partition. In the time of inference, the first part of the network is quantized and executed on the edge devices, and the second part of the network is executed in the cloud servers. On the edge, we use quantized neural network to reduce storage and computation. In the cloud, we use original full-precision network to achieve high accuracy.” As stated above, this model will determine a partition point of a neural network and execute the partitions on an edge device and a cloud server. 
This model is able to perform mixed-precision co-computation of a neural network.) Regarding claim 15, Liu discloses, “wherein the method comprises selecting an implementation solution from the set of one or more feasible solutions;” (Auto-Tuning Partition, pp. 407; “According to the candidate rule Rule, the framework performs auto-tuning partition for cloud-edge collaborative inference, as described in Algorithm 1. The input of the algorithm contains candidate layer rules and a neural network. Firstly, candidate rules are used to select candidate partition points in the neural network (lines 1–2). Secondly, all candidate partition networks are tested, and the information of performance is recorded in P (lines 3–9).” As seen in the algorithm different portioning points are taken into consideration. In lines 11-13 the best partition point is determined and the best point is saved as p b e s t , which is then returned by the system.) “generating, in accordance with the implementation solution, first neural network configuration information that defines the first neural network and second neural network configuration information that defines the second neural network; and” (Auto-Tuning Partition, pp. 407; “The function of PredictPerformance can predict the performance of collaborative inference based on the results of off-line profiling. Finally, we find the best partition point in P for collaborative inference of mixed-precision neural network (lines 10–14).” The model will evaluate all the potential partition points for the network. Once the model makes the determination, the proposed point it returned to the system for execution. Lines 3-9 discloses the different points to be evaluated and line 14 returns the best point after an evaluation.) “providing the first neural network configuration information to the first device and the first second neural network configuration information to the second device.” (Algorithm 1 Auto-Tuning Partition, pp. 407; This model is designed to have a neural network execute on a edge device and cloud server. After the different partition points are evaluated and the algorithm will return the best partition point, as seen in line 14. This will then be sent to the computing system and the model will be partitioned accordingly. This process is seen in the experiments and shows that the partitioned model is sent to the edge device and cloud server for execution.) Regarding claim 16, Campos discloses, “prior to the selecting the set of one or more feasible solutions, determining the plurality of potential splitting solutions is based on identifying transmission costs associated with different possible splitting points that are lower than a transmission cost associated with having all layers of the trained neural network included in the second neural network.” (Proposed Deep Neural Networks Partitioning for Constrained loT Devices (DN2PCloT), pp.10; "Finally, we designed DN2PCloTto produce partitioning’s that maximize the neural network inference rate or reduce the amount of transferred data per inference. Other objective functions can be easily employed in DN2PCloT due to its design." The cost of transmission is taken into account when the partitioning algorithm to reduce the amount of transmission between devices. This teaches that that it is able to identify the transmission cost and determine partitions accordingly.) 
Regarding claim 18, Liu discloses, “wherein the different weight bit-widths and feature map bit-widths for each of the plurality of potential solutions are uniformly selected from sets of possible weight bit-widths and feature map bit-widths, respectively.” (Neural Network Quantization, pp. 404-405; [Off-line Quantization] Step 1. Find quantization thresholds (T_min and T_max) for calculating scale factors of Input, Weights and Output; Step 2. Quantize Input and Weights according to the following formula: (see formula (1)) where: Range_LP is the range of low-precision values (e.g. 255 for INT8), V_low-precision is the set of low-precision values, Data(x) is the original value, Data_Q(x) is the quantized value.” The model in this article will determine the quantization threshold when performing quantization of a layer. This will determine the bit-widths for the input, weight and output of a partition of a neural network. As stated above, the edge model will be quantized according to the selected precision level.) Regarding claim 20, Campos discloses, “A non-transient computer readable medium storing computer implementable instructions that configured to a computer system to perform a method for splitting a trained neural network into a first neural network for execution on a first device and a second neural network for execution on a second device, comprising:” (Conclusion, pp. 27; "In this work, we partitioned a Convolutional Neural Network for distributed inference into constrained Internet-of-Things devices using nine different approaches and we propose Deep Neural Networks Partitioning for Constrained IoT Devices (DN2PCIoT), an algorithm that partitions graphs representing Deep Neural Networks for distributed execution on multiple constrained IoT devices aiming for inference rate maximization or communication reduction. This algorithm adequately treats the memory required by the shared parameters and biases of CNNs so that DN2PCIoT can produce valid partitionings for constrained devices. Additionally, DN2PCIoT makes it easy to use other objective functions as well." This article discloses a method for partitioning a NN onto edge devices. This system was designed for use on a generic computer which is a system containing memory devices, instructions or data stored on memory, processors, I/O devices and produces an output to many different Internet-of-Things devices.) “identifying a first set of one or more neural network layers from the trained neural network for inclusion in the first neural network and a second set of one or more neural network layers from the trained neural network for inclusion in the second neural network; and” (Proposed Deep Neural Networks Partitioning for Constrained IoT Devices (DN2PCIoT), pp. 9; "DN2PCIoT accepts a dataflow graph as the input for the neural network, in which the vertices represent the neural network neurons (input data, operations, or output data), and the edges represent data transfers between the vertices. This same approach is used in SCOTCH and METIS. DN2PCIoT also receives a target graph, which contains information about the devices (the number of them in the system, computational power, communication performance, and system topology) in a way similar to SCOTCH." The system disclosed in this article shows that a dataflow graph of a neural network is input into the system. After this model is input, the system will begin the partitioning process disclosed in algorithm 1.)
Campos fails to explicitly discloses, assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;”, “the identifying and the assigning being performed to optimize, within an accuracy constraint, an overall latency of: the execution of the first neural network on the first device to generate a feature map output based on input data, transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device,”, and “wherein the identifying and the assigning comprise: selecting, from among a plurality of potential splitting solutions for splitting the trained neural network into the first set of one or more neural network layers and the second set of one or more neural network layers, a set of one or more feasible solutions that fall within the accuracy constraint, wherein each feasible solution identifies: (i) a splitting point that indicates the layers from the trained neural network that are to be included in the first set of one or more layers; (ii) a set of weight bit-widths for the weights that configure the first set of one or more neural network layers; and (iii) a set of feature map bit-widths for the feature maps that are generated by the first set of one or more neural network layers.”. However, Liu discloses, “assigning weight bit-widths for weights that configure the first set of one or more neural network layers and feature map bit-widths for feature maps that are generated by the first set of one or more neural network layers;” (Neural Network Quantization, pp. 404; “In order to accelerate inference and compress the size of DNN models, many network quantization methods are proposed. Some studies focus on scalar and vector quantization [4,7], while others center on fixed-point quantization [18,19]. In this paper, we are mainly interested in scalar quantization of INT8, which is supported by many advanced computing libraries such as Google’s gemmlowp [1] and NVIDIA’s cuDNN [2].” The method proposed in this article is able to split a neural network and quantize layers of that network. This method will evaluate the network and determine where to split the network and what to quantize. This will quantize the input, weights and output on the first set of layers, which is designed to be executed on the edge device.) “the identifying and the assigning being performed to optimize, within an accuracy constraint, an overall latency of: the execution of the first neural network on the first device to generate a feature map output based on input data, transmission of the feature map output from the first device to the second device, and execution of the second neural network on the second device to generate an inference output based on the feature map output from the first device,” (Introduction, pp. 403; “In this paper, we propose an auto-tuning neural network quantization framework as shown in Fig. 1. During deployment, the framework profiles the operators of DNNs on edge devices and generates the candidate layers as partition points. When the neural network is ready to be used, the framework starts auto-tuning for network partition. 
In the time of inference, the first part of the network is quantized and executed on the edge devices, and the second part of the network is executed in the cloud servers. On the edge, we use quantized neural network to reduce storage and computation. In the cloud, we use original full-precision network to achieve high accuracy.” The model in this article will evaluate a neural network and determine where to split the network. This will run the Auto-Tuning portioning algorithm and determine the best place to split a neural network to reduce storage and overall computations.) And (Table 3, pp. 408; As seen in the table, the accuracy of the models is taken into consideration. During the experiment the accuracy of the models are stored and evaluated. This would show that this model does operate with accuracy in mind or at least recorded and evaluated.) “wherein the identifying and the assigning comprise: selecting, from among a plurality of potential splitting solutions for splitting the trained neural network into the first set of one or more neural network layers and the second set of one or more neural network layers, a set of one or more feasible solutions that fall within the accuracy constraint, wherein each feasible solution identifies: (i) a splitting point that indicates the layers from the trained neural network that are to be included in the first set of one or more layers; (ii) a set of weight bit-widths for the weights that configure the first set of one or more neural network layers; and (iii) a set of feature map bit-widths for the feature maps that are generated by the first set of one or more neural network layers.” (Candidate Network Partition Points, pp. 405; “In general, a deep neural network contains many kinds of layers such as convolution layers, fully-connected layers and activation layers. We analyze the characteristics of different network layers and decide how to select candidate layers as reasonable partition points. The set of candidate layers, Rule = {L1, L2, . . . , Ln}, is based on the results of the following analysis.” The proposed method in this article will evaluate a neural network and determine the best location to split the network. This will evaluate the network with a set of rules and partition it accordingly. The model will also quantize a portion of the network so it can execute on an edge device. Once the partition point it located and tested, one portion of the model will be sent to the edge device for execution and the other half of the model will remain on the cloud server for processing. As stated above the portion of the model sent to the edge device will be quantized using another algorithm. And (Introduction, pp. 403; “In this paper, we propose an auto-tuning neural network quantization framework as shown in Fig. 1. During deployment, the framework profiles the operators of DNNs on edge devices and generates the candidate layers as partition points. When the neural network is ready to be used, the framework starts auto-tuning for network partition. In the time of inference, the first part of the network is quantized and executed on the edge devices, and the second part of the network is executed in the cloud servers. On the edge, we use quantized neural network to reduce storage and computation. In the cloud, we use original full-precision network to achieve high accuracy.” As stated above, this model will determine a partition point of a neural network and execute the partitions on an edge device and a cloud server. 
This model is able to perform mixed-precision co-computation of a neural network.) Regarding claim 21, Liu discloses, “wherein each layer in the first set of one or more neural network layers and the second set of one or more neural network layers represents an entire layer from the trained neural network.” (Auto-Tuning Partition, pp. 407; “According to the candidate rule Rule, the framework performs auto-tuning partition for cloud-edge collaborative inference, as described in Algorithm 1. The input of the algorithm contains candidate layer rules and a neural network. Firstly, candidate rules are used to select candidate partition points in the neural network (lines 1–2).” And Fig. 2, pp. 405; As seen in the figure, the partition point consists of a point before/after a layer in a neural network. The model in this article will determine a partition point and then split the network by layers. Selected layers will be quantized and executed on an edge device. After this the results of the first partition are sent to the cloud server for further inference and to obtain the final result.) Regarding claim 22, Liu discloses, “wherein each layer in the first set of one or more neural network layers and the second set of one or more neural network layers represents an entire layer from the trained neural network.” (Auto-Tuning Partition, pp. 407; “According to the candidate rule Rule, the framework performs auto-tuning partition for cloud-edge collaborative inference, as described in Algorithm 1. The input of the algorithm contains candidate layer rules and a neural network. Firstly, candidate rules are used to select candidate partition points in the neural network (lines 1–2).” And Fig. 2, pp. 405; As seen in the figure, the partition point consists of a point before/after a layer in a neural network. The model in this article will determine a partition point and then split the network by layers. Selected layers will be quantized and executed on an edge device. After this the results of the first partition are sent to the cloud server for further inference and to obtain the final result.) Claims 6, 8, 10, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Campos and Liu in view of Li et al., (Li et al., "JALAD: Joint Accuracy- and Latency-Aware Deep Structure Decoupling for Edge-Cloud Execution", 2018, pp. 671-678, hereinafter "Li"). Regarding claim 6, Campos and Liu fail to explicitly disclose the elements of this claim. However, Li discloses, “computing quantization errors for the combined performance of the first neural network and the second neural network for different weight bit-widths and feature map bit-widths for each of the plurality of potential solutions, wherein the selecting the set of one or more feasible solutions is based on selecting weight bit-widths and feature map bit-widths that result in computed quantization errors that fall within the accuracy constraint.” (Compressed accuracy and data size predictor, pp. 673; "Feature quantization would result in accuracy loss, but the DNNs' prediction of an image is inexplicable yet, and the compressed data size is highly related to the input data. From Fig. 5 we can observe that the accuracy loss and data size of a specific compression setting c is stable, therefore we can predict the current accuracy and compressed size based on historical statistics.
We build a lookup table A_i(c) to predict the accuracy loss and compressed data size S_i(c) in a specific quantization bit c." This article discloses that feature quantization will result in some form of accuracy loss from the original model. To rectify this, they designed a process to monitor the accuracy of the model using a lookup table to help predict and control the accuracy loss within a threshold.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Campos, Liu and Li. Campos teaches an edge-cloud system which is able to perform actions of a neural network together using collaborative inference and quantization. Liu teaches an edge-cloud system that is able to partition a neural network and perform one partition on the edge device and the other partition on the cloud server using collaborative inference and quantization. Li teaches a method which also partitions a deep neural network into parts so that a part of the model can execute on an edge device and the other part can execute on a cloud-based server. One of ordinary skill would have had motivation to combine systems that use similar edge-cloud collaboration and quantization with a system that is able to partition a neural network into parts so that part of the neural network can execute accurately and optimally on different edge devices and the remaining part on a cloud-based network or database, "Our study reveals that such decoupling has potential to reduce the overall execution latency, we propose an accuracy-aware strategy for in-layer feature map compression to enable the decoupling. We further formulate the deep structure decoupling as an optimization problem to minimize the overall execution latency, with a guaranteed accuracy constraint. Our real-world experiments based on 4 representative deep neural networks demonstrate that our design can speed up the execution while guaranteeing the accuracy loss within a user-defined boundary." (Li, Concluding Remarks, pp. 677). Regarding claim 8, Campos and Liu fail to explicitly disclose the elements of this claim. However, Li discloses, “wherein the accuracy constraint comprises a defined accuracy drop tolerance threshold for combined performance of the first neural network and the second neural network relative to performance of the trained neural network.” (Impact of Accuracy Threshold, pp. 676; "Next, we study the impact of our design on the model accuracy. We choose different accuracy threshold Δα to test JALAD's latency performance and plot the average execution latency and decoupling decision in Figure 7. We observe that as the threshold increases, our design can achieve better latency gain. The reason is that JALAD can either change the decoupling layer or cast the in-layer feature maps into lower bit-depth (which means transmitting fewer bytes) to achieve lower latency." This article discloses a method which splits a DNN to execute a portion of the model on an edge device and the other portion on a cloud network. This model uses an accuracy threshold to ensure that the original model is accurately executed. This model utilizes a threshold at varying values to determine a balance between accuracy and latency.) Regarding claim 10, Campos and Liu fail to explicitly disclose the elements of this claim.
However, Li discloses, “wherein the first device is an edge device and the second device is a cloud based computing platform.” (Figure 1, pp. 673; Figure 1 discloses the framework of the JALAD system. This figure depicts a system that is able to partition a deep neural network to run a portion of the network on an edge device and another portion of the network on a cloud-based system. The different computing systems are outlined in the boxes shown in Figure 1.) Regarding claim 17, Campos and Liu fail to explicitly disclose the elements of this claim. However, Li discloses, “wherein the selecting comprises: computing quantization errors for the combined performance of the first neural network and the second neural network for different weight bit-widths and feature map bit-widths for each of the plurality of potential solutions, wherein the selecting the set of one or more feasible solutions is based on selecting weight bit-widths and feature map bit-widths that result in computed quantization errors that fall within the accuracy constraint.” (Compressed accuracy and data size predictor, pp. 673; "Feature quantization would result in accuracy loss, but the DNNs' prediction of an image is inexplicable yet, and the compressed data size is highly related to the input data. From Fig. 5 we can observe that the accuracy loss and data size of a specific compression setting c is stable, therefore we can predict the current accuracy and compressed size based on historical statistics. We build a lookup table A_i(c) to predict the accuracy loss and compressed data size S_i(c) in a specific quantization bit c." This article discloses that feature quantization will result in some form of accuracy loss from the original model. To rectify this, they designed a process to monitor the accuracy of the model using a lookup table to help predict and control the accuracy loss within a threshold.) Regarding claim 19, Campos and Liu fail to explicitly disclose the elements of this claim. However, Li discloses, “wherein the accuracy constraint comprises a defined accuracy drop tolerance threshold for combined performance of the first neural network and the second neural network relative to performance of the trained neural network.” (Impact of Accuracy Threshold, pp. 676; "Next, we study the impact of our design on the model accuracy. We choose different accuracy threshold Δα to test JALAD's latency performance and plot the average execution latency and decoupling decision in Figure 7. We observe that as the threshold increases, our design can achieve better latency gain. The reason is that JALAD can either change the decoupling layer or cast the in-layer feature maps into lower bit-depth (which means transmitting fewer bytes) to achieve lower latency." This article discloses a method which splits a DNN to execute a portion of the model on an edge device and the other portion on a cloud network. This model uses an accuracy threshold to ensure that the original model is accurately executed. This model utilizes a threshold at varying values to determine a balance between accuracy and latency.) Conclusion THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL MICHAEL GALVIN-SIEBENALER whose telephone number is (571)272-1257. The examiner can normally be reached Monday - Friday 8AM to 5PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /PAUL M GALVIN-SIEBENALER/Examiner, Art Unit 2147 /VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147

Prosecution Timeline

Sep 02, 2022
Application Filed
Jul 31, 2025
Non-Final Rejection — §101, §102, §103
Nov 26, 2025
Response Filed
Jan 28, 2026
Final Rejection — §101, §102, §103 (current)


Prosecution Projections

3-4
Expected OA Rounds
25%
Grant Probability
0%
With Interview (-25.0%)
3y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 4 resolved cases by this examiner. Grant probability derived from career allow rate.
