DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
The present application is being examined based on the claims filed 10/29/2025.
Claims 1-15, 18, and 21-24 are pending.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/04/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Response to Amendment
This Office Action is in response to Applicant’s communication filed 10/29/2025, responding to the Office Action mailed 07/31/2025. Applicant’s remarks and any amendments to the claims or specification have been considered, with the results that follow.
Response to Arguments
Regarding double patenting rejections
In Remarks page 11, Argument 1
(Examiner summarizes Applicant’s argument) Applicant argues that the claims have been amended thus obviating the double patenting rejections over Wang.
Examiner’s response to Argument 1
Examiner agrees that Applicant’s amendments appear sufficient to overcome double patenting rejections.
Regarding claim objections
In Remarks page 12, Argument 2
(Examiner summarizes Applicant’s argument) Applicant argues that the claims have been amended thus obviating the objections.
Examiner’s response to Argument 2
Examiner agrees that the objections to the claims have been obviated by amendments.
Regarding 35 U.S.C. 112 rejections
In Remarks page 12, Argument 3
(Examiner summarizes Applicant’s argument) Applicant argues that the claims have been amended thus obviating the rejections under 35 U.S.C. 112.
Examiner’s response to Argument 3
Some of the rejections have been obviated by Applicant’s amendments; however, others remain, as detailed below.
Regarding 35 U.S.C. 101 rejections
In Remarks pages 13-14, Argument 4
(Examiner summarizes Applicant’s argument) Applicant argues that the limitations of the amended claims could not possibly fall under the mental process grouping, specifically citing the limitation of “splitting the tensor data according to the target splitting path and distributing the tensor data for processing by corresponding cores of the multi-core processor”.
Examiner’s response to Argument 4
Examiner disagrees. First, the portion reciting “distributing the tensor data for processing by corresponding cores of the multi-core processor” was never construed as a mental process (see previous Office Action); it was instead treated as an additional element. Second, splitting tensor data could be performed in the human mind, and Applicant offers only a mere allegation that it could not. For example, consider the 2x2 tensor

    1  2
    3  4

If the target splitting path is to split the tensor down the center vertically, the tensor could be split along the target splitting path into

    1        2
    3   and  4
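For the record, the same example can be restated in a few lines of code. This is an illustrative sketch only; the use of numpy and the variable names are the Examiner’s, not Applicant’s:

    import numpy as np

    # The 2x2 tensor from the example above.
    tensor = np.array([[1, 2],
                       [3, 4]])

    # Target splitting path: split down the center vertically,
    # i.e., along the column axis (axis=1).
    left, right = np.split(tensor, 2, axis=1)

    print(left)   # [[1]
                  #  [3]]
    print(right)  # [[2]
                  #  [4]]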
This example shows that the limitation could be practically performed in the human mind, or by a human using pen and paper as a tool. Though the claim also recites further elements that could not be performed in the human mind, those elements are treated under steps 2A prong 2 and 2B, not under step 2A prong 1. Therefore, the rejections of the independent claims are maintained, as are those of their dependent claims for similar reasons.
In Remarks pages 14-15, Argument 5
(Examiner summarizes Applicant’s arguments) Applicant argues that even if the claims are directed to an abstract idea, they integrate the abstract idea into a practical application by providing a technical improvement. Applicant points to Example 42 and specifically cites the following limitations, stating that they reflect improvements to resource allocation and usage of the multi-core processors supporting a neural network model:
“splitting a target operator of a neural network model to be processed by a multi-core processor”
“splitting the tensor data according to the target splitting path and distributing the tensor data to corresponding cores of the multi-core processor”
Examiner’s response to Argument 5
Examiner disagrees. MPEP 2106.05(a) recites the following:
It is important to note, the judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements. See the discussion of Diamond v. Diehr, 450 U.S. 175, 187 and 191-92, 209 USPQ 1, 10 (1981) in subsection II, below. In addition, the improvement can be provided by the additional element(s) in combination with the recited judicial exception.
As explained above and in the rejections under 35 U.S.C. 101, splitting an operator and splitting a tensor are directed to abstract ideas. Distributing data to cores of a multi-core processor is merely using a processor for the purpose it was intended to serve (using a computer as a tool, MPEP 2106.05(f)(2)), while all remaining limitations are directed to abstract ideas. Furthermore, the claim does nothing to limit the judicial exception to any particular application apart from implementing it using a multi-core processor, which amounts to claiming any solution to the problem (see MPEP 2106.05(f)). Using a computer as a tool and claiming any solution to an identified problem is not sufficient to integrate a judicial exception into a practical application. The claim includes no limitations, apart from the abstract ideas and generic computing, that reflect an improvement. Therefore, the rejections of the independent claims are maintained, as are those of their dependent claims for similar reasons.
In Remarks pages 15-17, Argument 6
(Examiner summarizes Applicant’s arguments) Applicant argues that the claims include limitations that are not conventional in the field. Applicant states the limitations of the claims and states that they are associated with specific conditions and technical environments to handle certain neural network modelled operations and the operations for doing so are organized in a specific order, transforming the claimed invention into an inventive concept.
Examiner’s response to Argument 6
Examiner disagrees. Applicant’s remarks amount to a mere allegation that the claim recites unconventional neural network modeling and organizing limitations, without explaining how the additional elements are unconventional beyond the abstract idea alone. In fact, MPEP 2106.05(d) recites:
Another consideration when determining whether a claim recites significantly more than a judicial exception is whether the additional element(s) are well-understood, routine, conventional activities previously known to the industry. This consideration is only evaluated in Step 2B of the eligibility analysis.
If the additional element (or combination of elements) is a specific limitation other than what is well-understood, routine and conventional in the field, for instance because it is an unconventional step that confines the claim to a particular useful application of the judicial exception, then this consideration favors eligibility. If, however, the additional element (or combination of elements) is no more than well-understood, routine, conventional activities previously known to the industry, which is recited at a high level of generality, then this consideration does not favor eligibility.
The majority of the claim is identified as a mental process, while the sole remaining additional element merely applies generic computer functionality (addressed under MPEP 2106.05(f)(2)). Applicant provides no evidence that either the computer processor or the particular function it performs is anything but conventional.
Regarding 35 U.S.C. 102 and 103 rejections
In Remarks pages 18-24, Argument 7
(Examiner summarizes Applicant’s arguments) Applicant argues that the dataflow graph of Wang’s SOYBEAN does not teach the “calculation graph corresponding to the neural network model” and “determine a target splitting path” limitations recited in claim 1, among other limitations. Applicant further argues that Wang does not teach a sequence of split state sets such as that shown in Figure 7 of the instant application, and that none of the cited references cure the deficiencies of Wang.
In Remarks pages 21-22, Argument 8
(Examiner summarizes Applicant’s arguments) Applicant argues that combining Wang and Mayer to teach claims 2-3, 10-11, and 21-22 amounts to impermissible hindsight since Wang already achieves data speedups due to parallelism.
Examiner’s response to Argument 8
While examiner disagrees with Applicant’s reasoning, the rejections have been withdrawn for other reasons rendering the arguments moot.
In Remarks pages 22-23, Argument 9
(Examiner summarizes Applicant’s arguments) Applicant argues that the teachings of Jia cannot be applied to Wang or Mayer to achieve the claimed invention. Applicant argues that Jia’s solution offers data and model parallelism as contrasting alternatives, while Wang uses a hybrid approach; accordingly, combining Wang with Jia would require a complete redesign.
Examiner’s response to Argument 9
Regarding claims 6-7 and analogous: While examiner disagrees with Applicant’s reasoning, the rejections have been withdrawn for other reasons rendering the arguments moot.
Regarding claims 4-5 and analogous: Examiner disagrees. Wang and Jia are not incompatible; rather, they are highly similar references in the same field of endeavor. Wang offers a data parallelization system that reconfigures how data and operations are split. However, Wang and Jia both fundamentally operate by finding an optimal data split to parallelize data. Moreover, Jia offers the following benefits (page 1 abstract): “To accelerate this search, FlexFlow introduces a novel execution simulator that can accurately predict a parallelization strategy’s performance and is three orders of magnitude faster than prior approaches that have to execute each strategy. […] FlexFlow can increase training throughput by up to 3.8× over state-of-the-art approaches, even when including its search time, and also improves scalability”. Therefore, Jia offers substantial improvements, and a person having ordinary skill in the art, recognizing that Jia and Wang solve a similar problem in the same field of endeavor, would have found it obvious to combine Jia with Wang in the manner set forth in the Office Action.
In Remarks page 23, Argument 10
(Examiner summarizes Applicant’s arguments) Applicant argues that the teachings of Jia are not suitable for “applications whose execution time is data dependent” and that in Wang and Mayer’s approach, data content impacts execution time and therefore it would not be suitable to combine them.
Examiner’s response to Argument 10
Applicant takes Jia out of context. The full passage reads:
“Therefore, our approach may not be applicable to applications whose execution time is data dependent. However, for the DNN applications that are the subject of study here, which are based on dense matrix operations, execution time is highly predictable and independent of the contents of the matrices.”
Both Jia and Wang are designed to operate on deep neural networks, their input data, and their operations. Thus, Jia and Wang parallelize fundamentally the same type of data and model operations, and a person having ordinary skill in the art would not regard this passage as teaching away from combining Jia with Wang; in fact, the method would be highly applicable to Wang. While the method may not be applicable to applications outside the realm of neural networks, both Wang and Jia are directed to deep neural networks. Examiner does not rely on Mayer for the rejections of claims 4-5.
In Remarks page 23, Argument 11
(Examiner summarizes Applicant’s arguments) Applicant argues that, since the 1.2x-3.8x improvement of Jia does not match the 4x improvements to processing speed of Wang and Mayer, it would amount to impermissible hindsight to combine Jia with Wang and Mayer.
Examiner’s response to Argument 11
Examiner disagrees. Jia and Wang both provide improvements over ordinary data and model parallelism techniques. Thus, when combined, Jia may be expected to provide further improvements to Wang (e.g., above and beyond 4x). Examiner notes that when combining references, one need not choose between the techniques of one reference or the other, but rather may combine the most effective techniques from both references to achieve a stronger expected result than either in isolation. Examiner does not rely on Mayer for the rejections of claims 4-5.
In Remarks pages 23-24, Argument 12
(Examiner summarizes Applicant’s arguments) Applicant argues that Wang’s approach already achieves 4x speedups, that the citation of Liu does not describe how Liu’s four layers would achieve performance speedups nor how they would be incorporated into Wang or Mayer, and that it is unclear whether the speedups of Wang and Mayer would already achieve the benefit of Liu. Applicant argues this amounts to impermissible hindsight reasoning.
Examiner’s response to Argument 12
Applicant’s arguments have been fully considered but are not persuasive. In response to applicant’s argument that the examiner’s conclusion of obviousness is based upon improper hindsight reasoning, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning. But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant’s disclosure, such a reconstruction is proper. See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).
Claim Objections
Regarding Claims 1, 9, and 18
Claims 1, 9, and 18 are objected to because of the following informalities: “wherein the split state set of the input tensor data is the first split state set in the sequence and the split state set of the output tensor data is the last split state set in the sequence” should read “wherein a [[the]] split state set of the input tensor data is a [[the]] first split state set in the sequence and a [[the]] split state set of the output tensor data is a [[the]] last split state set in the sequence”. Appropriate correction is required.
Regarding Claims 2, 10, and 21
Claims 2, 10, and 21 are objected to because of the following informalities:
“traversing the split state sets and determining splitting paths of the tensor data further comprises, for a current split state” should read “traversing the split state sets and determining splitting paths of the tensor data further comprises, for a current split state of the sequence of split state sets”
“determining weights of splitting paths weights of the directed edges comprised in the splitting paths” should read “determining weights of splitting paths according to weights of the directed edges comprised in the splitting paths”.
Appropriate correction is required.
Regarding Claims 3, 11, and 22
Claims 3, 11, and 22 are objected to because of the following informalities: “traversing all split state sets and determining splitting paths of the tensor data further comprise, for a current split state of the sequence of split state sets” should read “traversing all split state sets and determining splitting paths of the tensor data further comprises, for a current split state of the sequence of split state sets”. Appropriate correction is required.
Allowable Subject Matter
Claims 2-3, 6-7, 10-11, 14-15, and 21-22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Examiner notes that issues under 35 U.S.C. 112(b) remain and must be resolved before allowance.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding 35 U.S.C. 101
The “traversing” limitations in the dependent claims as amended are not directed to mental processes because they intrinsically require accessing, searching, and manipulating computer memory. Moreover, these limitations are not merely generic data gathering. They amount to a novel form of data manipulation which is not conventional in the field. Thus, claims 2-3, 10-11, and 21-22 are subject-matter eligible. Dependent claims 6-7 and 14-15 are eligible for similar reasons.
Regarding 35 U.S.C. 103
While Mayer does teach many of the general concepts of claim 2, it does not teach the specifics of the claim as amended. For example, claim 2 as amended recites
traversing the plurality of split states in the current split state to obtain the directed edges directing from each split state in the previous split state set in the sequence to each split state in the current split state set
While Mayer generally teaches “partitioning”, Mayer does not teach traversing between split states of split state sets as defined by the specification (i.e. directed edges between elements of different partitions). Mayer instead teaches (page 3 column 2 section 3) “we describe strategies to partition the graph such that the local scheduling algorithms running on the devices can exploit the locality and idle time is minimal”. That is, Mayer teaches partitioning a computation graph by traversing nodes and edges of the graph. This is not the same as what is being claimed by Applicant.
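For illustration (Examiner’s sketch under assumed data structures; this is not Applicant’s, Mayer’s, or any reference’s implementation), the claimed traversal forms a directed edge from every split state in the previous split state set to every split state in the current split state set, which is distinct from traversing the nodes and edges of the computation graph itself:

    from itertools import product

    # Hypothetical split state sets for two adjacent positions in the sequence.
    # Each string stands for one split state of the tensor data.
    previous_set = ["rows/2", "cols/2", "no-split"]
    current_set = ["rows/4", "cols/4"]

    # Traversing the current split state set yields a directed edge from each
    # split state in the previous set to each split state in the current set.
    directed_edges = list(product(previous_set, current_set))
    # 3 x 2 = 6 directed edges between the two adjacent split state sets.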
Further art revealed in the search similarly does not teach the claim. Consider Chien et al. “Tensor-Factorized Neural Networks”. Figure 2a shows splitting an input into multiple splitting paths. However, Chien does not teach traversing a plurality of splitting path options to split tensor data and finding a target splitting path.
[Chien, Figure 2a, reproduced as a greyscale image in the original action]
Ma et al. “NeuGraph: Parallel Deep Neural Network Computation on Large Graph” teaches partitioning neural networks on graphs, but does not teach using graph representations to find a target splitting path.
Claim 3 is deemed to recite allowable subject matter for reasons similar to those for claim 2, as are dependent claims 6-7 and 14-15.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 2-3, 6-7, 10-11, 14-15, and 21-22 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Regarding Claims 2, 10, and 21
Claims 2, 10, and 21 recite the limitation “determining weights of splitting paths weights of directed edges comprised in the splitting paths”. It is unclear whether “the splitting paths” refers to the splitting paths of claim 1, or one of the splitting paths mentioned in claim 2.
Regarding Claims 3, 11, and 22
Claims 3, 11, and 22 recite the limitation “determining weights of splitting paths according to weights of the directed edges comprised in the splitting paths”. It is unclear whether “the splitting paths” refers to the splitting paths of claim 1, or one of the splitting paths mentioned in claim 3.
Regarding Dependent Claims
Claim 6 depends from claim 2, claim 7 depends from claim 3, claim 14 depends from claim 10, and claim 15 depends from claim 11. These claims are therefore similarly rejected as including the deficiencies of claims 2-3 and 10-11, respectively.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 4-5, 8-9, 12-13, 18, and 23-24 are rejected under 35 U.S.C. 101 as being directed to an abstract idea without significantly more.
Regarding Claim 1:
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
A method for splitting a target operator of a neural network model to be processed by a multi-core processor, the method comprising: determining a sequence of split state sets of tensor data associated with the target operator according to a calculation graph corresponding to the neural network model wherein the tensor data includes input tensor data and output tensor data, wherein the split state set of the input tensor data is the first split state set in the sequence and the split state set of the output tensor data is the last split state in the sequence — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to performing a judgment on an array of data about the best split states for the array.
traversing the split state sets and determining splitting paths of the tensor data of the target operator and weights of the splitting paths, wherein each splitting path comprises directed edges each between two split states respectively from two adjacent split state sets in the sequence — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to performing a judgment about the most optimal paths out of a set of possible paths.
determining a target splitting path, among the determined splitting paths, for splitting the tensor data according to the weights of the splitting path — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to performing an evaluation of an array of data based on a set of weights.
and splitting the tensor data according to the target splitting path — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to performing an evaluation of an array of data to separate it into data splits.
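Considered together, the limitations above amount to selecting a minimum-weight path through a layered graph, an evaluation that is tractable at pen-and-paper scale. The following sketch is the Examiner’s own illustration with assumed split states and weights, not Applicant’s disclosed implementation:

    # Layered minimum-weight path over a sequence of split state sets.
    # layers[i] is a split state set; weights[(u, v)] is a directed-edge weight.
    layers = [["s0"], ["a", "b"], ["t"]]
    weights = {("s0", "a"): 1, ("s0", "b"): 3, ("a", "t"): 2, ("b", "t"): 1}

    best = {"s0": (0, ["s0"])}  # split state -> (accumulated weight, path)
    for prev_layer, layer in zip(layers, layers[1:]):
        best = {
            v: min(
                ((best[u][0] + weights[(u, v)], best[u][1] + [v]) for u in prev_layer),
                key=lambda t: t[0],
            )
            for v in layer
        }

    print(best["t"])  # (3, ['s0', 'a', 't']) -- the target splitting path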
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the claim does not recite additional elements that integrate the judicial exception into a practical application. The additional elements:
and distributing the tensor data to corresponding cores of the multi-core processor for processing — This limitation is directed to merely applying an abstract idea using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.04(d)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself?
No, the claim does not recite additional elements which amount to significantly more than the abstract idea itself. The additional elements as identified in step 2A prong 2:
and distributing the tensor data to corresponding cores of the multi-core processor for processing — This limitation is directed to merely applying an abstract idea using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.04(d)).
Regarding Claim 4
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 2:
wherein split states in the split state sets of the input tensor data are determined according to a computational logic of the target operator and split states in the split state sets of corresponding output tensor data — This limitation is directed to merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) as it merely limits the field of the split states in the split state sets.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
wherein split states in the split state sets of the input tensor data are determined according to a computational logic of the target operator and split states in the split state sets of corresponding output tensor data — Merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) cannot amount to significantly more than the judicial exception.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 5
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 2:
wherein split states in the split state sets of the output tensor data are determined according to a computational logic of the target operator and split states in the split state sets of corresponding input tensor data — This limitation is directed to merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) as it merely limits the field of the split states in the split state sets.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
wherein split states in the split state sets of the output tensor data are determined according to a computational logic of the target operator and split states in the split state sets of corresponding input tensor data — Merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) cannot amount to significantly more than the judicial exception.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 8
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1, which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 2:
wherein the weights of the directed edges are determined according to a computational operational type of the target operator corresponding to the splitting paths, a data scale of corresponding sub-data obtained by the tensor data of the target operator through the splitting paths, and a throughput rate and a memory access bandwidth of each processor core — This limitation is directed to merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) as it merely limits the field of the weights of the directed edges.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
wherein the weights of the directed edges are determined according to a computational operational type of the target operator corresponding to the splitting paths, a data scale of corresponding sub-data obtained by the tensor data of the target operator through the splitting paths, and a throughput rate and a memory access bandwidth of each processor core — Merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) cannot amount to significantly more than the judicial exception.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 9
Independent claim 9 is an apparatus claim corresponding to method claim 1, which was directed to an abstract idea; therefore, the same rejection and rationale apply. The only difference is that claim 9 recites the following additional elements, treated under step 2A prong 2 and step 2B:
Step 2A Prong 2:
An apparatus for splitting a target operator of a neural network model to be processed by a multi-core processor, the apparatus comprising a general-purpose processor configured to — This limitation is directed to merely applying an abstract idea using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.04(d)).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
An apparatus for splitting a target operator of a neural network model to be processed by a multi-core processor, the apparatus comprising a general-purpose processor configured to — Using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.05(d)) cannot amount to significantly more than the judicial exception itself.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 12
Dependent claim 12 is an apparatus claim corresponding to method claim 4, which was directed to an abstract idea; therefore, the same rejection and rationale apply.
Regarding Claim 13
Dependent claim 13 is an apparatus claim corresponding to method claim 5, which was directed to an abstract idea; therefore, the same rejection and rationale apply.
Regarding Claim 18
Independent claim 18 is a computer device claim corresponding to method claim 1, which was directed to an abstract idea; therefore, the same rejection and rationale apply. The only difference is that claim 18 recites the following additional elements, treated under step 2A prong 2 and step 2B:
Step 2A Prong 2:
A computer device, comprising processors and a memory that is connected to each of the processors, wherein the processors comprise a general-purpose processor and an artificial intelligence processor, the memory is configured to store a computer program comprising a program instruction, when executed by the general-purpose processor, performing — This limitation is directed to merely applying an abstract idea using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.04(d)).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
A computer device, comprising processors and a memory that is connected to each of the processors, wherein the processors comprise a general-purpose processor and an artificial intelligence processor, the memory is configured to store a computer program comprising a program instruction, when executed by the general-purpose processor, performing — Using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.05(d)) cannot amount to significantly more than the judicial exception itself.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 23
Dependent claim 23 is a computer device claim corresponding to method claim 4, which was directed to an abstract idea; therefore, the same rejection and rationale apply.
Regarding Claim 24
Dependent claim 24 is a computer device claim corresponding to method claim 5, which was directed to an abstract idea; therefore, the same rejection and rationale apply.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 9, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over NPL reference Wang et al. “Unifying Data, Model, and Hybrid Parallelism in Deep Learning via Tensor Tiling” herein referred to as Wang in view of NPL reference Bettilyon et al. “Deep Neural Networks as Computational Graphs” herein referred to as Bettilyon.
Regarding Claim 1
Wang teaches:
A method for splitting a target operator of a neural network model to be processed by a multi-core processor, the method comprising:
(page 9 column 1 section 6.1) “We evaluated SOYBEAN on Amazon’s EC2 cluster. We used a p2.8xlarge instance, with 480GB of memory and 32 virtual CPUs as well as 8 NVIDIA GK210 GPUs on the instance.”; (page 7 column 2 section 4.3) “Specifically, we can divide 2^k devices into 2 groups, each with 2^k−1 devices. We first use the one-cut algorithm to find the best tiling to partition the computation among the two groups.”
determining a sequence of split state sets of tensor data associated with the target operator according to a calculation graph corresponding to the neural network model
(page 2 column 1 paragraph 2) “Fortunately, many DNN models have the common structure of multiple stacked neuron layers. As a result, we can reorganize the dataflow graph of a DNN training[*Examiner notes: calculation graph corresponding to neural network model] into a chain of levels such that each level only interacts with the adjacent levels[*Examiner notes: sequence of split state sets]. With this formulation, we solve the tiling problem using a novel algorithm that recursively applies dynamic programming to find the optimal tiling solution given any DNN configuration[*Examiner notes: each tensor tiling = split state set] and batch size.”; (page 4 column 1 paragraph 2) “We reuse the front-end of existing deep learning systems that express tensor computation by a dataflow graph, which we refer to as the semantic dataflow graph[*Examiner notes: mapped to calculation graph]. An example semantic dataflow graph is shown in Figure 8(b). It is mostly serial.”; (page 4 column 2 bullet point 1) “SOYBEAN solves this by considering operators[*Examiner notes: mapped to target operator] that share inputs or outputs together when searching for the optimal tiling.”; (page 5 column 1 paragraph 2) “Let T1 = {R, C, r} be the set that contains all basic tilings of a matrix, where R, C and r represent row tiling, column tiling and replication, respectively. We then define a k-cut tiling set that contains all possible tilings after k compositions as follows: Definition 1. Tk[*Examiner notes: mapped to split state sets of tensor data]=[…]”
wherein the tensor data includes input tensor data and output tensor data,
(page 4 column 2 bullet point 2) “Given the tilings of the inputs and outputs[*Examiner notes: tensor data includes input tensor data and output tensor data], SOYBEAN needs to determine the corresponding communication costs”
traversing the split state sets and determining splitting paths of the tensor data of the target operator and weights of the splitting paths, wherein each splitting path comprises directed edges each between two split states respectively from two adjacent split state sets in the sequence
(page 7 column 1 paragraph 3) “To achieve this, our algorithm first treats the dataflow graph as an undirected graph G0, and then uses a breadth-first search on this graph to organize graph nodes into a list of levels L = ⟨l0, l1, . . . , ln⟩. BFS puts nodes that share inputs or outputs in adjacent levels. We then use dynamic programming (DP) to search for optimal tilings[*Examiner notes: traversing the split state sets]” (page 4 column 1 second to last paragraph) “Based on the semantic dataflow graph, SOYBEAN determines the best tensor tiling scheme[*Examiner notes: splitting path between adjacent split state sets] that incurs the least communication cost[*Examiner notes: weights of the splitting paths]. SOYBEAN then transforms the serial semantic dataflow graph into a parallel execution dataflow graph based on the scheme. It automatically maps the partitioned arrays and operators to the set of underlying devices”
determining a target splitting path, among the determined splitting paths, for splitting the tensor data according to the weights of the splitting path
(page 6 column 2 last paragraph) “Given a dataflow graph G, the one-cut tiling algorithm finds a tiling across two devices (or groups)[*Examiner notes: mapped to determining a target splitting path], Tmin : M → T1, such that the overall communication cost is minimized: [Equation 3] where OG represents all the matrix multiplications in the dataflow graph G, and oX, oY and oZ represent the input matrices and output matrix of a matrix multiplication o.”; Equation 3
[Equation 3 of Wang reproduced as a greyscale image in the original action]
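Schematically, Equation 3 selects one basic tiling from T1 = {R, C, r} for each matrix so as to minimize the total communication cost over all matrix multiplications. The following restatement is the Examiner’s paraphrase with a toy stand-in cost model; Wang solves the minimization with dynamic programming rather than the brute force shown here:

    from itertools import product

    T1 = ["R", "C", "r"]        # basic tilings: row, column, replication
    matrices = ["X", "W", "Y"]  # one matrix multiplication: Y = W @ X

    def comm_cost(tiling):
        # Toy stand-in for Wang's per-operator communication costs; the real
        # costs are derived in Wang section 4 and depend on matrix shapes.
        ideal = {"X": "R", "W": "r", "Y": "R"}
        return sum(0 if tiling[m] == ideal[m] else 1 for m in matrices)

    # Equation 3's objective: the tiling assignment T_min minimizing total cost.
    T_min = min(
        (dict(zip(matrices, a)) for a in product(T1, repeat=len(matrices))),
        key=comm_cost,
    )
    print(T_min)  # {'X': 'R', 'W': 'r', 'Y': 'R'}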
and splitting the tensor data according to the target splitting path and distributing the tensor data to corresponding cores of the multi-core processor for processing
(page 8 column 1 section 5) “This section covers how SOYBEAN dispatches operators to different devices[*Examiner notes: corresponding cores of the multi-core processor] and how the semantic dataflow graph is converted to the execution graph given the k-cuts tiling schemes computed by our algorithm.”
Wang does not explicitly teach:
wherein the split state set of the input tensor data is the first split state set in the sequence and the split state set of the output tensor data is the last split state in the sequence
However, Bettilyon teaches:
wherein the split state set of the input tensor data is the first split state set in the sequence and the split state set of the output tensor data is the last split state in the sequence
[*Examiner notes: In the example computational graph, the inputs appear first and the outputs appear last. Thus, when read from left to right, the input tensor data is first in the sequence and the output tensor data is last in the sequence]; (picture on page 6)
[Bettilyon’s example computational graph (page 6) reproduced as a greyscale image in the original action]
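Stated differently, reading a computational graph in topological order necessarily places input tensors before all operators and output tensors last. A minimal sketch with assumed node names (Examiner’s illustration, Python 3.9+):

    from graphlib import TopologicalSorter

    # Toy calculation graph: each node maps to the set of its predecessors.
    order = TopologicalSorter(
        {"conv": {"input"}, "relu": {"conv"}, "output": {"relu"}}
    ).static_order()
    print(list(order))  # ['input', 'conv', 'relu', 'output']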
Wang, Bettilyon, and the instant application are analogous because they are all directed to neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the neural network splitting of Wang with the computation graph representation taught by Bettilyon because (Bettilyon page 1) “Because these functions are often monstrously complex we use graphs to represent them rather than the standard formula notation. These graphs help us organize our thinking about the functions we set out to build and it turns out some graphs work much better than others for particular tasks. A lot of research and development in the neural network space is about inventing new architectures for these graphs, rather than inventing brand new algorithms.”
Regarding Claim 9
Claim 9 is an apparatus claim corresponding to method claim 1. The only difference is that claim 9 recites a general-purpose processor:
Wang teaches:
An apparatus for splitting a neural network model to be processed by a multi-core processor, comprising a general-purpose processor configured to:
(page 9 column 1 section 6.1) “We evaluated SOYBEAN on Amazon’s EC2 cluster. We used a p2.8xlarge instance, with 480GB of memory and 32 virtual CPUs as well as 8 NVIDIA GK210 GPUs on the instance. Each of the GPUs has 12GB of memory; they are connected by PCI-e, with a maximum peer-to-peer bi-directional bandwidth of 20GB/s.”
The remaining limitations of the claim are taught by the rejection of claim 1.
Regarding Claim 18
Claim 18 is a computer device claim corresponding to method claim 1. The only difference is that claim 18 recites a computer with memory, connections, and processors:
Wang teaches:
A computer device, comprising processors and a memory that is connected to each of the processors, wherein the processors comprise a general-purpose processor and an artificial intelligence processor, the memory is configured to store a computer program comprising a program instruction, when executed by the general-purpose processor, performing
(page 9 column 1 section 6.1) “We evaluated SOYBEAN on Amazon’s EC2 cluster. We used a p2.8xlarge instance, with 480GB of memory and 32 virtual CPUs[*Examiner notes: general-purpose processor] as well as 8 NVIDIA GK210 GPUs[*Examiner notes: artificial intelligence processor] on the instance. Each of the GPUs has 12GB of memory; they are connected by PCI-e, with a maximum peer-to-peer bi-directional bandwidth of 20GB/s.”
The remaining limitations of the claim are taught by the rejection of claim 1.
Claims 4-5, 12-13, and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Bettilyon and further in view of NPL reference Jia et al. “Beyond Data and Model Parallelism for Deep Neural Networks” herein referred to as Jia.
Regarding Claim 4
Wang in view of Bettilyon teaches:
The method of claim 1
(see rejection of claim 1)
Wang in view of Bettilyon does not explicitly teach:
wherein split states in the split state sets of the input tensor data are determined according to a computational logic of the target operator and split states in the split state sets of corresponding output tensor data.
However, Jia teaches:
wherein split states in the split state sets of the input tensor data are determined according to a computational logic of the target operator and split states in the split state sets of corresponding output tensor data.
(page 3 column 2 section 3.1 paragraph 1) “Similar to existing deep learning systems [7, 6, 2], FlexFlow uses an operator graph G to describe all operators and state in a DNN[*Examiner notes: computational logic]”; (page 7 column 2 paragraph 2) “This section describes the execution optimizer that takes an operator graph and a device topology as inputs and automatically finds an efficient parallelization strategy[*Examiner notes: split states determined according to computational logic].”; (page 5 column 1 paragraph 2) “Figure 4 shows an example parallelization configuration for a matrix multiplication operator (i.e., Y = WX). The operator is partitioned into four independent tasks assigned to different GPU devices. The input and output tensors of the tasks[*Examiner notes: output tensor] are shown in the figure.”
Wang, Bettilyon, Jia, and the instant application are analogous because they are all directed to machine learning.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the neural network splitting of Wang in view of Bettilyon with the determination of split states based on computational logic as taught by Jia because (Jia page 1 abstract) “To accelerate this search, FlexFlow introduces a novel execution simulator that can accurately predict a parallelization strategy’s performance and is three orders of magnitude faster than prior approaches that execute each strategy.”
Regarding Claim 5
Wang in view of Bettilyon teaches:
The method of claim 1
(see rejection of claim 1)
Wang in view of Bettilyon does not explicitly teach:
wherein split states in the split state sets of the output tensor data are determined according to a computational logic of the target operator and split states in the split state sets of corresponding input tensor data
However, Jia teaches:
wherein split states in the split state sets of the output tensor data are determined according to a computational logic of the target operator and split states in the split state sets of corresponding input tensor data.
(page 3 column 2 section 3.1 paragraph 1) “Similar to existing deep learning systems [7, 6, 2], FlexFlow uses an operator graph G to describe all operators and state in a DNN[*Examiner notes: computational logic]”; (page 7 column 2 paragraph 2) “This section describes the execution optimizer that takes an operator graph and a device topology as inputs and automatically finds an efficient parallelization strategy[*Examiner notes: split states determined according to computational logic].”; (page 5 column 1 paragraph 2) “Figure 4 shows an example parallelization configuration for a matrix multiplication operator (i.e., Y = WX). The operator is partitioned into four independent tasks assigned to different GPU devices. The input and output tensors of the tasks[*Examiner notes: output tensor] are shown in the figure.”
Wang, Bettilyon, Jia, and the instant application are analogous because they are all directed to machine learning.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the neural network splitting of Wang in view of Bettilyon with the determination of split states based on computational logic as taught by Jia because (Jia page 1 abstract) “To accelerate this search, FlexFlow introduces a novel execution simulator that can accurately predict a parallelization strategy’s performance and is three orders of magnitude faster than prior approaches that execute each strategy.”
Regarding Claim 12
Claim 12 is an apparatus claim corresponding to method claim 4. The only difference is that claim 12 recites a general-purpose processor as taught in the rejection of claim 9 above. The remaining limitations of the claim are taught by the rejection of claim 4.
Regarding Claim 13
Claim 13 is an apparatus claim corresponding to method claim 5. The only difference is that claim 13 recites a general-purpose processor as taught in the rejection of claim 9 above. The remaining limitations of the claim are taught by the rejection of claim 5.
Regarding Claim 23
Claim 23 is a computer device claim corresponding to method claim 4. The only difference is that claim 23 recites a computer device as taught in the rejection of claim 18 above. The remaining limitations of the claim are taught by the rejection of claim 4.
Regarding Claim 24
Claim 24 is a computer device claim corresponding to method claim 5. The only difference is that claim 24 recites a computer device as taught in the rejection of claim 18 above. The remaining limitations of the claim are taught by the rejection of claim 5.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Bettilyon, and further in view of NPL reference Liu et al. “Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks” herein referred to as Liu.
Regarding Claim 8
Wang in view of Bettilyon teaches:
The method of claim 1
(see rejection of claim 1)
Wang in view of Bettilyon does not explicitly teach:
wherein the weights of the directed edges are determined according to a computational operational type of the target operator corresponding to the splitting paths
a data scale of corresponding sub-data obtained by the tensor data of the target operator through the splitting paths
and a throughput rate and a memory access bandwidth of each processor core
However, Liu teaches:
wherein the weights of the directed edges are determined according to a computational operational type of the target operator corresponding to the splitting paths
(page 4 section 3.1) “Table I lists the computation complexity and memory footprint of each layer type in three representative CNNs in 32-bit floating point implementation. All the values are computed according to the equations in Section 2.”
a data scale of corresponding sub-data obtained by the tensor data of the target operator through the splitting paths
(page 20 last paragraph) “Hence the implementation method is closely related to the CNN scale and the on-chip memory capacity of the selected FPGA chip.”
and a throughput rate and a memory access bandwidth of each processor core
(page 1 abstract) “We further put forward a systematic design space exploration methodology to search for the optimal solution that maximizes accelerator throughput under the FPGA constraints such as on-chip memory, computational resources, external memory bandwidth, and clock frequency”
Wang, Bettilyon, Liu, and the instant application are analogous because they are all directed to machine learning.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the neural network splitting as taught by Wang in view of Bettilyon with the factors of Liu because (Liu page 1 abstract) “The average performance of the three accelerators is 424.7, 445.6, and 473.4GOP/s under 100MHz working frequency, which outperforms the CPU and previous work significantly.”
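For clarity, the kind of per-edge weight suggested by the combination can be sketched as a roofline-style estimate taking claim 8’s factors (operator type, data scale of the sub-data, core throughput rate, and memory access bandwidth) as inputs. The formula and constants below are the Examiner’s assumptions for illustration and are not taken from any cited reference:

    # Assumed arithmetic intensity (FLOPs per byte) per operator type.
    FLOPS_PER_BYTE = {"conv": 50.0, "matmul": 20.0, "add": 0.25}

    def edge_weight(op_type, sub_data_bytes, throughput_flops, mem_bandwidth_bytes):
        # Compute term follows the operator type; memory term follows the data
        # scale of the sub-data obtained along the splitting path.
        flops = FLOPS_PER_BYTE[op_type] * sub_data_bytes
        compute_time = flops / throughput_flops
        memory_time = sub_data_bytes / mem_bandwidth_bytes
        return max(compute_time, memory_time)

    # A convolution split with 4 MB of sub-data on a core offering 1 TFLOP/s
    # throughput and 100 GB/s memory access bandwidth:
    print(edge_weight("conv", 4e6, 1e12, 100e9))  # 0.0002 s (compute-bound)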
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ezra J Baker whose telephone number is (703)756-1087. The examiner can normally be reached Monday - Friday 10:00 am - 8:00 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/E.J.B./Examiner, Art Unit 2126
/DAVID YI/Supervisory Patent Examiner, Art Unit 2126