DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is responsive to remarks filed 10/03/2025. Claims 1 and 11 are amended. Claims 2, 4, 12, and 14 are cancelled, and there are no new claims.
Thus, claims 1, 3, 5–11, 13, and 15 are pending for examination.
Response to Arguments
In reference to Foreign Priority
Applicant’s arguments, filed on 10/03/2025, with respect to the foreign priority claim have been fully considered. Applicant may not be able to rely upon the certified copy of the foreign priority application to overcome a rejection, because a translation of said application has not been made of record in accordance with 37 CFR 1.55. When an English language translation of a non-English language foreign application is required, the translation must be that of the certified copy (of the foreign application as filed) submitted together with a statement that the translation of the certified copy is accurate. See MPEP §§ 215 & 216. Furthermore, Examiner points to the “translation if not in the English language” requirement for a claim to foreign priority set forth by 35 U.S.C. 119(b)(3). Examiner notes that a certified copy of the foreign priority document was automatically received at the USPTO on 10/25/2021; however, said foreign priority document is not in English, and a translated copy has not been provided in accordance with 37 CFR 1.55. The Office requires “a certified copy of the original foreign application, specification, and drawings upon which it is based, a translation if not in the English language.” See 35 U.S.C. 119(a)–(d); see also MPEP §§ 215 & 216.
In reference to Specification objections
Applicant’s arguments along with amendments, filed on 10/03/2025, with respect to the Specification objections have been fully considered and are persuasive. Therefore, the objections to the Specification are withdrawn.
In reference to 35 USC § 101
Applicant’s arguments and amendments, filed on 10/03/2025, with respect to the § 101 rejections have been fully considered and are persuasive. Applicant argues, beginning on Pg. 8 in the Remarks, that “given the substantial amount of subject matter in this claim that is not directed to an abstract idea, it must be concluded that the claims as a whole is not actually directed to an abstract idea, and Step 2A, Prong I is satisfied. See August 4th memo.” Examiner agrees.
Examiner notes that while the claims recite several limitations that are abstract ideas (covering mathematical and mental concepts), the claims as a whole are not directed to an abstract idea. Applicant has amended the claims to recite a specific collection of hardware to accomplish the steps of the limitations, including “wherein the neural network computation plan comprises at least one of a computation ratio between the first processor and the second processor, or a computational amount of the first processor and a computational amount of the second processor; distributing, based on the obtained neural network computation plan, the computation of the first neural network layer to each of the first processor and the second processor,” which are not themselves abstract ideas (see MPEP 2106.04(a)(1)). Thus, these limitations must be considered additional elements to the abstract idea. Examiner notes that these additional elements integrate the abstract idea into a practical application because the entire claim amounts to a detailed recitation of how a specific set of hardware processes an artificial neural network by an electronic device (as opposed to a broad recitation of steps performed at a high level of generality), and the specific combination of steps recited in the additional elements amounts to an improvement to the functioning of a computer or to the technical field, as set forth by MPEP 2106.05(a), which states “the claim must include the components or steps of the invention that provide the improvement described in the specification.” Pursuant to this requirement set forth by the MPEP, Examiner points out that the Specification describes the improvement in at least paragraphs [0054]–[0056], [0062], [0074], [0100], and [0102], as pointed out by Applicant. Thus, the additional elements reflect the improvement set forth in the Specification and explain what the resulting improvement is.
In reference to 35 USC § 102 & 103
Applicant’s arguments, filed on 10/03/2025, with respect to the newly amended limitations have been fully considered but are not persuasive. Examiner notes that the 35 USC § 102 rejections are withdrawn.
Regarding the 35 USC § 103 rejections:
Applicant argues, beginning on Pg. 23 in the Remarks, that “Sze does not disclose the feature of obtaining a computation distribution plan [sic] by taking into account processing time.” Additionally, Applicant argues that “the mapping performed by the mapper in Sze relates to hardware configuration optimization, which is different from the present invention that obtains a computation distribution ratio/computational amount for each processor.” Examiner respectfully disagrees. Examiner notes that the claim language requires “obtaining a neural network computation plan” and does not include “obtaining a computation distribution pla[n]” as argued by Applicant. Furthermore, Examiner notes the claim language further requires “based on at least one of a processing time of the first neural network layer of the respective first processor and second processor or available resources of the respective first processor and second processor.” Emphasis added. Examiner notes the limitation does not require that the computation plan be obtained based on “processing time” as argued by Applicant. With regard to Sze’s mapping, Examiner contends that Sze is obtaining a neural network computation plan (e.g., built by the compiler) which is based on available resources related to hardware configuration (e.g., first and second processors). See § 103 below for a detailed analysis.
Applicant’s arguments filed with respect to the newly amended limitations have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 5–11, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Sze et al. ("Efficient Processing of Deep Neural Networks: A Tutorial and Survey," https://arxiv.org/abs/1703.09039, 2017), hereinafter “Sze”, in view of Sankaradas et al. ("A Massively Parallel Coprocessor for Convolutional Neural Networks," 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, Boston, MA, USA, 2009, pp. 53–60, doi: 10.1109/ASAP.2009.25), hereinafter “Sankaradas”.
Regarding claim 1, Sze teaches:
A method for processing an artificial neural network by an electronic device including a plurality of processors, the method comprising (Sze Pg. 5, Right Column: “In many applications, it is desirable to have the DNN inference processing near the sensor. For instance, in computer vision applications, such as measuring wait times in stores or predicting traffic patterns, it would be desirable to extract meaningful information from the video right at the image sensor rather than in the cloud to reduce the communication cost. For other applications such as autonomous vehicles, drone navigation and robotics, local processing is desired since the latency and security risks of relying on the cloud are too high”; see also Sze Pg. 10, Left Column: “For ease of DNN development and to enable sharing of trained networks, several deep learning frameworks have been developed from various sources. These open source libraries contain software libraries for DNNs. Caffe was made available in 2014 from UC Berkeley [46]. It supports C, C++, Python and MATLAB. Tensorflow was released by Google in 2015, and supports C++ and python; it also supports multiple CPUs and GPUs and has more flex”—[wherein the DNN processing (i.e., processing an artificial neural network) is near the sensor (i.e., device)]):
obtaining, by using a first processor among the plurality of processors and a second processor among the plurality of processors, a neural network computation plan for performing computation of a first neural network layer of the artificial neural network (Sze Pg. 10, Left Column: “For ease of DNN development and to enable sharing of trained networks, several deep learning frameworks have been developed from various sources. These open source libraries contain software libraries for DNNs. Caffe was made available in 2014 from UC Berkeley [46]. It supports C, C++, Python and MATLAB. Tensorflow was released by Google in 2015, and supports C++ and python; it also supports multiple CPUs and GPUs and has more flex”; see also Sze Pg. 14, Fig. 24: “The operation of DNN accelerators is analogous to that of general-purpose processors as illustrated in Fig. 24 [81]. In conventional computer systems, the compiler translates the program into machine-readable binary codes for execution given the hardware architecture (e.g., x86 or ARM); in the processing of DNNs, the mapper translates the DNN shape and size into a hardware-compatible computation mapping for execution given the dataflow. While the compiler usually optimizes for performance, the mapper optimizes for energy efficiency”—[wherein the system supports multiple CPUs and GPUs (i.e., using a first and second processor) to obtain the translation data for mapping the execution (i.e., obtain the neural network computation plan for performing the computation)]);
performing a first portion of the computation of the first neural network layer by using the first processor (Sze Pg. 12, Left Column, Fig. 17 “ALU”: “The fundamental component of both the CONV and FC layers are the multiply-and-accumulate (MAC) operations, which can be easily parallelized. In order to achieve high performance, highly-parallel compute paradigms are very commonly used, including both temporal and spatial architectures as shown in Fig. 17 … In contrast, spatial architectures use dataflow processing, i.e., the ALUs form a processing chain so that they can pass data from one to another directly. Sometimes each ALU can have its own control logic and local memory, called a scratchpad or register file. We refer to the ALU with its own local memory as a processing engine (PE). Spatial architectures are commonly used for DNNs in ASIC and FPGA-based designs”; see also Sze Pg. 15: “One example that implements the output stationary dataflow is ShiDianNao [89], where each PE handles the processing for each output activation value by fetching the corresponding input activations from neighboring PEs. The PE array implements dedicated networks to pass data horizontally and vertically. Each PE also has data delay registers to keep data around for the required amount of cycles. At the system level, the global buffer streams the input activations and broadcasts the weights into the PE array. The partial sums are accumulated inside each PE and then get streamed out back to the global buffer. Other examples of output stationary are found in [88, 90]”—[wherein the BRI of first and second processor is any functional unit capable of performing neural network computations (see present disclosure paragraph 0030), and wherein each layer of the network is processed by the PE array (i.e., a first portion of the computation using the first processor)]), and
performing a second portion of the computation of the first neural network layer by using the second processor, based on the distributed computation of the first neural network layer (Sze Figs. 27–30, Pgs. 15–16: “The row stationary dataflow assigns the processing of a 1-D row convolution into each PE for processing as shown in Fig. 27. It keeps the row of filter weights stationary inside the RF of the PE and then streams the input activations into the PE. The PE does the MACs for each sliding window at a time, which uses just one memory space for the accumulation of partial sums. Since there are overlaps of input activations between different sliding windows, the input activations can then be kept in the RF and get reused. By going through all the sliding windows in the row, it completes the 1-D convolution and maximize the data reuse and local accumulation of data in this row. With each PE processing a 1-D convolution, multiple PEs can be aggregated to complete the 2-D convolution as shown in Fig. 28”—[wherein each layer of the network is processed by the PE array (i.e., a second portion of the computation using the second processor) according to the data flow assignment (i.e., the neural network computation plan)]);
obtaining a first output value based on a performance result of the first processor and a second output value based on a performance result of the second processor (Sze Figs. 27–30, Pgs. 15–16: “For example, to generate the first row of output activations with a filter having three rows, three 1-D convolutions are required. Therefore, we can use three PEs in a column, each running one of the three 1-D convolutions. The partial sums are further accumulated vertically across the three PEs to generate the first output row. To generate the second row of output, we use another column of PEs, where three rows of input activations are shifted down by one row, and use the same rows of filters to perform the three 1-D convolutions. Additional columns of PEs are added until all rows of the output are completed (i.e., the number of PE columns equals the number of output rows)”—[wherein the first output value is obtained from the first column of PEs and the second output value is generated by another column of PEs, each output being based on its respective column’s results]); and
using the first output value and the second output value as an input value of a second neural network layer of the artificial neural network (Sze Pg. 2300, III. Overview of DNNs: “The networks that process the input come in two major forms: feed forward and recurrent as shown in Fig. 8(a). In feed-forward networks all of the computation is performed as a sequence of operations on the outputs of a previous layer”); and
wherein the obtaining the neural network computation plan comprises obtaining the neural network computation plan based on at least one of a processing time of the first neural network layer of the respective first processor and second processor or available resources of the respective first processor and second processor (Sze Pg. 4, Left Column: “This article will focus on the efficient processing of DNN inference rather than training, since DNN inference is often performed on embedded devices (rather than the cloud) where resources are limited as discussed in more details later”; see also Sze Pg. 14, Fig. 24: “The operation of DNN accelerators is analogous to that of general-purpose processors as illustrated in Fig. 24 [81]. In conventional computer systems, the compiler translates the program into machine-readable binary codes for execution given the hardware architecture (e.g., x86 or ARM); in the processing of DNNs, the mapper translates the DNN shape and size into a hardware-compatible computation mapping for execution given the dataflow. While the compiler usually optimizes for performance, the mapper optimizes for energy efficiency”; see also Sze Fig. 30, Pg. 16: “The number of filters, channels, and fmaps that can be processed at the same time is programmable, and there exists an optimal mapping for the best energy efficiency, which depends on the shape configuration of the DNN as well as the hardware resources provided, e.g., the number of PEs and the size of the memory in the hierarchy. Since all of the variables are known before runtime, it is possible to build a compiler (i.e., mapper) to perform this optimization off-line to configure the hardware for different mappings of the RS dataflow for different DNNs as shown in Fig. 30”—[(emphasis added)]).
Sze does not appear to explicitly teach:
wherein the neural network computation plan comprises at least one of a computation ratio between the first processor and the second processor, or a computational amount of the first processor and a computational amount of the second processor; and
distributing, based on the obtained neural network computation plan, the computation of the first neural network layer to each of the first processor and the second processor.
However, Sankaradas teaches:
wherein the neural network computation plan comprises at least one of a computation ratio between the first processor and the second processor, or a computational amount of the first processor and a computational amount of the second processor (Sankaradas Pg. 55, A. Coprocessor-based CNN Evaluation: “An interesting metric to note is the computation to memory bandwidth ratio” —[wherein the computation to memory bandwidth ratio is noted for all processors. Examiner notes Sankaradas’ coprocessor consists of many vector processing elements (VPEs) (i.e., first and second processors; see Fig. 4)]);
distributing, based on the obtained neural network computation plan, the computation of the first neural network layer to each of the first processor and the second processor (Sankaradas Fig. 4, Pgs. 55–57: “Third, we use banked off-chip data memory and introduce specific instructions to orchestrate data movement between the VPE clusters and the off-chip memory … The instruction set includes instructions to specify each CNN layer”—[wherein each layer includes instructions (i.e., based on the computation plan) for processing the CNN in orchestration with the processors (i.e., first and the second processors; e.g., VPE clusters)]); and
The methods of Sze, the teachings of Sankaradas, and the instant application are analogous art because they pertain to using processors to map data for optimization methods including machine learning.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the methods of Sze with the teachings of Sankaradas to provide a method that distributes the computation to the processors based on a ratio of available resources. One would have been motivated to do so to improve the efficiency of the computation (Sankaradas Pg. 53, Abstract: “The coprocessor functional units, consisting of parallel 2D convolution primitives and programmable units performing sub-sampling and non-linear functions specific to CNNs, implement a “meta-operator” to which a CNN may be compiled to. The coprocessor is serviced by distributed off-chip memory banks with large data bandwidth. As a key feature, we use low precision data and further increase the effective memory bandwidth by packing multiple words in every memory operation, and leverage the algorithm’s simple data access patterns to use off-chip memory as a scratchpad for intermediate data, critical for CNNs. A CNN is mapped to the coprocessor hardware primitives with instructions to transfer data between the memory and coprocessor”).
Regarding claim 5, Sze in view of Sankaradas teaches all the limitations of claim 1.
Sze teaches:
wherein the obtaining the neural network computation plan comprises obtaining the neural network computation plan based on at least one of a size of an input value, a size of a filter, a number of filters or a size of an output value of the artificial neural network as a structure of the artificial neural network (Sze Pg. 8, Left Column: “Many DNN models have been developed over the past two decades. Each of these models has a different ‘network architecture’ in terms of number of layers, layer types, layer shapes (i.e., filter size, number of channels and filters), and connections between layers. Understanding these variations and trends is important for incorporating the right flexibility in any efficient DNN engine”; see also Sze Fig. 30–32, Pg. 17: “Two mapping strategies can be used to solve the first problem as shown in Fig. 32. First, replication can be used to map shapes that do not use up the entire PE array. For example, in the third to fifth layers of AlexNet, each 2-D convolution only uses a 13×3 PE array. This structure is then replicated four times, and runs different channels and filters in each replication. The second strategy is called folding. For example, in the second layer of AlexNet, it requires a 27×5 PE array to complete the 2-D convolution. In order to fit it into the 14×12 physical PE array, it is folded into two parts, 14×5 and 13×5, and each are vertically mapped into the physical PE array. Since not all PEs are used by the mapping, the unused PEs can be clock gated to save energy consumption”—[wherein the mapping strategy is based on the neural network layer having a specific number of channels and filters]).
Regarding claim 6, Sze in view of Sankaradas teaches all the limitations of claim 1.
Sze teaches:
wherein the performing the first portion of the computation of the first neural network layer by using the first processor comprises targeting a first input channel, and the performing the second portion of the computation of the first neural network layer by using the second processor comprises targeting a second input channel different from the first input channel (Sze Pg. 8: “Each of the CONV layers in CNN is primarily composed of high-dimensional convolutions as shown in Fig. 9(b). In this computation, the input activations of a layer are structured as a set of 2-D input feature maps (ifmaps), each of which is called a channel. Each channel is convolved with a distinct 2-D filter from the stack of filters, one for each channel; this stack of 2-D filters is often referred to as a single 3-D filter”; see also Sze Fig. 27–29, Pg. 16: “To address the high-dimensional convolution of the CONV layer (i.e., multiple fmaps, filters, and channels), multiple rows can be mapped onto the same PE as shown in Fig. 29. The 2-D convolution is mapped to a set of PEs, and the additional dimensions are handled by interleaving or concatenating the additional data”; see also Sze Fig. 32, Pg. 17: “Two mapping strategies can be used to solve the first problem as shown in Fig. 32. First, replication can be used to map shapes that do not use up the entire PE array. For example, in the third to fifth layers of AlexNet, each 2-D convolution only uses a 13×3 PE array. This structure is then replicated four times, and runs different channels and filters in each replication. The second strategy is called folding. For example, in the second layer of AlexNet, it requires a 27×5 PE array to complete the 2-D convolution. In order to fit it into the 14×12 physical PE array, it is folded into two parts, 14×5 and 13×5, and each are vertically mapped into the physical PE array. Since not all PEs are used by the mapping, the unused PEs can be clock gated to save energy consumption”—[wherein the mapping strategy is based on the neural network layer having a specific number of distinct channels (e.g., input channel 1 and input channel 2) and filters which are targeted in the 2-D filters for layer convolution (i.e., performing the first and second portions of the layer computations)]).
Regarding claim 7, Sze in view of Sankaradas teaches all the limitations of claim 6.
Sze teaches:
wherein the first neural network layer is a convolution layer, a fully-connected layer, a long short term memory (LSTM) layer, or a gated recurrent unit (GRU) layer (Sze Figs. 8 and 10, Pg. 6: “recurrent neural networks (RNNs), of which Long Short-Term Memory networks (LSTMs) [38] are a popular variant, have internal memory to allow long-term dependencies to affect the output … DNNs can be composed solely of fully-connected (FC) layers (also referred to as multi-layer perceptrons, or MLP) as shown in the leftmost layer of Fig. 8(b) … A common form of DNNs is Convolutional Neural Nets (CNNs), which are composed of multiple CONV layers as shown in Fig. 10”—[(emphasis added)]).
Regarding claim 8, Sze in view of Sankaradas teaches all the limitations of claim 1.
Sze teaches:
wherein the performing the first portion of the computation of the first neural network layer by using the first processor comprises targeting a first output channel, and the performing the second portion of the computation of the first neural network layer by using the second processor comprises targeting a second output channel different from the first output channel (Sze Fig. 18, Pg. 12: “CPUs and GPUs use parallelization techniques such as SIMD or SIMT to perform the MACs in parallel. All the ALUs share the same control and memory (register file). On these platforms, both the FC and CONV layers are often mapped to a matrix multiplication (i.e., the kernel computation). Fig. 18 shows how a matrix multiplication is used for the FC layer. The height of the filter matrix is the number of filters and the width is the number of weights per filter (input channels (C) × width (W) × height (H), since R = W and S = H in the FC layer); the height of the input feature maps matrix is the number of activations per input feature map (C × W × H), and the width is the number of input feature maps (one in Fig. 18(a) and N in Fig. 18(b)); finally, the height of the output feature map matrix is the number of channels in the output feature maps (M), and the width is the number of output feature maps (N), where each output feature map of the FC layer has the dimension of 1×1×number of output channels (M)”—[wherein CPUs and GPUs use parallelization techniques based on the number of output channels to process the layer in parallel (i.e., first portion and second portion) using the plurality of processors including the first and second processors]).
Regarding claim 9, Sze in view of Sankaradas teaches all the limitations of claim 8.
Sze teaches:
wherein the first neural network layer is a pooling layer (Sze Pg. 7–8: “Pooling: A variety of computations that reduce the dimensionality of a feature map are referred to as pooling. Pooling, which is applied to each channel separately, enables the network to be robust and invariant to small shifts and distortions. Pooling combines, or pools, a set of values in its receptive field into a smaller number of values. It can be configured based on the size of its receptive field (e.g., 2×2) and pooling operation (e.g., max or average), as shown in Fig. 12. Typically pooling occurs on non-overlapping blocks (i.e., the stride is equal to the size of the pooling). Usually a stride of greater than one is used such that there is a reduction in the dimension of the representation (i.e., feature map)”—[wherein the pooling is applied to each channel (i.e., the first neural network layer) including the first network layer (i.e., a pooling layer)]).
Regarding claim 10, Sze in view of Sankaradas teaches all the limitations of claim 1.
Sze teaches:
wherein the obtaining the neural network computation plan comprises obtaining the neural network computation plan for performing computation of a plurality of neural network layers of the artificial neural network by using the first processor and the second processor (Sze Fig. 27–28, Pg. 16: “Within the domain of neural networks, there is an area called deep learning, in which the neural networks have more than three layers, i.e., more than one hidden layer. Today, the typical numbers of network layers used in deep learning range from five to more than a thousand. In this article, we will generally use the terminology deep neural networks (DNNs) to refer to the neural networks used in deep learning … The number of filters, channels, and fmaps that can be processed at the same time is programmable, and there exists an optimal mapping for the best energy efficiency, which depends on the shape configuration of the DNN as well as the hardware resources provided, e.g., the number of PEs and the size of the memory in the hierarchy. Since all of the variables are known before runtime, it is possible to build a compiler (i.e., mapper) to perform this optimization off-line to configure the hardware for different mappings of the RS dataflow for different DNNs as shown in Fig. 30”; see also Sze Fig. 9, Pg. 6: “A common form of DNNs is Convolutional Neural Nets (CNNs), which are composed of multiple CONV layers as shown in Fig. 10. In such networks, each layer generates a successively higher-level abstraction of the input data, called a feature map (fmap), which preserves essential yet unique information. Modern CNNs are able to achieve superior performance by employing a very deep hierarchy of layers. CNN are widely used in a variety of applications including image understanding [3], speech recognition [39], game play [6], robotics [32], etc. This paper will focus on its use in image processing, specifically for the task of image classification [3]. Each of the CONV layers in CNN is primarily composed of high-dimensional convolutions as shown in Fig. 9(b). In this computation, the input activations of a layer are structured as a set of 2-D input feature maps (ifmaps), each of which is called a channel. Each channel is convolved with a distinct 2-D filter from the stack of filters, one for each channel; this stack of 2-D filters is often referred to as a single 3-D filter.”).
Regarding claim 11,
Sze teaches:
an electronic device configured to process an artificial neural network, the electronic device comprising: a memory configured to store instructions; and a plurality of processors configured to execute the instructions and comprising a first processor and a second processor (Sze Pg. 6: “Many of the embedded platforms that perform DNN inference have stringent energy consumption, compute and memory cost limitations; efficient processing of DNNs have thus become of prime importance under these constraints. Therefore, in this article, we will focus on the compute requirements for inference rather than training”; see also Sze Pg. 14: “The fundamental component of both the CONV and FC layers are the multiply-and-accumulate (MAC) operations, which can be easily parallelized. In order to achieve high performance, highly-parallel compute paradigms are very commonly used, including both temporal and spatial architectures as shown in Fig. 17. The temporal architectures appear mostly in CPUs or GPUs, and employ a variety of techniques to improve parallelism such as vectors (SIMD) or parallel threads (SIMT). Such temporal architecture use a centralized control for a large number of ALUs. These ALUs can only fetch data from the memory hierarchy and cannot communicate directly with each other. In contrast, spatial architectures use dataflow processing, i.e., the ALUs form a processing chain so that they can pass data from one to another directly. Sometimes each ALU can have its own control logic and local memory, called a scratchpad or register file. We refer to the ALU with its own local memory as a processing engine (PE). Spatial architectures are commonly used for DNNs in ASIC and FPGA-based designs. In this section, we will discuss the different design strategies for efficient processing on these different platforms, without any impact on accuracy (i.e., all approaches in this section produce bit-wise identical results)”—[(emphasis added)])
The remaining limitations of claim 11 are substantially the same as the limitations of claim 1. Thus, the remaining limitations of claim 11 are rejected using the same reasoning and analysis as claim 1 above.
Regarding claim 15, although varying in scope, the limitations of claim 15 are substantially the same as the limitations of claim 5. Thus, claim 15 is rejected using the same reasoning and analysis as claim 5 above.
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Sze in view of Sankaradas and further in view of Ann Chapin (US 10331660 B1), hereinafter “Chapin”.
Regarding claim 3, Sze in view of Sankaradas teaches all the limitations of claim 1.
Sze teaches:
[Obtaining] a data type used in the respective first processor and second processor (Sze Fig. 33, Pg. 17, Right Column: “Fig. 33 compares the chip and DRAM energy consumption of each dataflow for the CONV layers of AlexNet with a batch size of 16. The WS and OS dataflows have the lowest energy consumption for accessing weights and partial sums, respectively. However, the RS dataflow has the lowest total energy consumption since it optimizes for the overall energy efficiency instead of only for a certain data type”; see also Sze Pg. 20, Right Column: “Finally, the quantization can be fixed (i.e., the same method of quantization is used for all data types and layers, filters, and channels in the network); or it can be variable (i.e., different methods of quantization can be used for weights and activations, and different layers, filters, and channels in the network)”; see also Sze Pg. 26, Right Column: “To evaluate the properties of a given DNN model, we should consider the following metrics: The accuracy of the model in terms of the top-5 error on datasets such as ImageNet. Also, the type of data augmentation used (e.g., multiple crops, ensemble models) should be reported”—[wherein the system knows the data types of all the data used by all processors including the first and second processors]),
wherein the performing the first portion of the computation of the first neural network layer by using the first processor, and the performing the second portion of the computation of the first neural network layer by using the second processor is performed based on the obtained neural network computation plan comprises,
performing the first portion of the computation of the first neural network layer by using the first processor, and performing the second portion of the computation of the first neural network layer by using the second processor, based on the obtained neural network computation plan and the data type (Sze Fig. 24, Pg. 14: “The operation of DNN accelerators is analogous to that of general-purpose processors as illustrated in Fig. 24 [81]. In conventional computer systems, the compiler translates the program into machine-readable binary codes for execution given the hardware architecture (e.g., x86 or ARM); in the processing of DNNs, the mapper translates the DNN shape and size into a hardware-compatible computation mapping for execution given the dataflow. While the compiler usually optimizes for performance, the mapper optimizes for energy efficiency”—[wherein the system uses the mapper to translate the DNN for computation including the given dataflow]).
Although Sze indeed teaches knowing, using, and reporting data types, Sze in view of Sankaradas does not appear to explicitly teach:
obtaining a data type.
However, Chapin teaches:
obtaining a data type (Chapin Col. 1, line 58 – Col. 2, line 4: “The one or more instructions, when executed by the one or more processors, may cause the one or more processors to receive, from the second system, information identifying a manner in which the data lineage record was included in the second data structure. The one or more instructions, when executed by the one or more processors, may cause the one or more processors to map the first data structure and the second data structure based on the information identifying the manner in which the data lineage record was included in the second data structure. The one or more instructions, when executed by the one or more processors, may cause the one or more processors to perform an action related to the data after mapping the first data structure and the second data structure”; see also Chapin Col. 10, lines 60–67: “In some implementations, a unique value that the source system 230 generates may have the same data type as a corresponding attribute for the unique value. For example, if an attribute is configured as an integer data type, then the system 230 may generate a unique value for the attribute that is of an integer data type; if an attributed is configured as a date data type, then the system 230 may generate a unique value for the attribute that is of a date data type; and so on”; see also Chapin claims 2 and 14: “where, when mapping the first data structure and the second data structure, the one or more processors are to: map the first data structure and the second data structure by mapping the unique values of the data lineage record in the first data structure and the unique values of the data lineage record in the second data structure”—[wherein the system uses the processors to generate (i.e., obtain) the data type of the data to use in further operations when mapping data for computation]).
The methods of Sze, the teachings of Chapin, and the instant application are analogous art because they pertain to using processors to map data for optimization methods including machine learning.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the methods of Sze with the teachings of Chapin to provide a method that uses the processors to obtain the data types used to map processes of the computation. One would have been motivated to do so to improve the efficiency of the computation (Chapin Col. 15, lines 59–61: “In this way, system 230 may facilitate mapping of a first data structure and a second data structure via generation of a data lineage record. This improves determination of a data lineage of data by improving an efficiency of determining a data lineage”).
Regarding claim 13, although varying in scope, the limitations of claim 13 are substantially the same as the limitations of claim 3. Thus, claim 13 is rejected using the same reasoning and analysis as claim 3.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS SHINE whose telephone number is (571)272-2512. The examiner can normally be reached M-F, 9a-5p ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached on (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/N.B.S./Examiner, Art Unit 2126
/VAN C MANG/Primary Examiner, Art Unit 2126