DETAILED ACTION
Claims 1-26 are presented for examination.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-17 and 20-26 are rejected under 35 U.S.C. 103 as being unpatentable over Linyu (CN 113610213 A) in view of Taher et al. (US 2022/0066813 A1), and further in view of Chrysos et al. (US 2022/0100680 A1).
As to claim 1, Linyu teaches a method, comprising:
retrieving multi-channel data from a memory (data of multiple channels; page 3, 4th paragraph and inputting data of a plurality of channels in parallel; page 5, 10th paragraph); and
processing the multi-channel data with a hardware accelerator implementing a multi-stage processing pipeline for each channel of a plurality of channels (a multichannel parallel convolutional neural network accelerator is provided, wherein a convolutional layer of a convolutional neural network is in a structure of inputting data of a plurality of channels in parallel … the PE operation unit comprises more than one n multipliers and a group of addition trees, wherein the n multipliers respectively receive input characteristic data of n input channels; page 2, 9th – 10th paragraphs, page 5, last 2 paragraphs);
wherein processing the multi-channel data comprises sequentially processing a plurality of batches, wherein each batch of the plurality of batches comprises one or more stages from different multi-stage processing pipelines and that are adjacent to each other in the cyclically descending order, and wherein processing each batch of the plurality of batches comprises processing the corresponding one or more stages in parallel (The convolution kernel of n x n can be divided into n x n time intervals according to the operation rule provided by the invention for operation, and the operation results of all time intervals are accumulated to obtain the convolution operation result of the convolution kernel of n x n. The operation of each time interval is performed in parallel on the channel dimension so as to realize the parallel convolution operation of multiple channels; page 4, 2nd paragraph and The n PE operation units in fig. 2 operate in parallel; page 4, 3rd paragraph and page 10, paragraph 14th – page 11, 8th paragraph);
wherein a first batch of the plurality of batches comprises a plurality of stages (The operation of each time interval is performed in parallel on the channel dimension so as to realize the parallel convolution operation of multiple channels; page 4, 2nd paragraph and The n PE operation units in fig. 2 operate in parallel; page 8, 11th paragraph); and
Linyu does not teach wherein the multi-stage processing pipelines are arranged in a cyclically descending order based on a total number of stages of each multi-stage processing pipeline, and wherein each stage of the multi-stage processing pipeline of each channel has a loop-carry dependency.
However, Taher, in the same field of endeavor, teaches wherein the multi-stage processing pipelines are arranged in a cyclically descending order based on a total number of stages of each multi-stage processing pipeline (dynamically generating an optimized processing pipeline for tasks, where the tasks are defined declaratively according to task manifests as a number of stages of data transformations; paragraph [0022], identifies one or more tasks to be executed from defined tasks that are defined declaratively as a number of stages of input data, data transformations, and output data; paragraph [0023] and determine dependencies between the tasks based on their defined stages; and creating one or more optimized data processing pipelines by performing a dependency resolution procedure on stages of all tasks in parallel using the task dependencies to determine the order of the stages; paragraph [0024] and Fig. 5).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Taher to the system of Linyu because Taher teaches a method for dynamically generating an optimized processing pipeline for tasks, which includes identifying one or more tasks to be executed from defined tasks that are defined declaratively as a number of stages of input data, thus avoiding the problem of static pipelines which do not change without user input (paragraphs [0006]-[0007]).
Chrysos teaches, in the same field of endeavor, that each stage of the multi-stage processing pipeline has a loop-carry dependency (To elucidate the performance implications of control flow (or instruction-flow) operations in a dataflow execution paradigm, consider the following code constructs … Loops with unknown exit conditions: If a loop contains an exit condition that is dependent on memory, then the associated conditional branch will create a loop carry dependency; paragraph [0294]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Chrysos to the system of Linyu as modified by Taher because Chrysos teaches a method that allows the system of Linyu to handle different types of loops when executing the dataflow tasks.
As to claim 2, Linyu as modified by Taher and Chrysos teaches the method of claim 1, wherein one or more subsequent batches relative to the first batch comprises fewer stages than the first batch (see Taher: Task 4; see Fig. 4C).
As to claim 3, Linyu as modified by Taher and Chrysos teaches the method of claim 2, wherein processing a batch of the one or more subsequent batches comprises performing a null operation in parallel with processing one or more stages corresponding to the batch of the one or more subsequent batches (see Chrysos: There is a basic dataflow dependency through the linked-list traversal. In addition to that, the branch conditions (the break and the test of p (null condition)) become dependencies on all operations guarded by those branches; paragraph [0294]).
As to claim 4, Linyu as modified by Taher and Chrysos teaches the method of claim 2, wherein processing each of the one or more subsequent batches further comprises skipping over one or more stages of one or more multi-stage processing pipelines in the cyclically descending order (see Taher: Fig. 4B, in which, Stage 3 423 is skipped).
As to claim 5, Linyu as modified by Taher and Chrysos teaches the method of claim 1, wherein none of the batches are defined across multiple stages of the multi-stage processing pipelines (see Linyu: The convolution kernel of n x n can be divided into n x n time intervals according to the operation rule provided by the invention for operation, and the operation results of all time intervals are accumulated to obtain the convolution operation result of the convolution kernel of n x n. The operation of each time interval is performed in parallel on the channel dimension so as to realize the parallel convolution operation of multiple channels; page 4, 2nd paragraph).
As to claim 6, Linyu as modified by Taher and Chrysos teaches the method of claim 1, wherein some of the batches are defined across multiple stages of the multi-stage processing pipelines (see Taher: see Fig. 4C).
As to claim 7, Linyu as modified by Taher and Chrysos teaches the method of claim 1, wherein a size of the batches is determined based on one or more of a number of multipliers of the hardware accelerator, a number of load resources of the hardware accelerator, and the loop-carry dependency of the stages (see Linyu: The convolution kernel of n x n can be divided into n x n time intervals according to the operation rule provided by the invention for operation, and the operation results of all time intervals are accumulated to obtain the convolution operation result of the convolution kernel of n x n; page 4, 2nd paragraph and the PE operation unit comprises more than one n multipliers and a group of addition trees, wherein the n multipliers respectively receive input characteristic data of n input channels and weight parameters corresponding to the input characteristic data and carry out convolution operation; page 5, last paragraph) and (see Chrysos: If a loop contains an exit condition that is dependent on memory, then the associated conditional branch will create a loop carry dependency; paragraph [0294], which depend on the outcome of the loop step).
As to claim 8, Linyu as modified by Taher and Chrysos teaches the method of claim 1, but does not teach wherein a size of the batches is determined based on a total number of channels in the plurality of channels divided by two.
However, Linyu teaches the convolution operation result of the convolution kernel of n x n (page 4, 2nd paragraph), where n could be any number.
Therefore, it would have been obvious that the size of the batches could be determined based on a total number of channels in the plurality of channels divided by two.
As to claim 9, Linyu as modified by Taher and Chrysos teaches the method of claim 1, wherein a size of the batches is a multiple of a total number of channels in the plurality of channels (see Linyu: n x n time intervals according to the operation rule provided by the invention for operation, and the operation results of all time intervals are accumulated to obtain the convolution operation result of the convolution kernel of n x n; page 4, 2nd paragraph).
As to claim 10, Linyu as modified by Taher and Chrysos teaches the method of claim 1, wherein the total number of stages of each multi-stage processing pipeline differs between three or more of the multi-stage processing pipelines (see Taher: see Fig. 4B and 4C).
As to claim 11, Linyu as modified by Taher and Chrysos teaches the method of claim 1, wherein the cyclically descending order is determined by identifying the total number of stages of each multi-stage processing pipeline and sorting the multi-stage processing pipelines from greatest to least with respect to the total numbers of stages.
As to claim 12, Linyu as modified by Taher and Chrysos teaches the method of claim 1, wherein to process the multi-channel data, the hardware accelerator performs operations on the multi-channel data using a very large instruction word (VLIW) instruction set architecture (see Chrysos: VLIW; paragraph [0841]) or using a single instruction multiple data (SIMD) instruction (see Chrysos: SIMD; paragraph [0309]).
As to claim 13, Linyu as modified by Taher and Chrysos teaches the method of claim 10, wherein a number of the operations is based on a number of stages of all the multi-stage processing pipelines (see Chrysos: number of operations; paragraph [0305], [0306], [0317]).
As to claim 14, Linyu as modified by Taher and Chrysos teaches the method of claim 1, wherein each of the plurality of batches comprises the same number of stages (see Taher: Fig. 4C, Task 1, Task 2 and Task 3).
As to claim 15, Linyu as modified by Taher and Chrysos teaches the method of claim 1, further comprising arranging the multi-stage processing pipelines in the cyclically descending order using pointers pointing to respective locations in the memory (see Linyu: see Fig. 5 and pages 9-10).
As to claim 16, Linyu as modified by Taher and Chrysos teaches the method of claim 1, wherein sequentially processing the plurality of batches comprises for each batch of the plurality of batches, loading corresponding coefficients from the memory into the hardware accelerator (see Linyu: weight parameters; see pages 10-11, Figs. 7-15).
As to claim 17, Linyu as modified by Taher and Chrysos teaches the method of claim 16, wherein loading corresponding coefficients from the memory into the hardware accelerator comprises using a load operation capable of loading multiple coefficients within the same load operation, and wherein at least one load operation loads coefficients of different stages from memory into the hardware accelerator (see Linyu: loading weight parameters in multiple periods; see pages 10-11, Figs. 7-15).
As to claim 20, Linyu teaches a device, comprising:
memory (a storage medium; page 12, 7th paragraph); and
a hardware accelerator coupled to the memory (a multichannel parallel convolutional neural network accelerator; page 8, 9th paragraph) and configured to:
retrieve multi-channel data from the memory (data of multiple channels; page 3, 4th paragraph and inputting data of a plurality of channels in parallel; page 5, 10th paragraph); and
process the multi-channel data by implementing a multi-stage processing pipeline for each channel of a plurality of channels (a multichannel parallel convolutional neural network accelerator is provided, wherein a convolutional layer of a convolutional neural network is in a structure of inputting data of a plurality of channels in parallel … the PE operation unit comprises more than one n multipliers and a group of addition trees, wherein the n multipliers respectively receive input characteristic data of n input channels; page 2, 9th – 10th paragraphs, page 5, last 2 paragraphs);
wherein processing the multi-channel data comprises sequentially processing a plurality of batches, wherein each batch of the plurality of batches comprises one or more stages from different multi-stage processing pipelines and that are adjacent to each other in the cyclically descending order, and wherein processing each batch of the plurality of batches comprises processing the corresponding one or more stages in parallel (The convolution kernel of n x n can be divided into n x n time intervals according to the operation rule provided by the invention for operation, and the operation results of all time intervals are accumulated to obtain the convolution operation result of the convolution kernel of n x n. The operation of each time interval is performed in parallel on the channel dimension so as to realize the parallel convolution operation of multiple channels; page 4, 2nd paragraph and The n PE operation units in fig. 2 operate in parallel; page 4, 3rd paragraph and page 10, paragraph 14th – page 11, 8th paragraph);
wherein a first batch of the plurality of batches comprises a plurality of stages (The operation of each time interval is performed in parallel on the channel dimension so as to realize the parallel convolution operation of multiple channels; page 4, 2nd paragraph and The n PE operation units in fig. 2 operate in parallel; page 8, 11th paragraph); and
Linyu does not teach wherein the multi-stage processing pipelines are arranged in a cyclically descending order based on a total number of stages of each multi-stage processing pipeline, and wherein each stage of the multi-stage processing pipeline of each channel has a loop-carry dependency.
However, Taher, in the same field of endeavor, teaches wherein the multi-stage processing pipelines are arranged in a cyclically descending order based on a total number of stages of each multi-stage processing pipeline (dynamically generating an optimized processing pipeline for tasks, where the tasks are defined declaratively according to task manifests as a number of stages of data transformations; paragraph [0022], identifies one or more tasks to be executed from defined tasks that are defined declaratively as a number of stages of input data, data transformations, and output data; paragraph [0023] and determine dependencies between the tasks based on their defined stages; and creating one or more optimized data processing pipelines by performing a dependency resolution procedure on stages of all tasks in parallel using the task dependencies to determine the order of the stages; paragraph [0024] and Fig. 5).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Taher to the system of Linyu because Taher teaches a method for dynamically generating an optimized processing pipeline for tasks, which includes identifying one or more tasks to be executed from defined tasks that are defined declaratively as a number of stages of input data, thus avoiding the problem of static pipelines which do not change without user input (paragraphs [0006]-[0007]).
Chrysos teaches, in the same field of endeavor, that each stage of the multi-stage processing pipeline has a loop-carry dependency (To elucidate the performance implications of control flow (or instruction-flow) operations in a dataflow execution paradigm, consider the following code constructs … Loops with unknown exit conditions: If a loop contains an exit condition that is dependent on memory, then the associated conditional branch will create a loop carry dependency; paragraph [0294]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Chrysos to the system of Linyu as modified by Taher because Chrysos teaches a method that allows the system of Linyu to handle different types of loops when executing the dataflow tasks.
As to claim 21, see rejection of claim 2 above.
As to claim 22, see rejection of claim 4 above.
As to claim 23, see rejection of claim 5 above.
As to claim 24, see rejection of claim 6 above.
As to claim 25, see rejection of claim 7 above.
As to claim 26, Linyu teaches an integrated circuit (a computer; page 12, 9th paragraph), comprising:
control circuitry (a CPU or processor; inherent from “a computer”); and
hardware accelerator circuitry (a multichannel parallel convolutional neural network accelerator; page 8, 9th paragraph);
wherein the control circuitry is configured to identify multi-channel data from a memory and provide the multi-channel data to the hardware accelerator circuitry in response to a request to process the multi-channel data (According to one aspect of the invention, a multichannel parallel convolutional neural network acceleration method is provided, and a method for inputting convolutional layers of a convolutional neural network in parallel by adopting data of multiple channels comprises the following steps; page 3, 4th paragraph and inputting data of a plurality of channels in parallel; page 5, 10th paragraph); and
wherein the hardware accelerator circuitry is configured to process the multi-channel data by implementing a multi-stage processing pipeline for each channel of a plurality of channels (a multichannel parallel convolutional neural network accelerator is provided, wherein a convolutional layer of a convolutional neural network is in a structure of inputting data of a plurality of channels in parallel … the PE operation unit comprises more than one n multipliers and a group of addition trees, wherein the n multipliers respectively receive input characteristic data of n input channels; page 2, 9th – 10th paragraphs, page 5, last 2 paragraphs);
wherein processing the multi-channel data comprises sequentially processing a plurality of batches, wherein each batch of the plurality of batches comprises one or more stages from different multi-stage processing pipelines and that are adjacent to each other in the cyclically descending order, and wherein processing each batch of the plurality of batches comprises processing the corresponding one or more stages in parallel (The convolution kernel of n x n can be divided into n x n time intervals according to the operation rule provided by the invention for operation, and the operation results of all time intervals are accumulated to obtain the convolution operation result of the convolution kernel of n x n. The operation of each time interval is performed in parallel on the channel dimension so as to realize the parallel convolution operation of multiple channels; page 4, 2nd paragraph and The n PE operation units in fig. 2 operate in parallel; page 4, 3rd paragraph and page 10, paragraph 14th – page 11, 8th paragraph);
wherein a first batch of the plurality of batches comprises a plurality of stages (The operation of each time interval is performed in parallel on the channel dimension so as to realize the parallel convolution operation of multiple channels; page 4, 2nd paragraph and The n PE operation units in fig. 2 operate in parallel; page 8, 11th paragraph); and
Linyu does not teach wherein the multi-stage processing pipelines are arranged in a cyclically descending order based on a total number of stages of each multi-stage processing pipeline, and wherein each stage of the multi-stage processing pipeline of each channel has a loop-carry dependency.
However, Taher, in the same field of endeavor, teaches wherein the multi-stage processing pipelines are arranged in a cyclically descending order based on a total number of stages of each multi-stage processing pipeline (dynamically generating an optimized processing pipeline for tasks, where the tasks are defined declaratively according to task manifests as a number of stages of data transformations; paragraph [0022], identifies one or more tasks to be executed from defined tasks that are defined declaratively as a number of stages of input data, data transformations, and output data; paragraph [0023] and determine dependencies between the tasks based on their defined stages; and creating one or more optimized data processing pipelines by performing a dependency resolution procedure on stages of all tasks in parallel using the task dependencies to determine the order of the stages; paragraph [0024] and Fig. 5).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Taher to the system of Linyu because Taher teaches a method for dynamically generating an optimized processing pipeline for tasks, which includes identifying one or more tasks to be executed from defined tasks that are defined declaratively as a number of stages of input data, thus avoiding the problem of static pipelines which do not change without user input (paragraphs [0006]-[0007]).
Chrysos teaches, in the same field of endeavor, that each stage of the multi-stage processing pipeline has a loop-carry dependency (To elucidate the performance implications of control flow (or instruction-flow) operations in a dataflow execution paradigm, consider the following code constructs … Loops with unknown exit conditions: If a loop contains an exit condition that is dependent on memory, then the associated conditional branch will create a loop carry dependency; paragraph [0294]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Chrysos to the system of Linyu as modified by Taher because Chrysos teaches a method that allows the system of Linyu to handle different types of loops when executing the dataflow tasks.
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Linyu (CN 113610213 A) in view of Taher et al. (US 2022/0066813 A1) and Chrysos et al. (US 2022/0100680 A1) further in view of Rub et al. (US 2020/0278824 A1).
As to claim 18, Linyu as modified by Taher and Chrysos teaches the method of claim 1, but does not teach wherein each stage of the multi-stage processing pipeline of each channel is an IIR filter.
However, Rub teaches each stage is an IIR filter and is used to process multichannel data (paragraphs [0037]-[0038]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Rub to the system of Linyu as modified by Taher and Chrysos because Rub teaches a method to minimize the latency of processing tasks in digital signal processing (paragraph [0002]).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Linyu (CN 113610213 A) in view of Taher et al. (US 2022/0066813 A1) and Chrysos et al. (US 2022/0100680 A1) further in view of Tassart (US 2015/0180436 A1).
As to claim 19, Linyu as modified by Taher and Chrysos does not teach wherein each stage of the multi-stage processing pipeline of each channel is a Biquadratic filter.
However, Tassart teaches each stage is a Biquadratic filter and is used to process multichannel data (Biquadratic filter; paragraphs [0043]-[0044]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Tassart to the system of Linyu as modified by Taher and Chrysos because Tassart teaches a method for improving the equalization of a numeric audio signal (paragraph [0009]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DIEM K CAO whose telephone number is (571)272-3760. The examiner can normally be reached Monday-Friday 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, April Blair can be reached at 571-270-1014. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DIEM K CAO/Primary Examiner, Art Unit 2196
DC
January 5, 2026