DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/01/2025 has been entered.
Claims 1, 4, 11, 14, and 18 have been amended. Claims 1-9, 11-16, and 18-20 are pending and have been examined.
Claim Rejections - 35 USC § 101
The rejections of Claims 18-20 under 35 U.S.C. § 101 are WITHDRAWN in view of Applicant's amendments to Claim 18.
Claim Objections
Claims 1, 11, and 18 are objected to because of the following informalities: “wherein the drive preset information including…” should be “wherein the drive preset information includes…”. Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4-6, 8, 11-12, 14-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (U.S. Patent Application Publication No. US 20210081691 A1), hereinafter "Chen", in view of Eckert et al., "Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks", hereinafter "Eckert".
Regarding Claim 1, Chen teaches:
An electronic device for performing a neural network operation on input data based on a trained learning model, the electronic device comprising:
at least one processor configured to execute a computer program to implement (¶218, “one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above”):
a model parser configured to generate model metadata by converting a trained learning model into a layered graph (¶200, "the operation unit graph can be a deep neural network", ¶46, "the operation unit graph 204 can implement convolutional neural network (CNN) processing with several layers of varying sizes and data type" and "the operation unit graph 204 can involve memory operations to access the inputs and weights and floating point operations", Fig 10, Operation Unit Graph 1000 shows metadata such as architecture, weights, biases), the layered graph including subgraphs (Fig 10, ¶88, "The fusion algorithm 500 identifies the matched subgraph comprising the Conv2D operation unit 1002, the BatchNorm operation unit 1012, the Conv2D operation unit 1022, the BatchNorm operation unit 1032, and the Add operation unit 1042, along with their dataflows"); and
a control manager configured to generate control data indicating, for each of the subgraphs, a hardware block to be assigned, among a plurality of hardware blocks of different types, for performing a neural network operation included in a corresponding subgraph, based on a processing capability of the hardware block (Fuser is the control manager, Node patterns that are fused are the subgraphs, ¶71, "architectural hints 202 describe a list of node patterns that are fused into one operation which can be executed on one physical compute unit of the reconfigurable data processor 100", ¶92, "allocate the available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units of the fused operation unit graph 224”, ¶200, "the operation unit graph can be a deep neural network", ¶91, “using performance estimation 1200 to allocate available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units”); and
a memory configured to store the model metadata, the control data, and the trained learning model and configured to provide the model metadata and the control data based on a request for an operation of the trained learning model (¶172, "another implementation of the method described in this section can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.", ¶90, "used for allocating available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units", ¶92, "allocate the available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units of the fused operation unit graph 224 and then to execute the fused operation unit graph"),
wherein the at least one processor is further configured to execute the computer program to implement a dispatcher configured to instruct a hardware block, which is assigned to a subgraph based on the control data provided from the memory, to perform the operation of the trained learning model (¶92, "allocate the available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units of the fused operation unit graph 224 and then to execute the fused operation unit graph", ¶87, "An executer 244 executes the fused operation unit graph 224 on the reconfigurable data processor 100 based on the allocation.").
Chen does not expressly teach:
wherein the control manager is configured to generate the control data based on at least one of mode information or drive preset information,
wherein the mode information including information indicating whether one of a power saving mode and a boost mode is set, and
wherein the drive preset information including at least one of dynamic voltage frequency scaling (DVFS) level information, last level cache information, or data transmission bandwidth information.
However, Eckert teaches:
wherein the control manager is configured to generate the control data based on at least one of mode information or drive preset information,
wherein the mode information including information indicating whether one of a power saving mode and a boost mode is set, and
wherein the drive preset information including at least one of dynamic voltage frequency scaling (DVFS) level information, last level cache information (Eckert, p. 385, col. 1, ¶3, “We provide a brief overview of a cache’s geometry… Shared Last Level Cache (LLC) is distributed into many slices (14 for Xeon E5 we modeled”, p. 388, col. 1, ¶4, “The Neural Cache architecture transforms SRAM arrays in LLC to compute functional units”, p. 3, col. 2, ¶3, “Up to 256 elements can be processed in parallel in a single array. A 2.5 MB LLC slice has 320 8KB arrays as shown in Figure 3”), or data transmission bandwidth information.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the last level cache information and compute function units of Eckert with the control data of Chen as part of the architecture specification used during Chen’s fusion and allocation. The motivation to do so would be to be able to use massively parallel compute units for inference and improve inference latency (Eckert, p. 383, Abstract, “re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks” and “the proposed architecture can improve inference latency”).
Regarding Claim 2, Chen in view of Eckert teaches the electronic device of Claim 1 as referenced above. Chen further teaches:
wherein the model metadata comprises at least one of information about the layered graph, tensor information, weight information, or bias information (Fig 10, Operation Unit Graph 1000 shows metadata such as architecture, tensor sizes, weights, biases, ¶46, "the operation unit graph 204 can involve memory operations to access the inputs and weights and floating point operations").
Regarding Claim 4, Chen in view of Eckert teaches the electronic device of Claim 1 as referenced above. Chen further teaches:
wherein the control manager is configured to generate the control data further based on at least one of hardware resource information or model information (architecture specification has hardware resource information and operation unit graph has model information, ¶45, "Fuser 214 takes as input the operation unit graph 204, architectural hints 202, and architecture specification 212 and produces a fused operation unit graph", ¶75, "Fuser 214 performs the fusion taking into account a target architecture of the reconfigurable data processor 100. The target architecture is specified in the architecture specification 212").
Regarding Claim 5, Chen in view of Eckert teaches the electronic device of Claim 4 as referenced above. Chen further teaches:
wherein the hardware resource information comprises information related to preprocessing and postprocessing on input data, the preprocessing and the postprocessing corresponding to a hardware block assigned for performing the neural network operation (¶75, "the architectural hints 202 are specific to the target architecture of the reconfigurable data processor 100", ¶119, "a reconfigurable data processor with an array of configurable units", ¶74, "one physical compute unit of the reconfigurable data processor 100 performs the 2D convolution operation and the batch normalization for two sets of data and then adds their results.", 2D convolution operation and batch normalization is preprocessing and adding their results is postprocessing).
Regarding Claim 6, Chen in view of Eckert teaches the electronic device of Claim 4 as referenced above. Chen further teaches:
wherein the model information comprises at least one of information related to preprocessing on input data corresponding to the trained learning model, information related to postprocessing on the input data corresponding to the trained learning model, and information about interworking between hardware blocks assigned for performing the neural network operation (Architectural hints contain all three types of information, ¶75, "the architectural hints 202 are specific to the target architecture of the reconfigurable data processor 100", ¶19, "a reconfigurable data processor with an array of configurable units", ¶74, "one physical compute unit of the reconfigurable data processor 100 performs the 2D convolution operation and the batch normalization for two sets of data and then adds their results.").
Regarding Claim 8, Chen in view of Eckert teaches the electronic device of Claim 1 as referenced above. Chen further teaches:
wherein the at least one processor is configured to execute the computer program (¶218, “one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above”) to implement a task manager configured to assign a hardware block to perform an operation for each of the subgraphs in the layered graph (¶86, "An allocator 234 allocates the physical compute units and/or physical memory units of the reconfigurable data processor 100 to the fused operation unit graph 224.").
Regarding Claim 11, Chen teaches:
A method of performing a neural network operation, the method comprising:
generating model metadata by converting a learning model into a layered graph (¶200, "the operation unit graph can be a deep neural network", ¶46, "the operation unit graph 204 can implement convolutional neural network (CNN) processing with several layers of varying sizes and data type" and "the operation unit graph 204 can involve memory operations to access the inputs and weights and floating point operations", Fig 10, Operation Unit Graph 1000 shows metadata such as architecture, weights, biases), the layered graph including subgraphs (Fig 10, ¶88, "The fusion algorithm 500 identifies the matched subgraph comprising the Conv2D operation unit 1002, the BatchNorm operation unit 1012, the Conv2D operation unit 1022, the BatchNorm operation unit 1032, and the Add operation unit 1042, along with their dataflows");
generating control data indicating, for each of the subgraphs, a hardware block to be assigned, among a plurality of hardware blocks of different types, for performing a neural network operation included in a corresponding subgraph, based on a processing capability of the hardware block (Fuser is the control manager, Node patterns that are fused are the subgraphs, ¶71, "architectural hints 202 describe a list of node patterns that are fused into one operation which can be executed on one physical compute unit of the reconfigurable data processor 100", ¶92, "allocate the available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units of the fused operation unit graph 224”);
storing the model metadata and the control data in a memory (¶172, "another implementation of the method described in this section can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.", ¶90, "used for allocating available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units");
reading the model metadata and the control data from the memory based on a request for an operation of the learning model (¶92, "allocate the available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units of the fused operation unit graph 224 and then to execute the fused operation unit graph"); and
instructing a hardware block, which is assigned to a subgraph based on the control data provided from the memory, to perform the operation of the learning model (¶92, "allocate the available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units of the fused operation unit graph 224 and then to execute the fused operation unit graph", ¶87, "An executer 244 executes the fused operation unit graph 224 on the reconfigurable data processor 100 based on the allocation.").
Chen does not expressly teach:
wherein the control manager is configured to generate the control data based on at least one of mode information or drive preset information,
wherein the mode information including information indicating whether one of a power saving mode and a boost mode is set, and
wherein the drive preset information including at least one of dynamic voltage frequency scaling (DVFS) level information, last level cache information, or data transmission bandwidth information.
However, Eckert teaches:
wherein the control manager is configured to generate the control data based on at least one of mode information or drive preset information,
wherein the mode information including information indicating whether one of a power saving mode and a boost mode is set, and
wherein the drive preset information including at least one of dynamic voltage frequency scaling (DVFS) level information, last level cache information (Eckert, p. 385, col. 1, ¶3, “We provide a brief overview of a cache’s geometry… Shared Last Level Cache (LLC) is distributed into many slices (14 for Xeon E5 we modeled”, p. 388, col. 1, ¶4, “The Neural Cache architecture transforms SRAM arrays in LLC to compute functional units”, p. 3, col. 2, ¶3, “Up to 256 elements can be processed in parallel in a single array. A 2.5 MB LLC slice has 320 8KB arrays as shown in Figure 3”), or data transmission bandwidth information.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the last level cache information and compute function units of Eckert with the control data of Chen as part of the architecture specification used during Chen’s fusion and allocation. The motivation to do so would be to be able to use massively parallel compute units for inference and improve inference latency (Eckert, p. 383, Abstract, “re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks” and “the proposed architecture can improve inference latency”).
Regarding Claim 12, the rejection of Claim 11 is incorporated and further, the claim is rejected for the same reasons as set forth in Claim 2.
Regarding Claim 14, the rejection of Claim 11 is incorporated and further, the claim is rejected for the same reasons as set forth in Claim 4.
Regarding Claim 15, the rejection of Claim 14 is incorporated and further, the claim is rejected for the same reasons as set forth in Claim 5.
Regarding Claim 16, the rejection of Claim 14 is incorporated and further, the claim is rejected for the same reasons as set forth in Claim 6.
Regarding Claim 18, Chen teaches:
A non-transitory computer-readable storage medium storing programs executable by at least one processor to implement a neural network module for controlling a neural network operation on input data based on a trained learning model, the neural network module being implemented as programs executable by at least one processor (¶218, “one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above”) and comprising:
a model parser configured to generate model metadata by converting a trained learning model into a layered graph (¶200, "the operation unit graph can be a deep neural network", ¶46, "the operation unit graph 204 can implement convolutional neural network (CNN) processing with several layers of varying sizes and data type" and "the operation unit graph 204 can involve memory operations to access the inputs and weights and floating point operations", Fig 10, Operation Unit Graph 1000 shows metadata such as architecture, weights, biases), the layered graph including subgraphs (Fig 10, ¶88, "The fusion algorithm 500 identifies the matched subgraph comprising the Conv2D operation unit 1002, the BatchNorm operation unit 1012, the Conv2D operation unit 1022, the BatchNorm operation unit 1032, and the Add operation unit 1042, along with their dataflows"); and
a control manager configured to generate control data indicating, for each of the subgraphs, a hardware block to be assigned, among a plurality of hardware blocks of different types, for performing a neural network operation included in a corresponding subgraph, based on a processing capability of the hardware block (Fuser is the control manager, Node patterns that are fused are the subgraphs, ¶71, "architectural hints 202 describe a list of node patterns that are fused into one operation which can be executed on one physical compute unit of the reconfigurable data processor 100", ¶92, "allocate the available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units of the fused operation unit graph 224”); and
a task manager configured to:
receive, based on a request for an operation of the trained learning model, the model metadata and the control data from a memory and assign, based on the model metadata and the control data, a hardware block to perform the operation of the trained learning model (¶86, "An allocator 234 allocates the physical compute units and/or physical memory units of the reconfigurable data processor 100 to the fused operation unit graph 224", Fig 2, Fused Operation Unit Graph 224, Allocator 234); and
instruct a hardware block, which is assigned to a subgraph based on the control data provided from the memory, to perform the operation of the trained learning model (¶92, "allocate the available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units of the fused operation unit graph 224 and then to execute the fused operation unit graph", ¶87, "An executer 244 executes the fused operation unit graph 224 on the reconfigurable data processor 100 based on the allocation"),
Chen does not expressly teach:
wherein the control manager is configured to generate the control data based on at least one of mode information or drive preset information,
wherein the mode information including information indicating whether one of a power saving mode and a boost mode is set, and
wherein the drive preset information including at least one of dynamic voltage frequency scaling (DVFS) level information, last level cache information, or data transmission bandwidth information.
However, Eckert teaches:
wherein the control manager is configured to generate the control data based on at least one of mode information or drive preset information,
wherein the mode information including information indicating whether one of a power saving mode and a boost mode is set, and
wherein the drive preset information including at least one of dynamic voltage frequency scaling (DVFS) level information, last level cache information (Eckert, p. 385, col. 1, ¶3, “We provide a brief overview of a cache’s geometry… Shared Last Level Cache (LLC) is distributed into many slices (14 for Xeon E5 we modeled”, p. 388, col. 1, ¶4, “The Neural Cache architecture transforms SRAM arrays in LLC to compute functional units”, p. 3, col. 2, ¶3, “Up to 256 elements can be processed in parallel in a single array. A 2.5 MB LLC slice has 320 8KB arrays as shown in Figure 3”), or data transmission bandwidth information.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the last level cache information and compute function units of Eckert with the control data of Chen as part of the architecture specification used during Chen’s fusion and allocation. The motivation to do so would be to be able to use massively parallel compute units for inference and improve inference latency (Eckert, p. 383, Abstract, “re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks” and “the proposed architecture can improve inference latency”).
Regarding Claim 19, Chen in view of Eckert teaches the non-transitory computer-readable storage medium of Claim 18 as referenced above. Chen further teaches:
wherein the control manager is further configured to store the model metadata and the control data in the memory (¶172, "another implementation of the method described in this section can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.", ¶92, "allocate the available physical compute units and/or physical memory units of the reconfigurable data processor 100 to operation units of the fused operation unit graph 224").
Regarding Claim 20, the rejection of Claim 18 is incorporated and further, the claim is rejected for the same reasons as set forth in Claim 2.
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Eckert, further in view of Rodrigues et al. (U.S. Patent Application Publication No. US 20230215144 A1), hereinafter "Rodrigues".
Regarding Claim 3, Chen in view of Eckert teaches the electronic device of Claim 1 as referenced above. Chen in view of Eckert does not teach, but Rodrigues teaches:
wherein the control manager is further configured to provide an update completion event to a user based on the model metadata and the control data being updated in the memory (Rodrigues, ¶85, "the training apparatus 2000 notifies the user that the training of the discriminator 10 has finished").
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to send a notification to the user based on model data being updated, as does Rodrigues, in the invention of Chen. The motivation to do so would be to notify the user that the model data has changed.
Regarding Claim 13, the rejection of Claim 11 is incorporated and further, the claim is rejected for the same reasons as set forth in Claim 3.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Eckert, further in view of Kitazawa et al. (U.S. Patent Application Publication No. US 20210382575 A1), hereinafter “Kitazawa”.
Regarding Claim 7, Chen in view of Eckert teaches the electronic device of Claim 1 as referenced above. Chen in view of Eckert does not teach, but Kitazawa teaches:
wherein the control manager is further configured to encrypt and compress the model metadata and the control data (Kitazawa, ¶185, "The second data processing section 259 compresses the learned model 160 reduced in size by the size reduction processing section 257 and encrypts the compressed learned model 160").
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to compress and encrypt model data, as does Kitazawa, in the invention of Chen. The motivation to do so would be to reduce the size of the model and protect the data from being stolen, changed, or compromised.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Eckert, further in view of Su et al. (U.S. Patent Application Publication No. US 20210182177 A1), hereinafter “Su”.
Regarding Claim 9, Chen in view of Eckert teaches the electronic device of Claim 1 as referenced above. Chen in view of Eckert does not teach, but Su teaches:
further comprising a cache memory configured to cache the model metadata and the control data based on the request (Su, ¶175, "cache the network model data").
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to cache model data, as does Su, in the invention of Chen. The motivation to do so would be to store the data when there is not enough storage space in off-chip memory (Su, ¶138, "a storage module configured to store the output information in an off-chip memory", ¶175, "when the free storage space of the storage module is insufficient, choose either to release the network model data or cache the network model data").
Response to Arguments
35 U.S.C. 102 & 103
Argument 1: Chen merely discloses that the fuser takes as input the operation graph, architectural hints, and architecture specification and produces a fused operation unit graph. There is no disclosure of generating the fused operation unit graph based on mode information or drive preset information as recited in Claim 1.
Examiner Response: Examiner agrees that Chen does not disclose generating the fused operation unit graph based on mode information or drive preset information; however, Eckert discloses the drive preset information, which when combined with Chen teaches this limitation. Claim 1 recites, “wherein the control manager is configured to generate control data based on at least one of mode information or drive preset information… drive preset information including at least one of… last level cache information”. Eckert teaches last level cache information (Eckert, p. 385, col. 1, ¶3, “We provide a brief overview of a cache’s geometry… Shared Last Level Cache (LLC) is distributed into many slices”) used to optimize execution of neural network operations on hardware (p. 383, Abstract, “re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks”).
This last level cache information is compatible with the control data, which in the claim mapping includes the architecture specification of Chen. Chen describes the architecture specification in ¶75: “The target architecture is specified in the architecture specification 212 and is provided by the user. In one implementation, the architectural hints 202 are specific to the target architecture of the reconfigurable data processor”. This is compatible with the last level cache information and compute functional units of Eckert. The motivation for this combination would be to have parallel compute units and improve inference time (Eckert, p. 383, Abstract, “massively parallel compute units capable of running inferences for Deep Neural Networks… the proposed architecture can improve inference latency”). Chen in view of Eckert therefore teaches the recited claim language for this limitation.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSE CHEN COULSON whose telephone number is (571)272-4716. The examiner can normally be reached Monday-Friday 8:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JESSE C COULSON/
Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122