Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Status of the Application
This Office Action is in response to the Notice of Withdrawal from Issue sent 12/29/2025, and subsequent amendment filed 1/30/2026.
Claims 1-20 are pending for this examination.
Withdrawal of Allowability of Claims, Rejection on New Art
The indicated allowability of claims 1-20 sent in the Notice of Allowance dated 12/02/2025 is withdrawn in view of the newly discovered references to Sarangapani et al. (US 2020/0405145) and Fauber (US 2023/0239224). Rejections based on the newly cited references follow below in the appropriate sections.
Amendment to the Specification
The amendments to the specification received on 1/30/2026 are acceptable.
Claim Rejections - 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3, 8-11, 14-15, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sarangapani et al. (US 2020/0405145), herein referred to as Sarangapani ‘145, in view of Fauber (US 2023/0239224), herein referred to as Fauber ‘224.
Referring to claim 1, Sarangapani ‘145 teaches a processor (see Fig. 6A, system 600 with processor 610), comprising:
one or more tensor acceleration logic circuits (see Fig. 6A, processor 610; see Paragraph 0065, wherein processor may be representative of one or more CPUs, GPUs, tensor processing unit (TPUs), etc.) to cause data to be stored in one or more cache storages (see Fig. 6A, cache 612, where data is stored into cache by the processor and memory accessible through the cache 612, see Paragraph 0065).
While Sarangapani ‘145 teaches using tensor processing units, i.e., processors specifically designed to perform matrix and tensor operations, Sarangapani ‘145 does not specifically teach that the stored data is one or more tensor maps.
Fauber ‘224 teaches the storing of one or more tensor maps in memory (see Paragraph 0025, where sparse tensors are stored in a key/value pairing, i.e., an index or mapping format with a reduced memory footprint relative to a full-size tensor).
Sarangapani ‘145 and Fauber ‘224 qualify as analogous prior art, as both pertain to the same field of endeavor of utilizing hardware to handle tensor operations / data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Sarangapani ‘145 system as set forth above to store tensor key/value pairings, i.e., tensor maps, in the cache memory for use by the processor, as taught by Fauber ‘224. Sarangapani ‘145 teaches TPUs, which operate on tensors, and such tensors would need to be stored in memory for processing; a person of ordinary skill in the art would have been motivated to utilize key/value pairing maps for storing tensors, as this would reduce the memory footprint used to store the tensors relative to storing a full-size tensor (see Fauber ‘224, Paragraph 0025).
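To illustrate the reduced-footprint rationale relied upon above, the following sketch (illustrative only; the names and data layout are hypothetical and not drawn from the cited references) stores only the non-zero entries of a tensor as a coordinate-to-value map:

```python
# Illustrative sketch: a sparse tensor kept as a key/value mapping
# (coordinates -> value) rather than as a full-size dense array.
# All names here are hypothetical, chosen for this example only.

def to_tensor_map(dense):
    """Build a {(i, j): value} map holding only the non-zero entries."""
    return {
        (i, j): v
        for i, row in enumerate(dense)
        for j, v in enumerate(row)
        if v != 0
    }

dense = [
    [0, 0, 3],
    [0, 0, 0],
    [7, 0, 0],
]

tensor_map = to_tensor_map(dense)
print(tensor_map)       # {(0, 2): 3, (2, 0): 7}
print(len(tensor_map))  # 2 stored entries versus 9 dense slots
```

For a mostly-zero tensor, the map stores only the populated coordinates, which is the footprint reduction the rejection attributes to Fauber ‘224's key/value pairing.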
As to claim 3, Sarangapani ‘145 does not specifically teach the processor of claim 1, wherein the one or more tensor acceleration logic circuits are to cause the one or more tensor maps to be stored in one or more cache storages based, at least in part, on an application programming interface (API).
Fauber ‘224 teaches the storing of one or more tensor maps in memory (see Paragraph 0025, where sparse tensors are stored in a key/value pairing, i.e., an index or mapping format with a reduced memory footprint relative to a full-size tensor), where the components that implement the processing / execution / transfer of tensors can be realized through I/O components and associated processors, applications, and/or application programming interface (API) components (see Paragraph 0156).
Sarangapani ‘145 and Fauber ‘224 qualify as analogous prior art, as both pertain to the same field of endeavor of utilizing hardware to handle tensor operations / data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Sarangapani ‘145 system as set forth above to utilize an API to implement the transfer and storage of tensor key/value pairings, i.e., tensor maps, in the cache memory for use by the processor, as taught by Fauber ‘224. A person of ordinary skill in the art would have been motivated to utilize APIs to facilitate the transfer and storage of data, as APIs by definition are rules and definitions that enable different software programs to communicate and exchange data with each other, and caching APIs are commonly used in the art to store frequently accessed data in a cache storage that is faster to access than the primary database / memory.
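To illustrate the caching-API rationale above, the following sketch (illustrative only; all names are hypothetical and not drawn from the cited references) shows a minimal caching interface that serves look-ups from a fast cache before falling back to the primary store:

```python
# Illustrative sketch of a minimal caching API: look-ups hit a small
# fast cache first and fall back to the slower backing store on a miss.
# All names here are hypothetical, chosen for this example only.

class Cache:
    def __init__(self, backing_store):
        self._store = backing_store  # the primary memory / database
        self._cache = {}             # the faster cache storage

    def get(self, key):
        if key in self._cache:       # cache hit: fast path
            return self._cache[key]
        value = self._store[key]     # cache miss: fetch from backing store
        self._cache[key] = value     # keep it for future accesses
        return value

store = {"tensor_map_0": {(0, 2): 3, (2, 0): 7}}
cache = Cache(store)
print(cache.get("tensor_map_0"))  # first access: fetched from the store
print(cache.get("tensor_map_0"))  # second access: served from the cache
```

Frequently accessed entries (such as a tensor map) are served from the cache after the first access, which is the faster-than-primary-storage behavior the rejection ascribes to caching APIs.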
Referring to claim 8, Sarangapani ‘145 teaches a system (see Fig. 6A, system 600), comprising:
one or more processors (see Fig. 6A, processor 610; see Paragraph 0065, wherein processor may be representative of one or more CPUs, GPUs, tensor processing unit (TPUs), etc.) to cause data to be stored in one or more cache storages (see Fig. 6A, cache 612, where data is stored into cache by the processor and memory accessible through the cache 612, see Paragraph 0065).
While Sarangapani ‘145 teaches using tensor processing units, i.e., processors specifically designed to perform matrix and tensor operations, Sarangapani ‘145 does not specifically teach that the stored data is one or more tensor maps.
Fauber ‘224 teaches the storing of one or more tensor maps in memory (see Paragraph 0025, where sparse tensors are stored in a key/value pairing, i.e., an index or mapping format with a reduced memory footprint relative to a full-size tensor).
Sarangapani ‘145 and Fauber ‘224 qualify as analogous prior art, as both pertain to the same field of endeavor of utilizing hardware to handle tensor operations / data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Sarangapani ‘145 system as set forth above to store tensor key/value pairings, i.e., tensor maps, in the cache memory for use by the processor, as taught by Fauber ‘224. Sarangapani ‘145 teaches TPUs, which operate on tensors, and such tensors would need to be stored in memory for processing; a person of ordinary skill in the art would have been motivated to utilize key/value pairing maps for storing tensors, as this would reduce the memory footprint used to store the tensors relative to storing a full-size tensor (see Fauber ‘224, Paragraph 0025).
As to claim 9, Sarangapani ‘145 teaches the system of claim 8, wherein the one or more processors are to cause the one or more tensor maps to be stored in one or more cache storages using one or more addresses of the one or more tensor maps in memory (see Fig. 6A, cache 612, where data is stored into cache by the processor and memory accessible through the cache 612, see Paragraph 0065; Examiner points out that data is inherently stored and called upon using address information).
However, Sarangapani ‘145 does not specifically teach the system of claim 8, wherein the one or more processors are to cause the one or more tensor maps to be stored in one or more cache storages based, at least in part, on an application programming interface (API).
Fauber ‘224 teaches the storing of one or more tensor maps in memory (see Paragraph 0025, where sparse tensors are stored in a key/value pairing, i.e., an index or mapping format with a reduced memory footprint relative to a full-size tensor), where the components that implement the processing / execution / transfer of tensors can be realized through I/O components and associated processors, applications, and/or application programming interface (API) components (see Paragraph 0156).
Sarangapani ‘145 and Fauber ‘224 qualify as analogous prior art, as both pertain to the same field of endeavor of utilizing hardware to handle tensor operations / data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Sarangapani ‘145 system as set forth above to utilize an API to implement the transfer and storage of tensor key/value pairings, i.e., tensor maps, in the cache memory for use by the processor, as taught by Fauber ‘224. A person of ordinary skill in the art would have been motivated to utilize APIs to facilitate the transfer and storage of data, as APIs by definition are rules and definitions that enable different software programs to communicate and exchange data with each other, and caching APIs are commonly used in the art to store frequently accessed data in a cache storage that is faster to access than the primary database / memory.
As to claim 10, Sarangapani ‘145 teaches the system of claim 8, wherein the one or more processors are to cause the one or more tensor maps to be stored in one or more cache storages of a graphics processing unit (GPU) (see Paragraph 0065, wherein the processor 610 can be representative of one or more processing units that include GPUs, and wherein data is inherently stored and called upon using address information; Examiner points out that the data being tensor maps is addressed in the 103 combination set forth in the independent claim).
As to claim 11, Sarangapani ‘145 teaches the system of claim 8, wherein the one or more processors are to cause the one or more tensor maps to be stored in one or more cache storages based, at least in part, on an instruction that uses one or more addresses of the one or more tensor maps (see Fig. 6A, cache 612, where data is stored into the cache by the processor and memory is accessible through the cache 612, see Paragraph 0065, wherein the processor 610 can be representative of one or more processing units that include TPUs, and wherein operations that use data inherently use address information to locate the data; Examiner points out that the data being tensor maps is addressed in the 103 combination set forth in the independent claim).
Referring to claim 14, Sarangapani ‘145 teaches a method (see Abstract), comprising:
storing data in one or more cache storages (see Fig. 6A, cache 612, where data is stored into cache by the processor and memory accessible through the cache 612, see Paragraph 0065) using one or more tensor acceleration logic circuits (see Fig. 6A, processor 610; see Paragraph 0065, wherein processor may be representative of one or more CPUs, GPUs, tensor processing unit (TPUs), etc.).
While Sarangapani ‘145 teaches using tensor processing units, i.e., processors specifically designed to perform matrix and tensor operations, Sarangapani ‘145 does not specifically teach that the stored data is one or more tensor maps.
Fauber ‘224 teaches the storing of one or more tensor maps in memory (see Paragraph 0025, where sparse tensors are stored in a key/value pairing, i.e., an index or mapping format with a reduced memory footprint relative to a full-size tensor).
Sarangapani ‘145 and Fauber ‘224 qualify as analogous prior art, as both pertain to the same field of endeavor of utilizing hardware to handle tensor operations / data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Sarangapani ‘145 system as set forth above to store tensor key/value pairings, i.e., tensor maps, in the cache memory for use by the processor, as taught by Fauber ‘224. Sarangapani ‘145 teaches TPUs, which operate on tensors, and such tensors would need to be stored in memory for processing; a person of ordinary skill in the art would have been motivated to utilize key/value pairing maps for storing tensors, as this would reduce the memory footprint used to store the tensors relative to storing a full-size tensor (see Fauber ‘224, Paragraph 0025).
As to claim 15, Sarangapani ‘145 does not specifically teach the method of claim 14, wherein storing the one or more tensor maps in one or more cache storages includes performing an application programming interface (API) to cause the one or more tensor maps to be stored.
Fauber ‘224 teaches the storing of one or more tensor maps in memory (see Paragraph 0025, where sparse tensors are stored in a key/value pairing, i.e., an index or mapping format with a reduced memory footprint relative to a full-size tensor), where the components that implement the processing / execution / transfer of tensors can be realized through I/O components and associated processors, applications, and/or application programming interface (API) components (see Paragraph 0156).
Sarangapani ‘145 and Fauber ‘224 qualify as analogous prior art, as both pertain to the same field of endeavor of utilizing hardware to handle tensor operations / data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Sarangapani ‘145 system as set forth above to utilize an API to implement the transfer and storage of tensor key/value pairings, i.e., tensor maps, in the cache memory for use by the processor, as taught by Fauber ‘224. A person of ordinary skill in the art would have been motivated to utilize APIs to facilitate the transfer and storage of data, as APIs by definition are rules and definitions that enable different software programs to communicate and exchange data with each other, and caching APIs are commonly used in the art to store frequently accessed data in a cache storage that is faster to access than the primary database / memory.
As to claim 19, Sarangapani ‘145 teaches the method of claim 14, wherein storing the one or more tensor maps in one or more cache storages includes performing an instruction based, at least in part, on one or more addresses of the one or more tensor maps in memory accessible by a graphics processing unit (GPU) (see Paragraph 0065, wherein the processor 610 can be representative of one or more processing units that include GPUs, and wherein data is inherently stored and called upon using address information; Examiner points out that the data being tensor maps is addressed in the 103 combination set forth in the independent claim).
Referring to claim 20, Sarangapani ‘145 teaches a non-transitory computer-readable medium having stored thereon a set of instructions (see Paragraph 0073), which if performed by one or more processors (see Fig. 6A, processor 610), cause the one or more processors to at least perform the method of claim 14 (see the above rejection of claim 14).
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Sarangapani ‘145, in view of Fauber ‘224, and further in view of Raikin et al. (US 11,321,092), herein referred to as Raikin ‘092.
As to claim 2, the combination of Sarangapani ‘145 and Fauber ‘224 teaches the processor of claim 1, wherein the one or more tensor acceleration logic circuits are to cause the one or more tensor maps to be stored in one or more cache storages.
However, the combination of Sarangapani ‘145 and Fauber ‘224 does not teach an instruction to prefetch data to be stored into the cache storage.
Raikin ‘092 teaches a tensor-based memory access system (see Fig. 1, system 100) where a processor (see Fig. 1, scalar processor 102) has an instruction cache (see Fig. 1, instruction cache 112) which is configured to prefetch instructions (see Col. 4, lines 30-47).
Sarangapani ‘145, Fauber ‘224, and Raikin ‘092 qualify as analogous prior art, as all pertain to the same field of endeavor of utilizing hardware to handle tensor operations / data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined Sarangapani ‘145 and Fauber ‘224 system as set forth above to prefetch instructions into the cache that are used to implement operations on data, such as the tensor operations for a TPU to execute, i.e., storing data in the cache storages based in part on prefetched instructions, as taught by Raikin ‘092. A person of ordinary skill in the art would have been motivated to utilize prefetching of data / instructions into cache in order to improve performance and reduce latency by having data and instructions fetched and available before the processor needs them, thereby speeding up processor execution / operations.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Sarangapani ‘145, in view of Fauber ‘224, and further in view of Tuan (US 2013/0226535), herein referred to as Tuan ‘535.
As to claim 4, the combination of Sarangapani ‘145 and Fauber ‘224 teaches the processor of claim 1, wherein the one or more tensor acceleration logic circuits are to cause the one or more tensor maps to be stored in one or more cache storages.
However, the combination of Sarangapani ‘145 and Fauber ‘224 does not specifically teach the storage of data based in part on one or more addresses in global memory of a graphics processing unit (GPU).
Tuan ‘535 teaches a processor system for executing operations / threads (see Abstract) where data can be stored in global memory locations of contiguous addresses in the GPU (see Paragraph 0022).
Sarangapani ‘145, Fauber ‘224, and Tuan ‘535 qualify as analogous prior art, as all pertain to the same field of endeavor of utilizing hardware to handle storage of data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined Sarangapani ‘145 and Fauber ‘224 system as set forth above to have data stored in memory or cache based in part on addresses found in the global memory of a GPU, as taught by Tuan ‘535. A person of ordinary skill in the art would have been motivated to utilize addresses of GPU global memory when storing data in order to transfer data from memory into a cache for use by a processor performing an operation on that data, as a cache is known to be a smaller memory located on, or in close proximity to, a processor, used specifically to store frequently accessed data, or data expected to be used by the processor, that is fetched and stored in the cache using address information before being used by the processor.
Claims 5, 12, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sarangapani ‘145, in view of Fauber ‘224, and further in view of Tsuboki et al. (US 2002/0062429), herein referred to as Tsuboki ‘429.
As to claim 5, Sarangapani ‘145 does not specifically teach the processor of claim 1, wherein the one or more cache storages include an asynchronous data movement hardware cache.
Tsuboki ‘429 teaches a storage system in which data can be written into a cache memory, where the transfer of data can be performed with asynchronous timing (see Paragraph 0033).
Sarangapani ‘145, Fauber ‘224, and Tsuboki ‘429 qualify as analogous prior art, as all pertain to the same field of endeavor of utilizing hardware to handle storage and transfer of data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined Sarangapani ‘145 and Fauber ‘224 system as set forth above to have the cache storages transfer data asynchronously, as taught by Tsuboki ‘429. A person of ordinary skill in the art would have been motivated to utilize asynchronous data transfers between memory and cache, as such transfers can proceed independently without waiting on other processes, which is ideal for caches used to fetch / prefetch data from memory so that it is readily available to a processor.
As to claim 12, Sarangapani ‘145 teaches the system of claim 8, wherein the one or more cache storages include a graphics processing unit (GPU) (see Paragraph 0065, wherein the processor 610 can be representative of one or more processing units that include GPUs).
However, the combined Sarangapani ‘145 and Fauber ‘224 system does not teach an asynchronous data movement hardware cache.
Tsuboki ‘429 teaches a storage system in which data can be written into a cache memory, where the transfer of data can be performed with asynchronous timing (see Paragraph 0033).
Sarangapani ‘145, Fauber ‘224, and Tsuboki ‘429 qualify as analogous prior art, as all pertain to the same field of endeavor of utilizing hardware to handle storage and transfer of data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined Sarangapani ‘145 and Fauber ‘224 system as set forth above to have the cache storages transfer data asynchronously, as taught by Tsuboki ‘429. A person of ordinary skill in the art would have been motivated to utilize asynchronous data transfers between memory and cache, as such transfers can proceed independently without waiting on other processes, which is ideal for caches used to fetch / prefetch data from memory so that it is readily available to a processor.
As to claim 16, Sarangapani ‘145 does not specifically teach the method of claim 14, wherein storing the one or more tensor maps in one or more cache storages includes performing an instruction to cause the one or more tensor maps to be stored in an asynchronous data movement hardware cache.
Tsuboki ‘429 teaches a storage system in which data can be written into a cache memory, where the transfer of data can be performed with asynchronous timing (see Paragraph 0033).
Sarangapani ‘145, Fauber ‘224, and Tsuboki ‘429 qualify as analogous prior art, as all pertain to the same field of endeavor of utilizing hardware to handle storage and transfer of data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined Sarangapani ‘145 and Fauber ‘224 system as set forth above to have the cache storages transfer data asynchronously, as taught by Tsuboki ‘429. A person of ordinary skill in the art would have been motivated to utilize asynchronous data transfers between memory and cache, as such transfers can proceed independently without waiting on other processes, which is ideal for caches used to fetch / prefetch data from memory so that it is readily available to a processor.
Allowable Subject Matter
Claims 6-7, 13, and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
As to claims 6, 13, and 17, Examiner finds that the prior art of record does not specifically teach the one or more tensor maps including a first tensor map that includes information that indicates a structure of a first tensor stored in a first memory of a graphics processing unit (GPU), and indicates a structure of a second tensor to be stored in a second memory of the GPU based, at least in part, on the first tensor map and the first tensor.
As to claim 7, Examiner finds that the prior art of record does not specifically teach the processor of claim 1, wherein the one or more tensor maps include one or more image-to-column transformations. The closest prior art references that do teach tensors with image-to-column transformations are from the same assignee and share at least one common inventor within one year of the priority date of the instant application.
As to claim 18, Examiner finds that the prior art does not teach the method of claim 14, wherein storing the one or more tensor maps in one or more cache storages includes performing an application programming interface (API) to cause the one or more tensor maps to be stored in an asynchronous data movement hardware cache of a graphics processing unit (GPU) based, at least in part, on one or more addresses of the one or more tensor maps in global memory of the GPU. More specifically, while each element of this claim can be taught individually, the combination of all of these elements together is not taught by a reasonable combination of prior art references.
Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Goyal et al. (US 2017/0316312) teaches a processor used for deep learning, including tensor engines that perform operations for a neural network, where instructions can be stored in RAM and cache and the tensor engines are used for accelerated computations, the system implementing prefetching of data from external memory and invoking an API to call further instructions for the processor to execute.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL SUN whose telephone number is (571)270-1724. The examiner can normally be reached Monday-Friday 8am-4pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL SUN/Primary Examiner, Art Unit 2183