DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
This office action is in response to application 19/002,383 filed on 12/26/2024.
Claims 1-20 have been examined.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/26/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-3, 5, 8-11, and 13-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yun et al. “GraNDe: Near-Data Processing Architecture with Adaptive Matric Mapping for Graph Convolutional Networks” and Poremba (US 2024/0143199).
With respect to claim 1, Yun teaches of a memory controller comprising: a plurality of computing units (fig. 2; section 3.2-3.3 on pages 46-47; where each near data processing (NDP) module contains a vector processing unit (PU) and control unit (claimed computing units)),
wherein the plurality of computing units is configured to perform calculations related to a matrix multiplication (fig. 2; abstract; section 3.2-3.3 on pages 46-47; where the vector PU and control unit of the NDP modules perform aggregation which is multiplication of the adjacency and feature matrices); and
a controller configured to: control the plurality of computing units to perform the calculations with respect to the plurality of rows by sequentially inputting the plurality of vectors into the plurality of computing units (sections 3.3-3.4 on page 47; where the aggregation phase is divided into multiple windows. The NDP module of each rank read the adj-bundle and stores it to the adj-bundle buffer where it is fetched by the control units of each NDP module and aggregation is performed. The adjacency matrix/adj-bundle and the feature matrix are multiplied together during the aggregation phase, see section 1 on page 45).
Yun fails to explicitly teach of (1) identify first information indicating a plurality of vectors required for the calculations with respect to a plurality of rows of a matrix, and (2) identify second information indicating at least one row to which each of the plurality of vectors corresponds among the plurality of rows.
However, Poremba teaches of a controller configured to: identify first information indicating a plurality of vectors required for the calculations with respect to a plurality of rows of a matrix (paragraph 13-19; where vectors stored in virtual memory are accessed via computing an offset virtual address that increments the base virtual address by the index value),
identify second information indicating at least one row to which each of the plurality of vectors corresponds among the plurality of rows (paragraph 13-19; where an index value describes a location in the sparse matrix where a non-zero element is disposed. The index is also used to increment the base virtual address of the vector to identify the vectors).
The combination of Yun and Poremba teaches of control the plurality of computing units to perform the calculations with respect to the plurality of rows by sequentially inputting the plurality of vectors into the plurality of computing units based on the first information and the second information (Yun, sections 3.3-3.4 on page 47; Poremba, paragraphs 13-19; where in the combination, the base virtual address, index value, and offset virtual address are used to access the representation of the sparse matrix, and the vectors in the virtual memory of Poremba which correspond to the adjacency matrix and feature matrix of Yun and send them to the control units of the NDP modules for the aggregation).
Yun and Poremba are analogous art because they are from the same field of endeavor, as they are directed to performing matrix operations using processing in memory.
It would have been obvious to one of ordinary skill in the art having the teachings Yun and Poremba before the time of the effective filing of the claimed invention to incorporate the indexing and offset virtual addressing of the matrix components of Yun as taught in Poremba. Their motivation would have been to more efficiently access the matrix (Poremba, paragraph 12-13).
With respect to claim 18, the combination of Yun and Poremba teaches of the limitations cited and described above with respect to claim 1 for the same reasoning as recited with respect to claim 1.
With respect to claim 19, Poremba teaches of a non-transitory computer-readable recording medium having a program for executing the operation method of claim 18 on a computer (paragraph 94-95; where a computer program is incorporated in non-transitory computer-readable storage medium to be executed by a processor to perform the disclosed steps).
Yun and Poremba are analogous art because they are from the same field of endeavor, as they are directed to performing matrix operations using processing in memory.
It would have been obvious to one of ordinary skill in the art having the teachings Yun and Poremba before the time of the effective filing of the claimed invention to incorporate the operations as computer program stored in a storage medium in Yun as taught in Poremba. Their motivation would have been to increase the flexibility of the system.
With respect to claim 20, the combination of Yun and Poremba teaches of the limitations cited and described above with respect to claim 1 for the same reasoning as recited with respect to claim 1.
Yun also teaches of a memory system comprising: a host system comprising a memory device controller (section 3.4 on page 47; where the host (claimed host system including the memory device controller as the host’s commands control the memory devices) issues the appropriate memory requests to the memory for GraNDe to offload the aggregation operations to GraNDe for the NDP modules in the memory to perform the aggregation.); and
a plurality of memory controllers that operate according to commands received from the memory device controller (fig. 2; section 3.2-3.3 on pages 46-47; where each DIMM contains multiple near data processing (NDP) modules that perform the aggregation in response to the commands from the host, see section 3.4 on page 47),
wherein the plurality of memory controllers comprises a first memory controller, wherein the first memory controller comprises: a plurality of computing units configured to perform calculations related to a matrix multiplication (fig. 2; section 3.2-3.3 on pages 46-47; where each near data processing (NDP) module contains a vector processing unit (PU) and control unit (claimed computing units)).
With respect to claim 2, Poremba teaches of wherein the second information includes information about a first row and a second row among the plurality of rows to which a first vector among the plurality of vectors corresponds (paragraph 14-15; where the index value specifies a row of the matrix where the non-zero element is. The vector element is identified by computing an offset virtual address by incrementing a base virtual address by the value of the index), and
wherein the controller is further configured to: input a first vector into: i) a first computing unit, the first computing unit corresponding to the first row, the plurality of computing units comprising the first computing unit, and ii) a second computing unit, the second computing unit corresponding to the second row, the plurality of computing units comprising the second computing unit (paragraph 17-18; where each processing in memory component is tasked with processing a different row of the sparse matrix. This suggests that the PIM components have received the first vector corresponding to the row they have been assigned to process),
control the first computing unit to perform a first calculation for the first row, and control the second computing unit to perform a second calculation for the second row (paragraph 15, 17-18; where each processing in memory component is tasked with processing a different row of the sparse matrix. The processing involves computing the resultant vector of the vector element and the non-zero element).
The reasoning for obviousness is the same as indicated above with respect to claim 1.
With respect to claim 3, Yun teaches of wherein the controller comprises: a first controller that comprises a matrix buffer that stores the plurality of rows (fig. 2; section 3.2 on pages 46-47; where the adj-bundle buffer stores a portion of the adjacency matrix that includes multiple rows).
Poremba teaches of an extractor configured to identify the plurality of vectors and the at least one row to which each of the plurality of vectors corresponds (paragraph 13-19; where the PIM component identifies the index value describing the row for a non-zero element of the sparse matrix and the vector elements are identified by computing an offset virtual address that is increments a base virtual address by the index value
a first queue that stores the first information and a second queue that stores the second information (paragraph 13-19; as the index and address values are identified by the PIM component and then used to access the memory locations, this suggests that there is a memory location or queue that temporarily stores the index and address values),
wherein the first information comprises address information where each of the plurality of vectors is stored in a memory (paragraph 13-19; where the base and offset virtual addresses identify the locations of the vectors), and
wherein the second information comprises index information of at least one row to which each of the plurality of vectors corresponds among the plurality of rows (paragraph 13-19; where the index value identifies non-zero element of the sparse matrix via the row of its location).
The reasoning for obviousness is the same as indicated above with respect to claim 1.
With respect to claim 5, Yun teaches of wherein the controller comprises a second controller configured to perform operations to access data stored in the memory (fig. 2-3; section 3.2-3.3 on pages 46-47; where the DRAM controller generates DRAM commands for the NDP module to access the DRAM memory).
The combination of Yun and Poremba teaches of wherein the first controller sequentially transmits the address information and the index information to the second controller (Yun, fig. 2; section 3.2 on pages 46-47; Poremba, paragraph 13-19; where in the combination the NDP module transmits the addresses and index values of Poremba to the DRAM controller to retrieve the vectors for the aggregation processing), and
wherein the second controller sequentially transmits commands comprising the address information to the memory, sequentially receives information about the plurality of vectors from the memory and transmits the information about the plurality of vectors to the plurality of computing units (Yun, fig. 2; section 3.2 on pages 46-47; Poremba, paragraph 13-19; where the DRAM controller generates the DRAM commands for the NDP operations to retrieve the adj-bundle from the DRAM into the adj-bundle buffer of the NDP module).
The reasoning for obviousness is the same as indicated above with respect to claim 1.
With respect to claim 8, Poremba teaches of wherein the plurality of vectors are identified based on columns corresponding to non-zero elements included in the plurality of rows (paragraph 50-54; where the column index value indicates a non-zero element in the vectors of the sparse matrix), and
wherein at least one first row corresponding to a first vector among the plurality of vectors is identified based on a row of a first non-zero element comprised in a first column corresponding to the first vector (paragraph 50-54; where as the matrix is made up of rows and column, each location has both a row and a column. The column index value identifies non-zero elements in the vectors of the sparse matrix).
The reasoning for obviousness is the same as indicated above with respect to claim 1.
With respect to claim 9, the combination of Yun and Poremba teaches of wherein the controller is further configured to control the plurality of computing units to perform the calculations with respect to the plurality of rows by sequentially inputting the plurality of vectors and non-zero elements into the plurality of computing units based on the first information, the second information and the non-zero elements comprised in the plurality of rows (Yun, fig. 2; section 3.2 on pages 46-47; Poremba, paragraph 13-19; where in the combination the NDP module uses the addresses and index values of Poremba to retrieve the adj-bundle and the feature matrix to the NDP modules for aggregation processing).
With respect to claim 10, Yun teaches of comprising a buffer storing result vectors related to the calculations (fig. 2; section 3.2-3.3 on pages 46-47; where the output buffers in the GP buffers store the output vectors),
wherein, in case that the second information includes information about the first row among the plurality of rows to which the first vector among the plurality of vectors corresponds, a result vector corresponding to the first row is updated based on the calculation in a first computing unit corresponding to the first row among the plurality of computing units (fig. 2; section 3.2-3.3 on pages 46-47; where the output vectors from each output buffers of the NDP modules are read by the host and concatenates them for the previous window and writes the concatenated output vector to the output buffers).
With respect to claim 11, Yun teaches of wherein an order of sequentially inputting the plurality of vectors comprises a second vector being input after a first vector is input (section 3.3 on page 47; where the aggregation phase is broken up into multiple windows and after the first window is aggregated, a second adj-bundle is fetched), and
wherein the controller is further configured to control the plurality of computing units to perform a third calculation based on the first vector and then perform a fourth calculation based on the second vector (section 3.3 on page 47; where the aggregation phase is broken up into multiple windows and after the first window is aggregated, a second adj-bundle is fetched and the NDP modules perform aggregation on the second adj-bundle. The results of which are later combined with the results of the previous window).
With respect to claim 13, Yun teaches of an interface, wherein the interface receives commands to perform the calculations from a host system comprising a memory device controller (section 3.4 on page 47; where the host issues the appropriate memory requests to the memory for GraNDe to offload the aggregation operations to GraNDe. As the host interacts with the NDP modules and memory, there must be an interface between the host and the memory containing the NDP modules).
With respect to claim 14, Yun teaches of wherein the plurality of rows are determined in order for non-zero elements comprised in a set number of rows of the matrix to be placed in an identical column in a maximum number (section 3.2-3.3 on page 46-47; the aggregation phase is broken up into multiple windows based on the size of the output buffer so that it can store all the output feature vectors of the window. The size/rows of the adj-bundle is set such that the output vectors are the appropriate size for the output buffer).
With respect to claim 15, Poremba teaches of wherein the plurality of rows comprises the first row and at least one second row, wherein the at least one second row is determined based on columns corresponding to non-zero elements comprised in the first row (paragraph 50-54; where the sparse matrix is made up of multiple rows and column, each location has both a row and a column identification. The column index value identifies non-zero elements in the vectors of the sparse matrix. Those vectors comprise multiple rows).
The reasoning for obviousness is the same as indicated above with respect to claim 1.
With respect to claim 16, Yun teaches of comprising a buffer, wherein the buffer is configured to store result vectors related to the calculations, wherein the set number is determined based on a size of the buffer (section 3.3 on page 47; where the aggregation phase is broken up into multiple windows based on the size of the output buffer so that it can store all the output feature vectors of the window. Thus, the size/rows of the adj-bundle is based on the output buffer size).
With respect to claim 17, Yun teaches of wherein the memory comprises a dynamic random access memory (DRAM) (section 2 on page 45; where the memory system includes multiple DIMMs each consisting of multiple DRAM devices).
Claim(s) 4 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yun and Poremba as applied to claim 3 above, and further in view of Araki (US 2024/0143695).
With respect to claim 4, the combination of Yun and Poremba fails to explicitly teach of wherein the extractor is configured to identify the plurality of vectors and the at least one row to which each of the plurality of vectors corresponds by sequentially searching for non-zero elements comprised in the plurality of rows for each column in a preset order.
However, Araki teaches of wherein the extractor is configured to identify the plurality of vectors and the at least one row to which each of the plurality of vectors corresponds by sequentially searching for non-zero elements comprised in the plurality of rows for each column in a preset order (paragraph 110; where the index continuation unit searches in the row direction for a non-zero component of the row constituting the work matrix, which is composed of a plurality of columns. When the non-zero component is identified, extracting the column with the non-zero component and adding the k-th extracted column to the third sparse matrix of the predetermined form as the k-th column, for each row, in order from top repeatedly).
Yun, Poremba, and Araki are analogous art because they are from the same field of endeavor, as they are directed to performing matrix operations on sparse matrices.
It would have been obvious to one of ordinary skill in the art having the teachings Yun, Poremba, and Araki before the time of the effective filing of the claimed invention to incorporate the identifying the vectors in the matrices of Yun and Poremba as taught in Araki. Their motivation would have been to more efficiently identify the non-zero elements.
With respect to claim 12, the combination of Yun, Poremba, and Araki teaches of wherein the extractor, in case that the first queue and the second queue are empty (Poremba, paragraph 13-19; when the index and address values have not yet been identified by the PIM components, the locations that would temporarily store them, would be empty),
is configured to identify the plurality of vectors and the at least one row to which each of the plurality of vectors corresponds (Araki, paragraph 110; where the index continuation unit searches the row direction for the non-zero components of the matrix).
The reasoning for obviousness is the same as indicated with respect to claim 4.
Claim 6is/are rejected under 35 U.S.C. 103 as being unpatentable over Yun and Poremba as applied to claim 5 above, and further in view of Chilappagari et al. (US 12,455,746) and Choi et al. (US 2018/0232325).
With respect to claim 6, the combination of Yun and Poremba fails to explicitly teach of wherein a number of times that the commands are sequentially transmitted to the memory is less than a number of non-zero elements comprised in the plurality of rows.
However, Chilappagari teaches of wherein a number of times that the commands are sequentially transmitted to the memory is a number of columns that correspond to non-zero elements comprised in the plurality of rows (claim 1; where the each of the identified plurality of non-zero entries has a corresponding column address and a corresponding row address within the first matrix. For each non-zero entry, the corresponding row address of the non-zero entry is used to identify a corresponding one of the N SIMD engines and a corresponding one of the N output register to process the non-zero entry. As each of the non-zero entries are routed from the memory to it respective n SIMD engine, this suggests to one of ordinary skill in the art that there are the same number of instructions instructing the routing to occur as there are non-zero entries).
Choi teaches of combining a plurality of read commands into one command group (paragraph 6).
The combination of Yun, Poremba, Chilappagari, and Choi suggests wherein a number of times that the commands are sequentially transmitted to the memory is less than a number of non-zero elements comprised in the plurality of rows (Chilappagari, claim 1; Choi, paragraph 6; in the combination the commands to obtain the non-zero elements are combined together).
Yun, Poremba, and Chilappagari are analogous art because they are from the same field of endeavor, as they are directed to performing matrix operations on sparse matrices.
It would have been obvious to one of ordinary skill in the art having the teachings Yun, Poremba, and Chilappagari before the time of the effective filing of the claimed invention to incorporate the instructing the entries to the NDP modules of the combination of Yun, Poremba as taught in Chilappagari. Their motivation would have been to more quickly obtain the matrix data for aggregation.
Yun, Poremba, Chilappagari, and Choi are analogous art because they are from the same field of endeavor, as they are directed to performing matrix operations on sparse matrices.
It would have been obvious to one of ordinary skill in the art having the teachings Yun, Poremba, Chilappagari, and Choi before the time of the effective filing of the claimed invention to incorporate the combining commands into one command group in the combination of Yun, Poremba and Chilappagari as taught in Choi. Their motivation would have been to more efficiently access the memory.
Claim 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yun and Poremba as applied to claim 5 above, and further in view of Chilappagari et al. (US 12,455,746).
With respect to claim 7, the combination of Yun and Poremba fails to explicitly teach of wherein a number of times that the commands are sequentially transmitted to the memory is a number of columns that correspond to non-zero elements comprised in the plurality of rows.
However, Chilappagari teaches of wherein a number of times that the commands are sequentially transmitted to the memory is a number of columns that correspond to non-zero elements comprised in the plurality of rows (claim 1; where the each of the identified plurality of non-zero entries has a corresponding column address and a corresponding row address within the first matrix. For each non-zero entry, the corresponding row address of the non-zero entry is used to identify a corresponding one of the N SIMD engines and a corresponding one of the N output register to process the non-zero entry. As each of the non-zero entries are routed from the memory to it respective n SIMD engine, this suggests to one of ordinary skill in the art that there are the same number of instructions instructing the routing to occur as there are non-zero entries).
Yun, Poremba, and Chilappagari are analogous art because they are from the same field of endeavor, as they are directed to performing matrix operations on sparse matrices.
It would have been obvious to one of ordinary skill in the art having the teachings Yun, Poremba, and Chilappagari before the time of the effective filing of the claimed invention to incorporate the instructing the entries to the NDP modules of the combination of Yun, Poremba as taught in Chilappagari. Their motivation would have been to more quickly obtain the matrix data for aggregation.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C KROFCHECK whose telephone number is (571)272-8193. The examiner can normally be reached on Monday - Friday 8am -5pm, first Friday off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tim Vo can be reached on (571) 272-3642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Michael Krofcheck/Primary Examiner, Art Unit 2138
MICHAEL C. KROFCHECK
Primary Examiner
Art Unit 2138