DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/24/2025 is being considered by the examiner.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-4 and 21-22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sheaffer (US2012/0047344) in view of Bradford et al. (US 2019/0042248).
With respect to claim 1, Sheaffer teaches accessing a memory array (see Fig.1 and paragraph 18; memory array 155), wherein the memory array includes a plurality of memory banks (see Fig. 1, paragraph 18; data is read from multiple banks), and wherein each column in the memory array is associated with a unique memory bank within the plurality of memory banks (see paragraph 19; each memory row is divided into four banks (e.g., columns 151-154));
receiving a request to transpose a matrix (see paragraphs 18 and 28; permutation operations are rotate operations, for example to perform a matrix transpose operation), wherein the matrix comprises i rows (see paragraph 18-19 and 28; rows 110-140) and j columns (see paragraph 18-19 and 28; columns 151-154);
saving, in the memory array, the matrix (see paragraph 29; a loading instruction on table 171), wherein the saving includes rotating, to the right, each row within the i rows (see paragraph 31-33; rotate data elements from the second row of table 171 to the right by 4 bytes; load the result of the rotation to row 120); and
reading, from the memory array (see paragraph 34; a reading instruction), the diagonal format of the matrix (see paragraphs 35-38; B4, C4, D4, and A4, are read from 4 different banks), wherein the reading includes rotating, to the left, each row within the diagonal format of the matrix (see paragraph 36-38; D2, A2, B2, and C2 are read from 4 different banks and become data 163; data 163 are rotated to the left for 4 bytes (one data element) and become A2, B2, C2, D2 at outgoing data 164), and wherein the reading produces a transpose of the matrix (see paragraph 18; re-ordering apparatus reads data from multiple banks of memory array 155 and permutes the data (e.g., data 163) to produce outgoing data (e.g., data 164). In one embodiment, the permutation operations are rotate operations, for example to perform a matrix transpose operation).
Sheaffer does not explicitly teach wherein the rotating is based on a row index, and wherein the saving results in a diagonal format of the matrix within the memory array.
However, Bradford et al. teaches wherein for each buffered row X of the specified source matrix; cause the row to be written to the load buffer in a diagonal (i.e., matric is stored in diagonal format), starting at a matrix location shifted left by X positions… and executing, by the execution circuitry, a second operation to rotate each row Y of the reorder buffer rightwards by Y positions (i.e., rotation is done based on row index) (see paragraphs 58-60 and 215).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer to include the above mentioned to efficiently execute a matrix transpose instruction with significant speedup and to improve performance by allowing each element of the load rotator to be broadcasted along a single column of the load buffer (see Bradford, paragraphs 36 and 59).
With respect to claim 2, Sheaffer teaches wherein the rotating aligns each element within each row of the matrix with a unique memory bank in the plurality of memory banks.
However, Bradford et al. teaches using a load rotator to rotate each row by X elements, thereby lining up each element with the column of the load buffer in which it will be written (see paragraph 59).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer to include the above mentioned to efficiently execute a matrix transpose instruction with significant speedup and to improve performance by allowing each element of the load rotator to be broadcasted along a single column of the load buffer (see Bradford, paragraphs 36 and 59).
With respect to claim 3, Sheaffer teaches wherein the saving is accomplished within a single cycle of the memory array (see paragraph 34; in one clock cycle, a data element from one bank (from each memory row) is driven onto the corresponding output ban.
With respect to claim 4, Sheaffer teaches wherein the reading is accomplished within a single cycle of the memory array (see paragraph 39; operations 5-8 is performed in a clock cycle each).
With respect to claim 21, Sheaffer teaches a computer program product embodied in a non-transitory computer readable medium for data processing, the computer program product comprising code which causes one or more processors to generate semiconductor logic for (see paragraph 13; apparatuses may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium):
accessing a memory array (see Fig.1 and paragraph 18; memory array 155), wherein the memory array includes a plurality of memory banks (see Fig. 1, paragraph 18; data is read from multiple banks), and wherein each column in the memory array is associated with a unique memory bank within the plurality of memory banks (see paragraph 19; each memory row is divided into four banks (e.g., columns 151-154));
receiving a request to transpose a matrix (see paragraphs 18 and 28; permutation operations are rotate operations, for example to perform a matrix transpose operation), wherein the matrix comprises i rows (see paragraph 18-19 and 28; rows 110-140) and j columns (see paragraph 18-19 and 28; columns 151-154);
saving, in the memory array, the matrix (see paragraph 29; a loading instruction on table 171), wherein the saving includes rotating, to the right, each row within the i rows (see paragraph 31-33; rotate data elements from the second row of table 171 to the right by 4 bytes; load the result of the rotation to row 120); and
reading, from the memory array (see paragraph 34; a reading instruction), the diagonal format of the matrix (see paragraphs 35-38; B4, C4, D4, and A4, are read from 4 different banks), wherein the reading includes rotating, to the left, each row within the diagonal format of the matrix (see paragraph 36-38; D2, A2, B2, and C2 are read from 4 different banks and become data 163; data 163 are rotated to the left for 4 bytes (one data element) and become A2, B2, C2, D2 at outgoing data 164), and wherein the reading produces a transpose of the matrix (see paragraph 18; re-ordering apparatus reads data from multiple banks of memory array 155 and permutes the data (e.g., data 163) to produce outgoing data (e.g., data 164). In one embodiment, the permutation operations are rotate operations, for example to perform a matrix transpose operation).
Sheaffer does not explicitly teach wherein the rotating is based on a row index, and wherein the saving results in a diagonal format of the matrix within the memory array.
However, Bradford et al. teaches wherein for each buffered row X of the specified source matrix; cause the row to be written to the load buffer in a diagonal (i.e., matric is stored in diagonal format), starting at a matrix location shifted left by X positions… and executing, by the execution circuitry, a second operation to rotate each row Y of the reorder buffer rightwards by Y positions (i.e., rotation is done based on row index) (see paragraphs 58-60 and 215).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the program product taught by Sheaffer to include the above mentioned to efficiently execute a matrix transpose instruction with significant speedup and to improve performance by allowing each element of the load rotator to be broadcasted along a single column of the load buffer (see Bradford, paragraphs 36 and 59).
With respect to claim 22, Sheaffer teaches a memory which stores instructions (see Fig. 3 and paragraphs 13 and 45; computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, DVD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs));
one or more processors coupled to the memory wherein the one or more processors (see Fig. 3 and paragraph 45; processor 705 coupled to memories), when executing the instructions which are stored (see paragraphs 13 and 45; apparatuses may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium), are configured to:
access a memory array (see Fig.1 and paragraph 18; memory array 155), wherein the memory array includes a plurality of memory banks (see Fig. 1, paragraph 18; data is read from multiple banks), and wherein each column in the memory array is associated with a unique memory bank within the plurality of memory banks (see paragraph 19; each memory row is divided into four banks (e.g., columns 151-154));
receive a request to transpose a matrix (see paragraphs 18 and 28; permutation operations are rotate operations, for example to perform a matrix transpose operation), wherein the matrix comprises i rows (see paragraph 18-19 and 28; rows 110-140) and j columns (see paragraph 18-19 and 28; columns 151-154);
save, in the memory array, the matrix (see paragraph 29; a loading instruction on table 171), wherein the saving includes rotating, to the right, each row within the i rows (see paragraph 31-33; rotate data elements from the second row of table 171 to the right by 4 bytes; load the result of the rotation to row 120); and
read, from the memory array (see paragraph 34; a reading instruction), the diagonal format of the matrix (see paragraphs 35-38; B4, C4, D4, and A4, are read from 4 different banks), wherein the reading includes rotating, to the left, each row within the diagonal format of the matrix (see paragraph 36-38; D2, A2, B2, and C2 are read from 4 different banks and become data 163; data 163 are rotated to the left for 4 bytes (one data element) and become A2, B2, C2, D2 at outgoing data 164), and wherein the reading produces a transpose of the matrix (see paragraph 18; re-ordering apparatus reads data from multiple banks of memory array 155 and permutes the data (e.g., data 163) to produce outgoing data (e.g., data 164). In one embodiment, the permutation operations are rotate operations, for example to perform a matrix transpose operation).
Sheaffer does not explicitly teach wherein the rotating is based on a row index, and wherein the saving results in a diagonal format of the matrix within the memory array.
However, Bradford et al. teaches wherein for each buffered row X of the specified source matrix; cause the row to be written to the load buffer in a diagonal (i.e., matric is stored in diagonal format), starting at a matrix location shifted left by X positions… and executing, by the execution circuitry, a second operation to rotate each row Y of the reorder buffer rightwards by Y positions (i.e., rotation is done based on row index) (see paragraphs 58-60 and 215).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the system taught by Sheaffer to include the above mentioned to efficiently execute a matrix transpose instruction with significant speedup and to improve performance by allowing each element of the load rotator to be broadcasted along a single column of the load buffer (see Bradford, paragraphs 36 and 59).
Claim(s) 5-6, 9-11 and 13-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sheaffer (US2012/0047344) in view of Bradford et al. (US 2019/0042248) as applied to claim 1 above, and further in view of Guerrero (US 2006/0190517).
With respect to claim 5, Sheaffer and Bradford et al. do not teach wherein the memory array stores two or more matrices.
However, Guerrero teaches wherein the memory array stores two or more matrices (see paragraphs 33-34; memory element 102-1 may arrange media information as a two-dimensional (2D) matrix or array having N rows and M columns… media information may be arranged as one or more matrices).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer and Bradford et al. to include the above mentioned to optimize storage of information (see Guerrero, paragraph 41).
With respect to claim 6, Sheaffer and Bradford et al. do not teach wherein the receiving includes a second matrix.
However, Guerrero teaches wherein the receiving includes a second matrix (see paragraph 41; media processing node 102 may transpose one or more matrices of information).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer and Bradford et al. to include the above mentioned to optimize storage of information (see Guerrero, paragraph 41).
With respect to claim 9, Sheaffer and Bradford et al. do not teach wherein the accessing includes a second memory array.
However, Guerrero teaches wherein the accessing includes a second memory array (see paragraph 34; memory element 102-1 may arrange media information as a two-dimensional (2D) matrix or array having N rows and M columns… media information may be arranged as one or more matrices. Also in paragraph 51; an original matrix may be written on a per row basis and read on a per column basis to transpose data. A subsequent matrix may be written on a per column basis and read on a per row basis to transpose data).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer and Bradford et al. to include the above mentioned to optimize storage of information (see Guerrero, paragraph 41).
With respect to claim 10, Sheaffer and Bradford et al. do not teach wherein the receiving includes a second matrix
However, Guerrero teaches wherein the receiving includes a second matrix (see paragraph 41; media processing node 102 may transpose one or more matrices of information in order to optimize storage of media information).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer and Bradford et al. to include the above mentioned to optimize storage of information (see Guerrero, paragraph 41).
With respect to claim 11, Sheaffer and Bradford et al. do not teach wherein the request includes alternating between the memory array and the second memory array.
However, Guerrero teaches wherein the request includes alternating between the memory array and the second memory array (see paragraph 51; an original matrix may be written on a per row basis and read on a per column basis to transpose data. A subsequent matrix may be written on a per column basis and read on a per row basis to transpose data).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer and Bradford et al. to include the above mentioned to optimize storage of information (see Guerrero, paragraph 41).
With respect to claim 13, Sheaffer and Bradford et al. do not teach dividing the matrix into at least two submatrices.
However, Guerrero teaches dividing the matrix into at least two submatrices (see paragraph 34; media information may be arranged as one or more matrices. Each matrix may, in turn, comprise multiple sub-matrices).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer and Bradford et al. to include the above mentioned to optimize storage of information (see Guerrero, paragraph 41).
With respect to claim 14, Sheaffer and Bradford et al. do not teach wherein the saving and the reading include a first submatrix within the at least two submatrices.
However, Guerrero teaches wherein the saving and the reading include a first submatrix within the at least two submatrices (see paragraphs 53-54; transposing media information 200 may comprise transposing one or more sub-matrices 250).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer and Bradford et al. to include the above mentioned to optimize storage of information (see Guerrero, paragraph 41).
With respect to claim 15, Sheaffer and Bradford et al. do not teach wherein the saving and the reading produce a transpose of the first submatrix.
However, Guerrero teaches wherein the saving and the reading produce a transpose of the first submatrix (see paragraphs 53-54; transposing media information 200 may comprise transposing one or more sub-matrices 250).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer and Bradford et al. to include the above mentioned to optimize storage of information (see Guerrero, paragraph 41).
With respect to claim 16, Sheaffer and Bradford et al. do not teach wherein the saving and the reading include a second submatrix within the one or more submatrices.
However, Guerrero teaches wherein the saving and the reading include a second submatrix within the one or more submatrices (see paragraphs 53-54; transposing media information 200 may comprise transposing one or more sub-matrices 250).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer and Bradford et al. to include the above mentioned to optimize storage of information (see Guerrero, paragraph 41).
With respect to claim 17, Sheaffer and Bradford et al. do not teach wherein the saving and the reading produce a transpose of the second submatrix.
However, Guerrero teaches However, Guerrero teaches wherein the saving and the reading produce a transpose of the first submatrix (see paragraphs 53-54; transposing media information 200 may comprise transposing one or more sub-matrices 250).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer and Bradford et al. to include the above mentioned to optimize storage of information (see Guerrero, paragraph 41).
With respect to claim 18, Sheaffer and Bradford et al. do not teach assembling the transpose of the first submatrix and the transpose of the second submatrix, wherein the assembling comprises a transpose of the matrix.
However, Guerrero teaches assembling the transpose of the first submatrix and the transpose of the second submatrix, wherein the assembling comprises a transpose of the matrix (see paragraph 54; individually transposing one or more sub-matrices 250 may effectuate the transposition of an overall matrix).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer and Bradford et al. to include the above mentioned to optimize storage of information (see Guerrero, paragraph 41).
With respect to claim 19, Sheaffer teaches wherein the memory array comprises at least i columns (see paragraph 18-19 and 28; columns 151-154).
With respect to claim 20, Sheaffer teaches wherein the memory array comprises at least j rows (see paragraph 18-19 and 28; rows 110-140).
Claim(s) 7-8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sheaffer (US2012/0047344), Bradford et al. (US 2019/0042248) and Guerrero (US 2006/0190517) as applied to claims 1 and 5-6 above, and further in view of Van Hook et al. (US 5,812,147).
With respect to claim 7, Sheaffer, Bradford et al. and Guerrero do not teach wherein the saving and the reading include a base address.
However, Van Hook et al. teaches wherein a base field of the instruction are added to an offset field to generate an effective byte address, addr, in memory 102. The contents of the base register represent a byte address in memory 102. Offset is an index based on the size of the data item to be moved and, in one embodiment, is required to be properly weighted before adding to the contents of the base register. If the data item to be moved is not a single byte, offset is then shifted left 1, 2, 3, or 4 bit positions to achieve the proper weighting. The data item starting at the effective byte address (addr) in memory 102 is loaded into the vector register specified by the vt field. A read from data RAM 102 accesses two double word elements, address and adress+8 (see column 5, lines 18-40).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer, Bradford et al. and Guerrero to include the above mentioned to efficiently move data to memory (see Van Hook, column 3, lines 41-46).
With respect to claim 8, Sheaffer, Bradford et al. and Guerrero do not teach indexing, by the base address, between the matrix and the second matrix stored in the memory array.
However, Van Hook et al. teaches wherein a base field of the instruction are added to an offset field to generate an effective byte address, addr, in memory 102. Offset is an index based on the size of the data item to be moved and, in one embodiment, is required to be properly weighted before adding to the contents of the base register... The data item starting at the effective byte address (addr) in memory 102 is loaded into the vector register specified by the vt field. A read from data RAM 102 accesses two double word elements, address and adress+8 (see column 5, lines 18-40 and column 24, lines 9-33).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer, Bradford et al. and Guerrero to include the above mentioned to efficiently move data to memory (see Van Hook, column 3, lines 41-46).
Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sheaffer (US2012/0047344), Bradford et al. (US 2019/0042248) and Guerrero (US 2006/0190517) as applied to claims 1 and 9-11 above, and further in view of Tran et al. (US2021/0191720).
With respect to claim 12, Sheaffer, Bradford et al. and Guerrero do not teach wherein the memory array and the second memory array comprise a streaming matrix transposer.
However, Tran et al. teaches a streaming matrix transposer (see paragraph 189; streaming engine 125 can transpose data elements at a different granularity).
It would have been obvious to a person having ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to have modified the method taught by Sheaffer, Bradford et al. and Guerrero to include the above mentioned because transposed streams avoid the cost of explicitly transforming the data in memory (see Tran, paragraph 144).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Hanounik et al. (US 2003/0084081) teaches transposing a two-dimensional array.
Ross (US 9,952,831) teaches transposing in a matrix-vector processor.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ARACELIS RUIZ whose telephone number is (571)270-1038. The examiner can normally be reached Monday-Friday 11:00am-7:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald G. Bragdon can be reached at (571)272-4204. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ARACELIS RUIZ/ Primary Examiner, Art Unit 2139