DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. With respect to claims 13-18, Applicant has not complied with one or more conditions for receiving the benefit of an earlier filing date under 35 U.S.C. 120 as follows:
The later-filed application must be an application for a patent for an invention which is also disclosed in the prior application (the parent or original nonprovisional application or provisional application). The disclosure of the invention in the parent application and in the later-filed application must be sufficient to comply with the requirements of 35 U.S.C. 112(a) or the first paragraph of pre-AIA 35 U.S.C. 112, except for the best mode requirement. See Transco Products, Inc. v. Performance Contracting, Inc., 38 F.3d 551, 32 USPQ2d 1077 (Fed. Cir. 1994).
The disclosure of the prior-filed application, Application No. 17080302, fails to provide adequate support or enablement in the manner provided by 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, for one or more claims of this application: claims 13-18. The prior-filed application is silent with respect to the activation output (AO) engine, and does not disclose wherein the DMA engine performs the reading, writing, permuting, or scrambling limitations recited in claims 13-18.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-4, 8, 11-12, and 19-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-7 of U.S. Patent No. 12412081 B2 (reference Patent). Although the claims at issue are not identical, they are not patentably distinct from each other because the respective claims in the reference Patent would anticipate the respective claims in the present application. See, e.g., the representative claim 1 mapping below.
Application No. 17727122, claim 1:
1. A method performed by a processor for permuting dimensions of a multi-dimensional tensor, wherein the multi-dimensional tensor contains an array of tensor values in three or more dimensions that are stored in a first storage unit, the method comprising:
transferring the array of tensor values from the first storage unit to a second storage unit by reading tensor values from the first storage that are arrayed along a first dimension of the multi-dimensional tensor and writing the corresponding tensor values to the second storage in locations corresponding to a second dimension of the multi-dimensional tensor.
Patent No. 12412081, claim 1:
1. A method performed by a processor comprising a plurality of compute engines each of which comprises a programmable engine and a multiply-accumulate engine, each programmable engine operating up to a maximum number of tensor values in a cycle, each programmable engine configured to operate on a slice of data including data from one channel of a multi-dimensional tensor at a time such that channels of the multi-dimensional tensor are parallelized across multiple compute engines, wherein each multiply-accumulate engine is configured to perform matrix multiply operations on received data to generate output data, and a respective programmable engine is configured to receive the output data from the multiply-accumulate engine in order to perform a permutation function on the output data, wherein the method permutes dimensions of the multi-dimensional tensor, which contains an array of tensor values in three or more dimensions that are stored in a first storage unit, the method comprising:
transferring the array of tensor values from the first storage unit to a second storage unit by reading tensor values from the first storage unit that are arrayed along a first dimension of the multi-dimensional tensor and writing the corresponding tensor values to the second storage unit in locations corresponding to a second dimension of the multi-dimensional tensor that is different from the first dimension thereby reordering the tensor values; and a plurality of programmable engines of the plurality of compute engines permuting a pair of dimensions of the multi-dimensional tensor in parallel by sequentially: reading sub-blocks of the multi-dimensional tensor from a local storage, permuting the pair of dimensions of the sub-blocks of the multi-dimensional tensor and writing the permuted sub-blocks to the local storage of the processor, wherein the sub-blocks are read from and written to the local storage using addresses in the local storage so as to re-order the sub-blocks to complete the permutation of the pair of dimensions across the multi-dimensional tensor, wherein the local storage is one of the first storage unit and the second storage unit.
Claim Objections
Claims 1-20 are objected to because of the following informalities.
Claim 1 line 5, claim 11 line 1, claim 19 line 5, and claim 20 line 6 recite “the first storage”. This limitation lacks antecedent basis. Antecedent basis is present for “the first storage unit”. Claims 2-18 inherit the same deficiency as claim 1 based on dependence. Claim 12 inherits the same deficiency as claim 11 based on dependence.
Claim 1 line 6, claim 11 line 2, claim 19 line 7, and claim 20 line 7 recite “the second storage”. This limitation lacks antecedent basis. Antecedent basis is present for “the second storage unit”. Claims 2-18 inherit the same deficiency as claim 1 based on dependence. Claim 12 inherits the same deficiency as claim 11 based on dependence.
Claim 2 lines 1-2 recites “the first storage unit is one of the external storage unit in communication the processor”. This appears to include a typographical error and should possibly recite “the first storage unit is one of the external storage unit in communication with the processor”.
Claim 5 line 4, and claim 6 line 3 recite “the first storage unit and second storage unit”. For antecedent basis this should recite “the first storage unit and the second storage unit”.
Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 3-4, 7-12, and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Tariq et al., US 11416959 B1 (hereinafter “Tariq”).
Regarding claim 1, Tariq teaches the following:
a method performed by a processor for permuting dimensions of a multi-dimensional tensor (fig 1, computing device 102 including first processing unit 104 and second processing unit 106, col 3 line 29 - col 4 line 2, wherein col 3 line 40 describes the GPU or CPU comprising one or more tensor processing units, col 4 lines 1-2 describing the data in a NHWC or NCHW format for multi-dimensional tensor, with col 2 lines 31-47 describing each letter as a dimension in the tensor, col 5 lines 33-37 transpose for permute), wherein the multi-dimensional tensor contains an array of tensor values in three or more dimensions that are stored in a first storage unit (NHWC or NCHW for 4-dimensional tensor, format 1 and format 2 stored in 128, 130 of memory 110, or 132 and 134 stored in memory 114, first format and second format stored for first storage unit and second storage unit respectively, and col 14 lines 50-55 for different formats across memories), the method comprising:
transferring the array of tensor values from the first storage unit to a second storage unit by reading tensor values from the first storage that are arrayed along a first dimension of the multi-dimensional tensor and writing the corresponding tensor values to the second storage in locations corresponding to a second dimension of the multi-dimensional tensor (fig 5, col 14 line 15-30, first format, second format as in col 3 line 67-col 4 line 2 NHWC for along a first dimension and NCHW for along a second dimension, col 5 line 23-41, col 6 lines 24-34).
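For illustration only (not part of the record), the NHWC-to-NCHW layout change that the analysis above maps to the claimed transfer can be sketched with a hypothetical NumPy tensor; each value read along the channel dimension is written to a location indexed by a different dimension order:

```python
import numpy as np

# Hypothetical 4-D tensor in NHWC layout (batch, height, width, channels).
nhwc = np.arange(2 * 3 * 3 * 4).reshape(2, 3, 3, 4)

# Reading values arrayed along one dimension (C) and writing them to
# locations corresponding to another dimension yields the NCHW layout.
nchw = np.transpose(nhwc, (0, 3, 1, 2))

# Every value is preserved; only its storage location changes.
assert nchw.shape == (2, 4, 3, 3)
assert nchw[1, 2, 0, 0] == nhwc[1, 0, 0, 2]
```

The example is a minimal sketch of a layout permutation in general, not a representation of Tariq's specific hardware.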
Regarding claim 3, in addition to the teachings addressed in the claim 1 analysis, Tariq teaches the following:
wherein the processor is at least one of a neural processing unit, a graphics processing unit, a coprocessor, an accelerator and a central processing unit (col 3 line 29-44, neural processing unit, one or more of the various types for coprocessor, Fig 1-104 CPU, 106 GPU, CPU/GPU for accelerator and a CPU, col 3 line 29-44).
Regarding claim 4, in addition to the teachings addressed in the claim 1 analysis, Tariq teaches the following:
wherein the multi-dimensional tensor is a feature map of a neural network (col 5 line 23-41, NHWC image data and/or NCHW image data in neural processing unit as in col 3 line 29-44).
Regarding claim 7, in addition to the teachings addressed in the claim 1 analysis, Tariq teaches the following:
grouping a pair of dimensions of the multi-dimensional tensor before transferring the array of tensor values from the first storage unit to the second storage unit (fig 1, col 14 line 15-31, grouping as NCHW as in format 1 storage unit, before transferring to format 2 storage unit).
Regarding claim 8, in addition to the teachings addressed in the claim 1 analysis, Tariq teaches the following:
wherein the processor comprises one or more programmable engines, wherein the method further comprises the one or more programmable engines permuting a pair of dimensions of the multi-dimensional tensor (fig 1 CPU and GPU for programmable engines, col 5 line 8-41 permute the pair H and C from NHWC to NCHW format).
Regarding claim 9, in addition to the teachings addressed in the claim 7 analysis, Tariq teaches the following:
wherein the one or more programmable engines have a maximum number of tensor values that it can operate on in a cycle, wherein the method comprises the one or more programmable engines sequentially (col 6 lines 24-34, read/write speed, read/write at different times for operate on in a cycle, and col 4 lines 49-51 time stamp, different sizes 64 bit, 128 bit etc. for having a maximum number of tensor values, fig 2 for sequentially, col 5 lines 42-67): reading sub-blocks of the multi-dimensional tensor from a local storage (fig 4, col 2 line 50 - col 12 line 4, col 3 lines 58-62, memory ranges of the vision data for sub-blocks of the multi-dimensional tensor, memory associated with the processor for local storage), permuting the pair of dimensions of the sub-block of the multi-dimensional tensor and writing the permuted sub-blocks to the local storage of the processor (col 5 lines 8-41, transpose for permute, stored in same memory location as cited above), wherein the sub-blocks are read from and written to the local storage using addresses in the local storage so as to re-order the sub-blocks to complete the permutation of the pair of dimensions across the multi-dimensional tensor, wherein the local storage is one of the first storage unit and the second storage unit (col 5 lines 8-64).
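For illustration only (not part of the record), the claimed sub-block sequence - read a sub-block, permute the pair of dimensions, write it back at swapped block addresses so the block re-ordering completes the overall permutation - can be sketched as a hypothetical blocked transpose:

```python
import numpy as np

def blocked_transpose(mat, block=2):
    """Permute a pair of dimensions sub-block by sub-block: each block is
    read, its two dimensions are permuted, and it is written to the
    swapped block address, completing the permutation across the tensor."""
    rows, cols = mat.shape
    out = np.empty((cols, rows), dtype=mat.dtype)
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            sub = mat[i:i + block, j:j + block]    # read a sub-block
            out[j:j + block, i:i + block] = sub.T  # permute and re-order
    return out

m = np.arange(16).reshape(4, 4)
assert np.array_equal(blocked_transpose(m), m.T)
```

The sketch models two dimensions of a multi-dimensional tensor and generic addressing; it is not drawn from either reference's implementation.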
Regarding claim 10, in addition to the teachings addressed in the claim 7 analysis, Tariq teaches the following:
wherein the one or more programmable engines is a plurality of programmable engines, wherein the method comprises two or more of the programmable engines permuting the pair of dimensions of the multi-dimensional tensor in parallel (fig 2 GPU streams, CPU in parallel, col 5 line 42 – line 67).
Regarding claim 11, in addition to the teachings addressed in the claim 7 analysis, Tariq teaches the following:
wherein tensor values are read from the first storage and written to the second storage in stripes of data (col 2 lines 20-30 transfer from CPU to GPU memory, col 2 lines 31-47, allocation of the NHWC format and NCHW format in memory ranges for stripes), wherein the method comprises transferring the array of tensor values from the second storage unit to the first storage unit, wherein transferring the stripe of tensor values from the first storage unit to the second storage unit occurs in parallel with transferring another stripe of tensor values from the second storage unit to the first storage unit (fig 2, col 5 lines 42-64 in parallel, GPU to CPU and vice versa).
Regarding claim 12, in addition to the teachings addressed in the claim 11 analysis, Tariq teaches the following:
further comprising one or more programmable engines permuting a pair of dimensions of a further stripe of the multi-dimensional tensor in parallel with at least one of transferring the stripe of tensor values from the first storage unit to the second storage unit and transferring another stripe of tensor values from the second storage unit to the first storage unit (fig 2 second transfer from CPU 202 to GPU 204, and second transfer from GPU 204 to CPU 202, with permute and transfer as in the claim 11 analysis).
Regarding claim 19, Tariq teaches the following:
a processor for permuting dimensions of a multi-dimensional tensor, wherein the multi-dimensional tensor contains an array of tensor values in three or more dimensions that are stored in a first storage unit (fig 1, computing device 102 including first processing unit 104 and second processing unit 106, col 3 line 29 - col 4 line 2, wherein col 3 line 40 describes the GPU or CPU comprising one or more tensor processing units, col 4 lines 1-2 describing the data in a NHWC or NCHW format for multi-dimensional tensor, with col 2 lines 31-47 describing each letter as a dimension in the tensor, col 5 lines 33-37 transpose for permute), the processor comprising:
a controller configured to control transfer of the array of tensor values from the first storage unit to a second storage unit by reading tensor values from the first storage that are arrayed along a first dimension of the multi-dimensional tensor and writing the corresponding tensor values to the second storage in locations corresponding to a second dimension of the multi-dimensional tensor (fig 3, 324, configured to include memory controller as in col 15 line 9-11, fig 5, col 14 line 15-30, first format, second format as in col 3 line 67-col 4 line 2 NHWC for along a first dimension and NCHW for along a second dimension, col 5 line 23-41, col 6 lines 24-34).
Claim 20 is directed to a non-transitory computer-readable storage medium storing instructions that, when performed by a processor, cause the processor to perform the method as in claim 1. All steps caused to be performed by the non-transitory computer-readable medium as in claim 20 are performed by the method of claim 1. The claim 1 analysis applies equally to claim 20. See also col 6 line 7-23 for Tariq performing the method executed by a non-transitory computer-readable storage medium storing instructions.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2, and 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Tariq in view of US 20230244629 A1 Marcovitch et al., (hereinafter “Marcovitch”).
Regarding claim 2, in addition to the teachings addressed in the claim 1 analysis, Tariq teaches the first storage unit and the second storage unit are internal storage units (fig 1, 128, 130, 132, 134). Tariq does not, however, explicitly disclose wherein one of the first and second storage units is a local storage unit, and the other is an external storage unit. However, in the same field of endeavor, Marcovitch discloses an apparatus similar to Tariq for performing reorganization of tensor data arrays to a different layout ([0002], [0024]). Marcovitch further discloses:
wherein the first storage unit is one of an external storage unit in communication with the processor and a local storage unit of the processor and the second storage unit is the other of the external storage unit in communication with the processor and the local storage unit of the processor (fig 1A data source 108 for second storage unit is an external storage unit, memory 136 for first storage unit is a local storage unit).
It would have been obvious to one of ordinary skill in the art before the effective filing date to include a data storage that is an external data storage and a data storage that is a local data storage to achieve the benefit of operating on data from various networked devices ([0081-0085], [0092]), and to operate on the data using DMA read/write with data in the local data storage ([0108]).
Regarding claim 13, Tariq teaches the claim 1 limitations. Tariq is silent with respect to the processor comprises an activation output (AO) engine. However, in the same field of endeavor, Marcovitch discloses an apparatus similar to Tariq for performing reorganization of tensor data arrays to a different layout ([0002], [0024]). Marcovitch further discloses:
wherein the processor comprises an activation-output (AO) engine, wherein the method further comprises the AO engine permuting a pair of dimensions of the multi-dimensional tensor ([0108], fig 4-128 for AO engine, shuffle for scramble, fig 7A, 7B, [0102]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute the AO engine of Marcovitch performing the permuting of a pair of dimensions for the permuting of dimensions of a multi-dimensional tensor as disclosed by Tariq, to achieve the benefit of enabling all-to-all operations ([0107]). Furthermore, it would have been obvious to one of ordinary skill in the art before the effective filing date to read a tensor slice and perform a data scramble operation as disclosed by Marcovitch, to achieve the benefit of allowing the CPU or GPU to perform the data shuffle operation while performing other tasks ([0003]).
Regarding claim 14, Tariq in view of Marcovitch teach the claim 13 limitations. Marcovitch further discloses:
wherein permuting the pair of dimensions of the multi-dimensional tensor by the AO engine comprises reading, by the AO engine, tensor slices of the multi-dimensional tensor in either a row order or a column order (fig 4 128, [0107-0108]).
In addition to the motivation provided with respect to claim 13, it would have been obvious to one of ordinary skill in the art before the effective filing date to achieve the benefit of allowing the CPU or GPU to perform the data shuffle operation while performing other tasks ([0003]).
Regarding claim 15, Tariq teaches the claim 1 limitations. Tariq is silent with respect to DMA. However, in the same field of endeavor, Marcovitch discloses an apparatus similar to Tariq for performing reorganization of tensor data arrays to a different layout ([0002], [0024]). Marcovitch further discloses:
wherein the processor comprises a direct memory access (DMA) engine ([0097-0099]), wherein the method further comprises the DMA engine permuting a pair of dimensions of the multi-dimensional tensor (fig 4, [0108]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute the DMA engine of Marcovitch performing the permuting of a pair of dimensions for the permuting of dimensions of a multi-dimensional tensor as disclosed by Tariq, to achieve the benefit of enabling all-to-all operations ([0107]). Furthermore, it would have been obvious to one of ordinary skill in the art before the effective filing date to read a tensor slice and perform the permuting operation as disclosed by Marcovitch, to achieve the benefit of allowing the CPU or GPU to perform the permuting operation while performing other tasks ([0003]).
Regarding claim 16, in addition to the teachings addressed in the claim 15 analysis, Tariq does not explicitly disclose reading in either a row order or a column order, or wherein the DMA engine performs the shuffle operation. However, in the same field of endeavor, Marcovitch discloses an apparatus similar to Tariq for performing reorganization of tensor data arrays to a different layout ([0002], [0024]), including a data shuffle unit pointing to data to be shuffled by rows and columns. Marcovitch further discloses:
wherein permuting the pair of dimensions of the multi-dimensional tensor by the DMA engine comprises:
reading, by the DMA engine, a tensor slice of the multi-dimensional tensor in either a row order or a column order ([0102],[0107-0108], fig 7A), or
performing, by the DMA engine, a data scramble operation ([0108], fig 4 shuffle for scramble, fig 7A, 7B, [0102]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to read a tensor slice and perform a data scramble operation as disclosed by Marcovitch, to achieve the benefit of allowing the CPU or GPU to perform the data shuffle operation while performing other tasks ([0003]).
Regarding claim 17, Tariq discloses the claim 1 limitations. Tariq is, however, silent with respect to the processor comprises an activation output (AO) engine or a DMA engine. However, in the same field of endeavor, Marcovitch discloses an apparatus similar to Tariq for performing reorganization of tensor data arrays to a different layout ([0002], [0024]). Marcovitch further discloses:
wherein the processor comprises an activation- output (AO) engine ([0104-0107], fig 4 128 Data Shuffle Unit for activation-output (AO) engine) and a direct memory access (DMA) engine ([0108], fig 4), and wherein permuting dimensions of a multi-dimensional tensor comprises:
reading, by the AO engine, tensor slices of the multi-dimensional tensor in either a row order or a column order (fig 4 128, [0107-0108]); or
reading, by the DMA engine, a tensor slice of the multi-dimensional tensor in either a row order or a column order, or performing, by the DMA engine, a data scramble operation ([0108], fig 4 shuffle for scramble, fig 7A, 7B, [0102]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute the DMA engine or AO engine of Marcovitch performing the permuting of a pair of dimensions for the permuting of dimensions of a multi-dimensional tensor as disclosed by Tariq, to achieve the benefit of enabling all-to-all operations ([0107]). Furthermore, it would have been obvious to one of ordinary skill in the art before the effective filing date to read a tensor slice and perform a data scramble operation as disclosed by Marcovitch, to achieve the benefit of allowing the CPU or GPU to perform the data shuffle operation while performing other tasks ([0003]).
Regarding claim 18, in addition to the teachings addressed in the claim 1 analysis, Tariq does not explicitly disclose reading, by the DMA engine, a tensor slice of the multi-dimensional tensor in either a row order or a column order, or the specific reading and writing of arrays between local and external storage. However, in the same field of endeavor, Marcovitch discloses an apparatus similar to Tariq for performing reorganization of tensor data arrays to a different layout ([0002], [0024]), including a permutation circuit ([0107] data shuffle unit). Marcovitch further discloses:
wherein the processor comprises a direct memory access (DMA) engine and a permutation circuit, wherein the first storage unit is a local storage unit, wherein the second storage unit is an external storage unit in communication with the processor ([0080-0082], [0085], fig 1B data source 108 for second storage unit is an external storage unit in communication with the processor, fig 1B memory 136, buffer 14- for first storage unit is a local storage unit), the method further comprising:
reading, by the DMA engine, a first array of tensor values, from the external storage unit (fig 1B from 108 to 128, [0092-0098], [0108], via data pointer by element size, row, col as in fig 4 412);
writing, by the DMA engine, the first array of tensor values in the local storage unit as a second array of tensor values ([0107-0108], writing from 108 to 128);
reading, by the permutation circuit, the second array of tensor values from the local storage unit (fig 4 128 data shuffle circuit from 408, [0107-0108]);
writing, by the permutation circuit, the second array of tensor values in the local storage unit as a third array of tensor values ([0107], fig 4 308);
reading, by the DMA engine, the third array of tensor values from the local storage unit ([0109] last sentence read request); and
writing, by the DMA engine, the third array of tensor values in the external storage unit as a fourth array of tensor values ([0109] last sentence provide the shuffle data over the network 148, fig 1B 148 connects the data shuffle unit to the external storage unit),
wherein the fourth array of tensor values corresponds to the first array of tensor values having been permuted in at least one dimension, and wherein the permutation is performed by one or both of the DMA engine and the permutation circuit during their respective reading and writing operations (fig 4, [0107-0109]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute the DMA engine and data shuffling unit of Marcovitch performing the DMA reading and writing operations and permuting for the permuting of dimensions of a multi-dimensional tensor as disclosed by Tariq, to achieve the benefit of enabling all-to-all operations ([0107]). Furthermore, it would have been obvious to one of ordinary skill in the art before the effective filing date to read a tensor slice and perform data permute operations as disclosed by Marcovitch, to achieve the benefit of allowing the CPU or GPU to perform the data shuffle operation while performing other tasks ([0003]).
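For illustration only (not part of the record), the claim 18 sequence mapped above - DMA read from external storage, in-place permutation by a permutation circuit in local storage, DMA write back out - can be sketched with arrays standing in for the two storage units and plain copies standing in for DMA transfers:

```python
import numpy as np

# Hypothetical model of the claimed sequence; names are illustrative only.
external = np.arange(12).reshape(3, 4)  # first array, in external storage

local = external.copy()                 # DMA read: external -> local (second array)
local = local.T.copy()                  # permutation circuit: permute a pair of dims (third array)
result = local.copy()                   # DMA write: local -> external (fourth array)

# The fourth array is the first array with two dimensions permuted.
assert np.array_equal(result, external.T)
```

This sketches the data movement pattern in the abstract; the actual references implement it with dedicated DMA and shuffle hardware.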
Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Tariq in view of US 20210035258 A1 Ray et al., (hereinafter “Ray”).
Regarding claim 5, Tariq teaches the claim 4 limitations. Tariq discloses vision data but does not explicitly disclose wherein the multi-dimensional tensor defines a compressed feature map of the neural network or decompressing the compressed feature map and storing the decompressed feature map in a first storage or second storage. However, in the same field of endeavor, Ray discloses an apparatus similar to Tariq including operation on neural network tensor data including operations on sub-matrices ([0373], fig 31A, fig 33A, [0137]). Ray further discloses:
wherein the multi-dimensional tensor defines a compressed feature map of the neural network ([0392]), and wherein the method further comprises:
decompressing the compressed feature map and storing the decompressed feature map in the first storage unit or second storage unit ([0392], fig 35).
It would have been obvious to one of ordinary skill in the art before the effective filing date to include the decompression of a compressed feature map and storing as disclosed by Ray in the method as disclosed by Tariq. It would have been obvious to achieve the benefit of reducing the dimensionality of convolutional layers enabling scaling convolutional neural networks to process large images ([0191]).
Regarding claim 6, Tariq teaches the claim 4 limitations. Tariq discloses vision data but does not explicitly disclose wherein the multi-dimensional tensor defines a compressed feature map of the neural network or compressing the feature map and storing the compressed feature map in a first storage or second storage. However, in the same field of endeavor, Ray discloses an apparatus similar to Tariq including operation on neural network tensor data including operations on sub-matrices ([0373], fig 31A, fig 33A, [0137]). Ray further discloses:
compressing the feature map defined by the multi-dimensional tensor and storing the compressed feature map in the first storage unit or second storage unit ([0392], fig 35).
It would have been obvious to one of ordinary skill in the art before the effective filing date to include the compressing of a feature map and storing as disclosed by Ray in the method as disclosed by Tariq. It would have been obvious to achieve the benefit of reducing the dimensionality of convolutional layers enabling scaling convolutional neural networks to process large images ([0191]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20190042241 A1 Akin discloses an apparatus and method for a tensor permutation engine including a read address generation unit, storage element, and write address generation unit using a direct memory access unit (abstract, [0125]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EMILY E LAROCQUE whose telephone number is (469)295-9289. The examiner can normally be reached 10:00 am - 12:00 pm and 2:00 pm - 8:00 pm ET, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Andrew Caldwell can be reached on 571-272-3701. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EMILY E LAROCQUE/Examiner, Art Unit 2182