Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.
Claim Rejections - 35 USC § 101
2. 35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 12, 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
As to claims 1, 12, and 18: under Step 2A, Prong 1, the limitations “partitioning, by a computing device, a tensor into a plurality of portions”; “identifying, by the computing device, first portions among the plurality of portions, the plurality of portions including second portions not in the first portions”; and “splitting, by the computing device, each respective portion among the first portions into a plurality of parts” recite a mental process, since “partitioning”, “identifying”, and “splitting” are functions that can reasonably be performed in the human mind, with the aid of pen and paper, through observation, evaluation, judgment, or opinion.
Under Prong 2, the additional elements “generating, by the computing device, a plurality of computing tasks, each of the computing tasks configured to operate based on a respective part among the plurality of parts or a respective portion among the second portions, wherein at least one computing task of the plurality of computing tasks is generated by transforming at least one part of the plurality of parts of the first portions to create a transformed part to prevent external entities from reconstructing the tensor; outsourcing the at least one task to the transformed part to generate a first result of the transformed part” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component, or merely use a generic computer or generic computer components to perform the judicial exception. Accordingly, the additional elements do not integrate the recited judicial exception into a practical application, and the claim is therefore directed to the judicial exception. See MPEP 2106.05(f).
Under Step 2B, the additional elements “generating, by the computing device, a plurality of computing tasks, each of the computing tasks configured to operate based on a respective part among the plurality of parts or a respective portion among the second portions, wherein at least one computing task of the plurality of computing tasks is generated by transforming at least one part of the plurality of parts of the first portions to create a transformed part to prevent external entities from reconstructing the tensor; outsourcing the at least one task to the transformed part to generate a first result of the transformed part” - these limitations have generally been regarded as a mental process, although the computing device could be a generic computer component if the specification describes it as actual computer hardware.
“shuffling, by the computing device, at least the computing tasks in distribution of the computing tasks to a plurality of external entities such that each external entity only receives a subset of the portions and parts that is insufficient to reconstruct the tensor the external entities; and generating, by the computing device, a second result of operating based on the tensor using results, received from the external entities, of the computing tasks.” - this is mere instructions to apply the mental process under MPEP 2106.05(f), amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, and is merely applying the judicial exception; it therefore does not amount to significantly more and cannot provide an inventive concept.
4. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as discussed above with respect to integration of the abstract idea into a practical application. See MPEP 2106.05(d). Thus, the claim is not patent eligible.
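For illustration only, the “shuffling ... in distribution of the computing tasks to a plurality of external entities” limitation quoted above can be pictured with the following minimal Python sketch. The function name `shuffle_and_distribute`, the task labels, the round-robin dealing, and the reading of “insufficient to reconstruct the tensor” as “each entity is missing at least one task” are all assumptions made for this sketch; none of them is taken from the claims or the specification.

```python
import random


def shuffle_and_distribute(tasks, n_entities, seed=None):
    """Shuffle tasks, then deal them round-robin to n_entities (hypothetical helper).

    Assumption: with at least two entities and at least two tasks, every
    entity ends up missing at least one task, which this sketch treats as
    "insufficient to reconstruct the tensor".
    """
    if n_entities < 2 or len(tasks) < 2:
        raise ValueError("need at least two entities and two tasks")
    rng = random.Random(seed)
    shuffled = list(tasks)
    rng.shuffle(shuffled)  # randomize the order in which tasks are dealt out
    assignment = {entity: [] for entity in range(n_entities)}
    for index, task in enumerate(shuffled):
        assignment[index % n_entities].append(task)
    return assignment


# Hypothetical task labels: parts of split (first) portions and unsplit (second) portions.
tasks = ["portion0/part0", "portion0/part1", "portion1", "portion2", "portion3/part0"]
assignment = shuffle_and_distribute(tasks, n_entities=3, seed=7)
```

In this sketch no single entity holds every task, so no entity alone holds everything needed to rebuild the full tensor under the stated assumption.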
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-18 of copending Application No. 17715863. Although the claims at issue are not identical, they are not patentably distinct from each other because both computer systems comprise substantially the same elements. The difference between claims 1, 12, and 17 of the copending application and the present claims is the feature of identifying, by the computing device, first portions among the plurality of portions, the plurality of portions including second portions not in the first portions; splitting, by the computing device, each respective portion among the first portions into a plurality of parts having a sum equal to the respective portion; and a plurality of external entities such that each external entity only receives a subset of the portions and parts that is insufficient to reconstruct the tensor. It would have been obvious to one of ordinary skill in the art to include the above feature because it allows a large tensor to be analyzed by dividing it into smaller subtensors that can be independently decomposed into factors, enabling analysis of tensors representing networks that are much larger than can be handled by any single process.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
7. Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of copending Application No. 17715877. Although the claims at issue are not identical, they are not patentably distinct from each other because both computer systems comprise substantially the same elements. The difference between claims 1, 12, and 18 of the copending application and the present claims is the feature of identifying, by the computing device, first portions among the plurality of portions, the plurality of portions including second portions not in the first portions; splitting, by the computing device, each respective portion among the first portions into a plurality of parts having a sum equal to the respective portion; and a plurality of external entities such that each external entity only receives a subset of the portions and parts that is insufficient to reconstruct the tensor. It would have been obvious to one of ordinary skill in the art to include the above feature because it allows a large tensor to be analyzed by dividing it into smaller subtensors that can be independently decomposed into factors, enabling analysis of tensors representing networks that are much larger than can be handled by any single process.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 2, 12, 13, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Li (US 20200409664 A1) in view of Narayanamoort (US 20200272596 A1), in view of Owechko (US 20200348997 A1), in view of Xu (US 11385875 B2), and further in view of Prabhakar (US 11366783 B1).
As to claim 1, Li teaches A method, comprising: partitioning, by a computing device, a tensor into a plurality of portions for identifying, by the computing device, first portions among the plurality of portions, the plurality of portions including second portions not in the first portions ( decompose the multi-dimensional tensor into blocks of data elements, para[0087], ln 3-6/ where the number of blocks, k, is given by equation 1: where “//” denotes integer or floor division. The compiler may map[identify] each block of data elements of the decomposed tensor to the p partitions (i.e., rows) of the buffer memory with q data elements in each partition[second portions not in the first portions]),
generating, by the computing device, a plurality of computing tasks (the first stage 132 can break the operations of the layer or node down into smaller operations, which can fit into the acceleration engine's local memory and/or can fit into the computing capacity of the acceleration engine 112, para[0026], ln 12-18);
computing tasks configured to operate based on a respective portion among the second portions (to cause the execution engine (i.e., accelerator) to perform the matrix transpose operations on the blocks of the decomposed multi-dimensional tensor, para[0087], ln 7-12),
first portions among the plurality of portions, the plurality of portions including second portions not in the first portions; splitting, by the computing device, each respective portion among the first portions into a plurality of parts having a sum equal to the respective portion( the tensor may be decomposed into multiple blocks of data elements of appropriate size to be stored in the buffer memory. Further, the number of data elements of the decomposed tensor stored in the buffer memory may be too large for the processing engine array to process at one time. Therefore, the block of data elements stored in the buffer memory may be again decomposed to enable processing by the processing engine array, para[0012], ln 9-20/ The number of data elements in a block of a decomposed tensor stored in the buffer memory can be greater than the number of data elements that a processing element array is capable of processing at one time. For example, the buffer memory may have a size of 128 rows by 128 columns while the processing element array may have a size of 128 rows by 64 columns. In such cases, the compiler may further decompose the block of data into two sub-blocks of 128 rows by 64 columns of data. Each sub-block may be separately loaded into and processed by the processing element array, para[0053]);
shuffling, by the computing device, at least the computing tasks in distribution of the computing tasks to external entities (data processing operations[tasks] include a matrix transpose operation[task], para[0086], ln 1-3/ to cause the execution engine (i.e., accelerator) to perform the matrix transpose operations[task] on the blocks[portion] of the decomposed multi-dimensional tensor, para[0087], ln 6-11/ the compiler 130 includes a first stage 132, a second stage 136, and a third stage 140, para[0024], ln 1-3/ The input code 142 can be obtained, for example, from the storage device 106. Alternatively, though not illustrated here, the input code 142 may be located in the processor memory 104 or can be obtained from a network location, using the network interface 110. Processing of the input code 142 can include sorting[shuffling] the operations[tasks] described in the input code 142[tasks] into[shuffling] layers[external entities], para[0025], ln 7-16/ Fig. 1/ The second stage 136 can perform intermediate processing on this output 134. For example, the operations performed in any one layer, or at any one node in a layer, may be too many for the acceleration engine 112 to perform at the same time. The acceleration engine 112 may, for example, have a limited amount[external entities] of local storage space for the data needed for a computation, or the computations may be more than the acceleration engine 112 can perform at one time. In this example, the first stage 132 can break[shuffling] the operations[tasks] of the layer or node down into smaller operations, which can fit into the acceleration engine's local memory[external entities] and/or can fit into the computing capacity of the acceleration engine 112[external entities].
Processing of the output 134 of the first stage 132 can include other steps, such as scheduling, or determining the order[shuffling] in which the acceleration engine 112 and/or processor 102[external entities] will perform operations[tasks], among other examples, para[0026], ln 10-28/ to perform a set of operations, input data on which the operations are to be performed must first be moved[shuffling] into the accelerators 602a-602n[external entities], para[0100], ln 8-15).
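For illustration, the sub-block decomposition quoted from Li para[0053] (a block of 128 rows by 128 columns further decomposed into two sub-blocks of 128 rows by 64 columns) can be sketched in Python as follows. The helper name `split_block_by_columns` and the list-of-rows representation are assumptions made for this sketch; it is not Li's compiler implementation.

```python
def split_block_by_columns(block, max_cols):
    """Split a 2-D block (a list of rows) into sub-blocks of at most max_cols columns.

    Hypothetical helper: models only the column-wise split described in the
    quoted passage, not how the compiler or buffer memory actually works.
    """
    if max_cols <= 0:
        raise ValueError("max_cols must be positive")
    n_cols = len(block[0]) if block else 0
    sub_blocks = []
    for start in range(0, n_cols, max_cols):
        # Take the same column window from every row of the block.
        sub_blocks.append([row[start:start + max_cols] for row in block])
    return sub_blocks


# A 128 x 128 block (buffer-memory size in the quoted example) filled with dummy values.
block = [[r * 128 + c for c in range(128)] for r in range(128)]
# The processing element array in the quoted example is 128 rows x 64 columns.
sub_blocks = split_block_by_columns(block, max_cols=64)
```

Concatenating the two sub-blocks row by row gives back the original block, so the split loses no data elements.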
Narayanamoort teaches each of the computing tasks configured to operate based on a respective part among the plurality of parts or a respective portion among the second portions for the computing tasks in distribution of the computing tasks( divides the first tensor 130 into a plurality of (i.e., four) 2×1 input arrays 132A-132D and the second tensor 140 into a plurality of (i.e., four) 1×2 input arrays 142A-142D. The systolic array circuitry 110 causes the transfer of corresponding ones of the first input arrays 132A-132D and the second input arrays 142A-142D into respective ones of the systolic sub-arrays 112. Each of the systolic sub-arrays 112 performs one or more mathematical operations on the corresponding ones of the first input arrays 132A-132D and the second input arrays 142A-142D, para[0035], ln 6-19).
Narayanamoort further teaches generating, by the computing device, a result of operating based on the tensor using results, received from the external entities, of the computing tasks (the systolic array control circuitry 120 may combine at least a portion of the 2×2 systolic sub-arrays 112 to provide a 2×2 output tensor 150, para[0035], ln 18-23/ All of the multiplication operations occur in one clock cycle, as the input arrays 132, 142 are transferred to the systolic array 110 by the systolic array control circuitry 120. In a second clock cycle, the systolic array circuitry 110 sums the contents of the systolic sub-arrays 112A-112D to generate the output tensor 150, para[0040], ln 11-20).
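As a hedged illustration of the quoted passages of Narayanamoort, the sketch below assumes that the first tensor is 2×4 (divided into four 2×1 input arrays) and the second tensor is 4×2 (divided into four 1×2 input arrays). Each pair of input arrays yields a 2×2 partial product, and the partial products are summed into the 2×2 output tensor. The tensor shapes, the function names, and the example values are assumptions inferred for this sketch; the code models the arithmetic only and not the systolic array circuitry or its clock cycles.

```python
def outer(col, row):
    """Outer product of a column (length m) and a row (length n): an m x n partial result."""
    return [[c * r for r in row] for c in col]


def matmul_by_outer_products(a, b):
    """Compute a @ b by splitting a into columns and b into rows (hypothetical helper)."""
    n_inner = len(b)
    cols = [[a_row[k] for a_row in a] for k in range(n_inner)]  # 2x1 input arrays
    rows = [b[k] for k in range(n_inner)]                       # 1x2 input arrays
    # One partial product per sub-array (the "multiplication" step).
    partials = [outer(cols[k], rows[k]) for k in range(n_inner)]
    # Sum the partial products element-wise (the "summing" step).
    out = [[0] * len(b[0]) for _ in a]
    for partial in partials:
        for i, partial_row in enumerate(partial):
            for j, value in enumerate(partial_row):
                out[i][j] += value
    return out


first = [[1, 2, 3, 4], [5, 6, 7, 8]]        # assumed 2x4 first tensor
second = [[1, 0], [0, 1], [1, 1], [2, 0]]   # assumed 4x2 second tensor
output = matmul_by_outer_products(first, second)  # [[12, 5], [28, 13]]
```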
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Li with Narayanamoort to incorporate the feature of each of the computing tasks configured to operate based on a respective part among the plurality of parts or a respective portion among the second portions for the computing tasks in distribution of the computing tasks, and generating, by the computing device, a result of operating based on the tensor using results, received from the external entities, of the computing tasks, because this improves the usage of the systolic array circuitry, thereby advantageously reducing the number of clock cycles needed to perform a given number of calculations.
Owechko teaches identifying, by the computing device, first portions among the plurality of portions, the plurality of portions including second portions not in the first portions; splitting, by the computing device, each respective portion among the first portions into a plurality of parts having a sum equal to the respective portion( divides a tensor, having a plurality of tensor modes, into a plurality of subtensors, para[0006], ln 7-10/ decomposing[identifying] each subtensor[portion] comprises converting the subtensor into a set of one-dimensional vector signals; using the set of one-dimensional vector signals as inputs to independent component analysis; and extracting the plurality of subtensor mode factors, para[0008]/ As described herein, TTA is an extension of ICAT for cloud (Internet) implementations that can handle very large tensors by dividing the tensor into subtensors and then decomposing each subtensor independently on a separate processor, thereby creating sets of factors for each subtensor, para[0064], ln 1-6/ The concept of analyzing a tensor (element 500) by dividing it into subtensors (element 800) and then decomposing[identifying]the subtensors[portions] (element 800) into subtensor mode factors (element 802), into a plurality of subtensor mode factors, which are one-dimensional vectors that can be combined using the outer-product operation to form a subtensor, is shown in FIG. 8. The intersection of regions of the tensor addressed by nonzero portions of the tensor mode factors define[identifying] a subtensor[portion]. By varying the locations of the subtensor mode factor segments on the tensor mode factor vectors, any subtensor[portion]can be reconstructed[identifying]. 
If one wishes to address only the subtensor shown then the rest of the tensor mode factor vectors must be zeros, as indicated by the dashed portions of the vectors, para[0054], ln 5-23/ The tensor (element 500) is divided into subtensors (elements 900 and 902)[portion], and the subtensors are decomposed separately and independently[ second portions not in the first portions]using ICAT (or conventional tensor decomposition methods), para[0055], ln 23-28/ the decompositions of the subtensors can be performed independently[second portions not in the first portions], it is possible to efficiently utilize multiple processors if[identifying] they are available by assigning[identifying] one or more subtensors[portions] to a processor, Then, each processor decomposes[identifying] each subtensor into mode factors using ICAT or conventional tensor decomposition (element 1204) para[0058], ln 5-11/ coupling the dimensions or modes of separate[second portions not in the first portions] subtensors[portions] in a vectorized representation. When the vectors are separated[second portions not in the first portions] into factors using ICA, para[0059], ln 13-18 ).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Li and Narayanamoort with Owechko to incorporate the feature of identifying, by the computing device, first portions among the plurality of portions, the plurality of portions including second portions not in the first portions; splitting, by the computing device, each respective portion among the first portions into a plurality of parts having a sum equal to the respective portion, because this allows a large tensor to be analyzed by dividing it into smaller subtensors that can be independently decomposed into factors, enabling analysis of tensors representing networks that are much larger than can be handled by any single process.
Xu teaches sending at least one computing task of the plurality of computing tasks which is generated by transforming at least one part of the plurality of parts of the first portions to create a transformed part to prevent external entities from reconstructing the tensor (The machine learning processing system 130 can receive programs 112 from user devices 110 over a data communication network 120, col 5, ln 7-10/ In particular, the compiler 140 includes a reduced-precision propagator 142 that identifies operators in the graph of a program 112 for which the precision of the numerical values on which the operators perform operations and the precision of the numerical values output by the operators can be reduced, col 5, ln 54-60/ The modified version of the program includes each adjusted first operator for which the precision of the numerical values has been adjusted, col 1, ln 50-55/ computer programs, configured to perform the actions of the methods, encoded on computer storage devices, col 1, ln 55-57/ each operator can be configured to receive, as input, one or more input values (e.g., in the form of a tensor) in a particular computer number format which has a corresponding level of precision, col 5, ln 63-67/ When the precision of an operator is reduced, the computer number format of numbers input to and/or output by the operator are adjusted to the reduced-precision format, col 4, ln 23-25/ The reduced-precision propagator 142 can propagate reduced-precision using forward and/or backward propagation. In backward propagation, the reduced-precision propagator 142 can start at the end of the graph (or another appropriate starting point in the graph) and evaluate operators in order from the end of the graph to the beginning of the graph. For each operator, the reduced-precision propagator 142 can determine whether reduced-precision can be propagated to the operator from a downstream operator using one or more propagation rules.
In backward propagation, the propagation rules can include a rule that specifies that, if all uses (e.g., downstream operators) of the output of the operator has an input precision that is lower than the output precision of the operator, the output precision of the operator can be reduced to the highest input precision of the uses of the output. A use of the output of an operator is a downstream operator that uses the output of the operator as an input. For example, a downstream operator for a given operator is an operator that receives, as an input, a tensor output by the given operator directly without being modified by another operator, col 6, ln 55-67 to col 7, ln 1-12/ The compiler 140 can also reduce the precision of numerical values on which operations are performed for at least some operators of a machine learning model, e.g., to reduce data storage and memory requirements and increase the speed at which the processing unit 160 performs the computations. In particular, the compiler 140 includes a reduced-precision propagator 142 that identifies operators in the graph of a program 112 for which the precision of the numerical values on which the operators perform operations and the precision of the numerical values output by the operators can be reduced. The reduced-precision propagator 142 can then modify the identified operators to instead perform operations on reduced-precision values and output reduced-precision values, col 5, ln 49-63).
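For illustration, the backward-propagation rule quoted from Xu (if all uses of an operator's output have an input precision lower than the operator's output precision, the output precision can be reduced to the highest input precision among those uses) can be sketched as below. Representing precision as a bit width, the function name `backward_output_precision`, and the treatment of an operator with no uses are assumptions made for this sketch and are not taken from Xu.

```python
def backward_output_precision(output_bits, use_input_bits):
    """Return the (possibly reduced) output precision of one operator.

    output_bits:     current output precision of the operator, in bits (assumed encoding).
    use_input_bits:  input precisions, in bits, of every downstream operator
                     that consumes this operator's output.
    """
    # Rule applies only when every use accepts a lower precision than is produced.
    if use_input_bits and all(bits < output_bits for bits in use_input_bits):
        # Reduce to the highest precision any use actually needs.
        return max(use_input_bits)
    # Otherwise the output precision is left unchanged (assumption for this sketch).
    return output_bits


reduced = backward_output_precision(32, [16, 8])     # all uses lower: reduce to 16
unchanged = backward_output_precision(32, [16, 32])  # one use still needs 32 bits
```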
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Li, Narayanamoort and Owechko with Xu to incorporate the above feature because this allows larger workloads to be executed by processors that otherwise would not be able to execute the workloads and improves the performance of processors (e.g., hardware processors) that perform machine learning and other numerical computations without negatively affecting the accuracy of the computations.
Prabhakar teaches shuffling, by the computing device, at least the computing tasks in distribution of the computing tasks to a plurality of external entities such that each external entity only receives a subset of the portions and parts that is insufficient to reconstruct the tensor (FIG. 11F is a combination of FIGS. 11D and 11E. FIG. 11F illustrates the order in which the partitions are output by the memory units 1102a, 1102b during the first and second cycles to the reorder units 1103 (e.g., received order comprises partitions 1106a, 1106e, 1106g, 1106c, 1106b, 1106f, 1106d, 1106h), col 22, ln 12-20/ FIG. 11A illustrates a system 1150 including a configurable reorder memory unit to receive partitions from multiple configurable memory units, and to output the received partitions in correct order to consumers 1105 of the partitions. In the system 1150, a tensor 1100 (the full tensor 1100 is not illustrated in FIG. 11A) is partitioned into partitions 1106a, 1106b, 1106c, 1106d, 1106e, 1106f, 1106g, and 1106h, as discussed with respect to FIGS. 6 and 7. The partitions are stored in configurable memory units 1102a and 1102b included in the array of configurable units 190, col 17, ln 31-40/ The reset signal 1111 discussed herein above is also referred to herein as a “read ready signal 1111,” as the reorder unit 1103 transmits[task] the reset signal 1111 after receiving[task] all the partitions associated with the first cycle 1109a[task] of sequence IDs and after transmitting these partitions to the consumers. Thus, once the reorder unit 1103 is ready to read the next batch of partitions[portion] (e.g., the partitions associated with the second cycle 1109b[task] of sequence IDs), the reorder unit 1103 issues the reset signal 1111.
Thus, initially, the reorder unit 1103 reads a first batch of partitions[portion] (the partitions associated with the first cycle 1109a of sequence IDs), reorders and transmits the first batch of partitions[insufficient to reconstruct the tensor] to the consumers in the target order, and then issues the reset signal when the reorder unit 1103 is ready to receive the second batch of partitions, col 22, ln 21-37/ the reorder unit receiving the partition and transferring the partition without reconstructing the tensor as described above/ a plurality of configurable units, each configurable unit having two or more corresponding sections, the plurality of configurable units arranged in a serial arrangement to form a chain of sections of the configurable units; and a data bus connected to the plurality of configurable units which communicates data at a clock rate, wherein the chain of sections is configured to: receive and write a series of tensors at the clock rate at a first end section of the chain of sections, sequentially propagate the series of tensors through individual sections within the chain of sections at the clock rate, such that a first tensor of the series of tensors is written to a first section of the chain of sections at a first clock cycle, and the first tensor is propagated and rewritten to a second section of the chain of sections at a second clock cycle, output the series of tensors at a second end section of the chain of sections, the first end section and the second end section being two opposite end sections of the chain of sections, and also output the series of tensors at an intermediate section of the chain of sections, the intermediate section between the first end section and the second end section of the chain of sections, and wherein each tensor received at the first end section of the chain of sections is output, without modification to corresponding tensor data of a corresponding tensor, at the second end section of the chain of sections and also at the intermediate section of the chain of sections, col 32, ln 21-52).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Li, Narayanamoort, Owechko and Xu with Prabhakar to incorporate the above feature because, to maximize operating efficiency, it may be desirable to time-multiplex programs on the reconfigurable architecture system.
As to claim 2, Narayanamoort teaches the tensor is a second tensor; and the method further comprises: shuffling rows or columns of a first tensor to generate the second tensor( para[0032], ln 1-10) for the same reason as to claim 1 above.
As to claims 12 and 18, they are rejected for the same reason as to claim 1 above. In addition, as to the memory and the at least one microprocessor, see (For example, the support systems 774 can include a microprocessor that coordinates the activities of the acceleration engine 760, including moving data around on the acceleration engine 760…. In some examples, the program executed by the microprocessor is stored on the hardware of microprocessor, or on a non-volatile memory chip in the host system 700, para[0125], ln 3-20); as to the non-transitory computer storage medium, see (on a non-transitory computer readable medium, para[0098], ln 3-5).
As to claim 13, it is rejected for the same reason as to claim 2 above. In addition, Owechko teaches the plurality of portions having a same size (para[0055], ln 23-50) for the same reason as to claim 1 above. Li teaches a tensor of a computation of an artificial neural network (para[0030], ln 12-38).
Claim(s) 3, 4, 5, 6, 7, 14, 15, 16, 19, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Li (US 20200409664 A1) in view of Narayanamoort (US 20200272596 A1), in view of Owechko (US 20200348997 A1), in view of Xu (US 11385875 B2), in view of Prabhakar (US 11366783 B1), and further in view of AVUDAIYAPPA (US 20230244448 A1).
As to claim 3, AVUDAIYAPPA teaches the first portions include a third portion and a fourth portion; the third portion is split into a first number of parts among the plurality of parts; the fourth portion is split into a second number of parts among the plurality of parts; and the first number is different from the second number (In some embodiments, state machine 800 may receive an operation type (e.g., 3×3 filter or 3×4 filter), divide an input tensor into slice, divide a slice into sub-slice, and divide a sub-slice into multiple variable-length cache lines, which may include halos and center pixels, and cache the formed cache lines in the activation cache 103. In various embodiments, sub-slices may be sized based on two things: 1) number of center pixels, which is decided by the number of rows in a MA group and 2) filter size, for example. As mentioned above, cache line size may vary based on the filter dimension. Accordingly, state machine 800 may receive filter dimension information, which dictates the size of the halos, for example. On the other hand, the number of center pixels is dependent on the size of the MA group, para[0030], ln 10-25).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li, Narayanamoort, Owechko, Xu and Prabhakar with AVUDAIYAPPA to incorporate the feature that the first portions include a third portion and a fourth portion; the third portion is split into a first number of parts among the plurality of parts; the fourth portion is split into a second number of parts among the plurality of parts; and the first number is different from the second number, because this allows activations to be efficiently loaded as sub-slices.
As to claim 4, Owechko teaches one part among the first number of parts is the same as one part among the second number of parts (para[0055], ln 19-39) for the same reasons as claim 1 above.
As to claim 5, Owechko teaches one of the second portions is the same as one part among the second number of parts (para[0055], ln 19-34) for the same reasons as claim 1 above.
As to claim 6, Owechko teaches generating a last part among the second number of parts by subtracting, from the fourth portion, a sum of the parts, among the second number of parts, other than the last part (para[0055], ln 2-60).
As to claim 7, Owechko teaches generating, using a random number generator, at least one of the second number of parts (para[0055], ln 56-61).
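For illustration only, the splitting recited in claims 6 and 7 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Owechko or from the application: the function names `split_portion` and `recombine`, the use of integer entries, and the random value range are all hypothetical choices made for the example. All parts but the last are drawn from a random number generator, and the last part is the portion minus the sum of the other parts, so that the parts add back up to the portion.

```python
import random


def split_portion(portion, num_parts, rng=None, low=-100, high=100):
    """Split a 2-D integer portion into num_parts additive parts.

    Hypothetical helper: the first num_parts - 1 parts are random, and the
    last part is the portion minus the sum of the other parts.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    rng = rng or random.Random()
    rows, cols = len(portion), len(portion[0])

    # Random parts (claim 7: generated using a random number generator).
    parts = [
        [[rng.randint(low, high) for _ in range(cols)] for _ in range(rows)]
        for _ in range(num_parts - 1)
    ]

    # Last part (claim 6: portion minus the sum of the other parts).
    last = [
        [portion[r][c] - sum(p[r][c] for p in parts) for c in range(cols)]
        for r in range(rows)
    ]
    parts.append(last)
    return parts


def recombine(parts):
    """Element-wise sum of the parts, which recovers the original portion."""
    rows, cols = len(parts[0]), len(parts[0][0])
    return [
        [sum(p[r][c] for p in parts) for c in range(cols)]
        for r in range(rows)
    ]
```

In this sketch, calling `split_portion` with different `num_parts` values for different portions corresponds to the third and fourth portions being split into different numbers of parts, and `recombine` applied to the returned parts yields the original portion.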
As to claim 11, Owechko teaches the plurality of portions have a same size (para[0055], ln 23-50) for the same reasons as claim 1 above.
As to claim 14, AVUDAIYAPPA teaches the plurality of portions include a third portion and a fourth portion; the third portion is split into a first number of parts among a plurality of parts generated from the plurality of portions; the fourth portion is split into a second number of parts among the plurality of parts; and the first number is different from the second number (para[0030], ln 10-25) for the same reasons as claim 3 above.
As to claims 15 and 16, they are rejected for the same reasons as claims 4, 7, 2, and 11 above.
As to claim 19, AVUDAIYAPPA teaches the tensor is a second tensor; and the method further comprises: shuffling rows and columns of a first tensor to generate the second tensor; wherein some of the plurality of portions have different sizes (para[0030], ln 10-25).
As to claim 20, AVUDAIYAPPA teaches some of the portions are split into different numbers of parts (para[0030], ln 10-25) for the same reasons as claim 3 above.
Allowable Subject Matter
Claims 8, 9, 10, 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
If claims 8, 9, 10, and 17 are rewritten in independent form including all of the limitations of the base claim and any intervening claims, that combination would overcome the 35 U.S.C. 101 rejection.
Reason for allowance
Li (US 20200409664 A1) in view of Narayanamoort (US 20200272596 A1) and further in view of Owechko (US 20200348997 A1) does not teach: generating a list of unique parts from parts of the first portions, wherein the plurality of computing tasks are no more than operating on the list of unique parts and the second portions; the method of claim 6, further comprising removing duplicative parts from parts of the first portions to generate the plurality of computing tasks; or transforming a part of the first portions to generate a corresponding task among the plurality of computing tasks, wherein the transforming includes offsetting, bit-wise shifting, adding a constant, multiplying by a constant, or homomorphic encryption, or any combination thereof, as recited in claims 8-10.
Li (US 20200409664 A1) in view of Narayanamoort (US 20200272596 A1) and further in view of Owechko (US 20200348997 A1) does not teach the at least one microprocessor is further configured via the instructions to: generate a list of unique parts from parts of the plurality of portions, wherein a count of the plurality of computing tasks is equal to a count of the unique parts; and transform, via offsetting, bit-wise shifting, adding a constant, multiplying by a constant, or homomorphic encryption, or any combination thereof, a part of the plurality of portions to generate a corresponding task among the plurality of computing tasks, as recited in claim 17.
Response to the argument:
A. Applicant's amendment and arguments filed on 04/07/2026 have been considered but are not persuasive:
Applicant argued in substance that:
(1) “the claims are not merely directed to the alleged abstract idea and even if construed to be directed to the alleged abstract idea, include claim recitations that amount to significantly more than the purported abstract idea and the subject matter of the claims are integrated into a practical application.”
(2) “Owechko and Xu, fail to disclose ‘shuffling, by the computing device, at least the computing tasks in distribution of the computing tasks to a plurality of external entities such that each external entity only receives a subset of the portions and parts that is insufficient to reconstruct the tensor.’”
B. Examiner respectfully disagrees with Applicant's remarks:
As to point (1), claims 1, 12, and 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Under Step 2A, Prong 1, the limitations “partitioning, by a computing device, a tensor into a plurality of portions”; “identifying, by the computing device, first portions among the plurality of portions, the plurality of portions including second portions not in the first portions”; and “splitting, by the computing device, each respective portion among the first portions into a plurality of parts” recite a mental process, since “partitioning”, “identifying”, and “splitting” are functions that can reasonably be performed in the human mind, with the aid of pen and paper, through observation, evaluation, judgment, and opinion.
Under Prong 2, the additional element “generating, by the computing device, a plurality of computing tasks, each of the computing tasks configured to operate based on a respective part among the plurality of parts or a respective portion among the second portions, wherein at least one computing task of the plurality of computing tasks is generated by transforming at least one part of the plurality of parts of the first portions to create a transformed part to prevent external entities from reconstructing the tensor; outsourcing the at least one task to the transformed part to generate a first result of the transformed part” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component, or merely uses a generic computer or generic computer components to perform the judicial exception. Accordingly, the additional elements do not integrate the recited judicial exception into a practical application, and the claim is therefore directed to the judicial exception. See MPEP 2106.05(f).
Under Step 2B, the additional elements “generating, by the computing device, a plurality of computing tasks, each of the computing tasks configured to operate based on a respective part among the plurality of parts or a respective portion among the second portions, wherein at least one computing task of the plurality of computing tasks is generated by transforming at least one part of the plurality of parts of the first portions to create a transformed part to prevent external entities from reconstructing the tensor; outsourcing the at least one task to the transformed part to generate a first result of the transformed part” generally amount to a mental process, although the computing device could be a generic computer component if the specification describes it as actual computer hardware.
The limitations “shuffling, by the computing device, at least the computing tasks in distribution of the computing tasks to a plurality of external entities such that each external entity only receives a subset of the portions and parts that is insufficient to reconstruct the tensor; and generating, by the computing device, a second result of operating based on the tensor using results, received from the external entities, of the computing tasks” are mere instructions to apply the mental process under MPEP 2106.05(f), amount to merely generally linking the use of the judicial exception to a particular technological environment or field of use, and merely apply the judicial exception. Therefore, they do not amount to significantly more and cannot provide an inventive concept.
As to point (2), Prabhakar teaches the following:
FIG. 11F is a combination of FIGS. 11D and 11E. FIG. 11F illustrates the order in which the partitions are output by the memory units 1102a, 1102b during the first and second cycles to the reorder units 1103 (e.g., received order comprises partitions 1106a, 1106e, 1106g, 1106c, 1106b, 1106f, 1106d, 1106h) (col 22, ln 12-20).
FIG. 11A illustrates a system 1150 including a configurable reorder memory unit to receive partitions from multiple configurable memory units, and to output the received partitions in correct order to consumers 1105 of the partitions. In the system 1150, a tensor 1100 (the full tensor 1100 is not illustrated in FIG. 11A) is partitioned into partitions 1106a, 1106b, 1106c, 1106d, 1106e, 1106f, 1106g, and 1106h, as discussed with respect to FIGS. 6 and 7. The partitions are stored in configurable memory units 1102a and 1102b included in the array of configurable units 190 (col 17, ln 31-40).
The reset signal 1111 discussed herein above is also referred to herein as a “read ready signal 1111,” as the reorder unit 1103 transmits[task] the reset signal 1111 after receiving[task] all the partitions associated with the first cycle 1109a[task] of sequence IDs and after transmitting these partitions to the consumers. Thus, once the reorder unit 1103 is ready to read the next batch of partitions[portion] (e.g., the partitions associated with the second cycle 1109b[task] of sequence IDs), the reorder unit 1103 issues the reset signal 1111. Thus, initially, the reorder unit 1103 reads a first batch of partitions[portion] (the partitions associated with the first cycle 1109a of sequence IDs), reorders and transmits the first batch of partitions[insufficient to reconstruct the tensor] to the consumers in the target order, and then issues the reset signal when the reorder unit 1103 is ready to receive the second batch of partitions (col 22, ln 21-37). That is, the reorder unit receives the partitions and transfers them without reconstructing the tensor, as described above.
A plurality of configurable units, each configurable unit having two or more corresponding sections, the plurality of configurable units arranged in a serial arrangement to form a chain of sections of the configurable units; and a data bus connected to the plurality of configurable units which communicates data at a clock rate, wherein the chain of sections is configured to: receive and write a series of tensors at the clock rate at a first end section of the chain of sections, sequentially propagate the series of tensors through individual sections within the chain of sections at the clock rate, such that a first tensor of the series of tensors is written to a first section of the chain of sections at a first clock cycle, and the first tensor is propagated and rewritten to a second section of the chain of sections at a second clock cycle, output the series of tensors at a second end section of the chain of sections, the first end section and the second end section being two opposite end sections of the chain of sections, and also output the series of tensors at an intermediate section of the chain of sections, the intermediate section between the first end section and the second end section of the chain of sections, and wherein each tensor received at the first end section of the chain of sections is output, without modification to corresponding tensor data of a corresponding tensor, at the second end section of the chain of sections and also at the intermediate section of the chain of sections (col 32, ln 21-52).
Conclusion
US 20210256386 A1: This effectively condenses information from neighboring frequencies into a lower dimensionality representation. A size of at least one of the first and second dimensions may be reduced within a convolutional group without use of a pooling operation.
US 9092802 B1: achieved quickly and accurately, without the need for substantial efforts on their part.
US 20200042867 A1: In each dimension, tasks may be divided into many groups. Each group may be further divided into several waves. In one embodiment, an architecture, a first engine, the frontal engine (FE) receives 5D tensors [N, K, C, Y, X] from the host. The frontal engine may divide them into many sets of tensors, such as [Ng, Kg, Cg, Yg, Xg], and send these groups to the parietal engine (PE). The PE may obtain the group tensor and divides them into waves.
US 20210264250 A1: The pooling unit includes a cropper configured to receive a feature tensor and to generate a cropped feature tensor including a plurality of data values by cropping the feature tensor.
US 20230100036 A1: fore divide or delineate the tensor 105 into a set of sub-tensors 115A-D (collectively sub-tensors 115). In the illustrated example, the system divides the tensor 105 into four sub-tensors (one for each quadrant of the input tensor 105). In various aspects, however, the tensor 105 may be divided into any number of sub-tensors.
US 20220108156 A1: The values in the rest of the blocks will be forced to zero. In partitioned K-winner, for a given output activation tensor, the processor may divide the output activation tensor into partitions. For each partition, the processor may select a fixed number of highest values in the partition as the winners. In this context, if K-winner is applied to an entire tensor, the K-winner approach may be referred to as a global K-winner.
US 20220360778 A1: according to embodiments single-tree partition may preserve the full and/or complete kernel tensor without dividing it into sub-groups (e.g., sub-tensors). In a case involving non-single-tree partition, then a kernel tensor may be (e.g., has been) divided into sub-groups (e.g., sub-tensors) in those non-single tree partitions before the end.
US 20130339414 A1: teaches denominator are evaluated separately and then divided, or a more geometric method is obtained by performing the division before each linear interpolation operation of the recursive process.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LECHI TRUONG whose telephone number is (571)272-3767. The examiner can normally be reached 10-8 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Young Kevin, can be reached at (571)270-3180. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LECHI TRUONG/ Primary Examiner, Art Unit 2194