DETAILED ACTION
This communication is responsive to Application No. 18/210,364, filed on June 15, 2023. Claims 1-20 are pending and are directed toward PARALLEL EXECUTION OF SELF-ATTENTION-BASED AI MODELS.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 6-8 and 15-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. The variables X and Y are not defined or bounded by the claims. For example, if Y=1, the claimed Y-1th transformer decoder layer would be the 0th layer, which does not exist; further, if Y>X, the claimed Yth layer would lie outside the X recited layers. The metes and bounds of the claims therefore cannot be ascertained.
Claim 9 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claim 9 requires "the first operation can be performed in parallel with the second operation, wherein, during a first time period, the systolic array is configured to perform the first operation for a first data sequence in parallel with the self-attention circuit performing the second operation for the first data sequence," which contradicts the limitations of claim 1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 9-14, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Meyer et al. (US 11,803,736, Filed: Jun. 30, 2020), in view of Chowdhery et al. (US 2022/0253672, Pub. Date: Aug. 11, 2022), hereinafter referred to as Meyer and Chowdhery.
As per claim 1, Meyer in view of Chowdhery teaches an integrated circuit (IC) (FIG. 2 illustrates a block diagram of an example of an integrated circuit device; Meyer, Column 1, lines 39-40), comprising:
a systolic array configured to perform a first operation in a layer of an artificial intelligence (AI) model (Such dedicated circuitry can be implemented using an array of processing elements (which may be referred to as a systolic array), where each processing element (PE) contains circuitry to perform multiplication and accumulation operations to implement a matrix multiplication computation. Meyer, Column 2, lines 17-23) that does not use data from previous data sequences (In a systolic array, two types of information may flow into each row of the array: feature map (FMAP) input elements and weight values. The weight values may flow into the array before the actual matrix multiply computation, and are stored in the processing elements (PEs) of the array. Meyer, Column 2, lines 23-27);
Meyer does not teach a self-attention circuit, Chowdhery however teaches a self-attention circuit configured to perform a second operation in the layer of the AI model that does use data from previous data sequences (As one example, when the network input is an input sequence, the attention neural network can include an encoder neural network that includes a subset of the plurality of layers and that encodes the input sequence to generate a respective encoded representation of each input in the sequence. In this example, the attention mechanism applied by the layers in the encoder is a self-attention mechanism, e.g., a multi-head self-attention mechanism. Chowdhery, [0035]).
Meyer and Chowdhery are analogous art to the claimed invention because they are from a similar field of endeavor: systems, components, and methodologies for accelerating neural-network computations in hardware. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Meyer with the teachings of Chowdhery. This would have been desirable because neural networks are machine learning models that employ one or more layers of nonlinear units to predict an output for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters (Chowdhery, [0003]).
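Examiner NOTE (illustrative only; not part of the grounds of rejection): the distinction claim 1 draws between the two operations may be sketched as follows in Python/NumPy. The function and variable names, dimensions, and cache layout are the examiner's hypothetical assumptions and are not drawn from the claims or the cited references.

    import numpy as np

    def position_wise_matmul(x, w):
        # "First operation": a weight-stationary matrix multiply of the kind
        # a systolic array performs; each token row of x is transformed
        # independently, using no data from previous data sequences.
        return x @ w

    def self_attention(q, k_cache, v_cache):
        # "Second operation": attention over keys/values accumulated from
        # previously processed data, i.e., it does use prior data.
        scores = (q @ k_cache.T) / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)
        return weights @ v_cache

    x = np.random.rand(4, 8)       # 4 tokens, hypothetical model width 8
    w = np.random.rand(8, 8)       # stationary weight matrix
    kv = np.random.rand(16, 8)     # hypothetical cache of 16 earlier positions
    print(position_wise_matmul(x, w).shape)  # (4, 8)
    print(self_attention(x, kv, kv).shape)   # (4, 8)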
As per claim 2, Meyer in view of Chowdhery teaches the IC of claim 1, wherein, for a particular data sequence, the first operation must be performed before the second operation (Examiner NOTE: for a particular data sequence, the limitations of claim 1 necessitate the limitation of claim 2 by a causality relation, because the second operation consumes results produced by the first operation for that data sequence.).
As per claim 3, Meyer in view of Chowdhery teaches the IC of claim 2, wherein, during a first time period, the systolic array is configured to perform the first operation for a first data sequence in parallel with the self-attention circuit performing the second operation for a second data sequence (In these examples, the activation engine 216 may be able to perform between 1 and n parallel computations, where n is equal to the number of columns in the processing engine array 210. In some cases, one or more of the computations can be performed simultaneously. Examples of computations that each execution channel can perform include exponentials, squares, square roots, identities, binary steps, bipolar steps, sigmoidals, and ramps, among other examples. Meyer, Column 9, lines 20-28).
As per claim 4, Meyer in view of Chowdhery teaches the IC of claim 3, wherein, during a second time period following the first time period, the self-attention circuit is configured to perform the second operation for the first data sequence in parallel with the systolic array performing a third operation in the layer of the AI model for the second data sequence, wherein the third operation must be performed after the second operation (In these examples, the pooling engine 218 may be able to perform between 1 and n parallel computations, where n is equal to the number of columns in the processing engine array 210. In various examples, execution channels of the pooling engine 218 can operate in parallel and/or simultaneously. Meyer, Column 9, lines 38-42).
As per claim 5, Meyer in view of Chowdhery teaches the IC of claim 4, wherein the first and second data sequences correspond to different inputs or queries made to the AI model (the set of sparse matrix multiplication operations may include concurrently performing multiple sparse matrix multiplications using multiple rows of the array of processing elements to multiply the same set of feature map inputs with different constrained fine-grained sparse weight matrices. An example of an integrated circuit device executing such instructions is shown in PE array 550 of FIG. 5. Meyer, Column 9, lines 9-16).
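Examiner NOTE (illustrative only; not part of the grounds of rejection): the staggered schedule recited in claims 3-5 may be sketched as follows, assuming two independent queries pipelined through the two circuits; the time-period labels and operation names are hypothetical.

    # Hypothetical two-stage pipeline over two independent queries; op1/op2/op3
    # are the first/second/third operations of claims 1-4.
    schedule = [
        # (time period, systolic array, self-attention circuit)
        ("T1", "op1 for seq 1", "op2 for seq 2"),  # claim 3: parallel work on different sequences
        ("T2", "op3 for seq 2", "op2 for seq 1"),  # claim 4: op3 for seq 2 follows its op2 from T1
    ]
    for period, array_work, attention_work in schedule:
        print(f"{period}: systolic array -> {array_work} | self-attention circuit -> {attention_work}")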
As per claim 9, Meyer in view of Chowdhery teaches the IC of claim 1, wherein the first operation can be performed in parallel with the second operation, wherein, during a first time period, the systolic array is configured to perform the first operation for a first data sequence in parallel with the self-attention circuit performing the second operation for the first data sequence (While only a single input 202 is shown in FIG. 2, in practice the sub-layer 200 is configured to process each layer input in the attended input sequence in parallel to generate an output sequence that includes a respective layer output at each of the positions in the attended input sequence. Chowdhery, [0074]).
As per claim 10, Meyer in view of Chowdhery teaches the IC of claim 1, wherein the second operation performed by the self-attention circuit comprises multiplying each row of a token by a different matrix which is based on data computed from previous tokens (Chowdhery, FIG. 5).
Meyer and Chowdhery are analogous art to the claimed invention because they are from a similar field of endeavor: systems, components, and methodologies for accelerating neural-network computations in hardware. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Meyer with the teachings of Chowdhery. This would have been desirable because, when the neural network 150 generates the network output auto-regressively, the input sequence 140 can be (i) embedded representations of the currently generated network output as of the current time step, optionally modified by adding or element-wise multiplying each embedding by a positional embedding, or (ii) embedded representations of a concatenation of a set of encoded representations of the system input 102 and the currently generated network output as of the current time step, optionally separated by one or more separator tokens and further optionally modified by adding or element-wise multiplying each embedding by a positional embedding (Chowdhery, [0049]).
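Examiner NOTE (illustrative only; not part of the grounds of rejection): the claim 10 limitation, as the examiner understands it, may be sketched with a growing key cache: each token's row is multiplied by a matrix rebuilt from data computed for the previous tokens, so the matrix differs at every step. The names and dimensions are hypothetical.

    import numpy as np

    d = 8                                  # hypothetical model width
    w_k = np.random.rand(d, d)             # key projection weights
    k_cache = np.empty((0, d))             # data computed from previous tokens
    for t, token_row in enumerate(np.random.rand(5, d)):
        if t > 0:
            # The matrix applied to this token's row is built from the
            # previous tokens' computed keys, so it changes per token.
            scores = token_row @ k_cache.T
            print(f"token {t}: scores over {scores.shape[0]} prior tokens")
        k_cache = np.vstack([k_cache, token_row @ w_k])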
Claims 11-14 and 18-20 recite limitations similar to those treated in the rejections above, are met by the references as discussed, and are rejected for the same reasons of obviousness.
Claims 6-8 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Meyer et al. (US 11,803,736, Filed: Jun. 30, 2020), in view of Chowdhery et al. (US 2022/0253672, Pub. Date: Aug. 11, 2022), in view of LIANG et al. (US 2022/0300823, Pub. Date: Sep. 22, 2022), hereinafter referred to as Meyer, Chowdhery and LIANG.
As per claim 6, Meyer in view of Chowdhery teaches the IC of claim 3, wherein the AI model comprises performing X number of transformer decoder layers for each of the first and second data sequences, wherein, during the first time period, the first operation corresponds to a Yth transformer decoder layer of the transformer decoder layers for the first data sequence while the second operation corresponds to a Y−1th transformer decoder layer of the transformer decoder layers for the second data sequence (Processing of the input code 142 can include sorting the operations described in the input code 142 into layers, where the outputs of one layer provide the inputs to a next layer. Meyer, Column 5, lines 31-34). Meyer in view of Chowdhery does not teach a decoder; LIANG, however, teaches a decoder layer (the first training stage further comprises feeding the plurality of source feature maps to the decoder to reconstruct the respective data sample of each source feature map, generating a source reconstructed dataset, then calculating a source reconstruction loss. Further, the method adds the source reconstruction loss to the first training stage loss. LIANG, [0011]).
Meyer, Chowdhery, and LIANG are analogous art to the claimed invention because they are from a similar field of endeavor: systems, components, and methodologies for accelerating neural-network computations in hardware. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Meyer in view of Chowdhery with the teachings of LIANG. This would have been desirable because a DNN includes multiple layers of neurons, including an input layer, hidden layers, and an output layer. Each neuron of the layers receives inputs from one or more previous layers, applies a set of weights to the inputs, and combines these weighted inputs to generate an output, which is in turn provided as input to one or more neurons of a subsequent layer (LIANG, [0002]).
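Examiner NOTE (illustrative only; not part of the grounds of rejection): the layer-staggered schedule recited in claim 6 may be sketched as follows, assuming X = 3 transformer decoder layers and two data sequences offset by one layer; the values are hypothetical.

    # Hypothetical schedule: seq 1 runs one layer ahead of seq 2, so in each
    # period seq 1 is in layer Y while seq 2 is in layer Y - 1. Note that
    # Y = 1 would place seq 2 at a nonexistent layer 0 (cf. the 112(b)
    # rejection of claims 6-8 above).
    X = 3
    for period, Y in enumerate(range(2, X + 1), start=1):
        print(f"period {period}: seq 1 in decoder layer {Y} | seq 2 in decoder layer {Y - 1}")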
As per claim 7, Meyer in view of Chowdhery in view of LIANG teaches the IC of claim 6, wherein the AI model comprises decoding operations performed after the X number of transformer decoder layers have been completed (the attention neural network can include a decoder neural network that includes a different subset of the plurality of layers and that processes either the network input or, when the attention neural network also includes the encoder neural network, the encoded representation of the network input to generate the network output. In some of these examples, when the network output is an output sequence, the decoder neural network operates autoregressively and the attention sub-layers within some or all of the layers of the decoder apply masked self-attention over the partially generated output sequence. Chowdhery, [0036]). Meyer in view of Chowdhery does not teach wherein the systolic array is configured to perform decoding operations for the first data sequence during a second time period following the first time period; LIANG, however, teaches wherein the systolic array is configured to: perform decoding operations for the first data sequence during a second time period following the first time period (the first training stage further comprises feeding the plurality of source feature maps to the decoder to reconstruct the respective data sample of each source feature map, generating a source reconstructed dataset, then calculating a source reconstruction loss. Further, the method adds the source reconstruction loss to the first training stage loss. LIANG, [0011]);
perform a third operation in the layer of the AI model for the second data sequence during a third time period following the second time period (In some examples, the first training stage further comprises feeding the source reconstructed dataset to the encoder to generate a plurality of source reconstructed feature maps. The method also feeds the plurality of source reconstructed feature maps to the fully connected layer to predict the class label of each source reconstructed feature map of the plurality of source reconstructed feature maps, then calculates a source reconstruction classification loss. Lastly, the method adds the source reconstruction classification loss to the first training stage loss. LIANG, [0012]); and perform the first operation for a third data sequence during a fourth time period following the third time period, wherein the third data sequence is based on results of the decoding operations (In some aspects, the second training stage further comprises feeding the plurality of support set feature maps to the decoder to reconstruct the respective data sample of each support set feature map, generating a support set reconstructed dataset, then calculating a support set reconstruction loss. Also, the method may add the support set reconstruction loss to the second training stage loss. LIANG, [0014]).
Meyer, Chowdhery, and LIANG are analogous art to the claimed invention because they are from a similar field of endeavor: systems, components, and methodologies for accelerating neural-network computations in hardware. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Meyer in view of Chowdhery with the teachings of LIANG. This would have been desirable because a DNN includes multiple layers of neurons, including an input layer, hidden layers, and an output layer. Each neuron of the layers receives inputs from one or more previous layers, applies a set of weights to the inputs, and combines these weighted inputs to generate an output, which is in turn provided as input to one or more neurons of a subsequent layer (LIANG, [0002]).
As per claim 8, Meyer in view of Chowdhery in view of LIANG teaches the IC of claim 7, wherein the third time period is sufficiently long to flush results from performing the decoding operations from every data processing unit (DPU) in the systolic array (The third stage 140 can operate on the output 138 of the second stage 136, and perform various steps before producing the instructions that are to be executed by the acceleration engine 112. These steps can include, for example, removing redundant dependencies, resolving or handling dependencies between nodes by inserting synchronization instructions into the code, identifying possible optimizations in memory usage or memory bandwidth usage, and other operations. In some examples, the third stage 140 can include a data scheduler 150 that determines how and when input data are loaded into the accelerator engine 112. Meyer, Column 6, lines 4-14).
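Examiner NOTE (illustrative only; not part of the grounds of rejection): the multi-period schedule recited in claims 7-8 may be sketched as follows, assuming the decoding operations produce the next input autoregressively; the period labels and the flush interval are assumptions drawn from the claim language, not from the cited references.

    # Hypothetical timeline: decoding runs after all X decoder layers, seq 3
    # is formed from seq 1's decoded output, and the third period is long
    # enough to drain the decode results from every DPU in the array.
    timeline = [
        ("T2", "systolic array: decoding operations for seq 1 (all X layers complete)"),
        ("T3", "systolic array: op3 for seq 2; long enough to flush decode results from every DPU"),
        ("T4", "systolic array: op1 for seq 3, built from seq 1's decoded results"),
    ]
    for period, event in timeline:
        print(f"{period}: {event}")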
Claims 15-17 recite limitations similar to those treated in the rejections above, are met by the references as discussed, and are rejected for the same reasons of obviousness.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLEG KORSAK, whose telephone number is (571) 270-1938. The examiner can normally be reached from 5:00 AM to 4:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rupal Dharia, can be reached at (571) 272-3880. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OLEG KORSAK/
Primary Examiner, Art Unit 2492