DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/29/2023 and 02/04/2026 was filed. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 rejected under 35 U.S.C. 101 because the claimed invention is directed to an
abstract idea without significantly more.
Regarding claim 1,
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Claim 1 is directed to manufacture (computer program product).
Step 1: yes
Step 2A, prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
selecting a sequence concatenation length of a sequence concatenation of concatenated tokens from inputs to process in the accelerator; (limitation is directed to a mental process. One can mentally select a sequence length by use of pen and paper with respect to the tokens from the input.)
determining combinations of input lengths, wherein each combination of input lengths sum to the sequence concatenation length; (limitation is directed to a mental process. One can mentally determine combination of lengths by use of pen and paper with respect to the sum of input lengths.)
processing a representation of a transformer to generate program binaries for each combination of input lengths of the combinations, wherein the program binaries, for a given combination of input lengths of the combinations,…and compute attention scores between tokens within each input of the inputs in the sequence concatenation; and (limitation is directed to a mental process. One can mentally process the representation of the transformer by use of pen and paper with respect to given combinations of lengths and tokens being decomposed.)
Step 2A, prong 1: If claim limitations, under their broadest reasonable interpretation, covers performance of the limitations as a mental process but for the recitation of generic computer components, then it falls within the mental process grouping of abstract ideas. According, the claim “recites” an abstract idea.
Step 2A, prong 2: Does the claim recite additional elements that integrate the judicial exception into a practical application?
generating program binaries to program a transformer in an array of processing elements in an accelerator, the computer program product comprising a computer readable storage medium having computer readable program code embodied therein that is executable to perform operations, the operations comprising: (e.g., mere instructions to apply the judicial exception using generic computing components, (see MPEP 2106.05(f)).
program the accelerator to decompose tokens of inputs in the sequence concatenation according to the input lengths in the given combination (e.g., mere instructions to apply the judicial exception using generic computing components, (see MPEP 2106.05(f)).
executing program binaries, for a combination of input lengths, to process a sequence concatenation of received inputs, having input lengths of the combination, to generate outputs for tokens in the processed sequence concatenation. (e.g., mere instructions to apply the judicial exception using generic computing components, (see MPEP 2106.05(f)).
Step 2A, prong 2: Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
generating program binaries to program a transformer in an array of processing elements in an accelerator, the computer program product comprising a computer readable storage medium having computer readable program code embodied therein that is executable to perform operations, the operations comprising: (e.g., mere instructions to apply the judicial exception using generic computing components, (see MPEP 2106.05(f)).
program the accelerator to decompose tokens of inputs in the sequence concatenation according to the input lengths in the given combination (e.g., mere instructions to apply the judicial exception using generic computing components, (see MPEP 2106.05(f)).
executing program binaries, for a combination of input lengths, to process a sequence concatenation of received inputs, having input lengths of the combination, to generate outputs for tokens in the processed sequence concatenation. (e.g., mere instructions to apply the judicial exception using generic computing components, (see MPEP 2106.05(f)).
Step 2B: Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claim 2,
Claim 2 incorporates the analysis of the product of claim 1.
Step 2A, prong 1:
determining an optimal sequence concatenation length that when input to the transformer maximizes throughput within a latency constraint, wherein the selected sequence concatenation length comprises the optimal sequence concatenation length. (limitation is directed to a mental process. One can mentally determine an optimal sequence length by use of pen and paper with respect to the transformer maximizing the input with a latency constraint.)
Step 2A, prong 1: If claim limitations, under their broadest reasonable interpretation, covers performance of the limitations as a mental process but for the recitation of generic computer components, then it falls within the mental process grouping of abstract ideas. According, the claim “recites” an abstract idea.
Regarding claim 3,
Claim 2 incorporates the analysis of the product of claim 1.
Step 2A, prong 1:
determining a distribution of occurrences of different input lengths of received inputs, wherein the combinations of input lengths in the combinations comprises the input lengths having a greatest number of occurrences in the distribution of occurrences. (limitation is directed to a mental process. One can mentally determine a distribution of occurrences by use of pen and paper with respect to the combination of input lengths having the greatest occurrence.)
Step 2A, prong 1: If claim limitations, under their broadest reasonable interpretation, covers performance of the limitations as a mental process but for the recitation of generic computer components, then it falls within the mental process grouping of abstract ideas. According, the claim “recites” an abstract idea.
Regarding claim 4 and analogous claims 12 and 17,
Claim 4 incorporates the analysis of the product of claim 1.
Step 2A, prong 1:
forming the sequence concatenation from a plurality of tokens from the received inputs,…,having input lengths summing to the sequence concatenation length. (limitation is directed to a mental process. One can mentally form the sequence concatenation by use of pen and paper with respect to the sum of the input lengths.)
Step 2A, prong 1: If claim limitations, under their broadest reasonable interpretation, covers performance of the limitations as a mental process but for the recitation of generic computer components, then it falls within the mental process grouping of abstract ideas. According, the claim “recites” an abstract idea.
Step 2A, prong 2/ Step2B:
for processing by the transformer, (e.g., mere instructions to apply the judicial exception using generic computing components, (see MPEP 2106.05(f)).
Step 2A, prong 2: Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
Step 2B: Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claim 5 and analogous claims 13 and 18,
Claim 5 incorporates the analysis of the product of claim 1.
Step 2A, prong 1:
in response to the input lengths of the received inputs not comprising input lengths in one of the combinations of input lengths, performing: determining a combination of input lengths closest to the input lengths of the received inputs; (limitation is directed to a mental process. One can mentally determine a combination of lengths by use of pen and paper with respect to the input lengths received.)
padding the received inputs to have the input lengths of the determined combination; and (limitation is directed to a mental process. One can mentally pad received inputs by use of pen and paper with respect to input lengths determined for combination.)
forming the sequence concatenation from the padded received inputs, having input lengths summing to the sequence concatenation length. (limitation is directed to a mental process. One can mentally form the sequence concatenation from padded inputs by use of pen and paper with respect to the sum of the input lengths.)
Step 2A, prong 1: If claim limitations, under their broadest reasonable interpretation, covers performance of the limitations as a mental process but for the recitation of generic computer components, then it falls within the mental process grouping of abstract ideas. According, the claim “recites” an abstract idea.
In regard to claim 6 and analogous claims 14 and 19,
Claim 6 incorporates the analysis of the product of claim 1.
Step 2A, prong 1:
determining a combination having input lengths corresponding to input lengths of tokens in the received inputs to include in the sequence concatenation; (limitation is directed to a mental process. One can mentally determine a combination of input lengths by use of pen and paper with respect to the length of the input tokens.)
determining program binaries of the program binaries generated for the determined combination; and (limitation is directed to a mental process. One can mentally determine program binaries by use of pen and paper with respect to the determined combination.)
Step 2A, prong 1: If claim limitations, under their broadest reasonable interpretation, covers performance of the limitations as a mental process but for the recitation of generic computer components, then it falls within the mental process grouping of abstract ideas. According, the claim “recites” an abstract idea.
Step 2A, prong 2/Step 2B:
executing the determined program binaries to control the transformer to transform the tokens in the received inputs in the sequence concatenation to generate the outputs for the received inputs in the sequence concatenation. (e.g., mere instructions to apply the judicial exception using generic computing components, (see MPEP 2106.05(f)).
Step 2A, prong 2: Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
Step 2B: Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
In regard to claim 7,
Claim 7 incorporates the analysis of the product of claim 1.
Step 2A, prong 2/Step 2B:
programming the transformer to perform sequence-parallel matrix multiplications on the sequence concatenation as part of encoding and decoding of the sequence concatenation. (e.g., mere instructions to apply the judicial exception using generic computing components, (see MPEP 2106.05(f)).
Step 2A, prong 2: Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
Step 2B: Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
In regard to claim 8 and analogous claims 15 and 20,
Claim 8 incorporates the analysis of the product of claim 1.
Step 2A, prong 1:
performing attention analysis of the sequence concatenation of tokens to calculate attention scores between pairs of tokens within each of the received inputs, wherein attention scores are not determined between tokens from different received inputs; (limitation is directed to a mental process. One can mentally calculate attention scores by use of pen and paper with respect to tokens from received inputs.)
forming an attention matrix, for the sequence concatenation, having the calculated attention scores for pairs of tokens within the received inputs of the sequence concatenation; and (limitation is directed to a mental process. One can mentally form an attention matrix by use of pen and paper with respect to the calculated attention scores.)
Step 2A, prong 1: If claim limitations, under their broadest reasonable interpretation, covers performance of the limitations as a mental process but for the recitation of generic computer components, then it falls within the mental process grouping of abstract ideas. According, the claim “recites” an abstract idea.
Step 2A, prong 2/Step 2B:
applying the attention matrix in matrix multiplications to produce the outputs for the received inputs. (e.g., mere instructions to apply the judicial exception using generic computing components, (see MPEP 2106.05(f)).
Step 2A, prong 2: Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
Step 2B: Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
In regard to claim 9,
Claim 9 incorporates the analysis of the product of claim 8.
Step 2A, prong 2/Step 2B:
wherein the attention matrix comprises a sparse matrix. (e.g., merely indicating a field of use or technological environment in which to apply a judicial exception (see MPEP 2106.05(h)).
Step 2A, prong 2: Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
Step 2B: Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
In regard to claim 10,
Claim 10 incorporates the analysis of the product of claim 8.
Step 2A, prong 1:
generating an attention mask for the attention matrix indicating cells in the attention matrix having the attention scores and cells not having attention scores, wherein the attention mask is used to process only cells having calculated attention scores. (limitation is directed to a mental process. One can mentally determine an attention mask by use of pen and paper with respect to the attention scores.)
Step 2A, prong 1: If claim limitations, under their broadest reasonable interpretation, covers performance of the limitations as a mental process but for the recitation of generic computer components, then it falls within the mental process grouping of abstract ideas. According, the claim “recites” an abstract idea.
Regarding claim 11,
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Claim 11 is directed to a machine (system).
Step 1: yes
Step 2A, prong 2/Step 2B:
an accelerator; a processor; and a computer readable storage medium having computer readable program code embodied therein that is when executed by the processor performs operations the operations comprising: (e.g., mere instructions to apply the judicial exception using generic computer components, (see MPEP 2106.05(f)).
Step 2A, prong 2: Since the claim as a whole, looking at the additional elements individually and
in combination, does not contain any other additional elements that are indicative of integration into a
practical application, the claim is directed to an abstract idea.
Step 2B: Considering the additional elements individually and in combination, and the claim as a
whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the
claim is not patent eligible.
The rest of the analysis for claim 11 is analogous to claim 1.
Regarding claim 16,
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Claim 16 is directed to a process (method).
Step 1: yes
The rest of the analysis for claim 16 is analogous to claim 1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2, 4, 7-8, 10-12, 15-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Burr et al (US Published Patent Application No. 20230169305, "Burr"), in view of Park et al (OPTIMUS: OPTIMIZED MATRIX MULTIPLICATION STRUCTURE FOR TRANSFORMER NEURAL NETWORK ACCELERATOR, "Park").
In regard to claim 1, Burr teaches A computer program product for generating program binaries to program a transformer in an array of processing elements in an accelerator, the computer program product comprising a computer readable storage medium having computer readable program code embodied therein that is executable to perform operations, the operations comprising: (Burr, paragraph 0004, “According to another embodiment, a computer program product includes one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions including instructions configured to cause one or more processors to perform a method”)
selecting a sequence concatenation length of a sequence concatenation of concatenated tokens from inputs to process in the accelerator; (Burr, paragraph 0083, “From the perspective of the local compute-units, the workload presents itself as a continuous stream of M=2, with each sequence passing through the accelerator in a weight-stationary manner, overlapping at each compute-unit only with the sequence immediately preceding it and the sequence immediately following it.”)
determining combinations of input lengths, wherein each combination of input lengths sum to the sequence concatenation length; (Burr, paragraph 0058, “The sequence-size indicates the length of a sequence (in units of tokens) that is input into the transformer. In some embodiments, the sequence-size may change during the network, as tokens are potentially "stacked" or combined together, or broken apart and duplicated, which may change the sequence-size as the workload proceeds through a multi-layer network.’)
executing program binaries, for a combination of input lengths, to process a sequence concatenation of received inputs, having input lengths of the combination, to generate outputs for tokens in the processed sequence concatenation. (Burr, paragraph 0075, “Transformers (e.g., BERT, GPT-3, etc.) may include multiple often-identical layers, each comprising blocks of vector-matrix multiply (VMM) (where input excitation vectors are multiplied by trained-weight matrices), interspersed with blocks of attention compute calling for matrix-matrix multiplication between the output excitation vectors from the VMM blocks. The workload may be arranged into sequences of tokens (words or characters) of length S.” and paragraph 0076, “The number of compute operations in the attention compute tends to scale as S2, and this block of the workload may typically be staged-for example, all of the input data needed to initiate the attention-compute block may need to be available before the computation starts (otherwise, the computation will rapidly stall).”)
However, Burr does not explicitly teach processing a representation of a transformer to generate program binaries for each combination of input lengths of the combinations, wherein the program binaries, for a given combination of input lengths of the combinations, program the accelerator to decompose tokens of inputs in the sequence concatenation according to the input lengths in the given combination and compute attention scores between tokens within each input of the inputs in the sequence concatenation; and
Park teaches processing a representation of a transformer to generate program binaries for each combination of input lengths of the combinations, wherein the program binaries, for a given combination of input lengths of the combinations, program the accelerator to decompose tokens of inputs in the sequence concatenation according to the input lengths in the given combination and compute attention scores between tokens within each input of the inputs in the sequence concatenation; and (Park, pg. 8, Col. 2, paragraph 3, “Note that the computation and the data load cycles can be estimated given a word-length; i.e., the computation cycle for COM1 = [d2 model _ t] / [# MAC_ Effective PE Util], whereas the data transfer cycle forWo = [d2model_Sparsity (dense=1.0)] [generate program binaries for each combination of input lengths of the combinations] / Bandwidth.” and pg. 12, Col. 2, paragraph 2, “The second computation (COM2) is to compute the score. A score is computed as the inner product of K and V , which represents how words relate to each other. COM2 is always multiplication of two dense matrices (dM_dM) because any pruned weight [decompose tokens of inputs in the sequence concatenation] is not used in COM2.”)
Burr and Park are related to the same field of endeavor (i.e. transformer computation). In view of the teachings of Park, it would have been obvious for a person with ordinary skill in the art to apply the teachings of Park to Burr before the effective filing date of the claimed invention in order to improve the accuracy of the attention mechanism. (Park, pg. 1, Col. 1, paragraph 1, “The attention mechanism improves the accuracy by allowing the decoding process to focus on the input part which is the most relevant to the current decoding step.”)
In regard to claim 11, the claim recites similar limitations as corresponding claim 1, and is rejected for similar reasons as claim 1 using similar teachings and rationale.
Burr further teaches an accelerator; (Burr, paragraph 0081, “This invention describes a way to organize a batch of size M of sequences for submission to an AI hardware accelerator performing transformer operations in such a way that the attention compute is continuously busy and the VMM compute can be efficiently power-gated.”)
a processor; and (Burr, paragraph 0088, “The computer program product may include a computer readable storage medium ( or media) having computer readable program instructions thereon for causing a processor to carry out embodiments of the present invention.”)
a computer readable storage medium having computer readable program code embodied therein that is when executed by the processor performs operations the operations comprising: (Burr, paragraph 0088, “The computer program product may include a computer readable storage medium ( or media) having computer readable program instructions thereon for causing a processor to carry out embodiments of the present invention.”)
In regard to claim 16, the claim recites similar limitations as corresponding claim 1, and is rejected for similar reasons as claim 1 using similar teachings and rationale.
In regard to claim 2, Burr and Park teach the product of claim 1.
Burr further teaches determining an optimal sequence concatenation length that when input to the transformer maximizes throughput within a latency constraint, wherein the selected sequence concatenation length comprises the optimal sequence concatenation length. (Burr, paragraph 0081, “This invention describes a way to organize a batch of size M of sequences for submission to an AI hardware accelerator performing transformer operations in such a way that the attention compute is continuously busy and the VMM compute can be efficiently power-gated. The advantage of this approach is that internal compute units are kept occupied for a larger fraction of the time, with fewer pipeline stalls, and lower total latency.” and paragraph 0082, “In one embodiment, the batch of size M that is ready for submission is organized in sequence order in terms of minimum distance between the sequence-size of each batch member, S, and the sequence-size, Sb, for which the compute time for the sequence to pass through each VMM block is identical to the time for the sequence to pass through each attention-compute block.”)
In regard to claim 4 and analogous claims 12 and 17, Burr and Park teach the product of claim 1.
Burr further teaches forming the sequence concatenation from a plurality of tokens from the received inputs, for processing by the transformer, having input lengths summing to the sequence concatenation length. (Burr, paragraph 0058, “Further still, in one embodiment, the transformer may be implemented utilizing one or more hardware computing components. In another embodiment, a sequence may include a series of tokens (e.g., words, characters, phonemes, images, regions-of-interest from within an image, etc.). In yet another embodiment, the sequence may include a sentence, paragraph, or other text sample for which NLP is to be performed. The sequence-size indicates the length of a sequence (in units of tokens) that is input into the transformer.” and paragraph 0060, “In a similar manner, the sequence-size of the input sequence determines a second compute time for the sequence by each of one or more attention-compute blocks of the transformer, representing the taken by the one or more attention-compute blocks to compute the input sequence.” And paragraph 0061, “For example, a threshold sequence-size may be determined for the transformer where the first compute time has a known relationship with second compute time ( e.g., when the first compute time equals the second compute time, when the first compute time is longer or shorter than the second compute time by a predetermined length, when the first compute time is longer or shorter than the second compute time by a predetermined factor, etc.). This threshold sequence-size will depend on the entire equation for the first and second compute times, including the scaling with sequence-size, any pre-factors affecting these scaling factors, and any constant pre-factors within these compute times that are independent of sequence-size. For example, when the first compute time scales as A*S*S+B*S+C and the second compute time scales as D*S*S+E*S*S+F, [having input lengths summing to the sequence concatenation length. ] where S represents sequence-size, then the threshold sequence-size is affected by the size of A, B, C, D, E and F.)
In regard to claim 7, Burr and Park teach the product of claim 1.
Park further teaches programming the transformer to perform sequence-parallel matrix multiplications on the sequence concatenation as part of encoding and decoding of the sequence concatenation. (Park, pg. 3, Col. 2, paragraph 4, “In the Transformer, the computation pattern in the encoding stage is vastly different from the decoding stage. In the encoding stage, all the words in an input can be processed in parallel thanks to the attention-based layer structure” and pg. 4, Col. 1, paragraph 1, “Therefore, one can exploit parallelism in the time-step dimension to accelerate the processing speed. For example, one can stack word vectors into an input matrix and employ matrix-matrix multiplication to reuse weight matrix and perform computation in parallel across the timestep.”)
Burr and Park are combinable for the same rationale as set forth above with respect to claim 1.
In regard to claim 8 and analogous claims 15 and 20, Burr and Park teach the product of claim 1.
Park further teaches performing attention analysis of the sequence concatenation of tokens to calculate attention scores between pairs of tokens within each of the received inputs, wherein attention scores are not determined between tokens from different received inputs; (Park, pg. 12, Col. 2, paragraph 2, “The second computation (COM2) is to compute the score. A score is computed as the inner product of K and V , which represents how words relate to each other. COM2 is always multiplication of two dense matrices (dM_dM) because any pruned weight is not used in COM2.”)
forming an attention matrix, for the sequence concatenation, having the calculated attention scores for pairs of tokens within the received inputs of the sequence concatenation; and (Park, pg. 12, Col. 2, paragraph 2, “The second computation (COM2) is to compute the score. A score is computed as the inner product of K and V , which represents how words relate to each other. COM2 is always multiplication of two dense matrices (dM_dM) because any pruned weight is not used in COM2.”)
applying the attention matrix in matrix multiplications to produce the outputs for the received inputs. (Park, pg. 12, Col. 2, paragraph 1, “On the other hand, when the COM1 in multihead attention of the decoder is computed, K and V are computed by multiplying the final output of the encoder by WK, WV . Q is computed by multiplying the output of the masked multi-head attention of the decoder byWQ. IfWQ, WK, and WV are pruned, COM1 becomes sparse matrix and dense matrix multiplication (sM_dM).”)
Burr and Park are combinable for the same rationale as set forth above with respect to claim 1.
In regard to claim 10, Burr and Park teach the product of claim 8.
Park further teaches generating an attention mask for the attention matrix indicating cells in the attention matrix having the attention scores and cells not having attention scores, wherein the attention mask is used to process only cells having calculated attention scores. (Park, pg. 13, Col. 2, A.5, “Masked multi-head attention is additionally performed only at the decoder. This process is the same as the multi-head attention computation process except the computation of COM2 and COM4 (Fig. A.5). Unlike the correlation among all words in a sentence is computed in the encoder, the correlation between each word and its previous words is only computed in the masked multi-head attention.”)
Burr and Park are combinable for the same rationale as set forth above with respect to claim 1.
Claims 3, 5-6, 9, 13-14 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Burr, in view of Park and in further view of Luss et al (US Published Patent Application No. 20190188120, "Luss").
In regard to claim 3, Burr and Park teach the product of claim 1.
However, Burr and Park do not explicitly teach determining a distribution of occurrences of different input lengths of received inputs, wherein the combinations of input lengths in the combinations comprises the input lengths having a greatest number of occurrences in the distribution of occurrences.
Luss teaches determining a distribution of occurrences of different input lengths of received inputs, wherein the combinations of input lengths in the combinations comprises the input lengths having a greatest number of occurrences in the distribution of occurrences. (Luss, paragraph 0038, “Therefore, in steps 101 and 102, for data generation, in general, any set that is described is based on an alphabet ~={xl, ... , xn} comprised ofn tokens xi for i=l, ... , n. [greatest number of occurrences in the distribution of occurrences], paragraph 0044 of applicants specification explains that this is what is commonly submitted and not going above the space to store.” and paragraph 0039, “The first representation is token based and was previously described. In this representation, matrix A has n columns where n is the number of individual tokens. The ith string is represented by Ai·, where A,r 1 if token j appears in the string and Aij=0 if it does not appear. The second representation is pattern-based. It only keeps track of what possible patterns can appear in strings. One represents patterns as tuples of tokens, such as ("a", "b", "c") for a pattern consisting of the three tokens "a", "b", "c".’)
Burr, Park and Luss are related to the same field of endeavor (i.e. transformer computation). In view of the teachings of Luss, it would have been obvious for a person with ordinary skill in the art to apply the teachings of Luss to Burr and Park before the effective filing date of the claimed invention in order to efficiently reduce the number of tests done. (Luss, paragraph 0001, “The present invention relates generally to a software optimization method, and more particularly, but not by way of limitation, to a system, method, and recording medium for recovering a small distinguished subset of tokens from a large population while efficiently reducing the total number of tests.”)
In regard to claim 5 and analogous claims 13 and 18, Burr and Park teach the product of claim 1.
Burr further teaches in response to the input lengths of the received inputs not comprising input lengths in one of the combinations of input lengths, performing: determining a combination of input lengths closest to the input lengths of the received inputs; (Burr, paragraph 0058, “Further still, in one embodiment, the transformer may be implemented utilizing one or more hardware computing components. In another embodiment, a sequence may include a series of tokens (e.g., words, characters, phonemes, images, regions-of-interest from within an image, etc.). In yet another embodiment, the sequence may include a sentence, paragraph, or other text sample for which NLP is to be performed. The sequence-size indicates the length of a sequence (in units of tokens) that is input into the transformer. In some embodiments, the sequence-size may change during the network, as tokens are potentially "stacked" or combined together, or broken apart and duplicated, which may change the sequence-size as the workload proceeds through a multi-layer network.”)
However, Burr and Park do not explicitly teach padding the received inputs to have the input lengths of the determined combination; and
forming the sequence concatenation from the padded received inputs, having input lengths summing to the sequence concatenation length.
Luss teaches padding the received inputs to have the input lengths of the determined combination; and (Luss, paragraph 0041, “A substring based on the pattern is genernted by padding before, in between. Jnd after the two tokens in the pattern with random characters from I a-z, \-Z0-<J I.”)
forming the sequence concatenation from the padded received inputs, having input lengths summing to the sequence concatenation length. (Luss, paragraph 0058, “Create M random strings. For each string: sample which patterns will appear in the string using a binomial distribution with probability p and create a substring for each pattern (with padding be-fore, in between, and after each token of the pattern), and append the substrings to form the sample string. Note that each padding is a random number of characters (up to 10) from the set [a-zA-Z0-9].”)
Burr, Park and Luss are combinable for the same rationale as set forth above with respect to claim 3.
In regard to claim 6 and analogous claims 14 and 19, Burr and Park teach the product of claim 1.
Burr further teaches determining a combination having input lengths corresponding to input lengths of tokens in the received inputs to include in the sequence concatenation; (Burr, paragraph 0075, “The workload may be arranged into sequences of tokens (words or characters) of length S.”)
executing the determined program binaries to control the transformer to transform the tokens in the received inputs in the sequence concatenation to generate the outputs for the received inputs in the sequence concatenation. (Burr, paragraph 0066, “In one embodiment, the batch of sequences may be input into the transformer in the alternating order determined utilizing the threshold sequence-size. In another embodiment, the transformer may perform one or more actions (e.g., NLP, etc.) on each of the input sequences. In yet another embodiment, the transformer may provide output ( e.g., a translation, text prediction, a summarization, an analysis, a suggested response, etc.) for each of the input sequences.”)
However, Burr and Park do not explicitly teach determining program binaries of the program binaries generated for the determined combination; and
Luss teaches determining program binaries of the program binaries generated for the determined combination; and (Luss, paragraph 0023, “To set up the notation and define the vocabulary, it is assumed that y, A and x are all binary { 0, 1}. The Boolean vector x E{0, 1} N has K<<N non-zero (faulty) entries. Tokens j with x1=0 are called 'normal'”)
Burr, Park and Luss are combinable for the same rationale as set forth above with respect to claim 3.
In regard to claim 9, Burr and Park teach the product of claim 8.
However, Burr and Park do not explicitly teach wherein the attention matrix comprises a sparse matrix.
Luss teaches wherein the attention matrix comprises a sparse matrix. (Luss, paragraph 0020, “Now, given y and knowing the measurement matrix A, if A was chosen properly, and if x is sparse enough, then the unknown sparse vector x can be recovered.”)
Burr, Park and Luss are combinable for the same rationale as set forth above with respect to claim 3.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SKYLAR K VANWORMER whose telephone number is (703)756-1571. The examiner can normally be reached M-F 6:00am to 3:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.K.V./Examiner, Art Unit 2146 /USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146