DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s amendment and remarks dated 1/21/2026 have been considered. Claims 11-13 and 16 have been cancelled and claims 17-18 are newly-added. Claims 1-10, 14-15, and 17-18 are pending.
Drawing Objections. The objections to Figs. 2-8 are withdrawn in view of the substitute drawings provided by Applicant.
Objection to Claim 3. The objection to claim 3 is withdrawn in view of Applicant’s amendments to such claim.
35 U.S.C. 112(f) Interpretation. The interpretation of certain claim elements in claims 11-13 under 35 U.S.C. 112(f) is moot in view of Applicant’s cancellation of such claims.
35 U.S.C. 112(b) Rejections. The rejection of claim 10 under 35 U.S.C. 112(b) is withdrawn in view of Applicant’s amendments to such claim.
Response to Arguments
On pages 10-11 of Applicant’s 1/21/2026 Amendment and remarks, Applicant argues that the “amendments are clearly described in paragraphs [00122], [00123], [00136], and [00145] of the present specification.”
The examiner respectfully disagrees. See detailed 35 U.S.C. 112(a) rejections below.
On pages 10-12 of Applicant’s 1/21/2026 Amendment and remarks, with respect to the rejections under 35 U.S.C. 101 (Step 2A, Prong 1), Applicant argues that, as amended, claim 1 “defines a computer-implemented information processing technique that selectively aggregates and perceives interactions among target vectors to reduce information confusion and avoid unnecessary computation.”
The examiner respectfully disagrees. Claim 1 recites no limitations requiring the claim to be “computer-implemented.” Moreover, the entirety of claim 1 recites mental processes as explained in the detailed rejection. The broadest reasonable interpretation of the claims merely requires processing and analysis of simple vectors (e.g., as simple as [0,0] and [1,1]) which are mental processes.
On pages 11-12 of Applicant’s 1/21/2026 Amendment and remarks, with respect to the rejections under 35 U.S.C. 101 (Step 2A, Prong 2), Applicant argues that, as amended, claim 1 “constitutes a technical solution that improves the manner in which sequence-based information is performed”.
The examiner respectfully disagrees. As explained above and in the detailed rejection, the entirety of claim 1 recites mental processes. There is no “technical solution” because no actual technology is recited in the claim.
On page 12 of Applicant’s 1/21/2026 Amendment and remarks, with respect to the rejections under 35 U.S.C. 101 (Step 2B), Applicant argues that, as amended, the claims “additionally include a specific processing constraint that feature perception is performed only for correlated positions, which constitutes ‘significantly more’ as required under Alice step 2B”.
The examiner respectfully disagrees. As explained above and in the detailed rejection, the entirety of claim 1 recites mental processes. There is therefore nothing “significantly more” than the judicial exception in claim 1.
On pages 13-14 of Applicant’s 1/21/2026 Amendment and remarks, with respect to the rejections under 35 U.S.C. 102 as anticipated by LEE-THORP, Applicant argues that the newly-added claim limitations are not taught by LEE-THORP.
The examiner agrees. All rejections under 35 U.S.C. 102 are hereby withdrawn. However, Applicant’s amendments have necessitated the new grounds of rejection under 35 U.S.C. 103 as explained herein.
On pages 13-16 of Applicant’s 1/21/2026 Amendment and remarks, with respect to the rejections under 35 U.S.C. 103, Applicant argues that claims 2-10, 14-15, and new claims 17-18 should all be allowed for the same reasons argued with respect to claim 1.
The examiner respectfully disagrees for the same reasons explained above with respect to claim 1.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
Claims 1-10, 14-15, and 17-18 are rejected under 35 U.S.C. 112(a) as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, at the time the application was filed, had possession of the claimed invention.
Independent claims 1, 14, and 15 recite the following newly-added features that do not have sufficient written description support:
wherein the feature crossing process aggregates interactions among the at least two target vectors in a manner that reduces information confusion while capturing contextual semantic correlations among target objects, and
wherein the feature perception process selectively processes the output sequence based on positions corresponding to correlated target vector pairs, without requiring feature perception for all pairwise combinations of the input sequence.
Applicant has identified paras. 00122-00123, 00136, and 00145 of the instant specification as providing support.
[Images reproduced in the original: media_image1.png and media_image2.png (greyscale)]
The examiner respectfully disagrees that these portions of the specification, or any other portions of the disclosure (including Figs. 5, 6A, 6B, and paras. 00122-00155), provide sufficient written description support for the newly-added claim limitations to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, at the time the application was filed, had possession of the claimed invention.
First, regarding the “wherein the feature crossing process aggregates interactions among the at least two target vectors in a manner that reduces information confusion while capturing contextual semantic correlations among target objects” limitation, while paras. 00122-00155 provide an explanation of numerous equations used to perform “operations B11-B13”, the specification does not explain how any of these operations or equations will somehow aggregate “interactions among the at least two target vectors in a manner that reduces information confusion while capturing contextual semantic correlations among target objects.” While crossing two vectors can certainly aggregate information from the two target vectors into a cross-vector, the specification does not explain how such crossing process aggregates information (1) “in a manner that reduces information confusion” and (2) “while capturing contextual semantic correlations among target objects.” Nor would one of ordinary skill understand that the operations B11-B13 and/or any of the equations in paras. 00122-00155 would naturally result in aggregating information (1) “in a manner that reduces information confusion” and (2) “while capturing contextual semantic correlations among target objects.” In particular, para. 00146 explains that the output of operation B13 (using equation 10-1) is that the “semantic information of each token in the target sequence correlated to other token can be determined.” One of ordinary skill would understand that equation 10-1 provides information about tokens of the output (or target sequence) relative to other tokens in the output sequence, which is different from the “two target vectors in an input sequence.” Moreover, it is unclear how equation 10-1 (or any other equations) relates to “capturing contextual semantic correlations among target objects.”
Second, regarding the “wherein the feature perception process selectively processes the output sequence based on positions corresponding to correlated target vector pairs, without requiring feature perception for all pairwise combinations of the input sequence” limitation, the specification does not provide sufficient disclosure to enable one of ordinary skill to understand what “positions corresponding to correlated target vector pairs” even refers to. While paras. 0097, 00111-00112, 00119, and 00211 describe “hidden states” of a vector pair, no such “hidden states” are claimed herein.
Moreover, as explained in MPEP 2173.05(i), with respect to negative limitations, “silence will not generally suffice to support a negative claim limitation.” The “without requiring feature perception for all pairwise combinations of the input sequence” limitation is a negative limitation, and the specification is silent as to how such “pairwise combinations of the input sequence” are created.
Claims 2-10 depend from claim 1 and claims 17-18 depend from claim 14. None of these claims remedies the deficiencies of the independent claims, and therefore claims 2-10 and 17-18 are rejected for the same reasons explained above with respect to claims 1 and 14.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-10, 14-15, and 17-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Step 1 of the Alice/Mayo framework, Claims 1-10 are directed to a method (a process), Claims 14 and 17-18 are directed to an electronic device (a machine), and Claim 15 is directed to one or more non-transitory computer readable storage media (an article of manufacture), which each fall within one of the four statutory categories of inventions.
Regarding Claim 1
Step 2A, Prong 1 (Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?)
Claim 1 recites the following mental processes, each of which, under the broadest reasonable interpretation, covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion), with the aid of pencil and paper, or as a mathematical calculation.
An information processing method, the method comprising (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally process information)
performing a feature crossing process on at least two target vectors in an input sequence of target information to obtain an output sequence of the target information; and (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally (or using pencil and paper), perform a vector cross product on 2 vectors to obtain an output sequence)
performing a feature perception process on the output sequence of the target information to obtain a target sequence of the target information, (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally process the “output sequence of the target information” (the output of the vector cross product) to obtain a target sequence (e.g., performing “processing” such as rounding numbers, performing normalization, etc.))
wherein the target sequence represents semantic information of each target object in the target information correlated to other target objects in the target information. (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally review the resulting target sequence and understand that such target sequence represents semantic information for different target objects (e.g., different words) in the target information)
wherein the feature crossing process aggregates interactions among the at least two target vectors in a manner that reduces information confusion while capturing contextual semantic correlations among target objects (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally (or using pencil and paper), perform a vector cross product on 2 vectors to obtain an output sequence, where such cross product aggregates interactions among the two vectors (e.g., aggregates the two vectors into a single perpendicular vector), that reduces information confusion (goes from 2 vectors to 1 vector), and when such vectors pertain to semantic word embeddings, capture contextual semantic correlations)
wherein the feature perception process selectively processes the output sequence based on positions corresponding to correlated target vector pairs, without requiring feature perception for all pairwise combinations of the input sequence (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally selectively choose pairs of words in the output sequence, without processing all pairwise combinations)
Regarding Step 2A, Prong 2, the claim does not include any additional elements that integrate the judicial exception into a practical application and regarding Step 2B, there are no additional elements recited that amount to significantly more than the judicial exception.
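Purely as an illustrative sketch (not part of the record or the claims), the pencil-and-paper vector crossing referenced in the analysis above can be written out with assumed simple vectors:

```python
import numpy as np

# Assumed simple target vectors of the kind referenced above
a = np.array([1, 0, 0])
b = np.array([1, 1, 0])

crossed = np.cross(a, b)  # "feature crossing": a x b = [0, 0, 1]

# Collecting the inputs and their crossed result gives a simple
# "output sequence" a person could also produce by hand.
output_sequence = [a.tolist(), b.tolist(), crossed.tolist()]
print(output_sequence)  # [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
```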
Regarding Claim 2
Step 2A, Prong 1
determining corresponding crossed hidden states based on a feature function for the at least two target vectors in the input sequence of the target information, and (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally determine states that are hidden based on a feature function, e.g., a function that identifies that the 2nd entry in a vector is a hidden state)
determining the crossed hidden states based on a fast Fourier transform to obtain the output sequence of the target information, and (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally perform a FFT, and from the resulting output sequence, determine hidden states in the output sequence, such as designating the 2nd entry in a vector as a hidden state)
wherein the feature function comprises a parameterized non-linear feature mapping function. (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally apply a non-linear feature mapping function using parameters, such as a feature function designating only the 2nd entry in a vector as a hidden state)
Regarding Step 2A, Prong 2, the claim does not include any additional elements that integrate the judicial exception into a practical application and regarding Step 2B, there are no additional elements recited that amount to significantly more than the judicial exception.
Regarding Claim 3
Step 2A, Prong 1
determining a first sequence and a second sequence based on the input sequence of the target information, and (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally decompose an input sequence into first and second sequences)
determining the corresponding crossed hidden states based on the feature function for a first target vector in the first sequence and a second target vector in the second sequence, and (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally apply the feature function described with respect to claim 2 to target vectors with respect to the first and second sequences)
wherein the first target vector is different from the second target vector; (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally confirm that the first and second target vectors differ)
wherein in the feature function, the first target vector corresponds to a first learnable parameter matrix, and the second target vector corresponds to a second learnable parameter matrix. (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally note that the first and second target vectors correspond to learnable parameter matrices)
Regarding Step 2A, Prong 2, the claim does not include any additional elements that integrate the judicial exception into a practical application and regarding Step 2B, there are no additional elements recited that amount to significantly more than the judicial exception.
Regarding Claim 4
Step 2A, Prong 1
performing a first feature function corresponding to the first target vector and the first learnable parameter matrix based on the fast Fourier transform for real input to obtain a first feature information; (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally perform a feature function that corresponds to the first target vector and the first learnable parameter matrix based on the FFT only on real numbers)
performing a second feature function corresponding to the second target vector and the second learnable parameter matrix based on the fast Fourier transform for real input to obtain a second feature information; and (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally perform a feature function that corresponds to the second target vector and the second learnable parameter matrix based on the FFT only on real numbers)
performing a convolution transform of the first feature information and the second feature information based on an inverse fast Fourier transform for real input to obtain the output sequence of the target information. (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally perform a convolution transform on the first and second feature information based on an inverse FFT on real numbers)
Regarding Step 2A, Prong 2, the claim does not include any additional elements that integrate the judicial exception into a practical application and regarding Step 2B, there are no additional elements recited that amount to significantly more than the judicial exception.
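The real-input FFT and convolution operations glossed above for claim 4 can be sketched with assumed example data (this illustrates only the convolution theorem, not the claimed feature functions or learnable parameter matrices):

```python
import numpy as np

# Assumed example inputs standing in for the "feature information"
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 0.0, 0.0])   # a unit impulse at position 1

X = np.fft.rfft(x)                   # fast Fourier transform for real input
Y = np.fft.rfft(y)

# Pointwise product followed by the inverse real-input FFT yields the
# circular convolution of x and y (here, a circular shift by one).
circ = np.fft.irfft(X * Y, n=len(x))

print(circ)  # [4. 1. 2. 3.]
```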
Regarding Claim 5
Step 2A, Prong 1
deleting a same element values in a cross matrix corresponding to the output sequence of the target information, the element values being hidden states after crossing of the at least two target vectors; and (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally delete elements in a cross matrix, where such elements are hidden states)
performing the feature perception process on the cross matrix with the same element values deleted. (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally perceive features from the cross matrix after element values are deleted, such as determining that the sum of each row or column vector is a feature)
Regarding Step 2A, Prong 2, the claim does not include any additional elements that integrate the judicial exception into a practical application and regarding Step 2B, there are no additional elements recited that amount to significantly more than the judicial exception.
Regarding Claim 6
Step 2A, Prong 1
performing the feature perception process on hidden states of a target vector pair having correlation in the output sequence of the target information, the target vector pair being the target vectors for which the feature crossing have been performed. (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally perceive features on hidden states of a vector pair, such as perceiving elements in the vector pair that match as features)
Regarding Step 2A, Prong 2, the claim does not include any additional elements that integrate the judicial exception into a practical application and regarding Step 2B, there are no additional elements recited that amount to significantly more than the judicial exception.
Regarding Claim 7
Step 2A, Prong 1
performing the feature perception process on dominant elements in a cross matrix corresponding to the output sequence of the target information, and wherein the dominant elements include non-zero elements. (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally perceive features only based on non-zero elements in a cross matrix corresponding to the output sequence of the target information)
Regarding Step 2A, Prong 2, the claim does not include any additional elements that integrate the judicial exception into a practical application and regarding Step 2B, there are no additional elements recited that amount to significantly more than the judicial exception.
Regarding Claim 8
Step 2A, Prong 1
determining column indexes of the dominant elements in the cross matrix corresponding to the output sequence of the target information; (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally determine such column indexes of the non-zero elements in a cross matrix)
determining a confidence of a sparse matrix based on the column indexes, and (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally determine such a confidence, such as by using equation 8 in para. 0132 of the instant disclosure, which is also a mathematical calculation)
obtaining a sparse attention matrix based on the confidence of the sparse matrix and determination of an attention probability matrix; and (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally obtain such a sparse attention matrix, such as by multiplying the sparse matrix by the attention probability matrix and a scalar confidence value)
determining the target sequence of the target information based on the sparse attention matrix. (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally use the sparse attention matrix to determine a target sequence of the target information, such as reading-out the different rows to determine the target sequence)
Regarding Step 2A, Prong 2, the claim does not include any additional elements that integrate the judicial exception into a practical application and regarding Step 2B, there are no additional elements recited that amount to significantly more than the judicial exception.
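A minimal sketch of the claim 7/8 operations as glossed above, using an assumed cross matrix and a uniform attention probability matrix (neither drawn from the specification):

```python
import numpy as np

# Assumed cross matrix; its non-zero entries are the "dominant elements"
cross = np.array([[0.0, 2.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 3.0]])

rows, cols = np.nonzero(cross)        # column indexes of the dominant elements
attn_prob = np.full((3, 3), 1.0 / 3)  # assumed attention probability matrix

# Build a sparse mask from the dominant positions and apply it to the
# attention probabilities to obtain a sparse attention matrix.
sparse_mask = np.zeros_like(cross)
sparse_mask[rows, cols] = 1.0
sparse_attn = sparse_mask * attn_prob

print(np.count_nonzero(sparse_attn))  # 3 of the 9 entries survive
```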
Regarding Claim 9
Step 2A, Prong 1
wherein the performing of the feature perception process on the output sequence of the target information to obtain the target sequence of the target information is performed (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally perceive features from an output sequence to obtain a target sequence of the target information)
a gradient truncation is performed on a back-transferred positive gradient, and wherein the back-transferred gradient is determined by a loss value of the attention model and a mean value of the column indexes (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally perform the gradient truncation and then back-transfer the gradient; the examiner further notes that calculating a gradient is a mathematical calculation)
Step 2A, Prong 2
Regarding the “based on a pre-constructed attention model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception. In particular, the claim only recites the additional element of a pre-constructed attention model. This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a generic computing model). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “wherein, when the attention model is trained,” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception. In particular, the claim only recites the additional element of training the attention model. This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a generic computing model). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Step 2B
Regarding the “based on a pre-constructed attention model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “wherein, when the attention model is trained” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding Claim 10
Step 2A, Prong 1
determining an element vector based on the dominant elements selected from the sparse attention matrix and a corresponding value vector; and (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally determine such an element vector based on the dominant elements, such as by extracting the dominant elements into a new vector corresponding to the recited “element vector”)
accumulating the element vector into positions of the target sequence corresponding to the column indexes of the selected dominant elements to determine the target sequence of the target information. (under the broadest reasonable interpretation, this limitation can be performed mentally by a human, for example, a human can mentally accumulate the element vector into positions of the target sequence corresponding to the column indexes of the selected dominant elements to determine the target sequence of the target information)
Regarding Step 2A, Prong 2, the claim does not include any additional elements that integrate the judicial exception into a practical application and regarding Step 2B, there are no additional elements recited that amount to significantly more than the judicial exception.
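The accumulation step glossed above for claim 10 can be sketched as a scatter-accumulate with assumed example values (the variable names are illustrative, not terms from the specification):

```python
import numpy as np

target = np.zeros(5)                     # target sequence, initially empty
element_vector = np.array([2.0, 3.0])    # dominant elements times value vector
column_indexes = np.array([1, 3])        # column indexes of those elements

# Accumulate each element into the target-sequence position given by
# its column index (unbuffered, so repeated indexes would add up).
np.add.at(target, column_indexes, element_vector)

print(target)  # [0. 2. 0. 3. 0.]
```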
Regarding Claim 14
Step 2A, Prong 1
Claim 14 recites an electronic device that corresponds to the method of claim 1, and therefore the analysis under Step 2A, Prong 1 with respect to claim 1 also applies to claim 14. While claim 14 recites additional generic computing components (“processors”, “memory”, “instructions”), such additional generic computing components do not change the analysis under Step 2A, Prong 1.
Step 2A, Prong 2
Claim 14 recites an electronic device that corresponds to the method of claim 1. While claim 14 recites additional generic computing components (“processors”, “memory”, “instructions”), such additional generic computing components do not change the analysis under Step 2A, Prong 2.
Step 2B
Claim 14 recites an electronic device that corresponds to the method of claim 1. While claim 14 recites additional generic computing components (“processors”, “memory”, “instructions”), such additional generic computing components do not change the analysis under Step 2B.
Regarding Claim 15
Step 2A, Prong 1
Claim 15 recites one or more non-transitory computer readable storage media for storing instructions that when executed by a computer perform the method of claim 1. While claim 15 recites additional generic computing components (“non-transitory computer readable storage media”, “computer instructions”, “processor”), such additional generic computing components do not change the analysis under Step 2A, Prong 1.
Step 2A, Prong 2
Claim 15 recites one or more non-transitory computer readable storage media for storing instructions that when executed by a computer perform the method of claim 1. While claim 15 recites additional generic computing components (“non-transitory computer readable storage media”, “computer instructions”, “processor”), such additional generic computing components do not change the analysis under Step 2A, Prong 2.
Step 2B
Claim 15 recites one or more non-transitory computer readable storage media for storing instructions that when executed by a computer perform the method of claim 1. While claim 15 recites additional generic computing components (“non-transitory computer readable storage media”, “computer instructions”, “processor”), such additional generic computing components do not change the analysis under Step 2B.
Claim 17 depends from claim 14 and claims an electronic device that corresponds to the method of claim 5, and is therefore rejected for the same reasons explained above with respect to claims 5 and 14.
Claim 18 depends from claim 14 and claims an electronic device that corresponds to the method of claim 6, and is therefore rejected for the same reasons explained above with respect to claims 6 and 14.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 6, 14-15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Lee-Thorp, James, et al. "Sparse mixers: Combining moe and mixing to build a more efficient bert." arXiv preprint arXiv:2205.12399 (May 9, 2021), hereinafter referenced as LEE-THORP, in view of US 20210264449 A1, hereinafter referenced as TSAI, and further in view of US 20060253517 A1, hereinafter referenced as ARGENTAR.
Regarding Claim 1
LEE-THORP discloses:
An information processing method, the method comprising: (LEE-THORP, p. 1, section 1: “We show that FNet offers an excellent compromise between speed, memory footprint, and accuracy, achieving 92% of the accuracy of BERT in a common classification transfer learning setup on the GLUE benchmark (Wang et al., 2018), but training seven times as fast on GPUs and twice as fast on TPUs.”;
Examiner’s Note: LEE-THORP discloses the FNet model architecture, which processes information, such as input text as in BERT, as shown further on p. 8, table 4, which shows the results of the different tasks FNet performed against BERT and other benchmarks)
performing a feature crossing process on at least two target vectors in an input sequence of target information to obtain an output sequence of the target information; and (LEE-THORP, p. 5, section 3.2: “FNet is a layer normalized ResNet architecture with multiple layers, each of which consists of a Fourier mixing sublayer followed by a feed-forward sublayer. The architecture is shown in Figure 1. Essentially, we replace the self-attention sublayer of each Transformer encoder layer with a Fourier Transform sublayer, which applies a 2D Fourier Transform to its (sequence length, hidden dimension) embedding input – one 1D Fourier Transform along the sequence dimension, Fseq, and one 1D Fourier Transform along the hidden dimension, Fhidden ... The simplest interpretation for the Fourier Transform is as a particularly effective mechanism for mixing tokens, which evidently provides the feedforward sublayers sufficient access to all tokens.”;
LEE-THORP, p. 5, section 3.3: “As a result, our GPU FNet implementation always uses the FFT, while our TPU implementation computes the 2D Fourier Transform using matrix multiplications for sequences up to lengths of 8192 and the FFT for longer lengths.”;
[Image: media_image3.png (440 × 284, greyscale): LEE-THORP, Fig. 1]
Examiner’s Note: LEE-THORP discloses that the input token embeddings are input into a Fourier sub-layer that performs an FFT (Fast Fourier Transform) on the input token embeddings to mix the tokens for the feed forward layer, where each token (which is a multi-dimensional embedding for a word) corresponds to a recited “target vector” (and since there are N encoding blocks, 1 per token, there are at least 2 token vectors), and the sequence of tokens corresponds to the recited “target vectors in an input sequence of target information” and the output of the Fourier sublayer corresponds to the recited “output sequence of the target information”)
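Examiner’s Note (illustrative only): the token-mixing operation quoted above, a 1D Fourier Transform along the hidden dimension followed by a 1D Fourier Transform along the sequence dimension, can be sketched in a few lines of NumPy. This is an illustrative sketch based on the quoted description, not LEE-THORP’s reference implementation, and the array sizes are hypothetical:

```python
import numpy as np

def fourier_mixing(x):
    # x: (seq_len, hidden_dim) token embeddings.
    # Apply a 1D DFT along the hidden dimension, then along the
    # sequence dimension, and keep only the real part, as in the
    # quoted description of the Fourier mixing sublayer.
    return np.real(np.fft.fft(np.fft.fft(x, axis=-1), axis=0))

tokens = np.random.randn(8, 4)   # 8 tokens, hidden size 4 (hypothetical sizes)
mixed = fourier_mixing(tokens)
print(mixed.shape)  # (8, 4)
```

Because the 2D DFT is separable, the two 1D transforms above are equivalent to a single 2D transform over the (sequence length, hidden dimension) input, which is why the sublayer can also be computed with matrix multiplications as noted in section 3.3.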
performing a feature perception process on the output sequence of the target information to obtain a target sequence of the target information, (LEE-THORP, p. 5, Fig. 1:
[Image: media_image3.png (440 × 284, greyscale)]
Examiner’s Note: the feed forward sub-layers + dense sub-layers + output projection sub-layers (corresponding to recited “feature perception process”) perform inferencing on the input token sequence, resulting in an output sequence that corresponds to the recited “target sequence of the target information”)
wherein the target sequence represents semantic information of each target object in the target information correlated to other target objects in the target information. (LEE-THORP, p. 12, section 5: “We showed that simple, linear token “mixing” transformations, along with the nonlinearities in feed-forward layers, are sufficient to model diverse semantic relationships in text.”; Examiner’s Note: LEE-THORP discloses that the output of FNet (a series of tokens, where each token in the output sequence corresponds to the recited “each target object”) models semantic relationships between different tokens of text, where each token typically indicates a particular word as shown in Fig. 1)
wherein the feature crossing process aggregates interactions among the at least two target vectors ... while capturing contextual semantic correlations among target objects, and (LEE-THORP, p. 12, section 5: “We showed that simple, linear token “mixing” transformations, along with the nonlinearities in feed-forward layers, are sufficient to model diverse semantic relationships in text.”; Examiner’s Note: LEE-THORP discloses that the output of FNet models semantic relationships between different tokens of text, where each token typically indicates a particular word as shown in Fig. 1, where information about the input word vectors are aggregated through the feed forward sub-layers + dense sub-layers + output projection sub-layers and LEE-THORP specifically teaches capturing semantic relationships in text)
However, LEE-THORP fails to explicitly teach:
...in a manner that reduces information confusion ...
wherein the feature perception process selectively processes the output sequence based on positions corresponding to correlated target vector pairs, without requiring feature perception for all pairwise combinations of the input sequence.
However, in a related field of endeavor (machine learning, see para. 0059), TSAI teaches and makes obvious:
...in a manner that reduces information confusion ...(TSAI, para. 0029: “In the XGBoost regression model 160, each split node is measured using the entropy in the theory of information gain. It is expected that the goal of reducing the level of information confusion is achieved after each split.”;
Examiner’s Note: the LEE-THORP-TSAI now modifies the training of the FNet architecture in LEE-THORP to use XGBoost regression model training in order to reduce “the level of information confusion” as taught by TSAI)
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of LEE-THORP with TSAI as explained above. As disclosed by TSAI, one of ordinary skill would have been motivated to do so because the “XGBoost model is suitable for calculation of large-scale data.” (para. 0059). Moreover, as disclosed by TSAI, one of ordinary skill would have been motivated to do so in order to “make the XGBoost algorithm explainable.” (para. 0062).
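Examiner’s Note (illustrative only): the entropy-based split measure referenced in TSAI para. 0029 can be sketched as follows; the helper names are hypothetical and the sketch merely illustrates measuring a split by information gain, i.e., the reduction in the level of information confusion:

```python
import math

def entropy(labels):
    # Shannon entropy of a label multiset: the "level of
    # information confusion" at a node (cf. TSAI, para. 0029).
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(parent, left, right):
    # Information gain of a split: parent entropy minus the
    # size-weighted entropies of the two child nodes.
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# A split that perfectly separates the classes removes all confusion:
parent = [0, 0, 1, 1]
gain = information_gain(parent, [0, 0], [1, 1])
print(gain)  # 1.0
```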
However, LEE-THORP and TSAI fail to explicitly teach:
wherein the feature perception process selectively processes the output sequence based on positions corresponding to correlated target vector pairs, without requiring feature perception for all pairwise combinations of the input sequence.
However, in a related field of endeavor (finding patterns in text, see para. 0007), ARGENTAR teaches and makes obvious:
wherein the feature perception process selectively processes the output sequence based on positions corresponding to correlated target vector pairs, without requiring feature perception for all pairwise combinations of the input sequence. (ARGENTAR, para. 0031: “The most preferred pair-wise combinations are those involving the representations of patterns in a higher order n-tuple [n=3 to n=(k-1)] with the representations of patterns in a 2-tuple that shares the same reference sequence and whose tuple identifier includes a sequence index greater than the sequence indices included in the identification of the n-tuple, provided there exists patterns in each n-tuple and 2-tuple. Combining an n-tuple with such a 2-tuple insures that no redundant pattern representations are produced by the comparison, while finding all patterns at successive levels of support.”;
ARGENTAR, para. 0118: “In a similar manner the position index numerical array (PINA) for each pattern produced by each pair-wise combination of sequences may be derived. In FIGS. 6A and 6B the position index numerical arrays (PINAs) are set forth beneath the frame enclosing each 2-tuple to which these position index numerical arrays (PINAs) correspond. Arrows are provided to show more explicitly show the respective correspondences between each pattern and its position index numerical array (PINA).”;
ARGENTAR, para. 0140: “FIGS. 11A and 11B illustrate the position index numerical array (PINA) representations of all 2-tuples that share a common reference sequence as well as all 3-tuples created by the pair-wise combinations of these 2-tuples intersected in the manner shown in FIG. 10. The patterns of symbols in the 3-tuples are also indicated in FIGS. 11A and 11B.”;
Examiner’s Note: ARGENTAR discloses selectively choosing combinations of n-tuples and 2-tuples, where the combination of tuples creates a pattern that is analyzed using a position index; the LEE-THORP-TSAI-ARGENTAR combination modifies LEE-THORP to selectively analyze input word vectors in 2-tuple and n-tuple combinations as in ARGENTAR, where position information (which is also taught by the position embedding of LEE-THORP as illustrated in Fig. 1) is utilized when selecting the combination of 2-tuples and n-tuples)
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of LEE-THORP with TSAI and ARGENTAR as explained above. As disclosed by ARGENTAR, one of ordinary skill would have been motivated to do so in order to eliminate redundancies of pairwise combinations. (para. 0026).
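Examiner’s Note (illustrative only): the selective pairing rule quoted from ARGENTAR para. 0031 can be sketched as follows; the tuple representation (reference sequence index first, followed by the remaining sequence indices) is an assumption made purely for illustration:

```python
def select_combinations(n_tuple, two_tuples):
    # Keep only 2-tuples that share the n-tuple's reference sequence
    # (first index) and whose other sequence index exceeds every index
    # already in the n-tuple, so that no redundant pattern
    # representations are produced (cf. ARGENTAR, para. 0031).
    ref = n_tuple[0]
    top = max(n_tuple)
    return [t for t in two_tuples if t[0] == ref and t[1] > top]

# An n-tuple over sequences 0, 1, 2 combines only with 2-tuples whose
# second index exceeds 2 and whose reference sequence is also 0:
pairs = select_combinations((0, 1, 2), [(0, 1), (0, 3), (0, 4), (1, 3)])
print(pairs)  # [(0, 3), (0, 4)]
```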
Regarding Claim 6
LEE-THORP, TSAI, and ARGENTAR disclose the method of claim 1 as explained above. LEE-THORP further teaches:
wherein the performing of the feature perception process on the output sequence of the target information comprises: performing the feature perception process on hidden states of a target vector pair having correlation in the output sequence of the target information, the target vector pair being the target vectors for which the feature crossing have been performed. (LEE-THORP, p. 5, section 3.2: “FNet is a layer normalized ResNet architecture with multiple layers, each of which consists of a Fourier mixing sublayer followed by a feed-forward sublayer. The architecture is shown in Figure 1. Essentially, we replace the self-attention sublayer of each Transformer encoder layer with a Fourier Transform sublayer, which applies a 2D Fourier Transform to its (sequence length, hidden dimension) embedding input – one 1D Fourier Transform along the sequence dimension, Fseq, and one 1D Fourier Transform along the hidden dimension, Fhidden ... The simplest interpretation for the Fourier Transform is as a particularly effective mechanism for mixing tokens, which evidently provides the feedforward sublayers sufficient access to all tokens.”;
LEE-THORP, p. 5, Fig. 1:
[Image: media_image3.png (440 × 284, greyscale)]
Examiner’s Note: the feed forward sub-layers + dense sub-layers + output projection sub-layers (corresponding to recited “feature perception process”) perform inferencing on the input token sequence (having 2 or more tokens, each being a recited “target vector pair” having hidden dimensions corresponding to recited “hidden states”), resulting in an output sequence that corresponds to the recited “target sequence of the target information”, where such 2 or more input tokens are correlated because they are adjacent and part of each other’s context)
Regarding Claim 14
LEE-THORP discloses:
An electronic device comprising: memory, comprising one or more storage media, storing instructions; and at least one processor communicatively coupled to the memory, wherein the instructions when executed by the at least one processor individually or collectively, cause the electronic device to: (LEE-THORP, p. 1, section 1: “We show that FNet offers an excellent compromise between speed, memory footprint, and accuracy, achieving 92% of the accuracy of BERT in a common classification transfer learning setup on the GLUE benchmark (Wang et al., 2018), but training seven times as fast on GPUs and twice as fast on TPUs.”;
LEE-THORP, p. 6, section 3.3: “We implement our models in JAX using the Flax Framework”;
Examiner’s Note: LEE-THORP discloses the FNet model architecture, which is executed on either GPUs or TPUs (corresponding to recited processors), where p. 6, footnote 2 shows a link to GitHub code, which corresponds to the recited “instructions” executed by the GPUs/TPUs, and where memory or other storage is necessarily required to store the computer code and for execution by the GPUs/TPUs)
The remaining limitations correspond to the method of claim 1, and therefore claim 14 is rejected for the same reasons explained above with respect to claim 1.
Regarding Claim 15
LEE-THORP discloses:
One or more non-transitory computer readable storage media storing one or more computer programs including computer-executable instructions that, when executed by one or more processors of an electronic device individually or collectively, cause the electronic device to perform operations, the operations comprising: (LEE-THORP, p. 1, section 1: “We show that FNet offers an excellent compromise between speed, memory footprint, and accuracy, achieving 92% of the accuracy of BERT in a common classification transfer learning setup on the GLUE benchmark (Wang et al., 2018), but training seven times as fast on GPUs and twice as fast on TPUs.”;
LEE-THORP, p. 6, section 3.3: “We implement our models in JAX using the Flax Framework”;
Examiner’s Note: LEE-THORP discloses the FNet model architecture, which is executed on either GPUs or TPUs (corresponding to recited computers), where p. 6, footnote 2 shows a link to GitHub code, which corresponds to the recited “computer-executable instructions” executed by the GPUs/TPUs, and where non-transitory computer readable storage is necessarily required to store the computer code and for execution by the GPUs/TPUs)
The remaining limitations correspond to the method of claim 1, and therefore claim 15 is rejected for the same reasons explained above with respect to claim 1.
Claim 18 depends from claim 14 and recites an electronic device that corresponds to the method of claim 6, and is therefore rejected for the same reasons explained above with respect to claims 6 and 14.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over LEE-THORP, in view of TSAI and ARGENTAR, and further in view of Xu, Chengfeng, et al. "Recurrent convolutional neural network for sequential recommendation." The world wide web conference. 2019, hereinafter referenced as XU.
Regarding Claim 2
LEE-THORP, TSAI, and ARGENTAR disclose the method of claim 1 as explained above. However, LEE-THORP, TSAI, and ARGENTAR fail to explicitly teach:
determining corresponding crossed hidden states based on a feature function for the at least two target vectors in the input sequence of the target information, and
determining the crossed hidden states based on a fast Fourier transform to obtain the output sequence of the target information, and
wherein the feature function comprises a parameterized non-linear feature mapping function.
However, in a related field of endeavor (neural networks for analyzing sequences, see p. 3398, section 1), XU teaches:
determining corresponding crossed hidden states based on a feature function for the at least two target vectors in the input sequence of the target information, and (XU, p. 3399, section 1: “In this paper, we propose a novel recurrent neural network model, namely Recurrent Convolutional Neural Network (RCNN), for sequential recommendation. It leverages the strengths of both the recurrent architecture of LSTM to capture complex long-term dependencies and the convolutional operation of CNN to extract local sequential patterns among hidden states. Specifically, we first generate a hidden state, i.e, a hidden sequential preference representation, at each time step by inputting a current item into the recurrent layer. Then, we treat recent hidden states of each time step as an “image” and search for local sequential features using horizontal and vertical convolutional filters. An intra-step horizontal filter is used to capture non-linear feature interactions, while an inter-step vertical filter is used for non-monotone local patterns. Moreover, the outputs of CNN and the hidden state vector of LSTM are concatenated together to describe user’s overall interest, and then are fed into a fully-connected layer to generate a recommendation list.”;
XU, p. 3400, section 4.1:
[Image: media_image4.png (288 × 392, greyscale)]
Examiner’s Note: The activation function for the recurrent layer corresponds to the recited “feature function” that is used with respect to the recited “hidden states”; the LEE-THORP-TSAI-ARGENTAR-XU combination now applies the activation function of XU to the hidden dimension of LEE-THORP (see p. 5, section 3.2) that contains hidden states, prior to input to the Fourier sub-layer of LEE-THORP)
determining the crossed hidden states based on a fast Fourier transform to obtain the output sequence of the target information, and (XU, p. 3399, section 1: “In this paper, we propose a novel recurrent neural network model, namely Recurrent Convolutional Neural Network (RCNN), for sequential recommendation. It leverages the strengths of both the recurrent architecture of LSTM to capture complex long-term dependencies and the convolutional operation of CNN to extract local sequential patterns among hidden states. Specifically, we first generate a hidden state, i.e, a hidden sequential preference representation, at each time step by inputting a current item into the recurrent layer. Then, we treat recent hidden states of each time step as an “image” and search for local sequential features using horizontal and vertical convolutional filters. An intra-step horizontal filter is used to capture non-linear feature interactions, while an inter-step vertical filter is used for non-monotone local patterns. Moreover, the outputs of CNN and the hidden state vector of LSTM are concatenated together to describe user’s overall interest, and then are fed into a fully-connected layer to generate a recommendation list.”;
Examiner’s Note: the LEE-THORP-TSAI-ARGENTAR-XU combination now applies the activation function of XU to the hidden dimension of LEE-THORP (see p. 5, section 3.2) that contains hidden states, and then the Fourier sub-layer of LEE-THORP performs the feature mixing (corresponding to recited “feature crossing”) on the hidden dimension that has had the non-linear activation function of XU applied to it)
wherein the feature function comprises a parameterized non-linear feature mapping function. (XU, p. 3400, section 4.1:
[Image: media_image4.png (288 × 392, greyscale)]
Examiner’s Note: The activation function for the recurrent layer of XU corresponds to the recited “feature function”, which is a non-linear sigmoid function that projects (corresponding to recited “mapping”) values to a range between [0,1], where the activation function is “parameterized” because it accepts parameters to define its behavior; the LEE-THORP-TSAI-ARGENTAR-XU combination now applies the activation function of XU to the hidden dimension of LEE-THORP (see p. 5, section 3.2) that contains hidden states, prior to input to the Fourier sub-layer of LEE-THORP)
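Examiner’s Note (illustrative only): a parameterized non-linear (sigmoid) feature mapping of the kind XU describes can be sketched as follows; the weight matrix and bias below are hypothetical stand-ins for learned parameter values, and the dimensions are chosen purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # Non-linear mapping of values into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))  # hypothetical learnable weight matrix
b = rng.standard_normal(4)       # hypothetical learnable bias

def feature_map(hidden_states):
    # Parameterized non-linear feature mapping applied to hidden
    # states: an affine projection followed by the sigmoid.
    return sigmoid(hidden_states @ W + b)

h = rng.standard_normal((8, 4))  # 8 hidden-state vectors of size 4
out = feature_map(h)
print(out.shape)  # (8, 4)
```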
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of LEE-THORP with the teachings of TSAI, ARGENTAR, and XU as explained above. As disclosed by XU, one of ordinary skill would have been motivated to do so because XU teaches that it is “crucial to consider the interactions between hidden state features in RNN when learning from sparse sequential data.” (p. 3399, section 1).
Claims 5 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over LEE-THORP in view of TSAI and ARGENTAR and further in view of US 20220172173 A1, hereinafter referenced as DANSHCHIN.
Regarding Claim 5
LEE-THORP, TSAI, and ARGENTAR disclose the method of claim 1 as explained above. However, LEE-THORP, TSAI, and ARGENTAR fail to explicitly teach:
wherein the performing of the feature perception process on the output sequence of the target information comprises: deleting a same element values in a cross matrix corresponding to the output sequence of the target information, the element values being hidden states after crossing of the at least two target vectors; and
performing the feature perception process on the cross matrix with the same element values deleted.
However, in a related field of endeavor (natural language processing, see para. 0004), DANSHCHIN teaches:
wherein the performing of the feature perception process on the output sequence of the target information comprises: deleting a same element values in a cross matrix corresponding to the output sequence of the target information, the element values being hidden states after crossing of the at least two target vectors; and (DANSHCHIN, para. 0047: “Further, columns in which the same value is recorded in all fields, or columns that do not substantially affect search results are removed from the reduced-rank matrix. The columns that may be removed by reducing the rank may further be determined using the matrix rank reduction technique using singular value decomposition. As a result, a reduced-dimension matrix or reduced-dimension matrices are formed. Furthermore, similar fields, the attribute rows of which are assumed to be identical to each other, may be consolidated. In such matrices, each of the sets of identical rows is represented by one row, in relation to which the initial rows transformed into corresponding identical rows are brought into correspondence.”;
Examiner’s Note: DANSHCHIN discloses removing columns having the same value in each field (corresponding to recited “same element values in a cross matrix”); the LEE-THORP-TSAI-ARGENTAR-DANSHCHIN combination now deletes common values from the output sequence of LEE-THORP which is now represented by a cross matrix (see LEE-THORP, p. 5, sections 3.2-3.3), where only elements from the hidden dimensions of LEE-THORP are removed)
performing the feature perception process on the cross matrix with the same element values deleted. (DANSHCHIN, para. 0047: “Further, columns in which the same value is recorded in all fields, or columns that do not substantially affect search results are removed from the reduced-rank matrix. The columns that may be removed by reducing the rank may further be determined using the matrix rank reduction technique using singular value decomposition. As a result, a reduced-dimension matrix or reduced-dimension matrices are formed. Furthermore, similar fields, the attribute rows of which are assumed to be identical to each other, may be consolidated. In such matrices, each of the sets of identical rows is represented by one row, in relation to which the initial rows transformed into corresponding identical rows are brought into correspondence.”;
Examiner’s Note: DANSHCHIN discloses removing columns having the same value in each field (corresponding to recited “same element values in a cross matrix”); the LEE-THORP-TSAI-ARGENTAR-DANSHCHIN combination now deletes common values from the output sequence of LEE-THORP which is now represented by a cross matrix (see LEE-THORP, p. 5, sections 3.2-3.3), where only elements from the hidden dimensions of LEE-THORP are removed, and then performs the process of p. 5, Fig. 1 of LEE-THORP regarding the feedforward, dense, and output layers (corresponding to recited “feature perception process”) on said representation of the output sequence (corresponding to recited “cross matrix”))
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of LEE-THORP with TSAI, ARGENTAR, and DANSHCHIN as explained above. As disclosed by DANSHCHIN, one of ordinary skill would have been motivated to do so in order to create “a reduced-dimension matrix or reduced-dimension matrices” for simpler computations. (para. 0047).
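Examiner’s Note (illustrative only): the removal of identically-valued columns described in DANSHCHIN para. 0047 can be sketched as follows; this is an illustrative sketch, not DANSHCHIN’s implementation, and the example matrix is hypothetical:

```python
import numpy as np

def drop_constant_columns(m):
    # Remove columns in which the same value is recorded in every row
    # (cf. DANSHCHIN, para. 0047), leaving a reduced-dimension matrix.
    varies = np.any(m != m[0, :], axis=0)
    return m[:, varies]

m = np.array([[1, 2, 3],
              [1, 5, 3],
              [1, 8, 3]])
reduced = drop_constant_columns(m)
print(reduced)  # columns 0 and 2 are constant, so only column 1 remains
```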
Claim 17 depends from claim 14 and recites an electronic device that corresponds to the method of claim 5, and is therefore rejected for the same reasons explained above with respect to claims 5 and 14.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over LEE-THORP in view of TSAI and ARGENTAR and further in view of US 20210240797 A1, hereinafter referenced as RASHID.
Regarding Claim 7
LEE-THORP, TSAI, and ARGENTAR disclose the method of claim 1 as explained above. However, LEE-THORP, TSAI, and ARGENTAR fail to explicitly teach:
wherein the performing of the feature perception process on the output sequence of the target information comprises: performing the feature perception process on dominant elements in a cross matrix corresponding to the output sequence of the target information, and wherein the dominant elements include non-zero elements.
However, in a related field of endeavor (matrix calculations within a neural network, see para. 0009), RASHID teaches:
wherein the performing of the feature perception process on the output sequence of the target information comprises: performing the feature perception process on dominant elements in a cross matrix corresponding to the output sequence of the target information, and wherein the dominant elements include non-zero elements. (RASHID, para. 0017: “For example, due to non-linear (e.g., rectified linear device) activation and quantization, the inputs to each layer of a neural network may include many zero-valued elements. In some examples, matrices may be stored as “sparse matrices,” used herein to refer to matrices in which only non-zero elements are explicitly defined. As a non-limiting example, sparse matrices may be stored in the form of two vectors, an order vector indicating which elements of a sparse matrix are populated (e.g., a bit-vector indicating a “1” value for non-zero entries and a “0” value for zero entries in a row, column lexicographic order) and a data vector including all of the non-zero elements (e.g., listed in the row, column lexicographic order). Storing and computing with sparse matrices may be particularly efficient when there are relatively few non-zero entries, because only the non-zero entries are explicitly defined. Accordingly, only the non-zero elements need to be stored, and in some cases, computations may be simplified or optimized based on the implicit encoding of the zero-valued elements (e.g., skipping a portion of a computation corresponding to computing the product of one or more values including one of the implicitly-encoded zero values). In some examples, sparse matrix data may be “unpacked” to populate a dense matrix, e.g., by explicitly storing all of the non-zero and zero elements indicated by the sparse matrix.”;
Examiner’s Note: RASHID discloses sparse matrices including only non-zero elements (corresponding to recited “dominant elements”); the LEE-THORP-TSAI-ARGENTAR-RASHID combination now reorganizes the output sequence of LEE-THORP into a sparse matrix of only non-zero elements (corresponding to recited “cross matrix”), and then performs the process of p. 5, Fig. 1 of LEE-THORP regarding the feedforward, dense, and output layers (corresponding to recited “feature perception process”) on said representation of the output sequence.)
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of LEE-THORP with TSAI, ARGENTAR, and RASHID as explained above. As disclosed by RASHID, one of ordinary skill would have been motivated to do so because “computations may be simplified or optimized based on the implicit encoding of the zero-valued elements (e.g., skipping a portion of a computation corresponding to computing the product of one or more values including one of the implicitly-encoded zero values).” (para. 0017).
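Examiner’s Note (illustrative only): the sparse-matrix encoding quoted from RASHID para. 0017, an order bit-vector marking the populated entries plus a data vector of the non-zero values, can be sketched as follows; the helper names and the example matrix are hypothetical:

```python
import numpy as np

def to_sparse(m):
    # Encode a matrix as an order vector indicating non-zero entries in
    # row-major (row, column lexicographic) order, plus a data vector
    # holding the non-zero values (cf. RASHID, para. 0017).
    flat = m.ravel()
    order = (flat != 0).astype(np.uint8)
    data = flat[flat != 0]
    return order, data

def to_dense(order, data, shape):
    # "Unpack" the sparse form back into a dense matrix by populating
    # the positions flagged in the order vector.
    flat = np.zeros(order.size, dtype=data.dtype)
    flat[order == 1] = data
    return flat.reshape(shape)

m = np.array([[0.0, 3.0],
              [2.0, 0.0]])
order, data = to_sparse(m)
print(order)  # [0 1 1 0]
print(data)   # [3. 2.]
```

Only the two non-zero entries are stored explicitly; zero-valued positions are implicit in the order vector, which is what allows the computation-skipping optimizations RASHID describes.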
Allowable Subject Matter
Claims 3-4 and 8-10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Claim 3 would be considered allowable, if the rejections under 35 U.S.C. 101 and 112(a) are overcome, because none of the references of record either alone or in combination fairly disclose or suggest the combination of limitations specified in claim 3, including at least:
wherein the determining of the corresponding crossed hidden states based on the feature function for the at least two target vectors in the input sequence of the target information comprises:
determining a first sequence and a second sequence based on the input sequence of the target information, and
determining the corresponding crossed hidden states based on the feature function for a first target vector in the first sequence and a second target vector in the second sequence, and
wherein the first target vector is different from the second target vector; and
wherein in the feature function, the first target vector corresponds to a first learnable parameter matrix, and the second target vector corresponds to a second learnable parameter matrix.
The closest prior art of record discloses:
LEE-THORP discloses the FNet model that uses a Fast Fourier Transform in a mixing sublayer to operate on text token sequences. (p. 5, sections 3.2-3.3).
TSAI teaches using an XGBoost technique that reduces the level of information confusion achieved. (para. 0029).
ARGENTAR discloses finding patterns in text with respect to corresponding 2-tuples and n-tuple pairs. (paras. 0031, 0118, 0140).
XU teaches applying a non-linear activation function to hidden states by a neural network. (p. 3399, section 1)
However, the examiner has found that the distinct feature of the Applicant's claimed invention over the prior art is the explicit claiming of the aforementioned limitations in combination with all the other limitations as specified in claim 3. In particular, one of ordinary skill would not have been motivated to implement the recited “determining of the corresponding crossed hidden states based on the feature function for the at least two target vectors in the input sequence of the target information” function in the precise manner recited (using both first and second sequences, each having a respective target vector corresponding to learnable parameter matrices) without the hindsight aid of Applicant’s disclosure. Therefore, because the limitations of claim 3 are not anticipated nor made obvious by the prior art of record, claim 3 would be allowable if the rejections under 35 U.S.C. 101 and 112(a) are overcome.
Claim 4 depends from claim 3 and would be allowable for the same reasons recited in claim 3 provided that the rejections under 35 U.S.C. 101 and 112(a) are overcome.
Claim 8 would be considered allowable, if the rejections under 35 U.S.C. 101 and 112(a) are overcome, because none of the references of record either alone or in combination fairly disclose or suggest the combination of limitations specified in claim 8, including at least:
wherein the performing of the feature perception process on the dominant elements in the cross matrix corresponding to the output sequence of the target information to obtain the target sequence of the target information comprises:
determining column indexes of the dominant elements in the cross matrix corresponding to the output sequence of the target information;
determining a confidence of a sparse matrix based on the column indexes, and obtaining a sparse attention matrix based on the confidence of the sparse matrix and determination of an attention probability matrix; and
determining the target sequence of the target information based on the sparse attention matrix.
The closest prior art of record discloses:
LEE-THORP discloses the FNet model that uses a Fast Fourier Transform in a mixing sublayer to operate on text token sequences. (p. 5, sections 3.2-3.3).
TSAI teaches using an XGBoost technique that reduces the level of information confusion achieved. (para. 0029).
ARGENTAR discloses finding patterns in text with respect to corresponding 2-tuples and n-tuple pairs. (paras. 0031, 0118, 0140).
RASHID discloses sparse matrices including only non-zero elements. (para. 0017).
However, the examiner has found that the distinct feature of the Applicant's claimed invention over the prior art is the explicit claiming of the aforementioned limitations in combination with all the other limitations as specified in claim 8. In particular, one of ordinary skill would not have been motivated to implement the recited “performing of the feature perception process on the dominant elements in the cross matrix corresponding to the output sequence of the target information to obtain the target sequence of the target information” function in the precise manner recited (determining a “confidence” of a sparse matrix and then obtaining a sparse attention matrix based on such confidence and the determination of an attention probability matrix) without the hindsight aid of Applicant’s disclosure. Therefore, because the limitations of claim 8 are neither anticipated nor rendered obvious by the prior art of record, claim 8 would be allowable if the rejections under 35 U.S.C. 101 and 112(a) are overcome.
Claims 9-10 depend from claim 8 and would be allowable for the same reasons stated with respect to claim 8, provided that the rejections under 35 U.S.C. 101 and 112(a) are overcome.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C LEE whose telephone number is (571)272-4933. The examiner can normally be reached M-F 12:00 pm - 8:00 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL C. LEE/Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128