DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
The present application is being examined on the basis of the claims filed 12/20/2022.
Claims 1-22 are pending.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/20/2022 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
The information disclosure statement (IDS) submitted on 02/24/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-12 and 15-22 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Regarding Claims 1, 2, 7-11, 15-16, and 20-22
Claims 1, 2, 7-11, 15-16, and 20-22 are rejected under 35 U.S.C. 112(b) as being indefinite because they each recite the claim terms “lambda layer” and “lambda function”, which are not clearly defined by the specification. MPEP 2173.01 recites “A fundamental principle contained in 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph is that applicants are their own lexicographers. They can define in the claims what the inventor or a joint inventor regards as the invention essentially in whatever terms they choose so long as any special meaning assigned to a term is clearly set forth in the specification.” The terms “lambda layer” and “lambda function” have no clear meaning as terms of art, nor are they clearly set forth in the specification (i.e., paragraphs 6 and 32 of the specification do not sufficiently define the terms), rendering the claims indefinite.
Regarding Claims 4 and 18
Claims 4 and 18 each recite the limitation "wherein each respective content function encodes a transform of query content". There is insufficient antecedent basis for this limitation in the claim. In particular, there is insufficient antecedent basis for the claimed “each respective content function”. For purposes of examination, the examiner interprets the limitation as though it read "wherein the one or more lambda functions are generated based, at least in part, on a plurality of content functions and each respective content function encodes a transform of query content".
Regarding Claim 6
Claim 6 is rejected under 35 U.S.C. 112(b) as being indefinite because it recites the claim term “relative position” which is not clearly defined by the specification. MPEP 2173.01 recites “A fundamental principle contained in 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph is that applicants are their own lexicographers. They can define in the claims what the inventor or a joint inventor regards as the invention essentially in whatever terms they choose so long as any special meaning assigned to a term is clearly set forth in the specification.” The term “relative position” has no clear meaning as a term of art, nor is it clearly set forth in the specification (i.e., paragraphs 34 and 44 of the specification do not sufficiently define the term), rendering the claim indefinite.
Regarding Claims 10 and 21
The term “local lambda function” in claim 10 is a relative term which renders the claim indefinite. The term “local lambda function” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. In this instance, the terms “local” and “global” appear to refer to the length of the context (see paragraph 45). However, it is unclear what context lengths would cause a function to be considered a “local” or “global” function. Claim 21 recites a corresponding limitation and is rejected on the same basis.
Regarding Claims 11 and 22
The term “global lambda function” in claim 11 is a relative term which renders the claim indefinite. The term “global lambda function” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. In this instance, the terms “local” and “global” appear to refer to the length of the context (see paragraph 45). However, it is unclear what context lengths would cause a function to be considered a “local” or “global” function. Claim 22 recites a corresponding limitation and is rejected on the same basis.
Regarding dependent claims
Claims 2-12 depend from claim 1, claims 16-19 depend from claim 15, and claims 21-22 depend from claim 20; they are therefore similarly rejected as incorporating the deficiencies of claims 1, 15, and 20, respectively.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-12 and 15-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1:
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
generating one or more lambda functions based, at least in part, on a content function and a position function of each of the plurality of context elements in the context data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to performing a judgement of known functions (for example, generating the lambda functions by adding them together, which could be performed in the human mind).
and applying the one or more generated lambda functions to the input data as part of generating a layer output associated with the respective lambda layer — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating data based on known functions and procedures.
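For illustration of the characterization above (generating a lambda function by combining a content term and a position term, then applying it to a query), the following numpy sketch is offered. The projections, position embeddings, and summed formulation are illustrative assumptions by the editor, not limitations drawn from the claims or the specification:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, KD, VD = 4, 6, 3, 5          # context length and feature dimensions (arbitrary)

C = rng.normal(size=(N, D))        # context elements
Wk = rng.normal(size=(D, KD))      # hypothetical key projection
Wv = rng.normal(size=(D, VD))      # hypothetical value projection
E = rng.normal(size=(N, KD))       # hypothetical learned position embeddings

K = C @ Wk                         # keys derived from the context
V = C @ Wv                         # values derived from the context

# content term: keys normalized over context positions, aggregated against values
Ks = np.exp(K) / np.exp(K).sum(axis=0, keepdims=True)
lam_content = Ks.T @ V             # shape (KD, VD)

# position term: position embeddings aggregated against values
lam_position = E.T @ V             # shape (KD, VD)

# the "lambda function" is taken here as the sum of the two terms,
# applied to a query derived from the input data
lam = lam_content + lam_position
q = rng.normal(size=(KD,))
y = lam.T @ q                      # output contribution for this query
```

Each step above is an elementary matrix operation, consistent with the examiner's position that the computation could in principle be performed with pen and paper for small dimensions.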
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the claim does not recite additional elements that integrate the judicial exception into a practical application. The additional elements:
A computing system for modeling long-range interactions with reduced feature materialization, comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store: — This limitation is directed to merely applying an abstract idea using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.04(d)).
a machine-learned model configured to receive a model input and process the model input to generate a model output, wherein the machine-learned model comprises — This limitation is directed to mere instructions to apply a judicial exception. Using generic machine learning models to apply a judicial exception (see MPEP 2106.05(f)) is insufficient to integrate the judicial exception into a practical application. Even if the machine learning model is implemented on a generic computer (see MPEP 2106.05(f)(2), 2106.04(d)), the limitation does not integrate the judicial exception into a practical application.
one or more lambda layers, wherein each of the one or more lambda layers is configured to perform operations comprising: receiving a layer-input comprising input data and context data comprising a plurality of context elements — This limitation is directed to mere data gathering and outputting which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself?
No, the claim does not recite additional elements which amount to significantly more than the abstract idea itself. The additional elements as identified in step 2A prong 2:
A computing system for modeling long-range interactions with reduced feature materialization, comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store: — Using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.05(d)) cannot amount to significantly more than the judicial exception itself.
a machine-learned model configured to receive a model input and process the model input to generate a model output, wherein the machine-learned model comprises — Mere instructions to apply a judicial exception (see MPEP 2106.05(f)) and using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.05(d)) cannot amount to significantly more than the judicial exception itself.
one or more lambda layers, wherein each of the one or more lambda layers is configured to perform operations comprising: receiving a layer-input comprising input data and context data comprising a plurality of context elements — This limitation is recited at a high level of generality and amounts to mere data gathering of transmitting and receiving data over a network, which is well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.), which cannot amount to significantly more than the judicial exception.
Regarding Claim 2
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim merely recites the additional abstract idea:
Step 2A Prong 1:
wherein generating the one or more lambda functions comprises: averaging content functions and position functions for the plurality of the context elements — This limitation is directed to the abstract idea of a mathematical process, and mathematical calculations in particular (MPEP 2106.04(a)(2) I. C.). The claim describes the mathematical operation of calculating an average in words.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. Further, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 3
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim merely recites the additional abstract idea:
Step 2A Prong 1:
wherein the operations further comprise: determining keys and values based on linearly projecting the context data — This limitation is directed to the abstract idea of a mathematical process, and mathematical calculations in particular (MPEP 2106.04(a)(2) I. C.). The claim describes the mathematical operation of calculating a hash function, and more particularly a projection function which can be calculated using a matrix multiplication.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. Further, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 4
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 2:
wherein each respective content function encodes a transform of query content based on the context data, independent of a target query position — This limitation is directed to merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) as it merely limits the particular type of data operated on by the content function.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
wherein each respective content function encodes a transform of query content based on the context data, independent of a target query position — Merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) cannot amount to significantly more than the judicial exception.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 5
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 2:
wherein each respective position function encodes a transform of query content based on the context data, a query position, and a position in the context data — This limitation is directed to merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) as it merely limits the particular type of data operated on by the position function.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
wherein each respective position function encodes a transform of query content based on the context data, a query position, and a position in the context data — Merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) cannot amount to significantly more than the judicial exception.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 6
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 1:
wherein translation-equivariant position interactions are determined based on relative positions of one or more pairs of a plurality of query positions and a plurality of positions in the context data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating data using a known algorithm (e.g. a math function).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
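The relative-position computation characterized above can be illustrated as follows (the positions and dimensions are hypothetical; the sketch only demonstrates that relative positions of query/context pairs are invariant under a common translation, which is the sense of "translation-equivariant" the examiner ascribes to the claim):

```python
import numpy as np

query_pos = np.arange(4)            # hypothetical query positions
context_pos = np.arange(4)          # hypothetical context positions

# relative position r[i, j] = context_pos[j] - query_pos[i]
rel = context_pos[None, :] - query_pos[:, None]

# translation invariance of the relative positions: shifting both
# sets of positions by the same offset leaves rel unchanged
shift = 7
rel_shifted = (context_pos + shift)[None, :] - (query_pos + shift)[:, None]
```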
Regarding Claim 7
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 1:
wherein the operations further comprise: transforming the input data into one or more queries, wherein applying the one or more generated lambda functions to the input data comprises applying at least one of the generated lambda functions to each of the one or more queries — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating data using a known algorithm (e.g. a math function).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 8
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 1:
wherein applying the one or more generated lambda functions to the input data comprises combining a series of outputs resulting from applying at least one of the generated lambda functions to a plurality of queries associated with the input data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating data using a known algorithm (e.g. a math function).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 9
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 2:
wherein one or more of the lambda functions are global lambda functions — This limitation is directed to merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) as it merely limits the judicial exception to a particular type of lambda function.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
wherein one or more of the lambda functions are global lambda functions — Merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) cannot amount to significantly more than the judicial exception.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 10
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 2:
wherein one or more of the lambda functions are local lambda functions — This limitation is directed to merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) as it merely limits the judicial exception to a particular type of lambda function.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
wherein one or more of the lambda functions are local lambda functions — Merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) cannot amount to significantly more than the judicial exception.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 11
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 2:
wherein generating the one or more lambda functions comprises masking one or more positions of the context data — This limitation is directed to mere instructions to apply a judicial exception. Using a generic masking function to apply a judicial exception (see MPEP 2106.05(f)) is insufficient to integrate the judicial exception into a practical application. Even if the masking function is implemented on a generic computer (see MPEP 2106.05(f)(2), 2106.04(d)), the limitation does not integrate the judicial exception into a practical application.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
wherein generating the one or more lambda functions comprises masking one or more positions of the context data — Mere instructions to apply a judicial exception (see MPEP 2106.05(f)) and using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.05(d)) cannot amount to significantly more than the judicial exception itself.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 12
Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 2:
wherein the machine-learned model is configured to perform an image processing task, wherein the image processing task comprises image classification, object detection, image recognition, image segmentation, image data modification, image encoding, image compression or image upscaling — This limitation is directed to merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) as it merely limits the judicial exception to the technological environment of a particular set of machine learning tasks.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
wherein the machine-learned model is configured to perform an image processing task, wherein the image processing task comprises image classification, object detection, image recognition, image segmentation, image data modification, image encoding, image compression or image upscaling — Merely limiting a judicial exception to a particular field of use (see MPEP 2106.05(h)) cannot amount to significantly more than the judicial exception.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 15
Independent claim 15 is a method claim corresponding to computer system claim 1, which was directed to an abstract idea. The only difference is that claim 15 recites a method with slightly different wording that does not change the scope of the claim; therefore, the same rejection and rationale apply.
Regarding Claim 16
Dependent claim 16 is a computer-implemented method claim corresponding to computer system claim 2, which was directed to an abstract idea; therefore, the same rejection and rationale apply.
Regarding Claim 17
Dependent claim 17 is a computer-implemented method claim corresponding to computer system claim 3, which was directed to an abstract idea; therefore, the same rejection and rationale apply.
Regarding Claim 18
Dependent claim 18 is a computer-implemented method claim corresponding to computer system claim 4, which was directed to an abstract idea; therefore, the same rejection and rationale apply.
Regarding Claim 19
Dependent claim 19 is a computer-implemented method claim corresponding to computer system claim 5, which was directed to an abstract idea; therefore, the same rejection and rationale apply.
Regarding Claim 20
Independent claim 20 is a non-transitory computer-readable medium claim corresponding to computer system claim 1, which was directed to an abstract idea; therefore, the same rejection and rationale apply. The only difference is that claim 20 recites the following additional elements, treated under step 2A prong 2 and step 2B:
Step 2A Prong 2:
One or more non-transitory computer-readable media that store: — This limitation is directed to merely applying an abstract idea using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.04(d)).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
One or more non-transitory computer-readable media that store: — Using a generic computer as a tool (see MPEP 2106.05(f)(2), 2106.05(d)) cannot amount to significantly more than the judicial exception itself.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 21
Dependent claim 21 is a non-transitory computer-readable medium claim corresponding to computer system claim 10, which was directed to an abstract idea; therefore, the same rejection and rationale apply.
Regarding Claim 22
Dependent claim 22 is a non-transitory computer-readable medium claim corresponding to computer system claim 11, which was directed to an abstract idea; therefore, the same rejection and rationale apply.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-5, 7, 9-12, and 15-22 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Katharopoulos et al., “Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention,” hereinafter referred to as Katharopoulos.
Regarding Claim 1
Katharopoulos teaches:
A computing system for modeling long-range interactions with reduced feature materialization, comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store:
(page 4 column 2 last paragraph) “When it comes to training, the computations can be parallelized and take full advantage of GPUs or other accelerators. When it comes to inference, the cost per time and memory for one prediction is constant for our model”
a machine-learned model configured to receive a model input and process the model input to generate a model output, wherein the machine-learned model comprises one or more lambda layers
(page 3 column 1 paragraph 1) “Let x ∈ RN×F denote a sequence of N feature vectors of dimensions F. A transformer is a function T : RN×F → RN×F defined by the composition of L transformer layers T1(·), . . . , TL(·) [*Examiner notes: transformer layers] as follows,”
wherein each of the one or more lambda layers is configured to perform operations comprising: receiving a layer-input comprising input data and context data comprising a plurality of context elements;
(page 5 column 1 below equation 20) “In the above equations, xi denotes the i-th input and yi the i-th output for a specific transformer layer.”
generating one or more lambda functions based, at least in part, on a content function and a position function of each of the plurality of context elements in the context data; and applying the one or more generated lambda functions to the input data as part of generating a layer output associated with the respective lambda layer
(page 3 column 1 paragraph 1) “A transformer is a function T : RN×F →RN×F defined by the composition of L transformer layers T1(·), . . . , TL(·) as follows,”; [*Examiner notes: Tl(x) is mapped to the lambda function. The lambda function is applied to the input x to obtain the output of the lambda layer]
[media_image1.png: equation image reproduced from Katharopoulos, greyscale]
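For reference, the layer composition quoted above, which the examiner understands as Tl(x) = fl(Al(x) + x) in Katharopoulos, can be sketched with toy stand-ins for the self-attention Al and feedforward fl (both stand-ins are the editor's illustrative assumptions, not the paper's actual functions):

```python
import numpy as np

rng = np.random.default_rng(1)
N, F = 4, 8
x = rng.normal(size=(N, F))        # sequence of N feature vectors of dimension F

def attention(x):
    # placeholder self-attention: uniform average over all positions
    return np.repeat(x.mean(axis=0, keepdims=True), x.shape[0], axis=0)

W = rng.normal(size=(F, F))

def feedforward(x):
    # toy stand-in for the small feedforward network f_l
    return np.maximum(x @ W, 0.0)

def transformer_layer(x):
    # residual connection around attention, then feedforward
    return feedforward(attention(x) + x)

y = transformer_layer(x)
```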
Regarding Claim 2
Katharopoulos teaches:
The computing system of claim 1
(see rejection of claim 1)
wherein generating the one or more lambda functions comprises: averaging content functions and position functions for the plurality of the context elements
(page 3 column 1 paragraph 2) “The self attention function Al(·) computes, for every position, a weighted average of the feature representations of all other positions with a weight proportional to a similarity score between the representations.”
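The weighted-average operation quoted above can be sketched as follows. This is an illustrative numpy sketch only; the dot-product similarity and the dimensions are stand-ins chosen by the editor, not Katharopoulos's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(2)
N, F = 5, 4
feats = rng.normal(size=(N, F))    # feature representations of N positions

# similarity scores between all pairs of positions
scores = feats @ feats.T

# softmax turns each row of scores into averaging weights
w = np.exp(scores - scores.max(axis=1, keepdims=True))
w = w / w.sum(axis=1, keepdims=True)

# each output row is a weighted average of all feature vectors,
# with weights proportional to the similarity scores
out = w @ feats
```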
Regarding Claim 3
Katharopoulos teaches:
The computing system of claim 1
(see rejection of claim 1)
wherein the operations further comprise: determining keys and values based on linearly projecting the context data
(page 3 column 1 paragraph 2) “Formally, the input sequence x is projected by three matrices WQ ∈ RF ×D, WK ∈ RF ×D and WV ∈ RF ×M to corresponding representations Q, K and V”; (page 3 column 1 below equation 2) “Following common terminology, the Q, K and V are referred to as the “queries”, “keys” and “values” respectively.”
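The cited linear projections can be sketched directly from the quoted passage (the dimensions N, F, D, M below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
N, F, D, M = 6, 8, 4, 5
x = rng.normal(size=(N, F))        # input sequence of N feature vectors

WQ = rng.normal(size=(F, D))       # query projection, W_Q in R^{F x D}
WK = rng.normal(size=(F, D))       # key projection,   W_K in R^{F x D}
WV = rng.normal(size=(F, M))       # value projection, W_V in R^{F x M}

# queries, keys, and values obtained by linearly projecting the input
Q, K, V = x @ WQ, x @ WK, x @ WV
```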
Regarding Claim 4
Katharopoulos teaches:
The computing system of claim 1
(see rejection of claim 1)
wherein each respective content function encodes a transform of query content based on the context data, independent of a target query position.
(page 3 column 1 below equation 1) “The function fl(·) transforms each feature independently of the others and is usually implemented with a small two-layer feedforward network.”
Regarding Claim 5
Katharopoulos teaches:
The computing system of claim 1
(see rejection of claim 1)
wherein each respective position function encodes a transform of query content based on the context data, a query position, and a position in the context data
(page 3 paragraph 2) “The self attention function Al(·) computes, for every position, a weighted average of the feature representations of all other positions with a weight proportional to a similarity score between the representations”
Regarding Claim 7
Katharopoulos teaches:
The computing system of claim 1
(see rejection of claim 1)
wherein the operations further comprise: transforming the input data into one or more queries,
(page 3 column 1 paragraph 2) “Formally, the input sequence x is projected by three matrices WQ ∈ RF ×D, WK ∈ RF ×D and WV ∈ RF ×M to corresponding representations Q, K and V”; (page 3 column 1 below equation 2) “Following common terminology, the Q, K and V are referred to as the “queries”, “keys” and “values” respectively.”
wherein applying the one or more generated lambda functions to the input data comprises applying at least one of the generated lambda functions to each of the one or more queries
(page 3, Equation 2: V' = softmax(QK^T/√D)V) [image: media_image2.png, greyscale]
Regarding Claim 9
Katharopoulos teaches:
The computer system of claim 1
(see rejection of claim 1)
wherein one or more of the lambda functions are global lambda functions
(page 6 column 1 last paragraph) “We compute the attention and the gradients for a synthetic input with varying sequence lengths N ∈ {2^9, 2^10, . . . , 2^16} [*Examiner notes: long sequence length mapped to global lambda function] and measure the peak allocated GPU memory and required time for each variation of transformer.”
Regarding Claim 10
Katharopoulos teaches:
The computer system of claim 1
(see rejection of claim 1)
wherein one or more of the lambda functions are local lambda functions
(page 6 column 1 last paragraph) “We compute the attention and the gradients for a synthetic input with varying sequence lengths N ∈ {2^9, 2^10, . . . , 2^16} [*Examiner notes: short sequence length mapped to local lambda function] and measure the peak allocated GPU memory and required time for each variation of transformer.”
Regarding Claim 11
Katharopoulos teaches:
The computer system of any preceding claim 1
(see rejection of claim 1)
wherein generating the one or more lambda functions comprises masking one or more positions of the context data
(page 4 column 1 paragraph 2) “The transformer architecture can be used to efficiently train autoregressive models by masking the attention computation such that the i-th position can only be influenced by a position j if and only if j ≤ i, namely a position cannot be influenced by the subsequent positions.”
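The quoted causal-masking rule (position i may only be influenced by positions j such that j ≤ i) can be sketched as follows (an illustrative sketch on random scores, not code from the reference):

```python
import numpy as np

N = 5
rng = np.random.default_rng(2)
scores = rng.standard_normal((N, N))

# Causal mask: entry (i, j) is allowed iff j <= i (lower triangle).
mask = np.tril(np.ones((N, N), dtype=bool))
masked = np.where(mask, scores, -np.inf)

# Softmax over masked scores: the weight on any j > i is exactly zero,
# so no position is influenced by subsequent positions.
weights = np.exp(masked - masked.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
```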
Regarding Claim 12
Katharopoulos teaches:
The computer system of any preceding claim 1
(see rejection of claim 1)
wherein the machine- learned model is configured to perform an image processing task, wherein the image processing task comprises image classification, object detection, image recognition, image segmentation, image data modification, image encoding, image compression or image upscaling.
(page 7 column 1 second to last paragraph) “Image completions and unconditional samples from our MNIST model can be seen in figure 3. We observe that our linear transformer generates very convincing samples with sharp boundaries and no noise. In the case of image completion, we also observe that the transformer learns to use the same stroke style and width as the original image effectively attending over long temporal distances.”
[Figure 3 of Katharopoulos (image completions and unconditional samples from the MNIST model); image: media_image3.png, greyscale]
Regarding Claim 15
Claim 15 is a method claim corresponding to system claim 1. The only difference is that claim 15 recites a computer-implemented method instead of a computer system. Therefore, the same rejection and rationale applies to claim 15.
Regarding Claim 16
Claim 16 is a method claim corresponding to computer system claim 2. Therefore, the same rejection and rationale applies.
Regarding Claim 17
Claim 17 is a method claim corresponding to computer system claim 3. Therefore, the same rejection and rationale applies.
Regarding Claim 18
Claim 18 is a method claim corresponding to computer system claim 4. Therefore, the same rejection and rationale applies.
Regarding Claim 19
Claim 19 is a method claim corresponding to computer system claim 5. Therefore, the same rejection and rationale applies.
Regarding Claim 20
Claim 20 is a computer-readable medium claim corresponding to system claim 1. The only difference is that claim 20 recites a non-transitory computer-readable medium instead of a computer system. Therefore, the same rejection and rationale applies to claim 20.
Regarding Claim 21
Claim 21 is a computer-readable medium claim corresponding to computer system claim 10. Therefore, the same rejection and rationale applies.
Regarding Claim 22
Claim 22 is a computer-readable medium claim corresponding to computer system claim 11. Therefore, the same rejection and rationale applies.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Katharopoulos in view of Worral et al. “CubeNet: Equivariance to 3D Rotation and Translation”.
Regarding Claim 6
Katharopoulos teaches:
The computer system of claim 1
(see rejection of claim 1)
Katharopoulos does not explicitly teach:
wherein translation- equivariant position interactions are determined based on relative positions of one or more pairs of a plurality of query positions and a plurality of positions in the context data
However, Worral teaches:
wherein translation- equivariant position interactions are determined based on relative positions of one or more pairs of a plurality of query positions and a plurality of positions in the context data.
(page 18) “We introduce a Group Convolutional Neural Network with linear equivariance to translations and right angle rotations in three dimensions. We call this network CubeNet, reflecting its cube-like symmetry. By construction, this network helps preserve a 3D shape’s global and local signature, as it is transformed through successive layers.”
Katharopoulos, Worral, and the instant application are analogous because they are all directed to neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the machine learning layers of Katharopoulos by implementing the translation-equivariant position interactions disclosed by Worral because (Worral page 14 section 6 paragraph 1) “On the ModelNet10 classification challenge, we have achieved state-of-the-art for a single model, beating some much larger models, which rely on heavy data augmentation. Since our models are rotation in/equivariant by design, our CNNs need not learn to overcome rotations, the way a standard CNN does. In 3D, this is an especially important gain. As a result, our model is positioned to get better generalization with less data, while avoiding the need to perform time-costly rotation averaging at test-time”
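For reference, translation equivariance (as opposed to invariance) means that shifting the input shifts the output identically. This can be illustrated with a minimal 1-D circular convolution (an illustrative sketch only, not Worral's 3-D group convolution):

```python
import numpy as np

def circ_conv(x, w):
    """1-D circular convolution: out[i] = sum_k w[k] * x[(i + k) % N]."""
    N, K = len(x), len(w)
    return np.array([sum(w[k] * x[(i + k) % N] for k in range(K))
                     for i in range(N)])

x = np.arange(8.0)
w = np.array([1.0, -2.0, 1.0])

# Equivariance: convolving a shifted input equals shifting the
# convolved output -- the two orders of operations agree.
shifted_then_conv = circ_conv(np.roll(x, 3), w)
conv_then_shifted = np.roll(circ_conv(x, w), 3)
```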
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Katharopoulos in view of NPL reference Wei et al. “Fusion of an Ensemble of Augmented Image Detectors for Robust Object Detection” herein referred to as Wei.
Regarding Claim 8
Katharopoulos teaches:
The computing system of claim 1
(see rejection of claim 1)
Katharopoulos does not explicitly teach:
wherein applying the one or more generated lambda functions to the input data comprises combining a series of outputs resulting from applying at least one of the generated lambda functions to a plurality of queries associated with the input data.
However, Wei teaches:
wherein applying the one or more generated lambda functions to the input data comprises combining a series of outputs resulting from applying at least one of the generated lambda functions to a plurality of queries associated with the input data.
(page 6 section 3.1) “First, the input is augmented to produce several variations[*Examiner notes: plurality of queries associated with input], so we can have augmented inputs for future stages.”; (page 6 section 3.1) “Then, the AABBFI fusion method is used to fuse the T AABBs to obtain one AABB for each object in the input[*Examiner notes: combining a series of outputs].”; Figure 2
[Figure 2 of Wei; image: media_image4.png, greyscale]
Katharopoulos, Wei, and the instant application are analogous because they are all directed to neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the machine learning layers of Katharopoulos with the combining of outputs taught by Wei because (Wei page 18 section 5) “Our proposed system is not only fast, but also accurate, which are two important criteria in ADAS. By using this computational intelligence system, we are able to build a more robust object detection sub-system for ADAS applications, with the proposed system showing improvement in both IoU and mAP metrics. Furthermore, very good results were obtained when only utilizing three combined inputs, making the computational load roughly three-times the load for only using one input (the original image).”
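The augment-then-fuse pattern quoted from Wei can be sketched generically as follows (an illustrative sketch only; a simple mean stands in for Wei's AABBFI fusion method, and the detector is a hypothetical stand-in for an actual object detector):

```python
import numpy as np

def augment(x, num_variations=3):
    """Produce several variations of the input (simple noise here;
    Wei uses image augmentations)."""
    rng = np.random.default_rng(3)
    return [x + 0.01 * rng.standard_normal(x.shape)
            for _ in range(num_variations)]

def detector(x):
    """Hypothetical stand-in producing one output per input variation."""
    return float(x.mean())

def fuse(outputs):
    """Combine the series of outputs into one result
    (a mean, standing in for AABBFI fusion)."""
    return float(np.mean(outputs))

x = np.ones((4, 4))
outputs = [detector(v) for v in augment(x)]  # one output per augmented input
fused = fuse(outputs)                        # combined into a single result
```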
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Islam et al. “How much position do convolutional neural networks encode?” for teaching position functions in image processing neural networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ezra J Baker whose telephone number is (703)756-1087. The examiner can normally be reached Monday - Friday 10:00 am - 8:00 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/E.J.B./Examiner, Art Unit 2126
/VAN C MANG/Primary Examiner, Art Unit 2126