Prosecution Insights
Last updated: April 19, 2026
Application No. 18/319,259

EFFICIENT TRANSFORMER WITH SERIAL COMPOSITION OF MULTI-SCALE MULTI-RANGE ATTENTIONS

Non-Final OA: §102, §103, §112

Filed: May 17, 2023
Examiner: MOUNDI, ISHAN NMN
Art Unit: 2141
Tech Center: 2100 — Computer Architecture & Software
Assignee: Qualcomm Incorporated
OA Round: 1 (Non-Final)

Grant Probability: 12% (At Risk)
Expected OA Rounds: 1-2
Median Time to Grant: 4y 6m
Grant Probability with Interview: 46%

Examiner Intelligence

Career Allow Rate: 12% (2 granted of 16 resolved; -42.5% vs TC avg)
Interview Lift: +33.3% allow rate among resolved cases with an interview
Typical Timeline: 4y 6m average prosecution; 41 applications currently pending
Career History: 57 total applications across all art units
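The headline figures above are internally consistent; here is a minimal sketch of the presumed arithmetic (the variable names and the additive treatment of "interview lift" are illustrative assumptions about this tool, not documented behavior):

```python
# Sanity check of the examiner stats above. The counts come from the
# dashboard; treating "interview lift" as an additive percentage-point
# boost over the base allow rate is an assumption.

granted, resolved = 2, 16

career_allow_rate = granted / resolved               # 0.125, shown rounded as 12%
interview_lift = 0.333                               # "+33.3% interview lift"
with_interview = career_allow_rate + interview_lift  # 0.458, shown as 46%

print(f"career allow rate: {career_allow_rate:.1%}")
print(f"with interview:    {with_interview:.1%}")
```

Under that reading, 2/16 = 12.5% rounds to the displayed 12%, and 12.5% + 33.3 points = 45.8% rounds to the displayed 46%.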

Statute-Specific Performance

§101: 37.7% (-2.3% vs TC avg)
§102: 9.7% (-30.3% vs TC avg)
§103: 45.0% (+5.0% vs TC avg)
§112: 7.2% (-32.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 16 resolved cases.

Office Action

Grounds of rejection: §102, §103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term "means" or "step" or a term used as a substitute for "means" that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term "means" or "step" or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word "for" (e.g., "means for") or another linking word or phrase, such as "configured to" or "so that"; and

(C) the term "means" or "step" or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word "means" (or "step") in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word "means" (or "step") in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word "means" (or "step") are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Conversely, claim limitations in this application that do not use the word "means" (or "step") are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

The claim limitations being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, are: "means for accessing an input data sequence; means for slicing the input data sequence based on a slice length hyperparameter to generate a stacked slice input data representation; means for processing the stacked slice input data representation with a slice attention layer to generate a stacked slice output data representation; and means for de-slicing the stacked slice output data representation to generate an output data sequence" in claim 30. Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

The "means for accessing" is interpreted as a general computer function of storing and accessing data and is thus covered by the generic storage/memory present in every device. The "means for processing" is a specialized computer function clearly linked to the algorithm shown in part in Fig. 3, with the steps spelled out in claim 2. As discussed below, the "means for slicing" and "means for de-slicing" are indefinite and lack written description.

If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid interpretation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claim 30 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
The specification (see P0062) does not disclose sufficient corresponding structure for the claimed function of slicing the input data sequence based on a slice length hyperparameter to generate a stacked slice input data representation (see MPEP 2181(IV)). The specification (see P0072) does not disclose sufficient corresponding structure for the claimed function of de-slicing the stacked slice output data representation to generate an output data sequence (see MPEP 2181(IV)).

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim limitations "means for slicing the input data sequence based on a slice length hyperparameter to generate a stacked slice input data representation; … and means for de-slicing the stacked slice output data representation to generate an output data sequence" of claim 30 invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. No association between the structure and the functions can be found in the specification (P0062, P0072). The specification fails to clearly link the claimed functions to disclosed structures, materials, or acts (see MPEP 2181(III)). Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.

Applicant may:

(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;

(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or

(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).

If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:

(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or

(b) Stating on the record what corresponding structure, material, or acts, implicitly or inherently set forth in the written description of the specification, perform the claimed function.

For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-7, 10-17, 20-27, and 30 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Han Kai et al. ("Transformer in Transformer", October 26, 2021), hereafter Kai.

Regarding claims 1, 11, and 21, Kai teaches a computer-implemented method, processing system, and non-transitory computer-readable medium comprising computer-executable instructions that, when executed by one or more processors of a processing system, cause the processing system to perform an operation comprising (page 1, last four lines of abstract, and page 6, section 3.1, paragraph "Implementation Details"):

accessing an input data sequence ("Therefore, we are motivated to explore a more exquisite visual image dividing method for generating visual sequences and improve the performance", page 2, full paragraph 2; "we propose a novel Transformer-iN-Transformer (TNT) architecture for visual recognition as shown in Figure 1. To enhance the feature representation ability of visual transformers, we first divide the input images into several patches as 'visual sentences' and then further divide them into sub-patches as 'visual words'.", page 2, full paragraph 3);

slicing the input data sequence based on a slice length hyperparameter to generate a stacked slice input data representation ("we uniformly split it into n patches X = [X1, X2, …, Xn] ∈ R^(n × p × p × 3), where (p, p) is the resolution of each image patch", wherein p is the hyperparameter determining the slice length, page 3, section 2.2, full paragraph 1 and equation 4; "In TNT, we view the patches as visual sentences that represent the image. Each patch is further divided into m sub-patches, i.e., a visual sentence is composed of a sequence of visual words: Xi → [xi,1, xi,2, …, xi,m] (equation 4), where …, (s, s) is the spatial size of sub-patches", wherein s is the hyperparameter determining slice length, page 3, section 2.2, lines 5-9);

processing the stacked slice input data representation with a slice attention layer to generate a stacked slice output data representation (a linear layer produces output based on divided input data,
page 3, section 2.1, subsection MSA, and section 2.2, equation 6; page 2, figure 1); and

de-slicing the stacked slice output data representation to generate an output data sequence ("With the above addition operation, the representation of sentence embedding is augmented by the word-level features", page 4, section 2.2, lines 13-15; after the data is pieced together, it is output using the outer transformer block, page 2, figure 1, Outer Transformer Block).

Regarding claim 30, claim limitations "means for slicing the input data sequence based on a slice length hyperparameter to generate a stacked slice input data representation; … and means for de-slicing the stacked slice output data representation to generate an output data sequence" invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. These elements are interpreted under 35 U.S.C. 112(f) as processor(s) with the algorithm described in the specification (the algorithms to slice the input data sequence and de-slice the output data sequence) that causes the processor(s) to perform the claimed function.

Regarding claims 2, 12, 22, and 30, Kai teaches the limitations of claims 1, 11, and 21 as outlined above. Kai further teaches:

processing the stacked slice input data representation with a high-resolution local attention layer to generate local attention output data (the transformer block analyzes a relationship between visual words, the relationship between the visual words being interpreted as local attention output data, pages 3-4, section 2.2, equations 6 and 7 and the paragraphs above and below them);

processing the local attention output data with a slice embedding layer to generate slice embeddings (a standard transformer block transforms sentence embeddings, page 4, section 2.2, text under equation 8, equations 9 and 10);

processing the slice embeddings with a reduced-resolution global attention layer to generate global attention output data (sentence embeddings produce TNT blocks, page 4, section 2.2, text under equation 10, equation 11); and

performing a broadcast addition of the local attention output data and the global attention output data to generate the stacked slice output data representation (a fully connected layer processes TNT blocks to output a final classification).

Regarding claims 3, 13, and 23, Kai teaches the limitations of claims 2, 12, and 22 as outlined above. Kai further teaches that processing the stacked slice input data representation with the high-resolution local attention layer comprises applying a first set of trained weights to the stacked slice input data representation, and that processing the slice embeddings with the reduced-resolution global attention layer comprises applying a second set of trained weights to the slice embeddings (each layer in the multi-layer perceptron (MLP) has a set of weights applied to it, pages 3-4, section 2.1, subsection MLP, and section 2.2; page 2, figure 1, inner and outer transformer blocks).
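As a concrete picture of the claim-1 slicing and de-slicing limitations mapped above, here is a minimal NumPy sketch. It is one plausible reading of the claim language (reshape-based slicing with an evenly divisible sequence length), not Kai's TNT code and not the applicant's disclosed algorithm, whose sufficiency the §112 rejections dispute:

```python
import numpy as np

def slice_sequence(x: np.ndarray, slice_len: int) -> np.ndarray:
    # Slice a (seq_len, dim) sequence into a stacked (num_slices, slice_len,
    # dim) representation. slice_len stands in for the claimed "slice length
    # hyperparameter"; seq_len is assumed evenly divisible by it.
    seq_len, dim = x.shape
    return x.reshape(seq_len // slice_len, slice_len, dim)

def de_slice(stacked: np.ndarray) -> np.ndarray:
    # Invert the slicing: flatten (num_slices, slice_len, dim) back into a
    # (seq_len, dim) output data sequence.
    num_slices, slice_len, dim = stacked.shape
    return stacked.reshape(num_slices * slice_len, dim)

x = np.random.default_rng(0).standard_normal((128, 64))  # toy input sequence
stacked = slice_sequence(x, slice_len=16)                # -> (8, 16, 64)
assert np.allclose(de_slice(stacked), x)                 # round-trips exactly
```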
Regarding claims 4, 14, and 24, Kai teaches the limitations of claims 3, 13, and 23 as outlined above. Kai further teaches:

generating a local key vector, a local query vector, and a local value vector by applying the first set of trained weights to the stacked slice input data representation (queries, keys, and values are generated, page 3, section 2.1, subsection MSA; equation 6 in section 2.2); and

generating the local attention output data based on the local key vector, local query vector, and local value vector (the outer transformer block generates outputs based on MSA, which is generated using queries, keys, and values, equations 9 and 11, page 4, and figure 1, page 2).

Regarding claims 5, 15, and 25, Kai teaches the limitations of claims 4, 14, and 24 as outlined above. Kai further teaches adding a local positional embedding to the local key vector and the local query vector, wherein a length of the local positional embedding is based on the slice length hyperparameter (position encoding is assigned to paired input data, with the length of the position encoding dependent upon the n number of words in the sentence, page 4, section 2.2, subsection Position Encoding, and page 1, figure 1, word position encoding).

Regarding claims 6, 16, and 26, Kai teaches the limitations of claims 3, 13, and 23 as outlined above. Kai further teaches wherein processing the slice embeddings with the reduced-resolution global attention layer comprises:

generating a global key vector, a global query vector, and a global value vector by applying the second set of trained weights to the slice embeddings (MSA generates keys, queries, and values, page 3, section 2.1, subsection MSA; the MSA equation is shown on page 4 as equation 9); and

generating the global attention output data based on the global key vector, global query vector, and global value vector (keys, queries, and values are split into h parts before being concatenated and linearly projected to form the final output, page 3, section 2.1, subsection MSA, and Outer Transformer Block, page 2, figure 1).

Regarding claims 7, 17, and 27, Kai teaches the limitations of claims 6, 16, and 26 as outlined above. Kai further teaches that processing the slice embeddings with the reduced-resolution global attention layer comprises adding a global positional embedding to the global key vector and the global query vector, wherein a length of the global positional embedding is based on an input data sequence length divided by the slice length hyperparameter (position encoding to retain spatial information is added to sentence and word embeddings, with the length of the position encoding being based on the sentence divided by n words, page 4, section 2.2, subsection Position Encoding, and page 2, figure 1, sentence position encoding).

Regarding claims 10 and 20, Kai teaches the limitations of claims 1 and 11 as outlined above. Kai further teaches wherein the slice attention layer comprises a plurality of slice attention heads (variants of the TNT architecture include pluralities of heads, page 5, section 2.4, column "#heads").
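Putting the claim-2 through claim-7 mappings together, here is a compact sketch of the pipeline as claimed: local attention within each slice, per-slice embeddings, global attention across slices, then broadcast addition. The mean-pooled slice embedding and the single-head attention are illustrative simplifications, not features taken from Kai or from the application:

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product attention. Wq/Wk/Wv play the role of
    # the claimed "trained weights" producing the query, key, and value
    # vectors (claims 3-4); multi-head attention (claims 10/20) is omitted.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.swapaxes(-1, -2) / np.sqrt(k.shape[-1])) @ v

rng = np.random.default_rng(0)
S, L, D = 8, 16, 64                       # num_slices, slice_len, dim
stacked = rng.standard_normal((S, L, D))  # stacked slice input representation

# High-resolution local attention within each slice (first weight set).
W1 = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3)]
local_out = attention(stacked, *W1)       # (S, L, D)

# Slice embeddings: one vector per slice. Mean-pooling is an illustrative
# placeholder for the claimed slice embedding layer.
slice_emb = local_out.mean(axis=1)        # (S, D)

# Reduced-resolution global attention across slices (second weight set).
W2 = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3)]
global_out = attention(slice_emb, *W2)    # (S, D)

# Broadcast addition (claim 2): each slice's global vector is added at
# every position of that slice's local output.
stacked_out = local_out + global_out[:, None, :]   # (S, L, D)
```

Note how the claimed positional-embedding lengths fall out of these shapes: a local embedding added to the local query and key would span L positions, the slice length hyperparameter (claims 5/15/25), while a global embedding would span S = sequence length / slice length slices (claims 7/17/27).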
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 8-9, 18-19, and 28-29 are rejected under 35 U.S.C. 103 as being unpatentable over Kai in view of Li et al. (Pub. No. US 20250069382 A1), hereafter Li (which has the priority date of provisional application 63/296625 (January 5, 2022); support is found in P0045-P0047 of the provisional application's specification).

Regarding claims 8, 18, and 28, Kai teaches the limitations of claims 2, 12, and 22 as outlined above. Kai does not appear to explicitly teach "wherein processing the stacked slice input data representation with the high-resolution local attention layer comprises performing overlapping slice local attention and wherein slicing the input data sequence is performed based further on an overlap hyperparameter to generate overlapping slices of the input data sequence".

Li teaches wherein processing the stacked slice input data representation with the high-resolution local attention layer comprises performing overlapping slice local attention and wherein slicing the input data sequence is performed based further on an overlap hyperparameter to generate overlapping slices of the input data sequence (input tensors may be partitioned (split) into different portions; portions of the input processed by the MLP may be overlapping, P0044).

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Kai and Li before them, to include Li's specific teaching of input tensors being split into different portions, with overlapping portions of the input processed by the MLP, in Kai's Transformer-in-Transformer system. One would have been motivated to make such a combination of input tensors being split into different portions with overlapping portions of the input processed by the MLP (see Li P0044), and linearly transforming inputs into multiple parts to perform an attention function in parallel on each part of the input and concatenating and linearly projecting the outputs of each part (see Kai, page 3, section 2.1, subsection MSA).

Regarding claims 9, 19, and 29, Kai teaches the limitations of claims 2, 12, and 22 as outlined above. Kai does not appear to explicitly teach "wherein processing the stacked slice input data representation with the high-resolution local attention layer comprises performing focal slice local attention, wherein: slicing the input data sequence comprises generating a plurality of slices having a plurality of sequence lengths; and performing the focal slice local attention comprises: generating a plurality of intermediate tensors based on the plurality of slices, and aggregating the plurality of intermediate tensors".
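Before turning to Li's mapping below, the two slicing variants at issue in these §103 grounds can be pictured concretely. The window stride and the per-scale aggregation in this sketch are assumed readings of the claim language, not Li's disclosed method:

```python
import numpy as np

def overlapping_slices(x, slice_len, overlap):
    # Claims 8/18/28: consecutive slices share `overlap` positions, so each
    # window advances by slice_len - overlap. This stride formula is an
    # assumed reading of the claimed overlap hyperparameter.
    step = slice_len - overlap
    starts = range(0, len(x) - slice_len + 1, step)
    return np.stack([x[s:s + slice_len] for s in starts])

def focal_slices(x, slice_lens):
    # Claims 9/19/29: slice at several lengths, form an intermediate tensor
    # per scale, then aggregate. Mean-pooling each scale to one summary
    # vector and summing across scales are illustrative placeholders for
    # the claimed intermediate tensors and aggregation.
    intermediates = []
    for L in slice_lens:
        usable = (len(x) // L) * L                      # truncate to a multiple of L
        slices = x[:usable].reshape(-1, L, x.shape[-1])
        intermediates.append(slices.mean(axis=(0, 1)))  # per-scale summary, (dim,)
    return np.sum(intermediates, axis=0)                # aggregate across scales

x = np.random.default_rng(1).standard_normal((64, 8))        # toy (seq_len, dim)
print(overlapping_slices(x, slice_len=16, overlap=4).shape)  # (5, 16, 8)
print(focal_slices(x, slice_lens=[8, 16, 32]).shape)         # (8,)
```

With that picture in mind, the examiner's mapping of Li follows.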
Li teaches wherein processing the stacked slice input data representation with the high-resolution local attention layer comprises performing focal slice local attention, wherein: slicing the input data sequence comprises generating a plurality of slices having a plurality of sequence lengths (the input tensor may be partitioned (split) into different sizes, P0045); and performing the focal slice local attention comprises: generating a plurality of intermediate tensors based on the plurality of slices, and aggregating the plurality of intermediate tensors (the global branch may automatically scale to handle inputs of different sizes by partitioning the input tensor irrespective of the size of the input tensor, P0045).

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Kai and Li before them, to include Li's specific teaching of input tensors being split into different portions with a plurality of sizes in Kai's Transformer-in-Transformer system. One would have been motivated to make such a combination of input tensors being split into different portions with a plurality of sizes (see Li P0045), and linearly transforming inputs into multiple parts to perform an attention function in parallel on each part of the input and concatenating and linearly projecting the outputs of each part (see Kai, page 3, section 2.1, subsection MSA).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 10956810 B1 (Wright et al.) teaches a system for determining a result of a diagnostic test strip comprising a machine learning transformer architecture which processes input in parallel. US 20230360376 A1 (Hinz et al.) teaches a system including a transformer model used to transform three-dimensional learned embeddings into learned query, value, and key representations of different input indices.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ISHAN MOUNDI, whose telephone number is (703) 756-1547. The examiner can normally be reached 8:30 A.M. to 5:00 P.M. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Matthew Ell, can be reached at (571) 270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/I.M./ Examiner, Art Unit 2141
/MATTHEW ELL/ Supervisory Patent Examiner, Art Unit 2141

Prosecution Timeline

May 17, 2023
Application Filed
Feb 25, 2026
Non-Final Rejection — §102, §103, §112 (current)

Precedent Cases

Applications with similar technology granted by this examiner

Patent 12561970: METHOD, DEVICE, AND COMPUTER PROGRAM PRODUCT FOR IMAGE RECOGNITION
Granted Feb 24, 2026; 2y 5m to grant
Study what changed to get past this examiner. Based on the single most recent grant.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 12%
Grant Probability with Interview: 46% (+33.3%)
Median Time to Grant: 4y 6m
PTA Risk: Low

Based on 16 resolved cases by this examiner. Grant probability is derived from the career allow rate.
