Prosecution Insights
Last updated: April 19, 2026
Application No. 18/518,075

TRAINING LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS WITH VARIABLE PATCH SIZES

Status: Non-Final OA (§103)
Filed: Nov 22, 2023
Examiner: SHEN, QUN
Art Unit: 2662
Tech Center: 2600 (Communications)
Assignee: Google LLC
OA Round: 1 (Non-Final)

Grant Probability: 76% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 1m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 76% (575 granted / 754 resolved; +14.3% vs TC avg, above average)
Interview Lift: +38.6% among resolved cases with an interview
Typical Timeline: 3y 1m average prosecution; 34 applications currently pending
Career History: 788 total applications across all art units
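The headline tiles above are simple arithmetic on the underlying counts. As a minimal sketch (using the figures shown on this page), the allow rate and pending count can be reproduced as:

```python
# Figures taken from the dashboard above.
granted = 575    # applications allowed
resolved = 754   # resolved dispositions (granted + abandoned)
total = 788      # career total across all art units

allow_rate = granted / resolved   # career allow rate
pending = total - resolved        # applications still open

print(f"Career allow rate: {allow_rate:.0%}")   # 76%
print(f"Currently pending: {pending}")          # 34
```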

Statute-Specific Performance

§101:  5.6% (-34.4% vs TC avg)
§103: 61.4% (+21.4% vs TC avg)
§102:  8.4% (-31.6% vs TC avg)
§112: 16.8% (-23.2% vs TC avg)
Black line = estimated Tech Center average • Based on career data from 754 resolved cases
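The chart's Tech Center baseline (the black line) can be recovered from the deltas listed above: each statute's examiner rate minus its "vs TC avg" delta gives the baseline. A small sketch (what the percentages measure, e.g. share of rejections by statute, is not stated on the page and is an assumption):

```python
# Examiner rates and deltas copied from the statute table above.
examiner_rate = {"101": 5.6, "103": 61.4, "102": 8.4, "112": 16.8}
delta_vs_tc = {"101": -34.4, "103": 21.4, "102": -31.6, "112": -23.2}

# Baseline implied by each row: rate - delta = Tech Center average.
tc_avg = {s: round(examiner_rate[s] - delta_vs_tc[s], 1) for s in examiner_rate}
print(tc_avg)  # every statute implies the same ~40% baseline
```

Notably, all four rows imply the same 40.0% baseline, suggesting the black line sits at a single Tech Center-wide estimate rather than a per-statute one.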

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This communication is a non-final Office action on the merits. Claims 1-20, as originally filed, are presently pending. Claims 1, 6-13, and 17-19, after restriction election, have been elected and are considered below.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 1/3/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement has been considered by the examiner.

Restriction Election

Applicant elects Species III (claims 7-13), without traverse, for further examination.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

The claimed invention is directed to non-statutory subject matter. Claims 1-20 do not fall within at least one of the four categories of patent-eligible subject matter because, although claim 18 is directed to a statutory machine, it recites certain data manipulations (obtaining training images as input for neural network training; generating/selecting image patches) and mathematical concepts/operations (training the neural network using a difference between the network output for the training image and the target network output for the training image) that could be performed mentally and/or manually absent specific technical constraints, and that are not integrated into a practical application. Claims 1 and 19 recite similar limitations and are rejected for the same reasons as claim 18. The dependent claims depending from claims 1 and 19 do not add significantly more to their base claims and are therefore rejected on the same basis.
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 6-8, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over US 2023/0100413 A1, Zhu et al. (hereinafter Zhu).
As to claim 1, Zhu discloses a method performed by one or more computers for training a neural network that is configured to process an input image to generate a network output for the input image (Fig. 5A), the method comprising, at each of a plurality of training steps: obtaining a plurality of training images for the training step (Figs. 4, 5A-5B; pars. 0004-0005, 0023, 0039, 0061: images being obtained for training a transformer-based neural network); obtaining, for each of the plurality of training images, a respective target output (Figs. 4, 5A-5C; pars. 0010, 0028, 0031, 0063-0065, 0078: obtaining outputs from different layers/networks/subnetworks for their respective targets/purposes); selecting, from a plurality of image patch generation schemes, an image patch generation scheme for the training step, wherein, given an input image, each of the plurality of image patch generation schemes generates a different number of patches of the input image (Figs. 8-9; pars. 0004-0012: generating a number of image patches, each patch being selected through window partitions and shifted window partitions for different sizes), and wherein each patch comprises a respective subset of the pixels of the input image (Figs. 8-9; pars. 0004-0012: since each patch is a part of the image, it contains a subset of the pixels of the image from which the patches are generated/selected); for each of the training images: generating a plurality of image patches of the training image by applying the selected image patch generation scheme to the training image (Figs. 8-9; pars. 0004-0012: either selecting window partitions or shifted window partitions, with the sizes of the patches depending on the window sizes); and processing the plurality of image patches using the neural network to generate a network output for the training image (Figs. 8-9; pars. 0004-0012); and training the neural network on an objective that measures, for each training image, a difference between the network output for the training image and the target network output for the training image (pars. 0015, 0036, 0091, 0113-0115, 0152: the loss function of the network training being the difference between the outputs).

Although Zhu discloses or teaches the above limitations in more than one embodiment, considering Zhu's teachings as a whole, it would have been obvious to one of ordinary skill in the art at the time of the invention to incorporate or combine teachings from the different embodiments to achieve predictable results.

Claims 2-5: (Withdrawn)

As to claim 6, Zhu discloses the method of claim 1, further comprising: prior to training the neural network, initializing values of parameters of the neural network based on trained values of parameters of a trained teacher neural network (pars. 0063, 0115: neural network initialization).

As to claim 7, Zhu discloses the method of claim 1, wherein the neural network comprises an embedding subnetwork (pars. 0033, 0136, 0150, 0212: a linear embedding layer of the encoder subnetwork), a self-attention subnetwork (pars. 0006, 0026, 0069, 0072-0093, 0118-0119), and an output subnetwork (Figs. 5A-5B; pars. 0133, 0139, 0155), and wherein processing the plurality of image patches using the neural network to generate a network output for the training image comprises: processing the plurality of image patches using the embedding subnetwork to generate a respective embedding for each of the image patches (pars. 0033, 0136, 0150, 0212); processing an input sequence comprising the respective embeddings for each of the image patches using the self-attention subnetwork to generate a self-attention output for the training image (pars. 0006, 0026, 0069, 0072-0093, 0118-0119); and processing the self-attention output using the output subnetwork to generate the network output for the training image (Figs. 5A-5B; pars. 0133, 0139, 0155).

As to claim 8,
Zhu discloses the method of claim 7, wherein processing the plurality of image patches using the embedding subnetwork to generate a respective embedding for each of the image patches comprises: for each image patch, applying a set of patch embedding weights to the intensity values of the pixels in the image patch to generate an initial embedding of the image patch (pars. 0030, 0063, 0083, 0071, 0091, 0095, 0102, 0135, 0140).

Claims 14-16: (Withdrawn)

As to claim 17, Zhu discloses the method of claim 1, wherein the target output for each of the training items is generated based on an output generated by a text processing neural network (pars. 0104, 0114, 0133, 0135, 0145: labeling process).

As to claim 18, it is a system claim encompassing claim 1. The rejection of claim 1 is therefore incorporated herein.

As to claim 19, it recites one or more non-transitory computer-readable media storing instructions executed to perform the functions and features of claim 1. The rejection of claim 1 is therefore incorporated herein.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu in view of US 2024/0281654 A1, Reed et al. (hereinafter Reed).

As to claim 11, Zhu discloses the method of claim 8, but does not expressly teach wherein processing the plurality of image patches using the embedding subnetwork to generate a respective embedding for each of the image patches comprises: combining the initial embedding of the image patch with a learned positional embedding that corresponds to a position of the image patch within the training image to generate the embedding of the image patch. Reed, in the same or a similar field of endeavor, teaches combining the initial embedding of the image patch with a learned positional embedding that corresponds to a position of the image patch within the training image to generate the embedding of the image patch (pars. 0026-0027, 0088-0090, 0092, 0105).
Therefore, considering Zhu's and Reed's teachings as a whole, it would have been obvious to one of ordinary skill in the art before the filing date of the invention to incorporate Reed's teachings into Zhu's method to provide the patch embedding with corresponding position information of the patch.

Claim 20: (Withdrawn)

Allowable Subject Matter

Claims 9-10 and 12-13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Reasons for Allowance

The prior art of record (Zhu and Reed) neither discloses alone nor teaches in combination the functions and features recited in claim 9. Claims 10 and 12-13 depend from claim 9.

Examiner's Note

The examiner has cited particular column and line numbers, paragraphs, and/or figures in the references as applied to the claims for the convenience of the Applicant. Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. Applicant is respectfully requested, in preparing responses, to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passages as taught by the prior art or disclosed by the Examiner.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUN SHEN, whose telephone number is (571) 270-7927. The examiner can normally be reached Mon-Fri, 8:30-5:50 PT. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Amandeep Saini, can be reached at 571-272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/QUN SHEN/
Primary Examiner, Art Unit 2662
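For orientation, the core operation mapped against claim 1 above, applying one of several patch generation schemes, each of which yields a different number of patches, can be sketched as below. This is an illustrative reconstruction from the claim language only, not code from the application or from Zhu; the scheme names and patch sizes are hypothetical.

```python
import numpy as np

# Hypothetical patch generation schemes: each maps an image to a different
# number of non-overlapping patches, keyed by patch side length (assumed values).
SCHEMES = {"coarse": 56, "medium": 28, "fine": 14}

def generate_patches(image: np.ndarray, scheme: str) -> np.ndarray:
    """Split an (H, W, C) image into (N, p, p, C) non-overlapping patches,
    where p is the patch side length of the selected scheme."""
    p = SCHEMES[scheme]
    h, w, c = image.shape
    assert h % p == 0 and w % p == 0, "image must tile evenly into patches"
    # Reshape into a grid of patches, then flatten the grid dimensions.
    patches = image.reshape(h // p, p, w // p, p, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p, p, c)
    return patches

image = np.zeros((224, 224, 3))
for name in SCHEMES:
    print(name, generate_patches(image, name).shape[0])
# coarse -> 16 patches, medium -> 64, fine -> 256
```

Each patch is a contiguous block of the input's pixels, matching the claim's requirement that every patch comprise a respective subset of the pixels, and the scheme choice alone determines the patch count.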

Prosecution Timeline

Nov 22, 2023
Application Filed
Mar 29, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602799: REGISTRATION CHAINING WITH INFORMATION TRANSFER (granted Apr 14, 2026; 2y 5m to grant)
Patent 12579609: High Resolution Input Processing in a Neural Network (granted Mar 17, 2026; 2y 5m to grant)
Patent 12566972: DATA DENOISING METHOD AND RELATED DEVICE (granted Mar 03, 2026; 2y 5m to grant)
Patent 12561997: CONTEXT-BASED REVIEW TRANSLATION (granted Feb 24, 2026; 2y 5m to grant)
Patent 12560726: Low-Power-Consumption Positioning Method and Related Apparatus (granted Feb 24, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 76% (99% with interview; +38.6% lift)
Median Time to Grant: 3y 1m
PTA Risk: Low
Based on 754 resolved cases by this examiner. Grant probability is derived from the career allow rate.
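The page does not state how the 99% "with interview" figure is derived from the 76% base rate and the +38.6% lift. One plausible reading, sketched below purely as an assumption, is that the lift is applied multiplicatively and the result is capped at 99%:

```python
# Assumed formula - the dashboard does not document this derivation.
base_rate = 0.76   # career allow rate
lift = 0.386       # relative interview lift (+38.6%)

# Multiplicative lift, capped at 99% (an uncapped value would exceed 100%).
with_interview = min(base_rate * (1 + lift), 0.99)
print(f"With interview: {with_interview:.0%}")  # 99%
```

An additive reading (76% + 38.6 percentage points) also lands above 100% and would be capped to the same displayed value, so the page alone cannot distinguish the two.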
