Prosecution Insights
Last updated: April 17, 2026
Application No. 18/417,929

COMPUTER-IMPLEMENTED METHODS, COMPUTING SYSTEMS, AND NON-TRANSITORY MACHINE-READABLE MEDIUMS FOR VISION TRANSFORMING

Status: Non-Final OA (§103)
Filed: Jan 19, 2024
Examiner: DOTTIN, DARRYL V
Art Unit: 2683
Tech Center: 2600 — Communications
Assignee: unknown
OA Round: 1 (Non-Final)
Grant Probability: 79% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 1m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 79% (above average): 411 granted / 521 resolved, +16.9% vs TC avg
Interview Lift: +13.3% (moderate), measured across resolved cases with interview
Avg Prosecution: 2y 1m (fast prosecutor); 20 applications currently pending
Career History: 541 total applications across all art units

Statute-Specific Performance

§101: 7.4% (-32.6% vs TC avg)
§103: 49.5% (+9.5% vs TC avg)
§102: 29.1% (-10.9% vs TC avg)
§112: 12.7% (-27.3% vs TC avg)
Tech Center averages are estimates. Based on career data from 521 resolved cases.
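As a consistency check on the panel above: subtracting each reported delta from the examiner's per-statute rate implies the same Tech Center baseline for all four statutes. The panel does not state what quantity that baseline estimates, so this sketch only verifies the arithmetic:

```python
# Rates and deltas exactly as reported in the Statute-Specific Performance panel.
rates  = {"§101": 7.4,   "§103": 49.5, "§102": 29.1,  "§112": 12.7}
deltas = {"§101": -32.6, "§103": 9.5,  "§102": -10.9, "§112": -27.3}

# Each statute's implied Tech Center average = examiner rate minus reported delta.
implied_tc_avg = {s: round(rates[s] - deltas[s], 1) for s in rates}
```

Every entry of `implied_tc_avg` comes out to 40.0, so the four deltas are internally consistent with a single baseline estimate.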

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/10/2019 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Status of Claims
Claims 1-20 are pending in this application.

Oath/Declaration
The receipt of Oath/Declaration is acknowledged.

Drawings
The receipt of Drawings is acknowledged.

Allowable Subject Matter
6. Claims 4-6, 11-13 and 18-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Referring to Claim 4: The prior art searched, cited and/or of record fails to explicitly disclose or suggest the teachings of the computer-implemented method of claim 1, wherein, during the splitting, the first channel portion is positioned within a window at a first position in the channel and the second channel portion is outside of the window at the first position, the method further comprising: for each of the one or more channels of each of the set of tiles of the image: shifting the window to a second position within the channel to identify a third channel portion positioned within the window at the second position and a fourth channel portion outside of the window at the second position; processing the third channel portion using depthwise convolution; and processing the fourth channel portion with multi-head self-attention, wherein the combining includes combining the processed first channel portion, the processed second channel portion, the processed third channel portion, and the processed fourth channel portion.
Referring to Claims 5 and 6: Claims 5 and 6 are objected to based on their respective dependencies from objected-to Claim 4.

Referring to Claim 11: The prior art searched, cited and/or of record fails to explicitly disclose or suggest the teachings of the computer system of claim 8, wherein, during the splitting, the first channel portion is positioned within a window at a first position in the channel and the second channel portion is outside of the window at the first position, and wherein the computer-executable instructions, when executed by the one or more processors, cause the computing system to: for each of the one or more channels of each of the set of tiles of the image: shift the window to a second position within the channel to identify a third channel portion positioned within the window at the second position and a fourth channel portion outside of the window at the second position; process the third channel portion using depthwise convolution; and process the fourth channel portion with multi-head self-attention, wherein the processed first channel portion, the processed second channel portion, the processed third channel portion, and the processed fourth channel portion are combined for each of the one or more channels of each of the set of tiles of the image.

Referring to Claims 12 and 13: Claims 12 and 13 are objected to based on their respective dependencies from objected-to Claim 11.
Referring to Claim 18: The prior art searched, cited and/or of record fails to explicitly disclose or suggest the teachings of the non-transitory machine-readable medium of claim 15, wherein, during the splitting, the first channel portion is positioned within a window at a first position in the channel and the second channel portion is outside of the window at the first position, and wherein the computer-executable instructions, when executed by the one or more processors, cause the computing system to: for each of the one or more channels of each of the set of tiles of the image: shift the window to a second position within the channel to identify a third channel portion positioned within the window at the second position and a fourth channel portion outside of the window at the second position; process the third channel portion using depthwise convolution; and process the fourth channel portion with multi-head self-attention, wherein the processed first channel portion, the processed second channel portion, the processed third channel portion, and the processed fourth channel portion are combined for each of the one or more channels of each of the set of tiles of the image.

Referring to Claims 19 and 20: Claims 19 and 20 are objected to based on their respective dependencies from objected-to Claim 18.

Claim Rejections - 35 USC § 103
7. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

8. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

9. The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

10. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

11. Claims 1-3, 7-10 and 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Wu (US PG. Pub. 2023/0135109 A1) in view of Bulat (US PG. Pub. 2023/0298321 A1).

Referring to Claim 1, Wu teaches a computer-implemented method for vision transforming (See Wu, Fig. 2, Method 200, Sect. [0002], a method for processing a signal, an electronic device, and a computer-readable storage medium in the field of deep learning and computer vision technologies) comprising: for each of one or more channels of each of a set of tiles of an image (See Wu, Fig. 3, Sect.
[0038] lines 1-6, the input feature map 302 including sets of tiles in rows and columns of an image is divided into a first feature map 306 and a second feature map 304 that are independent of each other in a channel dimension.): splitting the channel into at least a first channel portion and a second channel portion (See Wu, Sect. [0038] and [0073], a feature map splitting module configured to split the input feature map into a first feature map and a second feature map that are independent of each other in a channel dimension.); processing the first channel portion using depthwise convolution (See Wu, Sect. [0087] lines 14-17, a depthwise convolution is used to dynamically generate the position codes from the input image, the position codes output by inputting first feature map 306 into the convolution.); and processing the second channel portion with multi-head self-attention (See Wu, Sect. [0059], performing multi-headed self-attention on the channel dimension of second feature map 304).

Wu fails to explicitly teach combining the processed first channel portion and the processed second channel portion; and identifying an object in the image at least partially based on the combined processed first channel portion and processed second channel portion for each of the one or more channels of each of the set of tiles of the image.

However, Bulat teaches combining the processed first channel portion and the processed second channel portion (See Bulat, Sect. [0077] lines 10-12, each channel in the output vector y.sub.s.sup.l is a linear combination of the corresponding channel of the value vectors, suggesting a channel-wise operation; thus, the plural channels are linearly combined), and identifying an object in the image at least partially based on the combined processed first channel portion and processed second channel portion for each of the one or more channels of each of the set of tiles of the image (See Bulat, Fig. 4, Step S100, Object Identification, Sect. [0085]: FIG. 4 displays a flowchart for performing image or video recognition using the ML model; S100 is a method step for receiving an image depicting at least one feature to be identified, the image comprising a plurality of channels, wherein the model identifies objects, actions, sequences of interest, and so on within an image or video (step S100)).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu to incorporate the teachings of Bulat to provide combining the processed first channel portion and the processed second channel portion; and identifying an object in the image at least partially based on the combined processed first channel portion and processed second channel portion for each of the one or more channels of each of the set of tiles of the image. Doing so would improve the efficiency of machine learning models for performing image or video recognition, as recognized by Bulat.

Referring to Claim 2, the combination of Wu in view of Bulat teaches the computer-implemented method of claim 1 (See Wu, Fig. 2, Method 200, Sect. [0002]). Wu fails to explicitly teach wherein the processing of the second channel portion includes reducing a size of tokens of the second channel portion. However, Bulat teaches this limitation (See Bulat, Fig. 1, Token Size Rescaling, Sect. [0080], the channel-wise rescaling and token mixing implemented by the AV on the left-hand side are replaced by local token mixing using the shift operator followed by channel-wise rescaling and bias correction. The Affine-shift block is shown on the right-hand side in FIG. 1. Here, the rescaling and channel mixing of the first channel-wise path on the left-hand side of Fig. 1 reduces the token size of the second channel-wise path on the right-hand side of Fig. 1).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu to incorporate the teachings of Bulat to provide wherein the processing of the second channel portion includes reducing a size of tokens of the second channel portion. Doing so would improve the efficiency of machine learning models for performing image or video recognition, as recognized by Bulat.

Referring to Claim 3, the combination of Wu in view of Bulat teaches the computer-implemented method of claim 1 (See Wu, Fig. 2, Method 200, Sect. [0002]). Wu fails to explicitly teach wherein the combining includes processing the processed first channel portion and the processed second channel portion with a multilayer perceptron. However, Bulat teaches this limitation (See Bulat, Sect. [0011], the approximation mechanism may comprise two separate modules to approximate the operations of the attention mechanism; one such module may be used for computing the channel-wise rescaling value using a multilayer perceptron module of the ML model). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu to incorporate the teachings of Bulat to provide wherein the combining includes processing the processed first channel portion and the processed second channel portion with a multilayer perceptron. Doing so would improve the efficiency of machine learning models for performing image or video recognition, as recognized by Bulat.
Referring to Claim 7, the combination of Wu in view of Bulat teaches the computer-implemented method of claim 1 (See Wu, Fig. 2, Method 200, Sect. [0002]). Wu fails to explicitly teach wherein the splitting, the processing of the first channel portion, the processing of the second channel portion, and the combining are performed for each of the one or more channels of each of the set of tiles of two or more resolutions of the image. However, Bulat teaches this limitation (See Bulat, Sect. [0063], [0083] and [0101], by using spatio-temporal factorization, low-resolution self-attention and a hierarchical pyramid architecture, restricting the self-attention computation to local windows, or a local approximation of time attention… the standard hierarchical (pyramidal) structure is followed for these attention-free transformers, where the resolution is dropped between stages, similar to a ResNet). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu to incorporate the teachings of Bulat to provide wherein the splitting, the processing of the first channel portion, the processing of the second channel portion, and the combining are performed for each of the one or more channels of each of the set of tiles of two or more resolutions of the image. Doing so would improve the efficiency of machine learning models for performing image or video recognition, as recognized by Bulat.

Referring to Claim 8, arguments analogous to claim 1 are applicable herein.
The functions of “A computer-implemented method for vision transforming” in claim 1 perform all of the operations of “A computing system for vision transforming” in claim 8. Thus, claim 8 is rejected for the reasons set forth in the rejection of claim 1.

Referring to Claim 9, arguments analogous to claim 2 are applicable herein; claim 9 is rejected for the reasons set forth in the rejection of claim 2.

Referring to Claim 10, arguments analogous to claim 3 are applicable herein; claim 10 is rejected for the reasons set forth in the rejection of claim 3.

Referring to Claim 14, arguments analogous to claim 7 are applicable herein; claim 14 is rejected for the reasons set forth in the rejection of claim 7.

Referring to Claim 15, arguments analogous to claim 1 are applicable herein. Thus, the non-transitory machine-readable medium of claim 15 is explicitly/inherently taught as evidenced by Wu (See Wu, Fig. 9, Computing Unit 901, Sect. [0096]: the computing unit 901 may be various general-purpose and/or dedicated processing components with processing and computing capabilities, including, but not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated AI computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller and microcontroller. The computing unit 901 executes the various methods and processes described above, such as processes 200, 300, 400 and 500. For example, in some embodiments, the processes 200, 300, 400 and 500 may be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed on the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the processes 200, 300, 400 and 500 described above may be executed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the processes 200, 300, 400 and 500 in any other suitable manner (for example, by means of firmware)) and the various memories stored therein.

Referring to Claim 16, arguments analogous to claim 2 are applicable herein; the non-transitory machine-readable medium of claim 16 is taught by the same passage of Wu cited for claim 15 (Fig. 9, Computing Unit 901, Sect. [0096]) and the various memories stored therein.

Referring to Claim 17, arguments analogous to claim 3 are applicable herein; the non-transitory machine-readable medium of claim 17 is taught by the same passage of Wu cited for claim 15 (Fig. 9, Computing Unit 901, Sect. [0096]) and the various memories stored therein.

Cited Art
12. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Maaz et al. (US PG. Pub. No. 2024/0193404 A1) discloses an edge computing system, computer-readable storage medium and method for object detection, including processing circuitry. The processing circuitry is configured with a hybrid CNN and vision transformer backbone network in an object detection deep learning network. The backbone network receives an image, and includes a first convolutional encoder to extract local features from feature maps of the image, a second stage having consecutive second convolutional encoders, a positional encoding layer, split depth-wise transpose attention (SDTA) encoders, consecutive convolutional encoders, a third stage, and a fourth-stage SDTA encoder. Each of the SDTA encoders performs multi-headed self-attention by applying a dot-product operation across channel dimensions in order to compute cross-covariance across channels to generate attention feature maps. The object detection neural network includes a convolutional network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARRYL V DOTTIN, whose telephone number is (571) 270-5471. The examiner can normally be reached M-F 9am-5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Abderrahim Merouan, can be reached at 571-270-5254. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/DARRYL V DOTTIN/
Primary Examiner, Art Unit 2683
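The claim-1 method mapped in the rejection above (split a tile's channels into two portions, run depthwise convolution on one portion and multi-head self-attention on the other, then combine the results) can be sketched as follows. This is an illustrative reading only, not the applicant's implementation: the channel-dimension split follows the examiner's mapping to Wu, the attention omits learned Q/K/V projections, and all function names (`depthwise_conv`, `multi_head_self_attention`, `mixing_block`) are hypothetical:

```python
import numpy as np

def depthwise_conv(x, kernel):
    """Per-channel 2D convolution: channel c of x is filtered only by kernel[c]."""
    C, H, W = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernel[c])
    return out

def multi_head_self_attention(tokens, num_heads):
    """Scaled dot-product self-attention per head, with identity Q/K/V projections."""
    N, D = tokens.shape
    d = D // num_heads
    heads = []
    for h in range(num_heads):
        qkv = tokens[:, h * d:(h + 1) * d]        # (N, d) slice for this head
        scores = qkv @ qkv.T / np.sqrt(d)         # (N, N) attention logits
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)  # row-wise softmax
        heads.append(attn @ qkv)
    return np.concatenate(heads, axis=-1)         # (N, D)

def mixing_block(tile, split, kernel, num_heads):
    """Split a tile's channels, convolve one portion, attend over the other, recombine."""
    conv_part, attn_part = tile[:split], tile[split:]
    conv_out = depthwise_conv(conv_part, kernel)
    C2, H, W = attn_part.shape
    tokens = attn_part.reshape(C2, H * W).T       # one token per spatial position
    attn_out = multi_head_self_attention(tokens, num_heads).T.reshape(C2, H, W)
    return np.concatenate([conv_out, attn_out], axis=0)  # combined feature map
```

The allowable claims 4-6 add a shifted-window pass (a second split at a different window position, processed the same way and folded into the combining step); in this sketch that would amount to a second `mixing_block` call over a different partition, which is not shown.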

Prosecution Timeline

Jan 19, 2024
Application Filed
Jan 30, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602618
ARTIFICIAL VISION PARAMETER LEARNING AND AUTOMATING METHOD FOR IMPROVING VISUAL PROSTHETIC SYSTEMS
2y 5m to grant; granted Apr 14, 2026

Patent 12602425
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
2y 5m to grant; granted Apr 14, 2026

Patent 12586181
FUNCTIONAL IMAGING FEATURES FROM COMPUTED TOMOGRAPHY IMAGES
2y 5m to grant; granted Mar 24, 2026

Patent 12586150
EFFICIENT BI-DIRECTIONAL IMAGE SCALING
2y 5m to grant; granted Mar 24, 2026

Patent 12585416
IMAGE PROCESSING APPARATUS, CONTROL METHOD OF IMAGE PROCESSING APPARATUS, AND STORAGE MEDIUM
2y 5m to grant; granted Mar 24, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 79%
With Interview: 92% (+13.3%)
Median Time to Grant: 2y 1m
PTA Risk: Low
Based on 521 resolved cases by this examiner. Grant probability derived from career allow rate.
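The projection figures follow directly from the career statistics reported above, and are consistent within rounding; a quick check:

```python
# Figures as reported: career grants/resolutions and the interview lift.
granted, resolved = 411, 521
interview_lift = 13.3  # percentage points, as reported

allow_rate = 100 * granted / resolved         # career allow rate, in percent (~78.9)
grant_probability = round(allow_rate)         # headline 79% figure
with_interview = round(allow_rate + interview_lift)  # 92% with-interview figure
```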
