Prosecution Insights
Last updated: April 17, 2026
Application No. 18/417,929

COMPUTER-IMPLEMENTED METHODS, COMPUTING SYSTEMS, AND NON-TRANSITORY MACHINE-READABLE MEDIUMS FOR VISION TRANSFORMING

Status: Non-Final OA (§103)
Filed: Jan 19, 2024
Examiner: DOTTIN, DARRYL V
Art Unit: 2683
Tech Center: 2600 — Communications
Assignee: unknown
OA Round: 1 (Non-Final)
Grant Probability: 79% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 1m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 79% (above average): 411 granted / 521 resolved, +16.9% vs TC avg
Interview Lift: +13.3% (moderate), measured across resolved cases with interview
Avg Prosecution: 2y 1m (fast prosecutor); 20 applications currently pending
Career History: 541 total applications across all art units

Statute-Specific Performance

§101: 7.4% (-32.6% vs TC avg)
§103: 49.5% (+9.5% vs TC avg)
§102: 29.1% (-10.9% vs TC avg)
§112: 12.7% (-27.3% vs TC avg)
Tech Center averages are estimates. Based on career data from 521 resolved cases.
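As a consistency check on the panel above: subtracting each reported delta from the examiner's per-statute rate implies the same Tech Center baseline for all four statutes. The panel does not state what quantity that baseline estimates, so this sketch only verifies the arithmetic:

```python
# Rates and deltas exactly as reported in the Statute-Specific Performance panel.
rates  = {"§101": 7.4,   "§103": 49.5, "§102": 29.1,  "§112": 12.7}
deltas = {"§101": -32.6, "§103": 9.5,  "§102": -10.9, "§112": -27.3}

# Each statute's implied Tech Center average = examiner rate minus reported delta.
implied_tc_avg = {s: round(rates[s] - deltas[s], 1) for s in rates}
```

Every entry of `implied_tc_avg` comes out to 40.0, so the four deltas are internally consistent with a single baseline estimate.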

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/10/2019 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Status of Claims
Claims 1-20 are pending in this application.

Oath/Declaration
The receipt of Oath/Declaration is acknowledged.

Drawings
The receipt of Drawings is acknowledged.

Allowable Subject Matter
6. Claims 4-6, 11-13 and 18-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Referring to Claim 4: The prior art searched, cited and/or of record fails to explicitly disclose or suggest the teachings of the computer-implemented method of claim 1, wherein, during the splitting, the first channel portion is positioned within a window at a first position in the channel and the second channel portion is outside of the window at the first position, the method further comprising: for each of the one or more channels of each of the set of tiles of the image: shifting the window to a second position within the channel to identify a third channel portion positioned within the window at the second position and a fourth channel portion outside of the window at the second position; processing the third channel portion using depthwise convolution; and processing the fourth channel portion with multi-head self-attention, wherein the combining includes combining the processed first channel portion, the processed second channel portion, the processed third channel portion, and the processed fourth channel portion.
Referring to Claims 5 and 6: Claims 5 and 6 are objected to based on their respective dependencies from objected-to Claim 4.

Referring to Claim 11: The prior art searched, cited and/or of record fails to explicitly disclose or suggest the teachings of the computer system of claim 8, wherein, during the splitting, the first channel portion is positioned within a window at a first position in the channel and the second channel portion is outside of the window at the first position, and wherein the computer-executable instructions, when executed by the one or more processors, cause the computing system to: for each of the one or more channels of each of the set of tiles of the image: shift the window to a second position within the channel to identify a third channel portion positioned within the window at the second position and a fourth channel portion outside of the window at the second position; process the third channel portion using depthwise convolution; and process the fourth channel portion with multi-head self-attention, wherein the processed first channel portion, the processed second channel portion, the processed third channel portion, and the processed fourth channel portion are combined for each of the one or more channels of each of the set of tiles of the image.

Referring to Claims 12 and 13: Claims 12 and 13 are objected to based on their respective dependencies from objected-to Claim 11.
Referring to Claim 18: The prior art searched, cited and/or of record fails to explicitly disclose or suggest the teachings of the non-transitory machine-readable medium of claim 15, wherein, during the splitting, the first channel portion is positioned within a window at a first position in the channel and the second channel portion is outside of the window at the first position, and wherein the computer-executable instructions, when executed by the one or more processors, cause the computing system to: for each of the one or more channels of each of the set of tiles of the image: shift the window to a second position within the channel to identify a third channel portion positioned within the window at the second position and a fourth channel portion outside of the window at the second position; process the third channel portion using depthwise convolution; and process the fourth channel portion with multi-head self-attention, wherein the processed first channel portion, the processed second channel portion, the processed third channel portion, and the processed fourth channel portion are combined for each of the one or more channels of each of the set of tiles of the image.

Referring to Claims 19 and 20: Claims 19 and 20 are objected to based on their respective dependencies from objected-to Claim 18.

Claim Rejections - 35 USC § 103
7. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

8. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

9. The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

10. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

11. Claims 1-3, 7-10 and 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Wu (US PG. Pub. 2023/0135109 A1) in view of Bulat (US PG. Pub. 2023/0298321 A1).

Referring to Claim 1, Wu teaches a computer-implemented method for vision transforming (See Wu, Fig. 2, Method 200, Sect. [0002], a method for processing a signal, an electronic device, and a computer-readable storage medium in the field of deep learning and computer vision technologies) comprising: for each of one or more channels of each of a set of tiles of an image (See Wu, Fig. 3, Sect.
[0038] lines 1-6, the input feature map 302 including sets of tiles in rows and columns of an image is divided into a first feature map 306 and a second feature map 304 that are independent of each other in a channel dimension.): splitting the channel into at least a first channel portion and a second channel portion (See Wu, Sect. [0038] and [0073], a feature map splitting module configured to split the input feature map into a first feature map and a second feature map that are independent of each other in a channel dimension.); processing the first channel portion using depthwise convolution (See Wu, Sect. [0087] lines 14-17, a depthwise convolution is used to dynamically generate the position codes from the input image, the position codes output by inputting first feature map 306 into the convolution.); and processing the second channel portion with multi-head self-attention (See Wu, Sect. [0059], performing multi-headed self-attention on the channel dimension of second feature map 304).

Wu fails to explicitly teach combining the processed first channel portion and the processed second channel portion; and identifying an object in the image at least partially based on the combined processed first channel portion and processed second channel portion for each of the one or more channels of each of the set of tiles of the image.

However, Bulat teaches combining the processed first channel portion and the processed second channel portion (See Bulat, Sect. [0077] lines 10-12, each channel in the output vector y.sub.s.sup.l is a linear combination of the corresponding channel of the value vectors, suggesting a channel-wise operation; thus, the plural channels are linearly combined), and identifying an object in the image at least partially based on the combined processed first channel portion and processed second channel portion for each of the one or more channels of each of the set of tiles of the image (See Bulat, Fig. 4, Step S100, Object Identification, Sect. [0085]: FIG. 4 displays a flowchart for performing image or video recognition using the ML model; S100 is a method step for receiving an image depicting at least one feature to be identified, the image comprising a plurality of channels, wherein the model identifies objects, actions, sequences of interest, and so on within an image or video (step S100)).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu to incorporate the teachings of Bulat to provide combining the processed first channel portion and the processed second channel portion; and identifying an object in the image at least partially based on the combined processed first channel portion and processed second channel portion for each of the one or more channels of each of the set of tiles of the image. Doing so would improve the efficiency of machine learning models for performing image or video recognition, as recognized by Bulat.

Referring to Claim 2, the combination of Wu in view of Bulat teaches the computer-implemented method of claim 1 (See Wu, Fig. 2, Method 200, Sect. [0002]). Wu fails to explicitly teach wherein the processing of the second channel portion includes reducing a size of tokens of the second channel portion. However, Bulat teaches this limitation (See Bulat, Fig. 1, Token Size Rescaling, Sect. [0080], the channel-wise rescaling and token mixing implemented by the AV on the left-hand side are replaced by local token mixing using the shift operator followed by channel-wise rescaling and bias correction. The Affine-shift block is shown on the right-hand side in FIG. 1. Here, the rescaling and channel mixing of the first channel-wise path on the left-hand side of Fig. 1 reduces the token size of the second channel-wise path on the right-hand side of Fig. 1).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu to incorporate the teachings of Bulat to provide wherein the processing of the second channel portion includes reducing a size of tokens of the second channel portion. Doing so would improve the efficiency of machine learning models for performing image or video recognition, as recognized by Bulat.

Referring to Claim 3, the combination of Wu in view of Bulat teaches the computer-implemented method of claim 1 (See Wu, Fig. 2, Method 200, Sect. [0002]). Wu fails to explicitly teach wherein the combining includes processing the processed first channel portion and the processed second channel portion with a multilayer perceptron. However, Bulat teaches this limitation (See Bulat, Sect. [0011], the approximation mechanism may comprise two separate modules to approximate the operations of the attention mechanism; one such module may be used for computing the channel-wise rescaling value using a multilayer perceptron module of the ML model). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu to incorporate the teachings of Bulat to provide wherein the combining includes processing the processed first channel portion and the processed second channel portion with a multilayer perceptron. Doing so would improve the efficiency of machine learning models for performing image or video recognition, as recognized by Bulat.
Referring to Claim 7, the combination of Wu in view of Bulat teaches the computer-implemented method of claim 1 (See Wu, Fig. 2, Method 200, Sect. [0002]). Wu fails to explicitly teach wherein the splitting, the processing of the first channel portion, the processing of the second channel portion, and the combining are performed for each of the one or more channels of each of the set of tiles of two or more resolutions of the image. However, Bulat teaches this limitation (See Bulat, Sect. [0063], [0083] and [0101], by using spatio-temporal factorization, low-resolution self-attention and a hierarchical pyramid architecture, restricting the self-attention computation to local windows, or a local approximation of time attention… the standard hierarchical (pyramidal) structure is followed for these attention-free transformers, where the resolution is dropped between stages, similar to a ResNet). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu to incorporate the teachings of Bulat to provide wherein the splitting, the processing of the first channel portion, the processing of the second channel portion, and the combining are performed for each of the one or more channels of each of the set of tiles of two or more resolutions of the image. Doing so would improve the efficiency of machine learning models for performing image or video recognition, as recognized by Bulat.

Referring to Claim 8, arguments analogous to claim 1 are applicable herein.
The functions of “A computer-implemented method for vision transforming” in claim 1 perform all of the operations of “A computing system for vision transforming” in claim 8. Thus, claim 8 is rejected for the reasons set forth in the rejection of claim 1.

Referring to Claim 9, arguments analogous to claim 2 are applicable herein; claim 9 is rejected for the reasons set forth in the rejection of claim 2.

Referring to Claim 10, arguments analogous to claim 3 are applicable herein; claim 10 is rejected for the reasons set forth in the rejection of claim 3.

Referring to Claim 14, arguments analogous to claim 7 are applicable herein; claim 14 is rejected for the reasons set forth in the rejection of claim 7.

Referring to Claim 15, arguments analogous to claim 1 are applicable herein. Thus, the non-transitory machine-readable medium of claim 15 is explicitly/inherently taught as evidenced by Wu (See Wu, Fig. 9, Computing Unit 901, Sect. [0096]: the computing unit 901 may be various general-purpose and/or dedicated processing components with processing and computing capabilities, including, but not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated AI computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller and microcontroller. The computing unit 901 executes the various methods and processes described above, such as processes 200, 300, 400 and 500. For example, in some embodiments, the processes 200, 300, 400 and 500 may be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed on the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the processes 200, 300, 400 and 500 described above may be executed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the processes 200, 300, 400 and 500 in any other suitable manner (for example, by means of firmware)) and the various memories stored therein.

Referring to Claim 16, arguments analogous to claim 2 are applicable herein; the non-transitory machine-readable medium of claim 16 is taught by the same passage of Wu cited for claim 15 (Fig. 9, Computing Unit 901, Sect. [0096]) and the various memories stored therein.

Referring to Claim 17, arguments analogous to claim 3 are applicable herein; the non-transitory machine-readable medium of claim 17 is taught by the same passage of Wu cited for claim 15 (Fig. 9, Computing Unit 901, Sect. [0096]) and the various memories stored therein.

Cited Art
12. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Maaz et al. (US PG. Pub. No. 2024/0193404 A1) discloses an edge computing system, computer-readable storage medium and method for object detection, including processing circuitry. The processing circuitry is configured with a hybrid CNN and vision transformer backbone network in an object detection deep learning network. The backbone network receives an image, and includes a first convolutional encoder to extract local features from feature maps of the image, a second stage having consecutive second convolutional encoders, a positional encoding layer, split depth-wise transpose attention (SDTA) encoders, consecutive convolutional encoders, a third stage, and a fourth-stage SDTA encoder. Each of the SDTA encoders performs multi-headed self-attention by applying a dot-product operation across channel dimensions in order to compute cross-covariance across channels to generate attention feature maps. The object detection neural network includes a convolutional network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARRYL V DOTTIN, whose telephone number is (571) 270-5471. The examiner can normally be reached M-F 9am-5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Abderrahim Merouan, can be reached at 571-270-5254. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/DARRYL V DOTTIN/
Primary Examiner, Art Unit 2683
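The claim-1 method mapped in the rejection above (split a tile's channels into two portions, run depthwise convolution on one portion and multi-head self-attention on the other, then combine the results) can be sketched as follows. This is an illustrative reading only, not the applicant's implementation: the channel-dimension split follows the examiner's mapping to Wu, the attention omits learned Q/K/V projections, and all function names (`depthwise_conv`, `multi_head_self_attention`, `mixing_block`) are hypothetical:

```python
import numpy as np

def depthwise_conv(x, kernel):
    """Per-channel 2D convolution: channel c of x is filtered only by kernel[c]."""
    C, H, W = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernel[c])
    return out

def multi_head_self_attention(tokens, num_heads):
    """Scaled dot-product self-attention per head, with identity Q/K/V projections."""
    N, D = tokens.shape
    d = D // num_heads
    heads = []
    for h in range(num_heads):
        qkv = tokens[:, h * d:(h + 1) * d]        # (N, d) slice for this head
        scores = qkv @ qkv.T / np.sqrt(d)         # (N, N) attention logits
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)  # row-wise softmax
        heads.append(attn @ qkv)
    return np.concatenate(heads, axis=-1)         # (N, D)

def mixing_block(tile, split, kernel, num_heads):
    """Split a tile's channels, convolve one portion, attend over the other, recombine."""
    conv_part, attn_part = tile[:split], tile[split:]
    conv_out = depthwise_conv(conv_part, kernel)
    C2, H, W = attn_part.shape
    tokens = attn_part.reshape(C2, H * W).T       # one token per spatial position
    attn_out = multi_head_self_attention(tokens, num_heads).T.reshape(C2, H, W)
    return np.concatenate([conv_out, attn_out], axis=0)  # combined feature map
```

The allowable claims 4-6 add a shifted-window pass (a second split at a different window position, processed the same way and folded into the combining step); in this sketch that would amount to a second `mixing_block` call over a different partition, which is not shown.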

Prosecution Timeline

Jan 19, 2024
Application Filed
Jan 30, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602618
ARTIFICIAL VISION PARAMETER LEARNING AND AUTOMATING METHOD FOR IMPROVING VISUAL PROSTHETIC SYSTEMS
2y 5m to grant; granted Apr 14, 2026

Patent 12602425
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
2y 5m to grant; granted Apr 14, 2026

Patent 12586181
FUNCTIONAL IMAGING FEATURES FROM COMPUTED TOMOGRAPHY IMAGES
2y 5m to grant; granted Mar 24, 2026

Patent 12586150
EFFICIENT BI-DIRECTIONAL IMAGE SCALING
2y 5m to grant; granted Mar 24, 2026

Patent 12585416
IMAGE PROCESSING APPARATUS, CONTROL METHOD OF IMAGE PROCESSING APPARATUS, AND STORAGE MEDIUM
2y 5m to grant; granted Mar 24, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 79%
With Interview: 92% (+13.3%)
Median Time to Grant: 2y 1m
PTA Risk: Low
Based on 521 resolved cases by this examiner. Grant probability derived from career allow rate.
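The projection figures follow directly from the career statistics reported above, and are consistent within rounding; a quick check:

```python
# Figures as reported: career grants/resolutions and the interview lift.
granted, resolved = 411, 521
interview_lift = 13.3  # percentage points, as reported

allow_rate = 100 * granted / resolved         # career allow rate, in percent (~78.9)
grant_probability = round(allow_rate)         # headline 79% figure
with_interview = round(allow_rate + interview_lift)  # 92% with-interview figure
```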
