Prosecution Insights
Last updated: April 19, 2026
Application No. 18/650,174

UNIFIED FRAMEWORK FOR VISION PROMPT TUNING

Non-Final OA (§101, §103)

Filed: Apr 30, 2024
Examiner: LIU, XIAO
Art Unit: 2664
Tech Center: 2600 — Communications
Assignee: NEC Laboratories America Inc.
OA Round: 1 (Non-Final)

Grant Probability: 89% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 9m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 89% (above average; 257 granted / 290 resolved; +26.6% vs TC avg)
Interview Lift: +11.5% (moderate) among resolved cases with interview
Avg Prosecution: 2y 9m (typical timeline); 44 applications currently pending
Total Applications: 334 across all art units (career history)

Statute-Specific Performance

§101: 8.8% (-31.2% vs TC avg)
§103: 50.9% (+10.9% vs TC avg)
§102: 17.0% (-23.0% vs TC avg)
§112: 17.4% (-22.6% vs TC avg)
Tech Center averages are estimates; based on career data from 290 resolved cases.

Office Action

Grounds of rejection: §101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 04/30/2024 has been considered by the examiner.

Claim Rejections - 35 U.S.C. § 101

35 U.S.C. § 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 15-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because "computer readable storage medium" may include transitory media. Claims 15-20 are also drawn to a computer program for carrying out the instructions and functionalities of the claimed invention, which is no more than a software computer program (i.e., software per se). Software per se is non-statutory because it cannot be interpreted to fall into any of the four patentable categories of process, machine, manufacture, or composition of matter. Applicant is advised to exclude transitory embodiments.

Claim Rejections - 35 U.S.C. § 103

The following is a quotation of 35 U.S.C. § 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 8-11, and 15-18 are rejected under 35 U.S.C. § 103 as being unpatentable over Jia et al. (arXiv:2203.12119v2, 20 July 2022), hereinafter Jia, in view of Hatamizadeh et al. (US 20230394781 A1), hereinafter Hatamizadeh.

Regarding claim 1, Jia discloses a computer-implemented method for dynamic prompt tuning in image processing, comprising (Abstract; FIGS. 1-21): decomposing a received image into segments sized to balance detail retention and computational efficiency for processing by an embedding algorithm designed for token generation (FIG. 2, left side; Page 4, Sec. 3.1; equation (1)); generating tokenized image data by transforming each of the decomposed segments into a sequence of tokens using an embedding process (FIG. 2, left side; Page 4, Sec. 3.1; equation (1)); dynamically computing parameters for inserting prompts into the sequence of tokens, including a position and length of the prompts, utilizing a one-layer neural network combined with a continuous relaxation of a discrete distribution for optimizing categorical decision-making (FIGS. 2(a)-2(b), 5-6, 8; Page 5, Sec. 3.2, 2nd paragraph, "inserted into the first Transformer layer L1 only", equations (4)-(6); Page 9, last paragraph, "Prompt Location"; Page 10, 2nd and last paragraphs, "Prompt Length … optimal …"; Page 5, 1st paragraph, "a predicted class probability distribution y"); creating soft prompts based on the dynamically computed parameters and integrating the soft prompts with the tokenized image data (FIGS. 1-2; equations (1)-(8); Page 5, Sec. 3.2); and processing the integrated image data and prompts using a pretrained vision model with a frozen backbone to enhance image feature recognition (FIGS. 1-2; Abstract, "a wide variety of downstream recognition tasks … achieves significant performance gains"; Page 2, "downstream recognition tasks … VPT beats all other transfer learning baselines …"; Page 6, last paragraph).

Jia does not disclose that the embedding process for token generation includes a convolutional neural network. In the same field of endeavor, Hatamizadeh teaches a method for a vision transformer that captures global context (Hatamizadeh: Abstract; FIGS. 1-11). Hatamizadeh further teaches an embedding process for token generation that includes a convolutional neural network (Hatamizadeh: FIGS. 3, 5A-5B; [0060], "the image via a global query token that represents an image embedding extracted with CNN-like module"). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Jia with the teaching of Hatamizadeh by using an embedding process for token generation that includes a convolutional neural network in order to capture long-range information via cross-region interaction (Hatamizadeh: [0060]).

Regarding claim 8, Jia discloses a system for dynamic prompt tuning in image processing, comprising a processor device and a memory storing instructions (one or more processors and memory must be used to implement Jia's FIGS. 1-2; Page 15, Sec. A, 1st paragraph, "GPUs") that, when executed by the processor device, cause the system to perform the operations recited in claim 1, which Jia discloses as mapped above (Abstract; FIGS. 1-21). Regarding claim 15, Jia discloses a computer program product for dynamic prompt tuning in image processing, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a hardware processor (one or more processors and memory must be used to implement Jia's FIGS. 1-2; Page 15, Sec. A, 1st paragraph, "GPUs") to cause the hardware processor to perform the operations recited in claim 1, again as mapped above (Abstract; FIGS. 1-21). For both claims, Jia does not disclose an embedding process for token generation that includes a convolutional neural network; Hatamizadeh supplies that feature (Hatamizadeh: FIGS. 3, 5A-5B; [0060]), and the combination rationale given for claim 1 applies equally.

Regarding claims 2, 9, and 16, Jia in view of Hatamizadeh teaches the method of claim 1, the system of claim 8, and the computer program product of claim 15.
The combination further teaches wherein the dynamic computation of the prompt parameters further includes adjusting a position of the soft prompts within the token sequence based on an analysis of the received image to optimize activation patterns within the pretrained vision model (Jia: Pages 9-10, subsection "Prompt Location"; FIG. 8).

Regarding claims 3, 10, and 17, Jia in view of Hatamizadeh teaches the method of claim 1, the system of claim 8, and the computer program product of claim 15. Jia does not disclose wherein the embedding process further comprises applying a feature scaling technique. In the same field of endeavor, Hatamizadeh teaches a method for a vision transformer that captures global context (Hatamizadeh: Abstract; FIGS. 1-11). Hatamizadeh further teaches wherein the embedding process further comprises applying a feature scaling technique (Hatamizadeh: FIG. 5A; [0058], "an image is split into a plurality of local windows … linear complexity scaling with image size"; [0071]; Table 4). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Jia with the teaching of Hatamizadeh by applying a feature scaling technique in order to normalize the image segments before tokenization, improve the consistency of the input data, and extract local short-range information (Hatamizadeh: [0058]).

Regarding claims 4, 11, and 18, Jia in view of Hatamizadeh teaches the method of claim 1, the system of claim 8, and the computer program product of claim 15. The combination further teaches wherein the soft prompts are variably integrated within different layers of the token sequence to test various hypotheses for optimal prompt placement regarding the image processing in real time during use (Jia: FIGS. 2(a), 7; Page 11, 1st paragraph).

Claims 5-6, 12-13, and 19 are rejected under 35 U.S.C. § 103 as being unpatentable over Jia in view of Hatamizadeh, and further in view of Lester et al. (US 20230325725 A1), hereinafter Lester.

Regarding claims 5 and 12, Jia in view of Hatamizadeh teaches the method of claim 1 and the system of claim 8. Jia in view of Hatamizadeh does not teach iteratively adjusting the soft prompt parameters based on a determined output accuracy of the vision model using a feedback loop to refine performance of the model on specific image recognition tasks. However, Lester is analogous art pertinent to the problem to be solved in this application and teaches a method for prompt tuning (Lester: Abstract; FIGS. 1-9). Lester further teaches iteratively adjusting the soft prompt parameters based on a determined output accuracy of the vision model using a feedback loop to refine performance of the model on specific image recognition tasks (Lester: FIG. 9; [0054], "To create a soft prompt … calculate a loss, and the error can be back-propagated …"; [0077], "tune a prompt 904 for a particular task … involve a plurality of iterations"; [0208]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh with the teaching of Lester by iteratively adjusting the soft prompt parameters in order to achieve better performance on specific image recognition tasks.

Regarding claims 6 and 13, Jia in view of Hatamizadeh, and further in view of Lester, teaches the method of claim 5 and the system of claim 12. Jia in view of Hatamizadeh further teaches enhancing an ability of the model to generalize across different image datasets (Jia: Page 6, subsection "Downstream Tasks", "two collections of datasets"; Page 7, 2nd paragraph, "FGVC datasets").
Jia in view of Hatamizadeh does not teach utilizing historical data from previous image processing tasks. However, Lester is analogous art pertinent to the problem to be solved in this application and teaches a method for prompt tuning (Lester: Abstract; FIGS. 1-9). Lester further teaches utilizing historical data from previous image processing tasks (Lester: [0029], "training dataset … a plurality of classifications"; [0053]; [0054], "the soft prompt can be extracting evidence about how to perform a task from the labeled dataset"; [0158]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh with the teaching of Lester by iteratively adjusting the soft prompt parameters and utilizing historical data from previous image processing tasks in order to achieve better performance on specific image recognition tasks.

Regarding claim 19, Jia in view of Hatamizadeh teaches the computer program product of claim 15 and further teaches enhancing an ability of the model to generalize across different image datasets (Jia: Page 6, subsection "Downstream Tasks", "two collections of datasets"; Page 7, 2nd paragraph, "FGVC datasets"). Jia in view of Hatamizadeh does not teach iteratively adjusting the soft prompt parameters based on a determined output accuracy of the vision model using a feedback loop to refine performance of the model on specific image recognition tasks, nor utilizing historical data from previous image processing tasks. However, Lester is analogous art pertinent to the problem to be solved in this application and teaches a method for prompt tuning (Lester: Abstract; FIGS. 1-9). Lester further teaches both features (Lester: FIG. 9; [0054], "To create a soft prompt … calculate a loss, and the error can be back-propagated …"; [0077], "tune a prompt 904 for a particular task … involve a plurality of iterations"; [0208]; [0029], "training dataset … a plurality of classifications"; [0053]; [0054], "the soft prompt can be extracting evidence about how to perform a task from the labeled dataset"; [0158]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh with the teaching of Lester by iteratively adjusting the soft prompt parameters and utilizing historical data from previous image processing tasks in order to achieve better performance on specific image recognition tasks.

Claim 7 is rejected under 35 U.S.C. § 103 as being unpatentable over Jia in view of Hatamizadeh, and further in view of Burlina et al. (US 20240259585 A1), hereinafter Burlina.

Regarding claim 7, Jia in view of Hatamizadeh teaches the method of claim 1. Jia in view of Hatamizadeh does not teach performing autonomous vehicle navigation utilizing the processed image data for real-time accurate image recognition for obstacle detection, decision making, and autonomous vehicle navigation control.
However, Burlina is analogous art pertinent to the problem to be solved in this application and teaches a method for encoding an image or video stream using a vision transformer (ViT) as an encoder-decoder machine learning (ML) model (Burlina: [0014]-[0015]) in the context of autonomous vehicles (AV) as well as surveillance (Burlina: FIGS. 1-6; [0061]). Burlina further teaches performing autonomous vehicle navigation utilizing the processed image data for real-time accurate image recognition for obstacle detection, decision making, and autonomous vehicle navigation control (Burlina: FIG. 6; [0011]; [0067]; [0093]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh with the teaching of Burlina by performing autonomous vehicle navigation in order to provide a real-world application of vision prompt tuning associated with a vision transformer to enhance the flexibility and performance of autonomous driving.

Claim 14 is rejected under 35 U.S.C. § 103 as being unpatentable over Jia in view of Hatamizadeh, further in view of Lester, and further in view of Ransinghe et al. (US 20240169692 A1), hereinafter Ransinghe.

Regarding claim 14, Jia in view of Hatamizadeh, and further in view of Lester, teaches the system of claim 13. That combination does not teach utilizing the tuned prompts for real-time variable object, person, and activity recognition in a security surveillance system to enhance recognition of the variable object, person, and activity in different environmental conditions. However, Ransinghe is analogous art pertinent to the problem to be solved in this application and teaches a method to train a vision transformer for human action recognition in a video (Ransinghe: FIGS. 1-8). Ransinghe further teaches utilizing a vision transformer for real-time variable object, person, and activity recognition in a surveillance system to enhance recognition of the variable object, person, and activity in different environmental conditions (Ransinghe: FIGS. 5A-5B, 6A-6B; [0003]; [0029]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh, and further in view of Lester, with the teaching of Ransinghe by utilizing the tuned prompts associated with the vision transformer for real-time variable object, person, and activity recognition in order to provide a real-world application for a security surveillance system with enhanced flexibility and performance.

Claim 20 is rejected under 35 U.S.C. § 103 as being unpatentable over Jia in view of Hatamizadeh, further in view of Lester, and further in view of Erol et al. (US 20250117920 A1), hereinafter Erol.

Regarding claim 20, Jia in view of Hatamizadeh, and further in view of Lester, teaches the computer program product of claim 19. That combination does not teach utilizing the tuned prompts for automated, real-time detection of manufacturing defects for quality control in a manufacturing facility. However, Erol is analogous art pertinent to the problem to be solved in this application and teaches a method for visual inspection of manufactured parts using a vision transformer (Erol: Abstract; FIGS. 1-6; [0040]).
Erol further teaches utilizing a vision transformer for automated, real-time detection of manufacturing defects for quality control in a manufacturing facility (Erol: FIG. 1; [0019]-[0022]; [0040]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh, and further in view of Lester, with the teaching of Erol by utilizing the tuned prompts associated with the vision transformer for real-time detection of manufacturing defects for quality control in a manufacturing facility in order to provide a real-world application with enhanced flexibility and performance.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAO LIU, whose telephone number is (571) 272-4539. The examiner can normally be reached Monday through Thursday and alternate Fridays, 8:30-4:30.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Jennifer Mehmood, can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.

For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/XIAO LIU/
Primary Examiner, Art Unit 2664
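For readers mapping the §103 analysis to the underlying technique: the independent claims recite patch tokenization, a one-layer network with a continuous relaxation of a discrete distribution for choosing prompt insertion parameters, and soft prompts trained against a frozen backbone, with claims 5, 12, and 19 adding the feedback loop. The sketch below is illustrative only, not the applicant's or any cited reference's implementation; the Gumbel-Softmax relaxation, the toy dimensions, and the mean-pooled linear head are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 16 patch tokens of width 8, 4 soft prompts, 3 classes.
n_tokens, dim, n_prompts, n_classes = 16, 8, 4, 3

def gumbel_softmax(logits, tau=0.5):
    """Continuous relaxation of a categorical sample (Gumbel-Softmax)."""
    g = -np.log(-np.log(rng.uniform(1e-10, 1.0, logits.shape)))
    u = (logits + g) / tau
    z = np.exp(u - u.max())
    return z / z.sum()

# Stand-in for patch embedding: tokens for the decomposed image segments.
tokens = rng.standard_normal((n_tokens, dim))

# One-layer network scores each candidate insertion position for the prompts.
w_pos = rng.standard_normal((dim, n_tokens + 1)) * 0.1
pos_weights = gumbel_softmax(tokens.mean(axis=0) @ w_pos)
pos = int(pos_weights.argmax())  # hard choice at inference time

# Frozen-backbone stand-in: a fixed linear head over the mean-pooled sequence.
W_frozen = rng.standard_normal((n_classes, dim)) * 0.1
target = 1                             # hypothetical label for this image
prompts = np.zeros((n_prompts, dim))   # trainable soft prompts

def forward(prompts):
    seq = np.concatenate([tokens[:pos], prompts, tokens[pos:]])
    logits = W_frozen @ seq.mean(axis=0)
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Feedback loop (cf. claims 5/12/19): only the prompts are updated; the
# cross-entropy gradient w.r.t. each prompt row is W_frozen^T (p - y) / L.
seq_len = n_tokens + n_prompts
y = np.eye(n_classes)[target]
losses = []
for _ in range(300):
    p = forward(prompts)
    losses.append(float(-np.log(p[target])))
    prompts -= 1.0 * (W_frozen.T @ (p - y) / seq_len)
```

At real scale the frozen model would be a pretrained ViT and the update would come from backpropagation through it; the point of the claimed arrangement is that only the prompt parameters and the small position/length network receive gradients.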

Prosecution Timeline

Apr 30, 2024
Application Filed
Feb 20, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603972: WIRELESS TRANSMITTER IDENTIFICATION IN VISUAL SCENES (granted Apr 14, 2026; 2y 5m to grant)
Patent 12592069: OBJECT RECOGNITION METHOD AND APPARATUS, AND DEVICE AND MEDIUM (granted Mar 31, 2026; 2y 5m to grant)
Patent 12579834: Information Extraction Method and Apparatus for Text With Layout (granted Mar 17, 2026; 2y 5m to grant)
Patent 12576873: SYSTEM AND METHOD OF CAPTIONS FOR TRIGGERS (granted Mar 17, 2026; 2y 5m to grant)
Patent 12573175: TARGET TRACKING METHOD, TARGET TRACKING SYSTEM AND ELECTRONIC DEVICE (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 89%
With Interview: 99% (+11.5%)
Median Time to Grant: 2y 9m
PTA Risk: Low
Based on 290 resolved cases by this examiner. Grant probability is derived from the career allow rate.
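The headline projection figures follow from simple arithmetic on the stated counts. A quick check (the sub-100% display cap on the with-interview figure is an assumption about how the dashboard rounds, not a documented rule):

```python
# Recomputing the dashboard's headline figures from its stated counts.
granted, resolved = 257, 290

career_allow_rate = granted / resolved   # 0.886..., displayed as 89%

# Assumption: the with-interview figure adds the stated +11.5% lift and is
# capped at 99% for display; the dashboard shows 99%.
interview_lift = 0.115
with_interview = min(career_allow_rate + interview_lift, 0.99)
```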
