Prosecution Insights
Last updated: April 19, 2026
Application No. 18/650,174

UNIFIED FRAMEWORK FOR VISION PROMPT TUNING

Non-Final OA (§101, §103)

Filed: Apr 30, 2024
Examiner: LIU, XIAO
Art Unit: 2664
Tech Center: 2600 — Communications
Assignee: NEC Laboratories America Inc.
OA Round: 1 (Non-Final)

Grant Probability: 89% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 9m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 89% (above average; 257 granted / 290 resolved; +26.6% vs TC avg)
Interview Lift: +11.5% (moderate) among resolved cases with interview
Avg Prosecution: 2y 9m (typical timeline); 44 applications currently pending
Total Applications: 334 across all art units (career history)

Statute-Specific Performance

§101: 8.8% (-31.2% vs TC avg)
§103: 50.9% (+10.9% vs TC avg)
§102: 17.0% (-23.0% vs TC avg)
§112: 17.4% (-22.6% vs TC avg)
Tech Center averages are estimates; based on career data from 290 resolved cases.

Office Action

Grounds of rejection: §101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 04/30/2024 has been considered by the examiner.

Claim Rejections - 35 U.S.C. § 101

35 U.S.C. § 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 15-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because "computer readable storage medium" may include transitory media. Claims 15-20 are also drawn to a computer program for carrying out the instructions and functionalities of the claimed invention, which is no more than a software computer program (i.e., software per se). Software per se is non-statutory because it cannot be interpreted to fall into any of the four patentable categories of process, machine, manufacture, or composition of matter. Applicant is advised to exclude transitory embodiments.

Claim Rejections - 35 U.S.C. § 103

The following is a quotation of 35 U.S.C. § 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 8-11, and 15-18 are rejected under 35 U.S.C. § 103 as being unpatentable over Jia et al. (arXiv:2203.12119v2, 20 July 2022), hereinafter Jia, in view of Hatamizadeh et al. (US 20230394781 A1), hereinafter Hatamizadeh.

Regarding claim 1, Jia discloses a computer-implemented method for dynamic prompt tuning in image processing, comprising (Abstract; FIGS. 1-21): decomposing a received image into segments sized to balance detail retention and computational efficiency for processing by an embedding algorithm designed for token generation (FIG. 2, left side; Page 4, Sec. 3.1; equation (1)); generating tokenized image data by transforming each of the decomposed segments into a sequence of tokens using an embedding process (FIG. 2, left side; Page 4, Sec. 3.1; equation (1)); dynamically computing parameters for inserting prompts into the sequence of tokens, including a position and length of the prompts, utilizing a one-layer neural network combined with a continuous relaxation of a discrete distribution for optimizing categorical decision-making (FIGS. 2(a)-2(b), 5-6, 8; Page 5, Sec. 3.2, 2nd paragraph, "inserted into the first Transformer layer L1 only", equations (4)-(6); Page 9, last paragraph, "Prompt Location"; Page 10, 2nd and last paragraphs, "Prompt Length … optimal …"; Page 5, 1st paragraph, "a predicted class probability distribution y"); creating soft prompts based on the dynamically computed parameters and integrating the soft prompts with the tokenized image data (FIGS. 1-2; equations (1)-(8); Page 5, Sec. 3.2); and processing the integrated image data and prompts using a pretrained vision model with a frozen backbone to enhance image feature recognition (FIGS. 1-2; Abstract, "a wide variety of downstream recognition tasks … achieves significant performance gains"; Page 2, "downstream recognition tasks … VPT beats all other transfer learning baselines …"; Page 6, last paragraph).

Jia does not disclose that the embedding process for token generation includes a convolutional neural network. In the same field of endeavor, Hatamizadeh teaches a method for a vision transformer that captures global context (Hatamizadeh: Abstract; FIGS. 1-11). Hatamizadeh further teaches an embedding process for token generation that includes a convolutional neural network (Hatamizadeh: FIGS. 3, 5A-5B; [0060], "the image via a global query token that represents an image embedding extracted with CNN-like module"). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Jia with the teaching of Hatamizadeh by using an embedding process for token generation that includes a convolutional neural network in order to capture long-range information via cross-region interaction (Hatamizadeh: [0060]).

Regarding claim 8, Jia discloses a system for dynamic prompt tuning in image processing, comprising a processor device and a memory storing instructions (one or more processors and memory must be used to implement Jia's FIGS. 1-2; Page 15, Sec. A, 1st paragraph, "GPUs") that, when executed by the processor device, cause the system to perform the operations recited in claim 1, which Jia discloses as mapped above (Abstract; FIGS. 1-21). Regarding claim 15, Jia discloses a computer program product for dynamic prompt tuning in image processing, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a hardware processor (one or more processors and memory must be used to implement Jia's FIGS. 1-2; Page 15, Sec. A, 1st paragraph, "GPUs") to cause the hardware processor to perform the operations recited in claim 1, again as mapped above (Abstract; FIGS. 1-21). For both claims, Jia does not disclose an embedding process for token generation that includes a convolutional neural network; Hatamizadeh supplies that feature (Hatamizadeh: FIGS. 3, 5A-5B; [0060]), and the combination rationale given for claim 1 applies equally.

Regarding claims 2, 9, and 16, Jia in view of Hatamizadeh teaches the method of claim 1, the system of claim 8, and the computer program product of claim 15.
The combination further teaches wherein the dynamic computation of the prompt parameters further includes adjusting a position of the soft prompts within the token sequence based on an analysis of the received image to optimize activation patterns within the pretrained vision model (Jia: Pages 9-10, subsection "Prompt Location"; FIG. 8).

Regarding claims 3, 10, and 17, Jia in view of Hatamizadeh teaches the method of claim 1, the system of claim 8, and the computer program product of claim 15. Jia does not disclose wherein the embedding process further comprises applying a feature scaling technique. In the same field of endeavor, Hatamizadeh teaches a method for a vision transformer that captures global context (Hatamizadeh: Abstract; FIGS. 1-11). Hatamizadeh further teaches wherein the embedding process further comprises applying a feature scaling technique (Hatamizadeh: FIG. 5A; [0058], "an image is split into a plurality of local windows … linear complexity scaling with image size"; [0071]; Table 4). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Jia with the teaching of Hatamizadeh by applying a feature scaling technique in order to normalize the image segments before tokenization, improve the consistency of the input data, and extract local short-range information (Hatamizadeh: [0058]).

Regarding claims 4, 11, and 18, Jia in view of Hatamizadeh teaches the method of claim 1, the system of claim 8, and the computer program product of claim 15. The combination further teaches wherein the soft prompts are variably integrated within different layers of the token sequence to test various hypotheses for optimal prompt placement regarding the image processing in real time during use (Jia: FIGS. 2(a), 7; Page 11, 1st paragraph).

Claims 5-6, 12-13, and 19 are rejected under 35 U.S.C. § 103 as being unpatentable over Jia in view of Hatamizadeh, and further in view of Lester et al. (US 20230325725 A1), hereinafter Lester.

Regarding claims 5 and 12, Jia in view of Hatamizadeh teaches the method of claim 1 and the system of claim 8. Jia in view of Hatamizadeh does not teach iteratively adjusting the soft prompt parameters based on a determined output accuracy of the vision model using a feedback loop to refine performance of the model on specific image recognition tasks. However, Lester is analogous art pertinent to the problem to be solved in this application and teaches a method for prompt tuning (Lester: Abstract; FIGS. 1-9). Lester further teaches iteratively adjusting the soft prompt parameters based on a determined output accuracy of the vision model using a feedback loop to refine performance of the model on specific image recognition tasks (Lester: FIG. 9; [0054], "To create a soft prompt … calculate a loss, and the error can be back-propagated …"; [0077], "tune a prompt 904 for a particular task … involve a plurality of iterations"; [0208]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh with the teaching of Lester by iteratively adjusting the soft prompt parameters in order to achieve better performance on specific image recognition tasks.

Regarding claims 6 and 13, Jia in view of Hatamizadeh, and further in view of Lester, teaches the method of claim 5 and the system of claim 12. Jia in view of Hatamizadeh further teaches enhancing an ability of the model to generalize across different image datasets (Jia: Page 6, subsection "Downstream Tasks", "two collections of datasets"; Page 7, 2nd paragraph, "FGVC datasets").
Jia in view of Hatamizadeh does not teach utilizing historical data from previous image processing tasks. However, Lester is analogous art pertinent to the problem to be solved in this application and teaches a method for prompt tuning (Lester: Abstract; FIGS. 1-9). Lester further teaches utilizing historical data from previous image processing tasks (Lester: [0029], "training dataset … a plurality of classifications"; [0053]; [0054], "the soft prompt can be extracting evidence about how to perform a task from the labeled dataset"; [0158]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh with the teaching of Lester by iteratively adjusting the soft prompt parameters and utilizing historical data from previous image processing tasks in order to achieve better performance on specific image recognition tasks.

Regarding claim 19, Jia in view of Hatamizadeh teaches the computer program product of claim 15 and further teaches enhancing an ability of the model to generalize across different image datasets (Jia: Page 6, subsection "Downstream Tasks", "two collections of datasets"; Page 7, 2nd paragraph, "FGVC datasets"). Jia in view of Hatamizadeh does not teach iteratively adjusting the soft prompt parameters based on a determined output accuracy of the vision model using a feedback loop to refine performance of the model on specific image recognition tasks, nor utilizing historical data from previous image processing tasks. However, Lester is analogous art pertinent to the problem to be solved in this application and teaches a method for prompt tuning (Lester: Abstract; FIGS. 1-9). Lester further teaches both features (Lester: FIG. 9; [0054], "To create a soft prompt … calculate a loss, and the error can be back-propagated …"; [0077], "tune a prompt 904 for a particular task … involve a plurality of iterations"; [0208]; [0029], "training dataset … a plurality of classifications"; [0053]; [0054], "the soft prompt can be extracting evidence about how to perform a task from the labeled dataset"; [0158]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh with the teaching of Lester by iteratively adjusting the soft prompt parameters and utilizing historical data from previous image processing tasks in order to achieve better performance on specific image recognition tasks.

Claim 7 is rejected under 35 U.S.C. § 103 as being unpatentable over Jia in view of Hatamizadeh, and further in view of Burlina et al. (US 20240259585 A1), hereinafter Burlina.

Regarding claim 7, Jia in view of Hatamizadeh teaches the method of claim 1. Jia in view of Hatamizadeh does not teach performing autonomous vehicle navigation utilizing the processed image data for real-time accurate image recognition for obstacle detection, decision making, and autonomous vehicle navigation control.
However, Burlina is analogous art pertinent to the problem to be solved in this application and teaches a method for encoding an image or video stream using a vision transformer (ViT) as an encoder-decoder machine learning (ML) model (Burlina: [0014]-[0015]) in the context of autonomous vehicles (AV) as well as surveillance (Burlina: FIGS. 1-6; [0061]). Burlina further teaches performing autonomous vehicle navigation utilizing the processed image data for real-time accurate image recognition for obstacle detection, decision making, and autonomous vehicle navigation control (Burlina: FIG. 6; [0011]; [0067]; [0093]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh with the teaching of Burlina by performing autonomous vehicle navigation in order to provide a real-world application of vision prompt tuning associated with a vision transformer to enhance the flexibility and performance of autonomous driving.

Claim 14 is rejected under 35 U.S.C. § 103 as being unpatentable over Jia in view of Hatamizadeh, further in view of Lester, and further in view of Ransinghe et al. (US 20240169692 A1), hereinafter Ransinghe.

Regarding claim 14, Jia in view of Hatamizadeh, and further in view of Lester, teaches the system of claim 13. That combination does not teach utilizing the tuned prompts for real-time variable object, person, and activity recognition in a security surveillance system to enhance recognition of the variable object, person, and activity in different environmental conditions. However, Ransinghe is analogous art pertinent to the problem to be solved in this application and teaches a method to train a vision transformer for human action recognition in a video (Ransinghe: FIGS. 1-8). Ransinghe further teaches utilizing a vision transformer for real-time variable object, person, and activity recognition in a surveillance system to enhance recognition of the variable object, person, and activity in different environmental conditions (Ransinghe: FIGS. 5A-5B, 6A-6B; [0003]; [0029]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh, and further in view of Lester, with the teaching of Ransinghe by utilizing the tuned prompts associated with the vision transformer for real-time variable object, person, and activity recognition in order to provide a real-world application for a security surveillance system with enhanced flexibility and performance.

Claim 20 is rejected under 35 U.S.C. § 103 as being unpatentable over Jia in view of Hatamizadeh, further in view of Lester, and further in view of Erol et al. (US 20250117920 A1), hereinafter Erol.

Regarding claim 20, Jia in view of Hatamizadeh, and further in view of Lester, teaches the computer program product of claim 19. That combination does not teach utilizing the tuned prompts for automated, real-time detection of manufacturing defects for quality control in a manufacturing facility. However, Erol is analogous art pertinent to the problem to be solved in this application and teaches a method for visual inspection of manufactured parts using a vision transformer (Erol: Abstract; FIGS. 1-6; [0040]).
Erol further teaches utilizing a vision transformer for automated, real-time detection of manufacturing defects for quality control in a manufacturing facility (Erol: FIG. 1; [0019]-[0022]; [0040]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Jia in view of Hatamizadeh, and further in view of Lester, with the teaching of Erol by utilizing the tuned prompts associated with the vision transformer for real-time detection of manufacturing defects for quality control in a manufacturing facility in order to provide a real-world application with enhanced flexibility and performance.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAO LIU, whose telephone number is (571) 272-4539. The examiner can normally be reached Monday through Thursday and alternate Fridays, 8:30-4:30.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Jennifer Mehmood, can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.

For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/XIAO LIU/
Primary Examiner, Art Unit 2664
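For readers mapping the §103 analysis to the underlying technique: the independent claims recite patch tokenization, a one-layer network with a continuous relaxation of a discrete distribution for choosing prompt insertion parameters, and soft prompts trained against a frozen backbone, with claims 5, 12, and 19 adding the feedback loop. The sketch below is illustrative only, not the applicant's or any cited reference's implementation; the Gumbel-Softmax relaxation, the toy dimensions, and the mean-pooled linear head are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 16 patch tokens of width 8, 4 soft prompts, 3 classes.
n_tokens, dim, n_prompts, n_classes = 16, 8, 4, 3

def gumbel_softmax(logits, tau=0.5):
    """Continuous relaxation of a categorical sample (Gumbel-Softmax)."""
    g = -np.log(-np.log(rng.uniform(1e-10, 1.0, logits.shape)))
    u = (logits + g) / tau
    z = np.exp(u - u.max())
    return z / z.sum()

# Stand-in for patch embedding: tokens for the decomposed image segments.
tokens = rng.standard_normal((n_tokens, dim))

# One-layer network scores each candidate insertion position for the prompts.
w_pos = rng.standard_normal((dim, n_tokens + 1)) * 0.1
pos_weights = gumbel_softmax(tokens.mean(axis=0) @ w_pos)
pos = int(pos_weights.argmax())  # hard choice at inference time

# Frozen-backbone stand-in: a fixed linear head over the mean-pooled sequence.
W_frozen = rng.standard_normal((n_classes, dim)) * 0.1
target = 1                             # hypothetical label for this image
prompts = np.zeros((n_prompts, dim))   # trainable soft prompts

def forward(prompts):
    seq = np.concatenate([tokens[:pos], prompts, tokens[pos:]])
    logits = W_frozen @ seq.mean(axis=0)
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Feedback loop (cf. claims 5/12/19): only the prompts are updated; the
# cross-entropy gradient w.r.t. each prompt row is W_frozen^T (p - y) / L.
seq_len = n_tokens + n_prompts
y = np.eye(n_classes)[target]
losses = []
for _ in range(300):
    p = forward(prompts)
    losses.append(float(-np.log(p[target])))
    prompts -= 1.0 * (W_frozen.T @ (p - y) / seq_len)
```

At real scale the frozen model would be a pretrained ViT and the update would come from backpropagation through it; the point of the claimed arrangement is that only the prompt parameters and the small position/length network receive gradients.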

Prosecution Timeline

Apr 30, 2024
Application Filed
Feb 20, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603972: WIRELESS TRANSMITTER IDENTIFICATION IN VISUAL SCENES (granted Apr 14, 2026; 2y 5m to grant)
Patent 12592069: OBJECT RECOGNITION METHOD AND APPARATUS, AND DEVICE AND MEDIUM (granted Mar 31, 2026; 2y 5m to grant)
Patent 12579834: Information Extraction Method and Apparatus for Text With Layout (granted Mar 17, 2026; 2y 5m to grant)
Patent 12576873: SYSTEM AND METHOD OF CAPTIONS FOR TRIGGERS (granted Mar 17, 2026; 2y 5m to grant)
Patent 12573175: TARGET TRACKING METHOD, TARGET TRACKING SYSTEM AND ELECTRONIC DEVICE (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 89%
With Interview: 99% (+11.5%)
Median Time to Grant: 2y 9m
PTA Risk: Low
Based on 290 resolved cases by this examiner. Grant probability is derived from the career allow rate.
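The headline projection figures follow from simple arithmetic on the stated counts. A quick check (the sub-100% display cap on the with-interview figure is an assumption about how the dashboard rounds, not a documented rule):

```python
# Recomputing the dashboard's headline figures from its stated counts.
granted, resolved = 257, 290

career_allow_rate = granted / resolved   # 0.886..., displayed as 89%

# Assumption: the with-interview figure adds the stated +11.5% lift and is
# capped at 99% for display; the dashboard shows 99%.
interview_lift = 0.115
with_interview = min(career_allow_rate + interview_lift, 0.99)
```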
