Prosecution Insights
Last updated: April 19, 2026
Application No. 18/243,348

NEURAL NETWORK PROMPT TUNING

Non-Final OA (§101, §103, §112)

Filed: Sep 07, 2023
Examiner: CHUANG, SU-TING
Art Unit: 2146
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nvidia Corporation
OA Round: 1 (Non-Final)

Grant Probability: 52% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 4y 5m
Grant Probability With Interview: 91%

Examiner Intelligence

Grants 52% of resolved cases.

Career Allow Rate: 52% (52 granted / 101 resolved; -3.5% vs TC avg)
Interview Lift: +39.7% (strong; allow rate with vs. without an interview, among resolved cases with an interview)
Avg Prosecution: 4y 5m typical timeline; 28 applications currently pending
Total Applications: 129 across all art units

Statute-Specific Performance

§101: 27.4% (-12.6% vs TC avg)
§103: 46.3% (+6.3% vs TC avg)
§102: 10.8% (-29.2% vs TC avg)
§112: 11.7% (-28.3% vs TC avg)

Tech Center averages are estimates. Based on career data from 101 resolved cases.

Office Action

Rejections: §101, §103, §112
DETAILED ACTION

Claims 1-20 are pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 11/30/2023 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.

Claims 1, 8 and 15 recite the limitation "a most consistent output of one or more pre-trained neural networks," which comprises relative terms and therefore renders the claim indefinite. The term "a most consistent output" is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The specification does not provide examples or teachings of usage within the context of "a most consistent output." See MPEP 2173.05(b). Claims 2-7, 9-14 and 16-20 are also rejected due to their dependency on a rejected claim.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: Claims 1-7 recite a processor. Claims 8-14 recite a method. Claims 15-20 recite a computer system. Therefore, claims 1-7 are directed to a machine, claims 8-14 are directed to a process, and claims 15-20 are directed to a machine.

With respect to claims 1, 8 and 15:

2A Prong 1: The claim recites a judicial exception: "cause a most consistent output of one or more pre-trained neural networks to be selected based, at least in part, on a plurality of variances of one or more inputs to the one or more neural networks" (mental process - evaluation or judgment; a human can manually select the most consistent output of a pre-trained model based on input).

2A Prong 2: The judicial exception is not integrated into a practical application.
(Claim 15) "one or more processors and memory storing executable instructions that, if performed by the one or more processors" (mere instructions to apply an exception; the claim invokes computers merely as a tool, MPEP 2106.05(f); generic computer components). Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. (Claim 15) "one or more processors and memory storing executable instructions that, if performed by the one or more processors" (mere instructions to apply an exception, MPEP 2106.05(f); generic computer components). Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claims 2 and 16:
2A Prong 1: The claim recites a judicial exception: "wherein the one or more inputs to the one or more neural networks comprise one or more images" (mental process - evaluation or judgment; claim 1 recites an output of pre-trained neural networks to be selected based on inputs, which is an abstract idea, and specifying the details of the input does not change the scope of the claim).

With respect to claims 3 and 17:
2A Prong 1: The claim recites a judicial exception: "wherein the one or more inputs to the one or more neural networks comprise one or more text prompts" (mental process - evaluation or judgment; claim 1 recites an output of pre-trained neural networks to be selected based on inputs, which is an abstract idea, and specifying the details of the inputs does not change the scope of the claim).

With respect to claims 4 and 18:
2A Prong 1: The claim recites a judicial exception: "wherein the one or more neural networks include a pre-trained vision language model" (mental process - evaluation or judgment; claim 1 recites an output of pre-trained neural networks to be selected based on inputs, which is an abstract idea, and specifying the details of the neural networks or the model does not change the scope of the claim).

With respect to claims 5 and 19:
2A Prong 1: The claim recites a judicial exception: "wherein the plurality of variances of the one or more inputs to the one or more neural networks are based, at least in part, on one or more randomly augmented views of one or more images" (mental process - evaluation or judgment; claim 1 recites an output of pre-trained neural networks to be selected based on inputs, which is an abstract idea, and specifying the details of the inputs does not change the scope of the claim).

With respect to claim 6:
2A Prong 1: The claim recites a judicial exception: "wherein a prompt to the one or more neural networks is tuned during inferencing" (mental process - evaluation or judgment; under BRI, a human can manually tune or adjust a prompt).

With respect to claim 7:
2A Prong 1: The claim recites a judicial exception:
"wherein a prompt to the one or more neural networks is tuned based, at least in part, on classifying the plurality of variances of the one or more inputs to the one or more neural networks based, at least in part, on removing one or more of the variances from the plurality of variances and computing an average of the plurality of variances" (mental process - evaluation or judgment; under BRI, a human can manually tune or adjust a prompt based on classifying the inputs to the neural networks based on removing the variances and computing an average of the variances).

With respect to claim 9:
2A Prong 1: The claim recites a judicial exception: "wherein the one or more inputs to the one or more neural networks comprise a single image" (mental process - evaluation or judgment; claim 8 recites an output of pre-trained neural networks to be selected based on inputs, which is an abstract idea, and specifying the details of the inputs does not change the scope of the claim).

With respect to claim 10:
2A Prong 1: The claim recites a judicial exception: "wherein the one or more inputs to the one or more neural networks comprise one or more text prompts based, at least in part, on content of a single image" (mental process - evaluation or judgment; claim 8 recites an output of pre-trained neural networks to be selected based on inputs, which is an abstract idea, and specifying the details of the inputs does not change the scope of the claim).

With respect to claim 11:
2A Prong 1: The claim recites a judicial exception: "wherein the one or more neural networks include a vision language model" (mental process - evaluation or judgment; claim 8 recites an output of pre-trained neural networks to be selected based on inputs, which is an abstract idea, and specifying the details of the neural networks or the model does not change the scope of the claim).

With respect to claim 12:
2A Prong 1: The claim recites a judicial exception: "further comprising: generating multiple randomly augmented views of the one or more inputs to the one or more neural networks" (mental process - evaluation or judgment; a human can manually generate augmented inputs).

With respect to claim 13:
2A Prong 1: The claim recites a judicial exception: "further comprising: generating one or more confidence metrics of the plurality of variances of the one or more inputs to the one or more neural networks" (mental process - evaluation or judgment; a human can manually generate confidence metrics of the inputs).

With respect to claim 14:
2A Prong 1: The claim recites a judicial exception: "further comprising: classifying one or more multiple randomly augmented views of the one or more inputs to the one or more neural networks based, at least in part, on an average value of confidence metrics of the plurality of variances of the one or more inputs to the one or more neural networks" (mental process - evaluation or judgment; a human can manually classify inputs based on an average value of confidence metrics of the inputs).

With respect to claim 20:
2A Prong 1: The claim recites a judicial exception: "wherein a prompt to the one or more neural networks is tuned during inferencing based, at least in part, on minimizing entropy of the plurality of variances of the one or more inputs to the one or more neural networks" (mental process - evaluation or judgment; under BRI, a human can manually tune or adjust a prompt based on minimizing entropy of the inputs).

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6 and 8-19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou ("Learning to Prompt for Vision-Language Models," 20220812) in view of Zhang ("PointCLIP: Point Cloud Understanding by CLIP," 20211204).

In regard to claims 1, 8 and 15, Zhou teaches:

"A processor comprising: one or more circuits to" (Zhou, p. 3, 1 Introduction: "To automate prompt engineering specifically for pre-trained vision-language models, we propose a simple approach based on continuous prompt learning and provide two implementations ... We open-source our project at https://github.com/KaiyangZhou/CoOp."; the source code and the prompt learning implementation inherently teach all the computer components).

"cause a most consistent output of one or more pre-trained neural networks to be selected based, at least in part, on a plurality of ... one or more inputs to the one or more neural networks" (Zhou, p. 4, Fig. 2: "Overview of Context Optimization (CoOp). The main idea is to model a prompt's context using a set of learnable vectors, which can be optimized through minimizing the classification loss."; p. 5, 3.2 Context Optimization: "the prompt given to the text encoder g(·) is designed with the following form, t = [V]1[V]2...[V]M[CLASS] (2) ... Other than placing the class token at the end of a sequence as in Equation (2), we can also put it in the middle like t = [V]1...[V]M/2[CLASS][V]M/2+1...[V]M, (4) which increases flexibility for learning"; also see Fig. 2, where the class 'airplane' is marked in red and 'maximize the score for the ground-truth class' means maximizing the similarity score for the class 'airplane'; during test time (i.e., inference time), the updated prompt vector (i.e., the 'refined' soft prompt) is the output of the optimization step [output of pre-trained neural networks], and the optimization process minimizes the classification loss (i.e., maximizes the similarity score of the text and image feature vectors); therefore the most consistent 'refined' prompt is determined/selected based on text and image inputs [cause a most consistent output to be selected based on inputs to the neural networks]).

Zhou does not teach, but Zhang teaches:

"variances of one or more inputs" (Zhang, p. 4, Figure 2: "... PointCLIP projects the point cloud onto multi-view depth maps [variances of inputs], and conducts 3D recognition via CLIP pre-trained in 2D."; see Fig. 3, M views [variances of inputs]).

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhou to incorporate the teachings of Zhang by including 2D multi-view images projected from 3D. Doing so would be effective for 3D point cloud understanding via CLIP (Zhang, p. 1, Abstract: "In this paper, we identify such a setting is feasible by proposing PointCLIP, which conducts alignment between CLIP encoded point cloud and 3D category texts. Specifically, we encode a point cloud by projecting it into multi-view depth maps ... Therefore, PointCLIP is a promising alternative for effective 3D point cloud understanding via CLIP under low resource cost and data regime.").

Claims 8 and 15 recite substantially the same limitation as claim 1; therefore the rejection applied to claim 1 also applies to claims 8 and 15. In addition, Zhou teaches:

"A computer system comprising: one or more processors and memory storing executable instructions that, if performed by the one or more processors" (Zhou, p. 3, 1 Introduction: "To automate prompt engineering specifically for pre-trained vision-language models, we propose a simple approach based on continuous prompt learning and provide two implementations ... We open-source our project at https://github.com/KaiyangZhou/CoOp."; the source code and the prompt learning implementation inherently teach all the computer components).

In regard to claims 2 and 16, Zhou teaches:

"wherein the one or more inputs to the one or more neural networks comprise one or more images" (Zhou, p. 4, Zero-Shot Inference: "let f be image features extracted by the image encoder for an image x [image prompt]"; also see Fig. 2, where an image of an airplane is provided to the image encoder).

In regard to claims 3 and 17, Zhou teaches:

"wherein the one or more inputs to the one or more neural networks comprise one or more text prompts" (Zhou, p. 5, 3.2 Context Optimization: "the prompt [text prompt] given to the text encoder g(·) is designed with the following form, t = [V]1[V]2...[V]M[CLASS] (2) ... we can also put it in the middle like t = [V]1...[V]M/2[CLASS][V]M/2+1...[V]M, (4) which increases flexibility for learning"; also see Fig. 2, where the text prompt (learnable context + [CLASS]) is provided to the text encoder).

In regard to claims 4 and 18, Zhou teaches:

"wherein the one or more neural networks include a pre-trained vision language model" (Zhou, p. 3, 1 Introduction: "To automate prompt engineering specifically for pre-trained vision-language models, we propose a simple approach based on continuous prompt learning ..."; p. 4, 3.1 Vision-Language Pre-training: "We briefly introduce vision-language pre-training with a particular focus on CLIP [a pre-trained vision language model] ... CLIP consists of two encoders, one for images and the other for text.").

In regard to claims 5 and 19, Zhou does not teach, but Zhang teaches:

"wherein the plurality of variances of the one or more inputs to the one or more neural networks are based, at least in part, on one or more randomly augmented views of one or more images" (Zhang, p. 12, B. Implementation Details: "In ModelNet10 and ModelNet40, we apply random scaling and translation for training augmentation [randomly augmented views], but in the challenging ScanObjectNN, we append jitter and random rotation following [43]. During training, we freeze CLIP's both visual and textual encoders, and only fine-tune the inter-view adapter."). The rationale for combining the teachings of Zhou and Zhang is the same as set forth in the rejection of claim 1.

In regard to claim 6, Zhou teaches:

"wherein a prompt to the one or more neural networks is tuned during inferencing" (Zhou, p. 5, 3.2 Context Optimization: "We propose Context Optimization (CoOp), which avoids manual prompt tuning by modeling context words with continuous vectors that are end-to-end learned [a prompt is tuned] from data while the massive pre-trained parameters are frozen [not during training, but during inferencing]"; also see Fig. 2, the text prompt (learnable context + [CLASS]) [learnable, tunable]).

In regard to claim 9, Zhou teaches:

"wherein the one or more inputs to the one or more neural networks comprise a single image" (Zhou, p. 4, Zero-Shot Inference: "let f be image features extracted by the image encoder for an image x [image prompt]"; also see Fig. 2, where an image of an airplane is provided to the image encoder).

In regard to claim 10, Zhou teaches:

"wherein the one or more inputs to the one or more neural networks comprise one or more text prompts based, at least in part, on content of a single image" (Zhou, p. 5, 3 Methodology: "vision-language pre-training allows open-set visual concepts to be explored through a high-capacity text encoder, leading to a broader semantic space and in turn making the learned representations more transferable to downstream tasks ... By forwarding a prompt t to the text encoder g(·), we can obtain a classification weight vector representing a visual concept [the text prompt is encoded with a visual concept learned from a dataset including millions of images, i.e., based in part on content of a single image]"; p. 4, 3.1 Vision-Language Pre-training: "To learn diverse visual concepts that are more transferable to downstream tasks, CLIP's team collects a large training dataset consisting of 400 million image-text pairs.").

In regard to claim 11, Zhou teaches:

"wherein the one or more neural networks include a vision language model" (Zhou, p. 3, 1 Introduction: "To automate prompt engineering specifically for pre-trained vision-language models, we propose a simple approach based on continuous prompt learning ..."; p. 4, 3.1 Vision-Language Pre-training: "We briefly introduce vision-language pre-training with a particular focus on CLIP [a vision language model] ... CLIP consists of two encoders, one for images and the other for text.").

In regard to claim 12, Zhou does not teach, but Zhang teaches:

"further comprising: generating multiple randomly augmented views of the one or more inputs to the one or more neural networks" (Zhang, p. 12, B. Implementation Details: "In ModelNet10 and ModelNet40, we apply random scaling and translation for training augmentation [randomly augmented views], but in the challenging ScanObjectNN, we append jitter and random rotation following [43]. During training, we freeze CLIP's both visual and textual encoders, and only fine-tune the inter-view adapter."). The rationale for combining the teachings of Zhou and Zhang is the same as set forth in the rejection of claim 1.

In regard to claim 13, Zhou does not teach, but Zhang teaches:

"further comprising: generating one or more confidence metrics of the plurality of variances of the one or more inputs to the one or more neural networks" (Zhang, p. 3, 3.1 A Revisit of CLIP: "the feature of every test image is encoded by CLIP's visual encoder to f_v ... and the classification [is] computed as, logits = f_v W_t^T [confidence metrics]; p_i = softmax_i(logits), (1) where softmax_i(·) and p_i denote the softmax function and predicted probability for category i."; p. 4, Zero-shot Classification: "classification logits_i of each view are separately calculated and the final logits_p of the point cloud are acquired by their weighted summation, ... logits_p = ... (2) [confidence metrics of the variances of the inputs]"). The rationale for combining the teachings of Zhou and Zhang is the same as set forth in the rejection of claim 1.

In regard to claim 14, Zhou does not teach, but Zhang teaches:

"further comprising: classifying one or more multiple randomly augmented views of the one or more inputs to the one or more neural networks based, at least in part, on an average value of confidence metrics of the plurality of variances of the one or more inputs to the one or more neural networks" (Zhang, p. 3, 3.1 A Revisit of CLIP: "the feature of every test image is encoded by CLIP's visual encoder to f_v ... and the classification [classifying images (including views of inputs)] [is] computed as, logits = f_v W_t^T [confidence metrics]; p_i = softmax_i(logits), [an average value of confidence metrics] (1) where softmax_i(·) and p_i denote the softmax function and predicted probability for category i."; p. 4, Zero-shot Classification: "classification logits_i of each view are separately calculated and the final logits_p of the point cloud are acquired by their weighted summation, ... logits_p = ... (2) [confidence metrics of the variances of the inputs]"; softmax_i(z) = e^(z_i) / Σ_{j=1..K} e^(z_j), i.e., the exponentiated input value is divided by the sum of all exponentiated inputs [an average value of confidence metrics]; see claim 12 for 'randomly augmented views'). The rationale for combining the teachings of Zhou and Zhang is the same as set forth in the rejection of claim 1.

Claims 7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou in view of Zhang, as applied to claims 1 and 15, and further in view of Wyss (US 20220189185 A1, filed on 20211101).

In regard to claim 7, Zhou does not teach, but Zhang teaches:

"wherein a prompt to the one or more neural networks is tuned based, at least in part, on" (Zhang, p. 5, 3.3 Inter-view Adapter for PointCLIP: "For training, we freeze CLIP's both visual and textual encoders and fine-tune the learnable adapter [a prompt is tuned based on the loss] via a cross-entropy loss."; p. 5, Figure 3: "Detailed structure of the proposed Inter-view Adapter. Given multi-view features of a point cloud, the adapter extracts its global representation and generates view-wise adapted features."; soft prompts are tuned based on the classification loss).

"classifying the plurality of variances of the one or more inputs to the one or more neural networks ... and computing an average of the plurality of variances" (Zhang, p. 3, 3.1 A Revisit of CLIP: "the feature of every test image is encoded by CLIP's visual encoder to f_v ... and the classification [classifying images (including views of inputs), classifying the variances of the inputs] [is] computed as, logits = f_v W_t^T; p_i = softmax_i(logits), [computing an average of the variances] (1) where softmax_i(·) and p_i denote the softmax function and predicted probability for category i."; p. 4, Zero-shot Classification: "classification logits_i of each view are separately calculated and the final logits_p of the point cloud are acquired by their weighted summation, ... logits_p = ... (2) [logits for the variances]"; softmax_i(z) = e^(z_i) / Σ_{j=1..K} e^(z_j), i.e., the exponentiated input value is divided by the sum of all exponentiated inputs [an average value of the variances]). The rationale for combining the teachings of Zhou and Zhang is the same as set forth in the rejection of claim 1.

Zhou and Zhang do not teach, but Wyss teaches:

"based, at least in part, on removing one or more of the variances from the plurality of variances" (Wyss, [0006]: "a high entropy can map to a low confidence score while a low entropy can map to a high confidence score."; [0033]: "image entropy can be determined using available tools and processes within the image analysis software. This value, once computed for an image, is provided to the above confidence-determination computation. Note that a low confidence score, in the illustrative examples herein, implies that image data should be discarded and a high confidence score implies that image data should be noted for potential labelling as a candidate image [using only input image data having low entropy (high confidence), i.e., minimizing entropy of the inputs]"; [0018]: "The vision system process(or) 130 further includes a scoring process(or) 138 that allows confidence scores to be generated with respect to input image data 132 that can be used for training candidate images. The scoring process(or) 138 operates to generate a confidence score that falls above or below a given threshold by which the image data is either used to provide a candidate training image, or is discarded.").

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhou and Zhang to incorporate the teachings of Wyss by including entropy (a confidence value/score) of the input image data. Doing so would provide the best image candidates (Wyss, [0028]: "Thus, the computed confidence value/score can be used to assist in labelling of images ... Such labelled images typically represent distinct/interesting characteristics that provide the best candidates ...").

In regard to claim 20, Zhou does not teach, but Zhang teaches:

"wherein a prompt to the one or more neural networks is tuned during inferencing based, at least in part, on ..." (Zhang, p. 5, 3.3 Inter-view Adapter for PointCLIP: "For training, we freeze CLIP's both visual and textual encoders and fine-tune the learnable adapter [a prompt is tuned] via a cross-entropy loss."). The rationale for combining the teachings of Zhou and Zhang is the same as set forth in the rejection of claim 1.
Zhou and Zhang do not teach, but Wyss teaches:

"minimizing entropy of the plurality of variances of the one or more inputs to the one or more neural networks" (Wyss, [0006]: "a high entropy can map to a low confidence score while a low entropy can map to a high confidence score."; [0033]: "image entropy can be determined using available tools and processes within the image analysis software. This value, once computed for an image, is provided to the above confidence-determination computation. Note that a low confidence score, in the illustrative examples herein, implies that image data should be discarded and a high confidence score implies that image data should be noted for potential labelling as a candidate image [using only input image data having low entropy (high confidence), i.e., minimizing entropy of the inputs]"; [0018]: "The vision system process(or) 130 further includes a scoring process(or) 138 that allows confidence scores to be generated with respect to input image data 132 that can be used for training candidate images. The scoring process(or) 138 operates to generate a confidence score that falls above or below a given threshold by which the image data is either used to provide a candidate training image, or is discarded."). The rationale for combining the teachings of Zhou, Zhang and Wyss is the same as set forth in the rejection of claim 7.

Conclusion

The reference made of record and not relied upon is considered pertinent to applicant's disclosure: Shu ("Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models," 20220915) teaches test-time prompt tuning (TPT), a method that can learn adaptive prompts on the fly with a single test sample.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG, whose telephone number is (408) 918-7519. The examiner can normally be reached Monday - Thursday, 8-5 PT.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Usmaan Saeed, can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SU-TING CHUANG/
Examiner, Art Unit 2146
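For context on the technique at the center of the §103 rejection: Zhou's CoOp models a prompt's context as learnable vectors [V]1..[V]M prepended to a [CLASS] token, and optimizes only those vectors against a classification loss while the CLIP encoders stay frozen. Below is a minimal sketch of that idea, assuming PyTorch and a CLIP-like model object; "clip_model", "encode_text_embeddings", the shapes, and the hyperparameters are illustrative stand-ins, not the actual CoOp code.

import torch
import torch.nn.functional as F

M, dim = 16, 512                                 # context length and embedding width (illustrative)
ctx = torch.randn(M, dim, requires_grad=True)    # learnable context vectors [V]1..[V]M
optimizer = torch.optim.SGD([ctx], lr=2e-3)

def training_step(image, label, class_token_embs, clip_model):
    # t = [V]1 [V]2 ... [V]M [CLASS]  -- Equation (2) quoted in the office action
    prompts = [torch.cat([ctx, c], dim=0) for c in class_token_embs]
    with torch.no_grad():                        # the image encoder stays frozen
        img_feat = clip_model.encode_image(image)
    # the text encoder's weights are frozen too, but gradients still flow back to ctx
    txt_feats = torch.stack([clip_model.encode_text_embeddings(p) for p in prompts])
    logits = img_feat @ txt_feats.T              # similarity score per class
    loss = F.cross_entropy(logits, label)        # "minimizing the classification loss"
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Only "ctx" receives gradients here, which is what the quoted passage's "the massive pre-trained parameters are frozen" refers to.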
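The claims 7/13/14/20 mapping combines three pieces: softmax outputs as per-view confidence metrics (Zhang), discarding low-confidence (high-entropy) views (Wyss), and averaging what remains, which is also the recipe in the Shu TPT reference noted in the conclusion. A minimal sketch under those assumptions follows; the random logits, the keep fraction, and the function names are hypothetical, not taken from any cited reference's code.

import torch
import torch.nn.functional as F

def entropy(p, eps=1e-12):
    # Shannon entropy of each probability vector; low entropy = high confidence
    return -(p * (p + eps).log()).sum(dim=-1)

def confident_average(view_logits, keep_fraction=0.1):
    # view_logits: one row of class logits per randomly augmented view
    probs = F.softmax(view_logits, dim=-1)       # per-view confidence metrics
    ent = entropy(probs)
    k = max(1, int(keep_fraction * len(probs)))
    keep = ent.argsort()[:k]                     # drop high-entropy (low-confidence) views
    return probs[keep].mean(dim=0)               # average over the remaining views

# Example: 8 augmented views of one input, 4 candidate classes
avg_probs = confident_average(torch.randn(8, 4))
predicted_class = avg_probs.argmax().item()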

Prosecution Timeline

Sep 07, 2023
Application Filed
Mar 28, 2026
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561600: LINEAR TIME ALGORITHMS FOR PRIVACY PRESERVING CONVEX OPTIMIZATION
Granted Feb 24, 2026 (2y 5m to grant)

Patent 12518154: TRAINING MULTIMODAL REPRESENTATION LEARNING MODEL ON UNANNOTATED MULTIMODAL DATA
Granted Jan 06, 2026 (2y 5m to grant)

Patent 12481725: SYSTEMS AND METHODS FOR DOMAIN-SPECIFIC ENHANCEMENT OF REAL-TIME MODELS THROUGH EDGE-BASED LEARNING
Granted Nov 25, 2025 (2y 5m to grant)

Patent 12468951: Unsupervised outlier detection in time-series data
Granted Nov 11, 2025 (2y 5m to grant)

Patent 12412095: COOPERATIVE LEARNING NEURAL NETWORKS AND SYSTEMS
Granted Sep 09, 2025 (2y 5m to grant)
Study what changed in these applications to get past this examiner. Based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 52%
With Interview: 91% (+39.7%)
Median Time to Grant: 4y 5m
PTA Risk: Low
Based on 101 resolved cases by this examiner. Grant probability derived from career allow rate.
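As a sanity check on how these figures appear to fit together (an assumption on our part; the tool's exact formula and rounding rules are not published), adding the +39.7-point interview lift to the unrounded career allow rate reproduces the 91% projection:

# Hypothetical reconstruction of the "91% with interview" projection; this is
# only a consistency check, not the dashboard's published formula.
granted, resolved = 52, 101
career_rate = 100 * granted / resolved       # ~51.5%; the card shows 52%
lift_points = 39.7                           # interview lift, in percentage points
print(f"{career_rate + lift_points:.1f}%")   # 91.2%, displayed as 91%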
