Prosecution Insights
Last updated: April 19, 2026
Application No. 18/328,950

EFFICIENT AUGMENTATION FOR MULTIMODAL MACHINE LEARNING

Final Rejection (§103)
Filed: Jun 05, 2023
Examiner: BARNES JR, CARL E
Art Unit: 2178
Tech Center: 2100 — Computer Architecture & Software
Assignee: Adobe Inc.
OA Round: 2 (Final)
Grant Probability: 32% (At Risk)
Expected OA Rounds: 3-4
Time to Grant: 4y 4m
Grant Probability With Interview: 57%

Examiner Intelligence

Career Allow Rate: 32% (65 granted / 202 resolved; -22.8% vs TC avg). Grants only 32% of cases.
Interview Lift: +25.2% for resolved cases with interview.
Typical Timeline: 4y 4m avg prosecution; 32 applications currently pending.
Career History: 234 total applications across all art units.

Statute-Specific Performance

§101: 14.3% (-25.7% vs TC avg)
§103: 62.6% (+22.6% vs TC avg)
§102: 9.0% (-31.0% vs TC avg)
§112: 8.7% (-31.3% vs TC avg)

Tech Center averages are estimates, based on career data from 202 resolved cases.
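The per-statute deltas above are mutually consistent: each rate plus its stated delta yields the same Tech Center baseline. A quick sketch checking the arithmetic (the figures come from the dashboard above; the 40.0% baseline is inferred from the deltas, not stated on the page):

```python
# Examiner's per-statute success rates, from the dashboard above.
rates = {"101": 14.3, "103": 62.6, "102": 9.0, "112": 8.7}

# Inferred Tech Center baseline: every stated delta implies the same 40.0%.
TC_AVG = 40.0
deltas = {s: round(r - TC_AVG, 1) for s, r in rates.items()}
print(deltas)  # {'101': -25.7, '103': 22.6, '102': -31.0, '112': -31.3}

# Career allow rate: 65 granted out of 202 resolved cases.
career = round(65 / 202 * 100, 1)
print(career)  # 32.2 -- displayed as 32%

# "57% with interview" is consistent with the stated +25.2% lift:
# 32% (rounded baseline) + 25.2% = 57.2%, displayed as 57%.
```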

Office Action (§103)

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) was submitted on 06/05/2023. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Amendment

Claims 1-20 were previously pending and subject to the non-final action filed on 11/04/2025. In the response filed 02/04/2026, claims 1, 4, 9, and 15 were amended. Therefore, claims 1-20 are currently pending and subject to the final action below.

Response to Arguments

Applicant's arguments (see pages 7-16, filed 02/04/2026) with respect to the rejection of claims 1-20 under 35 U.S.C. 101 (abstract idea) have been fully considered and are persuasive. The 101 rejection of claims 1-20 has been withdrawn.

Applicant's arguments filed 02/04/2026 with respect to claims 1-20 have been considered but are moot, because the arguments do not apply to the new combination of references being used in the current rejection.

Examiner Notes

Multi-head attention (MHA) can be self-attention performed in parallel: the attention heads compute in parallel, each focusing on different information, with the queries, keys, and values processed in parallel. The MHA implementation reduces the feature size for each individual head, which allows multiple, smaller attention operations to run in parallel.

Claim Objections

Claims 1 and 9 are objected to because the claims recite the limitation "wherein each of the plurality of MHA outputs includes a different number of tokens from each other." For example, one MHA output could focus on adjective-noun pairs while a second focuses on subject-verb pairs; alternatively, one MHA output could focus on large-scale (size) adjective-noun pairs while a second focuses on a small-scale (size) set of tokens of adjective-noun pairs. The difference could be the size or the type of data. The term "from each other" is left open to a broad interpretation with respect to MHA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 6-20 are rejected under 35 U.S.C. 103 as being unpatentable over Abramson (US 20230178076 A1, filed Dec. 7, 2022) in view of SHRIVASTAVA (US 20210374338 A1, filed May 3, 2021).

Regarding independent claim 1, Abramson teaches:

"A method for multimodal machine learning, comprising: obtaining a prompt;" (Abramson − [0062] at each time step the system 100 receives a multi-modal input; [0073] obtain additional information about how to perform the desired task from the other agents 107, e.g., by prompting the other agent(s) 107 to provide additional information; [0074] processing input observations to generate action(s). Examiner note: inputs, queries, and responses are considered "a prompt".)

"encoding the prompt using a multimodal encoder to obtain a prompt embedding," (Abramson − [0037] In some implementations, the perceptual encoder neural network includes the text embedding neural network, the image embedding neural network, and the multi-modal Transformer neural network of any of the methods described above, and the encoded representation is the aggregated embedding generated by the multi-modal Transformer neural network. [0164] FIG. 6 shows a specific example architecture of the neural networks used by the action selection system during inference and during training; in particular, the perceptual encoder includes an image embedding neural network 610, a text embedding neural network 620, and a multi-modal Transformer 630.)

"wherein the encoding comprises generating a plurality of multi-head attention (MHA) outputs" (Abramson − [0122] can perform multi-head attention and therefore have multiple heads that each perform self-attention in parallel; the self-attention layer can then combine the outputs of the multiple heads to generate an output of the attention mechanism for the self-attention layer, e.g., by summing, averaging, or concatenating the outputs and then optionally applying a linear transformation to the result. [0164] the perceptual encoder includes an image embedding neural network 610, a text embedding neural network 620, and a multi-modal Transformer 630. Examiner note: MHA can be self-attention performed in parallel.)

"and combining the plurality of MHA outputs using a multi-scale aggregator;" (Abramson − [0167-0168], Fig. 6, element 630; after the self-attention layers, the multi-modal Transformer 630 aggregates the resulting text embeddings and dedicated embeddings to generate the aggregated embedding ("encoded representation").)

"and generating a response to the prompt based on the prompt embedding." (Abramson − [0008] selecting, using the aggregated embedding, one or more actions to be performed by the agent in response to the observation image.)

Abramson does not explicitly teach that the MHA outputs include a different number of tokens from each other. However, SHRIVASTAVA teaches: "wherein each of the plurality of MHA outputs includes a different number of tokens from each other," (SHRIVASTAVA – [0079] the base decoder 304 is a transformer decoder; the base decoder 304 includes a masked multi-head attention layer pre-trained on large-scale text corpora and generates the first set of words (tokens) distribution 308. [0080] the fine-tuned decoder 306 is a transformer decoder; the fine-tuned decoder 306 includes a masked multi-head attention layer pre-trained on small-scale text summaries and generates a second set of words (tokens) distribution 310. The base decoder output differs from the fine-tuned decoder output.)

Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Abramson and SHRIVASTAVA. Abramson and SHRIVASTAVA are analogous art because they are from the same problem-solving area: generating a predicted output using a machine learning model. The rationale for doing so would have been to improve the readability and tone of the predicted outputs generated by the machine learning model. Therefore, it would have been obvious to combine Abramson and SHRIVASTAVA to obtain the invention as specified in the instant claim(s).

Regarding dependent claim 2 (depends on claim 1), Abramson teaches: "wherein: the prompt comprises a text prompt and the prompt embedding comprises a text embedding in a multimodal embedding space." (Abramson − [0122] and [0164], as quoted for claim 1.)

Regarding dependent claim 3 (depends on claim 1), Abramson teaches: "wherein: the prompt comprises an image prompt and the prompt embedding comprises an image embedding in a multimodal embedding space." (Abramson − [0122] and [0164], as quoted for claim 1.)

Regarding dependent claim 4 (depends on claim 1), Abramson teaches: "applying a plurality of masks to obtain the plurality of MHA outputs." (Abramson – [0024], [0033], [0109] for example, the Transformer can auto-regressively generate the output text tokens while cross-attending into the state representation. Examiner note: auto-regressive generation implies causal masking.)

Regarding dependent claim 6 (depends on claim 4), Abramson does not explicitly teach masks that indicate neighboring words around a central word. However, SHRIVASTAVA teaches: "wherein: each of the plurality of masks indicates neighboring words around a central word." (SHRIVASTAVA – [0097-0098] decoding process words "Learn," "from," "history," "Expert".) Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Abramson and SHRIVASTAVA, for the same reasons given for claim 1.

Regarding dependent claim 7 (depends on claim 1), Abramson teaches: "processing an output of the multi-scale aggregator using an adapter, wherein the prompt embedding is based on an output of the adapter." (Abramson − [0167-0168], Fig. 6, element 630, as quoted for claim 1.)

Regarding dependent claim 8 (depends on claim 1), Abramson teaches: "wherein: the multimodal encoder comprises a pre-trained encoder that is fine-tuned based on the multi-scale aggregator." (Abramson − [0078] In yet other implementations, the system 190 can first train the neural networks through imitation learning, and then fine-tune the neural networks through reinforcement learning.)

Regarding independent claim 9, Abramson teaches:

"A method for multimodal machine learning, comprising: obtaining training data comprising an image and text describing the image;" (Abramson − [0076-0077] the system 190 trains the neural networks through imitation learning, e.g., on ground truth data generated by an expert agent. The ground truth data includes a set of ground truth trajectories that each include, at a sequence of time steps, an observation that includes an observation image and a natural language text sequence, and one or more of a ground truth action or a ground truth text output.)

"encoding the text using a multimodal encoder to obtain a predicted text embedding," (Abramson − [0037] and [0164], as quoted for claim 1.)

"wherein encoding the text comprises generating a plurality of multi-head attention (MHA) text outputs" (Abramson − [0164], as quoted for claim 1; moreover, at each time step, the system can generate both a sequence of multiple (8) actions and an output text sequence.)

"and combining the plurality of MHA text outputs using a text multi-scale aggregator;" (Abramson − [0164-0167]; [0167] the multi-modal Transformer 630 aggregates the resulting text embeddings and dedicated embeddings to generate the aggregated embedding ("encoded representation").)

"encoding the image using the multimodal encoder to obtain a predicted image embedding," (Abramson − [0037] and [0164], as quoted for claim 1.)

"wherein encoding the image comprises generating a plurality of MHA image outputs," (Abramson − [0122] and [0164], as quoted for claim 1. Examiner note: MHA can be self-attention performed in parallel.)

"and combining the plurality of MHA image outputs using an image multi-scale aggregator;" (Abramson − [0167-0168], Fig. 6, element 630, as quoted for claim 1.)

"and training the multimodal encoder based on the predicted image embedding and the predicted text embedding." (Abramson − [0076-0078]; [0076] In some implementations, the system 190 trains the neural networks through imitation learning, e.g., on ground truth data generated by an expert agent. The ground truth data includes a set of ground truth trajectories that each include, at a sequence of time steps, an observation that includes an observation image and a natural language text sequence, and one or more of a ground truth action or a ground truth text output. A "ground truth" action is a target action that should be performed by the agent at a given time step (or a given position in an action sequence). Similarly, a "ground truth" text output is a target text output that should be generated by the system at a given time step. For example, the ground truth actions and text outputs can be the actual actions and text outputs (respectively) performed or generated (e.g., spoken) by the expert agent at a given time step. The expert agent can be, e.g., an agent that is controlled by a human user, an agent that is controlled by an already-learned policy, or an agent that is controlled by a hard-coded, heuristic-based policy.)

Abramson does not explicitly teach that the MHA outputs include a different number of tokens from each other. However, SHRIVASTAVA teaches: "wherein each of the plurality of MHA text outputs includes a different number of text tokens from each other," (SHRIVASTAVA – [0079-0080], as quoted for claim 1.) and "wherein each of the plurality of MHA image outputs includes a different number of image tokens from each other," (SHRIVASTAVA – [0120] capable of capturing still picture images and/or video images; [0079-0080], as quoted for claim 1.)

Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Abramson and SHRIVASTAVA, for the same reasons given for claim 1.

Regarding dependent claim 10 (depends on claim 9), Abramson teaches: "further comprising: obtaining a pre-trained encoder; and inserting the image multi-scale aggregator and the text multi-scale aggregator to obtain the multimodal encoder." (Abramson − [0122] and [0164], as quoted for claim 1.)

Regarding dependent claim 11 (depends on claim 10), Abramson teaches: "wherein: the pre-trained encoder is trained using pre-training data in a first domain and the training data is in a second domain different from the first domain." (Abramson − [0164], Fig. 6: the perceptual encoder includes an image embedding neural network 610, a text embedding neural network 620, and a multi-modal Transformer 630.)

Regarding dependent claim 12 (depends on claim 10), Abramson teaches: "further comprising: inserting a text adapter following the text multi-scale aggregator; and inserting an image adapter following the image multi-scale aggregator." (Abramson − [0164], Fig. 6, as quoted for claim 11.)

Regarding dependent claim 13 (depends on claim 12), Abramson teaches: "further comprising: updating parameters of the text adapter, wherein the multimodal encoder is trained based on the updated parameters of the text adapter." (Abramson − [0077-0078] the system 190 trains the neural networks through reinforcement learning, and can fine-tune the neural networks through reinforcement learning. [0088] In some cases, the system can be used to control the interactions of the agent with a simulated environment, and the system can train the parameters of the neural networks (e.g., the perceptual encoder neural network 122, the policy neural network 126, and, when used, the language generation neural network) used to control the agent based on the interactions of the agent with the simulated environment. The neural networks contain a text encoder and an image encoder.)

Regarding dependent claim 14 (depends on claim 12), Abramson teaches: "further comprising: updating parameters of the image adapter, wherein the multimodal encoder is trained based on the updated parameters of the image adapter." (Abramson − [0077-0078] and [0088], as quoted for claim 13.)

Regarding independent claim 15, which is directed to an apparatus: claim 15 has similar or the same technical features and limitations as claim 1, and is rejected under the same rationale.

Regarding dependent claim 16 (depends on claim 15), Abramson teaches: "further comprising: a training component configured to train the multimodal encoder." (Abramson − [0037] and [0164], as quoted for claim 1.)

Regarding dependent claim 17 (depends on claim 15), Abramson teaches: "wherein: the multimodal encoder comprises an image multi-scale aggregator in an image encoder and a text multi-scale aggregator in a text encoder." (Abramson − [0077-0078] and [0088], as quoted for claim 13.)

Regarding dependent claim 18 (depends on claim 15), Abramson teaches: "wherein: the multimodal encoder comprises an adapter following the multi-scale aggregator." (Abramson − [0167-0168], Fig. 6, element 630, as quoted for claim 1.)

Regarding dependent claim 19 (depends on claim 18), Abramson teaches: "wherein: the multimodal encoder is pretrained without the multi-scale aggregator and fine-tuned with the multi-scale aggregator." (Abramson − [0078] In yet other implementations, the system 190 can first train the neural networks through imitation learning, and then fine-tune the neural networks through reinforcement learning.)

Regarding dependent claim 20 (depends on claim 15), Abramson teaches: "further comprising: a response component configured to generate a response to the prompt based on the prompt embedding." (Abramson − [0008] selecting, using the aggregated embedding, one or more actions to be performed by the agent in response to the observation image.)

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Abramson and SHRIVASTAVA as applied to claim 4 above, and further in view of YU (US 20240013504 A1, filed Oct. 31, 2022).

Regarding dependent claim 5 (depends on claim 4), Abramson does not explicitly teach: "wherein: each of the plurality of masks indicates neighboring pixels around a central pixel." However, YU teaches this limitation. (YU − [0019] Embodiments of the present disclosure provide techniques for training and using a machine learning model to perform referring image segmentation. In referring image segmentation, the machine learning model determines object(s) or region(s) within an image being referenced by a natural language expression. (8) a mask decoder that takes the refined feature tokens and the location-aware queries as inputs and outputs a mask, and (9) a convolution module that applies a convolution layer to the mask, … thereby generating a segmentation mask that indicates pixels within the image that are associated with the object(s) referenced by the natural language expression. Each pixel in the image is mapped to a location and position in the segmentation map.)

Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Abramson, SHRIVASTAVA, and YU. Abramson, SHRIVASTAVA, and YU are analogous art because they are from the same problem-solving area: generating a predicted output using a machine learning model. It would therefore have been obvious to combine Abramson, SHRIVASTAVA, and YU to obtain the invention as specified in the instant claim(s).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARL E BARNES JR, whose telephone number is (571) 270-3395. The examiner can normally be reached Monday-Friday, 9am-6pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Stephen Hong, can be reached at (571) 272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CARL E BARNES JR/
Examiner, Art Unit 2178

/STEPHEN S HONG/
Supervisory Patent Examiner, Art Unit 2178
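The examiner's notes describe MHA as self-attention performed in parallel with a reduced feature size per head, and the objected-to limitation requires each MHA output to contain a different number of tokens. The toy sketch below illustrates both ideas together. It is a generic, hypothetical illustration: the dimensions, pooling windows, and mean-pool "aggregator" are invented for this sketch, and it is not the architecture claimed in the application or disclosed in Abramson or SHRIVASTAVA.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(q, ks, vs):
    """Single-query scaled dot-product attention over one head's tokens."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, vs)) for j in range(len(vs[0]))]

def pool(seq, window):
    """Average-pool a token sequence; larger windows yield fewer tokens."""
    out = []
    for i in range(0, len(seq), window):
        chunk = seq[i:i + window]
        out.append([sum(t[j] for t in chunk) / len(chunk) for j in range(len(chunk[0]))])
    return out

d_model, n_heads = 8, 4
d_head = d_model // n_heads          # per-head feature size is reduced: 8 -> 2
tokens = [[float(i + j) for j in range(d_model)] for i in range(4)]

# Each head attends over its own d_head-wide slice ("multiple, smaller
# attentions operating in parallel"), then is pooled at a different scale,
# so the heads' outputs contain different numbers of tokens: 4, 2, 2, 1.
windows = [1, 2, 2, 4]
mha_outputs = []
for h in range(n_heads):
    sl = slice(h * d_head, (h + 1) * d_head)
    qs = [t[sl] for t in tokens]
    attended = [attend(q, qs, qs) for q in qs]
    mha_outputs.append(pool(attended, windows[h]))

token_counts = [len(o) for o in mha_outputs]   # [4, 2, 2, 1]

# A toy "multi-scale aggregator": mean-pool each head's sequence to one
# vector, then concatenate back to d_model width.
aggregated = []
for out in mha_outputs:
    aggregated.extend(sum(t[j] for t in out) / len(out) for j in range(d_head))
assert len(aggregated) == d_model
```

Under this reading, "different number of tokens from each other" is simply a consequence of each head operating at a different scale before aggregation.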

Prosecution Timeline

Jun 05, 2023: Application Filed
Feb 22, 2024: Response after Non-Final Action
Oct 30, 2025: Non-Final Rejection (§103)
Jan 27, 2026: Applicant Interview (Telephonic)
Jan 30, 2026: Examiner Interview Summary
Feb 04, 2026: Response Filed
Mar 05, 2026: Final Rejection (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12584932: SLIDE IMAGING APPARATUS AND A METHOD FOR IMAGING A SLIDE (granted Mar 24, 2026; 2y 5m to grant)
Patent 12541640: COMPUTING DEVICE FOR MULTIPLE CELL LINKING (granted Feb 03, 2026; 2y 5m to grant)
Patent 12536464: SYSTEM FOR CONSTRUCTING EFFECTIVE MACHINE-LEARNING PIPELINES WITH OPTIMIZED OUTCOMES (granted Jan 27, 2026; 2y 5m to grant)
Patent 12530765: SYSTEMS AND METHODS FOR CALCIUM-FREE COMPUTED TOMOGRAPHY ANGIOGRAPHY (granted Jan 20, 2026; 2y 5m to grant)
Patent 12530523: METHOD, APPARATUS, SYSTEM, AND COMPUTER PROGRAM FOR CORRECTING TABLE COORDINATE INFORMATION (granted Jan 20, 2026; 2y 5m to grant)

Study what changed to get past this examiner; based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 32%
Grant Probability With Interview: 57% (+25.2%)
Median Time to Grant: 4y 4m
PTA Risk: Moderate

Based on 202 resolved cases by this examiner. Grant probability is derived from the career allow rate.
