Prosecution Insights
Last updated: April 19, 2026
Application No. 18/982,577

MULTIMODAL FOUNDATION MODEL FOR PATHOLOGY ANALYSIS

Non-Final OA (§102, §103)
Filed: Dec 16, 2024
Examiner: BEG, SAMAH A
Art Unit: 2676
Tech Center: 2600 — Communications
Assignee: The Brigham and Women's Hospital Inc.
OA Round: 2 (Non-Final)
Grant Probability: 78% (Favorable)
Expected OA Rounds: 2-3
Time to Grant: 2y 4m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 78% (above average): 249 granted / 317 resolved, +16.5% vs TC avg
Interview Lift: +29.9% among resolved cases with an interview (strong, roughly +30%)
Typical Timeline: 2y 4m average prosecution; 16 applications currently pending
Career History: 333 total applications across all art units

Statute-Specific Performance

§101: 10.8% (-29.2% vs TC avg)
§103: 42.3% (+2.3% vs TC avg)
§102: 20.2% (-19.8% vs TC avg)
§112: 21.2% (-18.8% vs TC avg)
Deltas are relative to the Tech Center average estimate. Based on career data from 317 resolved cases.

Office Action

§102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

The Amendment filed 08/25/2025 overcomes the following: the objection to the Abstract of the disclosure; the objection to Claims 1, 8 and 14-17 for minor informalities; and the rejection of claims 1-7 and 20 under 35 USC 112(b). Claim 20 has been cancelled by the amendment.

Positive Statement for Eligibility under 35 USC § 101

An abstract idea rejection under 35 USC 101 was considered, but it was determined that the claims are patent eligible. Claims 1-19 are patent eligible as they recite combinations of elements which, as a whole, amount to significantly more than an abstract idea. Each of claims 1, 8 and 17, on its surface, may seem to comprise a mental process of comparison of received data with additional elements amounting to no more than insignificant extra-solution activities, but the Examiner has determined that the claimed elements recited in each of claims 1, 8 and 17 as a whole reflect an improvement to existing systems/methods used for pathology image analysis, in that the claims recite use of pretrained visual-language models for pathology image analysis to perform image-to-text or text-to-image retrieval. Thus, the claims cover a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. This would be considered a practical and useful application in the field of biomedical image analysis. Claims 1, 8, 17 and their respective dependent claims are therefore considered to be patent eligible.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-3, 7-11 and 16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition” (hereinafter “Huang”; published 2021).

Regarding claim 1, Huang teaches a system comprising: a processor; and a non-transitory computer readable medium storing instructions executable by the processor, the machine-executable instructions comprising (Huang, Introduction, first through third paragraphs, Section 3, Method, Figs.
1-2; “deep learning and computer vision provide a promising solution for automating medical image analysis”): a first encoder that reduces received data representing a pathology to a first set of tokens (Huang, Section 3.1.1; “We extract the local image features from an intermediate convolution layer and vectorize to get the C-dimentional features for each of M image sub-regions” using the image encoder Ev); a multimodal fusion model that matches the first set of tokens to a second set of tokens characterizing the pathology, the multimodal fusion model being trained on a pretraining dataset compiled from a plurality of pathology-related sources, a given training sample within the pretraining dataset comprising data representing a pathology and data characterizing the data representing the pathology (Huang, Section 3, p.3944-3947, Fig. 2; “Given a pair of medical image and report, we first use the image encoder and text encoder to extract image and text features respectively. The global image-text representations are learned through the global contrastive loss. For learning local representations, we compute the similarity matrix based on the image sub-region features and word-level features to generate attention-weighted image representations. The local contrastive objective is based on the attention-weighted image representations and the corresponding word representations. The overall representation learning framework is trained end-to-end by jointly optimizing both local and global contrastive losses.” The training dataset comprises pairs of images and associated reports containing pathological findings in medical imaging examinations, of which the images are read here as the claimed “data representing a pathology” and the text phrases extracted from reports are read as the claimed “data characterizing the data representing the pathology”, and the trained model is used for zero-shot classification of an input image); and a user interface that displays an output representing the second set of tokens (Huang, Fig. 4, Section 4.6; “well-trained attention weights should correctly identify significant image regions that correspond to a particular word…Fig. 4 demonstrates our attention model is able correctly identify significant image regions for a given word. For instance, the attention based on the word “Pneumonia” Fig. 4a (bottom) correctly localize regions of the right lower lobe containing heterogenous consolidative opacities indicative of pneumonia”).

Regarding claim 2, claim 1 is incorporated, and Huang further discloses wherein the received data is an image, the first set of tokens is a set of visual tokens, and the second set of tokens is a set of text tokens (Huang, Section 3.3.1; “Formally, given a query image xv and a collection of candidate texts Xt, we extract global image and text representations vg, tg by using their respective encoders and representation learning function.”).

Regarding claim 3, claim 2 is incorporated, and Huang further discloses an image interface that receives the received image, divides the received image into a plurality of tiles, and provides the plurality of tiles to the first encoder to provide a set of visual tokens for each of the plurality of tiles (Huang, Section 3.2.3-3.3.1; “we propose to leverage both the global and local features for a more accurate retrieval. We use the attention-driven image-text matching-score Z(tli, vli) defined in Eq. 6 as the similarity metric for the local representations. In this way, the localized similarity between the query image and candidate sentences can be calculated base on the context-aware local representations”).

Regarding claim 7, claim 2 is incorporated, and Huang further discloses wherein the multimodal fusion model is trained using an objective function having a contrastive objective component that aligns the first and second encoders by maximizing cosine-similarity scores between paired image and text embeddings and a captioning objective that maximizes the likelihood of generating the correct text conditioned on the image and previously generated text (Huang, Sections 3.2.2-3.2.3; “the global objective is formulated as minimizing the negative log posterior probability…where τ1 ∈ R is a scaling temperature parameter, and ⟨vgi, tgi⟩ represents the cosine similarity between the global image representation vgi and global text features” and “Similarly, due to the mutual correlation between the image and text pairs, we also maximize the posterior probability of the text given its corresponding image.”).

Regarding claim 8, Huang discloses a method (Huang, Fig. 2, Fig.
3, Section 3, Method) comprising: receiving one of an input representing a pathology and a search query (Huang, Section 3.3.2; “In zero-shot classification, we take an image xv as input and aim at predicting the corresponding label”); generating a first set of tokens from the one of the input representing a pathology and the search query (Huang, Section 3.1.1; “We extract the local image features from an intermediate convolution layer and vectorize to get the C-dimentional features for each of M image sub-regions” using the image encoder Ev); matching the first set of tokens to a second set of tokens at a multimodal fusion model trained on a pretraining dataset compiled from a plurality of pathology-related sources, a given training sample within the pretraining dataset comprising data representing a pathology and text describing the image (Huang, Section 3, p.3944-3947, Fig. 2; “Given a pair of medical image and report, we first use the image encoder and text encoder to extract image and text features respectively. The global image-text representations are learned through the global contrastive loss. For learning local representations, we compute the similarity matrix based on the image sub-region features and word-level features to generate attention-weighted image representations. The local contrastive objective is based on the attention-weighted image representations and the corresponding word representations. The overall representation learning framework is trained end-to-end by jointly optimizing both local and global contrastive losses.” The training dataset comprises pairs of images and associated reports containing pathological findings in the medical imaging examinations, of which the images are read here as the claimed “data representing a pathology” and the text phrases extracted from radiology reports are read as the claimed “data characterizing the data representing the pathology”, and the trained model is used for zero-shot classification of an input image); and providing an output based on the second set of tokens (Huang, Fig. 4, Section 4.6; “well-trained attention weights should correctly identify significant image regions that correspond to a particular word…Fig. 4 demonstrates our attention model is able correctly identify significant image regions for a given word. For instance, the attention based on the word “Pneumonia” Fig. 4a (bottom) correctly localize regions of the right lower lobe containing heterogenous consolidative opacities indicative of pneumonia”).

Regarding claim 9, claim 8 is incorporated, and Huang further discloses wherein the input representing the pathology is an input image (Huang, Section 3.3.1; “Formally, given a query image xv…”).

Regarding claim 10, claim 9 is incorporated, and Huang further discloses wherein the one of the input image and the search query is the input image, and the provided output is a class label associated with the input image (Huang, Section 3.3.1; “In the image-text retrieval task, a query image is used as the input to retrieve the closet matching text based on the similarities between their representations”).
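For readers unfamiliar with the cited technique: the global contrastive objective quoted from Huang (cosine similarities between paired image and report embeddings, scaled by a temperature, with the matched pair as the target) can be sketched in a few lines of NumPy. This is an illustrative reconstruction of the general CLIP/GLoRIA-style loss, not code from Huang; all function and variable names are ours.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Unit-normalize so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def global_contrastive_loss(img_emb, txt_emb, tau=0.07):
    """Symmetric image-text contrastive (InfoNCE) loss over N matched pairs.

    img_emb, txt_emb: (N, D) arrays; row i of each is a matched image/report pair.
    tau: scaling temperature, analogous to the tau in Huang's global objective.
    """
    v = l2_normalize(img_emb)
    t = l2_normalize(txt_emb)
    logits = v @ t.T / tau  # (N, N): cosine similarity / temperature

    def nll_of_diagonal(m):
        # Cross-entropy where the matched pair (the diagonal) is the target.
        m = m - m.max(axis=1, keepdims=True)
        log_p = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_p))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (nll_of_diagonal(logits) + nll_of_diagonal(logits.T))
```

The loss is near zero when each image is far more similar to its own report than to any other, which is the alignment behavior the claim 7 mapping relies on.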
Regarding claim 11, claim 9 is incorporated, and Huang further discloses wherein the one of the input image and the search query is the input image (Huang, Section 3.3.1; “Formally, given a query image xv…”), and wherein the provided output is a segmented representation of the input image (Huang, Fig. 4, Section 4.6; “well-trained attention weights should correctly identify significant image regions that correspond to a particular word…Fig. 4 demonstrates our attention model is able correctly identify significant image regions for a given word. For instance, the attention based on the word “Pneumonia” Fig. 4a (bottom) correctly localize regions of the right lower lobe containing heterogenous consolidative opacities indicative of pneumonia”).

Regarding claim 16, claim 9 is incorporated, and Huang further discloses dividing the input image into a plurality of tiles; and providing the plurality of tiles to a vision encoder to provide a set of visual tokens for each of the plurality of tiles; wherein matching the first set of tokens to the second set of tokens at the multimodal fusion model comprises generating a similarity metric between the set of visual tokens for each of the plurality of tiles with a set of text tokens associated with the input image, the output being provided according to the similarity metric for each of the plurality of tiles (Huang, Section 3.2.3-3.3.1; “In the image-text retrieval task, a query image is used as the input to retrieve the closet matching text based on the similarities between their representations…we propose to leverage both the global and local features for a more accurate retrieval. We use the attention-driven image-text matching-score Z(tli, vli) defined in Eq. 6 as the similarity metric for the local representations. In this way, the localized similarity between the query image and candidate sentences can be calculated based on the context-aware local representations”).

Claims 17-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs” (hereinafter “Zhang”; published 2023).

Regarding claim 17, Zhang discloses a system comprising: a processor; and a non-transitory computer readable medium storing instructions executable by the processor, the machine-executable instructions comprising (Zhang, p.3-4, p.22-23, Implementation; “The pretraining experiments were conducted with up to 16 NVIDIA A100 GPUs or 16 NVIDIA V100 GPUs, via PyTorch DDP”): a text encoder that reduces a received search to a set of text tokens (Zhang, p.7 (Cross-modal retrieval), Fig. 1 (p.15), p.20; “Specifically, we use the pretrained vision encoder and text encoder from the CLIP models to precompute embeddings of figures and captions respectively.”); a multimodal fusion model that matches the set of text tokens to a set of visual tokens, the multimodal fusion model being trained on a pretraining dataset compiled from a plurality of pathology-related sources, a given training sample within the pretraining dataset comprising a pathology image and text describing the image (Zhang, p.3-4 (BiomedCLIP enables accurate cross-modal retrieval), p.7 (Cross-modal retrieval), Fig. 1 (p.15); “BiomedCLIP performs very well by pretraining on large-scale data from PMC-15M, further indicating the importance of utilizing a diverse and large dataset for domain-specific vision language models” wherein the training data include a plurality of biomedical image-text pairs. The pretrained model is then used in the testing phase, in the case of text-to-image retrieval, to identify the image having the highest similarity to the input text search query.); and a user interface that displays an image associated with the set of visual tokens (Zhang, p.3-4, Fig. 2B (p.16); “To understand how BiomedCLIP outperforms general-domain CLIP in biomedical cross-modal retrieval, we show three random examples in Fig. 2B. In each example, we show the top-4 image retrieval results given the text prompt”).

Regarding claim 18, claim 17 is incorporated, and Zhang further discloses wherein the multimodal fusion model computes a similarity metric between the set of text tokens and a plurality of sets of visual tokens associated with the multimodal fusion model and matches the set of text tokens with each set of visual tokens for which the similarity metric meets a threshold value (Zhang, p.7, Cross-modal retrieval; “Specifically, we use the pretrained vision encoder and text encoder from the CLIP models to precompute embeddings of figures and captions respectively. Given a figure embedding, we compute its cosine similarities with all captions in the test set and retrieve the k most similar captions. Our evaluation metric measures if the original caption for the figure is within the k retrieved captions, i.e., recall at top-k or R@k. Similarly, we evaluate Recall@k for text-to-image cross-modal retrieval”).

Regarding claim 19, claim 17 is incorporated, and Zhang further discloses wherein the multimodal fusion model computes a similarity metric between the set of text tokens and a plurality of sets of visual tokens associated with the multimodal fusion model and matches the set of text tokens with a predetermined number of sets of visual tokens having the highest similarity metrics (Zhang, p.7, Cross-modal retrieval; “Specifically, we use the pretrained vision encoder and text encoder from the CLIP models to precompute embeddings of figures and captions respectively. Given a figure embedding, we compute its cosine similarities with all captions in the test set and retrieve the k most similar captions. Our evaluation metric measures if the original caption for the figure is within the k retrieved captions, i.e., recall at top-k or R@k. Similarly, we evaluate Recall@k for text-to-image cross-modal retrieval”).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Huang, as applied to claim 1 above, in view of “Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing” (hereinafter “Boecking”; published 2022).

Regarding claim 5, claim 3 is incorporated, and Huang does not expressly teach the limitations as further claimed, but, in an analogous field of endeavor, Boecking does as follows. Boecking teaches wherein the multimodal fusion model provides a set of text tokens for the received image and a similarity metric for each tile for the set of text tokens, the output representing the similarity metric for each tile (Boecking, p.6-7, Section 2.2, Section 4.1, Fig. 3; “For each input image…we use the image encoder and projection module to obtain patch embeddings…for segmentation tasks… Probabilities for classes/regions can then be computed via a softmax over the cosine similarities between the image (or region) and prompt representations.”). Boecking is considered analogous art because it pertains to biomedical vision-language data processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system taught by Huang to include outputting a heatmap representation of the similarity between the text embeddings and the image patch embeddings, as taught by Boecking, in order to achieve more accurate local alignment and visualization of corresponding text phrases to image regions (Boecking, p.7-8, Section 3).

Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Huang, as applied to claims 1 and 8 above, in view of “Self-distillation Augmented Masked Autoencoders for Histopathological Image Understanding” (hereinafter “Luo”; published 2023).

Regarding claim 6, claim 3 is incorporated, and Huang does not expressly teach the limitations as further claimed, but, in an analogous field of endeavor, Luo does as follows. Luo teaches wherein the first encoder is trained on a plurality of pathology images via a self-supervising learning algorithm using an objective function including a self-distillation loss and a masked image modeling loss (Luo, Section II.A-C, equation (4); the total loss includes a self-distillation loss and an MSE loss on masked patches). Luo is considered analogous art because it pertains to biomedical image analysis using machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system taught by Huang to pretrain the image encoder using a loss function incorporating both a self-distillation loss and an MSE loss based on masked image patches, as taught by Luo, in order to obtain better self-supervised-based feature learning in the pretraining process (Luo, Section II.C).

Regarding claim 13, claim 9 is incorporated, and Huang further teaches wherein the one of the input image and the search query is the input image, and generating the first set of tokens comprises providing the input image to a vision encoder trained on a plurality of pathology images (Huang, Section 3, p.3944-3947, Fig. 2; “Given a pair of medical image and report, we first use the image encoder and text encoder to extract image and text features respectively. The global image-text representations are learned through the global contrastive loss. For learning local representations, we compute the similarity matrix based on the image sub-region features and word-level features to generate attention-weighted image representations. The local contrastive objective is based on the attention-weighted image representations and the corresponding word representations. The overall representation learning framework is trained end-to-end by jointly optimizing both local and global contrastive losses.” The training dataset comprises a plurality of pairs of images and associated reports containing pathological findings in the medical imaging examinations, and the resultant trained model is used for zero-shot classification of an input image). Huang does not expressly teach the limitations as further claimed, but, in an analogous field of endeavor, Luo does as follows.
Luo teaches a vision encoder trained on a plurality of pathology images via a self-supervising learning algorithm using an objective function including a self-distillation loss and a masked image modeling loss (Luo, Section II.A-C, equation (4); the total loss includes a self-distillation loss and an MSE loss on masked patches). Luo is considered analogous art because it pertains to biomedical image analysis using machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method taught by Huang to pretrain the image encoder using a loss function incorporating both a self-distillation loss and an MSE loss based on masked image patches, as taught by Luo, in order to obtain better self-supervised-based feature learning in the pretraining process (Luo, Section II.C).

Claims 12 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Huang, as applied to claim 8 above, in view of “BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs” (hereinafter “Zhang”; published March 2023).

Regarding claim 12, claim 9 is incorporated, and Huang does not expressly teach the limitations as claimed, but, in an analogous field of endeavor, Zhang does as follows. Zhang teaches wherein the one of the input representing the pathology and the search query is the search query, and the provided output is an image that is responsive to the search query (Zhang, p.3-4 (BiomedCLIP enables accurate cross-modal retrieval), p.7 (Cross-modal retrieval), Fig. 1 (p.15), Fig. 2B (p.16); “We first evaluate the task of cross-modal retrieval which aims to retrieve the corresponding image from the caption (text-to-image retrieval)” The pretrained model is used in the testing phase to identify the image having the highest similarity to the input text search query.). Zhang is considered analogous art because it pertains to medical image analysis using machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method taught by Huang by training the model to accept an input text caption and perform a text-to-image retrieval task based on the input text, as taught by Zhang, in order to enable efficient cross-modal medical information retrieval (Zhang, p.3-7).

Regarding claim 14, claim 9 is incorporated, and Zhang teaches wherein the one of the input image and the search query is the search query (Zhang, p.3-4 (BiomedCLIP enables accurate cross-modal retrieval), p.7 (Cross-modal retrieval), Fig. 1 (p.15), Fig. 2B (p.16); “We first evaluate the task of cross-modal retrieval which aims to retrieve the corresponding image from the caption (text-to-image retrieval)” The pretrained model is used in the testing phase to identify the image having the highest similarity to the input text search query.), and matching the first set of tokens to the second set of tokens at the multimodal fusion model comprises computing a similarity metric between the set of text tokens and a plurality of sets of visual tokens associated with the multimodal fusion model and matching the set of text tokens with each set of visual tokens for which the similarity metric meets a threshold value (Zhang, p.7, Cross-modal retrieval; “Specifically, we use the pretrained vision encoder and text encoder from the CLIP models to precompute embeddings of figures and captions respectively. Given a figure embedding, we compute its cosine similarities with all captions in the test set and retrieve the k most similar captions. Our evaluation metric measures if the original caption for the figure is within the k retrieved captions, i.e., recall at top-k or R@k. Similarly, we evaluate Recall@k for text-to-image cross-modal retrieval”).
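As background on the retrieval evaluation quoted from Zhang: cosine-similarity retrieval over precomputed embeddings and the Recall@k metric can be sketched as below. This is an illustrative NumPy sketch, not Zhang's code; it assumes query i's true match sits at gallery index i, and all names are ours.

```python
import numpy as np

def recall_at_k(query_emb, gallery_emb, k=1):
    """Cross-modal Recall@k, with query i's true match at gallery index i.

    Embeddings are L2-normalized so dot products are cosine similarities,
    mirroring the precomputed-embedding retrieval the rejection quotes.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T                            # (N, N) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]   # k most similar gallery items per query
    hits = (topk == np.arange(len(q))[:, None]).any(axis=1)
    return float(hits.mean())
```

The same function covers both directions (text-to-image and image-to-text) by swapping which embedding set is the query and which is the gallery.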
Zhang is considered analogous art because it pertains to medical image analysis using machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method taught by Huang to include training the model to accept an input text caption and perform a text-to-image retrieval task based on a comparison of the text embeddings corresponding to the input caption and image embeddings learned by the model, as taught by Zhang, in order to enable efficient cross-modal medical information retrieval (Zhang, p.3-7).

Allowable Subject Matter

Claims 4 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter:

Regarding claim 4, none of the cited prior art of record, either alone or in combination, expressly teaches “wherein the multimodal fusion model provides a set of text tokens for each of the plurality of tiles, the user interface displaying an output representing the sets of text token for each of the plurality of tiles”. In particular, while Huang and Boecking above both teach calculating image patch-level embeddings for an input image and comparing them to text embeddings of a text phrase, neither expressly teaches or suggests providing a set of text embeddings for each of the plurality of tiles of the divided input image and displaying an output representing the set of text tokens for each of the plurality of tiles, as claimed.
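The tile-level matching at issue here (patch or tile embeddings compared to a text-prompt embedding via a softmax over cosine similarities, as in the Boecking passage quoted for claim 5) can be sketched as below. This is an illustrative reconstruction of that general technique, not code from any cited reference; all names are ours.

```python
import numpy as np

def tile_similarity_map(tile_emb, text_emb, grid_shape):
    """Cosine similarity of each image tile to a single text embedding,
    softmaxed over tiles and reshaped to the tile grid (a crude heatmap).

    tile_emb: (T, D) embeddings for T tiles; text_emb: (D,) prompt embedding;
    grid_shape: (rows, cols) with rows * cols == T.
    """
    t = tile_emb / np.linalg.norm(tile_emb, axis=1, keepdims=True)
    w = text_emb / np.linalg.norm(text_emb)
    sims = t @ w                          # (T,) cosine similarities in [-1, 1]
    e = np.exp(sims - sims.max())         # numerically stable softmax over tiles
    return (e / e.sum()).reshape(grid_shape)
```

The resulting grid sums to 1, so the hottest cell marks the tile most aligned with the prompt; the allowable claims go further than this sketch by attaching a set of text tokens to each tile rather than one score.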
Regarding claim 15, none of the cited prior art of record, either alone or in combination, expressly teaches “wherein matching the first set of tokens to the second set of tokens at the multimodal fusion model comprises matching the set of visual tokens for each of the plurality of tiles with a corresponding set of text tokens, the output being provided according to the set of text tokens for each of the plurality of tiles.” In particular, while Huang and Boecking above both teach dividing an input image into patches/tiles, calculating patch/tile embeddings using an image encoder, and comparing the patch embeddings to text embeddings of a text phrase, neither expressly teaches or suggests matching each of the patch embeddings to a corresponding set of text embeddings and displaying an output according to the set of text tokens for each of the plurality of tiles, as claimed.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMAH A BEG whose telephone number is (571) 270-7912. The examiner can normally be reached M-F, 9 AM - 5 PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, HENOK SHIFERAW, can be reached on 571-272-4637. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SAMAH A BEG/
Primary Examiner, Art Unit 2676
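As background on the Luo reference applied against claims 6 and 13, a combined self-distillation plus masked-image-modeling objective can be sketched as below. This is a hedged reconstruction of the general technique, not Luo's actual equation (4); the function names, the loss weighting `lam`, and the temperature `temp` are all our assumptions.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def combined_pretraining_loss(student_logits, teacher_logits,
                              pred_patches, true_patches, mask,
                              lam=1.0, temp=0.1):
    """Illustrative total loss: self-distillation (student cross-entropy
    against a softened teacher distribution) plus an MSE masked-image-
    modeling term computed only over masked patch positions.

    student_logits, teacher_logits: (N, C) projection-head outputs;
    pred_patches, true_patches: (N, P, D) patch reconstructions/targets;
    mask: (N, P) with 1 marking a masked patch.
    """
    # Self-distillation: teacher softmax serves as the soft target.
    teacher_p = np.exp(log_softmax(teacher_logits / temp))
    distill = -np.mean((teacher_p * log_softmax(student_logits / temp)).sum(-1))
    # Masked image modeling: MSE restricted to masked patches.
    per_patch_mse = ((pred_patches - true_patches) ** 2).mean(axis=-1)
    mim = (per_patch_mse * mask).sum() / np.maximum(mask.sum(), 1.0)
    return distill + lam * mim
```

With perfect reconstructions the MIM term vanishes and only the distillation term (the softened teacher's entropy, when student and teacher agree) remains, which matches the intuition that the two terms pull on different parts of the encoder.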

Prosecution Timeline

Dec 16, 2024: Application Filed
Apr 18, 2025: Non-Final Rejection — §102, §103
Aug 25, 2025: Response Filed (Response after Non-Final Action)
Nov 24, 2025: Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12599284: ENDOSCOPIC EXAMINATION SUPPORT APPARATUS, ENDOSCOPIC EXAMINATION SUPPORT METHOD, AND RECORDING MEDIUM (granted Apr 14, 2026; 2y 5m to grant)
Patent 12597142: SYSTEMS AND METHODS FOR PREPROCESSING IMMUNOCYTOCHEMISTRY IMAGES FOR MACHINE LEARNING IMAGE-TO-IMAGE TRANSLATION (granted Apr 07, 2026; 2y 5m to grant)
Patent 12573015: METHOD FOR CAPTURING IMAGE MATERIAL FOR MONITORING IMAGE-ANALYSING SYSTEMS, DEVICE AND VEHICLE FOR USE IN THE METHOD AND COMPUTER PROGRAM (granted Mar 10, 2026; 2y 5m to grant)
Patent 12561806: COMPUTE SYSTEM WITH EXPLAINABLE AI FOR SKIN LESIONS ANALYSIS MECHANISM AND METHOD OF OPERATION THEREOF (granted Feb 24, 2026; 2y 5m to grant)
Patent 12536618: ARTIFICIAL-INTELLIGENCE-BASED IMAGE PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT (granted Jan 27, 2026; 2y 5m to grant)

Study what changed to get past this examiner; based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 78%
With Interview: 99% (+29.9%)
Median Time to Grant: 2y 4m
PTA Risk: Moderate
Based on 317 resolved cases by this examiner. Grant probability derived from career allow rate.
