Last updated: May 29, 2026
Application No. 18/463,585
ATTENTION-BASED MULTIPLE INSTANCE LEARNING FOR WHOLE SLIDE IMAGES

Final Rejection §102
Filed
Sep 08, 2023
Priority
Mar 12, 2021 — provisional 63/160,493 +1 more
Examiner
TRAN, DUY ANH
Art Unit
2674
Tech Center
2600 — Communications
Assignee
Genentech Inc.
OA Round
2 (Final)
Interview Optional

— +18.4% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 80% grant rate with +18.4% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 133 resolved cases, 2023–2026
Examiner Intelligence

TRAN, DUY ANH View full profile →
Grants 80% — above average
Career Allowance Rate
107 granted / 133 resolved
+18.5% vs TC avg
Strong +18% interview lift
Without
With
+18.4%
Interview Lift
resolved cases with interview
Typical timeline
2y 10m
Avg Prosecution
21 currently pending
Career history
162
Total Applications
across all art units
Statute-Specific Performance

§101
1.0%
-39.0% vs TC avg
§103
81.7%
+41.7% vs TC avg
§102
12.2%
-27.8% vs TC avg
§112
3.4%
-36.6% vs TC avg
Black line = Tech Center average estimate • Based on career data from 133 resolved cases
Office Action

§102
DETAILED ACTION
This Action is in response to Applicant’s response filed on 12/29/2025.  Claims 1-20 are still pending in the present application.  This Action is made FINAL.

Response to Arguments
Applicant's arguments filed on 12/29/2025 have been fully considered but are moot in view of the new ground(s) rejection in view of Ming Y. Lu et al (“Data Efficient and Weakly Supervised Computational Pathology on Whole Slide Images”; Ming).

Claim Status
Claim(s) 1-20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ming Y. Lu et al (“Data Efficient and Weakly Supervised Computational Pathology on Whole Slide Images”; Ming).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ming Y. Lu et al (“Data Efficient and Weakly Supervised Computational Pathology on Whole Slide Images”; Ming).

Regarding claim 1, Ming discloses a computer-implemented method (Abstract: “CLAM- Clustering-constrained Attention Multiple instance learning, an easy-to-use, high throughput and interpretable WSI-level processing and learning method that only requires slide-level labels while being data efficient, adaptable and capable of handling multi-class subtyping problems. … CLAM is a flexible, general purpose, and adaptable method that can be used for a variety of different computational pathology tasks in both clinical and research settings”; Computational Hardware and Software) comprising:
receiving a whole slide image (Fig.1a – Whole slide images); segmenting the whole slide image into a plurality of tiles; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “a Following segmentation, image patches are extracted from the tissue regions of the WSI.” ; Page 6: “Our pipeline first automatically segments the tissue region of each slide and divides it into many smaller patches (e.g. 256 × 256 pixels) so they can serve as direct inputs to a convolutional neural network (CNN) (Figure 1a).”)
generating a feature vector for each of the tiles, wherein the feature vector for each of the tiles represents an embedding for the tile; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “b Patches are encoded once by a pretrained CNN into a descriptive feature representation. During training and inference, extracted patches in each WSI are passed to a CLAM model as feature vectors.”; Page 6: “Next, using a pretrained CNN for feature extraction, we convert all tissue patches into sets of low-dimensional feature embeddings (Figure 1b). Following this feature extraction, both training and inference can occur in the low-dimensional feature space instead of the high-dimensional pixel space.”)
computing a weighting value corresponding to each of the feature vectors using an attention network; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “c For each class, the attention network ranks each region in the slide and assigns an attention score based on its relative importance to the slide-level diagnosis”; 
computing an image embedding comprising at least a first numerical component and a second numerical component based on the feature vectors, wherein each of the feature vectors is weighted based on the weighting value corresponding to the feature vector; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “Attention-pooling weighs patches by their respective attention scores and summarizes patch-level features into slide-level representations, which are used to make the final diagnostic prediction. The strongly attended and weakly attended regions are used as representative samples to train clustering layers that learn a rich patch-level feature space separable between the positive and negative evidence of distinct classes. … The attention scores can be visualized as a heatmap to identify ROIs and interpret the important morphology used for diagnosis. ”;  Page 4: “the model examines and ranks all patches within the tissue regions of a WSI, assigning an attention score for each patch, which informs its contribution or importance to the collective, slide-level representation for a specific class (Figure 1). This interpretation of the attention score is reflected in the slide-level aggregation rule of attention-based pooling, which computes the slide-level representation as the average of all patches in the slide weighted by their respective attention score.”, it shows that “the plurality of attention scores of the weighs patches” is interpreted as “a first numerical component and a second numerical component based on the feature vectors”)  and 
generating a classification for the whole slide image based on the image embedding. (Fig.1 and Page 4: “CLAM is a deep-learning-based weakly-supervised method that uses attention-based learning to automatically identify sub-regions of high diagnostic value in order to accurately classify the whole slide, while also utilizing instance-level clustering over the representative regions identified to constrain and refine the feature space. … A CLAM model has n parallel attention branches that together calculate n unique slide-level representations, where each representation is determined from a different set of highly-attended regions in the image viewed by the network as strong positive evidence for the one of n classes in a multi-class diagnostic task (Figure 1-b, c). Each class-specific slide representation is then examined by a classification layer to obtain the final probability score predictions for the whole slide.”)

Regarding claim 2, Ming discloses further comprising: generating a heatmap corresponding to the whole slide image, wherein the heatmap comprises a plurality of regions associated with a plurality of intensity values, respectively, (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. : “d The attention scores can be visualized as a heatmap to identify ROIs and interpret the important morphology used for diagnosis.”) wherein one or more regions of the plurality of regions is associated with an indication of a condition in the whole slide image, and wherein the respective intensity value associated with the one or more regions correlates to a statistical confidence of the indication. (Interpretability and Whole Slide Attention Visualization. : “In order to visualize and interpret the relative importance of each region in the WSI, we can generate an attention heatmap by converting the attention scores for the model’s predicted class into percentiles and mapping the normalized scores to their corresponding spatial location in the original slide. …. through weakly-supervised learning using slide-level labels only, trained CLAM models are generally capable of identifying the boundary between tumor and normal tissue (Figure 4 a-c, see interactive demo at http://clam.mahmoodlab.org for high resolution heatmaps).”)

Regarding claim 3, Ming discloses wherein the classification for the whole slide image indicates the presence of one or more biological abnormalities in tissue depicted in the whole slide image, the one or more biological abnormalities comprising hypertrophy, Kupffer cell abnormalities, necrosis, inflammation, glycogen abnormalities, lipid abnormalities, peritonitis, anisokaryosis, cellular infiltration, karyomegaly, microgranuloma, hyperplasia, or vacuolation. (Page 6: “we demonstrate the data efficiency, adaptability and interpretability of CLAM on three different computational pathology problems: (A) renal cell carcinoma (RCC) subtyping (B) Non-small cell lung cancer (NSCLC) subtyping (C) Breast cancer lymph node metastasis detection. We additionally show that CLAM models trained on WSIs are adaptable to cell phone microscopy images and biopsy slides”; Fig.2 and Result: “the high performance (> 0.95 AUC) on all three tasks indicate that our method can be effectively applied to solve both conventional positive vs. negative cancer detection binary classification and more general multi-class cancer subtyping problems across a variety of tissue types.”)

Regarding claim 4, Ming discloses wherein the classification for the whole slide image includes an evaluation of a toxic event associated with tissue depicted in the whole slide image.  (Fig.2 and Result: “the high performance (> 0.95 AUC) on all three tasks indicate that our method can be effectively applied to solve both conventional positive vs. negative cancer detection binary classification and more general multi-class cancer subtyping problems across a variety of tissue types.”)

Regarding claim 5, Ming discloses further comprising: generating a respective classification for the whole slide image based on each attention network of a plurality of attention networks. (Page 4 : “assigning an attention score for each patch, which informs its contribution or importance to the collective, slide-level representation for a specific class (Figure 1). This interpretation of the attention score is reflected in the slide-level aggregation rule of attention-based pooling, which computes the slide-level representation as the average of all patches in the slide weighted by their respective attention score. … A CLAM model has n parallel attention branches that together calculate n unique slide-level representations, where each representation is determined from a different set of highly-attended regions in the image viewed by the network as strong positive evidence for the one of n classes in a multi-class diagnostic task”; Fig.1)

Regarding claim 6, Ming discloses further comprising generating annotations for the whole slide image based on the weighting values by: identifying one or more weighting values satisfying a predetermined criteria; identifying one or more feature vectors corresponding to the identified weighting values; and identifying one or more tiles corresponding to the identified feature vectors.  (CLAM :High-throughput, interpretable, weakly-supervised and data-efficient whole slide analysis: “At a high level, during both training and inference, the model examines and ranks all patches within the tissue regions of a WSI, assigning an attention score for each patch, which informs its contribution or importance to the collective, slide-level representation for a specific class (Figure 1). This interpretation of the attention score is reflected in the slide-level aggregation rule of attention-based pooling, which computes the slide-level representation as the average of all patches in the slide weighted by their respective attention score.”; Interpretability and Whole Slide Attention Visualization: “In order to visualize and interpret the relative importance of each region in the WSI, we can generate an attention heatmap by converting the attention scores for the model’s predicted class into percentiles and mapping the normalized scores to their corresponding spatial location in the original slide. Fine-grained attention heatmaps can be created by using overlapping patches (e.g. 95% overlap) and averaging the attention scores in the overlapped regions”)

Regarding claim 7, Ming discloses further comprising providing the annotations for the whole slide image for display in association with the whole slide image, wherein providing the annotations comprises marking the one or more identified tiles. (Fig.4 and  Page 6: “use of the slide-level ground truth label and the attention scores predicted by the network to generate pseudo-labels for both highly attended and weakly attended patches as a novel means to increase the supervisory signals for learning a rich, separable, patch-level feature space.”; Interpretability and Whole Slide Attention Visualization: “through weakly-supervised learning using slide-level labels only, trained CLAM models are generally capable of identifying the boundary between tumor and normal tissue (Figure 4 a-c, see interactive demo at http://clam.mahmoodlab.org for high resolution heatmaps).”)

Regarding claim 8, Ming discloses further comprising: providing the classification for the whole slide image to a pathologist for verification. (Interpretability and Whole Slide Attention Visualization : “Human-readable interpretability of the trained weakly-supervised deep learning classifier can not only serve to validate that the predictive basis of the model aligns with well-known morphology used by pathologists for clinical diagnosis, but also has the potential to elucidate new morphological features of diagnostic relevance. A CLAM model makes its slide-level prediction by first identifying and aggregating regions in the WSI that are of high diagnostic importance (high attention score) while ignoring regions of low diagnostic relevance (low attention score).”)

Regarding claim 9, Ming discloses, further comprising: calculating a confidence score associated with the classification for the whole slide image based on at least the weighting values; and providing the confidence score for display in association with the classification for the whole slide image. (CLAM: High-throughput, interpretable, weakly-supervised and data-efficient whole slide analysis : “interpretation of the attention score is reflected in the slide-level aggregation rule of attention-based pooling, which computes the slide-level representation as the average of all patches in the slide weighted by their respective attention score … A CLAM model has n parallel attention branches that together calculate n unique slide-level representations, where each representation is determined from a different set of highly-attended regions in the image viewed by the network as strong positive evidence for the one of n classes in a multi-class diagnostic task (Figure 1-b, c). Each class-specific slide representation is then examined by a classification layer to obtain the final probability score predictions for the whole slide”)

Regarding claim 10, Ming discloses further comprising: identifying, based on the feature vectors, weighting values, and slide embedding feature value, one or more derivative characteristics associated with the classification for the whole slide image.  (Discussion: “In three separate analyses, we showed that our models can identify well-known morphological features and accordingly, has the capability of identifying new morphological features of diagnostic, prognostic, and therapeutic relevance … These heatmaps may be used as an interpretability tool in research applications to identify new morphological features associated with response and resistance to treatment or used as a visualization tool for secondary opinion in anatomic pathology”)

Regarding claim 11, Ming discloses further comprising: generating a plurality of classifications for a plurality of whole slide images, respectively; and training one or more attention networks to predict weighting values associated with one or more conditions, respectively, using the plurality of classifications. (Page 18: “the attention network predicts n distinct sets of attention scores corresponding to the n classes in a multi-class classification problem. This enables the network to unambiguously learn for each class, which morphological features should be considered as positive evidence (characteristic of the class) vs. negative evidence (non-informative) and summarize n unique slide-level representations”; Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability “The strongly attended and weakly attended regions are used as representative samples to train clustering layers that learn a rich patch-level feature space separable between the positive and negative evidence of distinct classes.”)

Regarding claim 12, Ming discloses wherein the classification indicates the whole slide image depicts one or more abnormalities associated with the tissue depicted in the whole slide image.  (Fig.2 and Result: “the high performance (> 0.95 AUC) on all three tasks indicate that our method can be effectively applied to solve both conventional positive vs. negative cancer detection binary classification and more general multi-class cancer subtyping problems across a variety of tissue types.”)

Regarding claim 13, Ming discloses wherein the whole slide image is received from a user device and the method includes providing the classification for the whole slide image to the user device for display.  (Figs. 5-6; Page 6: “We additionally show that CLAM models trained on WSIs are adaptable to cell phone microscopy images and biopsy slides.”; Adapting networks trained on whole slide images to cellphone microscopy images : “We additionally explored the ability of our models (which are trained exclusively on WSIs) to directly adapt to microscopy images captured using a cellphone camera. … a robust model trained on WSIs that is capable of directly adapting to cellphone images (CPIs) and deliver accurate automated diagnosis is therefore of tremendous value to the wider adoption of telepathology.”)

Regarding claim 14, Ming discloses wherein the whole slide image is received from a digital pathology image generation system communicatively coupled with a digital pathology image processing system that performs the method.  (Figs. 5-6; Page 6: “We additionally show that CLAM models trained on WSIs are adaptable to cell phone microscopy images and biopsy slides.”; Adapting networks trained on whole slide images to cellphone microscopy images : “We additionally explored the ability of our models (which are trained exclusively on WSIs) to directly adapt to microscopy images captured using a cellphone camera. … a robust model trained on WSIs that is capable of directly adapting to cellphone images (CPIs) and deliver accurate automated diagnosis is therefore of tremendous value to the wider adoption of telepathology.”)

Regarding claim 15, Ming discloses A digital pathology image processing system comprising: one or more processors; and one or more computer-readable non-transitory storage media coupled to one or more of the processors and comprising instructions operable when executed by one or more of the processors to cause the system to perform operations (Computational Hardware and Software: “We used multiple hard drives to store the raw files of digitized whole slides. Segmentation and patching of WSIs are performed on Intel Xeon CPUs (Central Processing Units) and feature extraction using a pretrained neural network model is accelerated through data batch parallelization across multiple NVIDIA P100 GPUs (Graphics Processing Units) … For loading data and training deep learning models using CLAM, we used the Pytorch (version 1.3) deep learning library.”) comprising:
receiving a whole slide image (Fig.1a – Whole slide images); segmenting the whole slide image into a plurality of tiles; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “a Following segmentation, image patches are extracted from the tissue regions of the WSI.” ; Page 6: “Our pipeline first automatically segments the tissue region of each slide and divides it into many smaller patches (e.g. 256 × 256 pixels) so they can serve as direct inputs to a convolutional neural network (CNN) (Figure 1a).”)
generating a feature vector for each of the tiles, wherein the feature vector for each of the tiles represents an embedding for the tile; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “b Patches are encoded once by a pretrained CNN into a descriptive feature representation. During training and inference, extracted patches in each WSI are passed to a CLAM model as feature vectors.”; Page 6: “Next, using a pretrained CNN for feature extraction, we convert all tissue patches into sets of low-dimensional feature embeddings (Figure 1b). Following this feature extraction, both training and inference can occur in the low-dimensional feature space instead of the high-dimensional pixel space.”)
computing a weighting value corresponding to each of the feature vectors using an attention network; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “c For each class, the attention network ranks each region in the slide and assigns an attention score based on its relative importance to the slide-level diagnosis”; 
computing an image embedding comprising at least a first numerical component and a second numerical component based on the feature vectors, wherein each of the feature vectors is weighted based on the weighting value corresponding to the feature vector; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “Attention-pooling weighs patches by their respective attention scores and summarizes patch-level features into slide-level representations, which are used to make the final diagnostic prediction. The strongly attended and weakly attended regions are used as representative samples to train clustering layers that learn a rich patch-level feature space separable between the positive and negative evidence of distinct classes. … The attention scores can be visualized as a heatmap to identify ROIs and interpret the important morphology used for diagnosis. ”;  Page 4: “the model examines and ranks all patches within the tissue regions of a WSI, assigning an attention score for each patch, which informs its contribution or importance to the collective, slide-level representation for a specific class (Figure 1). This interpretation of the attention score is reflected in the slide-level aggregation rule of attention-based pooling, which computes the slide-level representation as the average of all patches in the slide weighted by their respective attention score.”, it shows that “the plurality of attention scores of the weighs patches” is interpreted as “a first numerical component and a second numerical component based on the feature vectors”)  and 
generating a classification for the whole slide image based on the image embedding. (Fig.1 and Page 4: “CLAM is a deep-learning-based weakly-supervised method that uses attention-based learning to automatically identify sub-regions of high diagnostic value in order to accurately classify the whole slide, while also utilizing instance-level clustering over the representative regions identified to constrain and refine the feature space. … A CLAM model has n parallel attention branches that together calculate n unique slide-level representations, where each representation is determined from a different set of highly-attended regions in the image viewed by the network as strong positive evidence for the one of n classes in a multi-class diagnostic task (Figure 1-b, c). Each class-specific slide representation is then examined by a classification layer to obtain the final probability score predictions for the whole slide.”)

Regarding claim 16, Ming discloses wherein the instructions are further operable when executed by one or more of the processors to cause the system to perform operations further comprising: generating a heatmap corresponding to the whole slide image, wherein the heatmap comprises a plurality of regions associated with a plurality of intensity values, respectively, (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. : “d The attention scores can be visualized as a heatmap to identify ROIs and interpret the important morphology used for diagnosis.”) wherein one or more regions of the plurality of regions is associated with an indication of a condition in the whole slide image, and wherein the respective intensity value associated with the one or more regions correlates to a statistical confidence of the indication. (Interpretability and Whole Slide Attention Visualization. : “In order to visualize and interpret the relative importance of each region in the WSI, we can generate an attention heatmap by converting the attention scores for the model’s predicted class into percentiles and mapping the normalized scores to their corresponding spatial location in the original slide. …. through weakly-supervised learning using slide-level labels only, trained CLAM models are generally capable of identifying the boundary between tumor and normal tissue (Figure 4 a-c, see interactive demo at http://clam.mahmoodlab.org for high resolution heatmaps).”)

Regarding claim 17, Ming discloses wherein the classification for the whole slide image indicates the presence of one or more biological abnormalities in tissue depicted in the whole slide image, the one or more biological abnormalities comprising hypertrophy, Kupffer cell abnormalities, necrosis, inflammation, glycogen abnormalities, lipid abnormalities, peritonitis, anisokaryosis, cellular infiltration, karyomegaly, microgranuloma, hyperplasia, or vacuolation. (Page 6: “we demonstrate the data efficiency, adaptability and interpretability of CLAM on three different computational pathology problems: (A) renal cell carcinoma (RCC) subtyping (B) Non-small cell lung cancer (NSCLC) subtyping (C) Breast cancer lymph node metastasis detection. We additionally show that CLAM models trained on WSIs are adaptable to cell phone microscopy images and biopsy slides”; Fig.2 and Result: “the high performance (> 0.95 AUC) on all three tasks indicate that our method can be effectively applied to solve both conventional positive vs. negative cancer detection binary classification and more general multi-class cancer subtyping problems across a variety of tissue types.”)

Regarding claim 18, Ming discloses One or more computer-readable non-transitory storage media including instructions that, when executed by one or more processors, are configured to cause the one or more processors of a digital pathology image processing system(Computational Hardware and Software: “We used multiple hard drives to store the raw files of digitized whole slides. Segmentation and patching of WSIs are performed on Intel Xeon CPUs (Central Processing Units) and feature extraction using a pretrained neural network model is accelerated through data batch parallelization across multiple NVIDIA P100 GPUs (Graphics Processing Units) … For loading data and training deep learning models using CLAM, we used the Pytorch (version 1.3) deep learning library.”) to perform operations comprising:
receiving a whole slide image (Fig.1a – Whole slide images); segmenting the whole slide image into a plurality of tiles; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “a Following segmentation, image patches are extracted from the tissue regions of the WSI.” ; Page 6: “Our pipeline first automatically segments the tissue region of each slide and divides it into many smaller patches (e.g. 256 × 256 pixels) so they can serve as direct inputs to a convolutional neural network (CNN) (Figure 1a).”)
generating a feature vector for each of the tiles, wherein the feature vector for each of the tiles represents an embedding for the tile; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “b Patches are encoded once by a pretrained CNN into a descriptive feature representation. During training and inference, extracted patches in each WSI are passed to a CLAM model as feature vectors.”; Page 6: “Next, using a pretrained CNN for feature extraction, we convert all tissue patches into sets of low-dimensional feature embeddings (Figure 1b). Following this feature extraction, both training and inference can occur in the low-dimensional feature space instead of the high-dimensional pixel space.”)
computing a weighting value corresponding to each of the feature vectors using an attention network; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “c For each class, the attention network ranks each region in the slide and assigns an attention score based on its relative importance to the slide-level diagnosis”; 
computing an image embedding comprising at least a first numerical component and a second numerical component based on the feature vectors, wherein each of the feature vectors is weighted based on the weighting value corresponding to the feature vector; (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. “Attention-pooling weighs patches by their respective attention scores and summarizes patch-level features into slide-level representations, which are used to make the final diagnostic prediction. The strongly attended and weakly attended regions are used as representative samples to train clustering layers that learn a rich patch-level feature space separable between the positive and negative evidence of distinct classes. … The attention scores can be visualized as a heatmap to identify ROIs and interpret the important morphology used for diagnosis. ”;  Page 4: “the model examines and ranks all patches within the tissue regions of a WSI, assigning an attention score for each patch, which informs its contribution or importance to the collective, slide-level representation for a specific class (Figure 1). This interpretation of the attention score is reflected in the slide-level aggregation rule of attention-based pooling, which computes the slide-level representation as the average of all patches in the slide weighted by their respective attention score.”, it shows that “the plurality of attention scores of the weighs patches” is interpreted as “a first numerical component and a second numerical component based on the feature vectors”)  and 
generating a classification for the whole slide image based on the image embedding. (Fig.1 and Page 4: “CLAM is a deep-learning-based weakly-supervised method that uses attention-based learning to automatically identify sub-regions of high diagnostic value in order to accurately classify the whole slide, while also utilizing instance-level clustering over the representative regions identified to constrain and refine the feature space. … A CLAM model has n parallel attention branches that together calculate n unique slide-level representations, where each representation is determined from a different set of highly-attended regions in the image viewed by the network as strong positive evidence for the one of n classes in a multi-class diagnostic task (Figure 1-b, c). Each class-specific slide representation is then examined by a classification layer to obtain the final probability score predictions for the whole slide.”)

Regarding claim 19, Ming discloses wherein the instructions are further configured to cause the one or more processors of the digital pathology image processing system to perform operations further comprising: generating a heatmap corresponding to the whole slide image, wherein the heatmap comprises a plurality of regions associated with a plurality of intensity values, respectively, (Figure 1: Overview of the CLAM conceptual framework, architecture and interpretability. : “d The attention scores can be visualized as a heatmap to identify ROIs and interpret the important morphology used for diagnosis.”) wherein one or more regions of the plurality of regions is associated with an indication of a condition in the whole slide image, and wherein the respective intensity value associated with the one or more regions correlates to a statistical confidence of the indication. (Interpretability and Whole Slide Attention Visualization. : “In order to visualize and interpret the relative importance of each region in the WSI, we can generate an attention heatmap by converting the attention scores for the model’s predicted class into percentiles and mapping the normalized scores to their corresponding spatial location in the original slide. …. through weakly-supervised learning using slide-level labels only, trained CLAM models are generally capable of identifying the boundary between tumor and normal tissue (Figure 4 a-c, see interactive demo at http://clam.mahmoodlab.org for high resolution heatmaps).”)

Regarding claim 20, Ming discloses wherein the classification for the whole slide image indicates the presence of one or more biological abnormalities in tissue depicted in the whole slide image, the one or more biological abnormalities comprising hypertrophy, Kupffer cell abnormalities, necrosis, inflammation, glycogen abnormalities, lipid abnormalities, peritonitis, anisokaryosis, cellular infiltration, karyomegaly, microgranuloma, hyperplasia, or vacuolation. (Page 6: “we demonstrate the data efficiency, adaptability and interpretability of CLAM on three different computational pathology problems: (A) renal cell carcinoma (RCC) subtyping (B) Non-small cell lung cancer (NSCLC) subtyping (C) Breast cancer lymph node metastasis detection. We additionally show that CLAM models trained on WSIs are adaptable to cell phone microscopy images and biopsy slides”; Fig.2 and Result: “the high performance (> 0.95 AUC) on all three tasks indicate that our method can be effectively applied to solve both conventional positive vs. negative cancer detection binary classification and more general multi-class cancer subtyping problems across a variety of tissue types.”)

Relevant Prior Art Directed to State of Art
Schoenmeyer et al (U.S. 20170076442 A1), “Generating Image-Based Diagnostic Tests By Optimizing Image Analysis And Data Mining Of Co-Registered Images”, teaches about a method for generating an image-based test improves diagnostic accuracy by iteratively modifying rule sets governing image and data analysis of coregistered image tiles. Digital images of stained tissue slices are divided into tiles, and tiles from different images are coregistered. First image objects are linked to selected pixels of the tiles. First numerical data is generated by measuring the first objects. Each pixel of a heat map aggregates first numerical data from coregistered tiles. Second objects are linked to selected pixels of the heat map. Measuring the second objects generates second numerical data. The method improves how well second numerical data correlates with clinical data of the patient whose tissue is analyzed by modifying the rule sets used to generate the first and second objects and the first and second numerical data.
Courtiol et al (U.S. 20200250398 A1), “System and Methods for Image Classification”, teaches about the method includes tiling at least one region of interest of the input image into a set of tiles. For each tile, the method includes extracting a feature vector of the tile by applying a convolutional neural network, wherein a feature is a local descriptor of the tile; processing the extracted feature vectors includes computing a score of the tile from the extracted feature vector, said tile score being representative of a contribution of the tile into a classification of the input image; sorting a set of the tile scores and selecting a subset of the tile scores based on their value and/or their rank in the sorted set; and applying a classifier to the selected tile scores in order to classify the input image.
Yip et al (U.S. 20200258223 A1), “Determining Biomarkers From Histopathology Slide Images”, teaches about an imaging-based biomarker prediction system formed of a deep learning framework configured and trained to directly learn from histopathology slide images and predict the presence of biomarkers in medical images. Deep learning frameworks are configured to include different trained biomarker classifiers each configured to receive unlabeled histopathology images and provide different biomarker predictions for those images. Deep learning frameworks are provided that identify biomarkers indicating the presence of a tumor, a tumor state/condition, or information about a tumor of the tissue sample, from which a set of target immunotherapies can be determined. 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Duy A Tran whose telephone number is (571)272-4887. The examiner can normally be reached Monday-Friday 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ONEAL R MISTRY can be reached at (313)-446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DUY TRAN/Examiner, Art Unit 2674                                                                                                                                                                                                        

/ONEAL R MISTRY/Supervisory Patent Examiner, Art Unit 2674
Read full office action
Prosecution Timeline

Show 1 earlier event
Oct 03, 2025
Non-Final Rejection mailed — §102
Dec 01, 2025
Interview Requested
Dec 11, 2025
Examiner Interview Summary
Dec 29, 2025
Response Filed
Mar 10, 2026
Final Rejection mailed — §102
May 01, 2026
Interview Requested
May 14, 2026
Applicant Interview (Telephonic)
May 15, 2026
Examiner Interview Summary
Precedent Cases

Applications granted by this same examiner with similar technology

18/332,927
Patent 12632960
CIGAR TOBACCO LEAF HARVESTING MATURITY IDENTIFICATION METHOD AND SYSTEM BASED ON INTEGRATED LEARNING
2y 11m to grant Granted May 19, 2026
18/085,007
Patent 12614277
OUTPUT DEVICE, METHOD, NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM AND DISPLAY DEVICE
3y 4m to grant Granted Apr 28, 2026
18/035,858
Patent 12608979
GESTURE RECOGNITION APPARATUS AND METHOD FOR RECOGNIZING GESTURE
2y 11m to grant Granted Apr 21, 2026
18/176,497
Patent 12608797
MEDICAL IMAGE DETECTION SYSTEM, TRAINING METHOD AND MEDICAL ANALYZATION METHOD
3y 1m to grant Granted Apr 21, 2026
17/947,989
Patent 12573024
IMAGE AUGMENTATION FOR MACHINE LEARNING BASED DEFECT EXAMINATION
3y 5m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
80%
Grant Probability
99%
With Interview (+18.4%)
2y 10m (~1m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 133 resolved cases by this examiner. Grant probability derived from career allowance rate.