Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Interpretation
Claims 7 and 10 recite a listing of elements as “at least one of A, B, C ….” Note that according to the Federal Circuit’s 2004 SuperGuide v. DirecTV decision, “at least one of … and …” requires at least one instance of each and every item listed. Alternatively, “at least one of … or …” requires at least one instance of one of the items listed. Since claims 7 and 10 do not explicitly recite the conjunctive “and” or the disjunctive “or,” and since the specification supports a disjunctive “or” interpretation (see pgs. 3 and 11), for examination purposes, claims 7 and 10 will be interpreted in the disjunctive.
Claim Objections
Claims 2-14 and 17-19 are objected to because they contain informalities. Each of the dependent claims recites “A medical image processing apparatus according to claim…” which should read “The medical image processing apparatus according to claim…” in order to properly refer back to the same claim element (i.e., the medical image processing apparatus) across the claims.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites “the computational latent representation” on line 12, which lacks antecedent basis. It is unclear if the element is meant to refer to the previously recited “compositional latent representation” or if it is meant to introduce a new element. Thus, one of ordinary skill in the art would not be able to ascertain the scope of the claims. For examination purposes, the element will be interpreted as if reciting the previously recited “compositional latent representation”.
Claims 15, 16, and 20 contain limitations analogous to those of claim 1. Therefore, claims 15, 16, and 20 are rejected for the same reason as claim 1.
Claims 2-14 and 17-19 are rejected as being dependent on a rejected base claim.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 10, 12, 14-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (“A Light-weight Interpretable Model for Nuclei Detection and Weakly-supervised Segmentation”, International Workshop on Medical Optical Imaging and Virtual Microscopy Image Analysis, 2022), (hereinafter Zhang) in view of Qu et al. (“Weakly Supervised Deep Nuclei Segmentation Using Partial Points Annotation in Histopathology Images”, IEEE Transactions on Medical Imaging, 2020), (hereinafter Qu).
Regarding claim 1, Zhang teaches a medical image processing apparatus comprising:
a memory storing a plurality of training medical images, each annotated with respective weak supervision annotation information relating to at least one object represented in the training medical image, the at least one object comprising an anatomical object, a pathology, or a medical device (Zhang, “In this study, we propose a light-weight interpretable model for nuclei detection and weakly supervised segmentation. We aim to design a generative model for a single nucleus, therefore only annotations on isolated nucleus are required to learn this model, which significantly reduces the annotation cost. Inspired by the Compositional Networks (Kortylewski et al., 2020a), we developed a compositional model that first find signals of parts of a nucleus, then spatially combine the signals of parts to determine the position of nucleus.”, pg. 1, 2nd column, 2nd full paragraph, lines 1-10, “The mIF images were obtained using Vectra-3 and Vectra Polaris microscopes (Akoya BioSciences, MA, USA) from six patients with liver cancer (3), lung adenocarcinoma (1), lung small cell carcinoma (1), and melanoma (1). The whole slide consists of hundreds of fields in general, and each field has 1872 x 1404 (width x height) size with 32-bit pixel-depth. Total 6 fields, randomly selected from each patient, were used for the experiment in this study. For the nuclei detection and segmentation, DAPI stained images were used in this study among the multispectral images. The selected fields were manually checked and annotated by trained researchers and totally 18312 nuclei were annotated. Each field was divided to (256 x 256) patches which gives totally 210 patches, from which 186 patches were used for training and 24 patches were used for testing.”, pg. 5, 2nd column, 2nd full paragraph, A compositional model is trained via weak supervision using a set of annotated nuclei images as training images, the nuclei being an anatomical object. 
The microscope acquisition and image annotation processes corresponding to the training images would include a memory for storing the images.); and
processing circuitry configured to use the plurality of training medical images to train a deep learning network to perform a task, wherein the training of the deep learning network comprises training a compositional latent representation comprising a plurality of kernels (Zhang, “Compositional Network (Kortylewski et al., 2020a) explains the feature map from a convolutional layer in a generative view. Denote a feature map as F ∈ R^(H x W x D), with H and W being the spatial size and D being the channel size. The feature vector f_i at position i are assumed independently generated, and each is modeled as a mixture of von-Mises-Fisher (vMF) distributions: p(F|A, Λ) = ∏_i p(f_i|A_i, Λ), p(f_i|A_i, Λ) = ∑_k α_{i,k} p(f_i|μ_k), p(f_i|μ_k) ∝ exp(σ f_i^T μ_k), ‖f_i‖ = 1, ‖μ_k‖ = 1, where Λ = {μ_k} are kernels for vMF distribution, which can be regarded as the “mean” feature vector of each mixture component k, and A_i = {α_{i,k}} are the spatial coefficients, which learn the probability of μ_k being activated at position i.”, pg. 3, 1st column, 2nd full paragraph, lines 1-15, see deep learning network of Fig. 1, The compositional model is trained to perform nuclei detection and segmentation by training a compositional latent representation comprising von-Mises-Fisher (vMF) kernels.),
wherein the training of the compositional latent network comprises guiding the compositional latent representation towards a representation in which different ones of the kernels are representative of different objects, the different objects comprising at least one anatomical object, pathology, or medical device (Zhang, “The vMF kernels represent image patterns that frequently occur in the training data. In prior work (Kortylewski et al., 2020a;b) the kernels were shown to correspond to object parts, such as tires of a car. We observe a similar property, as the feature vectors that have high cosine similarity with the vMF kernels resemble certain nucleus parts (background, edges or interior patterns), see Section 4.1… An important property of convolutional networks is that the spatial information from the image is preserved in the feature maps. To utilize this property, the set of spatial coefficients α_{i,k} are introduced to describe the expected activation of a kernel μ_k at a position i. Thus, α_{i,k} at all positions can be intuitively thought of as a 2D template, which depicts the expected spatial activation pattern of parts in an image of a nucleus – e.g. where the edges are expected to be located in the image. Therefore, the decision process of the proposed model can be interpreted as first detecting parts, then spatially combining them into a probability about the nucleus’ presence. Note that this implements a part-based voting mechanism.”, pg. 3, 2nd column, section 3.2, 1st and 2nd paragraphs, During training, the vMF kernels are guided toward a part-based voting mechanism where different kernels learn representations of different parts of a cell, such as cell edges or interior patterns. Note that the learned representation of different parts of the cell corresponds to learning a representation of different objects.).
Zhang does not teach wherein the training of the compositional latent network comprises using the weak supervision annotation information to provide weak supervision of the training of the computational latent representation, thereby guiding the compositional latent representation towards a representation in which different ones of the kernels are representative of different objects, the different objects comprising at least one anatomical object, pathology, or medical device.
However, Qu teaches wherein the training of the compositional latent network comprises using the weak supervision annotation information to provide weak supervision of the training of the computational latent representation, thereby guiding the compositional latent representation towards a representation in which different ones of the kernels are representative of different objects, the different objects comprising at least one anatomical object, pathology, or medical device (Qu, “Nuclei segmentation is a fundamental task in histopathology image analysis. Typically, such segmentation tasks require significant effort to manually generate accurate pixel-wise annotations for fully supervised training. To alleviate such tedious and manual effort, in this paper we propose a novel weakly supervised segmentation framework based on partial points annotation, i.e., only a small portion of nuclei locations in each image are labeled.”, see abstract, lines 1-8, “Our method consists of two stages: (1) semi-supervised nuclei detection and (2) weakly supervised nuclei segmentation. The goal of the first stage is to train a detector from partial points annotation to predict the locations of all nuclei in training images. A challenge is that there is no clear background information because only part of the nuclei are labeled in an image. To obtain a good initial detector, we first design an extended Gaussian mask to supervise the training with the labeled nuclear locations and ignore most unlabeled areas.”, pg. 3656, 1st column, 1st full paragraph, lines 5-15, “The first step of our detection method aims to train an initial detector using the labeled nuclei in each image... In order to tackle this issue, we define an extended Gaussian mask M according to the labeled points: (see eq. (1))… With the extended Gaussian masks, we are able to train a regression model for nuclei detection. We replace the encoder part of U-net [42] with the convolution layers of ResNet-34 [43] (shown in Fig. 1), which is more powerful in representation ability and can be initialized with pretrained parameters. The network is trained with a mean squared loss L_mse with respect to the corresponding extended Gaussian mask: (see eq. 2)”, pgs. 3657 and 3658, section A. Initial Training with Extended Gaussian Masks, “Once we have the two types of pixel-level labels, we are able to train a deep convolutional neural network for nuclei segmentation. The network structure is the same as that in nuclei detection. It outputs two probability maps of background and nuclei, which are used to calculate two cross entropy losses with respect to the cluster label L_cluster and Voronoi label L_vor: (see eq. 5)”, pg. 3659, 2nd column, section B. Training Deep Neural Networks with Pixel-Level Labels, A deep learning network is trained via weak supervision for nuclei detection and segmentation. Weak point annotations are used to create loss functions that guide the training of cell feature representations in a deep learning network.).
Zhang teaches training a compositional network for nuclei detection and segmentation. This includes using weak annotations of isolated nuclei only in a preprocessing step to identify nucleus patches, while the vMF kernel learning, which forms the computational latent representation, is performed in an unsupervised manner (Zhang, see sections 4.1 and 4.2). Zhang does not teach training the compositional latent representation using the weak supervision annotations. Qu teaches applying weak supervision to nuclei detection and segmentation by constructing loss functions from point annotations that guide feature learning in a deep learning network (see above). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the unsupervised vMF learning of Zhang to include the weakly-supervised loss functions as taught by Qu (Qu, pgs. 3657 and 3658, section A. and pg. 3659, 2nd column, section B.). The motivation for doing so would have been to directly constrain the training of the compositional latent representation with the available weak annotations, thereby improving the reliability of feature representation learning. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Zhang with Qu to obtain the invention as specified in claim 1.
Regarding claim 2, Zhang in view of Qu teaches a medical image processing apparatus according to claim 1, wherein the kernels are von Mises-Fisher kernels (Zhang, “The feature vector f_i at position i are assumed independently generated, and each is modeled as a mixture of von-Mises-Fisher (vMF) distributions: (see eqs. (1), (2), and (3)) where Λ = {μ_k} are kernels for vMF distribution”, pg. 3, 1st column, section 3.1, lines 4-11, see Fig. 1).
Regarding claim 10, Zhang in view of Qu teaches a medical image processing apparatus according to claim 1, wherein the task comprises at least one of: segmentation, registration, image translation, regression (Zhang, “In this study, we propose a light-weight interpretable model for nuclei detection and weakly supervised segmentation…”, pg. 1, 2nd column, 2nd full paragraph, lines 1-10).
Regarding claim 12, Zhang in view of Qu teaches a medical image processing apparatus according to claim 1, wherein the processing circuitry is further configured to augment the plurality of training medical images by transforming at least some of the training medical images using at least one augmentation transformation to obtain augmented training medical images; and wherein the training of the deep learning network comprises using the training medical images and the augmented training medical images (Zhang, “A compositional model was learned for each cluster of nuclei with specific size and shape (Figure 4). To detect nuclei with various orientations, we rotated images by [-90, -60, -30, 30, 60] degrees before input, and restore the original rotation after getting output from the model.”, pg. 6, 1st column, lines 9-14).
Regarding claim 14, Zhang in view of Qu teaches a medical image processing apparatus according to claim 1, wherein the processing circuitry is further configured to: receive a target image; use the trained deep learning network to decompose the target image into a compositional latent representation comprising a plurality of kernels, each kernel having a respective activation; and use the kernels and activations to perform the task and obtain a task output (Zhang, “The vMF kernels represent image patterns that frequently occur in the training data. In prior work (Kortylewski et al., 2020a;b) the kernels were shown to correspond to object parts, such as tires of a car. We observe a similar property, as the feature vectors that have high cosine similarity with the vMF kernels resemble certain nucleus parts (background, edges or interior patterns), see Section 4.1… An important property of convolutional networks is that the spatial information from the image is preserved in the feature maps. To utilize this property, the set of spatial coefficients α_{i,k} are introduced to describe the expected activation of a kernel μ_k at a position i. Thus, α_{i,k} at all positions can be intuitively thought of as a 2D template, which depicts the expected spatial activation pattern of parts in an image of a nucleus – e.g. where the edges are expected to be located in the image. Therefore, the decision process of the proposed model can be interpreted as first detecting parts, then spatially combining them into a probability about the nucleus’ presence. Note that this implements a part-based voting mechanism.”, pg. 3, 2nd column, section 3.2, 1st and 2nd paragraphs, “The nuclei candidates obtained from Section 3.4 can also be used as segmentation masks. Since the algorithm only receives bounding box as supervision which is used to crop nucleus images, it achieves segmentation masks in a weakly-supervised way.”, pg. 5, section 3.5, see Figs. 1 and 3 for examples of activations of vMF kernels on sample images, At inference time, the compositional model processes sample images to determine activations of the vMF kernels to compute a likelihood map which is used for nuclei detection and segmentation.).
Claim 15 corresponds to claim 1, reciting a method for executing the functions according to claim 1. Zhang in view of Qu teaches a method for executing the functions according to claim 1 (Zhang, see section 4.1 Nuclei Detection, Implementation details). As indicated in the analysis of claim 1, Zhang in view of Qu teaches all the limitations according to claim 1. Therefore, claim 15 is rejected for the same reason of obviousness as claim 1.
Claim 16 corresponds to claim 1, with the addition of “receive a target image; use the trained deep learning network to decompose the target image into a compositional latent representation comprising a plurality of kernels, each kernel having a respective activation; and use the kernels and activations to perform a task and obtain a task output”.
Zhang in view of Qu teaches the addition of “receive a target image; use the trained deep learning network to decompose the target image into a compositional latent representation comprising a plurality of kernels, each kernel having a respective activation; and use the kernels and activations to perform a task and obtain a task output” (see analysis of claim 14). As indicated in the analysis of claim 1, Zhang in view of Qu teaches all the limitations according to claim 1. Therefore claim 16 is rejected for the same reason of obviousness as claim 1.
Regarding claim 17, Zhang in view of Qu teaches a medical image processing apparatus according to claim 14, wherein the task comprises segmentation, and wherein the activations are used to provide the segmentation (Zhang, “The learning of the proposed method only requires the annotation of nucleus positions and bounding boxes. As stated in Section 3.5, by utilizing the unsupervisedly learned vMF kernels and the near-convex decomposition algorithm, we can obtain nuclei instance segmentation masks.”, pg. 7, section 4.2, lines 1-5, Likelihood maps are generated based on activations of the vMF kernels processing sample images. These maps are used with a decomposition algorithm to provide the segmentation.).
Regarding claim 18, Zhang in view of Qu teaches a medical image processing apparatus according to claim 14, wherein the task comprises at least one of: segmentation, registration, image translation, regression (Zhang, “In this study, we propose a light-weight interpretable model for nuclei detection and weakly supervised segmentation…”, pg. 1, 2nd column, 2nd full paragraph, lines 1-10).
Claim 20 corresponds to claim 16, reciting a method for executing the functions according to claim 16. Zhang in view of Qu teaches a method for executing the functions according to claim 16 (Zhang, see section 4.1 Nuclei Detection and section 4.2 Weakly-supervised Nuclei Segmentation). As indicated in the analysis of claim 16, Zhang in view of Qu teaches all the limitations according to claim 16. Therefore, claim 20 is rejected for the same reason of obviousness as claim 16.
Claims 3-5 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (“A Light-weight Interpretable Model for Nuclei Detection and Weakly-supervised Segmentation”, International Workshop on Medical Optical Imaging and Virtual Microscopy Image Analysis, 2022) in view of Qu et al. (“Weakly Supervised Deep Nuclei Segmentation Using Partial Points Annotation in Histopathology Images”, IEEE Transactions on Medical Imaging, 2020) and further in view of Zhang et al. (CN 114782384 A), (hereinafter Zhang-1).
Regarding claim 3, Zhang in view of Qu teaches a medical image processing apparatus according to claim 1. Zhang in view of Qu does not teach wherein the weak supervision annotation information indicates whether at least one predetermined organ is included in the training medical image.
However, Zhang-1 teaches wherein the weak supervision annotation information indicates whether at least one predetermined organ is included in the training medical image (Zhang-1, “Therefore, the invention aims to provide a heart chamber image segmentation method and device based on a semi-supervision method, so as to solve the problem that a segmentation model needs a large number of heart CT images with labels in the training process.”, pg. 2, lines 32-35, “Step S1, acquiring a training sample set, wherein the training sample set comprises a plurality of heart ct two-dimensional images, and labels for each part of the heart and the background are included on part of the two-dimensional images… Step S3, aiming at a first half-supervised neural network architecture, taking an image with a label in the standardized training sample set and an image with a pseudo label as the input of a model, performing first-stage training on the first half-supervised neural network architecture, wherein the pseudo label of an unlabeled image is generated based on an image data mixing disturbance strategy, and the first half-supervised neural network architecture comprises a UNet and TransUNet which are connected in parallel and is used for realizing supervision on the unlabeled image according to the pseudo label of the unlabeled image corresponding to the data mixing disturbance strategy; step S3, constructing a second semi-supervised neural network architecture based on the trained UNet and the Attention 3D Unet in the step S3, taking the standardized training sample set as input, and training the second semi-supervised neural network architecture in a second stage based on a neural network cross-supervision strategy; And step S4, taking the UNet obtained in the step S3 as a final segmentation model, and executing a segmentation task by using the segmentation model.”, pg. 3, lines 5-24).
Zhang in view of Qu teaches a compositional network for weakly-supervised nuclei detection and segmentation that is trained with histopathology images with weak annotations of isolated nuclei (Zhang, pg. 5, 2nd column, section 4, 2nd paragraph, see Fig. 1). Zhang in view of Qu does not teach using weak annotations of other anatomical structures, such as the heart. Zhang-1 teaches a model for heart segmentation from CT images that is trained with images labeled with heart parts. Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the compositional network of Zhang in view of Qu to perform heart detection and segmentation by using the training images as taught by Zhang-1 (Zhang-1, pg. 3, lines 5-24). The motivation for doing so would have been to provide a weakly-supervised training method for cardiac segmentation, thereby reducing the annotation burden for cardiologists (as suggested by Zhang-1, “With the development of technology, some imaging techniques (e.g., magnetic resonance, CT imaging, ultrasound, and PET imaging) can help doctors understand cardiac physiology and pathology at a higher level. However, because the medical image data volume is huge and the medical image is quite invisible, a two-dimensional segmentation method of a full-automatic heart chamber is urgently needed, and a better medical assistance is provided for judging heart diseases.”, pg. 2, lines 9-17). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Zhang in view of Qu with Zhang-1 to obtain the invention as specified in claim 3.
Regarding claim 4, Zhang in view of Qu and further in view of Zhang-1 teaches a medical image processing apparatus according to claim 3, wherein the at least one predetermined organ comprises a heart (Zhang-1, “Step S1, acquiring a training sample set, wherein the training sample set comprises a plurality of heart ct two-dimensional images, and labels for each part of the heart and the background are included on part of the two-dimensional images”, pg. 3, lines 5-7).
Regarding claim 5, Zhang in view of Qu and further in view of Zhang-1 teaches a medical image processing apparatus according to claim 3, wherein the weak supervision annotation information for each training medical image indicates whether at least one predetermined organ sub-structure is included in the training medical image (Zhang-1, “Step S1, acquiring a training sample set, wherein the training sample set comprises a plurality of heart ct two-dimensional images, and labels for each part of the heart and the background are included on part of the two-dimensional images”, pg. 3, lines 5-7).
Regarding claim 7, Zhang in view of Qu and further in view of Zhang-1 teaches a medical image processing apparatus according to claim 3, wherein the weak supervision annotation information for each training medical image comprises at least one of: bounding information representative of a boundary of the at least one predetermined organ; bounding information representative of a boundary of a predetermined sub-structure of the at least one predetermined organ, a bounding box for the at least one predetermined organ, a bounding box for at least one predetermined sub-structure of the at least one predetermined organ (Zhang, “The learning of the proposed method only requires the annotation of nucleus positions and bounding boxes.”, pg. 7, 1st column, section 4.2, lines 1-2, The combination of Zhang in view of Qu and further in view of Zhang-1 would use weak supervision annotations including bounding boxes for the heart’s position.).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (“A Light-weight Interpretable Model for Nuclei Detection and Weakly-supervised Segmentation”, International Workshop on Medical Optical Imaging and Virtual Microscopy Image Analysis, 2022) in view of Qu et al. (“Weakly Supervised Deep Nuclei Segmentation Using Partial Points Annotation in Histopathology Images”, IEEE Transactions on Medical Imaging, 2020) and further in view of Zhang et al. (CN 114782384 A), (hereinafter Zhang-1), and Zheng et al. (US 20130216110 A1), (hereinafter Zheng).
Regarding claim 6, Zhang in view of Qu and further in view of Zhang-1 teaches a medical image processing apparatus according to claim 3. Zhang in view of Qu and further in view of Zhang-1 does not teach wherein the weak supervision annotation information for each training medical image comprises at least one of: a volume of the at least one predetermined organ, a volume of a predetermined sub-structure of the at least one predetermined organ.
However, Zheng teaches wherein the weak supervision annotation information for each training medical image comprises at least one of: a volume of the at least one predetermined organ, a volume of a predetermined sub-structure of the at least one predetermined organ (Zheng, “In one embodiment of the present invention, heart chambers are segmented in a 3D Volume.”, pg. 1, paragraph 0005, lines 1-2, “After extracting the consistent part of all major coronary arteries (LM, LAD, LCX, and RCA) from a set of training Volumes, the extracted partial coronary artery models can be added to four-chamber heart model extracted from the training volumes.”, pg. 2, paragraph 0024, lines 1-5, A model is trained to produce segmentation with respect to 3D volumes of the heart. This includes learning a mean shape model from training volumes of the heart for segmentation predictions.).
Zhang in view of Qu and further in view of Zhang-1 teaches a compositional network for heart detection and segmentation, which includes training the network using weak annotations comprising 2D boundaries of a heart (Zhang-1, pg. 3, lines 5-24). Zhang in view of Qu and further in view of Zhang-1 does not teach training the network using weak annotations comprising a 3D volume of the heart or a 3D volume of sub-structures of the heart. Zheng teaches training a model using 3D training volumes of the heart to learn a shape model for detection and segmentation (see above). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the weak annotations of Zhang in view of Qu and further in view of Zhang-1 to include a volume of the heart as taught by Zheng (Zheng, pg. 2, paragraph 0024, lines 1-5). The motivation for doing so would have been to adapt the compositional network for 3D data processing, thereby improving the robustness and flexibility of the model in response to different imaging modalities. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Zhang in view of Qu and further in view of Zhang-1 with Zheng to obtain the invention as specified in claim 6.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (“A Light-weight Interpretable Model for Nuclei Detection and Weakly-supervised Segmentation”, International Workshop on Medical Optical Imaging and Virtual Microscopy Image Analysis, 2022) in view of Qu et al. (“Weakly Supervised Deep Nuclei Segmentation Using Partial Points Annotation in Histopathology Images”, IEEE Transactions on Medical Imaging, 2020) and further in view of Qin et al. (CN 115482221 A), (hereinafter Qin).
Regarding claim 8, Zhang in view of Qu teaches a medical image processing apparatus according to claim 1. Zhang in view of Qu does not teach wherein the weak supervision annotation further comprises information relating to at least one pathology.
However, Qin teaches wherein the weak supervision annotation further comprises information relating to at least one pathology (Qin, “Since category labels are the least labor-intensive and the easiest to obtain, this application only focuses on using image-level category labels to train a weakly supervised semantic segmentation labeling model for pathological images.”, pg. 3, lines 5-8, “Step 1. Collect pathological image datasets and label them. For example, a dataset of digital pathology images stained by hematoxylin-eosin is collected and labeled with image-level categories.”, pg. 7, lines 12-15, “Step 4, use the trained semantic segmentation labeling model to segment and predict the target image. After the training is completed, the model parameters that meet the overall loss value standard, such as weights and biases, can be obtained. Further, the performance of the weakly supervised semantic segmentation labeling model obtained on the test data set can be verified. In practical applications, using the trained semantic segmentation labeling model to segment and recognize target images, tumors, inflamed or necrotic areas in pathological tissue images can be obtained, which can be used to assist doctors in diagnosis or pathological research. The invention only uses the image-level category labels of pathological images to train the pixel-level semantic segmentation labeling model, which improves the accuracy of weakly supervised semantic segmentation labeling of pathological images, reduces the training steps of weakly supervised semantic segmentation labeling, and can significantly reduce clinical and pathological quantification. The workload of human labeling in the analysis.”, pg. 13, lines 6-21).
Zhang in view of Qu teaches a compositional network for nuclei detection and segmentation, which includes training the network using weak annotations of isolated nuclei (Zhang, pg. 5, 2nd column, section 4, 2nd paragraph, see Fig. 1). Zhang in view of Qu does not teach training the network using weak annotations comprising information related to a pathology. Qin teaches training a model using weak image-level labels to perform segmentation of pathological regions (see above). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the weak annotations of Zhang in view of Qu to include image-level labels for pathology segmentation as taught by Qin (Qin, pg. 7, lines 12-15 and pg. 13, lines 6-21). The motivation for doing so would have been to simultaneously identify and segment pathological regions along with nuclei, thereby improving the interpretability of the results for the physician. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Zhang in view of Qu with Qin to obtain the invention as specified in claim 8.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (“A Light-weight Interpretable Model for Nuclei Detection and Weakly-supervised Segmentation”, International Workshop on Medical Optical Imaging and Virtual Microscopy Image Analysis, 2022) in view of Qu et al. (“Weakly Supervised Deep Nuclei Segmentation Using Partial Points Annotation in Histopathology Images”, IEEE Transactions on Medical Imaging, 2020) and further in view of C. et al. (US 20220301163 A1), (hereinafter C).
Regarding claim 9, Zhang in view of Qu teaches a medical image processing apparatus according to claim 1. Zhang in view of Qu does not teach wherein the weak supervision annotation further comprises information relating to at least one medical device.
However, C teaches wherein the weak supervision annotation further comprises information relating to at least one medical device (C, “FIG. 4 is a schematic diagram depicting a method 400 for generating training images for the DL network model in accordance with another embodiment of the present technique. The method 400 may be implemented in training images generator 206 of FIG. 2A. The method 400 includes obtaining a template metal volume at step 402. The template metal volume comprises template source images that include representative examples of metal implants in a patient knee as shown in knee images 404. The template metal volume 402 is obtained from a database of historical images of patients having metal implants such as screws located in their knees, for example. At step 406, the method includes segmenting metal regions (i.e. , segmentation masks) from the template metal volume 402. For example, based on the geometry of the screw, a segmentation algorithm such as a 'connected pixel algorithm' may be used to segment or detect the metal region in the knee. In general, the segmentation algorithm determines which pixels of the images 404 belong to which objects.”, pg. 4, paragraph 0041).
Zhang in view of Qu teaches a compositional network for weakly-supervised nuclei detection and segmentation that is trained with histopathology images containing weak annotations of isolated nuclei (Zhang, pg. 5, 2nd column, section 4, 2nd paragraph, see Fig. 1). Zhang in view of Qu does not teach using weak annotations related to a medical device. C teaches generating images including a medical device, namely metal implants, to train a model for segmentation (see above). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the compositional network of Zhang in view of Qu to perform medical device detection and segmentation by replacing the histopathology images with the images containing medical devices as taught by C (C, pg. 4, paragraph 0041). The motivation for doing so would have been to provide a weakly-supervised training method for medical device segmentation, thereby reducing the annotation burden for physicians. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Zhang in view of Qu with C to obtain the invention as specified in claim 9.
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (“A Light-weight Interpretable Model for Nuclei Detection and Weakly-supervised Segmentation”, International Workshop on Medical Optical Imaging and Virtual Microscopy Image Analysis, 2022) in view of Qu et al. (“Weakly Supervised Deep Nuclei Segmentation Using Partial Points Annotation in Histopathology Images”, IEEE Transactions on Medical Imaging, 2020) and further in view of Zhang et al. (“Dive into Deep Learning”, Cambridge University Press, https://d2l.ai/chapter_computer-vision/image-augmentation.html, 2023), (hereinafter Zhang2).
Regarding claim 13, Zhang in view of Qu teaches a medical image processing apparatus according to claim 12. Zhang in view of Qu does not teach wherein the at least one augmentation transformation comprises scaling.
However, Zhang2 teaches wherein the at least one augmentation transformation comprises scaling (Zhang2, "In the example image we used, the cat is in the middle of the image, but this may not be the case in general. In Section 7.5, we explained that the pooling layer can reduce the sensitivity of a convolutional layer to the target position. In addition, we can also randomly crop the image to make objects appear in different positions in the image at different scales, which can also reduce the sensitivity of a model to the target position. In the code below, we randomly crop an area with an area of 10%~100% of the original area each time, and the ratio of width to height of this area is randomly selected from 0.5~2. Then, the width and height of the region are both scaled to 200 pixels", section 14.1.1.1., 3rd and 4th paragraphs).
Zhang in view of Qu teaches applying augmentation transformations to images for training a compositional network (Zhang, pg. 6, 1st column, lines 10-17). Zhang in view of Qu does not teach scaling training images. Zhang2 teaches applying augmentation transformations for deep learning training, including scaling images (see above). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the augmentation transformation of Zhang in view of Qu to include scaling of training images as taught by Zhang2 (Zhang2, section 14.1.1.1., 3rd and 4th paragraphs). The motivation for doing so would have been to ensure the compositional network learns features that are invariant to size changes, thereby improving the accuracy of the detection and segmentation. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Zhang in view of Qu with Zhang2 to obtain the invention as specified in claim 13.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (“A Light-weight Interpretable Model for Nuclei Detection and Weakly-supervised Segmentation”, International Workshop on Medical Optical Imaging and Virtual Microscopy Image Analysis, 2022) in view of Qu et al. (“Weakly Supervised Deep Nuclei Segmentation Using Partial Points Annotation in Histopathology Images”, IEEE Transactions on Medical Imaging, 2020) and further in view of Shen et al. (US 20230114388 A1), (hereinafter Shen).
Regarding claim 19, Zhang in view of Qu teaches a medical image processing apparatus according to claim 14. Zhang in view of Qu does not teach wherein the processing circuitry is further configured to analyse the activations to generate an explanation of the task output.
However, Shen teaches wherein the processing circuitry is further configured to analyse the activations to generate an explanation of the task output (Shen, “Activation Map Analysis (AMA) provides a system of multiple components to mature the neural net activation map-based analysis for solving the real-world scaling problems in explainable artificial intelligence (XAI). The AMA system attaches to an existing deep neural network (host network) as an observer that analyzes the internal activities of the host network to provide additional information, e.g., metrics for resilience, interpretability, and adversarial defense, to the end-user.”, pg. 2, paragraph 0026, lines 1-9, “Next, calculate the baseline non-conformity, which represents how well each element in C ( or training set r) conforms with the baseline activation maps 33. This is a multistep process. The calibration dataset 38 in fed into the host CNN 31. The activation map extraction module 32 produces the calibration activation maps 39 (CLAM)… The non-conformity set A serves as reference nonconformity pattern that the non-conformity of any other query input will be compared against to determine whether the query input is an in-distribution or out-distribution sample. If the query sample is out-distribution, then the host CNN output may not be trustworthy.”, pg. 3, paragraphs 0036-0038, Activation Map analysis is attached to a host neural network to observe internal activations of tasks. This allows the system to generate explanatory metrics for model outputs based on analyzing non-conformity of outputs to reference activations.).
Zhang in view of Qu teaches a compositional network which includes vMF kernel activations for nuclei detection and segmentation (Zhang, pg. 3, section 3.2, paragraphs 1 and 2). Zhang in view of Qu does not teach generating an explanation of the task output by analyzing the vMF kernel activations. Shen teaches performing an activation map analysis alongside a deep neural network to analyse activations and generate an explanation of the task output (see above). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the compositional network of Zhang in view of Qu to include the activation map analysis as taught by Shen (Shen, pg. 2, paragraph 0026, lines 1-9 and pg. 3, paragraphs 0036-0038). The motivation for doing so would have been to better understand the decision-making of the compositional network, thereby improving its interpretability and adaptability (as suggested by Shen, "Accordingly, before we can practically apply AI methods, there is a need for understanding how such decisions are made. The ability to explain DNN model decisions, which are particularly relevant to decision-making in key areas such as imaging and detection, is needed.", pg. 1, paragraph 0005). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Zhang in view of Qu with Shen to obtain the invention as specified in claim 19.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CONNOR LEVI HANSEN whose telephone number is (703)756-5533. The examiner can normally be reached Monday-Friday 9:00-5:00 (ET).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CONNOR L HANSEN/Examiner, Art Unit 2672
/SUMATI LEFKOWITZ/Supervisory Patent Examiner, Art Unit 2672