DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on August 22, 2025 has been entered.
Response to Arguments
The Applicant's Amendments and Arguments/Remarks filed 8/22/2025 have been considered, but the arguments are not persuasive:
Re Claim 1: Applicant states, on page 10/13 of the Remarks, that the cited references do not disclose the newly added limitation of “generate one or more notifications to indicate quality of one or more segmentations of one or more objects…based on a score generated by comparing the one or more segmentations to one or more regenerated versions of the one or more segmentations generated by one or more neural networks”.
However, the Examiner disagrees, because
as discussed in the previous Office Action, Gazit teaches “one or more regenerated versions of the one or more segmentations generated by one or more neural networks”; see: -- distribution of magnitudes of intensity gradient expected along a boundary of the organ, specific to the chosen set of organ intensity characteristics….. a) finding at least an approximate bounding region of the organ in the image, wherein the region of the image used for estimating organ intensity characteristics is the bounding region b) finding a new bounding region of the organ based on the segmentation of the organ; c) finding the organ intensity characteristics based on the new bounding region;…, and on the new bounding region; and e) segmenting the organ in the image again--, in [0022]-[0027];
so that it is clearly disclosed that the result of step e), segmenting the organ in the image again, reads on the claimed “one or more regenerated versions of the one or more segmentations generated by one or more neural networks”; and “the segmentation of the organ” in step b), finding a new bounding region of the organ based on the segmentation of the organ, reads on the claimed “the one or more segmentations”;
in other words, the claimed “the one or more segmentations” is considered the original segmentations, i.e., the segmentations of the ground truth in the training image data, and the claimed “regenerated segmentations” are the segmentations regenerated by the generative neural networks; as such,
Yang clearly discloses “generate one or more notifications to indicate quality of one or more segmentations of one or more objects…based on a score generated by comparing the one or more segmentations to one or more regenerated versions of the one or more segmentations generated by one or more neural networks” (see Yang: e.g., Fig. 6, and, -- a deep image-to-image network (DI2IN) that produces liver segmentation masks from input 3D medical images acts as the generator and is trained together with a discriminator that attempts to distinguish between ground truth liver segmentation mask training samples and liver segmentation masks generated by the DI2IN from input medical images. In an advantageous embodiment, the DI2IN employs a convolutional encoder-decoder architecture combined with multi-level feature concatenation and deep supervision. In training, the DI2IN-AN attempts to optimize a multi-class cross-entropy loss together with an adversarial term that aims to distinguish between the output of the DI2IN and the ground truth.--, in [0021], {apparently, the above “to distinguish” is to compare one or more segmentations of one or more shape features of one or more images, because “segmentation masks” are the segmentations of one or more shape features of one or more images}; and the comparison of the generated segmentation output {“segmented liver boundary 802 generated using DI2IN-AN”} to the input segmentation {the ground truth liver boundary 801} is further illustrated in Figs. 8-9, and, “Images 805 and 810 show the ground truth liver boundary 801 and segmented liver boundary 802 generated using DI2IN-AN”, in [0041]-[0042]; also see: -- a deep image-to-image network (DI2IN) for liver segmentation is pre-trained based on the training samples in a first training phase. The DI2IN is a multi-layer convolutional neural network (CNN) trained to perform liver segmentation in an input 3D medical image….
The segmentation task performed by the DI2IN 200 is defined as the voxel-wise binary classification of an input 3D medical image. As shown in FIG. 2, the DI2IN 200 takes an entire 3D CT volume 202 as input, and outputs a probability map that indicates the probability/likelihood of voxels belonging to the liver region. It is straightforward to covert such a probability map to a binary liver segmentation mask by labeling all voxels with a probability score greater than a threshold (e.g., 0.5) as positive (in the liver region) and all voxels with a probability score less than the threshold as negative (not in the liver region). The prediction 204 output by the DI2IN 200 for a given input 3D CT volume 202 can be output as a probability map or a binary liver segmentation mask.--, in [0024]
Thus, Yang's disclosed “a discriminator that attempts to distinguish between ground truth liver segmentation mask training samples and liver segmentation masks generated by the DI2IN from input medical images” determines/generates the “one or more notifications” by discriminating/distinguishing whether the segmentations are segmentations of the ground truth or segmentations of the regenerated version);
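By way of illustration only, the voxel-wise thresholding that Yang's para. [0024] describes (converting a DI2IN probability map to a binary segmentation mask) can be sketched as follows; the array values and the function name are the Examiner's hypothetical choices and are not taken from either reference:

```python
import numpy as np

def probability_map_to_mask(prob_map, threshold=0.5):
    # Label voxels with probability above the threshold as inside the
    # organ (1) and all other voxels as background (0), per Yang [0024].
    return (prob_map > threshold).astype(np.uint8)

# Toy 2x2x2 "volume" of voxel probabilities (illustrative values only).
prob = np.array([[[0.9, 0.2], [0.7, 0.4]],
                 [[0.1, 0.8], [0.3, 0.6]]])
mask = probability_map_to_mask(prob)  # binary organ/background mask
```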
Furthermore, Gazit's paras. [0022]-[0027], as quoted above, clearly disclose the “distribution of magnitudes of intensity gradient expected along a boundary of the organ,” the “intensity characteristics,” and the “estimated organ intensity characteristics,” which are applied in the comparison and in the cost function as a comparison operation, and which are representations of the boundaries of the organs, or of the bounding region of the organ, such as the boundaries of a liver; these boundaries, or this bounding region, are the “shape feature,” because they define the shape of the organ (see Gazit's [0130]),
so that Gazit's “distribution of magnitudes of intensity gradient expected along a boundary of the organ,” “intensity characteristics,” and “estimated organ intensity characteristics” read on the Application's “label masks” or “segmentation masks,” as the representation of the boundaries, i.e., the claimed limitation “one or more segmentations of one or more objects,” which are output by a computer learning algorithm based on: --“because the intensity gradient expected for the boundary of a target organ will generally be higher for images of higher contrast.” …., and based on information on a distribution of intensities, in the target organ and/or its vicinity, is obtained first from each of a set of training images of that organ, including a variety of different organ intensity characteristics--, in [0096]-[0098]; and Gazit's disclosures are consistent with Yang's disclosures as discussed and applied above;
and furthermore, Gazit's above-discussed one or more segmentations/boundaries of the training images are considered the input one or more segmentations/boundaries of objects, which are compared with the computer-generated output one or more segmentations/boundaries of objects {organs}, represented by “the intensity gradient expected for the boundary of a target organ” and “the organ intensity characteristics based on the new bounding region”; as Gazit discloses: -- the intermediate results, such as a set of organ intensity characteristics determined for an input image, or image processing parameters to be used for processing an input image, or a segmentation mask of the lungs, and the final output, such as a smoothed image, or a segmented image, or a segmentation mask of a target organ--, in [0083]-[0084];
Gazit and Yang are combinable because they are in the same field of endeavor: segmentation using neural networks, with boundary identification and comparison. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Gazit's processor using Yang's teachings, by including generating one or more notifications to indicate quality of one or more segmentations of one or more objects…based on a score generated by comparing the one or more segmentations to one or more regenerated versions of the one or more segmentations generated by one or more neural networks {such as comparing ground truth liver segmentation mask training samples and liver segmentation masks generated by the DI2IN from input medical images} in Gazit's learning machine, in order to perform and improve automated computer-based liver segmentation in 3D medical images (see Yang: e.g., [0005], [0019]-[0024], and [0041]-[0042]).
Therefore, claims 1-35 are still not patentably distinguishable over the prior art references. Further discussion is provided in the prior art rejection section below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5 and 13-35 are rejected under 35 U.S.C. 103 as being unpatentable over Gazit (US 20160300351 A1, provided in the IDS) in view of Yang (US 20180260957 A1, provided in the IDS).
Re Claim 1, Gazit discloses one or more processors, comprising:
circuitry to regenerate regenerated versions of the one or more segmentations by one or more neural networks (see: -- distribution of magnitudes of intensity gradient expected along a boundary of the organ, specific to the chosen set of organ intensity characteristics….. a) finding at least an approximate bounding region of the organ in the image, wherein the region of the image used for estimating organ intensity characteristics is the bounding region b) finding a new bounding region of the organ based on the segmentation of the organ; c) finding the organ intensity characteristics based on the new bounding region;…, and on the new bounding region; and e) segmenting the organ in the image again--, in [0022]-[0027];
{so that it is clearly disclosed that the result of step e), segmenting the organ in the image again, reads on the claimed “one or more regenerated versions of the one or more segmentations generated by one or more neural networks”; and “the segmentation of the organ” in step b), finding a new bounding region of the organ based on the segmentation of the organ, reads on the claimed “the one or more segmentations”;
in other words, the claimed “the one or more segmentations” is considered the original segmentations, i.e., the segmentations of the ground truth in the training image data, and the claimed “regenerated segmentations” are the segmentations regenerated by the generative neural networks};
Gazit's above-discussed one or more segmentations/boundaries of the training images are considered the input one or more segmentations/boundaries of objects, which are compared with the computer-generated output one or more segmentations/boundaries of objects {organs}, represented by “the intensity gradient expected for the boundary of a target organ” and “the organ intensity characteristics based on the new bounding region”; as Gazit discloses: -- the intermediate results, such as a set of organ intensity characteristics determined for an input image, or image processing parameters to be used for processing an input image, or a segmentation mask of the lungs, and the final output, such as a smoothed image, or a segmented image, or a segmentation mask of a target organ--, in [0083]-[0084]; and,
so that Gazit's “distribution of magnitudes of intensity gradient expected along a boundary of the organ,” “intensity characteristics,” and “estimated organ intensity characteristics” read on the Application's “label masks” or “segmentation masks,” as the representation of the boundaries, i.e., the claimed limitation “one or more segmentations of one or more objects,” which are output by a computer learning algorithm based on: --“because the intensity gradient expected for the boundary of a target organ will generally be higher for images of higher contrast.” …., and based on information on a distribution of intensities, in the target organ and/or its vicinity, is obtained first from each of a set of training images of that organ, including a variety of different organ intensity characteristics--, in [0096]-[0098]);
Gazit, however, does not explicitly disclose: generate one or more notifications to indicate quality of one or more segmentations of one or more objects within one or more images based, at least in part, on a score generated by comparing the one or more segmentations to one or more regenerated versions of the one or more segmentations generated by one or more neural networks;
Yang discloses generate one or more notifications to indicate quality of one or more segmentations of one or more objects within one or more images based, at least in part, on a score generated by comparing the one or more segmentations to one or more regenerated versions of the one or more segmentations generated by one or more neural networks (see Yang: e.g., Fig. 6, and, -- a deep image-to-image network (DI2IN) that produces liver segmentation masks from input 3D medical images acts as the generator and is trained together with a discriminator that attempts to distinguish between ground truth liver segmentation mask training samples and liver segmentation masks generated by the DI2IN from input medical images. In an advantageous embodiment, the DI2IN employs a convolutional encoder-decoder architecture combined with multi-level feature concatenation and deep supervision. In training, the DI2IN-AN attempts to optimize a multi-class cross-entropy loss together with an adversarial term that aims to distinguish between the output of the DI2IN and the ground truth.--, in [0021], {apparently, the above “to distinguish” is to compare one or more segmentations of one or more shape features of one or more images, because “segmentation masks” are the segmentations of one or more shape features of one or more images}; and the comparison of the generated segmentation output {“segmented liver boundary 802 generated using DI2IN-AN”} to the input segmentation {the ground truth liver boundary 801} is further illustrated in Figs. 8-9, and, “Images 805 and 810 show the ground truth liver boundary 801 and segmented liver boundary 802 generated using DI2IN-AN”, in [0041]-[0042]; also see: -- a deep image-to-image network (DI2IN) for liver segmentation is pre-trained based on the training samples in a first training phase. The DI2IN is a multi-layer convolutional neural network (CNN) trained to perform liver segmentation in an input 3D medical image….
The segmentation task performed by the DI2IN 200 is defined as the voxel-wise binary classification of an input 3D medical image. As shown in FIG. 2, the DI2IN 200 takes an entire 3D CT volume 202 as input, and outputs a probability map that indicates the probability/likelihood of voxels belonging to the liver region. It is straightforward to covert such a probability map to a binary liver segmentation mask by labeling all voxels with a probability score greater than a threshold (e.g., 0.5) as positive (in the liver region) and all voxels with a probability score less than the threshold as negative (not in the liver region). The prediction 204 output by the DI2IN 200 for a given input 3D CT volume 202 can be output as a probability map or a binary liver segmentation mask.--, in [0024]; and, --The alternating of the discriminator training 502 and generator training 504 is iterated for a plurality of iterations, until the discriminator is not able to easily distinguish between the ground truth label maps (ground truth liver segmentation masks) and the predictions output by the DI2IN (predicted liver segmentation masks). For example, the discriminator training 502 and the generator training 504 can be iterated until the discriminator weights and the generator weights converge or until a predetermine number of maximum iterations is reached. The algorithm 500 outputs the final updated generator weights. After the training process, the adversarial network is no longer needed during the inference stage (110 of FIG. 1). The trained generator (DI2IN) itself can be used during the inference stage (110) to provide high quality liver segmentation, with improved performance due to the adversarial training.--, in [0032];
{Thus, Yang's disclosed “a discriminator that attempts to distinguish between ground truth liver segmentation mask training samples and liver segmentation masks generated by the DI2IN from input medical images” determines/generates the “one or more notifications” by discriminating/distinguishing whether the segmentations are segmentations of the ground truth or segmentations of the regenerated version; and Gazit's disclosures are consistent with Yang's disclosures as discussed and applied above});
Gazit and Yang are combinable because they are in the same field of endeavor: segmentation using neural networks, with boundary identification and comparison. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Gazit's processor using Yang's teachings, by including generating one or more notifications to indicate quality of one or more segmentations of one or more objects within one or more images based, at least in part, on a score generated by comparing the one or more segmentations to one or more regenerated versions of the one or more segmentations generated by one or more neural networks {such as comparing ground truth liver segmentation mask training samples and liver segmentation masks generated by the DI2IN from input medical images} in Gazit's learning machine, in order to perform and improve automated computer-based liver segmentation in 3D medical images (see Yang: e.g., [0005], [0019]-[0024], and [0041]-[0042]).
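For illustration only, the claimed operation of comparing a segmentation to a regenerated version via a score, and emitting a notification indicating quality, can be sketched generically as below. The Dice overlap metric, the 0.8 threshold, the notification strings, and all names are the Examiner's hypothetical choices for illustration and are not asserted to be the implementation of either Gazit or Yang:

```python
import numpy as np

def dice_score(mask_a, mask_b):
    # Generic overlap score between two binary segmentation masks
    # (Dice coefficient; neither reference is asserted to use this metric).
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def quality_notification(segmentation, regenerated, threshold=0.8):
    # Generate a notification indicating segmentation quality, based on a
    # score comparing the segmentation to its regenerated version.
    score = dice_score(segmentation, regenerated)
    return ("acceptable" if score >= threshold else "review needed"), score

# Toy 2x3 masks (illustrative values only).
seg = np.array([[1, 1, 0], [0, 1, 0]])
regen = np.array([[1, 1, 0], [0, 0, 0]])
label, score = quality_notification(seg, regen)
```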
Re Claim 2, Gazit as modified by Yang further disclose wherein: the one or more images comprises a medical image; and the one or more segmentations represent a boundary corresponding to the one or more objects within the medical image (see Gazit: e.g., --Gaussian distributions of intensity gradient, with different mean value and standard deviation corresponding to the interior and to the boundary of the left kidney for each of four clusters of different organ intensity characteristics, made from training images according to the method of FIG. 4.--, in [0022], [0073], and, --the image is anisotropically smoothed, reducing noise in the interior and exterior of a target organ, but not blurring the sharpness of the boundaries of the organ. The parameters for doing this will in general depend on the organ intensity characteristics, especially when they are due to different results of contrast agent use or lack of use, because the intensity gradient expected for the boundary of a target organ will generally be higher for images of higher contrast. 
In another example, the contrast of the image is changed, in order to make a particular type of feature or structure of a target organ, for example a lesion, more easily visible when the image is viewed.--, and, --The characteristics of the image being processed are then learned by comparing information on the distribution of intensities in a region of the image, for example in or near the target organ, to the corresponding information for the different clusters of training images.--, in [0082]-[0083], [0096]-[0100], -- to include intensity gradients in the cost function of a region-growing algorithm for segmentation of the target organ, depending on information about the expected distribution of intensity gradients inside and at the boundary of the target organ in the image being processed.--, in [0104]-[0108], [0142]-[0143], [0170], [0174], and [0179]-[0180]; Gazit teaches: -- distribution of magnitudes of intensity gradient expected along a boundary of the organ, specific to the chosen set of organ intensity characteristics….. a) finding at least an approximate bounding region of the organ in the image, wherein the region of the image used for estimating organ intensity characteristics is the bounding region b) finding a new bounding region of the organ based on the segmentation of the organ; c) finding the organ intensity characteristics based on the new bounding region;…, and on the new bounding region; and e) segmenting the organ in the image again--, in [0022]-[0027];
Gazit's paras. [0022]-[0027], as quoted above, clearly disclose the “distribution of magnitudes of intensity gradient expected along a boundary of the organ,” the “intensity characteristics,” and the “estimated organ intensity characteristics,” which are applied in the comparison and in the cost function as a comparison operation, and which are representations of the boundaries of the organs, or of the bounding region of the organ, such as the boundaries of a liver; these boundaries, or this bounding region, are the “shape feature,” because they define the shape of the organ (see Gazit's [0130]),
so that Gazit's “distribution of magnitudes of intensity gradient expected along a boundary of the organ,” “intensity characteristics,” and “estimated organ intensity characteristics” read on the Application's “label masks” or “segmentation masks,” as the representation of the boundaries being compared;
as Gazit discloses: -- the intermediate results, such as a set of organ intensity characteristics determined for an input image, or image processing parameters to be used for processing an input image, or a segmentation mask of the lungs, and the final output, such as a smoothed image, or a segmented image, or a segmentation mask of a target organ--, in [0083]-[0084]; also see Yang: e.g., Fig. 6, and, -- a deep image-to-image network (DI2IN) that produces liver segmentation masks from input 3D medical images acts as the generator and is trained together with a discriminator that attempts to distinguish between ground truth liver segmentation mask training samples and liver segmentation masks generated by the DI2IN from input medical images. In an advantageous embodiment, the DI2IN employs a convolutional encoder-decoder architecture combined with multi-level feature concatenation and deep supervision. In training, the DI2IN-AN attempts to optimize a multi-class cross-entropy loss together with an adversarial term that aims to distinguish between the output of the DI2IN and the ground truth.--, in [0021]-[0024], {apparently, above “to distinguish” is to compare one or more boundaries of one or more shape features of one or more images, because “segmentation masks” are the boundaries of one or more shape features of one or more images}).
Re Claim 3, Gazit as modified by Yang further disclose wherein the one or more neural networks includes a variational autoencoder trained with ground truth boundary information (see Yang: e.g., --According to an embodiment of the present invention, a deep image-to-image network (DI2IN) that produces liver segmentation masks from input 3D medical images acts as the generator and is trained together with a discriminator that attempts to distinguish between ground truth liver segmentation mask training samples and liver segmentation masks generated by the DI2IN from input medical images. In an advantageous embodiment, the DI2IN employs a convolutional encoder-decoder architecture combined with multi-level feature concatenation and deep supervision. In training, the DI2IN-AN attempts to optimize a multi-class cross-entropy loss together with an adversarial term that aims to distinguish between the output of the DI2IN and the ground truth. Advantageously, the discriminator pushes the generator's output towards the distribution of ground truth, and thus enhances the generator's performance by refining its output during training. Since the discriminator can be implemented using a CNN which takes the joint configuration of many input variables, the discriminator embeds higher-order potentials in the adversarial network.--, in [0021]; and, --In the encoder part (BLK 1-BLK 4) of the DI2IN 200 only convolutional layers are used in all of the blocks. In order to increase the receptive field of neurons and lower the GPU memory consumption, the stride is set as 2 at some layers in the encoder and the size of the feature maps is reduced at each of those layers. Moreover, a larger receptive field covers more contextual information and helps to preserve liver shape information in the prediction.--, in [0025]; and, -- the DI2IN is trained together with a discriminator network in adversarial network in order to boost the performance of the DI2IN. FIG. 
3 illustrates an adversarial network according to an embodiment of the present invention. As shown in FIG. 3, the adversarial network includes a generator 300 and a discriminator 310. According to an advantageous embodiment, the generator 300 is the DI2IN for liver segmentation. For example, the generator 300 can be the DI2IN 200 having the network structure shown in FIG. 2. The discriminator 310 is a deep neural network that attempts to distinguish between ground truth liver segmentation masks and predicted liver segmentation masks generated by the generator 300 (DI2IN) from training images. The adversarial network is utilized in training to capture high-order appearance information, which distinguishes between the ground truth and output from the DI2IN. During training, the generator 300 inputs training CT volumes 302 and generates predictions 304 (i.e., predicted liver segmentation masks or probability maps) from the input training CT volumes 302. The discriminator 310 inputs ground truth liver segmentation masks 306 and the predictions 304 generated by the generator 300, and classifies these images as real/ground truth (positive) or fake/prediction (negative).--, in [0028]).
Re Claim 4, Gazit as modified by Yang further disclose wherein a comparison of the one or more segmentations to the one or more regenerated versions of the one or more segmentations is a value used to train the one or more neural networks (see Gazit: e.g., --for each location relative to the bounding region, that the location is in the target organ. The set of probabilities for a given organ is referred to as a "probabilistic atlas" for that organ. In order to meaningfully compare bounding regions that in general have different dimensions in different images, the bounding region in each image is mapped into the probabilistic atlas, for example by linearly scaling the different dimensions of the bounding region, using scaling factors that may be different for different directions and different images.--, in [0147], and, also see: --Gaussian distributions of intensity gradient, with different mean value and standard deviation corresponding to the interior and to the boundary of the left kidney for each of four clusters of different organ intensity characteristics, made from training images according to the method of FIG. 4.--, in [0022], [0073], and, --the image is anisotropically smoothed, reducing noise in the interior and exterior of a target organ, but not blurring the sharpness of the boundaries of the organ. The parameters for doing this will in general depend on the organ intensity characteristics, especially when they are due to different results of contrast agent use or lack of use, because the intensity gradient expected for the boundary of a target organ will generally be higher for images of higher contrast. 
In another example, the contrast of the image is changed, in order to make a particular type of feature or structure of a target organ, for example a lesion, more easily visible when the image is viewed.--, and, --The characteristics of the image being processed are then learned by comparing information on the distribution of intensities in a region of the image, for example in or near the target organ, to the corresponding information for the different clusters of training images.--, in [0082]-[0083], [0096]-[0100], -- to include intensity gradients in the cost function of a region-growing algorithm for segmentation of the target organ, depending on information about the expected distribution of intensity gradients inside and at the boundary of the target organ in the image being processed.--, in [0104]-[0108], [0142]-[0143], [0170], [0174], and [0179]-[0180]; Gazit teaches: -- distribution of magnitudes of intensity gradient expected along a boundary of the organ, specific to the chosen set of organ intensity characteristics….. a) finding at least an approximate bounding region of the organ in the image, wherein the region of the image used for estimating organ intensity characteristics is the bounding region b) finding a new bounding region of the organ based on the segmentation of the organ; c) finding the organ intensity characteristics based on the new bounding region;…, and on the new bounding region; and e) segmenting the organ in the image again--, in [0022]-[0027]; also see Yang: e.g., Fig. 1, and, -- The trained deep image-to-image network is trained in an adversarial network together with a discriminative network that distinguishes between predicted liver segmentation masks generated by the deep image-to-image network from input training volumes and ground truth liver segmentation masks.--, in abstract, and, -- utilize a trained deep image-to-image network to generate a liver segmentation mask from an input medical image of a patient. 
Embodiments of the present invention train the deep image-to-image network for liver segmentation in an adversarial network, in which the deep image-to-image network is trained together with a discriminator network that attempts to distinguish between ground truth liver segmentation masks and liver segmentation masks generated by the deep image-to-image network.--, in [0005]; and, -- a deep image-to-image network (DI2IN) for liver segmentation is pre-trained based on the training samples in a first training phase. The DI2IN is a multi-layer convolutional neural network (CNN) trained to perform liver segmentation in an input 3D medical image…. The segmentation task performed by the DI2IN 200 is defined as the voxel-wise binary classification of an input 3D medical image. As shown in FIG. 2, the DI2IN 200 takes an entire 3D CT volume 202 as input, and outputs a probability map that indicates the probability/likelihood of voxels belonging to the liver region. It is straightforward to covert such a probability map to a binary liver segmentation mask by labeling all voxels with a probability score greater than a threshold (e.g., 0.5) as positive (in the liver region) and all voxels with a probability score less than the threshold as negative (not in the liver region). The prediction 204 output by the DI2IN 200 for a given input 3D CT volume 202 can be output as a probability map or a binary liver segmentation mask.--, in [0024]).
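By way of a purely hypothetical illustration of a comparison value being used to train (claim 4), a toy parameter-update loop might look as follows; nothing here is taken from Gazit or Yang, whose actual training uses adversarial networks and gradient-based updates:

```python
import numpy as np

# Fixed reference segmentation (toy 1-D "mask"; hypothetical values).
ground_truth = np.array([1, 1, 1, 0, 0])

def predict(theta):
    # Hypothetical "generator": thresholds a ramp of 5 positions at theta.
    return (np.linspace(0.0, 1.0, 5) < theta).astype(int)

def mismatch(pred, truth):
    # Comparison value between a segmentation and the reference:
    # the fraction of positions where the two masks disagree.
    return float(np.mean(pred != truth))

theta = 0.1
for _ in range(20):
    loss = mismatch(predict(theta), ground_truth)
    if loss == 0.0:
        break          # masks agree; stop training
    theta += 0.05      # crude update driven by the comparison value
```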
Re Claim 5, Gazit as modified by Yang further disclose wherein one or more notifications indicates that the one or more segmentations conforms to ground truth data. (see Yang: e.g., Fig. 1, and, -- The trained deep image-to-image network is trained in an adversarial network together with a discriminative network that distinguishes between predicted liver segmentation masks generated by the deep image-to-image network from input training volumes and ground truth liver segmentation masks.--, in abstract, and, -- utilize a trained deep image-to-image network to generate a liver segmentation mask from an input medical image of a patient. Embodiments of the present invention train the deep image-to-image network for liver segmentation in an adversarial network, in which the deep image-to-image network is trained together with a discriminator network that attempts to distinguish between ground truth liver segmentation masks and liver segmentation masks generated by the deep image-to-image network.--, in [0005]; and, -- a deep image-to-image network (DI2IN) for liver segmentation is pre-trained based on the training samples in a first training phase. The DI2IN is a multi-layer convolutional neural network (CNN) trained to perform liver segmentation in an input 3D medical image…. The segmentation task performed by the DI2IN 200 is defined as the voxel-wise binary classification of an input 3D medical image. As shown in FIG. 2, the DI2IN 200 takes an entire 3D CT volume 202 as input, and outputs a probability map that indicates the probability/likelihood of voxels belonging to the liver region. It is straightforward to covert such a probability map to a binary liver segmentation mask by labeling all voxels with a probability score greater than a threshold (e.g., 0.5) as positive (in the liver region) and all voxels with a probability score less than the threshold as negative (not in the liver region). 
The prediction 204 output by the DI2IN 200 for a given input 3D CT volume 202 can be output as a probability map or a binary liver segmentation mask.--, in [0024]).
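The claimed notification of whether a segmentation conforms to ground truth can be illustrated with a hypothetical helper; the function name, score values, and cutoff below are invented for illustration and appear in neither Gazit nor Yang:

```python
def quality_notification(score, cutoff=0.9):
    """Return a notification string from a segmentation quality score.

    Hypothetical sketch: the 0.9 cutoff is an assumed value, not one
    taken from the cited references.
    """
    if score >= cutoff:
        return "segmentation conforms to ground truth"
    return "segmentation flagged for review"

print(quality_notification(0.95))  # segmentation conforms to ground truth
print(quality_notification(0.40))  # segmentation flagged for review
```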
Re Claim 13, Gazit as modified by Yang further discloses wherein the processor comprises a graphical processing unit (GPU) (see Yang: e.g., -- CRF and graph cut both suffer from serious leakage in these situations. Using a NVIDIA TITAN X GPU and the Theano/Lasagne library, the run time of liver segmentation using the DI2IN-AN method is less than one second, which is significantly faster than most current approaches.--, in [0039]).
Re Claim 14, claim 14 is the method claim corresponding to claim 1. Thus, claim 14 is rejected for reasons similar to those given for claim 1. Furthermore, Gazit as modified by Yang further discloses a method, using a processor comprising one or more circuits, comprising causing one or more output boundaries of one or more objects within one or more images generated by one or more neural networks to be compared to one or more input boundaries of the one or more objects to the one or more neural networks (see Yang: e.g., Fig. 6, and, -- a deep image-to-image network (DI2IN) that produces liver segmentation masks from input 3D medical images acts as the generator and is trained together with a discriminator that attempts to distinguish between ground truth liver segmentation mask training samples and liver segmentation masks generated by the DI2IN from input medical images. In an advantageous embodiment, the DI2IN employs a convolutional encoder-decoder architecture combined with multi-level feature concatenation and deep supervision. In training, the DI2IN-AN attempts to optimize a multi-class cross-entropy loss together with an adversarial term that aims to distinguish between the output of the DI2IN and the ground truth.--, in [0021]-[0024], {apparently, the above “to distinguish” amounts to comparing one or more boundaries of one or more shape features of one or more images, because the “segmentation masks” are the boundaries of one or more shape features of one or more images}).
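One concrete way to compare an output boundary (a predicted mask) against an input ground-truth boundary, as mapped in the rejection above, is a Dice overlap score; the toy masks and the choice of Dice here are illustrative assumptions, not taken from Yang:

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice overlap between two binary masks (1.0 means identical)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    total = mask_a.sum() + mask_b.sum()
    return 2.0 * inter / total if total else 1.0

gt = np.array([[1, 1],
               [0, 0]])    # toy ground-truth mask
pred = np.array([[1, 0],
                 [0, 0]])  # toy predicted mask
print(round(dice(gt, pred), 3))  # 0.667
```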
Re Claim 15, Gazit as modified by Yang further discloses generating the one or more segmentations of the image, wherein the one or more segmentations represent a processor-determined set of boundaries of objects depicted in the image (see Gazit: e.g., -- to include intensity gradients in the cost function of a region-growing algorithm for segmentation of the target organ, depending on information about the expected distribution of intensity gradients inside and at the boundary of the target organ in the image being processed.--, in [0104]-[0108], [0142]-[0143], [0174], and [0179]-[0180]; also see Yang: e.g., Fig. 1, and, -- The trained deep image-to-image network is trained in an adversarial network together with a discriminative network that distinguishes between predicted liver segmentation masks generated by the deep image-to-image network from input training volumes and ground truth liver segmentation masks.--, in abstract);
inputting the one or more segmentations to a neural network previously trained on a collection of training segmentations (see Yang: e.g., Fig. 1, and, -- The trained deep image-to-image network is trained in an adversarial network together with a discriminative network that distinguishes between predicted liver segmentation masks generated by the deep image-to-image network from input training volumes and ground truth liver segmentation masks.--, in abstract, and, -- utilize a trained deep image-to-image network to generate a liver segmentation mask from an input medical image of a patient. Embodiments of the present invention train the deep image-to-image network for liver segmentation in an adversarial network, in which the deep image-to-image network is trained together with a discriminator network that attempts to distinguish between ground truth liver segmentation masks and liver segmentation masks generated by the deep image-to-image network.--, in [0005]; and, -- a deep image-to-image network (DI2IN) for liver segmentation is pre-trained based on the training samples in a first training phase. The DI2IN is a multi-layer convolutional neural network (CNN) trained to perform liver segmentation in an input 3D medical image…. The segmentation task performed by the DI2IN 200 is defined as the voxel-wise binary classification of an input 3D medical image. As shown in FIG. 2, the DI2IN 200 takes an entire 3D CT volume 202 as input, and outputs a probability map that indicates the probability/likelihood of voxels belonging to the liver region. It is straightforward to covert such a probability map to a binary liver segmentation mask by labeling all voxels with a probability score greater than a threshold (e.g., 0.5) as positive (in the liver region) and all voxels with a probability score less than the threshold as negative (not in the liver region). 
The prediction 204 output by the DI2IN 200 for a given input 3D CT volume 202 can be output as a probability map or a binary liver segmentation mask.--, in [0024]);
comparing the one or more segmentations to an output of the one or more neural networks, wherein the output of the one or more neural networks comprises the one or more regenerated versions of the one or more segmentations (see Yang: e.g., Figs. 8-9; and, -- utilize a trained deep image-to-image network to generate a liver segmentation mask from an input medical image of a patient. Embodiments of the present invention train the deep image-to-image network for liver segmentation in an adversarial network, in which the deep image-to-image network is trained together with a discriminator network that attempts to distinguish between ground truth liver segmentation masks and liver segmentation masks generated by the deep image-to-image network.--, in [0005]; and, --As shown in FIG. 8, image 800 is a CT image of a patient with pleural effusion, which brightens the lung region and changes the pattern of the upper boundary of the liver. This significantly increases the difficult for automatic liver segmentation, since in most CT volumes the lung looks dark with a low intensity. A test case shown in FIG. 8 usually corresponds with the largest error for a given method in Table 1. Images 805 and 810 show the ground truth liver boundary 801 and segmented liver boundary 802 generated using DI2IN-AN in different slices of the CT volume of the patient with pleural effusion. Although the DI2IN-AN segmentation has difficulty at the upper boundary, it still outperforms the other methods in this challenging test case. [0042] FIG. 9 illustrates exemplary liver segmentation results using the method of FIG. 1 in a CT volume of a patient with an enlarged liver. Another challenging case for automatic liver segmentation is a patient with an enlarged liver. As shown in FIG. 9, image 900 is a CT image of a patient with an enlarged liver.
Images 905 and 910 show the ground truth liver boundary 901 and the segmented liver boundary 902 generated using DI2IN-AN segmentation.--, in [0041]-[0042]);
determining the score for the one or more segmentations, wherein the score is a function of differences between the one or more segmentations and the output of the neural network (see Yang: e.g., --These operations are repeated k.sub.D times, and then the algorithm proceeds to the generator G training 504. In the generator G training, a mini-batch of training images x.about.p.sub.data are samples, a prediction y.sub.pred is generated for each training image x by the generator with the current generator weights G (x;.theta..sub.0.sup.G) and a classification score D(G(x')) is computed by the discriminator for each prediction, and updated generator weights .theta..sub.1.sup.G are learned to minimize the generator loss function l.sub.G by propagating back the stochastic gradient .gradient.l.sub.G (y.sub.gt',y.sub.pred'). These operations are repeated k.sub.G times. The alternating of the discriminator training 502 and generator training 504 is iterated for a plurality of iterations, until the discriminator is not able to easily distinguish between the ground truth label maps (ground truth liver segmentation masks) and the predictions output by the DI2IN (predicted liver segmentation masks)…. The trained generator (DI2IN) itself can be used during the inference stage (110) to provide high quality liver segmentation, with improved performance due to the adversarial training. --, in [0032]-[0033]).
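The alternating discriminator/generator schedule quoted from Yang [0032] can be sketched structurally with one-parameter stand-ins for G and D; everything below (the toy functions, initial weights, learning rate, and iteration counts) is invented solely to show the alternation pattern, not the DI2IN-AN implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "networks": each is a toy function of a single scalar
# weight, not a real DI2IN or CNN discriminator.
def G(x, theta_g):   # generator: predicts a "mask" value from input x
    return theta_g * x

def D(y, theta_d):   # discriminator: classification score in (0, 1)
    return 1.0 / (1.0 + np.exp(-theta_d * y))

theta_g, theta_d = 0.1, 0.1
k_D, k_G, lr = 2, 2, 0.05    # assumed values for illustration

for _ in range(10):                  # outer alternation (cf. Yang [0032])
    for _ in range(k_D):             # k_D discriminator update steps
        x = rng.random()
        y_gt, y_pred = x, G(x, theta_g)
        # Binary cross-entropy gradient (toy version): push D toward 1
        # on ground truth and toward 0 on the generator's prediction.
        grad_d = (D(y_gt, theta_d) - 1.0) * y_gt + D(y_pred, theta_d) * y_pred
        theta_d -= lr * grad_d
    for _ in range(k_G):             # k_G generator update steps
        x = rng.random()
        y_pred = G(x, theta_g)
        # Push D(G(x)) toward 1, i.e. try to fool the discriminator.
        grad_g = (D(y_pred, theta_d) - 1.0) * theta_d * x
        theta_g -= lr * grad_g

print(theta_g > 0.1)  # the generator weight has moved under training
```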
Re Claim 16, Gazit as modified by Yang further discloses wherein the neural network is a variational autoencoder that takes the one or more segmentations as its input, wherein the variational autoencoder maps features of its input to a reduced feature space from which the one or more segmentations can be approximately reproduced from features in the reduced feature space (see Yang: e.g., --According to an embodiment of the present invention, a deep image-to-image network (DI2IN) that produces liver segmentation masks from input 3D medical images acts as the generator and is trained together with a discriminator that attempts to distinguish between ground truth liver segmentation mask training samples and liver segmentation masks generated by the DI2IN from input medical images. In an advantageous embodiment, the DI2IN employs a convolutional encoder-decoder architecture combined with multi-level feature concatenation and deep supervision. In training, the DI2IN-AN attempts to optimize a multi-class cross-entropy loss together with an adversarial term that aims to distinguish between the output of the DI2IN and the ground truth. Advantageously, the discriminator pushes the generator's output towards the distribution of ground truth, and thus enhances the generator's performance by refining its output during training. Since the discriminator can be implemented using a CNN which takes the joint configuration of many input variables, the discriminator embeds higher-order potentials in the adversarial network.--, in [0021]; and, --In the encoder part (BLK 1-BLK 4) of the DI2IN 200 only convolutional layers are used in all of the blocks. In order to increase the receptive field of neurons and lower the GPU memory consumption, the stride is set as 2 at some layers in the encoder and the size of the feature maps is reduced at each of those layers.
Moreover, a larger receptive field covers more contextual information and helps to preserve liver shape information in the prediction.--, in [0025]; and, -- the DI2IN is trained together with a discriminator network in adversarial network in order to boost the performance of the DI2IN. FIG. 3 illustrates an adversarial network according to an embodiment of the present invention. As shown in FIG. 3, the adversarial network includes a generator 300 and a discriminator 310. According to an advantageous embodiment, the generator 300 is the DI2IN for liver segmentation. For example, the generator 300 can be the DI2IN 200 having the network structure shown in FIG. 2. The discriminator 310 is a deep neural network that attempts to distinguish between ground truth liver segmentation masks and predicted liver segmentation masks generated by the generator 300 (DI2IN) from training images. The adversarial network is utilized in training to capture high-order appearance information, which distinguishes between the ground truth and output from the DI2IN. During training, the generator 300 inputs training CT volumes 302 and generates predictions 304 (i.e., predicted liver segmentation masks or probability maps) from the input training CT volumes 302. The discriminator 310 inputs ground truth liver segmentation masks 306 and the predictions 304 generated by the generator 300, and classifies these images as real/ground truth (positive) or fake/prediction (negative).--, in [0028]).
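The claimed mapping of segmentations into a reduced feature space, from which they can be approximately reproduced, can be illustrated with a PCA-style linear encoder/decoder; this is a stand-in for the variational autoencoder recited in the claim, with made-up data, and is not the DI2IN architecture described in Yang:

```python
import numpy as np

# Toy "segmentations": flattened 4-pixel binary masks (invented data).
X = np.array([[1., 1., 0., 0.],
              [1., 0., 0., 0.],
              [1., 1., 1., 0.]])
mu = X.mean(axis=0)

# PCA-style linear encoder/decoder: the top-k right singular vectors
# of the centered data define the reduced feature space; decoding
# maps reduced features back to mask space.
k = 2
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)

def encode(x):
    return (x - mu) @ Vt[:k].T   # map into the reduced feature space

def decode(z):
    return z @ Vt[:k] + mu       # approximately reproduce the input

# Three centered points span at most a 2-D subspace, so with k = 2 the
# reconstruction is numerically exact for this toy data.
recon = decode(encode(X))
print(bool(np.allclose(recon, X, atol=1e-6)))  # True
```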
Re Claim 17, Gazit as modified by Yang further discloses training the variational autoencoder with the collection of training segmentations, wherein the collection of training segmentations are represented by label masks that are ground truth label masks of training images previously determined to represent good segmentations of the images (see Yang: e.g., --According to an embodiment of the present invention, a deep image-to-image network (DI2IN) that produces liver segmentation masks from input 3D medical images acts as the generator and is trained together with a discriminator that attempts to distinguish between ground truth liver segmentation mask training samples and liver segmentation masks generated by the DI2IN from input medical images. In an advantageous embodiment, the DI2IN employs a convolutional encoder-decoder architecture combined with multi-level feature concatenation and deep supervision. In training, the DI2IN-AN attempts to optimize a multi-class cross-entropy loss together with an adversarial term that aims to distinguish between the output of the DI2IN and the ground truth. Advantageously, the discriminator pushes the generator's output towards the distribution of ground truth, and thus enhances the generator's performance by refining its output during training. Since the discriminator can be implemented using a CNN which takes the joint configuration of many input variables, the discriminator embeds higher-order potentials in the adversarial network.--, in [0021]; -- a deep image-to-image network (DI2IN) for liver segmentation is pre-trained based on the training samples in a first training phase. The DI2IN is a multi-layer convolutional neural network (CNN) trained to perform liver segmentation in an input 3D medical image…. The segmentation task performed by the DI2IN 200 is defined as the voxel-wise binary classification of an input 3D medical image. As shown in FIG.
2, the DI2IN 200 takes an entire 3D CT volume 202 as input, and outputs a probability map that indicates the probability/likelihood of voxels belonging to the liver region. It is straightforward to covert such a probability map to a binary liver segmentation mask by labeling all voxels with a probability score greater than a threshold (e.g., 0.5) as positive (in the liver region) and all voxels with a probability score less than the threshold as negative (not in the liver region). The prediction 204 output by the DI2IN 200 for a given input 3D CT volume 202 can be output as a probability map or a binary liver segmentation mask.--, in [0024]; and, --In the encoder part (BLK 1-BLK 4) of the DI2IN 200 only convolutional layers are used in all of the blocks. In order to increase the receptive field of neurons and lower the GPU memory consumption, the stride is set as 2 at some layers in the encoder and the size of the feature maps is reduced at each of those layers. Moreover, a larger receptive field covers more contextual information and helps to preserve liver shape information in the prediction.--, in [0025]; and, -- the DI2IN is trained together with a discriminator network in adversarial network in order to boost the performance of the DI2IN. FIG. 3 illustrates an adversarial network according to an embodiment of the present invention. As shown in FIG. 3, the adversarial network includes a generator 300 and a discriminator 310. According to an advantageous embodiment, the generator 300 is the DI2IN for liver segmentation. For example, the generator 300 can be the DI2IN 200 having the network structure shown in FIG. 2. The discriminator 310 is a deep neural network that attempts to distinguish between ground truth liver segmentation masks and predicted liver segmentation masks generated by the generator 300 (DI2IN) from training images. 
The adversarial network is utilized in training to capture high-order appearance information, which distinguishes between the ground truth and output from the DI2IN. During training, the generator 300 inputs training CT volumes 302 and generates predictions 304 (i.e., predicted liver segmentation masks or probability maps) from the input training CT volumes 3