DETAILED ACTION
Response to Arguments
The amendments and arguments filed 10/9/2025 have been entered and made of record.
The Applicant's amendments and arguments filed 10/9/2025 have been considered but are moot in view of the new ground(s) of rejection necessitated by the Applicant's amendment of independent claims 1, 16, and 20. Furthermore, the Applicant's arguments in view of the amendments filed 10/9/2025 have been fully considered, but they are not persuasive:
Re amended claim 1, Applicant asserts (on page 9 of the Arguments filed 10/9/2025) that the cited references, SHIM as modified by GHADIMI and KOCH, do not disclose the newly added limitation of “wherein the one or more corresponding encoder arms comprise a plurality of encoder models, and wherein the first three-dimensional MRI image is processed by a different encoder model than the second three-dimensional MRI image”.
However, the Examiner disagrees, because:
First, SHIM as modified by GHADIMI and KOCH apparently disclose wherein the one or more corresponding encoder arms comprise a plurality of encoder models (see SHIM: e.g., -- [0039] The systems and methods of the disclosure provide a parallelized deep learning approach to spectral fitting for magnetic resonance spectroscopy data enabling spectral fitting and (in vivo) metabolite measurements to be performed in substantially real-time (e.g., in about one minute), for example, using a conventional computer. The disclosure uses an unsupervised deep learning architecture that incorporates mathematical and physics-based models of spectral lines shape and baseline to generate an encoding of spectral parameters while being fully constrained within known physics. The architecture includes a series of linear operations and therefore is highly parallelizable. Multiple spectra acquired using a magnetic resonance spectroscopy system can be combined in a (2D) matrix (e.g. rows including datapoints representing each spectrum) and passed through the architecture according to the disclosure to generate independent parameter sets on each spectrum in parallel, resulting in spectral fitting and metabolite quantification (e.g., concentration or volume), for example, of the region of interest (e.g., whole-brain, an anatomic region within the brain, etc.), in substantially real-time without requiring substantial computational processing resources (e.g., graphical processing units (GPUs)). The disclosure can thereby address the computational bottleneck in processing of (in vivo) whole-brain spectroscopic imaging--, in [0039], and in view of
SHIM’s Fig. 2, and Fig. 3, as reproduced below:
[SHIM Figs. 2 and 3: media_image1.png, greyscale, 1002 × 871]
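As a purely illustrative sketch (not part of the record; the function name and weights below are hypothetical stand-ins, not SHIM's actual architecture), the row-parallel design quoted from SHIM [0039], in which multiple spectra are combined as rows of a 2D matrix and passed through shared linear operations to yield an independent parameter set per spectrum, can be expressed as:

```python
# Hypothetical sketch: spectra stacked as rows of a 2D matrix are mapped,
# row-independently, to per-spectrum parameter sets by one shared linear
# operation, mirroring the parallelizable design quoted from SHIM [0039].

def encode_parameters(spectra, weights):
    """Apply the same shared linear encoding to every spectrum (row)."""
    params = []
    for row in spectra:  # each row = the datapoints of one spectrum
        params.append([sum(w * x for w, x in zip(wrow, row)) for wrow in weights])
    return params  # one independent parameter set per spectrum

spectra = [[1.0, 2.0, 3.0],   # spectrum 1 datapoints
           [0.0, 1.0, 0.0]]   # spectrum 2 datapoints
weights = [[0.5, 0.0, 0.0],   # toy two-parameter encoder weights
           [0.0, 0.0, 1.0]]
print(encode_parameters(spectra, weights))
```

Because each row is encoded by the same shared operation, the per-spectrum parameter sets are independent and the loop over rows parallelizes trivially, which is the property SHIM relies on for substantially real-time fitting.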
It is clearly demonstrated that SHIM’s one or more encoder arms, which generate an encoding of spectral parameters, include multiple models: a line shape model, a baseline model (θB), and spectral-parameter models such as a spectral peak model (θP) (see SHIM: e.g., -- The trained encoder may determine a plurality of peak parameters for each spectra and the decoder may be defined by line-shape equations to determine peak components for each spectrum. In some embodiments, the line-shape equations may include but is not limited to Lorentzian-Gaussian lineshape model equations.
[0055] In some embodiments, the peak components of the spectrum for each voxel may include components/datapoints representing one or more peak regions associated with one or more metabolite resonances of the one or more of the metabolite(s) to be measured. The peak components for each voxel may define the peak model of its respective spectrum and may be represented or disposed in a matrix (may be referred to as a “fourth matrix”)--, in [0054]-[0055]; and,
-- [0064] FIG. 2 shows an example 200 of the neural network architecture according to embodiments. In some embodiments, the neural network architecture 200 may include two serial encoder-decoder stages: a first (convolutional) neural network 210 for determining baseline components for each voxel/spectrum and a second (convolutional) neural network 230 for determining peak components representing peak model for each respectively voxel/spectrum. In some embodiments, the (first) matrix of the plurality of the spectra/voxels may be passed through the first neural network 210 to determine a plurality of baseline parameters/coefficients (also referred to as “wavelet coefficients”) θ.sub.B. Next, the decoder of the first neural network 210 may use the baseline parameters to determine the one or more baseline components using a wavelet reconstruction technique. As shown, the baseline components in a form of a second matrix may be subtracted from the input/first matrix (220).
[0065] Then, each baseline-corrected/baseline-subtracted spectrum may pass through the second (convolutional) neural network 230 to compute the peak parameters (also referred to as “metabolite resonance peak parameters”), θ.sub.P. In some examples, the encoder for the second neural network 230 may determine six parameters: peak amplitude A.sub.m for each metabolite m to be measured, resonance frequency ω.sub.m for each metabolite m to be measured, zero and first order phases (ϕ.sub.0 and ϕ.sub.1), and Gaussian and Lorentzian decay constants (T.sub.a and T.sub.b). For example, for evaluating glioblastoma, the metabolites (m) to be measured may include Cho, Cr, and NAA. Next, an estimate of the peak components, which may refer to metabolite resonances, {right arrow over (s)}.sub.peak, can be determined by the decoder of the network 230 using a Lorentzian-Gaussian lineshape model. Using the peak components, one or more measurements (e.g., a concentration of each metabolite resonance) may be determined.--, in [0064]-[0065];
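The two serial encoder-decoder stages quoted from SHIM [0064]-[0065] (baseline estimation and subtraction, then peak-parameter fitting on the corrected spectrum) can be sketched, for illustration only and with hypothetical toy models standing in for SHIM's trained networks, as:

```python
# Illustrative two-stage sketch of the serial architecture quoted above.
# Both "models" here are hypothetical toys, not SHIM's neural networks.

def stage1_baseline(spectrum):
    """Toy baseline model: a constant offset equal to the spectrum's minimum."""
    b = min(spectrum)
    return [b] * len(spectrum)

def stage2_peak_params(corrected):
    """Toy peak encoder: amplitude and index of the dominant resonance."""
    amp = max(corrected)
    return {"amplitude": amp, "position": corrected.index(amp)}

spectrum = [1.0, 1.2, 4.0, 1.1, 1.0]
baseline = stage1_baseline(spectrum)
corrected = [s - b for s, b in zip(spectrum, baseline)]  # subtraction step (220)
print(stage2_peak_params(corrected))
```

The point of the sketch is the data flow, not the models: stage 1's output is subtracted from the input matrix before stage 2 estimates peak parameters, exactly the ordering described in [0064]-[0065].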
And, SHIM as modified by GHADIMI and KOCH further disclose wherein the first three-dimensional MRI image is processed by a different encoder model than the second three-dimensional MRI image ({as seen in the previous Office Action}, SHIM as modified by GHADIMI and KOCH discloses the first three-dimensional MRI image and the second three-dimensional MRI image (see SHIM: e.g.,
--[0045] In some embodiments, the MR spectroscopy data may be acquired using any available magnetic resonance system capable of acquiring MR spectroscopy data, such as a 3T MRI scanner. In some embodiments, the data may be acquired using one or more MR protocols or parameters (e.g., pulse sequence, flip angle (“FA”), RF pulse phase, TR, echo time (“TE’), sampling patterns, etc.). In some embodiments, the one or more MR protocols/parameters may be specific to the metabolite(s) to be measured and/or region(s) of interest to be scanned/imaged.--, in [0044]-[0046]; also see Figs. 8, 9, 10, 11; and, --FIG. 8, which shows the individual metabolite maps, the Cho/NAA ratio map, and corresponding contrast-enhanced T1-weighted (CE-T1w) and fluid-attenuated inversion recovery (FLAIR) MM volumes. Superimposed on the CE-T1w image is a contour drawn by a neuroradiologist to indicate contrast enhancing tissue and the surgical cavity, regions that would normally be targeted for high dose radiation therapy.--, in [0095]-[0097]);
{so that, it is clearly disclosed by SHIM that the MRI image of contrast-enhanced T1-weighted (CE-T1w), which reads on the claimed “the first three-dimensional MRI image”, is acquired using one or more MR protocols or parameters (e.g., pulse sequence … echo time (“TE”), sampling patterns, etc.); and,
the MRI image of the fluid-attenuated inversion recovery (FLAIR) MRI volumes, which reads on the claimed “the second three-dimensional MRI image”, is acquired using another one or more MR protocols or parameters (e.g., pulse sequence, flip angle (“FA”), RF pulse phase, TR, … echo time (“TE”), sampling patterns, etc.); and,
the above MRI image of contrast-enhanced T1-weighted (CE-T1w) and the MRI image of the fluid-attenuated inversion recovery (FLAIR) MRI volumes use different encoder models, since they involve different baseline, line shape, and peak models for different components and metabolites.}
In addition, KOCH’s disclosures are also directed to brain MRI image processing. KOCH particularly demonstrates the processing of MRI images and the encoding-path/decoding-path configurations based on multiple paths and connections that further concatenate features extracted from the multiple encoding paths from brain MRI spectra (such as a first MRI image, a second MRI image, etc.), as demonstrated in blocks 32 and 34 of Fig. 3, and in the feature concatenations to the decoding paths in Fig. 5.
And see KOCH’s Fig. 3, and Fig. 5 reproduced below:
[KOCH Fig. 3: media_image2.png, greyscale, 750 × 835]
[KOCH Fig. 5: media_image3.png, greyscale, 1202 × 781]
Therefore, claims 1-20 remain not patentably distinguishable over the cited prior art reference(s). Further discussion is addressed in the prior art rejection section below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over SHIM (US 20220018922 A1, Date Filed: 2019-11-20), in view of GHADIMI (US 20210267455 A1, which claims priority to U.S. Provisional Application 62/969,342, filed 2020-02-03), and further in view of KOCH (US 11372066 B2, Date Filed: 2019-02-07).
Re Claim 1, SHIM discloses a computer-implemented method (see SHIM: e.g., --Systems and methods provide a parallelized deep learning approach to spectral fitting for magnetic resonance spectroscopy data enabling accurate and rapid spectral fitting and determination of metabolite measurements using a conventional computer. The method may include processing multi-spectra magnetic resonance (MR) spectroscopy data of a region of interest through a series of neural networks. The method may include determining baseline components of each spectrum using a first neural network of the series, generating baseline-corrected components for each spectrum using the baseline components; and determining one or more peak components of each spectrum using a second neural network of the series and the baseline-corrected components. The method may further include determining one or more metabolite measurements of the one or more metabolites in the region of interest using the one or more peak components.--, in abstract) comprising:
accessing a plurality of three-dimensional magnetic resonance imaging (MRI) images, wherein each of the plurality of three-dimensional MRI images depict a same volume of a brain of a subject (see SHIM: e.g., -- the method 100 may include a step 110 of obtaining magnetic resonance (MR) spectroscopy data of a region of interest of a subject. For example, if evaluating a subject for a neurologic pathology, such as glioblastoma, the region of interest may be a region of the brain. In some embodiments, the region of the brain may include the whole-brain or a portion of the brain, such as an anatomic region of the brain.
[0045] In some embodiments, the MR spectroscopy data may be acquired using any available magnetic resonance system capable of acquiring MR spectroscopy data, such as a 3T MRI scanner. In some embodiments, the data may be acquired using one or more MR protocols or parameters (e.g., pulse sequence, flip angle (“FA”), RF pulse phase, TR, echo time (“TE’), sampling patterns, etc.). In some embodiments, the one or more MR protocols/parameters may be specific to the metabolite(s) to be measured and/or region(s) of interest to be scanned/imaged.--, in [0044]-[0046]);
and a first three-dimensional MRI image was generated using a first type of MRI sequence that is different than a second type of MRI sequence used to generate a second three-dimensional MRI image (see SHIM: e.g., -- the method 100 may include a step 110 of obtaining magnetic resonance (MR) spectroscopy data of a region of interest of a subject. For example, if evaluating a subject for a neurologic pathology, such as glioblastoma, the region of interest may be a region of the brain. In some embodiments, the region of the brain may include the whole-brain or a portion of the brain, such as an anatomic region of the brain.
[0045] In some embodiments, the MR spectroscopy data may be acquired using any available magnetic resonance system capable of acquiring MR spectroscopy data, such as a 3T MRI scanner. In some embodiments, the data may be acquired using one or more MR protocols or parameters (e.g., pulse sequence, flip angle (“FA”), RF pulse phase, TR, echo time (“TE’), sampling patterns, etc.). In some embodiments, the one or more MR protocols/parameters may be specific to the metabolite(s) to be measured and/or region(s) of interest to be scanned/imaged.--, in [0044]-[0046]; also see Figs. 8, 9, 10, 11; and, --FIG. 8, which shows the individual metabolite maps, the Cho/NAA ratio map, and corresponding contrast-enhanced T1-weighted (CE-T1w) and fluid-attenuated inversion recovery (FLAIR) MM volumes. Superimposed on the CE-T1w image is a contour drawn by a neuroradiologist to indicate contrast enhancing tissue and the surgical cavity, regions that would normally be targeted for high dose radiation therapy.--, in [0095]-[0097]);
processing, for each three-dimensional MRI image of the plurality of three-dimensional MRI images, the three-dimensional MRI image using one or more corresponding encoder arms of a machine-learning model to generate an encoding of the three-dimensional MRI image (see SHIM: e.g., Fig. 2, --[0050] The first neural network may include a trained encoder and a decoder. The trained encoder may determine a plurality of baseline parameters for each spectrum/voxel and the decoder may be defined by a mathematical technique to convert/determine the baseline component(s) (or datapoints) using the baseline parameters for each spectra. The mathematical technique may include but is not limited to one or more wavelet reconstruction equations. In some embodiments, the baseline parameters can represent local and non-local oscillations in the signal, enabling modeling by the decoder of the overall shape of the spectrum not including the metabolite peaks.--, in [0050], and, --[0064] FIG. 2 shows an example 200 of the neural network architecture according to embodiments. In some embodiments, the neural network architecture 200 may include two serial encoder-decoder stages: a first (convolutional) neural network 210 for determining baseline components for each voxel/spectrum and a second (convolutional) neural network 230 for determining peak components representing peak model for each respectively voxel/spectrum. In some embodiments, the (first) matrix of the plurality of the spectra/voxels may be passed through the first neural network 210 to determine a plurality of baseline parameters/coefficients (also referred to as “wavelet coefficients”) θ.sub.B. Next, the decoder of the first neural network 210 may use the baseline parameters to determine the one or more baseline components using a wavelet reconstruction technique. As shown, the baseline components in a form of a second matrix may be subtracted from the input/first matrix (220).--, in [0064], [0079], and [0103]);
wherein the one or more corresponding encoder arms comprise a plurality of encoder models (see SHIM: e.g., -- [0039] The systems and methods of the disclosure provide a parallelized deep learning approach to spectral fitting for magnetic resonance spectroscopy data enabling spectral fitting and (in vivo) metabolite measurements to be performed in substantially real-time (e.g., in about one minute), for example, using a conventional computer. The disclosure uses an unsupervised deep learning architecture that incorporates mathematical and physics-based models of spectral lines shape and baseline to generate an encoding of spectral parameters while being fully constrained within known physics. The architecture includes a series of linear operations and therefore is highly parallelizable. Multiple spectra acquired using a magnetic resonance spectroscopy system can be combined in a (2D) matrix (e.g. rows including datapoints representing each spectrum) and passed through the architecture according to the disclosure to generate independent parameter sets on each spectrum in parallel, resulting in spectral fitting and metabolite quantification (e.g., concentration or volume), for example, of the region of interest (e.g., whole-brain, an anatomic region within the brain, etc.), in substantially real-time without requiring substantial computational processing resources (e.g., graphical processing units (GPUs)). The disclosure can thereby address the computational bottleneck in processing of (in vivo) whole-brain spectroscopic imaging--, in [0039], and in view of
SHIM’s Fig. 2, and Fig. 3, as reproduced below:
[SHIM Figs. 2 and 3: media_image1.png, greyscale, 1002 × 871]
It is clearly demonstrated that SHIM’s one or more encoder arms, which generate an encoding of spectral parameters, include multiple models: a line shape model, a baseline model (θB), and spectral-parameter models such as a spectral peak model (θP) (see SHIM: e.g., -- The trained encoder may determine a plurality of peak parameters for each spectra and the decoder may be defined by line-shape equations to determine peak components for each spectrum. In some embodiments, the line-shape equations may include but is not limited to Lorentzian-Gaussian lineshape model equations.
[0055] In some embodiments, the peak components of the spectrum for each voxel may include components/datapoints representing one or more peak regions associated with one or more metabolite resonances of the one or more of the metabolite(s) to be measured. The peak components for each voxel may define the peak model of its respective spectrum and may be represented or disposed in a matrix (may be referred to as a “fourth matrix”)--, in [0054]-[0055]; and,
-- [0064] FIG. 2 shows an example 200 of the neural network architecture according to embodiments. In some embodiments, the neural network architecture 200 may include two serial encoder-decoder stages: a first (convolutional) neural network 210 for determining baseline components for each voxel/spectrum and a second (convolutional) neural network 230 for determining peak components representing peak model for each respectively voxel/spectrum. In some embodiments, the (first) matrix of the plurality of the spectra/voxels may be passed through the first neural network 210 to determine a plurality of baseline parameters/coefficients (also referred to as “wavelet coefficients”) θ.sub.B. Next, the decoder of the first neural network 210 may use the baseline parameters to determine the one or more baseline components using a wavelet reconstruction technique. As shown, the baseline components in a form of a second matrix may be subtracted from the input/first matrix (220).
[0065] Then, each baseline-corrected/baseline-subtracted spectrum may pass through the second (convolutional) neural network 230 to compute the peak parameters (also referred to as “metabolite resonance peak parameters”), θ.sub.P. In some examples, the encoder for the second neural network 230 may determine six parameters: peak amplitude A.sub.m for each metabolite m to be measured, resonance frequency ω.sub.m for each metabolite m to be measured, zero and first order phases (ϕ.sub.0 and ϕ.sub.1), and Gaussian and Lorentzian decay constants (T.sub.a and T.sub.b). For example, for evaluating glioblastoma, the metabolites (m) to be measured may include Cho, Cr, and NAA. Next, an estimate of the peak components, which may refer to metabolite resonances, {right arrow over (s)}.sub.peak, can be determined by the decoder of the network 230 using a Lorentzian-Gaussian lineshape model. Using the peak components, one or more measurements (e.g., a concentration of each metabolite resonance) may be determined.--, in [0064]-[0065]);
wherein the first three-dimensional MRI image is processed by a different encoder model than the second three-dimensional MRI image (see SHIM: e.g., SHIM discloses the first three-dimensional MRI image, and the second three-dimensional MRI image:
--[0045] In some embodiments, the MR spectroscopy data may be acquired using any available magnetic resonance system capable of acquiring MR spectroscopy data, such as a 3T MRI scanner. In some embodiments, the data may be acquired using one or more MR protocols or parameters (e.g., pulse sequence, flip angle (“FA”), RF pulse phase, TR, echo time (“TE’), sampling patterns, etc.). In some embodiments, the one or more MR protocols/parameters may be specific to the metabolite(s) to be measured and/or region(s) of interest to be scanned/imaged.--, in [0044]-[0046]; also see Figs. 8, 9, 10, 11; and, --FIG. 8, which shows the individual metabolite maps, the Cho/NAA ratio map, and corresponding contrast-enhanced T1-weighted (CE-T1w) and fluid-attenuated inversion recovery (FLAIR) MM volumes. Superimposed on the CE-T1w image is a contour drawn by a neuroradiologist to indicate contrast enhancing tissue and the surgical cavity, regions that would normally be targeted for high dose radiation therapy.--, in [0095]-[0097]);
{so that, it is clearly disclosed by SHIM that the MRI image of contrast-enhanced T1-weighted (CE-T1w), which reads on the claimed “the first three-dimensional MRI image”, is acquired using one or more MR protocols or parameters (e.g., pulse sequence … echo time (“TE”), sampling patterns, etc.); and,
the MRI image of the fluid-attenuated inversion recovery (FLAIR) MRI volumes, which reads on the claimed “the second three-dimensional MRI image”, is acquired using another one or more MR protocols or parameters (e.g., pulse sequence, flip angle (“FA”), RF pulse phase, TR, … echo time (“TE”), sampling patterns, etc.); and,
the above MRI image of contrast-enhanced T1-weighted (CE-T1w) and the MRI image of the fluid-attenuated inversion recovery (FLAIR) MRI volumes use different encoder models, since they involve different baseline, line shape, and peak models for different components and metabolites.}));
SHIM, however, does not explicitly teach concatenating the encodings of the plurality of three-dimensional MRI images to generate a concatenated representation;
GHADIMI discloses concatenating the encodings of the plurality of three-dimensional MRI images to generate a concatenated representation (see GHADIMI: e.g., -- [0090] Four symmetric encoding and decoding blocks were used in the contracting and expanding path, respectively. Each decoding block can contain two consecutive sets of deconvolutional layers with filter size 3×3, a batch normalization layer and a rectified linear activation layer. The output of each encoding block in the contracting path was concatenated with those in the corresponding decoding block in the expanding path via skip-connections. The final segmentation map can include two classes: background and endocardium or epicardium. The loss function can be the summation of the weighted pixel-wise cross entropy and soft Dice loss. The assigned class weights were 1 for background, 2 for endocardium in the endocardial network and 3 for the epicardial network. During training, data augmentation on-the-fly was performed by applying random translations, rotations and scaling followed by a b-spline-based deformation to the input images and to the corresponding ground-truth label maps at each iteration. This type of augmentation has the advantage that the model sees different data at each iteration. The use of other network configurations, including networks with different numbers of layers, different filter sizes, stride numbers and dilation rates, is contemplated by the present disclosure, and the above are intended only as non-limiting examples of network parameters that can be used for segmentation.--, in [0089]-[0090]);
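For illustration only (the function and variable names are hypothetical, not GHADIMI's code), the channel-wise concatenation of encodings, in the spirit of GHADIMI's skip-connection concatenation of encoder-block outputs with the corresponding decoder-block inputs, can be sketched as:

```python
# Hypothetical sketch: feature vectors produced by multiple encoder arms are
# concatenated channel-wise into a single combined representation, analogous
# to the concatenation via skip-connections quoted from GHADIMI [0090].

def concatenate_encodings(encodings):
    """Concatenate per-image feature vectors into one representation."""
    out = []
    for enc in encodings:
        out.extend(enc)
    return out

enc_t1 = [0.1, 0.9]   # toy encoding from a first encoder arm (e.g., CE-T1w)
enc_flair = [0.4]     # toy encoding from a second encoder arm (e.g., FLAIR)
print(concatenate_encodings([enc_t1, enc_flair]))
```

The combined vector preserves every channel from every arm, so downstream decoding layers can draw on features from all input images at once.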
SHIM and GHADIMI are combinable, as they are in the same field of endeavor: analysis of magnetic resonance image data with neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify SHIM’s method using GHADIMI’s teachings by including concatenating the encodings of the plurality of three-dimensional MRI images to generate a concatenated representation in SHIM’s MRI image encoding-decoding processing, in order to combine two or more components to produce a final segmentation (see GHADIMI: e.g., [0089]-[0090]);
SHIM as modified by GHADIMI, however, still does not explicitly disclose processing the concatenated representation using a decoder arm of the machine-learning model;
KOCH discloses processing the concatenated representation using a decoder arm of the machine-learning model (see KOCH: e.g., -- (46) The decoding path 504 generally includes a transposed convolutional layer with additional feature concatenation layers from the encoding layers. The output of each transposed convolutional layer is a feature map that is passed to the concatenation layer. At each transposed convolution step, the number of feature channels in the feature map can be halved, or otherwise reduced. As noted, each upsampled feature map is also concatenated with the corresponding feature map from the encoding path 502. The concatenated feature map is then passed to a convolutional layer followed by a batch normalization layer and a nonlinear layer (e.g., a ReLU). The output of the convolutional layer is a feature map that is passed to the batch normalization layer, the output of which is a feature map passed to the nonlinear layer. The final layer is a convolutional layer (e.g., a 1×1×1 convolution) with linear activation, which is applied to output the susceptibility estimations.
(47) As one example, the encoder-decoder network shown in FIG. 5 was trained using an L1 loss function and ADAM optimizer. The initial learning rate in this example was set as 0.001, with an exponential decay of the rate occurring at every 200 steps with a time constant of 0.9. Due to the deep nature of this neural network and the heavy computation required to train it, 10,000 datasets were used for training each model with batch size 8.
(48) Due to the different target QSM map resolutions for each of the test scenarios, two encoder-decoder network models were trained in this example. For a dataset with voxel size isotropic 1.06 mm, the network was trained to independently invert 3D parcels of 128×128×128 voxels. For another dataset with voxel size 0.5×0.5×2.0 mm.sup.3, the network was trained to invert 3D parcels of 192×192×64 voxels. In the prediction stage, segmentation of full-resolution input volumes into the parcels with parcel size equal to neural network input data size was performed using 32×32×32 overlap regions. After QSM inference using the trained encoder-decoder networks, the parcels were combined to form a composite image.--, in line 32, col. 10 through line 27, col. 11;
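The parcel-based prediction quoted from KOCH (48), segmenting a full-resolution input into fixed-size parcels with overlap regions, inverting each parcel independently, and recombining the parcels into a composite image, can be sketched in one dimension as follows (all helper names are hypothetical):

```python
# One-dimensional, hypothetical sketch of KOCH's parcel-based inference:
# split with overlap, process parcels independently, recombine.

def split_into_parcels(volume, parcel, overlap):
    """Segment a volume into fixed-size parcels sharing `overlap` samples."""
    step = parcel - overlap
    return [volume[i:i + parcel] for i in range(0, len(volume) - overlap, step)]

def recombine(parcels, overlap):
    """Stitch parcels back together, keeping one copy of each overlap region."""
    out = list(parcels[0])
    for p in parcels[1:]:
        out.extend(p[overlap:])
    return out

vol = list(range(10))  # stand-in for a full-resolution input volume
parcels = split_into_parcels(vol, parcel=4, overlap=2)
print(parcels)
print(recombine(parcels, overlap=2))
```

In KOCH the parcels are 3D (e.g., 128×128×128 voxels with 32×32×32 overlaps) and each is passed through the trained encoder-decoder network before recombination; the 1D sketch shows only the tiling arithmetic.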
In addition, KOCH’s disclosures are also directed to brain MRI image processing. KOCH particularly demonstrates the processing of MRI images and the encoding-path/decoding-path configurations based on multiple paths and connections that further concatenate features extracted from the multiple encoding paths from brain MRI spectra (such as a first MRI image, a second MRI image, etc.), as demonstrated in blocks 32 and 34 of Fig. 3, and in the feature concatenations to the decoding paths in Fig. 5.
And see KOCH’s Fig. 3, and Fig. 5 reproduced below:
[KOCH Fig. 3: media_image2.png, greyscale, 750 × 835]
[KOCH Fig. 5: media_image3.png, greyscale, 1202 × 781]
);
SHIM (as modified by GHADIMI) and KOCH are combinable, as they are in the same field of endeavor: analysis of magnetic resonance image data with neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the method of SHIM (as modified by GHADIMI) using KOCH’s teachings by including processing the concatenated representation using a decoder arm of the machine-learning model in the MRI image encoding-decoding processing of SHIM (as modified by GHADIMI), in order to combine feature maps to produce a final segmentation (see KOCH: e.g., line 32, col. 10 through line 27, col. 11);
SHIM as modified by GHADIMI and KOCH further disclose processing the concatenated representation using a decoder arm of the machine-learning model to generate a prediction that identifies one or more portions of the volume of the brain predicted to depict at least part of a lesion (see SHIM: e.g., Fig. 2, --[0050] The first neural network may include a trained encoder and a decoder. The trained encoder may determine a plurality of baseline parameters for each spectrum/voxel and the decoder may be defined by a mathematical technique to convert/determine the baseline component(s) (or datapoints) using the baseline parameters for each spectra. The mathematical technique may include but is not limited to one or more wavelet reconstruction equations. In some embodiments, the baseline parameters can represent local and non-local oscillations in the signal, enabling modeling by the decoder of the overall shape of the spectrum not including the metabolite peaks.--, in [0050], and, --[0064] FIG. 2 shows an example 200 of the neural network architecture according to embodiments. In some embodiments, the neural network architecture 200 may include two serial encoder-decoder stages: a first (convolutional) neural network 210 for determining baseline components for each voxel/spectrum and a second (convolutional) neural network 230 for determining peak components representing peak model for each respectively voxel/spectrum. In some embodiments, the (first) matrix of the plurality of the spectra/voxels may be passed through the first neural network 210 to determine a plurality of baseline parameters/coefficients (also referred to as “wavelet coefficients”) θ.sub.B. Next, the decoder of the first neural network 210 may use the baseline parameters to determine the one or more baseline components using a wavelet reconstruction technique. 
As shown, the baseline components in a form of a second matrix may be subtracted from the input/first matrix (220).--, in [0064], [0079], and [0103]., and, -- the Cho-NAA-index (McKnight T R, von dem Bussche M H, Vigneron D B, Lu Y, Berger M S, McDermott M W, Dillon W P, Graves E E, Pirzkall A, Nelson S J. Histopathological validation of a three-dimensional magnetic resonance spectroscopy index as a predictor of tumor presence. Journal of Neurosurgery 2002; 97(4):794-802)… the method 100 may include a step 160 of outputting the one or more measurements of one or more metabolites for the region of interest, for example, for further processing, transmitting, and/or storing (e.g., an electronic record system such as PACs). In some embodiments, the outputting may include registering the metabolite measurement(s) with clinical anatomical MRI data, for example, to generate a map. For example, volumes can be co-registered using a rigid transformation and resampled (e.g., using trilinear interpolation into a high-resolution T1w image space), enabling overlays of the metabolic measurement(s) (information) onto the anatomic MRI. This map can enable a visual assessment of metabolic changes in spatially dependent manner. In some examples, a clinician can select a voxel on the map to bring up the corresponding spectrum.
[0062] In some embodiments, the one or more measurements may be used within a clinical workflow. For example, the one or more measurements may be used to generate a radiation plan for radiation therapy, target planning for surgical biopsy and/or resection, diagnosis indication of a medical condition (e.g., tumor, brain traumatic injury, etc.), among others, or a combination thereof. For example, for radiation therapy planning, the one or more measurements may be integrated with clinical 3D MRI volumes, enabling clinicians to evaluate relevant metabolite levels and the underlying spectra used for this quantitation, to delineate target volumes for radiation therapy planning based on this information--, in [0060]-[0064]; and, -- FIG. 6D shows the correlation between the Cho/NAA ratio, calculated by CEMD and MIDAS for the testing set. The solid lines plot the mean value between the two fitting techniques for each bin of values, and the shaded region around the line indicates +/−1 standard deviation. Overlaid in light gray are the histograms for the distribution of metabolite values computed by MIDAS. The variance of CEMD predictions compared to MIDAS is inversely correlated with the number of training samples available--, in [0093], and, -- The CEMD is an unsupervised deep learning architecture that incorporates spectral models to generate an encoding of spectral parameters, which is advantageous because it does not require any “ground truth” spectral quantitation for training. The predictions of the CEMD have contextual meaning, and the CEMD was trained to make these predictions within the constraints of an explicitly defined spectral model. Once trained, the CEMD performs spectral fitting on volumetric data in under one minute using standard computer hardware. 
The order of magnitude improvement in fitting time can greatly benefit the clinical adoption of whole-brain MRSI.--, in [0098]-[0101]; and see KOCH: e.g., -- (46) The decoding path 504 generally includes a transposed convolutional layer with additional feature concatenation layers from the encoding layers. The output of each transposed convolutional layer is a feature map that is passed to the concatenation layer. At each transposed convolution step, the number of feature channels in the feature map can be halved, or otherwise reduced. As noted, each upsampled feature map is also concatenated with the corresponding feature map from the encoding path 502. The concatenated feature map is then passed to a convolutional layer followed by a batch normalization layer and a nonlinear layer (e.g., a ReLU). The output of the convolutional layer is a feature map that is passed to the batch normalization layer, the output of which is a feature map passed to the nonlinear layer. The final layer is a convolutional layer (e.g., a 1×1×1 convolution) with linear activation, which is applied to output the susceptibility estimations.
(47) As one example, the encoder-decoder network shown in FIG. 5 was trained using an L1 loss function and ADAM optimizer. The initial learning rate in this example was set as 0.001, with an exponential decay of the rate occurring at every 200 steps with a time constant of 0.9. Due to the deep nature of this neural network and the heavy computation required to train it, 10,000 datasets were used for training each model with batch size 8.
(48) Due to the different target QSM map resolutions for each of the test scenarios, two encoder-decoder network models were trained in this example. For a dataset with voxel size isotropic 1.06 mm, the network was trained to independently invert 3D parcels of 128×128×128 voxels. For another dataset with voxel size 0.5×0.5×2.0 mm.sup.3, the network was trained to invert 3D parcels of 192×192×64 voxels. In the prediction stage, segmentation of full-resolution input volumes into the parcels with parcel size equal to neural network input data size was performed using 32×32×32 overlap regions. After QSM inference using the trained encoder-decoder networks, the parcels were combined to form a composite image.--, in line 32, col. 10 through line 27, col. 11;
also see GHADIMI: e.g. in [0089]-[0090] {for “the concatenated representation” such as generating a segmentation map}).
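To illustrate the claimed arrangement in which each three-dimensional MRI image is processed by a different encoder model and the resulting encodings are concatenated, a minimal numerical sketch follows. The sketch is hypothetical and not taken from SHIM, GHADIMI, or KOCH; the 1×1×1-convolution encoders, channel counts, and volume sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(in_ch, out_ch):
    """Return a tiny 'encoder model' as an independent 1x1x1 convolution.

    Each call creates its own weights, so two inputs can be routed through
    two *different* encoder models, as the claim language requires. Names
    and shapes are illustrative, not from the cited references.
    """
    w = rng.standard_normal((out_ch, in_ch)) * 0.1
    def encode(x):  # x: (in_ch, D, H, W) -> (out_ch, D, H, W)
        return np.einsum('oi,idhw->odhw', w, x)
    return encode

# Two 3D MRI volumes (e.g., two sequence types), one channel each.
vol_a = rng.standard_normal((1, 4, 4, 4))
vol_b = rng.standard_normal((1, 4, 4, 4))

# A plurality of encoder models: a distinct encoder per input volume.
enc_a = make_encoder(1, 8)
enc_b = make_encoder(1, 8)

feat_a = enc_a(vol_a)  # first image, first encoder model
feat_b = enc_b(vol_b)  # second image, a different encoder model

# Channel-wise concatenation yields the joint representation that a
# shared decoder arm would consume.
concatenated = np.concatenate([feat_a, feat_b], axis=0)
print(concatenated.shape)  # (16, 4, 4, 4)
```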
Re Claim 2, SHIM as modified by GHADIMI further disclose generating, for each three-dimensional MRI image of the plurality of three-dimensional MRI images, a downsampled encoding having a resolution that is lower than a resolution of the encoding of the three-dimensional MRI image (see GHADIMI: e.g., -- Padding can be used in each convolutional operation to maintain the spatial dimension. Between each encoding block, pooling layers with step size of 3×3 and stride 2 were applied to reduce the spatial dimension in all directions. The number of features can be doubled for the next encoding block.--, in [0089], and, -- Cine DENSE image acquisition parameters including a pixel size of 1.56×1.56 mm.sup.2-2.8×2.8 mm.sup.2, FOV=200 mm.sup.2 (using outer volume suppression) to 360 mm.sup.2, slice thickness=8 mm, a temporal resolution of 17 msec (with view sharing), 2D in-plane displacement encoding using the simple three-point method.sup.30, displacement-encoding frequency=0.1 cycles/mm, ramped flip angle with final flip angle of 15°, echo time=1.26-1.9 msec, and a spiral k-space trajectory with 4-6 interleaves.--, in [0100]; also see KOCH: e.g., -- The encoding path 502 generally implements a convolutional neural network. For instance, the encoding path 502 can include repeated application of convolutional layers (e.g., 3×3×3 convolutions) each followed by a batch normalization layer and a nonlinear layer, which may be a rectified linear unit (“ReLU”). The output of each convolutional layer is a feature map that is passed to the nonlinear layer. Each feature map generally represents a particular feature extracted at all locations on the input. Each nonlinear layer is followed by a downsampling layer, which may be a pooling layer, such as a max pooling layer (e.g., a max pooling layer using stride 2×2×2), an average pooling layer, an L2-norm pooling layer, or so on. The output of each nonlinear layer is a feature map that is passed to the downsampling layer. 
At each downsampling step, the number of feature channels in the feature map can be doubled, or otherwise increased.--, in lines 15-31, col. 10);
processing, for each three-dimensional MRI image of the plurality of three-dimensional MRI images, the downsampled encoding using one or more layers of the one or more corresponding encoding arms; and concatenating the downsampled encodings to generate another concatenated representation, wherein the prediction is further based on processing of the another concatenated representation using the decoder arm of the machine-learning model (see KOCH: e.g., -- The encoding path 502 generally implements a convolutional neural network. For instance, the encoding path 502 can include repeated application of convolutional layers (e.g., 3×3×3 convolutions) each followed by a batch normalization layer and a nonlinear layer, which may be a rectified linear unit (“ReLU”). The output of each convolutional layer is a feature map that is passed to the nonlinear layer. Each feature map generally represents a particular feature extracted at all locations on the input. Each nonlinear layer is followed by a downsampling layer, which may be a pooling layer, such as a max pooling layer (e.g., a max pooling layer using stride 2×2×2), an average pooling layer, an L2-norm pooling layer, or so on. The output of each nonlinear layer is a feature map that is passed to the downsampling layer. At each downsampling step, the number of feature channels in the feature map can be doubled, or otherwise increased.--, in lines 15-31, col. 10; also see GHADIMI: e.g., -- Padding can be used in each convolutional operation to maintain the spatial dimension. Between each encoding block, pooling layers with step size of 3×3 and stride 2 were applied to reduce the spatial dimension in all directions. The number of features can be doubled for the next encoding block.--, in [0089], and, -- [0090] Four symmetric encoding and decoding blocks were used in the contracting and expanding path, respectively. 
Each decoding block can contain two consecutive sets of deconvolutional layers with filter size 3×3, a batch normalization layer and a rectified linear activation layer. The output of each encoding block in the contracting path was concatenated with those in the corresponding decoding block in the expanding path via skip-connections. The final segmentation map can include two classes: background and endocardium or epicardium. The loss function can be the summation of the weighted pixel-wise cross entropy and soft Dice loss. The assigned class weights were 1 for background, 2 for endocardium in the endocardial network and 3 for the epicardial network. During training, data augmentation on-the-fly was performed by applying random translations, rotations and scaling followed by a b-spline-based deformation to the input images and to the corresponding ground-truth label maps at each iteration. This type of augmentation has the advantage that the model sees different data at each iteration. The use of other network configurations, including networks with different numbers of layers, different filter sizes, stride numbers and dilation rates, is contemplated by the present disclosure, and the above are intended only as non-limiting examples of network parameters that can be used for segmentation.--, in [0089]-[0090]).
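The downsampling described in the cited passages (pooling with stride 2 between blocks, with feature channels doubling as the path deepens) can be sketched numerically as follows. This is an illustrative sketch, not code from the cited references; the array sizes are arbitrary.

```python
import numpy as np

def max_pool_2x(x):
    """2x2x2 max pooling with stride 2 over a (C, D, H, W) volume."""
    c, d, h, w = x.shape
    return x.reshape(c, d // 2, 2, h // 2, 2, w // 2, 2).max(axis=(2, 4, 6))

rng = np.random.default_rng(1)
enc_a = rng.standard_normal((8, 8, 8, 8))  # encoding of first 3D image
enc_b = rng.standard_normal((8, 8, 8, 8))  # encoding of second 3D image

# Downsampled encodings: spatial dimensions halved, i.e. a resolution
# lower than that of the full-resolution encodings above.
down_a = max_pool_2x(enc_a)
down_b = max_pool_2x(enc_b)
print(down_a.shape)  # (8, 4, 4, 4)

# Concatenating the downsampled encodings gives the "another
# concatenated representation" consumed deeper in the network.
deep_concat = np.concatenate([down_a, down_b], axis=0)
print(deep_concat.shape)  # (16, 4, 4, 4)
```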
Re Claim 3, SHIM as modified by GHADIMI and KOCH further disclose wherein the machine-learning model includes a U-Net machine-learning model (see GHADIMI: e.g., -- [0031] FIG. 1C is a schematic diagram of an example convolutional neural network (CNN) according to embodiments of the present disclosure. The CNN is a U-Net that used dilated convolutions of rate 2 in the contracting path, a pixel-wise cross-entropy loss function, Adam optimizer (learning rate of 5E-4, a mini batch size of 10), dropout rate of 0.5, and epochs of 200. Brown/Gold arrows represent convolutions of 3×3+batch normalization+ReLU; the blue arrows represent pooling of 3×3 with a stride of 2, the red arrows represent deconvolutions of 2×2, the purple/pink arrows from one side of the U-Net to the other are concatenations, and the final light orange arrow represents convolutions of 1×1+Softmax for the right side output.--, in [0031]-[0032]; and, --The anterior RV-LV insertion point is the location of the attachment of the anterior RV wall to the LV, and its location defines the alignment of the American Heart Association 16-segment model.sup.16 which can be used for segmental strain analysis of the LV. As the first frame of cine DENSE images can have poor blood-myocardium contrast, a U-Net is trained to detect the anterior RV-LV insertion point on early-systolic frames (e.g. frames 5 and 6), where the insertion point is reliably well visualized. To create the ground-truth data, an expert user can identify one point in these frames from magnitude-reconstructed DENSE images. During network training, instead of using that point as an absolute ground-truth, which only provides very limited information to the network to learn and suffers from severe class imbalance, a circle with a six-pixel radius around that point can be defined as the network target. 
The network's inputs were the DENSE magnitude image and the segmented LV binary mask obtained by the aforementioned myocardial segmentation networks as an additional input channel. The network's output is the probability map of a circle for which the center of mass is defined to be the detected RV-LV insertion point. The same aforementioned U-Net structure can be used.--, in [0092]).
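The symmetric contracting/expanding behavior of the U-Net cited above (feature channels doubling at each pooling step, halving at each deconvolution) can be traced at the level of shapes. This is a hypothetical sketch; the level count and channel widths are illustrative, not GHADIMI's.

```python
# Trace spatial size and channel count through a U-Net-style
# contracting/expanding path: pooling with stride 2 halves the spatial
# size and doubles the features; deconvolution does the reverse.
def unet_shapes(size, channels, levels):
    down = []
    s, c = size, channels
    for _ in range(levels):
        down.append((c, s))
        s, c = s // 2, c * 2   # pool stride 2, double features
    bottleneck = (c, s)
    up = []
    for _ in range(levels):
        s, c = s * 2, c // 2   # deconvolution, halve features
        up.append((c, s))
    return down, bottleneck, up

down, mid, up = unet_shapes(size=64, channels=16, levels=3)
print(down)  # [(16, 64), (32, 32), (64, 16)]
print(mid)   # (128, 8)
print(up)    # [(64, 16), (32, 32), (16, 64)]
```

The expanding path mirrors the contracting path exactly, which is what makes the same-resolution concatenations via skip-connections possible.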
Re Claim 4, SHIM as modified by GHADIMI and KOCH further disclose wherein the machine-learning model includes one or more skip attention modules, each of the one or more skip attention modules connecting an encoding block of the encoding arms of the machine-learning model to a decoder block of the decoder arm at a same resolution (see SHIM: e.g., -- a machine learning approach to spectral fitting is described that can perform sub-minute calculation of relative metabolite concentrations in MRSI of the brain. A convolutional encoder-model decoder technique has been implemented that explicitly incorporates a parametric spectral model with the power of unsupervised feature-learning to produce fast spectral fittings that are constrained by the model. This can be a powerful paradigm that does not require a priori ground truth and relies upon spectral lineshape and baseline models to optimize the underlying convolutional neural network parameters. The CEMD architecture can produce accurate fitting of a variety of spectra acquired from multiple scanners in patients with glioblastoma, including correctly fitting challenging spectra with low SNR, partial volume effects, baseline shifts, phase shifts, and dropout of one or more metabolite resonances. The CEMD can fit whole-brain data on a standard multicore computer without the need for expensive workstations or GPUs, in less than one minute. With this new autoencoder-based neural network, the largest computational bottleneck in processing MRSI can be overcome, bringing improved performance that will support the implementation of MRS for more widespread clinical use.--, in [0103], and also see GHADIMI: e.g., -- the final model of each network was trained using data from 64 subjects. Network training was performed on an Nvidia Titan Xp GPU with 12 GB RAM over 400 epochs using an Adam optimizer at a learning rate of 5E-4 and a mini batch size of 10. 
The times to train the myocardial segmentation networks (endocardium and epicardium), identifying the RV-LV insertion point network, and using the myocardial segmentation for the phase unwrapping network were 34, 48, and 30 hours, respectively. The networks were implemented using Python (version 3.5; Python Software Foundation, www.python.org) with the Tensorflow machine-learning framework (version 1.12.0).sup.37.--, in [0102], and [0117], and, -- [0090] Four symmetric encoding and decoding blocks were used in the contracting and expanding path, respectively. Each decoding block can contain two consecutive sets of deconvolutional layers with filter size 3×3, a batch normalization layer and a rectified linear activation layer. The output of each encoding block in the contracting path was concatenated with those in the corresponding decoding block in the expanding path via skip-connections. The final segmentation map can include two classes: background and endocardium or epicardium. The loss function can be the summation of the weighted pixel-wise cross entropy and soft Dice loss. The assigned class weights were 1 for background, 2 for endocardium in the endocardial network and 3 for the epicardial network. During training, data augmentation on-the-fly was performed by applying random translations, rotations and scaling followed by a b-spline-based deformation to the input images and to the corresponding ground-truth label maps at each iteration. This type of augmentation has the advantage that the model sees different data at each iteration. 
The use of other network configurations, including networks with different numbers of layers, different filter sizes, stride numbers and dilation rates, is contemplated by the present disclosure, and the above are intended only as non-limiting examples of network parameters that can be used for segmentation.--, in [0089]-[0090]; also see KOCH: e.g., -- (44) As noted above, in some instances the trained neural network can be an encoder-decoder neural network, such as a three-dimensional encoder-decoder deep neural network. An example of such a neural network is shown in FIG. 5. The data input to the network 500 are the local field shift maps and the data output are estimates of the source magnetic susceptibility tensor at each input voxel. In this example, the applied encoder-decoder network architecture utilizes skip connections between the encoding path 502 and decoding path 504, which can effectively transfer local feature information from the encoding path to the decoding path and facilitate faster training.--, in lines 3-14, col. 10).
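The cited skip connections pair an encoding block with the decoding block operating at the same resolution. A minimal sketch follows, with nearest-neighbour upsampling standing in for KOCH's transposed convolution; the shapes are illustrative.

```python
import numpy as np

def upsample_2x(x):
    """Nearest-neighbour 2x upsampling of a (C, D, H, W) volume, a
    stand-in for the transposed convolution in the decoding path."""
    return x.repeat(2, axis=1).repeat(2, axis=2).repeat(2, axis=3)

rng = np.random.default_rng(2)
encoder_feat = rng.standard_normal((16, 8, 8, 8))  # from an encoding block
decoder_feat = rng.standard_normal((32, 4, 4, 4))  # from a deeper layer

# The skip connection pairs blocks at the *same* resolution: the deeper
# feature is upsampled to 8^3 and then concatenated with the encoder
# feature map at that resolution.
up = upsample_2x(decoder_feat)
assert up.shape[1:] == encoder_feat.shape[1:]  # resolutions now match
skip = np.concatenate([encoder_feat, up], axis=0)
print(skip.shape)  # (48, 8, 8, 8)
```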
Re Claim 5, SHIM as modified by GHADIMI and KOCH further disclose wherein each skip attention module of the skip attention modules receives an input of the concatenated representation and an upsampled encoding of the another concatenated representation at the resolution of the three-dimensional MRI image, and wherein the prediction is further based on processing an output of skip-feature encodings from the skip attention modules using the decoder arm of the machine-learning model (see SHIM: e.g., -- a machine learning approach to spectral fitting is described that can perform sub-minute calculation of relative metabolite concentrations in MRSI of the brain. A convolutional encoder-model decoder technique has been implemented that explicitly incorporates a parametric spectral model with the power of unsupervised feature-learning to produce fast spectral fittings that are constrained by the model. This can be a powerful paradigm that does not require a priori ground truth and relies upon spectral lineshape and baseline models to optimize the underlying convolutional neural network parameters. The CEMD architecture can produce accurate fitting of a variety of spectra acquired from multiple scanners in patients with glioblastoma, including correctly fitting challenging spectra with low SNR, partial volume effects, baseline shifts, phase shifts, and dropout of one or more metabolite resonances. The CEMD can fit whole-brain data on a standard multicore computer without the need for expensive workstations or GPUs, in less than one minute. With this new autoencoder-based neural network, the largest computational bottleneck in processing MRSI can be overcome, bringing improved performance that will support the implementation of MRS for more widespread clinical use.--, in [0103], and also see GHADIMI: e.g., -- the final model of each network was trained using data from 64 subjects. 
Network training was performed on an Nvidia Titan Xp GPU with 12 GB RAM over 400 epochs using an Adam optimizer at a learning rate of 5E-4 and a mini batch size of 10. The times to train the myocardial segmentation networks (endocardium and epicardium), identifying the RV-LV insertion point network, and using the myocardial segmentation for the phase unwrapping network were 34, 48, and 30 hours, respectively. The networks were implemented using Python (version 3.5; Python Software Foundation, www.python.org) with the Tensorflow machine-learning framework (version 1.12.0).sup.37.--, in [0102], and [0117], and, -- [0090] Four symmetric encoding and decoding blocks were used in the contracting and expanding path, respectively. Each decoding block can contain two consecutive sets of deconvolutional layers with filter size 3×3, a batch normalization layer and a rectified linear activation layer. The output of each encoding block in the contracting path was concatenated with those in the corresponding decoding block in the expanding path via skip-connections. The final segmentation map can include two classes: background and endocardium or epicardium. The loss function can be the summation of the weighted pixel-wise cross entropy and soft Dice loss. The assigned class weights were 1 for background, 2 for endocardium in the endocardial network and 3 for the epicardial network. During training, data augmentation on-the-fly was performed by applying random translations, rotations and scaling followed by a b-spline-based deformation to the input images and to the corresponding ground-truth label maps at each iteration. This type of augmentation has the advantage that the model sees different data at each iteration. 
The use of other network configurations, including networks with different numbers of layers, different filter sizes, stride numbers and dilation rates, is contemplated by the present disclosure, and the above are intended only as non-limiting examples of network parameters that can be used for segmentation.--, in [0089]-[0090]; also see KOCH: e.g., -- (44) As noted above, in some instances the trained neural network can be an encoder-decoder neural network, such as a three-dimensional encoder-decoder deep neural network. An example of such a neural network is shown in FIG. 5. The data input to the network 500 are the local field shift maps and the data output are estimates of the source magnetic susceptibility tensor at each input voxel. In this example, the applied encoder-decoder network architecture utilizes skip connections between the encoding path 502 and decoding path 504, which can effectively transfer local feature information from the encoding path to the decoding path and facilitate faster training.--, in lines 3-14, col. 10; and, -- (46) The decoding path 504 generally includes a transposed convolutional layer with additional feature concatenation layers from the encoding layers. The output of each transposed convolutional layer is a feature map that is passed to the concatenation layer. At each transposed convolution step, the number of feature channels in the feature map can be halved, or otherwise reduced. As noted, each upsampled feature map is also concatenated with the corresponding feature map from the encoding path 502. The concatenated feature map is then passed to a convolutional layer followed by a batch normalization layer and a nonlinear layer (e.g., a ReLU). The output of the convolutional layer is a feature map that is passed to the batch normalization layer, the output of which is a feature map passed to the nonlinear layer. 
The final layer is a convolutional layer (e.g., a 1×1×1 convolution) with linear activation, which is applied to output the susceptibility estimations.
(47) As one example, the encoder-decoder network shown in FIG. 5 was trained using an L1 loss function and ADAM optimizer. The initial learning rate in this example was set as 0.001, with an exponential decay of the rate occurring at every 200 steps with a time constant of 0.9. Due to the deep nature of this neural network and the heavy computation required to train it, 10,000 datasets were used for training each model with batch size 8.
(48) Due to the different target QSM map resolutions for each of the test scenarios, two encoder-decoder network models were trained in this example. For a dataset with voxel size isotropic 1.06 mm, the network was trained to independently invert 3D parcels of 128×128×128 voxels. For another dataset with voxel size 0.5×0.5×2.0 mm.sup.3, the network was trained to invert 3D parcels of 192×192×64 voxels. In the prediction stage, segmentation of full-resolution input volumes into the parcels with parcel size equal to neural network input data size was performed using 32×32×32 overlap regions. After QSM inference using the trained encoder-decoder networks, the parcels were combined to form a composite image.--, in line 32, col. 10 through line 27, col. 11).
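The cited references show plain skip connections; the claimed skip attention module is not spelled out in them. Purely as a hypothetical illustration of such a module, the sketch below gates the full-resolution (concatenated) features with an additive attention map computed from the upsampled deeper representation, in the style of an Attention U-Net. All names, weights, and shapes are assumptions, not material from SHIM, GHADIMI, or KOCH.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def skip_attention(skip_feat, gating_feat, rng):
    """Attention-gated skip connection sketch (Attention-U-Net style).

    `skip_feat` plays the role of the concatenated representation at full
    resolution; `gating_feat` is the upsampled encoding of the deeper
    ("another concatenated") representation. Hypothetical illustration.
    """
    c = skip_feat.shape[0]
    w_s = rng.standard_normal((c, skip_feat.shape[0])) * 0.1
    w_g = rng.standard_normal((c, gating_feat.shape[0])) * 0.1
    # Additive attention: per-voxel weights in (0, 1).
    att = sigmoid(np.einsum('oi,idhw->odhw', w_s, skip_feat)
                  + np.einsum('oi,idhw->odhw', w_g, gating_feat))
    return att * skip_feat  # skip-feature encodings passed to the decoder

rng = np.random.default_rng(3)
skip_feat = rng.standard_normal((8, 4, 4, 4))
gate_feat = rng.standard_normal((6, 4, 4, 4))  # already upsampled to 4^3
out = skip_attention(skip_feat, gate_feat, rng)
print(out.shape)  # (8, 4, 4, 4)
```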
Re Claim 6, SHIM as modified by GHADIMI and KOCH further disclose wherein the one or more skip attention modules include a residual connection between the input and the output of the skip attention module to facilitate skipping the skip attention module if relevant high-dimensional features are unavailable (see SHIM: e.g., -- a machine learning approach to spectral fitting is described that can perform sub-minute calculation of relative metabolite concentrations in MRSI of the brain. A convolutional encoder-model decoder technique has been implemented that explicitly incorporates a parametric spectral model with the power of unsupervised feature-learning to produce fast spectral fittings that are constrained by the model. This can be a powerful paradigm that does not require a priori ground truth and relies upon spectral lineshape and baseline models to optimize the underlying convolutional neural network parameters. The CEMD architecture can produce accurate fitting of a variety of spectra acquired from multiple scanners in patients with glioblastoma, including correctly fitting challenging spectra with low SNR, partial volume effects, baseline shifts, phase shifts, and dropout of one or more metabolite resonances. The CEMD can fit whole-brain data on a standard multicore computer without the need for expensive workstations or GPUs, in less than one minute. With this new autoencoder-based neural network, the largest computational bottleneck in processing MRSI can be overcome, bringing improved performance that will support the implementation of MRS for more widespread clinical use.--, in [0103], and also see GHADIMI: e.g., -- the final model of each network was trained using data from 64 subjects. Network training was performed on an Nvidia Titan Xp GPU with 12 GB RAM over 400 epochs using an Adam optimizer at a learning rate of 5E-4 and a mini batch size of 10. 
The times to train the myocardial segmentation networks (endocardium and epicardium), identifying the RV-LV insertion point network, and using the myocardial segmentation for the phase unwrapping network were 34, 48, and 30 hours, respectively. The networks were implemented using Python (version 3.5; Python Software Foundation, www.python.org) with the Tensorflow machine-learning framework (version 1.12.0).sup.37.--, in [0102], and [0117], and, -- [0090] Four symmetric encoding and decoding blocks were used in the contracting and expanding path, respectively. Each decoding block can contain two consecutive sets of deconvolutional layers with filter size 3×3, a batch normalization layer and a rectified linear activation layer. The output of each encoding block in the contracting path was concatenated with those in the corresponding decoding block in the expanding path via skip-connections. The final segmentation map can include two classes: background and endocardium or epicardium. The loss function can be the summation of the weighted pixel-wise cross entropy and soft Dice loss. The assigned class weights were 1 for background, 2 for endocardium in the endocardial network and 3 for the epicardial network. During training, data augmentation on-the-fly was performed by applying random translations, rotations and scaling followed by a b-spline-based deformation to the input images and to the corresponding ground-truth label maps at each iteration. This type of augmentation has the advantage that the model sees different data at each iteration. 
The use of other network configurations, including networks with different numbers of layers, different filter sizes, stride numbers and dilation rates, is contemplated by the present disclosure, and the above are intended only as non-limiting examples of network parameters that can be used for segmentation.--, in [0089]-[0090]; also see KOCH: e.g., -- (44) As noted above, in some instances the trained neural network can be an encoder-decoder neural network, such as a three-dimensional encoder-decoder deep neural network. An example of such a neural network is shown in FIG. 5. The data input to the network 500 are the local field shift maps and the data output are estimates of the source magnetic susceptibility tensor at each input voxel. In this example, the applied encoder-decoder network architecture utilizes skip connections between the encoding path 502 and decoding path 504, which can effectively transfer local feature information from the encoding path to the decoding path and facilitate faster training.--, in lines 3-14, col. 10; and, -- (46) The decoding path 504 generally includes a transposed convolutional layer with additional feature concatenation layers from the encoding layers. The output of each transposed convolutional layer is a feature map that is passed to the concatenation layer. At each transposed convolution step, the number of feature channels in the feature map can be halved, or otherwise reduced. As noted, each upsampled feature map is also concatenated with the corresponding feature map from the encoding path 502. The concatenated feature map is then passed to a convolutional layer followed by a batch normalization layer and a nonlinear layer (e.g., a ReLU). The output of the convolutional layer is a feature map that is passed to the batch normalization layer, the output of which is a feature map passed to the nonlinear layer. 
The final layer is a convolutional layer (e.g., a 1×1×1 convolution) with linear activation, which is applied to output the susceptibility estimations.
(47) As one example, the encoder-decoder network shown in FIG. 5 was trained using an L1 loss function and ADAM optimizer. The initial learning rate in this example was set as 0.001, with an exponential decay of the rate occurring at every 200 steps with a time constant of 0.9. Due to the deep nature of this neural network and the heavy computation required to train it, 10,000 datasets were used for training each model with batch size 8.
(48) Due to the different target QSM map resolutions for each of the test scenarios, two encoder-decoder network models were trained in this example. For a dataset with voxel size isotropic 1.06 mm, the network was trained to independently invert 3D parcels of 128×128×128 voxels. For another dataset with voxel size 0.5×0.5×2.0 mm.sup.3, the network was trained to invert 3D parcels of 192×192×64 voxels. In the prediction stage, segmentation of full-resolution input volumes into the parcels with parcel size equal to neural network input data size was performed using 32×32×32 overlap regions. After QSM inference using the trained encoder-decoder networks, the parcels were combined to form a composite image.--, in line 32, col. 10 through line 27, col. 11).
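The claimed residual connection, which lets the input pass through unchanged when the module has no relevant high-dimensional features to contribute, can be illustrated in a few lines. This is a hypothetical sketch; the cited references describe skip connections rather than this exact residual arrangement.

```python
import numpy as np

def residual_wrap(x, module):
    """Residual connection around a module: out = x + module(x).

    If the module's contribution collapses to zero (no relevant
    high-dimensional features), the input survives unchanged, i.e. the
    module is effectively skipped. Illustrative sketch only.
    """
    return x + module(x)

x = np.ones((4, 2, 2, 2))
active = residual_wrap(x, lambda t: 0.5 * t)            # module contributes
skipped = residual_wrap(x, lambda t: np.zeros_like(t))  # module "skipped"

print(np.allclose(skipped, x))  # True: identity path survives
print(active[0, 0, 0, 0])       # 1.5
```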
Re Claim 7, SHIM as modified by GHADIMI and KOCH further disclose wherein the machine-learning model was trained using a weighted binary cross entropy loss and/or a Tversky loss (see GHADIMI: e.g., -- [0090] Four symmetric encoding and decoding blocks were used in the contracting and expanding path, respectively. Each decoding block can contain two consecutive sets of deconvolutional layers with filter size 3×3, a batch normalization layer and a rectified linear activation layer. The output of each encoding block in the contracting path was concatenated with those in the corresponding decoding block in the expanding path via skip-connections. The final segmentation map can include two classes: background and endocardium or epicardium. The loss function can be the summation of the weighted pixel-wise cross entropy and soft Dice loss. The assigned class weights were 1 for background, 2 for endocardium in the endocardial network and 3 for the epicardial network. During training, data augmentation on-the-fly was performed by applying random translations, rotations and scaling followed by a b-spline-based deformation to the input images and to the corresponding ground-truth label maps at each iteration. This type of augmentation has the advantage that the model sees different data at each iteration. The use of other network configurations, including networks with different numbers of layers, different filter sizes, stride numbers and dilation rates, is contemplated by the present disclosure, and the above are intended only as non-limiting examples of network parameters that can be used for segmentation.--, in [0089]-[0090]).
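For reference, the two losses named in the claim can be written out directly. GHADIMI's quoted loss is weighted pixel-wise cross entropy plus soft Dice; the Tversky loss below generalizes Dice (alpha = beta = 0.5 recovers it). The class weights and alpha/beta values here are illustrative, not taken from the references.

```python
import numpy as np

def weighted_bce(p, y, w_fg=2.0, w_bg=1.0, eps=1e-7):
    """Weighted binary cross entropy (class weights in the spirit of
    GHADIMI's weighted pixel-wise cross entropy; values illustrative)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(w_fg * y * np.log(p) + w_bg * (1 - y) * np.log(1 - p))

def tversky_loss(p, y, alpha=0.7, beta=0.3, eps=1e-7):
    """Tversky loss: alpha penalizes false positives, beta false
    negatives; alpha = beta = 0.5 recovers the soft Dice loss."""
    tp = np.sum(p * y)
    fp = np.sum(p * (1 - y))
    fn = np.sum((1 - p) * y)
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

y = np.array([1.0, 1.0, 0.0, 0.0])
perfect = np.array([1.0, 1.0, 0.0, 0.0])
poor = np.array([0.1, 0.2, 0.9, 0.8])

print(round(tversky_loss(perfect, y), 4))  # 0.0
print(round(tversky_loss(poor, y), 4))     # 0.85
```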
Re Claim 8, SHIM as modified by GHADIMI and KOCH further disclose wherein the machine-learning model was trained using loss calculated at each of multiple depths of the machine-learning model (see GHADIMI: e.g., -- [0090] Four symmetric encoding and decoding blocks were used in the contracting and expanding path, respectively. Each decoding block can contain two consecutive sets of deconvolutional layers with filter size 3×3, a batch normalization layer and a rectified linear activation layer. The output of each encoding block in the contracting path was concatenated with those in the corresponding decoding block in the expanding path via skip-connections. The final segmentation map can include two classes: background and endocardium or epicardium. The loss function can be the summation of the weighted pixel-wise cross entropy and soft Dice loss. The assigned class weights were 1 for background, 2 for endocardium in the endocardial network and 3 for the epicardial network. During training, data augmentation on-the-fly was performed by applying random translations, rotations and scaling followed by a b-spline-based deformation to the input images and to the corresponding ground-truth label maps at each iteration. This type of augmentation has the advantage that the model sees different data at each iteration. The use of other network configurations, including networks with different numbers of layers, different filter sizes, stride numbers and dilation rates, is contemplated by the present disclosure, and the above are intended only as non-limiting examples of network parameters that can be used for segmentation.--, in [0089]-[0090]).
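Training with a loss evaluated at multiple depths (deep supervision) is not spelled out in the quoted passages; the following is a hypothetical sketch of such a multi-depth loss, comparing each intermediate prediction against a matching-resolution target. 2D arrays and L2 terms are used purely for brevity.

```python
import numpy as np

def downsample(y):
    """Halve each spatial dimension by 2x2 average pooling (2D for brevity)."""
    h, w = y.shape
    return y.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def deep_supervision_loss(preds, target):
    """Sum an L2 loss over predictions emitted at several decoder depths,
    each compared against a target downsampled to the same resolution.
    Hypothetical sketch of a loss calculated at multiple model depths."""
    total, y = 0.0, target
    for p in preds:          # full resolution first, then deeper outputs
        total += np.mean((p - y) ** 2)
        y = downsample(y)    # next depth is half resolution
    return total

rng = np.random.default_rng(4)
target = rng.random((8, 8))
preds = [target.copy(), downsample(target), downsample(downsample(target))]
print(deep_supervision_loss(preds, target))  # 0.0 for exact predictions
```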
Re Claim 9, SHIM as modified by GHADIMI and KOCH further disclose wherein the first type of MRI sequence includes a sequence from a sequence set of T1, T2 and fluid-attenuated inversion recovery (FLAIR), and the second type of MRI sequence includes another sequence from the sequence set (see SHIM: e.g., -- the method 100 may include a step 110 of obtaining magnetic resonance (MR) spectroscopy data of a region of interest of a subject. For example, if evaluating a subject for a neurologic pathology, such as glioblastoma, the region of interest may be a region of the brain. In some embodiments, the region of the brain may include the whole-brain or a portion of the brain, such as an anatomic region of the brain.
[0045] In some embodiments, the MR spectroscopy data may be acquired using any available magnetic resonance system capable of acquiring MR spectroscopy data, such as a 3T MRI scanner. In some embodiments, the data may be acquired using one or more MR protocols or parameters (e.g., pulse sequence, flip angle (“FA”), RF pulse phase, TR, echo time (“TE”), sampling patterns, etc.). In some embodiments, the one or more MR protocols/parameters may be specific to the metabolite(s) to be measured and/or region(s) of interest to be scanned/imaged.--, in [0044]-[0046]; also see Figs. 8, 9, 10, 11; and, --FIG. 8, which shows the individual metabolite maps, the Cho/NAA ratio map, and corresponding contrast-enhanced T1-weighted (CE-T1w) and fluid-attenuated inversion recovery (FLAIR) MRI volumes. Superimposed on the CE-T1w image is a contour drawn by a neuroradiologist to indicate contrast enhancing tissue and the surgical cavity, regions that would normally be targeted for high dose radiation therapy.--, in [0095]-[0097]; also see KOCH: e.g., --(38) The field shift maps contained in the multiresolution field shift map data are separately processed. Thus, as indicated at step 408, a field shift map for one of the resolution layers in the multiresolution field shift map data is selected for processing. The selected field shift map is then divided, or parcellated, into a number of field shift subvolumes, δ.sub.k,j, for the selected resolution layer, as indicated at step 410, thereby generating a plurality of local field shift maps for the selected resolution layer. The local field shift maps can be generated for the same subvolumes in each resolution layer, or can alternatively be generated for different subvolumes in different resolution layers. QSM inversion is then performed on the subvolumes in the selected resolution layer in order to produce quantitative susceptibility maps for each of the subvolumes in the selected resolution layer, as indicated at step 412.
The parcel maps generated in this process thus represent quantitative susceptibility maps corresponding to regions of local susceptibility associated with the subvolumes, V.sub.k,j, in each particular resolution layer.--, in lines 47-65, col. 8).
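KOCH's parcellation step, dividing a selected field shift map into subvolumes for independent QSM inversion, might be sketched as below; the non-overlapping grid and the requirement that the volume be an exact multiple of the parcel shape are simplifying assumptions.

```python
import numpy as np

def parcellate(volume, parcel_shape):
    """Divide a 3D field shift map into subvolumes (parcels), keyed by
    their corner offsets. Assumes the volume dimensions are exact
    multiples of parcel_shape; KOCH's actual parcel sizes and
    per-resolution-layer handling are not reproduced here.
    """
    parcels = {}
    for i in range(0, volume.shape[0], parcel_shape[0]):
        for j in range(0, volume.shape[1], parcel_shape[1]):
            for k in range(0, volume.shape[2], parcel_shape[2]):
                parcels[(i, j, k)] = volume[i:i + parcel_shape[0],
                                            j:j + parcel_shape[1],
                                            k:k + parcel_shape[2]]
    return parcels
```

Each parcel can then be passed independently through the inversion network, which is what makes the approach parallelizable.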
Re Claim 10, SHIM as modified by GHADIMI and KOCH further disclose determining a number of lesions using the prediction (see SHIM: e.g., Fig. 2, --[0050] The first neural network may include a trained encoder and a decoder. The trained encoder may determine a plurality of baseline parameters for each spectrum/voxel and the decoder may be defined by a mathematical technique to convert/determine the baseline component(s) (or datapoints) using the baseline parameters for each spectra. The mathematical technique may include but is not limited to one or more wavelet reconstruction equations. In some embodiments, the baseline parameters can represent local and non-local oscillations in the signal, enabling modeling by the decoder of the overall shape of the spectrum not including the metabolite peaks.--, in [0050], and, --0064] FIG. 2 shows an example 200 of the neural network architecture according to embodiments. In some embodiments, the neural network architecture 200 may include two serial encoder-decoder stages: a first (convolutional) neural network 210 for determining baseline components for each voxel/spectrum and a second (convolutional) neural network 230 for determining peak components representing peak model for each respectively voxel/spectrum. In some embodiments, the (first) matrix of the plurality of the spectra/voxels may be passed through the first neural network 210 to determine a plurality of baseline parameters/coefficients (also referred to as “wavelet coefficients”) θ.sub.B. Next, the decoder of the first neural network 210 may use the baseline parameters to determine the one or more baseline components using a wavelet reconstruction technique. As shown, the baseline components in a form of a second matrix may be subtracted from the input/first matrix (220).--, in [0064], [0079], and [0103]., and, -- the Cho-NAA-index (McKnight T R, von dem Bussche M H, Vigneron D B, Lu Y, Berger M S, McDermott M W, Dillon W P, Graves E E, Pirzkall A, Nelson S J. 
Histopathological validation of a three-dimensional magnetic resonance spectroscopy index as a predictor of tumor presence. Journal of Neurosurgery 2002; 97(4):794-802)… the method 100 may include a step 160 of outputting the one or more measurements of one or more metabolites for the region of interest, for example, for further processing, transmitting, and/or storing (e.g., an electronic record system such as PACs). In some embodiments, the outputting may include registering the metabolite measurement(s) with clinical anatomical MRI data, for example, to generate a map. For example, volumes can be co-registered using a rigid transformation and resampled (e.g., using trilinear interpolation into a high-resolution T1w image space), enabling overlays of the metabolic measurement(s) (information) onto the anatomic MRI. This map can enable a visual assessment of metabolic changes in spatially dependent manner. In some examples, a clinician can select a voxel on the map to bring up the corresponding spectrum.
[0062] In some embodiments, the one or more measurements may be used within a clinical workflow. For example, the one or more measurements may be used to generate a radiation plan for radiation therapy, target planning for surgical biopsy and/or resection, diagnosis indication of a medical condition (e.g., tumor, brain traumatic injury, etc.), among others, or a combination thereof. For example, for radiation therapy planning, the one or more measurements may be integrated with clinical 3D MRI volumes, enabling clinicians to evaluate relevant metabolite levels and the underlying spectra used for this quantitation, to delineate target volumes for radiation therapy planning based on this information--, in [0060]-[0064]; and, -- FIG. 6D shows the correlation between the Cho/NAA ratio, calculated by CEMD and MIDAS for the testing set. The solid lines plot the mean value between the two fitting techniques for each bin of values, and the shaded region around the line indicates +/−1 standard deviation. Overlaid in light gray are the histograms for the distribution of metabolite values computed by MIDAS. The variance of CEMD predictions compared to MIDAS is inversely correlated with the number of training samples available--, in [0093], and, -- The CEMD is an unsupervised deep learning architecture that incorporates spectral models to generate an encoding of spectral parameters, which is advantageous because it does not require any “ground truth” spectral quantitation for training. The predictions of the CEMD have contextual meaning, and the CEMD was trained to make these predictions within the constraints of an explicitly defined spectral model. Once trained, the CEMD performs spectral fitting on volumetric data in under one minute using standard computer hardware. 
The order of magnitude improvement in fitting time can greatly benefit the clinical adoption of whole-brain MRSI.--, in [0098]-[0101]; and see KOCH: e.g., -- (46) The decoding path 504 generally includes a transposed convolutional layer with additional feature concatenation layers from the encoding layers. The output of each transposed convolutional layer is a feature map that is passed to the concatenation layer. At each transposed convolution step, the number of feature channels in the feature map can be halved, or otherwise reduced. As noted, each upsampled feature map is also concatenated with the corresponding feature map from the encoding path 502. The concatenated feature map is then passed to a convolutional layer followed by a batch normalization layer and a nonlinear layer (e.g., a ReLU). The output of the convolutional layer is a feature map that is passed to the batch normalization layer, the output of which is a feature map passed to the nonlinear layer. The final layer is a convolutional layer (e.g., a 1×1×1 convolution) with linear activation, which is applied to output the susceptibility estimations.
(47) As one example, the encoder-decoder network shown in FIG. 5 was trained using an L1 loss function and ADAM optimizer. The initial learning rate in this example was set as 0.001, with an exponential decay of the rate occurring at every 200 steps with a time constant of 0.9. Due to the deep nature of this neural network and the heavy computation required to train it, 10,000 datasets were used for training each model with batch size 8.
(48) Due to the different target QSM map resolutions for each of the test scenarios, two encoder-decoder network models were trained in this example. For a dataset with voxel size isotropic 1.06 mm, the network was trained to independently invert 3D parcels of 128×128×128 voxels. For another dataset with voxel size 0.5×0.5×2.0 mm.sup.3, the network was trained to invert 3D parcels of 192×192×64 voxels. In the prediction stage, segmentation of full-resolution input volumes into the parcels with parcel size equal to neural network input data size was performed using 32×32×32 overlap regions. After QSM inference using the trained encoder-decoder networks, the parcels were combined to form a composite image.--, in line 32, col. 10 through line 27, col. 11).
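The learning-rate schedule KOCH recites in (47) (initial rate 0.001, exponential decay occurring at every 200 steps) can be sketched as a staircase decay; reading the "time constant of 0.9" as the per-interval multiplicative factor is an assumption.

```python
def decayed_learning_rate(step, initial_rate=0.001,
                          decay_steps=200, decay_rate=0.9):
    """Staircase exponential decay: the rate drops by a factor of
    decay_rate once every decay_steps steps. Interpreting KOCH's
    "time constant of 0.9" as this factor is an assumption.
    """
    return initial_rate * decay_rate ** (step // decay_steps)
```

Under this reading, the rate holds at 0.001 for steps 0-199, then steps down to 0.0009, 0.00081, and so on.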
Re Claim 11, SHIM as modified by GHADIMI and KOCH further disclose determining one or more lesion sizes or a lesion load using the prediction (see SHIM: e.g., Fig. 2, --[0050] The first neural network may include a trained encoder and a decoder. The trained encoder may determine a plurality of baseline parameters for each spectrum/voxel and the decoder may be defined by a mathematical technique to convert/determine the baseline component(s) (or datapoints) using the baseline parameters for each spectra. The mathematical technique may include but is not limited to one or more wavelet reconstruction equations. In some embodiments, the baseline parameters can represent local and non-local oscillations in the signal, enabling modeling by the decoder of the overall shape of the spectrum not including the metabolite peaks.--, in [0050], and, --0064] FIG. 2 shows an example 200 of the neural network architecture according to embodiments. In some embodiments, the neural network architecture 200 may include two serial encoder-decoder stages: a first (convolutional) neural network 210 for determining baseline components for each voxel/spectrum and a second (convolutional) neural network 230 for determining peak components representing peak model for each respectively voxel/spectrum. In some embodiments, the (first) matrix of the plurality of the spectra/voxels may be passed through the first neural network 210 to determine a plurality of baseline parameters/coefficients (also referred to as “wavelet coefficients”) θ.sub.B. Next, the decoder of the first neural network 210 may use the baseline parameters to determine the one or more baseline components using a wavelet reconstruction technique. 
As shown, the baseline components in a form of a second matrix may be subtracted from the input/first matrix (220).--, in [0064], [0079], and [0103]., and, -- the Cho-NAA-index (McKnight T R, von dem Bussche M H, Vigneron D B, Lu Y, Berger M S, McDermott M W, Dillon W P, Graves E E, Pirzkall A, Nelson S J. Histopathological validation of a three-dimensional magnetic resonance spectroscopy index as a predictor of tumor presence. Journal of Neurosurgery 2002; 97(4):794-802)… the method 100 may include a step 160 of outputting the one or more measurements of one or more metabolites for the region of interest, for example, for further processing, transmitting, and/or storing (e.g., an electronic record system such as PACs). In some embodiments, the outputting may include registering the metabolite measurement(s) with clinical anatomical MRI data, for example, to generate a map. For example, volumes can be co-registered using a rigid transformation and resampled (e.g., using trilinear interpolation into a high-resolution T1w image space), enabling overlays of the metabolic measurement(s) (information) onto the anatomic MRI. This map can enable a visual assessment of metabolic changes in spatially dependent manner. In some examples, a clinician can select a voxel on the map to bring up the corresponding spectrum.
[0062] In some embodiments, the one or more measurements may be used within a clinical workflow. For example, the one or more measurements may be used to generate a radiation plan for radiation therapy, target planning for surgical biopsy and/or resection, diagnosis indication of a medical condition (e.g., tumor, brain traumatic injury, etc.), among others, or a combination thereof. For example, for radiation therapy planning, the one or more measurements may be integrated with clinical 3D MRI volumes, enabling clinicians to evaluate relevant metabolite levels and the underlying spectra used for this quantitation, to delineate target volumes for radiation therapy planning based on this information--, in [0060]-[0064]; and, -- FIG. 6D shows the correlation between the Cho/NAA ratio, calculated by CEMD and MIDAS for the testing set. The solid lines plot the mean value between the two fitting techniques for each bin of values, and the shaded region around the line indicates +/−1 standard deviation. Overlaid in light gray are the histograms for the distribution of metabolite values computed by MIDAS. The variance of CEMD predictions compared to MIDAS is inversely correlated with the number of training samples available--, in [0093], and, -- The CEMD is an unsupervised deep learning architecture that incorporates spectral models to generate an encoding of spectral parameters, which is advantageous because it does not require any “ground truth” spectral quantitation for training. The predictions of the CEMD have contextual meaning, and the CEMD was trained to make these predictions within the constraints of an explicitly defined spectral model. Once trained, the CEMD performs spectral fitting on volumetric data in under one minute using standard computer hardware. 
The order of magnitude improvement in fitting time can greatly benefit the clinical adoption of whole-brain MRSI.--, in [0098]-[0101]; and see KOCH: e.g., -- (46) The decoding path 504 generally includes a transposed convolutional layer with additional feature concatenation layers from the encoding layers. The output of each transposed convolutional layer is a feature map that is passed to the concatenation layer. At each transposed convolution step, the number of feature channels in the feature map can be halved, or otherwise reduced. As noted, each upsampled feature map is also concatenated with the corresponding feature map from the encoding path 502. The concatenated feature map is then passed to a convolutional layer followed by a batch normalization layer and a nonlinear layer (e.g., a ReLU). The output of the convolutional layer is a feature map that is passed to the batch normalization layer, the output of which is a feature map passed to the nonlinear layer. The final layer is a convolutional layer (e.g., a 1×1×1 convolution) with linear activation, which is applied to output the susceptibility estimations.
(47) As one example, the encoder-decoder network shown in FIG. 5 was trained using an L1 loss function and ADAM optimizer. The initial learning rate in this example was set as 0.001, with an exponential decay of the rate occurring at every 200 steps with a time constant of 0.9. Due to the deep nature of this neural network and the heavy computation required to train it, 10,000 datasets were used for training each model with batch size 8.
(48) Due to the different target QSM map resolutions for each of the test scenarios, two encoder-decoder network models were trained in this example. For a dataset with voxel size isotropic 1.06 mm, the network was trained to independently invert 3D parcels of 128×128×128 voxels. For another dataset with voxel size 0.5×0.5×2.0 mm.sup.3, the network was trained to invert 3D parcels of 192×192×64 voxels. In the prediction stage, segmentation of full-resolution input volumes into the parcels with parcel size equal to neural network input data size was performed using 32×32×32 overlap regions. After QSM inference using the trained encoder-decoder networks, the parcels were combined to form a composite image.--, in line 32, col. 10 through line 27, col. 11).
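The recombination KOCH describes in (48), forming a composite image from parcels after QSM inference, might look like the following sketch; averaging voxels covered by more than one parcel (e.g., within the 32×32×32 overlap regions) is an illustrative merging rule, since the reference does not state how overlaps are blended.

```python
import numpy as np

def combine_parcels(parcels, full_shape):
    """Recombine overlapping 3D parcels into one composite volume,
    averaging voxels covered by more than one parcel. `parcels` maps
    a corner offset (i, j, k) to a 3D array. The averaging rule is an
    illustrative assumption.
    """
    acc = np.zeros(full_shape)
    count = np.zeros(full_shape)
    for (i, j, k), p in parcels.items():
        sl = (slice(i, i + p.shape[0]),
              slice(j, j + p.shape[1]),
              slice(k, k + p.shape[2]))
        acc[sl] += p
        count[sl] += 1
    # Uncovered voxels keep a count of 1 to avoid division by zero.
    return acc / np.maximum(count, 1)
```

Averaging in the overlap regions suppresses seam artifacts at parcel boundaries, which is the usual motivation for inferring with overlap at all.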
Re Claim 12, SHIM as modified by GHADIMI and KOCH further disclose accessing data corresponding to a previous MRI; determining a change in a quantity, a size or cumulative size of one or more lesions using the prediction and the data; and generating an output that represents the change (see SHIM: e.g., Fig. 2, --[0050] The first neural network may include a trained encoder and a decoder. The trained encoder may determine a plurality of baseline parameters for each spectrum/voxel and the decoder may be defined by a mathematical technique to convert/determine the baseline component(s) (or datapoints) using the baseline parameters for each spectra. The mathematical technique may include but is not limited to one or more wavelet reconstruction equations. In some embodiments, the baseline parameters can represent local and non-local oscillations in the signal, enabling modeling by the decoder of the overall shape of the spectrum not including the metabolite peaks.--, in [0050], and, --0064] FIG. 2 shows an example 200 of the neural network architecture according to embodiments. In some embodiments, the neural network architecture 200 may include two serial encoder-decoder stages: a first (convolutional) neural network 210 for determining baseline components for each voxel/spectrum and a second (convolutional) neural network 230 for determining peak components representing peak model for each respectively voxel/spectrum. In some embodiments, the (first) matrix of the plurality of the spectra/voxels may be passed through the first neural network 210 to determine a plurality of baseline parameters/coefficients (also referred to as “wavelet coefficients”) θ.sub.B. Next, the decoder of the first neural network 210 may use the baseline parameters to determine the one or more baseline components using a wavelet reconstruction technique. 
As shown, the baseline components in a form of a second matrix may be subtracted from the input/first matrix (220).--, in [0064], [0079], and [0103]., and, -- the Cho-NAA-index (McKnight T R, von dem Bussche M H, Vigneron D B, Lu Y, Berger M S, McDermott M W, Dillon W P, Graves E E, Pirzkall A, Nelson S J. Histopathological validation of a three-dimensional magnetic resonance spectroscopy index as a predictor of tumor presence. Journal of Neurosurgery 2002; 97(4):794-802)… the method 100 may include a step 160 of outputting the one or more measurements of one or more metabolites for the region of interest, for example, for further processing, transmitting, and/or storing (e.g., an electronic record system such as PACs). In some embodiments, the outputting may include registering the metabolite measurement(s) with clinical anatomical MRI data, for example, to generate a map. For example, volumes can be co-registered using a rigid transformation and resampled (e.g., using trilinear interpolation into a high-resolution T1w image space), enabling overlays of the metabolic measurement(s) (information) onto the anatomic MRI. This map can enable a visual assessment of metabolic changes in spatially dependent manner. In some examples, a clinician can select a voxel on the map to bring up the corresponding spectrum.
[0062] In some embodiments, the one or more measurements may be used within a clinical workflow. For example, the one or more measurements may be used to generate a radiation plan for radiation therapy, target planning for surgical biopsy and/or resection, diagnosis indication of a medical condition (e.g., tumor, brain traumatic injury, etc.), among others, or a combination thereof. For example, for radiation therapy planning, the one or more measurements may be integrated with clinical 3D MRI volumes, enabling clinicians to evaluate relevant metabolite levels and the underlying spectra used for this quantitation, to delineate target volumes for radiation therapy planning based on this information--, in [0060]-[0064]; and, -- FIG. 6D shows the correlation between the Cho/NAA ratio, calculated by CEMD and MIDAS for the testing set. The solid lines plot the mean value between the two fitting techniques for each bin of values, and the shaded region around the line indicates +/−1 standard deviation. Overlaid in light gray are the histograms for the distribution of metabolite values computed by MIDAS. The variance of CEMD predictions compared to MIDAS is inversely correlated with the number of training samples available--, in [0093], and, -- The CEMD is an unsupervised deep learning architecture that incorporates spectral models to generate an encoding of spectral parameters, which is advantageous because it does not require any “ground truth” spectral quantitation for training. The predictions of the CEMD have contextual meaning, and the CEMD was trained to make these predictions within the constraints of an explicitly defined spectral model. Once trained, the CEMD performs spectral fitting on volumetric data in under one minute using standard computer hardware. 
The order of magnitude improvement in fitting time can greatly benefit the clinical adoption of whole-brain MRSI.--, in [0098]-[0101]; and see KOCH: e.g., -- (46) The decoding path 504 generally includes a transposed convolutional layer with additional feature concatenation layers from the encoding layers. The output of each transposed convolutional layer is a feature map that is passed to the concatenation layer. At each transposed convolution step, the number of feature channels in the feature map can be halved, or otherwise reduced. As noted, each upsampled feature map is also concatenated with the corresponding feature map from the encoding path 502. The concatenated feature map is then passed to a convolutional layer followed by a batch normalization layer and a nonlinear layer (e.g., a ReLU). The output of the convolutional layer is a feature map that is passed to the batch normalization layer, the output of which is a feature map passed to the nonlinear layer. The final layer is a convolutional layer (e.g., a 1×1×1 convolution) with linear activation, which is applied to output the susceptibility estimations.
(47) As one example, the encoder-decoder network shown in FIG. 5 was trained using an L1 loss function and ADAM optimizer. The initial learning rate in this example was set as 0.001, with an exponential decay of the rate occurring at every 200 steps with a time constant of 0.9. Due to the deep nature of this neural network and the heavy computation required to train it, 10,000 datasets were used for training each model with batch size 8.
(48) Due to the different target QSM map resolutions for each of the test scenarios, two encoder-decoder network models were trained in this example. For a dataset with voxel size isotropic 1.06 mm, the network was trained to independently invert 3D parcels of 128×128×128 voxels. For another dataset with voxel size 0.5×0.5×2.0 mm.sup.3, the network was trained to invert 3D parcels of 192×192×64 voxels. In the prediction stage, segmentation of full-resolution input volumes into the parcels with parcel size equal to neural network input data size was performed using 32×32×32 overlap regions. After QSM inference using the trained encoder-decoder networks, the parcels were combined to form a composite image.--, in line 32, col. 10 through line 27, col. 11).
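SHIM's two-stage flow, in which the first network's decoder reconstructs a baseline matrix from a small parameter set by a fixed mathematical operation and that matrix is subtracted from the input matrix (220), can be sketched as follows; a low-order polynomial basis stands in here for the wavelet reconstruction the reference actually uses, so the basis choice is an assumption.

```python
import numpy as np

def reconstruct_baseline(coeffs, n_points):
    """Decoder of the first stage as a fixed linear operation: baseline
    datapoints are reconstructed from a small coefficient vector. A
    polynomial basis is an illustrative stand-in for SHIM's wavelet
    reconstruction.
    """
    x = np.linspace(-1.0, 1.0, n_points)
    basis = np.stack([x ** d for d in range(len(coeffs))], axis=0)
    return coeffs @ basis

def subtract_baseline(spectra, coeffs_per_spectrum):
    """Subtract the reconstructed baseline matrix from the input matrix
    of spectra (rows = spectra), leaving the residual from which the
    second network would fit the metabolite peaks.
    """
    baseline = np.stack([reconstruct_baseline(c, spectra.shape[1])
                         for c in coeffs_per_spectrum])
    return spectra - baseline
```

Because the decoder is a deterministic linear operation rather than a learned mapping, the encoding stays interpretable as spectral parameters, which is the physics-constrained design the quoted passages emphasize.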
Re Claim 13, SHIM as modified by GHADIMI and KOCH further disclose recommending changing a treatment strategy based on the prediction (see SHIM: e.g., -- the method 100 may include a step 110 of obtaining magnetic resonance (MR) spectroscopy data of a region of interest of a subject. For example, if evaluating a subject for a neurologic pathology, such as glioblastoma, the region of interest may be a region of the brain. In some embodiments, the region of the brain may include the whole-brain or a portion of the brain, such as an anatomic region of the brain.
[0045] In some embodiments, the MR spectroscopy data may be acquired using any available magnetic resonance system capable of acquiring MR spectroscopy data, such as a 3T MRI scanner. In some embodiments, the data may be acquired using one or more MR protocols or parameters (e.g., pulse sequence, flip angle (“FA”), RF pulse phase, TR, echo time (“TE”), sampling patterns, etc.). In some embodiments, the one or more MR protocols/parameters may be specific to the metabolite(s) to be measured and/or region(s) of interest to be scanned/imaged.--, in [0044]-[0046]);
and a first three-dimensional MRI image was generated using a first type of MRI sequence that is different than a second type of MRI sequence used to generate a second three-dimensional MRI image (see SHIM: e.g., -- the method 100 may include a step 110 of obtaining magnetic resonance (MR) spectroscopy data of a region of interest of a subject. For example, if evaluating a subject for a neurologic pathology, such as glioblastoma, the region of interest may be a region of the brain. In some embodiments, the region of the brain may include the whole-brain or a portion of the brain, such as an anatomic region of the brain.
[0045] In some embodiments, the MR spectroscopy data may be acquired using any available magnetic resonance system capable of acquiring MR spectroscopy data, such as a 3T MRI scanner. In some embodiments, the data may be acquired using one or more MR protocols or parameters (e.g., pulse sequence, flip angle (“FA”), RF pulse phase, TR, echo time (“TE”), sampling patterns, etc.). In some embodiments, the one or more MR protocols/parameters may be specific to the metabolite(s) to be measured and/or region(s) of interest to be scanned/imaged.--, in [0044]-[0046]; also see Figs. 8, 9, 10, 11; and, --FIG. 8, which shows the individual metabolite maps, the Cho/NAA ratio map, and corresponding contrast-enhanced T1-weighted (CE-T1w) and fluid-attenuated inversion recovery (FLAIR) MRI volumes. Superimposed on the CE-T1w image is a contour drawn by a neuroradiologist to indicate contrast enhancing tissue and the surgical cavity, regions that would normally be targeted for high dose radiation therapy.--, in [0095]-[0097]).
Re Claim 14, SHIM as modified by GHADIMI and KOCH further disclose providing an output corresponding to a possible or confirmed diagnosis of the subject of multiple sclerosis based at least in part on the prediction (see SHIM: e.g., Fig. 2, --[0050] The first neural network may include a trained encoder and a decoder. The trained encoder may determine a plurality of baseline parameters for each spectrum/voxel and the decoder may be defined by a mathematical technique to convert/determine the baseline component(s) (or datapoints) using the baseline parameters for each spectra. The mathematical technique may include but is not limited to one or more wavelet reconstruction equations. In some embodiments, the baseline parameters can represent local and non-local oscillations in the signal, enabling modeling by the decoder of the overall shape of the spectrum not including the metabolite peaks.--, in [0050], and, --0064] FIG. 2 shows an example 200 of the neural network architecture according to embodiments. In some embodiments, the neural network architecture 200 may include two serial encoder-decoder stages: a first (convolutional) neural network 210 for determining baseline components for each voxel/spectrum and a second (convolutional) neural network 230 for determining peak components representing peak model for each respectively voxel/spectrum. In some embodiments, the (first) matrix of the plurality of the spectra/voxels may be passed through the first neural network 210 to determine a plurality of baseline parameters/coefficients (also referred to as “wavelet coefficients”) θ.sub.B. Next, the decoder of the first neural network 210 may use the baseline parameters to determine the one or more baseline components using a wavelet reconstruction technique. 
As shown, the baseline components in a form of a second matrix may be subtracted from the input/first matrix (220).--, in [0064], [0079], and [0103]., and, -- the Cho-NAA-index (McKnight T R, von dem Bussche M H, Vigneron D B, Lu Y, Berger M S, McDermott M W, Dillon W P, Graves E E, Pirzkall A, Nelson S J. Histopathological validation of a three-dimensional magnetic resonance spectroscopy index as a predictor of tumor presence. Journal of Neurosurgery 2002; 97(4):794-802)… the method 100 may include a step 160 of outputting the one or more measurements of one or more metabolites for the region of interest, for example, for further processing, transmitting, and/or storing (e.g., an electronic record system such as PACs). In some embodiments, the outputting may include registering the metabolite measurement(s) with clinical anatomical MRI data, for example, to generate a map. For example, volumes can be co-registered using a rigid transformation and resampled (e.g., using trilinear interpolation into a high-resolution T1w image space), enabling overlays of the metabolic measurement(s) (information) onto the anatomic MRI. This map can enable a visual assessment of metabolic changes in spatially dependent manner. In some examples, a clinician can select a voxel on the map to bring up the corresponding spectrum.
[0062] In some embodiments, the one or more measurements may be used within a clinical workflow. For example, the one or more measurements may be used to generate a radiation plan for radiation therapy, target planning for surgical biopsy and/or resection, diagnosis indication of a medical condition (e.g., tumor, brain traumatic injury, etc.), among others, or a combination thereof. For example, for radiation therapy planning, the one or more measurements may be integrated with clinical 3D MRI volumes, enabling clinicians to evaluate relevant metabolite levels and the underlying spectra used for this quantitation, to delineate target volumes for radiation therapy planning based on this information--, in [0060]-[0064]; and, -- FIG. 6D shows the correlation between the Cho/NAA ratio, calculated by CEMD and MIDAS for the testing set. The solid lines plot the mean value between the two fitting techniques for each bin of values, and the shaded region around the line indicates +/−1 standard deviation. Overlaid in light gray are the histograms for the distribution of metabolite values computed by MIDAS. The variance of CEMD predictions compared to MIDAS is inversely correlated with the number of training samples available--, in [0093], and, -- The CEMD is an unsupervised deep learning architecture that incorporates spectral models to generate an encoding of spectral parameters, which is advantageous because it does not require any “ground truth” spectral quantitation for training. The predictions of the CEMD have contextual meaning, and the CEMD was trained to make these predictions within the constraints of an explicitly defined spectral model. Once trained, the CEMD performs spectral fitting on volumetric data in under one minute using standard computer hardware. 
The order of magnitude improvement in fitting time can greatly benefit the clinical adoption of whole-brain MRSI.--, in [0098]-[0101]; and see KOCH: e.g., -- (46) The decoding path 504 generally includes a transposed convolutional layer with additional feature concatenation layers from the encoding layers. The output of each transposed convolutional layer is a feature map that is passed to the concatenation layer. At each transposed convolution step, the number of feature channels in the feature map can be halved, or otherwise reduced. As noted, each upsampled feature map is also concatenated with the corresponding feature map from the encoding path 502. The concatenated feature map is then passed to a convolutional layer followed by a batch normalization layer and a nonlinear layer (e.g., a ReLU). The output of the convolutional layer is a feature map that is passed to the batch normalization layer, the output of which is a feature map passed to the nonlinear layer. The final layer is a convolutional layer (e.g., a 1×1×1 convolution) with linear activation, which is applied to output the susceptibility estimations.
(47) As one example, the encoder-decoder network shown in FIG. 5 was trained using an L1 loss function and ADAM optimizer. The initial learning rate in this example was set as 0.001, with an exponential decay of the rate occurring at every 200 steps with a time constant of 0.9. Due to the deep nature of this neural network and the heavy computation required to train it, 10,000 datasets were used for training each model with batch size 8.
(48) Due to the different target QSM map resolutions for each of the test scenarios, two encoder-decoder network models were trained in this example. For a dataset with voxel size isotropic 1.06 mm, the network was trained to independently invert 3D parcels of 128×128×128 voxels. For another dataset with voxel size 0.5×0.5×2.0 mm³, the network was trained to invert 3D parcels of 192×192×64 voxels. In the prediction stage, segmentation of full-resolution input volumes into the parcels with parcel size equal to neural network input data size was performed using 32×32×32 overlap regions. After QSM inference using the trained encoder-decoder networks, the parcels were combined to form a composite image.--, in line 32, col. 10 through line 27, col. 11;
also see GHADIMI: e.g. in [0089]-[0090] {for “the concatenated representation” such as generating a segmentation map}).
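The parcel-based inference quoted from KOCH's paragraph (48) above (segmenting a full-resolution volume into fixed-size 3D parcels with 32×32×32 overlap regions, inferring each parcel independently, and recombining the parcels into a composite image) can be illustrated with a minimal sketch. This is a hypothetical NumPy illustration, not KOCH's code: the function names (`parcel_starts`, `infer_by_parcels`) are invented, and averaging the overlapping regions when recombining is an assumption, since KOCH does not specify the combination rule.

```python
import numpy as np

def parcel_starts(axis_len, parcel, overlap):
    """Start indices of parcels of length `parcel` covering an axis of
    length `axis_len`, with consecutive parcels sharing `overlap` voxels."""
    step = parcel - overlap
    starts = list(range(0, max(axis_len - parcel, 0) + 1, step))
    # Ensure the final parcel reaches the end of the axis.
    if starts[-1] + parcel < axis_len:
        starts.append(axis_len - parcel)
    return starts

def infer_by_parcels(volume, net, parcel=128, overlap=32):
    """Run `net` on overlapping 3D parcels of `volume` and average
    the predictions wherever parcels overlap."""
    out = np.zeros_like(volume, dtype=float)
    weight = np.zeros_like(volume, dtype=float)
    sx = parcel_starts(volume.shape[0], parcel, overlap)
    sy = parcel_starts(volume.shape[1], parcel, overlap)
    sz = parcel_starts(volume.shape[2], parcel, overlap)
    for x in sx:
        for y in sy:
            for z in sz:
                sl = (slice(x, x + parcel),
                      slice(y, y + parcel),
                      slice(z, z + parcel))
                out[sl] += net(volume[sl])
                weight[sl] += 1.0
    return out / weight
```

With `parcel=128` and `overlap=32` this reproduces the tiling geometry KOCH describes for the 1.06 mm isotropic dataset; the 192×192×64 case would use per-axis parcel sizes.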
Re Claim 15, SHIM as modified by GHADIMI and KOCH further disclose diagnosing the subject with multiple sclerosis based at least in part on the prediction (see SHIM: e.g., Fig. 2, --[0050] The first neural network may include a trained encoder and a decoder. The trained encoder may determine a plurality of baseline parameters for each spectrum/voxel and the decoder may be defined by a mathematical technique to convert/determine the baseline component(s) (or datapoints) using the baseline parameters for each spectra. The mathematical technique may include but is not limited to one or more wavelet reconstruction equations. In some embodiments, the baseline parameters can represent local and non-local oscillations in the signal, enabling modeling by the decoder of the overall shape of the spectrum not including the metabolite peaks.--, in [0050], and, --0064] FIG. 2 shows an example 200 of the neural network architecture according to embodiments. In some embodiments, the neural network architecture 200 may include two serial encoder-decoder stages: a first (convolutional) neural network 210 for determining baseline components for each voxel/spectrum and a second (convolutional) neural network 230 for determining peak components representing peak model for each respectively voxel/spectrum. In some embodiments, the (first) matrix of the plurality of the spectra/voxels may be passed through the first neural network 210 to determine a plurality of baseline parameters/coefficients (also referred to as “wavelet coefficients”) θ.sub.B. Next, the decoder of the first neural network 210 may use the baseline parameters to determine the one or more baseline components using a wavelet reconstruction technique. 
As shown, the baseline components in a form of a second matrix may be subtracted from the input/first matrix (220).--, in [0064], [0079], and [0103]., and, -- the Cho-NAA-index (McKnight T R, von dem Bussche M H, Vigneron D B, Lu Y, Berger M S, McDermott M W, Dillon W P, Graves E E, Pirzkall A, Nelson S J. Histopathological validation of a three-dimensional magnetic resonance spectroscopy index as a predictor of tumor presence. Journal of Neurosurgery 2002; 97(4):794-802)… the method 100 may include a step 160 of outputting the one or more measurements of one or more metabolites for the region of interest, for example, for further processing, transmitting, and/or storing (e.g., an electronic record system such as PACs). In some embodiments, the outputting may include registering the metabolite measurement(s) with clinical anatomical MRI data, for example, to generate a map. For example, volumes can be co-registered using a rigid transformation and resampled (e.g., using trilinear interpolation into a high-resolution T1w image space), enabling overlays of the metabolic measurement(s) (information) onto the anatomic MRI. This map can enable a visual assessment of metabolic changes in spatially dependent manner. In some examples, a clinician can select a voxel on the map to bring up the corresponding spectrum.
[0062] In some embodiments, the one or more measurements may be used within a clinical workflow. For example, the one or more measurements may be used to generate a radiation plan for radiation therapy, target planning for surgical biopsy and/or resection, diagnosis indication of a medical condition (e.g., tumor, brain traumatic injury, etc.), among others, or a combination thereof. For example, for radiation therapy planning, the one or more measurements may be integrated with clinical 3D MRI volumes, enabling clinicians to evaluate relevant metabolite levels and the underlying spectra used for this quantitation, to delineate target volumes for radiation therapy planning based on this information--, in [0060]-[0064]; and, -- FIG. 6D shows the correlation between the Cho/NAA ratio, calculated by CEMD and MIDAS for the testing set. The solid lines plot the mean value between the two fitting techniques for each bin of values, and the shaded region around the line indicates +/−1 standard deviation. Overlaid in light gray are the histograms for the distribution of metabolite values computed by MIDAS. The variance of CEMD predictions compared to MIDAS is inversely correlated with the number of training samples available--, in [0093], and, -- The CEMD is an unsupervised deep learning architecture that incorporates spectral models to generate an encoding of spectral parameters, which is advantageous because it does not require any “ground truth” spectral quantitation for training. The predictions of the CEMD have contextual meaning, and the CEMD was trained to make these predictions within the constraints of an explicitly defined spectral model. Once trained, the CEMD performs spectral fitting on volumetric data in under one minute using standard computer hardware. 
The order of magnitude improvement in fitting time can greatly benefit the clinical adoption of whole-brain MRSI.--, in [0098]-[0101]; and see KOCH: e.g., --(1) Quantitative susceptibility mapping (“QSM”) is an established, yet growing field of MRI development. QSM is an imaging technique that provides high anatomical contrast and measurements of tissue susceptibility based on biomaterial compositions. As a result, QSM can be advantageously used for monitoring iron overload in diseases such as in Alzheimer's disease; for monitoring demyelinating diseases, such as Parkinson's disease and multiple sclerosis; for monitoring calcifications in the brain and other tissues; and for performing functional MRI.--, in lines 19-41, col. 1; and, -- (46) The decoding path 504 generally includes a transposed convolutional layer with additional feature concatenation layers from the encoding layers. The output of each transposed convolutional layer is a feature map that is passed to the concatenation layer. At each transposed convolution step, the number of feature channels in the feature map can be halved, or otherwise reduced. As noted, each upsampled feature map is also concatenated with the corresponding feature map from the encoding path 502. The concatenated feature map is then passed to a convolutional layer followed by a batch normalization layer and a nonlinear layer (e.g., a ReLU). The output of the convolutional layer is a feature map that is passed to the batch normalization layer, the output of which is a feature map passed to the nonlinear layer. The final layer is a convolutional layer (e.g., a 1×1×1 convolution) with linear activation, which is applied to output the susceptibility estimations.
(47) As one example, the encoder-decoder network shown in FIG. 5 was trained using an L1 loss function and ADAM optimizer. The initial learning rate in this example was set as 0.001, with an exponential decay of the rate occurring at every 200 steps with a time constant of 0.9. Due to the deep nature of this neural network and the heavy computation required to train it, 10,000 datasets were used for training each model with batch size 8.
(48) Due to the different target QSM map resolutions for each of the test scenarios, two encoder-decoder network models were trained in this example. For a dataset with voxel size isotropic 1.06 mm, the network was trained to independently invert 3D parcels of 128×128×128 voxels. For another dataset with voxel size 0.5×0.5×2.0 mm³, the network was trained to invert 3D parcels of 192×192×64 voxels. In the prediction stage, segmentation of full-resolution input volumes into the parcels with parcel size equal to neural network input data size was performed using 32×32×32 overlap regions. After QSM inference using the trained encoder-decoder networks, the parcels were combined to form a composite image.--, in line 32, col. 10 through line 27, col. 11;
also see GHADIMI: e.g. in [0089]-[0090] {for “the concatenated representation” such as generating a segmentation map}).
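The two-stage pipeline quoted from SHIM's paragraphs [0050] and [0064] above (a first encoder producing baseline parameters θ.sub.B, a decoder reconstructing the baseline from those parameters, and the baseline matrix being subtracted from the input matrix at element 220 before the peak-fitting stage) can be sketched as follows. This is a toy stand-in, not SHIM's architecture: a least-squares polynomial fit substitutes for the trained encoder's wavelet coefficients, and the function name `subtract_baseline` is hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def subtract_baseline(spectra, order=3):
    """Stage-1 sketch: encode per-spectrum baseline parameters, decode a
    baseline matrix from them, and subtract it from the input matrix.
    `spectra` is the (n_spectra, n_points) matrix of row-wise spectra."""
    n_points = spectra.shape[1]
    x = np.linspace(-1.0, 1.0, n_points)
    # "Encoder" stand-in: smooth-baseline coefficients per spectrum (theta_B).
    coeffs = P.polyfit(x, spectra.T, order)   # shape (order + 1, n_spectra)
    # "Decoder" stand-in: reconstruct the smooth baseline for each spectrum.
    baseline = P.polyval(x, coeffs)           # shape (n_spectra, n_points)
    # Element 220 in SHIM Fig. 2: the baseline-corrected matrix that would
    # be passed to the second (peak-model) network.
    return spectra - baseline, coeffs
```

The returned baseline-corrected matrix corresponds to the input of SHIM's second neural network 230, which fits the metabolite peak components.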
Re Claims 16-19, claims 16-19 are the system claims corresponding to claims 1-4, respectively, and thus are rejected for similar reasons as claims 1-4. See the above discussions with regard to claims 1-4. SHIM as modified by GHADIMI and KOCH further disclose a system comprising: one or more data processors; and a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform a set of actions (see SHIM: e.g., --Systems and methods provide a parallelized deep learning approach to spectral fitting for magnetic resonance spectroscopy data enabling accurate and rapid spectral fitting and determination of metabolite measurements using a conventional computer. The method may include processing multi-spectra magnetic resonance (MR) spectroscopy data of a region of interest through a series of neural networks. The method may include determining baseline components of each spectrum using a first neural network of the series, generating baseline-corrected components for each spectrum using the baseline components; and determining one or more peak components of each spectrum using a second neural network of the series and the baseline-corrected components. The method may further include determining one or more metabolite measurements of the one or more metabolites in the region of interest using the one or more peak components.--, in abstract).
Re Claim 20, claim 20 is the product claim corresponding to claim 1, and thus is rejected for similar reasons as claim 1. See the above discussion with regard to claim 1. SHIM as modified by GHADIMI and KOCH further disclose a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform a set of actions (see SHIM: e.g., --[0116] Furthermore, embodiments can be implemented by hardware, software, scripting languages, firmware, middleware, microcode, hardware description languages, and/or any combination thereof. When implemented in software, firmware, middleware, scripting language, and/or microcode, the program code or code segments to perform the necessary tasks can be stored in a machine readable medium such as a storage medium. A code segment or machine-executable instruction can represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a script, a class, or any combination of instructions, data structures, and/or program statements. A code segment can be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, and/or memory contents. Information, arguments, parameters, data, etc. can be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, ticket passing, network transmission, etc.
[0117] For a firmware and/or software implementation, the methodologies can be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein.--, in [0116]-[0118]).
Conclusion
Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEI WEN YANG whose telephone number is (571)270-5670. The examiner can normally be reached 8:00 am - 5:00 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amandeep Saini can be reached on 571-272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WEI WEN YANG/Primary Examiner, Art Unit 2662