Prosecution Insights
Last updated: April 19, 2026
Application No. 17/180,182

USING NEURAL NETWORKS TO ESTIMATE MOTION VECTORS FOR MOTION CORRECTED PET IMAGE RECONSTRUCTION

Status: Non-Final Office Action (§103)
Filed: Feb 19, 2021
Examiner: RIVERA-MARTINEZ, GUILLERMO M
Art Unit: 2677
Tech Center: 2600 — Communications
Assignee: Canon Medical Systems Corporation
OA Round: 7 (Non-Final)

Grant Probability: 78% (Favorable)
Projected OA Rounds: 7-8
Projected Time to Grant: 2y 9m
Grant Probability with Interview: 80%

Examiner Intelligence

Career Allow Rate: 78% (393 granted / 503 resolved; +16.1% vs TC avg, above average)
Interview Lift: +2.3% among resolved cases with an interview (a minimal lift)
Typical Timeline: 2y 9m average prosecution; 28 applications currently pending
Career History: 531 total applications across all art units

Statute-Specific Performance

§101: 5.9% (-34.1% vs TC avg)
§103: 42.8% (+2.8% vs TC avg)
§102: 22.0% (-18.0% vs TC avg)
§112: 26.7% (-13.3% vs TC avg)

Based on career data from 503 resolved cases; TC averages are estimates.
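The headline percentages can be reproduced directly from the career counts above; a quick sketch in Python (note the Tech Center average rate is back-derived here from the stated +16.1% delta, not an independently reported figure):

```python
# Examiner career counts, as reported above.
granted = 393
resolved = 503

allow_rate = granted / resolved                # 0.7813... -> shown as 78%
print(f"Career allow rate: {allow_rate:.1%}")  # 78.1%

# A +16.1% delta vs the Tech Center average implies a TC-average
# allow rate of roughly 78.1% - 16.1% = 62.0% (derived, not reported).
implied_tc_avg = allow_rate - 0.161
print(f"Implied TC average: {implied_tc_avg:.1%}")  # 62.0%
```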

Office Action

§103
DETAILED ACTION

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action (OA) has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 23, 2025 has been entered. Claims 1, 4, 10, and 18 have been amended. An action on the merits follows. Claims 1-4, 10-16, and 18-26 are pending in the application.

Response to Arguments

Applicant’s arguments filed on December 23, 2025 with respect to the pending claims have been considered but are moot in view of the new ground(s) of rejection. The amended claims changed the scope and content of the claims; therefore, the grounds of rejection are modified accordingly. It is noted that the previously applied prior art references remain in effect.

Regarding the prior rejection of Claim 1, and similar independent claims, under 35 U.S.C. 103 previously set forth in the Final OA of June 24, 2025, Applicant asserts that “the '382 application fails to disclose iteratively reconstructing an image based on the at least one motion vector, as recited in Claim 1” (Remarks, Pg. 8). Examiner agrees with Applicant’s remarks on this point. However, it should be noted that Hsieh (‘081), prior art previously applied in the Final OA, teaches “iteratively reconstructing an image based on the at least one motion vector”, as previously set forth in the Final OA, Pgs. 26-28.

Regarding the prior rejection of Claim 1, and similar independent claims, under 35 U.S.C.
103 previously set forth in the Final OA, Applicant further asserts that “the '081 application fails to disclose that the machine learning-based system is trained using gated PET data, wherein the machine learning-based system is trained using a series of images to produce a trained machine learning-based system that outputs at least one motion vector indicating a movement of the at least one object between the series of images, as recited in Claim 1” (Remarks, Pg. 9). However, Applicant did not provide further remarks explaining why, or how, “the '081 application fails to disclose” the feature limitations indicated above. Therefore, Applicant’s remarks above are respectfully found unconvincing. However, it should be noted that SUN ('868), prior art previously applied in the Final OA, teaches similar features corresponding to the claimed “wherein the machine learning-based system is trained using gated Positron Emission Tomography (PET) data” recited in claim 1, as previously set forth in the Final OA, Pgs. 29-31. Additionally, it should be noted that Pio (‘382), primary prior art previously applied in the Final OA, discloses similar features corresponding to the claimed “training a machine learning-based system based on the series of images to produce a trained machine learning-based system that outputs at least one motion vector indicating a movement of the at least one object between the series of images” recited in claim 1 and was used to reject the above claim feature limitations, as previously set forth in the Final OA, Pgs. 22-26.

Regarding the prior rejection of Claim 1, and similar independent claims, under 35 U.S.C.
103 previously set forth in the Final OA, Applicant asserts that “the '868 application fails to disclose that the machine learning-based system is trained using gated PET data, wherein the machine learning-based system is trained using a series of images to produce a trained machine learning-based system that outputs at least one motion vector indicating a movement of the at least one object between the series of images, as recited in Claim 1” (Remarks, Pgs. 9-10). However, Applicant did not provide further remarks explaining why, or how, “the '868 application fails to disclose” the feature limitations indicated above. Therefore, Applicant’s remarks above are respectfully found unconvincing. However, it should be noted that SUN ('868), prior art previously applied in the Final OA, teaches similar features corresponding to the claimed “wherein the machine learning-based system is trained using gated Positron Emission Tomography (PET) data” recited in claim 1, as previously set forth in the Final OA, Pgs. 29-31. Additionally, it should be noted that Pio (‘382), primary prior art previously applied in the Final OA, discloses similar features corresponding to the claimed “training a machine learning-based system based on the series of images to produce a trained machine learning-based system that outputs at least one motion vector indicating a movement of the at least one object between the series of images” recited in claim 1 and was used to reject the above claim feature limitations, as previously set forth in the Final OA, Pgs. 22-26.

Regarding Krebs (‘766), prior art previously applied in the Final OA, Applicant asserts that “the '766 application fails to disclose that the series of images comprises a moving image and a fixed image, and the training comprises warping the moving image to the fixed image, as recited in amended Claim 1” (Remarks, Pg. 10).
However, Applicant did not provide further remarks explaining why, or how, “the '766 application fails to disclose” the feature limitations indicated above. Therefore, Applicant’s remarks above are respectfully found unconvincing, and nothing prevents ‘766 from being applied in combination with the prior art previously applied in the Final OA to reject the pending independent claims, as follows.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Pio et al. (U.S. PG Publication No. 2018/0007382 A1), hereafter referred to as Pio, in view of Hsieh et al. (U.S. PG Publication No. 2018/0350081 A1), hereafter referred to as Hsieh, in further view of SUN et al. (Chinese Publication CN110084868A), hereafter referred to as SUN, and in further view of Krebs et al. (U.S. PG Publication No. 2019/0205766 A1), hereafter referred to as Krebs.

Regarding claim 1, Pio discloses a method (Par.
[0003]: embodiments of the present disclosure can include systems, methods, and non-transitory computer readable media configured to train a model to predict motion vectors for entities in video frames) comprising: obtaining a series of images including movement of at least one object between the series of images (Par. [0008]: systems, methods, and non-transitory computer readable media are configured to obtain a set of videos for training the model, each video having a set of frames, identify one or more objects in the set of frames for each video, determine a set of respective motion vectors for the one or more objects, and cause data describing the one or more objects in the set of frames to be included in the training data as an example inputs and the corresponding motion vectors for the objects to be included in the training data as example outputs; Par. [0022]: motion estimation involves the process of determining, from a set of frames (e.g., images and/or video frames), a set of motion vectors that correspond to various entities in the frames. A motion vector can describe the motion, or displacement, of an entity in the set of frames. An entity can refer to… an object identified in a visual scene captured by a frame… The respective motions of entities can be determined, for example, by evaluating the displacement of the entities in the frames. The displacement of entities can be measured, for example, based on direction (e.g., movement along the x-axis, y-axis, and/or z-axis) and magnitude (e.g., the amount the respective entity was displaced, for example, between the frames; Par. [0037]: training data module 302 can then include data describing each object (e.g., object identifier, object location, e.g., coordinates, etc.) 
and its corresponding motion vector in the training data… the model can be trained to extract features that allow the model to determine the movement of certain types of objects across frames and to utilize such information to predict motion vectors for such objects… motion vectors can be determined for some, or all, human faces and/or individuals that are identified in video frames. Such human faces and/or individuals may be identified specifically. As a result, true motion can be determined based on how specific faces and/or individuals move in the video frames; obtaining a series of images including movement of at least one object between the series of images (e.g. obtain a set of videos for training a model, each video having a set of frames (i.e. obtaining a series of images), by identify one or more objects in the set of frames for each video to determine respective motions of the objects or entities (i.e. series of images including movement at least one object between the series of images), for example, by evaluating the displacement of the objects or entities in the set of frames, as indicated above), for example); and training a machine learning-based system based on the series of images to produce a trained machine learning-based system that outputs at least one motion vector indicating a movement of the at least one object between the series of images (Par. [0024-26]: FIG. 
1 illustrates an example system including an example motion estimation module 102 configured to determine motion vectors for video content using one or more trained machine learning models… store training data for training one or more machine learning models to predict motion vectors for a set of frames or videos… the training data can include, for example, one or more ground truth motion vector data sets that can be used to train a machine learning model for predicting motion vectors for a set of frames, such as respective directions and magnitudes for entities corresponding to the set of frames; Par. [0029-33]: FIG. 2 illustrates an example motion vector model module 202 configured to analyze video content to determine motion vectors, according to an embodiment of the present disclosure. In some embodiments, the motion vector module 106 of FIG. 1 can be implemented as the example motion vector module 202. As shown in FIG. 2, the example motion vector model module 202 includes a training data module 204 and a training module 206. The motion vector model module 202 can evaluate a set of frames in a video using a trained model to determine motion vectors for entities within and/or between the set of frames. In various embodiments, the model can be implemented using any number of generally known machine learning techniques… training module 206 can be configured to train the model to output, or predict, motion vectors for a set of frames… the trained model can receive, as input, a set of frames and can output a set of motion vectors that each correspond to one or more entities. 
As mentioned, such entities may refer to one or more pixels in the frames, one or more blocks in the frames, one or more objects in the frames… during the evaluation phase, the accuracy of the model can be tested, for example, using the motion vectors that were outputted by the model for a set of frames and comparing these predicted motion vectors to the pre-computed motion vectors that were provided to the model during the training phase… The training module 206 can measure any inaccuracies in the motion vector information that is outputted by the trained model… By measuring inaccuracies and refining the model over a number of training iterations, the model can be trained to optimally, or otherwise suitably, predict motion vectors for various types for video content; Par. [0037-40]: training data module 302 can then include data describing each object (e.g., object identifier, object location, e.g., coordinates, etc.) and its corresponding motion vector in the training data… the model can be trained to extract features that allow the model to determine the movement of certain types of objects across frames and to utilize such information to predict motion vectors for such objects… motion vectors can be determined for some, or all, human faces and/or individuals that are identified in video frames… process 500 for training a model to determine motion vectors… At block 502, a model to predict motion vectors for entities in video frames is trained. At block 504, a set of frames that correspond to a first video are obtained. At block 506, the set of frames can be provided as input to the model. At block 508, a set of motion vectors for the set of frames can be obtained from the model. 
Each motion vector can describe a trajectory of at least one entity in the set of frames; and training a machine learning-based system based on the series of images to produce a trained machine learning-based system that outputs at least one motion vector indicating a movement of the at least one object between the series of images (e.g. system includes motion estimation module configured to determine motion vectors for video content using one or more trained machine learning models (i.e. a trained machine learning-based system that outputs motion vectors), in which each model is trained to extract features that allow the model to determine the movement of certain types of objects across frames and to utilize such information to predict motion vectors for such objects (i.e. training machine learning-based system based on the series of images to produce a trained machine learning-based system that outputs motion vectors indicating movement of objects or entities between the series of images), by training the model to output, or predict, motion vectors for the set of frames, and motion estimation involves the process of determining, from each set of frames, a set of motion vectors that correspond to various objects or entities in the frames, and each motion vector describes the motion, or displacement, of an object or entity in the set of frames objects (i.e. training machine learning-based system based on the series of images to produce a trained machine learning-based system that outputs at least one motion vector indicating a movement of the at least one object between the series of images), as indicated above), for example). Pio discloses the method, as indicated above, which is used to “predict motion vectors for various types for video content” by “measuring inaccuracies and refining the model over a number of training iterations”, for example, but fails to disclose the following, as further recited in claim 1. 
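The displacement-based motion estimation that the cited Pio passages describe (direction along the axes plus a magnitude, Par. [0022]) reduces, for a single tracked object, to vector subtraction of its positions across frames. A minimal illustrative sketch, using hypothetical coordinates that are not drawn from any reference of record:

```python
import math

def motion_vector(pos_prev, pos_next):
    """Frame-to-frame displacement of one tracked object: direction
    components (dx, dy) plus magnitude, in the sense of Pio's
    Par. [0022] (direction and amount of displacement between frames)."""
    dx = pos_next[0] - pos_prev[0]
    dy = pos_next[1] - pos_prev[1]
    return dx, dy, math.hypot(dx, dy)

# Hypothetical centroids of one object in two consecutive frames.
dx, dy, magnitude = motion_vector((10.0, 20.0), (13.0, 24.0))
print(dx, dy, magnitude)  # 3.0 4.0 5.0
```

A learned model of the kind Pio describes would regress such vectors from pixel data rather than from pre-identified coordinates; the arithmetic of the output is the same.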
However, Hsieh teaches wherein the method further comprises iteratively reconstructing an image based on the at least one motion vector (Par. [0010]: provide motion-gated medical imaging according to an embodiment. The system 10 includes a medical imaging device 12 which may include, for example, a computed tomography (CT) device, a positron emission tomography (PET) device, a magnetic resonance imaging (MRI) device, and so on. For example, the medical imaging device 12 may include a single photon emission CT (SPECT/CT) device, a PET/CT device, a PET/MRI device, and so on; Par. [0024]: a comparison among data capture samples and/or an evaluation of a motion vector field may be used to compensate for motion during data acquisition (e.g., modulate data acquisition, etc.) and/or in post-processing during reconstruction to generate a final image. For example, a motion vector field from an image captured by an image capture device may provide a boundary condition for motion compensation during tomographic reconstruction. Thus, the logic 30 may provide a motion vector field to a reconstruction process to perform motion correction on the acquired data. In one example, data from the sensor device 28 may be used as input to a reconstruction process to provide motion characteristics (e.g., object motion, table motion, etc.) to be used for motion correction during reconstruction; Par. [0066]: use a motion vector field for motion compensation during reconstruction to generate motion compensated reconstructions. Reconstruction may include, for example… iterative reconstruction, and so on. 
Thus, location parameters of a reconstructed point may be superimposed by block 92 with a calculated motion vector to reconstruct a motion compensated image… for example, detect how much a subject has moved during data acquisition and generate a motion curve as a function of time that may be used to compensate for that motion; wherein the method further comprises iteratively reconstructing an image based on the at least one motion vector (e.g. image reconstruction method includes iterative reconstruction, which uses a motion vector field for motion compensation during reconstruction to generate motion compensated reconstructions, including a calculated motion vector to reconstruct a motion compensated image (i.e. iteratively reconstructing an image based on the at least one motion vector), as indicated above), for example).

Pio and Hsieh are considered to be analogous art because they pertain to image processing applications which use neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method for training a model to predict motion vectors for entities in video frames (as disclosed by Pio) with iteratively reconstructing an image based on the at least one motion vector (as taught by Hsieh, Abstract, Par. [0010, 24, 66]) to provide improved image quality from motion monitoring and/or data acquisition gating, reduction of multi-slab axial mis-registration artifacts, and improved performance of motion compensation processes of motion-gated medical imaging systems (Hsieh, Abstract, Par. [0003, 103]).

The combination of Pio and Hsieh, as a whole, teaches the method, as indicated above, but fails to disclose the following, as further recited in claim 1. However, SUN teaches wherein the machine learning-based system is trained using gated Positron Emission Tomography (PET) data (Pg.
1: Positron Emission Computed Tomography (PET) imaging… reconstruction to obtain a PET image… an image correction method, where the method includes: obtaining a to-be-corrected gated reconstructed image and an initial gated reconstructed image… inputting the initial gated reconstructed image into a deep learning model to obtain a deformation field between frames of the initial gated reconstructed image; and correcting the gated reconstructed image to be corrected according to the deformation field between the frames to obtain a corrected reconstructed image; Pg. 2: obtaining a plurality of sample gated reconstructed images, where the sample gated reconstructed image is an image without attenuation correction, and a scan duration of the sample gated reconstructed image is greater than a preset threshold; and obtaining a reference frame image and a moving image according to the sample gating reconstructed image, where the reference frame image is an image without motion influence corresponding to the sample gating reconstructed image, and the moving image is an image with motion influence corresponding to the reference frame image; taking the reference frame image and the moving image as input, taking the deformation field of the moving image as an output, and training the initial deep learning model to obtain the deep learning model… obtain a to-be-corrected gated reconstructed image and an initial gated reconstructed image, where the to-be-corrected gated reconstructed image is an image obtained according to a clinical scan parameter… input the initial gating reconstructed image into a pre-trained deep learning model, and obtain a deformation field between frames of the initial gated reconstructed image… inputting the initial gated reconstructed image into a deep learning model to obtain a deformation field between frames of the initial gated reconstructed image; Pg. 
3: the computer device first obtains a to-be-corrected gated reconstructed image and an initial gated reconstructed image, where the to-be-corrected gated reconstructed image is an image obtained according to a clinical scan parameter, and the initial gated reconstructed image is a low-resolution image without attenuation correction processing… obtain a clinical PET scanning parameter from a Positron Emission Computed Tomography (PET) image device, perform attenuation correction gating reconstruction and non-attenuation correction gating reconstruction on the obtained clinical PET scanning parameter, respectively, obtain a to-be-corrected gated reconstruction image and an initial gating reconstructed image… inputting the initial gating reconstructed image into a deep learning model to obtain a deformation field between frames of the initial gated reconstructed image. Specifically, when performing PET scanning, due to the influence of motion such as human respiration or heartbeat, the obtained initial gated reconstructed image is inaccurate, there is a deformation field between frames of the image, the computer device inputs the initial gated reconstructed image into the deep learning model, the deep learning model divides the input initial gated reconstructed image into a plurality of frames, and obtains a deformation field between frames of the initial gated reconstructed image; Pg. 4: obtain a plurality of sample gating reconstructed images from an image device that performs PET gating scanning; wherein the machine learning-based system is trained using gated Positron Emission Tomography (PET) data (e.g. medical imaging method and apparatus include a computer device that obtains a plurality of sample gating reconstructed images from an image device that performs PET gating scanning (i.e. 
gated Positron Emission Tomography (PET) data), for example, including obtaining a plurality of sample gated reconstructed images, obtaining a reference frame image and a moving image according to the sample gating reconstructed images, inputting the initial gated reconstructed image into a deep learning model (i.e. wherein the machine learning-based system is trained using gated PET data) to obtain a deformation field between frames of the initial gated reconstructed image, as indicated above), for example).

Pio, Hsieh, and SUN are considered to be analogous art because they pertain to image processing applications which use machine learning. Therefore, the combined teachings of Pio, Hsieh, and SUN, as a whole, would have rendered obvious the invention recited in claim 1 with a reasonable expectation of success in order to modify the method for training a model to predict motion vectors for entities in video frames (as disclosed by Pio) with wherein the machine learning-based system is trained using gated Positron Emission Tomography (PET) data (as taught by SUN, Pgs. 1-4) to improve the efficiency of obtaining corrected reconstructed images and to improve the accuracy of correcting each frame of the to-be-corrected gated reconstructed images, thereby improving the accuracy of the obtained corrected reconstructed images (SUN, Pgs. 1 and 4).

The combination of Pio, Hsieh, and SUN, as a whole, teaches the method, as indicated above, but fails to disclose the following, as further recited in claim 1. However, Krebs teaches and wherein the series of images comprises a moving image and a fixed image, and the training comprises warping the moving image to the fixed image (Krebs, Par. [0003]: Deformable registration is typically achieved by minimizing an objective function between the fixed and moving image and a regularization term; Par. [0023-25]: a method for machine training for diffeomorphic registration.
A deep architecture with a diffeomorphic layer is trained, providing for accurate registration of different sets of data... pairs of images representing the patient at different times, from scans with different settings, and/or from scans with different modalities are obtained. In the examples below, images acquired at different times from a same scan of the patient are used as a fixed image and a moving image… the images are captured using x-ray, computed tomography (CT), fluoroscopy, angiography, ultrasound, positron emission tomography (PET), or single photon emission computed tomography (SPECT); Par. [0045-46]: the squaring step as a composition of two vector fields is basically computable with a dense warping function. A differentiable linear interpolation, such as used in spatial transformer networks, may be applied for the squaring step… the architecture is defined to include another layer and corresponding output. The warping layer 27 outputs a warped image from the input of the displacements and one of the input images It. The warping layer 27 may be modeled as a dense spatial transformer with differentiable linear interpolation. Other modeling for spatially transforming the moving image using the displacements ut may be used. The warping is a non-rigid transformation of the image, altering the pixel or voxel values based on the displacements; Par. [0051-56]: In the unsupervised case, a pre-defined similarity metric is used. The defined network is trained to optimize one or multiple losses: a predefined similarity metric… Any differentiable similarity metric and regularizer and its combinations may be used as weighted loss functions during training the unsupervised framework. 
Example metrics for deep learning-based registration include sum of squared differences, normalized cross correlation, structural similarity, or deformation gradient penalization… The local cross correlation, which requires convolution with Gaussian kernels, may be efficiently implemented with the optimized convolution operators in deep learning frameworks… For training, one of the input images may be fed to the warping layer 27 to be warped (see FIG. 2). For example, the moving image I.sub.t+1 is warped to match the fixed image I.sub.t. The resulting warped image is compared to the other input image. The comparison uses the predefined similarity metric. In training, optimization is performed to minimize or maximize the similarity metric. The network is trained using any differentiable similarity metric of the images of the pairs… The proposed frameworks or networks are generalizable as the networks may learn diffeomorphic deformable registration for different kinds of data. The network may be trained for computed tomography, magnetic resonance, ultrasound, x-ray, positron emission tomography, single photon emission computed tomography, or combinations thereof… the registration is modeled probabilistically by parametrizing the deformation as a vector z to follow a prior p(z). To learn this probabilistic space, the latent vector of dimensionality d in an encoder-decoder neural network is defined as this z. Given the moving and the fixed images as input, a variational inference method is used to reconstruct the fixed image by warping the moving image… For learning a probabilistic deformation encoding, the prior is defined as multivariate unit Gaussians… The frame image registration is treated as a reconstruction problem in which the moving image acts as the conditioning data and is warped to reconstruct or to match the fixed image… a symmetric local cross-correlation (LCC) criterion may be used due to favorable properties for registration; Par. 
[0074-76]: trained neural network is configured by the machine training to output a velocity field, deformation field, and/or warped image at a resolution of the medical images or scan data. For example, the neural network outputs a displacement vector (magnitude and direction) for each pixel or voxel of the scan data. The velocity field, deformation field, and/or warped image may be at a different resolution than the input images… the medical imaging system or image processor generates an image of the patient from the displacements. The image is output. The estimated results (e.g., velocity field or deformation field) or registered images are output… the warped image is output. The warped image is displayed with an image of the same region of the patient from a different time and/or modality. The registration represented in the warped image may allow for more diagnostically useful comparison of the images from different times and/or modalities. The registration or spatial transformation of the displacements may be used for segmentation or surgery planning where one or more images showing segmentation or the surgery plan are output; and wherein the series of images comprises a moving image and a fixed image, and the training comprises warping the moving image to the fixed image (e.g. neural network architectures are provided for diffeomorphic deformable registration, by using neural network(s) trained to estimate the diffeomorphic deformation between two images (i.e. the series of images), including pairs of images representing the patient at different times, from scans with different settings, and/or from scans with different modalities, including images acquired at different times from a same scan of the patient used as a fixed image and a moving image (i.e. 
wherein the series of images comprises a moving image and a fixed image), in which deformable registration is achieved by minimizing an objective function between the fixed and moving image, and for training, the moving image is warped to match the fixed image (i.e. and the training comprises warping the moving image to the fixed image), as indicated above), for example).

Pio, Hsieh, SUN, and Krebs are considered to be analogous art because they pertain to image processing applications which use neural networks. Therefore, the combined teachings of Pio, Hsieh, SUN, and Krebs, as a whole, would have rendered obvious the invention recited in claim 1 with a reasonable expectation of success in order to modify the method for training a model to predict motion vectors for entities in video frames (as disclosed by Pio) with and wherein the series of images comprises a moving image and a fixed image, and the training comprises warping the moving image to the fixed image (as taught by Krebs, Abstract, Par. [0003, 23-25, 45-46, 51-56, 74-76]) by using machine learned networks to provide better results in application and/or training than other automated approaches, or manual registration, and by improving registration through diversity and improving diagnosis or treatment using machine learned networks (Krebs, Abstract, Par. [0004-5, 17-18, 20]).

Regarding claim 2, claim 1 is incorporated and the combination of Pio, Hsieh, SUN, and Krebs, as a whole, teaches the method (Pio, Par. [0003]), wherein the training comprises minimizing a penalized loss function based on a similarity metric (Krebs, Par. [0017]: Neural network architectures are provided for diffeomorphic deformable registration. Diffeomorphic registration concepts are incorporated into a neural network architecture. The neural network is trained to estimate the diffeomorphic deformation between two images in one-shot. The framework may be trained in a supervised, unsupervised, or semi-supervised way.
For unsupervised training, a differentiable similarity metric (e.g., local cross correlation) may be used as a loss function. Two images are applied to a trained neural network. A warped image, corresponding dense velocities, and/or diffeomorphic deformation field may be output from the network after one forward evaluation of the network; Par. [0048-53]: a machine (e.g., image processor) trains the defined neural network arrangement. The training data (i.e., pairs of images) are used to train the neural network to determine a diffeomorphic deformation field. Given pairs of images, the network is trained by the training data to estimate the displacements between later input unseen images. Other outputs may be trained, such as training to estimate a velocity field and/or warped image. The network may be trained to estimate combinations of outputs, such as outputting velocities, displacements, and a warped image (i.e., one of the input images adjusted by the displacements)…The diffeomorphic neural network learns registration purely from data. Rather than using a manually programmed algorithm, the network is trained to estimate based on the samples in the training data (i.e., image pairs). In the supervised learning, a similarity metric is learned. The training data includes a ground truth deformation field for each pair of images. The similarity of the estimated displacements to the ground truth displacements across the many samples is minimized. For example, the neural network is trained as a standard regression problem with the sum-of-squared differences loss between prediction and ground truth deformation… the trained outputs of the network are the velocities and the deformation field… In the unsupervised case, a pre-defined similarity metric is used. 
The defined network is trained to optimize one or multiple losses: a predefined similarity metric and possibly a regularizer on the velocities and/or deformation field… Any differentiable similarity metric and regularizer and its combinations may be used as weighted loss functions during training the unsupervised framework. Example metrics for deep learning-based registration include sum of squared differences, normalized cross correlation, structural similarity, or deformation gradient penalization… The local cross correlation, which requires convolution with Gaussian kernels, may be efficiently implemented with the optimized convolution operators in deep learning frameworks… For training, one of the input images may be fed to the warping layer 27 to be warped (see FIG. 2). For example, the moving image It+1 is warped to match the fixed image It. The resulting warped image is compared to the other input image. The comparison uses the predefined similarity metric. In training, optimization is performed to minimize or maximize the similarity metric. The network is trained using any differentiable similarity metric of the images of the pairs… The proposed frameworks or networks are generalizable as the networks may learn diffeomorphic deformable registration for different kinds of data. The network may be trained for computed tomography, magnetic resonance, ultrasound, x-ray, positron emission tomography, single photon emission computed tomography, or combinations thereof ; Par. [0080-81]: trained networks are tested. The performance of the networks is shown using numerical results in cardiac sequences, registering one frame of a sequence to all other frames via composition of the pair-wise deformation fields (similarity metric is mean of squared differences)… The single scale with magnitude penalization is trained with mean of squared differences while the single scale with magnitude penalization marked with “*” is trained with structural similarity loss… FIG. 
7 shows a medical imaging system for image registration. The medical imaging system is a host computer, control station, workstation, server, medical diagnostic imaging scanner 72, or other arrangement used for training and/or application of a machine-learned network to medical images; wherein the training comprises minimizing a penalized loss function based on a similarity metric (e.g. neural network architectures are provided for diffeomorphic deformable registration, by using neural network(s) trained to estimate the diffeomorphic deformation between two images, including a differentiable similarity metric, which is used as a loss function during training of the neural network architectures (i.e. training comprises loss function based on a similarity metric), including training a network using differentiable similarity metric as weighted loss functions during training, and in training, optimization is performed to minimize or maximize the similarity metric, for example, and the network is trained to optimize one or multiple losses, including a velocity magnitude penalization trained with structural similarity loss (i.e. training comprises minimizing a penalized loss function based on a similarity metric), as indicated above), for example). The same motivation to combine above-mentioned teachings applies, as previously indicated in claim 1. Regarding claim 3, claim 2 is incorporated and the combination of Pio, Hsieh, SUN, and Krebs, as a whole, teaches the method (Pio, Par. [0003]), wherein the similarity metric comprises a cross correlation function for correlating plural images of the series of images (Krebs, Par. [0017]: deep diffeomorphic registration is provided. Neural network architectures are provided for diffeomorphic deformable registration. Diffeomorphic registration concepts are incorporated into a neural network architecture. The neural network is trained to estimate the diffeomorphic deformation between two images in one-shot. 
The framework may be trained in a supervised, unsupervised, or semi-supervised way. For unsupervised training, a differentiable similarity metric (e.g., local cross correlation) may be used as a loss function. Two images are applied to a trained neural network. A warped image, corresponding dense velocities, and/or diffeomorphic deformation field may be output from the network after one forward evaluation of the network; Par. [0048-56]: a machine (e.g., image processor) trains the defined neural network arrangement. The training data (i.e., pairs of images) are used to train the neural network to determine a diffeomorphic deformation field. Given pairs of images, the network is trained by the training data to estimate the displacements between later input unseen images. Other outputs may be trained, such as training to estimate a velocity field and/or warped image. The network may be trained to estimate combinations of outputs, such as outputting velocities, displacements, and a warped image (i.e., one of the input images adjusted by the displacements)…The diffeomorphic neural network learns registration purely from data. Rather than using a manually programmed algorithm, the network is trained to estimate based on the samples in the training data (i.e., image pairs). In the supervised learning, a similarity metric is learned. The training data includes a ground truth deformation field for each pair of images. The similarity of the estimated displacements to the ground truth displacements across the many samples is minimized. For example, the neural network is trained as a standard regression problem with the sum-of-squared differences loss between prediction and ground truth deformation… the trained outputs of the network are the velocities and the deformation field… In the unsupervised case, a pre-defined similarity metric is used. 
The defined network is trained to optimize one or multiple losses: a predefined similarity metric and possibly a regularizer on the velocities and/or deformation field… Any differentiable similarity metric and regularizer and its combinations may be used as weighted loss functions during training the unsupervised framework. Example metrics for deep learning-based registration include sum of squared differences, normalized cross correlation, structural similarity, or deformation gradient penalization… The local cross correlation, which requires convolution with Gaussian kernels, may be efficiently implemented with the optimized convolution operators in deep learning frameworks… For training, one of the input images may be fed to the warping layer 27 to be warped (see FIG. 2). For example, the moving image It+1 is warped to match the fixed image It. The resulting warped image is compared to the other input image. The comparison uses the predefined similarity metric. In training, optimization is performed to minimize or maximize the similarity metric. The network is trained using any differentiable similarity metric of the images of the pairs… The proposed frameworks or networks are generalizable as the networks may learn diffeomorphic deformable registration for different kinds of data. The network may be trained for computed tomography, magnetic resonance, ultrasound, x-ray, positron emission tomography, single photon emission computed tomography, or combinations thereof… distributions may be combined in a two-term loss function where the first term describes the reconstruction loss… In other words, the reconstruction loss represents a similarity metric between input and output… a symmetric local cross-correlation (LCC) criterion may be used due to favorable properties for registration; wherein the similarity metric comprises a cross correlation function for correlating plural images of the series of images (e.g. 
neural network architectures are provided for diffeomorphic deformable registration, by using neural network(s) trained to estimate the diffeomorphic deformation between two images (i.e. plural images of the series of images), for example, including a differentiable similarity metric (e.g., local cross correlation), which is used as a loss function during training of the neural network architectures, and reconstruction loss represents a similarity metric between input and output images, including using a symmetric local cross-correlation (LCC) criterion, as indicated above), for example). The same motivation to combine above-mentioned teachings applies, as previously indicated in claim 1. Regarding claim 4, claim 1 is incorporated and the combination of Pio, Hsieh, SUN, and Krebs, as a whole, teaches the method (Pio, Par. [0003]), wherein the training comprises warping the moving image to the fixed image using a differentiable spatial transform (Krebs, Par. [0003]: Deformable registration is typically achieved by minimizing an objective function between the fixed and moving image and a regularization term; Par. [0023-25]: a method for machine training for diffeomorphic registration. A deep architecture with a diffeomorphic layer is trained, providing for accurate registration of different sets of data... pairs of images representing the patient at different times, from scans with different settings, and/or from scans with different modalities are obtained. In the examples below, images acquired at different times from a same scan of the patient are used as a fixed image and a moving image… the images are captured using x-ray, computed tomography (CT), fluoroscopy, angiography, ultrasound, positron emission tomography (PET), or single photon emission computed tomography (SPECT); Par. [0045-46]: the squaring step as a composition of two vector fields is basically computable with a dense warping function. 
A differentiable linear interpolation, such as used in spatial transformer networks, may be applied for the squaring step… the architecture is defined to include another layer and corresponding output. The warping layer 27 outputs a warped image from the input of the displacements and one of the input images It. The warping layer 27 may be modeled as a dense spatial transformer with differentiable linear interpolation. Other modeling for spatially transforming the moving image using the displacements ut may be used. The warping is a non-rigid transformation of the image, altering the pixel or voxel values based on the displacements; Par. [0051-56]: In the unsupervised case, a pre-defined similarity metric is used. The defined network is trained to optimize one or multiple losses: a predefined similarity metric… Any differentiable similarity metric and regularizer and its combinations may be used as weighted loss functions during training the unsupervised framework. Example metrics for deep learning-based registration include sum of squared differences, normalized cross correlation, structural similarity, or deformation gradient penalization… The local cross correlation, which requires convolution with Gaussian kernels, may be efficiently implemented with the optimized convolution operators in deep learning frameworks… For training, one of the input images may be fed to the warping layer 27 to be warped (see FIG. 2). For example, the moving image It+1 is warped to match the fixed image It. The resulting warped image is compared to the other input image. The comparison uses the predefined similarity metric. In training, optimization is performed to minimize or maximize the similarity metric. The network is trained using any differentiable similarity metric of the images of the pairs… The proposed frameworks or networks are generalizable as the networks may learn diffeomorphic deformable registration for different kinds of data.
The network may be trained for computed tomography, magnetic resonance, ultrasound, x-ray, positron emission tomography, single photon emission computed tomography, or combinations thereof… the registration is modeled probabilistically by parametrizing the deformation as a vector z to follow a prior p(z). To learn this probabilistic space, the latent vector of dimensionality d in an encoder-decoder neural network is defined as this z. Given the moving and the fixed images as input, a variational inference method is used to reconstruct the fixed image by warping the moving image… For learning a probabilistic deformation encoding, the prior is defined as multivariate unit Gaussians… The frame image registration is treated as a reconstruction problem in which the moving image acts as the conditioning data and is warped to reconstruct or to match the fixed image… a symmetric local cross-correlation (LCC) criterion may be used due to favorable properties for registration; Par. [0074-76]: trained neural network is configured by the machine training to output a velocity field, deformation field, and/or warped image at a resolution of the medical images or scan data. For example, the neural network outputs a displacement vector (magnitude and direction) for each pixel or voxel of the scan data. The velocity field, deformation field, and/or warped image may be at a different resolution than the input images… the medical imaging system or image processor generates an image of the patient from the displacements. The image is output. The estimated results (e.g., velocity field or deformation field) or registered images are output… the warped image is output. The warped image is displayed with an image of the same region of the patient from a different time and/or modality. The registration represented in the warped image may allow for more diagnostically useful comparison of the images from different times and/or modalities. 
The registration or spatial transformation of the displacements may be used for segmentation or surgery planning where one or more images showing segmentation or the surgery plan are output; wherein the training comprises warping the moving image to the fixed image using a differentiable spatial transform (e.g. neural network architectures are provided for diffeomorphic deformable registration, by using neural network(s) trained to estimate the diffeomorphic deformation between two images (i.e. the series of images), including pairs of images representing the patient at different times, from scans with different settings, and/or from scans with different modalities, including images acquired at different times from a same scan of the patient used as a fixed image and a moving image (i.e. the series of images comprises a moving image and a fixed image), in which deformable registration is achieved by minimizing an objective function between the fixed and moving image, and for training, the moving image is warped to match the fixed image (i.e. warping the moving image to the fixed image), for example, and the neural network(s) include a warping layer modeled as a dense spatial transformer with differentiable linear interpolation (i.e. wherein the training comprises warping the moving image to the fixed image using a differentiable spatial transform), as indicated above), for example). The same motivation to combine above-mentioned teachings applies, as previously indicated in claim 1. Regarding claim 26, claim 1 is incorporated and the combination of Pio, Hsieh, SUN, and Krebs, as a whole, teaches the method (Pio, Par. [0003]), wherein the machine learning-based system is trained using an unsupervised training process (Krebs, Par. [0017]: deep diffeomorphic registration is provided. Neural network architectures are provided for diffeomorphic deformable registration. Diffeomorphic registration concepts are incorporated into a neural network architecture.
The neural network is trained to estimate the diffeomorphic deformation between two images in one-shot. The framework may be trained in a supervised, unsupervised, or semi-supervised way. For unsupervised training, a differentiable similarity metric (e.g., local cross correlation) may be used as a loss function. Two images are applied to a trained neural network. A warped image, corresponding dense velocities, and/or diffeomorphic deformation field may be output from the network after one forward evaluation of the network; Par. [0031-33]: a neural network (e.g., deep learning) arrangement is defined. The definition is by configuration or programming of the learning. The number of layers or units, type of learning, and other characteristics of the network are controlled by the programmer or user… Deep architectures include convolutional neural network (CNN) or deep belief nets (DBN), but other deep networks may be used… In one embodiment, a CNN, such as a fully convolutional neural network, is used… The neural network is defined as a plurality of sequential feature units or layers… FIG. 2 shows one embodiment of a neural network architecture defined to include a diffeomorphic layer 26. FIG. 2 shows an example architecture for unsupervised training; Par. [0051]: In the unsupervised case, a pre-defined similarity metric is used. The defined network is trained to optimize one or multiple losses: a predefined similarity metric and possibly a regularizer on the velocities and/or deformation field. The need for hand-crafted regularization criteria is minimized due to the architecture of the network, so a regularizer may not be used in other embodiments. Any differentiable similarity metric and regularizer and its combinations may be used as weighted loss functions during training the unsupervised framework. 
Example metrics for deep learning-based registration include sum of squared differences, normalized cross correlation, structural similarity, or deformation gradient penalization; Par. [0079]: FIG. 6 shows an example application using the network of FIG. 2 (unsupervised training). The unsupervised framework is trained with the loss of mean of squared differences and regularizer of velocities magnitude penalization; Par. [0091]: image processor 76 applies the machine-learned neural network, which includes an encoder-decoder network having been trained to estimate the velocities and an exponentiation layer configured to generate the diffeomorphic deformation field from the velocities. The machine-learned neural network may also include a warping layer configured to generate the warped image from the diffeomorphic deformation field. Alternatively, a warping function is applied outside the network using the deformation field output by the network, or warping is not used… The network may have been trained with unsupervised learning using a pre-defined similarity metric or with supervised learning using ground truth warped image or deformation field; wherein the machine learning-based system is trained using an unsupervised training process (e.g. neural network architectures are provided for diffeomorphic deformable registration, by using neural network(s) trained to estimate the diffeomorphic deformation between two images, including neural network trained with unsupervised learning (i.e. wherein the machine learning-based system is trained using an unsupervised training process), such as FIG. 2, which shows an example architecture for unsupervised training, as indicated above), for example). The same motivation to combine above-mentioned teachings applies, as previously indicated in claim 1. Claims 10-15 and 18-24 are rejected under 35 U.S.C. 103 as being unpatentable over Pio, in view of SUN, and in further view of Krebs.
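For readers tracing the mapping above, the Krebs mechanism the Examiner relies on for claims 2-4 and 26 — warping the moving image toward the fixed image with a differentiable (bilinear) spatial transform and scoring the result with a penalized loss based on a similarity metric — can be sketched in a few lines. This is an illustrative NumPy toy only; the function names, the 8×8 example, and the choice of sum-of-squared-differences plus a displacement-magnitude penalty are assumptions for exposition, not code from Krebs or any other cited reference.

```python
import numpy as np

def bilinear_warp(moving, disp):
    """Warp a 2D image `moving` (H x W) by a displacement field
    `disp` (H x W x 2, row/col offsets) using bilinear interpolation --
    the differentiable spatial-transform step described above.
    (Hypothetical helper for illustration.)"""
    H, W = moving.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sy = np.clip(ys + disp[..., 0], 0, H - 1)   # sample rows
    sx = np.clip(xs + disp[..., 1], 0, W - 1)   # sample cols
    y0 = np.floor(sy).astype(int); x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, H - 1); x1 = np.minimum(x0 + 1, W - 1)
    wy = sy - y0; wx = sx - x0
    return ((1 - wy) * (1 - wx) * moving[y0, x0]
            + (1 - wy) * wx * moving[y0, x1]
            + wy * (1 - wx) * moving[y1, x0]
            + wy * wx * moving[y1, x1])

def penalized_loss(fixed, moving, disp, lam=0.01):
    """Sum-of-squared-differences similarity between the warped moving
    image and the fixed image, plus a displacement-magnitude penalty
    (one of several regularizers named in the quoted passages)."""
    warped = bilinear_warp(moving, disp)
    ssd = np.sum((warped - fixed) ** 2)
    return ssd + lam * np.sum(disp ** 2)

# Toy example: the moving image is the fixed image shifted down one row.
fixed = np.zeros((8, 8)); fixed[3, 3] = 1.0
moving = np.zeros((8, 8)); moving[4, 3] = 1.0
# A displacement field that undoes the shift drives the SSD term to zero,
# leaving only the magnitude penalty.
disp = np.zeros((8, 8, 2)); disp[..., 0] = 1.0
print(penalized_loss(fixed, moving, disp))  # 0.64: zero SSD + 0.01 * 64
```

In a full training loop the displacement field would be the network's output and the loss would be minimized by backpropagation; here the field is fixed by hand purely to show that the warp-then-compare objective behaves as the quoted passages describe.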
Regarding claim 10, Pio discloses a system (Par. [0003]: embodiments of the present disclosure can include systems, methods, and non-transitory computer readable media configured to train a model to predict motion vectors for entities in video frames) comprising: processing circuitry configured to (Par. [0077]: the processes and features described herein are implemented as a series of executable modules run by the computer system 700, individually or collectively in a distributed computing environment. The foregoing modules may be realized by hardware, executable modules stored on a computer-readable medium (or machine-readable medium), or a combination of both. For example, the modules may comprise a plurality or series of instructions to be executed by a processor in a hardware system, such as the processor 702… a module or modules can be executed by a processor or multiple processors in one or multiple locations, such as multiple servers in a parallel processing environment): obtain a series of images including movement of at least one object between the series of images (Par. [0008]: systems, methods, and non-transitory computer readable media are configured to obtain a set of videos for training the model, each video having a set of frames, identify one or more objects in the set of frames for each video, determine a set of respective motion vectors for the one or more objects, and cause data describing the one or more objects in the set of frames to be included in the training data as an example inputs and the corresponding motion vectors for the objects to be included in the training data as example outputs; Par. [0022]: motion estimation involves the process of determining, from a set of frames (e.g., images and/or video frames), a set of motion vectors that correspond to various entities in the frames. A motion vector can describe the motion, or displacement, of an entity in the set of frames. 
An entity can refer to… an object identified in a visual scene captured by a frame… The respective motions of entities can be determined, for example, by evaluating the displacement of the entities in the frames. The displacement of entities can be measured, for example, based on direction (e.g., movement along the x-axis, y-axis, and/or z-axis) and magnitude (e.g., the amount the respective entity was displaced, for example, between the frames; Par. [0037]: training data module 302 can then include data describing each object (e.g., object identifier, object location, e.g., coordinates, etc.) and its corresponding motion vector in the training data… the model can be trained to extract features that allow the model to determine the movement of certain types of objects across frames and to utilize such information to predict motion vectors for such objects… motion vectors can be determined for some, or all, human faces and/or individuals that are identified in video frames. Such human faces and/or individuals may be identified specifically. As a result, true motion can be determined based on how specific faces and/or individuals move in the video frames; obtaining a series of images including movement of at least one object between the series of images (e.g. obtain a set of videos for training a model, each video having a set of frames (i.e. obtaining a series of images), by identifying one or more objects in the set of frames for each video to determine respective motions of the objects or entities (i.e. series of images including movement of at least one object between the series of images), for example, by evaluating the displacement of the objects or entities in the set of frames, as indicated above), for example); and train a machine learning-based system based on the series of images to produce a trained machine learning-based system for providing at least one motion vector indicating a movement of the at least one object between the series of images (Par.
[0024-26]: FIG. 1 illustrates an example system including an example motion estimation module 102 configured to determine motion vectors for video content using one or more trained machine learning models… store training data for training one or more machine learning models to predict motion vectors for a set of frames or videos… the training data can include, for example, one or more ground truth motion vector data sets that can be used to train a machine learning model for predicting motion vectors for a set of frames, such as respective directions and magnitudes for entities corresponding to the set of frames; Par. [0029-33]: FIG. 2 illustrates an example motion vector model module 202 configured to analyze video content to determine motion vectors, according to an embodiment of the present disclosure. In some embodiments, the motion vector module 106 of FIG. 1 can be implemented as the example motion vector module 202. As shown in FIG. 2, the example motion vector model module 202 includes a training data module 204 and a training module 206. The motion vector model module 202 can evaluate a set of frames in a video using a trained model to determine motion vectors for entities within and/or between the set of frames. In various embodiments, the model can be implemented using any number of generally known machine learning techniques… training module 206 can be configured to train the model to output, or predict, motion vectors for a set of frames… the trained model can receive, as input, a set of frames and can output a set of motion vectors that each correspond to one or more entities. 
As mentioned, such entities may refer to one or more pixels in the frames, one or more blocks in the frames, one or more objects in the frames… during the evaluation phase, the accuracy of the model can be tested, for example, using the motion vectors that were outputted by the model for a set of frames and comparing these predicted motion vectors to the pre-computed motion vectors that were provided to the model during the training phase… The training module 206 can measure any inaccuracies in the motion vector information that is outputted by the trained model… By measuring inaccuracies and refining the model over a number of training iterations, the model can be trained to optimally, or otherwise suitably, predict motion vectors for various types for video content; Par. [0037-40]: training data module 302 can then include data describing each object (e.g., object identifier, object location, e.g., coordinates, etc.) and its corresponding motion vector in the training data… the model can be trained to extract features that allow the model to determine the movement of certain types of objects across frames and to utilize such information to predict motion vectors for such objects… motion vectors can be determined for some, or all, human faces and/or individuals that are identified in video frames… process 500 for training a model to determine motion vectors… At block 502, a model to predict motion vectors for entities in video frames is trained. At block 504, a set of frames that correspond to a first video are obtained. At block 506, the set of frames can be provided as input to the model. At block 508, a set of motion vectors for the set of frames can be obtained from the model. 
Each motion vector can describe a trajectory of at least one entity in the set of frames; and training a machine learning-based system based on the series of images to produce a trained machine learning-based system that outputs at least one motion vector indicating a movement of the at least one object between the series of images (e.g. system includes motion estimation module configured to determine motion vectors for video content using one or more trained machine learning models (i.e. a trained machine learning-based system that outputs motion vectors), in which each model is trained to extract features that allow the model to determine the movement of certain types of objects across frames and to utilize such information to predict motion vectors for such objects (i.e. training machine learning-based system based on the series of images to produce a trained machine learning-based system that outputs motion vectors indicating movement of objects or entities between the series of images), by training the model to output, or predict, motion vectors for the set of frames, and motion estimation involves the process of determining, from each set of frames, a set of motion vectors that correspond to various objects or entities in the frames, and each motion vector describes the motion, or displacement, of an object or entity in the set of frames (i.e. training machine learning-based system based on the series of images to produce a trained machine learning-based system that outputs at least one motion vector indicating a movement of the at least one object between the series of images), as indicated above), for example). Pio discloses the system, as indicated above, but fails to disclose the following, as further recited in claim 10. However, SUN teaches wherein the machine learning-based system is trained using gated Positron Emission Tomography (PET) data (Pg.
1: Positron Emission Computed Tomography (PET) imaging… reconstruction to obtain a PET image… an image correction method, where the method includes: obtaining a to-be-corrected gated reconstructed image and an initial gated reconstructed image… inputting the initial gated reconstructed image into a deep learning model to obtain a deformation field between frames of the initial gated reconstructed image; and correcting the gated reconstructed image to be corrected according to the deformation field between the frames to obtain a corrected reconstructed image; Pg. 2: obtaining a plurality of sample gated reconstructed images, where the sample gated reconstructed image is an image without attenuation correction, and a scan duration of the sample gated reconstructed image is greater than a preset threshold; and obtaining a reference frame image and a moving image according to the sample gating reconstructed image, where the reference frame image is an image without motion influence corresponding to the sample gating reconstructed image, and the moving image is an image with motion influence corresponding to the reference frame image; taking the reference frame image and the moving image as input, taking the deformation field of the moving image as an output, and training the initial deep learning model to obtain the deep learning model… obtain a to-be-corrected gated reconstructed image and an initial gated reconstructed image, where the to-be-corrected gated reconstructed image is an image obtained according to a clinical scan parameter… input the initial gating reconstructed image into a pre-trained deep learning model, and obtain a deformation field between frames of the initial gated reconstructed image… inputting the initial gated reconstructed image into a deep learning model to obtain a deformation field between frames of the initial gated reconstructed image; Pg. 
3: the computer device first obtains a to-be-corrected gated reconstructed image and an initial gated reconstructed image, where the to-be-corrected gated reconstructed image is an image obtained according to a clinical scan parameter, and the initial gated reconstructed image is a low-resolution image without attenuation correction processing… obtain a clinical PET scanning parameter from a Positron Emission Computed Tomography (PET) image device, perform attenuation correction gating reconstruction and non-attenuation correction gating reconstruction on the obtained clinical PET scanning parameter, respectively, obtain a to-be-corrected gated reconstruction image and an initial gating reconstructed image… inputting the initial gating reconstructed image into a deep learning model to obtain a deformation field between frames of the initial gated reconstructed image. Specifically, when performing PET scanning, due to the influence of motion such as human respiration or heartbeat, the obtained initial gated reconstructed image is inaccurate, there is a deformation field between frames of the image, the computer device inputs the initial gated reconstructed image into the deep learning model, the deep learning model divides the input initial gated reconstructed image into a plurality of frames, and obtains a deformation field between frames of the initial gated reconstructed image; Pg. 4: obtain a plurality of sample gating reconstructed images from an image device that performs PET gating scanning; wherein the machine learning-based system is trained using gated Positron Emission Tomography (PET) data (e.g. medical imaging method and apparatus include a computer device that obtains a plurality of sample gating reconstructed images from an image device that performs PET gating scanning (i.e. 
gated Positron Emission Tomography (PET) data), for example, including obtaining a plurality of sample gated reconstructed images, obtaining a reference frame image and a moving image according to the sample gating reconstructed images, inputting the initial gated reconstructed image into a deep learning model (i.e. wherein the machine learning-based system is trained using gated PET data) to obtain a deformation field between frames of the initial gated reconstructed image, as indicated above), for example).

Pio and SUN are considered to be analogous art because they pertain to image processing applications that use neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method for training a model to predict motion vectors for entities in video frames (as disclosed by Pio) with wherein the machine learning-based system is trained using gated Positron Emission Tomography (PET) data (as taught by SUN, Pgs. 1-4) to improve the efficiency of obtaining corrected reconstructed images and to improve the accuracy of correcting each frame of the to-be-corrected gated reconstructed images, thereby improving the accuracy of the obtained corrected reconstructed images (SUN, Pgs. 1 and 4).

The combination of Pio and SUN, as a whole, teaches the system (Pio, Par. [0003]), as indicated above, but fails to disclose the following, as further recited in claim 10. However, Krebs teaches wherein the series of images comprises a moving image and a fixed image, and the processing circuitry trains the machine learning-based system by warping the moving image to the fixed image (Krebs, Par. [0003]: Deformable registration is typically achieved by minimizing an objective function between the fixed and moving image and a regularization term; Par. [0023-25]: a method for machine training for diffeomorphic registration.
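Before turning to Krebs' details, the core notion in the claim-10 mapping above, a motion vector describing the displacement of an entity between two frames, can be made concrete with a toy exhaustive block-matching search. This is purely illustrative (it is not the method of Pio, SUN, or the application), and every name in it is hypothetical:

```python
import numpy as np

def estimate_motion_vector(prev_frame, next_frame, top, left, size, search=4):
    """Exhaustively search a +/-`search` pixel window for the displacement
    (dy, dx) that best matches a `size` x `size` block between two frames,
    using sum of squared differences (SSD) as the matching cost."""
    block = prev_frame[top:top + size, left:left + size]
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > next_frame.shape[0] or x + size > next_frame.shape[1]:
                continue
            cost = np.sum((next_frame[y:y + size, x:x + size] - block) ** 2)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

# A bright square moves 2 pixels down and 1 pixel right between frames.
f0 = np.zeros((16, 16)); f0[4:8, 4:8] = 1.0
f1 = np.zeros((16, 16)); f1[6:10, 5:9] = 1.0
mv = estimate_motion_vector(f0, f1, top=4, left=4, size=4)
```

Here the search recovers the displacement (2, 1) for the moving block; a learned model, as in Pio, would predict such vectors directly from the frames rather than searching.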
A deep architecture with a diffeomorphic layer is trained, providing for accurate registration of different sets of data... pairs of images representing the patient at different times, from scans with different settings, and/or from scans with different modalities are obtained. In the examples below, images acquired at different times from a same scan of the patient are used as a fixed image and a moving image… the images are captured using x-ray, computed tomography (CT), fluoroscopy, angiography, ultrasound, positron emission tomography (PET), or single photon emission computed tomography (SPECT); Par. [0045-46]: the squaring step as a composition of two vector fields is basically computable with a dense warping function. A differentiable linear interpolation, such as used in spatial transformer networks, may be applied for the squaring step… the architecture is defined to include another layer and corresponding output. The warping layer 27 outputs a warped image from the input of the displacements and one of the input images It. The warping layer 27 may be modeled as a dense spatial transformer with differentiable linear interpolation. Other modeling for spatially transforming the moving image using the displacements ut may be used. The warping is a non-rigid transformation of the image, altering the pixel or voxel values based on the displacements; Par. [0051-56]: In the unsupervised case, a pre-defined similarity metric is used. The defined network is trained to optimize one or multiple losses: a predefined similarity metric… Any differentiable similarity metric and regularizer and its combinations may be used as weighted loss functions during training the unsupervised framework. 
Example metrics for deep learning-based registration include sum of squared differences, normalized cross correlation, structural similarity, or deformation gradient penalization… The local cross correlation, which requires convolution with Gaussian kernels, may be efficiently implemented with the optimized convolution operators in deep learning frameworks… For training, one of the input images may be fed to the warping layer 27 to be warped (see FIG. 2). For example, the moving image I.sub.t+1 is warped to match the fixed image I.sub.t. The resulting warped image is compared to the other input image. The comparison uses the predefined similarity metric. In training, optimization is performed to minimize or maximize the similarity metric. The network is trained using any differentiable similarity metric of the images of the pairs… The proposed frameworks or networks are generalizable as the networks may learn diffeomorphic deformable registration for different kinds of data. The network may be trained for computed tomography, magnetic resonance, ultrasound, x-ray, positron emission tomography, single photon emission computed tomography, or combinations thereof… the registration is modeled probabilistically by parametrizing the deformation as a vector z to follow a prior p(z). To learn this probabilistic space, the latent vector of dimensionality d in an encoder-decoder neural network is defined as this z. Given the moving and the fixed images as input, a variational inference method is used to reconstruct the fixed image by warping the moving image… For learning a probabilistic deformation encoding, the prior is defined as multivariate unit Gaussians… The frame image registration is treated as a reconstruction problem in which the moving image acts as the conditioning data and is warped to reconstruct or to match the fixed image… a symmetric local cross-correlation (LCC) criterion may be used due to favorable properties for registration; Par. 
[0074-76]: trained neural network is configured by the machine training to output a velocity field, deformation field, and/or warped image at a resolution of the medical images or scan data. For example, the neural network outputs a displacement vector (magnitude and direction) for each pixel or voxel of the scan data. The velocity field, deformation field, and/or warped image may be at a different resolution than the input images… the medical imaging system or image processor generates an image of the patient from the displacements. The image is output. The estimated results (e.g., velocity field or deformation field) or registered images are output… the warped image is output. The warped image is displayed with an image of the same region of the patient from a different time and/or modality. The registration represented in the warped image may allow for more diagnostically useful comparison of the images from different times and/or modalities. The registration or spatial transformation of the displacements may be used for segmentation or surgery planning where one or more images showing segmentation or the surgery plan are output; and wherein the series of images comprises a moving image and a fixed image, and the training comprises warping the moving image to the fixed image (e.g. neural network architectures are provided for diffeomorphic deformable registration, by using neural network(s) trained to estimate the diffeomorphic deformation between two images (i.e. the series of images), including pairs of images representing the patient at different times, from scans with different settings, and/or from scans with different modalities, including images acquired at different times from a same scan of the patient used as a fixed image and a moving image (i.e. 
wherein the series of images comprises a moving image and a fixed image), in which deformable registration is achieved by minimizing an objective function between the fixed and moving image, and for training, the moving image is warped to match the fixed image (i.e. and the training comprises warping the moving image to the fixed image), as indicated above), for example).

Pio, SUN, and Krebs are considered to be analogous art because they pertain to image processing applications that use neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, with a reasonable expectation of success, to modify the method for training a model to predict motion vectors for entities in video frames (as disclosed by Pio) with the limitation wherein the series of images comprises a moving image and a fixed image, and the training comprises warping the moving image to the fixed image (as taught by Krebs, Abstract, Par. [0003, 23-25, 45-46, 51-56, 74-76]), because machine learned networks provide better results in application and/or training than other automated approaches or manual registration, and improve registration through diversity, thereby improving diagnosis or treatment (Krebs, Abstract, Par. [0004-5, 17-18, 20]).

Regarding claim 11, claim 10 is incorporated and the combination of Pio, SUN, and Krebs, as a whole, teaches the system (Pio, Par. [0003]), wherein the training comprises minimizing a penalized loss function based on a similarity metric (Krebs, Par. [0017]: Neural network architectures are provided for diffeomorphic deformable registration. Diffeomorphic registration concepts are incorporated into a neural network architecture. The neural network is trained to estimate the diffeomorphic deformation between two images in one-shot. The framework may be trained in a supervised, unsupervised, or semi-supervised way.
For unsupervised training, a differentiable similarity metric (e.g., local cross correlation) may be used as a loss function. Two images are applied to a trained neural network. A warped image, corresponding dense velocities, and/or diffeomorphic deformation field may be output from the network after one forward evaluation of the network; Par. [0048-53]: a machine (e.g., image processor) trains the defined neural network arrangement. The training data (i.e., pairs of images) are used to train the neural network to determine a diffeomorphic deformation field. Given pairs of images, the network is trained by the training data to estimate the displacements between later input unseen images. Other outputs may be trained, such as training to estimate a velocity field and/or warped image. The network may be trained to estimate combinations of outputs, such as outputting velocities, displacements, and a warped image (i.e., one of the input images adjusted by the displacements)…The diffeomorphic neural network learns registration purely from data. Rather than using a manually programmed algorithm, the network is trained to estimate based on the samples in the training data (i.e., image pairs). In the supervised learning, a similarity metric is learned. The training data includes a ground truth deformation field for each pair of images. The similarity of the estimated displacements to the ground truth displacements across the many samples is minimized. For example, the neural network is trained as a standard regression problem with the sum-of-squared differences loss between prediction and ground truth deformation… the trained outputs of the network are the velocities and the deformation field… In the unsupervised case, a pre-defined similarity metric is used. 
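The "penalized loss function based on a similarity metric" at issue in claim 11 can be sketched numerically as a similarity term plus a weighted regularizer on the displacement field. This is a hedged illustration under simplifying assumptions (a scalar displacement field and mean squared difference as the similarity metric), not Krebs' actual implementation, and the names are hypothetical:

```python
import numpy as np

def penalized_loss(warped, fixed, displacement, lam=0.1):
    """Similarity term (mean squared difference between the warped moving
    image and the fixed image) plus a weighted smoothness penalty on the
    spatial gradients of the displacement field."""
    similarity = np.mean((warped - fixed) ** 2)
    dy, dx = np.gradient(displacement)       # finite-difference gradients
    penalty = np.mean(dy ** 2 + dx ** 2)     # deformation gradient penalization
    return similarity + lam * penalty

fixed = np.ones((8, 8))
smooth_disp = np.zeros((8, 8))   # identity deformation, perfectly smooth
rough_disp = np.random.default_rng(0).normal(size=(8, 8))
good = penalized_loss(fixed, fixed, smooth_disp)      # perfect match, no penalty
bad = penalized_loss(fixed + 0.5, fixed, rough_disp)  # mismatch plus a rough field
```

Training would minimize this quantity over the network parameters; a perfectly matched, perfectly smooth registration drives the loss to zero.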
The defined network is trained to optimize one or multiple losses: a predefined similarity metric and possibly a regularizer on the velocities and/or deformation field… Any differentiable similarity metric and regularizer and its combinations may be used as weighted loss functions during training the unsupervised framework. Example metrics for deep learning-based registration include sum of squared differences, normalized cross correlation, structural similarity, or deformation gradient penalization… The local cross correlation, which requires convolution with Gaussian kernels, may be efficiently implemented with the optimized convolution operators in deep learning frameworks… For training, one of the input images may be fed to the warping layer 27 to be warped (see FIG. 2). For example, the moving image It+1 is warped to match the fixed image It. The resulting warped image is compared to the other input image. The comparison uses the predefined similarity metric. In training, optimization is performed to minimize or maximize the similarity metric. The network is trained using any differentiable similarity metric of the images of the pairs… The proposed frameworks or networks are generalizable as the networks may learn diffeomorphic deformable registration for different kinds of data. The network may be trained for computed tomography, magnetic resonance, ultrasound, x-ray, positron emission tomography, single photon emission computed tomography, or combinations thereof; Par. [0080-81]: trained networks are tested. The performance of the networks is shown using numerical results in cardiac sequences, registering one frame of a sequence to all other frames via composition of the pair-wise deformation fields (similarity metric is mean of squared differences)… The single scale with magnitude penalization is trained with mean of squared differences while the single scale with magnitude penalization marked with “*” is trained with structural similarity loss… FIG.
7 shows a medical imaging system for image registration. The medical imaging system is a host computer, control station, workstation, server, medical diagnostic imaging scanner 72, or other arrangement used for training and/or application of a machine-learned network to medical images; wherein the training comprises minimizing a penalized loss function based on a similarity metric (e.g. neural network architectures are provided for diffeomorphic deformable registration, by using neural network(s) trained to estimate the diffeomorphic deformation between two images, including a differentiable similarity metric, which is used as a loss function during training of the neural network architectures (i.e. training comprises loss function based on a similarity metric), including training a network using differentiable similarity metric as weighted loss functions during training, and in training, optimization is performed to minimize or maximize the similarity metric, for example, and the network is trained to optimize one or multiple losses, including a velocity magnitude penalization trained with structural similarity loss (i.e. training comprises minimizing a penalized loss function based on a similarity metric), as indicated above), for example). The same motivation to combine the above-mentioned teachings applies, as previously indicated for claim 10.

Regarding claim 12, claim 11 is incorporated and the combination of Pio, SUN, and Krebs, as a whole, teaches the system (Pio, Par. [0003]), wherein the similarity metric comprises a cross correlation function for correlating plural images of the series of images (Krebs, Par. [0017]: deep diffeomorphic registration is provided. Neural network architectures are provided for diffeomorphic deformable registration. Diffeomorphic registration concepts are incorporated into a neural network architecture. The neural network is trained to estimate the diffeomorphic deformation between two images in one-shot.
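For claim 12, a cross correlation similarity metric of the kind Krebs lists can be sketched as follows. Note this is plain global normalized cross correlation, a simplification of the windowed local cross correlation (LCC) Krebs actually describes, and the function name is hypothetical:

```python
import numpy as np

def normalized_cross_correlation(a, b):
    """Global normalized cross correlation between two images: 1.0 for
    images identical up to an affine intensity change, lower otherwise."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return float(np.sum(a * b) / denom)

rng = np.random.default_rng(1)
img = rng.random((16, 16))
same = normalized_cross_correlation(img, 2.0 * img + 3.0)  # gain/offset invariant
diff = normalized_cross_correlation(img, rng.random((16, 16)))
```

Because the metric normalizes out gain and offset, an image compared against an affinely rescaled copy of itself scores 1.0, which is why it is a popular registration loss.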
The framework may be trained in a supervised, unsupervised, or semi-supervised way. For unsupervised training, a differentiable similarity metric (e.g., local cross correlation) may be used as a loss function. Two images are applied to a trained neural network. A warped image, corresponding dense velocities, and/or diffeomorphic deformation field may be output from the network after one forward evaluation of the network; Par. [0048-56]: a machine (e.g., image processor) trains the defined neural network arrangement. The training data (i.e., pairs of images) are used to train the neural network to determine a diffeomorphic deformation field. Given pairs of images, the network is trained by the training data to estimate the displacements between later input unseen images. Other outputs may be trained, such as training to estimate a velocity field and/or warped image. The network may be trained to estimate combinations of outputs, such as outputting velocities, displacements, and a warped image (i.e., one of the input images adjusted by the displacements)…The diffeomorphic neural network learns registration purely from data. Rather than using a manually programmed algorithm, the network is trained to estimate based on the samples in the training data (i.e., image pairs). In the supervised learning, a similarity metric is learned. The training data includes a ground truth deformation field for each pair of images. The similarity of the estimated displacements to the ground truth displacements across the many samples is minimized. For example, the neural network is trained as a standard regression problem with the sum-of-squared differences loss between prediction and ground truth deformation… the trained outputs of the network are the velocities and the deformation field… In the unsupervised case, a pre-defined similarity metric is used. 
The defined network is trained to optimize one or multiple losses: a predefined similarity metric and possibly a regularizer on the velocities and/or deformation field… Any differentiable similarity metric and regularizer and its combinations may be used as weighted loss functions during training the unsupervised framework. Example metrics for deep learning-based registration include sum of squared differences, normalized cross correlation, structural similarity, or deformation gradient penalization… The local cross correlation, which requires convolution with Gaussian kernels, may be efficiently implemented with the optimized convolution operators in deep learning frameworks… For training, one of the input images may be fed to the warping layer 27 to be warped (see FIG. 2). For example, the moving image It+1 is warped to match the fixed image It. The resulting warped image is compared to the other input image. The comparison uses the predefined similarity metric. In training, optimization is performed to minimize or maximize the similarity metric. The network is trained using any differentiable similarity metric of the images of the pairs… The proposed frameworks or networks are generalizable as the networks may learn diffeomorphic deformable registration for different kinds of data. The network may be trained for computed tomography, magnetic resonance, ultrasound, x-ray, positron emission tomography, single photon emission computed tomography, or combinations thereof… distributions may be combined in a two-term loss function where the first term describes the reconstruction loss… In other words, the reconstruction loss represents a similarity metric between input and output… a symmetric local cross-correlation (LCC) criterion may be used due to favorable properties for registration; wherein the similarity metric comprises a cross correlation function for correlating plural images of the series of images (e.g. 
neural network architectures are provided for diffeomorphic deformable registration, by using neural network(s) trained to estimate the diffeomorphic deformation between two images (i.e. plural images of the series of images), for example, including a differentiable similarity metric (e.g., local cross correlation), which is used as a loss function during training of the neural network architectures, and reconstruction loss represents a similarity metric between input and output images, including using a symmetric local cross-correlation (LCC) criterion, as indicated above), for example). The same motivation to combine the above-mentioned teachings applies, as previously indicated for claim 10.

Regarding claim 13, claim 10 is incorporated and the combination of Pio, SUN, and Krebs, as a whole, teaches the system (Pio, Par. [0003]), wherein the series of images comprises a moving image and a fixed image, and wherein the training comprises warping the moving image to the fixed image using a differentiable spatial transform (Krebs, Par. [0003]: Deformable registration is typically achieved by minimizing an objective function between the fixed and moving image and a regularization term; Par. [0023-25]: a method for machine training for diffeomorphic registration. A deep architecture with a diffeomorphic layer is trained, providing for accurate registration of different sets of data... pairs of images representing the patient at different times, from scans with different settings, and/or from scans with different modalities are obtained. In the examples below, images acquired at different times from a same scan of the patient are used as a fixed image and a moving image… the images are captured using x-ray, computed tomography (CT), fluoroscopy, angiography, ultrasound, positron emission tomography (PET), or single photon emission computed tomography (SPECT); Par.
[0045-46]: the squaring step as a composition of two vector fields is basically computable with a dense warping function. A differentiable linear interpolation, such as used in spatial transformer networks, may be applied for the squaring step… the architecture is defined to include another layer and corresponding output. The warping layer 27 outputs a warped image from the input of the displacements and one of the input images It. The warping layer 27 may be modeled as a dense spatial transformer with differentiable linear interpolation. Other modeling for spatially transforming the moving image using the displacements ut may be used. The warping is a non-rigid transformation of the image, altering the pixel or voxel values based on the displacements; Par. [0051-56]: In the unsupervised case, a pre-defined similarity metric is used. The defined network is trained to optimize one or multiple losses: a predefined similarity metric… Any differentiable similarity metric and regularizer and its combinations may be used as weighted loss functions during training the unsupervised framework. Example metrics for deep learning-based registration include sum of squared differences, normalized cross correlation, structural similarity, or deformation gradient penalization… The local cross correlation, which requires convolution with Gaussian kernels, may be efficiently implemented with the optimized convolution operators in deep learning frameworks… For training, one of the input images may be fed to the warping layer 27 to be warped (see FIG. 2). For example, the moving image I.sub.t+1 is warped to match the fixed image I.sub.t. The resulting warped image is compared to the other input image. The comparison uses the predefined similarity metric. In training, optimization is performed to minimize or maximize the similarity metric. 
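The "differentiable spatial transform" recited in claim 13 corresponds to dense warping with bilinear (linear) interpolation, as in the warping layer 27 quoted above. The numpy sketch below mirrors only the sampling arithmetic (numpy provides no automatic differentiation, so this illustrates the operation rather than a trainable layer), and the names are illustrative assumptions:

```python
import numpy as np

def bilinear_warp(image, disp_y, disp_x):
    """Warp `image` by sampling it at (y + disp_y, x + disp_x) with bilinear
    interpolation -- the same arithmetic a differentiable spatial transformer
    layer applies to the moving image."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sy = np.clip(ys + disp_y, 0, h - 1)   # sampling coordinates, clamped
    sx = np.clip(xs + disp_x, 0, w - 1)   # to stay inside the image
    y0 = np.floor(sy).astype(int); y1 = np.clip(y0 + 1, 0, h - 1)
    x0 = np.floor(sx).astype(int); x1 = np.clip(x0 + 1, 0, w - 1)
    wy, wx = sy - y0, sx - x0             # fractional interpolation weights
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bot = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bot

moving = np.arange(16.0).reshape(4, 4)
identity = bilinear_warp(moving, np.zeros((4, 4)), np.zeros((4, 4)))  # zero displacement
```

With a zero displacement field the warp is the identity; in training, the displacement field output by the network deforms the moving image toward the fixed image.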
The network is trained using any differentiable similarity metric of the images of the pairs… The proposed frameworks or networks are generalizable as the networks may learn diffeomorphic deformable registration for different kinds of data. The network may be trained for computed tomography, magnetic resonance, ultrasound, x-ray, positron emission tomography, single photon emission computed tomography, or combinations thereof… the registration is modeled probabilistically by parametrizing the deformation as a vector z to follow a prior p(z). To learn this probabilistic space, the latent vector of dimensionality d in an encoder-decoder neural network is defined as this z. Given the moving and the fixed images as input, a variational inference method is used to reconstruct the fixed image by warping the moving image… For learning a probabilistic deformation encoding, the prior is defined as multivariate unit Gaussians… The frame image registration is treated as a reconstruction problem in which the moving image acts as the conditioning data and is warped to reconstruct or to match the fixed image… a symmetric local cross-correlation (LCC) criterion may be used due to favorable properties for registration; Par. [0074-76]: trained neural network is configured by the machine training to output a velocity field, deformation field, and/or warped image at a resolution of the medical images or scan data. For example, the neural network outputs a displacement vector (magnitude and direction) for each pixel or voxel of the scan data. The velocity field, deformation field, and/or warped image may be at a different resolution than the input images… the medical imaging system or image processor generates an image of the patient from the displacements. The image is output. The estimated results (e.g., velocity field or deformation field) or registered images are output… the warped image is output. 
The warped image is displayed with an image of the same region of the patient from a different time and/or modality. The registration represented in the warped image may allow for more diagnostically useful comparison of the images from different times and/or modalities. The registration or spatial transformation of the displacements may be used for segmentation or surgery planning where one or more images showing segmentation or the surgery plan are output; wherein the series of images comprises a moving image and a fixed image, and wherein the training comprises warping the moving image to the fixed image using a differentiable spatial transform (e.g. neural network architectures are provided for diffeomorphic deformable registration, by using neural network(s) trained to estimate the diffeomorphic deformation between two images (i.e. the series of images), including pairs of images representing the patient at different times, from scans with different settings, and/or from scans with different modalities, including images acquired at different times from a same scan of the patient used as a fixed image and a moving image (i.e. wherein the series of images comprises a moving image and a fixed image), in which deformable registration is achieved by minimizing an objective function between the fixed and moving image, and for training, the moving image is warped to match the fixed image (i.e. warping the moving image to the fixed image), for example, and the neural network(s) include a warping layer modeled as a dense spatial transformer with differentiable linear interpolation (i.e. wherein the training comprises warping the moving image to the fixed image using a differentiable spatial transform), as indicated above), for example). The same motivation to combine the above-mentioned teachings applies, as previously indicated for claim 10.

Regarding claim 14, claim 10 is incorporated and the combination of Pio, SUN, and Krebs, as a whole, teaches the system (Pio, Par.
[0003]), wherein the machine learning-based system comprises a neural network and the trained machine learning-based system comprises a trained neural network (Pio, Par. [0014-17]: FIG. 1 illustrates an example system including an example motion estimation module configured to determine motion vectors for video content using one or more trained machine learning models… FIG. 4 illustrates an example diagram illustrating a trained model for determining motion vector; Par. [0023-29]: a set of frames can be provided as input to a trained model to obtain a corresponding set of motion vectors for various entities. Depending on the implementation, the model can be trained to predict motion vectors for entities such as one or more pixels, blocks, objects, or the frames themselves… the model can be trained using ground truth motion vector training data that may be generated from a set of videos, as described herein. Once trained, the model can determine, for a set of frames, a corresponding set of motion vectors that measure the respective displacements of entities corresponding to the frames. As mentioned, in some embodiments, the model can be trained to predict motion vectors for objects that are identified in visual scenes captured by the frames… at least one data store 108 can also store training data for training one or more machine learning models to predict motion vectors for a set of frames or videos. In one example, the training data can include, for example, one or more ground truth motion vector data sets that can be used to train a machine learning model for predicting motion vectors for a set of frames, such as respective directions and magnitudes for entities corresponding to the set of frames… As shown in FIG. 2, the example motion vector model module 202 includes a training data module 204 and a training module 206. 
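The ground-truth supervision Pio describes for claim 14 can be illustrated with a deliberately tiny stand-in for the convolutional neural network: a single linear layer trained by gradient descent to regress a motion vector from a pair of frames. All data and names here are synthetic assumptions, not Pio's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pair():
    """Synthetic training sample: a bright block moves by a random integer
    ground-truth motion vector (dy, dx) between two 8x8 frames."""
    f0 = np.zeros((8, 8)); f1 = np.zeros((8, 8))
    y, x = rng.integers(1, 4, size=2)
    dy, dx = rng.integers(-1, 2, size=2)
    f0[y:y + 3, x:x + 3] = 1.0
    f1[y + dy:y + dy + 3, x + dx:x + dx + 3] = 1.0
    return np.concatenate([f0.ravel(), f1.ravel()]), np.array([dy, dx], float)

X, Y = map(np.array, zip(*[make_pair() for _ in range(200)]))

# One linear layer trained with gradient descent on the mean squared error
# between predicted and ground-truth motion vectors.
W = np.zeros((X.shape[1], 2))
def mse(): return float(np.mean((X @ W - Y) ** 2))
loss_before = mse()
for _ in range(300):
    W -= 0.01 * (2 / len(X)) * X.T @ (X @ W - Y)
loss_after = mse()
```

The loss against the ground-truth motion vectors drops as training proceeds; the systems discussed above would use a convolutional network over raw frames, but the supervision signal is of this kind.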
The motion vector model module 202 can evaluate a set of frames in a video using a trained model to determine motion vectors for entities within and/or between the set of frames. In various embodiments, the model can be implemented using any number of generally known machine learning techniques; Par. [0033-39]: the trained model can be implemented as a convolutional neural network… FIG. 4 illustrates an example diagram 400 illustrating a trained model 404 for determining motion vectors, according to various embodiments of the present disclosure. In the example of FIG. 4, the model 404 has been trained to determine, or predict, a set of motion vectors 406 for various entities from a set of input frames 402… In general, the set of frames 402 can be provided as input to the trained model 404 to obtain the corresponding set of motion vectors 406 for the frames. In various embodiments, the model 404 can be trained using ground truth motion vector training data that can be generated as described above. Once trained, the model 404 can determine, for each inputted frame, a corresponding set of motion vectors that measure the respective displacements of the entities. In some embodiments, the model can be trained to recognize objects. In such embodiments, the model can predict motion vectors for such objects both within and between frames of a video; wherein the machine learning-based system comprises a neural network and the trained machine learning-based system comprises a trained neural network (e.g. train a machine learning model for predicting motion vectors for a set of frames implemented as a convolutional neural network (i.e. the machine learning-based system comprises a neural network), in which the model is trained to predict motion vectors for entities such as one or more pixels, blocks, objects, or the frames themselves, for example, by using ground truth motion vector training data generated from each set of videos, and once trained, the model (i.e. 
the trained neural network) determines, for a set of frames, a corresponding set of motion vectors that measure the respective displacements of entities corresponding to the frames, and the trained model is implemented as a convolutional neural network (i.e. the trained machine learning-based system comprises a trained neural network), as indicated above), for example).

Regarding claim 15, claim 10 is incorporated and the combination of Pio, SUN, and Krebs, as a whole, teaches the system (Pio, Par. [0003]), wherein the machine learning-based system comprises a neural network and the trained machine learning-based system comprises a trained neural network (Pio, Par. [0014-17]: FIG. 1 illustrates an example system including an example motion estimation module configured to determine motion vectors for video content using one or more trained machine learning models… FIG. 4 illustrates an example diagram illustrating a trained model for determining motion vector; Par. [0023-29]: a set of frames can be provided as input to a trained model to obtain a corresponding set of motion vectors for various entities. Depending on the implementation, the model can be trained to predict motion vectors for entities such as one or more pixels, blocks, objects, or the frames themselves… the model can be trained using ground truth motion vector training data that may be generated from a set of videos, as described herein. Once trained, the model can determine, for a set of frames, a corresponding set of motion vectors that measure the respective displacements of entities corresponding to the frames. As mentioned, in some embodiments, the model can be trained to predict motion vectors for objects that are identified in visual scenes captured by the frames… at least one data store 108 can also store training data for training one or more machine learning models to predict motion vectors for a set of frames or videos.
In one example, the training data can include, for example, one or more ground truth motion vector data sets that can be used to train a machine learning model for predicting motion vectors for a set of frames, such as respective directions and magnitudes for entities corresponding to the set of frames… As shown in FIG. 2, the example motion vector model module 202 includes a training data module 204 and a training module 206. The motion vector model module 202 can evaluate a set of frames in a video using a trained model to determine motion vectors for entities within and/or between the set of frames. In various embodiments, the model can be implemented using any number of generally known machine learning techniques; Par. [0033-39]: the trained model can be implemented as a convolutional neural network… FIG. 4 illustrates an example diagram 400 illustrating a trained model 404 for determining motion vectors, according to various embodiments of the present disclosure. In the example of FIG. 4, the model 404 has been trained to determine, or predict, a set of motion vectors 406 for various entities from a set of input frames 402… In general, the set of frames 402 can be provided as input to the trained model 404 to obtain the corresponding set of motion vectors 406 for the frames. In various embodiments, the model 404 can be trained using ground truth motion vector training data that can be generated as described above. Once trained, the model 404 can determine, for each inputted frame, a corresponding set of motion vectors that measure the respective displacements of the entities. In some embodiments, the model can be trained to recognize objects. In such embodiments, the model can predict motion vectors for such objects both within and between frames of a video; wherein the machine learning-based system comprises a neural network and the trained machine learning-based system comprises a trained neural network (e.g. 
train a machine learning model for predicting motion vectors for a set of frames implemented as a convolutional neural network (i.e. the machine learning-based system comprises a neural network), in which the model is trained to predict motion vectors for entities such as one or more pixels, blocks, objects, or the frames themselves, for example, by using ground truth motion vector training data generated from each set of videos, and once trained, the model (i.e. the trained neural network) determines, for a set of frames, a corresponding set of motion vectors that measure the respective displacements of entities corresponding to the frames, and the trained model is implemented as a convolutional neural network (i.e. the trained machine learning-based system comprises a trained neural network), as indicated above), for example), and wherein the trained neural network comprises the neural network trained using unsupervised training (Krebs, Par. [0017]: deep diffeomorphic registration is provided. Neural network architectures are provided for diffeomorphic deformable registration. Diffeomorphic registration concepts are incorporated into a neural network architecture. The neural network is trained to estimate the diffeomorphic deformation between two images in one-shot. The framework may be trained in a supervised, unsupervised, or semi-supervised way. For unsupervised training, a differentiable similarity metric (e.g., local cross correlation) may be used as a loss function. Two images are applied to a trained neural network. A warped image, corresponding dense velocities, and/or diffeomorphic deformation field may be output from the network after one forward evaluation of the network; Par. [0031-33]: a neural network (e.g., deep learning) arrangement is defined. The definition is by configuration or programming of the learning. 
The number of layers or units, type of learning, and other characteristics of the network are controlled by the programmer or user… Deep architectures include convolutional neural network (CNN) or deep belief nets (DBN), but other deep networks may be used… In one embodiment, a CNN, such as a fully convolutional neural network, is used… The neural network is defined as a plurality of sequential feature units or layers… FIG. 2 shows one embodiment of a neural network architecture defined to include a diffeomorphic layer 26. FIG. 2 shows an example architecture for unsupervised training; Par. [0051]: In the unsupervised case, a pre-defined similarity metric is used. The defined network is trained to optimize one or multiple losses: a predefined similarity metric and possibly a regularizer on the velocities and/or deformation field. The need for hand-crafted regularization criteria is minimized due to the architecture of the network, so a regularizer may not be used in other embodiments. Any differentiable similarity metric and regularizer and its combinations may be used as weighted loss functions during training the unsupervised framework. Example metrics for deep learning-based registration include sum of squared differences, normalized cross correlation, structural similarity, or deformation gradient penalization; Par. [0079]: FIG. 6 shows an example application using the network of FIG. 2 (unsupervised training). The unsupervised framework is trained with the loss of mean of squared differences and regularizer of velocities magnitude penalization; Par. [0091]: image processor 76 applies the machine-learned neural network, which includes an encoder-decoder network having been trained to estimate the velocities and an exponentiation layer configured to generate the diffeomorphic deformation field from the velocities. The machine-learned neural network may also include a warping layer configured to generate the warped image from the diffeomorphic deformation field. 
Alternatively, a warping function is applied outside the network using the deformation field output by the network, or warping is not used… The network may have been trained with unsupervised learning using a pre-defined similarity metric or with supervised learning using ground truth warped image or deformation field; and wherein the trained neural network comprises the neural network trained using unsupervised training (e.g. neural network architectures are provided for diffeomorphic deformable registration, by using neural network(s) trained to estimate the diffeomorphic deformation between two images, including a neural network trained with unsupervised learning (i.e. the trained neural network comprises the neural network trained using unsupervised training), such as FIG. 2, which shows an example architecture for unsupervised training, as indicated above), for example). The same motivation to combine the above-mentioned teachings applies, as previously indicated in claim 10.

Regarding claim 18, it is a corresponding method claim rejected as applied to apparatus claim 10 above.

Regarding claim 19, claim 18 is incorporated and it is a corresponding method claim rejected as applied to apparatus claim 11 above.

Regarding claim 20, claim 19 is incorporated and it is a corresponding method claim rejected as applied to apparatus claim 12 above.

Regarding claim 21, claim 18 is incorporated and it is a corresponding method claim rejected as applied to apparatus claim 13 above.

Regarding claim 22, claim 18 is incorporated and it is a corresponding method claim rejected as applied to apparatus claim 14 above.

Regarding claim 23, claim 18 is incorporated and it is a corresponding method claim rejected as applied to apparatus claim 15 above.

Regarding claim 24, claim 18 is incorporated and the combination of Pio, SUN, and Krebs, as a whole, teaches [a] trained machine learning-based system produced according to the method of claim 18 (Pio, Par. [0024-26]: FIG.
1 illustrates an example system including an example motion estimation module 102 configured to determine motion vectors for video content using one or more trained machine learning models… store training data for training one or more machine learning models to predict motion vectors for a set of frames or videos… the training data can include, for example, one or more ground truth motion vector data sets that can be used to train a machine learning model for predicting motion vectors for a set of frames, such as respective directions and magnitudes for entities corresponding to the set of frames; Par. [0029-33]: FIG. 2 illustrates an example motion vector model module 202 configured to analyze video content to determine motion vectors, according to an embodiment of the present disclosure. In some embodiments, the motion vector module 106 of FIG. 1 can be implemented as the example motion vector module 202. As shown in FIG. 2, the example motion vector model module 202 includes a training data module 204 and a training module 206. The motion vector model module 202 can evaluate a set of frames in a video using a trained model to determine motion vectors for entities within and/or between the set of frames. In various embodiments, the model can be implemented using any number of generally known machine learning techniques… training module 206 can be configured to train the model to output, or predict, motion vectors for a set of frames… the trained model can receive, as input, a set of frames and can output a set of motion vectors that each correspond to one or more entities. 
As mentioned, such entities may refer to one or more pixels in the frames, one or more blocks in the frames, one or more objects in the frames… during the evaluation phase, the accuracy of the model can be tested, for example, using the motion vectors that were outputted by the model for a set of frames and comparing these predicted motion vectors to the pre-computed motion vectors that were provided to the model during the training phase… The training module 206 can measure any inaccuracies in the motion vector information that is outputted by the trained model… By measuring inaccuracies and refining the model over a number of training iterations, the model can be trained to optimally, or otherwise suitably, predict motion vectors for various types of video content; Par. [0037-40]: training data module 302 can then include data describing each object (e.g., object identifier, object location, e.g., coordinates, etc.) and its corresponding motion vector in the training data… the model can be trained to extract features that allow the model to determine the movement of certain types of objects across frames and to utilize such information to predict motion vectors for such objects… motion vectors can be determined for some, or all, human faces and/or individuals that are identified in video frames… process 500 for training a model to determine motion vectors… At block 502, a model to predict motion vectors for entities in video frames is trained. At block 504, a set of frames that correspond to a first video are obtained. At block 506, the set of frames can be provided as input to the model. At block 508, a set of motion vectors for the set of frames can be obtained from the model. Each motion vector can describe a trajectory of at least one entity in the set of frames; [a] trained machine learning-based system produced according to the method of claim 18 (e.g.
system includes motion estimation module configured to determine motion vectors for video content using one or more trained machine learning models (i.e. a trained machine learning-based system that outputs motion vectors), in which each model is trained to extract features that allow the model to determine the movement of certain types of objects across frames and to utilize such information to predict motion vectors for such objects, by training the model to output, or predict, motion vectors for the set of frames, and motion estimation involves the process of determining, from each set of frames, a set of motion vectors that correspond to various objects or entities in the frames, and each motion vector describes the motion, or displacement, of an object or entity in the set of frames (i.e. [a] trained machine learning-based system produced according to the method of claim 18), as indicated above), for example). The same motivation to combine the above-mentioned teachings applies, as previously indicated in claim 10.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Pio, in view of SUN, and further in view of Krebs, as applied to claim 10 above, and further in view of Liao et al. (U.S. PG Publication No. 2017/0337682 A1), hereafter referred to as Liao.

Regarding claim 16, claim 10 is incorporated and the combination of Pio, SUN, and Krebs, as a whole, teaches the machine learning-based system (Pio, Par. [0003]) that is trained using gated Positron Emission Tomography (PET) data (SUN, Pg. 1-4), which could be interpreted as the claimed “machine learning-based system is trained using Positron Emission Tomography (PET) data”, as indicated above, for example. However, Liao also teaches wherein the machine learning-based system is trained using Positron Emission Tomography (PET) data (Par. [0007]: machine learning based model used to calculate the action-values based on the current state observation may be a trained deep neural network (DNN); Par.
[0037-41]: medical image registration can be used to recover correspondences between two or more images acquired from different patients, the same patient at different times, different medical imaging modalities (e.g., computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), ultrasound, etc.)… training images are obtained and/or generated… the DNN can be trained for a particular registration task based on training image pairs (i.e., a reference image Iref and a moving image Imov) corresponding to the particular registration task and known ground truth transformations between the training image pairs. For example, the training image pairs may be medical images acquired using different imaging modalities (e.g., CT, MRI, ultrasound, PET, etc.), medical images acquired from the same patient (using the same imaging modality) at different times, or medical images acquired from different patients; wherein the machine learning-based system is trained using Positron Emission Tomography (PET) data (e.g. machine learning based model used to calculate the action-values based on current state observation, including a trained deep neural network (DNN), which is trained for a particular registration task, based on training image pairs corresponding to the particular registration task and known ground truth transformations between the training image pairs, for example, and the training image pairs include medical images acquired using different imaging modalities, such as positron emission tomography (i.e. wherein the machine learning-based system is trained using Positron Emission Tomography (PET) data), as indicated above), for example). Pio, SUN, Krebs, and Liao are considered to be analogous art because they pertain to image processing applications which use neural networks.
Therefore, the combined teachings of Pio, SUN, Krebs, and Liao, as a whole, would have rendered obvious the invention recited in claim 16 with a reasonable expectation of success in order to modify the method for training a model to predict motion vectors for entities in video frames (as disclosed by Pio) with wherein the machine learning-based system is trained using Positron Emission Tomography (PET) data (as taught by Liao, Abstract, Par. [0007, 37-41]) and uses machine learned networks to improve the alignment between the fixed and moving image by changing the parameters that define the transformation between the images (Liao, Abstract, Par. [0004-5, 36, 39, 60, 122, 124]).

Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Pio, in view of SUN, and further in view of Krebs, as applied to claim 10, and further in view of Hsieh.

Regarding claim 25, claim 10 is incorporated and the combination of Pio, SUN, and Krebs, as a whole, teaches the system (Pio, Par. [0003]), but fails to teach the following as further recited in claim 25. However, Hsieh teaches wherein the processing circuitry is further configured to iteratively reconstruct an image based on the at least one motion vector (Par. [0010]: provide motion-gated medical imaging according to an embodiment. The system 10 includes a medical imaging device 12 which may include, for example, a computed tomography (CT) device, a positron emission tomography (PET) device, a magnetic resonance imaging (MRI) device, and so on. For example, the medical imaging device 12 may include a single photon emission CT (SPECT/CT) device, a PET/CT device, a PET/MRI device, and so on; Par. [0024]: a comparison among data capture samples and/or an evaluation of a motion vector field may be used to compensate for motion during data acquisition (e.g., modulate data acquisition, etc.) and/or in post-processing during reconstruction to generate a final image.
For example, a motion vector field from an image captured by an image capture device may provide a boundary condition for motion compensation during tomographic reconstruction. Thus, the logic 30 may provide a motion vector field to a reconstruction process to perform motion correction on the acquired data. In one example, data from the sensor device 28 may be used as input to a reconstruction process to provide motion characteristics (e.g., object motion, table motion, etc.) to be used for motion correction during reconstruction; Par. [0066]: use a motion vector field for motion compensation during reconstruction to generate motion compensated reconstructions. Reconstruction may include, for example… iterative reconstruction, and so on. Thus, location parameters of a reconstructed point may be superimposed by block 92 with a calculated motion vector to reconstruct a motion compensated image… for example, detect how much a subject has moved during data acquisition and generate a motion curve as a function of time that may be used to compensate for that motion; wherein the processing circuitry is further configured to iteratively reconstruct an image based on the at least one motion vector (e.g. image reconstruction method includes iterative reconstruction, which uses a motion vector field for motion compensation during reconstruction to generate motion compensated reconstructions, including a calculated motion vector to reconstruct a motion compensated image (i.e. iteratively reconstructing an image based on the at least one motion vector), as indicated above), for example). Pio, SUN, Krebs, and Hsieh are considered to be analogous art because they pertain to image processing applications which use machine learning. 
Therefore, the combined teachings of Pio, SUN, Krebs, and Hsieh, as a whole, would have rendered obvious the invention recited in claim 25 with a reasonable expectation of success in order to modify the method for training a model to predict motion vectors for entities in video frames (as disclosed by Pio) with wherein the processing circuitry is further configured to iteratively reconstruct an image based on the at least one motion vector (as taught by Hsieh, Abstract, Par. [0010, 24, 66]) to provide improved image quality from motion monitoring and/or data acquisition gating, reduction of multi-slab axial mis-registration artifacts, and improved performance of motion compensation processes of motion-gated medical imaging systems (Hsieh, Abstract, Par. [0003, 103]).

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GUILLERMO M RIVERA-MARTINEZ whose telephone number is (571) 272-4979. The examiner can normally be reached from 9 am to 5 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Bee, can be reached at 571-270-5183. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.
Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /GUILLERMO M RIVERA-MARTINEZ/ Primary Examiner, Art Unit 2677
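The Hsieh reference cited above teaches using a motion vector field for motion compensation during iterative reconstruction, and the cited Pio reference teaches predicting motion vectors with a trained network. As a purely illustrative sketch of that general idea, and not the claimed method or any reference's actual implementation, the two pieces can be combined in a toy MLEM-style loop. The trained network is stubbed out, and all names here (predict_motion, warp, mlem_motion_corrected) are hypothetical:

```python
# Illustrative sketch only: motion vectors from a (stubbed) trained model
# feed a toy image-space MLEM-style motion-compensated reconstruction.
import numpy as np

def predict_motion(frame_ref: np.ndarray, frame_mov: np.ndarray) -> np.ndarray:
    """Stub for a trained CNN: returns a per-pixel motion vector field.
    Here it fakes a uniform 1-pixel vertical shift for illustration."""
    field = np.zeros(frame_ref.shape + (2,))
    field[..., 0] = 1.0  # dy component; dx stays 0
    return field

def warp(image: np.ndarray, field: np.ndarray) -> np.ndarray:
    """Warp an image by integer-rounded motion vectors (nearest neighbour)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip((ys - field[..., 0]).round().astype(int), 0, h - 1)
    src_x = np.clip((xs - field[..., 1]).round().astype(int), 0, w - 1)
    return image[src_y, src_x]

def mlem_motion_corrected(gates, fields, n_iter=10):
    """Toy MLEM in image space: each gated acquisition is mapped into the
    reference frame via its motion field before the multiplicative update."""
    estimate = np.ones_like(gates[0])
    for _ in range(n_iter):
        ratio = np.zeros_like(estimate)
        for g, f in zip(gates, fields):
            fwd = np.maximum(warp(estimate, f), 1e-8)  # estimate in gate frame
            ratio += warp(g / fwd, -f)                 # back-map the correction
        estimate *= ratio / len(gates)
    return estimate

# Two "gates" of the same object, the second shifted down by one pixel.
truth = np.zeros((16, 16)); truth[6:10, 6:10] = 4.0
gate0 = truth.copy()
gate1 = warp(truth, predict_motion(truth, truth))
fields = [np.zeros(truth.shape + (2,)), predict_motion(gate0, gate1)]
recon = mlem_motion_corrected([gate0, gate1], fields)
print(recon.shape)  # (16, 16)
```

In a real PET pipeline the compensation would act on projection data inside the system model rather than on images, but the sketch shows the role the estimated motion vectors play: each gate's data is aligned to a reference motion state before the iterative update.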

Prosecution Timeline

Feb 19, 2021: Application Filed
Dec 31, 2022: Non-Final Rejection (§103)
May 08, 2023: Response Filed
Aug 12, 2023: Final Rejection (§103)
Nov 17, 2023: Request for Continued Examination
Nov 22, 2023: Response after Non-Final Action
Dec 06, 2023: Non-Final Rejection (§103)
Apr 12, 2024: Response Filed
Jul 10, 2024: Final Rejection (§103)
Oct 15, 2024: Request for Continued Examination
Oct 21, 2024: Response after Non-Final Action
Nov 02, 2024: Non-Final Rejection (§103)
Apr 07, 2025: Response Filed
Jun 20, 2025: Final Rejection (§103)
Dec 23, 2025: Request for Continued Examination
Jan 18, 2026: Response after Non-Final Action
Jan 23, 2026: Non-Final Rejection (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602814: INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND COMPUTER READABLE MEDIUM FOR DETERMINING ATTACHMENT STATE OF IN-VEHICLE CAMERA (granted Apr 14, 2026; 2y 5m to grant)
Patent 12602860: METHOD FOR CONSTRUCTING THREE-DIMENSIONAL MEDICAL IMAGE (granted Apr 14, 2026; 2y 5m to grant)
Patent 12602928: MASK PROPAGATION FOR VIDEO SEGMENTATION (granted Apr 14, 2026; 2y 5m to grant)
Patent 12604049: TECHNIQUES FOR SECURE VIDEO FRAME MANAGEMENT (granted Apr 14, 2026; 2y 5m to grant)
Patent 12586161: SYSTEMS AND METHODS FOR OPTIMIZED ITERATIVE IMAGE RECONSTRUCTION (granted Mar 24, 2026; 2y 5m to grant)
Based on the examiner's 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 7-8
Grant Probability: 78%
With Interview: 80% (+2.3%)
Median Time to Grant: 2y 9m
PTA Risk: High
Based on 503 resolved cases by this examiner. Grant probability derived from career allow rate.
