Prosecution Insights
Last updated: April 19, 2026
Application No. 17/270,810

DEEP LEARNING-BASED COREGISTRATION

Status: Non-Final OA (§103)
Filed: Feb 23, 2021
Examiner: CHUANG, SU-TING
Art Unit: 2146
Tech Center: 2100 — Computer Architecture & Software
Assignee: Arterys Inc.
OA Round: 5 (Non-Final)
Grant Probability: 52% (Moderate)
Expected OA Rounds: 5-6
Estimated Time to Grant: 4y 5m
Grant Probability with Interview: 91%

Examiner Intelligence

Career Allow Rate: 52% (52 granted / 101 resolved; -3.5% vs Tech Center average)
Interview Lift: +39.7% (strong; grant rate of resolved cases with an interview vs. without)
Average Prosecution: 4y 5m (28 applications currently pending)
Career History: 129 total applications across all art units

Statute-Specific Performance

§101: 27.4% (-12.6% vs TC avg)
§103: 46.3% (+6.3% vs TC avg)
§102: 10.8% (-29.2% vs TC avg)
§112: 11.7% (-28.3% vs TC avg)
Tech Center average values are estimates. Based on career data from 101 resolved cases.

Office Action

§103
DETAILED ACTION

This action is in response to the communications filed on 01/23/2026, in which claims 1 and 72 are amended and claims 12, 24, 33 and 38-71 have been canceled; therefore, claims 1-11, 13-23, 25-32, 34-37 and 72 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/23/2026 has been entered.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

Claims 1-11, 13-14, 16, 19-23, 25, 27-32, 34-35 and 72 are rejected under 35 U.S.C. 103 as being unpatentable over Balakrishnan ("An Unsupervised Learning Model for Deformable Medical Image Registration," 2018-02-07), in view of Shu ("Hierarchical Spatial Transformer Network," 2018-01-30), in view of Jaderberg ("Spatial Transformer Networks," 2015-06-05), and in further view of Kenney (US 2019/0286990 A1, filed 2018-03-19).

In regard to claims 1 and 72, Balakrishnan teaches:

A machine learning system, comprising: at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data; and at least one processor communicably coupled to the at least one nontransitory processor-readable storage medium, in operation the at least one processor: (Balakrishnan, p. 6: "We implement our networks using Keras [11] with a Tensorflow backend [1]… Table 1 presents runtime results using an Intel Xeon (E5-2680) CPU, and a NVIDIA TitanX GPU.")

receives learning data comprising a plurality of batches of unlabeled image sets, wherein each image set comprises a source image and target image that each represents a medical image scan of at least one patient; (Balakrishnan, p. 5: "We focus on atlas-based registration, in which we compute a registration field between an atlas, or reference volume, and each volume in our dataset."; p. 6: "each training batch [learning data] consists of one pair of volumes."; p. 1: "The proposed method does not require supervised information such as ground truth [unlabeled] registration fields or anatomical landmarks."; p. 2: "In the typical volume registration formulation, one... volume is warped to align with a second... volume... Let F, M denote the fixed and moving images, respectively... [a target image and a source image]"; p. 2: "Fig. 1 shows sample 2D coronal slices [a medical image scan of at least one patient] taken from 3D MRI volumes, with boundaries of several anatomical structures outlined."; one image can be warped to align with another, so either F or M can be the target image and either can be the source image)

trains one or more convolutional neural networks (CNNs) models… the local network component learns a dense deformation field (DDF)... (Balakrishnan, p. 3: "We model a function g(F, M; θ, λ) = ϕ using a convolutional neural network (CNN), [trains CNN models] where ϕ is a registration field, θ are learnable parameters of g, and λ is a regularization parameter... We learn appropriate values for θ by training to align a dataset of volume pairs {Fi, Mi}"; p. 2: "Deformable registration strategies separate an initial affine transformation for global alignment from a typically much slower deformable transformation with higher degrees of freedom. We concentrate on the latter step, in which we compute a dense, nonlinear correspondence for all voxels. [dense deformation]")

Balakrishnan and Shu both teach the concept of training one or more CNN models and learning a dense deformation field (DDF). Shu further teaches more details in the claim:

trains one or more convolutional neural networks (CNNs) models including a global network component and a local network component, (Shu, p. 2, 1 Introduction: "we propose hierarchical spatial transformer network (HSTN) to warp image into desired pose and shape. Firstly, we divide image deformation into two parts: linear deformation and nonlinear deformation. Secondly, we use two modules named linear transformation generator and optical flow field generator to estimate parameters of each part... We propose a novel hierarchical convolutional neural network [two CNNs including a global network component and a local network component, see Fig. 5] to achieve image deformation."; see Fig. 5, where a blue arrow represents convolution)

[Image: media_image1.png (greyscale, 450 × 1454)]

based on the learning data, to learn one or more transformation functions for coregistration of a target image onto a source image, (Shu, p. 5, 4.2 Planar Face Alignment: "We apply it in the alignment of the images [coregistration of a target image onto a source image] before and after warping. We use images [the learning data] from a human face database published by [Peng et al., 2012]... they are modified to take concatenated source image and target image as input, [the learning data] and output deformed source image as alignment result."; in light of specification p. 1, coregistration is spatial alignment from one image onto another; Balakrishnan also teaches coregistration; see the next limitation for the [transformation functions], i.e., the affine transformation and the dense deformation field (DDF))

wherein the global network component learns an affine transformation matrix and the local network component learns a dense deformation field (DDF) with an unsupervised loss function maintained between the global network component and the local network component, and (Shu, p. 2, 1 Introduction: "we use two modules named linear transformation generator and optical flow field generator to estimate parameters of each part. [the global and local network components learn]"; p. 4, Converter: "Converter converts obtained affine transformation parameters into corresponding motion field by using following equation:… (5) Where, a ∼ f are 6 parameters of an affine transformation [an affine transformation matrix]"; p. 3, 3 Hierarchical Spatial Transformer Network: "The deformation of an image can be expressed by a motion field, a motion field of pixels is called an optical flow field... we combine the theory of approximation and the theory of optical flow to propose a novel way to solve for motion field. Inspired by these two theories, we depict deformation function by a combination of a linear transformation function and an optical flow field... This process can be mathematically described by following equation:... (3)... w denotes an optical flow field... Optical flow field generator produces an optical flow field. [a dense deformation field (DDF)]"; p. 6, 4.2 Planar Face Alignment: "we use endpoint error (EPE) [unsupervised loss function] to measure alignment accuracy, which is Euclidean distance between two images, averaged over all pixels... Above convolutional neural networks are trained in the unsupervised manner, thus they save the trouble of getting ground truth. This end-to-end framework [end-to-end training between the global network component and the local network component] has broad application prospect in the field of image alignment."; endpoint error (EPE) in unsupervised alignment is computed by measuring the distance between data points of a warped image and another image, instead of using labeled ground truth (no knowledge of the ground-truth transformations))

wherein the global network component precedes the local network component in the one or more CNNs models and (Shu, p. 4, Figure 5: "The architecture of HSTN. It consists of 4 modules: linear transformation, converter, optical flow field generator and sampler."; see Fig. 5, where the linear transformation and converter precede the optical flow field generator)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Balakrishnan to incorporate the teachings of Shu by including HSTN, a hierarchical convolutional neural network. Doing so would achieve better accuracy. (Shu, p. 1, Abstract: "we combine them to propose a novel way to achieve image deformation and implement it with a hierarchical convolutional neural network... In the experiments of cluttered MNIST handwritten digits classification and image plane alignment, our method outperforms baseline methods by a large margin.")

Balakrishnan and Shu do not teach, but Jaderberg teaches:

(claim 1) … output generated from the global network component in accordance with its affine transformation matrix is further processed with a plurality of scaling factors for at least one of rotation or zooming; (Jaderberg, p. 7: "We also define another extension where before each of the first four convolutional layers of the baseline CNN, we insert a spatial transformer (ST-CNN Multi), where the localisation networks are all two layer fully connected networks with 32 units per layer. In the ST-CNN Multi model, the spatial transformer before the first convolutional layer acts on the input image as with the previous experiments, however the subsequent spatial transformers [output is further processed] deeper in the network act on the convolutional feature maps, [output generated from the global network component in accordance with its affine transformation matrix (from the first ST layer)] predicting a transformation from them and transforming these feature maps (this is visualised in Table 2 (right) (a))."; p. 1: "The transformation is then performed on the entire feature map (non-locally) and can include scaling, cropping, rotations, [rotation] as well as non-rigid deformations"; p. 4: "Aθ = …(2) allowing cropping, translation, and isotropic scaling [zooming] by varying s, tx, and ty. [a plurality of scaling factors]"; the localisation network learns Aθ, and the first ST layer (applying Aθ to the image) and conv layer generate feature maps, which are further processed with more ST layers, which include scaling factors for rotation or zooming)

[Image: media_image2.png (greyscale, 222 × 604)]

(claim 72) … affine parameter outputs generated from the global network component are further inputted into another spatial transformation layer bounded by a plurality of scaling factors; (Jaderberg, p. 7: "We also define another extension where before each of the first four convolutional layers of the baseline CNN, we insert a spatial transformer (ST-CNN Multi), where the localisation networks are all two layer fully connected networks with 32 units per layer. In the ST-CNN Multi model, the spatial transformer before the first convolutional layer acts on the input image as with the previous experiments, however the subsequent spatial transformers [outputs are further inputted into another spatial transformation layer] deeper in the network act on the convolutional feature maps, [affine parameter outputs (from the first ST layer)] predicting a transformation from them and transforming these feature maps (this is visualised in Table 2 (right) (a))."; p. 1: "The transformation is then performed on the entire feature map (non-locally) and can include scaling, cropping, rotations, [rotation] as well as non-rigid deformations"; p. 4: "Aθ = …(2) allowing cropping, translation, and isotropic scaling [zooming] by varying s, tx, and ty. [a plurality of scaling factors]"; the localisation network learns Aθ, and the first ST layer (applying Aθ to the image) and conv layer generate feature maps, which are further processed with more ST layers, which include scaling factors for rotation or zooming)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Balakrishnan and Shu to incorporate the teachings of Jaderberg by including ST-CNN Multi. Doing so allows deeper spatial transformers to predict a transformation based on richer features rather than the raw image. (Jaderberg, p. 7: "we insert a spatial transformer (ST-CNN Multi)… This allows deeper spatial transformers to predict a transformation based on richer features rather than the raw image.")

[Image: media_image3.png (greyscale, 290 × 464)]

Balakrishnan, Shu and Jaderberg do not teach, but Kenney teaches:

stores the one or more trained CNN models in the at least one nontransitory processor-readable storage medium of the machine learning system. (Kenney, [0068]: "The trained parameters of the neural network are stored in parameter repository 109, a computer memory subsystem."; in light of specification p. 6, the weights of the trained network can be stored)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Balakrishnan, Shu and Jaderberg to incorporate the teachings of Kenney by including a repository of parameter values. Doing so would allow providing stored data for another neural network. (Kenney, [0081]: "A neural network preprocessor 305 combines the encapsulated data provided by merge processor 304 with the parameter values stored in trained neural net parameter value repository 350, thereby providing the inputs (both data inputs and weights/biases) for feed forward neural network processor 306.")
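For readers outside the art, the claimed architecture can be summarized as a global component that regresses an affine transformation matrix followed by a local component that predicts a dense deformation field (DDF). The sketch below is a minimal illustration of that structure, assuming a 2D single-channel input; the module names, channel counts, and sizes are the editor's assumptions and are not taken from the application or any cited reference.

```python
# Illustrative global (affine) + local (DDF) registration network.
# All hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAffineNet(nn.Module):
    """Regresses the 6 parameters of a 2D affine transformation matrix."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, 6)
        # Initialize to the identity transform, a common STN trick.
        nn.init.zeros_(self.fc.weight)
        self.fc.bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

    def forward(self, moving, fixed):
        x = self.features(torch.cat([moving, fixed], dim=1)).flatten(1)
        return self.fc(x).view(-1, 2, 3)  # affine matrix A_theta

class LocalDDFNet(nn.Module):
    """Predicts a dense deformation field (one displacement channel per axis)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),
        )

    def forward(self, moving, fixed):
        return self.net(torch.cat([moving, fixed], dim=1))  # DDF

def warp(image, theta, ddf):
    """Apply the global affine transform, then the local DDF."""
    grid = F.affine_grid(theta, image.shape, align_corners=False)
    ddf_nhwc = ddf.permute(0, 2, 3, 1)  # match grid layout (N, H, W, 2)
    return F.grid_sample(image, grid + ddf_nhwc, align_corners=False)

moving, fixed = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
theta = GlobalAffineNet()(moving, fixed)
ddf = LocalDDFNet()(moving, fixed)
warped = warp(moving, theta, ddf)  # compare warped vs. fixed in the loss
```

Note that the global component precedes the local component here, mirroring the ordering recited in the claim; an unsupervised loss would compare `warped` against `fixed` with no annotations.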
In regard to claim 2, Balakrishnan does not teach, but Shu teaches:

wherein the at least one processor trains the one or more CNN models using an unsupervised training algorithm. (Shu, p. 6, 4.2 Planar Face Alignment: "Above convolutional neural networks are trained in the unsupervised manner, thus they save the trouble of getting ground truth.")

The rationale for combining the teachings of Balakrishnan and Shu is the same as set forth in the rejection of claim 1.

In regard to claim 3, Balakrishnan teaches:

wherein the unsupervised training algorithm comprises a loss function that is calculated from a pair of source and target images and is not computed from any explicit human-created annotations on the images. (Balakrishnan, p. 8: "This paper presents an unsupervised learning-based approach to medical image registration"; p. 3: "We use batch gradient descent, minimizing the expected loss over a training set: E…[L(Fi, Mi, ϕi)] (2) where… L(·,·,·) is a loss as defined in (1). [loss function] Importantly, we do not require supervised information such as ground truth registration fields or anatomical landmarks [no human-created annotations] in order to train our network.")

In regard to claim 4, Balakrishnan teaches:

wherein the loss function includes a per-pixel root mean squared error between the source and target images. (Balakrishnan, p. 2: "L(F, M, ϕ) = L_sim(F, M(ϕ)) + λ L_smooth(ϕ) (1)… Common metrics used for L_sim include mean squared voxel difference, mutual information, and cross-correlation."; a voxel is the 3D analogue of a pixel)

In regard to claim 5, Balakrishnan teaches:

wherein a differentiable objective function includes mutual information loss between the source and target images. (Balakrishnan, p. 4: "The proposed method works with any differentiable loss."; p. 2: "L(F, M, ϕ) = L_sim(F, M(ϕ)) + λ L_smooth(ϕ) (1)… Common metrics used for L_sim include mean squared voxel difference, mutual information, and cross-correlation.")

In regard to claim 6, Balakrishnan teaches:

wherein a differentiable objective function includes an L2 loss between the source and target images. (Balakrishnan, p. 2: "L(F, M, ϕ) = L_sim(F, M(ϕ)) + λ L_smooth(ϕ) (1)…"; p. 5: "L_smooth(ϕ) = Σ ‖∇ϕ(p)‖² [L2 loss] (5)")

In regard to claim 7, Balakrishnan teaches:

wherein a differentiable objective function includes a center-weighted L2 loss function between the source and target images. (Balakrishnan, p. 2: "L(F, M, ϕ) = L_sim(F, M(ϕ)) + λ L_smooth(ϕ) (1)…"; p. 5: "L_smooth(ϕ) = Σ ‖∇ϕ(p)‖² [L2 loss] (5)"; p. 4: "For each voxel p, we compute a (subpixel) voxel location ϕ(p) in M. Because image values are only defined at integer locations, we linearly interpolate the values at the eight neighboring voxels [center-weighted]... M(ϕ(p)) = ... (3)"; calculating a weighted average of the values at the eight surrounding voxels [center-weighted])

In regard to claim 8, Balakrishnan teaches:

wherein a differentiable objective function includes a normalized cross correlation loss function between the source and target images. (Balakrishnan, p. 4: "Let F̂(p) and M̂(ϕ(p)) denote images with local mean intensities subtracted out. We compute local means over an n³ volume around each voxel, with n = 9 in our experiments. We write the local cross-correlation (CC) of F and M(ϕ) as: CC(F, M) = ... (4) [a normalized cross correlation loss function]")

In regard to claim 9, Balakrishnan teaches:

wherein the plurality of batches of unlabeled image sets includes one or both of 2D or 3D images. (Balakrishnan, p. 1: "We present an efficient learning-based algorithm for deformable, pairwise 3D medical image registration."; p. 5: "All registration is done in 3D."; also see Fig. 2, Moving 3D Image (M) and Fixed 3D Image (F))
In regard to claim 10, Balakrishnan does not teach, but Shu teaches:

wherein the transformation functions include one or both of affine transformations or dense, nonlinear correspondence maps. (Shu, p. 4, Converter: "Converter converts obtained affine transformation parameters into corresponding motion field by using following equation:… (5) Where, a ∼ f are 6 parameters of an affine transformation [an affine transformation matrix]"; p. 3: "Optical flow field is often used to estimate motion field. Optical flow field has great advantage on depicting nonlinear deformation..."; p. 3, 3 Hierarchical Spatial Transformer Network: "The deformation of an image can be expressed by a motion field, a motion field of pixels [dense] is called an optical flow field... we combine the theory of approximation and the theory of optical flow to propose a novel way to solve for motion field. Inspired by these two theories, we depict deformation function by a combination of a linear transformation function and an optical flow field. [dense, nonlinear correspondence maps, a dense deformation field (DDF)]... This process can be mathematically described by following equation:... (3)... w denotes an optical flow field")

The rationale for combining the teachings of Balakrishnan and Shu is the same as set forth in the rejection of claim 1.

In regard to claim 11, Balakrishnan does not teach, but Shu teaches:

wherein the transformation functions include dense, nonlinear correspondence maps that include dense deformation fields (DDFs). (Shu, p. 3: "Optical flow field is often used to estimate motion field. Optical flow field has great advantage on depicting nonlinear deformation..."; p. 3, 3 Hierarchical Spatial Transformer Network: "The deformation of an image can be expressed by a motion field, a motion field of pixels [dense] is called an optical flow field... we combine the theory of approximation and the theory of optical flow to propose a novel way to solve for motion field. Inspired by these two theories, we depict deformation function by a combination of a linear transformation function and an optical flow field. [dense, nonlinear correspondence maps, a dense deformation field (DDF)]... This process can be mathematically described by following equation:... (3)... w denotes an optical flow field")

The rationale for combining the teachings of Balakrishnan and Shu is the same as set forth in the rejection of claim 1.

In regard to claim 13, Balakrishnan and Shu do not teach, but Jaderberg teaches:

wherein the affine transformation matrix is calculated on the target image with respect to the source image. (Jaderberg, p. 4: "where (x_i^t, y_i^t) are the target coordinates of the regular grid in the output feature map [target image], (x_i^s, y_i^s) are the source coordinates in the input feature map [source image] that define the sample points, and Aθ is the affine transformation matrix."; p. 4: "the input feature map U and produce the sampled output feature map V... This gives us a (sub-)differentiable sampling mechanism, allowing loss gradients to flow back not only to the input feature map (6), but also to the sampling grid coordinates (7), and therefore back to the transformation parameters θ and localisation network since... can be easily derived from (1) for example.")

The rationale for combining the teachings of Balakrishnan, Shu and Jaderberg is the same as set forth in the rejection of claim 1.

In regard to claim 14, Balakrishnan teaches:

wherein the source and target images comprise all possible image pairing combinations. (Balakrishnan, p. 2: "In our work, we optimize function parameters to minimize the expected energy of the form of (1) over a dataset of volume pairs, instead of doing it for each pair independently. [all possible image pairing combinations]")

In regard to claim 16, Balakrishnan teaches:

wherein the source and target images comprise all images from one or more disparate MR scan volumes. (Balakrishnan, p. 5: "We use a large-scale, multi-site, multi-study dataset of 7829 T1-weighted brain MRI scans from eight publicly available datasets: [disparate MR scan volumes] ADNI [33], OASIS [29], ABIDE [31], ADHD200 [32], MCIC [19], PPMI [30], HABS [12], and Harvard GSP [20].")

In regard to claim 19, Balakrishnan and Shu do not teach, but Jaderberg teaches:

wherein the affine transformation matrix includes an affine spatial transformation layer. (Jaderberg, p. 7: "Table 2... Right... (a) The schematic of the ST-CNN Multi model. The transformations applied by each spatial transformer (ST) is applied to the convolutional feature map produced by the previous layer."; each ST layer is [an affine spatial transformation layer])

The rationale for combining the teachings of Balakrishnan, Shu and Jaderberg is the same as set forth in the rejection of claim 1.

In regard to claim 20, Balakrishnan and Shu do not teach, but Jaderberg teaches:

wherein the affine transformations of the affine transformation matrix are bounded by a scaling factor. (Jaderberg, p. 4: "Aθ = …(2) allowing cropping, translation, and isotropic scaling by varying s, tx, and ty. [a scaling factor]"; in light of specification p. 4, the affine spatial transformation layer is bounded by different scaling factors for rotation, scaling, and zooming)

The rationale for combining the teachings of Balakrishnan, Shu and Jaderberg is the same as set forth in the rejection of claim 1.

In regard to claim 21, Balakrishnan and Shu do not teach, but Jaderberg teaches:

wherein the affine transformation matrix includes a regularization operation. (Jaderberg, p. 5: "…(5) To allow backpropagation of the loss [a regularization operation] through this sampling mechanism we can define the gradients with respect to U and G. For bilinear sampling (5) the partial derivatives are… (6)(7)"; in light of the specification, and also see claims 22 and 23, a regularization operation includes bending/gradient energy loss)

The rationale for combining the teachings of Balakrishnan, Shu and Jaderberg is the same as set forth in the rejection of claim 1.

In regard to claim 22, Balakrishnan and Shu do not teach, but Jaderberg teaches:

wherein the regularization operation includes bending energy loss. (Jaderberg, p. 5: "…(5) To allow backpropagation of the loss through this sampling mechanism we can define the gradients with respect to U and G… This gives us a (sub-)differentiable sampling mechanism, allowing loss gradients to flow back not only to the input feature map (6), but also to the sampling grid coordinates (7), and therefore back to the transformation parameters θ and localisation network since ∂x_i^s/∂θ and ∂y_i^s/∂θ can be easily derived from (1) for example."; θ is used for deformations including scaling, cropping, rotations, and the loss function is associated with θ, thus the loss is [bending energy loss])

The rationale for combining the teachings of Balakrishnan, Shu and Jaderberg is the same as set forth in the rejection of claim 1.

In regard to claim 23, Balakrishnan and Shu do not teach, but Jaderberg teaches:

wherein the regularization operation includes gradient energy loss. (Jaderberg, p. 5: "…(5) To allow backpropagation of the loss through this sampling mechanism we can define the gradients with respect to U and G. For bilinear sampling (5) the partial derivatives are… (6)(7)"; the loss function is associated with gradients, thus the loss is [gradient energy loss])

The rationale for combining the teachings of Balakrishnan, Shu and Jaderberg is the same as set forth in the rejection of claim 1.
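Claims 19-20 recite an affine spatial transformation layer bounded by scaling factors. One plausible reading, sketched below, is that raw network outputs are squashed so rotation and zoom stay within fixed ranges before the affine matrix is assembled; the specific bounds and parameterization are the editor's assumptions, not taken from the record.

```python
# Illustrative "bounded" affine parameterization: tanh squashing keeps
# rotation and zoom within fixed scaling factors. Bounds are assumptions.
import numpy as np

def bounded_affine(raw, max_rot=np.pi / 8, max_zoom=0.2, max_shift=0.1):
    """Map 4 unbounded values to a constrained 2x3 affine matrix."""
    rot = max_rot * np.tanh(raw[0])          # rotation in [-max_rot, max_rot]
    zoom = 1.0 + max_zoom * np.tanh(raw[1])  # scale in [1-max_zoom, 1+max_zoom]
    tx = max_shift * np.tanh(raw[2])         # bounded translations
    ty = max_shift * np.tanh(raw[3])
    c, s = np.cos(rot), np.sin(rot)
    return np.array([[zoom * c, -zoom * s, tx],
                     [zoom * s,  zoom * c, ty]])

A = bounded_affine(np.array([0.3, -1.2, 0.05, 0.4]))
```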
3 "We also assume that F and M are affinely aligned [affine transformation on the target image, providing a warped target image] as a preprocessing step, so that the only source of misalignment between the volumes is nonlinear.") In regard to claim 27, Balakrishnan teaches: wherein the dense deformation field includes a freeform similarity spatial transformer. (Balakrishnan, p. 3 "We model a function g(F, M; θ, λ) = ϕ using a convolutional neural network (CNN), where ϕ is a registration field"; p. 2 "Deformable registration strategies separate an initial affine transformation for global alignment from a typically much slower deformable transformation with higher degrees of freedom. We concentrate on the latter step, in which we compute a dense, nonlinear correspondence for all voxels."; in light of spec. and also see claims 28 and 29, a freeform similarity spatial transformer includes affine and dense deformation field) In regard to claim 28, Balakrishnan teaches: wherein the freeform similarity spatial transformer includes an affine transformation. (Balakrishnan, p. 2 "Deformable registration strategies separate an initial affine transformation for global alignment from a typically much slower deformable transformation with higher degrees of freedom.") In regard to claim 29, Balakrishnan teaches: wherein the freeform similarity spatial transformer includes a dense freeform deformation field warping. (Balakrishnan, p. 2 "Deformable registration strategies separate an initial affine transformation for global alignment from a typically much slower deformable transformation with higher degrees of freedom. We concentrate on the latter step, in which we compute a dense, nonlinear correspondence for all voxels.") In regard to claim 30, Balakrishnan teaches: wherein the dense deformation field includes a regularization operation. (Balakrishnan, p. 2 "Most existing registration algorithms iteratively optimize a transformation based on an energy function... The energy function is typically of the form: L(F;M; ϕ) = L_sim(F;M(ϕ)) + λ L_smooth(ϕ); (1)"; in light of spec. and also see claims 31 and 32 a regularization operation includes [bending and gradient energy loss], which correspond to ϕ and L_Smooth in (1), thus L(F;M; ϕ) (1) is [a regularization operation]) In regard to claim 31, Balakrishnan teaches: wherein the regularization operation includes bending energy loss. (Balakrishnan, p. 2 "Most existing registration algorithms iteratively optimize a transformation based on an energy function... The energy function is typically of the form: L(F;M; ϕ) = L_sim(F;M(ϕ)) + λ L_smooth(ϕ); (1)... Often, ϕ is a displacement vector field [dense deformation fields], specifying the vector offset from F to M for each voxel"; in light of specification p. 5 lines 23 or 26, bending energy loss refers to dense freeform deformation field warping. ϕ is part of the energy loss function (1), thus the function is [bending energy loss]) In regard to claim 32, Balakrishnan teaches: wherein the regularization operation includes gradient energy loss. (Balakrishnan, p. 2 "L_smooth [gradient energy loss] enforces a spatially smooth deformation, often modeled as a linear operator on spatial gradients of ϕ"; in light of specification p. 5 line 27, A gradient energy loss function may also be used to regularize the DDF) In regard to claim 34, Balakrishnan teaches: wherein the at least one processor optimizes the one or more CNN models using an adam optimizer using unsupervised differentiable loss functions. (Balakrishnan, p. 
6 "We use the ADAM optimizer [24] with a learning rate of 1e4."; p. 4 "The proposed method works with any differentiable loss."; p. 8 "This paper presents an unsupervised learning-based approach to medical image registration") In regard to claim 35, Balakrishnan teaches: wherein the at least one processor computes the unsupervised loss functions between the source image and warped target image. (Balakrishnan, p. 4 "The proposed method works with any differentiable loss."; p. 8 "This paper presents an unsupervised learning-based approach to medical image registration"; p. 3 "We also assume that F and M are affinely aligned [warped target image] as a preprocessing step, so that the only source of misalignment between the volumes is nonlinear."; one image can be warped to align with another, either F or M can be a target image, or either F or M can be a source image) Claims 15, 26 and 37 rejected under 35 U.S.C. 103 as being unpatentable over Balakrishnan in view of Shu in view of Jaderberg and Kenney as applied to claims 1 and 72, and in further view of Vigneault ("Ω-Net (Omega-Net): Fully automatic, multi-view cardiac MR detection, orientation, and segmentation with deep neural networks" 20180522) In regard to claim 15, Balakrishn teaches: wherein the source and target images comprise all images in a single (Balakrishnan, p. 2 "Figure 1: Example coronal slices from the 3D MRI brain dataset, after affine alignment. Each column is a different scan (subject) [a single MR scan] and each row is a different coronal slice.") Balakrishnan, Shu, Jaderberg and Kenney do not teach, but Vigneault teaches: cardiac (Vigneault, Highlights "The authors propose Omega-Net: A novel convolutional neural network architecture for the detection, orientation, and segmentation of cardiac MR images.") It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Balakrishnan, Shu, Jaderberg and Kenney to incorporate the teachings of Vigneault by including the U-Net and applying it on the cardiac images. Doing so would outperform the state-of-the-art method in segmentation of the LV and RV blood- pools. (Vigneault, "we present Ω -Net (Omega-Net): A novel convolutional neural network (CNN) architecture… The Ω -Net outperformed the state-of-the-art method in segmentation of the LV and RV blood- pools") In regard to claim 26, Balakrishnan, Shu, Jaderberg and Kenney do not teach, but Vigneault teaches: wherein the local network component comprises a contracting path and an expanding path, (Vigneault, p. "The proposed network makes use of the U-Net module (Fig. 3), a type of deep convolutional neural network which has performed well in biomedical segmentation tasks... The U-Net architecture consists of a down-sampling path (left) followed by an up-sampling path (right) [a contracting path and an expanding path] to restore the original spatial resolution.") the contracting path includes one or more convolutional layers and one or more pooling layers, each pooling layer preceded by at least one convolutional layer, and (Vigneault, p. 97 "The downsampling path resembles the canonical classification CNN... with two 3 × 3 convolutions, a rectified linear unit (ReLU) activation, and a 2 × 2 max pooling step repeatedly applied to the input image and feature maps."; conv + pooling + conv + pooling + conv...) 
PNG media_image4.png 198 450 media_image4.png Greyscale the expanding path includes a number of convolutional layers and a number of upsampling layers, each upsampling layer preceded by at least one convolutional layer, and (Vigneault, p. 97 "In the upsampling path, the reduction in spatial resolution is 'undone' by performing 2 × 2 up-sampling, ReLU activation, and 3 × 3 convolution, eventually mapping the intermediate feature representation back to the original resolution.") each upsampling layer comprises a transpose convolution operation which performs at least one of an upsampling operation and an interpolation operation with a learned kernel, or an upsampling operation followed by an interpolation operation. (Vigneault, p. 97 "In the upsampling path, the reduction in spatial resolution is 'undone' by performing 2 × 2 up-sampling, ReLU activation, and 3 × 3 convolution... [transpose convolution operation = upsampling followed by a convolution layer]"; bi-linear interpolation is usually up-scaling followed by a convolution layer; in a convolution process, the receptive field of the input is multiplying with kernel) The rationale for combining the teachings of Balakrishnan, Shu, Jaderberg, Kenney and Vigneault is the same as set forth in the rejection of claim 15. In regard to claim 37, Balakrishnan, Shu, Jaderberg and Kenney do not teach, but Vigneault teaches: wherein the image sets include cardiac short axis CINE MR series. (Vigneault, Highlights "The authors propose Omega-Net: A novel convolutional neural network architecture for the detection, orientation, and segmentation of cardiac MR images."; p. 95 "In this work, Ω-Nets of varying depths were trained to detect five foreground classes in any of three clinical views (short axis, SA; four-chamber, 4C; two-chamber, 2C), without prior knowledge of the view being segmented."; p. 100 "Where available, three SA (basal, equatorial, and apical), one 4C, and one 2C SSFP cine series were obtained.") The rationale for combining the teachings of Balakrishnan, Shu, Jaderberg, Kenney and Vigneault is the same as set forth in the rejection of claim 15. Claims 17-18 rejected under 35 U.S.C. 103 as being unpatentable over Balakrishnan in view of Shu in view of Jaderberg and Kenney as applied to claims 1 and 72, and in further view of Zhong ("Handwritten Chinese character recognition with spatial transformer and deep residual networks" 20161204) In regard to claim 17, Balakrishnan, Shu, Jaderberg and Kenney do not teach, but Zhong teaches: wherein the global network component comprises a contracting path that includes at least one group of layers that comprises at least one convolution layer, max pooling layer, batch normalization layer, and dropout layer. (Zhong, p. 3441 "we use the localization network that contains two convolution+maxpooling layers"; p. 3442 "Fig. 3: The building block architecture in deep residual network: the standard convolution layer (cony) batch normalization (BN), and relu."; p. 3442 "In this paper, we consider two residual networks: a 19-layer and a 34-layer residual networks... We also apply dropout after each shortcuts block for better performance.") It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Balakrishnan, Shu, Jaderberg and Kenney to incorporate the teachings of Zhong by including spatial transformer network (STN) and the deep residual network (DRN). 
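Claim 26 recites a contracting path (convolutions followed by pooling) and an expanding path whose upsampling layers use transpose convolutions with learned kernels. The sketch below is a minimal U-Net-style network under those constraints; its depth and channel counts are assumptions, not the applicant's or Vigneault's actual architecture.

```python
# Illustrative minimal U-Net-style local component: contracting path
# (conv + max-pool) and expanding path with learned transpose-conv
# upsampling. Sizes are assumptions for illustration only.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Contracting path: each pooling layer is preceded by a convolution.
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        # Expanding path: transpose conv = upsampling with a learned kernel.
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 2, 3, padding=1))  # 2-ch DDF

    def forward(self, x):
        e1 = self.enc1(x)              # full-resolution features
        e2 = self.enc2(self.pool(e1))  # half resolution
        u = self.up(e2)                # learned upsampling back to full res
        return self.dec(torch.cat([u, e1], dim=1))  # skip connection

ddf = TinyUNet()(torch.rand(1, 1, 64, 64))  # output shape (1, 2, 64, 64)
```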
Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Balakrishnan in view of Shu, in view of Jaderberg and Kenney as applied to claims 1 and 72, and in further view of Zhong ("Handwritten Chinese character recognition with spatial transformer and deep residual networks," 2016-12-04).

In regard to claim 17, Balakrishnan, Shu, Jaderberg and Kenney do not teach, but Zhong teaches:

wherein the global network component comprises a contracting path that includes at least one group of layers that comprises at least one convolution layer, max pooling layer, batch normalization layer, and dropout layer. (Zhong, p. 3441: "we use the localization network that contains two convolution+maxpooling layers"; p. 3442: "Fig. 3: The building block architecture in deep residual network: the standard convolution layer (conv), batch normalization (BN), and relu."; p. 3442: "In this paper, we consider two residual networks: a 19-layer and a 34-layer residual networks... We also apply dropout after each shortcuts block for better performance.")

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Balakrishnan, Shu, Jaderberg and Kenney to incorporate the teachings of Zhong by including the spatial transformer network (STN) and the deep residual network (DRN). Doing so would make the training of a very deep network both efficient and effective. (Zhong, p. 3440: "we combine the recently proposed spatial transformer network (STN) with the deep residual network (DRN). The STN acts like a character shape normalization procedure... the DRN makes the training of very deep network to be both efficient and effective.")

In regard to claim 18, Balakrishnan, Shu, Jaderberg and Kenney do not teach, but Zhong teaches:

wherein the global network component comprises a rectifier or a leaky rectifier subsequent to at least one of the at least one of the group of layers in the contracting path. (Zhong, p. 3442: "Fig. 3: The building block architecture in deep residual network: the standard convolution layer (conv), batch normalization (BN), and relu.")

The rationale for combining the teachings of Balakrishnan, Shu, Jaderberg, Kenney, and Zhong is the same as set forth in the rejection of claim 17.

Claim 36 is rejected under 35 U.S.C. 103 as being unpatentable over Balakrishnan in view of Shu, in view of Jaderberg and Kenney as applied to claim 35, and in further view of Kim ("Improved image registration by sparse patch-based deformation estimation," 2014-10-16).

In regard to claim 36, Balakrishnan, Shu, Jaderberg and Kenney do not teach, but Kim teaches:

wherein the warped target image is obtained by applying the dense deformation field to an original target image. (Kim, p. 3: "the dense deformation field can be interpolated via TPS, and further used as the initial deformation field to warp the template to generate the intermediate template."; p. 10: "Since the software of ART and SyN does not provide the interfaces to use the initial deformation field, the integrated versions of these two methods consider the intermediate template images as the target image in their standard registration procedure.")

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Balakrishnan, Shu, Jaderberg and Kenney to incorporate the teachings of Kim by including the intermediate template images. Doing so would allow the system to use the intermediate template images as target images. (Kim, p. 10: "Since the software of ART and SyN does not provide the interfaces to use the initial deformation field, the integrated versions of these two methods consider the intermediate template images as the target image in their standard registration procedure.")
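Claims 25 and 36 concern obtaining a warped target image by applying a transformation field to the original image. A minimal sketch of that warping step in 2D follows, sampling at displaced coordinates with linear interpolation; the identity-plus-displacement convention and function names are assumptions for illustration.

```python
# Illustrative warping: apply a dense deformation field to an image by
# sampling at displaced coordinates with linear interpolation.
import numpy as np
from scipy.ndimage import map_coordinates

def warp_with_ddf(image, ddf):
    """image: (H, W); ddf: (2, H, W) per-axis displacements in pixels."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys + ddf[0], xs + ddf[1]])  # ϕ(p) = p + u(p)
    # order=1 -> linear interpolation of the neighboring pixel values
    return map_coordinates(image, coords, order=1, mode="nearest")

rng = np.random.default_rng(2)
warped = warp_with_ddf(rng.random((64, 64)), 0.5 * rng.random((2, 64, 64)))
```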
Response to Arguments

Applicant's arguments with respect to the rejection of the claims under 35 U.S.C. 103 have been fully considered, but they are moot.

Applicant argues (p. 9): the portions of art used in the rejection of claim 1 do not teach at least the following features in the context of amended claim 1: "trains one or more convolutional neural networks (CNNs) models including a global network component and a local network component, based on the learning data, to learn one or more transformation functions for coregistration of a target image onto a source image, wherein the global network component learns an affine transformation matrix and the local network component learns a dense deformation field (DDF) with an unsupervised loss function maintained between the global network component and the local network component…"

Examiner answers: the arguments do not apply to the new reference (Shu) being used in the current rejection.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG, whose telephone number is (408) 918-7519. The examiner can normally be reached Monday-Thursday, 8-5 PT.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Usmaan Saeed, can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SU-TING CHUANG/
Examiner, Art Unit 2146

Prosecution Timeline

Feb 23, 2021: Application Filed
Dec 12, 2024: Non-Final Rejection (§103)
Feb 10, 2025: Interview Requested
Mar 03, 2025: Applicant Interview (Telephonic)
Mar 03, 2025: Examiner Interview Summary
Mar 11, 2025: Response Filed
Mar 31, 2025: Final Rejection (§103)
Apr 30, 2025: Interview Requested
May 08, 2025: Examiner Interview Summary
May 08, 2025: Applicant Interview (Telephonic)
May 12, 2025: Response after Non-Final Action
Jun 30, 2025: Request for Continued Examination
Jul 03, 2025: Response after Non-Final Action
Aug 11, 2025: Non-Final Rejection (§103)
Sep 19, 2025: Interview Requested
Oct 07, 2025: Applicant Interview (Telephonic)
Oct 09, 2025: Examiner Interview Summary
Oct 09, 2025: Response Filed
Oct 24, 2025: Final Rejection (§103)
Jan 14, 2026: Interview Requested
Jan 21, 2026: Examiner Interview Summary
Jan 21, 2026: Applicant Interview (Telephonic)
Jan 23, 2026: Request for Continued Examination
Jan 28, 2026: Response after Non-Final Action
Feb 07, 2026: Non-Final Rejection (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561600: LINEAR TIME ALGORITHMS FOR PRIVACY PRESERVING CONVEX OPTIMIZATION (granted Feb 24, 2026; 2y 5m to grant)
Patent 12518154: TRAINING MULTIMODAL REPRESENTATION LEARNING MODEL ON UNANNOTATED MULTIMODAL DATA (granted Jan 06, 2026; 2y 5m to grant)
Patent 12481725: SYSTEMS AND METHODS FOR DOMAIN-SPECIFIC ENHANCEMENT OF REAL-TIME MODELS THROUGH EDGE-BASED LEARNING (granted Nov 25, 2025; 2y 5m to grant)
Patent 12468951: Unsupervised outlier detection in time-series data (granted Nov 11, 2025; 2y 5m to grant)
Patent 12412095: COOPERATIVE LEARNING NEURAL NETWORKS AND SYSTEMS (granted Sep 09, 2025; 2y 5m to grant)

Study what changed to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 52% (91% with interview, +39.7%)
Median Time to Grant: 4y 5m
PTA Risk: High

Based on 101 resolved cases by this examiner. Grant probability is derived from the career allow rate.
