DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20, all the claims pending in the application, are rejected.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
In independent claim 1, the limitation “one or more processors configured to” renders the scope of the claim unclear. The apparatus of claim 1 contains one or more processors, but the steps that follow appear to be programming instructions. It is unclear how the one or more processors can be configured to perform each of the claimed steps at a given moment in time.
If Applicant intends this to be a claim to a hardware-only implementation (i.e., not a processor + software), a statement to that effect is a sufficient response to this rejection. Otherwise, the Examiner recommends amending independent claim 1 to include, as part of the apparatus, “memory configured to store instructions that, when executed by the one or more processors, cause the one or more processors to perform” the claimed steps.
Claims 2-10 inherit the deficiencies of claim 1 by virtue of their dependency on claim 1.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4, 9-14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2020/0311940 to Krebs et al. (hereinafter “Krebs”) in view of U.S. Patent Application Publication No. 2021/0397886 to Chen et al. (hereinafter “Chen”).
As to independent claim 1, as best understood, Krebs discloses an apparatus, comprising: one or more processors configured to (Abstract and [0022, 0051] discloses that Krebs is directed to “a network architecture 100 of a machine learning based motion model” implemented using “processor 604”): obtain a medical image sequence associated with an anatomical structure ([0031-0032] discloses step 302 – receiving a “sequence of medical images” of “an anatomical structure”); arrange a plurality of medical scan images of the medical video into multiple image pairs, wherein each image pair includes a first medical scan image that is associated with a first temporal position of the medical video and a second medical scan image that is associated with a second temporal position of the medical video; process the multiple image pairs via a machine learning (ML) model, wherein the multiple image pairs are provided to the ML model successively based on the first temporal position or the second temporal position associated with each image pair ([0033] discloses step 304 – “inputting pairs of the one or more medical images into an encoder network of the machine learning based motion model”; see also Fig. 1 in which temporally-separated image pairs (I0, I1), (I0, I2),…, (I0, IT) are input into the model and processed by encoder 102), and wherein the ML model is configured to: determine respective first sets of image features associated with the multiple image pairs ([0033] discloses that “feature vectors are determined” by the encoder 102 processing the respective image pairs; see also Fig. 1 in which the encoder outputs ~Z1 – ~ZT associated with the respective image pairs); refine the first set of image features associated with each image pair based on the respective first sets of image features associated with one or more other image pairs ([0034] discloses step 306 in which “each of the feature vectors are jointly mapped to a respective motion vector to temporally condition the mapping…by merging information of mappings from prior and future time steps”; see also Fig. 1 in which feature vectors ~Z1 – ~ZT are respectively refined, based on one another, to output refined motion vectors Z1 – ZT by Temporal Convolutional Network (TCN) 104); and determine a motion field associated with each image pair based at least on the refined first set of image features associated with the image pair ([0035] discloses step 308 – determining “deformation fields representing motion of the anatomical structure…based on the one or more motion vectors”; see also Fig. 1 in which deformation fields Φ1 – ΦT associated with the respective image pairs are output by a decoder 106 based on the input motion vectors Z1 – ZT); and perform a medical task associated with the anatomical structure based on the respective motion fields associated with the multiple image pairs ([0036] discloses step 310 in which “a medical imaging analysis task is performed using the one or more deformation fields” Φ1 – ΦT associated with the respective image pairs).
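For illustration only, the data flow mapped above (per-pair encoding into feature vectors ~Z1 – ~ZT, joint temporal refinement into Z1 – ZT, and decoding into deformation fields Φ1 – ΦT) may be sketched in Python as follows. All class, layer, and dimension choices are hypothetical editorial assumptions; the sketch does not represent the actual implementation of Krebs or of the claims.

```python
import torch
import torch.nn as nn

class MotionModelSketch(nn.Module):
    """Sketch of the mapped data flow: per-pair encoding (~Z_t), joint
    temporal refinement (Z_t), and decoding to deformation fields (Phi_t).
    All names, layers, and dimensions are hypothetical."""

    def __init__(self, feat_dim=64, hw=32):
        super().__init__()
        self.hw = hw
        # Encoder: each 2-channel image pair (I0, I_t) -> feature vector ~Z_t.
        self.encoder = nn.Sequential(
            nn.Conv2d(2, feat_dim, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Temporal module: jointly refines ~Z_1..~Z_T into Z_1..Z_T by
        # merging information from prior and future time steps.
        self.temporal = nn.Conv1d(feat_dim, feat_dim, 3, padding=1)
        # Decoder: each refined Z_t -> dense motion field Phi_t (2 x H x W).
        self.decoder = nn.Linear(feat_dim, 2 * hw * hw)

    def forward(self, pairs):  # pairs: (T, 2, H, W)
        feats = self.encoder(pairs)                      # (T, feat_dim)
        refined = self.temporal(feats.t().unsqueeze(0))  # (1, feat_dim, T)
        refined = refined.squeeze(0).t()                 # (T, feat_dim)
        return self.decoder(refined).view(-1, 2, self.hw, self.hw)

# Example: T=5 image pairs of 32x32 frames -> 5 deformation fields.
fields = MotionModelSketch()(torch.randn(5, 2, 32, 32))
print(fields.shape)  # torch.Size([5, 2, 32, 32])
```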
Krebs discloses that the motion model estimates “cardiac motion patterns” for evaluating “myocardial strain rate” ([0046]), wherein such patterns would most reasonably be captured in video; thus, in at least one embodiment of Krebs, the “image sequence” appears to be implicitly disclosed to be video. However, Krebs does not expressly disclose that the obtained image sequence is video.
Chen, like Krebs, is directed to calculating “subject-specific muscular strain of the myocardium” by “estimating time-varying cardiac motion using myocardial feature tracking” using a “motion estimation neural network system 200” that receives “a pair of input images…extract[s] features from the images”, and processes the features to “infer a flow between the input images” ([0001, 0015-0019], Fig. 2). Chen discloses that the input images are extracted from “video of the heart…that comprises a plurality of images of the heart recorded at different points in time… for a full cardiac cycle” ([0015]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krebs to obtain the image pairs processed by the motion model from video of the heart, as taught by Chen, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. It is predictable that the proposed modification would have enhanced the accuracy of Krebs’ “myocardial strain” evaluation ([0046]) since video would have better captured “heart motion starting from relaxation to contraction and then back to relaxation” ([0015] of Chen) relative to individually captured image frames.
As to claim 2, Krebs as modified above further discloses that the motion field associated with each image pair indicates a motion of the anatomical structure between the first medical scan image of the image pair and the second medical scan image of the image pair ([0035] of Krebs discloses the “deformation fields representing motion of the anatomical structure”, wherein Fig. 1 shows that the one or more deformation fields Φ1 – ΦT are associated with the respective image pairs).
As to claim 3, Krebs as modified above further discloses that the anatomical structure includes a myocardium, the medical video depicts the myocardium within a cardiac cycle, and the medical task includes a determination of one or more strain values associated with the myocardium ([0046] of Krebs discloses that the motion model determines “cardiac specific motion patterns” in order to evaluate “myocardial strain rate”; [0001, 0015] of Chen similarly discloses evaluating “subject-specific muscular strain of the myocardium” and further discloses that the image pairs that are input to the model are extracted from “video of the heart” for obtaining information “for a full cardiac cycle”; the reasons for combining the references are the same as those discussed above in conjunction with claim 1).
As to claim 4, Krebs as modified above further teaches that the ML model includes an encoding portion and a decoding portion, and wherein the first set of image features associated with each image pair is determined via the encoding portion and refined via the decoding portion ([0033-0035] of Krebs discloses that the “feature vectors are determined” by an encoder 102 processing the respective image pairs and that “each of the feature vectors are jointly mapped to a respective motion vector to temporally condition the mapping…by merging information of mappings from prior and future time steps”; see also Fig. 1 in which feature vectors ~Z1 – ~ZT are respectively refined, based on one another, to output refined motion vectors Z1 – ZT by Temporal Convolutional Network (TCN) 104, and deformation fields Φ1 – ΦT associated with the respective image pairs are output by a decoder 106 based on the input motion vectors Z1 – ZT, wherein the TCN 104 and decoder 106 collectively read on the claimed decoding portion).
As to claim 9, Krebs as modified above further discloses that the plurality of medical scan images of the medical video is associated with a physiological cycle ([0046] of Krebs discloses that the motion model determines “cardiac specific motion patterns”; [0015] of Chen similarly discloses that the motion model obtains information “for a full cardiac cycle”; the reasons for combining the references are the same as those discussed above in conjunction with claim 1), and wherein the one or more processors being configured to arrange the plurality of medical scan images into the multiple image pairs comprises the one or more processors being configured to: select, from the plurality of medical scan images, a same medical scan image as the first medical scan image of each image pair (Fig. 1 of Krebs shows that a same image I0 is the first image in each pair of images input to the model), wherein the selected medical scan image is associated with a beginning of the physiological cycle ([0015] of Chen discloses “starting from a first image frame 102 of the cine MRI”, wherein “the first image frame of the cine MRI is the [end-diastolic] ED frame”; the reasons for combining the references are the same as those discussed above in conjunction with claim 1); and select, from the plurality of medical scan images, a sequentially ordered set of medical scan images as the respective second medical scan images of the multiple image pairs (Fig. 1 of Krebs shows that an ordered sequence of images I1-T are used as the second images in the respective pairs of images input to the model).
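For illustration only, the pairing scheme mapped above (a fixed first image from the beginning of the physiological cycle paired with each subsequent image) may be sketched in Python as follows; the function and variable names are hypothetical and appear in neither the claims nor the cited references.

```python
def anchor_pairs(frames):
    """Pair a fixed first image, taken from the beginning of the
    physiological cycle (e.g., the ED frame), with each later frame:
    (I0, I1), (I0, I2), ..., (I0, IT)."""
    anchor = frames[0]  # same first image for every pair
    return [(anchor, frame) for frame in frames[1:]]

# Example: frames I0..I3 yield pairs (I0, I1), (I0, I2), (I0, I3).
print(anchor_pairs(["I0", "I1", "I2", "I3"]))
```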
As to claim 10, Krebs does not expressly disclose that the one or more processors being configured to arrange the plurality of medical scan images into the multiple image pairs comprises the one or more processors being configured to: select, from the plurality of medical scan images, a first set of sequentially ordered medical scan images as the respective first medical scan images of the multiple image pairs; and select, from the plurality of medical scan images, a second set of sequentially ordered medical scan images as the respective second medical scan images of the multiple image pairs; wherein a beginning image of the first set of sequentially ordered medical scan images is positioned before a beginning image of the second set of sequentially ordered medical scan images in the medical video.
However, Chen discloses that the image pairs input to the motion model are consecutively ordered: (t=1, t=2), (t=2, t=3), (t=3, t=4),…, (t=n-1, t=n) (see [0038] and Fig. 4). That is, Chen discloses that the one or more processors being configured to arrange the plurality of medical scan images into the multiple image pairs comprises the one or more processors being configured to: select, from the plurality of medical scan images, a first set of sequentially ordered medical scan images as the respective first medical scan images of the multiple image pairs (the first images in the pairs are those at t=1, t=2, t=3,…, t=n-1); and select, from the plurality of medical scan images, a second set of sequentially ordered medical scan images as the respective second medical scan images of the multiple image pairs (the second images in the pairs are those at t=2, t=3, t=4,…, t=n); wherein a beginning image of the first set of sequentially ordered medical scan images is positioned before a beginning image of the second set of sequentially ordered medical scan images in the medical video (wherein the image at t=1 precedes the image at t=2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krebs to input consecutive image pairs to the motion model, as taught by Chen, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. It is predictable that the proposed modification would have enabled “estimat[ing] the cardiac motion represented by the cine MRI by computing a composite flow field between an end-diastolic (ED) image frame and the n-th image frame” by virtue of predicting “respective displacement of features between a respective pair image frames (e.g., a pair of neighboring frames)”, as taught by Chen ([0038]).
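For illustration only, the consecutive pairing mapped above, and the composition of pairwise flows into a composite flow described in [0038] of Chen, may be sketched in Python as follows. All names are hypothetical, and the flow-composition operator is an editorial assumption, not a disclosure of any reference.

```python
from functools import reduce

def consecutive_pairs(frames):
    """Pair neighboring frames: (t=1, t=2), (t=2, t=3), ..., (t=n-1, t=n).
    The first images (t=1..n-1) thus begin before the second images (t=2..n)."""
    return list(zip(frames[:-1], frames[1:]))

def composite_flow(pairwise_flows, compose):
    """Fold the pairwise flows into a single composite flow from the first
    (e.g., ED) frame to the n-th frame; 'compose' is a hypothetical
    flow-composition operator (e.g., warp-and-add of displacement fields)."""
    return reduce(compose, pairwise_flows)

# Example: four frames yield three neighboring pairs.
print(consecutive_pairs(["t1", "t2", "t3", "t4"]))
# [('t1', 't2'), ('t2', 't3'), ('t3', 't4')]
```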
Independent claim 11 recites a method comprising steps performed by the apparatus recited in independent claim 1. Accordingly, claim 11 is rejected for reasons analogous to those discussed above in conjunction with claim 1.
Dependent claims 12-14 and 19-20 recite features nearly identical to those recited in claims 2-4 and 9-10, respectively. Accordingly, claims 12-14 and 19-20 are rejected for reasons analogous to those discussed above in conjunction with claims 2-4 and 9-10, respectively.
Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Krebs in view of Chen and further in view of “Anatomy-Aware Cardiac Motion Estimation” by Chen et al. (hereinafter “Chen2”).
As to claim 5, Krebs does not expressly disclose that the encoding portion of the ML model is implemented via a twin neural network, and wherein the ML model being configured to determine the respective first sets of image features associated with the multiple image pairs comprises the ML model being configured to: extract respective image features from the first medical scan image and the second medical scan image of each image pair using the twin neural network; and concatenate the image features extracted from the first medical scan image and the second medical scan image to derive the first set of image features associated with the image pair.
Chen, like Krebs, is directed to calculating “subject-specific muscular strain of the myocardium” by “estimating time-varying cardiac motion using myocardial feature tracking” using a “motion estimation neural network system 200” that receives “a pair of input images…extract[s] features from the images”, and processes the features to “infer a flow between the input images” ([0001, 0015-0019], Fig. 2). Chen discloses that the encoding portion 202 includes “twin subnetworks 202 a and 202 b arranged in a Siamese configuration to process the respective input images 204 a and 204 b in tandem”, wherein the “twin feature maps or feature vectors” are combined for input to a subsequent layer ([0019-0020] and Fig. 2).
That is, Chen discloses that the encoding portion of the ML model is implemented via a twin neural network, and wherein the ML model being configured to determine the respective first sets of image features associated with the multiple image pairs comprises the ML model being configured to: extract respective image features from the first medical scan image and the second medical scan image of each image pair using the twin neural network; and combine the image features extracted from the first medical scan image and the second medical scan image to derive the first set of image features associated with the image pair ([0019-0020] and Fig. 2 discloses encoding portion 202 comprising twin subnetworks 202a and 202b which extracts respective image features from input images 204a and 204b and combines the extracted image features for output to the subsequent layers).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krebs to utilize twin Siamese subnetworks for the respective image feature extraction of the input pair of images, as taught by Chen, to arrive at the claimed invention discussed above. Such a modification is the result of simple substitution of one known element for another producing a predictable result. More specifically, Krebs’ encoder 102 and Chen’s twin subnetworks 202a-b perform the same general and predictable function, the predictable function being extracting features from a pair of temporally-separated images for further processing to estimate motion between the images. Since each individual element and its function are shown in the prior art, albeit in separate references, the difference between the claimed subject matter and the prior art rests not in any individual element or function but in the combination itself; that is, in the substitution of Chen’s twin subnetworks 202a-b for Krebs’ encoder 102. Thus, the simple substitution of one known element for another producing a predictable result renders the claim obvious. It is predictable that the proposed modification would have enhanced feature extraction speed by virtue of the shared weights between the twin networks.
Chen2, like Krebs and Chen, is directed to “cardiac motion estimation” (Abstract). Chen2 utilizes a deep learning framework similar to that of Chen in which image pairs are input to a Siamese network for feature extraction, followed by processing for estimating motion between the image pair (Fig. 2 of Chen2). Chen2 further discloses that the features output by the Siamese network are concatenated prior to input to the subsequent model (Fig. 2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Krebs and Chen to combine the features output by the Siamese network by concatenation, as taught by Chen2, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. It is predictable that the proposed modification would have preserved the extracted features for subsequent motion analysis since concatenation, unlike other feature combination methods, does not alter the feature values.
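For illustration only, a Siamese (weight-shared) feature extractor whose outputs are concatenated, of the kind discussed in the combination above, may be sketched in Python as follows; all names and dimensions are hypothetical editorial assumptions rather than the actual implementation of Chen or Chen2.

```python
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    """Twin subnetworks in a Siamese configuration: a single shared-weight
    encoder applied to both images of a pair, with the two feature vectors
    concatenated. Names and dimensions are hypothetical."""

    def __init__(self, feat_dim=64):
        super().__init__()
        # One encoder instance; applying it to both inputs is what makes
        # the "twins" share weights.
        self.twin = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, img_a, img_b):
        f_a = self.twin(img_a)  # features of the first image of the pair
        f_b = self.twin(img_b)  # features of the second image of the pair
        # Concatenation preserves both feature sets unchanged, unlike
        # element-wise combinations such as addition.
        return torch.cat([f_a, f_b], dim=1)

out = SiameseEncoder()(torch.randn(1, 1, 32, 32), torch.randn(1, 1, 32, 32))
print(out.shape)  # torch.Size([1, 128])
```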
Claim 15 recites features nearly identical to those recited in claim 5. Accordingly, claim 15 is rejected for reasons analogous to those discussed above in conjunction with claim 5.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Krebs in view of Chen and further in view of WO 2024/126112 to Sinha et al. (hereinafter “Sinha”).
As to claim 6, Krebs as modified above further teaches that the decoding portion of the ML model is implemented via a neural network and wherein the ML model is configured to refine the first set of image features associated with each image pair using a module of the neural network ([0024-0025] of Krebs discloses that each of the TCN 104 and the decoder network 106 is a neural network, wherein feature vectors ~Z1 – ~ZT are respectively refined, based on one another, to output refined motion vectors Z1 – ZT by Temporal Convolutional Network (TCN) 104).
The proposed combination of Krebs and Chen does not expressly disclose that the decoder neural network is a transformer or that the refining uses self-attention of the transformer. However, Sinha discloses that computing optical flow between pairs of images in a sequence of images may be performed by a TCN or a transformer, highlighting the interchangeability of these types of networks for “processing temporal data” ([0047]). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute Krebs’ TCN with a transformer, which necessarily uses self-attention by definition, as taught by Sinha, to arrive at the claimed invention discussed above. Such a modification is the result of simple substitution of one known element for another producing a predictable result. More specifically, Krebs’ TCN and Sinha’s transformer perform the same general and predictable function, the predictable function being processing temporal data between pairs of images. Since each individual element and its function are shown in the prior art, albeit in separate references, the difference between the claimed subject matter and the prior art rests not in any individual element or function but in the combination itself; that is, in the substitution of Sinha’s transformer for Krebs’ TCN. Thus, the simple substitution of one known element for another producing a predictable result renders the claim obvious. It is predictable that the proposed modification would have better captured global context, which convolutional models capture less effectively than transformers do.
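For illustration only, the proposed substitution of a self-attention-based transformer for the TCN may be sketched in Python as follows; the dimensions and layer counts are hypothetical assumptions, not disclosures of Krebs or Sinha.

```python
import torch
import torch.nn as nn

# A transformer encoder standing in for the TCN: self-attention lets the
# feature vector of each image pair attend to the vectors of all other
# pairs when being refined. Dimensions are hypothetical.
feat_dim = 64
temporal = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True),
    num_layers=2,
)
feats = torch.randn(1, 5, feat_dim)  # (batch, T image pairs, feat_dim), the ~Z_t
refined = temporal(feats)            # same shape, globally conditioned Z_t
print(refined.shape)                 # torch.Size([1, 5, 64])
```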
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Krebs in view of Chen and further in view of “RAFT: Recurrent All-Pairs Field Transforms for Optical Flow” by Teed et al. (hereinafter “Teed”).
As to claim 7, the proposed combination of Krebs and Chen further teaches that the decoding portion of the ML model is implemented via a neural network, and wherein the ML model is configured to refine the first set of image features associated with each image pair based on one or more hidden states of the neural network ([0024-0025] of Krebs discloses that each of the TCN 104 and the decoder network 106 is a neural network, wherein feature vectors ~Z1 – ~ZT are respectively refined, based on one another, to output refined motion vectors Z1 – ZT by Temporal Convolutional Network (TCN) 104, wherein a neural network necessarily performs its functions according to its hidden states, e.g., represented as the “trainable parameters” of Krebs’ neural networks).
The proposed combination of Krebs and Chen does not expressly disclose that the neural network is a recurrent neural network comprising a gated recurrent unit.
Teed, like Krebs, is directed to a deep learning framework that inputs a pair of temporally-separated image frames and outputs estimated motion therebetween (Abstract and Fig. 1 and accompanying description). Teed discloses that the encoded features from the images in the pair are refined by a recurrent neural network comprising a gated recurrent unit (Abstract, Sections 1 and 3, and Fig. 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Krebs and Chen to substitute Krebs’ TCN with an RNN comprising a GRU, as taught by Teed, to arrive at the claimed invention discussed above. Such a modification is the result of simple substitution of one known element for another producing a predictable result. More specifically, Krebs’ TCN and Teed’s RNN perform the same general and predictable function, the predictable function being processing feature data of a pair of images to estimate motion therebetween. Since each individual element and its function are shown in the prior art, albeit in separate references, the difference between the claimed subject matter and the prior art rests not in any individual element or function but in the combination itself; that is, in the substitution of Teed’s RNN for Krebs’ TCN. Thus, the simple substitution of one known element for another producing a predictable result renders the claim obvious. It is predictable that the proposed modification would have increased the accuracy of the temporal feature analysis since RNNs are better suited than convolutional networks to processing sequential data.
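For illustration only, the proposed substitution of a GRU-based recurrent network for the TCN may be sketched in Python as follows; the dimensions are hypothetical assumptions, not disclosures of Krebs or Teed.

```python
import torch
import torch.nn as nn

# A GRU-based RNN standing in for the TCN: the hidden state carried across
# time steps conditions the refinement of each pair's features. Dimensions
# are hypothetical.
feat_dim = 64
gru = nn.GRU(input_size=feat_dim, hidden_size=feat_dim, batch_first=True)
feats = torch.randn(1, 5, feat_dim)  # (batch, T image pairs, feat_dim)
refined, hidden = gru(feats)         # refined: (1, 5, 64); hidden: (1, 1, 64)
print(refined.shape, hidden.shape)
```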
Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Krebs in view of Chen and further in view of U.S. Patent Application Publication No. 2022/0198669 to Chen et al. (hereinafter “Chen3”).
As to claim 8, the proposed combination of Krebs and Chen does not expressly disclose that the first set of image features associated with each image pair is characterized by a first spatial scale, and wherein the ML model is further configured to: determine respective second sets of image features associated with the multiple image pairs, wherein the second sets of image features are characterized by a second spatial scale; refine the second set of image features associated with each image pair based on the respective second sets of image features associated with one or more other image pairs; and determine the motion field associated with the image pair further based on the refined second set of image features associated with the image pair.
Chen3, like Krebs and Chen, is directed to analyzing temporally-separated image pairs by a deep learning framework for analyzing “cardiac motion”, wherein features extracted from the images by an “encoder” 220 are analyzed by a “temporally-aware” network 230 (Abstract, [0008, 0050] and Fig. 3). Chen3 discloses that the convolutional encoding layers “operate at multiple different spatial resolutions”, wherein encoded features “at different resolutions” 304 are input to a recurrent neural network 230, which performs temporal analysis on the features at the varying resolutions and outputs the temporally analyzed features to a decoder 242 for final output ([0052-0056], Fig. 4).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Krebs and Chen to perform Krebs’ feature refinement and motion field estimation at multiple scales/resolutions, as taught by Chen3, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. It is predictable that the proposed modification would have enhanced the accuracy of predictions by the framework by virtue of providing more information to be analyzed (see [0058] of Chen3: analyzing features “at each spatial resolution layer allows the recurrent part of the network 210 to have full exposure to features at every past time point and every spatial resolution level (e.g., from coarse to fine) before making a prediction”).
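For illustration only, an encoder producing features at two spatial scales, of the kind discussed in the combination above, may be sketched in Python as follows; all names and dimensions are hypothetical assumptions rather than the actual implementation of Chen3.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Encoder emitting feature maps at two spatial scales for each image
    pair; each scale would then be refined temporally across pairs before
    decoding into a motion field. Names and dimensions are hypothetical."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Conv2d(2, 32, 3, stride=2, padding=1)   # 1/2 resolution
        self.stage2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)  # 1/4 resolution

    def forward(self, pair):  # pair: (N, 2, H, W), the two images stacked
        f1 = torch.relu(self.stage1(pair))  # first (finer) spatial scale
        f2 = torch.relu(self.stage2(f1))    # second (coarser) spatial scale
        return f1, f2

f1, f2 = MultiScaleEncoder()(torch.randn(1, 2, 64, 64))
print(f1.shape, f2.shape)  # (1, 32, 32, 32) and (1, 64, 16, 16)
```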
Claim 18 recites features nearly identical to those recited in claim 8. Accordingly, claim 18 is rejected for reasons analogous to those discussed above in conjunction with claim 8.
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Krebs in view of Chen and Chen2 and further in view of Sinha.
As to claim 16, Krebs as modified above further teaches that the decoding portion of the ML model is implemented via a neural network and wherein the ML model is configured to refine the first set of image features associated with each image pair using a module of the neural network ([0024-0025] of Krebs discloses that each of the TCN 104 and the decoder network 106 is a neural network, wherein feature vectors ~Z1 – ~ZT are respectively refined, based on one another, to output refined motion vectors Z1 – ZT by Temporal Convolutional Network (TCN) 104).
The proposed combination of Krebs, Chen, and Chen2 does not expressly disclose that the decoder neural network is a transformer or that the refining uses self-attention of the transformer. However, Sinha discloses that computing optical flow between pairs of images in a sequence of images may be performed by a TCN or a transformer, highlighting the interchangeability of these types of networks for “processing temporal data” ([0047]). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute Krebs’ TCN with a transformer, which necessarily uses self-attention by definition, as taught by Sinha, to arrive at the claimed invention discussed above. Such a modification is the result of simple substitution of one known element for another producing a predictable result. More specifically, Krebs’ TCN and Sinha’s transformer perform the same general and predictable function, the predictable function being processing temporal data between pairs of images. Since each individual element and its function are shown in the prior art, albeit in separate references, the difference between the claimed subject matter and the prior art rests not in any individual element or function but in the combination itself; that is, in the substitution of Sinha’s transformer for Krebs’ TCN. Thus, the simple substitution of one known element for another producing a predictable result renders the claim obvious. It is predictable that the proposed modification would have better captured global context, which convolutional models capture less effectively than transformers do.
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Krebs in view of Chen and Chen2 and further in view of Teed.
As to claim 17, the proposed combination of Krebs, Chen, and Chen2 further teaches that the decoding portion of the ML model is implemented via a neural network, and wherein the ML model is configured to refine the first set of image features associated with each image pair based on one or more hidden states of the neural network ([0024-0025] of Krebs discloses that each of the TCN 104 and the decoder network 106 is a neural network, wherein feature vectors ~Z1 – ~ZT are respectively refined, based on one another, to output refined motion vectors Z1 – ZT by Temporal Convolutional Network (TCN) 104, wherein a neural network necessarily performs its functions according to its hidden states, e.g., represented as the “trainable parameters” of Krebs’ neural networks).
The proposed combination of Krebs, Chen, and Chen2 does not expressly disclose that the neural network is a recurrent neural network comprising a gated recurrent unit.
Teed, like Krebs, is directed to a deep learning framework that inputs a pair of temporally-separated image frames and outputs estimated motion therebetween (Abstract and Fig. 1 and accompanying description). Teed discloses that the encoded features from the images in the pair are refined by a recurrent neural network comprising a gated recurrent unit (Abstract, Sections 1 and 3, and Fig. 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Krebs, Chen, and Chen2 to substitute Krebs’ TCN with an RNN comprising a GRU, as taught by Teed, to arrive at the claimed invention discussed above. Such a modification is the result of simple substitution of one known element for another producing a predictable result. More specifically, Krebs’ TCN and Teed’s RNN perform the same general and predictable function, the predictable function being processing feature data of a pair of images to estimate motion therebetween. Since each individual element and its function are shown in the prior art, albeit in separate references, the difference between the claimed subject matter and the prior art rests not in any individual element or function but in the combination itself; that is, in the substitution of Teed’s RNN for Krebs’ TCN. Thus, the simple substitution of one known element for another producing a predictable result renders the claim obvious. It is predictable that the proposed modification would have increased the accuracy of the temporal feature analysis since RNNs are better suited than convolutional networks to processing sequential data.
Pertinent Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Turcea (U.S. Patent Application Publication No. 2020/0085394) is directed to estimating cardiac motion using a deep learning framework comprising a convolutional neural network (CNN) and a temporal CNN.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN M CONNER whose telephone number is (571)272-1486. The examiner can normally be reached 10 AM - 6 PM Monday through Friday, and some Saturday afternoons.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Greg Morse can be reached at (571) 272-3838. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SEAN M CONNER/Primary Examiner, Art Unit 2663