Prosecution Insights
Last updated: April 19, 2026
Application No. 18/104,245

IMAGE ALIGNMENT KNOWLEDGE DISTILLATION

Non-Final OA — §102, §103, §112
Filed
Jan 31, 2023
Examiner
SUMMERS, GEOFFREY E
Art Unit
2669
Tech Center
2600 — Communications
Assignee
Sony Interactive Entertainment Europe Limited
OA Round
3 (Non-Final)
72%
Grant Probability
Favorable
3-4
OA Rounds
2y 5m
To Grant
99%
With Interview

Examiner Intelligence

Grants 72% — above average
72%
Career Allow Rate
249 granted / 348 resolved
+9.6% vs TC avg
Strong +35% interview lift
+35.4%
Interview Lift
Comparison of resolved cases with vs. without an interview
Typical timeline
2y 5m
Avg Prosecution
27 currently pending
Career history
375
Total Applications
across all art units

Statute-Specific Performance

§101: 9.6% (-30.4% vs TC avg)
§103: 41.0% (+1.0% vs TC avg)
§102: 16.3% (-23.7% vs TC avg)
§112: 28.6% (-11.4% vs TC avg)
Tech Center averages shown for comparison are estimates • Based on career data from 348 resolved cases

Office Action

§102 §103 §112
DETAILED ACTION

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 19, 2025, has been entered.

Response to Amendment

Claims 1-16 and 18-20 were previously pending. Applicant’s amendment filed December 19, 2025, has been entered in full. Claims 1, 13, and 18-20 are amended. Claims 15 and 16 are cancelled. New claims 21-23 are added. Accordingly, claims 1-14 and 18-23 are now pending.

Response to Arguments

Applicant traverses the previous prior art rejections (Remarks filed December 19, 2025, hereinafter Remarks: Pages 9-11). In particular, Applicant states that “Applicant could find no disclosure in Ihler, either express or implied, of preprocessing image data as required by amended claim 1” (Remarks: Page 10). Examiner respectfully disagrees. Ihler receives image data from various datasets (e.g., Sec. 4.1). The last sentence of Sec. 4.2 in Ihler states “We cropped all datasets to multiples of 64 to fit the models’ architecture.” This image cropping is a form of preprocessing the received image data that falls within the scope of the claimed invention.
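For context on the cropping the examiner points to, the short sketch below crops an image array so that both spatial dimensions become multiples of 64, in the spirit of Ihler's Sec. 4.2 statement. It is an illustrative NumPy sketch only, not code from the reference; the function name crop_to_multiple and the center-crop choice are assumptions.

import numpy as np

def crop_to_multiple(image: np.ndarray, multiple: int = 64) -> np.ndarray:
    # Center-crop an (H, W, C) image so H and W are multiples of `multiple`.
    h, w = image.shape[:2]
    new_h = (h // multiple) * multiple
    new_w = (w // multiple) * multiple
    top = (h - new_h) // 2
    left = (w - new_w) // 2
    return image[top:top + new_h, left:left + new_w]

# Example: a 500x650 frame is cropped to 448x640 (both multiples of 64).
frame = np.zeros((500, 650, 3), dtype=np.uint8)
print(crop_to_multiple(frame).shape)  # (448, 640, 3)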
Claim Objections

Applicant is advised that should claim 1 be found allowable, claim 18 will be objected to under 37 CFR 1.75 as being a substantial duplicate thereof. When two claims in an application are duplicates, or else are so close in content that they both cover the same thing despite a slight difference in wording, it is proper after allowing one claim to object to the other as being a substantial duplicate of the allowed claim. See MPEP § 608.01(m).

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 6, 7, 13, and 22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 6 recites “wherein processing the received image data using the artificial neural network comprises applying the convolutional filters to the received image data” (emphasis added). However, claim 6 depends from claim 1, which has been amended to recite “preprocessing the image data of the first image [i.e., the received image data]; processing the preprocessed image data using an artificial neural network” (emphasis added). It is unclear whether claim 6 is seeking to further define the processing of the preprocessed image data from claim 1, or to require further processing of the original (i.e., not preprocessed) image data received in claim 1. On the one hand, the recitation of “wherein” and the reference to the artificial neural network suggest that claim 6 is further defining the processing performed in claim 1. On the other hand, claim 6 specifically recites “the received image data,” which is distinguished from the preprocessed image data in claim 1 and suggests that claim 6 is reciting different processing. This ambiguity makes the scope of the claim unclear and renders claim 6 indefinite. Claim 7 is also indefinite at least because it includes the indefinite limitations of claim 6.

Claim 13 recites “receiving image data of the second image; preprocessing the image data of the first image; processing the preprocessed image data of the second image” (emphasis added). The meaning of these limitations is unclear, rendering the scope of the claim indefinite. There is insufficient antecedent basis for “the preprocessed image data of the second image” because claim 13 (as well as claim 1) only recites performing preprocessing on the first image, and there is no mention of preprocessing the second image.

Claim 22 depends from claim 21. Claim 21 recites “downscaling the image data of the first image.” Claim 22 further recites “wherein the artificial neural network comprises one or more strided-convolution layers to downscale the image data of the first image.” It is unclear whether the strided-convolution layers perform the downscaling recited in claim 21 or a separate and distinct downscaling. On the one hand, claim 22 recites that the intended use of the strided-convolution layers is “to downscale the image data of the first image,” which is the exact same action recited in claim 21 and thus suggests that only one downscaling is required by the claim – i.e., that claim 22 is further limiting how the downscaling of claim 21 is performed. On the other hand, claim 1 recites that the artificial neural network processes the preprocessed image, and claim 21 states that the preprocessing comprises downscaling the image data. Taken together, these limitations suggest that the downscaling of claim 21 occurs before the processing with the artificial neural network in claim 22 – i.e., that the downscaling in claim 22 is separate and distinct from the downscaling performed in claim 21. This ambiguity makes the scope of the claim unclear and renders the claim indefinite.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-5, 8-11, 14, and 18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by ‘Ihler’ (“Patient-Specific Domain Adaptation for Fast Optical Flow Based on Student-Teacher Knowledge Transfer,” 9 July 2020).

Regarding claim 1, Ihler discloses a computer-implemented (e.g., Page 6, Training time, Nvidia GPU) method of processing image data (see steps mapped below), the method comprising: receiving image data of a first image in a sequence of images (e.g., Section 4.1, endoscopic image sequences; e.g., Figure 1, left column shows first images in a sequence; Sec. 3.2 uses x to refer to a received pair of images, one of which is a first image); preprocessing the image data of the first image (Sec. 4.2, last sentence, “We cropped all datasets to multiples of 64 to fit the models’ architecture”; this cropping of the images received from the datasets is preprocessing); processing the preprocessed image data using an artificial neural network to generate output image data of the first image (e.g., Sec. 3.2, g_θ(x), where g is a student artificial neural network with parameters θ used to process the preprocessed received image data; the output image data that g is used to generate is described further below), the output image data comprising an aligned image (a motion-compensated image – see below) or a feature map of the aligned image (a flow field – see below), wherein the aligned image corresponds to a synthetic image in which one or more features or pixels in the first image are aligned with one or more corresponding features or pixels in a second image in the sequence of images (e.g., Sec. 3.2, student network g is trained through knowledge distillation to predict flow fields that match those predicted by teacher model h; also see, e.g., Sec. 4.2, which identifies specific models used for g and h, and Sec. 4.3, which defines the loss function used to train g; a flow field is within the scope of a feature map of an aligned image at least because it maps displacements of image features that are needed to produce an aligned image; Sec. B (in the appendix) further teaches using the flow fields for motion compensation [i.e., image stabilization], where a subsequent/first image is warped, thereby generating a synthetic image in which the features of the subsequent/first image are aligned with corresponding features of an initial/second image in the sequence; i.e., subsequent images are aligned to the first image to compensate for camera motion; note that the terms “first” and “second” are being interpreted as distinguishing between different claim elements and not as requiring a particular temporal order of the images within the sequence of images); and using the output image data for image processing (e.g., Fig. 1, Sec. B (in appendix), output flow image data is used for image stabilization and/or tracking image processing; also see Sec. 4.3, Tracking), wherein the artificial neural network is trained using outputs of an alignment pipeline configured to perform alignment of images input to the alignment pipeline (e.g., Sec. 3.2, artificial neural network g is trained using outputs ỹ of alignment pipeline h configured to perform alignment of images by producing flow fields), wherein the outputs of the alignment pipeline comprise one or more aligned images and/or aligned feature maps of images (e.g., Sec. 3.2, last paragraph, output ỹ is a flow field, which is at least an aligned feature map of an image because the flow field includes displacements for aligning features of images at different times); wherein the alignment pipeline is configured to: determine flow vectors representing optical flow between the images input to the alignment pipeline (e.g., Sec. 3.2, flow field ỹ; e.g., Sec. 2, 1st paragraph, flow field is made up of flow vectors representing optical flow between images); and perform an image transformation using the determined flow vectors to align the images input to the alignment pipeline (e.g., Sec. B, frames are warped so that they are aligned with first input frame I₁ during image stabilization; Fig. 1, top row, shows results of tracking using the teacher network – i.e., the alignment pipeline; the following information is not relied upon in this rejection and is noted only to promote compact prosecution: the FlowNet2 teacher network used by Ihler includes using determined flow vectors to warp input images so that they are aligned – see, e.g., Fig. 2 of the previously-cited ‘Ilg’ reference), wherein the artificial neural network is trained to generate the aligned image or the feature map (e.g., Sec. 3.2, Sec. 4.3, artificial neural network g is trained to output a flow map that matches flow map ỹ output by the alignment pipeline; as explained above, this flow map is the feature map; as also explained above, the flow maps output by g are further used to generate aligned images through motion compensation, so the training of g is within the scope of being “to generate the aligned image”).
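As a rough illustration of the student-teacher training mapped to claim 1 above (student network g trained on teacher outputs ỹ), the following PyTorch-style sketch performs one distillation step in which a student flow network mimics a frozen teacher. The model interfaces, the choice of an L1 penalty, and the variable names are assumptions for illustration and are not taken from Ihler.

import torch
import torch.nn.functional as F

def distillation_step(student, teacher, image_pair, optimizer):
    # image_pair: tensor of shape (B, 6, H, W), two RGB frames stacked on channels.
    with torch.no_grad():
        target_flow = teacher(image_pair)     # pseudo ground truth ỹ = h(x)
    pred_flow = student(image_pair)           # student prediction g(x)
    loss = F.l1_loss(pred_flow, target_flow)  # penalize the difference between g(x) and ỹ
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()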
Regarding claim 2, Ihler discloses the method according to claim 1, wherein the image transformation comprises a warping operation for warping at least one of the images input to the alignment pipeline based on the determined flow vectors (e.g., Sec. B, frames are warped based on flow vectors W so that they are aligned with first input frame I₁).

Regarding claim 3, Ihler discloses the method according to claim 1, wherein the first image and the second image are successive images in a temporal sequence of images (e.g., Sec. B, flow fields W are estimated between successive images in the sequence, such that concatenating the flows aligns a given frame to the first frame).

Regarding claim 4, Ihler discloses the method according to claim 1, further comprising using the output image data of the first image to aggregate temporal information of the first image and/or the second image, to temporally correlate the first image and the second image to enhance the first image and/or the second image (e.g., Sec. B, temporal information of the frames is aggregated by concatenating flows over time/frames, thereby temporally correlating between the frames to enhance the first/second images through image stabilization – i.e., stabilized images fall within the scope of being “enhanced”).

Regarding claim 5, Ihler discloses the method according to claim 1, wherein the output image data of the first image comprises an approximation of a result of performing the image transformation on the image data of the first image using flow vectors representing optical flow between the first image and the second image (e.g., Fig. 1, top row shows a tracking result obtained by performing the warping image transformation on the image data of the first image using flow vectors representing optical flow between the first image and the second image produced by the teacher network, while the bottom row shows results of an approximation using the student network; as can be seen in the Figure, and as explained throughout the text, the output image data produced using the student network is an approximation of the result obtained using the teacher network).

Regarding claim 8, Ihler discloses the method according to claim 1, wherein the received image data of the first image comprises a feature map of image features derivable from the first image (e.g., Table 1, received image data are arrays of pixels, such as a 640x448 array of pixels; each of the pixels can be seen as an image feature, and the array forms a “map” of pixel features at least because it indicates their relative 2D positions), and wherein the output image data comprises an approximation of a result of aligning the map of image features derivable from the first image with a map of image features derivable from the second image (e.g., Sec. 3.2, output includes a flow field produced by student model g, which is an approximation of the flow result from using teacher model h; e.g., Sec. 2, 1st par., the flow map provides displacements needed to align individual pixels/features of first and second images).

Regarding claim 9, Ihler discloses the method according to claim 1, wherein the artificial neural network is trained using a loss function configured to determine a difference between an output of the artificial neural network and an output of the alignment pipeline (e.g., Sec. 4.3, Fine-tuning student model, L_g(x, ỹ) = ‖g(x) - ỹ‖; also see Sec. 3.2).

Regarding claim 10, Ihler discloses the method according to claim 1, wherein the alignment pipeline comprises a further artificial neural network trained to determine the flow vectors (e.g., Sec. 4.2, FlowNet2).

Regarding claim 11, Ihler discloses the method according to claim 1, wherein the artificial neural network comprises a student artificial neural network (e.g., Secs. 3.2 and 4.2, student network g), and wherein the alignment pipeline comprises a teacher artificial neural network (e.g., Secs. 3.2 and 4.2, teacher network h).

Regarding claim 14, Ihler discloses the method according to claim 1, further comprising concatenating the first image with the second image using the output image data of the first image generated using the artificial neural network (e.g., Sec. B, flow concatenation for image stabilization).

Regarding claim 18, Examiner notes that the claim recites a method that is substantially the same as the method of claim 1. Ihler discloses the method of claim 1 (see above). Accordingly, claim 18 is also rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ihler for substantially the same reasons as claim 1.
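To make the warping and flow-concatenation operations cited for claims 2-5 and 14 (Ihler, Sec. B) concrete, here is a minimal NumPy sketch that warps a frame with a dense flow field and composes two frame-to-frame flows. The nearest-neighbor sampling, flow sign convention, and function names are illustrative assumptions rather than details from the reference.

import numpy as np

def warp_with_flow(image: np.ndarray, flow: np.ndarray) -> np.ndarray:
    # Warp an (H, W, C) image with an (H, W, 2) flow of per-pixel (dx, dy)
    # displacements, using nearest-neighbor sampling for brevity.
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]

def concatenate_flows(flow_a: np.ndarray, flow_b: np.ndarray) -> np.ndarray:
    # Compose flow_a (frame 1 -> 2) with flow_b (frame 2 -> 3) so that later
    # frames can be related back to the first frame, in the spirit of flow
    # concatenation for image stabilization.
    return flow_a + warp_with_flow(flow_b, flow_a)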
Claims 6-7 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ihler in view of ‘Ilg’ (“FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks,” 2017) and ‘Dosovitskiy’ (“FlowNet: Learning Optical Flow with Convolutional Networks,” 2015).

Regarding claim 6, Ihler discloses the method according to claim 1. Ihler discloses that, “[f]or all our experiments we used the accurate, high-complexity FlowNet2 framework [Ilg et al. (2017)] as our teacher model h and its fast FlowNet2S component as our low-complexity student model g” (Sec. 4.2). Ihler does not describe details of the “FlowNet2S” model within its own text. Therefore, Ihler itself does not explicitly disclose that the artificial neural network comprises a series of convolutional filters, and that processing the received image data using the artificial neural network comprises applying the convolutional filters to the received image data, as best understood in view of the issues of indefiniteness noted above. However, Ilg does disclose that the FlowNet 2.0 model (e.g., Fig. 2) includes a FlowNetS artificial neural network that is described by Dosovitskiy. For example, “We tested the two network architectures introduced by Dosovitskiy et al. [10]: FlowNetS, which is a straightforward encoder-decoder architecture …” (Sec. 3, 4th par.). Also see page 1651, top-right, and Table 3, FlowNet2-S, which is an alternative notation for the FlowNetS in the original FlowNet described by Dosovitskiy. In view of Ilg’s disclosure, one of ordinary skill in the art would have recognized that “FlowNet2S” means the FlowNetS disclosed by Dosovitskiy. Dosovitskiy discloses details of the FlowNetS artificial neural network (i.e., the “FlowNet2S” referred to in Ihler), including that it comprises a series of convolutional filters (e.g., Fig. 2, FlowNetSimple, every layer includes convolutional filters, which may be seen more clearly in a color version of the reference; e.g., Sec. 3, Contracting part, 1st par.), and that processing the received image data using the artificial neural network comprises applying the convolutional filters to the received image data (e.g., Fig. 2, FlowNetSimple, received image data at left is processed towards the right by applying the various convolutional filters). Therefore, when read in the context of the disclosures of Ilg and Dosovitskiy, one of ordinary skill in the art would have understood the meaning of Ihler’s disclosure of “FlowNet2S” to include a disclosure of the features recited in claim 6, as best understood in view of the issues of indefiniteness noted above.

Examiner notes that, as explained in MPEP 2131.01, “[n]ormally only one reference should be used in making a rejection under 35 U.S.C. 102. However, a 35 U.S.C. 102 rejection over multiple references has been held to be proper when the extra references are cited to: (A) Prove the primary reference contains an ‘enabled disclosure;’ (B) Explain the meaning of a term used in the primary reference; or (C) Show that a characteristic not disclosed in the reference is inherent.” In this instance, the Ilg and Dosovitskiy references are being applied to explain the meaning of the term “FlowNet2S” used in the Ihler primary reference, so a multiple-reference rejection under 35 U.S.C. 102 is appropriate.

Regarding claim 7, Ihler in view of Ilg and Dosovitskiy discloses the method according to claim 6. Ihler further discloses that the image transformation is dependent on content of the images input to the alignment pipeline (e.g., Sec. 2, 1st par., optical flow is dependent on content of the images, specifically their movement; e.g., Sec. B, the warping transformation is dependent on the optical flow; therefore, the image transformation is dependent on content of the images input to the alignment pipeline). As explained above with respect to claim 6, Ihler discloses a “FlowNet2S” artificial neural network, and Ilg and Dosovitskiy explain the meaning of this term, including that in FlowNet2S the convolutional filters of the artificial neural network are independent of content of the first image (e.g., Dosovitskiy: Fig. 2, FlowNetSimple, and Sec. 5.1, 1st sentence, the convolutional filters are set by training and are not changed based on content of the input image – i.e., the convolutional filters are independent of content of the input image). Therefore, when read in the context of the disclosures of Ilg and Dosovitskiy, one of ordinary skill in the art would have understood the meaning of Ihler’s disclosure of “FlowNet2S” to include a disclosure of the features recited in claim 7.

Examiner notes that, as explained in MPEP 2131.01, “[n]ormally only one reference should be used in making a rejection under 35 U.S.C. 102. However, a 35 U.S.C. 102 rejection over multiple references has been held to be proper when the extra references are cited to: (A) Prove the primary reference contains an ‘enabled disclosure;’ (B) Explain the meaning of a term used in the primary reference; or (C) Show that a characteristic not disclosed in the reference is inherent.” In this instance, the Ilg and Dosovitskiy references are being applied to explain the meaning of the term “FlowNet2S” used in the Ihler primary reference, so a multiple-reference rejection under 35 U.S.C. 102 is appropriate.
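The rejection of claims 6-7 turns on stacks of convolutional filters in a FlowNetS-style encoder. The toy PyTorch sketch below applies a few stride-2 convolutions directly to a stacked image pair; the layer count and channel widths are arbitrary illustrations and are not the actual FlowNetS architecture described by Dosovitskiy.

import torch
import torch.nn as nn

class TinyFlowEncoder(nn.Module):
    # Toy encoder: convolutional filters applied to the received image data,
    # with stride-2 convolutions that also downscale the spatial resolution.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, image_pair: torch.Tensor) -> torch.Tensor:
        return self.features(image_pair)

# A 384x512 image pair is reduced to 48x64 feature maps by the three stride-2 convolutions.
x = torch.randn(1, 6, 384, 512)
print(TinyFlowEncoder()(x).shape)  # torch.Size([1, 128, 48, 64])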
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Ihler in view of ‘He’ (“FAKD: Feature-Affinity Based Knowledge Distillation for Efficient Image Super-Resolution,” 2020).

Regarding claim 12, Ihler teaches the method according to claim 1. Ihler uses a knowledge distillation technique, where a teacher model is used to train a student artificial neural network (e.g., Sec. 3, 2nd par.; Sec. 3.1; Sec. 3.2). Ihler performs its knowledge distillation using a loss function that penalizes differences between the outputs of the student and teacher (e.g., Sec. 4.3, Fine-tuning student model). Ihler does not explicitly teach that the artificial neural network is trained using an affinity distillation loss function configured to determine a difference between a teacher affinity matrix and a student affinity matrix, wherein the teacher affinity matrix is indicative of dependencies between image features in a feature map generated by the alignment pipeline, and wherein the student affinity matrix is indicative of dependencies between image features in a feature map generated by the artificial neural network.

Like Ihler, He also teaches a knowledge distillation technique, where a teacher model is used to train a student artificial neural network (illustrated at Fig. 1, described throughout). He also performs its knowledge distillation using a loss function that penalizes differences between the outputs of the student and teacher (e.g., Sec. 2.2, teacher supervision (TS)). However, He further teaches that the student artificial neural network is trained using an affinity distillation loss function (Sec. 2.1, equation 1) configured to determine a difference between a teacher affinity matrix (Sec. 2.1, equation 1, A_l^T) and a student affinity matrix (Sec. 2.1, equation 1, A_l^S), wherein the teacher affinity matrix is indicative of dependencies between image features in a feature map generated by the teacher (i.e., the alignment pipeline) (Sec. 2.1, equations 2-3; Fig. 3), and wherein the student affinity matrix is indicative of dependencies between image features in a feature map generated by the student artificial neural network (Sec. 2.1, equations 2-3; Fig. 3). He teaches that “The key to knowledge distillation is to design an appropriate mimicry loss function that can successfully propagate valuable information to guide the training process of the student model.” (Sec. 2.1, 1st par.). He teaches examples of performing knowledge distillation (a) using “teacher supervision” based on outputs of the student and teacher models (e.g., Table 2, second row, DS + TS), which is substantially the same as the loss function used by Ihler (see above), and (b) using teacher supervision combined with its affinity-matrix-based loss function (e.g., Table 2, last row, DS + TS + SA), and finds that combination (b) provides superior performance (Table 2, last row has the highest PSNR values; Sec. 3.2, 1st par.). He also shows that its affinity-matrix-based knowledge distillation outperforms other knowledge distillation techniques in the prior art (Sec. 3.2, Comparison with other feature KD methods; Table 4).

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the method of Ihler to add the affinity-matrix-based loss function of He in order to improve the method, with the reasonable expectation that this would result in a method that provided improved knowledge distillation performance. This technique for improving the method of Ihler was within the ordinary ability of one of ordinary skill in the art based on the teachings of He. Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Ihler and He to obtain the invention as specified in claim 12.
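As background for the affinity-matrix loss attributed to He above, the following PyTorch sketch computes feature-affinity matrices for teacher and student feature maps and penalizes the L1 distance between them. The normalization choice and the function names are illustrative assumptions and are not copied from He's equations.

import torch
import torch.nn.functional as F

def affinity_matrix(feat: torch.Tensor) -> torch.Tensor:
    # Pairwise similarity between spatial positions of a (B, C, H, W) feature map.
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)                 # (B, C, HW)
    flat = F.normalize(flat, dim=1)               # unit-norm feature vectors per position
    return torch.bmm(flat.transpose(1, 2), flat)  # (B, HW, HW) affinities

def affinity_distillation_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    # Penalize differences between the student and teacher affinity matrices.
    return F.l1_loss(affinity_matrix(student_feat), affinity_matrix(teacher_feat))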
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Ihler in view of ‘Ihler-B’ (“Self-Supervised Domain Adaptation for Patient-Specific, Real-Time Tissue Tracking,” 2020).

Regarding claim 13, Ihler teaches the method of claim 1. Ihler receives a pair of images (i.e., image data of a first image and a second image) (e.g., Sec. 3.2, image pair x) and preprocesses them (Sec. 4.2, last sentence, “We cropped all datasets to multiples of 64 to fit the models’ architecture”; this cropping of the images received from the datasets is preprocessing). Ihler only computes one flow field alignment for the image pair (e.g., Sec. 3.2), which can be considered a forward flow alignment from the first image to the second image. Ihler does not explicitly teach additionally computing a backward flow alignment from the second image to the first image. I.e., Ihler does not explicitly teach processing the preprocessed image data of the second image using the artificial neural network to generate output image data of the second image, the output image data comprising an aligned image or a feature map of the aligned image, wherein the aligned image corresponds to a synthetic image in which one or more features or pixels in the second image are aligned with one or more corresponding features or pixels in the first image; and using the output image data of the second image for image processing.

However, Ihler-B describes extensions to the technique described in Ihler (e.g., Sec. 1, 3rd and 4th pars.) that include additionally computing a backward flow alignment from the second image to the first image (e.g., Sec. 3.1, last par., reverse/backward mapping y⁻¹: x₂ → x₁). I.e., Ihler-B teaches processing the received image data of the second image using the artificial neural network to generate output image data of the second image (e.g., Sec. 3.1, last par., reverse/backward mapping y⁻¹: x₂ → x₁ is computed for cycle consistency, the backward mapping being indicative of alignment of the second image x₂ with the first image x₁), the output image data comprising an aligned image or a feature map of the aligned image, wherein the aligned image corresponds to a synthetic image in which one or more features or pixels in the second image are aligned with one or more corresponding features or pixels in the first image (see the explanation given for output image data in the rejection of claim 1); and using the output image data of the second image for image processing (Sec. 4.2, 1st par., cycle consistency error (CCE) is calculated using the reverse/backward mapping). Ihler-B teaches that optical flow models should exhibit cycle consistency (Sec. 3.1, last par.), shows how to use the backward flow in further image processing to measure cycle consistency error (CCE) (Sec. 4.2, last par.), and demonstrates how image processing to measure CCE can be used to evaluate model performance (e.g., Table 1).

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the method of Ihler with the backward flow alignment and further CCE image processing of Ihler-B in order to improve the method, with the reasonable expectation that this would result in a method that could advantageously make additional measurements of the performance of its model. This technique for improving the method of Ihler was within the ordinary ability of one of ordinary skill in the art based on the teachings of Ihler-B. Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Ihler and Ihler-B to obtain the invention as specified in claim 13.
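For the cycle-consistency idea attributed to Ihler-B in the claim 13 rejection, here is a rough NumPy sketch of a per-pixel cycle-consistency error between a forward flow (first to second image) and a backward flow (second to first image). The composition and averaging choices are illustrative assumptions, not Ihler-B's implementation.

import numpy as np

def cycle_consistency_error(forward_flow: np.ndarray, backward_flow: np.ndarray) -> float:
    # Average length of the residual displacement after following the forward
    # flow and then the backward flow; ideally the round trip returns to the start.
    h, w = forward_flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Where each pixel lands after applying the forward flow.
    fx = np.clip(np.round(xs + forward_flow[..., 0]).astype(int), 0, w - 1)
    fy = np.clip(np.round(ys + forward_flow[..., 1]).astype(int), 0, h - 1)
    # Backward flow sampled at the landing positions, composed with the forward flow.
    residual = forward_flow + backward_flow[fy, fx]
    return float(np.mean(np.linalg.norm(residual, axis=-1)))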
Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ihler.

Regarding claim 19, Examiner notes that the claim recites a computing device comprising: a memory comprising computer-executable instructions; and a processor configured to execute the computer-executable instructions and cause the computing device to perform a method that is substantially the same as the method of claim 1. Ihler teaches the method of claim 1 (see above). Ihler generally teaches the use of computing devices (e.g., Sec. 4.3, Training time, GPU), but does not specifically teach implementing its method as a computing device comprising: a memory comprising computer-executable instructions; and a processor configured to execute the computer-executable instructions and cause the computing device to perform the method. However, it has been taken as admitted prior art that it is old and well-known in the art of image analysis to implement an image processing method as a computing device comprising a memory comprising computer-executable instructions and a processor configured to execute the computer-executable instructions and cause the computing device to perform the method. Such computer implementation advantageously allows the method to be performed quickly and efficiently. Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to implement the method of Ihler as such a computing device in order to improve the method, with the reasonable expectation that this would result in a method that could be performed quickly and efficiently. Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Ihler to obtain the invention as specified in claim 19.

Regarding claim 20, Examiner notes that the claim recites a non-transitory computer-readable medium comprising computer-executable instructions that, when executed by a processor of a computing device, cause the computing device to perform a method that is substantially the same as the method of claim 1. Ihler teaches the method of claim 1 (see above). Ihler generally teaches the use of computing devices (e.g., Sec. 4.3, Training time, GPU), but does not specifically teach implementing its method as a non-transitory computer-readable medium comprising computer-executable instructions that, when executed by a processor of a computing device, cause the computing device to perform the method. However, it has been taken as admitted prior art that it is old and well-known in the art of image analysis to implement an image processing method as such a non-transitory computer-readable medium. Such computer implementation advantageously allows the method to be performed quickly and efficiently. Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to implement the method of Ihler as a non-transitory computer-readable medium comprising computer-executable instructions that, when executed by a processor of a computing device, cause the computing device to perform the method in order to improve the method, with the reasonable expectation that this would result in a method that could be performed quickly and efficiently. Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Ihler to obtain the invention as specified in claim 20.

Claims 21 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Ihler in view of ‘Kroeger’ (“Fast Optical Flow Using Dense Inverse Search,” 2016).

Regarding claim 21, Ihler teaches the method of claim 1. Ihler teaches that preprocessing the image data of the first image comprises cropping (Sec. 4.2, last sentence, “We cropped all datasets to multiples of 64 to fit the models’ architecture”; this cropping of the images received from the datasets is preprocessing). Ihler does not explicitly teach downscaling preprocessing. However, Kroeger does teach image preprocessing that includes downscaling (Sec. 3.3, especially below Table 3, input images are downscaled by a factor of 2^n to control the trade-off between run-time and flow error). Ihler uses neural networks to estimate flow for an input image (e.g., Sec. 4.2, first sentence). Kroeger teaches that preprocessing images to downscale them before optical flow estimation is a non-intrusive way to adjust the trade-off between run-time and error (e.g., page 481, below Table 3). This is because “run-times for optical flow methods are strongly linked to image resolution” (e.g., page 481, below Table 3). I.e., downscaled images will provide advantageously faster run-time at a cost of higher error, and vice versa. Incorporating downscaling preprocessing into the flow estimation of Ihler would advantageously allow the trade-off between speed and accuracy of flow estimations to be more finely analyzed and controlled to an optimal setting for a given use case. Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the method of Ihler with the downscaling preprocessing of Kroeger in order to improve the method, with the reasonable expectation that this would result in a method that advantageously allowed the trade-off between speed and accuracy of flow estimations to be more finely analyzed and controlled to an optimal setting for a given use case. This technique for improving the method of Ihler was within the ordinary ability of one of ordinary skill in the art based on the teachings of Kroeger. Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Ihler and Kroeger to obtain the invention as specified in claim 21.

Regarding claim 23, Ihler in view of Kroeger teaches the method of claim 21, and Kroeger further teaches that the image data is downscaled by a factor of two (e.g., Sec. 3.3 and Figs. 4-5, downscaling by a factor of 2^n starting at n = 0 and increased in increments of 0.5, which includes downscaling by a factor of 2^(n=1) = 2).
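To illustrate the Kroeger-style downscaling preprocessing discussed for claims 21 and 23, the small sketch below downscales an image tensor by a factor of 2^n before flow estimation, trading accuracy for speed. The use of torch.nn.functional.interpolate with bilinear sampling and the function name are assumptions for illustration, not Kroeger's implementation.

import torch
import torch.nn.functional as F

def downscale_by_power_of_two(image: torch.Tensor, n: float = 1.0) -> torch.Tensor:
    # Downscale a (B, C, H, W) image by a factor of 2**n (n = 1 gives factor 2).
    # Smaller inputs make flow estimation faster at the cost of flow accuracy.
    scale = 1.0 / (2.0 ** n)
    return F.interpolate(image, scale_factor=scale, mode="bilinear", align_corners=False)

frames = torch.randn(1, 3, 448, 640)
print(downscale_by_power_of_two(frames, n=1).shape)  # torch.Size([1, 3, 224, 320])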
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Ihler in view of Kroeger as applied above, and further in view of Ilg and Dosovitskiy.

Regarding claim 22, Ihler in view of Kroeger teaches the method of claim 21. Ihler discloses that, “[f]or all our experiments we used the accurate, high-complexity FlowNet2 framework [Ilg et al. (2017)] as our teacher model h and its fast FlowNet2S component as our low-complexity student model g” (Sec. 4.2). Ihler does not describe details of the “FlowNet2S” model within its own text. Therefore, Ihler itself does not explicitly disclose that the artificial neural network comprises a series of convolutional filters, and that processing the received image data using the artificial neural network comprises applying the convolutional filters to the received image data, as best understood in view of the issues of indefiniteness noted above. However, Ilg does disclose that the FlowNet 2.0 model (e.g., Fig. 2) includes a FlowNetS artificial neural network that is described by Dosovitskiy. For example, “We tested the two network architectures introduced by Dosovitskiy et al. [10]: FlowNetS, which is a straightforward encoder-decoder architecture …” (Sec. 3, 4th par.). Also see page 1651, top-right, and Table 3, FlowNet2-S, which is an alternative notation for the FlowNetS in the original FlowNet described by Dosovitskiy. In view of Ilg’s disclosure, one of ordinary skill in the art would have recognized that “FlowNet2S” means the FlowNetS disclosed by Dosovitskiy. Dosovitskiy discloses details of the FlowNetS artificial neural network (i.e., the “FlowNet2S” referred to in Ihler), including that it comprises one or more strided-convolution layers to downscale the image data of the first image (e.g., Sec. 5.1, Fig. 2, each of the network architectures has “nine convolutional layers with stride of 2 (the simplest form of pooling) in six of them”; Fig. 2 illustrates the downscaling caused by the stride of 2, such as a reduction of resolution from 384x512 to 192x256 from the input to the first feature map [best seen when enlarged]). Therefore, when read in the context of the disclosures of Ilg and Dosovitskiy, one of ordinary skill in the art would have understood the meaning of Ihler’s disclosure of “FlowNet2S” to include a disclosure of the features recited in claim 22, as best understood in view of the issues of indefiniteness noted above.

Conclusion

The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure: ‘Delorme’ (“Image preprocessing,” 2021) discusses image preprocessing, including both cropping and downscaling, and various reasons why such preprocessing may be beneficial.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEOFFREY E SUMMERS, whose telephone number is (571) 272-9915. The examiner can normally be reached Monday-Friday, 7:00 AM to 3:30 PM ET. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/GEOFFREY E SUMMERS/
Examiner, Art Unit 2669

Prosecution Timeline

Jan 31, 2023
Application Filed
May 14, 2025
Non-Final Rejection — §102, §103, §112
Aug 19, 2025
Response Filed
Sep 24, 2025
Final Rejection — §102, §103, §112
Dec 19, 2025
Request for Continued Examination
Jan 17, 2026
Response after Non-Final Action
Jan 26, 2026
Non-Final Rejection — §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586379
SYSTEM FOR DETECTING OCCURRENCE PERIOD OF CYCLICAL EVENT
2y 5m to grant Granted Mar 24, 2026
Patent 12561755
System and Method for Image Super-Resolution
2y 5m to grant Granted Feb 24, 2026
Patent 12555205
METHOD AND APPARATUS WITH IMAGE DEBLURRING
2y 5m to grant Granted Feb 17, 2026
Patent 12541838
INSPECTION APPARATUS AND REFERENCE IMAGE GENERATION METHOD
2y 5m to grant Granted Feb 03, 2026
Patent 12536682
METHOD AND SYSTEM FOR GENERATING A DEPTH MAP
2y 5m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

3-4
Expected OA Rounds
72%
Grant Probability
99%
With Interview (+35.4%)
2y 5m
Median Time to Grant
High
PTA Risk
Based on 348 resolved cases by this examiner. Grant probability derived from career allow rate.
