DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
Applicant’s election without traverse of Group I, Claims 1-9 in the reply filed on 01/30/2026 is acknowledged.
Claims 10-17 are withdrawn from further consideration pursuant to 37 CFR 1.142(b) as being drawn to a nonelected Group II, there being no allowable generic or linking claim.
Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc. In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The abstract of the disclosure is objected to because it is 162 words. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).
The disclosure is objected to because of the following informalities:
In paragraph 0003, line 2, “ultrasound.” should read “ultrasound,”.
In paragraph 0007, line 4, “fames” should read “frames.”
In paragraph 0021, line 10, “framerate” should read “frame rate.”
In paragraph 0025, line 14, “imagen” should read “image.”
In paragraph 0026, line 17, “with respect a set of adjacent image frames” should read “with respect to a set of adjacent image frames.”
In paragraph 0028, line 1 "interpolate image frames" should read “interpolated image frames.”
In paragraph 0028, lines 4-5, “the first plurality of images frames” should read “the first plurality of image frames.”
In paragraph 0028, line 12, reference number 118 refers to “a display,” but reference number 118 was previously referred to as “the post-processing module” in line 1.
In paragraph 0029, line 9, "a third of image frames" should read “a third plurality of image frames.”
In paragraph 0029, line 13, “the frame rate second dynamic image" should read “the frame rate of the second dynamic image.”
In paragraph 0038, line 13, “framerates” should read “frame rates.”
In paragraph 0038, line 13, "framerate" should read “frame rate.”
In paragraph 0038, line 16 "creating training sample" should read “creating a training sample.”
In paragraph 0040, line 2, "the process return to" should read “the process returns to.”
In paragraph 0040, line 2, “the process return to block 604" should read, “the process returns to block 404,” in order to be consistent with Figure 4A.
In paragraph 0041, line 12, "fames" should read “frames.”
In paragraph 0042, line 4, "can also referenced to as” should read “can also be referenced to as.”
In paragraph 0042, line 5, "cam" should read “can.”
In paragraph 0042, line 16, "framerate" should read “frame rate.”
In paragraph 0044, line 10, “interpolation 614 layer” should read “interpolation layer 614.”
In paragraph 0044, line 19, “FIG., 6A” should read “FIG. 6A.”
In paragraph 0044, line 23, "downsampling 632" should read "downsampling layer 632."
In paragraph 0046, line 1, “interpolation 614 layer” should read “interpolation layer 614.”
In paragraph 0046, line 8, “interpolation 614 layer” should read “interpolation layer 614.”
In paragraph 0046, line 9, “interpolation 614 layer” should read “interpolation layer 614.”
In paragraph 0046, line 17, a period is missing at the end of the sentence after “using a blending mask.”
In paragraph 0046, line 26, reference number “610” is used to refer to the “output.” The “output” has previously been denoted as reference number “606.” Reference number “610” is referred to as “three upsampling convolutional decoding layers” in line 9 of paragraph 0044.
Appropriate correction is required.
Drawings
The drawings are objected to because reference number 102 in Figure 1 reads “Dynmaic,” and should read “Dynamic.”
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claims 5 and 6 are objected to because of the following informalities:
In claim 5, line 5, “fames” should read “frames.”
In claim 6, line 2, “fames” should read “frames.”
Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3, 5, and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al. (NPL “A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image,” 2020, hereafter referred to as Guo) in view of Chen et al. (International Patent Pub. No. WO 2022/104194 A1, hereafter referred to as Chen).
Regarding Claim 1, Guo teaches a system for increasing a frame rate of a dynamic image (1 Introduction, Guo teaches a spatiotemporal volumetric interpolation network (SVIN) for dynamic medical image interpolation. SVIN was used to increase the temporal resolution in magnetic resonance (MR) images. The Examiner interprets that increasing the temporal resolution of dynamic images is synonymous with increasing the frame rate.), wherein the first dynamic image has a first plurality of image frames and a first frame rate (3 Proposed Method, Guo teaches a sequence of volumetric images, with {𝑰𝑻, 𝑻=𝟏,𝟐…,𝑵} representing the cardiac motion from end-diastole (ED) (T=1) to end-systole (T=N) phase, with {𝑰𝒊,𝑰𝒋 | (𝒊,𝒋) ∈ 𝑻} being a pair of cardiac images indicating two random time points within the cardiac motion.); a deformation encoding neural network coupled to the input and configured to derive at least one parameter characterizing dynamics of the set of consecutive image frames and to generate an interpolated image frame based on the at least one parameter (Abstract, Guo teaches a spatiotemporal volumetric interpolation network (SVIN). SVIN introduces dual networks: the first is the spatiotemporal motion network, which derives a spatiotemporal motion field from two image volumes; the second is the sequential volumetric interpolation network, which uses the derived motion field to interpolate images. The Examiner interprets “spatiotemporal motion field” to be a parameter characterizing dynamics of the image frames.);
and a post-processing module coupled to the deformation encoding neural network and configured to receive one or more interpolated image frames from the deformation encoding neural network (3 Proposed Method, Fig. 2, Guo teaches a regression-based module coupled to the volumetric interpolation network which takes the coarsely interpolated intermediate images as input from the volumetric interpolation network.), to create a second plurality of image frames comprising the one or more interpolated image frames and the first plurality of image frames (3 Proposed Method, Fig. 2, Guo teaches further refining the coarsely interpolated intermediate images to constrain the interpolation to follow the patterns of cardiac biological motion. The interpolated image frames, I_{t=1/4} and I_{t=1/2}, are image frames generated between the images at the ED and ES phases at time points t=1/4 and t=1/2, respectively. The Examiner interprets “one or more interpolated image frames” to mean only one is required. The Examiner also interprets “the first plurality of image frames” to require two or more image frames to meet the limitation.), and to generate a second dynamic image using the second plurality of image frames (Fig. 2, Guo teaches the interpolated images, I_{t=1/4} and I_{t=1/2}, generated by the SVIN, which represent intermediate interpolated frames at time points t=1/4 and t=1/2 between the input frames, I_0 and I_1. The Examiner interprets the second dynamic image to be made up of the two original input image frames (I_0, I_1) with the intermediate interpolated image frames I_{t=1/4} and I_{t=1/2} “in-between,” since the claim is silent as to how the second dynamic image is generated using the second plurality of image frames.).
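The Examiner's reading, in which the second dynamic image consists of the two input frames with the interpolated frames “in-between,” can be put concretely as follows (an illustrative sketch only; the function name and frame labels are hypothetical and are not taken from Guo or from the record):

```python
# Hypothetical sketch: assemble a "second plurality of image frames" by
# placing the interpolated frames between the two original input frames.
def build_second_sequence(originals, interpolated):
    """Return the original frames with the interpolated frames in-between."""
    first, last = originals  # e.g., the ED and ES input frames
    return [first, *interpolated, last]

# Frame labels stand in for the actual image volumes.
second = build_second_sequence(["I_0", "I_1"], ["I_t=1/4", "I_t=1/2"])
# second -> ["I_0", "I_t=1/4", "I_t=1/2", "I_1"]
```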
Guo does not explicitly teach the system comprising: an input for receiving a set of consecutive image frames of a first dynamic image and the second dynamic image having a second frame rate higher than the first frame rate.
Chen is in the same field of art of modifying medical image data by altering the amount of time between any two frames of captured medical image data, with the modified image data having an interpolated image frame created from other frames in the captured medical image data. Further, Chen teaches the system comprising: an input for receiving a set of consecutive image frames of a first dynamic image (Paragraphs [0141], [0010], Chen teaches receiving a sequence of captured image frames for the modification of medical images through a stochastic temporal data augmentation. The captured image data includes the motion between the imaging device and tissue.), and the second dynamic image having a second frame rate higher than the first frame rate (Paragraph [0173], Chen teaches altering an amount of time between any two frames of captured image data by adding one or more frames between any two images of the captured image data. The added frames may be interpolated frames to appear between other frames. In light of the instant application's specification, which states, “to create a second plurality of image frames comprising the one or more interpolated image frames and the first plurality of image frames, and to generate a second dynamic image using the second plurality of image frames,” the second dynamic image is interpreted as being the first plurality of image frames (a minimum of two) with the interpolated image frame(s) “in-between.” Therefore, the Examiner interprets that altering an amount of time between two image frames by adding additional frames between the images effectively increases the number of frames within the same amount of elapsed time between the input images, thereby increasing the frame rate of the dynamic image, since more frames in the same amount of time increases the frame rate.).
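The frame-rate arithmetic underlying this interpretation, namely that inserting frames between existing frames while the elapsed time is unchanged raises the frame rate, can be sketched as follows (an illustrative sketch only; the function name and the example rates are hypothetical and do not appear in Chen or the record):

```python
# Hypothetical sketch: if k interpolated frames are inserted into each
# interval between consecutive original frames, the same elapsed time now
# contains (1 + k) times as many frames, so the effective frame rate
# scales by (1 + k).
def second_frame_rate(first_rate_fps, inserted_per_gap):
    return first_rate_fps * (1 + inserted_per_gap)

# e.g., one interpolated frame per gap doubles the rate:
# second_frame_rate(15, 1) -> 30
```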
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo by providing as input, a series of consecutive image frames to the neural network, that is taught by Chen to make the invention that interpolates image frames from a series of consecutive image frames and generates a second dynamic image made up of the first dynamic image and the interpolated image frames, having a frame rate higher than the first dynamic image; thus, one of ordinary skill in the art would have been motivated to combine the references because cardiac motion images are characterized by twisting action during contraction to relaxation of the heart structure, and have complex changes in muscle morphology and therefore interpolation of the cardiac motion images increases the temporal resolution to capture the cardiac cycle more completely (Guo, Section 1, Introduction).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Regarding Claim 3, Guo in view of Chen teaches the system according to claim 1, wherein the first dynamic image and the second dynamic image are magnetic resonance dynamic images (4.1 Materials and implementation details, Fig. 5, Guo teaches interpolation of images from the ACDC dataset, which contains 4D magnetic resonance (MR) cardiac cine images. The first dynamic image is made up of the input images, I_0 and I_1, which represent images taken at two random time points within the cardiac motion. The second dynamic image includes the input images, I_0 and I_1, with the intermediate interpolated images, I_{t=1/4} and I_{t=1/2}, being “in-between” them.).
Regarding Claim 5, Guo teaches a method for increasing a frame rate of a dynamic image (1 Introduction, Guo teaches a deep-learning method, using a spatiotemporal volumetric interpolation network (SVIN), for dynamic medical image interpolation. The Examiner interprets that increasing the temporal resolution of the dynamic image is synonymous with increasing the frame rate.), the method comprising: receiving a first dynamic image having a first plurality of image frames and a first frame rate (3 Proposed Method, Guo teaches receiving a pair of cardiac images indicating two random time points within the cardiac motion cycle. Specifically, images were used from end-diastole (ED) and end-systole (ES). The Examiner interprets that the pair of cardiac cine MR images have an initial frame rate since they were taken at two random time points within the cardiac motion. The Examiner interprets that a plurality of image frames means two or more to meet the limitation.); generating an interpolated image frame using the deformation encoding neural network by deriving at least one parameter characterizing dynamics of the set of consecutive image frames and generating the interpolated image frame based on the at least one parameter (Abstract, Fig. 2, Guo teaches a spatiotemporal volumetric interpolation network (SVIN). SVIN introduces dual networks: the first is the spatiotemporal motion network, which derives a spatiotemporal motion field from two image volumes; the second is the sequential volumetric interpolation network, which uses the derived motion field to interpolate images. The sequential volumetric interpolation network generates interpolated image frames I_{t=1/4} and I_{t=1/2}, at time points t=1/4 and t=1/2, respectively, by deriving deformation fields between the two input image frames. The Examiner interprets “spatiotemporal motion field” to be a parameter characterizing dynamics of the consecutive image frames.); and generating a second dynamic image using the second plurality of image frames (3 Proposed Method, Fig. 2, Guo teaches generating interpolated images, I_{t=1/4} and I_{t=1/2}, which represent intermediate interpolated frames at time points t=1/4 and t=1/2 between the input image frames, I_0 and I_1. The Examiner interprets the second dynamic image to be made up of the two original input image frames with the intermediate image frames being “in-between.”).
Guo does not explicitly disclose selecting a plurality of sets of consecutive image frames from the first plurality of image fames of the first dynamic image; for each set of consecutive image frames: providing the set of consecutive image frames to a deformation encoding neural network; and storing the interpolated image frame in data storage; creating a second plurality of image frames comprising the first plurality of image frames and the interpolated image frame generated from each set of consecutive image frames, and the second dynamic image having a second frame rate higher than the first frame rate.
Chen is in the same field of art of modifying medical image data by altering an amount of time between any two frames of captured medical image data, with the modified image data having an interpolated image frame created from other frames in the captured medical image data. Further, Chen teaches selecting a plurality of sets of consecutive image frames from the first plurality of image fames of the first dynamic image (Paragraph [0009], Chen teaches splitting the captured image data into a plurality of sub-sequences of a predetermined length.); for each set of consecutive image frames: providing the set of consecutive image frames to a deformation encoding neural network (Paragraph [0014], Fig. 1, Chen teaches a series of training medical images comprising a plurality of frames in a sequence and generating a set of deformed control points for each frame of at least a subset of frames with a convolutional neural network. The Examiner interprets the convolutional neural network to be a deformation encoding neural network since it generates deformed control points.); and storing the interpolated image frame in data storage (Fig. 10, reference character 908, Paragraphs [0191], [0193], [0149], [0009], Chen teaches a storage component, 908, which stores information related to the operation and use of device 900. 
Device 900 may correspond to the modification engine 100, which may add a subsequence of frames to the captured image data, each of the one or more new frames comprising a copy of one of the new frames, a composite frame created from other frames, an interpolated image frame generated to appear between other frames in the captured image data, etc.); creating a second plurality of image frames comprising the first plurality of image frames and the interpolated image frame generated from each set of consecutive image frames (Paragraphs [0141], [0008], Chen teaches dividing a captured image sequence into multiple subsequences where each subsequence may have a different magnitude of spatial and/or temporal shifts applied to the frames within the sequence. A subsequence of one or more new frames may be added to the captured image data, each of the one or more new frames comprising an interpolated frame generated to appear between other frames in the captured image data.); and the second dynamic image having a second frame rate higher than the first frame rate (Paragraph [0173], Chen teaches altering an amount of time between any two frames of captured image data by adding one or more frames between any two images of the captured image data. The added subsequence of frames may be interpolated frames to appear between other frames. 
In light of the instant application's specification, which states, “to create a second plurality of image frames comprising the one or more interpolated image frames and the first plurality of image frames, and to generate a second dynamic image using the second plurality of image frames,” the second dynamic image is interpreted as the first plurality of image frames (two or more) with the interpolated image frame(s) “in-between.” Therefore, the Examiner interprets that altering an amount of time between two frames by adding additional frames between the input images effectively increases the number of frames within the same amount of elapsed time between the input images, thereby increasing the frame rate of the dynamic image, since more frames in the same amount of time increases the frame rate.).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo by selecting a plurality of sub-sequences of captured image data, providing one or more of the sub-sets of frames to a convolutional neural network to generate parameters characterizing dynamics of the input image frames, storing the interpolated image frames in a storage component, and generating interpolated image frames to appear between other frames in the captured image data to create a second plurality of image frames containing the original captured image data and the interpolated image frames, the second plurality of frames having a higher frame rate, that is taught by Chen to make the invention that interpolates image frames from a plurality of sub-sequences of consecutive image frames to generate a second dynamic image with a faster frame rate; thus, one of ordinary skill in the art would have been motivated to combine the references because medical imaging datasets are often limited in quantity and span a restricted distribution (Chen, Paragraph [0005]) and common temporal data augmentation methods such as window warping and other methods do not address rapid and/or unpredictable changes in medical imaging (Chen, Paragraph [0006]).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Regarding Claim 8, Guo in view of Chen teaches the method according to claim 5, wherein the first dynamic image and the second dynamic image are magnetic resonance dynamic images (4.1 Materials and implementation details, Fig. 5, Guo teaches interpolation of the ACDC dataset, which is made up of cine-MRI images. The first dynamic image includes the input images, I_0 and I_1, which represent two random time points within the cardiac motion. The second dynamic image includes the input images, I_0 and I_1, with the intermediate interpolated images, I_{t=1/4} and I_{t=1/2}, being “in-between” them.).
Claims 2 and 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al. (NPL “A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image,” 2020, hereafter referred to as Guo) in view of Chen et al. (International Patent Pub. No. WO 2022/104194 A1, hereafter referred to as Chen) in further view of Hsiao et al. (U.S. Patent Pub. No. 2020/0219262 A1, hereafter referred to as Hsiao).
Regarding Claim 2, Guo in view of Chen discloses the system according to claim 1.
Guo in view of Chen does not explicitly disclose wherein the set of consecutive image frames comprises four consecutive image frames.
Hsiao is in the same field of art of learning the dynamic temporal features of cardiac MRI images using a neural network. Further, Hsiao teaches wherein the set of consecutive image frames comprises four consecutive image frames (Paragraph [0051], Fig. 2, Hsiao teaches four consecutive frames as the network input.).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo in view of Chen by providing four consecutive image frames as input to the neural network, that is taught by Hsiao to make the invention that interpolates consecutive MRI frames using an input set of four consecutive image frames; thus, one of ordinary skill in the art would have been motivated to combine the references because by simultaneously providing a set of four consecutive image frames to the neural network, it provides the network with the temporal context of each frame to address the temporal relationship between image frames (Hsiao, Paragraphs [0049] and [0052]).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
In regards to Claim 6, Guo in view of Chen discloses the method according to claim 5.
Guo in view of Chen does not explicitly disclose wherein selecting a plurality of sets of consecutive image frames from the first plurality of image fames of the first dynamic image comprises using a sliding window technique.
Hsiao is in the same field of art of evaluating the dynamic temporal changes in cardiac MRI images using a neural network. Further, Hsiao teaches wherein selecting a plurality of sets of consecutive image frames from the first plurality of image fames of the first dynamic image comprises using a sliding window technique (Paragraphs [0018], [0013], Fig. 13, Hsiao teaches applying a sliding window to the sequence of image frames to identify a plurality of image frame windows within the sequence. The sliding window approach defines subsets of image frames within the frame set.).
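The sliding window technique described by Hsiao, which defines overlapping subsets of consecutive image frames within the frame sequence, can be sketched as follows (an illustrative sketch only; the function name and the stride of one are assumptions for illustration, and the four-frame window size merely mirrors Hsiao's example rather than a claim limitation):

```python
# Hypothetical sketch: identify every window of `window_size` consecutive
# frames in the sequence by sliding the window one frame at a time.
def sliding_windows(frames, window_size=4, stride=1):
    return [frames[i:i + window_size]
            for i in range(0, len(frames) - window_size + 1, stride)]

# Six frames yield three overlapping four-frame windows:
# sliding_windows([0, 1, 2, 3, 4, 5])
# -> [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
```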
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo in view of Chen by applying a sliding window technique to the series of consecutive MRI images to identify a plurality of sets of image frames to provide to the neural network, that is taught by Hsiao to make the invention that uses a sliding window technique to select each set of consecutive image frames to generate various interpolated image frames; thus, one of ordinary skill in the art would have been motivated to combine the references to automate the process of generating different frame windows to improve reliability and repeatability of the image set selection process (Hsiao, Paragraph [0012]).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
In regards to Claim 7, Guo in view of Chen discloses the method according to claim 5.
Guo in view of Chen does not explicitly disclose wherein each set of consecutive image frames comprises four consecutive image frames.
Hsiao is in the same field of art of analyzing the dynamic temporal activities in medical images using a neural network model. Further, Hsiao discloses wherein each set of consecutive image frames comprises four consecutive image frames (Paragraph [0049], Hsiao teaches implementing a sliding window approach where multiple windows, each consisting of four consecutive frames, are shown simultaneously to the neural network.).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo in view of Chen by providing sets of four consecutive image frames as input to the neural network, that is taught by Hsiao to make the invention that interpolates consecutive MRI frames using each input set of four consecutive image frames; thus, one of ordinary skill in the art would have been motivated to combine the references because by simultaneously providing a set of four consecutive image frames to the neural network, it provides the network with the temporal context of each frame to address the temporal relationship between image frames (Hsiao, Paragraphs [0049] and [0052]).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Claims 4 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al. (NPL “A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image,” 2020, hereafter referred to as Guo) in view of Chen et al. (International Patent Pub. No. WO 2022/104194 A1, hereafter referred to as Chen) in further view of Shi et al. (NPL “Video Frame Interpolation Transformer”, 2022, hereafter referred to as Shi).
Regarding Claim 4, Guo in view of Chen disclose the system according to claim 1.
Guo in view of Chen does not explicitly disclose wherein the deformation encoding neural network comprises a transformer-based deep learning architecture.
Shi is in the same field of art of interpolating image frames to temporally up-sample an input video by synthesizing new frames between existing ones. Further, Shi teaches wherein the deformation encoding neural network comprises a transformer-based deep learning architecture (1 Introduction, 3.1 Learning Deep Features, Shi teaches the Video Frame Interpolation Transformer (VFIT) for video interpolation. A Transformer-based encoder-decoder architecture is used to extract deep hierarchical features from images.).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo in view of Chen by applying a transformer-based architecture to extract hierarchical feature representations to capture the multi-scale motion information from the images, that is taught by Shi to make the invention that interpolates input image frames using the derived motion information to increase the frame rate of the input image to generate a second dynamic image with a higher frame rate; thus, one of ordinary skill in the art would have been motivated to combine the references to overcome the drawbacks associated with CNN-based architectures, such as their inefficiency in exploiting long-range information, which makes them less effective in synthesizing high-quality image/video frames (Shi, 1 Introduction). In addition, Transformers are designed to efficiently model long-range dependencies between input and output to overcome the drawbacks of CNN-based algorithms, and are therefore better suited for the task of video interpolation (Shi, 1 Introduction).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
In regards to Claim 9, Guo in view of Chen disclose the method according to claim 5.
Guo in view of Chen does not explicitly disclose wherein the deformation encoding neural network comprises a transformer-based deep learning architecture.
Shi is in the same field of art of interpolating image frames to temporally up-sample an input video by synthesizing new frames between existing ones. Further, Shi teaches wherein the deformation encoding neural network comprises a transformer-based deep learning architecture (1 Introduction, 3.1 Learning Deep Features, Shi teaches the Video Frame Interpolation Transformer (VFIT) for video interpolation. A Transformer-based encoder-decoder architecture is used to extract deep hierarchical features from images.).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo in view of Chen by applying a transformer-based architecture to extract hierarchical feature representations to capture multi-scale motion information from the images, that is taught by Shi to make the invention that interpolates sets of input image frames using the derived motion information from the images to increase the frame rate of the input image to generate a second dynamic image with a higher frame rate; thus, one of ordinary skill in the art would have been motivated to combine the references to overcome the drawbacks associated with CNN-based architectures, such as their inefficiency in exploiting long-range information, which makes them less effective in synthesizing high-quality image/video frames (Shi, 1 Introduction). In addition, Transformers are designed to efficiently model long-range dependencies between input and output to overcome the drawbacks of CNN-based algorithms, and are therefore better suited for the task of video interpolation (Shi, 1 Introduction).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SYDNEY L BLACKSTEN whose telephone number is (571)272-7651. The examiner can normally be reached 8:30am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Oneal Mistry, can be reached at 313-446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SYDNEY L BLACKSTEN/
Examiner, Art Unit 2674
/ONEAL R MISTRY/Supervisory Patent Examiner, Art Unit 2674