Prosecution Insights
Last updated: April 19, 2026
Application No. 18/441,871

METHOD AND APPARATUS FOR GENERATING IMAGE CHANGE DATA

Non-Final OA (§103, §112)

Filed: Feb 14, 2024
Examiner: LE, MICHAEL
Art Unit: 2614
Tech Center: 2600 — Communications
Assignee: Postech Research And Business Development Foundation
OA Round: 1 (Non-Final)
Grant Probability: 66% (Favorable)
OA Rounds: 1-2
To Grant: 3y 3m
With Interview: 88%

Examiner Intelligence

Career Allow Rate: 66% (above average; 568 granted / 864 resolved; +3.7% vs TC avg)
Interview Lift: +22.1% (strong), comparing resolved cases with vs. without an interview
Avg Prosecution: 3y 3m typical timeline; 61 applications currently pending
Career History: 925 total applications across all art units
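
For reference, the allowance figures above can be reproduced from the stated career counts. A small Python check of the arithmetic (values taken from this page; display rounding assumed):

    granted = 568
    resolved = 864
    allow_rate = granted / resolved                    # 0.6574... displayed as 66%
    print(f"Career allow rate: {allow_rate:.1%}")      # 65.7%

    # "+3.7% vs TC avg" implies a Tech Center 2600 baseline of roughly 62%.
    implied_tc_avg = allow_rate - 0.037
    print(f"Implied TC average: {implied_tc_avg:.1%}")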

Statute-Specific Performance

§101: 12.4% (-27.6% vs TC avg)
§103: 52.7% (+12.7% vs TC avg)
§102: 13.4% (-26.6% vs TC avg)
§112: 15.9% (-24.1% vs TC avg)
Tech Center average values are estimates • Based on career data from 864 resolved cases
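
Each per-statute figure above is quoted together with its offset from the Tech Center average. A short sketch that recovers the implied baseline from the numbers as listed (pure arithmetic; no assumption is made about how the underlying rates are defined):

    # (rate, delta vs TC average) per statute, as listed above.
    stats = {"101": (12.4, -27.6), "103": (52.7, 12.7),
             "102": (13.4, -26.6), "112": (15.9, -24.1)}
    for statute, (rate, delta) in stats.items():
        implied_tc_avg = rate - delta                  # e.g. 12.4 - (-27.6) = 40.0
        print(f"§{statute}: {rate}% ({delta:+}% vs implied TC avg {implied_tc_avg:.1f}%)")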

Office Action

§103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 112

2. The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

3. Claims 1-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant regards as the invention.

Claim 2, line 2 cites "the first proxy network". The limitation "first proxy network" is previously introduced in claim 1, line 7 in "a pre-trained first proxy network". It is not clear whether "first proxy network" in claim 2, line 2 is the same with "first proxy network" in claim 1, line 7.

Claim 2, line 9 cites "the second proxy network". The limitation "second proxy network" is previously introduced in claim 1, line 8 in "a pre-trained second proxy network". It is not clear whether "second proxy network" in claim 2, line 9 is the same with "second proxy network" in claim 1, line 8.

Claim 7, line 2 cites "the first proxy network". The limitation "first proxy network" is previously introduced in claim 1, line 7 in "a pre-trained first proxy network". It is not clear whether "first proxy network" in claim 7, line 2 is the same with "first proxy network" in claim 1, line 7.

Claim 12, line 1 cites "the first proxy network". The limitation "first proxy network" is previously introduced in claim 1, line 7 in "a pre-trained first proxy network". It is not clear whether "first proxy network" in claim 12, line 1 is the same with "first proxy network" in claim 1, line 7.

Claim 12, line 2 cites "the second proxy network". The limitation "second proxy network" is previously introduced in claim 1, line 8 in "a pre-trained second proxy network". It is not clear whether "second proxy network" in claim 12, line 2 is the same with "second proxy network" in claim 1, line 8.

Claim 13, line 2 cites "the first proxy network". The limitation "first proxy network" is previously introduced in claim 1, line 7 in "a pre-trained first proxy network". It is not clear whether "first proxy network" in claim 13, line 2 is the same with "first proxy network" in claim 1, line 7.

Claim 13, line 2 cites "the second proxy network". The limitation "second proxy network" is previously introduced in claim 1, line 8 in "a pre-trained second proxy network". It is not clear whether "second proxy network" in claim 13, line 2 is the same with "second proxy network" in claim 1, line 8.

Claim 15, line 2 cites "the discriminator".
The limitation "discriminator" is previously introduced in claim 14, line 12 in “a pretrained discriminator”. It is not clear whether "discriminator" in claim 15, line 2 is the same with "discriminator" in claim 14, line 12. Claim 18, line 6 cites "the data generator”. The limitation "data generator" is previously introduced in claim 18, line 4 in “data generator”. It is not clear whether "data generator" in claim 18, line 6 is the same with "data generator" in claim 18, line 4. Claim Rejections - 35 USC § 103 4. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. 5. Claims 1 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al., (machine translation of [KR-20230076721-A] cited by applicant, hereinafter "Kim") in view of Krupani et al., (“Krupani”) [US-2025/0061591-A1] Regarding claim 1, Kim discloses a method for generating image change data to be performed by an image change data generating apparatus (Kim- ¶0001, at least discloses a video prediction device, method, and a computer-readable program therefor that predict the movement of at least one object within a video frame [an image change data] and generate a predicted video frame; Fig. 1 and ¶0024, at least disclose a video prediction device (100) that generates a predicted video frame by predicting the movement of at least one object within a video frame), the method comprising: inputting an initial image and data parameters to a data generator (Kim- ¶0008, at least discloses a video prediction device [data generator] that predicts the movement of at least one object in a video frame [an initial image] and generates a predicted video frame, the device including an object movement analysis unit that analyzes the movement of the object over time to determine the type of movement of the object, an object movement prediction unit that learns the movement over time according to the determined type of movement and predicts the object movement; Fig. 
8 shows an example of optical flow in a predicted video frame and an actual video frame; ¶0013, at least discloses calculating the optical flow [data parameters] difference between the predicted video frame and the actual video frame and converting it into a scalar-type loss; ¶0066, at least discloses a result of expressing the optical flow result of a predicted frame that occurred between time t and t+1 as a vector); generating a first image and a second image respectively, within a first time interval by using the data generator based on the initial image and the data parameters (Kim- ¶0008, at least discloses a video prediction device that predicts the movement of at least one object in a video frame and generates a predicted video frame; ¶0024, at least discloses video prediction device (100) that generates a predicted video frame by predicting the movement of at least one object within a video frame; Fig. 8 and ¶0066, at least disclose the loss function application unit (130) first calculates the optical flow [data parameters] that occurs when moving from the current time frame to the next time frame [generating a first image and a second image] […] According to FIG. 8, (a) in FIG. 8 is a vector representation of the optical flow result of the predicted frame occurring between time t and t+1, and (b) in FIG. 8 is the result of expressing the optical flow result of the predicted frame occurring between time t and t+1 [generating a first image and a second image respectively, within a first time interval]. This is the result of expressing the optical flow result of the actual frame that occurred as a vector), the first image and the second image being in a relationship of consecutive frames with each other (Kim- Fig. 8 and ¶0066, at least disclose According to FIG. 8, (a) in FIG. 8 is a vector representation of the optical flow result of the predicted frame occurring between time t and t+1, and (b) in FIG. 8 is the result of expressing the optical flow result of the predicted frame occurring between time t and t+1 [a relationship of consecutive frames with each other]. This is the result of expressing the optical flow result of the actual frame that occurred as a vector); and generating a first image change data by using a first proxy network and a second image change data by using a second proxy network based on the first image and the second image (Kim- ¶0002, at least discloses The accuracy of deep learning models for detecting and separating objects in images is increasing […] This research is called video prediction, and there are several fields such as Abnormal Detection, which detects abnormal situations, and learning games by combining reinforcement learning and DCNN (Deep Convolutional Neural Network) in games; ¶0008, at least discloses a video prediction device that predicts the movement of at least one object in a video frame and generates a predicted video frame, the device including an object movement analysis unit that analyzes the movement of the object over time to determine the type of movement of the object, an object movement prediction unit that learns the movement over time according to the determined type of movement and predicts the object movement; ¶0066, at least disclose the loss function application unit (130) first calculates the optical flow that occurs when moving from the current time frame to the next time frame [generating a first image change data … and a second image change data]). 
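
For context, the Kim passages cited above characterize the claimed "image change data" as dense optical flow between consecutive frames t and t+1, i.e. a per-pixel field of (x, y) displacement vectors. A minimal Python sketch of that general idea, using OpenCV's Farneback estimator purely as a stand-in (this is not the method of the application or of Kim):

    import cv2
    import numpy as np

    def image_change_data(frame_t: np.ndarray, frame_t1: np.ndarray) -> np.ndarray:
        """Dense optical flow between two consecutive frames: an HxWx2 array of
        per-pixel (dx, dy) displacements, a 2D vector field from time t to t+1."""
        gray_t = cv2.cvtColor(frame_t, cv2.COLOR_BGR2GRAY)
        gray_t1 = cv2.cvtColor(frame_t1, cv2.COLOR_BGR2GRAY)
        # Farneback parameters: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
        return cv2.calcOpticalFlowFarneback(gray_t, gray_t1, None, 0.5, 3, 15, 3, 5, 1.2, 0)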
Kim does not explicitly disclose generating a first image change data by using a pre-trained first proxy network and a second image change data by using a pre-trained second proxy network. However, Krupani discloses generating a first image change data by using a pre-trained first proxy network and a second image change data by using a pre-trained second proxy network (Krupani- ¶0114-0115, at least disclose features for detection of tiny objects (referred to as UAVs in the disclosure) and motion of the object within a field of view of one or more cameras, such as by implementing a network model, such as a Deep Learning Neural Network architecture, which is trained to detect the target objects based on high-resolution image data […] The Deep Learning Neural Network (DNN), in some aspects, includes features of a Convolutional Neural Network (CNN) and may be configured to detect tiny (or small) objects at an extended distance through innovative design enhancements, such as adjusting an input to include both red, green, and blue (RGB) data and motion data; Fig. 3 and ¶0137-0139, at least disclose a training process of a NN model of an image processing system […] The YOLO backbone can be a convolutional neural network that pools image pixels to form features at different resolutions. The YOLO backbone can be pretrained on a classification dataset, such as ImageNet, in order to reduce the size of the input data while maintaining relevant features […] FIG. 3 illustrates an architecture 300 than can take image data (e.g., differential image data 114, threshold motion data 116) as input at 302 and employs a deep neural network model or a convolutional neural network to detect motion of the target object in the image […] The architecture 300 can takes image data as input at 302 and employs a DNN or CNN to detect one or more target objects (e.g., UAVs 110) in the image data […] The compressed dataset of each tile of each image frame is shown as C1, C2, C3, and C4 in 302. In various implementations, a combination of image frame and tile of the image frame are fed together as an input to the training model at 302. A network model, such as a neural network or Convolutional Neural Network (CNN), can be established that pools image pixels to form features at different resolutions […]The backbone 302 can processes each tile and image frame sequentially, concurrently, or a combination of, in the convolutional layers of the CNN model. The CNN model extracts feature representations from different resolutions of the input image. The input tiles and image frames undergo a series of convolutions and pooling operations in the convolutional layer and filters to analyze image, by detecting edges, textures, and visual patterns of the object; Fig. 3 and ¶0143-0144, at least disclose feature pyramid network (FPN) can be created in process 304 that extracts features of each dimension and upsamples the feature data using upsampling techniques in process 310 […] In FIG. 3 , input 310 shows the upsampling process of image data whereas, output 310-1, 310-2, 310-3 shows the upsampling process of each detected object of each tile of each image frame). 
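
For context, the Krupani passages quoted above describe a CNN backbone pretrained on a classification dataset (e.g., ImageNet) and reused for feature extraction, which is the general sense in which "pre-trained" networks figure in this rejection. A generic PyTorch/torchvision sketch of that pattern (an assumed stand-in, not Krupani's YOLO backbone and not the claimed proxy networks):

    import torch
    from torchvision import models

    # A backbone pretrained on ImageNet, reused here as a frozen feature extractor.
    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    backbone.fc = torch.nn.Identity()        # drop the classification head, keep the features
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad_(False)

    frames = torch.randn(2, 3, 224, 224)     # two consecutive frames (placeholder data)
    with torch.no_grad():
        features = backbone(frames)          # shape (2, 512): one feature vector per frame
    print(features.shape)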
It would have been obvious to one of ordinary in the art before the effective filing date of the claimed invention to have modified Kim to incorporate the teachings of Krupani, and apply the Deep Learning Neural Network into Kim’s teachings for generating a first image change data by using a pre-trained first proxy network and a second image change data by using a pre-trained second proxy network based on the first image and the second image. Doing so would provide an object detection system, wherein the motion data is determined based on optical flow. Regarding claim 18, Kim in view of Krupani, discloses a non-transitory computer-readable storage medium storing computer-executable instructions stored therein, wherein the computer-executable instructions, when executed by a processor (Kim- ¶0018, at least discloses a program stored in a computer-readable recording medium configured to execute a video prediction method; ¶0081, at least discloses Computer-readable recording media on which programs for implementing operations by the video prediction method according to embodiments are recorded include all types of recording devices that store data that can be read by a computer. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices […] computer-readable recording media may be distributed across computer systems connected to a network, and computer-readable codes may be stored and executed in a distributed manner. Additionally, functional programs, codes, and code segments), cause the processor to perform a method (Krupani- ¶0005, at least discloses a memory storing instructions that, when executed by one or more processors, cause the one or more processors to: process a first set of image data based on the first image frame, process a second set of image data based on the first image frame and the second image frame; Fig. 1 and ¶0125, at least disclose the image processing system 102 may include one or more processors), the method comprising the method of claim 1. 6. Claims 2-5, 7-10 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Krupani, further in view of Wang et al., (“Wang”) [US-2021/0383169-A1] Regarding claim 2, Kim in view of Krupani, discloses the method of claim 1, and further discloses wherein the first proxy network is a network that is previously trained to generate third image change data by inputting a third image and a fourth image, which are included in a pre-obtained first image dataset and generated based on a second time interval (Kim- ¶0008, at least discloses a video prediction device that predicts the movement of at least one object in a video frame and generates a predicted video frame, the device including an object movement analysis unit that analyzes the movement of the object over time to determine the type of movement of the object, an object movement prediction unit that learns the movement over time according to the determined type of movement and predicts the object movement; Fig. 8 and ¶0066, at least disclose the loss function application unit (130) first calculates the optical flow that occurs when moving from the current time frame to the next time frame [third image and a fourth image] […] According to FIG. 8, (a) in FIG. 8 is a vector representation of the optical flow result of the predicted frame occurring between time t and t+1, and (b) in FIG. 
8 is the result of expressing the optical flow result of the predicted frame occurring between time t and t+1 [second time interval]; Krupani- ¶0114-0115, at least disclose implementing a network model, such as a Deep Learning Neural Network architecture, which is trained to detect the target objects based on high-resolution image data; Fig. 3 and ¶0137-0139, at least disclose a training process of a NN model of an image processing system […] The YOLO backbone can be a convolutional neural network that pools image pixels to form features at different resolutions. The YOLO backbone can be pretrained on a classification dataset, such as ImageNet, in order to reduce the size of the input data while maintaining relevant features […] FIG. 3 illustrates an architecture 300 than can take image data (e.g., differential image data 114, threshold motion data 116) as input at 302 and employs a deep neural network model or a convolutional neural network to detect motion of the target object in the image […] The backbone 302 can processes each tile and image frame sequentially, concurrently, or a combination of, in the convolutional layers of the CNN model. The CNN model extracts feature representations from different resolutions of the input image. The input tiles and image frames undergo a series of convolutions and pooling operations in the convolutional layer and filters to analyze image, by detecting edges, textures, and visual patterns of the object), and to generate image change data in which the first loss function based on first label data and the third image change data, wherein the first label data is extracted from the third image and the fourth image (Kim- ¶0063, at least discloses The loss function application unit (130) determines the difference between the predicted object movement and the actual object movement by applying a preset loss function; ¶0066, at least discloses The loss function application unit (130) can produce a two-dimensional vector value that is a change in the x and y axes that occurs when moving from a frame of time t to a frame of time t+1; Krupani- ¶0059, at least discloses at least one loss function includes at least one of class loss, box loss, or objectness loss; ¶0063, at least discloses each of the plurality of object images includes a label associated with the object represented in corresponding object image, the label including the type of object and the particular scenario of the object, and wherein the network model is trained by processing each of the objects and the corresponding label in each of the plurality of object images; ¶0137-0139, at least disclose During YOLO loss, the predictions and classifications can be scrutinized using three loss functions, such as for class, box, and/or objectness […] Each of the images in the image data can be labeled with the particular type of object (e.g., rotor UAVs, wing UAVs, birds, airplanes, etc.) represented in the image, and in some instances, can include scenarios associated with the object in the image. The model can be trained with this image data, and corresponding labels, to be able to distinguish between different objects in a variety of different scenarios […] The backbone 302 can processes each tile and image frame sequentially, concurrently, or a combination of, in the convolutional layers of the CNN model. The CNN model extracts feature representations from different resolutions of the input image. 
The input tiles and image frames undergo a series of convolutions and pooling operations in the convolutional layer and filters to analyze image, by detecting edges, textures, and visual patterns of the object), the third image and the fourth image being in a relationship of consecutive frames with each other (Kim- Fig. 8 and ¶0066, at least disclose According to FIG. 8, (a) in FIG. 8 is a vector representation of the optical flow result of the predicted frame occurring between time t and t+1, and (b) in FIG. 8 is the result of expressing the optical flow result of the predicted frame occurring between time t and t+1 [relationship of consecutive frames]. This is the result of expressing the optical flow result of the actual frame that occurred as a vector), and wherein the second proxy network is a network that is previously trained to generate fourth image change data by inputting a fifth image and a sixth image, which are included in a pre-obtained second image dataset and generated based on a third time interval (Kim- ¶0008, at least discloses a video prediction device that predicts the movement of at least one object in a video frame and generates a predicted video frame, the device including an object movement analysis unit that analyzes the movement of the object over time to determine the type of movement of the object, an object movement prediction unit that learns the movement over time according to the determined type of movement and predicts the object movement; Fig. 8 and ¶0066, at least disclose the loss function application unit (130) first calculates the optical flow that occurs when moving from the current time frame to the next time frame [third image and a fourth image] […] According to FIG. 8, (a) in FIG. 8 is a vector representation of the optical flow result of the predicted frame occurring between time t and t+1, and (b) in FIG. 8 is the result of expressing the optical flow result of the predicted frame occurring between time t and t+1 [second time interval]; Krupani- ¶0114-0115, at least disclose implementing a network model, such as a Deep Learning Neural Network architecture, which is trained to detect the target objects based on high-resolution image data; Fig. 3 and ¶0137-0139, at least disclose a training process of a NN model of an image processing system […] The YOLO backbone can be a convolutional neural network that pools image pixels to form features at different resolutions. The YOLO backbone can be pretrained on a classification dataset, such as ImageNet, in order to reduce the size of the input data while maintaining relevant features […] FIG. 3 illustrates an architecture 300 than can take image data (e.g., differential image data 114, threshold motion data 116) as input at 302 and employs a deep neural network model or a convolutional neural network to detect motion of the target object in the image […] The backbone 302 can processes each tile and image frame sequentially, concurrently, or a combination of, in the convolutional layers of the CNN model. The CNN model extracts feature representations from different resolutions of the input image. 
The input tiles and image frames undergo a series of convolutions and pooling operations in the convolutional layer and filters to analyze image, by detecting edges, textures, and visual patterns of the object), and to generate image change data in which the second loss function based on second label data and the fourth image change data (Kim- ¶0063, at least discloses The loss function application unit (130) determines the difference between the predicted object movement and the actual object movement by applying a preset loss function; ¶0066, at least discloses The loss function application unit (130) can produce a two-dimensional vector value that is a change in the x and y axes that occurs when moving from a frame of time t to a frame of time t+1; Krupani- ¶0059, at least discloses at least one loss function includes at least one of class loss, box loss, or objectness loss; ¶0063, at least discloses each of the plurality of object images includes a label associated with the object represented in corresponding object image, the label including the type of object and the particular scenario of the object, and wherein the network model is trained by processing each of the objects and the corresponding label in each of the plurality of object images; ¶0137-0138, at least disclose During YOLO loss, the predictions and classifications can be scrutinized using three loss functions, such as for class, box, and/or objectness […] Each of the images in the image data can be labeled with the particular type of object (e.g., rotor UAVs, wing UAVs, birds, airplanes, etc.) represented in the image, and in some instances, can include scenarios associated with the object in the image. The model can be trained with this image data, and corresponding labels, to be able to distinguish between different objects in a variety of different scenarios […] The backbone 302 can processes each tile and image frame sequentially, concurrently, or a combination of, in the convolutional layers of the CNN model. The CNN model extracts feature representations from different resolutions of the input image. The input tiles and image frames undergo a series of convolutions and pooling operations in the convolutional layer and filters to analyze image, by detecting edges, textures, and visual patterns of the object), wherein the second label data is extracted from the fifth image and the sixth image, the fifth image and the sixth image being in a relationship of consecutive frames with each other (Kim- Fig. 8 and ¶0066, at least disclose According to FIG. 8, (a) in FIG. 8 is a vector representation of the optical flow result of the predicted frame occurring between time t and t+1, and (b) in FIG. 8 is the result of expressing the optical flow result of the predicted frame occurring between time t and t+1 [relationship of consecutive frames]. This is the result of expressing the optical flow result of the actual frame that occurred as a vector). The prior art does not explicitly disclose a first loss function is minimized by calculating the first loss function; and a second loss function is minimized by calculating the second loss function. 
However, Wang discloses a loss function is minimized by calculating the loss function (Wang- ¶0050, at least discloses The network model minimizes the loss function by using the gradient descent method to adjust the weight parameters in the network layer by layer, and improves the accuracy of the network through frequent iterative training; ¶0123, at least discloses the present specification also provides a multi-step perceptual loss function to train a pyramid deep learning model […] Through a large amount of data training (the difference between the intermediate frame and the real intermediate frame is generated by the loss function comparison, the difference is propagated back to the network, and the weight parameter of the network is modified, so that the intermediate frame and the real intermediate frame are increasingly approached). Finally, a deep learning network with multiple frames as input and intermediate frames between multiple frames can be obtained). It would have been obvious to one of ordinary in the art before the effective filing date of the claimed invention to have modified Kim/Krupani to incorporate the teachings of Wang, and apply the network model minimizes the loss function into Kim/Krupani’s teachings in order to generate image change data in which a first loss function is minimized by calculating the first loss function based on first label data and the third image change data, wherein the first label data is extracted from the third image and the fourth image, the third image and the fourth image being in a relationship of consecutive frames with each other, and to generate image change data in which a second loss function is minimized by calculating the second loss function based on second label data and the fourth image change data, wherein the second label data is extracted from the fifth image and the sixth image, the fifth image and the sixth image being in a relationship of consecutive frames with each other. Doing so would adopt a pyramid refinement strategy to effectively estimate the motion information and the occlusion region, thereby greatly improving the quality of the intermediate frame. 
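
For context, the Wang passages cited above describe the standard pattern of minimizing a loss function by gradient descent: the difference between prediction and ground truth is propagated back through the network and the weights are adjusted over many iterations. A minimal, generic PyTorch sketch of that loop (toy model and data, illustrative only):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))   # toy network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)               # gradient descent
    loss_fn = nn.MSELoss()

    x = torch.randn(32, 8)         # placeholder inputs
    target = torch.randn(32, 2)    # placeholder ground truth

    for step in range(100):                 # iterative training
        optimizer.zero_grad()
        loss = loss_fn(model(x), target)    # difference between prediction and ground truth
        loss.backward()                     # propagate the difference back through the network
        optimizer.step()                    # adjust the weight parameters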
Regarding claim 3, Kim in view of Krupani and Wang, discloses the method of claim 2, and discloses the method further comprising: extracting fifth image change data as label data from the first image and the second image (Kim- ¶0008, at least discloses a video prediction device that predicts the movement of at least one object in a video frame and generates a predicted video frame, the device including an object movement analysis unit that analyzes the movement of the object over time to determine the type of movement of the object, an object movement prediction unit that learns the movement over time according to the determined type of movement and predicts the object movement; ¶0066, at least disclose the loss function application unit (130) first calculates the optical flow that occurs when moving from the current time frame to the next time frame; Krupani- ¶0063, at least discloses each of the plurality of object images includes a label associated with the object represented in corresponding object image, the label including the type of object and the particular scenario of the object, and wherein the network model is trained by processing each of the objects and the corresponding label in each of the plurality of object images); and calculating a third loss function based on the first image change data and the fifth image change data, and calculating a fourth loss function based on the second image change data and the fifth image change data (Kim- ¶0063, at least discloses The loss function application unit (130) determines the difference between the predicted object movement and the actual object movement by applying a preset loss function; ¶0066, at least discloses The loss function application unit (130) can produce a two-dimensional vector value that is a change in the x and y axes that occurs when moving from a frame of time t to a frame of time t+1; Krupani- ¶0059, at least discloses at least one loss function includes at least one of class loss, box loss, or objectness loss; ¶0137-0139, at least disclose During YOLO loss, the predictions and classifications can be scrutinized using three loss functions, such as for class, box, and/or objectness […] The backbone 302 can processes each tile and image frame sequentially, concurrently, or a combination of, in the convolutional layers of the CNN model. The CNN model extracts feature representations from different resolutions of the input image. The input tiles and image frames undergo a series of convolutions and pooling operations in the convolutional layer and filters to analyze image, by detecting edges, textures, and visual patterns of the object). It would have been obvious to one of ordinary in the art before the effective filing date of the claimed invention to have modified Kim/Wang to incorporate the teachings of Krupani, and apply the loss functions into Kim/Wang’s teachings for extracting fifth image change data as label data from the first image and the second image; and calculating a third loss function based on the first image change data and the fifth image change data, and calculating a fourth loss function based on the second image change data and the fifth image change data. The same motivation that was utilized in the rejection of claim 1 applies equally to this claim. 
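
As mapped above, claim 3 extracts a fifth image-change datum from the generated frame pair as label data and scores each proxy network's output against it with a separate loss function. A hedged sketch of that bookkeeping (the function and variable names, and the use of MSE, are illustrative assumptions, not taken from the application):

    import torch.nn.functional as F

    def proxy_losses(first_change, second_change, label_change):
        """Third and fourth loss functions: each proxy network's image change data
        compared against label data extracted from the first and second images."""
        third_loss = F.mse_loss(first_change, label_change)
        fourth_loss = F.mse_loss(second_change, label_change)
        return third_loss, fourth_loss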
Regarding claim 4, Kim in view of Krupani and Wang, discloses the method of claim 3, and discloses the method further comprising: updating the data parameters based on the third loss function and the fourth loss function (Kim- ¶0063, at least discloses the loss function application unit 130 first calculates the optical flow that occurs when moving from the current time frame to the next time frame. The loss function application unit 130 can calculate a two-dimensional vector value that is the change in the x and y axes that occurs when moving from the t time frame to the t+1 time frame; ¶0066, at least discloses the loss function application unit 130 first calculates the optical flow that occurs when moving from the current time frame to the next time frame. The loss function application unit 130 can calculate a two-dimensional vector value that is the change in the x and y axes that occurs when moving from the t time frame to the t+1 time frame, and an example result is shown in FIG. 8. . According to FIG. 8, (a) in FIG. 8 is a vector representation of the optical flow result of the predicted frame occurring between time t and t+1, and (b) in FIG. 8 is the result of expressing the optical flow result of the predicted frame occurring between time t and t+1. This is the result of expressing the optical flow result of the actual frame that occurred as a vector; Krupani- ¶0058, at least discloses determining at least one loss function associated with each of the one or more targeted objects detected; and re-training the network model based on the at least one loss function; ¶0137, at least discloses During YOLO loss, the predictions and classifications can be scrutinized using three loss functions, such as for class, box, and/or objectness; Wang- ¶0123, at least discloses a multi-step perceptual loss function to train a pyramid deep learning model). Regarding claim 5, Kim in view of Krupani and Wang, discloses the method of claim 4, and further discloses wherein the updating the data parameters (see Claim 4 rejection for detailed analysis), updating the data parameters (see Claim 4 rejection for detailed analysis) by calculating a task loss based on the third loss function and the fourth loss function to update the data parameters (Kim- ¶0013, at least discloses the loss function application unit calculates the difference in optical flow between the predicted video frame and the actual video frame, converts it into a scalar loss, and applies the optical flow loss and the MSE (Mean Square Error) loss function. The calculated MSE loss may be weighted and summed to determine the difference between the predicted object motion and the actual object motion). 
Regarding claim 7, Kim in view of Krupani and Wang, discloses the method of claim 4, and further discloses wherein the second proxy network is previously trained based on a larger amount of training data than training data used to train the first proxy network (Kim- ¶0002, at least discloses The accuracy of deep learning models for detecting and separating objects in images is increasing […] This research is called video prediction, and there are several fields such as Abnormal Detection, which detects abnormal situations, and learning games by combining reinforcement learning and DCNN (Deep Convolutional Neural Network) in games; Krupani- ¶0114-0115, at least disclose features for detection of tiny objects (referred to as UAVs in the disclosure) and motion of the object within a field of view of one or more cameras, such as by implementing a network model, such as a Deep Learning Neural Network architecture, which is trained to detect the target objects based on high-resolution image data; Wang- ¶0044, at least discloses When training, a large amount of video frame data is required. Each set of video frame data is a video frame training sample, and the video frame training sample includes an even number of video frames, at least two, and four or more are better; ¶0123, at least discloses Through a large amount of data training (the difference between the intermediate frame and the real intermediate frame is generated by the loss function comparison, the difference is propagated back to the network, and the weight parameter of the network is modified, so that the intermediate frame and the real intermediate frame are increasingly approached)). It would have been obvious to one of ordinary in the art before the effective filing date of the claimed invention to have modified Kim/Krupani to incorporate the teachings of Wang, and apply the large amount of data training into Kim/Krupani’s teachings in order the second proxy network is previously trained based on a larger amount of training data than training data used to train the first proxy network. Doing so would adopt a pyramid refinement strategy to effectively estimate the motion information and the occlusion region, thereby greatly improving the quality of the intermediate frame. Regarding claim 8, Kim in view of Krupani and Wang, discloses the method of claim 7, and further discloses wherein the updating the data parameters (see Claim 4 rejection for detailed analysis), updating the data parameters (see Claim 4 rejection for detailed analysis) by minimizing the third loss function (Wang- ¶0050, at least discloses The network model minimizes the loss function by using the gradient descent method to adjust the weight parameters in the network layer by layer, and improves the accuracy of the network through frequent iterative training; ¶0123, at least discloses the present specification also provides a multi-step perceptual loss function to train a pyramid deep learning model […] Through a large amount of data training (the difference between the intermediate frame and the real intermediate frame is generated by the loss function comparison, the difference is propagated back to the network, and the weight parameter of the network is modified, so that the intermediate frame and the real intermediate frame are increasingly approached). 
Finally, a deep learning network with multiple frames as input and intermediate frames between multiple frames can be obtained) and by maximizing the fourth loss function (Krupani- ¶0150, at least discloses The class loss 312-1 measures the accuracy of class predictions for the detected object based on the maximum likelihood for each potential feature). Regarding claim 9, Kim in view of Krupani and Wang, discloses the method of claim 8, and further discloses wherein the data parameters include at least one of color perturbation, geometric warping, flow field translation, and real world effects (Kim- Fig. 8 shows optical flow; ¶0013, at least discloses the loss function application unit may determine the difference between the predicted object movement and the actual object movement by calculating the optical flow difference between the predicted video frame and the actual video frame and converting it into a scalar-type loss, and adding the optical flow loss and the MSE (Mean Square Error) loss calculated by applying the MSE loss function, with weights; ¶0066, at least discloses the loss function application unit 130 first calculates the optical flow that occurs when moving from the current time frame to the next time frame; Wang- ¶0082, at least discloses processing the video frame inputted to the K-th level by using each optical flow in the first optical flow set to generate a first warped image set; Fig. 2 and ¶0127, at least disclose Four ¼ downsampled video frames are deformed by optical flow to generate four ¼ resolution warped images). Regarding claim 10, Kim in view of Krupani and Wang, discloses the method of claim 9, and further discloses wherein the real world effects include at least one of texture noise, fog, and motion blur (Kim- ¶0030, at least discloses the probabilistic walking movement can be predicted as a value obtained by adding a probabilistic component, which is white noise, to the last value, as in the mathematical expression; Krupani- ¶0130, at least discloses the motion blurred image 201-1, 201-2 can display image changes 203 between different images). Regarding claim 12, Kim in view of Krupani and Wang, discloses the method of claim 8, and further discloses wherein image change data generated by the first proxy network and the second proxy network (see Claim 1 rejection for detailed analysis) are optical flow data (Kim- Fig. 8 shows optical flow; ¶0013, at least discloses the loss function application unit may determine the difference between the predicted object movement and the actual object movement by calculating the optical flow difference between the predicted video frame and the actual video frame and converting it into a scalar-type loss, and adding the optical flow loss and the MSE (Mean Square Error) loss calculated by applying the MSE loss function, with weights; ¶0066, at least discloses the loss function application unit 130 first calculates the optical flow that occurs when moving from the current time frame to the next time frame). 7. Claims 11 and 13 are rejected under 35 U.S.C. 
103 as being unpatentable over Kim in view of Krupani, further in view of Wang et al., (“Wang”) [US-2021/0383169-A1], still further in view of Kearney et al., (“Kearney”) [US-2022/0180447-A1] Regarding claim 11, Kim in view of Krupani and Wang, discloses the method of claim 8, and does not explicitly disclose, but Kearney discloses wherein the method is repeatedly performed until the third loss function is minimized (Kearney- ¶0173-0181, at least disclose (Step 3) Loss functions LF1 and LF2 are evaluated. Loss function LF1 is low when the output 616 indicates that the synthetic image 622 is real and that the target domain image 606 is fake. Since the output 616 is a matrix, the loss function LF1 may be a function of the multiple values (average, most frequently occurring value, etc.). Loss function LF2 is low when the output 616 indicates that the synthetic image 622 is fake and that the target domain image 606 is real […] (Step 5) A loss function LF3 is evaluated according to a comparison of the synthetic source domain image 624 and the source domain image 604 that was input to the generator 612 at Step 1. The loss function LF3 decreases with similarity of the images 604, 622 […] (Step 7) The output 626 of the discriminator 620, which may be a realism matrix, is evaluated with respect to a loss function LF4 and a loss function LF5. Loss function LF4 is low when the output 626 indicates that the synthetic image 624 is real and that the source domain image 604 is fake. Since the output 626 is a matrix, the loss function LF4 may be a function of the multiple values (average, most frequently occurring value, etc.). Loss function LF5 is low when the output 626 indicates that the synthetic image 624 is fake and that the source domain image 604 is real […] (Step 9) A loss function LF6 is evaluated according to a comparison of the synthetic target domain image 622 from Step 8 and the target domain image 606 that was input to the generator 618 at Step 6. The loss function LF6 decreases with similarity of the images 606, 622 […] (Step 10) Model parameters of the generators 612, 618 and the discriminators 614, 620 are tuned according to the outputs of the loss functions LF1, LF2, LF3, LF4, LF5, LF6, and LF7 […] Steps 1 through 10 may be repeated until an ending condition is reached, such as when the discriminators 616, 620 can no longer distinguish between synthetic and real images (e.g., only correct 50 percent of the time), a Nash equilibrium is reached, or some other ending condition is reached; Fig. 7 and ¶0196-0198, at least disclose (Step 3) The loss functions 708 are evaluated. This may include a loss function LF1 based on the realism matrix output at Step 1 such that the output of LF1 decreases with increase in the number of values of the realism matrix that indicate that the synthetic labels 726 are real. Step 3 may also include evaluating a loss function LF2 based on the realism matrix such that the output of LF2 decreases with increase in the number of values of the realism matrix that indicate that the synthetic labels 726 are fake […] Steps 1 through 4 may be repeated such that the generator 712, discriminator 714, and classifier 718 are trained simultaneously. Steps 1 through 4 may continue to be repeated until an end condition is reached, such as until loss function LF3 meets a minimum value or other ending condition and LF2 is such that the discriminator 714 identifies the synthetic labels 726 as real 50 percent of the time or Nash equilibrium is reached). 
It would have been obvious to one of ordinary in the art before the effective filing date of the claimed invention to have modified Kim/Krupani/Wang to incorporate the teachings of Kearney, and apply the loss functions into Kim/Krupani/Wang’s teachings in order the method is repeatedly performed until the third loss function is minimized. Doing so would provide a diagnosis hierarchy to determine whether a treatment is appropriate, and the most probable claim adjudication outcome. Regarding claim 13, Kim in view of Krupani, Wang and Kearney, discloses the method of claim 11, and discloses the method further comprising: training the first proxy network or the second proxy network to generate image change data (see Claim 2 rejection for detailed analysis) based on images generated while the method is repeatedly performed, when the third loss function is minimized (Kearney- ¶0173-0181, at least disclose (Step 3) Loss functions LF1 and LF2 are evaluated. Loss function LF1 is low when the output 616 indicates that the synthetic image 622 is real and that the target domain image 606 is fake. Since the output 616 is a matrix, the loss function LF1 may be a function of the multiple values (average, most frequently occurring value, etc.). Loss function LF2 is low when the output 616 indicates that the synthetic image 622 is fake and that the target domain image 606 is real […] (Step 5) A loss function LF3 is evaluated according to a comparison of the synthetic source domain image 624 and the source domain image 604 that was input to the generator 612 at Step 1. The loss function LF3 decreases with similarity of the images 604, 622 […] (Step 7) The output 626 of the discriminator 620, which may be a realism matrix, is evaluated with respect to a loss function LF4 and a loss function LF5. Loss function LF4 is low when the output 626 indicates that the synthetic image 624 is real and that the source domain image 604 is fake. Since the output 626 is a matrix, the loss function LF4 may be a function of the multiple values (average, most frequently occurring value, etc.). Loss function LF5 is low when the output 626 indicates that the synthetic image 624 is fake and that the source domain image 604 is real […] (Step 9) A loss function LF6 is evaluated according to a comparison of the synthetic target domain image 622 from Step 8 and the target domain image 606 that was input to the generator 618 at Step 6. The loss function LF6 decreases with similarity of the images 606, 622 […] (Step 10) Model parameters of the generators 612, 618 and the discriminators 614, 620 are tuned according to the outputs of the loss functions LF1, LF2, LF3, LF4, LF5, LF6, and LF7 […] Steps 1 through 10 may be repeated until an ending condition is reached, such as when the discriminators 616, 620 can no longer distinguish between synthetic and real images (e.g., only correct 50 percent of the time), a Nash equilibrium is reached, or some other ending condition is reached; Fig. 7 and ¶0196-0198, at least disclose (Step 3) The loss functions 708 are evaluated. This may include a loss function LF1 based on the realism matrix output at Step 1 such that the output of LF1 decreases with increase in the number of values of the realism matrix that indicate that the synthetic labels 726 are real. 
Step 3 may also include evaluating a loss function LF2 based on the realism matrix such that the output of LF2 decreases with increase in the number of values of the realism matrix that indicate that the synthetic labels 726 are fake […] Steps 1 through 4 may be repeated such that the generator 712, discriminator 714, and classifier 718 are trained simultaneously. Steps 1 through 4 may continue to be repeated until an end condition is reached, such as until loss function LF3 meets a minimum value or other ending condition and LF2 is such that the discriminator 714 identifies the synthetic labels 726 as real 50 percent of the time or Nash equilibrium is reached). 8. Claims 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al., (machine translation of [KR-20230076721-A] cited by applicant, hereinafter "Kim") in view of Krupani et al., (“Krupani”) [US-2025/0061591-A1], further in view of Kearney et al., (“Kearney”) [US-2022/0180447-A1] Regarding claim 14, Kim discloses a method performed by an image change data generating apparatus (Kim- ¶0001, at least discloses a video prediction device, method, and a computer-readable program therefor that predict the movement of at least one object within a video frame [an image change data] and generate a predicted video frame; Fig. 1 and ¶0024, at least disclose a video prediction device (100) that generates a predicted video frame by predicting the movement of at least one object within a video frame), the method comprising: inputting data parameters to a data generator (Kim- ¶0008, at least discloses a video prediction device [data generator] that predicts the movement of at least one object in a video frame and generates a predicted video frame, the device including an object movement analysis unit that analyzes the movement of the object over time to determine the type of movement of the object, an object movement prediction unit that learns the movement over time according to the determined type of movement and predicts the object movement; Fig. 8 shows an example of optical flow in a predicted video frame and an actual video frame; ¶0013, at least discloses calculating the optical flow [data parameters] difference between the predicted video frame and the actual video frame and converting it into a scalar-type loss; ¶0066, at least discloses a result of expressing the optical flow result of a predicted frame that occurred between time t and t+1 as a vector); inputting a first image or a second image to a data generator (Kim- ¶0008, at least discloses a video prediction device [data generator] that predicts the movement of at least one object in a video frame [first image] and generates a predicted video frame, the device including an object movement analysis unit that analyzes the movement of the object over time to determine the type of movement of the object, an object movement prediction unit that learns the movement over time according to the determined type of movement and predicts the object movement); generating a first image and a second image respectively, within a first time interval by using the data generator based on the data parameters and the first image or the second image (Kim- ¶0008, at least discloses a video prediction device that predicts the movement of at least one object in a video frame and generates a predicted video frame; ¶0024, at least discloses video prediction device (100) that generates a predicted video frame by predicting the movement of at least one object within a video frame; Fig. 
8 and ¶0066, at least disclose the loss function application unit (130) first calculates the optical flow [data parameters] that occurs when moving from the current time frame to the next time frame [generating a first image and a second image] […] According to FIG. 8, (a) in FIG. 8 is a vector representation of the optical flow result of the predicted frame occurring between time t and t+1, and (b) in FIG. 8 is the result of expressing the optical flow result of the predicted frame occurring between time t and t+1 [generating a first image and a second image respectively, within a first time interval]. This is the result of expressing the optical flow result of the actual frame that occurred as a vector), the first image and the second image being in a relationship of consecutive frames with each other (Kim- Fig. 8 and ¶0066, at least disclose According to FIG. 8, (a) in FIG. 8 is a vector representation of the optical flow result of the predicted frame occurring between time t and t+1, and (b) in FIG. 8 is the result of expressing the optical flow result of the predicted frame occurring between time t and t+1 [a relationship of consecutive frames with each other]. This is the result of expressing the optical flow result of the actual frame that occurred as a vector); generating image change data by using an image change data generating network based on the data parameters and the first image and the second image (Kim- ¶0002, at least discloses The accuracy of deep learning models for detecting and separating objects in images is increasing […] This research is called video prediction, and there are several fields such as Abnormal Detection, which detects abnormal situations, and learning games by combining reinforcement learning and DCNN (Deep Convolutional Neural Network) in games; ¶0008, at least discloses a video prediction device that predicts the movement of at least one object in a video frame and generates a predicted video frame, the device including an object movement analysis unit that analyzes the movement of the object over time to determine the type of movement of the object, an object movement prediction unit that learns the movement over time according to the determined type of movement and predicts the object movement; ¶0066, at least disclose the loss function application unit (130) first calculates the optical flow that occurs when moving from the current time frame to the next time frame [data parameters and the first image and the second image]); Kim does not explicitly disclose a first environment image or a second environment image; generating image change data by using an pre-trained image change data generating network; outputting a discrimination result value by inputting the image change data into a pre-trained discriminator; and updating the data parameters based on the discrimination result value. However, Krupani discloses a first environment image or a second environment image (Krupani- ¶0120-0121, at least disclose The video data 104 comprises a plurality of images 108, such as 108-1, 108-2, . . . 108-N (collectively “images”), in which a UAV 110 can be moving […] The differential image data 114 can be determined by comparing changes between two different image frames, e.g. images 108-1 and 108-2, across each pixel; Fig. 
1 and ¶0136, at least disclose in environment 100, the image processing system 102 can be configured to generate RGB data 112 as differential RGB data by using differences between RGB data of successive or subsequent frames of the video data 104 […] the differential RGB data can include multiple sets of differential data, in comparison to greyscale differential data, such as by obtaining a difference between the red data of image 108-1 and the red data of image 108-2, obtaining a difference between the blue data of image 108-1 and the blue data of image 108-2, and obtaining a difference between the green data of image 108-1 and the green data of image 108-2; ¶0149, at least discloses The model can be trained to dynamically adapt the variations in target object appearance, motion, scale, lighting conditions, environmental conditions, etc. making it well-suited for the real world where the object can be constantly moving, and the location can be changing over time). It would have been obvious to one of ordinary in the art before the effective filing date of the claimed invention to have modified Kim to incorporate the teachings of Krupani, and apply the environment images into Kim’s teachings for inputting a first environment image or a second environment image to a data generator; generating a first image and a second image respectively, within a first time interval by using the data generator based on the data parameters and the first environment image or the second environment image, the first image and the second image being in a relationship of consecutive frames with each other. Doing so would provide an object detection system, wherein the motion data is determined based on optical flow. The prior art does not explicitly disclose, but Kearney discloses generating image change data by using an pre-trained image change data generating network (Kearney- ¶0316, at least discloses The method 1700 may include returning 1808 gradients obtained during the training at step 1806 to the server system 1500. As known in the art, the weights and other parameters of a machine learning model may be selected according to gradients. These gradients change over time in response to evaluation of a loss function with respect to a prediction from the machine learning model in response to an input of a training data entry and a desired prediction indicated in the training data entry; Fig. 32C and ¶0432, at least disclose The machine learning models 3220 a, 3220 b may have the same structure as the CNN 3220 as described above and may be pretrained as described above for the CNN 3220 or may be exclusively trained; Fig. 32D and ¶0436-0437, at least disclose Referring to FIG. 32D, the illustrated system 3200 d may include a CNN 3220 that may be structured as the CNN 3220 described above. The CNN 3220 may be pretrained as described above with respect to FIG. 32B or may be trained exclusively using the approach described below with respect to FIG. 32D. The approach of FIG. 32D makes use of triplet loss to train the CNN 3220 […] Training data entries for training the CNN 3220 may be the same as described above except for training data entries may include a group of three images 3204, each with one or more corresponding labels 3206-3212; ¶0502, at least discloses The image may also be blurred with gaussian smoothing kernel or motion blur); outputting a discrimination result value by inputting the image change data into a pre-trained discriminator (Kearney- Fig. 
5 and ¶0160, at least disclose The discriminator 514 produces as an output 524 a realism matrix that is an attempt to differentiate between real and fake images. The realism matrix is a matrix of values, each value being an estimate as to which of the two input images is real; Fig. 12 and ¶0258, at least disclose The loss functions 1208 may be implementing using weighted L1 loss between the synthetic image 1216 and input image 1204 without masking […] The discriminator 1214 may be pretrained in some embodiments such that it is not updated during training and only the generator 1212 is trained; Fig. 32C and ¶0432-0433, at least disclose The machine learning models 3220 a, 3220 b may have the same structure as the CNN 3220 as described above and may be pretrained as described above for the CNN 3220 or may be exclusively trained using the approach described below. Each machine learning model 3220 a, 3220 b takes as inputs an image 3204 a, 3204 b, respectively, each with one or more corresponding labels 3206 a-3212 a, 3206 b-3212 b. The inputs are processed using each machine learning model 3220 a, 3220 b to obtain two sets of values 3222 a, 3222 b characterizing the inputs; Fig. 34 and ¶0457, at least discloses the discriminator 3410 may be pretrained and is not further trained during training of the generator 3402 and discriminator 3408. The discriminator 3408 may output a realism matrix 3420 with each output of the realism matrix 3420 indicating which of the two input images 3416, 3418 is determined to be a real image by the discriminator 3408; Fig. 38A and ¶0491, at least disclose the discriminator 3810 may be pretrained and is not further trained during training of the generator 3802 and discriminator 3808. The discriminator 3808 may output a realism matrix 3820 with each output of the realism matrix 3820 indicating which of the two input images 3816, 3818 is a real image); and updating the data parameters based on the discrimination result value (Kearney- Fig. 6B and ¶0177, at least disclose The output 626 of the discriminator 620, which may be a realism matrix, is evaluated with respect to a loss function LF4 and a loss function LF5. Loss function LF4 is low when the output 626 indicates that the synthetic image 624 is real and that the source domain image 604 is fake. Since the output 626 is a matrix, the loss function LF4 may be a function of the multiple values (average, most frequently occurring value, etc.). Loss function LF5 is low when the output 626 indicates that the synthetic image 624 is fake and that the source domain image 604 is real; Fig. 7 and ¶0189, at least disclose The discriminator 714 may have an output 716 embodied as a realism matrix that may be implemented as for other realism matrices in other embodiments as described above. The output of the generator 712 may also be input to a classifier 718 trained to produce an output 720 embodied as a tooth label, e.g. pixel mask labeling a portion of an input image estimated to include a tooth; ¶0194-0196, at least disclose The discriminator 714 outputs a realism matrix with each value in the matrix being an estimate as to which of the input labels 726, 706 b is real […] (Step 3) The loss functions 708 are evaluated. This may include a loss function LF1 based on the realism matrix output at Step 1 such that the output of LF1 decreases with increase in the number of values of the realism matrix that indicate that the synthetic labels 726 are real. 
Step 3 may also include evaluating a loss function LF2 based on the realism matrix such that the output of LF2 decreases with increase in the number of values of the realism matrix that indicate that the synthetic labels 726 are fake). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kim/Krupani to incorporate the teachings of Kearney, and apply the discrimination result values into Kim/Krupani’s teachings for generating image change data by using a pre-trained image change data generating network based on the data parameters and the first image and the second image; outputting a discrimination result value by inputting the image change data into a pretrained discriminator; and updating the data parameters based on the discrimination result. Doing so would provide a diagnosis hierarchy to determine whether a treatment is appropriate, and the most probable claim adjudication outcome. Regarding claim 15, Kim in view of Krupani and Kearney, discloses the method of claim 14, and further discloses wherein the discriminator is trained to determine whether image change data generated by using the pre-trained image change data generating network (see Claim 14 rejection for detailed analysis) is image change data generated using the first environmental image or the second environmental image (Kim- ¶0074, at least discloses the speed and direction of movement of objects in the image can be known through the difference between the direction of movement that should actually be observed and the direction of movement that is predicted, thereby increasing learning efficiency and speed; Fig. 4 and ¶0152-0153, at least disclose The input images 404 may be raw 128×128 images […] the images 404 are 3D images, such as a CT scan; Krupani- ¶0120-0121, at least disclose The video data 104 comprises a plurality of images 108, such as 108-1, 108-2, . . . 108-N (collectively “images”), in which a UAV 110 can be moving […] The differential image data 114 can be determined by comparing changes between two different image frames, e.g. images 108-1 and 108-2, across each pixel; Fig. 1 and ¶0136, at least disclose in environment 100, the image processing system 102 can be configured to generate RGB data 112 as differential RGB data by using differences between RGB data of successive or subsequent frames of the video data 104 […] the differential RGB data can include multiple sets of differential data, in comparison to greyscale differential data, such as by obtaining a difference between the red data of image 108-1 and the red data of image 108-2, obtaining a difference between the blue data of image 108-1 and the blue data of image 108-2, and obtaining a difference between the green data of image 108-1 and the green data of image 108-2; ¶0149, at least discloses The model can be trained to dynamically adapt the variations in target object appearance, motion, scale, lighting conditions, environmental conditions, etc. making it well-suited for the real world where the object can be constantly moving, and the location can be changing over time), and wherein the discriminator outputs a first discrimination value when determining that the image change data generated by using the first environment image (Kearney- Fig. 5 and ¶0160, at least disclose The discriminator 514 produces as an output 524 a realism matrix that is an attempt to differentiate between real and fake images.
The realism matrix is a matrix of values, each value being an estimate as to which of the two input images is real; Fig. 12 and ¶0258, at least disclose The loss functions 1208 may be implementing using weighted L1 loss between the synthetic image 1216 and input image 1204 without masking […] The discriminator 1214 may be pretrained in some embodiments such that it is not updated during training and only the generator 1212 is trained; Fig. 32C and ¶0432-0433, at least disclose The machine learning models 3220 a, 3220 b may have the same structure as the CNN 3220 as described above and may be pretrained as described above for the CNN 3220 or may be exclusively trained using the approach described below. Each machine learning model 3220 a, 3220 b takes as inputs an image 3204 a, 3204 b, respectively, each with one or more corresponding labels 3206 a-3212 a, 3206 b-3212 b. The inputs are processed using each machine learning model 3220 a, 3220 b to obtain two sets of values 3222 a, 3222 b characterizing the inputs; Fig. 34 and ¶0457, at least discloses the discriminator 3410 may be pretrained and is not further trained during training of the generator 3402 and discriminator 3408. The discriminator 3408 may output a realism matrix 3420 with each output of the realism matrix 3420 indicating which of the two input images 3416, 3418 is determined to be a real image by the discriminator 3408; Fig. 38A and ¶0491, at least disclose the discriminator 3810 may be pretrained and is not further trained during training of the generator 3802 and discriminator 3808. The discriminator 3808 may output a realism matrix 3820 with each output of the realism matrix 3820 indicating which of the two input images 3816, 3818 is a real image), and a second discrimination value when determining that the image change data generated by using the second environment image (Kearney- Fig. 5 and ¶0160, at least disclose The discriminator 514 produces as an output 524 a realism matrix that is an attempt to differentiate between real and fake images. The realism matrix is a matrix of values, each value being an estimate as to which of the two input images is real; Fig. 12 and ¶0258, at least disclose The loss functions 1208 may be implementing using weighted L1 loss between the synthetic image 1216 and input image 1204 without masking […] The discriminator 1214 may be pretrained in some embodiments such that it is not updated during training and only the generator 1212 is trained; Fig. 32C and ¶0432-0433, at least disclose The machine learning models 3220 a, 3220 b may have the same structure as the CNN 3220 as described above and may be pretrained as described above for the CNN 3220 or may be exclusively trained using the approach described below. Each machine learning model 3220 a, 3220 b takes as inputs an image 3204 a, 3204 b, respectively, each with one or more corresponding labels 3206 a-3212 a, 3206 b-3212 b. The inputs are processed using each machine learning model 3220 a, 3220 b to obtain two sets of values 3222 a, 3222 b characterizing the inputs; Fig. 34 and ¶0457, at least discloses the discriminator 3410 may be pretrained and is not further trained during training of the generator 3402 and discriminator 3408. The discriminator 3408 may output a realism matrix 3420 with each output of the realism matrix 3420 indicating which of the two input images 3416, 3418 is determined to be a real image by the discriminator 3408; Fig. 
38A and ¶0491, at least disclose the discriminator 3810 may be pretrained and is not further trained during training of the generator 3802 and discriminator 3808. The discriminator 3808 may output a realism matrix 3820 with each output of the realism matrix 3820 indicating which of the two input images 3816, 3818 is a real image). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kim/Krupani to incorporate the teachings of Kearney, and apply the discrimination values into Kim/Krupani’s teachings so that the discriminator outputs a first discrimination value when determining that the image change data generated by using the first environment image, and a second discrimination value when determining that the image change data generated by using the second environment image. The same motivation that was utilized in the rejection of claim 14 applies equally to this claim. Regarding claim 16, Kim in view of Kearney, discloses the method of claim 15, and further discloses wherein the method is repeatedly performed so that the discrimination result value reaches the first discrimination value (Kearney- ¶0158-0160, at least disclose The discriminator 514 may include five multi-scale stages 522 […] The discriminator 514 produces as an output 524 a realism matrix that is an attempt to differentiate between real and fake images. The realism matrix is a matrix of values, each value being an estimate as to which of the two input images is real. The loss function 508 may then operate on an aggregation of the values in the realism matrix, e.g. average of the values, a most frequently occurring value of the values, or some other function. The closer the aggregation is to the correct conclusion (determining that the synthetic image 520 is fake), the lower the output of the loss function 508; Fig. 6B and ¶0169-0181, at least disclose Steps 1 through 10 may be repeated until an ending condition is reached, such as when the discriminators 616, 620 can no longer distinguish between synthetic and real images (e.g., only correct 50 percent of the time), a Nash equilibrium is reached, or some other ending condition is reached). Regarding claim 17, Kim in view of Kearney, discloses the method of claim 15, and further discloses wherein the pre-trained discriminator is pre-trained to determine whether image change data generated based on the first environment image or the second environment image, by using an adversarial loss function, in order to provide a basis for updating the data parameters (Kearney- Fig. 5 and ¶0158-0161, at least disclose the machine learning model 510 may be embodied as a generative adversarial network (GAN) including a generator 512 and a discriminator 514 […] The discriminator 514 may include five multi-scale stages 522 […] The discriminator 514 produces as an output 524 a realism matrix that is an attempt to differentiate between real and fake images. The realism matrix is a matrix of values, each value being an estimate as to which of the two input images is real. The loss function 508 may then operate on an aggregation of the values in the realism matrix, e.g. average of the values, a most frequently occurring value of the values, or some other function […] the loss functions 508 utilize level 1 (L1) loss to help maintain the spatial congruence of the synthetic image 520 and real image 506 and adversarial loss to encourage realism). Allowable Subject Matter 9.
Claim 6 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. 10. The following is a statement of reasons for the indication of allowable subject matter: Regarding Claim 6, the combination of prior arts teaches the method of Claim 1. However, in the context of claims 1, 2, 3, 4, 5 and 6 as a whole, the combination of prior arts does not teach wherein the task loss is calculated based on at least one of the following equations: [two equations reproduced as images in the original office action; not rendered here] where Ltask denotes the task loss, Ltarget denotes the third loss function that is a loss function for the first proxy network, Lbase denotes the fourth loss function that is a loss function for the second proxy network, and α, β, γ, ε are hyperparameters. Therefore, Claim 6 in the context of claims 1, 2, 3, 4 and 5 as a whole does comprise allowable subject matter. Conclusion 11. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. They are as recited in the attached PTO-892 form. 12. Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL LE whose telephone number is (571)272-5330. The examiner can normally be reached 9am-5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /MICHAEL LE/Primary Examiner, Art Unit 2614
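As context for the §103 mapping above, the per-channel frame differencing the rejection draws from Krupani (¶0136) and the frame-to-frame change Kim computes via optical flow (¶0066) can be illustrated with a minimal sketch. The function name, array shapes, random stand-in frames, and the use of NumPy are illustrative assumptions only; nothing below is taken from the cited references or the pending claims.

```python
# Minimal sketch (assumptions noted above): signed per-channel differences
# between two consecutive RGB frames, in the spirit of Krupani's differential
# RGB data (¶0136) and the frame-to-frame change Kim derives via optical flow
# (¶0066). Not an implementation of the claimed method.
import numpy as np

def differential_rgb(frame_t: np.ndarray, frame_t1: np.ndarray) -> np.ndarray:
    """Return per-channel signed differences between frames at t and t+1.

    frame_t, frame_t1: H x W x 3 uint8 arrays (consecutive frames).
    """
    if frame_t.shape != frame_t1.shape:
        raise ValueError("consecutive frames must share the same shape")
    # One difference map per colour channel (R, G, B), as in the cited passage.
    return frame_t1.astype(np.int16) - frame_t.astype(np.int16)

# Random frames standing in for images 108-1 and 108-2 of the reference.
rng = np.random.default_rng(0)
f0 = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
f1 = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
change = differential_rgb(f0, f1)   # H x W x 3 signed "image change data"
```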
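Similarly, the adversarial arrangement cited from Kearney (a pre-trained discriminator that is frozen, emits a realism matrix, and drives updates to the generating side until it is only about 50 percent correct; ¶0258, ¶0457, ¶0169-0181) can be sketched as follows. The network layers, the binary cross-entropy aggregation of the realism matrix, the tensor shapes, and the fixed iteration count are assumptions for illustration; this is not asserted to be the claimed method or Kearney's implementation.

```python
# Hedged sketch: frozen pre-trained discriminator scoring generated image
# change data with a patch-wise "realism matrix"; only the generating network
# is updated from the aggregated discrimination result. All architectures and
# shapes are stand-ins chosen for this example.
import torch
import torch.nn as nn

discriminator = nn.Sequential(            # stand-in for a pretrained patch critic
    nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 1, 4, stride=2, padding=1),  # output: realism matrix (1 x h x w)
)
for p in discriminator.parameters():
    p.requires_grad_(False)               # pretrained and not updated during training

generator = nn.Sequential(                # stand-in "image change data generating network"
    nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

frames = torch.randn(1, 6, 64, 64)        # first and second image stacked on channels
for step in range(100):                    # fixed count here; see stopping note below
    change_data = generator(frames)        # generated image change data
    realism = discriminator(change_data)   # per-patch real/fake estimates
    # Adversarial loss is low when the aggregated realism matrix calls the
    # generated change data "real" (analogous to the cited LF4-style loss).
    loss = bce(realism, torch.ones_like(realism))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice, the stopping condition quoted from Kearney (the discriminator no longer distinguishing real from generated data, roughly 50 percent accuracy) would be checked by periodically scoring held-out real examples rather than by running a fixed number of steps.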

Prosecution Timeline

Feb 14, 2024
Application Filed
Jan 05, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579211
AUTOMATED SHIFTING OF WEB PAGES BETWEEN DIFFERENT USER DEVICES
2y 5m to grant Granted Mar 17, 2026
Patent 12579738
INFORMATION PRESENTING METHOD, SYSTEM THEREOF, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
2y 5m to grant Granted Mar 17, 2026
Patent 12579072
GRAPHICS PROCESSOR REGISTER FILE INCLUDING A LOW ENERGY PORTION AND A HIGH CAPACITY PORTION
2y 5m to grant Granted Mar 17, 2026
Patent 12573094
COMPRESSION AND DECOMPRESSION OF SUB-PRIMITIVE PRESENCE INDICATIONS FOR USE IN A RENDERING SYSTEM
2y 5m to grant Granted Mar 10, 2026
Patent 12558788
SYSTEM AND METHOD FOR REAL-TIME ANIMATION INTERACTIVE EDITING
2y 5m to grant Granted Feb 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
66%
Grant Probability
88%
With Interview (+22.1%)
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 864 resolved cases by this examiner. Grant probability derived from career allow rate.
