DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim that this application is a National Stage entry of International Application No. PCT/JP2020/019009, filed on May 12, 2020.
Information Disclosure Statement
The information disclosure statement (“IDS”) filed on 11/08/2022 was reviewed and the listed references were noted.
Drawings
The drawings (13 pages) have been considered and placed on record in the file.
Status of Claims
Claims 1-6 are pending.
Response to Amendment
The amendment filed 10/03/2025 has been entered. Claims 1-5 and new claim 6 remain pending.
Response to Arguments
Applicant's arguments filed 10/03/2025 have been fully considered but they are not persuasive.
On page 5 of the Remarks, Applicants contend that Huang does not teach or suggest “causing the second model to accept the feature vector for each frame image as input, and output a temporal interval between a frame image treated as a sequential reference and each of frame images other than the frame image treated as the sequential reference” as required by amended independent claim 1. Applicants argue that the time loss in Huang represents a difference between frames that is not related to time or a length of time. The Examiner respectfully disagrees with this characterization of Huang and submits that the reference does indeed disclose the limitation in question.
Huang discloses that the neural network model in Figure 4 outputs intermediate images in accordance with the input frames. The input frames are sequential to each other as explained in [0032]: “the early video frame in time sequence is a video frame having an earlier timestamp in the video frames that are adjacent in time; the later video frame in time sequence is a video frame having a later timestamp in the video frames that are adjacent in time. For example, the video frames that are adjacent in time are sequentially x1, x2, and x3 in time sequence. Therefore, x1 is the early video frame in time sequence relative to x2 and x3, x2 is the later video frame in time sequence relative to x1, and x2 is the early video frame in time sequence relative to x3”. Huang teaches obtaining a time loss between the intermediate image that corresponds to the later video frame and the expected-intermediate image, which corresponds to the early video frame, as shown in step S208 of Figure 2. In the Specification of the instant application, [0112], the first frame image in the temporal sequence is treated as the reference image. Huang teaches this because the expected-intermediate image is based on the early video frame, or first frame, which is analogous to the sequential reference frame. In [0043] of Huang, the time loss is defined as follows: “time loss may be used for representing a difference between change, in time domain, of the video frames that are adjacent in time and change, in time domain, between images obtained after the video frames that are adjacent in time are processed by the neural network model”. This means the time loss is a difference in the time domain, which in this case is the length of time between the early video frame and the later video frame. “Temporal” means relating to time, and “interval” means a space between two things; therefore, under the broadest reasonable interpretation (BRI), the difference between two frames in the time domain is analogous to a “temporal interval”. Therefore, Huang discloses the limitation “causing the second model to accept the feature vector for each frame image as input, and output a temporal interval between a frame image treated as a sequential reference and each of frame images other than the frame image treated as the sequential reference”.
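For illustration only, the following minimal Python (numpy) sketch reflects the Examiner's reading of Huang's time loss: the intermediate image of the early frame is warped by optical flow to form the expected-intermediate image, and the time-domain difference from the later frame's intermediate image is measured. The nearest-neighbor warp and the mean-squared-difference form are assumptions made for illustration, not Huang's exact formulation.

import numpy as np

def warp(image, flow):
    # Warp an image by a dense optical-flow field (nearest-neighbor backward warp, for illustration).
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys - flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs - flow[..., 0]).astype(int), 0, w - 1)
    return image[src_y, src_x]

def time_loss(y_early, y_late, flow_early_to_late):
    # Difference, in the time domain, between the later intermediate image and the
    # expected-intermediate image obtained by warping the earlier intermediate image.
    expected_late = warp(y_early, flow_early_to_late)
    return float(np.mean((y_late - expected_late) ** 2))  # mean-squared difference (assumed form)

y1 = np.random.rand(64, 64)       # intermediate image for the early frame x1 (illustrative data)
y2 = np.random.rand(64, 64)       # intermediate image for the later frame x2
g2 = np.random.randn(64, 64, 2)   # optical flow of change from x1 to x2
print(time_loss(y1, y2, g2))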
On page 6 of the Remarks, Applicants contend that Huang does not teach or suggest “changing a temporal sequence of the frame images” as required by dependent claim 2. Applicants argue that what is being changed in Huang are the intermediate images, not the temporal sequence. The Examiner respectfully disagrees with this characterization of Huang and submits that the reference does indeed disclose the limitation in question.
Huang discloses that the input frames are sequential to each other as explained in [0032]: “the early video frame in time sequence is a video frame having an earlier timestamp in the video frames that are adjacent in time; the later video frame in time sequence is a video frame having a later timestamp in the video frames that are adjacent in time. For example, the video frames that are adjacent in time are sequentially x1, x2, and x3 in time sequence. Therefore, x1 is the early video frame in time sequence relative to x2 and x3, x2 is the later video frame in time sequence relative to x1, and x2 is the early video frame in time sequence relative to x3”. Changing the temporal sequence of the frame images means rearranging the order of the frames. Huang teaches changing this sequence, or the order of the frames, before determining the time loss between them. In [0040] of Huang, changing the sequence of the intermediate images is taught: “in an embodiment of this application, the electronic device may alternatively change, according to optical flow information between two video frames that are not adjacent to each other in the video frames, an intermediate image corresponding to an early video frame in time sequence in the two video frames that are not adjacent to each other, to obtain an intermediate image anticipated to correspond to a later video frame in time sequence in the two video frames that are not adjacent to each other. For example, the video frames that are adjacent in time are sequentially x1, x2, and x3 in time sequence, and intermediate images of x1, x2, and x3 and output by the neural network model are correspondingly sequentially y1, y2, and y3. Optical flow information of change from x1 to x3 is g3. The electronic device may change y1 to z3 according to g3, and z3 is an intermediate image anticipated to correspond to x3”. Huang teaches changing the image of the first, or early, video frame to correspond to x3, the last frame, which as a result changes the sequence of the video frames. Therefore, Huang discloses the limitation “changing a temporal sequence of the frame images”.
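For illustration only, the following toy Python sketch shows the Examiner's reading of Huang's paragraph [0040]: the image y1, originally associated with the first frame x1, is changed according to the optical flow g3 between the non-adjacent frames x1 and x3 and is then treated as the image for x3 (z3), rearranging the association between images and temporal positions. The name warp_by_flow is a placeholder for Huang's optical-flow change, not an actual function of the reference.

frames = ["x1", "x2", "x3"]                          # video frames, adjacent in time
intermediate = {"x1": "y1", "x2": "y2", "x3": "y3"}  # images output by the neural network model

def warp_by_flow(image_name, flow_name):
    # Placeholder for Huang's change of an image according to optical flow information.
    return "warp(" + image_name + ", " + flow_name + ")"

# y1, originally associated with x1, is changed according to g3 and treated as the image for x3 (z3),
# so the association between images and temporal positions is rearranged.
intermediate["x3"] = warp_by_flow(intermediate["x1"], "g3")
print(intermediate)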
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-5 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Huang et al. (US 20190228264 A1), hereinafter referred to as Huang.
Claim 1
Huang discloses a learning apparatus (Huang, Fig. 1B) comprising:
a memory including a first model and a second model (Huang, Fig. 1B, memory 104, Fig. 4, neural network model, time domain loss function, and evaluation network model); and
a processor (Huang, Fig. 1B, processor 102) configured to execute:
causing the first model (Huang, Fig. 4, neural network model) to accept a plurality of frame images included in a video as input (Huang, Fig. 2, step S202, obtain a plurality of video frames), and output a feature vector for each frame image (Huang, Fig. 4, the output of the neural network model, Abstract, “inputting the plurality of video frames through a neural network model so that the neural network model outputs intermediate images”);
causing the second model to accept the feature vector for each frame image as input (Huang, Fig. 4, the output of the neural network model is used by the time domain loss function as an input), and output a temporal interval between a frame image treated as a sequential reference and each of the frame images other than the frame image treated as the sequential reference (Huang, Abstract, “determining a time loss between an intermediate image corresponding to the later video frame and the expected-intermediate image”, the expected-intermediate image is analogous to the reference frame; Fig. 2, step S210; [0043], the time loss is defined as “time loss may be used for representing a difference between change, in time domain, of the video frames that are adjacent in time and change, in time domain, between images obtained after the video frames that are adjacent in time are processed by the neural network model”, which means it is a difference in the time domain, which in this case is the length of time between the early video frame and the later video frame; “temporal” means relating to time, and “interval” means a space between two things; therefore, under BRI, the difference between two frames in the time domain is analogous to a “temporal interval”); and
updating parameters of the first and second models such that each of the temporal intervals output from the second model approaches each temporal interval computed from time-related information pre-associated with each frame image (Huang, [0016], “When a neural network model is trained, a time loss and a feature loss are used together as a feedback and adjustment basis to adjust the neural network model, to obtain the neural network model used for image processing through training”, [0050], “training the neural network model may include adjusting values of one or more parameters/operators in the neural network model”).
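For illustration only, the following minimal PyTorch sketch reflects the arrangement recited in claim 1 as mapped above: a first model outputs a feature vector per frame image, a second model outputs a temporal interval between the reference frame and every other frame, and both models are updated so the predicted intervals approach intervals computed from per-frame time-related information. The layer sizes, L1 loss, and data are illustrative assumptions, not the architecture of either the instant application or Huang.

import torch
import torch.nn as nn

# "First model": per-frame feature extractor; "second model": temporal-interval regressor.
first_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
second_model = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(list(first_model.parameters()) + list(second_model.parameters()), lr=1e-3)

frames = torch.rand(5, 3, 32, 32)                     # five frame images from one video (illustrative)
timestamps = torch.tensor([0.0, 0.2, 0.4, 0.6, 0.8])  # time-related information pre-associated with each frame

features = first_model(frames)                        # feature vector for each frame image
reference, others = features[0], features[1:]         # first frame in the temporal sequence as the reference
pairs = torch.cat([reference.expand_as(others), others], dim=1)
predicted_intervals = second_model(pairs).squeeze(1)  # temporal interval output for each non-reference frame
target_intervals = timestamps[1:] - timestamps[0]     # intervals computed from the time-related information

loss = nn.functional.l1_loss(predicted_intervals, target_intervals)  # assumed loss form
optimizer.zero_grad()
loss.backward()
optimizer.step()                                      # parameters of both models are updated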
Claim 2
Huang discloses the learning apparatus according to Claim 1 (Huang, Fig. 1B), wherein the processor (Huang, Fig. 1B, processor 102) is further configured to execute:
changing a temporal sequence of the frame images (Huang, [0016], “an intermediate image corresponding to an early video frame in time sequence is changed according to optical flow information of change from the early video frame in time sequence to a later video frame in time sequence”, [0037], “The intermediate image corresponding to the early video frame may be changed according to the optical flow information to obtain the expected-intermediate image”, [0040], “in an embodiment of this application, the electronic device may alternatively change, according to optical flow information between two video frames that are not adjacent to each other in the video frames, an intermediate image corresponding to an early video frame in time sequence in the two video frames that are not adjacent to each other, to obtain an intermediate image anticipated to correspond to a later video frame in time sequence in the two video frames that are not adjacent to each other, For example, the video frames that are adjacent in time are sequentially x1, x2, and x3 in time sequence, and intermediate images of x1, x2, and x3 and output by the neural network model are correspondingly sequentially y1, y2, and y3. Optical flow information of change from x1 to x3 is g3. The electronic device may change y1 to z3 according to g3, and z3 is an intermediate image anticipated to correspond to x3”),
generating information indicating a time difference or a difference in a frame ID between the frame image treated as the reference being a first frame image in the temporal sequence, and each of a second frame image and subsequent frame images among the frame images (Huang, [0043], “The time loss may be used for representing a difference between change, in time domain, of the video frames that are adjacent in time and change, in time domain, between images obtained after the video frames that are adjacent in time are processed by the neural network model. Specifically, the electronic device may compare the intermediate image corresponding to the later video frame in time sequence with the image that is obtained after the intermediate image corresponding to the early video frame in time sequence is changed according to the optical flow information of change from the early video frame in time sequence to the later video frame in time sequence, to obtain a difference between the two images, and determine the time loss between the intermediate image corresponding to the later video frame in time sequence and the obtained image according to the difference.”),
storing in the memory information indicating each time difference or difference in the frame ID in association with the frame images for which the temporal sequence has been changed (Huang, [0004], “The electronic device include a memory storing instructions and a processor in communication with the memory”, [0037], “The intermediate image corresponding to the early video frame may be changed according to the optical flow information to obtain the expected-intermediate image”),
causing the first model to accept each frame image stored in the memory as input, and output a feature vector of each frame image (Huang, Fig. 2), and
updating the parameters of the first and second models such that the information indicating each time difference or difference in the frame ID output from the second model approaches the information indicating each time difference or difference in the frame ID stored in association with each frame image in the memory (Huang, [0016], “When a neural network model is trained, a time loss and a feature loss are used together as a feedback and adjustment basis to adjust the neural network model, to obtain the neural network model used for image processing through training”, [0050], “training the neural network model may include adjusting values of one or more parameters/operators in the neural network model”).
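For illustration only, the following short Python sketch follows the wording of claim 2 as mapped above: the temporal sequence of the frame images is changed, and the frame-ID difference between the reference frame and each subsequent frame is generated and stored for later use as the target the second model's output must approach. The plain-integer frame representation and the choice of the first frame of the changed sequence as the reference are illustrative assumptions.

import random

frame_ids = [0, 1, 2, 3, 4]                            # original temporal sequence of frame images
shuffled = random.sample(frame_ids, k=len(frame_ids))  # changed temporal sequence

reference = shuffled[0]                                # frame image treated as the reference (assumed: first in the changed sequence)
id_differences = [fid - reference for fid in shuffled[1:]]  # difference in frame ID from the reference

# Stored in memory: each remaining frame of the changed sequence with its frame-ID difference.
stored = list(zip(shuffled[1:], id_differences))
print(reference, stored)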
Claim 3
Huang discloses the learning apparatus according to Claim 1 (Huang, Fig. 1B), wherein the processor (Huang, Fig. 1B, processor 102) is further configured to execute causing the first model to accept a plurality of frame images included in the video (Huang, Fig. 2, S202, obtain a plurality of video frames) and either or both of sensor data associated with each frame image or information related to an object included in each frame image as input (Huang, [0057], “the electronic device may use the neural network model on which training is completed and that is used for extracting an image content feature to respectively extract image content features corresponding to the intermediate images and image content features corresponding to the input video frames corresponding to the intermediate images, then compare the image content features corresponding to the intermediate images and the image content features corresponding to the corresponding input video frames, to obtain a difference between the image content features, and determine the content losses between the intermediate images and the corresponding video frames according to the difference”, content is analogous to the object in each frame), and output the feature vector for each frame image (Huang, Fig. 4, the output of the neural network model, Abstract, “inputting the plurality of video frames through a neural network model so that the neural network model outputs intermediate images”).
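For illustration only, the following minimal PyTorch sketch reflects the claim 3 arrangement as mapped above: the first model accepts each frame image together with associated sensor data and/or object-related information and outputs one feature vector per frame image. The two-branch concatenation scheme and dimensions are illustrative assumptions, not the structure of Huang's neural network model.

import torch
import torch.nn as nn

class FirstModel(nn.Module):
    # Accepts frame images plus per-frame auxiliary data (sensor data and/or object information).
    def __init__(self):
        super().__init__()
        self.image_branch = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 120), nn.ReLU())
        self.aux_branch = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
        self.head = nn.Linear(128, 128)

    def forward(self, frames, aux):
        return self.head(torch.cat([self.image_branch(frames), self.aux_branch(aux)], dim=1))

frames = torch.rand(5, 3, 32, 32)     # frame images (illustrative sizes)
aux = torch.rand(5, 8)                # sensor data or object-related information per frame (assumed format)
features = FirstModel()(frames, aux)  # one feature vector per frame image
print(features.shape)                 # torch.Size([5, 128])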
Claim 4 is rejected for similar reasons as those described in claim 1. The additional elements that Huang discloses in claim 4 include: a learning method (Huang, Fig. 4) executed by a computer including a memory including a first model and a second model (Huang, Fig. 1B, memory 104; Fig. 4, neural network model, time domain loss function, and evaluation network model), and a processor (Huang, Fig. 1B, processor 102).
Claim 5 is rejected for similar reasons as those described in claim 1. The additional elements that Huang discloses in claim 5 include: a non-transitory computer-readable recording medium having computer-readable instructions stored thereon (Huang, Fig. 1B, memory 104), which, when executed, cause a computer (Huang, Fig. 1B) including a memory including a first model and a second model (Huang, Fig. 1B, memory 104; Fig. 4, neural network model, time domain loss function, and evaluation network model), and a processor (Huang, Fig. 1B, processor 102) to execute a learning process (Huang, Fig. 4).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Yan et al., "Semi-Supervised Video Salient Object Detection Using Pseudo-Labels" (2019), hereinafter referred to as Yan.
Claim 6
Huang discloses the learning apparatus according to Claim 1 (Huang, Fig. 1B).
Huang does not explicitly disclose wherein, as an objective task, the temporal interval is computed as a pseudo-label.
However, Yan teaches wherein, as an objective task, the temporal interval is computed as a pseudo-label (Yan, Section 3.3, “Given triplets of input video frames {Ii , Ik, Ij}(i < k < j), the proposed FGPLG model aims at generating a pseudo-label for frame Ik with ground truth Gi and Gj propagated from frame Ii and Ij , respectively. First, it computes the optical flow Oi→k from frame Ii to frame Ik with the off-the-shelf FlowNet 2.0. The optical flow Oj→k is obtained in the same way. Then, the label of frame Ik is estimated by applying a warping function to adjacent ground truth Gi and Gj”, “The generation model can be trained with sparsely annotated frames to generate denser pseudo-labels. In our experiments, we use a fixed interval l to select sparse annotations for training. We take an annotation every l frames, i.e., the interval between the j th and k th frame, and the interval between the i th and k th frame are both equal to l. Experimental results show that the generation model designed in this way has a strong generalization ability. It can use the model trained by the triples sampled at larger interframe intervals to generate dense pseudo-labels of very high quality”).
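For illustration only, the following simplified Python (numpy) sketch reflects the pseudo-label generation cited from Yan above: ground-truth labels of two annotated frames are warped toward an unannotated intermediate frame using optical flow and fused into a pseudo-label. Yan uses FlowNet 2.0 and a learned fusion step; the nearest-neighbor warp and simple averaging below are illustrative stand-ins, not Yan's actual FGPLG model.

import numpy as np

def warp_mask(mask, flow):
    # Nearest-neighbor backward warp of a label mask by a dense optical-flow field (for illustration).
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys - flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs - flow[..., 0]).astype(int), 0, w - 1)
    return mask[src_y, src_x]

def pseudo_label(g_i, g_j, flow_i_to_k, flow_j_to_k):
    # Propagate the two annotated ground truths toward the middle frame and fuse them (assumed fusion: averaging).
    return 0.5 * (warp_mask(g_i, flow_i_to_k) + warp_mask(g_j, flow_j_to_k))

g_i = (np.random.rand(64, 64) > 0.5).astype(np.float32)  # ground truth G_i for frame I_i (illustrative)
g_j = (np.random.rand(64, 64) > 0.5).astype(np.float32)  # ground truth G_j for frame I_j
flow_ik = np.random.randn(64, 64, 2)                      # stands in for optical flow from I_i to I_k
flow_jk = np.random.randn(64, 64, 2)                      # stands in for optical flow from I_j to I_k
print(pseudo_label(g_i, g_j, flow_ik, flow_jk).shape)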
Huang and Yan are both considered to be analogous art to the claimed invention because they are in the same field of video sequence processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus taught by Huang to incorporate the teachings of Yan such that, as an objective task, the temporal interval is computed as a pseudo-label. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to learn spatial and temporal cues for both contrast inference and coherence enhancement (Yan, Abstract).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Lee et al., “Unsupervised Representation Learning by Sorting Sequences” – teaches changing the sequences of the video frames as well as determining the temporal interval.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENISE G ALFONSO whose telephone number is (571)272-1360. The examiner can normally be reached Monday - Friday 7:30 - 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amandeep Saini can be reached at (571)272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DENISE G ALFONSO/Examiner, Art Unit 2662
/AMANDEEP SAINI/Supervisory Patent Examiner, Art Unit 2662