DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2/17/2026 has been entered.
Response to Amendment
This Office action is in response to communications filed 2/1/2026. Claims 1, 4-7, 10-11, 15 and 18-20 are pending in this action.
Response to Arguments
Applicant’s arguments with respect to claims 1-20 have been considered but are not persuasive. In response to Applicant’s arguments on page 10 that “Tandon, however, does not teach or suggest "identifying, for each clip of interest included in the target video, at least one interest frame from the clip of interest based on attention information of each video frame from a plurality of video frames in the clip of interest”, the Examiner respectfully disagrees. Tandon teaches, at least at [0021], that “a segmentation application operating on a computing device uses a trained predictive model to provide aesthetic scores for each video frame. The segmentation application determines, from the aesthetic scores, that a quality threshold has been met between the first and second video frames and that a duration threshold has been met between the first and second video frames. The segmentation application segments the video accordingly”,
at least at [0043] that “aesthetics score can be based on one or more components. For example, an aesthetic score can be computed based on color harmony, the balance of elements in a frame, whether content is interesting, depth of field, whether the light in the scene is interesting, which object is the emphasis of the scene, whether there is repetition, rule of thirds, vivid colors, or symmetry”, and
at least at [0078] that “aesthetic-based video segmentation including segmentation in conjunction with face detection or scene detection can be used for a variety of additional applications. For example, computing system 101 can generate a video summary. A video summary can be a shorter version of the original video 102, with only the most interesting segments. For example, computing system 101 can rank the determined video segments 180a-m according to a criterion such as an aesthetic score. Segmentation application 110 can use those segments with the highest aesthetic score in a video summary. Similarly, segmentation application 110 can create a slideshow of key images by using key frames from the highest-ranked video segments”.
Therefore, the components and/or aesthetic scores of the video frames are reasonably “attention information of each video frame from a plurality of video frames in the clip of interest”, given the limitation’s broadest reasonable interpretation. Further, as key frames are determined and used to create the video summary based on the aesthetic-based segmentation, this reasonably teaches “based on attention information of each video frame from a plurality of video frames in the clip of interest” given the limitation’s broadest reasonable interpretation, and the rejection of record is maintained.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mahyar (of record) in view of Tandon (of record).
Regarding claims 1 and 20, Mahyar discloses a video summarization method (see Mahyar, at least at col 2, lines 17-51, col 8, lines 19-27, Fig. 9, and other related text) comprising:
obtaining an attention coding parameter of a user based on behavior data of the user (see Mahyar, at least at col 4, lines 19-34, col 11, lines 34-39, and other related text);
determining, for each clip included in a target video, whether the clip is a clip of interest to the user, based on the attention coding parameter of the user (see Mahyar, at least at col 2, lines 18-41, col 4, lines 16-43, and other related text);
based on determining that at least one clip included in the target video is a clip of interest, obtaining a video summary of the target video (see Mahyar, at least at col 2, lines 36-54, col 8, lines 47-56, and other related text).
Mahyar does not specifically disclose based on determining that at least one clip included in the target video is the clip of interest, identifying, for each clip of interest included in the target video, at least one interest frame from the clip of interest based on attention information of each video frame from a plurality of video frames in the clip of interest; and
obtaining a video summary of the target video by combining the at least one interest frame from each clip of interest.
In an analogous art related to a system for analyzing video, Tandon discloses based on determining that at least one clip included in a target video is a clip of interest (see Tandon, at least at [0078], and other related text), identifying, for each clip of interest included in the target video (see Tandon, at least at [0078], and other related text), at least one interest frame from the clip of interest (see Tandon, at least at [0078], and other related text) based on attention information of each video frame from a plurality of video frames in the clip of interest (see Tandon, at least at [0021], [0043], [0078], and other related text); and
obtaining a video summary of the target video by combining the at least one interest frame from each clip of interest (see Tandon, at least at [0078], and other related text).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Mahyar to include the limitations as taught by Tandon for the advantage of more efficiently providing accurate video analysis and optimizing system resources.
Regarding claim 15, Mahyar in view of Tandon discloses a video summarization apparatus (see Mahyar, at least at Fig. 1 and related text, and see Tandon, at least at Fig. 7 and related text) comprising:
a user attention parameter generation unit (processor and code of processor that performs the specific functions, see Mahyar, at least at col 4, lines 19-34, col 11, lines 34-39, and other related text);
an interest frame extraction unit (processor and code of processor that performs the specific functions, see Mahyar, at least at col 2, lines 18-41, col 4, lines 16-43, and other related text, and see Tandon, at least at [0078]-[0080], Fig. 7, and other related text);
a combining unit (processor and code of processor that performs the specific functions, see Tandon, at least at [0078]-[0080], Fig. 7, and other related text);
a memory storing at least one instruction (see Mahyar, at least at Fig. 1 and related text); and
at least one processor configured to execute the at least one instruction (see Mahyar, at least at Fig. 1 and related text) to:
obtain, through the user attention parameter generation unit, an attention coding parameter of a user based on behavior data of the user (processor and code of processor that performs the specific function, see Mahyar, at least at col 4, lines 19-34, col 11, lines 34-39, and other related text);
determine, through the interest frame extraction unit, for each clip included in a target video, whether the clip is a clip of interest to the user, based on the attention coding parameter of the user (processor and code of processor that performs the specific function, see Mahyar, at least at col 2, lines 18-41, col 4, lines 16-43, and other related text);
identify, through the interest frame extraction unit, at least one interest frame from at least one clip of interest included in the target video (processor and code of processor that performs the specific function, see Tandon, at least at [0078]-[0080], Fig. 7, and other related text) based on attention information of each video frame from a plurality of video frames in the clip of interest (see Tandon, at least at [0021], [0043], [0078], and other related text); and
obtain, through the combining unit, a video summary of the target video by combining the at least one interest frame from the at least one clip of interest (processor and code of processor that performs the specific function, see Tandon, at least at [0078]-[0080], Fig. 7, and other related text).
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Mahyar (of record) in view of Tandon (of record), as applied to claim 1 above, and further in view of Gopalan (of record).
Regarding claim 2, Mahyar in view of Tandon does not specifically disclose wherein the behavior data comprises input-related information and a viewing behavior record of the user within a statistical window, the input-related information comprising at least one of input content information, a time when an input operation is performed, or a place where the input operation is performed.
In an analogous art relating to a system for analyzing video, Gopalan discloses behavior data comprising input-related information and a viewing behavior record of a user within a statistical window, the input-related information comprising at least one of input content information, a time when an input operation is performed, or a place where the input operation is performed (see Gopalan, at least at [0025], and other related text).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Mahyar in view of Tandon to include the limitations as taught by Gopalan for the advantage of providing a more efficient system that allows for more diverse data to be analyzed.
Claims 3 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Mahyar (of record) in view of Tandon (of record), as applied to claim 1 above, and further in view of Qiu (of record).
Regarding claims 3 and 16, Mahyar in view of Tandon does not specifically disclose wherein the obtaining the attention coding parameter of the user comprises:
obtaining a vector representation of the behavior data by coding the behavior data; and obtaining the attention coding parameter of the user by inputting the vector representation into a preset first self-attention calculation model to perform self-attention processing.
In an analogous art relating to a system for analyzing data, Qiu discloses obtaining an attention coding parameter of a user comprises: obtaining a vector representation of behavior data by coding the behavior data (see Qiu, at least at [0061], [0094], [0199]-[0201], and other related text); and
obtaining the attention coding parameter of the user by inputting the vector representation into a preset first self-attention calculation model to perform self-attention processing (see Qiu, at least at [0061], [0094], [0199]-[0201], and other related text).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Mahyar in view of Tandon to include the limitations as taught by Qiu for the advantage of providing a more efficient system that allows for more accurate results of analysis.
Claims 4 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mahyar (of record) in view of Tandon (of record), as applied to claim 1 above, and further in view of Rufenacht (of record).
Regarding claims 4 and 17, Mahyar in view of Tandon does not specifically disclose wherein the determining, for each clip included in the target video, whether the clip is the clip of interest comprises:
obtaining video frame vector representations of each video frame in the clip by coding each video frame in the clip; and
determining whether the clip is the clip of interest based on the video frame vector representations.
In an analogous art relating to a system for video analysis, Rufenacht discloses determining, for each clip included in a target video, whether the clip is a clip of interest (see Rufenacht, at least at page 3, lines 17-31, page 4, lines 10-32, page 5, lines 4-12, page 8, lines 27-32, page 9, lines 10-33, and other related text) comprising:
obtaining video frame vector representations of each video frame in the clip by coding each video frame in the clip (see Rufenacht, at least at page 3, lines 17-31, page 4, lines 10-32, page 5, lines 4-12, page 8, lines 27-32, page 9, lines 10-33, and other related text); and
determining whether the clip is a clip of interest based on the video frame vector representations (see Rufenacht, at least at page 3, lines 17-31, page 4, lines 10-32, page 5, lines 4-12, page 8, lines 27-32, page 9, lines 10-33, and other related text).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Mahyar in view of Tandon to include the limitations as taught by Rufenacht for the advantage of providing a more efficient system that allows for more accurate results of analysis.
Allowable Subject Matter
Claims 5-14 and 18-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHENEA DAVIS whose telephone number is (571)272-9524 and whose email address is CHENEA.SMITH@USPTO.GOV. The examiner can normally be reached M-F: 8:00 am - 4:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nathan Flynn can be reached at 571-272-1915. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHENEA DAVIS/Primary Examiner, Art Unit 2421