Prosecution Insights
Last updated: April 19, 2026
Application No. 18/984,364

CONTENT SUMMARIZATION LEVERAGING SYSTEMS AND PROCESSES FOR KEY MOMENT IDENTIFICATION AND EXTRACTION

Status: Final Rejection — §103
Filed: Dec 17, 2024
Examiner: ALFONSO, DENISE G
Art Unit: 2662
Tech Center: 2600 — Communications
Assignee: Salesting Inc.
OA Round: 4 (Final)

Grant Probability: 74% (Favorable)
OA Rounds: 5-6
To Grant: 3y 1m
With Interview: 94%

Examiner Intelligence

Career Allow Rate: 74%, above average (76 granted / 103 resolved; +11.8% vs Tech Center average)
Interview Lift: +19.8% (strong, roughly +20%) among resolved cases with an interview
Typical Timeline: 3y 1m average prosecution; 31 applications currently pending
Career History: 134 total applications across all art units
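
For transparency about how these headline numbers relate, here is a minimal sketch of the arithmetic, under the assumption (not documented by this dashboard) that the allow rate is simply granted over resolved and that the with-interview figure adds the interview lift to the career rate:

```python
# Reproduce the dashboard's headline metrics from raw counts.
# Assumption: allow rate = granted / resolved, and the "with interview"
# probability adds the interview lift to the career rate (capped at 100%).

granted, resolved = 76, 103
tc_avg_allow_rate = 0.620          # implied by the +11.8% delta shown above
interview_lift = 0.198             # allow-rate gap, with vs. without interview

allow_rate = granted / resolved                          # 0.7379 -> shown as 74%
delta_vs_tc = allow_rate - tc_avg_allow_rate             # ~ +11.8 points
with_interview = min(allow_rate + interview_lift, 1.0)   # ~ 94%

print(f"Career allow rate:  {allow_rate:.1%}")
print(f"Delta vs TC avg:    {delta_vs_tc:+.1%}")
print(f"With interview:     {with_interview:.0%}")
```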

Statute-Specific Performance

§101: 8.3% (-31.7% vs TC avg)
§103: 59.8% (+19.8% vs TC avg)
§102: 19.4% (-20.6% vs TC avg)
§112: 8.1% (-31.9% vs TC avg)
Comparisons are against a Tech Center average estimate • Based on career data from 103 resolved cases

Office Action

§103
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 09/09/2025 has been entered.

Response to Amendment

The amendment filed 11/14/2025 has been entered. Claims 1, 3-9, 11-17, 19-20 and new claims 21-23 remain pending in the application. Claims 2, 10, and 18 are cancelled.

Response to Arguments

Applicant's arguments with respect to claims 1, 9, and 17 have been considered, but some are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. On page 11 of the Remarks, Applicants contend that Chen does not teach or suggest "automatically determining weightings for each of the one or more sub-scores and assigning an overall importance score to each of the one or more segments based on the weightings for each of the one or more sub-scores" as required by amended independent claim 17. Applicants argue that Chen does not calculate these weights using an AI engine.

The Examiner respectfully disagrees with this characterization of Chen and submits that the reference does indeed disclose the limitation in question. In [0025], Chen describes Figure 1 as a flow chart that conceptually illustrates a process for generating a personalized playlist of video segments in accordance with an embodiment of the invention. Creating the personalized playlist is analogous to video summarization. Chen also describes that "machine learning techniques can be utilized to determine processes for ordering stories from a set of stories to create a personalized playlist as appropriate to the requirements of specific applications in accordance with embodiments of the invention" [0154]. One of the processes for creating the playlist is determining the weighting of the scores for the different segments. In [0154], Chen describes that "the 'importance' of a video segment can be scored and utilized to determine the order in which the video segments are presented in a playlist. In several embodiments, importance can be scored based upon factors including (but not limited to) the number of related video segments." Chen describes these different factors in [0105]: "factors including (but not limited to) a user's preferences with respect to sources and/or categories of video segments (s.sub.source, s.sub.category), recency (s.sub.time), and viewing history (s.sub.history) are considered in calculating the personalization weights". These factors are analogous to the different sub-scores used to determine the overall importance or weighting of the segment. Therefore, Chen teaches "automatically determining weightings for each of the one or more sub-scores and assigning an overall importance score to each of the one or more segments based on the weightings for each of the one or more sub-scores".
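
As a technical aside for readers weighing this argument: the contested limitation amounts to a weighted fusion of per-factor sub-scores into a single importance score per segment, in the spirit of Chen's personalization weights (s_source, s_category, s_time, s_history). The sketch below illustrates that reading only; the factor names, example weights, and scoring function are illustrative stand-ins, not Chen's implementation:

```python
# Weighted fusion of sub-scores into an overall importance score per segment,
# in the style of Chen [0105]/[0154]. The weights below are a hard-coded
# stand-in for values a trained model would produce; Chen's actual process
# (machine-learned ordering, [0154]) is not shown here.

SUB_SCORES = ("source", "category", "recency", "history")

def overall_importance(segment_scores: dict[str, float],
                       weights: dict[str, float]) -> float:
    """Return the weighted sum of a segment's sub-scores."""
    return sum(weights[k] * segment_scores[k] for k in SUB_SCORES)

# Illustrative weights, e.g. regression coefficients fit on watch-history data.
weights = {"source": 0.4, "category": 0.3, "recency": 0.2, "history": 0.1}

segments = [
    {"source": 0.9, "category": 0.7, "recency": 0.5, "history": 0.2},
    {"source": 0.3, "category": 0.8, "recency": 0.9, "history": 0.6},
]
ranked = sorted(segments, key=lambda s: overall_importance(s, weights),
                reverse=True)
for seg in ranked:
    print(round(overall_importance(seg, weights), 3), seg)
```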
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3-6, 8-9, 11-14, 16-17, 19-20, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Potapov, "Category-Specific Video Summarization" (2014), hereinafter referred to as Potapov, in view of Fu et al., "Multi-view video summarization" (2010), hereinafter referred to as Fu, in view of Yavagal et al. (US 10,592,750 B1), hereinafter referred to as Yavagal, in view of Bor-Chun et al. (US 2018/0225519 A1), hereinafter referred to as Bor-Chun, in further view of Chen et al. (US 2016/0014482 A1), hereinafter referred to as Chen.

Claim 1

Potapov discloses a method for analyzing multimedia content (Potapov, Section 3, kernel video summarization), comprising: receiving multimedia content, including audio data and video data (Potapov, Section 4, "We introduce a new dataset, called MED-summaries. The proposed benchmark simplifies the evaluation by introducing a clear and automatic evaluation procedure, that is tailored to category-specific summarization"); a first artificial intelligence (AI) engine (Potapov, Section 3.1, "Our Kernel Temporal Segmentation (KTS) method splits the video into a set of non-intersecting temporal segments.") automatically identifying one or more moment candidates in the multimedia content (Potapov, Section 3.1, "Given the matrix of frame-to-frame similarities defined through a positive-definite kernel, the algorithm outputs a set of optimal 'change points' that correspond to the boundaries of temporal segments. More precisely, let the video be a sequence of descriptors x_i ∈ X, i = 0, ..., n − 1. Let K : X × X → R be a kernel function between descriptors. Let H be the feature space of the kernel K(·, ·).
Denote φ : X → H the associated feature map, and ‖·‖_H the norm in the feature space H"; each frame is processed, and the "change points" that correspond to the boundaries of temporal segments are analogous to the moment candidates); automatically identifying one or more segments of the multimedia content based on the one or more moment candidates (Potapov, Section 3.1, "Our Kernel Temporal Segmentation (KTS) method splits the video into a set of non-intersecting temporal segments", "The proposed algorithm is described in Algo. 1. First, the kernel is computed for each pair of descriptors in the sequence. Then the segment variances are computed for each possible starting point t and segment duration d. It can be done efficiently by precomputing the cumulative sums of the matrix [34]. Then the dynamic programming algorithm is used to minimize the objective (2). It iteratively computes the best objective value for the first j descriptors and i change points. Finally, the optimal segmentation is reconstructed by backtracking. The total runtime cost of the algorithm is in O(m_max n^2). The penalization introduces a minimal computational overhead because the dynamic programming algorithm already computes L_{i,n} for all possible segment counts."); a second AI engine assigning importance scores to the one or more segments (Potapov, Section 3.2, "For each category, we train a linear SVM classifier from a set of videos with video-level labels, assuming that a classifier originally trained to classify the full videos can be used to score importance of small segments.", "At test time, we segment the video using the KTS algorithm and aggregate Fisher descriptors for each segment. The relevant classifier is then applied to the segment descriptors, producing the importance map of the video"); the second AI engine conducting a frame analysis for each sequentially subsequent frame (Potapov, Section 3.2, "At training time, we aggregate frame descriptors of a video as if the whole video was a single segment. In this way a video descriptor has the same dimensionality as a segment descriptor. For each category we use videos of the category as positive examples and the videos from the other categories as negatives. We train one binary SVM classifier per category", "At test time, we segment the video using the KTS algorithm and aggregate Fisher descriptors for each segment. The relevant classifier is then applied to the segment descriptors, producing the importance map of the video"; each segment is analyzed, which means frame analysis is done for each sequentially subsequent frame).

Potapov does not explicitly disclose the one or more moment candidates including at least one seed point and conducting a frame analysis for each sequentially subsequent frame from the at least one seed point via the importance scores. However, Fu teaches the one or more moment candidates including at least one seed point (Fu, Section V.A, "On the other hand, our graph partition is a k-way segmentation problem given sampled shots indicating seeds for candidate clusters.") and conducting a frame analysis for each sequentially subsequent frame from the at least one seed point via the importance scores (Fu, Section IV.A, "[F]or every shot detected, we first compute the differential image sequence of adjacent frames. Each image can then be converted into a binary image by comparing the absolute value of each pixel against a threshold"; Section V.A, "In addition, each event may have at least a central shot which has a high shot importance. We can take it as one of the best views recording this event. The random walks-based shot clustering fulfills these requirements in that we select the shots with higher importance as seeded nodes. Such shots just can be viewed as the centers of events."; Section IV.B, "By representing multi-view videos with graph, multi-view video summarization is converted into a task of selecting the most representative video shots. The selection of representative shots often varies with different people. In this sense, detecting representative shots generally involves understanding video content based on human perception and is very difficult. To make it computationally tractable, we instead quantitatively evaluate the shot importance by considering low-level image features as well as high-level semantics. We introduce a Gaussian entropy fusion model to fuse a set of low-level features such as color histogram and wavelet coefficients, and compute an importance score").

Potapov and Fu are both considered to be analogous to the claimed invention because they are in the same field of video summarization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Potapov to incorporate the teachings of Fu of the one or more moment candidates including at least one seed point and conducting a frame analysis for each sequentially subsequent frame from the at least one seed point via the importance scores. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to achieve a more robust way of video summarization (Fu, Section VII).
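
To make the cited KTS mechanics concrete, here is a toy re-implementation of the core idea: choose change points that minimize total within-segment variance by dynamic programming. It uses scalar frame descriptors and a fixed change-point count in place of Potapov's kernelized descriptors and penalized model selection, so it is a sketch of the technique, not the paper's algorithm:

```python
# Change-point segmentation in the spirit of Potapov's KTS: minimize total
# within-segment variance via dynamic programming, then backtrack boundaries.
# 1-D descriptors and a fixed number m of change points are simplifications.

import numpy as np

def segment_cost(prefix, prefix_sq, i, j):
    """Sum of squared deviations of x[i:j] from its mean, via prefix sums."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n

def kts_change_points(x, m):
    """Split descriptor sequence x into m+1 segments; return m boundaries."""
    n = len(x)
    prefix = np.concatenate([[0.0], np.cumsum(x)])
    prefix_sq = np.concatenate([[0.0], np.cumsum(np.square(x))])
    L = np.full((m + 1, n + 1), np.inf)   # L[k][j]: best cost, k cps in x[:j]
    back = np.zeros((m + 1, n + 1), dtype=int)
    for j in range(1, n + 1):
        L[0][j] = segment_cost(prefix, prefix_sq, 0, j)
    for k in range(1, m + 1):
        for j in range(k + 1, n + 1):
            for t in range(k, j):         # t: position of the k-th change point
                c = L[k - 1][t] + segment_cost(prefix, prefix_sq, t, j)
                if c < L[k][j]:
                    L[k][j], back[k][j] = c, t
    cps, j = [], n                        # backtrack the optimal boundaries
    for k in range(m, 0, -1):
        j = back[k][j]
        cps.append(j)
    return sorted(cps)

frames = np.array([0.1, 0.2, 0.1, 2.0, 2.1, 1.9, 5.0, 5.2, 5.1])
print(kts_change_points(frames, m=2))     # -> [3, 6]
```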
Potapov does not explicitly disclose wherein the at least one seed point defines an annotated timestamp of the multimedia content. However, Yavagal teaches wherein the at least one seed point (Yavagal, [Col. 8, lines 14-18], "the server(s) 112 may determine candidate video clips and may select a portion of the candidate video clips to include in a video summarization based on similarities between the candidate video clips"; [Col. 7, lines 4-10], "a second video tag may indicate the begin point and the end point associated with a single video clip, etc. As a second example, a single video tag may include multiple edits, such as a first video tag indicating the begin point and the end point associated with a single video clip along with the selected panning for the single video clip and the special effects and/or audio data associated with the selected video clip. The video tags may correspond to individual video clips or a group of video clips without departing from the disclosure.") defines an annotated timestamp of the multimedia content (Yavagal, [Col. 6, lines 36-46], "video tag is a tag (i.e., data structure) including annotation information that may be used in video summarization and/or rendering information that may be used to render a video. Examples of annotation information include an object, a person, an identity of a person, an angle relative to a camera axis, an area associated with a subject, a position associated with the subject, a timestamp (e.g., a time associated with receiving user input, a time associated with an individual video frame, a range of time associated with a sequence of video frames or the like) and/or other annotation data associated with video frame(s)").

Potapov, Fu, and Yavagal are all considered to be analogous to the claimed invention because they are in the same field of video summarization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Potapov to incorporate the teachings of Yavagal wherein the at least one seed point defines an annotated timestamp of the multimedia content. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that video tags help indicate characteristics corresponding to specific video frames (Yavagal, [Col. 3, lines 6-7]).

Potapov discloses that the KTS "automatically selects the number of segments" in Section 1, but the limitation also recites that the second AI engine assigns the importance score. Potapov teaches using the SVM classifier for importance scoring, which is a different AI engine from the KTS. The combination of Potapov in view of Fu in view of Yavagal does not explicitly disclose wherein a second AI engine determines a designated number of the one or more segments automatically. However, Bor-Chun teaches wherein a second AI engine (Bor-Chun, [0032], "In some example implementations, different neural networks may be used in 110 and 120, respectively"; Bor-Chun also teaches a plurality of neural networks) determines a designated number of the one or more segments automatically (Bor-Chun, [0031], "In other example implementations, the number of segments may be selected automatically by the neural network based on the total number of segments and/or the number of segments having an importance score exceeding a threshold"; [0030], "The neural network may assign an importance score to each segment based on the detected content features"; Bor-Chun teaches using the same neural network to assign the importance scores as well as to automatically designate the number of segments).

Potapov, Fu, Yavagal, and Bor-Chun are all considered to be analogous to the claimed invention because they are in the same field of video summarization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Potapov and Fu to incorporate the teachings of Bor-Chun wherein a second AI engine determines a designated number of the one or more segments automatically. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to achieve a more robust way of video summarization.

Potapov discloses receiving user-annotated videos to be used for training, as shown in Fig. 3. However, the combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun does not explicitly disclose annotating the multimedia content to identify the one or more segments within the multimedia content and displaying the annotated multimedia content on a user device. However, Chen teaches annotating the multimedia content to identify the one or more segments within the multimedia content (Chen, [0077], "In order to generate a playlist of video segments personalized to a user's preferences, the process 300 seeks to annotate the video segments with metadata describing the content of the segments. In a number of embodiments, a video segment linking process (306) is performed that seeks to identify additional sources of relevant data that describe the content of the video segment. In a number of embodiments, the video segment linking process (306) also seeks to identify relationships between video segments. In various contexts, including in the generation of personalized playlists of news stories, knowledge concerning the relationship between video segments can be useful in identifying video segments that contain cumulative content and can be excluded from a playlist without significant loss of information or content coverage. Information concerning the number of related stories can also provide an indication of the importance of the story.") and displaying the annotated multimedia content on a user device (Chen, [0057], "In the illustrated embodiment, the non-volatile memory 1930 includes a media decoder application 1932 that configures the processor 1010 to decode video for playback via display device a client application 1934 that configures the processor to render a user interface based upon metadata describing video segments contained within a personalized playlist 1926 retrieved from a playlist generation server system via the network interface 1940.", "The video segment being played back via the user interface is described by displaying the video segment's title 2004, source 2006, recency 2008, and number of views 2010 above the player region 2002. As can readily be appreciated, any of a variety of information describing a video segment being played back within a player region can be displayed in any location(s) within a user interface as appropriate to the requirements of specific applications.").

Potapov, Fu, Yavagal, Bor-Chun, and Chen are all considered to be analogous to the claimed invention because they are in the same field of video summarization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Potapov to incorporate the teachings of Chen of annotating the multimedia content to identify the one or more segments within the multimedia content and displaying the annotated multimedia content on a user device. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that the annotations would be useful for the user to fully understand the video summary and to improve personalized video playlist generation (Chen, [0171]).

Claim 3

The combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses the method of claim 1 (Potapov, Section 3, kernel video summarization), further comprising cutting the multimedia content to only include the one or more segments to produce summarized multimedia content (Potapov, Section 3.3, "Finally, a summary is constructed by concatenating the most important segments of the video. We assume that the duration of the summary is set a priori. Segments are included in the summary by the order of their importance until the duration limit is achieved (we crop the last segment to satisfy the constraint)").
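
Potapov's Section 3.3 summary construction, as quoted in the Claim 3 citation above, is essentially greedy selection under a duration budget. A minimal sketch follows; the segment tuples and budget are illustrative, and the user-supplied threshold of Claim 4 would play the role of max_duration:

```python
# Summary construction as Potapov Section 3.3 describes it: take segments in
# order of importance until a preset duration budget is hit, cropping the
# last segment to fit. The (start, end, importance) tuples are stand-ins.

def build_summary(segments, max_duration):
    """segments: list of (start_sec, end_sec, importance). Returns chosen clips."""
    chosen, used = [], 0.0
    for start, end, score in sorted(segments, key=lambda s: s[2], reverse=True):
        if used >= max_duration:
            break
        length = min(end - start, max_duration - used)  # crop the last segment
        chosen.append((start, start + length, score))
        used += length
    return sorted(chosen)                               # chronological order

clips = [(0, 20, 0.4), (35, 80, 0.9), (90, 120, 0.7)]
print(build_summary(clips, max_duration=60))
# -> [(35, 80, 0.9), (90, 105, 0.7)]  cropped to the 60-second budget
```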
Claim 4

The combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses the method of claim 3 (Potapov, Section 3, kernel video summarization), further comprising the second AI engine (Potapov, Sections 3.2 and 3.3) receiving a summarization threshold from the user device (Chen, Fig. 20B, user device), wherein the summarization threshold is a maximum final length of the summarized multimedia content (Chen, [0068], "the user's preferences can touch upon topic, content provider, and total playlist duration"). The proposed combination, as well as the motivation for combining the Potapov, Chen, and Fu references presented in the rejection of Claim 1, apply to Claim 4 and are incorporated herein by reference. Thus, the method recited in Claim 4 is met by Potapov, Chen, and Fu.

Claim 5

The combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses the method of claim 1 (Potapov, Section 3, kernel video summarization), wherein segments are identified using multiple different modes of analysis (Chen, [0068], "The playlist generation system 100 obtains video data streams and video segments from a variety of sources including (but not limited to) over-the-air broadcasts and cable television transmissions (102), online news websites (104), and social media services (106). In several embodiments, continuous data streams such as (but not limited to) over-the-air broadcasts and cable television transmissions (102) are segmented and the video segments stored for later retrieval. In a number of embodiments, a multi-modal segmentation process is utilized that considers a variety of video, audio, and/or text cues in the determination of segmentation boundaries.", Fig. 5B) and wherein the one or more segments are finally selected based on a multimodal aggregate including weighting of the multiple different modes of analysis (Chen, Fig. 5B, step 558, fuse cues to identify video segments; [0150], "The overall weightings c.sub.i for a video segment v.sub.i from the set of n recent video segments v can be expressed"). The proposed combination, as well as the motivation for combining the Potapov, Chen, and Fu references presented in the rejection of Claim 1, apply to Claim 5 and are incorporated herein by reference. Thus, the method recited in Claim 5 is met by Potapov, Chen, and Fu.

Claim 6

The combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses the method of claim 1 (Potapov, Section 3, kernel video summarization), further comprising the first AI engine automatically identifying the one or more moment candidates at least in part based on a transcript of the multimedia content (Chen, Fig. 5B, detect textual cues; [0010], "a video clip in the set of video clips further includes an audio channel and the at least one key feature of each video clip includes a text transcript of the audio channel"). The proposed combination, as well as the motivation for combining the Potapov, Chen, and Fu references presented in the rejection of Claim 1, apply to Claim 6 and are incorporated herein by reference. Thus, the method recited in Claim 6 is met by Potapov, Chen, and Fu.
Claim 8

The combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses the method of claim 6 (Potapov, Section 3, kernel video summarization), wherein the second AI engine includes a neural network (Bor-Chun, [0031], "In other example implementations, the number of segments may be selected automatically by the neural network based on the total number of segments and/or the number of segments having an importance score exceeding a threshold"; [0030], "The neural network may assign an importance score to each segment based on the detected content features"; Bor-Chun teaches using the same neural network to assign the importance scores as well as to automatically designate the number of segments). Potapov, Chen, Fu, and Bor-Chun are all considered to be analogous to the claimed invention because they are in the same field of video summarization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Potapov, Chen, and Fu to incorporate the teachings of Bor-Chun wherein the second AI engine includes a neural network. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to achieve a more robust way of video summarization.

Claims 9, 11-14, and 16

Claims 9, 11-14, and 16 are rejected for similar reasons as those described in claims 1 and 3-6, respectively. For the additional elements in Claims 9 and 11-14, the combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses: a first artificial intelligence (AI) module (Potapov, Section 3.1) automatically identifying one or more moment candidates of the multimedia content based on identified slide transitions in the video data (Chen, Fig. 5B, [0103], "As noted above, a match may represent that the candidate frame incorporates a logo and/or that the candidate frame corresponds to a frame from a transition animation. In many embodiments, the process of determining a match also involves determining a confidence metric that can also be utilized in the segmentation of a video data stream."). Potapov, Fu, Yavagal, Bor-Chun, and Chen are all considered to be analogous to the claimed invention because they are in the same field of video summarization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Potapov to incorporate the teachings of Chen of annotating the multimedia content to identify the one or more segments within the multimedia content, displaying the annotated multimedia content on a user device, and automatically identifying one or more moment candidates of the multimedia content based on identified slide transitions in the video data. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that the annotations would be useful for the user to fully understand the video summary and to improve personalized video playlist generation (Chen, [0171]).

Claims 17 and 19-20

Claims 17 and 19-20 are rejected for similar reasons as those described in claims 1, 3, and 5, respectively.
For the additional elements in Claims 17 and 19-20, the combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses: a first artificial intelligence (AI) engine (Potapov, Section 3.1) automatically identifying one or more moment candidates of the multimedia content based on the audio data, the video data, and/or a transcript of the multimedia content (Chen, Fig. 5B, detect textual cues, audio cues, and visual cues), wherein the one or more moment candidates are identified based on layout changes, speaker changes, topic changes, visual text changes, slide transitions, transcript information, and/or spoken text changes, assigning one or more sub-scores to one or more segments (Chen, [0091], "Some of the most important cues for story boundaries can be found in closed caption textual data incorporated within a video data stream. Often, >>> and >> markers are inserted to denote changes in stories or changes in speakers, respectively. Due to human errors, relying solely on these markers can provide inaccurate segmentation results. Therefore, segmentation analysis of closed caption data can be enhanced by looking for additional cues including (but not limited to) commonly used transition phrases that occur at segmentation boundaries."; [0095], "In the context of segmentation of news stories, several embodiments of the invention rely upon one or more of a set of visual cues as strong indicators of a segmentation boundary. In a number of embodiments, the set of visual cues includes (but is not limited to) anchor frames, logo frames, logo animation sequences and/or dark frames. In other embodiments and/or contexts, any of a variety of visual cues can be utilized as appropriate to the requirements of specific applications."); and a second AI engine (Potapov, Section 3.2; Chen, [0154], "machine learning techniques can be utilized to determine processes for ordering stories from a set of stories to create a personalized playlist as appropriate to the requirements of specific applications in accordance with embodiments of the invention") automatically determining weightings for each of the one or more sub-scores and assigning an overall importance score to each of the one or more segments based on the weightings for each of the one or more sub-scores (Chen, [0154], "In a number of embodiments, the 'importance' of a video segment can be scored and utilized to determine the order in which the video segments are presented in a playlist. In several embodiments, importance can be scored based upon factors including (but not limited to) the number of related video segments. In the context of news stories, the number of related video segments within a predetermined time period can be indicative of breaking news. Therefore, the number of related video segments to a video segment within a predetermined time period can be indicative of importance."; [0153], "more general preferences can be utilized to modify source and/or category preference scores that are separately used to weight video segments."; [0105], "factors including (but not limited to) a user's preferences with respect to sources and/or categories of video segments (s.sub.source, s.sub.category), recency (s.sub.time), and viewing history (s.sub.history) are considered in calculating the personalization weights").

Potapov and Chen are both considered to be analogous to the claimed invention because they are in the same field of video summarization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Potapov to incorporate the teachings of Chen of a first artificial intelligence (AI) engine automatically identifying one or more moment candidates of the multimedia content based on the audio data, the video data, and/or a transcript of the multimedia content, wherein the one or more moment candidates are identified based on layout changes, speaker changes, topic changes, visual text changes, slide transitions, transcript information, and/or spoken text changes; assigning one or more sub-scores to one or more segments; a second AI engine automatically determining weightings for each of the one or more sub-scores and assigning an overall importance score to each of the one or more segments based on the weightings for each of the one or more sub-scores; and annotating the multimedia content to identify the one or more segments within the multimedia content and displaying the annotated multimedia content on a user device. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that the annotations would be useful for the user to fully understand the video summary and to improve personalized video playlist generation (Chen, [0171]).

Claim 23

The combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses the method of claim 17 (Potapov, Section 3, kernel video summarization), wherein the one or more sub-scores comprise information-theoretic measures, including entropy, Bayesian surprise, and/or Bayesian perplexity (Chen teaches determining sub-scores to calculate an overall importance score; Fu teaches using Gaussian entropy to compute an importance score, Section IV.B, "We introduce a Gaussian entropy fusion model to fuse a set of low-level features such as color histogram and wavelet coefficients, and compute an importance score."). Potapov, Fu, Yavagal, Bor-Chun, and Chen are all considered to be analogous to the claimed invention because they are in the same field of video summarization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Potapov to incorporate the teachings of Fu wherein the one or more sub-scores comprise information-theoretic measures, including entropy, Bayesian surprise, and/or Bayesian perplexity. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to make it computationally tractable (Fu, Section IV.B).
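
For readers unfamiliar with the information-theoretic sub-scores recited in Claim 23, the sketch below computes one such measure, the Shannon entropy of a frame's value histogram. This is a generic illustration, not Fu's Gaussian entropy fusion model:

```python
# One way to get an information-theoretic sub-score of the kind claim 23
# recites: Shannon entropy of a frame's intensity histogram. A generic
# stand-in for illustration; Fu fuses several low-level features instead.

import numpy as np

def histogram_entropy(pixel_values, bins=16):
    """Shannon entropy (bits) of the value histogram of one frame."""
    hist, _ = np.histogram(pixel_values, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                    # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

flat_frame = np.full(1000, 128)                       # uniform gray: low entropy
busy_frame = np.random.default_rng(0).integers(0, 256, 1000)
print(histogram_entropy(flat_frame))                  # 0.0
print(histogram_entropy(busy_frame))                  # near 4 bits with 16 bins
```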
Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen, in further view of Mani et al. (US 2019/0258660 A1), hereinafter referred to as Mani.

Claim 7

The combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses the method of claim 6 (Potapov, Section 3, kernel video summarization). The combination does not explicitly disclose further comprising the first AI engine automatically filtering filler words from the transcript to exclude the filler words from the identification of the one or more moment candidates. However, Mani teaches the first AI engine automatically filtering filler words from the transcript to exclude the filler words from the identification of the one or more moment candidates (Mani, [0002], "The processor partitions the text transcript into a plurality of sequences of segments, wherein each sequence of segments corresponds to a respective series of non-overlapping time intervals different from non-overlapping time interval series of other sequences. An informativeness score is determined for each segment in a sequence of the plurality of sets of segment sequences, where the informativeness score reflects the segment's coverage of key non-redundant information in the source content item. A subsequence of segments is selected from one of the plurality of segment sequences, where the subsequence of segments satisfies the desired compression budget and maximizes a summary score that is a combination of the informativeness score for each segment and the coherence score for the subsequence. The processor generates a summary content item comprised of clips from the audio content item corresponding to the selected subsequence of segments."; [0064], "At 704, a text transcript is obtained from the ASR module 304 via processing of the audio portion of the content item 107 and the text transcript thus obtained is partitioned into sequences of segments at 706. The segments in a particular sequence have respective fixed and equal widths (i.e., block size) different from segments of other sequences. In an embodiment, multiple hypotheses can be generated with the text transcript by the ASR module 304. Accordingly, multiple layers can be generated at 706 for different ASR hypotheses for each segment. At 708 the ASR confidence scores are obtained. It can be appreciated that 708 can be omitted if only a single hypothesis is generated by the ASR module 304. At 710 the informativeness scores comprising a combination of the salience score and the diversity score are obtained for the segments in each of the segment sequences. At 712, segments with low WCN confidence scores are discarded."). Potapov, Chen, Fu, Bor-Chun, and Mani are all considered to be analogous to the claimed invention because they are in the same field of video summarization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Potapov, Chen, and Fu to incorporate the teachings of Mani of the first AI engine automatically filtering filler words from the transcript to exclude the filler words from the identification of the one or more moment candidates. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to improve user engagement by shortening the video (Mani, [0034]).
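
As a generic illustration of the Claim 7 limitation, the sketch below filters filler words out of a transcript before moment-candidate detection. The filler list and regex approach are illustrative assumptions; Mani's actual pipeline scores and discards ASR segments rather than individual words:

```python
# Drop filler tokens from a transcript so they cannot seed moment candidates.
# The FILLERS set and tokenization are illustrative, not from any reference.

import re

FILLERS = {"um", "uh", "like", "you know", "i mean", "sort of"}

def strip_fillers(transcript: str) -> str:
    """Remove filler words/phrases, also consuming a trailing comma/period."""
    text = transcript.lower()
    # Remove multi-word fillers first so "you know" is not split apart.
    for phrase in sorted(FILLERS, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(phrase)}\b[,.]?", " ", text)
    return re.sub(r"\s+", " ", text).strip()

print(strip_fillers("Um, so the, you know, key result is, uh, a 40% speedup."))
# -> "so the, key result is, a 40% speedup."
```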
Claim 15

Claim 15 is rejected for similar reasons as those described in claim 7. For the additional elements in Claim 15, the combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses: a first artificial intelligence (AI) module (Potapov, Section 3.1) automatically identifying one or more moment candidates of the multimedia content based on identified slide transitions in the video data (Chen, Fig. 5B, [0103], "As noted above, a match may represent that the candidate frame incorporates a logo and/or that the candidate frame corresponds to a frame from a transition animation. In many embodiments, the process of determining a match also involves determining a confidence metric that can also be utilized in the segmentation of a video data stream."). The proposed combination, as well as the motivation for combining the Potapov, Fu, Yavagal, Bor-Chun, Chen, and Mani references presented in the rejection of Claim 7, apply to Claim 15 and are incorporated herein by reference. Thus, the method recited in Claim 15 is met by Potapov, Chen, Fu, Yavagal, Bor-Chun, and Mani.

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen, in further view of Motoi et al. (US 2013/0156321 A1), hereinafter referred to as Motoi.

Claim 21

The combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses the method of claim 1 (Potapov, Section 3, kernel video summarization). The combination does not explicitly disclose displaying a visualization of the importance scores on an importance intensity axis. However, Motoi teaches displaying a visualization of the importance scores on an importance intensity axis (Motoi, Abstract, "The method can select a common summarization segment based on a first summarization score."; [0050], "FIG. 3 indicates changes of scores of feature elements extracted from video streams with time"; [0051], "The summarization score 308 is a value obtained by adding the scores of all elements and normalizing the added score. When adding the scores, the scores can be weighted in accordance with the importance of an element. For example, if a specific person is important in an event, the scores of main person 307, size of face 302 and utterance 305 are weighted to be high scores, and the score of cheering 306 is weighted to be low."; Fig. 3 shows a visualization of the importance score over time). Potapov, Fu, Yavagal, Bor-Chun, Chen, and Motoi are all considered to be analogous to the claimed invention because they are in the same field of video summarization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Potapov to incorporate the teachings of Motoi of displaying a visualization of the importance scores on an importance intensity axis. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to improve the quality of the summary video (Motoi, [0071]).
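
Claim 21's "importance intensity axis" is, in effect, a score-versus-time plot of the kind Motoi's Fig. 3 shows. A minimal matplotlib sketch with made-up scores; the threshold line is an illustrative addition, not part of any cited reference:

```python
# Plot per-interval importance scores against time, with importance as the
# vertical (intensity) axis. Scores below are fabricated for illustration.

import matplotlib.pyplot as plt

times = list(range(0, 120, 10))     # seconds into the video, 10 s steps
scores = [0.2, 0.3, 0.8, 0.9, 0.4, 0.3, 0.7, 0.95, 0.5, 0.2, 0.1, 0.1]

plt.figure(figsize=(8, 2.5))
plt.plot(times, scores, drawstyle="steps-post")
plt.axhline(0.6, linestyle="--", label="summary inclusion threshold")
plt.xlabel("time (s)")
plt.ylabel("importance")            # the "intensity" axis
plt.legend()
plt.tight_layout()
plt.savefig("importance_timeline.png")
```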
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen, in further view of Motoi et al. (US 2013/0156321 A1), hereinafter referred to as Motoi.

Claim 22

The combination of Potapov in view of Fu in view of Yavagal in view of Bor-Chun in further view of Chen discloses the method of claim 9 (Potapov, Section 3, kernel video summarization). The combination does not explicitly disclose recomputing the importance scores using the second AI engine based on interaction data, including a number of times the annotated multimedia content is skipped and/or a number of times the annotated multimedia content is viewed. However, Golan teaches recomputing the importance scores using the second AI engine based on interaction data (Potapov discloses using a second AI engine to assign scores to the candidate segments; Golan, Abstract, "Portions or segments of the multimedia content object may be associated with rank values or ratings based on the metadata objects"; [0080], "A metadata object related to a segment of a multimedia content object may include, or be associated with, a quality rating or grade. For example, a quality grade representing the quality of the metadata object may be calculated (e.g., by user agent 225 or by video synopsis generation unit 230) based on the entity that created or provided the metadata object. For example, a high quality grade may be given to a metadata object received from an expert and a low quality grade may be given to a metadata object generated based on monitoring actions or input of an unknown user. For example, based on the source of a message that includes a metadata object, video synopsis generation unit 230 may set the quality of the information in the metadata object. The quality grade may be updated over time, e.g., based on new metadata or new logic used in analyzing metadata, and may be fine-tuned in real time."; since the grade is updated over time, it is being recalculated, and the quality grade is analogous to the importance score), including a number of times the annotated multimedia content is skipped and/or a number of times the annotated multimedia content (Golan, [0048], "an expert or employee may watch a video clip and may annotate the clip or otherwise add metadata that may be stored, e.g., on storage 216. For example, an expert or person may generate metadata such as 'the segment from second 14 to second 32 is suitable for kids', 'the segment 34-57 is unsuitable for children under 8 years old' and so on.") is viewed (Golan, [0086], "a criterion for ranking time slots may be a popularity (e.g., as determined by the number of users who watched a video clip or the total watch time for a segment in a video clip). A criterion for ranking time slots may be a bounce rate for a segment. A bounce rate as known in the art is the ratio or rate of users who stopped watching a clip. As described, the number of users who watched a segment may be recorded (e.g., by a large set of user agent 225 units on a respective large set of users' computers) using metadata objects associated with a time slot. In an embodiment, video synopsis generation unit 230 uses metadata information to calculate or summarize the total watch time for each time slot and then normalizes the total watch time of each time slot according to a priority, preference and/or quality grade to generate a rank for a time slot. For example, the total watch time may be an aggregation of the time spent by all users watching a segment of a video clip. Complex rules or criteria may be used. For example, a criterion may be a specific segment a user skipped to from the segment."). Potapov, Fu, Yavagal, Bor-Chun, Chen, and Golan are all considered to be analogous to the claimed invention because they are in the same field of video summarization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Potapov to incorporate the teachings of Golan of recomputing the importance scores using the second AI engine based on interaction data, including a number of times the annotated multimedia content is skipped and/or a number of times the annotated multimedia content is viewed. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been so that the video summary can be fine-tuned in real time (Golan, [0080]).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENISE G ALFONSO, whose telephone number is (571) 272-1360. The examiner can normally be reached Monday - Friday, 7:30 - 5:30. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Amandeep Saini, can be reached at (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DENISE G ALFONSO/
Examiner, Art Unit 2662

/AMANDEEP SAINI/
Supervisory Patent Examiner, Art Unit 2662

Prosecution Timeline

Dec 17, 2024: Application Filed
Feb 11, 2025: Non-Final Rejection — §103
May 14, 2025: Applicant Interview (Telephonic)
May 14, 2025: Examiner Interview Summary
May 15, 2025: Response Filed
Jun 05, 2025: Final Rejection — §103
Aug 25, 2025: Interview Requested
Sep 09, 2025: Request for Continued Examination
Sep 09, 2025: Applicant Interview (Telephonic)
Sep 10, 2025: Response after Non-Final Action
Sep 10, 2025: Examiner Interview Summary
Sep 29, 2025: Non-Final Rejection — §103
Oct 30, 2025: Interview Requested
Nov 06, 2025: Applicant Interview (Telephonic)
Nov 07, 2025: Examiner Interview Summary
Nov 14, 2025: Response Filed
Mar 02, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586352: IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD AND STORAGE MEDIUM (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579693: ELECTRONIC SHELF LABEL MANAGING SERVER, DISPLAY DEVICE AND CONTROLLING METHOD THEREOF (granted Mar 17, 2026; 2y 5m to grant)
Patent 12555371: VISION TRANSFORMER FOR MOBILENET SIZE AND SPEED (granted Feb 17, 2026; 2y 5m to grant)
Patent 12541980: METHOD FOR DETERMINING OBJECT INFORMATION RELATING TO AN OBJECT IN A VEHICLE ENVIRONMENT, CONTROL UNIT AND VEHICLE (granted Feb 03, 2026; 2y 5m to grant)
Patent 12541941: A METHOD FOR TESTING AN EMBEDDED SYSTEM OF A DEVICE, A METHOD FOR IDENTIFYING A STATE OF THE DEVICE AND A SYSTEM FOR THESE METHODS (granted Feb 03, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 74%
With Interview: 94% (+19.8%)
Median Time to Grant: 3y 1m
PTA Risk: High
Based on 103 resolved cases by this examiner. Grant probability derived from career allow rate.
