DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
2. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/12/2026 has been entered.
Response to Amendment
3. Receipt of Applicant's Amendment filed on 02/12/2026 is acknowledged. The amendment amends claims 1, 18, and 19.
Terminal Disclaimer
4. The terminal disclaimer filed on 04/04/2025 disclaiming the terminal portion of any patent granted on this application which would extend beyond the expiration date of U.S. Patent 12,045,281 has been reviewed and is accepted. The terminal disclaimer has been recorded.
Claim Rejections - 35 USC § 103
5. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
6. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
7. This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
8. Claims 1-6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Fraser et al. (Article entitled “RePlay: Contextually Presenting Learning Videos Across Software Applications”, dated 09 May 2019), in view of Phillips et al. (U.S. PGPUB 2021/0193187), and further in view of Waitelonis et al. (Article entitled “Semantically Enabled Exploratory Video Search”, dated 30 April 2010).
9. Regarding claims 1, 18, and 19, Fraser teaches a method, non-transitory computer-readable storage medium, and computing device comprising:
A) receiving a search query (Page 4, Figures 1, 2, and 4);
G) providing a search result interface displaying the at least one search video content and one or more search sections within the search video content retrieved as a result of the search query (Pages 4-5, Figures 1, 2, and 4);
H) wherein the search video content and the one or more search sections are related with the search query with respect to at least one of the plurality of video semantic search attributes (Pages 3-5, Figures 1, 2, and 4); and
I) providing a timeline view interface for the search video content when providing the search result interface in a timeline view mode (Pages 3-5, Figures 1, 2, and 4);
J) wherein the timeline view interface indicates locations of the one or more search sections of the search video content along a timeline of the search video content (Pages 3-5, Figures 1, 2, and 4);
K) presenting at least two or more visual characteristics displayed along the timeline of the search video content (Pages 3-5, Figures 1, 2, and 4);
L) wherein each visual characteristic in the two or more visual characteristics corresponds to a respective search section in the one or more search sections (Pages 3-5, Figures 1, 2, and 4).
The examiner notes that Fraser teaches "receiving a search query" as "Users can edit or delete it to form their own query. Pressing the go button or return key triggers a search" (Page 4). The examiner further notes that manually input queries from users teach the claimed receiving, as shown in Figures 1, 2, and 4.
The examiner further notes that Fraser teaches "providing a search result interface displaying the at least one search video content and one or more search sections within the search video content retrieved as a result of the search query" as "Often, videos have multiple moments that may be relevant. RePlay renders green markers on the video timeline to indicate these moments. Mousing over a marker invokes a pop-up text area displaying a caption excerpt with words from the query in bold (Figure 2). This pop-up obscures YouTube's default thumbnail pop-up but provides more useful information, as software videos tend to show an entire screen and shrinking this to a thumbnail makes it hard to see. Clicking a marker starts the video from that moment" (Page 4), "RePlay leverages existing online video search engines to retrieve video results. It then finds and ranks relevant clips within these videos" (Page 5), and "To be application-independent and embed online videos directly without waiting to download and process them, RePlay instead uses metadata and caption text to rank and segment videos. For each video result, RePlay divides its captions into 30-second segments, searching each for the queried keywords (with stop words removed) and names of the three most recently used tools in the current application. It ranks all segments by the total number of keyword matches. To break ties it uses number of tool name matches. The highest-ranked segment determines the video's start time. Timeline markers denote the top ten segments: green for those with a query term; grey if only a tool is mentioned. RePlay re-orders the video results based on the total number of matching clips. To break ties it uses the total number of matching keywords within the clips" (Page 5). The examiner further notes that video search results in the RePlay system of Fraser clearly output video content with relevant segments (i.e. sections) on a timeline as shown in Figures 1, 2, and 4.
The examiner further notes that Fraser teaches "wherein the search video content and the one or more search sections are related with the search query with respect to at least one of the plurality of video semantic search attributes" as "RePlay uses captions to select relevant clips" (Page 3), the marker passage (Page 4) quoted above, "RePlay queries YouTube and selects its top five video results that have English captions and mention the current application in any of the title, description, or captions (to avoid results that may contain other keywords but do not pertain to the current application)" (Page 5), and the retrieval and segmentation-and-ranking passages (Page 5) quoted above, which further note: "Although automatic captions are far from perfect, we found them to be sufficient for searching in RePlay. Captions are already an approximation of what the demonstrator is doing, so despite some errors, they work well enough for identifying potentially relevant moments" (Page 5). The examiner further notes that video search results in the RePlay system of Fraser are based on caption data (i.e. an example of a semantic search attribute) of videos being "related" to a user query.
The examiner further notes that Fraser teaches "providing a timeline view interface for the search video content when providing the search result interface in a timeline view mode" as "RePlay overlays markers on the timeline indicating command or tool use. We use timeline markers over other options as they take up little space and allow for pop-up text previews, which aid browsing" (Page 3), together with the marker, retrieval, and segmentation-and-ranking passages (Pages 4-5) quoted above. The examiner further notes that video search results in the RePlay system of Fraser clearly output video content with relevant segments (i.e. sections) on a timeline as shown in Figures 1, 2, and 4.
The examiner further notes that Fraser teaches "wherein the timeline view interface indicates locations of the one or more search sections of the search video content along a timeline of the search video content" via the same Page 4 and Page 5 passages quoted above. Such relevant sections are identified via green or grey markers that are indicative of locations within the video.
The examiner further notes that Fraser teaches "presenting at least two or more visual characteristics displayed along the timeline of the search video content" via the same Page 4 and Page 5 passages quoted above; such relevant sections are identified via green and/or grey markers (i.e. examples of visual characteristics).
The examiner further notes that Fraser teaches "wherein each visual characteristic in the two or more visual characteristics corresponds to a respective search section in the one or more search sections" via the same Page 4 and Page 5 passages quoted above; the green and/or grey markers are visual characteristics that correspond to those relevant sections, as shown in Figures 1, 2, and 4.
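For purposes of illustration only, the segmentation-and-ranking behavior quoted above from Page 5 of Fraser can be approximated with the following Python sketch. The 30-second segment length, the keyword ranking, the tool-name tie-break, and the green/grey marker rule follow the quoted passage; all identifiers, the caption data format, and the stop-word list are hypothetical and are not taken from Fraser:

    from dataclasses import dataclass

    SEGMENT_SECONDS = 30  # Fraser divides captions into 30-second segments
    STOP_WORDS = {"the", "a", "an", "to", "of", "and", "or", "in", "how"}

    @dataclass
    class Segment:
        start: float        # segment start time within the video (seconds)
        text: str           # caption text falling inside this segment
        query_hits: int = 0  # matches against the queried keywords
        tool_hits: int = 0   # matches against recently used tool names

    def rank_segments(captions, query, recent_tools):
        """Rank 30-second caption segments as described in Fraser (Page 5).

        captions: list of (start_time_seconds, text) caption entries.
        query: the user's search string.
        recent_tools: names of the three most recently used tools.
        """
        keywords = [w for w in query.lower().split() if w not in STOP_WORDS]
        # Bucket caption text into consecutive 30-second segments.
        buckets = {}
        for start, text in captions:
            buckets.setdefault(int(start // SEGMENT_SECONDS), []).append(text)
        segments = []
        for idx, texts in sorted(buckets.items()):
            seg = Segment(start=idx * SEGMENT_SECONDS,
                          text=" ".join(texts).lower())
            seg.query_hits = sum(seg.text.count(k) for k in keywords)
            seg.tool_hits = sum(seg.text.count(t.lower()) for t in recent_tools)
            segments.append(seg)
        # Rank by keyword matches; break ties by tool-name matches.
        segments.sort(key=lambda s: (s.query_hits, s.tool_hits), reverse=True)
        top = [s for s in segments if s.query_hits or s.tool_hits][:10]
        # Green marker for segments with a query term; grey if only a tool.
        markers = [("green" if s.query_hits else "grey", s.start) for s in top]
        start_time = top[0].start if top else 0.0  # best segment sets start
        return start_time, markers

Re-ordering the video results themselves (by total matching clips, tie-broken by total keyword matches) would then be a further sort over the per-video outputs of such a function.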
Fraser does not explicitly teach:
B) obtaining at least one search video content by generating one or more video retrieval vectors for the at least one search video content by applying a transformer architecture;
C) the one or more video retrieval vectors for the at least one search video content including a plurality of video retrieval vectors corresponding to a plurality of video semantic search attributes;
D) generating a search query vector for the search query;
E) comparing the search query vector to the one or more video retrieval vectors for the search video content;
F) the one or more video retrieval vectors obtained from visual data in the search video content;
H) based on the comparison of the search query vector to the one or more video retrieval vectors.
Phillips, however, teaches "obtaining at least one search video content by generating one or more video retrieval vectors for the at least one search video content by applying a transformer architecture" as "the video preprocessor 105 obtains a video, and obtains video scene vectors based on the obtained video. The video may include any type of video, such as, for example, user-uploaded videos and online videos. The video scene vectors may include video scenes (i.e., sequences of successive frames) that are segmented from the obtained video. The video scene vectors may further include vectors that are aligned with the video scenes, and vector types of the vectors. The vectors are semantic representations of the video scenes. A vector type may be, for example, an action, an object, or a caption that describes what a corresponding vector is representing in a video scene. For example, a first video scene vector may include a first video scene with a first vector "burger" and a first vector type "object," and a second video scene vector may include a second video scene with a second vector "fry the onions" and a second vector type "caption."" (Paragraph 46), "As will be described later with reference to FIG. 5, the semantic text encoder 215 obtains the video scenes, captions and descriptors, and obtains the video scene vectors, based on the obtained video scenes, captions and descriptors" (Paragraph 58), and "The scene scoring module 705 obtains the video scene vectors from the video preprocessor 105, and obtains the query vector and the vector type weights from the query preprocessor 110. The scene scoring module 705 then obtains vector scores respectively for the video scenes, based on the similarity between the obtained query vector and each of the obtained video scene vectors. Such a similarity may include a Euclidean distance or cosine similarity" (Paragraph 91).
Phillips likewise teaches "the one or more video retrieval vectors for the at least one search video content including a plurality of video retrieval vectors corresponding to a plurality of video semantic search attributes" via the same Paragraphs 46, 58, and 91 quoted above.
Phillips teaches "generating a search query vector for the search query" as "The semantic text encoder 605 obtains the user query, and encodes the obtained user query into the query vector" (Paragraph 83) and the scene-scoring passage (Paragraph 91) quoted above, and teaches "comparing the search query vector to the one or more video retrieval vectors for the search video content" via that same Paragraph 91 passage.
Phillips teaches "the one or more video retrieval vectors obtained from visual data in the search video content" as Paragraph 46 quoted above, "Each of the object detector 405, the action recognition module 410, the OCR module 415, the pose detector 420, and the audio detector 425 obtains the video scenes from the scene obtaining module 205, and obtains the descriptors respectively describing the obtained video scenes. Each of the object detector 405, the action recognition module 410, the OCR module 415, the pose detector 420, and the audio detector 425 may include a deep neural network (DNN) that extracts visuo-linguistic content from the obtained video scenes. The object detector 405 detects one or more objects from the obtained video scenes, and obtains one or more of the descriptors respectively describing the detected one or more objects. For example, the one or more of the descriptors may include object labels such as "screw" and "television stand." The action recognition module 410 recognizes one or more actions of actors included in the obtained video scenes, and obtains one or more of the descriptors respectively describing the recognized one or more actions. For example, the one or more of the descriptors may include action labels such as "walking" and "lifting." The OCR module 415 recognizes one or more textual characters or words included in the obtained video scenes, and obtains one or more of the descriptors respectively describing the recognized one or more textual characters or words. For example, the one or more of the descriptors may include onscreen text such as "step 1" and "welcome." The pose detector 420 detects one or more poses of the actors, from the obtained video scenes, and obtains one or more of the descriptors respectively describing the detected one or more poses. The one or more poses may include, for example, a hand or face of an actor. For example, the one or more of the descriptors may include pose labels such as "head," "left hand" and "gripping."" (Paragraphs 67-71), "The semantic text encoder 510 obtains the descriptors from the descriptor obtaining module 210, and encodes the obtained descriptors into the vectors. The semantic text encoder 510 further obtains the vector types of the vectors. The semantic text encoder 510 for the descriptors is separate from the semantic text encoder 505 for the captions because any of the captions may be spoken across multiple video scenes" (Paragraph 78), and the Paragraph 91 passage quoted above.
Phillips further teaches "based on the comparison of the search query vector to the one or more video retrieval vectors" via Paragraph 46, "The query vector is a semantic representation of the obtained user query" (Paragraph 47), Paragraphs 67-71, Paragraph 78, and Paragraph 91, all quoted above.
The examiner further notes that although the primary reference of Fraser outputs videos that correspond to user queries (including via video search attributes), there is no explicit teaching of the use of vectors as a basis to perform the searching. Nevertheless, the secondary reference of Phillips teaches the concept of querying videos via the use of vector similarities between multiple video vectors (semantic representations of the video, obtained via the use of an encoder, i.e. the claimed transformer architecture) and a query vector (a semantic representation of the query). Such video vectors are based on descriptions derived from visual data of the video (see the examples of detected actions, poses, OCR text, and objects). The combination would result in Fraser performing its searching via the use of vectors.
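For purposes of illustration only, the scene-scoring behavior of Phillips (Paragraph 91) can be approximated with the following Python sketch. The use of cosine similarity (Euclidean distance is the quoted alternative) and the existence of per-type weights follow the quoted paragraphs; the exact manner in which the vector type weights are applied, and all identifiers, are assumptions:

    import math

    def cosine_similarity(a, b):
        """Cosine similarity; Paragraph 91 also permits Euclidean distance."""
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def score_scenes(query_vector, scene_vectors, type_weights):
        """Score video scenes against a query vector.

        scene_vectors: list of (scene_id, vector, vector_type) tuples, where
        vector_type is e.g. "object", "action", or "caption" (Paragraph 46).
        type_weights: per-type weights from the query preprocessor
        (applying them multiplicatively here is an assumption).
        """
        scores = {}
        for scene_id, vec, vec_type in scene_vectors:
            sim = cosine_similarity(query_vector, vec) * type_weights.get(vec_type, 1.0)
            # Keep the best-scoring vector for each scene.
            scores[scene_id] = max(scores.get(scene_id, 0.0), sim)
        # Highest-scoring scenes first.
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)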
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because the teachings of Phillips would have allowed Fraser to provide a method for improving the retrieval of relevant videos, as noted by Phillips (Paragraph 2).
Fraser and Phillips do not explicitly teach:
M) the two or more visual characteristics including two or more signs with at least two sizes to indicate at least two degrees of relevance for the respective search sections of the one or more search sections with respect to the at least one of the plurality of video semantic search attributes for the search query;
N) wherein the two or more signs are displayed above or on the timeline for the search video content.
Waitelonis, however, teaches "the two or more visual characteristics including two or more signs with at least two sizes to indicate at least two degrees of relevance for the respective search sections of the one or more search sections with respect to the at least one of the plurality of video semantic search attributes for the search query" as "We have developed the prototype semantic video search engine 'yovisto' that demonstrates the advantages of semantically enhanced exploratory video search and enables investigative navigation and browsing in large video repositories" (Abstract) and "We have designed the user interface for exploratory search to consist out of three main areas: the direct search results in the center including geographical information displayed in a map on top of the search results, the facet filter on the right, and the exploratory search navigation on the left (cf. Fig. 2). In the direct search results a timeline exposes the automatically generated temporal segmentation of the videos with highlighted segments that are relevant according to the current query string. The facet filter allows to narrow the search results according to the type of resource, the scientific category, the issuing organization of the video, the language of the video, as well as popular user tags attached to the video segments" (Section 3.5). Waitelonis teaches "wherein the two or more signs are displayed above or on the timeline for the search video content" via the same Abstract and Section 3.5 passages quoted above.
The examiner further notes that although the primary reference of Fraser outputs ranked video search results, there is no explicit teaching of two or more signs that indicate relevance and that are displayed above or on a timeline. Nevertheless, the secondary reference of Waitelonis teaches a timeline of semantic video search results (see Figure 2) that includes bars of differing sizes (i.e. the claimed signs, under the broadest reasonable interpretation) that are indicative of relevance. The combination would result in displaying such differently sized bars in Fraser.
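For purposes of illustration only, the proposed combination (signs of at least two sizes indicating degrees of relevance above or on a timeline) can be approximated with the following Python sketch; neither Fraser nor Waitelonis discloses this code, and the linear size mapping and all identifiers are assumptions:

    def bars_for_timeline(sections, duration, min_h=6, max_h=24):
        """Map search sections to bars whose size encodes degree of relevance.

        sections: list of (start_seconds, relevance) with relevance in [0, 1].
        Returns (x_fraction, height_px) pairs for drawing above or on a
        timeline, so at least two bar sizes appear whenever relevances differ.
        """
        bars = []
        for start, relevance in sections:
            x = start / duration                          # position on timeline
            height = min_h + relevance * (max_h - min_h)  # size encodes relevance
            bars.append((x, round(height)))
        return bars

    # e.g. bars_for_timeline([(30, 0.9), (120, 0.4)], 300) -> [(0.1, 22), (0.4, 13)]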
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because the teachings of Waitelonis would have allowed Fraser and Phillips to provide a method for improving video search, as noted by Waitelonis (Section 5).
Regarding claim 2, Fraser further teaches a method comprising:
A) wherein the plurality of video semantic search attributes includes at least one of conversation information in a video, text information in a video, person information in a video, and visual information in a video (Pages 3-5, Figures 1, 2, and 4).
The examiner notes that Fraser teaches "wherein the plurality of video semantic search attributes includes at least one of conversation information in a video, text information in a video, person information in a video, and visual information in a video" as "RePlay uses captions to select relevant clips" (Page 3), together with the marker passage (Page 4), the retrieval, YouTube-selection, and segmentation-and-ranking passages (Page 5), and the automatic-captions passage (Page 5), all quoted above in the rejection of claims 1, 18, and 19. The examiner further notes that video search results in the RePlay system of Fraser are based on caption data (i.e. conversation information) of videos being "related" to a user query.
Regarding claim 3, Fraser and Phillips do not explicitly teach a method comprising:
A) wherein the two or more visual characteristics further includes at least one of a sign, or text indicating the at least two degrees of relevance.
Waitelonis, however, teaches “wherein the two or more visual characteristics further includes at least one of a sign, or text indicating the at least two degrees of relevance” as “We have designed the user interface for exploratory search to consist out of three main areas: the direct search results in the center including geographical information displayed in a map on top of the search results, the facet filter on the right, and the exploratory search navigation on the left (cf. Fig. 2). In the direct search results a timeline exposes the automatically generated temporal segmentation of the videos with highlighted segments that are relevant according to the current query string. The facet filter allows to narrow the search results according to the type of resource, the scientific category, the issuing organization of the video, the language of the video, as well as popular user tags attached to the video segments” (Section 3.5).
The examiner further notes that although the primary reference of Fraser clearly outputs video search results on a timeline indicating visual characteristics of relevance (via the use of color), there is no explicit teaching of displaying visual characteristics that correspond to one of a sign or text. Nevertheless, the secondary reference of Waitelonis teaches video search results on a timeline (see Figure 2) that include indications of different degrees of relevance via bars of differing sizes (i.e. the claimed signs, under the broadest reasonable interpretation). The combination would result in expanding the indications of relevance in Fraser for its video search results along a timeline.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because the teachings of Waitelonis would have allowed Fraser and Phillips to provide a method for improving video search, as noted by Waitelonis (Section 5).
Regarding claim 4, Fraser further teaches a method comprising:
A) providing a search attribute information display interface indicating video semantic search attribute information corresponding to a position of a scroll marker on the timeline view interface (Pages 3-5, Figures 1, 2, and 4).
The examiner notes that Fraser teaches "providing a search attribute information display interface indicating video semantic search attribute information corresponding to a position of a scroll marker on the timeline view interface" as "RePlay uses captions to select relevant clips" (Page 3), together with the marker, retrieval, YouTube-selection, segmentation-and-ranking, and automatic-captions passages (Pages 4-5) quoted above in the rejection of claims 1, 18, and 19. The examiner further notes that video search results in the RePlay system include displayed caption data (i.e. an example of the claimed semantic search attribute information) corresponding to each marker (i.e. the claimed scroll marker) on a timeline as shown in Figures 1, 2, and 4.
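For purposes of illustration only, Fraser's marker pop-up (a caption excerpt, with query words in bold, shown for the moment under the marker) reduces to a lookup from a timeline position into the caption segment containing it, as in the following Python sketch; all identifiers and the HTML-style bolding are assumptions:

    def excerpt_at(position, segments, keywords):
        """Return the caption excerpt for the segment containing `position`,
        wrapping query words in <b> tags, echoing Fraser's pop-up text area
        that displays a caption excerpt with query words in bold (Page 4).

        segments: list of (start, end, caption_text), non-overlapping.
        """
        for start, end, text in segments:
            if start <= position < end:
                for word in keywords:
                    text = text.replace(word, f"<b>{word}</b>")
                return text
        return None  # no segment under this marker position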
Regarding claim 5, Fraser further teaches a method comprising:
A) wherein the search attribute information display interface includes at least one of a video semantic search thumbnail display area, a video semantic search type display area, and a video semantic search content display area (Pages 3-5, Figures 1, 2, and 4).
The examiner notes that Fraser teaches "wherein the search attribute information display interface includes at least one of a video semantic search thumbnail display area, a video semantic search type display area, and a video semantic search content display area" as "RePlay uses captions to select relevant clips" (Page 3), together with the marker, retrieval, YouTube-selection, segmentation-and-ranking, and automatic-captions passages (Pages 4-5) quoted above in the rejection of claims 1, 18, and 19. The examiner further notes that video search results in the RePlay system include displayed caption data (i.e. an example of the claimed content display area) corresponding to each marker on a timeline as shown in Figures 1, 2, and 4.
Regarding claim 6, Fraser further teaches a method comprising:
A) providing a video playback interface including a playback area for playing the search video content and a playback control interface for the search video content (Page 4, Figures 1, 2, and 4).
The examiner notes that Fraser teaches “providing a video playback interface including a playback area for playing the search video content and a playback control interface for the search video content” as “videos have multiple moments that may be relevant. RePlay renders green markers on the video timeline to indicate these moments. Mousing over a marker invokes a pop-up text area displaying a caption excerpt with words from the query in bold (Figure 2). This pop-up obscures YouTube’s default thumbnail pop-up but provides more useful information, as software videos tend to show an entire screen and shrinking this to a thumbnail makes it hard to see. Clicking a marker starts the video from that moment” (Page 4). The examiner further notes that Figures 1, 2, and 4 each depict a video playback interface with controls to play identified relevant videos (and sections within those videos) via manual selection from a user.
Regarding claim 16, Fraser does not explicitly teach a method comprising:
A) wherein the one or more video retrieval vectors and the search query vector are generated using a set of trained neural networks.
Phillips, however, teaches “wherein the one or more video retrieval vectors and the search query vector are generated using a set of trained neural networks” as “The semantic text encoder 505 obtains the captions from the descriptor obtaining module 210, and encodes the obtained captions into the vectors. The semantic text encoder 505 further obtains the vector types of the vectors. The semantic text encoder 510 obtains the descriptors from the descriptor obtaining module 210, and encodes the obtained descriptors into the vectors. The semantic text encoder 510 further obtains the vector types of the vectors. The semantic text encoder 510 for the descriptors is separate from the semantic text encoder 505 for the captions because any of the captions may be spoken across multiple video scenes. Each of the semantic text encoders 505 and 510 may be pre-trained, and may include the Universal Sentence Encoder or other sentence embedding methods” (Paragraphs 77-79) and “The semantic text encoder 605 obtains the user query, and encodes the obtained user query into the query vector. The semantic text encoder 605 may be pre-trained, and may include the Universal Sentence Encoder, like the semantic text encoders 505 and 510” (Paragraph 83).
The examiner further notes that although the primary reference of Fraser outputs videos that correspond to user queries, there is no explicit teaching of the use of vectors as a basis to perform the searching. Nevertheless, the secondary reference of Phillips teaches the concept of querying videos via the use of vectors generated via multiple trained encoders (i.e. neural networks). The combination would result in Fraser performing its searching via the use of vectors.
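For purposes of illustration only, the following Python sketch shows pre-trained sentence encoders producing both the video retrieval vectors and the search query vector, as in Phillips (Paragraphs 77-79 and 83). Phillips names the Universal Sentence Encoder; the sentence-transformers package and the specific model used here are stand-ins chosen for the sketch, not Phillips' implementation:

    from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

    # A pre-trained sentence encoder stands in for Phillips' semantic text
    # encoders; the model choice is an assumption for illustration.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Descriptors/captions extracted from video scenes -> video retrieval vectors.
    scene_texts = ["fry the onions", "burger", "step 1"]
    scene_vectors = model.encode(scene_texts)

    # The same kind of trained network encodes the user query -> query vector,
    # so both sides live in a common embedding space for similarity scoring.
    query_vector = model.encode("how to cook a burger")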
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because the teachings of Phillips would have allowed Fraser to provide a method for improving the retrieval of relevant videos, as noted by Phillips (Paragraph 2).
10. Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Fraser et al. (Article entitled "RePlay: Contextually Presenting Learning Videos Across Software Applications", dated 09 May 2019), in view of Phillips et al. (U.S. PGPUB 2021/0193187), and further in view of Waitelonis et al. (Article entitled "Semantically Enabled Exploratory Video Search", dated 30 April 2010), as applied to claims 1-6 and 16 above, and further in view of He et al. (U.S. Patent 8,560,533).
11. Regarding claim 8, Fraser, Phillips, and Waitelonis do not explicitly teach a method comprising:
A) providing a sorting interface for changing a sorting criterion of the search video content displayed on the search result interface.
He, however, teaches “providing a sorting interface for changing a sorting criterion of the search video content displayed on the search result interface” as “Referring to FIG. 2, a user interface is displayed which illustrates a result page responsive to a user query. Rather than displaying only a list of appropriate videos responsive to the user query, in an alternative embodiment, a list of appropriate digital content items is presented to the user. For example, the user may query "ducati" and the subsequent results would include videos, documents, images, audio files, etc that are appropriate digital content items for the keyword in the query” (Column 13, lines 3-10).
The examiner further notes that although the primary reference of Fraser outputs ranked video search results (via the use of YouTube), there is no explicit teaching of an interface element that allows users to sort the video search results via different criteria. Nevertheless, the secondary reference of He (which depicts YouTube search results) teaches an interface element with which a user can change the sorting criteria for video search results (see Figure 2). Specifically, a user can sort video search results in accordance with relevance, date, views, and rating. The combination would result in providing users of Fraser the ability to change the ranking of displayed videos in accordance with different sorting criteria.
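For purposes of illustration only, the sorting interface of He (Figure 2) can be approximated with the following Python sketch; the result field names are hypothetical, and He does not disclose this code:

    from operator import attrgetter

    SORT_KEYS = {
        # Sorting criteria shown in He's Figure 2 interface.
        "relevance": attrgetter("relevance"),
        "date": attrgetter("upload_date"),
        "views": attrgetter("view_count"),
        "rating": attrgetter("rating"),
    }

    def resort_results(results, criterion):
        """Re-sort the displayed video search results when the user changes
        the sorting criterion; all fields sort descending for simplicity."""
        return sorted(results, key=SORT_KEYS[criterion], reverse=True)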
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because the teachings of He would have allowed Fraser, Phillips, and Waitelonis to provide a method for improving the time efficiency in viewing video search results, as noted by He (Column 1, lines 40-49).
Regarding claim 9, Fraser, Phillips, and Waitelonis do not explicitly teach a method comprising:
A) wherein the sorting criterion includes at least one of the degree of relevance to the search query for the at least one of the plurality of video semantic search attributes, video update date, and video playback time.
He, however, teaches “wherein the sorting criterion includes at least one of the degree of relevance to the search query for the at least one of the plurality of video semantic search attributes, video update date, and video playback time” as “Referring to FIG. 2, a user interface is displayed which illustrates a result page responsive to a user query. Rather than displaying only a list of appropriate videos responsive to the user query, in an alternative embodiment, a list of appropriate digital content items is presented to the user. For example, the user may query "ducati" and the subsequent results would include videos, documents, images, audio files, etc that are appropriate digital content items for the keyword in the query” (Column 13, lines 3-10).
The examiner further notes that although the primary reference of Fraser outputs ranked video search results (via the use of YouTube), there is no explicit teaching of an interface element that allows users to sort the video search results via different criteria. Nevertheless, the secondary reference of He (which depicts YouTube search results) teaches an interface element with which a user can change the sorting criteria for video search results (see Figure 2). Specifically, a user can sort video search results in accordance with relevance (i.e. the claimed degree of relevance), date (i.e. the claimed video update date), views, and rating. The combination would result in providing users of Fraser the ability to change the ranking of displayed videos in accordance with different sorting criteria.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because the teachings of He would have allowed Fraser, Phillips, and Waitelonis to provide a method for improving the time efficiency in viewing video search results, as noted by He (Column 1, lines 40-49).
12. Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Fraser et al. (Article entitled "RePlay: Contextually Presenting Learning Videos Across Software Applications", dated 09 May 2019), in view of Phillips et al. (U.S. PGPUB 2021/0193187), and further in view of Waitelonis et al. (Article entitled "Semantically Enabled Exploratory Video Search", dated 30 April 2010), as applied to claims 1-6 and 16 above, and further in view of Hua et al. (U.S. PGPUB 2007/0203942), and further in view of Berry (U.S. PGPUB 2013/0166587).
13. Regarding claim 14, Fraser, Phillips, and Waitelonis do not explicitly teach a method comprising:
A) responsive to receiving an input to provide the search result interface in a card view mode, displaying the at least one search video content in the form of at least one card on the search result interface.
Hua, however, teaches "responsive to receiving an input to provide the search result interface in a card view mode, displaying the at least one search video content in the form of at least one card on the search result interface" as "The list-view user interface 500 may include a list-view/grid-view toggle switch 514. Clicking on the toggle switch changes how the video search results of a selected set of search results are displayed" (Paragraph 46) and "FIG. 6 illustrates an exemplary grid-view user interface 600 that includes video search results displayed using static and/or dynamic thumbnails 602. The user interface 600 may include a selection box 604 that can be used to modify the number of thumbnails 602 displayed. Currently, the 3X button is selected. If the 1X button is selected, five thumbnails 602 are displayed. If the 2X button is selected, ten thumbnails 602 are displayed. The user interface 600 may also be implemented with a 4X button that displays twenty thumbnails when selected. The user interface 600 may be implemented to display 6, 12, 24, or 36 thumbnails as well, or any other combination of thumbnails as required and/or desired due to implementation requirements. The user interface 600 also includes playback controls 606 to activate/deactivate motion image data associated with the dynamic thumbnails 602" (Paragraph 47).
The examiner further notes that the secondary reference of Hua teaches the concept of displaying video search results in a grid view (i.e., in a card format). The combination would result in allowing users of Fraser to toggle to such a grid view for their video search results, as illustrated in the sketch below.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Hua’s teaching would have allowed the combination of Fraser, Phillips, and Waitelonis to provide a method for allowing users to change how search results are displayed, as noted by Hua (Paragraph 46).
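For illustration only, a minimal sketch of the toggling described by Hua follows. It assumes, consistent with the five and ten thumbnails Hua recites for the 1X and 2X buttons and the twenty for 4X, that each NX level shows five times N thumbnails (fifteen for 3X is inferred, not recited); the names are hypothetical.

    # Hypothetical illustration; names and the 3X count are assumptions.
    THUMBNAILS_PER_LEVEL = 5  # Hua: 1X shows five thumbnails, 2X shows ten

    def thumbnail_count(zoom_level: int) -> int:
        """Thumbnails shown for an NX zoom button (1X-4X)."""
        if not 1 <= zoom_level <= 4:
            raise ValueError("zoom level must be between 1 and 4")
        return THUMBNAILS_PER_LEVEL * zoom_level

    def visible_results(results: list[str], view: str, zoom_level: int = 3) -> list[str]:
        """List view shows the full ranked list; grid view caps the count."""
        if view == "grid":
            return results[:thumbnail_count(zoom_level)]
        return results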
Fraser, Phillips, Waitelonis, and Hua do not explicitly teach:
B) wherein a playback time of the one or more search sections that are related to the search query are displayed within the at least one card.
Berry, however, teaches “wherein a playback time of the one or more search sections that are related to the search query are displayed within the at least one card” as “As illustrated in FIG. 5, when the thumbnail 412 corresponding to the movie, "Witness," is selected, the user interface 500 is displayed to the user on the client device(s) 100. The user interface 500 preferably includes a video player panel 502 and, if there are more than one segment from within the relevant movie that meet the search criteria, a filmstrip panel 504. In such a case, the video player panel 502 displays the first or most highly relevant segment of the digital media asset, "Witness," visually to the user in response to the user selecting the movie, "Witness." In some embodiments, the start and end times associated with the segment may also be displayed or shown within the video display panel 502 or at another location of the user interface 500. In an illustrative embodiment, the video player panel 502 displays to the user the start and/or end points in time within the selected digital media asset, "Witness," where the search query is satisfied. In this example, the start or end points in time within the digital media asset, "Witness," where the search terms "danny" and "shooter" are both present. The video player panel 502 preferably further includes traditional video or movie player functionality (e.g., play, pause, stop, fast forward, reverse, advance to next scene, reverse to previous scene, volume control, and optionally a timeline slider, a current time start and stop time code location of the current segment, and possibly the start and end times for the entire media asset)” (Paragraph 46).
The examiner further notes that the secondary reference of Berry teaches the concept of video segment (i.e., section) search results displayed to a user in a “card” format that includes the displayed start and end times of each relevant section (i.e., the claimed playback time). The combination would result in such relevant sections being displayed in the card view of Hua, as illustrated in the sketch below.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Berry’s teaching would have allowed the combination of Fraser, Phillips, Waitelonis, and Hua to provide a method for improving the value of accessing digital media, as noted by Berry (Paragraph 5).
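For illustration only, the following hypothetical sketch renders a result card of the kind contemplated by the combination, listing the start and end playback times of each matching segment as Berry describes for FIG. 5. The card layout, names, and time format are assumptions, not Berry's.

    # Hypothetical illustration; the card layout and names are assumptions.
    from dataclasses import dataclass

    @dataclass
    class Segment:
        start_s: int  # segment start time, in seconds
        end_s: int    # segment end time, in seconds

    def fmt(seconds: int) -> str:
        """Render seconds as mm:ss for display within the card."""
        return f"{seconds // 60:02d}:{seconds % 60:02d}"

    def render_card(title: str, segments: list[Segment]) -> str:
        """Build a text card listing the playback time of each matching segment."""
        lines = [f"[ {title} ]"]
        for seg in segments:
            lines.append(f"  matching segment: {fmt(seg.start_s)} - {fmt(seg.end_s)}")
        return "\n".join(lines)

    # e.g. render_card("Witness", [Segment(754, 815)]) lists "12:34 - 13:35".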
14. Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Fraser et al. (Article entitled “RePlay: Contextually Presenting Learning Videos Across Software Applications”, dated 09 May 2019), in view of Phillips et al. (U.S. PGPUB 2021/0193187), and further in view of Waitelonis et al. (Article entitled “Semantically Enabled Exploratory Video Search”, dated 30 April 2010), as applied to claims 1-6 and 18-19 above, and further in view of Berry (U.S. PGPUB 2013/0166587).
15. Regarding claim 17, Fraser further teaches a method comprising:
A) wherein a first search section of the search video content is related to the search query with respect to a first video semantic search attribute (Pages 3-5, Figures 1, 2, and 4).
The examiner notes that Fraser teaches “wherein a first search section of the search video content is related to the search query with respect to a first video semantic search attribute” as “RePlay uses captions to select relevant clips” (Page 3), “Often, videos have multiple moments that may be relevant. RePlay renders green markers on the video timeline to indicate these moments. Mousing over a marker invokes a pop-up text area displaying a caption excerpt with words from the query in bold (Figure 2). This pop-up obscures YouTube’s default thumbnail pop-up but provides more useful information, as software videos tend to show an entire screen and shrinking this to a thumbnail makes it hard to see. Clicking a marker starts the video from that moment” (Page 4), “RePlay leverages existing online video search engines to retrieve video results. It then finds and ranks relevant clips within these videos” (Page 5), “RePlay queries YouTube and selects its top five video results that have English captions and mention the current application in any of the title, description, or captions (to avoid results that may contain other keywords but do not pertain to the current application)” (Page 5), and “To be application-independent and embed online videos directly without waiting to download and process them, RePlay instead uses metadata and caption text to rank and segment videos. For each video result, RePlay divides its captions into 30-second segments, searching each for the queried keywords (with stop words removed) and names of the three most recently used tools in the current application. It ranks all segments by the total number of keyword matches. To break ties it uses number of tool name matches. The highest-ranked segment determines the video’s start time. Timeline markers denote the top ten segments: green for those with a query term; grey if only a tool is mentioned. RePlay re-orders the video results based on the total number of matching clips. To break ties it uses the total number of matching keywords within the clips. Although automatic captions are far from perfect, we found them to be sufficient for searching in RePlay. Captions are already an approximation of what the demonstrator is doing, so despite some errors, they work well enough for identifying potentially relevant moments” (Page 5). The examiner further notes that video search results in the RePlay system of Fraser are based on the caption data of videos (i.e., an example of the claimed, otherwise undefined, first video semantic search attribute) being “related” to a user query.
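For clarity of the record only, the following hypothetical sketch restates the clip-ranking scheme quoted above: captions divided into 30-second segments, segments scored by query-keyword matches, ties broken by tool-name matches, and the top segment setting the video's start time. The function and variable names are assumptions, not Fraser's.

    # Hypothetical restatement of the scheme Fraser describes; names are assumptions.
    SEGMENT_SECONDS = 30

    def segment_captions(captions):
        """captions: list of (timestamp_in_seconds, word) pairs.
        Group caption words into consecutive 30-second segments."""
        segments = {}
        for t, word in captions:
            start = (int(t) // SEGMENT_SECONDS) * SEGMENT_SECONDS
            segments.setdefault(start, []).append(word.lower())
        return segments

    def rank_segments(captions, query_keywords, recent_tools):
        """Score each segment by query-keyword matches; break ties with
        tool-name matches. The top entry's start time would start the video."""
        keywords = {w.lower() for w in query_keywords}
        tools = {w.lower() for w in recent_tools}
        scored = []
        for start, words in segment_captions(captions).items():
            kw_hits = sum(1 for w in words if w in keywords)
            tool_hits = sum(1 for w in words if w in tools)
            scored.append((kw_hits, tool_hits, start))
        scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
        return scored  # the top ten entries would receive timeline markers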
Fraser, Phillips, and Waitelonis do not explicitly teach:
B) a second search section of the search video content is related to the search query with respect to a second video semantic search attribute different from the first video semantic search attribute.
Berry, however, teaches “a second search section of the search video content is related to the search query with respect to a second video semantic search attribute different from the first video semantic search attribute” as “"Metadata," which is a term that has been used above and will be used herein, is information about other information--in this case, information about the digital media, as a whole, or associated with particular images, scenes, segments, or other subparts of the digital media. For example, metadata can identify the following types of information or characteristics regarding the digital media, including things such as actors appearing, themes present, or legal clearance to third party copyrighted material appearing in a respective digital media asset. Metadata may be related to the entire digital media (such as the title, date of creation, director, producer, production studio, etc.) or may only be relevant to particular segments, scenes, images, audio, or other portions of the digital media” (Paragraph 4), “As illustrated in FIG. 5, when the thumbnail 412 corresponding to the movie, "Witness," is selected, the user interface 500 is displayed to the user on the client device(s) 100. The user interface 500 preferably includes a video player panel 502 and, if there are more than one segment from within the relevant movie that meet the search criteria, a filmstrip panel 504. In such a case, the video player panel 502 displays the first or most highly relevant segment of the digital media asset, "Witness," visually to the user in response to the user selecting the movie, "Witness." In some embodiments, the start and end times associated with the segment may also be displayed or shown within the video display panel 502 or at another location of the user interface 500. In an illustrative embodiment, the video player panel 502 displays to the user the start and/or end points in time within the selected digital media asset, "Witness," where the search query is satisfied. In this example, the start or end points in time within the digital media asset, "Witness," where the search terms "danny" and "shooter" are both present. The video player panel 502 preferably further includes traditional video or movie player functionality (e.g., play, pause, stop, fast forward, reverse, advance to next scene, reverse to previous scene, volume control, and optionally a timeline slider, a current time start and stop time code location of the current segment, and possibly the start and end times for the entire media asset)” (Paragraph 46), “When the computer server(s) 104 of the navigation system 102 receives a search query, the computer server(s) 104 may search the search index, containing the conventional and time-based metadata, on the database(s) 106 to obtain an intersection of the clip sets for the attributes being searched” (Paragraph 64).
The examiner further notes that the secondary reference of Berry teaches the concept of searching multiple attributes of video segments (i.e., sections) when a user query is received. The combination would result in expanding the attributes searched in Fraser, yielding relevant segments that correspond to different attributes from one another within the video query results, as illustrated in the sketch below.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Berry’s teaching would have allowed the combination of Fraser, Phillips, and Waitelonis to provide a method for improving the value of accessing digital media, as noted by Berry (Paragraph 5).
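For illustration only, the following hypothetical sketch models Berry's intersection of clip sets (Paragraph 64): each searched attribute contributes the set of clips satisfying it, and the query result is their intersection. The index layout and names are assumptions, not Berry's.

    # Hypothetical illustration of Berry's clip-set intersection; the index
    # layout and names are assumptions, not Berry's.
    from functools import reduce

    def clips_matching(index: dict[str, dict[str, set[int]]],
                       query: dict[str, str]) -> set[int]:
        """index[attribute][term] holds the clip IDs satisfying that pair;
        the result is the intersection across every searched attribute."""
        clip_sets = [index.get(attr, {}).get(term, set())
                     for attr, term in query.items()]
        if not clip_sets:
            return set()
        return reduce(set.intersection, clip_sets)

    # e.g. clips_matching(idx, {"dialogue": "shooter", "character": "danny"})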
Allowable Subject Matter
16. Claim 10 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Specifically, although the prior art (see Fraser) clearly displays video search results on a timeline, and Faulkner provides filtering options on a timeline of video search results, the prior art does not teach, in conjunction with the remaining limitations of the parent claim, the detailed limitations directed to providing, after receipt of a detailed search attribute display input, a sub-timeline having a different visual characteristic that indicates a degree of relevance of a search video to a sub-attribute of a video semantic search attribute related to that search video.
Dependent claims 11-13 are deemed allowable by virtue of their dependence on the allowable subject matter of claim 10.
Response to Arguments
17. Applicant's arguments filed 02/12/2026 have been fully considered but they are not persuasive.
Applicant argues on page 12 that “Fraser does not teach or suggest presenting at least two or more visual characteristics including two or more signs with at least two sizes to indicate at least two degrees of relevance, where the two or more signs are displayed above or on the timeline for the search video content”. However, the rejection does not rely on Fraser for this limitation; the secondary reference of Waitelonis is relied upon to teach the claimed two or more signs with at least two sizes indicating at least two degrees of relevance, where the two or more signs are displayed above or on the timeline for the search video content.
Conclusion
18. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. PGPUB 2018/0307383 to Faulkner et al., published 25 October 2018. The subject matter disclosed therein is pertinent to that of claims 1-6, 8-14, and 16-19 (e.g., methods to query videos).
U.S. PGPUB 2014/0372424 to Markov et al., published 18 December 2014. The subject matter disclosed therein is pertinent to that of claims 1-6, 8-14, and 16-19 (e.g., methods to query videos).
Contact Information
19. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mahesh Dwivedi, whose telephone number is (571) 272-2731. The examiner can normally be reached Monday through Friday, 8:20 am – 4:40 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached at (571) 272-4085. The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see the PAIR website.
20. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Mahesh Dwivedi
Primary Examiner
Art Unit 2168
February 22, 2026
/MAHESH H DWIVEDI/Primary Examiner, Art Unit 2168