DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 2 and 13 are objected to because of the following informalities:
Regarding claim 2, the phrase “wherein portions of text data” should read as “wherein portions of the text data”.
Regarding claim 13, the phrase “one or more additional portions of modified version of the text data” should read as “one or more additional portions of the modified version of the text data”.
Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-6, 12-15, and 20 are rejected under 35 U.S.C. 102(a)(1) and 102(a)(2) as being anticipated by Yun (U.S. Pat. No. 9,430,115, hereinafter Yun).
Regarding claim 1, Yun discloses A system comprising: memory; and one or more processors coupled to the memory and configured to perform operations comprising (Systems and methods are described with relation to “generating storylines associated with content,” implemented using a “device 104” through “one or more processors 106,” the processors being “configured to access and execute at least in part instructions stored in the one or more memories 108.”; Yun, ¶ Col. 2, line 64 - Col. 3, line 33): obtaining text data associated with a content item (“determines text data associated with the one or more occurrence times 212 in the content 124.”; Yun, ¶ Col. 14, lines 16-21), the text data comprising at least one of a transcription and a translation of audio associated with the content item (“The text data may comprise information from retrieving one or more text captions, recognizing speech with a speech recognition module, or recognizing writing with an optical character recognition module” and may “be based at least in part on closed captions, open captions, subtitles, and so forth”; Yun, ¶ Col. 14, lines 16-25); determining a modified version of the text data based on at least one of a deviation in a playback timeline of the content item, topics associated with the text data, a chronological timeline associated with the content item, and a sequence of at least one of events associated with the content item and a content of the content item (Yun “generates one or more tags associated with the one or more pieces of content 124 based at least in part on information in the determined text data and the one or more occurrence times 212,” including “selecting one or more tags 132 based on one or more criteria, arranging the selected one or more tags 132,” and the arranged tags are understood as the determined modified version of the text data. Further, “The tag stores information descriptive of the event,” and the events include “a particular character being on-screen, a particular topic or subject being discussed, particular voice... in a video an event may be the lead character being in an image frame... [and/or] in an audiobook an event may be dialogue associated with a particular character or voiced by a particular character,” and so forth.; Yun, ¶ Col. 2, lines 9-17; Col. 14, lines 26-32); generating a representation of the modified version of the text data (Discloses “generating one or more storylines” comprising, based on the selected and arranged “one or more tags 132... storing the arrangement of the selected one or more tags 132 as the storyline 134 {a representation of the modified version of the text data},” where the “arranging may comprise sorting the tags 132 by the occurrence time 212” or by the “chronology of the story,” such that “a storyline 134 for a character which features flashback and flash-forward scenes may result in a storyline which presents the scenes showing the flashbacks first, consistent with their earlier appearance in the internal chronology of the story as being earlier events.”; Yun, ¶ Col. 15, lines 9-36); and generating metadata associated with the content item based on the representation of the modified version of the text data (“the tags 132 may be placed into a data structure or otherwise associated with particular portions of the content 124” and “during presentation of the content 132 as defined by the storyline 134, the user may experience a sequential flow of the events 208 indicated by the tags 132 in a particular arrangement.” As presented here, the tags 132, which are described as including information “descriptive of one or more of a character, location, theme, or other event 208 or event 208 components which are present in the portion of the content 124,” are generated data about the data, and thus are the generated metadata. The tags are further associated with the content 124 {content item} in light of the organization defined by the storyline 134 {based on the representation of the modified version of the text data}.; Yun, ¶ Col. 6, lines 5-10; Col. 12, lines 26-34 and 43-49).
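For illustration only, and not as an element of the rejection, the arrangement Yun describes, sorting selected tags into the internal chronology of the story so that flashback scenes are presented first, may be sketched in Python as follows. Yun discloses no source code; every identifier below (Tag, occurrence_time, story_time, build_storyline) is hypothetical.

from dataclasses import dataclass

@dataclass
class Tag:
    description: str        # descriptive of the event (cf. Yun's description 210)
    occurrence_time: float  # playback position of the event (cf. occurrence time 212)
    story_time: float       # position within the story's internal chronology

def build_storyline(tags):
    # Arrange the selected tags by internal chronology rather than by
    # playback order, so a flashback is ordered as an earlier event.
    return sorted(tags, key=lambda tag: tag.story_time)

# A flashback appears late in playback but early in the story.
tags = [
    Tag("Ed arrives at Ed's Mercantile", occurrence_time=42.0, story_time=120.0),
    Tag("flashback: young Ed", occurrence_time=55.0, story_time=1.0),
]
storyline = build_storyline(tags)  # the flashback tag is ordered first
metadata = [(tag.occurrence_time, tag.description) for tag in storyline]

Under this reading, the sorted list stands in for the stored storyline 134, and the derived (time, description) pairs stand in for the generated metadata.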
Regarding claim 2, Yun discloses wherein portions of text data in the modified version of the text data are grouped based on topics (“the tags 132 may comprise information indicative of events 208 in the accessed content 124 and the occurrence times 212,” where the tags corresponding to the “events 208 may include...dialogue or discussion on a particular topic”; Yun, ¶ Col. 5, lines 7-14; Col. 14, lines 4-49).
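A minimal sketch of this kind of topic grouping, offered for illustration only (the portions and topic labels below are hypothetical and are not drawn from Yun):

from collections import defaultdict

def group_by_topic(portions):
    # Collect each portion of the text data under the topic it discusses.
    groups = defaultdict(list)
    for topic, text in portions:
        groups[topic].append(text)
    return dict(groups)

groups = group_by_topic([
    ("cattle drive", "dialogue about the drive"),
    ("cattle drive", "later discussion of the drive"),
    ("Ed's Mercantile", "dialogue at the store"),
])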
Regarding claim 3, Yun discloses wherein the modified version of the text data arranges data in the modified version of the text data based on the sequence of at least one of events associated with the content item and the content of the content item (“the tags 132,” as incorporated into the storyline 134, “may comprise information {arranges data...} indicative of events 208 in the accessed content 124 {the content of the content item} and the occurrence times 212,” and the “selection of the one or more tags 132 may be based on one or more criteria applied to the description 210” and “the occurrence time 212” {the sequence of at least one of events}; Yun, ¶ Col. 5, lines 7-14; Col. 14, lines 4-49; Col. 15, lines 15-21).
Regarding claim 4, Yun discloses wherein arranging the data in the modified version of the text data based on the sequence comprises ordering the data in the modified version of the text data according to a chronological timeline associated with the content item (“The arranging” of the storyline “may comprise sorting the tags 132 by the occurrence time 212 in ascending order from lowest occurrence time 212 to greatest occurrence time 212”; Yun, ¶ Col. 15, lines 22-36).
Regarding claim 5, Yun discloses wherein the modified version of the text data groups a portion of the text data associated with the deviation in the playback timeline with an additional portion of the text data (The “selection of the one or more tags 132 may be based on one or more criteria applied to the description 210 and in some implementations the occurrence time 212,” including “sorting the tags 132 by the occurrence time 212 in ascending order from lowest occurrence time 212 {the modified version of the text data groups...},” where the ordering by occurrence time includes addressing how “flashback and flash-forward scenes may result in a storyline which presents the scenes” such as by “showing the flashbacks first, consistent with their earlier appearance in the internal chronology of the story as being earlier events {a portion of the text data associated with the deviation in the playback timeline},” where the order of appearance is an association of the deviation with additional tags {additional portion of the text data}; Yun, ¶ Col. 15, lines 15-36) selected based on one or more relationships between the portion of the text data and the additional portion of the text data (“In implementations where the correlation between the tags 132 is to be considered, the criteria for the selecting the one or more tags may include the tags 132 having a determined correlation with one another greater than the correlation threshold 704,” where the determined correlation is the one or more relationships.; Yun, ¶ Col. 15, lines 37-41), wherein the one or more relationships comprise at least one of a chronological relationship, a contextual relationship, and a common timeline associated with the portion of the text data and the additional portion of the text data (In addition to following the “internal chronology of the story,” the “storyline 134 may combine tags 132 which are associated with different events 208” based on a determined “correlation between two or more tags 132”; for example, “the tag 132(1) for the location ‘Ed’s Mercantile’ may have a correlation with the tag 132(3) for the character ‘Ed’,” and/or the “appearance of ‘Chet’ and ‘Johnny’ may be highly correlated, and a storyline 134 may be generated as described next which includes tags for both characters,” and/or a storyline of “riding may be based on a combination of tags indicating cowboys and horses.”; Yun, ¶ Col. 14, line 56 - Col. 15, line 9).
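The correlation criterion quoted above may be sketched, purely for illustration, as a threshold test over tag pairs. Here correlation() is a placeholder for whatever correlation measure an implementation of Yun would supply, and threshold stands in for the correlation threshold 704; neither is specified as code in Yun.

from itertools import combinations

def select_correlated(tags, correlation, threshold):
    # Keep only tag pairs whose determined correlation with one another
    # exceeds the threshold, so related portions can be grouped together.
    return [(a, b) for a, b in combinations(tags, 2)
            if correlation(a, b) > threshold]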
Regarding claim 6, Yun discloses wherein the modified version of the text data comprises additional text data (“FIG. 2 illustrates some elements 200 which may be used to generate the tags 132,” where elements other than those applied originally for generation of the tags, which are incorporated in a text format (as the tags are in a text format, all data incorporated is necessarily also in a text format), are “additional text data”; Yun, ¶ Col. 4, lines 60-63; FIG. 2) generated based on at least one of the audio associated with the content item and image data associated with the content item (The system “may generate one or more of the tags 132 based at least in part on the content 124” and/or the existing “content metadata 130,” where the “content 124 may comprise one or more of video data 202, audio data 204, caption data 206,” and where “audio data 204 may comprise a file or portion thereof conforming to the MPEG-2 Audio Layer III (“MP3”) standard”; Yun, ¶ Col. 4, line 60 - Col. 5, line 6; Col. 5, lines 42-46), the image data comprising at least one of one or more video frames and one or more still images (Corresponding to the audio data 204, the “video data 202 may comprise a plurality of image frames, such as a file or portion thereof conforming to the MPEG-4 standard promulgated by the Motion Picture Experts Group (“MPEG”).”; Yun, ¶ Col. 4, line 60 - Col. 5, line 6).
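As a non-limiting sketch of generating such additional text data, the following combines caption text with text recognized from the audio and from image frames. The callables transcribe() and ocr() are placeholders for the speech recognition and optical character recognition modules Yun names; they are not real APIs.

def augment_text_data(captions, audio, frames, transcribe, ocr):
    # Text recognized from the audio track and from individual frames is
    # additional text data beyond the original caption text.
    additional = [transcribe(audio)] + [ocr(frame) for frame in frames]
    return list(captions) + [text for text in additional if text]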
Regarding claim 12, Yun discloses A computer-implemented method comprising (Systems and methods are described with relation to “generating storylines associated with content,” implemented using a “device 104” through “one or more processors 106,” the processors being “configured to access and execute at least in part instructions stored in the one or more memories 108.”; Yun, ¶ Col. 2, line 64 - Col. 3, line 33): obtaining text data associated with a content item (“determines text data associated with the one or more occurrence times 212 in the content 124.”; Yun, ¶ Col. 14, lines 16-21), the text data comprising at least one of a transcription and a translation of audio associated with the content item (“The text data may comprise information from retrieving one or more text captions, recognizing speech with a speech recognition module, or recognizing writing with an optical character recognition module” and may “be based at least in part on closed captions, open captions, subtitles, and so forth”; Yun, ¶ Col. 14, lines 16-25); determining a modified version of the text data based on at least one of a deviation in a playback timeline of the content item, topics associated with the text data, a chronological timeline associated with the content item, and a sequence of at least one of events associated with the content item and a content of the content item (Yun “generates one or more tags associated with the one or more pieces of content 124 based at least in part on information in the determined text data and the one or more occurrence times 212,” including “selecting one or more tags 132 based on one or more criteria, arranging the selected one or more tags 132,” and the arranged tags are understood as the determined modified version of the text data. Further, “The tag stores information descriptive of the event,” and the events include “a particular character being on-screen, a particular topic or subject being discussed, particular voice... in a video an event may be the lead character being in an image frame... [and/or] in an audiobook an event may be dialogue associated with a particular character or voiced by a particular character,” and so forth.; Yun, ¶ Col. 2, lines 9-17; Col. 14, lines 26-32); generating a representation of the modified version of the text data (Discloses “generating one or more storylines” comprising, based on the selected and arranged “one or more tags 132... storing the arrangement of the selected one or more tags 132 as the storyline 134 {a representation of the modified version of the text data},” where the “arranging may comprise sorting the tags 132 by the occurrence time 212” or by the “chronology of the story,” such that “a storyline 134 for a character which features flashback and flash-forward scenes may result in a storyline which presents the scenes showing the flashbacks first, consistent with their earlier appearance in the internal chronology of the story as being earlier events.”; Yun, ¶ Col. 15, lines 9-36); and generating metadata associated with the content item based on the representation of the modified version of the text data (“the tags 132 may be placed into a data structure or otherwise associated with particular portions of the content 124” and “during presentation of the content 132 as defined by the storyline 134, the user may experience a sequential flow of the events 208 indicated by the tags 132 in a particular arrangement.” As presented here, the tags 132, which are described as including information “descriptive of one or more of a character, location, theme, or other event 208 or event 208 components which are present in the portion of the content 124,” are generated data about the data, and thus are the generated metadata. The tags are further associated with the content 124 {content item} in light of the organization defined by the storyline 134 {based on the representation of the modified version of the text data}.; Yun, ¶ Col. 6, lines 5-10; Col. 12, lines 26-34 and 43-49).
Regarding claim 13, Yun discloses wherein the modified version of the text data groups one or more portions of the modified version of the text data that are associated with the deviation in the playback timeline with one or more additional portions of modified version of the text data (The “selection of the one or more tags 132 may be based on one or more criteria applied to the description 210 and in some implementations the occurrence time 212,” including “sorting the tags 132 by the occurrence time 212 in ascending order from lowest occurrence time 212 {the modified version of the text data groups...},” where the ordering by occurrence time includes addressing how “flashback and flash-forward scenes may result in a storyline which presents the scenes” such as by “showing the flashbacks first, consistent with their earlier appearance in the internal chronology of the story as being earlier events {a portion of the modified version of the text data associated with the deviation in the playback timeline},” where the order of appearance is an association of the deviation with additional tags {additional portions of modified version of the text data}; Yun, ¶ Col. 15, lines 15-36) that are selected based on one or more relationships between the one or more portions of the modified version of the text data and the one or more additional portions of the modified version of the text data (“In implementations where the correlation between the tags 132 is to be considered, the criteria for the selecting the one or more tags may include the tags 132 having a determined correlation with one another greater than the correlation threshold 704,” where the determined correlation is the one or more relationships.; Yun, ¶ Col. 15, lines 37-41), wherein the one or more relationships comprise at least one of a topic, a chronological relationship, a contextual relationship, and a common timeline (In addition to following the “internal chronology of the story,” the “storyline 134 may combine tags 132 which are associated with different events 208” based on a determined “correlation between two or more tags 132”; for example, “the tag 132(1) for the location ‘Ed’s Mercantile’ may have a correlation with the tag 132(3) for the character ‘Ed’,” and/or the “appearance of ‘Chet’ and ‘Johnny’ may be highly correlated, and a storyline 134 may be generated as described next which includes tags for both characters,” and/or a storyline of “riding may be based on a combination of tags indicating cowboys and horses,” and tags corresponding to the “events 208 may include...dialogue or discussion on a particular topic”; Yun, ¶ Col. 5, lines 7-14; Col. 14, line 56 - Col. 15, line 9).
Regarding claim 14, the rejection of claim 12 is incorporated. Claim 14 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.
Regarding claim 15, the rejection of claim 12 is incorporated. Claim 15 is substantially the same as claim 6 and is therefore rejected under the same rationale as above.
Regarding claim 20, Yun discloses A non-transitory computer-readable medium having instructions stored thereon that, when executed by one or more processors, cause the one or more processors to perform operations comprising (Systems and methods are described with relation to “generating storylines associated with content,” implemented using a “device 104” through “one or more processors 106,” the processors being “configured to access and execute at least in part instructions stored in the one or more memories 108.”; Yun, ¶ Col. 2, line 64 - Col. 3, line 33): obtaining text data associated with a content item (“determines text data associated with the one or more occurrence times 212 in the content 124.”; Yun, ¶ Col. 14, lines 16-21), the text data comprising at least one of a transcription and a translation of audio associated with the content item (“The text data may comprise information from retrieving one or more text captions, recognizing speech with a speech recognition module, or recognizing writing with an optical character recognition module” and may “be based at least in part on closed captions, open captions, subtitles, and so forth”; Yun, ¶ Col. 14, lines 16-25); determining a modified version of the text data based on at least one of a deviation in a playback timeline of the content item, topics associated with the text data, a chronological timeline associated with the content item, and a sequence of at least one of events associated with the content item and a content of the content item (Yun “generates one or more tags associated with the one or more pieces of content 124 based at least in part on information in the determined text data and the one or more occurrence times 212,” including “selecting one or more tags 132 based on one or more criteria, arranging the selected one or more tags 132,” and the arranged tags are understood as the determined modified version of the text data. Further, “The tag stores information descriptive of the event,” and the events include “a particular character being on-screen, a particular topic or subject being discussed, particular voice... in a video an event may be the lead character being in an image frame... [and/or] in an audiobook an event may be dialogue associated with a particular character or voiced by a particular character,” and so forth.; Yun, ¶ Col. 2, lines 9-17; Col. 14, lines 26-32); generating a representation of the modified version of the text data (Discloses “generating one or more storylines” comprising, based on the selected and arranged “one or more tags 132... storing the arrangement of the selected one or more tags 132 as the storyline 134 {a representation of the modified version of the text data},” where the “arranging may comprise sorting the tags 132 by the occurrence time 212” or by the “chronology of the story,” such that “a storyline 134 for a character which features flashback and flash-forward scenes may result in a storyline which presents the scenes showing the flashbacks first, consistent with their earlier appearance in the internal chronology of the story as being earlier events.”; Yun, ¶ Col. 15, lines 9-36); and generating metadata associated with the content item based on the representation of the modified version of the text data (“the tags 132 may be placed into a data structure or otherwise associated with particular portions of the content 124” and “during presentation of the content 132 as defined by the storyline 134, the user may experience a sequential flow of the events 208 indicated by the tags 132 in a particular arrangement.” As presented here, the tags 132, which are described as including information “descriptive of one or more of a character, location, theme, or other event 208 or event 208 components which are present in the portion of the content 124,” are generated data about the data, and thus are the generated metadata. The tags are further associated with the content 124 {content item} in light of the organization defined by the storyline 134 {based on the representation of the modified version of the text data}.; Yun, ¶ Col. 6, lines 5-10; Col. 12, lines 26-34 and 43-49).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 7-10 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Yun as applied to claims 1 and 12 above, and further in view of Mahyar (U.S. Pat. No. 10,999,566, hereinafter Mahyar).
Regarding claim 7, the rejection of claim 1 is incorporated. Yun discloses all of the elements of the current invention as stated above. However, Yun fails to expressly recite wherein the operations further comprise: detecting the deviation in the playback timeline of the content item based on at least one of a portion of the text data associated with the deviation in the playback timeline of the content item, the audio associated with the content item, and image data associated with the content item, the image data comprising at least one of one or more video frames and one or more still images.
Mahyar teaches “systems and methods to automatically generate textual descriptions for video content.” (Mahyar, ¶ Col. 2, lines 33-35). Regarding claim 7, Mahyar teaches wherein the operations further comprise: detecting the deviation in the playback timeline of the content item (Discloses detecting “a scene in the content” which is “interrupted by a flashback or other scene {a deviation from the playback timeline...}” as part of a “first segment comprising a first set of frames and first audio content”; Mahyar, ¶ Col. 6, line 64 - Col. 7, line 43) based on at least one of a portion of the text data associated with the deviation in the playback timeline of the content item, the audio associated with the content item, and image data associated with the content item, the image data comprising at least one of one or more video frames and one or more still images (“The first segment” may comprise “non-continuous segments which are related,” also referred to as clips, which may correspond “to events, scenes, and/or other occurrences that may be discrete and/or extractable from the content” based on “certain locations and/or times, certain actors that appear, certain music or sounds, and/or other features of the content” and the system can “extract and/or analyze individual frames of video content to determine whether the frames are part of the same segment or a different segment” including by “processing images using one or more object recognition algorithms, determining pixel color values, comparing certain portions of frames to previous or subsequent frames in the video, and the like.”; Mahyar, ¶ Col. 7, lines 7-43).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the storyline generation systems of Yun to incorporate the teachings of Mahyar to include wherein the operations further comprise: detecting the deviation in the playback timeline of the content item based on at least one of a portion of the text data associated with the deviation in the playback timeline of the content item, the audio associated with the content item, and image data associated with the content item, the image data comprising at least one of one or more video frames and one or more still images. Yun discloses the generation of storylines allowing for novel content generation from existing content, and which further considers discontinuous content. However, Yun is silent as to the automated detection of said discontinuities. Mahyar discloses the automated determination of “non-continuous segments that are related”, which allows for automating the determination of the corresponding portions of events, while avoiding the inclusion of incomplete events due to having internal interruptions, which provides the known benefit of reducing costs for manual annotation, and increasing the efficiency of the storyline generation described in Yun, as recognized by Mahyar. (Mahyar, ¶ Col. 7, lines 5-16).
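A minimal sketch of the frame-comparison cue Mahyar describes, assuming for illustration that frames are numpy arrays of pixel values and using an arbitrary illustrative threshold (neither assumption is specified by Mahyar):

import numpy as np

def segment_boundaries(frames, threshold=30.0):
    # Flag a possible segment boundary (e.g., a cut into a flashback)
    # where consecutive frames differ sharply in pixel values.
    boundaries = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float) - frames[i - 1].astype(float)).mean()
        if diff > threshold:
            boundaries.append(i)
    return boundaries

Comparing each frame to its predecessor in this way is only one of several cues Mahyar lists, object recognition and audio features being others.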
Regarding claim 8, the rejection of claim 7 is incorporated. Yun and Mahyar disclose all of the elements of the current invention as stated above. Yun further discloses wherein the deviation in the playback timeline comprises at least one of a flashback, a flashforward, and a content recap (“a storyline 134 for a character which features [a] flashback” where the flashback is the deviation in the playback timeline; Yun, ¶ Col. 15, lines 22-36), and wherein the text data comprises at least one of closed captions and subtitles (Discloses “caption data 206” comprising “text corresponding to dialogue present in the video data 202,” in examples where video data is the content, and further including “descriptive language as to the events in the scene” where the “Caption data 206 may include data encoded as closed captions, open captions, subtitles, and so forth.”; Yun, ¶ Col. 4, line 60 - Col. 5, line 6), and wherein the content item comprises at least one of a movie, a television show, a livestream, a podcast, a video game, a video conference, an audio, and a media broadcast comprising at least one of video and audio (“The content 124 may comprise one or more of video data 202, audio data 204, caption data 206, or other data such as text of an eBook.”; Yun, ¶ Col. 4, line 60 - Col. 5, line 6).
Regarding claim 9, the rejection of claim 7 is incorporated. Yun discloses all of the elements of the current invention as stated above. However, Yun fails to expressly recite wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using facial recognition, a character depicted in a portion of the image data corresponding to the deviation in the playback timeline; and detecting the deviation in the playback timeline of the content item based on a determination that the character is associated with a first segment of the playback timeline that is chronologically before a second segment of the playback timeline corresponding to the deviation in the playback timeline.
The relevance of Mahyar is described above with relation to claim 7. Regarding claim 9, Mahyar teaches wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using facial recognition, a character depicted in a portion of the image data corresponding to the deviation in the playback timeline (“the video processing module(s) 320 {based on image data} may include facial recognition and/or human pose detection algorithms that can be used to identify people or actions in certain locations over frames or segments of the video content” where non-consecutive frames can indicate “a flashback or cut to a different story” which interrupts a scene.; Mahyar, ¶ Col. 10, lines 4-28); and detecting the deviation in the playback timeline of the content item based on a determination that the character is associated with a first segment of the playback timeline that is chronologically before a second segment of the playback timeline corresponding to the deviation in the playback timeline (The system can “determine frames or sets of frames of video content and may be configured to detect certain features, such as certain objects, as well as actions or events across multiple frames” and can “identify people or actions in certain locations” which are non-consecutive as an indication of “a flashback or cut to a different story”, which is understood as including the identification of a first person in a first frame at some timepoint A, a failure to identify the first person at some timepoint B, and the later identification of the first person in a third frame at some timepoint C, and these frames are non-consecutive because of the inclusion of one or more frames at timepoint B which do not maintain the relationship shared by the first frame and the third frame.; Mahyar, ¶ Col. 10, lines 4-28).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the storyline generation systems of Yun to incorporate the teachings of Mahyar to include wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using facial recognition, a character depicted in a portion of the image data corresponding to the deviation in the playback timeline; and detecting the deviation in the playback timeline of the content item based on a determination that the character is associated with a first segment of the playback timeline that is chronologically before a second segment of the playback timeline corresponding to the deviation in the playback timeline. Yun discloses the generation of storylines allowing for novel content generation from existing content, and which further considers discontinuous content. However, Yun is silent as to the automated detection of said discontinuities. Mahyar discloses the automated determination of “non-continuous segments that are related”, which allows for automating the determination of the corresponding portions of events, while avoiding the inclusion of incomplete events due to having internal interruptions, which provides the known benefit of reducing costs for manual annotation, and increasing the efficiency of the storyline generation described in Yun, as recognized by Mahyar. (Mahyar, ¶ Col. 7, lines 5-16).
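For illustration only, the reading of Mahyar set out above, a character recognized at timepoint A, absent at timepoint B, and recognized again at timepoint C, may be sketched as follows. Here identify_faces() is a placeholder for a facial recognition module and segments is a hypothetical list of frame groups in playback order; neither name comes from Mahyar.

def detect_character_flashbacks(segments, identify_faces):
    last_seen = {}
    deviations = []
    for index, frames in enumerate(segments):
        characters = set()
        for frame in frames:
            characters.update(identify_faces(frame))
        for character in characters:
            # A reappearance after an intervening absence associates the
            # character with a chronologically earlier segment.
            if character in last_seen and last_seen[character] < index - 1:
                deviations.append((index, character))
            last_seen[character] = index
    return deviations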
Regarding claim 10, the rejection of claim 7 is incorporated. Yun discloses all of the elements of the current invention as stated above. However, Yun fails to expressly recite wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using scene or image recognition, a scene depicted in a portion of the image data corresponding to the deviation in the playback timeline; and detecting the deviation in the playback timeline of the content item based on a determination that the scene matches a previous scene in the playback timeline or the scene is associated with a segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline.
The relevance of Mahyar is described above with relation to claim 7. Regarding claim 10, Mahyar teaches wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using scene or image recognition, a scene depicted in a portion of the image data corresponding to the deviation in the playback timeline (“the video processing module(s) 320 may include facial recognition and/or human pose detection algorithms that can be used to identify people or actions in certain locations over frames or segments of the video content, which may not always be consecutive” and may further include “one or more object recognition algorithms configured to detect at least one of predefined objects, predefined scenery” which may be used as part of detecting “certain locations” which may be “used to identify people or actions in certain locations over frames or segments of the video content” where non-consecutive frames can indicate “a flashback or cut to a different story” which interrupts a scene.; Mahyar, ¶ Col. 10, lines 4-28); and detecting the deviation in the playback timeline of the content item based on a determination that the scene matches a previous scene in the playback timeline or the scene is associated with a segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline (The system can “determine frames or sets of frames of video content and may be configured to detect certain features, such as certain objects, as well as actions or events across multiple frames” and can “identify people or actions in certain locations” which are non-consecutive as an indication of “a flashback or cut to a different story”, which is understood as including the identification of a first person with relation to a first action in a first frame at some timepoint A, a failure to identify the first person with relation to the first action at some timepoint B, and the later identification of the first person with relation to the first action in a third frame at some timepoint C, and these frames are non-consecutive because of the inclusion of one or more frames at timepoint B which do not maintain the person/action relationship.; Mahyar, ¶ Col. 10, lines 4-28).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the storyline generation systems of Yun to incorporate the teachings of Mahyar to include wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using scene or image recognition, a scene depicted in a portion of the image data corresponding to the deviation in the playback timeline; and detecting the deviation in the playback timeline of the content item based on a determination that the scene matches a previous scene in the playback timeline or the scene is associated with a segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline. Yun discloses the generation of storylines allowing for novel content generation from existing content, and which further considers discontinuous content. However, Yun is silent as to the automated detection of said discontinuities. Mahyar discloses the automated determination of “non-continuous segments that are related”, which allows for automating the determination of the corresponding portions of events, while avoiding the inclusion of incomplete events due to having internal interruptions, which provides the known benefit of reducing costs for manual annotation, and increasing the efficiency of the storyline generation described in Yun, as recognized by Mahyar. (Mahyar, ¶ Col. 7, lines 5-16).
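A comparable sketch for the scene-matching reading, offered for illustration only; signatures is a hypothetical list of per-scene features in playback order, and similar() stands in for a scene or image recognition comparison, neither being drawn from Mahyar.

def detect_scene_repeats(signatures, similar):
    # Flag a deviation where a scene matches a previous, non-adjacent
    # scene earlier in the playback timeline.
    repeats = []
    for j in range(2, len(signatures)):
        for i in range(j - 1):
            if similar(signatures[i], signatures[j]):
                repeats.append((i, j))
    return repeats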
Regarding claim 16, the rejection of claim 12 is incorporated. Yun and Mahyar disclose all of the elements of the current invention as stated above. Yun further discloses wherein the deviation in the playback timeline comprises at least one of a flashback, a flashforward, and a content recap (“a storyline 134 for a character which features [a] flashback” where the flashback is the deviation in the playback timeline; Yun, ¶ Col. 15, lines 22-36), and wherein the text data comprises at least one of closed captions and subtitles (Discloses “caption data 206” comprising “text corresponding to dialogue present in the video data 202,” in examples where video data is the content, and further including “descriptive language as to the events in the scene” where the “Caption data 206 may include data encoded as closed captions, open captions, subtitles, and so forth.”; Yun, ¶ Col. 4, line 60 - Col. 5, line 6), and wherein the content item comprises at least one of a movie, a television show, a livestream, a podcast, a video game, a video conference, an audio, and a media broadcast comprising at least one of video and audio (“The content 124 may comprise one or more of video data 202, audio data 204, caption data 206, or other data such as text of an eBook.”; Yun, ¶ Col. 4, line 60 - Col. 5, line 6). However, Yun fails to expressly recite wherein the operations further comprise: detecting the deviation in the playback timeline of the content item based on at least one of a portion of the text data associated with the deviation in the playback timeline of the content item, the audio associated with the content item, and image data associated with the content item, the image data comprising at least one of one or more video frames and one or more still images.
The relevance of Mahyar is described above with relation to claim 7. Regarding claim 16, Mahyar teaches wherein the operations further comprise: detecting the deviation in the playback timeline of the content item (Discloses detecting “a scene in the content” which is “interrupted by a flashback or other scene {a deviation from the playback timeline...}” as part of a “first segment comprising a first set of frames and first audio content”; Mahyar, ¶ Col. 6, line 64 - Col. 7, line 43) based on at least one of a portion of the text data associated with the deviation in the playback timeline of the content item, the audio associated with the content item, and image data associated with the content item, the image data comprising at least one of one or more video frames and one or more still images (“The first segment” may comprise “non-continuous segments which are related,” also referred to as clips, which may correspond “to events, scenes, and/or other occurrences that may be discrete and/or extractable from the content” based on “certain locations and/or times, certain actors that appear, certain music or sounds, and/or other features of the content” and the system can “extract and/or analyze individual frames of video content to determine whether the frames are part of the same segment or a different segment” including by “processing images using one or more object recognition algorithms, determining pixel color values, comparing certain portions of frames to previous or subsequent frames in the video, and the like.”; Mahyar, ¶ Col. 7, lines 7-43).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the storyline generation systems of Yun to incorporate the teachings of Mahyar to include wherein the operations further comprise: detecting the deviation in the playback timeline of the content item based on at least one of a portion of the text data associated with the deviation in the playback timeline of the content item, the audio associated with the content item, and image data associated with the content item, the image data comprising at least one of one or more video frames and one or more still images. Yun discloses the generation of storylines allowing for novel content generation from existing content, and which further considers discontinuous content. However, Yun is silent as to the automated detection of said discontinuities. Mahyar discloses the automated determination of “non-continuous segments that are related”, which allows for automating the determination of the corresponding portions of events, while avoiding the inclusion of incomplete events due to having internal interruptions, which provides the known benefit of reducing costs for manual annotation, and increasing the efficiency of the storyline generation described in Yun, as recognized by Mahyar. (Mahyar, ¶ Col. 7, lines 5-16).
Regarding claim 17, the rejection of claim 16 is incorporated. Yun discloses all of the elements of the current invention as stated above. However, Yun fails to expressly recite wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using facial recognition, a character depicted in a portion of the image data corresponding to the deviation in the playback timeline; and detecting the deviation in the playback timeline of the content item based on a determination that the character is associated with a first segment of the playback timeline that is chronologically before a second segment of the playback timeline corresponding to the deviation in the playback timeline.
The relevance of Mahyar is described above with relation to claim 7. Regarding claim 17, Mahyar teaches wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using facial recognition, a character depicted in a portion of the image data corresponding to the deviation in the playback timeline (“the video processing module(s) 320 {based on image data} may include facial recognition and/or human pose detection algorithms that can be used to identify people or actions in certain locations over frames or segments of the video content” where non-consecutive frames can indicate “a flashback or cut to a different story” which interrupts a scene.; Mahyar, ¶ Col. 10, lines 4-28); and detecting the deviation in the playback timeline of the content item based on a determination that the character is associated with a first segment of the playback timeline that is chronologically before a second segment of the playback timeline corresponding to the deviation in the playback timeline (The system can “determine frames or sets of frames of video content and may be configured to detect certain features, such as certain objects, as well as actions or events across multiple frames” and can “identify people or actions in certain locations” which are non-consecutive as an indication of “a flashback or cut to a different story”, which is understood as including the identification of a first person in a first frame at some timepoint A, a failure to identify the first person at some timepoint B, and the later identification of the first person in a third frame at some timepoint C, and these frames are non-consecutive because of the inclusion of one or more frames at timepoint B which do not maintain the relationship shared by the first frame and the third frame.; Mahyar, ¶ Col. 10, lines 4-28).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the storyline generation systems of Yun to incorporate the teachings of Mahyar to include wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using facial recognition, a character depicted in a portion of the image data corresponding to the deviation in the playback timeline; and detecting the deviation in the playback timeline of the content item based on a determination that the character is associated with a first segment of the playback timeline that is chronologically before a second segment of the playback timeline corresponding to the deviation in the playback timeline. Yun discloses the generation of storylines allowing for novel content generation from existing content, and which further considers discontinuous content. However, Yun is silent as to the automated detection of said discontinuities. Mahyar discloses the automated determination of “non-continuous segments that are related”, which allows for automating the determination of the corresponding portions of events, while avoiding the inclusion of incomplete events due to having internal interruptions, which provides the known benefit of reducing costs for manual annotation, and increasing the efficiency of the storyline generation described in Yun, as recognized by Mahyar. (Mahyar, ¶ Col. 7, lines 5-16).
Regarding claim 18, the rejection of claim 16 is incorporated. Yun discloses all of the elements of the current invention as stated above. However, Yun fails to expressly recite wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using scene or image recognition, a scene depicted in a portion of the image data corresponding to the deviation in the playback timeline; and detecting the deviation in the playback timeline of the content item based on a determination that the scene matches a previous scene in the playback timeline or the scene is associated with a segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline.
The relevance of Mahyar is described above with relation to claim 7. Regarding claim 18, Mahyar teaches wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using scene or image recognition, a scene depicted in a portion of the image data corresponding to the deviation in the playback timeline (“the video processing module(s) 320 may include facial recognition and/or human pose detection algorithms that can be used to identify people or actions in certain locations over frames or segments of the video content, which may not always be consecutive” and may further include “one or more object recognition algorithms configured to detect at least one of predefined objects, predefined scenery” which may be used as part of detecting “certain locations” which may be “used to identify people or actions in certain locations over frames or segments of the video content” where non-consecutive frames can indicate “a flashback or cut to a different story” which interrupts a scene.; Mahyar, ¶ Col. 10, lines 4-28); and detecting the deviation in the playback timeline of the content item based on a determination that the scene matches a previous scene in the playback timeline or the scene is associated with a segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline (The system can “determine frames or sets of frames of video content and may be configured to detect certain features, such as certain objects, as well as actions or events across multiple frames” and can “identify people or actions in certain locations” which are non-consecutive as an indication of “a flashback or cut to a different story”, which is understood as including the identification of a first person with relation to a first action in a first frame at some timepoint A, a failure to identify the first person with relation to the first action at some timepoint B, and the later identification of the first person with relation to the first action in a third frame at some timepoint C, and these frames are non-consecutive because of the inclusion of one or more frames at timepoint B which do not maintain the person/action relationship.; Mahyar, ¶ Col. 10, lines 4-28).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the storyline generation systems of Yun to incorporate the teachings of Mahyar to include wherein detecting the deviation in the playback timeline of the content item comprises: based on the image data, recognizing, using scene or image recognition, a scene depicted in a portion of the image data corresponding to the deviation in the playback timeline; and detecting the deviation in the playback timeline of the content item based on a determination that the scene matches a previous scene in the playback timeline or the scene is associated with a segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline. Yun discloses the generation of storylines allowing for novel content generation from existing content, and which further considers discontinuous content. However, Yun is silent as to the automated detection of said discontinuities. Mahyar discloses the automated determination of “non-continuous segments that are related”, which allows for automating the determination of the corresponding portions of events, while avoiding the inclusion of incomplete events due to having internal interruptions, which provides the known benefit of reducing costs for manual annotation, and increasing the efficiency of the storyline generation described in Yun, as recognized by Mahyar. (Mahyar, ¶ Col. 7, lines 5-16).
Claims 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Yun and Mahyar as applied to claims 7 and 16 above, and further in view of non-patent literature to Eisenberg (Eisenberg, J. & Finlayson, M. (2021), 'Narrative Boundaries Annotation Guide,' Journal of Cultural Analytics 6(4), https://doi.org/10.22148/001c.30698; hereinafter Eisenberg).
Regarding claim 11, the rejection of claim 7 is incorporated. Yun and Mahyar disclose all of the elements of the current invention as stated above. Yun further discloses wherein detecting the deviation in the playback timeline of the content item comprises: recognizing, using speech or voice recognition, at least one of an utterance in the audio associated with the content item, speech in the audio associated with the content item, and a voice in the audio associated with the content item (Discloses “A recognition module 624... configured to recognize events 208 or event constituents within the content 124” and which “generates data about components of the events 208” using “facial recognition, voice recognition to identify a particular speaker, speech recognition to transform spoken words to text for processing, object recognition, optical character recognition, and so forth.”; Yun, ¶ Col. 9, lines 45-59). However, Yun and Mahyar fail to expressly recite detecting the deviation in the playback timeline of the content item based on at least one of: a first determination that at least one of the voice and a character associated with the voice is associated with a first segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline; and a second determination that at least one of the utterance, the voice, and the speech is associated with the first segment of the playback timeline.
Eisenberg teaches systems and methods for training computers “to identify the beginnings and ends of narratives”. (Eisenberg, ¶ Pg. 1, para. 1). Regarding claim 11, Eisenberg teaches detecting the deviation in the playback timeline of the content item based on at least one of: a first determination that at least one of the voice and a character associated with the voice is associated with a first segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline (Discloses the detection of interrupt narratives, which “can occur within chapters, or, for our purposes, within short stories, chapters of novels, or in the dialogue of a script,” where a change in “the person narrating” indicates “an interruptive narrative boundary.” The determination of the person narrating, in the context of the “dialogue of a script,” is the at least one of the voice and the character associated with the voice, each of which is associated with “the story of the original narrative {a first segment of the playback timeline},” and this occurs before “the interrupting narrative begins to be told,” as the interrupting narrative begins “Once the original narrative has stopped.”; Eisenberg, ¶ pg. 36-37, inclusive); and a second determination that at least one of the utterance, the voice, and the speech is associated with the first segment of the playback timeline (Discloses a second determination of a change in “the person narrating” or a change in the perspective of the original narrator (e.g., “the original narrator is a first-person narrator, and then the narrator will suddenly shift to a third person impersonal narrator”), each of which corresponds to a change in at least one of the utterance, the voice, and the speech, as associated with the person narrating the original narrative {the first segment of the playback timeline}; Eisenberg, ¶ pg. 36-37, inclusive).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the storyline generation systems of Yun, as modified by the automated video discontinuity detection systems of Mahyar, to incorporate the teachings of Eisenberg to include detecting the deviation in the playback timeline of the content item based on at least one of: a first determination that at least one of the voice and a character associated with the voice is associated with a first segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline; and a second determination that at least one of the utterance, the voice, and the speech is associated with the first segment of the playback timeline. Yun discloses the use of “voice recognition to identify a particular speaker” within the content, and the recognition of events including flashbacks within the content. However, Yun is silent as to the automated detection of said discontinuities. Eisenberg discloses using the collected audio information for the detection of interrupt narratives, which includes interrupt flashbacks and flashforwards. It would have been obvious to one having ordinary skill in the art to apply the known techniques from Yun for automatically detecting a particular speaker and determining the content of said speech using the disclosed speech recognition to the detection of interrupt narratives based on a speaker (voice, speech, utterance) being different between two scenes, as the further use of audio and text to determine sequence for the tags based on chronological order further improves “the user experience... allowing the user to consume content which is most meaningful to them,” by automating the detection of narrative boundaries as disclosed in Eisenberg, not least by increasing the amount of available content to which the techniques of Yun may be applied, as recognized in light of the disclosure of Eisenberg. (Eisenberg, ¶ pg. 1, Abstract and Introduction).
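The narrator-change reading of Eisenberg, as combined with Yun's voice recognition, may be sketched for illustration as follows; identify_speaker() is a placeholder for Yun's voice recognition module, and clips is a hypothetical list of audio segments in playback order, neither being code drawn from the references.

def narrator_boundaries(clips, identify_speaker):
    voices = [identify_speaker(clip) for clip in clips]
    boundaries = []
    for i in range(1, len(voices) - 1):
        # An interruptive boundary: the narrating voice changes, and the
        # original voice, associated with the earlier segment, resumes later.
        if voices[i] != voices[i - 1] and voices[i - 1] in voices[i + 1:]:
            boundaries.append(i)
    return boundaries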
Regarding claim 19, the rejection of claim 16 is incorporated. Yun and Mahyar disclose all of the elements of the current invention as stated above. Yun further discloses wherein detecting the deviation in the playback timeline of the content item comprises: recognizing, using speech or voice recognition, at least one of an utterance in the audio associated with the content item, speech in the audio associated with the content item, and a voice in the audio associated with the content item (Discloses “A recognition module 624... configured to recognize events 208 or event constituents within the content 124” and which “generates data about components of the events 208” using “facial recognition, voice recognition to identify a particular speaker, speech recognition to transform spoken words to text for processing, object recognition, optical character recognition, and so forth.”; Yun, ¶ Col. 9, lines 45-59). However, Yun and Mahyar fail to expressly recite detecting the deviation in the playback timeline of the content item based on at least one of: a first determination that at least one of the voice and a character associated with the voice is associated with a first segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline; and a second determination that at least one of the utterance, the voice, and the speech is associated with the first segment of the playback timeline.
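As a non-authoritative illustration of the recognition-module concept quoted above, the sketch below pairs speaker identification (voice recognition) with transcription (speech recognition) to produce tag-like records keyed by occurrence time. The recognize_speaker and transcribe stubs are hypothetical stand-ins, not functions disclosed by Yun.

# Illustrative sketch only: a toy analogue of a recognition module in the
# manner of Yun's module 624, producing per-event tags from audio chunks.
from dataclasses import dataclass


@dataclass
class Tag:
    occurrence_time: float  # when the event occurs in the content
    speaker: str            # identified speaker (voice recognition)
    text: str               # spoken words rendered as text (speech recognition)


def recognize_speaker(audio_chunk: bytes) -> str:
    # Stand-in: a real system would match voice features to a known speaker.
    return "character_1"


def transcribe(audio_chunk: bytes) -> str:
    # Stand-in: a real system would run automatic speech recognition here.
    return "example dialogue"


def tag_events(chunks: list[tuple[float, bytes]]) -> list[Tag]:
    """Build one tag per audio chunk, keyed by its occurrence time, then
    sort by time so later steps can arrange the tags chronologically."""
    tags = [Tag(t, recognize_speaker(a), transcribe(a)) for t, a in chunks]
    return sorted(tags, key=lambda tag: tag.occurrence_time)


if __name__ == "__main__":
    for tag in tag_events([(30.0, b""), (0.0, b"")]):
        print(tag)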
The relevance of Eisenberg is described above with relation to claim 11. Regarding claim 19, Eisenberg teaches detecting the deviation in the playback timeline of the content item based on at least one of: a first determination that at least one of the voice and a character associated with the voice is associated with a first segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline (Discloses the detection of interrupt narratives, which “can occur within chapters, or, for our purposes, within short stories, chapters of novels, or in the dialogue of a script,” where a change in “the person narrating” indicates “an interruptive narrative boundary.” The determination of the person narrating, in the context of the “dialogue of a script,” corresponds to the at least one voice and the character associated with the voice, each of which is associated with “the story of the original narrative {a first segment of the playback timeline},” and this occurs before “the interrupting narrative begins to be told,” as the interrupting narrative begins “Once the original narrative has stopped.”; Eisenberg, ¶ pgs. 36-37, inclusive); and a second determination that at least one of the utterance, the voice, and the speech is associated with the first segment of the playback timeline (Discloses a second determination of a change in “the person narrating” or a change in the perspective of the original narrator (e.g., “the original narrator is a first-person narrator, and then the narrator will suddenly shift to a third person impersonal narrator”), each of which corresponds to a change in at least one of the utterance, the voice, and the speech, as associated with the person narrating the original narrative {the first segment of the playback timeline}; Eisenberg, ¶ pgs. 36-37, inclusive).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the storyline generation systems of Yun, as modified by the automated video discontinuity detection systems of Mahyar, to incorporate the teachings of Eisenberg to include detecting the deviation in the playback timeline of the content item based on at least one of: a first determination that at least one of the voice and a character associated with the voice is associated with a first segment of the playback timeline that is before a second segment of the playback timeline corresponding to the deviation in the playback timeline; and a second determination that at least one of the utterance, the voice, and the speech is associated with the first segment of the playback timeline. The motivation to combine Yun, Mahyar, and Eisenberg is the same as that set forth above with respect to claim 11. (Eisenberg, ¶ pg. 1, Abstract and Introduction).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Clarke et al. (U.S. Pat. App. Pub. No. 2019/0394531) discloses systems and methods for presenting content with time-based metadata.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Sean E Serraguard/Patent Examiner, Art Unit 2657