DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 19, 2025 has been entered.
Response to Arguments
Applicant’s arguments with respect to claims 1, 3-9 and 11-16 have been considered but are moot because the new ground of rejection relies on the newly cited art of Cheng et al., which was not applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-9 and 11-16 are rejected under 35 U.S.C. 103 as being unpatentable over Van Welzen et al. (US 2022/0410004) in view of Cheng et al. (US 2022/0188550).
Regarding claim 1, Van Welzen teaches:
1. A video clip extraction method, comprising: obtaining event information of a plurality of game events of a game application during a program execution period;
[0080] FIG. 5 is a flow diagram showing a method for presenting a game summary that includes a timeline associated with in-game events, in accordance with some embodiments of the present disclosure. The method 500, at block B502, includes receiving metadata that indicates timing information of in-game events. For example, the graphical interface manager 130 and/or the application 106 may receive metadata that indicates the timing of one or more in-game events within one or more gameplay sessions and associations between the one or more in-game events and one or more video clips that capture the one or more in-game events. The in-game events may be determined based at least on analyzing video data representative of the one or more gameplay sessions and/or other game data.
Van Welzen, 0080-0084 and Figs. 5-7, emphasis added
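For illustration only, and not as a characterization of the claimed invention or of Van Welzen's implementation, event information of the kind described in the cited passage (an event identifier together with its timing within a gameplay session) might be represented by a simple record; all names and values below are hypothetical:

    from dataclasses import dataclass

    @dataclass
    class GameEvent:
        # Hypothetical record of one in-game event reported during a gameplay session.
        event_id: str            # e.g., "ELIMINATION", "GOAL", "LEVEL_UP"
        occurrence_time: float   # seconds from the start of the session

    # Illustrative event log for a single session.
    events = [
        GameEvent("ELIMINATION", 12.4),
        GameEvent("ITEM_PICKUP", 15.0),
        GameEvent("ELIMINATION", 16.1),
    ]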
converting the event information of the game events into an input text;
[0067] Examples of in-game events that may be detected include eliminating another character in the game, collecting a certain item, scoring a goal, hitting a home run, summiting a tall building or mountain, performing or achieving a user-specified task or goal, leveling up, winning a round, losing a round, and/or another event type. For example, in some embodiments, the interest determiner 140 may recognize changes in a reticle or other interface element of a game that indicates an in-game event, such as in-game elimination of a character and trigger a capture event. As further examples, the game data capturer 138 may recognize text (e.g., using optical character recognition (OCR)) in the game instance signifying “Player 1 eliminated Player 4,” or “Player 1 scored a touchdown,” or other in-game events to trigger one or more capture events. Approaches that rely on object detection may analyze game visual data to identify one or more capture events and/or to classify game content.
Van Welzen, 0066-0067, emphasis added
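For illustration only, and not as either reference's implementation (Van Welzen describes OCR and other detection of event strings), recognized event identifiers could be serialized into an input text as sketched below; all names and values are hypothetical:

    # Hypothetical conversion of detected in-game events into a single input text.
    # Each event is an (event_id, occurrence_time_in_seconds) pair.
    events = [("ELIMINATION", 12.4), ("ITEM_PICKUP", 15.0), ("ELIMINATION", 16.1)]

    def events_to_text(events):
        # Serialize the event identifiers in occurrence order, separated by spaces.
        ordered = sorted(events, key=lambda e: e[1])
        return " ".join(event_id for event_id, _ in ordered)

    print(events_to_text(events))  # "ELIMINATION ITEM_PICKUP ELIMINATION"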
providing the input text comprising at least one time interval between the game events to a text classification model;
[0066] The interest determiner 140 may use a wide variety of potential approaches to detect the game content and/or corresponding in-game events in game data (e.g., automatically), examples of which are described herein. To do so, the interest determiner 140 analyze any form of game data, such as a game stream, recorded game data, and/or identified game content to determine and/or detect an in-game event has occurred. For example, the analysis may be used to trigger a capture event and instruct the capture (or save from a buffer) of at least a portion of game content from the game stream and/or to classify a detected in-game event and/or one or more attributes thereof (e.g., for inclusion in metadata of one or more game summaries 136). The game data capturer 138 may determine an in-game event has occurred based on artificial intelligence, object detection, computer vision, text recognition, and/or other methods of analysis. For example, the interest determiner 140 may leverage any type of machine learning model to detect in-game events and corresponding game content, such as machine learning models using linear regression, logistic regression, decision trees, support vector machine (SVM), Naïve Bayes, k-nearest neighbor (Knn), K means clustering, random forest, dimensionality reduction algorithms, gradient boosting algorithms, neural networks (e.g., auto-encoders, convolutional, recurrent, perceptrons, long/short terms memory, Hopfield, Boltzmann, deep belief, deconvolutional, generative adversarial, liquid state machine, etc.), and/or other types of machine learning models. In some embodiments, the machine-learning model includes a deep convolutional neural network.
Van Welzen, 0066-0067 and Figs. 5-7, emphasis added
and extracting a video clip from a recorded game video of the game application according to a classification category of the input text predicted by the text classification model,
[0022] In various embodiments, in-game events may be detected using algorithms that watch the screen for visual cues (e.g., using neural networks, computer vision, and the like). When these visual cues are detected, it may automatically trigger the recording and/or saving of the gameplay. In at least one embodiment, it may additionally or alternatively trigger further analysis of the corresponding video data to extract one or more elements of the metadata or at least some of the metadata may be derived by detection of a corresponding visual cue. The metadata, screenshots, video clips, the map (e.g., extracted from and/or associated with the game), player icons from the gameplay, and more may be stored in a data entity for use in displaying the game summary.
Van Welzen, 0022, 0038, 0066-0067 and Figs. 5-7, emphasis added
wherein converting the event information of the game events into the input text comprises: calculating at least one time interval between the game events according to a plurality of event occurrence times of the game events,
[0042] In at least one embodiment, the graphical interface manager 130 may generate a game summary using metadata of an event log of in-game events and corresponding screenshots, video clips, and/or other game content, each of which may be generated automatically by the game data capturer 138 based on analyzing game data associated with a gameplay session(s). The metadata may indicate timing information of in-game events within the gameplay session(s) and associations between the in-game events and screenshots, video clips, and/or other game content that captures the in-game events. By way of example and not limitation, the metadata may comprise timestamps in-game events and/or game content that corresponds to the in-game events (e.g., start times, end times, times of non-durational events, etc.). The timestamps may be used to display data corresponding to the in-game events with temporal context (e.g., via a timeline, thumbnails on a grid, icons on a map, tables of stats per round, etc.).
Van Welzen, 0042, 0067, 0080, emphasis added
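As a minimal illustrative sketch of the recited interval calculation, and not the implementation of either reference, consecutive event occurrence times could be differenced as follows (values hypothetical):

    # Hypothetical calculation of time intervals between consecutive game events.
    occurrence_times = [12.4, 15.0, 16.1, 42.7]  # seconds

    def time_intervals(times):
        # Interval i is the gap between event i and event i + 1 in occurrence order.
        ordered = sorted(times)
        return [round(b - a, 3) for a, b in zip(ordered, ordered[1:])]

    print(time_intervals(occurrence_times))  # [2.6, 1.1, 26.6]

The first element of such a result would correspond to a first time interval between a first and a second event occurrence time.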
However, Van Welzen fails to explicitly teach the following limitation, which Cheng teaches:
wherein calculating the at least one time interval comprising calculating a first time interval between a first event occurrence time of a first game event and a second event occurrence time of a second game event;
[0092] Once trained, the overall highlight generating system, such as depicted in FIG. 1, may be deployed. In one or more embodiments, the system may additionally include an input that allows a user to select one or more parameters for the generated clips. For example, the user may select a specific player, a span of games, one or more events of interest (e.g., goals and penalties), and the number of clips that make the highlight video (or a length of time for each clip and/or the overall highlight compilation video). The highlight generating system may then access videos and metadata and generate the highlight compilation video by concatenating the clips. For example, the user may want 10 seconds per event of interest clip. Thus, in one or more embodiments, the customized highlight video generation module may take the final predicted times for clips and select 8 seconds before the event of interest and 2 seconds after. Alternatively, as illustrated in FIG. 1, key events of a player's career may be the events of interest and they may be automatically identified and compiled into a “story” of the player's career. Audio and other multimedia features may be added to the video by the customized highlight video generation module, which audio and features may be selected by the user. One skilled in the art shall recognize other applications of the highlight generation system.
Cheng, 0055, 0057, 0066-0067, 0069, 0074, 0092, emphasis added.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Cheng with the system of Van Welzen so that calculating the at least one time interval comprises calculating a first time interval between a first event occurrence time of a first game event and a second event occurrence time of a second game event. Doing so provides systems and methods for computer learning that offer improved computer performance, features, and uses, and that automatically and precisely generate digested or condensed video content, such as highlight videos (Cheng, 0002 and 0004).
Furthermore, Cheng teaches: and generating the input text for the text classification model by concatenating the at least one time interval and event identifications of the game event.
[0057] The commentaries and labels provide a large amount of information for each game. For example, they include game date, team names, leagues, game events time (in minute), event labels such as goal, shot, corner, substitution, foul, etc., and associated player names. These commentaries and labels from cloud-sourced data may be translated into or may be considered as rich metadata for raw video processing embodiments, as well as highlight video generation embodiments.
[0062] The second module embodiments in section C.2 are the coarse interval extraction embodiments. This module is a major difference compared to commonly studied event spotting pipelines. In embodiments of this module, intervals of 70 seconds (although other size intervals may be used) are extracted, where a specific event is located by utilizing textual metadata. There are at least three reasons this approach is preferred compared to common end-to-end visual event spotting pipelines. First, clips extracted with metadata contain more context information and can be useful across different dimensions. With the metadata, the clips may be used as a temporal cut (such as game highlight videos) or may be used with other clips for the same team or player to generate team, player, and/or season highlight videos. The second reason is robustness, which comes from low event ambiguity of textual data. And third, by analyzing shorter clips for the event of interest rather than the entire video, many resources (processing, processing time, memory, energy consumption, etc.) are preserved.
[0069] FIG. 9 depicts a method for generating clips from an input video, according to embodiments of the present disclosure. In one or more embodiments, metadata from the cloud-sourced game commentaries and labels, which include the timestamps in minutes for the goal events, is parsed (905). Combined with the game start times detected by an embodiment of the OCR tool (discussed above), the raw videos may be edited to generate a x-second (e.g., 70 seconds) candidate clips containing an event of interest. In one or more embodiments, the extracting rule may be described by the following equations.
Cheng, 0055, 0057, 0066-0067, 0069, 0074, 0092, emphasis added.
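For illustration only, and not as the claimed method or either reference's implementation, the recited concatenation of event identifications and time intervals into an input text, and its submission to a text classification model, might be sketched as follows; the classifier call is a hypothetical placeholder:

    # Hypothetical: interleave event identifications with the time intervals between
    # consecutive events, in occurrence order, to form one input text.
    events = [("ELIMINATION", 12.4), ("ITEM_PICKUP", 15.0), ("ELIMINATION", 16.1)]

    def build_input_text(events):
        ordered = sorted(events, key=lambda e: e[1])
        parts = [ordered[0][0]]
        for (_, prev_t), (cur_id, cur_t) in zip(ordered, ordered[1:]):
            parts.append(f"{cur_t - prev_t:.1f}")  # interval between consecutive events
            parts.append(cur_id)
        return " ".join(parts)

    input_text = build_input_text(events)
    print(input_text)  # "ELIMINATION 2.6 ITEM_PICKUP 1.1 ELIMINATION"

    # category = text_classifier.predict([input_text])[0]  # placeholder classifier call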
Regarding claim 3, Van Welzen and Cheng teach:
3. The video clip extraction method according to claim 2, furthermore, Van Welzen teaches: wherein generating the input text according to the at least one time interval and the event identifications of the each of the game events comprises: arranging sequentially the event identifications and the at least one time interval of the game events according to occurrence order of the game events to generate the input text comprising the event identifications and the at least one time interval.
[0044] The game summary 136 displayed in the user interface 150 may take a variety of potential forms and may be displayed within a variety of potential contexts (e.g., with an activity feed, a social feed, a gaming application, a game streaming application, etc.). By way of example and not limitation, the game summary 136 in FIG. 1 includes a list 155 of in-game events, visual indicators, and/or interface elements, a game content display region 142, a timeline 190, a round display region 192, a game name 152 of a game that corresponds to the game summary, as well as an amount of time 154 that the game was played for in one or more game sessions that correspond to the game summary 136. As described herein, the game summary 136 may include less than all of those features and/or different features or combinations thereof.
[0075] The game summary server(s) 116 may include one or more application programming interfaces (APIs) to enable communication of information (e.g., game data, timestamps, game content selection data, etc.) with the game server(s) 126 or the client device(s) 104. For example, the game summary server(s) 116 may include one or more game APIs that interface with the client devices 104 or the game server(s) 126 to receive game data and/or game summary data. As a further example, the game summary server(s) 116 may include one or more APIs that interface with the client device(s) 104 for transmitting classified game content and/or game summary data. Although different APIs are described herein, the APIs may be part of a single API, two or more of the APIs may be combined, different APIs may be included other than those described as examples herein, or a combination thereof.
Van Welzen, 0044, 0075, 0077, 0080-0084 and Figs. 5-7, emphasis added
Regarding claim 4, Van Welzen and Cheng teach:
4. The video clip extraction method according to claim 1, furthermore, Van Welzen teaches: wherein extracting the video clip from the recorded game video of the game application according to the classification category of the input text predicted by the text classification model comprises: determining description information of the video clip according to the classification category of the input text predicted by the text classification model.
Van Welzen, 0045-0046, 0075, 0077, 0116
Regarding claim 5, Van Welzen and Cheng teach:
5. The video clip extraction method according to claim 1, furthermore, Van Welzen teaches: wherein extracting the video clip from the recorded game video of the game application according to the classification category of the input text predicted by the text classification model comprises: when the text classification model determines that the input text is a first classification category, determining timestamp information of the video clip according to the event occurrence time of at least one of the game events; and extracting the video clip from the recorded game video according to the timestamp information.
Van Welzen, 0045-0046, 0077, 0080-0084 and Figs. 5-7, emphasis added
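Purely as an illustrative sketch of timestamp-based clip extraction of the general kind Cheng describes (e.g., 8 seconds before and 2 seconds after an event of interest), and not as either reference's implementation, a clip could be cut from a recorded game video with ffmpeg; file names and parameter values are hypothetical:

    import subprocess

    def extract_clip(recording_path, event_time_s, out_path, lead_s=8.0, tail_s=2.0):
        # Hypothetical: cut a clip from lead_s seconds before the event to tail_s
        # seconds after it, using stream copy to avoid re-encoding.
        start = max(event_time_s - lead_s, 0.0)
        duration = (event_time_s + tail_s) - start
        subprocess.run([
            "ffmpeg", "-y",
            "-ss", str(start),
            "-i", recording_path,
            "-t", str(duration),
            "-c", "copy",
            out_path,
        ], check=True)

    # extract_clip("session_recording.mp4", event_time_s=42.7, out_path="clip_001.mp4")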
Regarding claim 6, Van Welzen and Cheng teach:
6. The video clip extraction method according to claim 1, furthermore, Van Welzen teaches: further comprising: obtaining a plurality of training video clips; determining a plurality of classification labels of the training video clips;
[0066] The interest determiner 140 may use a wide variety of potential approaches to detect the game content and/or corresponding in-game events in game data (e.g., automatically), examples of which are described herein. To do so, the interest determiner 140 analyze any form of game data, such as a game stream, recorded game data, and/or identified game content to determine and/or detect an in-game event has occurred. For example, the analysis may be used to trigger a capture event and instruct the capture (or save from a buffer) of at least a portion of game content from the game stream and/or to classify a detected in-game event and/or one or more attributes thereof (e.g., for inclusion in metadata of one or more game summaries 136). The game data capturer 138 may determine an in-game event has occurred based on artificial intelligence, object detection, computer vision, text recognition, and/or other methods of analysis. For example, the interest determiner 140 may leverage any type of machine learning model to detect in-game events and corresponding game content, such as machine learning models using linear regression, logistic regression, decision trees, support vector machine (SVM), Naïve Bayes, k-nearest neighbor (Knn), K means clustering, random forest, dimensionality reduction algorithms, gradient boosting algorithms, neural networks (e.g., auto-encoders, convolutional, recurrent, perceptrons, long/short terms memory, Hopfield, Boltzmann, deep belief, deconvolutional, generative adversarial, liquid state machine, etc.), and/or other types of machine learning models. In some embodiments, the machine-learning model includes a deep convolutional neural network.
Van Welzen, 0066-0067, 0073, 0116, emphasis added
obtaining event information of a plurality of game events of the training video clips;
Van Welzen, 0066-0067, 0073, 0116, emphasis added
converting the event information of the game events of the training video clips into a plurality of training input texts;
Van Welzen, 0066-0067, 0073, 0116, emphasis added
and training the text classification model according to the training input texts and the classification labels of the training video clips.
Van Welzen, 0066-0067, 0073, 0116, emphasis added
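As a hedged illustration only (neither reference discloses this specific toolchain), a text classification model could be trained from training input texts and classification labels using a standard library such as scikit-learn; all training data below is hypothetical:

    # Hypothetical training of a text classification model on (input text, label) pairs.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    training_texts = [
        "ELIMINATION 2.6 ELIMINATION 1.1 ELIMINATION",  # e.g., rapid multi-elimination
        "ITEM_PICKUP 120.0 ITEM_PICKUP",                # e.g., routine play
    ]
    training_labels = ["highlight", "not_highlight"]

    model = Pipeline([
        ("tfidf", TfidfVectorizer(token_pattern=r"\S+")),  # keep numeric interval tokens
        ("clf", LogisticRegression()),
    ])
    model.fit(training_texts, training_labels)

    predicted_category = model.predict(["ELIMINATION 0.9 ELIMINATION"])[0]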
Regarding claim 7, Van Welzen and Cheng teach:
7. The video clip extraction method according to claim 6, furthermore, Van Welzen teaches: wherein determining the classification labels of the training video clips comprises: publishing the training video clips on a web page; and determining the classification labels of the training video clips by receiving, through the web page, labeled content provided by a plurality of user terminals.
Van Welzen, 0036, 0112, 0116
Regarding claim 8, Van Welzen and Cheng teach:
8. The video clip extraction method according to claim 1, furthermore, Van Welzen teaches: further comprising: providing relevant information of extracted video clips to a game broadcast server.
Van Welzen, 0050
Claims 9 and 11-16 recite elements similar to those of claims 1 and 3-8, but in device form rather than method form. Therefore, the rationale supporting the rejection of claims 1 and 3-8 applies equally to claims 9 and 11-16.
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Summa et al. US 2021/0236944
XI US 2023/0079785
Pereira et al. US 2017/0201793
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL T TEKLE whose telephone number is (571)270-1117. The examiner can normally be reached Monday-Friday 8:00-4:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Vaughn can be reached at 571-272-3922. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DANIEL T TEKLE/Primary Examiner, Art Unit 2481