Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office action is in response to application 18/617,772, filed 03/27/24. Claims 1-26 are pending in the application and have been considered.
Missing Oath/Declaration
Applicant’s attention is directed to the notice mailed 04/05/24 informing Applicant that a properly executed oath or declaration has not been received for any of the named inventors. The application cannot be allowed until a properly executed oath or declaration is received, and failure to submit one during prosecution will result in delays should the application otherwise be found to be in condition for allowance.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 9-14, 16-18, and 21-26 are rejected under 35 U.S.C. 103 as being unpatentable over Tadesse et al. (US 11558440) in view of Gelfenbeyn et al. (US 20230351661). Although Gelfenbeyn was filed 04/27/2023, which is after Applicant’s 03/28/23 effective filing date, the Gelfenbeyn reference claims priority to provisional application 63/335,874, filed 04/28/22, and this Office action cites only material also present in the provisional application.
Consider claim 1, Tadesse discloses a computer-implemented method for video generation (computer implemented process, Col 5 lines 11-16, for updating video with an answer to a question with generated video frames, Col 10 lines 54-60, Col 11 lines 27-29) comprising:
accessing a livestream, wherein the livestream features a host and is viewed by one or more viewers (a meeting host records a meeting that is live streamed to attendees, Col 6 lines 65-67, and is now previously recorded and being watched by a user, Col 7 lines 4-13);
analyzing audio from the livestream (live video streaming program transcribes the previously recorded online meeting and indexes the transcription, Col 7 lines 14-16);
predicting a plurality of potential questions from the one or more viewers, wherein the predicting is based on the audio that was analyzed (key events are created based on the transcribed audio and incorporated into knowledge graph, and live streaming program predicts one or more anticipated questions based on the extracted question/answer pairs, Col 7 lines 14-50);
generating an answer to each potential question within the plurality of potential questions (generating answers to the predicted questions, Col 7 lines 41-50);
a video segment for each answer to each potential question within the plurality of potential questions, wherein the video segment is associated with the potential question that was answered (for the knowledge graph containing predicted questions and their associated answers, the transcription is used to identify the relevant sections of the previously recorded online meeting containing the generated solutions, i.e. generated answers to the questions, Col 9 lines 7-17, Col 10 lines 30-46);
storing the video segment associated with each potential question from the plurality of potential questions (the presentation content transcription and sections of recorded video are indexed and stored for fast forwarded or rewinding, Col 10 lines 30-46);
detecting a real-time question from the one or more viewers within the livestream (questions asked by the attendees of the live online meeting are transcribed and indexed, Col 7 lines 9-13, and the user gives an answer to a question asked by one of the attendees within the livestream, Col 8 line 65 - Col 9 line 2, Col 10 lines 30-46);
matching the real-time question with a video segment that was stored, wherein the matching is based on the potential question that was associated with the video segment (when the user answers a question asked by an attendee, live video streaming program explores the knowledge graph for the predicted one or more anticipated questions and their associated answers and correlates the question with an anticipated question and answer, Col 9 lines 8-13, Col 8 line 65 - Col 9 line 2, Col 10 lines 30-46); and
rendering the video segment that was matched to the one or more viewers (altering the presentation by rewinding or fast-forwarding to display the segment answering the question, Col 10 lines 37-53).
Tadesse does not specifically mention creating a synthesized video segment for each answer.
However, Tadesse implies, or at least suggests: creating a synthesized video segment for each answer (generating a simulated reply to answer questions asked, with synthetically rendered graphics of the presenter, Col 11 lines 27-54, Col 10 lines 50-59).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse by creating a synthesized video segment for each answer in order to allow a user to interact with previously recorded content at the user’s preferred time, as suggested by Tadesse (Col 3 lines 3-8). Doing so would have led to predictable results of improving collaboration during meetings across a remote workforce, as suggested by Tadesse (Col 1 lines 9-18).
Tadesse does not specifically mention wherein the generating is based on a large language model (LLM) neural network.
Gelfenbeyn discloses wherein the generating is based on a large language model (LLM) neural network (forming a response to a question using a language model implemented as an artificial neural network based on an LLM, [0032], [0048], [0052]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse such that the generating is based on a large language model (LLM) neural network in order to improve the flexibility of the virtual model’s responses, predictably enabling integration with more applications and environments, as suggested by Gelfenbeyn ([0003]). The cited references are analogous art in the field of natural language processing.
Consider claim 25, Tadesse discloses a computer program product embodied in a non-transitory computer readable medium for video generation (generating video frames, Col 10 lines 54-60, Col 11 lines 27-29), the computer program product comprising code which causes one or more processors (computer program product with memory storing instructions executed by a processor, Col 3 lines 47-67) to perform operations of:
accessing a livestream, wherein the livestream features a host and is viewed by one or more viewers (a meeting host records a meeting that is live streamed to attendees, Col 6 lines 65-67, and is now previously recorded and being watched by a user, Col 7 lines 4-13);
analyzing audio from the livestream (live video streaming program transcribes the previously recorded online meeting and indexes the transcription, Col 7 lines 14-16);
predicting a plurality of potential questions from the one or more viewers, wherein the predicting is based on the audio that was analyzed (key events are created based on the transcribed audio and incorporated into knowledge graph, and live streaming program predicts one or more anticipated questions based on the extracted question/answer pairs, Col 7 lines 14-50);
generating an answer to each potential question within the plurality of potential questions (generating answers to the predicted questions, Col 7 lines 41-50);
a video segment for each answer to each potential question within the plurality of potential questions, wherein the video segment is associated with the potential question that was answered (for the knowledge graph containing predicted questions and their associated answers, the transcription is used to identify the relevant sections of the previously recorded online meeting containing the generated solutions, i.e. generated answers to the questions, Col 9 lines 7-17, Col 10 lines 30-46);
storing the video segment associated with each potential question from the plurality of potential questions (the presentation content transcription and sections of recorded video are indexed and stored for fast forwarded or rewinding, Col 10 lines 30-46);
detecting a real-time question from the one or more viewers within the livestream (questions asked by the attendees of the live online meeting are transcribed and indexed, Col 7 lines 9-13, and the user gives an answer to a question asked by one of the attendees within the livestream, Col 8 line 65 - Col 9 line 2, Col 10 lines 30-46);
matching the real-time question with a video segment that was stored, wherein the matching is based on the potential question that was associated with the video segment (when the user answers a question asked by an attendee, live video streaming program explores the knowledge graph for the predicted one or more anticipated questions and their associated answers and correlates the question with an anticipated question and answer, Col 9 lines 8-13, Col 8 line 65 - Col 9 line 2, Col 10 lines 30-46); and
rendering the video segment that was matched to the one or more viewers (altering the presentation by rewinding or fast-forwarding to display the segment answering the question, Col 10 lines 37-53).
Tadesse does not specifically mention creating a synthesized video segment for each answer.
However, Tadesse implies, or at least suggests: creating a synthesized video segment for each answer (generating a simulated reply to answer questions asked, with synthetically rendered graphics of the presenter, Col 11 lines 27-54, Col 10 lines 50-59).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse by creating a synthesized video segment for each answer in order to allow a user to interact with previously recorded content at the user’s preferred time, as suggested by Tadesse (Col 3 lines 3-8). Doing so would have led to predictable results of improving collaboration during meetings across a remote workforce, as suggested by Tadesse (Col 1 lines 9-18).
Tadesse does not specifically mention wherein the generating is based on a large language model (LLM) neural network.
Gelfenbeyn discloses wherein the generating is based on a large language model (LLM) neural network (forming a response to a question using a language model implemented as an artificial neural network based on an LLM, [0032], [0048], [0052]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse such that the generating is based on a large language model (LLM) neural network for reasons similar to those for claim 1.
Consider claim 26, Tadesse discloses a computer system for video generation (generating video frames, Col 10 lines 54-60, Col 11 lines 27-29) comprising: a memory which stores instructions (memory storing instructions, Col 3 lines 46-67); and one or more processors coupled to the memory, wherein the one or more processors, when executing the instructions (processor executes instructions from memory, Col 3 lines 46-67) which are stored, are configured to:
access a livestream, wherein the livestream features a host and is viewed by one or more viewers (a meeting host records a meeting that is live streamed to attendees, Col 6 lines 65-67, and is now previously recorded and being watched by a user, Col 7 lines 4-13);
analyze audio from the livestream (live video streaming program transcribes the previously recorded online meeting and indexes the transcription, Col 7 lines 14-16);
predict a plurality of potential questions from the one or more viewers, wherein the predicting is based on the audio that was analyzed (key events are created based on the transcribed audio and incorporated into knowledge graph, and live streaming program predicts one or more anticipated questions based on the extracted question/answer pairs, Col 7 lines 14-50);
generate an answer to each potential question within the plurality of potential questions (generating answers to the predicted questions, Col 7 lines 41-50);
a video segment for each answer to each potential question within the plurality of potential questions, wherein the video segment is associated with the potential question that was answered (for the knowledge graph containing predicted questions and their associated answers, the transcription is used to identify the relevant sections of the previously recorded online meeting containing the generated solutions, i.e. generated answers to the questions, Col 9 lines 7-17, Col 10 lines 30-46);
store the video segment associated with each potential question from the plurality of potential questions (the presentation content transcription and sections of recorded video are indexed and stored for fast forwarded or rewinding, Col 10 lines 30-46);
detect a real-time question from the one or more viewers within the livestream (questions asked by the attendees of the live online meeting are transcribed and indexed, Col 7 lines 9-13, and the user gives an answer to a question asked by one of the attendees within the livestream, Col 8 line 65 - Col 9 line 2, Col 10 lines 30-46);
match the real-time question with a video segment that was stored, wherein the matching is based on the potential question that was associated with the video segment (when the user answers a question asked by an attendee, live video streaming program explores the knowledge graph for the predicted one or more anticipated questions and their associated answers and correlates the question with an anticipated question and answer, Col 9 lines 8-13, Col 8 line 65 - Col 9 line 2, Col 10 lines 30-46); and
render the video segment that was matched to the one or more viewers (altering the presentation by rewinding or fast-forwarding to display the segment answering the question, Col 10 lines 37-53).
Tadesse does not specifically mention creating a synthesized video segment for each answer.
However, Tadesse implies, or at least suggests: creating a synthesized video segment for each answer (generating a simulated reply to answer questions asked, with synthetically rendered graphics of the presenter, Col 11 lines 27-54, Col 10 lines 50-59).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse by creating a synthesized video segment for each answer in order to allow a user to interact with previously recorded content at the user’s preferred time, as suggested by Tadesse (Col 3 lines 3-8). Doing so would have led to predictable results of improving collaboration during meetings across a remote workforce, as suggested by Tadesse (Col 1 lines 9-18).
Tadesse does not specifically mention wherein the generating is based on a large language model (LLM) neural network.
Gelfenbeyn discloses wherein the generating is based on a large language model (LLM) neural network (forming a response to a question using a language model implemented as an artificial neural network based on an LLM, [0032], [0048], [0052]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse such that the generating is based on a large language model (LLM) neural network for reasons similar to those for claim 1.
Consider claim 2, Tadesse discloses the analyzing includes analyzing one or more images from the livestream (analyzing frames from the video, Col 11 lines 37-43).
Consider claim 3, Tadesse discloses the one or more images comprises a video (frames from the video, Col 11 lines 37-43).
Consider claim 4, Tadesse discloses the one or more images include images of the host (video frames of the meeting host, Col 11 lines 33-35).
Consider claim 9, Tadesse discloses the analyzing includes analyzing viewer interactions of the one or more viewers with the host of the livestream (questions asked by the attendees to the host, Col 7 lines 26-33).
Consider claim 10, Tadesse discloses the viewer interactions include questions, responses, or comments that occur during the livestream (questions asked by the attendees to the host, Col 7 lines 26-33).
Consider claim 11, Tadesse discloses each synthesized video segment that was matched comprises a performance by an assistant (the “simulated reply” from the meeting host is considered a performance by an assistant, as this creates a virtual character that assists the host, Col 11 lines 32-37).
Consider claim 12, Tadesse discloses the assistant is a representation of an individual (a simulation of the meeting host, Col 11 lines 32-37).
Consider claim 13, Tadesse does not, but Gelfenbeyn discloses the assistant is an animated character (AI character interacts with gestures, actions, and movements, i.e. animations, [0027]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse such that the assistant is an animated character for reasons similar to those for claim 1.
Consider claim 14, Tadesse discloses the assistant is the host (a simulation of the meeting host, Col 11 lines 32-37).
Consider claim 16, Tadesse discloses the analyzing audio includes natural language processing (NLP) (the audio is processed using a natural language processing model, Col 8 lines 9-10).
Consider claim 17, Tadesse discloses the analyzing audio further comprises detecting a topic being discussed by the host (the knowledge graph encodes topics discussed, Col 7 lines 54-55).
Consider claim 18, Tadesse discloses the analyzing further comprises evaluating a context of the livestream (NLP model analyzes the context of the audio content, Col 8 lines 10-13).
Consider claim 21, Tadesse discloses the context includes a topic of discussion in the livestream (the knowledge graph encodes topics discussed, Col 7 lines 54-55, and NLP model analyzes the context of the audio content, Col 8 lines 10-13).
Consider claim 22, Tadesse discloses the rendering occurs while the host is displayed in the livestream (the simulated reply is incorporated into video frames of the presentation content containing the host, Col 11 lines 27-50).
Consider claim 23, Tadesse discloses the matching includes one or more synthesized videos that were generated in response to potential questions associated with a previous livestream or livestream replay (questions asked during livestream playback, Col 10 lines 37-50).
Consider claim 24, Tadesse discloses the generating includes the livestream (the simulated reply is incorporated into video frames of the presentation content containing the host, Col 11 lines 27-50).
Tadesse does not specifically mention an input to the LLM neural network.
Gelfenbeyn discloses an input to the LLM neural network (a question input to language model implemented as artificial neural network based on an LLM, [0032], [0048], [0052]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse such that the generating includes the livestream as in Tadesse as an input to the LLM neural network of Gelfenbeyn for reasons similar to those for claim 1.
Claims 5, 6, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Tadesse et al. (US 11558440) in view of Gelfenbeyn et al. (US 20230351661), in further view of Anders et al. (US 20200128286).
Consider claim 5, Tadesse and Gelfenbeyn do not, but Anders discloses recognizing when the host views a product for sale (determining the blogger is reviewing a set of glasses, and product search program identifies the glasses, [0020]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse by recognizing when the host views a product for sale in order to address the challenges of finding the correct product that a user sees in a live stream, and a retailer that carries that product, predictably improving efficiency and utilization of computer resources, as suggested by Anders ([0011]). The cited references are analogous art in the field of live streaming.
Consider claim 6, Tadesse and Gelfenbeyn do not, but Anders discloses recognizing when the host demonstrates a product for sale (determining the blogger is reviewing a set of glasses, and product search program identifies the glasses, [0020]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse by recognizing when the host demonstrates a product for sale for reasons similar to those for claim 5.
Consider claim 20, Tadesse and Gelfenbeyn do not, but Anders discloses the context includes one or more products for sale or a brand (a set of glasses for sale, [0020]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse such that the context includes one or more products for sale or a brand for reasons similar to those for claim 5.
Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Tadesse et al. (US 11558440) in view of Gelfenbeyn et al. (US 20230351661), in further view of el Kaliouby et al. (US 20200228359).
Consider claim 7, Tadesse and Gelfenbeyn do not, but el Kaliouby discloses the one or more images include images of the one or more viewers (images of the viewers, [0045]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse such that the one or more images include images of the one or more viewers in order to facilitate effective online meetings, predictably reducing the difficulty of scheduling, as suggested by el Kaliouby ([0013], [0014]). The cited references are analogous art in the field of image analysis.
Consider claim 8, Tadesse and Gelfenbeyn do not, but el Kaliouby discloses identifying a confused look from the one or more viewers (detecting confusion from facial expressions on images of the viewers, [0045]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse by identifying a confused look from the one or more viewers for reasons similar to those for claim 7.
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Tadesse et al. (US 11558440) in view of Gelfenbeyn et al. (US 20230351661), in further view of Relan et al. (US 20220374420).
Consider claim 15, Tadesse and Gelfenbeyn do not, but Relan discloses the matching is based on a fuzzy matching algorithm (fuzzy match, [0026]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse and Gelfenbeyn such that the matching is based on a fuzzy matching algorithm in order to increase the probability of correctness, predictably improving the ability to process complex phrases, as suggested by Relan ([0024]-[0025]). The cited references are analogous art in the field of natural language processing.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Tadesse et al. (US 11558440) in view of Gelfenbeyn et al. (US 20230351661), in further view of Pell et al. (US 20180176508).
Consider claim 19, Tadesse and Gelfenbeyn do not, but Pell discloses the context includes other livestreams (prior meetings processed by context manager, [0075]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tadesse and Gelfenbeyn such that the context includes other livestreams in order to enhance communication over distance, predictably improving convenience for users in different locations, as suggested by Pell ([0001]). The cited references are analogous art in the field of natural language processing (see Pell, [0110]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20220414331 Asthana discloses automatically generated question suggestions.
US 20220327287 Agrawal discloses generating questions using a resource-efficient neural network.
US 20210342575 Nagar discloses cognitive enablement of presenters.
US 20210209139 Wu discloses natural question generation via a reinforcement learning based graph-to-sequence model.
US 20200356237 Moran discloses a graphical chatbot interface facilitating user-chatbot interaction.
US 20230027713 Wu discloses neural-symbolic action transformers for video question answering.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders can be reached on 571/272-7516.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Jesse S Pullias/
Primary Examiner, Art Unit 2655 11/01/25