Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5-8, 10-13, 15-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Candelore et al. (U.S. Patent Application Pub. No. 2020/0335120, hereinafter “Candelore”), in view of Davis et al. (U.S. Patent No. 12,273,568, hereinafter “Davis”).
In regard to claim 1, Candelore discloses a computerized method for generating and displaying contextual data using a mobile computing device at a live event (Fig. 6, 600), the method comprising:
receiving, by a mobile computing device at a live event, a data representation of a live audio signal corresponding to the live event via a wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]; the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]);
processing, by the mobile computing device at the live event, the data representation of the live audio signal into a live audio stream (audio segments are extracted from the received live audio, paragraph [0099]);
generating, by the mobile computing device at the live event, first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]);
generating, by the mobile computing device at the live event, second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); and
generating, by the mobile computing device at the live event, for display on the mobile computing device at the live event the first contextual data and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]).
Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals.
Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
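For illustration only, the pipeline recited in claim 1, as Candelore modified by Davis would implement it, can be sketched in Python. Every class, function, and parameter name below is hypothetical and drawn from neither reference; the two model stubs merely mark where Davis’s first and second machine learning models would plug in.

```python
# Illustrative sketch only; names are hypothetical, not taken from Candelore
# or Davis. It mirrors the claimed steps: receive a data representation of
# live audio, process it into a stream, and run two distinct ML models.
import numpy as np

class SoundContextModel:
    """Hypothetical first ML model: derives sound-level contextual data
    (e.g., audio parameters such as loudness) from raw samples."""
    def predict(self, samples: np.ndarray) -> dict:
        rms = float(np.sqrt(np.mean(samples ** 2)) + 1e-12)
        return {"loudness_db": 20 * np.log10(rms)}

class SpeechContextModel:
    """Hypothetical second ML model: derives speech-level contextual data
    (e.g., verbatim caption text) from the same samples."""
    def predict(self, samples: np.ndarray) -> dict:
        return {"caption": "<ASR transcript of segment>"}  # placeholder

def decode_to_stream(packet: bytes) -> np.ndarray:
    """Process the received data representation into a live audio stream
    (here: 16-bit PCM bytes -> normalized float samples)."""
    return np.frombuffer(packet, dtype=np.int16).astype(np.float32) / 32768.0

def display(first: dict, second: dict) -> None:
    """Stand-in for rendering both kinds of contextual data on the device."""
    print(first, second)

def handle_live_packet(packet: bytes, sound_model, speech_model) -> None:
    samples = decode_to_stream(packet)
    first = sound_model.predict(samples)    # first contextual data
    second = speech_model.predict(samples)  # second contextual data
    display(first, second)
```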
In regard to claim 2, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from an audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).
In regard to claim 3, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).
In regard to claim 5, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]).
Candelore does not expressly disclose the ASR comprises a machine learning model.
Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
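For illustration only, a minimal sketch of ASR performed by a machine learning model, as the combination proposes. The choice of the Hugging Face transformers pipeline and the openai/whisper-tiny checkpoint is this sketch’s assumption, not a teaching of either reference, and it requires the transformers package and a model download.

```python
# Illustrative only: one way an off-the-shelf ML speech-to-text model could
# serve as the claimed ASR machine learning model.
import numpy as np
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

def second_contextual_data(samples: np.ndarray, rate: int = 16000) -> dict:
    """Generate second contextual data (verbatim caption text) from a live
    audio segment using a machine-learning ASR model."""
    result = asr({"raw": samples, "sampling_rate": rate})
    return {"caption": result["text"]}
```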
In regard to claim 6, Candelore discloses a system for generating and displaying contextual data using a mobile computing device at a live event (Fig. 6, 600), the system comprising:
a mobile computing device (electronic device, Fig. 1, 106, paragraph [0018]) communicatively coupled to an audio server computing device (server 104, paragraph [0017]) over a network (communication network 112), the mobile computing device configured (the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]) to:
receive a data representation of a live audio signal corresponding to a live event via the wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]);
process the data representation of the live audio signal into a live audio stream (audio segments are extracted from the received live audio, paragraph [0099]);
generate first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]);
generate second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); and
generate for display on the mobile computing device at the live event the first contextual data and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]).
Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals.
Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
In regard to claim 7, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).
In regard to claim 8, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).
In regard to claim 10, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]).
Candelore does not expressly disclose the ASR comprises a machine learning model.
Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
In regard to claim 11, Candelore discloses a computerized method for generating and tagging contextual data in a user-captured video using a mobile computing device (Fig. 6, 600), the method comprising:
receiving, by a mobile computing device at a live event, a data representation of a live audio signal corresponding to the live event via a wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]; the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]);
processing, by the mobile computing device at the live event, the data representation of the live audio signal into a live audio stream (audio segments are extracted from the received live audio, paragraph [0099]);
generating, by the mobile computing device at the live event, first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]);
generating, by the mobile computing device at the live event, second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]);
initiating, by the mobile computing device, a video capture corresponding to the live event (display system 102 includes image capture devices 116 and captures video of the live event, paragraph [0107]); and
producing, by the mobile computing device, a shareable video corresponding to the live event based on the captured video, the live audio stream, the first contextual data, and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]; and further broadcast to others, paragraph [0089]).
Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals.
Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
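For illustration only, a sketch of the claim 11 step of producing a shareable video: pairing the user-captured clip with time-aligned first and second contextual data. The JSON-sidecar format is this sketch’s assumption; neither reference prescribes a container or tagging format, and a production system might instead mux the tags into the container as timed metadata or burn captions into the frames.

```python
# Illustrative sketch only; the sidecar format and all names are assumptions.
import json
from dataclasses import dataclass, asdict

@dataclass
class ContextTag:
    t_start: float   # seconds into the captured clip
    t_end: float
    first: dict      # sound-derived contextual data (e.g., loudness)
    second: dict     # speech-derived contextual data (e.g., caption text)

def produce_shareable_video(clip_path: str, tags: list[ContextTag]) -> str:
    """Emit a sidecar file so the captured clip and its contextual data can
    be shared together as one tagged unit."""
    sidecar = clip_path + ".context.json"
    with open(sidecar, "w") as f:
        json.dump([asdict(t) for t in tags], f, indent=2)
    return sidecar
```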
In regard to claim 12, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).
In regard to claim 13, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).
In regard to claim 15, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]).
Candelore does not expressly disclose the ASR comprises a machine learning model.
Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
In regard to claim 16, Candelore discloses a system for generating and tagging contextual data in a user-captured video using a mobile computing device (Fig. 6, 600), the system comprising:
a mobile computing device (electronic device, Fig. 1, 106, paragraph [0018]) communicatively coupled to an audio server computing device (server 104, paragraph [0017]) over a network (communication network 112), the mobile computing device configured (the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]) to:
receive a data representation of a live audio signal corresponding to a live event via the wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]);
process the data representation of the live audio signal into a live audio stream (audio segments are extracted from the received live audio, paragraph [0099]);
generate first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]);
generate second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); and
initiate a video capture corresponding to the live event (display system 102 includes image capture devices 116 and captures video of the live event, paragraph [0107]); and
produce a shareable video corresponding to the live event based on the captured video, the live audio stream, the first contextual data, and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]; and further broadcast to others, paragraph [0089]).
Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals.
Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
In regard to claim 17, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).
In regard to claim 18, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).
In regard to claim 20, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]).
Candelore does not expressly disclose the ASR comprises a machine learning model.
Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
Claims 4, 9, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Candelore, in view of Davis, and further in view of Bryan (U.S. Patent Application Pub. No. 2021/0125629).
In regard to claims 4, 9, 14, and 19, while Candelore discloses determining a plurality of audio parameters as first contextual data, including a loudness parameter (paragraph [0106]), Candelore and Davis do not expressly disclose the first machine learning model comprises a Signal-to-Noise Ratio (SNR) machine learning model.
Bryan discloses a method for determining audio parameters that includes a Signal-to-Noise Ratio (SNR) machine learning model (paragraphs [0040] and [0042]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a Signal-to-Noise Ratio (SNR) machine learning model as the first machine learning model, because a signal-to-noise ratio parameter would complement a loudness parameter by indicating whether the loudness was caused by a voice signal or background noise.
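For illustration only, a sketch of the rationale above: an ML-estimated signal-to-noise ratio disambiguates what a loudness parameter alone cannot, namely whether a loud segment is loud because of voice or because of background noise. The snr_model argument stands in for a trained SNR machine learning model of the kind Bryan describes; its interface and the thresholds below are this sketch’s assumptions, not Bryan’s method.

```python
# Illustrative only; snr_model is a placeholder for a trained ML SNR
# estimator, and the dB thresholds are arbitrary assumptions.
import numpy as np

def loudness_db(samples: np.ndarray) -> float:
    """Loudness parameter of the kind Candelore discloses (RMS level in dB)."""
    rms = float(np.sqrt(np.mean(samples ** 2)) + 1e-12)
    return 20 * np.log10(rms)

def classify_loud_segment(samples: np.ndarray, snr_model) -> str:
    """A loud segment with high estimated SNR is likely dominated by voice;
    a loud segment with low SNR is likely crowd or background noise."""
    level = loudness_db(samples)
    snr_db = snr_model.predict(samples)  # hypothetical ML SNR estimate
    if level > -20 and snr_db > 10:
        return "loud voice"
    if level > -20:
        return "loud background noise"
    return "quiet"
```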
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Duffy et al., Levacher et al., Nicol et al., Chang et al., Malik et al., Lord, Goldstein et al., and Koishida et al. disclose additional systems for tagging and captioning live audio.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571)272-7616. The examiner can normally be reached M-F 8AM-3PM, 4PM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
BLA 1/21/26
/BRIAN L ALBERTALLI/ Primary Examiner, Art Unit 2656