Prosecution Insights
Last updated: April 19, 2026
Application No. 18/621,320

SYSTEMS AND METHODS FOR REAL-TIME CONCERT TRANSCRIPTION AND USER-CAPTURED VIDEO TAGGING

Status: Non-Final OA (§103)
Filed: Mar 29, 2024
Examiner: ALBERTALLI, BRIAN LOUIS
Art Unit: 2656
Tech Center: 2600 — Communications
Assignee: Mixhalo Corp.
OA Round: 1 (Non-Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
With Interview: 98%

Examiner Intelligence

Career Allow Rate: 82% (697 granted / 852 resolved; +19.8% vs TC avg — above average)
Interview Lift: +16.5% (resolved cases with interview vs. without)
Typical Timeline: 2y 11m average prosecution; 19 currently pending
Career History: 871 total applications across all art units
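The headline numbers above are internally consistent: 697/852 resolved cases gives the 82% career allow rate, and adding the +16.5% interview lift reproduces the 98% with-interview figure. A quick check of that arithmetic, using only the values displayed in this panel:

```python
# Career allow rate from the examiner's resolved-case counts shown above.
granted, resolved = 697, 852
allow_rate = 100 * granted / resolved
print(f"Career allow rate: {allow_rate:.1f}%")   # 81.8%, displayed as 82%

# With-interview probability = baseline allow rate + interview lift.
interview_lift = 16.5
with_interview = allow_rate + interview_lift
print(f"With interview: {with_interview:.0f}%")  # 98%
```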

Statute-Specific Performance

§101: 13.8% (-26.2% vs TC avg)
§103: 34.9% (-5.1% vs TC avg)
§102: 27.7% (-12.3% vs TC avg)
§112: 16.6% (-23.4% vs TC avg)
Tech Center average is an estimate • Based on career data from 852 resolved cases
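The four per-statute deltas each imply the same Tech Center average estimate of 40.0% (e.g., 13.8 - (-26.2) = 40.0), suggesting the panel compares every statute against a single TC-wide baseline. A quick consistency check on the displayed figures:

```python
# Examiner's per-statute rates and their displayed deltas vs the
# Tech Center average (values copied from the panel above).
rates  = {"101": 13.8, "103": 34.9, "102": 27.7, "112": 16.6}
deltas = {"101": -26.2, "103": -5.1, "102": -12.3, "112": -23.4}

# Each (rate, delta) pair implies the same TC-average baseline.
implied = {s: round(rates[s] - deltas[s], 1) for s in rates}
print(implied)  # every statute implies a 40.0% baseline
```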

Office Action (§103)
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-8, 10-13, 15-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Candelore et al. (U.S. Patent Application Pub. No. 2020/0335120, hereinafter "Candelore"), in view of Davis et al. (U.S. Patent No. 12,273,568, hereinafter "Davis").

In regard to claim 1, Candelore discloses a computerized method for generating and displaying contextual data using a mobile computing device at a live event (Fig. 6, 600), the method comprising: receiving, by a mobile computing device at a live event, a data representation of a live audio signal corresponding to the live event via a wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]; the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]); processing, by the mobile computing device at the live event, the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generating, by the mobile computing device at the live event, first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generating, by the mobile computing device at the live event, second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); and generating, by the mobile computing device at the live event, for display on the mobile computing device at the live event the first contextual data and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals. Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
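The claim 1 method mapped above is, at its core, a two-branch audio pipeline on the device: one model derives sound data (audio parameters such as loudness) and a second derives speech data (an ASR transcript), and both results are generated for display. A minimal sketch under that reading; every class and function name here is hypothetical, standing in for the claimed models rather than any real API:

```python
from dataclasses import dataclass

@dataclass
class ContextualData:
    kind: str    # "sound" or "speech", per claims 3/8/13/18
    value: str

def audio_params_model(stream: bytes) -> ContextualData:
    # Stand-in for the claimed first machine learning model:
    # derives sound data (e.g., a loudness parameter) from the stream.
    return ContextualData("sound", "loudness=high")

def asr_model(stream: bytes) -> ContextualData:
    # Stand-in for the claimed second machine learning model:
    # Automatic Speech Recognition producing verbatim caption text.
    return ContextualData("speech", "<transcribed lyrics>")

def handle_live_audio(packet: bytes) -> tuple[ContextualData, ContextualData]:
    # Claim 1 steps: receive the data representation, process it into
    # a live audio stream, then generate first and second contextual
    # data from that stream for display on the device.
    stream = packet  # decoding/buffering elided in this sketch
    return audio_params_model(stream), asr_model(stream)
```

Claims 11 and 16 extend the same pipeline by tagging both outputs into a user-captured video rather than only displaying them.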
In regard to claim 2, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from an audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 3, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 5, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 6, Candelore discloses a system for generating and displaying contextual data using a mobile computing device at a live event (Fig. 6, 600), the system comprising: a mobile computing device (electronic device, Fig. 1, 106, paragraph [0018]) communicatively coupled to an audio server computing device (server 104, paragraph [0017]) over a network (communication network 112), the mobile computing device configured (the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]) to: receive a data representation of a live audio signal corresponding to a live event via the wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]); process the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generate first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generate second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); and generate for display on the mobile computing device at the live event the first contextual data and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals. Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
In regard to claim 7, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 8, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 10, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 11, Candelore discloses a computerized method for generating and tagging contextual data in a user-captured video using a mobile computing device (Fig. 6, 600), the method comprising: receiving, by a mobile computing device at a live event, a data representation of a live audio signal corresponding to the live event via a wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]; the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]); processing, by the mobile computing device at the live event, the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generating, by the mobile computing device at the live event, first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generating, by the mobile computing device at the live event, second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); initiating, by the mobile computing device, a video capture corresponding to the live event (display system 102 includes image capture devices 116 and captures video of the live event, paragraph [0107]); and producing, by the mobile computing device, a shareable video corresponding to the live event based on the captured video, the live audio stream, the first contextual data, and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]; and further broadcasted to others, paragraph [0089]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals. Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 12, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 13, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 15, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 16, Candelore discloses a system for generating and tagging contextual data in a user-captured video using a mobile computing device (Fig. 6, 600), the system comprising: a mobile computing device (electronic device, Fig. 1, 106, paragraph [0018]) communicatively coupled to an audio server computing device (server 104, paragraph [0017]) over a network (communication network 112), the mobile computing device configured (the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]) to: receive a data representation of a live audio signal corresponding to a live event via the wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]); process the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generate first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generate second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); initiate a video capture corresponding to the live event (display system 102 includes image capture devices 116 and captures video of the live event, paragraph [0107]); and produce a shareable video corresponding to the live event based on the captured video, the live audio stream, the first contextual data, and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]; and further broadcasted to others, paragraph [0089]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals.
Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 17, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 18, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 20, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
Claims 4, 9, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Candelore, in view of Davis, and further in view of Bryan (U.S. Patent Application Pub. No. 2021/0125629).

In regard to claims 4, 9, 14, and 19, while Candelore discloses determining a plurality of audio parameters as first contextual data, including a loudness parameter (paragraph [0106]), Candelore and Davis do not expressly disclose the first machine learning model comprises a Signal-to-Noise Ratio (SNR) machine learning model. Bryan discloses a method for determining audio parameters that includes a Signal-to-Noise Ratio (SNR) machine learning model (paragraphs [0040] and [0042]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a Signal-to-Noise Ratio (SNR) machine learning model as the first machine learning model, because a signal-to-noise ratio parameter would complement a loudness parameter by indicating whether the loudness was caused by a voice signal or background noise.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Duffy et al., Levacher et al., Nicol et al., Chang et al., Malik et al., Lord, Goldstein et al., and Koishida et al. disclose additional systems for tagging and captioning live audio.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571) 272-7616. The examiner can normally be reached M-F 8AM-3PM, 4PM-5PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

BLA 1/21/26
/BRIAN L ALBERTALLI/
Primary Examiner, Art Unit 2656
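The Bryan combination above turns on the idea that an SNR estimate tells you whether a loud frame is voice or background noise. The classical quantity an SNR machine learning model would estimate is just a power ratio in decibels; a sketch of that arithmetic (the separate signal and noise frames here are assumed for illustration, since a deployed estimator must infer SNR from the mixture alone):

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from the mean power of each frame."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

# A frame dominated by voice vs. low-level background (crowd) noise:
# a high SNR says the loudness comes from the voice, not the crowd.
voice = [0.5, -0.5, 0.5, -0.5]
crowd = [0.05, -0.04, 0.06, -0.05]
print(round(snr_db(voice, crowd), 1))  # 19.9 dB
```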

Prosecution Timeline

Mar 29, 2024: Application Filed
Jan 22, 2026: Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

- Patent 12592247: INFERRING EMOTION FROM SPEECH IN AUDIO DATA USING DEEP LEARNING (2y 5m to grant; granted Mar 31, 2026)
- Patent 12573407: QUICK AUDIO PROFILE USING VOICE ASSISTANT (2y 5m to grant; granted Mar 10, 2026)
- Patent 12574386: DISTRIBUTED IDENTIFICATION IN NETWORKED SYSTEM (2y 5m to grant; granted Mar 10, 2026)
- Patent 12572327: CONDITIONALLY ASSIGNING VARIOUS AUTOMATED ASSISTANT FUNCTION(S) TO INTERACTION WITH A PERIPHERAL ASSISTANT CONTROL DEVICE (2y 5m to grant; granted Mar 10, 2026)
- Patent 12573382: ADVERSARIAL LANGUAGE IMITATION WITH CONSTRAINED EXEMPLARS (2y 5m to grant; granted Mar 10, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 82%
With Interview: 98% (+16.5%)
Median Time to Grant: 2y 11m
PTA Risk: Low
Based on 852 resolved cases by this examiner. Grant probability derived from career allow rate.
