Prosecution Insights
Last updated: May 29, 2026
Application No. 18/621,320

SYSTEMS AND METHODS FOR REAL-TIME CONCERT TRANSCRIPTION AND USER-CAPTURED VIDEO TAGGING

Non-Final OA §103
Filed: Mar 29, 2024
Priority: Mar 31, 2023 (provisional 63/456,038)
Examiner: ALBERTALLI, BRIAN LOUIS
Art Unit: 2656
Tech Center: 2600 (Communications)
Assignee: Mixhalo Corp.
OA Round: 1 (Non-Final)
Grant Probability: 82% (Favorable)
OA Rounds: 1-2
Est. Remaining: 7m
With Interview: 98%

Examiner Intelligence

Grants 82%, above average
Career Allowance Rate: 82% (701 granted / 857 resolved; +19.8% vs TC avg)
Interview Lift: +16.6% (strong; compares resolved cases with and without an interview)
Typical timeline
Avg Prosecution: 2y 9m
Currently pending: 18
Career history
Total Applications: 874 (across all art units)

Statute-Specific Performance

§101: 9.6% (-30.4% vs TC avg)
§103: 65.0% (+25.0% vs TC avg)
§102: 13.9% (-26.1% vs TC avg)
§112: 7.0% (-33.0% vs TC avg)
Tech Center averages are estimates. Based on career data from 857 resolved cases.
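Each statute-specific figure above is paired with a delta against the Tech Center average, so the baseline it was measured against can be backed out by subtraction. The following is a minimal sketch, assuming the deltas are percentage-point differences (the page does not say so); the helper name `implied_tc_average` is hypothetical.

```python
# Rates and deltas copied from the statute-specific figures above.
# Assumption: each delta is a percentage-point difference (rate - TC average).
STATUTE_FIGURES = {
    "§101": (9.6, -30.4),
    "§103": (65.0, 25.0),
    "§102": (13.9, -26.1),
    "§112": (7.0, -33.0),
}


def implied_tc_average(rate: float, delta: float) -> float:
    """Back out the baseline that a delta was measured against."""
    return round(rate - delta, 1)


implied = {s: implied_tc_average(r, d) for s, (r, d) in STATUTE_FIGURES.items()}
for statute, baseline in implied.items():
    print(f"{statute}: implied TC average {baseline}%")
```

Under that assumption, all four statutes back out to the same 40.0% baseline, which fits the page describing the Tech Center average as an estimate rather than a per-statute measurement.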

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-3, 5-8, 10-13, 15-18 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Candelore et al. (U.S. Patent Application Pub. No. 2020/0335120, hereinafter “Candelore”), in view of Davis et al. (U.S. Patent No. 12,273,568, hereinafter “Davis”).

In regard to claim 1, Candelore discloses a computerized method for generating and displaying contextual data using a mobile computing device at a live event (Fig. 6, 600), the method comprising: receiving, by a mobile computing device at a live event, a data representation of a live audio signal corresponding to the live event via a wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]; the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]); processing, by the mobile computing device at the live event, the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generating, by the mobile computing device at the live event, first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generating, by the mobile computing device at the live event, second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); and generating, by the mobile computing device at the live event, for display on the mobile computing device at the live event the first contextual data and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals. Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 2, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from an audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 3, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 5, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 6, Candelore discloses a system for generating and displaying contextual data using a mobile computing device at a live event (Fig. 6, 600), the system comprising: a mobile computing device (electronic device, Fig. 1, 106, paragraph [0018]) communicatively coupled to an audio server computing device (server 104, paragraph [0017]) over a network (communication network 112), the mobile computing device configured (the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]) to: receive a data representation of a live audio signal corresponding to a live event via the wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]); process the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generate first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generate second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); and generate for display on the mobile computing device at the live event the first contextual data and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals. Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 7, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 8, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 10, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 11, Candelore discloses a computerized method for generating and tagging contextual data in a user-captured video using a mobile computing device (Fig. 6, 600), the method comprising: receiving, by a mobile computing device at a live event, a data representation of a live audio signal corresponding to the live event via a wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]; the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]); processing, by the mobile computing device at the live event, the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generating, by the mobile computing device at the live event, first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generating, by the mobile computing device at the live event, second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); initiating, by the mobile computing device, a video capture corresponding to the live event (display system 102 includes image capture devices 116 and captures video of the live event, paragraph [0107]); and producing, by the mobile computing device, a shareable video corresponding to the live event based on the captured video, the live audio stream, the first contextual data, and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]; and further broadcasted to others, paragraph [0089]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals. Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 12, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 13, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 15, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 16, Candelore discloses a system for generating and tagging contextual data in a user-captured video using a mobile computing device (Fig. 6, 600), the system comprising: a mobile computing device (electronic device, Fig. 1, 106, paragraph [0018]) communicatively coupled to an audio server computing device (server 104, paragraph [0017]) over a network (communication network 112), the mobile computing device configured (the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]) to: receive a data representation of a live audio signal corresponding to a live event via the wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]); process the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generate first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generate second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); and initiate a video capture corresponding to the live event (display system 102 includes image capture devices 116 and captures video of the live event, paragraph [0107]); and produce a shareable video corresponding to the live event based on the captured video, the live audio stream, the first contextual data, and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]; and further broadcasted to others, paragraph [0089]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals. Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 17, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 18, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 20, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

Claim(s) 4, 9, 14, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Candelore, in view of Davis, and further in view of Bryan (U.S. Patent Application Pub. No. 2021/0125629).

In regard to claims 4, 9, 14, and 19, while Candelore discloses determining a plurality of audio parameters as first contextual data, including a loudness parameter (paragraph [0106]), Candelore and Davis do not expressly disclose the first machine learning model comprises a Signal-to-Noise Ratio (SNR) machine learning model. Bryan discloses a method for determining audio parameters that includes a Signal-to-Noise Ratio (SNR) machine learning model (paragraphs [0040] and [0042]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a Signal-to-Noise Ratio (SNR) machine learning model as the first machine learning model, because a signal-to-noise ratio parameter would complement a loudness parameter by indicating whether the loudness was caused by a voice signal or background noise.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Duffy et al., Levacher et al., Nicol et al., Chang et al., Malik et al., Lord, Goldstein et al., and Koishida et al. disclose additional systems for tagging and captioning live audio.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571)272-7616. The examiner can normally be reached M-F 8AM-3PM, 4PM-5PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

BLA 1/21/26
/BRIAN L ALBERTALLI/
Primary Examiner, Art Unit 2656

Prosecution Timeline

Mar 29, 2024: Application Filed
Jan 27, 2026: Non-Final Rejection mailed, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12629093
Systems and Methods for Detecting Impairment Based Upon Voice Data
2y 12m to grant Granted May 19, 2026
Patent 12633304
DYSARTHRIA DETECTION METHOD, DYSARTHRIA DETECTION DEVICE, AND RECORDING MEDIUM
2y 2m to grant Granted May 19, 2026
Patent 12632652
MODIFYING DATA USING LARGE LANGUAGE MODELS
2y 1m to grant Granted May 19, 2026
Patent 12620395
GENERATING A GROUP AUTOMATED ASSISTANT SESSION TO PROVIDE CONTENT TO A PLURALITY OF USERS VIA HEADPHONES
3y 5m to grant Granted May 05, 2026
Patent 12620406
System and Method for Speech Enhancement in Multichannel Audio Processing Systems
2y 7m to grant Granted May 05, 2026
Based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 82%
With Interview: 98% (+16.6%)
Median Time to Grant: 2y 9m (~7m remaining)
PTA Risk: Low
Based on 857 resolved cases by this examiner. Grant probability derived from career allowance rate.
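The projection figures can be reproduced from the counts shown on this page. The following is a minimal sketch, assuming the interview lift is applied as percentage points on top of the career allowance rate; the source does not state how the lift is applied, and the function names are hypothetical.

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage of resolved cases."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


def with_interview(base_rate: float, lift_points: float) -> float:
    """Assumed model: add the lift in percentage points, capped at 100."""
    return min(100.0, base_rate + lift_points)


base = allowance_rate(701, 857)       # granted / resolved counts from the page
boosted = with_interview(base, 16.6)  # interview lift from the page
print(f"Grant probability: {base:.1f}%")
print(f"With interview:    {boosted:.1f}%")
```

Under these assumptions the values come out near 81.8% and 98.4%, which round to the 82% and 98% displayed.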
