Prosecution Insights
Last updated: April 19, 2026
Application No. 18/621,320

SYSTEMS AND METHODS FOR REAL-TIME CONCERT TRANSCRIPTION AND USER-CAPTURED VIDEO TAGGING

Status: Non-Final OA (§103)
Filed: Mar 29, 2024
Examiner: ALBERTALLI, BRIAN LOUIS
Art Unit: 2656
Tech Center: 2600 — Communications
Assignee: Mixhalo Corp.
OA Round: 1 (Non-Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
With Interview: 98%

Examiner Intelligence

Career Allow Rate: 82% (697 granted / 852 resolved; +19.8% vs TC avg — above average)
Interview Lift: +16.5% (resolved cases with interview vs. without)
Typical Timeline: 2y 11m average prosecution; 19 currently pending
Career History: 871 total applications across all art units
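The headline numbers above are internally consistent: 697/852 resolved cases gives the 82% career allow rate, and adding the +16.5% interview lift reproduces the 98% with-interview figure. A quick check of that arithmetic, using only the values displayed in this panel:

```python
# Career allow rate from the examiner's resolved-case counts shown above.
granted, resolved = 697, 852
allow_rate = 100 * granted / resolved
print(f"Career allow rate: {allow_rate:.1f}%")   # 81.8%, displayed as 82%

# With-interview probability = baseline allow rate + interview lift.
interview_lift = 16.5
with_interview = allow_rate + interview_lift
print(f"With interview: {with_interview:.0f}%")  # 98%
```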

Statute-Specific Performance

§101: 13.8% (-26.2% vs TC avg)
§103: 34.9% (-5.1% vs TC avg)
§102: 27.7% (-12.3% vs TC avg)
§112: 16.6% (-23.4% vs TC avg)
Tech Center average is an estimate • Based on career data from 852 resolved cases
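The four per-statute deltas each imply the same Tech Center average estimate of 40.0% (e.g., 13.8 - (-26.2) = 40.0), suggesting the panel compares every statute against a single TC-wide baseline. A quick consistency check on the displayed figures:

```python
# Examiner's per-statute rates and their displayed deltas vs the
# Tech Center average (values copied from the panel above).
rates  = {"101": 13.8, "103": 34.9, "102": 27.7, "112": 16.6}
deltas = {"101": -26.2, "103": -5.1, "102": -12.3, "112": -23.4}

# Each (rate, delta) pair implies the same TC-average baseline.
implied = {s: round(rates[s] - deltas[s], 1) for s in rates}
print(implied)  # every statute implies a 40.0% baseline
```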

Office Action (§103)
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-8, 10-13, 15-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Candelore et al. (U.S. Patent Application Pub. No. 2020/0335120, hereinafter "Candelore"), in view of Davis et al. (U.S. Patent No. 12,273,568, hereinafter "Davis").

In regard to claim 1, Candelore discloses a computerized method for generating and displaying contextual data using a mobile computing device at a live event (Fig. 6, 600), the method comprising: receiving, by a mobile computing device at a live event, a data representation of a live audio signal corresponding to the live event via a wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]; the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]); processing, by the mobile computing device at the live event, the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generating, by the mobile computing device at the live event, first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generating, by the mobile computing device at the live event, second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); and generating, by the mobile computing device at the live event, for display on the mobile computing device at the live event the first contextual data and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals. Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
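The claim 1 method mapped above is, at its core, a two-branch audio pipeline on the device: one model derives sound data (audio parameters such as loudness) and a second derives speech data (an ASR transcript), and both results are generated for display. A minimal sketch under that reading; every class and function name here is hypothetical, standing in for the claimed models rather than any real API:

```python
from dataclasses import dataclass

@dataclass
class ContextualData:
    kind: str    # "sound" or "speech", per claims 3/8/13/18
    value: str

def audio_params_model(stream: bytes) -> ContextualData:
    # Stand-in for the claimed first machine learning model:
    # derives sound data (e.g., a loudness parameter) from the stream.
    return ContextualData("sound", "loudness=high")

def asr_model(stream: bytes) -> ContextualData:
    # Stand-in for the claimed second machine learning model:
    # Automatic Speech Recognition producing verbatim caption text.
    return ContextualData("speech", "<transcribed lyrics>")

def handle_live_audio(packet: bytes) -> tuple[ContextualData, ContextualData]:
    # Claim 1 steps: receive the data representation, process it into
    # a live audio stream, then generate first and second contextual
    # data from that stream for display on the device.
    stream = packet  # decoding/buffering elided in this sketch
    return audio_params_model(stream), asr_model(stream)
```

Claims 11 and 16 extend the same pipeline by tagging both outputs into a user-captured video rather than only displaying them.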
In regard to claim 2, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from an audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 3, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 5, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 6, Candelore discloses a system for generating and displaying contextual data using a mobile computing device at a live event (Fig. 6, 600), the system comprising: a mobile computing device (electronic device, Fig. 1, 106, paragraph [0018]) communicatively coupled to an audio server computing device (server 104, paragraph [0017]) over a network (communication network 112), the mobile computing device configured (the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]) to: receive a data representation of a live audio signal corresponding to a live event via the wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]); process the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generate first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generate second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); and generate for display on the mobile computing device at the live event the first contextual data and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals. Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
In regard to claim 7, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 8, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 10, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 11, Candelore discloses a computerized method for generating and tagging contextual data in a user-captured video using a mobile computing device (Fig. 6, 600), the method comprising: receiving, by a mobile computing device at a live event, a data representation of a live audio signal corresponding to the live event via a wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]; the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]); processing, by the mobile computing device at the live event, the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generating, by the mobile computing device at the live event, first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generating, by the mobile computing device at the live event, second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); initiating, by the mobile computing device, a video capture corresponding to the live event (display system 102 includes image capture devices 116 and captures video of the live event, paragraph [0107]); and producing, by the mobile computing device, a shareable video corresponding to the live event based on the captured video, the live audio stream, the first contextual data, and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]; and further broadcasted to others, paragraph [0089]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals. Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 12, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 13, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 15, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 16, Candelore discloses a system for generating and tagging contextual data in a user-captured video using a mobile computing device (Fig. 6, 600), the system comprising: a mobile computing device (electronic device, Fig. 1, 106, paragraph [0018]) communicatively coupled to an audio server computing device (server 104, paragraph [0017]) over a network (communication network 112), the mobile computing device configured (the display system 102 is part of a mobile electronic device 106, paragraph [0018]; and includes processing circuitry, Fig. 2, 202, to perform all disclosed functions, paragraph [0103]) to: receive a data representation of a live audio signal corresponding to a live event via the wireless network (a display system, Fig. 1, 102, receives audio transmitted via a wireless network from a plurality of audio capture devices, paragraphs [0019] and [0095-0096]); process the data representation of the live audio signal into a live audio stream (audio segments are extracted from received live audio segments, paragraph [0099]); generate first contextual data based on the live audio stream (caption information is deduced from the audio segment, paragraph [0100]; the information comprising first contextual data derived from audio parameters, paragraph [0041]); generate second contextual data based on the live audio stream (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]); initiate a video capture corresponding to the live event (display system 102 includes image capture devices 116 and captures video of the live event, paragraph [0107]); and produce a shareable video corresponding to the live event based on the captured video, the live audio stream, the first contextual data, and the second contextual data (the caption information is displayed on display device 108, paragraph [0101]; and further broadcasted to others, paragraph [0089]).

Candelore is silent as to the details of how the first contextual data and second contextual data are determined from the audio signals.
Davis discloses a system to enhance live events with contextual data (see Fig. 2 and column 3, lines 8-16) wherein the contextual data is determined using various machine learning models (column 3, lines 27-53). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the system disclosed by Candelore to generate the first/second contextual data using first/second machine learning models, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).

In regard to claim 17, Candelore discloses the mobile computing device is configured to receive the data representation of the live audio signal corresponding to the live event from the audio server computing device via the wireless network (the server provides the audio data to speaker 214B of mobile device 106, paragraph [0072]).

In regard to claim 18, Candelore discloses the first contextual data corresponds to sound data (audio parameters, paragraph [0041]) and the second contextual data corresponds to speech data (caption information comprising verbatim text of the audio segments is determined, paragraph [0100]).

In regard to claim 20, Candelore discloses using Automatic Speech Recognition (ASR) to generate the second contextual data (speech-to-text converter, paragraph [0064]). Candelore does not expressly disclose the ASR comprises a machine learning model. Davis discloses an Automatic Speech Recognition (ASR) machine learning model (real-time speech to text recognition using machine learning models, column 3, lines 27-53 and column 4, lines 43-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a machine learning model to perform ASR, because it would leverage advancements in artificial intelligence and machine learning technology, as taught by Davis (column 2, lines 38-67).
Claims 4, 9, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Candelore, in view of Davis, and further in view of Bryan (U.S. Patent Application Pub. No. 2021/0125629).

In regard to claims 4, 9, 14, and 19, while Candelore discloses determining a plurality of audio parameters as first contextual data, including a loudness parameter (paragraph [0106]), Candelore and Davis do not expressly disclose the first machine learning model comprises a Signal-to-Noise Ratio (SNR) machine learning model. Bryan discloses a method for determining audio parameters that includes a Signal-to-Noise Ratio (SNR) machine learning model (paragraphs [0040] and [0042]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a Signal-to-Noise Ratio (SNR) machine learning model as the first machine learning model, because a signal-to-noise ratio parameter would complement a loudness parameter by indicating whether the loudness was caused by a voice signal or background noise.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Duffy et al., Levacher et al., Nicol et al., Chang et al., Malik et al., Lord, Goldstein et al., and Koishida et al. disclose additional systems for tagging and captioning live audio.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571) 272-7616. The examiner can normally be reached M-F 8AM-3PM, 4PM-5PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

BLA 1/21/26
/BRIAN L ALBERTALLI/
Primary Examiner, Art Unit 2656
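The Bryan combination above turns on the idea that an SNR estimate tells you whether a loud frame is voice or background noise. The classical quantity an SNR machine learning model would estimate is just a power ratio in decibels; a sketch of that arithmetic (the separate signal and noise frames here are assumed for illustration, since a deployed estimator must infer SNR from the mixture alone):

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from the mean power of each frame."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

# A frame dominated by voice vs. low-level background (crowd) noise:
# a high SNR says the loudness comes from the voice, not the crowd.
voice = [0.5, -0.5, 0.5, -0.5]
crowd = [0.05, -0.04, 0.06, -0.05]
print(round(snr_db(voice, crowd), 1))  # 19.9 dB
```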

Prosecution Timeline

Mar 29, 2024: Application Filed
Jan 22, 2026: Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

- Patent 12592247: INFERRING EMOTION FROM SPEECH IN AUDIO DATA USING DEEP LEARNING (2y 5m to grant; granted Mar 31, 2026)
- Patent 12573407: QUICK AUDIO PROFILE USING VOICE ASSISTANT (2y 5m to grant; granted Mar 10, 2026)
- Patent 12574386: DISTRIBUTED IDENTIFICATION IN NETWORKED SYSTEM (2y 5m to grant; granted Mar 10, 2026)
- Patent 12572327: CONDITIONALLY ASSIGNING VARIOUS AUTOMATED ASSISTANT FUNCTION(S) TO INTERACTION WITH A PERIPHERAL ASSISTANT CONTROL DEVICE (2y 5m to grant; granted Mar 10, 2026)
- Patent 12573382: ADVERSARIAL LANGUAGE IMITATION WITH CONSTRAINED EXEMPLARS (2y 5m to grant; granted Mar 10, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 82%
With Interview: 98% (+16.5%)
Median Time to Grant: 2y 11m
PTA Risk: Low
Based on 852 resolved cases by this examiner. Grant probability derived from career allow rate.
