Prosecution Insights
Last updated: April 19, 2026
Application No. 18/633,092

MODIFYING AUDIO DATA IN A VIRTUAL MEETING TO INCREASE UNDERSTANDABILITY

Status: Final Rejection (§103)
Filed: Apr 11, 2024
Examiner: ALBERTALLI, BRIAN LOUIS
Art Unit: 2656
Tech Center: 2600 (Communications)
Assignee: Google LLC
OA Round: 2 (Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 11m
With Interview: 98%

Examiner Intelligence

Career Allow Rate: 82%, above average (697 granted / 852 resolved; +19.8% vs TC avg)
Interview Lift: strong, +16.5% for resolved cases with interview
Avg Prosecution: 2y 11m (19 currently pending)
Total Applications: 871 across all art units
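The headline allow rate is just the grant ratio over the examiner's resolved cases. A minimal sketch of that arithmetic (the function name and rounding convention are illustrative assumptions, not the dashboard's actual code):

```python
def career_allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved

# 697 granted out of 852 resolved cases, as reported above.
rate = career_allow_rate(697, 852)
print(f"{rate:.1f}%")  # 81.8%, displayed rounded as 82%
```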

Statute-Specific Performance

§101: 13.8% (-26.2% vs TC avg)
§103: 34.9% (-5.1% vs TC avg)
§102: 27.7% (-12.3% vs TC avg)
§112: 16.6% (-23.4% vs TC avg)
Tech Center averages are estimates. Based on career data from 852 resolved cases.
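A quick consistency check on the table above: subtracting each reported delta from the examiner's per-statute rate recovers the implied Tech Center baseline. This is a sketch under the assumption that the deltas are simple differences; the variable names are illustrative:

```python
# Examiner per-statute rates and reported deltas vs the Tech Center average.
examiner_rate = {"101": 13.8, "103": 34.9, "102": 27.7, "112": 16.6}
delta_vs_tc = {"101": -26.2, "103": -5.1, "102": -12.3, "112": -23.4}

# Implied TC average = examiner rate minus the reported delta.
tc_avg = {s: round(examiner_rate[s] - delta_vs_tc[s], 1) for s in examiner_rate}
print(tc_avg)  # every statute implies the same 40.0% TC baseline
```

That all four statutes back out to the same 40.0% baseline suggests the deltas were computed against a single Tech Center figure.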

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. See also 37 CFR 1.133(b): "In every instance where reconsideration is requested in view of an interview with an examiner, a complete written statement of the reasons presented at the interview as warranting favorable action must be filed by the applicant. An interview does not remove the necessity for reply to Office actions as specified in §§ 1.111 and 1.135." (emphasis added).

In order to expedite prosecution, new grounds of rejection are provided herein. Nguyen discloses modifying accents of virtual meeting participants, and does not disclose modifying speech disrupted by a speech disorder. However, Malkin et al. (cited below) disclose a method/system for modifying a communication participant's speech that is disrupted by a speech disorder to produce speech with a removed speech disorder. Additionally, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt Nguyen to perform such modifications of a speech signal for the reasons provided below.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen et al. (U.S. Patent Application Pub. No. 2024/0098218, hereinafter "Nguyen") in view of Malkin et al. (U.S. Patent Application Pub. No. 2013/0246061, hereinafter "Malkin").

In regard to claim 1, Nguyen discloses a method (Fig. 7, 700), comprising: causing a virtual meeting user interface (UI) to be presented during a virtual meeting between a plurality of participants, the virtual meeting UI providing first audio data associated with an audio stream produced by a client device of a first participant of the plurality of participants (see Fig. 5A, a GUI 500 for virtual conferences is displayed, the GUI 500 including multiple participants 502 and 504, paragraph [0088]; the virtual conference associated with an audio stream including audio data from a client device, paragraphs [0104-0106]); determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified during the virtual meeting (a request to convert audio from a participant in the virtual conference is received, paragraph [0107]); generating, using an artificial intelligence (AI) model and using the audio stream produced by the client device of the first participant as input to the AI model, a modified audio stream to improve understandability of the first audio data by one or more participants of the plurality of participants (an accent conversion (AC) model generates, from a received audio stream including speech in a source accent, a second audio stream including speech in a target accent, paragraph [0112]; the AC model comprising one or more machine learning models, paragraphs [0013-0014]; allowing participants to more easily understand each other, paragraphs [0016-0017]); and causing second audio data associated with the modified audio stream to be provided during the virtual meeting in place of the first audio data (the second audio stream is transmitted to client devices in the virtual conference, paragraph [0115]).

Nguyen does not expressly disclose that the modification to improve understandability is applied to first audio data comprising speech disrupted by a speech disorder and that the modified audio stream comprises speech with a removed speech disorder. Malkin discloses a method for improving understandability by modifying first audio data, wherein the first audio data comprises speech disrupted by a speech disorder and the modified audio stream comprises speech with a removed speech disorder (see Fig. 3, artifacts in a communication participant's speech caused by a speech disorder are eliminated, paragraph [0034]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Nguyen to generate a modified audio stream comprising speech with a removed speech disorder, because it would allow the participant's speech impairment to be corrected and remove impairments that are not intended as part of speech when the participant speaks, as suggested by Malkin (paragraphs [0013-0014]).

In regard to claim 2, Nguyen discloses that the AI model comprises an AI model trained on a plurality of items of training data (recordings of people speaking, paragraph [0014]), wherein each item of training data comprises: third audio data (pairs of audio data comprising a source audio, paragraphs [0014] and [0076]); and a ground truth comprising fourth audio data that corresponds to the third audio data and improves the understandability of the third audio data (the second audio of the pair comprising target audio data of the same words spoken in a different accent, paragraphs [0014] and [0076]).

In regard to claim 3, Nguyen does not disclose that the speech comprises a speech disorder. Malkin discloses that the speech disorder comprises at least one of verbal apraxia; cluttering; aphasia; stuttering; or a speech sound disorder (stutters, etc., paragraphs [0014], [0030], and [0032]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to remove speech disorders comprising at least one of the above speech disorders, because it would allow the participant's speech impairment to be corrected and remove impairments that are not intended as part of speech when the participant speaks, as suggested by Malkin (paragraphs [0013-0014]).
In regard to claim 4, Nguyen discloses that generating the modified audio stream comprises using the AI model to perform at least one of: increase a pitch of the audio stream; or change a timbre of the audio stream (the AC model is trained to allow a user to select a desired voice as well as an accent, paragraph [0114]; where performing voice conversion (VC) comprises adjusting the pitch and timbre of the audio, paragraph [0073]).

In regard to claim 5, Nguyen discloses that determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified comprises receiving a command from the client device of the first participant (a participant in the virtual conference requests that their accent is modified, paragraph [0066]).

In regard to claim 6, Nguyen discloses that the command comprises data indicating an audio effect to be applied by the AI model (accent conversion, paragraph [0066]).

In regard to claim 7, Nguyen discloses that determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified comprises receiving a command from a client device of a second participant of the plurality of participants (a participant in the virtual conference requests that a second participant's accent is converted, paragraph [0066]).

In regard to claim 8, Nguyen discloses that causing the second audio data associated with the modified audio stream to be provided during the virtual meeting in place of the first audio data comprises causing, for a subset of the plurality of participants, the second audio data to be provided in place of the first audio data (multiple participants in the virtual conference request accent conversion to a particular accent, paragraph [0116]).

In regard to claim 9, Nguyen discloses a system (Fig. 8, 800), comprising: a memory (memory 820); and a processing device (processor 810), coupled to the memory, configured to perform operations comprising: causing a virtual meeting user interface (UI) to be presented during a virtual meeting between a plurality of participants, the virtual meeting UI providing first audio data associated with an audio stream produced by a client device of a first participant of the plurality of participants (see Fig. 5A, a GUI 500 for virtual conferences is displayed, the GUI 500 including multiple participants 502 and 504, paragraph [0088]; the virtual conference associated with an audio stream including audio data from a client device, paragraphs [0104-0106]); determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified during the virtual meeting (a request to convert audio from a participant in the virtual conference is received, paragraph [0107]); generating, using an artificial intelligence (AI) model and using the audio stream produced by the client device of the first participant as input to the AI model, a modified audio stream to improve understandability of the first audio data by one or more participants of the plurality of participants (an accent conversion (AC) model generates, from a received audio stream including speech in a source accent, a second audio stream including speech in a target accent, paragraph [0112]; the AC model comprising one or more machine learning models, paragraphs [0013-0014]; allowing participants to more easily understand each other, paragraphs [0016-0017]); and causing second audio data associated with the modified audio stream to be provided during the virtual meeting in place of the first audio data (the second audio stream is transmitted to client devices in the virtual conference, paragraph [0115]).
Nguyen does not expressly disclose that the modification to improve understandability is applied to first audio data comprising speech disrupted by a speech disorder and that the modified audio stream comprises speech with a removed speech disorder. Malkin discloses a method for improving understandability by modifying first audio data, wherein the first audio data comprises speech disrupted by a speech disorder and the modified audio stream comprises speech with a removed speech disorder (see Fig. 3, artifacts in a communication participant's speech caused by a speech disorder are eliminated, paragraph [0034]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Nguyen to generate a modified audio stream comprising speech with a removed speech disorder, because it would allow the participant's speech impairment to be corrected and remove impairments that are not intended as part of speech when the participant speaks, as suggested by Malkin (paragraphs [0013-0014]).

In regard to claim 10, Nguyen discloses that the AI model comprises an AI model trained on a plurality of items of training data (recordings of people speaking, paragraph [0014]), wherein each item of training data comprises: third audio data (pairs of audio data comprising a source audio, paragraphs [0014] and [0076]); and a ground truth comprising fourth audio data that corresponds to the third audio data and improves the understandability of the third audio data (the second audio of the pair comprising target audio data of the same words spoken in a different accent, paragraphs [0014] and [0076]).

In regard to claim 11, Nguyen does not disclose that the speech comprises a speech disorder. Malkin discloses that the speech disorder comprises at least one of verbal apraxia; cluttering; aphasia; stuttering; or a speech sound disorder (stutters, etc., paragraphs [0014], [0030], and [0032]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to remove speech disorders comprising at least one of the above speech disorders, because it would allow the participant's speech impairment to be corrected and remove impairments that are not intended as part of speech when the participant speaks, as suggested by Malkin (paragraphs [0013-0014]).

In regard to claim 12, Nguyen discloses that generating the modified audio stream comprises using the AI model to perform at least one of: increase a pitch of the audio stream; or change a timbre of the audio stream (the AC model is trained to allow a user to select a desired voice as well as an accent, paragraph [0114]; where performing voice conversion (VC) comprises adjusting the pitch and timbre of the audio, paragraph [0073]).

In regard to claim 13, Nguyen discloses that determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified comprises receiving a command from the client device of the first participant (a participant in the virtual conference requests that their accent is modified, paragraph [0066]).

In regard to claim 14, Nguyen discloses that the command comprises data indicating an audio effect to be applied by the AI model (accent conversion, paragraph [0066]).

In regard to claim 15, Nguyen discloses that determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified comprises receiving a command from a client device of a second participant of the plurality of participants (a participant in the virtual conference requests that a second participant's accent is converted, paragraph [0066]).
In regard to claim 16, Nguyen discloses that causing the second audio data associated with the modified audio stream to be provided during the virtual meeting in place of the first audio data comprises causing, for a subset of the plurality of participants, the second audio data to be provided in place of the first audio data (multiple participants in the virtual conference request accent conversion to a particular accent, paragraph [0116]).

In regard to claim 17, Nguyen discloses a method (Fig. 7, 700), comprising: causing a virtual meeting user interface (UI) to be presented during a virtual meeting between a plurality of participants, the virtual meeting UI providing a plurality of first audio data at a plurality of time periods during the virtual meeting, wherein each first audio data of the plurality of first audio data is associated with an audio stream produced by a client device of a respective participant of the plurality of participants (see Fig. 5A, a GUI 500 for virtual conferences is displayed, the GUI 500 including multiple participants 502 and 504, paragraph [0088]; the virtual conference associated with an audio stream including audio data from a plurality of client devices, paragraphs [0104-0106]); determining that the plurality of first audio data are to be modified during the virtual meeting (a request to convert audio to a target accent for each of a plurality of participants is received, paragraphs [0107-0108]); generating, using a plurality of artificial intelligence (AI) models and using the audio streams of the plurality of participants as input to the AI models, a plurality of modified audio streams (accent conversion (AC) models generate, from received audio streams including speech in source accents, second audio streams including speech in a target accent, paragraphs [0090] and [0012]; the AC model comprising one or more machine learning models, paragraphs [0013-0014]), wherein each modified audio stream is associated with a participant of the plurality of participants (any participants with a different accent will be converted, paragraph [0090]), and the respective modified audio streams improve understandability of the respective first audio data by one or more participants of the plurality of participants (the conversion allows participants to more easily understand each other, paragraphs [0016-0017]); and causing a plurality of second audio data associated with the plurality of modified audio streams to be provided during the virtual meeting in place of the plurality of first audio data (the second audio streams are transmitted to client devices in the virtual conference, paragraphs [0090] and [0115]).

Nguyen does not expressly disclose that the modification to improve understandability is applied to first audio data comprising speech disrupted by a speech disorder and that the modified audio stream comprises speech with a removed speech disorder. Malkin discloses a method for improving understandability by modifying first audio data, wherein the first audio data comprises speech disrupted by a speech disorder and the modified audio stream comprises speech with a removed speech disorder (see Fig. 3, artifacts in a communication participant's speech caused by a speech disorder are eliminated, paragraph [0034]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Nguyen to generate a modified audio stream comprising speech with a removed speech disorder, because it would allow the participant's speech impairment to be corrected and remove impairments that are not intended as part of speech when the participant speaks, as suggested by Malkin (paragraphs [0013-0014]).
In regard to claim 18, Nguyen discloses that the plurality of AI models comprises a first AI model and a second AI model (one or more AC processes comprising the trained AC model, paragraph [0065]); the first AI model applies an audio effect to a first audio stream of the audio streams (an AC process is generated for each participant requesting an accent conversion, paragraphs [0068-0069]); and the second AI model applies the same audio effect to a second audio stream of the audio streams (an AC process is generated for each participant requesting an accent conversion, paragraphs [0068-0069]).

In regard to claim 19, Nguyen discloses that the plurality of AI models comprises a first AI model and a second AI model (one or more AC processes comprising the trained AC model, paragraph [0065]); the first AI model applies a first audio effect to a first audio stream of the audio streams (an AC process is generated for each participant requesting an accent conversion, paragraphs [0068-0069]); and the second AI model applies a second audio effect to a second audio stream of the audio streams, wherein the second audio effect is different from the first audio effect (an AC process for each source-target accent pair is generated, paragraphs [0068-0069]).

In regard to claim 20, Nguyen discloses that determining that the plurality of first audio data is to be modified comprises receiving a command from a client device of a first participant of the plurality of participants (a participant in the virtual conference requests accent conversion for a plurality of participants, paragraph [0066]).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Cockram et al. disclose an additional method for removing speech disorders from speech. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571)272-7616. The examiner can normally be reached M-F 8AM-3PM, 4PM-5PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

BLA 3/17/26
/BRIAN L ALBERTALLI/
Primary Examiner, Art Unit 2656

Prosecution Timeline

Apr 11, 2024 · Application Filed
Nov 05, 2025 · Non-Final Rejection (§103)
Jan 06, 2026 · Interview Requested
Jan 23, 2026 · Examiner Interview Summary
Jan 23, 2026 · Applicant Interview (Telephonic)
Feb 10, 2026 · Response Filed
Mar 17, 2026 · Final Rejection (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592247: INFERRING EMOTION FROM SPEECH IN AUDIO DATA USING DEEP LEARNING (granted Mar 31, 2026; 2y 5m to grant)
Patent 12573407: QUICK AUDIO PROFILE USING VOICE ASSISTANT (granted Mar 10, 2026; 2y 5m to grant)
Patent 12574386: DISTRIBUTED IDENTIFICATION IN NETWORKED SYSTEM (granted Mar 10, 2026; 2y 5m to grant)
Patent 12572327: CONDITIONALLY ASSIGNING VARIOUS AUTOMATED ASSISTANT FUNCTION(S) TO INTERACTION WITH A PERIPHERAL ASSISTANT CONTROL DEVICE (granted Mar 10, 2026; 2y 5m to grant)
Patent 12573382: ADVERSARIAL LANGUAGE IMITATION WITH CONSTRAINED EXEMPLARS (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 82%
With Interview: 98% (+16.5%)
Median Time to Grant: 2y 11m
PTA Risk: Moderate
Based on 852 resolved cases by this examiner. Grant probability derived from career allow rate.
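The 98% "with interview" figure is consistent with simply adding the interview lift to the base grant probability and capping at 100%. The dashboard's actual methodology is not disclosed, so the sketch below is only one plausible model, with an assumed function name:

```python
def with_interview(base_pct: float, lift_pct: float) -> float:
    # Naive additive model: base grant probability plus interview lift,
    # capped at 100%. Illustrative only; the product's real model is unknown.
    return min(base_pct + lift_pct, 100.0)

print(with_interview(82.0, 16.5))  # 98.5, displayed as 98%
```

A probabilistic model (e.g. lift applied only to otherwise-abandoned cases) would give a different number, so treat the additive form as a back-of-the-envelope check rather than the source of the projection.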
