Prosecution Insights
Last updated: April 19, 2026
Application No. 18/633,092

MODIFYING AUDIO DATA IN A VIRTUAL MEETING TO INCREASE UNDERSTANDABILITY

Status: Final Rejection (§103)
Filed: Apr 11, 2024
Examiner: ALBERTALLI, BRIAN LOUIS
Art Unit: 2656
Tech Center: 2600 (Communications)
Assignee: Google LLC
OA Round: 2 (Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 11m
With Interview: 98%

Examiner Intelligence

Career Allow Rate: 82%, above average (697 granted / 852 resolved; +19.8% vs TC avg)
Interview Lift: strong, +16.5% for resolved cases with interview
Avg Prosecution: 2y 11m (19 currently pending)
Total Applications: 871 across all art units
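The headline allow rate is just the grant ratio over the examiner's resolved cases. A minimal sketch of that arithmetic (the function name and rounding convention are illustrative assumptions, not the dashboard's actual code):

```python
def career_allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved

# 697 granted out of 852 resolved cases, as reported above.
rate = career_allow_rate(697, 852)
print(f"{rate:.1f}%")  # 81.8%, displayed rounded as 82%
```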

Statute-Specific Performance

§101: 13.8% (-26.2% vs TC avg)
§103: 34.9% (-5.1% vs TC avg)
§102: 27.7% (-12.3% vs TC avg)
§112: 16.6% (-23.4% vs TC avg)
Tech Center averages are estimates. Based on career data from 852 resolved cases.
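A quick consistency check on the table above: subtracting each reported delta from the examiner's per-statute rate recovers the implied Tech Center baseline. This is a sketch under the assumption that the deltas are simple differences; the variable names are illustrative:

```python
# Examiner per-statute rates and reported deltas vs the Tech Center average.
examiner_rate = {"101": 13.8, "103": 34.9, "102": 27.7, "112": 16.6}
delta_vs_tc = {"101": -26.2, "103": -5.1, "102": -12.3, "112": -23.4}

# Implied TC average = examiner rate minus the reported delta.
tc_avg = {s: round(examiner_rate[s] - delta_vs_tc[s], 1) for s in examiner_rate}
print(tc_avg)  # every statute implies the same 40.0% TC baseline
```

That all four statutes back out to the same 40.0% baseline suggests the deltas were computed against a single Tech Center figure.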

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. See also 37 CFR 1.133(b): "In every instance where reconsideration is requested in view of an interview with an examiner, a complete written statement of the reasons presented at the interview as warranting favorable action must be filed by the applicant. An interview does not remove the necessity for reply to Office actions as specified in §§ 1.111 and 1.135." (emphasis added).

In order to expedite prosecution, new grounds of rejection are provided herein. Nguyen discloses modifying accents of virtual meeting participants, and does not disclose modifying speech disrupted by a speech disorder. However, Malkin et al. (cited below) disclose a method/system for modifying a communication participant's speech that is disrupted by a speech disorder to produce speech with a removed speech disorder. Additionally, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt Nguyen to perform such modifications of a speech signal for the reasons provided below.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen et al. (U.S. Patent Application Pub. No. 2024/0098218, hereinafter "Nguyen") in view of Malkin et al. (U.S. Patent Application Pub. No. 2013/0246061, hereinafter "Malkin").

In regard to claim 1, Nguyen discloses a method (Fig. 7, 700), comprising: causing a virtual meeting user interface (UI) to be presented during a virtual meeting between a plurality of participants, the virtual meeting UI providing first audio data associated with an audio stream produced by a client device of a first participant of the plurality of participants (see Fig. 5A, a GUI 500 for virtual conferences is displayed, the GUI 500 including multiple participants 502 and 504, paragraph [0088]; the virtual conference associated with an audio stream including audio data from a client device, paragraphs [0104-0106]); determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified during the virtual meeting (a request to convert audio from a participant in the virtual conference is received, paragraph [0107]); generating, using an artificial intelligence (AI) model and using the audio stream produced by the client device of the first participant as input to the AI model, a modified audio stream to improve understandability of the first audio data by one or more participants of the plurality of participants (an accent conversion (AC) model generates, from a received audio stream including speech in a source accent, a second audio stream including speech in a target accent, paragraph [0112]; the AC model comprising one or more machine learning models, paragraphs [0013-0014]; allowing participants to more easily understand each other, paragraphs [0016-0017]); and causing second audio data associated with the modified audio stream to be provided during the virtual meeting in place of the first audio data (the second audio stream is transmitted to client devices in the virtual conference, paragraph [0115]).

Nguyen does not expressly disclose that the modification to improve understandability is applied to first audio data comprising speech disrupted by a speech disorder and that the modified audio stream comprises speech with a removed speech disorder. Malkin discloses a method for improving understandability by modifying first audio data, wherein the first audio data comprises speech disrupted by a speech disorder and the modified audio stream comprises speech with a removed speech disorder (see Fig. 3, artifacts in a communication participant's speech caused by a speech disorder are eliminated, paragraph [0034]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Nguyen to generate a modified audio stream comprising speech with a removed speech disorder, because it would allow the participant's speech impairment to be corrected and remove impairments that are not intended as part of speech when the participant speaks, as suggested by Malkin (paragraphs [0013-0014]).

In regard to claim 2, Nguyen discloses that the AI model comprises an AI model trained on a plurality of items of training data (recordings of people speaking, paragraph [0014]), wherein each item of training data comprises: third audio data (pairs of audio data comprising a source audio, paragraphs [0014] and [0076]); and a ground truth comprising fourth audio data that corresponds to the third audio data and improves the understandability of the third audio data (the second audio of the pair comprising target audio data of the same words spoken in a different accent, paragraphs [0014] and [0076]).

In regard to claim 3, Nguyen does not disclose that the speech comprises a speech disorder. Malkin discloses that the speech disorder comprises at least one of verbal apraxia; cluttering; aphasia; stuttering; or a speech sound disorder (stutters, etc., paragraphs [0014], [0030], and [0032]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to remove speech disorders comprising at least one of the above speech disorders, because it would allow the participant's speech impairment to be corrected and remove impairments that are not intended as part of speech when the participant speaks, as suggested by Malkin (paragraphs [0013-0014]).
In regard to claim 4, Nguyen discloses that generating the modified audio stream comprises using the AI model to perform at least one of: increase a pitch of the audio stream; or change a timbre of the audio stream (the AC model is trained to allow a user to select a desired voice as well as an accent, paragraph [0114]; where performing voice conversion (VC) comprises adjusting the pitch and timbre of the audio, paragraph [0073]).

In regard to claim 5, Nguyen discloses that determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified comprises receiving a command from the client device of the first participant (a participant in the virtual conference requests that their accent is modified, paragraph [0066]).

In regard to claim 6, Nguyen discloses that the command comprises data indicating an audio effect to be applied by the AI model (accent conversion, paragraph [0066]).

In regard to claim 7, Nguyen discloses that determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified comprises receiving a command from a client device of a second participant of the plurality of participants (a participant in the virtual conference requests that a second participant's accent is converted, paragraph [0066]).

In regard to claim 8, Nguyen discloses that causing the second audio data associated with the modified audio stream to be provided during the virtual meeting in place of the first audio data comprises causing, for a subset of the plurality of participants, the second audio data to be provided in place of the first audio data (multiple participants in the virtual conference request accent conversion to a particular accent, paragraph [0116]).

In regard to claim 9, Nguyen discloses a system (Fig. 8, 800), comprising: a memory (memory 820); and a processing device (processor 810), coupled to the memory, configured to perform operations comprising: causing a virtual meeting user interface (UI) to be presented during a virtual meeting between a plurality of participants, the virtual meeting UI providing first audio data associated with an audio stream produced by a client device of a first participant of the plurality of participants (see Fig. 5A, a GUI 500 for virtual conferences is displayed, the GUI 500 including multiple participants 502 and 504, paragraph [0088]; the virtual conference associated with an audio stream including audio data from a client device, paragraphs [0104-0106]); determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified during the virtual meeting (a request to convert audio from a participant in the virtual conference is received, paragraph [0107]); generating, using an artificial intelligence (AI) model and using the audio stream produced by the client device of the first participant as input to the AI model, a modified audio stream to improve understandability of the first audio data by one or more participants of the plurality of participants (an accent conversion (AC) model generates, from a received audio stream including speech in a source accent, a second audio stream including speech in a target accent, paragraph [0112]; the AC model comprising one or more machine learning models, paragraphs [0013-0014]; allowing participants to more easily understand each other, paragraphs [0016-0017]); and causing second audio data associated with the modified audio stream to be provided during the virtual meeting in place of the first audio data (the second audio stream is transmitted to client devices in the virtual conference, paragraph [0115]).
Nguyen does not expressly disclose that the modification to improve understandability is applied to first audio data comprising speech disrupted by a speech disorder and that the modified audio stream comprises speech with a removed speech disorder. Malkin discloses a method for improving understandability by modifying first audio data, wherein the first audio data comprises speech disrupted by a speech disorder and the modified audio stream comprises speech with a removed speech disorder (see Fig. 3, artifacts in a communication participant's speech caused by a speech disorder are eliminated, paragraph [0034]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Nguyen to generate a modified audio stream comprising speech with a removed speech disorder, because it would allow the participant's speech impairment to be corrected and remove impairments that are not intended as part of speech when the participant speaks, as suggested by Malkin (paragraphs [0013-0014]).

In regard to claim 10, Nguyen discloses that the AI model comprises an AI model trained on a plurality of items of training data (recordings of people speaking, paragraph [0014]), wherein each item of training data comprises: third audio data (pairs of audio data comprising a source audio, paragraphs [0014] and [0076]); and a ground truth comprising fourth audio data that corresponds to the third audio data and improves the understandability of the third audio data (the second audio of the pair comprising target audio data of the same words spoken in a different accent, paragraphs [0014] and [0076]).

In regard to claim 11, Nguyen does not disclose that the speech comprises a speech disorder. Malkin discloses that the speech disorder comprises at least one of verbal apraxia; cluttering; aphasia; stuttering; or a speech sound disorder (stutters, etc., paragraphs [0014], [0030], and [0032]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to remove speech disorders comprising at least one of the above speech disorders, because it would allow the participant's speech impairment to be corrected and remove impairments that are not intended as part of speech when the participant speaks, as suggested by Malkin (paragraphs [0013-0014]).

In regard to claim 12, Nguyen discloses that generating the modified audio stream comprises using the AI model to perform at least one of: increase a pitch of the audio stream; or change a timbre of the audio stream (the AC model is trained to allow a user to select a desired voice as well as an accent, paragraph [0114]; where performing voice conversion (VC) comprises adjusting the pitch and timbre of the audio, paragraph [0073]).

In regard to claim 13, Nguyen discloses that determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified comprises receiving a command from the client device of the first participant (a participant in the virtual conference requests that their accent is modified, paragraph [0066]).

In regard to claim 14, Nguyen discloses that the command comprises data indicating an audio effect to be applied by the AI model (accent conversion, paragraph [0066]).

In regard to claim 15, Nguyen discloses that determining that the first audio data associated with the audio stream produced by the client device of the first participant is to be modified comprises receiving a command from a client device of a second participant of the plurality of participants (a participant in the virtual conference requests that a second participant's accent is converted, paragraph [0066]).
In regard to claim 16, Nguyen discloses that causing the second audio data associated with the modified audio stream to be provided during the virtual meeting in place of the first audio data comprises causing, for a subset of the plurality of participants, the second audio data to be provided in place of the first audio data (multiple participants in the virtual conference request accent conversion to a particular accent, paragraph [0116]).

In regard to claim 17, Nguyen discloses a method (Fig. 7, 700), comprising: causing a virtual meeting user interface (UI) to be presented during a virtual meeting between a plurality of participants, the virtual meeting UI providing a plurality of first audio data at a plurality of time periods during the virtual meeting, wherein each first audio data of the plurality of first audio data is associated with an audio stream produced by a client device of a respective participant of the plurality of participants (see Fig. 5A, a GUI 500 for virtual conferences is displayed, the GUI 500 including multiple participants 502 and 504, paragraph [0088]; the virtual conference associated with an audio stream including audio data from a plurality of client devices, paragraphs [0104-0106]); determining that the plurality of first audio data are to be modified during the virtual meeting (a request to convert audio to a target accent for each of a plurality of participants is received, paragraphs [0107-0108]); generating, using a plurality of artificial intelligence (AI) models and using the audio streams of the plurality of participants as input to the AI models, a plurality of modified audio streams (accent conversion (AC) models generate, from received audio streams including speech in source accents, second audio streams including speech in a target accent, paragraphs [0090] and [0012]; the AC model comprising one or more machine learning models, paragraphs [0013-0014]), wherein each modified audio stream is associated with a participant of the plurality of participants (any participants with a different accent will be converted, paragraph [0090]), and the respective modified audio streams improve understandability of the respective first audio data by one or more participants of the plurality of participants (the conversion allows participants to more easily understand each other, paragraphs [0016-0017]); and causing a plurality of second audio data associated with the plurality of modified audio streams to be provided during the virtual meeting in place of the plurality of first audio data (the second audio streams are transmitted to client devices in the virtual conference, paragraphs [0090] and [0115]).

Nguyen does not expressly disclose that the modification to improve understandability is applied to first audio data comprising speech disrupted by a speech disorder and that the modified audio stream comprises speech with a removed speech disorder. Malkin discloses a method for improving understandability by modifying first audio data, wherein the first audio data comprises speech disrupted by a speech disorder and the modified audio stream comprises speech with a removed speech disorder (see Fig. 3, artifacts in a communication participant's speech caused by a speech disorder are eliminated, paragraph [0034]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Nguyen to generate a modified audio stream comprising speech with a removed speech disorder, because it would allow the participant's speech impairment to be corrected and remove impairments that are not intended as part of speech when the participant speaks, as suggested by Malkin (paragraphs [0013-0014]).
In regard to claim 18, Nguyen discloses that the plurality of AI models comprises a first AI model and a second AI model (one or more AC processes comprising the trained AC model, paragraph [0065]); the first AI model applies an audio effect to a first audio stream of the audio streams (an AC process is generated for each participant requesting an accent conversion, paragraphs [0068-0069]); and the second AI model applies the same audio effect to a second audio stream of the audio streams (an AC process is generated for each participant requesting an accent conversion, paragraphs [0068-0069]).

In regard to claim 19, Nguyen discloses that the plurality of AI models comprises a first AI model and a second AI model (one or more AC processes comprising the trained AC model, paragraph [0065]); the first AI model applies a first audio effect to a first audio stream of the audio streams (an AC process is generated for each participant requesting an accent conversion, paragraphs [0068-0069]); and the second AI model applies a second audio effect to a second audio stream of the audio streams, wherein the second audio effect is different from the first audio effect (an AC process for each source-target accent pair is generated, paragraphs [0068-0069]).

In regard to claim 20, Nguyen discloses that determining that the plurality of first audio data is to be modified comprises receiving a command from a client device of a first participant of the plurality of participants (a participant in the virtual conference requests accent conversion for a plurality of participants, paragraph [0066]).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Cockram et al. disclose an additional method for removing speech disorders from speech. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571)272-7616. The examiner can normally be reached M-F 8AM-3PM, 4PM-5PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

BLA 3/17/26
/BRIAN L ALBERTALLI/
Primary Examiner, Art Unit 2656

Prosecution Timeline

Apr 11, 2024 · Application Filed
Nov 05, 2025 · Non-Final Rejection (§103)
Jan 06, 2026 · Interview Requested
Jan 23, 2026 · Examiner Interview Summary
Jan 23, 2026 · Applicant Interview (Telephonic)
Feb 10, 2026 · Response Filed
Mar 17, 2026 · Final Rejection (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592247: INFERRING EMOTION FROM SPEECH IN AUDIO DATA USING DEEP LEARNING (granted Mar 31, 2026; 2y 5m to grant)
Patent 12573407: QUICK AUDIO PROFILE USING VOICE ASSISTANT (granted Mar 10, 2026; 2y 5m to grant)
Patent 12574386: DISTRIBUTED IDENTIFICATION IN NETWORKED SYSTEM (granted Mar 10, 2026; 2y 5m to grant)
Patent 12572327: CONDITIONALLY ASSIGNING VARIOUS AUTOMATED ASSISTANT FUNCTION(S) TO INTERACTION WITH A PERIPHERAL ASSISTANT CONTROL DEVICE (granted Mar 10, 2026; 2y 5m to grant)
Patent 12573382: ADVERSARIAL LANGUAGE IMITATION WITH CONSTRAINED EXEMPLARS (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 82%
With Interview: 98% (+16.5%)
Median Time to Grant: 2y 11m
PTA Risk: Moderate
Based on 852 resolved cases by this examiner. Grant probability derived from career allow rate.
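The 98% "with interview" figure is consistent with simply adding the interview lift to the base grant probability and capping at 100%. The dashboard's actual methodology is not disclosed, so the sketch below is only one plausible model, with an assumed function name:

```python
def with_interview(base_pct: float, lift_pct: float) -> float:
    # Naive additive model: base grant probability plus interview lift,
    # capped at 100%. Illustrative only; the product's real model is unknown.
    return min(base_pct + lift_pct, 100.0)

print(with_interview(82.0, 16.5))  # 98.5, displayed as 98%
```

A probabilistic model (e.g. lift applied only to otherwise-abandoned cases) would give a different number, so treat the additive form as a back-of-the-envelope check rather than the source of the projection.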
