Last updated: May 29, 2026
Application No. 18/487,419
AUDIO MANIPULATION OF EMULATED CONTENT

Final Rejection §103
Filed: Oct 16, 2023
Examiner: BECKER, TYLER JUSTIN
Art Unit: 2657
Tech Center: 2600 (Communications)
Assignee: Motorola Mobility LLC
OA Round: 2 (Final)
Interview Optional

This examiner has a 75% grant rate with a +16.5% interview lift. An interview has already been conducted in this application's prosecution history, so the recommendation is a written response with narrowed claims based on precedent claim evolution patterns.
Based on 20 resolved cases, 2023–2026
Examiner Intelligence

BECKER, TYLER JUSTIN
Career Allowance Rate: 75% (15 granted / 20 resolved), above average, +13.0% vs TC avg
Interview Lift: +16.5% (strong), resolved cases with interview vs. without
Typical timeline: 2y 7m avg prosecution, 11 currently pending
Statute-Specific Performance

§101: 1.2% (-38.8% vs TC avg)
§103: 90.4% (+50.4% vs TC avg)
§102: 3.6% (-36.4% vs TC avg)
§112: 4.8% (-35.2% vs TC avg)
Tech Center average is an estimate. Based on career data from 20 resolved cases.
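The "vs TC avg" figures above can be cross-checked with simple arithmetic. The following is a minimal Python sketch, assuming each "vs TC avg" value is a percentage-point difference from the Tech Center average (the page does not say so explicitly). The names `examiner_stats` and `implied_baseline` are illustrative and do not come from the source.

```python
# Figures copied from the examiner statistics above: (examiner %, delta vs TC avg).
# Assumption: each delta is a percentage-point difference, not a relative change.
examiner_stats = {
    "§101": (1.2, -38.8),
    "§103": (90.4, 50.4),
    "§102": (3.6, -36.4),
    "§112": (4.8, -35.2),
}


def implied_baseline(value: float, delta: float) -> float:
    """Return the Tech Center average implied by a value and its delta."""
    return round(value - delta, 1)


baselines = {s: implied_baseline(v, d) for s, (v, d) in examiner_stats.items()}

# Career allowance rate from the raw counts (15 granted of 20 resolved).
allowance_rate = 100 * 15 / 20

for statute, base in baselines.items():
    print(f"{statute}: implied TC average {base}%")
print(f"Allowance rate: {allowance_rate}%")
```

Under that assumption, all four statutes imply the same 40.0% baseline, which is consistent with the Tech Center average being a single estimated value rather than a per-statute measurement.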
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
	The amendments filed January 12th, 2026 have been entered. Claims 1, 9, and 15 have been amended. Claims 1-20 are pending and have been examined.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 3, and 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bloom et al. (US Pat. Pub. No. 2006/0165240 A1 hereinafter Bloom), in view of Haupt et al. (US Pat. Pub. No. 2013/0019738 A1 hereinafter Haupt).
Regarding claim 1, Bloom discloses a media device, comprising: a memory configured to store original audio content (Bloom, [0170]: "the processes can be implemented wholly or in part in phones or any other devices that contain a computer system and memory and means for inputting and outputting the required audio signals."; [0037]: "Typically, in an embodiment, to start, the New and Guide Signals are sampled and stored digitally."); and an audio manipulation manager implemented at least partially in hardware, the audio manipulation manager configured to: receive, from a user, input audio content that emulates the original audio content (Bloom, [0022]: "during a playback of the original audio and optional corresponding video, the user's voice is digitized and input to the apparatus (as the New recording)."); determine a user voice category from the input audio content; and transform the input audio content to manipulated audio content by changing the user voice category to the content creator voice category (Bloom, [0022]: "One specific application of this invention is that of automatically adjusting the pitch of a new audio signal ("New Signal") generated by a typical member of the public to follow the pitch of another audio signal ("Guide Signal") generated by a professional singer."). However, Bloom fails to expressly recite receive metadata associated with the original audio content, the metadata including a content creator voice category of the original audio content, the content creator voice category received separately from the original audio content.
Haupt teaches receive metadata associated with the original audio content, the metadata including a content creator voice category of the original audio content, the content creator voice category received separately from the original audio content (Haupt, Fig. 1, 16 and 30; [0028]: “As shown in FIG. 1 the program is initialized and starts with a pre-voice analysis speech is conducted in block or step 10, which feeds to a step 12 for voice analysis. Pitch transposition and multiplication take place in step 14 with input from pitch multiplication parameter information provided in step 16. Stochastic/deterministic transposition occurs in step 18 with singer stochastic/deterministic parameter information provided by step 20. A singing voice model is created in step 22 and passed to spectrogram interpolation between words in step 24. Spectrogram energy shaping and transposition occurs in step 26, which receives the output of singer energy parameter information from step 32 obtained from singer database 28 and vocal track 30. The program moves to step 34 for voice synthesis and then to step 36 for post-voice synthesis speech.”; Here the pitch information is interpreted as the content creator voice category, and is received separately from the vocal track, which is interpreted as the original audio content.).
Bloom and Haupt are analogous arts because they both belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom to incorporate the teachings of Haupt to receive pitch information separately from the vocal track. This enables the system to use different information about the content creator’s voice at different stages of processing (Haupt, [0028]). This ensures that the system can properly access and process the necessary information at each stage of its processing.

	Regarding claim 3, the rejection of claim 1 is incorporated. Bloom, in view of Haupt, discloses all of the elements of the current invention as stated above. Bloom further discloses wherein the original audio content is a song, and the input audio content is a cover of the song (Bloom, [0022]: " An example of this is a karaoke-style recording and playback system using digitized music videos as the original source in which, during a playback of the original audio and optional corresponding video, the user's voice is digitized and input to the apparatus (as the New recording).").

	Regarding claim 7, the rejection of claim 1 is incorporated. Bloom, in view of Haupt, discloses all of the elements of the current invention as stated above. Bloom further discloses wherein: a second content creator voice category is included with the metadata associated with the original audio content; and the audio manipulation manager is configured to change the manipulated audio content from the content creator voice category to the second content creator voice category (Bloom, [0156]: "In further embodiments, the Guide Signal can be made up of a series of different individual signals instead of one continuous signal, or multiple Guide Signals (e.g. harmony vocals) can be used to generate multiple vocal parts from a single New Signal.").

Claim(s) 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bloom, in view of Haupt, as applied to claims 1, 3, and 7 above, and further in view of Shah et al. (US Pat. Pub. No. 2022/0070295 A1 hereinafter Shah).
Regarding claim 2, the rejection of claim 1 is incorporated. Bloom, in view of Haupt, discloses all of the elements of the current invention as stated above. However, Bloom, in view of Haupt, fails to expressly recite wherein to change the user voice category to the content creator voice category, the audio manipulation manager is configured to change a user tone and a user pitch to a content creator tone and a content creator pitch.
Shah teaches wherein to change the user voice category to the content creator voice category, the audio manipulation manager is configured to change a user tone and a user pitch to a content creator tone and a content creator pitch (Shah, [0007]: "Using Artificial Intelligence (AI), such as a neural network, the voice (for audio calls or audio portion of an audio/video call) and face may be overlaid over the actual live agent's face and/or voice so the customer is presented with speech, and/or images, of the desired entity."; [0009]: "For speech, the tone and pitch of the voice of agent are also mapped to that of the celebrity's in real time.").
Bloom, Haupt, and Shah are analogous arts because they each belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom, as modified by the speech to song conversion method of Haupt, to incorporate the teachings of Shah to modify both pitch and tone of the user. This allows the user’s voice to be better modified to emulate a specific person’s voice (Shah, [0007]). Better emulating a specific voice improves the system’s final output.

Claim(s) 4 and 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bloom, in view of Haupt, as applied to claims 1, 3, and 7 above, and further in view of Lindahl et al. (US Pat. Pub. No. 2013/0329908 A1 hereinafter Lindahl).
Regarding claim 4, the rejection of claim 1 is incorporated. Bloom, in view of Haupt, discloses all of the elements of the current invention as stated above. However, Bloom, in view of Haupt, fails to expressly recite wherein the audio manipulation manager is configured to detect that the media device is operating in an audio manipulation mode.
Lindahl teaches wherein the audio manipulation manager is configured to detect that the media device is operating in an audio manipulation mode (Lindahl, [0009]: "To configure the audio beamforming settings, the computing system can detect a predetermined actively running application, such as a dictation application, a speech recognition application, an audio communications application, a video chat application, an audio recording application, or a music playback application.").
Bloom, Haupt, and Lindahl are analogous arts because they each belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom, as modified by the speech to song conversion method of Haupt, to incorporate the teachings of Lindahl to detect an audio manipulation mode. This allows the system to be configured based on a current state of a computing device (Lindahl, [0009]). This helps ensure the system works properly with other running programs.

Regarding claim 8, the rejection of claim 1 is incorporated. Bloom, in view of Haupt, discloses all of the elements of the current invention as stated above. Bloom further discloses wherein the audio manipulation manager is configured to: communicate the manipulated audio content to the audio output device for audio playback of the manipulated audio content (Bloom, [0022]: "With this system, a modified user's voice signal can be created that is automatically time and pitch corrected. When the modified voice signal is played back synchronously with the original video, the user's voice can accurately replace the original performer's recorded voice in terms of both pitch and time, including any lip synching."). However, Bloom, in view of Haupt, fails to expressly recite wherein the audio manipulation manager is configured to: detect an audio output device is connected to the media device.
Lindahl teaches wherein the audio manipulation manager is configured to: detect an audio output device is connected to the media device (Lindahl, [0009]: "in some cases, the system can detect at least one predetermined device setting, such as fan speed, current audio route, or a configuration of microphone and speaker placement."). The same motivation for claim 4 applies equally to claim 8.

Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bloom, in view of Haupt and Lindahl, as applied to claims 4 and 8 above, and further in view of Wang et al. (US Pat. Pub. No. 2023/0281335 A1 hereinafter Wang).
Regarding claim 5, the rejection of claim 4 is incorporated. Bloom, in view of Haupt and Lindahl, discloses all of the elements of the current invention as stated above. Lindahl further teaches wherein to detect that the media device is operating in the audio manipulation mode, the audio manipulation manager is configured to: detect an audio application is running in a foreground of the media device (Lindahl, [0009]: "To configure the audio beamforming settings, the computing system can detect a predetermined actively running application, such as a dictation application, a speech recognition application, an audio communications application, a video chat application, an audio recording application, or a music playback application."). The same motivation for claim 4 applies equally to claim 5. However, Bloom, in view of Haupt and Lindahl, fails to expressly recite wherein to detect that the media device is operating in the audio manipulation mode, the audio manipulation manager is configured to: detect the audio application is requesting use of a microphone of the media device.
Wang teaches wherein to detect that the media device is operating in the audio manipulation mode, the audio manipulation manager is configured to: detect the audio application is requesting use of a microphone of the media device (Wang, [0054]: "the sound filter application can detect when a third party application is attempting to utilize a microphone, sound sensor, or the like to obtain information related to the user").
Bloom, Haupt, Lindahl, and Wang are analogous arts because they all belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom, as modified by the speech to song conversion method of Haupt and the beamforming settings adjustment method of Lindahl, to incorporate the teachings of Wang to detect an audio application attempting to use a microphone. This ensures that no application is using a microphone without proper permissions (Wang, [0054]). This is important to protect the user’s privacy while using the system.

Claim(s) 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bloom, in view of Haupt, as applied to claims 1, 3, and 7 above, and further in view of Metcalf, Michael (US Pat. Pub. No. 2012/0096018 A1 hereinafter Metcalf).
Regarding claim 6, the rejection of claim 1 is incorporated. Bloom, in view of Haupt, discloses all of the elements of the current invention as stated above. However, Bloom, in view of Haupt, fails to expressly recite wherein the audio manipulation manager is configured to identify the original audio content that is emulated by the input audio content.
Metcalf teaches wherein the audio manipulation manager is configured to identify the original audio content that is emulated by the input audio content (Metcalf, [0016]: "the user sings or hums a portion of the song, in which case a pitch detector, a speech recognizer, or a combination of both is used in VUI platform 20 to identify the song.").
Bloom, Haupt, and Metcalf are analogous arts because they each belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom, as modified by the speech to song conversion method of Haupt, to incorporate the teachings of Metcalf to identify audio content. This allows the system to find a song quickly and with little or no manual input from the user (Metcalf, [0004]). This improves the overall user experience of the system.

Claim(s) 9, 11-13, 15, and 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bloom, in view of Metcalf and Haupt.
Regarding claim 9, Bloom discloses a method, comprising: receiving, from a user, input audio content (Bloom, [0022]: "during a playback of the original audio and optional corresponding video, the user's voice is digitized and input to the apparatus (as the New recording)."); determining a user voice category from the input audio content; and transforming the input audio content to manipulated audio content by changing the user voice category to the content creator voice category (Bloom, [0022]: "One specific application of this invention is that of automatically adjusting the pitch of a new audio signal ("New Signal") generated by a typical member of the public to follow the pitch of another audio signal ("Guide Signal") generated by a professional singer."). However, Bloom fails to expressly recite identifying original audio content that is emulated by the input audio content; and receiving metadata associated with the original audio content, the metadata including a content creator voice category from the original audio content, the content creator voice category received separately from the original audio content.
	Metcalf teaches identifying original audio content that is emulated by the input audio content (Metcalf, [0016]: "the user sings or hums a portion of the song, in which case a pitch detector, a speech recognizer, or a combination of both is used in VUI platform 20 to identify the song.").
Bloom and Metcalf are analogous arts because they both belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom to incorporate the teachings of Metcalf to identify audio content. This allows the system to find a song quickly and with little or no manual input from the user (Metcalf, [0004]). This improves the overall user experience of the system. However, Bloom, in view of Metcalf, fails to expressly recite receiving metadata associated with the original audio content, the metadata including a content creator voice category from the original audio content, the content creator voice category received separately from the original audio content.
Haupt teaches receiving metadata associated with the original audio content, the metadata including a content creator voice category from the original audio content, the content creator voice category received separately from the original audio content (Haupt, Fig. 1, 16 and 30; [0028]: “As shown in FIG. 1 the program is initialized and starts with a pre-voice analysis speech is conducted in block or step 10, which feeds to a step 12 for voice analysis. Pitch transposition and multiplication take place in step 14 with input from pitch multiplication parameter information provided in step 16. Stochastic/deterministic transposition occurs in step 18 with singer stochastic/deterministic parameter information provided by step 20. A singing voice model is created in step 22 and passed to spectrogram interpolation between words in step 24. Spectrogram energy shaping and transposition occurs in step 26, which receives the output of singer energy parameter information from step 32 obtained from singer database 28 and vocal track 30. The program moves to step 34 for voice synthesis and then to step 36 for post-voice synthesis speech.”; Here the pitch information is seen as the content creator voice category, and is received separately from the vocal track, which is seen as the original audio content.).
Bloom, Metcalf, and Haupt are analogous arts because they each belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom, as modified by the music selection method of Metcalf, to incorporate the teachings of Haupt to receive pitch information separately from the vocal track. This enables the system to use different information about the content creator’s voice at different stages of processing (Haupt, [0028]). This ensures that the system can properly access and process the necessary information at each stage of its processing.

Regarding claim 11, the rejection of claim 9 is incorporated. Bloom, in view of Metcalf and Haupt, discloses all of the elements of the current invention as stated above. Bloom further discloses wherein the original audio content is a song, and the input audio content is a cover of the song (Bloom, [0022]: " An example of this is a karaoke-style recording and playback system using digitized music videos as the original source in which, during a playback of the original audio and optional corresponding video, the user's voice is digitized and input to the apparatus (as the New recording).").

Regarding claim 12, the rejection of claim 9 is incorporated. Bloom, in view of Metcalf and Haupt, discloses all of the elements of the current invention as stated above. Metcalf further teaches wherein identifying the original audio content includes at least one of: accessing an audio application to determine the original audio content; or detecting a tune of the input audio content (Metcalf, [0016]: "the user sings or hums a portion of the song, in which case a pitch detector, a speech recognizer, or a combination of both is used in VUI platform 20 to identify the song."). The same motivation for claim 9 applies equally to claim 12.

Regarding claim 13, the rejection of claim 9 is incorporated. Bloom, in view of Metcalf and Haupt, discloses all of the elements of the current invention as stated above. Bloom further discloses wherein a second content creator voice category is included with the metadata associated with the original audio content; and the transforming includes changing the manipulated audio content from the content creator voice category to the second content creator voice category (Bloom, [0156]: "In further embodiments, the Guide Signal can be made up of a series of different individual signals instead of one continuous signal, or multiple Guide Signals (e.g. harmony vocals) can be used to generate multiple vocal parts from a single New Signal.").

Regarding claim 15, Bloom discloses a system, comprising: a processor (Bloom, [0170]: "the processes can be implemented wholly or in part in phones or any other devices that contain a computer system and memory and means for inputting and outputting the required audio signals."); and an audio manipulation manager implemented at least partially by the processor, configured to: detect an audio manipulation mode (Bloom, [0021]: “Preferred embodiments of this invention provide methods and apparatus for automatically and correctly modifying one or more signal characteristics of a second digitized audio signal to be a function of specified features in a first digitized audio signal.”); determine a user voice category from the input audio content; and transform the input audio content to manipulated audio content by changing the user voice category to the content creator voice category (Bloom, [0022]: "One specific application of this invention is that of automatically adjusting the pitch of a new audio signal ("New Signal") generated by a typical member of the public to follow the pitch of another audio signal ("Guide Signal") generated by a professional singer."). However, Bloom fails to expressly recite identify original audio content that is emulated by input audio content; and receive a content creator voice category associated with the original audio content, the content creator voice category received separately from the original audio content.
Metcalf teaches identify original audio content that is emulated by input audio content (Metcalf, [0016]: "the user sings or hums a portion of the song, in which case a pitch detector, a speech recognizer, or a combination of both is used in VUI platform 20 to identify the song.").
Bloom and Metcalf are analogous arts because they both belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom to incorporate the teachings of Metcalf to identify audio content. This allows the system to find a song quickly and with little or no manual input from the user (Metcalf, [0004]). This improves the overall user experience of the system. However, Bloom, in view of Metcalf, fails to expressly recite receive a content creator voice category associated with the original audio content, the content creator voice category received separately from the original audio content.
Haupt teaches receive a content creator voice category associated with the original audio content, the content creator voice category received separately from the original audio content (Haupt, Fig. 1, 16 and 30; [0028]: “As shown in FIG. 1 the program is initialized and starts with a pre-voice analysis speech is conducted in block or step 10, which feeds to a step 12 for voice analysis. Pitch transposition and multiplication take place in step 14 with input from pitch multiplication parameter information provided in step 16. Stochastic/deterministic transposition occurs in step 18 with singer stochastic/deterministic parameter information provided by step 20. A singing voice model is created in step 22 and passed to spectrogram interpolation between words in step 24. Spectrogram energy shaping and transposition occurs in step 26, which receives the output of singer energy parameter information from step 32 obtained from singer database 28 and vocal track 30. The program moves to step 34 for voice synthesis and then to step 36 for post-voice synthesis speech.”; Here the pitch information is seen as the content creator voice category, and is received separately from the vocal track, which is seen as the original audio content.).
Bloom, Metcalf, and Haupt are analogous arts because they each belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom, as modified by the music selection method of Metcalf, to incorporate the teachings of Haupt to receive pitch information separately from the vocal track. This enables the system to use different information about the content creator’s voice at different stages of processing (Haupt, [0028]). This ensures that the system can properly access and process the necessary information at each stage of its processing.

Regarding claim 18, the rejection of claim 15 is incorporated. Bloom, in view of Metcalf and Haupt, discloses all of the elements of the current invention as stated above. Bloom further discloses wherein the original audio content is a song, and the input audio content is a cover of the song (Bloom, [0022]: " An example of this is a karaoke-style recording and playback system using digitized music videos as the original source in which, during a playback of the original audio and optional corresponding video, the user's voice is digitized and input to the apparatus (as the New recording).").

Regarding claim 19, the rejection of claim 15 is incorporated. Bloom, in view of Metcalf and Haupt, discloses all of the elements of the current invention as stated above. Metcalf further teaches wherein to identify the original audio content, the audio manipulation manager is configured to at least one of: access an audio application to determine the original audio content; or detect a tune of the input audio content (Metcalf, [0016]: "the user sings or hums a portion of the song, in which case a pitch detector, a speech recognizer, or a combination of both is used in VUI platform 20 to identify the song."). The same motivation for claim 15 applies equally to claim 19.

Regarding claim 20, the rejection of claim 15 is incorporated. Bloom, in view of Metcalf and Haupt, discloses all of the elements of the current invention as stated above. Bloom further discloses wherein the audio manipulation manager is configured to: receive a second content creator voice category associated with the original audio content; and change the manipulated audio content from the content creator voice category to the second content creator voice category (Bloom, [0156]: "In further embodiments, the Guide Signal can be made up of a series of different individual signals instead of one continuous signal, or multiple Guide Signals (e.g. harmony vocals) can be used to generate multiple vocal parts from a single New Signal.").

Claim(s) 10 and 16-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bloom, in view of Metcalf and Haupt, as applied to claims 9, 11-13, 15, and 18-20 above, and further in view of Shah.
Regarding claim 10, the rejection of claim 9 is incorporated. Bloom, in view of Metcalf and Haupt, discloses all of the elements of the current invention as stated above. However, Bloom, in view of Metcalf and Haupt, fails to expressly recite wherein changing the user voice category to the content creator voice category includes changing a user tone and a user pitch to a content creator tone and a content creator pitch.
Shah teaches wherein changing the user voice category to the content creator voice category includes changing a user tone and a user pitch to a content creator tone and a content creator pitch (Shah, [0007]: "Using Artificial Intelligence (AI), such as a neural network, the voice (for audio calls or audio portion of an audio/video call) and face may be overlaid over the actual live agent's face and/or voice so the customer is presented with speech, and/or images, of the desired entity."; [0009]: "For speech, the tone and pitch of the voice of agent are also mapped to that of the celebrity's in real time.").
Bloom, Metcalf, Haupt, and Shah are analogous arts because they all belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom, as modified by the music selection method of Metcalf and the speech to song conversion method of Haupt, to incorporate the teachings of Shah to modify both pitch and tone of the user. This allows the user’s voice to be better modified to emulate a specific person’s voice (Shah, [0007]). Better emulating a specific voice improves the system’s final output.

Regarding claim 16, the rejection of claim 15 is incorporated. Bloom, in view of Metcalf and Haupt, discloses all of the elements of the current invention as stated above. However, Bloom, in view of Metcalf and Haupt, fails to expressly recite wherein the user voice category includes a user tone and a user pitch, and the content creator voice category includes a content creator tone and a content creator pitch.
Shah teaches wherein the user voice category includes a user tone and a user pitch, and the content creator voice category includes a content creator tone and a content creator pitch (Shah, [0007]: "Using Artificial Intelligence (AI), such as a neural network, the voice (for audio calls or audio portion of an audio/video call) and face may be overlaid over the actual live agent's face and/or voice so the customer is presented with speech, and/or images, of the desired entity."; [0009]: "For speech, the tone and pitch of the voice of agent are also mapped to that of the celebrity's in real time.").
Bloom, Metcalf, Haupt, and Shah are analogous arts because they all belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom, as modified by the music selection method of Metcalf and the speech to song conversion method of Haupt, to incorporate the teachings of Shah to modify both pitch and tone of the user. This allows the user’s voice to be modified to better emulate a specific person’s voice (Shah, [0007]). By better emulating a specific voice, the system’s final output is improved.

Regarding claim 17, the rejection of claim 16 is incorporated. Bloom, in view of Metcalf, Haupt, and Shah, discloses all of the elements of the current invention as stated above. Shah further teaches wherein to change the user voice category to the content creator voice category, the audio manipulation manager is configured to change the user tone to the content creator tone and change the user pitch to the content creator pitch (Shah, [0007]: "Using Artificial Intelligence (AI), such as a neural network, the voice (for audio calls or audio portion of an audio/video call) and face may be overlaid over the actual live agent's face and/or voice so the customer is presented with speech, and/or images, of the desired entity."; [0009]: "For speech, the tone and pitch of the voice of agent are also mapped to that of the celebrity's in real time."). The same motivation for claim 16 applies equally to claim 17.

Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bloom, in view of Metcalf and Haupt, as applied to claims 9, 11-13, 15, and 18-20 above, and further in view of Lindahl.
Regarding claim 14, the rejection of claim 9 is incorporated. Bloom, in view of Metcalf and Haupt, discloses all of the elements of the current invention as stated above. However, Bloom, in view of Metcalf and Haupt, fails to expressly recite detecting an audio manipulation mode by at least one of detecting an audio application is running, or detecting the audio application is requesting the input audio content.
Lindahl teaches detecting an audio manipulation mode by at least one of detecting an audio application is running, or detecting the audio application is requesting the input audio content (Lindahl, [0009]: "To configure the audio beamforming settings, the computing system can detect a predetermined actively running application, such as a dictation application, a speech recognition application, an audio communications application, a video chat application, an audio recording application, or a music playback application.").
Bloom, Metcalf, Haupt, and Lindahl are analogous arts because they all belong to the same field of audio processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the sound modification method and apparatus of Bloom, as modified by the music selection method of Metcalf and the speech to song conversion method of Haupt, to incorporate the teachings of Lindahl to detect an audio manipulation mode. This allows the system to be configured based on a current state of a computing device (Lindahl, [0009]). This helps ensure the system works properly with other running programs.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TYLER J BECKER whose telephone number is (703)756-1271. The examiner can normally be reached M-Th, 7:15am-5:45pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TYLER BECKER/ Examiner, Art Unit 2657

/DANIEL C WASHBURN/ Supervisory Patent Examiner, Art Unit 2657
Prosecution Timeline

Aug 12, 2025
Non-Final Rejection mailed — §103
Jan 12, 2026
Response Filed
Mar 10, 2026
Final Rejection mailed — §103
Apr 30, 2026
Interview Requested
May 06, 2026
Applicant Interview (Telephonic)
May 06, 2026
Examiner Interview Summary
May 11, 2026
Request for Continued Examination
May 12, 2026
Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

18/346,232
Patent 12632657
Joint Speech and Text Streaming Model for ASR
2y 10m to grant Granted May 19, 2026
18/274,767
Patent 12614560
REVERBERATION REMOVAL DEVICE, PARAMETER ESTIMATION DEVICE, REVERBERATION REMOVAL METHOD, PARAMETER ESTIMATION METHOD, AND PROGRAM
2y 9m to grant Granted Apr 28, 2026
18/484,927
Patent 12597433
SPEECH SIGNAL ENHANCEMENT METHOD AND APPARATUS, AND ELECTRONIC DEVICE
2y 5m to grant Granted Apr 07, 2026
18/334,771
Patent 12585893
Full Media Translator
2y 9m to grant Granted Mar 24, 2026
17/692,070
Patent 12518777
SYSTEMS AND METHODS FOR AUTHENTICATION USING SOUND-BASED VOCALIZATION ANALYSIS
3y 10m to grant Granted Jan 06, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

3-4
Expected OA Rounds
75%
Grant Probability
92%
With Interview (+16.5%)
2y 7m (~0m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 20 resolved cases by this examiner. Grant probability derived from career allowance rate.