Last updated: April 19, 2026
Application No. 18/034,207
AUDIO PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM

Final Rejection §101 §103
Filed: Apr 27, 2023
Examiner: CHAVEZ, RODRIGO A
Art Unit: 2658
Tech Center: 2600 (Communications)
Assignee: Tencent Music Entertainment Technology (Shenzhen) Co. Ltd.
OA Round: 2 (Final)
This examiner grants 50% of cases after interview (+37.3% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 228 resolved cases, 2023–2026
Examiner Intelligence

CHAVEZ, RODRIGO A
Career Allow Rate: 50% of resolved cases (115 granted / 228 resolved), -11.6% vs TC avg
Interview Lift: +37.3% (strong; resolved cases with interview vs. without)
Typical timeline: 3y 5m avg prosecution, 22 currently pending
Career history: 250 total applications across all art units
Statute-Specific Performance

§101: 16.4% (-23.6% vs TC avg)
§103: 53.1% (+13.1% vs TC avg)
§102: 20.9% (-19.1% vs TC avg)
§112: 5.6% (-34.4% vs TC avg)
Based on career data from 228 resolved cases
Office Action

§101 §103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments, regarding the rejections of claims 1-5, 7, 8, 10-15, 17, and 18 under 35 U.S.C. § 101, filed 02/12/2026, have been fully considered but they are not persuasive. 

The applicant argues:
	“Based on the amended claims 1 and 10, it can be seen that the method for audio processing is performed by a server in a system for audio processing, which is a specific machine related to audio processing, and the claimed method is not directed to a mental step due to performed by the server. 
	Specifically, a target dry audio is obtained by the server, and a beginning and ending time of a lyric word in target dry audio is determined by the server based on a lyrics text. After that, a pitch of the target dry audio and a fundamental frequency within the beginning and ending time are detected by the server, and a current pitch name of the lyric word is determined by the server based on the fundamental frequency and the pitch. Then, the lyric word is pitched up by a first key interval through the server to obtain a first harmony, and the lyric word is pitched up by different second key intervals through the server to obtain different second harmonies. Finally, the first harmony and the second harmonies are synthesized by the server to form a multi-track harmony, and the multi-track harmony is mixed with the target dry audio through the server to obtain a synthesized dry audio. 
 
	Based on the amended claims 1 and 10, a problem that dry audio recorded directly from a user has a poor auditory effect is solved, and the auditory effect of a dry audio is improved. 
Accordingly, claims 1-5, 7, 8, 10-15, 17, and 18 recite a practical application of the technology that solves a practical problem. Thus, these claims are directed to statutory subject matter.”

	Regarding the applicant’s arguments, the examiner respectfully disagrees. The examiner contends that the added language of using a “server” fails to render the claim eligible subject matter under 101. Using a “server” in a system for audio processing constitutes a system of distributed processing, and it is recited at such a high level of generality that it also constitutes a generic computing environment, because simply distributing the processing to a server is a widely known use for a server in the technological environment. Using a “server” for processing of audio data adds nothing more than what a server already typically performs by moving processing load to a remote hardware in a generic way. Additionally, the problem, as noted by the applicant, of “a problem that dry audio recorded directly from a user has a poor auditory effect is solved, and the auditory effect of a dry audio is improved”, is not particularly solved by the use of a “server” in the system to process the audio. The use of a “server”, as claimed, makes no apparent difference to the improvement of the technological environment or the solution to the problem as defined by the applicant. Therefore, the examiner maintains that the claimed language constitutes ineligible subject matter under 101. 

	Regarding the rejection of claim 11 under 35 U.S.C. § 101, in association with the claim being directed to a signal per se, the rejection has been withdrawn in view of the amendment made to include the suggested language of “non-transitory”.

Applicant’s arguments, see Remarks, filed 02/12/2026, with respect to the rejection of claims 1-4, 7, 8, 10-14, 17 and 18 under 35 U.S.C. § 102 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made under 35 U.S.C. § 103 in view of Rutledge and Cook.

Applicant's arguments, regarding the rejections of claims 1-5, 7, 8, 10-15, 17, and 18 under 35 U.S.C. § 103, filed 02/12/2026, have been fully considered but they are not persuasive.
The applicant argues:
	“Cook describes pitch-correction of vocal performance in accord with score-coded harmonies. 
	Yoshioka describes a voice converter with extraction and modification of attribute data. 
	Both Cook and Yoshioka fail to teach or suggest the aforementioned distinguishing technical features recited in the independent claims. 
	Thus, no combination of the cited references teaches or suggest the combination of features recited in the independent claims.” 
	Regarding applicant’s arguments, the examiner respectfully disagrees. The examiner contends that the applicant simply asserts that the combination of the references fails to teach or suggest the technical features without providing any specific reasoning or explanation towards that assertion. Therefore, the examiner contends that both Cook and Yoshioka indeed teach the combination of features as presented in the rejection below.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-5, 7, 8, 10-15, 17 and 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
The Supreme Court has long held that “[l]aws of nature, natural phenomena, and abstract ideas are not patentable.” Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347, 2354 (2014) (quoting Assoc. for Molecular Pathology v. Myriad Genetics, Inc., 133 S. Ct. 2107, 2116 (2013) (internal quotation marks omitted)). The “abstract ideas” category embodies the longstanding rule that an idea, by itself, is not patentable. Alice Corp., 134 S. Ct. at 2355 (quoting Gottschalk v. Benson, 409 U.S. 63, 67 (1972)).
In Alice, the Supreme Court sets forth an analytical “framework for distinguishing patents that claim laws of nature, natural phenomena, and abstract ideas [or mental processes ] from those that claim patent-eligible applications of those concepts.”  Id. at 2355 (citing Mayo Collaborative Servs. v. Prometheus Labs., Inc., 132 S. Ct. 1289, 1296–97 (2012)).  The first step in the analysis is to “determine whether the claims at issue are directed to one of those patent-ineligible concepts.”  Id.  If the claims are directed to a patent-ineligible concept, the second step in the analysis is to consider the elements of the claims “individually and ‘as an ordered combination’” to determine whether there are additional elements that “‘transform the nature of the claim’ into a patent-eligible application.”  Id. (quoting Mayo, 132 S. Ct. at 1298, 1297).  In other words, the second step is to “search for an ‘inventive concept’—i.e., an element or combination of elements that is ‘sufficient to ensure that the patent in practice amounts to significantly more than a patent upon the [ineligible concept] itself’”.  Id. (brackets in original) (quoting Mayo, 132 S. Ct. at 1294).  The prohibition against patenting an abstract idea “‘cannot be circumvented by attempting to limit the use of the formula to a particular technological environment’ or adding ‘insignificant post-solution activity.’”  Bilski v. Kappos, 561 U.S. 593, 610–11 (2010) (citation omitted).

Step 1: This part of the eligibility analysis evaluates whether the claim falls within any statutory category. See MPEP 2106.03. Independent Claim 1 recites the method of obtaining and analyzing target dry audio to detect a pitch and a fundamental frequency of the target dry audio and creating a multi-track harmony by tuning up multiple versions of the target dry audio by specific key intervals, and thus is a process (a series of steps or acts). A process is a statutory category of invention. Independent Claim 10 recites an electronic device comprising a memory and a processor configured to execute a method similar to Claim 1. An electronic device or apparatus is a statutory category of invention. Dependent claims 2-8 and 12-18 are dependent on claims 1 and 10, respectively, and therefore recite their respective statutory classes. Independent claim 11 is non-statutory as covering both transitory and non-transitory media. A more detailed analysis of the claim regarding non-statutory category is provided below after the analysis of the judicial exception.
Step 2A, Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04, subsection II, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim. In applying the framework set out in Alice, the examiner found that Applicant’s claims 1, 10 and 11 are directed to a patent-ineligible abstract concept of obtaining and analyzing target dry audio to detect a pitch and a fundamental frequency of the target dry audio and creating a multi-track harmony by tuning up multiple versions of the target dry audio by specific key intervals. The steps of Applicant’s claims 1-5, 7, 8, 10-15, 17 and 18 are an abstract concept that would fall under the judicial exception of steps performed in the mind or with pen and paper.
Specifically, the claims recite the step of “obtaining a target dry audio, and determining a beginning and ending time of each lyric word in the target dry audio, wherein the lyric word is determined based on a lyrics text.” The claim does not place any limits on how the audio is obtained and how the beginning and end of each lyric word is determined. The obtaining and determining may involve a human, for example a human music producer working in a music studio, listening to another human performing a song via singing with his or her voice, wherein the lyrics text may simply be lyrics written on a sheet of paper. Therefore, this step is directed to a mental step.
Furthermore, the step of “detecting a pitch of the target dry audio and a fundamental frequency during the beginning and ending time, and determining a current pitch name of the lyric word based on the fundamental frequency and the pitch” recites steps that are directed to mental steps. The detecting and determining steps may simply involve a human music producer listening to a sung performance and analyzing the musical notes that are being sung by the performer, the musical notes being a representation of the pitch name of each pitch and fundamental frequency.
Further, the claim recites “tuning up the lyric word by a first key interval to obtain a first harmony, and tuning up the lyric word by different second key intervals respectively to obtain different second harmonies, wherein the first key interval indicates a positive integer number of keys, each of the second key intervals is a sum of the first key interval and a third key interval, and different ones of the second key intervals are determined from different third key intervals, and the first key interval is different form the third key interval by one order of magnitude.” The steps simply provide a basic way of forming harmonies using at least three different notes (fundamental, first harmony and second harmony) from a particular key or scale (first, second and third key intervals). Under the broadest reasonable interpretation, the claim elements are directed to any process of combining multiple musical notes to create a harmony based on a specific key or scale, which is a process that is typically performed by a human such as a music producer that may direct multiple performers, singing different notes of a particular section(s) of a song, to perform harmonies during that/those section(s) of the song. Therefore, the above steps are also directed to mental steps.
The claim further recites “synthesizing the first harmony and the second harmonies to form a multi-track harmony.” The claim does not place any limits on how the synthesizing is performed; therefore, under the broadest reasonable interpretation, the process of “synthesizing” may simply be represented by a group of singers producing sound with their voices by following a written composition tailored to each singer, for producing a harmonic performance. Thus, the recitation is also a mental step.
Finally, the step of “mixing the multi-track harmony with the target dry audio to obtain a synthesized dry audio” may also be, under broadest reasonable interpretation, a representation of the performance of the group of singers singing in unison. Under broadest reasonable interpretation, any person who is listening to a group of performers singing in unison may perceive the performance as one “mixed” sound being heard.

Step 2A, Prong Two: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. This evaluation is performed by (1) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (2) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. See MPEP 2106.04(d). 
	Independent Claim 10 recites: “a memory storing a computer program; and a processor, wherein the processor, when executing the computer program, is configured to cause a server in a system for audio processing to” and independent claim 11 recites: “wherein the computer-readable storage medium stores a computer program, and the computer program” as additional elements beyond the judicial exception. However, these additional elements do not amount to significantly more than the abstract idea because the additional elements constitute a generic computer environment. Using a “server” in a system for audio processing constitutes a system of distributed processing, and it is recited at such a high level of generality that it also constitutes a generic computing environment, because simply distributing processing to a server is a widely known use for a server in the technological environment. Alice, 134 S. Ct. at 2357. The Claims need meaningful limitations that go beyond generally linking the use of an abstract idea to a particular technological environment. Therefore, the steps are all abstract and the Claim as a whole is abstract. “[S]imply appending generic computer functionality to lend speed or efficiency to the performance of an otherwise abstract concept does not meaningfully limit claim scope for purposes of patent eligibility.” CLS Bank, 2013 U.S. App. LEXIS 9493, at *29 (citing Bancorp, 687 F.3d at 1278, and Dealertrack, Inc. v. Huber, 674 F.3d 1315, 1333-34 (Fed. Cir. 2012) (finding that the claimed computer-aided clearinghouse process is a patent-ineligible abstract idea)); SiRF Tech., Inc. v. Int'l Trade Comm'n, 601 F.3d 1319, 1333 (Fed. Cir. 2010) (“In order for the addition of a machine to impose a meaningful limit on the scope of a claim, it must play a significant part in permitting the claimed method to be performed, rather than function solely as an obvious mechanism for permitting a solution to be achieved more quickly, i.e., through the utilization of a computer for performing calculations.”).
	Additionally, dependent claims 2-5, 7, 8, 12-15, 17 and 18 do not provide any additional elements that integrate the judicial exception into a practical application. The claims simply describe steps such as using a pitch classifier, producing a third harmony, determining volumes and delays, adding sound effects and obtaining accompaniment audio signals. The recited elements are elements that may be performed in the mind or using basic devices that may be found in any music studio.

Step 2B:  This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. See MPEP 2106.05.
At Step 2A, Prong Two, the additional elements of “a memory storing a computer program; and a processor, wherein the processor, when executing the computer program, is configured to cause a server in a system for audio processing to” and “wherein the computer-readable storage medium stores a computer program, and the computer program” were found to be insignificant extra-solution activity and a generic computer environment. At Step 2B, the re-evaluation of the insignificant extra-solution activity consideration takes into account whether or not the extra-solution activity is well understood, routine, and conventional in the field. See MPEP 2106.05(g). Here, the subject matter recites mere instructions to apply an exception to be performed by a generic computing environment such as a memory and a processor. Therefore, this limitation remains insignificant extra-solution activity even upon reconsideration and does not amount to significantly more. Even when considered in combination, these additional elements represent mere instructions to apply an exception and insignificant extra-solution activity, and therefore do not provide an inventive concept.
	Additionally, dependent claims 2-5, 7, 8, 12-15, 17 and 18 do not add an inventive concept.

In conclusion, Examiner notes that none of the recited steps in Applicant's claims 1-5, 7, 8, 10-15, 17 and 18 refer to a specific machine by reciting structural limitations of any apparatus or to any specific operations that would cause a machine to be the mechanism to perform these steps. Although the claims may be processed by a computing system having a processor, the computing system is merely a general purpose computing system. Therefore, all of the claims 1-5, 7, 8, 10-15, 17 and 18 are abstract.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7, 8, 10-15, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Rutledge in view of Cook (US PG Pub 20150170636).
	As per claims 1, 10 and 11, Rutledge discloses:
	A method, electronic device and computer readable storage medium comprising:
	a memory storing a computer program (Rutledge; p. 0055 - Read-only memory (ROM) (1424) containing data and programming instructions is also connected to the DSP…; see also p. 0176); and
	a processor, wherein the processor, when executing the computer program (Rutledge; p. 0055 - …A microprocessor (1434) is connected to ROM (1436) and RAM (1426) that contain program instructions and data…; see also p. 0176), is configured to:
	obtain a target dry audio (Rutledge; p. 0056 - the monophonic audio signal representing the melody (e.g., a human voice signal) is passed into a pitch detector (100)), and determine a beginning and ending time of each lyric word in the target dry audio (Rutledge; p. 0056 - This block examines the periodicity in the audio signal and determines a voicing indicator which is set to be TRUE when periodicity is detected in the signal);
	detect a pitch of the target dry audio and a fundamental frequency during the beginning and ending time, and determine a current pitch name of the lyric word based on the fundamental frequency and the pitch (Rutledge; p. 0056-0058 - In the case of voiced signals, the value of the fundamental frequency is also determined…; see also p. 0063 - …any pitch detection method capable of detection of the fundamental frequency in a monophonic source with low delay (typically less than about 40 ms) is suitable…);
	tune up the lyric word by a first key interval to obtain a first harmony (Rutledge; p. 0064 - The harmony shift generator (102) analyzes the polyphonic accompaniment signal in context with the melody pitch information to determine a pitch shift amount relative to the input melody signal that will create a musically correct harmony signal…), and tune up the lyric word by different second key intervals respectively to obtain different second harmonies (Rutledge; p. 0056 - …It will be appreciated by one skilled in the art that the processing described above can be applied to multiple harmony styles in order to create a signal having a lead melody and multiple harmony voices…), wherein
	the first key interval indicates a positive integer number of keys, each of the second key intervals is a sum of the first key interval and a third key interval, and different ones of the second key intervals are determined from different third key intervals, and the first key interval is different form the third key interval by one order of magnitude (Rutledge; p. 0142 - Chords can be completely described in terms of their intervals, the distances from one note to an adjacent (or other) note in the chord. We pick a set of chords (which, depending on the desired harmony style, may be as simple as the major and the minor, or may include more complicated chords like 7th, minor 7th, diminished, 9th etc.) and we analyze them to determine their frequency of usage of each interval. For the simplest set of chords listed above (i.e. major and minor), the intervals between a note and the note above are +3, +4 and +5 (first key interval… positive integer). The intervals between a note and the note 2 above it are +7, +8 and +9 (second key interval… a third key interval above the second key interval)); and
	synthesize the first harmony and the second harmonies to form a multi-track harmony (Rutledge; p. 0018 - In alternative examples, the harmony note generator includes a synthesizer configured to generate the harmony note. In some cases the harmony note is generated substantially in real-time with receipt of the accompaniment signal. In representative examples, the harmony generator is configured to produce the harmony note substantially in real-time with the current melody note, and the digital melody signal is based on a voice signal); and
	mix the multi-track harmony with the target dry audio to obtain a synthesized dry audio (Rutledge; p. 0056 - This output signal is then mixed with the input melody signal by a mixer (106) in order to create a vocal harmony signal).
	Rutledge, however, fails to disclose wherein the processor is configured to cause a server in a system for audio processing, and wherein the lyric word is determined based on a lyrics text.
Cook does teach wherein the processor is configured to cause a server in a system for audio processing (Cook; p. 0109 - Furthermore, in some embodiments, uploaded dry vocals 106 may be pitch corrected and shifted at content server 110 (e.g., based on pitch harmony cues 105, previously described relative to pitch correction and harmony generation at the handheld 101) to afford the desired prominence), and wherein the lyric word is determined based on a lyrics text (Cook; p. 0052 -  Lyrics (text--for Karaoke display), timing information and applicable pitch correction settings may be retrieved for association with the existing backing track using any of a variety of identifiers ascertainable, e.g., from audio metadata, track title, an associated thumbnail or even fingerprinting techniques applied to the audio, if desired).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Rutledge to include wherein the processor is configured to cause a server in a system for audio processing, and wherein the lyric word is determined based on a lyrics text, as taught by Cook, in order to make the synthetic harmonies appear more distinct from the main voice which is pitch corrected to melody. When using only a single channel, all of the harmonized voices can have the tendency to blend with each other and the main voice. By panning, implementations can provide significant psychoacoustic separation. Typically, the desired spatialization can be provided by adjusting amplitude of respective left and right channels (Cook; p. 0085).	
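The interval arithmetic recited in the independent claims can be illustrated with a minimal sketch. It assumes that one "key" corresponds to one equal-tempered semitone, which the action does not state, and all function names and example values are hypothetical. It is not the applicant's or Rutledge's implementation.

```python
def shift_frequency(f0_hz: float, keys: float) -> float:
    """Shift a frequency up by `keys` semitones (assumed 12-tone equal temperament)."""
    return f0_hz * 2.0 ** (keys / 12.0)


def second_key_intervals(first_interval: int, third_intervals: list[float]) -> list[float]:
    """Each second key interval is the first key interval plus one third key interval."""
    if first_interval <= 0:
        raise ValueError("first key interval must be a positive integer")
    return [first_interval + t for t in third_intervals]


def harmony_frequencies(f0_hz: float, first_interval: int, third_intervals: list[float]):
    """Return the first-harmony frequency and the list of second-harmony frequencies."""
    first = shift_frequency(f0_hz, first_interval)
    seconds = [shift_frequency(f0_hz, k)
               for k in second_key_intervals(first_interval, third_intervals)]
    return first, seconds


# Hypothetical values: a first interval of 3 keys and two smaller third intervals.
first, seconds = harmony_frequencies(440.0, 3, [0.1, 0.2])
```

Each distinct third key interval yields a distinct second key interval, which matches the claim's requirement that different second intervals come from different third intervals.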
	As per claims 2 and 12, Rutledge in view of Cook discloses:
	The method and electronic device according to claims 1 and 10, wherein the detecting a pitch of the target dry audio comprises: extracting an audio feature from the target dry audio, wherein the audio feature comprises a fundamental frequency feature and spectral information; and inputting the audio feature to a pitch classifier to obtain the pitch of the target dry audio (Rutledge; p. 0063 - When signal frequencies are converted to note numbers, we use the equal-tempered convention which uses the following formula:
$$n = 69 - 12 \log_2\left(\frac{f_{ref}}{f}\right) \qquad (1)$$
wherein $n$ is a note number, and $f$ is an input frequency in hertz ($f > 27.5$ Hz), and $f_{ref}$ is a reference frequency of note 69 (A above middle C), for example, 440 Hz (classifying pitch to a note number)).

	As per claims 3 and 13, Rutledge in view of Cook discloses:
	The method and electronic device according to claims 1 and 10, wherein after the determining a current pitch name of the lyric word based on the fundamental frequency and the pitch, the method further comprises tuning up the target dry audio by the third key intervals respectively to obtain third harmonies; and the synthesizing the first harmony and the second harmonies to form a multi-track harmony comprises synthesizing the third harmonies, the first harmony, and the second harmonies to form a multi-track harmony (Rutledge; p. 0142 - We pick a set of chords (which, depending on the desired harmony style, may be as simple as the major and the minor, or may include more complicated chords like 7th, minor 7th, diminished, 9th etc.)… these more complicated chords involve third harmony notes).  
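The equal-tempered formula quoted from Rutledge for claims 2 and 12 can be sketched in a few lines. The note-name table and the octave labeling (note 60 as C4) are assumptions added for illustration; only the formula and the 440 Hz reference come from the quoted passage.

```python
import math

# Assumed pitch-class names, indexed by note number modulo 12.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_number(f_hz: float, f_ref_hz: float = 440.0) -> float:
    """Equation (1): n = 69 - 12 * log2(f_ref / f), valid for f > 27.5 Hz."""
    if f_hz <= 27.5:
        raise ValueError("input frequency must exceed 27.5 Hz")
    return 69.0 - 12.0 * math.log2(f_ref_hz / f_hz)


def pitch_name(f_hz: float, f_ref_hz: float = 440.0) -> str:
    """Round to the nearest note number and label it (assumed convention: 60 is C4)."""
    n = round(note_number(f_hz, f_ref_hz))
    return f"{NOTE_NAMES[n % 12]}{n // 12 - 1}"
```

For example, `note_number(440.0)` returns 69.0 and `pitch_name(440.0)` returns `"A4"` under the assumed labeling.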

	As per claims 4 and 14, Rutledge in view of Cook discloses:	The method and electronic device according to claims 3 and 13, wherein the synthesizing the third harmonies, the first harmony, and the second harmonies to form a multi-track harmony comprises: determining volumes and delays of the third harmonies, the first harmony, and the second harmonies, respectively; and synthesizing the third harmonies, the first harmony, and the second harmonies based on the volumes and delays corresponding to the third harmonies, the first harmony, and the second harmonies, to obtain the multi-track harmony (Rutledge; p. 0084-0088 - As shown in FIG. 4, the spectral quality estimator takes in a polyphonic audio mix and produces spectral quality (SQ) Data, consisting of SQ--a scalar giving the spectral quality value PkVal--the SQ value of the last peak (amplitude peak, i.e. volume) found PkDir--the direction (+1,-1) of the last peak found PkDelay--the delay in samples of the last peak found The filter bank (400) consists of a constant Q digital filter bank with passbands centered on the expected location of specific notes… p. 0089 - The envelope follower (402) analyzes each output channel from the filter bank block to estimate the envelope or peak level of the signal (volume); see also p. 0090-0092).  
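Synthesizing harmonies "based on the volumes and delays" can be pictured as a gain-and-offset sum of tracks. The sketch below is one hedged reading of that limitation: tracks are plain lists of samples, delays are whole sample counts, and the function name is hypothetical. It is not drawn from Rutledge or the application.

```python
def mix_tracks(tracks: list[list[float]], gains: list[float], delays: list[int]) -> list[float]:
    """Sum tracks after scaling each by its gain and offsetting it by its delay in samples."""
    if not (len(tracks) == len(gains) == len(delays)):
        raise ValueError("tracks, gains and delays must have the same length")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # The output is long enough to hold the latest-ending delayed track.
    length = max((d + len(t) for t, d in zip(tracks, delays)), default=0)
    out = [0.0] * length
    for track, gain, delay in zip(tracks, gains, delays):
        for i, sample in enumerate(track):
            out[delay + i] += gain * sample
    return out
```

A per-track delay of a few milliseconds is a common way to keep layered voices from sounding like a single source, though the action does not say which values are used.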

	As per claims 5 and 15, Rutledge in view of Cook discloses:
	The method and electronic device according to claims 1 and 10, wherein after the mixing the multi-track harmony with the target dry audio to obtain a synthesized dry audio, the method further comprises: obtaining an accompaniment audio corresponding to the synthesized dry audio, and superimposing, in a preset manner, the accompaniment audio with the synthesized dry audio added with the sound effect, to obtain a synthesized audio (Rutledge; p. 0054 - FIG. 14 is a block diagram of a representative vocal harmony generation system (1402) that receives two input signals a monophonic melody signal (1404) and a polyphonic accompaniment signal (1406). The system (1402) generates left and right components (1408, 1410), respectively, of a stereo output signal containing a mix of the original melody signal and one or more generated harmony signals that are pitch shifted versions of the melody signal where the pitch shift intervals are musically correct within the context of the accompaniment signal).
	Rutledge, however, fails to disclose adding a sound effect to the synthesized dry audio by using a sound effect device.
	Cook does teach adding a sound effect to the synthesized dry audio by using a sound effect device (Cook; p. 0085-0088 - Further effects may be provided in addition to the above-described generation of pitch-shifted harmonies in accord with score codings and the user/vocalists own captured vocals. For example, in some embodiments, a slight pan (i.e., an adjustment to left and right channels to create apparent spatialization) of the harmony voices is employed... For example, in some embodiments, even a coarse spatial resolution pan may be employed, e.g....).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Rutledge to include adding a sound effect to the synthesized dry audio by using a sound effect device, as taught by Cook, in order to make the synthetic harmonies appear more distinct from the main voice which is pitch corrected to melody. When using only a single channel, all of the harmonized voices can have the tendency to blend with each other and the main voice. By panning, implementations can provide significant psychoacoustic separation. Typically, the desired spatialization can be provided by adjusting amplitude of respective left and right channels (Cook; p. 0085).	
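
For illustration only, the amplitude-based panning that Cook describes (adjusting the amplitude of the respective left and right channels) can be sketched as below. This is a minimal sketch, not Cook's implementation: the constant-power pan law, the function name `pan_mono`, and the pan range of -1 (hard left) to +1 (hard right) are assumptions chosen for the example.

```python
import math


def pan_mono(samples, pan):
    """Split a mono signal into (left, right) using a constant-power pan law.

    Assumptions (hypothetical): pan is in [-1.0, 1.0], where -1.0 is hard
    left, 0.0 is centre, and +1.0 is hard right.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")

    # Map pan to an angle in [0, pi/2]; cos/sin keep left^2 + right^2 == 1.
    angle = (pan + 1.0) * math.pi / 4.0
    left_gain = math.cos(angle)
    right_gain = math.sin(angle)

    left = [left_gain * s for s in samples]
    right = [right_gain * s for s in samples]
    return left, right


# Place one harmony voice slightly to the left of centre.
left, right = pan_mono([1.0, 0.5, -0.5], pan=-0.3)
```

Giving each harmony voice a different `pan` value separates the voices across the stereo field, which is the psychoacoustic separation the cited passage refers to.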
	As per claims 7 and 17, Rutledge in view of Cook discloses:	The method and electronic device according to claims 1 and 10, wherein the tuning up the lyric word by a first key interval to obtain a first harmony, and tuning up the lyric word by different second key intervals respectively to obtain different second harmonies comprises: determining a preset pitch name interval, and tuning up the lyric word by the preset pitch name interval to obtain the first harmony, wherein adjacent pitch names are different from each other by one or two first key intervals; and tuning up the first harmony by the third key intervals respectively to obtain the second harmonies (Rutledge; p. 00171-0174 – compute harmony note from key and scale… where the harmony note is the pitch name, and the key and scale define the pitch name intervals. The key and scale provide a template of the notes (pitch names) that are to be used starting from a particular fundamental frequency; see also p. 0142 - Chords can be completely described in terms of their intervals, the distances from one note to an adjacent (or other) note in the chord. We pick a set of chords (which, depending on the desired harmony style, may be as simple as the major and the minor, or may include more complicated chords like 7th, minor 7th, diminished, 9th etc.) and we analyze them to determine their frequency of usage of each interval. For the simplest set of chords listed above (i.e. major and minor), the intervals between a note and the note above are +3, +4 and +5 (first key interval… positive integer). The intervals between a note and the note 2 above it are +7, +8 and +9).
 
	As per claims 8 and 18, Rutledge in view of Cook discloses:	The method and electronic device according to claims 7 and 17, wherein the tuning up the lyric word by the preset pitch name interval to obtain the first harmony comprises: determining, based on the current pitch name and the preset pitch name interval, a target pitch name of the lyric word after tuned up by the preset pitch name interval; determining a quantity of the first key intervals corresponding to the lyric word based on a key interval between the target pitch name of the lyric word and the current pitch name of the lyric word; and tuning up the lyric word by the quantity of the first key intervals to obtain the first harmony (Rutledge; p. 00171-0174 – compute harmony note from key and scale… where the harmony note is the pitch name, and the key and scale define the pitch name intervals. The key and scale provide a template of the notes (pitch names) that are to be used starting from a particular fundamental frequency; see also p. 0142 - Chords can be completely described in terms of their intervals, the distances from one note to an adjacent (or other) note in the chord. We pick a set of chords (which, depending on the desired harmony style, may be as simple as the major and the minor, or may include more complicated chords like 7th, minor 7th, diminished, 9th etc.) and we analyze them to determine their frequency of usage of each interval. For the simplest set of chords listed above (i.e. major and minor), the intervals between a note and the note above are +3, +4 and +5 (first key interval… positive integer). The intervals between a note and the note 2 above it are +7, +8 and +9).
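
For illustration only, the steps recited in claims 8 and 18 (find the target pitch name, count the key intervals between the current and target pitch names, then shift by that count) can be sketched as below. This is a minimal sketch under assumptions that the record does not confirm: pitch names are treated as the seven degrees of a major scale, one "first key interval" is treated as one semitone in twelve-tone equal temperament, and the names `MAJOR_SCALE_SEMITONES`, `semitones_for_degree_shift`, and `shifted_frequency` are hypothetical.

```python
# Semitone offset of each major-scale degree from the tonic (assumption:
# adjacent pitch names are one or two semitones apart, as in a major scale).
MAJOR_SCALE_SEMITONES = [0, 2, 4, 5, 7, 9, 11]


def semitones_for_degree_shift(current_degree, degree_interval):
    """Count semitones between a scale degree and the degree N steps above.

    current_degree: 0-6, position of the current pitch name in the scale.
    degree_interval: preset pitch name interval, in scale steps (>= 0).
    """
    if not 0 <= current_degree <= 6:
        raise ValueError("current_degree must be between 0 and 6")
    if degree_interval < 0:
        raise ValueError("degree_interval must be non-negative")

    # Wrap past the top of the scale into the next octave(s).
    octaves, target_degree = divmod(current_degree + degree_interval, 7)
    return (
        12 * octaves
        + MAJOR_SCALE_SEMITONES[target_degree]
        - MAJOR_SCALE_SEMITONES[current_degree]
    )


def shifted_frequency(f0_hz, semitones):
    """Frequency after raising f0_hz by a number of equal-tempered semitones."""
    return f0_hz * 2 ** (semitones / 12)


# A third above the tonic is 4 semitones; a third above the 3rd degree is 3.
shift_from_tonic = semitones_for_degree_shift(0, 2)
shift_from_third = semitones_for_degree_shift(2, 2)
```

The same pitch name interval therefore maps to a different number of semitones depending on the current pitch name, which is why the claim determines the quantity of key intervals per lyric word.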

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Rutledge in view of Cook and further in view of Yoshioka (US PG Pub 20030055647).

	As per claims 6 and 16, Rutledge in view of Cook discloses:	The method and electronic device according to claims 5 and 15, of superimposing the accompaniment audio with the synthesized dry audio added with the sound effect in a preset manner to obtain a synthesized audio.	Rutledge in view of Cook, however, fails to disclose performing a power normalization on the accompaniment audio to obtain an intermediate accompaniment audio, and performing a power normalization on the synthesized dry audio added with the sound effect to obtain an intermediate dry audio; and superimposing, based on a preset energy ratio, the intermediate accompaniment audio with the intermediate dry audio, to obtain the synthesized audio.	Yoshioka does teach performing a power normalization on the accompaniment audio to obtain an intermediate accompaniment audio, and performing a power normalization on the synthesized dry audio added with the sound effect to obtain an intermediate dry audio; and superimposing, based on a preset energy ratio, the intermediate accompaniment audio with the intermediate dry audio, to obtain the synthesized audio (Yoshioka; p. 0136-0137 - Operation of Amplitude Normalizer: Then, each amplitude An is normalized by the mean amplitude Ame according to the following relation in an amplitude normalizer 15 to obtain normalized amplitude A'n: A'n=An/Ame (preset energy ratio)).  	
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Rutledge and Cook to include performing a power normalization on the accompaniment audio to obtain an intermediate accompaniment audio, and performing a power normalization on the synthesized dry audio added with the sound effect to obtain an intermediate dry audio; and superimposing, based on a preset energy ratio, the intermediate accompaniment audio with the intermediate dry audio, to obtain the synthesized audio, as taught by Yoshioka, in order to provide a voice converting apparatus and a voice converting method that allow voice conversion without losing naturalness of the voice (Yoshioka; p. 0016).
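
For illustration only, the limitation recited in claims 6 and 16 (power-normalize both signals, then superimpose them at a preset energy ratio) can be sketched as below. This is a minimal sketch and not the method of the application or of Yoshioka: normalizing to unit RMS, defining the energy ratio as vocal power divided by accompaniment power, and the names `normalize_power` and `superimpose` are all assumptions made for the example.

```python
import math


def normalize_power(samples, target_rms=1.0):
    """Scale a signal so its root-mean-square level equals target_rms."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # silent input: nothing to scale
    return [s * target_rms / rms for s in samples]


def superimpose(accompaniment, vocal, energy_ratio):
    """Mix two power-normalized signals at a preset energy ratio.

    Assumption (hypothetical): energy_ratio is vocal power divided by
    accompaniment power, so the vocal amplitude gain is its square root.
    """
    if energy_ratio < 0:
        raise ValueError("energy_ratio must be non-negative")
    acc = normalize_power(accompaniment)
    voc = normalize_power(vocal)
    gain = math.sqrt(energy_ratio)

    # Zero-pad the shorter signal so both cover the full mix length.
    n = max(len(acc), len(voc))
    acc = acc + [0.0] * (n - len(acc))
    voc = voc + [0.0] * (n - len(voc))
    return [a + gain * v for a, v in zip(acc, voc)]


# The vocal carries four times the accompaniment's power in the mix.
mixed = superimpose([2.0, -2.0], [3.0, 3.0], energy_ratio=4.0)
```

Normalizing first means the preset ratio alone sets the balance, regardless of how loud either recording was originally.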

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art made of record and not relied upon includes:
	Kleinberger (US PG Pub 20210050029) discloses a feedback system may play back, to a user, an altered version of the user's voice in real time, in order to reduce stuttering by the user. The system may operate in different feedback modes at different times. For instance, the system may detect when the severity of a user's stuttering increases, which is indicative of the user habituating to the current feedback mode. The system may then switch to a different feedback mode. In some cases, the feedback modes include at least a Whisper mode, a Reverb mode, and a Harmony mode. In Whisper mode, the user's voice may be transformed to sound as if it were whispering in the user's ears. In Harmony mode, the user's voice may be altered as if the user were harmonizing with himself or herself. In Reverb mode, the user's voice may be altered so that it reverberates.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139. The examiner can normally be reached Monday - Friday 9-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RODRIGO A CHAVEZ/Examiner, Art Unit 2658


/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658
Prosecution Timeline

Apr 27, 2023: Application Filed
Nov 10, 2025: Non-Final Rejection (§101, §103)
Feb 12, 2026: Response Filed
Mar 25, 2026: Final Rejection (§101, §103), current
Precedent Cases

Applications granted by this same examiner with similar technology

18/175,355 (Patent 12597430): MULTI-CHANNEL SIGNAL GENERATOR, AUDIO ENCODER AND RELATED METHODS RELYING ON A MIXING NOISE SIGNAL. 2y 5m to grant; granted Apr 07, 2026.
17/579,750 (Patent 12579984): DATA AUGMENTATION SYSTEM AND METHOD FOR MULTI-MICROPHONE SYSTEMS. 2y 5m to grant; granted Mar 17, 2026.
17/513,419 (Patent 12541653): ENTERPRISE COGNITIVE SOLUTIONS LOCK-IN AVOIDANCE. 2y 5m to grant; granted Feb 03, 2026.
17/532,315 (Patent 12542136): DYNAMICALLY CONFIGURING A WARM WORD BUTTON WITH ASSISTANT COMMANDS. 2y 5m to grant; granted Feb 03, 2026.
17/450,015 (Patent 12531077): METHOD AND APPARATUS IN AUDIO PROCESSING. 2y 5m to grant; granted Jan 20, 2026.
Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 50%
With Interview (+37.3%): 88%
Median Time to Grant: 3y 5m
PTA Risk: Moderate
Based on 228 resolved cases by this examiner. Grant probability derived from career allow rate.