Prosecution Insights
Last updated: April 19, 2026
Application No. 18/839,559

SIGNAL PROCESSING APPARATUS AND SIGNAL PROCESSING METHOD

Status: Non-Final OA (§103)
Filed: Aug 19, 2024
Examiner: GAY, SONIA L
Art Unit: 2657
Tech Center: 2600 — Communications
Assignee: Sony Group Corporation
OA Round: 1 (Non-Final)

Grant Probability: 82% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 0m
Grant Probability With Interview: 93%

Examiner Intelligence

Career Allow Rate: 82% (above average; 701 granted / 855 resolved; +20.0% vs TC avg)
Interview Lift: +11.4% for resolved cases with interview (moderate lift)
Typical Timeline: 3y 0m average prosecution; 33 applications currently pending
Career History: 888 total applications across all art units
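The headline figures above are simple ratios of the numbers shown on this page. As a sanity check, a minimal Python sketch (variable names are ours; the figures 701, 855, and +11.4 percentage points come from the page) reproduces them:

```python
# Figures taken from the dashboard above.
granted = 701
resolved = 855

career_allow_rate = granted / resolved           # ~0.820, shown as 82%
interview_lift = 0.114                           # +11.4 percentage points
with_interview = career_allow_rate + interview_lift  # ~0.934, shown as 93%

print(f"Career allow rate: {career_allow_rate:.1%}")
print(f"With interview:    {with_interview:.1%}")
```

This confirms the 82% and 93% headline values are consistent with the underlying counts.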

Statute-Specific Performance

§101: 10.2% (-29.8% vs TC avg)
§102: 11.9% (-28.1% vs TC avg)
§103: 50.6% (+10.6% vs TC avg)
§112: 13.9% (-26.1% vs TC avg)

Tech Center averages are estimates. Based on career data from 855 resolved cases.
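The "vs TC avg" deltas above imply a Tech Center baseline for each statute. A small sketch (assuming each delta is simply the examiner's rate minus the TC average; all figures are from the table above) recovers those baselines:

```python
# Rates and deltas (percent) from the Statute-Specific Performance table.
examiner_rate = {"101": 10.2, "102": 11.9, "103": 50.6, "112": 13.9}
delta_vs_tc   = {"101": -29.8, "102": -28.1, "103": 10.6, "112": -26.1}

# Implied TC average = examiner rate - delta.
implied_tc_avg = {s: round(examiner_rate[s] - delta_vs_tc[s], 1)
                  for s in examiner_rate}

for statute, avg in implied_tc_avg.items():
    print(f"§{statute}: examiner {examiner_rate[statute]}% vs TC avg ≈ {avg}%")
```

Notably, all four statutes imply the same ≈40% baseline, suggesting the deltas are computed against a single overall Tech Center average rather than per-statute averages.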

Office Action

Non-Final Rejection under 35 U.S.C. §103
DETAILED ACTION

This action is in response to the initial filing of application no. 18/839,559 on 08/19/2024. Claims 1-18 are still pending in this application, with claims 1 and 18 being independent.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: (in claims 1-17) a feature extraction section … to extract a signal, and a feature addition section that applies gain adjustment; (in claims 2 and 3) a channel number conversion section that changes the number of channels …; (in claim 8) a delay processing section that delays output of the signal.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof: hardware programmed with software that implements an algorithm comprising the functions recited by the functional limitations; see Fig. 10 and paragraphs 61-68 of the originally filed specification.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 9, 10, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Vitt et al. (US 11,227,623) (“Vitt”) in view of Souden et al. (US 12,141,347) (“Souden”).
For claims 1 and 18, Vitt discloses a signal processing apparatus and method (Abstract) comprising: a feature extraction section (Fig. 3, 62) to extract a signal of specific sound (speech or ambient) from an input signal (the input signal comprises a signal sensed in a user environment and an audio content source; Fig. 1, 22, 30 and Fig. 3, user content source and 64; column 3 line 35 – column 4 line 4) (column 7 lines 25-34); and a feature addition section (gain controller and combiner, Fig. 3, 66 and 72) that applies gain adjustment to the signal of the specific sound extracted in the feature extraction section (column 7 line 47 – column 8 line 5) and adds a result of the gain adjustment to a signal (user content output from a gain controller, Fig. 3; column 4 lines 1-12) based on the input signal (column 7 lines 49-53).

Yet, Vitt fails to teach that the feature extraction section uses a learning model obtained through machine learning. However, Souden discloses a system and method for processing audio (Abstract), comprising the following: a feature extraction section comprising a machine learning model (Fig. 1, 104 and Fig. 5, 506; column 5 lines 1-53, column 8 lines 28-30) to extract a signal of specific sound (column 8 lines 25-34). Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve Vitt’s invention in the same way that Souden’s invention has been improved to achieve the following predictable results for the purpose of providing efficient audio transparency in headphones playing audio content to indirectly control a user’s speech loudness (column 2 lines 19-54): the feature extraction section further uses a learning model obtained through machine learning to extract the signal of specific sound.

For claim 9, Vitt further discloses wherein the specific sound is sound relating to voice (Vitt, Fig. 3, speech; column 7 lines 27-31).
For claim 10, Vitt and Souden further disclose wherein the specific sound is sound of a specific musical instrument, a sound effect, cheer sound, or noise (Vitt, column 7 lines 27-31) (Souden, column 8 lines 25-34).

For claim 12, Vitt further discloses wherein the feature addition section applies the gain adjustment to the signal of the specific sound with a predetermined fixed setting (Vitt, the gain is applied so that a strength ratio does not exceed a predefined threshold; column 7 lines 47-49; column 8 lines 6-20).

Claims 2-4 are rejected under 35 U.S.C. 103 as being unpatentable over Vitt et al. (US 11,227,623) (“Vitt”) in view of Souden et al. (US 12,141,347) (“Souden”), and further in view of Kim et al. (US 2012/0128160) (“Kim”) and Kim et al. (US 2016/0004499) (“Kim”).

For claim 2, the combination of Vitt and Souden further discloses: a channel number conversion section (Vitt, conversion process anywhere in the system, column 4 lines 1-11) that changes the number of channels of part of the input signal (audio content source) and outputs a result of the change (Vitt, column 3 line 53 – column 4 line 13), wherein the feature addition section adds the signal of the specific sound to the signal output from the channel number conversion section (Vitt, column 7 lines 47-53). Yet, the combination of Vitt and Souden fails to teach the channel number conversion section changing the number of channels for the other part of the input signal (the input signal comprises a signal sensed in a user environment / microphone signals). However, Kim discloses a system and method of processing audio (Abstract), wherein a channel number conversion section (audio analyzer, Fig. 1, 114; [0037-0039]) changes the number of channels for signals sensed by a microphone ([0044] [0046] [0048]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Vitt and Souden in the same way that Kim’s invention has been improved to achieve the following predictable results for the purpose of generating surround sound audio with a limited number of microphones (Kim, [0066]): the channel number conversion section further comprises functionality to change the number of channels for all of the input signal (the input signal comprises a signal sensed in a user environment / microphone signals and an audio content source).

For claim 3, Vitt and Kim further disclose wherein the channel number conversion section uses an upmixing technology to increase the number of channels (Vitt, column 4 lines 1-9) (Kim, [0044] [0048]).

For claim 4, the combination of Vitt and Souden further discloses wherein both the input signal (Vitt, the input signal comprises both a signal sensed in a user environment and the user content source, Fig. 3) and the signal based on the input signal (Vitt, user content output by the gain controller, Fig. 3) are signals on multiple channels (Vitt, column 4 lines 1-12; column 7 lines 49-53). Yet, the combination of Vitt and Souden fails to teach the following: the feature extraction section extracts the signal of the specific sound from each channel signal of the input signals, and the feature addition section applies the gain adjustment to each signal of the specific sound extracted in the feature extraction section and adds each result of the gain adjustment to each channel signal based on the input signal.
However, Kim discloses a system and method of processing audio (Abstract), comprising the following: an input signal (signal sensed in a user environment by microphones) comprises a number of channels (an input signal sensed by microphones is automatically upmixed to a set number of channels; [0044] [0048]); and the input signal is further processed for each channel (for example, noise suppression is applied in a channel-specific manner, wherein a channel-specific manner is broadly and reasonably interpreted to include all channels; [0064]). Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Vitt and Souden in the same way that Kim’s invention has been improved to achieve the following predictable results for the purpose of processing multi-channel audio with audio transparency to indirectly control a user’s speech loudness (Vitt, column 2 lines 19-46): audio processing, including extraction of the signal of the specific sound using the feature extraction section, further comprises extraction from each channel signal of the input signals (the portion of the input signal comprising the signal sensed in the user environment by microphones further comprises channels), and the feature addition section further applies the gain adjustment to each signal of the specific sound extracted in the feature extraction section and adds each result of the gain adjustment to each channel signal based on the input signal.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Vitt et al. (US 11,227,623) (“Vitt”) in view of Souden et al. (US 12,141,347) (“Souden”), and further in view of Al-Naimi et al. (US 2011/0286606) (“Al-Naimi”).

For claim 5, the combination of Vitt and Souden further discloses wherein the feature addition section applies the gain adjustment to the signal of the specific sound (Vitt, column 7 lines 26-53; column 8 lines 6-20).
Yet, the combination of Vitt and Souden fails to teach that the gain adjustment setting changes the clarity of the specific sound. However, Al-Naimi discloses a system and method for cancelling noise (Abstract), wherein a gain applied to a specific sound (wanted speech signal) changes the clarity of the specific sound ([0040] [0044-0051]). Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Vitt and Souden in the same way that Al-Naimi’s invention has been improved to achieve the following predictable results for the purpose of providing efficient audio transparency in headphones playing audio content to indirectly control a user’s speech loudness such that a user is capable of distinguishing the user’s own speech (column 2 lines 19-54): the gain adjustment setting further changes the clarity of the specific sound (speech).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Vitt et al. (US 11,227,623) (“Vitt”) in view of Souden et al. (US 12,141,347) (“Souden”), and further in view of Al-Naimi et al. (US 2011/0286606) (“Al-Naimi”) and Grokop et al. (US 2013/0090926) (“Grokop”).

For claim 6, the combination of Vitt, Souden, and Al-Naimi fails to teach wherein the specific sound is a vocal of music content (Vitt, column 7 lines 25-35) (Souden, column 6 lines 33-38), and the feature addition section reduces a gain as the gain adjustment (Vitt, column 8 lines 1-5). However, Grokop discloses a system and method for detecting speech (Abstract), wherein ambient environment sounds comprise music with vocal content ([0033]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to modify the combined teachings of Vitt, Souden and Al-Naimi with Grokop’s teachings so that the specific sound comprises both a musical content and a vocal of the musical content, for the purpose of providing efficient audio transparency in headphones playing audio content to indirectly control a user’s speech loudness such that a user is capable of distinguishing the user’s own speech (Vitt, column 2 lines 19-54).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Vitt et al. (US 11,227,623) (“Vitt”) in view of Souden et al. (US 12,141,347) (“Souden”), and further in view of Fairey (US 2013/0089208) (“Fairey”).

For claim 7, the combination of Vitt and Souden fails to teach wherein the feature addition section applies the gain adjustment to the signal of the specific sound with such a setting that localization of the specific sound changes. However, Fairey discloses a system and method for enhancing audio content (Abstract), comprising the following: applying gain to a specific sound (audio content, Fig. 11, 1032; [0011] [0073]) with such a setting that localization of the specific sound changes (an audio signal having two or more channels is received; the frequency bands are analyzed for each channel; gains for select frequency bands are increased; an increase of gain in higher frequency bands affects the location information of the specific sound – “Recent research indicates that, at higher frequencies, though, it is contemplated that the brain uses the amplitude of the sound in the ears as the location information source.” Thus, an increase in gain changes the localization of the sound; [0005-0007] [0049] [0050] [0072-0084]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Vitt and Souden in the same way that Fairey’s invention has been improved to achieve the following predictable results for the purpose of providing efficient audio transparency in headphones playing audio content such that a user is capable of distinguishing the user’s own speech using both loudness and spatialization (Vitt, column 2 lines 19-54) (Souden, [0004] [0072]).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Vitt et al. (US 11,227,623) (“Vitt”) in view of Souden et al. (US 12,141,347) (“Souden”), and further in view of Fisher (US 2005/0228647).

For claim 8, the combination of Vitt and Souden fails to teach a delay processing section that delays output of the signal based on the input signal to the feature addition section according to a processing time in the feature extraction section. However, Fisher discloses a system and method for processing audio signals (Abstract), comprising the following: a delay processing section (Fig. 2, 38) that delays output of a signal based on an input signal to a feature addition section (adaptive modifier, Fig. 2, 36) according to a processing time in a feature extraction section (Fig. 2, 26, 28, 30, 32, 34 and 40; [0035-0039]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Vitt and Souden in the same way that Fisher’s invention has been improved to achieve the following predictable results for the purpose of providing efficient audio transparency in headphones playing audio content such that a user is capable of distinguishing the user’s own speech using both loudness and spatialization (Vitt, column 2 lines 19-54): the system further comprises a delay processing section that delays output of the signal based on the input signal to the feature addition section according to a processing time in the feature extraction section.

Claims 11 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Vitt et al. (US 11,227,623) (“Vitt”) in view of Souden et al. (US 12,141,347) (“Souden”), and further in view of Hijazi et al. (US 2022/0392478) (“Hijazi”).

For claim 11, the combination of Vitt and Souden fails to teach wherein the feature extraction section uses a DNN (Deep Neural Network) as the machine learning. However, Hijazi discloses a system and method for performing speech enhancement (Abstract), comprising the following: using a DNN as the machine learning method to perform speech enhancement ([0031-0035] [0041] [0043] [0044]). Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Vitt and Souden in the same way that Hijazi’s invention has been improved to achieve the predictable result of the machine learning used by the feature extraction section (Vitt) (Souden) further being a DNN, for the purpose of providing efficient audio transparency in headphones playing audio content such that a user is capable of distinguishing the user’s own speech using loudness (Vitt, column 2 lines 19-54).
For claim 17, the combination of Vitt and Souden fails to teach wherein the feature addition section applies, as desired, the gain adjustment to the signal of the specific sound according to operation information output from a user interface. However, Hijazi discloses a system and method for performing speech enhancement (Abstract), comprising the following: gain adjustment is applied to a specific signal according to operation information output from a user interface (a user selects an operation mode to process speech input; the speech input is enhanced based on the selected mode; [0019] [0029] [0030] [0043] [0044]). Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Vitt and Souden in the same way that Hijazi’s invention has been improved to achieve the following predictable results for the purpose of providing efficient audio transparency in headphones playing audio content such that a user is capable of distinguishing the user’s own speech using loudness (Vitt, column 2 lines 19-54): the device further receives user input from a user interface; the user interface outputs the operation information, e.g., a mode selection; and the feature addition section applies, as desired, the gain adjustment to the signal of the specific sound according to the operation information output from the user interface.

Claims 13, 14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Vitt et al. (US 11,227,623) (“Vitt”) in view of Souden et al. (US 12,141,347) (“Souden”), and further in view of Yang et al. (US 11,205,440) (“Yang”).

For claim 13, the combination of Vitt and Souden further discloses a sensor device (Vitt, Fig. 6, 158; column 9 lines 1-45).
Yet, the combination of Vitt and Souden fails to teach wherein the feature addition section automatically applies the gain adjustment to the signal of the specific sound according to sensing information output from the sensor device. However, Yang discloses a sound playback system and output sound adjusting method (Abstract), comprising the following: a gain adjustment is applied to a specific signal (speech/voice) according to sensing information output from a sensor device (capturing module of a near-end electronic device, Fig. 1, 10 and Fig. 3, 14; column 2 lines 41-45, column 4 lines 18-27) (Fig. 4, S404-S408; column 5 lines 17-63). Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Vitt and Souden in the same way that Yang’s invention has been improved to achieve the following predictable results for the purpose of providing efficient audio transparency in headphones playing audio content such that a user is capable of distinguishing the user’s own speech using loudness (Vitt, column 2 lines 19-54) (Yang, column 1 lines 15-30): the feature addition section further comprises functionality to receive and process sensing information from the sensor device; and the feature addition section automatically applies the gain adjustment to the signal of the specific sound according to sensing information output from the sensor device.

For claim 14, Vitt and Yang further disclose wherein a camera (Vitt, Fig. 6, 158) (Yang, Fig. 1, 10 and Fig. 3, 14) is included in the sensor device (Vitt, column 9 lines 1-45) (Yang, column 2 lines 41-45, column 4 lines 18-27), and the feature addition section applies the gain adjustment to the signal of the specific sound according to a user age obtained by analyzing an image captured by the camera (Vitt, column 7 lines 47-53) (Yang, Fig. 4, S404-S408; column 5 lines 17-63).
For claim 16, Vitt further discloses wherein a microphone is included in the sensor device (Vitt, Fig. 3, 64 and Fig. 6, 154), and the feature addition section applies the gain adjustment to the signal of the specific sound according to a level of external noise obtained by analyzing collected sound information (Vitt, column 7 lines 47-53; column 8 lines 6-20).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Vitt et al. (US 11,227,623) (“Vitt”) in view of Souden et al. (US 12,141,347) (“Souden”), and further in view of Yang et al. (US 11,205,440) (“Yang”) and Thomas et al. (US 2021/0318850) (“Thomas”).

For claim 15, the combination of Vitt, Souden and Yang further discloses wherein a camera (Vitt, Fig. 6, 158) (Yang, Fig. 1, 10 and Fig. 3, 14) is included in the sensor device (Vitt, column 9 lines 1-45) (Yang, column 2 lines 41-45, column 4 lines 18-27). Yet, the combination of Vitt, Souden and Yang fails to teach wherein the feature addition section applies the gain adjustment to the signal of the specific sound according to a user position obtained by analyzing an image captured by the camera. However, Thomas discloses a device and method for controlling microphone gain (Abstract), comprising the following: capturing images of a user using a camera (Fig. 3, 104) ([0040] [0043] [0044]); and adjusting the gain of a signal based on a user position (proximity) obtained by analyzing the images captured by the camera ([0044-0047]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Vitt, Souden and Yang in the same way that Thomas’s invention has been improved to achieve the following predictable results for the purpose of providing efficient audio transparency in headphones playing audio content such that a user is capable of distinguishing the user’s own speech using loudness (Vitt, column 2 lines 19-54): the feature addition section further applies the gain adjustment to the signal of the specific sound according to a user position obtained by analyzing an image captured by the camera.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SONIA L GAY, whose telephone number is (571) 270-1951. The examiner can normally be reached Monday-Friday, 9-5 ET.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SONIA L GAY/
Primary Examiner, Art Unit 2657

Prosecution Timeline

Aug 19, 2024: Application Filed
Mar 07, 2026: Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602617: DATA MANUFACTURING FRAMEWORKS FOR SYNTHESIZING SYNTHETIC TRAINING DATA TO FACILITATE TRAINING A NATURAL LANGUAGE TO LOGICAL FORM MODEL (2y 5m to grant; granted Apr 14, 2026)
Patent 12602408: STREAMING OF NATURAL LANGUAGE (NL) BASED OUTPUT GENERATED USING A LARGE LANGUAGE MODEL (LLM) TO REDUCE LATENCY IN RENDERING THEREOF (2y 5m to grant; granted Apr 14, 2026)
Patent 12602539: PROACTIVE ASSISTANCE VIA A CASCADE OF LLMS (2y 5m to grant; granted Apr 14, 2026)
Patent 12596708: SYSTEMS AND METHODS FOR AUTOMATED CODE GENERATION FOR CALCULATION BASED ON ASSOCIATED FORMAL SPECIFICATIONS (2y 5m to grant; granted Apr 07, 2026)
Patent 12591604: INTELLIGENT ASSISTANT (2y 5m to grant; granted Mar 31, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 82%
With Interview: 93% (+11.4%)
Median Time to Grant: 3y 0m
PTA Risk: Low

Based on 855 resolved cases by this examiner. Grant probability derived from career allow rate.
