Prosecution Insights
Last updated: April 19, 2026
Application No. 18/251,248

SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD, AND PROGRAM

Status: Non-Final OA (§103)
Filed: May 01, 2023
Examiner: TENGBUMROONG, NATHAN NARA
Art Unit: 2654
Tech Center: 2600 — Communications
Assignee: Sony Group Corporation
OA Round: 3 (Non-Final)
Grant Probability: 43% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 0m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 43% of resolved cases (6 granted / 14 resolved; -19.1% vs TC avg)
Interview Lift: strong, +75.0% across resolved cases with interview
Typical Timeline: 3y 0m average prosecution; 34 applications currently pending
Career History: 48 total applications across all art units
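A minimal sketch of how the headline metrics above could be computed. The 6-granted / 14-resolved counts come from this page; the with/without-interview rates below are hypothetical placeholders, chosen only to illustrate how a +75% interview lift would be derived.

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved

def interview_lift(rate_with: float, rate_without: float) -> float:
    """Relative improvement of the with-interview allow rate over the
    without-interview allow rate, as a percentage."""
    return 100.0 * (rate_with - rate_without) / rate_without

# 6 granted of 14 resolved (from the page) -> 42.9%, displayed as 43%
career = allow_rate(granted=6, resolved=14)

# Hypothetical per-interview allow rates that would yield a +75% lift
lift = interview_lift(rate_with=70.0, rate_without=40.0)

print(f"career allow rate: {career:.0f}%")
print(f"interview lift: +{lift:.1f}%")
```

The lift is relative, not an absolute difference: 70% vs 40% is +30 points but a +75% lift.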

Statute-Specific Performance

§101: 27.2% (-12.8% vs TC avg)
§103: 54.3% (+14.3% vs TC avg)
§102: 14.8% (-25.2% vs TC avg)
§112: 3.2% (-36.8% vs TC avg)

Deltas are measured against a Tech Center average estimate. Based on career data from 14 resolved cases.
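The per-statute deltas above can be reproduced from the examiner's rates and a Tech Center average. The 40.0% average used below is inferred from the displayed figures (each rate plus its delta works out to 40.0); it is not stated on the page and may differ per statute in the underlying data.

```python
TC_AVG = 40.0  # percent; inferred from the displayed deltas, not stated on the page

# Examiner's per-statute overcome/allowance rates from the chart above
examiner_rates = {"101": 27.2, "103": 54.3, "102": 14.8, "112": 3.2}

for statute, rate in examiner_rates.items():
    delta = rate - TC_AVG
    print(f"§{statute}: {rate:.1f}% ({delta:+.1f}% vs TC avg)")
```

Rounding to one decimal recovers exactly the deltas shown (e.g. §103: +14.3%).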

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10/30/2025 has been entered.

Response to Amendment

Claims 1-4 and 8-10 are amended. Claims 1-10 are presented for examination.

Response to Arguments

Applicant's arguments regarding the rejection under 35 U.S.C. 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6, and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Kitamura et al. (US 20220406325 A1; hereinafter referred to as Kitamura) in view of Disch et al. (US 20170256267 A1; hereinafter referred to as Disch), Koretzky et al. (US 20180122403 A1; hereinafter referred to as Koretzky), and Grauman et al. (US 20210174817 A1; hereinafter referred to as Grauman).

Regarding claim 1, Kitamura teaches: a signal processing device, comprising: circuitry configured to ([0029] the controller 11 is constituted of one or more types of processors, such as CPU (Central Processing Unit), SPU (Sound Processing Unit), DSP (Digital Signal Processor), FPGA (Field Programmable Gate Array), or Application Specific Integrated Circuit (ASIC) and so on): receive a first mixed sound signal ([0007] input data including first sound data, second sound data, and mix sound data), wherein the first mixed sound signal includes a plurality of first sound source signals that are mixed with each other, the plurality of first sound source signals includes a high-frequency component, a frequency of the high-frequency component is greater than a specific frequency… ([0034] The mix sound represented by the audio signal Sx includes components in a frequency band BL and components in a frequency band BH. The frequency band BL and the frequency band BH differ from each other within the whole band BF. The frequency band BL is lower than the frequency band BH. Specifically, the frequency band BL is below a given frequency on the frequency axis within the whole band BF, and the frequency band BH is above the given frequency within the whole band BF. Accordingly, the frequency band BL and the frequency band BH do not overlap. For example, the frequency band BL ranges from 0 kHz to less than 4 kHz, and the frequency band BH ranges from 4 kHz to 8 kHz).
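The BL/BH partition quoted from Kitamura above (a low band below 4 kHz and a high band from 4 kHz up to the 8 kHz full band) can be sketched as a simple FFT-domain split. The 16 kHz sampling rate and test signal below are illustrative assumptions, not taken from the reference.

```python
import numpy as np

FS = 16_000        # sampling rate (Hz), so the full band BF spans 0-8 kHz
SPLIT_HZ = 4_000   # boundary between BL and BH, per the quoted example

# One second containing a low-band (440 Hz) and a high-band (6 kHz) component
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6_000 * t)

spectrum = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / FS)

low = np.where(freqs < SPLIT_HZ, spectrum, 0)    # BL: below 4 kHz
high = np.where(freqs >= SPLIT_HZ, spectrum, 0)  # BH: 4 kHz and above

# The bands do not overlap, and together they reconstruct the mix
assert np.allclose(np.fft.irfft(low + high), x)
```

This mirrors the cited property that BL and BH do not overlap and jointly cover the whole band BF.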
Kitamura does not explicitly teach, but Disch discloses: the first mixed sound signal corresponds to a high-resolution sound source that has a higher sound quality than a sound quality of a non-high resolution sound source ([0079] the spectrum analysis is applied to separate high resolution spectral components 304, 305, 306, 307 (the first set of first spectral portions) from low resolution components represented by the second set of second spectral portions), and the non-high resolution sound source excludes the high-frequency component ([0079] The first encoding processor 600 comprises a time frequency converter 602 for converting the first input audio signal portion into a frequency domain representation having spectral lines up to a maximum frequency of the input signal. Furthermore, the first encoding processor 600 comprises an analyzer 604 for analyzing the frequency domain representation up to the maximum frequency to determine first spectral regions to be encoded with a first spectral representation and to determine second spectral regions to be encoded with a second spectral resolution being lower than the first spectral resolution); apply a downsampling process to the first mixed sound signal ([0048] use a frequency-time transform which additionally performs a very efficient downsampling from the higher output or input sampling rate into the lower time domain core coder sampling rate); generate a second mixed sound signal based on the application of the downsampling process on the first mixed sound signal ([0048] This cross-processor may be applied on the encoder-side and, additionally, on the decoder-side and may use a frequency-time transform which additionally performs a very efficient downsampling from the higher output or input sampling rate into the lower time domain core coder sampling rate by only selecting a certain low band portion of the domain signal together with a certain reduced transform size), wherein the second mixed sound signal includes a plurality of second sound source signals of the non-high resolution sound source… ([0125] the audio decoder additionally comprises the cross-processor 1170 illustrated in FIG. 11B and in FIG. 14B for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding processor so that the second decoding processor is initialized to decode the encoded second audio signal portion following in time the first audio signal portion in the encoded audio signal).

Kitamura and Disch are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Kitamura to combine the teachings of Disch, because doing so would allow for processing and determining of high resolution and non-high resolution sound sources in an input audio in order to improve the quality of sound source separation processing by preserving specific audio details while reducing data size (Disch [0206, 0212] the spectral analyzer or full-band analyzer 604 is configured to analyze the representation generated by the time-frequency converter 602 for determining a first set of first spectral portions to be encoded with the first high spectral resolution and the different second set of second spectral portions to be encoded with a second spectral resolution which is lower than the first spectral resolution and, by means of the spectral analyzer, a first spectral portion 306 is determined, with respect to frequency, between two second spectral portions in FIG. 3 at 307a and 307b… the analyzer is configured to apply a tonal mask processing at least of a portion of the spectral representation so that tonal components and non-tonal components are separated from each other, wherein the first set of the first spectral portions comprises the tonal components and wherein the second set of the second spectral portions comprises the non-tonal components).

The combination of Kitamura and Disch does not explicitly teach, but Koretzky teaches: generate each of a plurality of masks corresponding to each of the plurality of second sound source signals ([0099] The trained DNN may process the input sample and generate an output, via its output layer, representing one or more spectrogram component fragment masks. FIG. 7 illustrates this example workflow for the audio source separation logic 116 for generating a spectrogram component fragment mask for a drums instrument from an original spectrogram fragment used as an input sample), wherein the generation of the each of the plurality of masks is based on a result of the application of the downsampling process on the first mixed sound signal… ([0101] input sample 710 is a fragment of the spectrogram received by audio source separation logic 116 from transform logic 114. In an embodiment, input sample 710 may be taken on a down-sampled version of the original mix spectrogram).

Kitamura, Disch, and Koretzky are considered analogous in the field of audio processing.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Kitamura and Disch to combine the teachings of Koretzky, because doing so would allow for greater efficiency in generating masks for the sound source signals for source separation by using down-sampling on the mixed audio to simplify the audio data while still maintaining audio quality (Koretzky [0101] input sample 710 may be taken on a down-sampled version of the original mix spectrogram. For example, as described earlier, the original sampling rate used by fragmentation logic 112 may be implemented to use a down-sampled sampling rate of 22,050 Hz. Using a down-sampled sampling rate improves computational efficiency of DNN performance by simplifying the input sample received at the input layer of the DNN 720, without compromising quality in a significant way).

The combination of Kitamura, Disch, and Koretzky does not explicitly teach, but Grauman teaches: and a ratio of a sound source signal of the plurality of second sound source signals to a sum of the each of the plurality of second sound source signals ([0075, claim 10] See equation (9); ground-truth spectrogram ratio mask is a ratio between: (1) a magnitude spectrogram of the audio of one of the one or more sets; and (2) a sum of magnitude spectrograms of the audio of the one or more sets); and apply the generated plurality of masks ([0071] The goal of the audio-visual separators 310 is to generate spectrogram masks to independently extract/separate each of the sound signals s.sub.1(t), s.sub.2(t), and s.sub.3(t) from the mixed audio 304 (i.e., x.sub.m(t)).) to the first mixed sound signal ([0115] generating a separated magnitude spectrogram for each detected object using the predicted magnitude spectrogram masks and the magnitude spectrogram (step 612), according to some embodiments. In some embodiments, step 612 is performed by separated spectrogram generator 416. The separated magnitude spectrogram may be generated for each of the detected objects using the spectrogram 308 of the mixed audio 304. For example, the separated spectrogram generator 416 may mask the magnitude spectrogram of the mixed audio or the received audio with the predicted magnitude spectrogram masks of each detected object to generate a separated magnitude spectrogram for each detected object).

Kitamura, Disch, Koretzky, and Grauman are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Kitamura, Disch, and Koretzky to combine the teachings of Grauman, because doing so would improve the accuracy of sound source separation by generating soft spectrogram masks with specific ratios to apply to mixed audio (Grauman [0117] Training the neural network with realistic multi-source video/audio data may improve the separation accuracy of the neural network. Additionally, the systems and methods herein use soft spectrogram masking which can improve the accuracy of the separation. For example, the neural network can be configured to predict the spectrogram masks which may facilitate improved separation accuracy compared to prediction of spectrograms or raw waveforms for source separation. Additionally, computing the losses (e.g., the co-separation loss and the consistency loss) over masks as opposed to spectrograms may facilitate improved learning/training of the neural network).

Regarding claim 6, the combination of Kitamura, Disch, Koretzky, and Grauman teaches: the signal processing device according to claim 1.
Kitamura further teaches: wherein the circuitry is further configured to: separate the first mixed sound signal into the plurality of first sound source signals ([0029] The audio signal Sz is a time-domain signal representative of a sound in which one of the first sound and the second sound is emphasized relative to the other. In other words, the audio processing system 100 performs sound source separation to separate the audio signal Sx into respective sound source); and output the plurality of first sound source signals separated from the first mixed sound signal ([0051] The audio signal Sz is then supplied to the sound outputter 13, thereby being emitted as a sound wave).

Regarding claim 9, it recites similar limitations as claim 1 and is therefore rejected similarly.

Regarding claim 10, Kitamura teaches: a non-transitory computer-readable medium having stored thereon computer-executable instructions that, when executed by a processor, cause the processor to perform a signal processing method ([0104] The functions of the audio processing system 100 are realized, as described above, by cooperation of one or more processors constituting the controller 11 and the programs (P1, P2) stored in the storage device 12. The programs according to the present disclosure may be provided in a form stored in a computer-readable recording medium and installed on a computer. The recording medium is a non-transitory recording medium). The rest of the claim recites similar limitations as claim 1 and is therefore rejected similarly.

Claims 2-3 and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Kitamura in view of Disch, Koretzky, and Grauman, as applied to claims 1, 6, and 9-10 above, and further in view of Herber et al. (US 20160329061 A1; hereinafter referred to as Herber).

Regarding claim 2, the combination of Kitamura, Disch, Koretzky, and Grauman teaches: the signal processing device according to claim 1.
Kitamura further teaches: wherein the circuitry is further configured to: perform a sound source separation process on the first mixed sound signal ([0036] an example is given of sound source separation in the frequency domain. However, the sound source separator 22 may also perform sound source separation in the time domain on the audio signal Sx); separate the first mixed sound signal into the plurality of first sound source signals based on the sound source separation process… ([0029] The audio signal Sz is a time-domain signal representative of a sound in which one of the first sound and the second sound is emphasized relative to the other. In other words, the audio processing system 100 performs sound source separation to separate the audio signal Sx into respective sound source).

The combination of Kitamura, Disch, Koretzky, and Grauman does not explicitly teach, but Herber teaches: apply a frequency band extension process to each of the plurality of first sound source signals separated from the first mixed sound signal ([0040] The Bandwidth Extension module 301 may operate on the input signal (X) to generate signal components, or signal treatments (ST1), above such a predetermined cut-off frequency (Fx)); and generate, based on a result of the frequency band extension process, the each of the plurality of masks corresponding to each of the plurality of second sound source signals ([0049] The Masked Signal Fill module 306 may operate to identify the missing parts of the corresponding sample components of an audio signal, and amplify low-level signal components so that they are just at the threshold of being masked. The Masked Signal Fill module 306 may receive the input signal (X) and apply a perceptual model to determine the "simultaneous masking threshold" for each frequency. The simultaneous masking threshold indicates the level at which the perceptual model determines that the signal component at a certain frequency is masked by the signal components at other frequencies).

Kitamura, Disch, Koretzky, Grauman, and Herber are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Kitamura, Disch, Koretzky, and Grauman to combine the teachings of Herber, because doing so would improve sound source separation by using frequency band extension to determine more details of an audio signal, leading to more accurate masking thresholds for different audio sources in a mixed audio (Herber [0040] a perceptual audio codec may consider all frequency components above a predetermined frequency, such as above 12 kHz, to be less perceptually important and thus discard them. The Bandwidth Extension module 301 may operate on the input signal (X) to generate signal components, or signal treatments (ST1), above such a predetermined cut-off frequency (Fx). The Bandwidth Extension module 301 may analyze the input signal (X) to determine the cut-off frequency (Fx) of the input signal, if one exists. Knowledge of the cut-off frequency (Fx) may be used to guide the generation of a Signal Treatment stream (ST1) with new signal components above the predetermined cut-off frequency (Fx) to compensate for the absence of this characteristic in the corresponding sample components of the audio signal).

Regarding claim 3, the combination of Kitamura, Disch, Koretzky, Grauman, and Herber teaches: the signal processing device according to claim 2.
Herber further teaches: wherein the circuitry is further configured to generate the each of the plurality of masks based on the first mixed sound signal ([0049] The Masked Signal Fill module 306 may receive the input signal (X) and apply a perceptual model to determine the "simultaneous masking threshold" for each frequency. The simultaneous masking threshold indicates the level at which the perceptual model determines that the signal component at a certain frequency is masked by the signal components at other frequencies).

Regarding claim 7, the combination of Kitamura, Disch, Koretzky, Grauman, and Herber teaches: the signal processing device according to claim 2. Kitamura further teaches: wherein the circuitry is further configured to apply the frequency band extension process ([0040] the band extender 23 converts the frequency band of each of the first and second sounds from the frequency band BL to the whole band BF (the frequency band BL and the frequency band BH)) to each of the plurality of first sound source signals separated from the first mixed sound signal ([0035] The sound source separator 22 in FIG. 2 performs sound source separation on the intensity spectrum X(m). Specifically, the sound source separator 22 performs sound source separation of the mix sound of the first sound and the second sound regarding the frequency band BL, to generate an intensity spectrum Y1(m) corresponding to the frequency band BL and an intensity spectrum Y2(m) corresponding to the frequency band BL).

Regarding claim 8, the combination of Kitamura, Disch, Koretzky, Grauman, and Herber teaches: the signal processing device according to claim 2.
Kitamura further teaches: wherein the circuitry is further configured to apply the frequency band extension process to a specific sound source signal of the plurality of first sound source signals separated from the first mixed sound signal ([0099] the band extender 23 generates both the first output data O1(m) representative of the intensity spectrum Z1(m) in which the first sound is emphasized and the second output data O2(m) representative of the intensity spectrum Z2(m) in which the second sound is emphasized. However, the band extender 23 may generate either the first output data O1(m) or the second output data O2(m) as the output data O(m)).

Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Kitamura in view of Disch, Koretzky, and Grauman, as applied to claims 1, 6, and 9-10 above, and further in view of Nakatani et al. (US 20180366135 A1; hereinafter referred to as Nakatani).

Regarding claim 4, the combination of Kitamura, Disch, Koretzky, and Grauman teaches: the signal processing device according to claim 1. The combination of Kitamura, Disch, Koretzky, and Grauman does not explicitly teach, but Nakatani discloses: wherein the circuitry includes a filter ([0079] By applying this multi-channel Wiener filter W.sub.n(f) to the observation feature value vector x(t, f), it is possible to suppress the components of the sound sources other than the target sound source n), and the first mixed sound signal matches with the sum of the each of the plurality of second sound source signals in the filter ([0072] the mask estimation unit 20 receives the observation feature value vector x(t, f) and estimates, for each time-frequency point, as the value of a mask, the proportion of each of the target sound sources mixed with the background noise. Furthermore, as indicated by Equation (30), it is assumed that, at the time-frequency point, the sum total of the masks related to all of the target sound sources and the background noise becomes one).
Kitamura, Disch, Koretzky, Grauman, and Nakatani are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Kitamura, Disch, Koretzky, and Grauman to combine the teachings of Nakatani, because doing so would allow for more accurate separation of different sound sources in a mixed audio signal by using a Wiener filter to determine a target sound source among multiple sound sources, such as background noise (Nakatani [0014] According to the present invention, it is possible to accurately remove the effect of background noise from observation signals and estimate a spatial correlation matrix of target sound sources with high accuracy).

Regarding claim 5, the combination of Kitamura, Disch, Koretzky, Grauman, and Nakatani teaches: the signal processing device according to claim 4. Nakatani further teaches: wherein the filter is a Wiener filter ([0079] By applying this multi-channel Wiener filter W.sub.n(f) to the observation feature value vector x(t, f), it is possible to suppress the components of the sound sources other than the target sound source n and the component of the background noise).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Nathan Tengbumroong, whose telephone number is (703) 756-1725. The examiner can normally be reached Monday - Friday, 11:30 am - 8:00 pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Hai Phan, can be reached at 571-272-6338.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NATHAN TENGBUMROONG/
Examiner, Art Unit 2654

/HAI PHAN/
Supervisory Patent Examiner, Art Unit 2654
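A rough sketch of the mask arithmetic the rejections above rely on: Grauman's spectrogram ratio mask (each source's magnitude spectrogram divided by the sum over all sources) and the property Nakatani is cited for (masks summing to one at every time-frequency point). The toy magnitudes are random, and the sketch assumes magnitudes mix additively, which real mixtures only approximate.

```python
import numpy as np

rng = np.random.default_rng(0)
mags = rng.random((3, 4, 5))  # |S_i| for 3 sources over a 4x5 grid of T-F points

# Ratio mask per source: |S_i| / sum_j |S_j| at each time-frequency point
masks = mags / mags.sum(axis=0, keepdims=True)

# The masks sum to one at every T-F point (Nakatani's Equation (30) property)
assert np.allclose(masks.sum(axis=0), 1.0)

# Applying the masks to the mixture magnitude recovers each source's share
# (exact here only because the toy mixture is the sum of magnitudes)
mixture = mags.sum(axis=0)
separated = masks * mixture
assert np.allclose(separated, mags)
```

In practice the masks would be predicted by a network and applied to the mixture's magnitude spectrogram; the identities above describe the ground-truth (oracle) case.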

Prosecution Timeline

May 01, 2023: Application Filed
Jan 07, 2025: Non-Final Rejection (§103)
May 14, 2025: Response Filed
Jul 29, 2025: Final Rejection (§103)
Oct 30, 2025: Response after Non-Final Action
Oct 30, 2025: Request for Continued Examination
Nov 05, 2025: Response after Non-Final Action
Jan 16, 2026: Non-Final Rejection (§103) [current]

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12530536: Mixture-Of-Expert Approach to Reinforcement Learning-Based Dialogue Management (2y 5m to grant; granted Jan 20, 2026)
Patent 12451142: Non-Wake Word Invocation of an Automated Assistant From Certain Utterances Related to Display Content (2y 5m to grant; granted Oct 21, 2025)
Patent 12412050: Multi-Platform Voice Analysis and Translation (2y 5m to grant; granted Sep 09, 2025)
Study what changed to get past this examiner. Based on 3 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 43%
With Interview: 99% (+75.0%)
Median Time to Grant: 3y 0m
PTA Risk: High

Based on 14 resolved cases by this examiner. Grant probability derived from career allow rate.
