Last updated: May 29, 2026

Application No. 18/068,559

APPARATUS AND METHOD FOR AUDIO DATA ANALYSIS

Final Rejection §103

Filed

Dec 20, 2022

Priority

Jan 11, 2022 — GB 2200274.5

Examiner

KRZYSTAN, ALEXANDER J

Art Unit

2694

Tech Center

2600 — Communications

Assignee

Sony Interactive Entertainment Inc.

OA Round

5 (Final)

Interview Optional

Interview lift: +7.3%. An interview has already been conducted in this application's prosecution history. This examiner has an 81% grant rate with a +7.3% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.

Based on 1123 resolved cases, 2023–2026

Examiner Intelligence

KRZYSTAN, ALEXANDER J

Grants 81% — above average

Career Allowance Rate

914 granted / 1123 resolved

+19.4% vs TC avg

Interview Lift: +7.3% (moderate), resolved cases with interview versus without

Typical timeline

2y 12m

Avg Prosecution

32 currently pending

Career history

1161

Total Applications

across all art units

Statute-Specific Performance

§101

0.4%

-39.6% vs TC avg

§103

72.9%

+32.9% vs TC avg

§102

9.4%

-30.6% vs TC avg

§112

2.9%

-37.1% vs TC avg

Tech Center averages are estimates. Based on career data from 1123 resolved cases.

Office Action

§103

DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Examiner’s Comments

The recited limitations in claims 21-23 comprise many nonsensical combinations and are read as respectively mapped, alternative, non-negative recitations of the non-combined, clearly mappable, distinctly recited elements.
The comparing steps recited throughout the claims are read as per respective compared elements.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6-16, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Younessian (US 20220130408 A1) and further in view of Ramakrishnan (US 20210407493 A1).

As per claim 1, Younessian (US 20220130408 A1) discloses a data processing (which requires one or more processors, memory and software in order to implement the cited functions) apparatus, comprising: 
storage circuitry (150 in fig. 1 as implemented with the devices shown in fig. 6) to store a plurality of sound recordings (requiring a storage device); 
receiving circuitry (the circuitry to implement the inputs to 140 and 150 in fig. 1) to receive audio input data/input data being a digital representation indicative of one or more sounds detected by a microphone (the input devices to the device 601 comprise microphones per para. 96, and further the audio content is in digital form as used by computer 601); 
selection circuitry to perform an audio analysis of the input data to obtain one or more audio properties (120, 115, 130 in fig. 1 performing analysis on inputs to produce outputs) of the input data;

perform an audio analysis of at least a portion of a sound recording of the plurality of sound recordings to determine one or more audio properties of the sound recording of the plurality of sound recordings (the analysis required to obtain ‘auditory events described and/or include in an auditory event repository 150’ and/or the distribution of visual elements 124 per para. 46: and/or the distribution of textual elements 115, and one or more auditory events, such as auditory events stored in an auditory event repository 150. For example, the correlation unit 140 may receive the distribution of visual elements 124 and compare elements of the distribution of visual elements 124 to stored/predefined auditory events described and/or include in an auditory event repository 150);

compare the one or more audio properties of the input data with one or more audio properties of each sound recording of the plurality of sound recordings (block 140 compares input properties to stored auditory events; para. 46: and/or the distribution of textual elements 115, and one or more auditory events, such as auditory events stored in an auditory event repository 150. For example, the correlation unit 140 may receive the distribution of visual elements 124 and compare elements of the distribution of visual elements 124 to stored/predefined auditory events described and/or include in an auditory event repository 150); and


select, from the plurality of sounds recordings, one or more candidate sound recordings (the correlation unit 140 may determine that because the visual element “fire” from the distribution of visual elements 124 is lexically the same (e.g., the same word/spelling, etc.) as an auditory event/label “fire” within the auditory event repository 150, that the visual element “fire” is a candidate auditory event); 
where the above cited correlation is by definition in dependence upon a difference between at least one of the one or more audio properties of the candidate sound recording and corresponding at least one of the one or more audio properties of the input data;
and 
output circuitry to output data in dependence upon one or more of the candidate sound recordings (the circuitry to perform the function per para. 24: one or more candidate audio events in an audio event repository that may be used to supplement the content item, for example, by inserting an audio event to supplement for a missing audio event or by replacing an existing audio event with an audio event that is more relevant to the content item or to that is louder or quieter than the existing audio event).
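
The compare-and-select limitations mapped above can be sketched in code. This is a minimal illustration under stated assumptions, not the applicant's implementation and not the method of Younessian or Ramakrishnan: the two audio properties (RMS level and zero-crossing rate), the function names, the synthetic sine-wave recordings and the difference limit are all hypothetical choices made for the example.

```python
import math

def audio_properties(samples, sample_rate):
    """Return two simple properties: RMS level and zero-crossing rate (crossings per second)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    zcr = crossings * sample_rate / len(samples)
    return {"rms": rms, "zcr": zcr}

def select_candidates(input_props, library_props, key, max_difference):
    """Select recordings whose `key` property differs from the input's by at most `max_difference`."""
    return [
        name
        for name, props in library_props.items()
        if abs(props[key] - input_props[key]) <= max_difference
    ]

def sine(freq_hz, sample_rate=8000, seconds=0.5, amplitude=1.0):
    """Synthetic stand-in for a stored recording or a microphone capture."""
    n = int(sample_rate * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

SAMPLE_RATE = 8000
library = {                      # hypothetical stored sound recordings
    "low_hum": sine(100),
    "mid_tone": sine(440),
    "high_beep": sine(2000),
}
library_props = {name: audio_properties(s, SAMPLE_RATE) for name, s in library.items()}
input_props = audio_properties(sine(450), SAMPLE_RATE)   # hypothetical input data

# Selection depends on the difference between corresponding properties.
candidates = select_candidates(input_props, library_props, key="zcr", max_difference=100.0)
print(candidates)
```

In this sketch the selection depends directly on a numeric difference between corresponding audio properties, which is the distinction the claim language draws; the lexical label matching quoted from para. 46 is a different kind of comparison.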

However, Younessian does not specify that the input audio signal is detected in realtime by a microphone.
Ramakrishnan discloses a machine learning/data clustering based audio classification system for a computer device/user audio device (abstract), and teaches that the audio input can be voice via a microphone either via headset per para 3, or directly attached to the computer per para 65.  The application is a telecommunications application hence the audio is received and processed in realtime (para 3).  Ramakrishnan teaches that this improves quality per para 3-5 during a telecommunications/realtime session.  It would have been obvious to one skilled in the art at the time of filing that the device of Younessian could implement the interface via a microphone as taught by Ramakrishnan for the purpose of improving the quality of a communication.

As per claim 2, the data processing apparatus according to claim 1, wherein the input data/audio input data is indicative of a speech-based input by a user, wherein the speech-based input comprises at least one of a spoken word and a non-linguistic vocalisation by the user (para. 32: An auditory event may include a sound, a plurality of sounds, a sound effect, a voice).

As per claim 3, the data processing apparatus according to claim 1, wherein the input data is indicative of a non-speech based input by a user, wherein the non-speech based input comprises one or more sounds associated with one or more objects (para. 32: An auditory event may include a sound, a plurality of sounds, a sound effect, a voice).

As per claim 4 (cancelled), the data processing apparatus according to claim 1, wherein the selection circuitry is configured to select a candidate sound recording in dependence upon a degree of match between the candidate sound recording and the input data (para. 28: Elements of the combined distribution of media elements may be compared to one or more auditory events within the auditory event repository to identify candidate auditory events. Word embedding, ontology learning, syntax analysis, natural language processing, and/or the like may be used to identify media elements (e.g., indications/descriptions/labels of objects, actions, scenes, events, etc.) of a distribution of media elements that satisfy a correlation (similarity) threshold between one or more auditory events within an auditory event repository).

As per claim 5 (cancelled), the data processing apparatus according to claim 4, wherein the selection circuitry is configured to select the candidate sound recording in dependence upon a difference between an audio property of the candidate sound recording and a corresponding audio property of the input data (the similarity cited in the claim 4 rejection is by definition in dependence upon a difference between an audio property of the candidate sound recording and a corresponding audio property of the input data).

As per claim 6, the data processing apparatus according to claim 5, comprising first modifying circuitry to modify the audio property of the candidate sound recording (para. 60: The augmentation unit 160 may use a sound (e.g., raw waveform, sound clip, an audio file, etc.) from an auditory event repository 150 (indicated by the distribution of candidate auditory events 126) to augment/enhance the audio level of the waveform 161.) in dependence upon the corresponding audio property of the input data when the difference between the audio property of the candidate sound recording and the corresponding audio property of the input data is greater than a threshold amount (para. 60: the first portion of the audio content 102 is low (e.g., does not satisfy an audio level threshold, amplitude, etc.), where that threshold level also defines the difference between the audio content and the candidate audio from the audio repository per para. 60, which is not low).
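
The conditional modification recited in claim 6 can be illustrated with a short sketch. It assumes, purely for illustration, that the audio property is RMS level and that the modification is a gain change; the function names and the threshold value are hypothetical and are not taken from the application or from Younessian.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_level_if_needed(candidate, input_samples, threshold):
    """Scale `candidate` to the input's RMS level only when the level difference exceeds `threshold`."""
    cand_level = rms(candidate)
    target_level = rms(input_samples)
    if cand_level == 0 or abs(cand_level - target_level) <= threshold:
        return list(candidate)          # difference small enough: leave unmodified
    gain = target_level / cand_level    # modify in dependence upon the input's property
    return [s * gain for s in candidate]

quiet_candidate = [0.1, -0.1, 0.1, -0.1]
loud_input = [0.8, -0.8, 0.8, -0.8]
similar_input = [0.15, -0.15, 0.15, -0.15]

adjusted = match_level_if_needed(quiet_candidate, loud_input, threshold=0.2)
unchanged = match_level_if_needed(quiet_candidate, similar_input, threshold=0.2)
```

Here the candidate is rescaled only in the first call, where the level difference (about 0.7) exceeds the threshold of 0.2; in the second call the difference (about 0.05) does not, so the candidate is returned as-is.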

As per claim 7, the data processing apparatus according to claim 1, wherein the selection circuitry is configured to generate text data in dependence upon the input data and to select the candidate sound recording in dependence upon a comparison of the text data with metadata associated with the candidate sound recording (para. 85: A media element may be determined to be a textual description of the sound of a police siren, and based on the media element, a candidate auditory event may be determined to be a sound of a police siren) (where the determination is in dependence upon comparing the textual description with the processor based memory address/metadata associated with the candidate auditory event).
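
A rough sketch of the text-to-metadata comparison in claim 7 follows. It assumes the text data has already been generated from the input (for example by a speech recognizer, which is not shown), and that metadata is a list of word labels per recording; the file names, labels and function name are hypothetical.

```python
def select_by_text(text, metadata_by_recording):
    """Return recordings whose metadata labels share at least one word with `text`."""
    words = set(text.lower().split())
    return [
        name
        for name, labels in metadata_by_recording.items()
        if words & {label.lower() for label in labels}
    ]

# Hypothetical metadata associated with stored sound recordings.
metadata = {
    "siren_01.wav": ["police", "siren"],
    "fire_02.wav": ["fire", "crackle"],
    "dog_03.wav": ["dog", "bark"],
}

# Text data assumed to have been generated from the user's input.
matches = select_by_text("a police siren", metadata)
```

Exact word overlap is the simplest possible comparison; the word embedding or ontology-based matching mentioned in para. 28 of Younessian would replace the set intersection with a similarity score.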

As per claim 8, the data processing apparatus according to claim 7, wherein the metadata associated with the candidate sound recording is determined, using a machine learning model, in dependence upon one or more audio properties for the candidate sound recording (the determination of the matching candidate audio is based on machine learning per the machine learning based functions described in para. 38).

As per claim 9, the data processing apparatus according to claim 1, wherein the output circuitry is configured to output data for at least a first candidate sound recording and a second candidate sound recording (the system as per the claim 1 rejection, running continuously will process a first detected input data, and then a second detected input data over time and also in parallel as shown in fig. 1 either of which would comprise the second candidate sound recording).

As per claim 10, the data processing apparatus according to claim 1, comprising mixing circuitry to mix two or more of the candidate sounds recordings to obtain a combined sound recording, wherein the output circuitry is configured to output data for the combined sound recording (para. 28: the combined distribution of media elements which can include multiple candidate audio, to produce a distribution of media elements that satisfy a correlation (similarity) threshold between one or more auditory events within an auditory event repository).
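
For reference, mixing two or more recordings into a combined recording, as claim 10 recites, can be as simple as a sample-by-sample average. The sketch below assumes all recordings share one sample rate and are held as lists of float samples; the function name is hypothetical, and the sketch illustrates the claim language rather than the combined distribution of media elements cited from para. 28.

```python
def mix(recordings):
    """Mix equal-rate recordings by averaging sample-by-sample; shorter ones are zero-padded."""
    length = max(len(r) for r in recordings)
    mixed = []
    for i in range(length):
        total = sum(r[i] if i < len(r) else 0.0 for r in recordings)
        mixed.append(total / len(recordings))
    return mixed

# Two hypothetical candidate sound recordings of different lengths.
combined = mix([[0.5, 0.5, 0.5], [1.0, -1.0]])
```

Averaging keeps the combined signal within the original amplitude range; a production mixer would more likely apply per-track gains and a limiter.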

As per claim 11, the data processing apparatus according to claim 1, comprising second modifying circuitry to modify a candidate sound recording, wherein the receiving circuitry is configured to receive second input data in response to the data output by the output circuitry (the system of the claim 1 rejection and associated processor process a series of input data over time, and simultaneously in parallel, including a first and second input data, where each process is dependent upon a common clocking signal and set of registers to perform the disclosed functions in synchrony, where the input of a second input data that is processed must be in response to the data output by output circuitry that outputs the output based on the first candidate sound recording in order for the processor to process the second input data in the same manner as that applied to the first input data).

and wherein the second modifying circuitry is configured to modify the candidate sound recording in dependence upon the second input data (analogous to the function per the claim 6 rejection).

As per claim 12, the data processing apparatus according to claim 11, wherein the second input data is indicative of at least one of a speech-based input and a controller input by a user for indicating one or more modifications to be applied to the candidate sound recording (the input data, including first and second can be based on speech as per the claim 2 rejection).

As per claim 13, the data processing apparatus according to claim 1, wherein at least some of the plurality of sound recordings comprise a respective sound effect (the candidates can comprise respective sound effects as shown in fig. 1, siren, fire, dog).

As per claim 14, the data processing apparatus according to claim 1, wherein at least some of the plurality of sound recordings are included in a database for a respective video game (para. 90: the device can comprise a gameport, by which it would be a videogame machine where the video cited in para. 20 is then a videogame, which makes the set of candidate recordings a database for a respective videogame).

As per claim 15, a data processing method comprising: storing a plurality of sound recordings; receiving input data indicative of one or more sounds detected in realtime by a microphone; performing an audio analysis of at least a portion of the input data to determine obtain one or more audio properties of the input data; performing an audio analysis of at least a portion of a sound recording to determine one or more audio properties of the sound recording of the plurality of sound recordings; comparing the one or more audio properties of the input data with the one or more audio properties of [[each]] the sound recording of the plurality of sound recordings; selecting, from the plurality of sounds recordings, one or more candidate sound recordings in dependence upon a result of said comparing a difference between at least one of the one or more audio properties of the candidate sound recording and corresponding at least one of the one or more audio properties of the input data; and outputting data in dependence upon the one or more of the candidate sound recordings (per the respective circuitry of the claim 1 rejection).

As per claim 16, a non-transitory, computer readable storage medium containing computer software which, when executed by a computer, causes the computer to carry out a data processing method, comprising: storing a plurality of sound recordings; receiving input data indicative of one or more sounds detected in realtime by a microphone; performing an audio analysis of at least a portion of the input data to obtain determine one or more audio properties of the input data; performing an audio analysis of at least a portion of a sound recording to determine one or more audio properties of the sound recording of the plurality of sound recordings; comparing the one or more audio properties of the input data with the one or more audio properties of [[each]] the sound recording of the plurality of sound recordings; selecting, from the plurality of sounds recordings, one or more candidate sound recordings in dependence upon a result of said comparing a difference between at least one of the one or more audio properties of the candidate sound recording and corresponding at least one of the one or more audio properties of the input data; and outputting data in dependence upon the one or more of the candidate sound recordings (as required by the system per the claim 1 rejection).

As per claim 17 (cancelled), the data processing method according to claim 15, wherein said selecting comprises selecting the candidate sound recording in dependence upon a degree of match between the candidate sound recording and the input data, and further in dependence upon a difference between an audio property of the candidate sound recording and a corresponding audio property of the input data (the correlation/comparison cited above by definition comprises selecting based on a degree of match and further upon the difference between audio properties).  
As per claim 18, the data processing method according to claim 17, further comprising: modifying the audio property of the candidate sound recording in dependence upon the corresponding audio property of the input data when the difference between the audio property of the candidate sound recording and the corresponding audio property of the input data is greater than a threshold amount (para. 53: correlation threshold value may be any value, such as 0.5. The correlation values of 1 for the first portion of the content item 101 may both satisfy the correlation threshold value of 0.5. Correlation values that satisfy a correlation threshold may indicate a candidate auditory event (e.g., a labeled auditory event that may or may not be present in the audio content, etc.); with the modifying per para. 61: The audio augmentation unit 160 may, based on the audio level of any speech/dialogue included with the plurality of portions of the audio content 102 determined by the auditory event detection unit 130, modify (e.g., decrease, increase, etc.) the audio level of any speech/dialogue included with the plurality of portions of the audio content 102 in relation to an audio level associated with any auditory event, such as an auditory event of the distribution of auditory events 134.).
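
The correlation-threshold test quoted from para. 53 can be sketched as a simple filter. The threshold of 0.5 and the correlation values of 1 come from the quoted passage; the labels, the third value and the reading of "satisfy" as "greater than or equal to" are assumptions made for this example.

```python
CORRELATION_THRESHOLD = 0.5  # example value quoted from para. 53

def candidate_events(correlations, threshold=CORRELATION_THRESHOLD):
    """Keep labels whose correlation value satisfies (is at least) the threshold."""
    return [label for label, value in correlations.items() if value >= threshold]

# Hypothetical correlation values between media elements and stored auditory events.
correlations = {"fire": 1.0, "siren": 1.0, "dog": 0.2}
selected = candidate_events(correlations)
```

Note that this filter keeps items whose similarity is high, whereas the claim conditions a modification on a difference being greater than a threshold; the two tests point in opposite directions.
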
As per claim 19 (cancelled), the non-transitory, computer readable storage medium according to claim 16, wherein said selecting comprises selecting the candidate sound recording in dependence upon a degree of match between the candidate sound recording and the input data, and further in dependence upon a difference between an audio property of the candidate sound recording and a corresponding audio property of the input data (as per the claim 17 rejection).
As per claim 20, the non-transitory, computer readable storage medium according to claim 19, wherein the data processing method further comprises: modifying the audio property of the candidate sound recording in dependence upon the corresponding audio property of the input data when the difference between the audio property of the candidate sound recording and the corresponding audio property of the input data is greater than a threshold amount (per claim 18 rejection).

As per claims 21, 22 and 23, per para. 48 of Younessian, the auditory event repository 150 may store any number/quantity of auditory events (e.g., stored as raw waveforms), where raw waveforms, stored digitally, comprise amplitude values which by definition comprise audible frequencies/pitches, where each pitch is the average of a higher and a lower pitch value.

Response to Arguments

The submitted arguments have been considered but are moot in view of the new grounds of rejection.

	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER KRZYSTAN, whose telephone number is 571-272-7498 and whose email address is alexander.krzystan@uspto.gov.

The examiner can usually be reached M-F, 7:30-4:00 EST.
If attempts to reach the examiner by telephone or email are unsuccessful, the examiner’s supervisor, Fan Tsang, can be reached at (571) 272-7547.

The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications.
/ALEXANDER KRZYSTAN/ Primary Examiner, Art Unit 2653
Examiner Alexander Krzystan
April 14, 2026


Prosecution Timeline


Dec 17, 2025

Response Filed

Jan 09, 2026

Final Rejection mailed — §103

Mar 26, 2026

Interview Requested

Apr 02, 2026

Examiner Interview Summary

Apr 02, 2026

Applicant Interview (Telephonic)

Apr 07, 2026

Request for Continued Examination

Apr 11, 2026

Response after Non-Final Action

Apr 14, 2026

Non-Final Rejection (signed) — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/097,611

Patent 12632213

AUDIO CONNECTION SELECTIONS BASED ON AUDIO DROP PREDICTIONS

3y 4m to grant Granted May 19, 2026

18/286,841

Patent 12598440

RENDERING OF OCCLUDED AUDIO ELEMENTS

2y 5m to grant Granted Apr 07, 2026

18/486,764

Patent 12593170

SWITCHING METHOD FOR AUDIO OUTPUT CHANNEL, AND DISPLAY DEVICE

2y 5m to grant Granted Mar 31, 2026

18/314,713

Patent 12573410

DECODER, ENCODER, AND METHOD FOR INFORMED LOUDNESS ESTIMATION IN OBJECT-BASED AUDIO CODING SYSTEMS

2y 10m to grant Granted Mar 10, 2026

18/397,683

Patent 12574675

Acoustic Device and Method

2y 2m to grant Granted Mar 10, 2026

Based on 5 most recent grants.


Prosecution Projections

6-7

Expected OA Rounds

81%

Grant Probability

89%

With Interview (+7.3%)

2y 12m (~0m remaining)

Median Time to Grant

High

PTA Risk

Based on 1123 resolved cases by this examiner. Grant probability derived from career allowance rate.
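
The headline figures can be reproduced from the counts reported on this page. The sketch below assumes that the interview lift and the Tech Center comparison are expressed in percentage points and are simply added to or subtracted from the career allowance rate; the page does not state its exact formula, so this is a plausibility check rather than the tool's method.

```python
granted = 914                 # from "914 granted / 1123 resolved"
resolved = 1123
interview_lift_points = 7.3   # assumed to be percentage points
vs_tc_avg_points = 19.4       # assumed to be percentage points

allowance_rate = 100 * granted / resolved
with_interview = allowance_rate + interview_lift_points
implied_tc_average = allowance_rate - vs_tc_avg_points

print(f"Career allowance rate: {allowance_rate:.1f}%")
print(f"With interview:        {with_interview:.1f}%")
print(f"Implied TC average:    {implied_tc_average:.1f}%")
```

Under these assumptions the results round to the 81% and 89% shown above, which suggests the with-interview figure is computed from the unrounded allowance rate.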