Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/08/2026 has been entered.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 2, 4-7, 9-13, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (GB 2571853) in view of Zeng et al (20210073526) in further view of Patel et al (20120089396).
As per claim 1, Wang (GB 2571853) teaches a method for recreating audio (pp 8, first paragraph, “The speech message can be played…”), comprising:
recording an audio generated by a user at a first device (as acquiring user’s speech signal by the speech acquisition module – middle of pp 6, “The mobile voice terminal….”);
processing the analog signal of the audio of the user to generate digital data by converting speech to text (as, preprocessing the speech signals into text – middle of pp 6, “speech recognition module aimed at converting….text message..”) and to identify one or more characteristics capturing a first emotion and a verbal expression of the user (as, deriving emotion and expressions from the input speech – middle of pp 6, “extraction module of speech’s emotional characteristic parameters intended to extract the parameters with emotional characteristics in pre-process speech signal…”);
metadata of the audio comprising the one or more characteristics (middle of pp 6 – “transforming text message into control commands and parameters”);
transforming the metadata based on facial expressions of the user (Wang (GB 2571853), as verifying the emotions and expressions based on the input speech – see pp 7, middle, “The query command is aimed at checking….plus the model bases of virtual characters emotional expression and action …are consistent..”) while generating the audio, the facial expressions captured by an image capturing device that is coupled to the first device (Wang (GB 2571853), the emotional expressions and actions can be displayed by the display unit – pp 8, lines 1-5);
and generating data packets by compressing the text and the metadata at the first device (as packet transmission – as speech broadcasting module, sending metadata – pp 7, first third, reflecting back on the expressions and actions loaded and sent through the external servers – pp 6, last 6 lines);
re-creating the audio at a second device based on the text and the metadata, the re-created audio presenting the emotion and verbal expressions expressed by the user at the first device (as, the emotional expression and actions can be displayed by the display unit – pp 7, last 6 lines, to pp 8, line 5).
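For illustration of the claim 1 mapping only, the following Python sketch shows one way text and its emotion/expression metadata could be compressed into data packets and later unpacked for re-creation; all names and values are hypothetical and are not drawn from Wang (GB 2571853), Zeng et al (20210073526), or the instant specification.

# Illustrative sketch only; identifiers and values are hypothetical.
import json
import zlib
from dataclasses import dataclass, asdict

@dataclass
class AudioMetadata:
    emotion: str            # detected emotion label, e.g. "happy"
    verbal_expression: str  # verbal-expression cue, e.g. "emphatic"

def build_packet(text: str, metadata: AudioMetadata) -> bytes:
    # Compress the recognized text together with its characteristics metadata.
    payload = json.dumps({"text": text, "metadata": asdict(metadata)})
    return zlib.compress(payload.encode("utf-8"))

def unpack_packet(packet: bytes) -> dict:
    # Recover the text and metadata used to re-create the audio downstream.
    return json.loads(zlib.decompress(packet).decode("utf-8"))

packet = build_packet("hello there", AudioMetadata("happy", "emphatic"))
print(unpack_packet(packet))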
As per claim 1, Wang (GB 2571853) discusses the synchronizing of a virtual character’s experience; Zeng et al (20210073526) teaches extraction of emotional information from the information of the user (in fact, Zeng et al teaches extraction of emotional information from multiple modes – visual, audio, and text – see para 0053; these are further compared for verification through a fusing process based on semantic meaning and maximized offset – para 0053, and in further detail para 0054, and the like). Therefore, it would have been obvious to one of ordinary skill in the art of using stored facial-audio-text emotional relationships and combining the results of multimodal sources of emotion detection to add this feature to the processing of Wang (GB 2571853), because it would advantageously improve upon the accuracy of the detected emotion of the participants (Zeng et al (20210073526) – para 0049).
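As a purely illustrative aid to the multimodal-fusion rationale above (the weights, labels, and scores are hypothetical and not taken from Zeng et al (20210073526)), a weighted combination of per-modality emotion scores could be sketched as:

# Illustrative sketch of fusing visual, audio, and text emotion scores.
def fuse_emotions(visual: dict, audio: dict, text: dict,
                  weights=(0.4, 0.4, 0.2)) -> str:
    # Combine per-modality scores and return the highest-scoring emotion label.
    labels = set(visual) | set(audio) | set(text)
    fused = {label: weights[0] * visual.get(label, 0.0)
                    + weights[1] * audio.get(label, 0.0)
                    + weights[2] * text.get(label, 0.0)
             for label in labels}
    return max(fused, key=fused.get)

print(fuse_emotions({"happy": 0.7, "sad": 0.1},
                    {"happy": 0.5, "sad": 0.3},
                    {"happy": 0.2, "sad": 0.6}))  # prints "happy"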
Furthermore, the combination of Wang (GB 2571853) in view of Zeng et al (20210073526) discusses the claim limitations from which these claims depend, as mapped above, as well as converting speech to text, deriving expressions and emotions from this conversion, and transmitting them to be re-created on a display or virtual environment; however, it does not explicitly teach further details of the speech signal processing tied into a persona of the user. Patel et al (20120089396) teaches specific focus on speech parameter processing based on the language type (para 0130, processing acoustic cues at differing frequencies/tones tied to the emotion); tuning the characteristics based on the desired results (preferences set by choosing an emotional category closest to the sample – see para 0121, and choosing/selecting the percentage that is ‘close enough’ – end of para 0121); and speech characteristics being any one of speech, pitch, spacing, volume, etc. tied to the emotions and verbal expressions (para 0030, disclosing fundamental frequency, pitch, intensity, loudness, speaker rate, etc.). Therefore, it would have been obvious to one of ordinary skill in the art of emotion detection/extraction from speech information to enhance the system of Wang (GB 2571853) in view of Zeng et al (20210073526) with the further processing of speech characteristics as taught by Patel et al (20120089396), because it would advantageously improve upon the end-user understanding/perceptions (see Patel et al (20120089396), para 0102 – perceptual improvements in the listed categories).
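For illustration of the “closest emotional category” tuning discussed above only (the prototype values, feature choices, and threshold are hypothetical and not taken from Patel et al (20120089396)):

# Illustrative sketch of matching extracted speech characteristics
# (fundamental frequency, intensity, speaking rate) to the closest stored
# emotion category, with a "close enough" distance threshold.
import math

EMOTION_PROTOTYPES = {
    # (fundamental frequency Hz, intensity dB, speaking rate syllables/s)
    "neutral": (120.0, 60.0, 4.0),
    "excited": (180.0, 72.0, 5.5),
    "sad":     (100.0, 55.0, 3.0),
}

def closest_emotion(features, max_distance=25.0):
    # Return the nearest category, or None if no prototype is close enough.
    label, proto = min(EMOTION_PROTOTYPES.items(),
                       key=lambda kv: math.dist(features, kv[1]))
    return label if math.dist(features, proto) <= max_distance else None

print(closest_emotion((175.0, 70.0, 5.3)))  # prints "excited"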
As per claims 2, 4-6, the combination of Wang (GB 2571853) in view of Zeng et al (20210073526) in further view of Patel et al (20120089396) teaches specific focus on speech parameter processing based on the language type (claim 2 – see Patel et al, para 0130, processing acoustic cues at differing frequencies/tones tied to the emotion); tuning the characteristics based on the desired results (claims 4, 5 – see Patel et al, preferences set by choosing an emotional category closest to the sample – para 0121, and choosing/selecting the percentage that is ‘close enough’ – end of para 0121); and speech characteristics being any one of speech, pitch, spacing, volume, etc. tied to the emotions and verbal expressions (claim 6 – see Patel et al, para 0030, disclosing fundamental frequency, pitch, intensity, loudness, speaker rate, etc.).
As per claim 7, the combination of Wang (GB 2571853) in view of Zeng et al (20210073526) teaches the method of claim 1, wherein the first device is a first laptop computing device or a first mobile computing device, and wherein the second device is a server computing device or a cloud server computing device or a game console or a second laptop computing device or a second mobile computing device (Wang (GB 2571853), as, pp 6, lines 15-20 – mobile voice terminal, and a virtual environment terminal, external server; pp 5, top – multi-user networked device; and bottom of pp 6 – virtual reality device).
Claims 9-12 are system claims that perform the commonly shared steps of method claims 1, 3, 7, and 8 above; as such, claims 9-12 are similar in scope and content to claims 1, 7, and 8 and are therefore rejected under a similar rationale as presented against claims 1, 7, and 8 above. Furthermore, the location of the codec in the above system claims is met by the recitation of the various devices, as mapped against claims 1, 3, 7, and 8 above.
Claims 13, 15, and 16 are system claims that perform the commonly shared steps of method claims 2 and 4-6 above; as such, claims 13, 15, and 16 are similar in scope and content to claims 2 and 4-6 and are therefore rejected under a similar rationale as presented against claims 2 and 4-6 above. Furthermore, the location of the codec in the above system claims is met by the recitation of the various devices, as mapped against claims 1 and 7 above.
Claim(s) 1, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (GB 2571853) in view of Zeng et al (20210073526) in further view of Mennicken et al (20210104220).
As per claim 1, Wang (GB 2571853) teaches a method for recreating audio (pp 8, first paragraph, “The speech message can be played…”), comprising:
recording an audio generated by a user at a first device (as acquiring user’s speech signal by the speech acquisition module – middle of pp 6, “The mobile voice terminal….”);
processing the analog signal of the audio of the user to generate digital data by converting speech to text (as, preprocessing the speech signals into text – middle of pp 6, “speech recognition module aimed at converting….text message..”) and to identify one or more characteristics capturing a first emotion and a verbal expression of the user (as, deriving emotion and expressions from the input speech – middle of pp 6, “extraction module of speech’s emotional characteristic parameters intended to extract the parameters with emotional characteristics in pre-process speech signal…”);
metadata of the audio comprising the one or more characteristics (middle of pp 6 – “transforming text message into control commands and parameters”);
transforming the metadata based on facial expressions of the user (Wang (GB 2571853), as verifying the emotions and expressions based on the input speech – see pp 7, middle, “The query command is aimed at checking….plus the model bases of virtual characters emotional expression and action …are consistent..”) while generating the audio, the facial expressions captured by an image capturing device that is coupled to the first device (Wang (GB 2571853), the emotional expressions and actions can be displayed by the display unit – pp 8, lines 1-5);
and generating data packets by compressing the text and the metadata at the first device (as packet transmission – as speech broadcasting module, sending metadata – pp 7, first third, reflecting back on the expressions and actions loaded and sent through the external servers – pp 6, last 6 lines);
re-creating the audio at a second device based on the text and the metadata, the re-created audio presenting the emotion and verbal expressions expressed by the user at the first device (as, the emotional expression and actions can be displayed by the display unit – pp 7, last 6 lines, to pp 8, line 5).
As per claim 1, Wang (GB 2571853) discusses the synchronizing of a virtual character’s experience; Zeng et al (20210073526) teaches extraction of emotional information from the information of the user (in fact, Zeng et al teaches extraction of emotional information from multiple modes – visual, audio, and text – see para 0053; these are further compared for verification through a fusing process based on semantic meaning and maximized offset – para 0053, and in further detail para 0054, and the like). Therefore, it would have been obvious to one of ordinary skill in the art of using stored facial-audio-text emotional relationships and combining the results of multimodal sources of emotion detection to add this feature to the processing of Wang (GB 2571853), because it would advantageously improve upon the accuracy of the detected emotion of the participants (Zeng et al (20210073526) – para 0049).
Furthermore, the combination of Wang (GB 2571853) in view of Zeng et al (20210073526) discusses converting speech to text, deriving expressions and emotions from this conversion, and transmitting them to be re-created on a display or virtual environment; however, it does not explicitly teach further details of the speech signal processing tied into a persona of the user. Mennicken et al (20210104220) teaches specific focus on speech parameter processing based on the language type, the type of gender voice, and mood/emotion selection (para 0082). Therefore, it would have been obvious to one of ordinary skill in the art of emotion detection/extraction from speech information to enhance the system of Wang (GB 2571853) in view of Zeng et al (20210073526) with the further processing of speech characteristics as taught by Mennicken et al (20210104220), because it would advantageously improve upon the end-user understanding/perceptions with a user-selectable voice for multiple users (see Mennicken et al (20210104220), para 0082).
As per claims 17 and 18, the combination of Wang (GB 2571853) in view of Zeng et al (20210073526) in further view of Mennicken et al (20210104220) teaches the selection of a persona from a plurality of personas based on the user (Mennicken et al (20210104220) – see para 0082, wherein user-selected characteristics are to be used for specific users – end of para 0082). Furthermore, Mennicken et al (20210104220) teaches transformation of the speech characteristics, such as pitch, tempo, etc. (para 007), operating on the metadata of Wang (GB 2571853) (middle of pp 6 – “transforming text message into control commands and parameters”).
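For illustration of persona selection and speech-characteristic transformation only (the persona names and scaling values are hypothetical and not taken from Mennicken et al (20210104220)):

# Illustrative sketch of selecting a persona from a plurality of personas and
# transforming speech characteristics such as pitch and tempo in the metadata.
PERSONAS = {
    "calm_low_voice":    {"pitch_scale": 0.90, "tempo_scale": 0.95},
    "bright_fast_voice": {"pitch_scale": 1.15, "tempo_scale": 1.10},
}

def apply_persona(metadata: dict, persona: str) -> dict:
    # Scale the pitch/tempo fields of the metadata for the selected persona.
    scales = PERSONAS[persona]
    transformed = dict(metadata)
    transformed["pitch_hz"] = metadata["pitch_hz"] * scales["pitch_scale"]
    transformed["tempo_wpm"] = metadata["tempo_wpm"] * scales["tempo_scale"]
    return transformed

print(apply_persona({"pitch_hz": 140.0, "tempo_wpm": 150.0}, "bright_fast_voice"))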
Response to Arguments
Applicant’s arguments with respect to the claim(s) have been considered but are moot because the new ground of rejection refers to new citations/combinations not previously presented. Examiner notes the use of Mennicken et al (20210104220), teaching the choice of output speech ‘personas’ operating on speech characteristics, in combination with Wang (GB 2571853), operating on metadata information.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Please note the references cited on the PTO-892 form.
In further detail, examiner notes the following references pertaining to applicant’s spec/claim scope:
Iwase et al (20200051545) teaches user-selectable tones/types – para 0081.
Yamagami et al (20090259475) teaches user-selectable voice quality changes – para 0162.
Sohn et al (20140022370) teaches storing of facial-emotion relationships for predicting from audio – para 0016.
Socolof et al (20210097468) teaches analysis of sentiment clusters during a user interaction (para 0035, 0078).
Ahn et al (20140093849) teaches analysis and selection of estimated emotions from a collection of emotion vectors (para 0006).
Kang et al (20100121804) teaches estimating emotion using emotion vector parameters for comparison (para 0019).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/Michael N Opsasnick/Primary Examiner, Art Unit 2658 01/24/2026