Prosecution Insights
Last updated: April 19, 2026
Application No. 17/744,138

VOCAL RECORDING AND RE-CREATION

Non-Final OA §103
Filed: May 13, 2022
Examiner: OPSASNICK, MICHAEL N
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: Sony Interactive Entertainment Inc.
OA Round: 5 (Non-Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 5-6
Time to Grant: 3y 3m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 82% (above average; 737 granted / 900 resolved; +19.9% vs TC avg)
Interview Lift: +10.5% (moderate lift, across resolved cases with an interview)
Typical Timeline: 3y 3m average prosecution
Career History: 946 total applications across all art units, 46 currently pending

Statute-Specific Performance

§101: 17.7% (-22.3% vs TC avg)
§103: 33.0% (-7.0% vs TC avg)
§102: 29.9% (-10.1% vs TC avg)
§112: 6.3% (-33.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 900 resolved cases.
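As a consistency check, the implied Tech Center baseline can be recovered from each rate/delta pair above. A minimal sketch, assuming each delta is the examiner's rate minus the TC average:

```python
# Examiner allowance rate per statute and the stated delta vs. TC average
stats = {"101": (17.7, -22.3), "103": (33.0, -7.0),
         "102": (29.9, -10.1), "112": (6.3, -33.7)}

for statute, (rate, delta) in stats.items():
    implied_tc_avg = round(rate - delta, 1)  # since rate = tc_avg + delta
    print(f"§{statute}: implied TC average = {implied_tc_avg}%")
```

All four pairs resolve to the same 40.0% baseline, consistent with a single Tech Center reference line behind the chart.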

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/08/2026 has been entered.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4-7, 9-13, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (GB 2571853) in view of Zeng et al. (20210073526) in further view of Patel et al. (20120089396).
As per claim 1, Wang (GB 2571853) teaches a method for recreating audio (p. 8, first paragraph, "The speech message can be played…"), comprising: recording an audio generated by a user at a first device (acquiring the user's speech signal by the speech acquisition module – middle of p. 6, "The mobile voice terminal…"); processing the analog signal of the audio of the user to generate digital data by converting speech to text (preprocessing the speech signals into text – middle of p. 6, "speech recognition module aimed at converting…text message…") and to identify one or more characteristics capturing a first emotion and a verbal expression of the user (deriving emotion and expressions from the input speech – middle of p. 6, "extraction module of speech's emotional characteristic parameters intended to extract the parameters with emotional characteristics in pre-process speech signal…"); metadata of the audio comprising the one or more characteristics (middle of p. 6, "transforming text message into control commands an parameters"); transforming the metadata (Wang, verifying the emotions and expressions based on the input speech – see p. 7, middle, "The query command is aimed at checking…plus the model bases of virtual characters emotional expression and action…are consistent…") while generating the audio, the facial expressions captured by an image capturing device that is coupled to the first device (Wang, the emotional expressions and actions can be displayed by the display unit – p. 8, lines 1-5); and generating data packets by compressing the text at the first device (packet transmission via the speech broadcasting module, sending metadata – p. 7, first third; reflecting back on the expressions and actions loaded and sent through the external servers – p. 6, last 6 lines), the text and the re-created audio presenting the emotion and verbal expressions expressed by the user at the first device (the emotional expression and actions can be displayed by the display unit – p. 7, last 6 lines, to p. 8, line 5).

As per claim 1, Wang (GB 2571853) discusses the synchronizing of a virtual character's experience; Zeng et al. (20210073526) teaches extraction of emotional information from the information of the user (in fact, Zeng et al. teaches extraction of emotional information from multiple modes – visual, audio, and text – see para. 0053; further, these are compared for verification through a fusing process based on semantic meaning and maximized offset – para. 0053, and in further detail para. 0054). Therefore, it would have been obvious to one of ordinary skill in the art of using stored facial-audio-text emotional relationships, and of combining the results of multimodal sources of emotion detection, to add this feature to the processing of Wang, because it would advantageously improve the accuracy of the detected emotion of the participants (Zeng et al., para. 0049). Furthermore, the combination of Wang in view of Zeng et al. discusses the claim limitations from which these claims depend, as mapped above, as well as converting speech to text, deriving expressions and emotions from these conversions, and transmitting them to be re-created on a display or virtual environment; however, it does not explicitly teach further details of the speech signal processing tied into a persona of the user.
Patel et al. (20120089396) teaches a specific focus on speech parameter processing based on the language type (para. 0130, processing acoustic cues at differing frequencies/tones tied to the emotion); tuning the characteristics based on the desired results (preferences set by choosing an emotional category closest to the sample – see para. 0121 – and choosing/selecting the percentage that is "close enough" – end of para. 0121); and speech characteristics being any one of speech, pitch, spacing, volume, etc., tied to the emotions and verbal expressions (para. 0030, disclosing fundamental frequency, pitch, intensity, loudness, speaker rate, etc.). Therefore, it would have been obvious to one of ordinary skill in the art of emotion detection/extraction from speech information to enhance the system of Wang in view of Zeng et al. with the further processing of speech characteristics as taught by Patel et al., because it would advantageously improve the end user's understanding/perception (see Patel et al., para. 0102 – perceptual improvements in the listed categories).

As per claims 2 and 4-6, the combination of Wang in view of Zeng et al. in further view of Patel et al. teaches: a specific focus on speech parameter processing based on the language type (claim 2 – see Patel et al., para. 0130, processing acoustic cues at differing frequencies/tones tied to the emotion); tuning the characteristics based on the desired results (claims 4 and 5 – see Patel et al., preferences set by choosing an emotional category closest to the sample – para. 0121 – and choosing/selecting the percentage that is "close enough" – end of para. 0121); and speech characteristics being any one of speech, pitch, spacing, volume, etc., tied to the emotions and verbal expressions (claim 6 – see Patel et al., para. 0030, disclosing fundamental frequency, pitch, intensity, loudness, speaker rate, etc.).
As per claim 7, the combination of Wang in view of Zeng et al. teaches the method of claim 1, wherein the first device is a first laptop computing device or a first mobile computing device, and wherein the second device is a server computing device or a cloud server computing device or a game console or a second laptop computing device or a second mobile computing device (Wang, p. 6, lines 15-20 – mobile voice terminal and a virtual environment terminal, external server; p. 5, top – multi-user networked device; and bottom of p. 6 – virtual reality device).

Claims 9-12 are system claims that perform the commonly shared steps of method claims 1, 3, 7, and 8 above; as such, claims 9-12 are similar in scope and content to claims 1, 7, and 8, and are therefore rejected under similar rationale as presented against claims 1, 7, and 8 above. Furthermore, the location of the codec in the above system claims is met by the recitation of the various devices, as mapped against claims 1, 3, 7, and 8 above.

Claims 13, 15, and 16 are system claims that perform the commonly shared steps of method claims 2 and 4-6 above; as such, claims 13, 15, and 16 are similar in scope and content to claims 2 and 4-6, and are therefore rejected under similar rationale as presented against claims 2 and 4-6 above. Furthermore, the location of the codec in the above system claims is met by the recitation of the various devices, as mapped against claims 1 and 7 above.

Claims 1, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (GB 2571853) in view of Zeng et al. (20210073526) in further view of Mennicken et al. (20210104220).
As per claim 1, Wang (GB 2571853) teaches a method for recreating audio (p. 8, first paragraph, "The speech message can be played…"), comprising: recording an audio generated by a user at a first device (acquiring the user's speech signal by the speech acquisition module – middle of p. 6, "The mobile voice terminal…"); processing the analog signal of the audio of the user to generate digital data by converting speech to text (preprocessing the speech signals into text – middle of p. 6, "speech recognition module aimed at converting…text message…") and to identify one or more characteristics capturing a first emotion and a verbal expression of the user (deriving emotion and expressions from the input speech – middle of p. 6, "extraction module of speech's emotional characteristic parameters intended to extract the parameters with emotional characteristics in pre-process speech signal…"); metadata of the audio comprising the one or more characteristics (middle of p. 6, "transforming text message into control commands an parameters"); transforming the metadata (Wang, verifying the emotions and expressions based on the input speech – see p. 7, middle, "The query command is aimed at checking…plus the model bases of virtual characters emotional expression and action…are consistent…") while generating the audio, the facial expressions captured by an image capturing device that is coupled to the first device (Wang, the emotional expressions and actions can be displayed by the display unit – p. 8, lines 1-5); and generating data packets by compressing the text, the text and the re-created audio presenting the emotion and verbal expressions expressed by the user at the first device (the emotional expression and actions can be displayed by the display unit – p. 7, last 6 lines, to p. 8, line 5).
As per claim 1, Wang (GB 2571853) discusses the synchronizing of a virtual character's experience; Zeng et al. (20210073526) teaches extraction of emotional information from the information of the user (in fact, Zeng et al. teaches extraction of emotional information from multiple modes – visual, audio, and text – see para. 0053; further, these are compared for verification through a fusing process based on semantic meaning and maximized offset – para. 0053, and in further detail para. 0054). Therefore, it would have been obvious to one of ordinary skill in the art of using stored facial-audio-text emotional relationships, and of combining the results of multimodal sources of emotion detection, to add this feature to the processing of Wang, because it would advantageously improve the accuracy of the detected emotion of the participants (Zeng et al., para. 0049). Furthermore, the combination of Wang in view of Zeng et al. discusses converting speech to text, deriving expressions and emotions from these conversions, and transmitting them to be re-created on a display or virtual environment; however, it does not explicitly teach further details of the speech signal processing tied into a persona of the user.

Mennicken et al. (20210104220) teaches a specific focus on speech parameter processing based on the language type, the type of gender voice, and mood/emotion selection (para. 0082). Therefore, it would have been obvious to one of ordinary skill in the art of emotion detection/extraction from speech information to enhance the system of Wang in view of Zeng et al. with the further processing of speech characteristics as taught by Mennicken et al., because it would advantageously improve the end user's understanding/perception with a user-selectable voice for multiple users (see Mennicken et al., para. 0082).
As per claims 17 and 18, the combination of Wang in view of Zeng et al. in further view of Mennicken et al. teaches the selection of a persona from a plurality of personas based on the user (Mennicken et al., para. 0082, wherein the user-selected characteristics are to be used for specific users – end of para. 0082). Furthermore, Mennicken et al. teaches transformation of the speech characteristics, such as pitch, tempo, etc. (para. 007), operating on the metadata of Wang (middle of p. 6 – "transforming text message into control commands an parameters").

Response to Arguments

Applicant's arguments with respect to the claims have been considered but are moot because the new ground of rejection refers to new citations/combinations not previously presented. The examiner notes the use of the Mennicken et al. (20210104220) teaching of a choice of output speech "personas", operating on speech characteristics, in combination with Wang (GB 2571853), operating on metadata information.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please note the references cited on the PTO-892 form. In further detail, the examiner notes the following references pertaining to applicant's spec/claim scope:

Iwase et al (20200051545) teaches user-selectable tones/types (para 0081).
Yamagami et al (20090259475) teaches user-selectable voice quality changes (para 0162).
Sohn et al (20140022370) teaches storing of facial-emotion relationships for predicting from audio (para 0016).
Socolof et al (20210097468) teaches analysis of sentiment clusters during a user interaction (paras 0035, 0078).
Ahn et al (20140093849) teaches analysis and selection of estimated emotions from a collection of emotion vectors (para 0006).
Kang et al (20100121804) teaches estimating emotion using emotion vector parameters for comparison (para 0019).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571) 272-7623, who is available Monday-Friday, 9am-5pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
01/24/2026

Prosecution Timeline

May 13, 2022: Application Filed
Mar 22, 2024: Non-Final Rejection — §103
Jun 27, 2024: Response Filed
Oct 13, 2024: Final Rejection — §103
Nov 17, 2024: Request for Continued Examination
Nov 20, 2024: Response after Non-Final Action
Nov 22, 2024: Non-Final Rejection — §103
Mar 25, 2025: Interview Requested
Apr 03, 2025: Applicant Interview (Telephonic)
Apr 10, 2025: Examiner Interview Summary
May 09, 2025: Response Filed
Aug 06, 2025: Final Rejection — §103
Jan 08, 2026: Request for Continued Examination
Jan 23, 2026: Response after Non-Final Action
Jan 24, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602554
SYSTEMS AND METHODS FOR PRODUCING RELIABLE TRANSLATION IN NEAR REAL-TIME
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12592246
SYSTEM AND METHOD FOR EXTRACTING HIDDEN CUES IN INTERACTIVE COMMUNICATIONS
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12586580
System For Recognizing and Responding to Environmental Noises
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12579995
Automatic Speech Recognition Accuracy With Multimodal Embeddings Search
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12567432
VOICE SIGNAL ESTIMATION METHOD AND APPARATUS USING ATTENTION MECHANISM
Granted Mar 03, 2026 (2y 5m to grant)
Based on this examiner's 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 82%
With Interview: 92% (+10.5%)
Median Time to Grant: 3y 3m
PTA Risk: High
Based on 900 resolved cases by this examiner. Grant probability derived from career allow rate.
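The headline figures above are reproducible from the stated career counts. A minimal sketch, assuming the displayed grant probability is the career allow rate rounded to a whole percent and the interview lift is additive in percentage points:

```python
granted, resolved = 737, 900           # career totals stated above
allow_rate = granted / resolved * 100  # ~81.9, displayed as 82%
interview_lift = 10.5                  # percentage points

with_interview = allow_rate + interview_lift  # ~92.4, displayed as 92%
print(round(allow_rate), round(with_interview))
```

Both rounded values match the dashboard (82 and 92), supporting the additive reading of the "+10.5%" interview lift.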
