Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/08/2026 has been entered.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 2, 4-7, 9-13, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (GB 2571853) in view of Zeng et al (20210073526) in further view of Patel et al (20120089396).
As per claim 1, Wang (GB 2571853) teaches a method for recreating audio (pp 8, first paragraph, “The speech message can be played…”), comprising:
recording an audio generated by a user at a first device (as acquiring user’s speech signal by the speech acquisition module – middle of pp 6, “The mobile voice terminal….”);
processing the analog signal of the audio of the user to generate digital data by converting speech to text (as, preprocessing the speech signals into text – middle of pp 6, “speech recognition module aimed at converting….text message..”) and to identify one or more characteristics capturing a first emotion and a verbal expression of the user (as, deriving emotion and expressions from the input speech – middle of pp 6, “extraction module of speech’s emotional characteristic parameters intended to extract the parameters with emotional characteristics in pre-process speech signal…”);
metadata of the audio comprising the one or more characteristics (middle of pp 6 – “transforming text message into control commands and parameters”);
transforming the metadata based on facial expressions of the user (Wang (GB 2571853), as verifying the emotions and expressions based on the input speech – see pp 7, middle, “The query command is aimed at checking….plus the model bases of virtual characters emotional expression and action …are consistent..”) while generating the audio, the facial expressions captured by an image capturing device that is coupled to the first device (Wang (GB 2571853), the emotional expressions and actions can be displayed by the display unit – pp 8, lines 1-5);
and generating data packets by compressing the text and the metadata at the first device (as packet transmission – as speech broadcasting module, sending metadata – pp 7, first third, reflecting back on the expressions and actions loaded and sent through the external servers – pp 6, last 6 lines);
re-creating the audio at a second device based on the text and the metadata, the re-created audio presenting the emotion and verbal expressions expressed by the user at the first device (as, the emotional expression and actions can be displayed by the display unit – pp 7, last 6 lines, to pp 8, line 5).
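For illustration of the claim 1 mapping only, the following Python sketch shows one way text and its emotion/expression metadata could be compressed into data packets and later unpacked for re-creation; all names and values are hypothetical and are not drawn from Wang (GB 2571853), Zeng et al (20210073526), or the instant specification.

# Illustrative sketch only; identifiers and values are hypothetical.
import json
import zlib
from dataclasses import dataclass, asdict

@dataclass
class AudioMetadata:
    emotion: str            # detected emotion label, e.g. "happy"
    verbal_expression: str  # verbal-expression cue, e.g. "emphatic"

def build_packet(text: str, metadata: AudioMetadata) -> bytes:
    # Compress the recognized text together with its characteristics metadata.
    payload = json.dumps({"text": text, "metadata": asdict(metadata)})
    return zlib.compress(payload.encode("utf-8"))

def unpack_packet(packet: bytes) -> dict:
    # Recover the text and metadata used to re-create the audio downstream.
    return json.loads(zlib.decompress(packet).decode("utf-8"))

packet = build_packet("hello there", AudioMetadata("happy", "emphatic"))
print(unpack_packet(packet))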
As per claim 1, Wang (GB 2571853) discusses the synchronizing of a virtual character’s experience; Zeng et al (20210073526) teaches extraction of emotional information from the information of the user (in fact, Zeng et al teaches extraction of emotional information from multiple modes – visual, audio, and text – see para 0053; these are further compared for verification through a fusing process based on semantic meaning and maximized offset – para 0053, and in further detail para 0054, and the like). Therefore, it would have been obvious to one of ordinary skill in the art of using stored facial-audio-text emotional relationships and combining the results of multimodal sources of emotion detection to add this feature to the processing of Wang (GB 2571853), because it would advantageously improve upon the accuracy of the detected emotion of the participants (Zeng et al (20210073526) – para 0049).
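As a purely illustrative aid to the multimodal-fusion rationale above (the weights, labels, and scores are hypothetical and not taken from Zeng et al (20210073526)), a weighted combination of per-modality emotion scores could be sketched as:

# Illustrative sketch of fusing visual, audio, and text emotion scores.
def fuse_emotions(visual: dict, audio: dict, text: dict,
                  weights=(0.4, 0.4, 0.2)) -> str:
    # Combine per-modality scores and return the highest-scoring emotion label.
    labels = set(visual) | set(audio) | set(text)
    fused = {label: weights[0] * visual.get(label, 0.0)
                    + weights[1] * audio.get(label, 0.0)
                    + weights[2] * text.get(label, 0.0)
             for label in labels}
    return max(fused, key=fused.get)

print(fuse_emotions({"happy": 0.7, "sad": 0.1},
                    {"happy": 0.5, "sad": 0.3},
                    {"happy": 0.2, "sad": 0.6}))  # prints "happy"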
Furthermore, the combination of Wang (GB 2571853) in view of Zeng et al (20210073526) discusses the claim limitations from which these claims depend, as mapped above, as well as converting speech to text, deriving expressions and emotions from this conversion, and transmitting them to be re-created on a display or virtual environment; however, it does not explicitly teach further details of the speech signal processing tied into a persona of the user. Patel et al (20120089396) teaches specific focus on speech parameter processing based on the language type (para 0130, processing acoustic cues at differing frequencies/tones tied to the emotion); tuning the characteristics based on the desired results (preferences set by choosing an emotional category closest to the sample – see para 0121, and choosing/selecting the percentage that is ‘close enough’ – end of para 0121); and speech characteristics being any one of speech, pitch, spacing, volume, etc. tied to the emotions and verbal expressions (para 0030, disclosing fundamental frequency, pitch, intensity, loudness, speaker rate, etc.). Therefore, it would have been obvious to one of ordinary skill in the art of emotion detection/extraction from speech information to enhance the system of Wang (GB 2571853) in view of Zeng et al (20210073526) with the further processing of speech characteristics as taught by Patel et al (20120089396), because it would advantageously improve upon the end-user understanding/perceptions (see Patel et al (20120089396), para 0102 – perceptual improvements in the listed categories).
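For illustration of the “closest emotional category” tuning discussed above only (the prototype values, feature choices, and threshold are hypothetical and not taken from Patel et al (20120089396)):

# Illustrative sketch of matching extracted speech characteristics
# (fundamental frequency, intensity, speaking rate) to the closest stored
# emotion category, with a "close enough" distance threshold.
import math

EMOTION_PROTOTYPES = {
    # (fundamental frequency Hz, intensity dB, speaking rate syllables/s)
    "neutral": (120.0, 60.0, 4.0),
    "excited": (180.0, 72.0, 5.5),
    "sad":     (100.0, 55.0, 3.0),
}

def closest_emotion(features, max_distance=25.0):
    # Return the nearest category, or None if no prototype is close enough.
    label, proto = min(EMOTION_PROTOTYPES.items(),
                       key=lambda kv: math.dist(features, kv[1]))
    return label if math.dist(features, proto) <= max_distance else None

print(closest_emotion((175.0, 70.0, 5.3)))  # prints "excited"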
As per claims 2, 4-6, the combination of Wang (GB 2571853) in view of Zeng et al (20210073526) in further view of Patel et al (20120089396) teaches specific focus on speech parameter processing based on the language type (claim 2 – see Patel et al, para 0130, processing acoustic cues at differing frequencies/tones tied to the emotion); tuning the characteristics based on the desired results (claims 4, 5 – see Patel et al, preferences set by choosing an emotional category closest to the sample – para 0121, and choosing/selecting the percentage that is ‘close enough’ – end of para 0121); and speech characteristics being any one of speech, pitch, spacing, volume, etc. tied to the emotions and verbal expressions (claim 6 – see Patel et al, para 0030, disclosing fundamental frequency, pitch, intensity, loudness, speaker rate, etc.).
As per claim 7, the combination of Wang (GB 2571853) in view of Zeng et al (20210073526) teaches the method of claim 1, wherein the first device is a first laptop computing device or a first mobile computing device, and wherein the second device is a server computing device or a cloud server computing device or a game console or a second laptop computing device or a second mobile computing device (Wang (GB 2571853), as, pp 6, lines 15-20 – mobile voice terminal, and a virtual environment terminal, external server; pp 5, top – multi-user networked device; and bottom of pp 6 – virtual reality device).
Claims 9-12 are system claims that perform the commonly shared steps of method claims 1, 3, 7, and 8 above; as such, claims 9-12 are similar in scope and content to claims 1, 7, and 8 and are therefore rejected under a similar rationale as presented against claims 1, 7, and 8 above. Furthermore, the location of the codec in the above system claims is met by the recitation of the various devices, as mapped against claims 1, 3, 7, and 8 above.
Claims 13, 15, and 16 are system claims that perform the commonly shared steps of method claims 2 and 4-6 above; as such, claims 13, 15, and 16 are similar in scope and content to claims 2 and 4-6 and are therefore rejected under a similar rationale as presented against claims 2 and 4-6 above. Furthermore, the location of the codec in the above system claims is met by the recitation of the various devices, as mapped against claims 1 and 7 above.
Claim(s) 1, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (GB 2571853) in view of Zeng et al (20210073526) in further view of Mennicken et al (20210104220).
As per claim 1, Wang (GB 2571853) teaches a method for recreating audio (pp 8, first paragraph, “The speech message can be played…”), comprising:
recording an audio generated by a user at a first device (as acquiring user’s speech signal by the speech acquisition module – middle of pp 6, “The mobile voice terminal….”);
processing the analog signal of the audio of the user to generate digital data by converting speech to text (as, preprocessing the speech signals into text – middle of pp 6, “speech recognition module aimed at converting….text message..”) and to identify one or more characteristics capturing a first emotion and a verbal expression of the user (as, deriving emotion and expressions from the input speech – middle of pp 6, “extraction module of speech’s emotional characteristic parameters intended to extract the parameters with emotional characteristics in pre-process speech signal…”);
metadata of the audio comprising the one or more characteristics (middle of pp 6 – “transforming text message into control commands and parameters”);
transforming the metadata based on facial expressions of the user (Wang (GB 2571853), as verifying the emotions and expressions based on the input speech – see pp 7, middle, “The query command is aimed at checking….plus the model bases of virtual characters emotional expression and action …are consistent..”) while generating the audio, the facial expressions captured by an image capturing device that is coupled to the first device (Wang (GB 2571853), the emotional expressions and actions can be displayed by the display unit – pp 8, lines 1-5);
and generating data packets by compressing the text and the metadata at the first device (as packet transmission – as speech broadcasting module, sending metadata – pp 7, first third, reflecting back on the expressions and actions loaded and sent through the external servers – pp 6, last 6 lines);
re-creating the audio at a second device based on the text and the metadata, the re-created audio presenting the emotion and verbal expressions expressed by the user at the first device (as, the emotional expression and actions can be displayed by the display unit – pp 7, last 6 lines, to pp 8, line 5).
As per claim 1, Wang (GB 2571853) discusses the synchronizing of a virtual character’s experience; Zeng et al (20210073526) teaches extraction of emotional information from the information of the user (in fact, Zeng et al teaches extraction of emotional information from multiple modes – visual, audio, and text – see para 0053; these are further compared for verification through a fusing process based on semantic meaning and maximized offset – para 0053, and in further detail para 0054, and the like). Therefore, it would have been obvious to one of ordinary skill in the art of using stored facial-audio-text emotional relationships and combining the results of multimodal sources of emotion detection to add this feature to the processing of Wang (GB 2571853), because it would advantageously improve upon the accuracy of the detected emotion of the participants (Zeng et al (20210073526) – para 0049).
Furthermore, the combination of Wang (GB 2571853) in view of Zeng et al (20210073526) discusses converting speech to text, deriving expressions and emotions from this conversion, and transmitting them to be re-created on a display or virtual environment; however, it does not explicitly teach further details of the speech signal processing tied into a persona of the user. Mennicken et al (20210104220) teaches specific focus on speech parameter processing based on the language type, the type of gender voice, and mood/emotion selection (para 0082). Therefore, it would have been obvious to one of ordinary skill in the art of emotion detection/extraction from speech information to enhance the system of Wang (GB 2571853) in view of Zeng et al (20210073526) with the further processing of speech characteristics as taught by Mennicken et al (20210104220), because it would advantageously improve upon the end-user understanding/perceptions with a user-selectable voice for multiple users (see Mennicken et al (20210104220), para 0082).
As per claims 17 and 18, the combination of Wang (GB 2571853) in view of Zeng et al (20210073526) in further view of Mennicken et al (20210104220) teaches the selection of a persona from a plurality of personas based on the user (Mennicken et al (20210104220) – see para 0082, wherein user-selected characteristics are to be used for specific users – end of para 0082). Furthermore, Mennicken et al (20210104220) teaches transformation of the speech characteristics, such as pitch, tempo, etc. (para 007), operating on the metadata of Wang (GB 2571853) (middle of pp 6 – “transforming text message into control commands and parameters”).
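For illustration of persona selection and speech-characteristic transformation only (the persona names and scaling values are hypothetical and not taken from Mennicken et al (20210104220)):

# Illustrative sketch of selecting a persona from a plurality of personas and
# transforming speech characteristics such as pitch and tempo in the metadata.
PERSONAS = {
    "calm_low_voice":    {"pitch_scale": 0.90, "tempo_scale": 0.95},
    "bright_fast_voice": {"pitch_scale": 1.15, "tempo_scale": 1.10},
}

def apply_persona(metadata: dict, persona: str) -> dict:
    # Scale the pitch/tempo fields of the metadata for the selected persona.
    scales = PERSONAS[persona]
    transformed = dict(metadata)
    transformed["pitch_hz"] = metadata["pitch_hz"] * scales["pitch_scale"]
    transformed["tempo_wpm"] = metadata["tempo_wpm"] * scales["tempo_scale"]
    return transformed

print(apply_persona({"pitch_hz": 140.0, "tempo_wpm": 150.0}, "bright_fast_voice"))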
Response to Arguments
Applicant’s arguments with respect to the claim(s) have been considered but are moot because the new ground of rejection refers to new citations/combinations not previously presented. Examiner notes the use of Mennicken et al (20210104220), teaching the choice of output speech ‘personas’ operating on speech characteristics, in combination with Wang (GB 2571853), operating on metadata information.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Please note the references cited on the PTO-892 form.
In further detail, examiner notes the following references pertaining to applicant’s spec/claim scope:
Iwase et al (20200051545) teaches user-selectable tones/types – para 0081.
Yamagami et al (20090259475) teaches user-selectable voice quality changes – para 0162.
Sohn et al (20140022370) teaches storing of facial-emotion relationships for predicting from audio – para 0016.
Socolof et al (20210097468) teaches analysis of sentiment clusters during a user interaction (para 0035, 0078).
Ahn et al (20140093849) teaches analysis and selection of estimated emotions from a collection of emotion vectors (para 0006).
Kang et al (20100121804) teaches estimating emotion using emotion vector parameters for comparison (para 0019).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/Michael N Opsasnick/Primary Examiner, Art Unit 2658 01/24/2026