Prosecution Insights
Last updated: May 29, 2026
Application No. 18/404,943

INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD FOR ARTIFICIAL SPEECH GENERATION

Non-Final OA §103
Filed: Jan 05, 2024
Priority: Jan 12, 2023, EU 23151301.1
Examiner: NEWAY, SAMUEL G
Art Unit: 2657
Tech Center: 2600 (Communications)
Assignee: Sony Group Corporation
OA Round: 2 (Non-Final)
Grant Probability: 75% (Favorable)
OA Rounds: 2-3
Est. Remaining: 8m
With Interview: 83%

Examiner Intelligence

Grants 75% — above average
Career Allowance Rate: 75% (518 granted / 688 resolved), +13.3% vs TC avg
Interview Lift: +7.7% (moderate, about +8%), based on resolved cases with interview
Typical timeline
Avg Prosecution: 3y 0m (29 currently pending)
Career history
Total Applications: 718 across all art units

Statute-Specific Performance

§101: 9.7% (-30.3% vs TC avg)
§103: 66.8% (+26.8% vs TC avg)
§102: 7.3% (-32.7% vs TC avg)
§112: 12.1% (-27.9% vs TC avg)
Tech Center average is an estimate. Based on career data from 688 resolved cases.
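Each delta above is stated relative to a Tech Center average, so the implied baseline can be recovered by subtracting the delta from the examiner's figure. The following is a minimal Python sketch, assuming the deltas are plain percentage-point differences (the page does not document how they are computed); `implied_tc_average` is a hypothetical helper name.

```python
# Examiner figure and delta vs. TC average, in percentage points, as listed above.
STATUTE_STATS = {
    "101": (9.7, -30.3),
    "103": (66.8, 26.8),
    "102": (7.3, -32.7),
    "112": (12.1, -27.9),
}

def implied_tc_average(rate: float, delta: float) -> float:
    """Assumption: delta = examiner figure - TC average, in percentage points."""
    return round(rate - delta, 1)

implied = {s: implied_tc_average(r, d) for s, (r, d) in STATUTE_STATS.items()}
for statute, baseline in implied.items():
    print(f"Section {statute}: implied TC average {baseline:.1f}%")
```

Under that assumption all four rows imply the same 40.0% baseline, which suggests the Tech Center average is a single estimate rather than a per-statute figure.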

Office Action

§103
DETAILED ACTION

This is responsive to the amendment filed 20 November 2025. Claims 1-6, 8-16 and 18-20 are pending and considered below.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant’s arguments with respect to claims 1-6, 8-16 and 18-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Federico et al. (US 11,545,134) in view of Deyle et al. (US 2018/0077095).

Claim 1: Federico discloses an information processing device for generating artificial speech data (Abstract), comprising circuitry configured to:

obtain, based on speech data, speech emotional indicators and associated timing data of the emotional indicators (“extract paralinguistic information (e.g. accent, pitch, volume, speech rate, modulation, and fluency) from the source utterances and prosodically aligned text and timing information to use to reproduce an equivalent or at least credible target utterance”, col. 7, lines 28-36, see also col. 9, lines 44-46);

obtain, based on the speech data, text data (“speech recognizer 307 receives the utterances and transcribes them into a sequence of words including timing, punctuation and casing information”, col. 6, lines 41-43, see also col. 9, lines 33-36);

obtain, based on video data associated with the speech data, video emotional indicators (“the paralanguage modeler 313 utilizes video to create and use … model prosody that is consistent with the visual”, col. 7, lines 33-36, see also “information from the video is used in the generation of prosody information”, col. 9, lines 46-48);

associate the speech emotional indicators and the video emotional indicators with the text data based on the associated timing data (“a prosodic aligner 312 temporally aligns the machine translator 309 output with the speech segments of the original audio. As such, the takes prosodic aligner 312 in the utterances (or text from a text file) and translated text (including timing) to match the distribution of words and pauses to generate prosodically aligned translated text and timing (such as splits, etc.). In some embodiments, pre-processed video information is also provided to the prosodic aligner 312 to use in this alignment”, col. 7, lines 5-15); and

generate artificial speech data based on the text data, the speech emotional indicators associated with the text data, the video emotional indicators associated with the text data, and the associated timing data (“speech generator 315 creates a speech signal that reproduces a given sentence with a specified timbre and prosody for text by attempting to match a specified time interval as provided by the machine translator. In particular, the speech generator 315 uses a ML speaker model corresponding to a speaker of each segment. The ML speaker model is selected based on the speaker label and then fed the prosody, text, and timing information for a corresponding segment to create a speech signal”, col. 8, lines 7-15, see also “information from the video is used in the generation of prosody information”, col. 9, lines 46-48).

Federico does not explicitly disclose that the video emotional indicators comprise at least one of facial expressions, eye direction, gestures, and body pose.

In an analogous art similarly generating artificial speech based on video emotional indicators (“adjusting speech output from a text-to-speech processor with inflections indicated by emotional metadata”, [0026]), Deyle discloses that the video emotional indicators comprise at least one of facial expressions, eye direction, gestures, and body pose (“Facial expression recognition module 222 may receive image data captured by one or more camera(s) 212. Such image data may include one or more still images and/or video that includes a user's face. The facial expression recognition module 222 may analyze such image data to detect facial expressions that are indicative of certain emotions. For instance, the facial expression recognition module 222 may detect emotion by analyzing the shape and/or position of a user eye or eye's (e.g., how open or closed the eye(s) are), and/or by analyzing the position of the user's mouth (e.g., smiling, frowning, or neither)”, [0039], see also “Body expression recognition module 224 may also receive image data captured by one or more camera(s) 212. Such image data may include one or more still images and/or video that includes a portion of the user's body (e.g., the upper half of the user's body), and possibly the entirety of the user's body. The body expression recognition module 224 may analyze such image data to detect “body language” that is indicative of certain emotions; certain gestures, movements, and/or positioning of the body or portions thereof, which is characteristic of certain emotions. For instance, certain hand gestures, head movements, arm movements, whole-body movements, and/or stances, may be considered to be indicative of certain emotional states. Accordingly, when such a gesture, movement, and/or positioning is detected, body expression recognition module 224 may generate emotion data that is indicative of the emotion or emotions associated with the detected gesture, movement, and/or positioning”, [0040]).

It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the references to yield the predictable result of determining Federico’s video emotional indicators using at least one of facial expressions, eye direction, gestures, and body pose as disclosed by Deyle because those physical characteristics satisfactorily convey human emotional states (see Deyle, [0039] and [0040]).

Claim 2: Federico in view of Deyle discloses the information processing device according to claim 1, wherein the speech emotional indicators are associated with the text data based on the associated timing data, and wherein the generation of the artificial speech data is based on the speech emotional indicators associated with the text data (Federico, col. 7, lines 28-36, see also col. 9, lines 44-46).

Claim 3: Federico in view of Deyle discloses the information processing device according to claim 2, wherein the associated timing data are indicative of time intervals (Federico, col. 7, lines 5-11).

Claim 4: Federico in view of Deyle discloses the information processing device according to claim 1, wherein the speech emotional indicators are obtained based on speech indicators of the speech data (Federico, col. 7, lines 28-36, see also col. 9, lines 44-46).

Claim 5: Federico in view of Deyle discloses the information processing device according to claim 1, wherein the speech emotional indicators are at least one of: inferred emotion, speech tempo, speech pause, emotional pause, speech pitch, speech rhythm (Federico, col. 7, lines 28-36, see also col. 9, lines 44-46).

Claim 6: Federico in view of Deyle discloses the information processing device according to claim 1, wherein the circuitry is further configured to: obtain the speech emotional indicators based on an artificial neural network, which is configured to determine the speech emotional indicators based on the speech data (Federico, “a ML-based paralanguage modeler”, col. 7, lines 28-36).

Claim 8: Federico in view of Deyle discloses the information processing device according to claim 1, wherein the speech emotional indicators and the video emotional indicators are associated to each other, based on the associated timing data (Federico, col. 7, lines 33-36, see also col. 8, lines 7-15).

Claim 9: Federico in view of Deyle discloses the information processing device according to claim 1, wherein the speech data is captured of a speaker (Federico, col. 6, lines 63-67).

Claim 10: Federico in view of Deyle discloses the information processing device according to claim 1, wherein the speech data is captured of multiple speakers (Federico, col. 6, lines 63-67), and wherein the emotional speech indicators are obtained associated with each speaker based on the video data, and wherein the generation of the artificial speech is based on the speech emotional indicators associated with one speaker of the multiple speakers (Federico, col. 6, lines 63-67, see also col. 7, lines 33-36).

Claims 11-16 and 18-20: Federico in view of Deyle discloses an information processing method for generating artificial speech data, comprising the steps performed by the information processing device of claims 1-6 and 8-10 as shown above.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMUEL G NEWAY whose telephone number is (571)270-1058. The examiner can normally be reached Monday-Friday 9:00am-5:00pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SAMUEL G NEWAY/
Primary Examiner, Art Unit 2657
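The reply periods in the Conclusion are counted from the mailing date, which the Prosecution Timeline lists as Jan 30, 2026. The following is a minimal Python sketch of that date arithmetic, assuming that mailing date and simple calendar-month counting. `add_months` is a hypothetical helper, and the sketch does not account for a deadline that falls on a weekend or federal holiday or for the advisory-action scenario described above.

```python
import calendar
from datetime import date

def add_months(start: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to the month's last day."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

# Assumption: mailing date taken from the timeline entry "Final Rejection mailed".
MAILING_DATE = date(2026, 1, 30)

shortened_period_end = add_months(MAILING_DATE, 3)  # THREE-MONTH shortened period
statutory_maximum = add_months(MAILING_DATE, 6)     # SIX-MONTH outer limit

print("Shortened statutory period ends:", shortened_period_end.isoformat())
print("Latest possible reply date:", statutory_maximum.isoformat())
```

With these inputs the shortened period ends on 2026-04-30 and the six-month limit falls on 2026-07-30. Clamping to the month's last day matters only for mailing dates such as the 31st, where the target month may be shorter.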

Prosecution Timeline

Sep 29, 2025: Interview Requested
Oct 14, 2025: Examiner Interview Summary
Oct 14, 2025: Applicant Interview (Telephonic)
Nov 20, 2025: Response Filed
Jan 30, 2026: Final Rejection mailed (§103)
Mar 27, 2026: Response after Non-Final Action
Apr 16, 2026: Request for Continued Examination
Apr 19, 2026: Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12619834
SYSTEMS AND METHODS FOR INTENT CLASSIFICATION IN A NATURAL LANGUAGE PROCESSING AGENT
Granted May 05, 2026 (3y 3m to grant)
Patent 12613789
ARTIFICIAL INTELLIGENCE BASED GENERATION OF DATA CONNECTORS
Granted Apr 28, 2026 (2y 6m to grant)
Patent 12608561
STRUCTURED DOCUMENT GENERATION USING DOCUMENT-SCALE EMBEDDINGS
Granted Apr 21, 2026 (2y 2m to grant)
Patent 12608554
Method And System For Understanding Medical Chinese Spoken Language, Electronic Device, And Storage Medium
Granted Apr 21, 2026 (1y 10m to grant)
Patent 12602538
METHOD AND SYSTEM FOR EXEMPLAR LEARNING FOR TEMPLATIZING DOCUMENTS ACROSS DATA SOURCES
Granted Apr 14, 2026 (3y 4m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 75%
With Interview: 83% (+7.7%)
Median Time to Grant: 3y 0m (~8m remaining)
PTA Risk: Moderate
Based on 688 resolved cases by this examiner. Grant probability derived from career allowance rate.
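The 75% and 83% figures are consistent with the 518 grants out of 688 resolved cases reported under Examiner Intelligence. The following is a minimal Python sketch of that arithmetic, assuming the interview lift is added as percentage points to the career allowance rate (the page does not document its exact method).

```python
# Counts and lift as reported on this page.
GRANTED = 518
RESOLVED = 688
INTERVIEW_LIFT_PTS = 7.7  # assumption: percentage points added to the base rate

# Career allowance rate as a percentage of resolved cases.
allowance_rate = 100 * GRANTED / RESOLVED

# Projected grant probability when an examiner interview is held.
with_interview = allowance_rate + INTERVIEW_LIFT_PTS

print(f"Career allowance rate: {allowance_rate:.1f}%")
print(f"With interview: {with_interview:.1f}%")
```

Rounded to whole percentages, these reproduce the 75% and 83% shown above.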
