Prosecution Insights
Last updated: April 19, 2026
Application No. 18/708,633

Method and System for Generating an Explainable Prediction of an Emotion Associated With a Vocal Sample

Non-Final OA: §101, §103
Filed: May 09, 2024
Examiner: SHIN, SEONG-AH A
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: National University Of Singapore
OA Round: 1 (Non-Final)
Grant Probability: 78% (Favorable)
OA Rounds: 1-2
To Grant: 2y 9m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 78% (321 granted / 409 resolved), above average, +16.5% vs TC avg
Interview Lift: +20.5% (strong), based on resolved cases with interview
Typical timeline: 2y 9m average prosecution, 25 currently pending
Career history: 434 total applications across all art units

Statute-Specific Performance

§101: 20.8% (-19.2% vs TC avg)
§103: 45.2% (+5.2% vs TC avg)
§102: 16.7% (-23.3% vs TC avg)
§112: 7.1% (-32.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 409 resolved cases.

Office Action

§101 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Election/Restrictions

During a telephone conversation with Trevor Copeland (Reg. No. 50,292) on 12/9/2025 and 12/18/2025, an election was made without traverse to prosecute the invention of the elected group, Claims 1-8. Affirmation of this election must be made by applicant in replying to this Office action. Claims 9-16 are withdrawn from further consideration by the examiner, 37 CFR 1.142(b), as being drawn to a non-elected invention.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-8 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

The independent claim 1 recites “receiving, by a processing device: a vector representation ([symbol image]) of an initial prediction (ŷ0) of the emotion associated with the vocal sample (x); a counterfactual synthetic vocal sample (x̃ γ) associated with the vocal sample (x) and an alternate emotion (y) different from the initial prediction (ŷ0) of the emotion; a vector representation (ẑ0 γ) of an emotion prediction (𝛾̂0) associated with the counterfactual synthetic vocal sample (x̃ γ); vocal cue information [symbol image] associated with the vocal sample (x) and the counterfactual synthetic (x̃ γ); and attribution explanation information [symbol image] associated with relative importance of the vocal cue information (ĉy, ĉγ) in prediction of the emotion; determining, using the processing device, numeric cue differences (ĉyγ) between the vocal cue information (ĉy) associated with the vocal sample (x) and the vocal cue information (ĉγ) associated with the counterfactual synthetic vocal sample (x̃ γ); generating, using the processing device, cue difference relations information [symbol image] based on the attribution explanation information [symbol image], the numeric cue differences [symbol image] the vector representation (ẑ0 γ) of the emotion prediction (𝛾̂0) associated with the counterfactual synthetic vocal sample (x̃ γ) using a [first] neural network (Mr); generating, using the processing device, a final prediction (ÿ) of the emotion based on the numeric cue differences (ĉyγ), the vector representation (ẑ0 γ) of the initial prediction (ŷ0) and the vector representation (ẑ0 γ) of the emotion prediction (𝛾̂0) associated with the counterfactual synthetic vocal sample (x̃ γ) using a [second] neural network (My); and generating, using the processing device, the explainable prediction of the emotion associated with the vocal sample (x) based on at least the counterfactual synthetic vocal sample (x̃ γ), the final prediction (ŷ) of the emotion and the cue difference relations information (r̂w yγ)”.

The limitation of “receiving…”, “determining …”, “generating…”, “generating…” and “generating” is a process that, under its broadest reasonable interpretation, could be performed in the human mind and requires no more than a performing of generic computer functions (e.g. collecting data, calculating). More specifically, claim 1 is a method of receiving spoken voice and voice sample from different sources, deriving information prediction of the audio, and using Neural Networks to calculate difference between input voice and voice sample using a mathematical formula. The claim specifies that Neural Networks are used in the determination, but the claim does not include any details about the Neural Networks or how it operates.

This judicial exception is not integrated into a practical application. In particular, claim 1 recites additional element of “processing device”. The computer is recited at a high-level of generality (i.e., as performing a generic computer function and being used as an applying) such that it amounts to no more than mere instructions to apply the exception using a generic computer. Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than mere instructions to apply an exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

With respect to claim 5, the claim is similar to claim 1 and claim 5 does not recite additional elements that are sufficient to amount to significantly more than the judicial exception.

With respect to dependent claims 2-4 and 6-8, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Therefore, claims 1-8 are rejected.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3-5, and 7-8 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Bone et al., (US Pub. 2021/0249035) in view of Triantafyllopoulos et al., (“Deep speaker conditioning for speech emotion recognition”, July, 2021).

Regarding claim 1, Bone discloses a method for generating an explainable prediction of an emotion associated with a vocal sample (x), the method comprising:

receiving, by a processing device: a vector representation ([symbol image]) of an initial prediction (ŷ0) of the emotion associated with the vocal sample (x) (Fig. 1B, steps 150 and 156, [0042][0117] receiving input audio data and a second feature vector which may represent speech attributes, e.g., accent pitch etc., corresponding to the input audio data);

a counterfactual synthetic vocal sample (x̃ γ) associated with the vocal sample (x) and an alternate emotion (y) different from the initial prediction (ŷ0) of the emotion (Fig. 1A, steps 132-140, and Fig. 1B, step 152, [0029][0033][0039][0103][0104][0108] identifying and storing the audio data as a baseline associated with a user profile; the baseline audio data is determined during enrollment processing stage and represented in a neutral emotional state);

a vector representation (ẑ0 γ) of an emotion prediction (𝛾̂0) associated with the counterfactual synthetic vocal sample (x̃ γ) (Fig. 1A, step 142, [0033][0041][0120] determining a first feature vector corresponding to the audio data at the enrollment processing stage);

vocal cue information [symbol image] associated with the vocal sample (x) and the counterfactual synthetic (x̃ γ) ([0033][0041][0042] processing a first and second feature vector and outputting one or more scores);

and attribution explanation information [symbol image] associated with relative importance of the vocal cue information (ĉy, ĉγ) in prediction of the emotion ([0033][0041][0042] determining an emotion category based on the scores based on the first/second feature vector; [0144][0139] determining connection weights of the features);

determining, using the processing device, numeric cue differences (ĉyγ) between the vocal cue information (ĉy) associated with the vocal sample (x) and the vocal cue information (ĉγ) associated with the counterfactual synthetic vocal sample (x̃ γ) ([0022] determining emotion and/or sentiment between the audio data and based on the user's baseline representing a neutral emotion/sentiment);

generating, using the processing device, cue difference relations information [symbol image] based on the attribution explanation information [symbol image], the numeric cue differences [symbol image] the vector representation (ẑ0 γ) of the emotion prediction (𝛾̂0) associated with the counterfactual synthetic vocal sample (x̃ γ) using a [first] neural network (Mr) ([0022][0121] determining emotion between the audio data and based on the user's baseline using a neural network);

generating, using the processing device, a final prediction (ÿ) of the emotion based on the numeric cue differences (ĉyγ), the vector representation (ẑ0 γ) of the initial prediction (ŷ0) and the vector representation (ẑ0 γ) of the emotion prediction (𝛾̂0) associated with the counterfactual synthetic vocal sample (x̃ γ) using a [second] neural network (My) (Fig. 1B, step 160, [0043][0121][0123] determining an emotion category corresponding to the input audio data);

and generating, using the processing device, the explainable prediction of the emotion associated with the vocal sample (x) based on at least the counterfactual synthetic vocal sample (x̃ γ), the final prediction (ŷ) of the emotion and the cue difference relations information (r̂w yγ) (Fig. 1B, step 162, [0043][0121], generating output data using the emotion category).

Bone does not explicitly teach however Triantafyllopoulos does explicitly teach the bracketed limitation: A [first] and [second] neural network (Fig. 1 (b)(c), section 2, Triantafyllopoulos discloses speech emotion recognition system using neural networks architecture. The sub-network is fed a neutral sample of the speaker and outputs an embedding vector which is fed into the main network).
Therefore, it would have been obvious to one of ordinary skill before the effective filing date of the claimed invention to incorporate the method of speech emotion recognition as taught by Bone with the method of using speaker conditioning sub-networks for speaker adaptation in a deep neural network-based speech emotion recognition system as taught by Triantafyllopoulos to improve performance of speech emotion recognition and speaker normalization (Triantafyllopoulos, section 1, 2nd and 4th paragraph).

Regarding claim 3, Bone in view of Triantafyllopoulos discloses the method as claimed in claim 1, and Bone further discloses: wherein the step of receiving the vocal cue information associated with the vocal sample and the counterfactual synthetic vocal sample comprises: generating, using the processing device, a contrastive saliency explanation based on the vocal sample, the initial prediction, and the alternate emotion using a visual explanation algorithm; and determining, using the processing device, the vocal cue information associated with the vocal sample based on the vocal sample and the contrastive saliency explanation and the vocal cue information associated with counterfactual synthetic vocal sample based on the counterfactual synthetic vocal sample and the contrastive saliency explanation ([0047][0121] prediction results to include a textual display of a portion of the input audio data and a corresponding emotion indicator and score which is implicit to be contrast to the reference baseline).

Regarding claim 4, Bone in view of Triantafyllopoulos discloses the method as claimed in claim 1, and Bone further discloses: wherein the vocal cue information is associated with one or more of a group consisting of: shrillness, loudness, average pitch, pitch range, speaking rate and proportion of pauses ([0041][0042] speech attributes to be accent, pitch or prosody).

Claims 2 and 6 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Bone et al., (US Pub. 2021/0249035) in view of Triantafyllopoulos et al., (“Deep speaker conditioning for speech emotion recognition”, July, 2021) and further in view of Paraskevopoulos et al., (US Pub. 2020/0335086).

Regarding claim 2, Bone in view of Triantafyllopoulos discloses the method as claimed in claim 1. Bone in view of Triantafyllopoulos does not explicitly teach however Paraskevopoulos does explicitly teach: wherein the step of receiving the counterfactual synthetic vocal sample comprises generating, using the processing device, the counterfactual synthetic vocal sample based on the vocal sample and the alternate emotion using a generative adversarial network (Paraskevopoulos, [Abstract][0026]-[0034] data augmentation for speech emotion recognition tasks using Generative Adversarial Networks (GANs) architecture to generate synthetic spectrograms).

Therefore, it would have been obvious to one of ordinary skill before the effective filing date of the claimed invention to incorporate the method of speech emotion recognition as taught by Bone with the method of Data augmentation using GANs as taught by Paraskevopoulos to improve results on a speech emotion recognition task and classification performance as compared to traditional speech data augmentation methods (Paraskevopoulos, [0008]).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN whose telephone number is (571)272-5933. The examiner can normally be reached 9 AM-3PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SEONG-AH A SHIN/
Primary Examiner, Art Unit 2659

Prosecution Timeline

May 09, 2024
Application Filed
Jan 05, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598095: DISPLAY DEVICE (2y 5m to grant, granted Apr 07, 2026)
Patent 12591452: INVOKING AN AUTOMATED ASSISTANT TO PERFORM MULTIPLE TASKS THROUGH AN INDIVIDUAL COMMAND (2y 5m to grant, granted Mar 31, 2026)
Patent 12585696: REDUCING METADATA TRANSMITTED WITH AUTOMATED ASSISTANT REQUESTS (2y 5m to grant, granted Mar 24, 2026)
Patent 12555568: DEVICE CONTROL METHOD AND APPARATUS, READABLE STORAGE MEDIUM AND CHIP (2y 5m to grant, granted Feb 17, 2026)
Patent 12554935: COMPUTER IMPLEMENTED METHOD FOR THE AUTOMATED ANALYSIS OR USE OF DATA (2y 5m to grant, granted Feb 17, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 78%
With Interview: 99% (+20.5%)
Median Time to Grant: 2y 9m
PTA Risk: Low
Based on 409 resolved cases by this examiner. Grant probability derived from career allow rate.
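
As a rough check on the figures above, the arithmetic can be reproduced from the examiner's counts (321 granted out of 409 resolved) and the reported +20.5% interview lift. This is only a sketch: it assumes the lift is added in percentage points and capped at 100, which matches the displayed numbers but is not stated on the page. The function and variable names are hypothetical.

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved


# Counts and lift as reported on this page.
GRANTED = 321
RESOLVED = 409
INTERVIEW_LIFT_POINTS = 20.5  # assumption: additive percentage points

career_rate = allow_rate(GRANTED, RESOLVED)
# Assumption: the with-interview figure is the career rate plus the lift,
# capped at 100%.
with_interview = min(career_rate + INTERVIEW_LIFT_POINTS, 100.0)

print(f"Grant probability: {career_rate:.1f}%")
print(f"With interview:    {with_interview:.1f}%")
```

Under these assumptions the two values round to the 78% and 99% shown in the projections.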
