DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
During telephone conversations with Trevor Copeland (Reg. No. 50,292) on 12/9/2025 and 12/18/2025, an election was made without traverse to prosecute the invention of the elected group, claims 1-8. Affirmation of this election must be made by applicant in replying to this Office action. Claims 9-16 are withdrawn from further consideration by the examiner pursuant to 37 CFR 1.142(b), as being drawn to a non-elected invention.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-8 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claim 1 recites "receiving, by a processing device: a vector representation (ẑ0) of an initial prediction (ŷ0) of the emotion associated with the vocal sample (x); a counterfactual synthetic vocal sample (x̃γ) associated with the vocal sample (x) and an alternate emotion (y) different from the initial prediction (ŷ0) of the emotion; a vector representation (ẑ0γ) of an emotion prediction (γ̂0) associated with the counterfactual synthetic vocal sample (x̃γ); vocal cue information (ĉy, ĉγ) associated with the vocal sample (x) and the counterfactual synthetic vocal sample (x̃γ); and attribution explanation information associated with relative importance of the vocal cue information (ĉy, ĉγ) in prediction of the emotion; determining, using the processing device, numeric cue differences (ĉyγ) between the vocal cue information (ĉy) associated with the vocal sample (x) and the vocal cue information (ĉγ) associated with the counterfactual synthetic vocal sample (x̃γ); generating, using the processing device, cue difference relations information (r̂w yγ) based on the attribution explanation information, the numeric cue differences (ĉyγ) and the vector representation (ẑ0γ) of the emotion prediction (γ̂0) associated with the counterfactual synthetic vocal sample (x̃γ) using a [first] neural network (Mr); generating, using the processing device, a final prediction (ŷ) of the emotion based on the numeric cue differences (ĉyγ), the vector representation (ẑ0) of the initial prediction (ŷ0) and the vector representation (ẑ0γ) of the emotion prediction (γ̂0) associated with the counterfactual synthetic vocal sample (x̃γ) using a [second] neural network (My); and generating, using the processing device, the explainable prediction of the emotion associated with the vocal sample (x) based on at least the counterfactual synthetic vocal sample (x̃γ), the final prediction (ŷ) of the emotion and the cue difference relations information (r̂w yγ)".
The limitations of "receiving…", "determining…", and "generating…" recite a process that, under its broadest reasonable interpretation, could be performed in the human mind and requires no more than the performance of generic computer functions (e.g., collecting data and performing calculations). More specifically, claim 1 is a method of receiving a spoken vocal sample and a counterfactual vocal sample from different sources, deriving prediction information from the audio, and using neural networks to calculate differences between the input vocal sample and the counterfactual sample using a mathematical formula. The claim specifies that neural networks are used in the determination, but the claim does not include any details about the neural networks or how they operate.
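For illustration only (hypothetical cue values and names, not taken from the applicant's disclosure), the "determining numeric cue differences" limitation reduces to an element-wise subtraction of the kind sketched below in Python, which underscores why it is characterized as a mathematical calculation performable with a generic computer:

    import numpy as np

    # Hypothetical vocal cue vectors: [average pitch (Hz), loudness, speaking rate, pause proportion]
    c_y = np.array([220.0, 0.62, 4.1, 0.18])       # cues from the vocal sample x
    c_gamma = np.array([180.0, 0.45, 3.2, 0.25])   # cues from the counterfactual sample

    # The claimed "numeric cue differences" amount to element-wise subtraction.
    c_y_gamma = c_y - c_gamma                       # approximately [40., 0.17, 0.9, -0.07]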
This judicial exception is not integrated into a practical application. In particular, claim 1 recites the additional element of a "processing device". The computer is recited at a high level of generality (i.e., as performing generic computer functions and being used merely to apply the exception) such that it amounts to no more than mere instructions to apply the exception using a generic computer. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim is directed to an abstract idea. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than mere instructions to apply an exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
With respect to claim 5, the claim recites limitations similar to those of claim 1 and does not recite additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claims 2-4 and 6-8, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Therefore, claims 1-8 are rejected.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3-5, and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Bone et al. (US Pub. 2021/0249035) in view of Triantafyllopoulos et al. ("Deep speaker conditioning for speech emotion recognition," July 2021).
Regarding claim 1, Bone discloses a method for generating an explainable prediction of an emotion associated with a vocal sample (x), the method comprising:
receiving, by a processing device:
a vector representation (ẑ0) of an initial prediction (ŷ0) of the emotion associated with the vocal sample (x) (Fig. 1B, steps 150 and 156, [0042][0117] receiving input audio data and a second feature vector which may represent speech attributes, e.g., accent, pitch, etc., corresponding to the input audio data);
a counterfactual synthetic vocal sample (x̃γ) associated with the vocal sample (x) and an alternate emotion (y) different from the initial prediction (ŷ0) of the emotion (Fig. 1A, steps 132-140, and Fig. 1B, step 152, [0029][0033][0039][0103][0104][0108] identifying and storing the audio data as a baseline associated with a user profile; the baseline audio data is determined during the enrollment processing stage and represents a neutral emotional state);
a vector representation (ẑ0γ) of an emotion prediction (γ̂0) associated with the counterfactual synthetic vocal sample (x̃γ) (Fig. 1A, step 142, [0033][0041][0120] determining a first feature vector corresponding to the audio data at the enrollment processing stage);
vocal cue information (ĉy, ĉγ) associated with the vocal sample (x) and the counterfactual synthetic vocal sample (x̃γ) ([0033][0041][0042] processing a first and second feature vector and outputting one or more scores); and
attribution explanation information associated with relative importance of the vocal cue information (ĉy, ĉγ) in prediction of the emotion ([0033][0041][0042] determining an emotion category based on the scores from the first/second feature vector; [0144][0139] determining connection weights of the features);
determining, using the processing device, numeric cue differences (ĉyγ) between the vocal cue information (ĉy) associated with the vocal sample (x) and the vocal cue information (ĉγ) associated with the counterfactual synthetic vocal sample (x̃γ) ([0022] determining the emotion and/or sentiment of the audio data based on the user's baseline representing a neutral emotion/sentiment);
generating, using the processing device, cue difference relations information (r̂w yγ) based on the attribution explanation information, the numeric cue differences (ĉyγ) and the vector representation (ẑ0γ) of the emotion prediction (γ̂0) associated with the counterfactual synthetic vocal sample (x̃γ) using a [first] neural network (Mr) ([0022][0121] determining the emotion of the audio data based on the user's baseline using a neural network);
generating, using the processing device, a final prediction (ŷ) of the emotion based on the numeric cue differences (ĉyγ), the vector representation (ẑ0) of the initial prediction (ŷ0) and the vector representation (ẑ0γ) of the emotion prediction (γ̂0) associated with the counterfactual synthetic vocal sample (x̃γ) using a [second] neural network (My) (Fig. 1B, step 160, [0043][0121][0123] determining an emotion category corresponding to the input audio data); and
generating, using the processing device, the explainable prediction of the emotion associated with the vocal sample (x) based on at least the counterfactual synthetic vocal sample (x̃γ), the final prediction (ŷ) of the emotion and the cue difference relations information (r̂w yγ) (Fig. 1B, step 162, [0043][0121], generating output data using the emotion category).
Bone does not explicitly teach, but Triantafyllopoulos does teach, the bracketed limitation:
A [first] and [second] neural network (Fig. 1(b)(c), section 2: Triantafyllopoulos discloses a speech emotion recognition system using a neural network architecture; the sub-network is fed a speech sample of the speaker and outputs an embedding vector which is fed into the main network).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of speech emotion recognition as taught by Bone with the method of using speaker-conditioning sub-networks for speaker adaptation in a deep neural network-based speech emotion recognition system as taught by Triantafyllopoulos, in order to improve performance of speech emotion recognition and speaker normalization (Triantafyllopoulos, section 1, 2nd and 4th paragraphs).
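For orientation only, the speaker-conditioning arrangement relied on above can be pictured with a minimal PyTorch sketch (hypothetical layer sizes and names, not the reference's actual code): an auxiliary sub-network maps a speaker sample to an embedding that is concatenated into the main emotion-recognition network.

    import torch
    import torch.nn as nn

    class SpeakerConditionedSER(nn.Module):
        # Main SER network conditioned on an embedding produced by an auxiliary sub-network.
        def __init__(self, n_feats=40, n_emotions=4, emb_dim=64):
            super().__init__()
            self.speaker_subnet = nn.Sequential(        # encodes a sample of the target speaker
                nn.Linear(n_feats, 128), nn.ReLU(), nn.Linear(128, emb_dim))
            self.main = nn.Sequential(                  # consumes utterance features plus the embedding
                nn.Linear(n_feats + emb_dim, 128), nn.ReLU(), nn.Linear(128, n_emotions))

        def forward(self, utterance_feats, speaker_sample_feats):
            emb = self.speaker_subnet(speaker_sample_feats)     # speaker embedding
            fused = torch.cat([utterance_feats, emb], dim=-1)   # condition the main network
            return self.main(fused)                             # emotion logits

    # Dummy usage with frame-averaged features
    logits = SpeakerConditionedSER()(torch.randn(1, 40), torch.randn(1, 40))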
Regarding claim 3, Bone in view of Triantafyllopoulos discloses the method as claimed in claim 1, and Bone further discloses:
wherein the step of receiving the vocal cue information associated with the vocal sample and the counterfactual synthetic vocal sample comprises: generating, using the processing device, a contrastive saliency explanation based on the vocal sample, the initial prediction, and the alternate emotion using a visual explanation algorithm; and determining, using the processing device, the vocal cue information associated with the vocal sample based on the vocal sample and the contrastive saliency explanation and the vocal cue information associated with the counterfactual synthetic vocal sample based on the counterfactual synthetic vocal sample and the contrastive saliency explanation ([0047][0121] prediction results include a textual display of a portion of the input audio data and a corresponding emotion indicator and score, which is implicitly contrasted with the reference baseline).
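As a generic illustration only (a plain gradient-based attribution with a hypothetical toy model, not the "visual explanation algorithm" of the claim), a contrastive saliency of the kind recited in claim 3 can be sketched as the gradient of a class-contrast score with respect to the input features:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(40, 16), nn.ReLU(), nn.Linear(16, 4))  # toy emotion classifier
    x = torch.randn(1, 40, requires_grad=True)                             # input feature vector
    initial, alternate = 0, 2                                              # initial vs. alternate emotion

    # Contrast the initially predicted emotion against the alternate emotion,
    # then read per-feature importance from the input gradient.
    logits = model(x)
    (logits[0, initial] - logits[0, alternate]).backward()
    contrastive_saliency = x.grad.abs()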
Regarding claim 4, Bone in view of Triantafyllopoulos discloses the method as claimed in claim 1, and Bone further discloses:
wherein the vocal cue information is associated with one or more of a group consisting of: shrillness, loudness, average pitch, pitch range, speaking rate and proportion of pauses ([0041][0042] speech attributes such as accent, pitch, or prosody).
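For illustration only, a rough NumPy sketch (hypothetical thresholds and frame sizes, not Bone's or the applicant's feature extraction) of how such vocal cues might be estimated from a mono waveform:

    import numpy as np

    def vocal_cues(x, sr=16000):
        # Frame the signal into 20 ms chunks and measure per-frame energy.
        frame = sr // 50
        frames = x[: len(x) // frame * frame].reshape(-1, frame)
        rms = np.sqrt((frames ** 2).mean(axis=1))
        loudness = float(rms.mean())
        pause_proportion = float((rms < 0.1 * rms.max()).mean())   # low-energy frames treated as pauses
        # Crude average-pitch estimate from the autocorrelation peak in the 60-400 Hz range.
        ac = np.correlate(x, x, mode="full")[len(x) - 1:]
        lo, hi = sr // 400, sr // 60
        pitch_hz = sr / (lo + int(np.argmax(ac[lo:hi])))
        return {"loudness": loudness, "pause_proportion": pause_proportion, "average_pitch_hz": pitch_hz}

    cues = vocal_cues(np.sin(2 * np.pi * 220 * np.arange(16000) / 16000))  # ~220 Hz test tone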
Claims 2 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Bone et al. (US Pub. 2021/0249035) in view of Triantafyllopoulos et al. ("Deep speaker conditioning for speech emotion recognition," July 2021) and further in view of Paraskevopoulos et al. (US Pub. 2020/0335086).
Regarding claim 2, Bone in view of Triantafyllopoulos discloses the method as claimed in claim 1.
Bone in view of Triantafyllopoulos does not explicitly teach, but Paraskevopoulos does teach:
wherein the step of receiving the counterfactual synthetic vocal sample comprises generating, using the processing device, the counterfactual synthetic vocal sample based on the vocal sample and the alternate emotion using a generative adversarial network (Paraskevopoulos, [Abstract][0026]-[0034] data augmentation for speech emotion recognition tasks using a Generative Adversarial Network (GAN) architecture to generate synthetic spectrograms).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of speech emotion recognition as taught by Bone with the method of data augmentation using GANs as taught by Paraskevopoulos, in order to improve results on a speech emotion recognition task and classification performance as compared to traditional speech data augmentation methods (Paraskevopoulos, [0008]).
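As a point of reference only, a minimal sketch (hypothetical dimensions, not Paraskevopoulos's actual architecture) of the GAN arrangement cited above, in which a generator produces synthetic spectrogram-like arrays for augmentation and a discriminator scores them as real or fake:

    import torch
    import torch.nn as nn

    latent_dim, n_mels, n_frames = 100, 64, 64

    generator = nn.Sequential(                     # latent noise -> flattened synthetic "spectrogram"
        nn.Linear(latent_dim, 512), nn.ReLU(),
        nn.Linear(512, n_mels * n_frames), nn.Tanh())

    discriminator = nn.Sequential(                 # flattened spectrogram -> real/fake probability
        nn.Linear(n_mels * n_frames, 512), nn.LeakyReLU(0.2),
        nn.Linear(512, 1), nn.Sigmoid())

    z = torch.randn(8, latent_dim)
    fake = generator(z).view(8, n_mels, n_frames)  # synthetic samples usable for augmentation
    scores = discriminator(fake.view(8, -1))       # discriminator outputs in [0, 1]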
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN whose telephone number is (571)272-5933. The examiner can normally be reached 9 AM-3PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Seong-ah A. Shin
Primary Examiner
Art Unit 2659
/SEONG-AH A SHIN/Primary Examiner, Art Unit 2659