Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office action is in response to application 18/816,052, filed on 08/27/2024. Claims 1-20 are pending in the application and have been considered.
Claim Objections
Claim 18 is objected to because of the following informalities: in lines 7-8, should “specific portion of the specific in the media content” be “specific portion of the specific body in the media content”? Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-14, 16, 19, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim recites “accessing a conversational artificial intelligence model; receiving audio data, the audio data includes an input from an entity in a natural language, the input includes at least a first part and a second part, the first part is associated with a first at least one suprasegmental feature, the second part is associated with a second at least one suprasegmental feature, the second part differs from the first part, the second at least one suprasegmental feature differs from the first at least one suprasegmental feature; using the conversational artificial intelligence model to analyze the audio data to generate a media content, the media content is based on the first at least one suprasegmental feature and the second at least one suprasegmental feature; and using the media content in a communication with the entity”.
The limitation of “accessing a conversational artificial intelligence model”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. For example, but for the “artificial intelligence” language, “accessing a conversational … model” in the context of this claim encompasses looking at a printed list of conversational responses mapped to inputs.
Similarly, the limitation of “receiving audio data, the audio data includes an input from an entity in a natural language, the input includes at least a first part and a second part, the first part is associated with a first at least one suprasegmental feature, the second part is associated with a second at least one suprasegmental feature, the second part differs from the first part, the second at least one suprasegmental feature differs from the first at least one suprasegmental feature”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “receiving audio data, the audio data includes an input from an entity in a natural language, the input includes at least a first part and a second part, the first part is associated with a first at least one suprasegmental feature, the second part is associated with a second at least one suprasegmental feature, the second part differs from the first part, the second at least one suprasegmental feature differs from the first at least one suprasegmental feature” in the context of this claim encompasses listening to a person speak a first portion of a sentence and yell a second portion of the sentence.
Similarly, the limitation of “using the conversational artificial intelligence model to analyze the audio data to generate a media content, the media content is based on the first at least one suprasegmental feature and the second at least one suprasegmental feature;”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. For example, but for the “artificial intelligence” language, “using the conversational … model to analyze the audio data to generate a media content, the media content is based on the first at least one suprasegmental feature and the second at least one suprasegmental feature” in the context of this claim encompasses consulting the paper conversational model to read how to respond to a sentence partially whispered and partially yelled, and writing down media content in the form of text on a piece of paper.
Similarly, the limitation of “using the media content in a communication with the entity” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “using the media content in a communication with the entity” in the context of this claim encompasses displaying the sheet of paper with the written media to the speaker.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim recites only four additional elements: “computer readable medium”, “instructions”, “processor”, and “artificial intelligence”. These computing elements are recited at a high level of generality (i.e., as a generic computer readable medium, generic instructions, a generic processor, and generic artificial intelligence) such that they amount to no more than mere instructions to apply the exception using generic computer elements. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a computing device to perform the accessing and using steps amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. The claim is not patent eligible.
Specifically, with respect to Step 2A, Prong Two, of the Alice/Mayo test, the judicial exception is not integrated into a practical application. Claim 1 does not recite any limitations, beyond the generic computer components noted above, that are not mental steps.
Specifically, with respect to Step 2B of the Alice/Mayo test, “the claim as a whole does not amount to significantly more than the exception itself (there is no inventive concept in the claim)”. MPEP 2106.05(II). There are no limitations in claim 1 outside of the judicial exception. As a whole, the claim does not appear to contain any inventive concept. As discussed above, claim 1 pertains to the mental process of receiving and analyzing audio data, and generating media content, which can be performed entirely by a human with physical aids.
Dependent claims 2-14 and 16 depend from claim 1, do not remedy any of the deficiencies of claim 1, and therefore are rejected on the same grounds as claim 1 above.
Generally, claims 2-14 and 16 merely recite additional steps for receiving and analyzing audio data, and generating media content, all of which could be performed mentally or by writing down text with a pen and paper, and do not amount to anything more than substantially the same abstract idea as explained with respect to claim 1.
Specifically:
Claim 2 recites “wherein the first at least one suprasegmental feature differs from the second at least one suprasegmental feature in at least one of intonation, stress, pitch, rhythm, tempo, loudness or prosody” which could be performed by listening for intonation, stress, pitch, rhythm, tempo, loudness or prosody in the speaker’s speech, and mentally analyzing it.
Claim 3 recites “wherein the first part includes at least a particular word, and the second part includes at least a particular non-verbal sound, and the generated media content is further based on the particular word and the particular non-verbal sound” which could be performed by listening for at least a particular word and a particular non-verbal sound, and formulating a text response based on those.
Claim 4 recites “wherein the first at least one suprasegmental feature is associated with a particular emotion of the entity, and wherein the generated media content is further based on the particular emotion” which could be performed by mentally analyzing emotions in the first segment of speech and formulating a text response based on the emotion.
Claim 5 recites “wherein the first at least one suprasegmental feature is associated with a particular intent of the entity, and wherein the generated media content is further based on the particular intent” which could be performed by mentally analyzing an intent in the first segment of speech and formulating a text response based on the intent.
Claim 6 recites “wherein the first at least one suprasegmental feature is associated with a particular level of empathy of the entity, and wherein the generated media content is further based on the particular level of empathy” which could be performed by mentally analyzing a particular level of empathy in the first segment of speech and formulating a text response based on the level of empathy.
Claim 7 recites “wherein the first at least one suprasegmental feature is associated with a particular level of self-assurance of the entity, and wherein the generated media content is further based on the particular level of self-assurance” which could be performed by mentally analyzing a particular level of self-assurance in the first segment of speech and formulating a text response based on the level of self-assurance.
Claim 8 recites “wherein the first at least one suprasegmental feature is associated with a particular level of formality, and wherein the generated media content is further based on the particular level of formality” which could be performed by mentally analyzing a particular level of formality in the first segment of speech and formulating a text response based on the level of formality.
Claim 9 recites “wherein the usage of the generated media content is configured to convey reacting to the input as a humoristic remark based on the first at least one suprasegmental feature and the second at least one suprasegmental feature” which could be performed by mentally formulating a written response designed to convey reacting to the input as a humoristic remark.
Claim 10 recites “wherein the usage of the generated media content is configured to convey reacting to the input as an offensive remark based on the first at least one suprasegmental feature and the second at least one suprasegmental feature” which could be performed by mentally formulating a written response designed to convey reacting to the input as an offensive remark.
Claim 11 recites “wherein the usage of the generated media content is configured to convey a particular emotion, the particular emotion is selected based on the first at least one suprasegmental feature and the second at least one suprasegmental feature” which could be performed by mentally formulating a written response designed to convey a mentally selected particular emotion.
Claim 12 recites “wherein the usage of the generated media content is configured to convey a level of empathy, the level of empathy is selected based on the first at least one suprasegmental feature and the second at least one suprasegmental feature” which could be performed by mentally formulating a written response designed to convey a mentally selected level of empathy.
Claim 13 recites “wherein the usage of the generated media content is configured to convey a level of self-assurance, the level of self-assurance is selected based on the first at least one suprasegmental feature and the second at least one suprasegmental feature” which could be performed by mentally formulating a written response designed to convey a mentally selected level of self-assurance.
Claim 14 recites “wherein the usage of the generated media content is configured to convey a selected reaction to the input, the selected reaction is selected based on the first at least one suprasegmental feature and the second at least one suprasegmental feature, and the selected reaction to the input is at least one of a positive reaction, negative reaction, engagement, show of interest, agreement, respect, disagreement, skepticism, disinterest, boredom, discomfort, uncertainty, confusion or neutrality” which could be performed by mentally formulating a written response designed to convey a selected reaction to the input of a positive reaction, negative reaction, engagement, show of interest, agreement, respect, disagreement, skepticism, disinterest, boredom, discomfort, uncertainty, confusion or neutrality.
Claim 16 recites “wherein operations further comprise: obtaining an indication of a characteristic of an ambient noise; and further basing the generation of the media content on the characteristic of the ambient noise” which could be performed by listening for ambient noise and mentally formulating a written response based in part on the ambient noise.
In sum, claims 2-14 and 16 depend from claim 1 and further recite mental processes as explained above. None of the additional limitations recited in claims 2-14 and 16 amount to anything more than the same or a similar abstract idea as recited in claim 1. Nor do any limitations in claims 2-14 and 16 (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception. Claims 2-14 and 16 are not patent eligible.
Claim 19 is directed to a system that corresponds to the computer readable medium of claim 1 and is therefore rejected for the same reasons set forth above with respect to claim 1. While claim 19 recites generic computer components (processing unit, operations), such generic computing components are recited at a high level of generality (i.e., as a generic processor performing generic computer operations) such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Claim 19 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional limitations of using a generic processor, generic operations, and generic artificial intelligence amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. Claim 19 is not patent eligible.
Claim 20 is directed to a method that corresponds to the computer readable medium of claim 1 and the system of claim 19 and is therefore rejected for the same reasons set forth above with respect to claims 1 and 19. As discussed above with respect to integration of the abstract idea into a practical application, the additional limitation of using generic artificial intelligence amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. Claim 20 is not patent eligible.
Eligible Claims
Claim 15 recites “wherein the operations further comprise: calculating a convolution of a fragment of the audio data associated with the first part to obtain a first plurality of numerical result values; calculating a convolution of a fragment of the audio data associated with the second part to obtain a second plurality of numerical result values; calculating a function of the first plurality of numerical result values and the second plurality of numerical result values to obtain a specific mathematical object in a mathematical space; and basing the generation of the media content on the specific mathematical object” which cannot be practically performed as a mental process.
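By way of illustration, a computation of the kind recited in claim 15 could be sketched as follows. This is a minimal, hypothetical sketch: the kernel, the fragment contents, and the combining function are illustrative assumptions, not the applicant's disclosed implementation. It serves only to show that the recited convolutions over raw audio entail thousands of multiply-accumulate operations per fragment and so cannot practically be performed in the mind.
```python
# Illustrative sketch only; all names and values are hypothetical, not the
# applicant's disclosed implementation.
import numpy as np

def convolve_fragment(fragment: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Convolve an audio fragment with a filter kernel to obtain a
    plurality of numerical result values."""
    return np.convolve(fragment, kernel, mode="valid")

rng = np.random.default_rng(0)
first_part = rng.standard_normal(16000)    # stands in for one second at 16 kHz
second_part = rng.standard_normal(16000)
kernel = np.hamming(256)                   # a generic smoothing kernel

r1 = convolve_fragment(first_part, kernel)   # first plurality of result values
r2 = convolve_fragment(second_part, kernel)  # second plurality of result values

# A function of both result sets yielding a specific mathematical object
# (here, simply a point in R^2 formed from the two fragment energies).
point = np.array([np.mean(r1 ** 2), np.mean(r2 ** 2)])
print(point)  # generation of the media content could then be based on this object
```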
Claim 17 recites “wherein the operations further comprise: using the conversational artificial intelligence model to analyze the audio data to determine a desired at least one suprasegmental feature, the desired at least one suprasegmental feature is based on the first at least one suprasegmental feature and the second at least one suprasegmental feature; and using the desired at least one suprasegmental feature to generate an audible speech in the media content” which cannot be practically performed as a mental process.
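By way of further illustration, determining a desired suprasegmental feature from the two parts and using it to generate audible speech could be sketched as follows. This is a minimal, hypothetical sketch in which the averaging rule, the pitch values, and the sine-wave rendering are illustrative assumptions, not the applicant's disclosed method; it shows only that such generation is a signal-synthesis operation rather than a mental step.
```python
# Illustrative sketch only; hypothetical names and logic.
import numpy as np

def desired_pitch(pitch_first: float, pitch_second: float) -> float:
    # Hypothetical rule: the desired pitch is the mean of the two parts' pitches.
    return (pitch_first + pitch_second) / 2.0

sr = 16000
target = desired_pitch(110.0, 220.0)            # Hz, from the first and second parts
t = np.arange(sr) / sr                          # one second of sample times
audible = 0.1 * np.sin(2 * np.pi * target * t)  # waveform at the desired pitch
# `audible` stands in for the audible speech generated in the media content.
```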
Claim 18 recites “wherein the operations further comprise: using the conversational artificial intelligence model to analyze the audio data to determine a desired movement for a specific portion of a specific body, the desired movement is based on the first at least one suprasegmental feature and the second at least one suprasegmental feature; and using the desired movement for the specific portion of the specific body to generate a visual depiction of the desired movement to the specific portion of the specific in the media content” which cannot be practically performed as a mental process.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-5, 11-14, 16, 17, 19, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Luan et al. (US 20210043208).
Consider claim 1, Luan discloses a non-transitory computer readable medium storing computer implementable instructions that when executed by at least one processor cause the at least one processor to perform operations (non-transitory computer-readable medium may comprise instructions that when executed, cause one or more processors to perform the methods, [0203]) for audio analysis for media content generation (generating a speech response based on analyzing sound input, [0003], [0070]), the operations comprising:
accessing a conversational artificial intelligence model (an artificial intelligence chatbot, [0001], [0052]);
receiving audio data, the audio data includes an input from an entity in a natural language, the input includes at least a first part and a second part, the first part is associated with a first at least one suprasegmental feature, the second part is associated with a second at least one suprasegmental feature, the second part differs from the first part, the second at least one suprasegmental feature differs from the first at least one suprasegmental feature (first and second sound inputs are received during a conversation between the user and the chatbot, [0074], [0080], Fig. 5, e.g. “I’m so angry”, and “I really love you”, having differing emotion vectors, [0099], [0107], determined by analyzing the speech segments for acoustic features such as tone, loudness, pitch, etc., [0096]);
using the conversational artificial intelligence model to analyze the audio data to generate a media content, the media content is based on the first at least one suprasegmental feature and the second at least one suprasegmental feature (based on the emotion vector differences, conversational chatbot generates speech responses, [0100]-[0107], Fig. 8); and
using the media content in a communication with the entity (e.g. “Me too” is generated and presented to the user in the form of a speech message, [0107], Fig. 8).
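For context, the kind of per-part acoustic analysis attributed to Luan above (tone, loudness, pitch, [0096]; emotion-vector differences, [0106]) could be sketched as follows. This is an illustrative, assumption-laden sketch, not Luan's disclosed implementation; the two-element feature vector and the autocorrelation pitch estimate are hypothetical stand-ins.
```python
# Illustrative sketch only; not Luan's disclosed implementation.
import numpy as np

def acoustic_features(part: np.ndarray, sr: int = 16000) -> np.ndarray:
    """RMS loudness plus a crude pitch proxy from the autocorrelation peak."""
    rms = np.sqrt(np.mean(part ** 2))
    ac = np.correlate(part, part, mode="full")[len(part):]  # lags 1..N-1
    lag = int(np.argmax(ac[40:400])) + 40  # search 40-400 sample lags (~40-400 Hz)
    return np.array([rms, sr / lag])

rng = np.random.default_rng(1)
first = rng.standard_normal(4000)          # stands in for "I'm so angry" (0.25 s)
second = 3.0 * rng.standard_normal(4000)   # a louder second input
difference = acoustic_features(second) - acoustic_features(first)
print(difference)  # analogous to an emotion-vector difference (cf. Fig. 8)
```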
Consider claim 19, Luan discloses a system for audio analysis for media content generation (apparatus for generating a speech response based on analyzing sound input, [0003], [0070]), the system comprising at least one processing unit configured to perform operations (apparatus includes processors which execute instructions, [0202]), the operations comprise:
accessing a conversational artificial intelligence model (an artificial intelligence chatbot, [0001], [0052]);
receiving audio data, the audio data includes an input from an entity in a natural language, the input includes at least a first part and a second part, the first part is associated with a first at least one suprasegmental feature, the second part is associated with a second at least one suprasegmental feature, the second part differs from the first part, the second at least one suprasegmental feature differs from the first at least one suprasegmental feature (first and second sound inputs are received during a conversation between the user and the chatbot, [0074], [0080], Fig. 5, e.g. “I’m so angry”, and “I really love you”, having differing emotion vectors, [0099], [0107], determined by analyzing the speech segments for acoustic features such as tone, loudness, pitch, etc., [0096]);
using the conversational artificial intelligence model to analyze the audio data to generate a media content, the media content is based on the first at least one suprasegmental feature and the second at least one suprasegmental feature (based on the emotion vector differences, conversational chatbot generates speech responses, [0100]-[0107], Fig. 8); and
using the media content in a communication with the entity (e.g. “Me too” is generated and presented to the user in the form of a speech message, [0107], Fig. 8).
Consider claim 20, Luan discloses a method for audio analysis for media content generation (method for generating a speech response based on analyzing sound input, [0003], [0070]), the method comprising:
accessing a conversational artificial intelligence model (an artificial intelligence chatbot, [0001], [0052]);
receiving audio data, the audio data includes an input from an entity in a natural language, the input includes at least a first part and a second part, the first part is associated with a first at least one suprasegmental feature, the second part is associated with a second at least one suprasegmental feature, the second part differs from the first part, the second at least one suprasegmental feature differs from the first at least one suprasegmental feature (first and second sound inputs are received during a conversation between the user and the chatbot, [0074], [0080], Fig. 5, e.g. “I’m so angry”, and “I really love you”, having differing emotion vectors, [0099], [0107], determined by analyzing the speech segments for acoustic features such as tone, loudness, pitch, etc., [0096]);
using the conversational artificial intelligence model to analyze the audio data to generate a media content, the media content is based on the first at least one suprasegmental feature and the second at least one suprasegmental feature (based on the emotion vector differences, conversational chatbot generates speech responses, [0100]-[0107], Fig. 8); and
using the media content in a communication with the entity (e.g. “Me too” is generated and presented to the user in the form of a speech message, [0107], Fig. 8).
Consider claim 2, Luan discloses the first at least one suprasegmental feature differs from the second at least one suprasegmental feature in at least one of intonation, stress, pitch, rhythm, tempo, loudness or prosody (emotion changes are detected by computing acoustic features of the user’s speech such as tone, loudness, pitch, etc. and analyzing differences between different speech inputs, [0096], Fig. 7).
Consider claim 3, Luan discloses the first part includes at least a particular word (user’s recognized words are interpreted by emotion module to understand the user’s emotion condition, [0059], e.g. “I’m so angry” including the particular word “angry”, [0099]), and the second part includes at least a particular non-verbal sound (physical status indicator analyzes sounds such as sneezing, coughing, yawning, etc., to understand the user’s physical status condition, [0060]; in the case where this sound follows a word, it would be in the “second part”; see Fig. 14, elements 1420 and 1435, where speech is followed by sneezing), and the generated media content is further based on the particular word and the particular non-verbal sound (physical status derived from the non-verbal sounds and emotion derived from the particular word are used to generate a response, [0067]).
Consider claim 4, Luan discloses the first at least one suprasegmental feature is associated with a particular emotion of the entity, and wherein the generated media content is further based on the particular emotion (based on the emotion vector differences, e.g. a change from angry to surprise, conversational chatbot generates speech responses, [0100]-[0107], Fig. 8).
Consider claim 5, Luan discloses the first at least one suprasegmental feature is associated with a particular intent of the entity (e.g. “I want to go to the zoo”, Fig 4, [0070]; the emotion vector for this speech is associated with the user’s intent to go to the zoo, [0074]), and wherein the generated media content is further based on the particular intent (e.g. “That’s good. Have a nice day!”, response is generated based on semantic content of user input, i.e. intent, and emotion vectors, [0070-0076], [0079]).
Consider claim 11, Luan discloses the usage of the generated media content is configured to convey a particular emotion, the particular emotion is selected based on the first at least one suprasegmental feature and the second at least one suprasegmental feature (e.g. generation of “Don’t mind that. I care about your feeling” in response to the vector difference from user utterances 1330 and 1340, Fig. 13, [0161]; generation of the response with semantic information is considered to “select” an emotion of empathy based on the selected words).
Consider claim 12, Luan discloses the usage of the generated media content is configured to convey a level of empathy, the level of empathy is selected based on the first at least one suprasegmental feature and the second at least one suprasegmental feature (e.g. “Don’t mind that. I care about your feeling” in response to the vector difference from user utterances 1330 and 1340, Fig 13, [0161]; generation of the response with semantic information is considered to “select” a level of empathy based on the selected words).
Consider claim 13, Luan discloses the usage of the generated media content is configured to convey a level of self-assurance, the level of self-assurance is selected based on the first at least one suprasegmental feature and the second at least one suprasegmental feature (e.g. “It’s no big deal” is considered to convey a level of self-assurance based on the semantic selection of words, i.e. to select a level of self-assurance in the response based on the emotion vector difference and semantic information, [0114], Fig. 13).
Consider claim 14, Luan discloses the usage of the generated media content is configured to convey a selected reaction to the input, the selected reaction is selected based on the first at least one suprasegmental feature and the second at least one suprasegmental feature, and the selected reaction to the input is at least one of a positive reaction, negative reaction, engagement, show of interest, agreement, respect, disagreement, skepticism, disinterest, boredom, discomfort, uncertainty, confusion or neutrality (e.g. “Really?” is considered to convey skepticism based on semantic selection of words based on the emotion vector difference and semantic information, [0115], Fig. 13).
Consider claim 16, Luan discloses operations further comprise: obtaining an indication of a characteristic of an ambient noise (e.g. street noise, Fig 14 element 1445, [0168]); and further basing the generation of the media content on the characteristic of the ambient noise (based in part on the street noise, the chatbot generates “Where are you now?”, Fig 14 element 1440, [0169]).
Consider claim 17, Luan discloses the operations further comprise: using the conversational artificial intelligence model to analyze the audio data to determine a desired at least one suprasegmental feature, the desired at least one suprasegmental feature is based on the first at least one suprasegmental feature and the second at least one suprasegmental feature (based on emotion changes, determining to generate a response to e.g. sympathize with the user, [0063], Fig. 13 element 1340, [0161]); and using the desired at least one suprasegmental feature to generate an audible speech in the media content (generate a response most likely to achieve the desired condition changes in emotion, [0063], [0161]).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 6-8 are rejected under 35 U.S.C. 103 as being unpatentable over Luan et al. (US 20210043208) in view of Mackay et al. (US 20220245354).
Consider claim 6, Luan discloses the first at least one suprasegmental feature is associated with a particular level of emotion of the entity, and wherein the generated media content is further based on the particular level of emotion (based on the emotion vector differences, which represent levels of emotions, e.g. a change from angry to surprise, conversational chatbot generates speech responses, [0100]-[0107], Fig. 8).
Luan does not specifically mention empathy of the entity.
Mackay discloses empathy of the entity (dimensions analyzed for emotional classification include empathy, [0095]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Luan by analyzing empathy of the entity in order to overcome the known difficulties in sentiment analysis identified by Mackay ([0009]). Doing so would have led to the predictable result of more robust and accurate detection of emotion, as suggested by Mackay ([0011]). The references cited are analogous art in the same field of natural language processing.
Consider claim 7, Luan discloses the first at least one suprasegmental feature is associated with a particular level of emotion of the entity, and wherein the generated media content is further based on the particular level of emotion (based on the emotion vector differences, which represent levels of emotions, e.g. a change from angry to surprise, conversational chatbot generates speech responses, [0100]-[0107], Fig. 8).
Luan does not specifically mention self-assurance of the entity.
Mackay discloses self-assurance of the entity (dimensions analyzed for emotional classification include self-image/self-esteem, [0095]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Luan by analyzing self-assurance of the entity for reasons similar to those for claim 6.
Consider claim 8, Luan discloses the first at least one suprasegmental feature is associated with a particular level of emotion of the entity, and wherein the generated media content is further based on the particular level of emotion (based on the emotion vector differences, which represent levels of emotions, e.g. a change from angry to surprise, conversational chatbot generates speech responses, [0100]-[0107], Fig. 8).
Luan does not specifically mention formality of the entity.
Mackay discloses formality of the entity (dimensions analyzed for emotional classification include formality, [0095]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Luan by analyzing formality of the entity for reasons similar to those for claim 6.
Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Luan et al. (US 20210043208) in view of Kwatra et al. (US 20200322299).
Consider claim 9, Luan discloses the usage of the generated media content is configured to convey reacting to the input as a remark based on the first at least one suprasegmental feature and the second at least one suprasegmental feature (based on the emotion vector differences between user remarks, e.g. a change from angry to surprise, conversational chatbot generates speech responses, [0100]-[0107], Fig. 8).
Luan does not specifically mention a humoristic remark.
Kwatra discloses a humoristic remark (a message that includes humor, [0022]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Luan by detecting a humoristic remark in order to facilitate better communication and conversations, as suggested by Kwatra ([0003]-[0004]), predictably leading to more effective dialogues, as suggested by Kwatra ([0004]). The references cited are analogous art in the same field of natural language processing.
Consider claim 10, Luan discloses the usage of the generated media content is configured to convey reacting to the input as a remark based on the first at least one suprasegmental feature and the second at least one suprasegmental feature (based on the emotion vector differences between user remarks, e.g. a change from angry to surprise, conversational chatbot generates speech responses, [0100]-[0107], Fig. 8).
Luan does not specifically mention an offensive remark.
Kwatra discloses an offensive remark (a message that includes humor that may be considered offensive, [0022]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Luan by detecting an offensive remark for reasons similar to those for claim 9.
Claims 15 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Luan et al. (US 20210043208) in view of Villanueva Aylagas et al. (US 12406419).
Consider claim 15, Luan discloses the operations further comprise: calculating a fragment of the audio data associated with the first part to obtain a first plurality of numerical result values (MFCCs for First Sound Input 502, [0025], [0074]); calculating a fragment of the audio data associated with the second part to obtain a second plurality of numerical result values (MFCCs for Second Sound Input 502, [0025], [0074]); calculating a function of the first plurality of numerical result values and the second plurality of numerical result values to obtain a specific mathematical object in a mathematical space (emotion vectors based on the acoustic features for each of the inputs, and vector differences, [0025], [0074], Fig. 8, [0106]); and basing the generation of the media content on the specific mathematical object (generating the response “Sure, we are close friends” based on the vector difference, [0106]).
Luan does not specifically mention a convolution of a fragment of the audio data.
Villanueva Aylagas discloses convolution of a fragment of the audio data (convolution of acoustic features for a sequence of frames, Col. 9, lines 60-62).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Luan by including a convolution of a fragment of the audio data in order to generate more realistic interactions with the user, predictably improving interaction quality, as suggested by Villanueva Aylagas (Col. 4, line 60 - Col. 5, line 6). The references cited are analogous art in the same field of natural language processing.
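For context, a convolution of acoustic features for a sequence of frames, of the general kind cited from Villanueva Aylagas, could be sketched as follows (hypothetical frame shapes and kernel; not the reference's actual network):
```python
# Illustrative sketch only; hypothetical shapes and kernel.
import numpy as np

def temporal_convolution(frames: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Convolve each feature channel across the frame (time) axis.
    frames: (num_frames, num_features); kernel: (kernel_width,)."""
    return np.stack(
        [np.convolve(frames[:, f], kernel, mode="valid")
         for f in range(frames.shape[1])],
        axis=1,
    )

rng = np.random.default_rng(2)
mfcc_frames = rng.standard_normal((100, 13))  # e.g. 100 frames of 13 MFCCs
kernel = np.array([0.25, 0.5, 0.25])          # a simple 3-frame smoothing kernel
out = temporal_convolution(mfcc_frames, kernel)
print(out.shape)                              # (98, 13): convolved feature sequence
```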
Consider claim 18, Luan discloses the operations further comprise: using the conversational artificial intelligence model to analyze the audio data to determine a desired response, the desired response is based on the first at least one suprasegmental feature and the second at least one suprasegmental feature (based on emotion changes detected between the sound inputs, determining to generate a response to e.g. sympathize with the user, [0063], Fig. 13 element 1340, [0161]).
Luan does not specifically mention a desired movement for a specific portion of a specific body; and using the desired movement for the specific portion of the specific body to generate a visual depiction of the desired movement to the specific portion of the specific in the media content.
Villanueva Aylagas discloses a desired movement for a specific portion of a specific body (desired animations of the lips, eyes, etc. from speech input, Col. 4, line 57 - Col. 5, line 14); and using the desired movement for the specific portion of the specific body to generate a visual depiction of the desired movement to the specific portion of the specific in the media content (the animation with movement of the lips, eyes, etc. is displayed, Col. 4, line 57 - Col. 5, line 14, Col. 7, lines 19-28).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Luan by including a desired movement for a specific portion of a specific body, and using the desired movement for the specific portion of the specific body to generate a visual depiction of the desired movement to the specific portion of the specific in the media content for reasons similar to those for claim 15.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20210319780 Aher discloses determining prosodic characteristics of voice input, i.e. audio analysis, to inform the type and character of a synthesized speech response, i.e. media content generation.
US 20240169974 Bonar discloses a real-time system for spoken natural stylistic conversations with large language models.
US 20200279553 McDuff discloses a linguistic style matching conversational agent.
US 12333258 Tiwari discloses multi-level emotional enhancement of dialog during a conversation.
US 20210358488 Iyer discloses performing real-time sentiment modulation in conversation systems.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias, whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders can be reached on 571/272-7516.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Jesse S Pullias/
Primary Examiner, Art Unit 2655 02/25/26