Prosecution Insights
Last updated: April 19, 2026
Application No. 18/429,601

SPEECH SIGNAL PROCESSING APPARATUS, SPEECH SIGNAL REPRODUCTION SYSTEM AND METHOD FOR OUTPUTTING A DE-EMOTIONALIZED SPEECH SIGNAL

Non-Final OA: §101, §102, §103, §112

Filed: Feb 01, 2024
Examiner: MEIS, JON CHRISTOPHER
Art Unit: 2654
Tech Center: 2600 — Communications
Assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
OA Round: 1 (Non-Final)
Grant Probability: 46% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 3y 0m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 46% (grants 10 of 22 resolved cases; -16.5% vs Tech Center average)
Interview Lift: +59.0% (strong; measured on resolved cases with an interview)
Typical Timeline: 3y 0m average prosecution; 30 applications currently pending
Career History: 52 total applications across all art units
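Sanity-checking how these figures relate (a minimal sketch; the page does not publish its formulas, so the relationship assumed below, that the interview lift is the percentage-point gap between with-interview and without-interview allow rates, is an inference):

```python
# Hypothetical reconstruction of the dashboard arithmetic; the page's
# actual methodology is not disclosed.
granted, resolved = 10, 22
career_allow_rate = granted / resolved
print(f"{career_allow_rate:.1%}")      # 45.5% (displayed above as 46%)

# Assumed: lift = allow rate with an interview minus the rate without.
with_interview = 0.99                  # "99% With Interview"
lift = 0.59                            # "+59.0% Interview Lift"
print(f"{with_interview - lift:.0%}")  # implies ~40% without an interview
```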

Statute-Specific Performance

§101: 24.9% (-15.1% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§103: 49.7% (+9.7% vs TC avg)
§112: 10.6% (-29.4% vs TC avg)
Deltas are measured against a Tech Center average estimate • Based on career data from 22 resolved cases
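One consistency check on the deltas: each statute's rate minus its stated "vs TC avg" figure lands on the same implied Tech Center baseline. A short verification (the 40.0% baseline is inferred from the deltas, not stated on the page):

```python
# rate and delta in percent, as listed above
stats = {"§101": (24.9, -15.1), "§102": (12.9, -27.1),
         "§103": (49.7, +9.7),  "§112": (10.6, -29.4)}
for statute, (rate, delta) in stats.items():
    print(f"{statute}: implied TC average = {rate - delta:.1f}%")  # 40.0% each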

Office Action

Rejections: §101, §102, §103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claims 1-27 are pending. Claims 1, 15, 26, and 27 are independent. This Application was published as US 20240169999. Apparent priority is 2 August 2021. The instant Application is directed to a method of detecting and removing emotion from speech.

Specification

The disclosure is objected to because of the following informalities: on page 8, line 2, both instances of “FS” are understood to mean “SF”. Appropriate correction is required.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “speech signal detection apparatus,” “analysis apparatus,” “processing apparatus,” “coupling apparatus,” and “reproduction apparatus” in claims 1-14 and 27; “storage apparatus” in claim 2; “compensation apparatus” in claim 13; and “speech signal processing apparatus” in claim 25.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
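The three-prong test above is a conjunctive checklist: all three prongs must be met before a limitation is construed under 112(f). A schematic sketch of that logic (the field names are hypothetical stand-ins; each prong is ultimately a legal judgment, not an automatable predicate):

```python
from dataclasses import dataclass

@dataclass
class Limitation:
    uses_means_or_nonce_term: bool      # prong (A)
    has_functional_language: bool       # prong (B)
    recites_sufficient_structure: bool  # if True, prong (C) is not met

def invokes_112f(lim: Limitation) -> bool:
    """Schematic of the MPEP 2181(I) three-prong test."""
    return (lim.uses_means_or_nonce_term
            and lim.has_functional_language
            and not lim.recites_sufficient_structure)

# e.g., "analysis apparatus for analyzing the speech signal":
print(invokes_112f(Limitation(True, True, False)))  # True
```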
Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 2-3, 6, 9, 11, 14, and 19-23 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claims 2-3, 6, 9, 11, and 19-23, the phrase "in particular" renders the claims indefinite because it is unclear whether the limitation(s) following the phrase are part of the claimed invention. See MPEP § 2173.05(d).

Claim 14 recites the limitation "The speech signal reproduction system" in line 1. There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 101

Claims 1-27 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Step 1: The independent Claims are directed to statutory categories: Claim 1 is a device claim and directed to the machine or manufacture category of patentable subject matter. Claim 15 is a method claim and directed to the process category of patentable subject matter. Claim 26 is a non-transitory medium claim and is directed to the machine or manufacture category of patentable subject matter. Claim 27 is a device claim and directed to the machine or manufacture category of patentable subject matter.

Step 2A, Prong One: Does the Claim recite a Judicially Recognized Exception, i.e., an Abstract Idea? Are these Claims considered Abstract as a Mathematical Concept (mathematical relationships, mathematical formulas or equations, mathematical calculations), a Mental Process (concepts performed in the human mind, including an observation, evaluation, judgment, or opinion), or Certain Methods of Organizing Human Activity (1 - fundamental economic principles or practices, including hedging, insurance, and mitigating risk; 2 - commercial or legal interactions, including agreements in the form of contracts, legal obligations, advertising, marketing or sales activities or behaviors, and business relations; 3 - managing personal behavior or relationships or interactions between people, including social activities, teaching, and following rules or instructions), and so fall under the judicial exception to patentable subject matter? The rejected Claims recite Mental Processes.

Step 2A, Prong Two: Are there Additional Elements that Integrate the Judicial Exception into a Practical Application? This step involves identifying whether there are any additional elements recited in the claim beyond the judicial exception(s), and evaluating those additional elements to determine whether they integrate the exception into a practical application of the exception. “Integration into a practical application” requires an additional element(s) or a combination of additional elements in the claim to apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the exception. The analysis uses the considerations laid out by the Supreme Court and the Federal Circuit to evaluate whether the judicial exception is integrated into a practical application. The rejected Claims do not include additional limitations that point to integration of the abstract idea into a practical application and are therefore directed to a Mental Process.

Claim 1 is a generic automation of a mental process because a human agent can sense the emotional state of a customer and censor a transcription or interpretation. Prong Two of Step 2A in the 101 analysis asks whether the abstract idea is integrated with a practical application. The answer is no in this instance because there is no technological solution in the Claim that “integrates” the abstract idea. The Claim only suggests that the abstract idea be applied; it does not describe an application.

1. A speech signal processing apparatus for outputting a de-emotionalized speech signal in real time or after a time period, the speech signal processing apparatus comprising: [an interpreter can process a speech signal into another language]
- a speech signal detection apparatus for detecting a speech signal including at least one piece of emotion information and at least one piece of word information; [interpreter hears a rude word in an angry tone]
- an analysis apparatus for analyzing the speech signal with respect to the at least one piece of emotion information and the at least one piece of word information, [interpreter decides it is inappropriate]
- a processing apparatus for dividing the speech signal into the at least one piece of word information and into the at least one piece of emotion information and for processing the speech signal; and [interpreter makes a note of the word and the emotion]
- a coupling apparatus and/or a reproduction apparatus for reproducing the speech signal as de-emotionalized speech signal including the at least one piece of emotion information converted into a further piece of word information and/or the at least one piece of word information. [interpreter does not translate the rude word but instead says that the other party is angry]

Step 2B: Search for an Inventive Concept: the additional elements do not amount to Significantly More. The limitations of “speech processing apparatus” are well-understood, routine, and conventional machine components that are being used for their well-understood, routine, conventional, and rather generic functions. Additionally, these limitations are expressed parenthetically and lack nexus to the Claim language and as such are a separable and divisible mention of a machine. Accordingly, they are not sufficient to cause the Claim to amount to significantly more than the underlying abstract idea.

The Dependent Claims do not add limitations that could help the Claim as a whole to amount to significantly more than the Abstract idea identified for the Independent Claim:

2. The speech signal processing apparatus according to claim 1 comprising a storage apparatus storing the de-emotionalized speech signal and/or the detected speech signal to reproduce the de-emotionalized speech signal at any time, in particular to reproduce the stored speech signal at more than a single arbitrary time as a de-emotionalized speech signal. [interpreter writes down a transcription noting that the speech was angry]

3. The speech signal processing apparatus according to claim 1, wherein the processing apparatus is configured to recognize the further piece of word information comprised in the emotion information and to translate the same into a de-emotionalized speech signal and to forward the same to the reproduction apparatus for reproduction by the reproduction apparatus or to the coupling apparatus that is configured to connect to an external reproduction apparatus, in particular a smartphone or a tablet, to transmit the de-emotionalized signal for reproduction of the same. [agent reads the transcription; outputting by a smartphone or tablet amounts to necessary data outputting]

4. The speech signal processing apparatus according to claim 1, wherein the analysis apparatus is configured to analyze a disturbing noise and/or a piece of emotion information in the speech signal and the processing apparatus is configured to remove the analyzed disturbing noise and/or the piece of emotion information from the speech signal. [interpreter translates but omits the rude portion]

5. The speech signal processing apparatus according to claim 1, wherein the reproduction apparatus is configured to reproduce the de-emotionalized speech signal without the piece of emotion information or with the piece of emotion information that has been transcribed into the further piece of word information and/or with a newly impressed piece of emotion information. [interpreter translates but uses a kinder word]

6. The speech signal processing apparatus according to claim 1, wherein the reproduction apparatus comprises a loudspeaker and/or a display to reproduce the de-emotionalized speech signal, in particular in simplified language, by an artificial voice and/or by displaying a computer-written text and/or by generating and displaying picture card symbols and/or by animation of sign language. [this is merely necessary data output; the agent can speak in simplified language, in a monotone voice, transcribe text, point to picture cards, or use sign language]

7. The speech signal processing apparatus according to claim 1, wherein the analysis apparatus comprises a neuronal network that is configured to transcribe the piece of emotion information into the further piece of word information based on training data or based on a rule-based transcription. [neuronal network is recited at a generic level that amounts to reciting generic computing components]

8. The speech signal processing apparatus according to claim 1, wherein the speech signal processing apparatus comprises a GPS unit and/or a speaker recognition system configured to detect a current location coordinate of the speech signal processing apparatus and/or to recognize the speaker providing the speech signal and to adjust, based on the detected current location coordinate and/or speaker information, associated pre-settings for transcription at the speech signal processing apparatus. [interpreter recognizes the user is from a particular region and avoids language that is offensive in that region]

9. The speech signal processing apparatus according to claim 1 comprising signal exchange means configured to perform a signal transmission of a detected speech signal with one or several other speech signal processing apparatuses, in particular via radio or Bluetooth or LiFi (Light Fidelity). [this amounts to mere data output]

10. The speech signal processing apparatus according to claim 1 comprising an operating interface configured to divide the at least one piece of emotion information according to preferences set by a user into an undesired piece of emotion information and/or into a neutral piece of emotion information and/or into a positive piece of emotion information. [interpreter avoids a word the user has objected to in the past]

11. The speech signal processing apparatus according to claim 10 that is configured to categorize the at least one detected piece of emotion information into classes of different disturbing qualities, in particular those comprising the following allocation: Class 1 “very disturbing”, Class 2 “disturbing”, Class 3 “less disturbing”, Class 4 “not disturbing at all” and to reduce or suppress the at least one detected piece of emotion information that has been categorized in one of the Classes 1 “very disturbing” or Class 2 “disturbing” and/or to add the at least one detected piece of emotion information that has been categorized into one of the Classes 3 “less disturbing” or Class 4 “not disturbing at all” to the de-emotionalized speech signal and/or to add a generated piece of emotion information to the de-emotionalized signal in order to support comprehension of the de-emotionalized speech signal by a user. [interpreter uses a 4-class rating system and only translates less objectionable speech]

12. The speech signal processing apparatus according to claim 1 comprising a sensor that is configured, when in contact with a user, to identify undesired and/or mutual and/or positive emotion information for the user, wherein the sensor is configured to measure bio signals, such as to perform a neurophysiological measurement or to capture and evaluate an image of a user. [interpreter feels the user's pulse and stops if an increased heart rate is detected or if the user looks sad]

13. The speech signal processing apparatus according to claim 1 comprising a compensation apparatus that is configured to compensate an individual hearing impairment associated with a user by non-linear and/or frequency-dependent amplification of the de-emotionalized speech signal. [interpreter speaks in a low voice because the user has lost hearing of high frequencies]

14. The speech signal reproduction system comprising two or more speech signal processing apparatuses according to claim 1. [interpreter translates both ways between two users]

The additional limitations introduced by the Dependent Claims are not sufficient as additional elements that integrate the judicial exception into a practical application or as additional elements that cause the Claim as a whole to amount to substantially more than the underlying abstract idea.

With respect to Independent Claim 15, independent Claim 26, and independent Claim 27, which have limitations similar to the limitations of Claim 1, the limitations of “A non-transitory digital storage medium” and “neuronal network” are expressed parenthetically and lack nexus to the Claim language and as such are a separable and divisible mention of a machine. Accordingly, they do not include additional limitations that cause the Claim as a whole to amount to more than the underlying abstract idea.

The Dependent Claims 16, 18-22, and 24-25 are similar to claims 2, 6, and 8-13 and do not add limitations that could integrate the judicial exception into a practical application or help the Claim as a whole to amount to significantly more than the Abstract idea identified for the Independent Claim. Claim 17 does not add any meaningful limitation because when n = 1, it is similar to claim 1. Additionally, an interpreter could provide several translation options.
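To make claim 11's four-class scheme concrete, here is a minimal sketch of the claimed filtering logic (the class labels come from the claim; the `classify` callable is a placeholder for whatever preference-driven model assigns a class, which the claim leaves open):

```python
from enum import IntEnum
from typing import Callable, Iterable, List

class Disturbance(IntEnum):      # allocation recited in claim 11
    VERY_DISTURBING = 1
    DISTURBING = 2
    LESS_DISTURBING = 3
    NOT_DISTURBING_AT_ALL = 4

def filter_emotion(segments: Iterable[str],
                   classify: Callable[[str], Disturbance]) -> List[str]:
    """Reduce/suppress Class 1-2 emotion information; add Class 3-4
    information to the de-emotionalized speech signal."""
    return [s for s in segments
            if classify(s) >= Disturbance.LESS_DISTURBING]
```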
Claim 23 is similar to claim 4 and claim 10 and does not add limitations that could integrate the judicial exception into a practical application or help the Claim as a whole to amount to significantly more than the Abstract idea identified for the Independent Claim.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-6, 8-10, 14-18, 21, 23, and 25-26 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Shah et al. (US 20220293122 A1).

Regarding claim 1, Shah discloses:

1. A speech signal processing apparatus ("[0078] ... In another embodiment, the hardware component may comprise a general-purpose microprocessor (e.g., CPU, GPU) that is first converted to a special-purpose microprocessor…") for outputting a de-emotionalized speech signal in real time or after a time period, the speech signal processing apparatus comprising: ("[0006] In one embodiment, the content of a communication is emotionally “flattened” so as to present the content of what is said in a call, or typed in a text message, to the agent, but with some or all of the negative emotionally-charged content removed...")
- a speech signal detection apparatus for detecting a speech signal ("[0065]...Process 300 begins and a communication is initiated in step 302. The communication may be inbound (e.g., initiated by user 102 using user device 104 to agent device 112 and agent 114) or outbound (e.g., initiated by agent device 112, server 108, and/or another component to user device 104) ...") including at least one piece of emotion information and at least one piece of word information; ("[0053] As introduced above, what is considered emotionally charged content may include words, phrases, gestures, intonations, volume (i.e., audio waveform amplitude), images, etc…")
- an analysis apparatus for analyzing the speech signal with respect to the at least one piece of emotion information and the at least one piece of word information, ("[0006]…The initiation or escalation from one level to another level of emotional filtering may be determine by a manual input, such as the agent or a supervisor monitoring the communication, or by an automated system, such as an artificial intelligence (AI), such as a neural network, to automatically detect the presence of actionable emotional content, the need to mitigate the emotional content, and/or the extent of the mitigation...")
- a processing apparatus for dividing the speech signal into the at least one piece of word information and into the at least one piece of emotion information and for processing the speech signal; and ("[0007]...As a result, the systems and methods herein may tonally modify the speech, such as to volume-balance the speech so the volume is not elevated above the mean volume of the rest of the conversation; the pitch, pace, or inflection of the speech may be modified to remove the “anger” from the words spoken, the phrase may be redacted or removed and not presented to the agent, and/or a substitute expression inserted into the agent's audio, such as a “bleep” or a mere description, such as, “I am upset.” In another embodiment, the content may be omitted from the communication and a description presented via other means, such as textually or graphically on a display utilized by the agent during the communication. For example, the customer's outburst during a voice call may be presented as silence with a text message presented on the agent's display stating, “Customer is upset.”" - "anger" is emotion information; "Customer is upset" is word information.)
- a coupling apparatus and/or a reproduction apparatus for reproducing the speech signal as de-emotionalized speech signal including the at least one piece of emotion information converted into a further piece of word information and/or the at least one piece of word information. ("[0007]...phrase may be redacted or removed and not presented to the agent, and/or a substitute expression inserted into the agent's audio, such as a “bleep” or a mere description, such as, “I am upset.”…")

Regarding claim 2, Shah discloses:

2. The speech signal processing apparatus according to claim 1 comprising a storage apparatus storing the de-emotionalized speech signal and/or the detected speech signal to reproduce the de-emotionalized speech signal at any time, in particular to reproduce the stored speech signal at more than a single arbitrary time as a de-emotionalized speech signal. ("[0061] Server 108 may record the communication, such as for storage in data storage 110 and/or other storage device…" - See also [0060], which discloses reviewing a voice recording.)

Regarding claim 3, Shah discloses:

3. The speech signal processing apparatus according to claim 1, wherein the processing apparatus is configured to recognize the further piece of word information comprised in the emotion information and to translate the same into a de-emotionalized speech signal ("[0056]... Additionally or alternatively, the speech may be converted as a result of performing a speech-to-text translation of the speech provided by user 102 and then presenting the text on agent device 112 to be read by agent 114 and/or convert the resulting text back to speech in a voice having less negative perception (e.g., robotic voice, soft spoken simulated human voice, etc.).") and to forward the same to the reproduction apparatus for reproduction by the reproduction apparatus or to the coupling apparatus that is configured to connect to an external reproduction apparatus, (Fig. 1 shows that the user 114 is at an external device from the server 108.) in particular a smartphone or a tablet, to transmit the de-emotionalized signal for reproduction of the same. ("[0091]...Exemplary hardware that can be used for the present invention includes computers, handheld devices, telephones (e.g., cellular, Internet enabled, digital, analog, hybrids, and others), and other hardware known in the art.")

Regarding claim 4, Shah discloses:

4. The speech signal processing apparatus according to claim 1, wherein the analysis apparatus is configured to analyze a disturbing noise and/or a piece of emotion information in the speech signal and the processing apparatus is configured to remove the analyzed disturbing noise and/or the piece of emotion information from the speech signal. ("[0056] Server 108 may attenuate the content of the communication comprising the emotionally charged content by removing it entirely and presenting silence to agent 114; ...")

Regarding claim 5, Shah discloses:

5. The speech signal processing apparatus according to claim 1, wherein the reproduction apparatus is configured to reproduce the de-emotionalized speech signal without the piece of emotion information or with the piece of emotion information that has been transcribed into the further piece of word information and/or with a newly impressed piece of emotion information. ("[0056] Server 108 may attenuate the content of the communication comprising the emotionally charged content by removing it entirely and presenting silence to agent 114; substituting a meaning of the emotionally charged content, “customer is upset;” substituting indicia of generic speech, “customer is discussing an irrelevant topic;” etc. ...")

Regarding claim 6, Shah discloses:

6. The speech signal processing apparatus according to claim 1, wherein the reproduction apparatus comprises a loudspeaker and/or a display to reproduce the de-emotionalized speech signal, ("[0073]...Examples of input/output devices 530 that may be connected to input/output interface include, but are not limited to, keyboard, mouse, trackball, printers, displays, sensor, switch, relay, speaker, microphone, still and/or video camera, etc…") in particular in simplified language, ("[0056] Server 108 may attenuate the content of the communication comprising the emotionally charged content by removing it entirely and presenting silence to agent 114; substituting a meaning of the emotionally charged content, “customer is upset;” substituting indicia of generic speech, “customer is discussing an irrelevant topic;” etc. ..." – “customer is upset” is simplified language.) by an artificial voice ("[0056]... Additionally or alternatively, the speech may be converted as a result of performing a speech-to-text translation of the speech provided by user 102 and then presenting the text on agent device 112 to be read by agent 114 and/or convert the resulting text back to speech in a voice having less negative perception (e.g., robotic voice, soft spoken simulated human voice, etc.).") and/or by displaying a computer-written text and/or by generating and displaying picture card symbols and/or by animation of sign language. ("[0007]...For example, the customer's outburst during a voice call may be presented as silence with a text message presented on the agent's display stating, “Customer is upset.”")

Regarding claim 8, Shah discloses:

8. The speech signal processing apparatus according to claim 1, wherein the speech signal processing apparatus comprises a GPS unit and/or a speaker recognition system configured to detect a current location coordinate of the speech signal processing apparatus and/or to recognize the speaker providing the speech signal and to adjust, based on the detected current location coordinate and/or speaker information, associated pre-settings for transcription at the speech signal processing apparatus. ("[0009] An AI system may be seeded with default values for what is previously determined to be an unacceptable expression. This may be further limited to an agent's demographic, such as country or region of origin, religion, ethnicity, social-economic status, etc. to account for the different perceptions of language and other forms of human communication. For example, the use of slang or a particular expression may have no consequence to one person, or even be considered as friendly, but insulting, hurtful, or otherwise not acceptable for others...")

Regarding claim 9, Shah discloses:

9. The speech signal processing apparatus according to claim 1 comprising signal exchange means configured to perform a signal transmission of a detected speech signal with one or several other speech signal processing apparatuses, in particular via radio or Bluetooth or LiFi (Light Fidelity). ("[0074] Network 106 may be embodied, in whole or in part, as network 520. Network 520 may be a wired network (e.g., Ethernet), wireless (e.g., WiFi, Bluetooth, cellular, etc.) network, or combination thereof and enable device 502 to communicate with networked component(s) 522...")

Regarding claim 10, Shah discloses:

10. The speech signal processing apparatus according to claim 1 comprising an operating interface configured to divide the at least one piece of emotion information according to preferences set by a user into an undesired piece of emotion information and/or into a neutral piece of emotion information and/or into a positive piece of emotion information. ("[0009]...The AI may receive inputs from the agent that indicate their perception of a particular portion of a communication provided by a customer. The input may be manual, such as by turning a dial, whether physical or embodied as a graphical element on a display device, or otherwise providing a direct indication of the agent's perception of the communication or a portion of a communication."; see also "[0011] While no agent may enjoy certain negative expressions directed to themselves in a communication, in another embodiment, a threshold level or type of negative expression is determined that is likely to impact the agent's performance in the current call, subsequent call, or over a period of subsequent calls.")

Regarding claim 14, Shah discloses:

14. The speech signal reproduction system comprising two or more speech signal processing apparatuses according to claim 1. ("[0015] The embodiments herein may be directed to other two-party communications beyond those of a customer and agent, as well as to incorporate three or more people. Each person may have their own standard for what is negatively emotionally charged, such as offensive or considered as a personal attack, etc. Accordingly, the background, ethnicity, origin, etc., of each participant may be determined and content presented or attenuated/blocked varied on an individual basis...")

Claim 15 is a method claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Claim 16 is a method claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.

Regarding claim 17, Shah discloses:

17. The method according to claim 15, comprising: detecting the at least one piece of emotion information in the speech signal; analyzing the at least one piece of emotion information with respect to possible transcriptions of the at least one piece of emotion signal into n different further pieces of word information, wherein n is a natural number greater than or equal to 1 and n indicates the number of options of appropriately transcribing the at least one piece of further emotion information into the at least one further piece of word information; transcribing the at least one piece of emotion information into the n different further pieces of word information. (“[0008]… Additionally or alternatively, natural language processing (NPL) maybe utilized to extract a meaning from the text, whether raw text from a chat or email message, or generated message from speech, to select an alternative phrase that conveys a similar meaning relative to the purpose of the communication but with less emotional context.” – this reads on the case of n equal to 1.)

Regarding claim 18, Shah discloses:

18. The method according to claim 15, comprising: identifying undesired and/or neutral and/or positive emotion information by a user by means of an operating interface. ("[0009]...The AI may receive inputs from the agent that indicate their perception of a particular portion of a communication provided by a customer. The input may be manual, such as by turning a dial, whether physical or embodied as a graphical element on a display device, or otherwise providing a direct indication of the agent's perception of the communication or a portion of a communication.")

Claim 21 is a method claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.

Regarding claim 23, Shah discloses:

23. The method according to claim 15, comprising: analyzing whether a disturbing noise, in particular one defined individually by a user, is detected in the detected speech signal, removing the detected disturbing noise. ("[0009]...The AI may receive inputs from the agent that indicate their perception of a particular portion of a communication provided by a customer. The input may be manual, such as by turning a dial, whether physical or embodied as a graphical element on a display device, or otherwise providing a direct indication of the agent's perception of the communication or a portion of a communication.")

Regarding claim 25, Shah discloses:

25. The method according to claim 15, comprising: transmitting a detected speech signal from a speech signal processing apparatus to another speech signal processing apparatus or to several speech signal processing apparatuses by means of GPS or radio or Bluetooth or LiFi (Light Fidelity). ("[0074] Network 106 may be embodied, in whole or in part, as network 520. Network 520 may be a wired network (e.g., Ethernet), wireless (e.g., WiFi, Bluetooth, cellular, etc.) network, or combination thereof and enable device 502 to communicate with networked component(s) 522...")

Claim 26 is a digital storage medium claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Additionally, “A non-transitory digital storage medium having a computer program stored thereon” of the Claim is taught by Shah. (“[0034]… In the context of this document, a computer-readable storage medium may be any tangible, non-transitory medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.”)
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 11 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shah.

Regarding claim 11, Shah discloses:

11. The speech signal processing apparatus according to claim 10 that is configured to categorize the at least one detected piece of emotion information into classes of different disturbing qualities, in particular those comprising the following allocation: Class 1 “very disturbing”, Class 2 “disturbing”, Class 3 “less disturbing”, Class 4 “not disturbing at all” (“[0009]...The AI may receive inputs from the agent that indicate their perception of a particular portion of a communication provided by a customer. The input may be manual, such as by turning a dial, whether physical or embodied as a graphical element on a display device, or otherwise providing a direct indication of the agent's perception of the communication or a portion of a communication." – a dial implies an adjustable value for how disturbing the portion is.) and to reduce or suppress the at least one detected piece of emotion information that has been categorized in one of the Classes 1 “very disturbing” or Class 2 “disturbing” ("[0011] While no agent may enjoy certain negative expressions directed to themselves in a communication, in another embodiment, a threshold level or type of negative expression is determined that is likely to impact the agent's performance in the current call, subsequent call, or over a period of subsequent calls." – Shah discloses suppressing information that is disturbing to the agent (see claim 1). The use of a threshold implies that some amount of disturbance is blocked and some is not.) and/or to add the at least one detected piece of emotion information that has been categorized into one of the Classes 3 “less disturbing” or Class 4 “not disturbing at all” to the de-emotionalized speech signal (Fig. 3 shows that if the content is not emotionally charged, it is presented.) and/or to add a generated piece of emotion information to the de-emotionalized signal in order to support comprehension of the de-emotionalized speech signal by a user. (“[0007]...phrase may be redacted or removed and not presented to the agent, and/or a substitute expression inserted into the agent's audio, such as a “bleep” or a mere description, such as, “I am upset.”…”)

Shah does not explicitly disclose 4 classes based on the level of disturbance. However, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Shah with 4 classes. Doing so would have been obvious to try. See MPEP 2143.I.(E). Shah discloses “[0006] In one embodiment, the content of a communication is emotionally “flattened” so as to present the content of what is said in a call, or typed in a text message, to the agent, but with some or all of the negative emotionally-charged content removed. The initiation or escalation from one level to another level of emotional filtering may be determine by a manual input…” Shah further describes storing the attenuation level in [0064]. Therefore, it is clear that Shah implies multiple classes; only the number of classes is not disclosed, and it would have been obvious to try different numbers of classes.

Claim 20 is a method claim with limitations corresponding to the limitations of Claim 11 and is rejected under similar rationale.

Claim(s) 7 and 27 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shah in view of Gangotri et al. (US 20210192332 A1).

Regarding claim 7, Shah discloses:

7. The speech signal processing apparatus according to claim 1, wherein the analysis apparatus comprises a neuronal network ("[0067]... In one embodiment, process 400 produces a trained neural network that may then be accessed, such as by a processor of server 108, to determine whether content within a communication, such as current communication between user 102 and agent 114, comprises emotionally charged content and, if so, attenuates the emotionally charged content in the communication as presented to agent device 112."; see also "[0068] A neural network, as is known in the art and in one embodiment, self-configures layers of logical nodes having an input and an output...") that is configured to transcribe the piece of emotion information into the further piece of word information based on training data or based on a rule-based transcription. (Not explicitly disclosed.)

Shah does not explicitly disclose that a neural network is used to determine what the emotion is, only that it is emotionally charged. Gangotri discloses:

7. The speech signal processing apparatus according to claim 1, wherein the analysis apparatus comprises a neuronal network that is configured to transcribe the piece of emotion information into the further piece of word information based on training data or based on a rule-based transcription. (Fig. 1 shows a trained model is used to determine emotion labels from speech samples. [0006] discloses that the model is a neural network.)

Shah and Gangotri are considered analogous art to the claimed invention because they disclose detecting emotion in speech. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Shah with the emotion labeling of Gangotri. Doing so would have been beneficial in order to accurately recognize elevated states of a customer in a call. (Gangotri [0005]-[0006])

Regarding claim 27, Shah discloses:

27. A speech signal processing apparatus ("[0078] ... In another embodiment, the hardware component may comprise a general-purpose microprocessor (e.g., CPU, GPU) that is first converted to a special-purpose microprocessor…") for outputting a de-emotionalized speech signal, the speech signal processing apparatus comprising: ("[0006] In one embodiment, the content of a communication is emotionally “flattened” so as to present the content of what is said in a call, or typed in a text message, to the agent, but with some or all of the negative emotionally-charged content removed...")
- a speech signal detection apparatus configured to detect a speech signal ("[0065]...Process 300 begins and a communication is initiated in step 302. The communication may be inbound (e.g., initiated by user 102 using user device 104 to agent device 112 and agent 114) or outbound (e.g., initiated by agent device 112, server 108, and/or another component to user device 104) ...") comprising at least one piece of emotion information and at least one piece of word information; ("[0053] As introduced above, what is considered emotionally charged content may include words, phrases, gestures, intonations, volume (i.e., audio waveform amplitude), images, etc…")
- an analysis apparatus comprising a neuronal network or artificial intelligence configured to analyze the speech signal with respect to the at least one piece of emotion information and the at least one piece of word information, ("[0006]…The initiation or escalation from one level to another level of emotional filtering may be determine by a manual input, such as the agent or a supervisor monitoring the communication, or by an automated system, such as an artificial intelligence (AI), such as a neural network, to automatically detect the presence of actionable emotional content, the need to mitigate the emotional content, and/or the extent of the mitigation...")
- a processing apparatus comprising a neuronal network or artificial intelligence configured to divide the speech signal into the at least one piece of word information and into the at least one piece of emotion information and to process the speech signal, wherein the at least one piece of emotion information is transcribed into a further piece of word information; and ("[0007]...As a result, the systems and methods herein may tonally modify the speech, such as to volume-balance the speech so the volume is not elevated above the mean volume of the rest of the conversation; the pitch, pace, or inflection of the speech may be modified to remove the “anger” from the words spoken, the phrase may be redacted or removed and not presented to the agent, and/or a substitute expression inserted into the agent's audio, such as a “bleep” or a mere description, such as, “I am upset.” In another embodiment, the content may be omitted from the communication and a description presented via other means, such as textually or graphically on a display utilized by the agent during the communication. For example, the customer's outburst during a voice call may be presented as silence with a text message presented on the agent's display stating, “Customer is upset.”" - "anger" is emotion information; "Customer is upset" is word information.)
- a coupling apparatus and/or a reproduction apparatus configured to reproduce the speech signal as de-emotionalized speech signal comprising the at least one piece of emotion information converted into a further piece of word information and the at least one piece of word information. ("[0007]… As a result, the systems and methods herein may tonally modify the speech, such as to volume-balance the speech so the volume is not elevated above the mean volume of the rest of the conversation; the pitch, pace, or inflection of the speech may be modified to remove the “anger” from the words spoken, the phrase may be redacted or removed and not presented to the agent, and/or a substitute expression inserted into the agent's audio, such as a “bleep” or a mere description, such as, “I am upset.”” – substitute information is a further piece of word information. Lowering the volume of speech is presenting the original piece of word information.)

Shah does not explicitly disclose that a neural network is used to determine what the emotion is, only that it is emotionally charged. Gangotri discloses: a neural network to determine an emotion label (Fig. 1 shows a trained model is used to determine emotion labels from speech samples. [0006] discloses that the model is a neural network.)

Shah and Gangotri are considered analogous art to the claimed invention because they disclose detecting emotion in speech. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Shah with the emotion labeling of Gangotri. Doing so would have been beneficial in order to accurately recognize elevated states of a customer in a call. (Gangotri [0005]-[0006])

Claim(s) 12 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shah in view of Jagmag et al. (US 20200273485 A1).

Regarding claim 12, Shah discloses:

12. The speech signal processing apparatus according to claim 1 comprising a sensor that is configured, when in contact with a user, to identify undesired and/or mutual and/or positive emotion information for the user, wherein the sensor is configured to measure bio signals, such as to perform a neurophysiological measurement or to capture and evaluate an image of a user. ("[0057] In another embodiment, the communication comprises video. Server 108, or at least one processor thereof, may utilize an algorithm and/or artificial intelligence, such as a neural network trained to identify emotionally charged content in a video image whether in general or specific to an entry in a data record of data storage 110 specific to agent 114. Accordingly, user device 104 may comprise a video camera (not shown) capturing an image of user 102. Server 108 determines the body position, gesture, facial expression, background/foreground, etc., comprises emotionally charged content. As a result, server 108 applies one or more attenuations to the video image alone or incrementally or in parallel.”)

Shah does not disclose that the sensor is in contact with a user. Jagmag discloses a sensor in conta
Read full office action
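For orientation on the claimed signal flow (detect a signal carrying word and emotion information, separate the two, reproduce the emotion converted into a further piece of word information), a minimal sketch follows. Everything in it, from the function names to the toy all-caps heuristic, is a hypothetical illustration, not the application's or Shah's actual method:

```python
from dataclasses import dataclass

@dataclass
class SpeechSignal:
    words: str    # word information (transcript)
    emotion: str  # emotion information label

def analyze(raw: str) -> SpeechSignal:
    # Toy stand-in for the claimed analysis apparatus; claim 7 would put
    # a trained "neuronal network" here rather than this heuristic.
    return SpeechSignal(words=raw,
                        emotion="angry" if raw.isupper() else "neutral")

def de_emotionalize(sig: SpeechSignal) -> str:
    # Reproduce the word information with the emotion information
    # converted into a further piece of word information.
    if sig.emotion == "neutral":
        return sig.words
    return f"{sig.words.lower()} [speaker is {sig.emotion}]"

print(de_emotionalize(analyze("WHERE IS MY REFUND")))
# -> where is my refund [speaker is angry]
```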

Prosecution Timeline

Feb 01, 2024
Application Filed
Nov 01, 2025
Non-Final Rejection — §101, §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603087: VOICE RECOGNITION USING ACCELEROMETERS FOR SENSING BONE CONDUCTION (granted Apr 14, 2026; 2y 5m to grant)
Patent 12579975: Detecting Unintended Memorization in Language-Model-Fused ASR Systems (granted Mar 17, 2026; 2y 5m to grant)
Patent 12482487: MULTI-SCALE SPEAKER DIARIZATION FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS (granted Nov 25, 2025; 2y 5m to grant)
Patent 12475312: FOREIGN LANGUAGE PHRASES LEARNING SYSTEM BASED ON BASIC SENTENCE PATTERN UNIT DECOMPOSITION (granted Nov 18, 2025; 2y 5m to grant)
Patent 12430329: TRANSFORMING NATURAL LANGUAGE TO STRUCTURED QUERY LANGUAGE BASED ON MULTI-TASK LEARNING AND JOINT TRAINING (granted Sep 30, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 46%
With Interview: 99% (+59.0%)
Median Time to Grant: 3y 0m
PTA Risk: Low
Based on 22 resolved cases by this examiner. Grant probability derived from career allow rate.
