Prosecution Insights
Last updated: April 19, 2026
Application No. 18/132,251

PREDICTIVE AUDIO REDACTION FOR REALTIME COMMUNICATION

Office Action: Non-Final (§103, §112)
Filed: Apr 07, 2023
Examiner: KIM, JONATHAN C
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Modulate Inc.
OA Round: 3 (Non-Final)

Forecast: 74% grant probability (Favorable); 99% with an examiner interview
Expected OA Rounds: 3-4
Expected Time to Grant: 2y 7m
Examiner Intelligence

Career Allow Rate: 74% (261 granted / 355 resolved), +11.5% vs TC avg (above average)
Interview Lift: +40.6% (strong), resolved cases with vs. without an interview
Typical Timeline: 2y 7m avg prosecution; 20 applications currently pending
Career History: 375 total applications across all art units
Statute-Specific Performance

§101: 17.6% (-22.4% vs TC avg)
§103: 47.5% (+7.5% vs TC avg)
§102: 11.8% (-28.2% vs TC avg)
§112: 15.0% (-25.0% vs TC avg)

Tech Center averages are estimates; based on career data from 355 resolved cases.
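The headline figures in the examiner cards above follow from the raw counts shown (261 granted of 355 resolved, 375 total applications). A quick sanity check, with the rounding convention an assumption:

```python
# Reproduce the dashboard's examiner statistics from the raw counts.
# Counts come from the report above; simple rounding is assumed.

granted, resolved, total = 261, 355, 375

allow_rate = 100 * granted / resolved   # career allowance rate, in percent
pending = total - resolved              # applications not yet resolved
tc_avg = allow_rate - 11.5              # implied Tech Center average (+11.5% delta)

print(f"Career allow rate: {allow_rate:.1f}%")   # 73.5%, reported as 74%
print(f"Currently pending: {pending}")           # 20
print(f"Implied TC average: {tc_avg:.1f}%")      # 62.0%
```

The 73.5% exact rate rounds to the 74% shown on the card, and the 20 pending applications are simply the unresolved remainder of the 375-case career history.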

Office Action (§103, §112)

DETAILED ACTION

This Office Action is in response to the correspondence filed by the applicant on 11/11/2025.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Regarding the claim rejection under 35 U.S.C. 102, Applicant’s arguments with respect to the rejections have been fully considered and are moot in view of further consideration and a new ground(s) of rejection made under AIA 35 U.S.C. 103 as being unpatentable over CHONG (US 2021/0389924 A1), and further in view of MITCHEM (US 11,706,337 B1).

Applicant asserts that CHONG does not teach “artificial intelligence.” However, Examiner respectfully disagrees. Again, by today’s plain dictionary definition, the term “artificial intelligence” is defined as “the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.” (Definition from Oxford Languages: https://www.google.com/search?q=artificial+intelligence+definition&rlz=1C1GCEA_enUS1061US1061&oq=artificial+intelligence+definition&gs_lcrp=EgZjaHJvbWUyDggAEEUYJxg5GIAEGIoFMgcIARAAGIAEMgcIAhAAGIAEMgcIAxAAGIAEMgcIBBAAGIAEMgcIBRAAGIAEMgcIBhAAGIAEMgYIBxBFGDzSAQc0NTFqMGo5qAIAsAIB&sourceid=chrome&ie=UTF-8). Since the method/system of CHONG is a computer program/model that automatically performs the recited tasks/steps of the claims (e.g., extracting and redacting sensitive information from audio) that normally require human intelligence, the method/system of CHONG is an artificial intelligence system. Thus, the system of CHONG (speech recognition engine, extraction and redaction engine, sensitive data storage, communication gateway, etc.), as a whole, is an AI.
Regarding the newly added limitation, “being a machine learning model trained,” one of ordinary skill in the art would recognize that the speech recognition engine requires a trained machine learning model (e.g., an acoustic model, a language model, and/or an end-to-end model, etc.). Although CHONG implicitly suggests the machine learning model, Examiner provides MITCHEM for completeness of the rejections. Please see the rejections below for more details.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C.
112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: a communication interface and a system output interface in claim 13.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C.
112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 9 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for pre-AIA, the applicant) regards as the invention. Regarding claim 9, the phrase "for example" renders the claim indefinite because it is unclear whether the limitation(s) following the phrase are part of the claimed invention. See MPEP § 2173.05(d).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 10, and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over CHONG (US 2021/0389924 A1), and further in view of MITCHEM (US 11,706,337 B1).
REGARDING CLAIM 1, CHONG discloses a computer-implemented method of moderating a verbal communication, the method comprising: receiving at the computer, at a reception time (r1), an electronic speech signal of the verbal communication (Par 43 – “According to some embodiments of the concepts and technologies disclosed herein, the call 114 can include data that can identify a calling party and/or a device associated with the calling party such as, for example, a customer device 116. In various embodiments, functionality of the customer device 116 may be provided by one or more computers, mobile telephones, laptop computers, tablet computers, telephones, other computing systems, other communications devices, combinations thereof, or the like.”; Par 48 – “The “call handling” referred to herein can include an exchange of audio data and associated information (“audio”) associated with the call 114 between the agent device 120 and the customer device 116 to resolve an issue or issues for which the customer initiated the call 114 to the call center 102 (or the call 114 that was eventually routed to the call center 102 as noted above).”), said electronic speech signal of the verbal communication comprising a first portion at a first time (t1) (Fig. 2 – “Sure, my credit card number is”; Par 83 – “As shown in FIG. 2, the audio 200 can include a first instance of nonsensitive audio 124A, a first instance of sensitive audio 122A, and a second instance of nonsensitive audio 124B. It can be appreciated that multiple instances of sensitive audio 122 and nonsensitive audio 124 may be included, and that the illustrated example is merely illustrative and is provided for illustration purposes only. As shown in FIG. 2, these instances of audio can correspond to the spoken words illustrated under the schematic representation of the call 114. As can be appreciated with reference to FIG. 
2, the phrase “Sure, my credit card number is” can correspond to the first instance of nonsensitive audio 124A; the phrase “1234 5678 9012 3456” can correspond to the first instance of sensitive audio 122A; and the phrase “Can you please tell me when that charge will clear and . . . ” can correspond to the second instance of nonsensitive audio 124B. It should be understood that this example is illustrative, and therefore should not be construed as being limiting in any way.”; Par103 – “It can be appreciated that if the call 114 includes audio having fifteen seconds of nonsensitive audio 124, five seconds of sensitive audio 122, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds), that the modified call 134 provided to the agent device 120 can include fifteen seconds of nonsensitive audio 124, five seconds of substitute audio 138, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds).”); providing the electronic speech signal to an artificial intelligence, the artificial intelligence (Par 56 – “According to some embodiments, the extraction and redaction engine 110 can detect a potential impending disclosure of sensitive information based on analyzing output from the speech recognition engine 108.”) [being a machine learning model trained] to: (1) process said first portion of the electronic speech signal and thereby predict target speech at a time window (t2-t3) (Par 54 – “More particularly, the extraction and redaction engine 110 can predict an impending disclosure of sensitive information during a call 114 (before the sensitive information is said or heard), and can perform operations to prevent the disclosure of the sensitive information to some entities (e.g., to the agent associated with the agent device 120) during the call 114.”; Par 56 –“ For example, if a caller says “my social security number is,” the extraction and redaction engine 110 can determine that the next words spoken, if numbers, 
are likely to correspond to a social security number and therefore can extract and redact (from the audio associated with the call 114), the spoken numbers. It should be understood that this example is illustrative, and therefore should not be construed as being limiting in any way.”; Par 96 – “As used herein, a “sensitive personal information event” can refer to the providing of, the speaking of, the entry of, and/or any other exchange of information that is potentially sensitive, commonly referred to as “sensitive personal information” in the telecommunications industry. Thus, in operation 308, the computing device 106 can determine that such an exchange of information is about to occur (before the actual exchange occurs as explained herein).”; Par 110 – “As explained above, the agent device 120 can detect the sensitive personal information event by detecting selection (e.g., by the agent) of a smart field or other field for entry of personal information (e.g., a credit card number, a name, a social security number, etc.), by detecting the agent asking for sensitive information (e.g., “what is your credit card number,” “what is your social security number,” or the like), and/or in other manners.”; Par 100 – “Thus, in some embodiments, the computing device 106 can generate the modified call 134 in operation 310, where the modified call 134 can include the call identifier 118, the substitute audio 138 (a time period of soft tone, music, spoken words, or other audio that is substituted for a matching time period of sensitive audio 122 that has been removed), and other information such as, for example, characters, commands, and/or other information. It should be understood that this example is illustrative, and therefore should not be construed as being limiting in any way.”; Par 103 – “From operation 310, the method 300 can proceed to operation 312. At operation 312, the computing device 106 can provide the modified call 134 to the agent device 120. 
It can be appreciated that if the call 114 includes audio having fifteen seconds of nonsensitive audio 124, five seconds of sensitive audio 122, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds), that the modified call 134 provided to the agent device 120 can include fifteen seconds of nonsensitive audio 124, five seconds of substitute audio 138, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds).”), which time window (t2-t3) is subsequent to the first portion of the electronic speech signal (Fig. 2; Par 83 – “As can be appreciated with reference to FIG. 2, the phrase “Sure, my credit card number is” can correspond to the first instance of nonsensitive audio 124A; the phrase “1234 5678 9012 3456” can correspond to the first instance of sensitive audio 122A; and the phrase “Can you please tell me when that charge will clear and . . . ” can correspond to the second instance of nonsensitive audio 124B.”), said target speech comprising a pre-defined set of terms to be redacted (Par 70 – “The sensitive data 140 can correspond, in some embodiments, to the sensitive information (e.g., credit card numbers, social security numbers, or the like) that can be captured by the speech recognition engine 108 and/or other functionality.”), and (2) redact said target speech from said electronic speech signal during said time window to produce a redacted verbal communication signal (Par 85 – “As can be appreciated with reference to FIG. 
2, the phrase “1234 5678 9012 3456” has been removed or redacted from the modified audio 202 and replaced with the first instance of substitute audio 138A.”; Par 100 – “Thus, in some embodiments, the computing device 106 can generate the modified call 134 in operation 310, where the modified call 134 can include the call identifier 118, the substitute audio 138 (a time period of soft tone, music, spoken words, or other audio that is substituted for a matching time period of sensitive audio 122 that has been removed), and other information such as, for example, characters, commands, and/or other information.”) with less than 500 ms of introduced latency as measured from the reception time (r1) (CHONG Par 34 – “As used herein, the phrase “real-time” is used to a refer to a process in which input (e.g., a user's spoken speech during a telephone call) is processed (e.g., the sensitive portions of the user's speech are removed, optionally substituted, and the sensitive information is captured and/or used) within a small number of milliseconds such that the process occurs virtually immediately. In some embodiments, for example, the number of milliseconds can include a number within a range of less than one to five milliseconds; a range of five to twenty milliseconds; or a range of up to fifty milliseconds.”); producing the redacted verbal communication signal using the artificial intelligence (Par 85 – “In the modified audio 202, however, the spoken words illustrated under the schematic representation of the modified call 134 can reflect what the agent and/or other entity at the agent device 120 hears. As can be appreciated with reference to FIG. 
2, the phrase “1234 5678 9012 3456” has been removed or redacted from the modified audio 202 and replaced with the first instance of substitute audio 138A.”); and providing said redacted verbal communication signal from the artificial intelligence to a consumer (Par 85 – “In the modified audio 202, however, the spoken words illustrated under the schematic representation of the modified call 134 can reflect what the agent and/or other entity at the agent device 120 hears. As can be appreciated with reference to FIG. 2, the phrase “1234 5678 9012 3456” has been removed or redacted from the modified audio 202 and replaced with the first instance of substitute audio 138A.”; Par 103 – “From operation 310, the method 300 can proceed to operation 312. At operation 312, the computing device 106 can provide the modified call 134 to the agent device 120.”). CHONG does not explicitly teach the [square-bracketed] limitation. MITCHEM discloses the [square-bracketed] limitations. MITCHEM discloses a method/system for censoring speech conversations between two users and redacting sensitive audio stream comprising: providing the electronic speech signal to an artificial intelligence, the artificial intelligence [being a machine learning model trained] (MITCHEM Col 9:13-26 – “The AI assistant 201 may be configured to be taught to process the inputs. The AI assistant 201 may be fed a training set of data. The training set of data may comprise data associated with an industry . The industry may be associated with a business of the CSR. For example, the training set of data may comprise an employee training manual, business vocabulary, and/or data associated with historical customer communication sessions. The training set of data may comprise one or more inputs. The training set of data may comprise one or more outputs associated with the inputs. The training set of data may comprise one or more input classification codes. 
The training set may comprise one or more outputs associated with the classification codes.”) to: (1) process said first portion of the electronic speech signal and thereby predict target speech at a time window (t2-t3), which time window (t2-t3) is subsequent to the first portion of the electronic speech signal (MITCHEM Col 2:12-23 – “The AI assistant may listen and/or have access to a conversation between a CSR and a customer. Based on the customer's questions, responses, and/or statements, the AI assistant may determine questions, responses, or statements for the CSR and communicate the questions, responses, or statements to the CSR.”; Col 12:62-13:12 – “The output may be determined based on a mapping and/or table of inputs (e.g., input characteristics, input classification codes, etc.) and outputs. The output may be determined based on data received from an artificial neural network. The output may be determined based on a determined probability success of an output. The output may be determined based on other communication sessions (e.g., with the customer and/or other customers). The output may be determined using predictive analysis.”; Claim 1 – “determining, by the processing, that at least a portion of the input includes sensitive information;”), said target speech comprising a pre-defined set of terms to be redacted (MITCHEM Col 8:58 – 9:2 –“The probabilities of success of the outputs may be in response to one or more inputs. The artificial neural network may be configured to map one or more outputs to one or inputs (e.g., key words in the input) or characteristics of an input (e.g., tone, emotion, meaning).”; Col 13:33-56 – “Based on a determination that the CSR is not authorized to receive sensitive information, the AI assistant may censor the information, such as by temporarily silencing audio from the customer to the CSR or preventing the display of the information on the CSR's device. 
The AI assistant may record the sensitive information, such as in a secure memory and/or database.”), and (2) redact said target speech from said electronic speech signal during said time window to produce a redacted verbal communication signal (MITCHEM Col 13:33-56 – “Based on a determination that the CSR is not authorized to receive sensitive information, the AI assistant may censor the information, such as by temporarily silencing audio from the customer to the CSR or preventing the display of the information on the CSR's device. The AI assistant may record the sensitive information, such as in a secure memory and/or database.”) with less than 500 ms of introduced latency as measured from the reception time (r1) (MITCHEM Col 8:28-36 – “The AI assistant 201 may comprise a processor 206. The processor 206 may comprise a microprocessor. The processor 206 may comprise a digital signal processor (DSP), such as a DSP chip. The processor 206 may comprise a real-time dialogue-processing component.”); producing the redacted verbal communication signal using the artificial intelligence (MITCHEM Col 13:33-56 – “Based on a determination that the CSR is not authorized to receive sensitive information, the AI assistant may censor the information, such as by temporarily silencing audio from the customer to the CSR or preventing the display of the information on the CSR's device. The AI assistant may record the sensitive information, such as in a secure memory and/or database.”); and providing said redacted verbal communication signal from the artificial intelligence to a consumer (MITCHEM Col 13:33-56 – “Based on a determination that the CSR is not authorized to receive sensitive information, the AI assistant may censor the information, such as by temporarily silencing audio from the customer to the CSR or preventing the display of the information on the CSR's device. The AI assistant may record the sensitive information, such as in a secure memory and/or database.”). 
In other words, MITCHEM teaches an AI assistant that is trained to receive real-time speech conversations, analyze the conversations, detect keywords in the conversations, perform predictive analysis, detect sensitive information in the conversations, and redact the sensitive information. One of ordinary skill would have recognized that applying a trained AI model to the redacting method/system of CHONG would yield the predictable result of improving the accuracy of detecting and redacting sensitive data, since the redacting method/system of CHONG was ready for improvement to incorporate the trained AI model, as taught by MITCHEM. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the trained AI model of MITCHEM to the method/system of detecting and redacting sensitive data of CHONG.

REGARDING CLAIM 2, CHONG in view of MITCHEM discloses the method of claim 1, wherein receiving an electronic speech signal of verbal communication comprises receiving acoustic spoken speech at a transducer (Par 43 – “For purposes of describing the concepts and technologies disclosed herein, the customer device 116 will be described herein as a mobile telephone or smartphone.
It should be understood that this embodiment is illustrative, and should not be construed as being limiting in any way.”; Par 48 – “The “call handling” referred to herein can include an exchange of audio data and associated information (“audio”) associated with the call 114 between the agent device 120 and the customer device 116 to resolve an issue or issues for which the customer initiated the call 114 to the call center 102 (or the call 114 that was eventually routed to the call center 102 as noted above).”) and converting said acoustic spoken speech to said electronic speech signal (Par 34 – “As used herein, the phrase “real-time” is used to a refer to a process in which input (e.g., a user's spoken speech during a telephone call) is processed (e.g., the sensitive portions of the user's speech are removed, optionally substituted, and the sensitive information is captured and/or used) within a small number of milliseconds such that the process occurs virtually immediately.”; Par 154 – “The processing unit 902 may be a standard central processor that performs arithmetic and logical operations, a more specific purpose programmable logic controller (“PLC”), a programmable gate array, or other type of processor known to those skilled in the art and suitable for controlling the operation of the server computer.”; In other words, the human speech (i.e., acoustic spoken speech) is transmitted via wireless or wired connection to a computer to process the signal. Thus, the acoustic signal is converted into electronic signal and also into a digital signal.). 
REGARDING CLAIM 3, CHONG in view of MITCHEM discloses the method of claim 2, wherein the transducer comprises a microphone (Par 34 – “As used herein, the phrase “real-time” is used to a refer to a process in which input (e.g., a user's spoken speech during a telephone call) is processed (e.g., the sensitive portions of the user's speech are removed, optionally substituted, and the sensitive information is captured and/or used) within a small number of milliseconds such that the process occurs virtually immediately.”; One of ordinary skill in the art would know that a telephone comprises a microphone.).

REGARDING CLAIM 4, CHONG in view of MITCHEM discloses the method of claim 1, wherein redacting said target speech from said electronic speech signal during said time window to produce the redacted verbal communication comprises: muting the electronic signal during said time window (Par 32 – “The modified audio can include substitute audio and/or otherwise may omit the sensitive audio.”; Par 80 – “The extraction and redaction engine 110 can remove, from audio associated with the call 114, the sensitive audio 122 that corresponds to the sensitive personal information, and provide modified audio to the agent device 120 as part of the modified call 134. As noted above, the modified audio can include the substitute audio 138 and/or otherwise may omit the sensitive audio 122.”).

REGARDING CLAIM 5, CHONG in view of MITCHEM discloses the method of claim 1, wherein: to predict target speech at a time window (t2-t3) comprises predicting said target speech before said target speech is generated (Par 56 – “For example, if a caller says “my social security number is,” the extraction and redaction engine 110 can determine that the next words spoken, if numbers, are likely to correspond to a social security number and therefore can extract and redact (from the audio associated with the call 114), the spoken numbers.
It should be understood that this example is illustrative, and therefore should not be construed as being limiting in any way.”).

REGARDING CLAIM 6, CHONG in view of MITCHEM discloses the method of claim 1, wherein: to predict target speech at a time window (t2-t3) comprises predicting said target speech without recognizing a semantic meaning of the first portion of the electronic speech signal (Par 110 – “As explained above, the agent device 120 can detect the sensitive personal information event by detecting selection (e.g., by the agent) of a smart field or other field for entry of personal information (e.g., a credit card number, a name, a social security number, etc.), by detecting the agent asking for sensitive information (e.g., “what is your credit card number,” “what is your social security number,” or the like), and/or in other manners.”; In other words, the method/system of CHONG detects phrases without understanding the semantic meaning (e.g., natural language processing)).

REGARDING CLAIM 10, CHONG in view of MITCHEM discloses the method of claim 1, wherein each term in the pre-defined set of terms to be redacted is defined by a set of phones, and not based on a meaning of said term (Par 56 – “In some embodiments, for example, output from the speech recognition engine 108 (or a speech and recognition application or module if included in the extraction and redaction engine 110) can include a transcribed text associated with a call 114, or the like, and the extraction and redaction engine 110 can be configured to determine, based on analysis of that text, that a disclosure of sensitive information is about to occur”; Par 110 – “As explained above, the agent device 120 can detect the sensitive personal information event by detecting selection (e.g., by the agent) of a smart field or other field for entry of personal information (e.g., a credit card number, a name, a social security number, etc.), by detecting the agent asking for sensitive information
(e.g., “what is your credit card number,” “what is your social security number,” or the like), and/or in other manners.”; In other words, the method/system of CHONG detects phrases without understanding the semantic meaning (e.g., natural language processing)).

REGARDING CLAIM 12, CHONG in view of MITCHEM discloses the method of claim 1, wherein the verbal communication comprises human speech uttered audibly by a human into a transducer (Par 43 – “For purposes of describing the concepts and technologies disclosed herein, the customer device 116 will be described herein as a mobile telephone or smartphone. It should be understood that this embodiment is illustrative, and should not be construed as being limiting in any way.”; Par 48 – “The “call handling” referred to herein can include an exchange of audio data and associated information (“audio”) associated with the call 114 between the agent device 120 and the customer device 116 to resolve an issue or issues for which the customer initiated the call 114 to the call center 102 (or the call 114 that was eventually routed to the call center 102 as noted above).”).

REGARDING CLAIM 13, CHONG in view of MITCHEM discloses a computer-implemented system for moderating a verbal communication, the system comprising: a communications interface configured to receive, at a reception time (r1), an electronic speech signal of the verbal communication (Par 43 – “According to some embodiments of the concepts and technologies disclosed herein, the call 114 can include data that can identify a calling party and/or a device associated with the calling party such as, for example, a customer device 116.
In various embodiments, functionality of the customer device 116 may be provided by one or more computers, mobile telephones, laptop computers, tablet computers, telephones, other computing systems, other communications devices, combinations thereof, or the like.”; Par 48 – “The “call handling” referred to herein can include an exchange of audio data and associated information (“audio”) associated with the call 114 between the agent device 120 and the customer device 116 to resolve an issue or issues for which the customer initiated the call 114 to the call center 102 (or the call 114 that was eventually routed to the call center 102 as noted above).”), said electronic speech signal of the verbal communication comprising a first portion at a first time (t1) (Fig. 2 – “Sure, my credit card number is”; Par 83 – “As shown in FIG. 2, the audio 200 can include a first instance of nonsensitive audio 124A, a first instance of sensitive audio 122A, and a second instance of nonsensitive audio 124B. It can be appreciated that multiple instances of sensitive audio 122 and nonsensitive audio 124 may be included, and that the illustrated example is merely illustrative and is provided for illustration purposes only. As shown in FIG. 2, these instances of audio can correspond to the spoken words illustrated under the schematic representation of the call 114. As can be appreciated with reference to FIG. 2, the phrase “Sure, my credit card number is” can correspond to the first instance of nonsensitive audio 124A; the phrase “1234 5678 9012 3456” can correspond to the first instance of sensitive audio 122A; and the phrase “Can you please tell me when that charge will clear and . . . ” can correspond to the second instance of nonsensitive audio 124B. 
It should be understood that this example is illustrative, and therefore should not be construed as being limiting in any way.”; Par 103 – “It can be appreciated that if the call 114 includes audio having fifteen seconds of nonsensitive audio 124, five seconds of sensitive audio 122, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds), that the modified call 134 provided to the agent device 120 can include fifteen seconds of nonsensitive audio 124, five seconds of substitute audio 138, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds).”); an artificial intelligence (Par 56 – “According to some embodiments, the extraction and redaction engine 110 can detect a potential impending disclosure of sensitive information based on analyzing output from the speech recognition engine 108.”) [being a machine learning model trained] to: (1) process said first portion of the electronic speech signal and thereby predict target speech at a time window (t2-t3) (Par 54 – “More particularly, the extraction and redaction engine 110 can predict an impending disclosure of sensitive information during a call 114 (before the sensitive information is said or heard), and can perform operations to prevent the disclosure of the sensitive information to some entities (e.g., to the agent associated with the agent device 120) during the call 114.”; Par 56 – “For example, if a caller says “my social security number is,” the extraction and redaction engine 110 can determine that the next words spoken, if numbers, are likely to correspond to a social security number and therefore can extract and redact (from the audio associated with the call 114), the spoken numbers. 
It should be understood that this example is illustrative, and therefore should not be construed as being limiting in any way.”; Par 96 – “As used herein, a “sensitive personal information event” can refer to the providing of, the speaking of, the entry of, and/or any other exchange of information that is potentially sensitive, commonly referred to as “sensitive personal information” in the telecommunications industry. Thus, in operation 308, the computing device 106 can determine that such an exchange of information is about to occur (before the actual exchange occurs as explained herein).”; Par 110 – “As explained above, the agent device 120 can detect the sensitive personal information event by detecting selection (e.g., by the agent) of a smart field or other field for entry of personal information (e.g., a credit card number, a name, a social security number, etc.), by detecting the agent asking for sensitive information (e.g., “what is your credit card number,” “what is your social security number,” or the like), and/or in other manners.”; Par 100 – “Thus, in some embodiments, the computing device 106 can generate the modified call 134 in operation 310, where the modified call 134 can include the call identifier 118, the substitute audio 138 (a time period of soft tone, music, spoken words, or other audio that is substituted for a matching time period of sensitive audio 122 that has been removed), and other information such as, for example, characters, commands, and/or other information. It should be understood that this example is illustrative, and therefore should not be construed as being limiting in any way.”; Par 103 – “From operation 310, the method 300 can proceed to operation 312. At operation 312, the computing device 106 can provide the modified call 134 to the agent device 120. 
It can be appreciated that if the call 114 includes audio having fifteen seconds of nonsensitive audio 124, five seconds of sensitive audio 122, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds), that the modified call 134 provided to the agent device 120 can include fifteen seconds of nonsensitive audio 124, five seconds of substitute audio 138, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds).”), which time window (t2-t3) is subsequent to the first portion of the electronic speech signal (Fig. 2; Par 83 – “As can be appreciated with reference to FIG. 2, the phrase “Sure, my credit card number is” can correspond to the first instance of nonsensitive audio 124A; the phrase “1234 5678 9012 3456” can correspond to the first instance of sensitive audio 122A; and the phrase “Can you please tell me when that charge will clear and . . . ” can correspond to the second instance of nonsensitive audio 124B.”), said target speech comprising a pre-defined set of terms to be redacted (Par 70 – “The sensitive data 140 can correspond, in some embodiments, to the sensitive information (e.g., credit card numbers, social security numbers, or the like) that can be captured by the speech recognition engine 108 and/or other functionality.”), and (2) redact said target speech from said electronic speech signal during said time window to produce a redacted verbal communication signal (Par 85 – “As can be appreciated with reference to FIG. 
2, the phrase “1234 5678 9012 3456” has been removed or redacted from the modified audio 202 and replaced with the first instance of substitute audio 138A.”; Par 100 – “Thus, in some embodiments, the computing device 106 can generate the modified call 134 in operation 310, where the modified call 134 can include the call identifier 118, the substitute audio 138 (a time period of soft tone, music, spoken words, or other audio that is substituted for a matching time period of sensitive audio 122 that has been removed), and other information such as, for example, characters, commands, and/or other information.”) with less than 500 ms of introduced latency as measured from the reception time (r1) (CHONG Par 34 – “As used herein, the phrase “real-time” is used to a refer to a process in which input (e.g., a user's spoken speech during a telephone call) is processed (e.g., the sensitive portions of the user's speech are removed, optionally substituted, and the sensitive information is captured and/or used) within a small number of milliseconds such that the process occurs virtually immediately. In some embodiments, for example, the number of milliseconds can include a number within a range of less than one to five milliseconds; a range of five to twenty milliseconds; or a range of up to fifty milliseconds.”); and a system output interface configured to provide the redacted audible communication signal as system output (Par 85 – “In the modified audio 202, however, the spoken words illustrated under the schematic representation of the modified call 134 can reflect what the agent and/or other entity at the agent device 120 hears. As can be appreciated with reference to FIG. 2, the phrase “1234 5678 9012 3456” has been removed or redacted from the modified audio 202 and replaced with the first instance of substitute audio 138A.”; Par 103 – “From operation 310, the method 300 can proceed to operation 312. 
At operation 312, the computing device 106 can provide the modified call 134 to the agent device 120.”). CHONG does not explicitly teach the [square-bracketed] limitation. MITCHEM discloses the [square-bracketed] limitations. MITCHEM discloses a method/system for censoring speech conversations between two users and redacting sensitive audio stream comprising: an artificial intelligence [being a machine learning model trained] (MITCHEM Col 9:13-26 – “The AI assistant 201 may be configured to be taught to process the inputs. The AI assistant 201 may be fed a training set of data. The training set of data may comprise data associated with an industry . The industry may be associated with a business of the CSR. For example, the training set of data may comprise an employee training manual, business vocabulary, and/or data associated with historical customer communication sessions. The training set of data may comprise one or more inputs. The training set of data may comprise one or more outputs associated with the inputs. The training set of data may comprise one or more input classification codes. The training set may comprise one or more outputs associated with the classification codes.”) to: (1) process said first portion of the electronic speech signal and thereby predict target speech at a time window (t2-t3), which time window (t2-t3) is subsequent to the first portion of the electronic speech signal (MITCHEM Col 2:12-23 – “The AI assistant may listen and/or have access to a conversation between a CSR and a customer. Based on the customer's questions, responses, and/or statements, the AI assistant may determine questions, responses, or statements for the CSR and communicate the questions, responses, or statements to the CSR.”; Col 12:62-13:12 – “The output may be determined based on a mapping and/or table of inputs (e.g., input characteristics, input classification codes, etc.) and outputs. 
The output may be determined based on data received from an artificial neural network. The output may be determined based on a determined probability success of an output. The output may be determined based on other communication sessions (e.g., with the customer and/or other customers). The output may be determined using predictive analysis.”; Claim 1 – “determining, by the processing, that at least a portion of the input includes sensitive information;”), said target speech comprising a pre-defined set of terms to be redacted (MITCHEM Col 8:58 – 9:2 –“The probabilities of success of the outputs may be in response to one or more inputs. The artificial neural network may be configured to map one or more outputs to one or inputs (e.g., key words in the input) or characteristics of an input (e.g., tone, emotion, meaning).”; Col 13:33-56 – “Based on a determination that the CSR is not authorized to receive sensitive information, the AI assistant may censor the information, such as by temporarily silencing audio from the customer to the CSR or preventing the display of the information on the CSR's device. The AI assistant may record the sensitive information, such as in a secure memory and/or database.”), and (2) redact said target speech from said electronic speech signal during said time window to produce a redacted verbal communication signal (MITCHEM Col 13:33-56 – “Based on a determination that the CSR is not authorized to receive sensitive information, the AI assistant may censor the information, such as by temporarily silencing audio from the customer to the CSR or preventing the display of the information on the CSR's device. The AI assistant may record the sensitive information, such as in a secure memory and/or database.”) with less than 500 ms of introduced latency as measured from the reception time (r1) (MITCHEM Col 8:28-36 – “The AI assistant 201 may comprise a processor 206. The processor 206 may comprise a microprocessor. 
The processor 206 may comprise a digital signal processor (DSP), such as a DSP chip. The processor 206 may comprise a real-time dialogue-processing component.”); and a system output interface configured to provide the redacted audible communication signal as system output (MITCHEM Col 13:33-56 – “Based on a determination that the CSR is not authorized to receive sensitive information, the AI assistant may censor the information, such as by temporarily silencing audio from the customer to the CSR or preventing the display of the information on the CSR's device. The AI assistant may record the sensitive information, such as in a secure memory and/or database.”). In other words, MITCHEM teaches an AI assistant that is trained to receive real-time speech conversations, analyze the conversations, detect keywords in the conversations, perform predictive analysis, detect sensitive information in the conversation, and redact the sensitive information. One of ordinary skill would have recognized that applying a trained AI model to the redacting method/system of CHONG would yield predictable results of improving the level of accuracy for detecting and redacting sensitive data, since the redacting method/system of CHONG was ready for improvement to incorporate the trained AI model, as taught by MITCHEM. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply a trained AI model of MITCHEM to the method/system of detecting and redacting sensitive data of CHONG. REGARDING CLAIM 14, CHONG in view of MITCHEM discloses the system of claim 13, wherein the communications interface comprises the system output interface (Par 85 – “In the modified audio 202, however, the spoken words illustrated under the schematic representation of the modified call 134 can reflect what the agent and/or other entity at the agent device 120 hears. As can be appreciated with reference to FIG. 
2, the phrase “1234 5678 9012 3456” has been removed or redacted from the modified audio 202 and replaced with the first instance of substitute audio 138A.”; Par 103 – “From operation 310, the method 300 can proceed to operation 312. At operation 312, the computing device 106 can provide the modified call 134 to the agent device 120.”). Claim 15 is similar to Claim 5; thus, it is rejected under the same rationale. Claim 16 is similar to Claim 6; thus, it is rejected under the same rationale. REGARDING CLAIM 17, CHONG in view of MITCHEM discloses a non-transitory computer-readable medium storing computer-executable code thereon, the code when executed by a computer causing the computer to execute a process of moderating a verbal communication, the code comprising: code for causing the computer to receive, at a reception time (r1), an electronic speech signal of the verbal communication (Par 43 – “According to some embodiments of the concepts and technologies disclosed herein, the call 114 can include data that can identify a calling party and/or a device associated with the calling party such as, for example, a customer device 116. In various embodiments, functionality of the customer device 116 may be provided by one or more computers, mobile telephones, laptop computers, tablet computers, telephones, other computing systems, other communications devices, combinations thereof, or the like.”; Par 48 – “The “call handling” referred to herein can include an exchange of audio data and associated information (“audio”) associated with the call 114 between the agent device 120 and the customer device 116 to resolve an issue or issues for which the customer initiated the call 114 to the call center 102 (or the call 114 that was eventually routed to the call center 102 as noted above).”), said electronic speech signal of the verbal communication comprising a first portion at a first time (t1) (Fig. 2 – “Sure, my credit card number is”; Par 83 – “As shown in FIG. 
2, the audio 200 can include a first instance of nonsensitive audio 124A, a first instance of sensitive audio 122A, and a second instance of nonsensitive audio 124B. It can be appreciated that multiple instances of sensitive audio 122 and nonsensitive audio 124 may be included, and that the illustrated example is merely illustrative and is provided for illustration purposes only. As shown in FIG. 2, these instances of audio can correspond to the spoken words illustrated under the schematic representation of the call 114. As can be appreciated with reference to FIG. 2, the phrase “Sure, my credit card number is” can correspond to the first instance of nonsensitive audio 124A; the phrase “1234 5678 9012 3456” can correspond to the first instance of sensitive audio 122A; and the phrase “Can you please tell me when that charge will clear and . . . ” can correspond to the second instance of nonsensitive audio 124B. It should be understood that this example is illustrative, and therefore should not be construed as being limiting in any way.”; Par 103 – “It can be appreciated that if the call 114 includes audio having fifteen seconds of nonsensitive audio 124, five seconds of sensitive audio 122, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds), that the modified call 134 provided to the agent device 120 can include fifteen seconds of nonsensitive audio 124, five seconds of substitute audio 138, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds).”); code for processing the electronic speech signal with an artificial intelligence (Par 56 – “According to some embodiments, the extraction and redaction engine 110 can detect a potential impending disclosure of sensitive information based on analyzing output from the speech recognition engine 108.”), the artificial intelligence [being a machine learning model trained] to: (1) process the first portion of the electronic speech signal and thereby predict 
target speech at a time window (t2-t3) (Par 54 – “More particularly, the extraction and redaction engine 110 can predict an impending disclosure of sensitive information during a call 114 (before the sensitive information is said or heard), and can perform operations to prevent the disclosure of the sensitive information to some entities (e.g., to the agent associated with the agent device 120) during the call 114.”; Par 56 –“ For example, if a caller says “my social security number is,” the extraction and redaction engine 110 can determine that the next words spoken, if numbers, are likely to correspond to a social security number and therefore can extract and redact (from the audio associated with the call 114), the spoken numbers. It should be understood that this example is illustrative, and therefore should not be construed as being limiting in any way.”; Par 96 – “As used herein, a “sensitive personal information event” can refer to the providing of, the speaking of, the entry of, and/or any other exchange of information that is potentially sensitive, commonly referred to as “sensitive personal information” in the telecommunications industry. 
Thus, in operation 308, the computing device 106 can determine that such an exchange of information is about to occur (before the actual exchange occurs as explained herein).”; Par 110 – “As explained above, the agent device 120 can detect the sensitive personal information event by detecting selection (e.g., by the agent) of a smart field or other field for entry of personal information (e.g., a credit card number, a name, a social security number, etc.), by detecting the agent asking for sensitive information (e.g., “what is your credit card number,” “what is your social security number,” or the like), and/or in other manners.”; Par 100 – “Thus, in some embodiments, the computing device 106 can generate the modified call 134 in operation 310, where the modified call 134 can include the call identifier 118, the substitute audio 138 (a time period of soft tone, music, spoken words, or other audio that is substituted for a matching time period of sensitive audio 122 that has been removed), and other information such as, for example, characters, commands, and/or other information. It should be understood that this example is illustrative, and therefore should not be construed as being limiting in any way.”; Par 103 – “From operation 310, the method 300 can proceed to operation 312. At operation 312, the computing device 106 can provide the modified call 134 to the agent device 120. It can be appreciated that if the call 114 includes audio having fifteen seconds of nonsensitive audio 124, five seconds of sensitive audio 122, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds), that the modified call 134 provided to the agent device 120 can include fifteen seconds of nonsensitive audio 124, five seconds of substitute audio 138, and another fifteen seconds of nonsensitive audio 124 (a total of thirty five seconds).”), which time window (t2-t3) is subsequent to the first portion of the electronic speech signal (Fig. 
2; Par 83 – “As can be appreciated with reference to FIG. 2, the phrase “Sure, my credit card number is” can correspond to the first instance of nonsensitive audio 124A; the phrase “1234 5678 9012 3456” can correspond to the first instance of sensitive audio 122A; and the phrase “Can you please tell me when that charge will clear and . . . ” can correspond to the second instance of nonsensitive audio 124B.”), said target speech comprising a pre-defined set of terms to be redacted (Par 70 – “The sensitive data 140 can correspond, in some embodiments, to the sensitive information (e.g., credit card numbers, social security numbers, or the like) that can be captured by the speech recognition engine 108 and/or other functionality.”), and (2) redact said target speech from said electronic speech signal during said time window to produce a redacted verbal communication signal (Par 85 – “As can be appreciated with reference to FIG. 2, the phrase “1234 5678 9012 3456” has been removed or redacted from the modified audio 202 and replaced with the first instance of substitute audio 138A.”; Par 100 – “Thus, in some embodiments, the computing device 106 can generate the modified call 134 in operation 310, where the modified call 134 can include the call identifier 118, the substitute audio 138 (a time period of soft tone, music, spoken words, or other audio that is substituted for a matching time period of sensitive audio 122 that has been removed), and other information such as, for example, characters, commands, and/or other information.”) with less than 500 ms of introduced latency as measured from the reception time (r1) (CHONG Par 34 – “As used herein, the phrase “real-time” is used to a refer to a process in which input (e.g., a user's spoken speech during a telephone call) is processed (e.g., the sensitive portions of the user's speech are removed, optionally substituted, and the sensitive information is captured and/or used) within a small number of milliseconds such 
that the process occurs virtually immediately. In some embodiments, for example, the number of milliseconds can include a number within a range of less than one to five milliseconds; a range of five to twenty milliseconds; or a range of up to fifty milliseconds.”); code for causing the artificial intelligence to produce the redacted verbal communication signal (Par 85 – “In the modified audio 202, however, the spoken words illustrated under the schematic representation of the modified call 134 can reflect what the agent and/or other entity at the agent device 120 hears. As can be appreciated with reference to FIG. 2, the phrase “1234 5678 9012 3456” has been removed or redacted from the modified audio 202 and replaced with the first instance of substitute audio 138A.”); and code for providing said redacted verbal communication signal to a consumer (Par 85 – “In the modified audio 202, however, the spoken words illustrated under the schematic representation of the modified call 134 can reflect what the agent and/or other entity at the agent device 120 hears. As can be appreciated with reference to FIG. 2, the phrase “1234 5678 9012 3456” has been removed or redacted from the modified audio 202 and replaced with the first instance of substitute audio 138A.”; Par 103 – “From operation 310, the method 300 can proceed to operation 312. At operation 312, the computing device 106 can provide the modified call 134 to the agent device 120.”). CHONG does not explicitly teach the [square-bracketed] limitation. MITCHEM discloses the [square-bracketed] limitations. MITCHEM discloses a method/system for censoring speech conversations between two users and redacting sensitive audio stream comprising: code for processing the electronic speech signal with an artificial intelligence, the artificial intelligence [being a machine learning model trained] (MITCHEM Col 9:13-26 – “The AI assistant 201 may be configured to be taught to process the inputs. 
The AI assistant 201 may be fed a training set of data. The training set of data may comprise data associated with an industry . The industry may be associated with a business of the CSR. For example, the training set of data may comprise an employee training manual, business vocabulary, and/or data associated with historical customer communication sessions. The training set of data may comprise one or more inputs. The training set of data may comprise one or more outputs associated with the inputs. The training set of data may comprise one or more input classification codes. The training set may comprise one or more outputs associated with the classification codes.”) to: (1) process the first portion of the electronic speech signal and thereby predict target speech at a time window (t2-t3), which time window (t2-t3) is subsequent to the first portion of the electronic speech signal (MITCHEM Col 2:12-23 – “The AI assistant may listen and/or have access to a conversation between a CSR and a customer. Based on the customer's questions, responses, and/or statements, the AI assistant may determine questions, responses, or statements for the CSR and communicate the questions, responses, or statements to the CSR.”; Col 12:62-13:12 – “The output may be determined based on a mapping and/or table of inputs (e.g., input characteristics, input classification codes, etc.) and outputs. The output may be determined based on data received from an artificial neural network. The output may be determined based on a determined probability success of an output. The output may be determined based on other communication sessions (e.g., with the customer and/or other customers). 
The output may be determined using predictive analysis.”; Claim 1 – “determining, by the processing, that at least a portion of the input includes sensitive information;”), said target speech comprising a pre-defined set of terms to be redacted (MITCHEM Col 8:58 – 9:2 –“The probabilities of success of the outputs may be in response to one or more inputs. The artificial neural network may be configured to map one or more outputs to one or inputs (e.g., key words in the input) or characteristics of an input (e.g., tone, emotion, meaning).”; Col 13:33-56 – “Based on a determination that the CSR is not authorized to receive sensitive information, the AI assistant may censor the information, such as by temporarily silencing audio from the customer to the CSR or preventing the display of the information on the CSR's device. The AI assistant may record the sensitive information, such as in a secure memory and/or database.”), and (2) redact said target speech from said electronic speech signal during said time window to produce a redacted verbal communication signal (MITCHEM Col 13:33-56 – “Based on a determination that the CSR is not authorized to receive sensitive information, the AI assistant may censor the information, such as by temporarily silencing audio from the customer to the CSR or preventing the display of the information on the CSR's device. The AI assistant may record the sensitive information, such as in a secure memory and/or database.”) with less than 500 ms of introduced latency as measured from the reception time (r1) (MITCHEM Col 8:28-36 – “The AI assistant 201 may comprise a processor 206. The processor 206 may comprise a microprocessor. The processor 206 may comprise a digital signal processor (DSP), such as a DSP chip. 
The processor 206 may comprise a real-time dialogue-processing component.”); code for causing the artificial intelligence to produce the redacted verbal communication signal (MITCHEM Col 13:33-56 – “Based on a determination that the CSR is not authorized to receive sensitive information, the AI assistant may censor the information, such as by temporarily silencing audio from the customer to the CSR or preventing the display of the information on the CSR's device. The AI assistant may record the sensitive information, such as in a secure memory and/or database.”); and code for providing said redacted verbal communication signal to a consumer (MITCHEM Col 13:33-56 – “Based on a determination that the CSR is not authorized to receive sensitive information, the AI assistant may censor the information, such as by temporarily silencing audio from the customer to the CSR or preventing the display of the information on the CSR's device. The AI assistant may record the sensitive information, such as in a secure memory and/or database.”). In other words, MITCHEM teaches an AI assistant that is trained to receive real-time speech conversations, analyze the conversations, detect keywords in the conversations, perform predictive analysis, detect sensitive information in the conversation, and redact the sensitive information. One of ordinary skill would have recognized that applying a trained AI model to the redacting method/system of CHONG would yield predictable results of improving the level of accuracy for detecting and redacting sensitive data, since the redacting method/system of CHONG was ready for improvement to incorporate the trained AI model, as taught by MITCHEM. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply a trained AI model of MITCHEM to the method/system of detecting and redacting sensitive data of CHONG. 
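For orientation only, the behavior the rejection attributes to CHONG — predicting an impending disclosure from a trailing trigger phrase (without semantic analysis of the words that follow, per Par. 56) and then substituting audio of equal duration so total call length is unchanged (per Par. 103) — can be sketched as follows. This is an illustrative sketch, not code from any cited reference; all names are hypothetical.

```python
# Illustrative sketch of the cited CHONG behavior (hypothetical names).

TRIGGER_PHRASES = (
    "my credit card number is",
    "my social security number is",
)

def disclosure_impending(transcript: str) -> bool:
    """Predict that the *next* words are sensitive, based only on a
    trailing trigger phrase -- no semantic/NLP analysis of what follows
    (cf. CHONG Par. 56)."""
    text = transcript.lower().rstrip()
    return any(text.endswith(phrase) for phrase in TRIGGER_PHRASES)

def substitute_audio(samples: list[float], start: int, end: int,
                     tone: float = 0.0) -> list[float]:
    """Replace the sensitive span [start, end) with substitute audio of
    the same length, so the modified call keeps its original duration
    (cf. CHONG Par. 103: 15 s + 5 s + 15 s in, 15 s + 5 s + 15 s out)."""
    return samples[:start] + [tone] * (end - start) + samples[end:]
```

In this sketch, a transcript ending in “Sure, my credit card number is” would make `disclosure_impending` return `True`, and the following window would be routed through `substitute_audio` rather than delivered to the agent.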
Claim 18 is similar to Claim 5; thus, it is rejected under the same rationale. Claim 19 is similar to Claim 6; thus, it is rejected under the same rationale. Claim 20 is similar to Claim 10; thus, it is rejected under the same rationale. Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over CHONG (US 2021/0389924 A1) in view of MITCHEM (US 11,706,337 B1), and further in view of KAPPAGANTU (US 2022/0084544 A1). REGARDING CLAIM 7, CHONG in view of MITCHEM discloses the method of claim 1, wherein the method is executed at a computer [at which the verbal communication was generated] (Fig. 1 – “Computing Device 106”; Par 39 – “According to various embodiments, the call center 102 can include a computing device 106. In various embodiments of the concepts and technologies disclosed herein, the functionality of the computing device 106 may be provided by one or more server computers, desktop computers, mobile telephones, laptop computers, set-top boxes, other computing systems, and the like. It should be understood that the functionality of the computing device 106 can be provided by a single device, by two similar devices, and/or by two or more dissimilar devices. For purposes of describing the concepts and technologies disclosed herein, the computing device 106 is described herein as a server computer. It should be understood that this embodiment is illustrative, and should not be construed as being limiting in any way.”). CHONG does not explicitly teach the [square-bracketed] limitations. In other words, CHONG implicitly suggests that the function of computing device 106 can be executed by a customer device 116 and/or agent device 120 (Par 39 – “mobile telephones”). Although CHONG implicitly suggests the limitations, the Examiner provides KAPPAGANTU for clarity of the rejection. KAPPAGANTU explicitly discloses the [square-bracketed] limitations. 
KAPPAGANTU discloses a method/system for real-time redaction of sensitive information from an audio stream, wherein the method is executed at a computer [at which the verbal communication was generated] (KAPPAGANTU Par 64 – “One or more of the devices depicted in the network environment 200, such as the RTRSI device 202, the server devices 204(1)-204(n), or the client devices 208(1)-208(n), for example, may be configured to operate as virtual instances on the same physical machine. In other words, one or more of the RTRSI device 202, the server devices 204(1)-204(n), or the client devices 208(1)-208(n) may operate on the same physical device rather than as separate devices communicating through communication network(s) 210. Additionally, there may be more or fewer RTRSI devices 202, server devices 204(1)-204(n), or client devices 208(1)-208(n) than illustrated in FIG. 2.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of CHONG in view of MITCHEM to substitute implementing the method on a user device, as taught by KAPPAGANTU, for implementing it on a server. Since each individual element and its function are shown in the prior art, albeit shown in separate references, the simple substitution of one known element for another producing a predictable result renders the claim obvious. For more on this combination rationale, see MPEP § 2143(B).

REGARDING CLAIM 8, CHONG in view of MITCHEM discloses the method of claim 1, wherein the method is executed at a third computer of a third-party user (Fig. 1 – “Computing Device 106 -> Recipient 150”), remote from a computer at which the verbal communication was generated (Fig. 1 – “Customer Device 116 and/or Agent Device 136”), [to mitigate the risk of the third-party hearing the target speech]. CHONG does not explicitly teach the [square-bracketed] limitations.
In other words, CHONG teaches mitigating the risk of a second party hearing the target speech (e.g., an agent hearing the social security number), but does not explicitly teach mitigating the risk of a third party hearing it. KAPPAGANTU explicitly discloses the [square-bracketed] limitations. KAPPAGANTU discloses a method/system for real-time redaction of sensitive information from an audio stream, wherein the method is executed at a third computer of a third-party user (KAPPAGANTU Fig. 3 – “Real-time voice call redaction module 302”), remote from a computer at which the verbal communication was generated (KAPPAGANTU Fig. 3, Units 208(1) and 208(2)), [to mitigate the risk of the third-party hearing the target speech] (KAPPAGANTU Fig. 5 – “Destination 1, 2, …, n”; Par 81 – “The portions of the audio stream that are conveyed through the pipeline are then received by the stream multiplier, which then routes the audio stream to an intended destination. In an exemplary embodiment, a possible intended destination may be a memory, such as, for example, a database, within which a recordation of the non-sensitive portions of the audio stream may be stored.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of CHONG in view of MITCHEM to include preventing a third party from hearing the target speech, as taught by KAPPAGANTU. One of ordinary skill would have been motivated to include preventing a third party from hearing the target speech, in order to provide more secure communication and prevent a leak of sensitive data.

REGARDING CLAIM 9, CHONG in view of MITCHEM discloses the method of claim 1. CHONG further discloses wherein the method is executed at an intermediary computer system (e.g., in the cloud) electronically disposed (Fig. 1 – “Computing Device 106”) between (i) a computer at which the verbal communication was generated (Fig. 1 – “Customer Device 116 and/or Agent Device 136”) and (ii) a computer in use by a third party (Fig. 1 – “Recipient 150”), [to mitigate the risk of the third-party hearing the target speech]. CHONG does not explicitly teach the [square-bracketed] limitations. In other words, CHONG teaches mitigating the risk of a second party hearing the target speech (e.g., an agent hearing the social security number), but does not explicitly teach mitigating the risk of a third party hearing it. KAPPAGANTU explicitly discloses the [square-bracketed] limitations. KAPPAGANTU discloses a method/system for real-time redaction of sensitive information from an audio stream, wherein the method is executed at an intermediary computer system (e.g., in the cloud) electronically disposed (KAPPAGANTU Fig. 3 – “Real-time voice call redaction module 302”) between (i) a computer at which the verbal communication was generated (KAPPAGANTU Fig. 3, Units 208(1) and 208(2)) and (ii) a computer in use by a third party (KAPPAGANTU Fig. 5 – “Destination 1, 2, …, n”), [to mitigate the risk of the third-party hearing the target speech] (KAPPAGANTU Fig. 5; Par 81 – “The portions of the audio stream that are conveyed through the pipeline are then received by the stream multiplier, which then routes the audio stream to an intended destination. In an exemplary embodiment, a possible intended destination may be a memory, such as, for example, a database, within which a recordation of the non-sensitive portions of the audio stream may be stored.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of CHONG in view of MITCHEM to include preventing a third party from hearing the target speech, as taught by KAPPAGANTU.
One of ordinary skill would have been motivated to include preventing a third party from hearing the target speech, in order to provide more secure communication and prevent a leak of sensitive data.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over CHONG (US 2021/0389924 A1) in view of MITCHEM (US 11,706,337 B1), and further in view of CUTLER (US 2018/0218727 A1).

REGARDING CLAIM 11, CHONG in view of MITCHEM discloses the method of claim 1. CHONG does not explicitly teach artificially-generated speech. CUTLER discloses a method/system for a communication session between users, wherein the verbal communication comprises artificially-generated speech (Par 45 – “Further, the controller 216 is configured to determine when to control the text-to-speech converter 218 to convert the received text data 254 into synthesized speech and to play this synthesized speech out through the speaker(s) 222 in place of the received audio 250.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of CHONG in view of MITCHEM to include artificially-generated speech, as taught by CUTLER. One of ordinary skill would have been motivated to include artificially-generated speech, in order to improve voice communication between users.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN C KIM, whose telephone number is (571) 272-3327. The examiner can normally be reached Monday through Friday, 8:00 AM to 4:00 PM EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders, can be reached at 571-272-7516. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JONATHAN C KIM/
Primary Examiner, Art Unit 2655
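The stream-multiplier arrangement cited from KAPPAGANTU in the claims 8-9 rejections above (Par 81: the redacted stream is routed to one or more intended destinations, such as a downstream listener or a recording database) amounts to a simple fan-out. A minimal sketch, with invented names rather than KAPPAGANTU's:

```python
from typing import Callable

class StreamMultiplier:
    """Minimal fan-out over a redacted audio stream: each chunk that
    survives redaction is forwarded to every registered destination
    (e.g., a third-party listener's device, or a database that records
    the non-sensitive portions)."""

    def __init__(self) -> None:
        self._destinations: list[Callable[[bytes], None]] = []

    def add_destination(self, sink: Callable[[bytes], None]) -> None:
        self._destinations.append(sink)

    def route(self, redacted_chunk: bytes) -> None:
        for sink in self._destinations:
            sink(redacted_chunk)
```

Because only already-redacted chunks reach `route`, every destination, including any third party, receives the stream with the target speech removed, which is the risk-mitigation point the rejections rely on.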

Prosecution Timeline

Apr 07, 2023
Application Filed
Feb 27, 2025
Non-Final Rejection — §103, §112
Jun 05, 2025
Response Filed
Aug 08, 2025
Final Rejection — §103, §112
Nov 11, 2025
Request for Continued Examination
Nov 18, 2025
Response after Non-Final Action
Dec 23, 2025
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12573391
Generating Contextual Responses for Out-of-coverage Requests for Assistant Systems
2y 5m to grant Granted Mar 10, 2026
Patent 12561110
AUDIO PLAYBACK METHOD AND APPARATUS, COMPUTER READABLE STORAGE MEDIUM, AND ELECTRONIC DEVICE
2y 5m to grant Granted Feb 24, 2026
Patent 12555578
METHOD AND SYSTEM OF AUDIO FALSE KEYPHRASE REJECTION USING SPEAKER RECOGNITION
2y 5m to grant Granted Feb 17, 2026
Patent 12547372
DISPLAY APPARATUS AND DISPLAY METHOD
2y 5m to grant Granted Feb 10, 2026
Patent 12537000
METHOD OF IDENTIFYING TARGET DEVICE AND ELECTRONIC DEVICE THEREFOR
2y 5m to grant Granted Jan 27, 2026
Based on the examiner's 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
74%
Grant Probability
99%
With Interview (+40.6%)
2y 7m
Median Time to Grant
High
PTA Risk
Based on 355 resolved cases by this examiner. Grant probability derived from career allow rate.
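The 74% figure above is the examiner's career allow rate, rounded from the raw counts reported elsewhere on this page (261 granted of 355 resolved). A quick arithmetic check:

```python
granted, resolved = 261, 355  # career counts reported for this examiner

career_allow_rate = granted / resolved  # ≈ 0.7352
print(f"{career_allow_rate:.0%}")       # prints "74%"
```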
