Prosecution Insights
Last updated: April 19, 2026
Application No. 18/526,730

SYSTEMS AND METHODS FOR PROVIDING LOW LATENCY USER FEEDBACK ASSOCIATED WITH A USER SPEAKING SILENTLY

Non-Final OA — §102, §103, §DP
Filed: Dec 01, 2023
Examiner: SOLAIMAN, FOUZIA HYE
Art Unit: 2653
Tech Center: 2600 — Communications
Assignee: Wispr AI Inc.
OA Round: 1 (Non-Final)
Grant Probability: 67% (Favorable)
OA Rounds: 1-2
Time to Grant: 3y 0m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 67% (42 granted / 63 resolved) — above average, +4.7% vs TC avg
Interview Lift: +55.5% — strong; allowance rate in resolved cases with an interview vs. without
Typical Timeline: 3y 0m average prosecution; 16 applications currently pending
Career History: 79 total applications across all art units
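The figures above are simple ratios over the examiner's resolved cases. The Python sketch below is illustrative only: the panel reports the aggregate counts (42 granted / 63 resolved) and the +55.5% lift, but not the per-bucket interview counts, so the lift is left as a function of assumed inputs rather than computed from invented numbers.

```python
# Illustrative sketch of the arithmetic behind the examiner panel above.
# Assumption (not stated in the panel): "interview lift" is the allowance-rate
# difference between resolved cases with and without an examiner interview.

def allow_rate(granted: int, resolved: int) -> float:
    """Allowance rate as a percentage of resolved applications."""
    return 100.0 * granted / resolved

def interview_lift(with_interview, without_interview) -> float:
    """Each argument is a (granted, resolved) pair for that bucket."""
    return allow_rate(*with_interview) - allow_rate(*without_interview)

career = allow_rate(granted=42, resolved=63)
print(f"Career allow rate: {career:.1f}%")  # -> 66.7%, rounded to 67% above
```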

Statute-Specific Performance

§101: 28.5% (-11.5% vs TC avg)
§103: 47.1% (+7.1% vs TC avg)
§102: 16.0% (-24.0% vs TC avg)
§112: 2.7% (-37.3% vs TC avg)
Tech Center averages are estimates • Based on career data from 63 resolved cases
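Reading each delta as the examiner's rate minus the Tech Center average estimate (an assumption; the panel does not define the metric), the four rows above all imply the same ~40% baseline. A quick Python check:

```python
# Consistency check on the statute-specific figures above, assuming each
# delta = (examiner rate) - (Tech Center average estimate). What the
# percentage itself measures (e.g., rate of overcoming that rejection type)
# is not defined in the panel.

examiner_rate = {"§101": 28.5, "§103": 47.1, "§102": 16.0, "§112": 2.7}
delta_vs_tc   = {"§101": -11.5, "§103": 7.1, "§102": -24.0, "§112": -37.3}

for statute, rate in examiner_rate.items():
    implied_tc_avg = rate - delta_vs_tc[statute]
    print(f"{statute}: examiner {rate:.1f}% -> implied TC avg {implied_tc_avg:.1f}%")
# Every row implies the same 40.0% baseline estimate.
```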

Office Action

§102 §103 §DP
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

Acknowledgment is made of the information disclosure statements filed on 1/14/2026, 11/13/2025, 8/14/2025, 3/24/2025, 8/12/2024, 5/17/2024, and 5/16/2024. It is noted by the Examiner that all of the references were considered.

Drawings

The drawings submitted on 03/03/2022 have been considered and accepted.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). The USPTO internet Web site contains terminal disclaimer forms which may be used. Please visit http://www.uspto.gov/forms/. The filing date of the application will determine what form should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1, 19, and 20 are rejected on the ground of non-statutory obviousness-type double patenting as being unpatentable over claim 11 of copending Application No. 18/526,682 in view of Kapur et al. (US 2019/0074012). Claims 1, 19, and 20 of the instant application recite similar and/or word-for-word limitations to those of claim 1 of the reference application while omitting features such as “… transmit the speech data”, “communication interface; and output audio of the speech of the second user …”, etc.; thus, claims 1, 19, and 20 of the instant application are narrower than claim 1 of US 2024/0221718 A1 (Application No. 18/526,682). As to claims 1, 19, and 20, Application No. 18/526,682 teaches all of the limitations as in claim 11. Kapur teaches “(“[0034] In some cases, the SSI {a silent speech interface (SSI)} system facilitates private human-to-human communication.
For instance, the SSI system: (a) may detect the content of speech that is internally articulated by a first user wearing the SSI system; (b) may send a first message to another human (e.g., to a mobile device or computer associated with the other human), which first message comprises the detected content; (c) may receive a second message from the other human (e.g., from a mobile device or computer associated with the other human); and (d) may convert the second message into data representing sound and may output the second message to the first user. The second message may be audible to the first user (who is wearing the SSI system) yet not audible to other persons in the vicinity of the first user. In the preceding sentence, the entire two-way communication may be undetectable by other persons in the vicinity of the first user (who is internally articulating). (“[0081] Wireless transceivers 903, 917, 919 may send and receive wireless radio signals in accordance with one or more wireless standards, …”) by Kapur et al. US 20190074012 A1”) Therefore, it would have been obvious to one of ordinary skilled in the art at the time the invention was made to have modified a model having a transmitting and receiving, communicating with user using user interface by Kapur in order to successfully Communicate with user. by Kapur et al. US 2019/0074012”) Claim 19 is a method claim with limitations similar to the limitations of instant system Claim 1 and is rejected under similar rationale. Claim 20 is a computer-readable medium claim with limitations similar to the limitations of instant system Claim 1 and is rejected under similar rationale. Instant Application No. 18/526, 730 Co-pending Application No. 18/526682 1. A communication system for making and receiving a call, the system comprising: a speech system associated with a first user, the speech system configured to measure a signal indicative of speech muscle activation patterns of the first user when the first user is speaking; a communication interface configured to communicate with a communication device associated with a second user on a communication network; and one or more processors configured to: determine speech data representing speech of the first user based on the signal indicative of the speech muscle activation patterns of the first user when the first user is speaking silently; transmit the speech data representing the speech of the first user to the communication device associated with the second user on the communication network using the communication interface; receive speech data representing speech of a second user from the communication device associated with the second user on the communication network using the communication interface; and output audio of the speech of the second user based on the received speech data representing the speech of the second user. 1. (Currently Amended) A system for synthesizing input speech of a user, the system comprising: a speech system configured to measure a signal indicative of speech muscle activation patterns of the user when the user is speaking; a machine learning model configured to synthesize an audio signal of the input speech of the user using the signal indicative of the speech muscle activation patterns of the user; and a processor configured to output the synthesized audio signal of the input speech substantially in parallel in time with the user speaking by. 
receiving a first audio prediction from the machine learning model, the first audio prediction comprising a first audio frame of the synthesized audio signal and a second audio frame of the synthesized audio signal; outputting the first audio frame: and outputting the second audio frame when there is a delay between an end of outputting the first audio frame and receiving a second audio prediction. 4. (Original) The system of claim 1, wherein: the speech system is a wearable device comprising an electromyography (EMG) sensor, whereby the signal indicative of the speech muscle activation patterns of the user when the user is speaking comprises EMG data received from the EMG sensor when the user is speaking. 11. (Original) The system of claim 4, wherein the EMG sensor is configured to measure the EMG data when the user is speaking silently. Claims 1, 19 and 20 is/are rejected on the ground of non-statutory obviousness-type double patenting as being unpatentable over claims 15 of (application number, 18/403952) in view of Kapur et al. US 2019/0074012 . Claims 1, 19, and 20 of the instant application recite similar and/or word-for-word limitations as in claims 15of the reference application and omitted detail features such as " … transmit the speech data”, " communication interface; and output audio of the speech of the second user …”, etc.; and thus, claims 1, 19, and 20 of the instant application are narrower than claims 1 of the (US 20240221762 A1/application number, 18/403952). As to claim 1, the issued patent teaches all of the limitations as in claim 1, 19 and 20. Kapur teaches “(“[0034] In some cases, the SSI {a silent speech interface (SSI)} system facilitates private human-to-human communication. For instance, the SSI system: (a) may detect the content of speech that is internally articulated by a first user wearing the SSI system; (b) may send a first message to another human (e.g., to a mobile device or computer associated with the other human), which first message comprises the detected content; (c) may receive a second message from the other human (e.g., from a mobile device or computer associated with the other human); and (d) may convert the second message into data representing sound and may output the second message to the first user. The second message may be audible to the first user (who is wearing the SSI system) yet not audible to other persons in the vicinity of the first user. In the preceding sentence, the entire two-way communication may be undetectable by other persons in the vicinity of the first user (who is internally articulating). (“[0081] Wireless transceivers 903, 917, 919 may send and receive wireless radio signals in accordance with one or more wireless standards, …”) by Kapur et al. US 20190074012 A1”) Therefore, it would have been obvious to one of ordinary skilled in the art at the time the invention was made to have modified a model having a transmitting and receiving, communicating with user using user interface by Kapur in order to successfully Communicate with user. by Kapur et al. US 2019/0074012”) Claim 19 is a method claim with limitations similar to the limitations of instant system Claim 1 and is rejected under similar rationale. Claim 20 is a computer-readable medium claim with limitations similar to the limitations of instant system Claim 1 and is rejected under similar rationale. Instant Application No. 18/526, 730 Co-pending Application No. 18/403952 1. 
A communication system for making and receiving a call, the system comprising: a speech system associated with a first user, the speech system configured to measure a signal indicative of speech muscle activation patterns of the first user when the first user is speaking; a communication interface configured to communicate with a communication device associated with a second user on a communication network; and one or more processors configured to: determine speech data representing speech of the first user based on the signal indicative of the speech muscle activation patterns of the first user when the first user is speaking silently; transmit the speech data representing the speech of the first user to the communication device associated with the second user on the communication network using the communication interface; receive speech data representing speech of a second user from the communication device associated with the second user on the communication network using the communication interface; and output audio of the speech of the second user based on the received speech data representing the speech of the second user. 1. A system for decoding speech of a user, the system comprising: a speech input device configured to measure a signal indicative of the speech muscle activation patterns of the user while the user is speaking; a trained machine learning model configured to decode the speech of the user based at least in part on the signal indicative of the speech muscle activation patterns of the user, wherein: the trained machine learning model is trained using training data obtained in at least a subset of sampling contexts of a plurality of sampling contexts; and at least one processor configured to output the decoded speech of the user. 15. The system of claim 1, wherein; the speech input device is further configured to obtain voiced speech measurements when the user is speaking vocally; and the trained machine learning model is a first trained machine learning model configured to associate a first signal indicative of the speech muscle activation patterns of the user when the user is speaking silently with a first voiced speech measurement when the user is speaking vocally; and the system further comprises a second trained machine learning model configured to generate an audio and/or text output when the user is speaking silently based at least in part on the association of the first signal indicative of the speech muscle activation patterns of the user with the first voiced speech measurement. Claim Rejections - 35 USC § 102 The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention. Claim 1,7, 15, 17-20 is rejected under 35 U.S.C. 102 (a)(1) as being anticipated by Kapur et al., (US Pub. 2019/0074012). Regarding Claim 1, Kapur teaches: 1. 
A communication system for making and receiving a call, the system comprising: a speech system associated with a first user, the speech system configured to measure a signal indicative of speech muscle activation patterns of the first user when the first user is speaking; Kapur teaches (“[0033] … For instance, in some cases: (a) a user silently and internally articulates an input; and (b) the SSI system detects the content of this input and outputs an instruction to an external device, which instruction is in accordance with the input. “) (“[0034] In some cases, the SSI system facilitates private human-to-human communication. For instance, the SSI system: (a) may detect the content of speech that is internally articulated by a first user wearing the SSI system; (b) may send a first message to another human (e.g., to a mobile device or computer associated with the other human), which first message comprises the detected content; (c) may receive a second message from the other human (e.g., from a mobile device or computer associated with the other human); and (d) may convert the second message into data representing sound and may output the second message to the first user. The second message may be audible to the first user (who is wearing the SSI system) yet not audible to other persons in the vicinity of the first user. In the preceding sentence, the entire two-way communication may be undetectable by other persons in the vicinity of the first user (who is internally articulating). “) (“[0051] In some implementations, during internal articulation, an SSI device detects neuronal activation of muscles. During internal articulation, efferent nerve impulses are sent from the sensorimotor cortex (brain) through cranial nerves which innervate muscles. The neuronal activation of these muscles may be detected as a myoneural signal. In some implementations, one or more of the following muscles (“Articulator Muscles”) are neurologically activated during internal articulation: geniohyoid, mylohyoid, genioglossus, superior longitudinal, inferior longitudinal, transverse, vertical, hyoglossus, palatoglossus, styloglossus, levator palatini, musculus uvulae, tensor palatini, palatopharyngeus, superior pharyngeal constrictor, medial pharyngeal constrictor, and inferior pharyngeal constrictor. In some cases, during internal articulation, little or no movement of the Articulator Muscles occurs. The Articulator Muscles are muscles that would, in ordinary speech, be employed for articulation.”) (“[0085] In the example shown in FIG. 10, a user's inner speech (e.g., mental speech) or mental verbal imagery 1003 may produce efferent nerve signaling 1005, which in turn may cause internal articulation 1000 (e.g., neural activation at neuromuscular junctions in Articulator Muscles). This internal articulation 1000 may produce somato-sensory feedback 1001 to the user.”) by Kapur et al. US 20190074012 A1 “In some implementations, the SSI device includes a user interface (UI). The UI may include: (a) a natural language processor to detect content of user's internally articulated speech and, in some use cases, to generate instructions for audio feedback; (b) software for generating a response to the user's internally articulated speech (which response may, in some use scenarios, comprise audio feedback to the user); and (c) a transducer (e.g., earphone or bone conduction transducer) configured to produce audio feedback. In some use scenarios, the audio feedback repeats the words that the user internally articulated. 
In other use scenarios, at least a portion of the audio feedback is different than (and in response to) words which the user internally articulated. For instance, if a user internally articulates a request for the current time, the audio feedback may comprise an answer which states the current time” (paragraph 128) by Kapur et al. US 2019/0074012 a communication interface configured to communicate with a communication device associated with a second user on a communication network; and Kapur teaches (“[0079] FIG. 9 is a box diagram that shows hardware in a silent speech interface. In the example shown in FIG. 9, a wearable housing 900 is configured to be worn on a user's head and neck, and to curve over and partially around (and to be supported by) an ear of the user. Wearable housing 900 includes a bone conduction transducer 901, wireless transceivers 903 and 917, electrodes 905, electrode leads 907, an amplifier 909, an ADC (analog-to-digital converter) 911, a microcontroller 915, and a battery 913. Bone conduction transducer 901 may create vibrations that deliver audio feedback to a user 150. For instance, bone conduction transducer 901 may be positioned (e.g., touching the user's hair or scalp) adjacent to a bony protuberance behind the user's ear. Wireless transceiver 903 may receive wireless signals that encode audio feedback, and may convert these into digital or analog signals, and may send the digital or analog signals to bone conduction transducer 901. Electrodes 905 may measure voltage at positions on the user's skin (e.g., positions on the user's head and neck). Electrode leads 907 may electrically connect electrodes 905 and amplifier 909, Amplifier 909 may amplify analog voltage signals detected by electrodes 905. ADC 911 may convert this amplified analog signal to a digital signal and send the digital signal to microcontroller 915. Microcontroller 915 may process this digital signal and may output the processed signal to wireless transmitter 917. …”) (“[0081] Wireless transceivers 903, 917, 919 may send and receive wireless radio signals in accordance with one or more wireless standards, …”) by Kapur et al. US 20190074012 A1 one or more processors configured to: determine speech data representing speech of the first user based on the signal indicative of the speech muscle activation patterns of the first user when the first user is speaking silently; Kapur teaches (“[0128] In some implementations, the SSI device includes a user interface (UI). The UI may include: (a) a natural language processor to detect content of user's internally articulated speech …”) (“[0085] In the example shown in FIG. 10, a user's inner speech (e.g., mental speech) or mental verbal imagery 1003 may produce efferent nerve signaling 1005, which in turn may cause internal articulation 1000 (e.g., neural activation at neuromuscular junctions in Articulator Muscles). This internal articulation 1000 may produce somato-sensory feedback 1001 to the user.”) (“[0031] … For example, in some cases: (a) a user silently and internally articulates multiple numbers and a request for a mathematical operation; and (b) the SSI system detects the content of this request and outputs to the user (via a bone conduction transducer) the result of the mathematical operation on the numbers. 
… … In each example in this paragraph, the feedback may be audible to a human user wearing the SSI system yet not audible to other persons in the vicinity of that user”) and (“[0035] … the SSI system is wearable and portable …” and (“[136] … For instance: (a) a user may internally articulate instructions; and (b) the SSI device may respond to a phone call in accordance with the instructions (e.g., by saying “hello”, “how are you”, “call you later”, “what's up”, “yes”, or “no”) by Kapur et al. US 20190074012 A1 by Kapur et al. US 20190074012 A1 transmit the speech data representing the speech of the first user to the communication device associated with the second user on the communication network using the communication interface; (“[0034] In some cases, the SSI system facilitates private human-to-human communication. For instance, the SSI system: (a) may detect the content of speech that is internally articulated by a first user wearing the SSI system; (b) may send a first message to another human (e.g., to a mobile device or computer associated with the other human), which first message comprises the detected content; (c) may receive a second message from the other human (e.g., from a mobile device or computer associated with the other human); and (d) may convert the second message into data representing sound and may output the second message to the first user. The second message may be audible to the first user (who is wearing the SSI system) yet not audible to other persons in the vicinity of the first user. In the preceding sentence, the entire two-way communication may be undetectable by other persons in the vicinity of the first user (who is internally articulating). (“[0081] Wireless transceivers 903, 917, 919 may send and receive wireless radio signals in accordance with one or more wireless standards, …”) by Kapur et al. US 20190074012 A1 receive speech data representing speech of a second user from the communication device associated with the second user on the communication network using the communication interface; and Kapur teaches (“[0034] In some cases, the SSI system facilitates private human-to-human communication. For instance, the SSI system: (a) may detect the content of speech that is internally articulated by a first user wearing the SSI system; (b) may send a first message to another human (e.g., to a mobile device or computer associated with the other human), which first message comprises the detected content; (c) may receive a second message from the other human (e.g., from a mobile device or computer associated with the other human); and (d) may convert the second message into data representing sound and may output the second message to the first user. The second message may be audible to the first user (who is wearing the SSI system) yet not audible to other persons in the vicinity of the first user. In the preceding sentence, the entire two-way communication may be undetectable by other persons in the vicinity of the first user (who is internally articulating).”) by Kapur et al. US 20190074012 A1 output audio of the speech of the second user based on the received speech data representing the speech of the second user. Kapur teaches (“[0034] … (d) may convert the second message into data representing sound and may output the second message to the first user. The second message may be audible to the first user (who is wearing the SSI system) …”) by Kapur et al. 
US 20190074012 A1 Claim 19 is a method claim with a limitation similar to the limitation of system Claim 1 and is rejected under similar rationale. Claim 20 is a non-transitory computer readable medium claim with a limitation similar to the limitation of system Claim 1 and is rejected under similar rationale. Regarding Claim 20, Kapur further teaches: 20. A non-transitory computer readable medium containing program instructions that, when executed, cause one or more processors to: Kapur teaches (“[0151] In illustrative implementations, one or more computers execute programs according to instructions encoded in one or more tangible, non-transitory, computer-readable media. For example, in some cases, these instructions comprise instructions for a computer to perform any calculation, computation, program, algorithm, or computer function described or implied herein. For example, in some cases, instructions encoded in a tangible, non-transitory, computer-accessible medium comprise instructions for a computer to perform the Computer Tasks.”) by Kapur et al. US 20190074012 A1 Regarding Claim 7, Kapur teaches the system claim 1 as identified above. Kapur further teaches: 7. The communication system of claim 1, wherein the received speech data from the communication network representing the speech of the second user comprises audio of the second user. Kapur teaches (“[0034] In some cases, the SSI system facilitates private human-to-human communication. For instance, the SSI system: (a) may detect the content of speech that is internally articulated by a first user wearing the SSI system; (b) may send a first message to another human (e.g., to a mobile device or computer associated with the other human), which first message comprises the detected content; (c) may receive a second message from the other human (e.g., from a mobile device or computer associated with the other human); and (d) may convert the second message into data representing sound and may output the second message to the first user. The second message may be audible to the first user (who is wearing the SSI system) yet not audible to other persons in the vicinity of the first user. 
In the preceding sentence, the entire two-way communication may be undetectable by other persons in the vicinity of the first user (who is internally articulating).”) (“[0149] … (10) to receive data from, control, or interface with one or more sensors; … … (12) to receive signals indicative of human input; …”) (“[0149] In illustrative implementations of this invention, one or more computers (e.g., servers, network hosts, client computers, integrated circuits, microcontrollers, controllers, field-programmable-gate arrays, personal computers, digital computers, driver circuits, or analog computers) are programmed or specially adapted to perform one or more of the following tasks: (1) to control the operation of, or interface with, hardware components of an SSI device, including any electrode, ADC, earphone, bone conduction transducer, or wireless transceiver; (2) to concatenate measurements; (3) to extract a signal of interest from noisy real time data, including by thresholding, feature fusion and performing detection and classification with one or more neural networks (e.g., CNNs); (4) to perform natural language processing; (5) to detect content of internally articulated speech, based on electrode measurements; (6) to calculate a response to internally articulated input; (7) to output instructions to control audio feedback to a user; (8) to output instructions to control another device, such as a luminaire, television or home appliance; (9) to detect content of internally articulated input and, in response to the input, to send a message to another device (e.g., to send a message to another person by sending the message to a device associated with the other person); (10) to receive data from, control, or interface with one or more sensors; (11) to perform any other calculation, computation, program, algorithm, or computer function described or implied herein; (12) to receive signals indicative of human input; …”) (“[0154] In some cases, one or more of the following hardware components are used for network communication: a computer bus, a computer port, network connection, network interface device, host adapter, wireless module, wireless card, signal processor, modem, router, cables or wiring.”) by Kapur et al. US 20190074012 A1 Regarding Claim 15, Kapur teaches the system claim 1 as identified above. Kapur further teaches: 15. The communication system of claim 1, wherein: the speech system associated with the first user is further configured to receive an audio signal of the speech of the first user when the first user is speaking; and Kapur teaches (“[0129] In some implementations, the SSI device enables personalized bi-directional human-machine interfacing in a concealed and seamless manner, where the element of interaction is in natural language. This may facilitate a complementary synergy between human users and machines, where certain tasks may be outsourced to a computer. After an internally articulated phrase is recognized, the computer may contextually process the phrase according to the relevant application the user accesses. …”) by Kapur et al. US 20190074012 A1 the one or more processors are further configured to determine the speech data representing the speech of the first user by using a machine learning model to remove noise in the audio signal of the speech of the first user based on the signal indicative of the speech muscle activation patterns of the first user when the first user is speaking. FIG. 
10, Kapur teaches (“[0129] In some implementations, the SSI device enables personalized bi-directional human-machine interfacing in a concealed and seamless manner, where the element of interaction is in natural language. This may facilitate a complementary synergy between human users and machines, where certain tasks may be outsourced to a computer. After an internally articulated phrase is recognized, the computer may contextually process the phrase according to the relevant application the user accesses. …”) (“[0094] … The threshold may tend to filter out low-voltage fluctuations or background noise. …”) (“[0103] Among other things, the first CNN may effectively impose a floor for internal articulation rate (e.g., a floor for the number of phonemes internally articulated by a user per unit of time). Thus, the first CNN may effectively impose a floor that eliminates “dead time” which occurs when the user is not internally articulating. The CNN may effectively delete (not pass on) signal portions (e.g., time windows) where the internal articulation rate is below the floor. Likewise, the first CNN may effectively determine that other parts of a signal (even above the floor) are not of interest and thus may delete (not allow to pass) those other parts of the signal that are not of interest.”) (“[0116] In some implementations, the neural network(s) are trained on training data. For instance, the training data may comprise a set of labeled words (or labeled phonemes) that have been internally articulated. The training data may be internally articulated by multiple different persons, in order to train the SSI device to recognize words that are internally articulated by different persons. Alternatively, training may be customized for a particular user and at least a portion of the training data may comprise labeled words (or labeled phonemes) that were internally articulated by the particular user.”) (“[0197] In some implementations, this invention is a method comprising: (a) taking measurements of a set of electrical signals at positions on a user's skin, which skin is part of the user's head or neck; and (b) analyzing the measurements to recognize content of internally articulated speech by the user; wherein at least a portion of the internally articulated speech occurs when the user is not exhaling. In some cases, analyzing the measurements includes identifying temporal windows during which the electrical signals are low-voltage. In some cases, analyzing the measurements includes identifying temporal windows during which each electrical signal, in the set of electrical signals, occurs at a specific position on the user's skin and has a root mean square (RMS) voltage, which RMS voltage: (a) is greater than or equal to 8 microvolts and less than or equal to 20 microvolts; and (b) is the RMS potential difference between voltage at the specific position and voltage at a reference electrode that is positioned on skin of an ear of the user. In some cases, the content which is recognized comprises one or more words. In some cases, the method further comprises providing audio feedback to the user, via sound vibrations produced by an earphone or bone conduction transducer. …”) by Kapur et al. US 20190074012 A1 Regarding Claim 17, Kapur teaches the system claim 1 as identified above. Kapur further teaches: 17. 
The communication system of claim 1, wherein: the speech data representing the speech of the first user is first speech data representing the speech of the first user; the communication interface is configured to: Kapur teaches (“[0034] In some cases, the SSI system facilitates private human-to-human communication. For instance, the SSI system: (a) may detect the content of speech that is internally articulated by a first user wearing the SSI system; (b) may send a first message to another human (e.g., to a mobile device or computer associated with the other human), which first message comprises the detected content; (c) may receive a second message from the other human (e.g., from a mobile device or computer associated with the other human); and (d) may convert the second message into data representing sound and may output the second message to the first user. The second message may be audible to the first user (who is wearing the SSI system) yet not audible to other persons in the vicinity of the first user. In the preceding sentence, the entire two-way communication may be undetectable by other persons in the vicinity of the first user (who is internally articulating).”) by Kapur et al. US 20190074012 A1 communicate with the communication device associated with a second user on the communication network when the first user is on a first call; and Kapur teaches (“[0034] In some cases, the SSI system facilitates private human-to-human communication. For instance, the SSI system: (a) may detect the content of speech that is internally articulated by a first user wearing the SSI system; … … ; (c) may receive a second message from the other human (e.g., from a mobile device or computer associated with the other human); …”) by Kapur et al. US 20190074012 A1 communicate with a communication device associated with a third user on the communication network when the first user is on a second call; and Kapur teaches system pause and pause limit. (“The system may react in real-time to this detected content. In some cases, the system reacts by providing audio feedback to the user via an earphone or a bone conduction transducer. In other cases, the system reacts by controlling another device, such as a luminaire or television. In other cases, the system reacts by sending a message to a device associated with another person.”) (“[0105] In FIG. 10, an internally articulated phrase may be detected in the signal and may be temporarily stored in memory in data buffer 1059. A computer may determine: (a) the time elapsed between each word and the next word; and (b) whether the time elapsed exceeds a pause limit 1061. If the elapsed time is less than the pause limit, then this indicates that the user intended the new word to be part of the same phrase, and this new word is also added to the data buffer and the process continues 1063. If the elapsed time is greater than or equal to the pause limit and there are one or more words in the data buffer, this may indicate that the user has completed a phrase, and thus: (a) the buffered phrase may be inputted into a NLP (natural language processing) algorithm and deleted from the data buffer; and (b) the elapsed time between words may be reset to zero 1065.”) by Kapur et al. US 20190074012 A1 wherein the one or more processors are further configured: to determine second speech data of the speech of the first user when the first user is speaking on the second call; and Kapur teaches (“[0106] Thus, in FIG. 
10, the system may count time elapsed during a pause in internal articulation. If a pause between a new word and the most recent word that preceded it is less than a pause limit (e.g., 4 seconds), then the new word is added to a data buffer. If the pause is more than the pause limit (i.e., if the pause is “long”), then buffered words for the phrase since the last long pause is inputted into an NLP (natural language processor). Thus, effectively: (a) the system may buffer data regarding a group of phonemes until a pause exceeds the pause limit (e.g., 4 seconds); and then (b) the system may forward data regarding the group of phonemes to the NLP.”) (“[0133] In closed-loop mode, the SSI device may respond to the user's internally articulated queries through aural feedback (which is audible to the user but not to other persons in the vicinity of the user). This aural feedback helps enable a closed-loop, silent and seamless conversation with a computing device.”) …”) by Kapur et al. US 20190074012 A1 transmit the second speech data to the communication device associated with the third user on the communication network using the communication interface. Kapur teaches (“[0128] In some implementations, the SSI device includes a user interface (UI). The UI may include: (a) a natural language processor to detect content of user's internally articulated speech and, in some use cases, to generate instructions for audio feedback; (b) software for generating a response to the user's internally articulated speech (which response may, in some use scenarios, comprise audio feedback to the user) …”) (“[0149] … to output instructions to control another device, such as a luminaire, television or home appliance; (9) to detect content of internally articulated input and, in response to the input, to send a message to another device (e.g., to send a message to another person by sending the message to a device associated with the other person); …”) (“[0197] … In some cases, the method further comprises controlling at least one device in accordance with instructions, which instructions were at least part of the content of the internally articulated speech. In some cases, the method further comprises sending a message that includes at least a portion of the content of the internally articulated speech. In some cases, the method further comprises: (a) sending, to a device associated with a person other than the user, a first message that includes at least a portion of the content of the internally articulated speech; …”) by Kapur et al. US 20190074012 A1 Regarding Claim 18, Kapur teaches the system claim 17 as identified above. Kapur further teaches: 18. The communication system of claim 17, wherein: the speech system associated with the first user is further configured to receive an audio signal of the speech of the first user when the first user is speaking; the signal indicative of the speech muscle activation patterns of the first user is a first signal indicative of the speech muscle activation patterns first user; and (“[0074] Each of the electrode configurations described above or shown in FIG. 5, 6, 7 or 8 may measure voltage from all muscles that are activated during internal articulation. This is because the signal of interest may travel away from the source and reach all electrodes, albeit at different intensities. Thus, the positions of the electrodes may be adjusted.”) (“[0079] FIG. 9 is a box diagram that shows hardware in a silent speech interface. In the example shown in FIG. 
9, a wearable housing 900 is configured to be worn on a user's head and neck, and to curve over and partially around (and to be supported by) an ear of the user. Wearable housing 900 includes a bone conduction transducer 901, wireless transceivers 903 and 917, electrodes 905, electrode leads 907, an amplifier 909, an ADC (analog-to-digital converter) 911, a microcontroller 915, and a battery 913. Bone conduction transducer 901 may create vibrations that deliver audio feedback to a user 150. For instance, bone conduction transducer 901 may be positioned (e.g., touching the user's hair or scalp) adjacent to a bony protuberance behind the user's ear. Wireless transceiver 903 may receive wireless signals that encode audio feedback, and may convert these into digital or analog signals, and may send the digital or analog signals to bone conduction transducer 901. Electrodes 905 may measure voltage at positions on the user's skin (e.g., positions on the user's head and neck). Electrode leads 907 may electrically connect electrodes 905 and amplifier 909, Amplifier 909 may amplify analog voltage signals detected by electrodes 905. ADC 911 may convert this amplified analog signal to a digital signal and send the digital signal to microcontroller 915. Microcontroller 915 may process this digital signal and may output the processed signal to wireless transmitter 917. Battery 913 may provide power (e.g., via wired connections) to components housed in housing 100. For instance, battery 913 may provide power to bone conduction transducer 901, wireless transceivers 903 and 917, ADC 911, and microcontroller 915.”) the second speech data is determined at least in part based on a second signal indicative of the speech muscle activation patterns of the first user when the first user is speaking and/or the audio signal of speech of the first user when the first user is speaking. an SSI device that performs aural feedback (e.g., via an earphone or bone conduction transducer) operates as a closed-loop input-output platform. And at least a portion of the audio feedback is different than (and in response to) words which the user internally articulated. For instance, if a user internally articulates a request for the current time, the audio feedback may comprise an answer which states the current time. sending, to a device associated with a person other than the user, a first message that includes at least a portion of the content of the internally articulated speech (“[0128] In some implementations, the SSI device includes a user interface (UI). The UI may include: (a) a natural language processor to detect content of user's internally articulated speech and, in some use cases, to generate instructions for audio feedback; (b) software for generating a response to the user's internally articulated speech (which response may, in some use scenarios, comprise audio feedback to the user); and (c) a transducer (e.g., earphone or bone conduction transducer) configured to produce audio feedback. In some use scenarios, the audio feedback repeats the words that the user internally articulated. In other use scenarios, at least a portion of the audio feedback is different than (and in response to) words which the user internally articulated. 
For instance, if a user internally articulates a request for the current time, the audio feedback may comprise an answer which states the current time.”) (“[0118] In this prototype, signals that are indicative of internal articulation are captured using electrodes on the user's skin, in a facial or neck region …”) (“[0129] … The output, thus computed by the application, may then be converted using Text-to-Speech and aurally transmitted to the user. Bone conduction headphones may be employed as the aural output, so as to not impede the user's ordinary hearing. In some cases, an SSI device that performs aural feedback (e.g., via an earphone or bone conduction transducer) operates as a closed-loop input-output platform.”) (“[0163] A non-limiting example of “detecting” internal articulation is detecting neural activation of muscles that is caused by, triggered by, or involved in the internal articulation.”) (“[0197] … wherein at least a portion of the internally articulated speech occurs when the user is not exhaling. In some cases, analyzing the measurements includes identifying temporal windows during which the electrical signals are low-voltage. In some cases, analyzing the measurements includes identifying temporal windows during which each electrical signal, … … the method further comprises sending a message that includes at least a portion of the content of the internally articulated speech. In some cases, the method further comprises: (a) sending, to a device associated with a person other than the user, a first message that includes at least a portion of the content of the internally articulated speech; …”) by Kapur et al. US 20190074012 A1 Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 2, 3, 5, 6 and 8, are rejected under 35 U.S.C. 103 as being unpatentable over Kapur et al. in view of Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”). Regarding Claim 2, Kapur teaches the system claim 1 as identified above. Kapur further teaches: 2. The communication system of claim 1, wherein the speech system is a wearable device comprising Kapur teaches during internal articulation, an SSI (silent speech interfaces) device detects neuronal activation of muscles. (“[0051] In some implementations, during internal articulation, an SSI (silent speech interfaces) device detects neuronal activation of muscles. During internal articulation, efferent nerve impulses are sent from the sensorimotor cortex (brain) through cranial nerves which innervate muscles. 
…”) (“[0029] In illustrative implementations, the SSI (silent speech interfaces) system enables a human user to communicate silently with other humans or other devices, in such a way that the communication is not detectable by another human (other than an intended human recipient of the communication).”) (“[0031] In some cases, the SSI system performs closed-loop feedback, where neither the silently articulated input (from a user wearing the SSI system) nor the feedback to the user is detectable by other persons in the vicinity of the user. Among other things: The SSI (silent speech interfaces) system may function as a “world clock”. For instance, in some cases: (a) a user silently and internally articulates a request for the current time in a particular city; and (b) the SSI system detects the content of this request and outputs to the user (via a bone conduction transducer) the current time in that city. Likewise, the SSI system may perform math calculations for the user. For example, in some cases: (a) a user silently and internally articulates multiple numbers and a request for a mathematical operation; and (b) the SSI system detects the content of this request and outputs to the user (via a bone conduction transducer) the result of the mathematical operation on the numbers. Also, the SSI system may play a game with the user. For instance, in some cases: (a) a user silently and internally articulates a chess move (e.g., “Qg5”, which means move the Queen to the g5 square of a chessboard); and (b) the SSI system detects the content of this chess move and simulates another player by outputting to the user (via a bone conduction transducer) a responding chess move (e.g., “Ngf3”). …”) (“[0045] In illustrative implementations, an SSI system detects the content of internally articulated words even though: (a) the internal articulation may be completely silent (to the unaided hearing of another person); and (b) the internal articulation may occur without movement of any external muscles (that is detectable by the unaided vision of another person). For instance, internal articulation by a user may occur without movement of the user's lips or facial muscles.”) by Kapur et al. US 20190074012 A1 Kapur does not explicitly teach an EMG sensor. Janke teaches: an electromyography (EMG) sensor, and EMG data received from the EMG sensor. FIG. 10, Janke teaches (“with various alternative sensor technologies being actively investigated by research groups: 1) Surface electromyography (EMG) [4]–[8]: The activation potentials of facial articulatory muscles are recorded with surface electrodes, providing information about articulatory muscle movement during speech production.” (page 1, column 2) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) Janke is considered to be analogous to the claimed invention because it relates to a direct EMG-to-speech transformation system. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur to incorporate the teachings of Janke, in order to include an EMG sensor. One could have been motivated to do so because the system can improve output quality. (“… We recently advanced this Unit Selection technique using a clustering approach [71] that substantially reduces the number of units in the codebook and thus the computation time, while improving the output quality. …” page 2378, column 2, 5 lines) by Janke et al.
(“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) Regarding Claim 6, The combination teaches the system claim 2 as identified above. Kapur further teaches: 6. The communication system of claim 2, wherein: the communication network comprises one or more computing devices configured to process Kapur teaches (“[0051] In some implementations, during internal articulation, an SSI device detects neuronal activation of muscles. During internal articulation, efferent nerve impulses are sent from the sensorimotor cortex (brain) through cranial nerves which innervate muscles. The neuronal activation of these muscles may be detected as a myoneural signal. In some implementations, one or more of the following muscles (“Articulator Muscles”) are neurologically activated during internal articulation: …”) (“[0074] Each of the electrode configurations described above or shown in FIG. 5, 6, 7 or 8 may measure voltage from all muscles that are activated during internal articulation. …”) (“[0034] In some cases, the SSI system facilitates private human-to-human communication. For instance, the SSI system: (a) may detect the content of speech that is internally articulated by a first user wearing the SSI system; (b) may send a first message to another human (e.g., to a mobile device or computer associated with the other human), which first message comprises the detected content; (c) may receive a second message from the other human (e.g., from a mobile device or computer associated with the other human); and (d) may convert the second message into data representing sound and may output the second message to the first user. The second message may be audible to the first user (who is wearing the SSI system) yet not audible to other persons in the vicinity of the first user. In the preceding sentence, the entire two-way communication may be undetectable by other persons in the vicinity of the first user (who is internally articulating).”) (“[0047] FIG. 1 shows a user wearing a silent speech interface (SSI) device. In the example shown in FIG. 1, the SSI device 100 is configured to be worn adjacent to the head and neck of a user 150. SSI device 100 includes a curved structure 110. When the SSI device is being worn by the user, curved structure 110 may curve around and above, and may be supported by, the user's ear. SSI device 100 may house a bone conduction transducer 108 that is positioned behind the user's ear and that outputs vibrations that may be heard by the user but not by other persons in the user's environment. SSI device 100 may include a clip-on extension 120. Both extension 120 and the main body of SSI device 100 may house electrodes. In FIG. 1, a portion of the main body of SSI device 100 extends below the jawline and houses electrodes 131, 132, and 133 that are worn on the user's skin in the submaxillary region. Likewise, in FIG. 1, clip-on extension 120 houses electrodes 134 and 135 that are worn on the user's skin in the oral (lip) region and mental (chin) region, respectively. Sensors 134 and 135 are electrically connected to the main body of SSI device 100 via wired connection 144. SSI device 100 may be configured to communicate wirelessly (e.g., by wireless transmission in accordance with a Bluetooth® protocol) with one or more computers or other electronic devices. Alternatively or in addition, wired connection 140 may allow SSI device 100 to communicate with one or more other computers or electronic devices.”) (“[0079] FIG. 
9 is a box diagram that shows hardware in a silent speech interface. In the example shown in FIG. 9, a wearable housing 900 is configured to be worn on a user's head and neck, and to curve over and partially around (and to be supported by) an ear of the user. Wearable housing 900 includes a bone conduction transducer 901, wireless transceivers 903 and 917, electrodes 905, electrode leads 907, an amplifier 909, an ADC (analog-to-digital converter) 911, a microcontroller 915, and a battery 913. Bone conduction transducer 901 may create vibrations that deliver audio feedback to a user 150. For instance, bone conduction transducer 901 may be positioned (e.g., touching the user's hair or scalp) adjacent to a bony protuberance behind the user's ear. Wireless transceiver 903 may receive wireless signals that encode audio feedback, and may convert these into digital or analog signals, and may send the digital or analog signals to bone conduction transducer 901. Electrodes 905 may measure voltage at positions on the user's skin (e.g., positions on the user's head and neck). Electrode leads 907 may electrically connect electrodes 905 and amplifier 909, Amplifier 909 may amplify analog voltage signals detected by electrodes 905. ADC 911 may convert this amplified analog signal to a digital signal and send the digital signal to microcontroller 915. Microcontroller 915 may process this digital signal and may output the processed signal to wireless transmitter 917. Battery 913 may provide power (e.g., via wired connections) to components housed in housing 100. For instance, battery 913 may provide power to bone conduction transducer 901, wireless transceivers 903 and 917, ADC 911, and microcontroller 915.”) (“[0080] … Computer 921 may obtain data from one or more remote computer servers via the Internet. To do so, computer 921 may access the Internet via connection to internet 923. Computer 921 may output signals that encode audio feedback for the user, and these signals may be converted into wireless format and transmitted by wireless transceiver 919. …”) (“[0085] In the example shown in FIG. 10, a user's inner speech (e.g., mental speech) or mental verbal imagery 1003 may produce efferent nerve signaling 1005, which in turn may cause internal articulation 1000 (e.g., neural activation at neuromuscular junctions in Articulator Muscles). This internal articulation 1000 may produce somato-sensory feedback 1001 to the user.”) (“For instance, the 1D vector of electrode measurements may be fed forward through a first CNN that: (a) performs optimized (trained) spatio-temporal convolutional transformations …”) (“[0097] In FIG. 10, mean power spectral moments 1040 may be calculated as follows: The first spectral moment (SM1) is SM1 = Σ_{j=1}^{M} P_j f_j; the second spectral moment (SM2) is SM2 = Σ_{j=1}^{M} P_j f_j²; the third spectral moment (SM3) is SM3 = Σ_{j=1}^{M} P_j f_j³; and so on, where f_j is the frequency of the spectrum at frequency bin j, P_j is the power spectrum at frequency bin j, and M is the length of the frequency bin (e.g., the number of frequency bins).”) (“[0123] In this prototype, the signal undergoes a representation transformation before being input to the recognition model. A running window average is employed to identify and omit single spikes (>30 μV above baseline) in the stream, with amplitudes greater than average values for nearest 4 points before and after.
Optionally, mel-frequency cepstral coefficient based representations may be employed to characterize the envelopes of human speech. The signal stream is framed into 0.025 s windows, with a 0.01 s step between successive windows, followed by a periodogram estimate computation of the power spectrum for each frame. A Discrete Cosine Transform (DCT) may be applied to the log of the mel filterbank applied to the power spectra. This allows the SSI device to effectively learn directly from the processed signal without explicitly detecting any features.”) by Kapur et al. US 20190074012 A1 Kapur does not explicitly teach EMG data conversion. Janke teaches : EMG: Fig.1, Janke teaches (“III. EMG-TO-SPEECH TRANSFORMATION The general framework of the proposed EMG-to-speech approach is shown in Fig. 2. It consists, broadly, of two stages: 1) a training stage (green arrows), page 3, entire column. 1) (“(Artificial Neural Networks are models whose power lays in the interconnection of many simple units (“neurons”) that, together, can perform complex calculations. This section describes EMG-to-Speech conversion with two different kinds of neural network models: Feedforward deep neural networks (DNNs), and Long-Short-Term Memory (LSTM) networks.”) Page 5, col. 1, section E) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) Janke is considered to be analogous to the claimed invention because it relates to direct EMG-to-speech transformation system. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur to incorporate the teachings of Janke in order to include EMG sensor. One could have been motivated to do so because system can improve output quality. (“…We recently advanced this Unit Selection technique using a clustering approach [71] that substantially reduces the number of units in the codebook and thus the computation time, while improving the output quality.. …” page 2378, column 2, 5 lines.) . …”) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) Regarding Claim 3, the combination teaches the system claim 2 as identified above. Kapur further teaches: 3. The communication system of claim 2, wherein: the speech data representing the speech of the first user comprises a spectrogram or audio of the speech of the first user; Kapur teaches (“[0076] In each of the electrode configurations described above in this “Electrode” section, a low voltage signal (produced during internal articulation by a user) may have a root mean square (RMS) voltage that is less than one third of the RMS voltage that occurs during ordinary speech of the user.”) (“[0123] In this prototype, the signal undergoes a representation transformation before being input to the recognition model. A running window average is employed to identify and omit single spikes (>30 μV above baseline) in the stream, with amplitudes greater than average values for nearest 4 points before and after. Optionally, mel-frequency cepstral coefficient based representations may be employed to characterize the envelopes of human speech. The signal stream is framed into 0.025 s windows, with a 0.01 s step between successive windows, followed by a periodogram estimate computation of the power spectrum for each frame. A Discrete Cosine Transform (DCT) may be applied to the log of the mel filterbank applied to the power spectra. 
This allows the SSI device to effectively learn directly from the processed signal without explicitly detecting any features.”) by Kapur et al. US 20190074012 A1 and the one or more processors are further configured to use a machine learning model to convert the Kapur teaches (“[0100] In FIG. 10, slope sign change (SSC) 1045 may be calculated as the number of times that the slope of the signal changes sign during the time window. …”) (“[0104] In FIG. 10, the first CNN may determine whether a signal (e.g. a vector of measurements for a time window) is a signal of interest 1053. If the signal is not a signal of interest, then it may be disregarded 1055. If the signal (e.g. a vector of measurements for a time window) is a signal of interest, then the signal may be fed forward through a second CNN which has already been optimized (trained) 1057.”) (“[0108] In FIG. 10, a first CNN may perform feature fusion and may detect a signal of interest (e.g., a signal indicative of internal articulation activation). This first CNN may perform steps 1027, 1047, 1049 and 1051 in FIG. 10. This first CNN may include SPTC (spatiotemporal convolution) layers, with ReLU (rectified linear unit) activation function and BN (batch normalization). For instance, this first CNN may comprise the following layers (in the following order): …”) [0123] In this prototype, the signal undergoes a representation transformation before being input to the recognition model. A running window average is employed to identify and omit single spikes (>30 μV above baseline) in the stream, with amplitudes greater than average values for nearest 4 points before and after. Optionally, mel-frequency cepstral coefficient based representations may be employed to characterize the envelopes of human speech. The signal stream is framed into 0.025 s windows, with a 0.01 s step between successive windows, followed by a periodogram estimate computation of the power spectrum for each frame. A Discrete Cosine Transform (DCT) may be applied to the log of the mel filterbank applied to the power spectra. This allows the SSI device to effectively learn directly from the processed signal without explicitly detecting any features.”) (“[0028] Alternatively, in some cases, a signal of interest (that comprises a low voltage signal produced by internal articulation) is extracted by a neural network (e.g., a CNN) without explicitly excluding voltages above a cutoff frequency and without explicitly excluding voltage spikes. Instead, in this alternative approach, the neural network (e.g., CNN) may be trained on a training set of voltage measurements taken during internal articulation, and may thereby machine learn to extract the signal of interest.”) (“[0079] FIG. 9 is a box diagram that shows hardware in a silent speech interface. In the example shown in FIG. 9, a wearable housing 900 is configured to be worn on a user's head and neck, and to curve over and partially around (and to be supported by) an ear of the user. Wearable housing 900 includes a bone conduction transducer 901, wireless transceivers 903 and 917, electrodes 905, electrode leads 907, an amplifier 909, an ADC (analog-to-digital converter) 911, a microcontroller 915, and a battery 913. Bone conduction transducer 901 may create vibrations that deliver audio feedback to a user 150. For instance, bone conduction transducer 901 may be positioned (e.g., touching the user's hair or scalp) adjacent to a bony protuberance behind the user's ear. 
Wireless transceiver 903 may receive wireless signals that encode audio feedback, and may convert these into digital or analog signals, and may send the digital or analog signals to bone conduction transducer 901. Electrodes 905 may measure voltage at positions on the user's skin (e.g., positions on the user's head and neck). Electrode leads 907 may electrically connect electrodes 905 and amplifier 909, Amplifier 909 may amplify analog voltage signals detected by electrodes 905. ADC 911 may convert this amplified analog signal to a digital signal and send the digital signal to microcontroller 915. Microcontroller 915 may process this digital signal and may output the processed signal to wireless transmitter 917. Battery 913 may provide power (e.g., via wired connections) to components housed in housing 100. For instance, battery 913 may provide power to bone conduction transducer 901, wireless transceivers 903 and 917, ADC 911, and microcontroller 915.”) by Kapur et al. US 20190074012 A1 Kapur does not explicitly teach convert EMG data to spectrogram. Janke teaches: convert the EMG data to the spectrogram Fig. 1-2, Janke teaches (“A. Run-Time Evaluation … We only state pure conversion time for mapping EMG features to MFCC features, …” PAGE 6, COLUMN. 2, A. Run-Time Evaluation) (“To train the feature transformation, parallel source (EMG) and target (MFCC/F0) vectors are stacked to create joint feature vectors. …” PAGE 2378, column. 1, section C) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) Janke is considered to be analogous to the claimed invention because it relates to direct EMG-to-speech transformation system. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur to incorporate the teachings of Janke in order to include EMG sensor. One could have been motivated to do so because system can improve output quality. (“…We recently advanced this Unit Selection technique using a clustering approach [71] that substantially reduces the number of units in the codebook and thus the computation time, while improving the output quality.. …” page 2378, column 2, 5 lines.) . …”) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) Regarding Claim 5, the combination teaches the system claim 3 as identified above. Kapur does not explicitly teach conversion using a first portion of the machine learning model to convert the EMG data to the spectrogram; Janke further teaches: 5. The communication system of claim 3, wherein converting the EMG data to the audio of the speech of the first user comprises: using a first portion of the machine learning model to convert the EMG data to the spectrogram; and Fig.1, Janke teaches (“III. EMG-TO-SPEECH TRANSFORMATION The general framework of the proposed EMG-to-speech approach is shown in Fig. 2. It consists, broadly, of two stages: 1) a training stage (green arrows), page 3, entire column. 1) (“(Artificial Neural Networks are models whose power lays in the interconnection of many simple units (“neurons”) that, together, can perform complex calculations. This section describes EMG-to-Speech conversion with two different kinds of neural network models: Feedforward deep neural networks (DNNs), and Long-Short-Term Memory (LSTM) networks.”) Page 5, col. 1, section E) by Janke et al. 
(“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) using a second portion of the machine learning model to convert the spectrogram to the audio of the speech of the first user. Janke teaches: (“B. Vocoding: For methods where the output of the mapping is a sequence of MFCCs and F0 s, it is necessary to convert those features back to an audio waveform. In our vocoding step, this is achieved using the MLSA filter method [60]. This is possible since the MFCCs and F0 s were extracted as MLSA filter parameters. The vocoding step is the same for all mapping methods in which it is used (i.e. all but Unit Selection with direct concatenative synthesis).”) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) Janke is considered to be analogous to the claimed invention because it relates to a direct EMG-to-speech transformation system. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur to incorporate the teachings of Janke in order to include an EMG sensor. One could have been motivated to do so because the system can improve output quality. (“…We recently advanced this Unit Selection technique using a clustering approach [71] that substantially reduces the number of units in the codebook and thus the computation time, while improving the output quality. …” page 2378, column 2, 5 lines.) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) Regarding Claim 8, Kapur teaches the system claim 1 as identified above. Kapur further teaches: 8. The communication system of claim 1, wherein: the received speech data from the communication network representing the speech of the second user comprises EMG data or spectrogram data associated with the speech of the second user; and Kapur teaches a user may internally articulate instructions; and (b) the SSI device may respond to a phone call in accordance with the instructions (e.g., by saying “hello”, “how are you”, “call you later”, “what's up”, “yes”, or “no”) and a time window (spectrogram) of the signal. (“[0136] In some cases, when operating in open-loop mode, the SSI device may be employed as an input modality to control devices or to initiate or request services. For instance, the SSI device may function as an IoT (internet of things) controller, where: (a) a user silently and internally articulates instructions, without any action that is detectable by persons around the user; and (b) in response to the internally articulated instructions, the SSI device controls home appliances, such as by switching on/off home lighting, or by controlling a television or HVAC systems. Likewise, the SSI device may be employed to respond to phone calls. For instance: (a) a user may internally articulate instructions; and (b) the SSI device may respond to a phone call in accordance with the instructions (e.g., by saying “hello”, “how are you”, “call you later”, “what's up”, “yes”, or “no”).”) (“[0086] In FIG. 10, distributed electrodes may record neural activations that occur during internal articulation 1007. One or more amplifiers may amplify 1009 the recorded signals (e.g., with a 24× gain). The amplified signals from multiple electrodes may be concatenated into a signal vector 1011 for each temporal window of a real-time signal 1013. For instance, each temporal window may be four seconds.
For example, in some cases, there are three electrodes and a 4 second time window, and a single 1D vector comprises (in the following order) data encoding measurements from the first electrode during that window, then data encoding measurements from the second electrode during that window, and then measurements from the third electrode during that window.”) (“[0079] FIG. 9 is a box diagram that shows hardware in a silent speech interface. In the example shown in FIG. 9, a wearable housing 900 is configured to be worn on a user's head and neck, and to curve over and partially around (and to be supported by) an ear of the user. Wearable housing 900 includes a bone conduction transducer 901, wireless transceivers 903 and 917, electrodes 905, electrode leads 907, an amplifier 909, an ADC (analog-to-digital converter) 911, a microcontroller 915, and a battery 913. Bone conduction transducer 901 may create vibrations that deliver audio feedback to a user 150. For instance, bone conduction transducer 901 may be positioned (e.g., touching the user's hair or scalp) adjacent to a bony protuberance behind the user's ear. Wireless transceiver 903 may receive wireless signals that encode audio feedback, and may convert these into digital or analog signals, and may send the digital or analog signals to bone conduction transducer 901. Electrodes 905 may measure voltage at positions on the user's skin (e.g., positions on the user's head and neck). ….”) by Kapur et al. US 20190074012 A1 Kapur does not teach explicitly teach second user data conversion. Janke teaches: the one or more processors are further configured to use a machine learning model to convert the EMG data or spectrogram data to the audio of the speech of the second user. FIG. 1-3 Janke teaches Fig.1, Janke teaches (“III. EMG-TO-SPEECH TRANSFORMATION The general framework of the proposed EMG-to-speech approach is shown in Fig. 2. It consists, broadly, of two stages: 1) a training stage (green arrows), page 3, entire column. 1) (“(Artificial Neural Networks are models whose power lays in the interconnection of many simple units (“neurons”) that, together, can perform complex calculations. This section describes EMG-to-Speech conversion with two different kinds of neural network models: Feedforward deep neural networks (DNNs), and Long-Short-Term Memory (LSTM) networks.”) Page 5, col. 1, section E) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) Janke is considered to be analogous to the claimed invention because it relates to direct EMG-to-speech transformation system. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur to incorporate the teachings of Janke in order to include EMG sensor. One could have been motivated to do so because system can improve output quality. (“…We recently advanced this Unit Selection technique using a clustering approach [71] that substantially reduces the number of units in the codebook and thus the computation time, while improving the output quality.. …” page 2378, column 2, 5 lines.) . …”) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) Claims 9 are rejected under 35 U.S.C. 103 as being unpatentable over Kapur and Janke in view of Wilcox et al. US 10261749 B1 Regarding Claim 9, the combination teaches the system claim 8 as identified above. Kapur further teaches: 9. 
The communication system of claim 8, wherein the machine learning model is trained to generate the audio of the speech of the second user in a … Janke teaches Unit Selection. (“B. Vocoding: For methods where the output of the mapping is a sequence of MFCCs and F0 s, it is necessary to convert those features back to an audio waveform. In our vocoding step, this is achieved using the MLSA filter method [60]. This is possible since the MFCCs and F0 s were extracted as MLSA filter parameters. The vocoding step is the same for all mapping methods in which it is used (i.e. all but Unit Selection with direct concatenative synthesis).”) Fig.1, Janke teaches (“III. EMG-TO-SPEECH TRANSFORMATION: The general framework of the proposed EMG-to-speech approach is shown in Fig. 2. It consists, broadly, of two stages: 1) a training stage (green arrows), page 3, entire column 1) (“Artificial Neural Networks are models whose power lays in the interconnection of many simple units (“neurons”) that, together, can perform complex calculations. This section describes EMG-to-Speech conversion with two different kinds of neural network models: Feedforward deep neural networks (DNNs), and Long-Short-Term Memory (LSTM) networks.”) Page 5, col. 1, section E) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) The combination does not explicitly teach selecting a particular voice from a plurality of voices. Wilcox teaches: selected one of a plurality of voices. Wilcox teaches (“(49) During display of the panoramic image for viewing by a device, some implementations can allow user input or preferences to select a particular audio segment for output from a group of multiple available audio segments associated with a given portion of a panoramic image. For example, a user can select one or more of the audio segments to be active and one or more of the audio segments to be inactive.” Col. 16, lines 42-49) (“(98) … For example, sound types such as flowing water, birds, traffic, human voices, etc. can be detected based on such techniques, e.g., by comparing a detected sound to stored model sounds, and/or using machine learning techniques that use training based on particular types of sounds.” Col. 26, lines 5-9) by Wilcox et al. US 10261749 B1 Wilcox is considered to be analogous to the claimed invention because it relates to audio output for panoramic images. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur and Janke to incorporate the teachings of Wilcox in order to include a graphical user interface to communicate between users. One could have been motivated to do so because the system can have a user-friendly communication device. (“(13) … Further, a user interface can be provided allowing a user to associate audio segments with portions of panoramic images and specify the conditions of audio output in straightforward, efficient, and intuitive ways. …” col. 5, lines 47-49) by Wilcox et al. US 10261749 B1 Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Kapur and Janke in view of YOO et al. US 20210012764 A1 Regarding Claim 4, the combination teaches the system claim 3 as identified above. Janke further teaches: 4.
The communication system of claim 3, wherein the one or more processors are further configured to use the machine learning model to convert the EMG data to the spectrogram or audio of the speech of the first user in a selected one of a plurality of voices responsive to receiving a user selection indicating the Fig. 8. Janke teaches Unit Selection EMG-to-speech conversion, achieving the best results with the mean cosine similarity and Spectrograms of the utterance “He is trying to cut some of the benefits”, reference on top, DNN-based EMG-to-speech on bottom….” Page 2383, col1.) (“Unit Selection EMG-to-speech conversion, achieving the bestresults with the mean cosine similarity. …”page 2378, col. 2, lines 18-19.) (“…For each of the test units, a codebook unit is selected according to the combination of two cost functions, a target cost and a concatenation cost. The target cost function measures how well a codebook unit fits the given test unit. It is calculated between the test- and codebook units’ EMG segments. The concatenation cost function is calculated between codebook units audio segments. It measures how well the units’ acoustic segments fit to each other when they are directly adjacent in the output unit sequence, which enables a smooth transition between the audio segments. Using a weighted sum (with empirically determined weights) of these two costs as the selection criterion, the unit selection process is a search for the sequence of codebook units that minimizes the total cost given the test unit sequence. In our previous work [68], [69], we have evaluated different functions for target and concatenation cost in Unit Selection EMG-to-speech conversion, achieving the best results with the mean cosine similarity.” Page 2378, column 2, para 2) .”) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) Janke is considered to be analogous to the claimed invention because it relates to direct EMG-to-speech transformation system. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur to incorporate the teachings of Janke in order to include EMG sensor. One could have been motivated to do so because system can improve output quality. (“…We recently advanced this Unit Selection technique using a clustering approach [71] that substantially reduces the number of units in the codebook and thus the computation time, while improving the output quality.. …” page 2378, column 2, 5 lines.) . …”) by Janke et al. (“EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals.”) The combination does not explicitly teach selected one of a plurality of voices responsive to receiving a user selection indicating the selected one of the plurality of voices. selected one of a plurality of voices responsive to receiving a user selection indicating the selected one of the plurality of voices. YOO teaches (“[0007] … generating a voice of each of multiple speakers in each section from each of the one or more multi-speaker sections by using a trained artificial neural network and the speaker feature value for each individual speaker. 
The artificial neural network may include an artificial neural network that has been trained, based on at least one piece of training data labeled with a voice of a test speaker, as to a feature value of the test speaker included in the training data, and a correlation between simultaneous speeches of a plurality of speakers including the test speaker and the voice of the test speaker.”) (“[0108] The controller 112 according to an embodiment may provide the voices of multiple speakers through distinct channels, respectively. In addition, the controller 112 may provide only the voices of one or more selected speakers according to the user's selection of at least one channel.”) (“[0145] The voice-generating device 110 according to an embodiment may provide each of the voices of multiple speakers through distinct channels. Also, the voice-generating device 110 may provide only the voices of one or more selected speakers according to the user's selection of at least one channel.”) by YOO et al. US 20210012764 A1 YOO is considered to be analogous to the claimed invention because it relates to generating a voice for each speaker from audio content including a section in which at least two or more speakers speak simultaneously. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur and Janke to incorporate the teachings of YOO in order to include user selection feature. One could have been motivated to do so because system can generate voice accurately. (“[0004] One or more embodiments accurately generate a voice for each speaker from audio content including a section in which two or more speakers simultaneously speak.”) by YOO et al. US 20210012764 A1 Claims 12 are rejected under 35 U.S.C. 103 as being unpatentable over Kapur et al. in view of Selvaraju; et al. US 12518053 B2 Regarding Claim 12, Kapur teaches the system claim 1 as identified above. Kapur further teaches: 12. The communication system of claim 1, wherein: the speech data representing the speech of the first user comprises audio of the speech of the first user; and the one or more processors are further configured to: Kapur teaches (“[0137] … (a) may recognize a user's internally articulated phrase “Uber to home”; …”) (“[0116] In some implementations, the neural network(s) are trained on training data. For instance, the training data may comprise a set of labeled words (or labeled phonemes) that have been internally articulated. The training data may be internally articulated by multiple different persons, in order to train the SSI device to recognize words that are internally articulated by different persons. Alternatively, training may be customized for a particular user and at least a portion of the training data may comprise labeled words (or labeled phonemes) that were internally articulated by the particular user.”) (“[0129] In some implementations, the SSI device enables personalized bi-directional human-machine interfacing in a concealed and seamless manner, where the element of interaction is in natural language. … … After an internally articulated phrase is recognized, the computer may contextually process the phrase according to the relevant application the user …”) by Kapur et al. 
US 20190074012 A1 generate the audio of the speech of the first user based on the signal indicative of the speech muscle activation patterns of the first user when the first user is speaking silently; and Kapur teaches (“[0137] The interface may be personally trained to recognize phrases meant to access specific services. For example, in some use scenarios, an SSI device: (a) may recognize a user's internally articulated phrase “Uber to home”; and (b) may, in response, book transport from the user's current location to the user's home. The interface may also be used as a silent input to Virtual Reality/Augmented Reality applications.”) by Kapur et al. US 20190074012 A1 Kapur does not explicitly teach remove filler word. Selvaraju teaches : automatically . Selvaraju teaches (“(26) The protecting SPI in spoken commands program 134 may mask superfluous portions of the classified commands (step 204). … … Removing stop words/phrases may include removal of low informational value words such as “the”, “a”, “and”, etc., …”) …”) (“(30) … Once the SPI detected word permissions have been finalized, the modified command may be transmitted to the third-party as a modified user speech recording (audio) and/or a modified user speech corpus. …”) (“(77) The above mentioned AI component will be created by the solution provider and the ML file, like a pickle file, will be provided to the user. The AI/ML methods described above may generate models for steps 202-206 above. In step 202, a model may be generated for classifying commands in the user speech corpus. The model may, e.g., look for patterns in commands such as “find me X” or “do Y”. In step 204, a model may be generated for masking superfluous portions of the classified commands. The model may, e.g., look for stop words and low informational value words, words flagged by the user, and/or by ML from their inclusion/exclusion in similar commands. In step 206, a model may be generated for identifying SPI in unmasked portions of the classified commands as well as transmitting the modified command. The models may use ML to identify reoccurring SPI words and permissions/rules respectively. …” col. 14, lines 45-67) by Selvaraju; et al. US 12518053 B2 Selvaraju is considered to be analogous to the claimed invention because it relates to protecting sensitive personal information, and more particularly, to protecting sensitive personal information in spoken commands. . Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur to incorporate the teachings of Selvaraju; in order to include EMG sensor. One could have been motivated to do so because system may automatically classify text in a faster, more cost-effective, and more accurate manner.. (“(46) … This applies ML, NLP, and other AI-guided techniques to automatically classify text in a faster, more cost-effective, and more accurate manner.” Col. 10, lines 48-50 by Selvaraju; et al. US 12518053 B2 Claims 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Kapur et al. in view of Pengelley, US 10885911 B2 Regarding Claim 10, Kapur teaches the system claim 1 as identified above. Kapur further teaches: 10. The communication system of claim 1, wherein: the speech data representing the speech of the first user comprises audio of the speech of the first user; and Kapur teaches (“[0085] In the example shown in FIG. 
10, a user's inner speech (e.g., mental speech) or mental verbal imagery 1003 may produce efferent nerve signaling 1005, which in turn may cause internal articulation 1000 (e.g., neural activation at neuromuscular junctions in Articulator Muscles). This internal articulation 1000 may produce somato-sensory feedback 1001 to the user.”) (“[0031] … For example, in some cases: (a) a user silently and internally articulates multiple numbers and a request for a mathematical operation; and (b) the SSI system detects the content of this request and outputs to the user (via a bone conduction transducer) the result of the mathematical operation on the numbers …”) and (“[0035] … the SSI system is wearable and portable …” and (“[136] … For instance: (a) a user may internally articulate instructions; and (b) the SSI device may respond to a phone call in accordance with the instructions (e.g., by saying “hello”, “how are you”, “call you later”, “what's up”, “yes”, or “no”) by Kapur et al. US 20190074012 A1 by Kapur et al. US 20190074012 A1 transmitting the speech data representing the speech of the first user to the communication device associated with the second user on the communication network is performed Kapur teaches Computer 921 may output signals that encode audio feedback for the user, and these signals may be converted into wireless format (i.e text protocol) and transmitted by wireless transceiver 919. ).(“[0080] … Computer 921 may obtain data from one or more remote computer servers via the Internet. To do so, computer 921 may access the Internet via connection to internet 923. Computer 921 may output signals that encode audio feedback for the user, and these signals may be converted into wireless format and transmitted by wireless transceiver 919. Computer 921 may store data in, and retrieve data from, memory device 925.”) (“[0129] In some implementations, the SSI device enables personalized bi-directional human-machine interfacing in a concealed and seamless manner, where the element of interaction is in natural language. This may facilitate a complementary synergy between human users and machines, where certain tasks may be outsourced to a computer. After an internally articulated phrase is recognized, the computer may contextually process the phrase according to the relevant application the user accesses (e.g., an IoT application may assign the internally articulated digit 3 to device number 3 whereas a Mathematics application may consider the same input as the actual number 3). The output, thus computed by the application, may then be converted using Text-to-Speech and aurally transmitted to the user. Bone conduction headphones may be employed as the aural output, so as to not impede the user's ordinary hearing. In some cases, an SSI device that performs aural feedback (e.g., via an earphone or bone conduction transducer) operates as a closed-loop input-output platform.”) (“[0034] In some cases, the SSI system facilitates private human-to-human communication. 
For instance, the SSI system: (a) may detect the content of speech that is internally articulated by a first user wearing the SSI system; (b) may send a first message to another human (e.g., to a mobile device or computer associated with the other human), which first message comprises the detected content; (c) may receive a second message from the other human (e.g., from a mobile device or computer associated with the other human); and (d) may convert the second message into data representing sound and may output the second message to the first user. The second message may be audible to the first user (who is wearing the SSI system) yet not audible to other persons in the vicinity of the first user. In the preceding sentence, the entire two-way communication may be undetectable by other persons in the vicinity of the first user (who is internally articulating). (“[0081] Wireless transceivers 903, 917, 919 may send and receive wireless radio signals in accordance with one or more wireless standards, …”) by Kapur et al. US 20190074012 A1 Kapur does not explicitly teach text protocol. Pengelley teaches: (“(33) Additionally, the virtual assistance service 232 may generate the service request 236 according to an API associated with the voice endpoint server 204(1). In some embodiments, the service request 236 may identify a request type, the content of the request (e.g., the textual information corresponding to the audio capture), an identifier of the service request 236, session information corresponding to the service request 236 (e.g., information identifying whether service request is a part of an ongoing communication session between the voice endpoint device and the chatbot server), and authentication information associated with the user 210(1), voice endpoint device 202(1), and/or the voice endpoint server device 204(1). Once the virtual assistance service 232 generates the service request 236, the virtual assistance service 232 may send the service request 236 to the bridge interface device 206(1) using address information corresponding to the chatbot server 208(1) included in the service information 234.”) (“(42) Additionally, the security module 242 may be responsible for encrypting and decrypting communications between the voice endpoint device 202(1), voice endpoint server device 206(1), and the chatbot server 208(1). For example, the bridge interface device 206(1) may decrypt the request information 226 of the service request 236 based on decryption information and an encryption protocol associated with the voice endpoint server 204(1). Further, the bridge interface device 206(1) may encrypt the decrypted request information 226 based on encryption information and an encryption protocol associated with the chatbot server 208(1). Further, the re-encrypted request information may be included in the bot request 250. As another example, the bridge interface device 206(1) may decrypt the response information 254 included in the bot response 252 based on decryption information and an encryption protocol associated with the chatbot server 208(1). Further, the bridge interface device 206(1) may encrypt the decrypted response information 254 based on encryption information and an encryption protocol associated with the voice endpoint device 202(1) and/or the voice endpoint server 204(1). Further, the re-encrypted response information 230 may be included in the service response 228..” col 8, lines 62-67 and col. 
9 lines 1-4) (“(43) The formatting module 244 may generate the bot request 250 based at least in part on the service request 236. As illustrated in FIG. 2, the bot request 250 may include the formatted request information 256. In some embodiments, the formatting module 244 may determine the formatted request information 256 based on converting the request information 226 from a format associated with the voice endpoint server 204(1) to a format associated with the chatbot server 208(1). Further, the formatting module 244 may convert the request information 226 using a format described in the API information 240.”) (“(60) At 306, the bridge interface device generates a bot agent request based on the query text and a protocol of the bot agent. For example, the formatting module 244 may determine the bot request 250 based on the textual representation of the service request and the API information corresponding to the chatbot server 208(1).” Col. 9, lines 1-15 col. 12, lines 35-40) by Pengelley, US 10885911 B2 Pengelley, is considered to be analogous to the claimed invention because it relates to a bridge interface device operates by receiving query text corresponding to audio information captured at a voice endpoint,. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur to further incorporate the teachings of Pengelley, in order to include communication protocol. One could have been motivated to do so because system may have the bridge interface device, and the bridge interface devices improve the usability and accessibility of the voice endpoint device and chatbot servers .(“(55) … the bridge interface devices 208 present a simpler and more efficient means of implementing an interface between a voice endpoint 202(1) and the chatbot server 208(1). Moreover, the bridge interface devices 206(1)-(N) improve the usability and accessibility of the voice endpoint device 202(1)-(N) and chatbot servers 208(1)-(N) by connecting two independent technologies that offer complimentary services. …col. 11, lines 45-53”) by Pengelley, US 10885911 B2 Regarding Claim 11, The combination teaches the system claim 10 as identified above. Kapur further teaches: 11. The communication system of claim 10, wherein the one or more processors are further configured to use a machine learning model and the signal indicative of the speech muscle activation patterns of the first user when the first user is speaking silently as input to the machine learning model to generate the audio of the speech data representing the speech of the first user. Kapur teaches (“[0054] FIG. 3 is a conceptual diagram that shows attenuation of voltage. In the example shown in FIG. 3, a signal of interest that is characteristic of internal articulation is generated at source 301. For instance, source 301 may be the centroid of a set of neuromuscular junctions in a group of muscles that are neuronally activated during internal articulation. Electrode 302 is attached to the user's skin, at a distance r from source 301. 
…”) (“[0149] … (1) to control the operation of, or interface with, hardware components of an SSI device, including any electrode, ADC, earphone, bone conduction transducer, or wireless transceiver; (2) to concatenate measurements; (3) to extract a signal of interest from noisy real time data, including by thresholding, feature fusion and performing detection and classification with one or more neural networks (e.g., CNNs); (4) to perform natural language processing; (5) to detect content of internally articulated speech, based on electrode measurements; (6) to calculate a response to internally articulated input; (7) to output instructions to control audio feedback to a user; (8) to output instructions to control another device, such as a luminaire, television or home appliance; (9) to detect content of internally articulated input and, in response to the input, to send a message to another device (e.g., to send a message to another person by sending the message to a device associated with the other person); (10) to receive data from, control, or interface with one or more sensors; (11) to perform any other calculation, computation, program, algorithm, or computer function described or implied herein; (12) to receive signals indicative of human input; …”) by Kapur et al. US 20190074012 A1 Claims 13 are rejected under 35 U.S.C. 103 as being unpatentable over Kapur in view of Chaudhri et al. US 20120293438 A1 Regarding Claim 13, Kapur teaches the system claim 1 as identified above. Kapur does not explicitly teach wherein the one or more processors are configured to accept a call from the second user before receiving the speech data representing the speech of the second user from the communication device associated with the second user on the communication network using the communication interface wherein accepting is performed in response to receiving a gesture or an utterance from the first user. Chaudhri teaches : 13. The communication system of claim 1, wherein the one or more processors are configured to accept a call from the second user before receiving the speech data representing the speech of the second user from the communication device associated with the second user on the communication network using the communication interface, Chaudhri teaches (“[0092] FIGS. 7A-7D illustrate the GUI display of a device that is transitioning the optical intensity of user-interface objects concurrent with a transition from a first user interface state to a second user interface state, according to some embodiments of the invention. In FIG. 7A, the device 700 is locked and has received an incoming call. The device 700 is displaying a prompt 706 to the user, informing the user of the incoming call, on the touch screen 714. The device is also displaying the unlock image 702 and channel 704 so that the user can unlock the device 700 in order to accept or decline the incoming call. The user begins the unlock action by making contact on the touch screen with her finger 710 on the unlock image 702.”) by Chaudhri et al. US 20120293438 A1 wherein accepting is performed in response to receiving a gesture or an utterance from the first user. FIG. 7B-7D Chaudhri teaches (“[0093] … The virtual buttons 708 are associated with the prompt 706; the virtual buttons shown in FIG. 7B-7D allow the user to decline or accept the incoming call. …”) (“[0094] … At this point the user may interact with the virtual buttons 708 and accept or decline the incoming call.”) by Chaudhri et al. 
US 20120293438 A1 Chaudhri is considered to be analogous to the claimed invention because it relates to relate generally to user interfaces that employ touch-sensitive displays, and more particularly, to the unlocking of user interfaces on portable electronic devices. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur to incorporate the teachings of Chaudhri in order to include graphical user interface to communicate between user. One could have been motivated to do so because system can have user-friendly communication device. (“ [0007] Accordingly, there is a need for more efficient, user-friendly procedures for unlocking such devices, touch screens, and/or applications. More generally, there is a need for more efficient, user-friendly procedures for transitioning such devices, touch screens, and/or applications between user interface states (e.g., from a user interface state for a first application to a user interface state for a second application, between user interface states in the same application, or between locked and unlocked states). by Chaudhri et al. US 20120293438 A1 Claims 14 are rejected under 35 U.S.C. 103 as being unpatentable over Kapur and Chaudhri in view of Campbell, US 20240233728 A1 Regarding Claim 14, Kapur teaches the system claim 13 as identified above. The combination does not explicitly teach second user call: 14. The communication system of claim 13, wherein the one or more processors are configured to receive data from the communication network indicating that the call from the second user is a silent call. Campbell teaches ( [0028] In response to detecting the user pose, method 200 mutes the internet call to receive a voice command, at 202. For example, a user may be on a conference call when they decide to initiate the voice commands. However, the user may not want to interrupt the call or allow the other users on the conference call to hear that the user is initiating the voice commands. The user may also not want to have to click on the screen or keyboard to activate the voice command feature. Therefore, the user can use the gesture to initiate the voice command and in response, the conference call may be muted. The user would then be able to provide the voice commands to navigate or control an application.”) by Campbell ,US 20240233728 A1 Campbel is considered to be analogous to the claimed invention because it relates initiating a voice command based on a gesture. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur and Chaudhri to further incorporate the teachings of Campbel in order to include call interruption. One could have been motivated to do so because system can recognized with varying degrees of accuracy with any of the above referenced gesture input devices.(“ [0019] … any other hand-held device capable of detecting a motion of the user. It should be noted that gestures can be recognized with varying degrees of accuracy with any of the above referenced gesture input devices. In some examples, maintain the initiating gesture data and the terminating gesture data in a cloud-based data repository to be ingested by a machine learning computing system.”) by Campbell ,US 20240233728 A1 Claim 16 are rejected under 35 U.S.C. 103 as being unpatentable over Kapur et al. US 20190074012 A1 in view of Alava et al. US 12020677 B1. 
Regarding Claim 16, Kapur teaches the system claim 15 as identified above. Kapur does not explicitly teach changing one or more attributes of a voice of the speech data. Alava teaches: 16. The communication system of claim 15, wherein the one or more processors are further configured to change one or more attributes of a voice of the speech data. (“(27) … The modified audio data may change a tone, pitch, volume, and/or other aspects of the voice of the user …” col. 9, lines 35-45) by Alava et al. US 12020677 B1 Alava is considered to be analogous to the claimed invention because it relates to an audio modification system that includes one or more processors configured to receive audio data indicative of communication of a user. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Kapur to further incorporate the teachings of Alava in order to include changing one or more attributes of a voice of the speech data. One could have been motivated to do so because the system can have efficient protection of the user's communication in real-time. (“(43) … The audio modification system 102 may automatically and/or iteratively perform the process, thereby enabling efficient protection of the user's communication in real-time (e.g., within seconds of detecting the user's voice and/or the confidential nature of the user's communication). …” col. 14, lines 7-11) by Alava et al. US 12020677 B1
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FOUZIA HYE SOLAIMAN whose telephone number is (571) 270-5656. The examiner can normally be reached M-F, 8 AM-5 PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Paras D. Shah, can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/F.H.S./ Examiner, Art Unit 2653
/Paras D Shah/ Supervisory Patent Examiner, Art Unit 2653
02/12/2026
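
Illustrative Signal-Processing Sketches

The passages the examiner relies on above describe a fairly concrete processing chain, and a few short sketches may make the cited techniques easier to evaluate against the claims. These are illustrations only: function names, parameter defaults, and library choices are editorial assumptions, not disclosures from Kapur, Janke, or the application. First, Kapur [0123] (quoted twice above) removes single spikes more than 30 μV above baseline whose amplitude also exceeds the average of the nearest 4 points before and after, using a running window average. A minimal NumPy reading of that step, assuming the flagged spike is replaced by the local neighbour mean rather than dropped:

    import numpy as np

    def suppress_spikes(emg_uV, baseline_uV=0.0, spike_threshold_uV=30.0, k=4):
        # Flag samples that sit >30 uV above baseline AND above the mean of the
        # nearest k points on each side (Kapur [0123]); replace them with that mean.
        x = np.asarray(emg_uV, dtype=float).copy()
        for i in range(k, len(x) - k):
            neighbours = np.concatenate([x[i - k:i], x[i + 1:i + 1 + k]])
            local_mean = neighbours.mean()
            if (x[i] - baseline_uV) > spike_threshold_uV and x[i] > local_mean:
                x[i] = local_mean
        return x

    # Example: a clean 10 Hz trace with one injected 80 uV spike (assumed 1 kHz sampling).
    fs = 1000.0
    t = np.arange(0, 1.0, 1.0 / fs)
    trace = 5.0 * np.sin(2 * np.pi * 10 * t)
    trace[500] += 80.0
    cleaned = suppress_spikes(trace)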
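Kapur [0123] (and the MFCC targets in Janke) describe a mel-cepstral front end: the stream is framed into 0.025 s windows with a 0.01 s step, a periodogram power-spectrum estimate is computed per frame, a mel filterbank is applied, and a DCT is taken of the log filterbank energies. A plain NumPy/SciPy sketch of that recipe follows; the FFT length (256), filter count (20), and the 1 kHz sampling rate in the example are illustrative values not taken from either reference:

    import numpy as np
    from scipy.fft import dct

    def frame_signal(x, fs, win_s=0.025, hop_s=0.010):
        # Split the stream into 25 ms windows advancing by 10 ms (Kapur [0123]).
        win, hop = int(round(win_s * fs)), int(round(hop_s * fs))
        n_frames = 1 + (len(x) - win) // hop
        idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
        return x[idx] * np.hamming(win)

    def mel_filterbank(n_filt, n_fft, fs):
        # Triangular filters spaced evenly on the mel scale (standard construction).
        hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
        mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
        hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filt + 2))
        bins = np.floor((n_fft + 1) * hz_pts / fs).astype(int)
        fb = np.zeros((n_filt, n_fft // 2 + 1))
        for i in range(1, n_filt + 1):
            lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
            fb[i - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
            fb[i - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
        return fb

    def cepstral_features(x, fs, n_fft=256, n_filt=20, n_ceps=13):
        frames = frame_signal(np.asarray(x, dtype=float), fs)
        power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft      # periodogram estimate
        log_mel = np.log(power @ mel_filterbank(n_filt, n_fft, fs).T + 1e-10)
        return dct(log_mel, type=2, axis=1, norm='ortho')[:, :n_ceps]  # DCT of log mel energies

    # Example: features for one 4 s window of a synthetic signal sampled at 1 kHz.
    fs = 1000.0
    feats = cepstral_features(np.random.randn(int(4 * fs)), fs)        # shape: (n_frames, 13)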
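Kapur's paragraphs [0097] and [0100], also quoted above, define per-window features: mean power spectral moments SMn = Σ_{j=1}^{M} P_j·f_j^n and a slope sign change (SSC) count. Both reduce to a few lines of NumPy; the simple sign-change convention used here (no noise threshold) is an assumed simplification:

    import numpy as np

    def spectral_moments(power, freqs, orders=(1, 2, 3)):
        # SM_n = sum over the M frequency bins of P_j * f_j**n (Kapur [0097]).
        power, freqs = np.asarray(power, dtype=float), np.asarray(freqs, dtype=float)
        return [float(np.sum(power * freqs ** n)) for n in orders]

    def slope_sign_changes(window):
        # Number of times the slope of the signal changes sign in the window (Kapur [0100]).
        slope = np.diff(np.asarray(window, dtype=float))
        return int(np.sum(slope[:-1] * slope[1:] < 0))

    # Example on one frame's power spectrum and its frequency axis.
    fs, n_fft = 1000.0, 256
    frame = np.random.randn(n_fft)
    power = np.abs(np.fft.rfft(frame)) ** 2 / n_fft
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    sm1, sm2, sm3 = spectral_moments(power, freqs)
    ssc = slope_sign_changes(frame)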
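The Janke mapping stage, as characterised in the rejections of claims 3, 5, and 8, regresses frame-level EMG features onto MFCC/F0 targets with a feed-forward DNN (or LSTM) before MLSA-filter vocoding. A toy PyTorch sketch of just the regression stage is below; the layer widths, feature dimensions, and single training step are placeholders, and the vocoding step is omitted:

    import torch
    import torch.nn as nn

    class EmgToMfccNet(nn.Module):
        # Frame-wise regression from stacked EMG features to MFCC + F0 targets.
        def __init__(self, in_dim=160, hidden_dim=512, out_dim=26):
            super().__init__()
            self.layers = nn.Sequential(
                nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, out_dim),
            )

        def forward(self, x):
            return self.layers(x)

    model = EmgToMfccNet()
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.MSELoss()

    # One optimisation step on a toy batch of parallel (EMG feature, acoustic target) frames.
    emg_feats = torch.randn(32, 160)        # e.g. features from several context frames, stacked
    acoustic_targets = torch.randn(32, 26)  # e.g. 25 mel-cepstral coefficients + log F0
    optimiser.zero_grad()
    loss = criterion(model(emg_feats), acoustic_targets)
    loss.backward()
    optimiser.step()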
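For the Unit Selection variant cited against claims 4 and 9, Janke selects, for each test EMG unit, a codebook unit that minimises a weighted sum of a target cost (test EMG segment vs. codebook EMG segment) and a concatenation cost (between adjacent codebook audio segments). A small Viterbi-style search over such costs is sketched below; cosine distance and the weights are stand-ins for the cost functions Janke actually evaluated:

    import numpy as np

    def cosine_dist(a, b):
        return 1.0 - float(np.dot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def select_units(test_emg, codebook_emg, codebook_audio, w_target=1.0, w_concat=0.5):
        # Dynamic-programming search for the codebook unit sequence with minimal total cost.
        T, K = len(test_emg), len(codebook_emg)
        target = np.array([[w_target * cosine_dist(t_vec, c_vec) for c_vec in codebook_emg]
                           for t_vec in test_emg])                    # (T, K) target costs
        concat = np.array([[w_concat * cosine_dist(a_i, a_j) for a_j in codebook_audio]
                           for a_i in codebook_audio])                # (K, K) concatenation costs
        best = target[0].copy()
        back = np.zeros((T, K), dtype=int)
        for t in range(1, T):
            total = best[:, None] + concat + target[t][None, :]       # prev unit -> next unit
            back[t] = np.argmin(total, axis=0)
            best = total[back[t], np.arange(K)]
        path = [int(np.argmin(best))]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]   # codebook indices whose audio segments would be concatenated

    # Example with a random codebook of 50 units and a test utterance of 12 units.
    rng = np.random.default_rng(0)
    codebook_emg, codebook_audio = rng.normal(size=(50, 32)), rng.normal(size=(50, 24))
    test_emg = rng.normal(size=(12, 32))
    unit_sequence = select_units(test_emg, codebook_emg, codebook_audio)

Keeping the concatenation cost between consecutive selected units inside a dynamic-programming search, rather than picking each unit greedily, is what lets the selected audio segments join smoothly.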

Prosecution Timeline

Dec 01, 2023: Application Filed
Feb 12, 2026: Non-Final Rejection — §102, §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592217
SYSTEM AND METHOD FOR SPEECH PROCESSING
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12579976
USER TERMINAL, DIALOGUE MANAGEMENT SYSTEM, CONTROL METHOD OF USER TERMINAL, AND DIALOGUE MANAGEMENT METHOD
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12555563
SYSTEMS AND METHODS FOR CHARACTER-TO-PHONE CONVERSION
Granted Feb 17, 2026 (2y 5m to grant)
Patent 12542149
METHOD AND APPARATUS FOR IMPROVING SPEECH INTELLIGIBILITY IN A ROOM
Granted Feb 03, 2026 (2y 5m to grant)
Patent 12537017
COMPUTERIZED SCORING METHOD OF FEATURE EXTRACTION-BASED FOR COVERTNESS OF IMITATED MARINE MAMMAL SOUND SIGNAL
Granted Jan 27, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 67%
Grant Probability with Interview (+55.5%): 99%
Median Time to Grant: 3y 0m
PTA Risk: Low
Based on 63 resolved cases by this examiner. Grant probability derived from career allow rate.
