Prosecution Insights
Last updated: April 19, 2026
Application No. 18/645,780

SPEECH SIGNAL DETECTION DEVICE

Non-Final OA: §102, §103
Filed: Apr 25, 2024
Examiner: VILLENA, MARK
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: Snap Inc.
OA Round: 1 (Non-Final)
Grant Probability: 70% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 10m
With Interview: 85%

Examiner Intelligence

Grants 70% of resolved cases — above average.

Career Allow Rate: 70% (334 granted / 478 resolved; +7.9% vs TC avg)
Interview Lift: +15.5% on resolved cases with interview (strong)
Avg Prosecution: 3y 10m typical timeline (22 applications currently pending)
Total Applications: 500 across all art units
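The allow-rate and interview figures above are simple arithmetic on the cited counts. A minimal sketch, using the counts from this dashboard; the rounding to whole displayed percentages is an assumption about how the tool formats them:

```python
# Career allow rate and interview-adjusted estimate, from the counts above.
granted = 334        # applications granted by this examiner
resolved = 478       # total resolved cases

allow_rate_pct = granted / resolved * 100
print(f"Career allow rate: {allow_rate_pct:.1f}%")   # 69.9%, displayed as 70%

interview_lift_pct = 15.5   # point lift on resolved cases with interview
print(f"With interview: {allow_rate_pct + interview_lift_pct:.0f}%")  # 85%
```

This reproduces the 70% headline and the 85% "With Interview" figure shown in the projections.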

Statute-Specific Performance

§101: 13.7% (-26.3% vs TC avg)
§103: 51.5% (+11.5% vs TC avg)
§102: 20.4% (-19.6% vs TC avg)
§112: 5.0% (-35.0% vs TC avg)
Tech Center averages are estimates • Based on career data from 478 resolved cases
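The "vs TC avg" deltas are each per-statute rate minus the Tech Center baseline. All four deltas shown are consistent with a single estimated baseline of 40.0%; that baseline is inferred here from the deltas themselves, not stated by the tool:

```python
# Re-deriving the "vs TC avg" deltas from the panel above.
# The 40.0% Tech Center baseline is an inference, not a published figure.
tc_avg = 40.0
rates = {"§101": 13.7, "§103": 51.5, "§102": 20.4, "§112": 5.0}

for statute, rate in rates.items():
    delta = rate - tc_avg
    print(f"{statute}: {rate:.1f}% ({delta:+.1f}% vs TC avg)")
```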

Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 06/27/2024, 07/14/2025, 10/28/2025, and 02/07/2026 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Drawings

The drawings were submitted on 04/25/2024. These drawings are reviewed and accepted by the examiner.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-3, 5-6, 8-17, and 19-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Garg et al. (US 20240221762 A1).

Regarding claims 1, 19, and 20, Garg (‘762) teaches: “collecting, by a speech signal detection device worn by a user, a combination of signals comprising electromyograph (EMG) data signals and one or more non-EMG data signals” (par. 0074; ‘For example, FIG. 3 is a scheme diagram of a speech model configured to decode speech to predict text or encoded features using EMG signals, in accordance with some embodiments of the technology described herein. In some embodiments, the speech model 302 may be trained and installed in a speech input device (e.g., 900A in FIG. 9A, 900B in FIG. 9B, 1000 in FIG. 10A).’; par.
0087; ‘The voiced speech muscle signals and auxiliary measurements may be used as inputs to model 802 to determine a predicted speech label 807A.’) “processing the combination of signals by a machine learning (ML) model to detect inner speech of the user, the ML model trained to establish a relationship between training signals comprising training EMG data signals and training non-EMG data signals and ground-truth inner speech data” (par. 0048; ‘Ground truth measurements can optionally be used to validate, correct, and/or otherwise adjust another speech label.’; par. 0088; ‘Silent speech EMG 805-1 may be used as input to model 802 to determine predicted speech label 807B.’; par. 0096; ‘In some examples, the digital signal processor 905 may include one or more layers of a neural network and/or a machine learning model maintained by the speech input device to generate digital signal vector(s).’); and “in response to detecting inner speech of the user by the ML model, performing one or more operations associated with the speech signal detection device” (par. 0092; ‘The device activation logic 902 may recognize this word or phrase and in response will perform one or more actions.’).

Regarding claim 2 (dep. on claim 1), Garg further teaches: “wherein the ML model is implemented by an individual device external to the speech signal detection device worn by the user” (par. 0074; ‘Alternatively, the speech model 302 may be installed in an external device (e.g., 950 in FIG. 9A).’).

Regarding claim 3 (dep. on claim 2), Garg further teaches: “converting the combination of signals into a digital signature” (par. 0090; ‘FIG. 9A depicts a scheme diagram of an example speech input device 900A capable of communicating with an external speech model 950, according to some embodiments.’ The speech model is interpreted as a digital signature.); and “transmitting wirelessly the digital signature from the speech signal detection device worn by the user to the individual device” (par. 0090; ‘FIG. 9A depicts a scheme diagram of an example speech input device 900A capable of communicating with an external speech model 950, according to some embodiments.’).

Regarding claim 5 (dep. on claim 1), Garg further teaches: “wherein the one or more operations comprise activating a function of an interaction application in response to detecting the inner speech, wherein the function comprises capturing an image or video by a camera of a user system coupled to the speech signal detection device” (par. 0043; ‘The measurement systems can include: electrophysiology measurement systems (e.g., to collect EMG signals, EEG signals, EOG signals, ECG signals, EKG signals, etc.), other biometric measurement systems, motion sensor (e.g., IMU), microphone, optical sensors that detect the movement of the skin (e.g., infrared cameras with a dot matrix projector), video cameras (e.g., to capture images, videos, motion capture data, etc.), sensors that can detect blood flow (e.g., PPG, fNIRS), thermal cameras, ToF sensors, and/or any other measurement systems.’; par. 0092; ‘The device activation logic 902 may recognize this word or phrase and in response will perform one or more actions.’).

Regarding claim 6 (dep. on claim 1), Garg further teaches: “wherein the ML model comprises an Extreme Gradient Boosting (XGB) model or a multiple layer neural network architecture” (par. 0096; ‘In some examples, the digital signal processor 905 may include one or more layers of a neural network and/or a machine learning model maintained by the speech input device to generate digital signal vector(s).’).

Regarding claim 8 (dep. on claim 1), Garg further teaches: “wherein the non-EMG data signals represent movement of certain muscles in a face and neck region of the user, physical movements associated with inner speech, and muscle twitches” (par. 0090; ‘In some embodiments, the speech input device 900A may include one or more sensors 911, which record signals indicating a user's speech muscle activation patterns associated with the user speaking (e.g., in a silent, voiced, or whispered speech). In non-limiting examples, the one or more sensors 911 may include one or more EMG electrodes 911A, a microphone 911B, an accelerometer 911C and/or other suitable sensors 911D.’).

Regarding claim 9 (dep. on claim 1), Garg further teaches: “wherein the non-EMG data signals comprise at least one of inertial measurement unit (IMU) movement or audio data” (par. 0090; ‘In some embodiments, the speech input device 900A may include one or more sensors 911, which record signals indicating a user's speech muscle activation patterns associated with the user speaking (e.g., in a silent, voiced, or whispered speech). In non-limiting examples, the one or more sensors 911 may include one or more EMG electrodes 911A, a microphone 911B, an accelerometer 911C and/or other suitable sensors 911D.’).

Regarding claim 10 (dep. on claim 1), Garg further teaches: “wherein the non-EMG data signals are received from at least one of an array of biopotential sensors, motion sensors, sound sensors, or photonic sensors that are independent of the EMG data signals” (par. 0090; ‘In some embodiments, the speech input device 900A may include one or more sensors 911, which record signals indicating a user's speech muscle activation patterns associated with the user speaking (e.g., in a silent, voiced, or whispered speech). In non-limiting examples, the one or more sensors 911 may include one or more EMG electrodes 911A, a microphone 911B, an accelerometer 911C and/or other suitable sensors 911D.’).

Regarding claim 11 (dep. on claim 10), Garg further teaches: “wherein the photonic sensors are configured to perform operations comprising: emitting light at different wavelengths onto a throat region of the user; and measuring the light reflected from the throat region to identify muscle movements in the throat” (par. 0103; ‘For example, the sensors 1005 may include photoplethysogram (PPG) sensors, photodiodes, optical sensors, laser doppler imaging, mechanomyography sensors, sonomyography sensors, ultrasound sensors, infrared sensors, functional near-infrared spectroscopy (fNIRS) sensors, capacitive sensors, electroglottography sensors, electroencephalogram (EEG) sensors, and magnetoencephalography (MEG) sensors, or any other suitable sensors.’).

Regarding claim 12 (dep. on claim 10), Garg further teaches: “wherein the biopotential sensors comprise a pure silver dry electrode array or dry monopolar bio-potential electrode array” (par. 0102; ‘The EMG electrodes 1004 may be configured as an electrode array or multiple electrode arrays supported by the sensor arm 1002 of the wearable device 1000. Although the EMG electrodes 1004 are shown to be positioned at a distal end of the sensor arm 1002, in other embodiments, the EMG electrodes 1004 may be dispersed over the sensor arm. The one or more electrode arrays may have any suitable shapes e.g., a circular, a square, a rectangular, or any other suitable shape.’).

Regarding claim 13 (dep. on claim 1), Garg further teaches: “coupling a first EMG data signal of the EMG data signals to a first negative input of a first instrumentation amplifier in a set of instrumentation amplifiers” (par. 0104; ‘In some examples, the reference electrode 1003 may be used in conjunction with the electrodes 1004 supported by the sensor arm 1002 as inputs to a differential amplifier.’); “coupling a second EMG data signal of the EMG data signals to a second negative input of a second instrumentation amplifier in the set of instrumentation amplifiers” (par. 0104; ‘With further reference to FIG. 10A, in some embodiments, the one or more sensors 1005 may include a reference electrode 1003.’); “coupling a first non-EMG data signal of the EMG data signals to a third negative input of a third instrumentation amplifier in the set of instrumentation amplifiers” (par. 0104; ‘In some examples, the reference electrode 1003 may be used in conjunction with the electrodes 1004 supported by the sensor arm 1002 as inputs to a differential amplifier.’); “coupling positive inputs of the first instrumentation amplifier, the second instrumentation amplifier, and the third instrumentation amplifier together” (par. 0104; ‘In some examples, the reference electrode 1003 may be used in conjunction with the electrodes 1004 supported by the sensor arm 1002 as inputs to a differential amplifier.’); and “coupling the positive inputs to a ground electrode that is attached to skin of the user” (par. 0104; ‘The reference voltage supplied to the face of the user by the reference electrode 1003 will be recorded by the electrodes 1004 supported by the sensor arm 1002, in addition to the voltage generated by muscles in the face of the user.’).

Regarding claim 14 (dep. on claim 13), Garg further teaches: “measuring a potential of each electrode associated with the EMG data signals and non-EMG data signals with reference to the ground electrode” (par. 0104; ‘The reference electrode 1003 may provide a first potential or voltage to the user.’).

Regarding claim 15 (dep. on claim 14), Garg further teaches: “computing a first plurality of differences between each signal in the combination of signals and every other signal in the combination of signals” (par. 0104; ‘In some examples, the reference electrode 1003 may be used in conjunction with the electrodes 1004 supported by the sensor arm 1002 as inputs to a differential amplifier.’); and “computing an additional difference between the combination of signals and a common average of the combination of signals” (par. 0053; ‘… comparison methods (e.g., matching, distance metrics, thresholds, etc.),’).

Regarding claim 16 (dep. on claim 15), Garg further teaches: “generating a digital signature of the combination of signals based on the first plurality of differences and the additional difference, the digital signature being processed by the ML model to detect the inner speech” (par. 0053; ‘The model (e.g., silent speech decoding model, silent speech recognition model, etc.) can be or use one or more of:…’).

Regarding claim 17 (dep. on claim 1), Garg further teaches: “training the ML model by performing training operations comprising: obtaining a batch of the training signals” (par. 0054; ‘Inputs to the model can include: training data (in the source domain and/or the target domain), auxiliary measurements, and/or any other suitable outputs.’); “generating a digital representation of the batch of the training signals” (par. 0036; ‘Step 102 can be performed for an individual training subject, for a set of training subjects, for one or more collection domains, and/or otherwise performed.’); “processing the digital representation by the ML model to estimate inner speech” (par. 0090; ‘In some embodiments, the speech input device 900A may include one or more sensors 911, which record signals indicating a user's speech muscle activation patterns associated with the user speaking (e.g., in a silent, voiced, or whispered speech).’); “obtaining the ground-truth inner speech data associated with the batch of the training signals” (par. 0048; ‘In a first variant, the speech label includes a prompt (e.g., text from a prompt).
In a second variant, the speech label includes text and/or audio determined based on ground truth measurements.’); “computing a deviation between the estimated inner speech and the ground-truth inner speech data” (par. 0048; ‘Ground truth measurements can optionally be used to validate, correct, and/or otherwise adjust another speech label. For example, a speech label including prompt text can be corrected based on ground truth measurements. An example is shown in FIG. 2.’) and “updating one or more parameters of the ML model based on the deviation” (par. 0041; ‘continual model training/calibrating (e.g., outside of data collection centers; while all or parts of the system are not in active use for silent speech decoding and/or for controlling a device based on decoded silent speech; etc.),’).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 4 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Garg in view of Kapur et al. (US 20190074012 A1).

Regarding claim 4 (dep. on claim 1), Garg teaches: “wherein the combination of signals comprises a sequence of activation of muscles in a face and neck area of the user over time” (par. 0071; ‘In some embodiments, in generating the labeled training data, speech labels associated with training data may be determined using ground truth measurements sampled concurrently with the training data, where the training data (e.g., EMG signals indicating a user's speech muscle activation patterns) and the speech labels may be generated from the same or different domains.’).

However, Garg does not expressly teach: “wherein the one or more operations comprise controlling an extended reality (XR) experience based on the detected inner speech, and wherein the combination of signals comprises a sequence of activation of muscles in a face and neck area of the user over time.” (emphasis added)

Kapur teaches: “wherein the one or more operations comprise controlling an extended reality (XR) experience based on the detected inner speech, and wherein the combination of signals comprises a sequence of activation of muscles in a face and neck area of the user over time” (par. 0008; ‘The SSI system may do so by measuring low-voltage electrical signals at electrodes positioned on a user's skin, on the user's face or neck.’; par. 0137; ‘The interface may also be used as a silent input to Virtual Reality/Augmented Reality applications.’).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Garg’s silent speech decoding and EMG signals by incorporating Kapur’s Virtual Reality/Augmented Reality applications in order to control an extended reality (XR) experience based on the detected inner speech. The combination provides an open-loop operation in which the silent speech interface may be employed as an input modality to control devices or to initiate or request services. (Kapur: par. 0136)

Regarding claim 18 (dep. on claim 1), the combination of Garg in view of Kapur further teaches: “wherein the speech signal detection device comprises an augmented reality (AR) headset that is attached to an EMG communication device, the EMG communication device being positioned adjacent to and underneath a neck region of the user, and the EMG communication device comprising a plurality of electrodes configured to collect the combination of signals” (Kapur: par. 0008; ‘The SSI system may do so by measuring low-voltage electrical signals at electrodes positioned on a user's skin, on the user's face or neck.’; par. 0137; ‘The interface may also be used as a silent input to Virtual Reality/Augmented Reality applications.’).

Claim(s) 7 is rejected under 35 U.S.C. 103 as being unpatentable over Garg in view of Garg et al. (US 20240221738 A1), hereinafter referred to as Garg 2.

Regarding claim 7 (dep. on claim 1), Garg does not expressly teach: “the one or more operations comprise sending one or more queries to a large language model (LLM) or chatbot.” Garg 2 teaches: “the one or more operations comprise sending one or more queries to a large language model (LLM) or chatbot” (par. 0031; ‘Described herein are various techniques, including systems, computerized methods, and non-transitory instructions, that are configured to use silent speech or whisper to query a large language model (LLM). A LLM may be a system trained on large amount of data to receive a prompt and iteratively predict the next character in a string of characters in response to the prompt.’).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Garg’s silent speech decoding by incorporating the large language models taught by Garg 2 in order to iteratively predict the next character in a string of characters in response to the prompt.

Conclusion

Other pertinent prior art are cited in the PTO-892 for the applicant's consideration.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK VILLENA whose telephone number is (571)270-3191. The examiner can normally be reached 10 am - 6pm EST Monday through Friday.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARK VILLENA/
Examiner, Art Unit 2658

Prosecution Timeline

Apr 25, 2024
Application Filed
Feb 21, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591407
ROBUST VOICE ACTIVITY DETECTOR SYSTEM FOR USE WITH AN EARPHONE
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12592232
SYSTEMS, METHODS, AND APPARATUSES FOR DETECTING AI MASKING USING PERSISTENT RESPONSE TESTING IN AN ELECTRONIC ENVIRONMENT
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12586581
ELECTRONIC DEVICE CONTROL METHOD AND APPARATUS
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12578922
Natural Language Processing Platform For Automated Event Analysis, Translation, and Transcription Verification
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12573394
ESTIMATION METHOD, RECORDING MEDIUM, AND ESTIMATION DEVICE
Granted Mar 10, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 70%
With Interview: 85% (+15.5%)
Median Time to Grant: 3y 10m
PTA Risk: Low
Based on 478 resolved cases by this examiner. Grant probability derived from career allow rate.
