Last updated: May 29, 2026

Application No. 18/718,080

Face Mask for Capturing Speech Produced by a Wearer

Non-Final OA §102 §103

Filed

Jun 10, 2024

Priority

Dec 30, 2021 — GR 20210100925 +1 more

Examiner

LERNER, MARTIN

Art Unit

2658

Tech Center

2600 — Communications

Assignee

Telefonaktiebolaget Lm Ericsson (Publ)

OA Round

1 (Non-Final)

Interview Optional

Interview lift (+13.3%) is below the 15.0% threshold. A written response is recommended.

Based on 988 resolved cases, 2023–2026

Examiner Intelligence

LERNER, MARTIN

Grants 78% — above average

Career Allowance Rate

771 granted / 988 resolved

+16.0% vs TC avg

Moderate +13% lift

+13.3%

Interview Lift

resolved cases with interview

Typical timeline

2y 11m

Avg Prosecution

22 currently pending

Career history

1008

Total Applications

across all art units

Statute-Specific Performance

§101: 10.0% (-30.0% vs TC avg)

§103: 74.1% (+34.1% vs TC avg)

§102: 3.5% (-36.5% vs TC avg)

§112: 8.7% (-31.3% vs TC avg)

The Tech Center average is an estimate. Based on career data from 988 resolved cases.

Office Action

§102 §103

DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings are objected to because it appears that an arrow representing training of Mask 2 in Step 5 should extend to the dotted line representing Mask 2 in Figure 6.  That is, a rightward arrow representing Step 5 should extend the same way in the opposite direction as an arrow representing Step 7 because Steps 5 to 7 are operating on Mask 2.  See Specification, ¶[0056].
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office Action to avoid abandonment of the application.  Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended.  The figure or figure number of an amended drawing should not be labeled as “amended.”  If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency.  Additional replacement sheets may be necessary to show the renumbering of the remaining figures.  Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d).  If the changes are not accepted by the examiner, Applicants will be notified and informed of any required corrective action in the next Office Action.  The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities:
In ¶[0058], “In some cases, machine learning model” should be “In some cases, a machine learning model”.
In ¶[0059], “and their features into vector space” should be “and their features into a vector space”.
In ¶[0071], “a network interface 1414” should be “and a network interface 1414”.     
Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 17 to 24 and 26 to 36 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Witchey et al. (U.S. Patent No. 11,160,319).
Regarding independent claims 17, 28, and 36, Witchey et al. discloses a smart mask, method, and computer readable medium for communication based on facial movement, comprising:

“a plurality of sensors adapted to capture changes in shape of a part of a face of the wearer while producing speech” – a smart mask includes a first sensor configured to detect movement of the mouth of the person (“capture changes in shape of a part of a face of the wearer”) and generate a signal indicative of the movement of the mouth (Abstract; column 1, lines 39 to 41); the smart article includes a filter including sensors; the sensors are configured to generate signals indicative of facial movements of the person including the movement of the mouth (“a plurality of sensors adapted to capture changes in shape of a part of a face of the wearer”) (column 3, lines 35 to 38); several small sensors may be included to sense movement of the mask while a person is talking (“while producing speech”) (column 7, lines 8 to 10: Figure 2); a stretch sensor 502 may be used to detect movement of a person’s jaw as the person speaks (“changes in shape of a part of a face of the wearer while producing speech”) (column 10, lines 50 to 52: Figures 5A to 5D);
“processing circuitry configured to: receive data from the plurality of sensors, the data representing the changes in shape of the part of the face of the wearer” – a control module is configured to receive the signals (“processing circuitry configured to: receive data”) (Abstract); a smart mask further includes a second sensor configured to generate a signal indicative of movement of a second portion of the mouth (“the data representing the changes in shape of the part of the face of the wearer”); a control module is configured to display the images on the display based on the signal indicative of the movement of the second portion of the mouth (column 1, lines 56 to 59); a control module is configured to transmit a sensor signal from the first article indicating the at least one movement of the mouth of the person (column 2, lines 44 to 47); mobile device control module 302 executes a smart article application 310 for mapping collected data to facial movement images (column 7, lines 25 to 27: Figure 3); article control module 404 may be suitable for wearable implementations and be based on wearable technologies (column 8, lines 18 to 29: Figure 4);
“and classify the data received from the plurality of sensors into one or more units of speech using a machine learning model” – mapping module 311 may be referred to as a classifier, and include a face movement-to-text mapping module 314 and a sensor output-to-text mapping module 316 (column 7, lines 30 to 35: Figure 3); mapping module 415 may be referred to as a classifier and include a face movement-to-text mapping module 420 and a sensor output-to-text mapping module 422 (“classify the data received from the plurality of sensors into one or more units of speech”) (column 8, lines 41 to 48: Figure 4); a mapping function may operate as a classifier implementing a support vector machine (SVM) algorithm, a k nearest neighbor (kNN) search algorithm, etc. (column 15, lines 43 to 47); a default configuration may be compiled based on AI or machine learning training data sets compiled from many individuals (“using a machine learning model”) (column 17, lines 17 to 19); machine learning techniques could use classifiers, e.g., SVM, kNN, NNs, etc., to convert sensor data to images corresponding to phonemes (column 20, lines 52 to 58); here, text and phonemes are “one or more units of speech”.
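The classification step mapped above can be illustrated with a minimal sketch. It assumes a k-nearest-neighbor classifier (one of the techniques the cited passages name) over four-value sensor vectors. The training vectors, the phoneme labels, and the function name `classify` are hypothetical and are taken from neither the application nor Witchey et al.

```python
from collections import Counter
from math import dist

# Hypothetical training data: (four sensor readings, phoneme label).
# Real systems would learn from recorded sensor traces, not hand-typed values.
TRAINING = [
    ((10, 10, 40, 40), "AA"),
    ((12, 11, 42, 38), "AA"),
    ((30, 30, 10, 10), "UW"),
    ((28, 31, 12, 9), "UW"),
    ((20, 20, 0, 0), "M"),
    ((21, 19, 1, 1), "M"),
]


def classify(sample, training=TRAINING, k=3):
    """Return the majority label among the k training vectors nearest to sample."""
    nearest = sorted(training, key=lambda item: dist(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# A reading close to the two "AA" examples is labeled "AA".
print(classify((11, 10, 41, 39)))
```

The sketch only shows what "classifying sensor data into units of speech" means operationally. It says nothing about how either the claimed mask or the reference actually implements the classifier.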

	Regarding claim 18, Witchey et al. discloses mapping module 311 may be referred to as a classifier, and includes a face movement-to-text mapping module 314 and a sensor output-to-text mapping module 316 (column 7, lines 30 to 35: Figure 3); mapping module 415 may be referred to as a classifier and includes a face movement-to-text mapping module 420 and a sensor output-to-text mapping module 422 (“classify the data received from the plurality of sensors into one or more units of speech”) (column 8, lines 41 to 48: Figure 4); artificial intelligence (AI) training is implemented to make known movements as matched with spoken words and/or known expressions (phonemes, utterances, etc.) of a person’s face at various positions and during movements (column 15, line 60 to column 16, line 22).  Here, a mapping module generates text from utterances and phonemes (“wherein . . . text is generated from units of speech”).
	Regarding claims 19 and 29, Witchey et al. discloses a smart mask 104 and mobile device 106; central control module 110 communicates with smart mask 104 and mobile device 106 via transceiver 112 (column 5, line 63 to column 6, line 10: Figure 1); mobile device 200 with mobile device control unit 302 executes smart article application 310 for mapping collected data to facial movement images (column 7, lines 19 to 27: Figure 3); mobile device control module 302 executes a smart article application 310 for mapping collected data to facial movement images; mapping module 311 may include a face movement-to-text mapping module 314 and sensor output-to-text mapping module 316 (“wherein . . . text is generated by a communication device”) (column 7, lines 19 to 35: Figure 3); transceiver 308 may be used to communicate with any of the devices in the ecosystem 100 (column 8, lines 10 to 12: Figure 3). Generally, Figures 1 to 4 disclose that smart mask 104 and mobile device 106 are separate devices that communicate via transceivers (“a communication device connected to the face mask”) and mobile device 106 generates text by mapping sensor output and facial movements to text (“wherein . . . the text is generated by a communication device”).
	Regarding claims 20 and 29, Witchey et al. discloses an embodiment of a central control station 102, a smart mask 104, and a mobile device 106; central control station 102 may communicate with smart mask 104 and mobile device 106; central control station 102 may include a control module 110 that may collect sensor data and map the data to facial movement, and provide results to any of devices 104, 106, and 108 via transceiver 112 (column 5, line 63 to column 6, line 10: Figure 1); functionality of any given module may be distributed among multiple modules connected via interface circuits, and a server, known as a remote or cloud module (“a cloud computing system”), may accomplish functionality on behalf of a client module (column 28, lines 35 to 41). Here, central control station 102 mapping facial movement to text can be implemented as a server in a cloud (“wherein . . . the text is generated by a cloud computing system”).
	Regarding claims 21 and 30, Witchey et al. discloses an embodiment of central control station 102, a smart mask 104, and mobile device 106; central control module 110 communicates with smart mask 104 and mobile device 106 via transceiver 112; central control station 102 includes central control module 110 that collects sensor data and maps the data to facial movements (column 5, line 63 to column 6, line 16: Figure 1); mapping module 311 may include a face movement-to-text mapping module 314 and sensor output-to-text mapping module 316 (column 7, lines 19 to 35: Figure 3); transceiver 308 may be used to communicate with any of the devices in the ecosystem 100 (column 8, lines 10 to 12: Figure 3).  Here, Figure 1 includes an embodiment of central control station 102 that includes “the processing circuitry” in central control module 110 that performs functionality of mapping module 311 of Figure 3 to generate text as “data representing the units of speech” that are transmitted via transceiver 112 to mobile device 300 having transceiver 308 that is “a communication device associated with a receiver.”  That is, mobile device 106 of Figure 1 is “a communication device associated with a receiver” of Figure 3.  
	Regarding claims 22 and 31, Witchey et al. discloses that known movements are mapped to spoken words and/or known expressions that include phonemes and utterances (“wherein the units of speech comprise one or more of: phonemes . . . utterances”); training data provides for matching sensor data to corresponding spoken words and corresponding images of the person’s face at various positions and/or during movements (column 16, lines 4 to 21); images may be generated corresponding to mouth arrangements for known phonemes in English (column 16, lines 43 to 45). 
	Regarding claims 23 and 32, Witchey et al. discloses a personalized or customized mapping function based on artificial intelligence or machine learning training data sets; a training data set may be compiled on a person-by-person basis to ensure that training data is highly personalized (column 17, lines 10 to 33).  Here, personalized training by machine learning provides “wherein the machine learning is trained for the wearer.”
Regarding claims 24 and 33, Witchey et al. discloses that a default configuration may be compiled from artificial intelligence or machine learning training data sets compiled from many individuals; a training data set may be built from hundreds, thousands, or more users thus representing a default or base configuration; data from a specific user may be captured by a neural network trained on the default data set being refined based on the user’s data set (column 17, lines 10 to 46).  Here, a default or base configuration built from training data of many users provides “machine learning models trained for a plurality of wearers”.  Broadly, combining training data from many users for a default or base configuration is “collating machine learning models”.  Here, “collating” can be defined as ‘collecting and combining in a proper order’.  Compare Applicants’ Specification, ¶[0051] and ¶[0057], which are the only occurrences of ‘collate’. 
Regarding claims 26 and 34, Witchey et al. discloses that a mask may include four sensors that generate values; at any given sensed time, e.g., every millisecond or every 5 ms, a control module of the mask may generate an input vector of four bytes with each byte representing a value of a corresponding sensor (column 15, lines 20 to 26); a mapping function for four landmarks may comprise Rc= right corner of mouth, Lc=left corner of mouth, UL=upper lip portion of mouth, LL=lower lip position of mouth (column 15, lines 43 to 54); as a person speaks, landmarks move in space; a facial feature detection algorithm tracks the movement of the landmarks and calculates the new positions in real-time or near real-time relative to the origin or relative to the landmark’s previous positions (column 19, lines 8 to 13). Here, capturing movement of landmarks in facial features with sensors in real-time or every five milliseconds as a person speaks is “wherein the plurality of sensors is configured to continuously capture the changes in shape of the part of the face of the wearer while producing speech.” That is, changes in position of the corners of the mouth and in lip position of the mouth are “changes in shape of the part of the face of the wearer”.
Regarding claims 27 and 35, Witchey et al. discloses four byte sensor data is converted to a set of positions of the four landmarks coordinates (e.g., (x,y)) for each landmark to a single relative distance value from an origin (“wherein the data representing the changes in shape of the part of the face comprises one or more of: distances between the plurality of sensors; and positions of the plurality of sensors”) (column 15, lines 26 to 33); a mapping function for four landmarks may comprise Rc= right corner of mouth, Lc=left corner of mouth, UL=upper lip portion of mouth, LL=lower lip position of mouth (“wherein the data representing the changes in shape of the part of the face comprises one or more of: . . . positions of the plurality of sensors”) (column 15, lines 43 to 54); an origin point may be found based on landmarks as a center point of lips and/or a fixed distance from the nose; data is a set of coordinate values (X,Y) relative to the origin, a difference in X and Y relative to the origin, or a difference in X and Y relative to previous positions (column 19, lines 3 to 20). 
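The landmark data discussed for claims 26, 27, 34, and 35 can be sketched as follows. The example assumes the four landmarks (Rc, Lc, UL, LL) are already available as (x, y) coordinates; how raw sensor bytes become coordinates is not described in this Office Action and is not modeled. The function names and sample coordinates are hypothetical.

```python
from math import hypot


def lip_center(points):
    """Origin taken as the mean of the landmark positions (an assumed choice)."""
    xs = [p[0] for p in points.values()]
    ys = [p[1] for p in points.values()]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def relative_to(points, origin):
    """Coordinates of each landmark relative to the origin."""
    ox, oy = origin
    return {name: (x - ox, y - oy) for name, (x, y) in points.items()}


def distances(offsets):
    """Single relative distance value from the origin for each landmark."""
    return {name: hypot(dx, dy) for name, (dx, dy) in offsets.items()}


def frame_delta(current, previous):
    """Difference in X and Y relative to the previous positions."""
    return {
        name: (current[name][0] - previous[name][0],
               current[name][1] - previous[name][1])
        for name in current
    }


# Hypothetical frames: lips closed, then mouth open.
closed = {"Rc": (30.0, 0.0), "Lc": (-30.0, 0.0), "UL": (0.0, 5.0), "LL": (0.0, -5.0)}
opened = {"Rc": (26.0, 0.0), "Lc": (-26.0, 0.0), "UL": (0.0, 12.0), "LL": (0.0, -18.0)}

origin = lip_center(closed)
print(distances(relative_to(opened, origin)))
print(frame_delta(opened, closed))
```

The three representations correspond to the alternatives the rejection lists: coordinates relative to an origin, a distance from the origin, and a difference relative to previous positions.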

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Witchey et al. (U.S. Patent No. 11,160,319) in view of Stoppa et al. (U.S. Patent Publication 2022/0092331).
Witchey et al. discloses “a machine learning model is produced for a plurality of wearers”, but does not disclose producing machine learning models for users “in a demographic group through federated averaging.” That is, Witchey et al. discloses producing a machine learning model for a plurality of wearers because a default configuration may be compiled from artificial intelligence or machine learning training data sets compiled from many individuals. A training data set may be built from hundreds, thousands, or more users thus representing a default or base configuration. Data from a specific user may be captured by a neural network trained on the default data set being refined based on the user’s data set. (Column 17, Lines 10 to 46) Witchey et al., then, discloses modifying a base machine learning model with data from a specific user to generate a machine learning model refined for the specific user, but does not provide details that this modification is performed by federated averaging.
However, Stoppa et al. teaches providing personalized saliency models. (Abstract) Models may be modified at a server by performing an averaging operation on at least some of the at least a portion of an updated version of a first saliency model. (¶[0050]) A global saliency model may be trained using different subsets of collected information tailored for a specific demographic group, e.g., females over the age of 65, certain professions, e.g., dentists, certain activities, e.g., attending a tennis match, and certain geographic regions, e.g., European users. A user could then switch between different global saliency models 534 as desired which would be relevant or salient to someone of a different demographic group. (¶[0057]: Figure 5) Specifically, network devices 550 may utilize training/federated learning module 556 to perform a process known as federated learning in order to continue to train and improve its global saliency models 554 over time. The process of federated learning may comprise multiple devices working together to collaboratively learn a shared model while keeping each user’s individual training data on their individual electronic device 500, thereby obviating the need for users to send their personal data to network 550 in order to obtain the benefits of machine learning. A federated learning process generates an updated personalized saliency model by summarizing changes and updates in the form of a small update file. In order to protect a user’s privacy, while simultaneously reducing latency and power consumption, only the small update file may be sent to network device 550, wherein the small update file may be averaged with other user updates received at network device 550 to improve the global saliency models 554 (“federated averaging”). (¶[0059] - ¶[0062]) Consequently, Stoppa et al. teaches that generating a machine learning model for “a demographic group through federated averaging” has advantages of protecting a user’s privacy by not requiring a user to send their personal data for training. It would have been obvious to one having ordinary skill in the art to perform federated averaging to generate a machine learning model for a demographic group as taught by Stoppa et al. in a machine learning model refined for a specific user from a model with a training data set from hundreds or thousands of individuals in Witchey et al. for a purpose of protecting a user’s privacy by not requiring a user to send their personal data for training.
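For orientation, the averaging of per-user updates into a per-group model can be sketched as below. This is a generic illustration of federated averaging under stated assumptions, not the method of Stoppa et al. or of the application: weighting each update by its number of local examples is the common convention and is assumed here, and the group names, weight vectors, and function names are hypothetical.

```python
def federated_average(updates):
    """Average weight vectors, weighting each by its local example count.

    updates: list of (weights, num_examples) pairs from individual devices.
    Only these summaries reach the server; raw training data stays on-device.
    """
    total = sum(n for _, n in updates)
    size = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(size)]


def average_by_group(client_updates):
    """Build one averaged model per demographic group."""
    groups = {}
    for group, weights, n in client_updates:
        groups.setdefault(group, []).append((weights, n))
    return {g: federated_average(u) for g, u in groups.items()}


# Hypothetical updates: (group, model weights, number of local examples).
clients = [
    ("group_a", [1.0, 2.0], 10),
    ("group_a", [3.0, 4.0], 30),
    ("group_b", [0.0, 1.0], 5),
]
models = average_by_group(clients)
print(models)
```

In this toy run the second group_a client contributes three times the examples of the first, so the group_a average sits closer to its weights.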

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Taylor (EP ‘409), Kihlberg, and LaCombe represent similar prior art with face masks having sensors.
Malik et al. and Gilbertson et al. represent similar prior art with federated learning and demographics.  

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN LERNER/
Primary Examiner, Art Unit 2658
March 25, 2026

Prosecution Timeline

Jun 10, 2024

Application Filed

Apr 01, 2026

Non-Final Rejection mailed — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/272,516

Patent 12632656

TEXT GENERATION INCLUDING DE-DUPLICATION OF DECODED WORD INFORMATION TO SPLICE TARGET WORD INFORMATION INTO AN INFORMATION SEQUENCE

2y 10m to grant; granted May 19, 2026

17/770,177

Patent 12620404

DEEP SOURCE SEPARATION ARCHITECTURE

4y 0m to grant; granted May 05, 2026

18/365,535

Patent 12596880

DETERMINING CAUSALITY BETWEEN FACTORS FOR TARGET OBJECT BY ANALYZING TEXT

2y 8m to grant; granted Apr 07, 2026

17/882,447

Patent 12586592

METHODS AND APPARATUS FOR GENERATING AUDIO FINGERPRINTS FOR CALLS USING POWER SPECTRAL DENSITY VALUES

3y 7m to grant; granted Mar 24, 2026

18/336,831

Patent 12585680

CONTEXTUAL TITLES BASED ON TEMPORAL PROXIMITY AND SHARED TOPICS OF RELATED COMMUNICATION ITEMS WITH SENSITIVITY POLICY

2y 9m to grant; granted Mar 24, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

78%

Grant Probability

91%

With Interview (+13.3%)

2y 11m (~1y 0m remaining)

Median Time to Grant

Low

PTA Risk

Based on 988 resolved cases by this examiner. Grant probability derived from career allowance rate.
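The projections above can be reproduced from the figures on this page with a short calculation. The sketch assumes the with-interview figure is the base rate plus the interview lift in percentage points, and that an interview is recommended only when the lift meets or exceeds the threshold; both rules are inferred from the numbers shown and are not a documented formula.

```python
# Figures copied from this page.
GRANTED = 771
RESOLVED = 988
INTERVIEW_LIFT = 13.3    # percentage points
LIFT_THRESHOLD = 15.0    # percentage points


def allowance_rate(granted, resolved):
    """Career allowance rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved


def recommend_interview(lift, threshold):
    """Assumed rule: recommend an interview only at or above the threshold."""
    return lift >= threshold


base = allowance_rate(GRANTED, RESOLVED)
# Assumed additive model, capped at 100%.
with_interview = min(100.0, base + INTERVIEW_LIFT)

print(f"base {base:.0f}%, with interview {with_interview:.0f}%")
print("interview recommended:", recommend_interview(INTERVIEW_LIFT, LIFT_THRESHOLD))
```

Under these assumptions the calculation matches the 78% and 91% figures shown, and the 13.3-point lift falls short of the 15.0-point threshold.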