Prosecution Insights
Last updated: April 19, 2026
Application No. 18/733,601

SPEECH MASKING METHOD AND SYSTEM, ELECTRONIC DEVICE, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM

Non-Final OA §103
Filed: Jun 04, 2024
Examiner: AL AUBAIDI, RASHA S
Art Unit: 2693
Tech Center: 2600 — Communications
Assignee: AAC Acoustic Technologies (Shanghai) Co., Ltd.
OA Round: 1 (Non-Final)
Grant Probability: 78% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 3m
With Interview: 89%

Examiner Intelligence

Career Allow Rate: 78% (above average; 577 granted / 744 resolved; +15.6% vs TC avg)
Interview Lift: +11.1% (moderate) among resolved cases with interview
Typical Timeline: 3y 3m average prosecution; 38 applications currently pending
Career History: 782 total applications across all art units
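As a quick arithmetic check, the headline figures above follow directly from the raw counts. This is a minimal sketch we added; the counts and the +11.1-point interview lift come from the report, while the function name is ours:

```python
# Sanity-check the examiner stats quoted above: 577 granted out of 744
# resolved, with a reported +11.1 percentage-point lift after an interview.

def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage."""
    return 100.0 * granted / resolved

base = allow_rate(577, 744)    # ~77.6%, shown rounded to 78%
with_interview = base + 11.1   # ~88.7%, shown rounded to 89%
```

The report's 78% and 89% figures are simply these two values rounded to whole percentages.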

Statute-Specific Performance

§101: 10.2% (-29.8% vs TC avg)
§103: 55.9% (+15.9% vs TC avg)
§102: 16.1% (-23.9% vs TC avg)
§112: 8.4% (-31.6% vs TC avg)
Deltas shown relative to the Tech Center average estimate • Based on career data from 744 resolved cases
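The per-statute deltas are internally consistent: subtracting each delta from the examiner's rate recovers the implied Tech Center baseline. The dictionaries below simply restate the table above; the check itself is our addition:

```python
# Recover the implied Tech Center baseline from the statute table:
# examiner_rate - delta should equal the TC average for each statute.
rates  = {"101": 10.2, "103": 55.9, "102": 16.1, "112": 8.4}
deltas = {"101": -29.8, "103": 15.9, "102": -23.9, "112": -31.6}

implied_tc_avg = {s: round(rates[s] - deltas[s], 1) for s in rates}
# Every statute implies the same 40.0% baseline, so all four deltas
# share a single reference point.
```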

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

1. This is in response to the CON application filed 06/04/2024.

Claim Rejections - 35 USC § 103

2. The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

3. Claims 1-17 are rejected under 35 U.S.C. 103 as being unpatentable over Nyayate et al. (Pub. No. 2021/0360349 A1) in view of Hillis et al. (US Pat. No. 7,184,952 B2).
Regarding claims 1 and 10, Nyayate teaches a speech masking method (see abstract) and a device (reads on computing device, [0048]), comprising: obtaining a target speech upon detecting that at least one target person is talking (reads on audio captured at 402 that includes a primary signal, as may correspond to speech of a person, see [0062]); determining a training manner for a neural network model according to a target masking effect and training the neural network model (reads on training deep neural networks to construct a speech mask for noisy audio in a noise model 118, see discussion in [0056]); and generating a masking signal according to the trained neural network model and the target speech (generating a speech mask for a center time-band with 961 frequency bins in the frequency domain through a trained deep neural network, see [0054]).

Nyayate does not specifically teach "playing the masking signal" as recited in claims 1 and 10. However, Hillis teaches speech masking techniques for intentionally rendering speech unintelligible, including: obtaining a speech signal to be masked (see col. 6, lines 15-25 and col. 2, lines 51-62); generating an obfuscated or masking speech signal (see col. 7, lines 5-30); playing the masking signal together with the original speech to produce a composite signal that is unintelligible to a listener (see col. 8, lines 10-40); and outputting the masking signal via a speaker to mask speech for privacy purposes (see col. 3, lines 36-46). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network-based speech processing system of Nyayate to generate and play a masking signal, as taught by Hillis, instead of or in addition to applying the mask internally for noise removal. Also, regarding claim 10, for the claimed "processor" and "memory" see Nyayate [0066].
Independent claim 9 is rejected for the same reasons addressed for independent claims 1 and 10. Note, as to the "radio module" recited in independent claim 9: although Nyayate does not explicitly describe a "radio module," Nyayate teaches communication of audio signals between system components, including receiving audio input and providing audio data to processing and output modules. Implementing such audio signal transmission using a radio module, as recited in claim 9, would have been an obvious design choice, since wireless transmission of audio signals using radio modules is well known and commonly used in communication and call environments.

Claims 2 and 11 recite "wherein obtaining the target speech upon detecting that the at least one target person is talking includes: detecting by a microphone that the at least one target person is making voice in a call environment; and marking, in response to voice information included in the voice being voice information that requires privacy protection, the voice as the target speech and obtaining the target speech." Note that Hillis explicitly assumes identification of speech segments to be masked, including speech containing sensitive or protected information (e.g., "speech stream to be masked," "portion of speech selected for obfuscation"). Nyayate already teaches capturing speech via a microphone and processing speech signals in a real-time audio environment (see [0052]). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to designate captured speech as target speech for masking based on a determination that such speech should not be intelligible to unintended listeners, as taught by Hillis, since selective speech masking inherently requires identification of the speech segments to be masked.
Claims 3 and 12 recite "wherein the target masking effect includes a speech masking effect and a comfort degree of a receiving party for receiving the masking signal, wherein the speech masking effect includes at least one of speech intelligibility of a mixed sound signal and speech recognition accuracy of the mixed sound signal, and the comfort degree includes at least one of energy of the masking signal and energy of the mixed sound signal; wherein the mixed sound signal is obtained by mixing a signal of the target speech and the masking signal." The limitations of claims 3 and 12 are considered routine optimization, not a new technical function. Optimizing speech masking effectiveness while avoiding excessive signal energy would have been a routine design consideration for one of ordinary skill in the art, since masking signals must be sufficiently strong to reduce intelligibility but not so strong as to cause discomfort.

Claims 4 and 13 recite "wherein the method further comprises: before playing the masking signal, determining a target masking area." Note that Hillis teaches masking specific portions or segments of speech, which necessarily implies determining a region (temporal or spatial) for masking (see abstract). Thus, determining a masking area is considered an obvious implementation detail of the selective masking taught by Hillis.
Claims 5 and 14 recite "wherein training the neural network model includes: training the neural network model using a loss function corresponding to each of at least one of the speech intelligibility of the mixed sound signal, the speech recognition accuracy of the mixed sound signal, the energy of the masking signal, and the energy of the mixed sound signal; wherein the loss function is obtained by calculating according to speech obtained after the target speech superimposed with the masking signal is transmitted to a playing position and speech obtained after the target speech without being superimposed with the masking signal is transmitted to the playing position." Nyayate trains neural networks using loss functions for speech-related objectives. Since Hillis provides the functional goal (i.e., render speech unintelligible), selecting loss criteria that reflect desired output characteristics is routine machine-learning practice. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to train the neural network using loss functions reflecting masking effectiveness, since loss-based optimization is inherent in Nyayate's neural network training and intelligibility reduction is the explicit objective of Hillis.

Regarding claims 6 and 15, the combination of Nyayate and Hillis teaches wherein generating the masking signal according to the trained neural network model and the target speech includes: using an end-to-end neural network model to generate the masking signal according to the target speech input (reads on end-to-end neural network, see Nyayate [0061]); or using the neural network model to dynamically estimate parameters of a masking generation algorithm, and generating the masking signal according to the masking generation algorithm and the estimated parameters.
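To make the loss formulation discussed for claims 5 and 14 concrete, a toy combined loss might look like the sketch below. This is purely our illustration, not code from the application or the cited references; the function name, the correlation-based intelligibility proxy, and the weights are all assumptions:

```python
import numpy as np

def masking_loss(target, mask, alpha=1.0, beta=0.1):
    """Toy combined loss: penalize intelligibility of the mixture (proxied
    here by its normalized correlation with the target speech) plus the
    energy of the masking signal, so the mask is effective but not loud."""
    mixed = target + mask
    corr = abs(np.dot(target, mixed)) / (
        np.linalg.norm(target) * np.linalg.norm(mixed) + 1e-9
    )
    energy_penalty = np.mean(mask ** 2)
    return alpha * corr + beta * energy_penalty
```

With no mask, the correlation term is maximal; a mask that cancels the target drives the correlation to zero at the cost of the energy term, which is exactly the effectiveness-versus-comfort trade-off the claims describe.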
Regarding claims 7 and 16, the combination of Nyayate and Hillis teaches wherein the end-to-end neural network model includes an encoder-decoder structure, wherein the encoder and decoder are convolutional network structures, wherein the encoder is configured to perform feature extraction and conversion of a signal of the target speech input to convert the signal of the target speech into an intermediate representation, and the decoder is configured to decode the intermediate representation to convert the intermediate representation into the masking signal corresponding to the target speech. (Nyayate teaches that an encoder can utilize both approaches to attempt to learn both temporal and spatial patterns while balancing latency and computational costs; that convolutional layers can help to identify and extract appropriate patterns, such as for various types of noise, with the number of output filters increased to assist in extracting more patterns; that by a final convolutional layer of sequence 204, various patterns will have been extracted from the input noise; and that GRU 206 can similarly extract and understand patterns, having historical data available that allows for learning across time for different time bands, returning what is important at a current time with respect to previous times, while the convolutional layers work only within the boundaries of a current frame. See [0056].)

Claims 8 and 17 recite "wherein the masking generation algorithm is a time-reversed speech masking generation algorithm, wherein parameters of the time-reversed speech masking generation algorithm include a reversed time length and an energy magnitude of the masking signal." Hillis teaches generating masking speech signals without limiting the signal structure.
Thus, employing time-reversed masking and adjusting the reversed time length or energy magnitude would have been obvious signal-processing variations for generating effective masking audio.

Conclusion

4. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rasha S. AL-Aubaidi, whose telephone number is (571) 272-7481. The examiner can normally be reached Monday-Friday from 8:30 am to 5:30 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Ahmad Matar, can be reached at (571) 272-7488.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/RASHA S AL AUBAIDI/
Primary Examiner, Art Unit 2693
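The time-reversed masking generation algorithm named in claims 8 and 17, with its two recited parameters (reversed time length and energy magnitude), can be sketched roughly as follows. This is our own minimal illustration, not code from the application or the prior art:

```python
def time_reversed_mask(samples, reversed_len, gain):
    """Reverse the input in windows of `reversed_len` samples and scale
    each sample by `gain` (standing in for the energy-magnitude parameter)."""
    out = []
    for i in range(0, len(samples), reversed_len):
        window = samples[i:i + reversed_len]
        out.extend(gain * x for x in reversed(window))
    return out

# time_reversed_mask([1, 2, 3, 4, 5, 6], reversed_len=3, gain=0.5)
# -> [1.5, 1.0, 0.5, 3.0, 2.5, 2.0]
```

Time-reversed speech preserves the spectral character of the talker while destroying its intelligibility, which is why it is a common choice for informational masking.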

Prosecution Timeline

Jun 04, 2024
Application Filed
Jan 24, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12593179: System and Method for Efficiency Among Devices (granted Mar 31, 2026; 2y 5m to grant)
Patent 12581225: Charging Box for Earphones (granted Mar 17, 2026; 2y 5m to grant)
Patent 12576367: Polyethylene Membrane Acoustic Assembly (granted Mar 17, 2026; 2y 5m to grant)
Patent 12563147: Shared Speakerphone System for Multiple Devices in a Conference Room (granted Feb 24, 2026; 2y 5m to grant)
Patent 12563330: Electronic Device (granted Feb 24, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 78% (89% with interview, +11.1%)
Median Time to Grant: 3y 3m
PTA Risk: Low
Based on 744 resolved cases by this examiner. Grant probability derived from career allow rate.
