DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-4, 6-9, 11-15, 18, 20-22, and 27-29 are pending and have been examined.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/18/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
The drawings are objected to because of the following informalities: in Fig. 10, reference characters 743 and 744 do not appear in the specification.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Applicant is advised that should claim 13 be found allowable, claim 22 will be objected to under 37 CFR 1.75 as being a substantial duplicate thereof. When two claims in an application are duplicates or else are so close in content that they both cover the same thing, despite a slight difference in wording, it is proper after allowing one claim to object to the other as being a substantial duplicate of the allowed claim. See MPEP § 608.01(m).
Examiner note: The Examiner believes that the Applicant intended to cancel claim 22 when the corresponding language was brought into claim 13. Cancelling claim 22 would resolve the objection.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation is: “a voice detector” in claim 2. The term “voice detector” is a generic placeholder; there is no evidence that one of ordinary skill in the art would understand the term itself to connote structure. Further, the term is modified by the functional language “configured to,” but is not modified by sufficient structure for performing the claimed function. The Examiner notes that the voice detector is specifically not covered by the “processor is configured to perform” language in the current claims.
Because this claim limitation is being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it is being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
The voice detector is embodied as a processor, as per the specification at paragraph [0033].
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-4, 6-9, 11-15, 18, 20-22, and 27-29 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, and 4-9 of U.S. Patent No. 12,119,015, in view of Ganeshkumar (US 2021/0241782), as found in the IDS, hereinafter Ganeshkumar. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the issued patent/co-pending application anticipate the claims of the instant application. Please see the mapping in the table below, where the bolded limitations indicate the corresponding limitations between the issued patent/co-pending application and the instant application. With respect to the dependent claims, each claim maps to a corresponding dependent claim of the issued patent/co-pending application or is found within the scope of the independent claim.
With respect to each of the dependent claims and independent claims, each claim corresponds numerically. Please see the mapping that follows, where (I) denotes the instant application claim and (P) the claim of the issued patent/co-pending application:
Claim 1 (I): Claim 1 (P)
Claim 2 (I): Claim 1 (P)
Claim 3 (I): Claim 1 (P)
Claim 4 (I): Claim 2 (P)
Claim 6 (I): Claim 4 (P)
Claim 7 (I): Claim 5 (P)
Claim 8 (I): Claim 1 (P)
Claim 9 (I): Claim 6 (P)
Claim 11 (I): Claim 8 (P)
Claim 12 (I): Claim 9 (P)
Claim 13 (I): Claim 1 (P)
Claim 14 (I): Claim 1 (P)
Claim 15 (I): Claim 1 (P)
Claim 18 (I): Claim 5 (P)
Claim 20 (I): Claim 1 (P)
Claim 21 (I): Claim 6 (P)
Claim 22 (I): Claim 7 (P)
Claim 27 (I): Claim 2 (P)
Claim 28 (I): Claim 5 (P)
Claim 29 (I): Claim 1 (P)
Instant Application: 18/787,018
Issued Patent: 12,119,015
Claim 1: A system for processing a signal, comprising:
at least one microphone configured to obtain a sound signal, the sound signal including at least one of user voice and environmental noise;
at least one vibration sensor configured to collect a vibration signal, the vibration signal including at least one of the user voice and the environmental noise; and
a processor configured to:
determine a relationship between a noise component in the sound signal and a noise component in the vibration signal; and
obtain a target vibration signal by performing, based at least on the relationship, noise reduction processing on the vibration signal, wherein the system further includes a noise mixer, and the at least one microphone includes a plurality of microphones, and to generate the sound signal, the processor is configured to perform operations including:
determining a first noise signal based on a relative positional relationship between the plurality of microphones, wherein the first noise signal is a noise signal synthesized from noise in all directions except the direction of the user voice;
obtaining a microphone signal collected by at least one target microphone in the plurality of microphones; and
generating the sound signal by mixing the first noise signal and the microphone signal via the noise mixer.
Claim 1: A system for processing a signal, comprising:
at least one microphone configured to collect a sound signal, the sound signal including at least one of user voice and environmental noise;
at least one vibration sensor configured to collect a vibration signal, the vibration signal including at least one of the user voice and the environmental noise; and
a processor configured to:
identify signal segments excluding the user voice within the sound signal and the vibration signal, respectively;
determine, in the identified signal segments excluding the user voice within the sound signal and the vibration signal, a relationship between a noise component in the sound signal and a noise component in the vibration signal;
determine a noise component in the sound signal in signal segments including the user voice;
determine, based on the relationship and the noise component in the sound signal in the signal segments including the user voice, a noise component in the vibration signal in the signal segments including the user voice; and
obtain a target vibration signal by removing the noise component in the vibration signal in the signal segments including the user voice,
wherein the at least one microphone includes a microphone array, the microphone array includes a plurality of microphones, and to determine, in the identified signal segments excluding the user voice within the sound signal and the vibration signal, the relationship between the noise component in the sound signal and the noise component in the vibration signal, the processor is further configured to:
determine a first noise signal from the sound signal based on a relative positional relationship between the plurality of microphones in the microphone array in the identified signal segments excluding the user voice within the sound signal and the vibration signal, respectively, wherein the first noise signal is a noise signal synthesized from noises in all directions except a direction of the user voice in the environment; and
determine a relationship between the first noise signal and the vibration signal.
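For orientation, the noise-reduction scheme recited in the claims above (learn a relationship between the noise components of the sound and vibration signals in segments excluding the user voice, then use that relationship to remove the predicted noise from the vibration signal) can be sketched as follows. This is an illustrative sketch only: the function and variable names are invented for exposition, and a simple least-squares gain stands in for whatever relationship model the specification actually employs.

```python
import numpy as np

def estimate_relationship(sound_noise, vibration_noise):
    """Least-squares gain relating the noise component in the sound signal
    to the noise component in the vibration signal, learned in segments
    identified as excluding the user voice."""
    denom = float(np.dot(sound_noise, sound_noise))
    return float(np.dot(sound_noise, vibration_noise)) / denom if denom else 0.0

def reduce_vibration_noise(vibration_segment, sound_noise_in_segment, gain):
    """Predict the vibration-signal noise from the sound-signal noise via
    the learned relationship, then subtract it to obtain the target
    vibration signal."""
    return vibration_segment - gain * sound_noise_in_segment
```

In this reading, `estimate_relationship` corresponds to the "determine a relationship" step performed on voice-free segments, and `reduce_vibration_noise` to the noise-reduction processing performed on segments that include the user voice.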
With respect to claims 1 and 13, the issued patent/co-pending application does not specifically disclose generating the sound signal by mixing the first noise signal and the microphone signal via the noise mixer.
Ganeshkumar, however, teaches generating the sound signal by mixing the first noise signal and the microphone signal via the noise mixer (the selector/mixer mixes two or more outputs together, i.e. generating the sound signal by mixing...via the noise mixer, which can be the signals from the full array of microphones, i.e. the microphone signal, including any microphones positioned in such a way that noise, such as the identified noise contributions from the signals of the microphones, i.e. first noise signal, is maximized while the voice is minimized, such as a microphone on the outside of a helmet [0020],[0026],[0028-9]).
The issued patent/co-pending application and Ganeshkumar are analogous art because they are from a similar field of endeavor in processing audio data from both air and vibration signals. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the issued patent's teaching of using multiple microphones to receive speech signals with the use of a mixer to mix outputs together, as taught by Ganeshkumar. It would have been obvious to combine the references to enable the selection of specific microphone input in order to reduce noise contributions in an output signal (Ganeshkumar [0020]).
Method claim 13 of the instant application is rejected over system claim 1 of the issued patent/co-pending application using the same rationale as that provided in the table above for the system claims.
Regarding the differences between Claim 13 of the instant application and system claim 1 of the issued patent/co-pending application, it would have been obvious to one of ordinary skill in the art that the system limitation of the issued patent/co-pending application could be applied to performing the method as presented in the instant application.
As to claim 27, this claim is rejected over the issued patent/co-pending application in view of Ganeshkumar, and further in view of Fawaz et al. (U.S. PG Pub No. 2018/0068671), as found in the IDS, hereinafter Fawaz.
Please see the associated 103 mapping below for further detail.
As to claim 28, this claim is rejected over the issued patent/co-pending application in view of Dusan et al. (U.S. PG Pub No. 2017/0365249), as found in the IDS, hereinafter Dusan.
Please see the associated 103 mapping below for further detail.
As to claim 29, this claim is rejected over the issued patent/co-pending application in view of Ganeshkumar, and further in view of Zhang et al. (U.S. PG Pub No. 2006/0178880), as found in the IDS, hereinafter Zhang.
Please see the associated 103 mapping below for further detail.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 6, 7, 11-15, 18, 22, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, in view of Endo (U.S. PG Pub No. 2014/0363020), as found in the IDS, hereinafter Endo, and further in view of Ganeshkumar.
Regarding claims 1 and 13, Zhang teaches
(claim 1) A system for processing a signal (a computing device [0012]), comprising:
(claim 13) A method for processing a signal (a method [0005]), comprising:
at least one microphone configured to obtain a sound signal, the sound signal including at least one of user voice and environmental noise (an air conduction microphone that receives, i.e. at least one microphone configured to obtain, a speech signal, i.e. a sound signal, the sound signal including at least one of user voice, and ambient noise generated by one or more noise sources, i.e. environmental noise Fig. 3,[0028-9]);
at least one vibration sensor configured to collect a vibration signal, the vibration signal including at least one of the user voice and the environmental noise (alternative sensor, such as a throat microphone or bone conduction sensor that measures vibrations, receives, i.e. at least one vibration sensor configured to collect, a speech signal, i.e. vibration signal, the vibration signal including at least one of the user voice, and ambient noise generated by one or more noise sources, i.e. environmental noise Fig. 3,[0028-9]); and
(claim 1) a processor configured to (the device includes a microprocessor Fig. 2,[0023]):
determine a relationship between a noise component in the sound signal and a noise component in the vibration signal (alternative sensor values, i.e. vibration signal, and air conduction microphone values, i.e. sound signal, are stored as either speech frames or non-speech frames [0045], and, using the values in the non-speech frames, noise estimates are determined, including model parameters that describe the background noise from the air conduction microphone and the alternative sensor noise from the alternative sensor, where the values are used to calculate the model parameters for the channel response, i.e. determine a relationship between a noise component in the sound signal and a noise component in the vibration signal [0046-9],[0052-3]).
While Zhang provides using noise estimation to filter speech signals and provide clean speech, Zhang does not specifically teach removing noise from the vibration signal, and thus does not teach
obtain a target vibration signal by performing, based at least on the relationship, noise reduction processing on the vibration signal.
Endo, however, teaches obtain a target vibration signal by performing, based at least on the relationship, noise reduction processing on the vibration signal (the class determining unit judges whether an air conduction frame includes a main element of user’s voice, and identifies a corresponding bone conduction frame, and the bone-conduction-sound correction unit corrects the bone conduction sound to make the frequency spectrum of the bone conduction sound identical to the frequency spectrum of an air conduction sound with a high SNR, where air-conduction-sound frames have had noise reduction to reduce stationary noise, i.e. obtain a target vibration signal by performing based at least on the relationship noise reduction processing on the vibration signal Fig. 18,[0038-41],[0050],[0094]).
Zhang and Endo are analogous art because they are from a similar field of endeavor in processing audio data from both air and vibration signals. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Zhang's teaching of using noise estimation to filter speech signals and provide clean speech with the correcting of bone-conduction sounds when a frame has a main element of a user's voice, as taught by Endo. It would have been obvious to combine the references to enable a system to output corrected sound signals tailored to the nature of the noise in the signals (Endo [0028-32]).
While Zhang in view of Endo provides the use of a microphone to receive speech signals, Zhang in view of Endo does not specifically teach the use of multiple microphones, and thus does not teach
wherein the system further includes a noise mixer, and the at least one microphone includes a plurality of microphones, and to generate the sound signal, the processor is configured to perform operations including:
determining a first noise signal based on a relative positional relationship between the plurality of microphones, wherein the first noise signal is a noise signal synthesized from noise in all directions except the direction of the user voice;
obtaining a microphone signal collected by at least one target microphone in the plurality of microphones; and
generating the sound signal by mixing the first noise signal and the microphone signal via the noise mixer.
Ganeshkumar, however, teaches wherein the system further includes a noise mixer, and the at least one microphone includes a plurality of microphones, and to generate the sound signal, the processor is configured to perform operations including (the device has a selector/mixer, i.e. a noise mixer, multiple microphones, i.e. a plurality of microphones, and a processor that runs the appropriate software, i.e. processor is configured to perform operations [0028]):
determining a first noise signal based on a relative positional relationship between the plurality of microphones, wherein the first noise signal is a noise signal synthesized from noise in all directions except the direction of the user voice (the signals of the microphones are compared to identify different noise contributions, such as wind noise, i.e. determining a first noise signal, where some microphones are beamformed along the axis of the user’s mouth and some microphones are located close to the accelerometer and facing a direction that minimizes picking up the user’s voice in favor of noise, i.e. based on a relative positional relationship between the plurality of microphones, where the microphone used as a reference that is subtracted from the accelerometer output is configured to not pick up the user voice, such as on the back of a helmet and head, i.e. wherein the first noise signal is a noise signal synthesized from noise in all directions except the direction of the user voice [0018],[0020],[0026],[0029]);
obtaining a microphone signal collected by at least one target microphone in the plurality of microphones (the separate signals from the different microphones, i.e. at least one target microphone in the plurality of microphones, are collected, i.e. obtaining a microphone signal [0020]); and
generating the sound signal by mixing the first noise signal and the microphone signal via the noise mixer (the selector/mixer mixes two or more outputs together, i.e. generating the sound signal by mixing...via the noise mixer, which can be the signals from the full array of microphones, i.e. the microphone signal, including any microphones positioned in such a way that noise, such as the identified noise contributions from the signals of the microphones, i.e. first noise signal, is maximized while the voice is minimized, such as a microphone on the outside of a helmet [0020],[0026],[0028-9]).
Zhang, Endo, and Ganeshkumar are analogous art because they are from a similar field of endeavor in processing audio data from both air and vibration signals. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Zhang's teaching, as modified by Endo, of using a microphone to receive speech signals with the use of multiple microphones in an array, as taught by Ganeshkumar. It would have been obvious to combine the references to enable the selection of specific microphone input in order to reduce noise contributions in an output signal (Ganeshkumar [0020]).
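The first-noise-signal and noise-mixer operations mapped to Ganeshkumar above can be sketched as follows. This is an illustrative sketch only: the function names, the given null-steering weights, and the fixed mixing weight `alpha` are invented for exposition, and a simple weighted channel combination stands in for an actual beamformer.

```python
import numpy as np

def first_noise_signal(mic_signals, noise_weights):
    """Synthesize a noise signal from all directions except the user-voice
    direction by combining the array channels with weights that null the
    voice axis (weights assumed given; positional nulling sketch)."""
    # (n_mics,) @ (n_mics, n_samples) -> (n_samples,)
    return noise_weights @ mic_signals

def generate_sound_signal(first_noise, target_mic_signal, alpha):
    """Noise mixer: mix the synthesized first noise signal with the signal
    collected by a target microphone to generate the sound signal."""
    return alpha * first_noise + (1.0 - alpha) * target_mic_signal
```

Here `first_noise_signal` stands in for "determining a first noise signal based on a relative positional relationship between the plurality of microphones," and `generate_sound_signal` for the mixing step performed by the noise mixer.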
Regarding claims 2 and 14, Zhang in view of Endo and Ganeshkumar teaches claims 1 and 13, and Zhang further teaches
a voice detector for voice activity detection configured to (a speech detection unit, i.e. voice detector, determines which values correspond to the user speaking and which values correspond to background noise, i.e. voice activity detection [0038-42], where the device includes a microprocessor Fig. 2,[0023]):
identify signal segments excluding the user voice within the sound signal and the vibration signal, respectively (the speech detection unit determines if a frame contains speech, i.e. identify signal segments…user voice, where alternative sensor values and air conduction microphone values, i.e. within the sound signal and the vibration signal, respectively, associated with non-speech are stored as non-speech frames, i.e. identify signal segments excluding the user voice Fig. 3,[0031-2],[0038-45]); and
wherein to determine a relationship between a noise component in the sound signal and a noise component in the vibration signal, the processor is configured to perform operations including:
determining, based on the signal segments excluding the user voice within the sound signal and the vibration signal, the relationship between the noise component in the sound signal and the noise component in the vibration signal (alternative sensor values, i.e. vibration signal, and air conduction microphone values, i.e. sound signal, are stored as either speech frames or non-speech frames, i.e. identified signal segments excluding the user voice within the sound signal and the vibration signal [0045], and, using the values in the non-speech frames, i.e. in the identified signal segments excluding the user voice, noise estimates are determined, including model parameters that describe the background noise from the air conduction microphone and the alternative sensor noise from the alternative sensor, where the values are used to calculate the model parameters for the channel response, i.e. determine…a relationship between a noise component in the sound signal and a noise component in the vibration signal [0046-9],[0052-3]).
Regarding claims 3 and 15, Zhang in view of Endo and Ganeshkumar teaches claims 2 and 14, and Endo further teaches
obtain the target vibration signal by performing, based on the relationship, the noise reduction processing on the vibration signal in signal segments including the user voice within the sound signal and the vibration signal, respectively (the class determining unit judges whether an air conduction frame includes a main element of user’s voice, and identifies a corresponding bone conduction frame, i.e. vibration signal in signal segments including the user voice, and the bone-conduction-sound correction unit corrects the bone conduction sound to make the frequency spectrum of the bone conduction sound identical to the frequency spectrum of an air conduction sound with a high SNR, where air-conduction-sound frames have had noise reduction to reduce stationary noise, i.e. obtain a target vibration signal by removing the noise component in the vibration signal Fig. 18,[0038-41],[0050],[0094]).
The motivation to combine is the same as previously presented.
Regarding claim 6, Zhang in view of Endo and Ganeshkumar teaches claim 2, and Endo further teaches
obtain a target sound signal by performing the noise reduction processing on the sound signal in a signal segment including the user voice of the sound signal (when a frame of the air conduction sound is judged to include mainly the user’s voice, i.e. a signal segment including the user voice of the sound signal, and the SNR is equal to or higher than a threshold value, the noise reduction unit reduces stationary noise, i.e. performing the noise reduction processing, within an air conduction sound, i.e. on the sound signal in a signal segment including the user voice of the sound signal, where the noise reduction unit outputs the data, i.e. obtain a target sound signal [0040-2]).
Where the motivation to combine is the same as previously presented.
Regarding claims 7 and 18, Zhang in view of Endo and Ganeshkumar teaches claims 6 and 14, and Zhang further teaches
(claim 18) performing the noise reduction processing on the sound signal in a signal segment including the user voice of the sound signal (when a frame of the air conduction sound is judged to include mainly the user’s voice, i.e. a signal segment including the user voice of the sound signal, and the SNR is equal to or higher than a threshold value, the noise reduction unit reduces stationary noise, i.e. performing the noise reduction processing, within an air conduction sound, i.e. on the sound signal in a signal segment including the user voice of the sound signal, where the noise reduction unit outputs the data [0040-2]).
And where Endo further teaches obtain a target signal by aliasing at least part of components in the target vibration signal with at least part of components in the target sound signal (the generating unit designates different signals as the sound output for a sound frame, i.e. obtain a target signal by aliasing, where some frames are designated from the air-conduction sound with noise being reduced, i.e. at least part of components in the target sound signal, and some frames are designated from the corrected bone-conduction sound, i.e. at least part of components in the target vibration signal [0047], and some frames are composites of the two sounds [0102]), wherein frequencies of the at least part of the components in the target vibration signal are less than frequencies of the at least part of the components in the target sound signal (when the frame is to be a composite signal, the low frequency component is equal to the corrected bone conduction sound, i.e. frequencies of the at least part of the components in the target vibration signal are less than, and the high frequency component is equal to the air conduction sound, i.e. frequencies of the at least part of the components in the target sound signal [0102]).
Where the motivation to combine is the same as previously presented.
Regarding claim 11, Zhang in view of Endo and Ganeshkumar teaches claim 1, and Ganeshkumar further teaches
obtain a noise level along a direction of the user voice (the noise levels in individual microphones are determined, i.e. obtain a noise level, where the microphones with less noise can be beamformed along an axis from the expected location of the user’s mouth, i.e. along a direction of the user voice [0020],[0026]); and
determine, based on the noise level, a mixing ratio of the first noise signal to the microphone signal (the array signal can be compared to one or more separate microphones, and the output of only the beamformed microphones that have less noise than the array can be selected to be combined for output, where the selection of which specific microphone outputs from all the microphones in the array to combine, i.e. determine...a mixing ratio of...the microphone signal, results in a combination that would not include the whole array or output from microphones with a high noise level, such as a microphone positioned at the back of the head or away from optimal voice capture that has a high wind noise component, i.e. based on the noise level...first noise signal [0020],[0026],[0029]).
Where the motivation to combine is the same as previously presented.
Regarding claim 12, Zhang in view of Endo and Ganeshkumar teaches claim 1, and Ganeshkumar further teaches
a signal-to-noise ratio of the at least one vibration sensor is greater than a signal-to-noise ratio of the at least one microphone in at least part of a frequency range (an accelerometer, i.e. vibration sensor, can have a bandwidth that allows for it to be active in the speech frequency band, i.e. in at least part of a frequency range, and the accelerometer output is used when the microphone signals are overwhelmed by wind noise, while the accelerometer may not be susceptible to environmental noise, i.e. a signal-to-noise ratio of the at least one vibration sensor is greater than a signal-to-noise ratio of the at least one microphone [0029]).
Where Endo further teaches that the bone-conduction microphone may only pick up bone vibrations of a user from a user’s voice and not stationary noise, i.e. signal-to-noise ratio of the at least one vibration sensor is greater than a signal-to-noise ratio of the at least one microphone [0035].
And where the motivation to combine is the same as previously presented.
Regarding claim 22, Zhang in view of Endo and Ganeshkumar teaches claim 13, and Ganeshkumar further teaches
the at least one microphone includes a plurality of microphones (the device has multiple microphones, i.e. a plurality of microphones [0028]), and the method further including:
determining a first noise signal based on a relative positional relationship between the plurality of microphones (the signals of the microphones are compared to identify different noise contributions, such as wind noise, i.e. determining a first noise signal, where some microphones are beamformed along the axis of the user’s mouth and some microphones are located close to the accelerometer and facing a direction that minimizes picking up the user’s voice in favor of noise, i.e. based on a relative positional relationship between the plurality of microphones, where the microphone used as a reference that is subtracted from the accelerometer output is configured to not pick up the user voice, such as on the back of a helmet and head [0018],[0020],[0026],[0029]);
obtaining a microphone signal collected by at least one target microphone in the plurality of microphones (the separate signals from the different microphones, i.e. at least one target microphone in the plurality of microphones, are collected, i.e. obtaining a microphone signal [0020]); and
generating the sound signal by mixing the first noise signal and the microphone signal (the selector/mixer mixes two or more outputs together, i.e. generating the sound signal by mixing, which can be the signals from the full array of microphones, i.e. the microphone signal, including any microphones positioned in such a way that noise, such as the identified noise contributions from the signals of the microphones, i.e. first noise signal, is maximized while the voice is minimized, such as a microphone on the outside of a helmet [0020],[0026],[0028-9]).
Where the motivation to combine is the same as previously presented.
Regarding claim 29, Zhang in view of Endo and Ganeshkumar teaches claim 1, and Zhang further teaches
determining the noise relationship based on the signal segment of the sound signal generated by the noise mixer that does not include the user voice (alternative sensor values and air conduction microphone values, i.e. sound signal, are stored as either speech frames or non-speech frames, i.e. identified signal segments excluding the user voice within the sound signal and the vibration signal [0045], and, using the values in the non-speech frames, i.e. based on the signal segment of the sound signal…that does not include the user voice, noise estimates are determined, including model parameters that describe the background noise from the air conduction microphone and the alternative sensor noise from the alternative sensor, where the values are used to calculate the model parameters for the channel response, i.e. determine…the noise relationship [0046-9],[0052-3]).
Where Ganeshkumar teaches that the microphone signal is generated by a mixer [0020],[0026],[0028-9].
And where the motivation to combine is the same as previously presented.
Claim(s) 4 and 27 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, in view of Endo, in view of Ganeshkumar, and further in view of Fawaz.
Regarding claim 4, Zhang in view of Endo and Ganeshkumar teaches claim 1.
While Zhang in view of Endo and Ganeshkumar provides the reduction of stationary noise in air conduction signals, Zhang in view of Endo and Ganeshkumar does not specifically teach the reduction of steady-state noise in the vibration signal, and thus does not teach
obtain the target vibration signal by suppressing steady-state noise in the vibration signal.
Fawaz, however, teaches obtain the target vibration signal by suppressing steady-state noise in the vibration signal (the vibration data may be normalized to remove error, i.e. obtain the target vibration signal by suppressing...in the vibration signal, related to background noise, i.e. steady-state noise [0039]).
Zhang, Endo, Ganeshkumar, and Fawaz are analogous art because they are from a similar field of endeavor in processing audio data from both air and vibration signals. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Zhang, as modified by Endo and Ganeshkumar, on reducing stationary noise in air conduction signals with the normalization of vibration data to remove error related to background noise as taught by Fawaz. It would have been obvious to combine the references to enable authentication of a voice command based on correlation analysis between speech signals and vibration data (Fawaz [0040-1]).
Regarding claim 27, Zhang in view of Endo, Ganeshkumar, and Fawaz teaches claim 4, and Fawaz further teaches
suppress steady-state noise in the vibration signal in the frequency band range of 2kHz to 8kHz (the vibration data may be normalized to remove error related to background noise, i.e. suppress steady-state noise in the vibration signal, where normalization requires removal of spikes from the data, and removal of spikes may include applying a low-pass filter at 4kHz or 8kHz to preserve most of the features of the speech signals, i.e. in the frequency band range of 2kHz to 8kHz [0039],[0044-5]).
Where the motivation to combine is the same as previously presented.
Claim(s) 28 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, in view of Endo, in view of Ganeshkumar, and further in view of Dusan.
Regarding claim 28, Zhang in view of Endo and Ganeshkumar teaches claim 7.
While Zhang in view of Endo and Ganeshkumar provides using frames of the bone conduction sound as part of a composite signal, Zhang in view of Endo and Ganeshkumar does not specifically teach that the highest frequency of the component in the target vibration signal is within a specific frequency range, and thus does not teach
the highest frequency of the component in the target vibration signal that is used for aliasing is not greater than 3000 Hz but not less than 1000 Hz.
Dusan, however, teaches the highest frequency of the component in the target vibration signal that is used for aliasing is not greater than 3000 Hz but not less than 1000 Hz (the accelerometer, i.e. vibration signal, may be tuned to be sensitive to the frequency band by filtering out frequencies above 2000Hz-3000Hz using a low-pass filter, where the highest available frequency in the accelerometer would then be between 2000Hz and 3000Hz, i.e. the highest frequency of the component in the target vibration signal…is not greater than 3000 Hz but not less than 1000 Hz [0023]).
Where Endo teaches that the low frequency component is equal to the corrected bone conduction sound when the frame is to be a composite signal, i.e. component in the target vibration signal that is used for aliasing [0047],[0102].
Zhang, Endo, Ganeshkumar, and Dusan are analogous art because they are from a similar field of endeavor in processing audio data from both air and vibration signals. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Zhang, as modified by Endo and Ganeshkumar, on using frames of the bone conduction sound as part of a composite signal with the accelerometer having a specified upper frequency limit as taught by Dusan. It would have been obvious to combine the references to enable a VAD system that is more robust, less affected by ambient acoustic noise, and able to more accurately detect speech (Dusan [0016]).
Allowable Subject Matter
Claims 8, 9, 20, and 21 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
The closest prior art of Zhang teaches receiving speech signals and ambient noise at an air conduction microphone and an alternative sensor, such as a throat or bone conduction sensor, determining if a frame contains speech, determining noise estimates in non-speech frames, determining channel response and variance in speech frames, and determining if a speech frame contains teeth clack. However, Zhang does not teach the use of a microphone array, determining a noise signal based on the positional relationship between the microphones in the array in non-speech segments, synthesizing noise from all directions except the direction of the user voice, determining a relationship between the noise signal and the vibration signal, and removing the noise component in the vibration signal in speech segments to obtain a target vibration signal.
Endo teaches the correction of a bone conduction sound in a user voice frame to make the frequency spectrum of the bone conduction sound identical to the frequency spectrum of an air conduction sound with high SNR that has had noise reduction. However, Endo does not teach the use of a microphone array, determining a noise signal based on the positional relationship between the microphones in the array in non-speech segments, synthesizing noise from all directions except the direction of the user voice, and determining a relationship between the noise signal and the vibration signal.
Ganeshkumar teaches using multiple microphones in an array, comparing the signals of the microphones to identify different noise contributions, where a microphone may be used as a reference and subtracted from the accelerometer output. However, Ganeshkumar does not teach determining a noise signal based on the positional relationship between the microphones in the array in non-speech segments, where the noise is synthesized from noises in all directions except the direction of the user voice, and determining a relationship between the noise signal and the vibration signal.
None of Zhang, Endo, and Ganeshkumar, either alone or in combination, teaches or makes obvious the use of a microphone array to determine a noise signal based on the positional relationship between the microphones in the array in non-speech segments, where the noise signal is synthesized from noises in all directions except the direction of the user voice, and determining a relationship between the noise signal and the vibration signal in signal segments that exclude the user voice in both the sound and vibration signals. Therefore, none of the cited prior art, either alone or in combination, teaches or makes obvious the combination of limitations as recited in the dependent claims including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER whose telephone number is (571)270-1474. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NICOLE A K SCHMIEDER/Primary Examiner, Art Unit 2659