Prosecution Insights
Last updated: April 19, 2026
Application No. 18/258,824

VOICE EXTRACTION METHOD AND APPARATUS, AND ELECTRONIC DEVICE

Non-Final OA (§101, §103)
Filed: Jun 22, 2023
Examiner: BLANKENAGEL, BRYAN S
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: BEIJING YOUZHUJU NETWORK TECHNOLOGY CO., LTD.
OA Round: 3 (Non-Final)

Grant Probability: 67% (Favorable)
OA Rounds: 3-4
To Grant: 2y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 67% (254 granted / 377 resolved; +5.4% vs TC avg) — grants above average
Interview Lift: +35.2% (resolved cases with vs. without interview) — strong
Typical Timeline: 2y 7m average prosecution; 23 applications currently pending
Career History: 400 total applications across all art units
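As a sanity check on the figures above, the sketch below reproduces the arithmetic in Python. The additive treatment of the interview lift and the 99% cap are assumptions made for illustration, not the vendor's documented methodology:

    # Reproducing the headline examiner statistics (assumed formulas).
    granted, resolved = 254, 377
    allow_rate = granted / resolved                # 0.674 -> shown as 67%
    tc_avg = allow_rate - 0.054                    # implied by "+5.4% vs TC avg"
    interview_lift = 0.352                         # "+35.2% Interview Lift"
    # Assumption: lift applied additively, capped at 99% -> shown as 99%.
    with_interview = min(allow_rate + interview_lift, 0.99)
    print(f"allow rate {allow_rate:.1%}, TC avg ~{tc_avg:.1%}, "
          f"with interview {with_interview:.0%}")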

Statute-Specific Performance

§101: 25.6% (-14.4% vs TC avg)
§103: 49.3% (+9.3% vs TC avg)
§102: 13.3% (-26.7% vs TC avg)
§112: 6.5% (-33.5% vs TC avg)
Tech Center averages are estimates • Based on career data from 377 resolved cases

Office Action

§101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/25/2025 has been entered.

Priority

Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Response to Arguments

Applicant's arguments filed 08/04/2025 have been fully considered but they are not persuasive.

Regarding arguments on page 8 of the Remarks, Examiner notes that microphone arrays are generic computing components, and there is no detail in the claims showing why the claimed arrays are specialized in any way compared to normal microphones. Therefore, use of a microphone array is not considered an integration into a practical application or significantly more. Some of the limitations can be performed mentally, such as a human determining a direction of a sound source, or extracting data. Other limitations are mathematical calculations, such as fusing or concatenating data. Using the generic hardware to perform the abstract ideas does not qualify as integration into a practical application or significantly more.

Regarding arguments on pages 8-9 of the Remarks, Examiner notes that the solution to the technical problem is itself abstract, as noise suppression and direction determination can be performed mentally or as mathematical calculations. Therefore, the claims being a solution to a technical problem cannot bring the claims out of the realm of the abstract, as the solution itself is abstract.

Applicant's arguments with respect to claims 1-4, 6-7, 9, 11-15, 17-18, and 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-4, 6-7, 9, 11-15, 17-18, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.

Using the subject matter eligibility test from page 74621 of the Federal Register Notice titled "2014 Interim Guidance on Patent Subject Matter Eligibility," a two-step process is performed. Under step 1, the claims are analyzed to determine if the claim is directed to a process, machine, article of manufacture, or composition of matter. In this case, claims 1-4, 6-7, and 9 are directed to a method, which is a process; claims 11, 13-15, 17-18, and 20 are directed to a device, which is a machine or an article of manufacture; and claim 12 is directed to a computer readable medium, which is a machine or an article of manufacture.

Step 2A (part 1 of the Mayo test), using the guidance from pages 50-57 of the Federal Register Vol. 84 No. 4 from Monday, January 7, 2019, requires applying a two-prong inquiry. In Prong One, examiners evaluate whether the claim recites a judicial exception, determining if the claim is directed to a law of nature, a natural phenomenon, or an abstract idea. In this case, claim 1 recites performing signal processing to obtain a normalized feature, determining a speech feature, and fusing the features, which is a mental process or mathematical calculation. In Prong Two, examiners evaluate whether the judicial exception is integrated into a practical application that imposes a meaningful limit on the judicial exception. In this case, obtaining data is mere extrasolution activity, while structural elements such as a processor, memory, and computer readable medium are generic computing components, none of which integrate the abstract idea into a practical application.

Step 2B (part 2 of the Mayo test) requires analyzing the claims to determine if they recite additional elements that amount to significantly more than the judicial exception. In this case, the claims do not include additional elements that are sufficient to amount to significantly more than the abstract idea itself.

Regarding claims 1 and 11-12, performing signal processing to obtain a normalized feature, determining a direction as a target direction, determining a speech feature, fusing the features, inputting data to a model, obtaining speech data, and concatenating data are mental processes or mathematical calculations, which is an abstract idea. For example, a human could determine a general probability of speech being present in a certain direction, could determine a probability that speech exists at all, and, using those probabilities, extract speech data from a target direction. The additional limitation of obtaining data is mere extrasolution activity, while structural elements such as a processor, memory, and computer readable medium are generic computing components, none of which integrate the abstract idea into a practical application or constitute significantly more.

Regarding claims 2 and 13, the limitations are further clarifications of the above abstract ideas, and the pre-trained model does not include sufficient detail to be considered non-abstract.

Regarding claims 3 and 14, performing compression or expansion are mathematical processes, while inputting data into a model and obtaining speech features are mental processes, both of which are abstract ideas. Use of a neural network is simply using a computer to apply the abstract idea, and does not integrate the abstract ideas into a practical application or constitute significantly more.

Regarding claims 4 and 15, the limitations are further clarifications of the above abstract ideas.

Regarding claims 6 and 17, performing processing and post-processing are mathematical calculations, which are an abstract idea without integration into a practical application and without significantly more.

Regarding claims 7 and 18, beamforming and cross-correlation are mathematical calculations, which are an abstract idea without integration into a practical application and without significantly more.

Regarding claims 9 and 20, adding noise to speech is a mathematical calculation, while obtaining data is mere extrasolution activity, and does not integrate the abstract idea into a practical application or constitute significantly more.

The limitations of the claims, taken alone, do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. Applicable case law cited in the Federal Register includes, but is not limited to: Alice Corp., 134 S. Ct. at 2355-56; Digitech Image Tech., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344 (Fed. Cir. 2014); and Benson, 409 U.S. at 63. See "Preliminary Examination Instructions in view of the Supreme Court Decision in Alice Corporation Pty. Ltd. v. CLS Bank International, et al.," dated June 25, 2014, and the Federal Register notice titled "2014 Interim Guidance on Patent Subject Matter Eligibility" (79 FR 74618).

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 11-15 are rejected under 35 U.S.C. 103 as being unpatentable over Pishehvar et al. (US 11,341,988 B1), hereinafter referred to as Pishehvar, in view of Kim (US 9,521,484 B2), and further in view of Zhang et al. (US 2023/0032385 A1), hereinafter referred to as Zhang.

Regarding claim 1, Pishehvar teaches:

A method for extracting a speech, comprising: obtaining microphone array data (col. 10 line 60 - col. 11 line 18, where a microphone array captures speech to generate a multi-channel audio signal); performing signal processing on the microphone array data to obtain a normalized feature, wherein the normalized feature is for characterizing a probability of presence of a speech in a predetermined direction (col. 10 line 60 - col. 11 line 18, where a probability that speech is coming from a particular direction is estimated from the multi-channel audio signal), multiple predetermined directions include a direction where a sound source is located and a direction where a sound source is not located (col. 10 line 60 - col. 11 line 18, where a direction of a target speaker is determined, meaning other directions do not correspond to the target speaker); based on multiple normalized features associated with multiple predetermined directions, determining the direction where a sound source is located as a target direction from the predetermined direction (col. 8 lines 8-29, where the speech presence estimator uses normalized error values for regression features, and col. 11 lines 35-47, where the output from the speech presence estimator is combined with the probability estimates of speech direction); determining, based on the microphone array data, a speech feature of a speech in a target direction (col. 11 lines 19-34, where a probability of presence of speech in a channel in a current frame is determined as a speech feature); and fusing the normalized feature with the speech feature of the speech in the target direction, and extracting speech data in the target direction based on the fused speech feature (col. 11 lines 35-47, where the two probability estimates are combined, and col. 9 line 55 - col. 10 line 13, where the complete utterances are output); wherein the fusing the normalized feature with the speech feature of the speech in the target direction, and extracting speech data in the target direction based on the fused speech feature comprises: inputting the speech feature into the pre-trained model for speech extraction, to obtain the speech data in the target direction (col. 9 line 55 - col. 10 line 13, col. 11 lines 48-67, where the fused speech feature is used to output the complete utterances).

Pishehvar does not teach: wherein microphone array data is associated with multiple predetermined directions; concatenating the normalized feature and the speech feature of the speech in the target direction, and inputting the concatenated speech feature into the pre-trained model for speech extraction, to obtain the speech data in the target direction.

Kim teaches: wherein microphone array data is associated with multiple predetermined directions (Fig. 1a, col. 5 lines 20-36, where beams are set in predetermined directions). It would have been obvious to one of ordinary skill in the art to modify the system of Pishehvar by using the predetermined directions of Kim (Kim Fig. 1a) in the beamforming of Pishehvar (Pishehvar col. 10 line 60 - col. 11 line 18), so that the microphone array may be disposed at a predetermined position and the beams may be directed to the seats where users will be sitting (Kim col. 2 lines 54-64).

Zhang teaches: concatenating the normalized feature and the speech feature of the speech in the target direction (para [0128], [0131], where speech features are concatenated), and inputting the concatenated speech feature into the pre-trained model (para [0131], where the concatenated features are input to a model). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Pishehvar in view of Kim by using the concatenation of Zhang (Zhang para [0128], [0131]) as the fusion of Pishehvar in view of Kim (Pishehvar Fig. 5 element 313), in order to obtain a feature matrix of speech features in order to train the model (Zhang para [0117], [0131]).

Regarding claim 2, Pishehvar in view of Kim and Zhang teaches:

The method according to claim 1, wherein the determining, based on the microphone array data, a speech feature of a speech in a target direction comprises: determining the speech feature of the speech in the target direction based on the microphone array data and a pre-trained model for speech feature extraction (Pishehvar col. 4 lines 43-58, col. 10 line 60 - col. 11 line 18, where a pre-trained DNN is used for extracting features from the multi-channel audio signal).
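As an illustrative aside before the remaining claim mappings: the claim 1 pipeline discussed above (per-direction normalized features, target-direction selection, concatenation-style fusion, model input) can be sketched in a few lines of NumPy. Everything here is a placeholder under assumed shapes and geometry; it is not the applicant's claimed implementation, Pishehvar's system, or Zhang's network:

    import numpy as np

    def normalized_direction_features(frames: np.ndarray, steering: np.ndarray) -> np.ndarray:
        """Per-direction energy of beamformed frames, normalized to sum to 1.

        frames:   (n_frames, n_mics) multi-channel samples (illustrative).
        steering: (n_dirs, n_mics) fixed beamformer weights, one row per
                  predetermined direction (placeholder geometry).
        Returns:  (n_frames, n_dirs) rows that behave like a probability of
                  speech presence in each predetermined direction.
        """
        beams = frames @ steering.T                  # (n_frames, n_dirs)
        energy = beams ** 2
        return energy / (energy.sum(axis=1, keepdims=True) + 1e-12)

    def fuse_by_concatenation(dir_feat: np.ndarray, speech_feat: np.ndarray) -> np.ndarray:
        """Concatenate directional and per-frame speech features (cf. the
        concatenation-style fusion attributed to Zhang above)."""
        return np.concatenate([dir_feat, speech_feat], axis=1)

    rng = np.random.default_rng(0)
    frames = rng.standard_normal((100, 4))           # fake 4-mic capture
    steering = rng.standard_normal((8, 4))           # 8 predetermined directions
    dir_feat = normalized_direction_features(frames, steering)
    target_dir = int(dir_feat.mean(axis=0).argmax()) # direction with a source
    speech_feat = np.abs(frames).mean(axis=1, keepdims=True)  # toy per-frame feature
    fused = fuse_by_concatenation(dir_feat, speech_feat)      # model input
    print(target_dir, fused.shape)                   # e.g. 3 (100, 9)

The point of the concatenation step is simply that the directional evidence and the per-frame speech evidence reach the extraction model as one feature vector, rather than being combined by a fixed rule.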
Regarding claim 3, Pishehvar in view of Kim and Zhang teaches:

The method according to claim 2, wherein the determining the speech feature of the speech in the target direction based on the microphone array data and a pre-trained model for speech feature extraction comprises: inputting the microphone array data into the pre-trained model for speech feature extraction, to obtain the speech feature of the speech in a predetermined direction (Pishehvar col. 11 lines 19-34, where a probability of presence of speech in a channel in a current frame is determined as a speech feature); and performing, through a pre-trained recursive neural network, compression or expansion on the speech feature of the speech in the predetermined direction to obtain the speech feature of the speech in the target direction (Pishehvar col. 12 lines 33-40, where the pre-trained DNN is a recursive neural network, and col. 7 lines 24-59, where the DNN reduces the feature dimension of the context window).

Regarding claim 4, Pishehvar in view of Kim and Zhang teaches:

The method according to claim 2, wherein the model for speech feature extraction comprises a complex convolutional neural network based on spatial variation (Pishehvar col. 8 lines 8-29, where the DNN is a convolutional neural network, and col. 11 lines 19-34, where the DNN operates on a beamformed signal with directional sensitivity, teaching spatial variation).

Regarding claim 11, Pishehvar teaches:

An electronic device, comprising: at least one processor (col. 13 lines 12-32, where a processor is used); and at least one memory communicatively coupled to the at least one processor and storing instructions (col. 13 lines 12-32, where memory is used) that upon execution by the at least one processor cause the device to: obtain microphone array data (col. 10 line 60 - col. 11 line 18, where a microphone array captures speech to generate a multi-channel audio signal); perform signal processing on the microphone array data to obtain a normalized feature, wherein the normalized feature is for characterizing a probability of presence of a speech in a predetermined direction (col. 10 line 60 - col. 11 line 18, where a probability that speech is coming from a particular direction is estimated from the multi-channel audio signal), multiple predetermined directions include a direction where a sound source is located and a direction where a sound source is not located (col. 10 line 60 - col. 11 line 18, where a direction of a target speaker is determined, meaning other directions do not correspond to the target speaker); based on multiple normalized features associated with multiple predetermined directions, determine the direction where a sound source is located as a target direction from the predetermined direction (col. 8 lines 8-29, where the speech presence estimator uses normalized error values for regression features, and col. 11 lines 35-47, where the output from the speech presence estimator is combined with the probability estimates of speech direction); determine, based on the microphone array data, a speech feature of a speech in a target direction (col. 11 lines 19-34, where a probability of presence of speech in a channel in a current frame is determined as a speech feature); and fuse the normalized feature with the speech feature of the speech in the target direction, and extract speech data in the target direction based on the fused speech feature (col. 11 lines 35-47, where the two probability estimates are combined, and col. 9 line 55 - col. 10 line 13, where the complete utterances are output); the at least one memory further storing instructions that upon execution by the at least one processor cause the device to: input the speech feature into a pre-trained model for speech extraction, to obtain the speech data in the target direction (col. 9 line 55 - col. 10 line 13, col. 11 lines 48-67, where the fused speech feature is used to output the complete utterances).

Pishehvar does not teach: wherein microphone array data is associated with multiple predetermined directions; concatenate the normalized feature and the speech feature of the speech in the target direction, and input the concatenated speech feature into the pre-trained model for speech extraction, to obtain the speech data in the target direction.

Kim teaches: wherein microphone array data is associated with multiple predetermined directions (Fig. 1a, col. 5 lines 20-36, where beams are set in predetermined directions). It would have been obvious to one of ordinary skill in the art to modify the system of Pishehvar by using the predetermined directions of Kim (Kim Fig. 1a) in the beamforming of Pishehvar (Pishehvar col. 10 line 60 - col. 11 line 18), so that the microphone array may be disposed at a predetermined position and the beams may be directed to the seats where users will be sitting (Kim col. 2 lines 54-64).

Zhang teaches: concatenate the normalized feature and the speech feature of the speech in the target direction (para [0128], [0131], where speech features are concatenated), and input the concatenated speech feature into the pre-trained model (para [0131], where the concatenated features are input to a model). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Pishehvar in view of Kim by using the concatenation of Zhang (Zhang para [0128], [0131]) as the fusion of Pishehvar in view of Kim (Pishehvar Fig. 5 element 313), in order to obtain a feature matrix of speech features in order to train the model (Zhang para [0117], [0131]).

Regarding claim 12, Pishehvar teaches:

A computer-readable non-transitory medium bearing computer-readable instructions (col. 13 lines 12-32, where a computer-readable medium is used) that upon execution on a computing device cause the computing device at least to: obtain microphone array data (col. 10 line 60 - col. 11 line 18, where a microphone array captures speech to generate a multi-channel audio signal); perform signal processing on the microphone array data to obtain a normalized feature, wherein the normalized feature is for characterizing a probability of presence of a speech in a predetermined direction (col. 10 line 60 - col. 11 line 18, where a probability that speech is coming from a particular direction is estimated from the multi-channel audio signal), multiple predetermined directions include a direction where a sound source is located and a direction where a sound source is not located (col. 10 line 60 - col. 11 line 18, where a direction of a target speaker is determined, meaning other directions do not correspond to the target speaker); based on multiple normalized features associated with multiple predetermined directions, determine the direction where a sound source is located as a target direction from the predetermined direction (col. 8 lines 8-29, where the speech presence estimator uses normalized error values for regression features, and col. 11 lines 35-47, where the output from the speech presence estimator is combined with the probability estimates of speech direction); determine, based on the microphone array data, a speech feature of a speech in a target direction (col. 11 lines 19-34, where a probability of presence of speech in a channel in a current frame is determined as a speech feature); and fuse the normalized feature with the speech feature of the speech in the target direction, and extract speech data in the target direction based on the fused speech feature (col. 11 lines 35-47, where the two probability estimates are combined, and col. 9 line 55 - col. 10 line 13, where the complete utterances are output); bearing computer-readable instructions that upon execution on a computing device cause the computing device at least to: input the speech feature into a pre-trained model for speech extraction, to obtain the speech data in the target direction (col. 9 line 55 - col. 10 line 13, col. 11 lines 48-67, where the fused speech feature is used to output the complete utterances).

Pishehvar does not teach: wherein microphone array data is associated with multiple predetermined directions; concatenate the normalized feature and the speech feature of the speech in the target direction, and input the concatenated speech feature into a pre-trained model for speech extraction, to obtain the speech data in the target direction.

Kim teaches: wherein microphone array data is associated with multiple predetermined directions (Fig. 1a, col. 5 lines 20-36, where beams are set in predetermined directions). It would have been obvious to one of ordinary skill in the art to modify the system of Pishehvar by using the predetermined directions of Kim (Kim Fig. 1a) in the beamforming of Pishehvar (Pishehvar col. 10 line 60 - col. 11 line 18), so that the microphone array may be disposed at a predetermined position and the beams may be directed to the seats where users will be sitting (Kim col. 2 lines 54-64).

Zhang teaches: concatenate the normalized feature and the speech feature of the speech in the target direction (para [0128], [0131], where speech features are concatenated), and input the concatenated speech feature into a pre-trained model (para [0131], where the concatenated features are input to a model). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Pishehvar in view of Kim by using the concatenation of Zhang (Zhang para [0128], [0131]) as the fusion of Pishehvar in view of Kim (Pishehvar Fig. 5 element 313), in order to obtain a feature matrix of speech features in order to train the model (Zhang para [0117], [0131]).

Regarding claim 13, Pishehvar in view of Kim and Zhang teaches:

The device of claim 11, the at least one memory further storing instructions that upon execution by the at least one processor cause the device to: determine the speech feature of the speech in the target direction based on the microphone array data and a pre-trained model for speech feature extraction (Pishehvar col. 4 lines 43-58, col. 10 line 60 - col. 11 line 18, where a pre-trained DNN is used for extracting features from the multi-channel audio signal).
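Another illustrative aside: claims 3 and 14 recite running a pre-trained recursive neural network over per-direction speech features to compress or expand them. A toy Elman-style recurrence shows the shape of that operation; the random weights below merely stand in for a trained model and bear no relation to the cited DNN:

    import numpy as np

    def recurrent_compress(features: np.ndarray, hidden_dim: int, seed: int = 0) -> np.ndarray:
        """Run a toy Elman-style RNN over (n_frames, n_dirs) features and
        return (n_frames, hidden_dim) outputs. hidden_dim < n_dirs
        compresses the directional features; hidden_dim > n_dirs expands."""
        rng = np.random.default_rng(seed)
        n_frames, in_dim = features.shape
        W_in = rng.standard_normal((in_dim, hidden_dim)) * 0.1   # stand-in for trained weights
        W_rec = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
        h = np.zeros(hidden_dim)
        out = np.empty((n_frames, hidden_dim))
        for t in range(n_frames):
            h = np.tanh(features[t] @ W_in + h @ W_rec)          # recurrence over frames
            out[t] = h
        return out

    dir_feat = np.random.default_rng(1).random((100, 8))  # 8 predetermined directions
    compressed = recurrent_compress(dir_feat, hidden_dim=3)
    print(compressed.shape)  # (100, 3): per-direction features squeezed to 3 dims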
Regarding claim 14, Pishehvar in view of Kim and Zhang teaches:

The device of claim 13, the at least one memory further storing instructions that upon execution by the at least one processor cause the device to: input the microphone array data into the pre-trained model for speech feature extraction, to obtain the speech feature of the speech in a predetermined direction (Pishehvar col. 11 lines 19-34, where a probability of presence of speech in a channel in a current frame is determined as a speech feature); and perform, through a pre-trained recursive neural network, compression or expansion on the speech feature of the speech in the predetermined direction to obtain the speech feature of the speech in the target direction (col. 12 lines 33-40, where the pre-trained DNN is a recursive neural network, and col. 7 lines 24-59, where the DNN reduces the feature dimension of the context window).

Regarding claim 15, Pishehvar in view of Kim and Zhang teaches:

The device of claim 13, wherein the model for speech feature extraction comprises a complex convolutional neural network based on spatial variation (Pishehvar col. 8 lines 8-29, where the DNN is a convolutional neural network, and col. 11 lines 19-34, where the DNN operates on a beamformed signal with directional sensitivity, teaching spatial variation).

Claims 6-7 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Pishehvar, in view of Kim and Zhang, and further in view of Wu et al. (Wu, D., Zhang, K., & Wei, Y. (2019, August). A Speech Enhancement System Based on Real-time Sound Source Localization and Super-directional Fixed Beamforming. In 2019 IEEE International Conference on Real-time Computing and Robotics (RCAR) (pp. 334-339). IEEE.), hereinafter referred to as Wu.

Regarding claim 6, Pishehvar in view of Kim and Zhang teaches:

The method according to claim 1, wherein the performing signal processing on the microphone array data to obtain a normalized feature comprises: performing processing on the microphone array data through a target technology (Pishehvar col. 11 lines 19-34, where beamforming is performed to generate a beamformed signal), and performing post-processing on data obtained from the processing, to obtain the normalized feature (Pishehvar col. 10 line 60 - col. 11 line 18, where the directional probability estimator operates on a beamformed audio signal).

Pishehvar in view of Kim and Zhang does not teach: wherein the target technology comprises at least one of the following: a fixed beamforming technology and a speech blind separation technology.

Wu teaches: wherein the target technology comprises at least one of the following: a fixed beamforming technology and a speech blind separation technology (page 335 section III, where fixed beamforming is used). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Pishehvar in view of Kim and Zhang by using the fixed beamforming of Wu (Wu page 335 section III) as the beamforming of Pishehvar in view of Kim and Zhang (Pishehvar col. 11 lines 19-34), to achieve significantly better performance than a delay-summation beamformer (Wu page 337 col. 1 last paragraph).

Regarding claim 7, Pishehvar in view of Kim, Zhang, and Wu teaches:

The method according to claim 6, wherein the performing processing on the microphone array data through a target technology, and performing post-processing on data obtained from the processing, comprises: processing the microphone array data through the fixed beamforming technology and a cross-correlation based speech enhancement technology (Wu page 335 section III, where a microphone array system is filtered to enhance the sound source; page 335 section A, equations 7-8, where cross-correlation is used; and page 336 section B, where fixed beamforming is used).

Regarding claim 17, Pishehvar in view of Kim and Zhang teaches:

The device of claim 11, the at least one memory further storing instructions that upon execution by the at least one processor cause the device to: perform processing on the microphone array data through a target technology (Pishehvar col. 11 lines 19-34, where beamforming is performed to generate a beamformed signal), and perform post-processing on data obtained from the processing, to obtain the normalized feature (Pishehvar col. 10 line 60 - col. 11 line 18, where the directional probability estimator operates on a beamformed audio signal).

Pishehvar in view of Kim and Zhang does not teach: wherein the target technology comprises at least one of the following: a fixed beamforming technology and a speech blind separation technology.

Wu teaches: the target technology comprises at least one of the following: a fixed beamforming technology and a speech blind separation technology (page 335 section III, where fixed beamforming is used). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Pishehvar in view of Kim and Zhang by using the fixed beamforming of Wu (Wu page 335 section III) as the beamforming of Pishehvar in view of Kim and Zhang (Pishehvar col. 11 lines 19-34), to achieve significantly better performance than a delay-summation beamformer (Wu page 337 col. 1 last paragraph).

Regarding claim 18, Pishehvar in view of Kim, Zhang, and Wu teaches:

The device of claim 17, the at least one memory further storing instructions that upon execution by the at least one processor cause the device to: process the microphone array data through the fixed beamforming technology and a cross-correlation based speech enhancement technology (Wu page 335 section III, where a microphone array system is filtered to enhance the sound source; page 335 section A, equations 7-8, where cross-correlation is used; and page 336 section B, where fixed beamforming is used).

Claims 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Pishehvar, in view of Kim and Zhang, and further in view of Xue et al. (US 2019/0237065 A1), hereinafter referred to as Xue.

Regarding claim 9, Pishehvar in view of Kim and Zhang teaches:

The method according to claim 1, wherein the microphone array data is generated by:

Pishehvar in view of Kim and Zhang does not teach: obtaining near-field speech data, and converting the near-field speech data into far-field speech data; and adding a noise to the far-field speech data to obtain the microphone array data.

Xue teaches: obtaining near-field speech data, and converting the near-field speech data into far-field speech data (Fig. 1, para [0014-15], [0020-21], where near-field speech data is converted to far-field speech data); and adding a noise to the far-field speech data to obtain the microphone array data (Fig. 1, para [0014-15], [0020-21], where noise is added to generate the far-field speech data). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Pishehvar in view of Kim and Zhang by using the conversion of Xue (Xue para [0020-21]) for the microphone array data of Pishehvar in view of Kim and Zhang (Pishehvar col. 10 line 60 - col. 11 line 18), in order to simulate far-field audio data and obtain more robust analog far-field audio data by integrating environmental factors (Xue para [0018]).

Regarding claim 20, Pishehvar in view of Kim and Zhang teaches:

The device of claim 11, wherein the microphone array data is generated by:

Pishehvar in view of Kim and Zhang does not teach: obtaining near-field speech data, and converting the near-field speech data into far-field speech data; and adding a noise to the far-field speech data to obtain the microphone array data.

Xue teaches: obtaining near-field speech data, and converting the near-field speech data into far-field speech data (Fig. 1, para [0014-15], [0020-21], where near-field speech data is converted to far-field speech data); and adding a noise to the far-field speech data to obtain the microphone array data (Fig. 1, para [0014-15], [0020-21], where noise is added to generate the far-field speech data). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Pishehvar in view of Kim and Zhang by using the conversion of Xue (Xue para [0020-21]) for the microphone array data of Pishehvar in view of Kim and Zhang (Pishehvar col. 10 line 60 - col. 11 line 18), in order to simulate far-field audio data and obtain more robust analog far-field audio data by integrating environmental factors (Xue para [0018]).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2020/0184954 A1, para [0052], teaches beamforming where signals from a direction different from a predetermined wanted signal direction are suppressed.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL, whose telephone number is (571) 270-0685. The examiner can normally be reached 8:00am-5:30pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRYAN S BLANKENAGEL/
Primary Examiner, Art Unit 2658
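For readers skimming the §103 grounds above, the Wu combination for claims 6-7 and 17-18 rests on two textbook operations: cross-correlation to estimate the inter-microphone delay, and fixed delay-and-sum beamforming to align and average the channels. The sketch below shows the generic versions under assumed parameters; it is not Wu's super-directional design or the claimed method:

    import numpy as np

    def estimate_delay(x: np.ndarray, y: np.ndarray) -> int:
        """Integer-sample delay of y relative to x via full cross-correlation."""
        corr = np.correlate(y, x, mode="full")
        return int(corr.argmax()) - (len(x) - 1)

    def delay_and_sum(channels: list[np.ndarray], delays: list[int]) -> np.ndarray:
        """Fixed beamformer: undo each channel's delay, then average."""
        aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
        return np.mean(aligned, axis=0)

    rng = np.random.default_rng(2)
    clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1600))  # toy "speech"
    true_delay = 7                                           # samples between mics
    mic0 = clean + 0.3 * rng.standard_normal(clean.size)
    mic1 = np.roll(clean, true_delay) + 0.3 * rng.standard_normal(clean.size)
    d = estimate_delay(mic0, mic1)                           # typically recovers ~7
    enhanced = delay_and_sum([mic0, mic1], [0, d])           # aligned average
    print(d, float(np.mean((enhanced - clean) ** 2)))        # delay, residual MSE

A plain delay-and-sum beamformer like this is exactly the baseline that, per the rationale quoted above, Wu's fixed beamformer is said to significantly outperform.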

Prosecution Timeline

Jun 22, 2023: Application Filed
Apr 29, 2025: Non-Final Rejection — §101, §103
Aug 04, 2025: Response Filed
Aug 21, 2025: Final Rejection — §101, §103
Oct 24, 2025: Response after Non-Final Action
Nov 25, 2025: Request for Continued Examination
Dec 03, 2025: Response after Non-Final Action
Mar 02, 2026: Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications involving similar technology that this examiner has granted

Patent 12602551: GENERATION OF SYNTHETIC DOCUMENTS FOR DATA AUGMENTATION (granted Apr 14, 2026; 2y 5m to grant)
Patent 12579993: Multi-Talker Audio Stream Separation, Transcription and Diaraization (granted Mar 17, 2026; 2y 5m to grant)
Patent 12572759: MULTILINGUAL CONVERSATION TOOL (granted Mar 10, 2026; 2y 5m to grant)
Patent 12555591: MACHINE LEARNING ASSISTED SPATIAL NOISE ESTIMATION AND SUPPRESSION (granted Feb 17, 2026; 2y 5m to grant)
Patent 12547836: KNOWLEDGE FACT RETRIEVAL THROUGH NATURAL LANGUAGE PROCESSING (granted Feb 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 67%
With Interview: 99% (+35.2%)
Median Time to Grant: 2y 7m
PTA Risk: High
Based on 377 resolved cases by this examiner. Grant probability derived from career allow rate.
