DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Non-final Office Action of 9/8/2025, Applicant filed an amendment on 11/24/2025. In this reply, Applicant has amended independent claims 1, 9, and 17 to further recite that the target signals defining the pre-recorded signals later used to generate a blocking matrix are recorded "during an offline tuning process." Applicant argues that the prior art of record fails to teach such signals that are used to generate a blocking matrix (Remarks, Pages 8-10). These arguments have been fully considered but are moot in view of the new grounds of rejection, which are necessitated by the amended claims and rely further on Wang, et al. ("Semi-Supervised Learning with Deep Neural Networks for Relative Transfer Inverse Regression," 2018).
Response to Arguments
Despite the change in the grounds of rejection necessitated by the instant amendment, there remain some arguments that may still pertain to the prior art relied upon in the non-final action, namely Tashev, et al. In particular, Applicant first argues that Tashev fails to disclose determining activity of a target signal based on detected signals from unblocked zones because the interference signals are removed by the adaptive interference canceller and are not utilized as part of the determination of target activity (Remarks, Pages 9-10).
In response, it should first be noted that the claims do not require that the determination of the activity of a target signal utilize the signals from the unblocked zones in the manner argued by Applicant. Instead, this determination is only generically "based on the detected signals from the unblocked zones." Keeping in mind the actual claim language under the broadest reasonable interpretation (BRI), it is noted that Tashev discloses that the "estimated desired signal," or activity determined to have speech presence, is determined based upon "signal components...that are correlated to the interfering signal regions" (Paragraphs 0051-0054), since the desired estimated signal is what remains after the interfering signals have been estimated and then removed via the blocking matrix (i.e., the desired estimated signal is "based on" these interfering signals unblocked in the blocking matrix). Applicant's position may be based only on a reading of Paragraph 0051 and not the entirety of Paragraphs 0051-0054, where the signal level/activity from a particular direction is specifically determined using the blocking matrix that contains the unblocked zone noise/interference. Accordingly, given the current high-level description of this process in the claims, this argument is not found to be persuasive.
Applicant lastly presents a traversal based upon a disagreement with the position of record regarding the adaptive filter coefficients, alleging that such coefficients are "mathematically and technically distinct from an RTF vector" (Remarks, Page 10). In response, it is noted that a collection of data all pertaining to filter operation, in this case that of a generalized sidelobe canceller (GSC), constitutes an RTF for a specific microphone array (see Fig. 5, Element 502, where the input is from a microphone array having different relative positions; see also Paragraph 0003: a GSC is an adaptive beamformer that keeps track of the characteristics of interfering signals and then attenuates or cancels these interfering signals using an adaptive interference canceller (AIC); see also Paragraph 0028). Note also that the prior art newly added to the 35 U.S.C. 103 rejection describes this standard concept in detail for GSCs: the "generalized sidelobe canceler (GSC) [1], that requires RTF in the implementation of a blocking matrix to provide the reference noise signal" (Section 6.3, Page 194). For this reason, these allegations have not been found to be persuasive.
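As a minimal sketch of the concept quoted above, and not a reproduction of the implementation of any cited reference, the following Python example shows how an RTF vector can define a blocking matrix for a single frequency bin. It assumes a reference microphone at index 0 and an RTF vector normalized so that its first entry is 1. The names `blocking_outputs`, `rtf`, and `frame`, as well as all sample values, are hypothetical and chosen only for illustration.

```python
def blocking_outputs(rtf, frame):
    """Return the M-1 noise references u_m = x_m - h_m * x_0 for one frequency bin.

    rtf   -- assumed RTF vector [1, h_1, ..., h_(M-1)] relative to microphone 0
    frame -- complex microphone observations [x_0, ..., x_(M-1)] for the same bin
    """
    ref = frame[0]
    return [x_m - h_m * ref for h_m, x_m in zip(rtf[1:], frame[1:])]


# Hypothetical three-microphone example (values are illustrative only).
rtf = [1 + 0j, 0.8 - 0.3j, 0.5 + 0.6j]
target = 2.0 - 1.0j                           # target signal at the reference microphone
target_only = [h * target for h in rtf]       # target as observed at each microphone
noise = [0.1 + 0.2j, -0.3 + 0.1j, 0.05 - 0.4j]
mixed = [t + n for t, n in zip(target_only, noise)]

leak = blocking_outputs(rtf, target_only)     # target component is blocked
refs = blocking_outputs(rtf, mixed)           # only noise-dependent terms remain
```

When the frame contains only the target, every output is zero, which is why such outputs can serve as the reference noise signal for an adaptive interference canceller.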
The prior art rejections of the remaining dependent claims have been traversed for reasons similar to the independent claims (Remarks, Page 10). In regards to such arguments, see the preceding response directed towards the independent claims.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 7, 9-10, 15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Tashev, et al. (U.S. PG Publication: 2008/0232607 A1) in view of Tzirkel-Hancock, et al. (U.S. PG Publication: 2014/032129 A1) and further in view of Wang, et al. ("Semi-Supervised Learning with Deep Neural Networks for Relative Transfer Inverse Regression," 2018).
With respect to Claim 1, Tashev discloses:
A computer-implemented method when executed by data processing hardware causes the data processing hardware to perform operations comprising:
generating a blocking matrix based on pre-recorded signals from a target zone (captured signals from a "desired" region are used to estimate a "blocking matrix" that will yield a matrix that contains only interfering signals, Paragraph 0052; see also Paragraphs 0025, 0028, 0030, and 0036 for more discussion on blocking matrix processing; Fig. 1, Element 118 showing a microphone array);
receiving, at a voice activity detector, audio frames from a microphone array (microphone array input into a "voice activity detector," Paragraphs 0050-0051);
applying the blocking matrix to one or more zones (the blocking matrix is applied to "undesirable directions" (i.e., regions) that are not the desired region as well as the desired region for subtraction/blocking/masking, Paragraphs 0038 and 0052);
detecting signals from unblocked zones (the blocking matrix detects interference signals from the "undesirable directions" (i.e., regions), Paragraphs 0038 and 0052);
determining an activity of a target signal based on the detected signals from the unblocked zones (desired signal activity is estimated relative to the interfering signals from the undesired directions, Paragraphs 0051-0054);
estimating, by a beamformer, a relative transfer function (RTF) vector based on the received audio frames and the determined activity of the target signal (a plurality of parameters (i.e., constituting a vector) in an adaptive beamformer using a generalized sidelobe canceller indicative of a noise signal relative to a beamformer output from a microphone array to output a signal with an enhanced signal-to-noise ratio, Paragraphs 0007, 0028, 0030, 0042, 0049, and 0054).
While Tashev teaches the plurality of steps comprised in the adaptive beamforming process utilizing a blocking matrix and voice activity detection and could readily be applied to the zones of any environment (e.g., office, kitchen, car, etc.), Tashev does not specifically recite that the zones of interest (i.e., blocked and unblocked zones) pertain to a vehicle. Tzirkel-Hancock, however, discloses utilizing adaptive beamforming including a generalized sidelobe canceller applied to target and "other" locations within a vehicle that also features speech detection to determine a relative transfer function (Paragraphs 0014-0015 and 0042-0044; Fig. 1, Element 106 showing positions in a vehicle).
Tashev and Tzirkel-Hancock are analogous art because they are from a similar field of endeavor in adaptive beamforming utilizing blocking matrices. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the desired and undesired locations taught by Tashev as locations within a vehicle to generate a relative transfer function for that vehicle as taught by Tzirkel-Hancock to provide a predictable result of improving speech applications in a vehicle such as speech recognition that is adversely affected by noise and other disturbances (Tzirkel-Hancock, Paragraph 0003).
Although Tashev teaches pre-recorded signals for blocking matrix estimation in that captured signals are later used for the estimation (i.e., the signals are first captured or "pre-recorded"), Tashev in view of Tzirkel-Hancock does not teach pre-recorded signals for blocking matrix generation in that the pre-recording takes place "during an offline tuning process." Wang, however, recites a training process for a relative transfer function that defines a blocking matrix where target samples are gathered in an offline/training procedure (Abstract; Sections 4-4.1, Page 192; Section 6.1, Pages 193-194; Section 6.3, Page 194 ("the generalized sidelobe canceler (GSC) [1], that requires RTF in the implementation of a blocking matrix to provide the reference noise signal")).
Tashev, Tzirkel-Hancock, and Wang are analogous art because they are from a similar field of endeavor in adaptive beamforming utilizing blocking matrices. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the RTF/blocking matrix training taught by Wang for different positions/orientations in the blocking matrix determination taught by Tashev in view of Tzirkel-Hancock to provide a predictable result of exploiting microphone information about a scene (Wang, Section 1- Introduction, Page 191) to improve noise identification and filtering over time. The addition also makes sense in the context of Tzirkel-Hancock where microphones are fixed in a vehicle environment.
With respect to Claim 2, Tashev further discloses:
The method of Claim 1, wherein the operations further include defining a blocking area of the blocking matrix (blocking area determination for the blocking matrix as the "desired" region, Paragraphs 0036 and 0052).
With respect to Claim 7, Tashev further discloses:
The method of Claim 1, wherein generating the blocking matrix includes recording clean signals from each zone (the blocking matrix utilizes a target zone/direction, Paragraph 0052, wherein the target is obtained by capturing clean/target signals from zones where they are present consisting of potential positions for M microphones within the array, Paragraphs 0007 and 0058).
Claim 9 represents a system embodiment comprising data processing hardware that executes program instructions stored on a memory, and thus, is rejected under similar rationale. Tashev also teaches method implementation as a system comprising a computer processor and memory storing processor-executable instructions (Paragraph 0019).
Claim 10 contains subject matter similar to claim 2, and thus, is rejected under similar rationale.
Claim 15 contains subject matter similar to claim 7, and thus, is rejected under similar rationale.
With respect to Claim 17, Tashev discloses:
A speech enhancement system for a vehicle, the speech enhancement system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations (a computer processor and memory storing processor-executable instructions, Paragraph 0019) comprising:
generating a blocking matrix based on a steering vector, the blocking matrix including a mask (beamforming steering vector used to identify a target signal from an M microphone region (see Paragraphs 0004, 0007-0008, 0058, 0060), wherein the target direction is then masked/blocked in a blocking matrix, Paragraphs 0025, 0036, and 0051-0052);
receiving, at a voice activity detector, audio frames from a microphone array (microphone array input into a "voice activity detector," Paragraphs 0050-0051);
applying the blocking matrix to one or more zones (the blocking matrix is applied to "undesirable directions" (i.e., regions) that are not the desired region as well as the desired region for subtraction/blocking/masking, Paragraphs 0038 and 0052);
detecting signals from unblocked zones (the blocking matrix detects interference signals from the "undesirable directions" (i.e., regions), Paragraphs 0038 and 0052); and
determining an activity of a target signal based on the detected signals (desired signal activity is estimated relative to the interfering signals from the undesired directions, Paragraphs 0051-0054).
While Tashev teaches the plurality of steps comprised in the adaptive beamforming process utilizing a blocking matrix and voice activity detection and could readily be applied to the zones of any environment (e.g., office, kitchen, car, etc.), Tashev does not specifically recite that the zones of interest (i.e., blocked and unblocked zones) pertain to a vehicle. Tzirkel-Hancock, however, discloses utilizing adaptive beamforming including a generalized sidelobe canceller applied to target and "other" locations within a vehicle that also features speech detection to determine a relative transfer function (Paragraphs 0014-0015 and 0042-0044; Fig. 1, Element 106 showing positions in a vehicle).
Tashev and Tzirkel-Hancock are analogous art because they are from a similar field of endeavor in adaptive beamforming utilizing blocking matrices. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the desired and undesired locations taught by Tashev as locations within a vehicle to generate a relative transfer function for that vehicle as taught by Tzirkel-Hancock to provide a predictable result of improving speech applications in a vehicle such as speech recognition that is adversely affected by noise and other disturbances (Tzirkel-Hancock, Paragraph 0003).
Although Tashev teaches pre-recorded signals for blocking matrix estimation in that captured signals are later used for the estimation (i.e., the signals are first captured or "pre-recorded"), Tashev in view of Tzirkel-Hancock does not teach pre-recorded signals for blocking matrix generation in that the pre-recording takes place "during an offline tuning process." Wang, however, recites a training process for a relative transfer function that defines a blocking matrix where target samples are gathered in an offline/training procedure (Abstract; Sections 4-4.1, Page 192; Section 6.1, Pages 193-194; Section 6.3, Page 194 ("the generalized sidelobe canceler (GSC) [1], that requires RTF in the implementation of a blocking matrix to provide the reference noise signal")).
Tashev, Tzirkel-Hancock, and Wang are analogous art because they are from a similar field of endeavor in adaptive beamforming utilizing blocking matrices. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the RTF/blocking matrix training taught by Wang for different positions/orientations in the blocking matrix determination taught by Tashev in view of Tzirkel-Hancock to provide a predictable result of exploiting microphone information about a scene (Wang, Section 1- Introduction, Page 191) to improve noise identification and filtering over time. The addition also makes sense in the context of Tzirkel-Hancock where microphones are fixed in a vehicle environment.
Claims 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Tashev, et al. in view of Tzirkel-Hancock, et al. in view of Wang, et al. and further in view of Laroche, et al. (U.S. Patent: 8,682,006).
With respect to Claim 3, Tashev in view of Tzirkel-Hancock in view of Wang teaches the adaptive beamforming process utilizing a blocking matrix and voice activity detection as applied to Claim 1. Tashev in view of Tzirkel-Hancock in view of Wang does not specifically mention tracking a noise floor with an energy detector. Laroche, however, discloses that a mask in a blocking matrix is based upon a "detected noise floor" that involves a detection of "noise estimate energy" (Col. 6, Lines 10-22; Col. 8, Lines 58-67; and Col. 10, Lines 13-21).
Tashev, Tzirkel-Hancock, Wang, and Laroche are analogous art because they are from a similar field of endeavor in beamforming utilizing blocking matrices. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to use continual tracking of noise floor estimates in Laroche in the speech presence detection taught by Tashev in view of Tzirkel-Hancock in view of Wang to provide a predictable result of continually tracking noise levels in an environment with continually changing interference conditions (i.e., in a vehicle).
Claim 11 contains subject matter similar to claim 3, and thus, is rejected under similar rationale.
Claims 4-6 and 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Tashev, et al. in view of Tzirkel-Hancock, et al. in view of Wang, et al. and further in view of Amehraye, et al. ("Voice Activity Detection based on a Statistical Semiparametric Test," 2013).
With respect to Claim 4, Tashev in view of Tzirkel-Hancock in view of Wang teaches the adaptive beamforming process utilizing a blocking matrix and voice activity detection as applied to Claim 1. Tashev in view of Tzirkel-Hancock in view of Wang does not teach that an energy threshold for voice activity detection is determined by Monte Carlo Simulation. Amehraye, however, discloses that an energy detector for voice activity detection uses a computation of a threshold by Monte Carlo Simulation (Sections 3.1-3.2, Pages 4-5).
Tashev, Tzirkel-Hancock, Wang, and Amehraye are analogous art because they are from a similar field of endeavor in speech processing in noise. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to employ Monte Carlo simulations in threshold setting for voice activity as taught by Amehraye in the speech presence detection taught by Tashev in view of Tzirkel-Hancock in view of Wang to provide a predictable result of obtaining statistical performance that balances false alarms and correct detections of speech/non-speech (Amehraye, Section 3.2, Page 4).
With respect to Claim 5, Amehraye further discloses:
The method of Claim 4, wherein the operations further include identifying an optimal energy threshold based on the energy threshold generated by the Monte Carlo simulation (optimized energy detector threshold is computed using the Monte Carlo simulation taking into account false alarms and correct detections, Sections 3.1-3.2, Pages 4-5).
With respect to Claim 6, Amehraye further discloses:
The method of Claim 5, wherein the operations further include tailoring the identified optimal energy threshold for an audio task (the identified optimal threshold is tailored for energy detection in a voice activity detector to balance false alarms and correct detections, Sections 3.1-3.2, Pages 4-5).
Claims 12-14 contain subject matter respectively similar to claims 4-6, and thus, are rejected under similar rationale.
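The general idea of setting an energy-detector threshold by Monte Carlo simulation can be sketched as follows. This is a generic illustration under stated assumptions and is not the specific semiparametric procedure of Amehraye: it assumes zero-mean Gaussian noise with a known standard deviation, and the function names, frame length, trial count, and false-alarm rate are all hypothetical.

```python
import random


def frame_energy(samples):
    """Average energy of one audio frame."""
    return sum(x * x for x in samples) / len(samples)


def monte_carlo_threshold(frame_len, noise_std, false_alarm_rate, trials=2000, seed=0):
    """Estimate the energy threshold exceeded by noise-only frames
    with probability of roughly false_alarm_rate.

    Assumption: noise is white Gaussian; real noise may differ.
    """
    rng = random.Random(seed)
    energies = sorted(
        frame_energy([rng.gauss(0.0, noise_std) for _ in range(frame_len)])
        for _ in range(trials)
    )
    # Empirical (1 - false_alarm_rate) quantile of the simulated noise energies.
    index = min(trials - 1, int((1.0 - false_alarm_rate) * trials))
    return energies[index]


def is_active(frame, threshold):
    """Declare activity when the frame energy exceeds the threshold."""
    return frame_energy(frame) > threshold


threshold = monte_carlo_threshold(frame_len=160, noise_std=1.0, false_alarm_rate=0.05)
```

Lowering `false_alarm_rate` raises the threshold, which reduces false alarms at the cost of missing low-energy speech. This is the trade-off between false alarms and correct detections referred to above.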
Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Tashev, et al. in view of Tzirkel-Hancock, et al. in view of Wang, et al. and further in view of Michel, et al. (U.S. PG Publication: 2018/0130482 A1).
With respect to Claim 8, Tashev in view of Tzirkel-Hancock in view of Wang teaches the adaptive beamforming process utilizing a blocking matrix and voice activity detection as applied to Claim 1. Although Tashev teaches various time to frequency domain transforms such as FFT (Paragraph 0034), Tashev in view of Tzirkel-Hancock in view of Wang does not specifically teach a short-time Fourier transform (STFT). Michel, however, discloses that a frequency representation of an input signal can be generated using an STFT (Paragraph 0063). Note that the teachings of Michel also include the use of a blocking matrix (Paragraphs 0014 and 0064).
Tashev, Tzirkel-Hancock, Wang, and Michel are analogous art because they are from a similar field of endeavor in beamforming utilizing blocking matrices. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to use the STFT taught by Michel as the transform in Tashev in view of Tzirkel-Hancock in view of Wang to provide a predictable result of yielding frequency representations for signals that vary with time (i.e., speech signals).
Claim 16 contains subject matter similar to claim 8, and thus, is rejected under similar rationale.
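For context, a short-time Fourier transform applies a windowed discrete Fourier transform to successive, overlapping frames, so that the frequency content of a time-varying signal can be tracked frame by frame. The following minimal sketch assumes a periodic Hann window and a direct (non-FFT) DFT for clarity. The function name `stft` and the parameter values are hypothetical and are not taken from Michel or Tashev.

```python
import cmath
import math


def stft(signal, frame_len, hop):
    """Return a list of one-sided spectra, one per windowed frame."""
    # Periodic Hann window (an assumption; other windows are common).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed segment, bins 0 .. frame_len/2.
        spectrum = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# A test tone with exactly 4 cycles per 32-sample frame falls in bin 4.
tone = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
spec = stft(tone, frame_len=32, hop=16)
peak_bins = [max(range(len(s)), key=lambda k: abs(s[k])) for s in spec]
```

Each entry of `spec` holds the frequency bins for one frame. Per-bin processing such as a blocking matrix would operate on these bins.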
Claims 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tashev, et al. in view of Tzirkel-Hancock, et al. in view of Wang, et al. and further in view of Wang, et al. ("Noise Power Spectral Density Estimation Using MaxNSR Blocking Matrix," 2015; hereinafter Wang2).
With respect to Claim 18, Tashev in view of Tzirkel-Hancock in view of Wang teaches the adaptive beamforming process utilizing a blocking matrix and voice activity detection as applied to Claim 1. Tashev in view of Tzirkel-Hancock in view of Wang does not teach calculating a ratio of energy changes between a reference microphone signal and a maximum value of outputs of the blocking matrix. Wang2, however, discloses calculating a maximum noise output from a MaxNSR (maximum noise to signal ratio) blocking matrix used in a ratio (representing a ratio of differences/changes) with a reference microphone to identify noise (Section IV.D., Page 1497; Section V.A, Pages 1498-1499).
Tashev, Tzirkel-Hancock, Wang, and Wang2 are analogous art because they are from a similar field of endeavor in beamforming utilizing blocking matrices. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to use the MaxNSR blocking matrix taught by Wang2 as the blocking matrix utilized in Tashev in view of Tzirkel-Hancock in view of Wang to provide a predictable result of minimizing the influence of leakage of a target signal in a noise reference (Wang2, Abstract).
With respect to Claim 19, Wang2 further discloses:
The speech enhancement system of Claim 18, wherein the operations further include generating the mask of the blocking matrix based on the ratio of energy changes (the blocking matrix is obtained based upon the estimated target noise relying on the ratio, wherein the blocking matrix blocks/suppresses "the target speech energy," Section V.A., Pages 1498-1499).
With respect to Claim 20, Tashev further discloses:
The speech enhancement system of Claim 19, wherein the operations further include identifying active bins of the mask and updating a relative transfer function (RTF) based on the identified active bins (active sound/speech source presence determination for specific frequency bins that is then used in generating a blocking matrix that leads to an adaptive beamformer using a generalized sidelobe canceller indicative of a noise signal relative to a beamformer output from a microphone array to output a signal with an enhanced signal-to-noise ratio, Paragraphs 0007, 0028, 0030, 0042-0043, 0049, 0051-0052, and 0054).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Wang, et al. ("A speech enhancement system for automotive speech recognition with a hybrid voice activity detection method," 2018) teaches an adaptive noise filter for a vehicle environment using voice activity detection based upon unblocked signals (see that the blocking matrix input is not input into the VAD) and a blocking matrix based upon RTF estimation (see Fig. 1). While this reference teaches a significant portion of the claimed invention, only the VAD is a trained model (see Abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S WOZNIAK whose telephone number is (571)272-7632. The examiner can normally be reached 7-3, off alternate Fridays.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant may use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JAMES S. WOZNIAK
Primary Examiner
Art Unit 2655
/JAMES S WOZNIAK/Primary Examiner, Art Unit 2655