DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, or 365(c) is acknowledged.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/21/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-2, 16 and 23 are rejected under 35 U.S.C. § 102(a)(1) as being anticipated by Miller et al. (US PG Pub. 2022/0130375, referred to as Miller).
The features defined by independent claims 1-2, 16 and 23 include: (1) enhancing an audio signal from a target direction, and (2) diminishing audio signals from other directions. Although the specification discloses a neural network implemented system, the recited limitations relate to basic functions of a beamforming technique. After performing an extensive search, the examiner discovered many prior art references, all of which meet the broadly recited limitations of the independent claims.
Miller is a patent publication by the same assignee (Google Inc.) as the instant application. Miller discloses that, when a user issues voice commands/queries to a speech-enabled device (Miller, Fig. 1), the system determines the user's location. The system enhances audio signals from the user's direction and de-emphasizes other directions using beamforming processing techniques (Miller, [0038-0039], Fig. 6). Miller further discloses using deep learning neural network techniques to determine the user's location/direction (Miller, [0041], [0054]). Independent claims 16 and 23 are narrower than claim 1. In the following analysis, the examiner analyzes the limitations recited in the narrower independent claim 16.
Regarding claims 1-2, 16 and 23, Miller discloses a method, a non-transitory computer readable medium, and an apparatus (Miller, [0059], Fig. 1, a computer-implemented system for processing a user's voice request by focusing on audio signals from the user's direction), comprising:
identify an audio capture device and a target direction associated with the audio capture device (Miller, [0032-0035], a voice-enabled device is configured to detect a user's direction when the user issues a voice request);
detect first audio associated with the target direction (Miller, [0041], [0054], using a deep learning neural network model to determine the user's location and enhance audio from the determined target user direction);
enhance the first audio using a machine learning model configured to detect audio associated with the target direction (Miller, [0038-0039], using a beamformer to emphasize audio signals coming from a user's direction while de-emphasizing audio signals from other directions);
detect second audio associated with a direction different from the target direction (Miller, [0039], de-emphasizing noises from other directions); and
diminish the second audio using the machine learning model (Miller, [0039], [0054]).
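The beamforming behavior mapped above (emphasizing audio from a target direction while diminishing other directions) follows the general delay-and-sum principle. A minimal illustrative sketch, not drawn from Miller, using hypothetical two-microphone signals:

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Align each microphone channel by its steering delay, then average.
    Audio arriving from the steered (target) direction adds coherently and
    is enhanced; audio from other directions adds incoherently and is
    diminished."""
    aligned = [np.roll(sig, -d) for sig, d in zip(mic_signals, delays_samples)]
    return np.mean(aligned, axis=0)

fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)      # target source, in phase at both mics
interferer = np.sin(2 * np.pi * 300 * t)  # off-axis source, delayed at mic 2
mic1 = target + interferer
mic2 = target + np.roll(interferer, 20)

out = delay_and_sum([mic1, mic2], delays_samples=[0, 0])  # steer broadside
# The 440 Hz target passes at full amplitude; the 300 Hz interferer is
# partially cancelled because its two copies are misaligned across mics.
```

The neural network approaches of the cited references learn this directional selectivity rather than computing it from fixed delays, but the enhance/diminish effect on the output is the same.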
Claims 1-10, 12, 16-20, and 23-24 are rejected under 35 U.S.C. § 102(a)(2) as being anticipated by Liu et al. (US Pat. 12,300,261, referred to as Liu).
Liu discloses a deep neural network (DNN) implemented audio processing system (Fig. 9A) for enhancing audio signals coming from a target user's direction and cancelling non-speech/noise from other directions (Abstract, Col. 15, lines 21-50, Fig. 8). Fig. 8 illustrates the beamforming obtained from the neural network implemented system of Fig. 9A. In the following analysis, the examiner analyzes the limitations of independent claim 16. The other independent claims are broader than claim 16.
Regarding claims 1-2, 16 and 23, Liu discloses a method, a non-transitory computer readable medium, and an apparatus (Liu, Col. 31, lines 10-25, Fig. 17, a computer-implemented speech enhancement system for enhancing speech signals from a desired direction and cancelling non-speech/noise from other directions), comprising:
identify an audio capture device and a target direction associated with the audio capture device (Liu, Col. 3, lines 54-67, enhancing speech signals from a desired direction of a user; Fig. 1 shows devices for recording a user's voice request);
detect first audio associated with the target direction (Liu, Col. 3, lines 63-67, Col. 7, lines 40-56; detecting target audio signals from a desired direction);
enhance the first audio using a machine learning model configured to detect audio associated with the target direction (Liu, Col. 4, lines 4-15; enhancing the audio signal from a target direction by using a neural network model);
detect second audio associated with a direction different from the target direction (Liu, Abstract, Col. 4, lines 102, Col. 15, lines 1-15, 62-65; Fig. 7 and Fig. 8, cancelling noises from other directions); and
diminish the second audio using the machine learning model (Liu, Abstract, Col. 8, lines 60-67, Col. 15, lines 1-15, 62-65; Fig. 7 and Fig. 8).
Regarding claims 3, 17 and 24, Liu further discloses that diminishing the second audio includes decreasing an amplitude of at least one sound wave associated with the second audio (Liu, Col. 3, lines 63-67, Col. 8, lines 60-67; Col. 10, lines 1-3, Fig. 8, non-speech/noise from other directions (side lobe directions) is cancelled and reduced).
Regarding claims 4-5 and 18, Liu further discloses that diminishing the second audio includes decreasing an amplitude of at least one sound wave associated with the second audio (Liu, Abstract, Col. 3, lines 62-67, Col. 5, lines 1-6, Fig. 9A, neural sidelobe canceller for cancelling noise from other directions); and that diminishing the second audio includes eliminating the second audio by removing the second audio from an output of the second machine learning model (Liu, Col. 10, lines 20-32, Fig. 9A, removing remaining noise after enhancing speech from the target direction and cancelling noise from other directions; neural network implemented noise canceller).
Regarding claim 6, Liu further discloses that the first machine learning model and the second machine learning model are the same machine learning model (Liu, Col. 3, lines 54-67, Fig. 9A, enhancing speech from a target direction and cancelling noise from other directions using a single neural network model).
Regarding claim 7, Liu further discloses that enhancing the first audio includes increasing an amplitude of at least one sound wave associated with the first audio (Liu, Col. 3, lines 25-54; Fig. 8, increasing gain for signals from a desired direction).
Regarding claim 8, Liu further discloses enhancing the first audio includes de-reverbing the first audio by removing resonant frequencies from the first audio (Liu, Col. 4, lines 4-15, removing reverberation and other noises in all sidelobe directions; Fig. 5).
Regarding claim 9, Liu further discloses enhancing the first audio includes de-noising the first audio by filtering the first audio (Liu, Col. 6, lines 39-50, Col. 8, lines 3-36, Fig. 5, Fig. 7).
Regarding claims 10 and 19, Liu further discloses that the target direction is associated with a focus region (Liu, Col. 3, lines 25-67, focusing on a desired direction of a user; Fig. 8, beamforming towards the main lobe direction).
Regarding claims 12 and 20, Liu further discloses the enhancing of the first audio using the first machine learning model includes:
compressing the first audio using a first machine learning model (Liu, Fig. 9A, #930); and
decompressing the compressed audio using a second machine learning model (Liu, Fig. 9A, #990).
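The compress/decompress limitation of claims 12 and 20 resembles an encoder/decoder bottleneck. A minimal sketch using hypothetical, untrained stand-in weights; the frame and latent sizes are illustrative assumptions, not drawn from Liu:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 256-sample audio frames compressed to a 32-dim latent.
FRAME, LATENT = 256, 32

# Random stand-ins for the trained weights of two models: an encoder
# ("first machine learning model") and a decoder ("second machine
# learning model").
W_enc = rng.standard_normal((LATENT, FRAME)) / np.sqrt(FRAME)
W_dec = rng.standard_normal((FRAME, LATENT)) / np.sqrt(LATENT)

def compress(frame):
    """First model: project an audio frame into a low-dimensional latent."""
    return W_enc @ frame

def decompress(latent):
    """Second model: reconstruct an audio frame from the latent code."""
    return W_dec @ latent

frame = rng.standard_normal(FRAME)
recon = decompress(compress(frame))  # same shape as the input frame
```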
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 11 is rejected under 35 U.S.C. § 103 as being unpatentable over Liu in view of Shah et al. (US PG Pub. 2019/0222691, referred to as Shah).
Liu discloses training a neural network model for enhancing audio signals from a desired direction of a user and suppressing noise from other directions (Liu, Abstract, Col. 3, lines 54-67, Fig. 8, Fig. 9A). Liu does not explicitly mention "using an impulse response dataset".
Shah discloses an echo cancellation system that uses an impulse response dataset as training data (Shah, [0032]).
Both Liu and Shah deal with enhancing audio and removing noise. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Liu's teaching with Shah's teaching to use an impulse response dataset as training data. One having ordinary skill in the art would have been motivated to make such a modification to improve noise/echo cancellation performance (Shah, [0024], [0026]).
Allowable Subject Matter
Claims 13-15 and 21-22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Claims 13-15 and 21-22 recite detailed limitations related to training a neural network-based system for enhancing audio signals from a target direction. In particular, these claims include the following steps:
“convolving the first training data with a first subset of the impulse response dataset as a first convolved audio, the first subset of the impulse response dataset being associated with the target direction;
convolving the second training data with a second subset of the impulse response dataset as a second convolved audio; and
training the neural network model based on the first convolved audio and the second convolved audio.”
Several prior art references of record disclose neural network-based techniques for enhancing audio signals from a target user direction and cancelling non-speech from other directions. The prior art references fail to disclose the two convolving steps in the above-quoted limitations. When considering all limitations as a whole, the prior art references of record fail to anticipate or render obvious the claimed invention.
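The two convolving steps quoted above follow the standard practice of simulating directional training data by convolving dry signals with direction-specific impulse responses. A minimal sketch with toy impulse responses; the function name and values are illustrative assumptions, not drawn from the claims or any cited reference:

```python
import numpy as np

def make_training_pair(speech, noise, rir_target, rir_other):
    """Build one training example along the lines the claims describe:
    the first training data is convolved with a target-direction impulse
    response, the second with an impulse response for another direction."""
    first_convolved = np.convolve(speech, rir_target)[: len(speech)]
    second_convolved = np.convolve(noise, rir_other)[: len(noise)]
    mixture = first_convolved + second_convolved  # network input
    return mixture, first_convolved               # (input, training target)

# Toy impulse responses standing in for per-direction measured responses.
rir_target = np.array([1.0, 0.4, 0.1])  # direct path plus early reflections
rir_other = np.array([0.6, 0.6, 0.3])
rng = np.random.default_rng(1)
speech, noise = rng.standard_normal(16000), rng.standard_normal(16000)
x, y = make_training_pair(speech, noise, rir_target, rir_other)
```

Training on (x, y) pairs built this way teaches the network to pass audio bearing the target-direction response and suppress audio bearing the other-direction response.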
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The examiner discovered several relevant prior art references related to one or more concepts disclosed by the instant application. These references are included on the attached PTO-892 form for completeness of the record.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jialong He, whose telephone number is (571) 270-5359. The examiner can normally be reached on Monday – Friday, 8:00AM – 4:30PM, EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir can be reached on (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JIALONG HE/Primary Examiner, Art Unit 2659