DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
1. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
2. Claim(s) 1, 4, 5, and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Shin (U.S. Pub. No. 2024/0304186 A1) in view of Al Majid et al. (U.S. Pub. No. 2020/0412864 A1, hereinafter "Al Majid"), and further in view of Laroche et al. (U.S. Pub. No. 2023/0206936 A1, hereinafter "Laroche").
Regarding Claim 1, Shin teaches a system for blending audio signals (system 100 for blending audio signals, Fig. 1A, Para. [0036]), the system comprising:
a digital signal processing (DSP) circuit (server device 11 includes an audio signal processing engine 113, Fig. 1A, Para. [0048]) configured to:
extract, by executing a trained machine learning model, a plurality of first audio parameters associated with a plurality of first audio blocks of a first audio signal and a plurality of second audio parameters associated with a plurality of second audio blocks of a second audio signal, wherein the system is configured to receive the first audio signal and the second audio signal (first, second, and nth raw audio signals can be divided into first, second, and nth group of audio frames [201_a, 201_b, 201_c; 202_a, 202_b, 202_c; and 20N_a, 20N_b, 20N_c], Figs. 2A, 2B, and 2C, Para. [0059]; the first groups of audio frames (201_a, 202_a, . . . , 20N_a) can be processed for feature extraction to generate a spectrogram 1_1, a spectrogram 2_1, . . . , and a spectrogram N_1. The spectrogram 1_1, the spectrogram 2_1, . . . , and the spectrogram N_1 can be processed using a trained neural network 203 to generate a SNR output 1_1 indicating a SNR for the first group of audio frames 201_a, a SNR output 2_1 indicating a SNR for the first group of audio frames 202_a, . . . , and a SNR output N_1 indicating a SNR for the first group of audio frames 20N_a, Fig. 2A, Para. [0061]; the second groups of audio frames (201_b, 202_b, . . . , 20N_b) can be processed for feature extraction to generate a spectrogram 1_2, a spectrogram 2_2, . . . , and a spectrogram N_2. The spectrogram 1_2, the spectrogram 2_2, . . . , and the spectrogram N_2 can be processed using the trained neural network 203 to generate a SNR output 1_2 indicating a SNR for the second group of audio frames 201_b, a SNR output 2_2 indicating a SNR for the second group of audio frames 202_b, . . . , and a SNR output N_2 indicating a SNR for the second group of audio frames 20N_b, Fig. 2B, Para. [0063]; the third groups of audio frames (201_c, 202_c, . . . , 20N_c) can be processed for feature extraction to generate a spectrogram 1_3, a spectrogram 2_3, . . . , and a spectrogram N_3. The spectrogram 1_3, the spectrogram 2_3, . . . , and the spectrogram N_3 can be processed using the trained neural network 203 to generate a SNR output 1_3 indicating a SNR for the third group of audio frames 201_c, a SNR output 2_3 indicating a SNR for the third group of audio frames 202_c, . . . , and a SNR output N_3 indicating a SNR for the third group of audio frames 20N_c, Fig. 2C, Para. [0064]);
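For illustration only, a minimal Python sketch of the framing and spectrogram feature extraction described above; the frame length, hop size, and Hann window are assumed values not drawn from Shin:

```python
import numpy as np

def frames_and_spectrogram(signal, frame_len=1024, hop=512):
    # Divide the raw signal into overlapping frames (Shin's frame groups).
    frames = np.stack([signal[i:i + frame_len]
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    # Windowed FFT magnitude per frame -> spectrogram (num_frames x bins),
    # the feature representation passed to the trained network.
    spectrogram = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=-1))
    return frames, spectrogram
```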
process, by further executing the trained machine learning model, the plurality of first audio parameters and the plurality of second audio parameters to generate a plurality of first audio scores and a plurality of second audio scores, respectively, wherein each audio score of the plurality of first audio scores and the plurality of second audio scores is indicative of an audio of a corresponding audio block of the plurality of first audio blocks and the plurality of second audio blocks, respectively (the plurality of first, second, and nth audio parameters [i.e. the SNR outputs] are processed by the trained neural network to generate a plurality of first, second, and nth audio scores [weight values] corresponding to the audio block of first, second, and nth group of audio frames [201_a, 201_b, 201_c; 202_a, 202_b, 202_c; and 20N_a, 20N_b, 20N_c], Figs. 1A, 2A, 2B, and 2C, Paras. [0053], [0061], [0063], and [0064]);
output, upon analyzing the plurality of first audio scores and the plurality of second audio scores, a plurality of blended blocks, wherein each of the plurality of blended blocks is at least one of a first audio block of the plurality of first audio blocks and a second audio block of the plurality of second audio blocks (the plurality of audio blocks of the first, second, and nth groups of audio frames [201_a, 201_b, 201_c; 202_a, 202_b, 202_c; and 20N_a, 20N_b, 20N_c] can be combined based on the audio scores [weight values] to output blended audio M1, M2, M3, Figs. 2A, 2B, and 2C, Paras. [0061]-[0064]).
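For illustration only, a sketch of combining time-aligned frames based on per-signal SNR-derived weights, in the manner of the paragraphs cited above; the softmax weighting is an assumption, since Shin specifies only that weight values are derived from the SNR outputs:

```python
import numpy as np

def blend_aligned_frames(frames, snr_outputs):
    # frames: array of shape (N, frame_len), one time-aligned frame per
    # received signal; snr_outputs: the network's per-signal SNR estimates.
    snr = np.asarray(snr_outputs, dtype=float)
    weights = np.exp(snr - snr.max())
    weights /= weights.sum()                  # higher SNR -> larger weight
    return weights @ np.asarray(frames)       # weighted sum -> blended block
```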
Shin fails to explicitly teach process, by further executing the trained machine learning model, the plurality of first audio parameters and the plurality of second audio parameters to generate a plurality of first audio quality scores and a plurality of second audio quality scores, respectively, wherein each audio quality score of the plurality of first audio quality scores and the plurality of second audio quality scores is indicative of an audio quality of a corresponding audio block of the plurality of first audio blocks and the plurality of second audio blocks, respectively;
analyze the plurality of first audio quality scores and the plurality of second audio quality scores; and
output, upon analyzing the plurality of first audio quality scores and the plurality of second audio quality scores, a plurality of blended blocks, wherein each of the plurality of blended blocks is at least one of a first audio block of the plurality of first audio blocks and a second audio block of the plurality of second audio blocks.
However, Al Majid teaches process, by further executing the trained machine learning model, the plurality of first audio parameters and the plurality of second audio parameters to generate a plurality of first audio quality scores and a plurality of second audio quality scores, respectively, wherein each audio quality score of the plurality of first audio quality scores and the plurality of second audio quality scores is indicative of an audio quality of a corresponding audio block of the plurality of first audio blocks and the plurality of second audio blocks, respectively (the audio stream is analyzed by processing it as segments and, for each segment, performing an audio quality assessment on the segment's audio signal to generate an audio quality score. Each segment is analyzed using a quality detection machine learning model to generate a quality vector for the segment. The quality vector indicates a score for audio quality. Segments of the plurality of segments of the audio stream are analyzed using a feature-extraction machine learning model to generate a feature vector for the segments, and the quality detection machine learning model is selected based on the feature vector, Para. [0088]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the audio signal blending system (as taught by Shin) to include processing the plurality of first and second audio parameters by a trained machine learning model to generate audio quality scores (as taught by Al Majid). Doing so enhances efficiency and improves output sound quality.
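For illustration only, a sketch of the per-segment quality assessment Al Majid is cited for, where a feature vector selects the quality detection model that scores each segment; all callables here are hypothetical stand-ins, not Al Majid's implementation:

```python
def score_segments(segments, feature_model, quality_models, select):
    # For each audio segment: extract a feature vector, use it to select a
    # quality detection model, and score the segment's audio quality.
    scores = []
    for segment in segments:
        feature_vec = feature_model(segment)
        quality_model = quality_models[select(feature_vec)]
        scores.append(quality_model(segment))
    return scores
```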
However, Laroche teaches analyze the plurality of first audio quality scores and the plurality of second audio quality scores (processor 20 is configured to determine whether the mean opinion score associated with audio signal 52 (first quality parameter 42) and the mean opinion score associated with audio signal 62 (second quality parameter 42A) are above a threshold value, Para. [0136]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the audio signal blending system (as taught by Shin in view of Al Majid) to include analyzing the plurality of first and second audio quality scores, which is used to output the blended blocks (as taught by Laroche). Doing so improves output sound quality (Laroche, Para. [0136]).
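For illustration only, a minimal sketch of the threshold analysis Laroche is cited for; the 3.5 default is an assumed value, not taken from the reference:

```python
def both_scores_above_threshold(mos_first, mos_second, threshold=3.5):
    # Compare each mean opinion score with a threshold value.
    return mos_first > threshold and mos_second > threshold
```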
Regarding Claim 4, Shin in view of Al Majid, and further in view of Laroche teach wherein the DSP circuit is further configured to execute a time to frequency domain operation, on the plurality of first audio blocks and the plurality of second audio blocks to generate a plurality of third audio blocks and a plurality of fourth audio blocks, respectively, and wherein the plurality of third audio blocks and the plurality of fourth audio blocks in the frequency domain are provided to the trained machine learning model to extract the plurality of first audio parameters and the plurality of second audio parameters, respectively (Shin, first, second, and nth raw audio signals can be divided into first, second, and nth group of audio frames [201_a, 201_b, 201_c; 202_a, 202_b, 202_c; and 20N_a, 20N_b, 20N_c], Figs. 2A, 2B, and 2C, Para. [0059]; the first groups of audio frames (201_a, 202_a, . . . , 20N_a) can be processed for feature extraction to generate a spectrogram 1_1, a spectrogram 2_1, . . . , and a spectrogram N_1 [time to frequency domain operation]. The spectrogram 1_1, the spectrogram 2_1, . . . , and the spectrogram N_1 can be processed using a trained neural network 203 to generate a SNR output 1_1 indicating a SNR for the first group of audio frames 201_a, a SNR output 2_1 indicating a SNR for the first group of audio frames 202_a, . . . , and a SNR output N_1 indicating a SNR for the first group of audio frames 20N_a, Fig. 2A, Para. [0061]; the second groups of audio frames (201_b, 202_b, . . . , 20N_b) can be processed for feature extraction to generate a spectrogram 1_2, a spectrogram 2_2, . . . , and a spectrogram N_2. The spectrogram 1_2, the spectrogram 2_2, . . . , and the spectrogram N_2 can be processed using the trained neural network 203, Fig. 2B, Para. [0063]; the third groups of audio frames (201_c, 202_c, . . . , 20N_c) can be processed for feature extraction to generate a spectrogram 1_3, a spectrogram 2_3, . . . , and a spectrogram N_3. The spectrogram 1_3, the spectrogram 2_3, . . . , and the spectrogram N_3 can be processed using the trained neural network 203, Fig. 2C, Para. [0064]).
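For illustration only, a sketch of a time-to-frequency domain operation on one audio block using SciPy's STFT; the sampling rate and segment length are assumptions:

```python
import numpy as np
from scipy.signal import stft

fs = 48_000                                   # assumed sampling rate
block = np.random.randn(fs)                   # stand-in for one audio block
f, t, Zxx = stft(block, fs=fs, nperseg=1024)  # time -> frequency domain
spectrogram = np.abs(Zxx)                     # magnitude fed to the model
```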
Regarding Claim 5, Shin in view of Al Majid, and further in view of Laroche teach further comprising a first receiver and a second receiver (Shin, receivers 1, 2 and N, Fig. 1, Paras. [0036] and [0046]) that are configured to:
receive the first audio signal and the second audio signal from an audio source (Shin, receivers 1, 2, and N receive audio signals from source R, Fig. 1, Para. [0046]); and
convert, each of the first audio signal and the second audio signal to a digitized version of each of the first audio signal and the second audio signal, respectively (Shin, first, second, and nth raw audio signals can be divided into first, second, and nth group of audio frames [201_a, 201_b, 201_c; 202_a, 202_b, 202_c; and 20N_a, 20N_b, 20N_c], and are converted to digitized versions of each first and second audio signal, Figs. 2A, 2B, and 2C, Paras. [0059], [0061], [0063], and [0064]).
Regarding Claim 14, Shin in view of Al Majid, and further in view of Laroche teach wherein each of the plurality of first audio parameters and the plurality of second audio parameters include a group consisting of a spectral centroid, a spectral flux, and a noise floor of each of the plurality of first audio blocks and the plurality of second audio blocks, respectively (Shin, the first groups of audio frames can be processed for feature extraction to generate spectrograms, which can be processed using a trained neural network 203 to generate SNR outputs, Figs. 2A-C, Paras. [0061], [0063], and [0064]).
Regarding Claim 15, Shin in view of Al Majid, and further in view of Laroche teach wherein data associated with the first audio signal and data associated with the second audio signal are identical in nature (Shin, receivers 1, 2, and N receive audio signals that are identical in nature, Fig. 1, Paras. [0046] and [0047]).
Regarding Claim 16, it is similarly rejected as Claim 1. The method can be found in Shin (Fig. 3, Paras. [0065]-[0075]).
3. Claim(s) 2 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Shin (U.S. Pub. No. 2024/0304186 A1) in view of Al Majid et al. (U.S. Pub. No. 2020/0412864 A1, hereinafter "Al Majid") in view of Laroche et al. (U.S. Pub. No. 2023/0206936 A1, hereinafter "Laroche"), and further in view of Senior et al. (U.S. Pat. No. 9,202,464 B1, hereinafter "Senior").
Regarding Claim 2, Shin in view of Al Majid, and further in view of Laroche teach wherein the DSP circuit (Shin, server device 11 includes an audio signal processing engine 113. Server 11 includes a neural network engine 115, which uses trained neural network 1151, Fig. 1A, Para. [0048]) is further configured to train a first machine learning model based on training data to obtain the trained machine learning model (Laroche, method 200 for training a quality detection model for audio quality estimation, Fig. 3, Para. [0144]), wherein the training data comprises a plurality of test audio recordings (Laroche, obtaining S202 an audio dataset comprising one or more audio signals, Fig. 3, Para. [0146]) and a plurality of quality scores such that the plurality of quality scores include a first quality score of a first test audio recording of the plurality of test audio recordings (Laroche, obtaining S204 a score dataset comprising one or more reference quality parameters including a first reference quality parameter indicative of audio quality associated with the one or more audio signals, Fig. 3, Para. [0146]), wherein each of the plurality of quality scores is indicative of an audio quality of a corresponding test audio recording (Laroche, obtaining S204 a score dataset comprising one or more reference quality parameters including a first reference quality parameter indicative of audio quality associated with the one or more audio signals, Fig. 3, Para. [0146]).
Shin in view of Al Majid, and further in view of Laroche fail to explicitly teach wherein a low score indicates a low quality of a test audio recording of the plurality of test audio recordings and a high score indicates a high quality of the test audio recording.
However, Senior teaches wherein a low score indicates a low quality of a test audio recording of the plurality of test audio recordings and a high score indicates a high quality of the test audio recording (training samples presented to the neural network include samples with higher quality values and samples with lower quality values, Col. 4, Lns. 15-23).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the audio signal blending system (as taught by Shin in view of Al Majid, and further in view of Laroche) to include low score indicated low quality of test audio and high score indicating high quality of test audio (as taught by Senior). Doing so improves the training speed of the machine learning model (Senior, Col. 4, Lns. 24-28).
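For illustration only, a minimal runnable sketch of training a scorer on recordings labeled with quality scores, where low targets mark low-quality recordings and high targets mark high-quality ones; the linear model and gradient settings are assumptions standing in for the references' neural networks:

```python
import numpy as np

def train_quality_scorer(features, scores, epochs=200, lr=1e-2):
    # features: one row of audio parameters per test recording;
    # scores: labeled quality scores (low = low quality, high = high quality).
    X = np.asarray(features, dtype=float)
    y = np.asarray(scores, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        pred = X @ w
        w -= lr * X.T @ (pred - y) / len(y)   # mean-squared-error gradient step
    return w
```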
Regarding Claim 17, it is similarly rejected as Claim 2. The method can be found in Shin (Fig. 3, Paras. [0065]-[0075]).
4. Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Shin (U.S. Pub. No. 2024/0304186 A1) in view of Al Majid et al. (U.S. Pub. No. 2020/0412864 A1, hereinafter "Al Majid") in view of Laroche et al. (U.S. Pub. No. 2023/0206936 A1, hereinafter "Laroche") in view of Senior et al. (U.S. Pat. No. 9,202,464 B1, hereinafter "Senior"), and further in view of Yan et al. (Chinese Pub. No. CN 117409819 A, hereinafter "Yan").
Regarding Claim 3, Shin in view of Al Majid in view of Laroche, and further in view of Senior teach wherein to train the first machine learning model, the DSP circuit is further configured to:
extract a first plurality of training parameters of the first test audio recording (Laroche, system 500 for generating an audio dataset and a score dataset to train a quality detection model, Fig. 4, Para. [0158]; system 500 comprises one or more speech quality metric modules 560, 562, 564. Quality metric module 560 is configured to receive a noisy signal from the noisy dataset 540 and a clean audio signal 560 to generate a quality parameter, e.g., a mean opinion score (MOS), Fig. 4, Para. [0160]);
determine, by way of a test scoring operation, a first test score based on processing of the first plurality of training parameters (Laroche, system 500 comprises a MOS module 570 to generate a score dataset 571 based on the quality parameters associated with the one or more noisy signals of the noisy dataset 540, Fig. 4, Para. [0160]);
and wherein the trained machine learning model generates the plurality of first audio quality scores and the plurality of second audio quality scores based on the training of the first machine learning model (Al Majid, the trained machine learning model generates the first and second audio quality scores, Paras. [0084] and [0088]).
Shin in view of Al Majid in view of Laroche, and further in view of Senior fail to explicitly teach compare the first test score with the first quality score to determine a match between the first test score and the first quality score, wherein the match between the first test score and the first quality score indicates to the first machine learning model that the determination of the first test score by way of the test scoring operation is accurate, and a mismatch between the first test score and the first quality score indicates to the first machine learning model that the determination of the first test score by way of the test scoring operation is erroneous; and
update the test scoring operation until the match is determined between the first test score and the first quality score, wherein the first machine learning model is trained based on the match between the first test score and the first quality score.
Yan teaches compare the first test score with the first quality score to determine a match between the first test score and the first quality score, wherein the match between the first test score and the first quality score indicates to the first machine learning model that the determination of the first test score by way of the test scoring operation is accurate, and a mismatch between the first test score and the first quality score indicates to the first machine learning model that the determination of the first test score by way of the test scoring operation is erroneous (S1. Collection and preprocessing of human voice data, Para. [0010]; S2. Divide the data from step S1 into training dataset, validation dataset, and test dataset, Para. [0011]; S3. Construct a composite feedforward neural network model based on the training dataset and its labeling information from step S2. The output of this neural network model is the output feature quantity of human voice for different labeling information, Para. [0012]; S4. Compare the validation dataset with the output of the composite feedforward neural network model in step S3 to determine if they match, Para. [0013]); and
update the test scoring operation until the match is determined between the first test score and the first quality score, wherein the first machine learning model is trained based on the match between the first test score and the first quality score (S4. continuously update and expand the data in the training dataset to improve the composite feedforward neural network model, Para. [0013]; S5. Use the test dataset as input parameters for the composite feedforward neural network model obtained in step S4, Para. [0014]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the audio signal blending system (as taught by Shin in view of Al Majid in view of Laroche, and further in view of Senior) to include comparing the first test score with the first quality score and updating the test scoring operation (as taught by Yan). Doing so improves the accuracy of the model in predicting audio quality scores.
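For illustration only, a sketch of the compare-and-update loop Yan is cited for: compute a test score, compare it with the reference quality score, and keep updating the scoring operation until the two match; score_fn, update_fn, and the tolerance are hypothetical stand-ins, not Yan's implementation:

```python
def fit_until_match(score_fn, update_fn, audio, reference_score,
                    tol=0.1, max_iters=1000):
    # Compute a test score, compare it with the labeled quality score, and
    # keep updating the scoring operation until the two match within tol.
    for _ in range(max_iters):
        test_score = score_fn(audio)
        if abs(test_score - reference_score) <= tol:  # match: accurate
            return True
        update_fn(audio, reference_score)             # mismatch: update model
    return False
```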
5. Claim(s) 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Shin (U.S. Pub. No. 2024/0304186 A1) in view of Al Majid et al. (U.S. Pub. No. 2020/0412864 A1, hereinafter "Al Majid") in view of Laroche et al. (U.S. Pub. No. 2023/0206936 A1, hereinafter "Laroche"), and further in view of Lin (Chinese Pub. No. CN 101233560 B).
Regarding Claim 6, Shin in view of Al Majid, and further in view of Laroche fail to explicitly teach further comprising a first buffer and a second buffer coupled to the first receiver and the second receiver, respectively, wherein the first buffer and the second buffer are configured to:
receive the digitized version of each of the first audio signal and the second audio signal, from the first receiver and the second receiver, respectively; and
store the digitized version of each of the first audio signal and the second audio signal, as the plurality of first audio blocks and the plurality of second audio blocks, respectively.
However, Lin teaches further comprising a first buffer (first buffer 3, Fig. 1, Para. [0061]) and a second buffer (second buffer 5, Fig. 1, Para. [0061]) coupled to the first receiver (first receiver 1, Fig. 1, Para. [0061]) and the second receiver (second receiver 2, Fig. 1, Para. [0061]), respectively, wherein the first buffer and the second buffer are configured to:
receive the digitized version of each of the first audio signal and the second audio signal, from the first receiver and the second receiver, respectively (buffer 3 receives audio data packets of first audio signal from first receiver 1 and buffer 5 receives digital audio from second receiver 2, Fig. 2, Para. [0061]); and
store the digitized version of each of the first audio signal and the second audio signal, as the plurality of first audio blocks and the plurality of second audio blocks, respectively (buffer 3 and 5 store first audio signal and second audio signal as audio blocks, Para. [0061]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the audio signal blending system (as taught by Shin in view of Al Majid, and further in view of Laroche) to include first and second buffers coupled to first and second receivers and storing digital audio blocks (as taught by Lin). Doing so provides the DSP the ability to analyze both signals and to choose or combine them to improve the signal-to-noise ratio.
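For illustration only, a sketch of the dual-buffer arrangement Lin is cited for, with a DSP-side read as recited in claim 7; the block size and deque-based FIFO are assumptions:

```python
from collections import deque

BLOCK = 1024  # samples per audio block (assumed)

first_buffer: deque = deque()   # coupled to the first receiver
second_buffer: deque = deque()  # coupled to the second receiver

def store_blocks(buffer, digitized_signal):
    # Store a digitized signal as a sequence of fixed-size audio blocks.
    for i in range(0, len(digitized_signal) - BLOCK + 1, BLOCK):
        buffer.append(digitized_signal[i:i + BLOCK])

def read_block(buffer):
    # DSP-side read of the oldest stored block (the claim 7 behavior).
    return buffer.popleft() if buffer else None
```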
Regarding Claim 7, Shin in view of Al Majid in view of Laroche, and further in view of Lin teach wherein the DSP circuit is further configured to read the plurality of first audio blocks and the plurality of second audio blocks from the first buffer and the second buffer, respectively (Lin, first and second audio block outputs of buffers 3 and 5 are provided to a DSP 6, Paras. [0061] and [0062]), wherein the plurality of first audio parameters and the plurality of second audio parameters are extracted upon reading the plurality of first audio blocks and the plurality of second audio blocks, respectively (Shin, the first groups of audio frames can be processed for feature extraction to generate spectrograms, which can be processed using a trained neural network 203 to generate SNR outputs, Figs. 2A-C, Paras. [0061], [0063], and [0064]).
Allowable Subject Matter
6. Claims 8-13 and 18-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
7. Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHIMEZIE E BEKEE whose telephone number is (571) 272-0202. The examiner can normally be reached M-F 7:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHIMEZIE EZERIWE BEKEE/Examiner, Art Unit 2691
/DUC NGUYEN/Supervisory Patent Examiner, Art Unit 2691