DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/12/26 has been entered.
Response to Arguments
Applicant’s arguments, see pages 6 – 12, filed 01/12/26, with respect to claims 1 – 20 have been fully considered and are persuasive. The rejection of claims 1 – 20 under 35 U.S.C. 101 has been withdrawn.
Applicant argues that the combination of phase analysis, DOA tracking, motion-based EOS detection, and selective audio gating is far from generic data processing; rather, it is a carefully engineered, hardware-linked technique that improves the operation of audio-capture devices themselves. Courts consistently hold that such improvements to the functioning of a technological system constitute an inventive concept (Amendment, pages 6 – 12).
Applicant’s arguments, see pages 13 – 16, filed 01/12/26, with respect to the rejection of claims 1 – 20 under 35 U.S.C. 102 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Hyun et al. (US 2013/0238335).
Applicant argues that Hiroe does not teach causing, based on the difference between the first direction and the second direction, processing of the audio input without the second portion of the audio input; receiving, from the user device, based on a change in direction associated with the audio source, an end of speech indication; based on the end of speech indication, sending, to the user device, a response to a portion of the audio data received before the end of speech indication for output; determining, by a user device, based on first phase data associated with a first portion of an audio input and second phase data associated with a second portion of the audio input, a change in a position of an audio source associated with the audio input (Amendment, pages 13 – 16).
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hiroe (US 2012/0263315) in view of Hyun et al. (US 2013/0238335).
As per claim 1, Hiroe teaches a method comprising:
determining, by a user device, based on a first portion of an audio input, a first direction and a first phase associated with the first portion of the audio input (“signals observed with the different microphones and those observation signals are summed in condition where phases of the signals in a direction of a target sound are aligned, the target sound is emphasized because of aligned phase and sound from in other directions are attenuated because they are shifted in phase”; Abstract, paragraphs 50, 51);
determining, based on a second portion of the audio input, a second direction and a second phase associated with the second portion of the audio input (“signals observed with the different microphones and those observation signals are summed in condition where phases of the signals in a direction of a target sound are aligned, the target sound is emphasized because of aligned phase and sound from in other directions are attenuated because they are shifted in phase”; Abstract, paragraphs 50, 51);
determining, based on the first phase and the second phase, a difference between the first direction and the second direction (“providing the observation signals of the microphones with different time delays, aligning phases of the signals coming in the direction of the target sound, and summing the observation signals. In the results of the delay-and-sum array, the target sound is emphasized because of the aligned phase and the sounds coming in the other directions are attenuated because they are different in phase.”; paragraphs 50, 51, 341).
However, Hiroe does not specifically teach causing, based on the difference between the first direction and the second direction, processing of the audio input without the second portion of the audio input.
Hyun et al. disclose that the endpoint determination unit may be configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by the sound source maintenance time calculating unit and the change in position of the sound source according to each direction determined by the sound source position change determination unit…Sound source signals from a plurality of sound sources may be received from a plurality of microphones. Positions of the plurality of sound sources may be detected from the received sound source signals. A change in an angle of the sound source may be monitored at a predetermined time interval by reading the positions of the plurality of sound sources detected…in an environment having various sound sources, the existence and the length of the sound source being input by each direction are recognized and thus the sound source may be detected and the endpoint may be found, thereby improving the performance of a post processing (the sound source separation, the noise cancellation, the speech characteristic extraction, and the speech recognition). In particular, speech being input from a direction other than a direction of speech from a speaker vocalized at a remote area from a sound source collecting unit is distinguished while the speech from the speaker is being recorded and thus the endpoint of the speech may be detected, thereby enabling a remote sound source recognition without restriction on the installation region of a microphone (paragraphs 13, 22, 30).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine the endpoint of a sound signal and to process an audio input without the second portion of the audio input, as taught by Hyun et al., in Hiroe, because that would help improve the performance of the post-processing (Abstract, paragraph 63).
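For illustration only — the following sketch is not part of Hiroe or Hyun et al., and all function names, parameters, and values are hypothetical — the delay-and-sum principle quoted above (estimating direction from the inter-microphone phase difference, then aligning phases toward the target so that the target sums coherently while off-axis sound partially cancels) can be expressed as:

```python
import numpy as np

def estimate_direction(sig_a, sig_b, fs, mic_spacing, c=343.0):
    """Estimate a direction of arrival (degrees) from the phase
    difference between two microphones at the dominant frequency bin
    (far-field, single-source assumption)."""
    spec_a = np.fft.rfft(sig_a)
    spec_b = np.fft.rfft(sig_b)
    k = np.argmax(np.abs(spec_a[1:])) + 1            # dominant bin, skipping DC
    freq = k * fs / len(sig_a)
    phase_diff = np.angle(spec_b[k]) - np.angle(spec_a[k])
    phase_diff = (phase_diff + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    delay = phase_diff / (2 * np.pi * freq)          # inter-mic time delay
    sin_theta = np.clip(delay * c / mic_spacing, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))

def delay_and_sum(sig_a, sig_b, fs, delay_s):
    """Time-shift sig_b by delay_s (via a linear-phase term) so the
    target's phases align with sig_a, then sum: the target adds
    coherently while sounds from other directions are attenuated."""
    n = len(sig_b)
    spec_b = np.fft.rfft(sig_b)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    aligned_b = np.fft.irfft(spec_b * np.exp(2j * np.pi * freqs * delay_s), n)
    return sig_a + aligned_b
```

Under these assumptions, a tone arriving earlier at one microphone yields a phase difference proportional to frequency and delay, from which the angle follows; summing after alignment roughly doubles the target's amplitude.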
As per claim 8, Hiroe teaches a method comprising:
receiving, from a user device associated with an audio source, audio data; processing the audio data; (“a sound source extraction unit receives the sound direction and the sound segment of the target sound and extracts a sound-signal of the target sound.”; Abstract, paragraph 50);
receiving, from the user device, based on a change in direction associated with the audio source (“the target sound is emphasized because of aligned phase and sound from in other directions are attenuated because they are shifted in phase respectively… A segment 602 is assumed to be a segment in which an interference sound is active before the target sound starts to be active. It is assumed that around the end of the segment 602 of the interference sound overlaps with the start of the segment 601 of the target sound time-wise and this overlapping region is denoted by an overlap region 611.”; paragraphs 50, 51, 34, 425, see also 375).
However, Hiroe does not specifically teach an end of speech indication; based on the end of speech indication, sending, to the user device, a response to a portion of the audio data received before the end of speech indication for output.
Hyun et al. disclose that the endpoint determination unit may be configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by the sound source maintenance time calculating unit and the change in position of the sound source according to each direction determined by the sound source position change determination unit…Sound source signals from a plurality of sound sources may be received from a plurality of microphones. Positions of the plurality of sound sources may be detected from the received sound source signals. A change in an angle of the sound source may be monitored at a predetermined time interval by reading the positions of the plurality of sound sources detected…in an environment having various sound sources, the existence and the length of the sound source being input by each direction are recognized and thus the sound source may be detected and the endpoint may be found, thereby improving the performance of a post processing (the sound source separation, the noise cancellation, the speech characteristic extraction, and the speech recognition). In particular, speech being input from a direction other than a direction of speech from a speaker vocalized at a remote area from a sound source collecting unit is distinguished while the speech from the speaker is being recorded and thus the endpoint of the speech may be detected, thereby enabling a remote sound source recognition without restriction on the installation region of a microphone (paragraphs 13, 22, 30).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine the endpoint of a sound signal and to process an audio input without the second portion of the audio input, as taught by Hyun et al., in Hiroe, because that would help improve the performance of the post-processing (Abstract, paragraph 63).
As per claim 15, Hiroe teaches a method comprising:
determining, by a user device, based on first phase data associated with a first portion of an audio input and second phase data associated with a second portion of the audio input, a change in a position of an audio source associated with the audio input (“providing the observation signals of the microphones with different time delays, aligning phases of the signals coming in the direction of the target sound, and summing the observation signals. In the results of the delay-and-sum array, the target sound is emphasized because of the aligned phase and the sounds coming in the other directions are attenuated because they are different in phase.”; Abstract, paragraphs 50, 51, 341).
However, Hiroe does not specifically teach based on the change in the position of the audio source, sending the first portion of the audio input and an indication that the first portion of the audio input comprises an end of speech.
Hyun et al. disclose that the endpoint determination unit may be configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by the sound source maintenance time calculating unit and the change in position of the sound source according to each direction determined by the sound source position change determination unit…Sound source signals from a plurality of sound sources may be received from a plurality of microphones. Positions of the plurality of sound sources may be detected from the received sound source signals. A change in an angle of the sound source may be monitored at a predetermined time interval by reading the positions of the plurality of sound sources detected…in an environment having various sound sources, the existence and the length of the sound source being input by each direction are recognized and thus the sound source may be detected and the endpoint may be found, thereby improving the performance of a post processing (the sound source separation, the noise cancellation, the speech characteristic extraction, and the speech recognition). In particular, speech being input from a direction other than a direction of speech from a speaker vocalized at a remote area from a sound source collecting unit is distinguished while the speech from the speaker is being recorded and thus the endpoint of the speech may be detected, thereby enabling a remote sound source recognition without restriction on the installation region of a microphone (paragraphs 13, 22, 30).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine the endpoint of a sound signal and to send the first portion of the audio input, as taught by Hyun et al., in Hiroe, because that would help improve the performance of the post-processing (Abstract, paragraph 63).
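For illustration only — this sketch is not part of Hyun et al., and the function name, thresholds, and frame convention are hypothetical — the endpoint-determination idea quoted above (monitoring the sound-source angle at a time interval and declaring an endpoint when input no longer comes from the speaker's direction) can be expressed as:

```python
def detect_endpoint(angles_deg, speaker_angle_deg=0.0,
                    angle_tol_deg=15.0, hold_frames=5):
    """Return the index of the first frame of an end-of-speech event:
    the per-frame direction estimate has stayed outside the speaker's
    tolerance band for hold_frames consecutive frames. Returns None
    when no endpoint is found (speech continues from the speaker)."""
    off_axis = 0
    for i, ang in enumerate(angles_deg):
        if abs(ang - speaker_angle_deg) > angle_tol_deg:
            off_axis += 1
            if off_axis >= hold_frames:
                return i - hold_frames + 1   # first off-axis frame
        else:
            off_axis = 0                     # back on-axis; reset the count
    return None
```

The hold count is one plausible way to realize the "sound source maintenance time" notion: a brief off-axis excursion shorter than hold_frames does not trigger an endpoint.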
As per claims 2, 9, Hiroe in view of Hyun et al. further disclose the user device comprises a voice enabled device (“Some of the speech recognition devices have a sound segment detection function…a direction specifying device may be operated by the user to set the sound source direction .theta..”; Hiroe, paragraphs 380, 546).
As per claim 3, Hiroe in view of Hyun et al. further disclose sending, based on the difference between the first direction and the second direction, an end of speech indication, wherein the end of speech indication is configured to cause one or more of: an exclusion from processing or a termination of one or more audio processing functions (“the endpoint determination unit may be configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by the sound source maintenance time calculating unit and the change in position of the sound source according to each direction determined by the sound source position change determination unit”; Hyun et al., paragraphs 13, 22, 30).
As per claim 4, Hiroe in view of Hyun et al. further disclose causing, based on the end of speech indication, termination of one or more audio processing functions (“the endpoint determination unit may be configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by the sound source maintenance time calculating unit and the change in position of the sound source according to each direction determined by the sound source position change determination unit… in an environment having various sound sources, the existence and the length of the sound source being input by each direction are recognized and thus the sound source may be detected and the endpoint may be found, thereby improving the performance of a post processing (the sound source separation, the noise cancellation, the speech characteristic extraction, and the speech recognition)”; Hyun et al., paragraphs 13, 22, 30).
As per claim 5, Hiroe in view of Hyun et al. further disclose determining the second direction associated with the second portion of the audio input comprises determining a phase shift (“sound from in other directions are attenuated because they are shifted in phase”; Hiroe, paragraph 50).
As per claim 6, Hiroe in view of Hyun et al. further disclose the phase shift comprises a phase difference determined between one or more microphones associated with the user device, the method further comprising determining the phase shift satisfies a phase shift threshold (“signals observed with the different microphones and those observation signals are summed in condition where phases of the signals in a direction of a target sound are aligned…a shift between the phase difference dot and the straight line, that is, a shift 32 shown in FIG. 3 is calculated, so that the larger this value is, the nearer the M(.omega., t) in Equation [2.2] is set to 0, inversely, the nearer the phase difference dot is to the straight line, the nearer the M(.omega., t) is set to 1.”; Hiroe, paragraphs 50, 76, see also Hyun et al., paragraphs 65 – 76, 81).
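For illustration only — this sketch is not Hiroe's Equation [2.2], and the function name, the Gaussian mapping, and the shift_scale parameter are hypothetical — the phase-difference test quoted above (the expected inter-microphone phase difference for a given source delay lies on a straight line over frequency; the further the observed difference departs from that line, the closer the mask is set to 0) can be expressed as:

```python
import numpy as np

def phase_mask(phase_diff, freqs, tau, shift_scale=0.5):
    """Soft time-frequency mask from a phase-difference test: for a
    source at inter-mic delay tau, the expected phase difference is the
    straight line 2*pi*f*tau. The wrapped deviation from that line is
    mapped to (0, 1]: on the line -> ~1, far from it -> ~0."""
    expected = 2 * np.pi * freqs * tau
    shift = np.angle(np.exp(1j * (phase_diff - expected)))  # wrapped deviation
    return np.exp(-(shift / shift_scale) ** 2)
```

A threshold on the mask value (e.g., keeping only bins with mask above some cutoff) would correspond to the claimed phase shift threshold.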
As per claim 7, Hiroe in view of Hyun et al. further disclose causing processing of the audio input without the second portion of the audio input comprises not sending the second portion of the audio input (Hyun et al., paragraphs 13, 22, 30).
As per claim 10, Hiroe in view of Hyun et al. further disclose the audio data is associated with an audio input comprising a wake word received by the user device and wherein the audio data comprises one or more speech inputs (“Some of the speech recognition devices have a sound segment detection function. Further, the speech recognition device often has an STFT function to detect a speech feature, which function can be omitted on the side of the speech recognition side in the case of combining it with the present disclosure.”; Hiroe, paragraph 380, see also Hyun et al. paragraphs 6, 30).
As per claim 11, Hiroe in view of Hyun et al. further disclose processing the audio data comprises one or more of: speech recognition, speech to text transcription, determining one or more queries, determining one or more commands, executing one or more queries, or executing one or more commands (“Some of the speech recognition devices have a sound segment detection function…a direction specifying device may be operated by the user to set the sound source direction .theta..”; Hiroe, paragraphs 380, 546, see also Hyun et al. paragraphs 6, 30).
As per claim 12, Hiroe in view of Hyun et al. further disclose determining a phase shift associated with the audio data (“sound from in other directions are attenuated because they are shifted in phase”; paragraph 50).
As per claim 13, Hiroe in view of Hyun et al. further disclose sending, to the user device, based on the change in direction of the audio source, a change of direction indication (Hyun et al., paragraphs 13, 22, 30).
As per claim 14, Hiroe in view of Hyun et al. further disclose excluding audio data from processing or terminating one or more audio processing operations (“generate a sound signal composed of the target sound from which undesirable interference sounds are removed as much as possible”; Hiroe, paragraphs 50, 253, 341, see also Hyun et al., paragraphs 13, 22, 30).
As per claim 16, Hiroe in view of Hyun et al. further disclose the user device comprises a voice enabled device and wherein the first portion of the audio input comprises a wake word received by a user device (“Some of the speech recognition devices have a sound segment detection function…a direction specifying device may be operated by the user to set the sound source direction .theta..”; Hiroe, paragraphs 380, 546).
As per claim 17, Hiroe in view of Hyun et al. further disclose processing one or more of the first portion of the audio input or the second portion of the audio input (“a sound source extraction unit receives the sound direction and the sound segment of the target sound and extracts a sound-signal of the target sound.”; Hiroe, Abstract, paragraphs 50, 51, see also Hyun et al., paragraphs 13, 22, 30).
As per claim 18, Hiroe in view of Hyun et al. further disclose processing one or more of the first portion of the audio input or the second portion of the audio input comprises performing one or more of: natural language processing, natural language understanding, speech recognition, speech to text transcription, determining one or more queries, determining one or more commands, sending one or more responses, executing one or more queries, sending or receiving data, or executing one or more commands (“Some of the speech recognition devices have a sound segment detection function.”; Hiroe, paragraph 380, see also Hyun et al. paragraphs 6, 30).
As per claim 19, Hiroe in view of Hyun et al. further disclose the change in position of the audio source is associated with a phase shift of an audio input (“sound from in other directions are attenuated because they are shifted in phase”; Hiroe, paragraph 50).
As per claim 20, Hiroe in view of Hyun et al. further disclose receiving, by the user device, based on the indication that the first portion of the audio input comprises an end of speech, one or more of: a message indicating the second portion of the audio input has been excluded from processing or a message indicating one or more processing operations have been terminated (“generate a sound signal composed of the target sound from which undesirable interference sounds are removed as much as possible… A segment 602 is assumed to be a segment in which an interference sound is active before the target sound starts to be active. It is assumed that around the end of the segment 602 of the interference sound overlaps with the start of the segment 601 of the target sound time-wise and this overlapping region is denoted by an overlap region 611.”; Hiroe, paragraphs 50, 253, 341, see also Hyun et al., paragraphs 13, 22, 30).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD SAINT-CYR whose telephone number is (571)272-4247. The examiner can normally be reached Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LEONARD SAINT-CYR/ Primary Examiner, Art Unit 2658