DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/12/26 has been entered.
Response to Arguments
Applicant’s arguments, see pages 6 – 12, filed 01/12/26, with respect to claims 1 – 20 have been fully considered and are persuasive. The rejection of claims 1 – 20 under 35 U.S.C. 101 has been withdrawn.
Applicant argues that the combination of phase analysis, DOA tracking, motion-based EOS detection, and selective audio gating is far from generic data processing; rather, it is a carefully engineered, hardware-linked technique that improves the operation of audio-capture devices themselves. Courts consistently hold that such improvements to the functioning of a technological system constitute an inventive concept (Amendment, pages 6 – 12).
Applicant’s arguments, see pages 13 – 16, filed 01/12/26, with respect to the rejection of claims 1 – 20 under 35 U.S.C. 102 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Hyun et al. (US 2013/0238335).
Applicant argues that Hiroe does not teach causing, based on the difference between the first direction and the second direction, processing of the audio input without the second portion of the audio input; receiving, from the user device, based on a change in direction associated with the audio source, an end of speech indication; based on the end of speech indication, sending, to the user device, a response to a portion of the audio data received before the end of speech indication for output; determining, by a user device, based on first phase data associated with a first portion of an audio input and second phase data associated with a second portion of the audio input, a change in a position of an audio source associated with the audio input (Amendment, pages 13 – 16).
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hiroe (US 2012/0263315) in view of Hyun et al. (US 2013/0238335).
As per claim 1, Hiroe teaches a method comprising:
determining, by a user device, based on a first portion of an audio input, a first direction and a first phase associated with the first portion of the audio input (“signals observed with the different microphones and those observation signals are summed in condition where phases of the signals in a direction of a target sound are aligned, the target sound is emphasized because of aligned phase and sound from in other directions are attenuated because they are shifted in phase”; Abstract, paragraphs 50, 51);
determining, based on a second portion of the audio input, a second direction and a second phase associated with the second portion of the audio input (“signals observed with the different microphones and those observation signals are summed in condition where phases of the signals in a direction of a target sound are aligned, the target sound is emphasized because of aligned phase and sound from in other directions are attenuated because they are shifted in phase”; Abstract, paragraphs 50, 51);
determining, based on the first phase and the second phase, a difference between the first direction and the second direction (“providing the observation signals of the microphones with different time delays, aligning phases of the signals coming in the direction of the target sound, and summing the observation signals. In the results of the delay-and-sum array, the target sound is emphasized because of the aligned phase and the sounds coming in the other directions are attenuated because they are different in phase.”; paragraphs 50, 51, 341).
However, Hiroe does not specifically teach causing, based on the difference between the first direction and the second direction, processing of the audio input without the second portion of the audio input.
Hyun et al. disclose that the endpoint determination unit may be configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by the sound source maintenance time calculating unit and the change in position of the sound source according to each direction determined by the sound source position change determination unit…Sound source signals from a plurality of sound sources may be received from a plurality of microphones. Positions of the plurality of sound sources may be detected from the received sound source signals. A change in an angle of the sound source may be monitored at a predetermined time interval by reading the positions of the plurality of sound sources detected…in an environment having various sound sources, the existence and the length of the sound source being input by each direction are recognized and thus the sound source may be detected and the endpoint may be found, thereby improving the performance of a post processing (the sound source separation, the noise cancellation, the speech characteristic extraction, and the speech recognition). In particular, speech being input from a direction other than a direction of speech from a speaker vocalized at a remote area from a sound source collecting unit is distinguished while the speech from the speaker is being recorded and thus the endpoint of the speech may be detected, thereby enabling a remote sound source recognition without restriction on the installation region of a microphone (paragraphs 13, 22, 30).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine the endpoint of a sound signal and to process an audio input without the second portion of the audio input, as taught by Hyun et al., in Hiroe, because that would help improve the performance of the post-processing (Abstract, paragraph 63).
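For illustration only — the following sketch is not part of Hiroe or Hyun et al., and all function names, parameters, and values are hypothetical — the delay-and-sum principle quoted above (estimating direction from the inter-microphone phase difference, then aligning phases toward the target so that the target sums coherently while off-axis sound partially cancels) can be expressed as:

```python
import numpy as np

def estimate_direction(sig_a, sig_b, fs, mic_spacing, c=343.0):
    """Estimate a direction of arrival (degrees) from the phase
    difference between two microphones at the dominant frequency bin
    (far-field, single-source assumption)."""
    spec_a = np.fft.rfft(sig_a)
    spec_b = np.fft.rfft(sig_b)
    k = np.argmax(np.abs(spec_a[1:])) + 1            # dominant bin, skipping DC
    freq = k * fs / len(sig_a)
    phase_diff = np.angle(spec_b[k]) - np.angle(spec_a[k])
    phase_diff = (phase_diff + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    delay = phase_diff / (2 * np.pi * freq)          # inter-mic time delay
    sin_theta = np.clip(delay * c / mic_spacing, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))

def delay_and_sum(sig_a, sig_b, fs, delay_s):
    """Time-shift sig_b by delay_s (via a linear-phase term) so the
    target's phases align with sig_a, then sum: the target adds
    coherently while sounds from other directions are attenuated."""
    n = len(sig_b)
    spec_b = np.fft.rfft(sig_b)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    aligned_b = np.fft.irfft(spec_b * np.exp(2j * np.pi * freqs * delay_s), n)
    return sig_a + aligned_b
```

Under these assumptions, a tone arriving earlier at one microphone yields a phase difference proportional to frequency and delay, from which the angle follows; summing after alignment roughly doubles the target's amplitude.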
As per claim 8, Hiroe teaches a method comprising:
receiving, from a user device associated with an audio source, audio data; processing the audio data; (“a sound source extraction unit receives the sound direction and the sound segment of the target sound and extracts a sound-signal of the target sound.”; Abstract, paragraph 50);
receiving, from the user device, based on a change in direction associated with the audio source (“the target sound is emphasized because of aligned phase and sound from in other directions are attenuated because they are shifted in phase respectively… A segment 602 is assumed to be a segment in which an interference sound is active before the target sound starts to be active. It is assumed that around the end of the segment 602 of the interference sound overlaps with the start of the segment 601 of the target sound time-wise and this overlapping region is denoted by an overlap region 611.”; paragraphs 50, 51, 34, 425, see also 375).
However, Hiroe does not specifically teach an end of speech indication; based on the end of speech indication, sending, to the user device, a response to a portion of the audio data received before the end of speech indication for output.
Hyun et al. disclose that the endpoint determination unit may be configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by the sound source maintenance time calculating unit and the change in position of the sound source according to each direction determined by the sound source position change determination unit…Sound source signals from a plurality of sound sources may be received from a plurality of microphones. Positions of the plurality of sound sources may be detected from the received sound source signals. A change in an angle of the sound source may be monitored at a predetermined time interval by reading the positions of the plurality of sound sources detected…in an environment having various sound sources, the existence and the length of the sound source being input by each direction are recognized and thus the sound source may be detected and the endpoint may be found, thereby improving the performance of a post processing (the sound source separation, the noise cancellation, the speech characteristic extraction, and the speech recognition). In particular, speech being input from a direction other than a direction of speech from a speaker vocalized at a remote area from a sound source collecting unit is distinguished while the speech from the speaker is being recorded and thus the endpoint of the speech may be detected, thereby enabling a remote sound source recognition without restriction on the installation region of a microphone (paragraphs 13, 22, 30).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine the endpoint of a sound signal and to process an audio input without the second portion of the audio input, as taught by Hyun et al., in Hiroe, because that would help improve the performance of the post-processing (Abstract, paragraph 63).
As per claim 15, Hiroe teaches a method comprising:
determining, by a user device, based on first phase data associated with a first portion of an audio input and second phase data associated with a second portion of the audio input, a change in a position of an audio source associated with the audio input (“providing the observation signals of the microphones with different time delays, aligning phases of the signals coming in the direction of the target sound, and summing the observation signals. In the results of the delay-and-sum array, the target sound is emphasized because of the aligned phase and the sounds coming in the other directions are attenuated because they are different in phase.”; Abstract, paragraphs 50, 51, 341).
However, Hiroe does not specifically teach based on the change in the position of the audio source, sending the first portion of the audio input and an indication that the first portion of the audio input comprises an end of speech.
Hyun et al. disclose that the endpoint determination unit may be configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by the sound source maintenance time calculating unit and the change in position of the sound source according to each direction determined by the sound source position change determination unit…Sound source signals from a plurality of sound sources may be received from a plurality of microphones. Positions of the plurality of sound sources may be detected from the received sound source signals. A change in an angle of the sound source may be monitored at a predetermined time interval by reading the positions of the plurality of sound sources detected…in an environment having various sound sources, the existence and the length of the sound source being input by each direction are recognized and thus the sound source may be detected and the endpoint may be found, thereby improving the performance of a post processing (the sound source separation, the noise cancellation, the speech characteristic extraction, and the speech recognition). In particular, speech being input from a direction other than a direction of speech from a speaker vocalized at a remote area from a sound source collecting unit is distinguished while the speech from the speaker is being recorded and thus the endpoint of the speech may be detected, thereby enabling a remote sound source recognition without restriction on the installation region of a microphone (paragraphs 13, 22, 30).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine the endpoint of a sound signal and to send the first portion of the audio input, as taught by Hyun et al., in Hiroe, because that would help improve the performance of the post-processing (Abstract, paragraph 63).
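For illustration only — this sketch is not part of Hyun et al., and the function name, thresholds, and frame convention are hypothetical — the endpoint-determination idea quoted above (monitoring the sound-source angle at a time interval and declaring an endpoint when input no longer comes from the speaker's direction) can be expressed as:

```python
def detect_endpoint(angles_deg, speaker_angle_deg=0.0,
                    angle_tol_deg=15.0, hold_frames=5):
    """Return the index of the first frame of an end-of-speech event:
    the per-frame direction estimate has stayed outside the speaker's
    tolerance band for hold_frames consecutive frames. Returns None
    when no endpoint is found (speech continues from the speaker)."""
    off_axis = 0
    for i, ang in enumerate(angles_deg):
        if abs(ang - speaker_angle_deg) > angle_tol_deg:
            off_axis += 1
            if off_axis >= hold_frames:
                return i - hold_frames + 1   # first off-axis frame
        else:
            off_axis = 0                     # back on-axis; reset the count
    return None
```

The hold count is one plausible way to realize the "sound source maintenance time" notion: a brief off-axis excursion shorter than hold_frames does not trigger an endpoint.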
As per claims 2, 9, Hiroe in view of Hyun et al. further disclose the user device comprises a voice enabled device (“Some of the speech recognition devices have a sound segment detection function…a direction specifying device may be operated by the user to set the sound source direction .theta..”; Hiroe, paragraphs 380, 546).
As per claim 3, Hiroe in view of Hyun et al. further disclose sending, based on the difference between the first direction and the second direction, an end of speech indication, wherein the end of speech indication is configured to cause one or more of: an exclusion from processing or a termination of one or more audio processing functions (“the endpoint determination unit may be configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by the sound source maintenance time calculating unit and the change in position of the sound source according to each direction determined by the sound source position change determination unit”; Hyun et al., paragraphs 13, 22, 30).
As per claim 4, Hiroe in view of Hyun et al. further disclose causing, based on the end of speech indication, termination of one or more audio processing functions (“the endpoint determination unit may be configured to determine endpoints of the plurality of sound sources by use of the sound source maintenance time calculated by the sound source maintenance time calculating unit and the change in position of the sound source according to each direction determined by the sound source position change determination unit… in an environment having various sound sources, the existence and the length of the sound source being input by each direction are recognized and thus the sound source may be detected and the endpoint may be found, thereby improving the performance of a post processing (the sound source separation, the noise cancellation, the speech characteristic extraction, and the speech recognition)”; Hyun et al., paragraphs 13, 22, 30).
As per claim 5, Hiroe in view of Hyun et al. further disclose determining the second direction associated with the second portion of the audio input comprises determining a phase shift (“sound from in other directions are attenuated because they are shifted in phase”; Hiroe, paragraph 50).
As per claim 6, Hiroe in view of Hyun et al. further disclose the phase shift comprises a phase difference determined between one or more microphones associated with the user device, the method further comprising determining the phase shift satisfies a phase shift threshold (“signals observed with the different microphones and those observation signals are summed in condition where phases of the signals in a direction of a target sound are aligned…a shift between the phase difference dot and the straight line, that is, a shift 32 shown in FIG. 3 is calculated, so that the larger this value is, the nearer the M(.omega., t) in Equation [2.2] is set to 0, inversely, the nearer the phase difference dot is to the straight line, the nearer the M(.omega., t) is set to 1.”; Hiroe, paragraphs 50, 76, see also Hyun et al., paragraphs 65 – 76, 81).
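For illustration only — this sketch is not Hiroe's Equation [2.2], and the function name, the Gaussian mapping, and the shift_scale parameter are hypothetical — the phase-difference test quoted above (the expected inter-microphone phase difference for a given source delay lies on a straight line over frequency; the further the observed difference departs from that line, the closer the mask is set to 0) can be expressed as:

```python
import numpy as np

def phase_mask(phase_diff, freqs, tau, shift_scale=0.5):
    """Soft time-frequency mask from a phase-difference test: for a
    source at inter-mic delay tau, the expected phase difference is the
    straight line 2*pi*f*tau. The wrapped deviation from that line is
    mapped to (0, 1]: on the line -> ~1, far from it -> ~0."""
    expected = 2 * np.pi * freqs * tau
    shift = np.angle(np.exp(1j * (phase_diff - expected)))  # wrapped deviation
    return np.exp(-(shift / shift_scale) ** 2)
```

A threshold on the mask value (e.g., keeping only bins with mask above some cutoff) would correspond to the claimed phase shift threshold.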
As per claim 7, Hiroe in view of Hyun et al. further disclose causing processing of the audio input without the second portion of the audio input comprises not sending the second portion of the audio input (Hyun et al., paragraphs 13, 22, 30).
As per claim 10, Hiroe in view of Hyun et al. further disclose the audio data is associated with an audio input comprising a wake word received by the user device and wherein the audio data comprises one or more speech inputs (“Some of the speech recognition devices have a sound segment detection function. Further, the speech recognition device often has an STFT function to detect a speech feature, which function can be omitted on the side of the speech recognition side in the case of combining it with the present disclosure.”; Hiroe, paragraph 380, see also Hyun et al. paragraphs 6, 30).
As per claim 11, Hiroe in view of Hyun et al. further disclose processing the audio data comprises one or more of: speech recognition, speech to text transcription, determining one or more queries, determining one or more commands, executing one or more queries, or executing one or more commands (“Some of the speech recognition devices have a sound segment detection function…a direction specifying device may be operated by the user to set the sound source direction .theta..”; Hiroe, paragraphs 380, 546, see also Hyun et al. paragraphs 6, 30).
As per claim 12, Hiroe in view of Hyun et al. further disclose determining a phase shift associated with the audio data (“sound from in other directions are attenuated because they are shifted in phase”; paragraph 50).
As per claim 13, Hiroe in view of Hyun et al. further disclose sending, to the user device, based on the change in direction of the audio source, a change of direction indication (Hyun et al., paragraphs 13, 22, 30).
As per claim 14, Hiroe in view of Hyun et al. further disclose excluding audio data from processing or terminating one or more audio processing operations (“generate a sound signal composed of the target sound from which undesirable interference sounds are removed as much as possible”; Hiroe, paragraphs 50, 253, 341, see also Hyun et al., paragraphs 13, 22, 30).
As per claim 16, Hiroe in view of Hyun et al. further disclose the user device comprises a voice enabled device and wherein the first portion of the audio input comprises a wake word received by a user device (“Some of the speech recognition devices have a sound segment detection function…a direction specifying device may be operated by the user to set the sound source direction .theta..”; Hiroe, paragraphs 380, 546).
As per claim 17, Hiroe in view of Hyun et al. further disclose processing one or more of the first portion of the audio input or the second portion of the audio input (“a sound source extraction unit receives the sound direction and the sound segment of the target sound and extracts a sound-signal of the target sound.”; Hiroe, Abstract, paragraphs 50, 51, see also Hyun et al., paragraphs 13, 22, 30).
As per claim 18, Hiroe in view of Hyun et al. further disclose processing one or more of the first portion of the audio input or the second portion of the audio input comprises performing one or more of: natural language processing, natural language understanding, speech recognition, speech to text transcription, determining one or more queries, determining one or more commands, sending one or more responses, executing one or more queries, sending or receiving data, or executing one or more commands (“Some of the speech recognition devices have a sound segment detection function.”; Hiroe, paragraph 380, see also Hyun et al. paragraphs 6, 30).
As per claim 19, Hiroe in view of Hyun et al. further disclose the change in position of the audio source is associated with a phase shift of an audio input (“sound from in other directions are attenuated because they are shifted in phase”; Hiroe, paragraph 50).
As per claim 20, Hiroe in view of Hyun et al. further disclose receiving, by the user device, based on the indication that the first portion of the audio input comprises an end of speech, one or more of: a message indicating the second portion of the audio input has been excluded from processing or a message indicating one or more processing operations have been terminated (“generate a sound signal composed of the target sound from which undesirable interference sounds are removed as much as possible… A segment 602 is assumed to be a segment in which an interference sound is active before the target sound starts to be active. It is assumed that around the end of the segment 602 of the interference sound overlaps with the start of the segment 601 of the target sound time-wise and this overlapping region is denoted by an overlap region 611.”; Hiroe, paragraphs 50, 253, 341, see also Hyun et al., paragraphs 13, 22, 30).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD SAINT-CYR whose telephone number is (571)272-4247. The examiner can normally be reached Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LEONARD SAINT-CYR/ Primary Examiner, Art Unit 2658