Last updated: May 04, 2026
Application No. 18/832,053
COMPUTER-IMPLEMENTED METHOD FOR DETECTING ACTIVITY IN AN AUDIO STREAM

Non-Final OA §103
Filed: Jul 22, 2024
Priority: Aug 31, 2022 (FI 20225762, +1 more)
Examiner: OGUNBIYI, OLUWADAMILOL M
Art Unit: 2653
Tech Center: 2600 (Communications)
Assignee: Elisa Oyj
OA Round: 5 (Non-Final)
Interview Optional

Interview lift: +18.5%. An interview has already been conducted in this application's prosecution history. This examiner has a 78% grant rate. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 306 resolved cases, 2023–2026
Examiner Intelligence

OGUNBIYI, OLUWADAMILOL M
Career Allowance Rate: 78% (238 granted / 306 resolved), above average, +15.8% vs TC avg
Interview Lift: +18.5% (resolved cases with interview)
Avg Prosecution: 2y 11m (30 currently pending)
Total Applications: 336 (across all art units)
Statute-Specific Performance

§101: 20.1% (-19.9% vs TC avg)
§103: 47.1% (+7.1% vs TC avg)
§102: 12.1% (-27.9% vs TC avg)
§112: 13.6% (-26.4% vs TC avg)
Based on career data from 306 resolved cases
Office Action

§103
DETAILED ACTION
Claims 1 and 6 – 16 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed on 11 September 2025 has been entered.
Response to Amendment
With regard to the Final Office Action from 23 July 2025, the Applicant has filed a response on 11 September 2025.
Response to Arguments
The Applicant disagrees (Remarks: page 9 par 3) with the Examiner’s 35 U.S.C. 103 rejection of the independent claim 1 as previously presented. Applicant’s arguments with respect to this claim have been considered but are moot because of the new ground of rejection necessitated by the amendment to the claim. The claims will be addressed by their current presentation in the following section.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 9, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Torgrim US (U.S. 5,724,420 A) in view of Huang et al. (WO 2014/194273: hereafter — Huang) further in view of Swvigaradoss et al. (US 2022/0093090 A1: hereafter — Swvigaradoss), further in view of Myers (US 2020/0159550 A1), and further in view of SHIN et al. (US 2022/0270617 A1: hereafter — Shin).
For claim 1, Torgrim discloses a computer-implemented method for detecting activity in an audio stream (Torgrim), the method comprising:
obtaining an audio stream (Torgrim: FIGs. 3, 4, Col 5 lines 35-37 — a DSP that receives audio signals); and
detecting activity in the audio stream (Torgrim: Col 6 lines 45–54 — detecting voice energy (indicating the detecting of an activity in an audio stream)) based on detection criteria, wherein the detection criteria comprise at least two of:
an audio amplitude threshold, wherein sections of the audio stream with an audio amplitude less than the audio amplitude threshold are classified as inactive (Torgrim: Col 7 lines 46-48 — detecting voice energy falling below a preselected relative threshold level to indicate the absence of voice energy (the voice energy being an indication of the claimed audio amplitude));
a detection delay defining a time interval of the audio stream during which activity in the audio stream is ignored (Torgrim: Col 8 lines 10-16 — a delay timer that prevents the starting of a pause timer to measure the preselected pause time limit for a determination of live voice; FIGs. 3, 4, Col 6 line 60- Col 7 line 4 — a delay timer which provides a predetermined time before initiating a pause timer that is applied to detecting voice activity, in this time period, enablement of the pause timer is prevented (to indicate a predetermined timer in place that voice activity gets ignored); Col 8 lines 59–65 — provides that only after a delay timer has elapsed, would the pause timer then be enabled so as to start finding indication of a live voice);
a minimum activity duration defining a minimum duration for an active section in the audio stream (Torgrim: Col 8 lines 42-45 — a minimum time period is established for the initially measured pause in the voice energy which must be exceeded to indicate the presence of a live voice; Col 9 lines 1-3 — if the pause timer times out after measuring any pause after the lapse of the delay time interval, a finding of a live voice is determined (providing a minimum amount of time that must be exceeded for voice activity to be detected)); and/or
a maximum inactivity duration defining a maximum duration of inactivity in the audio stream;
wherein the detecting activity in the audio stream based on detection criteria comprises:
waiting for the detection delay (Torgrim: Col 8 lines 10-16 — a delay timer that prevents the starting of a pause timer to measure the preselected pause time limit for a determination of live voice; FIGs. 3, 4, Col 6 line 60- Col 7 line 4 — a delay timer which provides a predetermined time before initiating a pause timer that is applied to detecting voice activity, in this time period, enablement of the pause timer is prevented (to indicate a predetermined timer in place that voice activity gets ignored));
after the detection delay, continuously comparing the audio amplitude of the audio stream to the audio amplitude threshold (Torgrim: FIG. 2B Steps 118 → 120 — checking if voice energy is detected and if so, resetting a pause timer (the pause timer occurring after the delay timer as in FIGs. 3, 4) Col 7 lines 46-48 — ‘Preferably, when the detected voice energy falls below a preselected relative threshold level representing the absence of voice energy, the pause timer is enabled and begins to run’ (indicating that after the delay timer, audio amplitude is monitored to detect if it is above or below the audio level threshold))
wherein the audio stream corresponds to a voice call (Torgrim: Col 2 lines 60-63 — audio voice energy from answered telephone calls).
The reference of Torgrim provides teaching for the presence of a detection delay time period before beginning to detect activity in an audio stream. It differs from the claimed invention in that the claimed invention now further provides an activity indication when the audio amplitude of the audio stream exceeds an amplitude threshold for at least a minimum activity duration. The reference of Huang is now introduced to teach this as:
in response to the audio amplitude of the audio stream exceeding the audio amplitude threshold, checking whether the audio amplitude of the audio stream exceeds the audio amplitude threshold for at least the minimum activity duration (Huang: Page 32 lines 1-5 — ‘When the input short time energy exceeds the speech threshold for a period of time (N), it decides that the speech has started’ (indicating that an amplitude threshold is checked as the short time energy, to know if the amplitude of the signal exceeds this threshold)); and
in response to the audio amplitude of the audio stream exceeding the audio amplitude threshold for at least the minimum activity duration, providing an activity indication (Huang: Page 32 lines 1-5 — ‘When the input short time energy exceeds the speech threshold for a period of time (N), it decides that the speech has started’ (indicating that an amplitude threshold is checked as the short time energy, to know if the amplitude of the signal exceeds this threshold and make the decision that activity has been detected)).
The reference of Torgrim teaches the presence of a detection delay time period before beginning to detect activity in an audio stream. This reference of Torgrim fails to teach detecting the presence of activity based on an amplitude of audio stream exceeding an amplitude threshold for at least a minimum activity duration. The reference of Huang is instead made available to teach this.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Huang which detects activity in an audio signal based on a short time energy of the audio signal exceeding a threshold over a period of time, with the technique of Torgrim which places a detection delay before beginning to detect activity in an audio signal, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of being informed of the presence of activity in an audio signal only at the point that activity is desired to be detected, while ensuring that the activity is consistent by being detected for longer than a minimum period. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
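For orientation only, the claim 1 detection sequence discussed above (wait out the detection delay, then compare the amplitude to the threshold, then require the threshold to be met for at least the minimum activity duration) can be sketched in Python. This is an illustrative sketch under stated assumptions, not the applicant's implementation and not the method of Torgrim or Huang. The function name detect_activity, the representation of the audio stream as fixed-length amplitude frames, and all parameter names are hypothetical.

```python
from typing import Optional, Sequence


def detect_activity(
    amplitudes: Sequence[float],
    frame_ms: int,
    threshold: float,
    detection_delay_ms: int,
    min_activity_ms: int,
) -> Optional[int]:
    """Return the start time in ms of the first qualifying active section, or None.

    Assumption: the audio stream is given as one amplitude value per
    fixed-length frame of frame_ms milliseconds.
    """
    run_start: Optional[int] = None  # frame index where the current loud run began
    for i, amp in enumerate(amplitudes):
        if i * frame_ms < detection_delay_ms:
            continue  # activity during the detection delay is ignored
        if amp < threshold:
            run_start = None  # below the threshold: classified as inactive
            continue
        if run_start is None:
            run_start = i
        # Provide the activity indication only once the loud run is long enough.
        if (i - run_start + 1) * frame_ms >= min_activity_ms:
            return run_start * frame_ms
    return None
```

In this sketch a loud burst that falls inside the detection delay, or one shorter than the minimum activity duration, produces no indication, which mirrors the two filtering roles the rejection assigns to Torgrim and Huang respectively.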
The combination of Torgrim in view of Huang fails to teach the further limitations of this claim, for which the reference of Swvigaradoss is now introduced to teach as:
providing an audio prompt to a user, wherein the audio prompt requests the user to perform an action (Swvigaradoss: [0051] — the user is prompted by a voice command so that the user may provide a rephrased speech (the request by the system for the user to rephrase speech input is by itself, an action required of the user));
identifying when the user has performed the action based on the detecting the activity in the audio stream (Swvigaradoss: [0051] — the user is prompted to provide a rephrased speech input and is able to determine that the user has provided the new speech input (the detection of the user having performed the prompted action is based on the process returning to 402, and an identification that the user has performed the requested action based on the detection of activity in the audio stream)); and
in response to identifying when the user has performed the action, performing at least one processing action (Swvigaradoss: [0051] — after the user provides the new speech input after the system has prompted the user to do so, the system performs the further task of processing the new speech input (that was received in response to the request)).
The combination of Torgrim in view of Huang provides teaching for obtaining an audio stream and detecting activity in the audio stream. This differs from the claimed invention in that the claimed invention further provides teaching for providing the user with an audio prompt to request the user perform an action which is detected and in response, having the system perform at least a processing action. This isn’t new to the art as the reference of Swvigaradoss is seen to teach this above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Swvigaradoss which prompts a user to perform an action, and upon detecting the action in audio, further performing a processing action, with the technique of the combination of Torgrim in view of Huang which teaches obtaining an audio stream and detecting activity in the audio stream, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of making the process more engaging with the user through the provision of the user with a prompt and also performing further processes based on determining the user’s response to the prompt, ensuring interactive dialogue/communication. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
The combination of Torgrim in view of Huang further in view of Swvigaradoss fails to disclose teaching for the action for the user being unrelated to the voice call, for which the reference of Myers is now introduced to teach as:
wherein the action that the user is requested to perform by the audio prompt is unrelated to the voice call (Myers: [0044] — ‘IVR platform is configured to, based on the DOM from the cobrowse client, determine a next user interface action along the shortest user interface path and generate a voice prompt for the user based on the next user interface action’ (this being a prompt from the system to the user to perform an action, that does not happen to be related to the voice call)), and
[[wherein the at least one processing action comprises performing a speech-to-text conversion on a section of the audio stream detected to be the active section, and]] determining whether the action that is requested of the user by the audio prompt was performed successfully (Myers: [0346]–[0347] — determining if the user has completed or not completed a prompted task).
The combination of Torgrim in view of Huang further in view of Swvigaradoss provides teaching for providing an audio prompt to a user, but differs from the claimed invention in that the claimed invention further provides teaching for the action being requested of the user being unrelated to the voice call, and also determining whether the action has been performed. This isn’t new to the art as the reference of Myers is seen to teach this above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Myers which is able to determine that a user has performed the prompted task, with the technique of the combination of Torgrim in view of Huang further in view of Swvigaradoss which teaches providing an audio prompt to a user, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of being able to move the entire process forward after completion of a current step, so that instructing the user through a series of steps can be sequentially performed knowing that each individual step has been completed. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
The combination of Torgrim in view of Huang further in view of Swvigaradoss and further in view of Myers provides teaching for performing voice activity detection, but differs from the claimed invention in that the claimed invention further provides teaching for performing speech-to-text conversion on sections of the audio determined to contain speech. This isn’t new to the art as the reference of Shin is introduced to teach this as:
wherein the at least one processing action comprises performing a speech-to-text conversion on a section of the audio stream detected to be the active section, [[and determining whether the action that is requested of the user by the audio prompt was performed successfully]] (Shin: [0132] — ‘The ASR 606 (e.g., the automatic speech recognition module 322a of FIG. 3) may convert a user voice in a speech section recognized by the VAD module 605 in the user audio signal 711 received through the audio separation module 603 into user text data’ (teaching that the section where voice activity is detected has its speech converted into text, this being a performed processing action)).
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Shin which performs speech recognition on the section of audio in which voice activity is detected, with the technique of the combination of Torgrim in view of Huang further in view of Swvigaradoss and further in view of Myers which teaches the general voice activity detection, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of being able to reduce the required computation cost by not having to process audio segments that don’t contain speech. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
For claim 9, claim 1 is incorporated and the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin discloses the computer-implemented method according to claim 1, wherein the detection criteria comprise at least three of or all of: the audio amplitude threshold (Torgrim: Col 7 lines 46-48 — detecting voice energy falling below a preselected relative threshold level to indicate the absence of voice energy (the voice energy being an indication of the claimed audio amplitude)), the detection delay (Torgrim: Col 8 lines 10-16 — a delay timer that prevents the starting of a pause timer to measure the preselected pause time limit for a determination of live voice; FIGs. 3, 4, Col 6 line 60- Col 7 line 4 — a delay timer which provides a predetermined time before initiating a pause timer that is applied to detecting voice activity, in this time period, enablement of the pause timer is prevented (to indicate a predetermined timer in place that voice activity gets ignored)), the minimum activity duration (Torgrim: Col 8 lines 42-45 — a minimum time period is established for the initially measured pause in the voice energy which must be exceeded to indicate the presence of a live voice; Col 9 lines 1-3 — if the pause timer times out after measuring any pause after the lapse of the delay time interval, a finding of a live voice is determined (providing a minimum amount of time that must be exceeded for voice activity to be detected)), and/or the maximum inactivity duration.
As for claim 14, computing device claim 14 and method claim 1 are related as device and the method of using same, with each claimed element’s function corresponding to the claimed method step. Torgrim in Col 4 lines 1–2 provides a central processing unit and a main memory suitable to read upon the limitations of this claim. Accordingly, claim 14 is similarly rejected under the same rationale as applied above with respect to method claim 1.
As for claim 15, computer program product claim 15 and method claim 1 are related as computer program product storing executable instructions required for performing the claimed method steps on a computer. Torgrim in Col 4 lines 1–2 provides a memory, and in Col 6 lines 6–9 provides the necessary computer code to run, which reads upon the limitations of this claim. Accordingly, claim 15 is similarly rejected under the same rationale as applied above with respect to method claim 1.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Torgrim US (U.S. 5,724,420 A) in view of Huang (WO 2014/194273) further in view of Swvigaradoss (US 2022/0093090 A1), further in view of Myers (US 2020/0159550 A1), further in view of Shin (US 2022/0270617 A1) as applied to claim 1, and further in view of Weingartner (US 2018/0012595 A1).
For claim 6, claim 1 is incorporated but the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin, fails to teach the limitation of this claim, for which Weingartner is now introduced to teach as the computer-implemented method, wherein the detection delay starts from an end of the audio prompt (Weingartner: [0013] — having a pause to wait for a user input after the user has been presented with a prompt).
The combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin provides teaching for the presence of a pause timer period as an indication of a detection delay. This combination differs from the claimed invention in that the claimed invention further provides teaching for starting the detection delay from an end of an audio prompt. The reference of Weingartner is however made available to teach this.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to substitute the pause occurring after an audio prompt as taught by Weingartner for the pause timer period taught by the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin, to thereby come up with the claimed invention. Such a substitution would present the detection delay period as a pause after an audio prompt has been provided, the predictable result being that the user is granted a short period of composure in which to properly respond to the audible prompt. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Torgrim US (U.S. 5,724,420 A) in view of Huang (WO 2014/194273) further in view of Swvigaradoss (US 2022/0093090 A1), further in view of Myers (US 2020/0159550 A1), further in view of Shin (US 2022/0270617 A1) as applied to claim 1, further in view of Kondziela (US 2014/0142952 A1).
For claim 7, claim 1 is incorporated but the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin, fails to teach the limitations of this claim, for which Kondziela is now introduced to teach as the computer-implemented method, the method further comprising:
after providing the audio prompt to the user, starting a polling period, wherein the polling period starts from the end of the audio prompt (Kondziela: [0060] — outputting an audio prompt to a user and monitoring for a response until a timeout is reached (the timeout period being the claimed polling period)); and
in response to no activity being detected during the polling period providing another audio prompt to the user (Kondziela: [0060] — in a situation where no response is received, the user prompt may be repeated).
The combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin, provides teaching for providing an audio prompt to the user. This combination differs from the claimed invention in that the claimed invention further provides starting a polling period after the audio prompt, and if no activity is detected in the polling period, providing another audio prompt to the user. This is however not new to the art as the reference of Kondziela is introduced to teach this.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Kondziela which monitors for a user response after a prompt until a timeout is achieved, and the prompt gets repeated, with the presentation of an audio prompt as taught by the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of presenting the user with enough time to respond to the prompt and another opportunity to respond if the first opportunity went by. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
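The claim 7 behaviour discussed above (start a polling period at the end of the audio prompt and, if no activity is detected within it, provide another prompt) can be sketched as a simple loop. This is a hedged illustration only, not Kondziela's or the applicant's implementation. The callables play_prompt and poll_for_activity, the function name, and the max_prompts cap are all assumptions; in particular, the claim language quoted above does not recite any limit on the number of repeated prompts.

```python
from typing import Callable, Optional


def prompt_until_activity(
    play_prompt: Callable[[], None],
    poll_for_activity: Callable[[int], bool],
    polling_period_ms: int,
    max_prompts: int = 3,
) -> Optional[int]:
    """Return the 1-based attempt on which activity was detected, or None.

    play_prompt is assumed to block until the audio prompt has finished, so
    the polling period effectively starts from the end of the prompt.
    poll_for_activity(ms) is assumed to return True if activity was detected
    within the given polling period.
    """
    for attempt in range(1, max_prompts + 1):
        play_prompt()
        if poll_for_activity(polling_period_ms):
            return attempt
        # No activity during the polling period: loop and prompt again.
    return None
```

Passing the detector in as a callable keeps the re-prompt logic independent of how activity is actually detected.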
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Torgrim US (U.S. 5,724,420 A) in view of Huang (WO 2014/194273) further in view of Swvigaradoss (US 2022/0093090 A1), further in view of Myers (US 2020/0159550 A1), further in view of Shin (US 2022/0270617 A1), and further in view of Kondziela (US 2014/0142952 A1) as applied to claim 7, and further in view of Goldman et al (US 2010/0303214 A1: hereafter — Goldman).
For claim 8, claim 7 is incorporated but the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, further in view of Shin, and further in view of Kondziela fails to teach the limitations of this claim, for which Goldman is now introduced to instead teach as the computer-implemented method, the method further comprising, before the detecting activity in the audio stream, adjusting the detection delay, the minimum activity duration, the maximum inactivity duration, and/or the polling period according to the action (Goldman: [0004] — a gap being detected after an initial prompt; [0024] — a gap being a period of no detectable voice activity; [0028] — adjusting a time period for the gap (the gap here is akin to the claimed detection delay)).
The combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, further in view of Shin, and further in view of Kondziela provides teaching for detecting activity in an audio stream whereby a detection delay is present where voice activity is not detected. This combination however differs from the claimed invention in that the claimed invention further provides adjusting a detection delay. The reference of Goldman is however seen to teach this.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Goldman which provides adjusting a detection delay as a gap period, with the presence of a detection delay as taught by the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin and further in view of Kondziela, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of presenting the user with enough time to think up and respond to the prompt presented to the user. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
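One way to picture the claim 8 limitation (adjusting the detection delay, minimum activity duration, maximum inactivity duration, and/or polling period according to the requested action before detection begins) is a per-action parameter lookup. The sketch below is an assumption-laden illustration, not Goldman's gap adjustment and not the applicant's implementation. The action names, default values, and override values are invented purely for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DetectionParams:
    detection_delay_ms: int
    min_activity_ms: int
    max_inactivity_ms: int
    polling_period_ms: int


# Hypothetical baseline values, chosen only for illustration.
DEFAULTS = DetectionParams(
    detection_delay_ms=500,
    min_activity_ms=300,
    max_inactivity_ms=5000,
    polling_period_ms=10000,
)

# Hypothetical actions: a slow physical task gets a longer delay and polling
# period, while a quick yes/no confirmation gets a shorter inactivity limit.
ACTION_OVERRIDES = {
    "read_serial_number": {"detection_delay_ms": 2000, "polling_period_ms": 30000},
    "confirm_yes_no": {"max_inactivity_ms": 3000},
}


def params_for_action(action: str) -> DetectionParams:
    """Return detection parameters adjusted for the requested action."""
    return replace(DEFAULTS, **ACTION_OVERRIDES.get(action, {}))
```

An action with no entry in the table simply falls back to the defaults.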
Claims 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Torgrim US (U.S. 5,724,420 A) in view of Huang (WO 2014/194273) further in view of Swvigaradoss (US 2022/0093090 A1), further in view of Myers (US 2020/0159550 A1), further in view of Shin (US 2022/0270617 A1) as applied to claim 1, and further in view of Martinson et al. (US 2022/0366904 A1: hereafter — Martinson).
For claim 10, claim 1 is incorporated but the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin fails to disclose the limitation of this claim, for which Martinson is now introduced to teach as the computer-implemented method, the method further comprising:
in response to the maximum inactivity duration being exceeded without activity being detected in the audio stream, providing a no-activity indication (Martinson: [0131] — after a user is prompted and there is no response after a threshold amount of time (this being the maximum inactivity duration), leading the system to determine a time-out and close the microphone (as a no-activity indication)).
The combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin provides teaching for detecting activity in an audio stream, but differs from the claimed invention in that the claimed invention further provides teaching for exceeding a maximum inactivity duration without detecting any activity, and then providing a no-activity indication. The reference of Martinson is however seen to teach this as presented above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Martinson which closes a microphone as a no-activity indication after no response is detected for a threshold period of time, with the teaching of detecting activity in an audio stream as taught by the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of clearly informing the user that in a period in which speech should have been detected, no detectable speech by the user was received. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
For claim 11, claim 1 is incorporated and the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin further in view of Martinson discloses the computer-implemented method according to claim 10, the method further comprising:
in response to the no-activity indication, providing an inactivity audio prompt to the user (Martinson: [0131] — playing an inactive sound (as the prompt to the user for a no-activity indication)).
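The claim 10 and claim 11 limitations discussed above (a no-activity indication once the maximum inactivity duration is exceeded, followed by an inactivity audio prompt) can be sketched as follows. This is a simplified illustration under stated assumptions, not Martinson's time-out logic or the applicant's implementation. The frame-based input, the function names, the string labels, and the prompt wording are hypothetical, and for brevity a single frame at or above the threshold is treated as activity, ignoring the minimum activity duration.

```python
from typing import Optional, Sequence


def first_indication(
    amplitudes: Sequence[float],
    frame_ms: int,
    threshold: float,
    max_inactivity_ms: int,
) -> str:
    """Return "activity", "no_activity", or "undecided" for a run of frames."""
    silent_ms = 0
    for amp in amplitudes:
        if amp >= threshold:
            return "activity"
        silent_ms += frame_ms
        if silent_ms > max_inactivity_ms:
            return "no_activity"  # maximum inactivity duration exceeded
    return "undecided"  # stream ended before either condition was met


def follow_up_prompt(indication: str) -> Optional[str]:
    """Return the text of an inactivity prompt for a no-activity indication."""
    if indication == "no_activity":
        return "We did not hear anything. Please try again."  # hypothetical wording
    return None
```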
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Torgrim US (U.S. 5,724,420 A) in view of Huang (WO 2014/194273) further in view of Swvigaradoss (US 2022/0093090 A1), further in view of Myers (US 2020/0159550 A1), further in view of Shin (US 2022/0270617 A1) as applied to claim 1, and further in view of Schairer et al. (US 2021/0248998 A1: hereafter — Schairer).
For claim 12, claim 1 is incorporated but the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin fails to disclose the limitations of this claim, for which Schairer is now introduced to teach as the computer-implemented method, the method further comprising:
in response to detecting activity in the audio stream, performing a speech-to-text conversion on the audio stream, thus obtaining a transcript of speech data in the audio stream (Schairer: [0044] — after detecting voice activity, processing speech-to-text on the audio data to get recognised text); and
performing at least one processing action based at least on the transcript (Schairer: [0044] — performing one or more corresponding actions based on the recognised text).
The combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin provides teaching for detecting activity in an audio stream and applying speech detection to speech recognition (Huang: Page 31 lines 18-20), but differs from the claimed invention in that the claimed invention further provides performing speech-to-text conversion on the audio after detecting activity, and then performing at least one action based on the transcript. This isn’t new to the art as the reference of Schairer is seen to teach this above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Schairer which performs speech detection, speech recognition, and then performs an action based on the recognised text, with the teaching of detecting activity in an audio stream as taught by the combination of Torgrim in view of Huang further in view of Swvigaradoss, further in view of Myers, and further in view of Shin, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of applying the ease of speech recognition to perform actions spoken by a user. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Torgrim (US 5,724,420 A) in view of Huang (WO 2014/194273), further in view of Swvigaradoss (US 2022/0093090 A1), further in view of Myers (US 2020/0159550 A1), further in view of Shin (US 2022/0270617 A1) as applied to claim 1, and further in view of Sehlstedt (US 2012/0215536 A1).
For claim 13, claim 1 is incorporated, but the combination of Torgrim in view of Huang, further in view of Swvigaradoss, further in view of Myers, and further in view of Shin fails to disclose the limitations of this claim, which Sehlstedt is now introduced to teach, namely the computer-implemented method, the method further comprising:
identifying an amplitude of noise in the audio stream (Sehlstedt: [0013], [0016] — for the purpose of performing voice activity detection, estimating noise energy (the noise energy is indicative of noise amplitude)); and
adjusting the audio amplitude threshold according to the amplitude of noise (Sehlstedt: [0016] — an adaptive threshold based on total noise energy; [0029] — adjust an SNR estimate for threshold adaptation; [0043]-[0044] — computing an adaptive threshold for an SNR to be compared to, based on computed noise energy (the SNR is akin to an indication of actual audio amplitude within an audio signal)).
The combination of Torgrim in view of Huang, further in view of Swvigaradoss, further in view of Myers, and further in view of Shin teaches detecting activity in an audio stream using an audio amplitude threshold. It differs from the claimed invention in that the claimed invention further adjusts the audio amplitude threshold according to the noise amplitude in the audio stream. This is not new to the art, as Sehlstedt teaches similarly, as shown above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Sehlstedt, which performs SNR threshold adjustment based on detected noise energy in an audio signal, with the teaching of detecting activity in an audio stream involving determining an audio amplitude threshold as taught by the combination of Torgrim in view of Huang, further in view of Swvigaradoss, further in view of Myers, and further in view of Shin, to arrive at the claimed invention. Combining these prior art elements would have provided the predictable result of adapting speech activity detection to the amount of noise present within the applicable sections of the audio signal. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
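For illustration only, the claim 13 limitation (identify a noise amplitude, then adjust the audio amplitude threshold accordingly) can be sketched in the amplitude domain as follows. This is a simplified sketch under stated assumptions and is not Sehlstedt's method, which the Office Action describes as adapting a threshold for an SNR comparison. The names `rms`, `adapt_threshold`, and `is_active`, and the `margin` and `floor` parameters, are hypothetical.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of a frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def adapt_threshold(noise_frame: Sequence[float],
                    margin: float = 3.0,
                    floor: float = 0.01) -> float:
    """Set the threshold to a multiple of the noise RMS, never below a floor.

    Assumes noise_frame was captured while no activity was present.
    """
    return max(floor, margin * rms(noise_frame))


def is_active(frame: Sequence[float], threshold: float) -> bool:
    """Return True if any sample's absolute amplitude exceeds the threshold."""
    return any(abs(s) > threshold for s in frame)


# Usage: a noisier background raises the threshold, so the same
# moderate-level frame is no longer flagged as activity.
quiet_threshold = adapt_threshold([0.001, -0.001, 0.001, -0.001])
noisy_threshold = adapt_threshold([0.1, -0.1, 0.1, -0.1])
```

The floor keeps the threshold from collapsing toward zero in near-silent conditions, which is a design choice of this sketch rather than something stated in the cited art.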
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Torgrim (US 5,724,420 A) in view of Huang (WO 2014/194273), further in view of Swvigaradoss (US 2022/0093090 A1), further in view of Myers (US 2020/0159550 A1), further in view of Shin (US 2022/0270617 A1) as applied to claim 1, and further in view of Lesso (US 2019/0333522 A1).
For claim 16, claim 1 is incorporated, but the combination of Torgrim in view of Huang, further in view of Swvigaradoss, further in view of Myers, and further in view of Shin fails to explicitly disclose the limitations of this claim, which Lesso is now introduced to teach, namely the computer-implemented method, wherein the activity in the audio stream comprises an individual speaking (Lesso: [0070] — ‘In some embodiments, the speaker identification system comprises: a voice activity detector for attempting to detect human speech in the received audio signal’; [0100] — According to another aspect of the present invention, there is provided a method of voice activity detection, the method comprising performing at least a part of a voice biometric process suitable for determining whether a signal contains speech of an enrolled user).
The combination of Torgrim in view of Huang, further in view of Swvigaradoss, further in view of Myers, and further in view of Shin teaches detecting activity in an audio stream using an audio amplitude threshold. It differs from the claimed invention in that the claimed invention further provides that the activity in the audio stream comprises an individual speaking. This is not new to the art, as Lesso teaches similarly, as shown above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Lesso, in which the activity in the audio stream contains speech of a speaker, with the teaching of detecting activity in an audio stream involving determining an audio amplitude threshold as taught by the combination of Torgrim in view of Huang, further in view of Swvigaradoss, further in view of Myers, and further in view of Shin, to arrive at the claimed invention. Combining these prior art elements would have provided the predictable result of associating the detected voice with a particular enrolled speaker, thereby making it possible to produce speech transcripts segmented by individual speaker in the audio stream. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure. See PTO-892.
Any inquiry concerning this communication or earlier communications from the Examiner should be directed to OLUWADAMILOLA M. OGUNBIYI, whose telephone number is (571) 272-4708. The Examiner can normally be reached Monday - Thursday (8:00 AM - 5:30 PM Eastern Standard Time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s Supervisor, PARAS D SHAH, can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/OLUWADAMILOLA M OGUNBIYI/Examiner, Art Unit 2653

/Paras D Shah/Supervisory Patent Examiner, Art Unit 2653
09/29/2025
Prosecution Timeline

Apr 18, 2025: Non-Final Rejection — §103
Jul 17, 2025: Response Filed
Jul 21, 2025: Final Rejection — §103
Aug 22, 2025: Interview Requested
Aug 29, 2025: Examiner Interview Summary
Sep 11, 2025: Request for Continued Examination
Sep 15, 2025: Response after Non-Final Action
Sep 29, 2025: Non-Final Rejection — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/671,825 (Patent 12608427): Drill Back To Original Audio Clip In Virtual Assistant Initiated Lists And Reminders. 1y 11m to grant; granted Apr 21, 2026.
18/615,766 (Patent 12579979): NAMING DEVICES VIA VOICE COMMANDS. 1y 11m to grant; granted Mar 17, 2026.
19/024,112 (Patent 12537007): METHOD FOR DETECTING AIRCRAFT AIR CONFLICT BASED ON SEMANTIC PARSING OF CONTROL SPEECH. 1y 0m to grant; granted Jan 27, 2026.
18/082,346 (Patent 12508086): SYSTEM AND METHOD FOR VOICE-CONTROL OF OPERATING ROOM EQUIPMENT. 3y 0m to grant; granted Dec 30, 2025.
17/693,171 (Patent 12499885): VOICE-BASED PARAMETER ASSIGNMENT FOR VOICE-CAPTURING DEVICES. 3y 9m to grant; granted Dec 16, 2025.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 78%
With Interview (+18.5%): 96%
Median Time to Grant: 2y 11m (~1y 2m remaining)
PTA Risk: High
Based on 306 resolved cases by this examiner. Grant probability derived from career allowance rate.
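The grant-probability figures above are consistent with simple arithmetic on the examiner's career counts (238 granted out of 306 resolved). The sketch below assumes the interview lift is added in percentage points to the unrounded allowance rate. The page does not state its exact formula, so that step is an assumption.

```python
# Career counts reported for this examiner.
granted, resolved = 238, 306
# Reported interview lift, assumed here to be additive percentage points.
interview_lift_pts = 18.5

allowance_rate_pct = 100 * granted / resolved
with_interview_pct = allowance_rate_pct + interview_lift_pts

print(f"Grant probability: {allowance_rate_pct:.0f}%")  # prints "Grant probability: 78%"
print(f"With interview: {with_interview_pct:.0f}%")     # prints "With interview: 96%"
```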