Prosecution Insights
Last updated: April 19, 2026
Application No. 18/676,014

METHOD AND SYSTEM FOR AI-BASED PROCESSING OF VOICE COMMANDS WITHIN SMART HOME

Non-Final OA: §102, §103, §112
Filed
May 28, 2024
Examiner
PULLIAS, JESSE SCOTT
Art Unit
2655
Tech Center
2600 — Communications
Assignee
Ironcore Technologies LLC
OA Round
1 (Non-Final)
Grant Probability: 83% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 8m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 83%, above average (873 granted / 1052 resolved; +21.0% vs TC avg)
Interview Lift: +13.0%, a moderate lift, among resolved cases with an interview
Typical Timeline: 2y 8m average prosecution; 47 applications currently pending
Career History: 1099 total applications across all art units

Statute-Specific Performance

§101: 15.0% (-25.0% vs TC avg)
§103: 50.4% (+10.4% vs TC avg)
§102: 19.7% (-20.3% vs TC avg)
§112: 4.9% (-35.1% vs TC avg)
Based on career data from 1052 resolved cases.

Office Action

Rejections: §102, §103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This office action is in response to application 18/676,014, which was filed 05/28/24. Claims 1-20 are pending in the application and have been considered.

Specification

The abstract of the disclosure is objected to because it is over 150 words. Correction is required. See MPEP § 608.01(b).

Claim Objections

Claim 15 is objected to because of the following informalities: in line 5, “perfume” should be “perform”. Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 13-15 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.

Claim 13 recites the limitation "the ASR module" in lines 2-3. There is insufficient antecedent basis for this limitation in the claim. Claims 14 and 15 include the indefinite subject matter of claim 13 by virtue of their dependency on it, and do not remedy the indefiniteness. These claims are therefore also rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C.
102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-3, 6, 16, 17, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lyon et al. (US 20180197533).

Consider claim 1, Lyon discloses a system for an automated voice command processing within a smart home (automatic processing of spoken commands by smart home devices in a house, [0036], [0037]) comprising: a processor of a voice command processing server node configured to host a machine learning (ML) module and connected to at least one audio capture entity node and to at least one target node over a wireless network connection (voice activated electronic devices 190 has CPU 502, Fig. 2A; processes voice commands as server nodes, [0068-0069], Fig. 1, hosts PCEN frontend in voice processing module 538, [0078], which is a machine learning module, [0142-0146], and is connected to microphone in input devices nodes, [0077], Fig. 2A, and control target nodes such as appliances and media systems operated by voice commands, [0045], over wireless network, [0058]); and a memory on which are stored machine-readable instructions that when executed by the processor (memory 506, [0077], Fig 2A), cause the processor to: acquire raw audio data comprising an audio signal from the at least one audio capture entity node (voice activated electronic device receives audio from microphone 516, [0078], e.g.
“OK Google, play cat videos on my Living room TV”, [0052]); normalize the audio signal for volume consistency (normalization module using PCEN normalizes the received audio, [0078], and performs range compression and gain control, [0132], [0138]; this is considered to normalize the audio signal for perceived loudness, i.e. for a type of “volume consistency”); convert the normalized audio signal into a spectrogram (the frontend stacks the PCEN features horizontally into a spectrogram, [0159]); extract a set of classifying features from the spectrogram (features of the spectrogram are input to a neural network, i.e. extracted from the spectrogram as a series for the input layer, [0159]); provide the set of classifying features to the ML module configured to generate a predictive model based on a neural network for producing at least one wake word parameter (these features are input to a CNN for keyword spotting, [0159], which produces a prediction of whether the audio contained “OK Google” based on the input features, [0151]; a probability distribution or score, i.e. a wake word parameter, for detection/not detection is inherent in this CNN architecture); detect a wake word based on the at least one wake word parameter (when the output of the CNN indicates the wake word is present, e.g. “OK Google”, [0151]); and switch the voice command processing server node to an active listening mode for processing subsequent user audio commands through the at least one audio capture entity node (upon detecting the wake word, the device is “awakened” by putting it into a state where the device is ready to receive voice requests to the voice assistant service, [0061]). 
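The normalize-then-spectrogram-then-CNN chain this mapping walks through hinges on the PCEN front end. As an editorial illustration, the PCEN step can be sketched in plain NumPy; the smoothing coefficient and compression exponents below are common defaults from the PCEN literature, not values taken from Lyon, and the function name is ours.

```python
import numpy as np

def pcen(mel_energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-Channel Energy Normalization over a (frames x bands) energy matrix.

    A first-order IIR smoother M tracks each band's energy envelope; dividing
    by M**alpha applies automatic gain control, and the (x + delta)**r -
    delta**r step applies range compression -- the "gain control" and "range
    compression" behavior the rejection cites from Lyon's [0132], [0138].
    """
    m = np.zeros_like(mel_energy)
    m[0] = mel_energy[0]
    for t in range(1, len(mel_energy)):
        m[t] = (1 - s) * m[t - 1] + s * mel_energy[t]
    return (mel_energy / (eps + m) ** alpha + delta) ** r - delta ** r

# Toy input: 100 frames x 40 bands, with a 100x level jump halfway through.
rng = np.random.default_rng(0)
quiet = rng.uniform(0.1, 1.0, size=(50, 40))
loud = 100.0 * quiet                     # same content, much louder
spec = pcen(np.vstack([quiet, loud]))
# After PCEN the loud half is compressed toward the quiet half's range,
# i.e. the spectrogram is normalized for "volume consistency".
```

Stacking the rows of `spec` is the "convert ... into a spectrogram" step; a keyword-spotting CNN would then consume it and emit a wake-word score.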
Consider claim 16, Lyon discloses a method for an automated voice command processing within a smart home (automatic processing of spoken commands by smart home devices in a house, [0036], [0037]), comprising: acquiring, by a voice command processing server (VCPS) node, raw audio data comprising an audio signal from the at least one audio capture entity node (voice activated electronic device receives audio from microphone 516, [0078], e.g. “OK Google, play cat videos on my Living room TV”, [0052]); normalizing, by the VCPS node, the audio signal for volume consistency (normalization module using PCEN normalizes the received audio, [0078], and performs range compression and gain control, [0132], [0138]; this is considered to normalize the audio signal for perceived loudness, i.e. for a type of “volume consistency”); converting, by the VCPS node, the normalized audio signal into a spectrogram (the frontend stacks the PCEN features horizontally into a spectrogram, [0159]); extracting, by the VCPS node, a set of classifying features from the spectrogram (features of the spectrogram are input to a neural network, i.e. extracted from the spectrogram as a series for the input layer, [0159]); providing, by the VCPS node, the set of classifying features to the ML module configured to generate a predictive model based on a neural network for producing at least one wake word parameter (these features are input to a CNN for keyword spotting, [0159], which produces a prediction of whether the audio contained “OK Google” based on the input features, [0151]; a probability distribution or score, i.e. a wake word parameter, for detection/not detection is inherent in this CNN architecture); detecting, by the VCPS node, a wake word based on the at least one wake word parameter (when the output of the CNN indicates the wake word is present, e.g. 
“OK Google”, [0151]); and switching, by the VCPS node, the voice command processing server node to an active listening mode for processing subsequent user audio commands through the at least one audio capture entity node (upon detecting the wake word, the device is “awakened” by putting it into a state where the device is ready to receive voice requests to the voice assistant service, [0061]).

Consider claim 20, Lyon discloses non-transitory computer-readable medium comprising instructions, that when read by a processor (non-transitory computer-readable medium with instructions executed by a processor, [0078]), cause the processor to perform: acquiring raw audio data comprising an audio signal from the at least one audio capture entity node (voice activated electronic device receives audio from microphone 516, [0078], e.g. “OK Google, play cat videos on my Living room TV”, [0052]); normalizing the audio signal for volume consistency (normalization module using PCEN normalizes the received audio, [0078], and performs range compression and gain control, [0132], [0138]; this is considered to normalize the audio signal for perceived loudness, i.e. for a type of “volume consistency”); converting the normalized audio signal into a spectrogram (the frontend stacks the PCEN features horizontally into a spectrogram, [0159]); extracting a set of classifying features from the spectrogram (features of the spectrogram are input to a neural network, i.e. extracted from the spectrogram as a series for the input layer, [0159]); providing the set of classifying features to the ML module configured to generate a predictive model based on a neural network for producing at least one wake word parameter (these features are input to a CNN for keyword spotting, [0159], which produces a prediction of whether the audio contained “OK Google” based on the input features, [0151]; a probability distribution or score, i.e.
a wake word parameter, for detection/not detection is inherent in this CNN architecture); detecting a wake word based on the at least one wake word parameter (when the output of the CNN indicates the wake word is present, e.g. “OK Google”, [0151]); and switching the voice command processing server node to an active listening mode for processing subsequent user audio commands through the at least one audio capture entity node (upon detecting the wake word, the device is “awakened” by putting it into a state where the device is ready to receive voice requests to the voice assistant service, [0061]).

Consider claim 2, Lyon discloses the machine-readable instructions that when executed by the processor, cause the processor to detect the wake word by applying a confidence threshold to the wake word parameter (a decision boundary confidence threshold is inherent in the trained binary decision classifier CNN, [0151]).

Consider claim 3, Lyon discloses the machine-readable instructions that when executed by the processor, cause the processor to produce a wake word detection verdict responsive to the wake word parameter exceeding the confidence threshold (a decision, i.e. verdict, based on comparing the CNN prediction to a decision boundary confidence threshold is inherent in the trained binary decision classifier CNN, [0151]).

Consider claim 6, Lyon discloses the machine-readable instructions that when executed by the processor, cause the processor to normalize a volume and energy levels of the audio signal by application of Per-Channel Energy Normalization (normalization module using PCEN normalizes the received audio, [0078], and performs range compression and gain control, [0132], [0138]; this is considered to normalize the audio signal for perceived loudness, i.e. for a type of “volume consistency”).

Consider claim 17, Lyon discloses producing a wake word detection verdict responsive to the wake word parameter exceeding a confidence threshold (a decision, i.e.
verdict, based on comparing the CNN prediction to a decision boundary confidence threshold is inherent in the trained binary decision classifier CNN, [0151]).

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Lyon et al. (US 20180197533) in view of Lakshmikanth et al. (“Noise Cancellation in Speech Signal Processing-A Review”. International Journal of Advanced Research in Computer and Communication Engineering Vol. 3, Issue 1, January 2014).

Consider claim 4, Lyon discloses the machine-readable instructions that when executed by the processor, cause the processor to implement a noise module (noise module 790 for noise mitigation, [0130]). Lyon does not specifically mention removing background noise by application of Infinite Impulse Response (IIR) filter for white noise and Kalman filter for non-stationary noise.
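For readers unfamiliar with the two filter families named in claim 4, a minimal sketch (illustrative parameter values, not drawn from Lakshmikanth): a fixed first-order IIR low-pass attenuates broadband white noise, while a scalar Kalman filter adapts its gain online, which is why it is the tool of choice for non-stationary noise.

```python
import numpy as np

def iir_lowpass(x, a=0.9):
    """First-order IIR low-pass: y[n] = a*y[n-1] + (1-a)*x[n].
    Attenuates broadband (white) noise riding on a slowly varying signal."""
    y = np.zeros_like(x)
    y[0] = x[0]
    for n in range(1, len(x)):
        y[n] = a * y[n - 1] + (1 - a) * x[n]
    return y

def kalman_1d(x, q=1e-2, r=0.25):
    """Scalar Kalman filter: random-walk signal model (process noise q),
    measurement noise variance r. Unlike the fixed IIR above, the gain k
    adapts each step, which suits non-stationary conditions."""
    est, p = x[0], 1.0
    out = np.empty_like(x)
    out[0] = est
    for n in range(1, len(x)):
        p += q                       # predict
        k = p / (p + r)              # Kalman gain
        est += k * (x[n] - est)      # update toward the new measurement
        p *= 1 - k
        out[n] = est
    return out

# Toy signal: 3 Hz sinusoid over 1 s (500 samples) plus white noise.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 500)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + rng.normal(0, 0.5, 500)
smoothed = iir_lowpass(noisy)
tracked = kalman_1d(noisy)
```

Both outputs sit closer to the clean signal than the raw measurement does, which is the degradation-reduction rationale the rejection leans on.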
Lakshmikanth discloses removing background noise by application of Infinite Impulse Response (IIR) filter for white noise and Kalman filter for non-stationary noise (removal of white noise with IIR filters, Sections 1C and 2, page 5177-5178, and Kalman filters for non-stationary noise removal, Section VI., pages 5181-5182).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon by removing background noise by application of Infinite Impulse Response (IIR) filter for white noise and Kalman filter for non-stationary noise in order to reduce the degradation of speech processing systems due to noise, as suggested by Lakshmikanth (page 5176). Doing so would have led to predictable results of making the system more usable in noisy environments, as suggested by Lakshmikanth (page 5175). The references cited are analogous art in the same field of audio processing.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Lyon et al. (US 20180197533) in view of Wojogbe et al. (US 20190179611).

Consider claim 5, Lyon does not, but Wojogbe discloses executing beamforming processing to focus on an audio signal from a direction of a speaker while ignoring other directions (beamforming to capture sound from directions where voice activity is detected, [0098-0099], Fig. 5A, which implicitly ignores directions where voice activity is not detected).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon by executing beamforming processing to focus on an audio signal from a direction of a speaker while ignoring other directions in order to assist in filtering background noise, as suggested by Wojogbe ([0096]). The references cited are analogous art in the same field of audio processing.

Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Lyon et al.
(US 20180197533) in view of Sharifi et al. (US 20220180866).

Consider claim 7, Lyon does not, but Sharifi discloses streaming the audio signal from a DSP module to an Automatic Speech Recognition (ASR) module (audio data is streamed from smart speaker to speech recognition system, Fig. 1A, [0020], [0022]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon by streaming the audio signal from a DSP module to an Automatic Speech Recognition (ASR) module in order to reduce processing on a resource constrained device, as suggested by Sharifi ([0003]), predictably reducing expense. The references cited are analogous art in the same field of audio processing.

Consider claim 8, Lyon does not, but Sharifi discloses feeding the set of classifying features into a deep learning model comprising a sequence-to-sequence model to transcribe spoken words into text (sequence-to-sequence speech recognition model that generates a transcription from the features, [0047]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon by feeding the set of classifying features into a deep learning model comprising a sequence-to-sequence model to transcribe spoken words into text for reasons similar to those for claim 7.

Consider claim 9, Lyon does not, but Sharifi discloses balancing latency and accuracy by adjusting a window size of transcription (adjusting the window size for transcription, [0033], [0047]; this is considered to balance latency and accuracy by promptly submitting the audio for transcription without cutting off more audio the user might utter).
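The latency/accuracy trade-off at issue in claim 9 is easy to picture with a toy streaming chunker: shorter windows produce transcription decisions sooner but give the recognizer less acoustic context per decision. The function below is a hypothetical illustration of ours, not anything disclosed by Sharifi.

```python
def stream_windows(samples, window_size, hop=None):
    """Yield successive transcription windows over a sample buffer.

    hop defaults to window_size (non-overlapping windows); choosing a
    smaller hop trades extra compute for overlapping context.
    """
    hop = hop or window_size
    for start in range(0, max(len(samples) - window_size + 1, 1), hop):
        yield samples[start:start + window_size]

audio = list(range(48_000))                 # stand-in for 3 s of 16 kHz samples
small = list(stream_windows(audio, 8_000))  # 0.5 s windows: results sooner, less context
large = list(stream_windows(audio, 24_000)) # 1.5 s windows: fewer, later, richer decisions
```

The first `small` window is ready after 0.5 s of audio (low latency); the first `large` window only after 1.5 s, but with three times the context.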
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon by balancing latency and accuracy by adjusting a window size of transcription for reasons similar to those for claim 7.

Claims 10, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Lyon et al. (US 20180197533) in view of Rand et al. (US 20200150919).

Consider claim 10, Lyon discloses the machine-readable instructions that when executed by the processor, cause the processor to, responsive to the wake word detection, continuously monitor the audio signal (upon detecting the wake word, the device is “awakened” by putting it into a state where the device is ready, i.e. continuously monitoring, to receive voice requests to the voice assistant service, [0061]). Lyon does not specifically mention converting the audio signal into a format suitable for VAD model.

Rand discloses converting the audio signal into a format suitable for VAD model (MFCCs for Gaussian Mixture model VAD, [0077]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon by converting the audio signal into a format suitable for VAD model in order to enhance accuracy, as suggested by Rand ([0029]), predictably resulting in augmented overall functionality, as suggested by Rand ([0029]). The references cited are analogous art in the same field of audio processing.

Consider claim 11, Lyon does not, but Rand discloses feeding the converted audio signal into the VAD model comprising Gaussian Mixture Model or Silero VAD (MFCCs for Gaussian Mixture model VAD, [0077]).
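As a concrete picture of "converting the audio signal into a format suitable for VAD model": frame the waveform and compute a per-frame feature vector. The sketch below uses log-energy as a deliberately simple stand-in for the MFCC vectors Rand describes, since energy alone already separates speech-level frames from silence-level frames on toy data.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Slice audio into overlapping frames (25 ms windows, 10 ms hop at
    16 kHz) and compute one log-energy feature per frame -- a minimal
    stand-in for the MFCC features a GMM-based VAD would consume."""
    n = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

# Toy input: 1 s of low-level noise ("silence") then 1 s at speech level.
rng = np.random.default_rng(2)
silence = rng.normal(0, 0.01, 16_000)
speech = rng.normal(0, 0.5, 16_000)
feats = frame_features(np.concatenate([silence, speech]))
# The two frame populations are far apart in this feature space, which is
# what lets a two-component Gaussian mixture label frames speech/non-speech.
```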
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon by feeding the converted audio signal into the VAD model comprising Gaussian Mixture Model or Silero VAD for reasons similar to those for claim 10.

Consider claim 18, Lyon discloses responsive to the wake word detection, continuously monitor the audio signal (upon detecting the wake word, the device is “awakened” by putting it into a state where the device is ready, i.e. continuously monitoring, to receive voice requests to the voice assistant service, [0061]). Lyon does not specifically mention converting the audio signal into a format suitable for VAD model.

Rand discloses converting the audio signal into a format suitable for VAD model (MFCCs for Gaussian Mixture model VAD, [0077]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon by converting the audio signal into a format suitable for VAD model for reasons similar to those for claim 10.

Claims 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Lyon et al. (US 20180197533) in view of Rand et al. (US 20200150919), in further view of Rao (US 20220358913).

Consider claim 12, Lyon does not, but Rao discloses analyzing outputs of the VAD models to detect when the at least one audio capture entity node stops capturing the audio data and, responsive to the detection, stop recording and send the audio data for transcription (when the end of sentence is reached, switching mechanism sends a signal to the ASR to stop recording, [0064], [0065], Fig. 4).
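The end-of-capture behavior Rao is cited for can be sketched as a run-length rule over per-frame VAD decisions: stop recording once enough consecutive non-speech frames follow the speech. The function name and threshold below are illustrative choices of ours, not Rao's mechanism.

```python
def end_of_utterance(vad_flags, min_silence_frames=30):
    """Return the index of the frame where the utterance ends: the start of
    the first run of >= min_silence_frames consecutive non-speech frames
    following at least one speech frame, or None if the user may still be
    talking. (At a 10 ms frame hop, 30 frames is about 300 ms of silence.)"""
    silence_run, seen_speech = 0, False
    for i, is_speech in enumerate(vad_flags):
        if is_speech:
            seen_speech, silence_run = True, 0
        elif seen_speech:
            silence_run += 1
            if silence_run >= min_silence_frames:
                return i - min_silence_frames + 1
    return None

flags = [0] * 10 + [1] * 50 + [0] * 40   # lead-in silence, speech, trailing silence
cut = end_of_utterance(flags)            # frame at which recording can stop
```

On detection, the recorder would truncate at `cut` and hand the captured audio to the ASR for transcription.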
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon by analyzing outputs of the VAD models to detect when the at least one audio capture entity node stops capturing the audio data and, responsive to the detection, stop recording and send the audio data for transcription in order to reduce pre-processing and postprocessing overheads, as suggested by Rao ([0005]), predictably reducing delay in speech processing, as suggested by Rao ([0005]). The references cited are analogous art in the same field of audio processing.

Consider claim 19, Lyon does not, but Rao discloses analyzing outputs of the VAD models to detect when the at least one audio capture entity node stops capturing the audio data and, responsive to the detection, stop recording and send the audio data for transcription (when the end of sentence is reached, switching mechanism sends a signal to the ASR to stop recording, [0064], [0065], Fig. 4).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon by analyzing outputs of the VAD models to detect when the at least one audio capture entity node stops capturing the audio data and, responsive to the detection, stop recording and send the audio data for transcription for reasons similar to those for claim 12.

Claims 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Lyon et al. (US 20180197533) in view of Rand et al. (US 20200150919), in further view of Sundararaman (US 20200050949).

Consider claim 13, Lyon discloses collecting a text output from the ASR module (extracting a user voice command, [0117], by a neural network speech recognizer, [0015]). Lyon and Rand do not specifically mention performing text processing by tokenization, stemming, and lemmatization.
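The three text-processing steps recited in claim 13 can be illustrated with deliberately naive implementations; the suffix list and lemma table below are toy stand-ins of ours, not Sundararaman's method (a real system would use something like a Porter stemmer and a dictionary-backed lemmatizer).

```python
import re

LEMMAS = {"lights": "light", "better": "good", "playing": "play"}  # toy lookup table

def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def stem(token):
    """Naive suffix-stripping stemmer (illustrative, not Porter's algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def lemmatize(token):
    """Dictionary lookup, falling back to the token itself."""
    return LEMMAS.get(token, token)

command = "Turn off the lights in the living room"
tokens = tokenize(command)
stems = [stem(t) for t in tokens]
lemmas = [lemmatize(t) for t in tokens]
```

Note the characteristic difference the claim's three steps imply: the stemmer mangles "living" to "liv", while the lemmatizer only maps known inflections ("lights" to "light") and leaves everything else intact.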
Sundararaman discloses performing text processing by tokenization, stemming, and lemmatization (tokenizing, stemming, lemmatizing, [0064]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon and Rand by performing text processing by tokenization, stemming, and lemmatization in order to improve flexibility in analyzing large amounts of data, as suggested by Sundararaman ([0017]), predictably improving the overall process of performing data transformations and analysis, as suggested by Sundararaman ([0017]). The references cited are analogous art in the same field of audio processing (Sundararaman discloses receiving the query from the user as audio data, [0061]).

Consider claim 14, Lyon and Rand do not, but Sundararaman discloses extracting features from the processed text and feeding the features into an intent recognition model configured to classify intent, wherein the intent recognition model comprises any of: a logistic regression model, a support vector machine, and a transformer-based model (intent classification using logistic regression or support vector machine based on text features, [0072-0075]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon and Rand by extracting features from the processed text and feeding the features into an intent recognition model configured to classify intent, wherein the intent recognition model comprises any of: a logistic regression model, a support vector machine, and a transformer-based model for reasons similar to those for claim 13.

Consider claim 15, Lyon discloses: map an intent to a specific action on a target object associated with the at least one target node (determining the relevance of the command included in the voice input to a particular device, e.g.
“stop music” should refer to the device playing music, [0035]); and send a command to the at least one target node to perfume the mapped specific action (sending a command to the device playing music to stop playing music, [0035], [0075]). Lyon and Rand do not specifically mention an intent classified by the intent recognition model.

Sundararaman discloses an intent classified by the intent recognition model (intent classification using logistic regression or support vector machine based on text features, [0072-0075]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lyon and Rand by including an intent classified by the intent recognition model for reasons similar to those for claim 13.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

US 20220309343 Elkhatib discloses a wake word processing system that uses a PCEN -> spectrogram acoustic front end (see [0025]).
US 20240062745 Smyth discloses low power detection of wake words.
US 12488789 Huang discloses efficient open vocabulary keyword spotting.
US 20230104431 Smyth discloses noise robust representations for keyword spotting systems.
US 10360926 Mortensen discloses low-complexity voice activity detection.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached on 571/272-7516.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Jesse S Pullias/
Primary Examiner, Art Unit 2655
12/11/25

Prosecution Timeline

May 28, 2024
Application Filed
Dec 11, 2025
Non-Final Rejection — §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596885: Automatically Labeling Items using a Machine-Trained Language Model (granted Apr 07, 2026; 2y 5m to grant)
Patent 12573378: SPEECH TENDENCY CLASSIFICATION (granted Mar 10, 2026; 2y 5m to grant)
Patent 12572740: MULTI-LANGUAGE DOCUMENT FIELD EXTRACTION (granted Mar 10, 2026; 2y 5m to grant)
Patent 12566929: COMBINING DATA SELECTION AND REWARD FUNCTIONS FOR TUNING LARGE LANGUAGE MODELS USING REINFORCEMENT LEARNING (granted Mar 03, 2026; 2y 5m to grant)
Patent 12536389: TRANSLATION SYSTEM (granted Jan 27, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview: 96% (+13.0%)
Median Time to Grant: 2y 8m
PTA Risk: Low
Based on 1052 resolved cases by this examiner. Grant probability derived from career allow rate.
