Prosecution Insights
Last updated: May 29, 2026
Application No. 17/584,512

SERVER-BASED FALSE WAKE WORD DETECTION

Non-Final OA §101 §103
Filed: Jan 26, 2022
Examiner: CHAVEZ, RODRIGO A
Art Unit: 2658
Tech Center: 2600 (Communications)
Assignee: Spotify AB
OA Round: 5 (Non-Final)
Grant Probability: 51% (Moderate)
OA Rounds: 5-6
Est. Remaining: 0m
With Interview: 90%

Examiner Intelligence

Grants 51% of resolved cases
Career Allowance Rate: 51% (119 granted / 233 resolved; -10.9% vs TC avg)
Strong +39% interview lift
Interview Lift: +38.6% (resolved cases with interview)
Typical timeline
Avg Prosecution: 3y 3m (15 currently pending)
Career history
Total Applications: 252 (across all art units)

Statute-Specific Performance

§101: 3.8% (-36.2% vs TC avg)
§103: 84.7% (+44.7% vs TC avg)
§102: 9.6% (-30.4% vs TC avg)
§112: 0.7% (-39.3% vs TC avg)
Based on career data from 233 resolved cases

Office Action

§101 §103
DETAILED ACTION Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . Continued Examination Under 37 CFR 1.114 A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/25/2025 has been entered. Response to Arguments Applicant's arguments filed 11/25/2025 have been fully considered but they are not persuasive. Regarding the rejections under 35 U.S.C. § 101, applicant argues: “…the claims as written are not directed generally to multithreaded processing but rather involve a specific use of multithreaded processing. Furthermore, at a minimum, the claims as now written specifically require a processor to generate multiple processing threads, which is not something that would be done in a human mind but is rather a computer-specific operation…” … “Applicant also maintains the arguments set forth in the response that Applicant filed in June. For instance, Applicant's claim recites a specific manner of using multiprocessing in which multiple processing threads respectively analyze different portions of a media content item, and Applicant's claims and specification (e.g., para. 0069) expressly state that that particular use of multiprocessing is what works to expedite searching for a wake word in the media content item. 
Further, Applicant's recited use of multithreaded processing is not mere extra-solution activity but is rather a useful, practical, and meaningful manner of searching for a wake word, which amounts to significantly more than the concept of searching for a wake word” Regarding applicant’s arguments, the examiner respectfully disagrees. The examiner contends that the applicant’s reading of the claim language appears to be narrower than the broadest reasonable interpretation. Specifically, the process of “generating multiple processing threads” is not a process that would specifically require a specialized computer processor. As noted in the rejection below, under BRI, one may associate the generating of multiple processing threads to assigning portions of a large media content item to multiple humans for analysis. Even if the processing threads are specific to “wake-word” processing, under BRI, a human is capable of performing “wake-word” searching within the portion of a large media content item. Furthermore, even when taken as a whole, the recited multithreaded processor is no more than a processor that is well understood, routine, and conventional in the field, performing processes that a human (or multiple humans) is/are capable of performing. Therefore, the examiner maintains that the claimed language constitutes abstract subject matter and is ineligible under 35 U.S.C. § 101. Regarding the rejections under 35 U.S.C. § 103, Applicant’s arguments with respect to claim(s) 21-40 have been considered but are moot because of the new ground of rejection in view of Smith and Poirier for claims 21-24, 28, 31-36, 39 and 40; and in view of Smith, Poirier and Engineer for claims 25-27, 29, 30, 37 and 38. Claim Rejections - 35 USC § 101 35 U.S.C. 
101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title. Claims 21-40 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The Supreme Court has long held that “[l]aws of nature, natural phenomena, and abstract ideas are not patentable.” Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347, 2354 (2014) (quoting Assoc. for Molecular Pathology v. Myriad Genetics, Inc., 133 S. Ct. 2107, 2116 (2013) (internal quotation marks omitted)). The “abstract ideas” category embodies the longstanding rule that an idea, by itself, is not patentable. Alice Corp., 134 S. Ct. at 2355 (quoting Gottschalk v. Benson, 409 U.S. 63, 67 (1972)). In Alice, the Supreme Court sets forth an analytical “framework for distinguishing patents that claim laws of nature, natural phenomena, and abstract ideas [or mental processes] from those that claim patent-eligible applications of those concepts.” Id. at 2355 (citing Mayo Collaborative Servs. v. Prometheus Labs., Inc., 132 S. Ct. 1289, 1296–97 (2012)). The first step in the analysis is to “determine whether the claims at issue are directed to one of those patent-ineligible concepts.” Id. If the claims are directed to a patent-ineligible concept, the second step in the analysis is to consider the elements of the claims “individually and ‘as an ordered combination’” to determine whether there are additional elements that “‘transform the nature of the claim’ into a patent-eligible application.” Id. (quoting Mayo, 132 S. Ct. at 1298, 1297). 
In other words, the second step is to “search for an ‘inventive concept’—i.e., an element or combination of elements that is ‘sufficient to ensure that the patent in practice amounts to significantly more than a patent upon the [ineligible concept] itself’”. Id. (brackets in original) (quoting Mayo, 132 S. Ct. at 1294). The prohibition against patenting an abstract idea “‘cannot be circumvented by attempting to limit the use of the formula to a particular technological environment’ or adding ‘insignificant post-solution activity.’” Bilski v. Kappos, 561 U.S. 593, 610–11 (2010) (citation omitted).
Step 1: This part of the eligibility analysis evaluates whether the claim falls within any statutory category. See MPEP 2106.03. Independent Claim 21 recites the method of analyzing different portions of a media content item in search of a wake word, detecting at least one instance of the wake word, generating metadata indicating a time of occurrence of the wake word within the media content item and avoiding responding to the identified wake word during playout of the media content item, and thus is a process (a series of steps or acts). A process is a statutory category of invention. Independent Claim 34 recites a system comprising a processor configured to execute a method similar to Claim 21. A system or apparatus is a statutory category of invention. Dependent claims 22-33 and 35-40 are dependent on claims 21 and 34, respectively, and therefore recite their respective statutory classes.
Step 2A, Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04, subsection II, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim. 
In applying the framework set out in Alice, the examiner found that Applicant’s claims 21 and 34 are directed to the patent-ineligible abstract concept of detecting false wake words in media content and avoiding responding to the false wake word. The steps of Applicant’s claims 21-40 are an abstract concept that would fall under the judicial exception of mental processes. Specifically, the claims recite the step of “generating… multiple wake-word processing threads.” The broadest reasonable interpretation of this language may be associated with assigning different segments of a large media content item to multiple humans for analysis. Therefore, the language is directed to a mental process. Further, the claim recites “parsing a media content into multiple separate portions”. Following the same example as before, this step is simply associated with dividing the large media content item into chunks that are smaller than the whole. Thus, the language is directed to a mental process. Further, the claim recites “executing… the multiple generated processing threads, with each wake-word processing thread processing a respective separate portion of the multiple separate portions of the media content item in search of sound matching a sound signature of a wake word, thereby expediting searching for the wake word within the media content.” The broadest reasonable interpretation of such a limitation may involve a human listening to the segment of the large media content item to detect a specific keyword within the media content. Therefore, this step is directed to a mental process. 
Although a multi-threaded process is used to analyze the media content item, the claim fails to place any limits on how the multi-threaded processor applies two or more threads to search for the wake word in the media content item, such that the language is applied at a high level of generality and, as such, constitutes a generic computing environment (more on this will be provided further in the analysis of the additional elements). Furthermore, the step of “detecting… based on the executing, at least one instance of the wake word within the media content item” recites steps that are directed to a mental process. Similar to the previous limitation, under the broadest reasonable interpretation, the limitation is directed to listening and identifying specific keywords within the media content item, which a human is capable of performing in the mind by listening to the media. Finally, the step of “generating, based on the detecting, metadata indicating time of occurrence within the media content item of the at least one instance of the wake word, whereby the generated metadata facilitates having a voice-enabled device avoid responding to the wake word during playout of the media content item” falls under the mental process grouping. The limitation involves indicating time of occurrence of the wake word within the media content item, which under the broadest reasonable interpretation, a human is capable of performing by identifying the wake word in the media content and writing down the time(s) of occurrence with pen and paper. Further, avoiding responding to the wake word during playout of the media content item may simply correspond to the listener refraining from performing an action or from indicating the presence of the wake word out loud. The claims recite limitations that, taken in combination, recite at least a series of mental processes. 
Step 2A, Prong Two: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. This evaluation is performed by (1) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (2) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. See MPEP 2106.04(d). As discussed above, the claims recite “a multithreaded processor” as an additional element beyond the judicial exception. The examiner has found, however, that the multithreaded processor provides no further detail and is recited at such a high-level of generality that this limitation is merely a post-solution step. As multi-threaded processing is such a common practice in the art, the claim needs meaningful limitations that do not simply append the use of multi-threaded processing without significantly more. Simply noting that the searching is expedited through the use of multi-threaded processing does not sufficiently explain how the multi-threaded processing is applied in a meaningful way. Therefore, this step is an insignificant extra-solution activity and does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Furthermore, independent Claim 34 further recites “a memory; and at least one processor, in communication with the memory…” as an additional element beyond the judicial exception. However, these additional elements do not amount to significantly more than the abstract idea because the additional elements constitute a generic computer environment. Alice, 134 S. Ct. at 2357. The Claims need meaningful limitations that go beyond generally linking the use of an abstract idea to a particular technological environment. Therefore, the steps are all abstract and the Claim as a whole is abstract. 
“[S]imply appending generic computer functionality to lend speed or efficiency to the performance of an otherwise abstract concept does not meaningfully limit claim scope for purposes of patent eligibility.” CLS Bank, 2013 U.S. App. LEXIS 9493, at *29 (citing Bancorp, 687 F.3d at 1278, and Dealertrack, Inc. v. Huber, 674 F.3d 1315, 1333-34 (Fed. Cir. 2012) (finding that the claimed computer-aided clearinghouse process is a patent-ineligible abstract idea)); SiRF Tech., Inc. v. Int'l Trade Comm'n, 601 F.3d 1319, 1333 (Fed. Cir. 2010) (“In order for the addition of a machine to impose a meaningful limit on the scope of a claim, it must play a significant part in permitting the claimed method to be performed, rather than function solely as an obvious mechanism for permitting a solution to be achieved more quickly, i.e., through the utilization of a computer for performing calculations.”). Additionally, dependent claims 22-33 and 35-40 do not provide any additional elements that integrate the judicial exception into a practical application. The claims simply describe further the transmittal of the generated metadata between devices, the capability of streaming media content and searching in real-time, and further description of how the time of occurrence is identified and used to carry out the avoiding to respond to the wake word. The broadest reasonable interpretation provides that the transmittal of data between devices may be performed by a human with pen and paper by sharing the identified data with other humans. Further, the streaming of data is also simply a form of communication of information that a human is able to perform. Thus the steps of dependent claims do not amount to significantly more than the abstract idea. 
Step 2B: This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. See MPEP 2106.05. At step 2A, prong two, the additional elements of multi-threaded processing and the “memory and processor” were found to be insignificant extra-solution activity and a generic computer environment. At Step 2B, the re-evaluation of the insignificant extra-solution activity consideration takes into account whether or not the extra-solution activity is well understood, routine, and conventional in the field. See MPEP 2106.05(g). Here, the step of applying multithreaded processing is recited at such a high level of generality that it fails to recite anything more than what is well understood, routine and conventional in the field of multi-threaded processing. Therefore, this limitation remains insignificant extra-solution activity even upon reconsideration and does not amount to significantly more. Even when considered in combination, these additional elements represent mere instructions to apply an exception and insignificant extra-solution activity, and therefore do not provide an inventive concept. Additionally, dependent claims 22-33 and 35-40 do not add an inventive concept. In conclusion, Examiner notes that none of recited steps in Applicant's claims 1, 5, 9 and 15 refer to a specific machine by reciting structural limitations of any apparatus or to any specific operations that would cause a machine to be the mechanism to perform these steps. Although the claims may be processed by a computing system having a processor, the computing system is merely a general purpose computing system. Therefore, all of the claims 21-40 are abstract. Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 21-24, 28, 31-36, 39 and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Smith (US PG Pub 20200090646) in view of Poirier et al. (US Patent 10,109,279; hereinafter “Poirier”). As per claims 21 and 34, Smith discloses: A method and media-delivery system comprising: a memory (Smith; Fig. 2A, item 213; p. 0055 - the playback device 102 includes at least one processor 212, which may be a clock-driven computing component configured to process input data according to instructions stored in memory 213); and at least one processor, in communication with the memory, that causes the media-delivery system to carry out operations including (Smith; Fig. 2A, item 212; p. 0055 - the memory 213 may be data storage that can be loaded with software code 214 that is executable by the processor 212 to achieve certain functions): detecting, based on the executing, at least one instance of the wake word within the media content item (Smith; p. 
0028-0030 - The wake-word engine suppressor may be configured to identify one or more false wake words in an audio stream that is to be output by the playback device, and when a false wake word is identified, the wake-word engine suppressor may be configured to temporarily deactivate the playback device's primary wake-word engine and cause the playback device to temporarily deactivate the wake-word engines of one or more other NMDs (e.g., one or more NMD-equipped playback devices); see also p. 0032 – keyword spotting; see also p. 0112); and whereby the detecting of the wake word facilitates having a voice-enabled device avoid responding to the wake word during playout of the media content item (Smith; p. 0028-0030 - The wake-word engine suppressor may be configured to identify one or more false wake words in an audio stream that is to be output by the playback device, and when a false wake word is identified, the wake-word engine suppressor may be configured to temporarily deactivate the playback device's primary wake-word engine and cause the playback device to temporarily deactivate the wake-word engines of one or more other NMDs (e.g., one or more NMD-equipped playback devices); see also p. 0032 – keyword spotting; see also p. 0112). Smith, however, fails to disclose generating, by a multithreaded processor, multiple wake-word processing threads; parsing a media content item into multiple separate portions; executing, by the multithreaded processor, the multiple generated wake-word processing threads, with each wake-word processing thread processing a respective separate portion of the multiple separate portions of the media content item in search of sound matching a sound signature of a wake word, thereby expediting searching for the wake word within the media content item; and generating, based on the detecting, metadata indicating time of occurrence within the media content item of the at least one instance of the wake word. 
Poirier does teach generating, by a multithreaded processor, multiple wake-word processing threads (Poirier; Fig. 3, items 300, 301 and 302; Col. 6, lines 48-67 - Referring to FIG. 3, there are 3 arrays of Binary Speech Engines 300, 301, and 302 (multiple processing threads)); parsing a media content item into multiple separate portions (Poirier; Fig. 3, items 304; Col. 6, lines 48-67 - The Audio Stream of spoken words enters a Divider function (304) where individual words or phrases are separated from the entire spoken audio stream); executing, by the multithreaded processor, the multiple generated wake-word processing threads, with each wake-word processing thread processing a respective separate portion of the multiple separate portions of the media content item in search of sound matching a sound signature of a wake word, thereby expediting searching for the wake word within the media content item (Poirier; Fig. 3, items 304; Col. 6, lines 48-67 - The separated words continue on to an array of Binary Speech Engines as can be seen that Word 1 (305 and 308) is provided to all the binary speech engines in Binary Speech Engine (BSE) 1 Array (300). Items 305 and 308 are the same, just shown as output of the divider stage and input as the recognition stage. In this example there are 3 Binary Speech Engine Arrays BSE 1 (300), BSE 2 (301) and BSE 3 (302). As the separate audio words are output from the divider (305, 306, and 307 in this example) they are parallel processed (multithreaded processing) in separate BSE arrays to increase processing speed through parallelism); and generating, based on the detecting, metadata indicating time of occurrence within the media content item of the at least one instance of the wake word (Poirier; Col. 8, lines 15-31 - Index the recognized word to full audio using a timestamp or some other method (708 and 709)). 
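For readers who want a concrete picture of the limitations at issue (parsing a media content item into portions, searching each portion on its own thread, and emitting time-of-occurrence metadata), the following is a minimal illustrative sketch in Python. It is not taken from the application, Smith, or Poirier. All names (`parse_into_portions`, `find_in_portion`, `wake_word_metadata`, the `wake_word_times_s` key) are hypothetical, and exact matching of integer samples is an assumed stand-in for real acoustic signature matching. Each portion is extended by a small overlap so that a match straddling a portion boundary is not lost.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_into_portions(samples, num_portions, overlap):
    """Split samples into about num_portions pieces as (start_index, slice) pairs.

    Each slice runs `overlap` samples past its nominal end so that a
    signature crossing a boundary is still seen by exactly one worker.
    """
    if not samples:
        return []
    size = max(1, -(-len(samples) // num_portions))  # ceiling division
    return [
        (start, samples[start:start + size + overlap])
        for start in range(0, len(samples), size)
    ]


def find_in_portion(portion, signature, sample_rate):
    """Return the times (in seconds) at which signature starts in one portion."""
    start, chunk = portion
    n = len(signature)
    return [
        (start + i) / sample_rate
        for i in range(len(chunk) - n + 1)
        if chunk[i:i + n] == signature  # assumed stand-in for acoustic matching
    ]


def wake_word_metadata(samples, signature, sample_rate, num_threads=4):
    """Search all portions concurrently and return time-of-occurrence metadata."""
    if not signature:
        raise ValueError("signature must not be empty")
    portions = parse_into_portions(samples, num_threads, len(signature) - 1)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        per_portion = pool.map(
            lambda p: find_in_portion(p, signature, sample_rate), portions
        )
        times = sorted({t for hits in per_portion for t in hits})
    return {"wake_word_times_s": times}
```

With a 12-sample input `[0, 0, 1, 2, 3, 0, 0, 0, 1, 2, 3, 0]`, signature `[1, 2, 3]`, a sample rate of 1, and four threads, the function returns `{"wake_word_times_s": [2.0, 8.0]}`; both matches straddle a portion boundary. In standard CPython builds, threads running pure Python code do not execute in parallel, so the sketch shows the structure of the claimed steps rather than the speedup.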
Although Poirier fails to explicitly teach that the threads are specifically “wake-word processing threads”, Poirier teaches keyword searching using multithreaded or parallel processing which may be combined with Smith’s wake-word processing to provide Smith’s invention with the capability of searching for a wake-word using Poirier’s multithreaded processing. Therefore, it would have been obvious to one of ordinary skill in the art to modify the method and media-delivery system of Smith to include generating, by a multithreaded processor, multiple wake-word processing threads; parsing a media content item into multiple separate portions; executing, by the multithreaded processor, the multiple generated wake-word processing threads, with each wake-word processing thread processing a respective separate portion of the multiple separate portions of the media content item in search of sound matching a sound signature of a wake word, thereby expediting searching for the wake word within the media content item; and generating, based on the detecting, metadata indicating time of occurrence within the media content item of the at least one instance of the wake word, as taught by Poirier, because parallel processing reduces transcription turnaround time (Poirier; Col. 2, lines 44-54). As per claims 22 and 35, Smith in view of Poirier discloses: The method and media-delivery system of claims 21 and 34, wherein the voice-enabled device is a voice-enabled media-playback device (Smith; p. 0022 - In turn, the VAS corresponding to the wake word that was identified by the wake-word engine receives the transmitted sound data from the NMD over a communication network. A VAS traditionally takes the form of a remote service implemented using one or more cloud servers configured to process voice inputs (e.g., AMAZON's ALEXA, APPLE's SIRI, MICROSOFT's CORTANA, GOOGLE'S ASSISTANT, etc.). 
In some instances, certain components and functionality of the VAS may be distributed across local and remote devices), whereby the detecting of the wake word facilitates having the voice-enabled media-playback device avoid responding to the wake word when the voice-enabled media-playback device plays the wake word within the transmitted media content item (Smith; p. 0028-0030 - The wake-word engine suppressor may be configured to identify one or more false wake words in an audio stream that is to be output by the playback device, and when a false wake word is identified, the wake-word engine suppressor may be configured to temporarily deactivate the playback device's primary wake-word engine and cause the playback device to temporarily deactivate the wake-word engines of one or more other NMDs (e.g., one or more NMD-equipped playback devices); see also p. 0032 – keyword spotting; see also p. 0112). And further, Poirier teaches transmitting to the voice-enabled media-playback device the media content item and the generated metadata (Poirier; Col. 7, lines 49-60 - Then additional audio segments that do not fall below the Sound Level threshold continued to be stored in the segment buffer. When the next audio segment threshold is below the set Sound Level trigger level the Audio Tag Marker Control (504) is checked for BOS=True. If BOS=True then this is not Begin of Sub Event (BOS) (512) and the process continues on to Tag Segment as EOS (511) and then the Audio Sub Event (i.e. the audio with a word) is created and send to the Binary Speech Engine Array to be transcribed). Although Smith fails to package the identified wake word and time of occurrence of the wake word into metadata and transmit it to the voice enabled device, Poirier provides for this functionality in the cited portions of its disclosure, so that Smith’s network of devices is able to take advantage of the transmitted metadata. 
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method and system of Smith to include transmitting to the voice-enabled media-playback device the media content item and the generated metadata, as taught by Poirier, because parallel processing reduces transcription turnaround time (Poirier; Col. 2, lines 44-54). As per claim 23, Smith in view of Poirier discloses: The method of claim 22, wherein transmitting the media content item to the voice-enabled media-playback device comprises streaming the media content item to the voice-enabled media-playback device (Smith; p. 0031 - In practice, the playback device may receive the audio stream that the wake-word engine suppressor analyzes via an audio interface, which may take a variety of forms and may be configured to receive audio from a variety of sources. As one example, the audio interface may take the form of an analog and/or digital line-in receptacle that physically connects the playback device to an audio source, such as a CD player or a TV). As per claims 24 and 36, Smith in view of Poirier disclose: The method and media-delivery system of claims 22 and 35, upon which claims 24 and 36 depend. And further, Poirier teaches wherein the executing, detecting, and generating occur before transmitting the media content item to the voice-enabled media-playback device (Poirier; Col. 7, lines 49-60 - Then additional audio segments that do not fall below the Sound Level threshold continued to be stored in the segment buffer. When the next audio segment threshold is below the set Sound Level trigger level the Audio Tag Marker Control (504) is checked for BOS=True. If BOS=True then this is not Begin of Sub Event (BOS) (512) and the process continues on to Tag Segment as EOS (511) and then the Audio Sub Event (i.e. the audio with a word) is created and send to the Binary Speech Engine Array to be transcribed). 
Although Smith fails to package the identified wake word and time of occurrence of the wake word into metadata and transmit it to the voice-enabled device, Poirier provides for this functionality in the cited portions of its disclosure, so that Smith’s network of devices is able to take advantage of the transmitted metadata. Therefore, it would have been obvious to one of ordinary skill in the art to modify the method and system of Smith to include wherein the executing, detecting, and generating occur before transmitting the media content item to the voice-enabled media-playback device, as taught by Poirier, because parallel processing reduces transcription turnaround time (Poirier; Col. 2, lines 44-54). As per claim 28, Smith in view of Poirier discloses: The method of claim 27, wherein the metadata causes the voice-enabled media-playback device to deactivate a wake word detector of the voice-enabled media-playback device for a duration of the time range (Smith; p. 0165 - In practice, the particular amount of time for which the at least one NMD is to deactivate its wake-word engine corresponding to the identified false wake word may have a duration that is sufficient to allow the playback device 102a to output audio, via the speakers 218, that comprises the false wake word and/or to allow the NMD to receive a sound input comprising that output audio…). As per claim 31, Smith in view of Poirier discloses: The method of claim 30, wherein the media stream comprises a live stream (Smith; p. 0118 - The voice extractor 572 transmits or streams these messages, M_V, that may contain voice input in real time or near real time to a remote VAS, such as the VAS 190 (FIG. 1B), via the network interface 224). 
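The time-range deactivation discussed for claim 28 above can be pictured with a short, hypothetical Python sketch. It assumes the device receives a list of wake-word start times in seconds, an assumed wake-word duration, and an assumed safety margin; none of these names or values come from the application or the cited references. Overlapping windows are merged so the detector is not toggled repeatedly.

```python
def suppression_windows(wake_word_times_s, word_duration_s, margin_s=0.25):
    """Turn wake-word start times into merged (start, end) suppression windows."""
    windows = []
    for t in sorted(wake_word_times_s):
        start = max(0.0, t - margin_s)
        end = t + word_duration_s + margin_s
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


def detector_active(playback_pos_s, windows):
    """Return False while playback is inside any suppression window."""
    return not any(start <= playback_pos_s <= end for start, end in windows)
```

For start times `[2.0, 8.0]` with a one-second wake word and the default margin, the windows are `[(1.75, 3.25), (7.75, 9.25)]`, so the detector would be inactive at a playback position of 2.5 seconds and active again at 5.0 seconds.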
As per claim 32, Smith in view of Poirier disclose: The method of claim 21, further comprising: determining that the media content has been updated; and responsive to determining that the media content item has been updated, repeating the executing, detecting, and generating (Smith; p. 0105 - In some embodiments, audio content sources may be added or removed from a media playback system such as the MPS 100 of FIG. 1A. In one example, an indexing of audio items may be performed whenever one or more audio content sources are added, removed, or updated. Indexing of audio items may involve scanning for identifiable audio items in all folders/directories shared over a network accessible by playback devices in the media playback system and generating or updating an audio content database comprising metadata (e.g., title, artist, album, track length, among others) and other associated information, such as a URI or URL for each identifiable audio item found. Other examples for managing and maintaining audio content sources may also be possible).
As per claim 33, Smith in view of Poirier disclose: The method of claim 21, further comprising: wherein the media-playback device plays the transmitted media content item and forwards to a voice-enabled device the metadata to cause the voice-enabled device to avoid responding to the wake word when playout of the media content item by the media-playback device includes playout of the wake word (Smith; p. 0028-0030 - The wake-word engine suppressor may be configured to identify one or more false wake words in an audio stream that is to be output by the playback device, and when a false wake word is identified, the wake-word engine suppressor may be configured to temporarily deactivate the playback device's primary wake-word engine and cause the playback device to temporarily deactivate the wake-word engines of one or more other NMDs (e.g., one or more NMD-equipped playback devices); see also p. 0032 – keyword spotting; see also p. 0112). And further, Poirier teaches transmitting the media content item and the generated metadata to a media-playback device (Poirier; Col. 7, lines 49-60 - Then additional audio segments that do not fall below the Sound Level threshold continued to be stored in the segment buffer. When the next audio segment threshold is below the set Sound Level trigger level the Audio Tag Marker Control (504) is checked for BOS=True. If BOS=True then this is not Begin of Sub Event (BOS) (512) and the process continues on to Tag Segment as EOS (511) and then the Audio Sub Event (i.e. the audio with a word) is created and send to the Binary Speech Engine Array to be transcribed). Although Smith fails to package the identified wake word and time of occurrence of the wake word into metadata and transmit it to the voice enabled device, Poirier provides for this functionality in the cited portions of its disclosure, so that Smith’s network of devices is able to take advantage of the transmitted metadata. Therefore, it would have been obvious to one of ordinary skill in the art to modify the method and system of Smith to include transmitting the media content item and the generated metadata to a media-playback device, as taught by Poirier, because parallel processing reduces transcription turnaround time (Poirier; Col. 2, lines 44-54).
As per claim 39, Smith in view of Poirier disclose: The media-delivery system of claim 34, upon which claim 39 depends. And further, Poirier teaches wherein the metadata indicates a time range that includes the wake word in the media stream (Poirier; Col. 8, lines 15-31 - Index the recognized word to full audio using a timestamp or some other method (708) and (709)).
Although Poirier fails to explicitly teach that the threads are specifically “wake-word processing threads”, Poirier teaches keyword searching using multithreaded or parallel processing which may be combined with Smith’s wake-word processing to provide Smith’s invention with the capability of searching for a wake-word using Poirier’s multithreaded processing. Therefore, it would have been obvious to one of ordinary skill in the art to modify the method and media-delivery system of Smith to include wherein the metadata indicates a time range that includes the wake word in the media stream, as taught by Poirier, because parallel processing reduces transcription turnaround time (Poirier; Col. 2, lines 44-54).
As per claim 40, Smith in view of Poirier disclose: The media-delivery system of claim 39, wherein the metadata causes the voice-enabled media-playback device to deactivate a wake word detector of the voice-enabled media-playback device for a duration of the time range (Smith; p. 0028-0030 - The wake-word engine suppressor may be configured to identify one or more false wake words in an audio stream that is to be output by the playback device, and when a false wake word is identified, the wake-word engine suppressor may be configured to temporarily deactivate the playback device's primary wake-word engine and cause the playback device to temporarily deactivate the wake-word engines of one or more other NMDs (e.g., one or more NMD-equipped playback devices); see also p. 0032 – keyword spotting; see also p. 0112).
Claims 25-27, 29, 30, 37 and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Smith in view of Poirier and further in view of Engineer (US PG Pub 20180160189).
As per claims 25 and 37, Smith in view of Poirier disclose: The method and media-delivery system of claims 22 and 35, upon which claims 25 and 37 depend.
Smith in view of Poirier, however, fail to disclose wherein the executing, detecting, and generating occur while transmitting the media content item to the voice-enabled media-playback device. Engineer does teach wherein the executing, detecting, and generating occur while transmitting the media content item to the voice-enabled media-playback device (Engineer; p. 0021 - While the video content is playing, the search component also can display the textual information (e.g., words) being spoken in the content, and can highlight or otherwise emphasize the word(s) that corresponds to (e.g., matches or substantially matches) the keyword(s) in the search as the word(s) is being displayed or scrolled across the display screen… displaying search results while media content is being streamed (transmitted)). Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Smith and Poirier to include wherein the executing, detecting, and generating occur while transmitting the media content item to the voice-enabled media-playback device, as taught by Engineer, in order to facilitate the search and identification of keywords in media content for highlighting or emphasizing as the keywords are displayed to the user (Engineer; p. 0021).
As per claim 26, Smith in view of Poirier disclose: The method of claim 21, upon which claim 26 depends. Smith in view of Poirier, however, fail to disclose wherein the metadata includes a first time indicating a start of a time range that includes the wake word in the media content item. And further, Engineer teaches wherein the metadata includes a first time indicating a start of a time range that includes the wake word in the media content item (Engineer; p. 0021 - The presentation of the content can start from a time position associated with a time indicator, for example, in response to a user selecting to play the content or selecting the time indicator; see also p. 0037-0038 - With regard to each word in an item of content that relates to a keyword associated with the search query, the search component 118 can generate time information and/or an indicator (e.g., a time location indicator) to facilitate indicating a time location in the item of content where the word is located; see also p. 0073). Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Smith and Poirier to include wherein the metadata includes a first time indicating a start of a time range that includes the wake word in the media content item, as taught by Engineer, in order to facilitate the search and identification of keywords in media content for highlighting or emphasizing as the keywords are displayed to the user (Engineer; p. 0021).
As per claim 27, Smith in view of Poirier and Engineer disclose: The method of claim 26, upon which claim 27 depends. And further, Engineer teaches wherein the metadata includes a second time indicating an end of the time range that includes the wake word in the media content item (Engineer; p. 0021 - The presentation of the content can start from a time position associated with a time indicator, for example, in response to a user selecting to play the content or selecting the time indicator; see also p. 0037-0038 - With regard to each word in an item of content that relates to a keyword associated with the search query, the search component 118 can generate time information and/or an indicator (e.g., a time location indicator) to facilitate indicating a time location in the item of content where the word is located; see also p. 0073).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Smith and Poirier to include wherein the metadata includes a second time indicating an end of the time range that includes the wake word in the media content item, as taught by Engineer, in order to facilitate the search and identification of keywords in media content for highlighting or emphasizing as the keywords are displayed to the user (Engineer; p. 0021).
As per claim 29, Smith in view of Poirier and Engineer disclose: The method of claim 27, upon which claim 29 depends. And further, Engineer teaches wherein the first time is indicated by a first offset from a start time of the media content item, and wherein the second time is indicated by a second offset from the start time of the media content item (Engineer; p. 0021 - The presentation of the content can start from a time position associated with a time indicator, for example, in response to a user selecting to play the content or selecting the time indicator; see also p. 0037-0038 - With regard to each word in an item of content that relates to a keyword associated with the search query, the search component 118 can generate time information and/or an indicator (e.g., a time location indicator) to facilitate indicating a time location in the item of content where the word is located; see also p. 0073). Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Smith and Poirier to include wherein the first time is indicated by a first offset from a start time of the media content item, and wherein the second time is indicated by a second offset from the start time of the media content item, as taught by Engineer, in order to facilitate the search and identification of keywords in media content for highlighting or emphasizing as the keywords are displayed to the user (Engineer; p. 0021).
As per claims 30 and 38, Smith in view of Poirier disclose: The method and media-delivery system of claims 21 and 34, wherein the media content item comprises media stream that a media-delivery system receives and forwards in real-time to the voice-enabled media-playback device (Smith; p. 0118 - The voice extractor 572 transmits or streams these messages, M.sub.V, that may contain voice input in real time or near real time to a remote VAS, such as the VAS 190 (FIG. 1B), via the network interface 224). Smith in view of Poirier, however, fail to disclose wherein the media content item comprises media stream that a media-delivery system receives and forwards in real-time to the voice-enabled media-playback device. Engineer teaches wherein the media content item comprises media stream that a media-delivery system receives and forwards in real-time to the voice-enabled media-playback device (Engineer; p. 0022 & p. 0024 - In certain implementations, the search component can be contained in, and executed in, a media device, such as, for example, a set-top box (STB) or set-top unit (STU), which can be associated with (e.g., communicatively connected to) a presentation component (e.g., a television) or another type of communication device (e.g., mobile phone, electronic pad or tablet, electronic notebook, computer, . . . ); see also p. 0082 – application component communicating with media device for streaming media content (real-time) and presentation component (for display of media content)), wherein the executing, detecting, and generating are carried out by the media-delivery system as the media-delivery system receives and forwards the media stream to the voice-enabled media-playback device (Engineer; p. 0082 – application component communicating with media device for streaming media content (real-time) and presentation component (for display of media content)). 
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method and system of Smith and Poirier to include wherein the media content item comprises media stream that a media-delivery system receives and forwards in real-time to the voice-enabled media-playback device, wherein the executing, detecting, and generating are carried out by the media-delivery system as the media-delivery system receives and forwards the media stream to the voice-enabled media-playback device, as taught by Engineer, in order to facilitate the search and identification of keywords in media content for highlighting or emphasizing as the keywords are displayed to the user (Engineer; p. 0021).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art made of record and not relied upon includes: Lang (US PG Pub 20190043492) discloses example techniques involving determining a direction of a NMD. An example implementation includes a playback device receiving data representing audio content for playback by the playback device. Before the audio content is played back by the playback device, the playback device detects, in the audio content, one or more wake words for one or more voice services. The playback device causes one or more networked microphone devices to disable its respective wake response to the detected one or more wake words during playback of the audio content by the playback device and plays back the audio content via one or more speakers. When enabled, the wake response of a given networked microphone device to a particular wake word causes the given networked microphone device to listen, via a microphone, for a voice command following the particular wake word (Lang; Abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139.
The examiner can normally be reached Monday - Friday 9-6 ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on 5712727602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RODRIGO A CHAVEZ/Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658

Prosecution Timeline

Jun 12, 2025: Response Filed
Jul 08, 2025: Examiner Interview Summary
Jul 08, 2025: Applicant Interview (Telephonic)
Oct 01, 2025: Final Rejection mailed (§101, §103)
Nov 25, 2025: Response after Non-Final Action
Jan 16, 2026: Request for Continued Examination
Jan 26, 2026: Response after Non-Final Action
Apr 01, 2026: Non-Final Rejection mailed (§101, §103; current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12620044
SYSTEMS AND METHODS FOR TRACKING DISASTER FOOTPRINTS WITH SOCIAL STREAMING DATA
4y 5m to grant Granted May 05, 2026
Patent 12597430
MULTI-CHANNEL SIGNAL GENERATOR, AUDIO ENCODER AND RELATED METHODS RELYING ON A MIXING NOISE SIGNAL
3y 1m to grant Granted Apr 07, 2026
Patent 12579984
DATA AUGMENTATION SYSTEM AND METHOD FOR MULTI-MICROPHONE SYSTEMS
4y 1m to grant Granted Mar 17, 2026
Patent 12541653
ENTERPRISE COGNITIVE SOLUTIONS LOCK-IN AVOIDANCE
4y 3m to grant Granted Feb 03, 2026
Patent 12542136
DYNAMICALLY CONFIGURING A WARM WORD BUTTON WITH ASSISTANT COMMANDS
4y 2m to grant Granted Feb 03, 2026
Based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 51%
With Interview (+38.6%): 90%
Median Time to Grant: 3y 3m (~0m remaining)
PTA Risk: High
Based on 233 resolved cases by this examiner. Grant probability derived from career allowance rate.