Prosecution Insights
Last updated: April 19, 2026
Application No. 18/260,196

DISTRIBUTED SPEECH PROCESSING SYSTEM AND METHOD

Final Rejection — §101, §103
Filed
Jun 30, 2023
Examiner
OGUNBIYI, OLUWADAMILOLA M
Art Unit
2653
Tech Center
2600 — Communications
Assignee
Espressif Systems (Shanghai) Co. Ltd.
OA Round
2 (Final)
78%
Grant Probability
Favorable
3-4
OA Rounds
2y 12m
To Grant
96%
With Interview

Examiner Intelligence

Grants 78% — above average
78%
Career Allow Rate
236 granted / 304 resolved
+15.6% vs TC avg
Strong +19% interview lift
+18.6%
Interview Lift
resolved cases with interview
Typical timeline
2y 12m
Avg Prosecution
31 currently pending
Career history
335
Total Applications
across all art units

Statute-Specific Performance

§101
20.1%
-19.9% vs TC avg
§103
47.0%
+7.0% vs TC avg
§102
12.1%
-27.9% vs TC avg
§112
13.7%
-26.3% vs TC avg
Black line = Tech Center average estimate • Based on career data from 304 resolved cases

Office Action

§101 §103
DETAILED ACTION

Claims 1–20 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

With regard to the Non-Final Office Action of 30 July 2025, the Applicant filed a response on 23 October 2025. Claims 4 and 14 were objected to for minor informalities. The expanded forms of the terms MFCC and PLP are now presented before the acronyms, and the Examiner hereby withdraws the claim objection.

Response to Arguments

With regard to the 35 U.S.C. 101 rejection of the claims as being directed to a judicial exception without significantly more, the Applicant has amended the independent claims, stating (Remarks: page 9 par 5) that the limitations of claim 1 are all integrated into practical application scenarios, the processes involving ‘specific processing and conversion of audio and speech signals with physical characteristics’ which ‘cannot be achieved through any mental process.’ The Examiner maintains that the features here can in fact be performed mentally by a human. The Examiner considers the limitations of claim 3 to address the pre-processing, the examples given being: a sound feature value, which can be an indication of the features of the audio and can be produced mentally; a sound quality, which can be an indication of whether the quality of the received audio is good enough to be processed for understanding its contents and can be assessed mentally; and sound time information, which can be a user indicating the duration of the received audio or the times at which it began and ended, and can also clearly be produced mentally. With these, indications can be provided as the first and second pre-processing, which may then be given to a human for the purpose of speech recognition by presenting text attributable to the received speech. Other humans who listened to the original audio signal may also perform speech recognition of their own in the same fashion, writing out the text corresponding to the speech that was heard. Another (possibly a third) human may take the results of the several speech recognitions, combine them, and produce a final speech recognition result. All of these, as indicated above, can be performed mentally, contrary to the Applicant’s assertion that they cannot be achieved through any mental process.

The Applicant further states (Remarks: page 9 par 5) that even if the claims are directed to a judicial exception, ‘this judicial exception is integrated into practical application scenarios, e.g., multi-device collaborative speech recognition and controls scenarios such as smart homes and the Internet of Things.’ The Examiner indicates that while this may be true, it is not explicitly provided by the claim limitations. The Applicant further indicates (Remarks: page 10 par 2) that ‘claim 1 provides a technical improvement in the field of speech recognition and processing,’ but as indicated above, all limitations of the claim appear to be directed to a mental process, and the claim does not appear to introduce the technical improvements indicated in the Specification, since these are not explicitly recited in the claims.

With regard to the 35 U.S.C. 103 rejection of the claims, the Applicant has amended claim 1 to include subject matter previously presented in claim 2.
The Applicant distinguishes (Remarks: page 13 last par) the preprocessing locations of the first and second preprocessed results, stating that the first preprocessed result is produced locally while the one or more second preprocessed results come from a separate node device, in an attempt to distinguish the claims from the applied prior art. This is followed by (Remarks: page 14 par 2) the transmission of sound preprocessed results among various node devices. The Examiner acquiesces and indicates that this will be properly addressed in the following section.

Claim Interpretations

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. — An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: ‘the sound acquisition module is configured to acquire an audio signal’ in claim 1; ‘the sound processing module is configured to preprocess the audio signal …’ in claim 1; and ‘the communication module is configured to send …’ in claim 1.

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover a processor and communication modules such as those capable of WiFi/BLE/Zigbee communication protocols (page 8 lines 6–17) and a microphone (page 8 lines 15–16), as the corresponding structures described in the Specification as performing the claimed functions, and equivalents thereof. If Applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, Applicant may: (1) amend the claim limitations to avoid such interpretation (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1–20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more.
Independent claims 1 and 11 recite the limitations of performing distributed speech processing with a plurality of node devices in a network: acquiring an audio signal; pre-processing the audio signal to obtain a first pre-processed result; communicating the first pre-processed result to one or more node devices in the network; receiving a second pre-processed result from another node device over the network; performing speech recognition on both the first and second sound pre-processed results, these being intermediate results of speech recognition; receiving one or more speech recognition results from another node device over the network; and finally performing speech recognition based on the first and the one or more second speech recognition results to obtain a final speech recognition result.

Apart from mentioning the plurality of node devices, the network, the processor, memory, communication module, and a sound acquisition module, nothing in the claims precludes the claimed technique from being performed in the human mind. The entire process involves data gathering through the acquiring of an audio signal as well as of the first and one or more second speech recognition results; data manipulation through the several pre-processing steps as well as the obtaining of a final speech recognition result from the first and one or more second speech recognition results; data transfer through the sending of the intermediate results, and of the one or more second speech recognition results, between node devices in the network; and data transformation through speech recognition.

A human may receive speech at one location as an audio recording and perform at least two pre-processing steps on that recording, chosen from: setting start/end boundaries; indicating the duration of the recording; indicating the quality of the recording as either acceptable or unacceptable for recognising speech; or indicating particular features of the speech determinable through listening to the audio. The human then sends the first preprocessed result to another human, and likewise receives preprocessed results; recognises the speech in the audio based on both the first and second preprocessing results; receives other speech recognition results from other humans in the same environment; and a human finally presents a final speech recognition result, based on the first and the one or more second speech recognition results, as a written transcript of the speech content. The claims hereby recite a mental process.

This judicial exception is not integrated into a practical application, as the claims simply teach data gathering, manipulation, transfer, and conversion. While the claims do mention a network, node devices, a processor, memory, sound acquisition, and a communication module, these are recited in generic terms. The invention is not tied to any particular defining structure and simply provides instructions to apply the judicial exception. The techniques can be performed by a generic computer presented as a tool to implement the abstract idea (classifiable as automation of the mental process steps).
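To make the recited flow easier to follow, a minimal sketch of the pipeline the claims describe is given below. Every name in it is an illustrative placeholder; none of the identifiers or internals come from the application or the cited art.

```python
# Hedged sketch of the flow recited in claims 1 and 11; all names are
# hypothetical placeholders, not taken from the application or its claims.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Preprocessed:
    node_id: int
    feature: float      # stand-in "sound feature value"
    quality: float      # stand-in "sound quality" score
    start_s: float      # sound time information: start
    end_s: float        # sound time information: end

def preprocess(node_id: int, samples: list[float], sr: int = 16000) -> Preprocessed:
    """First sound preprocessing: produces an intermediate result, not text."""
    energy = sum(s * s for s in samples) / max(len(samples), 1)
    return Preprocessed(node_id, feature=energy, quality=energy,
                        start_s=0.0, end_s=len(samples) / sr)

def recognize(first: Preprocessed, seconds: list[Preprocessed]) -> str:
    """First speech recognition result from the first plus one or more second
    preprocessed results; here it just keys off the best-quality intermediate."""
    best = max([first, *seconds], key=lambda p: p.quality)
    return f"hypothesis-from-node-{best.node_id}"

def finalize(first_result: str, second_results: list[str]) -> str:
    """Final speech recognition result (majority vote as a placeholder for
    the claimed combination of first and second results)."""
    return Counter([first_result, *second_results]).most_common(1)[0][0]

# One node's view; the "remote" values stand in for data received over the network.
local = preprocess(0, [0.1, -0.2, 0.3] * 1000)
remote = [preprocess(1, [0.05, -0.1, 0.2] * 1000)]
print(finalize(recognize(local, remote), ["hypothesis-from-node-1"]))
```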
The Specification at page 3 line 26 – page 4 line 2 provides a node device that includes several generic computer parts, these being suitable to perform the required tasks and reading on the limitations as a generic computer having a processor, memory, network module, and microphone, connected in a network to several other devices like it. The judicial exception is recited at such a high level of generality that it amounts to no more than mere instructions to apply the exception using a generic computer. The claims do not provide any additional detail. The claims therefore do not include additional elements sufficient to amount to significantly more than the judicial exception, because the invention is not tied to a practical application. The claims provide techniques that amount to no more than mere instructions to apply the judicial exception, which can be performed by a generic device. Merely mentioning the processor, memory, sound acquisition, and communication module amounts to no more than general-purpose hardware used as a tool to implement the abstract idea, and provides no particular application other than implementing a judicial exception. Mere instructions to apply an exception using a generic device cannot provide an inventive concept. Claims 1 and 11 are not eligible.

Claims 2 and 12 provide the presence of a communication module configured to send the first speech recognition result to one or more devices within the network. A human may send his/her own speech recognition result to other humans within the same environment who are also capable of processing the received speech. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.

Claims 3 and 13 provide that the pre-processed results comprise a sound feature value, a sound quality, and sound time information. Obtaining the sound feature and sound quality information can be performed by having a human listen to the audio; obtaining sound time information may be performed by having the human use a watch to obtain certain timing boundaries. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.

Claims 4 and 14 provide that the sound feature value is an MFCC value or a PLP feature value of the audio signal. These values can be obtained as a mathematical concept. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.

Claims 5 and 15 provide that the sound quality comprises a signal-to-noise ratio and an amplitude of the audio signal. Computing an SNR is a mathematical concept, and an audio signal amplitude can be observed by a human. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.
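For concreteness, the quantities at issue in claims 4/14 and 5/15 can be computed in a few lines. The sketch below uses librosa for MFCC extraction, which is one common library choice and is not cited anywhere in this action; the noise-only-segment assumption behind the SNR estimate is likewise purely illustrative.

```python
# Hedged sketch of claims 4/14 (MFCC feature values) and 5/15 (sound quality
# as SNR and amplitude). librosa is an assumed tool, not anything cited here.
import numpy as np
import librosa

# Any mono signal works; librosa ships a sample clip for experimentation.
y, sr = librosa.load(librosa.ex("trumpet"))

# Claims 4/14: MFCC feature values (PLP would be an alternative front end).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape: (13, n_frames)

# Claims 5/15: amplitude, plus an SNR estimate that assumes the first 0.5 s
# is noise-only; an illustrative assumption, not anything claimed.
amplitude = float(np.max(np.abs(y)))
noise = y[: int(0.5 * sr)]
snr_db = 10 * np.log10(np.mean(y**2) / max(np.mean(noise**2), 1e-12))
print(mfcc.shape, round(amplitude, 3), round(snr_db, 1))
```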
Claims 6 and 16 provide that the sound time information comprises one of the following: a start time and an end time of the audio signal, or a start time and a duration of the audio signal. A human may observe the start time and end time, and calculate a duration of the audio signal, with the use of a watch while listening to the audio signal. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.

Claims 7 and 17 provide teaching for incremental sequence numbering of the first and second sound pre-processed results. A human may perform such numbering with the aid of pen and paper. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.

Claims 8 and 18 provide teaching for considering the sound quality of the pre-processed results and, if the sound quality does not exceed a predetermined threshold, discarding the pre-processed result. A human may listen for the quality of the pre-processed signals and then discard those that do not meet a quality standard/threshold. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.

Claims 9 and 19 provide teaching for selecting the pre-processed signal with the highest sound quality on which to perform speech recognition. A human may observe and listen to the available pre-processed results and, based on them, choose the pre-processed result with the highest quality for speech recognition. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.

Claims 10 and 20 provide performing weighting on both the first and second speech recognition results to obtain the final speech recognition result. Performing weighting is a mathematical process; a human may also give certain weights to different speech recognition results coming from different people/locations in order to arrive at a final speech recognition result. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.
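The weighting of claims 10 and 20 is recited only at a high level. One plausible reading, a confidence-weighted vote over candidate transcripts, is sketched below; the weights and hypotheses are made up for illustration.

```python
# Hedged sketch of the claims 10/20 weighting: a confidence-weighted vote
# over candidate transcripts. Weights and hypotheses are invented examples.
from collections import defaultdict

def weighted_final(results: list[tuple[str, float]]) -> str:
    """results holds (transcript hypothesis, weight) pairs from the nodes."""
    scores: defaultdict[str, float] = defaultdict(float)
    for text, weight in results:
        scores[text] += weight            # accumulate weight per hypothesis
    return max(scores, key=scores.get)    # highest total weight wins

first = ("turn on the light", 0.9)        # first speech recognition result
seconds = [("turn on the light", 0.7),    # second results from other nodes
           ("turn on the flight", 0.4)]
print(weighted_final([first, *seconds]))  # -> "turn on the light"
```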
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 10, 11, 12 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Nakadai et al. (US 2016/0055850 A1: hereafter — Nakadai) in view of KIM et al. (US 2019/038614 A1: hereafter — Kim) further in view of KIKUCHI et al. (JP 2020/0160281 A: hereafter — Kikuchi; the Examiner applies the attached English translation).

For claim 1, Nakadai discloses a distributed speech processing system (Nakadai: [0046] — first and second speech processing devices connected in a network), comprising: a plurality of node devices in a network, wherein each of the plurality of node devices comprises a processor, a memory, a communication module, and a sound processing module, and at least one of the plurality of node devices comprises a sound acquisition module (Nakadai: [0046] — first and second speech processing devices connected in a network; FIG. 1 — the devices having a preprocessing unit as a sound processing module, a database as a memory, and a communication unit, with Part 10 containing a sound acquisition module; [0264] — integration of a processor in the devices); wherein the sound acquisition module is configured to acquire an audio signal (Nakadai: FIG. 1 Parts 30, 110 — acquiring the input speech signal); the sound processing module is configured to preprocess the audio signal to obtain a first sound preprocessed result (Nakadai: FIG. 1 Part 112 — a preprocessing module which performs sound source localisation); the communication module is configured to send the first sound preprocessed result to one or more [[node devices in the network]] (Nakadai: FIG. 7 Part 120 — a communication unit able to transmit a pre-processed speech signal to another device 20A, which can perform its own further preprocessing, such as sound source localisation; [0053] — the network 50 may be a local area network (indicating, per FIG. 7 Part 50, that the devices are all in a network)); the communication module is further configured to receive one or more second sound preprocessed results from at least [[one other node device over the network]] (Nakadai: FIG. 7 — communicating results of pre-processed data from one preprocessing step to the next, such as from 112 to 113 to 114); and the sound processing module is further configured to perform speech recognition based on the first sound preprocessed result and the one or more second sound preprocessed results to obtain a first speech recognition result; wherein each of the first sound preprocessed result [[and the one or more second sound preprocessed results]] is an intermediate result of speech recognition
(Nakadai: FIG. 7 — first and second speech processing units for performing speech recognition on results of first and second sound pre-processed results, with Parts 112, 113 and 114 being preprocessed results that are intermediate results on a path to speech recognition Part 116; [0046] — a first speech processing device 10 and a second speech processing device 20 connected through a network (indicating different node devices); [0050] — the second speech processing device 20 also performs speech recognition on speech received from the first speech processing device); the communication module is further configured to receive one or more second speech recognition results from at least one other node device over the network (Nakadai: FIG. 7 Part 220 — a communication unit that receives second speech recognition results over a network 50, whereby data can be transmitted and received between the first and second speech processing devices (these being two devices qualifying as node devices over a network); [0135] — a second speech processing device 20A which includes a preprocessing unit and a second speech recognition unit (able to perform its own speech recognition); [0078] — ‘The communication unit 220 transmits transmission data including the second text data input from the second speech recognition unit 216 to the first speech processing device 10’ (showing transmission of a second speech recognition result from another node device)).

The reference of Nakadai provides teaching for different node devices over a network performing their separate speech recognition tasks, but differs from the claimed invention in that the claimed invention further provides teaching for performing speech recognition based on a first and one or more second speech recognition results to obtain a final speech recognition result. This is not new to the art, as the reference of Kim is now introduced to teach: the sound processing module is further configured to perform speech recognition based on the first speech recognition result and the one or more second speech recognition results to obtain a final speech recognition result (Kim: [0221] — generating a final speech recognition result based on weighting each of the available speech recognition results; [0222] — obtaining several speech recognition results, which are combined to obtain an aggregate speech recognition result as a final speech recognition result).

Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Kim, which weighs each of multiple speech recognition results to get a final speech recognition result, with the teaching of Nakadai, which teaches the generation of two speech recognition results, to thereby arrive at the claimed invention. The combination of both prior art elements would have provided the predictable result of combining the best of several speech recognition results into a single final speech recognition result. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).

The combination of Nakadai in view of Kim provides teaching for performing different stages of audio pre-processing, but differs from the claimed invention in that the claimed invention further provides teaching for performing the second sound pre-processing at another node device.
This, however, is not new to the art, as the reference of Kikuchi is now introduced to teach: the communication module is configured to send the first sound preprocessed result to one or more node devices in the network (Kikuchi: page 4 line 47 – page 5 line 6 — performing pre-processing at 112-1 of a first device and then transmitting it to another pre-processing unit 112-2 (indicating distributed pre-processing over different devices or nodes); page 6 lines 20–26 — another pre-processing unit 112-2 which obtains a signal initially pre-processed by a different unit before performing its own pre-processing); the sound processing module is further configured to perform speech recognition based on the first sound preprocessed result and the one or more second sound preprocessed results to obtain a first speech recognition result; wherein each of the first sound preprocessed result and the one or more second sound preprocessed results is an intermediate result of speech recognition (Kikuchi: page 4 line 47 – page 5 line 6 and page 6 lines 20–26, as cited above).

Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to improve upon the combination of Nakadai in view of Kim, which teaches different stages of audio pre-processing, by incorporating the known technique of Kikuchi, which distributes the pre-processing among other node devices, to thereby arrive at the claimed invention. The combination of both prior art elements would have provided the predictable result of reducing the computational processing at one device while sharing the computing load with other devices, leading to prioritising the transmission of the more useful data. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).

For claim 2, claim 1 is incorporated, and the combination of Nakadai in view of Kim further in view of Kikuchi discloses the distributed speech processing system wherein the communication module is further configured to send the first speech recognition result to one or more node devices in the network (Nakadai: [0078], FIG. 7 — a communication unit 120 transmitting a speech recognition result over the network to another device).

For claim 10, claim 2 is incorporated, and the combination of Nakadai in view of Kim further in view of Kikuchi discloses the distributed speech processing system wherein the sound processing module is further configured to perform weighting processing on the first speech recognition result and the one or more second speech recognition results to obtain the final speech recognition result (Kim: [0221] — generating a final speech recognition result based on weighting each of the available speech recognition results).

As for claim 11, method claim 11 and system claim 1 are related as a method detailing procedures for using the claimed system, with each claimed element’s function corresponding to the claimed system parts. Accordingly, claim 11 is rejected under the same rationale as applied above with respect to system claim 1.
As for claim 12, method claim 12 and system claim 2 are related as a method detailing procedures for using the claimed system, with each claimed element’s function corresponding to the claimed system parts. Accordingly, claim 12 is rejected under the same rationale as applied above with respect to system claim 2. As for claim 20, method claim 20 and system claim 10 are similarly related; accordingly, claim 20 is rejected under the same rationale as applied above with respect to system claim 10.

Claims 3, 4, 6, 9, 13, 14, 16 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nakadai (US 2016/0055850 A1) in view of Kim (US 2019/038614 A1) further in view of Kikuchi (JP 2020/0160281 A) as applied to claim 1, and further in view of JANG et al. (US 2017/0213569 A1: hereafter — Jang).

For claim 3, claim 1 is incorporated, and the combination of Nakadai in view of Kim further in view of Kikuchi discloses the distributed speech processing system wherein each of the first sound preprocessed result and the one or more second sound preprocessed results comprises a sound feature value, [[a sound quality, and sound time information]] (Nakadai: [0015] — pre-processing may generate a first acoustic feature quantity (sound feature value)). The combination of Nakadai in view of Kim further in view of Kikuchi, however, fails to teach the further limitation of this claim regarding the pre-processed result comprising a sound quality and the sound time information. The reference of Jang is now introduced to teach this as: the distributed speech processing system wherein each of the first sound preprocessed result and the one or more second sound preprocessed results comprises a sound feature value, a sound quality, and sound time information (Jang: [0082] — performing pre-processing to output a speech signal with suitable quality; [0083] — detecting start and end points of an utterance).

Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Jang, that pre-processing results comprise a sound quality and sound time information, with the combination of Nakadai in view of Kim further in view of Kikuchi, which teaches speech signal pre-processing, to thereby arrive at the claimed invention. The combination of both prior art elements would have provided the predictable result of cleaning up the speech to be processed so as to obtain clean speech with clear utterance boundaries. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).

For claim 4, claim 3 is incorporated, and the combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang discloses the distributed speech processing system wherein the sound feature value is a Mel-Frequency Cepstral Coefficient (MFCC) feature value or a Perceptual Linear Predictive (PLP) feature value of the audio signal (Nakadai: [0060] — mel-frequency cepstrum coefficients).
For claim 6, claim 3 is incorporated, and the combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang discloses the distributed speech processing system wherein the sound time information comprises one of the following: a start time and an end time of the audio signal (Jang: [0083] — detecting start and end points of an utterance), and a start time and a duration of the audio signal.

For claim 7, claim 3 is incorporated, and the combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang discloses the distributed speech processing system wherein each of the first sound preprocessed result and the one or more second sound preprocessed results further comprises an incremental sequence number of the audio signal (Nakadai: FIG. 7, [0137]–[0141] — each of the pre-processing techniques outputs particular pre-processed results to the selection unit, each of the pre-processors being shown to work in incremental sequence; [0145] — the selection unit is able to select the particular pre-processed data that is to be sent to the communication unit (indicating the presence of a sequential identification for identifying each of the pre-processed results)).

As for claim 13, method claim 13 and system claim 3 are related as a method detailing procedures for using the claimed system, with each claimed element’s function corresponding to the claimed system parts. Accordingly, claim 13 is rejected under the same rationale as applied above with respect to system claim 3. As for claim 14, method claim 14 and system claim 4 are similarly related; accordingly, claim 14 is rejected under the same rationale as applied above with respect to system claim 4. As for claim 16, method claim 16 and system claim 6 are similarly related; accordingly, claim 16 is rejected under the same rationale as applied above with respect to system claim 6. As for claim 17, method claim 17 and system claim 7 are similarly related; accordingly, claim 17 is rejected under the same rationale as applied above with respect to system claim 7.

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Nakadai (US 2016/0055850 A1) in view of Kim (US 2019/038614 A1) further in view of Kikuchi (JP 2020/0160281 A), further in view of Jang (US 2017/0213569) as applied to claim 3, and further in view of KIM et al. (US 2020/0126565 A1: hereafter — Kim).

For claim 5, claim 3 is incorporated, but the combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang fails to disclose the limitation of this claim, for which the reference of Kim (US 2020/0126565 A1) is now introduced to teach: the distributed speech processing system wherein the sound quality comprises a signal-to-noise ratio and an amplitude of the audio signal (Kim: [0123] — SNR and speech signal amplitude being a measure of sound quality).
The combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang teaches determining sound quality, but differs from the claimed invention in that the claimed invention further provides teaching for the sound quality comprising an SNR and an audio signal amplitude. This is not new to the art, as the reference of Kim teaches above. Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to improve upon the technique provided by the combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang, which teaches determining sound quality, with the known teaching of Kim that the quality is measured based on SNR and speech signal amplitude, to thereby arrive at the claimed invention. The combination of both prior art elements would have provided the predictable result of improving speech quality by increasing the speech signal amplitude and reducing the noise present in the signal. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).

As for claim 15, method claim 15 and system claim 5 are related as a method detailing procedures for using the claimed system, with each claimed element’s function corresponding to the claimed system parts. Accordingly, claim 15 is rejected under the same rationale as applied above with respect to system claim 5.

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Nakadai (US 2016/0055850 A1) in view of Kim (US 2019/038614 A1) further in view of Kikuchi (JP 2020/0160281 A), further in view of Jang (US 2017/0213569) as applied to claim 3, and further in view of Cremer (US 2022/0130412 A1).1

For claim 8, claim 3 is incorporated, but the combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang fails to disclose the limitations of this claim, for which the reference of Cremer is now introduced to teach: the distributed speech processing system wherein, for each of the first sound preprocessed result and the one or more second sound preprocessed results, the sound processing module is further configured to: determine whether a corresponding sound quality exceeds a predetermined threshold (Cremer: [0033] — performing an equalisation on the received audio (as a pre-processing filtering step) and determining a threshold amount of EQ to be applied to the signal, such that a determination is made as to whether the adjustment meets a threshold (this also being a measure of its quality)), and in response to that the corresponding sound quality does not exceed the predetermined threshold, discard a corresponding speech preprocessed result (Cremer: [0033] — in the event the threshold is not met, indicating unsatisfactory quality, the signal is discarded).

The combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang provides teaching for a distributed speech processing system comprising several pre-processing steps, but differs from the claimed invention in that the claimed invention further provides teaching for detecting whether the sound quality of the pre-processed signal meets a threshold, such that if its quality does not meet the threshold, the pre-processed signal is discarded. This is not new to the art, as the reference of Cremer teaches above.
Hence, at the time the claimed invention was effectively filed, one of ordinary skill in the art would have found it obvious to modify the teaching of the combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang, which provides distributed speech processing involving the pre-processing of input audio signals, with the known teaching of Cremer, which checks whether the quality of a pre-processed signal meets a threshold and discards it if it fails to meet the threshold, to thereby arrive at the claimed invention. The combination of both prior art elements would have provided the predictable result of reducing unnecessary signal adjustments to an input signal of bad quality, simply by not further processing input signals that are not up to standard for the speech recognition process, thereby reducing speech recognition errors. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).

As for claim 18, method claim 18 and system claim 8 are related as a method detailing procedures for using the claimed system, with each claimed element’s function corresponding to the claimed system parts. Accordingly, claim 18 is rejected under the same rationale as applied above with respect to system claim 8.
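A compact sketch of how the claims 8/18 quality gate could interact with the highest-quality selection of claims 9/19, addressed next, follows. The threshold value and the (id, quality) layout are assumptions for illustration, not claim language.

```python
# Hedged sketch: claims 8/18 discard results whose sound quality does not
# exceed a threshold; claims 9/19 select the highest-quality survivor.
QUALITY_THRESHOLD_DB = 10.0   # hypothetical SNR floor, not from the claims

def gate_and_select(results: list[tuple[str, float]]) -> str | None:
    """results holds (result id, sound quality in dB) for the first and
    second sound preprocessed results; returns the id to recognise, if any."""
    kept = [(rid, q) for rid, q in results if q > QUALITY_THRESHOLD_DB]
    if not kept:                               # everything was discarded
        return None
    return max(kept, key=lambda r: r[1])[0]    # highest sound quality wins

pre = [("node-0", 14.2), ("node-1", 6.5), ("node-2", 18.9)]
print(gate_and_select(pre))   # -> "node-2"; node-1 falls below the gate
```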
Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nakadai (US 2016/0055850 A1) in view of Kim (US 2019/038614 A1) further in view of Kikuchi (JP 2020/0160281 A), further in view of Jang (US 2017/0213569) as applied to claim 3, and further in view of Haukioja et al. (US 2019/0253558 A1: hereafter — Haukioja).

For claim 9, claim 3 is incorporated, but the combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang fails to fully disclose the limitation of this claim, for which the reference of Haukioja is now introduced to teach: the distributed speech processing system wherein the sound processing module is further configured to select, among the first sound preprocessed result and the one or more second sound preprocessed results, one or more sound preprocessed results with a highest sound quality on which to perform speech recognition to obtain the first speech recognition result (Haukioja: [0026] — pre-processing audio files to obtain files with good sound quality for the next processing stages).

The combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang provides teaching for pre-processing audio to obtain several pre-processed audio signals, along with selecting which of the pre-processed data, together with the speech signal, is to be sent for speech recognition (FIG. 7, [0145]), but differs from the claimed invention in that the claimed invention further teaches selecting the pre-processed result with the highest sound quality. This is not new to the art, as the reference of Haukioja teaches above. Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to improve upon the teaching of the combination of Nakadai in view of Kim further in view of Kikuchi and further in view of Jang, which teaches a selection process for pre-processed audio signal results, through the use of the known technique of Haukioja, which performs audio pre-processing to obtain good-quality sound, to thereby arrive at the claimed invention. The combination of both prior art elements would have provided the predictable result of selecting the audio with the best quality, suitable for the further audio processing stages (Haukioja: [0026]).

As for claim 19, method claim 19 and system claim 9 are related as a method detailing procedures for using the claimed system, with each claimed element’s function corresponding to the claimed system parts. Accordingly, claim 19 is rejected under the same rationale as applied above with respect to system claim 9.

Conclusion

Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the Examiner should be directed to OLUWADAMILOLA M. OGUNBIYI, whose telephone number is (571) 272-4708. The Examiner can normally be reached Monday – Thursday (8:00 AM – 5:30 PM Eastern Standard Time). Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s Supervisor, PARAS D SHAH, can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/OLUWADAMILOLA M OGUNBIYI/
Examiner, Art Unit 2653

/Paras D Shah/
Supervisory Patent Examiner, Art Unit 2653

02/01/2026

1 This reference has a filing date of 22 October 2021 but claims priority to a provisional application with an earlier filing date of 27 October 2020, the Specification of which is fully applicable as a prior art reference.

Prosecution Timeline

Jun 30, 2023
Application Filed
Jul 26, 2025
Non-Final Rejection — §101, §103
Oct 23, 2025
Response Filed
Jan 31, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579979
NAMING DEVICES VIA VOICE COMMANDS
2y 5m to grant Granted Mar 17, 2026
Patent 12537007
METHOD FOR DETECTING AIRCRAFT AIR CONFLICT BASED ON SEMANTIC PARSING OF CONTROL SPEECH
2y 5m to grant Granted Jan 27, 2026
Patent 12508086
SYSTEM AND METHOD FOR VOICE-CONTROL OF OPERATING ROOM EQUIPMENT
2y 5m to grant Granted Dec 30, 2025
Patent 12499885
VOICE-BASED PARAMETER ASSIGNMENT FOR VOICE-CAPTURING DEVICES
2y 5m to grant Granted Dec 16, 2025
Patent 12469510
TRANSFORMING SPEECH SIGNALS TO ATTENUATE SPEECH OF COMPETING INDIVIDUALS AND OTHER NOISE
2y 5m to grant Granted Nov 11, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
78%
Grant Probability
96%
With Interview (+18.6%)
2y 12m
Median Time to Grant
Moderate
PTA Risk
Based on 304 resolved cases by this examiner. Grant probability derived from career allow rate.
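A quick arithmetic cross-check on these figures: 236 granted / 304 resolved ≈ 77.6%, displayed as 78%; adding the +18.6% interview lift gives roughly 96%, consistent with the With Interview projection above.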
