DETAILED ACTION
This Office action is in response to Applicant’s RCE submission filed on 12/11/2025. Claims 21-27 and 31-37 were amended. Claims 29, 30, 39, and 40 were previously canceled. Claims 21-28 and 31-38 are pending in the application, of which Claims 21 and 31 are independent; the pending claims have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed on 12/11/2025 has been entered.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 21 and 31 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
The limitation "the second speech processing hypothesis is generated prior to receiving the audio data," as recited in the claims, does not have support or description in the Applicant’s Specification as filed.
Response to Arguments
Applicant’s arguments in the Amendment filed 12/11/2025 (herein “Amendment”) with respect to the 35 USC § 101 rejection raised in the previous Office action have been fully considered but are not persuasive.
Applicant sets forth on page 9 of the Amendment: “claims 21 and 31 recite the above technical solution, rather than merely the abstracted outcome (i.e., reduced errors in processing).”
Examiner respectfully traverses Applicant’s arguments. Determining the first speech processing results to be incorrect is not an inventive concept; a human can likewise determine that what he has heard is in doubt, can end up misrecognizing what the speaker has said, and consequently may have to rely on other data, such as contextual data, to arrive at a correct understanding/recognition.
Furthermore, the claim does not describe an improvement in speech recognition technology, and instead only describes instructions to implement the method on a generic computer. Deciding whether the first recognition result could be in error cannot provide an improvement in speech recognition technology. One needs to provide significantly more for the claim to be considered patent eligible, such as:
Improvements to another technology or technical field;
Improvements to the functioning of the computer itself;
Applying the judicial exception with, or by use of, a particular machine;
Effecting a transformation or reduction of a particular article to a different state or thing;
Adding a specific limitation other than what is well-understood, routine and conventional in the field, or adding unconventional steps that confine the claim to a particular useful application; or
Other meaningful limitations beyond generally linking the use of the judicial exception to a particular technological environment;
See MPEP §§ 2106.05(a)-(c), (e)-(h). In light of the foregoing arguments, the 35 U.S.C. § 101 rejection is hereby sustained.
Applicant’s arguments with respect to the 35 USC § 103 rejection raised in the previous Office action have been fully considered but are moot in view of the new grounds of rejection, which were necessitated by Applicant’s amendment. Therefore, the previous rejection has been withdrawn. However, upon further consideration, a new ground of rejection is introduced for the independent claims, further adding Chevrier et al. (US 10325597 B1) to Sun. The arguments with respect to the combination of Godden and Kumar are moot, since those references are no longer relied upon in this Office Action.
Please see the prior art section below for more detail, including updated citations and obviousness rationale.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 21-28, 31-38 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
The flowchart in MPEP 2106, subsection III, is used to determine whether a claim satisfies the criteria for subject matter eligibility; the analysis below follows that flowchart.
[media_image1.png (Greyscale, 628 × 432): MPEP 2106 subject matter eligibility flowchart]
Step 1: The independent claims are directed to statutory categories:
Abstract Idea Groupings – MPEP 2106.04(a)(2)
The enumerated groupings of abstract ideas are defined as:
1) Mathematical concepts – mathematical relationships, mathematical formulas or equations, mathematical calculations (see MPEP § 2106.04(a)(2), subsection I);
2) Certain methods of organizing human activity – fundamental economic principles or practices (including hedging, insurance, mitigating risk); commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations); managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions) (see MPEP § 2106.04(a)(2), subsection II); and
3) Mental processes – concepts performed in the human mind (including an observation, evaluation, judgment, opinion) (see MPEP § 2106.04(a)(2), subsection III).
Claim 21 is a method claim and directed to the process category of patentable subject matter.
Claim 31 is a system claim and directed to the machine or manufacture category of patentable subject matter.
Step 2A is a two-prong test.
[media_image2.png (Greyscale, 404 × 780): Step 2A two-prong analysis flowchart]
Step 2A, Prong One: Does the claim recite a judicially recognized exception, i.e., an abstract idea? That is, is the claim directed to a Mathematical Concept (mathematical relationships, mathematical formulas or equations, mathematical calculations); a Mental Process (concepts performed in the human mind, including an observation, evaluation, judgment, or opinion); or Certain Methods of Organizing Human Activity, namely (1) fundamental economic principles or practices (including hedging, insurance, mitigating risk), (2) commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations), or (3) managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions), such that it falls under a judicial exception to patentable subject matter?
The rejected Claims recite Mental Processes or Methods of Organizing Human Activity.
Step 2A, Prong Two: Do additional elements integrate the judicial exception into a practical application? This prong involves identifying whether there are any additional elements recited in the claim beyond the judicial exception(s), and evaluating those additional elements to determine whether they integrate the exception into a practical application of the exception. “Integration into a practical application” requires an additional element, or a combination of additional elements, in the claim to apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the exception. The Examiner uses the considerations laid out by the Supreme Court and the Federal Circuit to evaluate whether the judicial exception is integrated into a practical application.
The rejected Claims do not include additional limitations that point to integration of the abstract idea into a practical application. Accordingly, the rejected Claims are directed to the abstract idea that they recite.
Claim 21 is a generic automation of a mental process, since a human agent can receive audio, determine first data, process an intent/hypothesis, determine that a hypothesis results in an error and, in response, perform the transcription with another person, and finally take an action based on the re-ranked hypotheses. Other than the mental process under the BRI, there is only the mention of a first machine learning trained component (1st ML), which is considered to be a generic processor. ML was neither invented nor improved by Applicant and, due to the lack of specificity, is considered a generic processor/computer. With such a generic extra element, one cannot identify anything that can be relied upon as an improvement. Prong Two of Step 2A in the § 101 analysis asks whether the abstract idea is integrated into a practical application. The answer is no in this instance because there is no technological solution in the claim that “integrates” the abstract idea. The claim only suggests that the abstract idea be applied; it does not describe an application.
21. A computer-implemented method, comprising:
receiving audio data representing a spoken natural language input; [This merely amounts to a data-gathering activity. A human can attentively listen to speech and note down what he has heard on a piece of paper.]
using at least a first machine learning trained component, performing speech processing to generate a first speech processing hypothesis corresponding to a first interpretation of the spoken natural language input; [This amounts to having a human perform speech transcription with the help of pen and paper. For example, a human can write down the speech he has heard based on his understanding and interpretation of the words he heard in the speech. The machine learning component will be addressed later as an extra element that does not integrate the abstract idea.]
using a second component, determining the first speech processing hypothesis is associated with a second speech processing hypothesis corresponding to a second interpretation of the spoken natural language input different from the first interpretation; [This amounts to producing a first attempt at transcribing the speech and also devising another version based on a different interpretation of the original speech. This could occur when the speech heard is unclear or ambiguous and forces the individual to arrive at two different interpretations of the same speech. As such, it is a mental process.]
wherein the first speech processing hypothesis results in a processing error, the second speech processing hypothesis does not result in the processing error, and the second speech processing hypothesis is generated prior to receiving the audio data; [In the context of this claim, it is like a person stating an ambiguous phrase and right away correcting his mistake, providing the corrected version before the listener gets a chance to respond.]
determining the second speech processing hypothesis corresponds to an action; and [A human can evaluate the transcription (hypothesis) and figure out (via analysis) what action the hypothesis corresponds to.]
causing performance of the action. [A human can analyze the transcription and subsequently perform the action called for within the transcription.]
These limitations, under their broadest reasonable interpretation, cover performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “machine learning trained component”, “processors”, and “memory” nothing in the claim element precludes the step from practically being performed in the mind. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements of using a “machine learning trained component,” “processors,” and “memory” to perform all of the above-mentioned steps. The use of a “machine learning trained component” and “processors” is recited at a high level of generality (i.e., as a generic computer/processor device performing a generic computer function), such that it amounts to no more than mere instructions to apply the exception using a generic computer component. See MPEP § 2106.05(f), Mere Instructions to Apply an Exception [R-10.2019].
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: Search for an Inventive Concept: The additional elements do not amount to significantly more. The limitation of “performing speech processing by using a first machine learning trained component, …” recites well-understood, routine, and conventional machine components that are being used for their well-understood, routine, conventional, and rather generic functions. Additionally, these limitations are expressed parenthetically and lack a nexus to the claim language, and as such are a separable and divisible mention of a machine. Merely reciting a machine learning trained component, without significantly more, appears to be equivalent to a generic computer/processor processing a task that a human can process in his mind or with the aid of pen and paper.
As mentioned, the only additional element to be considered is the recitation of the machine learning trained component. However, the as-filed specification (Par. 0122 and 0123) disavows specificity of the ML used, referring only to “various machine learning techniques,” etc., which attests that the ML is a generic model. Therefore, the cited additional element of ML does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. Accordingly, it is not sufficient to cause the claim, as a whole, to amount to significantly/substantially more than the underlying abstract idea.
Additionally, the machine learning trained component is described in a broad manner such that it could include techniques that may be performed by a human, such as rule-based learning. ML is, in any event, a well-understood, routine, and conventional element. For example: US20230368146A1 discloses an advanced machine learning model for automatic speech recognition (ASR) leveraging historical sets of speech data; US20220148614A1 discloses a machine-learned model to perform automatic speech recognition and automatic sound classification techniques; US20210150548A1 discloses a machine learning model for ranking purposes; US20210074269A1 discloses machine learning models that perform a text-to-speech (TTS) subsystem, an automatic speech recognition (ASR) subsystem, and a semantic parser subsystem; and US20200357390A1 discloses usage of automatic speech recognition and natural language understanding applications through the use of a machine-learning model trained to produce pronunciation output based on text input. The additional elements of a “processor” and “memory,” as described in paragraphs 0180-0181, 0185, and 0190 of the as-filed specification, appear to be general-purpose computer components, which are well-understood, routine, and conventional elements.
The use of a “computer and/or components of a computer” is recited at a high level of generality (i.e., as a generic computer device performing the generic computer functions of capturing input data, storing data, and retrieving data), such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
With respect to independent Claim 31, the additional components are a processor and memory, which are not sufficient to cause the claim as a whole to amount to substantially more than the underlying abstract idea.
The dependent claims do not add limitations that would either integrate the recited abstract idea into a practical application or cause the claims as a whole to amount to significantly more than the abstract idea identified for the independent claims:
Claims 22 and 32 recite: “… wherein the first speech processing hypothesis corresponds to a first entity; and the second speech processing hypothesis corresponds to a second entity different from the first entity.” As mentioned in the previous section, upon acquiring the speech/command, a human could be uncertain whether he heard one entity versus another and, given that uncertainty, would proceed to write down the two different choices. This can be done by a human; as such, it is a mental process that can be carried out with pen and paper. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed to an abstract idea and are not patent eligible.
Claims 23 and 33 recite: “… performing the speech processing comprises performing automatic speech recognition (ASR) using the audio data to determine a first ASR hypothesis, wherein the first speech processing hypothesis comprises the first ASR hypothesis; and determining the second speech processing hypothesis comprises determining a second ASR hypothesis different from the first ASR hypothesis.” A human can transcribe speech, emulating an ASR, and, due to uncertainty about what he heard, can ask another person to come up with a new (different) transcription; this can be done by a human and, as such, is a mental process. These steps can be carried out with pen and paper by a human. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed to an abstract idea and are not patent eligible.
Claims 24 and 34 recite: “… performing speech processing comprises performing natural language understanding (NLU) using first data representing the audio data to determine a first NLU hypothesis, wherein the first speech processing hypothesis comprises the first NLU hypothesis; and determining the second speech processing hypothesis comprises determining a second NLU hypothesis different from the first NLU hypothesis.” These claims are treated the same as the previous ones, since a human can emulate ASR/NLU and would follow the same steps as in the previous claims. Therefore, such steps can be done by a human and, as such, constitute a mental process. These steps can be carried out with pen and paper by a human. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed to an abstract idea and are not patent eligible.
Claims 25 and 35 recite: “… sending, from a speech processing component to a third component, first data corresponding to the third speech processing hypothesis; and processing the first data using the first component to determine processing of the first speech processing hypothesis will result in the processing error.” A human can process the speech/command he heard, attempt to write it down on a piece of paper, and pass it to another individual, who can then determine that the success of the transcription is in question due to uncertainty about its accuracy. Therefore, such steps can be done by a human and, as such, constitute a mental process. These steps can be carried out with pen and paper by a human. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed to an abstract idea and are not patent eligible.
Claims 26 and 36 recite: “… processing, by the third component, the first data with respect to stored data to determine processing of the first speech processing hypothesis will result in the processing error, the stored data corresponding to at least one prior speech processing hypothesis.” A human can consult the previously processed command, which he has written down, and, via comparison to the prior hypothesis, come up with a new likelihood measure based on the fact that the first transcription could have been in error. Therefore, such steps can be done by a human and, as such, constitute a mental process. These steps can be carried out with pen and paper by a human. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed to an abstract idea and are not patent eligible.
Claims 27 and 37 recite: “… using the second component, processing first data corresponding to the first speech processing hypothesis with respect to stored data to determine the second speech processing hypothesis resulted in a system performing a correct action.” Similar to the previous claims, a human can take advantage of previously processed speech; since the previous processing was successful, he can reuse the previous action, as it performed correctly before. Therefore, such steps can be done by a human and, as such, constitute a mental process. These steps can be carried out with pen and paper by a human. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed to an abstract idea and are not patent eligible.
Claims 28 and 38 recite: “… determining a user profile corresponding to the spoken natural language input; and determining the stored data based at least in part on the user profile.” Since people have specific attributes, such as a profile, and such attributes can be related to the speech they deliver, a human can use the speaker’s attributes/profile to aid his transcription attempt. Therefore, such steps can be done by a human and, as such, constitute a mental process. These steps can be carried out with pen and paper by a human. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed to an abstract idea and are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 21 and 31 are rejected under 35 U.S.C. 103 as being unpatentable over Sun (US20200357409A1) in view of Chevrier et al. (US10325597B1) (herein “Chevrier”).
Sun was applied in the previous Office Action.
Regarding claims 21 and 31, Sun teaches [A computer-implemented method, comprising: - claim 21], [A system comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the system to: - claim 31] (Sun, Par. 0049:” In some examples, a non-transitory computer-readable storage medium of memory 202 is used to store instructions (e.g., for performing aspects of processes described below) for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. “)
receiving/receive audio data representing a spoken natural language input; (Sun, Par. 0282:” At block 902, audio input is received (e.g., at I/O processing module 728, via microphone 213). In some examples, the audio input includes a user utterance representing a user request. “)
using at least a first machine learning trained component, performing speech processing to generate a first speech processing hypothesis corresponding to a first interpretation of the spoken natural language input; (Sun, Par. 0210:” … converting speech input into text; identifying a user's intent expressed in a natural language input received from the user; …”, and Par. 0212:” STT processing module 730 includes one or more ASR systems 758. The one or more ASR systems 758 can process the speech input [audio input] that is received through I/O processing module 728 to produce a recognition result [hypothesis] … each ASR system 758 includes one or more speech recognition models (e.g., acoustic models and/or language models) and implements one or more speech recognition engines. Examples of speech recognition models include Hidden Markov Models, Gaussian-Mixture Models, Deep Neural Network Models, n-gram language models, and other statistical models. … Once STT processing module 730 produces recognition results containing a text string (e.g., words, or sequence of words, or sequence of tokens), the recognition result is passed to natural language processing module 732 for intent deduction. In some examples, STT processing module 730 produces multiple candidate [ hypotheses] text representations of the speech input. Each candidate text representation is a sequence of words or tokens corresponding to the speech input. In some examples, each candidate text representation is associated with a speech recognition confidence score. Based on the speech recognition confidence scores, STT processing module 730 ranks the candidate text representations and provides the n-best (e.g., n highest ranked) candidate text representation(s) to natural language processing module 732 for intent deduction, where n is a predetermined integer greater than zero.”)
using a second component, determining the first speech processing hypothesis is associated with a second speech processing hypothesis corresponding to a second interpretation of the spoken natural language input different from the first interpretation, (Sun, Par. 0021:” … natural language processing (e.g., using natural language processing module 732) can be used to interpret the text representation of a spoken request to determine an actionable intent corresponding to the text representation.”, and Par. 0212:” … In some examples, STT processing module 730 produces multiple candidate text representations [hypothesis] of the speech input. Each candidate text representation is a sequence of words or tokens corresponding to the speech input. In some examples, each candidate text representation [hypothesis] is associated with a speech recognition confidence score. Based on the speech recognition confidence scores, STT processing module 730 ranks the candidate text representations [hypothesis] and provides the n-best (e.g., n highest ranked) candidate text representation(s) to natural language processing module 732 for intent deduction [interpretation], where n is a predetermined integer greater than zero. For example, in one example, only the highest ranked (n=1) candidate text representation is passed to natural language processing module 732 for intent deduction. In another example, the five highest ranked (n=5) candidate text representations are passed to natural language processing module 732 for intent deduction.”) Note: multiple candidate text representations correspond to the same speech input, however they represent different interpretation of the same audio input.
determining/determine the second speech processing hypothesis corresponds to an action; and (Sun, Par. 0241:” … The second structured query is selected, for example, based on the speech recognition confidence score of the corresponding candidate text representation, the intent confidence score of the corresponding candidate actionable intent, a missing necessary parameter in the first structured query, or any combination thereof.”)
causing performance of the action. (Sun, Par. 0246:” … For example, natural language processing module 732 performs semantic parsing on the text representation to determine an actionable intent. Task flow processing module 736 then generates a task flow corresponding to the actionable intent and performs the task flow.”)
Sun does not teach, however, Chevrier teaches wherein: the first speech processing hypothesis results in a processing error, the second speech processing hypothesis does not result in the processing error, and the second speech processing hypothesis is generated prior to receiving the audio data; (Chevrier, Col. 10, ll. 14-20:” In some embodiments, the second hypothesis transcription 250b may be generated by the speech recognition system before receiving all of the audio 240. As illustrated, the second hypothesis transcription 250b may correct an error in the first hypothesis transcription 250a but may also include an error.”) Note: Correcting an error (in the first hypothesis) in the second hypothesis implies there was an error in the first hypothesis results. Correcting that error in the second hypothesis reads on the second hypothesis not resulting in the processing error. Lastly, generating the second hypothesis before receiving all of the audio reads on the second hypothesis being generated prior to receiving the audio data.
Chevrier is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sun further in view of Chevrier such that: the first speech processing hypothesis results in a processing error, the second speech processing hypothesis does not result in the processing error, and the second speech processing hypothesis is generated prior to receiving the audio data. Motivation to do so would be to provide transcriptions to a hard-of-hearing or deaf person; a particular device or application running on a mobile device or computer may be used to display text transcriptions of the audio being received by the hard-of-hearing or deaf person (Chevrier, Col. 1, ll. 12-16).
Claims 22, 23, 32, and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Sun and Chevrier, and further in view of Mairesse (US9558740B1).
Mairesse was applied in the previous Office Action.
Regarding claims 22 and 32, Sun, as modified above, teaches the method and the system of claims 21, and 31 respectively.
Sun, as modified above, does not teach, however, Mairesse teaches wherein the first speech processing hypothesis corresponds to a first entity; and (Mairesse, Claim 1.” … performing ASR processing on the audio data to determine ASR results, … first ASR hypothesis, the second ASR hypothesis and the third ASR hypothesis; processing, by a search engine: the first ASR hypothesis to determine first search results comprising a first plurality of entities, the second ASR hypothesis to determine second search results comprising a second plurality of entities, …”)
the second speech processing hypothesis corresponds to a second entity different from the first entity. (Mairesse, Claim 1. “… performing ASR processing on the audio data to determine ASR results, … first ASR hypothesis, the second ASR hypothesis and the third ASR hypothesis; processing, by a search engine: the first ASR hypothesis to determine first search results comprising a first plurality of entities, the second ASR hypothesis to determine second search results comprising a second plurality of entities, …”)
Mairesse is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sun, as modified above, further in view of Mairesse to wherein the first speech processing hypothesis corresponds to a first entity; and the second speech processing hypothesis corresponds to a second entity different from the first entity. Motivation to do so would improve the process of disambiguation among ASR hypotheses (Mairesse, Col. 16, ll. 18-19).
Regarding claims 23 and 33, Sun, as modified above, teaches the method and the system of claims 21, and 31 respectively.
Sun, as modified above, does not teach, however, Mairesse teaches performing the speech processing comprises performing automatic speech recognition (ASR) using the audio data to determine a first ASR hypothesis, wherein the first speech processing hypothesis comprises the first ASR hypothesis; and (Mairesse, Col. 4, ll. 41-57:” As illustrated in FIG. 1B, after the system performs (154) ASR processing on the audio data, the system determines (170) ASR hypotheses for disambiguation. … One specific example of this would be if a first hypothesis was “who is Spider-Man” and a second hypothesis was “who is spiderman.””)
determining the second speech processing hypothesis comprises determining a second ASR hypothesis. (Mairesse, Col. 4, ll. 41-57: As illustrated in FIG. 1B, after the system performs (154) ASR processing on the audio data, the system determines (170) ASR hypotheses for disambiguation. … One specific example of this would be if a first hypothesis was “who is Spider-Man” and a second hypothesis was “who is spiderman.”)
Mairesse is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sun, as modified above, further in view of Mairesse to perform the speech processing comprises performing automatic speech recognition (ASR) using the audio data to determine a first ASR hypothesis, wherein the first speech processing hypothesis comprises the first ASR hypothesis; and determining the second speech processing hypothesis comprises determining a second ASR hypothesis. Motivation to do so would improve the process of disambiguation among ASR hypotheses (Mairesse, Col. 16, ll. 18-19).
Claims 24 and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Sun and Chevrier, and further in view of Ko (US 20200143807 A1).
Ko was applied in the previous Office Action.
Regarding claims 24 and 34, Sun, as modified above, teaches the method and the system of claims 21, and 31 respectively.
Sun, as modified above, does not teach, however, Ko teaches performing speech processing comprises performing natural language understanding (NLU) using first data representing the audio data to determine a first NLU hypothesis, wherein the first speech processing hypothesis comprises the first NLU hypothesis; and (Ko, Par. 0133:” The electronic device 100 may determine based on the first ASR result [hypothesis-first data] whether NLU with respect to ASR is possible on the electronic device 100, in operation S850.”, and Par. 0150:” The electronic device 100 may perform NLU based on the first ASR result in operation S1040, thus obtaining a first NLU result.”)
determining the second speech processing hypothesis comprises determining a second NLU hypothesis. (Ko, Par. 0151:” The server 2000 may perform NLU based on the second ASR result in operation S1050, thus obtaining a second NLU result. The server 2000 may transmit the second NLU result to the electronic device 100 in operation S1055.”, and Par. 0152:” The electronic device 100 may select one of the first NLU result and the second NLU result based on context information in operation S1060.”)
Ko is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sun, as modified above, further in view of Ko to perform speech processing comprises performing natural language understanding (NLU) using first data representing the audio data to determine a first NLU hypothesis, wherein the first speech processing hypothesis comprises the first NLU hypothesis; and determining the second speech processing hypothesis comprises determining a second NLU hypothesis. Motivation to do so would improve performance of the virtual assistant (Ko, Par. 0004).
Claims 25 and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Sun and Chevrier, and further in view of Mairesse and Lee et al. (US20140163975 A1) (herein "Lee").
Regarding claims 25 and 35, Sun, as modified above, teaches the method and the system of claims 21, and 31 respectively.
Sun, as modified above, does not teach, however, Mairesse teaches sending, from a speech processing component to a third component, first data corresponding to the first speech processing hypothesis; and (Mairesse, Col. 10, ll. 11-19: “The ASR 250 [speech processing component] may then send (308) the selected ASR hypothesis to the NLU [first] component 260. Alternatively, the device 110 may send the selected hypothesis (or an indication of same) directly to the NLU component 260. The NLU 260 may perform NLU processing on the selected hypothesis and may determine that it includes a search request. The NLU 260 may then send the NLU results (310) to a search engine component 290a.”)
Mairesse is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sun, as modified above, further in view of Mairesse to send, from a speech processing component to a third component, first data corresponding to the first speech processing hypothesis. Motivation to do so would improve the process of disambiguation among ASR hypotheses (Mairesse, Col. 16, ll. 18-19).
Sun, as modified above, does not teach, however, Lee teaches processing the first data using the third component to determine processing of the first speech processing hypothesis will result in the processing error. (Lee, Par. 0014:” … a processing unit [third component] configured to determine a likelihood that a speech recognition result is erroneous based on a language model learned through a correct answer corpus and a domain corpus, …”, and Par. 0015:” … the processing unit may determine the likelihood that a speech recognition result [hypothesis] is erroneous based on likelihood of generating the speech recognition result.”)
Lee is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sun, as modified above, further in view of Lee to process the first data using the third component to determine that processing of the first speech processing hypothesis will result in the processing error. Motivation to do so would improve speech recognition accuracy (Lee, Par. 0021).
Claims 26 and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Sun, Chevrier, Mairesse, and Lee, and further in view of Reilly (US 20200134564 A1).
Reilly was applied in the previous Office Action.
Regarding claims 26 and 36, Sun, as modified above, teaches the method and the system of claims 25, and 35 respectively.
Sun, as modified above, does not teach, however, Lee further teaches processing, by the third component, the first data with respect to stored data to determine processing of the first speech processing hypothesis will result in the processing error, [[ the stored data corresponding to at least one prior speech processing hypothesis]]. (Lee, Par. 0014:” … a processing unit [third component] configured to determine a likelihood that a speech recognition result is erroneous based on a language model learned through a correct answer corpus and a domain corpus, …”, and Par. 0015:” … the processing unit may determine the likelihood that a speech recognition result [hypothesis] is erroneous based on likelihood of generating the speech recognition result.”)
Sun, as modified above, does not teach, however, Reilly teaches [[processing, by the third component, the first data with respect to stored data to determine processing of the first speech processing hypothesis will result in the processing error,]] the stored data corresponding to at least one prior speech processing hypothesis. (Reilly, Par. 0046:” In an embodiment, the platform NLP (natural language processing) confidence score is a mathematical methodology to calculate the probability/relative confidence of the accuracy of NLP results extracted from documents. This score is based on leveraging historic [prior]/accurate results [hypothesis] to train the platform and leverage an algorithm to determine a relative confidence on each answer.”)
Reilly is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sun, as modified above, further in view of Reilly to determine the stored data corresponding to at least one prior speech processing hypothesis. Motivation to do so would improve operational efficiencies (Reilly, Par. 0041).
Claims 27 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Sun and Chevrier, and further in view of Di Fabbrizio (US20150340033A1).
Di Fabbrizio was applied in the previous Office Action.
Regarding claims 27 and 37, Sun, as modified above, teaches the method and the system of claims 21, and 31 respectively.
Sun, as modified above, does not teach, however, Di Fabbrizio teaches using the second component, processing first data corresponding to the first speech processing hypothesis with respect to stored data to determine the second speech processing hypothesis resulted in a system performing a correct action. (Di Fabbrizio, Par. 0021:” The speech processing system 200 may include an ASR module 202 that performs automatic speech recognition on audio data regarding user utterances, an NLU module 204 that performs natural language understanding on transcriptions generated by the ASR module 202, a context interpreter 206 which applies contextual rules to current NLU results based on prior interpretations and dialog acts, … and a context data store 212 for storing semantic representations of previous user utterances [stored data] and system dialog acts.”, and Par. 0025:” … The ASR module 202 may generate ASR results [hypothesis] for the utterance, such as an n-best list of transcriptions. … The n-best list or some other type of results may be provided to the NLU module 204 so that the user's intent may be determined. An n-best list of interpretations (e.g., intents) may be determined or generated by the NLU module 204 and provided to the context interpreter 206. The context interpreter 206 can process the NLU results ... Illustratively, the context interpreter 206 may merge a current NLU result with a prior result that was stored in the context data store 212 based on the application of a context interpretation rule. The dialog manager 208 may then generate a response (e.g., a confirmation) based on the merged result, …”, and Par. 0038:” … Advantageously, interpreting current utterances within the context of prior turns in a multi-turn interaction … of an utterance in isolation when the correct action is determinable in the context ...”).
Di Fabbrizio is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sun, as modified above, further in view of Di Fabbrizio to use the second component, processing first data corresponding to the first speech processing hypothesis with respect to stored data to determine the second speech processing hypothesis resulted in a system performing a correct action. Motivation to do so would improve natural language understanding accuracy by providing a framework in which to interpret a current user utterance in view of prior interpretations (Di Fabbrizio, Par. 0015).
Claims 28 and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Sun, Chevrier, and Di Fabbrizio, and further in view of Dalmia (US10140973B1).
Dalmia was applied in the previous Office Action.
Regarding claims 28 and 38, Sun, as modified above, teaches the method and the system of claims 27, and 37 respectively.
Sun, as modified above, does not teach, however, Dalmia teaches determining a user profile corresponding to the spoken natural language input; and (Dalmia, Col. 17, ll. 23-52:” FIG. 6 illustrates a user profile storage 602 that includes data regarding user accounts 604 as described herein. The user profile storage 602 may be located proximate to the server 120, or may otherwise be in communication with various components, for example over the network 199. The user profile storage 602 may include a variety of information related to individual users, accounts, etc. that interact with the system 100. For example, each user account may include text and associated audio data corresponding to audio previously spoken by the respective user. … may access the user accounts 604 to determine stored text (and associated audio data) corresponding to text of a text message. … For example, when speech is captured by a speech-controlled device (or other device), text corresponding to the speech may be stored and associated with a user ID and user profile of the speaker.”)
determining the stored data based at least in part on the user profile. (Dalmia, Col. 18, ll. 27-32:” In addition, the text and corresponding audio data may be associated with the speaker of the original audio, via the metadata (e.g., speaker ID) associated with the stored text and audio data. The metadata associates the stored text and audio data with a user profile of the speaker and/or the speech-controlled device 110a that captured the spoken utterance.”)
Dalmia is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sun, as modified above, further in view of Dalmia to determine a user profile corresponding to the spoken natural language input and to determine the stored data based at least in part on the user profile. Motivation to do so would improve the likelihood that the ASR process will output speech results that make sense.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Rao et al. (US20190295531A1) teaches in Par. 0038:” … the audio transcriber 108 can determine a first transcription using the general language model and a second transcription using the biased language model. The audio transcriber 108 can further determine a first confidence recognition score for the first transcription and a second confidence recognition score for the second transcription. Either the first transcription or the second transcription can be selected based at least in part on the confidence scores. In some implementations, one or more additional transcriptions can be determined using one or more additional language models. The additional transcriptions can have accompanying confidence recognition scores, such that the selected transcription is determined based at least in part on the confidence recognition scores. In some implementations, multiple transcriptions can be selected based to accommodate alternate spellings of words.”
Examiner's Note: Examiner has cited particular columns and line numbers and/or paragraph numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that the applicant, in preparing responses, fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
In the case of amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
DARIOUSH AGAHI, P.E.
Primary Examiner
/DARIOUSH AGAHI/Primary Examiner, Art Unit 2656