DETAILED ACTION
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2. Applicant’s submission filed 27 October 2025 [hereinafter Response] has been entered, where:
Claims 1, 6, 8, 13, 15, and 20 have been amended.
Claims 3, 10, and 17 have been cancelled.
Claims 1, 2, 4-9, 11-16, and 18-20 are pending.
Claims 1, 2, 4-9, 11-16, and 18-20 are rejected.
Claim Rejections – 35 U.S.C. § 112
3. The following is a quotation of 35 U.S.C. § 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
4. Claims 1, 2, 4-9, 11-16, and 18-20 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claim 1, lines 9-10, recite “the plurality of machine learning models.” There is insufficient antecedent basis for this limitation in the claim.
Claim 1, lines 17-18, recite “the plurality of machine learning models.” There is insufficient antecedent basis for this limitation in the claim.
Claim 8, lines 10-11, recite “the plurality of machine learning models.” There is insufficient antecedent basis for this limitation in the claim.
Claim 8, lines 17-18, recite “the plurality of machine learning models.” There is insufficient antecedent basis for this limitation in the claim.
Claim 15, line 9, recites “the plurality of machine learning models.” There is insufficient antecedent basis for this limitation in the claim.
Claim 15, lines 15-16, recite “the plurality of machine learning models.” There is insufficient antecedent basis for this limitation in the claim.
Claims 2 and 4-7 depend directly or indirectly from claim 1. Claims 9 and 11-14 depend directly or indirectly from claim 8. Claims 16 and 18-20 depend directly or indirectly from claim 15. Claims 2, 4-7, 9, 11-14, 16, and 18-20 are rejected as depending from a rejected claim; further, the claims fail to cure the deficiencies of claims 1, 8, and 15.
Claim Rejections – 35 U.S.C. § 101
5. 35 U.S.C. § 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
6. Claims 1, 2, 4-9, 11-16, and 18-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 recites a computer-implemented method, which is a process, and thus one of the statutory categories of patentable subject matter. (35 U.S.C. § 101).
However, under Step 2A Prong One, the claim recites the limitations of “[(a)] processing, using one or more machine learning models, one or more portions of an input sequence to generate one or more candidate output sequences, thus defining a plurality of prediction scores for the one or more candidate output sequences,” “[(b)] identifying one or more specialized entities from the one or more candidate output sequences,” “[(c)] applying, using the one or more machine learning models, a first scoring methodology on the one or more candidate output sequences based upon, at least in part, the one or more portions of the input sequence, thus defining a first set of prediction scores for the one or more candidate output sequences,” “[(d)] applying a second scoring methodology on the one or more specialized entities from the one or more candidate output sequences based upon, at least in part, the one or more portions of the input sequence, thus defining a second set of prediction scores for the one or more specialized entities,” “[(e)] at least partially modifying the plurality of prediction scores for the one or more specialized entities based upon, at least in part, the first set of prediction scores and the second set of prediction scores,” and “[(f)] generating output sequences based upon the plurality of prediction scores for the one or more candidate output sequences.” The activities of “[(a)] processing to generate,” “[(b)] identifying,” “[(c)] applying . . . thus defining,” “[(e)] modifying . . . scores,” and “[(f)] generating output sequences” are limitations that can practically be performed in the human mind, including, for example, observations, evaluations, judgments, and opinions, and accordingly, are a mental process, (MPEP § 2106.04(a)(2) sub III), which is one of the groupings of abstract ideas. (MPEP § 2106.04(a)(2)).
The claim recites more details or specifics to the abstract idea of “[(b)] identifying,” “[(b.1)] wherein identifying the one or more specialized entities from the one or more candidate output sequences includes tagging a plurality of specialized entities, [(b.1.1)] the tagging comprising a beginning tag and an ending tag, the tagging defining a tagged region,” and accordingly, is merely more specific to the abstract idea.
The claim also recites more details or specifics to the abstract idea of “[(e)] modifying . . . scores,” in that “the modifying comprising biasing the first set of prediction scores for the one or more specialized entities with the second set of prediction scores, tokens of the input sequence scored independently of whether the tokens are within a named entity tag,” and accordingly, is merely more specific to the abstract idea. Thus, claim 1 recites an abstract idea.
Under Step 2A Prong Two, the claim as a whole is not integrated into a practical application, because the additional elements recited in the claim beyond the identified judicial exception include a “computer-implemented method,” and a “computing device,” which are generic computer components used to implement the abstract idea, (MPEP § 2106.05(f)), that do not serve to integrate the abstract idea into a practical application. The claim also recites “one or more machine learning models,” a “first machine learning model of the plurality of machine learning models,” and a “second machine learning model of the plurality of machine learning models,” which are recited at a high level of generality, and accordingly are also generic computer components that are used to implement the abstract idea, (MPEP § 2106.05(f)), that do not serve to integrate the abstract idea into a practical application. The claim also recites more details or specifics to the additional element of the “machine learning models,” where “[(b.1.1)] the plurality of machine learning models trained with training data comprising tagged regions for a set of specialized entities,” and accordingly, are merely more specific to the additional element. Therefore, claim 1 is directed to the abstract idea.
Finally, under Step 2B, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. The additional elements include a “computer-implemented method,” and a “computing device,” which are generic computer components used to implement the abstract idea, (MPEP § 2106.05(f)), that do not amount to significantly more than the abstract idea. The claim also recites “one or more machine learning models,” a “first machine learning model of the plurality of machine learning models,” and a “second machine learning model of the plurality of machine learning models,” which are recited at a high level of generality, and accordingly are also generic computer components that are used to implement the abstract idea, (MPEP § 2106.05(f)), that do not amount to significantly more than the abstract idea. The claim also recites more details or specifics to the additional element of the “machine learning models,” where “[(b.1.1)] the plurality of machine learning models trained with training data comprising tagged regions for a set of specialized entities,” and accordingly, are merely more specific to the additional element. Therefore, claim 1 is subject-matter ineligible.
Claim 8 recites a computer program product, which is a product, and thus one of the statutory categories of patentable subject matter. (35 U.S.C. § 101).
However, under Step 2A Prong One, the claim recites the limitations of “[(a)] processing, using one or more machine learning models, one or more portions of an input sequence to generate one or more candidate output sequences, thus defining a plurality of prediction scores for the one or more candidate output sequences,” “[(b)] identifying one or more specialized entities from the one or more candidate output sequences,” “[(c)] applying, using the one or more machine learning models, a first scoring methodology on the one or more candidate output sequences based upon, at least in part, the one or more portions of the input sequence, thus defining a first set of prediction scores for the one or more candidate output sequences,” “[(d)] applying a second scoring methodology on the one or more specialized entities from the one or more candidate output sequences based upon, at least in part, the one or more portions of the input sequence, thus defining a second set of prediction scores for the one or more specialized entities,” “[(e)] at least partially modifying the plurality of prediction scores for the one or more specialized entities based upon, at least in part, the first set of prediction scores and the second set of prediction scores,” and “[(f)] generating output sequences based upon the plurality of prediction scores for the one or more candidate output sequences.” The activities of “[(a)] processing to generate,” “[(b)] identifying,” “[(c)] applying . . . thus defining,” “[(e)] modifying . . . scores,” and “[(f)] generating output sequences” are limitations that can practically be performed in the human mind, including, for example, observations, evaluations, judgments, and opinions, and accordingly, are a mental process, (MPEP § 2106.04(a)(2) sub III), which is one of the groupings of abstract ideas. (MPEP § 2106.04(a)(2)).
The claim recites more details or specifics to the abstract idea of “[(b)] identifying,” “[(b.1)] wherein identifying the one or more specialized entities from the one or more candidate output sequences includes tagging a plurality of specialized entities, [(b.1.1)] the tagging comprising a beginning tag and an ending tag, the tagging defining a tagged region,” and accordingly, is merely more specific to the abstract idea.
The claim also recites more details or specifics to the abstract idea of “[(e)] modifying . . . scores,” in that “the modifying comprising biasing the first set of prediction scores for the one or more specialized entities with the second set of prediction scores, tokens of the input sequence scored independently of whether the tokens are within a named entity tag,” and accordingly, is merely more specific to the abstract idea. Thus, claim 8 recites an abstract idea.
Under Step 2A Prong Two, the claim as a whole is not integrated into a practical application, because the additional elements recited in the claim beyond the identified judicial exception include a “non-transitory computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to,” and a “computing device,” which are generic computer components used to implement the abstract idea, (MPEP § 2106.05(f)), that do not serve to integrate the abstract idea into a practical application. The claim also recites “one or more machine learning models,” a “first machine learning model of the plurality of machine learning models,” and a “second machine learning model of the plurality of machine learning models,” which are recited at a high level of generality, and accordingly are also generic computer components that are used to implement the abstract idea, (MPEP § 2106.05(f)), that do not serve to integrate the abstract idea into a practical application. The claim also recites more details or specifics to the additional element of the “machine learning models,” where “[(b.1.1)] the plurality of machine learning models trained with training data comprising tagged regions for a set of specialized entities,” and accordingly, are merely more specific to the additional element. Therefore, claim 8 is directed to the abstract idea.
Finally, under Step 2B, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. The additional elements include a “non-transitory computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to,” which is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. The claim also recites “one or more machine learning models,” a “first machine learning model of the plurality of machine learning models,” and a “second machine learning model of the plurality of machine learning models,” which are recited at a high level of generality, and accordingly are also generic computer components that are used to implement the abstract idea, (MPEP § 2106.05(f)), that do not amount to significantly more than the abstract idea. The claim also recites more details or specifics to the additional element of the “machine learning models,” where “[(b.1.1)] the plurality of machine learning models trained with training data comprising tagged regions for a set of specialized entities,” and accordingly, are merely more specific to the additional element. Therefore, claim 8 is subject-matter ineligible.
Claim 15 recites a computer system, which is a product, and thus one of the statutory categories of patentable subject matter. (35 U.S.C. § 101).
However, under Step 2A Prong One, the claim recites the limitations of “[(a)] process, using one or more machine learning models, one or more portions of an input sequence to generate one or more candidate output sequences, thus defining a plurality of prediction scores for the one or more candidate output sequences,” “[(b)] . . . to identify one or more specialized entities from the one or more candidate output sequences,” “[(c)] . . . to apply, using the one or more machine learning models, a first scoring methodology on the one or more candidate output sequences based upon, at least in part, the one or more portions of the input sequence, thus defining a first set of prediction scores for the one or more candidate output sequences,” “[(d)] . . . to apply a second scoring methodology on the one or more specialized entities from the one or more candidate output sequences based upon, at least in part, the one or more portions of the input sequence, thus defining a second set of prediction scores for the one or more specialized entities,” “[(e)] . . . to at least partially modifying the plurality of prediction scores for the one or more specialized entities based upon, at least in part, the first set of prediction scores and the second set of prediction scores,” and “[(f)] generating output sequences based upon the plurality of prediction scores for the one or more candidate output sequences.” The activities of “[(a)] process . . . to generate,” “[(b)] identify,” “[(c)] apply . . . thus defining,” “[(e)] modify . . . scores,” and “[(f)] generating output sequences” are limitations that can practically be performed in the human mind, including, for example, observations, evaluations, judgments, and opinions, and accordingly, are a mental process, (MPEP § 2106.04(a)(2) sub III), which is one of the groupings of abstract ideas. (MPEP § 2106.04(a)(2)).
The claim recites more details or specifics to the abstract idea of “[(b)] identifying,” “[(b.1)] wherein identifying the one or more specialized entities from the one or more candidate output sequences includes tagging a plurality of specialized entities, [(b.1.1)] the tagging comprising a beginning tag and an ending tag, the tagging defining a tagged region,” and accordingly, is merely more specific to the abstract idea.
The claim also recites more details or specifics to the abstract idea of “[(e)] modifying . . . scores,” in that “the modifying comprising biasing the first set of prediction scores for the one or more specialized entities with the second set of prediction scores, tokens of the input sequence scored independently of whether the tokens are within a named entity tag,” and accordingly, is merely more specific to the abstract idea. Thus, Claim 15 recites an abstract idea.
Under Step 2A Prong Two, the claim as a whole is not integrated into a practical application, because the additional elements recited in the claim beyond the identified judicial exception include a “memory” and “a processor,” which are generic computer components used to implement the abstract idea, (MPEP § 2106.05(f)), that do not serve to integrate the abstract idea into a practical application. The claim also recites “one or more machine learning models,” a “first machine learning model of the plurality of machine learning models,” and a “second machine learning model of the plurality of machine learning models,” which are recited at a high level of generality, and accordingly are also generic computer components that are used to implement the abstract idea, (MPEP § 2106.05(f)), that do not serve to integrate the abstract idea into a practical application. The claim also recites more details or specifics to the additional element of the “machine learning models,” where “[(b.1.1)] the plurality of machine learning models trained with training data comprising tagged regions for a set of specialized entities,” and accordingly, are merely more specific to the additional element. Therefore, claim 15 is directed to the abstract idea.
Finally, under Step 2B, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. The additional elements include a “memory” and “a processor,” which are generic computer components used to implement the abstract idea, (MPEP § 2106.05(f)), that do not amount to significantly more than the abstract idea. The claim also recites “one or more machine learning models,” a “first machine learning model of the plurality of machine learning models,” and a “second machine learning model of the plurality of machine learning models,” which are recited at a high level of generality, and accordingly are also generic computer components that are used to implement the abstract idea, (MPEP § 2106.05(f)), that do not amount to significantly more than the abstract idea. The claim also recites more details or specifics to the additional element of the “machine learning models,” where “[(b.1.1)] the plurality of machine learning models trained with training data comprising tagged regions for a set of specialized entities,” and accordingly, are merely more specific to the additional element. Therefore, claim 15 is subject-matter ineligible.
Claim 2 depends from claim 1. Claim 9 depends from claim 8. Claim 16 depends from claim 15. These claims provide more details or specifics to the additional element of the “[(a)] one or more machine learning models” that include “a sequence-to-sequence model configured to process the one or more portions of an input sequence to generate one or more output sequences,” and accordingly, are merely more specific to the additional element. Therefore, claims 2, 9, and 16 are subject-matter ineligible.
Claim 4 depends from claim 1. Claim 11 depends from claim 8. Claim 18 depends from claim 15. The claims recite more details or specifics to the abstract idea of “[(c)] applying, using the one or more machine learning models, a first scoring methodology,” where “[(c.1)] the first scoring methodology is based upon, at least in part, a first probability distribution associated with an internal language model of the one or more machine learning models,” and accordingly, are merely more specific to the abstract idea. Therefore, claims 4, 11, and 18 are subject-matter ineligible.
Claim 5 depends directly or indirectly from claim 1. Claim 12 depends directly or indirectly from claim 8. Claim 19 depends directly or indirectly from claim 15. The claims recite more details or specifics to the abstract idea of “[(d)] applying a second scoring methodology,” where “[(d.2)] wherein the second scoring methodology is based upon, at least in part, a second probability distribution associated with an external language model,” and accordingly, are merely more specific to the abstract idea. Therefore, claims 5, 12, and 19 are subject-matter ineligible.
Claim 6 depends directly or indirectly from claim 1. Claim 13 depends directly or indirectly from claim 8. Claim 20 depends directly or indirectly from claim 15. The claims recite more details or specifics to the abstract idea of “[(e)] at least partially modifying the plurality of prediction scores,” where “the first set of prediction scores and the second set of prediction scores includes one or more of: [(e.1)] at least partially removing the first set of prediction scores for the one or more specialized entities from the plurality of prediction scores for the one or more candidate output sequences,” and “[(e.2)] adding the second set of prediction scores for the specialized entities to the plurality of prediction scores for the one or more candidate output sequences,” and accordingly, are merely more specific to the abstract idea. Therefore, claims 6, 13, and 20 are subject-matter ineligible.
Claim 7 depends from claim 1. Claim 14 depends from claim 8. The claims recite more details or specifics to the abstract idea of “identifying the one or more specialized entities” by “[(b.1)] tagging a plurality of specialized entities, thus defining one or more tagged portions,” and accordingly, are merely more specific to the abstract idea. Therefore, claims 7 and 14 are subject-matter ineligible.
Claim Rejections – 35 U.S.C. § 102
7. The following is a quotation of the appropriate paragraphs of 35 U.S.C. § 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
8. Claims 1, 4-9, 11-16, and 18-20 are rejected under 35 U.S.C. § 102(a)(1) as being anticipated by US Published Application 20200357388 to Zhao et al. [hereinafter Zhao].
Regarding claims 1, 8, and 15, Zhao teaches [a] computer-implemented method, executed on a computing device (Zhao ¶ 0014 teaches “method further includes executing, by the data processing hardware, using the speech recognition scores and the context scores, a beam search decoding process to determine one or more candidate transcriptions for the utterance, and selecting, by the data processing hardware, a transcription for the utterance from the one or more candidate transcriptions”) of claim 1, [a] computer program product residing on a non-transitory computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to (Zhao ¶ 0118 teaches that “computing device 600 includes a processor 610, memory 620 [(that is, non-transitory computer readable medium)]. . . . The processor 610 can process instructions for execution within the computing device 600, including instructions stored in the memory 620 or on the storage device 630 [(that is, a plurality of instructions stored thereon which, when executed by a processor, cause the processor to)] . . . . The memory 620 stores information non-transitorily within the computing device 600”) of claim 8, and [a] computing system (Zhao ¶ 0019 teaches “a system for using contextual biasing to transcribe speech. The system includes data processing hardware and memory hardware in communication with the data processing hardware [(that is, a computing system)]”), comprising:
[(a)] processing, using one or more machine learning models (Zhao ¶ 0071 teaches that the “RNN-T model 200 includes an encoder network 210, a prediction network 220, and a joint network 230 [(that is, using one or more machine learning models)]”), one or more portions of an input sequence to generate one or more candidate output sequences (Zhao, Fig. 1, teaches an example speech recognizer using contextual biasing to transcribe speech [Examiner annotations in dashed-line text boxes]:
[Image: Zhao Fig. 1 (media_image1.png)]
Zhao ¶ 0045 teaches “[w]hen the utterance 120 is spoken, the one or more microphones of the user device 110 generate audio data 125 representing the acoustic characteristics of the utterance 120. A feature extraction module 130 receives the audio data 125 and generates acoustic features 135 (e.g., log-mel features) from the audio data 125 [(that is, “acoustic features 135” is to generate one or more candidate output sequences)]. For example, the output of the [feature extraction] module 130 can be an acoustic feature vector for each window or frame (e.g., segment) of the audio data 125 [(that is, “segment” is one or more portions of an input sequence)], where the acoustic feature vector includes values indicating features such as the energy level at different frequency bands [(that is, processing, using one or more machine learning models, one or more portions of an input sequence to generate one or more candidate output sequences)]”),
[(a.1)] thus defining a plurality of prediction scores for the one or more candidate output sequences (Zhao ¶ 0046 teaches “speech recognition model 200 receives the acoustic features 135 as input and calculates, as output, speech recognition scores 145 representing the likelihood [(that is, “likelihood” is thus defining a plurality of prediction scores for the one or more candidate output sequences)] that different speech elements have occurred”);
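For illustration only (this sketch is the editor's, not language from the claims or from Zhao; the candidate strings and score values are hypothetical), the recited “prediction scores” representing the likelihood of different candidate outputs can be pictured as a model's raw outputs normalized into a probability distribution:

```python
import math

def softmax(logits):
    # Normalize raw model outputs into a probability distribution
    # (the "likelihood" of each candidate output).
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate output sequences and raw model scores.
candidates = ["call", "cal", "kale"]
logits = [2.1, 0.3, -1.2]
scores = softmax(logits)                 # prediction scores, summing to 1
best = candidates[scores.index(max(scores))]
```

Under these assumed values, the highest-scoring candidate is selected as the most likely output.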
[(b)] identifying one or more specialized entities from the one or more candidate output sequences (Zhao ¶ 0004 teaches “[c]ontextual automated speech recognition (ASR) involves biasing speech recognition towards a given context, such as towards a user's own playlist, contacts, or geographic place names [(that is, a “given context” is identifying one or more specialized entities from the one or more candidate output sequences)]. Context information usually includes a list of relevant phrases to be recognized, which often includes rare phrases or even foreign words which are seen infrequently in training. To perform contextual biasing, conventional ASR systems sometimes model contextual information in an independent contextual language model (LM), using an n-gram weighted finite state transducer (WFST), and compose the independent contextual LM with a baseline LM for on-the-fly (OTF) rescoring”),
[(b.1)] wherein identifying the one or more specialized entities from the one or more candidate output sequences includes tagging a plurality of specialized entities (Zhao ¶ 0059 teaches a “proper noun tagger may run on each utterance such that for each proper noun, the phonetic representation of the proper noun is produced”),
[(b.1.1)] the tagging comprising a beginning tag and an ending tag, the tagging defining a tagged region, the plurality of machine learning models trained with training data comprising tagged regions for a set of specialized entities (Zhao ¶ 0013 teaches that “for training, the system can obtain or create a large number of proper noun text-only queries, and then synthesize corresponding speech”; Zhao ¶ 0059 teaches “to emphasize the recognition of proper nouns during training, a proper noun tagger process can be run to filter the automatically generated transcriptions in the unsupervised data 193. In some implementations, only example utterances tagged as including a proper noun are used in training”);
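For illustration only (the editor's sketch, not the applicant's or Zhao's implementation; the tag strings and example utterance are hypothetical), the claimed beginning and ending tags defining a tagged region can be pictured as wrapping each known specialized entity in markers:

```python
import re

def tag_entities(text, entities, begin="<entity>", end="</entity>"):
    # Wrap each known specialized entity in a beginning tag and an ending
    # tag; the span between them is the "tagged region."
    for ent in entities:
        # re.escape treats the entity literally; replacement has no backrefs.
        text = re.sub(re.escape(ent), f"{begin}{ent}{end}", text)
    return text

# Hypothetical utterance and entity list.
tagged = tag_entities("play songs by Taylor Swift", ["Taylor Swift"])
```

Training data built this way would contain tagged regions for a set of specialized entities, as the limitation recites.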
[(c)] applying, using a first machine learning model of the one or more machine learning models (Zhao, Fig. 2 teaches a speech recognition model 200 that may include an E2E, RNN-T model 200 having a first machine learning model and a second machine learning model [Examiner annotations in dashed-line text boxes]:
[Image: Zhao Fig. 2 (media_image2.png)]
Zhao ¶ 0054 teaches the “RNN-T model 200 includes an encoder network 210, a prediction network 220, and a joint network 230. The encoder network 210, which is roughly analogous to an acoustic model (AM) in a traditional ASR system [(that is, a first machine learning model)]”; Zhao ¶ 0045 teaches “a feature extraction module 130 and an end-to-end speech recognition model 200 [(that is, using the one or more machine learning models)]”), a first scoring methodology on the one or more candidate output sequences based upon, at least in part, the one or more portions of the input sequence,
[(c.1)] thus defining a first set of prediction scores for the one or more candidate output sequences (Zhao ¶ 0046 teaches “[t]he speech recognition model 200 receives the acoustic features 135 [(that is, the one or more candidate output sequences)] as input and calculates, as output, speech recognition scores 145 [(that is, a first scoring methodology)] representing the likelihood that different speech elements have occurred [(that is, applying . . . a first scoring methodology on the one or more candidate output sequences based upon, at least in part, the one or more portions of the input sequence, thus defining a first set of prediction scores for the one or more candidate output sequences)]”);
[(d)] applying, using a second machine learning model of the plurality of machine learning models, the second machine learning model different from the first machine learning model (Zhao ¶¶ 0054-55 & Fig. 2 (above) teaches the “RNN-T model 200 includes an encoder network 210, a prediction network 220, and a joint network 230. . . . Similarly, the prediction network 220 is also an LSTM network, which like a language model (LM), processes the sequence of no-blank symbols [(that is, a second machine learning model)]”), a second scoring methodology on the one or more specialized entities from the one or more candidate output sequences based upon, at least in part, the one or more portions of the input sequence (Zhao ¶ 0047 teaches “[u]sing any or all of this context information, the context analysis module 165 can select from among the contextual [finite-state transducers (FSTs)] 160 or apply different weights to the contextual FSTs 160”; Zhao ¶ 0047 teaches “[t]he contextual FSTs 160 (e.g., one or more contextual FSTs 160 selected to be applicable to the current context of the utterance 120 [(that is, at least in part, the one or more portions of the input sequence)]) are then used to generate context scores 166 [(that is, applying a second scoring methodology on the one or more specialized entities)] that can bias the recognition process toward the terms and phrases identified in the data storage [of contacts, media names, locations, and app names [(that is, the one or more specialized entities)]] 150”),
[(d.1)] thus defining a second set of prediction scores for the one or more specialized entities (Zhao ¶ 0051 teaches “the context scores 166 bias the recognition toward terms that are more relevant for the particular user 115 in the current context than for speech recognition generally [(that is, “context scores” is thus defining a second set of prediction scores for the one or more specialized entities)]”); and
[(e)] at least partially modifying the plurality of prediction scores for the one or more specialized entities based upon, at least in part, the first set of prediction scores and the second set of prediction scores (Zhao ¶ 0051 teaches “a score combiner 170 combines the speech recognition scores 145 [(that is, the first set of prediction scores)] with the context scores 166 [(that is, the second set of prediction scores)] to produce combined scores 172 [(that is, a “score combiner” is at least partially modifying the plurality of prediction scores for the one or more specialized entities based upon, at least in part, the first set of prediction scores and the second set of prediction scores)] is used in a speech lattice 175”),
[(e.1)] the modifying comprising biasing the first set of prediction scores for the one or more specialized entities with the second set of prediction scores, tokens of the input sequence scored independently of whether the tokens are within a named entity tag (Zhao ¶¶ 0075-76 & Fig. 2 (above) teaches “[s]hallow fusion interpolates the score from the end-to-end model with an external contextual LM during beam-search decoding, given by Equation (1),
y* = argmax_y [ log P(y|x) + λ log Pc(y) ]          (Equation (1))
Here, Pc(y) is the score from the contextual [language model (LM)] and λ is a tunable hyperparameter controlling how much the contextual LM [(that is, the first set of prediction scores)] influences the overall model score [(that is, the “overall score” being the first set of prediction scores . . . with the second set of prediction scores)] during beam search [(that is, the “tunable hyperparameter λ controlling . . . influences” is biasing, where the modifying comprising biasing the first set of prediction scores for the one or more specialized entities with the second set of prediction scores)]”; also, Zhao ¶ 0075 teaches that “[g]iven a set of acoustics observations x = (X1 , . . . Xx), end-to-end models provide posterior probabilities for a set of subword units y = (y1, . . . , yL) given these observations, that is P(y|x) [(that is, tokens of the input sequence scored independently of whether the tokens are within a named entity tag)]”); and
[(f)] generating output sequences based upon the plurality of prediction scores for the one or more candidate output sequences (Zhao ¶ 0051 teaches “[t]he context scores 166 based on the context information 122, 186 and the speech recognition scores 145 [(that is, based upon the plurality of prediction scores for the one or more candidate output sequences)] based on acoustic information 135 are used together to determine a transcription 185 [(that is, “transcription” is generating output sequences)] for the utterance 120”).
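For illustrative purposes only, and not as part of the prosecution record, the shallow-fusion interpolation quoted above from Zhao ¶ 0076 may be sketched as follows; all function names and numeric values are hypothetical:

```python
import math

def shallow_fusion_score(e2e_log_prob, contextual_log_prob, lam=0.3):
    # Equation (1): interpolate the end-to-end model's log score with
    # the external contextual LM's log score, scaled by the tunable
    # hyperparameter lambda; lam = 0.3 is an arbitrary example value.
    return e2e_log_prob + lam * contextual_log_prob

# A hypothesis favored by the contextual LM (e.g., containing a contact
# name) receives a boosted combined score relative to a generic one.
generic = shallow_fusion_score(math.log(0.20), math.log(0.01))
contextual = shallow_fusion_score(math.log(0.18), math.log(0.30))
```

As the sketch illustrates, increasing λ increases the contextual LM's influence on the overall model score during beam search, which is the "biasing" mapped above.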
Regarding claims 2, 9, and 16, Zhao teaches all of the limitations of claims 1, 8, and 15, respectively, as described above in detail.
Zhao teaches -
wherein the one or more machine learning models include a sequence-to-sequence model (Zhao, Fig. 2, teaches an automatic speech recognition model 200 [Examiner annotations in dashed-line text boxes]:
[Image: Zhao Fig. 2, automatic speech recognition model 200 (Examiner-annotated)]
Zhao ¶ 0005 teaches “end-to-end (E2E) models have shown great promise for [automatic speech recognition (ASR)] . . . . Representative E2E models [(that is, the one or more machine learning models include a sequence-to-sequence model)] include word-based connectionist temporal classification (CTC) models, recurrent neural network transducer (RNN-T) models, and attention-based models such as Listen, Attend, and Spell (LAS)”) configured to process the one or more portions of an input sequence to generate one or more output sequences (Zhao ¶ 0054 teaches a “RNN-T model 200 includes an encoder network 210, a prediction network 220, and a joint network 230. The encoder network 210, which is roughly analogous to an acoustic model (AM) in a traditional ASR system, includes a recurrent network of stacked Long Short-Term Memory (LSTM) layers. For instance the encoder reads a sequence of d-dimensional feature vectors [(that is, “encoder” is configured to process the one or more portions of an input sequence)]”;
[Examiner notes the plain meaning of a sequence-to-sequence model is one used for converting sequences of data from one domain to another, where the input and output sequences may have different lengths, and accordingly, the broadest reasonable interpretation of the term covers the teachings of Zhao pertaining to end-to-end (E2E) models and the use of automatic speech recognition models, which is not inconsistent with the Applicant’s disclosure (MPEP § 2111)]).
Regarding claims 4, 11, and 18, Zhao teaches all of the limitations of claims 1, 8, and 15, respectively, as described above in detail.
Zhao teaches -
wherein the first scoring methodology is based upon, at least in part, a first probability distribution (Zhao ¶ 0046 teaches the “speech recognition model 200 may output a vector of scores representing a probability distribution over a set of output targets [(that is, a first probability distribution)]”) associated with an internal language model of the one or more machine learning models (Zhao ¶ 0054 teaches a “speech recognition model 200 may include an E2E, RNN-T model 200 which adheres to latency constraints associated with interactive applications. The RNN-T model 200 provides a small computational footprint and utilizes less memory requirements than conventional ASR architectures, making the RNN-T model architecture suitable for performing speech recognition entirely on the user device 102 (e.g., no communication with a remote server is required) [(that is, “on the user device” is an internal language model of the one or more machine learning models)]”).
Regarding claims 5, 12, and 19, Zhao teaches all of the limitations of claims 4, 11, and 18, respectively, as described above in detail.
Zhao teaches -
wherein the second scoring methodology is based upon, at least in part, a second probability distribution (Zhao ¶ 0010 teaches “integrating contextual information and outputs (e.g., probability distributions over possible speech recognition hypothesis) [(that is, a second probability distribution)]”) associated with an external language model (Zhao ¶ 0005 teaches “E2E models, which fold the acoustic model (AM), pronunciation model (PM), and LMs into a single network to directly learn speech-to-text mapping, have shown competitive results compared to conventional ASR systems which have a separate AM, PM, and LMs. Representative E2E models include word-based connectionist temporal classification (CTC) models, recurrent neural network transducer (RNN-T) models, and attention-based models such as Listen, Attend, and Spell (LAS)”).
Regarding claims 6, 13, and 20, Zhao teaches all of the limitations of claims 5, 12, and 19, respectively, as described above in detail.
Zhao teaches -
wherein at least partially modifying the plurality of prediction scores for the one or more specialized entities based upon, at least in part, the first set of prediction scores and the second set of prediction scores includes one or more of:
at least partially removing the first set of prediction scores for the one or more specialized entities from the plurality of prediction scores for the one or more candidate output sequences; and
adding the second set of prediction scores for the specialized entities to the plurality of prediction scores for the one or more candidate output sequences (Zhao, Fig. 3, teaches a beam pruning process [Examiner annotations in dashed-line text boxes]:
[Image: Zhao Fig. 3, beam pruning on the lattice 175 (Examiner-annotated)]
Zhao ¶ 0061 teaches “a diagram 300 of the speech recognizer 100 executing the beam pruning process (e.g., pruning 180 of FIG. 1) on the lattice 175 output by the score combiner 170 (FIG. 1) based on the speech recognition scores 145 (FIG. 1) and the context scores 166 (FIG. 1)”; Zhao ¶ 0051 teaches “the context scores 166 bias the recognition toward terms that are more relevant for the particular user 115 in the current context than for speech recognition generally. In some implementations, a score combiner 170 combines [(that is, adding)] the speech recognition scores 145 with the context scores 166 [(that is, “context scores” is adding the second set of prediction scores for the specialized entities)] to produce combined scores 172 used in a speech lattice 175”).
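Purely as an illustration of the score-combiner and beam-pruning operations mapped above (Zhao ¶¶ 0051, 0061), and not as part of the record, the following sketch uses hypothetical hypotheses and scores:

```python
def combine_and_prune(recognition_scores, context_scores, beam_size=2):
    # Combine per-hypothesis speech recognition scores with context
    # scores (log domain, so combining is addition), then prune the
    # lattice to the top `beam_size` hypotheses.
    combined = {hyp: recognition_scores[hyp] + context_scores.get(hyp, 0.0)
                for hyp in recognition_scores}
    ranked = sorted(combined, key=combined.get, reverse=True)
    return ranked[:beam_size]

# Hypothetical lattice: the context score for "call mom" (a contact in
# the user's context) biases recognition toward that hypothesis.
hyps = {"call mom": -2.0, "call tom": -1.8, "coal mom": -1.9}
ctx = {"call mom": 1.0}
best = combine_and_prune(hyps, ctx)
```

The sketch shows the combined scores, rather than the recognition scores alone, determining which hypotheses survive pruning.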
Regarding claims 7 and 14, Zhao teaches all of the limitations of claims 1 and 8, respectively, as described above in detail.
Zhao teaches -
wherein identifying the one or more specialized entities from the one or more candidate output sequences includes:
tagging a plurality of specialized entities, thus defining one or more tagged portions (Zhao, Fig. 1, teaches the use of prefix FSTs to tag context FSTs [Examiner annotations in dashed-line text boxes]:
[Image: Zhao Fig. 1, prefix FSTs 163 enabling contextual FSTs 160 (Examiner-annotated)]
Zhao ¶ 0043 teaches that “[o]ne way to tailor the contextual biasing to the current context is to use prefix [finite state transducers (FSTs)] 163 [(that is, a “prefix FST” is tagging)] each representing the occurrence of different sets of prefixes that correspond to a different respective context [(that is, “corresponds to a different respective context” is a plurality of specialized entities)]. For example, the occurrence of the prefix ‘call’ can indicate that a contact name is likely the next word, and so the prefix FST 163 for this prefix can cause the speech recognizer 100 to enable the contact names contextual FST 160 [(that is, “prefix ‘call’” corresponding to “contact names contextual FST” is tagging a plurality of specialized entities, thus defining one or more tagged portions)]. As another example, the occurrence of the prefix “play” [(that is, “prefix ‘play’” is tagging)] can indicate that a media item name is likely the next word, and so the prefix FST 163 for this prefix can cause the speech recognizer to enable media item names contextual FST 160. Each contextual FST 160 can optionally have a corresponding prefix FST 163 representing a set of one or more prefixes that have been determined, through analysis of user input logs, to indicate that the terms in the contextual FST 160 are likely.”).
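As a non-record illustration of the prefix-triggered contextual biasing described in Zhao ¶ 0043, the following sketch maps hypothetical prefixes to hypothetical entity lists; the names and terms are invented for illustration only:

```python
# Hypothetical analogue of prefix FSTs 163 enabling contextual FSTs 160:
# each prefix enables a set of contextual terms likely to follow it.
PREFIX_TO_CONTEXT = {
    "call": ["mom", "alice", "bob"],       # contact names
    "play": ["jazz hits", "workout mix"],  # media item names
}

def enabled_bias_terms(partial_transcript):
    # Return the contextual terms to bias toward, keyed on the last
    # word (the prefix) of the partial transcript; no prefix match
    # means no contextual FST is enabled.
    words = partial_transcript.lower().split()
    return PREFIX_TO_CONTEXT.get(words[-1], []) if words else []
```

For example, a partial transcript ending in "call" would enable biasing toward the contact-name terms, mirroring how the prefix "call" enables the contact names contextual FST in Zhao.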
Response to Arguments
9. Examiner has fully considered Applicant’s arguments and responds below accordingly.
Claim Rejections – 35 U.S.C. § 101
10. Under Step 2A Prong One, Applicant submits that “[a]s amended, claim 1 recites a system that exists outside of the human mind.” (Response at p. 9).
In this respect, Applicant submits that “claim 1 recites a model biasing process that uses machine learning models to process portions of an input sequence to generate candidate output sequences, defining prediction scores for the candidate output sequences. . . . It is unclear to Applicant how integrating external language models, which recognize rare entries not seen during training of the primary language model, and then using the modified output sequences to generate output can be performed in the human mind. Indeed, Applicant respectfully submits that the claimed invention, as amended, recites features that can only exist outside of the human mind.” (Response at pp. 9-10).
Thus, Applicant submits that “claim 1 is directed to a complex system that receives a set of input sequences, generates a set of candidate output sequences, identifies specialized entities in the candidate output sequences, applies a first scoring methodology associated with a first machine learning model against the candidate output sequences and a second scoring methodology associated with a second machine learning model against the specialized entities, tokens of the input sequence scored independently of whether the tokens are within a named entity tag to obtain a set of prediction scores, modifies the prediction scores associated with the candidate output sequences based on the prediction scores associated with the second scoring methodology, and generates output sequences based on the modified candidate output sequences. See, Specification at [0046], [0048], [0052].
Applicant respectfully submits that claim 1, as amended, does not fall within the mental process grouping of abstract ideas and thus is not directed toward an abstract idea pursuant to step 2A prong 1.” (Response at p. 19 (emphasis added by Applicant)).
Examiner’s Response:
Examiner respectfully disagrees, because the rejection identifies the abstract idea (that is, the judicial exception) by referring to what is recited (i.e., set forth or described) in the claim and explains why it is considered an exception. (MPEP § 2106.07(a)).
The rejection sets out, inter alia, that the activities of “[(a)] processing to generate,” “[(b)] identifying,” “[(c)] applying . . . thus defining,” and “[(e)] modifying . . . scores” are limitations that can practically be performed in the human mind (including, for example, as observations, evaluations, judgments, and opinions), and accordingly are a mental process, (MPEP § 2106.04(a)(2) sub III), which is one of the groupings of abstract ideas. (MPEP § 2106.04(a)(2)).
Accordingly, the claims recite an abstract idea, as set out above in detail.
11. Under Step 2A Prong Two, Applicant submits that “[a]s amended, claim 1 improves the functioning of a sequence-to-sequence processing system because the features recited improve the system's transformation accuracy by applying a first machine-learning model and a second machine-learning model against specialized entities not encountered by the first machine-learning model during its training. The Specification discusses that the invention provides significant improvements in overall recognition accuracy when processing specific entities within a given input sequence.” (Response at pp. 10-11 (citing Specification ¶¶ 0036 (“improvements in overall recognition accuracy”), 0025 (score modifications), 0043 (“improves effectiveness of computing hardware”))).
Applicant submits that “[t]herefore, the present invention integrates the alleged judicial exception into a practical application because it provides improvements in computer technologies. Specifically, the claimed invention provides improvements in the sequence-to-sequence processing systems employing machine-learning models. Accordingly, for at least these reasons, applicant respectfully submits that the alleged abstract idea is integrated into a practical application pursuant to step 2A prong 2 analysis.” (Response at p. 11).
Examiner’s Response:
Examiner respectfully disagrees because the rejection identifies the additional elements recited in the claim beyond the identified abstract idea and evaluates the integration of the judicial exception into a practical application, explaining that the claim as a whole, considering the additional elements individually and in combination, does not integrate the judicial exception into a practical application under the considerations set forth in MPEP §§ 2106.04(d) and 2106.05(a)-(c) and (e)-(h). (MPEP § 2106.07(a)).
Under Step 2A Prong Two, a claim that integrates a judicial exception into a practical application of the exception will apply, rely on, or use the abstract idea in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize or preempt the judicial exception. (2024 SME Guidance, 89 Fed. Reg. 137 at p. 58136 (17 July 2024)).
The claims recite additional elements of a “computer-implemented method” and a “computing device,” which are generic computer components used to implement the abstract idea, (MPEP § 2106.05(f)), and which do not serve to integrate the abstract idea into a practical application. The claims also recite “one or more machine learning models,” a “first machine learning model of the plurality of machine learning models,” and a “second machine learning model of the plurality of machine learning models,” which are recited at a high level of generality and accordingly are also generic computer components used to implement the abstract idea. (MPEP § 2106.05(f)).
The claims, taken as a whole, do not integrate the abstract idea into a practical application because they do not apply, rely on, or use the abstract idea in a manner that imposes a meaningful limit on it.
Accordingly, the claims are directed to an abstract idea, as set out above in detail.
12. Under Step 2B, “Applicant respectfully submits that, even if it is still determined that claim 1 recites an abstract idea in light of the amendments and arguments above (which is disputed herein by the Applicant), the additional elements of claim 1 amount to significantly more than the alleged abstract idea.” (Response at p. 11).
Particularly, Applicant submits that “Claim 1, as amended, includes unconventional features confining the claim to a particular useful application and that amount to significantly more than the alleged abstract idea. Specifically, as amended, claim 1 recites a system that . . . is unconventional, as conventional systems often rely on an internal language model. See, Specification [0036]. End-to-end machine-learning models may include or define a language model (e.g., an ‘internal’ language model) for various output tokens or sequence segments, either implicitly or explicitly. Id. This may make exploitation of independently trained language models less straightforward than in conventional systems. Id. Accordingly, it may be difficult to dynamically adapt these E2E sequence-to-sequence processing systems for particular contextual profiles for better processing of specific entities (e.g., named entities, special terms, etc.). Id. The probability of particular entries being observed in a first model may be very different from the probability of those entries being observed in a different model. Id. [0040]. Thus, the system claimed improves the accuracy of sequence-to-sequence processing by applying probability distributions for tokens of a candidate output sequence using language models specific to particular specialized entities. Id. [0054]. Applicant respectfully submits that claim 1 recites unconventional features and that the features sufficiently confine the claimed invention to a useful application of accurate output token identification and generation.” (Response at p. 12).
Examiner’s Response:
Examiner respectfully disagrees because the rejection explains why the additional elements, taken individually and in combination, do not result in the claim, as a whole, amounting to significantly more than the identified judicial exception. (MPEP § 2106.07(a)).
The Office guidance sets out that Step 2B includes a consideration of whether the additional element (or combination of elements) is a well-understood, routine, conventional activity. A claim may be found to lack significantly more (and thus be ineligible) based on one or more of these judicial considerations.
However, the rejections set out above identify the use of generic computer components to implement the abstract idea. Specifically, the additional elements include a “computer-implemented method” and a “computing device,” which are generic computer components used to implement the abstract idea, (MPEP § 2106.05(f)), and which do not amount to significantly more than the abstract idea. The claims also recite “one or more machine learning models,” a “first machine learning model of the plurality of machine learning models,” and a “second machine learning model of the plurality of machine learning models,” which are recited at a high level of generality and accordingly are also generic computer components used to implement the abstract idea, (MPEP § 2106.05(f)), that do not amount to significantly more than the abstract idea.
Accordingly, the claims are subject-matter ineligible, as set out above in detail.
Claim Rejections – 35 U.S.C. § 102
13. Applicant submits that “nothing in the cited portions of Zhao discloses that ‘the modifying comprises biasing the first set of prediction scores for the one or more specialized entities with the second set of prediction scores, tokens of the input sequence scored independently of whether the tokens are within a named entity tag.’ During the Interview, Applicant understood Examiner Smith to agree with this reasoning. Therefore, amended claim 1 is patentable over Zhao. Support for this amendment may be found in paragraph [0059] of the originally filed specification.” (Response at p. 14).
Examiner’s Response:
Upon further consideration of the claims and the cited Zhao reference, the teachings of Zhao fall within the scope of the “biasing” recited in Applicant’s claims.
The claim recites, inter alia:
* * *
at least partially modifying the plurality of [[predictions]] prediction scores for the one or more specialized entities based upon, at least in part, the first set of prediction scores and the second set of prediction scores, the modifying comprising biasing the first set of prediction scores for the one or more specialized entities with the second set of prediction scores, tokens of the input sequence scored independently of whether the tokens are within a named entity tag; and
* * *
(claim 1, lines 23-28 (emphasis showing the amended language)).
The plain meaning of the term “biasing” is introducing a systematic deviation that favors a particular outcome. The broadest reasonable interpretation of the term “biasing” covers the teachings of Zhao relating to a tunable hyperparameter λ “controlling how much the contextual [language model] influences the overall model score during beam search.” (Zhao ¶ 0076).
Accordingly, Zhao teaches all of the limitations as set out by Applicant’s claimed invention, as set out above in detail.
Conclusion
14. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
15. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
(US Published Application 20240211496 to Polaczuk et al.) teaches determining one or more entity identifiers; determining an entity server address of the entity based on the one or more entity identifiers, wherein the entity server address points to an entity server; verifying the entity server address; transmitting a request for information to the entity server address; receiving entity information from the entity server; and providing the received entity information to a machine learning model. The machine learning model is trained to generate numerical representations of entities based on the entity information.
(Li et al., “Multi-Stream End-to-End Speech Recognition,” arXiv (2019)) teaches a multi-stream framework based on joint CTC/Attention E2E ASR with parallel streams represented by separate encoders aiming to capture diverse information. On top of the regular attention networks, the Hierarchical Attention Network (HAN) is introduced to steer the decoder toward the most informative encoders.
16. Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the Examiner is available on Monday-Thursday 0730-1730.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.L.S./
Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122
1 Identifying reference markers are applied to the claim limitations for the limited purpose of aiding in claim evaluation for subject-matter eligibility.