DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-14, 21 and 23 are elected without traverse.
Claims 15-20, 22 and 24 are non-elected.
Election/Restrictions
In reply to the Restriction Requirement, an election was made without traverse to prosecute the invention of the elected group, Claims 1-14, 21 and 23. Applicant must affirm this election in replying to this Office action. Claims 15-20, 22 and 24 are withdrawn from further consideration by the examiner pursuant to 37 CFR 1.142(b), as being drawn to a non-elected invention.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-14, 21 and 23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 2A, Prong One: The independent claim 21 recites “transforming input data into an output symbol and outputting the output symbol, wherein a bias module increases a likelihood of a candidate symbol that includes a registered symbol, and a combination table indicates a combination of a specific symbol designated by a user and a registered symbol corresponding to the specific symbol, the information processing method comprising: extracting a feature value from the input data; applying the feature value to a trained model to estimate at least one candidate symbol and a likelihood of the at least one candidate symbol; performing, using the bias module, an increasing process of increasing the likelihood of the candidate symbol that includes the registered symbol corresponding to the specific symbol designated by the user, among the at least one candidate symbol; performing, using the bias module, an increasing process of increasing the likelihood of the candidate symbol that includes the registered symbol, among the at least one candidate symbol; determining a temporary output symbol based on a respective likelihood of the at least one candidate symbol after the increasing process is performed; and when the registered symbol is included in the temporary output symbol, referring to the combination table, performing a transformation process of transforming the registered symbol into the specific symbol corresponding to the registered symbol, and outputting, as the output symbol, the temporary output symbol on which the transformation process has been performed”.
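For purposes of illustrating the recited sequence of steps only, the claimed method can be paraphrased as the following sketch. All names, values, and the toy model are hypothetical and do not appear in the claims, and the two recited increasing processes are collapsed into a single loop:

```python
# Hypothetical sketch of the claimed method (illustration only, not applicant's code).
# The combination table maps each registered symbol to the user-designated specific symbol.
combination_table = {"tokyo_station": "Tokyo Sta."}  # hypothetical entry
bias_values = {"tokyo_station": 0.2}                 # hypothetical bias values

def transform(input_data, model, extract_features):
    feature = extract_features(input_data)   # extract a feature value from the input data
    candidates = model(feature)              # [(candidate symbol, likelihood), ...]
    boosted = []
    for symbol, likelihood in candidates:
        # "increasing process": raise the likelihood of any candidate
        # that includes a registered symbol
        for registered in combination_table:
            if registered in symbol:
                likelihood += bias_values[registered]
        boosted.append((symbol, likelihood))
    # determine the temporary output symbol from the adjusted likelihoods
    temporary, _ = max(boosted, key=lambda c: c[1])
    # "transformation process": replace registered symbols with their specific symbols
    for registered, specific in combination_table.items():
        temporary = temporary.replace(registered, specific)
    return temporary
```

On this reading, a candidate such as "go to tokyo_station" would be boosted, selected as the temporary output symbol, and then rewritten using the combination table.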
Claims 1, 21 and 23 recite obtaining audio, converting/transcribing the audio to text, determining/increasing the likelihood of the text according to the speaker's bias, referring to a table, and generating a final text.
Each of these limitations, as drafted, covers a mental process:
Transcribing speech into text is the conversion of spoken content to written form, a task humans routinely perform mentally or with conventional tools such as pen and paper.
Determining or increasing likelihood scores based on a user's bias is an evaluation that can practically be performed in the human mind.
Referring to a table and mapping a registered symbol to a specific symbol are observation and judgment steps that can likewise be performed mentally.
Accordingly, the claims are directed to the judicial exception of a mental process.
Step 2A, Prong Two: This judicial exception is not integrated into a practical application. The computer is recited at a high level of generality (i.e., as performing generic computer functions) such that it amounts to no more than mere instructions to apply the exception using a generic computer. Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The Claims Do Not Recite an Inventive Concept That Transforms the Mental Process into Patent-Eligible Subject Matter
The claims add generic, well-understood computer components (memory, processor, and an interface) and broadly recite use of "a trained model" without describing any specific, unconventional structure, algorithmic detail, data structure, or system architecture that provides a concrete technical improvement in computer functionality.
Applying Alice step two and relevant Federal Circuit precedent:
The recitation of conventional computer components (memory and processor) performing routine functions does not supply an inventive concept.
The mere invocation of “trained model” without particularity does not demonstrate an unconventional machine or technique or a specific improvement in computer technology.
The claims recite high-level, result-oriented steps (e.g., “extract,” “determine,” “refer”) that describe mental processes rather than specific technical means for performing those processes.
Because the claims lack limitations that tie the mental-process steps to a particular way of achieving a technological improvement (for example, a novel model architecture, specialized data representation, unique training regimen that yields demonstrable technical performance gains, a specialized streaming/decoding pipeline that reduces latency by a quantifiable amount, or hardware/software co-design), the additional elements do not transform the mental processes into significantly more.
With respect to claims 1 and 23, these claims are similar to claim 21 and additionally recite a "processor," a "memory," and a "non-transitory recording medium." The processor and memory are recited at a high level of generality (i.e., as generic components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using generic computer components. These additional elements do not integrate the judicial exception into a practical application and are not sufficient to amount to significantly more than the judicial exception.
Therefore, claims 1, 21 and 23 fail to recite an inventive concept sufficient to transform the judicial exception into patent-eligible subject matter.
With respect to dependent claims 2-14, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Conclusion: Rejection Under 35 U.S.C. § 101
Claims 1-14, 21 and 23 are rejected under 35 U.S.C. § 101 as being directed to a judicial exception (mental processes) and failing to recite additional elements that amount to significantly more than the judicial exception.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6-7, 10-14, 21, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Chelba et al. (US Pub. 2015/0371633) in view of Lam (US Pub. 2022/0343910).
Regarding claim 1, Chelba discloses an information processing device for transforming input data into an output symbol and outputting the output symbol, the information processing device comprising:
an interface for receiving a user input of the input data (Fig. 7, [0160] receiving a user input 762);
a memory storing a trained model; and a processor, wherein the processor:
extracts a feature value from the input data (Fig. 7, [0161] extracting a series of speech frames 752);
applies the feature value to the trained model to estimate at least one candidate symbol and a likelihood of the at least one candidate symbol (Fig. 7, [0162] identifying candidate transcriptions 754 a-754 c and each score 755 a-755 c for the input utterance 762 using an acoustic model which is trained using training data or training techniques),
wherein the memory stores a combination table and a bias module, the combination table indicates a combination of a specific symbol designated by the user and a registered symbol corresponding to the specific symbol (Figs. 1 and 2, [0032][0037]-[0039][0168] a dataset 122 stores key-value pairs that each identify a sequence with different amount of phonetic context around a particular phone in a particular instance of an utterance; The phonetic symbols are determined according to the specific language entered by the user, i.e., English),
the bias module increases a likelihood of a candidate symbol that includes the registered symbol (Figs. 1 and 7, [0158] assessing candidate transcriptions for an utterance using different amounts of phonetic context and ranking the candidate transcription according to scores using an acoustic model 140),
the processor further
performs, [using the bias module, an increasing process of increasing] the likelihood of the candidate symbol that includes the registered symbol, among the at least one candidate symbol (Fig. 7, [0171][0172] adjusting the scores 755 a- 755 c for the candidate transcriptions 754 a-754 c by the mapper which can generate a score that indicates a likelihood from the acoustic model 140), and
determines a temporary output symbol based on a respective likelihood of the at least one candidate symbol after the increasing process is performed (Fig. 7, [0172]-[0174] determining a transcription which has the highest score among the candidate transcriptions), and
when the registered symbol is included in the temporary output symbol, the processor refers to the combination table, performs a transformation process of transforming the registered symbol into the specific symbol corresponding to the registered symbol, and outputs, as the output symbol, the temporary output symbol on which the transformation process has been performed (Fig. 7, [0170]-[0174] if data for that phonetic sequence exists in the acoustic model 140, identifying the extracted test sequences, retrieving acoustic model data for the test sequences and outputting the final transcription for the utterance 762).
Chelba does not explicitly teach the bracketed limitation; however, Lam does explicitly teach the bracketed limitation:
performs, [using the bias module, an increasing process of increasing] the likelihood of the candidate symbol that includes the registered symbol, among the at least one candidate symbol (Lam, Figs. 1, 3 and 4, [0043] “if the impact scores represent a likelihood of a positive potential impact, then values of the impact scores can be used to bias the recognition scores to increase the selection of candidate transcriptions that are determined to have high likelihoods of a positive potential impact, e.g., using a positive biasing technique”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of performing a first and second pass of speech recognition using an acoustic model as taught by Chelba with the method of speech recognition biasing using various cues about an environment to apply a bias as taught by Lam, in order to increase the accuracy of speech transcription (Lam, [0004]).
Regarding claim 2, Chelba in view of Lam discloses the information processing device according to claim 1, and Chelba further discloses:
wherein the bias module is a bias table in which a bias value for increasing the likelihood of the candidate symbol that includes the registered symbol and the registered symbol are defined in association with each other (Fig. 7, [0172][0173] using the acoustic model data adjust the scores 755 a-755 c for the candidate transcriptions 754 a-754 c; When acoustic model data is obtained for a test sequence that is a maximal order M-phone, which represents the maximum amount of phonetic context, that acoustic model data is used to generate the adjusted score).
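The bias table characterized in claim 2, a table associating each registered symbol with a bias value, can be sketched as a simple associative structure; the symbols and values below are hypothetical and serve only to illustrate the claimed association:

```python
# Hypothetical bias table: registered symbols defined in association
# with the bias values used to increase a candidate's likelihood.
bias_table = {
    "osaka_branch": 0.15,
    "meeting_room_a": 0.10,
}

def apply_bias(candidate_symbol, likelihood, table):
    """Increase the likelihood once for each registered symbol the candidate includes."""
    for registered, bias_value in table.items():
        if registered in candidate_symbol:
            likelihood += bias_value
    return likelihood
```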
Regarding claim 3, Chelba in view of Lam discloses the information processing device according to claim 2, and Chelba further discloses:
wherein the trained model is trained based on a plurality of training data, the plurality of training data being a combination of a training feature value and a training registered symbol, the memory stores a decision model which outputs a bias value in response to the registered symbol being input to the decision model, the decision model outputs a bias value that is configured in such a manner that the less the number of the input registered symbols are included in the training registered symbol, the greater the likelihood of the candidate symbol that includes the registered symbol is, and the processor generates the bias table by associating the bias value output from the decision model with the registered symbol which is the same as the training registered symbol (Figs. 2-4A and [0093][0124] “the number of quantiles that are defined varies according to the set of data used to train the acoustic model. An M-phone will have more or fewer speech frames 411 a-411 c, and thus more or fewer samples in the set 420, depending on the frequency that the M-phone occurs in the training data”).
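The decision model of claim 3, which assigns larger bias values to registered symbols that occur less often in the training data, might be sketched as follows. The inverse-frequency formula and all names are hypothetical, chosen only so that rarer symbols receive greater bias:

```python
import math

# Hypothetical decision model: the fewer times a registered symbol appears
# in the training data, the greater the bias value it is assigned.
def decision_model(registered_symbol, training_counts, total_samples):
    count = training_counts.get(registered_symbol, 0)
    return math.log((total_samples + 1) / (count + 1))  # rarer symbol -> larger bias

def generate_bias_table(registered_symbols, training_counts, total_samples):
    # Associate each registered symbol with the bias value output by the model.
    return {s: decision_model(s, training_counts, total_samples)
            for s in registered_symbols}
```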
Regarding claim 4, Chelba in view of Lam discloses the information processing device according to claim 2, and Chelba further discloses:
wherein the processor generates the bias table based on a user input (Fig. 7 and [0165] generating a table 753 based on a user input 762).
Regarding claim 6, Chelba in view of Lam discloses the information processing device according to claim 1, and Chelba further discloses:
wherein the processor generates the bias module (Fig. 7, [0159]-[0172] a processor located in a system 700 generates and adjusts scores for the candidate transcription based on the data from the acoustic model that corresponds to the selected test sequence).
Regarding claim 7, Chelba in view of Lam discloses the information processing device according to claim 1, and Chelba further discloses:
wherein the interface receives a user input of the specific symbol, and the processor generates, as the combination table, a table indicating a combination of the specific symbol input to the interface and the registered symbol corresponding to the specific symbol (Fig. 7, [0159]-[0172] receiving a user input 762 and generating table with scores for the candidate transcription based on the data from the acoustic model).
Regarding claim 10, Chelba in view of Lam discloses the information processing device according to claim 1, and Lam further discloses:
wherein the increasing process is a process in which a likelihood of a candidate symbol that does not include the registered symbol, among the at least one candidate symbol, is not increased (Lam, [0047] decreasing probabilities associated with certain n-grams within the language model based on information indicated within the context data).
The motivation to combine set forth in the rejection of claim 1 above applies equally here.
Regarding claim 11, Chelba in view of Lam discloses the information processing device according to claim 1, and Chelba further discloses:
wherein the registered symbol is the same as the specific symbol, expressed in a different way than the specific symbol, or has the same reading as the specific symbol but is expressed in a different way than the specific symbol (Fig. 8, [0168] expressing phonetic representation and transcription in English alphabet).
Regarding claim 12, Chelba in view of Lam discloses the information processing device according to claim 10, and Chelba further discloses:
wherein the specific symbol is expressed in any one of hiragana, katakana, romaji, English, or international phonetic alphabets, and the registered symbol is expressed in another one of them (Fig. 8, [0168] expressing phonetic representation and transcription in English alphabet).
Regarding claim 13, Chelba in view of Lam discloses the information processing device according to claim 1, and Chelba further discloses:
wherein the input data is voice data, and the output symbol is text of a voice represented by the voice data, or text summarizing content of the voice represented by the voice data (Fig. 7, [0174] inputting utterance 762 from a user and outputting transcription 754 b).
Regarding claim 14, Chelba in view of Lam discloses the information processing device according to claim 1, and Chelba further discloses:
wherein the input data is any one of sound data, still image data, or video data, and the output symbol is text indicating a description of an object represented by the input data (Fig. 7, [0174] inputting utterance 762 from a user and outputting transcription 754 b).
Regarding claims 21 and 23, these claims are the corresponding method and medium claims to device claim 1. Therefore, claims 21 and 23 are rejected using the same rationale as applied to claim 1 above.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Chelba et al. (US Pub. 2015/0371633) in view of Lam (US Pub. 2022/0343910), and further in view of Mishra et al. (US Pub. 2023/0017352).
Regarding claim 5, Chelba in view of Lam discloses the information processing device according to claim 2.
Chelba in view of Lam does not explicitly teach the following limitation; however, Mishra does explicitly teach:
wherein the bias table defines a same bias value for a plurality of the registered symbols (Mishra, [0037] “a hash value may correspond to multiple phonetically identical or similar terms (e.g., each of “meet” and “meat” may correspond to a same hash value), and graphemes or phonemes having similar representations, e.g., similar patterns of vowels or consonants, may have similar hash values”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of performing a first and second pass of speech recognition using an acoustic model as taught by Chelba in view of Lam with the method for phonetic-based natural language understanding using hash values as taught by Mishra, in order to leverage phoneme representations of terms and heuristics to update a phonetic search index and accurately identify a user query (Mishra, [0004]).
Allowable Subject Matter
Claims 8 and 9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims and if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 101.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN whose telephone number is (571)272-5933. The examiner can normally be reached 9 AM-3PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SEONG-AH A SHIN/Primary Examiner, Art Unit 2659