DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.
Response to Arguments
Applicant's arguments filed in the correspondence dated 01/13/2026 have been fully considered but they are not persuasive. Applicant argues on pages 5-7 that the amended claims overcome the 35 U.S.C. 101 rejection. The examiner disagrees. The addition of using a microphone to capture audio data is a basic computer function, as is using a speech recognition model. In other words, using a basic general-purpose computer to perform basic computer functions, along with using basic computer resources, does not make the claim patent eligible. The independent claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Applicant's arguments filed 01/13/2026 have been fully considered but they are not persuasive. The obviousness-type double patenting rejection is maintained pending the filing of a terminal disclaimer.
Claims 1-4, 6-14 and 16-20 are pending. With the amendment filed, claims 5 and 15 have been canceled. Claims 1-4, 6-14 and 16-20 remain rejected.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-4, 6-14 and 16-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent claim 1 recites “… computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.” Each of the steps in claim 1 can be performed by a human being, including by applying a transcription algorithm. The above limitations, as drafted, cover a mental process, as each step could be performed mentally or by hand with pen and paper.
Each of the steps of “receiving first audio data and second audio data captured by a computing device”; “processing the first audio data to identify an entity associated with the first audio data”; “retrieving a set of terms related to the identified entity”; “processing the second audio data to determine one or more textual representations associated with the second audio data”; and “selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data” could be performed mentally or with the aid of pen and paper.
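For illustration only, the recited operations can be expressed as generic data-processing steps. The sketch below is hypothetical: the helper names, stub return values, and overlap-based selection are illustrative assumptions, not drawn from Applicant's specification; it merely shows the kind of generic computation each recited step involves.

```python
# Illustrative sketch only; all names and stub logic are hypothetical and are
# not drawn from Applicant's specification.

def identify_entity(first_audio: bytes) -> str:
    # Stands in for "processing the first audio data to identify an entity."
    return "Example Band"

def retrieve_related_terms(entity: str) -> list[str]:
    # Stands in for "retrieving a set of terms related to the identified entity."
    return ["example song", "example album"]

def candidate_transcriptions(second_audio: bytes) -> list[str]:
    # Stands in for "processing the second audio data to determine one or more
    # textual representations."
    return ["play example song", "play exam pull song"]

def select_transcription(first_audio: bytes, second_audio: bytes) -> str:
    entity = identify_entity(first_audio)
    terms = retrieve_related_terms(entity)
    candidates = candidate_transcriptions(second_audio)
    # "Selecting ... based on the retrieved set of terms": prefer the candidate
    # sharing the most entity-related terms.
    return max(candidates, key=lambda c: sum(t in c for t in terms))

print(select_transcription(b"", b""))  # -> "play example song"
```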
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. With respect to integration of the abstract idea into a practical application, the additional element of using a generic computing device to perform the determining and data-gathering steps amounts to no more than mere instructions to apply the exception using a generic computer. The current specification, at paragraph 0059, expressly states that “… computing device 104 in some implementations can be a mobile telephone, a personal digital assistant, a laptop computer, a desktop computer, or another computer or hardware resource.” The additional elements have been considered both individually and as an ordered combination in the significantly-more consideration. The inclusion of the computer, or memory and controller, to perform the selecting and generating steps amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computing device cannot provide an inventive concept. Therefore, claim 1 as drafted is not patent eligible. Considered both individually and as an ordered combination, the additional elements thus do not amount to significantly more than the abstract idea.
Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Independent claim 1 is therefore not drawn to eligible subject matter, as it is directed to an abstract idea without significantly more. Claims 2-4 and 6-10 are dependent claims and do not contain subject matter sufficient to overcome the rejection of independent claim 1. Claims 11-14 and 16-20 are directed to a system implementing the method of claims 1-4 and 6-10 above, are similar in scope and content, and are rejected under a similar rationale.
All dependent claims, when analyzed as a whole, are held to be patent ineligible under 35 U.S.C. § 101 because any additionally recited limitations fail to establish that the claims are not directed to an abstract idea, for the same reasons given for the independent claims.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1, 7-8, 10-11, 17-18 and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4-5, 8-10, 13-14, and 17-18 of U.S. Patent No. 9,123,338. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1, 7-8, 10-11, 17-18 and 20 of the instant application are similar in scope and content to patented claims 1, 4-5, 8-10, 13-14, and 17-18 of the patent, which issued to the same Applicant.
It is clear that all the elements of application claims 1, 7-8, 10-11, 17-18 and 20 are to be found in patented claims 1, 4-5, 8-10, 13-14, and 17-18 (as application claims 1, 7-8, 10-11, 17-18 and 20 fully encompass patented claims 1, 4-5, 8-10, 13-14, and 17-18). The difference between the application claims and the patent claims lies in the fact that the patent claims include many more elements and are thus much more specific. Thus the invention of claims 1, 4-5, 8-10, 13-14, and 17-18 of the patent is in effect a “species” of the “generic” invention of application claims 1, 7-8, 10-11, 17-18 and 20. It has been held that the generic invention is “anticipated” by the “species”. See In re Goodman, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1, 7-8, 10-11, 17-18 and 20 are anticipated by claims 1, 4-5, 8-10, 13-14, and 17-18 of the patent, they are not patentably distinct from the patented claims.
Application No: 18/664,348
Patent No: 9,123,338
1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
1. A computer-implemented method, the method comprising: receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio; separating the audio stream into a first substream that comprises the user speech data and a second substream that comprises the background audio; identifying concepts related to the background audio; generating a set of terms related to the identified concepts; influencing a speech recognizer based on at least one of the terms related to the background audio; and obtaining a recognized version of the user speech data using the speech recognizer.
2. The computer-implemented method of claim 1, wherein the computing device captures the first audio data before capturing the second audio data.
3. The computer-implemented method of claim 1, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
4. The computer-implemented method of claim 1, wherein the data processing hardware resides on the computing device.
5. The computer-implemented method of claim 1, wherein processing the second audio data to determine the one or more textual representations comprises determining the one or more textual representations using a speech recognition language model.
6. The computer-implemented method of claim 5, wherein the speech recognition language model executes on the computing device.
7. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of songs.
4. The method of claim 3, wherein the acoustic fingerprint is an acoustic fingerprint for an audio sample from a media recording.
8. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of music performers.
5. The method of claim 1, wherein generating a set of terms related to the background audio comprises: generating a set of terms based on querying a conceptual expansion database based on the concepts related to the background audio.
9. The computer-implemented method of claim 1, wherein the computing device comprises a speaker.
10. The computer-implemented method of claim 1, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
8. The method of claim 1, wherein influencing the language model comprises increasing the probability that at least one of the terms related to the background audio will be obtained.
9. The method of claim 8, further comprising: measuring the relevance of a term related to the background audio; and increasing the probability that at least one of the terms related to the background audio based on the measured relevance.
11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
10. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an audio stream during a time interval, the audio stream comprising user speech data and background audio; separating the audio stream into a first substream that comprises the user speech data and a second substream that comprises the background audio; identifying concepts related to the background audio; generating a set of terms related to the identified concepts; influencing a speech recognizer based on at least one of the terms related to the background audio; and obtaining a recognized version of the user speech data using the speech recognizer.
12. The system of claim 11, wherein the computing device captures the first audio data before capturing the second audio data.
13. The system of claim 11, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
14. The system of claim 11, wherein the data processing hardware resides on the computing device.
15. The system of claim 11, wherein processing the second audio data to determine the one or more textual representations comprises determining the one or more textual representations using a speech recognition language model.
16. The system of claim 15, wherein the speech recognition language model executes on the computing device.
17. The system of claim 11, wherein the retrieved set of terms comprises a list of songs.
13. The system of claim 12, wherein the acoustic fingerprint is an acoustic fingerprint for an audio sample from a media recording.
18. The system of claim 11, wherein the retrieved set of terms comprises a list of music performers.
14. The system of claim 10, wherein generating a set of terms related to the background audio comprises: generating a set of terms based on querying a conceptual expansion database based on the concepts related to the background audio.
19. The system of claim 11, wherein the computing device comprises a speaker.
20. The system of claim 11, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
17. The system of claim 10, wherein influencing the language model comprises increasing the probability that at least one of the terms related to the background audio will be obtained.
18. The system of claim 17, the operations further comprising: measuring the relevance of a term related to the background audio; and increasing the probability that at least one of the terms related to the background audio based on the measured relevance.
Claims 1, 10 and 14-16 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 7-9 and 14 of U.S. Patent No. 9,812,123. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1, 10 and 14-16 of the instant application are similar in scope and content to patented claims 1-2, 7-9 and 14 of the patent, which issued to the same Applicant.
It is clear that all the elements of application claims 1, 10 and 14-16 are to be found in patented claims 1-2, 7-9 and 14 (as application claims 1, 10 and 14-16 fully encompass patented claims 1-2, 7-9 and 14). The difference between the application claims and the patent claims lies in the fact that the patent claims include many more elements and are thus much more specific. Thus the invention of claims 1-2, 7-9 and 14 of the patent is in effect a “species” of the “generic” invention of application claims 1, 10 and 14-16. It has been held that the generic invention is “anticipated” by the “species”. See In re Goodman, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1, 10 and 14-16 are anticipated by claims 1-2, 7-9 and 14 of the patent, they are not patentably distinct from the patented claims.
Application No: 18/664,348
Patent No: 9,812,123
1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
1. A computer-implemented method comprising: receiving, by an application server of an automated speech recognition system that includes (a) the application server, (b) a background audio recognizer, (c) a conceptual expander component, (d) an automated speech recognizer, and (e) a speech recognition language model, an audio stream at a computing device, the audio stream comprising user speech data; identifying, by the background audio recognizer of the automated speech recognition system, concepts from audio features of the audio stream; generating, by the conceptual expander component of the automated speech recognition system, a set of terms related to the identified concepts; influencing the automated speech recognizer of the automated speech recognition system based on at least one of the terms related to the identified concepts, comprising adjusting one or more probabilities or relevance scores of the speech recognition language model, wherein each of the one or more probabilities or relevance scores corresponds to a term related to the identified concepts; providing, by the application server of the automated speech recognition system, the audio stream to the influenced automated speech recognizer; generating, by the automated speech recognizer of the automated speech recognition system, a recognized version of the user speech data; and providing, by the application server of the automated speech recognition system, the recognized version of the user speech data, for output.
2. The computer-implemented method of claim 1, wherein identifying the concepts from audio features of the audio stream comprises: separating the audio stream into a first substream that comprises the user speech data and a second substream that comprises background audio; and using a combination of at least the first substream and the second substream to recognize the concepts.
2. The computer-implemented method of claim 1, wherein the computing device captures the first audio data before capturing the second audio data.
3. The computer-implemented method of claim 1, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
4. The computer-implemented method of claim 1, wherein the data processing hardware resides on the computing device.
5. The computer-implemented method of claim 1, wherein processing the second audio data to determine the one or more textual representations comprises determining the one or more textual representations using a speech recognition language model.
6. The computer-implemented method of claim 5, wherein the speech recognition language model executes on the computing device.
7. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of songs.
8. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of music performers.
9. The computer-implemented method of claim 1, wherein the computing device comprises a speaker.
10. The computer-implemented method of claim 1, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
7. The computer-implemented method of claim 1, wherein influencing the automated speech recognizer based on at least one of the terms related to the identified concepts includes: measuring a relevance of a particular term included in the set of terms related to the identified concepts to one or more of the identified concepts; and influencing the automated speech recognizer based on the measured relevance of the particular term to one or more of the identified concepts.
11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
8. An automated speech recognition system comprising: one or more computers and one or more storage devices, comprising an application server, a background audio recognizer, a conceptual expander component, an automated speech recognizer, and a speech recognition model, and storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving, by the application server, an audio stream at a computing device, the audio stream comprising user speech data; identifying, by the background audio recognizer, concepts from audio features of the audio stream; generating, by the conceptual expander component, a set of terms related to the identified concepts; influencing the automated speech recognizer based on at least one of the terms related to the identified concepts, comprising adjusting one or more probabilities or relevance scores of the speech recognition language model, wherein each of the one or more probabilities or relevance scores corresponds to a term related to the identified concepts; providing, by the application server, the audio stream to the influenced automated speech recognizer; and generating, by the automated speech recognizer, a recognized version of the user speech data; and providing, by the application server, the recognized version of the user speech data, for output.
9. The system of claim 8, wherein identifying the concepts from audio features of the audio stream comprises: separating the audio stream into a first substream that comprises the user speech data and a second substream that comprises background audio; and using a combination of at least the first substream and the second substream to recognize the concepts.
12. The system of claim 11, wherein the computing device captures the first audio data before capturing the second audio data.
13. The system of claim 11, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
14. The system of claim 11, wherein the data processing hardware resides on the computing device.
15. The system of claim 11, wherein processing the second audio data to determine the one or more textual representations comprises determining the one or more textual representations using a speech recognition language model.
16. The system of claim 15, wherein the speech recognition language model executes on the computing device.
17. The system of claim 11, wherein the retrieved set of terms comprises a list of songs.
18. The system of claim 11, wherein the retrieved set of terms comprises a list of music performers.
19. The system of claim 11, wherein the computing device comprises a speaker.
20. The system of claim 11, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
14. The system of claim 8, wherein influencing the automated speech recognizer based on at least one of the terms related to the identified concepts includes: measuring a relevance of a particular term included in the set of terms related to the identified concepts to one or more of the identified concepts; and influencing the automated speech recognizer based on the measured relevance of the particular term to one or more of the identified concepts.
Claims 1, 7-8, 11, 17-18 and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 5-10 and 12 of U.S. Patent No. 10,224,024. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1, 7-8, 11, 17-18 and 20 of the instant application are similar in scope and content to patented claims 1-2, 5-10 and 12 of the patent, which issued to the same Applicant.
It is clear that all the elements of application claims 1, 7-8, 11, 17-18 and 20 are to be found in patented claims 1-2, 5-10 and 12 (as application claims 1, 7-8, 11, 17-18 and 20 fully encompass patented claims 1-2, 5-10 and 12). The difference between the application claims and the patent claims lies in the fact that the patent claims include many more elements and are thus much more specific. Thus the invention of claims 1-2, 5-10 and 12 of the patent is in effect a “species” of the “generic” invention of application claims 1, 7-8, 11, 17-18 and 20. It has been held that the generic invention is “anticipated” by the “species”. See In re Goodman, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1, 7-8, 11, 17-18 and 20 are anticipated by claims 1-2, 5-10 and 12 of the patent, they are not patentably distinct from the patented claims.
Application No: 18/664,348
Patent No: 10,224,024
1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
1. A computer-implemented method comprising: receiving an audio stream containing first audio data and second audio data, the first audio data corresponding to an utterance spoken by a user and the second audio data corresponding to background audio associated with playback of an item of media content; processing the second audio data to generate at least one term associated with the background audio; adjusting a probability or relevance score associated with a speech recognition model recognizing the at least one term associated with the background audio in the first audio data; after adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio, transcribing the first audio data into a textual representation of the utterance using the speech recognition model; and transmitting the textual representation of the utterance to a computing device associated with the user, the textual representation of the utterance when received by the computing device causing the computing device to at least one of display the textual representation on a display or perform a particular task based on the textual representation.
5. The computer-implemented method of claim 2, further comprising: generating conceptual bias data using the at least one term associated with the background audio; and adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio in the first audio data based on the conceptual bias data.
2. The computer-implemented method of claim 1, wherein the computing device captures the first audio data before capturing the second audio data.
3. The computer-implemented method of claim 1, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
4. The computer-implemented method of claim 1, wherein the data processing hardware resides on the computing device.
5. The computer-implemented method of claim 1, wherein processing the second audio data to determine the one or more textual representations comprises determining the one or more textual representations using a speech recognition language model.
6. The computer-implemented method of claim 5, wherein the speech recognition language model executes on the computing device.
7. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of songs.
6. The computer-implemented method of claim 2, wherein identifying the one or more concepts from the audio features of the second audio data comprises: comparing the audio features of the second audio data to known audio segments; determining that the audio features of the second audio data correspond to audio features of a particular known audio segment; and identifying the one or more concepts as being related to the particular known audio segment.
8. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of music performers.
5. The computer-implemented method of claim 2, further comprising: generating conceptual bias data using the at least one term associated with the background audio; and adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio in the first audio data based on the conceptual bias data.
9. The computer-implemented method of claim 1, wherein the computing device comprises a speaker.
10. The computer-implemented method of claim 1, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
7. The computer-implemented method of claim 5, wherein transcribing the first audio data into the textual representation of the utterance comprises selecting, by the speech recognition model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual data to weigh a statistical selection of the textual representation from the set of textual representations.
11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
8. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an audio stream containing first audio data and second audio data, the first audio data corresponding to an utterance spoken by a user and the second audio data corresponding to background audio associated with playback of an item of media content; processing the second audio data to generate at least one term associated with the background audio; adjusting a probability or relevance score associated with a speech recognition model recognizing the at least one term associated with the background audio in the first audio data; after adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio, transcribing the first audio data into a textual representation of the utterance using the speech recognition model; and transmitting the textual representation of the utterance to a computing device associated with the user, the textual representation of the utterance when received by the computing device causing the computing device to at least one of display the textual representation on a display or perform a particular task based on the textual representation.
9. The system of claim 8, wherein processing the second audio data to generate the at least one term comprises: identifying one or more concepts from audio features of the second audio data; and generating a set of terms related to the identified one or more concepts.
12. The system of claim 11, wherein the computing device captures the first audio data before capturing the second audio data.
13. The system of claim 11, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
14. The system of claim 11, wherein the data processing hardware resides on the computing device.
15. The system of claim 11, wherein processing the second audio data to determine the one or more textual representations comprises determining the one or more textual representations using a speech recognition language model.
16. The system of claim 15, wherein the speech recognition language model executes on the computing device.
17. The system of claim 11, wherein the retrieved set of terms comprises a list of songs.
10. The system of claim 9, wherein generating the set of terms related to the identified one or more concepts comprises querying a conceptual expansion database for the set of terms using the one or more concepts identified from the audio features of the second audio data.
18. The system of claim 11, wherein the retrieved set of terms comprises a list of music performers.
10. The system of claim 9, wherein generating the set of terms related to the identified one or more concepts comprises querying a conceptual expansion database for the set of terms using the one or more concepts identified from the audio features of the second audio data.
19. The system of claim 11, wherein the computing device comprises a speaker.
20. The system of claim 11, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
12. The system of claim 9, wherein the operations further comprise: generating conceptual bias data using the at least one term associated with the background audio; and adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio in the first audio data based on the conceptual bias data.
Claims 1, 10-11, 17-18 and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4, 6, 15-16, 19-21 and 26-29 of U.S. Patent No. 10,872,600. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1, 10-11, 17-18 and 20 of the instant application are similar in scope and content to patented claims 1, 4, 6, 15-16, 19-21 and 26-29 of the patent, which issued to the same Applicant.
It is clear that all the elements of application claims 1, 10-11, 17-18 and 20 are to be found in patented claims 1, 4, 6, 15-16, 19-21 and 26-29 (as application claims 1, 10-11, 17-18 and 20 fully encompass patented claims 1, 4, 6, 15-16, 19-21 and 26-29). The difference between the application claims and the patent claims lies in the fact that the patent claims include many more elements and are thus much more specific. Thus the invention of claims 1, 4, 6, 15-16, 19-21 and 26-29 of the patent is in effect a “species” of the “generic” invention of application claims 1, 10-11, 17-18 and 20. It has been held that the generic invention is “anticipated” by the “species”. See In re Goodman, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1, 10-11, 17-18 and 20 are anticipated by claims 1, 4, 6, 15-16, 19-21 and 26-29 of the patent, they are not patentably distinct from the patented claims.
Application No: 18/664,348
Patent No: 10,872,600
1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
1. A method comprising: receiving, at a voice recognition system, first audio data and second audio data from a computing device associated with a user, the computing device configured to: capture an audio stream comprising the first audio data and the second audio data; separate the first audio data and the second audio data from the captured audio stream; and provide the first audio data and the second audio data to the voice recognition system; processing, by the voice recognition system, the second audio data to generate at least one term associated with the second audio data; influencing, by the voice recognition system, a speech recognition model based on the at least one term associated with the second audio data; and after influencing the speech recognition model, transcribing, by the voice recognition system, the first audio data into a textual representation using the speech recognition model.
4. The method of claim 1, wherein the computing device is configured to separate the first audio data and the second audio data from the captured audio stream by separating the captured audio stream into a first substream and a second substream, the first substream corresponding to the first audio data and the second substream isolated from the first substream and corresponding to the second audio data.
6. The method of claim 1, further comprising transmitting the textual representation of the first audio data to the computing device, the textual representation when received by the computing device causing the computing device to display the textual representation on a display.
2. The computer-implemented method of claim 1, wherein the computing device captures the first audio data before capturing the second audio data.
3. The computer-implemented method of claim 1, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
4. The computer-implemented method of claim 1, wherein the data processing hardware resides on the computing device.
5. The computer-implemented method of claim 1, wherein processing the second audio data to determine the one or more textual representations comprises determining the one or more textual representations using a speech recognition language model.
6. The computer-implemented method of claim 5, wherein the speech recognition language model executes on the computing device.
7. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of songs.
8. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of music performers.
9. The computer-implemented method of claim 1, wherein the computing device comprises a speaker.
10. The computer-implemented method of claim 1, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
15. The method of claim 14, wherein transcribing the first audio data into the textual representation using the speech recognition model comprises, selecting, by the speech recognition model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual bias data to weigh a statistical selection of the textual representation from the set of textual representations.
11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
16. A voice recognition system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving first audio data and second audio data from a computing device associated with a user, the computing device configured to: capture an audio stream comprising the first audio data and the second audio data; separate the first audio data and the second audio data from the captured audio stream; and provide the first audio data and the second audio data to the voice recognition system; processing the second audio data to generate at least one term associated with the second audio data; influencing a speech recognition model based on the at least one term associated with the second audio data; and after influencing the speech recognition model, transcribing the first audio data into a textual representation using the speech recognition model.
19. The voice recognition system of claim 16, wherein the computing device is configured to separate the first audio data and the second audio data from the captured audio stream by separating the captured audio stream into a first substream and a second substream, the first substream corresponding to the first audio data and the second substream isolated from the first substream and corresponding to the second audio data.
20. The voice recognition system of claim 19, wherein the computing device is configured to provide the first audio data and the second audio data to the voice recognition system by transmitting the first substream and the second substream over a communication channel to the voice recognition system.
21. The voice recognition system of claim 16, wherein the operations further comprise transmitting the textual representation of the first audio data to the computing device, the textual representation when received by the computing device causing the computing device to display the textual representation on a display.
12. The system of claim 11, wherein the computing device captures the first audio data before capturing the second audio data.
13. The system of claim 11, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
14. The system of claim 11, wherein the data processing hardware resides on the computing device.
15. The system of claim 11, wherein processing the second audio data to determine the one or more textual representations comprises determining the one or more textual representations using a speech recognition language model.
16. The system of claim 15, wherein the speech recognition language model executes on the computing device.
17. The system of claim 11, wherein the retrieved set of terms comprises a list of songs.
26. The voice recognition system of claim 16, wherein processing the second audio data to generate the at least one term associated with the second audio data comprises: identifying one or more concepts from audio features of the second audio data; and generating a set of terms related to the identified one or more concepts.
27. The voice recognition system of claim 26, wherein generating the set of terms related to the identified one or more concepts comprises querying a conceptual expansion database for the set of terms using the one or more concepts identified from the audio features of the second audio data.
18. The system of claim 11, wherein the retrieved set of terms comprises a list of music performers.
27. The voice recognition system of claim 26, wherein generating the set of terms related to the identified one or more concepts comprises querying a conceptual expansion database for the set of terms using the one or more concepts identified from the audio features of the second audio data.
19. The system of claim 11, wherein the computing device comprises a speaker.
20. The system of claim 11, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
28. The voice recognition system of claim 16, wherein influencing the speech recognition model based on the at least one term associated with the second audio data comprises adjusting a probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the second audio data.
Claims 1-2, 5, 10-13 and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 6, 10-13, 15 and 17 of U.S. Patent No. 11,557,280. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-2, 5, 10-13 and 20 of the instant application are similar in scope and content to patented claims 1-3, 6, 10-13, 15 and 17 of the patent, which issued to the same Applicant.
It is clear that all the elements of application claims 1-2, 5, 10-13 and 20 are to be found in patented claims 1-3, 6, 10-13, 15 and 17 (as application claims 1-2, 5, 10-13 and 20 fully encompass patented claims 1-3, 6, 10-13, 15 and 17). The difference between the application claims and the patent claims lies in the fact that the patent claims include many more elements and are thus much more specific. Thus the invention of claims 1-3, 6, 10-13, 15 and 17 of the patent is in effect a “species” of the “generic” invention of application claims 1-2, 5, 10-13 and 20. It has been held that the generic invention is “anticipated” by the “species”. See In re Goodman, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1-2, 5, 10-13 and 20 are anticipated by claims 1-3, 6, 10-13, 15 and 17 of the patent, they are not patentably distinct from the patented claims.
Application No: 18/664,348
Patent No: 11,557,280
1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
1. A method comprising: receiving, at data processing hardware, from a computing device associated with a user, first audio data and second audio data captured by the computing device; processing, by the data processing hardware, the first audio data to identify a concept associated with the first audio data; influencing, by the data processing hardware, a speech recognition language model based on the identified concept associated with the first audio data; and generating, by the data processing hardware, using the influenced speech recognition language model, a textual representation of the second audio data.
6. The method of claim 5, wherein generating the textual representation of the second audio data using the influenced speech recognition language model comprises, selecting, by the influenced speech recognition language model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual bias data to weigh a statistical selection of the textual representation from the set of textual representations.
10. The method of claim 1, further comprising transmitting, by the data processing hardware, the textual representation of the second audio data to the computing device, the textual representation when received by the computing device causing the computing device to display the textual representation on a display.
Claim chart (continued): Application No. 18/664,348 vs. reference claims:
18/664,348: 2. The computer-implemented method of claim 1, wherein the computing device captures the first audio data before capturing the second audio data.
Reference: 2. The method of claim 1, wherein the computing device captures the first audio data before capturing the second audio data.
18/664,348: 3. The computer-implemented method of claim 1, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
18/664,348: 4. The computer-implemented method of claim 1, wherein the data processing hardware resides on the computing device.
18/664,348: 6. The computer-implemented method of claim 5, wherein the speech recognition language model executes on the computing device.
18/664,348: 7. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of songs.
18/664,348: 8. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of music performers.
18/664,348: 9. The computer-implemented method of claim 1, wherein the computing device comprises a speaker.
18/664,348: 10. The computer-implemented method of claim 1, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
Reference: 3. The method of claim 1, further comprising: generating, by the data processing hardware, a set of terms related to the identified concept, wherein influencing the speech recognition model based on the identified concept comprises adjusting a probability or relevance score associated with the speech recognition language model recognizing at least one term in the set of terms related to the identified concept.
18/664,348: 11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
Reference: 11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving, from a computing device associated with a user, first audio data and second audio data captured by the computing device; processing the first audio data to identify a concept associated with the first audio data; influencing a speech recognition language model based on the identified concept associated with the first audio data; and generating, using the influenced speech recognition language model, a textual representation of the second audio data.
18/664,348: 12. The system of claim 11, wherein the computing device captures the first audio data before capturing the second audio data.
Reference: 12. The system of claim 11, wherein the computing device captures the first audio data before capturing the second audio data.
18/664,348: 13. The system of claim 11, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
Reference: 17. The system of claim 11, wherein the second audio data corresponds to an utterance spoken by the user of the computing device.
18/664,348: 14. The system of claim 11, wherein the data processing hardware resides on the computing device.
18/664,348: 15. The system of claim 11, wherein processing the second audio data to determine the one or more textual representations comprises determining the one or more textual representations using a speech recognition language model.
18/664,348: 16. The system of claim 15, wherein the speech recognition language model executes on the computing device.
18/664,348: 17. The system of claim 11, wherein the retrieved set of terms comprises a list of songs.
18/664,348: 18. The system of claim 11, wherein the retrieved set of terms comprises a list of music performers.
18/664,348: 19. The system of claim 11, wherein the computing device comprises a speaker.
18/664,348: 20. The system of claim 11, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
Reference: 13. The system of claim 11, wherein the operations further comprise: generating a set of terms related to the identified concept, wherein influencing the speech recognition model based on the identified concept comprises adjusting a probability or relevance score associated with the speech recognition language model recognizing at least one term in the set of terms related to the identified concept.
Reference: 15. The system of claim 13, wherein the operations further comprise: generating conceptual bias data using set of terms related to the identified concept associated with the first audio data; and adjusting the probability or relevance score associated with the speech recognition language model recognizing the at least one term in the set of terms related to the identified concept based on the conceptual bias data.
Claims 1-4, 6-9, 11-14 and 16-19 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 4, 6-12, 14 and 16-20 of U.S. Patent No. 12,002,452. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-4, 6-9, 11-14 and 16-19 of the instant application are similar in scope and content to claims 1-2, 4, 6-12, 14 and 16-20 of the patent, which issued to the same Applicant.
All of the elements of application claims 1-4, 6-9, 11-14 and 16-19 are found in patented claims 1-2, 4, 6-12, 14 and 16-20 (the application claims fully encompass the patented claims). The difference between the application claims and the patent claims lies in the fact that the patent claims include many more elements and are thus much more specific. Thus the invention of claims 1-2, 4, 6-12, 14 and 16-20 of the patent is in effect a “species” of the “generic” invention of application claims 1-4, 6-9, 11-14 and 16-19. It has been held that the generic invention is “anticipated” by the “species.” See In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1-4, 6-9, 11-14 and 16-19 are anticipated by claims 1-2, 4, 6-12, 14 and 16-20 of the patent, they are not patentably distinct from the patented claims.
Claim chart: Application No. 18/664,348 vs. U.S. Patent No. 12,002,452:
18/664,348: 1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
12,002,452: 1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device associated with a user; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; influencing, using the retrieved set of terms related to the identified entity, a speech recognition language model; and generating, using the influenced speech recognition language model, a transcription of the second audio data.
18/664,348: 2. The computer-implemented method of claim 1, wherein the computing device captures the first audio data before capturing the second audio data.
12,002,452: 2. The method of claim 1, wherein the computing device captures the first audio data before capturing the second audio data.
18/664,348: 3. The computer-implemented method of claim 1, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
12,002,452: 4. The method of claim 1, wherein the second audio data corresponds to an utterance spoken by the user of the computing device.
18/664,348: 4. The computer-implemented method of claim 1, wherein the data processing hardware resides on the computing device.
12,002,452: 6. The method of claim 1, wherein the data processing hardware resides on the computing device.
18/664,348: 5. The computer-implemented method of claim 1, wherein processing the second audio data to determine the one or more textual representations comprises determining the one or more textual representations using a speech recognition language model.
18/664,348: 6. The computer-implemented method of claim 5, wherein the speech recognition language model executes on the computing device.
12,002,452: 7. The method of claim 1, wherein the speech recognition language model executes on the computing device.
18/664,348: 7. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of songs.
12,002,452: 8. The method of claim 1, wherein the retrieved list of terms comprises a list of songs.
18/664,348: 8. The computer-implemented method of claim 1, wherein the retrieved set of terms comprises a list of music performers.
12,002,452: 9. The method of claim 1, wherein the retrieved list of terms comprises a list of music performers.
18/664,348: 9. The computer-implemented method of claim 1, wherein the computing device comprises a speaker.
12,002,452: 10. The method of claim 1, wherein the computing device comprises a speaker.
18/664,348: 10. The computer-implemented method of claim 1, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
18/664,348: 11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; processing the second audio data to determine one or more textual representations associated with the second audio data; and selecting, based on the retrieved set of terms related to the identified entity, a particular textual representation from among the one or more textual representations as a transcription of the second audio data.
12,002,452: 11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving first audio data and second audio data captured by a computing device associated with a user; processing the first audio data to identify an entity associated with the first audio data; retrieving a set of terms related to the identified entity; influencing, using the retrieved set of terms related to the identified entity, a speech recognition language model; and generating, using the influenced speech recognition language model, a transcription of the second audio data.
18/664,348: 12. The system of claim 11, wherein the computing device captures the first audio data before capturing the second audio data.
12,002,452: 12. The system of claim 11, wherein the computing device captures the first audio data before capturing the second audio data.
18/664,348: 13. The system of claim 11, wherein the second audio data corresponds to an utterance spoken by a user associated with the computing device.
12,002,452: 14. The system of claim 11, wherein the second audio data corresponds to an utterance spoken by the user of the computing device.
18/664,348: 14. The system of claim 11, wherein the data processing hardware resides on the computing device.
12,002,452: 16. The system of claim 11, wherein the data processing hardware resides on the computing device.
18/664,348: 15. The system of claim 11, wherein processing the second audio data to determine the one or more textual representations comprises determining the one or more textual representations using a speech recognition language model.
18/664,348: 16. The system of claim 15, wherein the speech recognition language model executes on the computing device.
12,002,452: 17. The system of claim 11, wherein the speech recognition language model executes on the computing device.
18/664,348: 17. The system of claim 11, wherein the retrieved set of terms comprises a list of songs.
12,002,452: 18. The system of claim 11, wherein the retrieved list of terms comprises a list of songs.
18/664,348: 18. The system of claim 11, wherein the retrieved set of terms comprises a list of music performers.
12,002,452: 19. The system of claim 11, wherein the retrieved list of terms comprises a list of music performers.
18/664,348: 19. The system of claim 11, wherein the computing device comprises a speaker.
12,002,452: 20. The system of claim 11, wherein the computing device comprises a speaker.
18/664,348: 20. The system of claim 11, wherein the particular textual representation comprises a lower relevance score than at least one other textual representation from the one or more textual representations.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY B CHAWAN, whose telephone number is (571) 272-7601. The examiner can normally be reached Monday through Thursday, 7:00 a.m. to 5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/VIJAY B CHAWAN/Primary Examiner, Art Unit 2658