DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6 and 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Weber et al. (US 9,401,140) in view of Gomar (US 9,042,867).
Claim 1,
Weber teaches a portable communication device ([Fig. 1] [col. 4 line 1] mobile phone) comprising:
a display ([Fig. 1] display);
a microphone ([Fig. 1] microphone); and
a processor operatively coupled with the display and the microphone ([Fig. 1] processor; display; microphone), wherein the processor is configured to:
select a first acoustic model based at least in part on at least one voice input of the one or more voice inputs ([col. 4 lines 62-64] [col. 8 lines 4-11] the local computing device 200 may, upon capturing or receiving speech, select an acoustic model from the local data store 250; the capture module 216A may transcribe the speech in part using multiple acoustic models stored in the local data store 250; the acoustic model that provides the highest confidence level for transcribing the speech may be assigned to the speaker or characteristic),
generate a second acoustic model based at least in part on training the first acoustic model using the at least one voice input ([col. 4 line 66 to col. 5 line 2] [col. 9 line 66 to col. 10 line 4] update the selected acoustic model based on the transcription and store the updated model in the local data store 250; the modeling module 316B may then update the acoustic model to generate an updated acoustic model; the mean of Gl may be modified to have a value between m and m+p.), and
perform the audio-related function using the second acoustic model instead of the first acoustic model ([col. 3 lines 3-4] [col. 10 lines 29-31] the updated model provides greater accuracy for subsequent speech recognition; that only adjustments prompted by more recent speech recognition results are used to generate the updated acoustic model).
The difference between Weber and the claimed invention is that Weber does not explicitly teach wherein the processor is configured to: receive a request for setting a personalized audio configuration associated with an audio-related function; display, via the display, one or more specified texts based at least in part on the request; and receive one or more voice inputs of a user each corresponding to a respective text of the one or more specified texts.
Gomar teaches wherein the processor is configured to: receive a request for setting a personalized audio configuration associated with an audio-related function ([col. 8 lines 40-41] upon receipt of a request from a first user of a mobile device to enroll in a speaker recognition system),
display, via the display, one or more specified texts based at least in part on the request ([col. 8 lines 43-45] the enrollment and learning module displays a first plurality of text prompts to the first user),
receive one or more voice inputs of a user each corresponding to a respective text of the one or more specified texts ([col. 8 lines 45-47]).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Weber with the teachings of Gomar by modifying the unsupervised acoustic model training as taught by Weber to include receiving a request for setting a personalized audio configuration associated with an audio-related function; displaying, via the display, one or more specified texts based at least in part on the request; and receiving one or more voice inputs of a user each corresponding to a respective text of the one or more specified texts, as taught by Gomar, for the benefit of identifying/authenticating speakers using a mobile device (Gomar [Background]).
Claim 2,
Weber further teaches the portable communication device of claim 1, wherein the processor is further configured to: select the first acoustic model from a plurality of acoustic models including the first acoustic model and a third acoustic model ([col. 2 lines 40-53] the local data store 250 may store one or more acoustic models for individual speakers or a general acoustic model).
Claim 3,
Weber further teaches the portable communication device of claim 2, further comprising: memory storing an acoustic model database including the first acoustic model and the third acoustic model ([col. 4 line 55 to col. 5 line 16] local data store 250 includes one or more acoustic models for individual speakers or a general acoustic model).
Claim 4,
Weber further teaches the portable communication device of claim 1, wherein the processor is further configured to: as part of the selection of the first acoustic model, extract one or more acoustic characteristics from the at least one voice input ([col. 15 lines 5-24] extracting statistics/features from captured speech; computes statistics (e.g., the deltas and energy bands) which are acoustic characteristics derived from voice input and used in the acoustic-model process).
Claim 5,
Gomar further teaches the portable communication device of claim 1, wherein the processor is further configured to: perform the training of the first acoustic model based at least in part on a determination that the at least one voice input corresponds to respective one or more of the one or more specified texts by a specified validity ([col. 34 lines 17-60] validating spoken enrollment against the displayed phrase before creating the model/print; automated speech recognition determines if the user is saying the phrase displayed and, once acceptable, a voice biometric print is created by combining utterance results).
Claim 6,
Gomar further teaches the portable communication device of claim 1, wherein the processor is further configured to: based at least in part on a determination that the at least one voice input does not correspond to respective one or more of the one or more specified texts by a specified validity ([col. 33 line 43 to col. 34 line 16] if the result is negative),
display an additional text ([col. 33 line 43 to col. 34 line 16] returning to earlier phrase/prompt selection; returning to step 1804 (selection of a new phrase)),
receive an additional voice input from the user corresponding to the additional text ([col. 33 line 43 to col. 34 line 16] returns to step 1814 which involves recording a new audio prompt), and
determine whether the specified validity is met further based on the additional voice input before the training of the first acoustic model ([col. 32 line 45 to col. 34 line 16] checking the newly recorded prompt (validity/quality) and repeating until acceptable; performing a check by performing various quality tests before proceeding).
Claim 8,
Weber further teaches the portable communication device of claim 1, wherein the processor is further configured to: select a third acoustic model over the first acoustic model based at least in part on at least one voice input corresponding to another user ([col. 2 line 40 to col. 3 line 4] speaker-dependent model selection among multiple models; each speaker has a unique acoustic model and an acoustic model is selected based on the speaker);
generate a fourth acoustic model based at least in part on training the third acoustic model using the at least one voice input corresponding to the other user ([col. 2 line 40 to col. 3 line 4] updating the selected model using captured speech; alignments used to update the acoustic model to generate an updated acoustic model); and
perform the audio-related function for the other user using the fourth acoustic model instead of the first acoustic model, the second acoustic model or the third acoustic model ([col. 2 line 40 to col. 3 line 4] using the updated model for subsequent operations; the updated model may provide greater accuracy for subsequent speech recognition).
Claim 9,
Gomar further teaches the portable communication device of claim 1, further comprising: a microphone, wherein the processor is further configured to: receive the at least one voice input via the microphone while at least one of the one or more specified texts is displayed ([col. 32 line 45 to col. 33 line 42] displayed phrase and recording while displayed; phrase is displayed… user may then record an audio prompt).
Claim 10,
Gomar further teaches the portable communication device of claim 1, wherein the processor is further configured to: receive the at least one voice input from a voice data file provided by the user ([col. 15 line 62] a voice biometric print is a data file received from the user).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Weber et al. (US 9,401,140) in view of Gomar (US 9,042,867) and further in view of Gruber et al. (US 2016/0016678).
Claim 7,
Weber and Gomar teach all the limitations of claim 1. The difference between the prior art and the claimed invention is that neither Weber nor Gomar explicitly teaches wherein the processor is further configured to: provide a speech output corresponding to a text-to-speech (TTS) function as at least part of the audio-related function.
Gruber teaches wherein the processor is further configured to: provide a speech output corresponding to a text-to-speech (TTS) function as at least part of the audio-related function ([0609] generating 626 speech output, which is sent 636 to a speech generation module).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Weber and Gomar with the teachings of Gruber by modifying the unsupervised acoustic model training as taught by Weber to include wherein the processor is further configured to: provide a speech output corresponding to a text-to-speech (TTS) function as at least part of the audio-related function, as taught by Gruber, for the benefit of facilitating user interaction with a device and helping the user more effectively engage with local and/or remote services (Gruber [0008]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kalluri et al. (US 2015/0161999) – the apparatus may include a presentation engine to play the media content; and a user interface engine to facilitate a user in controlling the playing of the media content. The user interface engine may include a user identification engine to acoustically identify the user; an acoustic speech recognition engine to recognize speech in voice input of the user, using an acoustic speech recognition model specifically trained for the user; and a user command processing engine to process recognized speech as user commands. Other embodiments may be described and/or claimed ([Abstract]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL whose telephone number is (571)270-0689. The examiner can normally be reached Monday-Friday 8am-5pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
SHREYANS A. PATEL
Primary Examiner
Art Unit 2653
/SHREYANS A PATEL/ Examiner, Art Unit 2659