DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1, 3 to 11, and 13 to 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor, at the time the application was filed, had possession of the claimed invention.
Firstly, independent claims 1 and 11 set forth a limitation of “after receiving the audio data representing the spoken voice command and after determining the identification of the identified speaker of the spoken voice command, determining . . . a device identifier”, which is maintained to represent new matter under 35 U.S.C. §112(a). MPEP §2163 I. B. provides a standard for identifying new matter: “While there is no in haec verba requirement, newly added claims or claim limitations must be supported in the specification through express, implicit, or inherent disclosure.” Here, it is maintained that there is clearly no express written description that determining a device identifier of a second computing device occurs “after” receiving the audio data representing the spoken voice command and “after” determining the identification of the speaker. The Specification, at ¶[0033] and ¶[0035], appears to contain the most relevant written descriptions of these features, but these paragraphs do not expressly describe an “after” limitation. Moreover, Applicant’s Specification, ¶[0033] and ¶[0035], does not implicitly support these “after” limitations because the enumerated steps are not described as having any temporal dependency, but can be construed as a ‘laundry list’ of ways to perform authentication.
Secondly, Applicant’s limitation of “determining, based on the identification of the identified speaker of the spoken voice command, a device identifier stored for a second computing device used by the identified speaker” is not supported by the Specification due to the limitation “used by the identified speaker”. The Specification, ¶[0033], states, “the voice action server 220 may . . . determine a device identifier stored for the mobile computing device 230 used by ‘John Doe’”, but not that ‘John Doe’, who is using the second computing device, is “the identified speaker”. The problem is that the described ‘John Doe’ is not necessarily equivalent to “the identified speaker”. Here, Applicant’s claim language of “used by the identified speaker” is not supported by the Specification because ‘John Doe’ is merely a generic placeholder. Applicant’s second computing device could be used by someone other than ‘John Doe’, so that “a device identifier stored” is not being “used by the identified speaker”. Applicant’s Specification, ¶[0033], only states that a device identifier is stored for a mobile device used by ‘John Doe’, but this description does not expressly state that ‘John Doe’ is the same as “the identified speaker”. Consequently, ‘used by John Doe’ is not implicitly the same as “used by the identified speaker”, so that this limitation is new matter under 35 U.S.C. §112(a).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3, 7, 11, 13, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over DiMambro et al. (WO 2006/128171) in view of Lyman et al. (U.S. Patent Publication 2015/0058941).
Concerning independent claims 1 and 11, DiMambro et al. discloses a method and system for biometric voice print authentication, comprising:
“receiving audio data representing a voice command spoken by a speaker and captured by a first computing device” – one or more spoken utterances are received from a user, and a phrase corresponding to one or more spoken utterances is recognized (Abstract); one or more spoken utterances are received from a user, and a phrase corresponding to one or more spoken utterances is identified (¶[0020]); voice authentication system 200 can include voice authentication server 130 (“a first computing device”) (¶[0027]: Figure 2); one embodiment provides that an entire voice authentication including speech recognition can be conducted on server 130 (¶[0039]: Figure 2); a user may be accessing a website or voice mail requesting a service that requires authentication (¶[0042]: Figure 5); here, server 130 is “a first computing device” that receives “audio data . . . spoken by a speaker”; broadly, a pass phrase spoken by a user requesting a service is “a voice command spoken by a speaker”;
“determining, based on a confidence that the audio data representing the spoken voice command matching a stored voice print associated with speaker, an identification of an identified speaker of the spoken command” – a biometric voice print of the user is identified from one or more spoken utterances in Step 706 (Abstract: Figure 7); server 130 can acknowledge whether a pass phrase spoken by the user is a correct pass phrase and whether the biometric voice print associated with a pronunciation of the phrase is a correct match to a user profile in the database (¶[0023]: Figure 1); a biometric voice print is compared against previously stored voice prints for identifying a match; a feature matrix is calculated and compared against one or more reference matrices and a logarithmic distance can be calculated for each feature matrix of a biometric voice print; if a logarithmic distance is less than a predetermined threshold level, a match can be determined, and a speaker identified (¶[0050]); during verification, a biometric analyzer calculates a logarithmic distance, and evaluates a threshold for classifying and authorizing the user (¶[0068]); here, a logarithmic distance of a voice print match is equivalent to “a confidence that the audio data representing the spoken voice command matches a stored voice print associated with the speaker”; that is, a voice print matching distance is a number representing a confidence of a match, so that a lower distance corresponds to a higher confidence of a match;
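The distance-threshold matching described above, in which a lower logarithmic distance corresponds to a higher confidence of a voice print match, can be sketched as follows. This is an illustrative sketch only, not code from either reference; the feature representation, distance formula, and threshold value are hypothetical.

```python
# Illustrative sketch: identify a speaker by comparing a feature vector
# against stored voice prints, where a smaller "logarithmic distance"
# corresponds to a higher confidence of a match.
import math

def log_distance(features, reference):
    # Hypothetical logarithmic distance: sum of squared log-ratios of features.
    return sum((math.log(f) - math.log(r)) ** 2 for f, r in zip(features, reference))

def verify_speaker(features, stored_prints, threshold=0.5):
    """Return (speaker, distance) if the best match is under the threshold."""
    best_speaker, best_dist = None, float("inf")
    for speaker, reference in stored_prints.items():
        d = log_distance(features, reference)
        if d < best_dist:
            best_speaker, best_dist = speaker, d
    if best_dist < threshold:
        return best_speaker, best_dist  # lower distance = higher confidence
    return None, best_dist              # no sufficiently close match: reject

# Hypothetical stored voice prints for two enrolled users.
prints = {"John Doe": [1.0, 2.0, 3.0], "Jane Roe": [4.0, 1.0, 0.5]}
speaker, dist = verify_speaker([1.1, 2.1, 2.9], prints)
```

The sketch mirrors the cited logic: the feature matrix of the utterance is compared against each stored reference, and a match is declared only when the minimum distance falls below a predetermined threshold.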
“after receiving the audio data representing the spoken voice command and after obtaining the identification of the identified speaker of the spoken voice command, determining, based on the identification of the identified speaker of the spoken voice command, a device identifier stored for a second computing device used by the identified speaker, the second computing device different from the first computing device” – server 130 can verify that mobile device 102 (“a second computing . . . , the second computing device different from the first computing device”) is a device authorized for use to access resources and is a device associated with the biometric voice print of the user (“a second computing device used by the identified speaker”); user profiles can be stored on database 130, which can be used to identify a user of mobile device 102; a user profile can include a biometric voice print and a device identifier (“a device identifier stored for a second computing device used by the identified speaker”) (¶[0022]: Figure 1); server 130 can verify that the mobile device 102 is associated with the biometric voice print of the user (“determining, based on the identification of the identified speaker of the spoken voice command, a device identifier stored for a second computing device used by the identified speaker”); server 130 can validate that the user speaking into mobile device 102 is associated with the mobile device (¶[0023]: Figure 1); a method includes identifying a biometric voice print of the user from one or more utterances in Step 706, and determining a device identifier associated with the device in Step 708 (“after receiving the audio data representing the spoken voice command and after obtaining the identification of the identified speaker of the spoken voice command, determining . . . a device identifier”) (Abstract; ¶[0044]: Figure 7: Steps 706 and 708);
“selecting a voice action based on a transcription of the audio data” – a phrase is recognized corresponding to one or more spoken utterances (Abstract); upon authenticating the user, access can be granted to one or more resources having a communication with the device (¶[0005]); upon authorizing the user’s voice, access can be granted to one or more resources; a resource can provide a service available to the device including music downloading, on-line gambling, subscription, gaming, etc. (¶[0022]: Figure 1); implicitly, speech recognition generates “a transcription of the audio data”; broadly, a pass phrase specifies “a voice action” corresponding to a request to access a particular service, e.g., music downloading, on-line gambling, gaming, etc.;
“obtaining, from the second computing device, contextual data representative of a current location of the second computing device” – a location of a handset can be employed as a criterion for granting access to one or more resources (Abstract); a location of a device or the user can be determined for granting access (¶[0005]); server 130 can determine a location of device 102 for authorizing access to one or more resources; mobile device 102 can include a global positioning system (GPS) for identifying a location of the device; alternatively, the server can authorize access to resources based on a location stated by the user; a user can speak their location (¶[0023]: Figure 1); gateway 145 can verify a location of the caller using information through GPS positional data provided by mobile device 102; gateway 145 can identify a location of the device from GPS data to establish a location of the caller (¶[0030]: Figure 2); here, GPS data of mobile device 102 is “contextual data representative of a current location of the second computing device”;
“determining, using the context data obtained from the second computing device, that the current location of the second computing device coincides with a location [pre-designated by the speaker]” – a location of a handset can be employed as a criterion for granting access to one or more resources in Step 712 (Abstract); a location of a device or the user can be determined for granting access (¶[0005]); a location of the handset or the user can be employed as an additional criterion for approving access to one or more resources (¶[0021]); server 130 can determine if the spoken location corresponds with an authorized or accepted location of the device or the user (“determining, using the context data obtained from the second computing device, that the current location of the second computing device coincides with a location”) (¶[0023]: Figure 1); a combination of biometric voiceprint recognition with a location verification capability makes a particularly convenient solution for applications including gambling, which may only be permitted in some states or territories, or commerce, where sale of certain items may not be permitted in some jurisdictions (¶[0030]: Figure 3); that is, a location of mobile device 102 or the user established by GPS must be in an authorized or accepted location in order to access a requested service;
“based on determining that the current location of the second computing device coincides with the location [pre-designated by the speaker], providing, to a service provider, a request to perform the selected voice action” – a location of a handset or the user can be employed as a criterion for granting access to one or more resources (Abstract); upon authorizing the user’s voice, access can be granted to one or more resources; a resource can provide a feature or service available to the device including music downloading, on-line gambling, subscription, gaming, etc. (“providing, to a service provider, a request to perform the selected voice action”) (¶[0022]: Figure 1); a combination of biometric voiceprint recognition with a location verification capability makes a particularly convenient solution for applications including gambling, which may only be permitted in some states or territories, or commerce, where sale of certain items may not be permitted in some jurisdictions (¶[0030]: Figure 3); voice authentication system 200 can grant a user access to one or more resources available to device 102 based on authentication of the user’s voice for accessing the resources or services (¶[0033]: Figure 4); here, music downloading, gambling, and gaming services are “service providers”, and a pass phrase to download music, access a gambling website, or a gaming website is “a request to perform the selected voice action”.
Concerning independent claims 1 and 11, DiMambro et al. does not expressly disclose the limitations that a current location of the second computing device coincides with a location “pre-designated by the speaker”. That is, DiMambro et al. states that a location of a mobile device must be an ‘authorized location’ or an ‘accepted location’, but does not provide that the location must be “pre-designated by the speaker.”
Concerning independent claims 1 and 11, Lyman et al. teaches location-based device security. (Abstract) Service provider 170 maintains a plurality of user accounts 180, each of which may include account information 185 associated with users. Account information may include passwords and device identifiers. (¶[0022]: Figure 1) A user may designate a home or work address as a secure location so that a service provider determines whether the location of the user device is within the user-designated location (“a location pre-designated”). (¶[0025]: Figure 2: Step 204) Based on the geographical location detected by a GPS sensor, user device 110 may allow access with a reduced security requirement at these designated secured locations. (¶[0027]: Figure 2) A request to designate a location as a secured location may be generated and displayed to the user as to whether the user would like to designate a merchant location as a secured location so that a reduced security requirement may be used for a next purchase. (¶[0031]: Figure 2: Step 208) User device 110 may use various communication sensors to detect GPS location. (¶[0039]: Figure 3) Each profile of a designated location may include environmental conditions that identify the designated secured location, including proximity to other devices and geographic location. (¶[0040]: Figure 3) When a user attempts to access a user account at service provider server 170, service provider server 170 may receive environmental data (“contextual data”) detected at user device 110 and identification information for the user account. Service provider 170 may determine, based on the received environmental data and the designated secured location associated with the user account, whether user device 110 is in a secured location designated by the user (“based on determining that the current location of the second computing device coincides with the location pre-designated”). (¶[0042]: Figure 3) When service provider server 170 is a payment service provider and is used to make a purchase via user device 110 at a designated secured location, service provider 170 may generate an authentication request requiring less or no password. (¶[0043]: Figure 3) An objective is to enable a system that permits a user to easily gain access to a device or an online service when the user is in a secured location in which an access security requirement is not needed. (¶[0004]) It would have been obvious to one having ordinary skill in the art to determine a location pre-designated by a user to a service provider to perform an action as taught by Lyman et al. in biometric voice print authentication of DiMambro et al. for a purpose of permitting easy access when security is not needed.
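The determination that a device's current GPS location coincides with a user-designated location can be sketched as follows. This is an illustrative sketch only, not code from Lyman et al.; the great-circle distance check, tolerance radius, and coordinate values are hypothetical.

```python
# Illustrative sketch: decide whether a device's current GPS position
# coincides with a location the user pre-designated as secure, by testing
# great-circle distance against a tolerance radius.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in kilometers.
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def in_designated_location(current, designated, radius_km=0.2):
    """True if `current` is within `radius_km` of any pre-designated location."""
    return any(haversine_km(*current, *loc) <= radius_km for loc in designated)

# Hypothetical user-designated secure location (e.g., a home address).
secure = [(37.422, -122.084)]
coincides = in_designated_location((37.4221, -122.0841), secure)
```

Under this sketch, a reduced security requirement would be applied only when `coincides` is true, mirroring the reference's comparison of received environmental data against the designated secured location.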
Concerning claims 3 and 13, DiMambro et al. discloses that profile management module 420 can parse a user profile for a biometric voice print and compare the biometric voice print with other voice prints in a voice print database from multiple users having a registered voice print to determine a match with a voice print (“obtaining a plurality of speaker identification results each indicating a corresponding likelihood that the audio data representing the spoken command matches a corresponding one of a plurality of stored voice prints associated with different speakers”). (¶[0041]: Figure 4) During verification, a user speaks a spoken utterance, and a feature matrix is compared against one or more reference matrices stored in a voiceprint database. If a logarithmic distance is less than a predetermined threshold level, a match can be determined, and the speaker can be identified. (¶[0050]: Figure 8) During verification, biometric analyzer 944 evaluates a personal histogram to determine whether a biometric voice print matches one of the plurality of biometric voice prints for identifying an identity of the user (“selecting, from among the plurality of speaker identification results, the speaker identification result having the highest corresponding likelihood as the identification of the identified speaker”). (¶[0068]) That is, audio data of a speaker is compared with voice prints of a plurality of speakers in a voice print database (“a plurality of stored voice prints associated with different speakers”), and ‘one’ speaker is identified based on a logarithmic distance less than a threshold to determine a match (“having the highest corresponding likelihood”). Here, a logarithmic distance of a voice print match is equivalent to “a likelihood” because a matching distance is a number representing a closeness of a match, so that a lower distance corresponds to a higher likelihood of a match.
Concerning claims 7 and 17, DiMambro et al. discloses that one or more spoken utterances are received from a user, and a phrase corresponding to one or more spoken utterances is recognized (“generating the transcription of the audio data using an automated speech recognizer”) (Abstract); one embodiment provides that an entire voice authentication including speech recognition can be conducted on server 130 (¶[0039]: Figure 2). Here, recognizing a spoken phrase by speech recognition is equivalent to “generating the transcription”.
Claims 4 to 5 and 14 to 15 are rejected under 35 U.S.C. 103 as being unpatentable over DiMambro et al. (WO 2006/128171) in view of Lyman et al. (U.S. Patent Publication 2015/0058941) as applied to claims 1 and 11 above, and further in view of Kim et al. (U.S. Patent Publication 2015/0302856).
Concerning claims 4 and 14, DiMambro et al. discloses granting access to resources corresponding to one of a plurality of resources, but does not disclose “selecting from a plurality of different service providers, the service provider that can perform the selected voice action.” However, Kim et al. teaches performing functions by speech input for a variety of applications capable of performing functions for users. A speaker 110 may speak, “I want to check my bank account”, “please show my photos”, or “open web browser”. Once a speech command is recognized, voice assistant application 130 may identify the function associated with the speech command, e.g., activating the banking application 140, the photo application 150, or the web browser 160. (¶[0026] - ¶[0027]: Figure 1) Here, a banking application, a photo application, and a web browser correspond to “a plurality of different service providers”. An objective is to perform a function associated with a speech command based on a security level associated with the speech command. (¶[0006]) It would have been obvious to one having ordinary skill in the art to select a service provider from a plurality of different service providers that can perform a selected voice action as taught by Kim et al. in voice print authentication of DiMambro et al. for a purpose of performing a function associated with a speech command based on a security level associated with the speech command.
Concerning claims 5 and 15, Kim et al. teaches that when a speech command is recognized, voice assistant unit 242 may identify a function associated with the speech command (“in response to determining that the mapping of voice actions indicates that the service provider can perform the selected action, selecting the service provider”), and map a plurality of functions to be performed by voice assistant 242 to a plurality of predetermined security levels. (¶[0040]) Store unit 260 may store a lookup table which maps one or more words in the speech command to a specified function (“determining the mapping of voice actions indicates that the service provider can perform the selected voice action”). (¶[0062]: Figure 5)
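The lookup-table mapping described above, which maps words in a recognized speech command to a specified function, can be sketched as follows. This is an illustrative sketch only, not code from Kim et al.; the table contents, service names, and security levels are hypothetical.

```python
# Illustrative sketch: a lookup table mapping words in a recognized speech
# command to a function (service) and a predetermined security level.
COMMAND_TABLE = {
    "bank account": ("banking_app", "high"),
    "photos":       ("photo_app", "medium"),
    "web browser":  ("browser", "low"),
}

def select_service(transcription):
    """Return (service, security_level) for the first matching command words,
    or (None, None) if no entry in the table matches the transcription."""
    text = transcription.lower()
    for words, (service, level) in COMMAND_TABLE.items():
        if words in text:
            return service, level
    return None, None

service, level = select_service("I want to check my bank account")
```

The sketch illustrates the mapping step only: a real voice assistant would first recognize the utterance with a speech recognizer and then consult such a table to select the service that can perform the voice action at the required security level.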
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over DiMambro et al. (WO 2006/128171) in view of Lyman et al. (U.S. Patent Publication 2015/0058941) as applied to claims 1 and 11 above, and further in view of Aleksic et al. (U.S. Patent Publication 2014/0337032).
DiMambro et al. does not expressly disclose “wherein each voice action in the set of voice actions identifies one or more terms that correspond to that voice action”, “determining that one or more terms . . . match the one or more terms that correspond to the voice action”, and “in response to determining that the one or more terms . . . match the one or more terms that correspond to the voice action”, selecting the voice action. Still, Aleksic et al. teaches that a collection of voice action terms 136 includes a set of known words and/or grammar associated with commands. Voice action terms can include words, ‘call’, ‘text’, ‘navigate’, ‘send email’, etc. (¶[0015]: Figure 1) A transcription aligner 150 provides aligned transcriptions, and what transcribed words are to be used to execute a command. Words or phrases within each transcription may be associated with a confidence score and/or a weight that reflects each word or phrase’s likelihood of relevance within the context of a voice action. (¶[0023] - ¶[0024]) Aleksic et al., then, teaches identifying one or more terms that correspond to a voice action, and determining that one or more words of a transcription match the one or more terms that correspond to the voice action to select a voice action among a set of voice actions. An objective is to enhance a privacy of user information and provide improved recognition of speech phrases that are found in a user dictionary and a general purpose dictionary. (¶[0007]) It would have been obvious to one having ordinary skill in the art to identify terms from a transcription that correspond to a voice action as taught by Aleksic et al. in a voice print authentication of DiMambro et al. for a purpose of enhancing privacy of user information and providing an improved recognition of speech phrases.
Claims 8 to 9 and 18 to 19 are rejected under 35 U.S.C. 103 as being unpatentable over DiMambro et al. (WO 2006/128171) in view of Lyman et al. (U.S. Patent Publication 2015/0058941) as applied to claims 1 and 11 above, and further in view of Himmelstein (U.S. Patent No. 6,496,107).
DiMambro et al. discloses “service providers” that include services of music downloading, gambling, and gaming, but omits the limitations directed to “receiving, from the service provider, an indication that the service provider performed the selected voice action” and “wherein the first computing device is configured to output the indication that the service provider performed the selected voice action as synthesized speech.” However, Himmelstein teaches a voice-controlled vehicle control system that controls at least one function of a device, and executes a command instruction to control a device if a match is found to a voiceprint stored in memory. (Abstract) Specifically, Himmelstein teaches that if a voice command corresponds to a selected one of the voiceprint templates in memory circuit 32, microcomputer 30 generates a control signal to voice synthesis unit 44 to provide a digital output signal to speaker output circuit 42 that is converted to an audible output signal that supplies an output instruction signal 47. Speakers 46, 48 inform the user of the present operating status, so that system 20 may confirm that the instruction contained within the voice command is being carried out (“an indication that the service provider performed the selected voice action”). (Column 6, Lines 17 to 35: Figure 1) Here, a voice synthesis unit 44 “is configured to output the indication . . . as synthesized speech.” An objective is to provide a security device that incorporates voice-control and proximity for enhanced security. (Column 1, Lines 9 to 13) It would have been obvious to one having ordinary skill in the art to output an indication as synthesized speech that a selected voice action is performed as taught by Himmelstein in voice print authentication of DiMambro et al. to provide enhanced security incorporating voice-control and proximity.
Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over DiMambro et al. (WO 2006/128171) in view of Lyman et al. (U.S. Patent Publication 2015/0058941) as applied to claims 1 and 11 above, and further in view of Muschett et al. (U.S. Patent Publication 2008/0243517).
DiMambro et al. discloses that a user can enter a PIN number during user profile creation. (¶[0042] - ¶[0043]: Figures 5 to 6) However, DiMambro et al. does not disclose “an authorization request requesting the speaker to provide an explicit authorization code that the service provider needs to perform the selected voice action”. Still, Muschett et al. teaches a voice user interface (VUI) with a dialog 200 that can include a VUI announcement which prompts the user to enter a user identifier. The VUI can prompt the user for a personal identification number (PIN) code or similar authorization code. (¶[0017]: Figure 2) An objective is to use speech bookmarks to facilitate navigating to a desired option in a hierarchical menu. (¶[0005]) It would have been obvious to one having ordinary skill in the art to request a user to enter an authorization code to perform a selected voice action as taught by Muschett et al. in voice print authentication of DiMambro et al. for a purpose of facilitating navigation to a desired option in a hierarchical menu.
Response to Arguments
Applicant's arguments filed 11 February 2026 have been fully considered but they are not persuasive.
Applicant amends independent claims 1 and 11 to change “a likelihood” to “a confidence” and “a device identifier that identifies a second computing device associated with the identified speaker” to “a device identifier stored for a second computing device used by the identified speaker”. Then Applicant presents arguments traversing the rejection for new matter under 35 U.S.C. §112(a) and the rejection for obviousness under 35 U.S.C. §103 over DiMambro et al. (WO 2006/128171) in view of Lyman et al. (U.S. Patent Publication 2015/0058941). Firstly, Applicant argues that changing the term “likelihood” to the term “confidence” overcomes the new matter rejection. Secondly, Applicant argues that ¶[0033] of the Specification provides implicit support for the temporal limitation of “after” and “based on the identification” because this paragraph describes server 220 determining that a voice command is from ‘John Doe’ based on the voice command matching a stored voice print for ‘John Doe’. Consequently, Applicant argues that this description provides ‘a logical sequence’ where speaker identification necessarily precedes a determination of a device identifier that corresponds to the new limitation of a second computing device “used by the identified speaker”. Thirdly, Applicant argues that DiMambro et al. “operates in a fundamentally different manner” with the device identifier being obtained before speaker identification, and not after. Applicant considers ¶[0043]: Figure 6 of DiMambro et al., and contends that the device identifier is obtained from the mobile device itself, and not determined based on identifying who is speaking through voice matching.
Generally, Applicant’s arguments are not persuasive, and the rejections are being maintained. Applicant’s amendment overcomes the new matter rejection as directed to the limitation of “a likelihood” under 35 U.S.C. §112(a). However, a new matter rejection under 35 U.S.C. §112(a) is being maintained for the limitation of “after receiving the audio data representing the spoken voice command and after obtaining the identification of the identified speaker of the spoken voice command, determining, based on the identification of the identified speaker of the spoken voice command, a device identifier stored for a second computing device used by the identified speaker”. Similarly, Applicant’s arguments are not persuasive as directed to the rejection of the independent claims as being obvious under 35 U.S.C. §103 over DiMambro et al. (WO 2006/128171) in view of Lyman et al. (U.S. Patent Publication 2015/0058941). The rejection of some dependent claims continues to rely upon Kim et al. (U.S. Patent Publication 2015/0302856), Aleksic et al. (U.S. Patent Publication 2014/0337032), Himmelstein (U.S. Patent No. 6,496,107), and Muschett et al. (U.S. Patent Publication 2008/0243517).
The examiner maintains that the problem is that Applicant is trying to expand the scope of their invention to overcome the prior art, but that this expanded scope is not supported by the originally-filed Specification. That is, Applicant is trying to overcome the prior art by introducing new matter at a point of novelty or obviousness. Applicant wants to have it both ways: that the expanded limitations are supported by the Specification, and that the expanded limitations are not disclosed or taught by the prior art. However, it is maintained that the expanded limitations are not supported by the Specification, and that the expanded limitations do not overcome the prior art. Actually, it is maintained that the prior art discloses these expanded limitations better than Applicant’s own Specification. MPEP §2163 I. B. provides a standard for identifying new matter,
While there is no in haec verba requirement, newly added claims or claim limitations must be supported in the specification through express, implicit, or inherent disclosure. (emphasis added)
Similarly, MPEP §2163.05 states:
The failure to meet the written description requirement of 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, commonly arises when the claims are changed after filing to either broaden or narrow the breadth of the claim limitations, or to alter a numerical range limitation or to use claim language which is not synonymous with the terminology used in the original disclosure. To comply with the written description requirement of 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, or to be entitled to an earlier priority date or filing date under 35 U.S.C. 119, 120, or 365(c), each claim limitation must be expressly, implicitly, or inherently supported in the originally filed disclosure. (emphasis added)
Applicant’s claim language sets forth a limitation of “after receiving the audio data representing the spoken voice command and after obtaining the identification of the identified speaker of the spoken voice command, determining . . . a device identifier”, which is maintained to be new matter under 35 U.S.C. §112(a). The examiner contends that this embodiment might be considered ‘obvious’ given what is described in the originally-filed Specification, but this limitation is not ‘implicit’ as contended by Applicant. The standard for determining whether a limitation is implicit or inherent is stronger than the standard for determining whether a limitation is obvious. One way of considering this is to view Applicant’s originally-filed Specification as a prior art reference and to determine whether the Specification would anticipate the claim limitations in the sense of 35 U.S.C. §102.
Applicant contends that these limitations are supported by ¶[0033] of the Specification, which states:
In some implementations, the voice action server 220 may determine an identity of the user from the audio data, identify the mobile computing device 230 of the user, determine a status of the mobile computing device, and then determine the values for the identified input data types from the status of the mobile computing device. For example, the voice action server 220 may determine that a voice command is from "John Doe" based on the voice in the voice command matching a stored voice print for "John Doe," determine a device identifier stored for the mobile computing device 230 used by "John Doe," request information from the device 230 corresponding to the device identifier, receive the requested information from the device 230, and generate values using the information from the device 230. In another example, the voice action server 220 may determine that a voice command is from "John Doe" based on the voice in the voice command matching a stored voice print for "John Doe," determine a device identifier stored for the mobile computing device 230 used by "John Doe," identify information already stored by the voice action server 220 corresponding to the device identifier, and generate values using the identified already stored information.
This appears to be the maximum support for the claim limitations provided by the Specification. However, the examiner contends that this description of “determine an identity of the user from the audio data, identify the mobile computing device 230 of the user, determine a status of the mobile computing device, and then determine the values for the identified input data types from the status of the mobile computing device” can be construed as only a “laundry list” of capabilities of the invention, and does not necessarily require an ordering of steps in which determining an identity of the user occurs before identifying the mobile computing device of the user, as would be needed to support the “after” limitations. Consequently, Applicant’s limitations of “after receiving the audio data representing the spoken voice command” and “after determining the identification of the identified speaker of the spoken voice command, determining . . . a device identifier” are not implicitly or inherently described by the originally-filed Specification. This limitation of a temporal ordering may be considered an obvious modification of what is described by the Specification, but it is not necessarily implicit or inherent that an ordering of steps must require determining a device identifier “after” receiving the audio data and “after” determining an identification of the identified speaker given what is described at ¶[0033] of the Specification. Consequently, these limitations do not necessarily present “a logical sequence” as argued by Applicant. These “after” limitations are not expressly described in the Specification, and do not have the sense of necessity required for the concepts of ‘implicit’ or ‘inherent’ in accordance with patent law.
Moreover, Applicant’s limitation of “determining, based on the identification of the identified speaker of the spoken voice command, a device identifier stored for a second computing device used by the identified speaker” is not supported by the Specification, ¶[0033]. The Specification, ¶[0033], states: “For example, the voice action server 220 may determine that a voice command is from ‘John Doe’ based on the voice in the voice command matching a stored voice print for ‘John Doe,’ determine a device identifier stored for the mobile computing device 230 used by ‘John Doe’ . . . .” Here, Applicant argues that “one must identify the speaker before determining which device belongs to that speaker.” This is not necessarily true as evidenced by the fact that this is precisely the element that Applicant contends is missing from DiMambro et al. Conceivably, a device identifier may be determined, and then voice authentication can be performed. Similarly, ¶[0033] of the Specification states:
In another example, the voice action server 220 may determine that a voice command is from "John Doe" based on the voice in the voice command matching a stored voice print for "John Doe," determine a device identifier stored for the mobile computing device 230 used by "John Doe," identify information already stored by the voice action server 220 corresponding to the device identifier . . . .
Applicant argues that the device identifier is determined for the device “used by John Doe”, where ‘John Doe’ is identified through voice matching, and that this demonstrates that the device identifier determination is based on the speaker identification. The problem is that “determining . . . a device identifier” of a device being used by ‘John Doe’ could be performed independently from “determining the identification of the identified speaker” being ‘John Doe’, in the sense that the two “determining” steps of ¶[0033] are presented in the manner of a ‘laundry list’. Determining a device identifier of a second computing device being used by ‘John Doe’ as described at ¶[0033] does not require a determination that the speaker is ‘John Doe’ so as to support a limitation of “used by the identified speaker”. That is, “the identified speaker” is not necessarily determined to be ‘John Doe’ at the time of determining a device identifier as described in the Specification. Applicant’s written description would support a limitation of “used by the user” but not “used by the identified speaker”. Determining the device identifier, then, might not be “based on identification of the identified speaker of the spoken voice command” as described by ¶[0033] of the Specification: “the identified speaker” could be independently determined to be ‘John Doe’, and “the device identifier” could be independently determined to be that of a device used by ‘John Doe’. There is, then, not necessarily “a logical sequence”, as argued by Applicant, that is implicitly supported by the Specification.
Next, Applicant’s argument is not persuasive that DiMambro et al. operates “in a fundamentally different manner” than the claims and that the device identifier is not determined based on identifying who is speaking. Firstly, DiMambro et al. expressly provides a flowchart, disclosed in the Abstract and ¶[0044]: Figure 7, which clearly describes that Step 702 of “receiving one or more spoken utterances of the user” and Step 706 of “identifying a biometric voice print of the user” occur temporally before Step 708 of “determining a device identifier associated with the device”.
[0044] Referring to FIG. 7, a method 700 for voice authentication on a device is shown. The method can include receiving one or more spoken utterances from a user (702), recognizing a phrase corresponding to the one or more spoken utterances (704), identifying a biometric voice print of the user from a variability of the one or more spoken utterances of the phrase (706), determining a device identifier associated with the device (708), and authenticating the user based on the phrase, the biometric voice print, and the device identifier (710). (emphasis added)
DiMambro et al., then, is maintained to clearly disclose a temporal relationship of Applicant’s “determining . . . a device identifier” occurring “after” “receiving the audio data” and “after determining the identification of the identified speaker” in a manner better than described in Applicant’s Specification, so that the prior art does not operate “in a fundamentally different manner”. Moreover, DiMambro et al., at ¶[0023], clearly states:
For example, the voice authentication server 130 can determine whether characteristics of the user's voice captured during a pronunciation of the pass phrase match one or more biometric voice prints in the database 140 for authenticating access to one or more resources. The server 130 can also verify that the mobile device 102 is a device authorized for use to access resources and is a device associated with the biometric voice print of the user. In particular, the server 130 can validate that the user speaking into the mobile device 102 is associated with the mobile device. (emphasis added)
That is, ¶[0023] of DiMambro et al. discloses that server 130 first matches a user utterance against voice prints stored in a database, and then verifies that the mobile device 102 is associated with the voice print of that same user. Looking at Figure 7 and ¶[0023] of DiMambro et al., it is clear that the temporal ordering of the algorithm is (1) “receiving the audio data representing the spoken voice command” and “determining the identification of the identified speaker of the spoken voice command”, and then (2) “determining, based on the identification of the identified speaker of the spoken voice command, a device identifier stored for a second computing device used by the identified speaker”. That is, (2) occurs “after” (1). Applicant’s argument, then, is incorrect that “DiMambro does not teach determining a device identifier ‘based on the identification of’ the identified speaker.”
The examiner reiterates that Applicant’s reliance upon ¶[0043]: Figure 6 of DiMambro et al. is not relevant. DiMambro et al., at ¶[0043], Figure 6, discloses a broad procedure that includes an initial registration by recording a user’s voice for voice print verification, typing in a PIN, and then verifying the user’s voice. Figure 6 does not go into as much detail as Figure 7, which is the embodiment upon which the rejection relies, and Figure 6 is not relevant to the rejection. Applicant is simply arguing an irrelevant embodiment of the prior art. Additionally, Applicant alleges that DiMambro et al. provides the device identifier before speaker identification, not after, by arguing that the device identifier is obtained from the mobile device itself through a wireless connection, and not determined based on identifying who is speaking. Even assuming that a mobile computing device is registered with a wireless network before authentication, as speculated by Applicant, DiMambro et al. still discloses checking that the device identifiers match after the voice characteristics of the user match the voice print of the user. Here, DiMambro et al., at ¶[0022], states:
The mobile communication environment 100 can include a voice authentication server 130, a database 130, and one or more mobile devices 102. User profiles can be stored on the database 130 which can be used to identify a user of the mobile device 102. A user profile can include a pass phrase, a biometric voice print, and a device identifier. The server 130 can compare a user's profile to other user profiles stored on the database 140 for authorizing the user's voice. For example, a user of the mobile device 102 can speak into the mobile device for accessing one or more resources available to the mobile device.
The database stores user profiles that include biometric voice prints and device identifiers that are used to authenticate the user. Even assuming that a mobile computing device of DiMambro et al. is already registered with the wireless network under some device identifier, a device identifier stored in a user profile is not accessed for purposes of authentication until a user speaks a pass phrase to initiate authentication. DiMambro et al. still must compare a device identifier of the mobile device with a device identifier stored in a user profile in the database to perform authentication, even if mobile device 102 is already registered with a wireless network. Consequently, DiMambro et al. discloses the limitation of “determining, based on the identification of the identified speaker of the spoken voice command, a device identifier stored for a second computing device used by the identified speaker”.
Applicant’s arguments are not persuasive. There are no new grounds of rejection. Accordingly, this rejection is properly FINAL.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608. The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARTIN LERNER/Primary Examiner
Art Unit 2658 February 25, 2026