DETAILED ACTION
This non-final office action is responsive to application 18/339,677 as submitted 22 June 2023.
Claims 1-20 are currently pending and under examination, of which claims 1 and 12 are independent.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
As required by MPEP 609(c), the applicant’s submissions of the Information Disclosure Statements dated 06/22/23 – 11/07/24 are acknowledged by the examiner and the cited references have been considered in the examination of the claims now pending. As required by MPEP 609 C(2), a copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant office action.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 12-20 are rejected under 35 U.S.C. 101 for being directed to non-statutory subject matter. Claim 12 is drawn to a system comprising a controller, which can be software control of a software system implemented by computer programs. The claim fails to assert that the software system is executed by a processor and is recorded on a non-transitory computer-readable medium so as to be structurally and functionally interrelated to the medium and permit the function of the descriptive material to be realized. When read in light of instant specification [0023,22], non-limiting embodiments are disclosed without defining or otherwise requiring the inclusion of sufficient structural elements. Furthermore, while the claim nominally recites a vehicle at a high level, this element merely conveys a passive environment for the generating of features from voice signals, which can be a software function. A computer program is merely a set of instructions capable of being executed by a computer. Without a processor or non-transitory computer-readable medium to realize a computer program's functionality, the computer program constitutes non-statutory functional descriptive material. See MPEP 2106.01. Claims 13-20 are rejected because they are dependent on claim 12 and do not resolve the issue above.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. In determining whether the claims are subject matter eligible, the examiner applies guidance set forth under MPEP 2106.
Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes: all claims fall within, or could be amended to fall within, one of the four statutory categories set forth under MPEP 2106.03: claims 1-11 are a method/process, and claims 12-20 are a system/machine. With respect to claims 12-20, it is noted that where claims could be amended to fall within one of the four statutory categories, the analysis should proceed. Accordingly, the analysis continues.
Step 2A, prong one: Does the claim recite an abstract idea, law of nature or natural phenomenon? Yes: the claims, under the broadest reasonable interpretation, recite an abstract idea. In this case, the claims fall within the enumerated groupings of abstract ideas, namely “Mathematical Concepts” and/or “Mental Processes.” More particularly, the claims recite:
“calculating similarities between an input vector of the input features and historical vectors in voiceprints of one or more enrolled users” (Mathematical Calculations, e.g. specification [0034] “Cosine similarity equals (1-cos α) where α is the angle between two vectors”)
“after determining a similarity between the input vector and at least one historical vector in a voiceprint of an identified user is less than a threshold similarity, authenticating the current speaker as the identified user” (Mental determination subject to math constraint, e.g. [0034] “validate the specific user and personalize the user’s usage preference”)
“calculating a probabilistic notion based on the similarity” (Mathematical Calculations e.g. [0037] “The probabilistic notion 150 may include a weight factor inversely proportional to the similarity 118, where the weight factor has a value between 0 and 1”)
“applying the probabilistic notion to interpolate between downstream user preference embeddings associated with the identified user” (Mathematical Calculations, e.g. [0042] “interpolated embedding 170 may be generated with the formula ax (user preference 327) +(1-a) x (usage embedding 337)” and/or interpolation by summation e.g. concatenation)
The focus of the claim concerns calculating. The calculations of similarities between vectors involve a threshold similarity and a probabilistic ‘notion’ for interpolating embeddings. These functions characterize calculations, which are specifically called out as an abstract idea under MPEP 2106.04(a)(2). Further, using the calculations for determinations and for authenticating speakers as identified users is a mental process, which can be performed as an evaluation with criteria for observed users. Accordingly, the claims are drawn to mathematical concepts and/or mental processes as the abstract idea.
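For illustration only, the recited calculations may be sketched as follows. The sketch assumes the convention quoted from specification [0034], under which the cosine measure equals (1 - cos α) so that a smaller value indicates a closer match, together with the weighting quoted from [0037] and [0042]. The function names, the threshold value, and the particular mapping from similarity to weight factor are hypothetical and are not taken from the application.

```python
import math

def cosine_measure(u, v):
    """Return 1 - cos(angle between u and v); 0.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def authenticate(input_vector, voiceprint, threshold):
    """Compare the input vector with every historical vector in a voiceprint.

    Returns (authenticated, best_measure), where authenticated is True when
    the smallest measure is less than the threshold.
    """
    best = min(cosine_measure(input_vector, h) for h in voiceprint)
    return best < threshold, best

def weight_factor(similarity):
    """Hypothetical mapping: a weight in (0, 1] that shrinks as the measure grows."""
    return 1.0 / (1.0 + similarity)

def interpolate(a, user_preference, usage_embedding):
    """Element-wise a * preference + (1 - a) * usage."""
    return [a * p + (1.0 - a) * q
            for p, q in zip(user_preference, usage_embedding)]

if __name__ == "__main__":
    ok, best = authenticate([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], threshold=0.2)
    a = weight_factor(best)
    print(ok, best, interpolate(a, [2.0, 4.0], [0.0, 2.0]))
```

Each function reduces to arithmetic on lists of numbers, which is the basis for the characterization of these limitations as mathematical calculations.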
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No: the judicial exception is not integrated into a practical application by the additional elements, which are as follows:
“generating, using a neural network trained to generate features based on training data comprising human voices spoken by a plurality of historical speakers inside a vehicle, input features based on a human voice of a current speaker inside the vehicle” MPEP 2106.05(h) generally linking the use of the judicial exception to a particular technological environment or field of use, i.e. ‘using’ a neural network for human voice in a vehicle, the voice input is an insignificant extra-solution activity under MPEP 2106.05(g) mere data gathering.
The balance of the claim concerns using a neural network trained to generate features as input from human voice signals collected in a vehicle. For example, [0029] “artificial intelligence techniques, such as, but not limited to…” e.g. [0046] “neural network, may be pre-trained” amounts to use of known models as an apply-it, off-the-shelf neural network which performs mere data gathering from human speech. Mere data gathering is an insignificant pre-solution activity under MPEP 2106.05(g). While the speech is gathered in a vehicle, the vehicle is not a positively recited element as a functional component to responsively carry out a particular task, but rather collects voice data for inputting audio features into the neural network. Therefore, the claim remains directed to the abstract idea and the additional elements do not elevate the claim in a manner that is sufficient for integrating the judicial exception into a practical application.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No: the claims do not include additional elements that amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are identified with respect to MPEP 2106.05 and do not demonstrate an inventive concept. Particularly, the additional elements are as follows:
“generating, using a neural network trained to generate features based on training data comprising human voices spoken by a plurality of historical speakers inside a vehicle, input features based on a human voice of a current speaker inside the vehicle” MPEP 2106.05(h) generally linking the use of the judicial exception to a particular technological environment or field of use, i.e. ‘using’ a neural network for human voice in a vehicle, the voice input is an insignificant extra-solution activity under MPEP 2106.05(g) mere data gathering.
Significantly more is not satisfied by the balance of the claim through the additional elements identified above. The limitation of generating is to input features into a trained neural network that is trained on features of human voice in a vehicle. The use of a neural network for human voice in a vehicle is considered a field of use under MPEP 2106.05(h). The input is an insignificant pre-solution activity under MPEP 2106.05(g), which is a well-understood, routine and conventional activity under MPEP 2106.05(d)(II)(i). No meaningful limitation is demonstrative of a technical solution. If the claim language provides only a result-oriented solution, with insufficient detail for how a computer accomplishes it, then the claims do not contain an inventive concept. Taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. As a whole, the breadth of the claims provides the reader with little guidance to inform the public whether or not they are infringing and presents some risk of pre-emption.
For the above reasons, the claims are not patent eligible. This rejection applies to independent claims 1 and 12 as well as to dependent claims 2-11 and 13-20. Independent claim 12 recites a system with a controller to perform similar limitations. The controller is considered an additional element which falls under MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea. The controller is recited at a high level of generality and does not satisfy the test of a particular machine under MPEP 2106.05(b). Thus, the additional element does not integrate the judicial exception into a practical application or amount to significantly more.
Dependent claims, when analyzed as a whole, are held to be patent ineligible under 35 U.S.C. 101 because the additional recited limitations fail to establish that the claims are not directed to an abstract idea, or that they include additional elements which integrate the judicial exception into a practical application or amount to significantly more.
Dependent claim 2 discloses wherein similarity is Euclidean similarity or Cosine similarity. This is considered part of the abstract idea being a mathematical calculation as described per specification at [0034]. There are no additional elements.
Dependent claims 3 and 13 disclose wherein the probabilistic notion comprises a weight factor inversely proportional to the similarity. This is considered part of the abstract idea being mathematical calculations or mathematical relationships as described per instant specification [0037]. There are no additional elements.
Dependent claims 4 and 14 disclose wherein embeddings comprise user preference calculated based on narrated comments and usage embeddings comprising user interactions with the vehicle. The embeddings are based on vectors, which are fundamentally a mathematical representation for calculation as the abstract idea. The embeddings may encode information pertaining to user preference and usage, which convey mental processes in that a preference is an opinion. Further, the voice interactions are based on narrated comments, which is a human process of speech. The vehicle itself is an additional element that is considered generally linking the use of the judicial exception to a particular technological environment or field of use under MPEP 2106.05(h). The vehicle does not meaningfully limit the claim because it merely ingests embeddings with no output resulting from their inclusion. In other words, there is no concrete, real-world use case claimed. As a whole, the claim serves to embellish the embeddings. Accordingly, the claim remains drawn to the abstract idea and the additional elements fail to integrate the judicial exception into a practical application or amount to significantly more.
Dependent claims 5 and 15 disclose further determining whether the voice comprises a user interaction with the vehicle after authenticating, and integrating user interaction into usage embedding after determining human voice comprises interaction. The limitations are considered to be part of the abstract idea notwithstanding the vehicle. Particularly, mental determinations may be carried out by a human to verify inclusion of interaction information, this may include integrating based on math or evaluation of weighting a probabilistic notion. The vehicle is an additional element as already noted from parent claim 4 which falls under MPEP 2106.05(h). As such, the claim remains drawn to the abstract idea and additional elements fail to integrate the judicial exception into a practical application or amount to significantly more.
Dependent claims 6 and 16 disclose wherein the neural network comprises an incremental learning algorithm that dynamically integrates the input features weighted based on probabilistic notion into the voiceprint of identified user. The algorithmic step is seen to perform math for a neural network being used that is considered additional elements falling under MPEP 2106.05(h) field of use. It does not meaningfully limit the claim because neural networks ordinarily perform weighting of feature input data and the integrating is recited at a high level of generality which does not lend particularity to a technical solution. For example, integrating could be any means of combining or inclusion of identified data. Therefore, the additional elements do not integrate the judicial exception into a practical application or amount to significantly more.
Dependent claims 7-8 disclose shrinking the voiceprints by removing features less than a confidence threshold and overlapping voiceprints. This is considered part of the abstract idea being mathematical calculation or mental evaluation. For example, min/max with operand ‘>’ for the confidence threshold and overlap as union or intersecting set membership. The removal of features could be subject to numerous techniques like filtering, masking, dropout, pruning, or element-wise multiplicative products like Hadamard, Kronecker or Khatri-Rao. There are no additional elements.
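For illustration only, two of the techniques named above (filtering and masking by an element-wise Hadamard product) may be sketched as follows. The function names, the per-feature confidence values, and the choice to retain features whose confidence equals the threshold are hypothetical assumptions and are not taken from the application.

```python
def shrink_by_filtering(features, confidences, threshold):
    """Drop every feature whose confidence is less than the threshold."""
    return [f for f, c in zip(features, confidences) if c >= threshold]

def shrink_by_masking(features, confidences, threshold):
    """Zero out low-confidence features with a 0/1 mask (Hadamard product).

    Unlike filtering, the result keeps the original length.
    """
    mask = [1.0 if c >= threshold else 0.0 for c in confidences]
    return [f * m for f, m in zip(features, mask)]

if __name__ == "__main__":
    feats = [0.3, 0.7, 0.9]
    confs = [0.2, 0.8, 0.5]
    print(shrink_by_filtering(feats, confs, 0.5))
    print(shrink_by_masking(feats, confs, 0.5))
```

Either variant amounts to a comparison against a threshold followed by selection or multiplication.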
Dependent claims 9 and 17 disclose wherein input features comprise one of a list of alternatives such as tone, pitch, volume, speed or timbre. The list of features characterizes human voice for input, which may be an insignificant pre-solution activity under MPEP 2106.05(g), mere data gathering or selecting the type of data to manipulate. The data according to type is not a technical solution and fails to effect a particular transformation. As such, the additional elements do not integrate the abstract idea into a practical application or amount to significantly more.
Dependent claims 10 and 19 disclose wherein voiceprints are enrolled through an initial implementation comprising a physical or vocal trigger to begin enrollment and a recording of human voice to create the enrolled voiceprint. The implementation of enrolling a human voice may be carried out by listening for a template response, e.g. a password for membership communicated verbally is a mental process. The limitation of recording is considered adding insignificant extra-solution activity to the abstract idea under MPEP 2106.05(g). Particularly, said extra-solution activity is a well-understood, routine and conventional activity identified by the courts under MPEP 2106.05(d)(II)(iii-iv). As such, the additional elements are not sufficient to integrate the judicial exception into a practical application or amount to significantly more.
Dependent claim 11 discloses calculating similarity between vectors of non-user voiceprints, determining whether similarity meets threshold, integrating the vector into voiceprint of non-users, and creating a voiceprint of a non-user based on the vector. The limitations are considered to be part of the abstract idea as mathematical calculations similar to the rationale of claim 1. There are no further additional elements.
Dependent claim 18 discloses a sound sensor to receive or record human voice. The limitation is considered an additional element which amounts to adding insignificant extra-solution activity under MPEP 2106.05(g). Particularly, said extra-solution activity is a well-understood, routine and conventional activity evidenced by Otsuka et al., US PG Pub No 2022/0301576 at [0003] “conventional microphones.” The claimed sound sensor is recited at a high level of generality and does not detail a new type of sensor that would supply an inventive concept. As such, the additional elements do not integrate the abstract idea into a practical application or amount to significantly more.
Dependent claim 20 discloses a button or touchscreen is physically triggered when the button is pressed or touchscreen is touched. The limitation is considered an additional element which falls under MPEP 2106.05(g) adding insignificant extra-solution activity to the judicial exception. Particularly, said extra-solution activity is a well-understood, routine and conventional activity identified by the courts under MPEP 2106.05(d)(II)(vi) “button functionality.” Requiring a human to place a finger on the button for triggering implementation is a rudimentary manual process that does not meaningfully contribute to an inventive concept. Therefore, the additional elements do not integrate the judicial exception into a practical application or amount to significantly more.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 9-10, 12 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over:
Choi et al., US PG Pub No 2020/0042285A1, hereinafter Choi, in view of
Jain et al., US PG Pub No 2021/0390959A1 hereinafter Jain, in view of
Ghosh et al., US PG Pub No 2024/0161728A1 hereinafter Ghosh.
With respect to claim 1, Choi teaches:
A method {Choi [0288] “method of the present disclosure” again [0078]} comprising:
generating, using a neural network trained to generate features based on training data comprising human voices spoken by a plurality of historical speakers inside a vehicle, input features based on a human voice of a current speaker inside the vehicle {Choi [0252] “generator may generate feature information of the spoken utterance of the user” particularly [0255] “generator may extract a feature vector… generator may train a deep neural network model by using the feature information of the voice actor spoken utterance as a training data set” collected from [0208] “in-vehicle acoustic signal based on, for example, a pre-trained deep neural network model” and historical is stored in database [0228], [0255]. Fig 2:200 shows vehicle, in-vehicle speakers include driver and passengers [0102]};
Choi discloses [0255] “similarity with the feature vector” and [0245] “user-customized settings” as well as [0089] “authentication process” and likelihood threshold with speaker verification [0231].
However, Choi does not explicitly recite “voiceprints” or “enrolled” users descriptive terms which are disclosed by Jain:
calculating similarities between an input vector of the input features and historical vectors in voiceprints of one or more enrolled users {Jain [0092] “Cosine Similarity” Equation details calculation between vectors using “voiceprints” for speaker verification, shown Fig 4A with “enrolled users” [0143]. The features are extracted for a neural network [0088-89]. See also [0101], [0282], Figs 5A and 11};
after determining a similarity between the input vector and at least one historical vector in a voiceprint of an identified user is less than a threshold similarity, authenticating the current speaker as the identified user {Jain Fig 2B:S218 “authenticating the speaker based on a substantial similarity” e.g. [0024] “similarity value being smaller than a threshold” smaller is less than threshold similarity, the threshold may entail distance calculation [0092-95] such that [0089,92] “voice embedding to be used as a biometric for authentication… threshold value to deem an identified speaker” Further, [0091,0102] “currently received voice features” and/or “test voice samples” [0083] correspond to current speaker. See also Figs 5,11 threshold for speaker enrollment database, as well as Fig 21 min/max functions};
calculating a probabilistic notion based on the similarity {Jain [0095] “similarity score is calculated… cosine similarity” Equations are probabilistic, notions may include e.g. distance, variance, standard deviation, and/or further weighting [0114-19] A. = x1∙w1 + x2∙w2 + x3∙w3… is a weighted sum of products where “weights (w1, w2, …, wn) may be learnt statistically, for example, by a frequentist method or by machine learning enabled probabilistic learning”}; and
Jain is directed to speaker recognition with connected devices and to generating voice features using neural networks, and is thus analogous art. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to calculate similarity as well as a threshold for authentication and a probabilistic notion per Jain in combination for a motivation that “there lies a need for a mechanism that can adapt itself automatically to overcome the challenge posed by voice variability during the enrolled speaker’s authentication and prevents false rejections” [0013] e.g. by detecting mismatch of voice features [0069] and/or more generally “speaker recognition performance improvement” [0157].
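For illustration only, the weighted sum of products quoted above from Jain [0114-19] may be sketched as follows. The function name and the example values are hypothetical, and the manner in which Jain learns the weights is not reproduced here.

```python
def weighted_sum(values, weights):
    """Return x1*w1 + x2*w2 + ... + xn*wn for equal-length sequences."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights))

if __name__ == "__main__":
    # Hypothetical feature scores and weights.
    print(weighted_sum([1.0, 2.0, 3.0], [0.5, 0.25, 0.25]))
```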
However, Jain does not explicitly recite “interpolate” which is disclosed by Ghosh:
applying the probabilistic notion to interpolate between downstream user preference embeddings associated with the identified user {Ghosh [0019,22] “interpolated speaker embeddings …interpolating embeddings of different speakers or groups of speakers, weighting different speaker embeddings” e.g. [0044-45] Equation with W-weighting is a probabilistic notion for embedding, noted “interpolated using speech/voice features of speakers whose speech was used for training of SM” is Speech Model e.g. neural network Figs 2A:120, 8:808 [0030] model includes biases, entails similarity [0033,35], and a preference may include attributes, e.g. [0019] “interpolated speech attributes”. Additionally see [0058] “automotive systems (e.g. an in-vehicle”}.
Ghosh is directed to speaker identification and generating synthetic speech features with neural networks thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to interpolate per Ghosh in combination to arrive at the invention as claimed for a motivation “resilient embeddings generated using the multi-stage (funnel) training approach can be combined into new embeddings that likewise produce a natural human-sounding speech” that is “robust against noise and various recording defects and artifacts… produce embeddings that are both resilient to noise and capable of generating artificial speech of high quality even when embeddings for different speakers are interpolated or otherwise combined” [0019-22] and/or further “facilitate accurate modeling of speech attributes and generation of speech synthesis of high quality” [0001] i.e. accuracy and robustness to noise.
With respect to claim 2, the combination of Choi, Jain and Ghosh teaches the method of claim 1, wherein
the similarity is a Euclidean similarity or a Cosine similarity {Jain [0092] “Cosine Similarity”}.
With respect to claim 9, the combination of Choi, Jain and Ghosh teaches the method of claim 1, wherein
the input features of a human voice comprise tone, pitch, volume, speed, or timbre {Choi [0254] “feature information including at least one of tone, dialect, gender, pitch, speed and age” and/or [0227-28] “timbre feature extraction”}.
With respect to claim 10, the combination of Choi, Jain and Ghosh teaches the method of claim 1, wherein
the voiceprints of one or more enrolled users are enrolled through an initial implementation, the initial implementation comprising a physical or vocal trigger of enrollment to initialize the enrollment and a recording of the human voice to create the voiceprint to be enrolled {Jain [0091] “enrolled voiceprints obtained by the speech recognition” and [0087] “triggers operation of the speech processing criteria” e.g. Fig 9:S906 “triggering, one or more of the voice-features, speaker identifier, and labels associated with the one more enrolled voiceprint” so as for [0005] “enrollment phase, user is enrolled based on capturing voice”, initial comprises [0016] “pre-registered voiceprint a first voiceprint” e.g. [0164] “User 1 attempts to access his account with voice command” describes Fig 14 auto-enroll, the recording is via microphone [0224] for storage in speaker enrollment database per Figs 4A or 11}.
With respect to claim 12, the rejection of claim 1 is incorporated. The difference in scope is a system comprising a controller to perform the limitations of the method of claim 1. Choi illustrates a “controller” Fig 3:160 for the system shown in Figs 2-3, described e.g. [0118] “vehicle controller” and/or [0211] “controller 160 may perform machine learning.” The remainder of the claim is rejected for the same rationale as claim 1.
With respect to claim 17, the combination of Choi, Jain and Ghosh teaches the system of claim 12, and further teaches the limitation of claim 9. Therefore, the rejection of claim 9 is applied to claim 17.
With respect to claim 18, the combination of Choi, Jain and Ghosh teaches the system of claim 12, wherein the system further comprises
a sound sensor to receive or record the human voice {Choi [0263] “in-vehicle… microphones provided in the vehicle, the spoken utterance of the user of the vehicle may be registered through the microphone”}.
With respect to claim 19, the combination of Choi, Jain and Ghosh teaches the system of claim 12, and further teaches the limitation of claim 10. Therefore, the rejection of claim 10 is applied to claim 19.
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Choi, Jain and Ghosh in view of Ulasen et al., US PG Pub No 2023/0325717A1 hereinafter Ulasen.
With respect to claim 3, the combination of Choi, Jain and Ghosh teaches the method of claim 1. Ulasen teaches wherein
the probabilistic notion comprises a weight factor inversely proportional to the similarity {Ulasen [0033] “similarity score may be a weighted inverse of this distance” similar at [0029-35] with example calculations. The similarity is subject to threshold Fig 2:208 so as for re-training of selected model, the model may be a neural network [0025,45]}.
Ulasen is directed to training models subject to similarity criteria thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to employ weighted inverse of distance for similarity per Ulasen in combination to arrive at the invention as claimed as applying known techniques to known methods ready for improvement to yield predictable results and/or for a motivation “to repurpose machine learning models to make them more efficient for an arbitrary dataset” [0005].
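For illustration only, one possible reading of a “weighted inverse of this distance” is sketched below. The passage quoted from Ulasen [0033] does not state an exact formula, so the weight, the offset that avoids division by zero, and the use of Euclidean distance are assumptions, and the function name is hypothetical.

```python
import math

def inverse_distance_similarity(u, v, weight=1.0, offset=1.0):
    """Similarity that falls as the Euclidean distance between u and v grows.

    With the default offset of 1.0, identical vectors score exactly `weight`.
    """
    distance = math.dist(u, v)  # Euclidean distance (Python 3.8+)
    return weight / (offset + distance)

if __name__ == "__main__":
    print(inverse_distance_similarity([0.0, 0.0], [3.0, 4.0]))
    print(inverse_distance_similarity([1.0, 2.0], [1.0, 2.0]))
```

Under this reading, a larger distance yields a smaller score, which corresponds to the inverse relationship on which the rejection relies.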
With respect to claim 13, the combination of Choi, Jain and Ghosh teaches the system of claim 12, and further combination with Ulasen teaches the limitation of claim 3. Therefore, the rejection of claim 3 with equal motivation is applied to claim 13.
Claims 4-5 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Choi, Jain and Ghosh in view of Zhang et al., PCT WO2023/159536A1 as evidenced by translation US2024/0402989A1 hereinafter Zhang.
With respect to claim 4, the combination of Choi, Jain and Ghosh teaches the method of claim 1, wherein
the downstream user preference embeddings comprise a user preference and a usage embedding, where the user preference comprises user preferences calculated based on user comments narrated by the identified user during authentication and dynamic integration into historical user preferences {Jain [0082] “speaker embedding mapped with enrolled sample embedding” is a usage embedding for user/speaker, user preference includes attributes with target information [0282] and exemplified [0108] “personalized wakeup words” words are comments narrated (e.g. phrases, utterances) by the user being an enrolled speaker with enrolled embedding illustrated Fig 11A, such that [0089] “voice embedding to be used as a biometric for authenticating the speaker” and new embeddings may be included in the speaker database Fig 11B}; and
However, Jain does not explicitly relate the embedding to vehicle interactions which is met by Zhang:
the usage embedding comprises user interactions with the vehicle {Zhang discloses [0256] “voice assistant, speech embedding and vehicle status embedding… add the two factors to obtain feature representation” for a [0252] “in-vehicle voice assistant system in Fig. 6” interactions of in-vehicle voice assistant shown e.g. Figs 2(b)-1 or 4(a)-1, and described [0244] “a ‘word-word vector (embedding)’ matrix for learning and classification. Speech emotion detection analysis is to classify emotions based on a timbre and speaking speed of voice broadcast”. See also [0208] “user preference analysis”}.
Zhang is directed to embedding-based neural network training for in-vehicle voice interaction assistant thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to specify embeddings with in-vehicle voice interaction assistant per Zhang in combination to arrive at the invention as claimed for a motivation of an “automotive smart voice assistant (ASA)… improves human-computer interaction experience of the user” [0003,05] where “personalized interaction assistants are customized for different users” [0154] and “In a neural network, embedding can reduce a quantity of spatial dimensions of a discrete variable, and represent the variable meaningfully” [0247].
With respect to claim 5, the combination of Choi, Jain, Ghosh and Zhang teaches the method of claim 4, wherein:
after authenticating the identified user, further determining whether the human voice comprises a user interaction with the vehicle {Zhang see [0274] “vehicle detects that the second user sends speech information, the vehicle obtains the identity information of the second user through voiceprint information recognition” recognition/detection is determining, user is identified and authenticating is [0111,13] “user account login… After the account is logged in”}, and
after determining the human voice comprises the user interaction, integrating the user interaction weighted based on the probabilistic notion into the usage embedding associated with the identified user {Zhang [0256] “voice assistant, speech embedding and vehicle status embedding… add the two factors to obtain feature representation” where adding is integrating/including, e.g. by summation; [0188,87] discloses weighting with calculation; Figs 14 and 13 show combining features for modeling such as the neural network (NN) shown in Fig 7}. Motivation is applied equally as in claim 4.
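For illustration only, the integrating step discussed above can be sketched as a weighted blend of an interaction vector into a stored usage embedding. This is a minimal sketch under stated assumptions and is not taken from Zhang or from the instant specification: the function name `integrate_interaction`, the list-of-floats representation, and the use of a single scalar `weight` in [0, 1] standing in for the probabilistic notion are all hypothetical. Zhang itself describes adding the two embeddings; the convex blend below is a swapped-in simplification.

```python
def integrate_interaction(usage_embedding, interaction_vector, weight):
    """Blend an interaction vector into a usage embedding.

    Hypothetical sketch: `weight` in [0, 1] stands in for the probability
    that the voice belongs to the identified user.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(usage_embedding) != len(interaction_vector):
        raise ValueError("vectors must have the same length")
    # A higher weight lets the new interaction contribute more strongly.
    return [
        (1.0 - weight) * u + weight * x
        for u, x in zip(usage_embedding, interaction_vector)
    ]


updated = integrate_interaction([1.0, 0.0], [0.0, 1.0], 0.25)
```

With a weight of 0 the stored embedding is returned unchanged, which corresponds to an interaction attributed to the user with no confidence.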
With respect to claim 14, the combination of Choi, Jain and Ghosh teaches the system of claim 12, and further combination with Zhang teaches the limitation of claim 4. Therefore, the rejection of claim 4 with equal motivation is applied to claim 14.
With respect to claim 15, the combination of Choi, Jain, Ghosh and Zhang teaches the system of claim 14, and further teaches the limitation of claim 5. Therefore, the rejection of claim 5 with equal motivation is applied to claim 15.
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Choi, Jain and Ghosh in view of Cai et al., “VSVC: Backdoor Attack Against Keyword Spotting based on Voiceprint Selection and Voice Conversion” hereinafter Cai (arXiv: 2212.10103v1) as evidenced by Li [24] as cited.
With respect to claim 6, the combination of Choi, Jain and Ghosh teaches the method of claim 1. Cai teaches wherein
the neural network comprises an incremental learning algorithm that dynamically integrates the input features weighted based on the probabilistic notion into the voiceprint of the identified user {Cai [P.3] Alg. 1, Lines 2-4 voiceprint extraction for speaker embeddings, Line 12 “append” is integrating, neural network shown Fig 1; Alg. 1 uses the known StarGAN model, where training is weighted per the citation to earlier work Li [24] at Li-Fig 1}.
Cai is directed to deep learning for speech recognition with voiceprints thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to use the algorithm per Cai in combination to arrive at the invention as claimed for a motivation to “characterize a person’s speech feature in a combination of temporal and spatial dimensions… rich in features” [P.4 ¶3,8] and/or applying known techniques to known methods to yield predictable results where “users can easily control various smart devices through specific speech commands …consideration of cost saving and convenience of training” [P.1 ¶1].
With respect to claim 16, the combination of Choi, Jain and Ghosh teaches the system of claim 12, and further combination with Cai teaches the limitation of claim 6. Therefore, the rejection of claim 6 with equal motivation is applied to claim 16.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Choi, Jain and Ghosh in view of Zhang, Mingyuan US PG Pub No 2020/0058293A1 hereinafter ZhangM.
With respect to claim 7, the combination of Choi, Jain and Ghosh teaches the method of claim 1. ZhangM teaches wherein the method further comprises
shrinking the voiceprint of the identified user by removing a feature of the voiceprint having a confidence less than a threshold confidence {ZhangM [0055] “removing echoes, or filtering out speeches of non-target objects according to features” i.e. [0067] “voiceprint feature information based on a relationship between the voice confidence value and a preset voice confidence value threshold” similarly at [0147], [0164], see Figs 4:S207, 5:S301}.
ZhangM is directed to voiceprint features for trained models thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to remove/filter features thereby shrinking based on a confidence threshold per ZhangM in combination to arrive at the invention as claimed for a motivation “because not all the speech information in the speech information set is the information of the target object… perform screening” [0055] and “thereby increasing accuracy” [0048].
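For illustration only, the confidence-threshold filtering mapped above can be sketched as follows. This is a minimal sketch under stated assumptions, not ZhangM's implementation and not the applicant's: the representation of a voiceprint as a list of (feature vector, confidence) pairs and the name `shrink_voiceprint` are hypothetical.

```python
def shrink_voiceprint(features, threshold):
    """Keep only features whose confidence meets the threshold.

    Hypothetical sketch: `features` is a list of (vector, confidence)
    pairs; entries with confidence below `threshold` are removed.
    """
    return [(vec, conf) for vec, conf in features if conf >= threshold]


voiceprint = [([0.1], 0.9), ([0.2], 0.4), ([0.3], 0.6)]
shrunk = shrink_voiceprint(voiceprint, 0.6)
```

In this sketch a feature whose confidence equals the threshold is retained, consistent with the claim language removing only features having a confidence less than the threshold.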
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Choi, Jain, Ghosh and ZhangM in view of Zhang et al., “Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification” hereinafter ZhangL.
With respect to claim 8, the combination of Choi, Jain, Ghosh and ZhangM teaches the method of claim 7. ZhangL teaches wherein
the voiceprint of the identified user is shrunk by removing the feature of the voiceprint overlapping with a voiceprint of another enrolled user {ZhangL discloses [P.312 ¶3] “filter the interfering speaker information when multiple speakers are overlapped” e.g. by [P.312 ¶4,1] “masking …to remove the interfering information” Fig 1 shows masking for embeddings from extracted features to convey voiceprints, and uses [P.312 ¶1,5] “enroll embeddings… remove the interfering speaker information with the help of enrollment embedding”}.
ZhangL is directed to speaker verification with trained neural networks thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to filter/remove overlapping speaker feature data per ZhangL in combination to arrive at the invention as claimed for a motivation of solving target speaker verification which “achieved a great improvement when facing multi-speaker overlapped utterances” [P.314 Sect.5].
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Choi, Jain, Ghosh and ZhangM in view of Chaubey et al., “Speaker-specific Thresholding for Robust Imposter Identification in Unseen Speaker Recognition” hereinafter Chaubey (arXiv: 2306.00952v1).
With respect to claim 11, the combination of Choi, Jain and Ghosh teaches the method of claim 1. Chaubey teaches wherein the method further comprises:
calculating a non-user similarity between the input vector and vectors of voiceprints of one or more non-users; determining whether the non-user similarity is less than, equal to, or exceeds the threshold similarity {Chaubey discloses [P.1 ¶2] “Imposters in speaker identification are speakers which are not enrolled” where not-enrolled speakers are non-users, termed imposters as titled; the similarity threshold is detailed at [P.2 Sect2.1] Eq.3 where maxS is “maximum similarity with the predicted speaker is less than the threshold” and the operator ‘>’ applies the threshold, noting [P.3 Sect4.1] “Cosine similarity is used as the back-end for computing all the similarities” and “speaker embeddings” conveys vectors using sequence notation for known neural networks like ResNet};
after determining the non-user similarity is less than the threshold similarity, integrating the input vector into the voiceprint of the one or more non-users; and after determining the non-user similarity exceeds or equals the threshold similarity, creating a voiceprint of a non-user based on the input vector {Chaubey discloses [P.2 Sect.2 ¶1] “aggregated enrollment speaker embedding” where aggregating is integrating, and creating is [P.1 ¶1] “generate speaker embeddings for test” by encoder shown Fig 1, so as to [P.3 ¶2] “reject all speakers present in the enrollment set except sj as imposters”}.
Chaubey is directed to speaker recognition with trained models thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to specify imposter/not-enrolled non-users with similarity thresholding per Chaubey in combination to arrive at the invention as claimed for a motivation “the major contributions of this work are… (ii) We propose a speaker-specific thresholding technique for robust imposter identification in unseen speaker identification” [P.1 Last2¶].
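For illustration only, the similarity thresholding relied upon above can be sketched with cosine similarity as the back-end, as Chaubey states at [P.3 Sect4.1]. This is a minimal sketch under stated assumptions: the function names, the dictionary of voiceprints, and the single fixed threshold are hypothetical, and the sketch does not reproduce Chaubey's speaker-specific thresholds or Eq.3.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length vector has no direction")
    return dot / (norm_a * norm_b)


def max_similarity(input_vector, voiceprints):
    """Return (name, similarity) of the closest stored voiceprint.

    Hypothetical sketch: `voiceprints` maps a speaker name to a vector.
    """
    scored = (
        (name, cosine_similarity(input_vector, vec))
        for name, vec in voiceprints.items()
    )
    return max(scored, key=lambda pair: pair[1])


def is_imposter(input_vector, voiceprints, threshold):
    """Flag the input as an imposter when its best match is below threshold."""
    _, best = max_similarity(input_vector, voiceprints)
    return best < threshold


enrolled = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
closest = max_similarity([1.0, 0.0], enrolled)
```

The comparison against the threshold is the determining step; what is done with the vector afterward (integrating it or creating a new voiceprint) is left outside this sketch.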
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Choi, Jain, Ghosh and ZhangM in view of Xu et al., PCT WO2022/233239A1 as evidenced by translation US2024/0071392A1 hereinafter Xu.
With respect to claim 20, the combination of Choi, Jain and Ghosh teaches the system of claim 19. Xu teaches wherein the system further comprises
a button or a touchscreen, where the initial implementation is physically triggered when the button is pressed or the touchscreen is touched {Xu Fig 4 “hold the button” to verify voice, e.g. [0098] “user may trigger the verification instruction by tapping a corresponding position of an icon corresponding to the voiceprint recognition function on a touchscreen”}.
Xu is directed to speaker verification devices with voiceprint feature extraction thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to specify the button/icon of touchscreen per Xu in combination to arrive at the invention as claimed for a motivation of prompting users for account login [0097] and/or beneficially “both voiceprint recognition performance and user experience are improved” [0045].
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Kothapally et al., “Deep Neural Mel-SubBand Beamformer for In-Car Speech Separation” arXiv: 2211.12590v2 see Fig 1
Zhang et al., “Towards Robust Speaker Verification with Target Speaker Enhancement” arXiv: 2103.08781v1 see Figs 1-2
Nayak et al., “Improving Voice Trigger Detection with Metric Learning” arXiv: 2204.02455v2 see Fig 1, Eqs.4-6
Gosztolya, Gabor “Estimating the Level of Conflict Based on Audio Information using Inverse Distance Weighting” at [Sect.3] “Inverse Distance Weighting (IDW) was introduced by Shephard in 1968, originally for interpolating”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chase P Hinckley whose telephone number is (571)272-7935. The examiner can normally be reached M-F 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda M. Huang can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHASE P. HINCKLEY/Examiner, Art Unit 2124