Prosecution Insights
Last updated: April 19, 2026
Application No. 18/415,339

VOICE PROCESSING DEVICE, METHOD, AND RECORDING MEDIUM FOR DETERMINING THAT AN INPUT VOICE IS A REGISTERED PERSON

Status: Final Rejection (§103)
Filed: Jan 17, 2024
Examiner: Pullias, Jesse Scott
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Panasonic Intellectual Property Management Co., Ltd.
OA Round: 2 (Final)
Grant Probability: 83% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 8m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 83% (873 granted / 1052 resolved; +21.0% vs Tech Center avg) — above average
Interview Lift: +13.0% (moderate) for resolved cases with an interview
Avg Prosecution: 2y 8m typical timeline; 47 applications currently pending
Total Applications: 1099 across all art units

Statute-Specific Performance

§101: 15.0% (-25.0% vs TC avg)
§103: 50.4% (+10.4% vs TC avg)
§102: 19.7% (-20.3% vs TC avg)
§112: 4.9% (-35.1% vs TC avg)
Tech Center average values are estimates. Based on career data from 1052 resolved cases.
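The headline rates in this report are simple arithmetic over the examiner's resolved cases. A minimal sketch of that arithmetic (figures are taken from this page; the derived Tech Center average is an inference from the reported delta, not a reported number):

```python
# Career allow rate: granted / resolved, from the figures on this page.
granted, resolved = 873, 1052
allow_rate = granted / resolved        # ~0.830, reported as 83%

# "+21.0% vs TC avg" implies a Tech Center average near 62% (inferred).
tc_avg = allow_rate - 0.210

# The with-interview grant probability is the career rate plus the
# reported +13.0% interview lift, giving the 96% shown above.
with_interview = allow_rate + 0.130

print(f"{allow_rate:.1%}, {tc_avg:.1%}, {with_interview:.1%}")
```

Note that the lift is reported as an additive percentage-point difference (83% → 96%), not a relative increase.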

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This Office action is in response to correspondence of 11/13/25 regarding application 18/415,339, in which claims 1, 4, and 7-10 were amended and new claim 11 was added. Claims 1-11 are pending in the application and have been considered.

Foreign Priority

Receipt is acknowledged of certified copies of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Response to Arguments

The amended title of the invention overcomes the objection for not being descriptive, and so the objection is withdrawn.

The examiner agrees with Applicant on page 7 that no new matter is added by the amendments to claims 1, 4, and 7-10 and newly added claim 11. In particular, the examiner finds that new claim 11 is adequately supported by Figure 2, steps S6-S8, and the supporting description.

Applicant's arguments on pages 7-8 regarding the claim amendments and the 35 U.S.C. 101 rejections have been considered and are persuasive, and so the rejections are withdrawn. In particular, the examiner agrees with Applicant that the "when the similarity is equal to or larger than the first threshold and smaller than a second threshold larger than the first threshold, updating the second feature in the memory by replacing the second feature with the first feature" as recited in the amended claims cannot be performed as a mental process and is significant from a technological standpoint as a solution for saving memory capacity when a person's voice changes over time.

Applicant's arguments on pages 8-9 regarding the 35 U.S.C. 102(a)(1) rejections based on Yoshioka and the 35 U.S.C. 103 rejections based on Yoshioka and Welbourne have been considered but are moot in view of the new grounds for rejection, based in part on the newly discovered reference to Miki et al. (US 20200227027), which discloses updating a second feature in a memory by replacing the second feature with a first feature, similar to the amended claim language. The new grounds for rejection based in part on Miki are prompted by Applicant's amendments. Applicant's arguments on page 10 regarding new claim 11 are similar to those above for claim 1, and are moot for the same reasons.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6, 7, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Yoshioka et al. (US 20080059805) in view of Miki et al. (US 20200227027).

Consider claim 1: Yoshioka discloses a voice processing device (biometrics authentication apparatus based on voice, [0030]) comprising: a memory in which a computer program is stored (storage device 20, e.g. magnetic memory, [0032], with program, [0033]), the memory storing one or more registered features of registered persons (storing biometric information in dictionaries, [0031]-[0032], stored in the storage device 20 individually with respect to a plurality of users, [0072]); and a hardware processor coupled to the memory and configured to perform processing by executing the computer program (CPU executing the program, [0033]), the processing including: receiving, from a microphone, an input voice signal (the input unit is a microphone that receives a voice spoken by the subject, [0031]); calculating a first feature being a feature of the input voice signal (generating biometrics information DIN from the user's voice, such as a cepstral vector sequence, [0037-0038]); determining a similarity between the first feature and a second feature out of the one or more registered features (a distance DST_1 is calculated between the biometrics DIN, a first feature, and biometrics information DO, a second feature specific to the registered user, of the dictionary DIC_1, which represents the similarity, [0039]); when the similarity between the first feature and the second feature is equal to or larger than a first threshold, outputting a result that the input voice signal is a voice of a first registered person out of the registered persons, the first registered person corresponding to the second feature (distance DST_1 is compared to threshold value VTHR1, and when it is smaller, validity of the user is confirmed, and the controller informs the subject through the output unit that the subject has been authenticated, [0040]; since the second threshold is set lower than the first threshold, the similarity is also above the second threshold, [0025]); and, when the similarity is equal to or larger than the first threshold and smaller than a second threshold, updating the registered features with the first feature (determining whether distance DST_1 is smaller than threshold value VTHR2, i.e. similarity is larger than a first threshold, [0043], and when the distance is between the thresholds, using auxiliary authentication to confirm validity of the user, [0044-0046], Fig 2 steps SA15-SA17, and since the user's voice probably changed, creating a new dictionary entry which is updated for the user including biometrics information DIN, [0046]).

Yoshioka does not specifically mention updating the second feature in the memory by replacing the second feature with the first feature. Miki discloses updating a second feature in a memory by replacing the second feature with a first feature (an existing template is replaced with the latest template created using parameter values obtained from the latest conversation unit, [0103]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yoshioka by updating a second feature in a memory by replacing the second feature with a first feature in order to avoid the speaker recognition rate decreasing if the user's speaking style changes after registration, as suggested by Miki ([0003]). Doing so would have led to predictable results of reducing additional work of reregistration on the part of the user, as identified by Miki ([0003]). The references cited are analogous art in the same field of speech processing.
Consider claim 9: Yoshioka discloses a method implemented by a computer (method performed by executing instructions on a computer, [0023-0024]), the method comprising: receiving, from a microphone, an input voice signal (the input unit is a microphone that receives a voice spoken by the subject, [0031]); calculating a first feature being a feature of the input voice signal (generating biometrics information DIN from the user's voice, such as a cepstral vector sequence, [0037-0038]); determining a similarity between the first feature and a second feature out of the one or more registered features of registered persons stored in a memory (a distance DST_1 is calculated between the biometrics DIN, a first feature, and biometrics information DO, a second feature specific to the registered user, of the dictionary DIC_1, which represents the similarity, [0039]); when the similarity between the first feature and the second feature is equal to or larger than a first threshold, outputting a result that the input voice signal is a voice of a first registered person out of the registered persons, the first registered person corresponding to the second feature (distance DST_1 is compared to threshold value VTHR1, and when it is smaller, validity of the user is confirmed, and the controller informs the subject through the output unit that the subject has been authenticated, [0040]; since the second threshold is set lower than the first threshold, the similarity is also above the second threshold, [0025]); and, when the similarity is equal to or larger than the first threshold and smaller than a second threshold, updating the registered features with the first feature (determining whether distance DST_1 is smaller than threshold value VTHR2, i.e. similarity is larger than a first threshold, [0043], and when the distance is between the thresholds, using auxiliary authentication to confirm validity of the user, [0044-0046], Fig 2 steps SA15-SA17, and since the user's voice probably changed, creating a new dictionary entry which is updated for the user including biometrics information DIN, [0046]).

Yoshioka does not specifically mention updating the second feature in the memory by replacing the second feature with the first feature. Miki discloses updating a second feature in a memory by replacing the second feature with a first feature (an existing template is replaced with the latest template created using parameter values obtained from the latest conversation unit, [0103]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yoshioka by updating a second feature in a memory by replacing the second feature with a first feature for reasons similar to those for claim 1.
Consider claim 10: Yoshioka discloses a non-transitory computer-readable recording medium on which programmed instructions are recorded (machine readable medium with instructions, [0024], such as semiconductor memory, [0032]), the instructions causing a computer to execute processing (instructions executed by a computer, [0024]), the processing comprising: receiving, from a microphone, an input voice signal (the input unit is a microphone that receives a voice spoken by the subject, [0031]); calculating a first feature being a feature of the input voice signal (generating biometrics information DIN from the user's voice, such as a cepstral vector sequence, [0037-0038]); determining a similarity between the first feature and a second feature out of the one or more registered features of registered persons stored in a memory (a distance DST_1 is calculated between the biometrics DIN, a first feature, and biometrics information DO, a second feature specific to the registered user, of the dictionary DIC_1, which represents the similarity, [0039]); when the similarity between the first feature and the second feature is equal to or larger than a first threshold, outputting a result that the input voice signal is a voice of a first registered person out of the registered persons, the first registered person corresponding to the second feature (distance DST_1 is compared to threshold value VTHR1, and when it is smaller, validity of the user is confirmed, and the controller informs the subject through the output unit that the subject has been authenticated, [0040]; since the second threshold is set lower than the first threshold, the similarity is also above the second threshold, [0025]); and, when the similarity is equal to or larger than the first threshold and smaller than a second threshold, updating the registered features with the first feature (determining whether distance DST_1 is smaller than threshold value VTHR2, i.e. similarity is larger than a first threshold, [0043], and when the distance is between the thresholds, using auxiliary authentication to confirm validity of the user, [0044-0046], Fig 2 steps SA15-SA17, and since the user's voice probably changed, creating a new dictionary entry which is updated for the user including biometrics information DIN, [0046]).

Yoshioka does not specifically mention updating the second feature in the memory by replacing the second feature with the first feature. Miki discloses updating a second feature in a memory by replacing the second feature with a first feature (an existing template is replaced with the latest template created using parameter values obtained from the latest conversation unit, [0103]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yoshioka by updating a second feature in a memory by replacing the second feature with a first feature for reasons similar to those for claim 1.

Consider claim 2: Yoshioka discloses, in the processing, the hardware processor performs the updating of the registered features with the first feature in response to determining that the similarity is equal to or larger than the first threshold and smaller than the second threshold due to a change in voice quality of the first registered person (when the distance is between the thresholds, using auxiliary authentication to confirm validity of the user, [0044-0046], Fig 2 steps SA15-SA17, and since the user's voice probably changed, creating a new dictionary entry which is updated for the user including biometrics information DIN, [0046]).

Yoshioka does not specifically mention updating the second feature in the memory by replacing the second feature with the first feature. Miki discloses updating a second feature in a memory by replacing the second feature with a first feature (an existing template is replaced with the latest template created using parameter values obtained from the latest conversation unit, [0103]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yoshioka by updating a second feature in a memory by replacing the second feature with a first feature for reasons similar to those for claim 1.

Consider claim 3: Yoshioka discloses the first threshold and the second threshold are each variable in setting (threshold values VTHR1 and VTHR2 can be properly modified by controller 30, [0040], [0043], by user instruction operating the operating unit 40, [0035]).

Consider claim 4: Yoshioka discloses, in the processing, the hardware processor calculates a feature for each predetermined segment of the voice signal (the extracted period from which cepstral vector sequences are computed, [0037], where periods are extracted over a long period of time, [0062]), and performs the adding of the first feature to the registered features or the updating of the registered features with the first feature in response to determining, as to the predetermined segments, that the similarity between the feature calculated for each predetermined segment and the second feature is equal to or larger than the first threshold and smaller than the second threshold (determining whether distance DST_1 is smaller than threshold value VTHR2, i.e. similarity is larger than a first threshold, [0043], and when the distance is between the thresholds, using auxiliary authentication to confirm validity of the user, [0044-0046], Fig 2 steps SA15-SA17, and since the user's voice probably changed, creating a new dictionary entry which is updated for the user including biometrics information DIN, [0046]).

Yoshioka does not specifically mention updating the second feature in the memory by replacing the second feature with the first feature. Miki discloses updating a second feature in a memory by replacing the second feature with a first feature (an existing template is replaced with the latest template created using parameter values obtained from the latest conversation unit, [0103]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yoshioka by updating a second feature in a memory by replacing the second feature with a first feature for reasons similar to those for claim 1.

Consider claim 6: Yoshioka discloses a user interface used for selecting a type of a mode, wherein, in the processing, the hardware processor registers the first feature in association with a mode selected by the user interface (user instructs controller to modify various parameters used for authentication or start authentication, i.e. select a type of mode, by instructing apparatus to start initial registration, which collects biometrics information DO, [0035], [0037]).
Consider claim 7: Yoshioka discloses a storage device in which registered features corresponding to multiple modes are stored (template files including biometric information in storage device 20, DIC_1-DIC_M, corresponding to multiple speaker confirmation modes, [0032]), wherein, in the processing, the hardware processor automatically determines a mode among the multiple modes (mode comparing DST_i with DIC_i, [0039]), and makes the determination that the input voice signal is a voice of the first registered person corresponding to the second feature when the similarity between the first feature and the second feature is equal to or larger than the first threshold, the second feature being a feature out of one or more registered features corresponding to the determined mode (determining whether distance DST_i, which is a comparison between biometrics DIN and DIC_i, is smaller than threshold value VTHR2, i.e. similarity is larger than a first threshold, [0043], and when the distance is between the thresholds, using auxiliary authentication to confirm validity of the user, [0044-0046], Fig 2 steps SA15-SA17, and since the user's voice probably changed, creating a new dictionary entry which is updated for the user including biometrics information DIN, [0046]).

Claims 5 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Yoshioka et al. (US 20080059805) in view of Miki et al. (US 20200227027), in further view of Welbourne et al. (US 10178301).

Consider claim 5: Yoshioka discloses a speaker configured to reproduce a sound (a speaker for outputting the authentication result as a sound, [0035]). Yoshioka and Miki do not specifically mention a cancellation mechanism configured to cancel, from the input voice signal, a signal of the sound reproduced by the speaker.

Welbourne discloses a cancellation mechanism configured to cancel, from the input voice signal, a signal of the sound reproduced by the speaker (device includes microphones that capture audio signals that include user speech, Col 10 lines 17-28, and performs acoustic echo cancellation across the lobes or microphones, Col 10 lines 51-57, on sound from speaker elements, Col 8 lines 38-43). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yoshioka and Miki by including a cancellation mechanism configured to cancel, from the input voice signal, a signal of the sound reproduced by the speaker in order to produce a cleaner SNR, as suggested by Welbourne (Col 1 lines 20-28). Doing so would have led to predictable results of improving efficacy of speaker recognition on the resulting audio signal, as suggested by Welbourne (Col 11 lines 1-3). The references cited are analogous art in the same field of audio processing.

Consider claim 8: Yoshioka discloses the processing further includes performing recognition of a speaking person (auxiliary authentication to confirm validity of the user, [0044-0046]), and, in the processing, the hardware processor registers the first feature in association with the determined mode when the speaking person is recognized as the first registered person as a result of the recognition and no feature corresponding to the determined mode has been registered in the storage device (when the distance is between the thresholds, using auxiliary authentication to confirm validity of the user, [0044-0046], Fig 2 steps SA15-SA17, and since the user's voice probably changed, creating a new dictionary entry which is updated for the user including biometrics information DIN, [0046]). Yoshioka and Miki do not specifically mention performing facial recognition. Welbourne discloses performing facial recognition (facial recognition, Abstract).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yoshioka and Miki by including facial recognition as in Welbourne as the auxiliary authentication method of Yoshioka in order to more accurately identify the user, predictably improving recognition performance, as suggested by Welbourne (Col 2 lines 24-38).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Yoshioka et al. (US 20080059805) in view of Miki et al. (US 20200227027), in further view of Mequanint et al. (US 20200082062).

Consider claim 11: Yoshioka discloses adding the first feature to the registered features in the memory (creating a new dictionary entry which is updated for the user including biometrics information DIN, [0046]). Yoshioka does not specifically mention updating the second feature in the memory by replacing the second feature with the first feature. Miki discloses updating a second feature in a memory by replacing the second feature with a first feature (an existing template is replaced with the latest template created using parameter values obtained from the latest conversation unit, [0103]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yoshioka by updating a second feature in a memory by replacing the second feature with a first feature for reasons similar to those for claim 1.

Yoshioka and Miki do not specifically mention, when the similarity is equal to or larger than the second threshold, restricting both (1) adding the first feature to the registered features in the memory and (2) updating the second feature in the memory by replacing the second feature with the first feature.
Mequanint discloses, when the similarity is equal to or larger than the second threshold, restricting (when the similarity score for biometric data, which may be voice data, [0055], is above both the authentication threshold and the gradual learning threshold, the process ends rather than save a new template, Fig 4, steps 410, 414, and 416, [0073]; this is considered "restricting" in the same sense as Applicant's Fig 2, step S6). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yoshioka and Miki such that, when the similarity is equal to or larger than the second threshold, restricting, as in Mequanint, both (1) adding the first feature to the registered features in the memory as in Yoshioka and (2) updating the second feature in the memory by replacing the second feature with the first feature as in Miki, in order to optimize enrolled data during adaptation, as suggested by Mequanint ([0005]), predictably helping cover large intra-class variations while maintaining memory requirements, as suggested by Mequanint ([0005]). The references cited are analogous art in the same field of audio processing.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias, whose telephone number is 571/270-5135. The examiner can normally be reached M-F 8:00 AM - 4:30 PM. The examiner's fax number is 571/270-6135.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at 571/272-7516.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Jesse S Pullias/ Primary Examiner, Art Unit 2655 12/15/25

Prosecution Timeline

Jan 17, 2024 — Application Filed
Aug 15, 2025 — Non-Final Rejection (§103)
Oct 20, 2025 — Examiner Interview Summary
Oct 20, 2025 — Applicant Interview (Telephonic)
Nov 13, 2025 — Response Filed
Dec 15, 2025 — Final Rejection (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596885 — Automatically Labeling Items using a Machine-Trained Language Model (granted Apr 07, 2026; 2y 5m to grant)
Patent 12573378 — Speech Tendency Classification (granted Mar 10, 2026; 2y 5m to grant)
Patent 12572740 — Multi-Language Document Field Extraction (granted Mar 10, 2026; 2y 5m to grant)
Patent 12566929 — Combining Data Selection and Reward Functions for Tuning Large Language Models Using Reinforcement Learning (granted Mar 03, 2026; 2y 5m to grant)
Patent 12536389 — Translation System (granted Jan 27, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 83%
With Interview: 96% (+13.0%)
Median Time to Grant: 2y 8m
PTA Risk: Moderate
Based on 1052 resolved cases by this examiner. Grant probability derived from career allow rate.
