Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office action is in response to the correspondence filed 01/29/2026 regarding application 18/540,549, in which claims 1, 3-7, 10-16, 18, 19, 21, and 23-25 were amended. Claims 1-25 are pending in the application and have been considered.
Response to Arguments
The examiner agrees with Applicant’s statement on page 8 that no new matter was added by the amendments to claims 1, 3-7, 10-16, 18, 19, 21, and 23-25.
Amended claims 1 and 23 overcome the 35 U.S.C. 101 rejections of claims 1, 2, 4, 6, 8, 9, and 23-25 as being directed to an abstract idea without significantly more, and so the rejections are withdrawn. Specifically, the independent claims as amended each recite operations that cannot be practically performed as a mental process.
As noted above, the 35 U.S.C. 101 rejections are withdrawn, and so Applicant’s arguments on pages 8-17 regarding these rejections are moot.
Applicant’s arguments on pages 15-16 regarding the 35 U.S.C. 102(a)(1) rejections based on Sharifi and the amended “a representation of a spectral peak or a formant” claim language have been considered but are moot in view of the new grounds of rejection necessitated by Applicant’s amendments.
Applicant’s arguments on pages 16-17 regarding the 35 U.S.C. 102(a)(1) rejections based on Sharifi and “enhancing the plurality of audio signals with the first set of acoustic characteristics” have been considered but are not persuasive. Applicant argues that the enhanced endpointer signal of Sharifi only enhances endpoint detection, not the underlying audio signals. In response, it is noted that Sharifi explicitly calls signal 121 an “enhanced endpointer signal 121”, not “enhanced endpoints”. See Sharifi Col 5 lines 44-47 and Fig. 1. If Applicant’s claimed enhancement somehow differs from Sharifi’s, the difference is not apparent from the particular claim language, which Sharifi appears to explicitly disclose. It is also noted that in Fig. 1, both general endpointer 109 and enhanced endpointer 121 are generated from audio input 103, and each may certainly be considered an enhancement in the sense that it emphasizes desired portions of the input audio signal and suppresses undesired segments.
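As a purely illustrative aside, and not part of Sharifi’s disclosure, the claims, or the record, the sense in which an endpointer signal “enhances” audio can be sketched as a gating mask that passes the target speaker’s segments and zeroes the rest; every name and value below is hypothetical:

```python
import numpy as np

# Hypothetical sketch (not Sharifi's implementation): gate an audio signal
# by speaker endpoints so samples inside the target speaker's endpoints are
# emphasized (kept) and all other samples are suppressed (zeroed).
def gate_by_endpoints(audio, sample_rate, endpoints):
    mask = np.zeros_like(audio)
    for start_s, end_s in endpoints:
        mask[int(start_s * sample_rate):int(end_s * sample_rate)] = 1.0
    return audio * mask

sample_rate = 16000
audio = np.random.randn(5 * sample_rate)  # stand-in for 5 s of input audio
target_spans = [(0.0, 1.6), (3.1, 4.7)]   # hypothetical target-speaker endpoints
isolated = gate_by_endpoints(audio, sample_rate, target_spans)
```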
Applicant’s arguments on pages 17-18 regarding Wang and Headings are similar to those addressed above, and are either moot in view of the new grounds of rejection or not persuasive for similar reasons.
Claim Objections
Claim 21 is objected to because of the following informality: in line 5, “profiles” apparently should read “profile”. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-11, 13-21, and 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over Sharifi, Matthew (US 8843369) in view of Gainsboro et al. (US 20070071206).
Consider claim 1. Sharifi discloses a method of voice isolation (enhanced endpointing isolates utterances 133 and 139 from utterance 136, Col 5 lines 41-47, Fig. 1), comprising:
receiving a plurality of audio signals comprising a homogenous voice data group (audio input of utterances 133-139, stored as audio signals, Col 3 lines 45-58, the utterances 133 and 139 making up a homogenous voice data group as they are uttered by speaker 127, Fig. 1);
extracting a first set of acoustic characteristics from the plurality of audio signals (computing device generates acoustic features of the audio input, Col 3 lines 65-66);
enhancing the plurality of audio signals with the first set of acoustic characteristics (generating enhanced endpointer signal 124, Col 5 lines 34-47, Fig. 1 element 121);
associating a first set of metadata with the first set of acoustic characteristics and the enhanced plurality of audio signals, wherein the first set of metadata comprises a representation of the homogenous voice data group (voice profile generated based on acoustic features of an initial portion of speech represents characteristics of a user’s voice such as pitch and range, Col 4 lines 37-54; this is considered to associate the pitch and range representations with the acoustic features used to generate the profile, and is considered “associated” with the first segment of enhanced endpointer signal 121, which corresponds to the first user speaking “OK, computer, remind me to buy milk”, Fig. 1); and
creating a voice biometric profile for the homogenous voice data group, wherein the voice biometric profile comprises the first set of metadata associated with the homogenous voice data group (generating a voice profile based on the acoustic features and specific speech data, the voice profile uniquely representing the characteristics of a user’s voice such as pitch and range, i.e. biometric information, Col 4 lines 42-67).
Sharifi does not specifically mention at least one of a spectral peak or a formant associated with the voice.
Gainsboro discloses at least one of a spectral peak or a formant associated with the voice (the peaks in the spectrum in Fig. 4 are the formants of speech, [0021], Fig. 4).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sharifi by including the at least one of a spectral peak or a formant associated with the voice disclosed by Gainsboro as metadata in the voice biometric profile, along with the pitch and range information disclosed by Sharifi, because, as Gainsboro describes at [0021], the use of spectral characteristics such as formants, which are indicative of the physiology of a particular speaker’s vocal tract, to differentiate speakers as voice prints or voice signatures was known in the art. Doing so would have predictably improved the accuracy of speaker identification in multi-party conversations, as suggested by Gainsboro ([0015]). The references cited are analogous art in the same field of speech processing.
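For illustration only, and not drawn from Gainsboro’s implementation, the kind of broad spectral peaks relied on above can be located with a generic textbook approach: smooth one voiced frame’s log-magnitude spectrum and pick its strongest broad peaks as formant candidates. Every name and value below is hypothetical:

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical sketch: rough formant candidates as the strongest broad
# peaks of a smoothed log-magnitude spectrum of a single voiced frame.
def broad_peak_freqs(frame, sample_rate, num_peaks=3):
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    log_mag = 20.0 * np.log10(spectrum + 1e-12)
    smoothed = np.convolve(log_mag, np.ones(9) / 9.0, mode="same")
    peaks, _ = find_peaks(smoothed, distance=15)  # keep peaks well separated
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    strongest = peaks[np.argsort(smoothed[peaks])[::-1][:num_peaks]]
    return np.sort(freqs[strongest])

sr = 16000
t = np.arange(512) / sr
# Synthetic frame with energy near typical first three formant regions.
frame = sum(np.sin(2 * np.pi * f * t) for f in (700.0, 1200.0, 2600.0))
print(broad_peak_freqs(frame, sr))
```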
Consider claim 13. Sharifi discloses a system for voice differentiation (system that identifies speech endpoints of a particular speaker, Col 5 lines 57-58), comprising:
an audio receiver configured to receive an analog audio signal (system receives audio data from microphone, Col 5 lines 61-62);
an analog-to-digital converter configured to convert the analog audio signal received by the audio receiver to a digital audio signal (analog-to-digital converter, Col 5 lines 65-66); and
a biometric computing component (a voice profile uniquely representing the characteristics of a user’s voice such as pitch and range, i.e. biometric information Col 4 lines 42-67, stored in acoustic features database, Col 6 lines 20-27), comprising:
a processor and a non-transitory computer readable medium with computer executable instructions embedded thereon (processor executing instructions from memory, Col 11 lines 49-54), the computer executable instructions configured to cause the processor to:
extract acoustic characteristics from the digital audio signal (computing device generates acoustic features of the audio input, Col 3 lines 65-66);
associate metadata to the extracted acoustic characteristics, wherein the metadata comprises a representation of the digital audio signal (voice profile generated based on acoustic features of an initial portion of speech represents characteristics of a user’s voice such as pitch and range, Col 4 lines 37-54; this is considered to associate the pitch and range representations with the acoustic features used to generate the profile);
group the metadata into a first voice biometric profile if a first homogenous voice data group is detected in the digital audio signal (generating a voice profile based on the acoustic features and specific speech data, the voice profile uniquely representing the characteristics of a user’s voice such as pitch and range, i.e. biometric information Col 4 lines 42-67, and comparing subsequent portions of audio data to group into a segment within the same endpoints if they are from the same speaker, Col 2 lines 46-56, Fig. 1);
differentiate the first voice biometric profile from a second voice biometric profile if two homogenous voice data groups are detected in the digital audio signal (determining subsequent speech does not match the generated profile and instead belongs to a different profile using voice profile change detector 112, Col 4 lines 56-64, Fig. 1); and
suppress audio signal not associated with the first voice biometric profile (output the portions of audio data that correspond to the particular user and remove the portions of the audio data that do not, i.e. that correspond to the second speaker, Col 10 lines 28-33, Fig. 1).
Sharifi does not specifically mention at least one broad peak of a formant graph corresponding to the digital audio signal.
Gainsboro discloses at least one broad peak of a formant graph corresponding to the digital audio signal (the peaks in the spectrum in Fig. 4 are broad peaks which are the formants of the digital speech signal, [0021], Fig. 4).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sharifi by including at least one broad peak of a formant graph corresponding to the digital audio signal as disclosed by Gainsboro as metadata in the voice biometric profile along with the pitch and range information disclosed by Sharifi for reasons similar to those for claim 1.
Consider claim 23. Sharifi discloses a method for differentiating a target audio signal from a plurality of audio signals (identifying speech endpoints of a particular speaker, Col 5 lines 57-58), the method comprising:
receiving a plurality of audio signals via an audio input device (audio input 103, Fig. 1, received from microphone, Col 5 lines 61-62);
extracting acoustic characteristics from each audio signal of the plurality of audio signals (computing device generates acoustic features of the audio input, Col 3 lines 65-66);
associating each acoustic characteristic to a set of metadata, wherein the set of metadata comprises a representation associated with the acoustic characteristic (voice profile generated based on acoustic features of an initial portion of speech represents characteristics of a user’s voice such as pitch and range, Col 4 lines 37-54; this is considered to associate the pitch and range representations with the acoustic features used to generate the profile);
grouping each set of metadata with other sets of metadata representative of the target audio signal (generating a voice profile based on the acoustic features and specific speech data, the voice profile uniquely representing the characteristics of a user’s voice such as pitch and range, i.e. biometric information Col 4 lines 42-67, and comparing subsequent portions of audio data to group into a segment within the same endpoints if they are from the same speaker, Col 2 lines 46-56, Fig. 1); and
differentiating metadata associated with the target audio signal from metadata associated with remaining plurality of audio signals (determining subsequent speech does not match the generated profile and instead belongs to a different profile using voice profile change detector 112, Col 4 lines 56-64, Fig. 1, using profile characteristics such as representations of characteristics of a user’s voice such as pitch and range, Col 4 lines 37-54; this is considered to differentiate the metadata from the metadata in other profiles);
enhancing audio signals associated with the target audio signal based on the differentiated metadata (generating enhanced endpointer signal 124, Col 5 lines 34-47, Fig. 1 element 121, based on determining subsequent speech does not match the generated profile and instead belongs to a different profile using voice profile change detector 112, Col 4 lines 56-64, Fig. 1);
suppressing audio signals not associated with the target audio signal based on the differentiated metadata (enhanced endpointer signal 124, Col 5 lines 34-47, Fig. 1 element 121, suppresses the utterance “Maybe later” based on determining it does not match the generated profile for “OK computer…” and instead belongs to a different profile using voice profile change detector 112, Col 4 lines 56-64, Fig. 1).
Sharifi does not specifically mention at least one of a spectral peak or a formant.
Gainsboro discloses at least one of a spectral peak or a formant (the peaks in the spectrum in Fig. 4 are the formants of speech, [0021], Fig. 4).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sharifi by including the at least one of a spectral peak or a formant disclosed by Gainsboro as metadata in the voice biometric profile, along with the pitch and range information disclosed by Sharifi, for reasons similar to those for claim 1.
Consider claim 2. Sharifi discloses: identifying a target audio signal from the plurality of audio signals based on the voice biometric profile (identifying subsequent speech segments based on the generated profile, Col 4 lines 56-64, Fig. 1).
Consider claim 3. Sharifi discloses: suppressing audio signals not associated with the first set of acoustic characteristics (remove the portions of the audio data that do not correspond to the target speaker, Col 10 lines 28-33, Fig. 1).
Consider claim 4. Sharifi discloses:
receiving the plurality of audio signals comprising a second homogenous voice data group (utterances “maybe” and “later” 136, Fig. 1, Col 3 lines 52-57);
extracting a second set of acoustic characteristics from the plurality of audio signals (computing device generates acoustic features of the audio input, Col 3 lines 65-66, e.g. those extracted from “Tomorrow” making up a second set, Fig. 1);
enhancing the plurality of audio signals associated with the second set of acoustic characteristics (generating enhanced endpointer signal 124, Col 5 lines 34-47, Fig. 1 element 121, based on determining subsequent speech does not match the generated profile and instead belongs to a different profile using voice profile change detector 112, Col 4 lines 56-64, Fig. 1, which enhances the “Tomorrow” utterance segment);
suppressing the plurality of audio signals associated with the second set of acoustic characteristics (enhanced endpointer signal 124, Col 5 lines 34-47, Fig. 1 element 121, suppresses the utterance “Maybe later” based on determining it does not match the generated profile for “OK computer…” and instead belongs to a different profile using voice profile change detector 112, Col 4 lines 56-64, Fig. 1);
associating a second set of metadata with the second set of acoustic characteristics (voice profile generated based on acoustic features of an initial portion of speech represents characteristics of a user’s voice such as pitch and range, Col 4 lines 37-54; this is considered to associate the pitch and range representations with the acoustic features used to generate the profile, e.g. for the utterance “Tomorrow”, Fig. 1); and
creating a second voice biometric profile for the second homogenous voice data group, wherein the second voice biometric profile comprises the second set of metadata associated with the second homogenous voice data group (determining subsequent speech does not match the generated profile and instead belongs to a different generated profile using voice profile change detector 112, Col 4 lines 56-64, Fig. 1, the second voice profile based on the acoustic features and specific speech data from the second user’s utterances, the voice profile uniquely representing the characteristics of the second user’s voice such as pitch and range, i.e. biometric information Col 4 lines 42-67).
Consider claim 5. Sharifi discloses: differentiating the second voice biometric profile from the voice biometric profile by comparing the second set of metadata to the first set of metadata (determining a match based on similarities by comparing the voice profiles, Col 7-8 lines 60-2); and isolating audio signals associated with the voice biometric profile (remove the portions of the audio data that do not correspond to the target speaker profile, Col 10 lines 28-33, Fig. 1).
Consider claim 6. Sharifi discloses:
receiving the plurality of audio signals comprising a third homogenous voice data group (the audio signals making up the utterance “tomorrow” 139, Fig. 1, Col 3 lines 52-57);
extracting a third set of acoustic characteristics from the plurality of audio signals (computing device generates acoustic features of the audio input, Col 3 lines 65-66);
enhancing the plurality of audio signals associated with the third set of acoustic characteristics (generating enhanced endpointer signal 124, Col 5 lines 34-47, Fig. 1 element 121, based on determining subsequent speech does not match the generated profile and instead belongs to a different profile using voice profile change detector 112, Col 4 lines 56-64, Fig. 1, which enhances speech for a third utterance segment);
suppressing the plurality of audio signals associated with the first set of acoustic characteristics and the second set of acoustic characteristics (enhanced endpointer signal 124, Col 5 lines 34-47, Fig. 1 element 121, suppresses the utterances not belonging to the profile of the third utterance segment based on determining they do not match the generated profile for “OK computer…” and instead belong to a different profile using voice profile change detector 112, Col 4 lines 56-64, Fig. 1);
associating a third set of metadata with the third set of acoustic characteristics (identifying general speech activity and general endpoints based on the acoustic features, Col 4 lines 11-13, Col 4 lines 19-21, and generating specific speech data including specific endpoints, Col 4-5 lines 65-12; those for “Tomorrow” considered a “third set”); and
creating a third voice biometric profile for the third homogenous voice data group, wherein the third voice biometric profile comprises the third set of metadata associated with the third homogenous voice data group (generating a new, third profile for the subsequent speech, Col 4 lines 56-64, Fig. 1, the third voice profile based on the acoustic features and specific speech data from the third utterances, the voice profile uniquely representing the characteristics of the user voice such as pitch and range, i.e. biometric information Col 4 lines 42-67).
Consider claim 7. Sharifi discloses: differentiating the voice biometric profile from the second and third voice biometric profiles by comparing the first set of metadata to the second and third sets of metadata (determining a match or not based on similarities by comparing the voice profiles, Col 7-8 lines 60-2); and isolating audio signals associated with the voice biometric profile (remove the portions of the audio data that do not correspond to the target speaker profile, Col 10 lines 28-33, Fig. 1, the endpoints of “OK computer, remind me to buy milk” considered to isolate this utterance from utterances 136 and 139).
Consider claim 8. Sharifi discloses the voice biometric profile uniquely identifies a target speaker (voice profile specific voice activity detection and enhanced endpointer identify beginning and ending points of utterances of a particular speaker in the audio data, Col 8 lines 9-12).
Consider claim 9. Sharifi discloses converting each audio signal of the plurality of audio signals to a digital signal via an analog-to-digital converter, the digital signal comprising acoustic characteristics extracted from each of the plurality of audio signals (processing the audio data using an analog-to-digital converter and further sampling the digitized audio data, Col 9 lines 29-33).
Consider claim 10. Sharifi discloses suppressing audio signals not associated with the first set of acoustic characteristics comprises removing metadata not associated with a target voice biometric profile (general endpoints associated with speaker 130 are removed from the enhanced endpointer, Fig. 1, Col 5 lines 34-47, Col 10 lines 28-33).
Consider claim 11. Sharifi discloses removing metadata not associated with the target voice biometric profile comprises filtering audio signals not associated with the voice biometric profile (enhanced endpointer passes the speech of speaker 127 while removing the speech of speaker 130, thereby “filtering” the audio signals not associated with speaker 127’s profile, Fig. 1, Col 5 lines 34-47, Col 10 lines 28-33).
Consider claim 14. Sharifi discloses the two homogenous voice data groups comprise the first homogenous voice data group and a second homogenous voice data group (utterance groups 133 and 136, Fig. 1, Col 4 lines 29-36).
Consider claim 15. Sharifi discloses: a third voice biometric profile (voice profile voice activity detector creates a new profile for each utterance, i.e. a third profile for utterance 139, Col 7-8 lines 57-2), if three homogenous voice data groups are detected in the digital audio signal, wherein the three homogenous voice data groups comprise the first homogenous voice data group, the second homogenous voice data group, and a third homogenous voice data group (the audio corresponding to utterances 133, 136, and 139, Col 5 lines 13-33).
Consider claim 16. Sharifi discloses the biometric computing component is further configured to cause the processor to: differentiate the first biometric profile from the second and third biometric profiles (comparing the profiles using a score based on acoustic features, Col 7-8 lines 60-2); and suppress the audio associated with the second and third voice biometric profiles (remove the portions of the audio data that do not correspond to the target speaker profile, Col 10 lines 28-33, Fig. 1; while utterance 139 is not shown removed in the Fig. 1 example, the utterance 136 removed is “associated” with the second and third profiles since the profiles have been compared; alternatively, subsequent utterances from speaker 130 are removed).
Consider claim 17. Sharifi discloses the acoustic characteristics are extracted from the digital audio signal at discrete audio frames (acoustic features based on a particular audio frame, Col 3-4 lines 66-3).
Consider claim 18. Sharifi discloses the computer executable instructions are further configured to cause the processor to isolate the metadata associated with the first homogenous voice data group to create the first voice biometric profile (generating a voice profile based on the acoustic features and specific speech data, the voice profile uniquely representing the characteristics of a user’s voice such as pitch and range, i.e. biometric information, stored as a separate, i.e. isolated, profile in the acoustic features database, Col 6 lines 61-65, Col 4 lines 42-67).
Consider claim 19. Sharifi discloses the computer executable instructions are further configured to isolate, from the plurality of audio signals, metadata associated with the first homogenous voice data group based on the first voice biometric profile (the voice profile is a separate, i.e. isolated, profile in the acoustic features database, isolating speaker endpoints from speech from other speakers, Col 6 lines 61-65, uniquely representing the characteristics of a user’s voice such as pitch and range, i.e. biometric information, Col 4 lines 42-67).
Consider claim 20. Sharifi discloses the computer executable instructions are further configured to filter metadata not associated with a target voice biometric profile (general endpoints not associated with the profile of speaker 127, Fig. 1, Col 5 lines 34-56).
Consider claim 21. Sharifi discloses the computer executable instructions are further configured to cause the processor to enhance an audio signal associated with the first homogenous voice data group by differentiating audio associated with the first voice biometric profile of the first homogenous voice data group from audio associated with the second voice biometric profiles of the second homogenous voice data group (generating enhanced endpointer signal 124, Col 5 lines 34-47, Fig. 1 element 121, while removing the portions of the audio data that do not correspond to the target speaker, Col 10 lines 28-33, Fig. 1, identified by comparing subsequent portions of audio data to group into a segment within the same endpoints if they are from the same speaker, Col 2 lines 46-56, Fig. 1).
Consider claim 24. Sharifi discloses the target audio signal comprises a homogenous voice data group (the utterances 133 and 139 making up a homogenous voice data group as they are uttered by speaker 127, Fig. 1), wherein the homogenous voice data group comprises acoustic characteristics, and wherein the acoustic characteristics are associated with a set of metadata that is grouped into a voice biometric profile that uniquely identifies a target speaker (generating a voice profile based on the acoustic features and specific speech data, the voice profile uniquely representing the characteristics of a user’s voice such as pitch and range, i.e. biometric information, Col 4 lines 42-67, comparing subsequent portions of audio data to group into a segment within the same endpoints if they are from the same speaker, Col 2 lines 46-56, Fig. 1).
Consider claim 25. Sharifi discloses the plurality of audio signals comprises a first homogenous voice data group and a second homogenous voice data group (utterance groups 133 and 136, Fig. 1, Col 4 lines 29-36), wherein each homogenous voice data group comprises unique and different acoustic characteristics represented by metadata (characteristics of a user’s voice such as pitch and range, i.e. biometric information, Col 4 lines 42-67), and wherein the acoustic characteristics extracted from each homogenous voice data group are grouped into different voice biometric profiles that identify each speaker as a different homogenous voice data group (generating voice profiles based on the acoustic features and specific speech data, each voice profile uniquely representing the characteristics of a user’s voice such as pitch and range, i.e. biometric information, for the group of utterances from that speaker, Col 6 lines 61-65, Col 4 lines 42-67).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Sharifi in view of Gainsboro, and further in view of Wang et al. (US 20220301573).
Consider claim 12. Sharifi and Gainsboro do not disclose, but Wang does disclose, that a target audio signal is isolated from the plurality of audio signals via a machine learning method stored in a memory of a user device, the machine learning method configured to isolate the target audio signal associated with the homogenous voice data group (a mask is predicted using voice filter model 112, which uses a CNN and an RNN, [0065], [0066], and which isolates target utterances, [0067]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sharifi and Gainsboro such that a target audio signal is isolated from the plurality of audio signals via a machine learning method stored in a memory of a user device, the machine learning method configured to isolate the target audio signal associated with the homogenous voice data group, in order to avoid over-suppression in the audio signal, predictably resulting in a decreased ASR error rate, as suggested by Wang ([0013]). The references cited are analogous art in the same field of speech processing.
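As an illustrative aside only, and not Wang’s voice filter model, mask-based isolation of a target signal can be sketched as applying a predicted time-frequency mask to the mixture’s spectrogram; a trained CNN/RNN would predict the mask from the mixture and a speaker profile, while a random array stands in for it here, and every name and value is hypothetical:

```python
import numpy as np
from scipy.signal import stft, istft

# Hypothetical sketch (not Wang's model): isolate a target by multiplying
# the mixture spectrogram with a predicted mask, then inverting the STFT.
sr = 16000
mixture = np.random.randn(2 * sr)    # stand-in for 2 s of mixed audio
f, t, spec = stft(mixture, fs=sr, nperseg=512)
mask = np.random.rand(*spec.shape)   # stand-in for the model's output mask
_, isolated = istft(spec * mask, fs=sr, nperseg=512)
```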
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Sharifi in view of Gainsboro, and further in view of Headings et al. (US 20230402041).
Consider claim 22. Sharifi discloses a transceiver communicatively coupled to a server (server transmits and receives data from client, Col 12 lines 51-62); a voice biometric profile (voice profile uniquely representing the characteristics of a user’s voice such as pitch and range, i.e. biometric information, Col 6 lines 61-65, Col 4 lines 42-67); and the system for voice differentiation identifies a target homogenous voice data group (generating a voice profile based on the acoustic features and specific speech data, the voice profile uniquely representing the characteristics of a user’s voice such as pitch and range, i.e. biometric information, Col 4 lines 42-67, comparing subsequent portions of audio data to group into a segment within the same endpoints if they are from the same speaker, Col 2 lines 46-56, Fig. 1).
Sharifi and Gainsboro do not specifically mention the server comprises a plurality of known voice profiles, and wherein the system for voice differentiation identifies a target homogenous voice data group by transmitting a voice biometric profile to the server, matching the transmitted voice biometric profile with a known voice biometric profile, and transmitting an identification of the target homogenous voice data group to the system for voice differentiation from the server.
Headings discloses the server comprises a plurality of known voice profiles (voice profiles in data repository on server 130, [0050]), transmitting a voice profile to the server (voice characteristics indicated by the metadata, [0050]), matching the transmitted voice profile with a known voice profile (matching to a profile in the data repository, [0050]), and transmitting an identification of the target homogenous voice data group to the system for voice differentiation from the server (generating results and providing them to user device 110 over network 108, [0050]-[0051], Fig. 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Sharifi and Gainsboro such that the server comprises a plurality of known voice profiles, and wherein the system for voice differentiation identifies a target homogenous voice data group by transmitting a voice biometric profile to the server, matching the transmitted voice biometric profile with a known voice biometric profile, and transmitting an identification of the target homogenous voice data group to the system for voice differentiation from the server, in order to assist a user who does not recall an individual’s name, as suggested by Headings ([0001]), predictably reducing the user’s social embarrassment, as suggested by Headings ([0002]). The references cited are analogous art in the same field of speech processing.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders can be reached on 571/272-7516.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Jesse S Pullias/
Primary Examiner, Art Unit 2655 03/03/26