Prosecution Insights
Last updated: April 19, 2026
Application No. 18/777,278

CENTRALIZED SYNTHETIC SPEECH DETECTION SYSTEM USING WATERMARKING

Non-Final OA (§102, §103)

Filed: Jul 18, 2024
Examiner: THOMAS-HOMESCU, ANNE L
Art Unit: 2656
Tech Center: 2600 — Communications
Assignee: Pindrop Security Inc.
OA Round: 1 (Non-Final)

Grant Probability: 77% (Favorable)
OA Rounds: 1-2
To Grant: 2y 8m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 77% (above average; 276 granted / 360 resolved; +14.7% vs TC avg)
Interview Lift: +36.7% in resolved cases with interview
Typical Timeline: 2y 8m avg prosecution; 34 currently pending
Career History: 394 total applications across all art units

Statute-Specific Performance

§101: 16.7% (-23.3% vs TC avg)
§103: 50.7% (+10.7% vs TC avg)
§102: 19.9% (-20.1% vs TC avg)
§112: 7.5% (-32.5% vs TC avg)
Based on career data from 360 resolved cases; Tech Center averages are estimates.

Office Action

§102, §103
DETAILED ACTION

1. This communication is in response to the Application filed on 18 July 2024. Claims 1-20 are pending and have been examined.

2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-7, 10-17, and 20 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by US 20240221763, hereinafter referred to as Ginsburg.

Regarding claim 1, Ginsburg discloses a computer-implemented method comprising: obtaining, by a computer, an audio signal including synthetic speech (“In at least one embodiment, this can include providing the text input to a neural network trained to generate audio data that includes a synthetic representation of a person speaking or uttering the speech input,” Ginsburg, para [0024].); extracting, by the computer, metadata from a watermark of the audio signal by applying a set of keys associated with a plurality of text-to-speech (TTS) services to the audio signal (“Where a key is used for the watermark, this can be used to determine that the audio did not come from a specific source, which may be an indication that the content should not be relied upon or may otherwise be untrustworthy. In situations where a watermark can be detected but not verified, or does not match the expected key or watermark content, an indication can be provided as well that this content may not be trustworthy,” Ginsburg, para [0028].), the metadata indicating an origin of the synthetic speech in the audio signal (Ginsburg, para [0028]. The watermark contains information (i.e., metadata) indicating an origin (i.e., specific source).); and generating, by the computer, based on the metadata as extracted from the watermark, a notification indicating that the audio signal includes the synthetic speech (“Once such a watermark is detected, an indication can be provided 418 that the presentation includes at least some amount of synthesized speech audio. This can include, for example, displaying an icon, notification, or warning, providing an audio noise or sound, or providing haptic feedback, among other such options,” Ginsburg, para [0034]. Here, a notification is displayed based on the information (i.e., metadata) that the speech audio contains synthetic speech.).

As to claim 11, system claim 11 and method claim 1 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 11 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.
Regarding claim 2, Ginsburg discloses the computer-implemented method of claim 1, further comprising generating a score for each key of the set of keys to determine that the audio signal includes the watermark, wherein the watermark was generated using the key of the set of keys (“In situations where a specified key is used for the watermark, a watermark detection process 226 can identify or locate the appropriate key, such as from a key repository 228, and can analyze the audio data to determine whether a watermark corresponding to that key can be identified with at least a minimum level or threshold of confidence or certainty. This can include determining whether a signal pattern matching the key is detected in the corresponding audio signal, such as is discussed in more detail elsewhere herein. If such a watermark can be detected, then the voice application 224 can determine that the audio includes at least some synthetic data, and can provide an indication of synthetic data when providing the audio output for presentation via an output device 230, such as a speaker or playback system,” Ginsburg, para [0028]. The confidence/certainty is interpreted as a score.).

As to claim 12, system claim 12 and method claim 2 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 12 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 3, Ginsburg discloses the computer-implemented method of claim 2, further comprising transmitting the key to a TTS service to generate the watermark (“An audio watermark signal can be generated 408 using this key, such as may encode the key into an audio watermark signal using spread spectrum-based watermarking. This audio watermark signal can then be added to, or embedded 410 within, the audio signal or data containing the synthesized speech audio,” Ginsburg, para [0033].).

As to claim 13, system claim 13 and method claim 3 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 13 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 4, Ginsburg discloses the computer-implemented method of claim 1, wherein the metadata includes one or more of a service identifier of a TTS service (“In addition to watermarking, other authenticity verification mechanisms can be used. For example, watermarks can be added that are sequential in nature, such as monotonically increasing in value with regular periodicity, such that it can be determined whether any portions of the audio were added, removed, or reordered… Various other information may be encoded or included as well, as may relate to voice or source identity, source location, source IP address, synthesis tool, account number, user identifier, digital signature, or other such information, which may be helpful in identifying or verifying a source,” Ginsburg, para [0031]. Here, the “other information” is interpreted as metadata.), a model identifier of a TTS model, a user identifier of a user of the TTS service (“In addition to watermarking, other authenticity verification mechanisms can be used. For example, watermarks can be added that are sequential in nature, such as monotonically increasing in value with regular periodicity, such that it can be determined whether any portions of the audio were added, removed, or reordered. This information can also, or alternatively, be included as side information, such as may be stored in metadata, although this may not be as secure. Various other information may be encoded or included as well, as may relate to voice or source identity, source location, source IP address, synthesis tool, account number, user identifier, digital signature, or other such information, which may be helpful in identifying or verifying a source,” Ginsburg, para [0031].), or a timestamp indicating when the synthetic speech was generated.

As to claim 14, system claim 14 and method claim 4 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 14 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 5, Ginsburg discloses the computer-implemented method of claim 1, further comprising transmitting an alert to a TTS service based on the origin of the synthetic speech in the audio signal (“In addition to watermarking, other authenticity verification mechanisms can be used. For example, watermarks can be added that are sequential in nature, such as monotonically increasing in value with regular periodicity, such that it can be determined whether any portions of the audio were added, removed, or reordered. This information can also, or alternatively, be included as side information, such as may be stored in metadata, although this may not be as secure. Various other information may be encoded or included as well, as may relate to voice or source identity, source location, source IP address, synthesis tool, account number, user identifier, digital signature, or other such information, which may be helpful in identifying or verifying a source,” Ginsburg, para [0031].).

As to claim 15, system claim 15 and method claim 5 are related as method and system of using same, with each claimed element’s function corresponding to the method step.
Accordingly claim 15 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 6, Ginsburg discloses the computer-implemented method of claim 1, wherein the notification includes a portion of the metadata as extracted from the watermark (“In this example, the corresponding display that is presented via a presentation device 118 can provide or include an indication that the audio data being presented includes at least some modified or synthesized content. This can include, for example, content 158 stating that the content is at least partially synthesized, or can include an icon 156 or graphical element indicating presence of synthetic content, among other such options,” Ginsburg, para [0025].).

As to claim 16, system claim 16 and method claim 6 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 16 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 7, Ginsburg discloses the computer-implemented method of claim 1, further comprising: receiving, by the computer, from the origin of the synthetic speech, the audio signal including the watermark (Ginsburg, para [0028].); determining, by the computer, that a robustness of the watermark exceeds a predetermined threshold (“In situations where a specified key is used for the watermark, a watermark detection process 226 can identify or locate the appropriate key, such as from a key repository 228, and can analyze the audio data to determine whether a watermark corresponding to that key can be identified with at least a minimum level or threshold of confidence or certainty. This can include determining whether a signal pattern matching the key is detected in the corresponding audio signal, such as is discussed in more detail elsewhere herein. If such a watermark can be detected, then the voice application 224 can determine that the audio includes at least some synthetic data, and can provide an indication of synthetic data when providing the audio output for presentation via an output device 230, such as a speaker or playback system,” Ginsburg, para [0028]. The confidence/certainty is interpreted as a score.); and transmitting an approval of the watermark to the origin of the synthetic speech (“Once such a watermark is detected, an indication can be provided 418 that the presentation includes at least some amount of synthesized speech audio. This can include, for example, displaying an icon, notification, or warning, providing an audio noise or sound, or providing haptic feedback, among other such options. In some instances, the presentation application or platform may ask a user, viewer, or participant whether to proceed with the presentation, or may block playback of the presentation with synthesized audio, among other such options,” Ginsburg, para [0028].).

As to claim 17, system claim 17 and method claim 7 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 17 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.
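The detection mechanism quoted above from Ginsburg's para [0028] (score every key in a repository against the audio, and accept the watermark only when the per-key confidence clears a threshold) can be sketched as a simple spread-spectrum correlator. This is an illustrative sketch only, not Ginsburg's actual implementation: the SHA-256 key-to-chip derivation, the normalized-correlation score, and the 0.1 threshold are all assumptions made here for demonstration.

```python
import hashlib

import numpy as np


def key_to_chips(key: str, n: int) -> np.ndarray:
    """Derive a deterministic +/-1 pseudo-noise chip sequence from a key (assumed scheme)."""
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=n)


def detect_watermark(audio: np.ndarray, keys: list[str], threshold: float = 0.1):
    """Score each candidate key; return (matched_key, score), key=None if below threshold."""
    best_key, best_score = None, 0.0
    for key in keys:
        chips = key_to_chips(key, len(audio))
        # Normalized correlation between the audio and this key's chip
        # sequence serves as the per-key confidence score.
        score = abs(float(np.dot(audio, chips))) / (
            np.linalg.norm(audio) * np.sqrt(len(audio)) + 1e-12
        )
        if score > best_score:
            best_key, best_score = key, score
    return (best_key, best_score) if best_score >= threshold else (None, best_score)
```

A matched key would identify which TTS service embedded the mark, supplying the origin "metadata" that the claim recites; no match among the registered keys leaves the audio unattributed.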
Regarding claim 10, Ginsburg discloses the computer-implemented method of claim 1, further comprising: obtaining, by the computer, a second audio signal including second synthetic speech (“In at least one embodiment, this can include providing the text input to a neural network trained to generate audio data that includes a synthetic representation of a person speaking or uttering the speech input,” Ginsburg, para [0024].); extracting, by the computer, second metadata from a second watermark of the second audio signal (“Where a key is used for the watermark, this can be used to determine that the audio did not come from a specific source, which may be an indication that the content should not be relied upon or may otherwise be untrustworthy. In situations where a watermark can be detected but not verified, or does not match the expected key or watermark content, an indication can be provided as well that this content may not be trustworthy,” Ginsburg, para [0028].), the second metadata indicating a second origin of the second synthetic speech that is different from the origin of the audio signal including the synthetic speech (Ginsburg, para [0028]. The watermark contains information (i.e., metadata) indicating an origin (i.e., specific source).); and generating, by the computer, a second notification indicating that the second audio signal includes the second synthetic speech (“Once such a watermark is detected, an indication can be provided 418 that the presentation includes at least some amount of synthesized speech audio. This can include, for example, displaying an icon, notification, or warning, providing an audio noise or sound, or providing haptic feedback, among other such options,” Ginsburg, para [0034]. Here, a notification is displayed based on the information (i.e., metadata) that the speech audio contains synthetic speech. The examiner notes that there is nothing in the claim language to preclude the same passages of Ginsburg being applied to another audio speech signal.).

As to claim 20, system claim 20 and method claim 10 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 20 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 8-9 and 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 20240221763, hereinafter referred to as Ginsburg, in view of US 20220414244, hereinafter referred to as Atluri et al.

Regarding claim 8, Ginsburg discloses the computer-implemented method of claim 1, but not wherein the watermark includes a consent watermark, and wherein the notification indicates usage consent parameters of the consent watermark. Atluri et al. is cited to disclose wherein the watermark includes a consent watermark, and wherein the notification indicates usage consent parameters of the consent watermark (“Processing proceeds to step 464, where encrypt mod 522, attaches the identifier and encrypts the metadata. The metadata is embedded with the image information and the watermark is embedded in the image file.
Some watermarks are visible to the receiver while some watermarks are only readable by the operating system for determining the consent level, or consent parameter, of the image file,” Atluri et al., para [0057].). Atluri et al. benefits Ginsburg by allowing a sender to control the sharing of content with a recipient, thereby increasing security of delivered content. Therefore, it would be obvious for one skilled in the art to combine the teachings of Ginsburg with those of Atluri et al. to better control the secure delivery of synthesized speech content.

As to claim 18, system claim 18 and method claim 8 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 18 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 9, Ginsburg discloses the computer-implemented method of claim 1, but not wherein the watermark includes an authorization watermark, and wherein the notification indicates authorization parameters of the authorization watermark. Atluri et al. is cited to disclose wherein the watermark includes an authorization watermark, and wherein the notification indicates authorization parameters of the authorization watermark (“Processing proceeds to step 464, where encrypt mod 522, attaches the identifier and encrypts the metadata. The metadata is embedded with the image information and the watermark is embedded in the image file. Some watermarks are visible to the receiver while some watermarks are only readable by the operating system for determining the consent level, or consent parameter, of the image file,” Atluri et al., para [0057]. Here, “consent” is considered equivalent to “authorization”.). Atluri et al. benefits Ginsburg by allowing a sender to control the sharing of content with a recipient, thereby increasing security of delivered content. Therefore, it would be obvious for one skilled in the art to combine the teachings of Ginsburg with those of Atluri et al. to better control the secure delivery of synthesized speech content.

As to claim 19, system claim 19 and method claim 9 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 19 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See attached PTO-892. In particular, the examiner notes Davis et al., Graham, and Jacobson as describing watermarks and metadata applied to synthesized speech.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE L THOMAS-HOMESCU whose telephone number is (571) 272-0899. The examiner can normally be reached on Mon-Fri 8-6.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh M Mehta, can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANNE L THOMAS-HOMESCU/
Primary Examiner, Art Unit 2656
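The embedding side relied on for claims 3 and 13 (Ginsburg, para [0033]: encoding a key into an audio watermark signal using spread-spectrum watermarking and embedding that signal in the synthesized speech) reduces, in its simplest form, to adding a key-derived, low-amplitude pseudo-noise sequence to the samples. A minimal sketch under assumed parameters; the SHA-256 seed derivation, the embedding strength `alpha`, and the function names are illustrative assumptions, not Ginsburg's disclosure:

```python
import hashlib

import numpy as np


def watermark_signal(key: str, n_samples: int, alpha: float = 0.01) -> np.ndarray:
    """Encode a key as a low-amplitude +/-alpha pseudo-noise watermark (assumed scheme)."""
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    return alpha * rng.choice([-1.0, 1.0], size=n_samples)


def embed_watermark(audio: np.ndarray, key: str, alpha: float = 0.01) -> np.ndarray:
    """Add the spread-spectrum watermark signal to the synthesized-speech samples."""
    return audio + watermark_signal(key, len(audio), alpha)
```

Because the chip sequence is derived deterministically from the key, a detector holding the same key can later correlate the mark back out of the audio, while the low amplitude keeps it inaudible; that asymmetry is what makes per-service keys usable as origin metadata.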

Prosecution Timeline

Jul 18, 2024: Application Filed
Jan 15, 2026: Non-Final Rejection — §102, §103
Apr 16, 2026: Examiner Interview Summary
Apr 16, 2026: Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592241: METHOD AND APPARATUS FOR ENCODING AND DECODING AUDIO SIGNAL USING COMPLEX POLAR QUANTIZER
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12591741: VIOLATION PREDICTION APPARATUS, VIOLATION PREDICTION METHOD AND PROGRAM
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12573369: METHOD FOR CONTROLLING UTTERANCE DEVICE, SERVER, UTTERANCE DEVICE, AND PROGRAM
Granted Mar 10, 2026 (2y 5m to grant)

Patent 12561684: Evaluating User Status Via Natural Language Processing and Machine Learning
Granted Feb 24, 2026 (2y 5m to grant)

Patent 12554926: METHOD, DEVICE, COMPUTER EQUIPMENT AND STORAGE MEDIUM FOR DETERMINING TEXT BLOCKS OF PDF FILE
Granted Feb 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 77%
With Interview: 99% (+36.7%)
Median Time to Grant: 2y 8m
PTA Risk: Low

Based on 360 resolved cases by this examiner. Grant probability derived from career allow rate.
