Prosecution Insights
Last updated: April 19, 2026
Application No. 18/777,278

CENTRALIZED SYNTHETIC SPEECH DETECTION SYSTEM USING WATERMARKING

Non-Final OA (§102, §103)

Filed: Jul 18, 2024
Examiner: THOMAS-HOMESCU, ANNE L
Art Unit: 2656
Tech Center: 2600 — Communications
Assignee: Pindrop Security Inc.
OA Round: 1 (Non-Final)

Grant Probability: 77% (Favorable)
OA Rounds: 1-2
To Grant: 2y 8m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 77% (above average; 276 granted / 360 resolved; +14.7% vs TC avg)
Interview Lift: +36.7% in resolved cases with interview
Typical Timeline: 2y 8m avg prosecution; 34 currently pending
Career History: 394 total applications across all art units

Statute-Specific Performance

§101: 16.7% (-23.3% vs TC avg)
§103: 50.7% (+10.7% vs TC avg)
§102: 19.9% (-20.1% vs TC avg)
§112: 7.5% (-32.5% vs TC avg)
Based on career data from 360 resolved cases; Tech Center averages are estimates.

Office Action

§102, §103
DETAILED ACTION

1. This communication is in response to the Application filed on 18 July 2024. Claims 1-20 are pending and have been examined.

2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-7, 10-17, and 20 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by US 20240221763, hereinafter referred to as Ginsburg.

Regarding claim 1, Ginsburg discloses a computer-implemented method comprising: obtaining, by a computer, an audio signal including synthetic speech (“In at least one embodiment, this can include providing the text input to a neural network trained to generate audio data that includes a synthetic representation of a person speaking or uttering the speech input,” Ginsburg, para [0024].); extracting, by the computer, metadata from a watermark of the audio signal by applying a set of keys associated with a plurality of text-to-speech (TTS) services to the audio signal (“Where a key is used for the watermark, this can be used to determine that the audio did not come from a specific source, which may be an indication that the content should not be relied upon or may otherwise be untrustworthy. In situations where a watermark can be detected but not verified, or does not match the expected key or watermark content, an indication can be provided as well that this content may not be trustworthy,” Ginsburg, para [0028].), the metadata indicating an origin of the synthetic speech in the audio signal (Ginsburg, para [0028]. The watermark contains information (i.e., metadata) indicating an origin (i.e., specific source).); and generating, by the computer, based on the metadata as extracted from the watermark, a notification indicating that the audio signal includes the synthetic speech (“Once such a watermark is detected, an indication can be provided 418 that the presentation includes at least some amount of synthesized speech audio. This can include, for example, displaying an icon, notification, or warning, providing an audio noise or sound, or providing haptic feedback, among other such options,” Ginsburg, para [0034]. Here, a notification is displayed based on the information (i.e., metadata) that the speech audio contains synthetic speech.).

As to claim 11, system claim 11 and method claim 1 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 11 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.
Regarding claim 2, Ginsburg discloses the computer-implemented method of claim 1, further comprising generating a score for each key of the set of keys to determine that the audio signal includes the watermark, wherein the watermark was generated using the key of the set of keys (“In situations where a specified key is used for the watermark, a watermark detection process 226 can identify or locate the appropriate key, such as from a key repository 228, and can analyze the audio data to determine whether a watermark corresponding to that key can be identified with at least a minimum level or threshold of confidence or certainty. This can include determining whether a signal pattern matching the key is detected in the corresponding audio signal, such as is discussed in more detail elsewhere herein. If such a watermark can be detected, then the voice application 224 can determine that the audio includes at least some synthetic data, and can provide an indication of synthetic data when providing the audio output for presentation via an output device 230, such as a speaker or playback system,” Ginsburg, para [0028]. The confidence/certainty is interpreted as a score.).

As to claim 12, system claim 12 and method claim 2 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 12 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 3, Ginsburg discloses the computer-implemented method of claim 2, further comprising transmitting the key to a TTS service to generate the watermark (“An audio watermark signal can be generated 408 using this key, such as may encode the key into an audio watermark signal using spread spectrum-based watermarking. This audio watermark signal can then be added to, or embedded 410 within, the audio signal or data containing the synthesized speech audio,” Ginsburg, para [0033].).

As to claim 13, system claim 13 and method claim 3 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 13 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 4, Ginsburg discloses the computer-implemented method of claim 1, wherein the metadata includes one or more of a service identifier of a TTS service (“In addition to watermarking, other authenticity verification mechanisms can be used. For example, watermarks can be added that are sequential in nature, such as monotonically increasing in value with regular periodicity, such that it can be determined whether any portions of the audio were added, removed, or reordered… Various other information may be encoded or included as well, as may relate to voice or source identity, source location, source IP address, synthesis tool, account number, user identifier, digital signature, or other such information, which may be helpful in identifying or verifying a source,” Ginsburg, para [0031]. Here, the “other information” is interpreted as metadata.), a model identifier of a TTS model, a user identifier of a user of the TTS service (“In addition to watermarking, other authenticity verification mechanisms can be used. For example, watermarks can be added that are sequential in nature, such as monotonically increasing in value with regular periodicity, such that it can be determined whether any portions of the audio were added, removed, or reordered. This information can also, or alternatively, be included as side information, such as may be stored in metadata, although this may not be as secure. Various other information may be encoded or included as well, as may relate to voice or source identity, source location, source IP address, synthesis tool, account number, user identifier, digital signature, or other such information, which may be helpful in identifying or verifying a source,” Ginsburg, para [0031].), or a timestamp indicating when the synthetic speech was generated.

As to claim 14, system claim 14 and method claim 4 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 14 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 5, Ginsburg discloses the computer-implemented method of claim 1, further comprising transmitting an alert to a TTS service based on the origin of the synthetic speech in the audio signal (“In addition to watermarking, other authenticity verification mechanisms can be used. For example, watermarks can be added that are sequential in nature, such as monotonically increasing in value with regular periodicity, such that it can be determined whether any portions of the audio were added, removed, or reordered. This information can also, or alternatively, be included as side information, such as may be stored in metadata, although this may not be as secure. Various other information may be encoded or included as well, as may relate to voice or source identity, source location, source IP address, synthesis tool, account number, user identifier, digital signature, or other such information, which may be helpful in identifying or verifying a source,” Ginsburg, para [0031].).

As to claim 15, system claim 15 and method claim 5 are related as method and system of using same, with each claimed element’s function corresponding to the method step.
Accordingly claim 15 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 6, Ginsburg discloses the computer-implemented method of claim 1, wherein the notification includes a portion of the metadata as extracted from the watermark (“In this example, the corresponding display that is presented via a presentation device 118 can provide or include an indication that the audio data being presented includes at least some modified or synthesized content. This can include, for example, content 158 stating that the content is at least partially synthesized, or can include an icon 156 or graphical element indicating presence of synthetic content, among other such options,” Ginsburg, para [0025].).

As to claim 16, system claim 16 and method claim 6 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 16 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 7, Ginsburg discloses the computer-implemented method of claim 1, further comprising: receiving, by the computer, from the origin of the synthetic speech, the audio signal including the watermark (Ginsburg, para [0028].); determining, by the computer, that a robustness of the watermark exceeds a predetermined threshold (“In situations where a specified key is used for the watermark, a watermark detection process 226 can identify or locate the appropriate key, such as from a key repository 228, and can analyze the audio data to determine whether a watermark corresponding to that key can be identified with at least a minimum level or threshold of confidence or certainty. This can include determining whether a signal pattern matching the key is detected in the corresponding audio signal, such as is discussed in more detail elsewhere herein. If such a watermark can be detected, then the voice application 224 can determine that the audio includes at least some synthetic data, and can provide an indication of synthetic data when providing the audio output for presentation via an output device 230, such as a speaker or playback system,” Ginsburg, para [0028]. The confidence/certainty is interpreted as a score.); and transmitting an approval of the watermark to the origin of the synthetic speech (“Once such a watermark is detected, an indication can be provided 418 that the presentation includes at least some amount of synthesized speech audio. This can include, for example, displaying an icon, notification, or warning, providing an audio noise or sound, or providing haptic feedback, among other such options. In some instances, the presentation application or platform may ask a user, viewer, or participant whether to proceed with the presentation, or may block playback of the presentation with synthesized audio, among other such options,” Ginsburg, para [0028].).

As to claim 17, system claim 17 and method claim 7 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 17 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.
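The detection mechanism quoted above from Ginsburg's para [0028] (score every key in a repository against the audio, and accept the watermark only when the per-key confidence clears a threshold) can be sketched as a simple spread-spectrum correlator. This is an illustrative sketch only, not Ginsburg's actual implementation: the SHA-256 key-to-chip derivation, the normalized-correlation score, and the 0.1 threshold are all assumptions made here for demonstration.

```python
import hashlib

import numpy as np


def key_to_chips(key: str, n: int) -> np.ndarray:
    """Derive a deterministic +/-1 pseudo-noise chip sequence from a key (assumed scheme)."""
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=n)


def detect_watermark(audio: np.ndarray, keys: list[str], threshold: float = 0.1):
    """Score each candidate key; return (matched_key, score), key=None if below threshold."""
    best_key, best_score = None, 0.0
    for key in keys:
        chips = key_to_chips(key, len(audio))
        # Normalized correlation between the audio and this key's chip
        # sequence serves as the per-key confidence score.
        score = abs(float(np.dot(audio, chips))) / (
            np.linalg.norm(audio) * np.sqrt(len(audio)) + 1e-12
        )
        if score > best_score:
            best_key, best_score = key, score
    return (best_key, best_score) if best_score >= threshold else (None, best_score)
```

A matched key would identify which TTS service embedded the mark, supplying the origin "metadata" that the claim recites; no match among the registered keys leaves the audio unattributed.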
Regarding claim 10, Ginsburg discloses the computer-implemented method of claim 1, further comprising: obtaining, by the computer, a second audio signal including second synthetic speech (“In at least one embodiment, this can include providing the text input to a neural network trained to generate audio data that includes a synthetic representation of a person speaking or uttering the speech input,” Ginsburg, para [0024].); extracting, by the computer, second metadata from a second watermark of the second audio signal (“Where a key is used for the watermark, this can be used to determine that the audio did not come from a specific source, which may be an indication that the content should not be relied upon or may otherwise be untrustworthy. In situations where a watermark can be detected but not verified, or does not match the expected key or watermark content, an indication can be provided as well that this content may not be trustworthy,” Ginsburg, para [0028].), the second metadata indicating a second origin of the second synthetic speech that is different from the origin of the audio signal including the synthetic speech (Ginsburg, para [0028]. The watermark contains information (i.e., metadata) indicating an origin (i.e., specific source).); and generating, by the computer, a second notification indicating that the second audio signal includes the second synthetic speech (“Once such a watermark is detected, an indication can be provided 418 that the presentation includes at least some amount of synthesized speech audio. This can include, for example, displaying an icon, notification, or warning, providing an audio noise or sound, or providing haptic feedback, among other such options,” Ginsburg, para [0034]. Here, a notification is displayed based on the information (i.e., metadata) that the speech audio contains synthetic speech. The examiner notes that there is nothing in the claim language to preclude the same passages of Ginsburg being applied to another audio speech signal.).

As to claim 20, system claim 20 and method claim 10 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 20 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 8-9 and 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 20240221763, hereinafter referred to as Ginsburg, in view of US 20220414244, hereinafter referred to as Atluri et al.

Regarding claim 8, Ginsburg discloses the computer-implemented method of claim 1, but not wherein the watermark includes a consent watermark, and wherein the notification indicates usage consent parameters of the consent watermark. Atluri et al. is cited to disclose wherein the watermark includes a consent watermark, and wherein the notification indicates usage consent parameters of the consent watermark (“Processing proceeds to step 464, where encrypt mod 522, attaches the identifier and encrypts the metadata. The metadata is embedded with the image information and the watermark is embedded in the image file.
Some watermarks are visible to the receiver while some watermarks are only readable by the operating system for determining the consent level, or consent parameter, of the image file,” Atluri et al., para [0057].). Atluri et al. benefits Ginsburg by allowing a sender to control the sharing of content with a recipient, thereby increasing security of delivered content. Therefore, it would be obvious for one skilled in the art to combine the teachings of Ginsburg with those of Atluri et al. to better control the secure delivery of synthesized speech content.

As to claim 18, system claim 18 and method claim 8 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 18 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Regarding claim 9, Ginsburg discloses the computer-implemented method of claim 1, but not wherein the watermark includes an authorization watermark, and wherein the notification indicates authorization parameters of the authorization watermark. Atluri et al. is cited to disclose wherein the watermark includes an authorization watermark, and wherein the notification indicates authorization parameters of the authorization watermark (“Processing proceeds to step 464, where encrypt mod 522, attaches the identifier and encrypts the metadata. The metadata is embedded with the image information and the watermark is embedded in the image file. Some watermarks are visible to the receiver while some watermarks are only readable by the operating system for determining the consent level, or consent parameter, of the image file,” Atluri et al., para [0057]. Here, “consent” is considered equivalent to “authorization”.). Atluri et al. benefits Ginsburg by allowing a sender to control the sharing of content with a recipient, thereby increasing security of delivered content. Therefore, it would be obvious for one skilled in the art to combine the teachings of Ginsburg with those of Atluri et al. to better control the secure delivery of synthesized speech content.

As to claim 19, system claim 19 and method claim 9 are related as method and system of using same, with each claimed element’s function corresponding to the method step. Accordingly claim 19 is similarly rejected under the same rationale as applied above with respect to method claim. And, Ginsburg, fig. 9(902) teaches a processor.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See attached PTO-892. In particular, the examiner notes Davis et al., Graham, and Jacobson as describing watermarks and metadata applied to synthesized speech.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE L THOMAS-HOMESCU whose telephone number is (571) 272-0899. The examiner can normally be reached on Mon-Fri 8-6.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh M Mehta, can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANNE L THOMAS-HOMESCU/
Primary Examiner, Art Unit 2656
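The embedding side relied on for claims 3 and 13 (Ginsburg, para [0033]: encoding a key into an audio watermark signal using spread-spectrum watermarking and embedding that signal in the synthesized speech) reduces, in its simplest form, to adding a key-derived, low-amplitude pseudo-noise sequence to the samples. A minimal sketch under assumed parameters; the SHA-256 seed derivation, the embedding strength `alpha`, and the function names are illustrative assumptions, not Ginsburg's disclosure:

```python
import hashlib

import numpy as np


def watermark_signal(key: str, n_samples: int, alpha: float = 0.01) -> np.ndarray:
    """Encode a key as a low-amplitude +/-alpha pseudo-noise watermark (assumed scheme)."""
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    return alpha * rng.choice([-1.0, 1.0], size=n_samples)


def embed_watermark(audio: np.ndarray, key: str, alpha: float = 0.01) -> np.ndarray:
    """Add the spread-spectrum watermark signal to the synthesized-speech samples."""
    return audio + watermark_signal(key, len(audio), alpha)
```

Because the chip sequence is derived deterministically from the key, a detector holding the same key can later correlate the mark back out of the audio, while the low amplitude keeps it inaudible; that asymmetry is what makes per-service keys usable as origin metadata.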

Prosecution Timeline

Jul 18, 2024: Application Filed
Jan 15, 2026: Non-Final Rejection — §102, §103
Apr 16, 2026: Examiner Interview Summary
Apr 16, 2026: Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592241: METHOD AND APPARATUS FOR ENCODING AND DECODING AUDIO SIGNAL USING COMPLEX POLAR QUANTIZER
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12591741: VIOLATION PREDICTION APPARATUS, VIOLATION PREDICTION METHOD AND PROGRAM
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12573369: METHOD FOR CONTROLLING UTTERANCE DEVICE, SERVER, UTTERANCE DEVICE, AND PROGRAM
Granted Mar 10, 2026 (2y 5m to grant)

Patent 12561684: Evaluating User Status Via Natural Language Processing and Machine Learning
Granted Feb 24, 2026 (2y 5m to grant)

Patent 12554926: METHOD, DEVICE, COMPUTER EQUIPMENT AND STORAGE MEDIUM FOR DETERMINING TEXT BLOCKS OF PDF FILE
Granted Feb 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 77%
With Interview: 99% (+36.7%)
Median Time to Grant: 2y 8m
PTA Risk: Low

Based on 360 resolved cases by this examiner. Grant probability derived from career allow rate.
