DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on 01/16/2026.
Claims 1-20 are pending and have been examined.
All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 10, and 16 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The newly cited art of Pearce teaches that specific information about a user can be rendered as an indicator such as a pop-up image or aura over a player’s character, which teaches the amended claim language when combined with Ben-David, Critzer, and Bodolec, where an alert can be displayed if captured audio includes synthetic sound. Please see the updated mappings below for further detail.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-4, 6, and 10-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ben-David et al. (US PG Pub No. 2012/0239387), hereinafter Ben-David, in view of Critzer et al. (U.S. PG Pub No. 2023/0136241), hereinafter Critzer, in view of Bodolec et al. (U.S. Patent No. 11119568), hereinafter Bodolec, and further in view of Pearce (U.S. PG Pub No. 2009/0075738), hereinafter Pearce.
Regarding claims 1, 10, and 16, Ben-David teaches
(claim 1) A computer-implemented method (a method [0034]), comprising:
(claim 10) A collaborative content generation system (a system [0034]), comprising:
(claim 10) one or more processors, and memory including instructions that, when executed by the one or more processors, cause the system to (the system includes at least one processor coupled to memory, where the memory stores program code to be executed [0075-8]):
(claim 16) A system, comprising (a system [0034]):
(claim 16) one or more processors (the system includes at least one processor [0075])
synthesizing audio data to be associated with a digital avatar in a … virtual environment (source speech is synthesized with specific parameters to generate transformed speech, i.e. synthesizing audio data [0068-71], where the speech may be of an online avatar of a user during gaming, i.e. associated with a digital avatar in a virtual environment [0005],[0011]);
encoding an audio watermark into the audio data, the audio watermark corresponding to a selected key (the steganography component encodes the information into the model parameters which are encoded into the transformed speech as steganographic or watermarking data, i.e. encoding an audio watermark into the audio data [0034], and where the parameters encoded by the steganography may be encrypted such that only those who have access to the decipher key can decipher the information, i.e. the audio watermark corresponding to a selected key [0058]);
providing the audio data for presentation with the digital avatar in the … virtual environment (the online avatar speaks with its transformed voice instead of the player’s voice, i.e. providing the audio data for presentation with the digital avatar in a virtual environment [0002],[0005],[0011],[0071]); and
providing, along with the audio data, an indication, …in the virtual environment, that the audio data was synthesized, based in part on detecting a presence of the audio watermark in the audio data (the speech output is provided, i.e. providing along with the audio data, and a detection component detects if the speech includes a steganography signal, i.e. based in part on detecting a presence of the audio watermark in the audio data, and issues an alert if the signal is detected to inform a user that the speech is not the original voice, i.e. an indication in the virtual environment that the audio data was synthesized [0002],[0005],[0011],[0034],[0071-2]).
While Ben-David provides issuing an alert if the signal is not the original voice, Ben-David does not specifically teach that the indication comprises a graphical element, and thus does not teach
an indication, comprising a graphical element …in the … virtual environment that the audio data was synthesized.
Critzer, however, teaches an indication, comprising a graphical element…in the…virtual environment that the audio data was synthesized (when it is determined that the captured audio includes synthetic sound, i.e. audio data was synthesized, a notification is transmitted to a device associated with the call, such as a notification on the display saying “Alert! This call has been identified as suspicious”, i.e. an indication comprising a graphical element in the virtual environment Fig. 1C,[0038-9],[0064]).
Where Ben-David specifically teaches that the speech may be of an online avatar during gaming, where the system is processing the speech to determine if a steganography signal is present, i.e. virtual environment [0005],[0011],[0072].
Ben-David and Critzer are analogous art because they are from a similar field of endeavor in identifying and acting on synthetic audio data. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the issuing an alert if the signal is not the original voice teachings of Ben-David with sending a notification to a called device, including a display as taught by Critzer. It would have been obvious to combine the references to enable the prevention, mitigation, or rectification of consequences resulting from potential fraud associated with a call (Critzer [0037]).
While Ben-David in view of Critzer provides an online game with an avatar, Ben-David in view of Critzer does not specifically teach that the game is a 3D environment, and thus does not teach
a digital avatar in a three-dimensional (3D) virtual environment.
Bodolec, however, teaches a digital avatar in a three-dimensional (3D) virtual environment (a consumer gaming application may render a 3D artificial reality environment in which the user is rendered as an avatar (4:27-60)).
Ben-David, Critzer, and Bodolec are analogous art because they are from a similar field of endeavor in providing virtual interactions between users. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the use of transformed voices in an online game with an avatar teachings of Ben-David, as modified by Critzer, with the online gaming application providing virtual avatars in a 3D artificial reality as taught by Bodolec. It would have been obvious to combine the references to enable users to interact with each other through avatars in an artificial reality environment (Bodolec (1:48-57),(4:27-50)).
While Ben-David in view of Critzer and Bodolec provides presenting virtual graphics related to a user, Ben-David in view of Critzer and Bodolec does not specifically teach an indication of information about a digital avatar displayed next to the avatar in the environment, and thus does not teach
an indication, comprising a graphical element displayed next to the digital avatar in the 3D virtual environment.
Pearce, however, teaches an indication, comprising a graphical element displayed next to the digital avatar in the 3D virtual environment (an indication of another user being flagged, i.e. an indication, may be rendered on or near a player character associated with the other user, such as a pop-up image or aura over a player’s character, i.e. comprising a graphical element displayed next to the digital avatar in the…virtual environment [0009],[0043-7],[0060]).
Where Bodolec specifically teaches that the environment is 3D (4:27-60).
Ben-David, Critzer, Bodolec, and Pearce are analogous art because they are from a similar field of endeavor in providing virtual interactions between users. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the presenting virtual graphics related to a user teachings of Ben-David, as modified by Critzer and Bodolec, with the rendering of an indicator over a player’s character suggesting specific information as taught by Pearce. It would have been obvious to combine the references to enable users to determine pertinent information about other players visually rather than by searching (Pearce [0047]).
Regarding claims 2, 11, and 19, Ben-David in view of Critzer, Bodolec, and Pearce teaches claims 1, 10, and 16, and Ben-David further teaches
the audio watermark is a spread spectrum watermark encoded periodically into the audio data (the audio steganography includes spread spectrum, i.e. audio watermark is a spread spectrum watermark, where the steganographic information is encoded in a pitch curve by adjusting each frame, i.e. encoded periodically into the audio data [0046],[0053]).
Regarding claims 3, 12, and 20, Ben-David in view of Critzer, Bodolec, and Pearce teaches claims 1, 10, and 16, and Ben-David further teaches
the audio watermark is undetectable by a human ear during the presentation of the audio data (the transformation data is encoded in the transformed speech, i.e. audio watermark, to synthesize the output speech signal with an effect that is practically inaudible to the human ear, i.e. undetectable by a human ear during the presentation of the audio data [0034],[0051-4]).
Regarding claims 4 and 13, Ben-David in view of Critzer, Bodolec, and Pearce teaches claims 1 and 10, and Bodolec further teaches
providing a presentation of a plurality of digital avatars corresponding to a plurality of users in the 3D virtual environment
(claim 4), wherein the 3D virtual environment is generated using a collaborative 3D content generation platform (a consumer gaming application may render a 3D artificial reality environment, i.e. virtual environment is generated using a collaborative three-dimensional (3D) content generation platform, in which multiple users are rendered as avatars Fig. 9,(4:27-60)).
(claim 13), wherein the plurality of participants are allowed to provide instances of captured digital content or synthesized digital content associated with the plurality of avatars (the avatars may have rendered hand-held objects, i.e. synthesized digital content, and may be presented as talking to other avatars, including presenting audio, i.e. plurality of participants are allowed to provide instances of captured digital content (9:3-35),(25:40-26:12)).
Where the motivation to combine is the same as previously presented.
Regarding claims 6 and 14, Ben-David in view of Critzer, Bodolec, and Pearce teaches claims 1 and 10, and Ben-David further teaches
the selected key is associated with a source of the synthesized audio data ((claim 14), and wherein the synthetic audio data includes synthesized speech data) (the transformation parameters for the transformed speech, i.e. synthesized audio data includes speech data, encoded by the steganography may be encrypted using a particular cipher, i.e. selected key is associated with a source, such that only those who have access to the decipher key can decipher the information, i.e. selected key [0056-8]).
Regarding claim 15, Ben-David in view of Critzer, Bodolec, and Pearce teaches claim 10, and Ben-David further teaches
a system for performing simulation operations (a voice transformation system, i.e. performing simulation operations [0042]);
a system for performing simulation operations to test or validate autonomous machine applications;
a system for rendering graphical output;
a system for performing deep learning operations;
a system implemented using an edge device (the system may operate in a networked environment using connections to one or more remote computers, where a user may provide input to a device coupled to the system, i.e. implemented using an edge device [0075-80]);
a system incorporating one or more Virtual Machines (VMs);
a system implemented at least partially in a data center;
or a system implemented at least partially using cloud computing resources (the system may be provided as a service to a customer over a network, i.e. implemented at least partially using cloud computing resources [0075-80]).
Regarding claim 17, Ben-David in view of Critzer, Bodolec, and Pearce teaches claim 16, and Ben-David further teaches
use the selected key to identify the audio watermark (the transformation parameters for the transformed speech encoded by the steganography may be encrypted using a particular cipher, i.e. the selected key, such that only those who have access to the decipher key can decipher the information, i.e. identify the watermark [0056-8]); and
generate the indication based at least in part upon a source associated with the selected key (the speech output is provided, and a detection component detects if the speech includes a steganography signal, and issues an alert if the signal is detected to inform a user that the speech is not the original voice and transform the speech back to the original voice, i.e. generate the indication based at least in part upon a source, where the information can be deciphered using the decipher key, i.e. based at least in part upon a source associated with the selected key [0002],[0005],[0011],[0034],[0056-8],[0071-2]).
Regarding claim 18, Ben-David in view of Critzer, Bodolec, and Pearce teaches claim 16, and Ben-David further teaches
determine, based at least in part upon one or more instances of the audio watermark, a portion of the content corresponding to the synthetic audio data (the speech output is provided, and a detection component detects if the speech includes a steganography signal, i.e. determine based at least in part upon one or more instances of the audio watermark, and issues an alert if the signal is detected to inform a user that the speech is not the original voice, i.e. a portion of the content corresponding to the synthetic audio data [0002],[0005],[0011],[0034],[0058],[0071-2]); and
provide the indication during the presentation of the portion of the content corresponding to the synthetic audio data (the speech output is provided, and a detection component detects if the speech includes a steganography signal and issues an alert if the signal is detected to inform a user that the incoming speech is not the original voice, i.e. provide the indication during the presentation of the portion of the content corresponding to the synthetic audio data [0002],[0005],[0011],[0034],[0040],[0056],[0071-2]).
Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ben-David, in view of Critzer, in view of Bodolec, in view of Pearce, and further in view of Wouters et al. (U.S. PG Pub No. 2021/0050024), hereinafter Wouters.
Regarding claim 5, Ben-David in view of Critzer, Bodolec, and Pearce teaches claim 4.
While Ben-David in view of Critzer, Bodolec, and Pearce provides a decipher key that can be used to decipher the transformation parameters, Ben-David in view of Critzer, Bodolec, and Pearce does not specifically teach detecting a watermark pattern corresponding to the selected key, and thus does not teach
determining the selected key that corresponds to the audio data, wherein detecting the audio watermark includes detecting a watermark pattern corresponding to the selected key.
Wouters, however, teaches determining the selected key that corresponds to the audio data, wherein detecting the audio watermark includes detecting a watermark pattern corresponding to the selected key (an audio watermark key is known to both the sender and recipient, i.e. determining the selected key that corresponds to the audio data, where the audio watermark key is used by the audio watermark detection processor to analyze a received synthetic speech signal using the pattern known in the audio watermark key to determine the manner in which the audio watermark signal was embedded, i.e. detecting the audio watermark includes detecting a watermark pattern corresponding to the selected key [0022-3],[0027]).
Ben-David, Critzer, Bodolec, Pearce, and Wouters are analogous art because they are from a similar field of endeavor in embedding and detecting information about users based on input data. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the decipher key that can be used to decipher the transformation parameters teachings of Ben-David, as modified by Critzer, Bodolec, and Pearce, with the use of patterns associated with an audio watermark key to determine how an audio watermark was embedded in a signal as taught by Wouters. It would have been obvious to combine the references to enable a recipient machine to prevent malicious actors who are not in possession of the audio watermark key from detecting and removing the audio watermark signal (Wouters [0022]).
Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ben-David, in view of Critzer, in view of Bodolec, in view of Pearce, and further in view of Tehranchi et al. (U.S. PG Pub No. 2017/0140493), hereinafter Tehranchi.
Regarding claim 7, Ben-David in view of Critzer, Bodolec, and Pearce teaches claim 1.
While Ben-David in view of Critzer, Bodolec, and Pearce provides detecting a watermark in the audio, Ben-David in view of Critzer, Bodolec, and Pearce does not specifically teach detecting an irregularity in the watermark, and thus does not teach
detecting an irregularity in a placement, an ordering, or a content of one or more instances of the audio watermark in the audio data; and
providing, during presentation of the audio data, an indication that the audio data has been modified.
Tehranchi, however, teaches detecting an irregularity in a placement, an ordering, or a content of one or more instances of the audio watermark in the audio data (the periodicity and order of detected watermarks in a host audio signal is determined to not have the correct periodicity or order, i.e. detecting an irregularity in the placement, ordering, or content of one or more instances of the audio watermark in the audio data [0095],[0117],[0129]); and
providing, during presentation of the audio data, an indication that the audio data has been modified (when the continuity of a detected signal is different from the authorized continuity, suggesting the audio may have been altered or tampered with, i.e. audio data has been modified, the system displays a warning signal after a certain playback period of the content, i.e. providing during presentation of the audio data an indication [0095],[0117],[0129],[0219],[0225]).
Ben-David, Critzer, Bodolec, Pearce, and Tehranchi are analogous art because they are from a similar field of endeavor in embedding and detecting watermarks in audio signals. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the detecting a watermark in the audio teachings of Ben-David, as modified by Critzer, Bodolec, and Pearce, with detecting a different periodicity or order of watermarks than that which has been authorized as taught by Tehranchi. It would have been obvious to combine the references to enable a smart filter to perform enforcement of a content usage policy as set forth by the content owner or the law (Tehranchi [0220-5]).
Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ben-David, in view of Critzer, in view of Bodolec, in view of Pearce, and further in view of Ping et al. (U.S. PG Pub No. 2021/0118423), hereinafter Ping.
Regarding claim 8, Ben-David in view of Critzer, Bodolec, and Pearce teaches claim 1.
While Ben-David in view of Critzer, Bodolec, and Pearce provides transforming a voice, Ben-David in view of Critzer, Bodolec, and Pearce does not specifically teach the use of a TTS generator, and thus does not teach
the audio data is synthesized using a text-to-speech generator that includes at least one neural network trained to synthesize speech data from input text.
Ping, however, teaches the audio data is synthesized using a text-to-speech generator that includes at least one neural network trained to synthesize speech data from input text (input text can be converted by a neural TTS subcomponent into a speech output audio segment, i.e. the audio data is synthesized using a text-to-speech generator, where the neural TTS is a trained neural network model provided with a text/audio pair during training, i.e. includes at least one neural network trained to synthesize speech data from input text [0021],[0031],[0036],[0039]).
Ben-David, Critzer, Bodolec, Pearce, and Ping are analogous art because they are from a similar field of endeavor in embedding and detecting watermarks in synthesized speech. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the transforming a voice teachings of Ben-David, as modified by Critzer, Bodolec, and Pearce, with the synthesis of speech audio from text as taught by Ping. It would have been obvious to combine the references to enable imperceptible watermarks that are robust to audio manipulation and other processing operations (Ping [0024]).
Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ben-David, in view of Critzer, in view of Bodolec, in view of Pearce, in view of Ping, and further in view of Alameh et al. (U.S. PG Pub No. 2019/0287513), hereinafter Alameh.
Regarding claim 9, Ben-David in view of Critzer, Bodolec, Pearce, and Ping teaches claim 8.
While Ben-David in view of Critzer, Bodolec, Pearce, and Ping provides training a voice transformation using target speech as an input, Ben-David in view of Critzer, Bodolec, Pearce, and Ping does not specifically teach the voice input identifying a voice selection, characteristic, or style, and causing the TTS to generate synthesized audio according to the voice input, and thus does not teach
receiving voice input identifying at least one of a voice selection, voice characteristic, or voice style; and
causing the text-to-speech generator to generate the synthesized audio data according to the received voice input.
Alameh, however, teaches receiving voice input identifying at least one of a voice selection, voice characteristic, or voice style (one or more audible characteristics are extracted from the voice input received by the electronic device, i.e. receiving voice input identifying at least one of a … voice characteristic, or voice style [0103]); and
causing the text-to-speech generator to generate the synthesized audio data according to the received voice input (the user can select a voice-synthesized audio output stream that mimics the voice input, i.e. generate the synthesized audio data according to the received voice input [0103]).
Where Ping teaches that the data is synthesized using a TTS subcomponent trained on an input text/audio pair [0021],[0031],[0036],[0039].
Ben-David, Critzer, Bodolec, Pearce, Ping, and Alameh are analogous art because they are from a similar field of endeavor in embedding and detecting watermarks in synthesized speech. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the training a voice transformation using target speech as an input teachings of Ben-David, as modified by Critzer, Bodolec, Pearce, and Ping, with generating an audio output stream that mimics the voice input of a user as taught by Alameh. It would have been obvious to combine the references to allow customization of voices so that different voice assistants in different electronic devices can distinguish one another, communicate and interact with unique voices, and more easily associate a particular voice-synthesized audio output with that of its owner (Alameh [0104]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER whose telephone number is (571) 270-1474. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NICOLE A K SCHMIEDER/Primary Examiner, Art Unit 2659