DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments filed 26 November 2025 have been fully considered but are not persuasive.
Applicant’s arguments with respect to the prior art have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering the patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 6, 7, 12, 13, 18, 19, 24, 25, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Song (WO 2021052224 A1, herein represented by U.S. Publication 2021/0357625) in view of Park (U.S. Publication 2021/0357088).
As to claim 1, Song discloses a processor (fig. 8, element 82; p. 10, paragraphs 0174-0178; a processor is configured to execute instructions from a medium, such as a memory, which also stores data for performing a programmed method), comprising:
one or more circuits to use one or more neural networks to generate a first plurality of video frames of a person uttering speech based on speech information corresponding to the speech and based on an image corresponding to the user (fig. 5; p. 7, paragraphs 0101-0112; p. 7, paragraph 0120; a first neural network is trained using an audio clip with voice/speech information to generate face key point information corresponding to input video data of a person’s face and speech).
Song does not expressly disclose, but Park discloses, that the generation is without corresponding video frames, and modifying the generated first plurality of video frames to produce a second plurality of video frames of a representation of a user uttering the speech based on a single image corresponding to the user (fig. 3; figs. 6-7; p. 4, paragraphs 0065-0070; p. 5, paragraphs 0076-0077; p. 6, paragraph 0088; p. 7, paragraphs 0102-0108; p. 8, paragraphs 0115-0116; generation of a character animation based on driving information, including mouth movements and words for a character to speak, is performed; at this stage the character is synthetic, like the characters in the figures, and is not based on input video; later, a user inputs “a facial image,” and movements, which would include mouth motions associated with speaking, are mapped such that the input facial image replaces one of the synthetic character images in the animation). The motivation for this is to allow more personal and realistic content so that a user can experience the content more dynamically (p. 1, paragraph 0005). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Song to generate without corresponding video frames and to modify the generated first plurality of video frames to produce a second plurality of video frames of a representation of a user uttering the speech based on a single image corresponding to the user, in order to allow more personal and realistic content so that a user can experience the content more dynamically, as taught by Park.
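For illustration only (this sketch is not part of the record; the names are hypothetical and a random linear map stands in for the trained networks), the combined teaching reads on an audio-driven pipeline of roughly this shape: per-frame speech features drive a key-point predictor, and the predicted key points condition output frames derived from a single source image rather than from corresponding input video:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the trained audio-to-keypoint network:
# a fixed linear map from per-frame audio features to 2D face key points.
N_AUDIO_FEATS = 13   # e.g. MFCC-like features per frame (illustrative)
N_KEYPOINTS = 68     # a common face landmark count (illustrative)
W = rng.normal(size=(N_KEYPOINTS * 2, N_AUDIO_FEATS)) * 0.01

def audio_to_keypoints(audio_feats):
    """Map (T, N_AUDIO_FEATS) audio features to (T, N_KEYPOINTS, 2) key points."""
    kp = audio_feats @ W.T                      # (T, N_KEYPOINTS * 2)
    return kp.reshape(len(audio_feats), N_KEYPOINTS, 2)

def render_frames(source_image, keypoints):
    """Toy renderer: one output frame per key-point set, each derived from
    the single source image (single-image conditioning)."""
    frames = []
    for kp in keypoints:
        frame = source_image.copy()
        # A real system would use the key points to drive a warp or a
        # generative model; here we pair them with the image unchanged.
        frames.append((frame, kp))
    return frames

T = 5                                           # number of video frames
audio = rng.normal(size=(T, N_AUDIO_FEATS))     # fake speech features
source = np.zeros((64, 64, 3), dtype=np.uint8)  # the single user image
frames = render_frames(source, audio_to_keypoints(audio))
print(len(frames))                              # one frame per audio step
```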
As to claim 6, Song discloses wherein the plurality of video frames is representative of an amount of emotion or pattern of speech determined from the speech information (fig. 3; p. 3, paragraphs 0046-0052; p. 4, paragraph 0060; p. 9, paragraphs 0146-0147; the voice/speech information is analyzed to determine a facial expression representing a particular emotional state; the inpainted image from the second neural network uses the expression information, and video is generated from the inpainted image).
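For illustration only (not part of the record; the thresholds and labels are hypothetical stand-ins for Song's learned expression analysis), the mechanism can be sketched as mapping coarse prosodic statistics of the speech clip to an emotion label that each generated frame then carries:

```python
def emotion_from_speech(energy, pitch_var):
    """Toy classifier: map coarse prosodic statistics of a speech clip
    to an emotion label (stand-in for a learned expression analyzer)."""
    if energy > 0.7 and pitch_var > 0.5:
        return "excited"
    if energy < 0.3:
        return "calm"
    return "neutral"

def expression_for_frames(labels):
    # Each output frame inherits the expression determined from speech.
    return [{"frame": i, "expression": lab} for i, lab in enumerate(labels)]

clip_energy, clip_pitch_var = 0.8, 0.6          # fake prosodic statistics
label = emotion_from_speech(clip_energy, clip_pitch_var)
frames = expression_for_frames([label] * 4)
print(label, len(frames))                       # excited 4
```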
As to claim 7, see the rejection to claim 1.
As to claim 12, see the rejection to claim 6.
As to claim 13, see the rejection to claim 1.
As to claim 18, see the rejection to claim 6.
As to claim 19, see the rejection to claim 1.
As to claim 24, see the rejection to claim 6.
As to claim 25, see the rejection to claim 1.
As to claim 30, see the rejection to claim 6.
Claims 2-5, 8-11, 14-17, 20-23, and 26-29 are rejected under 35 U.S.C. 103 as being unpatentable over Song in view of Park and further in view of Liao (U.S. Publication 2021/0390748).
As to claim 2, Song does not disclose, but Liao does disclose, wherein the plurality of frames includes a representation of one or more three-dimensional character models uttering a corresponding portion of the speech information (p. 1, paragraph 0006; p. 6, paragraphs 0065-0067; pp. 6-7, paragraph 0070; p. 7, paragraph 0072; pp. 7-8, paragraphs 0079-0083; using a neural network, a 3D skeleton model is used to create poses for video frames corresponding to spoken voice/speech information). The motivation for this is to produce output video with synchronized, realistic, and expressive body dynamics at low cost (p. 1, paragraphs 0003-0004). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Song and Park so that the plurality of frames includes a representation of one or more three-dimensional character models uttering a corresponding portion of the speech information, in order to produce output video with synchronized, realistic, and expressive body dynamics at low cost, as taught by Liao.
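For illustration only (not part of the record; the joint and feature counts are hypothetical, and a random linear map stands in for Liao's trained pose network), the cited mechanism amounts to mapping per-frame speech features to a 3D skeleton pose for each video frame:

```python
import numpy as np

rng = np.random.default_rng(1)

N_JOINTS = 17        # hypothetical skeleton joint count
N_FEATS = 8          # hypothetical speech features per frame

# Stand-in for a trained pose network: a fixed linear map from per-frame
# speech features to per-joint 3D offsets around a rest pose.
REST_POSE = np.zeros((N_JOINTS, 3))
W = rng.normal(size=(N_JOINTS * 3, N_FEATS)) * 0.05

def poses_from_speech(speech_feats):
    """(T, N_FEATS) speech features -> (T, N_JOINTS, 3) skeleton poses."""
    offsets = (speech_feats @ W.T).reshape(-1, N_JOINTS, 3)
    return REST_POSE[None] + offsets

poses = poses_from_speech(rng.normal(size=(6, N_FEATS)))
print(poses.shape)   # one 3D pose per video frame
```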
As to claim 3, Song does not disclose, but Liao does disclose, wherein the plurality of frames includes a representation of the one or more users uttering the corresponding portion of the speech information as represented by the one or more three-dimensional character models in the plurality of video frames (p. 1, paragraphs 0006-0007; p. 5, paragraph 0059; p. 6, paragraphs 0065-0067; pp. 6-7, paragraph 0070; p. 7, paragraph 0072; pp. 7-8, paragraphs 0079-0083; the 3D body model, the first video information, input video of a person, and speech-to-text information are used to generate video frames of a speaking person). Motivation for the combination is given in the rejection to claim 2.
As to claim 4, Song does not disclose, but Liao does disclose, wherein the one or more neural networks is trained to correlate key points between the one or more three-dimensional character models represented in the plurality of video frames and at least one of shape information or pose information for the user (p. 1, paragraph 0006; p. 5, paragraph 0060; p. 6, paragraph 0062; p. 6, paragraphs 0064-0067; correspondence between points in the 3D model and 2D point positions of a person speaking in input video, which correspond to projected pose information, is found using the neural network). Motivation for the combination is given in the rejection to claim 2.
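For illustration only (not part of the record; a pinhole projection and nearest-neighbour matching stand in for the learned correspondence), the cited key-point correlation can be sketched as projecting the 3D model points into the image and matching each to the closest detected 2D key point:

```python
import numpy as np

def project(points_3d, focal=1.0):
    """Pinhole projection of (N, 3) model points to (N, 2) image points."""
    z = points_3d[:, 2:3]
    return focal * points_3d[:, :2] / z

def correspond(model_2d, detected_2d):
    """Nearest-neighbour matching between projected model points and
    detected 2D key points (stand-in for the learned correspondence)."""
    d = np.linalg.norm(model_2d[:, None, :] - detected_2d[None, :, :], axis=-1)
    return d.argmin(axis=1)          # index of the closest detected point

model = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, 2.0]])   # 3D model points
detected = np.array([[0.5, 0.5], [0.0, 0.0]])          # detected 2D points
matches = correspond(project(model), detected)
print(matches)                       # [1 0]
```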
As to claim 5, Song does not disclose, but Liao does disclose, wherein the one or more circuits are further to use the one or more neural networks to synthesize the speech information as voice information from text (fig. 3, element 325; pp. 4-5, paragraph 0053; p. 5, paragraph 0056; p. 5, paragraph 0060; a neural network is shown that is trained to synthesize speech for video from text input). Motivation for the combination is given in the rejection to claim 2.
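For illustration only (not part of the record; a fixed per-character tone generator stands in for the learned text-to-speech network), the cited text-to-voice stage reduces to mapping input text to a synthesized waveform:

```python
import numpy as np

# Toy stand-in for a learned text-to-speech stage: each character is
# mapped to a fixed-length tone, concatenated into a "voice" waveform.
SAMPLE_RATE = 8000   # samples per second (illustrative)
CHAR_MS = 50         # duration per character, in milliseconds

def synthesize(text):
    samples_per_char = SAMPLE_RATE * CHAR_MS // 1000
    t = np.arange(samples_per_char) / SAMPLE_RATE
    chunks = []
    for ch in text.lower():
        freq = 200 + (ord(ch) % 26) * 20   # crude per-character pitch
        chunks.append(np.sin(2 * np.pi * freq * t))
    return np.concatenate(chunks)

wave = synthesize("hello")               # 5 chars x 400 samples each
print(wave.shape)
```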
As to claim 8, see the rejection to claim 2.
As to claim 9, see the rejection to claim 3.
As to claim 10, see the rejection to claim 4.
As to claim 11, see the rejection to claim 5.
As to claim 14, see the rejection to claim 2.
As to claim 15, see the rejection to claim 3.
As to claim 16, see the rejection to claim 4.
As to claim 17, see the rejection to claim 5.
As to claim 20, see the rejection to claim 2.
As to claim 21, see the rejection to claim 3.
As to claim 22, see the rejection to claim 4.
As to claim 23, see the rejection to claim 5.
As to claim 26, see the rejection to claim 2.
As to claim 27, see the rejection to claim 3.
As to claim 28, see the rejection to claim 4.
As to claim 29, see the rejection to claim 5.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AARON M RICHER whose telephone number is (571)272-7790. The examiner can normally be reached 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon, can be reached at (571)272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AARON M RICHER/Primary Examiner, Art Unit 2617