DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a speech state image generator to generate” in claim 8; “a speech voice generator configured to generate” in claim 8; “a reproducer configured to reproduce the standby” in claim 8; “a reproducer configured to reproduce” in claims 8, 10, and 12; and “a synthesized speech video generator configured to generate” in claims 8 and 14.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 to 14 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al. (Publication US 2003/0144055 A1) in view of Chapman et al. (Publication US 2015/0371661 A1).
Regarding claim 1, Guo discloses a method for providing a speech video performed by a computing device, the method comprising ([0014], [0017] - FIG. 1 illustrates a suitable computing system environment 100 including memory that stores instructions to be executed by a processor to perform the methods; [0017] - a general purpose computing device in the form of a computer 110; [0007] - a speech synthesizer receives input from a user for speech synthesis and provides an audio output signal);
reproducing a standby state video in a video file format in which a person in the video is in a standby state (
[0027] - the talking head will appear to wait or listen, “standby state”. Voice audio as rendered by the output device 206 with a speech synthesizer 210, in conjunction with a talking head rendered on a suitable display, can also be provided. Interaction with system 200 simulates a conversation between the user and the rendered image of the talking head. Specifically, the talking head will appear to wait, “standby”, or listen to the computer user in addition to appearing to talk to the user.
[0088] - Generation of the talking head 256 (FIG. 3) is provided by talking head module 256, which provides video output data indicated at block 270 that in turn is rendered by video display 254. Generally, the talking head module 256 implements a video rewrite technique, “reproducing”, wherein stored frames of video are sequenced selectively in order to provide facial and head movement animation. The talking head module 256 smoothly switches back and forth between the waiting/listening and talking states of the talking head.
[0089] - As illustrated in FIG. 4, in a first part 300, a background video of the talking head's face, neck and shoulders is provided. Typically the background video 300 is obtained from a relatively short sequence of video frames; however, characteristics or features of the talking head 256 in the video frames are analyzed such that transitions can be formed between nonadjacent frames of the sequence. Referring to FIG. 5, line 304 represents 100 frames of continuous video of the talking head 256 with a neutral (i.e., non-talking) facial expression.
[0090] - a video sequence of the jaw and mouth, which is lip-synced according to the spoken sentence or voice output, “video file format”; triphone sequences of the mouth and jaw.);
generating a plurality of speech state images in which the person in the video is in a speech state and a speech voice based on a source of speech contents during the reproduction of the standby state video (
[0090] - superimposing upon the background video a video sequence of the jaw and mouth, “images”, which is lip-synced according to the spoken sentence or voice output, “generating”. The resulting triphone sequences of the mouth and jaw are then stitched into, or superimposed on, the background video, “plurality of speech state images”. Text 312 to be spoken.
[0027] - the talking head will appear to wait or listen, “standby state”.
[0036], [0039] - the dialog manager module 204 determines what actions to take regarding presenting information, or, if necessary, soliciting further information from the user until the semantic representation is complete; the voice output can be synchronized with a rendering of a talking head provided by a talking head module 246.
[0079] - The language generator 242 provides the text that will be converted to voice output through the speech synthesizer 210.
[0084] - Text is transformed into speech and synchronized with the talking head 256, “source of speech contents”.
[0088] - Generation of the talking head 256 (FIG. 3) is provided by talking head module 256, which provides video output data indicated at block 270 that in turn is rendered by video display 254. Generally, the talking head module 256 implements a video rewrite technique, “reproducing video”, wherein stored frames of video are sequenced selectively in order to provide facial and head movement animation. The frames are sequenced in order to completely simulate a conversation between the talking head 256 and the computer user. Video rewrite has been used for creating a talking head to speak individual sentences; the talking head 256 herein provided completes simulation of a conversation by continuing animation of the talking head between sentences, i.e., while the computer user is quiet, for instance, contemplating the next action, or when the computer user is speaking, in order to simulate that the talking head 256 is listening. The talking head module 256 smoothly switches back and forth between the waiting/listening and talking states of the talking head, “based on a source of speech contents during the reproduction of the standby state video”.);
stopping the reproduction of the standby state video and reproducing a video ([0088], [0090] - Video rewrite has been used for creating a talking head to speak individual sentences; the talking head 256 herein provided completes simulation of a conversation by continuing animation of the talking head between sentences, i.e., while the computer user is quiet, for instance, contemplating the next action, or when the computer user is speaking, in order to simulate that the talking head 256 is listening; thus, video rewrite is used to reproduce the standby state video.
The talking head module 256 smoothly switches back and forth between the waiting/listening and talking states of the talking head; thus, during the switches back and forth, the waiting/listening state must be “stopped” so that the talking state can be started after the waiting/listening state.);
generating a synthesized speech video by synthesizing the plurality of speech state images and the speech voice with the standby state video (
[0090] - superimposing upon the background video, “the speech state image”, a video sequence of the jaw and mouth, which is lip-synced according to the spoken sentence or voice output. The resulting triphone sequences of the mouth and jaw are then stitched into, or superimposed on, the background video, “generating a synthesized speech video”. A second part of the facial animation is obtained by superimposing upon the background video a video sequence of the jaw and mouth, which is lip-synced according to the spoken sentence or voice output. This component is illustrated in FIG. 4 at block 308. Sequences of frames for the jaw and mouth are stored with respect to a number of acoustical units (e.g., phonemes) such as a triphone. An audio output sentence to be spoken is received by the talking head module 256 from the speech synthesizer 210. In particular, as illustrated in FIG. 4, text 312 to be spoken is converted to a sound track or audio signal 314 by speech synthesizer 210. The audio output to be spoken is segmented into acoustic units such as triphones at 316. The shape distance between the triphone to be rendered visually and a labeled triphone in a video module 320 is computed, and some of the smallest distances will be selected as candidates. The smallest path from the beginning triphone to the ending triphone is then determined, and the selected triphone sequences will then be aligned with the voice output signal. The resulting triphone sequences of the mouth and jaw are then stitched into, or superimposed on, the background video.
[0027], [0088] - the talking head will appear to wait or listen, “standby state”, wherein stored frames of video are sequenced selectively in order to provide facial and head movement animation. The frames are sequenced in order to completely simulate a conversation between the talking head 256 and the computer user. Video rewrite has been used for creating a talking head to speak individual sentences, i.e., while the computer user is quiet, for instance, contemplating the next action, or when the computer user is speaking, in order to simulate that the talking head 256 is listening. The talking head module 256 smoothly switches back and forth between the waiting/listening and talking states of the talking head.)
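For illustration only (this sketch is not part of Guo's disclosure or of the claims), the video rewrite behavior quoted above, a neutral background loop onto which lip-synced jaw/mouth sequences are stitched, can be approximated as follows; the Frame structure, the helper names, and the triphone-keyed lookup are hypothetical assumptions.

from dataclasses import dataclass

@dataclass
class Frame:
    background: object            # neutral face/neck/shoulders image from the standby loop
    mouth_overlay: object = None  # optional lip-synced jaw/mouth patch

def synthesize_speech_video(standby_frames, mouth_sequences, audio_units):
    """Sketch: stitch pre-stored, triphone-aligned mouth/jaw clips onto the standby loop."""
    output = []
    i = 0
    for unit in audio_units:                  # e.g., triphones segmented from the audio output
        for patch in mouth_sequences[unit]:   # hypothetical lookup of the stored jaw/mouth clip
            base = standby_frames[i % len(standby_frames)]   # keep looping the background video
            output.append(Frame(background=base.background, mouth_overlay=patch))
            i += 1
    return output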
Guo does not disclose the following; however, Chapman discloses:
perform a back motion video in a video file format for returning to a reference frame of the standby state video (
[0094] - the rendering device is configured to generate in-between frames of visual data when transitioning between the rendering of the primary animation data and the alternative clip of animation data, “back motion video”; note that a plurality of frames makes up a video, and a clip is a video file format. "The position at which that tap occurred"; NOTE: in other words, the frame of the primary animation (e.g., the idle animation loop 1203) displayed when the screen tap occurs is set as a reference frame for transitioning to and from the alternate animation clip, “a back motion video in a video file format”.
[0057] - calculating frames in between (tweening) by automated interpolation.
[0094], [0096] - When a user taps the display during playing of the primary animation (e.g., the animation loop of the character idling), “the standby state image”.);
generating from the reference frame ([0094] - "to generate in between frames of visual data when transitioning between the rendering of the primary animation data and the alternative clip of animation data"; [0057] - calculating frames in between (tweening) by automated interpolation; e.g., [0094] - "the position at which that tap occurred"; NOTE: in other words, the frame of the primary animation (e.g., the idle animation loop 1203) displayed when the screen tap occurs is set as a reference frame for transitioning to and from the alternate animation clip.
[media_image1.png, 472 x 450, greyscale: figure reproduced from Chapman]
).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Guo to perform a back motion video in a video file format for returning to a reference frame of the standby state video, and to generate from the reference frame, as taught by Chapman. The motivation for doing so is to have less expensive video production.
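For illustration only (not Chapman's code), the tweening described in Chapman [0057] and the transition back to a reference frame of the idle loop described in [0094] can be sketched as below; representing a frame as a dictionary of pose parameters and using linear interpolation are assumptions made for the sketch.

def tween(pose_a, pose_b, steps):
    """Generate in-between poses from pose_a to pose_b by linear interpolation (tweening)."""
    frames = []
    for s in range(1, steps + 1):
        t = s / (steps + 1)
        frames.append({k: (1 - t) * pose_a[k] + t * pose_b[k] for k in pose_a})
    return frames

def back_motion(current_pose, reference_pose, steps=6):
    """Sketch of a 'back motion' clip: in-between frames carrying the character from the
    frame where the interruption occurred back to the idle loop's reference frame."""
    return tween(current_pose, reference_pose, steps)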
Regarding claim 2, see rejection on claim 9.
Regarding claim 3, see rejection on claim 10.
Regarding claim 4, see rejection on claim 11.
Regarding claim 5, see rejection on claim 12.
Regarding claim 6, see rejection on claim 13.
Regarding claim 7, see rejection on claim 14.
Regarding claim 8, Guo discloses an apparatus for providing a speech video ([0014], [0017] - FIG. 1 illustrates a suitable computing system environment 100 including memory that stores instructions to be executed by a processor to perform the methods; [0017] - a general purpose computing device in the form of a computer 110; [0007] - a speech synthesizer receives input from a user for speech synthesis and provides an audio output signal; Fig. 4 is a video system.),
the apparatus comprising a speech state image generator configured to generate a plurality of speech state images based on a source of speech contents during reproduction of a standby state video in which a person in the video is in a standby state (
[0090] - superimposing upon the background video a video sequence of the jaw and mouth, which is lip-synced according to the spoken sentence or voice output, “generate”. The resulting triphone sequences of the mouth and jaw are then stitched into, or superimposed on, the background video. Text 312 to be spoken.
[0027] - the talking head will appear to wait or listen, “standby state”.
[0036], [0039] - the dialog manager module 204, “speech state image generator”, determines what actions to take regarding presenting information, or, if necessary, soliciting further information from the user until the semantic representation is complete; the voice output can be synchronized with a rendering of a talking head provided by a talking head module 246.
[0079] - The language generator 242 provides the text that will be converted to voice output through the speech synthesizer 210.
[0084] - Text is transformed into speech and synchronized with the talking head 256 in real time, “speech content”.
[0088] - Generation of the talking head 256 (FIG. 3) is provided by talking head module 256, which provides video output data indicated at block 270 that in turn is rendered by video display 254. Generally, the talking head module 256 implements a video rewrite technique, “reproducing video”, wherein stored frames of video are sequenced selectively in order to provide facial and head movement animation. The frames are sequenced in order to completely simulate a conversation between the talking head 256 and the computer user. Video rewrite has been used for creating a talking head to speak individual sentences; the talking head 256 herein provided completes simulation of a conversation by continuing animation of the talking head between sentences, i.e., while the computer user is quiet, for instance, contemplating the next action, or when the computer user is speaking, in order to simulate that the talking head 256 is listening. The talking head module 256 smoothly switches back and forth between the waiting/listening, “standby state”, and talking states of the talking head, “based on a source of speech contents during the reproduction of the standby state video in which a person in the video is in a standby state”.);
generate a speech voice based on the source of the speech contents during the reproduction of the standby state video (
[0090] - text 312 to be spoken.
[0079] - The language generator 242 provides the text that will be converted to voice output, “generate a speech voice”.
[0084] - Text is transformed into speech and synchronized with the talking head 256 in real time, “based on the source of speech content”.
[0079], [0084], [0088] - Generation of the talking head 256 (FIG. 3) is provided by talking head module 256, which provides video output data indicated at block 270 that in turn is rendered by video display 254. Generally, the talking head module 256 implements a video rewrite technique, “reproduction”, wherein stored frames of video are sequenced selectively in order to provide facial and head movement animation. The frames are sequenced in order to completely simulate a conversation between the talking head 256 and the computer user. Video rewrite has been used for creating a talking head to speak individual sentences; the talking head 256 herein provided completes simulation of a conversation by continuing animation of the talking head between sentences, i.e., while the computer user is quiet, for instance, contemplating the next action, or when the computer user is speaking, in order to simulate that the talking head 256 is listening. The talking head module 256 smoothly switches back and forth between the waiting/listening and talking states of the talking head, “based on a source of speech contents during the reproduction of the standby state video”.);
a reproducer configured to reproduce the standby state video ([0027] - the talking head will appear to wait or listen, “standby state”. Voice audio as rendered by the output device 206 with a speech synthesizer 210, in conjunction with a talking head rendered on a suitable display, can also be provided. Interaction with system 200 simulates a conversation between the user and the rendered image of the talking head. Specifically, the talking head will appear to wait, “standby”, or listen to the computer user in addition to appearing to talk to the user. [0088] - As indicated above, generation of the talking head 256 (FIG. 3) is provided by talking head module 256, “reproducer”, which provides video output data indicated at block 270 that in turn is rendered by video display 254. Generally, the talking head module 256 implements a video rewrite technique, “reproduce”, wherein stored frames of video are sequenced selectively in order to provide facial and head movement animation.), and
the generation of the plurality of speech state images and the speech voice (
[0088] - Generation of the talking head 256 (FIG. 3) is provided by talking head module 256, which provides video output data indicated at block 270 that in turn is rendered by video display 254. Generally, the talking head module 256 implements a video rewrite technique, wherein stored frames of video are sequenced selectively in order to provide facial and head movement animation, i.e., while the computer user is quiet, for instance, contemplating the next action, or when the computer user is speaking, in order to simulate that the talking head 256 is listening. The talking head module 256 smoothly switches back and forth between the waiting/listening and talking states of the talking head.
[0089] - A background video of the talking head's face, neck and shoulders; frames of continuous video of the talking head 256 with a neutral (i.e., non-talking) facial expression.
[0027] - voice audio as rendered by the output device 206 with a speech synthesizer 210, in conjunction with a talking head rendered on a suitable display, can also be provided.);
a synthesized speech video generator configured to generate a synthesized speech video by synthesizing the plurality of speech state images and the speech voice with the standby state video from the reference frame (
[0090] - speech synthesizer 210, “synthesized speech video generator”, superimposing upon the background video, “the speech state image”, a video sequence of the jaw and mouth, which is lip-synced according to the spoken sentence or voice output. The resulting triphone sequences of the mouth and jaw are then stitched into, or superimposed on, the background video. A second part of the facial animation is obtained by superimposing upon the background video a video sequence of the jaw and mouth, which is lip-synced according to the spoken sentence or voice output, “generate a synthesized speech video”. This component is illustrated in FIG. 4 at block 308. Sequences of frames for the jaw and mouth are stored with respect to a number of acoustical units (e.g., phonemes) such as a triphone. An audio output sentence to be spoken is received by the talking head module 256 from the speech synthesizer 210. In particular, as illustrated in FIG. 4, text 312 to be spoken is converted to a sound track or audio signal 314 by speech synthesizer 210. The audio output to be spoken is segmented into acoustic units such as triphones at 316. The shape distance between the triphone to be rendered visually and a labeled triphone in a video module 320 is computed, and some of the smallest distances will be selected as candidates. The smallest path from the beginning triphone to the ending triphone is then determined, and the selected triphone sequences will then be aligned with the voice output signal. The resulting triphone sequences of the mouth and jaw are then stitched into, or superimposed on, the background video. It should be noted that the video frame sequences or frames are stored in a database or store 326; however, each of the video frames is typically adjusted to a defined pose.
[0027], [0088] - the talking head will appear to wait or listen, “standby state”, wherein stored frames of video are sequenced selectively in order to provide facial and head movement animation. The frames are sequenced in order to completely simulate a conversation between the talking head 256 and the computer user. Video rewrite has been used for creating a talking head to speak individual sentences, i.e., while the computer user is quiet, for instance, contemplating the next action, or when the computer user is speaking, in order to simulate that the talking head 256 is listening. The talking head module 256 smoothly switches back and forth between the waiting/listening and talking states of the talking head.).
Guo does not disclose the following; however, Chapman discloses:
perform of a standby state video of a video file format ([0094] - "to generate in between frames of visual data when transitioning between the rendering of the primary animation data and the alternative clip of animation data", “clip is video file format”; [0057] - calculating frames in between (tweening) by automated interpolation; e.g., [0094] - "the position at which that tap occurred"; NOTE: in other words, the frame of the primary animation (e.g., the idle animation loop 1203, “reproduction”) displayed when the screen tap occurs is set as a reference frame for transitioning to and from the alternate animation clip.);
a speech voice generator configured to generate a speech voice ([0009] – production device is configured to produce associated audio data, speech animation.);
stop reproducing the standby state video when the action is completed ([0048] - An artist takes the character and places the character in an animation scene. They make an animation loop of the character idling, “standby state”, that is to say just looking around and occasionally blinking. This consists of a few seconds (say two seconds) of animation that can be repeated or looped to fill in time when the character is not actually saying anything. That is, reproduction of the animation loop of the character idling, “standby state”, is “stopped” when the character starts to say something.
[0051] - After creating an idle loop, the artist creates a speech loop; thus, the action is completed.);
perform a back motion video in a video file format for returning to a reference frame of the standby state video (
[0094] - the rendering device is configured to generate in-between frames of visual data when transitioning between the rendering of the primary animation data and the alternative clip of animation data, “back motion video”; note that a plurality of frames makes up a video, and a clip is a video file format.
[0057] - calculating frames in between (tweening) by automated interpolation; e.g., [0094] - "the position at which that tap occurred"; NOTE: in other words, the frame of the primary animation (e.g., the idle animation loop 1203) displayed when the screen tap occurs is set as a reference frame for transitioning to and from the alternate animation clip, “a back motion video in a video file format”.
[0094], [0096] - When a user taps the display during playing of the primary animation (e.g., the animation loop of the character idling), “the standby state image”.
[media_image1.png, 472 x 450, greyscale: figure reproduced from Chapman]
)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Guo with the performance of a standby state video of a video file format; a speech voice generator configured to generate a speech voice; stopping reproduction of the standby state video when the action is completed; and performing a back motion video in a video file format for returning to a reference frame of the standby state video, as taught by Chapman. The motivation for doing so is to have less expensive video production.
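For illustration only (not part of either reference), a reproducer that loops a short standby clip, stops it when a speech clip is ready, and resumes the standby loop afterward, loosely following the behavior quoted from Guo [0088] and Chapman [0048], [0051], could look like the following; all class and method names are hypothetical.

class Reproducer:
    IDLE, TALKING = "idle", "talking"

    def __init__(self, idle_clip):
        self.idle_clip = list(idle_clip)   # short standby loop, e.g. ~2 seconds of frames
        self.pending = []                  # speech-state frames waiting to be shown
        self.state = self.IDLE
        self.i = 0

    def next_frame(self, speech_clip=None):
        """Return the next frame to display; a newly supplied speech clip stops the
        standby loop and is played before returning to idle."""
        if speech_clip:
            self.state = self.TALKING
            self.pending = list(speech_clip)
        if self.state == self.TALKING:
            if self.pending:
                return self.pending.pop(0)
            self.state = self.IDLE         # speech finished: resume the standby loop
        frame = self.idle_clip[self.i % len(self.idle_clip)]
        self.i += 1
        return frame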
Regarding claim 9, Guo in view of Chapman disclose all the limitations of claim 8.
Chapman discloses includes a plurality of back motion frame sets for image interpolation between each frame of the standby state video and the reference frame ([0049] - A timeline is split into frames, typically working at thirty frames per second; consequently, two seconds of animation will require sixty frames to be generated, “for image interpolation between”. [0094] - to generate in-between frames of visual data when transitioning between the rendering of the primary animation data and the alternative clip of animation data, “back motion frame”. [0057] - calculating frames in between (tweening) by automated interpolation; e.g., [0094] - "the position at which that tap occurred"; NOTE: in other words, the frame of the primary animation (e.g., the idle animation loop 1203) displayed when the screen tap occurs is set as a reference frame for transitioning to and from the alternate animation clip. [0094], [0096] - When a user taps the display during playing of the primary animation (e.g., the animation loop of the character idling), “the standby state image”.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Guo in view of Chapman to include a plurality of back motion frame sets for image interpolation between each frame of the standby state video and the reference frame, as taught by Chapman. The motivation for doing so is to have less expensive video production.
Regarding claim 10, Guo in view of Chapman disclose all the limitations of claim 9.
Guo discloses the generation of the plurality of speech state images and the speech voice (
[0088] - Generation of the talking head 256 (FIG. 3) is provided by talking head module 256, which provides video output data indicated at block 270 that in turn is rendered by video display 254. Generally, the talking head module 256 implements a video rewrite technique, wherein stored frames of video are sequenced selectively in order to provide facial and head movement animation, i.e., while the computer user is quiet, for instance, contemplating the next action, or when the computer user is speaking, in order to simulate that the talking head 256 is listening. The talking head module 256 smoothly switches back and forth between the waiting/listening and talking states of the talking head.
[0089] - A background video of the talking head's face, neck and shoulders; frames of continuous video of the talking head 256 with a neutral (i.e., non-talking) facial expression.
[0027] - voice audio as rendered by the output device 206 with a speech synthesizer 210, in conjunction with a talking head rendered on a suitable display, can also be provided.);
Guo discloses perform, when the generating of the plurality of speech state images and the speech voice is completed ([0027], [0036], [0039] - the dialog manager module 204 determines what actions to take regarding presenting information, or, if necessary, soliciting further information from the user until the semantic representation, waiting/listening state or talking state, is complete, “plurality of speech state images and the speech voice is completed”. [0088] - Generation of the talking head 256 (FIG. 3) is provided by talking head module 256, which provides video output data indicated at block 270 that in turn is rendered by video display 254. Generally, the talking head module 256 implements a video rewrite technique, “generation of the plurality”, wherein stored frames of video are sequenced selectively in order to provide facial and head movement animation. The frames are sequenced in order to completely simulate a conversation between the talking head 256 and the computer user. Video rewrite has been used for creating a talking head to speak individual sentences; the talking head 256 herein provided completes simulation of a conversation by continuing animation of the talking head between sentences, i.e., while the computer user is quiet, for instance, contemplating the next action, or when the computer user is speaking, in order to simulate that the talking head 256 is listening. The talking head module 256 smoothly switches back and forth between the waiting/listening and talking states of the talking head; the waiting/listening or talking states, including images and talking, being complete, “when the generation of the plurality of speech state images and the speech voice is completed”.)
Chapman discloses
detect, when the action is completed ([0048] An artist takes the character and places the character in an animation scene. They make an animation loop of the character idling, that is to say just looking around and occasionally blinking. This consists of a few seconds (say two seconds) of animation that can be repeated or looped to fill in time when the character is not actually saying anything, “detect”.
[0051] After creating an idle loop, the artist creates a speech loop.),
a closest frame having a back motion frame set among frames of the standby state video after completion ([0094] - the rendering device is configured to generate in-between frames of visual data when transitioning between the rendering of the primary animation data and the alternative clip of animation data, “back motion video”; note that a plurality of frames makes up a video. [0057] - calculating frames in between (tweening) by automated interpolation; e.g., [0094] - "the position at which that tap occurred", a closest frame detected after the tap completion; NOTE: in other words, the frame of the primary animation (e.g., the idle animation loop 1203) displayed when the screen tap occurs is set as a reference frame for transitioning to and from the alternate animation clip, “a back motion video”. [0094], [0096] - When a user taps the display during playing of the primary animation (e.g., the animation loop of the character idling), “the standby state image”.);
detect a back motion frame set section corresponding to the detected frame in the back motion video ([0094] - "the position at which that tap occurred", “detect”; NOTE: in other words, the frame of the primary animation (e.g., the idle animation loop 1203) displayed when the screen tap occurs is set as a reference frame for transitioning to and from the alternate animation clip, “frame in the back motion video”. When a user taps the display during playing of the primary animation (e.g., the animation loop of the character idling), “the standby state image”, the current animation frame being displayed is set as a reference frame for transitioning to and from the alternate animation loop; e.g., [0093] - The normal animation will then resume where it left off.
[0048] - "an animation loop of the character idling, that is to say just looking around and occasionally blinking. This consists of a few seconds (say two seconds) of animation that can be repeated or looped", “corresponding to the detected frame in the back motion video”, to fill in time when the character is not actually saying anything.
[media_image1.png, 472 x 450, greyscale: figure reproduced from Chapman]
) and
reproduce the standby state video up to the detected frame and then reproduce the back motion frame set section (
[0094] - "the position at which that tap occurred", “detect”; NOTE: in other words, the frame of the primary animation (e.g., the idle animation loop 1203) displayed when the screen tap occurs is set as a reference frame for transitioning to and from the alternate animation clip, “frame in the back motion video”. When a user taps the display during playing of the primary animation (e.g., the animation loop of the character idling), “the standby state image”, the current animation frame being displayed is set as a reference frame for transitioning to and from the alternate animation loop.
[0048] - "an animation loop of the character idling, that is to say just looking around and occasionally blinking. This consists of a few seconds (say two seconds) of animation that can be repeated or looped", “reproduce the standby state video”, to fill in time when the character is not actually saying anything.
[0093] - The normal animation will then resume where it left off, “reproduce the back motion frame set section”.
[media_image1.png, 472 x 450, greyscale: figure reproduced from Chapman]
).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Guo in view of Chapman to detect, when the action is completed, a closest frame having a back motion frame set among frames of the standby state video after completion; detect a back motion frame set section corresponding to the detected frame in the back motion video; and reproduce the standby state video up to the detected frame and then reproduce the back motion frame set section, as taught by Chapman. The motivation for doing so is to have less expensive video production.
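For illustration only (not code from either reference), the claim 10 behavior of finding the closest standby frame that has a back motion frame set, playing the standby video up to that frame, and then playing the corresponding back motion section could be sketched as follows; the data structures and names are hypothetical.

def plan_return(current_index, standby_frames, back_motion_sets):
    """Sketch: back_motion_sets maps a standby frame index to its interpolation frames."""
    n = len(standby_frames)
    for offset in range(n):                        # closest frame at or after the current one
        idx = (current_index + offset) % n
        if idx in back_motion_sets:
            if idx >= current_index:               # standby video up to the detected frame
                bridge = standby_frames[current_index:idx + 1]
            else:                                  # wrap around the loop if needed
                bridge = standby_frames[current_index:] + standby_frames[:idx + 1]
            return bridge + back_motion_sets[idx]  # then the back motion frame set section
    return []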
Regarding claim 11, Guo in view of Chapman disclose all the limitations of claim 8.
Chapman discloses wherein the reference frame is a first frame ([0057] - An animation is defined by identifying positions for elements at a first key frame on a timeline, “reference frame” and “a first frame”, identifying alternative positions at a second key frame on a timeline, and calculating frames in between (tweening) by automated interpolation.
[0050] - In the loop, different parts of the model, such as the arms, eyes and head, move in terms of their location, rotation, scale and visibility. All of these are defined by the animation timeline. For example, a part of the animation timeline may contain movement of the head. Thus, in a loop, the head may move up and down twice, for example. To achieve this, it may be necessary to define four key frames in the timeline, and the remaining frames may be generated by interpolation.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Guo in view of Chapman with wherein the reference frame is a first frame as taught by Chapman. The motivation for doing so is to have less expensive video production.
Regarding claim 12, Guo in view of Chapman disclose all the limitations of claim 8.
Chapman discloses repeatedly reproduces the standby state video ([0094] - "the position at which that tap occurred"; NOTE: in other words, the frame of the primary animation (e.g., the idle animation loop 1203) displayed when the screen tap occurs is set as a reference frame for transitioning to and from the alternate animation clip. [0094], [0096] - When a user taps the display during playing of the primary animation (e.g., the animation loop of the character idling), “the standby state image”, the current animation frame being displayed is set as a reference frame for transitioning to and from the alternate animation loop; e.g., [0093] - The normal animation will then resume where it left off.
[0048] - an animation loop of the character idling, that is to say just looking around and occasionally blinking. This consists of a few seconds (say two seconds) of animation that can be repeated or looped to fill in time when the character is not actually saying anything, “repeatedly reproduces”).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Guo in view of Chapman with repeatedly reproduces the standby state video as taught by Chapman. The motivation for doing so is to have less expensive video production.
Regarding claim 13, Guo in view of Chapman disclose all the limitations of claim 8.
Chapman discloses wherein the plurality of speech state images are face images of the person in the video ([0027] - the talking head will appear to wait or listen; [0089] - a background video of the talking head's face, neck and shoulders; [0089] - frames of continuous video of the talking head 256 with a neutral (i.e., non-talking) facial expression; [0027] - Likewise, in one embodiment, voice audio as rendered by the output device 206 with a speech synthesizer 210, and in one form, in conjunction with a talking head rendered on a suitable display, can also be provided. In this manner, interaction with system 200 simulates a conversation between the user and the rendered image of the talking head.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Guo in view of Chapman with wherein the plurality of speech state images are face images of the person in the video as taught by Chapman. The motivation for doing so is to have less expensive video production.
Regarding claim 14, Guo in view of Chapman disclose all the limitations of claim 13.
Guo discloses generates the synthesized speech video by replacing a face of the person in the video with each speech state image from the frame and synthesizing the speech state image and the speech voice ([0090] - stitched into, or superimposed on, “replace”. A second part of the facial animation is obtained by superimposing upon the background video a video sequence of the jaw and mouth, which is lip-synced according to the spoken sentence or voice output. This component is illustrated in FIG. 4 at block 308. Sequences of frames for the jaw and mouth are stored with respect to a number of acoustical units (e.g., phonemes) such as a triphone. An audio output sentence to be spoken is received by the talking head module 256 from the speech synthesizer 210. In particular, as illustrated in FIG. 4, text 312 to be spoken is converted to a sound track or audio signal 314 by speech synthesizer 210. The audio output to be spoken is segmented into acoustic units such as triphones at 316. The shape distance between the triphone to be rendered visually and a labeled triphone in a video module 320 is computed, and some of the smallest distances will be selected as candidates. The smallest path from the beginning triphone to the ending triphone is then determined, and the selected triphone sequences will then be aligned with the voice output signal. The resulting triphone sequences of the mouth and jaw are then stitched into, or superimposed on, the background video. It should be noted that the video frame sequences or frames are stored in a database or store 326; however, each of the video frames is typically adjusted to a defined pose.).
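For illustration only (not Guo's implementation), the replacement of the face region of each standby frame with a generated speech-state face image, paired with the synthesized speech audio, can be sketched as follows; the array-style frames, the fixed face bounding box, and the helper names are assumptions.

def replace_face(standby_frame, face_image, box):
    """Paste the speech-state face image into the face bounding box of a standby frame.
    Frames are assumed to be numpy-style arrays so that slice assignment works."""
    y0, y1, x0, x1 = box
    out = standby_frame.copy()
    out[y0:y1, x0:x1] = face_image     # assumes face_image matches the box dimensions
    return out

def build_synthesized_video(standby_frames, face_images, box, speech_audio):
    """Sketch: replace the face in each frame, then pair the video with the speech audio."""
    video = [replace_face(f, img, box) for f, img in zip(standby_frames, face_images)]
    return video, speech_audio          # audio/video muxing would happen downstream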
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ming Wu whose telephone number is (571) 270-0724. The examiner can normally be reached on Monday - Friday, 9:30am - 6:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Devona Faulk, can be reached at 571-272-7515. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MING WU/
Primary Examiner, Art Unit 2618