Last updated: May 29, 2026

Application No. 18/785,109

SYSTEM AND METHOD OF CONVERSATIONAL GAZE CONTROL FOR COMPUTER ANIMATION

Non-Final OA §103

Filed

Jul 26, 2024

Examiner

HE, WEIMING

Art Unit

2611

Tech Center

2600 — Communications

Assignee

Jali Inc.

OA Round

1 (Non-Final)

Interview Optional

The interview lift (+13.0%) is below the 15.0% threshold. A written response is recommended.

Based on 414 resolved cases, 2023–2026

Examiner Intelligence

HE, WEIMING

Grants 46% of resolved cases

Career Allowance Rate

191 granted / 414 resolved

-15.9% vs TC avg

Moderate +13% lift

+13.0%

Interview Lift

resolved cases with interview

Typical timeline

3y 4m

Avg Prosecution

25 currently pending

Career history

453

Total Applications

across all art units

Statute-Specific Performance

§101

0.9%

-39.1% vs TC avg

§103

93.4%

+53.4% vs TC avg

§102

3.2%

-36.8% vs TC avg

§112

1.9%

-38.1% vs TC avg

Based on career data from 414 resolved cases

Office Action

§103

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/28/24 is being considered by the examiner.

CLAIM INTERPRETATION
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are: 
“an input module to receive…”, “a gaze module to determine…”, and “an output module to output…” in Claim 14.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
Dependent claims 15-18 are interpreted under 35 U.S.C. 112(f) due to dependency of the claims mentioned above and for similar rationale as discussed above.
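Once each prong has been judged, the three-prong test and the two rebuttable presumptions quoted above reduce to a single conjunction. The following is a minimal Python sketch of that logic only. The `Limitation` class, its field names, and the flag values used for the Claim 14 example are illustrative assumptions; each flag stands in for a legal judgment that the code does not make.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical record of an examiner's findings for one claim limitation."""
    text: str
    uses_means_or_step: bool            # prong (A): literal "means" or "step"
    uses_generic_placeholder: bool      # prong (A): nonce term such as "module"
    has_functional_language: bool       # prong (B): e.g. "to determine", "configured to"
    recites_sufficient_structure: bool  # prong (C) is met only when this is False


def interpreted_under_112f(lim: Limitation) -> bool:
    """Return True when all three prongs of the MPEP 2181(I) test are met."""
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.has_functional_language
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c


# Flag values below are assumed for illustration, mirroring the finding for Claim 14.
gaze_module = Limitation(
    text="a gaze module to determine…",
    uses_means_or_step=False,
    uses_generic_placeholder=True,
    has_functional_language=True,
    recites_sufficient_structure=False,
)
print(interpreted_under_112f(gaze_module))  # True

# A "means for" limitation that recites sufficient structure rebuts the presumption.
structured = Limitation("means for ...", True, False, True, True)
print(interpreted_under_112f(structured))  # False
```

Under this sketch, the usual response options correspond to flipping one flag: amending the limitation to recite sufficient structure, or arguing that the term is not a generic placeholder.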
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8-9, 14-15 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Canales et al. (Real-Time Conversational Gaze Synthesis for Avatars, In ACM SIGGRAPH Conference on Motion, Interaction and Games (MIG ’23), November 15–17, 2023) in view of Colburn et al. (The Role of Eye Gaze in Avatar Mediated Conversational Interfaces).
As to Claim 1, Canales teaches A method of determining conversational gaze control for computer animation of a character (Canales, Abstract), the method executed on a processing unit, the method comprising:
receiving transcripted speech audio (Canales discloses “an Audio-Technica AT2020 microphone recorded audio in mono, sampled at 44.1kHz” at p. 2);
outputting the trajectories of head motion and gaze for computer animation of the character (Canales discloses “In our work, we focus on developing and evaluating a data-driven approach for animating the eyes of a virtual avatar based on the head motion and audio during a dyadic face-to-face conversation.” at p. 2.)
Canales does not explicitly teach a gaze state machine. Colburn, in combination, further teaches the following limitations:
determining time sequences of gaze transition targets for a series of time-steps using a state machine that resolves between direct focus and aversion at each time-step (Colburn discloses “The stochastic eye gaze models we have developed are summarized in the hierarchical state machine diagrams shown in Figures 1 through 4” under section Hierarchical State Machine; see also Fig 2 [media_image1.png]);
determining trajectories of head motion and gaze of the character for each timestep using the determined gaze transition targets (Canales further discloses “To this end, we captured the head motion, eye motion, and audio of several two-party conversations and trained an RNN-based model to predict where an avatar looks in a two-person conversational scenario” in Abstract; “A motion capture system consisting of 15 Optitrack motion capture cameras recorded the head movements (position and orientation) of both performers at 120fps… We opted for an external eye tracker in combination with a motion capture system, as opposed to a head mounted display with an integrated eye tracker, so that the conversational partners could see each other leading to a more natural eye motions” at p. 2. Colburn discloses “While we track the user’s gaze in our study and draw on this information as input to the simulated gaze model for the avatar” at p. 3).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Canales with the teaching of Colburn so as to use a gaze state machine to analyze the stochastic eye gaze models (Colburn, p. 4).
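For orientation, the limitation at issue (a state machine that resolves between direct focus and aversion at each time-step) can be pictured as a two-state stochastic machine. The Python sketch below is an illustration only: the state names, the `LEAVE_PROB` values, and the function name are assumptions and are not taken from the application, Canales, or Colburn.

```python
import random

FOCUS, AVERT = "direct_focus", "aversion"

# Hypothetical per-time-step probability of leaving each state.
LEAVE_PROB = {FOCUS: 0.2, AVERT: 0.4}


def gaze_state_sequence(n_steps, seed=0, start=FOCUS):
    """Return one gaze state per time-step from a two-state stochastic machine."""
    rng = random.Random(seed)  # seeded so that a run is repeatable
    state = start
    sequence = []
    for _ in range(n_steps):
        sequence.append(state)
        # Resolve focus versus aversion for the next time-step.
        if rng.random() < LEAVE_PROB[state]:
            state = AVERT if state == FOCUS else FOCUS
    return sequence


print(gaze_state_sequence(10))
```

Colburn's hierarchical model, as quoted in the rejection of Claim 3, uses more sub-states, such as (0), (1,0), and (1,1). This sketch collapses them into two states to show only the per-time-step resolution.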

As to Claim 2, Canales in view of Colburn teaches The method of claim 1, further comprising receiving directorial inputs from a user that are embedded within the transcripted speech audio (Canales discloses “created a model that generated both head and eye motion based only on speech input for two party conversations” at p. 2. Colburn also discloses “The microphone input along with the subject’s eye gaze if fed into the eye gaze model described above to drive the eye gaze of the avatar” at p. 7.)

As to Claim 3, Canales in view of Colburn teaches The method of claim 2, wherein the directorial inputs comprise one of look-at tags to amplify salience of an object, directional tags to specify ego-centric aversion behavior, or override tags to force focus or aversion behaviour (Colburn discloses “Each of these sub-states is labeled with either one or two numbers. The (0) state indicates that the avatar is gazing away from the other. State (1,0) indicates the avatar is looking at the other, but that the other is looking away. State (1,1) is one of mutual gaze… Within the other speaking state there are now three targets for gaze: away from anyone, towards the speaker, and towards a non-speaker” at p. 5, see also Fig 1-4.)
As to Claim 4, Canales in view of Colburn teaches The method of claim 1, further comprising determining visually salient portions of a setting for the computer animation to determine locations for the gaze of the character (Canales discloses “a saliency-based gaze behavior model in a virtual conversational scenario… In addition to communication specific gaze behavior models, there are several methods that synthesize or retarget gaze based on the location of gaze targets or the visual saliency of the virtual scene” at p. 2.)

As to Claim 5, Canales in view of Colburn teaches The method of claim 1, wherein determining the time sequences of gaze transition targets comprises determining a speech-based probability indicating whether to avert the gaze of the character from a conversational partner (Colburn teaches gaze transition targets at Fig 1-4. Canales discloses “Iwao et al. [2012; 2013] synthesized the more subtle eye movements that occur during fixations using probability models derived from captured gaze data from two party conversations” at p. 2.)

As to Claim 6, Canales in view of Colburn teaches The method of claim 5, wherein the speech based probability is determined using a recurrent neural network model, the recurrent neural network model taking as input prosodic audio features and relative timing of speaking and listening turns obtained from the transcripted speech audio (Canales discloses “For example, Klein et al. [2019] used an RNN to animate a character’s upper-body as it follows a moving gaze target in real-time… To model gaze direction based on motion and speech inputs, we trained a recurrent neural network (RNN), which can capture temporal relations, on a dataset consisting of two-party conversations” at p. 2. Colburn, section “The Role of Eye Gaze in Conversation” and  “Timing of Transitions Between Sub-States”.)

As to Claim 8, Canales in view of Colburn teaches The method of claim 5, wherein transitions of the state machine are determined based on one or more of the speech-based probability, a visual salience of each scene object, and a gaze state of a conversational partner (Colburn, Fig 1-4. Canales discloses “Iwao et al. [2012; 2013] synthesized the more subtle eye movements that occur during fixations using probability models derived from captured gaze data from two party conversations” at p. 2.)

As to Claim 9, Canales in view of Colburn teaches The method of claim 1, wherein determining the trajectories of head motion and gaze of the character for each time-step using the determined gaze transition targets comprises optimizing for a head rotation to a shift in gaze, and comprises interpolating a sequence of head and eye targets using a motion generator (Canales discloses “there are several methods that synthesize or retarget gaze based on the location of gaze targets or the visual saliency of the virtual scene. Peters et al. [2010] developed a gaze shift model that animates the head, eyes, and blinks of a character based on the gaze target location and a parameter specifying the tendency the character moves their head” at p. 2; “We then use the confidence for each sample (a value between 0 and 1, provided by Pupil [Kassner et al. 2014]), to linearly interpolate the gaze angles between high confidence (c > 0.9) samples within each conversation” at p. 3.)
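The Canales passage quoted above describes linearly interpolating gaze angles between high-confidence (c > 0.9) samples. A minimal Python sketch of that idea follows. The function name, the one-dimensional angle list, and the choice to leave leading and trailing low-confidence samples untouched are assumptions made for illustration, not details from Canales.

```python
def fill_low_confidence(angles, confidence, threshold=0.9):
    """Linearly interpolate angles between neighbouring high-confidence samples.

    Samples whose confidence exceeds `threshold` are kept as they are. Samples
    lying between two such anchors are replaced by a straight-line blend of
    the anchor values. Samples before the first or after the last anchor are
    left unchanged (an assumption of this sketch).
    """
    anchors = [i for i, c in enumerate(confidence) if c > threshold]
    out = list(angles)
    for a, b in zip(anchors, anchors[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)  # fraction of the way from anchor a to anchor b
            out[i] = angles[a] + t * (angles[b] - angles[a])
    return out


angles = [0.0, 99.0, 99.0, 99.0, 4.0]    # degrees; the middle values are unreliable
confidence = [1.0, 0.2, 0.1, 0.3, 0.95]
print(fill_low_confidence(angles, confidence))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Note that this operation cleans captured data. Whether it corresponds to the claimed interpolation of a sequence of head and eye targets by a motion generator is a separate question that the sketch does not address.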

Claim 14 recites similar limitations as claim 1 but in a system form. Therefore, the same rationale used for claim 1 is applied.
Claim 15 is rejected based upon similar rationale as Claim 5.

Claim 17 is rejected based upon similar rationale as Claim 8.
Claim 18 is rejected based upon similar rationale as Claim 9.


Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Canales in view of Colburn and Lee (JP3659181B2).
As to Claim 7, Canales in view of Colburn teaches The method of claim 4, wherein, during direct focus, look-at-points are generated on a conversational partner, and wherein, during aversion, look-at-points are generated using a random walk algorithm based on scene salience (Colburn teaches gaze state machine in Fig 1-4. Lee further discloses “The random walk algorithm of the embodiment of the present invention using node-seed distance information to identify penalties used in calculating transition probabilities for random walks” in [0070].)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Canales and Colburn with the teaching of Lee so as to use node-seed distance information to identify penalties used in calculating transition probabilities for random walks (Lee, [0070]).
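To make the combination concrete, a random walk over scene objects with salience-weighted transition probabilities and a distance penalty might look like the Python sketch below. Everything specific here is assumed for illustration: the weighting formula `salience / (1 + penalty * distance)`, the object names, and the coordinates come from neither Lee nor the application.

```python
import math
import random


def salience_random_walk(objects, start, n_steps, penalty=1.0, seed=0):
    """Walk between scene objects, favouring salient and nearby targets.

    `objects` maps a name to (x, y, salience). The transition weight
    salience / (1 + penalty * distance) is an assumed formula, not Lee's.
    """
    rng = random.Random(seed)
    current = start
    path = [current]
    for _ in range(n_steps):
        cx, cy = objects[current][:2]
        names, weights = [], []
        for name, (x, y, salience) in objects.items():
            if name == current:
                continue  # always move to a different look-at point
            distance = math.hypot(x - cx, y - cy)
            names.append(name)
            weights.append(salience / (1.0 + penalty * distance))
        current = rng.choices(names, weights=weights, k=1)[0]
        path.append(current)
    return path


# Hypothetical scene: name -> (x, y, salience)
scene = {"window": (0.0, 2.0, 0.8), "lamp": (1.0, 0.0, 0.5), "door": (3.0, 1.0, 0.2)}
print(salience_random_walk(scene, "lamp", 5))
```

In the claimed method such a walk would apply only during aversion; during direct focus, look-at points are generated on the conversational partner instead.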

Claim 16 is rejected based upon similar rationale as Claim 7.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Canales in view of Colburn and Miklos et al. (US 2008/0130950 A1).
As to Claim 11, Canales in view of Colburn teaches The method of claim 9, wherein the motion generator comprises interpolation of a sequence of target head and eye angles determined by summing a sequence of sub-movements (Miklos discloses “In such matter, by generating interpolation data, the interpolators may be able to determine a gaze angle of the eye, based on interpolation…” in [0027]; “In step 62, the tracking algorithm of the software may be applied, utilizing the custom template and the eye image at that point in time ( or frame), to determine the real-time, two-dimensional position of a portion of the operator's eye, such as a pupil, within the frame image. In step 64, the tracking algorithm software may use the interpolating data of step 56 in order to determine the operator's real-time eye gaze angle at the operator's eye position determined in step 62” in [0028].)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Canales and Colburn with the teaching of Miklos so as to interpolate head movement and eye rotation to determine a real-time eye gaze angle with the corresponding head movement.
Claims 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Canales in view of Colburn and Dunlop (US 2008/0091122 A1).
As to Claim 12, Canales in view of Colburn teaches The method of claim 1, further comprising adding rhythmic head motion to the trajectory of the head motion (Canales discloses amplitude saccades at p. 3; “parameters, including saccade speed, magnitude, and frequency” at p. 4-5. Here, Canales does not directly use the claim language. Dunlop discloses “wherein if the analysis of the motion signal determines that the motion includes rhythmic, repetitive motion of the limbs, head, or trunk, a determination is made” in claim 54.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Canales and Colburn with the teaching of Dunlop so as to detect rhythmic head motion by analyzing speed, amplitude, frequency, and the like.

Claim 19 is rejected based upon similar rationale as Claim 12.

Claims 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Canales in view of Colburn and Ressemann et al. (US 2024/0398302 A1).
As to Claim 13, Canales in view of Colburn teaches The method of claim 9. The combination of Ressemann further teaches altering fixation of the trajectory of the gaze with eye rotations where a gaze fixation interval is longer than a predetermined time interval (Ressemann discloses “In certain other embodiments, a patient's point-of-gaze data (e.g., visual fixation data) is analyzed over a predetermined time period (e.g., over multiple sessions spanning several months) to identify a decline, increase, or other salient change in visual fixation (e.g., point-of-gaze data that initially corresponds to that of typically-developing children changing to more erratic point-of-gaze data corresponding to that of children…” in [0330].)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Canales and Colburn with the teaching of Ressemann so as to change the fixation of the gaze based on meeting a specific condition.

Claim 20 is rejected based upon similar rationale as Claim 13.

Allowable Subject Matter
Claim 10 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEIMING HE whose telephone number is (571)270-1221.  The examiner can normally be reached on Monday-Friday, 8:30am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy Goddard, can be reached at 571-272-7773. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/WEIMING HE/
Primary Examiner, Art Unit 2611

Prosecution Timeline

Jul 26, 2024

Application Filed

May 06, 2026

Non-Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/057,117

Patent 12639877

REFINEMENT OF FACIAL KEYPOINT METADATA GENERATION FOR VIDEO CONFERENCING OR OTHER APPLICATIONS

3y 6m to grant Granted May 26, 2026

16/900,500

Patent 12632615

DATA SERIALIZATION EXTRUSION FOR CONVERTING TWO-DIMENSIONAL IMAGES TO THREE-DIMENSIONAL GEOMETRY

5y 11m to grant Granted May 19, 2026

18/337,634

Patent 12633000

TEXT-TO-IMAGE SYNTHESIS UTILIZING DIFFUSION MODELS WITH TEST-TIME ATTENTION SEGREGATION AND RETENTION OPTIMIZATION

2y 11m to grant Granted May 19, 2026

18/354,159

Patent 12608891

INFORMATION PROCESSING DEVICE, HEAD-MOUNTED DISPLAY DEVICE, CONTROL METHOD OF INFORMATION PROCESSING DEVICE, AND NON-TRANSITORY COMPUTER READABLE MEDIUM WITH WHITE-BALANCE CORRECTION VALUE CORRESPONDING TO COLOR TEMPERATURE OF ENVIRONMENT LIGHT-SOURCE

2y 9m to grant Granted Apr 21, 2026

18/580,103

Patent 12567135

MULTIMEDIA PLAYBACK MONITORING SYSTEM AND METHOD, AND ELECTRONIC APPARATUS

2y 1m to grant Granted Mar 03, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

46%

Grant Probability

59%

With Interview (+13.0%)

3y 4m (~1y 6m remaining)

Median Time to Grant

Low

PTA Risk

Based on 414 resolved cases by this examiner. Grant probability derived from career allowance rate.
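The projection figures on this page are arithmetically consistent with one another, as the short Python check below shows. The additive formula (allowance rate plus interview lift) and the function names are assumptions inferred from the numbers shown; the page itself states only that grant probability is derived from the career allowance rate.

```python
GRANTED, RESOLVED = 191, 414
INTERVIEW_LIFT = 13.0       # percentage points, as reported on this page
INTERVIEW_THRESHOLD = 15.0  # percentage points, as reported on this page


def allowance_rate(granted, resolved):
    """Career allowance rate as a percentage."""
    return 100.0 * granted / resolved


def with_interview(rate, lift):
    """Assumed additive model: base rate plus interview lift."""
    return rate + lift


rate = allowance_rate(GRANTED, RESOLVED)
print(round(rate))                                  # 46
print(round(with_interview(rate, INTERVIEW_LIFT)))  # 59
# The page recommends a written response because the lift is below the threshold.
print(INTERVIEW_LIFT >= INTERVIEW_THRESHOLD)        # False
```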