Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-8 are rejected under 35 U.S.C. 103 as being unpatentable over Wright et al., “Measuring inferred gaze direction to support analysis of people in a meeting,” Expert Systems with Applications (hereinafter “Wright”), in view of Haro, U.S. Patent Application Publication No. 2023/0083909 (hereinafter “Haro”).
Claim 1:
Wright discloses:
A method for estimating gaze directions of multiple persons in images, comprising:
obtaining a facial image, wherein the facial image comprises at least one face region (see page, section “Methodology”). Wright teaches estimating gaze directions of multiple people in an image;
constructing a model for multi-task, wherein the model comprises a plurality of multi-task processing structures capable of parallel calculations, the plurality of multi-task processing structures capable of parallel calculations output a gaze direction of each face region and face position information of each face region in the facial image through one-time calculation (see page 4, columns 1 and 2). Wright teaches constructing a model: the uncertainty of the gaze direction (GD) for each participant is modelled by a Gaussian distribution, GD(μ, σ), constructed using estimates of the mean gaze direction and its standard deviation (σ_d) based on ground truth measurements;
training the deep network model end-to-end on a dataset to obtain a trained deep network model, wherein the trained deep network model is used to determine the gaze directions of multiple persons; and
inputting the facial image to the trained deep network model to obtain the gaze direction of at least one face region included in the facial image (see page 4 col. 1). Wright teaches inputting the facial image into the model to determine the gaze direction of multiple people in the image.
Wright fails to expressly disclose constructing a deep network learning model and training the model.
Haro discloses:
constructing a deep network model for multi-task learning, wherein the deep network model comprises a plurality of multi-task processing structures capable of parallel calculations, the plurality of multi-task processing structures capable of parallel calculations output a gaze direction of each face region and face position information of each face region in the facial image through one-time calculation (see paragraphs [0032], [0060], [0102]). Haro teaches constructing an always-evolving learning model that takes an input, performs calculations, and determines the gaze of a user;
training the deep network model end-to-end on a dataset to obtain a trained deep network model, wherein the trained deep network model is used to determine the gaze directions of multiple persons (see paragraphs [0032], [0071], [0078]). Haro teaches training the model with images to determine the gaze of a user.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Wright to include constructing and training a learning model to determine the gaze of a user, for the purpose of efficiently creating models that accurately detect the gaze, as taught by Haro.
Claim 2:
Wright discloses:
wherein the face region in the at least one face region comprises at least one of the following: facial size information, posture information, facial expression information, and gender information (see Figure 1 and page 4, section “Gaze Direction”). Wright teaches posture information and the position of the head.
Claim 3:
Wright fails to expressly disclose encoders and decoders that complete tasks.
Haro discloses:
wherein the deep network model is a multi-task single-stage deep network model, the deep network model comprising one encoder and multiple decoders, wherein, decoders in the multiple decoders are decoders that simultaneously complete different types of tasks, the deep network model is used to determine at the same time the gaze directions of multiple face regions, and output the face position information and key point information of multiple face regions (see paragraphs [0041], [0045] and [0078]). Haro teaches that the model has encoders and decoders completing different tasks to determine the position and gaze information.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Wright to include using encoders and decoders to complete tasks for eye gaze detection, for the purpose of efficiently creating models that accurately detect the gaze, as taught by Haro.
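For illustration only, the following is a minimal sketch, in Python/PyTorch, of the single-encoder, multiple-decoder multi-task architecture recited in claims 1 and 3. It is not taken from Wright, Haro, or the present application; all module names, layer sizes, and outputs are hypothetical assumptions.

# Minimal sketch of a multi-task, single-stage network: one shared encoder
# feeding parallel decoder heads that output gaze direction, face position,
# and face key points in a single forward pass. Hypothetical illustration only.
import torch
import torch.nn as nn

class MultiTaskGazeNet(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Shared encoder (stand-in for any CNN backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Parallel decoder heads: different task types share one feature.
        self.gaze_head = nn.Linear(feat_dim, 2)       # (theta, phi) per face region
        self.box_head = nn.Linear(feat_dim, 4)        # face position (x, y, w, h)
        self.keypoint_head = nn.Linear(feat_dim, 10)  # 5 facial key points (x, y)

    def forward(self, face_crops: torch.Tensor):
        feats = self.encoder(face_crops)              # one-time shared computation
        return {
            "gaze": self.gaze_head(feats),
            "box": self.box_head(feats),
            "keypoints": self.keypoint_head(feats),
        }

# Usage: a batch of face regions yields all task outputs in one forward pass.
outputs = MultiTaskGazeNet()(torch.randn(4, 3, 64, 64))
print({k: v.shape for k, v in outputs.items()})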
Claim 4:
Wright discloses:
wherein the method further comprises:
for the gaze direction of each face region in the gaze direction of the at least one face region, the following determination steps are performed: determine positions Y_F, Y_T, Y_S of gaze projection points of the gaze direction y_g = (θ, φ) of the face region on front, top and side projection planes, wherein, Y_F represents a position of a gaze projection point in a front direction, Y_T represents a position of the gaze projection point in a top direction, Y_S represents a position of the gaze projection point in a side direction, F represents the front direction, T represents the top direction, S represents the side direction, y_g represents the gaze direction of the face region, θ represents an angle of nutation in the gaze direction, and φ represents an angle of rotation in the gaze direction (see page 4, column 1 and section “Gaze Direction”). Wright teaches determining positions of the gaze projection points of the face region using the gaze projection model; Wright uses equations to determine the positions and the directions of the gaze;
determine whether the positions Y_F, Y_T, Y_S of the gaze projection points on the front, top and side projection planes are equal to three projections of three-dimensional gaze prediction values, wherein, the three projections of the three-dimensional gaze prediction values are obtained by the following formula:
Π_F(θ, φ) = [sin φ · cos θ, sin θ], Π_T(θ, φ) = [cos φ · cos θ, sin φ · cos θ], Π_S(θ, φ) = [cos φ · cos θ, sin θ], wherein, Π represents a projection function, F represents the front direction, θ represents the angle of nutation in the gaze direction, φ represents the angle of rotation in the gaze direction, Π_F(θ, φ) represents projecting the gaze direction onto the front plane, sin φ represents a sine value of φ, cos θ represents a cosine value of θ, T represents the top direction, Π_T(θ, φ) represents projecting the gaze direction onto the top plane, cos φ represents a cosine value of φ, S represents the side direction, Π_S(θ, φ) represents projecting the gaze direction onto the side plane, sin θ represents a sine value of θ (see pages 6-10). Wright teaches the positions and measurements of the head and face to estimate gaze direction.
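As a purely illustrative aside (hypothetical; not reproduced from Wright or the application), the following short Python sketch transcribes the three projection formulas recited above for a gaze direction y_g = (θ, φ).

# Minimal sketch of the front/top/side projections of a gaze direction
# y_g = (theta, phi), transcribing the formula above. Hypothetical example.
import math

def project_gaze(theta: float, phi: float) -> dict:
    """Return the 2-D projections of the gaze direction onto the
    front (F), top (T), and side (S) planes."""
    return {
        "F": (math.sin(phi) * math.cos(theta), math.sin(theta)),
        "T": (math.cos(phi) * math.cos(theta), math.sin(phi) * math.cos(theta)),
        "S": (math.cos(phi) * math.cos(theta), math.sin(theta)),
    }

# Example: a gaze 10 degrees down (nutation) and 30 degrees to the side (rotation).
print(project_gaze(math.radians(-10.0), math.radians(30.0)))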
Claim 5:
Wright fails to expressly disclose a loss function for the model.
Haro discloses:
wherein the deep network model comprises a self-supervised loss function, the self-supervised loss function is obtained by the following formula:
L_self = Σ_{τ ∈ {F, T, S}} e^(−p_τ) · ‖y_τ − Π_τ(y_g)‖₁ + p_τ, wherein, L_self represents the self-supervised loss function, τ represents the front or top or side direction, τ takes a value of {F, T, S}, F represents the front direction, T represents the top direction, S represents the side direction, y_τ represents the gaze direction in the front or top or side direction, Π represents the projection function, y_g represents the gaze direction of the face region, Π_τ(y_g) represents a projection function from three dimensions to two dimensions, ‖·‖₁ represents an L1 norm, e represents a natural constant, p represents a trainable parameter, e^(−p_τ) represents a −p_τ power of e, and p_τ represents a correction coefficient for the τ projection (see paragraph [0060]). Haro teaches a loss function that uses an energy function to be reduced (e.g., minimized), such that discrepancies between reality (e.g., a ground truth gaze/gaze direction of a user) and a modeled, predicted, or estimated gaze/gaze direction for the user are minimized. In some machine learning implementations, movement of a machine learning model toward inaccuracy is minimized by adjusting the parameters. One way to achieve this is to use the energy function as a loss function and to minimize the loss function directly. In some contexts described herein, such minimization of the loss function is an attempt to maintain a heuristic balance between the uniqueness of the eye characteristics of a user and a pre-existing machine learning model.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Wright to include using a loss function for the model, for the purpose of efficiently creating models that accurately detect the gaze, as taught by Haro.
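For illustration only, the following minimal Python sketch shows a self-supervised projection loss of the general form recited in claim 5, with one trainable correction coefficient per projection plane. It is a hypothetical example, not Haro's energy/loss function and not the applicant's implementation; all names are invented.

# Minimal sketch of a self-supervised projection loss of the form
#   L_self = sum over tau of  exp(-p_tau) * || y_tau - Pi_tau(y_g) ||_1 + p_tau,
# with tau in {F, T, S}. Hypothetical illustration; names are invented.
import torch
import torch.nn as nn

class SelfSupervisedProjectionLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # One trainable correction coefficient per projection plane.
        self.p = nn.ParameterDict({t: nn.Parameter(torch.zeros(())) for t in "FTS"})

    @staticmethod
    def _project(gaze: torch.Tensor) -> dict:
        # gaze[..., 0] is theta (nutation), gaze[..., 1] is phi (rotation).
        theta, phi = gaze[..., 0], gaze[..., 1]
        return {
            "F": torch.stack([phi.sin() * theta.cos(), theta.sin()], dim=-1),
            "T": torch.stack([phi.cos() * theta.cos(), phi.sin() * theta.cos()], dim=-1),
            "S": torch.stack([phi.cos() * theta.cos(), theta.sin()], dim=-1),
        }

    def forward(self, y_planes: dict, y_g: torch.Tensor) -> torch.Tensor:
        # y_planes maps "F"/"T"/"S" to per-plane 2-D gaze targets; y_g is (theta, phi).
        proj = self._project(y_g)
        loss = 0.0
        for t in "FTS":
            l1 = (y_planes[t] - proj[t]).abs().sum(dim=-1).mean()
            loss = loss + torch.exp(-self.p[t]) * l1 + self.p[t]
        return loss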
Claim 6:
Wright discloses:
wherein the dataset is generated by a multi-person gaze direction image and generation framework that has replaced an eye region, by inputting two types of data, wherein, one type of data is single-person image data with a gaze direction label, and the other type of data is multi-person image data with multiple face regions, the generation framework is used to automatically cluster the single-person image data based on at least one of the gender information, race information, age information and head posture information, for easy retrieval, and the generation framework is also used to retrieve in the single-person image data, a single-person image data that is closest to the face region, for each face region in the multi-person image data, and replace the eye region, to generate a corresponding gaze direction (see page 3, column 2; page 4, column 2; page 9, column 1). Wright teaches a dataset relating to an individual person and to multiple people, including posture and position, to generate the gaze direction.
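As a purely illustrative sketch (hypothetical; not the generation framework of the application and not taught by Wright), the following Python outline shows the retrieve-and-replace idea recited in claim 6: for each face region in a multi-person image, the closest single-person sample is retrieved by simple attribute matching and its labeled eye region is reused.

# Hypothetical sketch of a retrieve-and-replace data generation step:
# for each face region in a multi-person image, find the closest
# single-person sample (by simple attribute matching) and reuse its
# eye region and gaze label. Names and attributes are invented.
from dataclasses import dataclass
import numpy as np

@dataclass
class SinglePersonSample:
    attributes: np.ndarray   # e.g., encoded gender, age, head pose
    eye_region: np.ndarray   # eye-region pixels
    gaze_label: tuple        # (theta, phi)

def closest_sample(face_attributes: np.ndarray,
                   bank: list[SinglePersonSample]) -> SinglePersonSample:
    """Retrieve the single-person sample whose attributes are nearest."""
    dists = [np.linalg.norm(face_attributes - s.attributes) for s in bank]
    return bank[int(np.argmin(dists))]

def replace_eye_region(face_crop: np.ndarray, sample: SinglePersonSample,
                       eye_box: tuple[int, int, int, int]) -> tuple[np.ndarray, tuple]:
    """Paste the retrieved eye region into the face crop; return the
    edited crop together with the retrieved gaze label."""
    x, y, w, h = eye_box
    out = face_crop.copy()
    out[y:y + h, x:x + w] = sample.eye_region[:h, :w]
    return out, sample.gaze_label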
Claim 7:
Wright fails to expressly disclose a total loss for the training process.
Haro discloses:
wherein an overall loss function during end-to-end training process is obtained by the following formula: L = α·L_face + β·L_gaze, wherein, L represents the overall loss function, α represents a first adjustable hyperparameter, L_face represents a loss function related to the face position information and key point information, β represents a second adjustable hyperparameter, L_gaze represents a loss function related to the gaze direction, wherein, L_gaze is obtained by the following formula: L_gaze = λ₁·L_self + λ₂·‖y_g − y*‖₁, wherein, L_gaze represents the loss function related to the gaze direction, λ₁ represents a first hyperparameter used to balance different loss terms, L_self represents the self-supervised loss function, λ₂ represents a second hyperparameter used to balance different loss terms, y_g represents the gaze direction of the face region, y* represents a truth label, and ‖·‖₁ represents the L1 norm (see paragraphs [0061]-[0066]). Haro teaches minimizing discrepancies during training by using a plurality of parameters and formulas to minimize the loss.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Wright to include using a loss function for minimizing discrepancies during training, for the purpose of efficiently creating models that accurately detect the gaze, as taught by Haro.
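For illustration only, the following minimal Python sketch combines a face loss and a gaze loss in the weighted form recited in claim 7. It is a hypothetical example continuing the invented names of the earlier sketches; the hyperparameter values are assumptions.

# Hypothetical sketch of the combined training loss
#   L = alpha * L_face + beta * L_gaze,
#   L_gaze = lambda1 * L_self + lambda2 * || y_g - y_star ||_1.
# Illustration only; hyperparameter values are invented.
import torch

def gaze_loss(l_self: torch.Tensor, y_g: torch.Tensor, y_star: torch.Tensor,
              lambda1: float = 1.0, lambda2: float = 1.0) -> torch.Tensor:
    l1 = (y_g - y_star).abs().sum(dim=-1).mean()   # L1 term against the truth label
    return lambda1 * l_self + lambda2 * l1

def overall_loss(l_face: torch.Tensor, l_gaze: torch.Tensor,
                 alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    return alpha * l_face + beta * l_gaze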
Claim 8:
Wright fails to expressly disclose estimating eye gaze direction in real time.
Haro discloses:
wherein in a deployment environment, the facial image is input in real-time to the trained deep network model, to obtain the gaze direction of at least one face region included in the facial image (see paragraphs [0069] and [0070]). Haro teaches estimating the eye gaze direction in real time from an input image.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Wright to include estimating eye gaze direction in real time, for the purpose of efficiently creating models that accurately detect the gaze in real time, as taught by Haro.
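A minimal, hypothetical Python sketch of the real-time deployment recited in claim 8 (not Haro's system; the camera handling and the stand-in model are invented for illustration): frames are read from a camera and passed to a trained model to obtain a gaze direction per frame.

# Hypothetical real-time deployment loop: read frames from a camera and
# run a trained gaze model on each frame. Illustration only; the simple
# linear model below is a stand-in for a trained multi-task gaze network.
import cv2
import torch

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 64 * 64, 2)).eval()  # stand-in model
cap = cv2.VideoCapture(0)                  # default camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Toy preprocessing: resize the whole frame and treat it as one face crop.
    crop = cv2.resize(frame, (64, 64))
    x = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        gaze = model(x)                    # predicted (theta, phi) for this frame
    print(gaze)
cap.release()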
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIONNA M BURKE whose telephone number is (571)270-7259. The examiner can normally be reached M-F 8a-4p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong can be reached at (571)272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TIONNA M BURKE/Examiner, Art Unit 2178 3/21/26