Last updated: April 19, 2026

Application No. 18/293,976

HEARING ATTENTIONAL STATE ESTIMATION APPARATUS, LEARNING APPARATUS, METHOD, AND PROGRAM THEREOF

Non-Final OA §102 §103

Filed

Jan 31, 2024

Examiner

OMETZ, RACHEL ANNE

Art Unit

2668

Tech Center

2600 — Communications

Assignee

NTT, Inc.

OA Round

1 (Non-Final)

Interview Optional

+30.1% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 26 resolved cases, 2023–2026

Examiner Intelligence

OMETZ, RACHEL ANNE

Grants 69% — above average

Career Allow Rate

18 granted / 26 resolved

+7.2% vs TC avg

Strong +30% interview lift

Interview Lift: +30.1% (with vs. without interview)

resolved cases with interview

Typical timeline

2y 11m

Avg Prosecution

24 currently pending

Career history

Total Applications

across all art units

Statute-Specific Performance

§101: 3.1% (-36.9% vs TC avg)
§103: 62.1% (+22.1% vs TC avg)
§102: 18.8% (-21.2% vs TC avg)
§112: 14.7% (-25.3% vs TC avg)

Tech Center average is an estimate. Based on career data from 26 resolved cases.
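As a quick arithmetic check on the statute figures above, the sketch below recovers the Tech Center average implied by each row. It assumes (the page does not say so) that each "vs TC avg" delta is a percentage-point difference, i.e. delta = examiner rate minus Tech Center average. The helper name `implied_tc_average` is illustrative only.

```python
# Examiner rate and "vs TC avg" delta per statute, copied from the rows above.
rows = {
    "101": (3.1, -36.9),
    "103": (62.1, 22.1),
    "102": (18.8, -21.2),
    "112": (14.7, -25.3),
}


def implied_tc_average(rate, delta):
    """Assumed relation: delta = rate - tc_average, in percentage points."""
    return round(rate - delta, 1)


implied = {
    statute: implied_tc_average(rate, delta)
    for statute, (rate, delta) in rows.items()
}

for statute, average in implied.items():
    print(f"Section {statute}: implied TC average = {average}%")
```

Under that assumption every row implies the same baseline, which would be consistent with a single estimated Tech Center figure rather than a separately measured average per statute.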

Office Action

§102 §103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3-7, and 9-12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Liao et al. (JP-2019126423-A).

Regarding claim 1, Liao teaches:
An auditory attention state estimation apparatus (“An auditory attention estimation device,” Para [0031]) comprising:
a feature quantity extraction processing circuitry configured to obtain a feature quantity (“correlation”) based on the strength of a correlation between each of a plurality of different visual stimulus patterns (“based on the correlation between the temporal pattern of pupil diameter change information and the temporal pattern of brightness change in each region of the moving image,” Para [0060]) emitted from positions of a plurality of different sound sources (the left and right sound sources of “headphones”, Para [0018]) and a pupil diameter change amount of a user (“When auditory attention is directed to the right ear, if the image on the right is darker than the image on the left, the pupil size of the eye increases. (2) When auditory attention is directed to the right ear, if the image on the right is brighter than the image on the left, the pupil size of the eye will decrease,” Para [0025], and vice versa for the left ear and left area);
and an estimation processing circuitry configured to estimate a destination (“direction of auditory attention”) to which the user pays auditory attention for a sound from the sound source using the feature quantity (“based on the correlation between the temporal pattern of pupil diameter change information and the temporal pattern of brightness change in each region of the moving image (temporal pattern of region information), the direction determined from the region of the moving image that is most highly correlated with the temporal pattern of pupil diameter change information is estimated to be the direction of auditory attention,” Para [0060]).
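The correlation-based estimation mapped above can be illustrated with a minimal sketch. It is not the applicant's or Liao's implementation: the Pearson coefficient, the use of its absolute value as the "strength" of correlation, the region names, and the sample values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def estimate_attention_direction(pupil, region_brightness):
    """Return the region whose brightness pattern correlates most strongly
    with the pupil series, plus the per-region strengths.

    Assumption: strength is |r|, because the pupil constricts as the
    attended region brightens, which gives a negative coefficient.
    """
    scores = {
        name: abs(pearson(pupil, series))
        for name, series in region_brightness.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical data: the pupil shrinks whenever the left region is bright.
brightness = {
    "left": [1, 0, 1, 0, 1, 0, 1, 0],
    "right": [1, 1, 0, 0, 1, 1, 0, 0],
}
pupil_diameter = [0.2, 0.9, 0.1, 1.0, 0.2, 0.8, 0.1, 0.9]

direction, scores = estimate_attention_direction(pupil_diameter, brightness)
print(direction, scores)
```

Here the feature quantity is the dictionary of per-region strengths, and the estimated destination is simply the key with the largest value.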

Regarding claim 3, the rejection of claim 1 is incorporated herein. Liao teaches the apparatus of claim 1, and further teaches:
	wherein each of the plurality of visual stimulus patterns is a pattern of a time-varying visual stimulus (“a moving image in which the time pattern of brightness changes in the left half of the moving image differs from the time pattern of brightness changes in the right half of the moving image,” Para [0058]), 
and the feature quantity indicates at least each maximum value of a cross-correlation function (“the higher the peak of the cross-correlation function of two series (series A and B) near the time difference of 0, the higher the correlation,” Para [0062]), between a sequence corresponding to a time-series signal indicating each of the plurality of visual stimulus patterns (arbitrarily “Series A”, or “series of index values indicating brightness”) and a sequence corresponding to a time-series signal indicating the pupil diameter change (arbitrarily “Series B”, or “pupil diameter change information”) amount (“a cross-correlation function is calculated between pupil diameter change information (series of pupil diameters) and area information (series of index values indicating brightness) of a certain area of a moving image,” Para [0062]).
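The "maximum value of a cross-correlation function" feature can be sketched as follows. This is an illustrative reading only: the normalization, the lag window `max_lag`, and the sample series are assumptions, and neither the claim nor the quoted passage of Liao fixes them.

```python
from math import sqrt


def normalized_cross_correlation(series_a, series_b, max_lag):
    """Return {lag: value} for lags in [-max_lag, max_lag].

    At lag k, series_a[t] is compared with series_b[t + k] over the
    overlapping samples; both series are mean-removed first.
    """
    n = len(series_a)
    mean_a, mean_b = sum(series_a) / n, sum(series_b) / n
    dev_a = [v - mean_a for v in series_a]
    dev_b = [v - mean_b for v in series_b]
    norm = sqrt(sum(v * v for v in dev_a) * sum(v * v for v in dev_b))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = sum(
            dev_a[t] * dev_b[t + lag] for t in range(n) if 0 <= t + lag < n
        )
        result[lag] = total / norm if norm else 0.0
    return result


def peak_feature(series_a, series_b, max_lag=2):
    """Return (peak value, lag of the peak) within the lag window."""
    ccf = normalized_cross_correlation(series_a, series_b, max_lag)
    best_lag = max(ccf, key=ccf.get)
    return ccf[best_lag], best_lag


# Hypothetical data: series B follows series A with a one-sample delay.
series_a = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
series_b = [0] + series_a[:-1]

peak_value, peak_lag = peak_feature(series_a, series_b)
print(peak_value, peak_lag)
```

Restricting the search to a small window around lag 0 mirrors the quoted statement that a higher peak near a time difference of 0 indicates a higher correlation.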

Regarding claim 4, the rejection of claim 1 is incorporated herein. Liao teaches the apparatus of claim 1, and further teaches: wherein the sound source corresponding to the visual stimulus pattern having a stronger correlation with the pupil diameter change amount of the user among the plurality of sound sources (“when auditory attention is directed to the left, the correlation (or similarity) between the time pattern of pupil diameter change information and the time pattern of brightness changes in the left half of the moving image will be higher than the correlation… in the right half of the moving image,” Para [0059]) has a higher frequency (“highly correlated”) at which the estimation processing circuitry estimates that the sound source is the destination to which the user pays auditory attention (“the direction determined from an area where brightness fluctuates in a time pattern that is highly correlated with a pattern obtained by frequency analysis of pupil diameter change information is estimated as the auditory attention direction,” Para [0061]).

Regarding claim 5, the rejection of claim 1 is incorporated herein. Liao teaches the apparatus of claim 1, and further teaches: wherein the plurality of sound sources include a first sound source (“sounds are played from the… left headphones,” Para [0018]), the plurality of visual stimulus patterns includes a first visual stimulus pattern corresponding to the first sound source (“When auditory attention is directed to the left ear, if the image on the left is brighter than the image on the right, the pupil size of the eye will decrease,” Para [0025]), and when the strength of the correlation between the first visual stimulus pattern (darker area on the left side, for example) and the pupil diameter change amount of the user is equal to or greater than a predetermined value, the estimation processing circuitry estimates that the destination to which the user pays auditory attention is the first sound source or the vicinity of the first sound source (“the direction determined from a relatively dark area may be estimated as the direction of auditory attention when the pupil diameter change information is equal to or greater than a predetermined threshold,” Para [0041]).

Regarding claim 6, the rejection of claim 1 is incorporated herein. Liao teaches the apparatus of claim 1, and further teaches:
wherein the plurality of sound sources include a first sound source (from the left speaker of the headphones) and a second sound (from the right speaker of the headphones) source other than the first sound source (“different sounds are played from the right and left headphones,” Para [0018]),
the plurality of visual stimulus patterns include a first visual stimulus pattern (left half of the image painted black, Para [0028]) corresponding to the first sound source (“it is thought that when auditory attention is directed to the left, the correlation (or similarity) between the time pattern of pupil diameter change information and the time pattern of brightness changes in the left half of the moving image will be higher,” Para [0059]) and a second visual stimulus pattern (right half of the image painted white, Para [0028]) corresponding to the second sound source (“when auditory attention is directed to the right, the correlation (or similarity) between the temporal pattern of pupil diameter change information and the temporal pattern of brightness change in the right half of the moving image is thought to be higher,” Para [0059]),
and when the strength of a correlation between the first visual stimulus pattern and the pupil diameter change amount of the user is higher than the strength of a correlation between the second visual stimulus pattern and the pupil diameter change amount of the user (“when auditory attention is directed to the left, the correlation (or similarity) between the time pattern of pupil diameter change information and the time pattern of brightness changes in the left half of the moving image will be higher than the correlation (or similarity) between the time pattern of pupil diameter change information and the time pattern of brightness changes in the right half of the moving image,” Para [0059]),
the estimation processing circuitry estimates that the destination to which the user pays auditory attention is the first sound source or the vicinity of the first sound source (“based on the correlation between the temporal pattern of pupil diameter change information and the temporal pattern of brightness change in each region of the moving image (temporal pattern of region information), the direction determined from the region of the moving image that is most highly correlated with the temporal pattern of pupil diameter change information is estimated to be the direction of auditory attention,” Para [0060]).
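The two decision rules recited in claims 5 and 6 (a predetermined-value test and a pairwise comparison) can be contrasted in a short sketch. The function names, the labels "first" and "second", the threshold of 0.5, and the choice to return `None` when no decision is made are all hypothetical; the claims specify none of them.

```python
def estimate_by_threshold(strength_first, threshold):
    """Claim 5 style rule: choose the first source when the correlation
    strength for its pattern is equal to or greater than the threshold.
    Returns None when the rule does not fire (an assumption)."""
    return "first" if strength_first >= threshold else None


def estimate_by_comparison(strength_first, strength_second):
    """Claim 6 style rule: choose the source whose pattern correlates more
    strongly with the pupil change. Ties return None (an assumption)."""
    if strength_first > strength_second:
        return "first"
    if strength_second > strength_first:
        return "second"
    return None


# Hypothetical correlation strengths for two visual stimulus patterns.
threshold_hit = estimate_by_threshold(0.72, threshold=0.5)
threshold_miss = estimate_by_threshold(0.31, threshold=0.5)
comparison = estimate_by_comparison(0.72, 0.31)
print(threshold_hit, threshold_miss, comparison)
```

The distinction matters for the mapping: the threshold rule needs only one pattern's strength, while the comparison rule needs the strengths for both patterns.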

Regarding claim 7, the rejection of claim 1 is incorporated herein. Liao teaches the apparatus of claim 1, and further teaches a learning apparatus comprising:
a learning processing circuitry configured to obtain an estimation model for receiving a feature quantity based on the strength of a correlation between each of a plurality of different visual stimulus patterns (“area information which is an index value indicating the brightness of the area included in the field of view of the subject,” Para [0065]) emitted from positions of a plurality of different sound sources and a pupil diameter change (“change amount (or the first biometric information and the second biometric information) and area information,” Para [0064]) amount of a user (“the auditory attention direction can also be estimated using a model trained using machine learning, a neural network, or the like, which inputs the relative change amount (or the first biometric information and the second biometric information) and area information,” Para [0064]), 
and estimating a destination to which the user pays auditory attention for a sound from the sound source (“outputs the auditory attention direction,” Para [0064]), through learning processing using training data in which a training feature quantity based on the strength of a correlation between each of a plurality of different training visual stimulus patterns (“area information”) emitted from positions of a plurality of different training sound sources and a training pupil diameter change amount is associated with correct answer information indicating a destination to which auditory attention is paid for a training sound from the training sound source (“model that takes as input first biometric information and second biometric information acquired from one or more subjects, area information which is an index value indicating the brightness of the area included in the field of view of the subject, and information based on the first biometric information and second biometric information, learned using a collection of training data obtained based on the correct answer to the auditory attention direction of the subject when the second biometric information was acquired, and area information, and outputs the auditory attention direction,” Para [0065]).
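The learning step described above, training an estimation model on feature quantities paired with correct-answer labels, can be sketched with a small classifier. Logistic regression is swapped in here purely for illustration; Liao refers only to "machine learning, a neural network, or the like", and the claim names no model type. The feature layout (one correlation strength per stimulus pattern), the label encoding, and the hyperparameters are assumptions.

```python
from math import exp


def train_logistic(features, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * v for w, v in zip(weights, x)) + bias
            error = 1.0 / (1.0 + exp(-z)) - y
            weights = [w - learning_rate * error * v for w, v in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict(model, x):
    """Return 0 (first source) or 1 (second source) for one feature vector."""
    weights, bias = model
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if z >= 0 else 0


# Hypothetical training data: (strength for pattern 1, strength for pattern 2)
# paired with the correct attended source (0 = first, 1 = second).
training_features = [
    (0.9, 0.1), (0.8, 0.2), (0.7, 0.3),
    (0.2, 0.8), (0.1, 0.9), (0.3, 0.6),
]
correct_answers = [0, 0, 0, 1, 1, 1]

model = train_logistic(training_features, correct_answers)
print(predict(model, (0.85, 0.15)), predict(model, (0.15, 0.85)))
```

Note that the inputs here are correlation-based features, as the claim recites, whereas the quoted passage of Liao describes a model that takes biometric information and area information as inputs.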

Claims 9 and 11 are non-transitory computer-readable medium and method claims, respectively, that correspond to apparatus claim 1. The rejection of claim 1 is thus applied to claims 9 and 11.

Claims 10 and 12 are non-transitory computer-readable medium and method claims, respectively, that correspond to apparatus claim 7. The rejection of claim 7 is thus applied to claims 10 and 12. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Liao et al. (JP-2019126423-A) as applied to claim 1 above, and further in view of Naber et al., "Tracking the allocation of attention using human pupillary oscillations", Front Psychol. 2013 Dec 10, hereinafter referred to as Naber.

Regarding claim 2, the rejection of claim 1 is incorporated herein. Liao teaches the apparatus of claim 1, and further teaches: wherein each of the plurality of visual stimulus patterns is a pattern of a periodically time-varying visual stimulus (“the moving image presented by the visual information presenting unit 112 is a moving image in which the brightness changes in different time patterns for each region,” Para [0058]). 
Liao fails to teach the following limitations as further claimed. Naber, however, further teaches:
the feature quantity indicates at least a magnitude of a frequency domain pupil diameter change amount signal at a peak frequency of each of a plurality of frequency domain visual stimulus signals and/or near the peak frequency (Fig. 1C and “The strength of pupil oscillations was analyzed by conducting a Fast Fourier Transform (FFT) that produces a power spectrum across frequencies,” p. 2, Analysis),

[Image: media_image1.png]

each of the plurality of frequency domain visual stimulus signals is a signal obtained by transforming a time-series signal (Fig. 1B) indicating each of the plurality of visual stimulus patterns into a frequency domain (“The strength of pupil oscillations was analyzed by conducting a Fast Fourier Transform (FFT) that produces a power spectrum across frequencies,” p. 2, Analysis),
and the frequency domain pupil diameter change amount signal is a signal obtained by transforming a time-series signal (Fig. 1B) indicating the pupil diameter change amount into the frequency domain (using the “Fast Fourier Transform (FFT),” p. 2, Analysis).
Naber is considered to be analogous to the claimed invention because both are in the same field of determining pupil responses to visual stimuli. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Naber into Liao for the benefit of insight into the frequency response of the pupils.
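The frequency-domain feature of claim 2 can be illustrated with a minimal sketch: transform each stimulus series and the pupil series, find the stimulus peak bin, and read the pupil magnitude at that bin. A direct discrete Fourier transform is used instead of an FFT so the example needs only the standard library; the results are equivalent for this purpose. The 64-sample window, the stimulus bins 4 and 6, and the pupil amplitudes are hypothetical values, not taken from Naber or the application.

```python
import cmath
from math import pi, sin


def dft_magnitudes(signal):
    """Magnitudes of DFT bins 0..N//2 after removing the mean."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    return [
        abs(sum(centered[t] * cmath.exp(-2j * pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def pupil_magnitude_at_stimulus_peak(stimulus, pupil):
    """Return (peak bin of the stimulus spectrum, pupil magnitude there)."""
    stimulus_mags = dft_magnitudes(stimulus)
    peak_bin = max(range(1, len(stimulus_mags)), key=stimulus_mags.__getitem__)
    return peak_bin, dft_magnitudes(pupil)[peak_bin]


N = 64  # samples per analysis window (assumed)
stimulus_a = [sin(2 * pi * 4 * t / N) for t in range(N)]  # 4 cycles per window
stimulus_b = [sin(2 * pi * 6 * t / N) for t in range(N)]  # 6 cycles per window

# Hypothetical pupil trace that follows stimulus A strongly and B weakly.
pupil = [
    1.0 * sin(2 * pi * 4 * t / N + 1.0) + 0.2 * sin(2 * pi * 6 * t / N)
    for t in range(N)
]

bin_a, magnitude_a = pupil_magnitude_at_stimulus_peak(stimulus_a, pupil)
bin_b, magnitude_b = pupil_magnitude_at_stimulus_peak(stimulus_b, pupil)
print(bin_a, magnitude_a, bin_b, magnitude_b)
```

Because each stimulus pattern oscillates at its own frequency, the pupil magnitude at each peak bin gives one feature per pattern, and the larger magnitude points to the pattern the pupil is tracking.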

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Mulliken et al. (US-20230282080-A1) teaches a method that determines an attentive state of a person based on an auditory stimulus by observing their pupil. 
Singh et al. (US-20200253526-A1) teaches a method for testing hearing in infants based on pupil dilation by using periodic visual stimuli. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RACHEL A OMETZ whose telephone number is (571)272-2535. The examiner can normally be reached 6:45am-4:00pm ET Monday-Thursday, 6:45am-1:00pm ET every other Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at 571-272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Rachel Anne Ometz/
Examiner, Art Unit 2668
2/11/26


/VU LE/
Supervisory Patent Examiner, Art Unit 2668


Prosecution Timeline

Jan 31, 2024

Application Filed

Feb 11, 2026

Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/175,264

Patent 12602925

HYPERSPECTRAL IMAGE ANALYSIS USING MACHINE LEARNING

2y 5m to grant; granted Apr 14, 2026

18/180,643

Patent 12555255

ABSOLUTE DEPTH ESTIMATION FROM A SINGLE IMAGE USING ONLINE DEPTH SCALE TRANSFER

2y 5m to grant; granted Feb 17, 2026

18/246,348

Patent 12548354

METHOD FOR PROCESSING CELL IMAGE, ELECTRONIC DEVICE, AND STORAGE MEDIUM

2y 5m to grant; granted Feb 10, 2026

18/270,420

Patent 12541970

SYSTEM AND METHOD FOR ESTIMATING THE POSE OF A LOCALIZING APPARATUS USING REFLECTIVE LANDMARKS AND OTHER FEATURES

2y 5m to grant; granted Feb 03, 2026

18/155,952

Patent 12530735

IMAGE PROCESSING APPARATUS THAT IMPROVES COMPRESSION EFFICIENCY OF IMAGE DATA, METHOD OF CONTROLLING SAME, AND STORAGE MEDIUM

2y 5m to grant; granted Jan 20, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2

Expected OA Rounds

69%

Grant Probability

99%

With Interview (+30.1%)

2y 11m

Median Time to Grant

Low

PTA Risk

Based on 26 resolved cases by this examiner. Grant probability derived from career allow rate.
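The projection figures above appear to follow from simple arithmetic on the career numbers. The sketch below reproduces them under two assumptions that the page does not state: the interview lift is added as percentage points, and the result is capped at 100%.

```python
granted, resolved = 18, 26      # career counts shown on this page
interview_lift_points = 30.1    # assumed to be additive percentage points

# Career allow rate in percent, used as the grant probability.
allow_rate = 100 * granted / resolved

# Assumed combination rule: add the lift, never exceeding 100%.
with_interview = min(100.0, allow_rate + interview_lift_points)

print(f"Grant probability: {allow_rate:.0f}%")
print(f"With interview: {with_interview:.0f}%")
```

With only 26 resolved cases behind the allow rate, both figures are best read as rough indications rather than precise probabilities.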