DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3, 5, 7, 10, 12-14, 16, 18, 21 and 23 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by the publication titled “Error Handling in Multimodal Voice-Enabled Interfaces of Tour-Guide Robots Using Graphical Models” by Prodanov.
With regard to claim 1, Prodanov discloses a method for vision-assisted audio processing in a far field device, comprising:
receiving a video stream (Fig. 8.4(c) on page 122);
detecting a person in the video stream (See table 7.3 on page 105 and pages 122-123, a robot equipped with a video camera records video for detecting a user presence through face detection);
determining the person is an attentive person based on an attention feature associated with the person, wherein the attention feature indicates the person is paying attention to the far field device (page 123, section 8.3.4, Determination is made that the person is attending the conversation by detecting the person’s face for a preset number of frames or minimum period of time);
applying, in response to determining the person being the attentive person, beamforming to a microphone array of the far field device to enhance reception of audio signals received from a target direction of arrival corresponding to a target direction in which the person is located (page 25, speech enhancement and audio signal capture section discusses performing beamforming for precise audio spatial filtering. Pages 140-142 disclose the specific microphone array used to perform beamforming to target the person speaking); and
initiating, in response to determining the person being the attentive person, automatic speech recognition on the audio signals received from the target direction of arrival (page 123, speech recognition section 8.3.5, and sections 3.1 and 5.4.2. See also table 7.3 and pages 122-123, Speech recognition is performed in response to the recognized face and the detected audio).
With regard to claim 2, Prodanov discloses the method of claim 1, wherein applying beamforming to the microphone array of the far field device includes at least one of amplifying the audio signals coming from the target direction of arrival or nullifying other audio signals coming from other directions different from the target direction of arrival (the Speech enhancement and audio signal capture section on page 25 describes microphone arrays implemented to reduce noise and thereby relatively amplify the speech portion of the audio signal. The beamformer specification is shown on pages 140-142. The DSDA illustration in Fig. 1 shows an amplified speech signal with the noise removed. The microphone array also seeks to minimize audio signals arriving from outside the directional sensitivity beam shown in Fig. 2).
With regard to claim 3, Prodanov discloses the method of claim 1, further comprising:
receiving one or more audio signals having one or more frequencies; and wherein applying beamforming to the microphone array of the far field device includes applying different weights to different ones of the one or more frequencies to perform at least one of amplifying the audio signals coming from the target direction of arrival or nullifying other audio signals coming from other directions different from the target direction of arrival (Section 3.1.3 on page 26 describes the weighting of specific frequencies of the audio signal in order to accent the speech audio signal. See also the Speech enhancement and audio signal capture section on page 25, which describes microphone arrays implemented to reduce noise and thereby relatively amplify the speech portion of the audio signal. The beamformer specification is shown on pages 140-142. The DSDA illustration in Fig. 1 shows an amplified speech signal with the noise removed. The microphone array also seeks to minimize audio signals arriving from outside the directional sensitivity beam shown in Fig. 2).
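For illustration only (not part of the examined record), per-frequency weighting in a delay-and-sum beamformer can be sketched as follows. This is a generic textbook formulation, not Prodanov's specific beamformer; the uniform linear array geometry, spacing, and constants are assumptions:

```python
import cmath
import math

def steering_weights(num_mics, spacing_m, freq_hz, angle_rad, c=343.0):
    """Per-frequency weights for a uniform linear array: each weight is the
    conjugate of the plane-wave phase delay at that microphone, so energy
    from angle_rad adds coherently while other directions partially cancel."""
    delays = [m * spacing_m * math.sin(angle_rad) / c for m in range(num_mics)]
    return [cmath.exp(2j * math.pi * freq_hz * d) for d in delays]

def beamform_bin(mic_bins, weights):
    """Weighted sum of one STFT frequency bin across all microphones."""
    return sum(w * x for w, x in zip(weights, mic_bins)) / len(mic_bins)
```

Because the weights depend on `freq_hz`, a wideband signal receives a different weight vector in every frequency bin, which illustrates the frequency-dependent weighting the claim recites.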
With regard to claim 5, Prodanov discloses the method of claim 1, wherein determining the person is the attentive person further comprises:
detecting the attention feature associated with the person (page 123, section 8.3.4, Determination is made that the person is attending the conversation by detecting the person’s frontal face for a preset number of frames or minimum period of time);
comparing a period of time that the attention feature has been detected against a threshold (page 123, section 8.3.4, The example given is 0.8 seconds during which a forward-facing face is detected); and
identifying the person as the attentive person in response to determining that the period of time exceeds the threshold (page 123, section 8.3.4, Determination is made that the person is attending the conversation by detecting the person’s frontal face for a preset number of frames or minimum period of time).
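The three mapped steps (detect the feature, time it against a threshold, then classify the person) can be sketched as follows. This is a hypothetical illustration using the 0.8-second figure cited above; the function name, frame rate, and per-frame boolean representation are assumptions:

```python
def classify_attentive(face_flags, fps=25.0, threshold_sec=0.8):
    """Return True once the attention feature (e.g. a frontal face) has been
    detected continuously for at least threshold_sec of video."""
    consecutive = 0
    for seen in face_flags:                # one boolean per video frame
        consecutive = consecutive + 1 if seen else 0
        if consecutive / fps >= threshold_sec:
            return True                    # detection period met the threshold
    return False
```

At 25 fps, 20 consecutive face detections span 0.8 seconds and trip the gate; any gap resets the count.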
With regard to claim 7, Prodanov discloses the method of claim 1, wherein the attention feature comprises at least one of a frontal face of the person, a side face of the person, an eye gaze of the person, a facial expression of the person, or a mouth movement of the person (page 123, section 8.3.4, Determination is made that the person is attending the conversation by detecting the person’s frontal face for a preset number of frames or minimum period of time).
With regard to claim 10, Prodanov discloses the method of claim 1, wherein determining the person is the attentive person further comprises detecting the attention feature associated with the person, comparing a period of time that the attention feature has been detected against a threshold, and identifying the person as the attentive person in response to determining that the period of time exceeds the threshold (page 123, section 8.3.4, Determination is made that the person is attending the conversation by detecting the person’s frontal face for a preset number of frames or minimum period of time. The example given is 0.8 seconds during which a forward-facing face is detected); and
wherein applying beamforming to the microphone array of the far field device includes at least one of amplifying the audio signals coming from the target direction of arrival or nullifying other audio signals coming from other directions different from the target direction of arrival (the Speech enhancement and audio signal capture section on page 25 describes microphone arrays implemented to reduce noise and thereby relatively amplify the speech portion of the audio signal. The beamformer specification is shown on pages 140-142. The DSDA illustration in Fig. 1 shows an amplified speech signal with the noise removed. The microphone array also seeks to minimize audio signals arriving from outside the directional sensitivity beam shown in Fig. 2).
With regard to claim 12, the discussion of claim 1 applies. Prodanov discloses an apparatus for vision-assisted audio processing in a far field device, comprising: one or more memories; and one or more processors coupled with the one or more memories for performing the method recited in claim 1 (See page 65, section 5.3.1 Hardware architecture. The apparatus includes processors with memory for processing video and audio input).
With regard to claims 13-14, 16, 18 and 21, the discussions of claims 2-3, 5, 7 and 10 apply respectively.
With regard to claim 23, the discussions of claims 1 and 12 apply. Prodanov discloses a software program for controlling the device and performing the method recited in claim 1 (See page 66, Section 5.3.2 Software architecture).
Allowable Subject Matter
Claims 4, 6, 8-9, 11, 15, 17, 19-20 and 22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
With regard to claims 4 and 15, no prior art of record was found to teach the specific claimed steps of:
determining a first location of the person in an image coordinate system of the video stream in response to the person being the attentive person;
converting the first location into a second location of the person in an audio coordinate system of the microphone array; and
determining a target vector toward the second location, wherein the target direction of arrival corresponds to the target vector.
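For context on the claimed conversion, the image-to-audio coordinate mapping can be illustrated with a standard pinhole back-projection into the microphone-array frame. This generic sketch is not taken from the application or the prior art of record; the intrinsic parameters and the far-field simplification (camera/array offset neglected, rotation only) are assumptions:

```python
import math

def pixel_to_target_vector(u, v, fx, fy, cx, cy, rotation):
    """Back-project pixel (u, v) through a pinhole camera model to a ray in
    camera coordinates, rotate it into the microphone array's coordinate
    system, and normalize; the resulting unit vector is the target vector
    whose direction corresponds to the target direction of arrival."""
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)          # image -> camera ray
    rotated = [sum(rotation[i][j] * ray[j] for j in range(3))
               for i in range(3)]                       # camera -> array frame
    norm = math.sqrt(sum(c * c for c in rotated))
    return tuple(c / norm for c in rotated)
```

With an identity rotation (camera and array axes aligned), the principal-point pixel maps to the straight-ahead unit vector.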
With regard to claims 6 and 17, no prior art of record was found to teach or fairly suggest the specific steps of:
identifying the attention feature associated with the person in a first video frame of a plurality of video frames of the video stream;
skipping a number of video frames subsequent to the first video frame; and
identifying the attention feature associated with the person in a second video frame of the plurality of video frames of the video stream, wherein the second video frame is after the number of video frames subsequent to the first video frame, wherein a time duration between the first video frame and the second video frame comprises the period of time exceeding the threshold.
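The frame-skipping check recited above can be sketched as a hypothetical illustration; the function name, skip count, and frame rate are assumptions, and the skipped frames are deliberately never inspected:

```python
def attentive_with_skips(frames, skip=19, fps=25.0, threshold_sec=0.8):
    """Check the attention feature in a first frame and in a second frame
    that lies skip frames later, without checking the frames in between;
    if both show the feature and the elapsed time between them reaches
    threshold_sec, treat the person as attentive."""
    for first in range(len(frames) - skip - 1):
        second = first + skip + 1              # frame after the skipped run
        elapsed = (second - first) / fps       # time spanned by the skip
        if frames[first] and frames[second] and elapsed >= threshold_sec:
            return True
    return False
```

Note that only the endpoint frames matter: the feature may be absent in every skipped frame and the gate still trips, which is what distinguishes this from the per-frame check of claim 5.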
With regard to claims 8, 9, 11, 19, 20 and 22, no prior art of record was found to teach or fairly suggest the steps of:
detecting an interferer object in the video stream; and
identifying an interferer direction of arrival corresponding to an interferer direction in which the interferer object is located;
wherein applying beamforming to the microphone array of the far field device includes at least one of amplifying the audio signals coming from the target direction of arrival or nullifying interferer audio signals coming from the interferer direction of arrival.
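For illustration, steering unit gain toward the target while placing a spatial null toward the interferer can be sketched for a two-microphone array by solving the two linear constraints w·a(target) = 1 and w·a(interferer) = 0. This is a generic null-steering formulation, not taken from Prodanov or Olgiati; the geometry and constants are assumptions:

```python
import cmath
import math

def manifold(angle_rad, freq_hz, spacing_m=0.05, c=343.0):
    """Two-mic array response to a plane wave from angle_rad (one STFT bin)."""
    phase = 2.0 * math.pi * freq_hz * spacing_m * math.sin(angle_rad) / c
    return (1.0, cmath.exp(-1j * phase))

def null_steering_weights(target_rad, interferer_rad, freq_hz):
    """Solve w . a(target) = 1 and w . a(interferer) = 0 in closed form:
    unit gain on the target DOA, a null on the interferer DOA."""
    at = manifold(target_rad, freq_hz)
    ai = manifold(interferer_rad, freq_hz)
    w2 = 1.0 / (at[1] - ai[1])     # assumes the two DOAs are distinct
    w1 = -w2 * ai[1]
    return (w1, w2)

def response(weights, angle_rad, freq_hz):
    """Array gain toward angle_rad under the given weights."""
    a = manifold(angle_rad, freq_hz)
    return weights[0] * a[0] + weights[1] * a[1]
```

With two microphones the two constraints use up all the degrees of freedom; nulling additional interferers would require more microphones.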
U.S. Patent Application Publication No. 2019/0050629 to Olgiati discloses a system for determining an interferer in the form of a person or object occluding or obscuring another person or object (See Fig. 3 and paragraphs [0038]-[0041]). However, Olgiati does not teach or fairly suggest that the interferer or overlapping detected persons are used for processing audio or for microphone array beamforming. Prodanov also does not teach or suggest the determination of, or accounting for, interferers when processing audio data in microphone array beamforming.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WESLEY J TUCKER whose telephone number is (571) 272-7427. The examiner can normally be reached 9 AM to 5 PM, Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JOHN VILLECCO can be reached at 571-272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WESLEY J TUCKER/Primary Examiner, Art Unit 2661