Prosecution Insights
Last updated: April 19, 2026
Application No. 18/624,381

METHOD AND SYSTEM FOR REAL-TIME ACTIVE SPEAKER DETECTION

Non-Final OA §102 §103
Filed: Apr 02, 2024
Examiner: MARIAM, DANIEL G
Art Unit: 2675
Tech Center: 2600 (Communications)
Assignee: LENOVO (SINGAPORE) PTE. LTD.
OA Round: 1 (Non-Final)
Grant Probability: 91% (Favorable)
OA Rounds: 1-2
To Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 91% (above average)
1068 granted / 1179 resolved
+28.6% vs TC avg
Interview Lift: +10.3% (moderate lift; resolved cases with interview)
Typical timeline: 2y 6m avg prosecution
15 currently pending
Career history: 1194 total applications across all art units

Statute-Specific Performance

§101: 15.9% (-24.1% vs TC avg)
§103: 33.3% (-6.7% vs TC avg)
§102: 20.7% (-19.3% vs TC avg)
§112: 20.9% (-19.1% vs TC avg)
Tech Center average is an estimate. Based on career data from 1179 resolved cases.
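The four "vs TC avg" figures can be cross-checked against each other. The sketch below assumes each delta is a plain percentage-point difference between the examiner's figure and the Tech Center estimate; the page does not state how the delta is computed, so that reading, along with the helper name `implied_tc_average`, is an assumption.

```python
# Examiner figure and "vs TC avg" delta for each statute, copied from the page.
STATUTE_FIGURES = {
    "§101": (15.9, -24.1),
    "§103": (33.3, -6.7),
    "§102": (20.7, -19.3),
    "§112": (20.9, -19.1),
}


def implied_tc_average(examiner_pct: float, delta_pct: float) -> float:
    """Back out the Tech Center estimate, assuming delta = examiner - TC average."""
    return round(examiner_pct - delta_pct, 1)


for statute, (examiner_pct, delta_pct) in STATUTE_FIGURES.items():
    print(statute, implied_tc_average(examiner_pct, delta_pct))
```

Under that assumption every statute resolves to the same Tech Center estimate, which suggests the page compares all four figures against a single baseline rather than a per-statute average.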

Office Action

§102 §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Notice re prior art available under both pre-AIA and AIA

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Examiner's Note

Examiner has cited particular columns and line numbers or figures in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested from the applicant, in preparing the responses, to fully consider the references in entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-5, 8-12, and 15-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chaudhuri, et al. (US 10,846,522 B2). Please note, due to the very broad formulation of claim 1 its subject matter is disclosed by a plurality of documents.
For procedural efficiency, the examiner has focused the search to prior art that discloses further to independent claim 1.

With regard to claim 1, Chaudhuri, et al. disclose an active speaker detection system, comprising: a visual sensor that captures a visual scene including a first person, i.e., target person (See for example, col. 8, lines 57-65; and col. 9, lines 43-44); and a computer system comprising: one or more computer processors (See for example, Figs. 1-2 and the associated text); and a detection model comprising an audiovisual encoder (See for example, col. 4, lines 64-67; and items 116 and 124 in Fig. 1) and a classifier (See for example, col. 5, lines 1-11; and item 114 in Fig. 1), wherein the computer system is communicably coupled to the visual sensor (See for example, Figs. 1-2 and the associated text) and configured to: obtain a first set of frames and a second set of frames from the visual sensor (See for example, col. 8, line 66 – col. 9, line 21), produce a first embedding and a second embedding from the first set of frames and the second set of frames, respectively, using the audiovisual encoder (See for example, col. 4, lines 51-67), generate one or more composite embeddings from the first embedding and the second embedding (See for example, col. 7, lines 14-21, and col. 7, lines 32-35), determine, using the classifier, an active speaker detection (ASD) score for each of the one or more composite embeddings (See for example, col. 7, lines 32-52; col. 8, line 66 – col. 9, line 33; and col. 11, lines 47-61), aggregate the one or more ASD scores forming a detection result (See for example, col. 9, lines 22-33), determine whether the first person is speaking based on the detection result, and upon determining that the first person is speaking, adjust a display of the visual scene to focus (via zooming in and/or annotating the video with a bounding box around the face of the current speaker) on the first person (See for example, col. 9, lines 41-65). Thus, each of the requirements of claim 1 is met.

With regard to claim 2, the active speaker detection system according to claim 1, wherein the determination of whether the first person is speaking corresponds to the second set of frames (See for example, col. 8, line 66 – col. 9, line 21; and Fig. 2).

With regard to claim 3, the active speaker detection system according to claim 1, wherein the second set of frames are temporally after the first set of frames (See for example, col. 8, lines 57-65; and Fig. 2).

With regard to claim 4, the active speaker detection system according to claim 1, wherein the first embedding and the second embedding each comprise a number of audiovisual feature vectors, and the number of audiovisual feature vectors is equal to a number of frames in the first set or the second set (See for example, col. 5, line 62 – col. 6, line 2; and col. 7, lines 32-52).

With regard to claim 5, the active speaker detection system according to claim 1, wherein the audiovisual encoder comprises a neural network (See for example, col. 4, lines 51-53); and the classifier comprises a recurrent neural network (See for example, col. 7, line 58 – col. 8, line 12).

Claim 8 is rejected the same as claim 1 except claim 8 is a method claim. Thus, argument similar to that presented above for claim 1 is applicable to claim 8.

Claims 9, 10, 11, and 12 are rejected the same as claims 2, 3, 4, and 5 respectively, except claims 9, 10, 11, and 12 are method claims. Thus, arguments similar to those presented above for claims 2, 3, 4, and 5 are respectively applicable to claims 9, 10, 11, and 12.

Claim 15 is rejected the same as claim 8. Thus, argument similar to that presented above for claim 8 is applicable to claim 15. Claim 15 distinguishes from claim 8 only in that it recites "15. A non-transitory computer-readable medium comprising computer-executable instructions that". Fortunately, Chaudhuri (See for example, col. 4, lines 9-14; and col. 13, lines 22-42) teaches this feature.

Claims 16, 17, and 18 are rejected the same as claims 9, 10 and 11 respectively. Thus, arguments similar to those presented above for claims 9, 10, and 11 are respectively applicable to claims 16, 17, and 18.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6-7, 13-14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chaudhuri, et al. (US 10,846,522 B2) in view of DONTCHEVA, et al. (US 2024/0134597 A1).

With regard to claim 6, Chaudhuri, et al. (hereinafter “Chaudhuri”) discloses all of the claimed subject matter as already discussed above in paragraph 6, and incorporated herein by reference. Chaudhuri further discloses wherein the detection result comprises a first speaking metric for the first person, the determination of whether the first person is speaking comprises comparing the first speaking metric, i.e., a probability that a person is speaking, to a they are from the same field of endeavor, i.e., active speaker detection (See for example, paragraph 0050). Before the effective filing date of the claimed invention, it would have been obvious to incorporate the teaching as taught by DONTCHEVA, et al. into the system of Chaudhuri, et al. and to do so would at least allow detection of an active speaker based on a predefined threshold (See for example, paragraph 0096).
Therefore, it would have been obvious to combine Chaudhuri with DONTCHEVA, et al. to obtain the invention as specified in claim 6.

With regard to claim 7, the active speaker detection system according to claim 6, wherein the visual scene further includes a second person, i.e., multiple people, and the detection result further comprises a second speaking metric for the second person, and the determination of whether the first person is speaking further comprises: obtaining a status for the first person in response to the speaking metric of the first person being lower than or equal to the threshold; and determining whether the first speaking metric is greater than the second speaking metric and whether the status of the first person is active, wherein the first person is determined to be speaking in response to the status of the first person being active and the first speaking metric being greater than the second speaking metric (See for example, col. 9, lines 41-65 of Chaudhuri; and paragraph 0096 of DONTCHEVA, et al.).

Claims 13 and 14 are rejected the same as claims 6 and 7 respectively, except claims 13 and 14 are method claims. Thus, arguments similar to those presented above for claims 6 and 7 are respectively applicable to claims 13 and 14.

Claims 19 and 20 are rejected the same as claims 13 and 14 respectively. Thus, arguments similar to those presented above for claims 13 and 14 are respectively applicable to claims 19 and 20.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Roth, et al. (AVA Active Speaker: An Audio-Visual Data Set for Active Speaker Detection) (See for example, Figs. 1 and 4, and the associated text); Tesema, et al. (Efficient Audiovisual Fusion for Active Speaker Detection) (See entire document); and US Patent Application Publication Number 2005/0243167 (See for example, paragraphs 0012, 0015, 0029-0030, 0050, and 0061-0062).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL G MARIAM whose telephone number is (571)272-7394. The examiner can normally be reached M-F 7:30-5:00 EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ANDREW MOYER can be reached at (571)272-9523. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DANIEL G MARIAM/
Primary Examiner, Art Unit 2675
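As a reading aid, the claim 1 steps mapped in the rejection (encode two sets of frames, form composite embeddings, score each composite with a classifier, aggregate the scores) and the claim 7 decision rule can be sketched in Python. This is a minimal sketch and not the applicant's or Chaudhuri's implementation: every function name, the use of concatenation to form composites, the use of a mean to aggregate, and the treatment of a metric above the threshold as "speaking" are assumptions made for illustration, since the Office Action does not specify them.

```python
from statistics import mean
from typing import Callable, List, Sequence

Frame = float                              # stand-in for a real audiovisual frame
Vector = List[float]
Encoder = Callable[[Frame], Vector]        # hypothetical: one frame -> one feature vector
Classifier = Callable[[Vector], float]     # hypothetical: composite -> ASD score


def composite_embeddings(first: Sequence[Vector], second: Sequence[Vector]) -> List[Vector]:
    """Pair per-frame vectors of the two sets and concatenate them (assumed fusion rule)."""
    return [list(a) + list(b) for a, b in zip(first, second)]


def detection_result(first_frames: Sequence[Frame], second_frames: Sequence[Frame],
                     encoder: Encoder, classifier: Classifier) -> float:
    """Encode both frame sets, score each composite, and aggregate the scores."""
    first = [encoder(frame) for frame in first_frames]    # one vector per frame (claim 4)
    second = [encoder(frame) for frame in second_frames]
    scores = [classifier(c) for c in composite_embeddings(first, second)]
    return mean(scores)                                   # mean aggregation is an assumption


def first_person_speaking(first_metric: float, second_metric: float,
                          threshold: float, first_status_active: bool) -> bool:
    """Claim 7 style decision: fall back to status and comparison at or below the threshold."""
    if first_metric > threshold:
        return True  # assumption: the outcome above the threshold is not spelled out in the text
    return first_status_active and first_metric > second_metric


if __name__ == "__main__":
    toy_encoder = lambda frame: [frame]                   # toy stand-ins, not real models
    toy_classifier = lambda composite: sum(composite) / len(composite)
    metric = detection_result([0.2, 0.4], [0.6, 0.8], toy_encoder, toy_classifier)
    print(first_person_speaking(metric, second_metric=0.3, threshold=0.7,
                                first_status_active=True))
```

In the toy run the first person's metric falls below the threshold, so the outcome depends on the status being active and on the comparison with the second person's metric, which is the branch claim 7 recites.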

Prosecution Timeline

Apr 02, 2024
Application Filed
Mar 04, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597281
IMAGE AND SEMANTIC BASED TABLE RECOGNITION
2y 5m to grant; granted Apr 07, 2026
Patent 12584859
IDENTIFYING AUTO-FLUORESCENT ARTIFACTS IN A MULTIPLEXED IMMUNOFLUORESCENT IMAGE
2y 5m to grant; granted Mar 24, 2026
Patent 12579782
METHOD FOR IMAGE PROCESSING
2y 5m to grant; granted Mar 17, 2026
Patent 12579833
IDENTITY DOCUMENT DETECTION WITH CONVOLUTIONAL NEURAL NETWORKS FOR DATA LOSS PREVENTION
2y 5m to grant; granted Mar 17, 2026
Patent 12573200
VIDEO-BASED BEHAVIOR RECOGNITION DEVICE AND OPERATION METHOD THEREFOR
2y 5m to grant; granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 91%
With Interview: 99% (+10.3%)
Median Time to Grant: 2y 6m
PTA Risk: Low
Based on 1179 resolved cases by this examiner. Grant probability derived from career allow rate.
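The career allow rate behind the grant probability can be reproduced from the counts shown under Examiner Intelligence. The sketch below assumes the rate is simply granted cases divided by resolved cases, rounded to a whole percent; the page does not give its exact formula, and the function name `allow_rate_pct` is hypothetical.

```python
def allow_rate_pct(granted: int, resolved: int) -> float:
    """Allow rate as a percentage, assuming rate = granted / resolved."""
    if resolved <= 0:
        raise ValueError("resolved must be a positive count")
    return 100.0 * granted / resolved


GRANTED = 1068        # from "1068 granted / 1179 resolved"
RESOLVED = 1179
TOTAL_FILED = 1194    # from "1194 total applications"

rate = allow_rate_pct(GRANTED, RESOLVED)
print(f"Career allow rate: {rate:.1f}% (displayed as {round(rate)}%)")
print(f"Unresolved applications: {TOTAL_FILED - RESOLVED}")
```

The unresolved count computed this way matches the "15 currently pending" figure, so the three numbers on the page are mutually consistent.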
