Prosecution Insights
Last updated: April 18, 2026
Application No. 18/820,254

METHOD AND SYSTEM FOR GENERATING MEETING MINUTES

Non-Final OA: §101, §103
Filed: Aug 30, 2024
Examiner: SHIN, SEONG-AH A
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: Inventec Corporation
OA Round: 1 (Non-Final)
Grant Probability: 78% (Favorable)
OA Rounds: 1-2
To Grant: 2y 9m
With Interview: 99%

Examiner Intelligence

Grants 78% (above average)
Career Allow Rate: 78% (321 granted / 409 resolved; +16.5% vs TC avg)
Interview Lift: +20.5% (strong; based on resolved cases with interview)
Avg Prosecution: 2y 9m (typical timeline; 25 currently pending)
Total Applications: 434 (career history, across all art units)

Statute-Specific Performance

§101: 20.8% (-19.2% vs TC avg)
§103: 45.2% (+5.2% vs TC avg)
§102: 16.7% (-23.3% vs TC avg)
§112: 7.1% (-32.9% vs TC avg)
TC avg = Tech Center average estimate. Based on career data from 409 resolved cases.
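As a quick sanity check on the statute figures, the sketch below recovers the Tech Center baseline implied by each rate and its delta. It assumes that "vs TC avg" means the examiner's rate minus the Tech Center average, in percentage points; the page does not state this, and the function and variable names are illustrative only.

```python
# Statute rates and deltas as listed on the page: (rate %, delta vs TC avg).
STATUTE_STATS = {
    "§101": (20.8, -19.2),
    "§103": (45.2, 5.2),
    "§102": (16.7, -23.3),
    "§112": (7.1, -32.9),
}


def implied_baseline(rate_pct: float, delta_pts: float) -> float:
    """Return the TC average implied by rate = baseline + delta (assumption)."""
    return round(rate_pct - delta_pts, 1)


baselines = {
    statute: implied_baseline(rate, delta)
    for statute, (rate, delta) in STATUTE_STATS.items()
}

for statute, baseline in baselines.items():
    print(f"{statute}: implied TC average = {baseline}%")
```

Under that assumption all four statutes imply the same baseline of 40.0%, which fits the page's description of the Tech Center average as a single estimate rather than a per-statute measurement.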

Office Action

§101 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

Claims 1-10 are pending in this application.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 2A, Prong One: The independent claim 1 recites “obtaining a video signal, an audio signal and sound source localization information of a video conference; performing face recognition on a plurality of image frames of the video signal to obtain a plurality of face recognition results; performing voice recognition on a plurality of audio segments of the audio signal to obtain a plurality of voice recognition results at a plurality of timestamps; matching the voice recognition results with the face recognition results according to the sound source localization information, to obtain a plurality of speaker’s identities; performing speech to text transcription on the audio segments of the audio signal to obtain a transcript; attaching the speaker’s identities to the transcript according to the timestamps, to obtain a context; and performing context understanding on the context to obtain a meeting minutes report”. Claims 1 and 10 recite obtaining audio, video and direction of audio, recognizing/matching face and voice based on direction of each sound, transcribing audio to text, and analyzing context to generate a summary.

[Abstract idea indicators] Recognizing and matching face and voice are decision-making steps that are mental processes. Transcribing speech into text is the conversion of verbal content to written form, a task humans routinely perform mentally or with conventional tools. Understanding and summarizing context are mental processes, i.e., a cognitive process. Accordingly, the claims are directed to the judicial exception of a mental process.

Step 2A, Prong Two: This judicial exception is not integrated into a practical application. The computer is recited at a high level of generality (i.e., as performing a generic computer function and being used as an applying) such that it amounts to no more than mere instructions to apply the exception using a generic computer. Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: Claims Do Not Recite an Inventive Concept That Transforms the Mental Process into Patent-Eligible Subject Matter

The claims add generic, well-understood computer components (memory and processor) and broadly recite use of ‘matching recognition results according to the sound source localization information’ without describing any specific, unconventional structure, algorithmic detail, data structure, or system architecture that provides a concrete technical improvement in computer functionality. Applying Alice step two and relevant Federal Circuit precedent: The recitation of conventional computer components (memory and processor) performing routine functions does not supply an inventive concept. The claims recite high-level, result-oriented steps (e.g., “performing,” “matching,” “attaching”) that describe mental processes rather than specific technical means for performing those processes.
Because the claims lack limitations that tie the mental-process steps to a particular way of achieving a technological improvement (for example, a novel model architecture, specialized data representation, unique training regimen that yields demonstrable technical performance gains, a specialized streaming/decoding pipeline that reduces latency by a quantifiable amount, or hardware/software co-design), the additional elements do not transform the mental processes into significantly more. Therefore, claims 1 and 10 fail to recite an inventive concept sufficient to transform the judicial exception into patent-eligible subject matter.

With respect to dependent claims 2-9, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Conclusion: Rejection

Claims 1-10 are rejected under 35 U.S.C. § 101 as being directed to a judicial exception (mental processes) and failing to recite additional elements that amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8 and 10 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Liu (US Pub. 2022/0335949) in view of Zhu et al. (US Pub. 2023/0205985).

Regarding claim 1, Liu discloses a method for generating meeting minutes, comprising: obtaining a video signal, an audio signal and sound source localization information of a video conference (Figs. 1 and 2, [0131][0132] conference terminals obtain video data in the conference site to obtain video data, audio data and sound source direction information); performing face recognition on a plurality of image frames of the video signal to obtain a plurality of face recognition results (Fig. 4, [0131][0150][0151] performing facial recognition on the received video data); performing voice recognition on a plurality of audio segments of the audio signal to obtain a plurality of voice recognition results at a plurality of timestamps (Fig. 4 and [0132][0133][0148][0152][0236] performing voiceprint recognition on the received audio data with timestamps corresponding to each audio segment); matching the voice recognition results with the face recognition results according to the sound source localization information, to obtain a plurality of speaker’s identities (Fig. 4, [0151][0152][0156] identifying a face and voiceprint in a sound source direction to obtain a face ID and voiceprint ID and determining a speaker identity corresponding to the audio segment); performing speech to text transcription on the audio segments of the audio signal to obtain a transcript ([0155][0156] performing speech recognition to transcribe audio to text); attaching the speaker’s identities to the transcript according to the timestamps, to obtain a context ([0102][0156][0184][0189] each text is marked with a speaker ID); and performing [context understanding] on the context to obtain a meeting minutes report (Fig. 5, [0203] obtaining and displaying Minutes of meeting).

Zhu does explicitly teach the bracketed limitation: performing [context understanding] on the context to obtain a meeting minutes report (Zhu, [0045][0086][0087][0476][0489][0490][0512] understanding speech by analyzing the post-processed transcription and which is automatically populated with content from the post-processed transcription to generate meeting minutes).

Therefore, it would have been obvious to one of ordinary skill before the effective filing date of the claimed invention to incorporate the method of processing conference data based on a sound source direction as taught by Liu with the method of adapting a post-processing system which is configured to perform editing to the meeting transcripts as taught by Zhu to improve the accuracy and readability of the transcript for subsequent downstream operations such as generating summaries (Zhu, [0016]).

Regarding claim 2, Liu in view of Zhu discloses the method of claim 1, and Zhu further discloses: wherein the voice recognition results and the face recognition results comprise a plurality of known identities and at least one unknown identity (Zhu, Fig. 8, 820, 822, and 824, [0503][0504][0520] determining whether the content 804 is associated with audio and visual data collected from a meeting in which the contributing entities 802 are meeting participants based on the speech and image recognition results). The previous motivation statement as in claim 1 is still applied.

Regarding claim 3, Liu in view of Zhu discloses the method of claim 1, and Zhu further discloses: wherein the voice recognition results comprise a plurality of unknown identities, and wherein the face recognition results comprise a plurality of known identities (Zhu, [0445]-[0447] utilizing only audio data and/or only visual data to attribute sub-portions to a user profile for guest user profiles). The previous motivation statement as in claim 1 is still applied.

Regarding claim 4, Liu in view of Zhu discloses the method of claim 3, and Zhu further discloses: determining the speaker’s identities from the known identities, according to the sound source localization information and the face recognition results; and updating the unknown identities comprised in the voice recognition results with the speaker’s identities, according to the timestamps (Zhu, [0445][0446] “the guest user profile becomes a known user profile when the guest user profile is tagged with the identity of a meeting participant, thereby linking the user profile and its associated electronic content to the meeting participant”). The previous motivation statement as in claim 1 is still applied.

Regarding claim 5, Liu in view of Zhu discloses the method of claim 1, and Zhu further discloses: wherein the voice recognition results comprise a plurality of known identities, and wherein the face recognition results comprise a plurality of unknown identities (Zhu, [0445]-[0447] utilizing only audio data and/or only visual data to attribute sub-portions to a user profile for guest user profiles). The previous motivation statement as in claim 1 is still applied.
Regarding claim 6, Liu in view of Zhu discloses the method of claim 1, and Liu further discloses: wherein the sound source localization information comprises at least one of an angle and a direction of each of sound sources (Liu, [0030][0034] capturing target images using a director camera and collecting sound source localization information from each direction).

Regarding claim 7, Liu in view of Zhu discloses the method of claim 1, and Zhu further discloses: wherein the face recognition results comprise coordinates of a plurality of facial bounding boxes in the image frames and an identity corresponding to each of the facial bounding boxes (Zhu, Fig. 9, [0545][0546] detecting and identifying faces corresponding to each of the detected areas, 950 A-950 D). The previous motivation statement as in claim 1 is still applied.

Regarding claim 8, Liu in view of Zhu discloses the method of claim 1, and Liu further discloses: obtaining a text input associated with at least one user profile; inserting the text input into the transcript according to time series, to generate an updated transcript; attaching the speaker’s identities to the updated transcript according to the timestamps, to obtain the context (Liu, [0003][0184] obtaining a statement text made by each person during a conference). Liu does not explicitly teach, however Zhu does explicitly teach: performing the context understanding on the context to obtain the meeting minutes report (Zhu, [0045][0086][0087][0476][0489][0490][0512] understanding speech by analyzing the post-processed transcription and which is automatically populated with content from the post-processed transcription to generate meeting minutes). The previous motivation statement as in claim 1 is still applied.

Regarding claim 10, claim 10 is the corresponding system claim to method claim 1. Therefore, claim 10 is rejected using the same rationale as applied to claim 1 above.

Claim 9 is rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Liu (US Pub. 2022/0335949) in view of Zhu et al. (US Pub. 2023/0205985) and further in view of Wasserblat et al. (US Pub. 2012/0215535).

Regarding claim 9, Liu in view of Zhu discloses the method of claim 1, and Liu further discloses: performing summary extraction on the updated context, to obtain the meeting minutes report (Liu, Fig. 5, [0203] obtaining and displaying Minutes of meeting). Liu in view of Zhu does not explicitly teach, however Wasserblat does explicitly teach: performing the context understanding on the context, to obtain a plurality of emotional semantics of a plurality of sentences comprised in the context; removing a portion of the context according to the emotional semantics, to generate an updated context (Figs. 1, 4 and 5, interaction analytics 136, [0062][0070][0085]-[0087] filtering emotional words by using semantic inference).

Therefore, it would have been obvious to one of ordinary skill before the effective filing date of the claimed invention to incorporate the method of processing conference data based on a sound source direction as taught by Liu in view of Zhu with the method of filtering out emotional words as taught by Wasserblat to provide suitable interactions for further analysis (Wasserblat, [0008]).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN whose telephone number is (571)272-5933. The examiner can normally be reached 9 AM-3 PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Seong-ah A. Shin
Primary Examiner
Art Unit 2659

/SEONG-AH A SHIN/
Primary Examiner, Art Unit 2659
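As a reading aid for the claim 1 steps mapped in the rejection, the sketch below shows one way the matching and attaching steps could fit together. Neither the claim language quoted in the office action nor the cited passages specify a matching rule here, so the nearest-angle rule, the 15-degree tolerance, and every name (`Face`, `VoiceSegment`, `match_speaker`, `build_context`) are hypothetical and do not describe the applicant's or Liu's implementation.

```python
from dataclasses import dataclass


@dataclass
class Face:
    identity: str     # face recognition result
    angle_deg: float  # hypothetical: horizontal angle of the face from the camera


@dataclass
class VoiceSegment:
    start_s: float           # timestamp of the audio segment
    voice_id: str            # voice recognition result, possibly "unknown"
    source_angle_deg: float  # sound source localization for this segment
    text: str                # speech-to-text output for this segment


def match_speaker(seg: VoiceSegment, faces: list[Face],
                  tolerance_deg: float = 15.0) -> str:
    """Pick the face closest to the sound direction, else keep the voice ID."""
    if not faces:
        return seg.voice_id
    nearest = min(faces, key=lambda f: abs(f.angle_deg - seg.source_angle_deg))
    if abs(nearest.angle_deg - seg.source_angle_deg) <= tolerance_deg:
        return nearest.identity
    return seg.voice_id


def build_context(segments: list[VoiceSegment], faces: list[Face]) -> list[str]:
    """Attach a speaker identity to each transcript line in timestamp order."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    return [f"[{s.start_s:.1f}s] {match_speaker(s, faces)}: {s.text}"
            for s in ordered]


if __name__ == "__main__":
    faces = [Face("Alice", -30.0), Face("Bob", 25.0)]
    segments = [
        VoiceSegment(12.0, "unknown", 22.0, "Ship Friday."),
        VoiceSegment(3.5, "unknown", -28.0, "Status update?"),
    ]
    for line in build_context(segments, faces):
        print(line)
```

The resulting list of speaker-labeled lines corresponds to the "context" that the claim passes to the context-understanding step; that final step is left out because the record gives no detail on how it is performed.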

Prosecution Timeline

Aug 30, 2024: Application Filed
Mar 31, 2026: Non-Final Rejection, §101 and §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598095: DISPLAY DEVICE (2y 5m to grant; granted Apr 07, 2026)
Patent 12591452: INVOKING AN AUTOMATED ASSISTANT TO PERFORM MULTIPLE TASKS THROUGH AN INDIVIDUAL COMMAND (2y 5m to grant; granted Mar 31, 2026)
Patent 12585696: REDUCING METADATA TRANSMITTED WITH AUTOMATED ASSISTANT REQUESTS (2y 5m to grant; granted Mar 24, 2026)
Patent 12555568: DEVICE CONTROL METHOD AND APPARATUS, READABLE STORAGE MEDIUM AND CHIP (2y 5m to grant; granted Feb 17, 2026)
Patent 12554935: COMPUTER IMPLEMENTED METHOD FOR THE AUTOMATED ANALYSIS OR USE OF DATA (2y 5m to grant; granted Feb 17, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 78%
With Interview: 99% (+20.5%)
Median Time to Grant: 2y 9m
PTA Risk: Low
Based on 409 resolved cases by this examiner. Grant probability derived from career allow rate.
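The arithmetic behind these projections can be reproduced from the counts shown on the page. The sketch below is illustrative only: it assumes the "+20.5%" interview lift is added in percentage points and capped at 100, which the page does not confirm, and the function names are made up for this example.

```python
# Counts and lift as shown on the page.
GRANTED = 321
RESOLVED = 409
INTERVIEW_LIFT_PTS = 20.5  # assumption: "+20.5%" means percentage points


def allow_rate_pct(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved


def with_interview_pct(base_pct: float, lift_pts: float) -> float:
    """Add the interview lift to the base rate, capped at 100 (assumption)."""
    return min(100.0, base_pct + lift_pts)


base = allow_rate_pct(GRANTED, RESOLVED)
boosted = with_interview_pct(base, INTERVIEW_LIFT_PTS)

print(f"Grant probability: {base:.1f}% (displayed as {round(base)}%)")
print(f"With interview:    {boosted:.1f}% (displayed as {round(boosted)}%)")
```

With these inputs the rounded values match the 78% and 99% shown above. Because the base figure is simply the examiner's career allow rate, it does not account for anything specific to this application or its rejections.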
