Last updated: May 29, 2026

Application No. 18/411,379

CONTEXT-BASED VIDEO TRANSCRIPTION SYSTEM USING MACHINE LEARNING

Non-Final OA §101 §102 §103

Filed

Jan 12, 2024

Examiner

ARMSTRONG, ANGELA A

Art Unit

2659

Tech Center

2600 — Communications

Assignee

Cisco Technology Inc.

OA Round

1 (Non-Final)

Interview Optional

+8.2% interview lift. An interview has already been conducted in this application's prosecution history. This examiner has a 74% grant rate. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.

Based on 646 resolved cases, 2023–2026

Examiner Intelligence

ARMSTRONG, ANGELA A

Grants 74% — above average

Career Allowance Rate

480 granted / 646 resolved

+12.3% vs TC avg

Interview Lift: +8.2% (moderate), comparing resolved cases with and without an interview

Typical timeline

3y 10m

Avg Prosecution

18 currently pending

Career history

672

Total Applications

across all art units

Statute-Specific Performance

§101: 11.3% (-28.7% vs TC avg)
§103: 69.4% (+29.4% vs TC avg)
§102: 8.7% (-31.3% vs TC avg)
§112: 3.8% (-36.2% vs TC avg)

Tech Center average is an estimate. Based on career data from 646 resolved cases.

Office Action

§101 §102 §103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the submission filed January 12, 2024. Claims 1-20 are pending.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 1, 10, and 17 are directed to methods, systems and computer readable mediums for determining words from audio/video data. The claims recite the following limitations, each of which can be performed by a person: analyzing multimedia data including video data and audio data associated with the video data to identify one or more features in the video data, which can be achieved by a person watching the video and noting any particular features observed; obtaining one or more candidate words based on the one or more features identified in the video data, which can be achieved by the person obtaining a listing of words representative of the video; determining that a particular candidate word of the one or more candidate words matches a particular utterance in the audio data, which can be achieved by the person generating text based on the audio that is provided and comparing the generated text with the words representative of the video features; and selecting the particular candidate word for the particular utterance based on the audio data, which can be achieved by the person deciding or selecting the correct word(s) of the audio/video.
The recited limitations are directed to a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind (or with pen and paper) but for the recitation of the generic computer, apparatus, medium, and generic computer components (memory, processor). If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
This judicial exception is not integrated into a practical application because the recited generic computer, system, medium, generic computer components (memory, processor), and computer instructions amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, the elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea. The claims are not patent eligible.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as indicated with respect to integration of the abstract idea into a practical application, the additional elements of the generic computer, system, medium, generic computer components (memory, processor), and computer instructions to perform the various steps amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. The claims are not patent eligible.
Dependent claims 2-9, 11-16 and 18-20 do not integrate the judicial exception into a practical application and do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitations of the dependent claims are directed to steps of organizing or manipulating data for analyzing audio or video data, recognizing words from audio, recognizing people/objects seen in a video, utilizing natural language processing rules and principles to analyze/process audio/text, and generating transcriptions. The limitations of the dependent claims are steps that can be achieved via mental processing and/or using pen and paper.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 4-11, 13-18 and 20 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Basu et al (US Patent Application Publication No. 2018/0130461).
Basu teaches selecting media using weighted keywords based on facial recognition. Regarding claim 1, Basu teaches a computer-implemented method [fig 1 and 5] comprising: analyzing multimedia data including video data and audio data associated with the video data to identify one or more features in the video data [para 0016-0017; 0027-0036 – facial analysis module]; obtaining one or more candidate words based on the one or more features identified in the video data [para 0017; 0019; 0022-0023 – filtered words based on entity attributes; 0045-0046 – knowledge of important words/predetermined keywords]; determining that a particular candidate word of the one or more candidate words matches a particular utterance in the audio data [para 0016-0018; 0022-0023; 0026-0036; 0044-0055 – word recognition module/transcription]; and selecting the particular candidate word for the particular utterance based on the audio data [para 0016-0018; 0022-0023; 0026-0036; 0044-0055 – word recognition module/transcription].
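For orientation, the four steps recited in claim 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the applicant's or Basu's implementation: the `Utterance` type, the `FEATURE_VOCAB` lookup, and the use of string similarity on recognizer output in place of acoustic matching are all assumptions made for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Utterance:
    """One spoken segment with the recognizer's best-guess text (hypothetical type)."""
    start_sec: float
    asr_text: str


# Hypothetical lookup from a visual feature label to related words.
FEATURE_VOCAB = {
    "whiteboard": ["diagram", "marker", "agenda"],
    "router": ["Cisco", "firmware", "ethernet"],
}


def candidate_words(features):
    """Step 2: collect candidate words for the features found in the video."""
    words = []
    for feature in features:
        words.extend(FEATURE_VOCAB.get(feature, []))
    return words


def select_word(utterance, candidates, threshold=0.6):
    """Steps 3-4: pick the candidate most similar to the utterance text.

    String similarity on ASR text stands in for acoustic matching here.
    Returns None when no candidate reaches the threshold.
    """
    best_word, best_score = None, threshold
    for word in candidates:
        score = SequenceMatcher(None, word.lower(), utterance.asr_text.lower()).ratio()
        if score >= best_score:
            best_word, best_score = word, score
    return best_word
```

Under these assumptions, a misrecognized utterance such as "sisco" in a segment where a router is visible would resolve to the candidate "Cisco", while an unrelated utterance would match nothing.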
Regarding claim 2, Basu teaches determining a relevance score for each of the one or more candidate words based on a context of the one or more features [para 0016-0018; 0022-0023; 0026-0036; 0044-0055 – different keywords different scores…count weight factor]; and ranking the one or more candidate words according to the relevance score of each candidate word, wherein the particular candidate word is selected based on the ranking of the one or more candidate words [para 0016-0018; 0022-0023; 0026-0036; 0044-0055 – different keywords different scores…negative/positive count weight factor].
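The scoring and ranking recited in claim 2 can be illustrated as follows. This is a hedged sketch: `rank_candidates`, the `(word, feature)` pairing, and the per-feature weights are hypothetical and are not drawn from the application or from Basu's count weight factors.

```python
def rank_candidates(candidates, feature_weights):
    """Score each (word, feature) pair and return words sorted best-first.

    candidates: list of (word, source_feature) tuples
    feature_weights: hypothetical context weight per feature, e.g. how
        prominent that feature is in the frame
    """
    scored = {}
    for word, feature in candidates:
        weight = feature_weights.get(feature, 0.0)
        # A word suggested by several features keeps its highest score.
        scored[word] = max(scored.get(word, 0.0), weight)
    # Sort by descending score, then alphabetically for a stable tie-break.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))
```

The first entry of the returned list is the word that would be selected on the basis of the ranking.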
Regarding claim 4, Basu teaches the context of the one or more features includes one or more of: a position of the one or more features in the video data, and a user interaction with respect to the one or more features [para 0017; 0048 – facial analysis/recognition].
Regarding claim 5, Basu teaches the one or more features identified in the video data include a physical entity or object, a location, a logo, or an action depicted in the video data [para 0017; 0048 – facial analysis/recognition], and wherein the one or more candidate words are obtained from a corpus of words that are semantically related to the physical entity or object, the location, the logo, or the action [para 0017; 0019; 0022-0023 – filtered words based on entity attributes; 0045-0046 – knowledge of important words/predetermined keywords].
Regarding claim 6, Basu teaches features identified in the video data include a person, and wherein a facial recognition model is employed to identify the person [para 0017; 0048 – facial analysis/recognition], and the one or more candidate words are obtained based on an identity of the person [para 0017; 0019; 0022-0023 – filtered words based on entity attributes for job seekers, where sets of words based on necessary skills; 0045-0046 – knowledge of important words/predetermined keywords].
Regarding claim 7, Basu teaches: identifying a topic of at least a portion of the multimedia data, wherein the one or more candidate words are obtained from a corpus of words that are semantically related to the topic [para 0017; 0019; 0022-0023 – filtered words based on entity attributes for job seekers, where sets of words based on necessary skills; 0045-0046 – knowledge of important words/predetermined keywords].
Regarding claim 8, Basu teaches obtaining the one or more candidate words includes providing the one or more features to a large language model that generates a corpus of words relating to the one or more features, and wherein the one or more candidate words are selected from the corpus of words [para 0045-0046 – HMM, neural networks, LDA].
Regarding claim 9, Basu teaches generating a transcript or closed-caption text for the multimedia data based on the selecting of the particular candidate word [para 0016-0018; 0022-0023; 0026-0036; 0044-0055 – word recognition module/transcription].
Claims 10-11, 13-16, 17-18, and 20 are system claims and computer readable medium claims that are rejected under similar rationale as claims 1, 2, and 4-9.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Basu in view of Long et al (US Patent Application Publication No. 2024/0062568), hereinafter Long.
Regarding claims 3, 12, and 19, Basu fails to teach that the one or more features identified in the video data include text, and wherein the context that is used to determine the relevance score for each candidate word includes one or more of: a position of the text, a font size of the text, a letter case of the text, and an acronym status of the text. In a similar field of endeavor, Long teaches unified scene text detection and layout analysis for detecting text in an image and determining layout data for the text including position/location, font type, size and/or color [para 0022-0025]. One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the text detection processing suggested by Long in the system of Basu, so as to enhance Basu’s overall multimedia analysis for additional information [para 0021], improving the recognition/classification of the system and thereby enhancing and improving the user’s interaction with the system.
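The four text-context signals named in claims 3, 12, and 19 (position, font size, letter case, acronym status) could feed a relevance score along the following lines. This is a sketch under stated assumptions: the `DetectedText` structure, the normalization choices, and the weights are placeholders and are not taken from the application, Basu, or Long.

```python
from dataclasses import dataclass


@dataclass
class DetectedText:
    """On-screen text with layout data (hypothetical structure)."""
    text: str
    y_center: float  # 0.0 = top of frame, 1.0 = bottom
    font_px: int     # rendered glyph height in pixels


def text_relevance(item, frame_height_px=1080):
    """Combine the four context signals into one score between 0 and 1.

    The weights below are arbitrary placeholders chosen for illustration.
    """
    # Position: text nearer the top of the frame scores higher.
    position = 1.0 - item.y_center
    # Font size: normalized so that 10% of frame height or more scores 1.0.
    size = min(item.font_px / (0.1 * frame_height_px), 1.0)
    # Letter case: all-uppercase text scores 1.0.
    upper = 1.0 if item.text.isupper() else 0.0
    # Acronym status: short, purely alphabetic, all-uppercase tokens.
    is_acronym = item.text.isalpha() and item.text.isupper() and 2 <= len(item.text) <= 5
    acronym = 1.0 if is_acronym else 0.0
    return 0.3 * position + 0.3 * size + 0.2 * upper + 0.2 * acronym
```

With these placeholder weights, a large uppercase acronym near the top of a slide scores well above small lowercase footer text.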



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598. The examiner can normally be reached M,T,TH,F 11:30-8:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ANGELA A. ARMSTRONG
Primary Examiner
Art Unit 2659



/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659

Prosecution Timeline

Jan 12, 2024

Application Filed

Apr 09, 2026

Non-Final Rejection mailed — §101, §102, §103

May 11, 2026

Interview Requested

May 19, 2026

Applicant Interview (Telephonic)

May 20, 2026

Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

17/924,466

Patent 12640146

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD AND RECORDING MEDIUM

3y 6m to grant Granted May 26, 2026

18/183,522

Patent 12640140

ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF

3y 2m to grant Granted May 26, 2026

18/132,793

Patent 12626704

DETECTING VISUAL ATTENTION DURING USER SPEECH

3y 1m to grant Granted May 12, 2026

18/089,392

Patent 12608566

Method and Apparatus for Selecting Sample Corpus Used to Optimize Translation Model

3y 3m to grant Granted Apr 21, 2026

18/240,480

Patent 12602547

DOMAIN ADAPTING GRAPH NETWORKS FOR VISUALLY RICH DOCUMENTS

2y 7m to grant Granted Apr 14, 2026

Based on the 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

74%

Grant Probability

82%

With Interview (+8.2%)

3y 10m (~1y 5m remaining)

Median Time to Grant

Low

PTA Risk

Based on 646 resolved cases by this examiner. Grant probability derived from career allowance rate.
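The projection figures above can be reproduced from the counts reported on this page. The sketch below assumes the interview lift is additive in percentage points and is applied to the rounded allowance rate; that is one reading consistent with the 82% figure shown, not a documented formula.

```python
granted, resolved = 480, 646   # counts reported on this page
interview_lift_pts = 8.2       # percentage points, as reported

# Career allowance rate, rounded to a whole percent.
allowance_pct = round(100 * granted / resolved)

# Assumed: the lift is simply added to the rounded rate.
with_interview_pct = round(allowance_pct + interview_lift_pts)

print(f"Grant probability: {allowance_pct}%")
print(f"With interview:    {with_interview_pct}%")
```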