Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 27 January 2026 has been entered.
The amendment of claims 1, 7, 8, 14, and 15 has been acknowledged.
Response to Arguments
Applicant’s arguments, see page 7, section “Claim Objections - Informalities”, filed 27 January 2026, with respect to the objections to claims 7 and 14 have been fully considered and are persuasive. The objections to claims 7 and 14 have been withdrawn.
Applicant’s arguments, see page 7, section “Claim Rejections – Obviousness – 35 USC § 103”, filed 27 January 2026, with respect to the rejection of claims 1-20 have been fully considered and are persuasive. The rejection of claims 1-20 under 35 U.S.C. § 103 has been withdrawn. However, upon further examination, a new rejection under 35 U.S.C. § 103 has been made.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Miao et al. (U.S. Patent Publication No. 2021/0150224 A1, hereinafter “Miao”) in view of Le et al. (U.S. Patent Publication No. 2016/0210506 A1, hereinafter “Le”), and further in view of Nister et al. (U.S. Patent Publication No. 2014/0112527 A1, hereinafter “Nister”).
Regarding claim 1, Miao teaches a method of recognizing animation through optical character recognition (OCR), executable by a processor, comprising:
receiving data corresponding to a video or animation containing one or more frames (¶ 0018: The video 110 includes frames, one or more of which depict entities such as people, animals, and/or inanimate objects. The video 110 also includes text data in the form of captions, subtitles, transcripts, computer-readable speech data, etc. The video 110 includes at least one set of consecutive frames, each set referred to herein as a "picture".; ¶ 0027: Process 200 begins when a video 110 is received);
extracting (¶ 0027: At least one frame of the video 110 depicts two or more entities (e.g., humans, animals, animated characters, etc.). Further, at least one of the frames is associated with text data that refers to at least two of the entities. In some embodiments, the text data is from captions, subtitles, scripts, etc. The text can also be from audio data such as recorded dialogue or narration. The video 110 is divided at regular intervals to form sets of frames referred to herein as "pictures"… The number of frames in each picture can be preset and/or selected by a user. In some embodiments, there are 10 frames per picture, though the picture intervals can include any number of consecutive frames (e.g., 24 frames, 30 frames, 100 frames, 120 frames, 240 frames, 300 frames, 500 frames, 1,000 frames, etc.).);
recognizing text in the extracted individual frames or representative images based on performing optical character recognition on the extracted individual frames or representative images (Figure 3A – 3C; ¶ 0021: The extraction component 140 can also extract features from text obtained by converting characters (e.g., letters, numbers, punctuation, etc.) detected in video 110 frames to machine-encoded text using techniques such as optical character recognition (OCR) or intelligent character recognition (ICR). For example, the extraction component 140 can identify text printed on an object such as a building sign, book cover, etc.);
identifying relationships between the recognized text across multiple frames of the one or more frames (Figure 3A – 3C; ¶ 0023: Further, the extraction component 140 maps the entities to video frames associated with the text from which they were extracted. The frames can be associated with text that is spoken to, by, or about particular entities. Returning to the previous example involving pictures 1-3 of the video 110, the video 110 can be divided into pictures at 10-frame intervals, and picture 3 can include frames 21-30. In this picture, frames 21-24 can include text spoken by the first person to the second person (e.g., "Hello"), and frames 27-29 can include a sentence about the bird that is spoken by the second person to the first person (e.g., "Look at the bird.").); and
generating a textual or graphical representation of the video or animation based on the identified relationships (¶ 0024: The graphing component 150 generates image and text knowledge graphs based on the entities and entity relations identified in the extracted image and text data.).
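For illustration only, the entity-relation mapping quoted above can be expressed as a minimal sketch (in Python, with hypothetical entity names and frame numbers loosely mirroring the ¶ 0023 example); this sketch is not part of the cited disclosure:

```python
# Illustrative sketch only: linking entities that co-occur within the same
# "picture" (a fixed-size set of consecutive frames), in the spirit of the
# entity-relation mapping quoted above. All data values are hypothetical.
from collections import defaultdict
from itertools import combinations

FRAMES_PER_PICTURE = 10  # e.g., 10 frames per picture (¶ 0027)

# (frame number, entity) pairs, e.g., entities extracted from text mapped
# back to the frames associated with that text.
occurrences = [
    (22, "first person"), (23, "second person"),  # "Hello" in frames 21-24
    (28, "second person"), (28, "bird"),          # "Look at the bird." in 27-29
]

# Group entities by picture number: picture n covers frames 10(n-1)+1..10n.
entities_by_picture = defaultdict(set)
for frame, entity in occurrences:
    entities_by_picture[(frame - 1) // FRAMES_PER_PICTURE + 1].add(entity)

# Count, for each entity pair, the number of pictures in which it co-occurs.
relation_counts = defaultdict(int)
for entities in entities_by_picture.values():
    for pair in combinations(sorted(entities), 2):
        relation_counts[pair] += 1

print(dict(relation_counts))
# All three entities fall in picture 3 (frames 21-30), so each pair
# co-occurs once.
```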
Miao does not explicitly teach extracting individual frames or representative images from the received data, or wherein the relationships between the recognized text across multiple frames comprises: utilizing tracking techniques to track movement of the text elements across the one or more frames.
However, Le does teach extracting individual frames or representative images from the received data (¶ 0012: Referring to FIG. 1, after selection of digital content for sharing, the application may extract frames 102 from the digital content that are representative of the digital content. As an example, the frames 102 may be frames of a video selected for sharing, as will be further described.; ¶ 0015: Referring to FIG. 1, the application may extract the frames 102 shown from the video due to the detection of facial features, for example, from facial recognition techniques.).
Miao and Le are considered to be analogous art, as both pertain to image and video frame analysis and extraction. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of video segmentation based on a weighted knowledge graph (as taught by Miao) with the device for identifying digital content (as taught by Le). The motivation for this combination of references would be that the method of Le extracts image features which are then used as a digital fingerprint for identifying and retrieving similar content, such as frames containing the same features (see ¶ 0009).
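For illustration only, a minimal sketch of criterion-based representative-frame extraction in the spirit of Le (here using face detection as the predefined criterion); this assumes OpenCV and a placeholder video path, and is not part of the cited disclosure:

```python
# Illustrative sketch only: extracting representative frames from a video
# when a predefined criterion (here, a detected face) is met. Requires
# OpenCV (pip install opencv-python); "video.mp4" is a placeholder path.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture("video.mp4")

representative = []
frame_no = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_no += 1
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if len(detector.detectMultiScale(gray, 1.1, 5)) > 0:
        representative.append(frame_no)  # frame satisfies the criterion

cap.release()
print(representative[:10])
```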
Additionally, Nister teaches wherein the relationships between the recognized text across multiple frames comprises:
utilizing tracking techniques to track movement of the text elements across the one or more frames (Figure 5; ¶ 0039: A first frame 504 includes recognized text “heilo” (with a confidence score of 0.7) in an associated bounding box of known dimensions and coordinates based on an established (created) coordinate system of the first frame 504. The homography facilitates mapping back the “heilo” bounding box to the keyframe coordinate system using the transformation.; ¶ 0040: A second frame 506 includes recognized text “hello” (with a confidence score of 0.9) in an associated bounding box of known dimensions and coordinates based on an established (created) coordinate system of the second frame 506. The homography facilitates mapping back the “hello” bounding box to the keyframe coordinate system using the inverse of the homography.; ¶ 0041: Similarly, an nth frame 508 includes recognized text “hella” (with a confidence score of 0.7) in an associated bounding box of known dimensions and coordinates based on an established (created) coordinate system of the nth frame 508. The homography facilitates mapping back the “hella” bounding box to the keyframe coordinate system using the inverse of the homography.).
Miao and Nister are considered to be analogous art, as both pertain to image and video frame analysis and extraction. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the system of video segmentation based on a weighted knowledge graph (as taught by Miao) with the system of simultaneous tracking and text recognition in video frames (as taught by Nister). The motivation for this combination of references would be that the system of Nister performs a conflation which merges OCR results of the same line/word from different frames, keeping the more accurate recognition so that the result quality improves over time (see ¶ 0003).
This motivation for the combination of Miao, Le, and Nister is supported by KSR exemplary rationale (G): some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. See MPEP 2141(III).
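For illustration only, the conflation attributed to Nister above reduces, in its simplest form, to keeping the highest-confidence reading among per-frame OCR hypotheses mapped back to the same keyframe region; a minimal sketch using the Figure 5 example values follows (not part of the cited disclosure):

```python
# Illustrative sketch only: conflating per-frame OCR hypotheses for the same
# tracked text region by keeping the highest-confidence reading. The values
# mirror the Figure 5 example; after each bounding box is mapped back to the
# keyframe coordinate system via the homography, hypotheses for the same
# region can be merged.
hypotheses = [
    ("heilo", 0.7),  # first frame 504
    ("hello", 0.9),  # second frame 506
    ("hella", 0.7),  # nth frame 508
]

best_text, best_conf = max(hypotheses, key=lambda h: h[1])
print(best_text, best_conf)  # hello 0.9 -- quality improves as frames accrue
```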
Regarding claim 2, the Miao, Le, and Nister combination teaches the method of claim 1.
Additionally, Miao teaches wherein the textual or graphical representation corresponds to a timeline representation of the video or animation associated with the recognized text at one or more time intervals (Figure 3A; ¶ 0028: Based on the entity relations, the graphing component 150 generates an image knowledge graph that links entities appearing in the same picture(s). Each picture can be assigned a picture number n, which can be an integer greater than zero (e.g., picture 1, picture 2, picture 3, etc.). Each linked pair of entities is referred to herein as an "entity relation". The image knowledge graph can also indicate the number of pictures in which each entity relation occurs.; ¶ 0034: The extraction component 140 (FIG. 1) extracts text data 310 from the video 110, and identifies entities and entity relations in the data 310. The entities extracted from the text data 310 are illustrated in table 320, which includes columns 322, 324, and 326. Column 322 includes numbers (1-8) that each indicate an occurrence of at least one entity relation in the text data 310… The video frames corresponding to the occurrences in column 322 are identified by frame numbers in column 324. Column 326 includes the names of entities included in entity relations at each occurrence.).
Regarding claim 3, the Miao, Le, and Nister combination teaches the method of claim 1.
Additionally, Miao teaches wherein the textual or graphical representation corresponds to a transcript corresponding to the recognized text in a chronological order (Figure 3A; ¶ 0034: The extraction component 140 (FIG. 1) extracts text data 310 from the video 110, and identifies entities and entity relations in the data 310. The entities extracted from the text data 310 are illustrated in table 320, which includes columns 322, 324, and 326. Column 322 includes numbers (1-8) that each indicate an occurrence of at least one entity relation in the text data 310… The video frames corresponding to the occurrences in column 322 are identified by frame numbers in column 324. Column 326 includes the names of entities included in entity relations at each occurrence.).
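For illustration only, a chronological transcript of recognized text can be sketched as a sort over frame-keyed entries (hypothetical data; not part of the cited disclosure):

```python
# Illustrative sketch only: emitting recognized text as a chronological
# transcript by sorting frame-keyed entries (hypothetical data).
entries = [(27, "Look at the bird."), (21, "Hello")]
for frame, text in sorted(entries):
    print(f"frame {frame}: {text}")
# frame 21: Hello
# frame 27: Look at the bird.
```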
Regarding claim 4, the Miao, Le, and Nister combination teaches the method of claim 1.
Additionally, Miao teaches wherein the textual or graphical representation corresponds to a consolidated representation of the video or animation based on merging the recognized text displayed in the multiple frames (Figure 3C; ¶ 0039: Table 380 includes the top relation weights (column 382) calculated for entity relations (column 384) in the weighted knowledge graph 370, as well as the video frames containing each entity relation (column 386). The remaining entity relations, which have Weightr values below a threshold value (e.g., Weightr=0.05), are not shown in table 380.; Examiner’s note: The table in Figure 3C is understood as a representation of each text occurrence, merged into individual rows showing the frame instances of each entity relation.).
Regarding claim 5, the Miao, Le, and Nister combination teaches the method of claim 1.
Additionally, Miao teaches further comprising reconstructing the video or animation based on the generated textual or graphical representation of the video or animation (¶ 0040: The frames (column 386) corresponding to the top relations in table 380 are grouped into video segments by the grouping component 160 (FIG. 1). In the example illustrated in FIG. 3C, the video 110 (FIGS. 1 and 3A) can be divided into three segments. The first segment can include frames 1-10 and 21-50, which include top relation entities Thuy, Cheryl, Geoff, and Sarah (nodes T, C, S, and G). The second segment can include frames 51-53 and 61-70, which include top relation entities Barbara and Hannah (nodes B and H). The third video segment can include frames 71-100, which include top relation entities Danielle, William, and Ellen (nodes D, W, and E).; ¶ 0041: When each of the frames has been grouped, the first segment includes frames 1-50, the second segment includes frames 51-70, and the third segment includes frames 71-100.).
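For illustration only, the grouping of top-relation frame ranges into contiguous segments described in ¶¶ 0040-0041 can be sketched as follows; the assignment of ungrouped frames is simplified here to extending each segment into one contiguous span (not part of the cited disclosure):

```python
# Illustrative sketch only: grouping the frame ranges of top entity relations
# into contiguous video segments, following the ¶¶ 0040-0041 example.
segments = {
    "segment 1": [(1, 10), (21, 50)],   # Thuy, Cheryl, Geoff, Sarah
    "segment 2": [(51, 53), (61, 70)],  # Barbara, Hannah
    "segment 3": [(71, 100)],           # Danielle, William, Ellen
}

for name, ranges in segments.items():
    span = (min(lo for lo, _ in ranges), max(hi for _, hi in ranges))
    print(name, span)
# segment 1 (1, 50); segment 2 (51, 70); segment 3 (71, 100) -- matching
# the grouped result described in ¶ 0041.
```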
Regarding claim 6, the Miao, Le, and Nister combination teaches the method of claim 1.
Additionally, Miao teaches further comprising enabling search and retrieval of scenes or instances within the video or animation that contain one or more keywords or phrases (¶ 0014: Video segmentation is a process of grouping video frames into related segments. This allows specific portions of a video to be located (e.g., in response to a user query).; ¶ 0016: Disclosed herein are techniques for segmenting videos into scenarios. The disclosed techniques include using weighted knowledge graphs to identify related pairs of entities within consecutive sets of frames divided at regular intervals (e.g., ten frames per set).; Examiner’s note: As described in ¶ 0015 of Miao, the system which extracts and organizes data based on relation is in the context of detecting and identifying frames in response to a user query.).
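For illustration only, keyword-based scene retrieval over frame-indexed recognized text can be sketched as an inverted-index lookup (all names and data are hypothetical; not part of the cited disclosure):

```python
# Illustrative sketch only: locating scenes containing a keyword via an
# inverted index from recognized text terms to frame ranges.
index = {
    "hello": [(21, 24)],
    "bird": [(27, 29)],
}

def find_scenes(query: str) -> list[tuple[int, int]]:
    """Return frame ranges whose recognized text contains the query term."""
    return index.get(query.lower(), [])

print(find_scenes("bird"))  # [(27, 29)]
```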
Regarding claim 7, the Miao, Le, and Nister combination teaches the method of claim 1.
Additionally, Le teaches wherein the individual frames or representative images are extracted based on one or more predefined criteria (¶ 0013: Referring to FIG. 1, the application may extract the frames 102 shown from the video due to the detection of facial features, for example, from facial recognition techniques.; ¶ 0013: As an example, the frames 102 may be limited to frames that contain recognizable features of the video. For example, any suitable image analysis techniques in the field of computer vision may be used for electronically perceiving and understanding an image. Such techniques may be used for identifying video frames containing features that may be of interest.).
Regarding claim 8, claim 8 has been analyzed with regard to claim 1 and is rejected for the same reasons of obviousness as used above, as well as in accordance with Miao’s further teaching on:
One or more computer-readable storage media configured to store computer program code (¶ 0043: Each CPU 402 may execute instructions stored in the memory subsystem 404 and can include one or more levels of on-board cache.); and
One or more computer processors configured to access said computer program code and operate as instructed by said computer program code (¶ 0043: Each CPU 402 may execute instructions stored in the memory subsystem 404 and can include one or more levels of on-board cache.)…
Regarding claim 9, claim 9 has been analyzed with regard to respective claim 2 and is rejected for the same reasons of obviousness as used above.
Regarding claim 10, claim 10 has been analyzed with regard to respective claim 3 and is rejected for the same reasons of obviousness as used above.
Regarding claim 11, claim 11 has been analyzed with regard to respective claim 4 and is rejected for the same reasons of obviousness as used above.
Regarding claim 12, claim 12 has been analyzed with regard to respective claim 5 and is rejected for the same reasons of obviousness as used above.
Regarding claim 13, claim 13 has been analyzed with regard to respective claim 6 and is rejected for the same reasons of obviousness as used above.
Regarding claim 14, claim 14 has been analyzed with regard to respective claim 7 and is rejected for the same reasons of obviousness as used above.
Regarding claim 15, claim 15 has been analyzed with regard to claim 1 and is rejected for the same reasons of obviousness as used above, as well as in accordance with Miao’s further teaching on:
One or more computer-readable storage devices (¶ 0043: Each CPU 402 may execute instructions stored in the memory subsystem 404 and can include one or more levels of on-board cache.); and
program instructions stored on the at least one or more computer-readable storage devices, the program instructions configured to cause one or more computer processors to (¶ 0043: Each CPU 402 may execute instructions stored in the memory subsystem 404 and can include one or more levels of on-board cache.)…
Regarding claim 16, claim 16 has been analyzed with regard to respective claim 2 and is rejected for the same reasons of obviousness as used above.
Regarding claim 17, claim 17 has been analyzed with regard to respective claim 3 and is rejected for the same reasons of obviousness as used above.
Regarding claim 18, claim 18 has been analyzed with regard to respective claim 4 and is rejected for the same reasons of obviousness as used above.
Regarding claim 19, claim 19 has been analyzed with regard to respective claim 5 and is rejected for the same reasons of obviousness as used above.
Regarding claim 20, claim 20 has been analyzed with regard to respective claim 6 and is rejected for the same reasons of obviousness as used above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Pooja et al. (U.S. Patent Publication No. 2020/0364463 A1) teaches a system which merges handwritten content and digital audio based on a presentation flow. The system uses edge detection and generates a transcription of handwritten content utilizing digital audio.
Kalyuzhny et al. (U.S. Patent Publication No. 2017/0330049 A1) teaches a system for performing OCR on a series of images depicting text symbols and tracking features across multiple frames.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW JONES whose telephone number is (703) 756-4573. The examiner can normally be reached Monday - Friday, 8:00-5:00 EST, off every other Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at (571) 272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW B. JONES/Examiner, Art Unit 2667
/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667