Prosecution Insights
Last updated: April 19, 2026
Application No. 17/540,200

DISPLAY CONTROL INTEGRATED CIRCUIT APPLICABLE TO PERFORMING REAL-TIME VIDEO CONTENT TEXT DETECTION AND SPEECH AUTOMATIC GENERATION IN DISPLAY DEVICE

Filed: Dec 01, 2021
Examiner: DICKERSON, CHAD S
Art Unit: 2683 (Tech Center 2600, Communications)
Assignee: Realtek Semiconductor Corp.
Current Status: Non-Final Office Action (§103), OA Round 5

Grant Probability: 63% (Moderate)
Predicted OA Rounds: 5-6
Predicted Time to Grant: 2y 9m
Grant Probability with Interview: 86%
Examiner Intelligence

Career Allow Rate: 63% (grants 376 of 600 resolved cases; +0.7% vs Tech Center average)
Interview Lift: +23.0% (strong; allow rate with vs without an interview, among resolved cases with an interview)
Typical Timeline: 2y 9m average prosecution; 35 applications currently pending
Career History: 635 total applications across all art units

Statute-Specific Performance

§101: 8.8% (-31.2% vs TC avg)
§103: 55.5% (+15.5% vs TC avg)
§102: 14.9% (-25.1% vs TC avg)
§112: 18.1% (-21.9% vs TC avg)

Tech Center averages are estimates. Figures are based on career data from 600 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments, see pages 5-8, filed 10/1/2025, with respect to the rejection(s) of claim(s) 1-3 and 5-10 under 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Gonzales and Huang.

The Gonzales reference is used to perform the features of having video content that includes text data reflecting subtitles within a broadcast. The reference further discloses identifying text within the video that is displayed as a subtitle and generating characters that are associated with a subtitle. The invention of Gonzales filters an image to generate a filtered image that is better suited for the downstream OCR operation. This reference lastly performs generating words that correspond to the series of characters that comprise the subtitles, which are converted to speech for an audio output. These teachings are taught in ¶ [26]-[30] and [32]-[37]. However, this reference does not teach searching for multiple lines within a text region (which is cured by the Huang reference) or the video content being provided in a real-time manner.

Regarding the remaining applied references, the teachings of Huang teach searching for a text region or regions that contain multiple lines of subtitle data, which is taught in Huang in [48]-[51] and [68]. The Gupta reference remains in the combination in order to perform the correction of a series of characters associated with a subtitle or closed captioned text. This reference replaces erroneous text identified through an OCR operation with correct text, based on the OCR operation and the context of other recognized text, which is taught in ¶ [28]-[34] and [41]. Lastly, this reference discloses having broadcast TV performed live or in real-time, which is taught in ¶ [45]. This real-time nature of the broadcast can be evaluated by the previously applied features in the combination to perform the features evaluating real-time broadcast data of the independent claims. Therefore, based on the above, the features of the claims are addressed below.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C.
112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: storage unit in claim 2.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C.
112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1, 2, 5, 6 and 8-10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gonzales (US Pub 2003/0216922) in view of Huang (US Pub 2023/0027412 (Priority date: 10/27/2020)) and Gupta (US Pub 2019/0215545).

Re claim 1: Gonzales discloses a display control integrated circuit (IC), applicable to performing real-time video content text detection and speech automatic generation in a display device, the display control IC comprising: an image processing circuit, configured to receive a video signal (e.g. the A/D converter is used to receive the video signal, which is taught in ¶ [26].);

[0026] Reference is made to FIG. 1 for illustrating a block diagram of a first embodiment of a system 100 that is constructed and operated in accordance with this invention for performing a subtitles translation from a language of origin (or source language) to a viewer language (or target language) during reception of a transmitted audiovisual (e.g., television) program, or during DVD or VCR tape playback of an audiovisual program. In the system 100 certain input audio analog signals 100A and video analog signals 100B are digitized by an analog-to-digital converter 101. The digitized audio data 101A is recorded in an audio delay buffer 105 to enable full synchronization with the video signal and translated subtitles data at later stages. The digitized video data 101B is processed in a character recognition block 102 having two outputs: output 102A which is the digitized video itself, and an output 102B which represents recognized textual information in the original (i.e., as-received) language. The digital video signal appearing at output 102A is recorded in a video delay buffer 104 to enable full synchronization with the audio and translated subtitles data at later stages. The time delay provided by buffers 105 and 104 is preferably equal to the total required time to process and translate the subtitles or closed caption information, plus any system latencies and delays.

a pre-processing circuit, configured to obtain a video content carried by the video signal, and (e.g. the bitmap constructor and processor receives video content carried from a video signal, which is taught in ¶ [26] above, [29], [35] and [36].)
comprising:

[0029] Regardless of the color television system used the digitized signals are applied to a bit map constructor and processor 102C that is bidirectionally coupled to a memory, referred to herein for convenience as a frame buffer 102D. The processor 102C constructs a video frame in the frame buffer 102D so as to contain pixels and scan lines corresponding to a video image that would be viewed on a television receiver. In the preferred embodiment some number of frames are accumulated in the frame buffer 102D and averaged or otherwise combined together when stored in the frame buffer 102D. The exact number of frames to be accumulated to form the bit map is selected based on some criteria related to how rapidly the displayed alphanumeric information would be expected to change. One suitable but non-limiting value for the number of accumulated frames corresponds to a number of frames displayed in 0.5 second, or about 15 frames for a NTSC formatted television signal. The result is that the frame buffer 102D contains memory locations corresponding to the alphanumeric symbols or textual characters that may be present in a subtitle or a closed caption, while the background video, assuming movement in the video image at the frame rate or near the frame rate, will appear as a noisy background signal. After some desired number of frames are accumulated (e.g., from one to about 15 for a NTSC formatted television signal), the content of the frame buffer 102D is processed by an OCR block 102E. The OCR block 102E, or some other pattern recognizer, may operate in a manner similar to a conventional OCR function that is used to process a scanned and digitized page to locate and recognize individual alphanumeric characters. Optional feedback (FB) can be provided to the processor 102C for indicating the status of the operation of the OCR block 102E. For example, if the background becomes excessively noisy, making character recognition difficult, the FB signal may cause the processor 102C, for example, to accumulate fewer video frames to form the bitmap, or to apply some type of filtering to the bitmap prior to the operation of the OCR block 102E. The OCR block 102E can also vary the size of the sampling window within which it examines the bitmap for potential characters. The end result is recognized alphanumeric characters or symbols that are output on line 102B to the text-to-text translator 103 where the recognized textual information 102B is processed further.

a text detection circuit, configured to perform preliminary text detection according to the video content to identify text displayed in at least a frame of the video content and generate a series of segmented character images to indicate a subtitle, wherein the text detection circuit performs image filtering on the video content to generate a filtered image, and searches for a text region having lines in the filtered image to be a target region, and obtain at least one text-existence image in the target region for further processing (e.g. the processor is used to identify text that is displayed in a frame of a video. The frame buffer contains an area or portion that contains alphanumeric symbols or characters in the bitmap that are present on a background of a video. If the background is noisy, the processor filters the noisy background in order to clear up the bitmap area containing characters or alphanumeric symbols to be later processed by the OCR operation.
The characters can be within lines in different frames to be processed and sent to the OCR. This is explained in ¶ [26], [29] above and [30]. A segment of characters is illustrated in the frame in figure 3A.), [0030] It should be noted that in most cases subtitles information is located in the bottom portion of the video screen image. As such, in order to decrease the amount of required frame buffer 102D memory, while increasing processing speed, it may be preferred to only accumulate and process video frame information that corresponds to the bottom portion (e.g., the bottom third or bottom quarter) of the video image. a character recognition circuit, coupled to the pre-processing circuit, configured to perform character recognition on the series of segmented character images to generate a series of characters corresponding to the subtitle, respectively (e.g. the filtered text within the frame is input into an OCR circuit for character recognition of characters or alphanumeric symbols to generate a series or group of text that reflect a subtitle or closed caption, which is taught in ¶ [26], [29], [30] above and [27].); and [0027] The character recognition block 102 recognizes the graphical representation of characters that are present in the digitized video signal 101B by using any one of a number of suitable methods, such as by using a method based on pattern matching and/or OCR. a post-processing circuit, coupled to the character recognition circuit, configured to perform vocabulary correction on the series of characters to selectively replace any character with a correct character to generate one or more vocabularies comprising words corresponding to the series of characters, for performing speech automatic generation (e.g. after an OCR operation to recognize characters within the detected video data, the system may perform a text to text translation to change the language to a second language selected by the user, which can be considered as the correct characters or language reflecting the correct characters. The system detects the text generated in order to convert the text to speech that can be output. This is explained in ¶ [32]-[34] and [37].). [0032] A logical block diagram for text-to-text machine translation block 103 is shown in FIG. 4. A first function of text-to-text machine translation block 103 is to automatically identify the source language if requested by the viewer. For the first embodiment of this invention the automatic language detection is performed in block 401 of FIG. 4, and is based on the character set used in the source language and optionally also on any special features or characteristics of the language, or even on explicitly given language identifiers, all of which are referred to generally as control information 102C. Upon completion of the operation of block 401 the text is translated from the original, and possibly automatically detected, language (the source language) to the language chosen by a viewer (the target language) in a text-to-text machine translation block 402. Translation can be performed by any suitable technique for converting an alphanumeric string that represents a sentence in one language to a sentence in another language. The translated text is output on line 103A to block 106 of FIG. 1. Either subtitles or closed captions with translated text are generated in block 106. 
The translated text and video data from the video delay buffer 105 are then combined in block 106 and multiplexed with the delayed digitized audio data output from the audio buffer 105 before being directed to the video display (not shown in FIG. 1). The end result is that the source text is presented to the viewer after translation to the target text, with the translation being to a language selected by the user (e.g., English to French, Swedish to Hebrew, etc.). [0033] FIG. 5 illustrates a block diagram of the subtitles/closed caption generator, mixer and digital to analog converter (DAC) block 106. The translated text 103A is applied to both a closed caption generator 501 and a subtitles generator 503. The outputs of blocks 501 and 503 are fed into multiplexer 502 that receives a control signal 106A from the viewer by means of, for example, a remote control (e.g., IR) link. The output of block 502 is input to block 504 which mixes the delayed video digital signal 104A with the selected one of closed caption or subtitles data output from the multiplexer 502. The output of mixer block 504 is input to a digital-to-analog converter (DAC) 505 that produces an analog TV signal, for example an analog TV signal in the NTSC format. [0034] Note in FIG. 1 that the textual information may be presented to the viewer in an audio (speech) signal format by performing text-to-speech synthesis in block 107. The choice of textual or audio (or both) is preferably made user-selectable. The output 207A of the synthesizer 207 can be multiplexed into the audio signal 105A. [0037] Still referring to FIG. 2, the buffered video data 204A and buffered audio data 205A are decoded by dedicated MPEG-2 video and audio decoders 208 and 209, respectively. The translated text 203A as well as the decoded video data 208A are multiplexed together in block 206, where either subtitles or closed captions with translated text are generated as shown in FIG. 5. As in the analog embodiment of FIG. 1, it may be desirable present the textual information as a speech signal by performing text-to-speech synthesis in block 207. wherein the video signal received by the image processing circuit contains the text (e.g. the text is contained in the video signal, which is taught in ¶ [26] and [27] above.). However, Gonzales fails to specifically teach the features of obtain real-time content carried by the video signal, searches for a text region having multiple lines in the filtered image to be a target region, configured to perform vocabulary correction on the series of characters to selectively replace any erroneous character generated by the character recognition circuit with a correct character. However, this is well known in the art as evidenced by Huang. Similar to the primary reference, Huang discloses recognizing text within a video displayed (same field of endeavor or reasonably pertinent to the problem). Huang discloses searches for a text region having multiple lines in the filtered image to be a target region (e.g. the invention discloses searching for areas associated with text and recognizing the text within text regions. Figure 4 shows a subtitle that contains multiple lines that are recognized. This is taught in ¶ [48]-[51] and [68].). [0048] Step 101: Recognize a video to obtain n candidate subtitle regions, the candidate subtitle regions being regions in which text contents are displayed in the video, and n being a positive integer. 
[0049] Exemplarily, the video may be a video file of any type, for example, a short video, a TV series, a movie, a variety show, or the like. Exemplarily, a subtitle is included in the video. Taking a short video as an example, texts in a short video screen not only includes a subtitle, but may also include other text information, such as a watermark text of a short video application, a user nickname of a short video publisher, a video name of the short video, and the like. Therefore, a subtitle of a short video cannot be accurately obtained simply by using the OCR technology for text recognition. In addition, a lot of labors are required to manually mark a subtitle region and then perform text recognition on a marked position to obtain the subtitle. Therefore, this application provides a method for recognizing a subtitle, which can accurately recognize the subtitle from a plurality of text information in a video, omit the step of manually marking a subtitle region, and improve the efficiency of subtitle extraction. [0050] Exemplarily, the method for obtaining a video may be arbitrary, and the video may be a video file locally stored by a computer device, or may be a video file obtained by another computer device. For example, when the computer device is a server, the server may receive a video file uploaded by a terminal; and when the computer device is a terminal, the terminal may also download a video file stored on a server through a network. For example, the computer device is a server, a client with a font extraction function may be installed on a terminal, a user may select a locally stored video file on a user interface of the client, and click an upload control to upload the video file to the server, and the server performs subsequent processing of subtitle region recognition on the video file. [0051] A candidate subtitle region refers to a region in a video in which text contents are displayed. Exemplarily, the candidate subtitle region includes a region in which text contents are displayed in each frame of a video screen in the video. The candidate subtitle region is a type of region position with a clear region range and position coordinates. Exemplarily, text regions in which text contents with similar positions in the video are located are clustered into a candidate subtitle region. [0068] Exemplarily, after a text list is obtained, the text list includes a plurality of text regions. Because a subtitle of a video is usually displayed in a same region, the text regions are grouped to obtain a plurality of candidate subtitle regions. Exemplarily, due to text contents of different subtitles are different, ranges of displayed regions may be slightly different. For example, (1) and (2) in FIG. 4 are respectively two video frame images of a video, on the two video frame images, there are a first text content located in a first text region 501 and a second text content located in a second text region 502, and both the text contents are subtitles. However, due to a difference in quantities of words and lines of the text contents, the text regions of the two text contents are slightly different. However, both the text regions are subtitle regions. Therefore, a deviation threshold needs to be set when grouping candidate subtitle regions. When position deviations of the two text regions are less than the deviation threshold, the two text regions are considered to belong to a same candidate subtitle region. 
In this way, a plurality of text regions in a text list can be grouped, and finally several candidate subtitle regions can be obtained.

Therefore, in view of Huang, it would have been obvious to one of ordinary skill at the time the invention was made to have the feature of searches for a text region having multiple lines in the filtered image to be a target region, incorporated in the device of Gonzales, in order to extract subtitles through searching for regions containing lines of subtitle information automatically, which can save the user the time of performing the process manually (as stated in Huang ¶ [06]).

However, the combination above fails to specifically teach the features of obtain real-time content carried by the video signal, and configured to perform vocabulary correction on the series of characters to selectively replace any erroneous character generated by the character recognition circuit with a correct character. However, this is well known in the art as evidenced by Gupta. Similar to the primary reference, Gupta discloses replacing errors made from the OCR of caption data (same field of endeavor or reasonably pertinent to the problem). Gupta discloses obtain real-time content carried by the video signal (e.g. the system discloses content within a live performance that can be acquired, which is taught in ¶ [45].),

[0045] Interactive media guidance applications may take various forms depending on the content for which they provide guidance. One typical type of media guidance application is an interactive television program guide. Interactive television program guides (sometimes referred to as electronic program guides) are well-known guidance applications that, among other things, allow users to navigate among and locate many types of content or media assets. Interactive media guidance applications may generate graphical user interface screens that enable a user to navigate among, locate and select content. As referred to herein, the terms “media asset” and “content” should be understood to mean an electronically consumable user asset, such as television programming, as well as pay-per-view programs, on-demand programs (as in video-on-demand (VOD) systems), Internet content (e.g., streaming content, downloadable content, Webcasts, etc.), video clips, audio, content information, pictures, rotating images, documents, playlists, websites, articles, books, electronic books, blogs, chat sessions, social media, applications, games, and/or any other media or multimedia and/or combination of the same. Guidance applications also allow users to navigate among and locate content. As referred to herein, the term “multimedia” should be understood to mean content that utilizes at least two different content forms described above, for example, text, audio, images, video, or interactivity content forms. Content may be recorded, played, displayed or accessed by user equipment devices, but can also be part of a live performance.
configured to perform vocabulary correction on the series of characters to selectively replace any erroneous character generated by the character recognition circuit with a correct character (e.g. the system discloses performing an OCR operation on caption data on a screen. The system will transcribe the information that is being spoken and displayed on a screen. The information spoken is recognized by OCR. The information on the screen is checked against certain grammar rules. If the information is determined as erroneous, the system may check the erroneous information against displayed caption information. Once the correct word or phrase is identified, letters within the word, not the whole word itself, may be replaced. This is explained in ¶ [28]-[34] and [41].). [0028] In some embodiments, a media guidance application corrects the errors by accessing a knowledge graph based on information derived from the media asset itself and looks for candidate replacements or corrections for the errors from within the knowledge graph. The media guidance application may be implemented partially on multiple devices such that some portions of the media guidance application are executed on one device while other portions of the media guidance application are executed on another device. The knowledge graph may be on a server, such as media content source 816 or media guidance data source 818, or on any other servers or databases that are maintained and accessible from the media guidance application. In some embodiments, the information derived from the media asset may be texts or images that appear in video frames of the media asset surrounding the errors, and can be identified by performing textual or image recognition on the video frames. The textual or image recognition may be performed by the media guidance application using any of a number of techniques, such as various optical character recognition algorithms, image recognition algorithms, and other machine learning techniques. Additionally, the information may also be derived from the correctly recognized portions of the on-screen caption text itself. Here, the media guidance application may apply one or more text parsing and keyword extraction algorithms on the portions of the on-screen caption text that have already been correctly recognized. In some embodiments, the media guidance application may consider a portion of the on-screen caption text to be correctly recognized if the portion passes a natural language processing (NLP) processor and returns no grammar errors according to the grammar rules specified by the NLP processor. [0029] In some embodiments, the media guidance application may determine one or more potential corrections for the errors by accessing the knowledge graph. The knowledge graph may comprise nodes and links arranged in a linked data format, whereby a node indicates a conceptual entity and a link represents a relationship between two or more nodes. The knowledge graph may be pre-populated by the media guidance application with data collected over time, and may be periodically updated to include new nodes and links, which reflect information that are related to existing nodes of the knowledge graph. In some embodiments, the knowledge graph may be maintained by a third-party service, such as a third-party knowledge database, whereby the media guidance application is capable of accessing the knowledge graph via an application programming interface (API) offered by the third-party service. 
In this implementation, the third-party service is responsible for constructing, maintaining, and updating the knowledge graph. In some embodiments, the knowledge graph may be generic and could include information on anything at any time. In some other embodiments, the media guidance application may maintain a contextual knowledge graph that is dedicated to a particular subject area, a particular time period, and the like. These may be referred to as a sub-knowledge graph or a contextual knowledge graph. Two exemplary knowledge graphs that may be used in accordance with some embodiments of the present disclosure are presented and discussed in relation to FIGS. 2-3 below. [0030] To determine the one or more potential corrections for the errors, the media guidance application may access a suitable knowledge graph and search for one or more nodes representing the information derived from the media asset. As previously discussed, the information derived from the media asset may include one or more contextual terms determined from the video frames and one or more keywords extracted from the on-screen caption text. In some embodiments, the media guidance application may examine all other nodes in the knowledge graph that are linked to at least one of the one or more nodes representing such information, and optionally construct a sub-knowledge graph that is self-contained with the one or more nodes and their immediate neighboring nodes. These nodes may each represent a potential correction for the errors. [0031] In some embodiments, the media guidance application may weigh the one or more potential corrections determined above, based on their phonetic similarity to the errors, in order to select a candidate correction having the highest weight. Besides phonetic similarity, the media guidance application may weigh the potential corrections based on any number of other criteria, such as by their time stamps, which indicate how up to date their corresponding nodes are. In some embodiments, the media guidance application may then replace the errors with the candidate correction and present an error-free on-screen caption text to viewers. [0032] FIG. 1 shows an illustrative example of display screen 100 generated by a media guidance application in accordance with some embodiments of the disclosure. In display screen 100, the media guidance application makes a mistake in transcribing the name for the current Chinese president, Xi Jinping, in on-screen caption text 106 during a news broadcast. Rather than displaying an intelligible sentence, the media guidance application displays on-screen caption text 106: “The meeting between President Obama and President She-Jumping underscored . . . ” This illustrative example demonstrates the failure of existing on-screen caption text systems that implement a traditional automated transcription service, or systems that employ human stenographers who are not aware of the particular term in question (in this case, the name of the current Chinese president). In accordance with the current disclosure, however, the media guidance application implemented in system 800 may apply one or more NLP rules to on-screen caption text 106 and determine that “She-Jumping” 108 is an erroneous term because it fails to adhere to one or more grammar rules. [0033] To correct the erroneous term, the media guidance application may extract keywords from the correctly recognized portions of the on-screen caption text 106, such as “President,” and access a knowledge graph based on the term. 
The media guidance application may also perform OCR of video frame 110, which corresponds to a position in the media asset that is equivalent to the position of on-screen caption text 106 in the media asset. For example, the media guidance application may generate for display video frame 110 on display 712 at substantially the same time that on-screen caption text 106 is announced in the audio stream of the news broadcast. Based on the OCR of video frame 110, the media guidance application may recognize contextual terms such as “China” and “State Visit” from information panel 102. Alternatively, or in addition to performing a textual recognition such as OCR of video frame 110, the media guidance application may perform an image recognition of the characters shown in video frame 110 to further identify contextual terms that is associated with the erroneous term. For example, if the Chinese president in video frame 110 receives a close camera shot, the media guidance application may perform an image recognition on his identity and arrive at the contextual term “Xi Jinping,” which incidentally corresponds to the real identity of the erroneous term “She-Jumping.”

[0034] The media guidance application may access the knowledge graph based on these contextual terms in addition to the keywords extracted from on-screen caption text 106. By analyzing nodes and properties associated with these terms in the knowledge graph, the media guidance application may identify a number of potential corrections related to “President,” “China,” and “State Visit,” such as “Xi Jinping” and “Hu Jintao” (President Xi Jinping and former President Hu Jintao have each hosted President Obama's state visits to China on separate occasions). The media guidance application may then replace “She-Jumping” in the original text segment with “Xi Jinping”.

[0041] In some embodiments, the media guidance application may update every node of the knowledge graph periodically, by pulling and examining authoritative sources at fixed intervals. In some embodiments, the media guidance application may update the nodes and properties by groups. In some further embodiments, the media guidance application may perform the updating in real time, such as by linking the nodes directly to an API service of the authoritative sources. For example, the media guidance application can implement an automatic update for the node “Barack Obama” 312 by linking it to an API of an online encyclopedia, a news service, or the White House official news portal in order to receive real-time updates. Upon receiving a real-time update for a particular node, the media guidance application may enter new properties for the node, update existing properties, delete properties, add new links to existing or new nodes, or create new nodes to be linked to the particular node.

Therefore, in view of Gupta, it would have been obvious to one of ordinary skill at the time the invention was made to have the feature of obtain real-time content carried by the video signal, configured to perform vocabulary correction on the series of characters to selectively replace any erroneous character generated by the character recognition circuit with a correct character, incorporated in the device of Gonzales, as modified by Huang, in order to correct errors in on-screen caption text identified by OCR, which can present an error-free screen to a user (as stated in Gupta ¶ [02]).
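For readers less familiar with the cited mechanism, the frame-accumulation idea Gonzales is relied on for (¶ [0029]-[0030]) is easy to see in code. The following is a minimal illustrative sketch, not anything from the record: it assumes grayscale numpy frames and a window of roughly 15 frames, so static subtitle pixels survive the averaging while a moving background washes out toward noise.

```python
# Illustrative sketch of Gonzales-style frame accumulation (assumptions:
# grayscale uint8 frames, ~15-frame window as suggested for NTSC).
import numpy as np

def accumulate_frames(frames: list[np.ndarray], n: int = 15) -> np.ndarray:
    """Average the last n frames into one bitmap.

    Static text (a subtitle) survives the averaging; background motion
    washes out toward a noisy mean, which helps the downstream OCR step.
    """
    window = frames[-n:]
    acc = np.mean(np.stack(window).astype(np.float32), axis=0)
    return acc.astype(np.uint8)

def crop_subtitle_band(bitmap: np.ndarray, fraction: float = 1 / 3) -> np.ndarray:
    """Keep only the bottom portion of the image, per Gonzales ¶ [0030],
    to reduce buffer memory and speed up processing."""
    h = bitmap.shape[0]
    return bitmap[int(h * (1 - fraction)):, :]
```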
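Huang's region grouping (¶ [0068]) clusters per-frame text boxes into candidate subtitle regions whenever their position deviations stay under a threshold. A hedged sketch of that grouping step follows; the Box type and the 20-pixel deviation value are assumptions chosen for illustration, since Huang does not give a concrete threshold.

```python
# Illustrative sketch of Huang-style grouping (¶ [0068]): two text regions
# belong to the same candidate subtitle region when their position
# deviation is below a threshold.
from dataclasses import dataclass

@dataclass
class Box:
    x: int  # top-left corner, pixels
    y: int
    w: int
    h: int

def group_text_regions(boxes: list[Box], deviation: int = 20) -> list[list[Box]]:
    """Cluster boxes whose top-left corners lie within `deviation` pixels."""
    groups: list[list[Box]] = []
    for b in boxes:
        for g in groups:
            ref = g[0]
            if abs(b.x - ref.x) < deviation and abs(b.y - ref.y) < deviation:
                g.append(b)
                break
        else:  # no existing group is close enough; start a new one
            groups.append([b])
    return groups
```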
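Gupta's correction step flags a term that fails a vocabulary or grammar check and swaps it for the closest contextual candidate. Gupta actually describes a knowledge graph weighted by phonetic similarity; the sketch below substitutes a plain string-similarity lookup (difflib) as a stand-in, so every name and threshold here is illustrative only.

```python
# Illustrative stand-in for Gupta-style correction: flag a term missing
# from a vocabulary, then replace it with the closest contextual candidate.
# (Gupta uses a knowledge graph with phonetic weighting; difflib's
# character-level similarity is an assumption, not Gupta's method.)
import difflib

def correct_term(term: str, vocabulary: set[str], candidates: list[str]) -> str:
    if term.lower() in vocabulary:
        return term                      # known word; keep it unchanged
    match = difflib.get_close_matches(term, candidates, n=1, cutoff=0.0)
    return match[0] if match else term   # best contextual match, if any
```

Under this metric, a call like correct_term("She-Jumping", vocab, ["Xi Jinping", "Hu Jintao"]) should return "Xi Jinping", echoing the example in Gupta ¶ [0032]-[0034].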
Re claim 2: Gonzales discloses the display control IC of claim 1, further comprising: a storage unit (interpretation: The display control IC 100 may comprise a storage unit to be one of the multiple sub-circuits, and some other sub-circuits among the multiple sub-circuits (e.g., the image processing circuit 101, the pre-processing circuit 110, the character recognition circuit 120, the post-processing circuit 130 and the V2S conversion circuit 140) can share the storage unit, where the storage unit may comprise at least one line buffer, but the present invention is not limited thereto. For example, the storage unit may be integrated into a certain sub-circuit of the multiple sub-circuits, such as any of the image processing circuit 101, the pre-processing circuit 110, etc., which is described in ¶ [18]. This interpretation and its equivalents are utilized for this claim term hereinafter in the Office Action.) configured to store a partial image of the real-time video content for performing the preliminary text detection, wherein the partial image corresponds to more than one row of pixel data (e.g. as seen in figure 3, an image is stored within a frame buffer that is used to detect pixels that are used to form a bitmap of the subtitles that may be present within the frame stored. The system determines alphanumeric characters formed are stored within a buffer as a result, and criteria is used to determine how quickly this information would be expected to change. Figure 3 shows the image data is more than a row of pixel data, which is taught in ¶ [29] above.).

However, Gonzales fails to specifically teach the features of the display control IC of claim 1, further comprising: a storage unit configured to store a partial image of the real-time video content for performing the preliminary text detection. However, this is well known in the art as evidenced by Gupta. Similar to the primary reference, Gupta discloses replacing errors made from the OCR of caption data (same field of endeavor or reasonably pertinent to the problem). Gupta discloses a storage unit configured to store a partial image of the real-time video content (e.g. the system discloses content within a live performance that can be acquired, which is taught in ¶ [45] above. The system discloses storing video data that contains a capture of text data on the screen. The system allows for storing this information in order to perform text detection of the information on screen that is in a row on the screen, which is taught in ¶ [89], [90] and [93]-[96].).

[0089] The cloud provides access to services, such as content storage, content sharing, or social networking services, among other examples, as well as access to any content described above, for user equipment devices. Services can be provided in the cloud through cloud computing service providers, or through other providers of online services. For example, the cloud-based services can include a content storage service, a content sharing site, a social networking site, or other services via which user-sourced content is distributed for viewing by others on connected devices. These cloud-based services may allow a user equipment device to store content to the cloud and to receive content from the cloud rather than storing content locally and accessing locally-stored content.

[0090] A user may use various content capture devices, such as camcorders, digital cameras with video mode, audio recorders, mobile phones, and handheld computing devices, to record content.
The user can upload content to a content storage service on the cloud either directly, for example, from user computer equipment 804 or wireless user communications device 806 having content capture feature. Alternatively, the user can first transfer the content to a user equipment device, such as user computer equipment 804. The user equipment device storing the content uploads the content to the cloud using a data transmission service on communications network 814. In some embodiments, the user equipment device itself is a cloud resource, and other user equipment devices can access the content directly from the user equipment device on which the user stored the content. [0093] FIG. 9 is a flowchart of an illustrative process 900 for control circuitry (e.g., control circuitry 704) to correct an erroneous term in on-screen caption text for a media asset displayed using a media guidance application in accordance with some embodiments of the disclosure. In some embodiments this algorithm may be encoded onto a non-transitory storage medium (e.g., storage device 708) as a set of instructions to be decoded and executed by processing circuitry (e.g., processing circuitry 706). Processing circuitry may in turn provide instructions to other sub-circuits contained within control circuitry 704, such as the tuning, video generating, encoding, decoding, encrypting, decrypting, scaling, analog/digital conversion circuitry, and the like. [0094] An interactive media guidance application may cause control circuitry 704 to initialize the process for correcting an erroneous term in on-screen caption text of a media asset presented on a media guidance application. At step 910, the media guidance application may cause control circuitry 704 to analyze an audio stream of the media asset to determine a first text segment of the on-screen caption text. For example, the media guidance application may analyze the audio stream of a sports news commentary and automatically transcribe it into an on-screen caption text. The media guidance application may then cause control circuitry 704 to determine a first text segment of the on-screen caption text to be “It will be interesting to see how Tom Brady performs despite being in the news for div plate date.” [0095] At step 920, the media guidance application may cause control circuitry 704 to identify an erroneous term in the first text segment of the on-screen caption text. For example, the media guidance application may cause control circuitry 704 to identify that “div plate date” is an erroneous term in the first text segment. In some embodiments, the media guidance application may identify the erroneous term by performing natural language processing on the first text segment to compare the first text segment against a plurality of grammar rules. For example, the media guidance application may compare the sentence above against a grammar rule that requires the word “div” be followed by a number (e.g., as in “NCAA div one”) and determine that “div plate date” is an erroneous term because it conflicts with the grammar rule. [0096] At step 930, the media guidance application may cause control circuitry 704 to extract one or more video frames from a video stream of the media asset corresponding to the first text segment. For example, the media guidance application may cause control circuitry 704 to extract a video frame from the media asset corresponding to the time that the sentence above appeared in the audio stream. 
The video stream may be a news interview of Tom Brady, which includes a few video frames displaying the following sentence in a banner overlaying the interview: “News of the Hour: Patriots quarterback serves NFL suspension.” The media guidance application may cause control circuitry 704 to extract these video frames because they correspond to substantially the same time as the sentence “It will be interesting to see how Tom Brady performs despite being in the news for div plate date” is announced on the news.

Therefore, in view of Gupta, it would have been obvious to one of ordinary skill at the time the invention was made to have the feature of a storage unit configured to store a partial image of the real-time video content, incorporated in the device of Gonzales, as modified by Huang, in order to correct errors in on-screen caption text identified by OCR, which can present an error-free screen to a user (as stated in Gupta ¶ [02]).

Re claim 5: Gonzales discloses the display control IC of claim 1, wherein the pre-processing circuit further comprises: a denoise circuit, coupled to the text detection circuit, configured to perform denoising processing on the at least one text-existence image to generate at least one denoised text image (e.g. if the image is too noisy for the OCR function to properly recognize the text, the processor can be used to filter, or denoise, the image in order to have a clearer image to recognize the characters, which is taught in ¶ [29] above.); and a character isolation circuit, coupled to the denoise circuit, configured to perform character isolation on the at least one denoised text image to segment the at least one denoised text image into the series of segmented character images (e.g. the denoising, or filtering, of the part of the noisy image allows for isolating the text from the noisy image in order to create a clearer image to recognize the text by the OCR operation, which is taught in ¶ [29] above.).

Re claim 6: However, Gonzales fails to specifically teach the features of the display control IC of claim 1, wherein the text detection circuit monitors whether the at least one text-existence image appears in the respective filtered images of a series of continuous frames, in order to prevent triggering repeated processing regarding the at least one text-existence image. However, this is well known in the art as evidenced by Huang. Similar to the primary reference, Huang discloses recognizing text within a video displayed (same field of endeavor or reasonably pertinent to the problem). Huang discloses wherein the text detection circuit monitors whether the at least one text-existence image appears in the respective filtered images of a series of continuous frames, in order to prevent triggering repeated processing regarding the at least one text-existence image (e.g. the invention discloses recognizing a video in order to determine candidate subtitle regions where text for subtitles can appear. The system screens the candidate regions in order to select the one region where the appearance of the subtitles occurs the longest. This is performed in order to select a single candidate area for detection of subtitles for later recognition of text. This is explained in ¶ [48]-[56].).

[0048] Step 101: Recognize a video to obtain n candidate subtitle regions, the candidate subtitle regions being regions in which text contents are displayed in the video, and n being a positive integer.
[0049] Exemplarily, the video may be a video file of any type, for example, a short video, a TV series, a movie, a variety show, or the like. Exemplarily, a subtitle is included in the video. Taking a short video as an example, texts in a short video screen not only includes a subtitle, but may also include other text information, such as a watermark text of a short video application, a user nickname of a short video publisher, a video name of the short video, and the like. Therefore, a subtitle of a short video cannot be accurately obtained simply by using the OCR technology for text recognition. In addition, a lot of labors are required to manually mark a subtitle region and then perform text recognition on a marked position to obtain the subtitle. Therefore, this application provides a method for recognizing a subtitle, which can accurately recognize the subtitle from a plurality of text information in a video, omit the step of manually marking a subtitle region, and improve the efficiency of subtitle extraction. [0050] Exemplarily, the method for obtaining a video may be arbitrary, and the video may be a video file locally stored by a computer device, or may be a video file obtained by another computer device. For example, when the computer device is a server, the server may receive a video file uploaded by a terminal; and when the computer device is a terminal, the terminal may also download a video file stored on a server through a network. For example, the computer device is a server, a client with a font extraction function may be installed on a terminal, a user may select a locally stored video file on a user interface of the client, and click an upload control to upload the video file to the server, and the server performs subsequent processing of subtitle region recognition on the video file. [0051] A candidate subtitle region refers to a region in a video in which text contents are displayed. Exemplarily, the candidate subtitle region includes a region in which text contents are displayed in each frame of a video screen in the video. The candidate subtitle region is a type of region position with a clear region range and position coordinates. Exemplarily, text regions in which text contents with similar positions in the video are located are clustered into a candidate subtitle region. [0052] Step 102: Screen the n candidate subtitle regions according to a subtitle region screening policy to obtain the subtitle region, the subtitle region screening policy being used for determining a candidate subtitle region in which text contents have a repetition rate being lower than a repetition rate threshold and have a longest total display duration as the subtitle region. [0053] Exemplarily, based on characteristics that text contents displayed in a subtitle region are diverse and the text contents are displayed in the subtitle region for a long time, from a plurality of candidate subtitle regions, a candidate subtitle region in which text contents have a repetition rate being lower than a repetition rate threshold and are displayed for a long time is determined as a subtitle region. [0054] The repetition rate of text contents is high, that is, a variety of text contents are displayed in the candidate subtitle region, and the repetition rate of text contents is low, that is, only one or several types of text contents are displayed in the candidate subtitle region. [0055] The total display duration refers to a total duration of text contents displayed in a candidate subtitle region. 
Because a subtitle is usually displayed for a long time in a video, a candidate subtitle region with text contents displayed for a long time is selected as a subtitle region. [0056] In summary, in the method provided in this embodiment, a subtitle region is obtained by screening candidate subtitle regions recognized from a video by using a subtitle region screening policy. According to characteristics of a fixed display position, diverse text contents, and a relatively long display duration of a subtitle, the subtitle region is selected from the candidate subtitle regions. Therefore, a subtitle of the video can be extracted according to the subtitle region. Compared with a method of manually marking the subtitle region, this method saves labor resources required for subtitle recognition and improves the speed and efficiency of the subtitle recognition. Therefore, in view of Huang, it would have been obvious to one of ordinary skill at the time the invention was made to have the feature of wherein the text detection circuit monitors whether the at least one text-existence image appears in the respective filtered images of a series of continuous frames, in order to prevent triggering repeated processing regarding the at least one text-existence image, incorporated in the device of Gonzales, in order to determine which area contains subtitles within a part of the video, which can improve the speed and efficiency of subtitle recognition (as stated in Huang ¶ [56]). Re claim 8: Gonzales discloses the display control IC of claim 1, wherein according to any predetermined character data set among multiple predetermined character data sets, the character recognition circuit determines similarity between the series of segmented character images and the any predetermined character data set, in order to recognize the series of characters from the series of segmented character images (e.g. the system performs the character recognition by pattern matching, which involves matching a detected pattern with a pre-existing or template of a pattern that conforms to a character. This is discussed in ¶ [27]. Conventional OCR compares detected patterns to stored patterns that reflect a character.). [0027] The character recognition block 102 recognizes the graphical representation of characters that are present in the digitized video signal 101B by using any one of a number of suitable methods, such as by using a method based on pattern matching and/or OCR. Re claim 9: However, Gonzales fails to specifically teach the features of the display control IC of claim 1, wherein the post-processing circuit determines whether the any erroneous character exists according to a predetermined vocabulary data set, for selectively replacing the any erroneous character with the correct character. However, this is well known in the art as evidenced by Gupta. Similar to the primary reference, Gupta discloses replacing errors made from the OCR of caption data (same field of endeavor or reasonably pertinent to the problem). Gupta discloses wherein the post-processing circuit determines whether the any erroneous character exists according to a predetermined vocabulary data set, for selectively replacing the any erroneous character with the correct character (e.g. the system determines, based on the OCR operation on subtitle or closed caption data recognized on the screen, if the data is correct in terms of the context. 
If not, the incorrect characters within a word or phrase are changed to reflect the correct information, which is taught in [28]-[34] and [41] above.).

Therefore, in view of Gupta, it would have been obvious to one of ordinary skill before the effective filing date of the claimed invention to have the feature of wherein the post-processing circuit determines whether the any erroneous character exists according to a predetermined vocabulary data set, for selectively replacing the any erroneous character with the correct character, incorporated in the device of Gonzales, as modified by Huang, in order to correct errors in on-screen caption text identified by OCR, which can present an error-free screen to a user (as stated in Gupta ¶ [02]).

Re claim 10: Gonzales discloses the display control IC of claim 1, further comprising: a vocabulary-to-speech conversion circuit, coupled to the post-processing circuit, configured to perform vocabulary-to-speech conversion on the one or more vocabularies to generate an audio signal corresponding to the one or more vocabularies for outputting speech (e.g. the invention discloses performing text to speech that converts text recognized into an audio signal that is used to output speech, which is taught in ¶ [34], [37] and [38].).

[0034] Note in FIG. 1 that the textual information may be presented to the viewer in an audio (speech) signal format by performing text-to-speech synthesis in block 107. The choice of textual or audio (or both) is preferably made user-selectable. The output 207A of the synthesizer 207 can be multiplexed into the audio signal 105A.

[0037] Still referring to FIG. 2, the buffered video data 204A and buffered audio data 205A are decoded by dedicated MPEG-2 video and audio decoders 208 and 209, respectively. The translated text 203A as well as the decoded video data 208A are multiplexed together in block 206, where either subtitles or closed captions with translated text are generated as shown in FIG. 5. As in the analog embodiment of FIG. 1, it may be desirable present the textual information as a speech signal by performing text-to-speech synthesis in block 207.

[0038] For the case where the second embodiment depicted in FIG. 2 is embodied as a module or a subsystem of a further system or audiovisual appliance, such as a DVD player or a set-top-box, the functionality of at least blocks 204, 205, 206, 208 and 209 can be performed by the further system as a part of its operation such that the system 200 may need to contain only those blocks shown as 201, 202, 203 and 207. That is, the functionality of the system 200, as well as the system 100, may be distributed over two or more systems. Further in this regard, and relevant also to FIG. 1, while the blocks shown in the Figures have been described as hardware blocks, a number of these blocks may be implemented by a suitably programmed data processor or data processors as algorithms and processes. Still further in this regard, the block diagrams of FIGS. 1-5 may be viewed as well as logic flow diagrams, wherein the individual blocks are implemented by hardware circuitry, by software instructions executed by a data processor and stored on or within a computer-readable media, or by a combination of hardware and software instructions. Still further in this regard, and also relevant to FIG. 1, the blocks shown in the Figures may be integrated within one or more integrated circuits.

Claim(s) 3 is/are rejected under 35 U.S.C.
Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gonzales, as modified by Huang and Gupta, as applied to claim 2 above, and further in view of IC technology (Common Knowledge).

Re claim 3: Gonzales, as modified by Huang and Gupta, fails to specifically teach the features of the display control IC of claim 2, wherein the display control IC comprises multiple sub-circuits, and the multiple sub-circuits comprise the pre-processing circuit, the character recognition circuit and the post-processing circuit; and the storage unit is integrated into one of the multiple sub-circuits. However, this is well known in the art as evidenced by IC Technology (Common Knowledge). Similar to the primary reference, IC Technology discloses multiple circuits that are used for different functions of an invention (same field of endeavor or reasonably pertinent to the problem). IC Technology discloses wherein the display control IC comprises multiple sub-circuits, and the multiple sub-circuits comprise the pre-processing circuit, the character recognition circuit and the post-processing circuit; and the storage unit is integrated into one of the multiple sub-circuits (e.g. it is well known to have multiple circuits used to perform multiple separate functions of an invention, with a main circuit controlling the functions of the sub-circuits. Since one of the circuits can perform the modification of the data, having a form of ROM or RAM on the circuit can speed up the function by performing it locally.).

Therefore, in view of IC Technology, it would have been obvious to one of ordinary skill before the effective filing date of the claimed invention to have the feature of wherein the display control IC comprises multiple sub-circuits, and the multiple sub-circuits comprise the pre-processing circuit, the character recognition circuit and the post-processing circuit; and the storage unit is integrated into one of the multiple sub-circuits, incorporated in the device of Gonzales, as modified by Huang and Gupta, in order to have different circuits perform differing functions, which can improve the efficiency of the processing of the overall functions.

Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gonzales, as modified by Huang and Gupta, as applied to claim 1 above, and further in view of Wang (CN Pub. No. 107862315, published 9/17/2019).

Re claim 7: Gonzales fails to specifically teach the features of the display control IC of claim 1, wherein the text detection circuit calculates respective characteristic values of a current pixel and multiple neighboring pixels, and determines, according to whether the respective characteristic values of the current pixel and the multiple neighboring pixels fall within a background interval or a line interval among multiple predetermined intervals, whether the current pixel and the multiple neighboring pixels belong to the background or any line of the multiple lines, wherein the background interval and the line interval are defined by at least one threshold. However, this is well known in the art as evidenced by Wang. Similar to the primary reference, Wang discloses creating subtitles from video data (same field of endeavor or reasonably pertinent to the problem).
Wang discloses wherein the text detection circuit calculates respective characteristic values of a current pixel and multiple neighboring pixels, and determines, according to whether the respective characteristic values of the current pixel and the multiple neighboring pixels fall within a background interval or a line interval among multiple predetermined intervals, whether the current pixel and the multiple neighboring pixels belong to the background or any line of the multiple lines, wherein the background interval and the line interval are defined by at least one threshold (e.g. the invention discloses determining the average colour value of the pixels in a caption image, including a current pixel and its neighboring pixels, which is taught in ¶ [118]-[121]. The system uses a threshold derived from that average colour value to determine whether the pixels of the image belong to the background or to a character. This is explained in ¶ [122]-[126].).

[0118] S2032: performing binarization processing on the first subtitle image to obtain the corresponding binary image;

[0119] In this step, the binarization processing may proceed as follows:

[0120] S20321: determining the average colour value of the pixels in the first subtitle image;

[0121] Specifically, the colour values of the pixels in the first subtitle image are summed and then divided by the number of pixels to obtain the average colour value;

[0122] S20322: determining, according to the average colour value, a binarization threshold value for performing the binarization processing;

[0123] It can be understood that the larger the average colour value, the larger the binarization threshold value;

[0124] S20323: performing binarization processing on the first subtitle image using the binarization threshold value.

[0125] Here the binarization threshold is determined from the average colour value of the pixels of the first subtitle image; that is, determining the binarization threshold is a dynamic and adaptive process, so different subtitle images can be assigned different binarization thresholds, each suited to binarizing its own image.

[0126] FIG. 5 is the binary image obtained by binarizing FIG. 4a; such processing is helpful for analyzing which region is a background region and which region is a character region. However, as can be seen in FIG. 5, the background introduces interference and the binary image is not ideal, so removing the background according to the binary image alone may affect the accuracy of the subsequent character recognition. The example of the invention therefore also combines the character outlines to comprehensively determine the background area, in order to obtain a more accurate background region.
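By way of illustration, a minimal sketch of the adaptive binarization quoted above, assuming the subtitle image is a grayscale array: the threshold is the image's own average pixel value, so each subtitle image receives its own threshold, and each pixel is classified into a background interval (at or below the threshold) or a character/line interval (above it). The array values are hypothetical.

```python
import numpy as np

def binarize_subtitle_image(image: np.ndarray) -> np.ndarray:
    """Binarize a grayscale subtitle image with an adaptive threshold.

    Per the scheme quoted above, the threshold is the average pixel
    value of the image itself, so the larger the average value, the
    larger the threshold (a dynamic, per-image choice).
    """
    threshold = image.mean()                      # S20321/S20322: average -> threshold
    return (image > threshold).astype(np.uint8)   # S20323: apply the threshold

# Hypothetical 2x3 grayscale patch: bright "character" pixels on a dark background.
patch = np.array([[30,  40, 220],
                  [35, 210, 225]])
print(binarize_subtitle_image(patch))
# [[0 0 1]
#  [0 1 1]]
```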
Therefore, in view of Wang, it would have been obvious to one of ordinary skill before the effective filing date of the claimed invention to have the feature of wherein the text detection circuit calculates respective characteristic values of a current pixel and multiple neighboring pixels, and determines, according to whether the respective characteristic values of the current pixel and the multiple neighboring pixels fall within a background interval or a line interval among multiple predetermined intervals, whether the current pixel and the multiple neighboring pixels belong to the background or any line of the multiple lines, wherein the background interval and the line interval are defined by at least one threshold, incorporated in the device of Gonzales, as modified by Huang and Gupta, in order to utilize a threshold to determine a character region or a background region, which improves the accuracy of determining a background region (as stated in Wang ¶ [126]).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Yosunaga discloses multiple circuits and sub-circuits to perform the functions of the invention.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAD S DICKERSON whose telephone number is (571) 270-1351. The examiner can normally be reached Monday-Friday, 10 AM-6 PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Abderrahim Merouan, can be reached at 571-270-5254. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHAD DICKERSON/
Primary Examiner, Art Unit 2681

Prosecution Timeline

Dec 01, 2021
Application Filed
Feb 24, 2024
Non-Final Rejection — §103
May 28, 2024
Response Filed
Sep 02, 2024
Final Rejection — §103
Dec 04, 2024
Request for Continued Examination
Dec 05, 2024
Response after Non-Final Action
Dec 14, 2024
Non-Final Rejection — §103
Mar 18, 2025
Response Filed
Jun 28, 2025
Non-Final Rejection — §103
Oct 01, 2025
Response Filed
Jan 10, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602908
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND RECORDING MEDIUM
2y 5m to grant Granted Apr 14, 2026
Patent 12603960
IMAGE ANALYSIS APPARATUS, IMAGE ANALYSIS SYSTEM, IMAGE ANALYSIS METHOD, PROGRAM, AND NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM COMPRISING READING A PRINTED MATTER, ANALYZING CONTENT RELATED TO READING OF THE PRINTED MATTER AND ACQUIRING SUPPORT INFORMATION BASED ON AN ANALYSIS RESULT OF THE CONTENT FOR DISPLAY TO ASSIST A USER IN FURTHER READING OPERATIONS
2y 5m to grant Granted Apr 14, 2026
Patent 12579817
Vehicle Control Device and Control Method Thereof for Camera View Control Based on Surrounding Environment Information
2y 5m to grant Granted Mar 17, 2026
Patent 12522110
APPARATUS AND METHOD OF CONTROLLING THE SAME COMPRISING A CAMERA AND RADAR DETECTION OF A VEHICLE INTERIOR TO REDUCE A MISSED OR FALSE DETECTION REGARDING REAR SEAT OCCUPATION
2y 5m to grant Granted Jan 13, 2026
Patent 12519896
IMAGE READING DEVICE COMPRISING A LENS ARRAY INCLUDING FIRST LENS BODIES AND SECOND LENS BODIES, A LIGHT RECEIVER AND LIGHT BLOCKING PLATES THAT ARE BETWEEN THE LIGHT RECEIVER AND SECOND LENS BODIES, THE THICKNESS OF THE LIGHT BLOCKING PLATES EQUAL TO OR GREATER THAN THE SECOND LENS BODIES THICKNESS
2y 5m to grant Granted Jan 06, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
63%
Grant Probability
86%
With Interview (+23.0%)
2y 9m
Median Time to Grant
High
PTA Risk
Based on 600 resolved cases by this examiner. Grant probability derived from career allow rate.
