Last updated: May 04, 2026
Application No. 18/086,238
OCR BASED ON ML TEXT SEGMENTATION INPUT

Non-Final OA §103
Filed
Dec 21, 2022
Examiner
DIGUGLIELMO, DANIELLA MARIE
Art Unit
2666
Tech Center
2600 — Communications
Assignee
Raytheon Company
OA Round
3 (Non-Final)
Interview Optional

— +25.9% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 81% grant rate with +25.9% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 174 resolved cases, 2023–2026
Examiner Intelligence

DIGUGLIELMO, DANIELLA MARIE View full profile →
Grants 81% — above average
Career Allowance Rate
141 granted / 174 resolved
+19.0% vs TC avg
Strong +26% interview lift
Without
With
+25.9%
Interview Lift
resolved cases with interview
Typical timeline
2y 7m
Avg Prosecution
23 currently pending
Career history
197
Total Applications
across all art units
Statute-Specific Performance

§101
12.8%
-27.2% vs TC avg
§103
35.4%
-4.6% vs TC avg
§102
10.3%
-29.7% vs TC avg
§112
33.3%
-6.7% vs TC avg
Black line = Tech Center average estimate • Based on career data from 174 resolved cases
Office Action

§103
DETAILED ACTION

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Status of Claims
Claims 1-20 are pending.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 1/20/26 has been entered.

Response to Arguments
Regarding claim 1, applicant’s arguments in p. 6-8 of the remarks filed on 1/7/26, with respect to the prior art of record Kumar and Gaither, have been considered but are moot in view of the new ground of rejection. As shown below, the rejection is now modified in view of Lin (US 2015/0254507 A1). 

Applicant's arguments filed 1/7/26 with respect to the claims have been fully considered but they are not persuasive.
First, regarding claims 5, 12, and 19, applicant argues in p. 8 of the remarks that metadata is not output in Kumar. The Examiner respectfully disagrees. As shown in the rejection below, Kumar teaches that font characteristics/font type is part of the text. However, please note that the examiner primarily uses the new prior art of record Lin to teach the claim limitations.
Second, in response to applicant's argument for claims 6, 13, and 20 in p. 8 of the remarks that Nicholson does not teach a text segmentation model outputting font type metadata and the metadata being supplied to the OCR engine, the test for obviousness is not whether the claimed invention must be expressly suggested in any one or all of the references.  Rather, the test is what the combined teachings of the references would have suggested to those of ordinary skill in the art.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981). As shown in the rejection below, the examiner relies on the combination of references as a whole to teach the limitations, in which Nicholson is used to teach metadata indicating a font type.
Third, in response to applicant's argument for claims 7 and 14 in p. 8 of the remarks that Yuan does not teach metadata being from the text segmentation model and being provided to the OCR engine along with the per-pixel classification map, the test for obviousness is not whether the claimed invention must be expressly suggested in any one or all of the references.  Rather, the test is what the combined teachings of the references would have suggested to those of ordinary skill in the art.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981). As shown in the rejection below, the examiner relies on the combination of references as a whole to teach the limitations, in which Yuan is used to teach metadata indicating a language of the text.
Fourth, in response to applicant's argument in p. 8-9 of the remarks that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., does not require any specialized training, improve performance of OCR without further training) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 8-12, and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar et al. (US 10,489,682 B1, hereinafter “Kumar”) in view of Lin et al. (US 2015/0254507 A1, hereinafter “Lin”) and further in view of Gaither (US 6,771,816 B1).
Regarding claim 1, Kumar teaches, A method for optical character recognition (OCR) comprising (Fig. 1: OCR 100; Col. 3, lines 4-6: “systems and methods are disclosed that employ a trained deep-neural net for OCR of documents”; Col. 3, lines 49-59: computerized digitization system employing a deep-learning based OCR system):
providing an image including text as input to a text segmentation model (Fig. 1: document image 101 is input into automatic segmentation 106; Col. 1, lines 54-64: there are synthetic text segments and real-life text segments in which the real-life segments are extracted out of documents; Col. 4, lines 9-24: automatic segmentation module breaks up a document image into sub-images of characters and words and are processed by the OCR system), the text segmentation model is a machine learning (ML) model trained in a supervised manner (Col. 3, lines 56-66: deep learning system employs machine learning algorithms to perform OCR, in which the algorithms may learn in a supervised manner; Col. 4, lines 9-24: automatic segmentation module breaks up a document image into sub-images of characters and words and are processed by the OCR system).
Kumar does not expressly disclose the following limitations: receiving, from the text segmentation model, a per-pixel image segmentation map of the image, the per-pixel image segmentation map comprising a binary, pixel-wise text/non-text classification mask output by the text segmentation model, the binary, pixel-wise text/non-text classification mask indicates, for each pixel, whether the pixel is classified, by the text segmentation model, as including a text character or not including a text character; providing the per-pixel image segmentation map as input to an OCR engine in place of image pixels of the image; and receiving, as output from the OCR engine based on the per-pixel image segmentation map, a digitized version of the image. 
However, Lin teaches, receiving, from the text segmentation model, a comprising a binaryindicates(Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation; Fig. 3: In image 302, the text is in black; Note: a binary mask consists of only two values (i.e., black and white). The Examiner interprets the relevant regions containing text as text characters);
providing the  in place of  (Para. 0018: The binary mask can be converted into a black white representation, and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing. In accordance with various embodiments, the binary mask is provided to a first recognition engine 104a, a second recognition engine 104b, and an nth recognition engine 104n for concurrent character recognition processing in a multithreaded mode; Note: the Examiner interprets the binary mask being provided to the OCR/recognition engine as providing the binary mask in place of the image); 
and receiving, as output from the OCR engine based on the (Para. 0002: “Optical character recognition (OCR) systems are generally used to detect text present in an image and to convert the detected text into its equivalent electronic representation”; Para. 0017; Para. 0018: The binary mask can be converted into a black white representation, and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing. In accordance with various embodiments, the binary mask is provided to a first recognition engine 104a, a second recognition engine 104b, and an nth recognition engine 104n for concurrent character recognition processing in a multithreaded mode; Para. 0025: an Optical character recognition service is accessed for communicating digital information). 
It would have been obvious before the effective filing date of the claimed invention, to one of ordinary skill in the art, to combine receiving a binary image segmentation mask output by the text segmentation model (i.e., algorithm) that indicates text characters, providing the image segmentation map as input to an OCR in place of the image, and receiving a digitized image as output from the OCR based on the image segmentation map as taught by Lin with the character recognition of Kumar in order to improve text recognition precision (Lin, Para. 0002). Therefore, one of ordinary skill in the art would be capable to have combined the elements as claimed by known methods and that in combination, each element merely performs the same function as it does separately. 
The combination of Kumar and Lin does not expressly disclose the following limitations: a per-pixel image segmentation map of the image, the per-pixel image segmentation map comprising a pixel-wise text/non-text classification, the pixel-wise text/non-text classification indicates, for each pixel, whether the pixel is classified, or not including a text character; the per-pixel image segmentation map; image pixels of the image; the per-pixel image segmentation map. 
However, Gaither teaches, a per-pixel image segmentation map of the image, the per-pixel image segmentation map comprising a pixel-wise text/non-text classification, the pixel-wise text/non-text classification indicates, for each pixel, whether the pixel is classified, or not including a text character (As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Note: the Examiner interprets non-text as the background); 
the per-pixel image segmentation map (As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1); 
image pixels of the image (As shown in Col. 5, lines 11-42, the text image has pixels that represent text and background); 
the per-pixel image segmentation map (As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1). 
It would have been obvious before the effective filing date of the claimed invention, to one of ordinary skill in the art, to combine the image segmentation map being a per-pixel image segmentation map (i.e., bitmap) that classifies whether a pixel does not indicate text (i.e., is the background), and the image having pixels as taught by Gaither with the combined character recognition of Kumar and Lin in order to generate a representation of the lexical unit for display on an output device (Gaither, Abstract). Therefore, one of ordinary skill in the art would be capable to have combined the elements as claimed by known methods and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned that the Examiner has reached a conclusion of obviousness with respect to claim 1.

Regarding claim 2, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 1.
The combination of Kumar, Lin, and Gaither further teaches, The method of claim 1 (see claim 1 above), wherein the per-pixel image segmentation map includes one of only two values, a first value or a second value for each pixel of the image, the first value different from the second value (Lin, Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Lin, Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation; Gaither: As shown in Col, 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1).  
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 1 apply to claim 2 and are incorporated herein by reference. Thus, the method recited in claim 2 is met by Kumar, Lin, and Gaither.

Regarding claim 3, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 2.
The combination of Kumar, Lin, and Gaither further teaches, The method of claim 2 (see claim 2 above), wherein the first value indicates the pixel is part of any character in the image and the second value indicates the pixel is not part of any character in the image (Gaither: Col. 1, lines 6-16; Gaither: As shown in Col, 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Note: the Examiner interprets the background (i.e. pixel value of 1) as not being part of the text/character).  
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 2 apply to claim 3 and are incorporated herein by reference. Thus, the method recited in claim 3 is met by Kumar, Lin, and Gaither.

Regarding claim 4, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 3.
The combination of Kumar, Lin, and Gaither further teaches, The method of claim 3 (see claim 3 above), wherein the first value corresponds to a dark color and the second value corresponds to a light color (Lin, Para. 0018: “The binary mask can be converted into a black white representation, and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing”;  Lin, Fig. 3: In image 302, the text is in black; Gaither: As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Note: text is black (i.e., a dark color and pixel value of 0), background is white (i.e., light color and value of 1)).  
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 3 apply to claim 4 and are incorporated herein by reference. Thus, the method recited in claim 4 is met by Kumar, Lin, and Gaither.

Regarding claim 5, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 1.
The combination of Kumar, Lin, and Gaither further teaches, The method of claim 1 (see claim 1 above), further comprising: further receiving, from the text segmentation model based on the image, metadata regarding characters in the text in the image (Lin, Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Lin, Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing”; Lin, Para. 0020: text has different variables including font style and font size which may vary the confidence scores for each detected character between different recognition engines; Note: the Examiner interprets font style/font size as metadata of the text. The text of the relevant regions is of various font styles/font sizes and the regions are detected by an algorithm to output a binary mask of the region. Therefore, the font style/font size is received from the algorithm during OCR processing; Kumar, Col. 4, line 46 to Col. 5, line 43: image of an invoice has text of different sizes with varying font characteristics (i.e., font types); Kumar: Fig. 3); 
and providing the metadata as further input to the OCR engine along with the per-pixel image segmentation map (Gaither: As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Lin, Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Lin, Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing”; Lin, Para. 0019: text is recognized by recognition engines; Lin, Para. 0020: text has different variables including font style and font size which may vary the confidence scores for each detected character between different recognition engines; Note: As shown in Lin, since the binary mask of the text regions (which includes text of various font styles/font sizes) is communicated to an OCR for further processing, the metadata is provided with the binary mask). 
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 1 apply to claim 5 and are incorporated herein by reference. Thus, the method recited in claim 5 is met by Kumar, Lin, and Gaither.

Regarding claim 8, Kumar teaches, A non-transitory machine-readable medium including instructions that, when executed by a machine, cause the machine to perform operations for optical character recognition (OCR), the operations comprising (Fig. 1: OCR 100; Col. 3, lines 4-6: “systems and methods are disclosed that employ a trained deep-neural net  for OCR of documents”; Col. 3, lines 49-59: computerized digitization system employing a deep-learning based OCR system; Col. 9, lines 24-60: the computing system includes and memory and processing units in which the processing units execute computer executable instructions. The storage medium may be non-transitory):
providing an image including text as input to a text segmentation model (Fig. 1: document image 101 is input into automatic segmentation 106; Col. 1, lines 54-64: there are synthetic text segments and real-life text segments in which the real-life segments are extracted out of documents; Col. 4, lines 9-24: automatic segmentation module breaks up a document image into sub-images of characters and words and are processed by the OCR system), the text segmentation model is a machine learning (ML) model trained in a supervised manner (Col. 3, lines 56-66: deep learning system employs machine learning algorithms to perform OCR, in which the algorithms may learn in a supervised manner; Col. 4, lines 9-24: automatic segmentation module breaks up a document image into sub-images of characters and words and are processed by the OCR system);
Kumar does not expressly disclose the following limitations: receiving, from the text segmentation model, a per-pixel image segmentation map of the image, the per-pixel image segmentation map comprising a binary, pixel-wise text/non-text classification mask output by the text segmentation model, the binary, pixel-wise text/non-text classification mask indicates, for each pixel, whether the pixel is classified, by the text segmentation model, as including a text character or not including a text character; providing the per-pixel image segmentation map as input to an OCR engine in place of image pixels of the image; and receiving, as output from the OCR engine based on the per-pixel image segmentation map, a digitized version of the image. 
However, Lin teaches, receiving, from the text segmentation model, a comprising a binaryindicates(Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation; Fig. 3: In image 302, the text is in black; Note: a binary mask consists of only two values (i.e., black and white). The Examiner interprets the relevant regions containing text as text characters);
providing the  in place of  (Para. 0018: The binary mask can be converted into a black white representation, and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing. In accordance with various embodiments, the binary mask is provided to a first recognition engine 104a, a second recognition engine 104b, and an nth recognition engine 104n for concurrent character recognition processing in a multithreaded mode; Note: the Examiner interprets the binary mask being provided to the OCR/recognition engine as providing the binary mask in place of the image); 
and receiving, as output from the OCR engine based on the (Para. 0002: “Optical character recognition (OCR) systems are generally used to detect text present in an image and to convert the detected text into its equivalent electronic representation”; Para. 0017; Para. 0018: The binary mask can be converted into a black white representation, and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing. In accordance with various embodiments, the binary mask is provided to a first recognition engine 104a, a second recognition engine 104b, and an nth recognition engine 104n for concurrent character recognition processing in a multithreaded mode; Para. 0025: an Optical character recognition service is accessed for communicating digital information). 
It would have been obvious before the effective filing date of the claimed invention, to one of ordinary skill in the art, to combine receiving a binary image segmentation mask output by the text segmentation model (i.e., algorithm) that indicates text characters, providing the image segmentation map as input to an OCR in place of the image, and receiving a digitized image as output from the OCR based on the image segmentation map as taught by Lin with the character recognition of Kumar in order to improve text recognition precision (Lin, Para. 0002). Therefore, one of ordinary skill in the art would be capable to have combined the elements as claimed by known methods and that in combination, each element merely performs the same function as it does separately. 
The combination of Kumar and Lin does not expressly disclose the following limitations: a per-pixel image segmentation map of the image, the per-pixel image segmentation map comprising a pixel-wise text/non-text classification, the pixel-wise text/non-text classification indicates, for each pixel, whether the pixel is classified, or not including a text character; the per-pixel image segmentation map; image pixels of the image; the per-pixel image segmentation map. 
However, Gaither teaches, a per-pixel image segmentation map of the image, the per-pixel image segmentation map comprising a pixel-wise text/non-text classification, the pixel-wise text/non-text classification indicates, for each pixel, whether the pixel is classified, or not including a text character (As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Note: the Examiner interprets non-text as the background); 
the per-pixel image segmentation map (As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1); 
image pixels of the image (As shown in Col. 5, lines 11-42, the text image has pixels that represent text and background); 
the per-pixel image segmentation map (As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1). 
It would have been obvious before the effective filing date of the claimed invention, to one of ordinary skill in the art, to combine the image segmentation map being a per-pixel image segmentation map (i.e., bitmap) that classifies whether a pixel does not indicate text (i.e., is the background), and the image having pixels as taught by Gaither with the combined character recognition of Kumar and Lin in order to generate a representation of the lexical unit for display on an output device (Gaither, Abstract). Therefore, one of ordinary skill in the art would be capable to have combined the elements as claimed by known methods and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned that the Examiner has reached a conclusion of obviousness with respect to claim 8.

Regarding claim 9, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 8.
The combination of Kumar, Lin, and Gaither further teaches, The non-transitory machine-readable medium of claim 8 (see claim 8 above), wherein the per-pixel image segmentation map includes one of only two values, a first value or a second value for each pixel of the image, the first value different from the second value (Lin, Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Lin, Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation; Gaither: As shown in Col, 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1).  
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 8 apply to claim 9 and are incorporated herein by reference. Thus, the method recited in claim 9 is met by Kumar, Lin, and Gaither.


Regarding claim 10, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 9.
The combination of Kumar, Lin, and Gaither further teaches, The non-transitory machine-readable medium of claim 9 (see claim 9 above), wherein the first value indicates the pixel is part of any character in the image and the second value indicates the pixel is not part of any character in the image (Gaither: Col. 1, lines 6-16; Gaither: As shown in Col, 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Note: the Examiner interprets the background (i.e. pixel value of 1) as not being part of the text/character).  
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 9 apply to claim 10 and are incorporated herein by reference. Thus, the method recited in claim 10 is met by Kumar, Lin, and Gaither.

Regarding claim 11, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 10.
The combination of Kumar, Lin, and Gaither further teaches, The non-transitory machine-readable medium of claim 10 (see claim 10 above), wherein the first value corresponds to a dark color and the second value corresponds to a light color (Lin, Para. 0018: “The binary mask can be converted into a black white representation, and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing”;  Lin, Fig. 3: In image 302, the text is in black; Gaither: As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Note: text is black (i.e., a dark color and pixel value of 0), background is white (i.e., light color and value of 1)).  
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 10 apply to claim 11 and are incorporated herein by reference. Thus, the method recited in claim 11 is met by Kumar, Lin, and Gaither.

Regarding claim 12, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 8.
The combination of Kumar, Lin, and Gaither further teaches, The non-transitory machine-readable medium of claim 8 (see claim 8 above), wherein the operations further comprise: further receiving, from the text segmentation model based on the image, metadata regarding characters in the text in the image (Lin, Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Lin, Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing”; Lin, Para. 0020: text has different variables including font style and font size which may vary the confidence scores for each detected character between different recognition engines; Note: the Examiner interprets font style/font size as metadata of the text. The text of the relevant regions is of various font styles/font sizes and the regions are detected by an algorithm to output a binary mask of the region. Therefore, the font style/font size is received from the algorithm during OCR processing; Kumar, Col. 4, line 46 to Col. 5, line 43: image of an invoice has text of different sizes with varying font characteristics (i.e., font types); Kumar: Fig. 3); 
and providing the metadata as further input to the OCR engine along with the per-pixel image segmentation map (Gaither: As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Lin, Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Lin, Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing”; Lin, Para. 0019: text is recognized by recognition engines; Lin, Para. 0020: text has different variables including font style and font size which may vary the confidence scores for each detected character between different recognition engines; Note: As shown in Lin, since the binary mask of the text regions (which includes text of various font styles/font sizes) is communicated to an OCR for further processing, the metadata is provided with the binary mask). 
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 8 apply to claim 12 and are incorporated herein by reference. Thus, the method recited in claim 12 is met by Kumar, Lin, and Gaither.

Regarding claim 15, Kumar teaches, A system for optical character recognition (OCR), the system comprising (Fig. 1: OCR 100; Col. 3, lines 4-6: “systems and methods are disclosed that employ a trained deep-neural net for OCR of documents”; Col. 3, lines 49-59: computerized digitization system employing a deep-learning based OCR system):
processing circuitry (Col. 9, lines 24-60: the computing system includes processing units in which the processing units execute computer executable instructions);
a memory including instructions that, when executed by the processing circuitry, cause the processing circuitry to perform operations comprising (Col. 9, lines 24-60: the computing system includes and memory and processing units in which the processing units execute computer executable instructions):
providing an image including text as input to a text segmentation model (Fig. 1: document image 101 is input into automatic segmentation 106; Col. 1, lines 54-64: there are synthetic text segments and real-life text segments in which the real-life segments are extracted out of documents; Col. 4, lines 9-24: automatic segmentation module breaks up a document image into sub-images of characters and words and are processed by the OCR system), the text segmentation model is a machine learning (ML) model trained in a supervised manner(Col. 3, lines 56-66: deep learning system employs machine learning algorithms to perform OCR, in which the algorithms may learn in a supervised manner; Col. 4, lines 9-24: automatic segmentation module breaks up a document image into sub-images of characters and words and are processed by the OCR system);
Kumar does not expressly disclose the following limitations: receiving, from the text segmentation model, a per-pixel image segmentation map of the image, the per-pixel image segmentation map comprising a binary, pixel-wise text/non-text classification mask output by the text segmentation model, the binary, pixel-wise text/non-text classification mask indicates, for each pixel, whether the pixel is classified, by the text segmentation model, as including a text character or not including a text character; providing the per-pixel image segmentation map as input to an OCR engine in place of image pixels of the image; and receiving, as output from the OCR engine based on the per-pixel image segmentation map, a digitized version of the image. 
However, Lin teaches, receiving, from the text segmentation model, a comprising a binaryindicates(Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation; Fig. 3: In image 302, the text is in black; Note: a binary mask consists of only two values (i.e., black and white). The Examiner interprets the relevant regions containing text as text characters);
providing the  in place of  (Para. 0018: The binary mask can be converted into a black white representation, and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing. In accordance with various embodiments, the binary mask is provided to a first recognition engine 104a, a second recognition engine 104b, and an nth recognition engine 104n for concurrent character recognition processing in a multithreaded mode; Note: the Examiner interprets the binary mask being provided to the OCR/recognition engine as providing the binary mask in place of the image); 
and receiving, as output from the OCR engine based on the (Para. 0002: “Optical character recognition (OCR) systems are generally used to detect text present in an image and to convert the detected text into its equivalent electronic representation”; Para. 0017; Para. 0018: The binary mask can be converted into a black white representation, and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing. In accordance with various embodiments, the binary mask is provided to a first recognition engine 104a, a second recognition engine 104b, and an nth recognition engine 104n for concurrent character recognition processing in a multithreaded mode; Para. 0025: an Optical character recognition service is accessed for communicating digital information). 
It would have been obvious before the effective filing date of the claimed invention, to one of ordinary skill in the art, to combine receiving a binary image segmentation mask output by the text segmentation model (i.e., algorithm) that indicates text characters, providing the image segmentation map as input to an OCR in place of the image, and receiving a digitized image as output from the OCR based on the image segmentation map as taught by Lin with the character recognition of Kumar in order to improve text recognition precision (Lin, Para. 0002). Therefore, one of ordinary skill in the art would be capable to have combined the elements as claimed by known methods and that in combination, each element merely performs the same function as it does separately. 
The combination of Kumar and Lin does not expressly disclose the following limitations: a per-pixel image segmentation map of the image, the per-pixel image segmentation map comprising a pixel-wise text/non-text classification, the pixel-wise text/non-text classification indicates, for each pixel, whether the pixel is classified, or not including a text character; the per-pixel image segmentation map; image pixels of the image; the per-pixel image segmentation map. 
However, Gaither teaches, a per-pixel image segmentation map of the image, the per-pixel image segmentation map comprising a pixel-wise text/non-text classification, the pixel-wise text/non-text classification indicates, for each pixel, whether the pixel is classified, or not including a text character (As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Note: the Examiner interprets non-text as the background); 
the per-pixel image segmentation map (As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1); 
image pixels of the image (As shown in Col. 5, lines 11-42, the text image has pixels that represent text and background); 
the per-pixel image segmentation map (As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1). 
It would have been obvious before the effective filing date of the claimed invention, to one of ordinary skill in the art, to combine the image segmentation map being a per-pixel image segmentation map (i.e., bitmap) that classifies whether a pixel does not indicate text (i.e., is the background), and the image having pixels as taught by Gaither with the combined character recognition of Kumar and Lin in order to generate a representation of the lexical unit for display on an output device (Gaither, Abstract). Therefore, one of ordinary skill in the art would be capable to have combined the elements as claimed by known methods and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned that the Examiner has reached a conclusion of obviousness with respect to claim 15.

Regarding claim 16, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 15.
The combination of Kumar, Lin, and Gaither further teaches, The system of claim 15 (see claim 15 above), wherein the per-pixel image segmentation map includes one of only two values, a first value and a second value for each pixel of the image, the first value different from the second value (Lin, Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Lin, Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation; Gaither: As shown in Col, 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1).  
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 15 apply to claim 16 and are incorporated herein by reference. Thus, the method recited in claim 16 is met by Kumar, Lin, and Gaither.

Regarding claim 17, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 16.
The combination of Kumar, Lin, and Gaither further teaches, The system of claim 16 (see claim 16 above), wherein the first value indicates the pixel is part of any character in the image and the second value indicates the pixel is not part of any character in the image (Gaither: Col. 1, lines 6-16; Gaither: As shown in Col, 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Note: the Examiner interprets the background (i.e. pixel value of 1) as not being part of the text/character).  
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 16 apply to claim 17 and are incorporated herein by reference. Thus, the method recited in claim 17 is met by Kumar, Lin, and Gaither.
 
Regarding claim 18, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 17.
The combination of Kumar, Lin, and Gaither further teaches, The system of claim 17 (see claim 17 above), wherein the first value corresponds to a dark color and the second value corresponds to a light color (Lin, Para. 0018: “The binary mask can be converted into a black white representation, and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing”;  Lin, Fig. 3: In image 302, the text is in black; Gaither: As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Note: text is black (i.e., a dark color and pixel value of 0), background is white (i.e., light color and value of 1)).  
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 17 apply to claim 18 and are incorporated herein by reference. Thus, the method recited in claim 18 is met by Kumar, Lin, and Gaither.
 
Regarding claim 19, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 15.
The combination of Kumar, Lin, and Gaither further teaches, The system of claim 15 (see claim 15 above), wherein the operations further comprise: further receiving, from the text segmentation model based on the image, metadata regarding characters in the text in the image (Lin, Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Lin, Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing”; Lin, Para. 0020: text has different variables including font style and font size which may vary the confidence scores for each detected character between different recognition engines; Note: the Examiner interprets font style/font size as metadata of the text. The text of the relevant regions is of various font styles/font sizes and the regions are detected by an algorithm to output a binary mask of the region. Therefore, the font style/font size is received from the algorithm during OCR processing; Kumar, Col. 4, line 46 to Col. 5, line 43: image of an invoice has text of different sizes with varying font characteristics (i.e., font types); Kumar: Fig. 3); 
and providing the metadata as further input to the OCR engine along with the per-pixel image segmentation map (Gaither: As shown in Col. 5, lines 27-42, the bitmap has pixels set to given values, in which pixels of text are set to 0 and pixels representing the background are set to 1; Lin, Para. 0017: text is detected in the image using algorithms that detect and recognize the location of text in the image and the region of the image that includes text can be selected or cropped to remove irrelevant portions of the image and to highlight relevant regions containing text. The relevant regions are binarized; Lin, Para. 0018: “In various embodiments, detecting text in the image 102 can include locating regions of extremes (e.g., regions of sharp transitions between pixel values) such as the edges of letters. The regions of extremes, or the maximally stable extremal regions, can be extracted and analyzed to detect characters, where the detected characters can be connected and/or aggregated. A text line algorithm can be used to determine the orientation of the connected characters, and once the orientation of the characters is determined, a binary mask of the region containing the characters can be extracted. The binary mask can be converted into a black white representation and the black white representation can be communicated to an optical character recognition engine (OCR) or other text recognition engine for further processing”; Lin, Para. 0019: text is recognized by recognition engines; Lin, Para. 0020: text has different variables including font style and font size which may vary the confidence scores for each detected character between different recognition engines; Note: As shown in Lin, since the binary mask of the text regions (which includes text of various font styles/font sizes) is communicated to an OCR for further processing, the metadata is provided with the binary mask). 
The proposed combination as well as the motivation for combining the Kumar, Lin, and Gaither references presented in the rejection of claim 15 apply to claim 19 and are incorporated herein by reference. Thus, the method recited in claim 19 is met by Kumar, Lin, and Gaither.


Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar et al. (US 10,489,682 B1, hereinafter “Kumar”) in view of Lin et al. (US 2015/0254507 A1, hereinafter “Lin”) and further in view of Gaither (US 6,771,816 B1) and Nicholson et al. (US 5,729,637; hereinafter “Nicholson”).
Regarding claim 6, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 5.
The combination of Kumar, Lin, and Gaither further teaches, The method of claim 5 (see claim 5 above), wherein (Lin, Para. 0020: text has different variables including font style and font size; Kumar, Col. 5, lines 5-43: Fonts can vary widely with different font types being used, and the grids of pixel values for text characters may differ due to varied fonts and sizes; Kumar: Fig. 3; Kumar, Col. 8, lines 9-52).  
The combination of Kumar, Lin, and Gaither does not expressly disclose the following limitation: the metadata indicates a font type.
However, Nicholson teaches, the metadata indicates a font type (Col. 2, lines 5-8: characters in fonts; As shown in Col. 16, lines 27-28, there are font attributes of the input bitmap; Col. 16, lines 62-65: the font type that the character belongs to is determined; Col. 17, lines 37-66; Figs. 8 and 8a; Col. 18, lines 3-32: a typeface is assigned to each font type number until all the recognized words can be associated with a standard available typeface. The typeface and font attributes are stored with the characters).
It would have been obvious before the effective filing date of the claimed invention, to one of ordinary skill in the art, to combine the font type being indicated by metadata as taught by Nicholson with the combined character recognition of Kumar, Lin, and Gather in order to accurately render and view words (Nicholson, Col. 16, lines 14-16). Therefore, one of ordinary skill in the art would be capable to have combined the elements as claimed by known methods and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned that the Examiner has reached a conclusion of obviousness with respect to claim 6.

Regarding claim 13, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 12.
The combination of Kumar, Lin, and Gaither further teaches, The non-transitory machine-readable medium of claim 12 (see claim 12 above), wherein (Lin, Para. 0020: text has different variables including font style and font size; Kumar, Col. 5, lines 5-43: Fonts can vary widely with different font types being used, and the grids of pixel values for text characters may differ due to varied fonts and sizes; Kumar: Fig. 3; Kumar, Col. 8, lines 9-52).  
The combination of Kumar, Lin, and Gaither does not expressly disclose the following limitation: the metadata indicates a font type.
However, Nicholson teaches, the metadata indicates a font type (Col. 2, lines 5-8: characters in fonts; As shown in Col. 16, lines 27-28, there are font attributes of the input bitmap; Col. 16, lines 62-65: the font type that the character belongs to is determined; Col. 17, lines 37-66; Figs. 8 and 8a; Col. 18, lines 3-32: a typeface is assigned to each font type number until all the recognized words can be associated with a standard available typeface. The typeface and font attributes are stored with the characters).
It would have been obvious before the effective filing date of the claimed invention, to one of ordinary skill in the art, to combine the font type being indicated by metadata as taught by Nicholson with the combined character recognition of Kumar, Lin, and Gather in order to accurately render and view words (Nicholson, Col. 16, lines 14-16). Therefore, one of ordinary skill in the art would be capable to have combined the elements as claimed by known methods and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned that the Examiner has reached a conclusion of obviousness with respect to claim 13.

Regarding claim 20, The combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 19.
The combination of Kumar, Lin, and Gaither further teaches, The system of claim 19 (see claim 19 above), wherein (Lin, Para. 0020: text has different variables including font style and font size; Kumar, Col. 5, lines 5-43: Fonts can vary widely with different font types being used, and the grids of pixel values for text characters may differ due to varied fonts and sizes; Kumar: Fig. 3; Kumar, Col. 8, lines 9-52; Note: since there is an “or” statement, the Examiner selects the font type limitation for consideration).
The combination of Kumar, Lin, and Gaither does not expressly disclose the following limitation: the metadata indicates a font type.
However, Nicholson teaches, the metadata indicates a font type (Col. 2, lines 5-8: characters in fonts; As shown in Col. 16, lines 27-28, there are font attributes of the input bitmap; Col. 16, lines 62-65: the font type that the character belongs to is determined; Col. 17, lines 37-66; Figs. 8 and 8a; Col. 18, lines 3-32: a typeface is assigned to each font type number until all the recognized words can be associated with a standard available typeface. The typeface and font attributes are stored with the characters).
It would have been obvious before the effective filing date of the claimed invention, to one of ordinary skill in the art, to combine the font type being indicated by metadata as taught by Nicholson with the combined character recognition of Kumar, Lin, and Gather in order to accurately render and view words (Nicholson, Col. 16, lines 14-16). Therefore, one of ordinary skill in the art would be capable to have combined the elements as claimed by known methods and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned that the Examiner has reached a conclusion of obviousness with respect to claim 20.


Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar et al. (US 10,489,682 B1, hereinafter “Kumar”) in view of Lin et al. (US 2015/0254507 A1, hereinafter “Lin”) and further in view of Gaither (US 6,771,816 B1) and Yuan et al. (US 2023/0073932 A1; hereinafter “Yuan”).
Regarding claim 7, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 5.
The combination of Kumar, Lin, and Gaither further teaches, The method of claim 5 (see claim 5 above), wherein (Kumar, Col. 3, lines 6-8: “black-and-white segments from English language invoices”; Kumar, Col. 4, lines 4-8: English language invoices are used as the domain of interest).  
The combination of Kumar, Lin, and Gaither does not expressly disclose the following limitation: the metadata indicates a language of the text.
However, Yuan teaches, the metadata indicates a language of the text (Para. 0079: “each word or character may be stored with a small amount of metadata that identifies the language it corresponds to”; Paras. 0081-0083).
It would have been obvious before the effective filing date of the claimed invention, to one of ordinary skill in the art, to combine metadata including the language of each text/character as taught by Yuan with the combined character recognition of Kumar, Lin, and Gaither in order to achieve rapid language detection for scanned documents (Yuan, Para. 0079). Therefore, one of ordinary skill in the art would be capable to have combined the elements as claimed by known methods and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned that the Examiner has reached a conclusion of obviousness with respect to claim 7.

Regarding claim 14, the combination of Kumar, Lin, and Gaither teaches the limitations as explained above in claim 5.
The combination of Kumar, Lin, and Gaither further teaches, The non-transitory machine-readable medium of claim 12 (see claim 12 above), wherein (Kumar, Col. 3, lines 6-8: “black-and-white segments from English language invoices”; Kumar, Col. 4, lines 4-8: English language invoices are used as the domain of interest).   
The combination of Kumar, Lin, and Gaither does not expressly disclose the following limitation: the metadata indicates a language of the text.
However, Yuan teaches, the metadata indicates a language of the text (Para. 0079: “each word or character may be stored with a small amount of metadata that identifies the language it corresponds to”; Paras. 0081-0083).
It would have been obvious before the effective filing date of the claimed invention, to one of ordinary skill in the art, to combine metadata including the language of each text/character as taught by Yuan with the combined character recognition of Kumar, Lin, and Gaither in order to achieve rapid language detection for scanned documents (Yuan, Para. 0079). Therefore, one of ordinary skill in the art would be capable to have combined the elements as claimed by known methods and that in combination, each element merely performs the same function as it does separately. It is for at least the aforementioned that the Examiner has reached a conclusion of obviousness with respect to claim 14.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Hoehne et al. (US 2020/0082218 A1)
“A Deep Learning Approach for Text Segmentation in Document Analysis” by V-L Pham et al.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Daniella M. DiGuglielmo whose telephone number is (571)272-0183. The examiner can normally be reached Monday - Friday 8:00 AM - 4:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emily Terrell can be reached at (571)270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Daniella M. DiGuglielmo/Examiner, Art Unit 2666                             

/EMILY C TERRELL/Supervisory Patent Examiner, Art Unit 2666
Read full office action
Prosecution Timeline

Show 6 earlier events
Aug 25, 2025
Response Filed
Nov 03, 2025
Final Rejection — §103
Jan 07, 2026
Response after Non-Final Action
Jan 20, 2026
Request for Continued Examination
Jan 22, 2026
Response after Non-Final Action
Jan 29, 2026
Non-Final Rejection — §103
Apr 24, 2026
Applicant Interview (Telephonic)
Apr 24, 2026
Examiner Interview Summary
Precedent Cases

Applications granted by this same examiner with similar technology

17/578,465
Patent 12608900
LEARNING APPARATUS, LEARNING METHOD, AND LEARNING PROGRAM, REGION-OF-INTEREST EXTRACTION APPARATUS, REGION-OF-INTEREST EXTRACTION METHOD, AND REGION-OF-INTEREST EXTRACTION PROGRAM, AND LEARNED EXTRACTION MODEL
4y 3m to grant Granted Apr 21, 2026
18/074,942
Patent 12586401
SYSTEMS AND METHODS FOR REPRESENTING AND SEARCHING CHARACTERS
3y 3m to grant Granted Mar 24, 2026
18/014,172
Patent 12567228
IMAGE DATA PROCESSING METHOD, IMAGE DATA PROCESSING APPARATUS, AND COMMERCIAL USE
3y 2m to grant Granted Mar 03, 2026
18/522,372
Patent 12567266
IMAGE RECOGNITION SYSTEM AND IMAGE RECOGNITION METHOD
2y 3m to grant Granted Mar 03, 2026
17/895,617
Patent 12555372
IMAGE SENSOR EVALUATION METHOD USING COMPUTING DEVICE INCLUDING PROCESSOR
3y 5m to grant Granted Feb 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
81%
Grant Probability
99%
With Interview (+25.9%)
2y 7m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 174 resolved cases by this examiner. Grant probability derived from career allowance rate.