DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments, see Applicant's Remarks, pages 5-8, filed 02/13/26, with respect to the rejection(s) of claim(s) 1-4 and 9-18 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Kakrana et al (US 2021/0110150).
Regarding claim 1, Applicant has amended the claim to recite, "wherein detecting the plurality of text clusters uses a text cluster detection algorithm executed by the computer processor, wherein the text cluster detection algorithm applies a morphological transformation on the bounding boxes along one or more axes". Kakrana et al teaches that data node logic 244a (text cluster detection algorithm) is configured to detect the plurality of closed-shaped data nodes (plurality of text clusters) and localize the text enclosed within the plurality of closed-shaped data nodes; the data node logic 244a may store a variety of algorithms for this purpose including, but not limited to, a canny edge detection algorithm and morphological transformation (paragraph 0035). Note: a plurality of closed-shaped data nodes may represent any geometrical or non-geometrical shape having text enclosed (paragraph 0040). Kakrana et al also teaches morphological transformation through one or more rounds of erosion to contract the white foreground objects using an appropriate structural element or kernel size, and (v) edge detection using a canny edge detection algorithm to highlight the horizontal lines and vertical lines to adaptively highlight the geometrical edges of each of the plurality of closed-shaped data nodes and defocus the text enclosed within the plurality of closed-shaped data nodes (paragraph 0042), which teaches that the morphological transformation on the data nodes (bounding boxes) is performed along horizontal and vertical lines (axes).
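For illustration only (not part of the record), the axis-wise morphological transformation discussed above can be sketched in Python. The function name and the pure-numpy dilation are hypothetical stand-ins for the kernel-based operations Kakrana describes:

```python
import numpy as np

def dilate_axis(mask, axis, reach):
    """Binary dilation along a single axis: each foreground pixel
    spreads `reach` cells in both directions along `axis` only.
    (np.roll wraps at the border; a production version would mask
    the wrapped edge, which is harmless in this small example.)"""
    out = mask.copy()
    for shift in range(1, reach + 1):
        out |= np.roll(mask, shift, axis=axis)
        out |= np.roll(mask, -shift, axis=axis)
    return out

# Two words on the same text line, separated by a small horizontal gap.
page = np.zeros((3, 10), dtype=bool)
page[1, 1:4] = True   # first word
page[1, 6:9] = True   # second word

# Dilating along axis 1 (horizontal) bridges the gap, merging the two
# words into one connected text cluster; the vertical axis is untouched.
merged = dilate_axis(page, axis=1, reach=2)
```

Dilating along one axis at a time is what lets such an algorithm group characters into word clusters horizontally without merging adjacent text lines vertically.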
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4 and 11-19 are rejected under 35 U.S.C. 103 as being unpatentable over Patel et al (US 2021/0201018) in view of Zhong et al (US 2021/0286989), further in view of Sisto et al (US 2021/0342399), further in view of Loudon et al (US 6,891,971), further in view of Kakrana et al (US 2021/0110150).
Regarding claim 1, Patel et al teaches a system for recognizing a relevant value from an unstructured document (system 102 (paragraph 0030)), where a computer processor performs the operations of:
receiving an unstructured document as input (the formats of the unstructured documents may be distinct owing to the fact that the documents may include invoices, bills and so on received from distinct third parties, billers and so on, each following a distinct format. In an embodiment, preprocessing of the documents may be performed by a document preprocessor 304 (paragraph 0032));
detecting a plurality of text clusters in the unstructured document (unstructured documents may initially be preprocessed to identify sections in the page images. Bounding boxes may be generated around such sections (paragraph 0032));
generating, by an optical character recognition (OCR) module, a plurality of text outputs from the plurality of text clusters, wherein each text cluster corresponds to a respective text output (the bounding boxes may be generated by utilizing an HOCR technique that formats OCR output of a page and provides the location of every word, which is treated as a section. HOCR may be utilized for labels which have single word values such as invoice number, invoice date, and so on (paragraph 0032) Note: the labels that are generated using the HOCR technique are generated using single words, which are text. The labels read on text clusters since a label can contain several words or text);
generating a new text output by performing a further OCR operation on the merged text clusters (in logo classification, for the logo with text only, optical character recognition (OCR) of the logo region may be performed directly to obtain text therefrom, and then the text may be interpreted to identify the source (e.g., vendor/company name) to which said unstructured document belongs (paragraph 0039) Note: the OCR result of the logo (merged text clusters) is read as the new text output);
using the pre-trained question-answering model to obtain a revised answer from the classified new text output (for logo classification of logos with image pattern and text, and logos with image pattern only, a collection of reference logo images is created and features are extracted therefrom. Extraction models may be used to extract such features. When a logo image is to be classified, a cosine vector similarity may be calculated between the reference logo feature vector and the new logo feature vector, taking one reference logo image at a time. The cosine value between the two vectors measures the similarity between the reference logo image and the new logo image (which is to be classified). By taking the maximum of these calculated cosine similarity values, the logo may be classified into a particular category (paragraph 0036));
and extract a revised final answer, based on the revised initial answer, to be presented as a new extracted value to be associated with the field (if the bounding box includes a logo, the logo may be extracted by using a deep learning model pre-trained to extract logo from the unstructured document (paragraph 0036)).
Patel et al teaches logo detection: a logo region is detected from the unstructured documents, and known models such as the YOLO (You Only Look Once) object detection model may be utilized to detect the logo, where a CNN deep neural network model may be trained to detect the logo as an object from an invoice image (paragraph 0038). Note: the CNN is trained to detect a logo. A logo consists of text, or an image and text, so a CNN model can detect text as a logo (see also paragraphs 0039-0040). Patel et al also teaches classifying the logo by vendor or company name (paragraph 0039).
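For illustration only (not part of the record), the cosine-similarity logo classification that Patel describes in paragraph 0036 can be sketched as follows; the feature vectors and vendor names are hypothetical:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_logo(new_vec, reference_vecs):
    """Score the new logo's feature vector against each reference
    vector, one at a time, and take the category with the maximum
    cosine similarity, as paragraph 0036 describes."""
    scores = {name: cosine_similarity(new_vec, ref)
              for name, ref in reference_vecs.items()}
    return max(scores, key=scores.get), scores

# Hypothetical reference feature vectors, one per known vendor.
references = {
    "vendor_a": np.array([1.0, 0.0, 0.5]),
    "vendor_b": np.array([0.0, 1.0, 0.5]),
}
new_logo = np.array([0.9, 0.1, 0.4])  # features of the logo to classify
best, scores = classify_logo(new_logo, references)
```

In a real system the feature vectors would come from a pre-trained extraction model rather than being hand-written.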
Patel et al fails to teach classifying the plurality of text outputs using a natural language processing algorithm configured to classify text;
using a pre-trained question-answering model to obtain an initial answer from one or more of the classified plurality of text outputs;
classifying the new text output using the natural language processing algorithm configured to classify text;
Zhong et al teaches classifying the plurality of text outputs using a natural language processing algorithm configured to classify text (the QA sub-model is configured and arranged to use the natural language processing algorithms 432 and machine learning algorithms 434 to analyze the features and data structures of the labeled format types (paragraph 0064));
using a pre-trained question-answering model to obtain an initial answer from one or more of the classified plurality of text outputs (the QA sub-model is configured and arranged to use the natural language processing algorithms 432 and machine learning algorithms 434 to analyze the features and data structures of the labeled format types in order to provide answers to open-ended natural language questions (paragraph 0064));
classifying the new text output using the natural language processing algorithm configured to classify text (the modified CA sub-model 440 is configured and arranged to electronically and automatically perform research using the categorization and classification of speech. The modified CA model 440 can be arranged to use the natural language processing algorithms 432 and machine learning algorithms 434 to analyze the features and data structures of the labeled format types in order to provide answers to open-ended natural language questions (paragraph 0064)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify Patel et al to include classifying the plurality of text outputs using a natural language processing algorithm configured to classify text and using a pre-trained question-answering model to obtain an initial answer from one or more of the classified plurality of text outputs.
The motivation for doing so would be to obtain a correct and specific output of the desired process.
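Purely as an illustration of the classify-then-answer flow mapped above (not the actual Zhong model), a toy keyword-based stand-in might look like the following; the labels and rules are hypothetical placeholders for the NLP classifier and pre-trained QA model:

```python
def classify_text(text):
    """Toy classifier: label a text output by simple keyword rules,
    standing in for the NLP classification algorithm."""
    lowered = text.lower()
    if "invoice" in lowered and any(ch.isdigit() for ch in lowered):
        return "invoice_field"
    return "other"

def initial_answer(question, classified_outputs):
    """Return the first text output whose class matches the field the
    question asks about -- the 'initial answer' of the claim language."""
    target = "invoice_field" if "invoice" in question.lower() else "other"
    for text, label in classified_outputs.items():
        if label == target:
            return text
    return None

outputs = ["Invoice No. 4471", "123 Main Street", "Total due"]
classified = {t: classify_text(t) for t in outputs}
answer = initial_answer("What is the invoice number?", classified)
```

The point of the sketch is only the ordering of the two steps: outputs are classified first, and the question-answering step then draws its initial answer from the classified outputs.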
Patel et al in view of Zhong et al fails to teach extract a final answer, based on the initial answer, to be presented as an extracted value to be associated with a corresponding field.
Sisto et al teaches extract a final answer, based on the initial answer, to be presented as an extracted value to be associated with a corresponding field (the modified CA sub-model 440 includes a standard CA sub-model that has been trained (or pre-trained) on its conventional CA functionality and then modified with additional neural network functionality that is trained to electronically read and analyze the numerical representations of extracted features generated by the auxiliary sub-models 438 (paragraph 0064)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify Patel et al in view of Zhong et al to include: extract a final answer, based on the initial answer, to be presented as an extracted value to be associated with a corresponding field.
The motivation for doing so would be to obtain a correct and specific output of the desired process; in other words, to accurately identify a specific text or image in a document.
Patel et al in view of Zhong et al, further in view of Sisto et al, fails to teach wherein each of the text clusters is bounded within a contour of a respective bounding box, and wherein two neighboring bounding boxes are merged based on proximity of individual neighboring bounding box coordinates;
checking if a desired value is extracted as the final answer from an initial set of text clusters with their respective bounding boxes; and responsive to determining that the desired value is not extracted from the initial set of text clusters with their respective bounding boxes, creating compound bounding boxes by merging one or more neighboring bounding boxes, thereby merging the corresponding text clusters.
Loudon et al teaches wherein each of the text clusters is bounded within a contour of a respective bounding box, and wherein two neighboring bounding boxes are merged based on proximity of individual neighboring bounding box coordinates (bounding boxes 18 and 20 of the previous sub-segments, and bounding boxes 22 and 24 for the new strokes, are determined and used to decide if the new stroke is surrounded by previous strokes. A new stroke 10 contained in the bounding box of a previous sub-segment 12 joins or is merged with that sub-segment 12 (column 3, lines 11-14));
checking if a desired value is extracted as the final answer from an initial set of text clusters with their respective bounding boxes (a check is made to see if the new stroke (final answer) is surrounded by a previous group of strokes, which form a sub-segment (column 3, lines 6-10)); and
responsive to determining that the desired value is not extracted from the initial set of text clusters with their respective bounding boxes, creating compound bounding boxes by merging one or more neighboring bounding boxes, thereby merging the corresponding text clusters (if the new stroke 24 (final answer) is not surrounded by the previous strokes 16, a distance between the centroids 26 and 28 of the bounding boxes 20 and 24 is determined. If the distance is less than a predetermined threshold, then the new stroke 24 is added (merged) to the sub-segment 16 (bounding box) (column 3, lines 10-25 and fig. 8)). Note: if the new stroke is not surrounded by previous strokes, which reads on a non-extracted value, the distance is examined to see if the bounding boxes are close, and then they are added (merged) to the sub-segment.
Therefore, it would have been obvious to one of ordinary skill in the art to modify Patel et al in view of Zhong et al further in view of Sisto et al to include the teaching of Loudon et al wherein checking if a desired value is extracted as the final answer from an initial set of text clusters with their respective bounding boxes; and responsive to determining that the desired value is not extracted from the initial set of text clusters with their respective bounding boxes, creating compound bounding boxes by merging one or more neighboring bounding boxes, thereby merging the corresponding text clusters.
The motivation for doing so would be to reduce the number of bounding boxes to process by merging bounding boxes.
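For illustration only (not part of the record), the centroid-distance merge that Loudon describes at column 3 can be sketched as follows; the coordinates and threshold are hypothetical:

```python
def centroid(box):
    """Center point of an (x_min, y_min, x_max, y_max) bounding box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def merge_if_close(box_a, box_b, threshold):
    """If the centroids of two bounding boxes are within `threshold`,
    return one compound box enclosing both; otherwise return None,
    mirroring the distance-then-merge check Loudon describes."""
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    distance = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
    if distance >= threshold:
        return None
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))

word_1 = (10, 10, 50, 30)   # centroid (30, 20)
word_2 = (55, 10, 95, 30)   # centroid (75, 20); distance = 45
compound = merge_if_close(word_1, word_2, threshold=60)   # merged box
separate = merge_if_close(word_1, word_2, threshold=40)   # None
```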
Patel et al in view of Zhong et al, further in view of Sisto et al, further in view of Loudon et al, fails to teach wherein detecting the plurality of text clusters uses a text cluster detection algorithm executed by the computer processor, wherein the text cluster detection algorithm applies a morphological transformation on the bounding boxes along one or more axes.
Kakrana et al teaches wherein detecting the plurality of text clusters uses a text cluster detection algorithm executed by the computer processor, wherein the text cluster detection algorithm applies a morphological transformation on the bounding boxes along one or more axes (data node logic 244a (text cluster detection algorithm) is configured to detect the plurality of closed-shaped data nodes (plurality of text clusters) and localize the text enclosed within the plurality of closed-shaped data nodes; the data node logic 244a may store a variety of algorithms for this purpose including, but not limited to, a canny edge detection algorithm and morphological transformation (paragraph 0035) Note: a plurality of closed-shaped data nodes may represent any geometrical or non-geometrical shape having text enclosed (paragraph 0040). Morphological transformation through one or more rounds of erosion and (v) edge detection using a canny edge detection algorithm highlight the horizontal lines and vertical lines to adaptively highlight the geometrical edges of each of the plurality of closed-shaped data nodes (paragraph 0042), which teaches that the morphological transformation on the data nodes (bounding boxes) is performed along horizontal and vertical lines (axes)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify Patel et al in view of Zhong et al, further in view of Sisto et al, further in view of Loudon et al, to include the teaching of Kakrana et al wherein detecting the plurality of text clusters uses a text cluster detection algorithm executed by the computer processor, wherein the text cluster detection algorithm applies a morphological transformation on the bounding boxes along one or more axes.
The motivation for doing so would be to reduce the number of bounding boxes to process by merging bounding boxes.
Regarding claim 2, Patel et al teaches wherein detecting the text clusters is agnostic of a layout or format of the unstructured document (determining the layout of the document may refer to dissecting the page image into sections and identifying geometric and/or spatial layout of elements of the page image. the layout of the page images may be obtained by initially converting the page image to grey scale so that entire contents/elements of the page image are neutralized to black and white (paragraph 0033)).
Regarding claim 3, Patel et al teaches wherein the unstructured document is an unstructured image document in its entirety, or the unstructured document has a portion that is an unstructured image document (the formats of the unstructured documents may be distinct owing to the fact that the documents may include invoices, bills and so on (paragraph 0032)).
Regarding claim 4, Patel et al teaches wherein each of the text clusters is a smaller image within the unstructured image document (to determine the label value for each label, the bounding boxes may be assessed to determine whether the label text (label name and/or label values) is present in the bounding boxes (paragraph 0036)). Note: in order for the label text to fit in the bounding boxes, the text would have to be smaller than the bounding box, thus making the label text smaller than the unstructured image document.
Regarding claim 11, Patel et al in view of Zhong et al further in view of Sisto et al further in view of Loudon et al further in view of Kakrana et al teaches wherein the morphological transformation is an iterative process that, in each iteration, creates an intermediate set of bounding boxes from a previous set of bounding boxes (Loudon et al: bounding boxes 18 and 20 of the previous sub-segments, and bounding boxes 22 and 24 for the new strokes, are determined and used to decide if the new stroke is surrounded by previous strokes. A new stroke 10 contained in the bounding box of a previous sub-segment 12 joins or is merged with that sub-segment 12 (column 3, lines 11-14)).
Regarding claim 12, Patel et al in view of Zhong et al further in view of Sisto et al further in view of Loudon et al further in view of Kakrana et al teaches wherein in one or more iterations, the morphological transformation along a selected axis applies a dilation rate that is faster or slower than a dilation rate along another axis (Kakrana et al: morphological transformation through one or more rounds of dilation to expand the white foreground objects using an appropriate structural element or kernel size, (iv) morphological transformation through one or more rounds of erosion to contract the white foreground objects using an appropriate structural element or kernel size, and (v) edge detection using a canny edge detection algorithm to highlight the horizontal lines, vertical lines and edges of the white foreground objects (paragraph 0042)).
Regarding claim 13, Patel et al in view of Zhong et al further in view of Sisto et al further in view of Loudon et al further in view of Kakrana et al teaches wherein relative dilation rates along different axes can be scaled up or down by predetermined factors (Patel et al: the page image may then be dilated to increase the thickness of content which is in white colour. Dilation may be done repetitively so that optimal intended areas may be grouped together. Bounding boxes are drawn or generated around the edge of the white colored elements at 206 (FIG. 2A). For every element (bounding box), key features as well as features (at 310, FIG. 3A) depicting location with respect to geometry of the bounding box may be saved. These features may include, but are not limited to, i) unique identification number given by the system; ii) part image section contained in the bounding box; iii) location—x, y location of the left, top point of the bounding box; iv) size—height, width; v) centroid location—x, y location of the centroid of the bounding box (paragraph 0034)). Note: the dilation is repeated based on the number of intended areas, which would determine the rate at which the dilation occurs.
Regarding claim 14, Patel et al in view of Zhong et al further in view of Sisto et al further in view of Loudon et al further in view of Kakrana et al teaches wherein dilation scaling can be applied to axes of the bounding boxes sequentially (Patel et al: dilation may be done repetitively so that optimal intended areas may be grouped together. Bounding boxes are drawn or generated around the edge of the white colored elements at 206 (FIG. 2A). For every element (bounding box), key features as well as features (at 310, FIG. 3A) depicting location with respect to geometry of the bounding box may be saved. These features may include, but are not limited to, i) unique identification number given by the system; ii) part image section contained in the bounding box; iii) location—x, y location of the left, top point of the bounding box; iv) size—height, width; v) centroid location—x, y location of the centroid of the bounding box (paragraph 0034)).
Regarding claim 15, Patel et al in view of Zhong et al further in view of Sisto et al further in view of Loudon et al further in view of Kakrana et al teaches wherein dilation scaling can be applied to both axes of the bounding boxes in parallel (Patel et al: dilation may be done repetitively so that optimal intended areas may be grouped together. Bounding boxes are drawn or generated around the edge of the white colored elements at 206 (FIG. 2A). For every element (bounding box), key features as well as features (at 310, FIG. 3A) depicting location with respect to geometry of the bounding box may be saved. These features may include, but are not limited to, i) unique identification number given by the system; ii) part image section contained in the bounding box; iii) location—x, y location of the left, top point of the bounding box; iv) size—height, width; v) centroid location—x, y location of the centroid of the bounding box (paragraph 0034)). Note: the x and y locations of the bounding box are processed in parallel.
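For illustration only (not part of the record), differing dilation rates along the two axes, as discussed for claims 12-15, can be sketched by composing one-dimensional dilations with different reaches; the numpy-based helpers are hypothetical stand-ins for kernel-based morphological operations:

```python
import numpy as np

def dilate_1d(mask, axis, reach):
    """One-dimensional binary dilation along a single axis (np.roll
    wraps at the border, which is harmless in this small example)."""
    out = mask.copy()
    for s in range(1, reach + 1):
        out |= np.roll(mask, s, axis=axis) | np.roll(mask, -s, axis=axis)
    return out

def dilate_rect(mask, reach_y, reach_x):
    """Composing the two 1-D dilations is equivalent to dilating with
    a rectangular kernel; choosing reach_x > reach_y makes the
    horizontal axis grow faster than the vertical axis."""
    return dilate_1d(dilate_1d(mask, axis=1, reach=reach_x),
                     axis=0, reach=reach_y)

mask = np.zeros((7, 7), dtype=bool)
mask[3, 3] = True            # a single foreground pixel
grown = dilate_rect(mask, reach_y=1, reach_x=2)
# The pixel becomes a 3-row by 5-column block: the horizontal axis
# dilated at twice the vertical rate.
```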
Regarding claim 16, Patel et al in view of Zhong et al further in view of Sisto et al further in view of Loudon et al further in view of Kakrana et al teaches wherein the intermediate set of bounding boxes are fed to the OCR module to obtain an array of text samples for natural language processing (Zhong et al: the pre-training operations applied to the layout recognition sub-model 510 include using the pre-training data 502 (specifically, the bounding box (Bbox) data that identifies a particular location on an image) and the PDF document 412A to train the layout recognition sub-model 510 to perform its unique set of analysis operations. The pre-training operations applied to the OCR sub-model 520 include using text-related supervisions from the pre-training data 502 to train the OCR sub-model 520 to perform its unique set of analysis operations (paragraph 0078)).
Regarding claim 17, Patel et al in view of Zhong et al further in view of Sisto et al further in view of Loudon et al further in view of Kakrana et al teaches where the pre-trained question answering model is used to fetch one or more relevant answers corresponding to each of the plurality of text outputs (Sisto et al: given a user query, the system first returns a relatively small set of the user's documents that might have an answer to the query, and then the neural filter is applied to the sentences within that set to identify only those sentences that are then examined to determine a final result. Once the small number of sentences 808 is identified by the neural filter, only those sentences 808 are then queried (with the user query) to return the answer (final answer) (paragraphs 0026 and 0034)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify Patel et al in view of Zhong et al further in view of Loudon et al to include the teaching of Sisto et al where the pre-trained question answering model is used to fetch one or more relevant answers corresponding to each of the plurality of text outputs.
The motivation for doing so would be to reduce the amount of time needed to process a document by using a model to retrieve specific and desired answers.
Regarding claim 18, Patel et al in view of Zhong et al further in view of Sisto et al further in view of Loudon et al further in view of Kakrana et al teaches where an output of the question-answering model is passed through one or more rule-based filters to obtain the final answer (Sisto et al: given a user query, the system first returns a relatively small set of the user's documents that might have an answer (initial answer) to the query, and then the neural filter is applied to the sentences within that set to identify only those sentences that are then examined to determine a final result. Once the small number of sentences 808 is identified by the neural filter, only those sentences 808 are then queried (with the user query) to return the answer (final answer) (fig. 8 and paragraphs 0026 and 0034)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify Patel et al in view of Zhong et al further in view of Loudon et al further in view of Kakrana et al to include the teaching of Sisto et al where an output of the question-answering model is passed through one or more rule-based filters to obtain the final answer.
The motivation for doing so would be to reduce the amount of time needed to process a document by using a model to retrieve specific and desired answers.
Regarding claim 19, Patel et al teaches a computer-implemented method for recognizing a relevant value from an unstructured document, the method comprising (paragraph 0030):
receiving an unstructured document as input (the formats of the unstructured documents may be distinct owing to the fact that the documents may include invoices, bills and so on received from distinct third parties, billers and so on, each following a distinct format. In an embodiment, preprocessing of the documents may be performed by a document preprocessor 304 (paragraph 0032));
detecting a plurality of text clusters in the unstructured document (unstructured documents may initially be preprocessed to identify sections in the page images. Bounding boxes may be generated around such sections (paragraph 0032));
generating, by an optical character recognition (OCR) module, a plurality of text outputs from the plurality of text clusters, wherein each text cluster corresponds to a respective text output (the bounding boxes may be generated by utilizing an HOCR technique that formats OCR output of a page and provides the location of every word, which is treated as a section. HOCR may be utilized for labels which have single word values such as invoice number, invoice date, and so on (paragraph 0032) Note: the labels that are generated using the HOCR technique are generated using single words, which are text. The labels read on text clusters since a label can contain several words or text);
generating a new text output by performing a further OCR operation on the merged text clusters (in logo classification, for the logo with text only, optical character recognition (OCR) of the logo region may be performed directly to obtain text therefrom, and then the text may be interpreted to identify the source (e.g., vendor/company name) to which said unstructured document belongs (paragraph 0039) Note: the OCR result of the logo (merged text clusters) is read as the new text output);
using the pre-trained question-answering model to obtain a revised answer from the classified new text output (for logo classification of logos with image pattern and text, and logos with image pattern only, a collection of reference logo images is created and features are extracted therefrom. Extraction models may be used to extract such features. When a logo image is to be classified, a cosine vector similarity may be calculated between the reference logo feature vector and the new logo feature vector, taking one reference logo image at a time. The cosine value between the two vectors measures the similarity between the reference logo image and the new logo image (which is to be classified). By taking the maximum of these calculated cosine similarity values, the logo may be classified into a particular category (paragraph 0036));
and extract a revised final answer, based on the revised initial answer, to be presented as a new extracted value to be associated with the field (if the bounding box includes a logo, the logo may be extracted by using a deep learning model pre-trained to extract logo from the unstructured document (paragraph 0036)).
Patel et al teaches logo detection: a logo region is detected from the unstructured documents, and known models such as the YOLO (You Only Look Once) object detection model may be utilized to detect the logo, where a CNN deep neural network model may be trained to detect the logo as an object from an invoice image (paragraph 0038). Note: the CNN is trained to detect a logo. A logo consists of text, or an image and text, so a CNN model can detect text as a logo (see also paragraphs 0039-0040). Patel et al also teaches classifying the logo by vendor or company name (paragraph 0039).
Patel et al fails to teach classifying the plurality of text outputs using a natural language processing algorithm configured to classify text;
using a pre-trained question-answering model to obtain an initial answer from one or more of the classified plurality of text outputs;
classifying the new text output using the natural language processing algorithm configured to classify text;
Zhong et al teaches classifying the plurality of text outputs using a natural language processing algorithm configured to classify text (the QA sub-model is configured and arranged to use the natural language processing algorithms 432 and machine learning algorithms 434 to analyze the features and data structures of the labeled format types (paragraph 0064));
using a pre-trained question-answering model to obtain an initial answer from one or more of the classified plurality of text outputs (the QA sub-model is configured and arranged to use the natural language processing algorithms 432 and machine learning algorithms 434 to analyze the features and data structures of the labeled format types in order to provide answers to open-ended natural language questions (paragraph 0064));
classifying the new text output using the natural language processing algorithm configured to classify text (the modified CA sub-model 440 is configured and arranged to electronically and automatically perform research using the categorization and classification of speech. The modified CA model 440 can be arranged to use the natural language processing algorithms 432 and machine learning algorithms 434 to analyze the features and data structures of the labeled format types in order to provide answers to open-ended natural language questions (paragraph 0064)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify Patel et al to include classifying the plurality of text outputs using a natural language processing algorithm configured to classify text and using a pre-trained question-answering model to obtain an initial answer from one or more of the classified plurality of text outputs.
The motivation for doing so would be to obtain a correct and specific output of the desired process.
Patel et al in view of Zhong et al fails to teach extract a final answer, based on the initial answer, to be presented as an extracted value to be associated with a corresponding field.
Sisto et al teaches extract a final answer, based on the initial answer, to be presented as an extracted value to be associated with a corresponding field (the modified CA sub-model 440 includes a standard CA sub-model that has been trained (or pre-trained) on its conventional CA functionality and then modified with additional neural network functionality that is trained to electronically read and analyze the numerical representations of extracted features generated by the auxiliary sub-models 438 (paragraph 0064)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify Patel et al in view of Zhong et al to include: extract a final answer, based on the initial answer, to be presented as an extracted value to be associated with a corresponding field.
The reason for doing so would be to obtain a correct and specific output of the desired process; in other words, to accurately identify a specific text or image in a document.
Patel et al in view of Zhong et al further in view of Sisto et al fails to teach wherein each of the text clusters is bounded within a contour of a respective bounding box, and wherein two neighboring bounding boxes are merged based on proximity of individual neighboring bounding box coordinates;
checking if a desired value is extracted as the final answer from an initial set of text clusters with their respective bounding boxes; and responsive to determining that the desired value is not extracted from the initial set of text clusters with their respective bounding boxes, creating compound bounding boxes by merging one or more neighboring bounding boxes, thereby merging the corresponding text clusters.
Loudon et al teaches wherein each of the text clusters is bounded within a contour of a respective bounding box, and wherein two neighboring bounding boxes are merged based on proximity of individual neighboring bounding box coordinates (Bounding boxes 18 and 20 of the previous sub-segments, and bounding boxes 22 and 24 for the new strokes, are determined and used to decide if the new stroke is surrounded by previous strokes. A new stroke 10 contained in the bounding box of a previous sub-segment 12 joins or is merged with that sub-segment 12 (column 3, lines 11-14));
checking if a desired value is extracted as the final answer from an initial set of text clusters with their respective bounding boxes (a check is made to see if the new stroke (final answer) is surrounded by a previous group of strokes, which form a sub-segment (column 3, lines 6-10)); and
responsive to determining that the desired value is not extracted from the initial set of text clusters with their respective bounding boxes, creating compound bounding boxes by merging one or more neighboring bounding boxes, thereby merging the corresponding text clusters (If the new stroke 24 (final answer) is not surrounded by the previous strokes 16, a distance between the centroids 26 and 28 of the bounding boxes 20 and 24 is determined. If the distance is less than a predetermined threshold, then the new stroke 24 is added (merged) to the sub-segment 16 (bounding box) (column 3, lines 10-25 and fig. 8)). Note: if the new stroke is not surrounded by previous strokes, which reads on the non-extracted value, the distance is examined to see if the bounding boxes are close, and if so the new stroke is added (merged) to the sub-segment.
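For illustration only (not part of the prosecution record): the centroid-distance merging test Loudon describes can be sketched as follows. The function `merge_if_close`, its `(x_min, y_min, x_max, y_max)` box representation, and the returned compound box are hypothetical choices for illustration, not Loudon's actual implementation.

```python
import math

def centroid(box):
    # box = (x_min, y_min, x_max, y_max)
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def merge_if_close(box_a, box_b, threshold):
    """If the distance between the two boxes' centroids is below
    `threshold`, return one compound bounding box enclosing both
    (mirroring the centroid-distance test of column 3); otherwise
    return None to keep the boxes separate."""
    if math.dist(centroid(box_a), centroid(box_b)) < threshold:
        return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
                max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))
    return None  # too far apart; do not merge
```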
Therefore, it would have been obvious to one of ordinary skill in the art to modify Patel et al in view of Zhong et al further in view of Sisto et al to include the teaching of Loudon et al wherein checking if a desired value is extracted as the final answer from an initial set of text clusters with their respective bounding boxes; and responsive to determining that the desired value is not extracted from the initial set of text clusters with their respective bounding boxes, creating compound bounding boxes by merging one or more neighboring bounding boxes, thereby merging the corresponding text clusters.
The reason for doing so would be to reduce the number of bounding boxes to process by merging bounding boxes.
Patel et al in view of Zhong et al further in view of Sisto et al further in view of Loudon et al fails to teach wherein detecting the plurality of text clusters uses a text cluster detection algorithm executed by the computer processor, wherein the text cluster detection algorithm applies a morphological transformation on the bounding boxes along one or more axes.
Kakrana et al teaches wherein detecting the plurality of text clusters uses a text cluster detection algorithm executed by the computer processor, wherein the text cluster detection algorithm applies a morphological transformation on the bounding boxes along one or more axes (data node logic 244a (text cluster detection algorithm) is configured to detect the plurality of closed-shaped data nodes (plurality of text clusters) and localize the text enclosed within the plurality of closed-shaped data nodes. The data node logic 244a may store a variety of algorithms for this purpose including, but not limited to, canny edge detection algorithm and morphological transformation (paragraph 0035). Note: a plurality of closed-shaped data nodes representing any geometrical or non-geometrical shape having text enclosed (paragraph 0040). Kakrana et al further teaches morphological transformation through one or more rounds of erosion and (v) edge detection using canny edge detection algorithm to highlight the horizontal lines and vertical lines to adaptively highlight the geometrical edges of each of the plurality of closed-shaped data nodes (paragraph 0042)), which teaches that the morphological transformation on the data nodes (bounding boxes) is performed along horizontal and vertical lines (axes).
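For illustration only (not part of the prosecution record): one round of binary erosion applied along a single axis, of the kind Kakrana attributes to the morphological transformation step, can be sketched as follows. The function `erode_axis`, its kernel-size parameter, and the border handling (the structuring-element window is clipped at the image edge rather than zero-padded) are hypothetical choices for illustration, not Kakrana's actual implementation.

```python
def erode_axis(grid, kernel=3, axis=1):
    """One round of binary erosion on a 0/1 grid along one axis.
    axis=1 uses a horizontal 1-by-`kernel` structuring element
    (contracts foreground along horizontal lines); axis=0 uses a
    vertical `kernel`-by-1 element (contracts along vertical lines).
    A pixel survives only if every in-bounds pixel under the
    structuring element is foreground (1)."""
    h, w = len(grid), len(grid[0])
    r = kernel // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if axis == 1:   # horizontal structuring element
                window = [grid[y][x + d] for d in range(-r, r + 1)
                          if 0 <= x + d < w]
            else:           # vertical structuring element
                window = [grid[y + d][x] for d in range(-r, r + 1)
                          if 0 <= y + d < h]
            out[y][x] = 1 if all(window) else 0
    return out
```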
Therefore, it would have been obvious to one of ordinary skill in the art to modify Patel et al in view of Zhong et al further in view of Sisto et al further in view of Loudon et al to include the teaching of Kakrana et al wherein detecting the plurality of text clusters uses a text cluster detection algorithm executed by the computer processor, wherein the text cluster detection algorithm applies a morphological transformation on the bounding boxes along one or more axes.
The reason for doing so would be to accurately detect text clusters and localize the text enclosed within their bounding boxes.
Conclusion
Any inquiry concerning this communication should be directed to Michael Burleson, whose telephone number is (571) 272-7460 and fax number is (571) 273-7460. The examiner can normally be reached Monday through Friday from 8:00 a.m. to 4:30 p.m. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Akwasi Sarpong, can be reached at (571) 270-3438.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Michael Burleson
Patent Examiner
Art Unit 2683
February 28, 2026
/MICHAEL BURLESON/
/AKWASI M SARPONG/SPE, Art Unit 2681 3/9/2026