Office Action Analysis: 18696981 — LAYOUT ANALYSIS SYSTEM, LAYOUT ANALYSIS METHOD, AND PROGRAM

Office Action

§101 §103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/28/2024 has been considered by the examiner.
Preliminary Amendment
The Preliminary Amendment submitted on 03/28/2024 has been entered and made of record.
Status of Claims
Currently pending claims: 1-15
Amended claims: 1-3, 5-13, and 15

Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. 
Claim Objections
Claims 1, 14, and 15 are objected to because of the following informalities: claims 1 and 15, line 3, recite “detect a plurality of cells from in a document” and claim 14, line 2, recites “detecting a plurality of cells from in a document image”. Each phrase contains two consecutive prepositions where only one is needed for clarity; it could read “from a document” or “in a document”. Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 2, and 9-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., abstract idea - mental process) without significantly more. 
Step (1) Are the claims directed to a process, machine, manufacture, or composition of matter; 
Step (2A) Prong One: Are the claims directed to a judicially recognized exception, i.e., a law of nature, a natural phenomenon, or an abstract idea; 
Prong Two: If the claims are directed to a judicial exception under Prong One, then is the judicial exception integrated into a practical application; 
Step (2B) If the claims are directed to a judicial exception and do not integrate the judicial exception, do the claims provide an inventive concept.

Step 1:
	Claim 1 recites a system. Therefore, the claims are directed to the statutory categories of machine.
Step 2A:
Prong One:
	Claim 1 recites:
“detect a plurality of cells from in a document image showing a document including a plurality of components”. Under its broadest reasonable interpretation in light of the specification, the first limitation encompasses the mental process of detecting multiple cells from a document which is practically capable of being performed in the human mind with the assistance of pen and paper.
“analyze a layout relating to the document based on the cell information on each of the plurality of cells.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of analyzing the layout based on the cell information, which is practically capable of being performed in the human mind with the assistance of pen and paper.
Prong Two:
This judicial exception is not integrated into a practical application. The additional elements of “A layout analysis system, comprising at least one processor configured to” and “acquire cell information relating to at least one of a row or a column of each of the plurality of cells based on coordinates of each of the plurality of cells” amount to no more than mere necessary data gathering and applying because it is simply using hardware, or a generic computer as a tool to perform the abstract idea. Thus, they are insignificant extra-solution activity. Even when viewed in combination, these additional elements do not integrate the abstract idea into a practical application and the claims are thus directed to the abstract idea.
Step 2B:
	Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. “A layout analysis system, comprising at least one processor configured to” and “acquire cell information relating to at least one of a row or a column of each of the plurality of cells based on coordinates of each of the plurality of cells” amount to no more than mere data gathering and applying with a general purpose computer. These elements, individually and in combination, are well-understood, routine, conventional activity. As such, the claim is ineligible.

Step 1: 
Claims 2 and 9-13 recite a system. Therefore, the claims are directed to the statutory categories of machine.
Step 2A: 
Prong One: 
Claims 2 and 9-13 merely narrow the previously recited abstract idea limitations. For the reasons described above, this judicial exception is not meaningfully integrated into a practical application, and the claims do not amount to significantly more than the abstract idea. The claims recite limitations similar to those described for the independent claims above and do not provide anything more than the mental process and mathematical calculation that are practically capable of being performed in the human mind with the assistance of pen and paper.
Prong Two: 
These judicial exceptions are not integrated into a practical application, nor do the claims include additional elements that are sufficient to amount to significantly more. Thus, the claims are ineligible.

Step 1:
	Claim 14 recites a method. Therefore, the claims are directed to the statutory categories of process.
Step 2A:
Prong One:
	Claim 14 recites:
“detecting a plurality of cells from in a document image showing a document including a plurality of components”. Under its broadest reasonable interpretation in light of the specification, the first limitation encompasses the mental process of detecting multiple cells from a document which is practically capable of being performed in the human mind with the assistance of pen and paper.
“analyzing a layout relating to the document based on the cell information on each of the plurality of cells”. Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of analyzing the layout based on the cell information, which is practically capable of being performed in the human mind with the assistance of pen and paper.
Prong Two:
This judicial exception is not integrated into a practical application. The additional elements of “A layout analysis method” and “acquiring cell information relating to at least one of a row or a column of each of the plurality of cells based on coordinates of each of the plurality of cells” amount to no more than mere necessary data gathering and applying because it is simply using hardware, or a generic computer as a tool to perform the abstract idea. Thus, they are insignificant extra-solution activity. Even when viewed in combination, these additional elements do not integrate the abstract idea into a practical application and the claims are thus directed to the abstract idea.
Step 2B:
	Claim 14 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. “A layout analysis method” and “acquiring cell information relating to at least one of a row or a column of each of the plurality of cells based on coordinates of each of the plurality of cells” amount to no more than mere data gathering and applying with a general purpose computer. These elements, individually and in combination, are well-understood, routine, conventional activity. As such, the claim is ineligible.

Step 1:
	Claim 15 recites a non-transitory storage medium. Therefore, the claims are directed to the statutory categories of manufacture.
Step 2A:
Prong One:
	Claim 15 recites:
“detect a plurality of cells from in a document image showing a document including a plurality of components”. Under its broadest reasonable interpretation in light of the specification, the first limitation encompasses the mental process of detecting multiple cells from a document which is practically capable of being performed in the human mind with the assistance of pen and paper.
“analyze a layout relating to the document based on the cell information on each of the plurality of cells.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of analyzing the layout based on the cell information, which is practically capable of being performed in the human mind with the assistance of pen and paper.
Prong Two:
This judicial exception is not integrated into a practical application. The additional elements of “A non-transitory computer-readable information storage medium for storing a program for causing a computer to” and “acquire cell information relating to at least one of a row or a column of each of the plurality of cells based on coordinates of each of the plurality of cells” amount to no more than mere necessary data gathering and applying because it is simply using hardware, or a generic computer as a tool to perform the abstract idea. Thus, they are insignificant extra-solution activity. Even when viewed in combination, these additional elements do not integrate the abstract idea into a practical application and the claims are thus directed to the abstract idea.
Step 2B:
	Claim 15 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. “A non-transitory computer-readable information storage medium for storing a program for causing a computer to” and “acquire cell information relating to at least one of a row or a column of each of the plurality of cells based on coordinates of each of the plurality of cells” amount to no more than mere data gathering and applying with a general purpose computer. These elements, individually and in combination, are well-understood, routine, conventional activity. As such, the claim is ineligible.

Dependent claims 3-8 recite additional elements that, when considered individually and as an ordered combination, amount to significantly more than the abstract idea identified in claim 1. Specifically, these claims include technical features such as:
“analyze the layout by arranging the cell information on each of the plurality of cells under a predetermined condition, inputting the arranged cell information to the learning model, and acquiring a result of analysis of the layout by the learning model.” The additional elements of claim 3 do not recite an abstract idea and all additional elements capture the improvements seen in Application Pub. Paragraphs [0042]-[0043].
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-4, 14, and 15 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-4, 13, and 14 of copending Application No. 18/696,991 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the Copending Application are broader than independent claims 1, 14 and 15 of the Instant Application.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Table 1: Claims of 18/696,981 (Instant Application) alongside the corresponding claims of 18/696,991 (Copending Application)

Instant Application, Claim 1:
A layout analysis system, comprising at least one processor configured to: detect a plurality of cells from in a document image showing a document including a plurality of components; acquire cell information relating to at least one of a row or a column of each of the plurality of cells based on coordinates of each of the plurality of cells; and analyze a layout relating to the document based on the cell information on each of the plurality of cells.
Copending Application, Claim 1:
A layout analysis system, comprising at least one processor configured to: detect a cell of each of a plurality of scales from in a document image showing a document including a plurality of components; acquire cell information relating to the cell of each of the plurality of scales; and analyze a layout relating to the document based on the cell information on each of the plurality of scales.

Instant Application, Claim 2:
The layout analysis system according to claim 1, wherein the at least one processor is configured to analyze the layout based on a learning model which has learned a for-training layout relating to a for-training document.
Copending Application, Claim 2:
The layout analysis system according to claim 1, wherein the at least one processor is configured to analyze the layout based on a learning model which has learned a for-training layout relating to a for-training document.

Instant Application, Claim 3:
The layout analysis system according to claim 2, wherein the at least one processor is configured to analyze the layout by arranging the cell information on each of the plurality of cells under a predetermined condition, inputting the arranged cell information to the learning model, and acquiring a result of analysis of the layout by the learning model.
Copending Application, Claim 3:
The layout analysis system according to claim 2, wherein the at least one processor is configured to analyze the layout by arranging the cell information on each of the plurality of scales under a predetermined condition, inputting the arranged cell information to the learning model, and acquiring a result of analysis of the layout by the learning model.

Instant Application, Claim 4:
The layout analysis system according to claim 3, wherein the learning model is a Vision Transformer-based model.
Copending Application, Claim 4:
The layout analysis system according to claim 3, wherein the learning model is a Vision Transformer-based model.

Instant Application, Claim 14:
A layout analysis method, comprising: detecting a plurality of cells from in a document image showing a document including a plurality of components; acquiring cell information relating to at least one of a row or a column of each of the plurality of cells based on coordinates of each of the plurality of cells; and analyzing a layout relating to the document based on the cell information on each of the plurality of cells.
Copending Application, Claim 13:
A layout analysis method, comprising: detecting a cell of each of a plurality of scales from in a document image showing a document including a plurality of components; acquiring cell information relating to the cell of each of the plurality of scales; and analyzing a layout relating to the document based on the cell information on each of the plurality of scales.

Instant Application, Claim 15:
A non-transitory computer-readable information storage medium for storing a program for causing a computer to detect a plurality of cells from in a document image showing a document including a plurality of components; acquire cell information relating to at least one of a row or a column of each of the plurality of cells based on coordinates of each of the plurality of cells; and analyze a layout relating to the document based on the cell information on each of the plurality of cells.
Copending Application, Claim 14:
A non-transitory computer-readable information storage medium for storing a program for causing a computer detect a cell of each of a plurality of scales from in a document image showing a document including a plurality of components; acquire cell information relating to the cell of each of the plurality of scales; and analyze a layout relating to the document based on the cell information on each of the plurality of scales.

The table (Table 1) above shows that independent claims 1, 14 and 15 of the Instant Application are not identical to the claims of 18/696,991 (Copending Application); however, the claims are not patentably distinct.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, 6, 9, 10, 12, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Chatzistamatiou et al. (US 2023/0410543 A1) (hereinafter, “Chatzistamatiou”) in view of Yang et al. (CN 113,343,740 B) (hereinafter, “Yang”).
Regarding claim 1, Chatzistamatiou discloses a layout analysis system (Paragraph [0008] “instructions cause the processor to map each item of data extracted from a cell in the first table to a field using semantic data understanding, and to generate a first digital table representing data extracted from the first table for presentation in a user interface.”), comprising at least one processor configured to: 
detect a plurality of cells from in a document image showing a document including a plurality of components (Paragraph [0026] “the proposed techniques offer an end-to-end solution toward the organization of a set of documents based on similar characteristics. In particular, documents processed by the disclosed extraction system may be generated by photography or scanning of physical documents.”; Paragraph [0031] “data from the image is extracted, even where there are no boundaries for the tables or lists (“boundaryless”). In one embodiment, column segmentation is performed based on signal analysis on column wise mean pixel values, line detection based on Computer Vision (CV) techniques, and clustering models. Furthermore, row segmentation is performed using OCR bounding boxes.”); 
Figure 4C: (image media_image1.png, greyscale, not reproduced)
acquire cell information relating to at least one of a row or a column of each of the plurality of cells [based on coordinates of each of the plurality of cells] (Paragraph [0049] “In the first stage 610, signal analysis labeling is used to demarcate the scanned document 702 as shown, with a plurality of horizontal lines 780 and a plurality of vertical lines 770. The plurality of horizontal lines 780 are used to automatically identify each row (e.g., shown as “row 0”, “row 1”, “row 2”. . . “row 14”) by the system. In addition, row classification is performed, labeling a first section 710 (“other”), a second section 720 (“header”), and a third section 730 (“table”).”)
Figure 7: (image media_image2.png, greyscale, not reproduced)
; and 
analyze a layout relating to the document based on the cell information on each of the plurality of cells (Paragraph [0029] “the proposed extraction techniques and systems can be understood to operate as part of a larger text analysis paradigm.”; Paragraph [0047] “CRF models are used to perform row/column classification (rather than word classification). For rows the task is to cluster into three classes: “header row”, “table row”, and an “other row”. Based on this, the position of the actual table on the page can be determined accurately. The CRF model was selected as it considers the context of information, rather than just a single aspect of data at a time. In other words, the model will attempt to predict a certain goal based not only on the individual row content being focused on, but also on the previous (above) and next (below) row. This larger view allows for improved labeling of each row.”; Paragraph [0071] “a document reader and data extraction system 1014 (or system 1014), according to an embodiment. The environment 1000 may include a plurality of components capable of performing the disclosed method of table or list recognition, row and column segmentation, table localization, and data mapping and visualization.”).
However, Chatzistamatiou fails to teach that the cell information is acquired “based on coordinates of each of the plurality of cells”.
Yang teaches acquiring cell information “based on coordinates of each of the plurality of cells” (Paragraph [051] “coordinates of the four vertices of the multiple cells are determined directly according to the corresponding center point coordinates of the multiple cells and the distances between the four vertices and the center point coordinates. The coordinates of the four vertices of the cell define a plurality of cell regions.”).
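For illustration only, the operation quoted from Yang, Paragraph [051], can be sketched as follows. This is a minimal sketch and not Yang's implementation: the function name `cell_vertices` is hypothetical, and it assumes that each "distance" between a vertex and the center point is represented as a signed (dx, dy) offset, which the quoted passage does not specify.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cell_vertices(center: Point, offsets: List[Point]) -> List[Point]:
    """Return the four vertices of a cell from its center and per-vertex offsets.

    Assumption: each vertex-to-center "distance" is a signed (dx, dy) offset.
    The encoding actually used in the cited reference may differ.
    """
    if len(offsets) != 4:
        raise ValueError("expected exactly four vertex offsets")
    cx, cy = center
    # Each vertex is the center point shifted by its own offset.
    return [(cx + dx, cy + dy) for dx, dy in offsets]


# Hypothetical cell: center at (50, 20), 40 units wide and 10 units tall.
vertices = cell_vertices(
    (50.0, 20.0),
    [(-20.0, -5.0), (20.0, -5.0), (20.0, 5.0), (-20.0, 5.0)],
)
```

The four resulting points bound one cell region, which is the sense in which the quoted passage says the vertex coordinates "define a plurality of cell regions".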
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chatzistamatiou’s reference to acquire the cell information based on coordinates of each of the plurality of cells, as taught by Yang’s reference. The motivation for doing so would have been to define the regions of the plurality of cells as suggested by Yang (see Yang, Paragraph [051]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Yang with Chatzistamatiou to obtain the invention specified in claim 1.
Regarding claim 2, in which claim 1 is incorporated, Chatzistamatiou discloses wherein the at least one processor is configured to analyze the layout based on a learning model which has learned a for-training layout relating to a for-training document (Paragraph [0047] “A set of training data was generated to train the CRF model to classify the rows and columns with such precision before model deployment.”).
Regarding claim 3, in which claim 2 is incorporated, Chatzistamatiou discloses wherein the at least one processor is configured to analyze the layout by arranging the cell information on each of the plurality of cells under a predetermined condition (Paragraph [0006] “where this has been applied is in museum archiving where documents may not be categorically organized by document type. In addition, the system offers highly accurate table segmentation, where columns are differentiated based on the signal analysis of column-wise mean pixel values, and rows are differentiated based on the textboxes from OCR results.”; Paragraph [0047] “CRF models are used to perform row/column classification (rather than word classification). For rows the task is to cluster into three classes: “header row”, “table row”, and an “other row”. Based on this, the position of the actual table on the page can be determined accurately. The CRF model was selected as it considers the context of information, rather than just a single aspect of data at a time. In other words, the model will attempt to predict a certain goal based not only on the individual row content being focused on, but also on the previous (above) and next (below) row. This larger view allows for improved labeling of each row.”), inputting the arranged cell information to the learning model (Paragraph [0033] “in a fourth step 140, data is mapped to the correct corresponding field utilizing semantic data understanding. This is done even in the absence of headers identifying the information. Semantic data understanding can also be used to train the machine learning models to recognize certain types of information (e.g., is a number a date or prisoner number, is a column referring to occupation or birthplace, etc.). Using this understanding, data in the document can be mapped back to a specific format.”), and acquiring a result of analysis of the layout by the learning model (Paragraph [0047] “A set of training data was generated to train the CRF model to classify the rows and columns with such precision before model deployment.”).
Regarding claim 5, in which claim 3 is incorporated, Chatzistamatiou discloses wherein the cell information includes a row order in the document image (Paragraph [0049] “In the first stage 610, signal analysis labeling is used to demarcate the scanned document 702 as shown, with a plurality of horizontal lines 780 and a plurality of vertical lines 770. The plurality of horizontal lines 780 are used to automatically identify each row (e.g., shown as “row 0”, “row 1”, “row 2”. . . “row 14”) by the system. In addition, row classification is performed, labeling a first section 710 (“other”), a second section 720 (“header”), and a third section 730 (“table”).”), and
wherein the at least one processor is configured to sort the cell information on each of the plurality of cells based on the row order of the each of the plurality of cells (Paragraph [0049] “In the first stage 610, signal analysis labeling is used to demarcate the scanned document 702 as shown, with a plurality of horizontal lines 780 and a plurality of vertical lines 770. The plurality of horizontal lines 780 are used to automatically identify each row (e.g., shown as “row 0”, “row 1”, “row 2”. . . “row 14”) by the system. In addition, row classification is performed, labeling a first section 710 (“other”), a second section 720 (“header”), and a third section 730 (“table”).”; Figure 7).
However, Chatzistamatiou fails to teach to input [the sorted cell] information to the learning model.
Yang teaches to input [the sorted cell] information to the learning model (Paragraph [0118] “training process, it is first necessary to obtain a training sample image for training the table detection model, and the training sample image includes a table, that is, a table image. After that, it is necessary to label the training sample images according to the requirements of the above five output layers, so as to perform supervised training of the table detection model based on the label information (ie, the supervision information).”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chatzistamatiou’s reference to input [the sorted cell] information to the learning model, as taught by Yang’s reference. The motivation for doing so would have been to improve the detection performance by training the model used in the system as suggested by Yang (see Yang, Paragraph [0113] and Paragraph [0130]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Yang with Chatzistamatiou to obtain the invention specified in claim 5.
Regarding claim 6, in which claim 5 is incorporated, Chatzistamatiou discloses wherein the at least one processor is configured to sort the cell information on each of the plurality of cells based on the row order of the each of the plurality of cells (Paragraph [0049] “In the first stage 610, signal analysis labeling is used to demarcate the scanned document 702 as shown, with a plurality of horizontal lines 780 and a plurality of vertical lines 770. The plurality of horizontal lines 780 are used to automatically identify each row (e.g., shown as “row 0”, “row 1”, “row 2”. . . “row 14”) by the system. In addition, row classification is performed, labeling a first section 710 (“other”), a second section 720 (“header”), and a third section 730 (“table”).”; Figure 7), to insert predetermined row change information into a portion which has a row change (Paragraph [0047] “CRF model was selected as it considers the context of information, rather than just a single aspect of data at a time. In other words, the model will attempt to predict a certain goal based not only on the individual row content being focused on, but also on the previous (above) and next (below) row… CRF model is better equipped to predict whether a line is a header or is actually inside of the table, or if it is another row of data, or outside of the table.”).
However, Chatzistamatiou fails to teach to input the cell information [having the inserted predetermined row change information] to the learning model.
Yang teaches to input the cell information [having the inserted predetermined row change information] to the learning model (Paragraph [0118] “training process, it is first necessary to obtain a training sample image for training the table detection model, and the training sample image includes a table, that is, a table image. After that, it is necessary to label the training sample images according to the requirements of the above five output layers, so as to perform supervised training of the table detection model based on the label information (ie, the supervision information).”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou’s reference to input the cell information [having the inserted predetermined row change information] to the learning model taught by Yang’s reference. The motivation for doing so would have been to improve the detection performance by training the model used in the system as suggested by Yang (see Yang, Paragraph [0113] and Paragraph [0130]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Yang with Chatzistamatiou to obtain the invention specified in claim 6.
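For readers following the claim 6 limitation, the sketch below illustrates one possible reading of "sort the cell information based on the row order" and "insert predetermined row change information". It is not taken from the application, Chatzistamatiou, or Yang. The `ROW_CHANGE` marker string, the `serialize_by_row` name, and the `(row, column, text)` tuple layout are all assumptions made for illustration.

```python
# Hypothetical marker standing in for the "predetermined row change information".
ROW_CHANGE = "[ROW]"


def serialize_by_row(cells):
    """Sort cells by row order and insert a marker wherever the row changes.

    `cells` is a list of (row_index, column_index, text) tuples (assumed layout).
    Returns a flat token list that could be passed on to a learning model.
    """
    ordered = sorted(cells, key=lambda c: (c[0], c[1]))
    tokens = []
    prev_row = None
    for row, _col, text in ordered:
        # A change in row index between consecutive cells marks a row change.
        if prev_row is not None and row != prev_row:
            tokens.append(ROW_CHANGE)
        tokens.append(text)
        prev_row = row
    return tokens
```

Under these assumptions, three cells supplied out of order come back in row order with a single marker between the first and second rows.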
Regarding claim 9, which claim 1 is incorporated, Chatzistamatiou fails to teach wherein the at least one processor is configured to acquire the cell information relating to the row of each of the plurality of cells based on a y-coordinate of the each of the plurality of cells so that cells having a distance from each other in a y-axis direction of less than a threshold value are arranged in the same row.
Yang teaches wherein the at least one processor is configured to acquire the cell information relating to the row of each of the plurality of cells based on a y-coordinate of the each of the plurality of cells so that cells having a distance from each other in a y-axis direction of less than a threshold value are arranged in the same row (Paragraph [079] “It is determined that the distance between the coordinates of the first vertex and the coordinates of the second vertex is less than the second threshold, and the first boundary line and the second boundary line are merged.”; Paragraph [081] “if the distance between A1 and A2 and the distance between B1 and B2 are both smaller than the above-mentioned second threshold, it is considered that A1 and A2 should be merged, and B1 and B2 should be merged. Thus, the boundary line L1 and the boundary line L2 are merged into one boundary line L.”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou’s reference to include wherein the at least one processor is configured to acquire the cell information relating to the row of each of the plurality of cells based on a y-coordinate of the each of the plurality of cells so that cells having a distance from each other in a y-axis direction of less than a threshold value are arranged in the same row taught by Yang’s reference. The motivation for doing so would have been to align the cell columns and rows from the document as suggested by Yang (see Yang, Paragraph [074]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Yang with Chatzistamatiou to obtain the invention specified in claim 9.
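To make the claim 9 limitation concrete, the following sketch groups cells into rows when their y-coordinates differ by less than a threshold. It is an illustrative reading only, not code from the application or from either cited reference. The `Cell` type, the `assign_rows` name, and the choice to compare each cell with the previous cell in y-sorted order are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    text: str
    x: float  # assumed horizontal coordinate of the cell
    y: float  # assumed vertical coordinate of the cell


def assign_rows(cells, threshold):
    """Group cells into rows: a cell joins the current row when its
    y-distance from the previous cell (in y-sorted order) is below `threshold`."""
    rows = []
    prev_y = None
    for cell in sorted(cells, key=lambda c: c.y):
        if prev_y is None or cell.y - prev_y >= threshold:
            rows.append([])  # distance reached the threshold: start a new row
        rows[-1].append(cell)
        prev_y = cell.y
    # Order the cells inside each row from left to right.
    return [sorted(row, key=lambda c: c.x) for row in rows]
```

The same grouping applied to x-coordinates would give the column assignment recited in claim 10.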
Regarding claim 10, which claim 1 is incorporated, Chatzistamatiou fails to teach wherein the at least one processor is configured to acquire the cell information relating to the column of each of the plurality of cells based on an x-coordinate of the each of the plurality of cells so that cells having a distance from each other in an x-axis direction of less than a threshold value are arranged in the same column.
Yang teaches wherein the at least one processor is configured to acquire the cell information relating to the column of each of the plurality of cells based on an x-coordinate of the each of the plurality of cells so that cells having a distance from each other in an x-axis direction of less than a threshold value are arranged in the same column (Paragraph [081] “if the distance between A1 and A2 and the distance between B1 and B2 are both smaller than the above-mentioned second threshold, it is considered that A1 and A2 should be merged, and B1 and B2 should be merged. Thus, the boundary line L1 and the boundary line L2 are merged into one boundary line L.”; Paragraph [084] “It can be understood that, after obtaining multiple cell regions that have undergone boundary correction processing, the adjacency relationship between the two adjacent cell regions can be determined according to the positional relationship between the corresponding center point coordinates of the two adjacent cell regions Left-right adjacency or top-bottom adjacency.”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou’s reference to include wherein the at least one processor is configured to acquire the cell information relating to the column of each of the plurality of cells based on an x-coordinate of the each of the plurality of cells so that cells having a distance from each other in an x-axis direction of less than a threshold value are arranged in the same column taught by Yang’s reference. The motivation for doing so would have been to align the cell columns and rows from the document as suggested by Yang (see Yang, Paragraph [074]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Yang with Chatzistamatiou to obtain the invention specified in claim 10.
Regarding claim 12, which claim 9 is incorporated, Chatzistamatiou fails to teach wherein the at least one processor is configured to determine the threshold value based on a size of each of the plurality of cells.
Yang teaches wherein the at least one processor is configured to determine the threshold value based on a size of each of the plurality of cells (Paragraph [050] “the coordinates refer to the coordinates of the corresponding pixels, and the distance refers to the distance between pixels.”; Paragraph [065-066] “distance between the coordinate of the intersection point of the target candidate line and the coordinate of any vertex is less than the first threshold, and the coordinate of any vertex is updated with the coordinate of the intersection of the target candidate line. The preset distance as the radius is, for example, 20 pixels, and the first threshold is, for example, 10 pixels.”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou’s reference to include wherein the at least one processor is configured to determine the threshold value based on a size of each of the plurality of cells taught by Yang’s reference. The motivation for doing so would have been to align the cell columns and rows from the document as suggested by Yang (see Yang, Paragraph [074]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Yang with Chatzistamatiou to obtain the invention specified in claim 12.
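As a hedged illustration of what "determine the threshold value based on a size of each of the plurality of cells" could mean in practice, the sketch below derives a row threshold from the cell heights. Neither the claim nor the cited paragraphs specify a formula; the use of the median height, the `ratio` parameter, and the function name are assumptions chosen only to show the idea.

```python
from statistics import median


def row_threshold_from_cells(cell_heights, ratio=0.5):
    """Derive a y-distance threshold from cell sizes.

    Assumption: the threshold is a fixed fraction (`ratio`) of the median
    cell height, so it scales with the size of the detected cells.
    """
    if not cell_heights:
        raise ValueError("at least one cell height is required")
    return ratio * median(cell_heights)
```

A size-derived threshold of this kind contrasts with the fixed pixel values (for example, 10 pixels) quoted from Yang above.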
Regarding claim 13, which claim 1 is incorporated, Chatzistamatiou discloses wherein the at least one processor is configured to detect the plurality of cells by executing optical character recognition on the document image (Paragraph [0036] “Document image binarization is often performed in the preprocessing stage of different document image processing related applications such as optical character recognition (OCR) and document image retrieval.”).
Regarding claim 14, Chatzistamatiou discloses a layout analysis method (Paragraph [0008] “instructions cause the processor to map each item of data extracted from a cell in the first table to a field using semantic data understanding, and to generate a first digital table representing data extracted from the first table for presentation in a user interface.”), comprising: detecting a plurality of cells from in a document image showing a document including a plurality of components (Paragraph [0026] “the proposed techniques offer an end-to-end solution toward the organization of a set of documents based on similar characteristics. In particular, documents processed by the disclosed extraction system may be generated by photography or scanning of physical documents.”; Paragraph [0031] “data from the image is extracted, even where there are no boundaries for the tables or lists (“boundaryless”). In one embodiment, column segmentation is performed based on signal analysis on column wise mean pixel values, line detection based on Computer Vision (CV) techniques, and clustering models. Furthermore, row segmentation is performed using OCR bounding boxes.”; Figure 4C);
acquiring cell information relating to at least one of a row or a column of each of the plurality of cells [based on coordinates of each of the plurality of cells] (Paragraph [0049] “In the first stage 610, signal analysis labeling is used to demarcate the scanned document 702 as shown, with a plurality of horizontal lines 780 and a plurality of vertical lines 770. The plurality of horizontal lines 780 are used to automatically identify each row (e.g., shown as “row 0”, “row 1”, “row 2”. . . “row 14”) by the system. In addition, row classification is performed, labeling a first section 710 (“other”), a second section 720 (“header”), and a third section 730 (“table”)”; Figure 7); and
analyzing a layout relating to the document based on the cell information on each of the plurality of cells (Paragraph [0029] “the proposed extraction techniques and systems can be understood to operate as part of a larger text analysis paradigm.”; Paragraph [0047] “CRF models are used to perform row/column classification (rather than word classification). For rows the task is to cluster into three classes: “header row”, “table row”, and an “other row”. Based on this, the position of the actual table on the page can be determined accurately. The CRF model was selected as it considers the context of information, rather than just a single aspect of data at a time. In other words, the model will attempt to predict a certain goal based not only on the individual row content being focused on, but also on the previous (above) and next (below) row. This larger view allows for improved labeling of each row.”; Paragraph [0071] “a document reader and data extraction system 1014 (or system 1014), according to an embodiment. The environment 1000 may include a plurality of components capable of performing the disclosed method of table or list recognition, row and column segmentation, table localization, and data mapping and visualization.”).
However, Chatzistamatiou fails to teach based on coordinates of each of the plurality of cells.
Yang teaches based on coordinates of each of the plurality of cells (Paragraph [051] “coordinates of the four vertices of the multiple cells are determined directly according to the corresponding center point coordinates of the multiple cells and the distances between the four vertices and the center point coordinates. The coordinates of the four vertices of the cell define a plurality of cell regions.”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou’s reference to include based on coordinates of each of the plurality of cells taught by Yang’s reference. The motivation for doing so would have been to define the regions of the plurality of cells as suggested by Yang (see Yang, Paragraph [051]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Yang with Chatzistamatiou to obtain the invention specified in claim 14.
Regarding claim 15, Chatzistamatiou discloses a non-transitory computer-readable information storage medium for storing a program for causing a computer to (Paragraph [0085] “storage components store information and/or software related to the operation and use of the device. For example, storage components may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, and/or a solid state disk), a compact disc (CD), a digital versatile disc (DVD), a floppy disk, a cartridge, a magnetic tape, and/or another type of non-transitory computer-readable medium, along with a corresponding drive.”): 
detect a plurality of cells from in a document image showing a document including a plurality of components (Paragraph [0026] “the proposed techniques offer an end-to-end solution toward the organization of a set of documents based on similar characteristics. In particular, documents processed by the disclosed extraction system may be generated by photography or scanning of physical documents.”; Paragraph [0031] “data from the image is extracted, even where there are no boundaries for the tables or lists (“boundaryless”). In one embodiment, column segmentation is performed based on signal analysis on column wise mean pixel values, line detection based on Computer Vision (CV) techniques, and clustering models. Furthermore, row segmentation is performed using OCR bounding boxes.”; Figure 4C):
acquire cell information relating to at least one of a row or a column of each of the plurality of cells [based on coordinates of each of the plurality of cells] (Paragraph [0049] “In the first stage 610, signal analysis labeling is used to demarcate the scanned document 702 as shown, with a plurality of horizontal lines 780 and a plurality of vertical lines 770. The plurality of horizontal lines 780 are used to automatically identify each row (e.g., shown as “row 0”, “row 1”, “row 2”. . . “row 14”) by the system. In addition, row classification is performed, labeling a first section 710 (“other”), a second section 720 (“header”), and a third section 730 (“table”)”; Figure 7); and
analyze a layout relating to the document based on the cell information on each of the plurality of cells (Paragraph [0029] “the proposed extraction techniques and systems can be understood to operate as part of a larger text analysis paradigm.”; Paragraph [0047] “CRF models are used to perform row/column classification (rather than word classification). For rows the task is to cluster into three classes: “header row”, “table row”, and an “other row”. Based on this, the position of the actual table on the page can be determined accurately. The CRF model was selected as it considers the context of information, rather than just a single aspect of data at a time. In other words, the model will attempt to predict a certain goal based not only on the individual row content being focused on, but also on the previous (above) and next (below) row. This larger view allows for improved labeling of each row.”; Paragraph [0071] “a document reader and data extraction system 1014 (or system 1014), according to an embodiment. The environment 1000 may include a plurality of components capable of performing the disclosed method of table or list recognition, row and column segmentation, table localization, and data mapping and visualization.”).
However, Chatzistamatiou fails to teach based on coordinates of each of the plurality of cells.
Yang teaches based on coordinates of each of the plurality of cells (Paragraph [051] “coordinates of the four vertices of the multiple cells are determined directly according to the corresponding center point coordinates of the multiple cells and the distances between the four vertices and the center point coordinates. The coordinates of the four vertices of the cell define a plurality of cell regions.”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou’s reference to include based on coordinates of each of the plurality of cells taught by Yang’s reference. The motivation for doing so would have been to define the regions of the plurality of cells as suggested by Yang (see Yang, Paragraph [051]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Yang with Chatzistamatiou to obtain the invention specified in claim 15.
Claims 4, 7, and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Chatzistamatiou et al. (US 2023/0410543 A1) (hereinafter, “Chatzistamatiou”) in view of Yang et al. (CN 113,343,740 B) (hereinafter, “Yang”), and further in view of Li (CN 113,901,904 A).
Regarding claim 4, which claim 3 is incorporated, Chatzistamatiou and Yang fail to teach wherein the learning model is a Vision Transformer-based model.
Li teaches wherein the learning model is a Vision Transformer-based model (Paragraph [0064] “implementation method may include: inputting multiple face image samples into the visual transformation model, and obtaining the attention matrix corresponding to each face image sample output by each layer of network; merging all the obtained attention matrices to obtain each image.”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou in view of Yang to include wherein the learning model is a Vision Transformer-based model taught by Li’s reference. The motivation for doing so would have been to divide the document into multiple patches as suggested by Li (see Li, Paragraph [0052]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Li with Chatzistamatiou and Yang to obtain the invention specified in claim 4.
Regarding claim 7, which claim 3 is incorporated, Chatzistamatiou fails to teach wherein the cell information includes a column order in the document image, and wherein the at least one processor is configured to sort the cell information on each of the plurality of cells based on the column order of the each of the plurality of cells, and to input the sorted cell information to the learning model.
Yang teaches to input [the sorted cell] information to the learning model (Paragraph [0118] “training process, it is first necessary to obtain a training sample image for training the table detection model, and the training sample image includes a table, that is, a table image. After that, it is necessary to label the training sample images according to the requirements of the above five output layers, so as to perform supervised training of the table detection model based on the label information (ie, the supervision information).”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou’s reference to include to input [the sorted cell] information to the learning model taught by Yang’s reference. The motivation for doing so would have been to improve the detection performance by training the model used in the system as suggested by Yang (see Yang, Paragraph [0113] and Paragraph [0130]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results.
However, Chatzistamatiou and Yang fail to teach wherein the cell information includes a column order in the document image, and wherein the at least one processor is configured to sort the cell information on each of the plurality of cells based on the column order of the each of the plurality of cells.
Li teaches wherein the cell information includes a column order in the document image (Paragraph [0093] “after the face image to be processed is cut into a plurality of image blocks, each image block is arranged based on the position in the face image to be processed, that is, the face to be processed is divided into The image is divided into multiple image blocks, which is equivalent to dividing the face image to be processed into different rows and columns. Each image block is arranged based on the position in the face image to be processed. order, from top to bottom, left to right.”), 
and wherein the at least one processor is configured to sort the cell information on each of the plurality of cells based on the column order of the each of the plurality of cells (Paragraph [0093] “after the face image to be processed is cut into a plurality of image blocks, each image block is arranged based on the position in the face image to be processed, that is, the face to be processed is divided into The image is divided into multiple image blocks, which is equivalent to dividing the face image to be processed into different rows and columns. Each image block is arranged based on the position in the face image to be processed. order, from top to bottom, left to right.”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou in view of Yang to include wherein the cell information includes a column order in the document image, and wherein the at least one processor is configured to sort the cell information on each of the plurality of cells based on the column order of the each of the plurality of cells taught by Li’s reference. The motivation for doing so would have been to arrange the cells based on their position in document as suggested by Li (see Li, Paragraph [0093]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Li with Chatzistamatiou and Yang to obtain the invention specified in claim 7.
Regarding claim 8, which claim 7 is incorporated, Chatzistamatiou discloses to insert predetermined column change information into a portion which has a column change (Paragraph [0048] “features must be selected carefully, as they are the basis by which the CRF models can classify the rows and columns. The trained CRF model can then evaluate an entire column's contents (not just one cell in the column), as well as columns adjacent to the selected column, in order to calculate a set of features… the CRF model does not simply use one column and then predict the class, but also takes the neighboring columns into account.”).
However, Chatzistamatiou fails to teach to input the cell information having the inserted predetermined column change information to the learning model.
Yang teaches to input the cell information [having the inserted predetermined column change information] to the learning model (Paragraph [0118] “training process, it is first necessary to obtain a training sample image for training the table detection model, and the training sample image includes a table, that is, a table image. After that, it is necessary to label the training sample images according to the requirements of the above five output layers, so as to perform supervised training of the table detection model based on the label information (ie, the supervision information).”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou’s reference to include to input the cell information [having the inserted predetermined column change information] to the learning model taught by Yang’s reference. The motivation for doing so would have been to improve the detection performance by training the model used in the system as suggested by Yang (see Yang, Paragraph [0113] and Paragraph [0130]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results.
However, both Chatzistamatiou and Yang fail to teach wherein the at least one processor is configured to sort the cell information on each of the plurality of cells based on the column order of the each of the plurality of cells.
Li teaches wherein the at least one processor is configured to sort the cell information on each of the plurality of cells based on the column order of the each of the plurality of cells (Paragraph [0093] “after the face image to be processed is cut into a plurality of image blocks, each image block is arranged based on the position in the face image to be processed, that is, the face to be processed is divided into The image is divided into multiple image blocks, which is equivalent to dividing the face image to be processed into different rows and columns. Each image block is arranged based on the position in the face image to be processed. order, from top to bottom, left to right.”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou in view of Yang to include wherein the at least one processor is configured to sort the cell information on each of the plurality of cells based on the column order of the each of the plurality of cells taught by Li’s reference. The motivation for doing so would have been to arrange the cells based on their position in document as suggested by Li (see Li, Paragraph [0093]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Li with Chatzistamatiou and Yang to obtain the invention specified in claim 8.
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Chatzistamatiou et al. (US 2023/0410543 A1) (hereinafter, “Chatzistamatiou”) in view of Yang et al. (CN 113,343,740 B) (hereinafter, “Yang”), and further in view of Yebes Torres et al. (US 2023/0005286 A1) (hereinafter “Torres”).
Regarding claim 11, which claim 9 is incorporated, Chatzistamatiou fails to teach wherein the at least one processor is configured to determine the threshold value based on a size of the whole document.
Yang teaches wherein the at least one processor is configured to determine the threshold value (Paragraph [079] “It is determined that the distance between the coordinates of the first vertex and the coordinates of the second vertex is less than the second threshold, and the first boundary line and the second boundary line are merged.”; Paragraph [081] “if the distance between A1 and A2 and the distance between B1 and B2 are both smaller than the above-mentioned second threshold, it is considered that A1 and A2 should be merged, and B1 and B2 should be merged. Thus, the boundary line L1 and the boundary line L2 are merged into one boundary line L.”).
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou’s reference to include wherein the at least one processor is configured to determine the threshold value taught by Yang’s reference. The motivation for doing so would have been to align the cell columns and rows from the document as suggested by Yang (see Yang, Paragraph [074]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results.
However, Chatzistamatiou and Yang fail to teach based on a size of the whole document.
Torres teaches based on a size of the whole document (Element 108 in Figure 9 equates to the whole document) (Paragraph [0133] “the bounding box generating circuitry 316 outputs a plurality of bounding boxes including respective coordinates corresponding to lines of the receipt region.”; Paragraph [0164] “example of FIG. 9…receives an example receipt image 108 (e.g., from the basket datastore 112)…The extraction circuitry 118 applies an example regions detection model 306 to the receipt image 108 to detect an example receipt region 902 and an example products region 904. In some examples, the extraction circuitry 118 applies a cropping operation (e.g., via the image cropping circuitry 308) to the receipt image 108 based on the detected regions.”).
Figure 9: [image omitted]
Therefore, it would have been obvious to one of ordinary skill of the art before the effective filing date to modify Chatzistamatiou in view of Yang to include based on a size of the whole document taught by Torres’s reference. The motivation for doing so would have been to align the respective coordinates to the document region as suggested by Torres (see Torres, Paragraph [0133]).
Further, one skilled in the art could have combined the elements described above by known methods with no change to the respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Torres with Chatzistamatiou and Yang to obtain the invention specified in claim 11.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Semenov et al. (US 2022/0198182 A1) discloses a method to receive a training data set comprising document images which are associated with their respective metadata identifying a document field containing a variable text.
Pinho et al. (US 2023/0237080 A1) discloses a method to collect annotated documents that include tables with their respective columns labeled.
Jamshidikhezeli et al. (US 2023/0260309 A1) discloses a method of extracting components from a document and identifying a subset of regions in the document and the number of rows and columns from the extracted tables.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to UROOJ FATIMA whose telephone number is (571)272-2096. The examiner can normally be reached M-F 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Henok Shiferaw can be reached at (571) 272-4637. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/UROOJ FATIMA/Examiner, Art Unit 2676                  


/Henok Shiferaw/Supervisory Patent Examiner, Art Unit 2676