Prosecution Insights
Last updated: April 19, 2026
Application No. 18/033,915

SYSTEM AND METHOD FOR TEXT MINING

Non-Final OA (§§ 101, 103, 112)
Filed: Apr 26, 2023
Examiner: HAN, BYUNGKWON
Art Unit: 2121
Tech Center: 2100 (Computer Architecture & Software)
Assignee: The University of Melbourne
OA Round: 1 (Non-Final)
Grant Probability: 0% (At Risk)
OA Rounds: 1-2
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (grants only 0% of cases; 0 granted / 1 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal lift; resolved cases with interview)
Typical Timeline: 3y 3m average prosecution; 28 applications currently pending
Career History: 29 total applications across all art units

Statute-Specific Performance

§101: 34.7% (-5.3% vs TC avg)
§103: 44.0% (+4.0% vs TC avg)
§102: 2.0% (-38.0% vs TC avg)
§112: 19.3% (-20.7% vs TC avg)

Black line = Tech Center average estimate. Based on career data from 1 resolved case.
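The "vs TC avg" deltas in the panel above can be reproduced from the per-statute rates; a minimal sketch (the rate values are taken from the panel, and the helper name is illustrative, not from the analytics tool):

```python
# Per-statute rates shown in the panel, with the delta relative to the
# Tech Center average (both in percent).
rates = {"101": (34.7, -5.3), "103": (44.0, +4.0),
         "102": (2.0, -38.0), "112": (19.3, -20.7)}

def tc_average(rate: float, delta_vs_tc: float) -> float:
    """Recover the implied Tech Center average: delta = rate - average."""
    return round(rate - delta_vs_tc, 1)

implied = {statute: tc_average(r, d) for statute, (r, d) in rates.items()}
# e.g. implied["101"] == 40.0
```

Notably, all four deltas imply the same 40.0% Tech Center baseline, consistent with a single TC-wide estimate rather than per-statute baselines.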

Office Action

Rejections under §§ 101, 103, 112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. AU2020903975, filed on 11/02/2020. The present application is the U.S. national stage of International Application No. PCT/AU21/51282, filed 11/01/2021, under 35 U.S.C. 371.

Status of Claims

Claims 1–22 are pending and examined herein. Claims 4–13, 16–18, and 20–22 are rejected under 35 U.S.C. 112(b). Claims 1–22 are rejected under 35 U.S.C. 101. Claims 1–22 are rejected under 35 U.S.C. 103.

Specification

Applicant is reminded of the proper content of an abstract of the disclosure. A patent abstract is a concise statement of the technical disclosure of the patent and should include that which is new in the art to which the invention pertains. The abstract should not refer to purported merits or speculative applications of the invention and should not compare the invention with the prior art. If the patent is of a basic nature, the entire technical disclosure may be new in the art, and the abstract should be directed to the entire disclosure. If the patent is in the nature of an improvement in an old apparatus, process, product, or composition, the abstract should include the technical disclosure of the improvement. The abstract should also mention by way of example any preferred modifications or alternatives. Where applicable, the abstract should include the following: (1) if a machine or apparatus, its organization and operation; (2) if an article, its method of making; (3) if a chemical compound, its identity and use; (4) if a mixture, its ingredients; (5) if a process, the steps. Extensive mechanical and design details of an apparatus should not be included in the abstract.
The abstract should be in narrative form and generally limited to a single paragraph within the range of 50 to 150 words in length. See MPEP § 608.01(b) for guidelines for the preparation of patent abstracts.

The abstract of the disclosure is objected to because the abstract merely recites claim language without providing a clear and concise summary of the technical disclosure of the invention. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 4–13, 16–18, and 20–22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 4 recites the limitation "a model" in line 2. Claim 4 is dependent on claim 1 or 2, and these claims recite “a sequential 2-D model” and “an image classification model”. It is unclear whether “a model” in claim 4 is intended to refer back to an existing model defined in claim 1. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “a model” in claim 4 will be read as “the sequential 2-D model”.

Claims 5, 6, and 8 recite the limitation "the encoder" in line 1.
Claims 5, 6, and 8 are dependent on claim 1, and claim 1 never introduces an “encoder”. There is insufficient antecedent basis for this limitation in the claims. For examination purposes, claims 5, 6, and 8 will each be read as defining “an encoder”.

Claims 7 and 9 are dependent on claims 6 and 7 respectively. They do not resolve the indefiniteness and are rejected under the same rationale.

Claims 10 and 11 recite the limitation "the table level classification" in line 1. Claims 10 and 11 are dependent on claim 1, and claim 1 never introduces a “table level classification”. There is insufficient antecedent basis for this limitation in the claims. For examination purposes, claims 10 and 11 will each be read as defining “a table level classification”.

Claim 12 recites the limitation "the one or more classified cells" in line 1. Claim 12 is dependent on claim 1, and claim 1 never introduces “one or more classified cells”. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, claim 12 will be read as defining “one or more classified cells”.

Claim 13 recites the limitation "the pre-processing" in line 1. Claim 13 is dependent on claim 1, and claim 1 never introduces “pre-processing”. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, claim 13 will be read as defining “a step of pre-processing” as defined in claim 12.

Claim 16 recites the limitation "the internal states" in line 2. Claim 16 is dependent on claim 1, and claim 1 never introduces “internal states”. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, claim 16 will be read as defining “internal states”. Claim 16 also recites the limitation “the model” in line 3. Across claims 1 and 16 there are “a transformer based language model”, “a sequential 2-D model”, and “an image classification model”. It is unclear whether “the model” in claim 16 refers to one of the previously defined models or introduces a new machine learning model. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the model” in claim 16 will be read as “the sequential 2-D model”, as “a transformer based language model” is instead mapped to “the language model” in claims 17 and 18.

Claims 17–18 and 20–22 are dependent on claim 16. They do not resolve the indefiniteness and are rejected under the same rationale.

Claim 22 recites the limitation "a model" in line 2. Claim 22 is dependent on claim 16, which recites “the model” in line 3. It is unclear whether claim 22 defines a new machine learning model with “a model” or is intended to refer to the same model defined in claim 16. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “a model” in claim 22 will be read as “the sequential 2-D model”, as discussed above regarding “the model” in claim 16.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1–22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1–22, in accordance with these steps, follows.

Step 1 Analysis: Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter). Claims 1–18 and 20–22 are directed to a method, meaning they are directed to the statutory category of process. Claim 19 is likewise directed to a method, which is also the statutory category of process.
Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis: Step 2A Prong One asks whether the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, the analysis proceeds to Step 2A Prong Two, which asks whether the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, the analysis proceeds to Step 2B, which asks whether the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101.

Regarding claim 1, the following claim elements are abstract ideas:

(b) transforming each of the cells into cell vector representations; (Transforming cells into cell vector representations is merely a mathematical calculation, which is a mathematical concept.)

(c) encoding the one or more cell vector representations with a sequential 2-D model; (Encoding vector representations with a sequential 2-D model is merely a mathematical relationship, which is a mathematical concept.)

(e) mapping the output of step (d) to an output vector which represents the probability of each of the table labels. (Mapping the output vector to represent the probability of labels is merely a mathematical relationship, which is a mathematical concept.)
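The “mathematical concept” characterisation of steps (b), (c), and (e) can be made concrete. Below is a minimal numpy sketch of step (e) alone: mapping a table-level vector to label probabilities via a fully connected layer and softmax. The dimensions and random weights are illustrative assumptions, not values from the application (K = 6 only echoes the six table types in the Nishida reference).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: subtract the max before exponentiating."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Step (e): map a table-level representation (size H) to a probability
# distribution over K table labels -- pure linear algebra.
H, K = 8, 6                                  # illustrative sizes
table_vec = rng.standard_normal(H)           # stand-in table-level vector
W, b = rng.standard_normal((K, H)), np.zeros(K)
probs = softmax(W @ table_vec + b)
# probs is a length-K vector of positive values summing to 1.0
```

This is exactly the "fully connected layers and a softmax function" operation the examiner cites from Nishida later in the rejection.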
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

A method for text mining from one or more tables, the method including the steps of: (a) receiving one or more tables, the tables having one or more table labels, and one or more cells to be processed; (This is mere data gathering, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.)

(d) obtaining one or more table-level vector representations (This is mere data gathering, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.) by summarising the semantics of the cell vector representations by an image classification model; and (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 2, the rejection of claim 1 is incorporated herein. Further, claim 2 recites the following additional elements: wherein the sequential 2D model includes one or more quad-directional long-short term memory network. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 3, the rejection of claim 1 is incorporated herein. Further, claim 3 recites the following additional elements: wherein the sequential 2D model is Q-LSTM. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f).
Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 4, the rejection of claim 1 or 2 is incorporated herein. Further, claim 4 recites the following additional elements: further including the step of: applying a machine learning paradigm to train a model from a labelled data set. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 5, the rejection of claim 1 is incorporated herein. Further, claim 5 recites the following additional elements: wherein a long-text transformer is provided as the encoder. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 6, the rejection of claim 1 is incorporated herein. Further, claim 6 recites the following additional elements: wherein a combination of pre-trained word vectors and character-level word representation is provided as input to the encoder. (This is mere transmission of data, which is a well-understood, routine, conventional activity. It does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 7, the rejection of claim 6 is incorporated herein. Further, claim 7 recites the following additional elements: wherein the pre-trained word vectors are provided by an unsupervised learning algorithm for obtaining vector representations. (This is mere data gathering and outputting, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 8, the rejection of claim 1 is incorporated herein.
Further, claim 8 recites the following additional elements: wherein the encoder is pre-trained with an in-domain dataset. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 9, the rejection of claim 7 is incorporated herein. Further, claim 9 recites the following additional elements: wherein the algorithm is selected from one or more of a long-text transformer encoder, GloVe, Word2vec, Continuous-Bag-of-Words (CBOW), ELMo or BERT. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 10, the rejection of claim 1 is incorporated herein. Further, claim 10 recites the following additional elements: wherein the table level classification includes a table layout classification. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 11, the rejection of claim 1 is incorporated herein. Further, claim 11 recites the following additional elements: wherein the table level classification includes a table semantic classification. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 12, the rejection of claim 1 is incorporated herein. Further, claim 12 recites the following additional elements: wherein the method includes a step of pre-processing the one or more classified cells into each of the one or more tables to provide one or more pre-processed classified cells. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
Regarding claim 13, the rejection of claim 1 is incorporated herein. Further, claim 13 recites the following additional elements: wherein the pre-processing is tokenisation by way of one or more of OSCAR4, ChemTok, NBICGeneChemTokenizer, OpenNLP, CoreNLP, NLTK, spaCy Tokenizer and the like. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 14, the rejection of claim 1 is incorporated herein. Further, claim 14 recites the following additional elements: wherein the image classification is by way of a convolutional neural network such as one or more of ResNet18, VGG, DenseNet or Inception. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 15, the rejection of claim 1 is incorporated herein. Further, claim 15 recites the following additional elements: wherein the step of transforming each of the cells into cell vector representations includes utilising a long-text transformer or an LSTM-based embedder. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 16, the rejection of claim 1 is incorporated herein. Further, claim 16 recites the following additional elements: wherein the method includes the step of utilising a transformer based language model, and generating contextualized word representations by combining the internal states of the model for use in Natural Language Processing (NLP) tasks. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 17, the rejection of claim 16 is incorporated herein.
Further, claim 17 recites the following additional elements: wherein the language model is one or more of a long-text transformer encoder, BERT, ELMo, XLNet or RoBERTa. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 18, the rejection of claim 17 is incorporated herein. Further, claim 18 recites the following additional elements: wherein the language model BERT is modified to accept tables. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Claim 19 recites substantially similar subject matter to claim 1 and is rejected under the same rationale, mutatis mutandis.

Regarding claims 20–22, the rejection of claim 16 is incorporated herein. Claims 20–22 recite substantially similar subject matter to claims 2–4 respectively and are rejected under the same rationale, mutatis mutandis.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering the patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 4, 6, 7, 10–12, 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nishida et al. (NPL: “Understanding the semantic structures of tables with a hybrid deep neural network architecture”) in view of Chan et al. (U.S. Pub. 2021/0406266 A1).

Regarding claim 1, Nishida teaches a method for text mining from one or more tables, the method including the steps of: (a) receiving one or more tables, the tables having one or more table labels, and one or more cells to be processed; (Pg. 2–3 Network Architecture section of Nishida states “Figure 3 shows the overall architecture of our network.
The input table is fixed-sized (cropped or padded), consisting of N rows and M columns and T tokens (vocabulary size: |V|) in the cells. Our network begins with a token embedding that creates a vectorial representation (size of E) of each token in the cells. Next, it encodes each cell (i.e., a sequence of token representations) into a fixed-size vector (size of H) using a recurrent neural network (RNN), which uses a long short-term memory (LSTM) with an attention mechanism, to obtain a semantic representation of each cell;” Pg. 3 Figure 3 caption of Nishida states “Figure 3: Entire architecture of TabNet. An embedding layer creates a vector (size of E) from a one-hot representation (size of |V|) of each token. An RNN uses an LSTM with an attention mechanism to encode each cell into a vector (size of H). A CNN uses one convolutional layer that has F filters and several stacked convolutional blocks with residual units to extract semantic features for table classification. Fully connected classification layers compute the predictive probabilities for all six table types.”)

(b) transforming each of the cells into cell vector representations; (Pg. 3 Network Architecture section of Nishida states “Next, it encodes each cell (i.e., a sequence of token representations) into a fixed-size vector (size of H) using a recurrent neural network (RNN), which uses a long short-term memory (LSTM) with an attention mechanism, to obtain a semantic representation of each cell;”)

(d) obtaining one or more table-level vector representations by summarising the semantics of the cell vector representations by an image classification model; and (Pg. 3 Network Architecture section of Nishida states “the result is an N×M×H third-order tensor.
This tensor has the same structure as image data, i.e., height, width, and depth; hence, our network encodes the tensor using a convolutional neural network (CNN) to capture high-level semantic representations of the cell matrix.” A CNN is used for image classification in the Nishida reference to capture the high-level semantics (table-level representation).)

(e) mapping the output of step (d) to an output vector which represents the probability of each of the table labels. (Pg. 3 Network Architecture section of Nishida states “Finally, it flattens the output of the last convolutional layer (an N×M×F tensor) into a vector and uses fully connected layers and a softmax function to compute the predictive probabilities for all the table types.”)

Nishida does not explicitly teach (c) encoding the one or more cell vector representations with a sequential 2-D model. However, Chan explicitly teaches (c) encoding the one or more cell vector representations with a sequential 2-D model; ([0099] of Chan states “For the “row-wise” processing, embodiments encode each row (i.e., rows 602, 604, and 608) of the object 610 into a corresponding set of contextualized vectors 612. In various embodiments, a machine learning model simultaneously or in parallel reads the object 610 in rows and converts each row of the object into a corresponding contextualized vector. For example, at a first time, a 2-dimensional bi-directional LSTM model may first encode the feature vectors 602 representing each cell of each row (e.g., a list of names) into the contextualized vector 630. At a second time subsequent to the first time, the 2-dimensional bi-directional LSTM model may encode the feature vector 604 representing each cell of each row into the contextualized vector 632.”)

It would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to combine the teachings of Nishida and Chan.
Nishida teaches representing a table as a grid of cells, transforming each cell’s token sequence into a fixed-size vector representation, and using a convolutional model to obtain a table-level representation and output predictive probabilities for table labels. Chan teaches extracting information from table cells to derive feature vectors for cells and performing sequential encoding over the table structure using a 1D/2D bi-directional LSTM to generate contextualized table representations for classification. One with ordinary skill in the art would have been motivated to incorporate the teachings of Nishida into those of Chan to improve the capture of dependencies among neighboring cells across both dimensions of the table. It would have been a predictable and known combination to improve the classification of LSTM models over tables while keeping the system architecture.

Regarding claim 4, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Nishida and Chan teaches further including the step of: applying a machine learning paradigm to train a model from a labelled data set. (Pg. 4 Dataset section of Nishida states “An expert annotated the type of table into six types: vertical and horizontal relational (VR and HR), vertical and horizontal entity (VE and HE), matrix (M), and other (O) tables. Note that a website has a large number of very similar tables, and random partitioning of the dataset causes the test set to contain seen data. We therefore split up the dataset by website; we used 60,678 tables in the top 300 of the 500 websites for training and the remaining for testing (Table 1).” Nishida shows a labeled dataset and splitting the dataset to train the model.)

Regarding claim 6, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Nishida and Chan teaches wherein a combination of pre-trained word vectors and character-level word representation is provided as input to the encoder. (Pg.
4 Learning of Networks section of Nishida states “The entire network can be trained end-to-end by using stochastic gradient descent (SGD) with backpropagation and can be easily implemented using common libraries without modifying the solvers. It is worth noting that pre-training only the embedding matrix with Word2Vec (Mikolov et al. 2013a; 2013b) or GloVe (Pennington, Socher, and Manning 2014) from a very large text corpus is effective when the number of training tables is not so large.” [0091] of Chan states “According to various embodiments a plurality of feature vectors are generated for a single element of a table, such as a cell. Each of these feature vectors may each represent a particular different feature and/or feature class. For example, the feature vector 402 represents a word vector. A “word vector” is a feature vector that represents a payload or identity of one or more natural language words (or version of the word) in an element.” [0115] of Chan states “Element or cell 1001 represents a feature vector representing the feature values of an entry, table cell, or other element. In an illustrative example, the cell 1001 may represent an entry (e.g., the entry 408), which includes word vector values, shape vector values, and a POS vector values, as described with respect to FIG. 4.”)

Regarding claim 7, the rejection of claim 6 is incorporated herein. Furthermore, the combination of Nishida and Chan teaches wherein the pre-trained word vectors are provided by an unsupervised learning algorithm for obtaining vector representations. (Pg. 5 Model Configuration section of Nishida states “We conducted pre-training for token embedding. For word tokens, we obtained a pre-training word embedding matrix using Word2Vec with the skip-gram model and negative sampling (Mikolov et al. 2013a; 2013b). We used full texts in Wikipedia article pages for pre-training, where the tables in the pages were not included in the test dataset.
We only retained words appearing in the pretraining and replaced the other words with a special UNK token. For other tokens (HTML tags and row and column indexes), we randomly initialized the columns of the embedding matrix corresponding to the tokens. We set the size of the token embedding, E, to 100.”)

Regarding claim 10, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Nishida and Chan teaches wherein the table level classification includes a table layout classification. (Pg. 2 Layout tables section of Nishida states “There are two categories. Navigational tables consist of cells organized for navigational purposes. Formatting tables account for a large portion of the tables on the Web; their only purpose is to organize elements visually. We treat both the other genuine tables and the layout tables as the ’Other’ type.”)

Regarding claim 11, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Nishida and Chan teaches wherein the table level classification includes a table semantic classification. (Pg. 1 Introduction section of Nishida states “The most fundamental technology for such applications is the means to examine the structures of tables, i.e., table type classification. (Crestan and Pantel 2010; 2011) proposed a fine-grained classification taxonomy as to whether they contain semantic triples of the form (subject, property, object) or whether they are used for layout purposes.”)

Regarding claim 12, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Nishida and Chan teaches wherein the method includes a step of pre-processing the one or more classified cells into each of the one or more tables to provide one or more pre-processed classified cells. ([0046] of Chan states ”The preprocessing component 204 is generally responsible for formatting, tagging, and/or structuring the data in the detected table in some particular way.
For example, some embodiments use one or more preprocessing rules to fill all empty cells in the detected cell with a string “NA.” Some embodiments also treat each word or other character sequence (e.g., numbers, symbols, etc.) as a single token for processing. It is understood that although some embodiments treat the word sequence as a single token, particular embodiments read the sequence of words within a cell by using a deep learning model for sequences, (e.g. LSTM).” [0089] of Chan states ”The output of the system 300 is an element-wise logistics layer or output table 317. This output table 317 may be identical to the input table 303, except that there are annotations or other identifiers that indicate the prediction statistic described above.” Chan contains a preprocessing component to process table data, and the output table being identical to the input table after annotation identifiers are added can be interpreted as pre-processed classified cells being outputted.)

Regarding claim 14, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Nishida and Chan teaches wherein the image classification is by way of a convolutional neural network such as one or more of ResNet18, VGG, DenseNet or Inception. (Pg. 3 Network Architecture section of Nishida states “The CNN consists of a convolutional layer that has F filters and several stacked convolutional blocks with residual units. Finally, it flattens the output of the last convolutional layer (an N×M×F tensor) into a vector and uses fully connected layers and a softmax function to compute the predictive probabilities for all the table types.” Pg. 4 Convolutional blocks with residual units section of Nishida states “Deep residual networks (ResNets) have high accuracy and nice convergence behavior for image classification, as a result of using many (over 100 layers) stacked residual units (He et al.
2015; 2016a).”)

Claim 19 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.

Claims 2, 3 are rejected under 35 U.S.C. 103 as being unpatentable over Nishida et al. (NPL: “Understanding the semantic structures of tables with a hybrid deep neural network architecture”) in view of Chan et al. (U.S. Pub. 2021/0406266 A1), further in view of Stollenga et al. (NPL: “Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation”).

Regarding claim 2, the rejection of claim 1 is incorporated herein. The combination of Nishida and Chan does not explicitly teach wherein the sequential 2D model includes one or more quad-directional long-short term memory network. However, Stollenga teaches wherein the sequential 2D model includes one or more quad-directional long-short term memory network. (Pg. 3 Pyramidal Connection Topology section of Stollenga states “In MD-LSTMs, connections are aligned with the grid axes. In 2D, these directions are up, down, left and right. A 2D-LSTM adds the pixel-wise outputs of 4 LSTMs, one scanning the image pixel by pixel from north-west to south-east, one from north-east to south-west, one from south-west to north-east, and one from south-east to north-west.”) It would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to combine the teachings of Stollenga into the combination of Nishida and Chan. Nishida teaches representing a table as a grid of cells, transforming each cell’s token sequence into a fixed-size vector representation, and using a convolutional model to obtain a table level representation and output predictive probabilities for table labels.
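The four-corner scanning that Stollenga describes can be illustrated with a toy sketch. This is not an implementation from any cited reference: a simplified linear recurrence (with an assumed `decay` mixing weight) stands in for each LSTM cell, and all function names are illustrative.

```python
import numpy as np

def directional_scan(grid, flip_rows=False, flip_cols=False, decay=0.5):
    """One directional pass over a 2D grid of feature vectors.

    A linear recurrence stands in for a true LSTM cell: each cell's
    state mixes its own features with the states of its top and left
    neighbours in the scan's orientation. Flipping rows/cols changes
    which corner the scan starts from.
    """
    g = grid[::-1] if flip_rows else grid
    g = g[:, ::-1] if flip_cols else g
    n_rows, n_cols, _ = g.shape
    state = np.zeros_like(g)
    for r in range(n_rows):
        for c in range(n_cols):
            top = state[r - 1, c] if r > 0 else 0.0
            left = state[r, c - 1] if c > 0 else 0.0
            state[r, c] = g[r, c] + decay * (top + left)
    # Undo the flips so outputs align with the original grid layout.
    state = state[::-1] if flip_rows else state
    state = state[:, ::-1] if flip_cols else state
    return state

def quad_directional_context(grid):
    """Sum the outputs of four scans, one starting from each corner,
    analogous to a 2D-LSTM adding the pixel-wise outputs of 4 LSTMs."""
    return sum(
        directional_scan(grid, fr, fc)
        for fr in (False, True)
        for fc in (False, True)
    )

# Toy 3x3 "table" with 2-dimensional cell features.
table = np.arange(18, dtype=float).reshape(3, 3, 2)
context = quad_directional_context(table)
print(context.shape)  # (3, 3, 2)
```

Each output position thus receives context from every other cell in the grid regardless of direction, which is the property the rejection relies on when mapping Stollenga to a table of cells.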
Chan teaches extracting information from table cells to derive feature vectors for cells and performing sequential encoding over the table structure using a 1D/2D bi-directional LSTM to generate contextualized table representations for classification. Stollenga teaches a multi-directional 2D-LSTM that aggregates information from multiple scanning directions to capture richer 2D context in grid-structured inputs. One with ordinary skill in the art would have been motivated to incorporate the teachings of Stollenga into that of Chan and Nishida to strengthen context propagation in 2D structured data and to improve the robustness and accuracy of table representations by incorporating information from multiple directions across the table grid. It would have been a predictable combination of known concepts to use a quad-directional model.

Regarding claim 3, the rejection of claim 1 is incorporated herein. The combination of Nishida, Chan, and Stollenga teaches wherein the sequential 2D model is Q-LSTM. (Pg. 3 Pyramidal Connection Topology section of Stollenga states “In MD-LSTMs, connections are aligned with the grid axes. In 2D, these directions are up, down, left and right. A 2D-LSTM adds the pixel-wise outputs of 4 LSTMs, one scanning the image pixel by pixel from north-west to south-east, one from north-east to south-west, one from south-west to north-east, and one from south-east to north-west.” In the current disclosure’s specification [00054], Q-LSTM stands for “Quad-directional long-short term memory network”.)

Claims 5, 8, 9, 13, 15 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Nishida et al. (NPL: “Understanding the semantic structures of tables with a hybrid deep neural network architecture”) in view of Chan et al. (U.S. Pub. 2021/0406266 A1), further in view of Yin et al. (NPL: “TABERT: Pretraining for Joint Understanding of Textual and Tabular Data”).

Regarding claim 5, the rejection of claim 1 is incorporated herein.
The combination of Nishida and Chan does not explicitly teach wherein a long-text transformer is provided as the encoder. However, Yin teaches wherein a long-text transformer is provided as the encoder. (Pg. 2 Introduction section of Yin states “In this paper we present TABERT, a pretraining approach for joint understanding of NL text and (semi-)structured tabular data (§ 3). TABERT is built on top of BERT, and jointly learns contextual representations for utterances and the structured schema of DB tables (e.g., a vector for each utterance token and table column). Specifically, TABERT linearizes the structure of tables to be compatible with a Transformer-based BERT model.” The Yin reference uses a transformer encoder for table text.) It would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to combine the teachings of Yin into the combination of Nishida and Chan. Nishida teaches representing a table as a grid of cells, transforming each cell’s token sequence into a fixed-size vector representation, and using a convolutional model to obtain a table level representation and output predictive probabilities for table labels. Chan teaches extracting information from table cells to derive feature vectors for cells and performing sequential encoding over the table structure using a 1D/2D bi-directional LSTM to generate contextualized table representations for classification. Yin teaches using a transformer/BERT-based language model adapted for tables to generate contextualized representations of natural language and tabular content, including pretraining on large table corpora and applying tokenization and table-aware encoding mechanisms.
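The linearization idea attributed to Yin — flattening a table's structure into a token sequence a Transformer can consume — can be sketched roughly as follows. The `column | type | value` segments and `[SEP]` separators are a simplification of TABERT's actual scheme (which also uses content snapshots), and the function names are illustrative assumptions.

```python
def linearize_row(headers, types, row):
    """Flatten one table row into text: each cell becomes a
    'column | type | value' segment, so a sequence model such as
    BERT can consume the 2D structure as flat tokens. The exact
    separators here are illustrative, not TABERT's literal format."""
    cells = [
        f"{h} | {t} | {v}"
        for h, t, v in zip(headers, types, row)
    ]
    return " [SEP] ".join(cells)

def linearize_table(utterance, headers, types, rows):
    """Pair a natural-language utterance with each linearized row,
    mirroring how TABERT jointly encodes utterance and table content."""
    return [
        f"[CLS] {utterance} [SEP] {linearize_row(headers, types, row)}"
        for row in rows
    ]

seqs = linearize_table(
    "tallest player",
    headers=["name", "height"],
    types=["text", "real"],
    rows=[["Alice", "1.88"], ["Bob", "1.75"]],
)
print(seqs[0])
# [CLS] tallest player [SEP] name | text | Alice [SEP] height | real | 1.88
```

The resulting strings are what a standard Transformer tokenizer and encoder would then process, which is the sense in which the rejection reads Yin's encoder onto a "long-text transformer" encoder for table cells.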
One with ordinary skill in the art would have been motivated to incorporate the teachings of Yin into that of Chan and Nishida to improve the semantic encoding of cell text and table context and to pretrain using a transformer language model, improving representation quality and classification performance for text and structured inputs. It would have been a predictable combination to use a transformer-based table encoder in the system of Chan and Nishida.

Regarding claim 8, the rejection of claim 1 is incorporated herein. The combination of Nishida, Chan, and Yin teaches the encoder is pre-trained with an in-domain dataset. (Pg. 2 Introduction section of Yin states “To capture the association between tabular data and related NL text, TABERT is pretrained on a parallel corpus of 26 million tables and English paragraphs (§ 3.2). TABERT can be plugged into a neural semantic parser as a general-purpose encoder to compute representations for utterances and tables. Our key insight is that although semantic parsers are highly domain-specific, most systems rely on representations of input utterances and the table schemas to facilitate subsequent generation of DB queries, and these representations can be provided by TABERT, regardless of the domain of the parsing task.”)

Regarding claim 9, the rejection of claim 1 is incorporated herein. The combination of Nishida, Chan, and Yin teaches wherein the algorithm is selected from one or more of a long-text transformer encoder, GLoVe, Word2vec, Continuous-Bag-of-Words (CBOW), ELMo or BERT. (Pg. 5 Model Configuration section of Nishida states “We conducted pre-training for token embedding. For word tokens, we obtained a pre-training word embedding matrix using Word2Vec with the skip-gram model and negative sampling (Mikolov et al. 2013a; 2013b).” Pg. 2 Introduction section of Yin states “In this paper we present TABERT, a pretraining approach for joint understanding of NL text and (semi-)structured tabular data (§ 3).
TABERT is built on top of BERT, and jointly learns contextual representations for utterances and the structured schema of DB tables (e.g., a vector for each utterance token and table column). Specifically, TABERT linearizes the structure of tables to be compatible with a Transformer-based BERT model.”)

Regarding claim 13, the rejection of claim 1 is incorporated herein. The combination of Nishida, Chan, and Yin teaches wherein the pre-processing is tokenisation by way of one or more of OSCAR4, ChemTok, NBICGeneChemTokenizer, OpenNLP, CoreNLP, NLTK, spaCy Tokenizer and the like. (Pg. 12 Preprocessing of Yin states “Our dataset is collected from arbitrary Web tables, which are extremely noisy. We develop a set of heuristics to clean the data by: (1) removing columns whose names have more than 10 tokens; (2) filtering cells with more than two non-ASCII characters or 20 tokens; (3) removing empty or repetitive rows and columns; (4) filtering tables with less than three rows and four columns, and (5) running spaCy to identify the data type of columns (text or real value) by majority voting over the NER labels of column tokens, (6) rotating vertically oriented tables. We sub-tokenize the corpus using the Wordpiece tokenizer in Devlin et al. (2019).” Pg. 3 Tokenization section of Nishida states “Our network considers words, HTML tags, and row and column indexes of cells as tokens. The attributes of the HTML tags are ignored, except for rowspan and colspan, and spanned cells are discriminated from the original cell by cell indexes and the tag name, as shown in Figure 4. <thead>, <tbody>, <tr>, <colgroup>, <col>, <caption> tags are not used. All word and tag tokens are converted to lowercase.”)

Regarding claim 15, the rejection of claim 1 is incorporated herein. The combination of Nishida, Chan, and Yin teaches wherein the step of transforming each of the cells into cell vector representations includes utilising a long-text transformer or an LSTM-based embedder. (Pg.
2 Introduction section of Yin states “TABERT is built on top of BERT, and jointly learns contextual representations for utterances and the structured schema of DB tables (e.g., a vector for each utterance token and table column). Specifically, TABERT linearizes the structure of tables to be compatible with a Transformer-based BERT model.” Pg. 3 Network Architecture section of Nishida states “Next, it encodes each cell (i.e., a sequence of token representations) into a fixed-size vector (size of H) using a recurrent neural network (RNN), which uses a long short-term memory (LSTM) with an attention mechanism, to obtain a semantic representation of each cell;”)

Regarding claim 16, the rejection of claim 1 is incorporated herein. The combination of Nishida, Chan, and Yin teaches wherein the method includes the step of utilising a transformer-based language model, and generating contextualized word representations by combining the internal states of the model for use in Natural Language Processing (NLP) tasks. (Pg. 3 Masked Language Models section of Yin states “BERT parameterizes pθ(xm|x) using a Transformer model. During the pretraining phase, BERT maximizes pθ(xm|x) on large-scale textual corpora. In the fine-tuning phase, the pretrained model is used as an encoder to compute representations of input NL tokens, and its parameters are jointly tuned with other task-specific neural components.” Pg. 4 Vertical Self-Attention Mechanism section of Yin states “Next, the sequence of word vectors for the NL utterance (from the base Transformer model) are concatenated with the cell vectors as initial inputs to the vertical attention layer. Each vertical attention layer has the same parameterization as the Transformer layer in (Vaswani et al., 2017), but operates on vertically aligned elements, i.e., utterance and cell vectors that correspond to the same question token and column, respectively.
This vertical self-attention mechanism enables the model to aggregate information from different rows in the content snapshot, allowing TABERT to capture cross-row dependencies on cell values.” Yin generates contextual vectors through transformer internal attention and combines context across vertically aligned table positions.)

Regarding claim 17, the rejection of claim 16 is incorporated herein. The combination of Nishida, Chan, and Yin teaches wherein the language model is one or more of a long-text transformer encoder, BERT, ELMo, XLNet or RoBERTa. (Pg. 2 Introduction section of Yin states “In this paper we present TABERT, a pretraining approach for joint understanding of NL text and (semi-)structured tabular data (§ 3). TABERT is built on top of BERT, and jointly learns contextual representations for utterances and the structured schema of DB tables (e.g., a vector for each utterance token and table column). Specifically, TABERT linearizes the structure of tables to be compatible with a Transformer-based BERT model.”)

Regarding claim 18, the rejection of claim 17 is incorporated herein. The combination of Nishida, Chan, and Yin teaches wherein the language model BERT is modified to accept tables. (Pg. 2 Introduction section of Yin states “Specifically, TABERT linearizes the structure of tables to be compatible with a Transformer-based BERT model. To cope with large tables, we propose content snapshots, a method to encode a subset of table content most relevant to the input utterance.”)

Claims 20 – 22 are rejected under 35 U.S.C. 103 as being unpatentable over Nishida et al. (NPL: “Understanding the semantic structures of tables with a hybrid deep neural network architecture”) in view of Chan et al. (U.S. Pub. 2021/0406266 A1), Yin et al. (NPL: “TABERT: Pretraining for Joint Understanding of Textual and Tabular Data”), further in view of Stollenga et al.
(NPL: “Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation”).

Regarding claims 20 – 22, the rejection of claim 16 is incorporated herein. Claims 20 – 22 recite substantially similar subject matter to claims 2 – 4 respectively and are rejected with the same rationale, mutatis mutandis. It would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to combine the teachings of Stollenga into the combination of Nishida, Chan, and Yin. Nishida teaches representing a table as a grid of cells, transforming each cell’s token sequence into a fixed-size vector representation, and using a convolutional model to obtain a table level representation and output predictive probabilities for table labels. Chan teaches extracting information from table cells to derive feature vectors for cells and performing sequential encoding over the table structure using a 1D/2D bi-directional LSTM to generate contextualized table representations for classification. Yin teaches using a transformer/BERT-based language model adapted for tables to generate contextualized representations of natural language and tabular content, including pretraining on large table corpora and applying tokenization and table-aware encoding mechanisms. Stollenga teaches a multi-directional 2D-LSTM that aggregates information from multiple scanning directions to capture richer 2D context in grid-structured inputs. One with ordinary skill in the art would have been motivated to incorporate the teachings of Stollenga into that of Nishida, Chan, and Yin because a table is inherently a 2D grid of cells and Stollenga teaches a known way to propagate context across a 2D grid from four directions to better capture inter-cell dependencies.
It would have been a predictable combination to use the quad-directional 2D-LSTM from Stollenga to obtain richer contextualized cell or table representations while keeping the existing model architecture from the combination of Nishida, Chan, and Yin.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BYUNGKWON HAN whose telephone number is (571) 272-5294. The examiner can normally be reached M-F, 9:00 AM - 6:00 PM PST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BYUNGKWON HAN/
Examiner, Art Unit 2121

/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121

Prosecution Timeline

Apr 26, 2023 - Application Filed
Feb 18, 2026 - Non-Final Rejection, §101 / §103 / §112 (current)
