Detailed Action
This communication is in response to the Arguments and Amendments filed on 11/24/2025.
Claims 1-20 are pending and have been examined.
Claims 1-20 are rejected.
Claims 1 and 11 are independent.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Arguments and Amendments
Applicant has amended the independent claims to include “training a machine learning algorithm based on a set of ground truth documents, in which a list of cell-items of tables have been labeled; extracting, by the machine learning algorithm, a list of cell-item attributions to words within the tables in the set of ground truth documents to create a rule-set;” “applying, by the machine learning algorithm,” “wherein a frequency of each frequent word is greater than or equal to a frequency threshold;” “an association,” “two or more frequent words, wherein the association rule is to find an association between two or more frequent words,” “association,” “association rule has,” “the two or more frequent words,” “association,” “mining and,” “two or more frequent words,” “association,” “association,” and “association.”
Regarding the rejections under 35 U.S.C. 101, Applicant notes:
The rule-set is created by the machine learning algorithm, which has been trained based on the set of ground truth documents. An association rule that has failed in the rule-set can be identified with two or more frequent words, and the triple of the two or more words, the association rule, and the context is mined and provided for verification. With these features, "it could drastically reduce the amount of human work necessary for practical applications, thereby significantly decreasing the cost of the operations that require content extraction." ([0009] of the specification as originally filed.) Thus, even assuming, arguendo, that the claimed invention is directed to an alleged abstract idea, it is respectfully submitted that independent claim 1, as a whole, amounts to significantly more than the alleged abstract idea.
Examiner notes that the independent claims recite a sequence of data transformations and computations: extracting frequent words, computing confidence values, and validation/verification. These steps are paradigmatic mathematical/data-processing operations (vector generation, averaging, alignment, numeric-feature-based signal synthesis) and therefore fall within the “mathematical concepts” exception recognized by the USPTO and the Federal Circuit (see, e.g., Digitech, SAP America, Electric Power Group).
On the present claim wording, the limitations are largely functional and outcome-oriented (detecting, extracting, determining, thresholding) without concrete computational detail or a recitation of how the arrangements materially improve the functioning of the computer system itself (e.g., speed/latency reductions, memory or computational efficiency, novel data representations that reduce error by a measurable metric, or specific unconventional network architectures constrained in a way that produces the improvement). Applicant's argument that “it could drastically reduce the amount of human work necessary for practical applications” is not persuasive. Applicant would need to show the asserted improvement by some objective measure.
Regarding the rejections under 35 U.S.C. § 103, Applicant notes:
Brisimi, Srinivasan, Kim, Sublett, Anubhai, Jacquet, and Benincasa fail to teach or suggest at least the above-recited features of independent claims 1 and 11.
Applicant notes that the Office Action acknowledged that "Brisimi in view of Srinivasan does not specifically teach determining whether or not a rule in the rule-set succeeded or failed when applied, to the words in a table in the new unstructured document ...." (Page 11.) Since Brisimi and Srinivasan fail to teach the determining step, it follows that Brisimi and Srinivasan also fail to teach or suggest "when the confidence level is below a threshold confidence level, identifying the two or more frequent words, to which the failed association rule was applied; and mining and providing a triple of the two or more frequent words, to which the failed association rule was applied, the failed association rule, and context, in which the failed association rule should have succeeded, for verification," as recited in independent claims 1 and 11.
Examiner notes that Brisimi is no longer used as a primary reference; see the new grounds of rejection below.
Applicant notes that Kim fails to cure the deficiencies of Brisimi and Srinivasan. Kim relates to "techniques for retrieving query results for natural language procedural queries" (Abstract) and discloses a query answering (QA) system that "generates a structured semantic representation of a natural language query" (Id.). The Office Action appeared to interpret a triple of the query at paragraph [0071] of Kim as the triple as recited in independent claims 1 and 11. Paragraph [0071] of Kim states the following:
If a triple for the query (or the title of the query result) fails to be aligned with any triple for the title of the query result (or the query) according to Rule 1 and Rule 2, the alignment score for the triple would be assigned a value (e.g., -1 or 0) indicating that the triple fails to align with other triples. (Emphasis added.)
Paragraphs [0067]-[0070] of Kim describe Rule 1 and Rule 2 as follows:
In one example, a triple in the semantic representation of the query is (aq, rq, vq) and a triple in the semantic representation of the title of a query result is (at, rt, vt) as described above. One rule used by the query result scoring engine is:
Rule 1: If aq is same as or a synonym of at, rq is same as rt, and vq is same as or a synonym of vt, the two triples are aligned and the corresponding alignment score is 1.
Rule 1 can be used to determine the alignment scores in cases where the two triples are the same as each other. Rule 1 can also be used to determine the alignment scores by matching synonymous words based on word-to-word synonym information, such as {"image," "photo," and "picture"}.
Another rule used by the query result scoring engine is:
Rule 2: If (aq, rq, vq) is a paraphrase of (at, rt, vt) (e.g., the two triples are included in a paraphrasing rule), the two triples are aligned and the alignment score is the similarity score indicated by the paraphrasing rule.
Rule 2 can be used to determine the alignment scores in cases where the two triples are paraphrases. In one example, a paraphrasing rule includes a pair of paraphrases ("create," Object, "gif") and ("save," "gif format") and a similarity score between the two triples. Thus, if a triple in the semantic representation of the query is ("create," Object, "gif") and a triple in the semantic representation of the query result title is ("save," "gif format") or vice versa, the two triples are aligned with an alignment score corresponding to the similarity score between the two triples in the paraphrasing rule.
(Emphasis added.) As disclosed by Kim, the triple is a semantic representation of a query, and the alignment score between two triples is determined based on the paraphrasing rule. Kim merely discloses triples, but the triples are totally different from the triple of "the two or more frequent words, to which the failed association rule was applied, the failed association rule, and context, in which the failed association rule should have succeeded," as recited in independent claims 1 and 11.
Examiner notes that Kim is no longer used as a reference; see below.
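For illustration only, the alignment scoring that Applicant quotes from Kim (Rule 1, Rule 2, and the fallback value for unaligned triples) can be sketched as follows. This is a minimal sketch under assumptions: the function names, the synonym set layout, the example paraphrasing rule, and its similarity score of 0.8 are hypothetical and are not taken from Kim.

```python
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical word-to-word synonym information.
SYNONYM_SETS: List[FrozenSet[str]] = [frozenset({"image", "photo", "picture"})]

# Hypothetical paraphrasing rule: an unordered pair of triples and a similarity score.
PARAPHRASE_RULES: Dict[FrozenSet[Triple], float] = {
    frozenset({("create", "object", "gif"), ("export", "object", "gif")}): 0.8,
}


def same_or_synonym(a: str, b: str) -> bool:
    """True if the two words are identical or share a synonym set."""
    return a == b or any(a in s and b in s for s in SYNONYM_SETS)


def alignment_score(query: Triple, title: Triple) -> float:
    """Score one query triple against one title triple."""
    aq, rq, vq = query
    at, rt, vt = title
    # Rule 1: first and third elements match or are synonyms, middle elements are equal.
    if same_or_synonym(aq, at) and rq == rt and same_or_synonym(vq, vt):
        return 1.0
    # Rule 2: the two triples appear together in a paraphrasing rule.
    score = PARAPHRASE_RULES.get(frozenset({query, title}))
    if score is not None:
        return score
    # Neither rule applies: the triple fails to align (Kim mentions -1 or 0).
    return 0.0
```

The sketch scores a pair of query-side and title-side triples; it does not involve frequent words, a failed association rule, or a context, which is the distinction Applicant draws.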
Applicant notes that Sublett, Anubhai, Jacquet, and Benincasa are not relied upon by the Office Action to cure the deficiencies of Brisimi. In particular, Benincasa discloses an answer table, "in which each cell corresponds to a specific question and a specific document (answer table-see FIG. 8)." ([0108].) However, the way the answer table is described in Benincasa fails to teach or suggest the above-recited features.
Examiner notes Benincasa is not relied upon to teach or suggest the above-recited features.
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on the primary reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Applicant’s amendments have necessitated a new interpretation of the claims, and therefore Kim (US Patent Application Publication No. US 20190392066 A1) is no longer relied upon.
Hence, new grounds of rejection have been made over OSUALA (US Patent Application Publication No. US 20220382784 A1) in view of Srinivasan (US Patent Application Publication No. US 20060288268 A1).
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent Claim 1 recites,
“1. (Currently amended) A method, comprising:
training a machine learning algorithm based on a set of ground truth documents, in which a list of cell-items of tables have been labeled;
extracting, by the machine learning algorithm, a list of cell-item attributions to words within the tables in the set of ground truth documents to create a rule-set; [a human can extract a list of cell-item attributions to words using logic and pen and paper]
receiving the rule-set, comprising a combination of rules, that was determined to occur in the set of ground truth documents; [a human can receive a rule set using visual or auditory processes]
applying, by the machine learning algorithm, the rule-set to a new unstructured document that was not included in the set of ground truth documents and has a table, which has one or more rows and one or more columns; [a human can apply the rule-set to a new unstructured document that was not included in the set of ground truth documents and has a table]
detecting, by using computer vision and a natural language process, the table in the new unstructured document and extracting frequent words from a cell-item in the table, [a human can detect a table using visual or auditory processes and extract frequent words using pen and paper.]
wherein a frequency of each frequent word is greater than or equal to a frequency threshold; [this is a mathematical process]
determining whether or not an association rule in the rule-set has succeeded or failed when applied to two or more frequent words, wherein the association rule is to find an association between two or more frequent words; [a human can determine whether or not an association rule in the rule-set has succeeded or failed using logic in the mind]
when the association rule is determined to have failed,
identifying a confidence level in the determination that the association rule has failed; [a human can identify a confidence level in the determination that the association rule has failed using logic and reasoning]
when the confidence level is below a threshold confidence level, identifying the two or more frequent words, to which the failed association rule was applied, [a human can identify the two or more frequent words to which the failed association rule was applied using logic and reasoning]
and mining and providing a triple of the two or more frequent words, to which the failed association rule was applied, the failed association rule, and context, in which the failed association rule should have succeeded, for verification.” [a human can mine and provide a triple of the two or more frequent words using pen and paper]
Regarding Independent Claim 11, Claim 11 is a storage medium claim with limitations similar to that of claim 1 and is rejected under the same rationale.
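For orientation only, the sequence of steps recited in claim 1 (frequency thresholding, the association-rule check, confidence thresholding, and generation of the triple for verification) can be sketched as follows. This is a minimal sketch under assumptions: the rule representation, the whitespace tokenization, the externally supplied confidence function, and all names are hypothetical and are not taken from the specification or from any cited reference.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class AssociationRule:
    """Hypothetical rule form: if `antecedent` is frequent, `consequent` should be too."""
    antecedent: str
    consequent: str


def frequent_words(cell_text: str, min_count: int) -> List[str]:
    """Words whose frequency in the cell-item is >= the frequency threshold."""
    counts = Counter(cell_text.lower().split())
    return sorted(w for w, c in counts.items() if c >= min_count)


def rule_succeeds(rule: AssociationRule, words: List[str]) -> bool:
    """The rule fails only when the antecedent is present without the consequent."""
    return rule.antecedent not in words or rule.consequent in words


def triple_for_verification(
    cell_text: str,
    rule: AssociationRule,
    min_count: int,
    confidence_of_failure: Callable[[AssociationRule, List[str]], float],
    min_confidence: float,
) -> Optional[Tuple[Tuple[str, ...], AssociationRule, str]]:
    """Return (frequent words, failed rule, context) when a failure is low-confidence."""
    words = frequent_words(cell_text, min_count)
    if len(words) < 2 or rule_succeeds(rule, words):
        return None  # rule not applicable to two or more words, or it succeeded
    if confidence_of_failure(rule, words) >= min_confidence:
        return None  # the failure determination is trusted; nothing to verify
    # Low-confidence failure: hand the triple to a reviewer, with the cell as context.
    return (tuple(words), rule, cell_text)
```

In this sketch the cell text itself stands in for the "context"; how the claimed context and confidence level are actually computed is not specified here and is left as an injected function.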
This judicial exception is not integrated into a practical application. In particular, claim 11 recites the additional element of a “processor.” For example, paragraph [0096] of the as-filed specification describes computer storage media carrying instructions that, when executed by the processor and/or caused to be executed by the processor, perform any one or more of the methods disclosed. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of a computing device such as a processor amounts to no more than a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the additional limitations in the claims noted above are directed to insignificant extra-solution activity. The claims are not patent eligible.
With respect to claims 2 and 12, the claims recite that the new unstructured document is an unstructured document. This relates to a human receiving and understanding a new document using natural human cognitive abilities. No additional limitations are present.
With respect to claims 3 and 13, the claims recite that the confidence level is a function of an extent to which rules in the rule set hold for words in the new unstructured document. This relates to a human using logic and reasoning to apply a confidence level to rules holding for words in a new document, which can be understood using natural human cognitive abilities. No additional limitations are present.
With respect to claims 4 and 14, the claims recite that the rule is considered to hold for a word in the cell item in the table in the new unstructured document if preconditions for the rule hold (this relates to a human using logic and reasoning to identify preconditions for the rule and apply them to the new document) and if a check concerning that rule is satisfied (this relates to a human using logic and reasoning to check whether a rule meets satisfactory requirements), wherein the check comprises determining clauses that must succeed and/or constraints that must hold for the rule to succeed when the preconditions are met (this relates to a human using natural language understanding to determine whether a rule succeeds for the clause requirement). No additional limitations are present.
With respect to claims 5 and 15, the claims recite that some of the rules in the rule-set are known to hold together for some words. This relates to a human using natural language understanding to determine whether a rule set holds for some words. No additional limitations are present.
With respect to claims 6 and 16, the claims recite that the confidence level is determined based on a level of support in the ground truth documents for the rule set when the failed rule is excluded, and also based on a level of support in the ground truth documents for the rule set when the failed rule is included. This relates to a human using natural language understanding and logic and reasoning to determine a level of support in the ground truth documents when the failed rule is included or excluded. No additional limitations are present.
With respect to claims 7 and 17, the claims recite that the verification comprises a determination whether or not a cell-item of a table in the new unstructured document was correctly applied to the set of ground truth documents. This relates to a human using logic and reasoning to determine whether a cell item is applied correctly to the ground truth documents. No additional limitations are present.
With respect to claims 8 and 18, the claims recite that the rule-set is employed with the new unstructured document based on a frequency with which the rule-set was determined to apply to the set of ground truth documents. This relates to a human using logic and reasoning to use the rules when they apply to the ground truth documents. No additional limitations are present.
With respect to claims 9 and 19, the claims recite that the rules are included in the rule-set due to a determination that the rules hold together for some words in the set of ground truth documents. This relates to a human using natural human understanding to make a determination that the rules hold for some words in the documents and to include them. No additional limitations are present.
With respect to claims 10 and 20, the claims recite performing a content extraction process that includes using the rules in the rule-set to assign cell-items to the words in the new unstructured document. This relates to a human using logic and reasoning to assign cell items to the words in the document and, using pen and paper, to extract the words from the content. No additional limitations are present.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 5, 6, 9, 11, 15, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over OSUALA (US Patent Application Publication No. US 20220382784 A1) in view of Srinivasan (US Patent Application Publication No. US 20060288268 A1).
Regarding Claim 1, OSUALA teaches
“1. (Currently amended) A method, comprising:
training a machine learning algorithm based on a set of ground truth documents, (see OSUALA [0052] “Some electronic health records may contain mostly unstructured data in textual format…”) (see OSUALA [0057] “The computing system 100 may further comprise an association rule learning component 109 that may retrieve the stored data containing the clusters, the unstructured data and/or structured data with their corresponding timestamps. First, the association rule learning component 109 may group the records into time buckets by applying a moving window strategy, which in at least one embodiment may be of fixed length of time. Once all records are distributed into their corresponding time bucket, an association rule learning algorithm may be applied to find sequential patterns that recur in different time buckets.”)
in which a list of cell-items of tables have been labeled; (see OSUALA [0094] “This, then, may lead to consistent encoding and classification of the data in categories. These embodiments may be advantageous because an association rule learning may aim to infer rules of co-occurring categories from status entries and corresponding outcomes in a related time frame. This preparation of the dataset may be performed as illustrated in FIG. 11, wherein each data record (e.g., a row in a table) may have the following information (e.g., as columns in a table): a uniqueness entry (e.g., ID),”)
extracting, by the machine learning algorithm, a list of cell-item attributions to words within the tables in the set of ground truth documents to create a rule-set; (see OSUALA [0094] “…This, then, may lead to consistent encoding and classification of the data in categories. These embodiments may be advantageous because an association rule learning may aim to infer rules of co-occurring categories from status entries and corresponding outcomes in a related time frame. This preparation of the dataset may be performed as illustrated in FIG. 11, wherein each data record (e.g., a row in a table) may have the following information (e.g., as columns in a table): a uniqueness entry (e.g., ID), an associated time entry (e.g., timestamp), and an unstructured data entry (e.g., a text input). The unstructured data entry may be encoded as vector. The encoder, such as an artificial neural network, may be trained to handle various types of unstructured data. Hence, the only data condition for the unstructured data may be that it follows an internal structure of patterns that allows for categorization. For example, text has such an internal structure commonly referred to as semantics. In sum, as long as the data contains learnable patterns that a neural network encoder can represent, the type of unstructured data is flexible and depends on the use-case. It could be of, e.g., type text, documents, logs, sequences, sensor data, images, video, audio, etc.”) (see OSUALA [0096] “…FIG. 12 illustrates the ANN encoder for unstructured data of textual format, such as in EHRs, IT Ticketing Data and Chat logs, where the input to the ANN encoder is a sequence of words S. The encoder may transform this sequence S into a multidimensional vector that contains numbers. This multidimensional vector may be the output of the encoder M and is henceforth referred to as embedding E. Embedding E may be computed as the output of a function φ(M, S) of the input S and the model M. This function may be embodied as a mean, concatenation, or sum of several internal token-level embeddings generated by M, as illustrated by FIG. 12. This function may also be embodied by any other output of M. For example, one or several internal character, token, word, or text level embeddings in one or several hidden layers of M may encode a relevant part of the meaning of sequence S and, hence, can be used as E.”)
receiving the rule-set, comprising a combination of rules, that was determined to occur in the set of ground truth documents; (see OSUALA [0094] “…It could be of, e.g., type text, documents, logs, sequences, sensor data, images, video, audio, etc.”)
applying, by the machine learning algorithm, the rule-set to a new unstructured document (see OSUALA [0028] “An association rule analysis may be a technique to discover the association rules in a dataset. The association rule analysis may, for example be a fully automatic analysis or a semi-automatic analysis. Embodiments utilizing fully automatic association rule analysis may be advantageous as they may improve the processing of the datasets. For example, an association rule may be determined using a machine learning method.”) that was not included in the set of ground truth documents and has a table, which has one or more rows and one or more columns; (see OSUALA [0027] “The unstructured record may comprise values of attributes in an unstructured form. The unstructured record may enable to associate attributes to corresponding attribute values. The unstructured record may be a file, document or an object with free form text or embedded values included therein. Examples of unstructured records may include word processing documents”)
OSUALA does not specifically teach the remaining limitations of claim 1. However, Srinivasan teaches:
detecting, by using computer vision and a natural language process, the table in the new unstructured document (see Srinivasan [0073] “With reference to FIG. 2, a table is identified in a document specified by a user at Step 202. The document may be in various formats such as ASCII text, Unicode text, HTML, PDF text, and PDF image. The present invention uses Optical Character Recognition (OCR) (Examiner interprets “OCR” as computer vision) to scan the PDF image documents and convert the image into text. Similarly, PDF text documents are converted into text by using a filter. HTML documents are converted into text format before they are processed further, and the text documents are processed as is.”)
and extracting frequent words from a cell-item in the table, (see Srinivasan [0030] “The web server runs a controller servlet compliant with industry standard web servers. The web server can access documents containing unstructured tabular data, stored in any format such as ASCII text, Unicode text, HTML, PDF text or PDF Image format. The application server comprises an engine and a data access layer. The engine, which is the runtime execution module of the system, extracts tabular data from documents, and interprets and standardizes it. Extraction, interpretation, and standardization is performed by using a set of identification, parsing, and mapping rules, as described above, which are stored in the database and are accessed by using the data access layer. The extracted data, along with other application-specific data, is stored in the database. The data access layer acts as a gateway to the database and the RDBMS. The extracted, interpreted, and standardized data is accessible to a user through the user interface. Development of rules for identification, extraction and interpretation are facilitated by the rules development UI.”) (see Srinivasan [0162] “Identical text descriptions may occur in different sections of an identified table. For example, the word `other`…”)
wherein a frequency of each frequent word is greater than or equal to a frequency threshold; (see Srinivasan [0068] “Referring now primarily to Table 2, the results of the identification of hierarchical mathematical structural relationships amongst the line items are displayed. The last column in Table 2 represents the hierarchical structure, where a value of -1 indicates that the item does not have a parent. A positive value in this column implies that the item in the line indicated by the value is the parent of the current row. The third-last column represents the value for that line. For example, `2003` in line 0 represents the value for the STATEMENT YEAR. In this example, lines 0 through 15 are header labels for a Balance Sheet, and therefore do not have any parent lines, since they are not a part of the Balance Sheet. Lines 16 through 19 have line 21 as their parent. Line 21 is the `Total Current Assets`. Line 21 through 26, in turn, have a parent in line 27, which is the grand total of the assets side of the balance sheet. In the case of financial statements, such a mathematical relationship serves to validate the integrity of the extracted statement, and can also be used to identify the key sections in a financial statement.”)
determining whether or not an association rule in the rule-set has succeeded or failed when applied to two or more frequent words, (see Srinivasan [0125] “At step 1405, the hierarchical mathematical structure in a table is discovered and used in conjunction with a set of validation rules, which are applied to the tokenized/parsed contents of the identified table to verify the accuracy of the tokenizing/parsing. The following is a set of example rules for validating a Balance Sheet: 31 `FINAL ROW` `1`; 32 `STOCKHOLDER` `CONTAINS` `CONTINUE`; 33 `EQUITY` `CONTAINS` `EXIT`; 34 `CAPITAL` `CONTAINS` `EXIT`; 35 `MAX UNFATHOMED ROWS` `1`”)
wherein the association rule is to find an association between two or more frequent words; (see Srinivasan [0126] “Rule 31 specifies that the hierarchical structure of the table should have only one root at the end of the discovery process. A hierarchical structure implies that each row in the table will be a constituent part of the final row, either directly or indirectly, through another row. Therefore, a row can be a part of another row, which will be referred to as its parent. The parent can be a part of another row in the table, which will then be known as the parent's parent row. Continuing this way, the rule specifies that at the end there should be only one independent parent or root row. Such a characteristic is commonly found in most financial tables, including in financial statements. Rules 32, 33 and 34 specify further validation constraints on the final row. Rule 32 states that the final row should be checked to ascertain whether it `contains` the text `STOCKHOLDER`, and specifies that if that condition is satisfied, the validation step should `continue`. In other words, the mere containment of the text `STOCKHOLDER` is a necessity, but not a sufficient condition for concluding the validation step. The third rule states that the final row should be checked to confirm whether it contains the text `EQUITY`. If it does, the rule specifies that the validation step can be concluded. Similarly, the fourth rule states that the final row should be checked to determine whether it contains the text `CAPITAL`, and if it does, the validation step may be concluded. The fifth rule specifies that no rows in the table can be left unprocessed. This implies that every row in the table has to be part of the hierarchical structure.”)
when the association rule is determined to have failed, identifying a confidence level in the determination that the association rule has failed; (see Srinivasan [0052] “The preferred embodiment of the present invention provides the application designer with a framework, to create a set of identification, extraction, interpretation and standardization rules. Once the designer is satisfied that the rules are offering a satisfactory level of accuracy, they can be deployed for production usage at step 128. These rules are applied on other documents, to identify, parse and interpret tabular data at step 130. The accuracy of the results is also checked at step 132, and the rules revised at step 134, if the desired accuracy is not achieved. While the documents are being processed in production, the present invention enables the automated updating of the rules as a result of correcting structuring errors, if any.”)
when the confidence level is below a threshold confidence level, identifying the two or more frequent words, to which the failed association rule was applied, and (see Srinivasan [0076] “The content of the identified table is first filtered to remove any invalid data. Examples of invalid data include HTML tags that are embedded between text contents of the table, and signify the beginning of a table. Then, by using parsing rules, the table content is tokenized/parsed into items or tokens on a line-by-line basis. Next, a set of validation rules are applied to the tokenized/parsed contents of the identified table, in order to verify the accuracy of tokenizing/parsing. This step eliminates erroneous tokenization/parsing of the table content.”)
mining and providing a triple of the two or more frequent words, to which the failed association rule was applied, the failed association rule, and context, in which the failed association rule should have succeeded, for verification. (see Srinivasan [0126], quoted in full above, describing Rules 31 through 35 and the validation of the final row.)
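For illustration only, the validation described in the cited passages of Srinivasan (a parent column in which -1 marks a row without a parent, a single root row, and text checks on the final row) can be sketched as follows. This is a minimal sketch under assumptions: the function name and data layout are hypothetical, header rows are assumed to have been removed beforehand, and the reading of Rules 32 through 34 (STOCKHOLDER is necessary, then EQUITY or CAPITAL concludes validation) is one interpretation of the quoted text rather than Srinivasan's implementation.

```python
from typing import List


def validate_balance_sheet(labels: List[str], parents: List[int]) -> bool:
    """Check a parsed table against a simplified reading of Rules 31-34.

    labels[i]  -- text label of row i
    parents[i] -- index of the parent row of row i, or -1 if row i has no parent
    """
    # Rule 31: exactly one root row should remain in the hierarchy.
    roots = [i for i, parent in enumerate(parents) if parent == -1]
    if len(roots) != 1:
        return False
    final_row = labels[roots[0]].upper()
    # Rule 32: containing STOCKHOLDER is necessary but not sufficient.
    if "STOCKHOLDER" not in final_row:
        return False
    # Rules 33 and 34: containing EQUITY or CAPITAL concludes the validation.
    return "EQUITY" in final_row or "CAPITAL" in final_row
```

A table whose rows all roll up into a single "Total Stockholders' Equity" row passes, while a table with two parentless rows, or a root row lacking the expected text, does not.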
OSUALA and Srinivasan are in the same field of endeavor of natural language understanding; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of OSUALA to incorporate the teachings of Srinivasan to include detecting, by using computer vision and a natural language process, the table in the new unstructured document and extracting frequent words from a cell-item in the table, wherein a frequency of each frequent word is greater than or equal to a frequency threshold; determining whether or not an association rule in the rule-set has succeeded or failed when applied to two or more frequent words, wherein the association rule is to find an association between two or more frequent words; when the association rule is determined to have failed, identifying a confidence level in the determination that the association rule has failed; when the confidence level is below a threshold confidence level, identifying a word the two or more frequent words, to which the failed association rule was applied, and mining and providing a triple of the two or more frequent words, to which the failed association rule was applied, the failed association rule, and context, in which the failed association rule should have succeeded, for verification. Doing so allows for extraction, interpretation, and standardization in which a combination of text labels, grammatical constraints on the text labels, a number of distinct words, and operations on numeric values is used for identification, as recognized by Srinivasan in paragraph [0148].
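For illustration only, the sequence of limitations recited above (frequency-thresholded word extraction, an association rule check, and a triple provided for verification) can be pictured with the minimal sketch below. It is not taken from the application or from any cited reference. The function names, the representation of an association rule as an (antecedent, consequent) pair, the confidence value supplied by the caller, and the sample cell text are all hypothetical assumptions.

```python
from collections import Counter


def frequent_words(cell_text, frequency_threshold):
    """Return the words whose count in the cell text is >= the threshold."""
    counts = Counter(cell_text.lower().split())
    return {word for word, n in counts.items() if n >= frequency_threshold}


def check_rule(rule, words):
    """Apply a hypothetical association rule (antecedent -> consequent).

    Returns None when the antecedent is absent (rule not applicable),
    True when both words are present, and False when the rule fails.
    """
    antecedent, consequent = rule
    if antecedent not in words:
        return None
    return consequent in words


def review_item(rule, words, context, confidence, threshold_confidence):
    """Build a (words, rule, context) triple for verification.

    The triple is produced only when the rule failed and the caller-supplied
    confidence in that failure is below the threshold confidence level.
    """
    if check_rule(rule, words) is False and confidence < threshold_confidence:
        return (tuple(sorted(words)), rule, context)
    return None


# Hypothetical cell text: "total" occurs 3 times, "equity" 2, "assets" 1.
words = frequent_words("total equity total assets equity total", 2)
triple = review_item(("total", "assets"), words, "row 3", 0.4, 0.6)
```

In this sketch the confidence level is passed in rather than computed, because how it is derived is the subject of the dependent claims and is not assumed here.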
As to claims 5 and 15, OSUALA in view of Srinivasan teaches The method as recited in claim 1 (see claim 1) and The non-transitory storage medium as recited in claim 11 (see claim 11).
Furthermore, OSUALA teaches wherein some of the rules in the rule-set are known to hold together for some words. (see OSUALA [0071] “A database (“DB” or knowledge domain) of rules may be enriched, enhanced, updated, replaced, and/or added to using the extracted rules, as in block 508. That is, the extraction of one or more rules, concepts and topics may include, but is not limited to, performing knowledge extraction from natural language text documents including reading input text; transforming the input text into a machine understandable knowledge representation so as to provide knowledge libraries (e.g., within the database/knowledge domain) from said documents; and using semantic based means for extracting concepts and their interrelations from said input text. Knowledge structures of the database/knowledge domain may be used consisting of rules, or other concepts and topics, such as rule-like obligations (examiner interprets rules hold for some as “rule-like obligations”) and violations, and the interrelations of the rule-like obligations and violations. Hence, the one or more rules having incorrect data relating to the one or more existing, similar rules may be identified according to the database/knowledge domain.”)
As to claims 6 and 16, OSUALA in view of Srinivasan teaches The method as recited in claim 1 (see claim 1) and The non-transitory storage medium as recited in claim 11 (see claim 11).
Furthermore, OSUALA teaches wherein the confidence level is determined based on a level of support in the ground truth documents for the rule set when the failed rule is excluded, and also based on a level of support in the ground truth documents for the rule set when the failed rule is included. (see OSUALA [0077] “To further illustrate the operation of the rule ranker of block 516, consider the following example. Assume the rule selector receives as input 1) set of rules R such that each rule has a body and a weight, 2) a scoring function F. The rule ranker may output a list of rules R ranked according to the scoring function F. During implementation, R′ may be set equal to the set of rules (e.g., Let R′={ }), and F may be set as a default scoring function that ranks rules based on the probability the rules are incorrect as defined by the machine learning model. In an additional embodiment, the default scoring function F may be giving higher priority to rules that are associated with probability at or above a defined threshold or value (e.g., close to 0.5) (for those rules that correctness/incorrectness is uncertain. Thus, receiving user provided feedback/correction/labelling may increase the ML algorithm's accuracy. For each (Ri, wi) in R, a scoring function F(R) may be determined, (Ri,wi, F(Ri)) may be added to R′, R′ may be sorted in decreasing order of F(Ri) and R′ may be returned. That is, a set of rules R is ranked according to a scoring function F(Ri), which is defined relative to a rule Ri. An example of a scoring function is the probability that rule Ri is incorrect as computed by the machine learning model.”)
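For illustration only, the rule ranker described in the quoted passage of OSUALA [0077] (rules with a body and a weight, a scoring function F, and output sorted in decreasing order of F) can be sketched as follows. This is a minimal sketch, not OSUALA's implementation. The function names, the two-argument signature of the scoring function, the linear "closeness to 0.5" formula, and the probability table standing in for the machine learning model are hypothetical assumptions.

```python
def rank_rules(rules, score):
    """Rank (body, weight) rules in decreasing order of score(body, weight).

    Returns a list of (body, weight, score) tuples, mirroring the
    (Ri, wi, F(Ri)) entries added to R' in the quoted passage.
    """
    ranked = [(body, weight, score(body, weight)) for body, weight in rules]
    ranked.sort(key=lambda entry: entry[2], reverse=True)
    return ranked


def uncertainty_score(prob_incorrect):
    """Hypothetical score that is highest when the probability is near 0.5."""
    return 1.0 - 2.0 * abs(prob_incorrect - 0.5)


# Hypothetical model output: probability that each rule is incorrect.
prob_incorrect = {"r1": 0.9, "r2": 0.55, "r3": 0.2}
rules = [("r1", 1.0), ("r2", 0.5), ("r3", 2.0)]

# Default-style ranking: most probably incorrect rules first.
by_probability = rank_rules(rules, lambda body, weight: prob_incorrect[body])

# Alternative ranking: rules whose correctness is most uncertain first.
by_uncertainty = rank_rules(
    rules, lambda body, weight: uncertainty_score(prob_incorrect[body])
)
```

The second ranking corresponds to the additional embodiment in the quoted passage, in which rules with a probability close to 0.5 are prioritized for user feedback.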
As to claims 9 and 19, OSUALA in view of Srinivasan teaches The method as recited in claim 1 (see claim 1) and The non-transitory storage medium as recited in claim 11 (see claim 11).
Furthermore, OSUALA teaches wherein the rules are included in the rule-set due to a determination that the rules hold together for some words in the set of ground truth documents. (see OSUALA [0071] “A database (“DB” or knowledge domain) of rules may be enriched, enhanced, updated, replaced, and/or added to using the extracted rules, as in block 508. That is, the extraction of one or more rules, concepts and topics may include, but is not limited to, performing knowledge extraction from natural language text documents including reading input text; transforming the input text into a machine understandable knowledge representation so as to provide knowledge libraries (e.g., within the database/knowledge domain) from said documents; and using semantic based means for extracting concepts and their interrelations from said input text. Knowledge structures of the database/knowledge domain may be used consisting of rules, or other concepts and topics, such as rule-like obligations (examiner interprets rules hold for some as “rule-like obligations”) and violations, and the interrelations of the rule-like obligations and violations. Hence, the one or more rules having incorrect data relating to the one or more existing, similar rules may be identified according to the database/knowledge domain.”)
Regarding claim 11, claim 11 is a storage medium claim with limitations similar to those of claim 1 and is rejected under the same rationale. Furthermore, OSUALA teaches A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: (see OSUALA [0121] “When the systems and methods described herein are implemented in software 1712, as is shown in FIG. 15, the methods may be stored on any computer readable medium, such as storage 1720, for use by or in connection with any computer related system or method. The storage 1720 may comprise a disk storage such as HDD storage.”)
OSUALA and Srinivasan are in the same field of endeavor of natural language understanding; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the storage medium combination of OSUALA and Srinivasan to incorporate the teachings of Kim to include applying the rule-set to a new document that was not included in the set of ground truth documents; determining whether or not a rule in the rule-set succeeded or failed when applied to a word in the new document, and when the rule is determined to have failed, identifying the failed rule; and providing a triple of the word, the rule that failed, and context, in which the rule should have succeeded, to a human for verification. Doing so allows for determination of the overall match score between the semantic representation of the original and the semantic representation of the candidate, as recognized by Kim in paragraph [0072].
Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over OSUALA (US Patent Application Publication No. US 20200160191 A1), in view of Srinivasan (US Patent Application Publication No. US 20060288268 A1), and further in view of Sublett (US Patent Application Publication No. US 20230418978 A1).
As to claims 2 and 12, OSUALA in view of Srinivasan teaches The method as recited in claim 1 (see claim 1) and The non-transitory storage medium as recited in claim 11 (see claim 11).
OSUALA in view of Srinivasan do not specifically teach wherein the new document is an unstructured document. However, Sublett does teach this limitation (see Sublett [0008] “In another embodiment of the invention, a data processing system can be adapted for batch de-identification of unstructured health care documents.”)
OSUALA, Srinivasan, and Sublett are in the same field of endeavor of natural language understanding; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified The method as recited in claim 1, and The non-transitory storage medium as recited in claim 11, of OSUALA and Srinivasan to incorporate the teachings of Sublett to include wherein the new document is an unstructured document. Doing so allows for the fields of the unstructured document to be rapidly located, as recognized by Sublett in paragraphs [0004]-[0005].
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over OSUALA (US Patent Application Publication No. US 20200160191 A1), in view of Srinivasan (US Patent Application Publication No. US 20060288268 A1), and further in view of Anubhai (US Patent Application Publication No. US 20220100963 A1).
As to claims 3 and 13, OSUALA in view of Srinivasan teaches The method as recited in claim 1 (see claim 1) and The non-transitory storage medium as recited in claim 11 (see claim 11).
OSUALA in view of Srinivasan do not specifically teach wherein the confidence level is a function of an extent to which rules in the rule set hold for words in the new document. However, Anubhai does teach this limitation (see Anubhai [0134] “As shown in FIG. 14A, a custom NLP model 2135 may be deployed to a production environment to perform model inference 2260. The model 2135 may be deployed to production after being trained using a set of training data 2245 (e.g., a corpus of annotated documents) and evaluated against one or more acceptance rules 2255. Inference based on the model 2135 may be monitored to collect inference data 2280. The inference data 2280 may include one or more inference inputs 2285. For example, the inference input(s) 2285 may include one or more input documents associated with low-confidence outputs and/or one or more input documents that statistically deviate with respect to the task definition from the corpus of documents used for training the model.”)
OSUALA, Srinivasan, and Anubhai are in the same field of endeavor of natural language understanding; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified The method as recited in claim 1, and The non-transitory storage medium as recited in claim 11, of OSUALA and Srinivasan to incorporate the teachings of Anubhai to include wherein the confidence level is a function of an extent to which rules in the rule set hold for words in the new document. Doing so allows for (1) improving the latency of event extraction from documents using automated techniques such as machine learning instead of manual review; (2) improving the scalability of event extraction from documents using automated techniques such as machine learning instead of manual review; and (3) improving the accuracy of event extraction from documents using automated techniques for trigger detection, event detection, role assignment, trigger co-reference, and entity co-reference, and so on, as recognized by Anubhai in paragraph [0025].
Claims 4, 8, 14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over OSUALA (US Patent Application Publication No. US 20200160191 A1), in view of Srinivasan (US Patent Application Publication No. US 20060288268 A1), and further in view of Jacquet (US Patent Application Publication No. US 20150127323 A1).
As to claims 4 and 14, OSUALA in view of Srinivasan teaches The method as recited in claim 1 (see claim 1) and The non-transitory storage medium as recited in claim 11 (see claim 11).
OSUALA in view of Srinivasan do not teach wherein the rule is considered to hold for a word in the table in the new document if preconditions for the rule hold, and if a check concerning that rule is satisfied. However, Jacquet does teach this limitation (see Jacquet [0020] “The identification of similar paths is based on event clustering information under the assumption that related predicates will occur more often in the same events. This allows inference rules to be generated based on the identified, similar paths. In the exemplary embodiment, an unsupervised temporal-based clustering of events is used, and the cluster structure is used to weight candidate inference rules. Using a more accurate set of rules directly impacts the inference and results in better application performance. The utility of the refined rules is demonstrated below on a document clustering task where the refined rules improve the clustering. Semantic inference, and inference rules that enable it, are not limited to the clustering task but can be employed in many NLP applications, such as information extraction, question answering, and document summarization.”)
OSUALA in view of Srinivasan also do not teach wherein the check comprises determining clauses that must succeed and/or constraints that must hold for the rule to succeed when the preconditions are met. However, Jacquet does teach this limitation (see Jacquet [0021-0022] “A "path," as used herein is a syntactic construct around a binary predicate, i.e., a predicate with two slots (i.e., variables) for the predicate's arguments (the subject and object of the predicate). In the path, the predicate is represented by its root (e.g., infinitive) form. An instance of a path is a triple in which the two slots are occupied by respective instances of the arguments and the predicate may be any of the forms of the predicate accepted in the particular grammar of the natural language under consideration. The instance of the path may be found in a corpus of text documents by parsing of the corpus documents. For example a path for the predicate find could be represented as: [0022] where X is the subject of the verb find and Y is the object of the verb find. An instance of this path could be the triple (Harry, find, Sally) where Harry is the subject of the verb find, occupying the first slot and Sally is the object of find, occupying the second slot. The triple could be identified in the corpus by parsing a sentence such as "Yesterday, Harry found Sally in the park."”)
OSUALA, Srinivasan, and Jacquet are in the same field of endeavor of natural language understanding; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified The method as recited in claim 1, and The non-transitory storage medium as recited in claim 11, of OSUALA and Srinivasan to incorporate the teachings of Jacquet to include wherein the rule is considered to hold for a word in the new document if preconditions for the rule hold, and if a check concerning that rule is satisfied, and wherein the check comprises determining clauses that must succeed and/or constraints that must hold for the rule to succeed when the preconditions are met. Doing so allows for improved inference, better application performance, and improved clustering, information extraction, and document summarization, as recognized by Jacquet in paragraph [0020].
As to claims 8 and 18, OSUALA in view of Srinivasan teaches The method as recited in claim 1 (see claim 1) and The non-transitory storage medium as recited in claim 11 (see claim 11).
OSUALA in view of Srinivasan do not specifically teach wherein the rule-set is employed with the new document based on a frequency with which the rule-set was determined to apply to the set of ground truth documents. However, Jacquet does teach this limitation (see Jacquet [0065] “The exemplary parser 30 may incorporate rules…” and [0066] “Prior to computing the event-based path similarity, corpus statistics are collected. For example, for every path, all the occurrences of nouns that instantiate each of its two slots are logged, as well as the frequency of these instantiations (e.g., number of occurrences, in the document corpus 12).”)
OSUALA, Srinivasan, and Jacquet are in the same field of endeavor of natural language understanding; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified The method as recited in claim 1, and The non-transitory storage medium as recited in claim 11, of OSUALA and Srinivasan to incorporate the teachings of Jacquet to include wherein the rule-set is employed with the new document based on a frequency with which the rule-set was determined to apply to the set of ground truth documents. Doing so allows for improved inference, better application performance, and improved clustering, information extraction, and document summarization, as recognized by Jacquet in paragraph [0020].
Claims 7, 10, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over OSUALA (US Patent Application Publication No. US 20200160191 A1), in view of Srinivasan (US Patent Application Publication No. US 20060288268 A1), and further in view of BENINCASA (US Patent Application Publication No. US 20220309109 A1).
As to claims 7 and 17, OSUALA in view of Srinivasan teaches The method as recited in claim 1 (see claim 1) and The non-transitory storage medium as recited in claim 11 (see claim 11).
OSUALA in view of Srinivasan do not specifically teach wherein the verification comprises a determination whether or not a cell-item of the table in the new document was correctly apply to the set of ground truth documents. However, BENINCASA does teach this limitation (see BENINCASA [0108] “The extracted answer may be presented at the user interface by rendering the corresponding characters (text) of the original document in a tabular format, in which each cell corresponds to a specific question and a specific document (answer table—see FIG. 8).”) (see BENINCASA [0158] “FIG. 3 is a schematic representation of the relationships between tokens, features and label. FIG. 3 uses a tabular format to represent those relationships (note, however, that, unlike the answer tables of FIGS. 2 and 8, the information in FIG. 3 is not generally something that is presented to the user). The left-hand column shows a token sequence S, and the right-hand column shows the corresponding label sequence L. Each row of the table corresponds to a position in the token sequence. The middle n rows correspond to the n feature functions respectively. Each cell of the left-hand row represents a token s.sub.i at the corresponding position i in the sequence S and each cell of the right-hand column represents that token's label l.sub.i. Each cell of the column corresponding to feature function ƒ.sub.j denotes feature j of token s.sub.i. As noted, each feature is a numerical value which may be weighted in the CRF in accordance with Equation 4, though the numerical values are not shown explicitly in FIG. 3. In FIG. 3, the labels in the right-hand column are ground truth labels, indicating relevancy or non-relevancy of the corresponding token to a specific user-defined question, as determined based on the user's highlighting of the original (underlying) document.”)
OSUALA, Srinivasan, and BENINCASA are in the same field of endeavor of natural language understanding; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified The method as recited in claim 1, and The non-transitory storage medium as recited in claim 11, of OSUALA and Srinivasan to incorporate the teachings of BENINCASA to include wherein the verification comprises a determination whether or not a cell-item of a table in the new document was correctly apply to the set of ground truth documents. Doing so allows for an effective data extraction system which can be extended to new use cases/document types in a shorter amount of time, as recognized by BENINCASA in paragraph [0006].
As to claims 10 and 20, OSUALA in view of Srinivasan teaches The method as recited in claim 1 (see claim 1) and The non-transitory storage medium as recited in claim 11 (see claim 11).
OSUALA in view of Srinivasan do not specifically teach further comprising performing a content extraction process. However, BENINCASA does teach this limitation (see BENINCASA [0108] “The extracted answer may be presented at the user interface by rendering the corresponding characters (text) of the original document in a tabular format, in which each cell corresponds to a specific question and a specific document (answer table—see FIG. 8).”) that includes using the rules in the rule-set to assign cell-items to the words in the new document. (see BENINCASA [0158] “FIG. 3 is a schematic representation of the relationships between tokens, features and label. FIG. 3 uses a tabular format to represent those relationships (note, however, that, unlike the answer tables of FIGS. 2 and 8, the information in FIG. 3 is not generally something that is presented to the user). The left-hand column shows a token sequence S, and the right-hand column shows the corresponding label sequence L. Each row of the table corresponds to a position in the token sequence. The middle n rows correspond to the n feature functions respectively. Each cell of the left-hand row represents a token s.sub.i at the corresponding position i in the sequence S and each cell of the right-hand column represents that token's label l.sub.i. Each cell of the column corresponding to feature function ƒ.sub.j denotes feature j of token s.sub.i. As noted, each feature is a numerical value which may be weighted in the CRF in accordance with Equation 4, though the numerical values are not shown explicitly in FIG. 3. In FIG. 3, the labels in the right-hand column are ground truth labels, indicating relevancy or non-relevancy of the corresponding token to a specific user-defined question, as determined based on the user's highlighting of the original (underlying) document.”)
OSUALA, Srinivasan, and BENINCASA are in the same field of endeavor of natural language understanding; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified The method as recited in claim 1, and The non-transitory storage medium as recited in claim 11, of OSUALA and Srinivasan to incorporate the teachings of BENINCASA to include further comprising performing a content extraction process that includes using the rules in the rule-set to assign cell-items to the words in the new document. Doing so allows for an effective data extraction system which can be extended to new use cases/document types in a shorter amount of time, as recognized by BENINCASA in paragraph [0006].
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KRISTEN MICHELLE MASTERS, whose telephone number is (703) 756-1274. The examiner can normally be reached M-F 8:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KRISTEN MICHELLE MASTERS/Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659