Office Action Analysis: 18129194 — ARTIFICIAL INTELLIGENCE PLATFORM TO MANAGE A DOCUMENT COLLECTION

Office Action

§101 §103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1 – 20 are pending and examined herein. 
Claims 1 – 20 are rejected under 35 U.S.C. 101.
Claims 1 – 20 are rejected under 35 U.S.C. 103.

Specification
The disclosure is objected to because of the following informalities:
Reference numeral 304 is used to refer to “trained data” in [0098]. In other paragraphs, 304 is used to refer to data in general (mostly data from the source data). It is unclear whether 304 as used in the previous paragraphs also refers to trained data.
Reference numeral 1208 in Fig. 12 is never introduced in the specification.
Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 - 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1-20, in accordance with these steps, follows.

Step 1 Analysis:
Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter).
Claims 1 – 7 are directed to a method, which falls within the statutory category of process. Claims 8 – 14 are directed to a non-transitory computer-readable storage medium, which can be an article of manufacture. Claims 15 – 20 are directed to a computing apparatus, which can be a machine.

Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis:
Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101.
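The sequence of questions described above can be sketched as a short function. This is only an illustrative sketch of the order in which the questions are asked, not an official USPTO tool; the function name `eligibility_outcome` and its boolean parameters are hypothetical, and each answer is assumed to have already been determined by the examiner.

```python
def eligibility_outcome(statutory_category: bool,
                        recites_judicial_exception: bool,
                        integrates_practical_application: bool,
                        significantly_more: bool) -> str:
    """Walk the Step 1 / Step 2A / Step 2B questions in order (hypothetical helper)."""
    # Step 1: process, machine, manufacture, or composition of matter?
    if not statutory_category:
        return "ineligible (Step 1)"
    # Step 2A Prong One: does the claim recite a judicial exception?
    if not recites_judicial_exception:
        return "eligible (Step 2A Prong One)"
    # Step 2A Prong Two: do additional elements integrate the exception
    # into a practical application?
    if integrates_practical_application:
        return "eligible (Step 2A Prong Two)"
    # Step 2B: does the claim amount to significantly more than the exception?
    if significantly_more:
        return "eligible (Step 2B)"
    return "ineligible (Step 2B)"


# The findings stated for claim 1 in this Office action: statutory category,
# abstract idea recited, no practical application, not significantly more.
print(eligibility_outcome(True, True, False, False))
```

Under the findings stated in this action, the sketch reaches the final branch, which corresponds to the rejection under 35 U.S.C. 101.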

Regarding claim 1, the following claim elements are abstract ideas:
identifying a set of common parameters shared by the set of related information blocks; (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.) 
generating a document rule based on the set of common parameters shared by the set of related information blocks; (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
	retrieving data from a document corpus associated with a defined entity, (This is mere data gathering, an insignificant extra solution activity, which is a well-understood, routine conventional activity. It does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d). Therefore, this does not amount to significantly more than the judicial exception.)
the document corpus comprising a set of document records, each document record to include a signed electronic document, each signed electronic document to include a set of information blocks; (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
identifying a set of related information blocks using a machine learning model, each information block in the set of related information blocks part of a different signed electronic document of the document corpus associated with the defined entity; (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
storing the document rule as part of a document rule set for the defined entity in a rules database. (This is mere data gathering, an insignificant extra solution activity, which is a well-understood, routine conventional activity. It does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 2, the rejection of claim 1 is incorporated herein. Further, claim 2 recites the following additional elements:
converting the information blocks of the signed electronic documents of the document corpus from image components into text components. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 3, the rejection of claim 1 is incorporated herein. Further, claim 3 recites the following additional elements:
pre-processing the information blocks of the signed electronic documents of the document corpus to a defined data schema. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 4, the rejection of claim 1 is incorporated herein. Further, claim 4 recites the following additional elements:
processing data from the information blocks of the signed electronic documents from the document corpus to obtain one or more features for training the machine learning model. (This is mere data gathering, an insignificant extra solution activity, which is a well-understood, routine conventional activity. It does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 5, the rejection of claim 1 is incorporated herein. Further, claim 5 recites the following additional elements:
training the machine learning model on training data from the signed electronic documents. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 6, the rejection of claim 1 is incorporated herein. Further, claim 6 recites the following additional elements:
training the machine learning model using structured data, unstructured data, or semi-structured data from the signed electronic documents. (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 7, the rejection of claim 1 is incorporated herein. Further, claim 7 recites the following abstract idea: 
evaluating the machine learning model on testing data from the signed electronic documents. (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
Claim 7 does not recite additional elements.

Claims 8 – 14 recite substantially similar subject matter to claims 1 – 7 respectively and are rejected with the same rationale, mutatis mutandis.

Claim 15 further recites the following additional elements:
A computing apparatus, comprising: processing circuitry; and a memory communicatively coupled to the processing circuitry, the memory storing instructions that, when executed by the processing circuitry, configure the processing circuitry to: (This falls under mere instructions to apply an exception. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
The rest of claim 15 and claims 16 – 20 recite substantially similar subject matter to claims 1 – 6 respectively and are rejected with the same rationale, mutatis mutandis.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (U.S. Pub. 2023/0004604 A1) in view of Castellanos (U.S. Pub. 2005/0182736 A1).
Regarding Claim 1, Li teaches
A method, comprising: retrieving data from a document corpus associated with a defined entity, the document corpus comprising a set of document records, each document record to include a signed electronic document, each signed electronic document to include a set of information blocks; ([0010] of Li states “receive first input data comprising a document bundle; extract, from the document bundle, first information comprising substantive content of one or more documents of the document bundle; extract, from the document bundle, second information comprising metadata associated with one or more documents of the document bundle;” [0012] of Li states “generating the output data is further based on information obtained from an ERP system of an entity associated with the document bundle.” (This means that the data obtained as the document bundle would be associated with the entity.) [0018] of Li states “receive an electronic document comprising one or more signatures;”)
identifying a set of related information blocks using a machine learning model, each information block in the set of related information blocks part of a different signed electronic document of the document corpus associated with the defined entity; ([0015] of Li states “generating the output data comprises applying a page similarity assessment model to a plurality of pages of the document bundle.” [0093] of Li states “In some embodiments, the system may apply a page-similarity model as part of a document-understanding pipeline. In some embodiments, a page-similarity model may be the first step applied in a document-understanding pipeline… The page-similarity model may include one or more of the following: a random forest classification of image features (e.g., low-level image features) such as Oriented FAST and rotated BRIEF (ORB), Structural Similarity (SSIM) index, and histograms of images using different distance metrics such as correlation, chi-squared, intersection, Hellinger, etc.”)
Li does not explicitly teach:
identifying a set of common parameters shared by the set of related information blocks;
generating a document rule based on the set of common parameters shared by the set of related information blocks;
and storing the document rule as part of a document rule set for the defined entity in a rules database.
However, Castellanos teaches:
identifying a set of common parameters shared by the set of related information blocks; ([0021] of Castellanos states “The learning arrangement 108 may be used to programmatically build a knowledge base that links the annotations to various patterns found in the annotated samples 104.” [0022] of Castellanos states “The training element 110 is generally used to sift through data and determine important relations within that data. In this case, the functions provided by the training element 110 may include identifying patterns within the documents and determining whether the existence of a particular pattern is indicative of an annotation associated with that pattern.” [0025] of Castellanos states “The extractor element 116 may access legacy documents in the legacy contracts database 118 and rules in the rules database 114 to identify language patterns of the rules in the legacy documents. The patterns may be used by the extractor element 116 to identify which annotations to potentially associate with the corresponding portions (i.e. values) of the legacy documents. The extractor element 116 may use one or more statistical analyses to choose the most likely annotations to associate with parts of the legacy documents.” These language patterns are found in the legacy documents and are used to match or label the corresponding portions across documents. Therefore, the system derives patterns from multiple documents and uses them as the basis for recognition, like common parameters.)
generating a document rule based on the set of common parameters shared by the set of related information blocks; ([0043] of Castellanos states “To generate the second part of the rule (the regular expression), a number of different techniques may be used to identify valid expressions. The techniques that suit this domain may be based on machine learning” [0044] of Castellanos states “Any technique used to generate these regular expressions should be supervised (or at least semi-supervised), which means that the algorithm requires a set of contracts with tagged examples, called training set, from which patterns are learned, as previously explained. The tags of the training instances are used to guide the creation of rules and also to test the performance of proposed rules.” The learned patterns/tags are used across multiple instances (i.e., across the related information blocks) to guide the creation of rules. [0052] of Castellanos states “As previously discussed, the process of rule generation (218) may include identifying patterns and generating rules associated with the patterns.”)
and storing the document rule as part of a document rule set for the defined entity in a rules database. ([0020] of Castellanos states “The present disclosure describes applying information extraction technologies to contract-management knowledge. Information extraction systems require a separate set of rules for each domain, whether extracting from structured, semi-structured or free text. This makes machine learning an attractive option for knowledge acquisition.” [0023] of Castellanos states “The knowledge produced by the training and testing elements 110, 112 may be placed in a rules database 114. This database 114 may be any form of data storage element suitable for storing the information such as rules linking syntactical patterns with annotations that are extracted by the learning arrangement 108.” [0024] of Castellanos states “The rules database 114 may be accessed by an extractor element 116. The extractor element 116 may apply the knowledge stored in the rules database 114 to legacy contracts.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Li and Castellanos. Li teaches an automated document processing pipeline for a plurality of electronic documents, including document conversion with OCR to structured data, feature engineering, document similarity measurement, classification, and applying ML models to extract output data from the documents. Castellanos teaches learning patterns from documents to generate document rules, storing the learned rules in a rules database, and evaluating the learned rules on testing data by measuring recall and precision. One of ordinary skill in the art would have been motivated to incorporate the teachings of Castellanos into Li to apply a known approach for generating document rules from learned document patterns and storing the document rules in a rules database for document processing of signed electronic documents. Therefore, it would have been a predictable combination yielding more consistent and robust document processing.
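The three limitations mapped to Castellanos above (identifying common parameters, generating a rule from them, and storing the rule in a rules database) can be pictured with a toy example. The sketch below is purely illustrative and is neither the applicant's claimed method nor the technique of Li or Castellanos: it treats "common parameters" as words shared by every block, builds a regular expression from them, and uses an in-memory dictionary as a stand-in for a rules database. All names (`common_parameters`, `generate_rule`, `store_rule`, `rules_db`, "ExampleCo") are hypothetical.

```python
import re
from collections import defaultdict


def common_parameters(blocks):
    """Return the lowercased words that appear in every information block."""
    token_sets = [set(re.findall(r"[a-z]+", b.lower())) for b in blocks]
    return set.intersection(*token_sets) if token_sets else set()


def generate_rule(label, params):
    """Build a regex rule matching text that contains every shared word, in any order."""
    if not params:
        raise ValueError("no common parameters to build a rule from")
    # One lookahead per shared word; all must succeed for the rule to match.
    lookaheads = "".join(rf"(?=.*\b{re.escape(p)}\b)" for p in sorted(params))
    return {"label": label,
            "pattern": re.compile(lookaheads, re.IGNORECASE | re.DOTALL)}


# Stand-in for a rules database: entity name -> list of rules (assumption).
rules_db = defaultdict(list)


def store_rule(entity, rule):
    """Append the rule to the rule set kept for the given entity."""
    rules_db[entity].append(rule)


# Related blocks taken from three different (made-up) documents.
blocks = [
    "Payment is due within 30 days of invoice.",
    "Payment shall be due within 45 days of receipt.",
    "All payment due within 60 days.",
]
params = common_parameters(blocks)
rule = generate_rule("payment_terms", params)
store_rule("ExampleCo", rule)
print(sorted(params))
```

A stored rule could then be applied to a new block with `rule["pattern"].search(text)`, which returns a match only when every shared word is present.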

Regarding claim 2, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Li and Castellanos teaches
comprising converting the information blocks of the signed electronic documents of the document corpus from image components into text components. ([0029] of Li states “apply a set of data conversion processing steps to the plurality of electronic documents to generate a processed data set comprising structured data generated based on the plurality of electronic documents, wherein applying set of data conversion processing steps comprises applying one or more deep-learning-based optical character recognition (OCR) models;” OCR converts image components into text components in documents. [0032] of Li states “In some embodiments of the third system, applying the one or more deep-learning-based OCR models comprises: applying a text-detection model; and applying a text-recognition model.” [0033] of Li states “In some embodiments of the third system, applying the set of data conversion processing steps comprises, after applying the one or more deep-learning-based OCR models, applying an image-level feature engineering processing step to generate the structured data.”)

Regarding claim 3, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Li and Castellanos teaches
comprising pre-processing the information blocks of the signed electronic documents of the document corpus to a defined data schema. ([0048] of Li states “apply a set of data conversion processing steps to the plurality of electronic documents to generate a processed data set comprising structured data generated based on the plurality of electronic documents”, which generates a defined structured representation by processing data. [0024] of Castellanos states “The legacy contracts are converted to a machine readable format before being placed in the database 118. This conversion may involve converting electronic documents into a standard data format and/or converting paper documents to an electronic format using Optical Character Recognition (OCR) or similar technologies.” The system converts documents into a defined data format for use.)

Regarding claim 4, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Li and Castellanos teaches
comprising processing data from the information blocks of the signed electronic documents from the document corpus to obtain one or more features for training the machine learning model. ([0033] of Li states “after applying the one or more deep-learning-based OCR models, applying an image-level feature engineering processing step to generate the structured data.” Here, feature engineering is applied after OCR to obtain features. [0094] of Li states “In some embodiments, the page-classification module may be configured to classifying the one or more pages (e.g., the first page) of a bundle of documents determined using a Support Vector Machine (SVM) classifier and features as the image text through TFIDF and visual features of the VGG16 Model.” Features are extracted from document text or images for use by the ML model.)

Regarding claim 5, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Li and Castellanos teaches
comprising training the machine learning model on training data from the signed electronic documents. ([0021] of Castellanos states “In general, the annotated samples 104 include contractual language and annotations describing the contractual language. A learning arrangement 108 may use the contract language and annotations as input to a training element 110 and/or a testing element 112.” [0044] of Castellanos states “Any technique used to generate these regular expressions should be supervised (or at least semi-supervised), which means that the algorithm requires a set of contracts with tagged examples, called training set, from which patterns are learned, as previously explained.” [0029] of Li states “receiving user input indicating a plurality of data labels for the structured data; and applying a knowledge-based deep learning model based on the structured data and the plurality of data labels; and generating output data extracted from the plurality of electronic documents.”)

Regarding claim 6, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Li and Castellanos teaches
comprising training the machine learning model using structured data, unstructured data, or semi-structured data from the signed electronic documents. ([0020] of Castellanos states “The present disclosure describes applying information extraction technologies to contract-management knowledge. Information extraction systems require a separate set of rules for each domain, whether extracting from structured, semi-structured or free text. This makes machine learning an attractive option for knowledge acquisition.”)

Regarding claim 7, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Li and Castellanos teaches
comprising evaluating the machine learning model on testing data from the signed electronic documents. ([0045] of Castellanos states “Once the rule set covers all the tagged instances in the training set, the rules are applied (220) to a sample set. The extractor 116 (see FIG. 1) applies the rules in the repository 114 to a subset of untagged instances to automatically extract values which then are corrected by the user. The results of this testing on the sample set are used to compute (222) recall and precision of the rules.” [0046] of Castellanos states “Recall and precision are two typical measures of the quality (i.e., accuracy) of the extraction rules. Precision is the proportion of correct extractions from all the extractions done (i.e., measure of correctness). Recall is the proportion of correct extractions from all the extractions that had to be done (i.e. measure of completeness). If the resulting recall and precision do not meet or exceed (224) predetermined thresholds, the process is repeated.”)
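The recall and precision measures quoted from [0046] of Castellanos can be written out as a short calculation. The sketch below is an illustration of those two definitions only, not code from either reference; the function names, the sample values, and the threshold values are hypothetical.

```python
def precision_recall(extracted, expected):
    """Return (precision, recall) for a set of extractions against the expected set."""
    correct = len(extracted & expected)
    # Precision: correct extractions out of all extractions done.
    precision = correct / len(extracted) if extracted else 0.0
    # Recall: correct extractions out of all extractions that had to be done.
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


def meets_thresholds(precision, recall, min_precision, min_recall):
    """True when both measures meet or exceed their predetermined thresholds."""
    return precision >= min_precision and recall >= min_recall


# Made-up example: four values extracted, three expected, two in common.
extracted = {"2021-01-01", "ExampleCo", "30 days", "USD"}
expected = {"2021-01-01", "ExampleCo", "net 45"}
p, r = precision_recall(extracted, expected)
print(p, r, meets_thresholds(p, r, 0.8, 0.8))
```

In this made-up example, precision is 2/4 and recall is 2/3, so with thresholds of 0.8 the check fails and, per the quoted passage, the process would be repeated.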

Claims 8 – 14 recite substantially similar subject matter to claims 1 – 7 respectively and are rejected with the same rationale, mutatis mutandis.

Claims 15 – 20 recite substantially similar subject matter to claims 1 – 6 respectively and are rejected with the same rationale, mutatis mutandis.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BYUNGKWON HAN whose telephone number is (571)272-5294. The examiner can normally be reached M-F: 9:00AM-6PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/BYUNGKWON HAN/Examiner, Art Unit 2121                                                                                                                                                                                                        

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121