DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 5/30/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 9-13, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Meng et al. (US 20230022845 A1, hereinafter Meng) in view of Dinerstein et al. (US 10339423 B1, hereinafter Dinerstein), further in view of Ouyang (US 20250238638 A1).
Regarding claim 1, Meng teaches a computer-implemented method comprising:
extracting data from the digital representation of the master document (FIG. 10, [0110] “in response to the one or more documents having been uploaded, various embodiments extract metadata from the document”);
applying the trained first machine learning model to map the extracted data to corresponding fields of the template (FIG. 7, [0101] “particular questions are mapped (e.g., via a hash map) to particular fields within the window pane”);
outputting the mappings (FIG. 7, 717);
training the first machine learning model further with the training dataset ([0026] “the questions into a feature vector embedding in feature space based at least in part on training one or more machine learning models in order to learn”);
Meng fails to teach receiving a request to train a trained first machine learning model further to extract data from digital representations of documents of a first document type, the request comprising a template for the first document type and a digital representation of a master document of the first document type that contains text in a first language; receiving one or more corrections to the mappings; modifying the mappings based on the one or more corrections to generate ground truth data for the digital representation of the master document; generating a training dataset comprising a plurality of fake documents of the first document type that contain text in a second language, wherein the fake documents are generated by a second machine learning model based at least in part on the ground truth data; and applying the trained first machine learning model to map data extracted from a digital representation of a document of the first document type that contains text in the second language to corresponding fields of the template.
However, Dinerstein teaches generating a training dataset comprising a plurality of fake documents of the first document type that contain text in a second language, wherein the fake documents are generated by a second machine learning model based at least in part on the ground truth data ([Column 2, line 19-21] “the set of simulated training documents may represent fake documents that contain text in the second language”);
Meng and Dinerstein are considered analogous to the claimed invention because both are in the same field of improving machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Meng's technique of predicting the type of data to which one or more numerical characters and/or one or more natural language word characters of a document (e.g., an invoice) correspond with Dinerstein's technique of generating a training dataset using fake documents, in order to improve methods of generating training documents used by classification algorithms (see Dinerstein, Abstract).
Meng in view of Dinerstein fails to teach receiving a request to train a trained first machine learning model further to extract data from digital representations of documents of a first document type, the request comprising a template for the first document type and a digital representation of a master document of the first document type that contains text in a first language; receiving one or more corrections to the mappings; modifying the mappings based on the one or more corrections to generate ground truth data for the digital representation of the master document; and applying the trained first machine learning model to map data extracted from a digital representation of a document of the first document type that contains text in the second language to corresponding fields of the template.
However, Ouyang teaches receiving a request to train a trained first machine learning model further to extract data from digital representations of documents of a first document type, the request comprising a template for the first document type and a digital representation of a master document of the first document type that contains text in a first language ([0045] “Training a ML model generally involves inputting into an ML model (e.g., an untrained ML model) training data to be processed by the ML model, processing the training data using the ML model, collecting the output generated by the ML model (e.g., based on the inputted training data), and comparing the output to a desired set of target values. If the training data is labeled, the desired target values may be, e.g., the ground truth labels of the training data”, examiner interprets input as the request);
receiving one or more corrections to the mappings ([0098] “Each of the nodes 612 may be associated with a candidate task prompt and at least one candidate output generated by an LLM using the task prompt”);
modifying the mappings based on the one or more corrections to generate ground truth data for the digital representation of the master document ([0098] “Each of the edges 614 may be associated with a modification prompt and at least one candidate task prompt generated by an LLM using the modification prompt”; [0052] “The parameters of the CNN may be learned through training, using data having ground truth labels specific to the desired task”); and
applying the trained first machine learning model to map data extracted from a digital representation of a document of the first document type that contains text in the second language to corresponding fields of the template ([0061] “the transformer 50 is used for a translation task, the decoder 54 may map the feature vectors 62 into text output in a target language different from the language of the original tokens 56”, examiner interprets vectors as digital representation).
Meng, Dinerstein, and Ouyang are considered analogous to the claimed invention because all are in the same field of improving machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data extraction techniques of Meng in view of Dinerstein with the technique of modifying mappings taught by Ouyang in order to improve how large language models modify prompts (see Ouyang, [0001]).
Regarding claim 2, Meng in view of Dinerstein in view of Ouyang teaches all of the limitations of claim 1, upon which claim 2 depends.
Additionally, Ouyang teaches wherein: the one or more corrections to the mappings are received via a user interface ([0078] “The I/O interface 526 may include any communication interface which enables the prompt modification processor 520 to communicate with external components”), and
the one or more corrections to the mappings comprise at least one of: a modification of the extracted data; an adjustment of field coordinates; or an addition of a field specified in the template that was not extracted from the digital representation of the master document by the trained first machine learning model ([0021] “The subsequent candidate prompt may include modified candidate instructions for processing the input data which are different from the candidate instructions”; [0096] “The prompt display subregion 602 may also allow users associated with the user devices 504 to further modify and/or amend the subsequent candidate task prompt initially generated by the LLM and displayed in the prompt display subregion 602 (e.g., using the user interface of the one of the user devices 504), including deletions and insertions of text of the subsequent candidate task prompt”).
Regarding claim 9, Meng in view of Dinerstein in view of Ouyang teaches all of the limitations of claim 1, upon which claim 9 depends.
Additionally, Dinerstein teaches receiving an indication to use the second language for the training dataset as part of the request; or receiving an indication to use the second language for the training dataset along with the one or more corrections to the mappings ([Column 1, line 40-51] “generating a list of tokens from within the training documents that indicate critical terms representative of classes defined by the classification system, (iii) translating the list of tokens from the first language to a second language”).
Regarding claim 10, Meng in view of Dinerstein in view of Ouyang teaches all of the limitations of claim 1, upon which claim 10 depends.
Additionally, Meng teaches storing the data extracted from the digital representation of the document of the first document type that contains the text in the second language in a database (FIG. 1, 125).
Regarding claim 11, Meng in view of Dinerstein in view of Ouyang teaches all of the limitations of claim 1, upon which claim 11 depends.
Additionally, Dinerstein teaches the plurality of fake documents of the first document type that contain text in the second language is a first plurality of fake documents ([Column 2, line 19-23] “the set of simulated training documents may represent fake documents that contain text in the second language but do not contain content that is comprehensible by a speaker of the second language”),
the training dataset further comprises a second plurality of fake documents of the first document type that contain text in a third language ([Column 3, line 48-51] “the systems and methods described herein may generate fake training documents based on the translated tokens that enable the classification system to classify documents written in the different language”),
the second plurality of fake documents are generated by the second machine learning model based at least in part on the ground truth data ([Column 14, line 3-10] “The system may then use the translated tokens to create fake training documents in the desired language that are statistically and/or structurally similar to the original training documents. As such, the system may use the simulated training documents to train a machine learning algorithm to accurately and efficiently classify documents written in the desired language”), and
the method further comprises at least one of: receiving an indication to use the second language and the third language for the training dataset as part of the request; or receiving an indication to use the second language and the third language for the training dataset along with the one or more corrections to the mappings ([Column 14, line 1-6] “The system may ensure that the correct context, usage and/or meaning of the tokens is retained during translation. The system may then use the translated tokens to create fake training documents in the desired language that are statistically and/or structurally similar to the original training documents”).
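As a purely illustrative sketch, not part of the record of this application, the Dinerstein approach summarized above (translating the critical tokens of real training documents into a target language, then emitting "fake" documents that reuse the translated tokens while remaining statistically and structurally similar to the originals) might be implemented along the following lines. The token dictionary, function names, and document shape are all hypothetical assumptions:

```python
import random

# Hypothetical token translation table (first language -> second language).
# In Dinerstein, the tokens are "critical terms representative of classes."
TRANSLATIONS = {"invoice": "factura", "total": "total", "due": "vencido"}

def make_fake_document(source_tokens, translations, length=None, seed=0):
    """Emit a fake training document in the target language.

    The document reuses only the translated tokens and keeps the same
    length as the source, preserving structural similarity, even though
    the result need not be comprehensible to a speaker of the language.
    """
    rng = random.Random(seed)
    translated = [translations.get(t, t) for t in source_tokens]
    length = length or len(translated)
    return [rng.choice(translated) for _ in range(length)]

# One fake Spanish-language "document" derived from an English token list.
fake = make_fake_document(["invoice", "total", "due"], TRANSLATIONS)
```

A classifier for the second language could then be trained on many such generated documents, labeled with the class of the source document they were derived from.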
Regarding claim 12, Meng in view of Dinerstein in view of Ouyang teaches all of the limitations of claim 1, upon which claim 12 depends.
Additionally, Ouyang teaches wherein the second machine learning model comprises a Large Language Model (LLM) ([0037] "A generative language model, such as an LLM as described below, may receive a task prompt comprising at least instructions and input data.").
Regarding claim 13, Meng in view of Dinerstein in view of Ouyang teaches all of the limitations of claim 1, upon which claim 13 depends.
Additionally, Meng teaches wherein training the trained first machine learning model further comprises transferring previously generated weights of the trained first machine learning model (fine-tuning means taking the weights of a trained neural network and using them as the initialization for a new model being trained on data from the same domain (e.g., documents)).
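As a purely illustrative sketch, not part of the record, the weight-transfer ("fine-tuning") concept described above can be shown with a toy one-parameter model: the weights from prior training are copied in as the starting point, and training then continues on new in-domain data. All identifiers here are hypothetical:

```python
def train_step(weights, example, lr=0.1):
    """One gradient-descent step for a toy linear model y = w * x."""
    w = weights["w"]
    x, y = example
    grad = 2 * (w * x - y) * x  # derivative of squared error w.r.t. w
    return {"w": w - lr * grad}

def fine_tune(pretrained_weights, new_data, steps=100):
    """Transfer previously generated weights, then continue training."""
    weights = dict(pretrained_weights)  # start from the trained weights
    for _ in range(steps):
        for example in new_data:
            weights = train_step(weights, example)
    return weights

pretrained = {"w": 1.9}  # weights from earlier training (learned w near 2)
# New data from the same domain, whose best fit is w = 3.
tuned = fine_tune(pretrained, [(1.0, 3.0), (2.0, 6.0)])
```

Because training starts from the transferred weights rather than a random initialization, the model adapts to the new data without discarding what was previously learned.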
Regarding claim 15, Meng in view of Dinerstein teaches all of the limitations of claim 14, upon which claim 15 depends.
Additionally, Meng teaches a user interface receiving the template (FIG. 9, 901) and the digital representation of the master document (FIG. 7, 717).
Meng in view of Dinerstein fails to teach a request to train the trained first machine learning model further to extract data from digital representations of documents of the first document type that contain text in the second language.
However, Ouyang teaches a request to train the trained first machine learning model further to extract data from digital representations of documents of the first document type that contain text in the second language ([0045] "Training a ML model generally involves inputting into an ML model (e.g., an untrained ML model) training data to be processed by the ML model, processing the training data using the ML model, collecting the output generated by the ML model (e.g., based on the inputted training data), and comparing the output to a desired set of target values. If the training data is labeled, the desired target values may be, e.g., the ground truth labels of the training data", examiner interprets input as the request).
Meng, Dinerstein, and Ouyang are considered analogous to the claimed invention because all are in the same field of improving machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data extraction techniques of Meng in view of Dinerstein with the technique of modifying mappings taught by Ouyang in order to improve how large language models modify prompts (see Ouyang, [0001]).
Regarding claim 19, Meng in view of Dinerstein in view of Ouyang teaches all of the limitations of claim 15, upon which claim 19 depends.
Additionally, Meng teaches comprising computer-executable instructions that, when executed by the computing system, cause the computing system to perform: after training the trained first machine learning model further with the training dataset, applying the trained first machine learning model to extract data from a digital representation of a document of the first document type that contains text in the second language ([0079] “A retrained or trained word embedding receives training feedback after it has received initial training session(s) and is optimized or generated for a specific data set (e.g., trained invoices)”; [0054] “the pre-training component 108 additionally or alternatively uses other NLP-based functionality, such as Named Entity Recognition (NER). NER is an information extraction technique that identifies and classifies elements or “entities” in natural language text into predefined categories”; [0104] “The screenshot 800 additionally includes the window pane 817, which corresponds to a bill summary that indicates (e.g., in different natural language relative to the invoice 801)”).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Meng et al. (US 20230022845 A1, hereinafter Meng) in view of Dinerstein et al. (US 10339423 B1, hereinafter Dinerstein).
Regarding independent claim 14, Meng teaches a computing system comprising:
at least one hardware processor (FIG. 15, 14);
at least one memory coupled to the at least one hardware processor (FIG. 15, 12);
a first machine learning model trained with digital representations of a plurality of documents of a first document type that contain text in a first language ([0025] “various embodiments of the present disclosure are directed to using one or more machine learning models (e.g., a modified transformer) to predict a type of data that one or more numerical characters and/or one or more natural language word characters of a document (e.g., an invoice) correspond to”, examiner interprets invoices could be in digital form in a first language);
a second machine learning model comprising a Large Language Model (LLM) ([0122] “the one or more machine learning models used at 1305 includes a modified Bidirectional Encoder Representations from Transformers (BERT) model”); and
one or more non-transitory computer-readable media having stored therein computer-executable instructions that, when executed by the computing system, cause the computing system to perform ([0002] “Particular embodiments of the present disclosure include a computer-implemented method, a non-transitory computer storage medium”):
extracting data from a digital representation of a master document of the first document type that contains text in the first language (FIG. 10, [0110] “in response to the one or more documents having been uploaded, various embodiments extract metadata from the document”);
applying the trained first machine learning model to map the extracted data to corresponding fields of a template for the first document type (FIG. 7, [0101] “particular questions are mapped (e.g., via a hash map) to particular fields within the window pane”);
training the trained first machine learning model further with the training dataset ([0026] “the questions into a feature vector embedding in feature space based at least in part on training one or more machine learning models in order to learn”).
Meng fails to teach generating a training dataset comprising a plurality of fake documents of the first document type that contain text in a second language, wherein the fake documents are generated by the second machine learning model based at least in part on the mappings;
However, Dinerstein teaches generating a training dataset comprising a plurality of fake documents of the first document type that contain text in a second language, wherein the fake documents are generated by the second machine learning model based at least in part on the mappings ([Column 2, line 19-21] "the set of simulated training documents may represent fake documents that contain text in the second language").
Meng and Dinerstein are considered analogous to the claimed invention because both are in the same field of improving machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Meng's technique of predicting the type of data to which one or more numerical characters and/or one or more natural language word characters of a document (e.g., an invoice) correspond with Dinerstein's technique of generating a training dataset using fake documents, in order to improve methods of generating training documents used by classification algorithms (see Dinerstein, Abstract).
Allowable Subject Matter
Claims 3-8 and 16-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim 20 is allowed. The following is a statement of reasons for the indication of allowable subject matter: The closest piece of prior art the examiner found was Meng et al. (US 20230022845 A1), which teaches features such as "extracting data from the digital representation of the master document; applying the trained first machine learning model to map the extracted data to corresponding fields of the template; outputting the mappings". However, upon further search and consideration, the examiner deems that the prior art of record, whether taken alone or in combination, fails to teach "translating the ground truth data into each of the selected languages to generate translated data; outputting the translated data; receiving one or more corrections to the translated data; modifying the translated data based on the one or more corrections to the translated data to generate translated ground truth data for the digital representation of the master document" in combination with the other claim features. Therefore, claim 20 is allowable.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Unni et al. (US 20210390109 A1) teaches a system, method, and computer program product embodiments for extracting and posting data from an unstructured data file to a database table. In an embodiment, a server receives a request to extract and post data from an unstructured data. The server extracts the data from the unstructured data file. The server identifies a set of columns from the structured format of the extracted data. Each column of the set of columns corresponds with a set of data elements from the extracted data. The server identifies a pattern of a set of possible patterns corresponding with each column of the set of columns. Furthermore, the server maps each column of the set of columns with a database column. The server stores each set of data elements of each respective column in the respective database column.
Ruokonen et al. (US 20230120230 A1) teaches a method for translating a source text of a first language to a second language. The method includes receiving a translation request including the source text in the first language; selecting, from the source text, at least a first segment, associating at least one first metadata parameter with the first segment; providing the first segment to a first translation memory for determining a first set of translation proposals; determining a first quality score for each translation proposal; and comparing the first quality score of each translation proposal with a first predetermined acceptance threshold, and wherein based on the comparison, when a first quality score of at least one translation proposal is greater than the first predetermined acceptance threshold, the method comprises selecting a given translation proposal and providing the given translation proposal as an accepted translation of the first segment and as at least a part of an accepted translation of the source text.
Mortensen et al. (US 12443805 B1) teaches a method for generating a first case dataset in a first language. The method includes receiving adverse event data. The method further includes determining case data including general case data and regional case data and providing the case data to a translator computing device to enable display on a user interface including multiple duolingual text fields with a first language text field including at least a portion of the text data in the first language and a second language text field adjacent the first language text field. The method further includes receiving the text data in the second language from a translator computing device. The text data in the second language is received via the second language text fields of the plurality of duolingual text fields. The method further includes generating and outputting the first case dataset including the text data in the first language.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZEESHAN SHAIKH whose telephone number is (703)756-1730. The examiner can normally be reached Monday-Friday 7:30AM-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ZEESHAN MAHMOOD SHAIKH/Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658