Last updated: May 04, 2026

Application No. 18/410,785

TRAINING OF AN ELECTRONIC DOCUMENT EXTRACTION MODEL

Non-Final OA §103

Filed: Jan 11, 2024

Examiner: WINDSOR, COURTNEY J

Art Unit: 2661

Tech Center: 2600 (Communications)

Assignee: Intuit Inc.

OA Round: 1 (Non-Final)

Interview Optional

Interview lift (+8.7%) is below the 15.0% threshold. A written response is recommended.

Based on 258 resolved cases, 2023–2026

Examiner Intelligence

WINDSOR, COURTNEY J

Career Allowance Rate: 86% (above average); 223 granted / 258 resolved; +24.4% vs TC avg

Interview Lift: +8.7% (moderate)

Avg Prosecution (typical timeline): 2y 5m; 27 currently pending

Total Applications (career history): 285 across all art units

Statute-Specific Performance

§101: 5.4% (-34.6% vs TC avg)

§103: 51.1% (+11.1% vs TC avg)

§102: 20.5% (-19.5% vs TC avg)

§112: 18.0% (-22.0% vs TC avg)

Based on career data from 258 resolved cases.

Office Action

§103

DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on February 2, 2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6, 8, 11-12, 16 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Publication No. 2024/0256741 to Emanuel (hereinafter Emanuel), and further in view of U.S. Patent No. 8,996,350 to Dub et al. (hereinafter Dub).
Regarding independent claim 1, Emanuel discloses A computer-implemented method (abstract, “Systems and methods for asset fingerprinting and for authentication of rareness of a digital asset.;” paragraph 0017, “FIG. 1A shows an example of a computer network environment provided with a system for fingerprinting a digital asset and authentication of the rareness of the digital asset, according to principles of the disclosure;”) for generating training data for training an extraction model (paragraph 0101, “FIGS. 10A-10C illustrate original and transformed art images for training machine learning techniques (such as the logistic regression model) for authentication of rareness of a digital asset, according to an embodiment. FIGS. 10A-10C illustrate some examples of complex transformations.”), the method comprising:
obtaining a plurality of computer readable documents (paragraph 0061, “Finally, the processor 100 can be configured to generate a large corpus of artificially generated near-duplicate NFTs through transformation techniques applied to the NFTs in the subset of registered NFTs, as shown and described in the examples discussed below. For example, FIG. 4 depicts three sets of images 405, 410, 415 that each include an original image (leftmost image) and near-duplicate NFTs (remaining images) generated through various transformation techniques.”);
for each document of the plurality of computer readable documents, generating a document-level rareness metric based on the document (paragraph 0062, “Then, the processor 100 can be configured apply the digital asset fingerprinting and rareness evaluation protocols to the transformations, which can be stored in the database 170, for example. Specifically, a known near-duplicate NFT is selected from the corpus of artificially generated near-duplicate NFTs, its digital fingerprint vector is computed (e.g., according to method 200).”); and
sampling the plurality of computer readable documents at a document level based on the document-level rareness metrics of the plurality of computer readable documents to obtain a subset of computer readable documents (paragraph 0104, “The logistic regression model is trained by randomly selecting images from the generated duplicates and the true (unregistered) originals, producing their digital fingerprints and corresponding measures of statistical dependency (and their gains) with the registered images, and running the logistic regression model on the measures and gains for the top 10 registered fingerprints.”), wherein a training data to train an extraction model includes the subset of computer readable documents (paragraph 0101, “FIGS. 10A-10C illustrate original and transformed art images for training machine learning techniques (such as the logistic regression model) for authentication of rareness of a digital asset, according to an embodiment.”).
Emanuel fails to explicitly disclose as further recited. However, Dub discloses obtaining a plurality of computer readable documents (column 6, line 19, “In an embodiment, one or more documents are scanned and converted into one or more electronic files, such as to TIFF, PDF or other suitable format. ”), wherein each of the computer readable documents is generated by performing optical character recognition (OCR) on an electronic document (column 6, line 21, “The document(s) may be, thereafter, translated into text, such as by optical character recognition (“OCR”) or other suitable way.”).
Emanuel is directed toward “Systems and methods for asset fingerprinting and for authentication of rareness of a digital asset” (abstract). Dub is directed toward “A system for managing documents, comprising: interfaces to a user interface, proving an application programming interface, a database of document images, a remote server, configured to communicate a text representation of the document from the optical character recognition engine to the report server” (abstract). As one of ordinary skill in the art before the effective filing date of the claimed invention would readily see, Emanuel and Dub are directed toward similar fields of endeavor, namely digital document analysis. Further, one of ordinary skill in the art before the effective filing date would be well aware that the digital files analyzed for rareness values can contain text data. Further, OCR is known to be a method of analyzing text in a data file. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Dub in order to ensure that rareness calculations can take into account text data present in the digital file, ideally making the rareness calculation more accurate.
Regarding dependent claim 2, the rejection of claim 1 is incorporated herein. Additionally, Emanuel in the combination further discloses wherein a document-level rareness metric of a first document in the plurality of computer readable documents includes a structural rareness metric that indicates a rareness of a structure in the first document (paragraph 0051, “At step 305, the RRE 155 compares the digital fingerprint vector to the digital fingerprint vectors for previously registered NFTs in the database. As a result of the comparison, the RRE computes a relative rareness score at step 310. In an embodiment, this score is a number between 0% (i.e., the NFT is identical to an existing NFT) to 100% (i.e., the NFT is not even similar to any known NFT). At step 315, the RRE determines whether the NFT is a near duplicate to a previously registered NFT based on the RRS;” structure is read as embedded in the NFTs, thus the rareness is based on the structure of the file itself).
Regarding dependent claim 6, the rejection of claim 1 is incorporated herein. Additionally, Emanuel in the combination further discloses wherein the document-level rareness metric of a first document in the plurality of computer readable documents includes a content rareness metric that indicates a rareness of a content in the first document (paragraph 0051, “At step 305, the RRE 155 compares the digital fingerprint vector to the digital fingerprint vectors for previously registered NFTs in the database. As a result of the comparison, the RRE computes a relative rareness score at step 310. In an embodiment, this score is a number between 0% (i.e., the NFT is identical to an existing NFT) to 100% (i.e., the NFT is not even similar to any known NFT). At step 315, the RRE determines whether the NFT is a near duplicate to a previously registered NFT based on the RRS;” content is read as the data within the NFTs, thus the rareness is based on the content of the file and if it matches or does not match any other NFTs).
Regarding dependent claim 8, the rejection of claim 1 is incorporated herein. Additionally, Emanuel in the combination further discloses wherein the document-level rareness metric of a first document in the plurality of computer readable documents includes a combination of a structural rareness metric that indicates a rareness of a structure in the first document and a content rareness metric that indicates a rareness of a content in the first document (paragraph 0051, “At step 305, the RRE 155 compares the digital fingerprint vector to the digital fingerprint vectors for previously registered NFTs in the database. As a result of the comparison, the RRE computes a relative rareness score at step 310. In an embodiment, this score is a number between 0% (i.e., the NFT is identical to an existing NFT) to 100% (i.e., the NFT is not even similar to any known NFT). At step 315, the RRE determines whether the NFT is a near duplicate to a previously registered NFT based on the RRS;” content is read as the data within the NFTs, thus the rareness is based on the content of the file and if it matches or does not match any other NFTs; structure is read as embedded in the NFTs, thus the rareness is based on the structure of the file itself; paragraph 0058, “ The RRE can be configured to combine the results of the processes described above to generate various sub-scores that can be transformed to a single number between 0.00%-100.00%. One sub-score sums up the various similarity measures and compares the sum to the maximum if the NFTs were the same, essentially averaging the result of the different similarity measures to the extent they are available.”).
Regarding independent claim 11, the rejection of claim 1 applies directly. Additionally, Emanuel further discloses A system (abstract, “Systems and methods for asset fingerprinting and for authentication of rareness of a digital asset.;” paragraph 0017, “FIG. 1A shows an example of a computer network environment provided with a system for fingerprinting a digital asset and authentication of the rareness of the digital asset, according to principles of the disclosure;”) for generating training data for training an extraction model (paragraph 0101, “FIGS. 10A-10C illustrate original and transformed art images for training machine learning techniques (such as the logistic regression model) for authentication of rareness of a digital asset, according to an embodiment. FIGS. 10A-10C illustrate some examples of complex transformations.”), the system comprising:
one or more processors (Figure 1B, element 110, “CPU/GPU”); and
a memory storing instructions that, when executed by the one or more processors, causes the system to perform operations (paragraph 0012, “The system comprises a processing circuit and a non-transitory storage medium storing a registry of registered digital fingerprints, and machine learning models. Also stored on the storage medium are instructions that, when executed by the processing circuit”) comprising:
obtaining a plurality of computer readable documents (paragraph 0061, “Finally, the processor 100 can be configured to generate a large corpus of artificially generated near-duplicate NFTs through transformation techniques applied to the NFTs in the subset of registered NFTs, as shown and described in the examples discussed below. For example, FIG. 4 depicts three sets of images 405, 410, 415 that each include an original image (leftmost image) and near-duplicate NFTs (remaining images) generated through various transformation techniques.”);
for each document of the plurality of computer readable documents, generating a document-level rareness metric based on the document  (paragraph 0062, “Then, the processor 100 can be configured apply the digital asset fingerprinting and rareness evaluation protocols to the transformations, which can be stored in the database 170, for example. Specifically, a known near-duplicate NFT is selected from the corpus of artificially generated near-duplicate NFTs, its digital fingerprint vector is computed (e.g., according to method 200).”); and
sampling the plurality of computer readable documents at a document level based on the document-level rareness metrics of the plurality of computer readable documents to obtain a subset of computer readable documents (paragraph 0104, “The logistic regression model is trained by randomly selecting images from the generated duplicates and the true (unregistered) originals, producing their digital fingerprints and corresponding measures of statistical dependency (and their gains) with the registered images, and running the logistic regression model on the measures and gains for the top 10 registered fingerprints.”), wherein a training data to train an extraction model includes the subset of computer readable documents (paragraph 0101, “FIGS. 10A-10C illustrate original and transformed art images for training machine learning techniques (such as the logistic regression model) for authentication of rareness of a digital asset, according to an embodiment.”).
Emanuel fails to explicitly disclose as further recited. However, Dub discloses obtaining a plurality of computer readable documents (column 6, line 19, “In an embodiment, one or more documents are scanned and converted into one or more electronic files, such as to TIFF, PDF or other suitable format.”), wherein each of the computer readable documents is generated by performing optical character recognition (OCR) on an electronic document (column 6, line 21, “The document(s) may be, thereafter, translated into text, such as by optical character recognition (“OCR”) or other suitable way.”).
Emanuel is directed toward “Systems and methods for asset fingerprinting and for authentication of rareness of a digital asset” (abstract). Dub is directed toward “A system for managing documents, comprising: interfaces to a user interface, proving an application programming interface, a database of document images, a remote server, configured to communicate a text representation of the document from the optical character recognition engine to the report server” (abstract). As one of ordinary skill in the art before the effective filing date of the claimed invention would readily see, Emanuel and Dub are directed toward similar fields of endeavor, namely digital document analysis. Further, one of ordinary skill in the art before the effective filing date would be well aware that the digital files analyzed for rareness values can contain text data. Further, OCR is known to be a method of analyzing text in a data file. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Dub in order to ensure that rareness calculations can take into account text data present in the digital file, ideally making the rareness calculation more accurate.
Regarding dependent claim 12, the rejection of claim 11 is incorporated herein. Additionally, Emanuel in the combination further discloses wherein a document-level rareness metric of a first document in the plurality of computer readable documents includes a structural rareness metric that indicates a rareness of a structure in the first document (paragraph 0051, “At step 305, the RRE 155 compares the digital fingerprint vector to the digital fingerprint vectors for previously registered NFTs in the database. As a result of the comparison, the RRE computes a relative rareness score at step 310. In an embodiment, this score is a number between 0% (i.e., the NFT is identical to an existing NFT) to 100% (i.e., the NFT is not even similar to any known NFT). At step 315, the RRE determines whether the NFT is a near duplicate to a previously registered NFT based on the RRS;” structure is read as embedded in the NFTs, thus the rareness is based on the structure of the file itself).
Regarding dependent claim 16, the rejection of claim 11 is incorporated herein. Additionally, Emanuel in the combination further discloses wherein the document-level rareness metric of a first document in the plurality of computer readable documents includes a content rareness metric that indicates a rareness of a content in the first document (paragraph 0051, “At step 305, the RRE 155 compares the digital fingerprint vector to the digital fingerprint vectors for previously registered NFTs in the database. As a result of the comparison, the RRE computes a relative rareness score at step 310. In an embodiment, this score is a number between 0% (i.e., the NFT is identical to an existing NFT) to 100% (i.e., the NFT is not even similar to any known NFT). At step 315, the RRE determines whether the NFT is a near duplicate to a previously registered NFT based on the RRS;” content is read as the data within the NFTs, thus the rareness is based on the content of the file and if it matches or does not match any other NFTs).
Regarding dependent claim 18, the rejection of claim 11 is incorporated herein. Additionally, Emanuel in the combination further discloses wherein the document-level rareness metric of a first document in the plurality of computer readable documents includes a combination of a structural rareness metric that indicates a rareness of a structure in the first document and a content rareness metric that indicates a rareness of a content in the first document (paragraph 0051, “At step 305, the RRE 155 compares the digital fingerprint vector to the digital fingerprint vectors for previously registered NFTs in the database. As a result of the comparison, the RRE computes a relative rareness score at step 310. In an embodiment, this score is a number between 0% (i.e., the NFT is identical to an existing NFT) to 100% (i.e., the NFT is not even similar to any known NFT). At step 315, the RRE determines whether the NFT is a near duplicate to a previously registered NFT based on the RRS;” content is read as the data within the NFTs, thus the rareness is based on the content of the file and if it matches or does not match any other NFTs; structure is read as embedded in the NFTs, thus the rareness is based on the structure of the file itself; paragraph 0058, “ The RRE can be configured to combine the results of the processes described above to generate various sub-scores that can be transformed to a single number between 0.00%-100.00%. One sub-score sums up the various similarity measures and compares the sum to the maximum if the NFTs were the same, essentially averaging the result of the different similarity measures to the extent they are available.”).
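The passage of Emanuel quoted above (paragraph 0058) describes one sub-score as the sum of the available similarity measures compared against the maximum that sum could reach for identical assets. A minimal sketch of that averaging is given below for illustration only. The function name, the measure names, and the assumption that every measure tops out at 1.0 are hypothetical and are not taken from Emanuel or from the application.

```python
def combined_similarity(measures, max_per_measure=1.0):
    """Sum the available similarity measures and divide by the maximum attainable sum.

    `measures` maps a (hypothetical) measure name to a value in [0, max_per_measure],
    or to None when that measure is not available.
    """
    available = [m for m in measures.values() if m is not None]
    if not available:
        return None  # nothing to average
    return sum(available) / (max_per_measure * len(available))


# Hypothetical values: two measures available, one unavailable.
example = {"measure_a": 0.5, "measure_b": 0.25, "measure_c": None}
score = combined_similarity(example)
print(score)  # 0.375
```

Dividing by the count of available measures, rather than the count of all measures, mirrors the quoted phrase "to the extent they are available".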

Allowable Subject Matter
Claims 3-5, 7, 9-10, 13-15, 17 and 19-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 3-5 and 13-15:
The following is a statement of reasons for the indication of allowable subject matter: the closest prior art of record teaches methods of training machine learning algorithms to quantify document rareness. However, none of the references, alone or in any combination, teaches determining a group of all documents of a specific type, calculating a centroid of the group including the first document, and then calculating a distance between the centroid and an embedding of the first document in order to determine a structural rareness. The closest prior art, Emanuel, discloses methods of determining rareness between NFT files (abstract). Further, Emanuel discloses, “The FE processes the digital asset (e.g., NFT image) using trained neural network models, which generate a digital fingerprint vector representation of the image. The RRE compares the digital fingerprint vectors to a dataset of registered digital fingerprint vectors using multiple correlation measures” (abstract), and determining rarity from 0-100 (paragraph 0038).
However, Emanuel fails to disclose determining a group of all documents of a specific type, and calculating a centroid of the group including the first document, then calculating a distance between the centroid and an embedding of the first document in order to determine a structural rareness.
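The centroid-and-distance computation described in the statement above can be illustrated with a short sketch. This is one possible reading, not the applicant's implementation: the use of Euclidean distance, the two-dimensional embeddings, and all names (`centroid`, `structural_rareness`, `group_embeddings`) are assumptions made for the example.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def structural_rareness(doc_id, embeddings):
    """Distance from one document's embedding to the centroid of its group.

    `embeddings` maps document IDs to embeddings for all documents of one type;
    the group includes the document being scored.
    """
    group = list(embeddings.values())
    return math.dist(embeddings[doc_id], centroid(group))


# Hypothetical 2-D embeddings for three documents of the same type.
group_embeddings = {
    "doc_a": [0.0, 0.0],
    "doc_b": [0.0, 2.0],
    "doc_c": [6.0, 1.0],
}
scores = {d: structural_rareness(d, group_embeddings) for d in group_embeddings}
rarest = max(scores, key=scores.get)
print(rarest, scores[rarest])  # doc_c 4.0
```

Here the centroid is (2.0, 1.0), so the document farthest from it, `doc_c`, receives the highest structural rareness under this reading.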
Claims 7 and 17:
The following is a statement of reasons for the indication of allowable subject matter: the closest prior art of record teaches methods of training machine learning algorithms to quantify document rareness. However, none of the references, alone or in any combination, teaches determining a content rareness metric of a digital document based on identifying fields to be extracted in a document group of the same type and identifying which fields are in an original document. The closest prior art, Emanuel, discloses methods of determining rareness between NFT files (abstract). Further, Emanuel discloses, “The FE processes the digital asset (e.g., NFT image) using trained neural network models, which generate a digital fingerprint vector representation of the image. The RRE compares the digital fingerprint vectors to a dataset of registered digital fingerprint vectors using multiple correlation measures” (abstract), and determining rarity from 0-100 (paragraph 0038).
However, Emanuel fails to disclose determining a content rareness metric of a digital document based on identifying fields to be extracted in a document group of the same type and identifying which fields are in an original document.
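The office action does not state how the content rareness metric is computed from the fields, so the following is only a sketch under an assumed formula: each field's rarity is taken as one minus the fraction of same-type documents that contain it, and a document's score is the mean rarity of the fields it contains. The formula, the function name, and the field names are all hypothetical.

```python
def content_rareness(doc_fields, group_fields):
    """Mean of (1 - field frequency) over the fields present in one document.

    `group_fields` is a list of field-name sets, one per document of the same type;
    `doc_fields` is the field-name set of the document being scored.
    """
    present = set(doc_fields)
    if not present:
        return 0.0
    n_docs = len(group_fields)
    rarities = []
    for field in present:
        freq = sum(1 for fields in group_fields if field in fields) / n_docs
        rarities.append(1.0 - freq)
    return sum(rarities) / len(rarities)


# Hypothetical group of four documents of one type and their extractable fields.
group = [
    {"name", "total"},
    {"name", "total"},
    {"name", "total", "tip"},
    {"name", "discount"},
]
common_score = content_rareness(group[0], group)
unusual_score = content_rareness(group[3], group)
print(common_score, unusual_score)  # 0.125 0.375
```

Under this assumed formula, a document carrying a field that few of its peers contain scores higher than one carrying only common fields.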
Claims 9-10 and 19-20:
The following is a statement of reasons for the indication of allowable subject matter: the closest prior art of record teaches methods of training machine learning algorithms to quantify document rareness. However, none of the references, alone or in any combination, teaches sampling documents based on utilizing a top-percentile sampling method of the rareness metrics used to train the extraction model.
The closest prior art, Emanuel, discloses methods of determining rareness between NFT files (abstract). Further, Emanuel discloses training of the model in a supervised manner (paragraph 0063) using images and their respective vectors (paragraph 0084).
However, Emanuel fails to disclose sampling documents based on utilizing a top-percentile sampling method of the rareness metrics used to train the extraction model.
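Top-percentile sampling, as referred to in the statement above, can be sketched as keeping the documents whose rareness scores fall in the top p percent. The sketch below is an illustration under assumptions, not the claimed method: the rounding rule (round up, keep at least one document), the function name, and the example scores are hypothetical.

```python
import math


def top_percentile_sample(rareness_by_doc, percentile):
    """Return the IDs of the rarest `percentile` percent of documents, rarest first."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    # Round up so that a small corpus still yields at least one document.
    k = max(1, math.ceil(len(rareness_by_doc) * percentile / 100))
    ranked = sorted(rareness_by_doc, key=rareness_by_doc.get, reverse=True)
    return ranked[:k]


# Hypothetical document-level rareness scores.
doc_scores = {"d1": 0.10, "d2": 0.85, "d3": 0.40, "d4": 0.95, "d5": 0.20}
subset = top_percentile_sample(doc_scores, 40)
print(subset)  # ['d4', 'd2']
```

This contrasts with the random selection in Emanuel's paragraph 0104 cited in the rejection: selection here is driven by the rareness ranking rather than by chance.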

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
U.S. Publication No. 2021/0248446 to Hughes et al. discloses, “A method and system of matching a first product with a second product. The method including converting first product metadata with image metadata and textual data to a first product feature vector (abstract).” 
U.S. Patent No. 10,891,699 to Admon discloses, “Systems and methods in support of digital document analysis receive a data file having a document containing text; determine a document classification for the document and at least one defined external consideration relating to the first document (abstract).”

Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Courtney J. Nelson whose telephone number is (571)272-3956. The examiner can normally be reached Monday - Friday 8:00 - 4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Villecco can be reached at 571-272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/COURTNEY JOAN NELSON/Primary Examiner, Art Unit 2661


Prosecution Timeline

Jan 11, 2024: Application Filed

Feb 24, 2026: Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/551,988 (Patent 12609201): PROGNOSIS PREDICTION DEVICE, PROGNOSIS PREDICTION METHOD, AND PROGRAM. 2y 7m to grant; granted Apr 21, 2026.

17/906,054 (Patent 12603175): METHOD AND APPARATUS FOR DETERMINING DIAGNOSIS RESULT DATA. 2y 9m to grant; granted Apr 14, 2026.

18/059,154 (Patent 12597188): SYSTEMS AND METHODS FOR PROCESSING ELECTRONIC IMAGES FOR PHYSIOLOGY-COMPENSATED RECONSTRUCTION. 3y 4m to grant; granted Apr 07, 2026.

18/073,290 (Patent 12597494): METHOD AND APPARATUS FOR TRAINING MEDICAL IMAGE REPORT GENERATION MODEL, AND IMAGE REPORT GENERATION METHOD AND APPARATUS. 3y 4m to grant; granted Apr 07, 2026.

17/988,138 (Patent 12588881): PROVIDING A RESULT DATA SET. 3y 4m to grant; granted Mar 31, 2026.

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2

Grant Probability: 86%

With Interview (+8.7%): 95%

Median Time to Grant: 2y 5m (~2m remaining)

PTA Risk: Low

Based on 258 resolved cases by this examiner. Grant probability derived from career allowance rate.
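The projection figures above can be checked with simple arithmetic. The page does not say how the with-interview figure is computed; the sketch below assumes the interview lift is added to the career allowance rate as percentage points, which reproduces the displayed values.

```python
granted, resolved = 223, 258   # from the career allowance rate above
interview_lift_pts = 8.7       # assumed to be additive percentage points

allowance_rate = 100 * granted / resolved            # about 86.4
with_interview = allowance_rate + interview_lift_pts  # about 95.1

print(round(allowance_rate), round(with_interview))  # 86 95
```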