Prosecution Insights
Last updated: April 19, 2026
Application No. 18/313,426

IMAGE PROCESSING TECHNIQUES FOR GENERATING PREDICTIONS

Non-Final OA: §101, §103, §112
Filed: May 08, 2023
Examiner: CASTILLO-TORRES, KEISHA Y
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Optum Services (Ireland) Limited
OA Round: 3 (Non-Final)
Grant Probability: 74% (Favorable)
OA Rounds: 3-4
To Grant: 3y 0m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 74%, above average (80 granted / 108 resolved; +12.1% vs TC avg)
Interview Lift: +30.5%, a strong lift (resolved cases with an interview versus without)
Typical Timeline: 3y 0m average prosecution; 32 applications currently pending
Career History: 140 total applications across all art units
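The headline figures above are internally consistent and can be re-derived from the raw counts. A quick check (note that the Tech Center average below is back-computed from the stated +12.1% delta, not a separately reported number):

```python
# Sanity check of the examiner statistics shown above. The raw counts
# (80 granted / 108 resolved) come from the panel; the Tech Center
# average is back-derived from the stated +12.1% delta.
granted, resolved = 80, 108

allow_rate = granted / resolved
assert round(allow_rate * 100) == 74    # matches the 74% career rate

implied_tc_avg = allow_rate - 0.121     # implied TC 2600 average
print(f"Career allow rate:  {allow_rate:.1%}")
print(f"Implied TC average: {implied_tc_avg:.1%}")
```

This puts the implied Tech Center average near 62%, consistent with the "above average" characterization.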

Statute-Specific Performance

§101: 26.2% (-13.8% vs TC avg)
§103: 42.9% (+2.9% vs TC avg)
§102: 15.1% (-24.9% vs TC avg)
§112: 8.8% (-31.2% vs TC avg)
TC averages are estimates; based on career data from 108 resolved cases.

Office Action

§101, §103, §112
DETAILED ACTION

This communication is in response to the Amendments and Arguments filed on 01/05/2026. Claim 12 has been canceled by the Applicant. Claims 1-11 and 13-20 are pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/05/2026 has been entered.

Response to Arguments and Amendments

Amendments to the claims by the Applicant have been considered and are addressed below. With respect to the 35 USC § 112, § 101, and § 103 rejections, the Applicant provides several arguments, to which the Examiner responds below.

35 USC § 112 rejection(s): Arguments at pages 10-11 of the Remarks filed on 01/05/2026.

Examiner's Response to Arguments: Applicant's arguments and respective amendments with respect to the 35 USC § 112 rejections of claims 1-16 have been fully considered and are persuasive. The 35 USC § 112 rejections of claims 1-16 have been withdrawn. However, Applicant's arguments and/or respective amendments with respect to the 35 USC § 112 rejection of claim 18 have been fully considered and are not persuasive. The 35 USC § 112 rejection of claim 18 is maintained. The Examiner refers the Applicant to the updated 35 USC § 112 rejection for claim 18, below.

35 USC § 101 rejection(s): Arguments at pages 11-12 of the Remarks filed on 01/05/2026.

Examiner's Response to Arguments: The arguments have been considered but are not persuasive.
The Examiner respectfully disagrees with the arguments that "…amended independent claims reflect an improvement to the architecture and operation of a natural language processing (NLP) model that includes a 'neural network' trained using labelled training data, a type of AI model that was the focus of the recent USPTO guidance…" and that "…the recited NLP model is able to achieve features not enabled by a conventional model (e.g., indicating the importance of portions of text of a document relative to other portions). This is a clear technical improvement that illustrates how the subject matter of amended independent claims is integrated into a practical application…"

Please see the detailed analysis below for more details on how, in the Examiner's view, the independent claims do not recite additional elements that integrate the judicial exception into a practical application; hence, they do not qualify as patent-eligible subject matter under 35 U.S.C. § 101.

Please refer to MPEP 2106.04(1): Eligibility Step 2A: Whether a Claim is Directed to a Judicial Exception: Prong One:

"Prong One asks does the claim recite an abstract idea, law of nature, or natural phenomenon? In Prong One examiners evaluate whether the claim recites a judicial exception, i.e. whether a law of nature, natural phenomenon, or abstract idea is set forth or described in the claim. While the terms "set forth" and "described" are thus both equated with "recite", their different language is intended to indicate that there are two ways in which an exception can be recited in a claim. For instance, the claims in Diehr, 450 U.S. at 178 n. 2, 179 n.5, 191-92, 209 USPQ at 4-5 (1981), clearly stated a mathematical equation in the repetitively calculating step, and the claims in Mayo, 566 U.S. 66, 75-77, 101 USPQ2d 1961, 1967-68 (2012), clearly stated laws of nature in the wherein clause, such that the claims "set forth" an identifiable judicial exception. Alternatively, the claims in Alice Corp., 573 U.S. at 218, 110 USPQ2d at 1982, described the concept of intermediated settlement without ever explicitly using the words "intermediated" or "settlement.""

"An example of a claim that recites a judicial exception is "A machine comprising elements that operate in accordance with F=ma." This claim sets forth the principle that force equals mass times acceleration (F=ma) and therefore recites a law of nature exception. Because F=ma represents a mathematical formula, the claim could alternatively be considered as reciting an abstract idea. Because this claim recites a judicial exception, it requires further analysis in Prong Two in order to answer the Step 2A inquiry. An example of a claim that merely involves, or is based on, an exception is a claim to "A teeter-totter comprising an elongated member pivotably attached to a base member, having seats and handles attached at opposing sides of the elongated member." This claim is based on the concept of a lever pivoting on a fulcrum, which involves the natural principles of mechanical advantage and the law of the lever. However, this claim does not recite these natural principles and therefore is not directed to a judicial exception (Step 2A: NO). Thus, the claim is eligible at Pathway B without further analysis."

From this analysis, in Step 2A, Prong One, the Examiner has evaluated the independent claims accordingly and determined that the amended independent claims as drafted indeed describe a judicial exception (i.e., an abstract idea), which represents a mental process (one that can be performed by a human mentally and/or with pen and paper). More specifically, similar to what was discussed in the Final Rejection mailed on 11/05/2025, the limitations of independent claims 1, 17, and 20 as drafted cover a human (mental process); they recite:

1.
A computer-implemented method comprising:
receiving, by one or more processors of a document processing system, imaging data of one or more documents, wherein the one or more documents include medical records;
extracting, by the one or more processors and utilizing an optical character recognition (OCR) engine of the document processing system, text from the imaging data of one or more documents;
determining, by the one or more processors and utilizing a natural language processing (NLP) model of the document processing system, one or more attention scores for one or more tokens in the one or more documents, wherein each of the one or more tokens represents a respective portion of extracted text, wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text;
aggregating, by the one or more processors, the one or more tokens into clusters based on the one or more attention scores; and
determining, by the one or more processors and utilizing the NLP model, an aggregate attention score for each of the clusters, wherein individual aggregate attention scores indicate an importance of one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another of the clusters;
causing, by the one or more processors, a graphical user interface to output a presentation of the text that visually indicates the importance of the one or more portions of the text relative to the other portions of the text, respectively.
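For orientation, the pipeline recited in claim 1 (score tokens, cluster by score, aggregate, render importance visually) can be sketched in a few lines. The tokens, scores, and score-band clustering rule below are hypothetical stand-ins for a trained NLP model's output, not the applicant's implementation:

```python
from itertools import groupby

# Stand-in for the OCR + NLP stages: (token, attention score) pairs such
# as a trained neural network might emit. The tokens and scores are
# invented purely for illustration.
scored_tokens = [
    ("patient", 0.91), ("reports", 0.88),   # high-attention span
    ("the", 0.10), ("intake", 0.12),        # low-attention span
    ("acute", 0.95), ("symptoms", 0.93),    # high-attention span
]

# Aggregate adjacent tokens into clusters by attention band -- a crude
# stand-in for the claimed score-based clustering.
clusters = [list(g) for _, g in
            groupby(scored_tokens, key=lambda p: p[1] >= 0.5)]

# Aggregate attention score per cluster (here, a simple mean), then
# render a text "presentation" that marks the more important clusters.
for cluster in clusters:
    score = sum(s for _, s in cluster) / len(cluster)
    words = " ".join(t for t, _ in cluster)
    mark = "**" if score >= 0.5 else "  "
    print(f"{mark} {words}  (aggregate attention: {score:.2f})")
```

The sketch yields three clusters, with the two high-attention spans marked for the reader.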
This reads on a human (e.g., mentally and/or using pen and paper):
- Receiving a document (e.g., printed documents (i.e., medical records));
- Extracting text from said document;
- Predicting and assigning values, based on a predetermined set of steps or rules, to the words extracted from the text of the document, wherein the predetermined set of rules is set based on learning from previous predefined rules, and wherein the set of rules is learned to include the computation of scores for each of the words based on pre-known/predefined data (i.e., text/labels) associated with indications of importance of words in text;
- Organizing the words based on the predictions/values above to create sentence(s);
- Determining a score for the words based on a predetermined set of steps or rules, wherein the score is associated with the importance of the word(s) in the text groups;
- Writing down (e.g., on paper) said sentences along with the scores for display.

17. A system comprising:
one or more processors; and
at least one non-transitory computer readable medium storing instructions which, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving one or more documents, wherein the one or more documents include medical records;
extracting, utilizing an optical character recognition (OCR) engine, text from the one or more documents;
aggregating the one or more tokens into clusters based on the one or more attention scores;
determining, utilizing a natural language processing (NLP) model, one or more attention scores for one or more tokens of the one or more documents, wherein each of the one or more tokens represents a respective portion of the extracted text, wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text;
determining, utilizing the NLP model, an aggregate attention score for each of the clusters, wherein individual aggregate attention scores indicate an importance of one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another one of the clusters;
retraining the NLP model using the determined aggregate attention scores and the clusters as labeled training data, such that the NLP model learns to interpret importance of different portions of input text relative to the other portions of the input text, respectively.

This reads on a human (e.g., mentally and/or using pen and paper):
- Receiving a document (e.g., printed documents (i.e., medical records));
- Extracting text from said document;
- Predicting and assigning values, based on a predetermined set of steps or rules, to the words extracted from the text of the document, wherein the set of rules is learned to include the computation of scores for each of the words based on pre-known/predefined data (i.e., text/labels) associated with indications of importance of words in text;
- Organizing the words based on the predictions/values above to create sentence(s);
- Determining a score for the words based on a predetermined set of steps or rules, wherein the score is associated with the importance of the word(s) in the text groups;
- Redefining the predetermined set of steps or rules for future use.

20.
A non-transitory computer readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising:
receiving one or more documents, wherein the one or more documents include medical records;
extracting, utilizing an optical character recognition (OCR) engine, text from the one or more documents;
determining, utilizing a natural language processing (NLP) model, one or more attention scores for one or more tokens of the one or more documents, wherein each of the one or more tokens represents a respective portion of the extracted text, wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text;
aggregating the one or more tokens into clusters based on the one or more attention scores to construct sentences;
determining, utilizing the NLP model, an aggregate attention score for each of the clusters, wherein individual aggregate attention scores indicate an importance of one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another one of the clusters;
filtering at least a portion of the clusters based on a predetermined threshold value;
extracting the filtered clusters as structured data, and storing the structured data in a database;
receiving, via a graphical user interface, a query to the database that corresponds to a cluster stored in the database; and
based on the query, causing the graphical user interface to display imaging data associated with the one or more documents that include the cluster.
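The tail of claim 20 (threshold-filter the clusters, store them as structured data, answer a query by returning the associated imaging data) can likewise be sketched. The schema, threshold value, and rows below are invented for illustration and do not reflect the actual application:

```python
import sqlite3

# Hypothetical example clusters: (cluster text, aggregate attention
# score, associated imaging data). Invented for illustration only.
clusters = [
    ("acute symptoms",  0.94, "doc1_page3.png"),
    ("the intake form", 0.11, "doc1_page1.png"),
    ("patient reports", 0.90, "doc1_page3.png"),
]
THRESHOLD = 0.5   # stand-in for the claimed "predetermined threshold value"

# Filter by threshold, extract as structured rows, store in a database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clusters (text TEXT, score REAL, image TEXT)")
db.executemany("INSERT INTO clusters VALUES (?, ?, ?)",
               [c for c in clusters if c[1] >= THRESHOLD])

# A GUI query against a stored cluster returns the imaging data of the
# document(s) that include it.
image = db.execute("SELECT image FROM clusters WHERE text = ?",
                   ("acute symptoms",)).fetchone()[0]
print(image)   # doc1_page3.png
```

Only the two clusters above the threshold survive the filtering step, and the query resolves to the page image the interface would display.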
This reads on a human (e.g., mentally and/or using pen and paper):
- Receiving a document (e.g., printed documents (i.e., medical records));
- Extracting text from said document;
- Predicting and assigning values, based on a predetermined set of steps or rules, to the words extracted from the text of the document, wherein the set of rules is learned to include the computation of scores for each of the words based on pre-known/predefined data (i.e., text/labels) associated with indications of importance of words in text;
- Organizing the words based on the predictions/values above to create sentence(s);
- Determining a score for the words based on a predetermined set of steps or rules, wherein the score is associated with the importance of the word(s) in the text groups;
- Redefining the predetermined set of steps or rules to extract the text based on a predetermined threshold value;
- Writing down on a piece of paper the extracted text based on the threshold above;
- Receiving a request for specific word(s);
- Writing down (e.g., on paper) said word(s) for display.

Please also refer to MPEP 2106.05(f)(2): Whether the claim invokes computers or other machinery merely as a tool to perform an existing process, and MPEP 2106.06(b): Clear Improvement to a Technology or to Computer Functionality.

Please refer to MPEP 2106.04(2): Eligibility Step 2A: Whether a Claim is Directed to a Judicial Exception: Prong Two:

"Prong Two asks does the claim recite additional elements that integrate the judicial exception into a practical application? In Prong Two, examiners evaluate whether the claim as a whole integrates the exception into a practical application of that exception. If the additional elements in the claim integrate the recited exception into a practical application of the exception, then the claim is not directed to the judicial exception (Step 2A: NO) and thus is eligible at Pathway B. This concludes the eligibility analysis.
If, however, the additional elements do not integrate the exception into a practical application, then the claim is directed to the recited judicial exception (Step 2A: YES), and requires further analysis under Step 2B (where it may still be eligible if it amounts to an ''inventive concept''). For more information on how to evaluate whether a judicial exception is integrated into a practical application, see MPEP § 2106.04(d)(2)."

From this analysis, in Step 2A, Prong Two, the Examiner has evaluated the independent claims accordingly and determined that the claims as a whole do not include additional elements that integrate the exception (i.e., the abstract idea) into a practical application of that exception. As discussed in the Final Rejection mailed on 11/05/2025:

This judicial exception is not integrated into a practical application because, for example, claims 1, 17, and 20 recite "one or more processors", "an OCR engine", "an NLP model", "a graphical user interface", "a device", and "a neural network… trained… on training text", while claims 17 and 20 additionally recite "a non-transitory computer readable medium." As an example, ¶ [0081] of the as-filed specification discloses that "The computer system 900 includes a memory 904 that communicates via bus 908. Memory 904 is a main memory, a static memory, or a dynamic memory. Memory 904 includes, but is not limited to computer-readable storage media such as various types of volatile and non-volatile storage media, including but not limited to random access memory, read-only memory, programmable read-only memory, electrically programmable read-only memory, electrically erasable read-only memory, flash memory, magnetic tape or disk, optical media and the like. In one implementation, the memory 904 includes a cache or random-access memory for the processor 902. In alternative implementations, the memory 904 is separate from the processor 902, such as a cache memory of a processor, the system memory, or other memory. Memory 904 is an external storage device or database for storing data. Examples include a hard drive, compact disc ("CD"), digital video disc ("DVD"), memory card, memory stick, floppy disc, universal serial bus ("USB") memory device, or any other device operative to store data. The memory 904 is operable to store instructions executable by the processor 902...".

Therefore, a general-purpose computer or computing device is described and is merely used as a tool. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Please also refer to MPEP 2106.05(f)(2): Whether the claim invokes computers or other machinery merely as a tool to perform an existing process.

Finally, please refer to MPEP 2106.05(A): Relevant Considerations For Evaluating Whether Additional Elements Amount To An Inventive Concept:

"Limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception include:
i. Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, e.g., a limitation indicating that a particular function such as creating and maintaining electronic records is performed by a computer, as discussed in Alice Corp., 573 U.S. at 225-26, 110 USPQ2d at 1984 (see MPEP § 2106.05(f));
ii.
Simply appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception, e.g., a claim to an abstract idea requiring no more than a generic computer to perform generic computer functions that are well-understood, routine and conventional activities previously known to the industry, as discussed in Alice Corp., 573 U.S. at 225, 110 USPQ2d at 1984 (see MPEP § 2106.05(d));"

From this analysis, in Step 2B, the Examiner has evaluated the independent claims accordingly and determined that the independent claims as drafted contain limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception. Similar to what was discussed in the Final Rejection mailed on 11/05/2025: the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than a general-purpose computing device. The claims are not patent eligible.

Lastly, the Examiner refers the Applicant to MPEP 2106.05(a): "It is important to note that in order for a method claim to improve computer functionality, the broadest reasonable interpretation of the claim must be limited to computer implementation. That is, a claim whose entire scope can be performed mentally, cannot be said to improve computer technology. Synopsys, Inc. v. Mentor Graphics Corp., 839 F.3d 1138, 120 USPQ2d 1473 (Fed. Cir. 2016) (a method of translating a logic circuit into a hardware component description of a logic circuit was found to be ineligible because the method did not employ a computer and a skilled artisan could perform all the steps mentally).
Similarly, a claimed process covering embodiments that can be performed on a computer, as well as embodiments that can be practiced verbally or with a telephone, cannot improve computer technology. See RecogniCorp, LLC v. Nintendo Co., 855 F.3d 1322, 1328, 122 USPQ2d 1377, 1381 (Fed. Cir. 2017) (process for encoding/decoding facial data using image codes assigned to particular facial features held ineligible because the process did not require a computer)." (Emphasis added.)

In summary, the Examiner respectfully disagrees with the arguments above. For more details, please refer to the updated 35 U.S.C. § 101 rejections for claims 1, 17, and 20, below.

35 USC § 103 rejection(s): Arguments at pages 12-14 of the Remarks filed on 01/05/2026.

Examiner's Response to Arguments: Applicant's arguments and amendments with respect to claims 1, 17, and 20 rejected under 35 U.S.C. § 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Therefore, the prior rejection has been withdrawn. However, upon further consideration, new grounds of rejection are made as follows:

- Independent claim 1: Scripka et al. (US 20230017211 A1) in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." Journal of Biomedical Informatics 130 (2022): 104050. https://www.sciencedirect.com/science/article/pii/S1532046422000661).
- Independent claim 17: Scripka et al. (US 20230017211 A1) in view of Blanco et al. (cited above) and Lucas et al. (US 10395772 B1; pub. date 2019).
- Independent claim 20: Scripka et al. (US 20230017211 A1) in view of Blanco et al. (cited above) and Gullapudi et al. (US 20240177053 A1).

For more details, please refer to the updated 35 U.S.C. § 103 rejections for claims 1, 17, and 20, below.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 18 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 18 recites the limitation "the relative importance of one or more portions of the text…" in line 11 of the claim. There is insufficient antecedent basis for this limitation in the claim.
The Examiner suggests modifying the claim language of independent claim 17 to "a relative importance of one or more portions of the text…".

Claim Objections

Claim 18 is objected to because of the following informalities: the limitation "…the relative importance of one or more portions of the text." should read "…the relative importance of the one or more portions of the text corresponding to the one of the clusters." The Examiner notes that this modification is suggested based on, and in addition to, the language modification suggested for independent claim 17 under the 35 USC § 112 rejection, above. Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-11 and 13-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more; more specifically, it is directed to the abstract-idea grouping of mental processes. Independent claims 1, 17, and 20 recite:

1.
A computer-implemented method comprising:
receiving, by one or more processors of a document processing system, imaging data of one or more documents, wherein the one or more documents include medical records;
extracting, by the one or more processors and utilizing an optical character recognition (OCR) engine of the document processing system, text from the imaging data of one or more documents;
determining, by the one or more processors and utilizing a natural language processing (NLP) model of the document processing system, one or more attention scores for one or more tokens in the one or more documents, wherein each of the one or more tokens represents a respective portion of extracted text, wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text;
aggregating, by the one or more processors, the one or more tokens into clusters based on the one or more attention scores; and
determining, by the one or more processors and utilizing the NLP model, an aggregate attention score for each of the clusters, wherein individual aggregate attention scores indicate an importance of one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another of the clusters;
causing, by the one or more processors, a graphical user interface to output a presentation of the text that visually indicates the importance of the one or more portions of the text relative to the other portions of the text, respectively.
This reads on a human (e.g., mentally and/or using pen and paper):
- Receiving a document (e.g., printed documents (i.e., medical records));
- Extracting text from said document;
- Predicting and assigning values, based on a predetermined set of steps or rules, to the words extracted from the text of the document, wherein the predetermined set of rules is set based on learning from previous predefined rules, and wherein the set of rules is learned to include the computation of scores for each of the words based on pre-known/predefined data (i.e., text/labels) associated with indications of importance of words in text;
- Organizing the words based on the predictions/values above to create sentence(s);
- Determining a score for the words based on a predetermined set of steps or rules, wherein the score is associated with the importance of the word(s) in the text groups;
- Writing down (e.g., on paper) said sentences along with the scores for display.

17. A system comprising:
one or more processors; and
at least one non-transitory computer readable medium storing instructions which, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving one or more documents, wherein the one or more documents include medical records;
extracting, utilizing an optical character recognition (OCR) engine, text from the one or more documents;
aggregating the one or more tokens into clusters based on the one or more attention scores;
determining, utilizing a natural language processing (NLP) model, one or more attention scores for one or more tokens of the one or more documents, wherein each of the one or more tokens represents a respective portion of the extracted text, wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text;
determining, utilizing the NLP model, an aggregate attention score for each of the clusters, wherein individual aggregate attention scores indicate an importance of one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another one of the clusters;
retraining the NLP model using the determined aggregate attention scores and the clusters as labeled training data, such that the NLP model learns to interpret importance of different portions of input text relative to the other portions of the input text, respectively.

This reads on a human (e.g., mentally and/or using pen and paper):
- Receiving a document (e.g., printed documents (i.e., medical records));
- Extracting text from said document;
- Predicting and assigning values, based on a predetermined set of steps or rules, to the words extracted from the text of the document, wherein the set of rules is learned to include the computation of scores for each of the words based on pre-known/predefined data (i.e., text/labels) associated with indications of importance of words in text;
- Organizing the words based on the predictions/values above to create sentence(s);
- Determining a score for the words based on a predetermined set of steps or rules, wherein the score is associated with the importance of the word(s) in the text groups;
- Redefining the predetermined set of steps or rules for future use.

20.
A non-transitory computer readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising:
receiving one or more documents, wherein the one or more documents include medical records;
extracting, utilizing an optical character recognition (OCR) engine, text from the one or more documents;
determining, utilizing a natural language processing (NLP) model, one or more attention scores for one or more tokens of the one or more documents, wherein each of the one or more tokens represents a respective portion of the extracted text, wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text;
aggregating the one or more tokens into clusters based on the one or more attention scores to construct sentences;
determining, utilizing the NLP model, an aggregate attention score for each of the clusters, wherein individual aggregate attention scores indicate an importance of one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another one of the clusters;
filtering at least a portion of the clusters based on a predetermined threshold value;
extracting the filtered clusters as structured data, and storing the structured data in a database;
receiving, via a graphical user interface, a query to the database that corresponds to a cluster stored in the database; and
based on the query, causing the graphical user interface to display imaging data associated with the one or more documents that include the cluster.
This reads on a human (e.g., mentally and/or using pen and paper): Receiving a document (e.g., printed documents (i.e., medical records)); Extracting text from said document; Predicting and assigning values based on a predetermined set of steps or rules to the words extracted in the text from the document, wherein the set of rules are learned to include the computation of scores for each of the words based on pre-known/predefined data (i.e., text/labels) associated to indications of importance of words in text; Organizing the words based on the predictions/values above to create sentence(s); Determining a score for the words based on predetermined set of steps or rules, wherein the score is associated with the importance of the word(s) in the text groups; Redefining the predetermined set of steps or rules to extract the text based on a predetermined threshold value; Writing down on a piece of paper the extracted text based on the threshold above; Receiving a request for specific word(s); Writing down (e.g., on paper) said word(s) for display. This judicial exception is not integrated into a practical application because, for example: claims 1, 17, and 20 recite “one or more processors”, “an OCR engine”, “an NLP model”, “a graphical user interface”, “a device”, and “a neural network… trained… on training text” while claims 17 and 20 additionally recite “a non-transitory computer readable medium.” As an example, in ¶ [0081] of the as-filed specification, it is disclosed that “The computer system 900 includes a memory 904 that communicates via bus 908. Memory 904 is a main memory, a static memory, or a dynamic memory. 
Memory 904 includes, but is not limited to computer-readable storage media such as various types of volatile and non-volatile storage media, including but not limited to random access memory, read-only memory, programmable read-only memory, electrically programmable read-only memory, electrically erasable read-only memory, flash memory, magnetic tape or disk, optical media and the like. In one implementation, the memory 904 includes a cache or random-access memory for the processor 902. In alternative implementations, the memory 904 is separate from the processor 902, such as a cache memory of a processor, the system memory, or other memory. Memory 904 is an external storage device or database for storing data. Examples include a hard drive, compact disc (“CD”), digital video disc (“DVD”), memory card, memory stick, floppy disc, universal serial bus (“USB”) memory device, or any other device operative to store data. The memory 904 is operable to store instructions executable by the processor 902...”. Therefore, a general-purpose computer or computing device is described and mainly used as an application thereof. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer is merely a general computing device, as noted. The claim is not patent eligible. With respect to claim 2, the claim(s) recite: 2. 
The computer-implemented method of claim 1, further comprising: generating, by the one or more processors utilizing the OCR engine, one or more bounding boxes for recognized words and/or phrases in the one or more documents, wherein the one or more bounding boxes indicate the one or more attention scores. This reads on a human (e.g., mentally and/or using pen and paper): Writing down or drawing, using a predefined set of rules, boxes around words on a document indicating a prediction and a score. The additional limitation of “OCR engine” is present. Same analysis as discussed for independent claims applies. With respect to claim 3, the claim(s) recite: 3. The computer-implemented method of claim 2, wherein the presentation comprises: superimposing, by the one or more processors, the one or more bounding boxes over the recognized words and/or phrases in the one or more documents, wherein the one or more bounding boxes are colored and/or semi-transparent. This reads on a human (e.g., mentally and/or using pen and paper): Writing down (e.g., on paper) said sentences for display comprises writing down or drawing, using a predefined set of rules, boxes around words on a document, wherein the boxes are colored and/or semi-transparent (i.e., highlighted). No additional limitations are present. With respect to claim 4, the claim(s) recite: 4. The computer-implemented method of claim 3, wherein an intensity of the color or transparency of each of the one or more bounding boxes represents a magnitude of the corresponding attention score. This reads on a human (e.g., mentally and/or using pen and paper): Wherein the intensity of the color of the boxes is associated with the corresponding score. No additional limitations are present. With respect to claim 5, the claim(s) recite: 5. 
The computer-implemented method of claim 1, further comprising: determining, by the one or more processors, one or more intervals to form the clusters around one or more tokens with high attention scores by utilizing an expanding window technique, wherein the one or more tokens with high attention scores are clustered based, at least in part, on a task-based parameter that indicates a quantity of data sought during processing of the one or more documents. This reads on a human (e.g., mentally and/or using pen and paper): Grouping the words based on a predefined set of rules and based on their scores, wherein the words with higher score are grouped together based on predefined parameters. No additional limitations are present. With respect to claim 6, the claim(s) recite: 6. The computer-implemented method of claim 5, wherein the one or more intervals are positioned around the one or more tokens with high attention scores, and wherein overlapping intervals are merged. This reads on a human (e.g., mentally and/or using pen and paper): Further using/defining the predefined set of rules regarding the words with high scores. No additional limitations are present. With respect to claim 7, the claim(s) recite: 7. The computer-implemented method of claim 5, further comprising: determining, by the one or more processors, an unnormalized aggregated attention score for each cluster by summing the high attention scores within the cluster; and determining, by the one or more processors, a normalized aggregated attention score for each cluster based on a softmax function. This reads on a human (e.g., mentally and/or using pen and paper): Determining unnormalized scores by summing (i.e., mathematical concept) scores within a certain group/section; Determining normalized scores based on predetermined set of rules/functions. No additional limitations are present. With respect to claim 8, the claim(s) recite: 8. 
The computer-implemented method of claim 1, further comprising: determining, by the one or more processors, labelled data upon processing of the one or more documents to train or update the NLP model. This reads on a human (e.g., mentally and/or using pen and paper): Determining labels for data or text in the documents to further define a predetermined set of rules. No additional limitations are present. With respect to claim 9, the claim(s) recite: 9. The computer-implemented method of claim 1, wherein the imaging data of the one or more documents includes scanned images of typed and/or handwritten text. This reads on a human (e.g., mentally and/or using pen and paper): Reading a document with typed or handwritten text. No additional limitations are present. With respect to claim 10, the claim(s) recite: 10. The computer-implemented method of claim 9, wherein the scanned images are in a portable document format. This reads on a human (e.g., mentally and/or using pen and paper): Reading a document with typed or handwritten text (e.g., PDF format). No additional limitations are present. With respect to claim 11, the claim(s) recite: 11. The computer-implemented method of claim 1, wherein the NLP model includes at least one of an attention-based model, a rule-based model, or a statistical model. This reads on a human (e.g., mentally and/or using pen and paper): Further predefining the predetermined set of rules. No additional limitations are present. With respect to claim 13, the claim(s) recite: 13. The computer-implemented method of claim 1, wherein the NLP model performs at least one of text classification, named entity recognition, or entity linking on the one or more documents. This reads on a human (e.g., mentally and/or using pen and paper): Further predefining the predetermined set of rules. No additional limitations are present. With respect to claim 14, the claim(s) recite: 14. 
The computer-implemented method of claim 1, further comprising: determining, by the one or more processors, a threshold value for the attention scores; and filtering, by the one or more processors, at least a portion of the one or more tokens based on the threshold value. This reads on a human (e.g., mentally and/or using pen and paper): Determining a threshold value for the scores; Filtering or comparing the words with said threshold value. No additional limitations are present. With respect to claim 15, the claim(s) recite: 15. The computer-implemented method of claim 14, further comprising: determining, by the one or more processors, a threshold value for the aggregate attention scores; and filtering, by the one or more processors, at least a portion of the clusters based on the threshold value. This reads on a human (e.g., mentally and/or using pen and paper): Redefining the predetermined set of steps or rules based on a predetermined threshold value and writing it down; Writing down (e.g., on paper) said word(s) for display based on the threshold. No additional limitations are present. With respect to claim 16, the claim(s) recite: 16. The computer-implemented method of claim 1, wherein the extracted text includes words and locations of the words within the one or more documents. This reads on a human (e.g., mentally and/or using pen and paper): When extracting text/words, including the locations of said words. No additional limitations are present. With respect to claim 18, the claim(s) recite: 18. 
The system of claim 17, further comprising: generating, utilizing the OCR engine, one or more bounding boxes for recognized words and/or phrases in the one or more documents, wherein the one or more bounding boxes indicate the one or more predictions and attention scores; and superimposing the one or more bounding boxes over the recognized words and/or phrases in the one or more documents, wherein the one or more bounding boxes are colored and/or semi-transparent, wherein an intensity of the color or transparency of each of the one or more bounding boxes represents a magnitude of the corresponding attention score and causing a graphical user interface to output a presentation of the text that visually indicates the relative importance of one or more portions of the text. This reads on a human (e.g., mentally and/or using pen and paper): Writing down or drawing, using a predefined set of rules, boxes around words on a document indicating a prediction and a score; Writing down (e.g., on paper) said sentences for display comprises writing down or drawing, using a predefined set of rules, boxes around words on a document, wherein the boxes are colored and/or semi-transparent (i.e., highlighted), wherein the intensity of the color of the boxes is associated with the corresponding score; Writing down (e.g., on paper) said word(s) for display. No additional limitations are present. With respect to claim 19, the claim(s) recite: 19. 
The system of claim 17, further comprising: determining one or more intervals to form the clusters around the one or more tokens with high attention scores by utilizing an expanding window technique, wherein the one or more tokens with high attention scores are clustered based, at least in part, on a task-based parameter that indicates a quantity of data sought during processing of the one or more documents, wherein the one or more intervals are positioned around the one or more tokens with high attention scores, and wherein overlapping intervals are merged. This reads on a human (e.g., mentally and/or using pen and paper): Grouping the words based on a predefined set of rules and based on their scores, wherein the words with higher score are grouped together based on predefined parameters. Further using/defining the predefined set of rules regarding the words with high scores. No additional limitations are present. Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1, 9, 11, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Scripka et al. (US 20230017211 A1) and further in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." 
Journal of Biomedical Informatics 130 (2022): 104050. https://www.sciencedirect.com/science/article/pii/S1532046422000661). As to independent claim 1, Scripka et al. teaches: 1. A computer-implemented method (see Fig. 1 and ¶ [0004]: “The present invention relates to systems and methods for automated analysis of structured and unstructured data, and more particularly to analysis of medical images and records in various formats and structures from disparate sources and providers.” and ¶ [0008]: “According to one embodiment, the invention relates to a computer-implemented method for analysis of structured and unstructured data to provide answers to specific medical-related questions…”) comprising: receiving, by one or more processors of a document processing system, imaging data of one or more documents, wherein the one or more documents include medical records (see Fig. 1 (110: source documents (e.g., health records) and 120: ingest and OCR) and ¶ [0004] citations as in limitation above and further ¶ [0018]: “According to an embodiment, the invention relates to an automated system and method for analysis of structured and unstructured data relating to medical records… The System may include foundational capabilities such as document ingestion and optical character recognition (OCR), e.g., the ability to take documents and convert them into formats readable by a machine to perform analytics…” and ¶ [0062]: “The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output…”); extracting, by the one or more processors and utilizing an optical character recognition (OCR) engine of the document processing system, text from the imaging data of one or more documents (see Fig. 
1 (110: source documents (e.g., health records) and 120: ingest and OCR) and ¶ [0004, 0008, 0018, and 0062] citations as in limitation above and further ¶ [0018]: “According to an embodiment, the invention relates to an automated system and method for analysis of structured and unstructured data relating to medical records. The analysis system (sometimes referred to herein as the “System”) may include a portfolio of artificial intelligence capabilities, including artificial intelligence domain expertise and related technology components. The System may include foundational capabilities such as document ingestion and optical character recognition (OCR), e.g., the ability to take documents and convert them into formats readable by a machine to perform analytics…” and ¶ [0039-0040]: “[0039] At step 210, document sourcing and ingestion may be performed. Documents may be ingested and processed according to the supported file types. This may further involve capturing metadata and identifying document characteristics. [0040] At step 212, optical character recognition may be performed. An Optical Character Recognition component may be applied to the documents. This may involve extracting text content and spatial location in the document.”); aggregating, by the one or more processors, the one or more tokens into clusters (see Fig. 1 (110: source documents (e.g., health records), 120: ingest and OCR, 130: text spatial analysis & computer vision and 132: NLP) and ¶ [0004, 0008, 0018, 0039-0041, and 0062] citations as in limitation(s) above and further: ¶ [0034-0036]: “[0034] …Text spatial analysis and clustering may refer to a combination of computer vision and NLP approaches to structure/segment an image of a medical record into semantically useful regions. 
For example, these approaches may be used to identify a “Patient History” section of a medical record based textual and visual indicators, regardless of the exact format and presentation of that content in the medical record. [0035] Key-value pair identification may be performed at 140. Key-value pair identification may involve linking data items where a key is used as a unique identifier for a corresponding value. This may be accomplished through SME (subject matter expert) Medical Ontology 142, Text Classification 144 and Deep Learning Extraction Algorithms 146. Examples of key-value pair identification and extraction in medical records may include extracting patient information including demographics, medical history, current status, and/or other data when presented in a form/table structure with implicit pairs of such as information type and answer, e.g., Patient Name: John Doe. Another example may include parsing of administrative information about a medical record such as providing health care professional and location, visit date, scheduled follow-up appointments, e.g., Date of Visit: Oct. 10, 2010. Other examples may involve parsing of assessment/treatment related information when presented in a form/table structure with implicit pairs of information type and answer, e.g., Patient Status: Full temporary disability. [0036] Data may be outputted in various formats including structured data output 150. This may involve standardization of data fields 152 and a structured output 154. Structured data output 150 may be communicated to recipients, receiving platforms and/or other systems via a communication network and/or other modes of communication.”, ¶ [0042]: “At step 216, a clustering of related and nearby text may be performed. Document texts may be grouped into categories and semantic groups based on similarity measures in order to inform the Machine Learning (ML) models training. 
Similarity measures may refer to various methods of comparing text content for semantic/topic similarity based on a desired domain and use-case. For example, an embodiment of the present invention may determine if a given paragraph from a medical record contains a description of a patient's injury by comparing it to other known examples of patient history texts from other medical records.”, and ¶ [0048]: “At step 228, structured output may be generated and provided. Standardized data may be exported into structured outputs for viewing and consumption by downstream users and applications. The structured output may be transmitted and/or communicated to various recipients through an interactive interface. According to another example, the structured output may be compiled into a report and/or other end product. In addition, the structured output may be transmitted to a receiving system for additional analysis and processing. The interactive interface may be supported by various browsers, applications, systems, mobile devices, etc.”); However, Scripka et al. does not explicitly teach, but Blanco et al. does teach: determining, by the one or more processors and utilizing a natural language processing (NLP) model of the document processing system, one or more attention scores for one or more tokens of the one or more documents (see Fig. 4 (Aggregated per-label attention on EHR from MIMIC dataset), ¶ 5 of Introduction: “…To that end, we have implemented a per-label attention mechanism on our multi-label classification head that allows the model to set a relevance to each token towards the prediction of each label independently…” ¶ 6 of 2. Related work: “Nonetheless, the Transformer architecture, the Language Models in general and BERT in particular [5], were proven successful in a variety of NLP downstream tasks [26]. BERT models were proven superior to RNNs as the Language Model captures, efficiently, contextual traits and nuances of contexts by contrast to RNNs [26,27]. 
Regarding clinical NLP, in [28], the Biomedical Language Understanding Evaluation (BLUE) benchmark was introduced and proved that the BERT models outperformed most state-of-the-art models. We chose Transformer based models as the general trend suggests the worth of the Transformer architecture and because it is the architecture with the most growth projection.” ¶ 2 of 4. Methods: “A hidden document representation is computed based on the input text X and this is fed to the Language Model (LM), a core part of the BERT model (presented in Section 4.1). Finally the predictions are computed with a regular dense layer that takes a document representation obtained from the per-label attention module. The implementation of the per-label attention mechanism developed for this work is released with this article in an attempt to promote reproducible research.” ¶ 2 of 5. Results: “… The attention mechanism allows focusing on the most relevant words that motivated the predictions while also aiding to reveal the inner workings of the underlying model.” ¶ 6.1. Attention visualization: “The attention mechanism follows a per-label attention strategy, generating particular attention weights for each token and class tuple. Nonetheless, regarding the visualization and for the sake of simplicity, we show an aggregated version. Specifically, the maximum for each label attention is displayed for each word. Therefore, if a word gets a high attention score on the visualization, then it means that the attention for at least one of the labels was that high, but not necessarily for several or all of them. Anyhow, the per-label attention weights are also produced; if a use-case requires inspecting them separately, doing it would be straightforward. 
Additionally, we have included a masking strategy inside the model to make the padding tokens (if any) not get any attention, as they are worthless towards the prediction…”), wherein each of the one or more tokens represents a respective portion of the extracted text (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in limitation(s) above. More specifically: ¶ 6.1. Attention visualization: “…maximum for each label attention is displayed for each word...” and Fig. 4 (Aggregated per-label attention on EHR from MIMIC dataset)); wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in limitation(s) above. More specifically: ¶ 6.1. Attention visualization: “…maximum for each label attention is displayed for each word...” and Fig. 4 (Aggregated per-label attention on EHR from MIMIC dataset) and further ¶ 1 of 4. Methods: “For this work, we have developed a multi-label text classification model based on the Transformer architecture [26], depicted in Fig. 2. The ICD-10 multi-label classification problem consists of mapping the raw text of an EHR (X) to a subset from the full set of ICD labels, C, with |C| being the total number of ICD codes. As an example, in our case the label set conveys gastro-intestinal diseases, i.e., C = {K209, K388, ... K551}. The Transformer-based neural network model is trained with instances comprising pairs of input (EHR text) and output (ICD codes).”); aggregating, by the one or more processors, the one or more tokens into clusters based on the one or more attention scores (see ¶ 5 of Introduction, ¶ 6 of 2. 
Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in limitation(s) above. More specifically: Fig. 4 (Aggregated per-label attention on EHR from MIMIC dataset) and further: ¶ 2 of 4.2. PlaBERT: Per-label attention BERT: “Following the computation of the last hidden state from the LM part of the BERT model, the EHR is embedded in the H ∈ ℝ^(N×d) matrix, where N is the length of the EHR (number of tokens) and d is the dimension of the internal BERT representation of documents (d = 768 for BERT_BASE models). Next, instead of computing a pooling operation across the document length (N) as in the original proposal [5], our contribution applies a per-label attention mechanism. This enables the model to learn proper relationships between words and each specific label. First, a learnable vector parameter u_{c_i} ∈ ℝ^d is used to compute the matrix-vector product with the embedded representation of the document as H^T u_{c_i}. The Softmax function is applied to the resulting vector as in (1), in order to get the attention scores computed as a probability distribution of the relevance that each token conveys towards the prediction of each ICD code. a_{c_i} = Softmax(H^T u_{c_i}) (1) The attention vector a ∈ ℝ^(|C|×N) is computed for each ICD class c_i, where C is the full set of ICD labels. Finally, to combine the per-label attention representations to allow its use as the document representation on the final layer, the matrix product between the document representation and the attention vector is computed as M = Ha_{c_i}…” and ¶ 6.1. Attention visualization: “The attention mechanism follows a per-label attention strategy, generating particular attention weights for each token and class tuple. Nonetheless, regarding the visualization and for the sake of simplicity, we show an aggregated version. Specifically, the maximum for each label attention is displayed for each word. 
Therefore, if a word gets a high attention score on the visualization, then it means that the attention for at least one of the labels was that high, but not necessarily for several or all of them. Anyhow, the per-label attention weights are also produced; if a use-case requires inspecting them separately, doing it would be straightforward. Additionally, we have included a masking strategy inside the model to make the padding tokens (if any) not get any attention, as they are worthless towards the prediction… Fig. 4 shows the attention weights computed for an EHR from the MIMIC dataset within the context of the classification task with the Group Labels. It is relevant to note that the attention mechanism does a fair job of focusing its attention upon diseases, symptoms, procedures and drugs, which are indeed relevant issues for ICD classification. […] Our hypothesis is that the attention could achieve this level of specificity by virtue of the per-label attention mechanism that allows fine-grained attention, independent of the number of labels. It is important to additionally point out that the attention mechanism seems to develop Natural Language Understanding capabilities, as a negation detection, and an “implication detection” ability…”); determining, by the one or more processors and utilizing the NLP model, an aggregate attention score for each of the clusters (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in limitation(s) above. More specifically: ¶ 6.1. Attention visualization: “The attention mechanism follows a per-label attention strategy, generating particular attention weights for each token and class tuple. 
Nonetheless, regarding the visualization and for the sake of simplicity, we show an aggregated version…”), wherein individual aggregate attention scores indicate an importance of one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another one of the clusters (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in limitation(s) above. More specifically: ¶ 6.1. Attention visualization: “The attention mechanism follows a per-label attention strategy, generating particular attention weights for each token and class tuple. Nonetheless, regarding the visualization and for the sake of simplicity, we show an aggregated version. Specifically, the maximum for each label attention is displayed for each word. Therefore, if a word gets a high attention score on the visualization, then it means that the attention for at least one of the labels was that high, but not necessarily for several or all of them…”); causing, by the one or more processors, a graphical user interface to output a presentation of the text that visually indicates the importance of the one or more portions of the text relative to the other portions of the text, respectively (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in limitation(s) above. More specifically: Fig. 4 (Aggregated per-label attention on EHR from MIMIC dataset) and further: ¶ 4 of Introduction: “…Our software can export the attention weights (per label or aggregated) ready for its visualisation on the NeatVision tool [7].” and ¶ 5 of 6.1. Attention visualization: “Although not tested with physicians, we feel that generating this kind of visualisation together with the label predictions would be helpful to classify notes as a DSS or as an aid for clinical documentation tasks.”). 
Scripka et al. and Blanco et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text associated with medical records). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. to incorporate the teachings of Blanco et al. of determining, by the one or more processors and utilizing a natural language processing (NLP) model of the document processing system, one or more attention scores for one or more tokens of the one or more documents, wherein each of the one or more tokens represents a respective portion of the extracted text, wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text; aggregating, by the one or more processors, the one or more tokens into clusters based on the one or more attention scores; determining, by the one or more processors and utilizing the NLP model, an aggregate attention score for each of the clusters, wherein individual aggregate attention scores indicate an importance of one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another one of the clusters; and causing, by the one or more processors, a graphical user interface to output a presentation of the text that visually indicates the importance of the one or more portions of the text relative to the other portions of the text, respectively, which provides the benefit of improving the performance of the models with a reasonable trade-off among extra necessary computation and memory (¶ [conclusion] of Blanco et al.). 
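For orientation only, the per-label attention mechanism quoted from Blanco et al. above (equation (1)) can be sketched in a few lines of NumPy. This is an illustrative reading of the quoted equations, not the authors' released implementation and not anything disclosed in the application as filed; the function name and array shapes are assumptions.

```python
import numpy as np

def per_label_attention(H, U):
    """Sketch of equation (1) quoted from Blanco et al.: H is the
    (N x d) matrix of token representations for one EHR, and U stacks
    one learnable vector u_c (length d) per ICD label. For each label c,
    a_c = Softmax(H^T u_c) gives a probability distribution over the
    N tokens, and the label-specific document representation is the
    attention-weighted combination of the token representations."""
    logits = H @ U.T                         # (N x |C|): one score per token, per label
    a = np.exp(logits - logits.max(axis=0))  # softmax over the token axis...
    a = a / a.sum(axis=0)                    # ...normalised per label
    M = H.T @ a                              # (d x |C|): per-label document representations
    # Aggregated visualization per the quoted Sec. 6.1: display the
    # maximum attention over all labels for each token.
    aggregated = a.max(axis=1)               # (N,)
    return a, M, aggregated
```

Each column of `a` sums to one, matching the quoted description of the attention scores as "a probability distribution of the relevance that each token conveys towards the prediction of each ICD code."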
Regarding claim 9, Scripka et al. in combination with Blanco et al. teach the limitations as in claim 1, above. Scripka et al. further teaches: 9. The computer-implemented method of claim 1, wherein the imaging data of the one or more documents include scanned images of typed and/or handwritten text (see Fig. 3 and ¶ [0033 and 0050]: “[0033] An embodiment of the present invention may then ingest data and perform an optical character recognition process as shown by 120. Data from various source documents and platforms may be ingested and processed. Data may be ingested through a Document Sourcing and Ingestion module 122 and further processed through Optical Character Recognition 124 or other electronic conversion of scanned images and text into machine encoded texts for computations and analysis. [0050] FIG. 3 is an exemplary illustration, according to an embodiment of the present invention. An embodiment of the present invention may access, read and/or interpret a health care form, identify related text, leverage computer vision to group nearby text, and use deep learning models to identify fields from the text. This information may then be used to create a structured standardized output. An embodiment of the present invention may extract information in various formats including Explicit Data Extraction 310, Form Data Extraction 320 and Free Text Extraction 330.”). Regarding claim 11, Scripka et al. in combination with Blanco et al. teach the limitations as in claim 1, above. Blanco et al. further teaches: 11. The computer-implemented method of claim 1, wherein the NLP model includes at least one of an attention-based model, a rule-based model, or a statistical model (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in limitation(s) above. More specifically: ¶ 6 of 2. 
Related work: “Nonetheless, the Transformer architecture, the Language Models in general and BERT in particular [5], were proven successful in a variety of NLP downstream tasks [26]. BERT models were proven superior to RNNs as the Language Model captures, efficiently, contextual traits and nuances of contexts by contrast to RNNs [26,27]. Regarding clinical NLP, in [28], the Biomedical Language Understanding Evaluation (BLUE) benchmark was introduced and proved that the BERT models outperformed most state-of-the-art models. We chose Transformer based models as the general trend suggests the worth of the Transformer architecture and because it is the architecture with the most growth projection.” ¶ 6.1. Attention visualization: “The attention mechanism follows a per-label attention strategy, generating particular attention weights for each token and class tuple. Nonetheless, regarding the visualization and for the sake of simplicity, we show an aggregated version. Specifically, the maximum for each label attention is displayed for each word. Therefore, if a word gets a high attention score on the visualization, then it means that the attention for at least one of the labels was that high, but not necessarily for several or all of them…). Scripka et al. and Blanco et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text associated with medical records). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. to incorporate the teachings of Blanco et al. of wherein the NLP model includes an attention-based model which provides the benefit of improving the performance of the models with a reasonable trade-off between extra necessary computation and memory (¶ [conclusion] of Blanco et al.). Regarding claim 13, Scripka et al. in combination with Blanco et al. 
teach the limitations as in claim 1, above. Scripka et al. further teaches: 13. The computer-implemented method of claim 1, wherein the NLP model performs at least one of text classification, named entity recognition, or entity linking on the one or more documents (see ¶ [0041]: “At step 214, natural language processing (NLP) may be performed. The NLP component may parse the document into segments such as phrases, sentences, paragraphs and/or other hierarchical concepts. The NLP component may also detect medical domain specific keywords, entities, phrases, etc.”). Claims 2-3 and 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Scripka et al. (US 20230017211 A1) and further in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." Journal of Biomedical Informatics 130 (2022): 104050. https://www.sciencedirect.com/science/article/pii/S1532046422000661) as applied to claim 1, above and further in view of Lucas et al. (pub. date 2019) (US 10395772 B1). Regarding claim 2, Scripka et al. in combination with Blanco et al. teach the limitations as in claim 1, above. However, Scripka et al. in combination with Blanco et al. does not explicitly teach, but Lucas et al. (pub. date 2019) does teach: 2. The computer-implemented method of claim 1, further comprising: generating, by the one or more processors utilizing the OCR engine, one or more bounding boxes for recognized words and/or phrases in the one or more documents (see Fig. 8-9 and ¶ starting at Col. 30, line 29: “(114) Sentence splitting is a function of NLP that may be incorporated to parse sentences into meaningful structures. 
Documents may arrive in either plaintext format (containing all text from the document) or in a structured OCR format (including the text as well as bounding boxes for every character, word, and sometimes paragraph if the model is capable of identifying paragraph regions)…”, ¶ starting at Col. 33, line 34: “(157) FIG. 8 provides a visual representation of word weightings for a sentence 98 containing “The patient was given Tylenol 50 mg at 11:35 am.” At the word level, “the”, “was”, “given”, and “at”, may be given low weights, “patient” and “11:35 am” may be given medium weights, and “Tylenol” and “50 mg” may be given high weights. As a result, the overall sentence may be classified with a high weight 100 (such as 95%) that medication the patient has taken includes Tylenol 50 mg. For this example, because such a high confidence value is determined, the processing may not need to continue to evaluate other sentences in the document to determine that the patient did indeed take Tylenol 50 mg, but each sentence 102, 104 will be processed to determine if other concepts are identified (such as to identify gender, other medications, other treatments, or demographic information). In this example, even though only the medication concept is given a high weight, each of the identified concept candidates may be retained for the next stage of the intake pipeline for further processing; alternatively, those identified concept candidates may be dropped from the candidate list.” and ¶ starting at Col. 34, line 40: “(160) Turning to FIG. 9, a sequence labeling classifier 106 may provide a “BIO” score for each word, where a BIO score (10, 30, 60) would mean that the associated word is the first word in a multi-word phrase in about 10% of its occurrences in the training set, an intermediary word in a multi-word phrase in about 30% of its occurrences in the training set, and a stand-alone word in about 60% of its occurrences in the training set. 
For example, the word “the” almost always precedes another word and occasionally is an intermediary word of a multi-word phrase, so may be provided a BIO score 108 of (90, 10, 0). “The” may also be considered an extraneous word, despite almost always preceding other words of importance, so it may be provided a BIO score of (0, 10, 90) to prevent processing, “patient” may be provided a BIO score 110 of (5, 20, 75), and “was” may be provided a BIO score 112 of (0, 0, 100). The sequence labeling model may begin processing the sentence at the first word, “the,” and then note a high incidence of that word being the beginning value of a multi-word phrase (in the first incidence where BIO score is (90, 10, 0)), process the second word “patient” to note a high incidence of being an intermediary or stand-alone word, and process the third word “was” to note a high incidence of being a stand-alone word. By recognizing a potential beginning of a multi-word concept, a potential intermediary of a multi-word concept, and a distinct non-multi-word entry, the sequence labeling model may identify a first multi-word concept. Therefore the sequence labeling model may indicate “the patient” 114 as a likely candidate concept for the multi-word label and “patient” 116 as a likely candidate concept for the stand-alone word label.”), wherein the one or more bounding boxes indicate the one or more attention scores (see Fig. 8-9 and ¶ starting at Col. 30, line 29, ¶ starting at Col. 33, line 34, and ¶ starting at Col. 34, line 40 citations as in limitation above.). Scripka et al., Blanco et al. and Lucas et al. (pub. date 2019) are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. 
to incorporate the teachings of Lucas et al. (pub. date 2019) of generating, by the one or more processors utilizing the OCR engine, one or more bounding boxes for recognized words and/or phrases in the one or more documents, wherein the one or more bounding boxes indicate the one or more predictions and attention scores which provides the benefit of improving the results of the overall processing (Col. 30, line 28 of Lucas et al. (pub. date 2019)). Regarding claim 3, Scripka et al. in combination with Blanco et al. and Lucas et al. (pub. date 2019) teach the limitations as in claim 2, above. Scripka et al. further teaches: 3. The computer-implemented method of claim 2, wherein the presentation (see Fig. 1 (110: source documents (e.g., health records), 120: ingest and OCR, 130: text spatial analysis & computer vision and 132: NLP) and ¶ [0004, 0008, 0018, and 0034-0036, 0039-0042, 0048, and 0062] citations as in claim 1, above. More specifically: ¶ [0035]: “, e.g., Patient Name: John Doe. […] e.g., Date of Visit: Oct. 10, 2010. […] e.g., Patient Status: Full temporary disability.” and ¶ [0036]: “…Structured data output 150 may be communicated to recipients, receiving platforms and/or other systems via a communication network and/or other modes of communication.”)) Blanco et al. further teaches: [the presentation] comprises: superimposing, by the one or more processors, the one or more bounding boxes over the recognized words and/or phrases in the one or more documents (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1 above. More specifically: Fig. 4 (Aggregated per-label attention on HER from MIMIC dataset) and further: ¶ 4 of Introduction: “…Our software can export the attention weights (per label or aggregated) ready for its visualisation on the NeatVision tool [7].” and ¶ 5 of 6.1. 
Attention visualization: “Although not tested with physicians, we feel that generating this kind of visualisation together with the label predictions would be helpful to classify notes as a DSS or as an aid for clinical documentation tasks.”), wherein the one or more bounding boxes are colored and/or semi-transparent (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1 above. More specifically: Fig. 4 (Aggregated per-label attention on HER from MIMIC dataset)). Scripka et al., Blanco et al., and Lucas et al. (pub. date 2019) are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. and Lucas et al. (pub. date 2019) to further incorporate the teachings of Blanco et al. of superimposing, by the one or more processors, the one or more bounding boxes over the recognized words and/or phrases in the one or more documents, wherein the one or more bounding boxes are colored and/or semi-transparent which provides the benefit of improving the performance of the models with a reasonable trade-off between extra necessary computation and memory (¶ [conclusion] of Blanco et al.). Regarding claim 8, Scripka et al. in combination with Blanco et al. teaches the limitations as in claim 1, above. However, Scripka et al. in combination with Blanco et al. does not explicitly teach, but Lucas et al. (pub. date 2019) does teach: 8. The computer-implemented method of claim 1, further comprising: determining, by the one or more processors, labelled data upon processing of the one or more documents to train or update the NLP model (see ¶ Col. 
46, lines 60-62: “…The Abstraction Engine 158 and Abstraction Engine toolbox 160 components utilize MLA and NLP to generate predictions.” and ¶ starting at Col. 48, line 42: “(236) As mentioned above, the system periodically checks to make sure that the NLP/MLA models being used are most up-to-date (such as elements 76, 78, and 80 from FIG. 5). The system may include a Bootcamp subroutine 166 for evaluating and updating the NLP and MLA models. In this subroutine 166, the system retrieves clinical record documents from the clinical data vault 144, such as based on one or more unique user id's, on clinical features common to one or more patients, or any other criteria. The subroutine also may communicate with the Abstraction Engine S3 database 156 to retrieve the raw OCR files corresponding to each of those documents, as well as the current NLP model. The system further may retrieve abstractor feedback (such as the feedback loop's erroneous result corrections/annotations) from the toolbox 160. Each of these inputs may be used to execute a training script to verify or update the NLP model. At that point, metadata relating to the updated model may be communicated to the Abstraction Engine toolbox database 164 (such as for later human inspection, model or data provenance, and/or long-term metrics). Workbench 148 supports the ability for abstractors to tag Abstraction Engine's 158 incorrect predictions with a predetermined set of issues (such as documents are from wrong patient, OCR errors, wrong entity linked, correct concept candidate, wrong entity linked, correct concept candidate but hypothetical reference in document cannot be construed as haven taken place, etc.). 
For example, in the case of patients whose predictions are incorrect because ‘Documents are for wrong patient’, the Abstraction Engine Bootcamp 166 may ignore these patients when training future MLAs to understand gender or may instantiate a specific training phase to train the current MLAs to predict which patients have documents from multiple patients and exclude from training and/or flag all patients which have documents from wrong patient for independent abstraction.”). Scripka et al., Blanco et al. and Lucas et al. (pub. date 2019) are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Lucas et al. (pub. date 2019) of determining, by the one or more processors, labelled data upon processing of the one or more documents to train or update the NLP model which provides the benefit of improving the results of the overall processing (Col. 30, line 28 of Lucas et al. (pub. date 2019)). Claim 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Scripka et al. (US 20230017211 A1) and further in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." Journal of Biomedical Informatics 130 (2022): 104050. https://www.sciencedirect.com/science/article/pii/S1532046422000661) and Lucas et al. (pub. date 2019) (US 10395772 B1) as applied to claim 3, above and further in view of Kadav et al. (US 20190019037 A1). Regarding claim 4, Scripka et al. in combination with Blanco et al. and Lucas et al. (pub. date 2019) teach the limitations as in claim 3, above. However, Scripka et al. 
in combination with Blanco et al. and Lucas et al. (pub. date 2019) do not explicitly teach, but Kadav et al. does teach: 4. The computer-implemented method of claim 3, wherein an intensity of the color or transparency of each of the one or more bounding boxes represents a magnitude of the corresponding attention score (see ¶ [0033]: “Using LSTM to aggregate a sequence of image representations can result in limited performance since image representations can be similar to each other and, thus, lack temporal variances. Therefore, the outputs of the MLP component 140 are received by the attention component 150 to attend to key image-level representations to summarize the entire video sequence. Generally, attention is performed by generating a linear combination of vectors using attention weights calculated using an attention function. The attention weights can be normalized so that they have a sum of 1 (e.g., applying a SoftMax function).” and ¶ [0056]: “As mentioned, the interaction network described herein can be applied to perform (human) action recognition regarding scenes of a video. The interaction network described herein selectively attends to various regions with relationships and interactions across time. For example, a video frame or scene can have multiple ROIs corresponding to respective bounding box colors. ROIs with the same color can indicate the existence of inter-relationships, and interactions between groups of ROIs can be modeled across different colors. The color of each bounding box can be weighted by the attention generated in accordance with the embodiment described herein. Thus, if some ROIs are not important, they can have smaller weights and/or may not be shown on the corresponding image. The same weights can then be used to set the transparent ratio for each ROI. Accordingly, there is a direct relationship between ROI brightness and ROI importance.”). Scripka et al., Blanco et al., Lucas et al. (pub. date 2019), and Kadav et al. 
are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. and Lucas et al. (pub. date 2019) to incorporate the teachings of Kadav et al. of wherein an intensity of the color or transparency of each of the one or more bounding boxes represents a magnitude of the corresponding attention score which provides the benefit of providing improved accuracy and lower computational cost, as compared to conventional approaches ([0025] of Kadav et al.). Claims 5-7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Scripka et al. (US 20230017211 A1) and further in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." Journal of Biomedical Informatics 130 (2022): 104050. https://www.sciencedirect.com/science/article/pii/S1532046422000661) as applied to claim 1, above and further in view of Qin et al. ("Hybrid Attention-based Transformer for Long-range Document Classification," 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022, pp. 1-8, doi: 10.1109/IJCNN55064.2022.9891918. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9891918). Regarding claim 5, Scripka et al. in combination with Blanco et al. teach the limitations as in claim 1, above. However, Scripka et al. in combination with Blanco et al. do not explicitly teach, but Qin et al. does teach: 5. 
The computer-implemented method of claim 1, further comprising: determining, by the one or more processors, one or more intervals to form the clusters around one or more tokens with high attention scores by utilizing an expanding window technique (see Fig. 2: “Overview of attention patterns used in our work. (a) Sliding Window Local Attention on a part of attention heads. (b) Clustering-Based Long-Range Attention on the other part. After a long sequence is fed into the model, tokens will be divided into non-overlapping windows according to the size of the sliding window on a part of attention heads in Multi-Head Self-Attention layer. Meanwhile, tokens will be clustered on the basis of clustering algorithm on another part of attention heads. Then two tokens connected by a curve will attend to each other, i.e. attention is restricted within each window or cluster.”, ¶ 2-3 of A. Sliding Window Local Attention: “…After normalization by the softmax function, weights and corresponding value vectors are weighted and summed to get the final attention score Oi… Since tokens adjacent to specific one are usually more likely to be closely related, i.e., local contextual information is of great importance [36], we retain traditional sliding window attention mechanism to capture local structures in context. As shown in Fig. 2(a), given the window size w, the query at location i only needs to attend to w tokens in the same window, which means the indices of keys range from ⌊i/w⌋×w to (⌊i/w⌋+1)×w. Thus, the representation of token at location i in sliding window local attention pattern is [eq. 6]” and ¶ 1 of B. Clustering-Based Long-Range Attention: “Vanilla Transformer [1] chooses to compute the attention between each pair of tokens at the cost of quadratic complexity. It is worth noting that two vectors with higher similarity in the vector space usually have larger dot product values and higher attention weights. 
Therefore, our work introduces clustering algorithms to group vectors with highest similarity together. As shown in Fig. 2(b), we use the clustering algorithm to divide query vectors into several clusters together with key vectors. Theoretically, vectors in the same cluster should have higher similarity. So for the query vector at position i, it only needs to focus on the key vectors belonging to the same cluster…), wherein the one or more tokens with high attention scores are clustered based, at least in part, on a task-based parameter that indicates a quantity of data sought during processing of the one or more documents (see Fig. 2, ¶ 2-3 of A. Sliding Window Local Attention and ¶ 1 of B. Clustering-Based Long-Range Attention citations as in limitation above. More specifically: ¶ 1 of B. Clustering-Based Long-Range Attention: “Vanilla Transformer [1] chooses to compute the attention between each pair of tokens at the cost of quadratic complexity. It is worth noting that two vectors with higher similarity in the vector space usually have larger dot product values and higher attention weights. Therefore, our work introduces clustering algorithms to group vectors with highest similarity together. As shown in Fig. 2(b), we use the clustering algorithm to divide query vectors into several clusters together with key vectors. Theoretically, vectors in the same cluster should have higher similarity. So for the query vector at position i, it only needs to focus on the key vectors belonging to the same cluster…”. And further ¶ 2 of B. Clustering-Based Long-Range Attention: “…We choose k-means [37] and mean shift [38] as the clustering methods in this attention pattern due to the fact that k-means is simple and efficient with fast convergence, and mean shift is a density-based clustering algorithm. 
Since the text length of different samples in the dataset varies considerably, we preset a maximum length in the experimental implementation which is discussed thoroughly later…”). Scripka et al., Blanco et al. and Qin et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Qin et al. of determining, by the one or more processors, one or more intervals to form the clusters around one or more tokens with high attention scores by utilizing an expanding window technique, wherein the one or more tokens with high attention scores are clustered based, at least in part, on a task-based parameter that indicates a quantity of data sought during processing of the one or more documents which provides the benefit of best balancing classification effectiveness and memory efficiency, making it easier to input longer sequences under the same hardware conditions (¶ V. Conclusion of Qin et al.). Regarding claim 6, Scripka et al. in combination with Blanco et al. and Qin et al. teach the limitations as in claim 5, above. However, Scripka et al. in combination with Blanco et al. do not explicitly teach, but Qin et al. does teach: 6. The computer-implemented method of claim 5, wherein the one or more intervals are positioned around the one or more tokens with high attention scores, and wherein overlapping intervals are merged (see Fig. 2 and ¶ 2-3 of A. Sliding Window Local Attention and ¶ 1-2 of B. Clustering-Based Long-Range Attention citations as in claim 5, above. More specifically: Fig. 2: “Overview of attention patterns used in our work. (a) Sliding Window Local Attention on a part of attention heads. 
(b) Clustering-Based Long-Range Attention on the other part. After a long sequence is fed into the model, tokens will be divided into non-overlapping windows according to the size of the sliding window on a part of attention heads in Multi-Head Self-Attention layer. Meanwhile, tokens will be clustered on the basis of clustering algorithm on another part of attention heads. Then two tokens connected by a curve will attend to each other, i.e. attention is restricted within each window or cluster.” and Section under III. HYBRID ATTENTION PATTERN: A. Sliding window local attention (¶ 2-3): “As shown in Fig. 2(a), given the window size w, the query at location i only needs to attend to w tokens in the same window, which means the indices of keys range from ⌊i/w⌋×w to (⌊i/w⌋+1)×w.”). Scripka et al., Blanco et al. and Qin et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Qin et al. of wherein the one or more intervals are positioned around the one or more tokens with high attention scores, and wherein overlapping intervals are merged which provides the benefit of best balancing classification effectiveness and memory efficiency, making it easier to input longer sequences under the same hardware conditions (¶ V. Conclusion of Qin et al.). Regarding claim 7, Scripka et al. in combination with Blanco et al. teach the limitations as in claim 1, above. Qin et al. further teaches: 7. The computer-implemented method of claim 1, further comprising: determining, by the one or more processors, an unnormalized aggregated attention score for each cluster by summing the high attention scores within the interval (see Fig. 2 and ¶ 2-3 of A. 
Sliding Window Local Attention and ¶ 1-2 of B. Clustering-Based Long-Range Attention citations as in claim 5, above and further ¶ 1 of A. Sliding Window Local Attention “Assuming that the input of the model is x = (x1, x2, ..., xn), which is a sequence made up of n vectors in a d-dimensional vector space. After three linear projections, for a query vector qi, its correlation with key vectors kj, j ∈ [1, n] will be calculated, and the weight coefficients of value vector vj corresponding to each key vector kj will be obtained: aij = qi kj^T (3), Ai1, · · · , Ain = softmax(ai1, · · · , ain) (4)”); and determining, by the one or more processors, a normalized aggregated attention score for each cluster based on a softmax function (Fig. 2 and ¶ 2-3 of A. Sliding Window Local Attention and ¶ 1-2 of B. Clustering-Based Long-Range Attention citations as in claim 5, above. More specifically, ¶ 2-3 of A. Sliding Window Local Attention: “…After normalization by the softmax function, weights and corresponding value vectors are weighted and summed to get the final attention score Oi…”). Scripka et al., Blanco et al. and Qin et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Qin et al. 
of determining, by the one or more processors, an unnormalized aggregated attention score for each interval by summing the high attention scores within the interval and determining, by the one or more processors, a normalized aggregated attention score for each interval based on a softmax function which provides the benefit of best balancing classification effectiveness and memory efficiency, making it easier to input longer sequences under the same hardware conditions (¶ V. Conclusion of Qin et al.). Claim 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Scripka et al. (US 20230017211 A1) and further in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." Journal of Biomedical Informatics 130 (2022): 104050. https://www.sciencedirect.com/science/article/pii/S1532046422000661) as applied to claim 9, above and further in view of Lucas et al. (pub. date 2020) (US 20200176098 A1). Regarding claim 10, Scripka et al. in combination with Blanco et al. teaches the limitations as in claim 9, above. However, Scripka et al. in combination with Blanco et al. do not explicitly teach, but Lucas et al. (pub. date 2020) does teach: 10. The computer-implemented method of claim 9, wherein the scanned images are in a portable document format (see ¶ [0098]: “The modularity of each processing stage requires different pre-processing mechanisms for each OCR service/software implemented. For example, different OCR services support some image formats and resolutions for OCR but may not support others. When processing patient records, many document formats included within the record are unsupported, and may require format conversion from the unsupported format to a support format. Exemplary conversions may take documents of a variety of formats (PDF, PNG, JPG, etc.) 
and convert them to a format that each respective OCR service accepts (e.g., JPG, PNG, etc.).”). Scripka et al., Blanco et al. and Lucas et al. (pub. date 2020) are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Lucas et al. (pub. date 2020) of wherein the scanned images are in a portable document format which provides the benefit of improving the results of the overall processing ([0117] of Lucas et al. (pub. date 2020)). Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Scripka et al. (US 20230017211 A1) and further in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." Journal of Biomedical Informatics 130 (2022): 104050. https://www.sciencedirect.com/science/article/pii/S1532046422000661) as applied to claim 1, above and further in view of Gullapudi et al. (US 20240177053 A1). Regarding claim 14, Scripka et al. in combination with Blanco et al. teaches the limitations as in claim 1, above. However, Scripka et al. in combination with Blanco et al. do not explicitly teach, but Gullapudi et al. does teach: 14.
The computer-implemented method of claim 1, further comprising: determining, by the one or more processors, a threshold value for the attention scores (see ¶ [0006, 0021, and 0059]: “[0006] In some implementations, actions include receiving query data and target data, the query data representative of query entities and the target data representative of target entities, determining, by an attention ML model, a set of character-level embeddings based on the query data and the target data, providing, by a sub-word-level tokenizer, a set of sub-word-level tokens from the query data and the target data, each sub-word-level token including a string of multiple characters, generating, by the attention ML model, a set of sub-word-level embeddings based on the set of sub-word-level tokens, providing, by the attention ML model, at least one attention matrix including, for at least a sub-set of sub-word-level tokens, attention scores, each attention score representative of a relative importance of a respective sub-word-level token in a predicted match provided by a matching ML model, the predicted match including a match between a query entity and a target entity, and outputting an explanation including a query text string representative of the query entity and a target text string representative of the target entity, the query text string and the target text string being provided based on the at least one attention matrix. Other implementations of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices. [0021] … outputting an explanation including a query text string representative of the query entity and a target text string representative of the target entity, the query text string and the target text string being provided based on the at least one attention matrix. 
[0059] In some examples, the attention scores component 436 compares each attention score to a threshold attention score. If an attention score meets or exceeds the threshold attention score, the respective sub-word-level token is included in the explanation text 430. If an attention score does not meet or exceed the threshold attention score, the respective sub-word-level token is not included in the explanation text 430.”); and filtering, by the one or more processors, at least a portion of the one or more tokens based on the threshold value (see ¶ [0006, 0021, and 0059] citations as in limitation above. More specifically: ¶[0059]: “In some examples, the attention scores component 436 compares each attention score to a threshold attention score. If an attention score meets or exceeds the threshold attention score, the respective sub-word-level token is included in the explanation text 430. If an attention score does not meet or exceed the threshold attention score, the respective sub-word-level token is not included in the explanation text 430.”). Scripka et al., Blanco et al. and Gullapudi et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Gullapudi et al. of determining, by the one or more processors, a threshold value for the attention scores and filtering, by the one or more processors, at least a portion of the one or more tokens based on the threshold value which provides the benefit of improving the quality of explanations output ([0047] of Gullapudi et al.). Regarding claim 15, Scripka et al. in combination with Blanco et al. and Gullapudi et al. teaches the limitations as in claim 14, above. Gullapudi et al. 
further teaches: 15. The computer-implemented method of claim 14, further comprising: determining, by the one or more processors, a threshold value for the aggregate attention scores (see ¶ [0006, 0021, and 0059]: “[0006] In some implementations, actions include receiving query data and target data, the query data representative of query entities and the target data representative of target entities, determining, by an attention ML model, a set of character-level embeddings based on the query data and the target data, providing, by a sub-word-level tokenizer, a set of sub-word-level tokens from the query data and the target data, each sub-word-level token including a string of multiple characters, generating, by the attention ML model, a set of sub-word-level embeddings based on the set of sub-word-level tokens, providing, by the attention ML model, at least one attention matrix including, for at least a sub-set of sub-word-level tokens, attention scores, each attention score representative of a relative importance of a respective sub-word-level token in a predicted match provided by a matching ML model, the predicted match including a match between a query entity and a target entity, and outputting an explanation including a query text string representative of the query entity and a target text string representative of the target entity, the query text string and the target text string being provided based on the at least one attention matrix. Other implementations of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices. 
[0021] Implementations can include actions of receiving query data and target data, the query data representative of query entities and the target data representative of target entities, determining, by an attention ML model, … outputting an explanation including a query text string representative of the query entity and a target text string representative of the target entity, the query text string and the target text string being provided based on the at least one attention matrix. [0059] In some examples, the attention scores component 436 compares each attention score to a threshold attention score. If an attention score meets or exceeds the threshold attention score, the respective sub-word-level token is included in the explanation text 430. If an attention score does not meet or exceed the threshold attention score, the respective sub-word-level token is not included in the explanation text 430.”); and filtering, by the one or more processors, at least a portion of the clusters based on the threshold value (see ¶ [0006, 0021, and 0059] citations as in limitations above). Scripka et al., Blanco et al., and Gullapudi et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Gullapudi et al. of determining, by the one or more processors, a threshold value for the aggregate attention scores and filtering, by the one or more processors, at least a portion of the clusters based on the threshold value which provides the benefit of improving the quality of explanations output ([0047] of Gullapudi et al.). Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Scripka et al.
(US 20230017211 A1) and further in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." Journal of Biomedical Informatics 130 (2022): 104050. https://www.sciencedirect.com/science/article/pii/S1532046422000661) as applied to claim 1, above and further in view of Lucas et al. (pub. date 2020) (US 20200176098 A1). Regarding claim 16, Scripka et al. in combination with Blanco et al. teaches the limitations as in claim 1, above. However, Scripka et al. in combination with Blanco et al. do not explicitly teach, but Lucas et al. (pub. date 2020) does teach: 16. The computer-implemented method of claim 1, wherein the extracted text includes words and locations of the words within the one or more documents (see ¶ [0097]: “The intake pipeline 110 receives a clinical document that may include machine readable text or that may be received as an image file. If necessary, the document may be submitted to a pre-processor stage 120 that performs text cleaning and error detection (i.e., format conversion, resolution conversion, batch sizing, text cleaning, etc.). Once pre-processed, the document may be submitted for OCR on the document to convert the text into a machine-readable format (i.e., text document, html, etc.). Once in a machine-readable format, the error correction (e.g., spell checking, noise removal, context based correlation, etc.) may be performed on the now-machine-readable text. The intake pipeline stages 120-150 are modular components, which allows for real-time selection of the best processing tools and software depending on the type of document and document content being processed, enabling the processing pipeline to replace/compare algorithms used as necessary. Two examples of OCR software that may be used include Tesseract and Google Cloud Vision API. 
Tesseract provides high-speed OCR for documents which do not have any artifacting/noise (i.e., documents that have been printed to PDF or that had very little noise generated during the scanning process). Google Cloud Vision API, conversely, may be used for documents which have too much noise, as it is well-suited to process old documents or images of documents that have been scanned/faxed many times, introducing extensive artifacting and noise into the image. As a result, Cloud Vision may provide detailed information about the position of paragraphs, words, and documents within the documents processed. Other OCR systems may also be utilized in lieu of or in combination with the two described above.”). Scripka et al., Blanco et al., and Lucas et al. (pub. date 2020) are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Lucas et al. (pub. date 2020) of wherein the extracted text includes words and locations of the words within the one or more documents which provides the benefit of improving the results of the overall processing ([0117] of Lucas et al. (pub. date 2020)). Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Scripka et al. (US 20230017211 A1) and further in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." Journal of Biomedical Informatics 130 (2022): 104050. https://www.sciencedirect.com/science/article/pii/S1532046422000661) and Lucas et al. (pub. date 2019) (US 10395772 B1). As to independent claim 17, Scripka et al. teaches: 17.
A system (see ¶ [0004]: “The present invention relates to systems and methods for automated analysis of structured and unstructured data, and more particularly to analysis of medical images and records in various formats and structures from disparate sources and providers.”) comprising: one or more processors (see ¶ [0062]: “The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output…”); and at least one non-transitory computer readable medium storing instructions which, when executed by the one or more processors (see ¶ [0063]: “Computer-readable media suitable for storing computer program instructions and data can include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.”), cause the one or more processors to perform operations comprising: receiving one or more documents, wherein the one or more documents include medical records (see Fig. 1 (110: source documents (e.g., health records) and 120: ingest and OCR) and ¶ [0004, 0018, and 0062] citations as in claim 1, above.); extracting, utilizing an optical character recognition (OCR) engine, text from the one or more documents (see Fig. 1 (110: source documents (e.g., health records) and 120: ingest and OCR) and ¶ [0004, 0018, 0039-0040 and 0062] citations as in claim 1, above.); aggregating the one or more tokens into clusters (see Fig. 1 (110: source documents (e.g., health records) and 120: ingest and OCR) and ¶ [0004, 0018, 0034-0036, 0042, 0048 and 0062] citations as in claim 1, above.); However, Scripka et al.
does not explicitly teach, but Blanco et al. does teach: determining, utilizing a natural language processing (NLP) model, one or more attention scores for one or more tokens of the one or more documents (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1, above.), wherein each of the one or more tokens represents a respective portion of the extracted text (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1, above. More specifically: ¶ 6.1. Attention visualization: “…maximum for each label attention is displayed for each word...” and Fig. 4 (Aggregated per-label attention on HER from MIMIC dataset)), wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in limitation(s) above. More specifically: ¶ 6.1. Attention visualization: “…maximum for each label attention is displayed for each word...” and Fig. 4 (Aggregated per-label attention on HER from MIMIC dataset) and further ¶ 1 of 4. Methods: “For this work, we have developed a multi-label text classification model based on the Transformer architecture [26], depicted in Fig. 2. The ICD-10 multi-label classification problem consists of mapping the raw text of an EHR (X) to a subset from the full set of ICD labels, C, with |C| being the total number of ICD codes. As an example, in our case the label set conveys gastro-intestinal diseases, i.e., C = {K209, K388, ... K551}.
The Transformer-based neural network model is trained with instances comprising pairs of input (EHR text) and output (ICD codes).”); aggregating the one or more tokens into clusters based on the one or more attention scores (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization, and ¶ 2 of 4.2. PlaBERT: Per-label attention BERT citations as in claim 1, above.); determining, utilizing the NLP model, an aggregate attention score for each of the clusters (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1, above.), wherein individual aggregate attention scores indicate an importance of one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another one of the clusters (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1, above.); Scripka et al. and Blanco et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text associated with medical records). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. to incorporate the teachings of Blanco et al.
of determining, by the one or more processors and utilizing a natural language processing (NLP) model of the document processing system, one or more attention scores for one or more tokens of the one or more documents, wherein each of the one or more tokens represents a respective portion of the extracted text, wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text; aggregating, by the one or more processors, the one or more tokens into clusters based on the one or more attention scores; determining, by the one or more processors and utilizing the NLP model, an aggregate attention score for each of the clusters, wherein individual aggregate attention scores indicate an importance of one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another one of the clusters, which provides the benefit of improving the performance of the models with a reasonable trade-off among extra necessary computation and memory (¶ [conclusion] of Blanco et al.). However, Scripka et al. in combination with Blanco et al. do not explicitly teach, but Lucas et al. (pub. date 2019) does teach: retraining the NLP model using the determined aggregate attention scores and the clusters as labeled training data, such that the NLP model learns to interpret relative importance of different portions of input text (see ¶ Col. 46, lines 60-62: “…The Abstraction Engine 158 and Abstraction Engine toolbox 160 components utilize MLA and NLP to generate predictions.” and ¶ starting at Col. 48, line 42: “(236) As mentioned above, the system periodically checks to make sure that the NLP/MLA models being used are most up-to-date (such as elements 76, 78, and 80 from FIG. 5).
The system may include a Bootcamp subroutine 166 for evaluating and updating the NLP and MLA models. In this subroutine 166, the system retrieves clinical record documents from the clinical data vault 144, such as based on one or more unique user id's, on clinical features common to one or more patients, or any other criteria. The subroutine also may communicate with the Abstraction Engine S3 database 156 to retrieve the raw OCR files corresponding to each of those documents, as well as the current NLP model. The system further may retrieve abstractor feedback (such as the feedback loop's erroneous result corrections/annotations) from the toolbox 160. Each of these inputs may be used to execute a training script to verify or update the NLP model. At that point, metadata relating to the updated model may be communicated to the Abstraction Engine toolbox database 164 (such as for later human inspection, model or data provenance, and/or long-term metrics). Workbench 148 supports the ability for abstractors to tag Abstraction Engine's 158 incorrect predictions with a predetermined set of issues (such as documents are from wrong patient, OCR errors, wrong entity linked, correct concept candidate, wrong entity linked, correct concept candidate but hypothetical reference in document cannot be construed as haven taken place, etc.). For example, in the case of patients whose predictions are incorrect because ‘Documents are for wrong patient’, the Abstraction Engine Bootcamp 166 may ignore these patients when training future MLAs to understand gender or may instantiate a specific training phase to train the current MLAs to predict which patients have documents from multiple patients and exclude from training and/or flag all patients which have documents from wrong patient for independent abstraction.”). Scripka et al., Blanco et al. and Lucas et al. (pub. 
date 2019) are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Lucas et al. (pub. date 2019) of retraining the NLP model using the determined aggregate attention scores and the clusters as labeled training data, such that the NLP model learns to interpret relative importance of different portions of input text which provides the benefit of improving the results of the overall processing (Col. 30, line 28 of Lucas et al. (pub. date 2019)). Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Scripka et al. (US 20230017211 A1) and further in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." Journal of Biomedical Informatics 130 (2022): 104050. https://www.sciencedirect.com/science/article/pii/S1532046422000661) and Lucas et al. (pub. date 2019) (US 10395772 B1) as applied to claim 17, above and further in view of Kadav et al. (US 20190019037 A1). Regarding claim 18, Scripka et al. in combination with Blanco et al. and Lucas et al. (pub. date 2019) teach the limitations as in claim 17, above. Blanco et al. further teaches: 18. The system of claim 17, further comprising: generating one or more bounding boxes for recognized words and/or phrases in the one or more documents (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1 above. More specifically: Fig.
4 (Aggregated per-label attention on HER from MIMIC dataset) and further: ¶ 4 of Introduction: “…Our software can export the attention weights (per label or aggregated) ready for its visualisation on the NeatVision tool [7].” and ¶ 5 of 6.1. Attention visualization: “Although not tested with physicians, we feel that generating this kind of visualisation together with the label predictions would be helpful to classify notes as a DSS or as an aid for clinical documentation tasks.”), superimposing the one or more bounding boxes over the recognized words and/or phrases in the one or more documents (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1 above. More specifically: Fig. 4 (Aggregated per-label attention on HER from MIMIC dataset) and further: ¶ 4 of Introduction: “…Our software can export the attention weights (per label or aggregated) ready for its visualisation on the NeatVision tool [7].” and ¶ 5 of 6.1. Attention visualization: “Although not tested with physicians, we feel that generating this kind of visualisation together with the label predictions would be helpful to classify notes as a DSS or as an aid for clinical documentation tasks.”), wherein the one or more bounding boxes are colored and/or semi-transparent (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1 above. More specifically: Fig. 4 (Aggregated per-label attention on HER from MIMIC dataset)); and causing a graphical user interface to output a presentation of the text that visually indicates the relative importance of one or more portions of the text (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in limitation(s) above. More specifically: Fig.
4 (Aggregated per-label attention on HER from MIMIC dataset) and further: ¶ 4 of Introduction: “…Our software can export the attention weights (per label or aggregated) ready for its visualisation on the NeatVision tool [7].” and ¶ 5 of 6.1. Attention visualization: “Although not tested with physicians, we feel that generating this kind of visualisation together with the label predictions would be helpful to classify notes as a DSS or as an aid for clinical documentation tasks.”). Scripka et al. and Blanco et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text associated with medical records). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. to incorporate the teachings of Blanco et al. of generating one or more bounding boxes for recognized words and/or phrases in the one or more documents, superimposing, by the one or more processors, the one or more bounding boxes over the recognized words and/or phrases in the one or more documents, wherein the one or more bounding boxes are colored and/or semi-transparent which provides the benefit of improving the performance of the models with a reasonable trade-off among extra necessary computation and memory (¶ [conclusion] of Blanco et al.). Lucas et al. (pub. date 2019) further teaches: generating, utilizing the OCR engine, one or more bounding boxes for recognized words and/or phrases in the one or more documents (see Fig. 8-9 and ¶ starting at Col. 30, line 29, ¶ starting at Col. 33, line 34, and ¶ starting at Col. 34, line 40 citations as in claim 2, above.), wherein the one or more bounding boxes indicate the one or more attention scores (see Fig. 8-9 and ¶ starting at Col. 30, line 29, ¶ starting at Col. 33, line 34, and ¶ starting at Col.
34, line 40 citations as in claim 2 above.); Scripka et al., Blanco et al. and Lucas et al. (pub. date 2019) are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Lucas et al. (pub. date 2019) of generating, by the one or more processors utilizing the OCR engine, one or more bounding boxes for recognized words and/or phrases in the one or more documents, wherein the one or more bounding boxes indicate the one or more predictions and attention scores which provides the benefit of improving the results of the overall processing (Col. 30, line 28 of Lucas et al. (pub. date 2019)). However, Scripka et al. in combination with Blanco et al. and Lucas et al. (pub. date 2019) do not explicitly teach, but Kadav et al. does teach: wherein an intensity of the color or transparency of each of the one or more bounding boxes represents a magnitude of the corresponding attention score (see ¶ [0033 and 0056] citations as in claim 4 above.). Scripka et al., Blanco et al., Lucas et al. (pub. date 2019), and Kadav et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. and Lucas et al. (pub. date 2019) to incorporate the teachings of Kadav et al.
of wherein an intensity of the color or transparency of each of the one or more bounding boxes represents a magnitude of the corresponding attention score which provides the benefit of improved accuracy and lower computational cost, as compared to conventional approaches ([0025] of Kadav et al.). Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Scripka et al. (US 20230017211 A1) and further in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." Journal of Biomedical Informatics 130 (2022): 104050. https://www.sciencedirect.com/science/article/pii/S1532046422000661) and Lucas et al. (pub. date 2019) (US 10395772 B1) as applied to claim 17, above and further in view of Qin et al. ("Hybrid Attention-based Transformer for Long-range Document Classification," 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022, pp. 1-8, doi: 10.1109/IJCNN55064.2022.9891918. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9891918). Regarding claim 19, Scripka et al. in combination with Blanco et al. teaches the limitations as in claim 17, above. However, Scripka et al. in combination with Blanco et al. do not explicitly teach, but Qin et al. does teach: 19. The system of claim 17, further comprising: determining one or more intervals to form the clusters around one or more tokens with high attention scores by utilizing an expanding window technique (see Fig. 2, ¶ 2-3 of A. Sliding Window Local Attention, and ¶ 1 of B. Clustering-Based Long-Range Attention citations as in claim 5, above.), wherein the one or more tokens with high attention scores are clustered based, at least in part, on a task-based parameter that indicates a quantity of data sought during processing of the one or more documents (see Fig. 2, ¶ 2-3 of A.
Sliding Window Local Attention, and ¶ 1 of B. Clustering-Based Long-Range Attention citations as in claim 5, above.), wherein the one or more intervals are positioned around the one or more tokens with high attention scores, and wherein overlapping intervals are merged (see Fig. 2 and ¶ 2-3 of A. Sliding Window Local Attention and ¶ 1-2 of B. Clustering-Based Long-Range Attention citations as in claim 5, above.). Scripka et al., Blanco et al., and Qin et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Qin et al. of determining one or more intervals to cluster the one or more tokens with high attention scores by utilizing an expanding window technique, wherein the one or more tokens with high attention scores are clustered based, at least in part, on a task-based parameter that indicates a quantity of data sought during processing of the one or more documents, wherein the one or more intervals are positioned around the one or more tokens with high attention scores, and wherein overlapping intervals are merged which provides the benefit of best balancing classification effectiveness and memory efficiency, making it easier to input longer sequences under the same hardware conditions (¶ V. Conclusion of Qin et al.). Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Scripka et al. (US 20230017211 A1) and further in view of Blanco et al. (Blanco, Alberto, et al. "Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish." Journal of Biomedical Informatics 130 (2022): 104050.
https://www.sciencedirect.com/science/article/pii/S1532046422000661) and Gullapudi et al. (US 20240177053 A1). As to independent claim 20, Scripka et al. further teaches: 20. At least one non-transitory computer readable medium storing instructions which, when executed by one or more processors (see ¶ [0062-0063]: “[0062] The processes and logic flows described in this document can be performed by one or more programmable processors… [0063] Computer-readable media suitable for storing computer program instructions and data can include all forms of nonvolatile memory, media and memory devices, …”), cause the one or more processors to perform operations comprising: receiving one or more documents, wherein the one or more documents include medical records (see Fig. 1 (110: source documents (e.g., health records) and 120: ingest and OCR) and ¶ [0004, 0018, and 0062] citations as in claim 1, above.); extracting, utilizing an optical character recognition (OCR) engine, text from the one or more documents (see Fig. 1 (110: source documents (e.g., health records) and 120: ingest and OCR) and ¶ [0004, 0018, 0039-0040 and 0062] citations as in claim 1, above.); aggregating the one or more tokens into clusters (see Fig. 1 (110: source documents (e.g., health records) and 120: ingest and OCR) and ¶ [0004, 0018, 0034-0036, 0042, 0048 and 0062] citations as in claim 1, above.); However, Scripka et al. does not explicitly teach, but Blanco et al. does teach: determining, utilizing a natural language processing (NLP) model, one or more attention scores for one or more tokens of the one or more documents (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1, above.), wherein each of the one or more tokens represents a respective portion of the extracted text (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. 
Attention visualization citations as in claim 1, above. More specifically: ¶ 6.1. Attention visualization: “…maximum for each label attention is displayed for each word...” and Fig. 4 (Aggregated per-label attention on EHR from MIMIC dataset)), wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in limitation(s) above. More specifically: ¶ 6.1. Attention visualization: “…maximum for each label attention is displayed for each word...” and Fig. 4 (Aggregated per-label attention on EHR from MIMIC dataset) and further ¶ 1 of 4. Methods: “For this work, we have developed a multi-label text classification model based on the Transformer architecture [26], depicted in Fig. 2. The ICD-10 multi-label classification problem consists of mapping the raw text of an EHR (X) to a subset from the full set of ICD labels, C, with |C| being the total number of ICD codes. As an example, in our case the label set conveys gastro-intestinal diseases, i.e., C = {K209, K388, ... K551}. The Transformer-based neural network model is trained with instances comprising pairs of input (EHR text) and output (ICD codes).”); aggregating the one or more tokens into clusters based on the one or more attention scores to construct sentences (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization, and ¶ 2 of 4.2. PlaBERT: Per-label attention BERT citations as in claim 1, above.); determining, utilizing the NLP model, an aggregate attention score for each of the clusters (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1.
Attention visualization citations as in claim 1, above.), wherein individual aggregate attention scores indicate one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another one of the clusters (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1, above.); receiving, via a graphical user interface, a query to the database that corresponds to a cluster stored in the database (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1, above. Further, Fig. 3 (Input EHR (text) and assigned ICD codes) and ¶ 1 of 3. Materials: “In this article, Spanish, Swedish and English electronic patient records labelled with gastrointestinal-related ICD codes were used. The Swedish dataset origins from Health Bank1; to be precise, the EHRs are within the gastro-surgery medical speciality. The Spanish dataset is rooted in IXAmed-GS [34] but conveys EHRs from the general Emergency Department. As for the English dataset, it is a well-known reference, i.e., MIMIC [35], and also focused on the Emergency Department. The English dataset was encoded with ICD-9 system while both, the Spanish and the Swedish datasets follow the ICD-10. In order to find a comparable label set shared by the three corpora, a mapping between ICD-9 and ICD-10 was applied. As a result, the three datasets share the same set of 157 base-codes all within the Chapter XI of the ICD.”); and based on the query, causing the graphical user interface to display imaging data associated with the one or more documents that include the cluster (see ¶ 5 of Introduction, ¶ 6 of 2. Related work, ¶ 2 of 4. Methods, ¶ 2 of 5. Results, and ¶ 6.1. Attention visualization citations as in claim 1, above, as well as Fig. 3 and ¶ 1 of 3. Materials citations as in limitation above.
More specifically, Fig. 3 (below).) [Fig. 3 of Blanco et al. (an input EHR text with its assigned ICD codes) is reproduced in the Office Action as a greyscale image.] Scripka et al. and Blanco et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text associated with medical records). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. to incorporate the teachings of Blanco et al. of determining, utilizing a natural language processing (NLP) model, one or more attention scores for one or more tokens of the one or more documents, wherein each of the one or more tokens represents a respective portion of the extracted text, wherein the NLP model includes a neural network that has been trained to generate the one or more attention scores for the one or more tokens based on training text and labels applied to the training text that are indicative of an importance of portions of the training text; aggregating the one or more tokens into clusters based on the one or more attention scores to construct sentences; determining, utilizing the NLP model, an aggregate attention score for each of the clusters, wherein individual aggregate attention scores indicate one or more portions of the text corresponding to one of the clusters relative to other portions of the text corresponding to another one of the clusters. However, Scripka et al. in combination with Blanco et al. do not explicitly teach, but Gullapudi et al.
does teach: filtering at least a portion of the clusters based on a predetermined threshold value (see ¶ [0006, 0021, and 0059]: “[0006] In some implementations, actions include receiving query data and target data, the query data representative of query entities and the target data representative of target entities, determining, by an attention ML model, a set of character-level embeddings based on the query data and the target data, providing, by a sub-word-level tokenizer, a set of sub-word-level tokens from the query data and the target data, each sub-word-level token including a string of multiple characters, generating, by the attention ML model, a set of sub-word-level embeddings based on the set of sub-word-level tokens, providing, by the attention ML model, at least one attention matrix including, for at least a sub-set of sub-word-level tokens, attention scores, each attention score representative of a relative importance of a respective sub-word-level token in a predicted match provided by a matching ML model, the predicted match including a match between a query entity and a target entity, and outputting an explanation including a query text string representative of the query entity and a target text string representative of the target entity, the query text string and the target text string being provided based on the at least one attention matrix. Other implementations of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices. 
[0021] Implementations can include actions of receiving query data and target data, the query data representative of query entities and the target data representative of target entities, determining, by an attention ML model, … outputting an explanation including a query text string representative of the query entity and a target text string representative of the target entity, the query text string and the target text string being provided based on the at least one attention matrix. [0059] In some examples, the attention scores component 436 compares each attention score to a threshold attention score. If an attention score meets or exceeds the threshold attention score, the respective sub-word-level token is included in the explanation text 430. If an attention score does not meet or exceed the threshold attention score, the respective sub-word-level token is not included in the explanation text 430.”); extracting the filtered clusters as structured data, and storing the structured data in a database (see ¶ [0006, 0021, and 0059] citations as in limitations above and further ¶ [0024]: “FIG. 1 depicts an example architecture 100 in accordance with implementations of the present disclosure. In the depicted example, the example architecture 100 includes a client device 102, a network 106, and a server system 104. The server system 104 includes one or more server devices and databases 108 (e.g., processors, memory). In the depicted example, a user 112 interacts with the client device 102.”); Scripka et al., Blanco et al., and Gullapudi et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., text). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Scripka et al. in combination with Blanco et al. to incorporate the teachings of Gullapudi et al.
of filtering at least a portion of the clusters based on a predetermined threshold value and extracting the filtered clusters as structured data, and storing the structured data in a database, which provides the benefit of improving the quality of explanations output ([0047] of Gullapudi et al.).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Keisha Y Castillo-Torres whose telephone number is (571)272-3975. The examiner can normally be reached Monday - Friday, 9:00 am - 4:00 pm (EST).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre-Louis Desir, can be reached at (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Keisha Y. Castillo-Torres
Examiner, Art Unit 2659

/Keisha Y. Castillo-Torres/Examiner, Art Unit 2659
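For readers mapping the citations above to the underlying technique: the expanding-window clustering of claim 19 and the threshold filtering of claim 20 can be sketched in a few lines of Python. This is a hypothetical illustration, not code from Scripka, Blanco, Qin, or Gullapudi; the function names, window size, thresholds, and mean-based aggregation are all assumptions made for the example.

```python
# Hypothetical sketch of the clustering/filtering steps recited in claims
# 19-20: expanding windows placed around high-attention tokens, merging of
# overlapping intervals, and threshold filtering of the merged clusters.
# All names, thresholds, and the mean aggregation are illustrative only.

def cluster_high_attention_tokens(scores, high_threshold, window):
    """Expand an interval of +/- `window` positions around each token whose
    attention score exceeds `high_threshold`, then merge overlapping intervals."""
    n = len(scores)
    intervals = [
        (max(0, i - window), min(n - 1, i + window))
        for i, s in enumerate(scores)
        if s > high_threshold
    ]
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:  # overlaps the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def filter_clusters(scores, clusters, min_aggregate):
    """Keep only clusters whose aggregate (here: mean) attention score meets
    a predetermined threshold; return them as structured records."""
    kept = []
    for start, end in clusters:
        aggregate = sum(scores[start:end + 1]) / (end - start + 1)
        if aggregate >= min_aggregate:
            kept.append({"span": (start, end), "aggregate_attention": aggregate})
    return kept

scores = [0.05, 0.9, 0.1, 0.85, 0.02, 0.03, 0.7, 0.04]
clusters = cluster_high_attention_tokens(scores, high_threshold=0.6, window=1)
# tokens 1, 3, 6 seed intervals (0,2), (2,4), (5,7); the first two merge
print(clusters)  # -> [(0, 4), (5, 7)]
print(filter_clusters(scores, clusters, min_aggregate=0.3))
```

Seed intervals that overlap collapse into a single cluster, mirroring the "overlapping intervals are merged" limitation of claim 19; the second function drops clusters whose aggregate attention falls below a predetermined threshold, analogous to the per-token attention-score thresholding quoted from Gullapudi ¶ [0059].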

Prosecution Timeline

May 08, 2023
Application Filed
May 30, 2025
Non-Final Rejection — §101, §103, §112
Aug 19, 2025
Examiner Interview Summary
Aug 19, 2025
Applicant Interview (Telephonic)
Aug 28, 2025
Response Filed
Oct 31, 2025
Final Rejection — §101, §103, §112
Dec 10, 2025
Examiner Interview Summary
Dec 10, 2025
Applicant Interview (Telephonic)
Jan 05, 2026
Request for Continued Examination
Jan 21, 2026
Response after Non-Final Action
Mar 03, 2026
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12573402
GENERATING AND/OR UTILIZING UNINTENTIONAL MEMORIZATION MEASURE(S) FOR AUTOMATIC SPEECH RECOGNITION MODEL(S)
2y 5m to grant Granted Mar 10, 2026
Patent 12536989
Language-agnostic Multilingual Modeling Using Effective Script Normalization
2y 5m to grant Granted Jan 27, 2026
Patent 12531050
VOICE DATA CREATION DEVICE
2y 5m to grant Granted Jan 20, 2026
Patent 12499332
TRANSLATING TEXT USING GENERATED VISUAL REPRESENTATIONS AND ARTIFICIAL INTELLIGENCE
2y 5m to grant Granted Dec 16, 2025
Patent 12488180
SYSTEMS AND METHODS FOR GENERATING DIALOG TREES
2y 5m to grant Granted Dec 02, 2025
Based on this examiner's 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
74%
Grant Probability
99%
With Interview (+30.5%)
3y 0m
Median Time to Grant
High
PTA Risk
Based on 108 resolved cases by this examiner. Grant probability derived from career allow rate.
