Last updated: May 29, 2026
Application No. 17/375,657
SYSTEMS AND METHODS FOR THE AUTOMATIC CATEGORIZATION OF TEXT

Final Rejection §101 §103 §112
Filed: Jul 14, 2021
Priority: Jul 14, 2020 (provisional 63/051,407)
Examiner: WILLIS, AMANDA LYNN
Art Unit: 2156
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Thomson Reuters Enterprise Centre GmbH
OA Round: 8 (Final)
Interview Optional

Interview lift: +26.4%. This examiner has a 36% grant rate. An interview has already been conducted in this application's prosecution history, so a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 348 resolved cases, 2023–2026
Examiner Intelligence

WILLIS, AMANDA LYNN
Career Allowance Rate: 36% (124 granted / 348 resolved), -19.4% vs TC avg
Interview Lift: +26.4% (resolved cases with interview vs. without)
Typical timeline: 4y 9m avg prosecution; 16 currently pending
Career history: 374 total applications across all art units
Statute-Specific Performance

§101: 1.6% (-38.4% vs TC avg)
§103: 86.2% (+46.2% vs TC avg)
§102: 5.9% (-34.1% vs TC avg)
§112: 1.1% (-38.9% vs TC avg)
Based on career data from 348 resolved cases
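The headline figures above can be cross-checked with simple arithmetic. The sketch below recomputes the allowance rate from the granted and resolved counts and, assuming (this is not stated on the page) that each "vs TC avg" figure is a percentage-point difference, backs out the Tech Center baseline each row implies. The variable names are illustrative only.

```python
# Figures copied from the examiner statistics above.
granted, resolved = 124, 348

# Career allowance rate as a percentage, to one decimal place.
allowance_rate = round(granted / resolved * 100, 1)

# ASSUMPTION: each "vs TC avg" value is a percentage-point difference,
# so the implied Tech Center baseline is (rate - delta).
statute_stats = {
    "§101": (1.6, -38.4),
    "§103": (86.2, 46.2),
    "§102": (5.9, -34.1),
    "§112": (1.1, -38.9),
}
implied_tc_avg = {
    statute: round(rate - delta, 1)
    for statute, (rate, delta) in statute_stats.items()
}

# Same assumption applied to the career allowance rate (-19.4 vs TC avg).
implied_career_tc_avg = round(allowance_rate - (-19.4), 1)

if __name__ == "__main__":
    print(f"Allowance rate: {allowance_rate}%")
    print(f"Implied TC avg (career): {implied_career_tc_avg}%")
    for statute, baseline in implied_tc_avg.items():
        print(f"Implied TC avg ({statute}): {baseline}%")
```

Under that assumption the recomputed rate is 35.6%, consistent with the displayed 36%, and all four statute rows imply the same baseline, which suggests the "TC avg" is a single estimate rather than a per-statute figure.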
Office Action

§101 §103 §112
DETAILED ACTION
Receipt of Applicant’s Amendment, filed April 7, 2025, is acknowledged.
Claims 1-5, 7-20 are pending in this office action.

Claim Interpretation
With regard to claims 1 and 14, the claims recite “fine-tuning the transformer-based sequence to sequence model with the plurality of documents, plurality of headnotes and metadata”. There is no definition provided in the instant specification for the term “fine-tuning”, and there do not appear to be any examples of ‘fine-tuning’ within the instant specification. Paragraph [0040] was identified as discussing fine-tuning. When read within the context of Paragraphs [0040] and [0043], one of ordinary skill in the art would identify the act of training/retraining the sequence-to-sequence model as ‘fine-tuning’ said model. This is the interpretation that was applied for examination purposes. Please note that the data (e.g. the plurality of documents, plurality of headnotes and metadata) that is used to perform the training/retraining amounts to non-functional descriptive material: it is merely the data that is input into the system and does not impact the functionality performed by the claimed system. Within Paragraph [0040] this data is provided to the system from the Westlaw legal research platform, which is stated as being a “known” research platform within the field of art (Original specification Paragraph [004]).

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-5, 7-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
[Step 2A Prong 1] The claim(s) recite the following limitations which have been identified as being directed to a mental process:
1. A computer implemented method for categorizing documents comprising:
[a] [[
[b] classifying, [[
[c] [[
[d] [[
[e] generating, [[
[f] [[
[g] generating, [[
[h] [[
[i] associating, [[


9. The computer implemented method of claim 1, comprising:
predicting, … for at least a second of the plurality of headnotes, a second statute pertaining to the second headnote, wherein the second predicted statute has associated therewith a second taxonomy of topics; 
…to model a relationship between a given word in the first headnote and a plurality of other words in the first headnote.

12. The computer implemented method of claim 1, comprising retrieving the taxonomy of topics associated with the statute pertaining to the first headnote and using the retrieved taxonomy of topics as input for predicting the topic from the taxonomy associated with the statute pertaining to the first headnote.  

13. The computer implemented method of claim 12, wherein the predicted statute and first headnote are further used as input for predicting the topic from the taxonomy associated with the statute pertaining to the first headnote.  

14. A computer implemented method for categorizing documents comprising:
[a] [[
[b] classifying, [[
[c] [[
[f] [[
[h] [[
[d] [[
[i] associating, [[
[c] [[
[e] generating, [[
[f] [[
[g] generating, [[
[i] associating, [[

The act of [limitation b] classifying headnotes as a note of decision may reasonably be interpreted as covering a mental process, specifically an evaluation of the headnotes, or the formation of a judgement or opinion regarding the interpretive status of the headnotes. A human being may look at the data, formulate the classification mentally, and determine that the headnote is a ‘note of decision’ (See Paragraph [0040] of the original specification, which describes the Westlaw legal research platform, a database of human-classified ‘notes of decision’).
The act of [limitations c and f, claim 9] making a prediction of a statute associated with the headnote may reasonably be interpreted as a mental process, specifically an evaluation of the headnote to formulate a judgement or opinion regarding which statute/topic is most relevant. A human being may look at the headnote and formulate these predictions mentally.
The act of [limitations e and g, claim 13] generating a new statute prediction and new topic may reasonably be interpreted as a human being making a mental evaluation and coming to a conclusion regarding the data they are evaluating.
The act of [limitation i, claim 12] associating the first headnote with the predicted topic may reasonably be interpreted as a human being mentally formulating the opinion that the headnote pertains to the topic.
Paragraph [0004] of the original specification describes how human “experts” manually performed the claimed operation. Paragraph [0034] of the original specification describes how a human can manually perform the categorization, and that the claimed system is merely automating the categorization process using known methodologies. Paragraph [0038] of the original specification describes “Westlaw services”, which is a known existing system in which human beings manually perform the claimed operations. Based on this reasoning, these claim limitations have been identified as being directed to a mental process.
When taken as a whole, these steps appear to recite the operations that subject matter experts perform when manually processing these documents within the field of art.

Claims 2-4, 7, 8, 10, and 11 further recite the following limitations identified as further defining the mental process, including: 
2. The computer implemented method of claim 1, comprising generating an annotated statute by annotating the predicted statute with the first headnote.   

3. The computer implemented method of claim 2, wherein annotating the predicted statute comprises adding the segment of text from the first headnote to the annotated statute.  

4. The computer implemented method of claim 3, wherein annotating the predicted statute comprises adding, to the annotated statute, a link to the document.

7. The computer implemented method of claim 1, wherein the first of the plurality of headnotes does not contain an explicit citation to the predicted statute, and wherein… to suggest statutes based on headnote text without citations to any statute.

8. The computer implemented method of claim 1, wherein the first headnote comprises a citation to a second statute different than the predicted statute, and … to suggest statutes based on the segment of text, wherein the segment of text does not include an explicit citation to the predicted statute. 

10. The computer implemented method of claim 9, further comprising, … predict the new topic, wherein the new topic includes terms not recited in the second headnote.  

11. The computer implemented method of claim 9, wherein the new topic is unique to the second taxonomy associated with the second statute pertaining to the second headnote.   

These claim limitations appear to further recite the abstract idea itself, as they appear to further define how the mental process of predicting the statute is done, as well as how the human being may mentally associate the statute with the document. The claim limitations may be construed to include situations where a human being formulates a judgement or an opinion to mentally link the document to the statute. At most, the claims may be construed as a human being using a pencil to document (e.g. annotate) the mentally established association. The original specification describes how human experts performed such manual operations in the past. Paragraph [0004] of the specification describes how human “experts” manually performed the claimed operation. Paragraph [0034] describes how a human can manually perform the categorization. In Paragraph [0038] the specification describes Westlaw services, which is an existing known system in which human beings manually perform the claimed operations.

[Step 2A Prong Two] This judicial exception is not integrated into a practical application.  The claims recite the following additional elements:
1. A computer implemented method for categorizing documents comprising:
[a] pre-training a sequential neural network, by a server computer, with a plurality of documents, the plurality of documents comprising a plurality of headnotes and metadata, wherein the plurality of headnotes each comprise a respective segment of text that summarizes at least a respective portion of the document;
[b] … via the sequential neural network, … wherein pre-training and classifying produces a pre-trained model with a generalized domain knowledge and enhances the accuracy of classifying the plurality of headnotes as the note of decision;
[c] utilizing a transformer-based sequence to sequence model with the pre-trained model and the generalized domain knowledge …by the server computer …wherein producing the pre-trained model is outsourced to the sequential neural network allowing the transformer-based sequence to sequence model to be tailored to a predictive task;
[d] fine-tuning the transformer-based sequence to sequence model with the plurality of documents, plurality of headnotes and metadata;
[e] …by the server computer utilizing the transformer-based sequence to sequence model …;
[f] utilizing the transformer-based sequence to sequence model … by the server computer …;
[g] …by the server computer utilizing the transformer-based sequence to sequence model …;
[h] periodically retraining the sequential neural network with changes to the taxonomy of topics, and
[i] …by the server computer….

5. The computer implemented method of claim 1, wherein the sequential neural network includes a bidirectional long-short term memory (LSTM) based classifier model.  

7. The computer implemented method of claim 1, … wherein the transformer-based sequence to sequence model is trained ….  

8. The computer implemented method of claim 1, … wherein the transformer-based sequence to sequence model is trained …. 

9. The computer implemented method of claim 1, comprising:…, by the server computer, using the transformer-based sequence to sequence model,…
utilizing self-attention within the transformer-based sequence to sequence model …
10. The computer implemented method of claim 9, further comprising, training the transformer-based sequence to sequence model to … 

14. A computer implemented method for categorizing documents comprising:
[a] pre-training a sequential neural network, by a server computer, with a plurality of documents, the plurality of documents comprising a plurality of headnotes and metadata, wherein the plurality of headnotes each comprise a respective segment of text that summarizes at least a respective portion of the document;
[b] … via the sequential neural network, … wherein pre-training and classifying produces a pre-trained model with a generalized domain knowledge and enhances the accuracy of classifying the plurality of headnotes as the note of decision;
[c] utilizing a transformer-based sequence to sequence model with the pre-trained model and the generalized domain knowledge …by the server computer …wherein producing the pre-trained model is outsourced to the sequential neural network allowing the transformer-based sequence to sequence model to be tailored to a predictive task;
[f] utilizing the transformer-based sequence to sequence model … by the server computer…;
[h] periodically retraining the sequential neural network with changes to the first taxonomy of topics,
[d] fine-tuning the transformer-based sequence to sequence model with the plurality of documents, plurality of headnotes and metadata;
[i] … by the server computer…;
[c] utilizing the transformer-based sequence to sequence model … by the server computer…;
[e] … by the server computer utilizing the transformer-based sequence to sequence model…;
[f] utilizing the transformer-based sequence to sequence model … by the server computer…;
[g] by the server computer utilizing the transformer-based sequence to sequence model…;
[i] … by the server computer... .
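For orientation, the ordering of the recited steps (classify each headnote as a note of decision, predict a statute, predict a topic from that statute's taxonomy, then associate the topic with the headnote) can be sketched as a plain data flow. This is a minimal illustration only and not the applicant's implementation: the keyword-overlap functions are hypothetical stand-ins for the claimed sequential neural network and transformer-based sequence to sequence model, and the names `Headnote`, `TAXONOMY`, and `categorize` are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Headnote:
    text: str
    jurisdiction: str
    statute: Optional[str] = None
    topic: Optional[str] = None


# Hypothetical taxonomy mapping each statute to its topics (illustrative data).
TAXONOMY: Dict[str, List[str]] = {
    "Statute A": ["notice requirements", "filing deadlines"],
    "Statute B": ["search and seizure", "warrant exceptions"],
}


def _overlap(a: str, b: str) -> int:
    """Count the distinct lowercase words shared by two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def is_note_of_decision(h: Headnote, jurisdictions: Set[str]) -> bool:
    # Stand-in for the classifier step: a simple jurisdictional criterion.
    return h.jurisdiction in jurisdictions


def predict_statute(h: Headnote) -> str:
    # Stand-in for the statute prediction: pick the statute whose topics
    # share the most words with the headnote text.
    return max(TAXONOMY, key=lambda s: sum(_overlap(h.text, t) for t in TAXONOMY[s]))


def predict_topic(h: Headnote, statute: str) -> str:
    # Stand-in for the topic prediction, restricted to that statute's taxonomy.
    return max(TAXONOMY[statute], key=lambda t: _overlap(h.text, t))


def categorize(headnotes: List[Headnote], jurisdictions: Set[str]) -> List[Headnote]:
    """Run classify -> predict statute -> predict topic -> associate."""
    categorized = []
    for h in headnotes:
        if not is_note_of_decision(h, jurisdictions):
            continue
        h.statute = predict_statute(h)
        h.topic = predict_topic(h, h.statute)  # association step
        categorized.append(h)
    return categorized


if __name__ == "__main__":
    notes = [
        Headnote("The court held that the filing deadlines were not met", "NY"),
        Headnote("Search and seizure without a warrant", "CA"),
    ]
    for h in categorize(notes, {"NY"}):
        print(h.statute, "|", h.topic)
```

The sketch deliberately reduces each model to a lookup so that only the step ordering and the statute-to-taxonomy dependency are visible; it takes no position on whether those steps are generic.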

 

This judicial exception is not integrated into a practical application because these claim elements have been identified as generic computing components merely applying the abstract idea within a computing environment (MPEP 2106.05(f)).  
	One of ordinary skill in the art would recognize that all AI systems (limitations [b], [c], [e], [f], [g], claim 5) must be trained (limitations [a], [b], [c], claims 7, 9, and 10) in order for them to function. One of ordinary skill in the art would further recognize that retraining (limitations [d] and [h]) existing AI models is a standard practice in the art for updating AI models. The training and retraining limitations have been identified as the standard functions of generic AI models necessary to make use of AI models.
	The claim language itself is recited at a high level of generality, merely reciting the outcome of the steps, and reciting that this outcome is produced using the generic computing device (limitation [i]). Even the generation of the model is recited as feeding the specific data (e.g. the plurality of documents) to a sequential neural network to produce the ‘pre-trained model’. This model is again merely fed data to produce the final result, the ‘prediction’. MPEP 2106.05(f)(1) details that merely reciting the solution or outcome with no description of the mechanism for accomplishing the result does not integrate a judicial exception into a practical application or provide significantly more. The format of the claims follows the same pattern within each step: the recitation of the computing device to generate resultant data. The resultant data is something that is traditionally produced by a human being performing that functionality (as stated in the original specification ¶4, ¶34). The claims merely recite the name of a generic computing device (e.g. transformer-based sequence to sequence model, server computer, self-attention, bidirectional long-short term memory (LSTM)) for generating this resultant data. The computing device itself is recited at a high level of generality, and appears to amount to known computing devices operating in their ordinary capacity (See Original Specification, ¶35, which describes the use of known off-the-shelf machine learning algorithms, e.g. Seq2Seq model, LSTM and self-attention). The recitation of a generic computing device operating in its ordinary capacity is not sufficient to amount to significantly more or a practical application of the abstract idea itself (MPEP 2106.05(f)(2)).
When taken individually or viewed as an ordered combination the claims as a whole do not appear to be integrated into a practical application but instead appear to be the mere automation of a manual operation within a generic computing environment, using generic computing devices.

The Claim(s) further recites:
1. …[a]…, the plurality of documents comprising a plurality of headnotes and metadata, wherein the plurality of headnotes each comprise a respective segment of text that summarizes at least a respective portion of the document; …

14 … [a] … a plurality of documents comprising a plurality of headnotes and metadata, wherein the plurality of headnotes each comprise a respective segment of text that summarizes at least a respective portion of the document; …

These claim limitations have been identified as a recitation of the underlying documentation upon which the claimed operation is performed. As such, these claim limitations have been identified as an extra-solution activity (MPEP 2106.05(g)). When taken individually or viewed as an ordered combination the claims as a whole do not appear to be integrated into a practical application.

[Step 2B] The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.  The above additional elements are the same as indicated above.

As detailed above this claim limitation has been identified as reciting generic computing devices merely automating the manual process of classifying the documents.  
The AI models recited are identified as known methodologies in Paragraph [0034] of the specification, which details that the categorization is performed using “known methodologies”. Paragraph [0035] recites that the machine learning methodology used is a “sequence-to-sequence” model known in the art. In Paragraph [0040] of the specification it is disclosed that the claimed machine learning model is “Google’s Text-to-Text Transfer Transformer (T5)”. Paragraph [0040] of the specification further states “Preferably, a sequence-to-sequence the model is initially pre-trained with generalized domain knowledge, and then retraining for it and find to using documents/document segments re-relevant to the given task.” The following references are cited to establish the fact that the AI models, specifically the LSTM (aka the “sequential neural network”) and the transformer-based sequence to sequence model, are indeed standard, off-the-shelf generic computer devices known to be used for text classification within the field of art: Makhija [2021/0201013], Keskar [2019/0130273], Lee [2019/0172466], Park [2014/0058539], Eldesouki [Arabic Multi-Dialect Segmentation: bi_LSTM-CRF vs. SVM], Jayawardhana [Sequence Models & Recurrent Neural Networks (RNNs)], Chandra [Transformer-based deep learning for predicting protein properties in the life sciences].
All of this evidence demonstrates that the machine learning model is a general-purpose machine within the field of art. When taken individually or viewed as an ordered combination the claims as a whole do not appear to amount to significantly more than the abstract idea. The claimed device appears to be the mere automation of a manual operation within a generic computing environment (e.g. the trained ML model), using generic computing devices (e.g. the known sequence to sequence ML algorithm).
The Claim(s) further recites:
1. … with a plurality of documents, the plurality of documents comprising a plurality of headnotes and metadata, wherein the plurality of headnotes each comprise a respective segment of text that summarizes at least a respective portion of the document; …

14 … with a plurality of documents comprising a plurality of headnotes and metadata, wherein the plurality of headnotes each comprise a respective segment of text that summarizes at least a respective portion of the document; …

This appears to be the recitation of the underlying documentation upon which the claimed operation is performed. As such, this claim limitation has been identified as an extra-solution activity (MPEP 2106.05(g)).
	MPEP 2106.05(d)(II) provides a listing of examples of activities that the courts have found to be insignificant extra-solution activity, including “v. Electronically scanning or extracting data from a physical document, Content Extraction and Transmission, LLC v. Wells Fargo Bank, 776 F.3d 1343, 1348, 113 USPQ2d 1354, 1358 (Fed. Cir. 2014) (optical character recognition)”. Within the instant device the specific data recited is merely the underlying physical document from which the system is extracting data. The particular data itself appears to be well understood, routine, and conventional within the field of art as evidenced by Paragraphs [0036], [0041], and [0051] of the specification and the prior art of record (Al-Kofahi911 [2010/0114911], Al-Kofahi125 [2012/0036125], Shastri [2013/0238316], and Stauffer [2020/0020058]), all of which disclose legal documents with the claimed structure being analyzed.
[Consideration as a Whole] When viewed within the claim as a whole, the document is merely retrieved and analyzed to formulate the predictions. The claim does not provide any functionality regarding how the analysis is performed, or any operations that appear dependent on the specific formulation of the document itself. A human being may perform this type of analysis and prediction regardless of the specific underlying document being analyzed. When taken individually or viewed as an ordered combination the claims as a whole do not appear to amount to significantly more than the abstract idea.

Claims 15-20 appear to recite substantially similar limitations as those discussed individually above, and are rejected based upon the same rationale. For the sake of brevity, the rejection has not been duplicated.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-5, 7-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

With regard to claims 1 and 14, claim 1 recites the limitation "utilizing a transformer-based sequence to sequence model with the pre-trained model and the generalized domain knowledge to predict". There is insufficient antecedent basis for this limitation in the claim.
It is unclear how many machine learning models are being claimed. When read in light of the instant specification, the ‘transformer-based sequence to sequence model’ is the ‘pre-trained model’ (see Paragraph [0040], which recites the sequence-to-sequence model being a transformer-based model, and being pretrained). Furthermore, one of ordinary skill in the art would recognize that all machine learning models are ‘pre-trained’. It is unclear how many models the claim requires, and what the distinction between them is. For examination purposes the transformer-based sequence to sequence model has been identified as being the pre-trained model which performs the classification and prediction.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mendelevitch [2003/0130993] in view of Makhija [2021/0201013] and Al-Kofahi [2010/0114911].

With regard to claim 1, Mendelevitch teaches A computer implemented method for categorizing documents comprising:
[a] pre-training (Mendelevitch, ¶12 “after being trained”; ¶6 “These algorithms are usually "trained" on a small document subset (the training set) used to represent typical documents in each topic. The trained algorithm is then applied to the unclassified documents.”; ¶60) a [[ as the machine learning algorithm (Mendelevitch, ¶52 “the classifier for one topic may use a Naive Bayes algorithm and the classifier for a second topic may use a Support Vector Machines algorithm.”; Please note that all references to the “machine learning algorithm” are intended to refer to the modified machine learning algorithm detailed in the 103 combination below.), by server computer as the server 60 (Mendelevitch, ¶40 “client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention … Client system 10 may operate as a stand-alone system or it may be connected to server 60 and/or other client systems 10 over a network 70”), with a plurality of documents as the documents being classified (Id) comprising a plurality of [[ as text objects from the document, such as the summary or description (Mendelevitch, ¶48 “each document is preferably converted into a raw text stream. For a given document, each text object (e.g., term or word) is placed in a data structure, e.g., simple table, with an indication of the number of occurrences of that term.”; ¶90 “summary or description”; Please note that all references to the “document” are intended to refer to the modified document detailed in the 103 combination below.) and metadata as accepted metadata (Mendelevitch, ¶48 “Because certain metadata may be highly pertinent to the classification process, the system advantageously allows the user to configure the system to process or reject certain metadata. 
For example, any tags, such as HTML tags, and other metadata may be stripped off during processing. Alternatively, a user may configure the system to process certain metadata”), wherein the plurality of headnotes each comprise a respective segment of text that summarizes (Mendelevitch, ¶90 “summary or description”) at least a respective portion of the document as the document being analyzed (Mendelevitch, ¶48, ¶90); 
[b] classifying as placing into ‘bins’ (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”), via the sequential neural network as the machine learning algorithm (Mendelevitch, ¶52 “the classifier for one topic may use a Naive Bayes algorithm and the classifier for a second topic may use a Support Vector Machines algorithm.”), each of the plurality of headnotes as the documents being processed (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”) as a note of decision as a first topic in which the document is predicted to be in, e.g. a first ‘bin’ the document is assigned to (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. 
That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”; Please note this claim limitation has been interpreted as detailed above) if the headnote as the document being classified (Mendelevitch, ¶52 “That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic.”) meets at least one jurisdictional criteria as satisfying the threshold criteria (Id), wherein pre-training (Mendelevitch, ¶12 “after being trained”) and classifying as placing into ‘bins’ (Mendelevitch ¶52) produces a pre-trained (Mendelevitch, ¶12 “after being trained”) model as the classification engine (Mendelevitch (¶10 “A topic's Training list contains examples of typical documents for that topic, used to train the automatic classification algorithms.”; ¶40 “FIG. 
1 illustrates a client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention.”) with a generalized domain knowledge as the contents of the document (Mendelevitch, ¶43; ¶84 “Advisor algorithm, which serves as a preliminary categorization tool, analyzes the content of the collection as a whole and/or individual documents, including metadata, and determines probable topics among all topics for placement of the documents.”) and enhances the accuracy (Mendelevitch, ¶8 “The present invention provides document categorization systems and methods that are both scalable and accurate”) of classifying as placing into ‘bins’ (Mendelevitch ¶52) the plurality of headnotes as the document being classified (Mendelevitch, ¶52) as the note of decision as a first topic in which the document is predicted to be in, e.g. a first ‘bin’ the document is assigned to (Mendelevitch ¶52; ¶60);
[c] utilizing a [[(Mendelevitch, ¶12) model as the classification engine (Mendelevitch, ¶10; ¶40) and the generalized domain knowledge as the contents of the document (Mendelevitch, ¶43; ¶84) to predict as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48 “The data structure is used by the machine-learning algorithm(s) to determine whether the document should be placed in a topic.”; ¶84 “A Topic Advisor algorithm, which serves as a preliminary categorization tool, analyzes the content of the collection as a whole and/or individual documents, including metadata, and determines probable topics among all topics for placement of the documents.”; Please note that all references to the “Topic Advisor” are intended to refer to the modified Topic Advisor detailed in the 103 combination below.; ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix.
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”), by the server computer as the server 60 (Mendelevitch, ¶40 “client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention … Client system 10 may operate as a stand-alone system or it may be connected to server 60 and/or other client systems 10 over a network 70”), from at least a first classified headnote as after binning, if the document is in the positive side of the topic (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”), a statute as the final classification (Id) pertaining to the first classified headnote as the document (Mendelevitch, ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”), wherein the predicted statute as the final classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. 
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) has associated therewith a first taxonomy as the published taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) of topics as the topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”) wherein producing the pre-trained (Mendelevitch, ¶12) model as the classification engine (Mendelevitch (¶10; ¶40) is outsourced (Mendelevitch, ¶46 “In the client-server arrangement of FIG. 
2, portions of module 40 may execute on client 10 while portions may execute on server 60 and/or on any other client 101 -10n.”) to the sequential neural network as the machine learning algorithm (Mendelevitch, ¶52 “the classifier for one topic may use a Naive Bayes algorithm and the classifier for a second topic may use a Support Vector Machines algorithm.”) allowing the transformer-based sequence to sequence model as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48, ¶84; ¶61) to be tailored (¶10 “A topic's Training list contains examples of typical documents for that topic, used to train the automatic classification algorithms”) to a predictive task as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48, ¶84; ¶61); 
[d] [[fine-tuning]] as training (Mendelevitch, ¶12 “In the first stage, a categorization engine ( e.g., algorithm) executes in the background (after being trained), classifying incoming documents to topics”) the transformer-based sequence to sequence model as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48; ¶84; ¶61) with the plurality of documents as the document being analyzed (Mendelevitch, ¶48, ¶90), plurality of [[headnotes]] as text objects from the document, such as the summary or description (Mendelevitch, ¶48 “each document is preferably converted into a raw text stream. For a given document, each text object (e.g., term or word) is placed in a data structure, e.g., simple table, with an indication of the number of occurrences of that term.”; ¶90 “summary or description”; Please note that all references to the “document” are intended to refer to the modified document detailed in the 103 combination below.) and metadata as accepted metadata (Mendelevitch, ¶48 “Because certain metadata may be highly pertinent to the classification process, the system advantageously allows the user to configure the system to process or reject certain metadata. For example, any tags, such as HTML tags, and other metadata may be stripped off during processing. Alternatively, a user may configure the system to process certain metadata”);
[e] generating, by the server computer as the server 60 (Mendelevitch, ¶40 “client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention … Client system 10 may operate as a stand-alone system or it may be connected to server 60 and/or other client systems 10 over a network 70”) utilizing the transformer-based sequence to sequence model as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48 “The data structure is used by the machine-learning algorithm(s) to determine whether the document should be placed in a topic.”; ¶84 “A Topic Advisor algorithm, which serves as a preliminary categorization tool, analyzes the content of the collection as a whole and/or individual documents, including metadata, and determines probable topics among all topics for placement of the documents.”; Please note that all references to the “Top Advisor” are intended to refer to the modified Top Advisor detailed in the 103 combination below.; (¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. 
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”), a new statute prediction as a second topic label (Mendelevitch, ¶12 “A document may be classified to a single topic or multiple topics or no topics.”) from the at least first headnote as after binning, if the document is in the positive side of the topic (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) when the classified note of decision as a first topic in which the document is predicted to be in, e.g. a first ‘bin’ the document is assigned to (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”; Please note this claim limitation has been interpreted as detailed above) [[ as the classification label being analyzed (Mendelevitch, ¶52; ¶86);
[f] utilizing the [[as the topic advisor algorithm which determines the probable topic for document placement (Mendelevitch, ¶48 “The data structure is used by the machine-learning algorithm(s) to determine whether the document should be placed in a topic.”; ¶84 “A Topic Advisor algorithm, which serves as a preliminary categorization tool, analyzes the content of the collection as a whole and/or individual documents, including metadata, and determines probable topics among all topics for placement of the documents.), by the server computer as the server 60 (Mendelevitch, ¶40 “client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention … Client system 10 may operate as a stand-alone system or it may be connected to server 60 and/or other client systems 10 over a network 70”), a topic as a third topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”) from the taxonomy of topics (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) associated with the predicted statute as the classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. 
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) for which the first headnote pertains as the document being classified (Mendelevitch, ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”); 
[g] generating, by the server computer as the server 60 (Mendelevitch, ¶40) utilizing the transformer-based sequence to sequence model as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48; ¶84; ¶61), a new topic as a third topic label (Mendelevitch, ¶12 “A document may be classified to a single topic or multiple topics or no topics.”) associated with the predicted statute as the final classification (Mendelevitch, ¶61) for with the first headnote pertains as after binning, if the document is in the positive side of the topic (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) when the topic as a third topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”) from the taxonomy of topics (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) associated with the predicted statue as the classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. 
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) is not available as the final score for the other label being on the negative side (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”);
[h] [[periodically retraining]] as training (Mendelevitch, ¶12 “In the first stage, a categorization engine ( e.g., algorithm) executes in the background (after being trained), classifying incoming documents to topics”) the sequential neural network as the machine learning algorithm (Mendelevitch, ¶52 “the classifier for one topic may use a Naive Bayes algorithm and the classifier for a second topic may use a Support Vector Machines algorithm.”) with changes to the taxonomy of topics as edits being made to the taxonomy structure (Mendelevitch, ¶101 “An Information Manager edits the taxonomy structure (i.e., adds topics, moves topics, deletes topics, modifies topics). The workflow memory system automatically requeues content in affected topics for re-categorization immediately.”), 
[i] associating (Mendelevitch, ¶87 “Topic and document information stored in the system”), by the server computer as the server 60 (Mendelevitch, ¶40 “client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention … Client system 10 may operate as a stand-alone system or it may be connected to server 60 and/or other client systems 10 over a network 70”), the first headnote as the first document being classified (Mendelevitch, ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”) with the predicted topic as the topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”). 
Mendelevitch does not explicitly teach a sequential neural network … …utilizing a transformer-based sequence to sequence model to predict … fine-tuning the transformer-based sequence to sequence model…  generating, by the server computer utilizing the transformer-based sequence to sequence model a new statute prediction …when the classified note of decision does not cite a given statute in the classified note of decision; utilizing the transformer-based sequence to sequence model to predict  … a topic… generating, by the server computer utilizing the transformer-based sequence to sequence model, a new topic …; periodically retraining the sequential neural network …
Makhija teaches [a] a sequential neural network (Makhija, ¶84 “The sequence of embedding vectors is then input to a bidirectional long-short-term memory (LSTM) layer 604 which encodes the surrounding context of each token as output. An LSTM layer is an RNN that encodes a hidden context by parsing through a sequence of vector inputs:…”)…
[b] classifying, via the sequential neural network, each of the plurality of headnotes as using the LSTM to produce y, aka the weight parameters for the inputs (Makhija, ¶84 “An LSTM layer is an RNN that encodes a hidden context by parsing through a sequence of vector inputs: … where y, is the output vector of the LSTM provided as input to the higher layers in the model and functions f, g, encode the hidden context of the RNN h, and the output of the RNN layer y,. The parameters w(hh), w (l,x), w(hy) are the corresponding weights for the inputs ht,_1, x,, h,.”) … wherein pre-training and classifying produces a pre-trained model as storing trained  classifiers (¶47 “The system 110 includes a memory data store 116 for accessing contract data from registered and unregistered entity and also storing plurality of training classification models created by support mechanism 115.”) with a generalized domain knowledge (¶89 “The application of meta learning to data element extraction results in performing in data-efficient learning where the model 900 is generalized to a new data element 904 using fewer samples.”) and enhances the accuracy of classifying (¶64 “For, this level of processing, the system requires training of the data models with higher accuracy and confidence score to make changes to the contract”; ¶92 “This level of accuracy is required for ensuring risk free contract creation. … This is achieved using a sequence-to-sequence model which includes an RNN based encoder and decoder.”)…;
[c] utilizing a transformer-based sequence to sequence model as passing through the self-attention layer (Makhija, ¶84 “The output of Bi-LSTM 604 is then passed through a self-attention layer 605 to a time distributed dense layer 606 for further encoding”) with the pretrained model (¶8 “sending the data object to at least one data recognition training model for identification of at least one data attribute wherein the data recognition training model processes the data object”; ¶47 “plurality of training classification models”) to predict… as the output y of the Bi-LSTM (Id) … 
[d] fine-tuning (Makhija, ¶89 “α, β are the learning rates to perform the fine-tuning task and meta training respectively.”) the transformer-based sequence to sequence model (Makhija, ¶84, ¶92 “This is achieved using a sequence-to-sequence model which includes an RNN based encoder and decoder”) with the plurality of documents (Makhija, ¶76 “The present invention deploys a data extraction and mapping module that retains structure of the text in the data object/document”), plurality of headnotes as section headings (Makhija, ¶76 “The data structure includes retaining the font size of the text to detect section headings and spacing between lines to detect paragraphs in the contract text.”) and metadata as font size and spacing (Id);
[e] generating, by the server computer utilizing the transformer-based sequence to sequence model as passing through the self-attention layer (Makhija, ¶84 “The output of Bi-LSTM 604 is then passed through a self-attention layer 605 to a time distributed dense layer 606 for further encoding”) a new statute prediction as the identified classification (Makhija, ¶79 “The data attributes are extracted by executing a sentence level segmentation of the data object and classification of each sentence into a data attribute category.”) from the at least first headnote when the classified note of decision does not cite as the absence or deviations of certain attributes (Makhija, ¶77n “In an embodiment, the extracted data attributes are compared with a contract data attribute library to detect presence or absence of certain attributes and deviations from a standard contract template in the library wherein the deviations are analyzed to generate a risk score for quantifying the risk involved for an entity on enforcing a contract.”) a given statute as the extracted data attributes in the attribute library (Id) in the classified note of decision as the specific classification that the attribute is associated with (¶79);
[f] utilizing the transformer-based sequence to sequence model to predict as passing through the self-attention layer (Makhija, ¶84 “The output of Bi-LSTM 604 is then passed through a self-attention layer 605 to a time distributed dense layer 606 for further encoding”) … a topic…
[g] generating, by the server computer utilizing the transformer-based sequence to sequence model as passing through the self-attention layer (Makhija, ¶84 “The output of Bi-LSTM 604 is then passed through a self-attention layer 605 to a time distributed dense layer 606 for further encoding”), a new topic associated with the predicted statute for which the first headnote pertains as determining the classification with the satisfactory classification probability (Makhija, ¶84) when the topic from the taxonomy of topics associated with the predicted statute is not available as the probability not being satisfactory for a given category (Makhija, ¶84 “The target probability distribution used as an input for cross entropy loss is a one-hot encoded vector t with a probability of 1 for the desired category and the probability distribution y is output by the softmax layer”);
[h] periodically retraining the sequential neural network with changes to the taxonomy of topics as alternating between the first and the second phase where the model is trained using the aggregation of data elements (Makhija, ¶88 “The system of the invention deploys model-agnostic meta-learning where the meta learning model 900 is trained by alternating between two training phases 901 as shown in FIG. 9. The first is an element specific training where the gradient direction to update the model parameters is obtained by using the data of a single element. The second phase is a meta-update 902 to the model where the gradient direction is obtained by an aggregation of gradient directions from all the data elements 903:”)
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the machine learning algorithm and the Topic Advisor algorithm taught by Mendelevitch using the AI model taught by Makhija, as doing so yields the predictable result of automating the process using a known, off-the-shelf AI modeling system to achieve the desired results.  Please note that both systems extract data from text and assign weights, which are used to determine a category tag for the data.  Both systems are related to the same technology (e.g. uses of machine learning to categorize text).  The automation of the extraction process is known to save a lot of time (Makhija, ¶83) when accuracy matters (Makhija, ¶92), which is a problem that Mendelevitch acknowledges is faced within the field of document classification (Mendelevitch, ¶3, ¶4, ¶6).  Furthermore, the proposed combination would enable the system to automate the detection and addition of new elements using the AI scripts (Makhija, ¶90, ¶51 “The support mechanism further includes a data attribute library creation script (DALCS) 121 that is updated each time a new data attribute for a contract is identified from a newly executed contract that is added to the data store 116 of the system 110.”), enabling the system to automate the manual process depicted by Mendelevitch.
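For illustration only, the softmax and cross-entropy relationship quoted from Makhija ¶84 in limitation [g] above (a one-hot target vector t with probability 1 for the desired category, compared against the distribution y output by the softmax layer) can be sketched as follows. This is a minimal sketch and not code from any cited reference: the function names, the three-category example, and the input scores are all hypothetical assumptions.

```python
import math

def softmax(scores):
    """Convert raw category scores into a probability distribution y."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(index, size):
    """Target distribution t: probability 1 for the desired category, 0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def cross_entropy(t, y, eps=1e-12):
    """Cross-entropy loss between target t and predicted distribution y."""
    return -sum(ti * math.log(yi + eps) for ti, yi in zip(t, y))

# Hypothetical raw scores for three categories (assumed values, illustration only).
scores = [2.0, 0.5, -1.0]
y = softmax(scores)

# The loss is small when the desired category is the one the model favors,
# and large when the desired category received a low probability.
loss_match = cross_entropy(one_hot(0, 3), y)
loss_mismatch = cross_entropy(one_hot(2, 3), y)
```

Under these assumed scores, loss_match is smaller than loss_mismatch, which corresponds to the mapping above in which a low probability for a given category is treated as that category not being satisfactory.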
Mendelevitch does not explicitly teach a plurality of documents comprising a plurality of headnotes… wherein the plurality of headnotes each comprise a respective segment of text that summarizes at least a respective portion of the document… note of decision does not cite a given statute in the classified note of decision.
Al-Kofahi teaches [a] a plurality of documents as the judicial opinion, aka an electronic document (Al-Kofahi, ¶32 “one judicial opinion (or generally an electronic document), such as electronic judicial opinion (or case) 115. Judicial opinion 115 includes and/or is associated with one or more headnotes in headnote database 120, such as headnotes 122 and 124.”) comprising a plurality of headnotes as the headnotes (Id)… headnotes (Al-Kofahi, ¶32)… wherein the plurality of headnotes each comprise a respective segment of text that summarizes at least a respective portion of the document (Al-Kofahi, ¶27 “The term "headnote" refers to an electronic textual summary or abstract concerning a point of law within a written judicial opinion. The number of headnotes associated with a judicial opinion ( or case) depends on the number of issues it addresses.”)… [e] note of decision as the classification associated with the case decision (Al-Kofahi, ¶44 “Headnotes database 120 receives a new set of headnotes (such as headnotes 126 and 128) for recently decided cases, and classification processor 130 determines whether one or more of the cases associated with the headnotes are sufficiently relevant to any of the annotations within ALR to justify recommending assignments of the headnotes ( or associated cases) to one or more of the annotations. (Some other embodiments directly assign the headnotes or associated cases to the annotations.) … However, both accepted and rejected recommendations are fed back to classification processor 130 for incremental training or tuning of its decision criteria”) does not cite a given statute in the classified note of decision (Al-Kofahi, ¶50 “parameters are based on proxy text, for example, the text of associated headnotes, as opposed to text of the annotation itself.”; ¶51; ¶57 “computing a set of similarity scores based on the similarity of text in each input headnote text to the text associated with each annotation”).
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the classification system taught by the proposed combination to facilitate the classification of headnotes, in order to address the problems articulated for the American legal system (Al-Kofahi, ¶4; ¶8, ¶9).  The proposed system enables the automatic identification of classes for specific headnotes using AI to facilitate search and retrieval.

With regard to claim 2 the proposed combination further teaches comprising generating an annotated statute as storing the document information and the topic together within the storage system (Mendelevitch, ¶87 “Topic and document information stored in the system”) by annotating the predicted statute as the classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) with the first headnote as the document being processed (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”).   

With regard to claim 3 the proposed combination further teaches wherein annotating as storing the document information and the topic together within the storage system (Mendelevitch, ¶87 “Topic and document information stored in the system”) the predicted statute as the classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) comprises adding the segment of text (Mendelevitch, ¶90 “summary or description”) from the first headnote as the document being analyzed (Mendelevitch, ¶48, ¶90) to the annotated statute as storing the document information and the topic together within the storage system (Mendelevitch, ¶87 “Topic and document information stored in the system”; ¶90 “Metadata available for a document: for example, title(s), summary or description, location (URL)…”).  

With regard to claim 4 the proposed combination further teaches wherein annotating as storing the document information and the topic together within the storage system (Mendelevitch, ¶87 “Topic and document information stored in the system”) the predicted statute as the classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) comprises adding, to the annotated statute, a link to the document as storing the document information (e.g. a URL) and the topic together within the storage system (Mendelevitch, ¶87 “Topic and document information stored in the system”; ¶90 “Metadata available for a document: for example, title(s), summary or description, location (URL)…”).  

With regard to claim 5 the proposed combination further teaches wherein the sequential neural network includes a bidirectional long-short term memory (LSTM) based classifier model (Makhija, ¶84 “The sequence of embedding vectors is then input to a bidirectional long-short-term memory (LSTM) layer 604 which encodes the surrounding context of each token as output. An LSTM layer is an RNN that encodes a hidden context by parsing through a sequence of vector inputs:…”).

With regard to claims 7 and 15 the proposed combination further teaches wherein the first of the plurality of headnotes does not contain an explicit citation to the predicted statute as the text of associated headnotes, as opposed to text of the annotation itself, aka proxy text or other related documents (Al-Kofahi, ¶50 “parameters are based on proxy text, for example, the text of associated headnotes, as opposed to text of the annotation itself.”; ¶51) which may identify a discernable match (Mendelevitch, ¶12 “the raw score generally does not indicate how well a document matches a topic, only that there is some discernable match.”), and wherein the transformer-based sequence to sequence model (Makhija, ¶84, ¶87) is trained to suggest as using the ML model depicted by Makhija to automate the manual process of adding new topics (Mendelevitch, ¶77 “a user is advantageously able to perform topic management functions. … add, move, and delete new topics”; Makhija, ¶51, ¶84 “the data model enables extraction of contract data elements using machine learning. … A recurrent neural network (RNN) is implemented to perform the BIO tagging. The input to the network is a sequence of tokens 601 and the output is a sequence of BIO tags with one tag corresponding to one token. The RNN model contains an embedding layer 602 that maps the vocabulary index of a token to an embedding vector. The sequence of embedding vectors is then input to a bidirectional long-short-term memory (LSTM) layer 604 which encodes the surrounding context of each token as output.”; ¶87 “The tokens 701 are passed through an embedding layer 702 to obtain word embedding vectors per token. The sequence of word embeddings is passed through multiple CNN layers 704 where trigram, five-grams and seven-grams within the sequence are passed through filters to encode characteristic utterances that are specific to each data attribute/clause category.  
The output of the plurality of CNN layers 704 are passed through max pooling layers 705 to obtain tokens with importance and the final output is through a sigmoid layer 708 where the output is collection of multiple independent probabilities 709 corresponding to each data attribute/clause category.”) statutes (Al-Kofahi, ¶44 “classification processor 130 determines whether one or more of the cases associated with the headnotes are sufficiently relevant to any of the annotations within ALR to justify recommending assignments of the headnotes (or associated cases) to one or more of the annotations. (Some other embodiments directly assign the headnotes or associated cases to the annotations.)… Accepted recommendations are added as citations to the respective annotations in ALR annotation database 110 and rejected recommendations are not”) based on headnote text without citations to any statute as the absence or deviations of certain attributes (Makhija, ¶77 “In an embodiment, the extracted data attributes are compared with a contract data attribute library to detect presence or absence of certain attributes and deviations from a standard contract template in the library wherein the deviations are analyzed to generate a risk score for quantifying the risk involved for an entity on enforcing a contract.”) such as when the text is associated with the headnote but not part of the annotation itself, such as when proxy text is used (Al-Kofahi, ¶50 “parameters are based on proxy text, for example, the text of associated headnotes, as opposed to text of the annotation itself.”; ¶51; ¶57 “computing a set of similarity scores based on the similarity of text in each input headnote text to the text associated with each annotation”).
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains, at the time the invention was filed, to have implemented the classification system of the proposed combination to incorporate similar values in addition to exact values, as this enables the system to classify related concepts.
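For illustration only, the similarity scoring quoted from Al-Kofahi ¶57 (scoring each input headnote text against the text associated with each annotation) can be sketched as follows. The quoted passage does not name a similarity measure; cosine similarity over term counts is an assumption made here, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_scores(headnote: str, annotation_texts: dict) -> dict:
    """Score one headnote against the text associated with each annotation.

    annotation_texts maps a (hypothetical) annotation id to its associated text.
    """
    head = term_counts(headnote)
    return {
        annotation_id: cosine_similarity(head, term_counts(text))
        for annotation_id, text in annotation_texts.items()
    }


if __name__ == "__main__":
    scores = similarity_scores(
        "breach of contract damages",
        {"a1": "damages for breach of contract", "a2": "search and seizure"},
    )
    print(scores)
```

Under this sketch a headnote can score highly against an annotation without either text containing an explicit citation, which is the sense in which similar values are considered in addition to exact values.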

	With regard to claims 8 and 16, the proposed combination further teaches wherein the first headnote comprises a citation to a second statute as the text of associated headnotes, as opposed to text of the annotation itself, aka proxy text or other related documents (Al-Kofahi, ¶50 “parameters are based on proxy text, for example, the text of associated headnotes, as opposed to text of the annotation itself.”; ¶51) different than the predicted statute as the text of the annotation itself (Id), and wherein the transformer-based sequence to sequence model (Makhija, ¶84, ¶87) is trained to suggest  as using the ML model depicted by Makhija to automate the manual process of adding new topics  (Mendelevitch, ¶77 “a user is advantageously able to perform topic management functions. … add, move, and delete new topics”; Makhija, ¶51, ¶84 “the data model enables extraction of contract data elements using machine learning. … A recurrent neural network (RNN) is implemented to perform the BIO tagging. The input to the network is a sequence of tokens 601 and the output is a sequence of BIO tags with one tag corresponding to one token. The RNN model contains an embedding layer 602 that maps the vocabulary index of a token to an embedding vector. The sequence of embedding vectors is then input to a bidirectional long-short-term memory (LSTM) layer 604 which encodes the surrounding context of each token as output.”; ¶87 “The tokens 701 are passed through an embedding layer 702 to obtain word embedding vectors per token. The sequence of word embeddings is passed through multiple CNN layers 704 where trigram, five-grams and seven-grams within the sequence are passed through filters to encode characteristic utterances that are specific to each data attribute/clause category.  
The output of the plurality of CNN layers 704 are passed through max pooling layers 705 to obtain tokens with importance and the final output is through a sigmoid layer 708 where the output is collection of multiple independent probabilities 709 corresponding to each data attribute/clause category.”) statutes based on the segment of text (Al-Kofahi, ¶44 “classification processor 130 determines whether one or more of the cases associated with the headnotes are sufficiently relevant to any of the annotations within ALR to justify recommending assignments of the headnotes ( or associated cases) to one or more of the annotations. (Some other embodiments directly assign the headnotes or associated cases to the annotations.)… Accepted recommendations are added as citations to the respective annotations in ALR annotation database 110 and rejected recommendations are not”), wherein the segment of text does not include an explicit citation to the predicted statute as the text of associated headnotes, as opposed to text of the annotation itself, aka proxy text or other related documents (Al-Kofahi, ¶50 “parameters are based on proxy text, for example, the text of associated headnotes, as opposed to text of the annotation itself.”; ¶51; ¶57 “computing a set of similarity scores based on the similarity of text in each input headnote text to the text associated with each annotation”).  
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains, at the time the invention was filed, to have implemented the classification system of the proposed combination to incorporate similar values in addition to exact values, as this enables the system to classify related concepts.

With regard to claim 9 the proposed combination further teaches predicting as the algorithm within the Topic Advisor which analyzes the content of the collection (Mendelevitch, ¶48 “The data structure is used by the machine-learning algorithm(s) to determine whether the document should be placed in a topic.”; ¶84 “A Topic Advisor algorithm, which serves as a preliminary categorization tool, analyzes the content of the collection as a whole and/or individual documents, including metadata, and determines probable topics among all topics for placement of the documents.”), by the server computer as the server 60 (Mendelevitch, ¶40 “client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention … Client system 10 may operate as a stand-alone system or it may be connected to server 60 and/or other client systems 10 over a network 70”), using the transformer-based sequence to sequence model as the algorithm within the Topic Advisor which analyzes the content of the collection (Mendelevitch, ¶48 “The data structure is used by the machine-learning algorithm(s) to determine whether the document should be placed in a topic.”; ¶84 “A Topic Advisor algorithm, which serves as a preliminary categorization tool, analyzes the content of the collection as a whole and/or individual documents, including metadata, and determines probable topics among all topics for placement of the documents.”), for at least a second of the plurality of headnotes as a second set of documents being processed, such as newly added or modified documents (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. 
That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”; ¶74, ¶99), a second statute as the final classifications for the second set of documents (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) pertaining to the second headnote as the document being classified (Mendelevitch, ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”), wherein the second predicted statute as the final classifications for the second set of documents (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. 
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) has associated therewith a second taxonomy of topics as the draft taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) of topics as the topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”); 
utilizing self-attention within the transformer-based sequence to sequence model as the algorithm within the Topic Advisor which analyzes the content of the collection implemented as the self-attention layer (Mendelevitch, ¶48 “The data structure is used by the machine-learning algorithm(s) to determine whether the document should be placed in a topic.”; ¶84 “A Topic Advisor algorithm, which serves as a preliminary categorization tool, analyzes the content of the collection as a whole and/or individual documents, including metadata, and determines probable topics among all topics for placement of the documents.”; Makhija, ¶84 “The output of Bi-LSTM 604 is then passed through a self-attention layer 605 to a time distributed dense layer 606 for further encoding”) to model a relationship between a given word in the first headnote and a plurality of other words in the first headnote (Mendelevitch, ¶12 “a match for one or several features or set(s) of keywords will indicate that the document should be classified to a certain topic”; ¶48 “For a given document, each text object (e.g., term or word) is placed in a data structure, e.g., simple table, with an indication of the number of occurrences of that term.”; ¶79 “topic keywords”).
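As a point of reference for the self-attention limitation, the following is a minimal sketch of how a self-attention layer relates a given word to the other words in the same text. Makhija ¶84 states only that the Bi-LSTM output is passed through a self-attention layer; the scaled dot-product formulation, the omission of learned projections, and the toy word vectors below are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention over a sequence of word vectors.

    Simplification (assumption): queries, keys and values are the input
    vectors themselves, with no learned projection matrices.
    Returns (outputs, weights), where weights[i][j] is how strongly
    word i attends to word j.
    """
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in vectors
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[i] for wj, v in zip(w, vectors)) for i in range(d)]
        )
    return outputs, weights


if __name__ == "__main__":
    # Hypothetical 2-dimensional vectors for a three-word headnote.
    words = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    _, attn = self_attention(words)
    for row in attn:
        print([round(x, 3) for x in row])
```

Each row of the weight matrix sums to one and expresses the relationship between one word and every word in the sequence, including itself.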

With regard to claims 10 and 17 the proposed combination further teaches training the transformer-based sequence to sequence model as enabling the system to automate the detection and addition of new elements using the AI scripts (Makhija, ¶90, ¶51 “The support mechanism further includes a data attribute library creation script (DALCS) 121 that is updated each time a new data attribute for a contract is identified from a newly executed contract that is added to the data store 116 of the system 110.”) to predict as the topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”) the new topic (Mendelevitch, ¶77 “a user is advantageously able to perform topic management functions. … add, move, and delete new topics”; Makhija, ¶51 “The support mechanism further includes a data attribute library creation script (DALCS) 121 that is updated each time a new data attribute for a contract is identified from a newly executed contract that is added to the data store 116 of the system 110.”), wherein the new topic includes terms not recited in the second headnote (Mendelevitch, ¶12 “a match for one or several features or set(s) of keywords will indicate that the document should be classified to a certain topic”; ¶48 “For a given document, each text object (e.g., term or word) is placed in a data structure, e.g., simple table, with an indication of the number of occurrences of that term.”; ¶79 “topic keywords”).  
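The term table described in Mendelevitch ¶48 (each term placed in a simple table with its number of occurrences) can be sketched as below, together with a check for topic terms that the headnote does not recite. The tokenization rule, the function names and the sample headnote are hypothetical and are not taken from any cited reference.

```python
import re
from collections import Counter


def term_table(text):
    """Build a simple table mapping each term to its number of occurrences."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def terms_not_recited(topic_label, headnote_text):
    """Return the topic terms that never occur in the headnote text."""
    table = term_table(headnote_text)
    # Counter returns 0 for a missing term without adding it to the table.
    return [term for term in term_table(topic_label) if table[term] == 0]


if __name__ == "__main__":
    headnote = "Employer dismissed the employee without notice. The employee sued."
    print(term_table(headnote).most_common(3))
    print(terms_not_recited("Wrongful termination", headnote))
```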

With regard to claims 11 and 18 the proposed combination further teaches wherein the new topic (Mendelevitch, ¶77 “a user is advantageously able to perform topic management functions. … add, move, and delete new topics”; Makhija, ¶90, ¶51 “The support mechanism further includes a data attribute library creation script (DALCS) 121 that is updated each time a new data attribute for a contract is identified from a newly executed contract that is added to the data store 116 of the system 110.”) is unique (Al-Kofahi, ¶39 “About 89% of the headnotes are associated with a single class identifier”; ¶62 “The exemplary determination of similarity scores S3 relies on assumptions that class identifiers are assigned to a headnote independently of each other, and that only one class identifier in {k his actually relevant to annotation a. Although the one-class assumption does not hold for many annotations, it improves the overall performance of the system.”) to the second taxonomy as the draft taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) of topics as the topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”) associated with the second statute as the final classifications for the second set of documents in the set of topics within the draft taxonomy (Mendelevitch, ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. 
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) pertaining to the second headnote as the document being classified (Mendelevitch, ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”).
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains, at the time the invention was filed, to have implemented the proposed device to associate a specific headnote with a single class identifier, as this is commonly expected within the field of art as evidenced by Al-Kofahi.

With regard to claim 12 the proposed combination further teaches comprising retrieving the taxonomy of topics as the published taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) associated with the statute pertaining to the first headnote  as the document being classified (Mendelevitch, ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”) and using the retrieved taxonomy of topics as the published taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) as input  (Mendelevitch, ¶60 “For binning in a given binary classifier, all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins. 
Given a new document, the "raw score" is examined and placed in the appropriate bin;”) for predicting the topic as the topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”) from the taxonomy as the published taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) associated with the statute pertaining to the first headnote as the document being classified (Mendelevitch, ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”).  

With regard to claim 13 the proposed combination further teaches wherein the predicted statute as the classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) and first headnote as the documents being processed (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”) are further used as input  (Mendelevitch, ¶60 “For binning in a given binary classifier, all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins. 
Given a new document, the "raw score" is examined and placed in the appropriate bin;”) for predicting the topic as the topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”) from the taxonomy as the published taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”)  associated with the statute as the final classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) pertaining to the first headnote as the documents being processed (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”).  
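The binning step quoted from Mendelevitch ¶60 (training raw scores placed into bins of equal size, and a new document's raw score placed in the appropriate bin) can be sketched as follows. Reading "equal size" as an equal number of training scores per bin is an assumption; the quoted passage could also mean equal-width bins. The function names and sample scores are hypothetical.

```python
import bisect


def build_bin_edges(raw_scores, n_bins):
    """Return n_bins - 1 cut points so that each bin holds roughly the
    same number of training raw scores (equal-frequency binning)."""
    if n_bins < 1 or not raw_scores:
        raise ValueError("need at least one bin and one training score")
    ordered = sorted(raw_scores)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def assign_bin(raw_score, edges):
    """Place a raw score in its bin; bins are numbered from 0."""
    return bisect.bisect_right(edges, raw_score)


if __name__ == "__main__":
    training_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    edges = build_bin_edges(training_scores, 4)
    print(edges)
    print(assign_bin(0.95, edges))
```

A new document's raw score is then mapped to a bin index, which a downstream step could convert into the per-classifier confidence score that ¶61 describes combining into a final score.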

With regard to claim 14 Mendelevitch teaches A computer implemented method for categorizing documents comprising:  
[a] pre-training (Mendelevitch, ¶12 “after being trained”; ¶6 “These algorithms are usually "trained" on a small document subset (the training set) used to represent typical documents in each topic. The trained algorithm is then applied to the unclassified documents.”; ¶60) a [[ as the machine learning algorithm (Mendelevitch, ¶52 “the classifier for one topic may use a Naive Bayes algorithm and the classifier for a second topic may use a Support Vector Machines algorithm.”; Please note that all references to the “machine learning algorithm” are intended to refer to the modified machine learning algorithm detailed in the 103 combination below.), by server computer as the server 60 (Mendelevitch, ¶40 “client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention … Client system 10 may operate as a stand-alone system or it may be connected to server 60 and/or other client systems 10 over a network 70”), with a plurality of documents as the documents being classified (Id) comprising a plurality of [[ as text objects from the document, such as the summary or description (Mendelevitch, ¶48 “each document is preferably converted into a raw text stream. For a given document, each text object (e.g., term or word) is placed in a data structure, e.g., simple table, with an indication of the number of occurrences of that term.”; ¶90 “summary or description”; Please note that all references to the “document” are intended to refer to the modified document detailed in the 103 combination below.) and metadata as accepted metadata (Mendelevitch, ¶48 “Because certain metadata may be highly pertinent to the classification process, the system advantageously allows the user to configure the system to process or reject certain metadata. 
For example, any tags, such as HTML tags, and other metadata may be stripped off during processing. Alternatively, a user may configure the system to process certain metadata”), wherein the plurality of headnotes each comprise a respective segment of text that summarizes (Mendelevitch, ¶90 “summary or description”) at least a respective portion of the document as the document being analyzed (Mendelevitch, ¶48, ¶90); 
[b] classifying as placing into ‘bins’ (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”), via the sequential neural network as the machine learning algorithm (Mendelevitch, ¶52 “the classifier for one topic may use a Naive Bayes algorithm and the classifier for a second topic may use a Support Vector Machines algorithm.”), each of the plurality of headnotes as the documents being processed (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”) as a note of decision as a first topic in which the document is predicted to be in, e.g. a first ‘bin’ the document is assigned to (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. 
That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”; Please note this claim limitation has been interpreted as detailed above) if the headnote as the document being classified (Mendelevitch, ¶52 “That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic.”) meets at least one jurisdictional criteria as satisfying the threshold criteria (Id), wherein pre-training (Mendelevitch, ¶12 “after being trained”) and classifying as placing into ‘bins’ (Mendelevitch, ¶52) produces a pre-trained (Mendelevitch, ¶12 “after being trained”) model as the classification engine (Mendelevitch, ¶10 “A topic's Training list contains examples of typical documents for that topic, used to train the automatic classification algorithms.”; ¶40 “FIG. 
1 illustrates a client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention.”) with a generalized domain knowledge as the contents of the document (Mendelevitch, ¶43; ¶84 “Advisor algorithm, which serves as a preliminary categorization tool, analyzes the content of the collection as a whole and/or individual documents, including metadata, and determines probable topics among all topics for placement of the documents.”) and enhances the accuracy (Mendelevitch, ¶8 “The present invention provides document categorization systems and methods that are both scalable and accurate”) of classifying as placing into ‘bins’ (Mendelevitch ¶52) the plurality of headnotes as the document being classified (Mendelevitch, ¶52) as the note of decision as a first topic in which the document is predicted to be in, e.g. a first ‘bin’ the document is assigned to (Mendelevitch ¶52; ¶60);
[c] utilizing a [[(Mendelevitch, ¶12) model as the classification engine (Mendelevitch, ¶10; ¶40) and the generalized domain knowledge as the contents of the document (Mendelevitch, ¶43; ¶84) to predict as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48 “The data structure is used by the machine-learning algorithm(s) to determine whether the document should be placed in a topic.”; ¶84 “A Topic Advisor algorithm, which serves as a preliminary categorization tool, analyzes the content of the collection as a whole and/or individual documents, including metadata, and determines probable topics among all topics for placement of the documents.”; Please note that all references to the “Topic Advisor” are intended to refer to the modified Topic Advisor detailed in the 103 combination below; ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. 
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”), by the server computer as the server 60 (Mendelevitch, ¶40 “client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention … Client system 10 may operate as a stand-alone system or it may be connected to server 60 and/or other client systems 10 over a network 70”), from at least a first classified headnote as after binning, if the document is in the positive side of the topic (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”), a statute as the final classification (Id) pertaining to the first classified headnote as the document (Mendelevitch, ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”), wherein the predicted statute as the final classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. 
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) has associated therewith a first taxonomy as the published taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) of topics as the topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers (each representing a single topic)”), wherein producing the pre-trained (Mendelevitch, ¶12) model as the classification engine (Mendelevitch, ¶10; ¶40) is outsourced (Mendelevitch, ¶46 “In the client-server arrangement of FIG. 
2, portions of module 40 may execute on client 10 while portions may execute on server 60 and/or on any other client 101 -10n.”) to the sequential neural network as the machine learning algorithm (Mendelevitch, ¶52 “the classifier for one topic may use a Naive Bayes algorithm and the classifier for a second topic may use a Support Vector Machines algorithm.”) allowing the transformer-based sequence to sequence model as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48, ¶84; ¶61) to be tailored (¶10 “A topic's Training list contains examples of typical documents for that topic, used to train the automatic classification algorithms”) to a predictive task as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48, ¶84; ¶61); 
[d] [[fine-tuning]] as training (Mendelevitch, ¶12 “In the first stage, a categorization engine (e.g., algorithm) executes in the background (after being trained), classifying incoming documents to topics”) the transformer-based sequence to sequence model as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48; ¶84; ¶61) with the plurality of documents as the document being analyzed (Mendelevitch, ¶48, ¶90), plurality of [[headnotes]] as text objects from the document, such as the summary or description (Mendelevitch, ¶48 “each document is preferably converted into a raw text stream. For a given document, each text object (e.g., term or word) is placed in a data structure, e.g., simple table, with an indication of the number of occurrences of that term.”; ¶90 “summary or description”; Please note that all references to the “document” are intended to refer to the modified document detailed in the 103 combination below.) and metadata as accepted metadata (Mendelevitch, ¶48 “Because certain metadata may be highly pertinent to the classification process, the system advantageously allows the user to configure the system to process or reject certain metadata. For example, any tags, such as HTML tags, and other metadata may be stripped off during processing. Alternatively, a user may configure the system to process certain metadata”);
[e] generating, by the server computer as the server 60 (Mendelevitch, ¶40 “client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention … Client system 10 may operate as a stand-alone system or it may be connected to server 60 and/or other client systems 10 over a network 70”) utilizing the transformer-based sequence to sequence model as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48 “The data structure is used by the machine-learning algorithm(s) to determine whether the document should be placed in a topic.”; ¶84 “A Topic Advisor algorithm, which serves as a preliminary categorization tool, analyzes the content of the collection as a whole and/or individual documents, including metadata, and determines probable topics among all topics for placement of the documents.”; Please note that all references to the “Topic Advisor” are intended to refer to the modified Topic Advisor detailed in the 103 combination below; ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. 
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”), a new statute prediction as a second topic label (Mendelevitch, ¶12 “A document may be classified to a single topic or multiple topics or no topics.”) from the at least first headnote as after binning, if the document is in the positive side of the topic (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) when the classified note of decision as a first topic in which the document is predicted to be in, e.g. a first ‘bin’ the document is assigned to (Mendelevitch, ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”; Please note this claim limitation has been interpreted as detailed above) [[does not cite a given statute in the classified note of decision]] as the classification label being analyzed (Mendelevitch, ¶52; ¶86);
[f] utilizing the [[transformer-based sequence to sequence model to predict]] as the Topic Advisor algorithm which determines the probable topic for document placement (Mendelevitch, ¶48 “The data structure is used by the machine-learning algorithm(s) to determine whether the document should be placed in a topic.”; ¶84 “A Topic Advisor algorithm, which serves as a preliminary categorization tool, analyzes the content of the collection as a whole and/or individual documents, including metadata, and determines probable topics among all topics for placement of the documents.”), by the server computer as the server 60 (Mendelevitch, ¶40 “client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention … Client system 10 may operate as a stand-alone system or it may be connected to server 60 and/or other client systems 10 over a network 70”), a topic as a third topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers (each representing a single topic)”) from the taxonomy of topics (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) associated with the predicted statute as the classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. 
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) for which the first headnote pertains as the document being classified (Mendelevitch, ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”); 
[g] generating, by the server computer as the server 60 (Mendelevitch, ¶40) utilizing the transformer-based sequence to sequence model as the algorithm within the Topic Advisor which analyzes the content of the collection to predict a “final” score for a document being related to a specific topic (Mendelevitch, ¶48; ¶84; ¶61), a new topic as a third topic label (Mendelevitch, ¶12 “A document may be classified to a single topic or multiple topics or no topics.”) associated with the predicted statute as the final classification (Mendelevitch, ¶61) for which the first headnote pertains as after binning, if the document is in the positive side of the topic (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) when the topic as a third topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers (each representing a single topic)”) from the taxonomy of topics (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) associated with the predicted statute as the classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. 
According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) is not available as the final score for the other label being on the negative side (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”);
[h] [[periodically retraining]] as training (Mendelevitch, ¶12 “In the first stage, a categorization engine ( e.g., algorithm) executes in the background (after being trained), classifying incoming documents to topics”) the sequential neural network as the machine learning algorithm (Mendelevitch, ¶52 “the classifier for one topic may use a Naive Bayes algorithm and the classifier for a second topic may use a Support Vector Machines algorithm.”) with changes to the taxonomy of topics as edits being made to the taxonomy structure (Mendelevitch, ¶101 “An Information Manager edits the taxonomy structure (i.e., adds topics, moves topics, deletes topics, modifies topics). The workflow memory system automatically requeues content in affected topics for re-categorization immediately.”), 
[i] associating (Mendelevitch, ¶87 “Topic and document information stored in the system”), by the server computer as the server 60 (Mendelevitch, ¶40 “client computer system 10 configured with a document classification and categorization application module 40 (also referred to herein as "classification engine" or "categorization engine") according to the present invention … Client system 10 may operate as a stand-alone system or it may be connected to server 60 and/or other client systems 10 over a network 70”), the first headnote as the first document being classified (Mendelevitch, ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”) with the predicted topic as the topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”). 
Mendelevitch does not explicitly teach a sequential neural network … …utilizing a transformer-based sequence to sequence model to predict … fine-tuning the transformer-based sequence to sequence model…  generating, by the server computer utilizing the transformer-based sequence to sequence model a new statute prediction …when the classified note of decision does not cite a given statute in the classified note of decision; utilizing the transformer-based sequence to sequence model to predict  … a topic… generating, by the server computer utilizing the transformer-based sequence to sequence model, a new topic …; periodically retraining the sequential neural network …
Makhija teaches [a] a sequential neural network (Makhija, ¶84 “The sequence of embedding vectors is then input to a bidirectional long-short-term memory (LSTM) layer 604 which encodes the surrounding context of each token as output. An LSTM layer is an RNN that encodes a hidden context by parsing through a sequence of vector inputs:…”)…
[b] classifying, via the sequential neural network, each of the plurality of headnotes as using the LSTM to produce y_t, aka the weight parameters for the inputs (Makhija, ¶84 “An LSTM layer is an RNN that encodes a hidden context by parsing through a sequence of vector inputs: … where y_t is the output vector of the LSTM provided as input to the higher layers in the model and functions f, g encode the hidden context of the RNN h_t and the output of the RNN layer y_t. The parameters w(hh), w(hx), w(hy) are the corresponding weights for the inputs h_{t-1}, x_t, h_t.”) … wherein pre-training and classifying produces a pre-trained model as storing trained classifiers (Makhija, ¶47 “The system 110 includes a memory data store 116 for accessing contract data from registered and unregistered entity and also storing plurality of training classification models created by support mechanism 115.”) with a generalized domain knowledge (Makhija, ¶89 “The application of meta learning to data element extraction results in performing in data-efficient learning where the model 900 is generalized to a new data element 904 using fewer samples.”) and enhances the accuracy of classifying (Makhija, ¶64 “For, this level of processing, the system requires training of the data models with higher accuracy and confidence score to make changes to the contract”; ¶92 “This level of accuracy is required for ensuring risk free contract creation. … This is achieved using a sequence-to-sequence model which includes an RNN based encoder and decoder.”)…;
[c] utilizing a transformer-based sequence to sequence model as passing through the self-attention layer (Makhija, ¶84 “The output of Bi-LSTM 604 is then passed through a self-attention layer 605 to a time distributed dense layer 606 for further encoding”) with the pretrained model (¶8 “sending the data object to at least one data recognition training model for identification of at least one data attribute wherein the data recognition training model processes the data object”; ¶47 “plurality of training classification models”) to predict… as the output y of the Bi-LSTM (Id) … 
[d] fine-tuning (Makhija, ¶89 “α, β are the learning rates to perform the fine-tuning task and meta training respectively.”) the transformer-based sequence to sequence model (Makhija, ¶84, ¶92 “This is achieved using a sequence-to-sequence model which includes an RNN based encoder and decoder”) with the plurality of documents (Makhija, ¶76 “The present invention deploys a data extraction and mapping module that retains structure of the text in the data object/document”), plurality of headnotes as section headings (Makhija, ¶76 “The data structure includes retaining the font size of the text to detect section headings and spacing between lines to detect paragraphs in the contract text.”) and metadata as font size and spacing (Id);
[e] generating, by the server computer utilizing the transformer-based sequence to sequence model as passing through the self-attention layer (Makhija, ¶84 “The output of Bi-LSTM 604 is then passed through a self-attention layer 605 to a time distributed dense layer 606 for further encoding”) a new statute prediction as the identified classification (Makhija, ¶79 “The data attributes are extracted by executing a sentence level segmentation of the data object and classification of each sentence into a data attribute category.”) from the at least first headnote when the classified note of decision does not cite as the absence or deviations of certain attributes (Makhija, ¶77 “In an embodiment, the extracted data attributes are compared with a contract data attribute library to detect presence or absence of certain attributes and deviations from a standard contract template in the library wherein the deviations are analyzed to generate a risk score for quantifying the risk involved for an entity on enforcing a contract.”) a given statute as the extracted data attributes in the attribute library (Id) in the classified note of decision as the specific classification that the attribute is associated with (Makhija, ¶79);
[f] utilizing the transformer-based sequence to sequence model to predict as passing through the self-attention layer (Makhija, ¶84 “The output of Bi-LSTM 604 is then passed through a self-attention layer 605 to a time distributed dense layer 606 for further encoding”) … a topic…
[g] generating, by the server computer utilizing the transformer-based sequence to sequence model as passing through the self-attention layer (Makhija, ¶84 “The output of Bi-LSTM 604 is then passed through a self-attention layer 605 to a time distributed dense layer 606 for further encoding”), a new topic associated with the predicted statute for which the first headnote pertains as determining the classification with the satisfactory classification probability (Makhija, ¶84) when the topic from the taxonomy of topics associated with the predicted statute is not available as the probability not being satisfactory for a given category (Makhija, ¶84 “The target probability distribution used as an input for cross entropy loss is a one-hot encoded vector t with a probability of 1 for the desired category and the probability distribution y is output by the softmax layer”);
[h] periodically retraining the sequential neural network with changes to the taxonomy of topics as alternating between the first and the second phase where the model is trained using the aggregation of data elements (Makhija, ¶88 “The system of the invention deploys model-agnostic meta-learning where the meta learning model 900 is trained by alternating between two training phases 901 as shown in FIG. 9. The first is an element specific training where the gradient direction to update the model parameters is obtained by using the data of a single element. The second phase is a meta-update 902 to the model where the gradient direction is obtained by an aggregation of gradient directions from all the data elements 903:”)
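For clarity of the record, the cross entropy loss quoted from Makhija ¶84 in item [g] above (a one-hot target vector t compared against the softmax output y) can be illustrated with a minimal sketch. The function names and the example scores below are hypothetical and are not taken from Makhija; the sketch only shows the standard softmax and cross entropy computation under that assumption.

```python
import math

def softmax(scores):
    """Turn raw category scores into a probability distribution y."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """Cross entropy between a one-hot target t and a predicted distribution y."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# Hypothetical raw scores for three categories; the desired category is index 0.
y = softmax([2.0, 1.0, 0.1])
loss_when_category_0_desired = cross_entropy([1, 0, 0], y)
loss_when_category_2_desired = cross_entropy([0, 0, 1], y)
```

With a one-hot target the loss reduces to the negative log of the probability assigned to the desired category, so a low probability for that category produces a high loss.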
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the machine learning algorithm and the Topic Advisor algorithm taught by Mendelevitch using the AI model taught by Makhija as it yields the predictable results of automating the process using a known, off-the-shelf, AI modeling system to achieve the desired results.  Please note that both systems extract data from text and assign weights, which are used to determine a category tag for the data.  Both systems are related to the same technology (e.g. uses of machine learning to categorize text).  The automation of the extraction process is known to save a lot of time (Makhija, ¶83) when accuracy matters (Makhija, ¶92), which is a problem that Mendelevitch acknowledges is faced within the field of document classification (Mendelevitch, ¶3, ¶4, ¶6).  Furthermore, the proposed combination would enable the system to automate the detection and addition of new elements using the AI scripts (Makhija, ¶90, ¶51 “The support mechanism further includes a data attribute library creation script (DALCS) 121 that is updated each time a new data attribute for a contract is identified from a newly executed contract that is added to the data store 116 of the system 110.”), enabling the system to automate the manual process depicted by Mendelevitch.
Mendelevitch does not explicitly teach a plurality of documents comprising a plurality of headnotes… wherein the plurality of headnotes each comprise a respective segment of text that summarizes at least a respective portion of the document… note of decision does not cite a given statute in the classified note of decision.
Al-Kofahi teaches [a] a plurality of documents as the judicial opinion, aka an electronic document (Al-Kofahi, ¶32 “one judicial opinion (or generally an electronic document), such as electronic judicial opinion (or case) 115. Judicial opinion 115 includes and/or is associated with one or more headnotes in headnote database 120, such as headnotes 122 and 124.”) comprising a plurality of headnotes as the headnotes (Id)… headnotes (Al-Kofahi, ¶32)… wherein the plurality of headnotes each comprise a respective segment of text that summarizes at least a respective portion of the document (Al-Kofahi, ¶27 “The term "headnote" refers to an electronic textual summary or abstract concerning a point of law within a written judicial opinion. The number of headnotes associated with a judicial opinion ( or case) depends on the number of issues it addresses.”)… [e] note of decision as the classification associated with the case decision (Al-Kofahi, ¶44 “Headnotes database 120 receives a new set of headnotes (such as headnotes 126 and 128) for recently decided cases, and classification processor 130 determines whether one or more of the cases associated with the headnotes are sufficiently relevant to any of the annotations within ALR to justify recommending assignments of the headnotes ( or associated cases) to one or more of the annotations. (Some other embodiments directly assign the headnotes or associated cases to the annotations.) … However, both accepted and rejected recommendations are fed back to classification processor 130 for incremental training or tuning of its decision criteria”) does not cite a given statute in the classified note of decision (Al-Kofahi, ¶50 “parameters are based on proxy text, for example, the text of associated headnotes, as opposed to text of the annotation itself.”; ¶51; ¶57 “computing a set of similarity scores based on the similarity of text in each input headnote text to the text associated with each annotation”).
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the classification system taught by the proposed combination to facilitate the classification of headnotes to address the problems articulated for the American legal system (Al-Kofahi, ¶4; ¶8, ¶9).  The proposed system enables the automatic identification of classes for specific headnotes using AI to facilitate search and retrieval.

With regard to claim 19 the proposed combination further teaches comprising retrieving the taxonomy of topics as the published taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) associated with the first statute  as final classification (Mendelevitch, ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”) and using the retrieved taxonomy of topics as the published taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) as input  (Mendelevitch, ¶60 “For binning in a given binary classifier, all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins. 
Given a new document, the "raw score" is examined and placed in the appropriate bin;”) for predicting the topic as the topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”) from the taxonomy as the published taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”) associated with the first statute as the final classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”).  

With regard to claim 20 the proposed combination further teaches wherein the predicted first statute as the classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”) and first headnote as the documents being processed (Mendelevitch ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy. That is, for each topic, each document is processed by a machine-learning algorithm to determine whether the document satisfies a threshold criteria and should therefore be assigned to the topic. Each such classifier outputs for each document a "raw score" that in itself is a measure of the degree of confidence”; ¶60 “all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins.”) are further used as input  (Mendelevitch, ¶60 “For binning in a given binary classifier, all the "raw scores" from all training documents (positive and negative) are processed during training so as to create "bins" of equal size and put the "raw scores" into those bins. 
Given a new document, the "raw score" is examined and placed in the appropriate bin;”) for predicting the topic as the topic represented by the classifier (Mendelevitch, ¶50 “this algorithm determines a set of zero (0) or more topics from the taxonomy to which the document belongs”; ¶54 “the number of binary classifiers ( each representing a single topic)”) from the first taxonomy of topics as the published taxonomy (Mendelevitch, ¶65 “Preferably two taxonomies are included in the system: draft and published”; ¶50 “a set of zero (0) or more topics from the taxonomy”; ¶52 “The classification architecture of the present invention is preferably binary such that a distinct classifier is built for each topic in the taxonomy”)  associated with the first statute as the final classification (Mendelevitch, ¶61 “After binning, a "final" confidence score is calculated by combining the "binary classifier confidence scores" for all binary classifiers according to the coding matrix. According to one aspect, if a topic is in the positive side of a binary classifier, then that "binary confidence score" is preferably weighted as is … This final single confidence score can be used both for classification and for display to users”).  
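For clarity of the record, the binning relied upon from Mendelevitch ¶60 (raw scores from training documents are placed into "bins" of equal size, and a new document's raw score is then placed in the appropriate bin) can be illustrated with a minimal sketch. The sketch assumes that "equal size" means an equal number of training scores per bin; the function names and the example scores are hypothetical and do not come from Mendelevitch.

```python
from bisect import bisect_right

def build_bin_edges(raw_scores, n_bins):
    """Return n_bins - 1 cut points so each bin holds about the same number of training scores."""
    if n_bins < 1 or not raw_scores:
        raise ValueError("need at least one bin and one training score")
    ordered = sorted(raw_scores)
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]

def assign_bin(edges, raw_score):
    """Place a new document's raw score into the bin whose range contains it."""
    return bisect_right(edges, raw_score)

# Hypothetical raw scores from training documents (positive and negative).
training_scores = [0.05, 0.1, 0.2, 0.35, 0.5, 0.6, 0.8, 0.9]
edges = build_bin_edges(training_scores, n_bins=4)
new_document_bin = assign_bin(edges, 0.55)
```

Under this assumption each of the four bins receives two of the eight training scores, and a new raw score of 0.55 falls into the third bin (index 2).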

Response to Arguments
Applicant's arguments filed April 7, 2026 have been fully considered but they are not persuasive.

With regard to Section II.A, no argument appears to be put forth.  The office has established on the record what is believed to be within the broadest reasonable interpretation for the sake of clarity of the record.  Applicant has not argued against the reasonability of the interpretation put forth by the office.

With regard to Section II.B, applicant argues that the office has not put forth a finding of non-functional descriptive material.  In response, within both the specification and the claims, the specific data (e.g. that the documents are headnotes and the predicted class is a “note of decision”) is merely the input and output of the system.  There is no functionality within the claimed device that is dependent on the specific data that is being analyzed.  The algorithm that performs the classification is the same ML model, using the same mathematical algorithm, namely the sequence-to-sequence neural network.  The specific data does not impose a functional limitation on the claimed device.  The specific data that is used to train the model has a predictable correlation to the AI model that will be generated.  The AI model operates the same way regardless of the data that is fed into it, but it will be trained to look for the patterns that match that specific data.  The data is not functional; it is merely the pattern that is being checked for.  The AI model is performing the functionality of looking for the pattern that it was trained to look for.  The method it uses to look for the pattern remains the same regardless of the pattern.  The specific data is merely non-functional descriptive material, being the data fed into the AI algorithm to generate the model that will be expected to perform the classification on similar data.  The data within the instant invention is not parallel to a hatband that performs a function, but is more similar to a generic image printed on the hat.  The fact that the metadata within the instant documents is related to legal cases does not make it different than any other metadata.  The fact that the documents contain a legal summary (e.g. the headnote) does not change the fact that each is a document of text.  The computer system does not understand the human derived meaning (e.g. that it is a legal document, or that the top set of text is referred to as a ‘headnote’).  These human derived meanings are non-functional descriptive material.  The specific name of the classification (e.g. that the determined classification is referred to as a ‘note of decision’) is merely a human meaning referring to the functionality of classification.  The functionality has been fully addressed in the above mapping.  The system processes the documents the same way, regardless of whether they are legal documents (as is the use case claimed), recipe documents, or anything else.  The human derived meaning of the specific data being retrieved does not impact the functionality of how said data is retrieved.
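The point that the same routine processes the documents regardless of their subject matter can be illustrated with a minimal sketch. The sketch deliberately substitutes a simple word-count classifier for the claimed sequence-to-sequence model; the function names, example texts and labels are hypothetical and are offered only to show one training routine applied unchanged to two unrelated data sets.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count word occurrences per label; the routine never inspects what the labels mean."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(model, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    return max(sorted(model), key=lambda label: sum(model[label][w] for w in words))

# The same train() call is used for legal text and for recipe text.
legal_model = train([
    ("court held the statute bars the claim", "limitations"),
    ("contract breach damages awarded", "contracts"),
])
recipe_model = train([
    ("bake the dough until golden", "bread"),
    ("simmer the broth with noodles", "soup"),
])

legal_label = classify(legal_model, "the statute bars recovery")
recipe_label = classify(recipe_model, "simmer noodles gently")
```

Only the training data differs between the two models; the code path that builds and applies each model is identical.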

With regard to Section III. A, applicant argues that the examiner has not identified any specific terms that are indefinite.
In response, the indefinite terms are “a transformer-based sequence-to-sequence model” and “the pre-trained model”. As explained above, one of ordinary skill in the art would identify the transformer-based sequence-to-sequence model as being a pre-trained model. Within the claims, the transformer-based sequence-to-sequence model is detailed as being utilized with the pre-trained model. One of ordinary skill in the art may therefore read the claims to require a single model, which is the transformer-based sequence-to-sequence model, that is described as a pre-trained model. This interpretation is consistent with Paragraph [0040] of the instant specification. One of ordinary skill in the art may also reasonably read the claims to require at least two distinct models, one named ‘transformer-based sequence-to-sequence model’ and the second being named ‘pre-trained model’. As such, the claim language is ambiguous, and the claimed ‘pre-trained model’ language lacks antecedent basis, as it is unclear if applicant is reciting a new claim element (as indicated by the use of a distinct label) or if applicant is referring to the previously recited element (which appears to be consistent with the specification).

With regard to the 101:
On Page 5 of the remarks, applicant argues that the pre-training and classification procedures utilizing a transformer-based sequence-to-sequence model cannot be performed in the human mind.
In response, these claim limitations were identified as additional elements and analyzed accordingly. Applicant’s arguments do not apply to the rejection put forth.

Applicant argues (Page 6) that the office fails to consider the claims as a whole, pointing to where the office action explicitly identifies the additional elements. The applicant states that the office makes a conclusory remark regarding the ordered combination.
In response, each limitation was analyzed individually and as part of the whole within the discussion of each limitation. Furthermore, the entire claim was considered as a whole at the end of the analysis; a label has been provided to facilitate identification of this section of the final analysis (see the label [Consideration as a Whole] within the 101 analysis). The claimed system appears to automate the manual system that is explicitly recited as historically being performed by humans, using ML to perform the classification. The claimed device, when viewed as a whole, appears to recite using an off the shelf ML model (e.g. the sequence-to-sequence model) to perform the classification instead of having the human do it.

Applicant argues (page 6) that there is no consideration regarding the improved functioning of the computer or the technology. In response, the only argued improvement appears to be the expected improvement that is achieved when automating a manual process. The ‘improvement’ appears to merely be the inherent benefit of applying the abstract idea on a computer, which is not sufficient to integrate an abstract idea into a practical application (MPEP 2106.05(f) “Similarly, "claiming the improved speed or efficiency inherent with applying the abstract idea on a computer" does not integrate a judicial exception into a practical application or provide an inventive concept”).

Applicant argues Ex parte Desjardins (Page 7).
In response, it is noted that the fact pattern of the cited case does not appear to match the fact pattern of the instant case. Applicant’s own argument describes the case as “holding that the claimed steps, particularly the selective parameter adjustment for optimizing performance of the machine learning model on a second learning task while protecting its performance on a first learning task, reflected an improvement in model performance”. The instant claims are not directed to selective parameter adjustment; there is no optimization of performance of a machine learning model on a second task while protecting its performance on a first task. Within the instant case, the only argued ‘improvement’ is that the ML model is trained with specific data. All machine learning models are trained with specific data. The specific data does not yield an improvement to the field of art or to the ML model itself. As such, the facts of the cited case and the instant case are significantly distinct. The instant claims appear to merely recite the use of an off the shelf ML model, trained on specific data. The instant claims are not directed to a specific machine learning model.

With regard to Section A and B on 35 USC 101, MPEP 2106, and Legal Standards.  
Applicant has not put forth any arguments regarding the analysis put forth by the office. Discussion of the guidelines is not an argument against the analysis put forth. Applicant does not make any statement regarding the analysis put forth by the examiner, but instead appears to merely be discussing the guidelines for 101 analysis itself. The arguments are not specific to the facts of the instant case, or the analysis put forth in detail by the office.

Section C Part 1: applicant argues that the Office Action skips the step of interpreting any of the claim terms and steps right into step 2A Prong 1.
In response, within the Step 2A Prong 1, the office analyzes each claim limitation, and discusses how one of ordinary skill in the art would reasonably read the claim limitation (this is a discussion of ‘claim interpretation’). Where appropriate, the office action has provided citations to the instant specification which are deemed pertinent to the interpretation of specific limitations. There need not be a specific section, separate from the analysis of the limitations, which discusses interpretation separately.

Section C Part 1: the applicant states that there is confusion as to which parts of the claim were identified as abstract ideas. For sake of clarity, the claim language that is not struck through and that immediately follows the following label was identified as abstract: “[Step 2A Prong 1] The claim(s) recite the following limitations which have been identified as being directed to a mental process: …”
Each claim limitation was labeled with a [x] mark, and immediately following the listing of claim limitations, each limitation (as referenced by the [x] mark) is addressed with the analysis.

Applicant further states that the 101 analysis is based upon an unreasonable interpretation, but applicant fails to provide any clarifying statements or to explain what is unreasonable about the examiner’s interpretation. The examiner has made an effort to explain on the record why the interpretation put forth should be considered reasonable and why it is being applied. Simply stating that it is unreasonable does not assist in providing any clarity regarding the scope of the claims.

Section C Part 3: applicant asserts that the claims are even more technical in nature than in DDR and Enfish. Applicant then quotes claim language.
In response, the facts of DDR and Enfish are distinct from the facts of the instant case. Applicant does not provide any parallel as to why or how the facts that led to the decisions in DDR or Enfish relate to the facts of the instant case.

Section C Part 3: On Pages 19-20 applicant argues that the office has failed to consider the additional element of the transformer-based sequential neural network.
In response, the applicant did not invent transformer-based sequential neural networks. Within the instant specification, applicant states that the ML models used are known, off the shelf models, which are referred to by their name within the field of art. These machine learning models are so generally used that they have a known name (e.g. bidirectional LSTM, Google’s Text-To-Text Transformer T5; see Paragraph [0040] of the original specification). The instant invention appears to be using these models, trained on the given training data (e.g. the WestLaw dataset), to automate the manual process of classification, which is what these machine learning models are typically used for. The claimed device is not attempting to claim a new machine learning methodology, but instead appears to merely be making use of existing ML models to perform the act of classification.
Applicant asserts that Paragraph [0035] discusses an advanced type of deep neural network architecture specifically targeted at text generation. Paragraph [0035] recites: “In one embodiment, the proposed pipeline uses a sequence-to-sequence model (an advanced type of deep neural network architecture specifically targeted at text generation)”. One of ordinary skill in the art would recognize that the specification is reciting a known name for an off-the-shelf neural network architecture that is readily available within the field of art. Applicant is not describing a new machine learning architecture, but is instead referring to a known deep neural network architecture by name. The instant invention appears to merely train this sequence-to-sequence model using specific data (which is also readily available, e.g. the Westlaw database, as detailed in Paragraph [0040]). The instant claims are not directed to a specific machine learning model, but instead are directed to the sequence-to-sequence model that has been trained using the Westlaw data as training data.

Section C Part 4: applicant asserts that the claims are even more technical in nature than in BASCOM.  Applicant then quotes claim language and specification.
In response, the facts of BASCOM are distinct from the facts of the instant case. Applicant does not provide any parallel as to why or how the facts that led to the decision in BASCOM relate to the facts of the instant case.

Section C Part 4: Applicant argues that the instant invention improves efficiency by reducing the amount of skilled labor required.
In response, using a computer system to automate a manual process will inherently reduce the amount of skilled labor required. As stated above, the inherent benefits of merely implementing an abstract idea (in this case classification) in a computer system (in this case by using AI) do not amount to a practical application or significantly more than the abstract idea itself. The claimed system is merely using the off the shelf AI model (that has been trained on the specific data it is expected to evaluate) to perform the operations historically performed by the human expert.

Section C Part 5: Applicant argues that there is no evidence provided regarding Step 2B.
In response, see the section labeled [Step 2B]. The office provided citations to the specification (e.g. Paragraphs [0034], [0035], [0040]) and to prior art (e.g. Makhija, Keskar, Lee, Park, Eldesouki, Jayawardhana, Chandra).

Section C Part 6: applicant argues that the office did not consider the machine-or-transformation test of Bilski. Applicant argues there is no analysis regarding 2106.05(b) or (c).
In response, the PEG 2019 guidelines provide a full 101 analysis that replaces the previous test. The Step 2A Prong 2 and Step 2B analyses provide the necessary analysis. It would be improper for the office to refer to the Bilski test when performing the PEG 2019 analysis. The Step 2A Prong 2 analysis put forth properly analyzes the limitations as falling into mere instructions to apply (MPEP 2106.05(f)), which is part of the test for additional elements.

Section C Part 6: applicant argues that the office provides no evidence, and instead merely asserts that the additional elements are generic. Applicant cites a court case that states that software can be non-abstract.
In response, the office has provided evidence.  The office cited the specification, and provided prior art citations that demonstrate that the specifically recited Machine Learning algorithms were generic computer devices.  This is not a mere assertion, nor is this a statement regarding software in general.  The analysis is made specific to the facts of the case, and specifically analyzing the specific Machine learning algorithms (which are referred to by name) as being known algorithms within the field of art.  Applicant is not asserting to have invented the stated algorithms.  Nor is there any discussion of the inner workings of the machine learning models at all.  Within the claims and the instant specification the machine learning models are merely black box devices that are fed data and output results.

With regard to the dependent claims, applicant argues that the dependent claims do not automatically rise or fall based on the parent.
In response, the dependent claim limitations are addressed in the above analysis. It is noted that the dependent claims include all of the parent claim limitations, so they must be (and have been) considered as a whole in view of this. The analysis above analyzes each limitation both independently and as an ordered combination of the claim as a whole.

With regard to the prior art:
Applicant argues that the prior art does not teach “a new statutes” and asserts that the prior art only produces raw scores and confidence scores. This is followed by a discussion of the scores and similarity metrics within the prior art.
In response, the ‘new statute’ was mapped to a second topic label. The prior art classifies the document into multiple topics (Mendelevitch, ¶12). Multiple includes ‘more than 1’, e.g. a first and a second topic. One of ordinary skill in the art would recognize these classification topics as topic labels, and the second one reads on the ‘new statutes’. Applicant’s arguments do not address the rejection on record.

Applicant argues that Mendelevitch does not disclose pre-training.  Specifically that there is no indication of what the training is comprised of, and that the claims specifically claim that it is ‘pre-trained’ with specific documents.
In response, one of ordinary skill in the art would recognize the requirement of pre-trained as meaning that it is trained before it is used. As such, the recitation that the model is used after it is trained means that it was trained prior to the classification, e.g. pre-trained. Furthermore, the instant claim makes no statement regarding how the training is performed either. A statement of the training data does not recite the functionality of how the AI model is trained. This training methodology is a known part of the sequence-to-sequence model that is referenced by name within the instant specification. One of ordinary skill in the art understands the underlying functionality of how this model is trained. A recitation of the data that is fed into the model does not place any functional limitation on the training operation itself. The data is merely non-functional descriptive material. At best, it details an intended use of the claimed system (e.g. an ML model intended to classify legal documents would need to be trained on legal training documents).
Applicant further argues that because the training in Mendelevitch is on a ‘small document subset’ it cannot be understood as ‘pre-training’, because the specification discusses pre-training as including a large dataset (referencing the provisional application, Pages 11 and 26). In response, the size of the dataset is not captured in the claim language, merely the time of the training (e.g. being “pre”). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). In addition, the size of the training set would not have a functional impact on how the training operation is performed and as such does not impose a functional limitation on the claimed device.
Furthermore, within the 103 combination, the proposed combination has been modified in view of Al-Kofahi to train the model on headnotes and other such legal documents.

Applicant argues that Mendelevitch does not disclose a pre-trained neural network. Specifically, applicant argues that the Naïve Bayes, SVM and decision tree classifiers taught by Mendelevitch are not neural networks.
It is noted that the Machine Learning classification system taught by Mendelevitch was modified in the proposed 103 combination to use the Bi-LSTM of Makhija (See ¶84).  Applicant’s arguments do not address the proposed combination on record.  The citation to the various classification methods mentioned in Mendelevitch was provided to facilitate identification of when the classification system was being used within the proposed combination.

Applicant argues that while Makhija teaches a sequential neural network, there is no disclosure of this ML being pre-trained.
In response, all Machine learning models are trained prior to being used.  Otherwise they would not function.  Makhija goes into minor detail regarding the specific training algorithm that they use (¶91), making it clear that the Machine learning model is trained.  One of ordinary skill in the art would recognize that the training phase is required to be performed prior to the model being used to perform classification.

Applicant argues that the examiner has provided no reasoning to combine Makhija with Mendelevitch.
The examiner explicitly stated the reasoning (e.g. “it yields the predictable results of automating the process using a known, off-the-shelf, AI modeling system to achieve the desired results.”). This was followed by an explanation of how both systems relate to the same technology (e.g. using ML to categorize text), and of how Makhija details that the use of the Bi-LSTM model is known to save a lot of time (Makhija, ¶83) when accuracy matters (Makhija, ¶92), which is a problem that Mendelevitch acknowledges is faced within the field of document classification (Mendelevitch, ¶3, ¶4, ¶6).

Applicant argues that Makhija does not teach periodically retraining the sequential neural network. Specifically, applicant asserts that alternating between two training phases is not ‘periodic’.
In response, applicant merely asserts that the two distinct training phases do not qualify as being ‘periodic’ but has provided no reasoning or rationale as to the distinction applicant sees. One of ordinary skill in the art would recognize the broadest reasonable interpretation of the term ‘periodic’ to include where the training is repeated from time to time. The two-phase training explicitly recited by Makhija details at least two distinct training sequences, which are executed at distinct times. As such, one of ordinary skill in the art would recognize the two-phase training sequence as a periodic training.

Applicant argues that Mendelevitch fails to teach the headnotes and note of decision, and that Al-Kofahi fails to teach the Jurisdictional criteria and the taxonomy.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
Within the proposed combination, the specific classification engine (e.g. the ML model) is taught by Mendelevitch (modified in view of Makhija) to perform the functional limitations. This includes the taxonomy and the jurisdictional criteria (which functionally are classification criteria, just applied on legal data). Within the proposed combination, this classification model has been trained to classify legal documents (as taught by Al-Kofahi) and therefore provides the classification on the headnotes and notes of decision (as taught by Al-Kofahi). Applicant’s arguments do not address the proposed rejection on record, and instead attempt to attack the references individually.

Applicant argues that Mendelevitch and Makhija discuss completely different processes.
In response, Mendelevitch teaches a classification system that uses an ML classification system. How the ML system is used to perform the classification, and how it works with the underlying taxonomies and topics, is provided, but Mendelevitch does not provide any detail regarding the ML algorithm itself. Mendelevitch refers to generic machine learning algorithms (Mendelevitch, ¶48) to use the system with. Makhija provides a description of machine learning classifiers (¶81, ¶84), including how the internal layers operate and how the classifier is trained. These specifics of how the ML classifiers operate are not tied to the specific use case to which Makhija applies them, but are details of how the machine learning algorithm operates, i.e. some of the internal mechanisms of the machine learning model itself. One of ordinary skill in the art would recognize that the machine learning algorithm is the same algorithm regardless of the use case that it is applied to. Within the proposed combination, the device operates as taught by Mendelevitch, incorporating the machine learning model taught by Makhija to implement the generic machine learning algorithm that Mendelevitch refers to (Mendelevitch, ¶48). The ‘different processes’ applicant appears to be referring to are merely the intended use of the machine learning system, i.e. what operates above the machine learning system.

Applicant argues that an RNN is not a transformer model.
In response, it is not asserted that an RNN is a transformer model. One of ordinary skill in the art would recognize that the term ‘transformer’ refers to a machine learning model that uses attention mechanisms to weight the importance of parts. The transformer-based sequence-to-sequence model was mapped to where the prior art discusses the output of the Bi-LSTM being passed through a self-attention layer for further encoding (Makhija, ¶84). The passing of the data through a self-attention layer addresses the ‘transformer’ claim limitation. The instant application does not discuss the inner layers of the machine learning model at all. There is no discussion of how the model operates or of its internal mechanisms. The instant claims do not make any attempt at reciting how the machine learning system operates. To be clear, reciting the data that a model is trained on (e.g. the training data) does not recite any functional limitation or operation of the machine learning model. This is not a discussion of the model itself. The instant claims merely recite the machine learning model at a high level of generality, merely reciting utilizing a ‘transformer-based sequence to sequence model’. Applicant is relying on high-level keywords, known in the art, to provide any detail regarding the operations of the machine learning model itself. Makhija does not just teach an RNN or a Bi-LSTM. Makhija specifically teaches that the output data is passed through a self-attention layer (e.g. a transformer).
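For readers unfamiliar with the mechanism referred to above, the following is a minimal sketch of a self-attention step that weights each position of a sequence by its similarity to every other position. It is an illustration only and is not taken from Makhija or from the instant specification: the function names `softmax` and `self_attention` are hypothetical, the input stands in for a sequence of encoder output vectors, and the learned query/key/value projections that a trained layer would normally apply are assumed away (identity projections).

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention with identity projections (assumption).

    `states` is a list of equal-length vectors, e.g. one vector per token.
    Each output vector is a weighted average of all input vectors, where the
    weights reflect how similar each input is to the current position.
    """
    dim = len(states[0])
    outputs = []
    for query in states:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in states
        ]
        weights = softmax(scores)
        # Weighted average of all positions, component by component.
        outputs.append(
            [sum(w * vec[j] for w, vec in zip(weights, states)) for j in range(dim)]
        )
    return outputs


# Hypothetical two-token sequence with two-dimensional states.
encoded = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

In this sketch each output remains a mixture of the inputs, with the largest weight on the position most similar to the current one.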

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
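The reply-period rule in the preceding paragraph can be sketched as date arithmetic. This is a minimal sketch under stated assumptions: months are added calendar-wise with end-of-month clamping, weekends, federal holidays, and extension fees are ignored, and the names `add_months` and `shortened_period_end` are hypothetical rather than part of any USPTO tool. The May 13, 2026 mailing date used in the example is the one listed for the current final rejection in the prosecution timeline.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_end(
    mailing: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> date:
    """End of the shortened statutory period as described in the paragraph above.

    Assumption: dates are compared as plain calendar dates.
    """
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)
    end = three_months
    # A first reply within two months, followed by an advisory action mailed
    # after the three-month period, moves the end to the advisory mailing date.
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= add_months(mailing, 2)
        and advisory_mailed > three_months
    ):
        end = advisory_mailed
    # In no event later than six months from the mailing date.
    return min(end, six_months)


print(shortened_period_end(date(2026, 5, 13)))  # 2026-08-13
```

Any date relied on for an actual filing should be confirmed against 37 CFR 1.136 and the mailing date on the action itself.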
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMANDA WILLIS whose telephone number is (571)270-7691. The examiner can normally be reached Monday-Friday 8am-2pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ajay Bhatia can be reached at 571-272-3906. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AMANDA L WILLIS/Primary Examiner, Art Unit 2156
Prosecution Timeline

Aug 15, 2025: Final Rejection mailed (§101, §103, §112)
Nov 05, 2025: Interview Requested
Nov 12, 2025: Examiner Interview Summary
Nov 13, 2025: Request for Continued Examination
Nov 19, 2025: Response after Non-Final Action
Dec 08, 2025: Non-Final Rejection mailed (§101, §103, §112)
Apr 07, 2026: Response Filed
May 13, 2026: Final Rejection mailed (§101, §103, §112; current)
Precedent Cases

Applications granted by this same examiner with similar technology

16/678,984 (Patent 12639369): Dynamic Audio File Generation. 6y 6m to grant; granted May 26, 2026.
18/732,241 (Patent 12639306): DATABASE OPERATOR CLAUSE VARIABLE CALCULATION IN DISTRIBUTED SYSTEMS. 1y 11m to grant; granted May 26, 2026.
18/603,392 (Patent 12619635): METHODS AND SYSTEMS FOR SUPPLY CHAIN ANALYTICS USING VISUALIZATIONS AND STANDARDIZATION CONSTRUCTS. 2y 1m to grant; granted May 05, 2026.
15/132,638 (Patent 12608395): EXTRACTION OF AUDIT TRAILS. 10y 0m to grant; granted Apr 21, 2026.
17/380,905 (Patent 12602380): SUBSUMPTION OF VIEWS AND SUBQUERIES. 4y 8m to grant; granted Apr 14, 2026.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 9-10
Grant Probability: 36%
With Interview (+26.4%): 62%
Median Time to Grant: 4y 9m (~0m remaining)
PTA Risk: High
Based on 348 resolved cases by this examiner. Grant probability derived from career allowance rate.