DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.
Remarks
In response to the communication filed on November 5, 2025, claim 1 is amended per applicant's request. Therefore, claims 1-20 are presently pending in the application.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/5/2025 has been entered.
Election/Restrictions
Newly submitted amended claims 14-20 are directed to an invention that is independent or distinct from the invention originally claimed for the following reasons:
Group I: Claims 1-13 are highly focused on the application of data processing for a specific type of data (documents) and a specific outcome (labeling for training an ML model).
Primary Area: Document Analysis and Processing
G06F 16/30 (Information retrieval; Web search or mining: Search for text or document content)
G06F 16/33 (Information retrieval: determined by content; Indexing methods; Abstracting methods)
Secondary Area: Machine Learning Applications (Specific)
G06N 20/00 (Machine learning) - specifically applications of ML in document analysis.
Group II: Claims 14-20 are broader and describe the general architecture and functionality of a human-in-the-loop system for generating training data for any processor-implemented method, not just documents.
Primary Area: General Machine Learning and System Architecture
G06N 20/00 (Machine learning)
G06N 3/02 (Neural networks) - if the "processor-implemented method" is a neural network.
G06F 18/00 (Pattern recognition)
Secondary Area: Human-Computer Interaction / User Interfaces for Data Input
G06F 3/048 (Interaction techniques for graphical user interfaces) - relating to the steps of presenting data to a user and receiving input.
Since applicant has received an action on the merits for the originally presented invention, this invention has been constructively elected by original presentation for prosecution on the merits. Accordingly, claims 14-20 are withdrawn from consideration as being directed to a non-elected invention. See 37 CFR 1.142(b) and MPEP § 821.03.
To preserve a right to petition, the reply to this action must distinctly and specifically point out supposed errors in the restriction requirement. Otherwise, the election shall be treated as a final election without traverse. Traversal must be timely. Failure to timely traverse the requirement will result in the loss of right to petition under 37 CFR 1.144. If claims are subsequently added, applicant must indicate which of the subsequently added claims are readable upon the elected invention.
Should applicant traverse on the ground that the inventions are not patentably distinct, applicant should submit evidence or identify such evidence now of record showing the inventions to be obvious variants or clearly admit on the record that this is the case. In either instance, if the examiner finds one of the inventions unpatentable over the prior art, the evidence or admission may be used in a rejection under 35 U.S.C. 103 or pre-AIA 35 U.S.C. 103(a) of the other invention.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claims 1 and 8,
Step 1 Analysis: Claims 1 and 8 are directed to a process, which falls within one of the four statutory categories.
Step 2A Prong 1 Analysis: Claims 1 and 8 recite an abstract idea.
The claim language uses generic functional terms ("receiving," "applying," "noting," "performing," "using") without specifying a particular, non-conventional technological implementation.
The claim appears to fall within the judicial exception of an abstract idea, specifically related to a mental process and a method of organizing human activity.
The core of the claim involves a series of steps for processing information and using human input to refine a process. The step of a "user to assign a label" is a mental step that can be performed by a human in their mind or using pen and paper.
The overall method of using human review to supplement automated processing to create a training dataset is a fundamental and common practice in machine learning. This is considered a method of organizing human activity or a conventional business practice, which courts have identified as an abstract idea when claimed generically.
Step 2A Prong 2 Analysis: The claim does not appear to integrate the abstract idea into a practical application in a manner that is non-conventional and specific.
The Federal Circuit has consistently held that merely speeding up a human activity with generic computing is not enough to confer eligibility.
The use of generic "automated techniques" and "training processes" is inherent to the nature of machine learning itself and does not represent an improvement in computer technology or a specific, non-conventional technical solution to a technical problem beyond the abstract idea of labeling documents.
Because the claim is directed to an abstract idea not integrated into a non-conventional practical application, it fails Step 2A.
Step 2B Analysis: The additional elements in the claim consist of generic computer components and routine steps. For example, "receiving a document file" is a generic data processing step.
The combination of these elements simply implements the abstract idea using conventional computing functions. There are no limitations describing a specific hardware improvement (e.g., a specific ASIC for neural networks), an improved network function, or an unconventional sensor configuration that would elevate the claim to patent-eligible subject matter.
The fact that the process yields a "training dataset" used in a machine learning context does not automatically make it patent-eligible, especially since the training itself is described as a conventional, iterative process.
Therefore, the claim does not include an inventive concept that amounts to significantly more than the abstract idea.
Regarding claims 2-7 and 9-13, the rejection of claims 1 and 8 is incorporated. Further, the claims recite: extracting…, providing…, presenting…, storing…. These limitations amount to insignificant extra-solution activity. See MPEP 2106.05(g).
The claims do not include any additional elements that amount to significantly more than the judicial exception. The claims are directed to the abstract idea of a human-in-the-loop document labeling method using conventional computer technology, and they lack a specific, non-conventional technical solution or improvement that would render them patent-eligible. The claims are not patent eligible.
Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.
Claims 1-13 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Rujan et al. (US Pub. 2006/0212413) (Eff filing date of app: 4/27/2000) (Hereinafter Rujan) in view of Shanahan (US Pub 2003/0078899) (Eff filing date of app: 8/13/2001) and further in view of Summerlin et al (US Pub 2009/0089305) (Hereinafter Summerlin).
As to claim 1, Rujan teaches a processor-implemented method comprising:
receiving a document file, in which a document represented by the document file (see p. 8, “documents are represented as vectors for classification purposes. A document is formed of or may comprise a sequence of terms.”);
applying one or more automated techniques to data from the document file to try to label the document represented by the document file (see p. 9, “If such a representation is applied to a set of documents which have already been classified (for example by a user, or by any other automatic classification), then the representation of the documents by their corresponding vectors together with the separation of the corresponding vector space into subspaces forms a classification scheme (or a classification model) which reflects the classification of the already classified documents and which may be used for (automatic) classification of unknown and unclassified documents.”; p. 16, “Thereby the subspaces calculated based on the already classified documents are such that they are particularly suitable for automatically classifying unknown documents”; p. 19, “An automatic classification of an unknown document may be performed based on calculating the location of the vector representing said document”);
responsive to not being able to assign a label to the document above an acceptable threshold level using one or more automated techniques, noting the document file for subsequent review (see p. 23, “it is preferable if the renewed calculation of the classification scheme includes one or more documents which have either been wrongly classified during automatic classification or documents which have a low confidence level.”; p. 26, “refining the classification scheme by selecting the document(s) with the lowest confidence level or the document(s) whose confidence level is below a certain threshold for repeatedly calculating the classification scheme”; p. 33, unclassified document; p. 34, manually performed classification; and claim 10);
receiving a training dataset comprising the document file in which the label has been assigned to the document (see p. 13, 15, 35, 53-54, 62, 75-76, and 9, “If such a representation is applied to a set of documents which have already been classified (for example by a user, or by any other automatic classification), then the representation of the documents by their corresponding vectors together with the separation of the corresponding vector space into subspaces forms a classification scheme (or a classification model) which reflects the classification of the already classified documents and which may be used for (automatic) classification of unknown and unclassified documents.”); and
using at least part of the training dataset, which includes the document file, as training data in one or more training processes for training or updating one or more automated techniques for document labeling (see p. 9, “If such a representation is applied to a set of documents which have already been classified (for example by a user, or by any other automatic classification), then the representation of the documents by their corresponding vectors together with the separation of the corresponding vector space into subspaces forms a classification scheme (or a classification model) which reflects the classification of the already classified documents and which may be used for (automatic) classification of unknown and unclassified documents.” and p. 71, “which means that the classification of the documents has been learned by the classification scheme and may be used for classifying new unknown documents”).
Rujan teaches an unclassified document (see abstract) but does not expressly teach receiving a document file, in which a document represented by the document file.
Shanahan teaches a fuzzy text categorizer (see abstract), in which he teaches receiving a document file (see p. 43, consumes the unlabeled text object), in which a document represented by the document file (see p. 58, the feature extractor generates a feature vector from a document; the generated feature vector consists of tuples; a feature is the characters that make up a word, and a word is a plurality of characters).
It would have been obvious to a person having ordinary skill in the art at the time the invention was made to have modified Rujan by the teaching of Shanahan, because receiving a document file… would enable the method, since “Text categorization can be applied to documents that are purely textual, as well as, documents that contain both text and other forms of data such as images.” See p. 2.
Rujan does not expressly teach performing subsequent review of data from the document file by a user to assign a label.
Summerlin teaches electronic records automated classification system, see abstract, in which he teaches performing subsequent review of data from the document file by a user to assign a label (see p. 32, Documents in the Review Classification folder 21 are available for subsequent inspection and review by a designated user).
It would have been obvious to a person having ordinary skill in the art at the time the invention was made to have modified Rujan by the teaching of Summerlin, because performing subsequent review…, would enable the user to select or assign the proper label to the document.
As to claim 2, Rujan as modified teaches the processor-implemented method further comprising the step of:
responsive to the document file being an image file, extracting characters from the document file using optical character recognition (see Shanahan, p. 2, text and other forms of data such as images).
As to claim 3, Rujan as modified teaches the processor-implemented method further comprising the step of:
responsive to the document file being an audio file, extracting characters from the document file using speech-to-text recognition (see Shanahan, p. 222, audio).
As to claim 4, Rujan as modified teaches the processor-implemented method further comprising the step of:
providing at least a portion of data from the document file to a user interface system to facilitate the subsequent review, which comprises receiving input from one or more users regarding the document file (see Rujan, p. 9, “set of documents which have already been classified (for example by a user,” and 67, “classification of the members of a set of documents (the training corpus) which already exists and has for example been made by a user”).
As to claim 5, Rujan as modified teaches the processor-implemented method further comprising the step of:
presenting at least a portion of data from the document file in a review dataset to a user (see Rujan, p. 24, input by the user); receiving input from the user regarding a label for associating with the document file (see Rujan, p. 24); and storing the document file and at least part of the input associated with the document file in the training dataset (see Rujan p. 82, “Typical examples are sorting e-mails and bookmarks, organizing hard disks, creating interest-profiles for Internet search or automatic indexing of archives (normal and electronic libraries). This implementation allows each user of a given databank to classify, search, and store data according to their own personal criteria.”).
As to claim 6, Rujan as modified teaches wherein the step of responsive to not being able to assign a label to the document above an acceptable threshold level using one or more automated techniques, noting the document file for subsequent review comprises:
providing a prompt to a user to provide input related to the document file (see Rujan, p. 25-26 and 82);
receiving the input from the user regarding the document file (see Rujan, p. 26, input by the user and 84); and
storing the document file and at least part of the input associated with the document file in the training dataset (see Rujan p. 82, “Typical examples are sorting e-mails and bookmarks, organizing hard disks, creating interest-profiles for Internet search or automatic indexing of archives (normal and electronic libraries). This implementation allows each user of a given databank to classify, search, and store data according to their own personal criteria.”).
As to claim 7, Rujan as modified teaches wherein the step of using at least part of the training dataset, which includes the document file, as training data in one or more training processes for training or updating one or more automated techniques for document labeling comprises:
for at least one of the one or more processor-implemented methods, using the at least part of the training dataset as training data in the training process to refine the at least one of the one or more processor-implemented methods (see Shanahan, p. 81 and 86, “In an alternative embodiment, these Zipf-based selection criteria are applied to each document in the training database 1604 to eliminate features that have low or high frequency within each document in the training database 1604.”).
As to claim 8, Rujan as modified teaches a processor-implemented method comprising:
applying an automated technique using at least some data from input data to assign an output that is associated with the input data;
presenting the output or data related to the output to a user;
receiving feedback from a user related to the output of the automated technique;
storing data related to the feedback input from the user into a dataset; and
using at least part of the dataset as training data in one or more machine learning training processes.
As to claim 9, Rujan as modified teaches the processor-implemented method further comprising the step of: providing a prompt to the user to provide feedback related to the output (see Rujan, p. 26 and 82, input by user).
As to claim 10, Rujan as modified teaches wherein the step of providing a prompt to the user to provide feedback related to the output comprises: providing the prompt in response to the automated technique not being able to successfully process the data at a threshold level or above a threshold level (see Rujan, p. 24-26, “It is therefore preferable to have the system to do the following: [0025] listing the automatically classified documents according to their confidence value; [0026] refining the classification scheme by selecting the document(s) with the lowest confidence level or the document(s) whose confidence level is below a certain threshold for repeatedly calculating the classification scheme, based on a correct classification of said document, which may for example be input by the user.” And p. 82).
As to claim 11, Rujan as modified teaches wherein the automated technique is a classifier (see Shanahan, p. 81-82, “Typically this corpus forms all or a subset of training database 1604 shown in FIG. 16 that is used to learn the knowledge base 122 in categorizer 1600.”).
As to claim 12, Rujan as modified teaches wherein the step of using at least part of the dataset as training data in one or more machine learning training processes further comprises:
using at least some of the dataset as a training dataset to train or refine a processor-implemented machine learning model (see Rujan, p. 13, “This can be done by going through a training corpus of documents which already have been classified, such that the terms which should finally form the dimensions of the vector space are extracted. One can apply certain predefined criteria (e.g. to search for individual words, to search for words which have a length greater than a certain threshold, or the like), and the elements of the documents of the training corpus which meet these criteria then are considered to form the terms which correspond to the dimensions of the vector space to be generated. The more terms are used to create the vector space, the better the representation of the contents of the documents of the training corpus is.”).
As to claim 13, Rujan as modified teaches wherein using at least some of the dataset as a training dataset to train or refine a processor-implemented machine learning model comprises:
using at least some of the dataset as a training dataset to perform training or finetuning on the automated technique (see Rujan, p. 67, “a classification of the members of a set of documents (the training corpus) which already exists and has for example been made by a user (or is obtained from any other source) a representation of this classification according to a classification scheme is calculated.”).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BELIX M ORTIZ DITREN whose telephone number is (571)272-4081. The examiner can normally be reached M-F 9am -5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amy Ng can be reached at 571-270-1698. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
BELIX M. ORTIZ DITREN
Primary Examiner
Art Unit 2164
/Belix M Ortiz Ditren/Primary Examiner, Art Unit 2164