DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-8 are pending in this application.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-8 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite parsing text, tagging the tokens and assigning a confidence value for each tag, which is a mental process that could be performed with pencil and paper. This judicial exception is not integrated into a practical application because the only additional elements in the claims are generic computer components performing generic computing functions. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the only additional elements in the claims are generic computer components performing generic computing functions.
As per claim 1, the following limitations are recited:
A) extracting the text from a resource;
B) splitting one or more sentences in the text into a predetermined number of plurality of tokens;
C) generating a plurality of lists using a machine learning model, for identifying one or more fields in the text, wherein the plurality of lists comprises at least a list of tokens, a list of tags and a list of confidence score of tokens; and
D) post-processing the plurality of lists for extracting one or more fields for parsing the text.
Limitations A-D are directed to a mental process.
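For orientation only, the data flow recited in limitations B-D can be sketched as follows. This is a minimal illustration and is not taken from the application: the function names, the regular-expression tokenizer, the `toy_tagger` stand-in for the machine learning model, and the 0.5 confidence threshold are all assumptions. Limitation A is assumed to have already produced the string `text`.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical tagger type: maps a token list to one (tag, confidence) pair per token.
Tagger = Callable[[List[str]], List[Tuple[str, float]]]


def split_into_tokens(text: str, max_tokens: int) -> List[List[str]]:
    """Limitation B (assumed reading): groups of at most max_tokens tokens."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def toy_tagger(tokens: List[str]) -> List[Tuple[str, float]]:
    """Stand-in for the model: capitalized tokens are tagged NAME."""
    return [("NAME", 0.9) if t[:1].isupper() else ("O", 0.6) for t in tokens]


def generate_lists(chunks: List[List[str]], tagger: Tagger):
    """Limitation C: parallel lists of tokens, tags and confidence scores."""
    tokens, tags, scores = [], [], []
    for chunk in chunks:
        for token, (tag, score) in zip(chunk, tagger(chunk)):
            tokens.append(token)
            tags.append(tag)
            scores.append(score)
    return tokens, tags, scores


def post_process(tokens, tags, scores, threshold: float = 0.5):
    """Limitation D: join adjacent tokens that share a non-"O" tag into fields."""
    fields: List[Tuple[str, str]] = []
    prev_tag = "O"
    for token, tag, score in zip(tokens, tags, scores):
        if tag == "O" or score < threshold:
            prev_tag = "O"
            continue
        if tag == prev_tag:
            fields[-1] = (tag, fields[-1][1] + " " + token)
        else:
            fields.append((tag, token))
        prev_tag = tag
    return fields


text = "contact Ada Lovelace in London today"
chunks = split_into_tokens(text, max_tokens=4)
tokens, tags, scores = generate_lists(chunks, toy_tagger)
fields = post_process(tokens, tags, scores)
```

With the toy tagger above, `fields` holds two extracted fields, "Ada Lovelace" and "London", each tagged NAME.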
The subject matter eligibility analysis is as follows:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter? YES
Step 2A, Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon? YES
Step 2A, Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application? NO
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception? NO
Therefore the claim is not drawn to eligible subject matter.
As per claim 6, the following limitations are recited:
K) a processor configured to execute non-transitory machine-readable instructions
A) extracting the text from a resource;
B) splitting one or more sentences in the text into a predetermined number of plurality of tokens;
C) generating a plurality of lists using a machine learning model, for identifying one or more fields in the text, wherein the plurality of lists comprises at least a list of tokens, a list of tags and a list of confidence score of tokens; and
D) post-processing the plurality of lists for extracting one or more fields for parsing the text.
Limitations A-D are directed to a mental process and limitation K is directed to a generic computer component performing generic computing functions.
The subject matter eligibility analysis is as follows:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter? YES
Step 2A, Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon? YES
Step 2A, Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application? NO
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception? NO
Therefore the claim is not drawn to eligible subject matter.
As per claims 2 and 7, the following additional limitations are disclosed:
E) receiving a PDF document and identifying one or more bounding boxes in text from the PDF document;
F) converting the one or more bounding boxes into a plurality of images; and
G) parsing the text from each section of the plurality of images.
Limitations E-G are directed to a mental process, therefore the subject matter eligibility analysis remains unchanged.
As per claims 3 and 8, the following additional limitations are disclosed:
H) classifying the one or more sentences with a plurality of labels;
I) splitting the classified sentences into one or more tokens; and
J) passing the one or more tokens into a classifier for generating the plurality of lists.
Limitations H-J are directed to a mental process, therefore the subject matter eligibility analysis remains unchanged.
As per claim 4, the following limitations are recited:
A) extracting the text from a resource;
L) generating a training set for the artificial intelligence model based on the extracted text and importing the training set into the artificial intelligence model; and
M) training and evaluating the artificial intelligence model using the training set for generating a plurality of lists for identifying one or more fields in the text, wherein the plurality of lists comprises at least a list of tokens, a list of tags and a list of confidence score of tokens.
Limitations A & L are directed to a mental process and limitation M is directed to a mathematical concept.
The subject matter eligibility analysis is as follows:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter? YES
Step 2A, Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon? YES
Step 2A, Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application? NO
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception? NO
Therefore the claim is not drawn to eligible subject matter.
As per claim 5, the following additional limitations are disclosed:
N) the machine learning model is a Cased-Sci-Bert model.
Limitation N is directed to a mathematical concept, therefore the subject matter eligibility analysis remains unchanged.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 3-4, 6 and 8 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Vu et al. (U.S. Patent 12,210,830).
As per claims 1 and 6, Vu et al. discloses:
A system for token-based classification for reducing overlap in field extraction during parsing of a text, the system comprising a processor (Figure 1, item 106 and Column 5, lines 39-52) configured to execute non-transitory machine-readable instructions that when executed perform:
extracting the text from a resource (Column 5, line 65 – Column 6, line 15 – text is extracted from the utterance);
splitting one or more sentences in the text into a predetermined number of plurality of tokens (Figure 4, items 420 – 426 and Column 23, lines 1-35 – the text is chunked into overlapping chunks of predetermined size);
generating a plurality of lists using a machine learning model, for identifying one or more fields in the text, wherein the plurality of lists comprises at least a list of tokens, a list of tags and a list of confidence score of tokens (Figure 4, items 420 – 426 and Column 23, lines 1-35 – the chunks each have a label for each token and an associated confidence score); and
post-processing the plurality of lists for extracting one or more fields for parsing the text (Figure 4, items 420 – 426 and Column 23, lines 1-35 – the confidence scores for the chunks are merged and a final annotated label for the overlapping tokens is arrived at).
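For illustration only, the kind of overlapping chunking and confidence merging referred to in the mapping above can be sketched as follows. This is not the implementation of Vu et al.: the function names, the stride parameter, the example labels and scores, and the merge rule (sum the confidence per label over every chunk covering a token, then keep the highest total) are assumptions chosen to make the idea concrete.

```python
from collections import defaultdict
from typing import List, Tuple


def chunk_with_overlap(tokens: List[str], size: int, stride: int) -> List[Tuple[int, List[str]]]:
    """Return (start_index, chunk) pairs; chunks overlap when stride < size."""
    chunks = []
    start = 0
    while True:
        chunks.append((start, tokens[start:start + size]))
        if start + size >= len(tokens):
            break
        start += stride
    return chunks


def merge_labels(n_tokens: int,
                 predictions: List[Tuple[int, List[Tuple[str, float]]]]) -> List[str]:
    """Assumed merge rule: per token, sum confidence per label, keep the largest."""
    totals = [defaultdict(float) for _ in range(n_tokens)]
    for start, labelled in predictions:
        for offset, (label, conf) in enumerate(labelled):
            totals[start + offset][label] += conf
    return [max(t.items(), key=lambda kv: kv[1])[0] for t in totals]


tokens = ["book", "a", "flight", "to", "New", "York"]
chunks = chunk_with_overlap(tokens, size=4, stride=2)

# Hypothetical per-chunk model output: one (label, confidence) pair per token.
predictions = [
    (0, [("O", 0.9), ("O", 0.9), ("O", 0.8), ("CITY", 0.4)]),
    (2, [("O", 0.7), ("O", 0.9), ("CITY", 0.9), ("CITY", 0.8)]),
]
merged = merge_labels(len(tokens), predictions)
```

Here the token "to" is covered by both chunks with conflicting labels; under the assumed rule the higher-confidence "O" label wins, so only "New" and "York" end up labelled CITY.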
As per claims 3 and 8, Vu et al. discloses all of the limitations of claims 1 and 6 above. Vu et al. further discloses:
classifying the one or more sentences with a plurality of labels; splitting the classified sentences into one or more tokens; and passing the one or more tokens into a classifier for generating the plurality of lists (Figure 4, items 420 – 426 and Column 23, lines 1-35 – the chunks each have a label for each token and an associated confidence score).
As per claim 4, Vu et al. discloses:
A processor-implemented method of training a machine learning model for token-based classification for reducing overlap in field extraction during parsing of text, the method comprising:
extracting the text from a resource (Column 5, line 65 – Column 6, line 15 – text is extracted from the utterance);
generating a training set for the artificial intelligence model based on the extracted text and importing the training set into the artificial intelligence model (Claim 2 and Column 22, lines 52-67 – each chunk is treated as a separate training example); and
training and evaluating the artificial intelligence model using the training set for generating a plurality of lists for identifying one or more fields in the text, wherein the plurality of lists comprises at least a list of tokens, a list of tags and a list of confidence score of tokens (Claim 1).
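For illustration only, treating each chunk as a separate training example, then training and evaluating on held-out chunks, can be sketched as follows. This is not the method of Vu et al. or of the application: the "model" is a deliberately trivial lookup of the most frequent tag per token, and all names, tags and data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Example = Tuple[List[str], List[str]]  # (tokens, tags) for one chunk


def make_examples(tokens: List[str], tags: List[str], size: int) -> List[Example]:
    """Cut one labelled sequence into chunks; each chunk is one training example."""
    return [(tokens[i:i + size], tags[i:i + size]) for i in range(0, len(tokens), size)]


def train(examples: List[Example]) -> Dict[str, str]:
    """Toy stand-in for model training: remember the most frequent tag per token."""
    counts = defaultdict(Counter)
    for toks, tgs in examples:
        for tok, tag in zip(toks, tgs):
            counts[tok][tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def evaluate(model: Dict[str, str], examples: List[Example], default: str = "O") -> float:
    """Token-level accuracy; unseen tokens fall back to the default tag."""
    correct = total = 0
    for toks, tgs in examples:
        for tok, tag in zip(toks, tgs):
            correct += model.get(tok, default) == tag
            total += 1
    return correct / total if total else 0.0


tokens = ["pay", "Acme", "by", "Friday", "pay", "Acme", "on", "Monday"]
tags = ["O", "ORG", "O", "DATE", "O", "ORG", "O", "DATE"]
examples = make_examples(tokens, tags, size=4)
train_set, eval_set = examples[:1], examples[1:]
model = train(train_set)
accuracy = evaluate(model, eval_set)
```

On this toy data the held-out chunk is tagged with 0.75 accuracy, since the unseen token "Monday" falls back to the default tag.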
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Vu et al. (U.S. Patent 12,210,830) in view of Pillai et al. (U.S. Patent 11,829,399).
As per claims 2 and 7, Vu et al. discloses all of the limitations of claims 1 and 6 above. Vu et al. fails to disclose, but Pillai et al. in the same field of endeavor teaches:
receiving a PDF document and identifying one or more bounding boxes in text from the PDF document; converting the one or more bounding boxes into a plurality of images; and parsing the text from each section of the plurality of images (Figure 4, item 404 and Column 10, lines 15-27 – the PDF is parsed into bounding boxes from which text is extracted).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the method and system of Vu et al. with the PDF parsing capabilities of Pillai et al. because it is a case of simple substitution of one known element for another to obtain predictable results. Vu et al. teaches text extraction in preparation for text processing, but not explicitly from PDFs. Pillai et al. teaches extracting text from PDFs in preparation for text processing, and so such PDF processing was known in the art. One of ordinary skill in the art could have substituted the PDF text extraction of Pillai et al. for the generic text extraction of Vu et al.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Vu et al. (U.S. Patent 12,210,830) in view of Zhou et al. (U.S. Patent Application Publication 2022/0121822).
As per claim 5, Vu et al. discloses all of the limitations of claim 4. Vu et al. fails to disclose, but Zhou et al. in the same field of endeavor teaches:
the machine learning model is a Cased-Sci-Bert model (Paragraph [0132]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the method and system of Vu et al. with the cased SciBERT capabilities of Zhou et al. because it is a case of simple substitution of one known element for another to obtain predictable results. Vu et al. teaches the use of a generic machine learning model for classification, but not explicitly cased SciBERT. Zhou et al. teaches the use of cased SciBERT for classification, citing the original SciBERT paper, and so the use of cased SciBERT was known in the art. One of ordinary skill in the art could have substituted the use of cased SciBERT in Zhou et al. for the generic classifier of Vu et al.
Examiner Notes
The Examiner cites particular columns and line numbers in the references as applied to the claims above for the convenience of the Applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the Applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or as disclosed by the Examiner.
Communications via Internet e-mail are at the discretion of the applicant and require written authorization. Should the Applicant wish to communicate via e-mail, including the following paragraph in their response will allow the Examiner to do so:
“Recognizing that Internet communications are not secure, I hereby authorize the USPTO to communicate with me concerning any subject matter of this application by electronic mail. I understand that a copy of these communications will be made of record in the application file.”
Should e-mail communication be desired, the Examiner can be reached at Edwin.Leland@USPTO.gov
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWIN S LELAND III whose telephone number is (571)270-5678. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan, can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWIN S LELAND III/
Primary Examiner, Art Unit 2654