DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-23 are pending. Claims 24-28 are cancelled. Claims 1 and 12 have been amended.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-23 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: a data storage engine, a document conversion engine, an information retrieval engine, and an information extraction engine as written in claim 12. Claims 12-23 are being interpreted under 35 U.S.C. 112(f) for containing these claim limitations.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
Claim(s) 1-3, 5, 7-8, 10-11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vig et al. (US 20200175304 A1) in view of Nor et al. ("IMAGE SEGMENTATION AND TEXT EXTRACTION: APPLICATION TO THE EXTRACTION OF TEXTUAL INFORMATION IN SCENE IMAGES"), in view of Gillick (US 20020133341 A1), and in view of Visser et al. (US 20150301796 A1).
With respect to claim 1, Vig et al. discloses a method for extracting information from a computer-readable digital document, comprising:
identifying segments that contain needed information; ([0010], receiving input images with needed information; detecting textual entities)
classifying the identified segments into machine-typed or handwritten text; ([0031])
converting each segment of the document into a digital text format using one of a trained machine learning model or an optical character recognition algorithm; ([0031]-[0032], [0036]; print text uses OCR, handwritten text uses a trained handwritten recognition model)
and extracting information from the converted text. ([0041]-[0044], queries extract relevant information)
Vig et al. does not teach converting the document to an image;
segregating the converted image into segments;
However Nor et al. does teach converting the document to an image; (Section 3, “PDF or PowerPoint form of the original electronic documents is converted into a relatively high-resolution image”)
segregating the converted image into segments; (Section 4.1 – 4.2)
The use of image segmentation in the process of converting images to text is known in the art. It would have been obvious to one with ordinary skill in the art before the effective filing date to convert documents into segmented images, as taught by Nor et al., for use in the information extraction method of Vig et al., because segmentation separates out the information in the image (Nor et al. Section 6).
Vig et al. and Nor et al. do not teach generating a confidence score with respect to the extracted information and sending a message requesting manual intervention when the confidence score is lower than a threshold, wherein the message includes a reason for the confidence score being below the threshold.
Gillick does teach sending a message requesting for manual intervention when the confidence score is lower than a threshold, wherein the message includes a reason for the confidence score being below the threshold. ([0039]-[0041]: compare a confidence estimate of an extracted text string to a threshold; if the confidence estimate is below the threshold, forward information to a human reviewer including information indicating the cause of the low confidence score)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to have calculated confidence scores for extracted information as it allows the system to determine whether to forward the issue to a human reviewer for correction (Gillick [0039]-[0041]).
Vig et al., Nor et al., and Gillick do not disclose wherein the reason for the confidence score being below the threshold comprises one or more of: incomplete or missing information; inconsistent information; unclear information; and calculation verification required.
However, Visser et al. does disclose wherein the reason for the confidence score being below the threshold comprises one or more of: incomplete or missing information; inconsistent information; unclear information; and calculation verification required. ([0130]: when verification fails, output a reason as to why [noisy/distant speech is unclear information]; [0184]: verification success status is dependent on confidence level being above a threshold)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to include reasons such as unclear information for the confidence level being below the threshold as taught by Visser et al. when performing the method of Vig et al., Nor et al., and Gillick because it enables specific suggestions to correct the issue (see Visser et al. [0132]).
With respect to claim 2, elements of parent claim 1 are disclosed as written above. Vig et al. further discloses the method wherein extracting information is done using at least one natural language processing technique ([0041], sequence to sequence models).
With respect to claim 3, elements of parent claim 1 are disclosed as written above. Vig et al. further discloses the method wherein extracting information is based on spatial coordinates of text on the image ([0039], textual entities are associated with spatial coordinates).
With respect to claim 5, elements of parent claim 1 are disclosed as written above. Vig et al. and Nor et al. further disclose the method wherein each segment comprises one or more lines of text (Nor et al. Figure 4.1; Vig et al. [0036], textual entities).
With respect to claim 7, elements of parent claim 1 are disclosed as written above. Nor et al. further discloses the method wherein segregating an image into segments uses a blank horizontal space or a blank vertical space to identify the start or the end of a segment (Section 4.2.1, segmentation based on blank spaces).
With respect to claim 8, elements of parent claim 1 are disclosed as written above. Nor et al. further discloses the method wherein segregating an image into segments comprises using a row with specified characteristics as the start of a segment (Section 4.2.1, horizontal segmentation based on rows with a certain amount of black pixel density).
With respect to claim 10, elements of parent claim 1 are disclosed as written above. Vig et al. further teaches the method wherein the conversion of segments to a digital text format uses a trained handwriting recognition model for handwritten text, and an optical character recognition algorithm for machine-typed text. ([0031]-[0032], [0036]; handwritten text recognition engine for handwritten text, OCR for machine-typed text)
With respect to claim 11, elements of parent claim 1 are disclosed as written above. Vig et al. further teaches the method, wherein the conversion of segments to a digital text format uses a trained unified text recognition model for both handwritten text and machine-typed text. ([0024], vision algorithms that recognize a combination of handwritten and printed text)
Claim(s) 4, 6, and 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vig et al in view of Nor et al. in view of Gillick as applied to claim 1 above, and further in view of Sadhuram et al. ("Natural Language Processing based New Approach to Design Factoid Question Answering System").
With respect to claim 4, Vig et al., Nor et al., Gillick, and Visser et al. disclose the elements of parent claim 1 as written above. None of them teaches the method wherein extracting information is done using a question answering system.
However, Sadhuram et al. does teach the method wherein extracting information is done using a question answering system. (Section 3.4 QA System)
It would have been obvious to one with ordinary skill in the art before the effective filing date to incorporate a question-answering system taught by Sadhuram et al. in the extraction of information because it allows the method to retrieve relevant answers from the document in response to a question (Sadhuram et al. Section 1).
With respect to claim 6, Vig et al., Visser et al., Gillick and Nor et al. disclose its dependent elements as written above for claim 1. None of them teach the method wherein segregating an image into segments uses a set of received keywords to identify the start or the end of a segment, wherein the identification comprises using a similarity measure between the keywords and the words of the document.
However Sadhuram et al. does teach the method wherein segregating an image into segments uses a set of received keywords to identify the start or the end of a segment, wherein the identification comprises using a similarity measure between the keywords and the words of the document. (Section 3.4, QA System and Passage Retrieval)
It would have been obvious to one with ordinary skill in the art before the effective filing date to identify relevant segments using a similarity measure between the document and received keywords for the same motivation as used above for claim 4.
With respect to claim 9, Vig et al., Visser et al., Gillick, and Nor et al. disclose the elements of parent claim 1 as written above. None of them teaches the method wherein segregating an image into segments comprises a question-answering technique.
However, Sadhuram et al. does teach the method wherein segregating an image into segments comprises a question-answering technique. (Section 3.4, QA System)
It would have been obvious to one with ordinary skill in the art before the effective filing date to identify relevant segments with a question-answering technique for the same motivation as used above for claim 4.
Claim(s) 12, 15-22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vig et al. (US 20200175304 A1) in view of Gillick (US 20020133341 A1) in view of Visser et al. (US 20150301796 A1).
With respect to claim 12, Vig et al. discloses a system for retrieving data from a database of documents, the system comprising:
a data storage engine configured to store documents in the database; ([0040], Figure 2)
a document conversion engine configured to convert the documents in the database to text; ([0009], using OCR to convert input images to text)
an information retrieval engine configured to retrieve documents in the database based on at least one natural language processing (NLP) technique; ([0041], converting natural language queries into SQL)
and an information extraction engine configured to extract information from the retrieved documents and supply the extracted information as the retrieved data. ([0041]-[0044], queries extract relevant information)
Vig et al. does not teach generating a confidence score with respect to the extracted information and sending a message requesting manual intervention when the confidence score is lower than a threshold, wherein the message includes a reason for the confidence score being below the threshold.
Gillick does teach sending a message requesting for manual intervention when the confidence score is lower than a threshold, wherein the message includes a reason for the confidence score being below the threshold. ([0039]-[0041]: compare a confidence estimate of an extracted text string to a threshold; if the confidence estimate is below the threshold, forward information to a human reviewer including information indicating the cause of the low confidence score)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to have calculated confidence scores for extracted information as it allows the system to determine whether to forward the issue to a human reviewer for correction (Gillick [0039]-[0041]).
Vig et al. and Gillick do not disclose wherein the reason for the confidence score being below the threshold comprises one or more of: incomplete or missing information; inconsistent information; unclear information; and calculation verification required.
However, Visser et al. does disclose wherein the reason for the confidence score being below the threshold comprises one or more of: incomplete or missing information; inconsistent information; unclear information; and calculation verification required. ([0130]: when verification fails, output a reason as to why [noisy/distant speech is unclear information]; [0184]: verification success status is dependent on confidence level being above a threshold)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to include reasons such as unclear information for the confidence level being below the threshold as taught by Visser et al. when implementing the system of Vig et al. and Gillick because it enables specific suggestions to correct the issue (see Visser et al. [0132]).
With respect to claim 15, limitations from parent claim 12 are addressed above. Vig et al. further discloses the document conversion engine configured to convert image documents to text. ([0036], performing OCR on images)
With respect to claim 16, limitations from parent claim 12 are addressed above. Vig et al. further discloses the conversion of image documents to text that uses a trained handwriting recognition model for handwritten text, and an optical character recognition algorithm for machine-typed text. ([0031]-[0032], [0036]; handwritten text recognition engine for handwritten text, OCR for machine-typed text)
With respect to claim 17, limitations from parent claim 12 are addressed above. Vig et al. further discloses the system wherein the conversion of image documents to text further uses a trained model to distinguish between handwritten text and machine-typed text. ([0031])
With respect to claim 18, limitations from parent claim 12 are addressed above. Vig et al. further discloses the conversion of image documents to text using a trained unified text recognition model for both handwritten text and machine-typed text. ([0024], vision algorithms that recognize a combination of handwritten and printed text)
With respect to claim 19, limitations from parent claim 12 are addressed above. Vig et al. further discloses the document conversion engine configured to convert documents that include tables to text. ([0036], [0047], [0050], tables are one of the entities recognized)
With respect to claim 20, limitations from parent claim 12 are addressed above. Vig et al. further discloses the document conversion engine configured to convert documents that include multiple columns to text. ([0036], Figure 6; conversion of tables and other listed elements necessarily implies the presence of multiple columns)
With respect to claim 21, limitations from parent claim 12 are addressed above. Vig et al. further discloses the system wherein the information retrieval engine uses one or more of knowledge-based techniques, rule-based techniques, keyword-based techniques, and deep-learning NLP model based techniques. ([0041], sequence to sequence models)
With respect to claim 22, limitations from parent claim 12 are addressed above. Vig et al. further discloses the system wherein the information extraction engine uses one or more of knowledge-based techniques, rule-based techniques, keyword-based techniques, and deep-learning NLP model based techniques. ([0041], sequence to sequence models)
Claim(s) 13-14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vig et al. in view of Gillick as applied to claim 12 above, and further in view of Nor et al. ("IMAGE SEGMENTATION AND TEXT EXTRACTION: APPLICATION TO THE EXTRACTION OF TEXTUAL INFORMATION IN SCENE IMAGES").
With respect to claim 13, Vig et al., Visser et al., and Gillick teach their dependent elements as written above for claim 12. They do not teach the document conversion engine configured to convert PDF documents to text.
However, Nor et al. does teach the document conversion engine configured to convert PDF documents to text. (Section 3-4, PDF document converted to an image; the rest of the reference details conversion of the image to text with OCR)
PDF is a commonly used file type, so converting PDF documents to a purely textual format is highly desirable and common in the art. It would have been obvious to one with ordinary skill in the art before the effective filing date to additionally convert PDF documents to text because the extraction of information from such documents is desirable (Nor et al. Section 1).
With respect to claim 14, Vig et al., Visser et al., and Gillick teach their dependent elements as written above for claim 12. They do not teach the document conversion engine configured to convert PDF documents to images.
However, Nor et al. does teach the document conversion engine configured to convert PDF documents to images. (Section 3)
Vig et al. teaches the conversion of images to text, but does not make mention of PDF documents. Nor et al. teaches the conversion of PDF documents to images as part of the overall process of converting those documents to text. It would have been obvious to one with ordinary skill in the art before the effective filing date to convert PDF documents to images as part of Vig et al.’s image conversion system in order to allow OCR to be performed on them (Nor et al. Section 3).
Claim(s) 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vig et al. in view of Gillick as applied to claim 12 above, and further in view of Sadhuram et al. ("Natural Language Processing based New Approach to Design Factoid Question Answering System").
With respect to claim 23, Vig et al., Visser et al., and Gillick disclose the elements of parent claim 12 as written above. However, they do not teach the information extraction engine configured to receive a set of keywords, compare the keywords with the text from the converted documents using a similarity measure to identify matching portions of text, and select the matching portions of text as the extracted information.
Sadhuram et al. does teach the information extraction engine configured to receive a set of keywords, compare the keywords with the text from the converted documents using a similarity measure to identify matching portions of text, and select the matching portions of text as the extracted information. (Section 3.4, QA System and Passage Retrieval)
Vig et al. teaches the conversion of natural language queries to SQL in order to find relevant data. It does not teach searching for relevant data by using a similarity measure between an input query and the passages in the corpus, which Sadhuram et al. does teach. This is a known method of finding relevant information in a collection in the art. It would have been obvious to one with ordinary skill in the art to incorporate Sadhuram et al.’s similarity searching into Vig et al.’s information extraction system because it allows the system to retrieve relevant answers for queried questions (Sadhuram et al. Section 1).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALVIN ISKENDER whose telephone number is (703)756-4565. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, HAI PHAN can be reached on (571) 272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALVIN ISKENDER/Examiner, Art Unit 2654
/HAI PHAN/Supervisory Patent Examiner, Art Unit 2654