DETAILED ACTION
Notice of AIA Status
The present application is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/01/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claims 12, 15-17, and 19 recite limitations that use the word “means” (or “step”) or similar generic placeholder terms with functional language, and thus invoke 35 U.S.C. 112(f):
Claim 12 recites the limitation “the processing device to……” [Lines 3-4].
Claim 15 recites the limitation “the processing device is to……” [Lines 1-2].
Claim 16 recites the limitation “the processing device is to……” [Lines 2-3].
Claim 17 recites the limitation “the processing device is to……” [Line 1].
Claim 19 recites the limitation “executed by a processing device, cause the processing device to……” [Line 2].
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
After a careful analysis, as disclosed above, and a careful review of the specification, the following limitation in claims 12, 15-17, and 19 has been considered:
“Processing device” (Fig. 7, #702; Paragraph [0068]: “Processing device 702 (which can include processing logic 703) represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, the processing device 702 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processing device 702 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like.”) thus has sufficient corresponding structure or material, namely hardware containing any suitable number and types of processors or other processing devices, including one or more microprocessors, microcontrollers, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or discrete circuitry.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 11-15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Semenov (US 20240169752 A1), hereafter referenced as Semenov, in view of Weller et al. (US 20210081664 A1), hereafter referenced as Weller.
Regarding claim 1, Semenov explicitly teaches a method comprising (Fig. 9, Paragraph [0078]- Semenov discloses FIG. 9 is a flow diagram illustrating example method 900 of efficient identification of key-value associations in documents using neural networks, in accordance with some implementations of the present disclosure.):
processing a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute (Fig. 1, Paragraph [0033]- Semenov discloses document context model 330 may transform object embeddings into feature vectors that account for context provided by other objects in document 102. Key hypotheses model 340 may generate hypotheses of association of object(s) in document 102 with various keys.);
processing the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute (Fig. 1, Paragraph [0033]- Semenov discloses similarly, value hypotheses model 350 may generate hypotheses of association of object(s) of document 102 with possible values.);
forming a plurality of combined hypotheses each comprising at least a hypothesis of the first set and a hypothesis of the second set (Fig. 1, Paragraph [0023]- Semenov discloses the output hypotheses may then be processed by a trained KVA model that generates multiple KVA hypotheses, each KVA hypothesis linking a specific hypothesized key with a one of hypothesized value.);
identifying a preferred hypothesis from the plurality of combined hypotheses (Fig. 1, Paragraph [0023]- Semenov discloses different KVA hypotheses may then be combined (e.g., without contradictions, such as a given value associated with multiple different keys) to obtain one or more aggregated hypotheses. A trained evaluator may then evaluate the likelihood (probability) that various aggregated hypotheses are correct and select (e.g., as the hypothesis with the highest likelihood) one of the hypotheses as the final key-value associations of the document.),
the preferred hypothesis associating a first value with the first document attribute and a second value with the second document attribute (Fig. 1, Paragraph [0023]- Semenov discloses different KVA hypotheses may then be combined (e.g., without contradictions, such as a given value associated with multiple different keys) to obtain one or more aggregated hypotheses. A trained evaluator may then evaluate the likelihood (probability) that various aggregated hypotheses are correct and select (e.g., as the hypothesis with the highest likelihood) one of the hypotheses as the final key-value associations of the document.);
Semenov fails to explicitly teach extracting, using the first value and the second value, information content of the document.
However, Weller explicitly teaches extracting, using the first value and the second value, information content of the document (Fig. 3, Paragraph [0033]- Weller discloses at operation 315, a country associated with the electronic document may be identified or otherwise determined based on the identified and extracted items of information. For example, as discussed above with respect to electronic document 200 of FIG. 2, an email address or a particular tax ID may be included in the electronic document 200 which may provide clues as to a country of origin associated with the electronic document. At operation 320, country-dependent operations may be performed on the electronic document to identify and extract additional items of information.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov of a method comprising: processing a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; and processing the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Weller of extracting, using the first value and the second value, information content of the document.
The combination results in Semenov’s system for processing a document wherein information content of the document is extracted using the first value and the second value.
The motivation behind the modification would have been to allow for more accurate extraction of information, since both Semenov and Weller are systems that extract information from documents: Semenov’s system provides improved efficiency of key-value associations in documents, while Weller’s system provides improved accuracy of the data extracted from the document. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Weller et al. (US 20210081664 A1), Paragraphs [0021]-[0022].
Regarding claim 2, Semenov in view of Weller explicitly teaches the method of claim 1,
Semenov further teaches wherein the representation of the document is obtained using one or more optical character recognition (OCR) algorithms (Fig. 4, Paragraph [0043]- Semenov discloses Input document 302 may undergo optical character recognition (OCR) 410-1.).
Regarding claim 3, Semenov in view of Weller explicitly teaches the method of claim 1, Semenov fails to explicitly teach wherein the first document attribute comprises at least one of: a country associated with an originator of the document, a name of the originator of the document, a country referenced in the document, a currency referenced in the document, an address referenced in the document, a language of the document, or a date format used in the document.
However, Weller explicitly teaches wherein the first document attribute comprises at least one of: a country associated with an originator of the document (Fig. 2, Paragraph [0028]- Weller discloses electronic document 200 lists certain information which may be utilized to determine or identify a country of origin of a vendor.),
a name of the originator of the document (Fig. 4, Paragraph [0035]- Weller discloses Extraction of the sender and/or receiver may include, for example, extraction of a name, address, tax ID, e-mail, and/or bank account, to name just a few examples among many.),
a country referenced in the document (Fig. 2, Paragraph [0028]- Weller discloses certain other information located on the electronic document 200 may also be utilized to identify a country of origin, such as the inclusion of an “ABN” number, which may refer to an “Australian Business Number.” In this example embodiment, “ABN: 11 00 222” is shown on electronic document 200, thereby giving an indication that an associated vendor is likely located in Australia. Of course, other information shown on electronic document 200 may also give indications of a country of origin in some embodiments, such as formatting of numbers of a phone or fax number, or of a street address, for example.),
a currency referenced in the document (Fig. 4, Paragraph [0035]- Weller discloses other items of information may also be extracted, such as a document number, a document data, a currency, an amount, a table, and an employee name or ID, for example.),
an address referenced in the document (Fig. 2, Paragraph [0028]- Weller discloses other information shown on electronic document 200 may also give indications of a country of origin in some embodiments, such as formatting of numbers of a phone or fax number, or of a street address, for example.),
a language of the document (Fig. 1, Paragraph [0015]- Weller discloses some invoices may also use a different coloring scheme or a different language. For example, an invoice from a German company may be printed in the German language, whereas an invoice from a company in the United States of America may be printed in the English language.),
or a date format used in the document (Fig. 2, Paragraph [0030]- Weller discloses for example, it may be determined that date “09102018” refers to “Oct. 9, 2018” instead of “Sep. 10, 2018” because a vendor associated with electronic document 200 is determined to likely be located in Australia, where day is typically listed before a month in a date format.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov of a method comprising: processing a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; and processing the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Weller wherein the first document attribute comprises at least one of: a country associated with an originator of the document, a name of the originator of the document, a country referenced in the document, a currency referenced in the document, an address referenced in the document, a language of the document, or a date format used in the document.
The combination results in Semenov’s system for processing a document wherein the first document attribute comprises at least one of: a country associated with an originator of the document, a name of the originator of the document, a country referenced in the document, a currency referenced in the document, an address referenced in the document, a language of the document, or a date format used in the document.
The motivation behind the modification would have been to allow for more accurate extraction of information, since both Semenov and Weller are systems that extract information from documents: Semenov’s system provides improved efficiency of key-value associations in documents, while Weller’s system provides improved accuracy of the data extracted from the document. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Weller et al. (US 20210081664 A1), Paragraphs [0021]-[0022].
Regarding claim 4, Semenov in view of Weller explicitly teaches the method of claim 1, Semenov further teaches wherein identifying the preferred hypothesis comprises: processing, using a hypotheses classifier model, the plurality of combined hypotheses (Fig. 1, Paragraph [0023]- Semenov discloses the output hypotheses may then be processed by a trained KVA model that generates multiple KVA hypotheses, each KVA hypothesis linking a specific hypothesized key with a one of hypothesized value.).
Regarding claim 5, Semenov in view of Weller explicitly teaches the method of claim 1, Semenov further teaches wherein processing the representation of the document to obtain the first set of hypotheses is performed using a first machine learning model (MLM) (Fig. 1, Paragraph [0033]- Semenov discloses document context model 330 may transform object embeddings into feature vectors that account for context provided by other objects in document 102. Key hypotheses model 340 may generate hypotheses of association of object(s) in document 102 with various keys.),
and wherein processing the representation of the document to obtain the second set of hypotheses is performed using a second MLM (Fig. 1, Paragraph [0033]- Semenov discloses similarly, value hypotheses model 350 may generate hypotheses of association of object(s) of document 102 with possible values.).
Regarding claim 11, Semenov in view of Weller explicitly teaches the method of claim 1, Semenov further teaches wherein at least some of the information content is present in the document in an unstructured form (Fig. 1, Paragraph [0019]- Semenov discloses in many instances, however, information is entered into printed or other physical documents or electronic unstructured documents (e.g., a scan of a physical form) using various writing or typing instruments, including pens, pencils, typewriters, printers, stamps, and the like, with filled out forms subsequently scanned or photographed to obtain an unstructured image of the form/document. In other instances, information is entered into unstructured electronic documents using a computer. The unstructured electronic documents may be stored, communicated, and eventually processed by a recipient computer to identify information contained in the documents, including determining values of various populated fields, e.g., using techniques of optical character recognition (OCR).).
Regarding claim 12, Semenov teaches a system comprising (Fig. 1, Paragraph [0029]- Semenov discloses FIG. 1 is a block diagram of an example computer system 100 in which implementations of the disclosure may operate.)
a memory (Fig. 10, Paragraph [0087]- Semenov discloses the exemplary computer system 1000 includes a processing device 1002, a main memory 1004 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM)), a static memory 1006 (e.g., flash memory, static random access memory (SRAM)), and a data storage device 1018, which communicate with each other via a bus 1030.);
and a processing device communicatively coupled to the memory (Fig. 10, Paragraph [0087]- Semenov discloses the exemplary computer system 1000 includes a processing device 1002, a main memory 1004 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM)), a static memory 1006 (e.g., flash memory, static random access memory (SRAM)), and a data storage device 1018, which communicate with each other via a bus 1030.),
the processing device to: process a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute (Fig. 1, Paragraph [0033]- Semenov discloses document context model 330 may transform object embeddings into feature vectors that account for context provided by other objects in document 102. Key hypotheses model 340 may generate hypotheses of association of object(s) in document 102 with various keys.);
process the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute (Fig. 1, Paragraph [0033]- Semenov discloses similarly, value hypotheses model 350 may generate hypotheses of association of object(s) of document 102 with possible values.);
form a plurality of combined hypotheses each comprising at least a hypothesis of the first set and a hypothesis of the second set (Fig. 1, Paragraph [0023]- Semenov discloses the output hypotheses may then be processed by a trained KVA model that generates multiple KVA hypotheses, each KVA hypothesis linking a specific hypothesized key with a one of hypothesized value.);
identify a preferred hypothesis from the plurality of combined hypotheses (Fig. 1, Paragraph [0023]- Semenov discloses different KVA hypotheses may then be combined (e.g., without contradictions, such as a given value associated with multiple different keys) to obtain one or more aggregated hypotheses. A trained evaluator may then evaluate the likelihood (probability) that various aggregated hypotheses are correct and select (e.g., as the hypothesis with the highest likelihood) one of the hypotheses as the final key-value associations of the document.),
the preferred hypothesis associating a first value with the first document attribute and a second value with the second document attribute (Fig. 1, Paragraph [0023]- Semenov discloses different KVA hypotheses may then be combined (e.g., without contradictions, such as a given value associated with multiple different keys) to obtain one or more aggregated hypotheses. A trained evaluator may then evaluate the likelihood (probability) that various aggregated hypotheses are correct and select (e.g., as the hypothesis with the highest likelihood) one of the hypotheses as the final key-value associations of the document.);
Semenov fails to explicitly teach extract, using the first value and the second value, information content of the document.
However, Weller explicitly teaches extract, using the first value and the second value, information content of the document (Fig. 3, Paragraph [0033]- Weller discloses at operation 315, a country associated with the electronic document may be identified or otherwise determined based on the identified and extracted items of information. For example, as discussed above with respect to electronic document 200 of FIG. 2, an email address or a particular tax ID may be included in the electronic document 200 which may provide clues as to a country of origin associated with the electronic document. At operation 320, country-dependent operations may be performed on the electronic document to identify and extract additional items of information.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov of a system comprising: a memory; and a processing device communicatively coupled to the memory, the processing device to: process a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; and process the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Weller of extracting, using the first value and the second value, information content of the document.
The combination results in Semenov’s system for processing a document wherein information content of the document is extracted using the first value and the second value.
The motivation behind the modification would have been to allow for more accurate extraction of information, since both Semenov and Weller are systems that extract information from documents: Semenov’s system provides improved efficiency of key-value associations in documents, while Weller’s system provides improved accuracy of the data extracted from the document. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Weller et al. (US 20210081664 A1), Paragraphs [0021]-[0022].
Regarding claim 13, Semenov in view of Weller teaches the system of claim 12, Semenov further teaches wherein the representation of the document is obtained using one or more optical character recognition (OCR) algorithms (Fig. 4, Paragraph [0043]- Semenov discloses Input document 302 may undergo optical character recognition (OCR) 410-1.).
Regarding claim 14, Semenov in view of Weller teaches the system of claim 12, Semenov fails to explicitly teach wherein the first document attribute comprises at least one of: a country associated with an originator of the document, a name of the originator of the document, a country referenced in the document, a currency referenced in the document, an address referenced in the document, a language of the document, or a date format used in the document.
However, Weller explicitly teaches wherein the first document attribute comprises at least one of: a country associated with an originator of the document (Fig. 2, Paragraph [0028]- Weller discloses electronic document 200 lists certain information which may be utilized to determine or identify a country of origin of a vendor.),
a name of the originator of the document (Fig. 4, Paragraph [0035]- Weller discloses Extraction of the sender and/or receiver may include, for example, extraction of a name, address, tax ID, e-mail, and/or bank account, to name just a few examples among many.),
a country referenced in the document (Fig. 2, Paragraph [0028]- Weller discloses certain other information located on the electronic document 200 may also be utilized to identify a country of origin, such as the inclusion of an “ABN” number, which may refer to an “Australian Business Number.” In this example embodiment, “ABN: 11 00 222” is shown on electronic document 200, thereby giving an indication that an associated vendor is likely located in Australia. Of course, other information shown on electronic document 200 may also give indications of a country of origin in some embodiments, such as formatting of numbers of a phone or fax number, or of a street address, for example.),
a currency referenced in the document (Fig. 4, Paragraph [0035]- Weller discloses other items of information may also be extracted, such as a document number, a document data, a currency, an amount, a table, and an employee name or ID, for example.),
an address referenced in the document (Fig. 2, Paragraph [0028]- Weller discloses other information shown on electronic document 200 may also give indications of a country of origin in some embodiments, such as formatting of numbers of a phone or fax number, or of a street address, for example.),
a language of the document (Fig. 1, Paragraph [0015]- Weller discloses some invoices may also use a different coloring scheme or a different language. For example, an invoice from a German company may be printed in the German language, whereas an invoice from a company in the United States of America may be printed in the English language.),
or a date format used in the document (Fig. 2, Paragraph [0030]- Weller discloses for example, it may be determined that date “09102018” refers to “Oct. 9, 2018” instead of “Sep. 10, 2018” because a vendor associated with electronic document 200 is determined to likely be located in Australia, where day is typically listed before a month in a date format.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov of a system comprising: a memory; and a processing device communicatively coupled to the memory, the processing device to: process a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; and process the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Weller wherein the first document attribute comprises at least one of: a country associated with an originator of the document, a name of the originator of the document, a country referenced in the document, a currency referenced in the document, an address referenced in the document, a language of the document, or a date format used in the document.
The combination results in Semenov’s system for processing a document wherein the first document attribute comprises at least one of: a country associated with an originator of the document, a name of the originator of the document, a country referenced in the document, a currency referenced in the document, an address referenced in the document, a language of the document, or a date format used in the document.
The motivation behind the modification would have been to allow for more accurate extraction of information, since both Semenov and Weller are systems that extract information from documents: Semenov’s system provides improved efficiency of key-value associations in documents, while Weller’s system provides improved accuracy of the data extracted from the document. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Weller et al. (US 20210081664 A1), Paragraphs [0021]-[0022].
Regarding claim 15, Semenov in view of Weller teaches the system of claim 12, Semenov further teaches wherein to identify the preferred hypothesis, the processing device is to: process, using a hypotheses classifier model, the plurality of combined hypotheses (Fig. 1, Paragraph [0023]- Semenov discloses the output hypotheses may then be processed by a trained KVA model that generates multiple KVA hypotheses, each KVA hypothesis linking a specific hypothesized key with a one of hypothesized value.).
Regarding claim 19, Semenov teaches a non-transitory computer-readable memory storing instructions that (Fig. 1, Paragraph [0006]- Semenov discloses a non-transitory machine-readable storage medium is disclosed storing instructions that, when accessed by a processing device, cause a processing device to obtain a plurality of vectors, each vector of the plurality of vectors being representative of one of a plurality of objects in a document.),
when executed by a processing device, cause the processing device to perform operations comprising: processing a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute (Fig. 1, Paragraph [0033]- Semenov discloses document context model 330 may transform object embeddings into feature vectors that account for context provided by other objects in document 102. Key hypotheses model 340 may generate hypotheses of association of object(s) in document 102 with various keys.);
processing the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute (Fig. 1, Paragraph [0033]- Semenov discloses similarly, value hypotheses model 350 may generate hypotheses of association of object(s) of document 102 with possible values.);
forming a plurality of combined hypotheses each comprising at least a hypothesis of the first set and a hypothesis of the second set (Fig. 1, Paragraph [0023]- Semenov discloses the output hypotheses may then be processed by a trained KVA model that generates multiple KVA hypotheses, each KVA hypothesis linking a specific hypothesized key with one of the hypothesized values.);
identifying a preferred hypothesis from the plurality of combined hypotheses (Fig. 1, Paragraph [0023]- Semenov discloses different KVA hypotheses may then be combined (e.g., without contradictions, such as a given value associated with multiple different keys) to obtain one or more aggregated hypotheses. A trained evaluator may then evaluate the likelihood (probability) that various aggregated hypotheses are correct and select (e.g., as the hypothesis with the highest likelihood) one of the hypotheses as the final key-value associations of the document.),
the preferred hypothesis associating a first value with the first document attribute and a second value with the second document attribute (Fig. 1, Paragraph [0023]- Semenov discloses different KVA hypotheses may then be combined (e.g., without contradictions, such as a given value associated with multiple different keys) to obtain one or more aggregated hypotheses. A trained evaluator may then evaluate the likelihood (probability) that various aggregated hypotheses are correct and select (e.g., as the hypothesis with the highest likelihood) one of the hypotheses as the final key-value associations of the document.);
Semenov fails to explicitly teach extracting, using the first value and the second value, information content of the document.
However, Weller explicitly teaches extracting, using the first value and the second value, information content of the document (Fig. 3, Paragraph [0033]- Weller discloses at operation 315, a country associated with the electronic document may be identified or otherwise determined based on the identified and extracted items of information. For example, as discussed above with respect to electronic document 200 of FIG. 2, an email address or a particular tax ID may be included in the electronic document 200 which may provide clues as to a country of origin associated with the electronic document. At operation 320, country-dependent operations may be performed on the electronic document to identify and extract additional items of information.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov of a non-transitory computer-readable memory storing instructions that, when executed by a processing device, cause the processing device to perform operations comprising: processing a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; processing the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Weller of extracting, using the first value and the second value, information content of the document.
The combination would result in Semenov’s system for processing a document that extracts, using the first value and the second value, information content of the document.
The motivation behind the modification would have been to allow for a more accurate extraction of information, since both Semenov and Weller are systems that extract information from documents. Semenov’s system provides improved efficiency of key-value associations in documents, while Weller’s system provides improved accuracy of the data extracted from the document. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Weller et al. (US 20210081664 A1), Paragraphs [0021]-[0022].
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Semenov (US 20240169752 A1), hereafter referenced as Semenov, in view of Weller et al. (US 20210081664 A1), hereafter referenced as Weller, and Semenov (US 20210150338 A1), hereafter referenced as Semenov2.
Regarding claim 6, Semenov in view of Weller teaches the method of claim 5, but fails to explicitly teach wherein the representation of the document comprises a plurality of vectors each associated with a respective symbol sequence of a plurality of symbol sequences of the document, and wherein at least one of the first MLM or second MLM comprises a first subnetwork and a second subnetwork, wherein the first subnetwork processes the plurality of vectors along a horizontal dimension of the document, and the second subnetwork processes the plurality of vectors along a vertical dimension of the document.
However, Semenov2 explicitly teaches wherein the representation of the document comprises a plurality of vectors each associated with a respective symbol sequence of a plurality of symbol sequences of the document (Fig. 1, Paragraph [0068]- Semenov2 discloses the field detection engine 111 may input the symbol sequences SymSeq(x,y) into the subsystem A 240 to generate feature vector representations for each of the symbol sequences: SymSeq(x,y).fwdarw.vec(x,y).),
and wherein at least one of the first MLM or second MLM comprises a first subnetwork and a second subnetwork (Fig. 3, Paragraph [0079]- Semenov2 discloses the subsystem 300 may include one or more neural networks each containing a plurality of layers of neurons. In some implementation, the subsystem 300 may include two neural networks, a horizontal-pass network 310 and a vertical-pass network 320.),
wherein the first subnetwork processes the plurality of vectors along a horizontal dimension of the document (Fig. 3, Paragraph [0080]- Semenov2 discloses the horizontal-pass network 310 and the vertical-pass network 320 may perform a plurality of passes along the horizontal (x) and vertical (y) dimensions of the cube 250.),
and the second subnetwork processes the plurality of vectors along a vertical dimension of the document (Fig. 3, Paragraph [0080]- Semenov2 discloses the horizontal-pass network 310 and the vertical-pass network 320 may perform a plurality of passes along the horizontal (x) and vertical (y) dimensions of the cube 250.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov in view of Weller of a method comprising: processing a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; processing the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Semenov2 wherein the representation of the document comprises a plurality of vectors each associated with a respective symbol sequence of a plurality of symbol sequences of the document, and wherein at least one of the first MLM or second MLM comprises a first subnetwork and a second subnetwork, wherein the first subnetwork processes the plurality of vectors along a horizontal dimension of the document, and the second subnetwork processes the plurality of vectors along a vertical dimension of the document.
The combination would result in Semenov’s system for processing a document wherein the representation of the document comprises a plurality of vectors each associated with a respective symbol sequence of a plurality of symbol sequences of the document, and wherein at least one of the first MLM or second MLM comprises a first subnetwork and a second subnetwork, wherein the first subnetwork processes the plurality of vectors along a horizontal dimension of the document, and the second subnetwork processes the plurality of vectors along a vertical dimension of the document.
The motivation behind the modification would have been to allow for higher-quality and more accurate field detection, since both Semenov and Semenov2 are systems that extract information from documents. Semenov’s system provides improved efficiency of key-value associations in documents, while Semenov2’s system provides improved accuracy and quality of the detection results. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Semenov2 (US 20210150338 A1), Paragraph [0035].
Regarding claim 16, Semenov in view of Weller teaches the system of claim 12. Semenov further teaches wherein to process the representation of the document to obtain the first set of hypotheses and the second set of hypotheses, the processing device is to use a machine learning model (MLM) (Fig. 1, Paragraph [0033]- Semenov discloses document context model 330 may transform object embeddings into feature vectors that account for context provided by other objects in document 102. Key hypotheses model 340 may generate hypotheses of association of object(s) in document 102 with various keys.).
Semenov in view of Weller fails to explicitly teach wherein the representation of the document comprises a plurality of vectors each associated with a respective symbol sequence of a plurality of symbol sequences of the document, and wherein at least one of the MLM comprises a first subnetwork and a second subnetwork, the first subnetwork processing the plurality of vectors along a horizontal dimension of the document and the second subnetwork processing the plurality of vectors along a vertical dimension of the document.
However, Semenov2 explicitly teaches wherein the representation of the document comprises a plurality of vectors each associated with a respective symbol sequence of a plurality of symbol sequences of the document (Fig. 1, Paragraph [0068]- Semenov2 discloses the field detection engine 111 may input the symbol sequences SymSeq(x,y) into the subsystem A 240 to generate feature vector representations for each of the symbol sequences: SymSeq(x,y).fwdarw.vec(x,y).),
and wherein at least one of the MLM comprises a first subnetwork and a second subnetwork (Fig. 3, Paragraph [0079]- Semenov2 discloses the subsystem 300 may include one or more neural networks each containing a plurality of layers of neurons. In some implementation, the subsystem 300 may include two neural networks, a horizontal-pass network 310 and a vertical-pass network 320.),
the first subnetwork processing the plurality of vectors along a horizontal dimension of the document (Fig. 3, Paragraph [0080]- Semenov2 discloses the horizontal-pass network 310 and the vertical-pass network 320 may perform a plurality of passes along the horizontal (x) and vertical (y) dimensions of the cube 250.)
and the second subnetwork processing the plurality of vectors along a vertical dimension of the document (Fig. 3, Paragraph [0080]- Semenov2 discloses the horizontal-pass network 310 and the vertical-pass network 320 may perform a plurality of passes along the horizontal (x) and vertical (y) dimensions of the cube 250.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov in view of Weller of a system comprising: a memory; and a processing device communicatively coupled to the memory, the processing device to: process a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; process the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Semenov2 wherein the representation of the document comprises a plurality of vectors each associated with a respective symbol sequence of a plurality of symbol sequences of the document, and wherein at least one of the MLM comprises a first subnetwork and a second subnetwork, the first subnetwork processing the plurality of vectors along a horizontal dimension of the document and the second subnetwork processing the plurality of vectors along a vertical dimension of the document.
The combination would result in Semenov’s system for processing a document wherein the representation of the document comprises a plurality of vectors each associated with a respective symbol sequence of a plurality of symbol sequences of the document, and wherein at least one of the MLM comprises a first subnetwork and a second subnetwork, the first subnetwork processing the plurality of vectors along a horizontal dimension of the document and the second subnetwork processing the plurality of vectors along a vertical dimension of the document.
The motivation behind the modification would have been to allow for higher-quality and more accurate field detection, since both Semenov and Semenov2 are systems that extract information from documents. Semenov’s system provides improved efficiency of key-value associations in documents, while Semenov2’s system provides improved accuracy and quality of the detection results. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Semenov2 (US 20210150338 A1), Paragraph [0035].
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Semenov (US 20240169752 A1), hereafter referenced as Semenov, in view of Weller et al. (US 20210081664 A1), hereafter referenced as Weller, and Zhdanov (US 11501210 B1), hereafter referenced as Zhdanov.
Regarding claim 7, Semenov in view of Weller teaches the method of claim 5, but fails to explicitly teach further comprising: responsive to the preferred hypothesis being identified with a confidence below a threshold confidence, forwarding the document to a review; and updating training of at least the first MLM based at least on a difference between the first value and a first ground truth value obtained during the review.
However, Zhdanov explicitly teaches further comprising: responsive to the preferred hypothesis being identified with a confidence below a threshold confidence, forwarding the document to a review (Column 4 Lines [0034-41]- Zhdanov discloses here, if the confidence that the fields represent a key value pair is less than a threshold and/or if the confidence that the words within the fields are less than a threshold, human review may be invoked. As such, if any and/or all of the condition(s) are met, the prediction of the ML model(s) may be output. Alternatively, if the conditions are not met, the prediction of the ML model(s) may not be sent for human review.);
and updating training of at least the first MLM based at least on a difference between the first value and a first ground truth value obtained during the review (Column 28 Lines [0045-53]- Zhdanov discloses after performing the first operation 702, the process 700 may include determining a first confidence 704 associated with the first operation 702. For example, the ML model(s) may determine a confidence that the content does not include or contain explicit material. In some instances, if a reviewer performs the first operation, the input or answer to the first operation 702, may be treated as the ground truth or that the content does not contain explicit material. Further in Column 5 Lines [0056-58]- Zhdanov discloses upon receiving the verifications and/or readjustments from the reviewers, as noted above, the ML models may be retrained to more accurately predict outputs.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov in view of Weller of a method comprising: processing a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; processing the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Zhdanov of: responsive to the preferred hypothesis being identified with a confidence below a threshold confidence, forwarding the document to a review; and updating training of at least the first MLM based at least on a difference between the first value and a first ground truth value obtained during the review.
The combination would result in Semenov’s system for processing a document that, responsive to the preferred hypothesis being identified with a confidence below a threshold confidence, forwards the document to a review and updates training of at least the first MLM based at least on a difference between the first value and a first ground truth value obtained during the review.
The motivation behind the modification would have been to allow for more accurate ML models, since both Semenov and Zhdanov are systems that extract information from documents. Semenov’s system provides improved efficiency of key-value associations in documents, while Zhdanov’s system provides improved accuracy of the machine learning models. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Zhdanov (US 11501210 B1), Column 21, Lines 48-67.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Semenov (US 20240169752 A1), hereafter referenced as Semenov, in view of Weller et al. (US 20210081664 A1), hereafter referenced as Weller, and Attar (US 20240160672 A1), hereafter referenced as Attar.
Regarding claim 8, Semenov in view of Weller teaches the method of claim 1, but fails to explicitly teach further comprising: identifying, using a database of attributes, a database value associated with the first document attribute; and responsive to the first value being identified with a confidence above a reference confidence, updating the database value to the first value.
However, Attar explicitly teaches further comprising: identifying, using a database of attributes, a database value associated with the first document attribute (Fig. 3, Paragraph [0049]- Attar discloses that is, the system may be configured to compare the first text data to data contained within one or more known document types (e.g., financial, tax, employment, etc.), to determine a likelihood that a document type match has been found based on one or more document classification confidences.);
and responsive to the first value being identified with a confidence above a reference confidence, updating the database value to the first value (Fig. 1, Paragraph [0039]- Attar discloses the one or more confidence thresholds may be predetermined, such as preset percentages (e.g., 75%, 85%, 95%, etc.), or may be relative, such as based on a comparison of features contained within each document type of the plurality of known document types. The system may be configured to repeat the above iterative process until the correspondence between the first text data and the one or more document types meets or exceeds one or more confidence thresholds.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov in view of Weller of a method comprising: processing a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; processing the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Attar of: identifying, using a database of attributes, a database value associated with the first document attribute; and responsive to the first value being identified with a confidence above a reference confidence, updating the database value to the first value.
The combination would result in Semenov’s system for processing a document that identifies, using a database of attributes, a database value associated with the first document attribute and, responsive to the first value being identified with a confidence above a reference confidence, updates the database value to the first value.
The motivation behind the modification would have been to allow for a more accurate and efficient system, since both Semenov and Attar are systems that extract information from documents. Semenov’s system provides improved efficiency of key-value associations in documents, while Attar’s system provides improved accuracy and efficiency of the system. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Attar et al. (US 20240160672 A1), Paragraph [0088].
Claims 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Semenov (US 20240169752 A1), hereafter referenced as Semenov, in view of Weller et al. (US 20210081664 A1), hereafter referenced as Weller, and Sharma (US 12051255 B1), hereafter referenced as Sharma.
Regarding claim 9, Semenov in view of Weller teaches the method of claim 1, but fails to explicitly teach wherein processing of the representation of the document to obtain the first set of hypotheses and the second set of hypotheses is responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document.
However, Sharma explicitly teaches wherein processing of the representation of the document to obtain the first set of hypotheses and the second set of hypotheses is responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document (Fig. 3, Column 9-10 Lines [0061-66 and 0001-7]- Sharma discloses a fallback path is provided for use in certain circumstances. For example, the message analyzer 302 can determine whether to route the message for processing on the fallback path. The fallback path can be designed to process messages according to a different machine learning model trained on a smaller feature set. This can be used, for example, when the message analyzer determines that the message relates to a client or jurisdictions that the ML model 320 has not been trained for. These can result in features that can bias the classification result. The fallback path can be a mirror of the processing flow described above with the only difference being the features extracted and the features used in the machine learning model on the fallback path.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov in view of Weller of a method comprising: processing a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; processing the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Sharma wherein processing of the representation of the document to obtain the first set of hypotheses and the second set of hypotheses is responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document.
The combination would result in Semenov’s system for processing a document wherein processing of the representation of the document to obtain the first set of hypotheses and the second set of hypotheses is responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document.
The motivation behind the modification would have been to allow for a more efficient system, since both Semenov and Sharma are systems that extract information from documents. Semenov’s system provides improved efficiency of key-value associations in documents, while Sharma’s system provides improved efficiency of the system. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Sharma (US 12051255 B1), Column 3, Lines 21-43.
Regarding claim 20, Semenov in view of Weller teaches the non-transitory computer-readable memory of claim 19, but fails to explicitly teach wherein processing of the representation of the document to obtain the first set of hypotheses and the second set of hypotheses is responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document.
However, Sharma explicitly teaches wherein processing of the representation of the document to obtain the first set of hypotheses and the second set of hypotheses is responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document (Fig. 3, Column 9-10 Lines [0061-66 and 0001-7]- Sharma discloses a fallback path is provided for use in certain circumstances. For example, the message analyzer 302 can determine whether to route the message for processing on the fallback path. The fallback path can be designed to process messages according to a different machine learning model trained on a smaller feature set. This can be used, for example, when the message analyzer determines that the message relates to a client or jurisdictions that the ML model 320 has not been trained for. These can result in features that can bias the classification result. The fallback path can be a mirror of the processing flow described above with the only difference being the features extracted and the features used in the machine learning model on the fallback path.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov in view of Weller of a non-transitory computer-readable memory storing instructions that, when executed by a processing device, cause the processing device to perform operations comprising: processing a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; processing the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Sharma wherein processing of the representation of the document to obtain the first set of hypotheses and the second set of hypotheses is responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document.
The combination would result in Semenov’s system for processing a document wherein processing of the representation of the document to obtain the first set of hypotheses and the second set of hypotheses is responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document.
The motivation behind the modification would have been to allow for a more efficient system, since both Semenov and Sharma are systems that extract information from documents. Semenov’s system provides improved efficiency of key-value associations in documents, while Sharma’s system provides improved efficiency of the system. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Sharma (US 12051255 B1), Column 3, Lines 21-43.
Claims 10 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Semenov (US 20240169752 A1), hereafter referenced as Semenov, in view of Weller et al. (US 20210081664 A1), hereafter referenced as Weller, and Erle (US 20160162569 A1), hereafter referenced as Erle.
Regarding claim 10, Semenov in view of Weller teaches the method of claim 1, but fails to explicitly teach wherein extracting the information content of the document comprises: using an auxiliary information that is selected based on at least one of the first value of the first document attribute or the second value of the second document attribute.
However, Erle explicitly teaches wherein extracting the information content of the document comprises: using an auxiliary information that is selected based on at least one of the first value of the first document attribute or the second value of the second document attribute (Fig. 1, Paragraph [0039]- Erle discloses the document text, tokens, tags, document metadata and auxiliary data are used by a feature extracting algorithm as described in application (Attorney Docket No. 1402805.00017_IDB017), which is incorporated herein by reference.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov in view of Weller of a method comprising: processing a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; processing the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute, with the teachings of Erle wherein extracting the information content of the document comprises: using an auxiliary information that is selected based on at least one of the first value of the first document attribute or the second value of the second document attribute.
The combination would result in Semenov’s system for processing a document wherein extracting the information content of the document comprises: using an auxiliary information that is selected based on at least one of the first value of the first document attribute or the second value of the second document attribute.
The motivation behind the modification would have been to allow for improved performance of the machine learning models, since both Semenov and Erle are systems that extract information from documents. Semenov’s system provides improved efficiency of key-value associations in documents, while Erle’s system provides improved performance in generating a natural language model. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Erle et al. (US 20160162569 A1), Paragraph [0026].
Regarding claim 18, Semenov in view of Weller teaches the system of claim 12, but fails to explicitly teach wherein extracting the information content of the document comprises: using an auxiliary information that is selected based on at least one of the first value of the first document attribute or the second value of the second document attribute.
However, Erle explicitly teaches wherein extracting the information content of the document comprises: using auxiliary information that is selected based on at least one of the first value of the first document attribute or the second value of the second document attribute (Fig. 1, Paragraph [0039] - Erle discloses that the document text, tokens, tags, document metadata, and auxiliary data are used by a feature extracting algorithm as described in the application (Attorney Docket No. 1402805.00017_IDB017), which is incorporated herein by reference.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov in view of Weller of a system comprising: a memory; and a processing device communicatively coupled to the memory, the processing device to: process a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; process the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute with the teachings of Erle wherein extracting the information content of the document comprises: using auxiliary information that is selected based on at least one of the first value of the first document attribute or the second value of the second document attribute.
The combination would result in Semenov’s system for processing a document wherein extracting the information content of the document comprises using auxiliary information that is selected based on at least one of the first value of the first document attribute or the second value of the second document attribute.
The motivation behind the modification would have been to allow for improved performance of the machine learning models, since both Semenov and Erle are directed to systems that extract information from documents. Semenov’s system provides improved efficiency of key-value associations in documents, while Erle’s system provides improved performance in generating a natural language model. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Erle et al. (US 20160162569 A1), Paragraph [0026].
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Semenov (US 20240169752 A1), hereafter referenced as Semenov, in view of Weller et al. (US 20210081664 A1), hereafter referenced as Weller, Semenov (US 20210150338 A1), hereafter referenced as Semenov2, and Sharma (US 12051255 B1), hereafter referenced as Sharma.
Regarding claim 17, Semenov in view of Weller and Semenov2 teaches the system of claim 16. Semenov in view of Weller and Semenov2 fails to explicitly teach wherein the processing device is to process the representation of the document to obtain the first set of hypotheses and the second set of hypotheses responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document.
However, Sharma explicitly teaches wherein the processing device is to process the representation of the document to obtain the first set of hypotheses and the second set of hypotheses responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document (Fig. 3, Column 9, Line 61 - Column 10, Line 7 - Sharma discloses that a fallback path is provided for use in certain circumstances. For example, the message analyzer 302 can determine whether to route the message for processing on the fallback path. The fallback path can be designed to process messages according to a different machine learning model trained on a smaller feature set. This can be used, for example, when the message analyzer determines that the message relates to a client or jurisdiction that the ML model 320 has not been trained for, which can result in features that bias the classification result. The fallback path can be a mirror of the processing flow described above, with the only difference being the features extracted and the features used in the machine learning model on the fallback path.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Semenov in view of Weller and Semenov2 of a system comprising: a memory; and a processing device communicatively coupled to the memory, the processing device to: process a representation of a document to obtain a first set of hypotheses each associating the document with a respective value of a first document attribute; process the representation of the document to obtain a second set of hypotheses each associating the document with a respective value of a second document attribute with the teachings of Sharma wherein the processing device is to process the representation of the document to obtain the first set of hypotheses and the second set of hypotheses responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document.
The combination would result in Semenov’s system for processing a document wherein the processing device is to process the representation of the document to obtain the first set of hypotheses and the second set of hypotheses responsive to identifying that a database of attributes is unavailable for a type of documents that is associated with the document.
The motivation behind the modification would have been to allow for a more efficient system, since both Semenov and Sharma are directed to systems that extract information from documents. Semenov’s system provides improved efficiency of key-value associations in documents, while Sharma’s system provides improved efficiency of the overall system. Please see Semenov et al. (US 20240169752 A1), Paragraph [0024], and Sharma et al. (US 12051255 B1), Column 3, Lines 21-43.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Langseth et al. (US 20070011175 A1) - A system and method of making unstructured data available to structured data analysis tools. The system includes middleware software that can be used in combination with structured data tools to perform analysis on both structured and unstructured data. Data can be read from a wide variety of unstructured sources. The data may then be transformed with commercial data transformation products that may, for example, extract individual pieces of data and determine relationships between the extracted data. The transformed data and relationships may then be passed through an extract/transform/load (ETL) layer and placed in a structured schema. The structured schema may then be made available to commercial or proprietary structured data analysis tools. Please see Fig. 1 and Abstract.
Owen et al. (US 20220350814 A1) - Data from multiple data sources, and in multiple different formats, can be processed accurately and automatically using an intelligent data extraction system. Input data can be processed using a first neural network to infer a classification. Based at least in part upon this classification, a processing workflow can be generated that includes a number of different analytical tools (such as engines, tools, and services) that are able to accurately identify and extract different types of data. Candidate results from these tools can include values for determined attributes, along with associated confidences in those values. An intelligent selection engine, which may also include a neural network, can analyze these values and confidences to select the appropriate value(s) for each of these attributes from the input data. The selected and merged data may be stored using a determined description language, in order to provide for consistent output and presentation of the extracted data. Please see Fig. 1 and Abstract.
Datta et al. (US 20140143254 A1) - Systems and methods can determine categories for product searches. One or more computing devices can receive a product query of search terms. The product query can be classified to identify a product category. The search terms may be verified against an ambiguous term list for the product category. The search terms may also be verified against an attribute list for the product category. The product query may be classified as fully understood in response to all of the search terms matching either the ambiguous term list or the attribute list for the product category. A product search may be performed on the product query. The product search may be informed by the product category when the product query has been classified as fully understood. Search results may be generated and returned according to the product search. Please see Fig. 1 and Abstract.
Balakrishnan et al. (US 20210124919 A1) - A system and methods directed to the authentication/verification of identification and other documents. Such documents may include identity cards, driver's licenses, passports, documents being used to show a proof of registration or certification, voter ballots, data entry forms, etc. The authentication or verification process may be performed for purposes of control of access to information, control of access to and/or use of a venue, a method of transport, or a service, for assistance in performing a security function, to establish eligibility for and enable provision of a government provided service or benefit, etc. The authentication or verification process may also or instead be performed for purposes of verifying a document itself as authentic so that the information it contains can confidently be assumed to be accurate and reliable. Please see Fig. 1 and Abstract.
SHAABAN et al. (US 20220121881 A1) - Systems and methods for enabling target data to be extracted from documents are disclosed herein. In an embodiment, a method of enabling target data to be extracted from documents includes accessing a database including a plurality of documents including target data, for each of multiple of the documents, creating a region tensor based on extracted text including the target data, for each of the multiple of the documents, creating a label tensor based on an area including the target data, and using the region tensor and the label tensor, training an extraction algorithm to extract the target data from additional documents. Please see Fig. 1 and Abstract.
Peng et al. (US 20240048558 A1) - Disclosed is a device authentication method used in a server, comprising: (S11) receiving a certification request sent by at least one terminal device; (S12) parsing the certification request so as to perform authentication on physical code information of the terminal device according to a preset device table; (S13) in a situation where the physical code information of the terminal device matches a preset terminal device code, determining that the terminal device passes authentication; (S14) in a situation where the physical code information of the terminal device does not match any preset terminal device code in the preset device table and the total number of preset terminal device codes in the preset device table has not reached a threshold, in response to an add-to-device table operation, adding the physical code information of the terminal device to the preset device table and determining that the terminal device passes authentication. Please see Fig. 1 and Abstract.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LUCIUS C.G. ALLEN whose telephone number is (703)756-5987. The examiner can normally be reached Mon - Fri 8-5pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chineyere Wills-Burns, can be reached at (571)272-9752. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LUCIUS CAMERON GREEN ALLEN/Examiner, Art Unit 2673
/CHINEYERE WILLS-BURNS/Supervisory Patent Examiner, Art Unit 2673