DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment and Arguments
Applicant’s amendment filed on December 26, 2025 has been entered and made of record. Claims 1 and 3-20 are pending and are being examined in this application.
Applicant’s arguments with respect to the 103 rejections have been fully considered, but are unpersuasive for at least the following reasons:
Regarding amended claim 1, which now incorporates the subject matter of claim 16, applicant argues that the cited references fail to teach or suggest “in response to a request that includes an individual molecule representation of the molecule, returning a reference to the document.” In particular, applicant argues that “a search engine cannot index information that is not already present in textual form. However, Li does not index documents or provide any mechanism for searching documents by molecule representation. Thus, even if Cordeiro introduces AI-based extraction and identification of metadata of images and tables, combining Cordeiro's metadata with Li's molecular database still does not teach or suggest retrieving documents based on molecular representation” [Remarks, pgs. 5 and 6].
However, Li’s disclosure of looking up a molecular structure in a space database [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1] teaches “querying a molecule reference with the molecule representation” as recited in the third step of claim 1 (not argued by applicant). As such, Li also teaches the claimed “in response to a request that includes an individual molecule representation of the molecule, returning...”
Li further discloses that, in response to the lookup, the space database returns information about the molecule that is associated with the molecular structure [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1]. Thus, Li discloses returning associated information in response to looking up the molecular structure in the space database, but does not disclose that the associated information includes a document.
Cordeiro’s disclosure of extracting images and text from a document, converting the extracted images and text into structured information, and organizing the structured information together with the document [fig. 1; pars. 25-27 and 39] teaches the claimed “associating the document with the molecule data” as recited in the fifth step of claim 1 (not argued by applicant). Cordeiro further teaches indexing and retrieving documents through search engines [par. 25].
As such, Cordeiro’s disclosure of associating a document with structured information and retrieving indexed documents via search engines, combined with Li’s disclosure of returning associated information from a space database in response to a lookup using a molecular structure, clearly teaches the claimed “in response to a request that includes an individual molecule representation of the molecule, returning a reference to the document.”
Regarding amended claim 3, applicant “does not see anywhere that Li discusses creating synthetic documents by inserting images of molecules into text documents, and in particular, into unrelated text documents” [Remarks, pg. 7].
However, Li discloses that synthetic images (i.e., images with randomly replaced atoms) are combined with text data to train a fusion model capable of processing documents with both image and text data; the fusion model uses machine learning to perform image recognition and named entity recognition [pg. 2, second half; pg. 5, second half; pg. 8, last 3 pars.]. In other words, the original image of the molecule is related to the text / document from which it was extracted, but the synthetic image comprising the randomly replaced atoms is no longer related. Also, combining the synthetic image with the text data is considered to be a synthetic document or, alternatively, associating the synthetic image with the document (i.e., in combination with Cordeiro) is considered to be a synthetic document.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 and 3-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (CN 115458077A, translation provided) in view of Cordeiro et al. (US Pub. 20250046110).
Referring to claim 1, Li discloses A method comprising:
extracting an image of a molecule from a document [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; image data of a molecule is extracted from a document (e.g., a patent document)];
converting the image to a molecule representation [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; the image data is converted to a molecular structure represented in the SMILES format];
querying a molecule reference with the molecule representation [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; the molecular structure is looked up in a space database];
retrieving molecule data from the molecule reference [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; the space database returns information about the molecule that is associated with the molecular structure]; and
in response to a request that includes an individual molecule representation of the molecule, returning... [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; note lookups in the space database using the molecular structure to return the associated information].
Li does not appear to explicitly disclose associating the document with the molecule data in a metadata database; and in response to a request that includes an individual molecule representation of the molecule, returning a reference to the document.
However, Cordeiro discloses associating the document with the molecule data in a metadata database [fig. 1; pars. 25-27 and 39; images and text are extracted from a document via image text recognition; the extracted images and text are converted into structured information via image classification and named entity identification, respectively; the structured information for the images and the text is stored in separate files but organized together in a folder for the document]; and in response to a request that includes an individual molecule representation of the molecule, returning a reference to the document [par. 25; note that providing the structured information to a search engine would return links to documents having the structured information that was aggregated by a metadata aggregator].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the processing of image data and text data from a document taught by Li so that the image data and the text data (i.e., structured information) are associated with the document from which they were extracted as taught by Cordeiro, with a reasonable expectation of success. The motivation for doing so would have been to make it possible to index and subsequently retrieve the document through search engines using the structured information [Cordeiro, par. 25].
Referring to claim 3, Li discloses The method of claim 1, wherein the image of the molecule is extracted from the document using an image extraction machine learning model trained on synthetic documents, and wherein the synthetic documents are created by inserting images of molecules into unrelated text documents [pg. 2, second half; pg. 5, second half; pg. 8, last 3 pars.; synthetic images (i.e., images with randomly replaced atoms) are combined with text data to train a fusion model capable of processing documents with both image and text data; the fusion model uses machine learning to perform image recognition and named entity recognition].
Referring to claim 4, Li discloses The method of claim 3, wherein the image extraction machine learning model is refined by manually tagging images of molecules identified in real world documents by the image extraction machine learning model [pg. 11, par. 4; note the manual processing].
Referring to claim 5, Li discloses The method of claim 1, further comprising: embedding the molecule data into the document [pg. 4, par. 3; pg. 8, last par. – pg. 9, par. 3; the molecular structure from the image data is fused with the information about the molecule from text data to generate a fusion of the image data and the text information, which is stored in the space database that associates the molecular structure with the information about the molecule (e.g., synthetic property, drug property, and pharmacological activity)].
Referring to claim 6, Li discloses The method of claim 1, wherein converting the image to the molecule representation comprises: providing the image to a structure identification machine learning model [pg. 2, second half; pg. 4, pars. 4-8; pg. 10, par. 2; the image data is converted to the molecular structure via image recognition using machine learning].
Referring to claim 7, Li discloses The method of claim 6, wherein the structure identification machine learning model predicts a location of an atom in the molecule and one or more bonds between atoms of the molecule, and wherein the molecule information is generated from the predicted atom location and the predicted one or more bonds [pg. 3, first half; when the image data and text data are provided as input to the fusion model, the fusion model outputs the molecular structure in the SMILES format (which includes bond information), key and charge classification and coordinate (i.e., location) information and substituent molecule].
Referring to claim 8, see at least the rejection for claim 1. Li further discloses A system comprising: a processing unit; and a computer-readable storage medium having computer-executable instructions stored thereupon, which, when executed by the processing unit, cause the processing unit to perform the claimed steps [pg. 7, par. 4; various embodiments may be implemented using instruction code stored in computer accessible memory].
Referring to claim 9, Li discloses The system of claim 8, wherein the molecule data comprises a graphic representation of the molecule obtained from the molecule reference [pg. 4, par. 3; pg. 8, last par. – pg. 9, par. 3; note the fusion of the image data and the text data stored in the space database; see also fig. 3 of Cordeiro, displaying an image of the structured information].
Referring to claim 10, Cordeiro discloses The system of claim 8, wherein the molecule data is displayed in a user interface of an application that displays the document [fig. 3; note the displaying of the structured information associated with the document].
Referring to claim 11, see the rejection for claim 3.
Referring to claim 12, see the rejection for claim 6.
Referring to claim 13, Li discloses The system of claim 8, wherein the molecule data comprises a name, a molecular formula, or a molecular weight [abstract; pg. 8, pars. 2 and 3; the information about the molecule includes substituent compounds (i.e., molecular formula)].
Referring to claim 14, Cordeiro discloses The system of claim 8, wherein the molecule data is embedded with a page number of the image [fig. 3; each structured information is associated with a page number of its source image in an XML file].
Referring to claim 15, Li discloses The system of claim 8, wherein the molecule representation comprises a text-based representation [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; note the SMILES format].
Referring to claim 16, see at least the rejection for claim 1. Li further discloses A computer-readable storage medium having encoded thereon computer-readable instructions that when executed by a processing unit cause a system to perform the claimed steps [pg. 7, par. 4; various embodiments may be implemented using instruction code stored in computer accessible memory].
Referring to claim 17, see the rejection for claim 15.
Referring to claim 18, Li discloses The computer-readable storage medium of claim 17, wherein the molecule representation comprises a Simplified Molecular Input Line Entry System (SMILES) [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; note the SMILES format].
Referring to claim 19, Cordeiro discloses The computer-readable storage medium of claim 16, wherein the individual molecule representation was embedded in another document, and wherein the other document includes another image of the molecule [par. 25; note that providing the structured information to a search engine would return links to other documents including the structured information, including other documents having other images with the same structured information].
Referring to claim 20, Cordeiro discloses The computer-readable storage medium of claim 16, wherein the individual molecule representation was listed in a search result received from the metadata database [par. 25; note that providing the structured information to a search engine would return search results of documents having the structured information aggregated by the metadata aggregator].
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GRACE PARK whose telephone number is (571)270-7727. The examiner can normally be reached M-F 8AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TAMARA KYLE, can be reached at (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Grace Park/
Primary Examiner, Art Unit 2144