Last updated: May 29, 2026

Application No. 18/385,873

ENHANCING DOCUMENT METADATA WITH CONTEXTUAL MOLECULAR INTELLIGENCE

Final Rejection §103

Filed

Oct 31, 2023

Examiner

PARK, GRACE A

Art Unit

2144

Tech Center

2100 — Computer Architecture & Software

Assignee

Microsoft Technology Licensing, LLC

OA Round

2 (Final)

Interview Optional

Examiner has a relatively high allowance rate (76%) and a +18.1% interview lift. A written response may suffice.

Based on 560 resolved cases, 2023–2026

Examiner Intelligence

PARK, GRACE A

Grants 76% — above average

Career Allowance Rate

424 granted / 560 resolved

+20.7% vs TC avg

Strong +18% interview lift

Interview Lift: +18.1% (allowance rate with an interview versus without, among resolved cases)

Typical timeline

3y 4m

Avg Prosecution

18 currently pending

Career history

585

Total Applications

across all art units

Statute-Specific Performance

§101

6.3%

-33.7% vs TC avg

§103

80.8%

+40.8% vs TC avg

§102

7.9%

-32.1% vs TC avg

§112

3.0%

-37.0% vs TC avg

Based on career data from 560 resolved cases

Office Action

§103

DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment and Arguments
Applicant’s amendment filed on December 26, 2025 has been entered and made of record.  Claims 1 and 3-20 are pending and are being examined in this application.
Applicant’s arguments with respect to the 103 rejections have been fully considered, but are unpersuasive for at least the following reasons:
Regarding amended claim 1, which now incorporates the subject matter of claim 16, applicant argues that the cited references fail to teach or suggest “in response to a request that includes an individual molecule representation of the molecule, returning a reference to the document.” In particular, applicant argues that “a search engine cannot index information that is not already present in textual form. However, Li does not index documents or provide any mechanism for searching documents by molecule representation. Thus, even if Cordeiro introduces AI-based extraction and identification of metadata of images and tables, combining Cordeiro's metadata with Li's molecular database still does not teach or suggest retrieving documents based on molecular representation” [Remarks, pgs. 5 and 6].
However, Li’s disclosure of looking up a molecular structure in a space database [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1] teaches “querying a molecule reference with the molecule representation” as recited in the third step of claim 1 (not argued by applicant). As such, Li also teaches the claimed “in response to a request that includes an individual molecule representation of the molecule, returning...”
Li further discloses that, in response to the lookup, the space database returns information about the molecule that is associated with the molecular structure [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1]. Thus, Li discloses returning associated information in response to looking up the molecular structure in the space database, but does not disclose that the associated information includes a document.
Cordeiro’s disclosure of extracting images and text from a document, converting the extracted images and text into structured information, and organizing the structured information together with the document [fig. 1; pars. 25-27, and 39] teaches the claimed “associating the document with the molecule data” as recited in the fifth step of claim 1 (not argued by applicant). Cordeiro further teaches indexing and retrieving documents through search engines [par. 25].
As such, Cordeiro’s disclosure of associating a document with structured information and retrieving indexed documents via search engines, combined with Li’s disclosure of returning associated information from a space database in response to a lookup using a molecular structure, clearly teaches the claimed “in response to a request that includes an individual molecule representation of the molecule, returning a reference to the document.”

Regarding amended claim 3, applicant “does not see anywhere that Li discusses creating synthetic documents by inserting images of molecules into text documents, and in particular, into unrelated text documents” [Remarks, pg. 7].
However, Li discloses that synthetic images (i.e., images with randomly replaced atoms) are combined with text data to train a fusion model capable of processing documents with both image and text data; the fusion model uses machine learning to perform image recognition and named entity recognition [pg. 2, second half; pg. 5, second half; pg. 8, last 3 pars.]. In other words, the original image of the molecule is related to the text / document from which it was extracted, but the synthetic image comprising the randomly replaced atoms is no longer related. Also, combining the synthetic image with the text data is considered to be a synthetic document or, alternatively, associating the synthetic image with the document (i.e., in combination with Cordeiro) is considered to be a synthetic document.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 3-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (CN 115458077A, translation provided) in view of Cordeiro et al. (US Pub. 20250046110).

Referring to claim 1, Li discloses A method comprising: 
extracting an image of a molecule from a document [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; image data of a molecule is extracted from a document (e.g., a patent document)]; 
converting the image to a molecule representation [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; the image data is converted to a molecular structure represented in the SMILES format]; 
querying a molecule reference with the molecule representation [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; the molecular structure is looked up in a space database]; 
retrieving molecule data from the molecule reference [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; the space database returns information about the molecule that is associated with the molecular structure]; and
in response to a request that includes an individual molecule representation of the molecule, returning... [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; note lookups in the space database using the molecular structure to return the associated information].

Li does not appear to explicitly disclose associating the document with the molecule data in a metadata database; and in response to a request that includes an individual molecule representation of the molecule, returning a reference to the document.
However, Cordeiro discloses associating the document with the molecule data in a metadata database [fig. 1; pars. 25-27, and 39; images and text are extracted from a document via image text recognition; the extracted images and text are converted into structured information via image classification and named entity identification, respectively; the structured information for the images and the text are stored in separate files but organized together in a folder for the document]; and in response to a request that includes an individual molecule representation of the molecule, returning a reference to the document [par. 25; note that providing the structured information to a search engine would return links to documents having the structured information that was aggregated by a metadata aggregator].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the processing of image data and text data from a document taught by Li so that the image data and the text data (i.e., structured information) are associated with the document from which they were extracted as taught by Cordeiro, with a reasonable expectation of success. The motivation for doing so would have been to make it possible to index and subsequently retrieve the document through search engines using the structured information [Cordeiro, par. 25].
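As a rough illustration of the combination relied on for claim 1, the sketch below uses in-memory dictionaries: a molecule reference keyed by SMILES string (standing in for Li's space database) and a metadata index that maps a molecule representation to document references (standing in for Cordeiro's association of structured information with its source document). Every name here (`MOLECULE_REFERENCE`, `recognize_structure`, `MetadataIndex`, the sample file names) is hypothetical, and the image-to-SMILES step is stubbed with a lookup table rather than a real recognition model. It is not taken from either reference or from the application.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a molecule reference, keyed by SMILES string.
MOLECULE_REFERENCE = {
    "CCO": {"name": "ethanol", "formula": "C2H6O"},
    "O": {"name": "water", "formula": "H2O"},
}

# Hypothetical stub for image recognition: image identifier -> SMILES.
# A real system would run a structure identification model here.
RECOGNIZED = {"img-001": "CCO", "img-002": "O"}


def recognize_structure(image_id: str) -> str:
    """Convert an extracted image to a molecule representation (stubbed)."""
    return RECOGNIZED[image_id]


@dataclass
class MetadataIndex:
    """Associates documents with molecule data, keyed by SMILES."""
    by_smiles: dict = field(default_factory=dict)

    def index_document(self, doc_ref: str, image_ids: list) -> None:
        for image_id in image_ids:
            smiles = recognize_structure(image_id)   # convert the image
            data = MOLECULE_REFERENCE.get(smiles)    # query and retrieve
            if data is None:
                continue  # unknown molecule: nothing to associate
            entry = self.by_smiles.setdefault(smiles, {"data": data, "docs": []})
            if doc_ref not in entry["docs"]:
                entry["docs"].append(doc_ref)        # associate the document

    def find_documents(self, smiles: str) -> list:
        """Return document references for a request carrying a SMILES string."""
        entry = self.by_smiles.get(smiles)
        return list(entry["docs"]) if entry else []


index = MetadataIndex()
index.index_document("patent-A.pdf", ["img-001", "img-002"])
index.index_document("paper-B.pdf", ["img-001"])
print(index.find_documents("CCO"))
```

The point of contention in the response to arguments maps onto `find_documents`: the lookup key is a molecule representation, while the returned value is a reference to a document rather than molecule data.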

Referring to claim 3, Li discloses The method of claim 1, wherein the image of the molecule is extracted from the document using an image extraction machine learning model trained on synthetic documents, and wherein the synthetic documents are created by inserting images of molecules into unrelated text documents [pg. 2, second half; pg. 5, second half; pg. 8, last 3 pars.; synthetic images (i.e., images with randomly replaced atoms) are combined with text data to train a fusion model capable of processing documents with both image and text data; the fusion model uses machine learning to perform image recognition and named entity recognition].

Referring to claim 4, Li discloses The method of claim 3, wherein the image extraction machine learning model is refined by manually tagging images of molecules identified in real world documents by the image extraction machine learning model [pg. 11, par. 4; note the manual processing].

Referring to claim 5, Li discloses The method of claim 1, further comprising: embedding the molecule data into the document [pg. 4, par. 3; pg. 8, last par. – pg. 9, par. 3; the molecular structure from the image data is fused with the information about the molecule from text data to generate a fusion of the image data and the text information, which is stored in the space database that associates the molecular structure with the information about the molecule (e.g., synthetic property, drug property, and pharmacological activity)].

Referring to claim 6, Li discloses The method of claim 1, wherein converting the image to the molecule representation comprises: providing the image to a structure identification machine learning model [pg. 2, second half; pg. 4, pars. 4-8; pg. 10, par. 2; the image data is converted to the molecular structure via image recognition using machine learning].

Referring to claim 7, Li discloses The method of claim 6, wherein the structure identification machine learning model predicts a location of an atom in the molecule and one or more bonds between atoms of the molecule, and wherein the molecule information is generated from the predicted atom location and the predicted one or more bonds [pg. 3, first half; when the image data and text data are provided as input to the fusion model, the fusion model outputs the molecular structure in the SMILES format (which includes bond information), key and charge classification and coordinate (i.e., location) information and substituent molecule].

Referring to claim 8, see at least the rejection for claim 1. Li further discloses A system comprising: a processing unit; and a computer-readable storage medium having computer-executable instructions stored thereupon, which, when executed by the processing unit, cause the processing unit to perform the claimed steps [pg. 7, par. 4; various embodiments may be implemented using instruction code stored in computer accessible memory].

Referring to claim 9, Li discloses The system of claim 8, wherein the molecule data comprises a graphic representation of the molecule obtained from the molecule reference [pg. 4, par. 3; pg. 8, last par. – pg. 9, par. 3; note the fusion of the image data and the text data stored in the space database; see also fig. 3 of Cordeiro, displaying an image of the structured information].

Referring to claim 10, Cordeiro discloses The system of claim 8, wherein the molecule data is displayed in a user interface of an application that displays the document [fig. 3; note the displaying of the structured information associated with the document].

Referring to claim 11, see the rejection for claim 3.
Referring to claim 12, see the rejection for claim 6.

Referring to claim 13, Li discloses The system of claim 8, wherein the molecule data comprises a name, a molecular formula, or a molecular weight [abstract; pg. 8, pars. 2 and 3; the information about the molecule includes substituent compounds (i.e., molecular formula)].

Referring to claim 14, Cordeiro discloses The system of claim 8, wherein the molecule data is embedded with a page number of the image [fig. 3; each structured information is associated with a page number of its source image in an XML file].

Referring to claim 15, Li discloses The system of claim 8, wherein the molecule representation comprises a text-based representation [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; note the SMILES format].

Referring to claim 16, see at least the rejection for claim 1. Li further discloses A computer-readable storage medium having encoded thereon computer-readable instructions that when executed by a processing unit cause a system to perform the claimed steps [pg. 7, par. 4; various embodiments may be implemented using instruction code stored in computer accessible memory].

Referring to claim 17, see the rejection for claim 15.

Referring to claim 18, Li discloses The computer-readable storage medium of claim 17, wherein the molecule representation comprises a Simplified Molecular Input Line Entry System (SMILES) [fig. 5; abstract; pg. 6, second half; pg. 7, par. 3; pg. 8, par. 3; pg. 12, par. 2; claim 1; note the SMILES format].

Referring to claim 19, Cordeiro discloses The computer-readable storage medium of claim 16, wherein the individual molecule representation was embedded in another document, and wherein the other document includes another image of the molecule [par. 25; note that providing the structured information to a search engine would return links to other documents including the structured information, including other documents having other images with the same structured information].

Referring to claim 20, Cordeiro discloses The computer-readable storage medium of claim 16, wherein the individual molecule representation was listed in a search result received from the metadata database [par. 25; note that providing the structured information to a search engine would return search results of documents having the structured information aggregated by the metadata aggregator].

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
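The reply dates described above can be worked out with simple calendar arithmetic. The sketch below assumes the mailing date is the Mar 06, 2026 entry shown in the Prosecution Timeline on this page; the helper `add_months` and the variable names are illustrative. It does not account for weekend or federal holiday rollover, for extension fees, or for the advisory-action adjustment, so the results should be checked against the official record before being relied on.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to month length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Assumed mailing date of the final rejection (from the timeline on this page).
mailing_date = date(2026, 3, 6)

two_month_mark = add_months(mailing_date, 2)     # first-reply window
shortened_period = add_months(mailing_date, 3)   # shortened statutory period
statutory_maximum = add_months(mailing_date, 6)  # outer limit for any reply

print("Two-month mark:    ", two_month_mark.isoformat())
print("Three-month period:", shortened_period.isoformat())
print("Six-month maximum: ", statutory_maximum.isoformat())
```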


Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GRACE PARK whose telephone number is (571)270-7727. The examiner can normally be reached M-F 8AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TAMARA KYLE, can be reached at (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Grace Park/Primary Examiner, Art Unit 2144

Prosecution Timeline

Oct 31, 2023

Application Filed

Sep 26, 2025

Non-Final Rejection mailed — §103

Nov 24, 2025

Interview Requested

Dec 26, 2025

Response Filed

Mar 06, 2026

Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/901,053

Patent 12639627

TRAINING OF MACHINE LEARNING MODELS WITH HARDWARE-IN-THE-LOOP SIMULATIONS

3y 8m to grant Granted May 26, 2026

17/946,770

Patent 12639568

Method, System, and Computer Program Product for Determining Relationships of Entities Associated with Interactions

3y 8m to grant Granted May 26, 2026

18/416,342

Patent 12639356

OPTIMIZED CONTENT GENERATION METHOD AND SYSTEM

2y 4m to grant Granted May 26, 2026

17/966,892

Patent 12608650

STORAGE MEDIUM, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING APPARATUS

3y 6m to grant Granted Apr 21, 2026

18/149,682

Patent 12591807

SKETCHED AND CLUSTERED FEDERATED LEARNING WITH AUTOMATIC TUNING

3y 2m to grant Granted Mar 31, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4

Expected OA Rounds

76%

Grant Probability

94%

With Interview (+18.1%)

3y 4m (~9m remaining)

Median Time to Grant

Moderate

PTA Risk

Based on 560 resolved cases by this examiner. Grant probability derived from career allowance rate.
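The projection figures can be reproduced from the counts reported on this page. The sketch below assumes that the interview lift is applied as additive percentage points on top of the career allowance rate; the page does not state the method, so that is an assumption, and the variable names are illustrative.

```python
granted = 424    # granted applications reported for this examiner
resolved = 560   # resolved cases reported for this examiner

# Assumption: the +18.1% interview lift is additive percentage points.
interview_lift_points = 18.1

allowance_rate = granted / resolved * 100
with_interview = allowance_rate + interview_lift_points

print(f"Career allowance rate: {allowance_rate:.1f}% (rounds to {round(allowance_rate)}%)")
print(f"With interview:        {with_interview:.1f}% (rounds to {round(with_interview)}%)")
```

Under that assumption the rounded values match the 76% and 94% shown in the projections.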