Last updated: May 29, 2026

Application No. 18/750,828

GENERATING LARGE-LANGUAGE-MODEL COMPATIBLE SEQUENTIAL ATTACHMENT-BASED FRAGMENT EMBEDDING MOLECULAR REPRESENTATIONS

Non-Final OA §101 §102 §103

Filed

Jun 21, 2024

Priority

Jan 05, 2024 — provisional 63/618,172

Examiner

VILLENA, MARK

Art Unit

2658

Tech Center

2600 — Communications

Assignee

Recursion Pharmaceuticals Inc.

OA Round

1 (Non-Final)

Interview Optional

Examiner has a relatively high allowance rate (70%) and a +15.4% interview lift. A written response may suffice.

Based on 486 resolved cases, 2023–2026

Examiner Intelligence

VILLENA, MARK

Grants 70% — above average

Career Allowance Rate

342 granted / 486 resolved

+8.4% vs TC avg

Strong +15% interview lift

+15.4%

Interview Lift

resolved cases with interview

Typical timeline

3y 8m

Avg Prosecution

13 currently pending

Career history

504

Total Applications

across all art units

Statute-Specific Performance

§101: 3.7% (-36.3% vs TC avg)

§103: 75.7% (+35.7% vs TC avg)

§102: 6.5% (-33.5% vs TC avg)

§112: 0.8% (-39.2% vs TC avg)

Based on career data from 486 resolved cases

Office Action

§101 §102 §103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 07/25/2024 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Drawings
The drawings were submitted on 06/21/2024.  These drawings are reviewed and accepted by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim(s) recite(s) steps for generating a molecular string representation. This judicial exception is not integrated into a practical application because no additional elements are recited outside of the method steps. 

Step 2A, Prong One:
The claims recite mathematical concepts. For example, in claim 1: “identifying a molecular string representation”, “generating a set of fragments”, “generating a sequential attachment-based fragment embedding”, “concatenating fragments”, and “generating ring link characters” (see also claims 2-20). Limitations such as generating an embedding molecular string representation and concatenating fragments recite mathematical relationships and calculations.
The claims recite mental processes, namely, observations/evaluations and decisions that could be performed conceptually in the human mind, including identifying connections between atom representations, generating fragments, and generating ring link characters.

Step 2A, Prong Two:
The claims are “computer-implemented” and include steps like “identifying a molecular string representation”, “generating a set of fragments”, “concatenating fragments,” and “generating ring link characters.” These are generic computer functions involving data gathering, processing (mathematical), decision-making, and output. Merely applying an abstract idea on a generic computer or using conventional speech recognition does not integrate the exception into a practical application. See Alice Corp. v. CLS Bank Int’l, 573 U.S. 208 (2014); Credit Acceptance Corp. v. Westlake Servs., 859 F.3d 1044 (Fed. Cir. 2017).
The claims do not recite an improvement to the functioning of the computer or to another technology/technical field. There is no recitation of a specific, technological improvement.

Step 2B:
Beyond the abstract ideas, the claims recite generic computer implementation: identifying a string representation, fragmenting data, generating an embedding, concatenating fragmented data, and generating characters. The specification, as reflected by the claim language, does not require any unconventional hardware or a particular machine.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1, 3-7, 12, and 14-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cheng et al. (“Group SELFIES: a robust fragment-based molecular string representation”, 2023).


Regarding claims 1, 12, and 16, Cheng teaches:
“identifying a molecular string representation comprising ring structure identifiers that indicate virtual connections between atom representations of a molecular compound” (pg. 750, 3.1 SELFIES framework; ‘The encoder takes in a molecule and converts it to a SELFIES string, and the decoder takes in a SELFIES string and converts it to a molecule.’);
“generating a set of fragments from the molecular string representation” (pg. 752, 3.5 Determining fragments; ‘Fragments can also be obtained from several fragment libraries found in the literature.38–40 Generally, a useful set of groups will appear in many molecules in the dataset and replace many atoms, with similar fragments merged together to reduce redundancy.’); and
“generating a sequential attachment-based fragment embedding (SAFE) molecular string representation that represents the molecular string representation as an order agnostic sequence of interconnected fragment blocks” (pg. 751, Fig. 2; ‘Bottom: celecoxib represented in Group SELFIES. Tokens are colored by the groups and atoms they refer to. Index overloads are shown where interpreted. Colored arrows indicate how the decoder navigates around the attachment points of the groups.’)

 by:
“concatenating fragments from the set of fragments utilizing a separation character between the fragments to generate a linked fragment string” (pg. 750, 3.3 Groups; ‘We call this dictionary a “group set”, and every group set defines its own distinct instance of Group SELFIES. In particular, the decoder will not recognize a Group SELFIES string that contains group tokens not present in the current group set.’); and
“generating ring link characters in the linked fragment string to represent attachment points for fragment links” (pg. 751, top left; ‘To distinguish group tokens from other tokens, we include a : character at the front of the token (e.g. [:1parabenzene]). All group tokens are of the form [:S<group-name>], where S is the starting attachment index of the group, and <group-name> is any alphanumeric string that does not contain dashes or start with a number.’).
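
For orientation, the two sub-steps recited in claim 1 (joining fragments with a separation character, then writing ring link characters at the attachment points) can be pictured with a toy sketch. This is an illustration only, not the applicant's implementation and not the method of Cheng. The `[*:k]` attachment marker, the function names `link_fragments` and `add_ring_links`, and the example fragments are all assumptions made for the example, and the sketch only handles markers that directly follow an atom.

```python
import re

# Hypothetical attachment-point marker, e.g. [*:1]; not taken from the claims.
ATTACH = re.compile(r"\[\*:(\d+)\]")


def link_fragments(fragments, sep="."):
    """Join fragments with a separation character into one linked string."""
    return sep.join(fragments)


def add_ring_links(linked):
    """Replace each attachment marker with a ring digit not used elsewhere.

    Markers sharing a label receive the same digit, so the two ends of a
    fragment link pair up like an ordinary ring-closure bond.
    Only single digits 1-9 are supported in this sketch.
    """
    # Digits already present outside the markers (e.g. in-fragment rings).
    used = {int(d) for d in re.findall(r"\d", ATTACH.sub("", linked))}
    mapping = {}

    def pick(match):
        label = match.group(1)
        if label not in mapping:
            digit = next(d for d in range(1, 10) if d not in used)
            used.add(digit)
            mapping[label] = digit
        return str(mapping[label])

    return ATTACH.sub(pick, linked)


fragments = ["c1ccccc1[*:1]", "CC[*:1]"]  # assumed fragments of ethylbenzene
linked = link_fragments(fragments)
result = add_ring_links(linked)
print(linked)   # c1ccccc1[*:1].CC[*:1]
print(result)   # c1ccccc12.CC2
```

Digit `1` is already taken by the benzene ring, so the link between the two fragments is written with digit `2`.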

Regarding claims 3 (dep. on claim 1) and 14 (dep. on claim 12), Cheng further teaches:
“generating the linked fragment string by ordering the fragments from the set of fragments based on fragment size” (pg. 750, 3.2 Basic tokens in group SELFIES; ‘The next X tokens immediately following [RingX] will be interpreted as a number N, and we will count backwards N atoms in placement order to determine the target of the ring bond.’).

Regarding claims 4 (dep. on claim 1), 15 (dep. on claim 12), and 17 (dep. on claim 16), Cheng further teaches:
“generating the SAFE molecular string representation by: extracting attachment point indicators from the molecular string representation; and utilizing the attachment point indicators to generate the linked fragment string” (pg. 751, Fig. 2; ‘Bottom: celecoxib represented in Group SELFIES. Tokens are colored by the groups and atoms they refer to. Index overloads are shown where interpreted. Colored arrows indicate how the decoder navigates around the attachment points of the groups.’).

Regarding claim 5 (dep. on claim 4), Cheng further teaches:
“generating the SAFE molecular string representation by replacing the attachment point indicators in the linked fragment string with the ring link characters” (pg. 751, Fig. 2; ‘Bottom: celecoxib represented in Group SELFIES. Tokens are colored by the groups and atoms they refer to. Index overloads are shown where interpreted. Colored arrows indicate how the decoder navigates around the attachment points of the groups.’).

Regarding claims 6 (dep. on claim 1) and 18 (dep. on claim 16), Cheng further teaches:
“generating an additional SAFE molecular string representation from the SAFE molecular string representation by reordering fragment blocks comprising the fragments and the ring link characters, wherein the additional SAFE molecular string representation represents the molecular string representation” (pg. 750, 3.1 SELFIES framework; ‘For instance, when decoding [C][O][=C], adding [=C] would exceed the valency of [O], so SELFIES changes the bond order and adds [C] instead.’; 3.2; ‘All tokens except [pop] can be modified by adding =, #, \ or / to change the bond order or stereochemistry of their parent bond (e.g. [#Branch] or [/C]).’).
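
The reordering recited in claims 6 and 18 can be pictured with a short sketch: a dot-separated string is split into fragment blocks, and each permutation of the blocks is joined back into a string. This is an illustration under assumptions, not the applicant's implementation. The function name `reorderings` and the example string are hypothetical, and the sketch assumes the link digits are distinct from any in-fragment ring digits so that each pair still matches after reordering.

```python
from itertools import permutations


def reorderings(safe_like, sep="."):
    """Yield every ordering of the separator-delimited fragment blocks."""
    blocks = safe_like.split(sep)
    for perm in permutations(blocks):
        yield sep.join(perm)


# Assumed example: ethylbenzene written as two linked fragment blocks.
variants = list(reorderings("c1ccccc12.CC2"))
for v in variants:
    print(v)
# c1ccccc12.CC2
# CC2.c1ccccc12
```

Both strings pair ring digit `2` across the separator, so under ordinary SMILES ring-closure rules they describe the same bond between the two fragments.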

Regarding claims 7 (dep. on claim 1) and 19 (dep. on claim 16), Cheng further teaches:
“wherein the ring link characters comprise ring digits” (pg. 751, top left; ‘To distinguish group tokens from other tokens, we include a : character at the front of the token (e.g. [:1parabenzene]). All group tokens are of the form [:S<group-name>], where S is the starting attachment index of the group, and <group-name> is any alphanumeric string that does not contain dashes or start with a number.’).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 2 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng in view of Arus-Pous et al. (“SMILES‑based deep generative scaffold decorator for de‑novo drug design,” 2020).

Regarding claims 2 (dep. on claim 1) and 13 (dep. on claim 12), Cheng does not expressly teach:
“generating the set of fragments by utilizing a bond slicing algorithm with the molecular string representation.” 
Arus-Pous teaches:
“generating the set of fragments by utilizing a bond slicing algorithm with the molecular string representation” (pg. 2, top, right col.; ‘The second experiment instead used a subset of drug-like molecules in ChEMBL, which was exhaustively sliced using the same algorithm but restricting the acyclic bonds to cut to those that complied with the synthetic chemistry-based RECAP [35] rules.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cheng’s molecule grouping by incorporating Arus-Pous’s slicing in order to generate training sets that help generative models generalize for a wide range of scaffolds. (Arus-Pous: pg. 2, left col., bottom par.).

Claim(s) 8-11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng in view of Qian et al. (“Can Large Language Models Empower Molecular Property Prediction?”, 2023).

Regarding claims 8 (dep. on claim 1) and 20 (dep. on claim 16), Cheng does not expressly teach large language models, as in:
“generating, utilizing a large language model from the SAFE molecular string representation, an additional SAFE molecular string representation representing an additional molecular compound.”
Qian teaches:
“generating, utilizing a large language model from the SAFE molecular string representation, an additional SAFE molecular string representation representing an additional molecular compound” (pg. 2, left col., top paragraph; ‘Then, we propose a novel molecular representation called Captions as new Representation (CaR), which leverages ChatGPT to generate informative and professional textual analyses for SMILES. Then the textual explanation can serving as new representation for molecules, as illustrated in Figure 1.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cheng’s molecular string representation by incorporating Qian’s large language model in order to generate informative and professional textual analyses for SMILES. The combination provides textual descriptions which are meaningful for assisting in molecular-related tasks. (Qian: pg. 2, left col., top paragraph)

Regarding claim 9 (dep. on claim 1), the combination of Cheng in view of Qian further teaches:
“generating, utilizing a large language model from the SAFE molecular string representation, a complete SAFE molecular compound sequence representation from a partial SAFE molecular compound sequence representation” (Qian: pg. 2, left col., top paragraph; ‘Then, we propose a novel molecular representation called Captions as new Representation (CaR), which leverages ChatGPT to generate informative and professional textual analyses for SMILES. Then the textual explanation can serving as new representation for molecules, as illustrated in Figure 1.’).

Regarding claim 10 (dep. on claim 1), the combination of Cheng in view of Qian further teaches:
“generating, utilizing a large language model from the SAFE molecular string representation, a linking SAFE molecular string representation for two or more molecular compound sequences” (Cheng: pg. 750, 3.3 Groups; ‘Each group is defined as a set of atoms and bonds representing the molecular group with its attachment points, indicating how the group can participate in bonding.’; Qian: pg. 2, left col., top paragraph; ‘Then, we propose a novel molecular representation called Captions as new Representation (CaR), which leverages ChatGPT to generate informative and professional textual analyses for SMILES. Then the textual explanation can serving as new representation for molecules, as illustrated in Figure 1.’).

Regarding claim 11 (dep. on claim 1), the combination of Cheng in view of Qian further teaches:
“generating, utilizing a large language model from the SAFE molecular string representation, a molecular compound sequence based on one or more target molecule compound constraints” (Cheng: pg. 750, 2.2 Genetic programming; ‘A string representation of molecules such as SELFIES can be thought of as a programming language where programs specify how to construct molecules. Genetic programming35 uses genetic algorithms to design programs that fulfill desired constraints.’; Qian: pg. 2, left col., top paragraph; ‘Then, we propose a novel molecular representation called Captions as new Representation (CaR), which leverages ChatGPT to generate informative and professional textual analyses for SMILES. Then the textual explanation can serving as new representation for molecules, as illustrated in Figure 1.’).

Conclusion
Other pertinent prior art is cited in the PTO-892 for the applicant's consideration.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK VILLENA whose telephone number is (571)270-3191. The examiner can normally be reached 10 am - 6pm EST Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARK . VILLENA
Examiner
Art Unit 2658



/MARK VILLENA/           Examiner, Art Unit 2658

Prosecution Timeline

Jun 21, 2024

Application Filed

Mar 12, 2026

Non-Final Rejection mailed — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/016,599

Patent 12640147

METHOD AND APPARATUS FOR LANGUAGE DETECTION BASED VOICE REAL-TIME TRANSLATION AS A SERVICE IN TELECOM

3y 4m to grant Granted May 26, 2026

18/201,103

Patent 12640145

DATA PROCESSING METHOD, APPARATUS, DEVICE, STORAGE MEDIUM AND PROGRAM PRODUCT

3y 0m to grant Granted May 26, 2026

18/246,030

Patent 12640157

HIGHER ORDER AMBISONICS ENCODING AND DECODING

3y 2m to grant Granted May 26, 2026

18/037,265

Patent 12619637

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND PROGRAM

2y 11m to grant Granted May 05, 2026

18/132,417

Patent 12620318

ARTIFICIAL INTELLIGENCE CO-PILOT FOR MANNED AND UNMANNED AIRCRAFT

3y 0m to grant Granted May 05, 2026

Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

70%

Grant Probability

86%

With Interview (+15.4%)

3y 8m (~1y 9m remaining)

Median Time to Grant

Low

PTA Risk

Based on 486 resolved cases by this examiner. Grant probability derived from career allowance rate.
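
The projection figures can be checked with a few lines of arithmetic. The sketch below recomputes the grant probability from the 342 granted / 486 resolved counts and adds the reported interview lift. Treating the lift as additive in percentage points is an assumption; it is consistent with the 70% and 86% figures shown, but the page does not state how the "With Interview" value is derived.

```python
granted, resolved = 342, 486   # examiner career figures reported on this page
interview_lift_pts = 15.4      # reported interview lift, in percentage points

# Career allowance rate, in percent.
base_rate = 100 * granted / resolved

# Assumption: the lift is simply added to the base rate.
with_interview = base_rate + interview_lift_pts

print(f"Grant probability: {base_rate:.1f}%")       # 70.4%
print(f"With interview:    {with_interview:.1f}%")  # 85.8%
```

Rounded to whole percentages, these give the 70% and 86% values listed under Prosecution Projections.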