DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-24 are presented for examination.
This Office action is Non-Final.
Information Disclosure Statement
The information disclosure statements (IDS) filed on 12/23/2024 and 10/10/2025 have been considered by the Examiner and made of record in the application file.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Statutory Category
Claims 1, 9, and 17 recite a method, a system, and a non-transitory computer-readable medium, respectively; thus, the claims fall within the statutory categories of subject matter.
Step 2A: Prong One
Independent claim 1 recites, in substance:
processing EDC metadata into vectors,
embedding the vectors using a Siamese neural network,
classifying embedded vectors to predict SDTM fields, and
mapping and converting datasets based on the prediction (a minimal illustrative sketch of these steps follows this list).
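For context only, the recited steps can be illustrated with a minimal sketch; every name, dimensionality, and the trigram-hashing vectorizer below are hypothetical choices for illustration and are not taken from the application:

```python
# Minimal, hypothetical sketch of the recited pipeline (illustration only).
import numpy as np

def vectorize(field_name: str, n1: int = 256) -> np.ndarray:
    """Hash character trigrams of an EDC field name into a length-n1 vector."""
    v = np.zeros(n1)
    for i in range(len(field_name) - 2):
        v[hash(field_name[i:i + 3]) % n1] += 1.0
    return v

def embed(v: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Project a length-n1 vector to an n2-dimensional embedding (n2 < n1).
    In the claims, the projection would come from a trained Siamese network."""
    return W @ v

def predict_sdtm(e: np.ndarray, centroids: dict) -> str:
    """Classify an embedded vector by its nearest SDTM class centroid."""
    return min(centroids, key=lambda c: np.linalg.norm(e - centroids[c]))

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 256))  # n2 = 32, n1 = 256 (hypothetical)
centroids = {"AETERM": rng.normal(size=32), "VSORRES": rng.normal(size=32)}
# Mapping step: assign each EDC field name its predicted SDTM field name.
mapping = {f: predict_sdtm(embed(vectorize(f), W), centroids)
           for f in ["adverse_event_text", "vital_sign_result"]}
```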
These steps constitute:
mathematical concepts – vectorization, embeddings, similarity distance
mental processes – classification, labeling, mapping
data analysis/organization – converting one schema to another
The courts have recognized such concepts as abstract in:
Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350, 1354, 119 USPQ2d 1739, 1742 (Fed. Cir. 2016)
SAP America, Inc. v. InvestPic, LLC, 898 F.3d 1161, 1163, 127 USPQ2d 1597, 1599 (Fed. Cir. 2018)
CyberSource Corp. v. Retail Decisions, Inc., 654 F.3d 1366, 1372, 99 USPQ2d 1690, 1695 (Fed. Cir. 2011)
Accordingly, claim 1 recites an abstract idea.
Independent claims 9 and 17 recite substantially the same limitations in system and non-transitory computer-readable medium form and therefore also recite an abstract idea.
Step 2A: Prong Two
The claims further recite:
a Siamese neural network,
a prediction model,
generic processors and memory (system/CRM claims).
The claims:
use neural networks only as tools for data analysis,
do not improve computer functionality,
do not recite specialized hardware, and
do not modify neural-network operation itself.
Thus, the claims merely apply conventional ML to perform abstract data processing.
Under SAP America, Inc. v. InvestPic, LLC and Electric Power Group, LLC v. Alstom S.A., these claims do not integrate the judicial exception into a practical application.
Step 2B: Inventive Concept
Remaining elements:
Siamese neural network
embeddings
classifier
generic computing components
These elements amount to machine-learning classification using:
neural networks,
embeddings, and
similarity metrics.
These techniques were well-understood, routine, and conventional before the priority date and therefore do not amount to significantly more than the abstract idea.
Applying these tools to convert EDC data into SDTM format is merely the use of conventional ML in a new data domain, which does not provide an inventive concept. See SAP America, Inc. v. InvestPic, LLC and Alice Corp. Pty. Ltd. v. CLS Bank Int'l, 573 U.S. 208, 218, 110 USPQ2d 1976, 1981 (2014).
Dependent claims 2, 3, 10, 11, 18, and 19 add only the use of labeled training subsets or manual curation.
Such data-selection steps are mental processes/data gathering, which do not add significantly more; thus, the claims remain directed to an abstract idea.
Dependent claims 4, 12, and 20 add pair generation for Siamese inputs; however, pair construction is a mathematical/algorithmic preparation step within the abstract ML process and thus provides no integration into a practical application.
Dependent claims 5, 6, 13, 14, 21, and 22 recite an LSTM architecture and distance computation.
These recite only:
particular neural-network architectures, and
distance calculations between embeddings.
Such limitations are:
mathematical operations, and
routine ML design choices.
They do not improve computer technology itself; thus, the claims remain abstract. See Enfish, LLC v. Microsoft Corp., 822 F.3d 1327, 1335-36, 118 USPQ2d 1684, 1689 (Fed. Cir. 2016).
Dependent claims 7, 15, and 23 recite a classifier trained on embeddings.
Training and using a classifier is:
a core mathematical ML function, and
an activity previously determined to be abstract in SAP America, Inc. v. InvestPic, LLC.
Accordingly, no inventive concept is added.
Dependent claims 8, 16, and 24 add the final SDTM data conversion.
Converting one data representation to another is:
data organization/presentation,
a classic abstract information-processing step (Electric Power Group, LLC v. Alstom S.A.).
Thus, these claims provide no eligibility-conferring improvement.
Claims 9-24 represent the system and CRM counterparts of claims 1-8 and merely recite processors, memory, and instructions for implementing the same; in other words, the claims merely implement the same abstract idea on generic computers, which does not confer eligibility. See Alice Corp. Pty. Ltd. v. CLS Bank Int'l and Ultramercial, Inc. v. Hulu, LLC, 772 F.3d 709, 715-16, 112 USPQ2d 1750, 1755 (Fed. Cir. 2014).
Accordingly, claims 1-24 lack an inventive concept.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-24 are rejected under 35 U.S.C. 103 as being unpatentable over Ramanujam (US 12,112,837 B2; the US-PUB version was also cited by the Applicant on the IDS filed on 10/10/2025), hereinafter "Ramanujam," in view of Velagapudi et al. (US 11,645,456 B2), hereinafter "Velagapudi."
With respect to claims 1, 9 and 17, Ramanujam discloses a method, system, and non-transitory computer-readable medium for converting an Electronic Data Capture (EDC) dataset to a Standard Data Tabulation Model (SDTM) dataset [see Abstract, disclosing a method and a clinical data standards (CDS) automated system are provided for automating clinical data standards and generating study data tabulation model (SDTM) artifacts required for a regulatory submission process using a machine learning model and a natural language processing (NLP) engine with minimal user intervention], comprising:
a computer having one or more processors in communication with a memory, the memory storing instructions executable by said one or more processors to perform [see col. 2, lines 38-44, disclosing a clinical data standards (CDS) automated system comprising at least one processor configured to execute computer program instructions for automating clinical data standards and generating SDTM artifacts required for the regulatory submission process using a machine learning model and a natural language processing engine with minimal user intervention; also, see col. 12, lines 6-9, disclosing a memory unit 806 for storing programs and data, and at least one processor 805 communicatively coupled to the non-transitory, computer readable storage medium]:
processing metadata for an EDC dataset, the metadata comprising EDC field names, to produce vectors of dimensionality n1, where n1 is an integer [see col. 13, lines 47-55, disclosing the modules of the clinical data standards (CDS) automated system 800 comprise the metadata extractor 202, the case report form (CRF) extractor 203, the annotator 204, the mapping engine 211, multiple study data tabulation model (SDTM) generators, for example, 212, 213, and 214, the define package generator 215, the validator 216, and the machine learning (ML)/natural language processing (NLP) engine 208 constituted by the ML model 206 and the NLP engine 207 as exemplarily illustrated in FIG. 2];
processing the vectors of dimensionality n1, in an embedding model, to produce embedded vectors of dimensionality n2, where n2 is an integer and is less than n1;
processing the embedded vectors, in a prediction model, to produce class predictions for the EDC field names, the classes corresponding to the SDTM field names; and
mapping the EDC field names, respectively, with the SDTM field names based at least in part on the class predictions for the EDC field names [see col. 14, lines 7-15, disclosing the mapping engine 211 comprises the smart mapper 211a as disclosed in the detailed description of FIG. 2; The smart mapper 211a maps one or more raw datasets against a target study data tabulation model (SDTM) variable based on the extracted metadata, for example, an annotated case report form (CRF), available in the clinical data standards (CDS) automated system 800 and information acquired from previous learnings],
wherein the embedding model is obtained from a trained Siamese neural network comprising a first embedding subnetwork and a second embedding subnetwork. (emphasis added)
Ramanujam discloses the method, system, and non-transitory computer-readable medium, as referenced above.
Ramanujam does not explicitly disclose:
processing the vectors of dimensionality n1, in an embedding model, to produce embedded vectors of dimensionality n2, where n2 is an integer and is less than n1;
processing the embedded vectors, in a prediction model, to produce class predictions for the EDC field names, the classes corresponding to the SDTM field names; and
mapping the EDC field names, respectively, with the SDTM field names based at least in part on the class predictions for the EDC field names,
wherein the embedding model is obtained from a trained Siamese neural network comprising a first embedding subnetwork and a second embedding subnetwork. (emphasis added)
However, Velagapudi discloses:
processing the vectors of dimensionality n1, in an embedding model, to produce embedded vectors of dimensionality n2, where n2 is an integer and is less than n1 [see col. 3, lines 48-52, disclosing another technical benefit is that the resulting machine learning models may be more accurate because the neural networks used to identify the training data utilize embeddings that are based on historical textual data that is domain specific];
processing the embedded vectors, in a prediction model, to produce class predictions for the EDC field names, the classes corresponding to the SDTM field names [see col. 6, lines 2-14, disclosing an embedding analysis unit 230 that may be used to process historical data 215 to produce embedding data 235; The embedding data 235 represents relationships between words in the historical data 215 and provides the SNN 100 with a domain-specific understanding of how language is being used within that domain; The historical data 215 may include textual content, such as text messages, emails, transcripts of meetings, phone calls, or other discussions, and/or other such text-based input; The historical data 215 can be used to determine a model of how language is being used in that domain, which in turn can be used to better understand textual inputs from training data to be analyzed by the SNN 100 for outliers]; and
mapping the EDC field names, respectively, with the SDTM field names based at least in part on the class predictions for the EDC field names,
wherein the embedding model is obtained from a trained Siamese neural network comprising a first embedding subnetwork and a second embedding subnetwork [see col. 12, lines 49-58, disclosing the process 900 may include an operation 920 of analyzing the training data using a Siamese Neural Network to determine within-label similarities and cross-label similarities associated with a plurality of data elements within the training data; The within-label similarities represent similarities between a respective data element and a first set of data elements similarly labeled in the training data, and the cross-label similarities represent similarities between the respective data element and a second set of data elements dissimilarly labeled in the training data]. (emphasis added)
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to modify the clinical data converter taught by Ramanujam with the Siamese neural network machine-learning techniques taught by Velagapudi. Doing so would have provided Ramanujam with the technical benefit that the resulting machine-learning models may be more accurate because the neural networks used to identify the training data utilize embeddings based on domain-specific historical textual data (Velagapudi, col. 3, lines 48-52).
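For illustration of the Siamese arrangement relied upon above, the following is a minimal sketch of a twin network whose two embedding subnetworks share weights; the PyTorch framing, layer sizes, and distance metric are assumptions, not code from either reference:

```python
# Hypothetical sketch: a Siamese network with two weight-shared embedding
# subnetworks and a distance-based similarity output.
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    def __init__(self, n1: int = 256, n2: int = 32):
        super().__init__()
        # A single subnetwork reused for both inputs; the reuse ties the
        # weights, which is what makes the twins "Siamese".
        self.subnet = nn.Sequential(
            nn.Linear(n1, 128), nn.ReLU(), nn.Linear(128, n2))

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        e1 = self.subnet(x1)  # first embedding subnetwork
        e2 = self.subnet(x2)  # second embedding subnetwork (same weights)
        return torch.nn.functional.pairwise_distance(e1, e2), e1, e2

model = SiameseNet()
distance, e1, e2 = model(torch.randn(4, 256), torch.randn(4, 256))
```

After training, either subnetwork can be detached and used alone as the claimed embedding model.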
With respect to claims 2, 10, and 18, the combination of Ramanujam and Velagapudi discloses the method, system, and non-transitory computer-readable medium of claims 1, 9, and 17, as referenced above. The combination further teaches training the Siamese neural network based on a subset of the EDC field names which is pre-mapped to a subset of the SDTM field names [Velagapudi, see col. 12, lines 19-29, disclosing the configuration unit 735 may be configured to allow the user to provide historical data to configure the embedding used by the SNN 100 and may also be configured to allow the user to provide SNN training data that may be used to train the neural networks of the SNN 100; The configuration unit 735 may also be configured to allow the user to reset the embedding and/or the neural networks of the SNN 100 to a default state; The user may wish to retrain the embedding and/or neural networks when switching from analyzing training data associated with a first domain to analyzing training data associated with a second domain].
With respect to claims 3, 11, and 19, the combination of Ramanujam and Velagapudi discloses the method, system, and non-transitory computer-readable medium of claims 2, 10, and 18, as referenced above. The combination further teaches wherein the subset of the EDC field names is manually curated [Velagapudi, see col. 12, lines 25-29, disclosing the configuration unit 735 may also be configured to allow the user to reset the embedding and/or the neural networks of the SNN 100 to a default state; The user may wish to retrain the embedding and/or neural networks when switching from analyzing training data associated with a first domain to analyzing training data associated with a second domain].
With respect to claims 4, 12, and 20, the combination of Ramanujam and Velagapudi discloses the method, system, and non-transitory computer-readable medium of claims 2, 10, and 18, as referenced above. The combination further teaches generating pairs of vectors of dimensionality n1 to be input, respectively, to the first embedding subnetwork and the second embedding subnetwork [Velagapudi, see col. 4, lines 21-25, disclosing the textual content of the first input 115a and the second input 115b can be converted from text to a vectorized representation using word embedding information trained using textual historical data associated with a domain for which the training data is to be assessed by the SNN 100].
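For illustration, generating such input pairs from a pre-mapped (labeled) subset might look like the following sketch; the function and data layout are hypothetical, not drawn from Velagapudi:

```python
# Hypothetical sketch: build positive (same SDTM label) and negative
# (different SDTM label) pairs of n1-dimensional vectors for the two
# embedding subnetworks.
import itertools
import random

def make_pairs(labeled):
    """labeled: list of (vector, sdtm_label) tuples for the pre-mapped subset."""
    pairs = []
    for (v1, y1), (v2, y2) in itertools.combinations(labeled, 2):
        pairs.append((v1, v2, 1 if y1 == y2 else 0))  # 1 = similar pair
    random.shuffle(pairs)
    return pairs  # (input to subnetwork 1, input to subnetwork 2, target)
```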
With respect to claims 5, 13, and 21, the combination of Ramanujam and Velagapudi discloses the method, system, and non-transitory computer-readable medium of claims 4, 12, and 20, as referenced above. The combination further teaches processing, by the first embedding subnetwork and the second embedding subnetwork, the vectors of dimensionality n1 in a long short-term memory neural network to produce the embedded vectors [Velagapudi, see col. 4, lines 28-34, disclosing the subnetworks 130a and 130b may be implemented as various types of neural networks; In the examples that follow, the subnetworks 130a and 130b are implemented as a Bidirectional Long Short-Term Memory (BiLSTM) neural network; Other types of neural networks may be used to implement the subnetworks 130a and 130b in other implementations].
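A minimal sketch of a bidirectional LSTM embedding subnetwork of the kind Velagapudi describes follows; the dimensions and the mean-pooling step are illustrative assumptions:

```python
# Hypothetical sketch: a BiLSTM subnetwork that embeds a sequence of
# n1-dimensional vectors into a single n2-dimensional embedded vector.
import torch
import torch.nn as nn

class BiLSTMEmbedder(nn.Module):
    def __init__(self, n1: int = 256, n2: int = 32):
        super().__init__()
        # Two directions of size n2 // 2 concatenate to an n2-wide output.
        self.lstm = nn.LSTM(n1, n2 // 2, batch_first=True, bidirectional=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, n1)
        out, _ = self.lstm(x)   # (batch, seq, n2) after both directions
        return out.mean(dim=1)  # pool over the sequence -> (batch, n2)

embedded = BiLSTMEmbedder()(torch.randn(4, 10, 256))  # -> shape (4, 32)
```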
With respect to claims 6, 14, and 22, the combination of Ramanujam and Velagapudi discloses the method, system, and non-transitory computer-readable medium of claims 4, 12, and 20, as referenced above. The combination further teaches determining, in an external layer of the Siamese neural network, a distance between each pair of the embedded vectors produced, respectively, by the first embedding subnetwork and the second embedding subnetwork [Velagapudi, see col. 4, line 62, through col. 5, line 6, disclosing the first output 125a and the second output 125b are provided as inputs to the comparator unit 140; The comparator unit 140 is configured to calculate a “distance” between the first output 125a and the second output 125b and to output this distance as the similarity value 135; The SNN 100 yields a small distance for similar inputs and a greater distance for inputs that are dissimilar; For example, if the subnetworks determine that their respective inputs fall into the same class, the distance between the two inputs should be relatively small; The SNN 100 outputs this distance as the similarity value 145; The similarity value is a numerical value representing the distance between the first output 125a and the second output 125b].
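For reference, the external comparison layer reduces to a distance computation over the two embeddings; the Euclidean metric below is a standard assumption, since the quoted passage does not fix a particular metric:

```python
# Hypothetical sketch of the external comparison layer: Euclidean distance
# between the two subnetwork outputs; a small value indicates similar inputs.
import torch

def similarity_value(e1: torch.Tensor, e2: torch.Tensor) -> torch.Tensor:
    return torch.sqrt(((e1 - e2) ** 2).sum(dim=-1))  # ||e1 - e2||_2 per pair
```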
With respect to claims 7, 15, and 23, the combination of Ramanujam and Velagapudi discloses the method, system, and non-transitory computer-readable medium of claims 2, 10, and 18, as referenced above. The combination further teaches training a classification model using a subset of the metadata which is pre-mapped to a subset of the SDTM field names, wherein the subset of the metadata is processed by the embedding model before being input to the classification model; and using the trained classification model as the prediction model [Ramanujam, see col. 14, lines 7-15, disclosing the mapping engine 211 comprises the smart mapper 211a as disclosed in the detailed description of FIG. 2; The smart mapper 211a maps one or more raw datasets against a target study data tabulation model (SDTM) variable based on the extracted metadata, for example, an annotated case report form (CRF), available in the clinical data standards (CDS) automated system 800 and information acquired from previous learnings; also, Velagapudi, see col. 6, lines 2-14, disclosing an embedding analysis unit 230 that may be used to process historical data 215 to produce embedding data 235; The embedding data 235 represents relationships between words in the historical data 215 and provides the SNN 100 with a domain-specific understanding of how language is being used within that domain; The historical data 215 may include textual content, such as text messages, emails, transcripts of meetings, phone calls, or other discussions, and/or other such text-based input; The historical data 215 can be used to determine a model of how language is being used in that domain, which in turn can be used to better understand textual inputs from training data to be analyzed by the SNN 100 for outliers].
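For illustration, the classifier-on-embeddings arrangement can be sketched as follows; the frozen linear embedder standing in for the trained subnetwork, the linear classifier, and all sizes are hypothetical:

```python
# Hypothetical sketch: train a classifier on frozen embeddings of a
# pre-mapped subset, then use it as the prediction model.
import torch
import torch.nn as nn

embedder = nn.Linear(256, 32)   # stands in for the trained Siamese subnetwork
classifier = nn.Linear(32, 10)  # 10 hypothetical SDTM field-name classes
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 256)          # vectors for the pre-mapped subset
y = torch.randint(0, 10, (64,))   # their known SDTM class labels
with torch.no_grad():
    e = embedder(x)               # embed first, as the claims require
for _ in range(100):              # train only the classifier
    opt.zero_grad()
    loss = loss_fn(classifier(e), y)
    loss.backward()
    opt.step()
prediction = classifier(embedder(torch.randn(1, 256))).argmax(dim=-1)
```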
With respect to claims 8, 16, and 24, the combination of Ramanujam and Velagapudi discloses the method, system, and non-transitory computer-readable medium of claims 1, 9, and 17, as referenced above. The combination further teaches converting the EDC dataset to an SDTM dataset based at least in part on said mapping [Ramanujam, see Abstract, disclosing a method and a clinical data standards (CDS) automated system are provided for automating clinical data standards and generating study data tabulation model (SDTM) artifacts required for a regulatory submission process using a machine learning model and a natural language processing (NLP) engine with minimal user intervention].
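For illustration, once the field-name mapping exists, the final conversion can be as simple as renaming and selecting columns; the pandas framing and the column names below are hypothetical:

```python
# Hypothetical sketch: convert an EDC dataset to SDTM form by applying
# the EDC-field-name -> SDTM-field-name mapping.
import pandas as pd

edc = pd.DataFrame({"adverse_event_text": ["headache"],
                    "vital_sign_result": [120]})
mapping = {"adverse_event_text": "AETERM", "vital_sign_result": "VSORRES"}
sdtm = edc.rename(columns=mapping)[list(mapping.values())]
```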
Prior Art Made of Record
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Cella et al. (‘035) discloses a robotic fleet resource provisioning system.
Park et al. (‘754) discloses simultaneous training and correction of artificial neural network and dataset.
Lahlou et al. discloses exact and approximate inverses of neural embedding models.
Malfait discloses code generator for clinical research study systems.
Park et al. (‘454) discloses detecting anomaly in time series data and computing device for executing the method.
Cella et al. discloses predictive model data stream prioritization.
Bronkalla et al. discloses intelligent processing of bulk historic patient data.
Narain et al. discloses patient stratification and identification of potential biomarkers.
Glushakov et al. discloses topology-based clinical data mining.
Juneja et al. discloses data harmonization and data mapping in specified domains.
Pattnaik et al. discloses clinical data aggregation architecture and platform.
Janevski et al. discloses contextualized tracking of the progress of a clinical study.
Bentwich et al. discloses medical search clinical interaction.
Conclusion/Points of Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JORGE A CASANOVA whose telephone number is (571)270-3563. The examiner can normally be reached M-F: 9 a.m. to 6 p.m. (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aleksandr Kerzhner can be reached at (571) 270-1760. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JORGE A CASANOVA/Primary Examiner, Art Unit 2165