Last updated: May 29, 2026
Application No. 18/987,022
MACHINE LEARNING SYSTEMS AND METHODS TO DIAGNOSE RARE DISEASES

Non-Final OA: §101, §103
Filed: Dec 19, 2024
Priority: Sep 23, 2020 (provisional 63/082,369, +4 more)
Examiner: HAMILTON, MATTHEW L
Art Unit: 3682
Tech Center: 3600 (Transportation & Electronic Commerce)
Assignee: Sanofi
OA Round: 1 (Non-Final)
This examiner grants 54% of cases after interview (+61.9% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 511 resolved cases, 2023–2026
Examiner Intelligence

HAMILTON, MATTHEW L
Career Allowance Rate: grants 54% of resolved cases (274 granted / 511 resolved), +1.6% vs TC avg
Interview Lift: +61.9% for resolved cases with interview (strong)
Typical timeline: 4y 1m avg prosecution; 25 currently pending
Career history: 541 total applications across all art units
Statute-Specific Performance

§101: 16.2% (-23.8% vs TC avg)
§103: 52.6% (+12.6% vs TC avg)
§102: 7.1% (-32.9% vs TC avg)
§112: 22.9% (-17.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 511 resolved cases.
Office Action

§101 §103
DETAILED ACTION
	This action is in response to the initial filing filed on December 19, 2024.  Claims 1-20 have been examined and are currently pending. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Inventorship
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Information Disclosure Statement
The Information Disclosure Statements filed January 2, 2025 and October 15, 2025 have been considered. Initialed copies of the Form 1449 are enclosed herewith.

Claim Objections
Claim 1 is objected to because of the following informalities: Independent claim 1 recites the limitation, “in response to removal of the medical data of the one or more individuals in the one or more least-representative clusters, generating, by the one or more computers, a pruned dataset:”.  Replace the colon at the end of the limitation with a semi-colon.  Appropriate correction is required.

Claims 1, 12, and 17 are objected to because of the following informalities: The term, “the predefined set” lacks antecedent basis in lines 28, 33-34, and 29 respectively.  Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.

ALICE/ MAYO:  TWO-PART ANALYSIS
2A.   First, a determination whether the claim is directed to a judicial exception (i.e., abstract idea).  
Prong 1:  A determination whether the claim recites a judicial exception (i.e., abstract idea).

Groupings of abstract ideas enumerated in the 2019 Revised Patent Subject Matter Eligibility Guidance.

Mathematical concepts- mathematical relationships, mathematical formulas or equations, mathematical calculations.
Certain methods of organizing human activity- fundamental economic principles or practices (including hedging, insurance, mitigating risk); commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations); managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions).
Mental processes- concepts performed in the human mind (including an observation, evaluation, judgement, opinion).

Prong 2:  A determination whether the judicial exception (i.e., abstract idea) is integrated into a practical application.

Considerations indicative of integration into a practical application enumerated in the 2019 Revised Patent Subject Matter Eligibility Guidance.

Improvement to the functioning of a computer, or an improvement to any other technology or technical field
Applying or using a judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition
Applying the judicial exception with, or by use of a particular machine.
Effecting a transformation or reduction of a particular article to a different state or thing
Applying or using the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception
Considerations that are not indicative of integration into a practical application enumerated in the 2019 Revised Patent Subject Matter Eligibility Guidance.

Merely reciting the words “apply it” (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea.
Adding insignificant extra-solution activity to the judicial exception.
Generally linking the use of the judicial exception to a particular technological environment or field of use.

2B. Second, a determination whether the claim provides an inventive concept (i.e., Whether the claim(s) include additional elements, or combinations of elements, that are sufficient to amount to significantly more than the judicial exception (i.e., abstract idea)).
Considerations indicative of an inventive concept (aka “significantly more”) enumerated in the 2019 Revised Patent Subject Matter Eligibility Guidance.

Improvement to the functioning of a computer, or an improvement to any other technology or technical field
Applying the judicial exception with, or by use of a particular machine.
Effecting a transformation or reduction of a particular article to a different state or thing
Applying or using the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception  NOTE:  The only consideration that does not overlap with the considerations indicative of integration into a practical application associated with step 2A: Prong 2.

Considerations that are not indicative of an inventive concept (aka “significantly more”) enumerated in the 2019 Revised Patent Subject Matter Eligibility Guidance.

Merely reciting the words “apply it” (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea.
Adding insignificant extra-solution activity to the judicial exception.
Generally linking the use of the judicial exception to a particular technological environment or field of use.
Simply appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception.  NOTE:  The only consideration that does not overlap with the considerations that are not indicative of integration into a practical application associated with step 2A: Prong 2.

See also, 2019 Revised Patent Subject Matter Eligibility Guidance; Federal Register; Vol. 84, No. 4; Monday, January 7, 2019

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.  
1:  Statutory Category
Applicant’s claimed invention, as described in independent claim 1, is directed to a method; independent claim 12 is directed to a system; and independent claim 17 is directed to a non-transitory computer readable medium.

2(A):  The claim(s) are directed to a judicial exception (i.e., an abstract idea).
PRONG 1:  The claim(s) recite a judicial exception (i.e., an abstract idea).
Certain Methods of Organizing Human Activity
Independent claims 1, 12 and 17 recite the limitations, “receiving, by one or more computers from a database, an initial dataset comprising respective medical data associated with a plurality of individuals with the rare disease, the respective medical data for each individual comprising data indicative of a plurality of features of the rare disease experienced by the individual; identifying, by the one or more computers, a plurality of clusters of individuals whose medical data is in the initial dataset by applying an unsupervised clustering algorithm; in response to applying the unsupervised clustering algorithm, identifying, by the one or more computers, from among the clusters, one or more least-representative clusters as being least representative of the rare disease based on the medical data of the individuals in the clusters; removing, by the one or more computers, from the initial dataset, medical data of one or more individuals in the one or more least-representative clusters based on medical data of the one or more individuals; in response to removal of the medical data of the one or more individuals in the one or more least-representative clusters, generating, by the one or more computers, a pruned dataset: applying, by the one or more computers, a natural language algorithm to a corpus of medical literature related to the rare disease to extract a set of clinical terms; generating, by the one or more computers, word embeddings for (i) the set of clinical terms extracted from the medical literature and (ii) the predefined set of specific symptoms of the rare disease:”, which are directed to the abstract idea of certain methods of organizing human activity.  In particular, the limitations recited above are directed to filtering content, namely medical data associated with individuals with a rare disease, received from a database.
Additionally, the medical data or content associated with individuals with a rare disease is further filtered by performing a clustering function and removing individuals from the clustered groups.  Further, the invention filters content or clinical terms by reviewing or analyzing available medical literature.  As per MPEP 2106.04(a)(2)(II)(C), the function of “filtering content” is directed to managing personal behavior, which falls under the abstract idea of certain methods of organizing human activity.
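For illustration only, the clustering-and-removal limitations characterized above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: the cluster labels are assumed to come from some unsupervised clustering algorithm that has already run, the function names `representative_symptom` and `prune_dataset` are hypothetical, and treating a cluster's most frequent symptom as its "representative symptom" is an assumed scoring rule that the quoted limitations do not fix.

```python
from collections import Counter


def representative_symptom(records):
    """Return the most frequent symptom in a cluster (an assumed scoring rule)."""
    counts = Counter(s for record in records for s in record["symptoms"])
    return counts.most_common(1)[0][0]


def prune_dataset(dataset, cluster_labels, least_representative_symptoms):
    """Drop every record that belongs to a least-representative cluster.

    dataset: list of dicts, each with a "symptoms" list (hypothetical schema).
    cluster_labels: one label per record, produced by any clustering algorithm.
    least_representative_symptoms: set of symptoms deemed least representative.
    """
    # Group records by the cluster label they were assigned.
    clusters = {}
    for record, label in zip(dataset, cluster_labels):
        clusters.setdefault(label, []).append(record)

    # A cluster is flagged when its representative symptom is in the given set.
    dropped = {
        label
        for label, records in clusters.items()
        if representative_symptom(records) in least_representative_symptoms
    }

    # The pruned dataset keeps only records from clusters that were not flagged.
    pruned = [r for r, label in zip(dataset, cluster_labels) if label not in dropped]
    return pruned, dropped


if __name__ == "__main__":
    data = [
        {"id": 1, "symptoms": ["a", "b"]},
        {"id": 2, "symptoms": ["a"]},
        {"id": 3, "symptoms": ["c"]},
        {"id": 4, "symptoms": ["c", "a"]},
    ]
    pruned, dropped = prune_dataset(data, [0, 0, 1, 1], {"c"})
    print([r["id"] for r in pruned], dropped)
```

The sketch separates the clustering step from the removal step on purpose, since the quoted limitations recite them as distinct operations.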
Mathematical Concepts
Independent claims 1, 12 and 17 recite the limitations, “determining, by the one or more computers, a similarity between the word embeddings of the clinical terms and the word embeddings of the predefined set of specific symptoms of the rare disease using a similarity metric; and augmenting, by the one or more computers, the training dataset with one or more clinical terms having similarity scores above a threshold value to generate an augmented training dataset.”, which are directed to the abstract idea of mathematical concepts under mathematical relationships and mathematical calculations.  The claim recites limitations directed to using mathematical calculations to determine correlations or resemblance between the word embeddings of the clinical terms and the word embeddings of the symptoms of the rare disease.  The claim recites mathematical relationships to determine which clinical terms have similarity scores above a particular threshold in order to change the training dataset.
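For illustration only, the similarity-and-threshold limitations can be sketched as follows. This is a minimal sketch under stated assumptions: the claims recite only "a similarity metric", so cosine similarity is an assumed choice; the embeddings are plain lists of floats from any embedding model; and the function names `cosine_similarity` and `select_similar_terms` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def select_similar_terms(term_embeddings, symptom_embeddings, threshold):
    """Return clinical terms whose best similarity to any symptom exceeds threshold.

    term_embeddings / symptom_embeddings: dicts mapping a string to a vector.
    """
    selected = []
    for term, vec in term_embeddings.items():
        best = max(
            (cosine_similarity(vec, s) for s in symptom_embeddings.values()),
            default=0.0,
        )
        if best > threshold:
            selected.append(term)
    return selected


if __name__ == "__main__":
    terms = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
    symptoms = {"s": [1.0, 0.0]}
    # The selected terms would then be appended to the training dataset.
    print(select_similar_terms(terms, symptoms, threshold=0.8))
```

Taking the maximum over all symptom embeddings is one possible design choice; an average or a per-symptom comparison would fit the claim language equally well.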
PRONG 2:  The judicial exception (i.e., an abstract idea) is not integrated into a practical application.
The applicant has not shown or demonstrated any of the requirements described above under "integration into a practical application" under step 2A. Specifically, the applicant's limitations are not "integrated into a practical application" because they merely add the words "apply it" to the judicial exception, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).  Additionally, improvements to the functioning of a computer or to any other technology or technical field have not been shown or disclosed (see MPEP 2106.05(a)).  The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.  Specifically, the applicant’s limitations are not “significantly more” because they merely add the words “apply it” to the judicial exception, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).  The applicant’s claimed limitations do not demonstrate an improvement to another technology or technical field, an improvement to the functioning of the computer itself, or a transformation or reduction of a particular article to a different state or thing. The current application does not amount to 'significantly more' than the abstract idea as described above. The claim does not include additional elements or limitations, individually or in combination, that are sufficient to amount to significantly more than the judicial exception. Specifically, the individual elements of one or more computers, one or more processors, memory, database, and non-transitory computer readable medium amount to no more than implementing an idea with a computerized system; they merely add the words “apply it” to the judicial exception, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea.
The additional elements taken in combination add nothing more than what is present when the elements are considered individually. Therefore, based on the two-part Alice Corp. analysis, there are no meaningful limitations in the claims that transform the exception (i.e., abstract idea) into a patent eligible application.
Dependent claims 2-11, 13-16, and 18-20 are rejected as ineligible subject matter under 35 U.S.C. 101 based on a rationale similar to the claims from which they depend. 
Since the claim(s) recite a judicial exception and fail to integrate the judicial exception into a practical application, the claim(s) is/are “directed to” the judicial exception.  Thus, the claim(s) must be reviewed under the second step of the Alice/Mayo analysis to determine whether the abstract idea has been applied in an eligible manner.
2(B):  The claims do not provide an inventive concept (i.e., The claim(s) do not include additional elements, or combinations of elements, that are sufficient to amount to significantly more than the judicial exception (i.e., abstract idea)).
As discussed with respect to Step 2A Prong Two, the additional element(s) in the claim amount to no more than mere instructions to apply the exception using a generic computer component.  The same analysis applies here in 2B, i.e., mere instructions to apply an exception using a generic computer component cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
For these reasons, there is no inventive concept in the claim, and thus the claim is ineligible.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-16 of U.S. Patent No. 12,211,619 B2. Although the claims at issue are not identical, they are not patentably distinct from each other, see table below.
Application 18/987,022 claim 1 compared with US Patent 12,211,619 B2 claim 1:

Application: receiving, by one or more computers from a database, an initial dataset comprising respective medical data associated with a plurality of individuals with the rare disease, the respective medical data for each individual comprising data indicative of a plurality of features of the rare disease experienced by the individual;
Patent: receiving, by one or more computers from a database, an initial dataset comprising respective medical data associated with a plurality of individuals with the rare disease, the respective medical data for each individual comprising data indicative of a plurality of features of the rare disease experienced by the individual;

Application: identifying, by the one or more computers, a plurality of clusters of individuals whose medical data is in the initial dataset by applying an unsupervised clustering algorithm;
Patent: identifying, by the one or more computers, a plurality of clusters of individuals whose medical data is in the initial dataset by applying an unsupervised clustering algorithm;

Application: in response to applying the unsupervised clustering algorithm, identifying, by the one or more computers, from among the clusters, one or more least-representative clusters as being least representative of the rare disease based on the medical data of the individuals in the clusters;
Patent: in response to applying the unsupervised clustering algorithm, identifying, by the one or more computers, from among the clusters, one or more least-representative clusters as being least representative of the rare disease based on the medical data of the individuals in the clusters, wherein identifying the one or more least-representative clusters comprises: identifying a respective representative symptom of the rare disease for each of the plurality of clusters, comparing the respective representative symptom for each cluster to a predefined set of specific symptoms of the rare disease, the predefined set of specific symptoms comprising a predefined set of more representative symptoms and a predefined set of least representative symptoms, and identifying a cluster as being one of the least-representative clusters in response to determining that the respective representative symptom of the cluster is in the predefined set of least representative symptoms;

Application: removing, by the one or more computers, from the initial dataset, medical data of one or more individuals in the one or more least-representative clusters based on medical data of the one or more individuals;
Patent: removing, by the one or more computers, from the initial dataset, medical data of one or more individuals in the one or more least-representative clusters based on medical data of the one or more individuals;

Application: in response to removal of the medical data of the one or more individuals in the one or more least-representative clusters, generating, by the one or more computers, a pruned dataset:
Patent: in response to removal of the medical data of the one or more individuals in the one or more least-representative clusters, generating, by the one or more computers, a pruned dataset;

Application: combining by the one or more computers, the pruned dataset with a control dataset comprising medical data of a plurality of individuals without the rare disease;
Patent: combining by the one or more computers, the pruned dataset with a control dataset comprising medical data of a plurality of individuals without the rare disease;

Application: generating, by the one or more computers, a training dataset in response to combining the pruned dataset with the control data set;
Patent: generating, by the one or more computers, a training dataset in response to combining the pruned dataset with the control data set;

Application: applying, by the one or more computers, a natural language algorithm to a corpus of medical literature related to the rare disease to extract a set of clinical terms;
Patent: applying, by the one or more computers, a natural language algorithm to a corpus of medical literature related to the rare disease to extract a set of clinical terms;

Application: generating, by the one or more computers, word embeddings for (i) the set of clinical terms extracted from the medical literature and (ii) the predefined set of specific symptoms of the rare disease:
Patent: generating, by the one or more computers, word embeddings for (i) the set of clinical terms extracted from the medical literature and (ii) the predefined set of specific symptoms of the rare disease;

Application: determining, by the one or more computers, a similarity between the word embeddings of the clinical terms and the word embeddings of the predefined set of specific symptoms of the rare disease using a similarity metric;
Patent: determining, by the one or more computers, a similarity between the word embeddings of the clinical terms and the word embeddings of the predefined set of specific symptoms of the rare disease using a similarity metric;

Application: and augmenting, by the one or more computers, the training dataset with one or more clinical terms having similarity scores above a threshold value to generate an augmented training dataset.
Patent: augmenting, by the one or more computers, the training dataset with one or more clinical terms having similarity scores above a threshold value to generate an augmented training dataset.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-7 and 9-20 are rejected under 35 U.S.C. 103 as being unpatentable over Steinberg-Koch et al. US Publication 20220223293 A1 in view of Yadav US Publication 20200104648 A1, in view of Feng et al. US Publication 20170227528 A1, in view of Han et al. US Publication 20190213303 A1, and further in view of Genner et al. US Patent 11967173 B1.

Claims 1, 12, and 17:
	As per claims 1, 12, and 17, Steinberg-Koch teaches a method, system and non-transitory computer readable medium comprising:
receiving, by one or more computers from a database, an initial dataset comprising respective medical data associated with a plurality of individuals with the rare disease, the respective medical data for each individual comprising data indicative of a plurality of features of the rare disease experienced by the individual (paragraphs 0028, 0101, 0129 “The novel algorithms of the present disclosure process a collection of subject data collected from sources comprising at least some of electronic medical records (EMR), electronic health records (EHR), insurance claims data, patient sensors data such as IoT sensors, or data from health application programs, and suggest a subject's risk for having a common or uncommon autoimmune related disease, such as CD, IBD (Crohn's disease/ulcerative colitis), multiple sclerosis (MS), rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), and others. The method prioritizes subjects according to probability/risk and makes recommendations regarding the appropriate subsequent steps, such as related tests or prescription of a specific treatment. The system provides explanatory output regarding relevant symptoms and signs, and analyzes trends, symptom recurrence, symptom distribution and all relevant patient history, to determine the risk of the particular subject having or developing the specific disease under consideration by the system. The service enables providers to seamlessly integrate this solution into their current workflow by either integrating the algorithms and software into the existing EMR system or by providing a separate software interface.”); 

identifying, by the one or more computers, a plurality of clusters of individuals whose medical data is in the initial dataset by applying an unsupervised clustering algorithm (paragraph 0025, 0173, 0177-0178, and Figure 6 “The methods of the present disclosure are based on the ability to cluster individuals or groups of individuals based on defining characteristics, such as demographic, symptoms, lab test results, medications, procedures, biomarkers, or other measurable properties, while recognizing that individuals differ in an almost infinite number of characteristics representing their biologic individuality. The methods of the present disclosure collect, store, and analyze huge bodies of data to classify people according to their individual likelihood of acquiring symptoms of a specific autoimmune disease or having a specific autoimmune disease which is undiagnosed at the point of the data collection.” and “The graph is an output of the T-SNE (t-distributed stochastic neighbor embedding) algorithm, which is a dimensional reduction method that may be used to visualize data set clustering.…By contrast, individuals diagnosed with a specific disease have values that differ significantly from normal and are part of the distinct clusters 601, 602, and 603 outlined by dotted ovals in the upper limits of the graph. These smaller disease clusters represent individuals having values that fall far from the mean average of normal individuals in the general population for the measured parameter on the y-axis, i.e., above the normal threshold. In terms of autoimmune disease, each small cluster may represent, for example, individuals identified as having or being predisposed to develop, CD 601, ulcerative colitis 602, or Crohn's disease 603. 
Thus, even though all of the individuals in these disease clusters have values outside—in this case, above—the normal threshold for the parameter measured on the y-axis, they vary among each other in terms of the second parameter represented on the x-axis, and each cluster or diagnosis can thus be distinguished from the others. In this example, the individuals in each disease cluster display values for the parameter represented on the x-axis which are below normal 601, normal 602, or above normal 603.”);

in response to applying the unsupervised clustering algorithm, identifying, by the one or more computers, from among the clusters, one or more least-representative clusters as being least representative of the rare disease based on the medical data of the individuals in the clusters (paragraph 0177 “The graph is an output of the T-SNE (t-distributed stochastic neighbor embedding) algorithm, which is a dimensional reduction method that may be used to visualize data set clustering.…By contrast, individuals diagnosed with a specific disease have values that differ significantly from normal and are part of the distinct clusters 601, 602, and 603 outlined by dotted ovals in the upper limits of the graph. These smaller disease clusters represent individuals having values that fall far from the mean average of normal individuals in the general population for the measured parameter on the y-axis, i.e., above the normal threshold. In terms of autoimmune disease, each small cluster may represent, for example, individuals identified as having or being predisposed to develop, CD 601, ulcerative colitis 602, or Crohn's disease 603. Thus, even though all of the individuals in these disease clusters have values outside—in this case, above—the normal threshold for the parameter measured on the y-axis, they vary among each other in terms of the second parameter represented on the x-axis, and each cluster or diagnosis can thus be distinguished from the others. In this example, the individuals in each disease cluster display values for the parameter represented on the x-axis which are below normal 601, normal 602, or above normal 603.”); 

Steinberg-Koch does not teach removing, by the one or more computers, from the initial dataset, medical data of one or more individuals in the one or more least-representative clusters based on medical data of the one or more individuals.  However, Yadav teaches an Apparatus and Method for Detecting and Removing Outliers Using Sensitivity Score and further teaches, “In some embodiments of the present disclosure, text corpus provider 530 may be one or more digitized health record providers, such as electronic medical records repositories of patient cases. Outliers of the health record data can provide valuable insights, such as medical errors and new variation of diseases, etc. However, the unique features of health care records, such as large scale and high-dimensionality (hundreds to thousands of dimensionalities, including patient, hospital, doctor, medicine information, etc.), bring challenges to conventional outlier detection techniques. Outlier detection and removal system 500 as shown in FIG. 5 may provide solutions to the technical problems in this field. For example, outlier detection and removal system 500 may be configured to obtain digitized health records dataset from the one or more health record providers, dynamically calculate sensitivity scores of the dataset, and detect outliers from the digitized health records based on dynamically computed sensitivity scores of the dataset. The spherical k-means clustering technique used in clustering of the data set allows efficient partitioning high-dimensional dataset and the detecting outliers based on dynamically determined sensitivity scores allows linear scan of the dataset and thereby meets the challenges in health care data analysis. The simple, efficient and scalable features of the proposed method provide technical solutions to the technical problems in digitized health care records analysis.” and “Clustering documents is a fundamental subroutine in many machine learning and data mining applications. 
Thus, removal of outliers before performing the clustering is a very crucial step. The methods and apparatuses described in the present disclosure can be applied to, but not limited to, the technical areas such as detect service failure in telecommunication network; detecting medical errors or new diseases from health care records; automatically categorizing library records; summarizing search engines' results; spam filtering; credit card abuse detection in financial transactions; document summarization; learning feature representation; hypertext clustering and web searching and building recommendation engine for recommending products to users on e-commerce websites. As most of such datasets are noisy and contains outliers and presence of outliers may give results that are very different from optimal results. One can apply the outlier detection and removal mechanism described in the present disclosure to remove outliers to obtain optimal results. In some datasets, detected outliers can give valuable insights of the datasets and provide useful information.” (paragraph 0087). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include removing, by the one or more computers, from the initial dataset, medical data of one or more individuals in the one or more least-representative clusters based on medical data of the one or more individuals as taught by Yadav in order to clean the dataset and enhance accuracy of the model.

Steinberg-Koch does not teach in response to removal of the medical data of the one or more individuals in the one or more least-representative clusters, generating, by the one or more computers, a pruned dataset. However, Yadav teaches an Apparatus and Method for Detecting and Removing Outliers Using Sensitivity Score and further teaches, “In some embodiments of the present disclosure, text corpus provider 530 may be one or more digitized health record providers, such as electronic medical records repositories of patient cases. Outliers of the health record data can provide valuable insights, such as medical errors and new variation of diseases, etc. However, the unique features of health care records, such as large scale and high-dimensionality (hundreds to thousands of dimensionalities, including patient, hospital, doctor, medicine information, etc.), bring challenges to conventional outlier detection techniques. Outlier detection and removal system 500 as shown in FIG. 5 may provide solutions to the technical problems in this field. For example, outlier detection and removal system 500 may be configured to obtain digitized health records dataset from the one or more health record providers, dynamically calculate sensitivity scores of the dataset, and detect outliers from the digitized health records based on dynamically computed sensitivity scores of the dataset. The spherical k-means clustering technique used in clustering of the data set allows efficient partitioning high-dimensional dataset and the detecting outliers based on dynamically determined sensitivity scores allows linear scan of the dataset and thereby meets the challenges in health care data analysis. The simple, efficient and scalable features of the proposed method provide technical solutions to the technical problems in digitized health care records analysis.” and “Clustering documents is a fundamental subroutine in many machine learning and data mining applications. 
Thus, removal of outliers before performing the clustering is a very crucial step. The methods and apparatuses described in the present disclosure can be applied to, but not limited to, the technical areas such as detect service failure in telecommunication network; detecting medical errors or new diseases from health care records; automatically categorizing library records; summarizing search engines' results; spam filtering; credit card abuse detection in financial transactions; document summarization; learning feature representation; hypertext clustering and web searching and building recommendation engine for recommending products to users on e-commerce websites. As most of such datasets are noisy and contains outliers and presence of outliers may give results that are very different from optimal results. One can apply the outlier detection and removal mechanism described in the present disclosure to remove outliers to obtain optimal results. In some datasets, detected outliers can give valuable insights of the datasets and provide useful information.” (paragraph 0087). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include, in response to removal of the medical data of the one or more individuals in the one or more least-representative clusters, generating, by the one or more computers, a pruned dataset as taught by Yadav in order to clean the dataset and enhance accuracy of the model.

Steinberg-Koch and Yadav do not teach combining, by the one or more computers, the pruned dataset with a control dataset comprising medical data of a plurality of individuals without the rare disease. However, Feng teaches Biomarker Compositions Specific to Coronary Heart Disease Patients and Uses Thereof and further teaches, “In one embodiment of the present invention, the method further comprises a step of establishing a training set for contents of the biomarker composition according to the first aspect of the present invention in samples (e.g., blood plasma, whole blood) of a coronary heart disease subject and a normal subject (control group).” (paragraph 0025) and “Endogenous small molecules in body are the basis of life activities, and changes of disease states and body functions will inevitably lead to changes of metabolism of the endogenous small molecules in the body. The present invention shows that there are significant differences in blood plasma metabolite profiles between the coronary heart disease group and the control group. In the present invention, a plurality of relevant biomarkers are obtained through comparison and analysis of metabolite profiles of the coronary heart disease group and the control group, which can be used in combination with high quality data of metabolite profiles of biomarkers of coronary heart disease population and normal population as the training set to accurately perform risk assessment, early diagnosis and pathological staging of coronary heart disease.” (paragraph 0075). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include combining, by the one or more computers, the pruned dataset with a control dataset comprising medical data of a plurality of individuals without the rare disease as taught by Feng in order to create a model that includes individuals with and without the rare disease.

Steinberg-Koch and Yadav do not teach generating, by the one or more computers, a training dataset in response to combining the pruned dataset with the control data set. However, Feng teaches Biomarker Compositions Specific to Coronary Heart Disease Patients and Uses Thereof and further teaches, “In one embodiment of the present invention, the method further comprises a step of establishing a training set for contents of the biomarker composition according to the first aspect of the present invention in samples (e.g., blood plasma, whole blood) of a coronary heart disease subject and a normal subject (control group).” (paragraph 0025) and “Endogenous small molecules in body are the basis of life activities, and changes of disease states and body functions will inevitably lead to changes of metabolism of the endogenous small molecules in the body. The present invention shows that there are significant differences in blood plasma metabolite profiles between the coronary heart disease group and the control group. In the present invention, a plurality of relevant biomarkers are obtained through comparison and analysis of metabolite profiles of the coronary heart disease group and the control group, which can be used in combination with high quality data of metabolite profiles of biomarkers of coronary heart disease population and normal population as the training set to accurately perform risk assessment, early diagnosis and pathological staging of coronary heart disease.” (paragraph 0075). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include generating, by the one or more computers, a training dataset in response to combining the pruned dataset with the control data set as taught by Feng in order to develop and use a model that includes individuals with and without the rare disease.
  
Steinberg-Koch, Yadav, and Feng do not teach applying, by the one or more computers, a natural language algorithm to a corpus of medical literature related to the rare disease to extract a set of clinical terms. However, Han teaches Adaptive Weighting of Similarity Metrics for Predictive Analytics of a Cognitive System and further teaches, “Present invention embodiments may be utilized as part of a cognitive system, e.g., comprising a machine learning system and/or a natural language processing system that is used to extract data (e.g., extract text describing characteristics of an entity, extract drug related information describing characteristics of an entity, and extract chemical information describing structural aspects of an entity)…” (paragraph 0057) and “Client systems 20 enable users to submit queries (e.g., queries for predictive analytics, etc.) to server systems 10 to generate predictions based upon an analysis of a large corpus of data (e.g., scientific data, scientific journals, publically and/or privately accessible chemical databases, databases of known pharmaceutical and/or biologic therapeutic compounds, databases/literature of known genes/DNA, databases/literature of expressed RNA, databases/literature covering proteomics, databases/literature covering metabolomics, etc.)…” (paragraph 0015). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include applying, by the one or more computers, a natural language algorithm to a corpus of medical literature related to the rare disease to extract a set of clinical terms as taught by Han in order to identify new or relevant keywords associated with the rare disease.

Steinberg-Koch, Yadav, and Feng do not teach generating, by the one or more computers, word embeddings for (i) the set of clinical terms extracted from the medical literature and (ii) the predefined set of specific symptoms of the rare disease. However, Han teaches Adaptive Weighting of Similarity Metrics for Predictive Analytics of a Cognitive System and further teaches, “A first similarity measure, textSimilarity 210, may be used to determine a similarity between entities based on the context of a term in a document, etc. Text similarity may evaluate a corpus of documents that mentions a particular entity, e.g., a particular chemical name, a particular gene name, a disease type, or other characteristic, etc. to determine a similarity score based on contextual analysis. This function may consider the context of an entity (e.g., what the document, which may be a publication, a literature reference, or a database, discloses about the particular entity). In some aspects, a document may be represented by a feature vector, a vector of terms which describe the document. The text similarity function may compare, using feature vectors, documents comprising particular entities across a corpus of documents, to determine a similarity score for an entity in set B/B′ relative to set A.” (paragraph 0028). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include generating, by the one or more computers, word embeddings for (i) the set of clinical terms extracted from the medical literature and (ii) the predefined set of specific symptoms of the rare disease as taught by Han in order to determine relationships or correlations between the words or terms.

Steinberg-Koch, Yadav, and Feng do not teach determining, by the one or more computers, a similarity between the word embeddings of the clinical terms and the word embeddings of the predefined set of specific symptoms of the rare disease using a similarity metric. However, Han teaches Adaptive Weighting of Similarity Metrics for Predictive Analytics of a Cognitive System and further teaches, “A first similarity measure, textSimilarity 210, may be used to determine a similarity between entities based on the context of a term in a document, etc. Text similarity may evaluate a corpus of documents that mentions a particular entity, e.g., a particular chemical name, a particular gene name, a disease type, or other characteristic, etc. to determine a similarity score based on contextual analysis. This function may consider the context of an entity (e.g., what the document, which may be a publication, a literature reference, or a database, discloses about the particular entity). In some aspects, a document may be represented by a feature vector, a vector of terms which describe the document. The text similarity function may compare, using feature vectors, documents comprising particular entities across a corpus of documents, to determine a similarity score for an entity in set B/B′ relative to set A.” (paragraph 0028). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include determining, by the one or more computers, a similarity between the word embeddings of the clinical terms and the word embeddings of the predefined set of specific symptoms of the rare disease using a similarity metric as taught by Han in order to measure the closeness or relationships between or among the words or terms.

Steinberg-Koch, Yadav, Feng and Han do not teach augmenting, by the one or more computers, the training dataset with one or more clinical terms having similarity scores above a threshold value to generate an augmented training dataset. However, Genner teaches Face Cover-Compatible Biometrics and Processes for Generating and Using Same and further teaches, “FIG. 6 shows a chart 600 that demonstrates similarity score distributions obtained from the experiment and classified into a control group 605 and an experimental group 607. According to one embodiment, score distributions of the control group 605 were not calibrated and score distributions of the experimental group 607 were calibrated.” (column 26, lines 16-21). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include augmenting, by the one or more computers, the training dataset with one or more clinical terms having similarity scores above a threshold value to generate an augmented training dataset as taught by Genner in order to accurately define patients that might have the rare disease.

Claims 2, 13 and 18:
	As per claims 2, 13, and 18, Steinberg-Koch, Yadav, Feng, Han and Genner teach the method, system, and nontransitory computer readable medium of claims 1, 12, and 17 as described above and Steinberg-Koch further teaches wherein removing the medical data of the one or more of the individuals comprises:
for each individual in the least-representative clusters:  determining whether medical data associated with the individual satisfies a threshold condition, the threshold condition being defined based on symptoms of the rare disease (paragraphs 0177-0178 and 0096 and 0098). 

Yadav further teaches, in response to determining that the threshold condition is not satisfied, removing the individual from the plurality of clusters (paragraphs 0085 and 0087). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include, in response to determining that the threshold condition is not satisfied, removing the individual from the plurality of clusters as taught by Yadav in order to maintain accurate data corresponding to the model.

Claims 3, 14, and 19:
	As per claims 3, 14, and 19, Steinberg-Koch, Yadav, Feng, Han and Genner teach the method, system, and nontransitory computer readable medium of claims 2, 13, and 18 as described above and Steinberg-Koch further teaches wherein the threshold condition comprises a threshold number of symptoms of the rare disease (paragraph 0096).

Claims 4, 15, and 20:
	As per claims 4, 15, and 20, Steinberg-Koch, Yadav, Feng, Han and Genner teach the method, system, and nontransitory computer readable medium of claims 2, 13, and 18 as described above and Steinberg-Koch further teaches wherein satisfying the threshold condition comprises having at least one symptom of the rare disease in a characteristic subset of symptoms of the rare disease (paragraphs 0096, 0144-0146).

Claims 5 and 16:
	As per claims 5 and 16, Steinberg-Koch, Yadav, Feng, Han and Genner teach the method and system of claims 1 and 12 as described above and Steinberg-Koch further teaches wherein identifying the plurality of clusters of individuals in the initial dataset comprises using a hierarchical agglomerative clustering to cluster the initial dataset into a predetermined number of clusters (paragraphs 0025 and 0177-0178).

Claim 6:
	As per claim 6, Steinberg-Koch, Yadav, Feng, Han and Genner teach the method of claim 1 as described above and Feng further teaches further comprising:
comparing the medical data in the pruned dataset to medical data in the control dataset to identify one or more potential symptoms of the rare disease (paragraph 0075). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include comparing the medical data in the pruned dataset to medical data in the control dataset to identify one or more potential symptoms of the rare disease as taught by Feng in order to assess and compare the sets of data.

and augmenting the training dataset with the one or more potential symptoms of the rare disease (paragraph 0075). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include augmenting the training dataset with the one or more potential symptoms of the rare disease as taught by Feng in order to make corrections and facilitate accuracy of the data.

Claim 7:
	As per claim 7, Steinberg-Koch, Yadav, Feng, Han and Genner teach the method of claim 1 as described above and Steinberg-Koch further teaches further comprising:
extracting, by using natural language processing, one or more potential symptoms of the rare disease from literature associated to the rare disease (paragraph 0081);
and augmenting the training dataset with the one or more potential symptoms of the rare disease (paragraph 0081).

Claim 9:
	As per claim 9, Steinberg-Koch, Yadav, Feng, Han and Genner teach the method of claim 1 as described above and Feng further teaches wherein combining the pruned dataset with the control dataset comprises matching a plurality of individuals with medical data in the control dataset to each individual with medical data in the pruned dataset at a predefined ratio, the matching being based on one or more demographic properties of the individuals (paragraph 0075). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include wherein combining the pruned dataset with the control dataset comprises matching a plurality of individuals with medical data in the control dataset to each individual with medical data in the pruned dataset at a predefined ratio, the matching being based on one or more demographic properties of the individuals as taught by Feng in order to analyze and compare attributes associated with each of the groups.

Claim 10:
	As per claim 10, Steinberg-Koch, Yadav, Feng, Han and Genner teach the method of claim 1 as described above and Steinberg-Koch further teaches wherein the method further comprises training a machine learning model on the training dataset, wherein the machine learning model is trained on a subset of data in the training dataset, the subset comprising, for each individual with medical data in the training dataset with the rare disease, medical data collected prior to the individual being diagnosed with the rare disease (paragraphs 0030-0032).

Claim 11:
	As per claim 11, Steinberg-Koch, Yadav, Feng, Han and Genner teach the method of claim 10 as described above and Steinberg-Koch further teaches further comprising diagnosing that a particular individual has the rare disease by:

inputting, into the machine learning model, medical data associated with the particular individual (paragraphs 0030-0032);

processing, by using the machine learning model, the input medical data to generate data indicative of whether the particular individual has the rare disease (paragraphs 0030-0032);

and outputting, from the machine learning model, the data indicative of whether the particular individual has the rare disease (paragraphs 0030-0032).

Claim(s) 8 is rejected under 35 U.S.C. 103 as being unpatentable over Steinberg-Koch, Yadav, Feng, Han and Genner as applied to claim 1 above, and further in view of Boon US Publication 20170196541 A1.
Claim 8:
	As per claim 8, Steinberg-Koch, Yadav, Feng, Han and Genner teach the method of claim 1 as described above but do not teach wherein the control dataset comprises medical data of individuals without the rare disease having at least a threshold number of symptoms of the rare disease. However, Boon teaches a System and Method for Quantitative Muscle Ultrasound for the Diagnosis of Neuromuscular Disease and further teaches, “In accordance with the present disclosure, a method for determining a normalized ultrasound data value for a patient of interest having a known age, weight, height, and sex includes receiving an ultrasound imaging data set comprising an ultrasound data value for each member of a normalized control group of individuals that do not have a disease state of interest.” (paragraph 0010). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Steinberg-Koch to include wherein the control dataset comprises medical data of individuals without the rare disease having at least a threshold number of symptoms of the rare disease as taught by Boon in order to enhance the study of the model and predict individuals with a disease.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW L HAMILTON whose telephone number is (571)270-1837. The examiner can normally be reached Monday-Thursday 9:30-5:30 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fonya Long, can be reached at (571)270-5096. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MATTHEW L HAMILTON/Primary Examiner, Art Unit 3682
Prosecution Timeline

Dec 19, 2024: Application Filed
Mar 26, 2026: Non-Final Rejection mailed, §101, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/625,113, Patent 12633382: RANKING BIOLOGICAL ENTITY PAIRS BY EVIDENCE LEVEL (4y 4m to grant; granted May 19, 2026)
17/950,398, Patent 12626792: HEALTHCARE NETWORK (3y 7m to grant; granted May 12, 2026)
18/557,909, Patent 12603184: Systems and Methods for Continuous Cancer Treatment and Prognostics (2y 5m to grant; granted Apr 14, 2026)
18/105,015, Patent 12597510: TECHNOLOGIES FOR MEDICAL DEVICE USAGE TRACKING AND ANALYSIS (3y 2m to grant; granted Apr 07, 2026)
18/374,450, Patent 12573500: ASSESSING OPERATOR BEHAVIOR DURING A MEDICAL PROCEDURE (2y 5m to grant; granted Mar 10, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 54%
With Interview (+61.9%): 99%
Median Time to Grant: 4y 1m (~2y 8m remaining)
PTA Risk: Low
Based on 511 resolved cases by this examiner. Grant probability derived from career allowance rate.