DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 10 September 2025 has been entered. Applicant amended claims 1, 11-12, 15, 18, and 20 and cancelled claim 10. Accordingly, claims 1-9 and 11-20 remain pending.
Applicant’s amendment to the specification overcomes the drawing objection of 11 June 2025. Therefore, the drawing objection of 11 June 2025 is withdrawn.
Applicant’s amendment to claim 18 overcomes the claim objection of 11 June 2025. Therefore, the claim objection of 11 June 2025 is withdrawn.
Response to Arguments
Regarding the Drawing objection and Claim objection:
Applicant’s arguments, filed 10 September 2025, with respect to the drawing objection and the claim objection have been fully considered and are persuasive. The drawing objection and claim objection of 11 June 2025 have been withdrawn.
Regarding the 35 USC 112(b) rejection:
Applicant’s arguments, filed 10 September 2025, with respect to the 35 USC 112(b) rejection of 11 June 2025 have been fully considered and are persuasive for the claim rejections pertaining to the variables m and n. Therefore, the 35 USC 112(b) rejections for said claims are withdrawn.
Applicant’s remarks on page 10 regarding claim 7 do not explain the negative sign presented in “-max(…)”. Furthermore, the claims should define the “x” operator, which is used throughout the claims. It is unclear whether the operator represents a function, multiplication, or a cross product. Therefore, the 35 USC 112(b) rejection is maintained.
Applicant’s arguments regarding the 35 USC 112(b) rejections for the synthetic generation process are not persuasive. Applicant is claiming potentially any and every type of synthetic generation process, so that any such process may be applied to the claim, which may require undue experimentation or an excessive amount of testing for one skilled in the art to make and use the invention. The specification must teach those skilled in the art how to make and use the full scope of the claimed invention without “undue experimentation”. Applicant’s disclosure does not provide experimental examples of the different types of processes that can be applied as the synthetic generation process. Therefore, weighing the factual considerations, the enablement requirement is not satisfied; see the 35 USC 112(a) rejection. The 35 USC 112(b) rejections of 11 June 2025 for said claims are maintained.
Regarding the 35 USC 101 rejection:
Applicant's arguments filed 10 September 2025 have been fully considered but they are not persuasive.
Applicant’s arguments:
[A] The current independent claims are not merely abstract ideas of mathematical concepts and mental processes without significantly more. In contrast, the current claims have been amended to particularly recite that the output comprises an output dataset that is for sharing for at least one of healthcare, artificial intelligence and machine learning, or research.
[B] The current independent claims use real data from a population to output an output dataset generated by a synthetic data generation process. The real data is split into a training dataset and a holdout dataset, which are combined using a sample proportion (n/N) to generate an attack dataset. The training dataset is used to generate a synthetic dataset using the synthetic data generation process. A number of matching records between the synthetic and attack datasets is determined and used to determine a membership disclosure risk score. If the determined score is at an acceptable level, the output dataset is output. As recited in claims 11 and 12, the output dataset is the synthetic dataset generated by the synthetic data generation process, or is a dataset generated by the synthetic data generation process from the real dataset.
[C] None of the claimed steps can be practically, accurately, and efficiently performed by the human mind. Generating and outputting these datasets based on determined membership disclosure risks cannot be practically performed by the human mind. The complexity and accuracy required for providing the large amounts of data used to generate attack and synthetic datasets, determine a membership disclosure risk score based on matching records between the synthetic and attack datasets, and outputting a generated output dataset based on the risk score are beyond the capabilities of the human mind. Any attempt to perform these steps manually would result in errors and inaccuracies, …Therefore, any output dataset for such uses should be accurately and reliably generated to prevent any adversary from compromising target individuals. Further, as recited in the description at paragraph [0084], the use of the partitioning fraction (or sample proportion) t=n/N, which is used to generate the attack dataset, allows for a more reliable output dataset to be generated and allows for less processing time in a computing system to generate the output dataset.
[D] The claimed invention therefore improves the functioning of computing devices by enabling them to generate output datasets that balance utility and privacy concerns in a computationally less costly manner. This improvement is achieved through the technical steps of generating the attack dataset using the partitioning fraction, generating synthetic data by the synthetic dataset generation process, determining membership disclosure risk scores, and outputting an output dataset generated by the synthetic dataset generation process based on the score. These steps allow for a more accurate evaluation of the privacy characteristics of the output dataset and enhance the capabilities of computing devices in handling large datasets for sharing while ensuring data privacy.
In addition, Applicant submits that the current claims recite elements that integrate the alleged 'judicial exception' into a practical application. These elements include the generated output dataset that is output to be shared for healthcare, artificial intelligence and machine learning, and/or research. The improved determination of the membership disclosure risk score allows for the generated output dataset to have high utility and meet privacy requirements for sharing.
Examiner’s remark:
[A] The claims are presented at such a high level of generality that nothing in the claims precludes them from being an abstract idea performed in the human mind and mathematical concepts. The additional element of the output dataset for sharing is considered post-solution activity of transmitting data. The transmission of data is routine, conventional, and well-understood in the art and does not add significantly more to the abstract idea.
[B]/[C] Applicant has indicated that the synthetic data generation process can be any type of process for generating synthetic data. This admission on the record does not prevent the process from being one that can be performed in the human mind with the use of pencil and paper. Furthermore, the claims do not impose restrictions on m, n, and N; nothing prevents one or more of those variables from being 0.
[D] The judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements. See the discussion of Diamond v. Diehr, 450 U.S. 175, 187 and 191-92, 209 USPQ 1, 10 (1981). In addition, the improvement can be provided by the additional element(s) in combination with the recited judicial exception. It has been determined that the claims do not provide additional elements that amount to significantly more than the abstract idea.
Regarding the 35 USC 103 rejection:
Applicant’s remarks:
Applicant submits that Oliveira further fails to disclose any generation of an attack dataset by combining m x (n/N) records from the training dataset and m x (1 - n/N) records from the hold out dataset, where m is the number of records in the attack dataset, n is the number of records in the real dataset, and N is the size of the population, nor any generation of the synthetic dataset as claimed. There is no determination or generation of any attack dataset that is used with a generated synthetic dataset to determine a number of matching records. Therefore, Oliveira does not disclose or suggest the generated attack dataset and the generated synthetic data, as claimed. It is respectfully noted that the fractions used to determine the number of records from the training and holdout datasets for the attack dataset of the claimed invention are not disclosed or suggested in Oliveira … Oliveira fails to teach or suggest such a fraction used to generate an attack dataset. None of Oliveira, Goncalves, or Duddu cure the deficiencies of Oliveira. None of the cited references teaches or suggests the claimed fractions used to determine the number of records from the training and holdout datasets for the attack dataset, which is then used to determine the membership disclosure risk score of the claimed invention.
Examiner’s remarks:
Applicant has defined the variables m and n; however, Applicant has not provided initial conditions or bounds on the variables that prevent one or more variables from being zero or N. Applying the broadest reasonable interpretation, the examiner recommends that Applicant provide more detail to preclude an interpretation of a null or zero dataset or of n=N. For example, at initialization of the attack dataset generation, m may be 0. The claims do not provide an initial condition for the attack dataset, but merely recite that the attack dataset is generated. Therefore, the attack dataset may initially contain 0 records, and no additional iteration is recited that would bring m out of the zero condition. Another condition is n=N, in which the attack dataset consists of m records from the training dataset and 0 records from the holdout dataset. The examiner has applied the prior art under the broadest reasonable interpretation, as best understood. Furthermore, other than the matching and outputting of the data, there is no recitation in the independent claims that the generated attack dataset is iterated beyond the initial m=0 condition. The claims do not recite an initial receiving of the attack dataset. The examiner has maintained the rejection.
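The two boundary conditions discussed above can be illustrated with a minimal sketch. It assumes that the “x” operator denotes ordinary multiplication and that fractional record counts are rounded to the nearest integer; neither assumption is stated in the claims, and the function name attack_dataset_counts is hypothetical.

```python
from typing import Tuple


def attack_dataset_counts(m: int, n: int, N: int) -> Tuple[int, int]:
    """Return (records taken from training, records taken from holdout).

    Assumptions (not recited in the claims): "x" is multiplication and
    the training share m x (n/N) is rounded to the nearest integer.
    """
    if N <= 0 or m < 0 or not 0 <= n <= N:
        raise ValueError("expected N > 0, m >= 0 and 0 <= n <= N")
    from_training = round(m * n / N)      # m x (n/N)
    from_holdout = m - from_training      # m x (1 - n/N), so the parts sum to m
    return from_training, from_holdout


# Ordinary case: a quarter of the attack records come from the training dataset.
print(attack_dataset_counts(100, 1000, 4000))
# m = 0: the attack dataset is empty.
print(attack_dataset_counts(0, 1000, 4000))
# n = N: every attack record comes from the training dataset.
print(attack_dataset_counts(100, 4000, 4000))
```

Under these assumptions, m=0 yields an empty attack dataset and n=N yields no holdout records, which are the two interpretations the claim language does not exclude.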
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
Claims 1-9 and 11-20 are rejected under 35 U.S.C. 112(a) as failing to comply with the enablement requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention.
Claims 1, 15, 19, and 20 recite generating a synthetic dataset from the training dataset or from the real dataset (as disclosed in claim 12) using a synthetic data generation process. Paragraph 71 of Applicant’s specification states, and Applicant noted in the remarks, that the synthetic data generation process can be any type of synthetic data generation process. Algorithms or working examples for the synthetic data generation processes used to generate the synthetic dataset are not described in the disclosure. The synthetic data generation process is required for the matching of the attack records and the synthetic dataset and for determining the membership disclosure risk score. Even though the specification names a limited number of example synthetic data generation processes, the disclosure does not describe the critical teachings of the synthetic data generation processes (being any type of synthetic data generation process) needed for the claims to be enabled. Therefore, the disclosure does not enable a skilled artisan to predict synthetic data across the breadth of the claimed synthetic generation process. The specification must teach those skilled in the art how to make and use the full scope of the claimed invention without “undue experimentation”.
In order to determine compliance with the enablement requirement of 35 U.S.C. 112(a), the Federal Circuit developed a framework of factors in In re Wands, 858 F.2d 731, 737, 8 USPQ2d 1400, 1404 (Fed. Cir. 1988), referred to as the Wands factors to assess whether any necessary experimentation required by the specification is "reasonable" or is "undue." Consistent with Amgen Inc. et al. v. Sanofi et al., 598 U.S. 594, 2023 USPQ2d 602 (2023), the Wands factors continue to provide a framework for assessing enablement in a utility application or patent, regardless of technology area. See Guidelines for Assessing Enablement in Utility Applications and Patents in View of the Supreme Court Decision in Amgen Inc. et al. v. Sanofi et al., 89 FR 1563 (January 10, 2024). These factors include, but are not limited to:
The breadth of the claims — Everything within the scope of the claim is not enabled. The specification does not teach every synthetic generation process technique to enable the claims for the generation of the output dataset generated by the synthetic generation process.
The nature of the invention — Specification merely recites nomenclature of a few synthetic generation processes. No formula, nor algorithm is provided for any synthetic generation processes.
The state of the prior art — While a few synthetic generation processes may be well known in the art, there are still a great number of different possible processes to generate synthetic data from training data/real data, and provide an output dataset based on matching and disclosure risks, and the specification does not identify any relevant formulas or working examples.
The level of one of ordinary skill — Even skilled artisans would need to devise new methods to practice the invention across its entire scope and must experiment broadly to determine which formulas to use;
The level of predictability in the art — Because any and every synthetic generation process can be applied to the training dataset or real dataset, results can vary greatly and be very unpredictable depending on the data, methods, and formulas used to determine the output dataset; furthermore, the use of any type of synthetic data generation process is not predictable for the matching to an attack dataset, so that a person skilled in the art may not have expected success in applying any and all types of synthetic data generation process.
The amount of direction provided by the inventor — Limited to very specific examples, yet different synthetic generation processes and no formulas/algorithms are supplied;
The existence of working examples — no examples that contain formulas or algorithms are provided; and
The quantity of experimentation needed to make or use the invention based on the content of the disclosure — substantial, since artisans must invent techniques and determine the correct mathematical formula for the different outputs potentially generated by the variety of synthetic generation processes.
Based on the evidence regarding the Wands factors above, the specification, at the time the application was filed, would not have taught one skilled in the art how to make and/or use the full scope of the claimed invention. In re Wright, 999 F.2d 1557, 1562, 27 USPQ2d 1510, 1513 (Fed. Cir. 1993).
Claims 2-9, 11, and 13-14 are rejected as being dependent on, and failing to overcome the deficiencies of, rejected independent claim 1.
Claims 16-18 are rejected as being dependent on, and failing to overcome the deficiencies of, rejected independent claim 15.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 1-9 and 11-20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, regards as the invention.
Claim 1 recites m x (n/N) records from the training dataset and m x (1 - n/N) records from the holdout dataset for generating an attack dataset. However, m is not initially defined, yet m is defined as the attack dataset. The claim does not provide a limitation of previously extracting an initial “m”. The claim does not prevent “m” from being zero and does not provide steps of going beyond the “0th” iteration. It is unclear whether m is initialized or defined in the initial iteration of generating the attack dataset. Therefore, the claim has failed to distinctly claim the subject matter. The examiner has interpreted said limitations as best understood.
Claim 7 recites the equation for the loss, which includes M>0.2 in one inversion bracket and M<=0.2 in another inversion bracket. The operator “x” is not defined where the loss equation places M>0.2 in one inversion bracket and M<=0.2 in the other. For example, for the inversion of M>0.2, |M| = 1 if the condition on M is true, and 0 otherwise. Thus, a value of 0 will zero out the entire term for that inversion. Likewise, for the inversion of M<=0.2, |M| = 1 if the condition on M is true, and 0 otherwise, so a value of 0 will zero out the entire term for that inversion. In addition, RU in lossru is not defined. Therefore, claim 7 is further rejected as being indefinite for failing to particularly point out and distinctly claim the loss equation. The examiner has interpreted said limitations as best understood.
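The behavior of the two brackets described above can be shown with a short sketch. It assumes that each bracket is an indicator that evaluates to 1 when its condition holds and 0 otherwise, as described in this action; the loss equation of claim 7 is not reproduced here, so the sketch does not attempt to compute it, and the function names are hypothetical.

```python
def indicator(condition: bool) -> int:
    """Return 1 if the bracketed condition is true, and 0 otherwise."""
    return 1 if condition else 0


def bracket_values(M: float, threshold: float = 0.2):
    """Evaluate the two brackets, [M > threshold] and [M <= threshold]."""
    return indicator(M > threshold), indicator(M <= threshold)


# For any numeric M the two conditions are complementary, so exactly one
# bracket is 1 and the other is 0.
for M in (0.1, 0.2, 0.5):
    print(M, bracket_values(M))
```

If “x” were read as multiplication, the term multiplied by the bracket that evaluates to 0 would vanish for every M, which is why the meaning of the operator matters for the scope of the claim.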
Claims 1, 5, 7, 15, and 20 use the same operator “x”. Based on Applicant’s remarks, it is unclear whether the operator indicates multiplication, a cross product, or some other function. Applicant is requested to define each operator (for example, according to Applicant’s remarks, the “x” operator appears to be a different function in claim 7 than in claim 1).
Claims 2-4, 6, 8-9 and 11-14 are rejected as being dependent on, and failing to cure the deficiencies of, rejected independent claim 1.
Claim 15 recites m x (n/N) records from the training dataset and m x (1 - n/N) records from the holdout dataset for generating an attack dataset. However, m is not initially defined, yet m is defined as the attack dataset. The claim does not provide a limitation of previously extracting an initial “m”. The claim does not prevent “m” from being zero and does not provide steps of going beyond the “0th” iteration. It is unclear whether m is initialized or defined in the initial iteration of generating the attack dataset. Therefore, the claim has failed to distinctly claim the subject matter. The examiner has interpreted said limitations as best understood as indicated in the office action.
Claims 16-19 are rejected as being dependent on, and failing to cure the deficiencies of, rejected independent claim 15.
Claim 20 recites m x (n/N) records from the training dataset and m x (1 - n/N) records from the holdout dataset for generating an attack dataset. However, m is not initially defined, yet m is defined as the attack dataset. The claim does not provide a limitation of previously extracting an initial “m”. The claim does not prevent “m” from being zero and does not provide steps of going beyond the “0th” iteration. It is unclear whether m is initialized or defined in the initial iteration of generating the attack dataset. Therefore, the claim has failed to distinctly claim the subject matter. The examiner has interpreted said limitations as best understood as indicated in the office action.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-9 and 11-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to abstract ideas of mathematical concepts and mental processes without significantly more. The independent claim(s) recite(s) “receiving at a computing device a real dataset comprising a plurality (n) of records from a population having an estimated population size of N; splitting the real dataset into a training dataset and a holdout dataset; generating an attack dataset by combining m x (n/N) records from the training dataset and m x (1 - n/N) records from the holdout dataset, where m is the number of records in the attack dataset and n is the number of records in the real dataset; generating a synthetic dataset from the training dataset using a synthetic data generation process; determining a number of matching records in the attack dataset that match a record in the synthetic dataset; determining a membership disclosure risk score based on the number of matching records; and providing an output at the computing device based on the membership disclosure risk score when the membership disclosure risk score indicates an acceptable level of risk, wherein the output comprises an output dataset generated by the synthetic data generation process, and the output dataset is for sharing for at least one of healthcare, artificial intelligence and machine learning, or research”. The limitations of “splitting the real dataset into a training dataset and a holdout dataset; generating an attack dataset by combining m x (n/N) records from the training dataset and m x (1 - n/N) records from the holdout dataset; generating a synthetic dataset from the training dataset using a synthetic data generation process; determining a number of matching records in the attack dataset that match a record in the synthetic dataset; determining a membership disclosure risk score based on the number of matching records, and wherein the output comprises an output dataset generated by the synthetic data generation process” pertain to a method of use in providing synthetic datasets and, under their broadest reasonable interpretation, cover performance of the limitations as mental processes and mathematical concepts, which are abstract ideas. The steps recited above can be performed manually by a human using pencil and paper and are directed to mathematical concepts. Therefore, nothing in the claimed elements precludes the steps from being mathematical concepts in combination with steps performed manually by a human using pencil and paper. If a claim under its broadest reasonable interpretation covers performance in the mind, or by a human using pencil and paper, and utilizes mathematical concepts, then the claim falls within the mental process and mathematical concepts groupings of abstract ideas. Accordingly, claims 1, 15, and 20 recite an abstract idea.
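For reference, the combining and matching steps recited above can be sketched as follows. This is an illustration only, under several assumptions that the claims do not state: “x” is read as multiplication, record counts are rounded to the nearest integer, records are drawn at random, and a “match” means exact equality of records. The function names are hypothetical, and the synthetic data generation process and the risk score are deliberately left out because the action does not specify them.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[int, ...]  # a record modeled as a hashable tuple of field values


def build_attack_dataset(training: Sequence[Record], holdout: Sequence[Record],
                         m: int, n: int, N: int,
                         rng: random.Random) -> List[Record]:
    """Combine m x (n/N) training records with m x (1 - n/N) holdout records."""
    k_train = round(m * n / N)   # assumed rounding rule
    k_hold = m - k_train
    return rng.sample(list(training), k_train) + rng.sample(list(holdout), k_hold)


def count_matching_records(attack: Sequence[Record],
                           synthetic: Sequence[Record]) -> int:
    """Count attack records that exactly equal some synthetic record."""
    synthetic_set = set(synthetic)
    return sum(1 for record in attack if record in synthetic_set)


# Toy data: 20 real records split evenly; the "synthetic" dataset simply
# copies the training dataset, which is the worst case for disclosure.
training = [(i,) for i in range(10)]
holdout = [(i,) for i in range(10, 20)]
attack = build_attack_dataset(training, holdout, m=4, n=20, N=40,
                              rng=random.Random(0))
matches = count_matching_records(attack, synthetic=training)
print(len(attack), matches)
```

With these toy values the attack dataset holds two training records and two holdout records, and only the two training records match the copied synthetic dataset.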
This judicial exception is not integrated into a practical application. The independent claims recite additional elements of “receiving at a computing device a real dataset comprising a plurality (n) of records from a population having an estimated population size of N”, “providing an output at the computing device based on the membership disclosure risk score when the membership disclosure risk score indicates an acceptable level of risk”. Claims 15 and 20 further recite computer components such as memory and processor which executes instructions pertaining to the method.
The additional element of “receiving at a computing device a real dataset comprising a plurality (n) of records from a population having an estimated population size of N” is merely data gathering or obtaining data to be manipulated. Data gathering is considered pre-solution activity or extra-solution activity that is well-understood, routine, and conventional. Such limitations do not integrate the abstract idea into a practical application and are not elements that are sufficient to amount to significantly more than the judicial exception because the gathering of data is extra-solution activity.
The additional element of “providing an output at the computing device based on the membership disclosure risk score when the membership disclosure risk score indicates an acceptable level of risk, wherein the output comprises an output dataset generated by the synthetic data generation process, and the output dataset is for sharing for at least one of healthcare, artificial intelligence and machine learning, or research” is an extra-solution activity or post-solution activity of outputting the data. Such limitations do not integrate the abstract idea into a practical application and are not elements that are sufficient to amount to significantly more than the judicial exception because the outputting of data is extra-solution activity.
The additional elements of generic computing components, such as the memory and processor which executes instructions pertaining to the method as recited in claims 15 and 20, are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using the generic computing components. These additional elements do not integrate the abstract idea into a practical application and are not elements that are sufficient to amount to significantly more than the judicial exception because they do not impose meaningful limits on practicing the abstract idea. Thus, the independent claims are not patent eligible under 35 USC 101.
Claims 2 and 16 recite further limitations that narrow the synthetic data generation process and such limitations are regarded as processes directed to a mental process and mathematical concept abstract idea. Because the limitations do not add significant elements, the claims do not integrate the judicial exception into a practical application and do not amount to significantly more than the judicial exception. Thus, claims 2 and 16 are not eligible under 35 USC 101.
Claims 3 and 17 recite further limitations that narrow the membership disclosure risk score and such limitations are regarded as processes directed to a mental process and mathematical concept abstract idea. Because the limitations do not add significant elements, the claims do not integrate the judicial exception into a practical application and do not amount to significantly more than the judicial exception. Thus, claims 3 and 17 are not eligible under 35 USC 101.
Claims 4 and 18 recite further limitations that narrow the membership disclosure risk score and such limitations are regarded as processes directed to a mental process and mathematical concept abstract idea. Because the limitations do not add significant elements, the claims do not integrate the judicial exception into a practical application and do not amount to significantly more than the judicial exception. Thus, claims 4 and 18 are not eligible under 35 USC 101.
Claim 5 recites further limitations that narrow the membership disclosure risk score and such limitations are regarded as processes directed to a mental process and mathematical concept abstract idea. Because the limitations do not add significant elements, the claim does not integrate the judicial exception into a practical application and does not amount to significantly more than the judicial exception. Thus, claim 5 is not eligible under 35 USC 101.
Claims 6 and 19 recite further limitations that narrow the membership disclosure risk score and such limitations are regarded as processes directed to a mental process and mathematical concept abstract idea. Because the limitations do not add significant elements, the claims do not integrate the judicial exception into a practical application and do not amount to significantly more than the judicial exception. Thus, claims 6 and 19 are not eligible under 35 USC 101.
Claim 7 recites further limitations that narrow the synthetic data generation process and such limitations are regarded as processes directed to a mental process and mathematical concept abstract idea. Because the limitations do not add significant elements, the claim does not integrate the judicial exception into a practical application and does not amount to significantly more than the judicial exception. Thus, claim 7 is not eligible under 35 USC 101.
Claims 8 and 11-12 recite further limitations that narrow the output recited in the independent claims and such limitations are regarded as post-solution activity. Because the limitations do not add significant elements, the claims do not integrate the judicial exception into a practical application and do not amount to significantly more than the judicial exception. Thus, claims 8 and 11-12 are not eligible under 35 USC 101.
Claim 9 recites further limitations that narrow the membership disclosure risk score and such limitations are regarded as processes directed to a mental process and mathematical concept abstract idea. Because the limitations do not add significant elements, the claim does not integrate the judicial exception into a practical application and does not amount to significantly more than the judicial exception. Thus, claim 9 is not eligible under 35 USC 101.
Claims 13-14 recite further limitations that narrow the membership disclosure risk score and such limitations are regarded as processes directed to a mental process and mathematical concept abstract idea. Because the limitations do not add significant elements, the claims do not integrate the judicial exception into a practical application and do not amount to significantly more than the judicial exception. Thus, claims 13-14 are not eligible under 35 USC 101.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-9 and 11-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Santana De Oliveira et al US 20230267216 (hereinafter Oliveira), in view of Goncalves A, Ray P, Soper B, Stevens J, Coyle L, Sales AP. Generation and evaluation of synthetic patient data. BMC Med Res Methodol. 2020 May 7;20(1):108. doi: 10.1186/s12874-020-00977-1. PMID: 32381039; PMCID: PMC7204018 (hereinafter Goncalves), in further view of Duddu, Vasisht et al. “SHAPr: An Efficient and Versatile Membership Privacy Risk Metric for Machine Learning.” ArXiv abs/2112.02230 (2021) (hereinafter Duddu), and in further view of Shalev et al WO 2022150343 (hereinafter Shalev).
As to claim 1, Oliveira teaches a method for use in providing a synthetic dataset (Figures 2 and 5 disclose system and method for use in providing a synthetic dataset, wherein the synthetic dataset is the training dataset combined with noise according to machine learning model) comprising:
receiving at a computing device (Figure 9 shows a block diagram of the computer system that executes the method) a real dataset comprising a plurality (n) of records from a population having an estimated population size of N (paragraph 30 discloses a dataset from a population is acquired. Note: The size of a population refers to a number N);
splitting the real dataset into a training dataset and a holdout dataset (paragraph 30 discloses splitting the dataset of the population into a training dataset and a test dataset. Note: since n and m are undefined variables in the claims, under the broadest reasonable interpretation the limitation merely states splitting the dataset such that a fraction of the dataset is the training dataset and a fraction of the dataset is the testing dataset);
generating an attack dataset by combining m x n/N records from the training dataset and m x (1 - n/N) records from the holdout dataset, where m is the number of records in the attack dataset and n is the number of records in the real dataset (paragraphs 30-32 disclose the adversary model/attack dataset is provided with the training data and the test data. Note: since m is not an initialized variable in the claims, under the broadest reasonable interpretation m=0 may be taken as the initial condition, in which case there is no attack dataset; however, under the broadest reasonable interpretation the limitation merely states splitting the dataset such that a portion of the dataset is the training dataset and a portion of the dataset is the testing dataset. Paragraph 29 reveals the output of the adversary model is used to modify the differential privacy budget applied to training data. The “x” operator is not defined, based on applicant arguments, see 35 USC 112(b) rejection);
generating a synthetic dataset from the training dataset using a synthetic data generation process (paragraph 47 discloses generating a transformed dataset by applying noise to the input data that is received by random noise generation process, wherein paragraph 25 discloses input data item include the training data);
determining a number of matching records in the [training data items] that match a record in the synthetic dataset (paragraph 50 discloses generating a membership loss indicating the probability that the training data items/synthetic dataset match the training dataset, using the adversary model);
determining a [risk score] based on the number of matching records (paragraph 50 discloses determining the probability that the training data items/synthetic dataset match the training dataset, using the adversary model); and
providing an output at the computing device based on the membership disclosure risk score when the membership disclosure risk score indicates an acceptable level of risk (paragraph 28 discloses the output of the classifier is then determined based on the transformed training data items generated by the multi-feature differential privacy layer. Additional training epochs may be performed until the output of the classifier reaches a suitable level of accuracy. Paragraph 41 discloses when there is no change to the privacy budget/score, then the classifier is ready for use on production input data items, wherein paragraph 25 discloses input data items include the training data. Paragraph 15 discloses the privacy budget ε may describe the maximum permissible difference between a query on the training data and the training data adding or removing one entry. The privacy budget can also describe the amount of random noise that is added to the machine learning training data set X.sub.1, such that the resulting machine learning model cannot be discerned from a machine learning model trained on the machine learning training data set adding or removing one entry X.sub.2. A lower privacy budget (e.g., a smaller permissible difference between the training data set X.sub.1 and the training data set adding or removing one entry X.sub.2) implies a higher level of random noise added to the training data set X.sub.1).
Oliveira does not teach determining a number of matching records in the attack dataset that match a record in the synthetic dataset and determining a membership disclosure risk score; and wherein the output comprises an output dataset generated by the synthetic data generation process, and the output dataset is for sharing for at least one of healthcare artificial intelligence and machine learning, or research.
Goncalves teaches determining a number of matching records in the attack dataset that match a record in the synthetic dataset (page 8, first column, first paragraph recites that, for membership disclosure, record x (which can be the attack dataset) is present in the training dataset if there is at least one synthetic data sample within a certain distance to the record x. Page 7, second column, last paragraph discloses that the attacker dataset entails an attacker having the complete set of records. Therefore the attack record is a combination of the training dataset and the test dataset, where a set of records used to train and another set of records called the test records are compared/matched with the synthetic dataset).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the matching in Oliveira’s method for use in providing a synthetic dataset with Goncalves’s teachings of matching records of an attacker with a synthetic dataset, in order to train the synthetic data models such that any private information leakage from the synthetic model is not significant (page 31, second column of Goncalves).
The combination of Oliveira in view of Goncalves does not teach determining a membership disclosure risk score; and wherein the output comprises an output dataset generated by the synthetic data generation process, and the output dataset is for sharing for at least one of healthcare artificial intelligence and machine learning, or research.
Duddu teaches determining a membership disclosure risk score based on the number of matching records (page 2, second column and page 3, first column, section B discloses computing the membership privacy risk metric based on matching records with the output data).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the probability score in Oliveira’s method for use in providing a synthetic dataset, in view of Goncalves’s teachings of matching records of an attacker with a synthetic dataset to train the synthetic data models, with Duddu’s membership disclosure risk to generate scores which can impact its effectiveness to assess susceptibility to newer membership inference attacks (page 1, second column, first paragraph of Duddu).
The combination of Oliveira in view of Goncalves and Duddu does not teach, but Shalev teaches and wherein the output comprises an output dataset generated by the synthetic data generation process, and the output dataset is for sharing for at least one of healthcare artificial intelligence and machine learning, or research (paragraph 60 reveals generative model generates synthetic data. The synthetic data may be input to various ML models).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the probability score in Oliveira’s method for use in providing a synthetic dataset, in view of Goncalves’s teachings of matching records of an attacker with a synthetic dataset to train the synthetic data models and Duddu’s membership disclosure risk, with Shalev’s teachings of the output synthetic data being input to various ML models, in order to improve the quality and quantity of data available for improving the modeling of systems, training machine learning models, and/or other purposes by offering improved generation of synthetic data and/or validation of the models generating the synthetic data (paragraph 9 of Shalev).
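For illustration only, the attack dataset construction and record matching recited in claim 1 can be sketched as follows. This is a minimal sketch and not the applicant's implementation or that of any cited reference. It assumes the undefined “x” operator denotes ordinary multiplication, rounds the counts to whole records, and treats a match as exact equality of records (Goncalves instead describes a distance threshold). The names build_attack_dataset and count_matches are hypothetical.

```python
import random


def build_attack_dataset(training, holdout, m, n, N, seed=0):
    """Draw m records: about m*n/N from training, the rest from holdout.

    Assumption: the claim's "x" operator is read as multiplication.
    Returns (record, is_member) pairs, where is_member marks training records.
    """
    rng = random.Random(seed)
    k_train = round(m * n / N)   # m x n/N, rounded to a whole record count
    k_holdout = m - k_train      # m x (1 - n/N), up to rounding
    members = rng.sample(training, k_train)
    non_members = rng.sample(holdout, k_holdout)
    return [(r, True) for r in members] + [(r, False) for r in non_members]


def count_matches(attack, synthetic):
    """Count attack records that exactly equal some synthetic record."""
    synthetic_set = set(synthetic)
    return sum(1 for record, _ in attack if record in synthetic_set)


# Toy data: records are hashable tuples; all values are made up.
training = [(i, "a") for i in range(12)]
holdout = [(i, "b") for i in range(8)]
attack = build_attack_dataset(training, holdout, m=10, n=20, N=100)
```

With these toy values, m x n/N is 2 and m x (1 - n/N) is 8, so the attack dataset holds two training records and eight holdout records.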
As to claim 2, the combination of Oliveira in view of Goncalves, Duddu, and Shalev teaches further comprising tuning one or more hyperparameters of the synthetic data generation process when the membership disclosure risk score indicates an unacceptable level of risk (Oliveira: paragraphs 15-16, 41, 44 disclose determining importance values for various features of the input/training data items to the classifier, wherein adjusting the privacy parameter affects the matching and the probability distribution/metric. The feature importance tool may send the feature importance values for the various features to the budget tool. It is then determined whether there are to be any changes to the respective privacy budgets for the various features of the input data items. The operation may be performed, for example, by the budget tool. If there is to be a change to the privacy budget of one or more of the features of the input data items, the change is made at operation and the process flow may return to the operation 202 with the updated feature privacy budget or budgets. Goncalves: page 9, first column, last paragraph to second column discloses utilizing hyper-parameters for the synthetic data generation. Hyperparameters are settings that determine the learning process and the resulting model's structure). Motivation is similar to the motivation presented in claim 1.
As to claim 3, the combination of Oliveira in view of Goncalves, Duddu, and Shalev teaches wherein the membership disclosure risk score is based on an F1 score (Duddu: page 7, second column, reveals the membership privacy risk score is based on an F1 score). Motivation similar to the motivation presented in claim 1.
As to claim 4, the combination of Oliveira in view of Goncalves, Duddu, and Shalev teaches wherein the membership disclosure risk is an M score determined according to: M = (F - Fmax)/(1 - Fmax), where F is the F1 score; and Fmax is a maximum F1 score (Duddu: page 7, second column, reveals the membership privacy risk score is based on an F1 score. Page 2, second column discloses the membership disclosure risk score as the posterior probability that z given the output prediction from the model fθ(xi), wherein the score is computed using Bayes' theorem. Average membership privacy risk score is the average over the membership privacy risk scores assigned to training data records by a metric to evaluate the membership privacy risk across a group of data records. Three additional metrics to measure the success of the SHAPr scores with respect to Iment and Ilira: precision and F1 score. The highest value, one, indicates perfect precision. Note: the examiner applied the BRI; because variable n is not defined in the claims, a value of n=0 results in M equal to the F1 score). Motivation similar to the motivation presented in claim 1.
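The normalization recited in claim 4 can be illustrated with a short sketch. It is offered under stated assumptions only: F is computed with the standard F1 definition from true positives, false positives, and false negatives, and the function names f1_score and m_score are hypothetical, not taken from the application or from Duddu.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Standard F1: harmonic mean of precision and recall."""
    denom = 2 * true_positives + false_positives + false_negatives
    return 0.0 if denom == 0 else 2 * true_positives / denom


def m_score(f, f_max):
    """M = (F - Fmax) / (1 - Fmax), as recited in claim 4.

    Assumption: 0 <= Fmax < 1, so the denominator is positive.
    """
    if not 0.0 <= f_max < 1.0:
        raise ValueError("f_max must lie in [0, 1)")
    return (f - f_max) / (1.0 - f_max)
```

Under this reading, M is 0 when F equals Fmax and reduces to F when Fmax is 0, which is the n=0 case the examiner relies on.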
As to claim 5, the combination of Oliveira in view of Goncalves, Duddu, and Shalev teaches wherein Fmax is determined according to: Fmax = (2n/N)/(1 + n/N) (Duddu: page 7, second column, reveals the membership privacy risk score is based on an F1 score. F1 score is the harmonic mean of precision and recall computed as (2*precision*recall)/(precision+recall). The highest value, one, indicates perfect precision and recall, while the minimum value of zero occurs when either precision or recall is zero. Note: the examiner applied the BRI; because variable n is not defined in the claims, n=0 results in Fmax=0, which will be the value when either the precision or recall values are zero). Motivation similar to the motivation presented in claim 1.
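As a hedged illustration of the claim 5 expression, the sketch below evaluates Fmax = (2n/N)/(1 + n/N) for given n and N. The function name f_max and the input checks are assumptions made for this example; they are not recited in the claims.

```python
def f_max(n, N):
    """Fmax = (2n/N) / (1 + n/N), as recited in claim 5.

    Assumption: N > 0 and 0 <= n <= N (a sample drawn from the population).
    """
    if N <= 0 or not 0 <= n <= N:
        raise ValueError("require N > 0 and 0 <= n <= N")
    ratio = n / N
    return (2 * ratio) / (1 + ratio)
```

Algebraically the expression simplifies to 2n/(N + n), so it is 0 when n=0 and 1 when n=N.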
As to claim 6, the combination of Oliveira in view of Goncalves, Duddu, and Shalev teaches wherein the M score is used in a loss function for training a model used in the synthetic data generation process (Duddu: page 5, second column, last paragraph, and page 6, first column, last paragraph disclose that the M score is computed and note that, in case 3, when the score is <0, there is a higher loss and the data record is indistinguishable from testing data records, which makes the data records less susceptible to MIAs. Page 2, column 2, further reveals the membership score is predicted by measuring the likelihood of its loss under each of the distributions and returning the membership corresponding to the most likely distribution). Motivation similar to the motivation presented in claim 1.
As to claim 7, the combination of Oliveira in view of Goncalves, Duddu, and Shalev teaches wherein the loss function is determined according to [equation image media_image3.png, not reproduced] (Duddu: page 6, first column, disclose a condition where there is no membership risk, where M=0 and therefore the loss is 0). Motivation similar to the motivation presented in claim 1.
As to claim 8, the combination of Oliveira in view of Goncalves, Duddu, and Shalev teaches wherein the output comprises the trained model used in the synthetic data generation process (Oliveira: paragraph 28 discloses the output of the classifier (paragraph 21 discloses classier machine learning model) is then determined bas