DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 21-40 as presented in the preliminary amendment dated 11/15/2021 are pending and are examined herein.
Claims 21-40 are rejected under 35 U.S.C. 112(b).
Claims 21-40 are rejected under 35 U.S.C. 101 as being directed to an abstract idea without significantly more.
Claims 21-40 are rejected under 35 U.S.C. 102 or 103.
Claims 21-40 are rejected on the grounds of non-statutory double patenting.
Information Disclosure Statement
The attached information disclosure statements (IDS) are in compliance with the provisions of 37 CFR 1.97. Accordingly, they have been considered by the examiner.
Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 21-40 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Independent claims 21, 35 and 40 recite “sensitive data portion”; however, “sensitive” is a subjective term which renders the scope of the claim indefinite. While the specification provides examples of data which may be sensitive (e.g., “customer financial information, patient healthcare information, and the like”, see published specification at [0030]), the specification does not provide a definition of the term. The term “sensitive” in its plain meaning does not unambiguously refer to a fixed collection of datatypes, but rather is dependent on the context and priorities of the person using the term. Consequently, a person of ordinary skill in the art would not be reasonably apprised of the scope of the claimed invention. For the purposes of examination, “sensitive data portion” will be interpreted as “
Claims 30 and 31 use the terms “unstructured data” and “structured data”. These terms do not have a clear and unambiguous meaning in the art. For example, consider the following discussion from the Wikipedia article for Unstructured data dated 23 April 2018: “The term is imprecise for several reasons: 1. Structure, while not formally defined, can still be implied. 2. Data with some form of structure may still be characterized as unstructured if its structure is not helpful for the processing task at hand. 3. Unstructured information might have some structure (semi-structured) or even be highly structured but in ways that are unanticipated or unannounced.” Consequently, a person having ordinary skill in the art would not be reasonably apprised of the scope of structured/unstructured data. For the purposes of examination, claim 30 will be interpreted as “The system of claim 21, wherein the actual data comprises
Claim Rejections - 35 USC § 101 – Abstract Idea
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 21-40 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis
Each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2 Analysis
Claim 21 includes the following recitation of an abstract idea:
determining a class associated with the at least one sensitive data portion; (This is practical to perform in the human mind under its broadest reasonable interpretation. This is a recitation of a mental process.)
...generating, ..., at least one synthetic data portion; and (This is practical to perform in the human mind under its broadest reasonable interpretation. This is a recitation of a mental process.)
replacing the at least one sensitive data portion with the at least one synthetic data portion. (This is practical to perform in the human mind under its broadest reasonable interpretation. This is a recitation of a mental process.)
Claim 21 recites the following additional elements which, considered individually and as an ordered combination, do not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea:
A system comprising: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system to perform operations comprising: (This is a high level recitation of generic computer components for performing the abstract idea. This does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(f).)
receiving actual data having at least one sensitive data portion; (This is insignificant extra-solution activity. See MPEP 2106.05(g). Moreover, sending or receiving data is well-understood, routine, and conventional, as evidenced by the court cases cited at MPEP 2106.05(d) (example i, receiving or transmitting data).)
accessing a synthetic data generation model trained using a data space having data of the class; (This is a high level recitation of generic computer components for performing the abstract idea. This does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(f).)
...using the synthetic data generation model (This is a high level recitation of generic computer components for performing the abstract idea. This does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(f).)
Claim 21 does not reflect an improvement to computer technology or any other technology.
Claim 22 recites at least the abstract idea identified above in the claim upon which it depends, and further recites
generate synthetic data satisfying a similarity criterion. (This is practical to perform in the human mind under its broadest reasonable interpretation. This is a recitation of a mental process.)
Claim 22 recites the following additional elements which, considered individually and as an ordered combination with the additional elements from the claim upon which it depends, do not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea:
wherein the synthetic data generation model is trained to (This is a high level recitation of generic computer components for performing the abstract idea. This does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(f).)
Claim 22 does not reflect an improvement to computer technology or any other technology.
Claim 23 recites at least the abstract idea identified above in the claim upon which it depends, and further recites
wherein the similarity criterion is based on at least one of a statistical correlation score, a data similarity score, or a data quality score. (This is a further detail of the mental process. This is practical to perform in the human mind under its broadest reasonable interpretation. A person could both generate and use any of the listed scores. This is a recitation of a mental process.)
Claim 23 does not recite further additional elements which might integrate the abstract idea into a practical application or amount to significantly more than the abstract idea.
Claim 23 does not reflect an improvement to computer technology or any other technology.
Claim 24 recites at least the abstract idea identified above in the claim upon which it depends.
Claim 24 recites the following additional elements which, considered individually and as an ordered combination with the additional elements from the claim upon which it depends, do not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea:
wherein the synthetic data generation model is a generative adversarial network (GAN). (This is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(h).)
Claim 24 does not reflect an improvement to computer technology or any other technology.
Claim 25 recites at least the abstract idea identified above in the claim upon which it depends, and further recites
determining the class associated with the at least one sensitive data portion (This is practical to perform in the human mind under its broadest reasonable interpretation. This is a recitation of a mental process.)
Claim 25 recites the following additional elements which, considered individually and as an ordered combination with the additional elements from the claim upon which it depends, do not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea:
comprises applying a recurrent neural network (RNN) to the actual data. (This is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(h).)
Claim 25 does not reflect an improvement to computer technology or any other technology.
Claim 26 recites at least the abstract idea identified above in the claim upon which it depends, and further recites
distinguish between classes of data (This is practical to perform in the human mind under its broadest reasonable interpretation. This is a recitation of a mental process.)
Claim 26 recites the following additional elements which, considered individually and as an ordered combination with the additional elements from the claim upon which it depends, do not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea:
wherein the RNN is trained to (This is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(h).)
Claim 26 does not reflect an improvement to computer technology or any other technology.
Claim 27 recites at least the abstract idea identified above in the claim upon which it depends.
Claim 27 recites the following additional elements which, considered individually and as an ordered combination with the additional elements from the claim upon which it depends, do not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea:
wherein the at least one sensitive data portion is a first text string and the at least one synthetic data portion is a second text string. (This is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(h).)
Claim 27 does not reflect an improvement to computer technology or any other technology.
Claim 28 recites at least the abstract idea identified above in the claim upon which it depends.
Claim 28 recites the following additional elements which, considered individually and as an ordered combination with the additional elements from the claim upon which it depends, do not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea:
wherein the at least one sensitive data portion comprises at least one of a social security number or a financial service account number. (This is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(h).)
Claim 28 does not reflect an improvement to computer technology or any other technology.
Claim 29 recites at least the abstract idea identified above in the claim upon which it depends.
Claim 29 recites the following additional elements which, considered individually and as an ordered combination with the additional elements from the claim upon which it depends, do not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea:
wherein the actual data comprises personnel records or patient medical records. (This is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(h).)
Claim 29 does not reflect an improvement to computer technology or any other technology.
Claim 30 recites at least the abstract idea identified above in the claim upon which it depends.
Claim 30 recites the following additional elements which, considered individually and as an ordered combination with the additional elements from the claim upon which it depends, do not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea:
wherein the actual data comprises unstructured data, the unstructured data comprising at least one of character strings or tokens. (This is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(h).)
Claim 30 does not reflect an improvement to computer technology or any other technology.
Claim 31 recites at least the abstract idea identified above in the claim upon which it depends.
Claim 31 recites the following additional elements which, considered individually and as an ordered combination with the additional elements from the claim upon which it depends, do not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea:
wherein the actual data comprises structured data, the structured data comprising at least one of a key-value pair, a relational database file, or a spreadsheet. (This is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(h).)
Claim 31 does not reflect an improvement to computer technology or any other technology.
Claim 32 recites at least the abstract idea identified above in the claim upon which it depends, and further recites
the operations further comprising selecting the synthetic data generation model based on the determined class. (This is practical to perform in the human mind under its broadest reasonable interpretation. This is a recitation of a mental process.)
Claim 32 does not recite further additional elements which might integrate the abstract idea into a practical application or amount to significantly more than the abstract idea.
Claim 32 does not reflect an improvement to computer technology or any other technology.
Claim 33 recites at least the abstract idea identified above in the claim upon which it depends, and further recites
determining a subclass associated with the at least one sensitive data portion; (This is practical to perform in the human mind under its broadest reasonable interpretation. This is a recitation of a mental process.)
and selecting the synthetic data generation model based on the determined subclass. (This is practical to perform in the human mind under its broadest reasonable interpretation. This is a recitation of a mental process.)
Claim 33 does not recite further additional elements which might integrate the abstract idea into a practical application or amount to significantly more than the abstract idea.
Claim 33 does not reflect an improvement to computer technology or any other technology.
Claim 34 recites at least the abstract idea identified above in the claim upon which it depends, and further recites
wherein determining the subclass comprises using a distribution model associated with the class. (This is practical to perform in the human mind under its broadest reasonable interpretation. This is a recitation of a mental process.)
Claim 34 does not recite further additional elements which might integrate the abstract idea into a practical application or amount to significantly more than the abstract idea.
Claim 34 does not reflect an improvement to computer technology or any other technology.
Claims 35-39 are method claims whose steps are substantially similar to the operations recited in system claims 21-23 and 32-33, respectively, and are rejected with the same rationale.
Claim 40 recites substantially similar subject matter to claim 21, including substantially the same abstract idea.
Claim 40 recites the following additional element which, considered individually and in ordered combination with the additional elements already addressed above with respect to claim 21, does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea:
using a neural network classifier configured to distinguish classes of sensitive data (This is a high level recitation of generic computer components for performing the abstract idea. This does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(f).)
Claim 40 does not reflect an improvement to computer technology or any other technology.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 21, 27-30, 32-33, 35, and 38-39 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by “Foroughi” (US 10,546,054 B1).
Regarding claim 21, Foroughi teaches
A system comprising: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system to perform operations comprising: (Foroughi, Figure 9a, described at column 7, line 12 through column 8, line 9.)
receiving actual data having at least one sensitive data portion; (Foroughi, Figure 1 shows that data is received from an electronic data source 102. Figure 2, step 201 indicates that a form document is received from an electronic data source. Column 3, lines 55-67 provides some examples of the data. Any of the data would, in at least some circumstances, be regarded as sensitive data (e.g., SSN, wages, tax data, state, and control number). Any collection of this data could be interpreted as being the “sensitive data portion”. See also column 4, lines 39-47, where it is indicated that the data source may be real electronic record data.)
determining a class associated with the at least one sensitive data portion; (Foroughi, Figure 1, element 104, described at column 3, lines 55-67. See also Figure 2, step 203, described at column 4, lines 49-53. The field value data from the input is mapped to a category.)
accessing a synthetic data generation model trained using a data space having data of the class; generating, using the synthetic data generation model, at least one synthetic data portion; and (Foroughi, Figure 1, elements 114-124, described at column 4, lines 1-23. Figure 2, steps 205-209, described at column 5, lines 9-51. The statistical data distributions are trained/learned based on the types of the data at step 207. The learned statistical data distributions and the PII anonymizer, taken together or separately, could be taken to be synthetic data generation model(s). The PII anonymizer and distributions are then used to generate the synthetic data at steps 205/209.)
replacing the at least one sensitive data portion with the at least one synthetic data portion. (Foroughi, Figure 1, element 126, described at column 4, lines 1-23. Figure 2, step 211, described at column 6, lines 11-16. When the synthesized data is placed in the field of the original data, the original data is replaced.)
Regarding claim 27, the rejection of claim 21 is incorporated herein. Furthermore, Foroughi teaches
wherein the at least one sensitive data portion is a first text string and the at least one synthetic data portion is a second text string. (Foroughi, column 3, lines 54-67, any of the listed examples would appear as a text string. See also Figures 3 and 7, where the original and synthesized forms show text boxes.)
Regarding claim 28, the rejection of claim 21 is incorporated herein. Furthermore, Foroughi teaches
wherein the at least one sensitive data portion comprises at least one of a social security number or a financial service account number. (Foroughi, column 3, lines 54-67 indicates that the data may include a social security number. The sensitive data portion would be interpreted as a collection of sensitive data that also includes the SSN.)
Regarding claim 29, the rejection of claim 21 is incorporated herein. Furthermore, Foroughi teaches
wherein the actual data comprises personnel records or patient medical records. (Foroughi, column 3, lines 54-67 indicates that the data may include employee records (i.e., personnel records).)
Regarding claim 30, the rejection of claim 21 is incorporated herein. Furthermore, Foroughi teaches
wherein the actual data comprises unstructured data, the unstructured data comprising at least one of character strings or tokens. (Foroughi, column 3, lines 54-67 provides a list of examples, any of which might be represented as character strings or tokens. See also Figures 3 and 7, where the original and synthesized forms show text boxes.)
Regarding claim 32, the rejection of claim 21 is incorporated herein. Furthermore, Foroughi teaches
the operations further comprising selecting the synthetic data generation model based on the determined class. (Foroughi, Figure 1, elements 114-124, described at column 4, lines 1-23. Figure 2, steps 205-209, described at column 5, lines 9-51. The statistical data distributions are trained/learned based on the types of the data at step 207. For the mapping of claim 32, the particular trained distribution used/selected for that particular type is taken to be the synthetic data generation model. See also column 4, lines 54-64. Column 5, line 64 through column 6, line 10 provides an example in which the CDF corresponding to “Federal Income Tax Withheld” is used to generate synthetic data for that particular data category.)
Regarding claim 33, the rejection of claim 21 is incorporated herein. Furthermore, Foroughi teaches
determining a subclass associated with the at least one sensitive data portion; and selecting the synthetic data generation model based on the determined subclass. (Foroughi, Figure 1, element 104, described at column 3, lines 55-67. See also Figure 2, step 203, described at column 4, lines 49-53. The field value data from the input is mapped to a label and data category. The combination of label and data category is a subclass of the data category alone (e.g., “Federal Income Tax Withheld” is a subcategory of numerical data). Figure 1, elements 114-124, described at column 4, lines 1-23. Figure 2, steps 205-209, described at column 5, lines 9-51. The statistical data distributions are trained/learned based on the types of the data at step 207. For the mapping of claim 33, the particular trained distribution used/selected for that particular type is taken to be the synthetic data generation model. See also column 4, lines 54-64. Column 5, line 64 through column 6, line 10 provides an example in which the CDF corresponding to “Federal Income Tax Withheld” is used to generate synthetic data for that particular data category.)
Regarding claim 35, Foroughi teaches
A method for replacing sensitive data, the method comprising: (Foroughi, Abstract)
The remainder of claim 35 is substantially similar to claim 21; claim 35 is rejected with the same rationale.
Regarding claims 38-39, the rejection of claim 35 is incorporated herein. Claims 38-39 recite substantially similar subject matter to claims 32-33, respectively, and are rejected with the same rationale.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 22-24 and 36-37 are rejected under 35 U.S.C. 103 as being unpatentable over “Foroughi” (US 10,546,054 B1) in view of “Park” (Data Synthesis based on Generative Adversarial Networks, arXiv:1806.03384v5).
Regarding claim 22, the rejection of claim 21 is incorporated herein. Foroughi does not appear to explicitly teach
wherein the synthetic data generation model is trained to generate synthetic data satisfying a similarity criterion.
However, Park, which is directed to analogous art, teaches
wherein the synthetic data generation model is trained to generate synthetic data satisfying a similarity criterion. (Park, Abstract describes using a generative adversarial network (GAN) to synthesize data to protect privacy. Section 4.2.2 describes the loss function used to train the network. In particular, it includes a privacy level which controls the similarity to the original data that the model generates.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Foroughi by Park because the use of a GAN “exhibits the best trade-off between privacy level and model compatibility” (Park, Conclusion).
Regarding claim 23, the rejection of claim 22 is incorporated herein. Foroughi does not appear to explicitly teach
wherein the similarity criterion is based on at least one of a statistical correlation score, a data similarity score, or a data quality score.
However, Park, which is directed to analogous art, teaches
wherein the similarity criterion is based on at least one of a statistical correlation score, a data similarity score, or a data quality score. (Park, Abstract describes using a generative adversarial network (GAN) to synthesize data to protect privacy. Section 4.2.2 describes the loss function used to train the network. The Lmean portion of the loss function, for example, measures the difference between the means of the features of the original and synthetic data. This is a measure of both similarity and also data quality.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined these references in this way for the same reasons given above with respect to claim 22.
Regarding claim 24, the rejection of claim 22 is incorporated herein. Foroughi does not appear to explicitly teach
wherein the synthetic data generation model is a generative adversarial network (GAN).
However, Park, which is directed to analogous art, teaches
wherein the synthetic data generation model is a generative adversarial network (GAN). (Park, Abstract describes using a generative adversarial network (GAN) to synthesize data to protect privacy. See sections 2.3 and 4.1 for more details of the GAN.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined these references in this way for the same reasons given above with respect to claim 22.
Regarding claims 36-37, the rejection of claim 35 is incorporated herein. Claims 36-37 recite substantially similar subject matter to claims 22-23, respectively, and are rejected with the same rationale.
Claims 25-26, 31, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over “Foroughi” (US 10,546,054 B1) in view of “Williamson” (US 2018/0232528 A1).
Regarding claim 25, the rejection of claim 21 is incorporated herein. Foroughi does not appear to explicitly teach
wherein determining the class associated with the at least one sensitive data portion comprises applying a recurrent neural network (RNN) to the actual data.
However, Williamson, which is directed to analogous art, teaches
wherein determining the class associated with the at least one sensitive data portion comprises applying a recurrent neural network (RNN) to the actual data. (Williamson, Abstract and Figure 5 describe and show performing classification of sensitive data to determine a sensitive data type. [0063] indicates that the data classifier may comprise an RNN. The data is described at [0022-0025]. The training is described at [0080-0081], where it is indicated that the model may be trained on labeled data that includes the sensitive data type. See also [0082], where it is indicated that the data type is one of the characteristics detected by the model in the data which is reported to the user.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Foroughi by Williamson because using a deep learning classifier allows the system to learn patterns specific to a particular organization, resulting in fewer errors, as described by Williamson at [0080].
Regarding claim 26, the rejection of claim 25 is incorporated herein. Foroughi does not appear to explicitly teach
wherein the RNN is trained to distinguish between classes of data.
However, Williamson, directed to analogous art, teaches
wherein the RNN is trained to distinguish between classes of data. (Williamson, [0063] describes using an RNN classifier to classify data. The training is described at [0080-0081], where it is indicated that the model may be trained on labeled data that includes the sensitive data type. See also [0082], where it is indicated that the data type is one of the characteristics detected by the model in the data which is reported to the user.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined these references in this way for the same reasons given above with respect to claim 25.
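As a purely illustrative sketch of the mechanics discussed above (this is not Williamson's actual implementation, and the class names are hypothetical), a minimal character-level Elman RNN can produce a probability distribution over sensitive-data classes; in practice, per Williamson [0080-0081], the weights would be learned from labeled examples of each class rather than randomly initialized:

```python
import numpy as np

VOCAB, HIDDEN = 128, 16
CLASSES = ["ssn", "account_number", "not_sensitive"]  # hypothetical classes

# Randomly initialized weights; a real classifier would train these on
# labeled data that includes the sensitive data type.
rng = np.random.default_rng(0)
Wxh = rng.normal(0.0, 0.1, (HIDDEN, VOCAB))
Whh = rng.normal(0.0, 0.1, (HIDDEN, HIDDEN))
Why = rng.normal(0.0, 0.1, (len(CLASSES), HIDDEN))

def classify(text: str) -> dict:
    """Run a minimal Elman RNN over the characters of `text` and return a
    probability for each class via a softmax over the final hidden state."""
    h = np.zeros(HIDDEN)
    for ch in text:
        x = np.zeros(VOCAB)
        x[min(ord(ch), VOCAB - 1)] = 1.0  # one-hot byte-level input
        h = np.tanh(Wxh @ x + Whh @ h)
    logits = Why @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return dict(zip(CLASSES, p))

probs = classify("123-45-6789")
```

The softmax output is what lets such a network "distinguish between classes of data" in the sense of instant claim 26: the class with the highest probability is the predicted class.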
Regarding claim 31, the rejection of claim 21 is incorporated herein. Foroughi does not appear to explicitly teach
wherein the actual data comprises structured data, the structured data comprising at least one of a key-value pair, a relational database file, or a spreadsheet.
However, Williamson, directed to analogous art, teaches
wherein the actual data comprises structured data, the structured data comprising at least one of a key-value pair, a relational database file, or a spreadsheet. (Williamson, [0063] describes using a deep learning classifier including a recurrent neural network to determine whether or not a data portion is sensitive. The data is described at [0022-0025]. In particular, [0025] indicates that the data may take the form of a relational database or an Excel file (i.e., a spreadsheet).)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined these references in this way for the same reasons given above with respect to claim 25, and additionally because the network would need to be trained using the data format that the customer uses.
Regarding claim 40, Foroughi teaches
A non-transitory computer readable medium containing instructions that, when executed by one or more processors, cause a computing system to perform operations comprising: (Foroughi, column 2, lines 12-33 and column 7, lines 50-59.)
The remainder of claim 40 is substantially similar to claim 26 (including claims 21 and 25 upon which claim 26 depends). Claim 40 is rejected with the same rationale, mutatis mutandis.
Claim 34 is rejected under 35 U.S.C. 103 as being unpatentable over “Foroughi” (US 10,546,054 B1) in view of “Ha-Thuc” (Large-scale hierarchical text classification without labelled data).
Regarding claim 34, the rejection of claim 33 is incorporated herein. Foroughi does not appear to explicitly teach
wherein determining the subclass comprises using a distribution model associated with the class.
However, Ha-Thuc, directed to analogous art, teaches
wherein determining the subclass comprises using a distribution model associated with the class. (Ha-Thuc, Abstract describes categorizing textual data using a hierarchical collection of models. This is described in more detail in section 4. For example, page 687, section 4.1, first paragraph explains that there is a distribution for each level of the hierarchy (shown in Figure 1 as a hierarchical tree of language models (LM)).)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Foroughi by Ha-Thuc because the hierarchical approach “is robust to noise in pseudo-relevant documents and could be able to identify terms relevant to categories at different levels of abstraction” (Ha-Thuc, Conclusion).
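As a purely illustrative sketch (not taken from Ha-Thuc or the instant specification; the class, subclasses, and weights below are hypothetical), determining a subclass via a distribution model associated with a class can be as simple as sampling a subclass from per-class frequencies and then invoking a subclass-specific generator:

```python
import random

# Hypothetical distribution model for subclasses of a "phone_number"
# class (the weights are illustrative, not from any reference).
SUBCLASS_DIST = {"us_format": 0.7, "intl_format": 0.3}

# Subclass-specific generators, one per subclass.
SUBCLASS_GENERATORS = {
    "us_format": lambda rng: f"({rng.randint(200, 999)}) 555-{rng.randint(0, 9999):04d}",
    "intl_format": lambda rng: f"+{rng.randint(1, 99)} {rng.randint(1000000, 9999999)}",
}

def generate_for_class(seed: int = 0) -> tuple:
    """Sample a subclass from the class's distribution model, then generate
    a synthetic value with that subclass's generator."""
    rng = random.Random(seed)
    subclass = rng.choices(list(SUBCLASS_DIST), weights=list(SUBCLASS_DIST.values()))[0]
    return subclass, SUBCLASS_GENERATORS[subclass](rng)

subclass, value = generate_for_class()
print(subclass, value)
```

Ha-Thuc's hierarchy of language models serves an analogous role: each level's distribution narrows the category before the next, more specific model is applied.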
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 21, 32-35, and 38-39 are rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 11,513,869. Although the claims at issue are not identical, they are not patentably distinct from each other as shown in the table below. Overlapping subject matter is shown in bold.
Claims 22-24 and 36-37 are rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 11,513,869 in view of “Park” (Data Synthesis based on Generative Adversarial Networks, arXiv:1806.03384v5).
Claims 25-26, 31, and 40 are rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 11,513,869 in view of “Williamson” (US 2018/0232528 A1).
Claims 27-30 are rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 11,513,869 in view of “Foroughi” (US 10,546,054 B1).
In the table, overlapping subject matter between the instant and patented claims is shown in bold. Subject matter in the instant claims which is not present in the patented claims is shown in italics, with an explanation as to how the secondary reference teaches that subject matter provided in the rightmost column.
Instant Application
11,513,869 Patent
Secondary Reference(s)
21. A system comprising:
at least one processor; and
at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system to perform operations comprising:
receiving actual data having at least one sensitive data portion;
determining a class associated with the at least one sensitive data portion;
accessing a synthetic data generation model trained using a data space having data of the class;
generating, using the synthetic data generation model, at least one synthetic data portion; and
replacing the at least one sensitive data portion with the at least one synthetic data portion.
1. A system for training models for outputting synthetic database query results, the system comprising: at least one memory unit for storing instructions; and at least one processor configured to execute the instructions to perform operations comprising: receiving a first query input entered by a user at a user interface; determining a type of the first query input; generating a synthetic dataset using a dataset generator comprising a trained generative adversarial network, the synthetic dataset: differing by at least a predetermined amount from a reference dataset according to a similarity metric; and comprising synthetic data portions generated by: determining a class of sensitive data portions in the first query input; selecting a subclass of sensitive data portions within the class based on a distribution model; generating synthetic data portions using a subclass-specific model trained to generate synthetic values for the selected subclass and not for other subclasses within the class; and replacing the sensitive data portions with the synthetic portions; based on the determined first query input type, providing the first query input and the synthetic dataset to a plurality of training models; training the plurality of training models based on the first query input and the synthetic dataset; receiving a second query input; determining a type of the second query input; and routing the second query input to a selected training model of the plurality of training models based on the determined second query input type and an output format of the selected training model.
N/A
22. The system of claim 21, wherein the synthetic data generation model is trained to generate synthetic data satisfying a similarity criterion.
Patented claim 1 as shown above with respect to instant claim 21.
Park, Abstract describes using a generative adversarial network (GAN) to synthesize data to protect privacy. Section 4.2.2 describes the loss function used to train the network. In particular, it includes a privacy level which controls the similarity to the original data that the model generates.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the patented claim by Park because the use of a GAN “exhibits the best trade-off between privacy level and model compatibility” (Park, Conclusion).
23. The system of claim 22, wherein the similarity criterion is based on at least one of a statistical correlation score, a data similarity score, or a data quality score.
Patented claim 1 as shown above with respect to instant claim 21.
Park, Abstract describes using a generative adversarial network (GAN) to synthesize data to protect privacy. Section 4.2.2 describes the loss function used to train the network. The Lmean portion of the loss function, for example, measures the difference between the means of the features of the original and synthetic data. This is a measure of both similarity and data quality.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined these references in this way for the same reasons given above with respect to claim 22.
24. The system of claim 22, wherein the synthetic data generation model is a generative adversarial network (GAN).
Patented claim 1 as shown above with respect to instant claim 21.
Park, Abstract describes using a generative adversarial network (GAN) to synthesize data to protect privacy. See sections 2.3 and 4.1 for more details of the GAN.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined these references in this way for the same reasons given above with respect to claim 22.
25. The system of claim 21, wherein determining the class associated with the at least one sensitive data portion comprises applying a recurrent neural network (RNN) to the actual data.
Patented claim 1 as shown above with respect to instant claim 21.
Williamson, Abstract and Figure 5 describe and show performing classification of sensitive data to determine a sensitive data type. [0063] indicates that the data classifier may comprise an RNN. The data is described at [0022-0025]. The training is described at [0080-0081], where it is indicated that the model may be trained on labeled data that includes the sensitive data type. See also [0082], where it is indicated that the data type is one of the characteristics detected by the model in the data which is reported to the user.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the claim by Williamson because using a deep learning classifier allows the system to learn patterns specific to a particular organization, resulting in fewer errors, as described by Williamson at [0080].
26. The system of claim 25, wherein the RNN is trained to distinguish between classes of data.
Patented claim 1 as shown above with respect to instant claim 21.
Williamson, [0063] describes using an RNN classifier to classify data. The training is described at [0080-0081], where it is indicated that the model may be trained on labeled data that includes the sensitive data type. See also [0082], where it is indicated that the data type is one of the characteristics detected by the model in the data which is reported to the user.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined these references in this way for the same reasons given above with respect to claim 25.
27. The system of claim 21, wherein the at least one sensitive data portion is a first text string and the at least one synthetic data portion is a second text string.
Patented claim 1 as shown above with respect to instant claim 21.
Foroughi, column 3, lines 54-67, any of the listed examples would appear as a text string. See also Figures 3 and 7, where the original and synthesized forms show text boxes.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the claim by Foroughi because the techniques taught by Foroughi allow for the generation of synthetic forms which preserve the privacy of users as described by Foroughi in the Abstract and column 5, lines 1-9.
28. The system of claim 21, wherein the at least one sensitive data portion comprises at least one of a social security number or a financial service account number.
Patented claim 1 as shown above with respect to instant claim 21.
Foroughi, column 3, lines 54-67 indicates that the data may include a social security number. The sensitive data portion would be interpreted as a collection of sensitive data that also includes an SSN.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined these references in this way for the same reasons given above with respect to claim 27.
29. The system of claim 21, wherein the actual data comprises personnel records or patient medical records.
Patented claim 1 as shown above with respect to instant claim 21.
Foroughi, column 3, lines 54-67 indicates that the data may include a social security number. The sensitive data portion would be interpreted as a collection of sensitive data that also includes an SSN.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined these references in this way for the same reasons given above with respect to claim 27.
30. The system of claim 21, wherein the actual data comprises unstructured data, the unstructured data comprising at least one of character strings or tokens.
Patented claim 1 as shown above with respect to instant claim 21.
Foroughi, column 3, lines 54-67 provides a list of examples, any of which might be represented as character strings or tokens. See also Figures 3 and 7, where the original and synthesized forms show text boxes.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined these references in this way for the same reasons given above with respect to claim 27.
31. The system of claim 21, wherein the actual data comprises structured data, the structured data comprising at least one of a key-value pair, a relational database file, or a spreadsheet.
Patented claim 1 as shown above with respect to instant claim 21.
Williamson, [0063] describes using a deep learning classifier including a recurrent neural network to determine whether or not a data portion is sensitive. The data is described at [0022-0025]. In particular, [0025] indicates that the data may take the form of a relational database or an Excel file (i.e., a spreadsheet).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined these references in this way for the same reasons given above with respect to claim 25.
32. The system of claim 21, the operations further comprising selecting the synthetic data generation model based on the determined class.
Patented claim 1 as shown above with respect to instant claim 21.
N/A
33. The system of claim 21, the operations further comprising:
determining a subclass associated with the at least one sensitive data portion;
and selecting the synthetic data generation model based on the determined subclass.
Patented claim 1 as shown above with respect to instant claim 21.
N/A
34. The system of claim 33, wherein determining the subclass comprises using a distribution model associated with the class.
Patented claim 1 as shown above with respect to instant claim 21.
N/A
35. A method for replacing sensitive data, the method comprising:
receiving actual data having at least one sensitive data portion;
determining a class associated with the at least one sensitive data portion;
accessing a synthetic data generation model trained using a data space having data of the class;
generating, using the synthetic data generation model, at least one synthetic data portion; and
replacing the at least one sensitive data portion with the at least one synthetic data portion.
Patented claim 1 as shown above with respect to instant claim 21.
N/A
36. The method of claim 35, wherein the synthetic data generation model is trained to generate synthetic data satisfying a similarity criterion.
Patented claim 1 as shown above with respect to instant claim 35.
Park as applied to instant claim 22.
37. The method of claim 36, wherein the similarity criterion is based on at least one of a statistical correlation score, a data similarity score, or a data quality score.
Patented claim 1 as shown above with respect to instant claim 35.
Park as applied to instant claim 23.
38. The method of claim 35, further comprising selecting the synthetic data generation model based on the determined class.
Patented claim 1 as shown above with respect to instant claim 21.
N/A
39. The method of claim 35, further comprising:
determining a subclass associated with the at least one sensitive data portion;
and selecting the synthetic data generation model based on the determined subclass.
Patented claim 1 as shown above with respect to instant claim 21.
N/A
40. A non-transitory computer readable medium containing instructions that, when executed by one or more processors, cause a computing system to perform operations comprising:
receiving actual data having at least one sensitive data portion;
determining, using a neural network classifier configured to distinguish classes of sensitive data, a class associated with the at least one sensitive data portion;
accessing a synthetic data generation model trained using training data associated with the class;
generating, using the synthetic data generation model, at least one synthetic data portion; and
replacing the at least one sensitive data portion with the at least one synthetic data portion.
Patented claim 1 as shown above with respect to instant claim 21.
Williamson as applied to instant claim 25.
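As context for the similarity-based limitations mapped in the table above (instant claims 22-23 and the “similarity metric” of patented claim 1), a minimal check in the spirit of Park's Lmean term, which compares per-feature means of original and synthetic data, can be sketched as follows (this is an illustration, not Park's implementation, and the threshold is hypothetical):

```python
import numpy as np

def mean_similarity(real: np.ndarray, synth: np.ndarray) -> float:
    """L2 distance between the per-feature means of the real and the
    synthetic data; smaller values indicate more similar datasets."""
    return float(np.linalg.norm(real.mean(axis=0) - synth.mean(axis=0)))

def satisfies_criterion(real: np.ndarray, synth: np.ndarray, threshold: float) -> bool:
    """A similarity criterion in the sense of instant claim 22: the
    synthetic data passes if its feature means stay within `threshold`."""
    return mean_similarity(real, synth) <= threshold

real = np.array([[1.0, 2.0], [3.0, 4.0]])
print(mean_similarity(real, real))  # identical data -> distance 0.0
```

A production criterion could combine several such scores (statistical correlation, data similarity, data quality) as recited in instant claim 23.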
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Newman (US 2018/0247078 A1) – Abstract describes providing masked information. Figure 5 shows that this includes replacing sensitive data with anonymized data.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Markus A Vasquez whose telephone number is (303)297-4432. The examiner can normally be reached Monday to Friday 9AM to 4PM PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARKUS A. VASQUEZ/ Primary Examiner, Art Unit 2121