DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Acknowledgement is made of Applicant’s claim amendments filed 12/03/2025. The claim amendments are entered. Presently, claims 1-21 remain pending. Claims 1, 7, 12, and 17 have been amended.
Response to Arguments
Applicant's arguments filed 12/03/2025 have been fully considered but they are not persuasive.
Applicant argues: Rice does not disclose “second protected data [] not associated with the organization” and Dernoncourt and Rice do not teach “train [] the neural network on the second training dataset, wherein the training is based on the character-based token embedding of the unlabeled training data of the first training dataset to learn a language model specific to the organization” (pages 10-12 of Remarks).
Examiner response: Examiner respectfully disagrees. Rice discloses training a prediction model including PII (i.e., protected data) (Rice para [0058]). The data is from different, unrelated users and is not associated with a same organization, whereas Dernoncourt has PII information collected from health organizations. The claims do not define or specify what the organization is. Dernoncourt further teaches that the neural network includes a language model, as shown in Figure 1 (Dernoncourt pg. 598; “As a result, methods based on ANNs have shown promising results for various tasks in natural language processing (NLP), such as language modeling, text classification, question answering, machine translation, and named entity recognition.” Pg. 602; “In other words, the ANN model’s intrinsic flexibility allows it to better capture the variances in human language than the CRF model.”). The neural network of Dernoncourt may be pre-trained on an unlabeled dataset associated with a healthcare organization to initialize and learn token embeddings (pg. 598; “The token embeddings are jointly learned with the other parameters of the ANN. They can be initialized randomly, or can be pre-trained using large unlabeled datasets typically based on token co-occurrences. The latter often performs better, since the pre-trained token embeddings explicitly encode many linguistic regularities and patterns.” The RNN is trained based on the token embeddings learned during pre-training. Pg. 600; “The direct mapping V_T(·) from token to vector, often called a token (or word) embedding, can be pre-trained on large unlabeled datasets using programs such as word2vec or GloVe, and can be learned jointly with the rest of the model.”). Dernoncourt further teaches training the neural network on at least two datasets (Dernoncourt pg. 603; “We proposed the first system based on ANN for patient note de-identification. 
It outperforms state-of-the-art systems based on CRF on two datasets, while requiring no handcrafted features. Utilizing both token and character embeddings, the system can automatically learn effective features from data by fine-tuning the parameters. It jointly learns the parameters for the embeddings, the bidirectional LSTMs, and the label sequence optimization, and can make use of token embeddings pre-trained on large unlabeled datasets.”). The datasets may be labeled (Dernoncourt pg. 602; “Figure 4 details the impact of the number of labeled PHI instances in the training set on the model’s performance for a given PHI type in the i2b2 dataset.”). The token embeddings learned during pre-training are used to further generalize other token embeddings in a similar token-embedding space (Dernoncourt pg. 600; “Token embeddings, often learned by sampling token co-occurrence distributions, have desirable properties, such as locating semantically similar words closely in the vector space, leading to state-of-the-art performance for various tasks.” Pg. 601-602; “Another interesting difference between the ANN and CRF results was the PROFESSION category, where the ANN significantly outperformed the CRF. The reason behind this is that the embeddings of the tokens that represent a profession tend to be close in the token-embedding space, which allows the ANN model to generalize well.”). The neural network is trained on training sets associated with health organizations; therefore, the language model is organization specific (Dernoncourt pg. 600; “We evaluate our two models on two datasets: i2b2 2014 and MIMIC. The i2b2 2014 dataset was released as part of the 2014 i2b2/UTHealth shared task Track 1. It is the largest publicly available dataset for de-identification. Ten teams participated in this shared task, and 22 systems were submitted. As a result, we used the i2b2 2014 dataset to compare our models against state-of-the-art systems. 
The MIMIC de-identification dataset was created for this work as follows: the MIMIC-III dataset contains data for 61,532 intensive care unit stays over 58,976 hospital admissions for 46,520 patients and includes 2 million patient notes.”). Accordingly, Applicant’s arguments are not persuasive.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites “based on the output of the neural network learned language model specific to the organization and using the plurality of tags, de-identifying the protected data in the input unstructured textual data associated with the organization”. It is unclear whether the “the neural network learned language model” is referring to the neural network or the learned language model. For examination purposes, Examiner interprets the limitation as “based on the output of the neural network and the learned language model specific to the organization and using the plurality of tags, de-identifying the protected data in the input unstructured textual data associated with the organization” similar to claims 19 and 20. Claims 2-18 are dependent claims that do not cure the deficiencies and are rejected for the same reasons.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-4, 8-11, 14-15, and 17-21 are rejected under 35 U.S.C. 103 as being unpatentable over Dernoncourt et al. ("De-identification of patient notes with recurrent neural networks.") in view of Rice et al. (US-20190080063-A1).
Regarding Claim 1,
Dernoncourt teaches a computer-implemented method of using a neural network to de-identify protected data associated with an organization, comprising:
receiving, by a computing device, input unstructured textual data associated with the organization (pg. 596; “The Centers for Medicare and Medicaid Services have paid out more than $30 billion in EHR incentive payments to hospitals and providers who have attested to meaningful use as of March 2015… However, before patient notes can be shared with medical investigators, some types of information, referred to as protected health information (PHI), must be removed in order to preserve patient confidentiality.” Healthcare organizations.);
outputting, by the neural network, a determination whether the input unstructured textual data comprises protected data associated with the organization (pg. 600; “The label-prediction layer takes as input the sequence of vectors e_{1:n}, i.e., the outputs of the character-enhanced token-embedding layer, and outputs a_{1:n}, where the t-th element of a_n is the probability that the n-th token has the label t. The label is either one of the PHI types or non-PHI.” The model outputs whether the input has PHI (i.e., protected data).),
the neural network having been trained on organization-specific data by:
providing a first training dataset comprising unlabeled training data that includes first protected data (pg. 598; “They can be initialized randomly, or can be pre-trained using large unlabeled datasets typically based on token co-occurrences.”), wherein the first protected data comprises the organization-specific data (pg. 596; “The Centers for Medicare and Medicaid Services have paid out more than $30 billion in EHR incentive payments to hospitals and providers who have attested to meaningful use as of March 2015… However, before patient notes can be shared with medical investigators, some types of information, referred to as protected health information (PHI), must be removed in order to preserve patient confidentiality.” Centers for Medicare and Medicaid are organizations.),
generating a character-based token embedding of the unlabeled training data of the first training dataset (pg. 600; “The direct mapping V_T(·) from token to vector, often called a token (or word) embedding, can be pre-trained on large unlabeled datasets using programs such as word2vec or GloVe, and can be learned jointly with the rest of the model.”),
providing a second training dataset comprising labeled training data that includes second protected data (pg. 600; “Table 1 introduces the PHI types used as labels for training, and Table 3 presents the sizes of the datasets.” Dernoncourt teaches that the RNN can be pre-trained on an unlabeled dataset and further trained on a labeled training set.), and
training the neural network on the second training dataset, wherein the training is based on the character-based token embedding of the unlabeled training data of the first training dataset (pg. 598; “The token embeddings are jointly learned with the other parameters of the ANN. They can be initialized randomly, or can be pre-trained using large unlabeled datasets typically based on token co-occurrences. The latter often performs better, since the pre-trained token embeddings explicitly encode many linguistic regularities and patterns.” The RNN is trained based on the token embeddings learned during pre-training. Pg. 600; “The direct mapping V_T(·) from token to vector, often called a token (or word) embedding, can be pre-trained on large unlabeled datasets using programs such as word2vec or GloVe, and can be learned jointly with the rest of the model.”) to learn a language model specific to the organization (pg. 598; “As a result, methods based on ANNs have shown promising results for various tasks in natural language processing (NLP), such as language modeling, text classification, question answering, machine translation, and named entity recognition.” The ANN includes a language model, further depicted in Figure 1. Pg. 596; “The Centers for Medicare and Medicaid Services have paid out more than $30 billion in EHR incentive payments to hospitals and providers who have attested to meaningful use as of March 2015. Medical investigations can greatly benefit from the resulting increasingly large EHR datasets.” The model is trained on data from health organizations.), and
wherein the neural network comprises a tag prediction layer that is configured to project a token embedding of the input unstructured textual data onto a probability distribution over a plurality of tags (pg. 599; “The character-enhanced token-embedding layer maps each token to a vector representation. The sequence of vector representations corresponding to a sequence of tokens is inputted into the label-prediction layer, which outputs the sequence of vectors containing the probability of each label for each corresponding token.”), wherein the plurality of tags are indicative of protected data (pg. 600; “The label-prediction layer takes as input the sequence of vectors e_{1:n}, i.e., the outputs of the character-enhanced token-embedding layer, and outputs a_{1:n}, where the t-th element of a_n is the probability that the n-th token has the label t. The label is either one of the PHI types or non-PHI.”); and
based on the output of the neural network learned language model specific to the organization and using the plurality of tags, de-identifying the protected data in the input unstructured textual data associated with the organization (pg. 598; “To alleviate some downsides of the rule-based systems, there have been many attempts to use supervised machine learning algorithms to de-identify patient notes. These algorithms are used to train a classifier to label each word as PHI or not PHI, sometimes distinguishing between different PHI types.”).
While Dernoncourt teaches training the model with a second labeled training set including PHI, Dernoncourt does not explicitly disclose that the second training set is not associated with the organization.
Dernoncourt does not explicitly disclose
providing a second training dataset comprising labeled training data that includes second protected data, wherein the second protected data is not associated with the organization
However, Rice (US 20190080063 A1) teaches
providing a second training dataset comprising labeled training data that includes second protected data, wherein the second protected data is not associated with the organization (para [0058] “The prediction model can also be trained using some of the log data from log data storage 170 that have been labelled PII. The classifier can then use the trained prediction model to process the partition, to predict whether PII is absent or present in the partition.”).
Dernoncourt and Rice are analogous because they are directed towards the same field of endeavor of using neural networks to identify personally identifiable information.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the training of Dernoncourt with the labelled training data of Rice.
Doing so would allow for customizing the training dataset to include a number of sets labelled as PII and a number of sets labelled as non-PII (Rice para [0070]).
Regarding Claim 3,
Dernoncourt and Rice teach the computer-implemented method of claim 1. Dernoncourt further teaches training the neural network to predict whether future input unstructured textual data comprises protected data associated with the organization (pg. 600; “The label-prediction layer takes as input the sequence of vectors e_{1:n}, i.e., the outputs of the character-enhanced token-embedding layer, and outputs a_{1:n}, where the t-th element of a_n is the probability that the n-th token has the label t. The label is either one of the PHI types or non-PHI.”).
Regarding Claim 4,
Dernoncourt and Rice teach the computer-implemented method of claim 3. Dernoncourt further teaches wherein the training of the neural network comprises training the neural network based on a mixture of (i) the unlabeled training data from the first training dataset (pg. 598; “They can be initialized randomly, or can be pre-trained using large unlabeled datasets typically based on token co-occurrences.”), and (ii) the labeled training data from the second training dataset (pg. 600; “Table 1 introduces the PHI types used as labels for training, and Table 3 presents the sizes of the datasets.”).
Regarding Claim 8,
Dernoncourt and Rice teach the computer-implemented method of claim 3. Dernoncourt further teaches further comprising:
obtaining the second training dataset (pg. 600; “Table 1 introduces the PHI types used as labels for training, and Table 3 presents the sizes of the datasets.”);
obtaining the first training dataset (pg. 598; “They can be initialized randomly, or can be pre-trained using large unlabeled datasets typically based on token co-occurrences.”);
applying the character-based token embedding to the second dataset, and wherein the training of the neural network comprises training the neural network on the second dataset based on the character-based token embedding (pg. 601; “As mentioned previously, token embeddings can be pre-trained, and during training the token mapping V_T(·) is initialized with the pre-trained token embeddings. We tried pre-training token embeddings on the i2b2 2014 and MIMIC datasets (for MIMIC, we used the entire dataset containing 2 million notes and 800 million tokens), using word2vec and GloVe. Both of these were trained using a window size of 10, a minimum vocabulary count of 5, and 15 iterations.”).
Regarding Claim 9,
Dernoncourt and Rice teach the computer-implemented method of claim 1. Dernoncourt further teaches wherein the character-based token embedding is based on an algorithm to estimate word representations in a vector space (pg. 600; “Token embeddings, often learned by sampling token co-occurrence distributions, have desirable properties, such as locating semantically similar words closely in the vector space, leading to state-of-the-art performance for various tasks.”).
Regarding Claim 10,
Dernoncourt and Rice teach the computer-implemented method of claim 1. Dernoncourt further teaches wherein the neural network comprises:
a pre-trained token embedding model to map a tokenized representation of the input unstructured textual data to a first multidimensional vector space, wherein the tokenized representation comprises one or more tokens (pg. 600; “Token embeddings, often learned by sampling token co-occurrence distributions, have desirable properties, such as locating semantically similar words closely in the vector space, leading to state-of-the-art performance for various tasks.” And pg. 601; “As mentioned previously, token embeddings can be pre-trained, and during training the token mapping V_T(·) is initialized with the pre-trained token embeddings.”);
a bi-directional recurrent neural network (BiRNN) (pg. 599; “Bidirectional LSTM: An RNN is a neural network architecture designed to handle input sequences of variable sizes, but it fails to model long-term dependencies. An LSTM is a type of RNN that mitigates this issue by keeping a memory cell that serves as a summary of the preceding elements of an input sequence.”) to generate the character-based token embedding that maps each token of the one or more tokens to a second multidimensional vector space (pg. 599-600; “The character-enhanced token-embedding layer takes a token as input and outputs its vector representation. The latter results from the concatenation of two different types of embeddings; the first one directly maps a token to a vector, while the second one comes from the output of a character-level token encoder.”);
a second BiRNN to add contextual information to the tokenized representation (pg. 600; “The label-prediction layer contains a bidirectional LSTM that takes the input sequence e_{1:n} and generates the corresponding output sequence d_{1:n}. Each output d_i of the LSTM is given to a feed-forward neural network with 1 hidden layer, which outputs the corresponding probability vector a_i.” Figure 1 shows a second BiRNN stacked on top of the first BiRNN that takes the character-enhanced tokens as input and generates the output d_i.); and
the tag prediction layer to project the first multidimensional vector space onto a probability distribution over a plurality of tags, wherein the plurality of tags are indicative of the protected data associated with the organization (pg. 600; “The label-prediction layer takes as input the sequence of vectors e_{1:n}, i.e., the outputs of the character-enhanced token-embedding layer, and outputs a_{1:n}, where the t-th element of a_n is the probability that the n-th token has the label t. The label is either one of the PHI types or non-PHI.”).
Regarding Claim 11,
Dernoncourt and Rice teach the computer-implemented method of claim 10. Dernoncourt further teaches wherein the neural network comprises: a second tag prediction layer based on a conditional random field (pg. 601; “CRF is the model based on conditional random fields, ANN is the model based on artificial neural networks, and CRF+ANN is the result obtained by combining the outputs of the CRF and ANN models.”), wherein the conditional random field determines whether a tag of the plurality of tags is consistent as a sequence (pg. 598; “In order to effectively incorporate context when predicting a label, all the features for a given token are computed based on that token and on the four surrounding tokens.”).
Regarding Claim 14,
Dernoncourt and Rice teach the computer-implemented method of claim 1. Dernoncourt further teaches wherein the training dataset comprises free text unstructured notes (pg. 597; “First, only a restricted set of individuals is allowed to access the identified patient notes, thus the task cannot be crowdsourced. Second, humans are prone to mistakes. Neamatullah et al.4 asked 14 clinicians to detect PHI in approximately 130 patient notes; the results of the manual de-identification varied from clinician to clinician, with recall ranging from 0.63 to 0.94.”).
Regarding Claim 15,
Dernoncourt and Rice teach the computer-implemented method of claim 14. Dernoncourt further teaches wherein the free text unstructured notes comprise free text notes associated with a discharge of a patient (pg. 600; “To create the gold standard MIMIC de-identification dataset, we selected 1635 discharge summaries, each belonging to a different patient, containing a total of 60 700 PHI instances.”).
Regarding Claim 17,
Dernoncourt and Rice teach the computer-implemented method of claim 1. Dernoncourt further teaches further comprising: obtaining a trained neural network at the computing device (pg. 598; “The token embeddings are jointly learned with the other parameters of the ANN. They can be initialized randomly, or can be pre-trained using large unlabeled datasets typically based on token co-occurrences.”), and wherein the predicting of whether the input unstructured textual data comprises protected data comprises predicting by the computing device using the trained neural network (pg. 600; “The label-prediction layer takes as input the sequence of vectors e_{1:n}, i.e., the outputs of the character-enhanced token-embedding layer, and outputs a_{1:n}, where the t-th element of a_n is the probability that the n-th token has the label t. The label is either one of the PHI types or non-PHI. For example, if we aimed to predict all 18 HIPAA-defined PHI types, there would be 19 different labels.”).
Regarding Claim 18,
Dernoncourt and Rice teach the computer-implemented method of claim 1. Dernoncourt further teaches wherein the protected data comprises one or more of personally identifiable information (PII), protected health information (PHI) (pg. 596; “However, before patient notes can be shared with medical investigators, some types of information, referred to as protected health information (PHI), must be removed in order to preserve patient confidentiality.”), or payment card industry (PCI) information.
Regarding Claim 19,
Dernoncourt teaches a server for using a neural network to de-identify protected data associated with an organization, comprising:
receiving, by a computing device, a first training dataset comprising unlabeled training data that includes first protected data, wherein the first protected data comprises organization-specific data (pg. 596; “The Centers for Medicare and Medicaid Services have paid out more than $30 billion in EHR incentive payments to hospitals and providers who have attested to meaningful use as of March 2015… However, before patient notes can be shared with medical investigators, some types of information, referred to as protected health information (PHI), must be removed in order to preserve patient confidentiality.” Healthcare organizations. pg. 598; “They can be initialized randomly, or can be pre-trained using large unlabeled datasets typically based on token co-occurrences.”);
generating a character-based token embedding of the unlabeled training data of the first training dataset (pg. 600; “The direct mapping V_T(·) from token to vector, often called a token (or word) embedding, can be pre-trained on large unlabeled datasets using programs such as word2vec or GloVe, and can be learned jointly with the rest of the model.”);
receiving, by the computing device, a second training dataset comprising labeled training data that includes second protected data (pg. 600; “Table 1 introduces the PHI types used as labels for training, and Table 3 presents the sizes of the datasets.” Dernoncourt teaches that the RNN can be pre-trained on an unlabeled dataset and further trained on a labeled training set.),
training the neural network on the second training dataset, wherein the training is based on the character-based token embedding of the unlabeled training data of the first training dataset (pg. 598; “The token embeddings are jointly learned with the other parameters of the ANN. They can be initialized randomly, or can be pre-trained using large unlabeled datasets typically based on token co-occurrences. The latter often performs better, since the pre-trained token embeddings explicitly encode many linguistic regularities and patterns.” The RNN is trained based on the token embeddings learned during pre-training. Pg. 600; “The direct mapping V_T(·) from token to vector, often called a token (or word) embedding, can be pre-trained on large unlabeled datasets using programs such as word2vec or GloVe, and can be learned jointly with the rest of the model.”) to learn a language model specific to the organization (pg. 598; “As a result, methods based on ANNs have shown promising results for various tasks in natural language processing (NLP), such as language modeling, text classification, question answering, machine translation, and named entity recognition.” The ANN includes a language model, further depicted in Figure 1. Pg. 596; “The Centers for Medicare and Medicaid Services have paid out more than $30 billion in EHR incentive payments to hospitals and providers who have attested to meaningful use as of March 2015. Medical investigations can greatly benefit from the resulting increasingly large EHR datasets.” The model is trained on data from health organizations.) and to output whether an input unstructured textual data comprises protected data associated with the organization (pg. 600; “The label-prediction layer takes as input the sequence of vectors e_{1:n}, i.e., the outputs of the character-enhanced token-embedding layer, and outputs a_{1:n}, where the t-th element of a_n is the probability that the n-th token has the label t. 
The label is either one of the PHI types or non-PHI”), and
wherein the neural network comprises a tag prediction layer that is configured to project a token embedding of the input unstructured textual data onto a probability distribution over a plurality of tags (pg. 599; “The character-enhanced token-embedding layer maps each token to a vector representation. The sequence of vector representations corresponding to a sequence of tokens is inputted into the label-prediction layer, which outputs the sequence of vectors containing the probability of each label for each corresponding token.”), wherein the plurality of tags are indicative of protected data (pg. 600; “The label-prediction layer takes as input the sequence of vectors e_{1:n}, i.e., the outputs of the character-enhanced token-embedding layer, and outputs a_{1:n}, where the t-th element of a_n is the probability that the n-th token has the label t. The label is either one of the PHI types or non-PHI.”); and
providing, by the computing device, the trained neural network for de-identifying protected data in future input unstructured textual data associated with the organization, wherein the de-identifying of the protected data uses a future plurality of tags corresponding to the future input unstructured textual data and uses the learned language model specific to the organization (pg. 598; “To alleviate some downsides of the rule-based systems, there have been many attempts to use supervised machine learning algorithms to de-identify patient notes. These algorithms are used to train a classifier to label each word as PHI or not PHI, sometimes distinguishing between different PHI types.”).
Dernoncourt does not explicitly disclose
receiving, by the computing device, a second training dataset comprising labeled training data that includes second protected data, wherein the second protected data is not associated with the organization;
However, Rice (US 20190080063 A1) teaches
one or more processors (para [0014] “The system may [include] one or more processors;”); and
memory storing computer-executable instructions that, when executed by the one or more processors (para [0066]), cause the server to perform operations comprising:
receiving, by the computing device, a second training dataset comprising labeled training data that includes second protected data, wherein the second protected data is not associated with the organization (para [0058] “The prediction model can also be trained using some of the log data from log data storage 170 that have been labelled PII. The classifier can then use the trained prediction model to process the partition, to predict whether PII is absent or present in the partition.”).
Dernoncourt and Rice are analogous because they are directed towards the same field of endeavor of using neural networks to identify personally identifiable information.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the training of Dernoncourt with the labelled training data of Rice.
Doing so would allow for customizing the training dataset to include a number of sets labelled as PII and a number of sets labelled as non-PII (Rice para [0070]).
Regarding Claim 20,
Claim 20 recites the article of manufacture corresponding to the server of claim 19. Claim 20 is substantially similar to claim 19 and is rejected on the same grounds.
Regarding Claim 21,
Dernoncourt and Rice teach the computer-implemented method of claim 1. Dernoncourt further teaches further comprising:
receiving, by the computing device, a third training dataset comprising second unlabeled training data that includes third protected data, wherein the third protected data comprises another organization-specific data corresponding to a second organization (pg. 596; “The Centers for Medicare and Medicaid Services have paid out more than $30 billion in EHR incentive payments to hospitals and providers who have attested to meaningful use as of March 2015… However, before patient notes can be shared with medical investigators, some types of information, referred to as protected health information (PHI), must be removed in order to preserve patient confidentiality.”);
generating, by the computing device, a second customized character-based token embedding of the third training dataset (pg. 600; “The direct mapping VTðÞ from token to vector, often called a token (or word) embedding, can be pre-trained on large unlabeled datasets using programs such as word2vec30,54,55 or GloVe,32 and can be learned jointly with the rest of the model.”); and
training another neural network on the second training dataset by using the second customized character-based token embedding to output whether another input unstructured textual data of the second unlabeled training data of the third training dataset (pg. 598; “The token embeddings are jointly learned with the other parameters of the ANN. They can be initialized randomly, or can be pre-trained using large unlabeled datasets typically based on token co-occurrences.30–32 The latter often performs better, since the pre-trained token embeddings explicitly encode many linguistic regularities and patterns.” The RNN is trained based on the token embeddings learned during the pre-training. Pg. 600; “The direct mapping VTðÞ from token to vector, often called a token (or word) embedding, can be pre-trained on large unlabeled datasets using programs such as word2vec30,54,55 or GloVe,32 and can be learned jointly with the rest of the model.”), and
wherein the de-identifying of the protected data comprises de-identifying the protected data in another input unstructured textual data associated with the second organization based on another plurality of tags corresponding to the other input unstructured textual data (pg. 598; “To alleviate some downsides of the rule-based systems, there have been many attempts to use supervised machine learning algorithms to de-identify patient notes. These algorithms are used to train a classifier to label each word as PHI or not PHI, sometimes distinguishing between different PHI types.”).
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Dernoncourt/Rice, as applied above, and further in view of Medalion et al. (US-20210125615-A1).
Regarding Claim 2,
Dernoncourt and Rice teach the computer-implemented method of claim 1.
Dernoncourt and Rice do not explicitly disclose
wherein the first training dataset comprises labeled training data.
However, Medalion teaches
wherein the first training dataset comprises labeled training data (para [0026] “Semi-supervised machine learning algorithms generally use both labeled and unlabeled data for training, which may typically involve a relatively smaller amount of labeled data and a relatively larger amount of unlabeled data.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the machine learning model of Dernoncourt and Rice with the semi-supervised learning of Medalion.
Doing so would allow for training the model with labelled and unlabeled data, which greatly improves the accuracy as compared to unsupervised learning (Medalion para [0026]).
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Dernoncourt/Rice, as applied above, and further in view of Gu et al. (US-20200160593-A1).
Regarding Claim 5,
Dernoncourt and Rice teach the computer-implemented method of claim 3.
Dernoncourt and Rice do not explicitly disclose
wherein the training of the neural network comprises:
a pre-training of the neural network based on the second training dataset; and
a training of the pre-trained neural network based on the first training dataset.
However, Gu teaches
a pre-training of the neural network based on the second training dataset; and
a training of the pre-trained neural network based on the first training dataset (para [0029] “In an embodiment, labeled synthetic images are used to pre-train the IRN 105 before the IRN 105 is trained within the inverse rendering training system 100, and then parameters of the IRN 105 are fine-tuned within the inverse rendering training system 100 using unlabeled real images.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the training of Dernoncourt and Rice with the inverse training of Gu.
Doing so would allow for training a machine learning model without requiring a large labelled dataset which can take a long time for humans to annotate (Gu para [0030]).
Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Dernoncourt/Rice, as applied above, and further in view of Dain et al. (US-20200241967-A1).
Regarding Claim 6,
Dernoncourt and Rice teach the computer-implemented method of claim 3. Dernoncourt further teaches:
receiving at least a portion of the first training dataset from the platform (para [0058] “The prediction model can also be trained using some of the log data from log data storage 170 that have been labelled PII.”).
Dernoncourt and Rice are analogous because they are directed towards the same field of endeavor of using neural networks to identify personally identifiable information.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the training of Dernoncourt with the labelled training data of Rice.
Doing so would allow for customizing the training dataset to include a number of sets labelled as PII and a number of sets labelled as non-PII (Rice para [0070]).
Dernoncourt and Rice do not explicitly disclose
providing a platform to generate a manually programmed dictionary of terms indicative of protected data;
However, Dain (US 20200241967 A1) teaches
providing a platform to generate a manually programmed dictionary of terms indicative of protected data (para [0123] For example, in the above dictionary examples for code development environment, if International Organization for Standardization (ISO) and problem management report (PMR) words are present (extracted facets), the rules engine 357 may tag the metadata with “sensitive” tag indicating that the corresponding content includes sensitive data.);
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the machine learning model of Dernoncourt and Rice with the dictionary of Dain.
Doing so would allow for dictionary lookups to quickly match words and synonyms from a dictionary to words in content of the metadata to associate keywords with user defined facets such as sensitive information (Dain para [0118]).
Regarding Claim 7,
Dernoncourt, Rice, and Dain teach the computer-implemented method of claim 6. Dain further teaches wherein the providing of the platform comprises providing an applications programming interface (API) (para [0044] The communication network 310 in some cases may also include application programming interfaces (APIs) including, e.g., cloud service provider APIs, virtual machine management APIs, and hosted service provider APIs.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the machine learning model of Dernoncourt and Rice with the dictionary API of Dain.
Doing so would allow for dictionary lookups to quickly match words and synonyms from a dictionary to words in content of the metadata to associate keywords with user defined facets such as sensitive information (Dain para [0118]).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Dernoncourt/Rice, as applied above, and further in view of Brigham et al. (US-20200143115-A1) and Lutz et al. (US 20190121615 A1).
Regarding Claim 12,
Dernoncourt and Rice teach the computer-implemented method of claim 1. Dernoncourt further teaches:
generating the tokenized representation by generating one or more tokens based on the input unstructured textual data (pg. 598; “In the CRF model, each patient note is tokenized using the Stanford CoreNLP tokenizer,50 and features are extracted for each token.”);
Dernoncourt and Rice do not explicitly disclose
for each token of the one or more tokens, converting (i) a character to a lowercase letter, and (ii) a numeral to zero.
However, Brigham (US 20200143115 A1) teaches
for each token of the one or more tokens, converting (i) a character to a lowercase letter (para [0012] Next, the message sentences are normalized. This may include converting to lowercase strings, identifying specific parts of speech and replacing it with tokens, generating n-grams, and the like.),
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the tokens of Dernoncourt and Rice with the string conversion of Brigham.
Doing so would allow for normalizing the sentence, so that the normalized message text may be analyzed for what is known as a “speech act,” which categorizes the sentences into a type of speaking action: a question, statement, command, desire, or commitment (Brigham para [0012]).
Lutz (US 20190121615 A1) teaches
for each token of the one or more tokens, converting (ii) a numeral to zero (para [0161] In particular, as discussed earlier, a certain number of least significant bits of the lim string may all be set to zero if the lim string is padded in that manner.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the tokens of Dernoncourt and Rice with the string conversion of Lutz.
Doing so would allow for formatting a string in a padded manner (Lutz para [0161]).
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Dernoncourt/Rice, as applied above, and further in view of Steigauf et al. (US-20160350919-A1).
Regarding Claim 13,
Dernoncourt and Rice teach the computer-implemented method of claim 1.
Dernoncourt and Rice do not explicitly disclose
wherein the first protected data comprises protected health information, and wherein the first training dataset comprises radiology images that include the protected health information.
However, Steigauf (US 20160350919 A1) teaches
wherein the first protected data comprises protected health information, and wherein the first training dataset (para [0056] In connection with the identification, further operations may be performed to anonymize or de-identify protected health information (or other identifying data fields) for use in the training model (operation 640). For example, this may include the anonymization or removal of patient names, medical record numbers, or other identifying characteristics from the training images and associated reports or order data. In some examples, protected health information and other identifying information that is included directly within (“burned-in”) pixel data may be detected with the use of optical character recognition (OCR) or other computer-assisted techniques.) comprises radiology images that include the protected health information (para [0031] FIG. 2 illustrates a system operations diagram 200 of an example workflow for generating and routing a set of data produced from a particular medical imaging study (e.g., a radiology study) with use of a trained image recognition model 240 applied by a machine learning system 230 according to an example described herein.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the machine learning model of Dernoncourt and Rice with the training data of Steigauf.
Doing so would allow for anonymizing data fields with sensitive information; this may include the anonymization or removal of patient names, medical record numbers, or other identifying characteristics from the training images and associated reports or order data (Steigauf para [0056]).
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Dernoncourt/Rice, as applied above, and further in view of Thomas et al. (US-20150141149-A1).
Regarding Claim 16,
Dernoncourt and Rice teach the computer-implemented method of claim 1. Rice further teaches:
the second computing device comprising a trained version of the neural network (para [0063] “Social network host 700 may include one or more processors 702, a network interface card (NIC) 704, and computer readable medium 720 that stores a ML-based classifier 730, an action module 732, as well as log data processing module 180 of FIG. 1.” Para [0069] “As discussed above, a prediction model can be constructed and trained from data samples that are labelled as PII.”);
after sending the request, the computing device receiving, from the second computing device, the predicting of whether the given input unstructured textual data comprises protected data (para [0003] “For example, within a certain period of time after the record is created, or based on a user's request, the record may be examined to identify information in the record that can potentially be used as PII.” para [0069] “ML-based classifier 730 can use the prediction model to process the log data partition, to predict whether the log data partition includes PII.”); and
de-identifying the protected data in the given input unstructured textual data (para [0054] “Machine-learning (ML) or artificial intelligence (AI)-based techniques may be used to assist in the aforementioned de-identification processing. More specifically, the system can isolate a partition of a log data column that has previously been subject to the de-identification processing.”).
Dernoncourt and Rice are analogous because they are directed towards the same field of endeavor of using neural networks to identify personally identifiable information.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the training of Dernoncourt with the labelled training data of Rice.
Doing so would allow for customizing the training dataset to include a number of sets labelled as PII and a number of sets labelled as non-PII (Rice para [0070]).
Dernoncourt and Rice do not explicitly disclose
determining, by the computing device, a request to de-identify potentially protected data in a given input unstructured textual data;
sending the request to de-identify the potentially protected data from the computing device to a second computing device,
However, Thomas (US 20150141149 A1) teaches
determining, by the computing device, a request to de-identify potentially protected data in a given input unstructured textual data (para [0188] Embodiments can receive common requests from clients, create provider-specific requests based on the common request, de-identify personal health information, obfuscate or remove sensitive data such as personally identifying information, and ensure compliance with privacy/labor rules before sending provider-specific requests to a selected gamification provider.);
sending the request to de-identify the potentially protected data from the computing device to a second computing device (para [0191] At 806, module 18 transforms the common request into a provider specific request. In some embodiments, transforming the common request into a provider specific request includes de-identify personal health information, obfuscate sensitive data such as personally identifying information, and/or ensure compliance with privacy/labor rules before sending provider-specific requests to a selected gamification provider, as described above in FIG. 6.),
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the machine learning model of Dernoncourt and Rice with the method of de-identifying of Thomas.
Doing so would allow for de-identifying personal health information, obfuscating sensitive data such as personally identifying information, and/or ensuring compliance with privacy/labor rules before sending provider-specific requests to a client device (Thomas para [0191]).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217. The examiner can normally be reached Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen, can be reached at 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HENRY NGUYEN/Examiner, Art Unit 2121