Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
2. The information disclosure statement (IDS) submitted on December 20, 2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Continued Examination Under 37 CFR 1.114
3. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on October 14, 2025 has been entered.
Response to Amendment
4. The amendment filed on August 6, 2025 has been entered. Claims 1-7, 9-17, and 21-24 remain pending in the application. Claims 1 and 11 are amended. Claims 8 and 18-20 are cancelled.
The applicant argues that the prior art of record does not disclose the amended limitation of “the mask covering or substituting the personal information at the position for the at least one label with arbitrary symbols or characters”.
Applicant’s arguments with respect to the 35 U.S.C. 103 rejections of claims 1-7, 9-17, and 21-24 have been considered but are moot because the arguments are directed to the amended claim language, which is addressed in the new grounds of rejection below.
Claim Rejections - 35 USC § 101
5. 35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-6, 9-14, and 21-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent claim 1 recites “A system for automated text anonymisation of a document, the system comprising: at least two symbolic artificial intelligence (AI) pipeline components including named-entity recognition (NER) processes that detect personal information in the document, at least one of the symbolic AI pipeline components: generating at least one label that indicates a type of the personal information, and indicating a position of the personal information in the document; and a masking software component comprising computing instructions for receiving the at least one label and the position of the personal information and applying a mask to the document based on the position of the personal information, the mask covering or substituting the personal information at the position for the at least one label with arbitrary symbols or characters, the masking component generating a de-identified document”.
The limitations “at least two symbolic artificial intelligence (AI) pipeline components including named-entity recognition (NER) processes that detect personal information in the document, at least one of the symbolic AI pipeline components: generating at least one label that indicates a type of the personal information, and indicating a position of the personal information in the document; and a masking software component comprising computing instructions for receiving the at least one label and the position of the personal information and applying a mask to the document based on the position of the personal information, the mask covering or substituting the personal information at the position for the at least one label with arbitrary symbols or characters, the masking component generating a de-identified document”, as drafted, cover a mental process, as these steps could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application. Claim 1 recites “A system for automated text anonymisation of a document, the system comprising:…”. This limitation is directed to using a computer to carry out the abstract idea and does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The addition of the generic computer components recited above with regard to claim 1 does not amount to more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Claim 1 does not recite any additional limitations. The claim, as drafted, is not patent eligible.
Independent claim 11 recites “A method for automated text anonymisation of a document, the method comprising: processing the document by applying symbolic artificial intelligence (AI) pipeline components including named-entity recognition (NER) processes that detect personal information in the document; outputting the personal information individually as at least one label to a label file associated with the document, each of the at least one label including a type of the personal information and an indication of a position of the personal information in the document; and generating, via execution of a masking software component comprising computing instructions, a de-identified document based on the label file and the document, the de-identified document having a mask at a mask position corresponding to the position of personal information in each of the at least one label and the position of the personal information in the document, the mask covering or substituting the personal information at the position for the at least one label with arbitrary placeholder symbols or characters”.
The limitations “processing the document by applying symbolic artificial intelligence (AI) pipeline components including named-entity recognition (NER) processes that detect personal information in the document; outputting the personal information individually as at least one label to a label file associated with the document, each of the at least one label including a type of the personal information and an indication of a position of the personal information in the document; and generating, via execution of a masking software component comprising computing instructions, a de-identified document based on the label file and the document, the de-identified document having a mask at a mask position corresponding to the position of personal information in each of the at least one label and the position of the personal information in the document, the mask covering or substituting the personal information at the position for the at least one label with arbitrary placeholder symbols or characters”, as drafted, cover a mental process, as these steps could be performed mentally or by hand with pen and paper.
This judicial exception is not integrated into a practical application. Claim 11 recites “A method for automated text anonymisation of a document, the method comprising…”. This limitation is directed to using a computer to carry out the abstract idea and does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The addition of the generic computer components recited above with regard to claim 11 does not amount to more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Claim 11 does not recite any additional limitations. The claim, as drafted, is not patent eligible.
Claim Rejections - 35 USC § 103
6. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically taught as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
7. Claims 1-6, 9-14, and 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over Vianu (U.S. Publication No. 20200334416) in view of Rao (U.S. Publication No. 20210158936) in view of Mehta (U.S. Patent No. 11870757).
Regarding claim 1, Vianu discloses a system for automated text anonymisation of a document ([0164] - performs classification over anonymized target entities in a sentence by using tags to mask the target entities), the system comprising:
at least two symbolic artificial intelligence (AI) pipeline components including named-entity recognition (NER) processes that detect personal information in the first document, at least one of the symbolic pipeline components ([0143] - the architecture of NLP pipeline 400 includes an OCR cleaning system 410, a named entity recognition (NER) system 420, an entity relationship linking system 430, an entity classification system 440, and a reviewer model 450);
generating at least one label that indicates a type of the personal information ([0019] - training a classifier over a pre-defined lexicon of clinical terminology; and providing the plurality of NER tagged spans to the trained classifier, such that the trained classifier labels NER tagged spans of the Pathology class with a specific pathology type),
and indicating a position of the personal information in the document ([0019] - training a classifier over a pre-defined lexicon of clinical terminology; and providing the plurality of NER tagged spans to the trained classifier, such that the trained classifier labels NER tagged spans of the Pathology class with a specific pathology type);
However, Vianu does not disclose a masking software component comprising computing instructions for receiving the at least one label and the position of the personal information and applying a mask to the document based on the position of the personal information,
the masking component generating a de-identified document.
Rao does teach a masking software component comprising computing instructions for receiving the at least one label and the position of the personal information and applying a mask to the document based on the position of the personal information ([0084] - The patient history data 430 can include patient identifier data 431 which can include basic patient information such as name or an identifier that may be anonymized to protect the confidentiality of the patient, age, and/or gender. The patient identifier data 431 can also map to a patient entry in a separate patient database stored by the database storage system, or stored elsewhere),
the masking component generating a de-identified document ([0182] - the report data can be de-identified by obfuscating, hashing, removing, replacing with a fiducial, or otherwise anonymizing the identified patient identifying text to generate de-identified report data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vianu to incorporate the teachings of Rao in order to implement a masking software component comprising computing instructions for receiving the at least one label and the position of the personal information and applying a mask to the document based on the position of the personal information, the mask covering or substituting the personal information with arbitrary symbols or characters, the masking component generating a de-identified document. Doing so allows improved inference functions and models corresponding to new scan categories (Rao [0171]).
However, Vianu in view of Rao does not disclose the mask covering or substituting the personal information at the position for the at least one label with arbitrary symbols or characters.
Mehta does teach the mask covering or substituting the personal information at the position for the at least one label with arbitrary symbols or characters ([Col 4, Rows 18-21] - anonymization unit 168 may be configured to tokenize the structured PII elements detected by a rule-based detection layer of detection unit 164 by replacing the actual data in text data 122 with strings of symbols tokens; [Col 7, Rows 31-53] - Control plane 208 of anonymization unit 168 is configured to generate signaling information 224 that defines the tokenization of the sensitive data in text data 200. More specifically, the signaling information 224 identifies, for each token in tokenized data 220, one or more of a location of the token, a type of the token, or an algorithm applied to generate the token for the respective instance of sensitive data within text data 200. An example of signaling information for the specific example shown in FIG. 2 is as follows:… Computing system 160 executing masking service 162 may store text data 200 and the signaling information 224 in cache 214 (illustrated as being within anonymization unit 168) or another on-premises cache or other storage location).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vianu in view of Rao to incorporate the teachings of Mehta in order to implement the mask covering or substituting the personal information at the position for the at least one label with arbitrary symbols or characters. Doing so protects customer personal information to ensure privacy and security for customers ([Col 1, Rows 16-19]).
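For illustration only, the arrangement of labels, positions, and masking discussed for claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Vianu, Rao, or Mehta: the `Label` class, the `apply_mask` function, the character-offset convention, and the `*` mask character are all hypothetical names and choices introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical label: a type of personal information plus its position."""
    kind: str   # e.g. "NAME" or "DATE" (assumed type names)
    start: int  # inclusive character offset into the document
    end: int    # exclusive character offset into the document


def apply_mask(text: str, labels: list[Label], mask_char: str = "*") -> str:
    """Overwrite every labelled span with an arbitrary mask character.

    The document length is preserved, so all label positions stay valid.
    """
    chars = list(text)
    for label in labels:
        for i in range(label.start, min(label.end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


document = "Patient John Smith was seen on 2021-03-04."
labels = [Label("NAME", 8, 18), Label("DATE", 31, 41)]
deidentified = apply_mask(document, labels)
print(deidentified)
```

In this sketch the detection step is replaced by hand-written labels; only the relationship between a label's type, its position, and the resulting mask is shown.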
Regarding claim 2, Vianu in view of Rao in view of Mehta discloses all limitations of claim 1, above.
Vianu discloses the system, wherein a zoning software component comprises computing instructions for generating a key area label for each of identified one or more key areas of the document, each key area label including a type of the key area and a position of the key area in the document ([0142] - generate multiple points of training data, e.g., from single sentences and/or from sets of sentences relating only to the same identified pathology. In this manner, the training process can separate the need for the NLP pipeline (or constituent machine learning model (s)) to learn to detect relevant sentences from the need to learn to interpret those sentences; [0143] - the architecture of NLP pipeline 400 includes an OCR cleaning system 410, a named entity recognition (NER) system 420, an entity relationship linking system 430, an entity classification system 440, and a reviewer model 450).
Regarding claim 3, Vianu in view of Rao in view of Mehta discloses all limitations of claim 2, above.
Vianu discloses the system, wherein the masking component receives at least one key area label from the zoning software component and applies a mask to the document based on the position of the key area, the mask covering or substituting text in the key area with non-identifying placeholder symbols or characters ([0016] - the best replacement candidate calculated by analyzing a plurality of character-level optical transformation costs weighted by a frequency analysis over a corpus corresponding to the radiological report text; [0142] - generate multiple points of training data, e.g., from single sentences and/or from sets of sentences relating only to the same identified pathology. In this manner, the training process can separate the need for the NLP pipeline (or constituent machine learning model (s)) to learn to detect relevant sentences from the need to learn to interpret those sentences; [0143] - the architecture of NLP pipeline 400 includes an OCR cleaning system 410, a named entity recognition (NER) system 420, an entity relationship linking system 430, an entity classification system 440, and a reviewer model 450).
Regarding claim 4, Vianu in view of Rao in view of Mehta discloses all limitations of claim 1, above.
Vianu discloses the system, wherein the symbolic AI pipeline components comprise a newline segmenter configured to split or join sentences in the first document according to a predetermined sentence segmentation logic ([0167] - potential NER pairs are generated from a set of training NER spans and a segment of training text is generated for each potential NER pair according to the above masking convention. Each segment of training text is then fed to the appropriate detection model for classification, such that the classification output is trained to predict whether the two NER spans are linked).
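As an illustrative aid, a newline segmenter that splits or joins sentences according to a predetermined logic might look like the sketch below. The claim does not specify the segmentation logic, so the two rules used here (a newline not preceded by `.`, `!`, or `?` is treated as a soft wrap and joined; text is then split after sentence-ending punctuation) are assumptions, as is the function name `segment`. The sketch does not handle abbreviations such as "Dr.".

```python
from __future__ import annotations

import re


def segment(text: str) -> list[str]:
    """Join soft-wrapped lines, then split into sentences (assumed rules)."""
    # Join: a newline that does not follow sentence-ending punctuation
    # is assumed to be a line wrap inside a sentence.
    joined = re.sub(r"(?<![.!?])\n", " ", text)
    # Split: break on whitespace that follows sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?])\s+", joined)
    return [p.strip() for p in parts if p.strip()]


raw = (
    "The patient was admitted\n"
    "on Monday. He was discharged\n"
    "on Friday.\n"
    "Follow-up is in two weeks."
)
sentences = segment(raw)
for s in sentences:
    print(s)
```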
Regarding claim 5, Vianu in view of Rao in view of Mehta discloses all limitations of claim 1, above.
Vianu discloses the system, wherein the symbolic AI pipeline components comprise a person title NER component of the NER processes that includes symbolic AI rules that identify person titles in text of the first document and outputs a title label ([0019] - training a classifier over a pre-defined lexicon of clinical terminology; and providing the plurality of NER tagged spans to the trained classifier, such that the trained classifier labels NER tagged spans of the Pathology class with a specific pathology type).
Regarding claim 6, Vianu in view of Rao in view of Mehta discloses all limitations of claim 1, above.
Vianu discloses the system, wherein the symbolic AI pipeline components comprise:
a person label NER component of the NER processes that includes symbolic AI rules that identify person labels in the text of the first document and outputs an honorific label ([0019] - training a classifier over a pre-defined lexicon of clinical terminology; and providing the plurality of NER tagged spans to the trained classifier, such that the trained classifier labels NER tagged spans of the Pathology class with a specific pathology type);
or a person name NER component of the NER processes that includes symbolic AI rules that identify person names in text of the first document and outputs a name label ([0019] - training a classifier over a pre-defined lexicon of clinical terminology; and providing the plurality of NER tagged spans to the trained classifier, such that the trained classifier labels NER tagged spans of the Pathology class with a specific pathology type).
Regarding claim 9, Vianu in view of Rao in view of Mehta discloses all limitations of claim 1, above.
Vianu discloses the system, wherein the symbolic AI pipeline components are applied to a training set or as a training set for a neural network AI model or a machine learning AI model to bootstrap a learning of the neural network AI model or the machine learning AI model, the learning comprising representing the symbolic AI logic of the symbolic AI pipeline components as machine learning or neural network layer connections ([0016] - Using the neural network, a plurality of NER tagged spans are generated for the radiological report text, the generating based on the concatenated word and character-level embeddings).
Regarding claim 10, Vianu in view of Rao in view of Mehta discloses all limitations of claim 1, above.
Vianu discloses the system, wherein the symbolic AI pipeline components comprise computer readable instructions stored on computer readable media that, when executed by a hardware processor of a computer, cause the computer to perform one or more processes including detecting the personal information ([0042] - computer system, with non-transitory computer-readable storage media).
Regarding claim 11, Vianu discloses a method for automated text anonymisation of a document ([0164] - performs classification over anonymized target entities in a sentence by using tags to mask the target entities), the method comprising:
processing the document by applying symbolic artificial intelligence (AI) pipeline components including named-entity recognition (NER) processes that detect personal information in the document ([0019] - training a classifier over a pre-defined lexicon of clinical terminology; and providing the plurality of NER tagged spans to the trained classifier, such that the trained classifier labels NER tagged spans of the Pathology class with a specific pathology type; [0143] - the architecture of NLP pipeline 400 includes an OCR cleaning system 410, a named entity recognition (NER) system 420, an entity relationship linking system 430, an entity classification system 440, and a reviewer model 450);
outputting the personal information individually as at least one label to a label file associated with the document, each of the at least one label including a type of the personal information and an indication of a position of the personal information in the document ([0019] - training a classifier over a pre-defined lexicon of clinical terminology; and providing the plurality of NER tagged spans to the trained classifier, such that the trained classifier labels NER tagged spans of the Pathology class with a specific pathology type).
However, Vianu does not disclose generating, via execution of a masking software component, comprising computing instructions, a de-identified document based on the label file and the document,
the de-identified document having a mask at a mask position corresponding to the position of personal information in each of the at least one label and the position of the personal information in the document.
Rao does teach generating, via execution of a masking software component, comprising computing instructions, a de-identified document based on the label file and the document ([0084] - The patient history data 430 can include patient identifier data 431 which can include basic patient information such as name or an identifier that may be anonymized to protect the confidentiality of the patient, age, and/or gender. The patient identifier data 431 can also map to a patient entry in a separate patient database stored by the database storage system, or stored elsewhere),
the de-identified document having a mask at a mask position corresponding to the position of personal information in each of the at least one label and the position of the personal information in the document ([0182] - the report data can be de-identified by obfuscating, hashing, removing, replacing with a fiducial, or otherwise anonymizing the identified patient identifying text to generate de-identified report data),
the mask covering or substituting the personal information with arbitrary placeholder symbols or characters ([0182] - the report data can be de-identified by obfuscating, hashing, removing, replacing with a fiducial, or otherwise anonymizing the identified patient identifying text to generate de-identified report data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vianu to incorporate the teachings of Rao in order to implement generating, via execution of a masking software component, comprising computing instructions, a de-identified document based on the label file and the document, the de-identified document having a mask at a mask position corresponding to the position of personal information in each of the at least one label and the position of the personal information in the document. Doing so allows improved inference functions and models corresponding to new scan categories (Rao [0171]).
However, Vianu in view of Rao does not disclose the mask covering or substituting the personal information at the position for the at least one label with arbitrary symbols or characters.
Mehta does teach the mask covering or substituting the personal information at the position for the at least one label with arbitrary symbols or characters ([Col 4, Rows 18-21] - anonymization unit 168 may be configured to tokenize the structured PII elements detected by a rule-based detection layer of detection unit 164 by replacing the actual data in text data 122 with strings of symbols tokens; [Col 7, Rows 31-53] - Control plane 208 of anonymization unit 168 is configured to generate signaling information 224 that defines the tokenization of the sensitive data in text data 200. More specifically, the signaling information 224 identifies, for each token in tokenized data 220, one or more of a location of the token, a type of the token, or an algorithm applied to generate the token for the respective instance of sensitive data within text data 200. An example of signaling information for the specific example shown in FIG. 2 is as follows:… Computing system 160 executing masking service 162 may store text data 200 and the signaling information 224 in cache 214 (illustrated as being within anonymization unit 168) or another on-premises cache or other storage location).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vianu in view of Rao to incorporate the teachings of Mehta in order to implement the mask covering or substituting the personal information at the position for the at least one label with arbitrary symbols or characters. Doing so protects customer personal information to ensure privacy and security for customers ([Col 1, Rows 16-19]).
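For illustration only, the label-file step discussed for claim 11 can be sketched as writing each label (type and position) to a file associated with the document and then generating the de-identified document from that file. This is a minimal sketch under stated assumptions, not the applicant's or any cited reference's implementation: the JSON format, the field names `type`, `start`, and `end`, the file name, the `#` placeholder, and the functions `write_label_file` and `deidentify` are all hypothetical.

```python
import json
import os
import tempfile


def write_label_file(path, labels):
    """Write the labels (type plus character offsets) to a JSON label file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(labels, fh, indent=2)


def deidentify(document, label_path, placeholder="#"):
    """Build a de-identified document from the document and its label file."""
    with open(label_path, encoding="utf-8") as fh:
        labels = json.load(fh)
    out = document
    # Work from the last span to the first so that earlier offsets would
    # remain valid even if a replacement of a different length were used.
    for label in sorted(labels, key=lambda item: item["start"], reverse=True):
        span_len = label["end"] - label["start"]
        out = out[:label["start"]] + placeholder * span_len + out[label["end"]:]
    return out


document = "Contact Jane Doe at 555-0100."
labels = [
    {"type": "NAME", "start": 8, "end": 16},
    {"type": "PHONE", "start": 20, "end": 28},
]

with tempfile.TemporaryDirectory() as tmp:
    label_path = os.path.join(tmp, "document.labels.json")
    write_label_file(label_path, labels)
    result = deidentify(document, label_path)

print(result)
```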
Regarding claim 12, Vianu in view of Rao in view of Mehta discloses all limitations of claim 11, above.
Vianu discloses the method, further comprising: instantiating, via an AI pipeline module, the symbolic AI pipeline components and the at least one machine learning model ([0142] - generate multiple points of training data, e.g., from single sentences and/or from sets of sentences relating only to the same identified pathology. In this manner, the training process can separate the need for the NLP pipeline (or constituent machine learning model (s)) to learn to detect relevant sentences from the need to learn to interpret those sentences; [0143] - the architecture of NLP pipeline 400 includes an OCR cleaning system 410, a named entity recognition (NER) system 420, an entity relationship linking system 430, an entity classification system 440, and a reviewer model 450).
Regarding claim 13, Vianu in view of Rao in view of Mehta discloses all limitations of claim 11, above.
Vianu discloses the method, wherein the symbolic AI pipeline components comprise a newline segmenter configured to split or join sentences in the first document according to a predetermined sentence segmentation logic, the newline segmenter processing the document before the NER processes ([0167] - potential NER pairs are generated from a set of training NER spans and a segment of training text is generated for each potential NER pair according to the above masking convention. Each segment of training text is then fed to the appropriate detection model for classification, such that the classification output is trained to predict whether the two NER spans are linked).
Regarding claim 14, Vianu in view of Rao in view of Mehta discloses all limitations of claim 11, above.
Vianu discloses the method, wherein the symbolic AI pipeline components comprise:
a person title NER component of the NER processes that includes symbolic AI rules that identify person titles in text of the first document and outputs a title label ([0019] - training a classifier over a pre-defined lexicon of clinical terminology; and providing the plurality of NER tagged spans to the trained classifier, such that the trained classifier labels NER tagged spans of the Pathology class with a specific pathology type);
a person label NER component of the NER processes that includes symbolic AI rules that identify person labels in text of the first document and outputs an honorific label ([0019] - training a classifier over a pre-defined lexicon of clinical terminology; and providing the plurality of NER tagged spans to the trained classifier, such that the trained classifier labels NER tagged spans of the Pathology class with a specific pathology type);
or a person name NER component of the NER processes that includes symbolic AI rules that identify person names in text of the first document and outputs a name label ([0019] - training a classifier over a pre-defined lexicon of clinical terminology; and providing the plurality of NER tagged spans to the trained classifier, such that the trained classifier labels NER tagged spans of the Pathology class with a specific pathology type).
Regarding claim 21, Vianu in view of Rao in view of Mehta discloses all limitations of claim 1, above.
Vianu discloses the system, further comprising: a zoning software component comprising computing instructions, connected to at least one of the symbolic artificial intelligence (AI) pipeline components or the masking component, the zoning software component executing a trained machine learning model to identify key areas of the document ([0142] - generate multiple points of training data, e.g., from single sentences and/or from sets of sentences relating only to the same identified pathology. In this manner, the training process can separate the need for the NLP pipeline (or constituent machine learning model (s)) to learn to detect relevant sentences from the need to learn to interpret those sentences).
Regarding claim 22, Vianu in view of Rao in view of Mehta discloses all limitations of claim 11, above.
Vianu discloses the method, further comprising: processing the document by applying a zoning software component comprising computer instructions, connected to at least one of the symbolic artificial intelligence (AI) pipeline components or a masking component, that executes a trained machine learning model to identify key areas of the document ([0142] - generate multiple points of training data, e.g., from single sentences and/or from sets of sentences relating only to the same identified pathology. In this manner, the training process can separate the need for the NLP pipeline (or constituent machine learning model (s)) to learn to detect relevant sentences from the need to learn to interpret those sentences).
Regarding claim 23, Vianu in view of Rao in view of Mehta discloses all limitations of claim 12, above.
Vianu discloses the method, further comprising: generating a key area label for each of identified one or more key areas of the document, each key area label including a type of the key area and a position of the key area in the document ([0142] - generate multiple points of training data, e.g., from single sentences and/or from sets of sentences relating only to the same identified pathology. In this manner, the training process can separate the need for the NLP pipeline (or constituent machine learning model (s)) to learn to detect relevant sentences from the need to learn to interpret those sentences; [0143] - the architecture of NLP pipeline 400 includes an OCR cleaning system 410, a named entity recognition (NER) system 420, an entity relationship linking system 430, an entity classification system 440, and a reviewer model 450).
Regarding claim 24, Vianu in view of Rao in view of Mehta discloses all limitations of claim 13, above.
Vianu discloses the method, further comprising the masking component:
receiving at least one key area label from a zoning software component comprising computing instructions ([0016] - the best replacement candidate calculated by analyzing a plurality of character-level optical transformation costs weighted by a frequency analysis over a corpus corresponding to the radiological report text);
and applying a mask to the document based on the position of the key area, the mask covering or substituting text in the key area with non-identifying placeholder symbols or characters ([0030] - masking both NER tagged spans with an identifier, the identifier corresponding to a class of the NER tagged span; and generating a masked text sequence for the given pair by replacing the NER tagged spans in the sentence portion with the identifiers).
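The masking convention quoted from Vianu [0030] (replacing a pair of NER tagged spans in a sentence portion with identifiers corresponding to their classes) can be illustrated with the sketch below. It is a minimal illustration under stated assumptions, not Vianu's implementation: the `<CLASS>` identifier format, the `LOCATION` class name, the `(start, end, class)` tuple layout, and the function `mask_pair` are hypothetical; only the Pathology class is named in the cited passages.

```python
def mask_pair(sentence, span_a, span_b):
    """Replace two tagged spans with identifiers naming their classes.

    Each span is a (start, end, cls) tuple of character offsets
    (end exclusive) and a class name.
    """
    out = sentence
    # Replace the later span first so the earlier span's offsets stay valid.
    for start, end, cls in sorted([span_a, span_b], reverse=True):
        out = out[:start] + "<" + cls + ">" + out[end:]
    return out


sentence = "Mild stenosis at L4-L5."
masked = mask_pair(
    sentence,
    (5, 13, "PATHOLOGY"),   # "stenosis"
    (17, 22, "LOCATION"),   # "L4-L5"
)
print(masked)
```

Unlike a mask of arbitrary symbols, the identifier in this convention preserves the class of the replaced span.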
8. Claims 7 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Vianu (U.S. Publication No. 20200334416) in view of Rao (U.S. Publication No. 20210158936) in view of Mehta (U.S. Patent No. 11870757) in view of Fusco (U.S. Publication No. 20180052904).
Regarding claim 7, Vianu in view of Rao in view of Mehta discloses all aforementioned limitations of claim 1, above.
However, Vianu does not disclose the system, wherein the symbolic AI pipeline components comprise:
a hashing component that generates a hash code representation of a first text item of the personal information, the hash code representation being included in the at least one label,
wherein the hashing component applies a n-gram hash to generate the hash code representation.
Fusco does teach the system, wherein the symbolic AI pipeline components comprise:
a hashing component that generates a hash code representation of a first text item of the personal information, the hash code representation being included in the at least one label ([0002] - The string hash function receives an input character string. The string hash function divides the input character string into n-grams. An n-gram as used herein is a string consisting of n characters. For instance an n-gram with three characters is also referred to as a trigram. The string hash function calculates an n-gram hash value for each of the n-grams. The string hash function calculates an output integer at least partially by aggregating the n-gram hash value for each of the n-grams);
wherein the hashing component applies a n-gram hash to generate the hash code representation ([0002] - The string hash function receives an input character string. The string hash function divides the input character string into n-grams. An n-gram as used herein is a string consisting of n characters. For instance an n-gram with three characters is also referred to as a trigram. The string hash function calculates an n-gram hash value for each of the n-grams. The string hash function calculates an output integer at least partially by aggregating the n-gram hash value for each of the n-grams).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to modify the teaching of Vianu to include the teachings of Fusco in order to implement the system, wherein the symbolic AI pipeline components comprise: a hashing component that generates a hash code representation of a first text item of the personal information, the hash code representation being included in the at least one label, wherein the hashing component applies a n-gram hash to generate the hash code representation. Doing so allows for more efficient grouping of elements (Fusco [0021]).
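For illustration only, the string hash function described in the cited passage of Fusco ([0002]) can be sketched as follows. This is a minimal sketch and not Fusco's implementation: the per-n-gram hash (CRC-32 via zlib.crc32), the aggregation step (a sum reduced to 32 bits), the handling of inputs shorter than n, and the names ngrams and ngram_hash are all assumptions introduced here, since the cited passage does not specify them.

```python
import zlib
from typing import List


def ngrams(text: str, n: int = 3) -> List[str]:
    """Divide the input string into overlapping n-grams of n characters."""
    if len(text) < n:
        # Assumption: a short input is treated as a single n-gram.
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_hash(text: str, n: int = 3, bits: int = 32) -> int:
    """Aggregate per-n-gram hash values into one output integer."""
    mask = (1 << bits) - 1
    total = 0
    for gram in ngrams(text, n):
        # Assumption: CRC-32 stands in for the n-gram hash function.
        gram_value = zlib.crc32(gram.encode("utf-8"))
        # Assumption: aggregation is a sum truncated to `bits` bits.
        total = (total + gram_value) & mask
    return total


print(ngrams("hello"))      # the trigrams of "hello"
print(ngram_hash("hello"))  # a single integer in [0, 2**32)
```

With n = 3 the input is split into trigrams, consistent with the trigram example in the cited passage; the output integer is deterministic for a given input string.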
Regarding claim 15, Vianu in view of Rao in view of Mehta discloses all aforementioned limitations of claim 11, above.
However, Vianu does not disclose the method, further comprising:
generating, via a hashing component, a hash code representation of a first text item of the personal information
and incorporating the hash code representation into a first label of the at least one label corresponding to the first text item of the personal information,
wherein the hashing component applies a N-gram hash to generate the hash code representation.
Fusco does teach the method, further comprising:
generating, via a hashing component, a hash code representation of a first text item of the personal information ([0002] - The string hash function receives an input character string. The string hash function divides the input character string into n-grams. An n-gram as used herein is a string consisting of n characters. For instance an n-gram with three characters is also referred to as a trigram. The string hash function calculates an n-gram hash value for each of the n-grams. The string hash function calculates an output integer at least partially by aggregating the n-gram hash value for each of the n-grams);
and incorporating the hash code representation into a first label of the at least one label corresponding to the first text item of the personal information ([0002] - The string hash function receives an input character string. The string hash function divides the input character string into n-grams. An n-gram as used herein is a string consisting of n characters. For instance an n-gram with three characters is also referred to as a trigram. The string hash function calculates an n-gram hash value for each of the n-grams. The string hash function calculates an output integer at least partially by aggregating the n-gram hash value for each of the n-grams);
wherein the hashing component applies a N-gram hash to generate the hash code representation ([0002] - The string hash function receives an input character string. The string hash function divides the input character string into n-grams. An n-gram as used herein is a string consisting of n characters. For instance an n-gram with three characters is also referred to as a trigram. The string hash function calculates an n-gram hash value for each of the n-grams. The string hash function calculates an output integer at least partially by aggregating the n-gram hash value for each of the n-grams).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to modify the teaching of Vianu to include the teachings of Fusco in order to implement the method, further comprising: generating, via a hashing component, a hash code representation of a first text item of the personal information; and incorporating the hash code representation into a first label of the at least one label corresponding to the first text item of the personal information, wherein the hashing component applies a N-gram hash to generate the hash code representation. Doing so allows for more efficient grouping of elements (Fusco [0021]).
Regarding claim 16, Vianu in view of Rao in view of Mehta in view of Fusco discloses all aforementioned limitations of claim 11, above.
Vianu discloses the method, wherein the hash code representation is applied by the masking component as the mask to the document in the position of the personal information corresponding to the first text item ([0016] - the best replacement candidate calculated by analyzing a plurality of character-level optical transformation costs weighted by a frequency analysis over a corpus corresponding to the radiological report text).
Regarding claim 17, Vianu in view of Rao in view of Mehta in view of Fusco discloses all aforementioned limitations of claim 11, above.
Vianu discloses the method, wherein the mask applied by the masking component to the document is a character-for-character replacement of characters forming the personal information ([0016] - the best replacement candidate calculated by analyzing a plurality of character-level optical transformation costs weighted by a frequency analysis over a corpus corresponding to the radiological report text).
Conclusion
9. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Hertz (U.S. Publication No. 20190354544) teaches machine learning-based relationship association and related discovery and search engines. Muffat (U.S. Publication No. 20200250139) teaches methods, personal data analysis system for sensitive personal information detection, linking, and purposes of personal data usage prediction.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ETHAN DANIEL KIM whose telephone number is (571) 272-1405. The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ETHAN DANIEL KIM/
Examiner, Art Unit 2658
/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658