Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
2. This action is in response to the amendment filed December 29, 2025.
3. Claims 1-3, 5, 7-12, and 16-20 have been amended.
4. Claims 1-20 have been examined and are pending with this action.
Response to Arguments
5. Applicant’s arguments filed December 29, 2025, with respect to the rejection(s) of claims 1-5, 7-14, and 16-20 under 35 U.S.C. 102(a)(1) and 102(a)(2) as being anticipated by Rao (US 2022/0027508 A1) have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Davis et al. (US 2011/0113049 A1) and Ghare et al. (US 10,129,118 B1).
Davis has been cited to explicitly teach the use of the “data anonymization rules ontology,” and Ghare has been cited to explicitly teach the newly amended “automatically modifiable” functionality. Independent claims 10 and 19 do not recite the newly amended “automatically modifiable” functionality and are therefore rejected in view of Davis et al. (US 2011/0113049 A1) alone and not in view of Ghare et al. (US 10,129,118 B1). Please see the rejections below.
For at least the rejections set forth below, claims 1-20 remain rejected and pending.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6. Claims 1-9 are rejected under 35 U.S.C. 103 as being unpatentable over Rao (US 2022/0027508 A1) in view of Davis et al. (US 2011/0113049 A1) and Ghare et al. (US 10,129,118 B1).
As per claim 1, Rao teaches a computer-implemented method, comprising:
retrieving a first maintenance report comprising an instance of text data describing a maintenance event for a first physical apparatus (see Rao, [0033]: “The Privacy Engine receives the electronic documents, that include content portions of unstructured text, structured text, image content, audio content and/or video content. The Privacy Engine stores the received electronic documents in a data store device. The Privacy Engine builds a content profile according to indicated categories for portions of the content (Act 204).”; and [0038]: “The machine learning model receives input data based one or more portions of content from the collection of electronic documents and identifies those portions of content that include data (i.e. account numbers, social security numbers, monetary amounts, personal contact information and various types of financial transactions) that are likely to be identified as personal and private information subject to the privacy rules and regulations of the specific region. For portions of content identified as including possible personal and private information, the Privacy Engine returns a probability value generated by the selected machine learning model that corresponds to the pre-defined category for the specific region.”);
processing, by operation of one or more computer processors, the first maintenance report using a trained Named Entity Recognition (NER) model to identify instances of one or more words that are associated with a respective one or more real-world names (see Rao, [0006]: “By leveraging multiple Natural Language Processing (NLP) models, Named Entity Recognition models and others machine learning techniques to recognize privacy information included within unstructured content, the Privacy Engine provides a customizable solution that improves the accuracy of privacy information detection while still allowing for human activity in a privacy engine feedback loop to review/approve the redacted content and to further train the Privacy Engine based on detected privacy information and the review/approval activity of one or more human reviewers.”; and [0056]: “However, there may also be an additional machine learning model for named entity recognition trained for identifying whether data at portions of electronic document content may likely include various types of privacy information that typically appear in various types of documents related to a second particular organization—where the first and second particular organizations are different than each other. Moreover, the training data 122 used to train the machine learning model 506 for named entity recognition and additional machine learning model for named entity recognition may be the same in some respects and may be different in some respects as well.”);
determining whether a first identified instance of one or more words represents sensitive data (see Rao, [0003]: “Conventional systems exist for sorting content that may include sensitive information, such as private and personal data that is subject to various laws and regulations.”; [0005]: “a portion of electronic document content (“a content portion”) may be one or more phrases or sentences in a document, a segment of a video frame in a video file and/or a range of time during an audio file. In various embodiments, a content portion may be a document within a plurality of documents and/or be a file within a plurality of files of different formats. The Privacy Engine recreates the one or more of the initial electronic documents to include display of the one or more privacy information redactions. Each recreated electronic document with redactions may then be utilized to generate reports for the internal use by an organization in compliance with various law and regulations and may be utilized to generate a compliant report for delivery to a consumer.”; [0029]: “The network training module 112 of the system 100 may perform functionality in order to train the machine learning network 130 and one or more rule sets based on data in the one or more databases 120, 122.”; [0038]: “For example, a machine learning model for a specific region that includes a country involved in a financial transaction may enforce one or more privacy rules and regulations with respect to the type of information in the customer's financial records. The machine learning model receives input data based one or more portions of content from the collection of electronic documents and identifies those portions of content that include data (i.e. 
account numbers, social security numbers, monetary amounts, personal contact information and various types of financial transactions) that are likely to be identified as personal and private information subject to the privacy rules and regulations of the specific region.”; [0054]: “The machine learning model 410 for geographic category detection may identify whether an electronic document (or an electronic document portion) may be subject to (or not subject to) various privacy laws, data handling laws, data maintenance laws and/or data storage laws for one or more countries and/or one or more jurisdictions. For example, machine learning model 410 for geographic category detection may identify an electronic document (or a portion of an electronic document) that includes symbols, numbers, graphics and/or phrases that correlate with legal requirements arising under laws of one or more jurisdictions. According to various embodiments, the machine learning model 410 for geographic category detection may detect that an electronic document (or a portion of an electronic document) may be subject to legal requirements from multiple jurisdictions.”; and [0060]: “The detection engine module 106 passes each content portion to one or more machine learning models 504, 506, 508, 510 selected according to the rule set 500 and the respective content portion's indicated pre-defined categories (and sub-categories). In various embodiments there may be multiple machine learning models 504, 506, 508, 510 selected to be applied to a respective content portion according to an order specific to that respective content portion due to the one or more indicated pre-defined categories. The machine learning models generate probability values related to data at a respective content portion that may represent a likelihood that the data may be one or more types of privacy information subject to redaction”); and
based on determining that the first maintenance report includes sensitive data:
determining whether the first maintenance report is automatically modifiable with a first modification to remove the sensitive data (see Rao, [0035]: “The Privacy Engine visually modifies a document location of each occurrence of privacy information such that the privacy information is obscured from view. The Privacy Engine displays the redacted versions of the electronic documents.”; and [0061]: “The redaction module 108 generates a version of the content (such as an electronic document) based on its content portions (e.g. electronic document portions), the corresponding stored structural data and attributes for the content portions and the identified likely privacy information based on data at each content portion. The redaction module 108 generates a version of the content with redactions of the privacy information mapped to the structural attributes. In various embodiments, if the content is a financial document based on one or more document portions with data that is privacy information located within a header and a table, the redaction module 108 generates a version of the financial document such that the privacy information is visually modified (i.e. concealed, obscured, deleted)—and such visual modification is perceivable at a document position of the privacy information as mapped according to the financial document's extracted structural data and attributes. For example, if the financial document header includes an account number that one or more machine learning models identified as privacy information, the redaction module 108 generates the version of the financial document with the account number privacy information redacted at the document position of the header (as represented by the extracted structural data and attributes)”);
performing, by operation of the one or more computer processors, the first modification on the first maintenance report and adding the modified first maintenance report to a plurality of maintenance reports to be externally released (see Rao, [0035]: “The Privacy Engine visually modifies a document location of each occurrence of privacy information such that the privacy information is obscured from view. The Privacy Engine displays the redacted versions of the electronic documents.”; [0039]: “The Privacy Engine may then generate a report for the requesting customer that identifies the types of personal and private information retained by the organization but the report itself may include the redactions to avoid inadvertent of the customer's actual personal and private information.”; [0045]: “In response to the human reviewer selection, the Privacy Engine replaces the selected visually modified electronic document position with a display of the data of the portion of content that originally appeared at the document position.”; and [0061]: “The redaction module 108 generates a version of the content with redactions of the privacy information mapped to the structural attributes. In various embodiments, if the content is a financial document based on one or more document portions with data that is privacy information located within a header and a table, the redaction module 108 generates a version of the financial document such that the privacy information is visually modified (i.e. concealed, obscured, deleted)—and such visual modification is perceivable at a document position of the privacy information as mapped according to the financial document's extracted structural data and attributes.”); or
flagging the first maintenance report as a potentially sensitive maintenance report for further review (see Rao, [0046]: “human reviewer may provide an approval or rejection of the selected redaction… The human reviewer's approvals and rejections (and the corresponding redactions and data at the content portions) may be fed back into the Privacy Engine to build and train a machine learning model to be added to the machine learning network 130 and/or to tune how the Privacy Engine selects current machine learning model for one or more detected pre-defined categories and/or to be applied to future content portions”; [0058]: “It is further understood that the rule set 500 may be continuously updated based on human reviewer rejection and approval decisions such that the rule set 500 (as it becomes continuously updated) gets tuned to select more appropriate machine learning models 504, 506, 508, 510 for the detected categories of any given content portion. The machine learning models 504, 506, 508, 510 may employ, according to non-limiting examples, RegEx expression machine learning techniques, named entity recognition techniques, natural language processing techniques, keyphrase detection techniques and/or sentiment analysis.”).
Rao does not explicitly teach using a data anonymization rules ontology that describes a plurality of different ways to identify sensitive data within maintenance reports; and removing the sensitive data using the data anonymization rules ontology.
Davis teaches using a data anonymization rules ontology that describes a plurality of different ways to identify sensitive data within maintenance reports; and removing the sensitive data using the data anonymization rules ontology (see Davis, [0014]: “Domain ontology-driven entity extraction and anonymization analysis may be used to sanitize unstructured data to comply with regulations for release.”; [0017]: “A characteristic may be generalized by replacing the term used for the characteristic in the unstructured data with a more general term determined using ontological analysis, which defines relationships between concepts. In some embodiments, ontological analysis may include use of a taxonomy.”; and [0020]: “Anonymization module 404 performs anonymization on PAT 403, using ontological analysis module 405, which may in some embodiments include a taxonomy.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system of Rao in view of Davis to use a data anonymization rules ontology that describes a plurality of different ways to identify sensitive data within maintenance reports and to remove the sensitive data using the data anonymization rules ontology. One would be motivated to do so because Rao teaches in paragraph [0060], “The detection engine module 106 passes each content portion to one or more machine learning models 504, 506, 508, 510 selected according to the rule set 500 and the respective content portion's indicated pre-defined categories (and sub-categories). In various embodiments there may be multiple machine learning models 504, 506, 508, 510 selected to be applied to a respective content portion according to an order specific to that respective content portion due to the one or more indicated pre-defined categories. The machine learning models generate probability values related to data at a respective content portion that may represent a likelihood that the data may be one or more types of privacy information subject to redaction.”, emphasis added.
Rao does not explicitly teach the performing step and flagging step are based on determining that the first maintenance report is automatically modifiable or not automatically modifiable, respectively.
Ghare teaches the performing step and flagging step are based on determining that the first maintenance report is automatically modifiable or not automatically modifiable, respectively (see Ghare, col. 5, lines 39-50: “Anomaly detection 120 may provide indications of anomalies 122 which may trigger the performance of various responsive actions by stream management system 110 or other systems, components, or devices. For example, automated corrective actions may be performed in response to the detection or indication of an anomaly in a stream of data records to halt the operation of a device that emitted the anomalous data metric. In some embodiments, identified anomalies may be flagged or marked for further analysis, by a different type of anomaly detector that operates utilizing a more costly, or may be filtered or removed from the stream of data records.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system of Rao in view of Ghare such that the performing step and the flagging step are based on determining that the first maintenance report is automatically modifiable or not automatically modifiable, respectively. One would be motivated to do so because Ghare teaches in column 3, lines 32-34, “anomaly detection may be performed on all data records of data streams automatically, without explicit requests from a client.”.
As per claim 2, which depends on claim 1, Rao further teaches wherein the first maintenance report comprises a first section comprising structured text data describing attributes of the maintenance event and a second section comprising unstructured text data written by a maintenance operator describing details of the maintenance event (see Rao, [0006]: “By leveraging multiple Natural Language Processing (NLP) models, Named Entity Recognition models and others machine learning techniques to recognize privacy information included within unstructured content”; [0023]: “The machine learning network 130 and the databases 120, 122 may further be components of the system 100 as well. In various embodiments, a database 120 may include various types of structured and unstructured content, customer identifiers, extracted structural attributes and/or human reviewer activity and approval decisions.”; and [0033]: “The Privacy Engine receives the electronic documents, that include content portions of unstructured text, structured text, image content, audio content and/or video content. The Privacy Engine stores the received electronic documents in a data store device. The Privacy Engine builds a content profile according to indicated categories for portions of the content (Act 204). According to various embodiments, the Privacy Engine evaluates the one or more initial electronic documents to build a content profile based on detecting information at the respective portions of content that indicates one or more pre-defined categories”).
As per claim 3, which depends on claim 1, Rao teaches further comprising: prior to processing the first maintenance report, training the NER model using a plurality of annotated maintenance reports, wherein each of the plurality of annotated maintenance reports comprises a first section comprising structured text data describing attributes of a maintenance event and a second section comprising unstructured text data written by a maintenance operator describing details of the maintenance event, and wherein the plurality of annotated maintenance reports are annotated to comprise a plurality of text entries, each corresponding to a portion of text in either the first section or the second section and associated with a respective one or more tagged machine components (see Rao, [0023]: “FIG. 1 illustrates a block diagram of an example system 100 of the Privacy Engine for training a machine learning network 130 with input training… The machine learning network 130 and the databases 120, 122 may further be components of the system 100 as well. In various embodiments, a database 120 may include various types of structured and unstructured content, customer identifiers, extracted structural attributes and/or human reviewer activity and approval decisions”; [0033]: “The Privacy Engine receives the electronic documents, that include content portions of unstructured text, structured text, image content, audio content and/or video content. The Privacy Engine stores the received electronic documents in a data store device. The Privacy Engine builds a content profile according to indicated categories for portions of the content (Act 204)”; and [0051]: “As shown in FIG. 4, structured and/or unstructured content 400 may be uploaded to and ingested via the ingestion module 102. 
The content 400 may include one or more of any type of content format, such as: an unstructured document, a structured document, an email, an audio file, a video file, an audio recording, an audio recording transcript, and/or a presentation. The ingestion module detects the various types of content formats of the content 400 and tags the content as being associated with one or more customer identifiers. In various embodiments, the content 400 include one or more initial electronic documents and/or content portions from the one or more initial electronic documents.”).
As per claim 4, which depends on claim 3, Rao and Davis further teach wherein training the NER model further uses one or more specific machine ontologies that describes a physical machine and a plurality of components of the physical machine, wherein the one or more tagged machine components associated with the plurality of text entries each correspond to a respective concept within the one or more specific machine ontologies (see Claim 1 rejection above).
As per claim 5, which depends on claim 3, Rao and Davis teach further comprising: responsive to flagging the first maintenance report as a potentially sensitive maintenance report for further review, receiving user feedback specifying whether the first maintenance report comprises sensitive information; and updating the data anonymization rules ontology based on the received user feedback, wherein one or more weights within the data anonymization rules ontology are modified to reinforce the determination that the first identified instance of the one or more words represents sensitive data if the user feedback indicates that the determination was correct, and wherein the one or more weights within the data anonymization rules ontology are modified to weaken the determination that the first identified instance of the one or more words represents sensitive data if the user feedback indicates that the determination was incorrect (see Rao, [0006]: “By leveraging multiple Natural Language Processing (NLP) models, Named Entity Recognition models and others machine learning techniques to recognize privacy information included within unstructured content, the Privacy Engine provides a customizable solution that improves the accuracy of privacy information detection while still allowing for human activity in a privacy engine feedback loop to review/approve the redacted content and to further train the Privacy Engine based on detected privacy information and the review/approval activity of one or more human reviewers.”; [0058]: “It is further understood that the rule set 500 may be continuously updated based on human reviewer rejection and approval decisions such that the rule set 500 (as it becomes continuously updated) gets tuned to select more appropriate machine learning models 504, 506, 508, 510 for the detected categories of any given content portion. 
The machine learning models 504, 506, 508, 510 may employ, according to non-limiting examples, RegEx expression machine learning techniques, named entity recognition techniques, natural language processing techniques, keyphrase detection techniques and/or sentiment analysis.”; [0062]: “In addition, according to various embodiments, it is understood that the approval module 110 feeds the approvals and the rejections back into the database 120 and/or training data 122 for use by the network training module 112 to tune the rule set 500 to improve machine learning model selection with regard to indicated pre-defined categories (and sub-categories), to train the machine learning network 130 to better identify data at portion of content as privacy information and/or to build new machine learning models to be later deployed in the machine learning network 130 for use by the content profiler module 104 and/or the detection engine module 106.”; and Claim 1 rejection above).
As per claim 6, which depends on claim 1, Rao further teaches wherein determining whether the first identified instance of one or more words represents sensitive data further comprises:
identifying one or more text portions within the first maintenance report that correspond to one or more machine components (see Rao, [0033]: “The Privacy Engine receives the electronic documents, that include content portions of unstructured text, structured text, image content, audio content and/or video content. The Privacy Engine stores the received electronic documents in a data store device. The Privacy Engine builds a content profile according to indicated categories for portions of the content (Act 204).”; and [0055]: “It is understood that in addition to the machine learning models 404, 406, 408, 410, the content profiler module 104 may apply one or more rules-based systems/techniques to identify various types of content that have one or more content portions that match a rule(s) that correlates with a particular pre-defined category and sub-category (i.e. language, business domain, organization, geographic).”); and
determining, for each of the one or more machine components, whether the respective machine component is classified as a sensitive machine component, using the trained NER model, one or more data sensitivity rules, and one or more rule-based resources (see Rao, [0056]: “The detection engine module 106 includes a rule set 500 that includes rules to select one or more machine learning models 504, 506, 508, 510 from the machine learning network 130 based on a respective content portion's indicated pre-defined categories (and sub-categories) as listed in the index 414. For example, a machine learning model 506 for named entity recognition may be trained by the network training module 112 for identifying whether data at portions of electronic document content may likely include various types of privacy information that typically appear in various types of documents related to a first particular organization. However, there may also be an additional machine learning model for named entity recognition trained for identifying whether data at portions of electronic document content may likely include various types of privacy information that typically appear in various types of documents related to a second particular organization—where the first and second particular organizations are different than each other. 
Moreover, the training data 122 used to train the machine learning model 506 for named entity recognition and additional machine learning model for named entity recognition may be the same in some respects and may be different in some respects as well.”), wherein the one or more rule-based resources comprise at least one of a rule-based dictionary structure and a rule-based pattern (see Rao, [0023]: “In various embodiments, a database 120 may include various types of structured and unstructured content, customer identifiers, extracted structural attributes and/or human reviewer activity and approval decisions.”; [0029]: “The network training module 112 of the system 100 may perform functionality in order to train the machine learning network 130 and one or more rule sets based on data in the one or more databases 120, 122.”; [0038]: “For example, a machine learning model for a specific region that includes a country involved in a financial transaction may enforce one or more privacy rules and regulations with respect to the type of information in the customer's financial records.”; [0049]: “Privacy information may be based on individual preferences and characteristics, such as inferences related to shopping patterns and behaviors of a user.”; [0056]: “The detection engine module 106 includes a rule set 500 that includes rules to select one or more machine learning models 504, 506, 508, 510 from the machine learning network 130 based… there may also be an additional machine learning model for named entity recognition trained for identifying whether data at portions of electronic document content may likely include various types of privacy information that typically appear in various types of documents related to a second particular organization—where the first and second particular organizations are different than each other... 
”; and [0058]: “It is further understood that the rule set 500 may be continuously updated based on human reviewer rejection and approval decisions such that the rule set 500 (as it becomes continuously updated) gets tuned to select more appropriate machine learning models 504, 506, 508, 510 for the detected categories of any given content portion.”).
As per claim 7, which depends on claim 1, Rao further teaches wherein the sensitive data comprises at least one of: personal data relating to a specific person; business data describing information about a particular business or a customer, partner or subcontractor of the particular business; manufacturing data describing information about a manufacturing process or machine components or configurations involved in the manufacturing process; or other data deemed sensitive by a business entity (see Rao, [0037]: “According to various embodiments, a customer of an organization may request a report of all the customer's personal and private information retained by the organization. The organization identifies a collection of electronic documents associated with the customer and evaluates content portions within the collection of electronic documents to determine one or more pre-defined categories indicated by the information in the evaluated content portions. For example, the collection of electronic documents may include financial records of the customer, that includes account numbers, social security numbers, monetary amounts, personal contact information and various types of financial transactions in multiple different countries. The Privacy Engine deployed by the organization thereby determines that the pre-defined categories indicated by the information in the collection of electronic documents include, for example, a type of financial industry business domain as well as multiple geographic domains as a result of the financial transactions being related to different countries.”; and [0038]: “Given the one or more indicated pre-defined categories associated with the financial records of the customer, the Privacy Engine selects one or more machine learning models that correspond to the pre-defined categories. 
For example, a machine learning model for a specific region that includes a country involved in a financial transaction may enforce one or more privacy rules and regulations with respect to the type of information in the customer's financial records. The machine learning model receives input data based one or more portions of content from the collection of electronic documents and identifies those portions of content that include data (i.e. account numbers, social security numbers, monetary amounts, personal contact information and various types of financial transactions) that are likely to be identified as personal and private information subject to the privacy rules and regulations of the specific region. For portions of content identified as including possible personal and private information, the Privacy Engine returns a probability value generated by the selected machine learning model that corresponds to the pre-defined category for the specific region”).
As per claim 8, which depends on claim 1, Rao further teaches wherein the first maintenance report is part of a plurality of maintenance reports to be externally released that are configured to be used as at least part of a training data set for one or more machine learning models (see Rao, [0057]: “Moreover, the training data 122 used to train the machine learning model 505 for keyphrase detection and additional machine learning model for keyphrase detection may be the same in some respects and may be different in some respects as well.”).
As per claim 9, which depends on claim 1, Rao further teaches wherein upon flagging the first maintenance report as a potentially sensitive maintenance report for further review: receiving one or more redactions to the first maintenance report from a reviewer, the one or more redactions modifying or deleting one or more text characters from the first maintenance report; processing the first maintenance report to incorporate the one or more redactions; and adding the processed first maintenance report to a plurality of maintenance reports to be externally released (see Rao, [0052]: “The content profiler module 104 receives the tagged content 400 whereby content portions may each be an individual file amongst the tagged content 400 or various segments of each individual file amongst the tagged content 400. The content profiler module 104 accesses one or more machine learning models 404, 406, 408, 410 in the machine learning network 130 to detect pre-defined categories and sub-categories of the content portions in the uploaded content 400.”; [0058]: “It is further understood that the rule set 500 may be continuously updated based on human reviewer rejection and approval decisions such that the rule set 500 (as it becomes continuously updated) gets tuned to select more appropriate machine learning models 504, 506, 508, 510 for the detected categories of any given content portion.”; and [0065]: “The selected approvals and rejections are stored in the database 120 by the approval module and the redaction module 108 recreates an updated version of the organizational document 700 to include only approved redactions of privacy information 900-1, 900-2, 900-3, 900-4, 900-5, 900-7, 900-8, 900-10, 900-11, 900-12 and to allow the privacy information 900-6, 900-9 for rejected redactions to be visible in the updated version of the organizational document 700.”).
7. Claims 10-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rao (US 2022/0027508 A1) in view of Davis et al. (US 2011/0113049 A1).
As per claim 10, Rao and Davis teach a system, comprising:
one or more computer processors (see Rao, [0022]: “Some embodiments are implemented by a computer system. A computer system may include a processor, a memory, and a non-transitory computer-readable medium. The memory and non-transitory medium may store instructions for performing methods and steps described herein.”); and
a memory comprising computer program code executable by the one or more computer processors (see Rao, [0022]: “Some embodiments are implemented by a computer system. A computer system may include a processor, a memory, and a non-transitory computer-readable medium. The memory and non-transitory medium may store instructions for performing methods and steps described herein.”) to perform operations comprising:
retrieving a first maintenance report comprising an instance of text data describing a maintenance event for a first physical apparatus (see Claim 1 rejection above);
processing the first maintenance report using a trained Named Entity Recognition (NER) model to identify instances of one or more words that are associated with a respective one or more real-world names (see Claim 1 rejection above);
determining whether a first identified instance of one or more words represents sensitive data, using a data anonymization rules ontology that describes a plurality of different ways to identify sensitive data within maintenance reports (see Claim 1 rejection above);
based on determining that the first maintenance report includes sensitive data, flagging the first maintenance report as a potentially sensitive maintenance report for further review (see Claim 1 rejection above).
As per claim 11, which depends on claim 10, Rao further teaches wherein the first maintenance report comprises a first section comprising structured text data describing attributes of the maintenance event and a second section comprising unstructured text data written by a maintenance operator describing details of the maintenance event (see Rao, [0006]: “By leveraging multiple Natural Language Processing (NLP) models, Named Entity Recognition models and others machine learning techniques to recognize privacy information included within unstructured content”; [0023]: “The machine learning network 130 and the databases 120, 122 may further be components of the system 100 as well. In various embodiments, a database 120 may include various types of structured and unstructured content, customer identifiers, extracted structural attributes and/or human reviewer activity and approval decisions.”; and [0033]: “The Privacy Engine receives the electronic documents, that include content portions of unstructured text, structured text, image content, audio content and/or video content. The Privacy Engine stores the received electronic documents in a data store device. The Privacy Engine builds a content profile according to indicated categories for portions of the content (Act 204). According to various embodiments, the Privacy Engine evaluates the one or more initial electronic documents to build a content profile based on detecting information at the respective portions of content that indicates one or more pre-defined categories”).
As per claim 12, which depends on claim 10, Rao further teaches wherein the operations further comprise: prior to processing the first maintenance report, training the NER model using a plurality of annotated maintenance reports, wherein each of the plurality of annotated maintenance reports comprises a first section comprising structured text data describing attributes of a maintenance event and a second section comprising unstructured text data written by a maintenance operator describing details of the maintenance event, and wherein the plurality of annotated maintenance reports are annotated to comprise a plurality of text entries, each corresponding to a portion of text in either the first section or the second section and associated with a respective one or more tagged machine components. (see Rao, [0023]: “FIG. 1 illustrates a block diagram of an example system 100 of the Privacy Engine for training a machine learning network 130 with input training… The machine learning network 130 and the databases 120, 122 may further be components of the system 100 as well. In various embodiments, a database 120 may include various types of structured and unstructured content, customer identifiers, extracted structural attributes and/or human reviewer activity and approval decisions”; [0033]: “The Privacy Engine receives the electronic documents, that include content portions of unstructured text, structured text, image content, audio content and/or video content. The Privacy Engine stores the received electronic documents in a data store device. The Privacy Engine builds a content profile according to indicated categories for portions of the content (Act 204)”; and [0051]: “As shown in FIG. 4, structured and/or unstructured content 400 may be uploaded to and ingested via the ingestion module 102. 
The content 400 may include one or more of any type of content format, such as: an unstructured document, a structured document, an email, an audio file, a video file, an audio recording, an audio recording transcript, and/or a presentation. The ingestion module detects the various types of content formats of the content 400 and tags the content as being associated with one or more customer identifiers. In various embodiments, the content 400 include one or more initial electronic documents and/or content portions from the one or more initial electronic documents.”).
As per claim 13, which depends on claim 12, Rao and Davis further teach wherein training the NER model further uses one or more specific machine ontologies that describes a physical machine and a plurality of components of the physical machine, wherein the one or more tagged machine components associated with the plurality of text entries each correspond to a respective concept within the one or more specific machine ontologies (see Claim 1 rejection above).
As per claim 14, which depends on claim 10, Rao further teaches wherein determining whether the first identified instance of one or more words represents sensitive data further comprises:
identifying one or more text portions within the first maintenance report that correspond to one or more machine components (see Rao, [0055]: “It is understood that in addition to the machine learning models 404, 406, 408, 410, the content profiler module 104 may apply one or more rules-based systems/techniques to identify various types of content that have one or more content portions that match a rule(s) that correlates with a particular pre-defined category and sub-category (i.e. language, business domain, organization, geographic).”); and
determining, for each of the one or more machine components, whether the respective machine component is classified as a sensitive machine component, using the trained NER model, one or more data sensitivity rules, and one or more rule-based resources, wherein the one or more rule-based resources comprise at least one of a rule-based dictionary structure and a rule-based pattern (see Rao, [0056]: “The detection engine module 106 includes a rule set 500 that includes rules to select one or more machine learning models 504, 506, 508, 510 from the machine learning network 130 based on a respective content portion's indicated pre-defined categories (and sub-categories) as listed in the index 414. For example, a machine learning model 506 for named entity recognition may be trained by the network training module 112 for identifying whether data at portions of electronic document content may likely include various types of privacy information that typically appear in various types of documents related to a first particular organization. However, there may also be an additional machine learning model for named entity recognition trained for identifying whether data at portions of electronic document content may likely include various types of privacy information that typically appear in various types of documents related to a second particular organization—where the first and second particular organizations are different than each other. Moreover, the training data 122 used to train the machine learning model 506 for named entity recognition and additional machine learning model for named entity recognition may be the same in some respects and may be different in some respects as well.”).
As per claim 15, which depends on claim 14, Rao further teaches wherein the one or more rule-based resources comprise at least one of a rule-based dictionary structure and a rule-based pattern (see Claim 14 rejection above).
As per claim 16, which depends on claim 10, Rao further teaches wherein the sensitive data comprises at least one of: personal data relating to a specific person; business data describing information about a particular business or a customer, partner or subcontractor of the particular business; manufacturing data describing information about a manufacturing process or machine components or configurations involved in the manufacturing process; or other data deemed sensitive by a business entity (see Rao, [0037]: “According to various embodiments, a customer of an organization may request a report of all the customer's personal and private information retained by the organization. The organization identifies a collection of electronic documents associated with the customer and evaluates content portions within the collection of electronic documents to determine one or more pre-defined categories indicated by the information in the evaluated content portions. For example, the collection of electronic documents may include financial records of the customer, that includes account numbers, social security numbers, monetary amounts, personal contact information and various types of financial transactions in multiple different countries. The Privacy Engine deployed by the organization thereby determines that the pre-defined categories indicated by the information in the collection of electronic documents include, for example, a type of financial industry business domain as well as multiple geographic domains as a result of the financial transactions being related to different countries.”; and [0038]: “Given the one or more indicated pre-defined categories associated with the financial records of the customer, the Privacy Engine selects one or more machine learning models that correspond to the pre-defined categories. 
For example, a machine learning model for a specific region that includes a country involved in a financial transaction may enforce one or more privacy rules and regulations with respect to the type of information in the customer's financial records. The machine learning model receives input data based one or more portions of content from the collection of electronic documents and identifies those portions of content that include data (i.e. account numbers, social security numbers, monetary amounts, personal contact information and various types of financial transactions) that are likely to be identified as personal and private information subject to the privacy rules and regulations of the specific region. For portions of content identified as including possible personal and private information, the Privacy Engine returns a probability value generated by the selected machine learning model that corresponds to the pre-defined category for the specific region”).
As per claim 17, which depends on claim 10, Rao further teaches wherein the first maintenance report is part of a plurality of maintenance reports to be externally released as at least part of a training data set for one or more machine learning models (see Rao, [0057]: “Moreover, the training data 122 used to train the machine learning model 505 for keyphrase detection and additional machine learning model for keyphrase detection may be the same in some respects and may be different in some respects as well.”).
As per claim 18, which depends on claim 10, Rao further teaches wherein upon flagging the first maintenance report as a potentially sensitive maintenance report for further review: receiving one or more redactions to the first maintenance report from a reviewer, the one or more redactions modifying or deleting one or more text characters from the first maintenance report; processing the first maintenance report to incorporate the one or more redactions; and adding the processed first maintenance report to a plurality of maintenance reports to be externally released (see Rao, [0052]: “The content profiler module 104 receives the tagged content 400 whereby content portions may each be an individual file amongst the tagged content 400 or various segments of each individual file amongst the tagged content 400. The content profiler module 104 accesses one or more machine learning models 404, 406, 408, 410 in the machine learning network 130 to detect pre-defined categories and sub-categories of the content portions in the uploaded content 400.”; [0058]: “It is further understood that the rule set 500 may be continuously updated based on human reviewer rejection and approval decisions such that the rule set 500 (as it becomes continuously updated) gets tuned to select more appropriate machine learning models 504, 506, 508, 510 for the detected categories of any given content portion.”; and [0065]: “The selected approvals and rejections are stored in the database 120 by the approval module and the redaction module 108 recreates an updated version of the organizational document 700 to include only approved redactions of privacy information 900-1, 900-2, 900-3, 900-4, 900-5, 900-7, 900-8, 900-10, 900-11, 900-12 and to allow the privacy information 900-6, 900-9 for rejected redactions to be visible in the updated version of the organizational document 700.”).
As per claim 19, Rao and Davis teach a non-transitory computer-readable medium comprising computer program code that, when executed by operation of one or more computer processors, performs operations (see Rao, [0022]: “Some embodiments are implemented by a computer system. A computer system may include a processor, a memory, and a non-transitory computer-readable medium. The memory and non-transitory medium may store instructions for performing methods and steps described herein.”) comprising:
retrieving a first maintenance report comprising an instance of text data describing a maintenance event for a first physical apparatus, wherein the first maintenance report comprises a first section comprising structured text data describing attributes of the maintenance event and a second section comprising unstructured text data written by a maintenance operator describing details of the maintenance event (see Rao, [0033]: “The Privacy Engine receives the electronic documents, that include content portions of unstructured text, structured text, image content, audio content and/or video content. The Privacy Engine stores the received electronic documents in a data store device. The Privacy Engine builds a content profile according to indicated categories for portions of the content (Act 204).”; and [0038]: “The machine learning model receives input data based one or more portions of content from the collection of electronic documents and identifies those portions of content that include data (i.e. account numbers, social security numbers, monetary amounts, personal contact information and various types of financial transactions) that are likely to be identified as personal and private information subject to the privacy rules and regulations of the specific region. For portions of content identified as including possible personal and private information, the Privacy Engine returns a probability value generated by the selected machine learning model that corresponds to the pre-defined category for the specific region.”);
processing the first maintenance report using a trained Named Entity Recognition (NER) model to identify instances of one or more words that correspond to one or more machine components of the first physical apparatus (see Claim 1 rejection above); and
determining whether a first identified instance of one or more words represents sensitive data, using a data anonymization rules ontology that describes a plurality of different ways to identify sensitive data within maintenance reports (see Claim 1 rejection above), comprising:
determining, for each of the one or more machine components of the physical apparatus, whether the respective machine component is classified as a sensitive machine component, using one or more data sensitivity rules and one or more rule-based resources (see Rao, [0060]: “The detection engine module 106 passes each content portion to one or more machine learning models 504, 506, 508, 510 selected according to the rule set 500 and the respective content portion's indicated pre-defined categories (and sub-categories).”);
upon determining that the first maintenance report includes sensitive data:
flagging the first maintenance report as a potentially sensitive maintenance report for further review (see Claim 1 rejection above);
receiving one or more redactions to the first maintenance report from a reviewer, the one or more redactions modifying or deleting one or more text characters from the first maintenance report (see Rao, [0046]: “human reviewer may provide an approval or rejection of the selected redaction. A rejection indicates the redaction corresponds to data that was incorrectly identified as privacy information or that the redaction corresponds to a different type privacy information. An approval indicates the redaction corresponds to data that is privacy information and that the redaction is appropriate…”);
processing the first maintenance report to incorporate the one or more redactions (see Claim 1 rejection above); and
adding the processed first maintenance report to a plurality of maintenance reports to be externally released (see Claim 1 rejection above).
As per claim 20, which depends on claim 19, Rao teaches the operations further comprising:
prior to processing the first maintenance report, training the NER model using a plurality of annotated maintenance reports, wherein each of the plurality of annotated maintenance reports comprises a first section comprising structured text data describing attributes of a maintenance event and a second section comprising unstructured text data written by a maintenance operator describing the maintenance event, and wherein the plurality of annotated maintenance reports are annotated to contain a plurality of text entries, each corresponding to a portion of text in either the first section or the second section and associated with a respective one or more tagged machine components (see claim 3 rejection above),
wherein training the NER model further uses one or more specific machine ontologies that describes a physical machine and a plurality of components of the physical machine, wherein the one or more tagged machine components associated with the plurality of text entries each correspond to a respective concept within the one or more specific machine ontologies (see claim 1 rejection above).
Conclusion
8. For the reasons above, claims 1-20 have been rejected and remain pending.
9. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
10. Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL Y WON whose telephone number is (571)272-3993. The examiner can normally be reached on Wk.1: M-F: 8-5 PST & Wk.2: M-Th: 8-7 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nicholas R Taylor can be reached on 571-272-3889. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Michael Won/Primary Examiner, Art Unit 2443