DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is responsive to remarks filed 07/28/2025. Claims 1, 4, 7–9, 11, 13, 15, 17, and 19 are amended. Claim 6 has been cancelled, and there are no new claims.
Claims 1, 4–5, 7–9, 11–15, and 17–20 are pending for examination.
Response to Arguments
In reference to 35 USC § 101
Applicant’s arguments, filed on 07/28/2025, with respect to the § 101 rejections have been fully considered and are persuasive.
Examiner notes that while the claims recite several limitations that are abstract ideas (mental processes), the claims as a whole are not directed to an abstract idea. Applicant amended the claims, which collectively now recite a detailed system directed toward identifying personally identifiable information (PII) from a wide range of attributes in a dataset and tagging the information in order to protect sensitive data by applying deep neural network models, such as a convolutional neural network (CNN) and a recurrent neural network (RNN), with minimal human intervention. The newly amended independent claims now include "encoding, using the first data encoder, the input dataset based on quantization of each character of the input dataset using a one-hot encoding component and a first dictionary;" "encoding, using the second data encoder, each character of the input dataset using an embedding matrix corresponding to a set of embedding layers of the RNN component, a second dictionary, a dictionary index corresponding to the second dictionary, and a weight corresponding to each embedding layer of the embedding matrix;" "perform a pattern identification action based on the first format and the second format data feature of the output dataset to automatically notify the data feature of the input dataset corresponding to the identity parameter to a user;" and "provide to the user, based on the output dataset, the data feature of the input dataset corresponding to the identity parameter in the first format and another data feature of the input dataset in the second format different than the first format." These additional limitations are not abstract ideas (see MPEP 2106.04(a)). Thus, these limitations must be considered additional elements to the abstract idea. Examiner notes that these additional elements integrate the abstract idea into a practical application because the entire claim amounts to a detailed system that requires implementing a specific combination of hardware with the methods of classification and notification (as opposed to a broad recitation at a high level of generality), and the specific combination of hardware and instructions recited in the additional elements amounts to an improvement to the functioning of a computer/field, as set forth by MPEP 2106.05(a), which states "the claim must include the components or steps of the invention that provide the improvement described in the specification." Pursuant to this requirement set forth by the MPEP, examiner points out that the Specification states in at least [0024, 0027, 0103]: "The present disclosure provides for a system for PII tagging that may generate key insights related to PII pattern identification with minimal human intervention. Furthermore, the present disclosure may deduce a mechanism of modifying a data identification technique, in near real-time, based on the identification of unrecognized patterns and the associated characteristics in the dataset." Therefore, the additional elements reflect the improvement set forth in the specification and explain what the resulting improvement is.
Thus, the additional limitations do amount to significantly more, and the § 101 rejections are withdrawn.
In reference to 35 USC § 103
Applicant’s arguments filed on 07/28/2025, with respect to the 35 USC § 103 rejections have been fully considered but are not persuasive.
Applicant argues, beginning on Pg. 9 of the Remarks, that "none of the applied references teach or suggest the above-mentioned claimed aspects as claimed in amended claim 1." More specifically, applicant argues that "Kuppa merely discloses mapping of labels to a two-dimensional space representing attack techniques and attack taxonomy for vulnerability assessment at each stage of an attack cycle." Examiner respectfully disagrees. Examiner contends that Kuppa teaches a data manipulator that arranges data objects (i.e., identifies characteristics) based on classifiers operating on data of a specific vector length (i.e., based on a pre-defined size parameter). See the § 103 rejections below for a complete analysis of the claims.
Applicant’s arguments with regard to the newly added amendments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 1, 9, and 15 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Each independent claim recites, "provide to the user, based on the output dataset, the data feature of the input dataset corresponding to the identity parameter in the first format and another data feature of the input dataset in the second format different than the first format." However, it is unclear how the claim provides the data features of the input dataset in both the first format and the second format when each format is subject to a condition precedent that is not required to occur. Specifically, the first and second formats are determined only "if" the corresponding type of deep neural network is selected, as recited in the newly amended independent claims. Furthermore, there is no requirement that both networks be selected, thereby producing both a first and a second format. For example, examiner notes that, according to the claim language, only a CNN may be selected, in which case the output would never include a second format from the RNN. In the interest of compact prosecution, examiner is construing the limitation as "provide to the user, based on the output dataset, the data feature of the input dataset corresponding to the identity parameter in the first format [[and]] and/or another data feature of the input dataset in the second format different than the first format."
Claims 4–5, 7–8, 11–14, and 17–20 are rejected under § 112(b) for depending from a claim rejected under § 112(b), as each dependent claim necessarily includes the limitations of the claim from which it depends.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 4–5, 7–9, 11–15, and 17–20 are rejected under 35 U.S.C. 103 as being unpatentable over Kuppa et al. (US 2021/0367961 A1), hereinafter "Kuppa", in view of Bonageri (US 2020/0410614 A1), hereinafter "Bonageri", in view of Jimmy Lei Ba et al. ("Layer Normalization," https://arxiv.org/abs/1607.06450v1, 2016), hereinafter "Ba", and further in view of Shanmugamani et al. (US 2019/0303465 A1), hereinafter "Shanmugamani".
Regarding claim 1, Kuppa teaches:
a processor (Kuppa ¶0114: "The various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration)"—[emphasis added]); and
a memory communicatively coupled to the processor (Kuppa ¶0115: “The methods, sequences, and/or algorithms described in connection with the aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of non-transitory computer-readable medium known in the art. An exemplary non-transitory computer-readable medium may be coupled to the processor such that the processor can read information from, and write information to, the non-transitory computer-readable medium. In the alternative, the non-transitory computer-readable medium may be integral to the processor. The processor and the non-transitory computer-readable medium may reside in an ASIC. The ASIC may reside in an IoT device. In the alternative, the processor and the non-transitory computer-readable medium may be discrete components in a user terminal”—[emphasis added]),
wherein the memory comprises a data manipulator, which, when executed by the processor, cause the processor to (Kuppa Fig. 5, ¶0061: “The computing device 501 may include a data manipulator 502 that provides digital storage space for structured and unstructured data as well as data processing capabilities for data analysis”; see also Kuppa ¶0114: “The various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration)”—[wherein the data manipulator is implemented or performed with (i.e., coupled to) a general purpose processor]):
obtain an input dataset comprising data associated with an individual, wherein the input dataset is defined in a one-dimensional data structure (Kuppa Fig. 7A, ¶0066–0067:
[image: media_image1.png, greyscale; reproduction of Kuppa Fig. 7A]
“The transform network 505 may also receive data objects with mitigation steps 603 that are parsed from exploit reports, intrusion reports, or imported from a database of mitigation techniques and patches. Specifically, the transform network may receive the pre-exploit descriptions (e.g. exploited system configuration) and post-exploit descriptions (e.g. recovery method, logs, or isolation method) as textual descriptions, parsed textual descriptions, or encoded text. The context encoder 503 includes word tokens 701 inputted to the system from the features and descriptions 601. The word tokens 701 are generated by a word parser (e.g. word2vec) that converts natural language to word strings or tokens 701 that are in are arranged in an array as shown in FIG. 7A. The word tokens 701 are input into a bi-Long Short Term Memory model 702 that outputs context labels 703 to the first combination node”; see also Kuppa Table 2 ¶0074: “Examples of various mitigation techniques that may be part of the textual description of a particular CVE are as follows”
[image: media_image2.png, greyscale; reproduction of Kuppa Table 2 (mitigation techniques)]
—[(emphasis added) wherein the system receives one-dimensional word-embedding data objects (see also Fig. 7A) associated with an individual (e.g., see "Implement Physical Security" in Table 2), wherein the system's mitigation techniques, which are based on the received data, are associated with an individual]);
wherein selecting the type of the deep neural network comprises:
identifying a characteristic associated with the input dataset based on a pre-defined parameter, where the pre-defined parameter comprises at least one of a size of the input dataset and length of individual elements in the dataset (Kuppa Figs. 7B, 10–12, ¶0062: “The data manipulator 502 as illustrated in FIG. 5 may include a joint latent space 506 that stores data and a context encoder 503, a label encoder 504, and a transform network 505. The context encoder 503 may feed the joint latent space 506 with data objects encoded, extracted or characterized by the context encoder 503. The label encoder 504 may feed the joint latent space 506 with data objects encoded, extracted or characterized by the label encoder 504. The transform network 505 may perform additional data analysis on the data objects encoded by the context encoder 503 and the label encoder 504. The transformer network 505 may receive data objects and reprocess them back to the joint latent space 506 with additional or new embeddings. The computing device 501 may include a Multi-Layer Perceptron (MLP) classifier 507 that operates on the joint latent space 506 and arranges the data objects of the joint latent space 506. In addition, the MLP classifier 507 may output data objects as results to the external database(s) 508. These results may be used by the vulnerability management system 250”; see also Kuppa ¶0105: “Various models including BI-LSTM, Attention-based BI-LSTM, and TD-IDF-based SVM multi-label classifiers may also be used as the classifier. The term frequency-inverse document frequency (TF-IDF) approach represents all textual features as vectors with the same length as the vocabulary of the entire text corpus”—[(emphasis added) wherein classifier operates on the data objects using TF-IDF (i.e., identifies a characteristic of the input dataset based on a pre-defined parameter) including all textual features as vectors with the same length (i.e., length of individual elements)]);
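For illustration only (not part of the record), the TF-IDF representation cited above can be sketched in a few lines of Python; the function name is hypothetical, and the common tf·log(N/df) weighting stands in for the frequency-divided-by-document-frequency weighting Kuppa describes:

import math

# Sketch: represent every document as a vector with the same length as the
# corpus vocabulary, weighting each entry by term frequency and inverse
# document frequency.
def tf_idf_vectors(documents):
    vocab = sorted({word for doc in documents for word in doc.split()})
    doc_freq = {w: sum(1 for doc in documents if w in doc.split()) for w in vocab}
    vectors = []
    for doc in documents:
        words = doc.split()
        vectors.append([(words.count(w) / len(words)) * math.log(len(documents) / doc_freq[w])
                        for w in vocab])
    return vectors  # every vector shares the same (vocabulary) length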
selecting the type of the deep neural network component as a convolutional neural network (CNN) component, when the input dataset is associated with the first characteristic (Kuppa ¶0061–0062: “The computing device 501 may include a data manipulator 502 that provides digital storage space for structured and unstructured data as well as data processing capabilities for data analysis. The data manipulator 502 may include many nodes and connections in a hierarchical or layered structure to facilitate mapping of data points to each other. For example, the connections may be ordered via a convolutional neural network, a recurrent neural network, or other neural network operated by the data manipulator. Specifically, in at least one embodiment, the data manipulator may perform sorts, filters, comparisons, correlations, similarity determinations, and/or other data analysis … The model selector 118 selects a learning model 108 from among the learning models 126 to apply in evaluating clinical trial data. The learning model 108 can be selected based on indicators, features, and/or attributes identified within the evaluation criteria 106 that are identified as being relevant to a clinical trial or a clinical trial site. For example, if a clinical trial involves an investigation of efficacy of a drug, the learning model 108 can be trained to identify anomalies within the medical record data 124 that are associated with drug safety, dosage restrictions, or unexpected disease conditions or side effects”—[(emphasis added) wherein the data manipulator facilitates the mapping by ordering (i.e., selecting) the convolutional neural network as the hierarchical neural network based on indicators, features and/or attributes identified (i.e., a first characteristic)]); and
selecting the deep neural network component as a recurrent neural network (RNN) component when the input dataset is associated with the second characteristic (Kuppa ¶0061: “The computing device 501 may include a data manipulator 502 that provides digital storage space for structured and unstructured data as well as data processing capabilities for data analysis. The data manipulator 502 may include many nodes and connections in a hierarchical or layered structure to facilitate mapping of data points to each other. For example, the connections may be ordered via a convolutional neural network, a recurrent neural network, or other neural network operated by the data manipulator. Specifically, in at least one embodiment, the data manipulator may perform sorts, filters, comparisons, correlations, similarity determinations, and/or other data analysis… The model selector 118 selects a learning model 108 from among the learning models 126 to apply in evaluating clinical trial data. The learning model 108 can be selected based on indicators, features, and/or attributes identified within the evaluation criteria 106 that are identified as being relevant to a clinical trial or a clinical trial site. For example, if a clinical trial involves an investigation of efficacy of a drug, the learning model 108 can be trained to identify anomalies within the medical record data 124 that are associated with drug safety, dosage restrictions, or unexpected disease conditions or side effects”—[(emphasis added) wherein the data manipulator facilitates the mapping by ordering (i.e., selecting) the recurrent neural network as the hierarchical neural network based on indicators, features and/or attributes identified (i.e., a second characteristic)]),
wherein the RNN includes connections to add feedback and memory to a plurality of layers of the RNN, to learn the input dataset including a data pattern that indicates a sequence of characters in the input dataset, and predict a next character in the sequence of characters, based on the data pattern (Kuppa ¶¶0066–0068: “the transform network 505 may receive data objects from the joint latent space 506 and use feedback to add embeddings and improve the data objects. Specifically, the transform network may receive the pre-exploit descriptions (e.g. exploited system configuration) and post-exploit descriptions (e.g. recovery method, logs, or isolation method) as textual descriptions, parsed textual descriptions, or encoded text … The context encoder 503 includes word tokens 701 inputted to the system from the features and descriptions 601. The word tokens 701 are generated by a word parser (e.g. word2vec) that converts natural language to word strings or tokens 701 that are in are arranged in an array as shown in FIG. 7A. The word tokens 701 are input into a bi-Long Short Term Memory model 702 that outputs context labels 703 to the first combination node. The bi-Long Short Term Memory model 702 is based on an artificial recurrent neural network architecture with feedback to process sequential streams of tokens into labels and/or embeddings”—[(emphasis added)]);
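As a purely illustrative sketch of the recited feedback-and-memory behavior (hypothetical names, not code from any applied reference), a single recurrent step can be written as:

import numpy as np

# One step of a minimal character-level RNN: the previous hidden state h_prev
# feeds back into the layer (memory), and the output scores over the character
# set can be used to predict the next character in the sequence.
def rnn_step(x_onehot, h_prev, Wxh, Whh, Why, bh, by):
    h = np.tanh(Wxh @ x_onehot + Whh @ h_prev + bh)  # feedback connection
    scores = Why @ h + by                            # one score per character
    return scores, h                                 # argmax(scores) ~ predicted next character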
encode each character in the input dataset using a predefined dictionary and a predefined encoding function, to convert the input dataset into a formatted dataset of a two-dimensional data structure by using a first data encoder and a second data encoder, wherein a format of the formatted dataset is defined in accordance to the type of the deep neural network component (Kuppa Fig. 13, ¶0025: “FIG. 13 illustrates a mapping of labels to a two-dimensional space representing attack techniques and attack taxonomy in accordance with an embodiment of the disclosure”; see also Kuppa Figs. 7A, 7B, “context encoder”, “label encoder”, ¶¶0067–0068: “The context encoder 503 includes word tokens 701 inputted to the system from the features and descriptions 601. The word tokens 701 are generated by a word parser (e.g. word2vec) that converts natural language to word strings or tokens 701 that are in are arranged in an array as shown in FIG. 7A. The word tokens 701 are input into a bi-Long Short Term Memory model 702 that outputs context labels 703 to the first combination node. The bi-Long Short Term Memory model 702 is based on an artificial recurrent neural network architecture with feedback to process sequential streams of tokens into labels and/or embeddings. The label encoder 504 receives word token embeddings 705 and character-based token embeddings and inputs the embeddings into another bi-Long Short Term Memory (LSTM) model 706. The embeddings include word and character tokens 602 that are derived from descriptions of intrusion techniques (e.g. ATT&CK stages). The label encoder 504 may apply a parser (e.g. word2vec) to the inputs to convert the data to embeddings or vectors. The LSTM model outputs a label 704 to the first combination node and the second combination node. The labels 704 are output to each combination node for improved embeddings and similarity analysis at the nodes”; see also Kuppa ¶0071: “In some designs, the model architecture of the labelling and filtering pipeline 600 may be adapted to encode labels from unstructured data. These labels are then fed into the joint latent space 506”; see also Kuppa ¶0075: “At 804, the system parses text from the at least one first textual description in accordance with one or more rules. The rules may include selecting certain nouns, pronouns, verbs, and/or abbreviations from the textual description. The rules may include selecting words based on proximity to a named CVE or other keyword. The rules may include selecting or separating words based on whether the words precede a keyword or follow a keyword. The rules may be adapted for various languages. The parsing may include filtering and vectorizing the words. The resultant parsed text (e.g., after filtering, vectorizing, etc.) is referred to herein as a CVE “context”. 
Accordingly, reference to the parsed text may refer to the literal parsed text, or alternatively a processed version of the parsed text"—[wherein the BRI of "predefined dictionary" is any tool, component, or computer code that aids in converting data from one format to another (see present disclosure para [0038]), and wherein the unstructured data is manipulated by a parser according to rules (i.e., a predefined dictionary and a predefined encoding function), and based on the rules, the encoder inputs the embeddings into another bi-LSTM (i.e., converts the dataset into a formatted dataset, e.g., encoding each character token), the dataset is formatted according to the bi-LSTM (i.e., in accordance with the type of deep neural network component), and wherein the data is output as a mapping of two-dimensional space representing attack techniques (e.g., Fig. 13)]);
if the selected deep neural networking component is the RNN component: encoding, using the second data encoder, each character of the input dataset using an embedding matrix corresponding to a set of embedding layers of the RNN component, a second dictionary, a dictionary index corresponding to the second dictionary, and a weight corresponding to each embedding layer of the embedding matrix (Kuppa ¶0068: "The label encoder 504 receives word token embeddings 705 and character-based token embeddings and inputs the embeddings into another bi-Long Short Term Memory (LSTM) model 706. The embeddings include word and character tokens 602 that are derived from descriptions of intrusion techniques (e.g. ATT&CK stages). The label encoder 504 may apply a parser (e.g. word2vec) to the inputs to convert the data to embeddings or vectors. The LSTM model outputs a label 704 to the first combination node and the second combination node. The labels 704 are output to each combination node for improved embeddings and similarity analysis at the nodes"; see also Kuppa ¶0092: "For each of the parsing and encoding (embedding) modules 1002, the parsed text may be further processed through a bi-LSTM network, LSTM network, or another artificial recurrent neural network as a part of 804. The token embedding layer of the context encoder 503 takes a token as input and outputs its vector representation, given an input sequence of tokens x_1 . . . x_n, the output vector e_i (i = 1 . . . n) of each token x_i results from the concatenation of two different types of embeddings: token embeddings V_t(x_i) and the character-based token embeddings (b_i) that come from the output of a character-level and word-level bi-LSTM encoder. Features that have less contextual information but may contain out of vocabulary (OOV) tokens also pass through the token embedding layer to the joint latent space 506"; see also Kuppa ¶¶0095–0096: "The joint latent space 506 between two ATT&CK techniques domains and the CVE feature domain is created by a component-wise multiplication of each embedding type with label embedding for their joint representation given by:
h_AJ^(ij) = h_j^y · h_i^A and h_MJ^(ij) = h_i^y · h_i^M,
where h_i^y is the label embedding, h_i^A is the mitigation or transform embedding, and h_i^M is the context embedding. The probabilities for each are calculated as: p_A^(ij) = h_AJ^(ij)·ω_A + b_A and p_M^(ij) = h_MJ^(ij)·ω_M + b_M. The probability for h belongs to one of the k known labels and is modeled by a linear unit that maps any point in the joint space into a score which indicates the validity of the combination, where ω ∈ R^(d_j) and b are scalar variables and d_j is the number of CVE exploits input for training. Therefore, the h_AJ^(ij) dot product or component-wise multiplication is an implementation of combination node 1009 and the h_i^M dot product or component-wise multiplication is an implementation of combination node 1007"—[wherein the bi-LSTM network is a recurrent neural network used by the label encoder with corresponding embedding layers that encodes the tokens in the embedding matrix V_t(x_i), and wherein the parser word2vec is the second dictionary, the index i is the embedded parsed-data index (i.e., the dictionary index corresponding to the second dictionary), and the scalar ω is the weight corresponding to each layer of the embedding matrix]);
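A minimal sketch of the second-encoder pattern recited in this limitation, under assumed names (illustrative only): a dictionary maps each character to an index, the index selects a row of the embedding matrix, and a per-layer weight scales the result.

import numpy as np

# Hypothetical second data encoder: dictionary -> index -> embedding row.
def embed_characters(text, dictionary, embedding_matrix, layer_weight):
    indices = [dictionary[ch] for ch in text]        # dictionary index per character
    return layer_weight * embedding_matrix[indices]  # one embedding row per character

second_dictionary = {"a": 0, "b": 1, "c": 2}
E = np.random.randn(3, 8)  # embedding matrix: 3 dictionary entries, 8 dimensions
encoded = embed_characters("abc", second_dictionary, E, layer_weight=0.5)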
determining, using the second data encoder, a second formatted dataset based on the encoding of the input dataset, where the second formatted dataset is of a predefined length (Kuppa ¶0096: "Finally, the output of combination node 1007 and combination node 1009 combined with the probabilities creates a multi-dimensional joint latent space 506 or model where attack chain description labels are mapped to CVE description labels and mitigation labels as in 808. These mapped labels are joined in a single joint latent space 506 via dot product or combination node 1008 as illustrated in FIG. 10 and FIG. 11. The resulting joint space has an independent label dimension"—[wherein the output of combination nodes 1007 and 1009, combined with the probabilities, creates the second formatted dataset based on the encoding, and wherein the resulting joint space has an independent label dimension (i.e., a predefined length)]);
process the formatted dataset through a plurality of layers of the deep neural network component, the processing comprising transforming the formatted dataset at each layer of the plurality of layers of the deep neural network component based on a transformation function, a predefined filter, a weight, and a bias component to generate an output indicative of a category of the input dataset (Kuppa Figs. 5–7B, ¶0069: "A single layer of the transformer network 505 includes at least two feedback loops and connections to other layers of the transformer network 505. Each loop includes a stage to add bias and normalize 707 the labels. Layer normalization, LayerNorm(x+Sublayer(x)), is also used after each sublayer, where Sublayer(x) denotes the sub-layer function. In addition, a first loop includes a multi-head self-attention stage 708 that identifies similarity between newly coded labels 704 and mitigation steps. In the transform network 505, each key, query, and value may be a vector corresponding to a sentence"; see also Kuppa ¶0105: "For the term frequency-inverse document frequency (TF-IDF) model, each entry in the vector corresponds to a unique word, and its weight gives the frequency of that word in the post divided by its document frequency. These document vectors are then used in the classification task. Also since TF-IDF results in high-dimensional representations, a support vector machine (SVM) is applied on the TF-IDF features. In testing, the MLP classifier 507 operating on the three filtered and combined label domains generates the best results"—[(emphasis added) wherein the data is processed through layers of the transform network (i.e., deep neural network component) based on the sub-layer function (i.e., transformation function), a bias, weight, and filters, to label the input data (i.e., generate an output indicative of the category of the input data)]);
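For illustration, the per-layer pattern quoted above, LayerNorm(x + Sublayer(x)), can be sketched as follows (a square weight is assumed so the residual shapes match; this is not Kuppa's implementation):

import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)

# One layer: a sub-layer function built from a weight, a bias, and a
# nonlinearity, followed by the residual add and normalization.
def transform_layer(x, weight, bias):
    sublayer = np.maximum(0.0, weight @ x + bias)  # Sublayer(x)
    return layer_norm(x + sublayer)                # LayerNorm(x + Sublayer(x))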
based on the processing of the formatted dataset, determine output data corresponding to a classification indicative of a probability of a data feature of the input dataset (Kuppa ¶0087: “At 808, the system generates or refines a model that maps the parsed text to at least one first label associated with the one or more stages of the attack chain taxonomy. The model may include the joint latent space 506 and MLP classifier 507. The mapping may include arranging or scoring labels in the joint latent space 506 based on relevance, attack timing, or mitigation. In an example, new CVE descriptions become available frequently, whereas the attack chain taxonomy and associated concepts may change less frequently. The above-noted model may generally be applied with respect to new CVE description in a predictive manner so as to label the new CVE with regard to labels that are associated with one or more attack stages of a respective attack stage taxonomy, such as ATT&CK”; see also Kuppa Table 2 ¶0074: “Examples of various mitigation techniques that may be part of the textual description of a particular CVE are as follows”
[image: media_image2.png, greyscale; reproduction of Kuppa Table 2 (mitigation techniques)]
—[wherein the system uses the MLP classifier to map and score labels based on relevance (i.e., indicative of a probability), attack timing, or mitigation of the common vulnerabilities and exposures (CVE) data (i.e., a data feature of the input dataset indicative of sensitive data), and wherein the mitigation includes CVE data based on the received data associated with an individual (e.g., see "Implement Physical Security" in Table 2)]);
store the output data corresponding to the classification of the input dataset, to generate an output dataset (Kuppa ¶0080: “The model that is refined and/or generated in the final step of FIG. 8 may include the joint latent space 506 and the MLP classifier 507. The joint latent space 506 contains labels of different sizes from each of the embeddings modules. The MLP classifier 507 can then operate on labels of all different sizes and select sets of labels based on an input to the MLP classifier 507”—[wherein the space includes (i.e., stores) labels (i.e., corresponding to the classification) to operate on the labels (i.e., generate an output set)]);
perform a pattern identification action based on the first format and the second format data feature of the output dataset to automatically notify the data feature of the input dataset corresponding to the identity parameter to a user (Kuppa ¶0062: “The data manipulator 502 as illustrated in FIG. 5 may include a joint latent space 506 that stores data and a context encoder 503, a label encoder 504, and a transform network 505. The context encoder 503 may feed the joint latent space 506 with data objects encoded, extracted or characterized by the context encoder 503. The label encoder 504 may feed the joint latent space 506 with data objects encoded, extracted or characterized by the label encoder 504. The transform network 505 may perform additional data analysis on the data objects encoded by the context encoder 503 and the label encoder 504. The transformer network 505 may receive data objects and reprocess them back to the joint latent space 506 with additional or new embeddings. The computing device 501 may include a Multi-Layer Perceptron (MLP) classifier 507 that operates on the joint latent space 506 and arranges the data objects of the joint latent space 506. In addition, the MLP classifier 507 may output data objects as results to the external database(s) 508. These results may be used by the vulnerability management system 250”—[wherein the data manipulator performs the data analysis based on the data from the context and label encoders (i.e., first and second format data features) to automatically output (i.e., notify) the external databases and vulnerability management system (i.e., users)]); and
Kuppa does not appear to explicitly teach:
wherein encoding further comprises: if the selected deep neural networking component is the CNN component: encoding, using the first data encoder, the input dataset based on quantization of each character of the input dataset using a one-hot encoding component and a first dictionary;
determining, a first formatted dataset based on the encoding of the input dataset, where the first formatted dataset is in two-dimensional data structure representing a matrix of binary digits;
wherein the characteristics associated with the input dataset comprise:
a first characteristic, indicative of a size of the input dataset being greater than a predetermined size;
a second characteristic, indicative of a size being less than the predetermined size;
select a type of a deep neural network component based on a characteristic of the input dataset;
wherein the memory further comprises an identification classifier which, when executed by the processor, cause the processor to;
provide to the user, based on the output dataset, the data feature of the input dataset corresponding to the identity parameter in the first format and another data feature of the input dataset in the second format different than the first format;
corresponding to an identity parameter associated with an identity of the individual, the identity parameter being indicative of sensitive data; and
a weight corresponding to each layer of the plurality of layers.
However, Bonageri teaches:
wherein the characteristics associated with the input dataset comprise:
a first characteristic, indicative of a size of the input dataset being greater than a predetermined size (Bonageri ¶0100: “At 406, a list key risk indicators (KRIs) that are used for adverse event detection are identified … KRIs of a target metric can be identified to measure the importance or contribution of predictor features. In each optimized predictive model, the top most important predictor features (e.g., top ten most important) can be determined as the KRIs. The number of predictor features can be adjusted according to, for instance, the total number of predictor features available in the data, and the distribution of feature importance”; see also Bonageri ¶0103: “At 414, predictions of adverse events are generated for the new investigation data 403 based on the application of the trained model. For example, probabilities associated with different types of unsatisfactory adverse event reporting can be computed for each clinical trial site based on the deployment of the trained learning models to the investigation data 403 in step 408. In some instances, the probabilities are represented as values that ranging from “0” to “1.” In such instances, a threshold value (e.g., 0.65) can be applied to differentiate between clinical trial sites that are identified as being likely to exhibit unsatisfactory adverse event reporting (e.g., probability values exceeding 0.65) and other clinical trial sites that are not likely to exhibit unsatisfactory adverse event reporting (e.g., probability values below 0.65). In some implementations, the threshold value is customizable by a user to balance precision and recall in identifying clinical trial sites that are likely to exhibit unsatisfactory adverse event reporting. In other implementations, a default value of the threshold value (F1) is computed based on a specified equation where F1=2*[(precision)*(recall)]/(precision+recall). In such implementations, a higher threshold value generally tends to improve precision level and decrease the level of recall”—[(emphasis added) wherein the identified characteristic includes a number (i.e., indicating a size) from the dataset including a key risk indicator (KRI), and the KRI can be a number of patient visits, a number of investigators, a number of related clinical trial sites associated with the same clinical trial investigation, a patient visit volume over a specified time period, among others (i.e., the first characteristic and the second characteristic) and the number is processed to represent a probability which is compared to a threshold value (i.e., a predetermined size) to determine if it is below or exceeding the value]); and
a second characteristic, indicative of a size being less than the predetermined size (Bonageri ¶¶0100, 0103, as quoted above with respect to the first characteristic—[wherein the same KRI-derived number, processed to represent a probability and compared to the threshold value (i.e., the predetermined size), indicates whether the size is below that value]);
select a type of a deep neural network component based on a characteristic of the input dataset (Bonageri ¶0049: “The learning models 126 can be used to compute the metrics described throughout. Each learning model specifies a set of one or more predicted analytics techniques that utilize data patterns and/or trends within electronic data to predict the occurrence of a certain condition (e.g., excessive prescribing activity, risk of an adverse event, etc.). In some instances, each learning model is trained to apply an alternative predictive analytic technique to compute corresponding metrics. In this regard, the system 100 selects a particular learning model from among multiple learning models when computing a metric. As described in detail below, the system 100 can use various types of data attributes to determine which learning model to select when computing a metric. These techniques can be used to improve, for instance, computational resources that are necessary to compute the metrics”—[wherein the system selects a particular learning model (i.e., deep neural network component) based on data attributes (i.e., a characteristic of the dataset)]).
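Reading the claim language onto the mapped teachings, the selection logic reduces to a short sketch (hypothetical names; illustrative only):

# Hypothetical selection of the deep neural network component from a
# characteristic of the input dataset (its size versus a predetermined size).
def select_network_type(input_dataset, predetermined_size):
    if len(input_dataset) > predetermined_size:  # first characteristic
        return "CNN"
    return "RNN"                                 # second characteristic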
wherein the memory further comprises an identification classifier which, when executed by the processor, cause the processor to (Bonageri ¶0048: “The database 120 also stores learning models 126 that are used to evaluate stored data to perform data predictions, such as the detection of data anomalies in the medical records data 124 or determining the likelihood of a compliance risk being present within the investigation data 122. The operations performed by the components of the server 110 in relation to data stored in the database 120 are described in reference to FIG. 2. The learning models 126 can specify a different statistical technique that may be applied by the server 110 to compute data metrics. For example, the learning models 126 can specify the use of different classifiers to that are used to predict the progression of tracked data parameters at a subsequent time. The learning models 126 can include parametric models that make specific assumptions with respect to one or more of the data parameters that characterize underlying data distributions, non-parametric models that make fewer data assumptions, and semi-parametric models that combine aspects of parametric and non-parametric models. Examples of such models can include Bayesian theory models, gradient boosting machine models, deep learning models, among others that are often used in predictive analytics”—[wherein components of the server (i.e., coupled to the processor) include different classifiers (i.e., identification classifier)]);
provide to the user, based on the output dataset, the data feature of the input dataset corresponding to the identity parameter in the first format and another data feature of the input dataset in the second format different than the first format (Bonageri Figs. 4B–4C ¶¶0105–0111: “As shown in FIG. 4B, the interface 400B includes various interface elements that allow a user (e.g., a clinical trial investigator, an individual associated with a regulatory agency or a sponsoring organization of a clinical trial) to access and/or manipulate predictions generated by the system 100. For example, interface element 422 displays a graph representing a distribution of probability scores that computed for multiple clinical trial sites. As described throughout, each probability score represents a likelihood that a clinical trial site will exhibit unsatisfactory adverse event reporting (e.g., underreporting, non-reporting, or delayed reporting of adverse events). Interface elements 434, 436, 438, and 442 include visualizations that are adjusted based on the threshold score specified for a probability score in the slider displayed in interface element 432. Interface 438 displays a chart that allows a user to validate accuracy of the risk groups identified in the interface element 434 based on observed data for the clinical trial sites. For example, possible misclassifications are identified based on unsatisfactory adverse event reporting that is actually observed at clinical trial sites. Interface element 442 displays a map that displays different colors to represent that number of high-risk clinical trial sites that are included in various geographies”—[wherein the interface provides scores (i.e., data features of the input dataset) to the user in graphs and charts (i.e., first and second format)]); and
corresponding to an identity parameter associated with an identity of the individual, the identity parameter being indicative of sensitive data (Bonageri ¶0094: "The process 300C can include the operation of determining a score for each medical record included in the subset of medical records (360). For example, the server 110 can determine a score for each medical record included in the subset of medical records based using the learning model. As described throughout, each score can represent a respective likelihood that a certain medical record included in the subset of medical records is associated with an adverse event. For example, a score with a value of 0.32 can represent a 32 percent probability that medical record information for a patient collected during a recent visit indicates that the patient may have experienced a stroke. In this example, the computed score is used to indicate that the patient may have experienced an unexpected side effect of the clinical trial, and that the risk posed by the clinical trial to the unexpected side effect exceeds a predetermined threshold (e.g., 10 percent), which likely indicates that an adverse event has occurred"—[wherein the score can include a probability associated with medical record information from a patient (i.e., an identity parameter associated with an identity of the individual)]).
The system of Kuppa, the teachings of Bonageri, and the instant application are analogous art because they pertain to recognizing, encoding, and using neural networks to manipulate data.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Kuppa with the teachings of Bonageri to provide for selecting the neural network best suited for the manipulation based on the characteristics of the data. One would have been motivated to do so to improve the computational resources needed when computing the metrics to produce the outputs (Bonageri ¶0049: "The learning models 126 can be used to compute the metrics described throughout. Each learning model specifies a set of one or more predicted analytics techniques that utilize data patterns and/or trends within electronic data to predict the occurrence of a certain condition (e.g., excessive prescribing activity, risk of an adverse event, etc.). In some instances, each learning model is trained to apply an alternative predictive analytic technique to compute corresponding metrics. In this regard, the system 100 selects a particular learning model from among multiple learning models when computing a metric. As described in detail below, the system 100 can use various types of data attributes to determine which learning model to select when computing a metric. These techniques can be used to improve, for instance, computational resources that are necessary to compute the metrics").
Kuppa in view of Bonageri does not appear to explicitly teach:
wherein encoding further comprises: if the selected deep neural networking component is the CNN component: encoding, using the first data encoder, the input dataset based on quantization of each character of the input dataset using a one-hot encoding component and a first dictionary;
determining, a first formatted dataset based on the encoding of the input dataset, where the first formatted dataset is in two-dimensional data structure representing a matrix of binary digits; and
a weight corresponding to each layer of the plurality of layers.
However, Ba teaches:
a weight corresponding to each layer of the plurality of layers (Ba Pg. 3, 3.1 Layer normalized recurrent neural networks, Eq 4: “The recent sequence to sequence models [Sutskever et al., 2014] utilize compact recurrent neural networks to solve sequential prediction problems in natural language processing. It is common among the NLP tasks to have different sentence lengths for different training cases. This is easy to deal with in an RNN because the same weights are used at every time-step. But when we apply batch normalization to an RNN in the obvious way, we need to compute and store separate statistics for each time step in a sequence. This is problematic if a test sequence is longer than any of the training sequences. Layer normalization does not have such problem because its normalization terms depend only on the summed inputs to a layer at the current time-step. It also has only one set of gain and bias parameters shared over all time-steps”—[(emphasis added) wherein Ba teaches that when Kuppa’s feed-forward recurrent neural network processes the input through the plurality of layers, the same weights are used at every step]).
The system of Kuppa in view of Bonageri, the teachings of Ba, and the instant application are analogous art because they pertain to recognizing, encoding, and using neural networks to manipulate data.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Kuppa in view of Bonageri with the teachings of Ba to provide for using the same weight for more than one layer. One would have been motivated to do so to solve sequential prediction problems (Ba Pg. 3, 3.1 Layer normalized recurrent neural networks, Eq 4: "The recent sequence to sequence models [Sutskever et al., 2014] utilize compact recurrent neural networks to solve sequential prediction problems in natural language processing. It is common among the NLP tasks to have different sentence lengths for different training cases. This is easy to deal with in an RNN because the same weights are used at every time-step. But when we apply batch normalization to an RNN in the obvious way, we need to compute and store separate statistics for each time step in a sequence. This is problematic if a test sequence is longer than any of the training sequences. Layer normalization does not have such problem because its normalization terms depend only on the summed inputs to a layer at the current time-step. It also has only one set of gain and bias parameters shared over all time-steps").
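Ba's point can be illustrated with a minimal layer-normalized RNN sketch (assumed names; one shared gain and bias over all time-steps, and the same weights reused at every step):

import numpy as np

# Normalization depends only on the current step's summed inputs, so no
# per-time-step statistics are stored, and Wxh/Whh are reused at every step.
def layer_normalized_rnn(xs, Wxh, Whh, gain, bias, eps=1e-5):
    h = np.zeros(Whh.shape[0])
    for x in xs:
        a = Wxh @ x + Whh @ h  # summed inputs to the layer at this time step
        a = (a - a.mean()) / np.sqrt(a.var() + eps)
        h = np.tanh(gain * a + bias)
    return h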
Kuppa in view of Bonageri and Ba does not appear to explicitly teach:
wherein encoding further comprises: if the selected deep neural networking component is the CNN component: encoding, using the first data encoder, the input dataset based on quantization of each character of the input dataset using a one-hot encoding component and a first dictionary; and
determining, a first formatted dataset based on the encoding of the input dataset, where the first formatted dataset is in two-dimensional data structure representing a matrix of binary digits.
However, Shanmugamani teaches:
wherein encoding further comprises: if the selected deep neural networking component is the CNN component: encoding, using the first data encoder, the input dataset based on quantization of each character of the input dataset using a one-hot encoding component and a first dictionary (Shanmugamani ¶0042: “In some examples, the one hot encoding encodes each data point to a group of bits with a single high bit (e.g., 1), and a single low bit (e.g., 0) for other bits. In these examples, the length of the group of bits depends on the number of data point variations in a column. For example, assuming that there are only three variations of “APPLE,” “DELL,” and “OTHERS” for the data points in column 208, each of these variations is assigned to a group of bits with only one high in a unique bit location. For example, APPLE can be assigned to [0,0,1], DELL to [0,1,0], and OTHERS to [1,0,0]. In some examples, the one hot encoding is performed on every character using the global character map. For example, if there are five characters in a global character map, a data point ABC may be presented as [1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]”; see also Shanmugamani ¶0033: “Data points in a column can be transferred to an input vector space. For example, a data point can be transferred into the input vector space based on the characters in the data point. In some examples, each character can be assigned a unique number, and the unique number is transferred to the input vector space. In some implementations, a global character map that includes all characters that may occur in a data point is utilized. For example, a global character map may cover English lower case letters, English upper case letters, numbers, and special characters (e.g., %, @). Such a global character map can cover, for example, 68 characters. A data point can be transferred to an input vector based on the unique numbers that the global character map provides for the data point's characters. In some examples, the global character map can act as a dictionary and map each character to a group of characters (e.g., a word). For example, each character can be mapped to an n-gram word in the input vector space”—[(Emphasis added) Examiner notes this is a contingent limitation pursuant to MPEP § 2111.04(II) which states “The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met.” Thus, this limitation is not required and holds no patentable weight. However, in the interest of compact prosecution, examiner is including these limitations in the examination]); and
determining, a first formatted dataset based on the encoding of the input dataset, where the first formatted dataset is in two-dimensional data structure representing a matrix of binary digits (Shanmugamani ¶0042: “In some examples, the one hot encoding encodes each data point to a group of bits with a single high bit (e.g., 1), and a single low bit (e.g., 0) for other bits. In these examples, the length of the group of bits depends on the number of data point variations in a column. For example, assuming that there are only three variations of “APPLE,” “DELL,” and “OTHERS” for the data points in column 208, each of these variations is assigned to a group of bits with only one high in a unique bit location. For example, APPLE can be assigned to [0,0,1], DELL to [0,1,0], and OTHERS to [1,0,0]. In some examples, the one hot encoding is performed on every character using the global character map. For example, if there are five characters in a global character map, a data point ABC may be presented as [1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]”; see also Shanmugamani Fig. 4D, ¶0049: “In some implementations, the neural network layers in the encoder 460 include convolution, dropout, linear, and exponential linear unit layers. In some examples, at least one of the layers is a fully connected layer. In the depicted example, layers 466, 470 are 2-D convolution layers with kernel size of 50×3, with strides of 1×1, where 50 is the dimension of the one character after encoding; but other sizes and dimensions for the convolution layers are feasible. The output channel (e.g., number of filters) for the two convolution layers 466, 470 are 10 and 20, respectively; but other sizes are also feasible. The dropout layers 468, 472 can have random weights while the network is being trained. For example, the dropout layers are not utilized during a testing stage”—[wherein the dataset is represented as bits with a single high bit (e.g., 1), and a single low bit (e.g., 0) (i.e., binary) and the datasets are processed by 2-D convolution layers (i.e., two-dimensions)]).
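Shanmugamani's worked example maps directly onto a few lines of Python (illustrative only; the five-character map mirrors the quoted ¶0042 example):

# One-hot encode each character through a character map (the "first
# dictionary"), yielding a two-dimensional matrix of binary digits.
def one_hot(text, char_map):
    size = len(char_map)
    return [[1 if char_map[ch] == j else 0 for j in range(size)] for ch in text]

char_map = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}
print(one_hot("ABC", char_map))
# [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]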
The system of Kuppa in view of Bonageri and Ba, the teachings of Shanmugamani, and the instant application are analogous art because they pertain to manipulating inputs to identify data with encoders and artificial neural networks.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Kuppa in view of Bonageri and Ba with the teachings of Shanmugamani to use a specific encoding algorithm (e.g., one hot encoding) with a map (i.e., dictionary) to identify data. One would be motivated to do so to create data points to map and identify data (Shanmugamani Fig. 4, ¶0038: “The column-type determination layer 402 determines the type of each column in each of the query data set, and the target data set, as described herein. In some implementations, column-type determination also includes pre-processing of data points, as also described herein. Each column is input to a respective encoder 404 based on the type of the column. The type-dependent outputs of the encoders 404 are concatenated at concatenation embedding 406, and are further processed in one or more fully connected layers 408. The output of the fully connected layers for each of the query data set, and the target data set is mapped to a latent space 410 to provide multi-dimensional vectors with a pre-determined dimension for each of the query data set, and the target data set. In some implementations, the multi-dimensional vectors for the query data set, and the target data set have equal dimensions. These multi-dimensional vectors are compared using loss determiner 412 to identify matching data points, and/or columns between the query data set, and the target data set”).
Regarding claim 4, Kuppa in view of Bonageri, Ba, and Shanmugamani teaches all the limitations of claim 1.
Shanmugamani teaches:
wherein the memory further comprises a convolutional neural network modeler coupled to the processor, the convolutional neural network modeler comprising a first set of layers and a second set of layers to (Shanmugamani Fig. 4A, ¶0039: “In some implementations, the column type encoder 404 includes one or more encoders that target a column based on the type of the column. In some implementations, an encoder in 404 is specific to the column type that the encoder accepts as input. In some implementations, the number, and/or type of layers of an encoder that targets a first column type differ from the number, and/or type of layers of an encoder that targets a second column type. In some implementations, the encoders 404 includes fully connected, convolution, and Long Short-Term Memory (LTSM) layers. In some implementations, the size of output of each encoder type differs from the size of other encoder types. In one non-limiting example, the output of each categorical encoder takes up to 20 dimensions, the output of each numerical encoder takes up to 5 dimensions, and the output of each string encoder take up to 128 dimensions”; see also Shanmugamani ¶0036: “In accordance with implementations of the present disclosure, the data columns are input to a neural network that includes one or more layers, to be processed and encoded. In some implementations, the process and encoding (e.g., type or number of layers) varies based on the type of column that is being processed. In some implementations, the encoding at least partially depends on the column type and partly is shared between all types of columns”—[emphasis added]):
process the first formatted dataset by the first set of layers of the convolutional neural network component using a one-step stride and at least a predefined filter (Shanmugamani Figs. 4A–4D, ¶0049: “In some implementations, the neural network layers in the encoder 460 include convolution, dropout, linear, and exponential linear unit layers. In some examples, at least one of the layers is a fully connected layer. In the depicted example, layers 466, 470 are 2-D convolution layers with kernel size of 50×3, with strides of 1×1, where 50 is the dimension of the one character after encoding; but other sizes and dimensions for the convolution layers are feasible. The output channel (e.g., number of filters) for the two convolution layers 466, 470 are 10 and 20, respectively; but other sizes are also feasible. The dropout layers 468, 472 can have random weights while the network is being trained. For example, the dropout layers are not utilized during a testing stage”—[emphasis added]);
based on the processing of the first formatted dataset, compute a first output data indicative of a one-dimensional convolution of the first formatted dataset (Shanmugamani Fig. 4, ¶0039: “In some implementations, the column type encoder 404 includes one or more encoders that target a column based on the type of the column. In some implementations, an encoder in 404 is specific to the column type that the encoder accepts as input. In some implementations, the number, and/or type of layers of an encoder that targets a first column type differ from the number, and/or type of layers of an encoder that targets a second column type. In some implementations, the encoders 404 includes fully connected, convolution, and Long Short-Term Memory (LTSM) layers. In some implementations, the size of output of each encoder type differs from the size of other encoder types. In one non-limiting example, the output of each categorical encoder takes up to 20 dimensions, the output of each numerical encoder takes up to 5 dimensions, and the output of each string encoder take up to 128 dimensions”—[wherein the output of the convolution encoder network can be up to 5 dimensions (i.e., output indicative of a one-dimensional convolution)]);
process the first output data by the second set of layers of the convolutional neural network component, wherein the second set of layers corresponds to fully connected layers of the artificial neural network (Shanmugamani Fig. 4, ¶0039: “In some implementations, the column type encoder 404 includes one or more encoders that target a column based on the type of the column. In some implementations, an encoder in 404 is specific to the column type that the encoder accepts as input. In some implementations, the number, and/or type of layers of an encoder that targets a first column type differ from the number, and/or type of layers of an encoder that targets a second column type. In some implementations, the encoders 404 includes fully connected, convolution, and Long Short-Term Memory (LTSM) layers. In some implementations, the size of output of each encoder type differs from the size of other encoder types. In one non-limiting example, the output of each categorical encoder takes up to 20 dimensions, the output of each numerical encoder takes up to 5 dimensions, and the output of each string encoder take up to 128 dimensions”; see also Shanmugamani ¶0036: “In accordance with implementations of the present disclosure, the data columns are input to a neural network that includes one or more layers, to be processed and encoded. In some implementations, the process and encoding (e.g., type or number of layers) varies based on the type of column that is being processed. In some implementations, the encoding at least partially depends on the column type and partly is shared between all types of columns”—[emphasis added]); and
based on processing of the first output data, compute a second output data indicative of the classification of the input dataset (Shanmugamani Fig. 4, ¶0054: “The multi-dimensional vectors are input to the loss determiner 412 to provide an output that indicates one or more matching data points between the query data set, and the target data set. The loss determiner can be used to lower errors in data matchings. In some implementations, the loss determiner includes a loss function that determines the difference between two or more input data. In some examples, the difference is calculated by the cosine distance between the inputs. In some examples, the loss function is evaluated in the latent space, and error is back-propagated to different layers of the network”—[wherein the vectors (i.e., the processed first output data) are used as input to the loss determiner that indicates (i.e., classifies) the matching data]).
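For illustration of the two-stage convolutional structure mapped above, the following is a minimal PyTorch sketch under stated assumptions: the 68-symbol dictionary and 150-character input mirror the claim language, the filter counts of 10 and 20 mirror Shanmugamani ¶0049, and all remaining sizes are illustrative. A first set of convolution layers uses a one-step stride and predefined filters, the full-height kernel collapsing the encoding axis into an effectively one-dimensional feature map (the first output data); a second set of fully connected layers then computes the classification (the second output data):

    import torch
    import torch.nn as nn

    class CharCNN(nn.Module):
        def __init__(self, dict_size: int = 68, seq_len: int = 150, n_classes: int = 2):
            super().__init__()
            # First set of layers: 2-D convolutions with a one-step (1x1) stride
            # and predefined filter counts (10 and 20, per the cited ¶0049).
            self.conv = nn.Sequential(
                nn.Conv2d(1, 10, kernel_size=(dict_size, 3), stride=(1, 1)),
                nn.ReLU(),
                nn.Conv2d(10, 20, kernel_size=(1, 3), stride=(1, 1)),
                nn.ReLU(),
            )
            # Second set of layers: fully connected, computing the classification.
            self.fc = nn.Sequential(
                nn.Flatten(),
                nn.Linear(20 * (seq_len - 4), 64),  # each conv trims 2 positions
                nn.ReLU(),
                nn.Linear(64, n_classes),
            )

        def forward(self, one_hot: torch.Tensor) -> torch.Tensor:
            # one_hot: (batch, 1, dict_size, seq_len) binary matrix from the encoder.
            first_output = self.conv(one_hot)  # effectively 1-D feature maps
            return self.fc(first_output)       # second output: class scores

    x = torch.zeros(1, 1, 68, 150)  # a 150-character input over a 68-symbol dictionary
    print(CharCNN()(x).shape)       # torch.Size([1, 2])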
The same motivation that was utilized for combining Kuppa in view of Bonageri and Ba with Shanmugamani, as set forth in claim 1, is equally applicable to claim 4.
Regarding claim 5, Kuppa in view of Bonageri, Ba, and Shanmugamani teaches all the limitations of claim 4.
Shanmugamani teaches:
wherein the first dictionary comprises sixty-eight characters (Shanmugamani ¶0033: “Data points in a column can be transferred to an input vector space. For example, a data point can be transferred into the input vector space based on the characters in the data point. In some examples, each character can be assigned a unique number, and the unique number is transferred to the input vector space. In some implementations, a global character map that includes all characters that may occur in a data point is utilized. For example, a global character map may cover English lower case letters, English upper case letters, numbers, and special characters (e.g., %, @). Such a global character map can cover, for example, 68 characters. A data point can be transferred to an input vector based on the unique numbers that the global character map provides for the data point's characters. In some examples, the global character map can act as a dictionary and map each character to a group of characters (e.g., a word). For example, each character can be mapped to an n-gram word in the input vector space”—[(emphasis added) wherein the character map can cover 68 characters (i.e., dictionary comprises 68 characters)]),
the first set of layers comprises six layers, the second set of layers comprises two layers, and the first formatted dataset is one hundred and fifty bits long (Shanmugamani ¶0042: “In some examples, the one hot encoding encodes each data point to a group of bits with a single high bit (e.g., 1), and a single low bit (e.g., 0) for other bits. In these examples, the length of the group of bits depends on the number of data point variations in a column. For example, assuming that there are only three variations of “APPLE,” “DELL,” and “OTHERS” for the data points in column 208, each of these variations is assigned to a group of bits with only one high in a unique bit location. For example, APPLE can be assigned to [0,0,1], DELL to [0,1,0], and OTHERS to [1,0,0]. In some examples, the one hot encoding is performed on every character using the global character map. For example, if there are five characters in a global character map, a data point ABC may be presented as [1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]”; see also Shanmugamani Fig. 4D, ¶0047: “FIG. 4D depicts an example encoder 460 for string columns. The encoder 460 performs two string encodings 462, 464, and includes seven neural network layers 466, 468, 470, 472, 474-476, 478-480, 482-484. An output 486 of the string encoder is input to the concatenation embedding 406”—[(emphasis added) wherein the length of the group of bits (i.e., size of the formatted datasets (e.g., assigned bits)) depends on the number of data point variations (i.e., including 150 bits), and wherein the neural network has one or more layers including a set of convolution layers (layers 1 and 3; see Fig. 4D) and six other layers including the two dropout layers, three linear and exponential linear unit layers, and an output layer (see Fig. 4D)]).
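As a small numeric aid to the bit-length mapping above (an illustrative check under the scheme of Shanmugamani ¶0042, where one-hot group length tracks the number of data point variations; the variation count of 150 is an assumption):

    # Per Shanmugamani ¶0042, one-hot group length equals the number of data
    # point variations; 150 variations (assumed) yield a 150-bit group.
    variations = 150
    group = [0] * variations
    group[0] = 1  # a single high bit in a unique location
    print(len(group))  # 150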
The same motivation that was utilized for combining Kuppa in view of Bonageri and Ba with Shanmugamani, as set forth in claim 1, is equally applicable to claim 5.
Regarding claim 7, Kuppa in view of Bonageri, Ba, and Shanmugamani teaches all the limitations of claim 1.
Kuppa teaches:
wherein the memory further comprises the RNN comprising a bi-directional long short term memory modeler, which, when executed by the processor, cause the processor to (Kuppa ¶0105: “Various models including BI-LSTM, Attention-based BI-LSTM, and TD-IDF-based SVM multi-label classifiers may also be used as the classifier. The term frequency-inverse document frequency (TF-IDF) approach represents all textual features as vectors with the same length as the vocabulary of the entire text corpus. For the term frequency-inverse document frequency (TF-IDF) model, each entry in the vector corresponds to a unique word, and its weight gives the frequency of that word in the post divided by its document frequency. These document vectors are then used in the classification task. Also since TF-IDF results in high-dimensional representations, a support vector machine (SVM) is applied on the TF-IDF features. In testing, the MLP classifier 507 operating on the three filtered and combined label domains generates the best results”—[wherein the models include recurrent neural network BI-LSTM (i.e., recurrent bi-directional long short term memory modeler)]):
process the second formatted dataset by a backward feedback layer component and a forward feedback layer component of the bi-directional long short term component to generate a third output data (Kuppa ¶0068: “The label encoder 504 receives word token embeddings 705 and character-based token embeddings and inputs the embeddings into another bi-Long Short Term Memory (LSTM) model 706. The embeddings include word and character tokens 602 that are derived from descriptions of intrusion techniques (e.g. ATT&CK stages). The label encoder 504 may apply a parser (e.g. word2vec) to the inputs to convert the data to embeddings or vectors. The LSTM model outputs a label 704 to the first combination node and the second combination node. The labels 704 are output to each combination node for improved embeddings and similarity analysis at the nodes”—[wherein each of the tokens (i.e., the formatted dataset) is processed through the bi-LSTM (i.e., bi-directional, e.g., forward and backward) model to generate the third output data]);
process the third output data of the bi-directional long short term component by an adaptive maximum pooling layer function to generate a fourth output data and an adaptive average pooling layer function to generate a fifth output data (Kuppa ¶0099: “For a pre-trained token embedding, a word to vector coder (e.g. word2vec) may be trained with a window size of 8, a minimum vocabulary count of 1, and 15 iterations. The negative sampling number is set to 8 and the model type may be skipgram. The dimension of the output token embedding is set to 300. The transformer network may be configured with 2 transformer blocks, with hidden size of 768 and a feed-forward intermediate layer size of 4×768, i.e., 3072, the hidden size relating to hidden layers of the feed forward neural network. The 768-dimensional representation obtained from the transformer is pooled by the decoder which is a five-layer feed-forward network with rectified linear unit (ReLU) nonlinearity in each layer with a hidden size of 200, and a 300-dimensional output layer for the embedding”—[wherein the 768-dimensional representation (i.e., the third output data) is pooled by the decoder's five-layer feed-forward network, applying the adaptive maximum pooling function to generate the fourth output and the adaptive average pooling function to generate the fifth output]);
concatenate the fourth output data and the fifth output data using a concatenation layer function to generate a sixth output data (Kuppa ¶0092: “For each of the parsing and encoding (embedding) modules 1002, the parsed text may be further processed through a bi-LSTM network, LSTM network, or another artificial recurrent neural network as a part of 804. The token embedding layer of the context encoder 503 takes a token as input and outputs its vector representation, given an input sequence of tokens xl . . . xn, the output vector ei (i=1 . . . n) of each token xi results from the concatenation of two different types of embeddings: token embeddings Vt(xi) and the character-based token embeddings (bi) that come from the output of a character-level and word-level bi-LSTM encoder. Features that have less contextual information but may contain out of vocabulary (OOV) tokens also pass through the token embedding layer to the joint latent space 506”—[wherein each token xi results from the concatenation of two different types of embeddings Vt(xi) and (bi) that come from the output of the LSTM network (i.e., the fourth and fifth output data) to generate the output vector ei (i.e., sixth output)]); and
process the sixth output data by a third set of layers corresponding to end-to-end connected layers of the recurrent neural network component to generate a seventh output data indicating the classification of the input dataset (Kuppa ¶0070: “A self-attention stage such as stage 708 computes a new value for each vector by comparing it with all vectors (including itself). Additionally, a multi-head transform as in stage 708 transforms an array of vectors and then applies attention to each head before performing a final transformation. In addition to attention sub-layers, in some designs, each of the layers in the encoder and decoder of the transform network 505 contains a fully connected feed-forward network 709, which is applied to each position separately and identically. Each layer of the transformer network may include a position-wise feed-forward sub-layer 709 that compares across positions of a vector array and passes input through one or more layers of neural networks before output. Residual connections may be maintained across layers or sublayers for easy passage of information through a deep stack of layers”; see also Kuppa ¶0096: “Finally, the output of combination node 1007 and combination node 1009 combined with the probabilities creates a multi-dimensional joint latent space 506 or model where attack chain description labels are mapped to CVE description labels and mitigation labels as in 808. These mapped labels are joined in a single joint latent space 506 via dot product or combination node 1008 as illustrated in FIG. 10 and FIG. 11. The resulting joint space has an independent label dimension”—[wherein the broadest reasonable interpretation (BRI) of “end-to-end” includes fully connected (see present disclosure para [0043]), and wherein each layer contains a fully connected feed-forward network (i.e., corresponding to end-to-end connected layers), and wherein the latent space 506 (i.e., seventh output data) is created (i.e., generated)]).
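For illustration of the bi-directional LSTM branch mapped above, the following is a minimal PyTorch sketch under stated assumptions (all dimensions are illustrative; the claimed components are modeled by their standard counterparts): forward and backward feedback passes produce the third output data, adaptive maximum and adaptive average pooling functions produce the fourth and fifth output data, a concatenation layer produces the sixth output data, and a third set of end-to-end connected (fully connected) layers produces the seventh output data indicating the classification:

    import torch
    import torch.nn as nn

    class BiLSTMClassifier(nn.Module):
        def __init__(self, embed_dim: int = 24, hidden: int = 32, n_classes: int = 2):
            super().__init__()
            # Forward and backward feedback layers in one bi-directional LSTM.
            self.bilstm = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)
            self.max_pool = nn.AdaptiveMaxPool1d(1)  # adaptive maximum pooling
            self.avg_pool = nn.AdaptiveAvgPool1d(1)  # adaptive average pooling
            # Third set of layers: end-to-end (fully) connected.
            self.fc = nn.Sequential(
                nn.Linear(4 * hidden, 64),
                nn.ReLU(),
                nn.Linear(64, n_classes),
            )

        def forward(self, embedded: torch.Tensor) -> torch.Tensor:
            third, _ = self.bilstm(embedded)           # (batch, seq, 2*hidden)
            third = third.transpose(1, 2)              # pool along the sequence axis
            fourth = self.max_pool(third).squeeze(-1)  # fourth output data
            fifth = self.avg_pool(third).squeeze(-1)   # fifth output data
            sixth = torch.cat([fourth, fifth], dim=1)  # concatenation layer
            return self.fc(sixth)                      # seventh output: classification

    x = torch.zeros(1, 10, 24)          # a 10-token sequence of 24-dim embeddings
    print(BiLSTMClassifier()(x).shape)  # torch.Size([1, 2])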
Regarding claim 8, Kuppa in view of Bonageri, Ba, and Shanmugamani teaches all the limitations of claim 1.
Shanmugamani teaches:
wherein the second dictionary comprises sixty-eight characters (Shanmugamani ¶0033: “Data points in a column can be transferred to an input vector space. For example, a data point can be transferred into the input vector space based on the characters in the data point. In some examples, each character can be assigned a unique number, and the unique number is transferred to the input vector space. In some implementations, a global character map that includes all characters that may occur in a data point is utilized. For example, a global character map may cover English lower case letters, English upper case letters, numbers, and special characters (e.g., %, @). Such a global character map can cover, for example, 68 characters. A data point can be transferred to an input vector based on the unique numbers that the global character map provides for the data point's characters. In some examples, the global character map can act as a dictionary and map each character to a group of characters (e.g., a word). For example, each character can be mapped to an n-gram word in the input vector space”—[(emphasis added) wherein the character map can cover 68 characters (i.e., second dictionary comprises 68 characters)]),
the second formatted dataset is ten bits long, and the set of embedding layers comprises twenty-four embedding layers (Shanmugamani ¶0042: “In some examples, the one hot encoding encodes each data point to a group of bits with a single high bit (e.g., 1), and a single low bit (e.g., 0) for other bits. In these examples, the length of the group of bits depends on the number of data point variations in a column. For example, assuming that there are only three variations of “APPLE,” “DELL,” and “OTHERS” for the data points in column 208, each of these variations is assigned to a group of bits with only one high in a unique bit location. For example, APPLE can be assigned to [0,0,1], DELL to [0,1,0], and OTHERS to [1,0,0]. In some examples, the one hot encoding is performed on every character using the global character map. For example, if there are five characters in a global character map, a data point ABC may be presented as [1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]”; see also Shanmugamani ¶0036: “In accordance with implementations of the present disclosure, the data columns are input to a neural network that includes one or more layers, to be processed and encoded. In some implementations, the process and encoding (e.g., type or number of layers) varies based on the type of column that is being processed. In some implementations, the encoding at least partially depends on the column type and partly is shared between all types of columns”—[(emphasis added) wherein the length of the group of bits (i.e., size of the formatted datasets (e.g., assigned bits)) depends on the number of data point variations (i.e., including 10 bits), and wherein the neural network has one or more layers that vary (e.g., including 24 layers) based on the type of column being processed]).
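For illustration of the embedding-based encoding recited for the RNN branch, the following is a minimal PyTorch sketch under stated assumptions: the second dictionary has sixty-eight characters per the claim (the particular character makeup is assumed), and the twenty-four embedding layers are simplified to a single 24-dimensional embedding matrix, each character's dictionary index selecting a weighted row:

    import string
    import torch
    import torch.nn as nn

    # Second dictionary: 68 characters mapped to dictionary indices
    # (the letter/digit/special-character makeup is an assumption).
    chars = string.ascii_lowercase + string.ascii_uppercase + string.digits + "%@#$&*"
    second_dictionary = {ch: i for i, ch in enumerate(chars)}
    assert len(second_dictionary) == 68

    # Embedding matrix: one learned weight vector per dictionary entry.
    embedding = nn.Embedding(num_embeddings=68, embedding_dim=24)

    def encode(data_point: str) -> torch.Tensor:
        # Each character's dictionary index selects the corresponding
        # weighted row of the embedding matrix.
        indices = torch.tensor([second_dictionary[ch] for ch in data_point])
        return embedding(indices)  # shape: (len(data_point), 24)

    print(encode("abc123").shape)  # torch.Size([6, 24])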
The same motivation that was utilized for combining Kuppa in view of Bonageri and Ba with Shanmugamani as set forth in claim 1 is equally applicable to claim 8.
Regarding independent claim 15, Kuppa teaches:
a non-transitory computer readable medium including machine readable instructions that are executable by a processor to (Kuppa ¶0029: “Further, various aspects and/or embodiments may be described in terms of sequences of actions to be performed by, for example, elements of a computing device. Those skilled in the art will recognize that various actions described herein can be performed by specific circuits (e.g., an application specific integrated circuit (ASIC)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of non-transitory computer-readable medium having stored thereon a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects described herein may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the aspects described herein, the corresponding form of any such aspects may be described herein as, for example, “logic configured to” and/or other structural components configured to perform the described action”—[emphasis added]); and
Although varying in scope, the remaining limitations of claim 15 are substantially the same as the limitations of claim 1. Thus, these limitations are rejected using the same reasoning and analysis as claim 1 above.
Regarding claims 9, 13, and 19, although varying in scope, the limitations of claims 9, 13, and 19 are substantially the same as the limitations of claims 1 and 7, respectively. Thus, claims 9, 13, and 19 are rejected using the same reasoning and analysis as claims 1 and 7 above, respectively.
Regarding claims 11–12, 14, 17–18, and 20, although varying in scope, the limitations of claims 11–12, 14, 17–18, and 20 are substantially the same as the limitations of claims 4–5 and 8, respectively. Thus, claims 11–12, 14, 17–18, and 20 are rejected using the same reasoning and analysis as claims 4–5 and 8 above, respectively.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS SHINE whose telephone number is (571)272-2512. The examiner can normally be reached M-F, 11am – 7pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached on (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/N.B.S./Examiner, Art Unit 2126
/DAVID YI/Supervisory Patent Examiner, Art Unit 2126