Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is in response to the amendments filed 11/26/2025. Claims 1, 14, and 20 have been amended; claims 1-6, 8-15, and 17-22 are currently pending.
Response to Arguments
Applicant’s arguments regarding the prior art rejection have been fully considered but they are not persuasive. Applicant asserts that “the activities (event attributes) and the global attributes (case attributes) of Nolle are not “(i) a set of event state features [which] define an event state of the event encoding data object, and (ii) a set of event attribute features [which] define one or more secondary non-state-based features of the event encoding data object”” and argues that “Nolle therefore fails to describe determining the event state features and the event attributes accordingly”. Examiner respectfully disagrees and notes that Applicant has not pointed to a definition of “event state features” or “event attribute features” in the claims or in the specification such that the broadest reasonable interpretation would exclude the event attributes and case attributes of Nolle. Applicant has not shown how the claimed event state features or secondary event attribute features differ from the event and secondary attribute modeling of Nolle. The prior art rejections have been updated to address the amended limitations and to clarify the reasoning given for the limitations that were not amended.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6, 8-9, 14-15, 17-18, and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Nolle et al.* (“DeepAlign: Alignment Based Process Anomaly Correction using Recurrent Neural Networks”, herein Nolle) in view of Camargo et al.* (“Discovering Generative Models from Event Logs”, herein Camargo), in further view of Zhu et al. (US 20220051796 A1, herein Zhu), and in further view of Mogren* (“C-RNN-GAN: Continuous Recurrent Neural Networks with Adversarial Training”, herein Mogren).
*this document was included in the IDS dated 06/08/2021; therefore an additional copy has not been provided with this action
Regarding claim 1, Nolle teaches a computer-implemented method (the abstract recites “In this paper, we propose DeepAlign, a novel approach to multi-perspective process anomaly correction, based on recurrent neural networks and bidirectional beam search. DeepAlign utilizes the case-level and event-level attributes to closely model the decisions within a process. We evaluate the performance of our approach on an elaborate data corpus of 252 realistic synthetic event logs and compare it to three state-of-the-art conformance checking methods”) comprising:
determining, by one or more processors and based on configuration data associated with an event encoding data object of an ordered sequence of event encoding data objects, (i) a set of event state features that collectively define an event state of the event encoding data object, and (ii) a set of event attribute features that define secondary non-state-based features of the event encoding data object (section 2.1 para. 2-4 recite “During the execution of a digital business process, each process step is stored in a database. This includes information on when the process step was executed (timestamp), what process step was executed (activity), and to which business case it belongs (case identifier). These three fundamental bits of event information are the basis for every process mining algorithm and are usually combined into a single data structure called event log. A log consists of cases, each of which consists of events executed within a process, and some attributes connected to the case (case attributes). Each event is defined by an activity name and its attributes”. Section 3.1 para. 2-3 recite “we propose a new neural architecture for next event prediction. It has been designed to model the sequence of activities (control-flow), the attributes connected to these activities (event attributes), and the global attributes connected to the case (case attributes). Each categorical attribute is fed through an embedding layer to map the values into a lower-dimensional embedding space. To include the case attributes, we make use of the internal state of the GRU. Instead of initializing the state with zeros (the default), we initialize it based on a representation of the case attributes. All case attributes are transformed by a case attribute network, consisting of two fully-connected layers (FC), to produce a real-valued representation of the case attributes. 
In other words, we initialize the next event prediction with a representation of the case attributes, thereby conditioning it to predict events according to these case attributes” (i.e., determining activity, or state, values, or features that define a state of a given event, and attribute values, or features that define secondary characteristics of a given event based on the representation, or configuration of the encoded data associated with conformance predictive model));
generating, by the one or more processors and by processing the event encoding data object using a state processing recurrent neural network machine learning model, a state-level [attention] weight value corresponding to an event state feature of the set of event state features, wherein the state-level attention weight value describes an inferred predictive significance of the event state features to a conformance score for the event encoding data object (section 2.1 para. 3-4 recite “A log consists of cases, each of which consists of events executed within a process, and some attributes connected to the case (case attributes). Each event is defined by an activity name and its attributes (e.g., a user who executed the event)”. Section 3 para. 1 recites “In this section we describe the DeepAlign algorithm and all its components. An overview of the algorithm is shown in Fig. 1. Two neural networks are trained to predict the next event, one reading cases from left to right (forwards), the other reading them from right to left (backwards). An extended BiBS (i.e., bidirectional beam search) is then used to transform the input case to the most probable case under the two RNN models”. Section 3.1 para. 1-2 recite “we propose a new neural architecture for next event prediction. It has been designed to model the sequence of activities (control-flow), the attributes connected to these activities (event attributes), and the global attributes connected to the case (case attributes). Figure 2 shows the architecture in detail. At the heart of the network is a Gated Recurrent Unit (GRU), a type of RNN. This GRU is iteratively fed an event, consisting of its activity and its event attributes, and must predict the corresponding next event.
Each categorical attribute is fed through an embedding layer to map the values into a lower-dimensional embedding space” (i.e., inputting a sequence of encoded objects to a recurrent neural network to generate an activity, or state, value describing an “inferred predictive significance” of features related to states/activities associated with the activity));
generating, by the one or more processors and using an attribute processing machine learning model, a hidden state vector for the event encoding data object and subsequently processing the hidden state vector based on parameters of the attribute processing machine learning model and an activation function to generate an attribute-level [attention] weight vector corresponding to an event attribute feature of the set of event attribute features; wherein the attribute-level [attention] weight value describes an inferred predictive significance of the event attribute feature value to the conformance score for the event encoding data object (section 3 para. 1 recites “In this section we describe the DeepAlign algorithm and all its components. An overview of the algorithm is shown in Fig. 1. Two neural networks are trained to predict the next event, one reading cases from left to right (forwards), the other reading them from right to left (backwards). An extended BiBS (i.e., bidirectional beam search) is then used to transform the input case to the most probable case under the two RNN models”. Section 3.1 para. 1-2 recite “we propose a new neural architecture for next event prediction. It has been designed to model the sequence of activities (control-flow), the attributes connected to these activities (event attributes), and the global attributes connected to the case (case attributes). Figure 2 shows the architecture in detail”. Section 3.2 para. 1 recites “Our goal is to utilize the two RNNs as the reference model for conformance checking and produce an alignment between log and the RNNs”. Section 3.2 para. 3 recites “the GRU output is fed into separate FC layers with Softmax activations to produce a probability distribution over all possible attributes of the next event (i.e., the prediction of the next event)”. Section 3.2 para. 3 further recites “Let further RNN(h, c) be the probability of case c under RNN, initialized with the hidden state h”. Section 3.2 para. 
5 recites “ht+1 is the hidden state of RNN after reading c[t+1:T]. An example is shown in Fig. 4”. (i.e., processing a hidden state vector of a machine learning model and using the activation function of the machine learning model to generate a series, or vector, of attributes associated with an activity to show the inferred predictive significance of the features related to the states/activities and attributes associated with those states/activities));
generating, by the one or more processors, a conformance score based at least in part on: (i) the state-level [attention] weight value for the event encoding data object and at least one state-level [attention] weight value corresponding to at least one additional event encoding data object that has a lower position value relative to the event encoding data object in accordance with the ordered sequence, and (ii) the attribute-level [attention] weight vector for the event encoding data object and at least one attribute-level [attention] weight vector for the at least one additional event encoding data object (section 2.2 para. 1 recites “In process analytics, it is desirable to relate the behavior observed in an event log to the behavior defined in a process model. This discipline is called conformance checking. The goal of conformance checking is to find an alignment between an event log and a reference process model”. Section 3.2 para. 2 recites “Our goal is to utilize the two RNNs as the reference model for conformance checking and produce an alignment between log and the RNNs”. The description of figure 3 recites “The probability of a case c = (a, b, c, d, e) is computed by the average probability of the case under both the forward and the backward RNN” (i.e., calculating an alignment, or conformance, of the current event or state data object, the states or event data objects corresponding to lower positions in the sequence, and attribute data associated with the events));
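For illustration only, the forward/backward averaging quoted above from Nolle’s Fig. 3 can be sketched as follows; this is an illustrative sketch, and the per-event probabilities below are hypothetical values, not data from the reference:

```python
import numpy as np

def conformance_score(forward_probs, backward_probs):
    """Score a case by averaging its probability under the forward and
    backward next-event models, as in the passage quoted from Nolle (Fig. 3)."""
    p_forward = float(np.prod(forward_probs))    # P(case) under forward RNN
    p_backward = float(np.prod(backward_probs))  # P(case) under backward RNN
    return (p_forward + p_backward) / 2.0

# Hypothetical per-event probabilities for a 5-event case (a, b, c, d, e);
# these are illustrative values, not data from the reference.
forward = np.array([0.9, 0.8, 0.95, 0.7, 0.85])
backward = np.array([0.88, 0.82, 0.9, 0.75, 0.8])
score = conformance_score(forward, backward)
```

A low average probability under both models indicates poor alignment (low conformance) of the case with the learned process behavior.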
and responsive to the predicting, updating, by the one or more processors, at least one parameter of the at least one state processing recurrent neural network machine learning model and the attribute processing machine learning model (section 2.3 para. 1 recites “Neural networks can be efficiently trained using a gradient descent learning procedure, minimizing the error in the prediction by tuning its internal parameters (weights)”. Section 2.5 para. 3 recites “The BiBS algorithm is iteratively applied to the original sentence, updating it with each step, until convergence, i.e., no insertions would yield a higher probability in any of the K beams” (i.e., the parameters of the machine learning models can be updated as the algorithm is iterated)).
However, while Nolle teaches a machine learning framework which processes values related to states/activities and attributes associated with those states/activities (see at least figs. 1-2), Nolle does not explicitly teach a machine learning framework which uses two machine learning models to separate the generation of a state-level [attention] weight value for the event encoding data object from the generation of an attribute-level [attention] weight vector for the event encoding data object.
Camargo teaches a machine learning framework which uses two machine learning models to separate the generation of a state-level [attention] weight value for the event encoding data object from the generation of an attribute-level [attention] weight vector for the event encoding data object (section 3 para. 6 recites “Here, we use encoding and scaling techniques depending on the data type of the event attribute (i.e., categorical, or continuous). In the case of the categorical attributes, i.e., activities and roles, they were encoded using the embedded dimensions technique (i.e., attribute-level information). This method helps us to keep the dimensionality low, which enhances the performance of the neural network. The embedded dimensions are n-dimensional spaces in which the models can map the activities and roles according to their proximity. In the case of continuous attributes, that is, start and end timestamps (i.e., state level information), they are first relativized and later scaled over a range of [0, 1]”. Section 3 para. 7 recites “In the Model Training Phase, one of three possible stacked base architectures is selected for training. The network structures vary according to whether or not they share intermediate LSTM or GRU layers, considering that sometimes sharing information can help to differentiate execution patterns. Fig. 3 presents the general structure of the defined architectures: (a) this architecture does not share any information, (b) this architecture concatenates the inputs related with activities and roles, and shares the first layer, (c) this architecture completely shares the first layer” (i.e., architectures a and b teach machine learning frameworks wherein the state-level and attribute-level information is computed by separate recurrent neural networks)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by applying the split machine learning model architecture from Camargo to compute the alignments from Nolle. Nolle and Camargo are both directed to using recurrent neural network frameworks to compute conformance of sequential data; as such, one of ordinary skill in the art would understand how to modify the framework from Nolle to use the framework from Camargo to compute conformance scores, as Camargo teaches in at least section 3 paragraph 5 that its method can be used to train other kinds of RNN architecture such as gated recurrent units (like those taught in section 3.1 of Nolle) to broaden the scope of the evaluation.
However, while the combination of Nolle and Camargo teaches generating a state-level weight value and an attribute-level weight vector (see at least section 3.1 of Nolle), the combination of Nolle and Camargo does not explicitly teach generating attention weight values from a neural network.
Zhu teaches generating attention weight values from a neural network (para. [0013] recites “the first recurrent neural network interacts with a first attention mechanism; the second recurrent neural network interacts with a second attention mechanism; the first attention mechanism computes a respective attention weight to apply to a hidden state of the first recurrent neural network corresponding to each time point in the assessment time window; the second attention mechanism computes a respective attention weight to apply to a hidden state of the second recurrent neural network corresponding to each time point in the assessment time window” (i.e., generating attention weight values from two separate recurrent neural networks)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using the attention RNN from Zhu to modify the recurrent neural networks taught by Nolle (as modified by Camargo). Zhu and Nolle are both directed to methods of monitoring for deviations from conformance, or anomalies, in sequential input data. Accordingly, one of ordinary skill in the art would understand how to combine these references such that attention weight values or vectors can be generated using attention recurrent neural networks.
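For illustration only, the per-time-point attention weighting described in the cited paragraph of Zhu can be sketched as follows; the dot-product scoring function, the array sizes, and the variable names are assumptions for illustration, as the quoted passage does not prescribe this exact formulation:

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax so the weights sum to 1."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def attention_weights(hidden_states, query):
    """One attention weight per time point, computed from the RNN hidden
    states. The dot-product scoring used here is one common formulation;
    the quoted passage of Zhu does not prescribe this exact scoring."""
    scores = hidden_states @ query   # one score per time step, shape (T,)
    return softmax(scores)

rng = np.random.default_rng(0)
T, d = 6, 8                          # assumed sizes: 6 time steps, 8 hidden units
H = rng.normal(size=(T, d))          # hidden state of the RNN at each time point
q = rng.normal(size=d)               # learned query/context vector (hypothetical)
w = attention_weights(H, q)          # attention weight per time point
```

Each weight is then applied to the corresponding hidden state, so that time points with higher inferred predictive significance contribute more to the downstream prediction.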
However, while the combination of Nolle, Camargo, and Zhu teaches predicting a conformance score (see at least sections 2.2 and 3.2 of Nolle) and adding random noise to event data (see at least section 4 para. 8 of Nolle), the combination of Nolle, Camargo, and Zhu does not explicitly teach predicting, by the one or more processors and [based at least in part on the conformance score], that the event encoding data object is an event noise data object.
Mogren teaches predicting, by the one or more processors and [based at least in part on the conformance score], that the event encoding data object is an event noise data object (section 2 para. 1 recites “The proposed model is a recurrent neural network with adversarial training. The adversaries are two different deep recurrent neural models, a generator (G) and a discriminator (D). The generator is trained to generate data that is indistinguishable from real data, while the discriminator is trained to identify the generated data. The training becomes a zero-sum game for which the Nash equilibrium is when the generator produces data that the discriminator cannot tell from real data. We define the following loss functions LD and LG:”
LD = (1/m) Σ_{i=1}^{m} [ −log D(x^(i)) − log(1 − D(G(z^(i)))) ]
LG = (1/m) Σ_{i=1}^{m} log(1 − D(G(z^(i))))
where z(i) is a sequence of uniform random vectors in [0, 1]k, and x(i) is a sequence from the training data. k is the dimensionality of the data in the random sequence. The input to each cell in G is a random vector, concatenated with the output of previous cell” (i.e., the discriminator model is used to predict whether a next event was an event noise data object generated by the generator model or a real event data object. Examiner’s Note: an “event noise data object” is interpreted as an “object that is not sampled from observed data and is manufactured based at least in part on event noise data by the generator machine learning model” based on paragraph [0100] of Applicant’s specification. Examiner notes that the broadest reasonable interpretation of “noise data” includes the sequence of uniform random vectors z(i) created as the input for the generator model G, modified by the technique from Nolle section 4 wherein random noise can be applied to event data)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by modifying the generative adversarial model from Mogren with the multi-RNN framework from Nolle (as modified by Camargo and Zhu) to further randomize the random data input to Mogren’s generator model with the random noise event data from Nolle. Nolle and Mogren are both directed to using RNNs to model sequential continuous data, and Mogren teaches that other kinds of continuous data may be modeled by their RNN-GAN framework. As the functions of the RNN framework from the combination of Nolle, Camargo, and Zhu and the RNN framework from Mogren are both known in the art, one of ordinary skill would have understood how to modify the generative adversarial model from Mogren with the ability to randomize data with event noise from Nolle.
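For illustration only, the loss functions LD and LG quoted from Mogren correspond to the standard GAN objectives, which can be sketched numerically as follows; the discriminator outputs below are hypothetical values chosen for illustration, not data from the reference:

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """Standard GAN losses in the form quoted from Mogren:
    L_D = mean over the batch of -log D(x_i) - log(1 - D(G(z_i)))
    L_G = mean over the batch of  log(1 - D(G(z_i)))"""
    L_D = float(np.mean(-np.log(d_real) - np.log(1.0 - d_fake)))
    L_G = float(np.mean(np.log(1.0 - d_fake)))
    return L_D, L_G

# Hypothetical discriminator outputs in (0, 1) for real sequences x_i and
# generated sequences G(z_i); illustrative values only.
d_real = np.array([0.9, 0.85, 0.95])
d_fake = np.array([0.2, 0.1, 0.15])
L_D, L_G = gan_losses(d_real, d_fake)
```

The discriminator drives L_D down by pushing D(x) toward 1 and D(G(z)) toward 0, i.e., by correctly identifying generated (noise) sequences; the generator in turn works to make D(G(z)) approach 1.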
Regarding claim 2, the combination of Nolle, Camargo, Zhu, and Mogren teaches the computer-implemented method of claim 1, wherein the event encoding data object comprises an event state encoding and an event attribute feature encoding characterized by one or more event attribute features (Nolle section 3.1 para. 1-2 recite “we propose a new neural architecture for next event prediction. It has been designed to model the sequence of activities (control-flow), the attributes connected to these activities (event attributes), and the global attributes connected to the case (case attributes). Figure 2 shows the architecture in detail. At the heart of the network is a Gated Recurrent Unit (GRU), a type of RNN. This GRU is iteratively fed an event, consisting of its activity and its event attributes, and must predict the corresponding next event. Each categorical attribute is fed through an embedding layer to map the values into a lower-dimensional embedding space. To include the case attributes, we make use of the internal state of the GRU. Instead of initializing the state with zeros (the default), we initialize it based on a representation of the case attributes” (i.e., a state, or an event, data object is encoded and comprises features associated with the event)).
Regarding claim 3, the combination of Nolle, Camargo, Zhu, and Mogren teaches the computer-implemented method of claim 2, wherein the attribute-level attention weight vector comprises an attribute-level attention weight value for the one or more event attribute features (Nolle section 3.1 para. 1-2 recite “we propose a new neural architecture for next event prediction. It has been designed to model the sequence of activities (control-flow), the attributes connected to these activities (event attributes), and the global attributes connected to the case (case attributes). Figure 2 shows the architecture in detail. At the heart of the network is a Gated Recurrent Unit (GRU), a type of RNN. This GRU is iteratively fed an event, consisting of its activity and its event attributes, and must predict the corresponding next event. Each categorical attribute is fed through an embedding layer to map the values into a lower-dimensional embedding space. To include the case attributes, we make use of the internal state of the GRU. Instead of initializing the state with zeros (the default), we initialize it based on a representation of the case attributes” (i.e., a collection of event attributes, or an attribute vector, data object is encoded and comprises features associated with the attributes)).
Regarding claim 4, the combination of Nolle, Camargo, Zhu, and Mogren teaches the computer-implemented method of claim 1, wherein the state processing recurrent neural network machine learning model is a long short term memory machine learning model (Camargo section 3 para. 4-5 recite “paper, we use the DeepGenerator method proposed in [2] to train generative DL models for the task of generating complete event logs. The DeepGenerator method extends previous approaches for training DL models with LSTM architectures, by including dimensionality control techniques such as the use of n-grams and embedded dimensions, as well as the exploration of random sampling following probability distribution for the category selection of the next predicted event” (i.e., a LSTM model)).
Regarding claim 5, the combination of Nolle, Camargo, Zhu, and Mogren teaches the computer-implemented method of claim 1, wherein the attribute processing machine learning model is a long short term memory machine learning model (Camargo section 3 para. 4-5 recite “paper, we use the DeepGenerator method proposed in [2] to train generative DL models for the task of generating complete event logs. The DeepGenerator method extends previous approaches for training DL models with LSTM architectures, by including dimensionality control techniques such as the use of n-grams and embedded dimensions, as well as the exploration of random sampling following probability distribution for the category selection of the next predicted event” (i.e., a LSTM model)).
Regarding claim 6, the combination of Nolle, Camargo, Zhu, and Mogren teaches the computer-implemented method of claim 1, wherein the state processing recurrent neural network machine learning model and the attribute processing machine learning model (Camargo section 3 para. 7 recites “In the Model Training Phase, one of three possible stacked base architectures is selected for training. The network structures vary according to whether or not they share intermediate LSTM or GRU layers, considering that sometimes sharing information can help to differentiate execution patterns. Fig. 3 presents the general structure of the defined architectures: (a) this architecture does not share any information, (b) this architecture concatenates the inputs related with activities and roles, and shares the first layer, (c) this architecture completely shares the first layer” (i.e., architectures a and b teach machine learning frameworks wherein the state-level and attribute-level information is computed by separate recurrent neural networks)) are part of a discriminator machine learning model, and the discriminator machine learning model is trained as part of a generative adversarial machine learning framework (Mogren section 2 para. 1 recites “The proposed model is a recurrent neural network with adversarial training. The adversaries are two different deep recurrent neural models, a generator (G) and a discriminator (D). The generator is trained to generate data that is indistinguishable from real data, while the discriminator is trained to identify the generated data” (i.e., a trained discriminator model of a generative adversarial model can be comprised of recurrent neural networks such as those taught by the combination of Nolle, Camargo, and Zhu)).
Regarding claim 8, the combination of Nolle, Camargo, Zhu, and Mogren teaches the computer-implemented method of claim 6, the computer-implemented method further comprising: identifying, by the one or more processors, a set of event noise data objects; identifying, by the one or more processors, a defined number of observed event encoding data objects based at least in part on an observed event distribution (Nolle section 4 para. 1 recites “We evaluate the DeepAlign algorithm for the task of anomaly correction. Given an event log consisting of normal and anomalous cases, an anomaly correction algorithm is expected to align each case in the event log with a correct activity sequence according to the process (without anomalies) that produced the event log”. Nolle section 4 para. 8 recites “we introduce noise to the event logs by randomly applying one of 7 anomalies to a fixed percentage of the cases in the event log”. Mogren section 2 para. 1 recites “The proposed model is a recurrent neural network with adversarial training. The adversaries are two different deep recurrent neural models, a generator (G) and a discriminator (D). The generator is trained to generate data that is indistinguishable from real data, while the discriminator is trained to identify the generated data. The training becomes a zero-sum game for which the Nash equilibrium is when the generator produces data that the discriminator cannot tell from real data. We define the following loss functions LD and LG:”
LD = (1/m) Σ_{i=1}^{m} [ −log D(x^(i)) − log(1 − D(G(z^(i)))) ]
LG = (1/m) Σ_{i=1}^{m} log(1 − D(G(z^(i))))
where z(i) is a sequence of uniform random vectors in [0, 1]k, and x(i) is a sequence from the training data. k is the dimensionality of the data in the random sequence. The input to each cell in G is a random vector, concatenated with the output of previous cell” (i.e., identifying observed event data objects and event noise data objects that have been created by the generator model (see claim 1 for interpretation of an “event noise data object”));
generating, by the one or more processors and by processing each event noise data object using the discriminator machine learning model, a set of event noise inferences; generating, by the one or more processors and by processing each observed event encoding data object using the discriminator machine learning model, a set of observed event inferences (Mogren section 2 para. 1 recites “The proposed model is a recurrent neural network with adversarial training. The adversaries are two different deep recurrent neural models, a generator (G) and a discriminator (D). The generator is trained to generate data that is indistinguishable from real data, while the discriminator is trained to identify the generated data”. Mogren section 2 para. 3 recites “The discriminator consists of a bidirectional recurrent net, allowing it to take context in both directions into account for its decisions. In this work, the recurrent network used is the Long short-term memory (LSTM)” (i.e., the discriminator from Mogren can be used to generate inferences from data such as the generator input event data objects randomized by the noise from Nolle));
generating, by the one or more processors and based at least in part on the set of event noise inferences and the set of observed event inferences, a discriminator gradient value for the discriminator machine learning model; and updating, by the one or more processors, one or more parameters of the discriminator machine learning model to maximize the discriminator gradient value (Mogren section 2 para. 1 recites “The proposed model is a recurrent neural network with adversarial training. The adversaries are two different deep recurrent neural models, a generator (G) and a discriminator (D). The generator is trained to generate data that is indistinguishable from real data, while the discriminator is trained to identify the generated data. The training becomes a zero-sum game for which the Nash equilibrium is when the generator produces data that the discriminator cannot tell from real data. We define the following loss functions LD and LG:”
LD = (1/m) Σ_{i=1}^{m} [ −log D(x^(i)) − log(1 − D(G(z^(i)))) ]
LG = (1/m) Σ_{i=1}^{m} log(1 − D(G(z^(i))))
Mogren section 3 para. 1 recites “The LSTM network in both G and D has depth , each LSTM cell has 350 internal (hidden) units. D has a bidirectional layout, while G is unidirectional. The output from each LSTM cell in D are fed into a fully connected layer with weights shared across time steps, and one sigmoid output per cell is then averaged to the final decision for the sequence”. Mogren section 3 para. 5 recites “Backpropagation through time (BPTT) and mini-batch stochastic gradient descent was used. Learning rate was set to 0.1, and we apply L2 regularization to the weights both in G and D. The model was pretrained for 6 epochs with a squared error loss for predicting the next event in the training sequence. Just as in the adversarial setting, the input to each LSTM cell is a random vector v, concatenated with the output at previous time step, v is uniformly distributed in [0, 1]k, and k was chosen to be the number of features in each tone” (i.e., performing stochastic gradient descent with the discriminator to output a gradient value and update the parameters of the discriminator for the next training epoch)).
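For illustration only, the minibatch stochastic gradient descent training quoted above from Mogren section 3 can be sketched as follows for the discriminator loss LD; the linear-logistic discriminator is a simplifying assumption (Mogren uses a bidirectional LSTM discriminator), and all values below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_loss(w, x_real, x_fake):
    """Discriminator loss L_D = mean(-log D(x) - log(1 - D(G(z)))) for a
    linear-logistic discriminator D(x) = sigmoid(w . x); a simplifying
    assumption standing in for Mogren's bidirectional LSTM."""
    return float(np.mean(-np.log(sigmoid(x_real @ w)))
                 + np.mean(-np.log(1.0 - sigmoid(x_fake @ w))))

def sgd_step(w, x_real, x_fake, lr=0.1, l2=1e-3):
    """One minibatch gradient-descent step on L_D, with L2 regularization
    of the weights as described in the quoted passage."""
    d_real = sigmoid(x_real @ w)
    d_fake = sigmoid(x_fake @ w)
    grad = (-(1.0 - d_real)[:, None] * x_real).mean(axis=0) \
         + (d_fake[:, None] * x_fake).mean(axis=0) + l2 * w
    return w - lr * grad

# Hypothetical "real" and "generated" feature vectors (illustrative only).
x_real = np.array([[1.0, 0.0], [1.0, 1.0]])
x_fake = np.array([[-1.0, 0.0], [-1.0, -1.0]])
w = np.zeros(2)
before = d_loss(w, x_real, x_fake)
w = sgd_step(w, x_real, x_fake)
after = d_loss(w, x_real, x_fake)   # the step reduces the discriminator loss
```

A generator step would analogously descend LG; alternating the two updates implements the adversarial training described in the quoted passages.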
Regarding claim 9, the combination of Nolle, Camargo, Zhu, and Mogren teaches the computer-implemented method of claim 6, wherein generating the discriminator machine learning model comprises: identifying, by the one or more processors, a set of event noise data objects (Mogren section 2 para. 1 recites “The proposed model is a recurrent neural network with adversarial training. The adversaries are two different deep recurrent neural models, a generator (G) and a discriminator (D). The generator is trained to generate data that is indistinguishable from real data, while the discriminator is trained to identify the generated data. The training becomes a zero-sum game for which the Nash equilibrium is when the generator produces data that the discriminator cannot tell from real data. We define the following loss functions LD and LG:”
[Equation image: definitions of the loss functions LD and LG from Mogren section 2]
where z(i) is a sequence of uniform random vectors in [0, 1]k, and x(i) is a sequence from the training data. k is the dimensionality of the data in the random sequence. The input to each cell in G is a random vector, concatenated with the output of previous cell”. Nolle section 4 para. 8 recites “we introduce noise to the event logs by randomly applying one of 7 anomalies to a fixed percentage of the cases in the event log” (i.e., a number of event noise data objects created by the generator model and randomized by the event noise from Nolle. See claim 1 for interpretation of “event noise data objects”));
generating, by the one or more processors and by processing each event noise data object using the discriminator machine learning model, a set of event noise inferences (Mogren section 2 para. 1 recites “The proposed model is a recurrent neural network with adversarial training. The adversaries are two different deep recurrent neural models, a generator (G) and a discriminator (D). The generator is trained to generate data that is indistinguishable from real data, while the discriminator is trained to identify the generated data”. Mogren section 2 para. 3 recites “The discriminator consists of a bidirectional recurrent net, allowing it to take context in both directions into account for its decisions. In this work, the recurrent network used is the Long short-term memory (LSTM)” (i.e., the discriminator from Mogren can be used to generate inferences from data such as the generated events created by the generator modified by the random noise data from Nolle));
generating, by the one or more processors and based at least in part on the set of event noise inferences, a generator gradient value for a corresponding generator machine learning model; and updating, by the one or more processors, one or more parameters of the corresponding generator machine learning model to minimize the generator gradient value (Mogren section 2 para. 1 recites “The proposed model is a recurrent neural network with adversarial training. The adversaries are two different deep recurrent neural models, a generator (G) and a discriminator (D). The generator is trained to generate data that is indistinguishable from real data, while the discriminator is trained to identify the generated data. The training becomes a zero-sum game for which the Nash equilibrium is when the generator produces data that the discriminator cannot tell from real data. We define the following loss functions LD and LG:”
[Equation image: definitions of the loss functions LD and LG from Mogren section 2]
Mogren section 3 para. 1 recites “The LSTM network in both G and D has depth , each LSTM cell has 350 internal (hidden) units. D has a bidirectional layout, while G is unidirectional. The output from each LSTM cell in D are fed into a fully connected layer with weights shared across time steps, and one sigmoid output per cell is then averaged to the final decision for the sequence”. Mogren section 3 para. 5 recites “Backpropagation through time (BPTT) and mini-batch stochastic gradient descent was used. Learning rate was set to 0.1, and we apply L2 regularization to the weights both in G and D. The model was pretrained for 6 epochs with a squared error loss for predicting the next event in the training sequence. Just as in the adversarial setting, the input to each LSTM cell is a random vector v, concatenated with the output at previous time step, v is uniformly distributed in [0, 1]k, and k was chosen to be the number of features in each tone” (i.e., performing stochastic gradient descent with the generator to output a gradient value and update the parameters of the generator for the next training epoch)).
Claim 14 is a system claim and its limitations are included in claim 1. The only difference is that claim 14 requires a system. Therefore, claim 14 is rejected for the same reasons as claim 1.
Claim 15 is a system claim and its limitations are included in claim 6. Claim 15 is rejected for the same reasons as claim 6.
Claim 17 is a system claim and its limitations are included in claim 8. Claim 17 is rejected for the same reasons as claim 8.
Claim 18 is a system claim and its limitations are included in claim 9. Claim 18 is rejected for the same reasons as claim 9.
Claim 20 is a non-transitory computer-readable storage medium claim and its limitations are included in claim 1. The only difference is that claim 20 requires a non-transitory computer-readable storage medium. Therefore, claim 20 is rejected for the same reasons as claim 1.
Claim 21 is a non-transitory computer-readable storage medium claim and its limitations are included in claim 6. Claim 21 is rejected for the same reasons as claim 6.
Claim 22 is a non-transitory computer-readable storage medium claim and its limitations are included in claim 9. Claim 22 is rejected for the same reasons as claim 9.
Claims 10-13 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nolle et al. (“DeepAlign: Alignment Based Process Anomaly Correction using Recurrent Neural Networks”, herein Nolle) in view of Camargo et al. (“Discovering Generative Models from Event Logs”, herein Camargo), in further view of Zhu et al. (US 20220051796 A1, herein Zhu), in further view of Mogren (“C-RNN-GAN: Continuous Recurrent Neural Networks with Adversarial Training”, herein Mogren), in further view of Scheepens et al. (US 20210201184 A1, herein Scheepens).
Regarding claim 10, the combination of Nolle, Camargo, Zhu, and Mogren teaches the computer-implemented method of claim 2.
However, the combination of Nolle, Camargo, Zhu, and Mogren does not teach determining, by the one or more processors, for a kth event attribute feature of the one or more event attribute features of a jth event encoding data object in the ordered sequence and with respect to the conformance score for the jth event encoding data object, an attribute-level in-state contribution score based at least in part on the state-level attention weight value for the jth event encoding data object, the attribute-level attention weight vector for the jth event encoding data object, one or more trained parameters, and a target value of the jth event encoding data object that corresponds to the kth event attribute feature.
Scheepens teaches determining, by the one or more processors, for a kth event attribute feature of the one or more event attribute features of a jth event encoding data object in the ordered sequence and with respect to the conformance score for the jth event encoding data object, an attribute-level in-state contribution score based at least in part on the state-level attention weight value for the jth event encoding data object, the attribute-level attention weight vector for the jth event encoding data object, one or more trained parameters, and a target value of the jth event encoding data object that corresponds to the kth event attribute feature (para. [0037] recites “Feature contributions can form part of the explanation framework to provide the user with an understanding of what leads to the prediction. Determining feature contribution involves computing and interpreting values based on one or more types of attributes encoded in the feature vector associated with the running process”. Para. [0045] recites “FIGS. 7A, 7B and 7C show views of the three attribute types, which are sub-sections of feature contributions. FIG. 7A shows case attributes 705, which in this example includes case type 706 (e.g., "high level invoice", "small invoice", etc.) and contribution 707 (e.g., time contribution) (i.e., a case attribute corresponds to a state-level attribute value). FIG. 7B shows activities 710, which in this example includes the activity type 711 (i.e., an activity type corresponds to an attribute level value), occurrences 712, and contribution 713. FIG. 7C shows event attributes 715, which in this example includes the country 716, occurrences 717, and contribution 718” (i.e., determining a contribution score for a given attribute based on state-level values, attribute-level values, and attribute features. 
Examiner’s Note: the terms “state-level” and “attribute-level” attention weight value are interpreted in light of paragraphs [0036] and [0038] of Applicant’s specification respectively as a value describing an “inferred predictive significance” of features related to states/activities and attributes associated with those states/activities)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using the feature contribution calculation method from Scheepens to determine which features contribute to the conformance prediction from Nolle (as modified by Camargo, Zhu, and Mogren). Nolle and Scheepens are both directed to process mining techniques, which include conformance checking (see at least paragraph [0002] of Scheepens). Scheepens also teaches in at least paragraph [0008] that its methods are model-agnostic and can be applied to any model framework used for process mining. As such, it would have been obvious to one of ordinary skill in the art to improve the conformance prediction from Nolle with the feature contribution calculation from Scheepens by determining which features in Nolle contribute most to the overall conformance prediction.
Regarding claim 11, the combination of Nolle, Camargo, Zhu, Mogren, and Scheepens teaches the computer-implemented method of claim 10, further comprising: generating user interface data for a prediction output user interface that depicts each attribute-level in-state contribution score for the jth event encoding data object (Scheepens para. [0045] recites “FIGS. 7A, 7B and 7C show views of the three attribute types, which are sub-sections of feature contributions. FIG. 7A shows case attributes 705, which in this example includes case type 706 (e.g., "high level invoice", "small invoice", etc.) and contribution 707 (e.g., time contribution) (i.e., a case attribute corresponds to a state-level attribute value). FIG. 7B shows activities 710, which in this example includes the activity type 711 (i.e., an activity type corresponds to an attribute level value), occurrences 712, and contribution 713. FIG. 7C shows event attributes 715, which in this example includes the country 716, occurrences 717, and contribution 718” (i.e., generating user interface data depicting at least attribute-level contribution information. Examiner’s Note: the terms “state-level” and “attribute-level” attention weight value are interpreted in light of paragraphs [0036] and [0038] of Applicant’s specification respectively as a value describing an “inferred predictive significance” of features related to states/activities and attributes associated with those states/activities)).
Regarding claim 12, the combination of Nolle, Camargo, Zhu, and Mogren teaches the computer-implemented method of claim 1.
However, the combination of Nolle, Camargo, Zhu, and Mogren does not teach generating user interface data for a prediction output user interface that depicts the conformance score for each event encoding data object in the ordered sequence.
Scheepens teaches generating user interface data for a prediction output user interface that depicts the conformance score for each event encoding data object in the ordered sequence (para. [0046] recites “FIG. 8 shows a screenshot of an integrated dashboard 800 with tabs for overview dashboard 805, contributions dashboard 830 and training data dashboard 850. The training data dashboard 850 is configured to show an overview of the training data. In this non-limiting example, training data dashboard 850 includes dashboard item 851 that shows a view of the prediction of the selected case and its trustworthiness (e.g., "high").” (i.e., generating user interface data depicting information related to whether the prediction is trustworthy, which falls under the broadest reasonable interpretation of determining whether a prediction meets a conformance standard)).
See claim 10 for motivation to combine.
Regarding claim 13, the combination of Nolle, Camargo, Zhu, and Mogren teaches the computer-implemented method of claim 1.
However, the combination of Nolle, Camargo, Zhu, and Mogren does not teach generating user interface data for a prediction output user interface that depicts the state-level attention weight value for each event encoding data object in the ordered sequence.
Scheepens teaches generating user interface data for a prediction output user interface that depicts the state-level attention weight value for each event encoding data object in the ordered sequence (para. [0045] recites “FIGS. 7A, 7B and 7C show views of the three attribute types, which are sub-sections of feature contributions. FIG. 7A shows case attributes 705, which in this example includes case type 706 (e.g., "high level invoice", "small invoice", etc.) and contribution 707 (e.g., time contribution) (i.e., a case attribute corresponds to a state-level attribute value). FIG. 7B shows activities 710, which in this example includes the activity type 711 (i.e., an activity type corresponds to an attribute level value), occurrences 712, and contribution 713. FIG. 7C shows event attributes 715, which in this example includes the country 716, occurrences 717, and contribution 718” (i.e., generating user interface data depicting at least state-level contribution information. Examiner’s Note: the terms “state-level” and “attribute-level” attention weight value are interpreted in light of paragraphs [0036] and [0038] of Applicant’s specification respectively as a value describing an “inferred predictive significance” of features related to states/activities and attributes associated with those states/activities)).
See claim 10 for motivation to combine.
Claim 19 is a system claim and its limitation is included in claim 10. Claim 19 is rejected for the same reasons as claim 10.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20230061808 A1 (Gillian) teaches a method for monitoring an activity using a machine learning model, wherein events are associated with interactive objects associated with configuration data.
US 11941531 B1 (Arik et al) teaches a method for determining respective attention weights for input data elements using a neural network to predict a characterization of a given input data element.
“Evaluating Conformance Measures in Process Mining using Conformance Propositions” (Syring et al) teaches methods for formulating conformance propositions and using the formulated propositions to evaluate current and existing conformance measures.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEAH M FEITL whose telephone number is (571) 272-8350. The examiner can normally be reached on M-F 0900-1700 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached on (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/L.M.F./ Examiner, Art Unit 2147
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147