Prosecution Insights
Last updated: April 19, 2026
Application No. 17/746,457

CORRELATION MODEL INTERPRETER USING TEACHER-STUDENT MODELS

Status: Final Rejection (§ 103)
Filed: May 17, 2022
Examiner: BRACERO, ANDREW ANGEL
Art Unit: 2126
Tech Center: 2100 — Computer Architecture & Software
Assignee: Microsoft Technology Licensing, LLC
OA Round: 2 (Final)
Grant Probability: 100% (Favorable)
Projected OA Rounds: 3-4
Projected Time to Grant: 3y 3m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 100%, above average (5 granted / 5 resolved; +45.0% vs TC avg)
Interview Lift: +0.0% (minimal lift, with vs. without interview, across resolved cases with interview)
Typical Timeline: 3y 3m average prosecution; 26 applications currently pending
Career History: 31 total applications across all art units

Statute-Specific Performance

§101: 34.9% (-5.1% vs TC avg)
§103: 44.0% (+4.0% vs TC avg)
§102: 9.6% (-30.4% vs TC avg)
§112: 10.5% (-29.5% vs TC avg)
Deltas are measured against the Tech Center average estimate • Based on career data from 5 resolved cases
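The statute-specific figures above are simple signed differences against a Tech Center baseline. A quick sketch, assuming a hypothetical flat TC-average baseline of 40.0% (inferred from the listed deltas, not taken from USPTO data), reproduces the deltas shown:

```python
# Hypothetical sketch: recompute the "vs TC avg" deltas shown above.
# The TC-average baseline (40.0% per statute) is an assumption inferred
# from the listed deltas, not a figure from USPTO data.
examiner = {"101": 34.9, "103": 44.0, "102": 9.6, "112": 10.5}
tc_avg = {"101": 40.0, "103": 40.0, "102": 40.0, "112": 40.0}

def delta_vs_tc(section: str) -> float:
    """Signed difference between the examiner's rate and the TC baseline."""
    return round(examiner[section] - tc_avg[section], 1)

for s in ("101", "103", "102", "112"):
    d = delta_vs_tc(s)
    sign = "+" if d >= 0 else ""
    print(f"§{s}: {examiner[s]}% ({sign}{d}% vs TC avg)")
```

With that single assumed baseline, all four printed deltas match the table above, which suggests the page computes each delta the same way.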

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claims 1-20 are presented for examination in this application (17/746,457) filed on May 17, 2022. The Examiner cites particular sections in the references as applied to the claims below for the convenience of the applicant(s). Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant(s) fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.

Response to Arguments

Applicant's arguments and remarks filed 2025-09-22 have been fully considered. The arguments and remarks regarding the 35 U.S.C. 101 rejections were found to be persuasive. The arguments and remarks regarding the 35 U.S.C. 103 rejections were found to be persuasive; however, the amendments have necessitated a change in the references applied. The 35 U.S.C. 103 rejections have been maintained via a new ground of rejection.

35 U.S.C. 103

Applicant's response: Applicant respectfully traverses the § 103 rejections because the Examiner failed to state a prima facie case of obviousness and/or the current amendments to the claims now render the Examiner's arguments moot. The Examiner agrees that Chan fails to teach or suggest all the claimed limitations, but, as discussed below, a combination of Chen-1 and Chen-2 fails to compensate for the deficiencies of Chan. As detailed above, claim 1 is allowable over Chan in view of Chen-1.
Accordingly, Chan focuses on improving the accuracy of decisions made by the respective surrogate models by training those surrogate models on training data that were previously used to train the machine learning model. Chan does not, and does not need to, teach creating, based on the feature as identified based on a first score of the interpreter model, additional training data for training the correlation model. Nor does Chan teach training the correlation model using the additional training data. McNutt generally relates to estimating a relative binding free energy (RBFE) using a Siamese convolutional neural network as a model. Specifically, McNutt describes performing testing of the model on the two permutations between the ligands in the training set and ligands that the model has not seen. If one skilled in the art were to modify the surrogate models of Chan to combine with the teachings of Chen-1, Chen-2, and McNutt, the result would not, and would not need to, teach creating, based on the feature as identified based on a first score of the interpreter model, additional training data for training the correlation model. Nor would the result describe training the correlation model using the additional training data. The result would not describe the limitations of claim 1 detailed above. As such, claim 1 is allowable over Chan in view of Chen-1, in view of Chen-2, and further in view of McNutt.

Examiner's response: The Examiner respectfully disagrees. In regard to the assertion that there is not a prima facie case of obviousness from the combination of Chan and Chen-2, the Examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). In this case, the combination of Chan and Chen-2 to include limitations of having multi-dimensional vector representations, wherein the number is based on a number of features associated with the set of data, was found to be combined in order to increase the likelihood of incident links. In regard to the assertions that neither Chan, Chen-1, Chen-2, nor McNutt teaches creating, based on the feature as identified based on a first score of the interpreter model, additional training data for training the correlation model, the arguments have been fully considered but are considered moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections

Claim 15 is objected to because of the following informality: this claim depends on claim 12, which was cancelled. Appropriate correction should be made. Claim 17 is objected to for the following informalities: a typo appears to be present in the limitations “the first pair of sets incident data” and “correlation model predicts a correlation between first set of the incident and the second set of the incident data”. A potential correction could be to add “of” between “sets” and “incident”, as well as to add “data” after “incident” in the latter limitation. Appropriate corrections should be made.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6, 9-11, 13-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chan (US20190325335, hereinafter referred to as Chan) in view of Wong et al. (US20230186052A1, hereinafter referred to as Wong).
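The teacher-student arrangement at the heart of the claims, a simple interpreter (student) model fit to mimic and explain a complex correlation (teacher) model, mirrors the global-surrogate technique described in the Chan citations below. A minimal sketch of that idea follows; the teacher function, feature count, and importance readout are invented for illustration and come from neither reference:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "teacher": an opaque nonlinear function standing in for a
# trained black-box correlation model (e.g., a complex network).
def teacher(X: np.ndarray) -> np.ndarray:
    return np.tanh(2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * X[:, 2])

# Student/interpreter: a linear surrogate fit to the teacher's outputs on
# the same training data (the global-surrogate idea in Chan's citations).
X = rng.normal(size=(500, 3))
y_teacher = teacher(X)

# Least-squares fit of y_teacher ≈ X @ w + b.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y_teacher, rcond=None)
w, b = coef[:3], coef[3]

# Global feature importance read off the surrogate's coefficients:
# larger |w| means more influence on the teacher's predictions.
importance = np.abs(w)
print("feature importances:", importance)
print("most influential feature index:", int(importance.argmax()))
```

Reading global importance off the surrogate's coefficients is the same move the claimed method makes with the interpreter model's first score: it indicates which feature to emphasize when creating additional training data.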
Regarding claim 1:

Chan teaches a computer-implemented method, the method comprising (see claim 1: “A method, comprising:”), … training an interpreter model using the training data, wherein the interpreter model interprets a behavior of a correlation model trained based on the training data (see [0033]: “Surrogate model 112 may receive, from complex model server 102 via network 105, training data 106, model prediction data 107, and actual outcome data 108 and store as training data 116, model prediction data 117, and actual outcome data 118, respectively. Using training data 116, model prediction data 117, and actual outcome data 118, surrogate model server 122 may train one or more surrogate models to make one or more predictions.”. Also, see [0034]: “Linear surrogate model 114 may be a K-LIME surrogate model. With K-LIME, local generalized linear model (GLM) surrogates are used to explain the predictions of complex response functions, and local regions are defined by K clusters or user-defined segments instead of simulated, perturbed observation samples.”), wherein the correlation model predicts a correlation between the first set of data and the second set of data in the first pair of sets of data (see [0070]: “In some embodiments, the machine learning model prediction correlates with the actual outcome data.
For example, a point 308 on line 301 indicates a particular outcome is likely to happen (prediction label ≈0.75) and black dot 307 indicates that the particular outcome actually occurred.”), Chan does not teach retrieving, a first pair of sets of data, wherein the first pair of sets of data comprises a first set of data and a second set of data, and the first set of data comprises a feature and a value of the feature, wherein the training data further includes a ground-truth correlation between a first set of data and a second set of data in the first set of data and the second set of data, creating, based on the identified feature, additional training data with the emphasis on the identified feature for further training the correlation model, generating, by the interpreter model, a first score as a global feature importance score and a second score as a local importance score, wherein the global feature importance score enables identifying a feature incident ticket as the sets of data to emphasize, and the local feature importance feature enables comparing important features between a pair of incident tickets, and training the correlation model using the additional training data, thereby the correlation after trained predicts a correlation between two input data according to the emphasis in the identified feature with accuracy. Wong, however, analogously teaches retrieving, a first pair of sets of data, wherein the first pair of sets of data comprises a first set of data and a second set of data, and the first set of data comprises a feature and a value of the feature (see para [0004]: “A first training set of linked pairs of incident management tickets is generated, where each linked pair of the first training set is labeled as being linked and comprises: a first ticket having a first text feature and a second feature, and a second ticket having a first text feature and a second feature. 
”) generating, based on the first pair of sets of data, training data wherein the training data further includes a ground-truth correlation between a first set of data and a second set of data in the first set of data and the second set of data (see para [0004]: “The Siamese neural network model is trained using the first input embeddings, the second input embeddings, the second feature of the first ticket, and the second feature of the second ticket as inputs to an output layer of the Siamese neural network model. The output layer is configured to generate first output embeddings for the first ticket and second output embeddings for the second ticket. The Siamese neural network model is trained using a contrastive loss function between the first output embeddings for the first ticket and the second output embeddings for the second ticket.”) generating, by the interpreter model, a first score as a global feature importance score and a second score as a local importance score, wherein the global feature importance score enables identifying a feature incident ticket as the sets of data to emphasize, and the local feature importance feature enables comparing important features between a pair of incident tickets (see para [0033]: “In some examples, the Siamese neural network model 205 is trained through contrastive loss to learn relationships between ticket pairs’ labels (e.g., related or unrelated) and a plurality of text features. Generally, the trained embeddings 252 and 254 for each ticket pair are used to calculate a Euclidean distance and pairs that are linked have embeddings close in Euclidean distance, while unlinked pairs are farther apart.
In other examples, the Siamese neural network model 205 is trained using cosine embedding loss or other suitable loss functions”) identifying, based on the first score and the second score the feature as an emphasis for further training the correlation model (see para [0044]: “The method 400 may further include training the Siamese neural network model using the first text feature of the third ticket and the first text feature of the fourth ticket as inputs to the input layer of the Siamese neural network model, the input layer being configured to generate first input embeddings for the third ticket and second input embeddings for the fourth ticket; training the Siamese neural network model using the first input embeddings, the second input embeddings, the second feature of the third ticket, and the second feature of the fourth ticket as inputs to the output layer of the Siamese neural network model, the output layer being configured to generate first output embeddings for the third ticket and second output embeddings for the fourth ticket; and training the Siamese neural network model using a contrastive loss function between the first output embeddings for the third ticket and the second output embeddings for the fourth ticket.”) creating, based on the identified feature, additional training data with the emphasis on the identified feature for further training the correlation model (see para [0039]: “At step 402, a first training set of linked pairs of incident management tickets is generated. In some examples, each linked pair of the first training set is labeled as being linked and comprises a first ticket having a first text feature and a second feature, and a second ticket having a first text feature and a second feature. In some examples, the ticket generator 114 generates the first training set of linked pairs. 
In various examples, the first training set of linked pairs may include the first ticket 302 and the second ticket 304, and/or the first ticket 202 and the second ticket 204.”) training the correlation model using the additional training data, thereby the correlation after trained predicts a correlation between two input data according to the emphasis in the identified feature with accuracy (see para [0024]: “For example, the source tickets 164 include various groups of two or more tickets that have been labeled as being linked, and the neural network model 162 is trained to identify similar links between tickets. In some aspects, the neural network model 162 is also configured to determine a confidence level of the identified links (e.g., 95% confident).”. Also see para [0017]: “In examples described herein, a Siamese neural network model is utilized by an incident processor to predict whether pairs or groups of tickets are linked to one another.”)

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan and Wong before them, to work with pairs of incident training data to predict correlation in order to discern when several tickets are related, thereby improving the diagnosing of problems and the managing of tickets (see para [0001]: “For example, tickets may be generated by various computing devices or network management personnel and open tickets may then be reviewed by on-call engineers or site reliability engineers. However, management of tickets becomes challenging when the computing devices are spread out across a large geographical area and encompass many different business groups or sub-groups, at least due to a number of tickets that may be generated, their frequency of generation, etc.
Although a trained and experienced engineer may be able to discern when several tickets are related to each other and more easily diagnose a problem, sifting through large numbers of tickets still requires a large pool of engineers for managing cloud or distributed computing systems.”)

Regarding claim 2 (currently amended):

Chan in view of Wong teaches the method of claim 1. Chan further teaches causing, based on the second score, interactive displaying of one or more features associated with the first (see [0054]: “Application 124 is configured to display via graphical user interface 126, one or more graphs depicting the linear surrogate model 114 and at least one of the one or more non-linear surrogate models 115.”). Chan does not teach wherein the set of data includes incident data, generating, based at least on a first incident data of a received second pair of incident data, embeddings associated with the first incident data of the received second pair of incident data using the interpreter model, or generating a second score based at least on the embeddings associated with the first incident data of the received second pair of incident data. Wong, however, analogously teaches wherein the set of data includes incident data (see para [0005]: “In another aspect, a method for generating link information is provided. A plurality of incident management tickets are received. Each of the plurality of incident management tickets has a first text feature and a second feature.
Pairs of tickets within the plurality of incident management tickets that are linked are identified, comprising: selecting a first candidate ticket and a second candidate ticket from the plurality of incident management tickets;”), generating, based at least on a first incident data of a received second pair of incident data, embeddings associated with the first incident data of the received second pair of incident data using the interpreter model (see para [0004]: “A Siamese neural network model is trained using the first text feature of the first ticket and the first text feature of the second ticket as inputs to an input layer of the Siamese neural network model. The input layer is configured to generate first input embeddings for the first ticket and second input embeddings for the second ticket. The Siamese neural network model is trained using the first input embeddings, the second input embeddings, the second feature of the first ticket, and the second feature of the second ticket as inputs to an output layer of the Siamese neural network model.”), generating the second score based at least on the embeddings associated with the first incident data of the received second pair of incident data (see claim 8: “converting the tokenized first text feature of the first ticket and the tokenized second text feature of the second ticket to respective integer indexes”.).

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan and Wong before them, to add attributes of incident data and to generate scores based on embeddings in order to find duplicate tickets, responsible tickets, and/or related tickets (see Wong at claim 9: “wherein the first ticket and the second ticket are linked as one of duplicate tickets, responsible tickets, and/or related tickets.”).

Regarding claim 3:

Chan in view of Wong teaches the method of claim 1.
Chan further teaches wherein the correlation model represents a teacher of a teacher-student model (see [0025]: “FIG. 1 is a block diagram illustrating an embodiment of a system for machine learning model interpretation. In the example shown, system 100 includes a complex model server 102, a network 105, a surrogate model server 112, and a client device 122.”. Also, see [0026]: “Complex model server 102 includes a machine learning model 104, training data 106, model prediction data 107, and actual outcome data 108. ”. Also see, [0027]: “Machine learning model 104 is configured to implement one or more machine learning algorithms (e.g., decision trees, naïve Bayes classification, least squares regression, logistic regression, support vector machines, neural networks, deep learning, etc.”.) [Examiner note: the correlation model represents a teacher model or complex black-box model, such as a Siamese network, as per [0003] of the instant case’s specification: “…correlation model represents a teacher model or a complex black-box model. … An example of the correlation model includes a Siamese network”. The interpreter model represents a student model or simpler, glass-box model such as a linear model as per the instant case’s specification at [0003]: “…interpreter model represents a student model or a simpler, glass-box model.”. Additionally, see [0059] of the instant case’s specification: “Examples of the interpreter model may use Random Forest, Gradient Boosting Regressor, a linear model, and the like.”], wherein the interpreter model represents a student of the teacher-student model (see [0025]: “FIG. 1 is a block diagram illustrating an embodiment of a system for machine learning model interpretation. In the example shown, system 100 includes a complex model server 102, a network 105, a surrogate model server 112, and a client device 122.”. 
Also, see [0032]: “Surrogate model server 112 includes a linear surrogate model 114, one or more surrogate non-linear models 115, training data 116, model prediction data 117, and actual outcome data 118. ”. Also, see [0033]: “Surrogate model server 112 is configured to implement one or more surrogate models. A surrogate model is a data mining and engineering technique in which a generally simpler model is used to explain another usually more complex model or phenomenon.”.), wherein the behavior of the interpreter model includes inferring the behavior of the correlation model (see [0033]: “Surrogate model server 112 is configured to implement one or more surrogate models. A surrogate model is a data mining and engineering technique in which a generally simpler model is used to explain another usually more complex model or phenomenon.”.). Chan does not explicitly teach wherein the correlation model includes a Siamese Network including a plurality of neural networks to generate embeddings associated with the second pair of data. Wong, however, teaches wherein the correlation model includes a Siamese Network including a plurality of neural networks to generate embeddings associated with the second pair of data (see abstract: “A Siamese neural network model is trained using the first text features as inputs to an input layer of the model. The input layer is configured to generate first and second input embeddings for the first and second tickets, respectively”.) 
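The twin-encoder arrangement cited from Wong above (two tickets passed through one shared encoder, with linked pairs pulled together and unlinked pairs pushed apart by a contrastive loss on Euclidean distance) can be sketched as below. This is a generic illustration of the technique, not Wong's implementation; the toy encoder weights and ticket vectors are invented stand-ins for Wong's trained Siamese neural network model 205:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 4))  # shared encoder weights (toy stand-in)

def encode(ticket_features: np.ndarray) -> np.ndarray:
    """Shared ('Siamese') encoder: both tickets pass through the same weights."""
    return np.tanh(ticket_features @ W)

def contrastive_loss(e1, e2, linked: int, margin: float = 1.0) -> float:
    """Linked pairs are pulled together; unlinked pairs pushed past the margin."""
    d = float(np.linalg.norm(e1 - e2))
    if linked:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

ticket_a = rng.normal(size=8)
ticket_b = ticket_a + 0.05 * rng.normal(size=8)  # near-duplicate ticket
ticket_c = rng.normal(size=8)                    # unrelated ticket

ea, eb, ec = encode(ticket_a), encode(ticket_b), encode(ticket_c)
# A near-duplicate pair should sit closer in embedding space than an
# unrelated pair, which is what distance-based ticket linking relies on.
print("linked distance:", round(float(np.linalg.norm(ea - eb)), 3))
print("unlinked distance:", round(float(np.linalg.norm(ea - ec)), 3))
```

The Euclidean-distance comparison between the two trained embeddings is the mechanism the Wong citation at para [0033] describes for deciding whether a ticket pair is linked.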
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan and Wong before them, to include attributes wherein the correlation model includes a Siamese Network including a plurality of neural networks to generate embeddings associated with the second pair of data, in order to discern when several tickets are related and thereby improve the diagnosing of problems and the managing of tickets (see Wong at para [0017]: “In examples described herein, a Siamese neural network model is utilized by an incident processor to predict whether pairs or groups of tickets are linked to one another.”)

Regarding claim 13:

Claim 13, inter alia, recites analogous limitations to claim 3. Therefore, claim 13 is rejected on the same grounds as claim 3.

Regarding claim 4:

Chan in view of Wong teaches the method of claim 1. Chan does not explicitly teach wherein the set of data includes incident data. Wong, however, analogously teaches wherein the set of data includes incident data: an identifier of the incident, a title (see para [0026]: “The source tickets 164 may include a plurality of text features or text strings, such as a title,”), a severity level of the incident, a status of the incident, a topology of a system associated with the incident (see [0026]: “While five text features are described, the tickets (e.g., source tickets 164) may have additional text features (e.g., a user-entered description) and non-text features (e.g., timestamps, IP addresses, network topology features, Dag, Machine, Forest, Rack, Cluster, or other suitable metadata), in various examples.”), or a timestamp associated with occurrence of the incident (see [0026]: “While five text features are described, the tickets (e.g., source tickets 164) may have additional text features (e.g., a user-entered description) and non-text features (e.g., timestamps, IP addresses, network topology features, Dag, Machine, Forest, Rack, Cluster, or other suitable
metadata), in various examples.”).

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan and Wong before them, to include attributes of incident data information in order to create embeddings to compare for linking incident tickets (see Wong at para [0027]: “In some examples, the text features are tokenized and indexed to create embeddings.”. Also see Wong at para [0028]: “FIG. 2 depicts an example of an incident processor 200 for linking incident management tickets, according to an aspect of the disclosure. The incident processor 200 generally corresponds to the incident processor 112 and/or 122, in some examples. The incident processor 200 includes a Siamese neural network model 205 that is configured to process pairs of tickets, such as ticket 202 and ticket 204, to generate respective trained embeddings 252 (corresponding to ticket 202) and 254 (corresponding to ticket 204).”).

Regarding claims 14 and 20:

Claims 14 and 20 recite analogous limitations to claim 4 and are therefore rejected on the same grounds as claim 4.

Regarding claim 5:

Chan in view of Wong teaches the method of claim 1. Chan further teaches wherein the first score represents a global feature importance score (see [0041]: “The one or more surrogate non-linear models 115 may include a feature importance model, decision tree model, a partial dependence plot, and/or any other non-linear models.”. Also, see [0042]: “A feature may have a global feature importance and a local feature importance.”), wherein the global feature importance indicates a degree of influence of the feature relative to other features in a plurality of sets of data (see [0042]: “Global feature importance measures the overall impact of an input feature on the model predictions while taking nonlinearity and interactions into considerations.
Global feature importance values give an indication of the magnitude of a feature's contribution to model predictions for all observations”). Chan does not teach wherein the correlation model includes a Siamese Network including a plurality of neural networks to generate embeddings associated with the second pair of data. Wong, however, analogously teaches wherein the correlation model includes a Siamese Network including a plurality of neural networks to generate embeddings associated with the second pair of data (see section I “Introduction”: “Siamese networks (SN for short in the sequel) are widely used in similarity metric learning [16] and contrastive learning [4] where objects are compared. Different from conventional architectures that take one input instance, an SN maps a pair of instances (the “query” and the “reference”) to a similarity score”).

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan and Wong before him or her, to modify the method of claim 5 to include attributes of a Siamese network as disclosed by Wong in order to compare objects, especially when labeled data are scarce and imbalanced, a key attribute of Siamese networks (see Abstract: “Learning to compare two objects are essential in applications, especially when labeled data are scarce and imbalanced. As these applications can involve humans and make high-stake decisions, it is critical to explain the learned models. We aim to study post-hoc explanations of Siamese networks (SN) widely used in learning to compare. We characterize the instability of gradient-based explanations due to the additional compared object in SN, in contrast to architectures with a single input instance. We optimize for global invariance based on unlabeled data using self-learning to promote the stability of local explanations for individual input.
The invariance leads to constrained optimization problems that can be solved using gradient descent-ascent (GDA), or KL-divergence regularized unconstrained optimization solved by SGD. We provide convergence proofs when the objective functions are nonconvex due to the Siamese architecture.”.)

Regarding claim 15:

Claim 15, inter alia, recites analogous limitations to claim 5. Therefore, claim 15 is rejected on the same grounds as claim 5.

Regarding claim 6:

Chan in view of Wong teaches the method of claim 2. Chan further teaches the second score represents a local feature importance score (see [0041]: “The one or more surrogate non-linear models 115 may include a feature importance model, decision tree model, a partial dependence plot, and/or any other non-linear models.”. Also, see [0042]: “A feature may have a global feature importance and a local feature importance.”), wherein the local feature importance score indicates a degree of influence of a combination of the feature and a word appearing in the value associated with the feature relative to other features in a plurality of sets of data (see [0042]: “Local feature importance describes how the combination of the learned model rules or parameters and an individual observation's attributes affect a model's prediction for that observation while taking nonlinearity and interactions into effect”).

Regarding claim 16:

Claim 16, inter alia, recites analogous limitations to claim 6. Therefore, claim 16 is rejected on the same grounds as claim 6.

Regarding claim 9:

Chan in view of Wong teaches the method of claim 2. Chan does not explicitly teach wherein the embeddings represent Siamese embeddings.
Wong, however, analogously teaches wherein the embeddings represent Siamese embeddings (see [0005]: “Pairs of tickets within the plurality of incident management tickets that are linked are identified, comprising: selecting a first candidate ticket and a second candidate ticket from the plurality of incident management tickets; providing the first text feature of the first candidate ticket and the first text feature of the second candidate ticket to an input layer of a Siamese neural network model, the input layer being configured to generate first input embeddings for the first candidate ticket and second input embeddings for the second candidate ticket”.).

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan and Wong before them, to include attributes of Siamese embeddings in order to create embeddings to compare for linking incident tickets (see Wong at para [0027]: “In some examples, the text features are tokenized and indexed to create embeddings.”. Also see Wong at para [0028]: “FIG. 2 depicts an example of an incident processor 200 for linking incident management tickets, according to an aspect of the disclosure. The incident processor 200 generally corresponds to the incident processor 112 and/or 122, in some examples. The incident processor 200 includes a Siamese neural network model 205 that is configured to process pairs of tickets, such as ticket 202 and ticket 204, to generate respective trained embeddings 252 (corresponding to ticket 202) and 254 (corresponding to ticket 204).”).

Regarding claim 10:

Chan in view of Wong teaches the method of claim 1.
Chan further teaches wherein the interpreter model includes one of: Random Forest, Gradient Boosting Regressor, or a linear model (see [0032]: “Surrogate model server 112 includes a linear surrogate model 114, one or more surrogate non-linear models 115, training data 116, model prediction data 117, and actual outcome data 118.”). Regarding claim 19: Claim 19, inter alia, recites analogous limitations to claim 10. Therefore, claim 19 is rejected on the same grounds as claim 10. Regarding claim 11: Chan teaches a system comprising (see [0017]: “The invention can be implemented in numerous ways, including as a process; an apparatus; a system;”): a processor (see [0017]: “and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor”); and a memory storing computer-executable instructions that when executed by the processor cause the system to execute a method comprising (see [0017]: “a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor.”): … training a correlation model using the training data (see fig. 12. Also, see [0127]: “At 1212, the linear and/or nonlinear model(s) are retrained. In some embodiments, the linear and/or non-linear surrogate models are retrained in the event a threshold number of entries are flagged. An entry may be flagged in the event a prediction label associated with a linear surrogate model does not correlate with a prediction label associated with a non-linear surrogate mode.”); wherein the correlation model predicts a correlation between the first set of data and the second set of data in the first pair of sets of data (see [0070]: “In some embodiments, the machine learning model prediction correlates with the actual outcome data.
For example, a point 308 on line 301 indicates a particular outcome is likely to happen (prediction label ≈0.75) and black dot 307 indicates that the particular outcome actually occurred.”), training an interpreter model using the training data, wherein the interpreter model interprets a behavior of the correlation model based on the training data (see [0033]: “Surrogate model 112 may receive, from complex model server 102 via network 105, training data 106, model prediction data 107, and actual outcome data 108 and store as training data 116, model prediction data 117, and actual outcome data 118, respectively. Using training data 116, model prediction data 117, and actual outcome data 118, surrogate model server 122 may train one or more surrogate models to make one or more predictions.”. Also, see [0034]: “Linear surrogate model 114 may be a K-LIME surrogate model. With K-LIME, local generalized linear model (GLM) surrogates are used to explain the predictions of complex response functions, and local regions are defined by K clusters or user-defined segments instead of simulated, perturbed observation samples.”); Chan does not teach retrieving, a first pair of sets of data, wherein the first pair of sets of incident data comprises a first set of incident data and a second set of the incident data, and the first set of the incident data comprises an incident ticket, wherein the set of incident data includes a feature with a value of the feature, generating, based on the first pair of sets of incident data, training data, wherein the training data further includes a ground-truth correlation between the first set of incident data and the second set of incident data, creating, based on the identified feature, additional training data with the emphasis on the identified feature for further training the correlation model, generating, by the interpreter model, a first score as a global feature importance score and a second score as a local importance score, wherein
the global feature importance score enables identifying a feature incident ticket as the sets of data to emphasize, and the local feature importance score enables comparing important features between a pair of incident tickets, and training the correlation model using the additional training data, thereby the correlation model, once trained, predicts a correlation between two input data according to the emphasis in the identified feature with accuracy. Wong, however, analogously teaches retrieving, a first pair of sets of data, wherein the first pair of sets of data comprises a first set of data and a second set of data, and the first set of data comprises a feature and a value of the feature (see para [0004]: “A first training set of linked pairs of incident management tickets is generated, where each linked pair of the first training set is labeled as being linked and comprises: a first ticket having a first text feature and a second feature, and a second ticket having a first text feature and a second feature.”)… generating, based on the first pair of sets of data, training data wherein the training data further includes a ground-truth correlation between a first set of data and a second set of data in the first set of data and the second set of data (see para [0004]: “The Siamese neural network model is trained using the first input embeddings, the second input embeddings, the second feature of the first ticket, and the second feature of the second ticket as inputs to an output layer of the Siamese neural network model. The output layer is configured to generate first output embeddings for the first ticket and second output embeddings for the second ticket.
The Siamese neural network model is trained using a contrastive loss function between the first output embeddings for the first ticket and the second output embeddings for the second ticket.”) generating, (see para [0033]: “In some examples, the Siamese neural network model 205 is trained through contrastive loss to learn relationships between ticket pairs’ labels (e.g., related or unrelated) and a plurality of text features. Generally, the trained embeddings 252 and 254 for each ticket pair are used to calculate a Euclidean distance and pairs that are linked have embeddings close in Euclidean distance, while unlinked pairs are farther apart. In other examples, the Siamese neural network model 205 is trained using cosine embedding loss or other suitable loss functions”) identifying, based on the first score and the second score, the feature as an emphasis for further training the correlation model (see para [0044]: “The method 400 may further include training the Siamese neural network model using the first text feature of the third ticket and the first text feature of the fourth ticket as inputs to the input layer of the Siamese neural network model, the input layer being configured to generate first input embeddings for the third ticket and second input embeddings for the fourth ticket; training the Siamese neural network model using the first input embeddings, the second input embeddings, the second feature of the third ticket, and the second feature of the fourth ticket as inputs to the output layer of the Siamese neural network model, the output layer being configured to generate first output embeddings for the third ticket and second output embeddings for the fourth ticket; and training the Siamese neural network model using a contrastive loss function between the first output embeddings for the third ticket and the second output embeddings for the fourth ticket.”) creating, based on the identified feature, additional training data with the emphasis on the
identified feature for further training the correlation model (see para [0039]: “At step 402, a first training set of linked pairs of incident management tickets is generated. In some examples, each linked pair of the first training set is labeled as being linked and comprises a first ticket having a first text feature and a second feature, and a second ticket having a first text feature and a second feature. In some examples, the ticket generator 114 generates the first training set of linked pairs. In various examples, the first training set of linked pairs may include the first ticket 302 and the second ticket 304, and/or the first ticket 202 and the second ticket 204.”) training the correlation model using the additional training data, thereby the correlation model, once trained, predicts a correlation between two input data according to the emphasis in the identified feature with accuracy (see para [0024]: “For example, the source tickets 164 include various groups of two or more tickets that have been labeled as being linked, and the neural network model 162 is trained to identify similar links between tickets. In some aspects, the neural network model 162 is also configured to determine a confidence level of the identified links (e.g., 95% confident).”.
Also see para [0017]: “In examples described herein, a Siamese neural network model is utilized by an incident processor to predict whether pairs or groups of tickets are linked to one another.”) Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan and Wong before him or her, to modify the system of Chan to train on pairs of incident data to predict correlation, in order to discern when several tickets are related, thereby improving problem diagnosis and ticket management (see para [0001]: “For example, tickets may be generated by various computing devices or network management personnel and open tickets may then be reviewed by on-call engineers or site reliability engineers. However, management of tickets becomes challenging when the computing devices are spread out across a large geographical area and encompass many different business groups or sub-groups, at least due to a number of tickets that may be generated, their frequency of generation, etc. Although a trained and experienced engineer may be able to discern when several tickets are related to each other and more easily diagnose a problem, sifting through large numbers of tickets still requires a large pool of engineers for managing cloud or distributed computing systems.”) Regarding claim 17: Chan teaches a computer-implemented method, the method comprising (see claim 1: “A method, comprising:”), … generating, based on the first pair of sets of data, training data wherein the training data further includes a ground-truth correlation between a first set of data and a second set of data in the first set of data and the second set of data (see para [0004]: “The Siamese neural network model is trained using the first input embeddings, the second input embeddings, the second feature of the first ticket, and the second feature of the second ticket as inputs to an output layer of the Siamese neural network model.
The output layer is configured to generate first output embeddings for the first ticket and second output embeddings for the second ticket. The Siamese neural network model is trained using a contrastive loss function between the first output embeddings for the first ticket and the second output embeddings for the second ticket.”) training a correlation model using the training data (see fig. 12. Also, see [0127]: “At 1212, the linear and/or nonlinear model(s) are retrained. In some embodiments, the linear and/or non-linear surrogate models are retrained in the event a threshold number of entries are flagged. An entry may be flagged in the event a prediction label associated with a linear surrogate model does not correlate with a prediction label associated with a non-linear surrogate mode.”); wherein the correlation model predicts a correlation between the first set of data and the second set of data (see [0070]: “In some embodiments, the machine learning model prediction correlates with the actual outcome data. For example, a point 308 on line 301 indicates a particular outcome is likely to happen (prediction label ≈0.75) and black dot 307 indicates that the particular outcome actually occurred.”), the correlation model represents a teacher of a teacher-student model (see para [0027]: “Machine learning model 104 is configured to implement one or more machine learning algorithms (e.g., decision trees, naïve Bayes classification, least squares regression, logistic regression, support vector machines, neural networks, deep learning, etc.”.) [Examiner note: the correlation model represents a teacher model or complex black-box model, such as a Siamese network, as per [0003] of the instant case’s specification: “…correlation model represents a teacher model or a complex black-box model. … An example of the correlation model includes a Siamese network”.
The interpreter model represents a student model or simpler, glass-box model such as a linear model as per the instant case’s specification at [0003]: “…interpreter model represents a student model or a simpler, glass-box model.”. Additionally, see [0059] of the instant case’s specification: “Examples of the interpreter model may use Random Forest, Gradient Boosting Regressor, a linear model, and the like.”]; training an interpreter model using the training data, wherein the interpreter model interprets a behavior of a correlation model trained based on the training data, and the interpreter model represents a student of the teacher-student model (see para [0033]: “Surrogate model 112 may receive, from complex model server 102 via network 105, training data 106, model prediction data 107, and actual outcome data 108 and store as training data 116, model prediction data 117, and actual outcome data 118, respectively. Using training data 116, model prediction data 117, and actual outcome data 118, surrogate model server 122 may train one or more surrogate models to make one or more predictions.”. Also, see [0034]: “Linear surrogate model 114 may be a K-LIME surrogate model. With K-LIME, local generalized linear model (GLM) surrogates are used to explain the predictions of complex response functions, and local regions are defined by K clusters or user-defined segments instead of simulated, perturbed observation samples.”).
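For orientation only (this sketch is not part of the Office Action record), the teacher-student surrogate arrangement described in the cited passages — a complex black-box "teacher" whose predictions are mimicked by a simple glass-box "student" — can be illustrated as follows. The synthetic data, model choices, and all identifiers are illustrative assumptions, not drawn from Chan or the instant specification.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data: the target depends mostly on feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

# Teacher: a complex black-box correlation model.
teacher = GradientBoostingRegressor(random_state=0).fit(X, y)

# Student (interpreter): a glass-box model fitted to the teacher's
# predictions rather than to the ground-truth labels, so its
# coefficients describe the teacher's behavior.
student = LinearRegression().fit(X, teacher.predict(X))

# The student's coefficients act as a global importance signal:
# feature 0 should dominate.
print(student.coef_)
```

In this toy setup the student's largest coefficient points at the feature that most drives the teacher, which is the same role the claimed interpreter model plays when it emits a global feature importance score.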
Chan does not teach retrieving, a first pair of sets of incident data, wherein the first pair of sets of incident data comprises a first set of the incident data and a second set of incident data, and the first set of incident data comprises an incident ticket, wherein the set of incident data includes a feature with a value of the feature, generating, based on the first pair of sets of incident data, training data wherein the training data further includes a ground-truth correlation between the first set of the incident data and second set of the incident data, generating, by the interpreter model, a first score as a global feature importance score and a second score as a local importance score, wherein the global feature importance score enables identifying a feature incident ticket as the sets of data to emphasize, and the local feature importance score enables comparing important features between a pair of incident tickets, creating, based on the identified feature, additional training data with the emphasis on the identified feature for further training the correlation model, training the correlation model using the additional training data, thereby the correlation model, once trained, predicts a correlation between two input data according to the emphasis in the identified feature with accuracy.
Wong, however, analogously teaches retrieving, a first pair of sets of incident data, wherein the first pair of sets of incident data comprises a first set of the incident data and a second set of incident data, and the first set of incident data comprises an incident ticket, wherein the set of incident data includes a feature with a value of the feature (see para [0004]: “A first training set of linked pairs of incident management tickets is generated, where each linked pair of the first training set is labeled as being linked and comprises: a first ticket having a first text feature and a second feature, and a second ticket having a first text feature and a second feature. ”) generating, based on the first pair of sets of incident data, training data wherein the training data further includes a ground-truth correlation between the first set of the incident data and second set of the incident data (see para [0004]: “The Siamese neural network model is trained using the first input embeddings, the second input embeddings, the second feature of the first ticket, and the second feature of the second ticket as inputs to an output layer of the Siamese neural network model. The output layer is configured to generate first output embeddings for the first ticket and second output embeddings for the second ticket. The Siamese neural network model is trained using a contrastive loss function between the first output embeddings for the first ticket and the second output embeddings for the second ticket.”) generating, (see para [0033]: “In some examples, the Siamese neural network model 205 is trained through contrastive loss to learn relationships between ticket pairs’ labels (e.g., related or unrelated) and a plurality of text features. Generally, the trained embeddings 252 and 254 for each ticket pair are used to calculate a Euclidean distance and pairs that are linked have embeddings close in Euclidean distance, while unlinked pairs are farther apart. 
In other examples, the Siamese neural network model 205 is trained using cosine embedding loss or other suitable loss functions”) creating, based on the identified feature, additional training data with the emphasis on the identified feature for further training the correlation model (see para [0039]: “At step 402, a first training set of linked pairs of incident management tickets is generated. In some examples, each linked pair of the first training set is labeled as being linked and comprises a first ticket having a first text feature and a second feature, and a second ticket having a first text feature and a second feature. In some examples, the ticket generator 114 generates the first training set of linked pairs. In various examples, the first training set of linked pairs may include the first ticket 302 and the second ticket 304, and/or the first ticket 202 and the second ticket 204.”) training the correlation model using the additional training data, thereby the correlation model, once trained, predicts a correlation between two input data according to the emphasis in the identified feature with accuracy (see para [0024]: “For example, the source tickets 164 include various groups of two or more tickets that have been labeled as being linked, and the neural network model 162 is trained to identify similar links between tickets. In some aspects, the neural network model 162 is also configured to determine a confidence level of the identified links (e.g., 95% confident).”. Also see para [0017]: “In examples described herein, a Siamese neural network model is utilized by an incident processor to predict whether pairs or groups of tickets are linked to one another.”).
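As background only (not part of the record), the contrastive training that Wong's quoted paragraphs describe — linked ticket pairs pulled close in Euclidean distance, unlinked pairs pushed at least a margin apart — can be sketched with a minimal loss function. The embedding values and margin below are illustrative assumptions.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, linked, margin=1.0):
    """Contrastive loss for one pair of ticket embeddings.

    Linked pairs (linked=1) are penalized by their squared Euclidean
    distance; unlinked pairs (linked=0) are penalized only when they
    fall inside the margin.
    """
    d = float(np.linalg.norm(emb_a - emb_b))
    return linked * d ** 2 + (1 - linked) * max(0.0, margin - d) ** 2

# Illustrative embeddings for two ticket pairs.
close = (np.array([0.10, 0.20]), np.array([0.12, 0.19]))
far = (np.array([0.90, -0.80]), np.array([-0.70, 0.60]))

# A linked pair with nearby embeddings incurs a small loss...
print(contrastive_loss(*close, linked=1))
# ...while an unlinked pair already beyond the margin incurs none.
print(contrastive_loss(*far, linked=0))
```

Minimizing this quantity over many labeled pairs is what drives linked tickets' trained embeddings together and unlinked tickets' embeddings apart, as in Wong's paragraph [0033].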
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan and Wong before him or her, to modify the system of Chan to train on pairs of incident data to predict correlation, in order to discern when several tickets are related, thereby improving problem diagnosis and ticket management (see para [0001]: “For example, tickets may be generated by various computing devices or network management personnel and open tickets may then be reviewed by on-call engineers or site reliability engineers. However, management of tickets becomes challenging when the computing devices are spread out across a large geographical area and encompass many different business groups or sub-groups, at least due to a number of tickets that may be generated, their frequency of generation, etc. Although a trained and experienced engineer may be able to discern when several tickets are related to each other and more easily diagnose a problem, sifting through large numbers of tickets still requires a large pool of engineers for managing cloud or distributed computing systems.”) Regarding claim 19: Chan in view of Wong teaches the method of claim 17. Chan further teaches wherein the interpreter model interprets a behavior of the correlation model, wherein the correlation model predicts correlations among a plurality of sets of (see [0032]: “Surrogate model server 112 includes a linear surrogate model 114, one or more surrogate non-linear models 115, training data 116, model prediction data 117, and actual outcome data 118.”). Chan does not explicitly teach incident data. Wong, however, analogously teaches incident data (see para [0004]: “In one aspect, a method for training a neural network for linking incident management tickets is provided”).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan and Wong before him or her, to modify the system of Chan to include attributes of incident data as disclosed by Wong in order to predict whether incident tickets are linked to each other (see Wong at para [0017]: “In examples described herein, a Siamese neural network model is utilized by an incident processor to predict whether pairs or groups of tickets are linked to one another.”). Claim 8 is rejected under 35 U.S.C 103 as being unpatentable over Chan et al. (US20190325335 hereinafter referred to as Chan) in view of Wong et al. (US20230186052A1 hereinafter referred to as Wong) in further view of Chen et al. (“Identifying Linked Incidents in Large-Scale Online Service Systems” hereinafter referred to as Chen 2). Regarding claim 8: Chan in view of Wong teaches the method of claim 2. Chan does not explicitly teach wherein the embeddings include a multi-dimensional vector representation or wherein a number of dimensions of the embeddings is based on a number of features associated with the set of data. Chen 2, however, analogously teaches wherein the embeddings include a multi-dimensional vector representation (see section 4 ‘Experiments’ subsection 4.1 ‘Dataset and Setup’: “The preserved word embedding dimension is 300, and the number of training epochs is 30.”), wherein a number of dimensions of the embeddings is based on a number of features associated with the set of data (see section 3 ‘Proposed Approach’ subsection 3.1 ‘Overview’: “These two features are fed into two modules: the textual embedding module and the component representation module. These two modules are aimed to extract the textual information and the component dependency information, respectively.
First, to deal with the challenges brought by textual description from different sources, we introduce a text encoder that can grasp the semantic representation of the incident description rather than textual only information. The textual module is trained to map the textual description into a high dimensional semantic representation.”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan, Wong, and Chen 2 before him or her, to modify the method of Chan to include multi-dimensional vector representations wherein the number is based on a number of features associated with the set of data as disclosed by Chen 2 in order to increase the likelihood of incident links (see section 4 ‘Experiments’ subsection 4.4 ‘Effectiveness of Component Representation Learning’: “Ideally, the encoded component vector should give a good representation of the component. One way to check the effectiveness of learned representation is to investigate whether the component vectors from the same service can be grouped together. Figure 6 visualizes multiple embedded component vectors with dimension dC = 64 from the following 7 services: Data Analytics Service, Database Service, Database Query Service, Extension Service, Networking Service, Support Service and Storage Service. To visualize these 64-d vectors, we apply the t-SNE (t-distributed stochastic neighbor embedding) method [24] to transform the vector from the high-dimensional space into a two-dimensional space. In Figure 6, components from the same service are presented in the same colour. By looking at the spatial distribution of components, we can find that our embedding method can cluster the components from the same service together in the latent space.
This observation reflects the effectiveness of component representation learning: similar components (from the same service) have similar encoded vectors and are more likely to have incident links.”). Claims 7 and 18 are rejected under 35 U.S.C 103 as being unpatentable over Chan et al. (US20190325335 hereinafter referred to as Chan) in view of Wong et al. (US20230186052A1 hereinafter referred to as Wong) in further view of McNutt et al. (“Exploring ∆∆G prediction with Siamese Networks” hereinafter referred to as McNutt). Regarding claim 7: Chan in view of Wong teaches the method of claim 2. Chan does not explicitly teach wherein the training data is based on permutative combinations of pairs of sets of data. McNutt, however, analogously teaches wherein the training data is based on permutative combinations of pairs of sets of data (see section 2 ‘Methods’ subsection 2.3 ‘Additional Ligands Comparison’: “In order to directly compare our trained model to Jiménez-Luna et al. 11 , we utilize the additional ligands training set as described in their manuscript. … We train our models on a reference ligand, as described in A.2, and a given number of additional ligands for each congeneric series. In the one additional ligand training set, we train on the two ordered pairs of the reference ligand and one additional ligand. Then testing is carried out on the two-permutations between the ligands in the training set and ligands that the model has not seen.”).
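As an illustration only (not part of the record), the "permutative combinations of pairs" that McNutt's quoted passage describes — ordered two-permutations, so both (a, b) and (b, a) appear as training pairs — can be sketched with the standard library. The ticket names are hypothetical placeholders.

```python
from itertools import permutations

tickets = ["T1", "T2", "T3"]

# Ordered pairs (2-permutations): both (a, b) and (b, a) appear,
# unlike unordered combinations, which keep only one of the two.
pairs = list(permutations(tickets, 2))
print(pairs)

# For n items there are n * (n - 1) ordered pairs.
assert len(pairs) == len(tickets) * (len(tickets) - 1)
```

Training on ordered pairs rather than unordered combinations doubles the pair count and exposes the model to both orientations of each pair, which is the sense in which the training data "comprehensively" covers the pairings.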
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan, Wong, and McNutt before him or her, to modify the method of Chan to include training based on permutative combinations of pairs of sets of data in order to cover the training data comprehensively and increase model performance (see section 4 ‘Discussion and Conclusion’: “Our models show higher correlation with experimental RBFE and lower errors of prediction than the model developed by Jiménez-Luna et al. … We see an increase in model performance as the amount of information about each congeneric series is increased. Our highest parameter CNN architecture, Dense, was able to outperform the lower parameter Default2018 architecture on the smallest training set.”). Regarding claim 18: Chan in view of Wong teaches the method of claim 17. Chan does not explicitly teach wherein the correlation model uses a Siamese Network including a plurality of convolutional neural networks. McNutt, however, analogously teaches wherein the correlation model uses a Siamese Network including a plurality of convolutional neural networks (see pg. 2 section 2.2 “We implement the Siamese Network by training a linear layer on the difference between the final latent vectors (27,648 and 224 for Default2018 and Dense, respectively) of the convolutional architectures of the two input complexes to predict the ∆∆G.”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Chan, Wong, and McNutt before him or her, to modify the system of Chan to include attributes of a Siamese network including a plurality of convolutional neural networks, as disclosed by McNutt, in order to increase performance (see section 5 ‘Lessons Learned’: “Relative binding free energy (RBFE) methods allow comparisons of molecular potency during this optimization.
We utilize a Siamese Convolutional Neural Network (CNN) to directly estimate the RBFE with higher throughput than simulation based methods. Our models show improved performance over a previously published Siamese RBFE predictor. We observe decreased performance on out-of-domain RBFE predictions.”). Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew A Bracero whose telephone number is (571)270-0592. The examiner can normally be reached Monday - Thursday 7:30a.m. - 5:00 p.m. ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached Monday – Friday 9:00 a.m. – 5:00 p.m. ET whose telephone number is 571-270-7519. 
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /ANDREW BRACERO/Examiner, Art Unit 2126 /DAVID YI/Supervisory Patent Examiner, Art Unit 2126
Prosecution Timeline

May 17, 2022
Application Filed
May 16, 2025
Non-Final Rejection — §103
Aug 21, 2025
Interview Requested
Sep 03, 2025
Applicant Interview (Telephonic)
Sep 22, 2025
Response Filed
Oct 31, 2025
Examiner Interview Summary
Feb 21, 2026
Final Rejection — §103 (current)

Prosecution Projections

3-4
Expected OA Rounds
100%
Grant Probability
99%
With Interview (+0.0%)
3y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 5 resolved cases by this examiner. Grant probability derived from career allow rate.
