Prosecution Insights
Last updated: April 19, 2026
Application No. 18/175,487

SYSTEM FOR TRAINING NEURAL NETWORK TO DETECT ANOMALIES IN EVENT DATA

Status: Non-Final OA (§101, §103, §112)
Filed: Feb 27, 2023
Examiner: WASAFF, JOHN S.
Art Unit: 3629
Tech Center: 3600 (Transportation & Electronic Commerce)
Assignee: Quantiphi Inc.
OA Round: 1 (Non-Final)

Grant Probability: 33% (At Risk)
Expected OA Rounds: 1-2
Expected Time to Grant: 4y 1m
Grant Probability with Interview: 77%

Examiner Intelligence

Career Allow Rate: 33% (124 granted / 373 resolved; -18.8% vs TC avg)
Interview Lift: +44.2% higher allowance rate among resolved cases with an interview
Avg Prosecution: 4y 1m, with 37 applications currently pending
Total Applications: 410 across all art units
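The headline figures above fit together by simple arithmetic. A minimal sketch, assuming the interview lift is the difference between the with-interview allowance rate and the career allowance rate (the small gap versus the reported +44.2% comes from the 77% figure shown being rounded):

```python
# Assumed reconstruction of the dashboard arithmetic; the counts are the
# report's own (124 granted of 373 resolved), and the with-interview rate
# is the rounded 77% shown above.
granted, resolved = 124, 373

career_allow_rate = granted / resolved      # -> 0.332..., the 33% shown
with_interview_rate = 0.77                  # report's "With Interview" figure
interview_lift = with_interview_rate - career_allow_rate

print(f"career allow rate: {career_allow_rate:.1%}")
print(f"interview lift: {interview_lift:+.1%}")  # ~ +43.8% vs +44.2% reported
```

The residual 0.4-point difference suggests the underlying with-interview rate is closer to 77.4% before rounding.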

Statute-Specific Performance

§101: 25.4% (-14.6% vs TC avg)
§103: 39.3% (-0.7% vs TC avg)
§102: 11.1% (-28.9% vs TC avg)
§112: 20.4% (-19.6% vs TC avg)

Tech Center averages are estimates. Based on career data from 373 resolved cases.

Office Action

Rejections under §101, §103, and §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-14 are pending.

Claim Objections

Claims 7-11 and 14 are objected to because of the following informalities. In claim 7, applicant recites “at least one of masked log modelling, log event classification, log events cluster analysis, auto-regressive analysis, and log event prediction.” Since the use of “and” suggests the conjunctive (i.e., each of “masked log modelling,” “log event classification,” “log events cluster analysis,” “auto-regressive analysis,” and “log event prediction” is performed), examiner suggests amending the claim to recite: “at least one of masked log modelling, log event classification, log events cluster analysis, auto-regressive analysis, or log event prediction.” This interpretation aligns with applicant’s specification.

Similarly, in claim 11, applicant recites “one of an attention head decoder, a classifier head decoder, a masked log decoder.” Examiner suggests amending the claim to recite: “one of an attention head decoder, a classifier head decoder, or a masked log decoder.”

In claim 14, applicant recites “the processing arrangement,” which should read: “a processing arrangement,” given that it has not been introduced in the claim. (Examiner notes this is not a matter of indefiniteness, since the metes and bounds are clear from the specification.)

Claims 8-10 are objected to by virtue of their dependency. Appropriate correction is required.

Examiner notes that claim 3, while not objected to, appears to require all of: 1) removing special characters from a semantic token; 2) splitting at least one keyword from a semantic token; 3) replacing at least one semantic token with a substitute token extracted based on a substitute parameter; and 4) adding at least one keyword or parameter to a semantic token. 
Applicant is encouraged to review the claim to ensure that the scope is commensurate with applicant’s desires.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and

(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. 
Such claim limitation(s) is/are:

Claim 4: wherein the first transformation model is configured to map each of the set of tokens to the event representation, wherein the event representation is a multi-dimensional semantic latent space representation generated using the one or more event embeddings.

Claim 5: the first transformation model is an encoder only sentence transformer-based language model, configured to generate one or more sentence event embeddings for the set of tokens.

Claim 6: the multiheaded attention mechanism configured to process the one or more contextual embeddings for each of the set of tokens via the at least one statistical technique to derive the correlations between the plurality of log events.

See [0033]-[0035] and [0049] for support, for example.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112(b)

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention. 
Claims 6, 11, 13, and 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

In claim 6, applicant recites “the multiheaded attention mechanism.” There is insufficient antecedent basis for this limitation in the claim. Examiner cannot determine to what previous claim limitation applicant is referring. Given the lack of antecedent basis and resulting ambiguity, the metes and bounds are unclear. The claim is rejected for indefiniteness.

In claim 11, applicant recites “the decoder.” There is insufficient antecedent basis for this limitation in the claim. Examiner cannot determine whether applicant is referring to the previously claimed “decoder” of claim 10 or another. Given the lack of antecedent basis and resulting ambiguity, the metes and bounds are unclear. The claim is rejected for indefiniteness.

In claim 13, applicant recites “the trained neural network of claim 1.” There is insufficient antecedent basis for this limitation in the claim. Further, it is unclear how the claim is to be read: examiner cannot determine whether claim 13 incorporates the entirety of claim 1 or only certain features. Lastly, examiner notes that the steps of claim 13 more closely align with a method than with a system. Given the lack of antecedent basis and resulting ambiguity, the metes and bounds are unclear. The claim is rejected for indefiniteness.

In claim 14, applicant recites “A computer readable storage medium having computer executable instruction” in the preamble, then proceeds to claim method steps. Examiner cannot determine whether applicant is attempting to claim a method or a computer readable storage medium. Given this ambiguity, the metes and bounds are unclear. The claim is rejected for indefiniteness. 
Appropriate correction is required.

Claim Rejections - 35 USC § 101 (Signal Per Se)

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because the “computer readable storage medium” may encompass a signal. Applicant’s specification does not explicitly exclude this interpretation. Applicant may overcome this rejection by amending the limitation to recite “non-transitory computer readable storage medium.” Such an amendment does not constitute new matter.

Claim Rejections - 35 USC § 101 (Abstract Idea)

Claims 1-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more.

Step 1 (The Statutory Categories): Is the claim to a process, machine, manufacture, or composition of matter? MPEP 2106.03. Per Step 1, claims 1-13 are directed to a system (i.e., a machine), and claim 14 to a computer readable storage medium (i.e., a manufacture or machine). Thus, the claims are directed to statutory categories of invention. However, the claims are rejected under 35 U.S.C. 101 because they are directed to an abstract idea, a judicial exception, without reciting additional elements that integrate the judicial exception into a practical application. The analysis proceeds to Step 2A Prong One.

Step 2A Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon? MPEP 2106.04. 
The abstract idea of independent claims 1 and 14 is (claim 1 being representative):

a) receive event data associated with a plurality of log events for a given time period;
b) pre-process the received event data to extract a set of tokens and positional encodings associated with each of the plurality of log events to generate refined event data;
c) process the refined event data;
d) map each token of the set of tokens based on the positional encodings associated with each token along with the time period associated with each of the plurality of log event in the refined event data to generate an event representation for the mapped set of tokens;
e) process the event representation for the set of tokens to generate one or more event embeddings for each of the log events from the plurality of log events in the refined event data based on a first transformation model;
f) process the one or more event embeddings for a given sequence of log events of the plurality of log events in the refined event data based on a second transformation model to generate one or more contextual embeddings for each log event;
g) simultaneously process the one or more contextual embeddings for each log event via at least one statistical technique to derive correlations between the plurality of log events associated with the set of tokens;
h) generate an embedding matrix utilizing the derived correlations between the plurality of log events; and
i) process the embedding matrix to detect the one or more anomalies in the event data.

Steps a) through i) are those which could be performed mentally. An individual could accomplish the steps above while performing an evaluation of data to determine whether or not it is anomalous (the recited receive, pre-process, process, map, and generate operations describe manual steps that can be performed with pen and paper). 
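For orientation, the recited sequence of steps can be illustrated end to end. This is an editor's toy sketch, not applicant's implementation: the claimed “transformation models” are replaced by deterministic hash-based pseudo-embeddings, the positional encoding is a crude linear weight, and anomaly detection is a simple distance-from-centroid threshold.

```python
# Toy walk-through of the claimed pipeline: tokenize log events, embed tokens
# with positional weighting, derive per-event vectors, then flag events far
# from the sequence centroid. All modeling choices here are stand-ins.
import hashlib
import math

DIM = 8

def token_vec(token):
    # stand-in "first transformation model": deterministic pseudo-embedding
    digest = hashlib.md5(token.encode()).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]

def event_vec(event):
    # tokens plus a crude positional weighting -> one vector per log event
    toks = event.lower().split()
    vec = [0.0] * DIM
    for pos, tok in enumerate(toks, start=1):
        for i, v in enumerate(token_vec(tok)):
            vec[i] += v * pos
    return [v / len(toks) for v in vec]

def detect_anomalies(log_events, threshold=1.4):
    vecs = [event_vec(e) for e in log_events]
    centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
    # distance from the centroid stands in for the "embedding matrix" analysis
    scores = [math.dist(v, centroid) for v in vecs]
    avg = sum(scores) / len(scores)
    return [s > threshold * avg for s in scores]

flags = detect_anomalies(["user login ok", "user login ok", "kernel panic oom"])
print(flags)  # the outlier event is flagged: [False, False, True]
```

The point of the sketch is only that each recited step is a concrete, mechanical operation; whether those operations are "mental steps" under the MPEP groupings is the legal question the Office Action argues.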
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, including observations, evaluations, judgements, and/or opinions, then it falls within the Mental Processes – Concepts Performed in the Human Mind grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Additionally and alternatively, steps b) and d) through i) could be interpreted as describing mathematical calculations (e.g., generate one or more event embeddings, generate an embedding matrix, simultaneously process the one or more contextual embeddings for each log event via at least one statistical technique to derive correlations between the plurality of log events associated with the set of tokens, etc.), which constitutes a process that, under its broadest reasonable interpretation, covers mathematical concepts. If a claim limitation, under its broadest reasonable interpretation, covers mathematical concepts, including mathematical relationships, mathematical formulas or equations, or mathematical calculations, then it falls within the Mathematical Concepts grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application? MPEP 2106.04.

This judicial exception is not integrated into a practical application because the additional elements are merely instructions to apply the abstract idea to a computer, as described in MPEP 2106.05(f). Claim 1 recites the following additional elements: training a neural network; a processing arrangement, communicably coupled to a database configured to store the event data; using an encoder architecture of the processing arrangement; the encoder architecture comprising at least two encoders; a first encoder; a second encoder. 
Claim 14 recites the following additional elements: a computer readable storage medium having computer executable instruction; a computer system; using an encoder architecture of the processing arrangement; the encoder architecture comprising at least two encoders; a first encoder; a second encoder.

These elements are merely instructions to apply the abstract idea to a computer, per MPEP 2106.05(f). Applicant has only described generic computing elements and/or machinery in their specification, as seen in [0022]-[0024] and [0029] of applicant’s specification as filed, for example. Further, the combination of these elements is nothing more than a generic computing system applied to the tasks of the abstract idea. Because the additional elements are merely instructions to apply the abstract idea to a generic computing system, they do not integrate the abstract idea into a practical application, when viewed in combination. See MPEP 2106.05(f). Therefore, per Step 2A Prong Two, the additional elements, alone and in combination, do not integrate the judicial exception into a practical application. The claim is directed to an abstract idea.

Step 2B (The Inventive Concept): Does the claim recite additional elements that amount to significantly more than the judicial exception? MPEP 2106.05.

Step 2B involves evaluating the additional elements to determine whether they amount to significantly more than the judicial exception itself. The examination process involves carrying over identification of the additional element(s) in the claim from Step 2A Prong Two and carrying over conclusions from Step 2A Prong Two pertaining to MPEP 2106.05(f). The additional elements and their analysis are therefore carried over: applicant has merely recited elements that facilitate the tasks of the abstract idea, as described in MPEP 2106.05(f). Further, the combination of these elements is nothing more than a generic computing system applied to the tasks of the abstract idea. 
When the claim elements above are considered, alone and in combination, they do not amount to significantly more. Therefore, per Step 2B, the additional elements, alone and in combination, are not significantly more. The claims are not patent eligible.

The analysis takes into consideration all dependent claims as well: dependent claims 2-13 further narrow the abstract idea with additional abstract steps and/or information. This narrowing of the abstract idea groupings above does not integrate into a practical application and/or add significantly more. Some of the dependent claims recite further additional elements, beyond those highlighted previously:

Claim 5: a sentence-transformer model; the first transformation model is an encoder only sentence transformer-based language model.
Claim 6: a vanilla transformer-based encoder architecture implementing the multiheaded attention mechanism.
Claim 8: a decoder operatively coupled to the first and second encoder of the encoder architecture; the decoder; via a first encoder; via a multi-headed attention mechanism.
Claim 9: via the first encoder; via a multi-headed attention mechanism.
Claim 10: via a classified head decoder.
Claim 11: an event encoder; a log sequence encoder; the decoder is one of an attention head decoder, a classifier head decoder, a masked log decoder.
Claim 13: the trained neural network; executing the trained neural network.

Similar to above, these are generically recited computing devices or machinery that are claimed in a results-oriented manner and equivalent to “apply it.” Whether viewed alone or in combination, these further additional elements do not integrate the abstract idea into a practical application and/or add significantly more. See MPEP 2106.05(f). Accordingly, claims 1-14 are rejected under 35 USC § 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-2, 4, 6-7, 10, and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Hajimirsadeghi et al. (US20200076841; hereinafter “Hajimirsadeghi”) in view of “LogBERT: Log Anomaly Detection via BERT” by Guo et al. (hereinafter “Guo”).

Claims 1 and 14

Hajimirsadeghi discloses: (Claim 1) A system for training a neural network to detect one or more anomalies in an event data {[0345] For example, FIG. 24 is a block diagram that illustrates a computer system 2400 upon which an embodiment of the invention may be implemented. Computer system 2400 includes a bus 2402 or other communication mechanism for communicating information, and a hardware processor 2404 coupled with bus 2402 for processing information. Hardware processor 2404 may be, for example, a general purpose microprocessor. [0103] FIG. 1 is a block diagram that depicts an example computer 100, in an embodiment. 
Computer 100 has a predictive recurrent neural network (RNN) that detects an anomalous network flow, i.e., event(s) data. [0111] For example, one ordering of packets may be anomalous, while another ordering of the same packets may be normal, which is something that RNN 170 can be trained to facilitate detection of.}, the system comprising a processing arrangement, communicably coupled to a database configured to store the event data {processor, database described in [0106], [0347]}, wherein the processing arrangement is configured to: (Claim 14) A computer readable storage medium having computer executable instruction that when executed by a computer system, causes the computer system to execute a method for detecting one or more anomalies in an event data {See previous citations to [0103], [0106], [0111], [0345], and [0347]}, the method comprising: (Claims 1 and 14) receive event data associated with a plurality of log events for a given time period {[0264] For example, traces 1-9 may be time series events, such that contents of log 1900 are temporally sorted (e.g. naturally as spooled), such that contents of log 1900 may be sequenced along a timeline in the shown ordering of traces 1-9, i.e., plurality of log events for a given time period. event data received indicated in [0267].}; the time period associated with each of the plurality of log event in the refined event data {See previous citations to [0264], [0267], where refined event data corresponds to sequenced events.}; simultaneously process each log event via at least one statistical technique to derive correlations between the plurality of log events {[0158] For example, predicted features 731 is a first vector in predicted sequence 730, whereas correlated observed features 712 is a second vector in actual sequence 710. That is because a first prediction (generated from a first packet) actually forecasts a next (i.e. second) packet. This skew is expressly depicted in FIG. 7, but may be implied (i.e. 
not shown) in other figures (e.g. FIG. 1) herein that show parallel sequences of input and output vectors without skew.}; generate an embedding matrix utilizing the derived correlations between the plurality of log events {[0160] All other vectors, i.e., matrices, are subject to correlation and comparison in pairs as the horizontal lines suggest. Each individual comparison entails a pair of opposite vectors, such as 712 compared to 731. The fitness (i.e. closeness) of each comparison, or lack of fitness, may be measured as a packet anomaly score such as 741-742, such as according to a prediction error calculation, such as mean squared. For example, if vectors 712 and 731 are mostly similar, and vectors 713 and 732 are divergent (i.e. discrepant), then packet anomaly score 742 would exceed packet anomaly score 741.}. Hajimirsadeghi doesn’t explicitly disclose, however, Guo, in a similar field of endeavor directed to log anomaly detection via BERT, teaches: pre-process the received event data to extract a set of tokens and positional encodings associated with each of the plurality of log events to generate refined event data {See III, A (Framework): Anomaly detection includes a pre-processing approach that comprises extracting tokens from log messages, i.e., log events, via a log parser (shown in Figure 2). 
LogBERT then represents each log key as an input representation, where the representation is a summation of a log key embedding and a position embedding, i.e., encoding, thereby generating refined event data.} process the refined event data using an encoder architecture of the processing arrangement {See III, A (Framework): LogBERT adopts Transformer encoder to learn, i.e., process, the contextual relations among log keys in a sequence.}, the encoder architecture comprising at least two encoders {See III, A (Framework): The Transformer encoder usually consists of multiple transformer layers, i.e., at least two.}, wherein: a first encoder {See III, A (Framework): LogBERT adopts Transformer encoder to learn, i.e., process, the contextual relations among log keys in a sequence.} is configured to: map each token of the set of tokens based on the positional encodings associated with each token along to generate an event representation for the mapped set of tokens {See III, A (Framework): Then, we can define, i.e., map, a log sequence as a sequence of ordered log keys S = {k1, ..., kt, ..., kT }, where kt ∈ K indicates the log key in the t-th position, i.e., map each token of the set of tokens based on the positional encodings associated with each token along to generate an event representation for the mapped set of tokens}; and process the event representation for the set of tokens to generate one or more event embeddings for each of the log events from the plurality of log events in the refined event data based on a first transformation model {See III, A (Framework): In this work, we randomly generate a matrix as the log key embedding matrix, while the position embeddings are generated by using a sinusoid function to encode the position information of log keys in a sequence, i.e., generate one or more event embeddings for each of the log events from the plurality of log events in the refined event data based on a first transformation model.}; and a second encoder {See 
III, A (Framework): The Transformer encoder usually consists of multiple transformer layers, i.e., at least two.} is configured to: process the one or more event embeddings for a given sequence of log events of the plurality of log events in the refined event data based on a second transformation model to generate one or more contextual embeddings for each log event {See III, A (Framework): LogBERT adopts Transformer encoder to learn the contextual relations among log keys in a sequence. Transformer encoder consists of multiple transformer layers. Each transformer layer includes a multi-head self-attention and a position-wise feed forward sub-layer in which a residual connection is employed around each of two sub-layers, followed by layer normalization… We denote a contextual embedding vector of the log key produced by the Transformer encoder,}; and one or more contextual embeddings for each log event; the plurality of log events associated with the set of tokens {See previous citations to III, A (Framework)}; process the embedding matrix to detect the one or more anomalies in the event data {See III, C (Anomaly Detection): After training, we can deploy LogBERT for anomalous log sequence detection, i.e., in the event data. The idea of applying LogBERT for log anomaly detection is that since LogBERT is trained on normal log sequences, it can achieve high prediction accuracy on predicting the masked log keys if a testing log sequence is normal. Hence, we can derive the anomalous score of a log sequence based on the prediction results on the MASK tokens.}. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Hajimirsadeghi to include the features of Guo. 
Given that Hajimirsadeghi is directed to anomaly detection, one of ordinary skill in the art would have been motivated to look to Guo, in order to facilitate the use of contextual embeddings of log entries, thereby capturing information of whole log sequences and resulting in improved anomaly detection as compared to various state-of-the-art baselines {See I (Introduction) of Guo}.

Claim 2

Hajimirsadeghi further discloses: wherein the processing arrangement is configured to: obtain a set of semantic [features] from the event data {[0234] Each log message, i.e., those obtained, may contain values for multiple semantic features. Likewise, each log trace may contain values for multiple semantic features that may be extracted or otherwise derived from features of log messages of the trace. Traces may be interrelated according to coincidence of their feature value(s).}; perform at least one cleaning technique on each [feature] of the set of semantic [features] to generate one or more [features] associated with each of the plurality of log events {[0264] Graph pruning aims to increase feature relevance by narrowing the context (i.e. related traces) of a trace to include only semantically close traces (i.e. subgraph). Besides pruning content (i.e. semantics), a subgraph may be temporally pruned, i.e., cleaned. For example, traces 1-9 may be time series events, such that contents of log 1900 are temporally sorted (e.g. naturally as spooled), such that contents of log 1900 may be sequenced along a timeline in the shown ordering of traces 1-9.}; or arrange the generated one or more [features] to generate the refined event data {Examiner notes that broadest reasonable interpretation requires consideration of only one alternative, in this case the former.}.

Guo further teaches: preprocess the received event data; tokens {See previous citations to III, A (Framework) in claim 1 rejection above.}. 
The motivation and rationale to include the additional features of Guo is the same as set forth above.

Claim 4

Guo further teaches: wherein the first transformation model is configured to map each of the set of tokens to the event representation, wherein the event representation is a multi-dimensional semantic latent space representation generated using the one or more event embeddings, and wherein the number of dimensions in the multi-dimensional semantic latent space representation range from 128 to 256, or 256 to 512, or 512 to 1024, or 1024 to 2048 {See IV, A (Experimental Setup): For LogBERT, we construct a Transformer encoder by using two Transformer layers. The dimensions for the input representation and hidden vectors are 50 and 256, respectively}. The motivation and rationale to include the additional features of Guo is the same as set forth above.

Claim 6

Hajimirsadeghi further discloses: via the at least one statistical technique to derive the correlations between the plurality of log events {See previous citations to [0160] in claim 1 rejection above.}

Guo further teaches: wherein the second transformation model is a vanilla transformer-based encoder architecture implementing the multiheaded attention mechanism configured to process the one or more contextual embeddings for each of the set of tokens {See III, A (Framework): LogBERT adopts Transformer encoder to learn the contextual relations among log keys in a sequence. Transformer encoder consists of multiple transformer layers. Each transformer layer includes a multi-head self-attention and a position-wise feed forward sub-layer in which a residual connection is employed around each of two sub-layers, followed by layer normalization. The multi-head attention employs H parallel self-attentions to jointly capture different aspect information at different positions over the input log sequence.}. The motivation and rationale to include the additional features of Guo is the same as set forth above. 
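The LogBERT input representation the mapping above relies on (a randomly generated log key embedding summed with a sinusoidal position embedding) can be sketched as follows. This is an editor's illustration of the cited mechanism, not Guo's code; the dimension, seed, and key names are arbitrary.

```python
# Sketch of a LogBERT-style input representation: input vector for the log key
# at position t is key_embedding[k] + sinusoidal_position_embedding(t).
import math
import random

def position_embedding(pos, dim):
    # standard Transformer sinusoid: sin on even dims, cos on odd dims
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

def input_representation(log_keys, dim=16, seed=0):
    rng = random.Random(seed)
    # "randomly generate a matrix as the log key embedding matrix" (Guo, III.A)
    key_embed = {k: [rng.gauss(0, 1) for _ in range(dim)] for k in set(log_keys)}
    return [
        [e + p for e, p in zip(key_embed[k], position_embedding(t, dim))]
        for t, k in enumerate(log_keys)
    ]

reps = input_representation(["k1", "k5", "k1", "k9"])
```

Note that the two occurrences of "k1" receive different input vectors because only the position component differs; this is what lets the encoder distinguish the same key at different points in a sequence.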
Claim 7 Guo further teaches: wherein the encoder architecture of the processing arrangement is configured to process the embedding matrix by performing the at least one statistical technique comprising at least one of masked log modelling, log event classification, log events cluster analysis, auto-regressive analysis, and log event prediction, on the embedding matrix to detect the one or more anomalies in the event data {See I (Introduction): By using the structure of BERT, we expect the contextual embedding of each log entry can capture the information of whole log sequences. To achieve that, we propose two self-supervised training tasks: 1) masked log key prediction, which aims to correctly predict log keys in normal log sequences that are randomly masked; 2) volume of hypersphere minimization, which aims to make the normal log sequences close to each other in the embedding space.}. The motivation and rationale to include the additional features of Guo is the same as set forth above. Claim 10 Guo further teaches: wherein to perform log event classification prediction on the embedding matrix, the processing arrangement is further configured to process the embedding matrix via a classified head decoder based on a classification algorithm to provide classification outputs to each embedding matrix, wherein the classification outputs includes either an anomalous embedding matrix or a normal embedding matrix {See III, A (Framework): In this work, we randomly generate a matrix as the log key embedding matrix, while the position embeddings are generated by using a sinusoid function to encode the position information of log keys in a sequence.}. The motivation and rationale to include the additional features of Guo is the same as set forth above. 
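The first self-supervised task quoted from Guo for claim 7 (masked log key prediction) begins by randomly masking keys in a normal log sequence. A minimal sketch of that input-preparation step, assuming a 15% mask ratio (the ratio is not stated in the citation above):

```python
import random

def mask_log_keys(tokens, mask_ratio=0.15, mask_token="[MASK]", seed=0):
    """Masked-log-modelling input preparation (sketch): randomly
    replace a fraction of log keys with [MASK]; training then asks
    the model to recover the original keys at the masked positions."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_ratio:
            targets[i] = tok  # original key to be predicted
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets
```

The returned `targets` map (position to original key) supplies the labels for the prediction loss, so no manually labelled anomalies are needed, which is the point of the self-supervised setup.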
Claim 13 Hajimirsadeghi further discloses: A system for detecting one or more anomalies in an event data, the system comprising a processing arrangement, communicably coupled to a database {See citations to Hajimirsadeghi in claim 1.}. Guo further teaches: input the event data to the trained neural network of claim 1 {See III, A (Framework), citations above.}; and executing the trained neural network to detect the one or more anomalies in the event data {See III, C (Anomaly Detection), citations above.}. The motivation and rationale to include the additional features of Guo is the same as set forth above. Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Hajimirsadeghi and Guo, further in view of Patel (US 20200311345; hereinafter “Patel”). Claim 3 The combination of Hajimirsadeghi and Guo, while teaching the features above, does not explicitly teach the following features. Patel, however, in a similar field of endeavor directed to language-independent contextual embedding, teaches: wherein to perform the at least one cleaning technique, the processing arrangement is configured to: extract one or more blocks from the event data to provide the set of semantic tokens associated with the plurality of log events {[0085] In an embodiment, the transmutation module employs autoencoder model to process the at least one character coordinates in the document. It will be appreciated that the autoencoder consists of two parts, namely, an encoder and a decoder. Furthermore, a given sentence in a training dataset is a sequence of tokens, wherein each token has associated at least one character coordinates. The autoencoder model takes the sequence of character coordinates comprising the plurality of tokens of the document and learns to produce another sequence of character coordinates that is substantially similar to the sequence of character coordinates comprising the given sentence. 
Subsequently, the autoencoder model analyzes the semantic and syntactic structure of the at least one character coordinates of the given sentence thereby enhancing the output data.}; and modify at least one of the set of semantic tokens by removing special characters from a semantic token, splitting at least one keyword from a semantic token, replacing at least one semantic token with a substitute token extracted based on a substitute parameter and adding at least one keyword or parameter to a semantic token, to provide the one or more tokens of the refined event data {[0064] Moreover, optionally, the tokenizer module determined the plurality of tokens based on at least one of: rules pertaining to lexeme, regular expressions, specific sequence of characters of one or more words, specific and separating characters (such as, punctuations, white spaces, and so forth). More optionally, the plurality of tokens may be made of alphabetic characters, alpha-numeric characters, or numeric characters. In an embodiment, the tokenizing module analyzes a punctuation character (such as, a period ‘.’) and white space so as to define the plurality of tokens. In such case, the punctuation character (namely, the period ‘.’) may denote an abbreviation, a decimal point, an ellipsis, an email-address, or an end of a sentence.}. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the combination of Hajimirsadeghi and Guo to include the features of Patel. Given that the combination of Hajimirsadeghi and Guo is directed to anomaly detection based on Bidirectional Encoder Representations from Transformers (BERT), one of ordinary skill in the art would have been motivated to look to Patel, in order to provide a platform for character based contextual embedding of data that facilitates language-independent interpretation of data {See [0007] of Patel}. Claim 5 is rejected under 35 U.S.C. 
103 as being unpatentable over the combination of Hajimirsadeghi and Guo, further in view of “LAnoBERT: System Log Anomaly Detection based on BERT Masked Language Model” by Lee et al. (hereinafter “Lee”). Claim 5 The combination of Hajimirsadeghi and Guo, while teaching the features above, does not explicitly teach the following features. Lee, however, in a similar field of endeavor directed to system log anomaly detection based on a BERT masked language model, teaches: wherein the first encoder of the encoder architecture is configured to implement a sentence-transformer model, the first transformation model is an encoder only sentence transformer-based language model, configured to generate one or more sentence event embeddings for the set of tokens {See 3.2 BERT: One of the major characteristics of BERT is that pre-training is performed using two unsupervised learning methods, which are masked language modeling (MLM) and next sentence prediction (NSP). MLM involves replacing certain tokens of an input sentence with ‘[MASK]’ and predicting that they would appear in the corresponding position. NSP involves combining two sentences with the token ‘[SEP]’ in between, and then predicting whether the two sentences are semantically connected through the ‘[CLS]’ token positioned in the very front of the input sentence.}. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the combination of Hajimirsadeghi and Guo to include the features of Lee. Given that the combination of Hajimirsadeghi and Guo is directed to anomaly detection based on Bidirectional Encoder Representations from Transformers (BERT), one of ordinary skill in the art would have been motivated to look to Lee, in order to facilitate higher anomaly detection performance {See Abstract of Lee}. Claim 11 is rejected under 35 U.S.C. 
103 as being unpatentable over the combination of Hajimirsadeghi and Guo, further in view of “Mask-Predict: Parallel Decoding of Conditional Masked Language Models” by Ghazvininejad et al. (hereinafter “Ghazvininejad”). Claim 11 Guo further teaches: wherein the first encoder is an event encoder and the second encoder is a log sequence encoder {See previous citations to III, A (Framework) in claim 1 rejection above.}. The motivation and rationale to include the additional features of Guo is the same as set forth above. The combination of Hajimirsadeghi and Guo, while teaching the features above, does not explicitly teach the following features. Ghazvininejad, however, in a similar field of endeavor directed to parallel decoding of conditional masked language models, teaches: the decoder is one of an attention head decoder, a classifier head decoder, a masked log decoder {See 3 Decoding with Mask-Predict: We introduce the mask-predict algorithm, which decodes an entire sequence in parallel within a constant number of cycles. At each iteration, the algorithm selects a subset of tokens to mask, and then predicts them (in parallel) using an underlying CMLM. Masking the tokens where the model has doubts while conditioning on previous high-confidence predictions lets the model re-predict the more challenging cases, but with more information.}. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the combination of Hajimirsadeghi and Guo to include the features of Ghazvininejad. Given that the combination of Hajimirsadeghi and Guo is directed to anomaly detection based on Bidirectional Encoder Representations from Transformers (BERT), one of ordinary skill in the art would have been motivated to look to Ghazvininejad, in order to facilitate converging on a high-quality output sequence with reduced decoding iterations {See 3 Decoding with Mask-Predict of Ghazvininejad}. Claim 12 is rejected under 35 U.S.C. 
103 as being unpatentable over the combination of Hajimirsadeghi and Guo, further in view of “Automated Event Identification from System Logs Using Natural Language Processing” by Dwaraki et al. (hereinafter “Dwaraki”). Claim 12 The combination of Hajimirsadeghi and Guo, while teaching the features above, does not explicitly teach the following features. Dwaraki, however, in a similar field of endeavor directed to event identification from system logs, teaches: wherein the database further comprises an event ontology generated via the processing arrangement, and wherein the event ontology is dynamically updated via addition of the one or more tokens {See II Event Identification Framework, A. Event Ontology: To perform this identification as accurately as possible, we need a well-defined event ontology that describes the different classes of events and their sub-types and arguments. We define a potential event identification ontology, as depicted in Figure 1, which is a combination of classifications from sources such as [17], the AIX Audit Events guide, and various other manuals.}. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the combination of Hajimirsadeghi and Guo to include the features of Dwaraki. Given that the combination of Hajimirsadeghi and Guo is directed to anomaly detection based on Bidirectional Encoder Representations from Transformers (BERT), one of ordinary skill in the art would have been motivated to look to Dwaraki, in order to facilitate automatically identifying events within logs {See Abstract of Dwaraki}. No Prior Art Applied to Claims 8 and 9 There is no prior art applied to claims 8 and 9. 
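The event ontology cited from Dwaraki, and the dynamic update via token addition recited in claim 12, can be sketched as a growing map from event classes to observed tokens. The class names and the overlap-based lookup below are illustrative assumptions, not taken from Dwaraki:

```python
class EventOntology:
    """Minimal event-ontology sketch: each event class accumulates the
    tokens observed for it, and the ontology grows as new tokens are
    added (the 'dynamically updated' behaviour recited in claim 12)."""

    def __init__(self):
        self.classes = {}

    def add_tokens(self, event_class, tokens):
        # Dynamic update: new tokens extend the class vocabulary in place.
        self.classes.setdefault(event_class, set()).update(tokens)

    def classify(self, tokens):
        """Return the event class sharing the most tokens, or None."""
        best, overlap = None, 0
        for cls, vocab in self.classes.items():
            n = len(vocab.intersection(tokens))
            if n > overlap:
                best, overlap = cls, n
        return best
```

A real ontology (as in Dwaraki's Figure 1) also carries sub-types and arguments per class; the flat token sets here only illustrate the update-by-addition mechanism.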
Regarding claim 8, the prior art does not teach or suggest: wherein to perform log event prediction statistical technique to detect the one or more anomalies, the processing arrangement further comprises a decoder operatively coupled to the first and second encoder of the encoder architecture, wherein the processing arrangement is configured to: select at least one of the one or more event embeddings for each of the set of tokens in the event representation; process, via a first encoder, the selected at least one of the one or more event embeddings based on a first transformation model to generate a standard embedding matrix, and the remaining of the one or more event embeddings based on the second transformation model to generate a predicted embedding matrix; and, wherein the decoder is configured to: process the one or more contextual embeddings for each of the set of tokens via a multi-headed attention mechanism to derive correlations between the plurality of log events associated with the set of tokens; and determine a degree of dissimilarity for each of the set of tokens based on a similarity score of the predicted embedding matrix with the standard embedding matrix; determine an anomalous embedding matrix, if the degree of dissimilarity for any of the predicted embedding matrix with the standard embedding matrix is greater than or equal to a first predefined threshold; and, if so, identify the anomalous log event associated with the anomalous embedding matrix for detecting the one or more anomalies in the event data. 
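The threshold comparison recited in claim 8 pairs a predicted embedding matrix against a standard one and flags rows whose dissimilarity meets a predefined threshold. A minimal sketch, assuming a mean-squared-error dissimilarity (one of the prediction-error calculations Hajimirsadeghi mentions at [0160]); the threshold value is illustrative:

```python
def detect_anomalies(predicted, standard, threshold):
    """Row-wise dissimilarity check (sketch): score each predicted
    embedding against its standard counterpart with mean squared
    error and flag indices at or above the threshold as anomalous."""
    anomalous = []
    for idx, (p, s) in enumerate(zip(predicted, standard)):
        mse = sum((a - b) ** 2 for a, b in zip(p, s)) / len(p)
        if mse >= threshold:
            anomalous.append(idx)
    return anomalous
```

Each flagged index would then be traced back to its log event, which is the "identify the anomalous log event" step of the claim.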
Regarding claim 9, the prior art does not teach or suggest: wherein to perform the masked log modelling statistical technique on the embedding matrix, the second encoder of the processing arrangement is further configured to: select at least one of the one or more event embeddings for each token of the set of tokens in the event representation; process, via the first encoder, the selected at least one of the one or more event embeddings based on the first transformation model to generate a masked embedding matrix and the remaining of the one or more event embeddings to generate a predicted embedding matrix; and wherein the decoder is configured to: process the one or more contextual embeddings for each of the set of tokens via a multi-headed attention mechanism to derive correlations between the plurality of log events associated with the set of tokens; and determine a degree of dissimilarity for each of the set of tokens based on a similarity score of the predicted embedding matrix with the masked embedding matrix; determine an anomalous embedding matrix, if the degree of dissimilarity for any of the masked embedding matrix with the predicted embedding matrix is greater than or equal to a second predefined threshold; and, if so, detect the anomalous log event associated with the anomalous embedding matrix for detecting the one or more anomalies in the event data. Hajimirsadeghi, cited above and considered the closest prior art, features a general discussion of calculating degrees of similarity with respect to a packet similarity score: [0160] All other vectors are subject to correlation and comparison in pairs as the horizontal lines suggest. Each individual comparison entails a pair of opposite vectors, such as 712 compared to 731. The fitness (i.e. closeness) of each comparison, or lack of fitness, may be measured as a packet anomaly score such as 741-742, such as according to a prediction error calculation, such as mean squared. 
For example, if vectors 712 and 731 are mostly similar, and vectors 713 and 732 are divergent (i.e. discrepant), then packet anomaly score 742 would exceed packet anomaly score 741. Still, this stops well short of the specificity of the claims. Additional references considered include: Tang et al. (US 20170302516), which teaches: [0023] To overcome the lack of intrinsic distance measured among entities, the entities are embedded into a common latent space where their semantic can be preserved. To be more specific, each entity, such as a user, or a process in computer systems, is represented as a d-dimensional vector and will be automatically learned from the data. In the embedding space, the distance of entities can be naturally computed by distance/similarity measures in the space, such as Euclidean distances, vector dot product, and so on. Compared with other distance/similarity metrics defined on sets, such as the Jaccard similarity coefficient, the embedding method is more flexible and has nice properties such as transitivity. Byeon et al. (US 20220027608), which teaches: [0047] The embedder learning unit 30 causes the context embedder neural networks 32a and 32b and the distance learning neural network 31 to perform learning using a backpropagation algorithm. For example, with respect to the pairs of embedding vectors generated by the context embedder neural networks 32a and 32b, the distance value of an output layer calculated by the distance learning neural network 31 is compared with the reference value d included in the context learning data to calculate the weight W so as to minimize a difference between both values. The process of calculating the weight W until the difference between the distance value of the output layer and the set reference value falls within a predetermined range is referred to as learning. Acharjee et al. 
(US 20220103418), which teaches: [0033] To test if an incoming log key m_t (parsed from an incoming log entry timestamped for time t) is to be considered normal or abnormal, machine learning model 300 receives inputs in the window w of {m_t-h, . . . , m_t-1} and outputs the probability distribution Pr[m_t|w]={k_1:p_1, k_2:p_2, . . . , k_n:p_n} describing the probability (p) for each log key (k) from the set K to appear as the next log key value from time t given the history from time t-h to time t-1. In practice, multiple log key values may appear as m_t and not be anomalous. The possible log keys k in the set K are therefore sorted based on their probabilities Pr[m_t|w], so that the incoming log key for time t is treated as normal if that value is among the top g candidates in the probability distribution, but is otherwise treated as being anomalous if outside of the top g candidates. A network operator may set the value for g so the incoming log key is compared against more or fewer predicted log keys for time t. For example, if the sorted probability distribution Pr based on the window w of {k_22, k_5, k_11, k_9, k_11} were {k_57, k_26, k_108, k_19, k_11, . . . k_n}, when g is set to 3, any of {k_57, k_26, k_108} will be treated as non-anomalous for m_t, and when g is set t
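The top-g candidate check described in Acharjee's [0033] can be sketched directly; the probability values below are illustrative, not Acharjee's:

```python
def is_anomalous(prob_dist, incoming_key, g):
    """Top-g candidate check (sketch): sort the predicted next-key
    distribution by probability and treat the incoming key as normal
    only if it ranks among the g most probable candidates."""
    ranked = sorted(prob_dist, key=prob_dist.get, reverse=True)
    return incoming_key not in ranked[:g]
```

With g = 3 and a distribution whose three most probable keys are k_57, k_26, and k_108, those three keys pass as normal and any other incoming key is flagged, matching the example in the quoted paragraph.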

Prosecution Timeline

Feb 27, 2023
Application Filed
Nov 14, 2025
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602710
ENSEMBLE OF LANGUAGE MODELS FOR IMPROVED USER SUPPORT
2y 5m to grant Granted Apr 14, 2026
Patent 12555122
OMNI-CHANNEL CONTEXT SHARING
2y 5m to grant Granted Feb 17, 2026
Patent 12548095
Artificial Intelligence for Sump Pump Monitoring and Service Provider Notification
2y 5m to grant Granted Feb 10, 2026
Patent 12547996
COMPUTING SYSTEM FOR SHARING NETWORKS PROVIDING SHARED RESERVE FEATURES AND RELATED METHODS
2y 5m to grant Granted Feb 10, 2026
Patent 12541775
UNIQUE METHOD OF PROCESSING API DATA SUPPORTING WIDE VARIETY OF DATA TYPES AND MULTIPLE/SINGULAR FORMATS WITHOUT DATA DUPLICATION
2y 5m to grant Granted Feb 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
33%
Grant Probability
77%
With Interview (+44.2%)
4y 1m
Median Time to Grant
Low
PTA Risk
Based on 373 resolved cases by this examiner. Grant probability derived from career allow rate.
