Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Note: The claims are not directed to patent-ineligible subject matter under 35 U.S.C. 101.
Step 1: IS THE CLAIM DIRECTED TO A PROCESS, MACHINE, MANUFACTURE OR COMPOSITION OF MATTER?
Yes
Step 2A.1: IS THE CLAIM DIRECTED TO A LAW OF NATURE, A NATURAL PHENOMENON (PRODUCT OF NATURE) OR AN ABSTRACT IDEA?
No
Step 2A.2: DOES THE CLAIM RECITE ADDITIONAL ELEMENTS THAT INTEGRATE THE JUDICIAL EXCEPTION INTO A PRACTICAL APPLICATION?
Yes, if the claims are alternatively construed to recite an abstract idea at Step 2A Prong One. The claims seek to improve the accuracy, speed, and resource consumption of a learning model, an improvement that is supported by the specification (see, e.g., paragraphs 0018 and 0026) and reflected in the claims. In other words, the claims enable the invention to improve accuracy, speed, and resource consumption by utilizing time-series-based discrete object statuses over a window, using dual models to derive embedded attributes for future characteristic prediction, while training a learning model of the dual models in the precise manner claimed.
Supported by the following:
In Finjan Inc. v. Blue Coat Systems, Inc., 879 F.3d 1299, 125 USPQ2d 1282 (Fed. Cir. 2018), the claimed invention was a method of virus scanning that scans an application program, generates a security profile identifying any potentially suspicious code in the program, and links the security profile to the application program. 879 F.3d at 1303-04, 125 USPQ2d at 1285-86. The Federal Circuit noted that the recited virus screening was an abstract idea, and that merely performing virus screening on a computer does not render the claim eligible. 879 F.3d at 1304, 125 USPQ2d at 1286. The court then continued with its analysis under part one of the Alice/Mayo test by reviewing the patent’s specification, which described the claimed security profile as identifying both hostile and potentially hostile operations. The court noted that the security profile thus enables the invention to protect the user against both previously unknown viruses and “obfuscated code,” as compared to traditional virus scanning, which only recognized the presence of previously-identified viruses. The security profile also enables more flexible virus filtering and greater user customization. 879 F.3d at 1304, 125 USPQ2d at 1286. The court identified these benefits as improving computer functionality, and verified that the claims recite additional elements (e.g., specific steps of using the security profile in a particular way) that reflect this improvement. Accordingly, the court held the claims eligible as not being directed to the recited abstract idea. 879 F.3d at 1304-05, 125 USPQ2d at 1286-87. This analysis is equivalent to the Office’s analysis of determining that the additional elements integrate the judicial exception into a practical application at Step 2A Prong Two, and thus that the claims were not directed to the judicial exception (Step 2A: NO).
Examples of claims that improve technology and are not directed to a judicial exception include: Enfish, LLC v. Microsoft Corp., 822 F.3d 1327, 1339, 118 USPQ2d 1684, 1691-92 (Fed. Cir. 2016) (claims to a self-referential table for a computer database were directed to an improvement in computer capabilities and not directed to an abstract idea); McRO, Inc. v. Bandai Namco Games Am. Inc., 837 F.3d 1299, 1315, 120 USPQ2d 1091, 1102-03 (Fed. Cir. 2016) (claims to automatic lip synchronization and facial expression animation were directed to an improvement in computer-related technology and not directed to an abstract idea); Visual Memory LLC v. NVIDIA Corp., 867 F.3d 1253, 1259-60, 123 USPQ2d 1712, 1717 (Fed. Cir. 2017) (claims to an enhanced computer memory system were directed to an improvement in computer capabilities and not an abstract idea); Finjan Inc. v. Blue Coat Systems, Inc., 879 F.3d 1299, 125 USPQ2d 1282 (Fed. Cir. 2018) (claims to virus scanning were found to be an improvement in computer technology and not directed to an abstract idea); SRI Int’l, Inc. v. Cisco Systems, Inc., 930 F.3d 1295, 1303 (Fed. Cir. 2019) (claims to detecting suspicious activity by using network monitors and analyzing network packets were found to be an improvement in computer network technology and not directed to an abstract idea). Additional examples are provided in MPEP § 2106.05(a).
Regarding the December 5, 2025 Memorandum in light of the September 26, 2025 Appeals Review Panel decision in Ex parte Desjardins, Appeal No. 2024-000567 (Application No. 16/319,040), concerning whether a recited abstract idea does or does not direct the claim, considered as a whole, to an abstract idea:
Paragraph 21 of the Specification, which the Appellant cites, identifies improvements in training the machine learning model itself. Of course, such an assertion in the Specification alone is insufficient to support a patent eligibility determination, absent a subsequent determination that the claim itself reflects the disclosed improvement. See MPEP § 2106.05(a) (citing Intellectual Ventures I LLC v. Symantec Corp., 838 F.3d 1307, 1316 (Fed. Cir. 2016)). Here, however, we are persuaded that the claims reflect such an improvement. For example, one improvement identified in the Specification is to "effectively learn new tasks in succession whilst protecting knowledge about previous tasks." Spec. ¶ 21. The Specification also recites that the claimed improvement allows artificial intelligence (AI) systems to "us[e] less of their storage capacity" and enables "reduced system complexity." Id. When evaluating the claim as a whole, we discern at least the following limitation of independent claim 1 that reflects the improvement: "adjust the first values of the plurality of parameters to optimize performance of the machine learning model on the second machine learning task while protecting performance of the machine learning model on the first machine learning task." We are persuaded that this constitutes an improvement to how the machine learning model itself operates, and not, for example, the identified mathematical calculation. Under a charitable view, the overbroad reasoning of the original panel below is perhaps understandable given the confusing nature of existing § 101 jurisprudence, but troubling, because this case highlights what is at stake. Categorically excluding AI innovations from patent protection in the United States jeopardizes America's leadership in this critical emerging technology. Yet, under the panel's reasoning, many AI innovations are potentially unpatentable, even if they are adequately described and nonobvious, because the panel essentially equated any machine learning with an unpatentable "algorithm" and the remaining additional elements as "generic computer components," without adequate explanation. Dec. 24. Examiners and panels should not evaluate claims at such a high level of generality.
Specifically, Ex Parte Desjardins explained the following:
Enfish ranks among the Federal Circuit's leading cases on the eligibility of technological improvements. In particular, Enfish recognized that “[m]uch of the advancement made in computer technology consists of improvements to software that, by their very nature, may not be defined by particular physical features but rather by logical structures and processes.” 822 F.3d at 1339. Moreover, because “[s]oftware can make non-abstract improvements to computer technology, just as hardware improvements can,” the Federal Circuit held that the eligibility determinations should turn on whether “the claims are directed to an improvement to computer functionality versus being directed to an abstract idea.” Id. at 1336. (Desjardins, page 8).
Further, in Ex parte Desjardins, Appeal No. 2024-000567 (PTAB September 26, 2025, Appeals Review Panel decision) (precedential), the claimed invention was a method of training a machine learning model on a series of tasks. The Appeals Review Panel (ARP) credited benefits including reduced storage, reduced system complexity and streamlining, and preservation of performance attributes associated with earlier tasks during subsequent computational tasks as technological improvements disclosed in the patent application specification. Specifically, the ARP upheld the Step 2A Prong One finding that the claims recited an abstract idea (i.e., a mathematical concept). At Step 2A Prong Two, the ARP then determined that the specification identified improvements in how the machine learning model itself operates, including training a machine learning model to learn new tasks while protecting knowledge about previous tasks to overcome the problem of “catastrophic forgetting” encountered in continual learning systems. Importantly, the ARP evaluated the claims as a whole in discerning that at least the limitation “adjust the first values of the plurality of parameters to optimize performance of the machine learning model on the second machine learning task while protecting performance of the machine learning model on the first machine learning task” reflected the improvement disclosed in the specification. Accordingly, the claims as a whole integrated what would otherwise be a judicial exception into a practical application at Step 2A Prong Two, and therefore the claims were not directed to the judicial exception and were patent eligible.
The claim itself does not need to explicitly recite the improvement described in the specification (e.g., “thereby increasing the bandwidth of the channel”). See, e.g., Ex parte Desjardins, Appeal No. 2024-000567 (PTAB September 26, 2025, Appeals Review Panel decision) (precedential), in which the specification identified the improvement to machine learning technology by explaining how the machine learning model is trained to learn new tasks while protecting knowledge about previous tasks to overcome the problem of “catastrophic forgetting,” and the claims reflected the improvement identified in the specification. Indeed, the improvements enumerated in the Desjardins specification included the effective learning of new tasks in succession while specifically protecting knowledge concerning previously accomplished tasks, reduced use of storage capacity, and reduced system complexity. Such improvements concerned how the machine learning model itself operates and therefore were not subsumed in the identified mathematical calculation.
Step 2B: DOES THE CLAIM RECITE ADDITIONAL ELEMENTS THAT AMOUNT TO SIGNIFICANTLY MORE THAN THE JUDICIAL EXCEPTION?
Even if the claims were found to recite an abstract idea at Step 2A Prong One and were found not to integrate that idea into a practical application at Step 2A Prong Two, the claims would still amount to significantly more than the exception. The claimed system improves accuracy, speed, and resource consumption by utilizing time-series-based discrete object statuses over a window, using dual models to derive embedded attributes for future characteristic prediction, while training a learning model of the dual models in the precise manner claimed. A human would not perform such claim limitations as a general or routine step; the steps improve the functioning of a learning model by performing significant encoding and embedding for the purpose of training a model for prediction, or similar technology with the specific limitations recited. A person would not simply implement such steps as a mental process per se, nor would utilizing two models with the specific limitations as claimed amount to mere generic computer use or extra-solution activity. The claim language provides significance beyond mere data manipulation or mental decision-making, and is also analogous to interim guidance examples. Additionally, even if one of ordinary skill in the art were to fail to properly consider any specified hardware or arranged components in the claims, consider the claim limitations generic, or simply disregard Enfish, the claims still demonstrate an improvement to the functioning of a computer (or technology), analogous to the modification of conventional Internet hyperlink protocol to dynamically produce a dual-source hybrid webpage discussed in DDR Holdings, LLC v. Hotels.com, L.P., 773 F.3d 1245, 1258-59, 113 USPQ2d 1097, 1106-07 (Fed. Cir. 2014) (see MPEP § 2106.05(a)). Therefore, in light of the above as a whole, a rejection under 35 U.S.C. 101 is not warranted.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-4, 7, 10-15, 19, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Guo et al., LogBERT: Log Anomaly Detection via BERT, 2021 International Joint Conference on Neural Networks (IJCNN), all pages (hereinafter Guo).
Re claim 12, Guo teaches
A method executed by one or more processors of a computer system, the method comprising: (Section III, "Methodology," and Section IV, "Experiments," describe a computational framework for training deep neural networks on system log datasets such as HDFS and BlueGene, which inherently requires execution by computer processors; further note the discussion of online computer systems and the associated environment within which the disclosure is used)
receiving one or more time series indicating discrete status information for an object (e.g. log keys represent detailed information about computational events generated by computer systems... also note “discrete log messages”; see I. INTRODUCTION)…
over a time window… (e.g. sliding time window; see IV. EXPERIMENTS, A. Experimental Setup), wherein the discrete status information includes a set of discrete values representing discrete statuses associated with the object over the time window (e.g. collecting log messages from computer systems and parsing them into "log sequences" S = {k1, ..., kt, ..., kT}, where each kt is a "log key" representing a discrete system event or status; see Section III.A, "Framework": "we first extract log keys (string templates) from log messages... define a log sequence as a sequence of ordered log keys");
generating one or more time series encodings based on the one or more time series (Guo discloses converting the sequence of log keys into input vectors, and the act of converting each initial key into a vector is encoding; see Section III.A, "Input Representation": "LogBERT then represents each log key kjt as an input representation xjt, where the representation xjt is a summation of a log key embedding and a position embedding");
providing the one or more time series encodings as input to a trained natural language processing (NLP) model (e.g., using a Transformer-based model inspired by BERT, a well-known NLP model, to process the input representations; see Section III.A, "Transformer Encoder": "LogBERT adopts Transformer encoder... Inspired by BERT"), the trained NLP model being configured to generate one or more output embeddings corresponding to the one or more time series encodings (i.e., a new embedding generated from the already-encoded data; see Section III.A: "we denote ht as the contextual embedding vector of the log key kt produced by the Transformer encoder");
determining attributes associated with the object based on the one or more output embeddings (e.g. calculating probability distributions or distances based on the embeddings; see Section III.B, "Objective Function": "feed the contextual embedding vector... to a softmax function, which will output a probability distribution over the entire set of log keys" and Section III.C: calculating the anomalous score); and
providing the attributes for use with a machine-learning model configured to predict a characteristic of the object (the claim phrase "for use" indicates only the ability to provide the outputs to a model, which does not exclude the BERT model itself or any other model per se; Guo discloses using the probability attributes to perform anomaly detection; see Section III.C, "Anomaly Detection": "derive the anomalous score of a log sequence based on the prediction results... If the observed log key is not in the top-g candidate set... we will label this log sequence as anomalous," where the anomaly detection logic constitutes the machine-learning model predicting the characteristic of being anomalous).
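For context only (this illustration forms no part of the claim mapping), the following is a minimal sketch of the LogBERT-style pipeline Guo describes in Sections III.A-III.C: log keys are embedded, summed with position embeddings, passed through a Transformer encoder to produce contextual embeddings, projected to a probability distribution over the set of log keys, and checked against a top-g candidate set. All names, dimensions, and hyperparameters below are hypothetical, not taken from Guo.

```python
# Illustrative sketch only: a LogBERT-style pipeline per Guo Sections III.A-III.C.
# Vocabulary size, dimensions, and hyperparameters are hypothetical.
import torch
import torch.nn as nn

VOCAB, DIM, SEQ_LEN, G = 300, 64, 32, 5

key_emb = nn.Embedding(VOCAB, DIM)       # log key embedding
pos_emb = nn.Embedding(SEQ_LEN, DIM)     # position embedding
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True),
    num_layers=2,
)
to_keys = nn.Linear(DIM, VOCAB)          # projection over the set of log keys

keys = torch.randint(0, VOCAB, (1, SEQ_LEN))        # one log sequence S = {k1..kT}
x = key_emb(keys) + pos_emb(torch.arange(SEQ_LEN))  # x_t = key emb + position emb
h = encoder(x)                                      # contextual embeddings h_t
probs = to_keys(h).softmax(dim=-1)                  # distribution over log keys

# Top-g check: an observed key outside the g most probable candidates is anomalous.
top_g = probs.topk(G, dim=-1).indices
anomalous = ~(top_g == keys.unsqueeze(-1)).any(dim=-1)
```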
Re claim 1, this claim is rejected on the same basis as claim 12; it recites a broader or narrower claim that differs only in the general inclusion of hardware elements (e.g., processor, memory, instructions), representing claim 12 with hardware included or omitted, and otherwise amounting to a virtually identical scope.
Re claim 20, this claim is likewise rejected on the same basis as claim 12, differing only in the general inclusion or omission of hardware elements and otherwise amounting to a virtually identical scope.
Re claims 2 and 13, Guo teaches
2. The system of claim 1, wherein the one or more time series includes a first time series and a second time series, the one or more time series encodings includes a first time-series encoding and a second time-series encoding, and the one or more memories further comprise instructions that are executable by the one or more processors for causing the one or more processors to: (Section III.A: the initial keys are encoded as key embeddings and a second set of encodings are the position embeddings; the combination thereof is xjt, a summation of a log key embedding and a position embedding)
generate a first embedding using the first time-series encoding; (Section III.A: the initial keys are encoded as key embeddings)
generate a second embedding using the second time-series encoding; (Section III.A: the second set of encodings are the position embeddings)
generate an aggregated embedding by aggregating the first embedding and the second embedding; and (Section III.A: the combination of the two is xjt, a summation of a log key embedding and a position embedding)
determine the attributes based on the aggregated embedding. (using Section III.A, then calculating probability distributions or distances based on the embeddings; see Section III.B, "Objective Function": "feed the contextual embedding vector... to a softmax function, which will output a probability distribution over the entire set of log keys," and Section III.C: calculating the anomalous score)
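As an illustration of the mapping above (a hypothetical sketch only; the tensor sizes and names are assumptions), two separate encodings each yield an embedding, and the two embeddings are aggregated by summation, mirroring Guo's xjt = log key embedding + position embedding:

```python
# Hypothetical sketch: two embeddings aggregated by summation, mirroring
# Guo's x_t = log key embedding + position embedding. Sizes are illustrative.
import torch
import torch.nn as nn

first_encoding = torch.tensor([3, 7, 1])    # first time-series encoding (log key ids)
second_encoding = torch.arange(3)           # second encoding (position ids)

first_embedding = nn.Embedding(300, 64)(first_encoding)    # first embedding
second_embedding = nn.Embedding(32, 64)(second_encoding)   # second embedding
aggregated = first_embedding + second_embedding            # aggregated embedding
```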
Re claims 3 and 14, Guo teaches
3. The system of claim 2, wherein the first embedding is a first document embedding, the second embedding is a second document embedding, and the one or more memories further comprise instructions that are executable by the one or more processors for causing the one or more processors to: (LogBERT, like BERT generally, is an NLP model that handles text such as a document or a packet containing text; see Sections I and II)
generate the first document embedding by supplying the first time-series encoding as input to the trained NLP model, the trained NLP model being configured to output the first document embedding based on the first time-series encoding; (under the premise of documents per se, as LogBERT or BERT generally is an NLP model that handles text such as a document, see Sections I and II; per Section III.A, the initial keys are encoded as key embeddings and a second set of encodings are the position embeddings, the combination thereof being xjt, a summation of a log key embedding and a position embedding) and
generate the second document embedding by supplying the second time-series encoding as input to the trained NLP model, the trained NLP model being configured to output the second document embedding based on the second time-series encoding. (under the same premise; per Section III.A, the second set of encodings are the position embeddings, combined into xjt as above)
Re claims 4 and 15, Guo teaches
4. The system of claim 2, wherein the first embedding is a first document embedding, the second embedding is a second document embedding, and the one or more memories further comprise instructions that are executable by the one or more processors for causing the one or more processors to: (LogBERT, like BERT generally, is an NLP model that handles text such as a document or a packet containing text; see Sections I and II)
generate the first document embedding by aggregating a first set of word embeddings generated by the trained NLP model based on the first time-series encoding; and (under the premise of documents per se, as above; per Section III.A, the initial keys are encoded as key embeddings and a second set of encodings are the position embeddings, the combination thereof being xjt, a summation of a log key embedding and a position embedding)
generate the second document embedding by aggregating a second set of word embeddings generated by the trained NLP model based on the second time-series encoding. (under the same premise; per Section III.A, the key embedding and position embedding are aggregated into xjt)
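To illustrate the document-embedding readings of claims 3/14 and 4/15 (a hypothetical sketch; mean-pooling is an assumed aggregation scheme, not taken from Guo), a document embedding may be formed by aggregating the set of token-level embeddings output by the NLP model:

```python
# Hypothetical sketch: aggregate a set of word/token embeddings from the NLP
# model into a single document embedding. Mean-pooling is an assumption.
import torch

word_embeddings = torch.randn(32, 64)             # per-token embeddings for one encoding
document_embedding = word_embeddings.mean(dim=0)  # aggregated document embedding (64-dim)
```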
Re claim 7, Guo teaches
7. The system of claim 1, wherein the one or more memories further comprise instructions that are executable by the one or more processors for causing the one or more processors to:
generate a set of training data that includes encodings of a plurality of time series of discrete values; and (BERT-style models inherently operate on discrete sequential data, e.g., within a sliding time window, see Section IV.A, "Experimental Setup"; the discrete status information includes a set of discrete values representing discrete statuses associated with the object over the time window, e.g., collecting log messages from computer systems and parsing them into "log sequences" S = {k1, ..., kt, ..., kT}, where each kt is a "log key" representing a discrete system event or status; see Section III.A, "Framework": "we first extract log keys (string templates) from log messages... define a log sequence as a sequence of ordered log keys")
train a NLP model based on the set of training data to produce the trained NLP model. (the model is a BERT-style model that inherently operates on discrete sequential data as in Section III.A, e.g., within a sliding time window, see Section IV.A, "Experimental Setup"; the training data comprises the log sequences described above)
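For illustration only (a hypothetical sketch; the toy model, optimizer, masking rate, and mask id are assumptions, though the masked-key objective mirrors BERT-style training in spirit), training an NLP model on encodings of discrete-value time series can look like:

```python
# Hypothetical sketch: masked-key training on encoded sequences of discrete values.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Embedding(300, 64), nn.Linear(64, 300))  # toy stand-in
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

seqs = torch.randint(0, 300, (8, 32))    # training data: encoded discrete sequences
mask = torch.rand(seqs.shape) < 0.15     # mask ~15% of positions
inputs = seqs.masked_fill(mask, 0)       # 0 = hypothetical [MASK] id

logits = model(inputs)                   # predict the original keys at masked spots
loss = nn.functional.cross_entropy(logits[mask], seqs[mask])
loss.backward()
opt.step()
opt.zero_grad()
```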
Re claims 10 and 19, Guo teaches
10. The system of claim 1, wherein the one or more memories further comprise instructions that are executable by the one or more processors for causing the one or more processors to:
generate a set of training data that includes one or more of the attributes as one or more output variables; and (the attributes are output during the BERT training iterations; Guo calculates probability distributions or distances based on the embeddings; see Section III.B, "Objective Function": "feed the contextual embedding vector... to a softmax function, which will output a probability distribution over the entire set of log keys," and Section III.C: calculating the anomalous score)
train the machine-learning model based on the set of training data to produce a trained machine-learning model. (the model is a BERT-style model that inherently operates on discrete sequential data as in Section III.A, e.g., within a sliding time window, see Section IV.A, "Experimental Setup"; the training data comprises log sequences S = {k1, ..., kt, ..., kT}, where each kt is a "log key" representing a discrete system event or status; see Section III.A, "Framework": "we first extract log keys (string templates) from log messages... define a log sequence as a sequence of ordered log keys")
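As a sketch of the claims 10/19 reading (hypothetical; the logistic-regression stand-in and the derived attribute are illustrative assumptions, not from Guo), training data can pair inputs with one or more derived attributes as output variables, on which the downstream machine-learning model is then trained:

```python
# Hypothetical sketch: derived attributes serve as output variables for
# training a downstream machine-learning model.
import numpy as np
from sklearn.linear_model import LogisticRegression

embeddings = np.random.rand(200, 64)                       # inputs from the NLP stage
attributes = (embeddings.mean(axis=1) > 0.5).astype(int)   # attribute as output variable

trained = LogisticRegression().fit(embeddings, attributes)  # trained ML model
```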
Re claim 11, Guo teaches
11. The system of claim 1, wherein the machine-learning model is a trained machine-learning model, and wherein the one or more memories further comprise instructions that are executable by the one or more processors for causing the one or more processors to:
provide the attributes as input to the trained machine-learning model, the machine-learning model being configured to output a prediction of the characteristic of the object based on the attributes; and (using the known operation of a BERT or LogBERT model, e.g., calculating probability distributions or distances based on the embeddings; see Section III.B, "Objective Function": "feed the contextual embedding vector... to a softmax function, which will output a probability distribution over the entire set of log keys," and Section III.C: calculating the anomalous score)
automatically perform one or more control operations based on the prediction. (prediction as in Sections III.A to III.C, using the known operation of a BERT or LogBERT model, e.g., labeling a log sequence as anomalous based on the anomalous score; see Section III.C)
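As a sketch of the claim 11 reading (hypothetical; the threshold and the control action are assumptions, not taken from Guo), a trained model's prediction can automatically trigger a control operation:

```python
# Hypothetical sketch: a prediction from a trained model automatically
# drives a control operation. Threshold and action are illustrative.
def quarantine_log_source():
    print("control operation: source quarantined")  # placeholder action

def control_loop(attributes, trained_model, threshold=0.5):
    score = trained_model(attributes)   # e.g., a Guo-style anomalous score
    if score > threshold:               # prediction triggers the operation
        quarantine_log_source()

# usage with a trivial stand-in model
control_loop([0.1, 0.9], trained_model=lambda a: max(a))
```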
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 5, 6, 8, 9, 16, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al., LogBERT: Log Anomaly Detection via BERT, 2021 International Joint Conference on Neural Networks (IJCNN), all pages (hereinafter Guo) in view of US 2023/0035639 A1 (Zhao; Runhua et al.) (hereinafter Zhao).
Re claims 5 and 16, while Guo teaches multi-embedding handling using an NLP model such as BERT to handle text/strings/words, it fails to teach n-grams per se:
5. The system of claim 1, wherein the one or more memories further comprise instructions that are executable by the one or more processors for causing the one or more processors to:
generate a set of n-grams based on the set of discrete values in the one or more time series; and (NOTE: "n-grams" are not taught by Guo, but Guo teaches the structure of handling encoded embeddings, which is substitutable with n-grams as text. The model is a BERT-style model that inherently operates on discrete sequential data as in Section III.A, e.g., within a sliding time window, see Section IV.A, "Experimental Setup"; the discrete status information includes a set of discrete values representing discrete statuses associated with the object over the time window, e.g., collecting log messages from computer systems and parsing them into "log sequences" S = {k1, ..., kt, ..., kT}, where each kt is a "log key" representing a discrete system event or status; see Section III.A, "Framework": "we first extract log keys (string templates) from log messages... define a log sequence as a sequence of ordered log keys")
determine additional attributes associated with the object based on the set of n-grams. (NOTE: "n-grams" are not taught by Guo, but Guo teaches the structure of handling encoded embeddings, substitutable with n-grams as text; the same Section III.A and Section IV.A disclosures apply as set forth above)
Guo fails to teach:
n-grams (Zhao, paragraphs 0027 and 0034, with Figs. 1B and 1E: n-gram handling and multi-vector embeddings output into a cluster model)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Guo to incorporate the above claim limitations as taught by Zhao, as a simple substitution of one known element for another to obtain predictable results, namely substituting Zhao's n-grams for Guo's text tokens. The use of n-grams allows the model to better capture specific, ordered, multi-word phrases (e.g., "not good" vs. "not very good") that might otherwise be overlooked, and helps with out-of-vocabulary (OOV) words, which at least improves accuracy in tasks such as text classification and sentiment analysis by combining fine-grained, phrase-level nuances with deep, bidirectional, semantic understanding.
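For illustration of the n-gram substitution rationale (a hypothetical sketch; the helper name and sample values are assumptions, not from Guo or Zhao), n-grams over a sequence of discrete status values can be generated as follows:

```python
# Hypothetical sketch: n-grams over the discrete values of a time series.
def ngrams(seq, n):
    """Return all contiguous n-grams of seq as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

statuses = ["ok", "ok", "warn", "fail", "ok"]   # discrete statuses over a window
print(ngrams(statuses, 2))
# [('ok', 'ok'), ('ok', 'warn'), ('warn', 'fail'), ('fail', 'ok')]
```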
Re claims 6 and 17, while Guo teaches multi-embedding handling using an NLP model such as BERT to handle text/strings/words, it fails to teach n-grams per se:
6. The system of claim 5, wherein the one or more time series includes a first time series and a second time series, and wherein the one or more memories further comprise instructions that are executable by the one or more processors for causing the one or more processors to: (see Section IV.A, "Experimental Setup," where the discrete status information includes a set of discrete values; vectors thereof are formed per Section III.A, and probability distributions or distances are then calculated based on the embeddings; see Section III.B, "Objective Function": "feed the contextual embedding vector... to a softmax function, which will output a probability distribution over the entire set of log keys," and Section III.C: calculating the anomalous score)
generate a first set of n-grams based on the first time series; (NOTE: "n-grams" are not taught by Guo, but Guo teaches the structure of handling encoded embeddings, substitutable with n-grams as text; a first set of vectors, i.e., multiple embeddings from multiple encodings per Section III.A, is processed by the BERT-style model operating on discrete sequential data within a sliding time window, see Section IV.A, with probability distributions or distances calculated per Sections III.B and III.C as above)
generate a second set of n-grams based on the second time series; (NOTE: "n-grams" are not taught by Guo, but substitutable as above; a second set of vectors, i.e., multiple embeddings from multiple encodings per Section III.A, is processed per Sections III.B and III.C as above)
generate an aggregated n-gram vector by aggregating the first set of n-grams and the second set of n-grams; and (NOTE: "n-grams" are not taught by Guo, but substitutable as above; per Section III.A, the initial keys are encoded as key embeddings and a second set of encodings are the position embeddings, the combination thereof being xjt, a summation of a log key embedding and a position embedding, i.e., an aggregation of two sets of vectors)
determine the additional attributes associated with the object based on the aggregated n-gram vector. (NOTE: "n-grams" are not taught by Guo, but substitutable as above; probability distributions or distances are calculated based on the aggregated embeddings; see Section III.B, "Objective Function," and Section III.C: calculating the anomalous score)
Guo fails to teach:
n-grams (Zhao, paragraphs 0027 and 0034, with Figs. 1B and 1E: n-gram handling and multi-vector embeddings output into a cluster model)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Guo to incorporate the above claim limitations as taught by Zhao, as a simple substitution of one known element for another to obtain predictable results, namely substituting Zhao's n-grams for Guo's text tokens. The use of n-grams allows the model to better capture specific, ordered, multi-word phrases (e.g., "not good" vs. "not very good") that might otherwise be overlooked, and helps with out-of-vocabulary (OOV) words, which at least improves accuracy in tasks such as text classification and sentiment analysis by combining fine-grained, phrase-level nuances with deep, bidirectional, semantic understanding.
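To illustrate the aggregated n-gram vector of claims 6/17 (a hypothetical sketch; the count-vector aggregation scheme is an assumption, not taken from Guo or Zhao), n-gram sets from two time series can be aggregated into a single vector of counts:

```python
# Hypothetical sketch: aggregate n-gram sets from two time series into one
# n-gram count vector.
from collections import Counter

def ngrams(seq, n=2):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

first = ngrams(["ok", "warn", "fail"])          # first set of n-grams
second = ngrams(["ok", "ok", "warn"])           # second set of n-grams

aggregated = Counter(first) + Counter(second)   # aggregated n-gram vector
print(aggregated)
```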
Re claims 8 and 18, while Guo teaches multi-embedding handling using an NLP model like BERT, it fails to teach clustering per se:
8. The system of claim 1, wherein the one or more memories further comprise instructions that are executable by the one or more processors for causing the one or more processors to:
determine at least one additional attribute associated with the object by providing the one or more output embeddings as input to a trained clustering model, wherein the trained clustering model is configured to select a particular cluster among a plurality of candidate clusters based on the one or more output embeddings, the particular cluster being indicative of the at least one additional attribute. (Zhao, paragraphs 0027 and 0034, with Figs. 1B and 1E: multi-vector embeddings output into a cluster model)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Guo to incorporate the above claim limitations as taught by Zhao, as a simple substitution of one known element for another to obtain predictable results, namely a model that receives embedding vectors as input to a clustering model to be trained, such as K-means or similar. This significantly improves text clustering by leveraging deep, contextualized semantic embeddings rather than simple word counts, and also enhances clustering accuracy, identifies nuanced topics, handles high-dimensional data better, and produces well-separated, semantically rich clusters.
Re claim 9, while Guo teaches multi-embedding handling using an NLP model like BERT, it fails to teach clustering per se:
9. The system of claim 8, wherein the one or more memories further comprise instructions that are executable by the one or more processors for causing the one or more processors to:
train a clustering model based on a set of training data to produce the trained clustering model. (Zhao, paragraphs 0027 and 0034, with Figs. 1B and 1E: multi-vector embeddings output into a cluster model)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Guo to incorporate the above claim limitations as taught by Zhao, as a simple substitution of one known element for another to obtain predictable results, namely training a clustering model, such as K-means or similar, on embedding vectors. This significantly improves text clustering by leveraging deep, contextualized semantic embeddings rather than simple word counts, and also enhances clustering accuracy, identifies nuanced topics, handles high-dimensional data better, and produces well-separated, semantically rich clusters.
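As an illustration of the clustering rationale (a hypothetical sketch using scikit-learn's K-means, which the rationale itself suggests; the data shapes are assumptions), a clustering model is trained on output embeddings and then selects a particular cluster for a new embedding:

```python
# Hypothetical sketch: train a clustering model on output embeddings, then
# select a cluster (indicative of an additional attribute) for a new embedding.
import numpy as np
from sklearn.cluster import KMeans

embeddings = np.random.rand(100, 64)             # output embeddings (training data)
clusterer = KMeans(n_clusters=5, n_init=10).fit(embeddings)   # trained clustering model

new_embedding = np.random.rand(1, 64)
cluster_id = clusterer.predict(new_embedding)[0]  # selected cluster -> attribute
```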
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20230096821 A1 Huang; Ronny et al.
Using an ASR model to encode, embed, and provide signal features for a neural network model when a user requests information in a chat by text or speech input
US 20230315781 A1 HUANG; Li et al.
Taking a segment of time and embedding data for multiple users searching images based on user profile metadata such as age, location, and gender
US 20210350786 A1 Chen; Zhehuai et al.
Speaker and utterance embeddings for training an ASR model and GAN model for synthesizing speech when a user inputs an ASR request
US 20230004366 A1 Zhang; Qianyu et al.
Encoder to embedding to decoder transformation independent of modeling and NLP
US 20240169147 A1 Sharma; Utkarsh Hemant Kumar et al.
Embedded data created by using an NLP model to produce an encoded narration
US 12190060 B1 Tavanaei; Amirhossein et al.
Dual embedding inputs to produce an encoded sequence to sequence output associated with a previous attribute
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C COLUCCI whose telephone number is (571)270-1847. The examiner can normally be reached on M-F 9 AM - 5 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL COLUCCI/Primary Examiner, Art Unit 2655 (571)-270-1847
Examiner FAX: (571)-270-2847
Michael.Colucci@uspto.gov