DETAILED ACTION
This communication is in response to the Application filed on June 7, 2024.
Claims 1 - 20 are pending and have been examined.
Claims 1, 15, 19 and 20 are independent.
PCT/US25/32106 was filed on June 3, 2025.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on January 17, 2025 and October 31, 2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Drawings
The drawings filed on June 7, 2024 have been accepted and considered by the Examiner.
Double Patenting Note
The Examiner notes that previously published U.S. Patent Application Publication 2026/0018251 was analyzed for double patenting. However, based on the current claim scope, no double patenting was found.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Regarding Claim 20,
Claim 20 recites “One or more computer-readable storage media...”
The broadest reasonable interpretation (BRI) of machine-readable media can encompass non-statutory transitory forms of signal transmission, such as a propagating electrical or electromagnetic signal per se. See In re Nuijten, 500 F.3d 1346, 84 USPQ2d 1495 (Fed. Cir. 2007). When the BRI encompasses transitory forms of signal transmission, a rejection under 35 U.S.C. 101 as failing to claim statutory subject matter is appropriate. In the Specification at Pg. 26, the phrase “computer-readable storage media” refers both to “non transitory” media (Pg. 26, Ln. 16) and to a “propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal” (Pg. 26, Ln. 20-21). Under the BRI, “computer-readable storage media” therefore covers transitory propagating signals per se, which are not patent-eligible.
Claims 1 - 4, 7, 9, 10 and 12 - 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1,
Step 1: Claim 1 recites steps for detecting surreptitious speech from a given sequence of text, which falls under the statutory category of a process.
Step 2A Prong I: The limitations (a) “obtaining data representing a sequence of text” and (b) “obtaining a sequence of tokens for the sequence of text comprising one or more groups of tokens”, as drafted, recite a process that, under a broadest reasonable interpretation, encompasses mental processes performed in the human mind using observation, evaluation, and judgment. See MPEP 2106.04(a)(2). Accordingly, the claim recites an abstract idea.
Step 2A Prong II: The additional limitations (c) “wherein each group comprises two or more tokens” and (d) “processing the one or more groups of tokens using a first machine learning model to identify tokens that are out of context in the sequence of tokens”, as drafted, can be practically performed in the human mind using observation, evaluation, and judgment, or by a human using pen and paper. Using observation, evaluation, and judgment, a human can identify tokens representing words that are out of context in the sequence of tokens representing words. Accordingly, the additional element(s) do(es) not integrate the abstract idea into a practical application.
Step 2B: The additional element of limitation (e) “providing data representing the identified tokens” provides nothing more than mere data gathering and output recited at a high level of generality to implement an abstract idea or other exception on a computer, which do not provide an inventive concept. The claim is not patent eligible.
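For purposes of illustration only, the recited steps amount to a conventional text-processing pipeline. The following minimal Python sketch is a hypothetical rendering of limitations (a)-(e); the frequency heuristic is a toy stand-in for the claimed “first machine learning model,” all names are illustrative, and nothing below is of record in the application:

    # Illustrative sketch only; a toy stand-in for the claimed method.
    from collections import Counter
    from typing import List

    def obtain_groups(text: str, group_size: int = 4) -> List[List[str]]:
        # Limitations (a)-(c): obtain the text, obtain a sequence of tokens,
        # and form groups of two or more tokens.
        tokens = text.split()
        return [tokens[i:i + group_size] for i in range(0, len(tokens), group_size)]

    def identify_out_of_context(groups: List[List[str]], threshold: float = 0.1) -> List[str]:
        # Limitation (d), approximated: flag tokens whose relative frequency
        # in the overall sequence falls below a threshold.
        tokens = [t for g in groups for t in g]
        counts = Counter(tokens)
        return [t for t in tokens if counts[t] / len(tokens) < threshold]

    groups = obtain_groups("the meeting is at noon bring the ledger to the meeting")
    print(identify_out_of_context(groups))  # limitation (e): provide the identified tokens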
Regarding Claim 2,
Dependent claim 2 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “wherein the sequence of text represents one or more documents”. However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. No additional elements appear in the claim that apply the abstract idea to a practical application or amount to significantly more than the abstract idea. The claim is not patent eligible.
Regarding Claim 3,
Dependent claim 3 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “wherein obtaining a sequence of tokens for the sequence of text comprises providing the sequence of text as input to a model that is configured to generate a sequence of tokens given an input sequence of text.” However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. Each of the additional limitations is no more than mere instructions to apply the exception using a generic “model” component. The claim is not patent eligible.
Regarding Claim 4,
Dependent claim 4 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “wherein each group represents a sentence fragment of the sequence of text”. However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. No additional elements appear in the claim that apply the abstract idea to a practical application or amount to significantly more than the abstract idea. The claim is not patent eligible.
Regarding Claim 7,
Dependent claim 7 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “wherein processing the one or more groups of tokens using a first machine learning model to identify tokens that are out of context in the sequence of tokens comprises processing the one or more groups of tokens using the first machine learning model to identify phrases that are out of context” and “wherein each phrase comprises two or more consecutive tokens”. However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. No additional elements appear in the claim that apply the abstract idea to a practical application or amount to significantly more than the abstract idea. The claim is not patent eligible.
Regarding Claim 9,
Dependent claim 9 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “wherein identifying one or more phrases comprises identifying one or more phrases that each comprise two or more consecutive tokens and less than a maximum number of consecutive tokens”. However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. No additional elements appear in the claim that apply the abstract idea to a practical application or amount to significantly more than the abstract idea. The claim is not patent eligible.
Regarding Claim 10,
Dependent claim 10 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “wherein the first machine learning model comprises a language model that has been trained on a masked language modeling task”. However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. No additional elements appear in the claim that apply the abstract idea to a practical application or amount to significantly more than the abstract idea. The claim is not patent eligible.
Regarding Claim 12,
Dependent claim 12 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “further comprising identifying high-value tokens from the identified tokens”. However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. No additional elements appear in the claim that apply the abstract idea to a practical application or amount to significantly more than the abstract idea. The claim is not patent eligible.
Regarding Claim 13,
Dependent claim 13 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “wherein identifying high-value tokens from the identified tokens comprises”, “for each of the identified token”, “determining a number of occurrences of the identified token in the sequence of text” and “identifying one or more identified tokens with a number of occurrences over a threshold number of occurrences as high-value tokens”. However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. No additional elements appear in the claim that apply the abstract idea to a practical application or amount to significantly more than the abstract idea. The claim is not patent eligible.
Regarding Claim 14,
Dependent claim 14 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “wherein the sequence of text represents one or more documents originating from one or more authors”, “wherein identifying high-value tokens from the identified tokens comprises”, “obtaining one or more authors of interest from the one or more authors”, “for each of the identified tokens”, “determining a corresponding set of authors for the identified token”, “identifying one or more identified tokens with a corresponding set of authors that includes at least one of the one or more authors of interest as high-value tokens”. However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. No additional elements appear in the claim that apply the abstract idea to a practical application or amount to significantly more than the abstract idea. The claim is not patent eligible.
Regarding Claim 15,
Step 1: Claim 15 recites steps for detecting surreptitious speech from a given sequence of text, which falls under the statutory category of a process.
Step 2A Prong I: The limitations (a) “obtaining data representing a sequence of text” and (b) “dividing the sequence of text into a plurality of segments”, as drafted, recite a process that, under a broadest reasonable interpretation, encompasses mental processes performed in the human mind using observation, evaluation, and judgment. See MPEP 2106.04(a)(2). Accordingly, the claim recites an abstract idea.
Step 2A Prong II: The additional limitations (c) “wherein each segment comprises a plurality of words or subwords that are semantically relevant” and (d) “processing the plurality of segments using a second machine learning model to identify segments that include surreptitious language”, as drafted, can be practically performed in the human mind using observation, evaluation, and judgment, or by a human using pen and paper. Using observation, evaluation, and judgment, a human can identify segments comprising a plurality of words or subwords that are semantically relevant. Accordingly, the additional element(s) do(es) not integrate the abstract idea into a practical application.
Step 2B: The additional element of limitation (e) “providing data representing the identified segments” provides nothing more than mere data gathering and output recited at a high level of generality to implement an abstract idea or other exception on a computer, which do not provide an inventive concept. The claim is not patent eligible.
Regarding Claim 16,
Dependent claim 16 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “wherein processing the plurality of segments using a second machine learning model to identify segments that include surreptitious language comprises”, “for each segment of the plurality of segments”, “providing the segment to the second machine learning model, wherein the second machine learning model is configured to generate a score representing a likelihood that an input segment of text includes language that indicates an author of the segment is hiding information”, “determining that the score for the segment meets a threshold score”, and “in response to determining that the score for the segment meets the threshold score, identifying the segment as including surreptitious language”. However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. Each of the additional limitations is no more than mere instructions to apply the exception using a generic “machine learning model” component. The claim is not patent eligible.
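For illustration, the loop recited in claim 16 reduces to a per-segment score-and-threshold check. The sketch below is hypothetical; score_model stands in for the claimed “second machine learning model,” the threshold value is arbitrary, and nothing here is of record:

    from typing import Callable, List

    def identify_surreptitious(
        segments: List[str],
        score_model: Callable[[str], float],  # stand-in for the "second machine learning model"
        threshold_score: float = 0.5,
    ) -> List[str]:
        identified = []
        for segment in segments:
            score = score_model(segment)    # likelihood the author is hiding information
            if score >= threshold_score:    # the score meets the threshold score
                identified.append(segment)  # identify as including surreptitious language
        return identified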
Regarding Claim 17,
Dependent claim 17 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “wherein processing the plurality of segments using a second machine learning model to identify segments that include surreptitious language comprises”, “obtaining a timestamp for each segment in the plurality of segments”, “determining a temporally ordered sequence of segments for the plurality of segments based on the timestamps for each segment”, “for each consecutive pair of segments in the temporally ordered sequence: determining an interval of time elapsed between a first segment of the consecutive pair of segments and a second segment of the consecutive pair of segments”, “determining that the interval of time meets a threshold interval of time”, “in response to determining that the interval of time meets the threshold interval of time”, “providing the first segment to the second machine learning model”, “wherein the second machine learning model is configured to generate a score representing a likelihood that an input segment of text includes language that indicates an author of the segment is hiding information”, “determining that the score for the first segment meets a threshold score”, “in response to determining that the score for the first segment meets the threshold score”, and “identifying the first segment as including surreptitious language”. However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. Each of the additional limitations is no more than mere instructions to apply the exception using a generic “machine learning model” component. The claim is not patent eligible.
Regarding Claim 18,
Dependent claim 18 further narrows the steps for detecting surreptitious speech from a given sequence of text by reciting “wherein the threshold interval of time is determined based on an average interval of time elapsed between consecutive segments in the temporally ordered sequence of segments”. However, the claimed limitations recite only the idea of steps for detecting surreptitious speech from a given sequence of text at a high level of generality. No additional elements appear in the claim that apply the abstract idea to a practical application or amount to significantly more than the abstract idea. The claim is not patent eligible.
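For illustration, claims 17 and 18 add a temporal gate ahead of the claim 16 scoring step. The sketch below is hypothetical: the average-gap threshold mirrors claim 18, score_model is again an assumed stand-in for the claimed model, and no name here is of record:

    from typing import Callable, List, Tuple

    def identify_by_interval(
        timestamped_segments: List[Tuple[float, str]],  # (timestamp, segment text)
        score_model: Callable[[str], float],
        threshold_score: float = 0.5,
    ) -> List[str]:
        ordered = sorted(timestamped_segments)       # temporally ordered sequence
        gaps = [t2 - t1 for (t1, _), (t2, _) in zip(ordered, ordered[1:])]
        threshold_interval = sum(gaps) / len(gaps)   # claim 18: average elapsed interval
        identified = []
        for (t1, first), (t2, _) in zip(ordered, ordered[1:]):
            if t2 - t1 >= threshold_interval:        # interval meets the threshold interval
                if score_model(first) >= threshold_score:
                    identified.append(first)         # first segment includes surreptitious language
        return identified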
Regarding Claim 19,
Step 1: Claim 19 recites a system for detecting surreptitious speech from a given sequence of text, which falls under the statutory category of a machine.
Step 2A Prong I: As drafted, Claim 19 recites limitations that, under their broadest reasonable interpretation, cover performance in the mind but for the recitation of generic computer components. That is, other than reciting “a system”, nothing in the claim elements precludes the steps from practically being performed in the mind. Accordingly, the claim recites a judicial exception, and the analysis must therefore proceed to Step 2A Prong II.
Step 2A Prong II: This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements of “obtaining data representing a sequence of text; obtaining a sequence of tokens for the sequence of text comprising one or more groups of tokens”, “wherein each group comprises two or more tokens”, “processing the one or more groups of tokens using a first machine learning model to identify tokens that are out of context in the sequence of tokens”, and “providing data representing the identified tokens”, which are recited at a high level of generality and amount to mere data gathering, which is a form of insignificant extra-solution activity. Each of the additional limitations is no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional element(s) do(es) not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, and the claim is therefore directed to the judicial exception.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using “a system” to perform the aforementioned steps amounts to no more than mere instructions to apply the exception using a generic computer component, which cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 20,
Step 1: Claim 20 recites one or more computer-readable storage media storing instructions for detecting surreptitious speech from a given sequence of text, which falls under the statutory category of an article of manufacture.
Step 2A Prong I: As drafted, Claim 20 recites limitations that, under their broadest reasonable interpretation, cover performance in the mind but for the recitation of generic computer components. That is, other than reciting “one or more computer-readable storage media”, nothing in the claim elements precludes the steps from practically being performed in the mind. Accordingly, the claim recites a judicial exception, and the analysis must therefore proceed to Step 2A Prong II.
Step 2A Prong II: This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements of “obtaining data representing a sequence of text; obtaining a sequence of tokens for the sequence of text comprising one or more groups of tokens”, “wherein each group comprises two or more tokens”, “processing the one or more groups of tokens using a first machine learning model to identify tokens that are out of context in the sequence of tokens”, and “providing data representing the identified tokens”, which are recited at a high level of generality and amount to mere data gathering, which is a form of insignificant extra-solution activity. Each of the additional limitations is no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional element(s) do(es) not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, and the claim is therefore directed to the judicial exception.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using “one or more computer-readable storage media” to perform the aforementioned steps amounts to no more than mere instructions to apply the exception using a generic computer component, which cannot provide an inventive concept. The claim is not patent eligible.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 3 - 4 and 19 - 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Oz et al. (U.S. Patent Application Publication 2025/0181679), hereinafter referred to as Oz.
Regarding Claims 1, 19 and 20, Oz teaches:
1. A method comprising, 19. A system comprising, and 20. One or more computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: [Oz, “the method comprising: determining, for each of a plurality of tokens of the target data, a probability-based metric for a respective token using a model, the probability-based metric of a respective token of the plurality of tokens being based on a probability of the respective token given at least one preceding token of the plurality of tokens;” Par. 0006; “FIG. 1 is a schematic block diagram of an example environment including a system according to the disclosure.” Par. 0010; “a storage medium” (i.e., the claimed “one or more computer-readable storage media”) Par. 0112; “Although at least some aspects of the embodiments described herein with reference to the drawings comprise computer processes performed in processing systems (i.e., the claimed “computer”) or processors, the invention also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the invention into practice.” Par. 0112]
obtaining data representing a sequence of text; [Oz, “These techniques apply a probability-based metric such as perplexity to assess the content of target data such as the textual prompt or response (i.e., the claimed “sequence of text”).” Par. 0007]
obtaining a sequence of tokens for the sequence of text comprising one or more groups of tokens, [Oz, “The subset of tokens may be a contiguous series of tokens (i.e., the claimed “sequence of tokens for the sequence of text comprising one or more groups of tokens”). The size of the subset of tokens is defined by the parameter w, which represents a window size of a so-called sliding window.” Par. 0061; “identifying a subset of the plurality of tokens (i.e., the claimed “one or more groups of tokens”) having a change in the probability-based metric with respect to others of the plurality of the tokens not within the subset of the plurality of tokens (i.e., the claimed “one or more groups of tokens”), the change being reflective of a reduced probability of the tokens in the subset of the plurality of tokens,” Par. 0006; “A shift from benign content to malicious content in a prompt may involve the use of tokens that are less probable given the previous tokens (i.e., the claimed “one or more groups of tokens”). In other words, the injected attack often comprises an unlikely sequence of tokens.” Par. 0021; “For example, a n-gram language model will take into account the probability of the preceding n tokens (i.e., the claimed “one or more groups of tokens”), whereas an LLM 201 may employ a substantially more complex means of determining the most probable next token.” Par. 0047; “The method may comprise processing the probability-based metric for a plurality of subsets of the plurality of tokens (i.e., the claimed “one or more groups of tokens”). The method may comprise detecting the jailbreak attempt in response to identifying the change in the probability-based metric in any of the subsets (i.e., the claimed “one or more groups of tokens”).” Par. 0101]
wherein each group comprises two or more tokens; [Oz, “Each of the plurality of subsets (i.e., the claimed “each group”) may comprise a window having a window size of a predetermined number of tokens (i.e., the claimed “two or more tokens”).” Par. 0102; “identifying a subset of the plurality of tokens (i.e., the claimed “group comprises two or more tokens”) having a change in the probability-based metric with respect to others of the plurality of the tokens (i.e., the claimed “two or more tokens”) not within the subset of the plurality of tokens (i.e., the claimed “group comprises two or more tokens”), the change being reflective of a reduced probability of the tokens in the subset of the plurality of tokens,” Par. 0006; “A shift from benign content to malicious content in a prompt may involve the use of tokens that are less probable given the previous tokens (i.e., the claimed “group comprises two or more tokens”). In other words, the injected attack often comprises an unlikely sequence of tokens.” Par. 0021; “For example, a n-gram language model will take into account the probability of the preceding n tokens (i.e., the claimed “group comprises two or more tokens”), whereas an LLM 201 may employ a substantially more complex means of determining the most probable next token.” Par. 0047]
processing the one or more groups of tokens using a first machine learning model to identify tokens that are out of context in the sequence of tokens; and [Oz, “As well as outputting the most probable next token, language models have a related capability of determining the probability of a token given its context (i.e., the preceding tokens in the text) (i.e., the claimed “identify tokens that are out of context in the sequence of tokens”). The techniques herein make use of this capability to assess the probability of a token given its context, in order to identify token sequences that are unlikely in context (i.e., the claimed “identify tokens that are out of context in the sequence of tokens”) and thus representative of prompt injection attacks.” Par. 0048]
providing data representing the identified tokens. [Oz, “As well as outputting the most probable next token, language models have a related capability of determining the probability of a token given its context (i.e., the preceding tokens in the text) (i.e., the claimed “identify tokens that are out of context in the sequence of tokens”). The techniques herein make use of this capability to assess the probability of a token given its context, in order to identify token sequences that are unlikely in context (i.e., the claimed “identify tokens that are out of context in the sequence of tokens”) and thus representative of prompt injection attacks.” Par. 0048; “preventing display of the target data; preventing an action carried out based on a response of the generative model; generating an alert (i.e., the claimed “providing data representing the identified tokens”); generating a log entry (i.e., the claimed “providing data representing the identified tokens”); and changing user rights.” Par. 0097]
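For orientation, the Oz passages cited above (Pars. 0006, 0021, 0047, 0061) describe flagging spans whose tokens are improbable given their context, evaluated over a sliding window. The following toy sketch illustrates that style of detection; the add-one-smoothed bigram model and all names are illustrative assumptions, not Oz's actual implementation:

    import math
    from collections import Counter
    from typing import List

    def token_logprobs(tokens: List[str], corpus: List[str]) -> List[float]:
        # Toy probability of each token given its preceding token, with
        # add-one smoothing; a stand-in for Oz's language model.
        bigrams = Counter(zip(corpus, corpus[1:]))
        unigrams = Counter(corpus)
        vocab = len(unigrams)
        return [
            math.log((bigrams[(prev, tok)] + 1) / (unigrams[prev] + vocab))
            for prev, tok in zip(tokens, tokens[1:])
        ]

    def flag_windows(logprobs: List[float], w: int = 3, drop: float = 1.0) -> List[int]:
        # Slide a window of size w over the per-token metrics and flag start
        # indices whose window mean falls well below the sequence-wide mean.
        overall = sum(logprobs) / len(logprobs)
        return [
            i for i in range(len(logprobs) - w + 1)
            if sum(logprobs[i:i + w]) / w < overall - drop
        ]

A localized dip in the windowed mean plays the role of Oz's “change in the probability-based metric” over a subset of tokens.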
Regarding Claim 3, Oz has been discussed above. Oz further teaches:
wherein obtaining a sequence of tokens for the sequence of text comprises providing the sequence of text as input to a model that is configured to generate a sequence of tokens given an input sequence of text. [Oz, see mapping applied to claim 1; “Often, as in the case of an LLM 201, the model is used to generate text. That is to say, the language model is repeatedly used to output the most probable next token in a sequence, so as to generate a sequence of tokens forming a text.” Par. 0047]
Regarding Claim 4, Oz has been discussed above. Oz further teaches:
wherein each group represents a sentence fragment of the sequence of text. [Oz, see mapping applied to claim 1; Oz, “Each of the plurality of subsets (i.e., the claimed “each group”) may comprise a window having a window size of a predetermined number of tokens.” Par. 0102; “Conceptually, the sliding window is moved over the text (i.e., the claimed “sequence of text”) to define a segment or region of the text (i.e., the claimed “sentence fragment”) that is examined for a change in the perplexity.” Par. 0061]
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2 and 15 - 16 are rejected under 35 U.S.C. 103 as being unpatentable over Oz in view of Tangari et al. (U.S. Patent Application Publication 2024/0061834), hereinafter referred to as Tangari.
Regarding Claim 2, Oz has been discussed above. Oz teaches:
wherein the sequence of text [Oz, see mapping applied to claim 1]
Oz fails to explicitly teach that the sequence of text represents one or more documents.
However, Tangari teaches:
wherein the sequence of text represents one or more documents. [Tangari, “In some instances, the pre-processing includes tokenizing the utterances of the data assets 645. Tokenizing is splitting a phrase, sentence, paragraph, or an entire text (i.e., the claimed “sequence of text”) document (i.e., the claimed “one or more documents”) into smaller units, such as individual words or terms. Each of these smaller units are called tokens.” Par. 0135]
Oz and Tangari pertain to “out of scope”/ “out of context” detection systems and are analogous to the instant application. Accordingly, it would have been obvious to one of ordinary skill in the “out of scope”/ “out of context” detection systems art to modify Oz’s teachings of “assessing the probability of a token given its context, in order to identify token sequences that are unlikely in context (i.e., the claimed “identify tokens that are out of context in the sequence of tokens”)” (Oz, Par. 0048) with the teachings of “entire text (i.e., the claimed “sequence of text”) document (i.e., the claimed “one or more documents”)” (Tangari, Par. 0135) taught by Tangari in order to “detect out-of-domain and out-of-scope input, and more particularly, to machine-learning techniques for detecting out-of-domain and out-of-scope utterances” (Tangari, Par. 0002).
Regarding Claim 15, Oz in view of Tangari has been discussed above. The combination further teaches:
15. A method comprising: [Oz, see mapping applied to claim 1]
obtaining data representing a sequence of text; [Oz, see mapping applied to claim 1]
dividing the sequence of text into a plurality of segments, [Oz, “Each of the plurality of subsets (i.e., the claimed “plurality of segments”) may comprise a window having a window size of a predetermined number of tokens.” Par. 0102; “The subset of tokens may be a contiguous series of tokens. The size of the subset of tokens is defined by the parameter w, which represents a window (i.e., the claimed “dividing the sequence of text into a plurality of segments”) size of a so-called sliding window. Conceptually, the sliding window is moved over the text to define a segment or region of the text (i.e., the claimed “dividing the sequence of text into a plurality of segments”) that is examined for a change in the perplexity.” Par. 0061; Specification Pg. 12, Ln. 17-20 of the instant Application indicates that a window is used to divide the sequence of text: “divide the sequence of tokens into groups that include a number of tokens that is less than or equal to a context window”; “context window that defines the number of tokens”.]
wherein each segment comprises a plurality of words or subwords that are semantically relevant; [Oz, see mapping applied to claim 1; Oz, “Each of the plurality of subsets (i.e., the claimed “each segment”) may comprise a window having a window size of a predetermined number of tokens.” Par. 0102; “Conceptually, the sliding window is moved over the text (i.e., the claimed “sequence of text”) to define a segment or region of the text (i.e., the claimed “sentence fragment”) that is examined for a change in the perplexity.” Par. 0061; “As well as outputting the most probable next token, language models have a related capability of determining the probability of a token given its context (i.e., the preceding tokens in the text). The techniques herein make use of this capability to assess the probability of a token given its context, in order to identify token sequences (i.e., the claimed “segments”) that are unlikely in context (i.e., “semantically irrelevant”) and thus representative of prompt injection attacks.” Par. 0048; Oz, “The unit may be a semantic unit—i.e., a part of the input to the model that conveys meaning. For example, in the context of language models, each token may be a word (i.e., the claimed “semantically relevant”) or punctuation mark.” Par. 0049; Tangari, “The techniques described herein solve these problems and others by accurately detecting out-of-domain, out-of-scope, and confusion-span utterances (i.e., “semantically irrelevant”) before invoking the semantic parser or routing the utterance elsewhere.” Par. 0037; Tangari, “Out of domain and out of scope detection are important in systems that process natural language input. Given an arbitrary natural language input, a machine learning model may not be able to produce desired outputs if that utterance is not relevant (i.e., the claimed “semantically relevant”), unclear, or ambiguous.” Par. 0007]
processing the plurality of segments using a second machine learning model to identify segments that include surreptitious language; and [Tangari, “second machine learning model” Par. 0009; Oz, “Each of the plurality of subsets (i.e., the claimed “each segment”) may comprise a window having a window size of a predetermined number of tokens.” Par. 0102; “As well as outputting the most probable next token, language models have a related capability of determining the probability of a token given its context (i.e., the preceding tokens in the text). The techniques herein make use of this capability to assess the probability of a token given its context, in order to identify token sequences (i.e., the claimed “identified segments”) that are unlikely in context (i.e., the claimed “surreptitious language”) and thus representative of prompt injection attacks.” Par. 0048; Referring to the Specification Pg. 5, Ln. 9, “surreptitious” speech/language refers to “words or phrases that are out of context”.]
providing data representing the identified segments. [Oz, “As well as outputting the most probable next token, language models have a related capability of determining the probability of a token given its context (i.e., the preceding tokens in the text). The techniques herein make use of this capability to assess the probability of a token given its context, in order to identify token sequences (i.e., the claimed “identified segments”) that are unlikely in context and thus representative of prompt injection attacks.” Par. 0048; “preventing display of the target data; preventing an action carried out based on a response of the generative model; generating an alert (i.e., the claimed “providing data representing the identified segments”); generating a log entry (i.e., the claimed “providing data representing the identified segments”); and changing user rights.” Par. 0097]
Regarding Claim 16, Oz in view of Tangari has been discussed above. The combination further teaches:
wherein processing the plurality of segments using a second machine learning model to identify segments that include surreptitious language comprises: [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15]
for each segment of the plurality of segments: [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15]
providing the segment to the second machine learning model, [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15]
wherein the second machine learning model is configured to generate a score representing a likelihood that an input segment of text includes language that indicates an author of the segment is hiding information; [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15; Oz, “In each of the prompts 300, 320, and in response 310, there is a relatively sudden shift in content or tone (i.e., the claimed “indicates an author of the segment is hiding information”). Put differently, the tokens comprising the malicious (i.e., the claimed “indicates an author of the segment is hiding information”) sections 302, 312, 322 of the texts (i.e., the claimed “input segment of text”) are tokens that are of relatively low probability (i.e., the claimed “score representing a likelihood”) given the preceding tokens.” Par. 0046; “For example, a n-gram language model (i.e., the claimed “machine learning model”) will take into account the probability (i.e., the claimed “score representing a likelihood”) of the preceding n tokens, whereas an LLM (i.e., the claimed “machine learning model”) 201 may employ a substantially more complex means of determining the most probable next token.” Par. 0047; “The techniques herein make use of probability-based metrics, which encompasses the raw probability (i.e., the claimed “score representing a likelihood”) itself as well as any suitable metric that takes into account the probability of tokens based on their context.” Par. 0050; “The disclosure herein detects localized changes in probability across the prompt or response, which may be reflective of a shift from benign to malicious (i.e., the claimed “indicates an author of the segment is hiding information”) content.” Par. 0007; “malicious actor (i.e., the claimed “indicates an author of the segment is hiding information”)” Par. 0019]
determining that the score for the segment meets a threshold score; and [Oz, see mapping applied to claim 1; “A variety of techniques are possible and contemplated for identifying the change in the probability-based metric. For example, in addition to applying thresholds (i.e., the claimed “threshold score”) to the first and last tokens of the window, other tokens may also be examined.” Par. 0074; “Identifying the change in the probability-based metric in the subset of the plurality of tokens may comprise determining that a first token of the subset has a probability-based metric reflective of a probability below a first threshold (i.e., the claimed “meets a threshold score”).” Par. 0107; “identifying a subset of the plurality of tokens having a change in the probability-based metric with respect to others of the plurality of the tokens not within the subset of the plurality of tokens, the change being reflective of a reduced probability (i.e., the claimed “does not meet a threshold score”) of the tokens in the subset of the plurality of tokens,” Par. 0006; “A shift from benign content to malicious content in a prompt may involve the use of tokens that are less probable (i.e., the claimed “does not meet a threshold score”) given the previous tokens. In other words, the injected attack often comprises an unlikely sequence of tokens.” Par. 0021]
in response to determining that the score for the segment meets the threshold score, [Oz, see mapping applied to claim 1; “A variety of techniques are possible and contemplated for identifying the change in the probability-based metric. For example, in addition to applying thresholds (i.e., the claimed “threshold score”) to the first and last tokens of the window, other tokens may also be examined.” Par. 0074; “Identifying the change in the probability-based metric in the subset of the plurality of tokens may comprise determining that a first token of the subset has a probability-based metric reflective of a probability below a first threshold (i.e., the claimed “meets a threshold score”).” Par. 0107; “identifying a subset of the plurality of tokens having a change in the probability-based metric with respect to others of the plurality of the tokens not within the subset of the plurality of tokens, the change being reflective of a reduced probability (i.e., the claimed “does not meet a threshold score”) of the tokens in the subset of the plurality of tokens,” Par. 0006; “A shift from benign content to malicious content in a prompt may involve the use of tokens that are less probable (i.e., the claimed “does not meet a threshold score”) given the previous tokens. In other words, the injected attack often comprises an unlikely sequence of tokens.” Par. 0021]
identifying the segment as including surreptitious language. [Oz, see mapping applied to claim 1; Referring to the Specification Pg. 5, Ln. 9, “surreptitious” speech/language refers to “words or phrases that are out of context”.]
Claims 5 and 7 - 11 are rejected under 35 U.S.C. 103 as being unpatentable over Oz in view of Zaremoodi et al. (U.S. Patent 12,554,934), hereinafter referred to as Zaremoodi.
Regarding Claim 5, Oz has been discussed above. Oz further teaches:
wherein processing the one or more groups of tokens using a first machine learning model to identify tokens that are out of context in the sequence of tokens comprises: [Oz, see mapping applied to claims 1 and 3]
for each group in the one or more groups: [Oz, see mapping applied to claim 1]
for each token in the group: [Oz, see mapping applied to claim 1]
generating an input prompt for the token, [Oz, see mapping applied to claim 1; “The LLM 201 is trained on a very large corpus (e.g., in the order of billions of tokens), and can generate text or data in response to receipt of an input in the form of a prompt (i.e., the claimed “input prompt”).” Par. 0023]
wherein the input prompt comprises the two or more tokens of the group and a mask in a location of the token; [Oz, see mapping applied to claim 1; “The LLM 201 is trained on a very large corpus (e.g., in the order of billions of tokens), and can generate text or data in response to receipt of an input in the form of a prompt (i.e., the claimed “input prompt”).” Par. 0023]
providing the input prompt to the first machine learning model, [Oz, see mapping applied to claim 1; “The LLM (i.e., the claimed “machine learning model”) 201 is trained on a very large corpus (e.g., in the order of billions of tokens), and can generate text or data in response to receipt of an input in the form of a prompt (i.e., the claimed “input prompt”).” Par. 0023; Oz, “The application 140 may make use of template prompts 121 as a basis for the prompts 202 provided to the LLM 201 (i.e., the claimed “first machine learning model”).”, Par. 0029]
wherein the first machine learning model is configured to generate a probability distribution given one or more tokens and a mask, [Oz, “In each of the prompts 300, 320, and in response 310, there is a relatively sudden shift in content or tone. Put differently, the tokens (i.e., the claimed “one or more tokens”) comprising the malicious sections 302, 312, 322 of the texts are tokens (i.e., the claimed “one or more tokens”) that are of relatively low probability (i.e., the claimed “probability distribution”) given the preceding tokens.” Par. 0046; “For example, a n-gram language model (i.e., the claimed “machine learning model”) will take into account the probability (i.e., the claimed “probability distribution”) of the preceding n tokens (i.e., the claimed “one or more tokens”), whereas an LLM (i.e., the claimed “machine learning model”) 201 may employ a substantially more complex means of determining the most probable next token.” Par. 0047; “The techniques herein make use of probability-based metrics, which encompasses the raw probability (i.e., the claimed “probability distribution”) itself as well as any suitable metric that takes into account the probability of tokens based on their context.” Par. 0050; Referring to the Specification Pg. 6, Ln. 5 - 6 of the instant Application, “probability distribution” is used to indicate a “probability” value: “the probability distribution describing the probability that a word appears in a particular location”.]
wherein the probability distribution comprises a respective probability that each of a plurality of tokens appears in the location of the mask, and [Oz, “In each of the prompts 300, 320, and in response 310, there is a relatively sudden shift in content or tone. Put differently, the tokens (i.e., the claimed “one or more tokens”) comprising the malicious sections 302, 312, 322 of the texts are tokens (i.e., the claimed “one or more tokens”) that are of relatively low probability (i.e., the claimed “probability distribution”) given the preceding (i.e., the claimed “location”) tokens.” Par. 0046; “For example, a n-gram language model (i.e., the claimed “machine learning model”) will take into account the probability (i.e., the claimed “probability distribution”) of the preceding (i.e., the claimed “location”) n tokens (i.e., the claimed “one or more tokens”), whereas an LLM (i.e., the claimed “machine learning model”) 201 may employ a substantially more complex means of determining the most probable next token.” Par. 0047; “The techniques herein make use of probability-based metrics, which encompasses the raw probability (i.e., the claimed “probability distribution”) itself as well as any suitable metric that takes into account the probability of tokens based on their context.” Par. 0050; Referring to the Specification Pg. 6, Ln. 5 - 6 of the instant Application, “probability distribution” is used to indicate a “probability” value: “the probability distribution describing the probability that a word appears in a particular location”.]
wherein the plurality of tokens includes the token; [Oz, see mapping applied to claim 1]
determining that the respective probability for the token does not meet a threshold probability; and [Oz, see mapping applied to claim 1; “A variety of techniques are possible and contemplated for identifying the change in the probability-based metric. For example, in addition to applying thresholds (i.e., the claimed “threshold probability”) to the first and last tokens of the window, other tokens may also be examined.” Par. 0074; “Identifying the change in the probability-based metric in the subset of the plurality of tokens may comprise determining that a first token of the subset has a probability-based metric reflective of a probability below a first threshold (i.e., the claimed “does not meet a threshold probability”).” Par. 0107; “identifying a subset of the plurality of tokens having a change in the probability-based metric with respect to others of the plurality of the tokens not within the subset of the plurality of tokens, the change being reflective of a reduced probability (i.e., the claimed “does not meet a threshold probability”) of the tokens in the subset of the plurality of tokens,” Par. 0006; “A shift from benign content to malicious content in a prompt may involve the use of tokens that are less probable (i.e., the claimed “does not meet a threshold probability”) given the previous tokens. In other words, the injected attack often comprises an unlikely sequence of tokens.” Par. 0021]
in response to determining that the respective probability for the token does not meet a threshold probability, [Oz, see mapping applied to claim 1; “A variety of techniques are possible and contemplated for identifying the change in the probability-based metric. For example, in addition to applying thresholds (i.e., the claimed “threshold probability”) to the first and last tokens of the window, other tokens may also be examined.” Par. 0074; “Identifying the change in the probability-based metric in the subset of the plurality of tokens may comprise determining that a first token of the subset has a probability-based metric reflective of a probability below a first threshold (i.e., the claimed “does not meet a threshold probability”).” Par. 0107; “identifying a subset of the plurality of tokens having a change in the probability-based metric with respect to others of the plurality of the tokens not within the subset of the plurality of tokens, the change being reflective of a reduced probability (i.e., the claimed “does not meet a threshold probability”) of the tokens in the subset of the plurality of tokens,” Par. 0006; “A shift from benign content to malicious content in a prompt may involve the use of tokens that are less probable (i.e., the claimed “does not meet a threshold probability”) given the previous tokens. In other words, the injected attack often comprises an unlikely sequence of tokens.” Par. 0021]
identifying the token as out of context. [Oz, see mapping applied to claim 1]
Oz fails to explicitly teach a mask.
However, Zaremoodi teaches:
wherein the input prompt comprises the two or more tokens of the group and a mask in a location of the token; [Zaremoodi, “The dialog flow defines operations or actions that a skill bot will take, e.g., how the skill bot responds to user utterances, how the skill bot prompts users for input (i.e., the claimed “input prompt”), how the skill bot returns data.” Col. 15:11-15; “As described above, each token string may include tokens (i.e., the claimed “two or more tokens of the group”) from one sentence or from a plurality of sentences. The masker 536 masks one or more token embeddings (i.e., the claimed “mask in a location of the token”) that come from a less significant sentence for a particular token string.” Col. 39:48-52]
wherein the first machine learning model is configured to generate a probability distribution given one or more tokens and a mask, [Zaremoodi, “The dialog flow defines operations or actions that a skill bot will take, e.g., how the skill bot responds to user utterances, how the skill bot prompts users for input (i.e., the claimed “input prompt”), how the skill bot returns data.” Col. 15:11-15; “As described above, each token string may include tokens (i.e., the claimed “two or more tokens of the group”) from one sentence or from a plurality of sentences. The masker 536 masks one or more token embeddings (i.e., the claimed “mask in a location of the token”) that come from a less significant sentence for a particular token string.” Col. 39:48-52]
wherein the probability distribution comprises a respective probability that each of a plurality of tokens appears in the location of the mask, and [Zaremoodi, “The dialog flow defines operations or actions that a skill bot will take, e.g., how the skill bot responds to user utterances, how the skill bot prompts users for input (i.e., the claimed “input prompt”), how the skill bot returns data.” Col. 15:11-15; “As described above, each token string may include tokens (i.e., the claimed “two or more tokens of the group”) from one sentence or from a plurality of sentences. The masker 536 masks one or more token embeddings (i.e., the claimed “mask in a location of the token”) that come from a less significant sentence for a particular token string.” Col. 39:48-52]
Oz and Zaremoodi pertain to machine learning systems and are analogous to the instant application. Accordingly, it would have been obvious to one of ordinary skill in the machine learning systems art to modify Oz’s teachings of “assessing the probability of a token given its context, in order to identify token sequences that are unlikely in context (i.e., the claimed “identify tokens that are out of context in the sequence of tokens”)” (Oz, Par. 0048) with the teachings of “masks” (Zaremoodi, Col. 39:48-52) taught by Zaremoodi in order “to understand the end user's intention” (Zaremoodi, Col. 1:30).
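To make the masked-token procedure of claim 5 concrete, a minimal sketch follows. The mask_probs interface is a hypothetical abstraction of a masked language model that returns a probability for each candidate token at the “[MASK]” position; the names and the threshold value are assumptions and nothing below is of record:

    from typing import Callable, Dict, List

    def out_of_context_tokens(
        group: List[str],
        mask_probs: Callable[[List[str]], Dict[str, float]],  # the "first machine learning model"
        threshold: float = 0.01,
    ) -> List[str]:
        flagged = []
        for i, token in enumerate(group):
            prompt = group[:i] + ["[MASK]"] + group[i + 1:]  # input prompt: group tokens plus a mask
            distribution = mask_probs(prompt)                # probability distribution over tokens
            if distribution.get(token, 0.0) < threshold:     # respective probability misses threshold
                flagged.append(token)                        # token identified as out of context
        return flagged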
Regarding Claim 7, Oz in view of Zaremoodi has been discussed above. The combination further teaches:
wherein processing the one or more groups of tokens using a first machine learning model to identify tokens that are out of context in the sequence of tokens comprises processing the one or more groups of tokens using the first machine learning model to identify phrases that are out of context, [Oz, see mapping applied to claims 1 and 3; Zaremoodi, “Therefore, utterances can be phrased as questions, commands, requests, and the like (i.e., the claimed “phrases”), that reflect the user's intent.” Col. 8:33-34]
wherein each phrase comprises two or more consecutive tokens. [Oz, see mapping applied to claims 1 and 3; Zaremoodi, “Therefore, utterances can be phrased as questions, commands, requests, and the like (i.e., the claimed “phrases”), that reflect the user's intent.” Col. 8:33-34; “16 tokens (i.e., the claimed “two or more consecutive tokens”) of the sentence B (i.e., the claimed “phrase”),” Col. 30:46-47; Oz, “The subset of tokens may be a contiguous series of tokens.” Par. 0061]
Regarding Claim 8, Oz in view of Zaremoodi has been discussed above. The combination further teaches:
wherein processing the one or more groups of tokens using the first machine learning model to identify phrases that are out of context comprises: [Oz, see mapping applied to claims 1, 3 and 7; Zaremoodi, see mapping applied to claim 7]
for each group in the one or more groups: [Oz, see mapping applied to claim 1]
identifying one or more phrases in the group; [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7]
for each identified phrase: [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7]
for each token in the identified phrase: [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7]
generating an input prompt, wherein the input prompt comprises the two or more tokens of the group and a mask in a location of the token; [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7]
providing the input prompt to the first machine learning model, [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7]
wherein the first machine learning model is configured to generate a probability distribution given one or more tokens and a mask, [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7]
wherein the probability distribution comprises a respective probability that each of a plurality of tokens appears in the location of the mask, and [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7]
wherein the plurality of tokens includes the token; [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7; Oz, “determining, for each of a plurality of tokens of the target data,” Par. 0095]
determining a respective probability for the token; [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7; Oz, “the probability-based metric of a respective token of the plurality of tokens being based on a probability of the respective token (i.e., the claimed “respective probability for the token”) given at least one preceding token of the plurality of tokens,” Par. 0095]
determining a combined probability for the identified phrase based on the respective probabilities for the tokens in the identified phrase; [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7; Oz, “The method may comprise determining the probability-based metrics of the first plurality of tokens and/or the second plurality of tokens based on the probability values.” Par. 0106; “Identifying the change in the probability-based metric in the subset of the plurality of tokens may comprise determining that a first token of the subset (i.e., the claimed “phrase”) has a probability-based metric reflective of a probability below a first threshold. The method may comprise determining that a second token of the subset (i.e., the claimed “phrase”) has a probability-based metric reflective of a probability below a second threshold. The first token may form a beginning of the subset. The second token may form an end of the subset. The method may comprise determining the second threshold based on the first threshold (i.e., the claimed “combined probability”).” Par. 0107]
determining that the combined probability for the identified phrase does not meet a second threshold probability; and [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7; Oz, “The method may comprise determining the probability-based metrics of the first plurality of tokens and/or the second plurality of tokens based on the probability values.” Par. 0106; “Identifying the change in the probability-based metric in the subset of the plurality of tokens may comprise determining that a first token of the subset (i.e., the claimed “phrase”) has a probability-based metric reflective of a probability below a first threshold. The method may comprise determining that a second token of the subset (i.e., the claimed “phrase”) has a probability-based metric reflective of a probability below a second threshold (i.e., the claimed “does not meet a second threshold probability”). The first token may form a beginning of the subset. The second token may form an end of the subset. The method may comprise determining the second threshold based on the first threshold (i.e., the claimed “combined probability”).” Par. 0107]
in response to determining that the combined probability for the identified phrase does not meet the second threshold probability, [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7; Oz, “The method may comprise determining the probability-based metrics of the first plurality of tokens and/or the second plurality of tokens based on the probability values.” Par. 0106; “Identifying the change in the probability-based metric in the subset of the plurality of tokens may comprise determining that a first token of the subset (i.e., the claimed “phrase”) has a probability-based metric reflective of a probability below a first threshold. The method may comprise determining that a second token of the subset (i.e., the claimed “phrase”) has a probability-based metric reflective of a probability below a second threshold (i.e., the claimed “does not meet a second threshold probability”). The first token may form a beginning of the subset. The second token may form an end of the subset. The method may comprise determining the second threshold based on the first threshold (i.e., the claimed “combined probability”).” Par. 0107]
identifying the identified phrase as out of context. [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7]
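Examiner’s note: solely to illustrate the examiner’s understanding of the masked-token limitations mapped above, and not as a characterization of Oz’s or Zaremoodi’s actual implementations, the claimed per-token masking and combined-probability check could be sketched in Python as follows (the model name, the choice of a product as the combining function, and the threshold value are all hypothetical):

```python
# Illustrative sketch only; not drawn from Oz or Zaremoodi.
# Assumes a masked language model from the Hugging Face transformers library;
# each word in the group is assumed to map to a single vocabulary token.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # hypothetical model choice
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def token_probability(group: list[str], position: int) -> float:
    """Build an input prompt with a mask in the token's location and
    return the model's probability for the original token at the mask."""
    prompt = list(group)
    original = prompt[position]
    prompt[position] = tok.mask_token                      # the claimed "mask"
    enc = tok(" ".join(prompt), return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**enc).logits
    mask_idx = (enc.input_ids[0] == tok.mask_token_id).nonzero()[0].item()
    probs = logits[0, mask_idx].softmax(dim=-1)            # distribution over the vocabulary
    return probs[tok.convert_tokens_to_ids(original)].item()

def phrase_out_of_context(group: list[str], phrase: range,
                          second_threshold: float = 1e-6) -> bool:
    """Combine the per-token probabilities (here, as a product) and flag
    the phrase when the combined probability does not meet the threshold."""
    combined = 1.0
    for i in phrase:
        combined *= token_probability(group, i)
    return combined < second_threshold
```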
Regarding Claim 9, Oz in view of Zaremoodi has been discussed above. The combination further teaches:
wherein identifying one or more phrases comprises identifying one or more phrases that each comprise two or more consecutive tokens and less than a maximum number of consecutive tokens. [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7; Zaremoodi, “In some embodiments, the number of tokens from the main sentence (i.e., the claimed “phrase”) can be limited by a maximum chunk size (i.e., the claimed “less than a maximum number of consecutive tokens”),” Col. 31:51-52; Oz, “The subset of tokens may be a contiguous series of tokens.” Par. 0061]
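Examiner’s note: for illustration only, the claim 9 limitation of phrases spanning at least two but fewer than a maximum number of consecutive tokens corresponds to enumerating contiguous token spans, e.g. as follows (the max_len bound is hypothetical):

```python
def candidate_phrases(tokens: list[str], max_len: int = 5) -> list[range]:
    """Enumerate contiguous spans of at least two and fewer than max_len
    consecutive tokens as candidate phrases (bounds are token indices)."""
    spans = []
    for start in range(len(tokens)):
        for length in range(2, max_len):                  # 2 <= length < max_len
            if start + length <= len(tokens):
                spans.append(range(start, start + length))
    return spans
```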
Regarding Claim 10, Oz in view of Zaremoodi has been discussed above. The combination further teaches:
wherein the first machine learning model comprises a language model that has been trained on a masked language modeling task. [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5 and 7; Zaremoodi, “building a multi-task model (i.e., the claimed “first machine learning model comprises a language model”) using context masking (i.e., the claimed “masked language modeling task”),” Col. 5:51-52; “The masking subsystem 532 performs certain processing on the first training dataset 520 that results in the generation of a second training dataset 534 that is then passed on to the multi-task model generation subsystem 505. The multi-task model generation subsystem 505 performs processing on the first training dataset 520 and/or the second training dataset 534 to generate the multi-task model 507.” Col. 32:48-54]
Regarding Claim 11, Oz in view of Zaremoodi has been discussed above. The combination further teaches:
wherein the language model comprises an encoder-based transformer. [Oz, see mapping applied to claims 1, 3, 5 and 7; Zaremoodi, see mapping applied to claims 5, 7 and 10; Zaremoodi teaches BERT, which is an encoder-based transformer: “In some embodiments, the ML models could include, but not limited to, convolutional neural network (CNN), linear regression, logistic regression, deep recurrent neural network (e.g., fully-connected recurrent neural network (RNN), Gated Recurrent Unit (GRU), long short-term memory, (LSTM)), transformer-based methods (e.g. XLNet, BERT, XLM, RoBERTa),” Col. 24:1-7]
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Oz in view of Zaremoodi, and Harang et al., (U.S. Patent 12,437,239), hereinafter referred to as Harang.
Regarding Claim 6, Oz in view of Zaremoodi has been discussed above. The combination further teaches:
wherein the threshold probability is obtained from a user. [Oz, see mapping applied to claim 5]
The combination fails to explicitly teach that the threshold probability is obtained from a user.
However, Harang teaches:
wherein the threshold probability is obtained from a user. [Harang, “In some implementations, the evaluator 114 can receive the threshold criteria from a user or a human analyst.” Col. 17:18-20]
Oz, Zaremoodi and Harang pertain to machine learning systems and are analogous to the instant application. Accordingly, it would have been obvious to one of ordinary skill in the machine learning systems art to modify Oz’s teachings of “assessing the probability of a token given its context, in order to identify token sequences that are unlikely in context (i.e., the claimed “identify tokens that are out of context in the sequence of tokens”)” (Oz, Par. 0048) with the teachings of “masks” (Zaremoodi, Col. 39:48-52) taught by Zaremoodi and the teachings of “threshold criteria from a user” (Harang, Col. 17:18-20) taught by Harang in order “to understand the end user's intention” (Zaremoodi, Col. 1:30) and to “adapt to changes in the landscape of artifacts that can potentially carry out malicious activity” (Harang, Col. 1:28-29).
Claims 12 - 14 are rejected under 35 U.S.C. 103 as being unpatentable over Oz in view of Catalano et al., (U.S. Patent Application Publication 2019/0325020), hereinafter referred to as Catalano.
Regarding Claim 12, Oz has been discussed above. Oz further teaches:
further comprising identifying high-value tokens from the identified tokens. [Oz, see mapping applied to claim 1]
Oz fails to explicitly teach identifying high-value tokens.
However, Catalano teaches:
further comprising identifying high-value tokens from the identified tokens. [Catalano, “a subset of tokens that have a higher occurrence frequency (i.e., the claimed “high-value tokens”) in the lexicon frequency index of the author as compared to in the reference frequency index.” Par. 0082]
Oz and Catalano pertain to machine learning systems and are analogous to the instant application. Accordingly, it would have been obvious to one of ordinary skill in the machine learning systems art to modify Oz’s teachings of “assessing the probability of a token given its context, in order to identify token sequences that are unlikely in context (i.e., the claimed “identify tokens that are out of context in the sequence of tokens”)” (Oz, Par. 0048) with the teachings of “subset of tokens that have a higher occurrence frequency (i.e., the claimed “high-value tokens”)” (Catalano, Par. 0082) taught by Catalano in order to provide “recommendations using tokenization from natural language processing” (Catalano, Par. 0001).
Regarding Claim 13, Oz in view of Catalano has been discussed above. The combination further teaches:
wherein identifying high-value tokens from the identified tokens comprises: [Oz, see mapping applied to claim 1; Catalano, see mapping applied to claim 12]
for each of the identified tokens, [Oz, see mapping applied to claim 1]
determining a number of occurrences of the identified token in the sequence of text; and [Oz, see mapping applied to claim 1; Catalano, see mapping applied to claim 12]
identifying one or more identified tokens with a number of occurrences over a threshold number of occurrences as high-value tokens. [Catalano, see mapping applied to claim 12; Catalano, “identifies, as the high frequency words, a subset of tokens (i.e., the claimed “high-value tokens”) of the screened frequency index of the author that have an occurrence frequency higher than a predetermined threshold occurrence frequency (i.e., the claimed “threshold number of occurrences”).” Par. 0082]
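Examiner’s note: as the examiner understands the claim 13 limitations, the occurrence-count screening amounts to a frequency filter of the following form (illustrative sketch only; the threshold value is hypothetical):

```python
from collections import Counter

def high_value_tokens(identified: list[str], text_tokens: list[str],
                      threshold: int = 3) -> set[str]:
    """Count occurrences of each identified token in the sequence of text
    and keep tokens whose count exceeds the threshold number of occurrences."""
    counts = Counter(text_tokens)
    return {t for t in identified if counts[t] > threshold}
```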
Regarding Claim 14, Oz in view of Catalano has been discussed above. The combination further teaches:
wherein the sequence of text represents one or more documents originating from one or more authors, and [Oz, see mapping applied to claim 1; Catalano, “For example, in some embodiments of the present invention, communication content sources read by a user include websites, books, and/or online journals that are being linked to a user via the user's lexicon profile. In some embodiment of the present invention, communication content sources that are written by the user and linked to the user's lexicon profile include publications, social media posts, emails, SMS text messages, and/or locally stored documents.” Par. 0069]
wherein identifying high-value tokens from the identified tokens comprises: [Oz, see mapping applied to claim 1; Catalano, see mapping applied to claim 12]
obtaining one or more authors of interest from the one or more authors; [Catalano, “Embodiments of the present invention provide a computer-implemented method for recommending at least one author, of a plurality of authors (i.e., the claimed “one or more authors of interest from the one or more authors”),” Par. 0004]
for each of the identified tokens, [Oz, see mapping applied to claim 1; Catalano, see mapping applied to claim 12]
determining a corresponding set of authors for the identified token; and [Catalano, “Embodiments of the present invention provide a computer-implemented method for recommending at least one author, of a plurality of authors (i.e., the claimed “one or more authors of interest from the one or more authors”),” Par. 0004; “Author word frequency mapping program 600 then identifies, as the high frequency words, a subset of tokens of the screened frequency index of the author that have an occurrence frequency higher than a predetermined threshold occurrence frequency.” Par. 0082]
identifying one or more identified tokens with a corresponding set of authors that includes at least one of the one or more authors of interest as high-value tokens. [Catalano, “Embodiments of the present invention provide a computer-implemented method for recommending at least one author, of a plurality of authors (i.e., the claimed “at least one of the one or more authors of interest”),” Par. 0004; “Author word frequency mapping program 600 then identifies, as the high frequency words, a subset of tokens (i.e., the claimed “high-value tokens”) of the screened frequency index of the author that have an occurrence frequency higher than a predetermined threshold occurrence frequency.” Par. 0082]
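Examiner’s note: the claim 14 limitations, as mapped, reduce to an author-based filter over the identified tokens. An illustrative sketch follows (the token_authors index, i.e., which authors’ documents contain a given token, is an assumed precomputed input, not a feature taught by Catalano in this form):

```python
def high_value_by_author(identified: list[str],
                         token_authors: dict[str, set[str]],
                         authors_of_interest: set[str]) -> set[str]:
    """Keep an identified token when its corresponding set of authors
    includes at least one of the authors of interest."""
    return {t for t in identified
            if token_authors.get(t, set()) & authors_of_interest}
```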
Claims 17 - 18 are rejected under 35 U.S.C. 103 as being unpatentable over Oz in view of Tangari, and Kraus et al., (U.S. Patent Application Publication 2020/0285737), hereinafter referred to as Kraus.
Regarding Claim 17, Oz in view of Tangari has been discussed above. The combination further teaches:
wherein processing the plurality of segments using a second machine learning model to identify segments that include surreptitious language comprises: [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15]
obtaining a timestamp for each segment in the plurality of segments; [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15]
determining a temporally ordered sequence of segments for the plurality of segments based on the timestamps for each segment; [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15]
providing the first segment to the second machine learning model, [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15]
wherein the second machine learning model is configured to generate a score representing a likelihood that an input segment of text includes language that indicates an author of the segment is hiding information; [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15]
determining that the score for the first segment meets a threshold score; and [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15]
in response to determining that the score for the first segment meets the threshold score, [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15]
identifying the first segment as including surreptitious language. [Oz, see mapping applied to claim 15; Tangari, see mapping applied to claim 15]
The combination fails to explicitly teach obtaining a timestamp for each segment and determining a temporally ordered sequence of segments.
However, Kraus teaches:
obtaining a timestamp for each segment in the plurality of segments; [Kraus, “This embodiment represents an event sequence 410 by a textual document 804: it transforms an event's feature tuple 806 <timestamp, authentication type, operation type, error code, account name, IP address, user agent, and response size> into a sequence of corresponding tokens 812, in which each token represents a feature value.” Par. 0301]
determining a temporally ordered sequence of segments for the plurality of segments based on the timestamps for each segment; [Kraus, “In some embodiments, each selected event has an associated timestamp, and the difference between timestamps of any two consecutive selected events when the selected events are ordered (i.e., the claimed “temporally ordered”) by timestamp (i.e., the claimed “based on timestamps”) value is no more than max-time-between-events 516.” Par. 0253]
for each consecutive pair of segments in the temporally ordered sequence: [Kraus, “In some embodiments, each selected event has an associated timestamp, and the difference between timestamps of any two consecutive selected events (i.e., the claimed “each consecutive pair of segments”) when the selected events are ordered by timestamp value is no more than max-time-between-events 516. In some, the difference between timestamps of the earliest and latest selected events is no more than max-time-between-events 516.” Par. 0253]
determining an interval of time elapsed between a first segment of the consecutive pair of segments and a second segment of the consecutive pair of segments; [Kraus, “This embodiment represents an event sequence (i.e., the claimed “sequence of segments”) 410 by a textual document 804: it transforms an event's feature tuple 806 <timestamp, authentication type, operation type, error code, account name, IP address, user agent, and response size> into a sequence of corresponding tokens (i.e., the claimed “sequence of segments”) 812, in which each token represents a feature value.” Par. 0301; “In some embodiments, each selected event (i.e., the claimed “segment”) has an associated timestamp, and the difference between timestamps of any two consecutive selected events (i.e., the claimed “first segment of the consecutive pair of segments and a second segment of the consecutive pair of segments”) when the selected events are ordered by timestamp value is no more than max-time-between-events 516. In some, the difference (i.e., the claimed “interval of time elapsed”) between timestamps of the earliest and latest selected events is no more than max-time-between-events 516.” Par. 0253]
determining that the interval of time meets a threshold interval of time; [Kraus, “In some, the difference (i.e., the claimed “interval of time elapsed”) between timestamps of the earliest and latest selected events is no more than max-time-between-events 516.” Par. 0253; “N events 520 sampled from the activity which occurred between midnight and 6:00 am, where N is a specified threshold (i.e., the claimed “threshold interval of time”) 1020.” Par. 0254]
in response to determining that the interval of time meets the threshold interval of time, [Kraus, “In some, the difference (i.e., the claimed “interval of time elapsed”) between timestamps of the earliest and latest selected events is no more than max-time-between-events 516.” Par. 0253; “N events 520 sampled from the activity which occurred between midnight and 6:00 am, where N is a specified threshold (i.e., the claimed “threshold interval of time”) 1020.” Par. 0254]
Oz, Tangari and Kraus pertain to “out of scope”/“anomaly”/“out of context” detection systems and are analogous to the instant application. Accordingly, it would have been obvious to one of ordinary skill in the art of such detection systems to modify Oz’s teachings of “assessing the probability of a token given its context, in order to identify token sequences that are unlikely in context (i.e., the claimed “identify tokens that are out of context in the sequence of tokens”)” (Oz, Par. 0048) with the teachings of “entire text (i.e., the claimed “sequence of text”) document (i.e., the claimed “one or more documents”)” (Tangari, Par. 0135) taught by Tangari and the teachings of “timestamps” (Kraus, Par. 0253) taught by Kraus in order to “detect out-of-domain and out-of-scope input, and more particularly, to machine-learning techniques for detecting out-of-domain and out-of-scope utterances” (Tangari, Par. 0002) and to detect “unusual behavior” (Kraus, Par. 0003).
Regarding Claim 18, Oz in view of Tangari and Kraus has been discussed above. The combination further teaches:
wherein the threshold interval of time is determined based on an average interval of time elapsed between consecutive segments in the temporally ordered sequence of segments. [Oz, see mapping applied to claim 17; Tangari, see mapping applied to claim 17; Kraus, see mapping applied to claim 17; Kraus, “moving average (i.e., the claimed “average interval of time elapsed”), or N events (i.e., the claimed “segments”) 520 sampled from the activity which occurred between midnight and 6:00 am, where N is a specified threshold (i.e., the claimed “threshold interval of time”) 1020.” Par. 0254]
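Examiner’s note: to illustrate the examiner’s understanding of the claim 17 and 18 limitations (not Kraus’s actual implementation), temporally ordering segments by timestamp, deriving the threshold interval from the average elapsed interval, and flagging qualifying consecutive pairs could be sketched as follows (all names are hypothetical):

```python
def flag_short_intervals(segments: list[tuple[float, str]]) -> list[tuple[str, str]]:
    """Each segment is a (timestamp, text) pair. Order the segments by
    timestamp, set the threshold interval to the average gap between
    consecutive segments (per claim 18), and return the consecutive
    pairs whose gap meets (here, does not exceed) that threshold."""
    ordered = sorted(segments, key=lambda s: s[0])         # temporally ordered sequence
    gaps = [b[0] - a[0] for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return []
    threshold = sum(gaps) / len(gaps)                      # average elapsed interval
    return [(a[1], b[1])
            for (a, b), gap in zip(zip(ordered, ordered[1:]), gaps)
            if gap <= threshold]
```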
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Singh et al., (U.S. Patent Application Publication 2025/0298980) teaches high-frequency tokens/frequently occurring words (i.e., the claimed “high-value tokens”).
Suchow et al., (U.S. Patent Application Publication 2025/0190706) teaches detecting spurious reasoning.
Chen et al., (U.S. Patent 12,468,887) teaches detecting metaphors.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EUNICE LEE whose telephone number is 571-272-1886. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EUNICE LEE/Examiner, Art Unit 2656
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656