Last updated: May 29, 2026
Application No. 18/424,572
SLANG USAGE DETECTION AND MITIGATION FOR LARGE LANGUAGE MODELS

Non-Final OA §103
Filed
Jan 26, 2024
Examiner
TENGBUMROONG, NATHAN NARA
Art Unit
2654
Tech Center
2600 — Communications
Assignee
Intuit Inc.
OA Round
1 (Non-Final)
Interview Optional

— +26.7% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 47% grant rate with +26.7% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 19 resolved cases, 2023–2026
Examiner Intelligence

TENGBUMROONG, NATHAN NARA View full profile →
Grants 47% of resolved cases
Career Allowance Rate
9 granted / 19 resolved
-14.6% vs TC avg
Strong +27% interview lift
Without
With
+26.7%
Interview Lift
resolved cases with interview
Typical timeline
3y 0m
Avg Prosecution
21 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§103
98.3%
+58.3% vs TC avg
§102
1.7%
-38.3% vs TC avg
Black line = Tech Center average estimate • Based on career data from 19 resolved cases
Office Action

§103
DETAILED ACTION
This office action is in response to Applicant’s submission filed on 1/26/2024. Claims 1-20 are pending in the application. As such, claims 1-20 have been examined.
	
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 1/26/2024. The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-6, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Elisco et al. (US 20220300711 A1; hereinafter referred to as Elisco) in view of Pei et al. (Pei, Z., Sun, Z., & Xu, Y. (2019, November). Slang detection and identification. In Proceedings of the 23rd conference on computational natural language learning (CoNLL) (pp. 881-889).; hereinafter referred to as Pei).
Regarding claim 1, Elisco teaches: a method of training a machine learning (ML) model to detect slang usage, comprising: masking at least one token of a plurality of tokens for each training data instance of a plurality of training data instances to create a plurality of masked training data instances ([0048] During training, a first proportion such as 15% of tokens in a textual input may be replaced with a special mask token, a second proportion such as 10% may be replaced with a random token, and/or remaining tokens may be kept the same. A task in training may be to predict an original masked token),… 
  training the ML model, using a masked-language modeling head, to predict the at least one masked token for each of the plurality of masked training data instances using the plurality of training data instances and the plurality of masked training data instances… ([0048] A task in training may be to predict an original masked token… Only masked tokens in a note may contribute to a loss function which influences how to change network parameters to improve performance. A model being trained may learn which tokens appear in a similar context for a given dataset).
Elisco does not explicitly, but Pei discloses: wherein: the plurality of tokens for each training data instance in a first subset of the plurality of training data instances ([4.1] We consider datasets that are composed of sentences in two distinct categories, standard (slangless) and slang-specific) does not include at least one slang token of a plurality of slang tokens ([4.1] The sentences from Wall Street News are taken to be non-slang sentences since the news-based sentences were typically standard English conformed and reviewed before publication. In order to construct an even more trustworthy negative set for standard English, we filtered the sentences from Wall Street News based on the proportion of unknown tokens within the sentences), and the plurality of tokens for each training data instance in a second subset of the plurality of training data instances includes the at least one slang token of the plurality of slang tokens… ([4.1] We collect positive examples from lexical entries in the Online Slang Dictionary (OSD) where example usage sentences are available);
labeling each training data instance of the plurality of training data instances with at least one of a slang instance label, a non-slang instance label, one or more slang token labels, or one or more non-slang token labels to generate a plurality of labeled training data instances ([3.1] In the slang identification task, our models identify each token within the input sentence as ‘nonslang’ or ‘slang’ by sequence labeling, which determines the exact positions of slang usage);
and training the ML model ([Fig. 2] For a specific token fire in the source sentence “she can cook some fire food”, the related linguistic features are represented as token vectors to concatenate the feature-based input for this token. Each randomly initialized vector is updated during training), using a classification head, to: classify each training data instance of the plurality of labeled training data instances as one of a slang instance or a non-slang instance ([3.1] the models in the identification task encapsulate the detection task; an empty prediction that labels all tokens as ‘non-slang’ is equivalent to classifying the sentence as a non-slang sentence in the detection task, and vice versa) and thereby generate a classification output; or classify each token of the plurality of tokens for each training data instance of the plurality of labeled training data instances as one of a slang token or a non-slang token and thereby generate the classification output ([4.2.1] We evaluated our models to determine whether a given sentence contains at least one slang usage).
 Elisco and Pei are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco to combine the teachings of Pei because doing so would help improve slang detection and classification in machine learning models by utilizing tokenization and word embeddings to help locate slang (Pei [5] For unknown tokens, character-based convolutional embeddings improve the model in handling novel slang terms. We demonstrate that features combined with distributed word embeddings help machine detection of slang in general, and that Part-of-Speech among others is a prominent feature of slang usage. Our work provides a basis for locating slang from its flexible and unconventional syntactic word uses and offers opportunities for slang processing in downstream tasks in natural language processing).

Regarding claim 4, the combination of Elisco and Pei teaches: the method of claim 1. Elisco further teaches: wherein masking the at least one token of the plurality of tokens for each training data instance of the plurality of training data instances to create the plurality of masked training data instances comprises randomly masking at least a first percentage of the plurality of tokens for each training data instance ([0048] Language modeling 320 may involve learning a probability distribution over a sequence of words, which probability distribution may be used to characterize relationships between words, for instance and without limitation as captured by geometric relationships between vectors as described in this disclosure. During training, a first proportion such as 15% of tokens in a textual input may be replaced with a special mask token).

Regarding claim 5, the combination of Elisco and Pei teaches: the method of claim 1. Elisco further teaches: wherein masking the at least one token of the plurality of tokens for each training data instance of the plurality of training data instances to create the plurality of masked training data instances comprises masking the at least one slang token included in the plurality of tokens for each training data instance in the second subset of the plurality of training data instances ([0048] During training, a first proportion such as 15% of tokens in a textual input may be replaced with a special mask token... a resulting model may be catered towards clinically-specific text and language used by case workers. Model may be able to learn slang, acronyms, synonyms, misspellings, jargon, and more which may be otherwise absent from generalized models).

Regarding claim 6, the combination of Elisco and Pei teaches: the method of claim 1. Elisco further teaches: wherein the ML model comprises an encoder only transformer architecture ([0024] As used in this disclosure a “transformer model” is a deep learning model for processing sequential data, such as natural language, for tasks such as translation and text summarization. As a non-limiting example a transformer model may include pre-trained systems such as Bidirectional Encoder Representations from Transformers (BERT). BERT is an example of an encoder only transformer.).

Regarding claim 14, Elisco teaches: a method of slang detection and mitigation, comprising: receiving an input sentence comprising a plurality of tokens… ([0046] textual input 304 such as a document from a current document sequence as described below may be received by neural network 108. Computing device 104 may tokenize textual input 304 prior to provision to base network 204).
Elisco does not explicitly, but Pei teaches: and processing, with a first machine learning (ML) model trained for slang classification, the input sentence comprising the plurality of tokens and thereby generating at least one of: a first classification output for the input sentence, the first classification output comprising a slang instance classification ([4.2.1] We evaluated our models to determine whether a given sentence contains at least one slang usage);
or a second classification output for each of the plurality of tokens of the input sentence, at least one second classification output comprising a slang token classification ([3.1] In the slang identification task, our models identify each token within the input sentence as ‘nonslang’ or ‘slang’ by sequence labeling, which determines the exact positions of slang usage).

Regarding claim 20, the combination of Elisco and Pei teaches: the method of claim 14. Elisco further teaches: prior to using the first ML model to process the input sentence to generate at least one of the first classification output or the second classification output: training the first ML model, using a masked-language modeling head, to predict at least one token masked for each of a plurality of masked training data instances… ([0048] During training, a first proportion such as 15% of tokens in a textual input may be replaced with a special mask token, a second proportion such as 10% may be replaced with a random token, and/or remaining tokens may be kept the same. A task in training may be to predict an original masked token);
and training the first ML model, using a classification head and a plurality of non-masked training data instance ([0048] some sub-network processing may include token-level classification 312, which may classify tokens to outputs of interest, and/or sentence-level classification 316, which may classify sentences to outputs of interest… During training, a first proportion such as 15% of tokens in a textual input may be replaced with a special mask token, a second proportion such as 10% may be replaced with a random token, and/or remaining tokens may be kept the same), to generate an instance-level slang classification or a token-level slang classification ([0048] a resulting model may be catered towards clinically-specific text and language used by case workers. Model may be able to learn slang, acronyms, synonyms, misspellings, jargon, and more which may be otherwise absent from generalized models).
Pei further teaches: wherein: each training data instance of a first subset of the plurality of masked training data instances does not include at least one slang token of a plurality of slang tokens ([4.1] The sentences from Wall Street News are taken to be non-slang sentences since the news-based sentences were typically standard English conformed and reviewed before publication. In order to construct an even more trustworthy negative set for standard English, we filtered the sentences from Wall Street News based on the proportion of unknown tokens within the sentences), and each training data instance of a second subset of the plurality of masked training data instances comprises the at least one slang token of the plurality of slang tokens… ([4.1] We collect positive examples from lexical entries in the Online Slang Dictionary (OSD) where example usage sentences are available).

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Elisco in view of Pei as applied to claims 1, 4-6, 14, and 20 above, and further in view of Lin et al. (US 20210366467 A1; hereinafter referred to as Lin), Nagaraju et al. (US 20240185001 A1; hereinafter referred to as Nagaraju), and Sackett et al. (US 20230147359 A1; hereinafter referred to as Sackett).
Regarding claim 2, the combination of Elisco and Pei teaches: the method of claim 1. The combination of Elisco and Pei does not explicitly, but Lin teaches: each training data instance in the first subset of the plurality of training data instances comprises at least a definition associated with a slang token in the plurality of slang tokens… ([0054] plurality of slang sentences and respective explanations for the plurality of slang sentences are extracted from audio or video data by capturing emotional reactions to slang. A set of training samples are generated by clustering the plurality of slang sentences and the explanations. Each of the training samples comprises at least one slang sentence and at least one explanation corresponding to same slang).
Elisco, Pei, and Lin are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco and Pei to combine the teachings of Lin because doing so would improve the efficiency of training a ML model to identify and understand slang by using clustered training data instances (Lin [0065] in order to improve the training efficiency of the slang identification and explanation model 403, the data processing apparatus 411 may further cluster explanations mapped to a same slang category into explanation categories based on similarities of these explanations).
The combination of Elisco, Pei, and Lin does not explicitly, but Nagaraju teaches: and the method further comprises generating the second subset of the plurality of training data instances by, for each training data instance in the first subset of the plurality of training data instances: generating, by a large language model (LLM), a plurality of output tokens ([0020] increase the diversity of the generated outputs, the template queries may include examples of outputs that use colloquial phrasing or descriptions of task-specific items instead of naming the task-specific item directly. Some template queries may include outputs that do not correspond to any task-specific item, and/or some template queries may include examples of long or complex outputs or outputs that include terms modifying task-specific items. Because the LLM has been trained on such a large amount of data, the LLM may be able to appropriately respond to the varied template queries and generate a more varied set of conversational query outputs) based on a prompt comprising the respective training data instance and the slang token associated with the respective training data instance ([0049] Prompt 310 may then be provided to trained large language model 134, with the example input/output pairings providing trained large language model 134 with few-shot examples of the kind of output value to be generated);
determining the plurality of output tokens comprise the slang token… ([0049] trained large language model 134 may generate an output that more closely simulates a likely human's response to the same prompt 310. For example, trained large language model 134 may generate an output that includes conversational colloquialisms such as asking for “the twelve-piece” instead of “the twelve-piece chicken bucket.”).
Elisco, Pei, Lin, and Nagaraju are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco, Pei, and Lin to combine the teachings of Nagaraju because doing so would allow for the efficient creation of diverse training data to be used for training and improving a ML model for slang detection (Nagaraju [0022] The advantages of the disclosed techniques include but are not limited to systems and methods that are capable of generating a large amount of task-specific training data without the time and expense required by traditional data collection methods. By using specially curated template queries and task-specific structured data, the disclosed techniques can also create training data that includes more linguistic diversity and less ungrammatical or implausible examples than other dataset generation methods).
	The combination of Elisco, Pei, Lin, and Nagaraju does not explicitly, but Sackett teaches: and determining, by an entailment model, that an entailment score between the definition of the respective training data instance ([0065] The entailment classifier 204 is a classifier that uses the input phrase 202 and the selected entailment comparison data 208 to generate entailment classification data 222 (e.g., via generating an entailment mapping 216). The entailment classifier 204 can be a model (e.g., entailment model), such as a machine learning model, that has been pre-trained to determine whether or not a first phrase (e.g., the input phrase 202) entails a second phrase (e.g., each example phrase 210)) and the plurality of output tokens satisfies a threshold ([0084] If a positive classification is made (e.g., the entailment classifier identifies at least one topic, optionally associated with a confidence score above a threshold confidence score, and optionally with a particular entailment value or values), that classification data can be output and used to generate a response at block 412).
Elisco, Pei, Lin, Nagaraju, and Sackett are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco, Pei, Lin, and Nagaraju to combine the teachings of Sackett because doing so would allow for better classification of slang terms using entailment scores, leading to better detection of slang terms in training (Sackett [0044] Certain aspects of the present disclosure, including the use of an entailment classifier alone or combined with at least one of a pattern matching classifier (e.g., RegEx classifier) and an SML classifier (e.g., a BERT classifier), provide specific improvements to the technological process of interpreting user input and generating appropriate responses in natural language processing, especially with respect to chatbots and automated psychological therapy, and especially with respect to fields where subject-matter granularity is important).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Elisco in view of Pei as applied to claims 1, 4-6, 14, and 20 above, and further in view of Lin, Lancioni et al. (US 20250117593 A1; hereinafter referred to as Lancioni), and Sackett.
Regarding claim 3, the combination of Elisco and Pei teaches: the method of claim 1. The combination of Elisco and Pei does not explicitly, but Lin teaches: each training data instance in the second subset of the plurality of training data instances comprises at least a definition of a slang token in the plurality of slang tokens and a first sentence comprising the slang token… ([0054] plurality of slang sentences and respective explanations for the plurality of slang sentences are extracted from audio or video data by capturing emotional reactions to slang. A set of training samples are generated by clustering the plurality of slang sentences and the explanations. Each of the training samples comprises at least one slang sentence and at least one explanation corresponding to same slang).
Elisco, Pei, and Lin are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco and Pei to combine the teachings of Lin because doing so would improve the efficiency of training a ML model to identify and understand slang by using clustered training data instances (Lin [0065] in order to improve the training efficiency of the slang identification and explanation model 403, the data processing apparatus 411 may further cluster explanations mapped to a same slang category into explanation categories based on similarities of these explanations).
The combination of Elisco, Pei, and Lin does not explicitly, but Lancioni teaches: and the method further comprises, generating the first subset of the plurality of training data instances by, for each training data instance in the second subset of the plurality of training data instances: generating, by a large language model (LLM), a plurality of output tokens ([0039] The example large language model circuitry 155 of the illustrated example of FIG. 1 executes a large language model to transform an input prompt into an output message) based on a prompt comprising the first sentence of the respective training data instance comprising the slang token ([0043] the message validator circuitry 170 augments the message by placing the message received from the large language model circuitry 155 into a prompt template that is designed to request the large language model circuitry 155 to determine whether any terms, phrases, sayings, etc. are included in the message that indicate that the message is inappropriate. Such phrases may include, for example, offensive language, slang or unprofessionally written language);
determining the plurality of output tokens do not comprise the slang token… ([0061-0062] the message validator circuitry 170 may seek to determine whether inappropriate content is included in the response message, whether offensive language is included in the response message, whether the response message contains slang or unprofessionally written language.. a positive identification of the offensive language and slang wording may result in both anti-hypothesis being utilized to modify the original prompt).
Elisco, Pei, Lin, and Lancioni are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco, Pei, and Lin to combine the teachings of Lancioni because doing so would allow for specific training data to generated by an LLM for the purpose of training and improving an ML model for slang detection (Lancioni [0124] methods have been disclosed that enable large language models to be utilized in various contexts while providing guardrails for the responses that are provided the LLM. Disclosed systems, apparatus, articles of manufacture, and methods improve the efficiency of using a computing device by limiting the amount of additional anti-hypothesis that are added to a prompt for generation of a response).
The combination of Elisco, Pei, Lin, and Lancioni does not explicitly, but Sackett teaches: and determining, by an entailment model, that an entailment score between the definition of the respective training data instance and the plurality of output tokens or between the first sentence and the plurality of output tokens satisfies a threshold ([0035] As the entailment classifier processes each of the example phrases within an entailment comparison dataset, it can generate an entailment mapping that includes the entailment value, and optionally a confidence score, for that example phrase. An entailment value is indicative of the input phrase entailing, contradicting, or being neutral with respect to an example phrase. The confidence score can be an indication of how confident the entailment classifier is that the entailment value is correct).
Elisco, Pei, Lin, Lancioni, and Sackett are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco, Pei, Lin, and Lancioni to combine the teachings of Sackett because doing so would allow for better classification of slang terms using entailment scores, leading to better detection of slang terms in training (Sackett [0044] Certain aspects of the present disclosure, including the use of an entailment classifier alone or combined with at least one of a pattern matching classifier (e.g., RegEx classifier) and an SML classifier (e.g., a BERT classifier), provide specific improvements to the technological process of interpreting user input and generating appropriate responses in natural language processing, especially with respect to chatbots and automated psychological therapy, and especially with respect to fields where subject-matter granularity is important).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Elisco in view of Pei as applied to claims 1, 4-6, 14, and 20 above, and further in view of Sewak et al. (US 20220414137 A1; hereinafter referred to as Sewak).
Regarding claim 7, the combination of Elisco and Pei teaches: the method of claim 6. The combination of Elisco and Pei does not explicitly, but Sewak teaches: wherein the encoder only transformer architecture comprises a decoding-enhanced bidirectional encoder representations from transformers with disentangled attention (DeBERTa) model ([0081] For better results, larger and more expressive models may be used. The models may preferably be pre-trained (concept of transfer learning where models learn partially from large unsupervised and unlabelled data) and further fine-tuned with data, preferably from similar domains as application requirements. Some examples of similar models could be (but not limited to) GPT-3, Microsoft DeBerta etc, preferably models with a good zero-shot generative capabilities mode (a mode in which the model could generate text without fine-tuning with specific type of data)).
Elisco, Pei, and Sewak are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco and Pei to combine the teachings of Sewak because doing so would allow for the use of a DeBERTa model, which would improve the efficiency for training a ML slang detection model (Sewak [0081] Generative NLP models available in a repository 162 are loaded (or remains pre-loaded throughout). For better results, larger and more expressive models may be used. The models may preferably be pre-trained (concept of transfer learning where models learn partially from large unsupervised and unlabelled data) and further fine-tuned with data, preferably from similar domains as application requirements).

Claims 8 and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Elisco in view of Pei, Kondadadi et al. (US 20170199963 A1; hereinafter referred to as Kondadadi), and Sackett.
Regarding claim 8, Elisco teaches: A method of training a machine learning (ML) model to mitigate slang usage, comprising: masking at least one token of a plurality of tokens for each first training data instance of a plurality of first training data instances to create a plurality of masked training data instances… ([0048] During training, a first proportion such as 15% of tokens in a textual input may be replaced with a special mask token, a second proportion such as 10% may be replaced with a random token, and/or remaining tokens may be kept the same. A task in training may be to predict an original masked token);
training the ML model, using a masked-language modeling head, to predict the at least one token masked for each of the plurality of masked training data instances using the plurality of first training data instances and the plurality of masked training data instances… ([0048] A task in training may be to predict an original masked token… Only masked tokens in a note may contribute to a loss function which influences how to change network parameters to improve performance. A model being trained may learn which tokens appear in a similar context for a given dataset);
and for each of the plurality of second training data instances, training the ML model, using a causal language modeling head ([0024] a transformer model may include pre-trained systems such as Bidirectional Encoder Representations from Transformers (BERT) and/or Generative Pre-trained Transformer (GPT). GPT inherently contains a causal language modeling head.), to predict the second plurality of tokens from the first plurality of tokens ([0048] During training, a first proportion such as 15% of tokens in a textual input may be replaced with a special mask token, a second proportion such as 10% may be replaced with a random token, and/or remaining tokens may be kept the same. A task in training may be to predict an original masked token).
Elisco does not explicitly, but Pei teaches: wherein: the plurality of tokens for each first training data instance in a first subset of the plurality of first training data instances ([4.1] We consider datasets that are composed of sentences in two distinct categories, standard (slangless) and slang-specific) does not include at least one slang token of a plurality of slang tokens ([4.1] The sentences from Wall Street News are taken to be non-slang sentences since the news-based sentences were typically standard English conformed and reviewed before publication. In order to construct an even more trustworthy negative set for standard English, we filtered the sentences from Wall Street News based on the proportion of unknown tokens within the sentences), and the plurality of tokens for each first training data instance in a second subset of the plurality of first training data instances includes the at least one slang token of the plurality of slang tokens… ([4.1] We collect positive examples from lexical entries in the Online Slang Dictionary (OSD) where example usage sentences are available).
Elisco and Pei are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco to combine the teachings of Pei because doing so would help improve slang detection and classification in machine learning models by utilizing tokenization and word embeddings to help locate slang (Pei [5] For unknown tokens, character-based convolutional embeddings improve the model in handling novel slang terms. We demonstrate that features combined with distributed word embeddings help machine detection of slang in general, and that Part-of-Speech among others is a prominent feature of slang usage. Our work provides a basis for locating slang from its flexible and unconventional syntactic word uses and offers opportunities for slang processing in downstream tasks in natural language processing).
The combination of Elisco and Pei does not explicitly, but Kondadadi teaches: obtaining a plurality of second training data instances, wherein each second training data instance comprises: a training input comprising a first plurality of tokens ([0063] each training text (e.g., free-form clinician narration) may be tokenized to break it down into various levels of syntactic substructure) including the at least one slang token of the plurality of slang tokens ([0083] in some embodiments, acronyms and abbreviations in training texts (e.g., past medical reports, and/or any other suitable medical texts) may be manually labeled with their proper expanded forms in each particular context in which they appear. Abbreviations can be a type of slang.);
and a training output comprising a second plurality of tokens that do not include the at least one slang token of the plurality of slang tokens… ([0083] matching of an acronym or abbreviation encountered in a medical report to its proper expanded form in that particular instance may be performed by an acronym/abbreviation expansion model, which may be trained statistically in some embodiments using methods similar to those described above for training the statistical entity detection model and/or the statistical relation model. The expanded form of an acronym or abbreviation is not considered slang.)
Elisco, Pei, and Kondadadi are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco and Pei to combine the teachings of Kondadadi because doing so would allow for the creation of training data that contains slang token inputs and outputs without slang tokens, which can be used for contextually training a ML model for slang detection (Kondadadi [0084] using techniques similar to those described above for other statistical models and statistical classifier models in general, the labeled training text may be used as input to train the statistical acronym/abbreviation expansion model by extracting features from the text including the acronym or abbreviation, and probabilistically associating the extracted features with the manually supplied label indicating the proper expanded form of the acronym or abbreviation in a particular context). 
The combination of Elisco, Pei, and Kondadadi does not explicitly, but Sackett teaches: wherein an entailment score between the training input and the training output of the respective second training data instance is greater than a threshold… ([0067] Each entailment value 218 indicates whether the input phrase 202 entails, contradicts, or is neutral with respect to the respective example phrase… the highest confidence score or a number of highest confidence scores (e.g., top three scores, all scores above a threshold score, etc.) can be used to generate the entailment classification data 222).
Elisco, Pei, Kondadadi, and Sackett are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco, Pei, and Kondadadi to combine the teachings of Sackett because doing so would allow for better classification of slang terms using entailment scores, leading to better detection of slang terms in training (Sackett [0044] Certain aspects of the present disclosure, including the use of an entailment classifier alone or combined with at least one of a pattern matching classifier (e.g., RegEx classifier) and an SML classifier (e.g., a BERT classifier), provide specific improvements to the technological process of interpreting user input and generating appropriate responses in natural language processing, especially with respect to chatbots and automated psychological therapy, and especially with respect to fields where subject-matter granularity is important).

Regarding claim 11, the combination of Elisco, Pei, Kondadadi, and Sackett teaches: the method of claim 8. Elisco further teaches: wherein masking the at least one token of the plurality of tokens for each training data instance of the plurality of first training data instances to create the plurality of masked training data instances comprises randomly masking at least a first percentage of the plurality of tokens for each first training data instance ([0048] Language modeling 320 may involve learning a probability distribution over a sequence of words, which probability distribution may be used to characterize relationships between words, for instance and without limitation as captured by geometric relationships between vectors as described in this disclosure. During training, a first proportion such as 15% of tokens in a textual input may be replaced with a special mask token).

Regarding claim 12, the combination of Elisco, Pei, Kondadadi, and Sackett teaches: the method of claim 8. Elisco further teaches: wherein masking the at least one token of the plurality of tokens for each first training data instance of the plurality of first training data instances to create the plurality of masked training data instances comprises masking the at least one slang token included in the plurality of tokens for each first training data instance in the second subset of the plurality of first training data instances ([0048] During training, a first proportion such as 15% of tokens in a textual input may be replaced with a special mask token... a resulting model may be catered towards clinically-specific text and language used by case workers. Model may be able to learn slang, acronyms, synonyms, misspellings, jargon, and more which may be otherwise absent from generalized models).

Regarding claim 13, the combination of Elisco, Pei, Kondadadi, and Sackett teaches: the method of claim 8. Elisco further teaches: wherein the ML model comprises an encoder-decoder transformer architecture ([0073] Encoder element 704 first receives an input 716, wherein an “input” is any textual representation, audiographic representation, and/or videographic representation. Input 716 is entered to a multi-head-attention 708 such that encoder 704 produces output encodings that are provided to the next encoder element and/or a decoder element 720. As used in this disclosure a decoder element 720 is an element that decodes the output encodings from encoder element 704 to provide output probabilities 724).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Elisco in view of Pei, Kondadadi, and Sackett as applied to claims 8 and 11-13 above, and further in view of Lin and Nagaraju.
Regarding claim 9, the combination of Elisco, Pei, Kondadadi, and Sackett teaches: the method of claim 8. Sackett further teaches: and determining, by an entailment model, that an entailment score between the definition of the respective training data instance ([0065] The entailment classifier 204 is a classifier that uses the input phrase 202 and the selected entailment comparison data 208 to generate entailment classification data 222 (e.g., via generating an entailment mapping 216). The entailment classifier 204 can be a model (e.g., entailment model), such as a machine learning model, that has been pre-trained to determine whether or not a first phrase (e.g., the input phrase 202) entails a second phrase (e.g., each example phrase 210)) and the plurality of output tokens satisfies a threshold ([0084] If a positive classification is made (e.g., the entailment classifier identifies at least one topic, optionally associated with a confidence score above a threshold confidence score, and optionally with a particular entailment value or values), that classification data can be output and used to generate a response at block 412).
The combination of Elisco, Pei, Kondadadi, and Sackett does not explicitly, but Lin teaches: each training data instance in the first subset of the plurality of first training data instances comprises at least a definition associated with a slang token in the plurality of slang tokens… ([0054] plurality of slang sentences and respective explanations for the plurality of slang sentences are extracted from audio or video data by capturing emotional reactions to slang. A set of training samples are generated by clustering the plurality of slang sentences and the explanations. Each of the training samples comprises at least one slang sentence and at least one explanation corresponding to same slang).
Elisco, Pei, Kondadadi, Sackett, and Lin are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco, Pei, Kondadadi, and Sackett to combine the teachings of Lin because doing so would improve the efficiency of training a ML model to identify and understand slang by using clustered training data instances (Lin [0065] in order to improve the training efficiency of the slang identification and explanation model 403, the data processing apparatus 411 may further cluster explanations mapped to a same slang category into explanation categories based on similarities of these explanations).
The combination of Elisco, Pei, Kondadadi, Sackett, and Lin does not explicitly, but Nagaraju teaches: and the method further comprises generating the second subset of the plurality of first training data instances by, for each training data instance in the first subset of the plurality of first training data instances: generating, by a large language model (LLM), a plurality of output tokens ([0020] increase the diversity of the generated outputs, the template queries may include examples of outputs that use colloquial phrasing or descriptions of task-specific items instead of naming the task-specific item directly. Some template queries may include outputs that do not correspond to any task-specific item, and/or some template queries may include examples of long or complex outputs or outputs that include terms modifying task-specific items. Because the LLM has been trained on such a large amount of data, the LLM may be able to appropriately respond to the varied template queries and generate a more varied set of conversational query outputs) based on a prompt comprising the respective training data instance and the slang token associated with the respective training data instance ([0049] Prompt 310 may then be provided to trained large language model 134, with the example input/output pairings providing trained large language model 134 with few-shot examples of the kind of output value to be generated);
determining the plurality of output tokens comprise the slang token… ([0049] trained large language model 134 may generate an output that more closely simulates a likely human's response to the same prompt 310. For example, trained large language model 134 may generate an output that includes conversational colloquialisms such as asking for “the twelve-piece” instead of “the twelve-piece chicken bucket.”).
Elisco, Pei, Kondadadi, Sackett, Lin, and Nagaraju are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco, Pei, Kondadadi, Sackett, and Lin to combine the teachings of Nagaraju because doing so would allow for the efficient creation of diverse training data to be used for training and improving a ML model for slang detection (Nagaraju [0022] The advantages of the disclosed techniques include but are not limited to systems and methods that are capable of generating a large amount of task-specific training data without the time and expense required by traditional data collection methods. By using specially curated template queries and task-specific structured data, the disclosed techniques can also create training data that includes more linguistic diversity and less ungrammatical or implausible examples than other dataset generation methods).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Elisco in view of Pei, Kondadadi, and Sackett as applied to claims 8 and 11-13 above, and further in view of Lin and Lacioni.
Regarding claim 10, the combination of Elisco, Pei, Kondadadi, and Sackett teaches: the method of claim 8. Sackett further teaches: and determining, by an entailment model, that an entailment score between the definition of the respective training data instance and the plurality of output tokens satisfies a threshold ([0035] As the entailment classifier processes each of the example phrases within an entailment comparison dataset, it can generate an entailment mapping that includes the entailment value, and optionally a confidence score, for that example phrase. An entailment value is indicative of the input phrase entailing, contradicting, or being neutral with respect to an example phrase. The confidence score can be an indication of how confident the entailment classifier is that the entailment value is correct).
The combination of Elisco, Pei, Kondadadi, and Sackett does not explicitly, but Lin teaches: each training data instance in the second subset of the plurality of first training data instances comprises at least a definition of a slang token in the plurality of slang tokens and a first sentence comprising the slang token… ([0054] plurality of slang sentences and respective explanations for the plurality of slang sentences are extracted from audio or video data by capturing emotional reactions to slang. A set of training samples are generated by clustering the plurality of slang sentences and the explanations. Each of the training samples comprises at least one slang sentence and at least one explanation corresponding to same slang).
Elisco, Pei, Kondadadi, Sackett, and Lin are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco, Pei, Kondadadi, and Sackett to combine the teachings of Lin because doing so would improve the efficiency of training a ML model to identify and understand slang by using clustered training data instances (Lin [0065] in order to improve the training efficiency of the slang identification and explanation model 403, the data processing apparatus 411 may further cluster explanations mapped to a same slang category into explanation categories based on similarities of these explanations).
The combination of Elisco, Pei, Kondadadi, Sackett, and Lin does not explicitly, but Lancioni teaches: and the method further comprises, generating the first subset of the plurality of first training data instances by, for each training data instance in the second subset of the plurality of first training data instances: generating, by a large language model (LLM), a plurality of output tokens ([0039] The example large language model circuitry 155 of the illustrated example of FIG. 1 executes a large language model to transform an input prompt into an output message) based on a prompt comprising the first sentence of the respective training data instance comprising the slang token ([0043] the message validator circuitry 170 augments the message by placing the message received from the large language model circuitry 155 into a prompt template that is designed to request the large language model circuitry 155 to determine whether any terms, phrases, sayings, etc. are included in the message that indicate that the message is inappropriate. Such phrases may include, for example, offensive language, slang or unprofessionally written language);
determining the plurality of output tokens do not comprise the slang token… ([0061-0062] the message validator circuitry 170 may seek to determine whether inappropriate content is included in the response message, whether offensive language is included in the response message, whether the response message contains slang or unprofessionally written language.. a positive identification of the offensive language and slang wording may result in both anti-hypothesis being utilized to modify the original prompt).
Elisco, Pei, Kondadadi, Sackett, Lin, and Lancioni are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco, Pei, Kondadadi, Sackett, and Lin to combine the teachings of Lancioni because doing so would allow for specific training data to generated by an LLM for the purpose of training and improving an ML model for slang detection (Lancioni [0124] methods have been disclosed that enable large language models to be utilized in various contexts while providing guardrails for the responses that are provided the LLM. Disclosed systems, apparatus, articles of manufacture, and methods improve the efficiency of using a computing device by limiting the amount of additional anti-hypothesis that are added to a prompt for generation of a response).

Claims 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Elisco in view of Pei, as applied to claims 1, 4-6, 14, and 20 above, and further in view of Lacioni and Sackett.
Regarding claim 15, the combination of Elisco and Pei teaches: the method of claim 14. The combination of Elisco and Pei does not explicitly, but Lancioni teaches: further comprising, based on at least one of the first classification output comprising the slang instance classification or the at least one second classification output comprising the slang token classification, processing with a second ML model trained for slang mitigation ([0062] a first hypothesis that the message includes offensive language may be tested in addition to a second hypothesis that the message includes slang wording. In some examples, a positive identification of the offensive language and slang wording may result in both anti-hypothesis being utilized to modify the original prompt), the input sentence to generate an output sentence… ([0039] The example large language model circuitry 155 of the illustrated example of FIG. 1 executes a large language model to transform an input prompt into an output message),
and the output sentence does not comprise any slang tokens ([0061-0062] the message validator circuitry 170 may seek to determine whether inappropriate content is included in the response message, whether offensive language is included in the response message, whether the response message contains slang or unprofessionally written language.. a positive identification of the offensive language and slang wording may result in both anti-hypothesis being utilized to modify the original prompt).
Elisco, Pei, and Lancioni are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco and Pei to combine the teachings of Lancioni because doing so would allow for specific training data to generated by an LLM for the purpose of training and improving an ML model for slang detection (Lancioni [0124] methods have been disclosed that enable large language models to be utilized in various contexts while providing guardrails for the responses that are provided the LLM. Disclosed systems, apparatus, articles of manufacture, and methods improve the efficiency of using a computing device by limiting the amount of additional anti-hypothesis that are added to a prompt for generation of a response).
The combination of Elisco, Pei, and Lancioni does not explicitly, but Sackett teaches: wherein: an entailment score between the input sentence and the output sentence satisfies a threshold… ([0067] Each entailment value 218 indicates whether the input phrase 202 entails, contradicts, or is neutral with respect to the respective example phrase… the highest confidence score or a number of highest confidence scores (e.g., top three scores, all scores above a threshold score, etc.) can be used to generate the entailment classification data 222).
Elisco, Pei, Lancioni, and Sackett are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco, Pei, and Lancioni to combine the teachings of Sackett because doing so would allow for better classification of slang terms using entailment scores, leading to better detection of slang terms in training (Sackett [0044] Certain aspects of the present disclosure, including the use of an entailment classifier alone or combined with at least one of a pattern matching classifier (e.g., RegEx classifier) and an SML classifier (e.g., a BERT classifier), provide specific improvements to the technological process of interpreting user input and generating appropriate responses in natural language processing, especially with respect to chatbots and automated psychological therapy, and especially with respect to fields where subject-matter granularity is important).

Regarding claim 16, the combination of Elisco, Pei, Lancioni, and Sackett teaches: the method of claim 15. Lancioni further teaches: further comprising using the output sentence to perform one or more tasks ([0022] Moreover, in some examples, the output data may undergo post-processing after it is generated by the model to transform the output into a useful result (e.g., a display of data, an instruction to be executed by a machine, etc.)).

Claims 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Elisco in view of Pei, Lancioni, and Sackett, as applied to claims 15-16 above, and further in view of Tensmeyer et al. (US 20230334244 A1; hereinafter referred to as Tensmeyer).
Regarding claim 17, the combination of Elisco, Pei, Lancioni, and Sackett teaches: the method of claim 15. Elisco further teaches: wherein: the first ML model comprises an encoder only transformer architecture… ([0024] a transformer model may include pre-trained systems such as Bidirectional Encoder Representations from Transformers (BERT)).
The combination of Elisco, Pei, Lancioni, and Sackett does not explicitly, but Tensmeyer teaches: and the second ML model comprises an encoder-decoder transformer architecture ([0049] a sequence-to-sequence training is used, where the fixer module 126 uses an encoder-decoder model to predict the probability of the next token given a context from a previous token. The output of the fixer module 126 can then be used to generate modified training sentence 612 with the assertion in the training sentence corresponding to masked training sentence 606).
Elisco, Pei, Lancioni, Sackett, and Tensmeyer are considered analogous in the field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Elisco, Pei, Lancioni, and Sackett to combine the teachings of Tensmeyer because doing so would allow for use of an encoder-decoder transformer for more flexible training of a ML model using masked tokens (Tensmeyer [0036] the interaction of the user with the system can be used as a training signal to further improve the fact correction system 102. For example, when a user selects a suggested sentence from a ranked list, the selection can be used to further train the fact correction system 102 to rank that sentence first, or higher, in subsequent suggested corrections. In some embodiments, the data table used to modify the masked tokens in natural language sentence 105 can also be provided in output 130).

Regarding claim 18, the combination of Elisco, Pei, Lancioni, and Sackett teaches: the method of claim 15. Pei further teaches: wherein: each training data instance of a first subset of the plurality of masked training data instances does not include at least one slang token of a plurality of slang tokens ([4.1] The sentences from Wall Street News are taken to be non-slang sentences since the news-based sentences were typically standard English conformed and reviewed before publication. In order to construct an even more trustworthy negative set for standard English, we filtered the sentences from Wall Street News based on the proportion of unknown tokens within the sentences), and each training data instance of a second subset of the plurality of masked training data instances comprises the at least one slang token of the plurality of slang tokens ([4.1] We collect positive examples from lexical entries in the Online Slang Dictionary (OSD) where example usage sentences are available). 
The combination of Elisco, Pei, Lancioni, and Sackett does not explicitly, but Tensmeyer teaches: prior to using the second ML model to process the input sentence to generate the output sentence: training the second ML model, using a masked-language modeling head, to predict at least one token masked for each of a plurality of masked training data instances… ([0004] The fact correction system then predicts a new token for each of the one or more masked tokenized element based on the input sentence with the one or more masked tokenized element and the identified data table using a third machine learning model).

Regarding claim 19, the combination of Elisco, Pei, Lancioni, Sackett, and Tensmeyer teaches: the method of claim 18. Lancioni further teaches: training the second ML model, using a causal language modeling head ([0015] Many different types of machine learning models and/or machine learning architectures exist. In examples disclosed herein, a Large Language Model (LLM) such as ChatGPT is used. GPT models inherently contain a causal language modeling head.), to generate a second output sentence from a second output sentence including the at least one slang token of the plurality of slang tokens ([0061-0062] the message validator circuitry 170 may seek to determine whether inappropriate content is included in the response message, whether offensive language is included in the response message, whether the response message contains slang or unprofessionally written language.. a positive identification of the offensive language and slang wording may result in both anti-hypothesis being utilized to modify the original prompt).

	
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Premkumar et al. (US 20180107655 A1) – discloses a method of labeling a level of formality of target words in an input text.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Nathan Tengbumroong whose telephone number is (703)756-1725. The examiner can normally be reached Monday - Friday, 11:30 am - 8:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NATHAN TENGBUMROONG/Examiner, Art Unit 2654                                            

/HAI PHAN/Supervisory Patent Examiner, Art Unit 2654
Read full office action
Prosecution Timeline

Jan 26, 2024
Application Filed
Jan 09, 2026
Non-Final Rejection mailed — §103
Mar 30, 2026
Examiner Interview Summary
Mar 30, 2026
Applicant Interview (Telephonic)
Precedent Cases

Applications granted by this same examiner with similar technology

18/195,121
Patent 12640161
METHOD AND APPARATUS FOR PROCESSING AUDIO FOR SCENE CLASSIFICATION
3y 0m to grant Granted May 26, 2026
18/173,495
Patent 12530536
Mixture-Of-Expert Approach to Reinforcement Learning-Based Dialogue Management
2y 11m to grant Granted Jan 20, 2026
17/876,156
Patent 12451142
NON-WAKE WORD INVOCATION OF AN AUTOMATED ASSISTANT FROM CERTAIN UTTERANCES RELATED TO DISPLAY CONTENT
3y 2m to grant Granted Oct 21, 2025
17/883,265
Patent 12412050
MULTI-PLATFORM VOICE ANALYSIS AND TRANSLATION
3y 1m to grant Granted Sep 09, 2025
Study what changed to get past this examiner. Based on 4 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

1-2
Expected OA Rounds
47%
Grant Probability
74%
With Interview (+26.7%)
3y 0m (~8m remaining)
Median Time to Grant
Low
PTA Risk
Based on 19 resolved cases by this examiner. Grant probability derived from career allowance rate.