DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This Office Action is in response to the communication filed on 01 October 2025.
Claims 1-5, 7-9, and 13-20 are being considered on the merits.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 7-9, and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mahna (US 2022/0382722 A1; hereinafter, “Mahna”) in view of Fei et al. (US 2020/0311519 A1; hereinafter, “Fei”), in view of Ferreira, M.C. (“Incident Routing: Text Classification, Feature Selection, Imbalanced Datasets, and Concept Drift In Incident Ticket Management.” Master’s Thesis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil, 2017; hereinafter, “Ferreira”), and further in view of Bender et al. (US 2021/0294829 A1; hereinafter, “Bender”).
Regarding claims 1, 17, and 19, Mahna teaches:
A computer-implemented training method for training a machine-learning model for predicting event tags (Mahna, para. 0005: “In an aspect, a method for generating a schema, the method comprising displaying, at a graphical control interface, a content field window, receiving, as a function of the content field window, a criterion element, and generating a schema as a function of the criterion element, wherein generating the schema further comprises identifying at least a significant term as a function of the criterion element, receiving at least a training example, training a machine-learning model as a function of the at least a training example, and generating the schema as a function of the criterion element and the machine-learning model”) from event data, (Mahna, para. 0068 and 0086: “Elements in training data 504 may be linked to descriptors of categories by tags, tokens, or other data elements” “Now referring to FIG. 8, an exemplary embodiment 800 of a criterion element 116 is illustrated. In an embodiment, and without limitation, criterion element 116 may include an ailment criterion 804. As used in this disclosure an “ailment criterion” is an element of datum denoting a parameter and/or identifier associated with an ailment. For example, and without limitation, ailment criterion 804 may denote criterion associated with multiple sclerosis. In an embodiment, and without limitation, criterion element 116 may include a clinical criterion 808. As used in this disclosure a “clinical criterion” is an element of datum denoting a parameter and/or identifier associated with a clinical history and/or medical record.” Examiner notes that Mahna teaches an event at least in the non-limiting example of a clinical history and/or medical record) comprising:
A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers, cause the one or more computers to perform (Mahna, paras. 0021-0022: “At a high level, aspects of the present disclosure are directed to systems and methods for generating a schema…Computing device may include one or more computing devices dedicated to data storage, security, distribution of traffic for load balancing, and the like. Computing device may distribute one or more computing tasks as described below across a plurality of computing devices of computing device, which may operate in parallel, in series, redundantly, or in any other manner used for distribution of tasks or memory between computing devices.”):
One or more computer-readable storage media storing instructions (Mahna, para. 0091: “A machine-readable storage medium may be any medium that is capable of storing and/or encoding a sequence of instructions for execution by a machine (e.g., a computing device) and that causes the machine to perform any one of the methodologies and/or embodiments described herein.”) that, when executed by one or more computers of a system (Mahna, para. 0022: “Computing device may include a single computing device operating independently, or may include two or more computing device operating in concert, in parallel, sequentially or the like; two or more computing devices may be included together in a single computing device or in two or more computing devices. Computing device may interface or communicate with one or more additional devices as described below in further detail via a network interface device. Network interface device may be utilized for connecting computing device to one or more of a variety of networks, and one or more devices”), cause the one or more computers to perform:
obtaining event data that specifies, for each of a plurality of events, a respective set of text fields characterizing the respective event (Mahna, para. 0025 and 0027: “Criterion element 116 may further be received as a function of content field window 108. As used in this disclosure a “content field window” is a window and/or field that allows a user to enter criterion element 116. For example, and without limitation, content field window 108 may include one or more text boxes, text fields, text entry boxes, and the like thereof.” “As a further non-limiting example, criterion element 116 may include a treatment plan for a medical condition. As a further non-limiting example, criterion element 116 may include a time and/or time period required to treat the medical condition, wherein a time period is any temporal window such as, but not limited to seconds, minutes, hours, days, weeks, months, years, and the like thereof. As a further non-limiting example, criterion element 116 may include a time and/or date the criteria was entered.” Examiner notes that Mahna teaches an event at least in the non-limiting example of a clinical history and/or medical record);
encoding, via a language feature generation engine (Mahna, para. 0051: “Referring still to FIG. 1, at least a processor 104 may use a language processing module. Language processing module may include any hardware and/or software module. Language processing module may be configured to extract, from the one or more documents, one or more words. One or more words may include, without limitation, strings of one or more characters, including without limitation any sequence or sequences of letters, numbers, punctuation, diacritic marks, engineering symbols, geometric dimensioning and tolerancing (GMT) symbols, chemical symbols and formulas, spaces, whitespace, and other symbols, including any symbols usable as textual data as described above. Textual data may be parsed into tokens, which may include a simple word (sequence of letters separated by whitespace) or more generally a sequence of characters as described previously.”), from the event data, language features for the plurality of events, (Mahna, para. 0034: “Still referring to FIG. 1, in some cases, OCR may include post-processing. For example, OCR accuracy can be increased, in some cases, if output is constrained by a lexicon. A lexicon may include a list or set of words that are allowed to occur in a document. In some cases, a lexicon may include, for instance, all the words in the English language, or a more technical lexicon for a specific field.” Examiner notes that, under the broadest reasonable interpretation, “encoded language features” reads on a “lexicon” in which the words carry defined meanings)
wherein encoding the language features for the plurality of events comprises: obtaining one or more configuration parameters for an n-gram processing model; (Mahna, paras. 0024, 0038, 0043 and 0083: “Processor 104 and/or computing device may perform determinations, classification, and/or analysis steps, methods, processes, or the like as described in this disclosure using machine learning processes” “Still referring to FIG. 1, at least a processor 104 is configured to produce a schema 124 as a function of criterion element 116. As used in this disclosure a “schema” is a diagram and/or chart that aids in determining a course of action. In an embodiment, and without limitation, schema may be configured to present and/or depict a statistical probability.” “As used in this disclosure a “semantic relationship” is a relationship between the at least a significant term 128 and a semantic unit in criterion element 116…As used in this disclosure “semantic units” are words, phrases, sentences, and/or “n-grams” of words, defined as a set of n words appearing contiguously in a text.” “Still referring to FIG. 7, at step 725, at least a processor 104 may decide if available data is sufficient to satisfy specified at least a criterion. In some cases, step 725 may include ensuring that an analytical constraint corresponds to each analytical parameter of a plurality of parameters”)
generating an n-gram input based on the event data; (Mahna, para. 0039 and 0043: “Referring still to FIG. 1, at least a processor 104 may identifying at least a significant term 128 as a function of criterion element 116. As used in this disclosure a “significant term” is any string of symbols, text, and/or depictions that represent one or more objects and/or entities that influence a medical decision” “Still referring to FIG. 1, each vector may represent a semantic relationship between at least a significant term 128 and a semantic unit in criterion element 116…As used in this disclosure “semantic units” are words, phrases, sentences, and/or “n-grams” of words, defined as a set of n words appearing contiguously in a text” Examiner notes that Mahna teaches an n-gram input to a vector, where the n-gram is a relationship based on a significant term, which itself is based on event data).
processing the n-gram input using the n-gram processing model characterized by the one or more configuration parameters (Mahna at paras. 0024, 0038, 0043 and 0083, supra, teaches a model with configuration parameters to process n-grams)…characterizing the respective event, (Mahna, para. 0025 and 0043: “Still referring to FIG. 1, at least a processor 104 is configured to display a content field window 108. Criterion element 116 may further be received as a function of content field window 108. As used in this disclosure a “content field window” is a window and/or field that allows a user to enter criterion element 116. For example, and without limitation, content field window 108 may include one or more text boxes, text fields, text entry boxes, and the like thereof. As a further non-limiting content field window 108 may include one or more drop down menus, buttons, and/or selection options.” “Still referring to FIG. 1, each vector may represent a semantic relationship between at least a significant term 128 and a semantic unit in criterion element 116. As used in this disclosure a “semantic relationship” is a relationship between the at least a significant term 128 and a semantic unit in criterion element 116.”)
…to generate the encoded language features; (Mahna, para. 0034: “Still referring to FIG. 1, in some cases, OCR may include post-processing. For example, OCR accuracy can be increased, in some cases, if output is constrained by a lexicon. A lexicon may include a list or set of words that are allowed to occur in a document. In some cases, a lexicon may include, for instance, all the words in the English language, or a more technical lexicon for a specific field.”)
obtaining knowledge data that specifies information of the event data; (Mahna, para. 0021: “In an embodiment, this disclosure can receive a criterion element as a function of a user input. Aspects of the present disclosure can train a machine-learning model using the criterion element. This is so, at least in part, because this disclosure identifies significant terms as a function of the criterion element and trains the machine-learning model with the significant terms.”)
generating, via a content tagging engine (Mahna, para. 0039: “at least a processor 104 may identifying at least a significant term 128 as a function of criterion element 116.”), from the event data and the knowledge data, tag data specifying a respective tag for each of the plurality of events; (Mahna, para. 0068: “As a non-limiting example, training data 504 may include data entered in standardized forms by persons or processes, such that entry of a given data element in a given field in a form may be mapped to one or more descriptors of categories. Elements in training data 504 may be linked to descriptors of categories by tags, tokens, or other data elements; for instance, and without limitation, training data 504 may be provided in fixed-length formats, formats linking positions of data to categories such as comma-separated value (CSV) formats and/or self-describing formats such as extensible markup language (XML), JavaScript Object Notation (JSON), or the like, enabling processes or devices to detect categories of data.”)
wherein generating the tag data comprises: generating, from the event data and according to a selected n-gram feature configuration… (Mahna, para. 0039 and 0043: “Referring still to FIG. 1, at least a processor 104 may identifying at least a significant term 128 as a function of criterion element 116. As used in this disclosure a “significant term” is any string of symbols, text, and/or depictions that represent one or more objects and/or entities that influence a medical decision” “Still referring to FIG. 1, each vector may represent a semantic relationship between at least a significant term 128 and a semantic unit in criterion element 116…As used in this disclosure “semantic units” are words, phrases, sentences, and/or “n-grams” of words, defined as a set of n words appearing contiguously in a text” Examiner notes that Mahna teaches an n-gram input to a vector, where the n-gram is a relationship based on a significant term, which itself is based on event data)
processing the list of n-grams and knowledge data to generate the tag data; (Mahna, para. 0068: “Training data 504 may be formatted and/or organized by categories of data elements, for instance by associating data elements with one or more descriptors corresponding to categories of data elements. As a non-limiting example, training data 504 may include data entered in standardized forms by persons or processes, such that entry of a given data element in a given field in a form may be mapped to one or more descriptors of categories. Elements in training data 504 may be linked to descriptors of categories by tags, tokens, or other data elements”)
generating, from at least the encoded language features, a respective encoded feature vector for each of the plurality of events; (Mahna, para. 0068 and 0086: “Elements in training data 504 may be linked to descriptors of categories by tags, tokens, or other data elements” “Now referring to FIG. 8, an exemplary embodiment 800 of a criterion element 116 is illustrated. In an embodiment, and without limitation, criterion element 116 may include an ailment criterion 804. As used in this disclosure an “ailment criterion” is an element of datum denoting a parameter and/or identifier associated with an ailment. For example, and without limitation, ailment criterion 804 may denote criterion associated with multiple sclerosis. In an embodiment, and without limitation, criterion element 116 may include a clinical criterion 808. As used in this disclosure a “clinical criterion” is an element of datum denoting a parameter and/or identifier associated with a clinical history and/or medical record.” Examiner notes that Mahna teaches an event at least in the non-limiting example of a clinical history and/or medical record).
by arranging the encoded language features into a vector format for each event based on a predefined feature configuration space; (Mahna, para. 0040: “Still referring to FIG. 1, at least a processor 104 may be configured to generate a vector for at least a significant term 128. As used in this disclosure a “vector” as defined in this disclosure is a data structure that represents one or more quantitative values and/or measures significant terms. A vector may be represented as an n-tuple of values, where n is one or more values, as described in further detail below;”)
combining the tag data with the encoded feature vectors (Mahna, para. 0043: “Still referring to FIG. 1, each vector may represent a semantic relationship between at least a significant term 128 and a semantic unit in criterion element 116. As used in this disclosure a “semantic relationship” is a relationship between the at least a significant term 128 and a semantic unit in criterion element 116. As a non-limiting example, semantic relationships may include associations between the meanings of phrases, sentences, paragraphs, essays, novels, and/or written documents. Additionally and/or alternatively semantic relationships may include, without limitation, synonymy, antonymy, homonymy, polysemy, and/or metonymy.”) to generate a plurality of training examples, (Mahna, para. 0068: “Still referring to FIG. 5, “training data,” as used herein, is data containing correlations that a machine-learning process may use to model relationships between two or more categories of data elements. For instance, and without limitation, training data 504 may include a plurality of data entries, each entry representing a set of data elements that were recorded, received, and/or generated together; data elements may be correlated by shared existence in a given data entry, by proximity in a given data entry, or the like. Multiple data entries in training data 504 may evince one or more trends in correlations between categories of data elements”) each training example including an encoded feature vector and a corresponding training tag; and (Mahna, para. 0068: “Elements in training data 504 may be linked to descriptors of categories by tags, tokens, or other data elements”)
performing training of the machine-learning model on the plurality of training examples, (Mahna, para. 0024: “Processor 104 and/or computing device may perform determinations, classification, and/or analysis steps, methods, processes, or the like as described in this disclosure using machine learning processes. A “machine learning process,” as used in this disclosure, is a process that automatedly uses a body of data known as “training data” and/or a “training set” (described further below) to generate an algorithm that will be performed by a computing device/module to produce outputs given data provided as inputs”)
wherein the model includes a neural network with a deep convolutional neural network (CNN) architecture (Mahna, para. 0072: “As a further non-limiting example, a machine-learning model 524 may be generated by creating an artificial neural network, such as a convolutional neural network comprising an input layer of nodes, one or more intermediate layers, and an output layer of nodes.”)
wherein performing the training of the machine-learning model comprises: processing the encoded feature vector of the training example (Mahna, para. 0042: “In an embodiment, and with continued reference to FIG. 1, each unique extracted and/or other language element may be represented by a dimension of a vector space; as a non-limiting example, each element of a vector may include a number representing an enumeration of co-occurrences of the significant term 128 and/or language element represented by the vector with another significant term 128, and/or language element”) using the machine-learning model (Mahna, para. 0079: “Criterion element 116 may be any of the elements as described herein with reference to FIG. 1. Schema 124 may be any of the schema as described herein with reference to FIG. 1. Machine learning model may be any of the machine learning models as described herein with reference to FIGS. 1 and 5.”) and
in accordance with current values of parameters of the machine-learning model to generate a predicted tag for the encoded feature vector; (Mahna, para. 0068: “As a non-limiting example, training data 504 may include data entered in standardized forms by persons or processes, such that entry of a given data element in a given field in a form may be mapped to one or more descriptors of categories. Elements in training data 504 may be linked to descriptors of categories by tags, tokens, or other data elements; for instance, and without limitation, training data 504 may be provided in fixed-length formats, formats linking positions of data to categories such as comma-separated value (CSV) formats and/or self-describing formats such as extensible markup language (XML), JavaScript Object Notation (JSON), or the like, enabling processes or devices to detect categories of data.”)
updating the current values of the parameters (Mahna, para. 0027 and 0057: “As used in this disclosure a “criterion element” is an element of datum denoting a parameter and/or identifier associated with a medical record.” “At least a processor 104 may update machine-learning model 132 as a function of the current criterion element and produce schema 124 as a function of the updated machine-learning model.”) using the gradient, (Mahna, para. 0049: “Algorithm to generate language processing model may include a stochastic gradient descent algorithm, which may include a method that iteratively optimizes an objective function, such as an objective function representing a statistical estimation of relationships between terms, including relationships between input terms and output terms, in the form of a sum of relationships to be estimated.”)
generating, using the machine-learning model and in accordance with the current values of parameters of the machine-learning model, a plurality of predicted tags for a plurality of additional events; (Mahna, para. 0068: “As a non-limiting example, training data 504 may include data entered in standardized forms by persons or processes, such that entry of a given data element in a given field in a form may be mapped to one or more descriptors of categories. Elements in training data 504 may be linked to descriptors of categories by tags, tokens, or other data elements”)
evaluating a prediction error for one or more of the predicted tags based on the knowledge data; and (Mahna, para. 0074: “Scoring function may be expressed as a risk function representing an “expected loss” of an algorithm relating inputs to outputs, where loss is computed as an error function representing a degree to which a prediction generated by the relation is incorrect when compared to a given input-output pair provided in training data 504.” Examiner notes that Mahna teaches loss as an error function based on evaluation of the prediction.)
Mahna does not explicitly disclose, however Fei teaches:
…to generate an n-gram output that includes a respective set of n-grams for each set of text fields… (Fei, para. 0041: “Given raw text sequence, the skip-gram generation and convolution module is configured to generate non-consecutive n-gram sequences (as shown in FIG. 2), to extract non-consecutive n-gram features and to detect localized abstract features at different positions (as shown in FIG. 3), which provide more comprehensive and enriched text expression in syntax, help understand the varying text expression better” Examiner notes that Fei teaches a set of output n-grams in the form of skip-grams where each set of input n-grams produces a corresponding set of output skip-grams)
wherein the one or more configuration parameters include a minimum gram size and a maximum gram size (Fei, para. 0069: “Experiments are also conducted to test the robustness of deep skip-gram network embodiments with difference[sic] gram sizes on large-scale datasets. The gram sizes are respectively set to 3, 4, 5 individually and the filter size and stride size are set to be equal to corresponding gram size. Their results are compared to aforementioned gram size setting: 3-4-5 combination”) received on a user interface (Fei, para. 0079: “A number of controllers and peripheral devices may also be provided, as shown in FIG. 9. An input controller 903 represents an interface to various input device(s) 904, such as a keyboard, mouse, touchscreen, and/or stylus.”) based on a user input (Fei, para. 0025: “The skip generation module 110 generates a plurality of skip-gram sequences 115 with n-gram length from input text data 105. In one or more embodiments, the input text data 105 comprises one or more statements or sentences. The input text or input sentence comprises one or more words, and may or may not be a format of formal sentence.”), and wherein the system is associated with the user interface (Fei, para. 0079, supra, teaches a user interface); and
processing the n-gram output, with an n-gram encoding model having a neural network (Fei, para. 0025: “The skip-gram convolution module 130 may be a convolutional neural network (CNN) to extract non-consecutive n-gram features of varying text expression, and effectively detects local-range features at different positions”)
…a respective list of n-grams for each of the plurality of events; and (Fei, para. 0041: “Given raw text sequence, the skip-gram generation and convolution module is configured to generate non-consecutive n-gram sequences (as shown in FIG. 2)”)
with a SoftMax activation function, (Fei, para. 0064: “In one or more experimental settings, only one skip-gram convolutional layer, one-max-overtime-pooling layer, one recurrent layer and one fully-connected softmax layer are used.”) and a categorical cross-entropy loss function to train the neural network and (Fei, para. 0052: “In one or more embodiments, for classification, the cross entropy loss function is used to train the deep skip-gram networks”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Fei into Mahna. Mahna teaches generating a schema using machine learning and user inputs; Fei teaches incorporating skip-gram convolution to extract non-consecutive local n-gram patterns, providing comprehensive information for varying text expressions. One of ordinary skill would have been motivated to combine the teachings of Fei into Mahna in order to more comprehensively capture local patterns of varying text expressions of human language (Fei, para. 0037).
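Examiner additionally notes, purely for illustration, the distinction between the consecutive n-grams of Mahna's "semantic units" and the non-consecutive skip-grams of Fei. The sketch below is hypothetical code, not drawn from either reference; the function names and the example token sequence are illustrative assumptions only.

```python
# Hypothetical illustration only -- not code from Mahna or Fei.
# Consecutive n-grams: sets of n words appearing contiguously in a text
# (Mahna, para. 0043). Skip-grams: non-consecutive n-gram sequences drawn
# from a local window (Fei, para. 0041, FIG. 2).
from itertools import combinations

def ngrams(tokens, n):
    # Every run of n contiguous tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skip_grams(tokens, n, window):
    # Every ordered choice of n tokens inside a sliding window,
    # so non-adjacent tokens may also be paired.
    grams = set()
    for start in range(len(tokens) - window + 1):
        for idxs in combinations(range(start, start + window), n):
            grams.add(tuple(tokens[i] for i in idxs))
    return sorted(grams)

tokens = "history of multiple sclerosis".split()
bigrams = ngrams(tokens, 2)        # only adjacent pairs
skips = skip_grams(tokens, 2, 3)   # also pairs non-adjacent tokens, e.g. ("history", "multiple")
```

As the final comment indicates, the skip-gram output is a superset of the local pairings that a purely consecutive n-gram pass would produce, which is the "more comprehensive" capture motivating the combination.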
Moreover, Ferreira teaches:
determining a gradient with respect to the parameters of the machine-learning model of a training loss that measures, for each training example, an error between the predicted tag for the training example and the training tag in the training example; (Ferreira, pg. 37 penultimate paragraph and last paragraph: “These labels can then be compared to the actual expected labels and the error corresponding to the set of weights can be calculated.” “a learning model that employs the gradient descent looks to find the set of weights that, when multiplied by the values of the attributes, will produce the smallest error.”)
performing an updated training of the machine-learning model based on the prediction error. (Ferreira, pg. 36 penultimate paragraph and pg. 37 first paragraph: “These labels can then be compared to the actual expected labels and the error corresponding to the set of weights can be calculated.” “In the stochastic gradient descent method, the error (and the gradient of the point) is calculated for only one item that is randomly (or stochastically) picked, therefore considerably decreasing the time it takes for the weights to be updated and the time it takes the algorithm to go down the slope”)
wherein the parameter is updated using backpropagation-based machine learning techniques; (Ferreira, pg. 36, first paragraph: “Such weights are adjusted through the backpropagation algorithm.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Ferreira into Mahna, as modified. Ferreira teaches a system that automatically assigns registered incidents to the areas that have the expertise to solve them. One of ordinary skill would have been motivated to combine the teachings of Ferreira into Mahna, as modified, in order to reduce the workload involved in the management of incidents (Ferreira, pg. vi).
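Examiner additionally notes, purely for illustration, the error-driven weight update Ferreira describes, in which the error between the predicted and expected label determines the gradient step. The sketch below is hypothetical code, not from any cited reference; a single linear unit with squared error is used as a minimal stand-in, whereas the claimed model is a deep CNN.

```python
# Hypothetical illustration only -- not code from any cited reference.
# One stochastic-gradient-descent step: the error between the predicted
# and expected label drives the weight update (Ferreira, pgs. 36-37).
def sgd_step(weights, features, target, lr=0.1):
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - target
    # d(0.5 * error**2)/dw_i = error * x_i  (the "gradient of the point")
    return [w - lr * error * x for w, x in zip(weights, features)]

weights = [0.0, 0.0]
weights = sgd_step(weights, [1.0, 2.0], target=1.0)  # moves prediction toward the label
```

Repeated application of such steps is what "goes down the slope" of the loss surface toward the set of weights producing the smallest error.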
Finally, Bender teaches:
wherein the encoded language features include, for each n-gram output, a numerical representation of the n-gram, wherein processing the n-gram output comprises: (Bender, para. 0061: “The process 300 may include determining a learned representation of n-grams based on the obtained corpus, as indicated by block 308. A learned representation may include various value types, such as categories, Boolean values, quantitative values, or the like.”),
generating a corresponding embedding feature vector for the each n-gram; and (Bender, para. 0006: “ The process includes determining a first set of embedding vectors in an embedding space based on a first set of n-grams of the natural-language text document.”)
encoding the corresponding embedding feature vector into an encoded n-gram for each n-gram; (Bender, para. 0006: “The process includes generating a sequence of n-grams based on a second set of embedding vectors identified by the second set of vertices.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Bender into Mahna, as modified. Bender teaches obtaining parameters and a document, determining a domain based on the parameters, where the domain maps to a first ontology, and where ontologies map n-grams onto a set of concepts. One of ordinary skill would have been motivated to combine the teachings of Bender into Mahna, as modified, in order to improve the relevance of retrieved documents by accounting for a user's domain expertise or specific interests (Bender, para. 0041).
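Examiner additionally notes, purely for illustration, the mapping Bender describes from each n-gram to a numerical embedding vector. The sketch below is hypothetical code, not from Bender; a toy deterministic hash stands in for a learned embedding table so the example remains self-contained, and the dimension of 4 is an arbitrary illustrative choice.

```python
# Hypothetical illustration only -- not code from Bender. Each n-gram is
# mapped to a fixed-length numerical vector (an "embedding"); a learned
# embedding table would be used in practice (Bender, paras. 0006, 0061).
import hashlib

def embed_ngram(ngram, dim=4):
    # Hash the joined n-gram, then scale the first `dim` bytes into [0, 1].
    digest = hashlib.sha256(" ".join(ngram).encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

vector = embed_ngram(("multiple", "sclerosis"))  # 4-dimensional numerical representation
```

The resulting vectors give each n-gram a point in an embedding space, which is the "numerical representation of the n-gram" recited in the claim language mapped above.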
Regarding claim 2, Mahna, as modified, teaches claim 1 above. Mahna further teaches:
The method of claim 1, wherein the respective tag for the respective event specifies an event category of the event. (Mahna, para. 0068 and 0086: “Elements in training data 504 may be linked to descriptors of categories by tags, tokens, or other data elements” “Now referring to FIG. 8, an exemplary embodiment 800 of a criterion element 116 is illustrated. In an embodiment, and without limitation, criterion element 116 may include an ailment criterion 804. As used in this disclosure an “ailment criterion” is an element of datum denoting a parameter and/or identifier associated with an ailment. For example, and without limitation, ailment criterion 804 may denote criterion associated with multiple sclerosis. In an embodiment, and without limitation, criterion element 116 may include a clinical criterion 808. As used in this disclosure a “clinical criterion” is an element of datum denoting a parameter and/or identifier associated with a clinical history and/or medical record.” Examiner notes that Mahna teaches an event at least in the non-limiting example of a clinical history and/or medical record).
Regarding claim 3, Mahna, as modified, teaches claim 1 above. Mahna does not explicitly disclose:
The method of claim 1, wherein the plurality of events includes a plurality of information technology (IT) incidents, and the event data includes digital records of the IT incidents.
However, Ferreira teaches:
The method of claim 1, wherein the plurality of events includes a plurality of information technology (IT) incidents, and the event data includes digital records of the IT incidents. (Ferreira, pg. 21: “The essence of Incident Routing is removing, from the hands of human agents, the process of analyzing the text of the incident and figuring out the area to which the ticket must be sent. That process is to be done by the system itself, which will look into what has been written in order to, then, categorize the ticket” Examiner notes that Ferreira teaches Information Technology Service Management such that “ticket” refers to a Support Desk Ticket i.e. an IT incident.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Ferreira into Mahna, as modified, as set forth above with respect to claim 1.
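For illustration only, the ticket categorization Ferreira describes can be sketched as a simple keyword scorer over the incident text. The categories, keywords, and function below are hypothetical and are far simpler than the learned classifiers Ferreira actually studies:

```python
# Hypothetical keyword lists; a deployed system would learn these from labeled tickets.
CATEGORY_KEYWORDS = {
    "network": ["vpn", "dns", "router"],
    "database": ["sql", "query", "table"],
}

def route_ticket(text):
    """Assign an IT incident ticket to the category whose keywords appear most often."""
    words = text.lower().split()
    scores = {cat: sum(words.count(k) for k in kws)
              for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unrouted"

route_ticket("VPN connection drops after DNS change")
```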
Regarding claim 4, Mahna, as modified, teaches claim 1 above. Mahna further teaches:
The method of claim 1, wherein the knowledge data include data that specify a list of event categories, (Mahna, para. 0037: “Continuing to refer to FIG. 1, receiving criterion element 116 may further comprise selecting a therapeutic. As used in this disclosure a “therapeutic” is a treatment and/or therapy for a diagnosis and/or medical condition. For example, and without limitation, therapeutic may include one or more drugs, surgeries, diets, and the like thereof. In an embodiment, therapeutic may be selected from a therapeutic database. In an embodiment, therapeutic database may be implemented, without limitation, as a relational database, a key-value retrieval database such as a NOSQL database, or any other format or structure for use as a database that a person skilled in the art would recognize as suitable upon review of the entirety of this disclosure.”) and one or more keywords or indicators for one or more of the event categories. (Mahna, para. 0040: “Each value of n-tuple of values may represent a measurement or other quantitative value associated with a given category of data, or attribute, examples of which are provided in further detail below”)
Regarding claim 5, Mahna, as modified, teaches claim 1 above. Mahna further teaches:
The method of claim 1, wherein the encoded language features include, for each of the plurality of events, n-gram features of the respective set of text fields characterizing the respective event. (Mahna, para. 0043: “As used in this disclosure a “semantic relationship” is a relationship between the at least a significant term 128 and a semantic unit in criterion element 116. As a non-limiting example, semantic relationships may include associations between the meanings of phrases, sentences, paragraphs, essays, novels, and/or written documents. Additionally and/or alternatively semantic relationships may include, without limitation, synonymy, antonymy, homonymy, polysemy, and/or metonymy. As used in this disclosure “semantic units” are words, phrases, sentences, and/or “n-grams” of words, defined as a set of n words appearing contiguously in a text.”)
Regarding claim 7, Mahna, as modified, teaches claim 1 above. Mahna further teaches:
The method of claim 1, wherein obtaining the one or more configuration parameters for the n-gram processing model comprises: receiving the user input specifying the one or more configuration parameters. (Mahna, para. 0026 and 0078: “Referring still to FIG. 1, content field window 108 is displayed at a graphical control interface 112. As used in this disclosure a “graphical control interface” is a user interface comprising a graphical and/or pictorial representation. For example, and without limitation, graphical control interface 112 may include displaying on display window and/or client device a graphical user interface to allow a user and/or medical professional to select an icon, entering a textual string of data, selecting a text box, verbally confirming, and the like thereof.” “Criterion element 116 may be received as a function of a user input. Schema is configured to present a plurality of queries, receive a plurality of rejoinders as a function of the plurality of queries, and determine an outcome as a function of the plurality of rejoinders. At least a processor 104 may be any of the computing device as described herein with reference to FIGS. 1 and 12. Content field window 108 may be any of the content field windows as described herein with reference to FIG. 1. Criterion element 116 may be any of the elements as described herein with reference to FIG. 1.” Examiner notes that Mahna teaches input of a textual string of data wherein the string itself specifies a configuration parameter, i.e., the number of words in the contiguous string).
Regarding claim 8, Mahna, as modified, teaches claim 1 above. Mahna further teaches:
The method of claim 1, wherein the configuration parameters include one or more gram size parameters and a feature size. (Mahna, para. 0026 and 0078: “Referring still to FIG. 1, content field window 108 is displayed at a graphical control interface 112. As used in this disclosure a “graphical control interface” is a user interface comprising a graphical and/or pictorial representation. For example, and without limitation, graphical control interface 112 may include displaying on display window and/or client device a graphical user interface to allow a user and/or medical professional to select an icon, entering a textual string of data, selecting a text box, verbally confirming, and the like thereof.” “Criterion element 116 may be received as a function of a user input. Schema is configured to present a plurality of queries, receive a plurality of rejoinders as a function of the plurality of queries, and determine an outcome as a function of the plurality of rejoinders. At least a processor 104 may be any of the computing device as described herein with reference to FIGS. 1 and 12. Content field window 108 may be any of the content field windows as described herein with reference to FIG. 1. Criterion element 116 may be any of the elements as described herein with reference to FIG. 1.” Examiner notes that Mahna teaches input of a textual string of data wherein the string itself specifies a configuration parameter, i.e., the number of words in the contiguous string).
Regarding claim 9, Mahna, as modified, teaches claim 8 above. Fei further teaches:
The method of claim 8, wherein the gram size parameters include the minimum gram size and the maximum gram size. (Fei, para. 0069: “Experiments are also conducted to test the robustness of deep skip-gram network embodiments with difference[sic] gram sizes on large-scale datasets. The gram sizes are respectively set to 3, 4, 5 individually and the filter size and stride size are set to be equal to corresponding gram size. Their results are compared to aforementioned gram size setting: 3-4-5 combination”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Fei into Mahna as set forth above with respect to claim 1.
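For illustration only, the claimed minimum and maximum gram size parameters, and the 3-4-5 gram size combination discussed in Fei, can be sketched as follows. The tokens and function name are hypothetical and not drawn from any cited reference:

```python
def ngrams(tokens, min_n, max_n):
    """Collect all contiguous n-grams whose size runs from min_n to max_n inclusive."""
    out = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(tuple(tokens[i:i + n]))
    return out

tokens = ["server", "failed", "to", "restart", "overnight"]
grams = ngrams(tokens, min_n=3, max_n=5)  # gram sizes 3, 4, and 5
```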
Regarding claim 13, Mahna, as modified, teaches claim 1 above. Mahna further teaches:
Further comprising: generating a model input from the event data; (Mahna, para. 0021: “In an embodiment, this disclosure can receive a criterion element as a function of a user input. Aspects of the present disclosure can train a machine-learning model using the criterion element.”)
generating an event tag for the event by processing the model input using a machine learning model that has been trained using the training method of claim 1; and (Mahna, para. 0068: “Still referring to FIG. 5, “training data,” as used herein, is data containing correlations that a machine-learning process may use to model relationships between two or more categories of data elements. For instance, and without limitation, training data 504 may include a plurality of data entries, each entry representing a set of data elements that were recorded, received, and/or generated together; data elements may be correlated by shared existence in a given data entry, by proximity in a given data entry, or the like. Multiple data entries in training data 504 may evince one or more trends in correlations between categories of data elements; for instance, and without limitation, a higher value of a first data element belonging to a first category of data element may tend to correlate to a higher value of a second data element belonging to a second category of data element, indicating a possible proportional or other mathematical relationship linking values belonging to the two categories. Multiple categories of data elements may be related in training data 504 according to various correlations; correlations may indicate causative and/or predictive links between categories of data elements, which may be modeled as relationships such as mathematical relationships by machine-learning processes as described in further detail below.”)
outputting the predicted event tag. (Mahna, para. 0068: “Still referring to FIG. 5, “training data,” as used herein, is data containing correlations that a machine-learning process may use to model relationships between two or more categories of data elements. For instance, and without limitation, training data 504 may include a plurality of data entries, each entry representing a set of data elements that were recorded, received, and/or generated together; data elements may be correlated by shared existence in a given data entry, by proximity in a given data entry, or the like. Multiple data entries in training data 504 may evince one or more trends in correlations between categories of data elements; for instance, and without limitation, a higher value of a first data element belonging to a first category of data element may tend to correlate to a higher value of a second data element belonging to a second category of data element, indicating a possible proportional or other mathematical relationship linking values belonging to the two categories. Multiple categories of data elements may be related in training data 504 according to various correlations; correlations may indicate causative and/or predictive links between categories of data elements, which may be modeled as relationships such as mathematical relationships by machine-learning processes as described in further detail below.”)
Regarding claims 14 and 20, Mahna, as modified, teaches claims 13 and 19 above. Mahna further teaches:
wherein the event tag specifies an event category of the event. (Mahna, para. 0068: “correlations may indicate causative and/or predictive links between categories of data elements, which may be modeled as relationships such as mathematical relationships by machine-learning processes as described in further detail below.”)
Regarding claim 16, Mahna, as modified, teaches claim 13 above. Mahna further teaches:
The method of claim 13, wherein the machine-learning model includes a neural network or a decision tree. (Mahna, para. 0070: “Classification may be performed using, without limitation, linear classifiers such as without limitation logistic regression and/or naive Bayes classifiers, nearest neighbor classifiers such as k-nearest neighbors classifiers, support vector machines, least squares support vector machines, fisher's linear discriminant, quadratic classifiers, decision trees, boosted trees, random forest classifiers, learning vector quantization, and/or neural network-based classifiers.”)
Regarding claims 15 and 18, Mahna, as modified, teaches claims 13 and 17 above. Mahna does not explicitly disclose the following limitation.
However, Ferreira teaches:
wherein the event is an information technology (IT) incident, and the event data includes a digital record of the IT incident. (Ferreira, pg. 21: “The essence of Incident Routing is removing, from the hands of human agents, the process of analyzing the text of the incident and figuring out the area to which the ticket must be sent. That process is to be done by the system itself, which will look into what has been written in order to, then, categorize the ticket” Examiner notes that Ferreira teaches Information Technology Service Management such that “ticket” refers to a Support Desk Ticket i.e. an IT incident.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Ferreira into Mahna, as modified, as set forth above with respect to claim 1.
Response to Applicant Arguments/Remarks
35 USC §103
On page 12, applicant argues that none of the cited references teaches the claims as amended. Applicant argues that n-grams and skip-grams serve different purposes in NLP. Applicant specifically argues that Fei teaches away from the claimed invention of “generating n-gram output from the n-gram input.” However, applicant’s argument is unpersuasive for three reasons: First, applicant does not claim generating an n-gram output from an n-gram input. Second, Fei teaches both skip-grams and n-grams. Third, Fei is not the main reference teaching n-grams; rather, Fei is referenced to teach other claimed features.
Applicant further argues at the top of page 13 that Fei does not teach “processing the n-gram output, with an encoding model having a neural network, to generate the encoded language features, wherein the encoded language features include, for each n-gram in the n-gram output, a numerical representation of the n-gram.” However, this claim limitation has many features which are taught by the combination of references as set forth above. Furthermore, applicant fails to acknowledge that skip-grams are embeddings, which are a specific type of encoding. Applicant’s limitation claims that some unspecified “encoded language features” are to be generated as a result of an n-gram input, where Fei specifically teaches generation of skip-gram embeddings (i.e., encoded language features) as resulting from n-grams.
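For illustration only, the relationship between n-grams and skip-gram embeddings discussed above can be sketched as follows: a skip-gram model is trained on (target, context) pairs drawn from a sliding window, and the learned target vectors are the embeddings, i.e., encoded language features. The tokens, window size, and function below are hypothetical and do not reproduce Fei's deep skip-gram network:

```python
def skipgram_pairs(tokens, window):
    """Generate (target, context) training pairs of the kind skip-gram models learn from."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:  # the target word is never its own context
                pairs.append((target, tokens[j]))
    return pairs

pairs = skipgram_pairs(["incident", "ticket", "routed", "to", "network"], window=1)
```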
Toward the bottom of page 13, applicant further argues that Bender “merely teaches about scoring a set of n-grams of natural language text” and does not specifically teach “processing an n-gram output, with an n-gram encoding model…” However, this particular limitation is taught by Fei, as referenced above.
Applicant argues in the middle of page 14 that there is no motivation in either Fei or Bender to replace the teachings of Fei or Bender to arrive at the claimed feature of “generate n-gram output from n-gram input to generate the encoded language features”. However, Bender specifically teaches in paragraph 6, as cited above, use of a first set of n-grams (i.e., an n-gram input) to generate a sequence of n-grams (i.e., an n-gram output), wherein the n-gram output is generated as a result of the n-gram input and a connected first and second ontology graph (i.e., n-gram input used to generate the encoded language features).
At the top of page 15, Applicant argues that the motivation to combine is insufficient to support the combination of features in the particular way that applicant argues the features are combined. However, the references each teach different aspects of applicant’s claims, which are lengthy and give rise to claimed features that are broad and often unclear. For example, applicant claims an “n-gram processing model characterized by the one or more configuration parameters” which gives rise to the questions of: (1) what is an n-gram processing model other than any model that is capable of processing n-grams? (2) what models are not characterized by their configuration parameters?
At the bottom of page 15, applicant remarks that independent claims 17 and 19 are analogous to independent claim 1 and that all such arguments supporting claim 1 are equally applicable to claims 17 and 19. However, as the arguments are not persuasive with respect to claim 1, they are similarly not persuasive as to claims 17 and 19.
Applicant makes no independent argument regarding other dependent claims. Therefore, such claims remain rejected at least as a result of their dependency on rejected independent claims but also for the reasons set forth in the rejection above.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sally T. Ley whose telephone number is (571)272-3406. The examiner can normally be reached Monday - Thursday, 10:00am - 6:00pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use th