Prosecution Insights
Last updated: April 19, 2026
Application No. 16/661,053

PREDICTIVE DATA ANALYSIS WITH CATEGORICAL INPUT DATA

Status: Non-Final OA (§103)
Filed: Oct 23, 2019
Examiner: MULLINAX, CLINT LEE
Art Unit: 2123
Tech Center: 2100 — Computer Architecture & Software
Assignee: Optum Services (Ireland) Limited
OA Round: 7 (Non-Final)

Grant Probability: 48% (Moderate)
OA Rounds: 7-8
To Grant: 4y 4m
With Interview: 86%

Examiner Intelligence

Career Allow Rate: 48% (grants 48% of resolved cases; 59 granted / 123 resolved; -7.0% vs TC avg)
Interview Lift: +38.3% (strong), based on resolved cases with interview
Avg Prosecution: 4y 4m (typical timeline); 25 currently pending
Total Applications: 148 (career history, across all art units)
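The interview-lift figures above can be sanity-checked with a few lines of arithmetic. This is a hypothetical reconstruction of how the dashboard's numbers relate, assuming the reported lift is the simple difference between allow rates with and without an examiner interview:

```python
# Hypothetical reconstruction of the dashboard metrics above, assuming
# "interview lift" = allow rate with interview - allow rate without.
granted, resolved = 59, 123
career_allow = granted / resolved            # career allow rate, ~48%
with_interview = 0.86                        # reported allow rate with interview
lift = 0.383                                 # reported +38.3% interview lift
without_interview = with_interview - lift    # implied rate without interview

print(f"career allow rate: {career_allow:.1%}")
print(f"implied rate w/o interview: {without_interview:.1%}")
```

If this reading is right, the implied no-interview rate (~47.7%) sits very close to the 48% career average, which is consistent with interviews being the main driver of the above-baseline outcomes.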

Statute-Specific Performance

§101: 22.8% (-17.2% vs TC avg)
§103: 53.6% (+13.6% vs TC avg)
§102: 6.3% (-33.7% vs TC avg)
§112: 13.1% (-26.9% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 123 resolved cases
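A quick cross-check of the table, assuming each "vs TC avg" delta is the statute-specific rate minus the Tech Center average: every row implies the same TC average, suggesting the deltas were all computed against a single ~40% baseline.

```python
# Cross-check: recover the implied Tech Center average from each row
# (tc_avg = rate - delta; all values in percent, taken from the table above).
rows = {
    "§101": (22.8, -17.2),
    "§103": (53.6, +13.6),
    "§102": (6.3, -33.7),
    "§112": (13.1, -26.9),
}
for statute, (rate, delta) in rows.items():
    print(f"{statute}: implied TC avg = {rate - delta:.1f}%")
```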

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 03/30/2026 has been entered.

Status of Claims

This action is in reply to the amendments and remarks filed on 03/30/2026. Claims 1-11, 13-14, 16-18, and 20-22 are pending. Claims 1, 6-7, 16-18, 20, and 22 have been amended. Claims 12 and 15 have been canceled.

Response to Arguments

Applicant's arguments with respect to the rejection(s) of claim(s) 1, 18, and 20 under 35 U.S.C. 103 have been considered but are not persuasive. More specifically, the applicant argues that no prior art reference teaches the amended claim language, since in Amiriparian, “neither of YMRS class, nor the number value corresponding to the YMRS class are input to the CapsNet algorithm”. The examiner respectfully disagrees. Amiriparian is maintained to teach the amended claim limitations based on the broadness of the claim language.
Amiriparian, section 1 teaches using a CapsNet for detecting bipolar severity for early treatment (medical service information for a medical service event associated with the categorical input data object), and sections 2-3 teach inputting audio samples of patients including data of bipolar disorder YMRS class (categorical input data objects comprises medical service information…a categorical feature value, of the categorical input data object, corresponding to a discreet candidate category that is predictive of a value of the medical service event) and corresponding label value (numerical feature value, of the categorical input data object, indicating a recorded value of the medical service event) into a CapsNet algorithm (categorical inference machine learning engine) for “training”, and outputting “predictions” being one of the classes (a predicted value) for BD severity classification early detection (medical service); thus, the training dataset includes the mapped elements and is input into the model.

New art, LaLonde, is cited in the alternative for teaching the amended limitations. See the 35 U.S.C. 103 section for the full mapping of claim limitations necessitated by applicant amendments.

Applicant's arguments with respect to the rejection(s) of claim(s) 1, 18, and 20 under 35 U.S.C. 103 have been considered but are not persuasive. More specifically, the applicant argues that no prior art reference teaches the amended claim language, since “Thomas does not describe merging any of those three components (e.g., the text, embedding of the text, and inferred instantiation parameters) to generate an activity vector”. The examiner respectfully disagrees. Thomas is maintained to teach the amended claim limitations based on the broadness of the claim language. Thomas, paragraphs 0019-0021, 0029-0032, 0039-0042, and Figs.
1-3 teach the neural network’s (using a regime-specific layer of the categorical inference machine learning engine) layers taking the categorized, “learned entity features” as “input vectors” (that corresponds to the value regime designation/based at least in part on the value regime designation) of the labeled text data (with the numerical feature value) to create an “activity vector [that] represents instantiation parameters of a specific type of entity” (generating…a regime-specific latent representation) that is passed to the next layer for generating another “activity vector” (generating…a regime-specific latent representation) in agreement with the previous prediction based on the parameters (merging the one or more inferred instantiation parameters with the numerical feature value); thus combining vectors. See the 35 U.S.C. 103 section for the full mapping of claim limitations necessitated by applicant amendments.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 4-8, 14, and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Thomas et al. (US Pub 20200394509), hereinafter Thomas, in view of Amiriparian et al. (“Audio-based Recognition of Bipolar Disorder Utilising Capsule Networks”, 2019), hereinafter Amiriparian, and in view of LaLonde et al. (“Encoding High-Level Visual Attributes in Capsules for Explainable Medical Diagnoses”, 2019), hereinafter LaLonde.
Regarding claims 1, 18, and 20, Thomas teaches a computer-implemented method comprising; a system comprising memory and one or more processors communicatively coupled to the memory, the one or more processors configured to; one or more non-transitory computer-readable storage media including instructions that, when executed by one or more processors, cause the one or more processors to (paragraphs 0005-0006 and 0073-0074 teach “the invention or elements thereof can be implemented in the form of a system (or apparatus) including a memory, and at least one processor that is coupled to the memory and operative to perform exemplary method steps” and “instructions executing” on the at least one processor to perform the embodiments of the disclosure, as taught in paragraphs 0019-0021, 0029-0030, 0039-0040, and Figs. 1-3 to include a neural network (categorical inference machine learning engine) for classifying a “given a corpus of unlabeled and labeled text documents 301” and “text segments 302” (and based at least in part on categorical input data)): receiving, by one or more processors, a plurality of categorical input data objects for input to a categorical inference machine learning engine (paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0019-0021, 0029-0032, 0039, and Figs. 1-3 teach “given a corpus of unlabeled and labeled text documents 301 (comprises (a) one or more categorical feature values) to train a neural network based text classifier, the text is reduced to text segments 302 (receiving…a plurality of categorical input data objects) from the unlabeled and labeled text ((a) one or more categorical feature values), which are used as input into a word embedding layer 303” of a neural network (categorical inference machine learning engine), and wherein “the labels are categorical (regime), as compared to continuous labels. 
These labels can include any label that a user (e.g., human reader) wants to assign to a text (e.g., category, heading, sentiment, title, etc.) (one or more categorical feature values)”. Further, “The word embedding layer 303 maps the words into a m-dimensional space (illustrated as 102 in FIG. 1), transforming the text into numerical values”); generating, by one or more processors and using one or more shared layers of the categorical inference machine learning engine, an embedded feature representation for the categorical feature value (paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0039 and Figs. 1-3 teach “to train a neural network based text classifier (categorical inference machine learning engine)…The word embedding layer 303 maps the words into a m-dimensional space (illustrated as 102 in FIG. 1), transforming the text into numerical values (generating…an embedded feature representation for the one or more categorical feature values)”; and paragraph 0039 “The word embedding layer 303 maps the words into a m-dimensional space (illustrated as 102 in FIG. 1), transforming the text into numerical values”.); generating, by the one or more processors and using the one or more shared layers of the categorical inference machine learning engine (paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0039-0040 and Figs.
1-3 teach “Through forward-oriented dynamic routing 306, the entity (e.g., n-gram) features of the first capsule layer 201 are communicated to a second capsule layer 202” on the network (using the one or more shared layers of the categorical inference machine learning engine), and paragraphs 0029-0030 teach a “capsule is a group of neurons whose activity vector represents instantiation parameters of a specific type of entity, in this case, a particular n-gram feature”), one or more inferred instantiation parameters for the categorical input data object based on the embedded feature representation for the categorical feature value, wherein an inferred instantiation parameter of the one or more inferred instantiation parameter for the categorical input data object indicates an inferred occurrence property of a corresponding inferred attribute with respect to the categorical input data object (paragraphs 0039-0040 and Figs. 1-3 teach “embedding” input values that are fed to the capsule layers (based on the embedded feature representation for the categorical feature value), of which “Through forward-oriented dynamic routing 306, the entity (e.g., n-gram) features of the first capsule layer 201 are communicated to a second capsule layer 202. The latter capsule layer comprises a set of capsules that are connected to the first layer by forward-oriented dynamic routing. The second capsule layer has the same structure as the first layer. The forward-oriented dynamic routing between those two capsule layers captures global characteristics of the text, before the output of the second capsule layer (generating… one or more inferred instantiation parameters for the categorical input data object) is communicated to the long short-term memory layer 203”. Further, paragraphs 0029-0030 and Fig. 
2 teach a “capsule is a group of neurons whose activity vector represents instantiation parameters of a specific type of entity (generating… one or more inferred instantiation parameters for the categorical input data object), in this case, a particular n-gram feature…The length of the activity vector represents a probability that the entity is present in the text (an inferred instantiation parameter of the one or more inferred instantiation parameter…indicates an inferred occurrence property of a corresponding inferred attribute with respect to the categorical input data object)”.); generating, by the one or more processors and using a regime-specific layer of the categorical inference machine learning engine that corresponds to the value regime designation, a regime-specific latent representation by merging the one or more inferred instantiation parameters with the numerical feature values based at least in part on the value regime designation (paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0019-0021, 0029-0032, 0039-0042, and Figs.
1-3 teach the neural network’s (using a regime-specific layer of the categorical inference machine learning engine) layers taking the categorized, “learned entity features” as “input vectors” (that corresponds to the value regime designation/based at least in part on the value regime designation) of the labeled text data (with the numerical feature value) to create an “activity vector [that] represents instantiation parameters of a specific type of entity” (generating…a regime-specific latent representation) that is passed to the next layer for generating another “activity vector” (generating…a regime-specific latent representation) in agreement with the previous prediction based on the parameters (merging the one or more inferred instantiation parameters with the numerical feature value)); and generating, by the one or more processors, the predictions based at least in part on the regime-specific latent representation (paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0004, 0039-0040, and Figs. 1-3 teach “The forward-oriented dynamic routing between those two capsule layers captures global characteristics of the text, before the output of the second capsule layer is communicated to the long short-term memory layer 203”, wherein “the third plurality of processing elements are structured as a long short-term memory layer, which is configured to output a probability distribution (generating the predictions) over all labels generated by the first and second plurality of processing elements (based at least in part on the regime-specific latent representation)” as extracted “sequential features of the text” (generating the predictions)). 
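The "activity vector length as probability" mechanism quoted above from Thomas (paragraphs 0029-0030) can be illustrated with the standard "squash" nonlinearity from the capsule-network literature. This sketch is illustrative only; it is not taken from the application or any cited reference:

```python
import numpy as np

# Illustrative sketch: in a capsule network of the kind the cited art
# describes, each capsule outputs an "activity vector" whose direction
# encodes instantiation parameters and whose length encodes the probability
# that the entity (e.g., a particular n-gram feature) is present. The
# standard squash nonlinearity keeps that length in [0, 1).
def squash(v: np.ndarray) -> np.ndarray:
    sq = float(np.sum(v ** 2))                  # squared length of the raw vector
    return (sq / (1.0 + sq)) * v / np.sqrt(sq)  # rescale, preserve direction

raw = np.array([3.0, 4.0])        # raw capsule output, length 5
activity = squash(raw)
print(np.linalg.norm(activity))   # 25/26 ≈ 0.9615, read as a probability
```

Note that squash preserves the vector's direction (the instantiation parameters) while compressing only its length, which is what lets the same vector carry both "what" and "how likely".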
However, while Thomas teaches converting the word features into numerical values in order to classify the values, Thomas does not explicitly teach wherein: (i) a categorical input data object of the plurality of categorical input data objects comprises medical service information for a medical service event associated with the categorical input data object, (ii) the categorical inference machine learning engine is trained to output a prediction based at least in part on: (a) a predicted value for the medical service event, (b) a categorical feature value, of the categorical input data object, corresponding to a discreet candidate category that is predictive of a value of the medical service event, and (c) a numerical feature value, of the categorical input data object, indicating a recorded value of the medical service event, and (iii) the numerical feature value is associated with a defined total range and a value regime designation of a plurality of value regime designations that respectively correspond to a plurality of numerical subranges of the defined total range; subsequent to identifying the categorical input data object is associated with the value regime designation based on the numerical feature value.
Amiriparian teaches wherein: (i) a categorical input data object of the plurality of categorical input data objects comprises medical service information for a medical service event associated with the categorical input data object, (ii) the categorical inference machine learning engine is trained to output a prediction based at least in part on: (a) a predicted value for the medical service event, (b) a categorical feature value, of the categorical input data object, corresponding to a discreet candidate category that is predictive of a value of the medical service event, and (c) a numerical feature value, of the categorical input data object, indicating a recorded value of the medical service event (section 1 teaches using a CapsNet for detecting bipolar severity for early treatment (medical service information for a medical service event associated with the categorical input data object), and sections 2-3 teach inputting audio samples of patients including data of bipolar disorder YMRS class (categorical input data objects comprises medical service information…a categorical feature value, of the categorical input data object, corresponding to a discreet candidate category that is predictive of a value of the medical service event) and corresponding label value (numerical feature value, of the categorical input data object, indicating a recorded value of the medical service event) into a CapsNet algorithm (categorical inference machine learning engine) for “training”, and outputting “predictions” being one of the classes (a predicted value) for BD severity classification early detection (medical service)), and (iii) the numerical feature value is associated with a defined total range and a value regime designation of a plurality of value regime designations that respectively correspond to a plurality of numerical subranges of the defined total range; subsequent to identifying the categorical input data object is associated with the value regime designation based
on the numerical feature value (sections 2-3 teach the YMRS label values (numerical feature values) corresponding to YMRS scores to different ranges (a value regime designation of a plurality of value regime designations that respectively correspond to a plurality of numerical subranges of the defined total range) within 7-20 (defined total range). The data is taught to be vectorized for further processing (subsequent…embedding) for outputting “predictions” of one of the classes.).

Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Amiriparian’s teachings of different types of data inputs into a CapsNet algorithm for predictive processing into Thomas’ teaching of training a neural network with capsule layers for generating text data classification predictions in order to train a CapsNet for predicting specific data and increase prediction performance (Amiriparian, sections 2-3).

Further, Amiriparian at least implies wherein: (i) a categorical input data object of the plurality of categorical input data objects comprises medical service information for a medical service event associated with the categorical input data object, (ii) the categorical inference machine learning engine is trained to output a prediction based at least in part on: (a) a predicted value for the medical service event, (b) a categorical feature value, of the categorical input data object, corresponding to a discreet candidate category that is predictive of a value of the medical service event, and (c) a numerical feature value, of the categorical input data object, indicating a recorded value of the medical service event (see mappings above); however, LaLonde teaches wherein: (i) a categorical input data object of the plurality of categorical input data objects comprises medical service information for a medical service event associated with the categorical input data object, (ii) the categorical inference machine
learning engine is trained to output a prediction based at least in part on: (a) a predicted value for the medical service event, (b) a categorical feature value, of the categorical input data object, corresponding to a discreet candidate category that is predictive of a value of the medical service event, and (c) a numerical feature value, of the categorical input data object, indicating a recorded value of the medical service event (sections 1.3 and 3 teach training “CapsNet[s]” for outputting “explainable diagnosis” (medical service information for a medical service event) of the existence of lung cancer in medical images. LIDC-IDRI is used as the training dataset and “contains a collection of lung nodules with scores ranging from 1 – 5 (numerical feature value, of the categorical input data object, indicating a recorded value of the medical service event) across a set of visual attributes, indicating their relative appearance, and malignancy (categorical feature value, of the categorical input data object, corresponding to a discreet candidate category that is predictive of a value of the medical service event), as scored by up to four radiologists”; further, the models are tuned based on loss functions during training from the prediction error (alternative predicted value) for the output diagnosis (medical service)).
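The "value regime designation" limitation being mapped here — a numerical feature value assigned to one of several subranges of a defined total range — can be pictured as a simple binning function. The regime names and cut points below are hypothetical, loosely patterned on the YMRS score ranges the Office Action cites from Amiriparian:

```python
from bisect import bisect_right

# Hypothetical illustration of a "value regime designation": the numerical
# feature value is assigned to whichever subrange of the defined total
# range it falls in. Names and cut points are made up for illustration.
REGIMES = ("low", "mid", "high")
CUTOFFS = (7, 20)   # subrange boundaries within the defined total range

def value_regime(score: float) -> str:
    # bisect_right finds how many cut points the score exceeds (or equals),
    # which indexes directly into the ordered regime names.
    return REGIMES[bisect_right(CUTOFFS, score)]

print(value_regime(5))    # low
print(value_regime(12))   # mid
print(value_regime(25))   # high
```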
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a neural network with capsule layers for generating text data classification predictions, as taught by Thomas as modified by Amiriparian’s teachings of different types of data inputs into a CapsNet algorithm for predictive processing, to include training CapsNets for outputting a diagnosis for a patient regarding lung cancer from medical images as taught by LaLonde in order to increase accuracy of diagnosing patient lung cancer and improve speed of detection in medical examinations (LaLonde, abstract, sections 1.1 and 4).

Regarding claim 2, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 1 above; and further teach generating, by one or more processors and using the categorical inference machine learning engine (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0039-0040 and Figs. 1-3 teach “the word embedding 303 determines the n-gram (e.g., 3-gram) features from the text segments” and “convolutional kernels (filters) extract and learn entity features from the text 305.
The learned entity features are input vectors to the capsules” of a neural network’s “first capsule layer 201” (using the categorical inference machine learning engine); and further to communicate “the entity (e.g., n-gram) features of the first capsule layer 201” (generating) to following layers), one or more initial instantiation parameters indicating an extracted occurrence property of the embedded feature representation with respect to the categorical input data object (Thomas, paragraphs 0029-0030 teach a “capsule is a group of neurons whose activity vector represents instantiation parameters of a specific type of entity (generating…one or more initial instantiation parameters), in this case, a particular n-gram feature…The length of the activity vector represents a probability that the entity is present in the text (indicating an extracted occurrence property of the embedded feature representation with respect to the categorical input data object)”), and wherein: generating the one or more initial instantiation parameters includes using one or more initial capsule layers of the categorical inference machine learning engine (Thomas, paragraphs 0029-0030 teach a “capsule is a group of neurons whose activity vector represents instantiation parameters of a specific type of entity (generating…one or more initial instantiation parameters)”, and paragraphs 0039-0040 and Figs. 1-3 teach “the word embedding 303 determines the n-gram (e.g., 3-gram) features from the text segments (for each embedded feature representation associated with the corresponding categorical input data object…and based at least in part on the corresponding embedded feature representation)” and “convolutional kernels (filters) extract and learn entity features from the text 305. 
The learned entity features are input vectors to the capsules” of a neural network’s “first capsule layer 201” (using one or more initial capsule layers of the categorical inference machine learning engine); and further to communicate “the entity (e.g., n-gram) features of the first capsule layer 201” (generating) to following layers), wherein the one or more initial capsule layers comprise a plurality of spatial fully-connected layers and one or more localized convolution layers (Thomas, paragraph 0029 teaches “restructuring the convolutional layers into capsule layers (see FIG. 1)” (localized convolution layers) and paragraph 0035 teaches “connecting a first capsule layer 201 and a second capsule layer 202…Each capsule in the second layer 202 will only use the outputs of the matching capsule in the first layer 201 and those capsules preceding the matching capsule in that layer (1 to (n−1) 'th). The n'th capsule in the first layer is the matching capsule of the n'th capsule in the second layer”, thus the first capsule layer and second capsule layer are each interpreted as fully-connected layers (spatial fully-connected layers)), the plurality of spatial fully-connected layers are configured to process the embedded feature representation based at least in part on a spatial relationship between the embedded feature representation and the categorical input data object to generate a spatial feature representation for the embedded feature representation (Thomas, paragraphs 0025-0030, 0039-0040, 0050, and Figs. 1-3 teach “multiple layers of capsules that perform internal computations on their inputs (taught to be vectors) and encapsulate the results of these computations into a vector of informative outputs (to generate a spatial feature representation for the embedded feature representation). 
Each capsule learns to capture implicitly defined global features or entities (e.g., informative word sequences) over text including labeled and unlabeled text samples”, and categorize them accordingly via “The length of the activity vector represents a probability that the entity is present in the text”), and the one or more localized convolution layers are configured to process the spatial feature representation for the embedded feature representation in accordance with one or more feature extraction kernels to generate the one or more initial instantiation parameters for the embedded feature representation (Thomas, paragraph 0029-0030, 0039-0040 and Figs. 1-3 teach “restructuring the convolutional layers into capsule layers (see FIG. 1)” (localized convolution layers), and “the word embedding 303 determines the n-gram (e.g., 3-gram) features from the text segments (embedded feature representation)” and “convolutional kernels (filters) extract and learn entity features from the text 305 (feature extraction kernels). The learned entity features are input vectors to the capsules” of a “first capsule layer 201” (process the spatial feature representation); wherein a “capsule is a group of neurons whose activity vector represents instantiation parameters of a specific type of entity (to generate the one or more initial instantiation parameters for the embedded feature representation), in this case, a particular n-gram feature…The length of the activity vector represents a probability that the entity is present in the text”). 
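The "convolutional kernels (filters) extract and learn entity features" step quoted from Thomas can be pictured as a 1-D convolution sliding an n-gram-sized filter over the embedded text. This is an illustrative sketch with made-up shapes, not code from the application or any cited reference:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.random((10, 4))      # toy embedded text: 10 tokens, embedding dim 4
kernel = rng.random((3, 4))    # one 3-gram filter over the embedding space

# One scalar feature per 3-token window: elementwise product of the window
# with the filter, summed — the basic operation behind n-gram feature maps.
feats = np.array([np.sum(emb[i:i + 3] * kernel) for i in range(10 - 3 + 1)])
print(feats.shape)             # (8,) — one feature per 3-gram window
```

In a real model there would be many such filters, and their outputs would form the "input vectors to the capsules" that the Office Action maps onto the first capsule layer.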
Regarding claim 4, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 2 above; and further teach wherein the one or more initial capsule layers are further configured to generate, for the embedded feature representation associated with the categorical input data object, an initial occurrence probability for the one or more embedded feature representation with respect to the categorical input data object (Thomas, paragraphs 0039-0040 and Figs. 1-3 teach “the word embedding 303 determines the n-gram (e.g., 3-gram) features from the text segments (for the corresponding embedded feature representation with respect to the embedded categorical input data object)” and “convolutional kernels (filters) extract and learn entity features from the text 305. The learned entity features are input vectors to the capsules” of a neural network’s “first capsule layer 201” (one or more initial capsule layers are further configured to). Paragraphs 0029-0030 teach a “capsule is a group of neurons whose activity vector (generate…an initial occurrence probability) represents instantiation parameters of a specific type of entity (for the embedded feature representation), in this case, a particular n-gram feature…The length of the activity vector represents a probability that the entity is present in the text (an initial occurrence probability for the one or more embedded feature representation with respect to the categorical input data object)”).

Regarding claim 5, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 1 above; and further teach wherein one or more updated capsule layers are configured to generate an inferred probability for the corresponding inferred attribute with respect to the categorical input data object (Thomas, paragraphs 0039-0040 and Figs.
1-3 teach “Through forward-oriented dynamic routing 306, the entity (e.g., n-gram) features of the first capsule layer 201 are communicated to a second capsule layer 202. The latter capsule layer comprises a set of capsules that are connected to the first layer by forward-oriented dynamic routing. The second capsule layer has the same structure as the first layer. The forward-oriented dynamic routing between those two capsule layers captures global characteristics of the text, before the output of the second capsule layer (generate an inferred probability for each corresponding inferred attribute) is communicated to the long short-term memory layer 203”. Further, paragraphs 0029-0030 and Fig. 2 teach a “capsule is a group of neurons whose activity vector (generate an inferred probability) represents instantiation parameters of a specific type of entity (for the corresponding inferred attribute with respect to the categorical input data object), in this case, a particular n-gram feature…The length of the activity vector represents a probability (generate an inferred probability) that the entity is present in the text (for the corresponding inferred attribute with respect to the categorical input data object)”.).
Regarding claim 6, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 1 above; and further teach wherein generating the regime-specific latent representation based at least in part on the one or more inferred instantiation parameters for the categorical input data object comprises: generating, by the one or more processors and using one or more dimension-adjustment layers of the categorical inference machine learning engine, a dimensionally-adjusted structured representation of the plurality categorical input data objects based at least in part on the one or more inferred instantiation parameters for the categorical input data object (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0039, 0045, and Figs. 1-3 teach “to train a neural network based text classifier (categorical inference machine learning engine)…The word embedding layer 303 (by one or more dimension-adjustment layers) maps the words into a m-dimensional space (illustrated as 102 in FIG. 1), transforming the text into a numerical values (generating…a dimensionally-adjusted structured representation of the plurality categorical input data objects)”, and to repeat the process for “a plurality of times” using the learned outputs and “parameters” (based at least in part on the one or more inferred instantiation parameters for the categorical input data object)); processing, by the one or more processors and using one or more pre-merger fully-connected layers of the categorical inference machine learning engine, the dimensionally-adjusted structured representation to generate a pre-merger latent representation of the plurality categorical input data objects (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraph 0029 teaches “restructuring the convolutional layers into capsule layers (see FIG. 
1)” and paragraph 0035 teaches “connecting a first capsule layer 201 and a second capsule layer 202…Each capsule in the second layer 202 will only use the outputs of the matching capsule in the first layer 201 and those capsules preceding the matching capsule in that layer (1 to (n−1) 'th). The n'th capsule in the first layer is the matching capsule of the n'th capsule in the second layer”, thus the neural network’s (of the categorical inference machine learning engine) first capsule layer and second capsule layer are each interpreted as fully-connected layers (by one or more pre-merger fully-connected layers). Paragraphs 0029-0030, 0039-0040, and Figs. 1-3 teach the “first capsule layer 201” taking the embedding layer output (processing…the dimensionally-adjusted structured representation) and applying “convolutional kernels” to “extract and learn entity features from the text” (to generate a pre-merger latent representation of the one or more categorical input data objects).); and processing, by the one or more processors and using one or more numerical merger layers of the categorical inference machine learning engine and based at least in part on one or more numerical feature values for the categorical input data object, the pre-merger latent representation to generate the regime-specific latent representation (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0019-0021, 0029-0030, 0039-0040, and Figs. 
1-3 teach the neural network’s (of the categorical inference machine learning engine) “first capsule layer 201” (by one or more numerical merger layers) taking the “learned entity features” as “input vectors” (processing…the pre-merger latent representation) of the labeled text data (and based at least in part on one or more numerical feature values for the categorical input data object) to create an “activity vector [that] represents instantiation parameters of a specific type of entity” (to generate the regime-specific latent representation)). Regarding claim 7, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 1 above; and further teach wherein generating, by the one or more processors, the prediction based at least in part on the regime-specific latent representation comprises: processing, by the one or more processors and using one or more post-merger fully-connected layers of the categorical inference machine learning engine, the pre-merger latent representation to generate a final latent representation of the plurality of categorical input data objects (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0019-0021, 0029-0036, 0039-0040, and Figs. 
1-3 teach a neural network’s (of the categorical inference machine learning engine) “second capsule layer” as mapped above (using one or more post-merger fully-connected layers) taking the first layer’s outputs (processing…the pre-merger latent representation) to create an “activity vector [that] represents instantiation parameters of a specific type of entity” for “the global characteristics of the text” (to generate a final latent representation of the one or more categorical input data objects)); and processing, by the one or more processors and using one or more final prediction layers of the plurality of categorical input data objects, the final latent representation to generate the prediction (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0004, 0039-0040, and Figs. 1-3 teach “The forward-oriented dynamic routing between those two capsule layers captures global characteristics of the text, before the output of the second capsule layer is communicated to the long short-term memory layer 203”, wherein “the third plurality of processing elements are structured as a long short-term memory layer (by one or more final prediction layers), which is configured to output a probability distribution (processing…the final latent representation to generate the one or more predictions) over all labels generated by the first and second plurality of processing elements (of the one or more categorical input data objects)” as extracted “sequential features of the text” (generate the one or more predictions)). 
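The layer sequence the examiner maps onto claims 6-7 (an embedding layer as dimension adjustment, a pre-merger fully-connected layer, then merger with a numerical feature value ahead of the final prediction layers) can be illustrated with a toy NumPy sketch; the vocabulary, dimensions, and random weights below are invented for illustration and do not come from Thomas:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and embedding table (all values illustrative): the
# embedding layer maps each word into an m-dimensional space, turning
# text into numeric values -- the claimed "dimension adjustment".
vocab = {"claim": 0, "denied": 1, "pending": 2}
m = 4
embedding_table = rng.normal(size=(len(vocab), m))

def embed(tokens):
    return np.stack([embedding_table[vocab[t]] for t in tokens])

x = embed(["claim", "denied", "pending"])        # shape (3, m)

# Pre-merger fully-connected layer producing a latent representation.
W = rng.normal(size=(m, 8))
pre_merger_latent = np.tanh(x @ W)               # shape (3, 8)

# Numerical merger: combine the latent text representation with a
# numerical feature value before the final prediction layers.
numerical_feature = np.array([0.7])
merged = np.concatenate([pre_merger_latent.mean(axis=0), numerical_feature])
```

The final prediction layers (an LSTM outputting a probability distribution over labels, in Thomas) would then consume `merged`; they are omitted here for brevity.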
Regarding claim 8, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 1 above; and further teach wherein training the categorical inference machine learning engine comprises: receiving, by one or more processors, one or more training data objects, wherein the one or more training data objects are associated with one or more training categorical feature values and one or more ground-truth predictions (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0019-0021, 0039, and Figs. 1-3 teach “given a corpus of unlabeled and labeled text documents 301 (wherein the one or more training data objects are associated with one or more training categorical feature values) to train a neural network based text classifier, the text is reduced to text segments 302 (receiving one or more training data objects)”, and wherein “the labels are categorical, as compared to continuous labels. These labels can include any label that a user (e.g., human reader) wants to assign to a text (e.g., category, heading, sentiment, title, etc.) (wherein the one or more training data objects are associated with one or more training categorical feature values and one or more ground-truth predictions)”; in order “to train a neural network based text classifier (training the categorical inference machine learning engine)”); processing, by the one or more processors, the one or more training categorical feature values associated with a training data object of the one or more training data objects using the categorical inference machine learning engine to generate one or more training predictions for the training data object (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0019-0021, 0039, 0050 and Figs. 
1-3 teach “By capturing local, global and sequential features of the controls (text descriptions) (processing…the one or more training categorical feature values associated with a training data object of the one or more training data objects) the neural network (using the categorical inference machine learning engine) is able to capture additional details (e.g., a categorization of the text) in both the labeled data and the unlabeled data” (generate one or more training predictions for the training data object), and categorize “customer complaints by type (e.g., sentiment) or severity, identified groups (e.g., groups of customers interesting a different products or services), groups users of a system according to a hierarchy, etc.” (generate one or more training predictions for the particular training data object).); determining, by the one or more processors, a residual error measure for the training data object based at least in part on the one or more ground-truth predictions for the training data object and the one or more training predictions for the training data object (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0044-0046 teach “Optimization is used to find the parameters of the system that reduce (e.g., minimize) value of the loss function (determining a residual error measure), which is indicative of how well the system approximates a solution during training”, and utilizing the “probability distribution over the labels (for the training data object) that in turn is used by the loss function. Depending on whether the input (e.g., 101, FIG. 
1) is labeled or unlabeled, different loss functions are used and summed at the end of each iteration…Labeled input is processed using a cross-entropy loss (based at least in part on the one or more ground-truth predictions) and unlabeled input is processed using the virtual adversarial loss function”); selecting, by the one or more processors, an error designation of a plurality of error designations for the training data object based at least in part on the residual error measure for the training data object (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0029-0030 and 0044-0046 teach “Optimization is used to find the parameters (selecting an error designation) of the system that reduce (e.g., minimize) value of the loss function (based at least in part on the residual error measure for the training data object), which is indicative of how well the system approximates a solution during training” using the inputs (training data), wherein the parameters are taught to be “parameters of a specific type of entity” (selecting an error designation)); selecting, by the one or more processors, an error-designation-specific loss model of a plurality of error-designation-specific loss models for the training data object based at least in part on the error designation for the training data object (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0029-0030 and 0044-0046 teach “Optimization is used to find the parameters (selecting an error designation) of the system that reduce (e.g., minimize) value of the loss function (based at least in part on the residual error measure for the training data object), which is indicative of how well the system approximates a solution during training” using the inputs (the training data objects); wherein the parameters are taught to be “parameters of a specific type of entity” (based at least in part on the error designation), and “Labeled 
input (error designation) is processed using a cross-entropy loss (selecting an error-designation-specific loss model) and unlabeled input (error designation) is processed using the virtual adversarial loss function (selecting an error-designation-specific loss model)”); determining, by the one or more processors, a prediction error measure for the training data object using the error-designation-specific loss model for the training data object (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0044-0046 teach “Optimization is used to find the parameters of the system that reduce (e.g., minimize) value of the loss function (determining a prediction error measure), which is indicative of how well the system approximates a solution during training”, and utilizing the “probability distribution over the labels (for the training data object) that in turn is used by the loss function. Depending on whether the input (e.g., 101, FIG. 1) is labeled or unlabeled, different loss functions are used and summed at the end of each iteration…Labeled input is processed using a cross-entropy loss (using the error-designation-specific loss model) and unlabeled input is processed using the virtual adversarial loss function (using the error-designation-specific loss model)”); and updating, by the one or more processors, the categorical inference machine learning engine based at least in part on the prediction error measure for the training data object (Thomas, paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraph 0044 teaches “a loss function is used to optimize the neural network (updating the categorical inference machine learning engine) by reducing (e.g., minimizing) the loss function of the system” as mapped above). 
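The per-example loss selection quoted from Thomas (cross-entropy for labeled input, a virtual adversarial loss for unlabeled input) can be sketched as follows; the entropy term below merely stands in for the virtual adversarial loss, which would require a second, perturbed forward pass and is omitted in this sketch:

```python
import numpy as np

def cross_entropy(probs, label):
    # loss model applied to labeled input
    return -np.log(probs[label] + 1e-12)

def entropy(probs):
    # stand-in for the virtual adversarial loss applied to unlabeled
    # input (the real VAT term perturbs the input and compares the
    # two predicted distributions)
    return -np.sum(probs * np.log(probs + 1e-12))

def training_loss(probs, label=None):
    # select the loss per example, as in the cited passage:
    # labeled -> cross-entropy, unlabeled -> adversarial-style term
    return cross_entropy(probs, label) if label is not None else entropy(probs)

probs = np.array([0.7, 0.2, 0.1])       # illustrative predicted distribution
labeled_loss = training_loss(probs, label=0)
unlabeled_loss = training_loss(probs)
```

Per Thomas, the two kinds of loss are then summed at the end of each iteration across the batch.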
Regarding claim 14, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 1 above; and further teach wherein the embedded feature representation has a shared embedding structure relative to other embedded feature representations of the embedded feature representation (Thomas, paragraphs 0039 and Figs. 1-3 teach “The word embedding layer 303 (using one or more embedding layers) maps the words into a m-dimensional space (illustrated as 102 in FIG. 1), transforming the text into a numerical values (shared)”). Regarding claim 21, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 1 above; and further teach generating, by one or more processors and using the categorical inference machine learning engine (paragraphs 0005-0006 and 0073-0074 teach a “processor” as mapped above that in paragraphs 0039-0040 and Figs. 1-3 teach “the word embedding 303 determines the n-gram (e.g., 3-gram) features from the text segments” and “convolutional kernels (filters) extract and learn entity features from the text 305. 
The learned entity features are input vectors to the capsules” of a neural network’s “first capsule layer 201” (using the categorical inference machine learning engine); and further to communicate “the entity (e.g., n-gram) features of the first capsule layer 201” (generating) to following layers), one or more initial instantiation parameters indicating an extracted occurrence property of the embedded feature representation with respect to the categorical input data object (paragraphs 0029-0030 teach a “capsule is a group of neurons whose activity vector represents instantiation parameters of a specific type of entity (generating…one or more initial instantiation parameters), in this case, a particular n-gram feature…The length of the activity vector represents a probability that the entity is present in the text (indicating an extracted occurrence property of the embedded feature representation with respect to the categorical input data object)”); and wherein: generating the embedded feature representation includes using an embedding layer of the one or more shared layers of the categorical inference machine learning engine (Thomas, paragraphs 0039 and Figs. 1-3 teach “to train a neural network based text classifier (of the categorical inference machine learning engine)…The word embedding layer 303 maps the words (includes using an embedding layer of the one or more shared layers) into a m-dimensional space (illustrated as 102 in FIG. 
1), transforming the text into a numerical values (generating the one or more embedded feature representations)”), generating the one or more initial instantiation parameters includes using an initial capsule layer of the one or more shared layers of the categorical inference machine learning engine (Thomas, paragraphs 0029-0030 teach a “capsule is a group of neurons whose activity vector represents instantiation parameters of a specific type of entity (generating…one or more initial instantiation parameters)”, and paragraphs 0039-0040 and Figs. 1-3 teach “the word embedding 303 determines the n-gram (e.g., 3-gram) features from the text segments (for each embedded feature representation associated with the corresponding categorical input data object…and based at least in part on the corresponding embedded feature representation)” and “convolutional kernels (filters) extract and learn entity features from the text 305. The learned entity features are input vectors to the capsules” of a neural network’s “first capsule layer 201” (using one or more initial capsule layers of the categorical inference machine learning engine); and further to communicate “the entity (e.g., n-gram) features of the first capsule layer 201” (generating) to following layers), and generating the one or more inferred instantiation parameters includes using a subsequent capsule layer of the one or more shared layers of the categorical inference machine learning engine (Thomas, paragraphs 0039-0040 and Figs. 1-3 teach “Through forward-oriented dynamic routing 306, the entity (e.g., n-gram) features of the first capsule layer 201 (based at least in part on each initial instantiation parameter) are communicated to a second capsule layer 202 (includes using one or more subsequent capsule layers of the categorical inference machine learning engine). The latter capsule layer comprises a set of capsules that are connected to the first layer by forward-oriented dynamic routing. 
The second capsule layer has the same structure as the first layer. The forward-oriented dynamic routing between those two capsule layers captures global characteristics of the text, before the output of the second capsule layer (generating the one or more inferred instantiation parameters) is communicated to the long short-term memory layer 203”; and paragraphs 0029-0030 teach a “capsule is a group of neurons whose activity vector represents instantiation parameters of a specific type of entity, in this case, a particular n-gram feature”). Regarding claim 22, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 1 above; and further teach the regime-specific layer is configured to generate the regime-specific latent representation by merging the one or more inferred instantiation parameters with the numerical feature value, and the regime-specific layer is based at least in part on the value regime designation and is one of a plurality of regime-specific layers that respectively correspond to the plurality of value regime designations (Thomas, paragraphs 0019-0021, 0029-0032, 0039-0042, and Figs. 1-3 teach the neural network’s layers (regime-specific layer) taking the categorized, “learned entity features” as “input vectors” (based at least in part on the value regime designation) of the labeled text data to create an “activity vector [that] represents instantiation parameters of a specific type of entity” (generating…the regime-specific latent representation) that is passed to the next layer for generating another “activity vector” (generating…a regime-specific latent representation) in agreement with the previous prediction based on the parameters (merging the one or more inferred instantiation parameters with the one or more numerical feature values)). Claim 3 is rejected under 35 U.S.C. 
103 as being unpatentable over Thomas et al (US Pub 20200394509) hereinafter Thomas, in view of Amiriparian et al (“Audio-based Recognition of Bipolar Disorder Utilising Capsule Networks”, 2019) hereinafter Amiriparian, in view of LaLonde et al (“Encoding High-Level Visual Attributes in Capsules for Explainable Medical Diagnoses”, 2019) hereinafter LaLonde, in view of Berman et al (“DGA CapsNet: 1D Application of Capsule Networks to DGA Detection”, 2019) hereinafter Berman. Regarding claim 3, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 2 above; however, the combination does not explicitly teach wherein the plurality of spatial fully-connected layers are wrapped by a time-distributed layer. Berman teaches wherein the plurality of spatial fully-connected layers are wrapped by a time-distributed layer (section 2.1-2.4 teach a CapsNet utilizing convolutional layers, wherein “The convolution layers are the core of the CNN. This layer essentially applies a filter to a subset of the input at a specific instance of time” and are “fully connected layers”). Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a neural network with capsule layers for generating text data classification predictions, as taught by Thomas as modified by Amiriparian’s teachings of different types of data inputs into a CapsNet algorithm for predictive processing, as modified by training CapsNets for outputting a diagnosis for a patient regarding lung cancer from medical images as taught by LaLonde, to include specific CapsNet embedding functions as taught by Berman in order to improve data classification accuracy performance and reduce training time (Berman, section 6). Claims 9-11 and 13 are rejected under 35 U.S.C. 
103 as being unpatentable over Thomas et al (US Pub 20200394509) hereinafter Thomas, in view of Amiriparian et al (“Audio-based Recognition of Bipolar Disorder Utilising Capsule Networks”, 2019) hereinafter Amiriparian, in view of LaLonde et al (“Encoding High-Level Visual Attributes in Capsules for Explainable Medical Diagnoses”, 2019) hereinafter LaLonde, in view of Schmidt et al (US Pub 20080292194) hereinafter Schmidt, in view of Shin et al (“An RHHS approach to robust functional linear regression”, 2019) hereinafter Shin. Regarding claim 9, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 8 above; however, the combination does not explicitly teach wherein: the plurality of error designations comprises a low error designation, a medium error designation, and a high error designation; and the plurality of error-designation-specific loss models comprises a high-outlier-resistant loss model for the low error designation, a medial-outlier-resistant loss model for the medium error designation, and a low-outlier-resistant loss model for the high error designation. Schmidt teaches wherein: the plurality of error designations comprises a low error designation, a medium error designation, and a high error designation (paragraph 0210 teaches “There are several methods that could be explored to improve this step in future implementations. 
Different loss functions could be examined, since loss functions such as the absolute error and the Huber loss are more robust to outliers than the squared error measure used here (high-outlier-resistant loss model) [Hastie et al., 2001], though at a higher computational expense”); and the plurality of error-designation-specific loss models comprises a high-outlier-resistant loss model for the low error designation, a medial-outlier-resistant loss model for the medium error designation, and a low-outlier-resistant loss model for the high error designation (paragraph 0210 teaches “There are several methods that could be explored to improve this step in future implementations. Different loss functions could be examined, since loss functions such as the absolute error and the Huber loss are more robust to outliers than the squared error measure used here (high-outlier-resistant loss model for the low error designation) [Hastie et al., 2001], though at a higher computational expense”). Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a neural network with capsule layers for generating text data classification predictions, as taught by Thomas as modified by Amiriparian’s teachings of different types of data inputs into a CapsNet algorithm for predictive processing, as modified by training CapsNets for outputting a diagnosis for a patient regarding lung cancer from medical images as taught by LaLonde, to include different loss function calculations as taught by Schmidt in order to “improve” the step of loss calculations and achieve “more effective intensity standardization” (Schmidt, paragraph 0210). However, Schmidt does not explicitly teach and a low-outlier-resistant loss model for the high error designation. 
Shin teaches and a low-outlier-resistant loss model for the high error designation (section 4 teaches “The non-convex loss functions (biweight and Cauchy) (low-outlier-resistant loss model) clearly outperform convex loss functions (Huber and logistic) under severe outlyingness (mixture Gaussian) (for the high error designation), while all four outlier-resistant loss functions perform comparably under mild outlyingness (t3 and t10).”). Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a neural network with capsule layers for generating text data classification predictions, as taught by Thomas as modified by Amiriparian’s teachings of different types of data inputs into a CapsNet algorithm for predictive processing, as modified by training CapsNets for outputting a diagnosis for a patient regarding lung cancer from medical images as taught by LaLonde, as modified by different loss function calculations as taught by Schmidt, to include different loss function calculations as taught by Shin in order to provide “considerable improvement in prediction” accuracies through loss function calculations (Shin, sections 4-5). Regarding claim 10, the combination of Thomas, Amiriparian, LaLonde, Schmidt, and Shin teach all the claim limitations of claim 9 above; and further teach wherein the high-outlier-resistant loss model is determined based at least in part on a squared-error-based loss model (Schmidt, paragraph 0210 teaches “There are several methods that could be explored to improve this step in future implementations. Different loss functions could be examined, since loss functions such as the absolute error and the Huber loss are more robust to outliers than the squared error measure used here (high-outlier-resistant loss model) [Hastie et al., 2001], though at a higher computational expense”). 
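For reference, the relative outlier resistance driving the Schmidt/Shin combination (squared error least resistant, Huber intermediate, Cauchy most resistant) and the claimed designation-specific selection can be sketched as follows; the designation thresholds and scale parameters are illustrative and do not come from either reference:

```python
import numpy as np

def squared_loss(r):
    # least outlier-resistant: grows quadratically everywhere
    return 0.5 * r ** 2

def huber_loss(r, delta=1.0):
    # medium resistance: quadratic near zero, linear in the tails
    a = abs(r)
    return 0.5 * r ** 2 if a <= delta else delta * (a - 0.5 * delta)

def cauchy_loss(r, c=1.0):
    # most outlier-resistant: grows only logarithmically for large r
    return 0.5 * c ** 2 * np.log(1.0 + (r / c) ** 2)

def designation_loss(r, low=0.5, high=2.0):
    # pick a loss model by residual magnitude, mirroring the claimed
    # low / medium / high error designations (thresholds illustrative)
    a = abs(r)
    if a < low:
        return squared_loss(r)
    if a < high:
        return huber_loss(r)
    return cauchy_loss(r)
```

For a residual of 3, the three models give roughly 4.5, 2.5, and 1.15 respectively, which is the outlier-resistance ordering the cited passages describe.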
Thomas, Amiriparian, LaLonde, Schmidt, and Shin are combinable for the same rationale as set forth above with respect to claim 9. Regarding claim 11, the combination of Thomas, Amiriparian, LaLonde, Schmidt, and Shin teach all the claim limitations of claim 9 above; and further teach wherein the medial-outlier-resistant loss model is determined based at least in part on an absolute-deviation-based loss model or a Huber loss model (Schmidt, paragraph 0210 teaches “There are several methods that could be explored to improve this step in future implementations. Different loss functions could be examined, since loss functions such as the absolute error (medial-outlier-resistant loss model) and the Huber loss (medial-outlier-resistant loss model) are more robust to outliers than the squared error measure used here [Hastie et al., 2001], though at a higher computational expense”). Thomas, Amiriparian, LaLonde, Schmidt, and Shin are combinable for the same rationale as set forth above with respect to claim 9. Regarding claim 13, the combination of Thomas, Amiriparian, LaLonde, Schmidt, and Shin teach all the claim limitations of claim 9 above; and further teach wherein the low-outlier-resistant loss model is determined based at least in part on a Cauchy loss function (Shin, section 4 teaches “The non-convex loss functions (biweight and Cauchy) (low-outlier-resistant loss model) clearly outperform convex loss functions (Huber and logistic) under severe outlyingness (mixture Gaussian), while all four outlier-resistant loss functions perform comparably under mild outlyingness (t3 and t10).”). Thomas, Amiriparian, LaLonde, Schmidt, and Shin are combinable for the same rationale as set forth above with respect to claim 9. Claims 15-17 are rejected under 35 U.S.C. 
103 as being unpatentable over Thomas et al (US Pub 20200394509) hereinafter Thomas, in view of Amiriparian et al (“Audio-based Recognition of Bipolar Disorder Utilising Capsule Networks”, 2019) hereinafter Amiriparian, in view of LaLonde et al (“Encoding High-Level Visual Attributes in Capsules for Explainable Medical Diagnoses”, 2019) hereinafter LaLonde, in view of Chao et al (“Emotion recognition from multiband EEG signals using CapsNet”, 2019) hereinafter Chao. Regarding claim 15, the combination of Thomas, Amiriparian, and LaLonde teach all the claim limitations of claim 1 above; however, the combination does not explicitly teach wherein: the categorical input data object of the plurality of categorical input data objects comprises medical service information for a medical service event associated with the categorical input data object, and the one or more predictions for the categorical input data object of the one or more categorical input data objects comprise a predicted value for the medical service event associated with the categorical input data object. 
Chao teaches wherein: the categorical input data object of the plurality of categorical input data objects comprises medical service information for a medical service event associated with the categorical input data object, and the one or more predictions for the categorical input data object of the plurality of categorical input data objects comprise a predicted value for the medical service event associated with the categorical input data object (abstract and sections 2.1 and 3.3 teach a CapsNet categorizing EEG “multiband feature matrix (MFM)” (the categorical input data object of the one or more categorical input data objects comprises medical service information for a medical service event) inputs for classifying the input readings as specific emotions (predictions for the categorical input data object of the one or more categorical input data objects comprise a predicted value for the medical service event associated with the categorical input data object)). Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a neural network with capsule layers for generating text data classification predictions, as taught by Thomas as modified by Amiriparian’s teachings of different types of data inputs into a CapsNet algorithm for predictive processing, as modified by training CapsNets for outputting a diagnosis for a patient regarding lung cancer from medical images as taught by LaLonde, to include using a CapsNet for detecting a human's emotional state from EEG MFM inputs as taught by Chao in order to achieve higher prediction accuracies and “improve the performance of multi-channel EEG-based emotion recognition” (Chao, section 4.3). 
Regarding claim 16, the combination of Thomas, Amiriparian, LaLonde, and Chao teach all the claim limitations of claim 15 above; and further teach determining, based at least in part on the predicted value for the categorical input data object of the plurality of categorical input data objects, an adjustment need determination (Thomas, paragraphs 0044-0046 teach “Optimization is used to find the parameters of the system that reduce (e.g., minimize) value of the loss function (one or more claim adjustment need determinations), which is indicative of how well the system approximates a solution during training”, and utilizing the “probability distribution over the labels (based at least in part on the predicted value for the categorical input data object) that in turn is used by the loss function. Depending on whether the input (e.g., 101, FIG. 1) is labeled or unlabeled (object), different loss functions are used and summed at the end of each iteration…Labeled input is processed using a cross-entropy loss and unlabeled input is processed using the virtual adversarial loss function”; in other words, the output of the system (claim) is used to determine the loss and then the parameters that are in need of tuning (adjustment) for optimizing the loss function and improving the output accuracy of the model); and automatically performing an adjustment corresponding to the adjustment need determination (Thomas, paragraphs 0044-0046 teach “Optimization is used to find the parameters of the system that reduce (e.g., minimize) value of the loss function (one or more claim adjustment need determinations), which is indicative of how well the system approximates a solution during training” and further tuning the found parameters (performing…claim adjustments) without human intervention (automatically)). 
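The examiner's reading of “adjustment” as automatically tuning the parameters that reduce a loss function is, in essence, gradient descent. A toy one-parameter example (all values illustrative, not from any cited reference):

```python
def loss(w, x, y):
    # squared-error loss for a one-parameter linear model w*x ~ y
    return 0.5 * (w * x - y) ** 2

def grad(w, x, y):
    # derivative of the loss with respect to the parameter w
    return (w * x - y) * x

# automatic "adjustment": repeatedly tune the parameter in the
# direction that reduces the loss, without human intervention
w, x, y, lr = 0.0, 2.0, 4.0, 0.1
for _ in range(50):
    w -= lr * grad(w, x, y)
```

Here the parameter converges to w = 2, the value that drives the loss to zero, which mirrors the cited “find the parameters of the system that reduce (e.g., minimize) value of the loss function”.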
Regarding claim 17, the combination of Thomas, Amiriparian, LaLonde, and Chao teach all the claim limitations of claim 15 above; and further teach determining, based at least in part on the predicted value for the categorical input data object of the plurality of categorical input data objects, an audit need determination (Thomas, paragraphs 0044-0046 teach “Optimization is used to find the parameters of the system that reduce (e.g., minimize) value of the loss function (one or more claim audit need determinations), which is indicative of how well the system approximates a solution during training”, and utilizing the “probability distribution over the labels (based at least in part on the predicted value for the categorical input data object) that in turn is used by the loss function. Depending on whether the input (e.g., 101, FIG. 1) is labeled or unlabeled (object), different loss functions are used and summed at the end of each iteration…Labeled input is processed using a cross-entropy loss and unlabeled input is processed using the virtual adversarial loss function”; in other words, the output of the system (claim) is used to determine the loss and then the parameters that are in need of tuning (audit) for optimizing the loss function and improving the output accuracy of the model); and automatically performing the audit corresponding to the audit need determination (Thomas, paragraphs 0044-0046 teach “Optimization is used to find the parameters of the system that reduce (e.g., minimize) value of the loss function (one or more claim audit need determinations), which is indicative of how well the system approximates a solution during training” and further tuning the found parameters (performing…claim audit) without human intervention (automatically)). Prior Art The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Rivaz et al. (US Pub. 2012/0128223) teaches utilization of a capsule network trained according to data feature categories and tuning parameters.

Conclusion

17. Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX, whose telephone number is 571-272-3241. The examiner can normally be reached Mon - Fri, 8:00-4:30 PT.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.M./ Examiner, Art Unit 2123
/ALEXEY SHMATOV/ Supervisory Patent Examiner, Art Unit 2123

Prosecution Timeline

Oct 23, 2019
Application Filed
Apr 06, 2023
Non-Final Rejection — §103
Jun 07, 2023
Interview Requested
Jun 21, 2023
Applicant Interview (Telephonic)
Jun 21, 2023
Examiner Interview Summary
Jul 11, 2023
Response Filed
Oct 20, 2023
Final Rejection — §103
Dec 11, 2023
Response after Non-Final Action
Dec 28, 2023
Applicant Interview (Telephonic)
Dec 28, 2023
Response after Non-Final Action
Jan 31, 2024
Request for Continued Examination
Feb 05, 2024
Response after Non-Final Action
Mar 19, 2024
Non-Final Rejection — §103
Jun 25, 2024
Response Filed
Jun 25, 2024
Response after Non-Final Action
Nov 04, 2024
Response Filed
Mar 05, 2025
Final Rejection — §103
Apr 10, 2025
Applicant Interview (Telephonic)
Apr 10, 2025
Examiner Interview Summary
May 12, 2025
Response after Non-Final Action
Jun 03, 2025
Request for Continued Examination
Jun 04, 2025
Response after Non-Final Action
Aug 08, 2025
Non-Final Rejection — §103
Nov 14, 2025
Response Filed
Jan 02, 2026
Final Rejection — §103
Feb 12, 2026
Examiner Interview Summary
Feb 12, 2026
Applicant Interview (Telephonic)
Feb 24, 2026
Response after Non-Final Action
Mar 30, 2026
Request for Continued Examination
Apr 03, 2026
Response after Non-Final Action
Apr 04, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561620
Machine Learning-Based URL Categorization System With Noise Elimination
2y 5m to grant Granted Feb 24, 2026
Patent 12554962
CONFIGURABLE PROCESSOR ELEMENT ARRAYS FOR IMPLEMENTING CONVOLUTIONAL NEURAL NETWORKS
2y 5m to grant Granted Feb 17, 2026
Patent 12547887
SYSTEM FOR DETECTING ELECTRIC SIGNALS
2y 5m to grant Granted Feb 10, 2026
Patent 12518169
SYSTEMS AND METHODS FOR SAMPLE GENERATION FOR IDENTIFYING MANUFACTURING DEFECTS
2y 5m to grant Granted Jan 06, 2026
Patent 12493771
DEEP LEARNING MODEL FOR ENERGY FORECASTING
2y 5m to grant Granted Dec 09, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

7-8
Expected OA Rounds
48%
Grant Probability
86%
With Interview (+38.3%)
4y 4m
Median Time to Grant
High
PTA Risk
Based on 123 resolved cases by this examiner. Grant probability derived from career allow rate.
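The headline projections above follow directly from the examiner's career data: the 48% grant probability is the career allow rate (59 grants out of 123 resolved cases), and the 86% with-interview figure is that base rate plus the +38.3% interview lift. A minimal sketch of that arithmetic, assuming the lift is modeled as a simple additive adjustment (the page does not state the exact model):

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate: grants divided by resolved cases."""
    return granted / resolved

def with_interview(base_rate: float, interview_lift: float) -> float:
    """Projected grant probability when an interview is held.

    Assumed here to be an additive lift on the base rate, capped at 1.0;
    the page's own methodology may differ.
    """
    return min(base_rate + interview_lift, 1.0)

base = allow_rate(59, 123)             # ≈ 0.48 -> "48% Grant Probability"
boosted = with_interview(base, 0.383)  # ≈ 0.86 -> "86% With Interview"
```

Both values round to the figures shown on this page, which is consistent with the note that grant probability is derived from the career allow rate.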
