Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This action is responsive to the following communication: Non-Provisional Application filed Apr. 28, 2023.
Claims 1-20 are pending in the case. Claims 1, 12 and 18 are independent claims.
Drawings
The drawings filed Apr. 28, 2023 are objected to because Figs. 3, 5, and 7 are low-quality scans with illegible elements. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to non-statutory subject matter.
Regarding Claim 1: Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which falls within the process category, one of the statutory categories.
Step 2A Prong One Analysis: Under its broadest reasonable interpretation, claim 1 recites a series of mental processes and mathematical concepts. For example, but for the generic computer component language, the limitations in the context of this claim encompass machine learning processing performed as mental processes and mathematical calculations, including the following:
receiving, […], a text input and contextual data indicative of a predictive category for the text input (a mental process that can be performed in the mind or on paper, or with the aid of a generic computer);
generating, […] and using a multi-headed composite model, an output embedding for the text input based on the predictive category, wherein: the multi-headed composite model comprises a model body, a plurality of model heads, and a gate function, the text input is processed with at least one model head of the plurality of model heads, and the gate function is configured to select the at least one model head based on the predictive category for the text input (mathematical calculations and relationships; the claim explicitly recites a gate function, which is a mathematical function, and this claim limitation is highly analogous to Example 47 of the 2024 SME guidance);
providing, […], a predictive label for the text input based on the output embedding (observation, evaluation, and judgment).
Therefore, claim 1 recites an abstract idea which is a judicial exception.
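For purposes of illustrating the examiner's characterization only, the recited gate-function selection of a model head can be reduced to ordinary selection logic of the following form. This is a hypothetical sketch; the function names, categories, and values are not drawn from the applicant's disclosure.

```python
# Hypothetical sketch of the claimed gate-function selection of a model
# head. All names and values are illustrative, not the applicant's.

def model_body(text_tokens):
    # Stand-in for shared layers: a trivial "embedding" of the input.
    return [float(len(tok)) for tok in text_tokens]

def make_head(scale):
    # Each head is a simple per-category transform of the body output.
    return lambda intermediate: [scale * v for v in intermediate]

HEADS = {"sports": make_head(1.0), "finance": make_head(2.0)}

def gate(predictive_category):
    # The gate selects a head based on the predictive category;
    # a pure mapping from category to head.
    return HEADS[predictive_category]

def predict_embedding(text_tokens, predictive_category):
    intermediate = model_body(text_tokens)
    head = gate(predictive_category)
    return head(intermediate)

print(predict_embedding(["nba", "finals"], "finance"))  # [6.0, 12.0]
```

As the sketch shows, the selection performed by the gate is a mapping that could be carried out mentally or on paper given the category.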
Step 2A Prong Two Analysis: Claim 1 recites the additional element “by one or more processors”. However, this additional feature is a computer component recited at a high level of generality, such that it amounts to no more than mere instructions to apply the judicial exception using a generic computer component. An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application (see MPEP 2106.05(f)). Claim 1 also recites the additional element “receiving, by one or more processors, a plurality of training datasets corresponding to a plurality of predictive categories”, which amounts to gathering data, an insignificant extra-solution activity (see MPEP 2106.05(g)). Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis: Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component and insignificant extra-solution activity. The gathering and outputting of data is considered well-understood, routine, and conventional in the art (see MPEP 2106.05(d)(II)(i)).
For the reasons above, claim 1 is rejected as being directed to patent-ineligible subject matter under 35 U.S.C. § 101. This rejection applies equally to independent claims 12 and 18, which recite a system and storage media, respectively, as well as to dependent claims 2-11, 13-17, and 19-20.
Independent claim 12 recites additional instructions to apply the judicial exception using generic computer components “A system comprising memory and one or more processors communicatively coupled to the memory, the one or more processors configured to” which does not integrate the judicial exception into a practical application (see MPEP 2106.05(f)).
Independent claim 18 also recites additional instructions to apply the judicial exception using generic computer components, “One or more non-transitory computer-readable storage media including instructions that, when executed by one or more processors, cause the one or more processors to”, which does not integrate the judicial exception into a practical application (see MPEP 2106.05(f)).
The additional limitations of the dependent claims are addressed briefly below:
Dependent claim 2 recites the additional insignificant extra-solution activity of gathering and outputting data, “the contextual data is indicative of a third-party category for the text input and the predictive category is based on the third-party category”, which amounts to the selection of data (see MPEP 2106.05(g)) and is well-understood, routine, and conventional in the art (see MPEP 2106.05(d)(II)(iii)).
Dependent claim 3 recites the additional insignificant extra-solution activity of gathering and outputting data, “the predictive category is based on a semantic mapping between a plurality of third-party categories and a plurality of predictive categories”, which amounts to the selection of data (see MPEP 2106.05(g)) and is well-understood, routine, and conventional in the art (see MPEP 2106.05(d)(II)(iii)).
Dependent claim 4 recites the additional insignificant extra-solution activity of gathering and outputting data, “the contextual data is indicative of user input that identifies the predictive category”, which amounts to the selection of data (see MPEP 2106.05(g)) and is well-understood, routine, and conventional in the art (see MPEP 2106.05(d)(II)(iii)).
Dependent claim 5 recites additional instructions to apply the judicial exception using generic computer components, “the multi-headed composite model comprises a neural network” (see MPEP 2106.05(f)).
Dependent claim 6 recites additional instructions to apply the judicial exception using generic computer components, “the model body comprises a first plurality of attention blocks of the neural network and each model head of the plurality of model heads comprises a second plurality of attention blocks of the neural network” (see MPEP 2106.05(f)).
Dependent claim 7 recites additional insignificant extra-solution activity “each of the plurality of model heads corresponds to a particular predictive category of a plurality of predictive categories in a prediction domain” which is well-understood, routine, and conventional in the art (See MPEP 2106.05(d)(II)(iii)).
Dependent claim 8 recites additional insignificant extra-solution activity “generating the output embedding comprises: generating, using the model body, an intermediate output for the text input; and generating, using the at least one model head, the output embedding based on the intermediate output” which is well-understood, routine, and conventional in the art (See MPEP 2106.05(d)(II)(iii)).
Dependent claim 9 recites additional insignificant extra-solution activity “the predictive label is one of a plurality of predefined ontology agnostic predictive labels” which amounts to selection of a data type (see MPEP 2106.05(g)).
Dependent claim 10 recites additional insignificant extra-solution activity “providing the predictive label for the text input based on the output embedding comprises: generating a plurality of label probabilities based on a comparison between the output embedding and a plurality of label embeddings corresponding to the plurality of predefined ontology agnostic predictive labels; and identifying the predictive label based on the plurality of label probabilities.” which is well-understood, routine, and conventional in the art (See MPEP 2106.05(d)(II)(iii)).
Dependent claim 11 recites additional observation, evaluation, and judgment, “each of the plurality of label probabilities are indicative of a distance between the output embedding and a respective label embedding of the plurality of label embeddings”, which is well-understood, routine, and conventional in the art (see MPEP 2106.05(d)(II)(iii)).
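For purposes of illustrating the examiner's characterization of claims 10 and 11 only, deriving label probabilities from distances between an output embedding and per-label embeddings can be sketched as follows. All names, embeddings, and labels are hypothetical and are not drawn from the applicant's disclosure; a softmax over negative distances is used here as one conventional way to turn distances into probabilities.

```python
import math

# Hypothetical sketch: label probabilities from embedding distances
# (claims 10-11). All names and values are illustrative only.

def label_probabilities(output_emb, label_embs):
    # Euclidean distance to each label embedding; closer means more probable.
    dists = {
        label: math.dist(output_emb, emb) for label, emb in label_embs.items()
    }
    # Softmax over negative distances converts distances into probabilities.
    exps = {label: math.exp(-d) for label, d in dists.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def predict_label(output_emb, label_embs):
    probs = label_probabilities(output_emb, label_embs)
    return max(probs, key=probs.get)

labels = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
print(predict_label([0.9, 0.1], labels))  # cat
```

The comparison and selection are ordinary mathematical calculations and evaluation, consistent with the characterization above.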
Dependent claims 13-17 and 19-20 are rejected for similar reasons, as discussed above.
Therefore, when considered separately and in combination, the additional elements do not amount to significantly more than the judicial exception. Accordingly, claims 1-20 are rejected under 35 U.S.C. § 101.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 4-8, 12 and 15-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang (“Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings”, 2020).
With respect to independent claim 1, Wang teaches a computer-implemented method, the computer-implemented method comprising:
receiving, by one or more processors, a text input and contextual data indicative of a predictive category for the text input (see e.g., Section 3 – “we use an open-source toolkit (Smith, 2007) to extract OCR texts in form of a word sequence. It is then appended into the post text with a delimited token ⟨sep⟩ to notify the change of text genres, which is shown to be a simple yet effective design to combine OCR features.”);
generating, by the one or more processors and using a multi-headed composite model, an output embedding for the text input based on the predictive category (see e.g., Section 3.2 – “Our design of multi-head attention is inspired by its prototype in Transformer (Vaswani et al., 2017). We extend it to capture multiple forms of crossmodality interactions for a multimedia post, which is therefore named as M3H-Att, short for MultiModality Multi-Head Attention. Compared to its original use as a self-attention over texts only, we instead operate on three modalities (text, attribute, and vision) in a pairwise co-attention manner.”),
wherein: the multi-headed composite model comprises a model body, a plurality of model heads, and a gate function, the text input is processed with at least one model head of the plurality of model heads (see e.g., Fig. 3 and Sections 3.1-3.3. The examiner notes that a “gate function” is a data flow filter function.), and
the gate function is configured to select the at least one model head based on the predictive category for the text input (see e.g., Fig. 2 and Section 3 – “our proposed crossmedia keyphrase prediction model in Figure 2. We first encode a text-image tweet into three modalities: text, attribute, and vision (§3.1), and propose a Multi-Modality Multi-Head Attention (M3H-Att) to capture their intricate interactions (§3.2).”); and
providing, by the one or more processors, a predictive label for the text input based on the output embedding (see e.g., Section 3.1 “It will be fed into a keyphrase classifier and generator for the unified prediction. Notably, this indicates that our M3HAtt’s great potential to serve as a generic module for benefiting other cross-media applications.”).
With respect to dependent claim 4, Wang teaches the contextual data is indicative of user input that identifies the predictive category (see e.g., Fig. 2 and Section 1 and 3.1-3.3 – “We extend it to capture diverse cross-media interactions, named as Multi-Modality Multi-Head Attention (M3H-Att). Moreover, to well align the images’ semantics to texts’, we adopt image wordings and define two forms for that — explicit optical characters (such as “NBA Finals” in post (b)) detected from the optical character reader (OCR) and implicit image attributes (Wu et al., 2006), high-level text labels predicted to summarize the image’s semantic concepts (such as a “cat” label for post (a)).”).
With respect to dependent claim 5, Wang teaches the multi-headed composite model comprises a neural network (see e.g., Section 2).
With respect to dependent claim 6, Wang teaches the model body comprises a first plurality of attention blocks of the neural network and each model head of the plurality of model heads comprises a second plurality of attention blocks of the neural network (see e.g., Fig. 2).
With respect to dependent claim 7, Wang teaches each of the plurality of model heads corresponds to a particular predictive category of a plurality of predictive categories in a prediction domain (see e.g., Fig. 2 and section 3 - “first encode a text-image tweet into three modalities: text, attribute, and vision (§3.1), and propose a Multi-Modality Multi-Head Attention (M3H-Att) to capture their intricate interactions (§3.2). Then, we feed the learned multi-modality representations for either keyphrase classification or generation, followed with a tailored aggregator to combine their outputs (§3.3). Lastly, the entire framework can be jointly trained via multi-task learning (§3.4).”).
With respect to dependent claim 8, Wang teaches generating the output embedding comprises: generating, using the model body, an intermediate output for the text input (see e.g., Section 3 – “We represent each input as a triplet (x, I, y), where x and y are formulated as word sequences x = ⟨x1, ..., xlx⟩ and y = ⟨y1, ..., yly⟩ (lx and ly denote the number of words)”); and generating, using the at least one model head, the output embedding based on the intermediate output (see e.g., Section 3.2).
Claim 12 is rejected for similar reasons, as discussed above with respect to claim 1.
Claim 15 is rejected for similar reasons, as discussed above with respect to claim 4.
Claim 16 is rejected for similar reasons, as discussed above with respect to claim 5.
Claim 17 is rejected for similar reasons, as discussed above with respect to claim 6.
Claim 18 is rejected for similar reasons, as discussed above with respect to claim 1.
Claim 19 is rejected for similar reasons, as discussed above with respect to claim 5.
Claim 20 is rejected for similar reasons, as discussed above with respect to claim 6.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2, 3, 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Lambert (“MSeg: A Composite Dataset for Multi-domain Semantic Segmentation”, 2020).
With respect to dependent claim 2, Wang does not expressly show that the contextual data is indicative of a third-party category for the text input and that the predictive category is based on the third-party category. However, Lambert, in the same field of endeavor, teaches a similar feature (pages 1 and 3 – "A computer vision professional will likely resort to multiple models, each trained on a different dataset." Each dataset-specific domain is interpreted as a predictive category corresponding to a respective dataset, with each model explicitly trained for the respective domain). Wang and Lambert are both directed toward machine learning and are therefore reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Wang with the teachings of Lambert by using the model in Wang as the model architecture for the multiple models trained on different domains/datasets in Lambert. Lambert provides additional motivation for the combination (pages 1 and 3 – "A computer vision professional will likely resort to multiple models, each trained on a different dataset."). This motivation for combination also applies to the remaining claims that depend on this combination.
With respect to dependent claim 3, the modified Wang teaches the predictive category is based on a semantic mapping between a plurality of third-party categories and a plurality of predictive categories (see e.g., Page 1 – “We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains: COCO [10], ADE20K [11], Mapillary [9], IDD [13], BDD [14], Cityscapes [8], and SUN RGB-D [15]. A naive merge of the taxonomies of the seven datasets would yield more than 300 classes, with substantial internal inconsistency in definitions. Instead, we reconcile the taxonomies, merging and splitting classes to arrive at a unified taxonomy with 194 categories.”).
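For purposes of illustrating the examiner's reading of claim 3 only, a semantic mapping from third-party categories to unified predictive categories, in the spirit of the taxonomy reconciliation quoted from Lambert, can be sketched as follows. The mapping entries, dataset names, and fallback label are hypothetical and are not drawn from either reference or the applicant's disclosure.

```python
# Hypothetical sketch: a semantic mapping that reconciles third-party
# (dataset-specific) categories into unified predictive categories
# (claim 3). All entries are illustrative only.

THIRD_PARTY_TO_PREDICTIVE = {
    "coco/person": "person",
    "ade20k/human": "person",
    "cityscapes/rider": "person",
    "coco/automobile": "vehicle",
    "cityscapes/car": "vehicle",
}

def predictive_category(third_party_category):
    # Unknown third-party categories fall back to a catch-all label.
    return THIRD_PARTY_TO_PREDICTIVE.get(third_party_category, "other")

print(predictive_category("ade20k/human"))  # person
```

The mapping is a simple lookup, consistent with the characterization of this limitation as selection of data.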
Claim 13 is rejected for similar reasons, as discussed above with respect to claim 2.
Claim 14 is rejected for similar reasons, as discussed above with respect to claim 3.
Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Lai (“Ontology-based Interpretable Machine Learning for Textual Data”, 2020).
With respect to dependent claim 9, Wang does not expressly show that the predictive label is one of a plurality of predefined ontology agnostic predictive labels. However, Lai, in the same field of endeavor, teaches a similar feature (pages 1 and 3). Wang and Lai are both directed toward machine learning and are therefore reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Wang with the teachings of Lai by combining the model in Wang with the ontology-based interpretable machine learning for textual data in Lai. Lai provides additional motivation in its ontology-based machine learning for textual data (pages 1 and 3). This motivation for combination also applies to the remaining claims that depend on this combination.
With respect to dependent claim 10, the modified Wang teaches providing the predictive label for the text input based on the output embedding comprises: generating a plurality of label probabilities based on a comparison between the output embedding and a plurality of label embeddings corresponding to the plurality of predefined ontology agnostic predictive labels; and identifying the predictive label based on the plurality of label probabilities (see e.g., Lai p. 3 – “To learn the local behavior of f in its vicinity (Eq. 1), we approximate L(f, g, φx) by drawing samples based on x, with the proximity indicated by φx. A sample z can be sampled as: z = ∪_{xi∈x, i≠k, i≠l} R(xi) ∪ R({xk, xl}) (3) where R(xi) and R({xk, xl}) are probabilities randomly drawn for each word xi ∈ x (i ≠ k, l) and words xk, xl ∈ x together, respectively. If R is greater than a predefined threshold, then the word(s) will be included in z.”).
With respect to dependent claim 11, the modified Wang teaches each of the plurality of label probabilities are indicative of a distance between the output embedding and a respective label embedding of the plurality of label embeddings (see e.g., Lai p. 3 – “To learn the local behavior of f in its vicinity (Eq. 1), we approximate L(f, g, φx) by drawing samples based on x, with the proximity indicated by φx. A sample z can be sampled as: z = ∪_{xi∈x, i≠k, i≠l} R(xi) ∪ R({xk, xl}) (3) where R(xi) and R({xk, xl}) are probabilities randomly drawn for each word xi ∈ x (i ≠ k, l) and words xk, xl ∈ x together, respectively. If R is greater than a predefined threshold, then the word(s) will be included in z.”).
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PEIYONG WENG whose telephone number is (571)270-1660. The examiner can normally be reached on Mon.-Fri. 8 am to 5 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Matthew Ell, can be reached on (571) 270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/PEI YONG WENG/Primary Examiner, Art Unit 2141