Prosecution Insights
Last updated: April 19, 2026
Application No. 17/827,364

SYSTEM AND METHOD FOR INTEGRATED LARGE-SCALE AUDIENCE TARGETING VIA AUGMENTED HETEROGENEOUS SUBSYSTEMS

Final Rejection: §101, §102, §103, §112, Double Patenting
Filed: May 27, 2022
Examiner: GONZALES, VINCENT
Art Unit: 2124
Tech Center: 2100 — Computer Architecture & Software
Assignee: Yahoo Assets LLC
OA Round: 2 (Final)
Grant Probability: 78% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 6m
With Interview: 89%

Examiner Intelligence

Career Allow Rate: 78% (410 granted / 522 resolved; +23.5% vs TC avg; above average)
Interview Lift: +10.5% (moderate; with vs. without interview, among resolved cases)
Avg Prosecution: 3y 6m typical timeline; 26 applications currently pending
Total Applications: 548 across all art units (career history)

Statute-Specific Performance

§101: 21.2% (-18.8% vs TC avg)
§102: 13.2% (-26.8% vs TC avg)
§103: 39.9% (-0.1% vs TC avg)
§112: 14.6% (-25.4% vs TC avg)
Deltas are measured against the estimated Tech Center average. Based on career data from 522 resolved cases.
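The dashboard does not state how its headline figures are derived. Assuming the career allow rate is simply grants divided by resolved cases, and the interview figure adds the reported lift, the displayed numbers are reproducible. This is a sketch of that assumption, not the vendor's actual methodology:

```python
# Reconstructing the dashboard's headline numbers from the raw counts shown
# on the page. Assumes a simple ratio plus an additive interview lift; the
# page does not disclose its actual methodology.
granted, resolved = 410, 522

career_allow_rate = granted / resolved               # 0.785... shown as 78%
interview_lift = 0.105                               # reported +10.5% lift
with_interview = career_allow_rate + interview_lift  # 0.890... shown as 89%

print(f"career allow rate: {career_allow_rate:.1%}")
print(f"with interview:    {with_interview:.1%}")
```

The rounded values match the page's 78% and 89% figures.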

Office Action

DETAILED ACTION

This action is written in response to the remarks and amendments dated 11/26/25. This action is made final. The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Response to Arguments

The Applicant argues that the prior art of record does not anticipate or render obvious the claims as currently amended. The Examiner provides updated prior art rejections below, necessitated by the current amendments.

§101 – The Applicant argues:

“The Office Action alleges that the claims fall under the "Mental processes" grouping of abstract ideas. See, Office Action at page 5. Applicant respectfully disagrees for at least the following reasons. Initially, it should be understood that "[t]he present teaching generally relates to machine learning" (para. [0001]), which cannot be performed in the human mind. Particularly, claim 1 recites "constructing an expert hierarchy comprising an initial expert layer and one or more augmented expert layers" and "obtaining a nonlinear integration model, via machine learning, for combining expert predictions from experts in the expert hierarch based on an input to generate an integrated expert prediction in response to the input." At least these features cannot be performed in the human mind, because the human mind cannot construct an expert hierarchy in which an expert at any augmented expert layer augments the expertise of experts at any lower layer, and cannot obtain a model via machine learning. Thus, the claims do not fall within the "Mental processes" grouping of abstract ideas.” (Remarks, pp. 12-13.)

The Examiner is not persuaded. The Applicant provides no definition or meaningful guidance as to the scope of the claim term ‘expert’. Accordingly, the Examiner interprets these structures according to their broadest reasonable interpretation as encompassing any (human) mental process that results in a decision.
A plurality of such processes is still a mental process.

§101 – The Applicant argues: “The recited features of claim 1 are clearly tied to a practical application, i.e., integrated targeting in the technical field of machine learning.”

Machine learning is itself a thinking tool with potential applications in any domain. The Applicant does not address any particular, real-world problem, and the Examiner is therefore unable to identify any practical application in the claims. For the foregoing reasons, the Examiner maintains all outstanding rejections under §101, which are reproduced infra.

§102 – The Applicant argues:

“Jordan does not teach the above-quoted claim features for at least the following reasons. Jordan teaches "[a] two-level hierarchical mixture of experts. To form a deeper tree, each expert is expanded recursively into a gating network and a set of sub-experts" (Jordan, FIG. 1). However, Jordan does not teach or suggest the recited way in which experts in the expert hierarchy are trained - "wherein each expert in a layer of the expert hierarchy is trained based on training data directed thereto and an expert output from an expert at any lower layer of the expert hierarchy so that an expert at any of the one or more augmented expert layers augments the expertise of experts at any lower layer of the expert hierarchy," as recited in claim 1.”

The Examiner is not persuaded. Every expert disclosed in Jordan is trained via supervised training. P. 2: “Hierarchical mixtures of experts … The algorithms that we discuss in this paper are supervised learning algorithms.” … “Each expert produces an output vector μij for each input vector. These output vectors proceed up the tree, being blended by the gating network outputs.” Supervised learning means that the models are trained with labeled training data, i.e., “training data directed thereto.” The Applicant’s amendments do not appear to incorporate any features that are not taught by Jordan.
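The disputed limitation, in which each upper-layer expert is trained on both its own training data and the outputs of lower-layer experts, resembles stacked generalization. A minimal sketch of that reading, purely illustrative and not the Applicant's implementation, with simple least-squares models standing in for the unspecified "experts":

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: the "training data directed thereto".
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

def fit_linear(features, targets):
    """Least-squares 'expert' (a stand-in for any trainable model)."""
    w, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return w

# Initial expert layer: two experts, each trained on its own feature slice.
w1 = fit_linear(X[:, :2], y)
w2 = fit_linear(X[:, 1:], y)
pred1, pred2 = X[:, :2] @ w1, X[:, 1:] @ w2

# Augmented expert: trained on the raw features *plus* the lower-layer
# outputs, so it "augments the expertise" of the initial experts.
augmented_inputs = np.column_stack([X, pred1, pred2])
w_aug = fit_linear(augmented_inputs, y)
pred_aug = augmented_inputs @ w_aug

mse_initial = min(np.mean((pred1 - y) ** 2), np.mean((pred2 - y) ** 2))
mse_aug = np.mean((pred_aug - y) ** 2)
```

Because the augmented expert sees a strict superset of the initial experts' information, its in-sample error cannot be worse, which is one way to read "augments the expertise of experts at any lower layer."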
For the foregoing reasons, the Examiner maintains all outstanding rejections under §§ 102 and 103, which are reproduced infra and which have been updated to reflect the Applicant’s claim amendments.

Claim Objections

Each of independent claims 1, 8 and 15 recites “expert hierarch”. It appears that this should be “expert hierarchy”. Appropriate correction is required.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting, provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first-inventor-to-file provisions of the AIA as explained in MPEP § 2159. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action.
Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1-5, 8-13 and 15-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over the corresponding claims of US patent application 17/827,400. Although the claims at issue are not identical, they are not patentably distinct from each other for the reasons outlined below. This is a provisional rejection because the claims of the copending application have not been allowed and issued.

This application – 17/827,364, claim 1:
“1. A method implemented on at least one processor, a memory, and a communication platform for integrated targeting, comprising: constructing an expert hierarchy comprising an initial expert layer and one or more augmented expert layers, wherein the initial expert layer has a plurality of initial experts and an augmented expert layer has at least one augmented expert for prediction, wherein each expert in a layer of the expert hierarchy is trained based on training data directed thereto and an expert output from an expert at any lower layer of the expert hierarchy so that an expert at any of the one or more augmented expert layers augments the expertise of experts at any lower layer of the expert hierarchy; obtaining a nonlinear integration model, via machine learning, for combining expert predictions from experts in the expert hierarch based on an input to generate an integrated expert prediction in response to the input.”

Copending application – 17/827,400, claims 1 and 7:
“1. A method implemented on at least one processor, a memory, and a communication platform for predicting user segment via machine learning, comprising: creating an initial expert layer of an expert hierarchy with a plurality of initial experts trained for prediction; deriving at least one augmented expert layer for the expert hierarchy with one or more augmented experts at each of the at least one augmented expert layer, wherein each augmented expert at any of the at least one augmented expert layer augments the plurality of initial experts and is trained, via machine learning for the prediction, using training data and outputs from all of experts from a lower expert layer, wherein each of the outputs is generated by each of the experts from the lower expert layer based on the training data as an input to the expert;”
“7. The method of claim 1, further comprising: accessing a nonlinear integration model provided for integrating different expert predictions; combining, in accordance with the nonlinear integration model, expert predictions from the initial and augmented experts in the expert hierarchy generated based on the input; and generating an integrated expert prediction based on a result of the combining.”

As illustrated above, every limitation in this application has a corresponding equivalent or more specific limitation in the ‘400 application. Thus, the ‘400 application anticipates this claim. Independent claims 8 and 15 recite an analogous medium and system, respectively. The correspondence in the dependent claims is as follows.

This application, claim 2: “The method of claim 1, wherein the initial experts of the initial expert layer are heterogeneous experts.”
‘400 application, claim 2: “The method of claim 1, wherein the plurality of initial experts are heterogeneous experts.”

This application, claim 3: “The method of claim 1, wherein an augmented expert at an augmented expert layer is derived by augmenting experts at any lower layer based on training data as well as predictions from experts at any lower layer based on the training data.”
‘400 application, claim 3: “The method of claim 1, wherein when the expert hierarch has multiple augmented expert layers, each augmented expert at an augmented expert layer higher than a first augmented expert layer additionally augments any augmented expert at a lower augmented expert layer.”

This application, claim 4: “The method of claim 1, wherein the step of obtaining the nonlinear integration model comprises: configuring the nonlinear integration model via a plurality of parameters; and learning values of the plurality of parameters via machine learning to capture nonlinear relationships among the experts in the expert hierarchy.”
‘400 application: These further limitations are not in the claims of the ‘400 application. However, they are taught by the Jordan reference; see the claim mapping in the §102 rejections infra. At the time of filing, it would have been obvious to a person of ordinary skill to combine the techniques disclosed by Jordan with the method of the ‘400 application, because this would improve classification performance through iterative training.

This application, claim 5: “The method of claim 4, wherein the nonlinear integration model corresponds to an artificial neural network (ANN) with the plurality of parameters related to the ANN, including embeddings of the ANN.”
‘400 application: These further limitations are not in the claims of the ‘400 application. However, they are taught by the Jordan reference; see the claim mapping in the §102 rejections infra. At the time of filing, it would have been obvious to a person of ordinary skill to combine the techniques disclosed by Jordan with the method of the ‘400 application, because this would improve classification performance through iterative training.

As illustrated above, every claim limitation in this application has a corresponding equivalent or more specific limitation in the ‘400 application. Thus, the ‘400 application anticipates these claims. (Obviousness instead of anticipation applies where noted above.) Dependent claims 9-13 and 16-20 are analogous to dependent claims 2-5.

Claim Rejections - 35 USC § 112(b) - Indefiniteness

The following is a quotation of the second paragraph of 35 U.S.C. 112:

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.
Claim 1 recites “wherein each expert in a layer of the expert hierarchy is trained based on training data directed thereto and an expert output from an expert at any lower layer of the expert hierarchy so that an expert at any of the one or more augmented expert layers augments the expertise of experts at any lower layer of the expert hierarchy”. This limitation is unclear for the following reasons:

1. The Applicant provides no definition in the specification for ‘expert’. Illustrative use in the specification suggests that an expert is a machine learning model, but the scope of the term is not clear from the claim language.
2. Machine learning models are not commonly viewed as having ‘expertise’. This term implies judgment, which is a human ability, and not one inherent in an algorithm.
3. It is unclear how the ‘experts’ of the expert hierarchy are trained, i.e., the role of the training data is unclear. What is “training data directed thereto”? How is the “output from an expert” at any lower layer used by the current expert for training?

For the foregoing reasons, the identified limitation is incoherent, and consequently a person of ordinary skill would not be able to understand the scope of the claim with reasonable certainty. The claim is therefore indefinite. Independent claims 8 and 15 are indefinite for the same reasons, and all pending dependent claims inherit this deficiency from their respective parent claims.

Claim Rejections - 35 USC § 101

Claims 1-3, 8-10 and 15-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. 35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
In determining whether the claims are subject matter eligible, the Examiner applies the 2019 USPTO Patent Eligibility Guidance.1

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes—claim 1 recites a method, which is a process.

Step 2A, prong one: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes—the claim recites one or more limitations which, under their broadest reasonable interpretation, cover performance of the limitation in the mind, as analyzed below.

Limitation: “1. A method … for integrated targeting, comprising: constructing an expert hierarchy comprising an initial expert layer and one or more augmented expert layers, wherein the initial expert layer has a plurality of initial experts and an augmented expert layer has at least one augmented expert for prediction,”
Analysis: This is a mental process akin to a human judgment/observation.

Limitation: “wherein each expert in a layer of the expert hierarchy is trained based on training data directed thereto and an expert output from an expert at any lower layer of the expert hierarchy so that an expert at any of the one or more augmented expert layers augments the expertise of experts at any lower layer of the expert hierarchy;”
Analysis: This is merely additional information about the mental process identified above.

Limitation: “obtaining a nonlinear integration model, via machine learning, for combining expert predictions from experts in the expert hierarch based on an input to generate an integrated expert prediction in response to the input.”
Analysis: This is a mental process akin to a human judgment/observation. Many machine learning models can be practically implemented as a mental process (typically with the aid of pencil and paper).

Because the claim recites limitations which can practically be implemented as mental processes, the claim recites a mental process.
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No—no particular real-world problem is addressed by the claimed invention. Machine learning is a thinking tool with applications in any domain. The Applicant recites only broad/vague outputs: “to generate expert prediction in response to the input”. (Expert prediction of what?)

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No—the additional limitations are addressed below.

Limitation: “processor, a memory, and a communication platform”
Analysis: This is merely generic computing hardware. The only limitation on the performance of the described method is that it must be performed using generic computing hardware (i.e., a processor, a memory, and a “communication platform”). The claim thus recites computing components only at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components. The statement that the method is performed by computer does not satisfy the test of “inventive concept.” See Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. 208, 134 S. Ct. 2347, 2360 (2014).

For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claims 8 and 15, which recite a non-transitory medium and a system, respectively, as well as to dependent claims 2-3, 9-10 and 16-17. The additional limitations of the dependent claims are addressed briefly below. Taken alone, the additional elements of the dependent claims do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually.
There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation.

Limitation: “2. The method of claim 1, wherein the initial experts of the initial expert layer are heterogeneous experts.”
Analysis: This is merely additional information about one or more previously identified mental processes.

Limitation: “3. The method of claim 1, wherein an augmented expert at an augmented expert layer is derived by augmenting experts at any lower layer based on training data as well as predictions from experts at any lower layer based on the training data.”
Analysis: This is merely additional information about one or more previously identified mental processes.

Additionally, independent claim 15 recites “a system” comprising “an expert hierarchy” and “a nonlinear integration model”. Nothing in this claim makes it clear that the recited system falls within one of the four statutory categories of invention (i.e., a process, machine, manufacture, or composition of matter). Therefore, the claim is ineligible under §101. Dependent claims 16-20 inherit this deficiency from claim 15.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 3-8, 10-15 and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Jordan (Jordan, Michael I., and Robert A. Jacobs. "Hierarchical mixtures of experts and the EM algorithm." Neural Computation 6, no. 2 (1994): 181-214).
Regarding claims 1, 8 and 15, Jordan discloses a method implemented on at least one processor, a memory, and a communication platform for integrated targeting, comprising:

constructing an expert hierarchy comprising an initial expert layer and one or more augmented expert layers, wherein the initial expert layer has a plurality of initial experts and each of the one or more augmented expert layers has at least one augmented expert for prediction, wherein each expert in a layer of the expert hierarchy is trained based on training data directed thereto and an expert output from an expert at any lower layer of the expert hierarchy so that an expert at any of the one or more augmented expert layers augments the expertise of experts at any lower layer of the expert hierarchy;

P. 3, fig. 1. Caption: “A two-level hierarchical mixture of experts. To form a deeper tree, each expert is expanded recursively into a gating network and a set of sub-experts.” [Jordan, fig. 1, reproduced in the original action.]

The Examiner notes that the Applicant provides no explicit definition for the terms ‘expert’, ‘augment’ or ‘augmented expert’. The Examiner interprets ‘augmented expert’ according to its broadest reasonable interpretation as encompassing every expert network above the first (i.e., leaf) layer. According to Jordan, this hierarchical mixture-of-experts (HME) architecture “yielded a modest improvement” over previous efforts. Thus, the additional hierarchical experts have ‘augmented’ the performance of the base experts.

Each expert in the described model is trained in a supervised manner, i.e., using labeled training data. P. 2: “The algorithms that we discuss in this paper are supervised learning algorithms.” (Emphasis added.) The Examiner notes that supervised learning means that the models are trained with labeled training data, i.e., “training data directed thereto”.
obtaining a nonlinear integration model, via machine learning, for combining expert predictions from experts in the expert hierarch based on an input to generate an integrated expert prediction in response to the input.

PP. 3-4, eqns. 1-3: “Then the ith output of the top-level gating network is the “softmax” function of the ξi”. (Emphasis added.) The Examiner notes that the softmax function is nonlinear. Cf. Applicant’s specification at [0063], including eqn. (8), which includes the softmax function.

Regarding independent claim 8, the recited computing hardware (i.e., a machine-readable and non-transitory medium) is inherent throughout the Jordan disclosure.

Regarding claims 3, 10 and 17, Jordan discloses the further limitations wherein an augmented expert at an augmented expert layer is derived by augmenting experts at any lower layer based on training data as well as predictions from experts at any lower layer based on the training data.

P. 2: “We propose to solve nonlinear supervised learning problems by dividing the input space into a nested set of regions and fitting simple surfaces to the data that fall in these regions. The regions have “soft” boundaries, meaning that data points may lie simultaneously in multiple regions. The boundaries between regions are themselves simple parameterized surfaces that are adjusted by the learning algorithm.” (Cont.) “The hierarchical mixture-of-experts (HME) architecture is shown in Figure 1. The architecture is a tree in which the gating networks sit at the nonterminals of the tree. These networks receive the vector x as input and produce scalar outputs that are a partition of unity at each point in the input space. The expert networks sit at the leaves of the tree. Each expert produces an output vector μij for each input vector. These output vectors proceed up the tree, being blended by the gating network outputs.” See also p. 3, fig. 1, supra.
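The gating mechanism cited above can be condensed to a few lines: expert outputs μi are blended by softmax gating proportions, and the softmax is what makes the combination nonlinear. A one-level sketch with arbitrary random weights (illustrative only, not Jordan's trained values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=4)                    # input vector

n_experts = 3
U = rng.normal(size=(n_experts, 2, 4))    # per-expert weight matrices: mu_i = U_i @ x
v = rng.normal(size=(n_experts, 4))       # gating weights: xi_i = v_i . x

mu = np.einsum('eij,j->ei', U, x)         # expert output vectors
g = softmax(v @ x)                        # gating proportions (a partition of unity)
blended = g @ mu                          # outputs "blended by the gating network"
```

The gating outputs sum to one at every input point, matching Jordan's "partition of unity" description; deeper trees repeat this blending recursively.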
Regarding claims 4, 11 and 18, Jordan discloses the further limitation wherein the step of obtaining the nonlinear integration model comprises:

configuring the nonlinear integration model via a plurality of parameters; and

P. 2: “Expert network (i, j) produces its output μij as a generalized linear function of the input x: μij = f(Uij x) … where Uij is a weight matrix and f is a fixed continuous nonlinearity.” (Emphasis added.)

learning values of the plurality of parameters via machine learning to capture nonlinear relationships among the experts in the expert hierarchy. Id.

Regarding claims 5 and 12, Jordan discloses the further limitations wherein the nonlinear integration model corresponds to an artificial neural network (ANN) with the plurality of parameters related to the ANN, including embeddings of the ANN.

P. 8: “EM is an iterative approach to maximum likelihood estimation. Each iteration of an EM algorithm is composed of two steps: an Estimation (E) step and a Maximization (M) step. The M step involves the maximization of a likelihood function that is redefined in each iteration by the E step. If the algorithm simply increases the function during the M step, rather than maximizing the function, then the algorithm is referred to as a Generalized EM (GEM) algorithm. The Boltzmann learning algorithm (Hinton & Sejnowski, 1986) is a neural network example of a GEM algorithm. GEM algorithms are often significantly slower to converge than EM algorithms.” (Emphasis added.)

Regarding claims 6, 13 and 19, Jordan discloses the further limitation wherein the step of learning comprises:

initializing the values of the plurality of parameters;

Initialization of weight parameters is inherent in any neural network system. Cf. pp. 17-18, describing on-line supervised training: “In this section we present the equations for the on-line algorithm.
These equations involve an update not only of the parameters in each of the networks, but also the storage and updating of an inverse covariance matrix for each network.” See also footnote 6: “Note that in this section we use the term “parameters" for the variables that are traditionally called “weights" in the neural network literature. We reserve the term “weights" for the observation weights”. See also p. 18: “λ was initialized to 0.99”.

receiving the training data having pairs of data, wherein each of the pair includes an input feature vector and a corresponding ground truth label; and

Id. See also p. 21, ‘supervised learning’.

for each of the pairs in the training data, receiving the outputs from the respective plurality of experts generated based on the input feature vector in the pair, generating an integrated output of the received outputs based on current values of the plurality of parameters of the nonlinear function, determining a loss based on a discrepancy between the integrated output and the ground truth label in the pair, updating the current values of the plurality of parameter based on the loss, and repeating the steps of receiving, generating, determining, and updating until a convergence condition is satisfied.

P. 17: “It can be shown, however, that R(t)ij is an estimate of the inverse Hessian of the least-squares cost function (Ljung & Söderström, 1986), thus Equation 32 is in fact a stochastic approximation to a Newton-Raphson method rather than a gradient method.” (Emphasis added.) See also p. 17, first paragraph, discussing least-squares learning, which inherently involves a loss/cost function. Iterative repetition of the described on-line training algorithm is assumed. See p. 18, fig. 5, illustrating relative error vs. training epochs (i.e., number of training cycles).
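The training loop quoted in the claim above (initialize, integrate expert outputs, compute a loss against the ground truth, update, repeat until convergence) is the standard supervised recipe. A sketch using plain stochastic gradient descent and a linear integration for brevity (the claim recites a nonlinear one); all names and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed "experts": each maps an input feature vector to a scalar prediction.
experts = [lambda x: x[0] + x[1],
           lambda x: x[0] - x[1],
           lambda x: 2.0 * x[1]]

# Training data: pairs of (input feature vector, ground-truth label).
xs = rng.normal(size=(100, 2))
labels = 0.7 * (xs[:, 0] + xs[:, 1]) + 0.3 * (xs[:, 0] - xs[:, 1])

theta = np.zeros(len(experts))          # initialize the integration parameters
prev_loss, lr = np.inf, 0.01

for epoch in range(500):
    total = 0.0
    for x, label in zip(xs, labels):
        outs = np.array([e(x) for e in experts])   # receive the expert outputs
        integrated = theta @ outs                  # integrated output (linear for brevity)
        err = integrated - label                   # discrepancy vs. the ground-truth label
        theta -= lr * err * outs                   # update parameters based on the loss
        total += 0.5 * err * err
    if abs(prev_loss - total) < 1e-12:             # convergence condition
        break
    prev_loss = total
```

Because the labels here are exactly expressible as a combination of the expert outputs, the loop drives the loss essentially to zero.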
Regarding claims 7, 14 and 20, Jordan discloses the further limitation comprising:

receiving the input; P. 2: “These networks receive the vector x as input”.

sending the input to the experts at different layers of the expert hierarchy to facilitate each of the experts in the expert hierarchy to generate a prediction based on the input; and

P. 3, fig. 1, supra. Caption: “A two-level hierarchical mixture of experts. To form a deeper tree, each expert is expanded recursively into a gating network and a set of sub-experts.”

combining, via the nonlinear integration model, predictions generated by the experts in the expert hierarchy to output an integrated expert prediction in response to the input. Id.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103(a), which forms the basis for all obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

The following references are relied upon in the rejections below:

Jordan (Jordan, Michael I., and Robert A. Jacobs. "Hierarchical mixtures of experts and the EM algorithm." Neural Computation 6, no. 2 (1994): 181-214.)

Shazeer (Shazeer, Noam, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. "Outrageously large neural networks: The sparsely-gated mixture-of-experts layer." arXiv preprint arXiv:1701.06538, 2017. Cited by Applicant in IDS dated 6/24/25.)
Claims 2, 9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Jordan and Shazeer.

Regarding claims 2, 9 and 16, Shazeer discloses the following further limitation, which Jordan does not disclose: wherein the initial experts of the initial expert layer are heterogeneous experts.

P. 13: “Experiments: We trained a set of models with identical architecture (the MoE-256 model described in Appendix C), using different values of w_importance and w_load. We trained each model for 10 epochs, then measured perplexity on the test set.” (Emphasis added.) The Examiner notes that the experts described are heterogeneous with respect to their models (i.e., their model weights).

At the time of filing, it would have been obvious to a person of ordinary skill to combine the features disclosed by Shazeer with the Jordan system. There are only two possibilities: either all expert models in an MoE model are identical, or they are heterogeneous. If all experts comprised identical models, there would be no advantage gained from creating an ensemble (i.e., a mixture) thereof.

Additional Relevant Prior Art

The following reference was identified by the Examiner as being relevant to the disclosed invention, but is not relied upon in any particular prior art rejection:

Eto (US 2021/0150388 A1) discloses a hierarchical mixture of experts system. See, e.g., fig. 2 and abstract.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
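Shazeer's sparsely-gated layer differs from Jordan's dense gating mainly in that only the top-k experts are activated per input. The core of that gate can be sketched as follows (simplified; the paper's noisy gating and load-balancing terms are omitted, and the function name is illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def top_k_gate(logits, k):
    """Sparse gate in the spirit of Shazeer et al.: keep the top-k gating
    logits, mask the rest to -inf, then softmax so only k experts fire."""
    masked = np.full_like(logits, -np.inf)
    top = np.argsort(logits)[-k:]
    masked[top] = logits[top]
    return softmax(masked)

# Four experts, but only the two with the largest logits receive weight.
g = top_k_gate(np.array([0.1, 2.0, -1.0, 1.5]), k=2)
```

Because the non-selected entries are masked before the softmax, the gate is still a proper probability distribution over the k selected experts, and the unselected experts need not be evaluated at all.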
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Vincent Gonzales, whose telephone number is (571) 270-3837. The examiner can normally be reached Monday-Friday, 7 a.m. to 4 p.m. MT. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. Information regarding the status of an application may be obtained from the USPTO Patent Center.

/Vincent Gonzales/
Primary Examiner, Art Unit 2124

1. 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (Jan. 7, 2019).

Prosecution Timeline

May 27, 2022: Application Filed
Aug 22, 2025: Non-Final Rejection (§101, §102, §103)
Nov 26, 2025: Response Filed
Feb 05, 2026: Final Rejection (§101, §102, §103)
Apr 09, 2026: Request for Continued Examination
Apr 13, 2026: Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585920: PREDICTING OPTIMAL PARAMETERS FOR PHYSICAL DESIGN SYNTHESIS (granted Mar 24, 2026; 2y 5m to grant)
Patent 12580040: DIFFUSION MODEL FOR GENERATIVE PROTEIN DESIGN (granted Mar 17, 2026; 2y 5m to grant)
Patent 12566984: METHODS AND SYSTEMS FOR EXPLAINING ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING (granted Mar 03, 2026; 2y 5m to grant)
Patent 12561402: IDENTIFICATION OF A SECTION OF BODILY TISSUE FOR PATHOLOGY TESTS (granted Feb 24, 2026; 2y 5m to grant)
Patent 12547647: Unsupervised Machine Learning System to Automate Functions On a Graph Structure (granted Feb 10, 2026; 2y 5m to grant)
These grants illustrate what changed to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 78%
With Interview: 89% (+10.5%)
Median Time to Grant: 3y 6m
PTA Risk: Moderate
Based on 522 resolved cases by this examiner. Grant probability derived from career allow rate.
