DETAILED ACTION
This action is responsive to Applicant Arguments and Remarks filed on December 04, 2025.
Claims 9, 18 and 20 have been amended.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant's Remarks, filed December 04, 2025, have been fully considered and entered.
Accordingly, Claims 1-20 are pending in this application. Claims 9, 18 and 20 have been amended. Claims 1, 13 and 20 are independent claims. In light of Applicant's amendments, the 35 U.S.C. § 101 rejection of claim 20 for being directed to non-statutory subject matter has been withdrawn.
Response to Arguments
Applicant’s arguments, see pages 7-17, filed December 04, 2025, with respect to the
rejections of claim 1 have been fully considered, but they are not persuasive.
Argument 1: Regarding the 101 rejection, Applicant argues on pages 8-14 "the claims are not directed towards mental processes because the claimed steps are computer-implemented steps that cannot be practically performed in the human mind or using pen/paper... the specific, computer-implemented operations include executing trained computer models and having various computer models output computer-generated data to one another and process initial input features, pre-processed input features, mixed expert outputs, and intermediate outputs to generate final outputs. The generation of new computer outputs by software constructs (i.e., the computer models being executed) are extensive computer data generation processes and cannot be practically performed in the human mind... the claimed approach improves technology relating to reusing previously-calculated intermediate results, thereby reducing the number of redundant computations that computer processors are required to perform... The claims place meaningful limits on the implementation of any purported abstract idea by reciting particularized limitations of processing, mixing, and reusing previously-computed intermediate outputs to generate a final output... each of the amended independent claims recites the limitations of "mixing the input features and the expert outputs to generate mixed expert outputs," "processing the mixed expert outputs using a first model of the hierarchical model to generate intermediate outputs," and "processing the mixed expert outputs and the intermediate outputs using a second model of the hierarchical model to generate a final output." These limitations are specific to imparting the technological improvement of the claimed approach."
Response to Argument 1: Examiner respectfully disagrees. The Examiner would like to clarify that the only independent claim that has been amended is claim 20, which is not rejected under 35 U.S.C. § 101 as being directed to an abstract idea. Regarding claims 1 and 13, the claims do not provide any limitations on how the input features are obtained or processed, nor on how the pre-processed input features, mixed expert outputs and intermediate outputs are processed; nor do they specify how the pre-processed input features, expert outputs, mixed expert outputs, intermediate outputs and final outputs are generated.
Under the broadest reasonable interpretation, the terms of the claim are presumed to have their plain meaning consistent with the specification as it would be interpreted by one of ordinary skill in the art. See MPEP 2111. The plain meaning of the term "expert model" can refer to a model that simulates a human expert. Thus, nothing in the claim elements precludes the steps (i.e., "determine," "generate") from practically being performed in the mind or with pen and paper, through observation, evaluation, judgment, and opinion. Thus, the claim does not reflect the argued improvement. See rejection below.
These judicial exceptions are not integrated into a practical application. In particular, the recitation of “instructions… executed by one or more processors” in claim 13 is a mere instruction to implement an abstract idea or other exception on a computer. Accordingly, this additional element does not integrate the abstract idea into a practical application because the claim amounts to nothing more than an instruction to apply the abstract idea using a generic computer. Also, the recitations of “using a plurality of expert models” and “using… hierarchical model” in claims 1 and 13 merely indicate a field of use or technological environment in which the judicial exception is performed. Although the additional elements “using a plurality of expert models” and “using… hierarchical model” limit the identified judicial exceptions, this type of limitation merely confines the use of the abstract idea to a particular technological environment (machine learning) and thus fails to add an inventive concept to the claims. See MPEP 2106.05(h).
The claim is directed to an abstract idea. Therefore, the Examiner has determined that this argument is not persuasive.
Argument 2: Regarding the 103 rejection, Applicant argues on pages 15-16 "to teach or suggest the above limitations of claim 1, Pardeshi would have to disclose that both the image data for the objects depicted in the image and the feature vectors are processed by the variational autoencoders. Importantly, Pardeshi contains no such teachings. Rather, as discussed above, Pardeshi discloses only that the feature vectors are input into the variational autoencoders for further processing. Notably, nowhere does Pardeshi disclose or otherwise suggest that the variational autoencoders process both: (1) the image data itself, and (2) the feature vectors derived from the image, as would be required to teach the claim language. Pardeshi is silent in this regard. In view of at least these distinctions, Applicant submits that Pardeshi cannot be properly interpreted as teaching or suggesting the above limitations of claim 1."
Response to Argument 2: Examiner respectfully disagrees. Pardeshi paragraphs [0050-0055] disclose that feature vectors are generated by a convolutional neural network trained to detect one or more objects/features in a scene represented in an image, and the feature vectors are then provided as input to at least one expert autoencoder to encode features.
Examiner interprets a convolutional neural network trained to detect objects in a scene in images, as an expert model.
Thus, by inputting the features into a convolutional neural network trained to detect one or more objects/features in a scene represented in an image to generate feature vectors, and then inputting the feature vectors to at least one expert autoencoder to encode features, Pardeshi processes the input features and the pre-processed input features using a plurality of expert models to generate expert outputs.
Therefore, the Examiner has determined that this argument is not persuasive.
Information Disclosure Statement
As required by M.P.E.P. 609, the applicant’s submission of the Information Disclosure Statement dated December 01, 2025 is acknowledged by the examiner and the cited references have been considered in the examination of the claims now pending.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-8, 11-17 and 19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 analysis:
In the instant case, the claims are directed to a method (claims 1-8 and 11-12) and a computer-readable medium (claims 13-17 and 19). Thus, each of the claims (1-8, 11-17 and 19) falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A analysis:
Having determined that the claims fall within (or can be amended to fall within) a statutory category (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, natural phenomenon, or abstract idea). In this case, the claims fall within the judicial exception of an abstract idea; specifically, the abstract idea of mental processes.
Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claim 1 (Similarly in claim 13):
“processing input features to generate pre-processed input features;” (Mental process – processing features to generate processed features can be practically performed mentally or with the aid of pen and paper.)
“processing the input features and the pre-processed input features using a plurality of expert models to generate expert outputs;” (Mental process - processing the features and processed features to generate outputs can be practically performed mentally or with the aid of pen and paper.)
“mixing the input features and the expert outputs to generate mixed expert outputs;” (mental process – mixing features and outputs to generate mixed outputs can be practically performed mentally or with the aid of pen and paper.)
“processing the mixed expert outputs using a first model of the hierarchical model to generate intermediate outputs;” (mental process – processing the mixed outputs to generate intermediate outputs can be practically performed mentally or with the aid of pen and paper.)
“processing the mixed expert outputs and the intermediate outputs using a second model of the hierarchical model to generate a final output.” (mental process – processing mixed outputs and intermediate outputs to generate a final output can be practically performed mentally or with the aid of pen and paper.)
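For illustration only, and forming no part of the rejection, the sequence of limitations recited above can be sketched as a simple data flow in Python. All functions, models (here, hypothetical linear/tanh transforms), and array shapes are assumptions introduced solely to depict the recited steps:

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(x):
    # "processing input features to generate pre-processed input features"
    return (x - x.mean()) / (x.std() + 1e-8)

def expert(w, x):
    # each expert model maps features to an expert output
    return np.tanh(w @ x)

x = rng.normal(size=4)                                   # input features
xp = preprocess(x)                                       # pre-processed input features
expert_ws = [rng.normal(size=(3, 8)) for _ in range(2)]  # two hypothetical experts
outs = [expert(w, np.concatenate([x, xp])) for w in expert_ws]  # expert outputs
mixed = np.concatenate([x] + outs)                       # mixed expert outputs

w1 = rng.normal(size=(5, 10))
intermediate = np.tanh(w1 @ mixed)                       # first model -> intermediate outputs
w2 = rng.normal(size=(1, 15))
final = w2 @ np.concatenate([mixed, intermediate])       # second model -> final output
```

As the sketch shows, each recited step is a generic data transformation whose inputs and outputs are unconstrained by the claim language.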
Step 2A: Prong 2 analysis:
These judicial exceptions are not integrated into a practical application. In particular, the recitation of “instructions… executed by one or more processors” in claim 13 is a mere instruction to implement an abstract idea or other exception on a computer. Accordingly, this additional element does not integrate the abstract idea into a practical application because the claim amounts to nothing more than an instruction to apply the abstract idea using a generic computer. Also, the recitations of “using a plurality of expert models” and “using… hierarchical model” in claims 1 and 13 merely indicate a field of use or technological environment in which the judicial exception is performed. Although the additional elements “using a plurality of expert models” and “using… hierarchical model” limit the identified judicial exceptions, this type of limitation merely confines the use of the abstract idea to a particular technological environment (machine learning) and thus fails to add an inventive concept to the claims. See MPEP 2106.05(h).
The claim is directed to an abstract idea.
Step 2B analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of instructions executed by one or more processors and of using hierarchical and expert models are mere instructions to apply an exception using a generic computer component in a particular technological environment, which do not provide an inventive concept.
Dependent claims 2-8, 11-12, 14-17 and 19, when analyzed as a whole, are held to be patent ineligible under 35 U.S.C. 101 because the additional recited limitations fail to establish that the claims are not directed to an abstract idea, as they recite further embellishment of the judicial exception.
Claim 2 (Similarly in claim 14):
“wherein the plurality of expert models include first expert models for processing input features for the first model, second expert models for processing input features for the second model, and third expert models for processing input features for both the first model and the second model.” (Further reciting embellishment of the judicial exception.) Claims 2 and 14 also fail Step 2A Prong 2: the additional elements of using hierarchical and expert models merely confine the use of the abstract idea to a particular technological environment (neural networks) and thus fail to add an inventive concept to the claims. See MPEP 2106.05(h). Thus, the claims are directed to the judicial exception, as it has not been integrated into a practical application, and fail Step 2B, as the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using hierarchical and expert models are mere instructions to apply an exception in a particular technological environment, which do not provide an inventive concept. Therefore, claims 2 and 14 do not recite patent-eligible subject matter under 35 U.S.C. § 101.
Claim 3 (Similarly in claim 15):
“combining the input features and outputs from the first expert models to generate first inputs for the first model;” (Further reciting abstract idea of mental process – combining features and outputs to generate input data can be practically performed mentally or with aid of pen and paper.)
“combining the input features and outputs from the second expert models to generate second inputs for the second model;” (Further reciting abstract idea of mental process – combining features and outputs to generate input data can be practically performed mentally or with aid of pen and paper.)
“combining the input features and outputs from the third expert models to generate third inputs for the first model and fourth inputs for the second model.” (Further reciting abstract idea of mental process – combining features and outputs to generate input data can be practically performed mentally or with the aid of pen and paper.) Claims 3 and 15 also fail Step 2A Prong 2: the additional elements of using hierarchical and expert models merely confine the use of the abstract idea to a particular technological environment (neural networks) and thus fail to add an inventive concept to the claims. See MPEP 2106.05(h). Thus, the claims are directed to the judicial exception, as it has not been integrated into a practical application, and fail Step 2B, as the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using hierarchical and expert models are mere instructions to apply an exception in a particular technological environment, which do not provide an inventive concept. Therefore, claims 3 and 15 do not recite patent-eligible subject matter under 35 U.S.C. § 101.
Claim 4 (Similarly in claim 16):
“combining the input features and outputs from the first expert models comprises generating gating weights.” – Based on the Specification paragraph [0116], the broadest reasonable interpretation of “gating weights” is a number ranging between 0 and 1. (Thus, further reciting the abstract idea of a mental process – combining features and outputs to generate weights can be practically performed mentally or with the aid of pen and paper.) Claims 4 and 16 do not recite any other additional elements, and for the same reasons as above with regard to integration into a practical application and whether additional elements amount to significantly more, claims 4 and 16 also fail Step 2A Prong 2 (the claims are directed to the judicial exception, as it has not been integrated into a practical application) and fail Step 2B (as not amounting to significantly more). Therefore, claims 4 and 16 do not recite patent-eligible subject matter under 35 U.S.C. § 101.
Claim 5:
“generating the gating weights comprises generating attention scores for the outputs from the first expert models.” – Based on the Specification paragraph [0116], the broadest reasonable interpretation of “gating weights” is a number ranging between 0 and 1. (Further reciting the abstract idea of a mental process – generating scores for outputs can be practically performed mentally or with the aid of pen and paper.) Claim 5 does not recite any other additional elements, and for the same reasons as above with regard to integration into a practical application and whether additional elements amount to significantly more, claim 5 also fails Step 2A Prong 2 (the claim is directed to the judicial exception, as it has not been integrated into a practical application) and fails Step 2B (as not amounting to significantly more). Therefore, claim 5 does not recite patent-eligible subject matter under 35 U.S.C. § 101.
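For illustration only, and forming no part of the rejection, under the interpretation applied above gating weights are simply numbers between 0 and 1 derived from per-expert scores and used to combine expert outputs. A minimal Python sketch, with all values and names hypothetical:

```python
import numpy as np

def softmax(scores):
    # normalizes scores so each resulting weight lies between 0 and 1
    e = np.exp(scores - scores.max())
    return e / e.sum()

expert_outputs = np.array([[1.0, 2.0],
                           [3.0, 4.0],
                           [5.0, 6.0]])       # outputs of three hypothetical experts
attention_scores = np.array([0.1, 0.5, 0.2])  # one score per expert output
gating_weights = softmax(attention_scores)    # gating weights, each in (0, 1)
combined = gating_weights @ expert_outputs    # weighted combination of expert outputs
```

Each gating weight is merely a normalized number, consistent with the broadest reasonable interpretation noted above.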
Claim 6 (Similarly in claim 17):
“wherein processing the input features to generate the pre-processed input features comprises extracting features for rows or groups of content that are most likely to be of interest to a user.” (Insignificant extra-solution activity, MPEP 2106.05(g) – Selecting a particular data source or type of data to be manipulated; WURC in accordance with MPEP 2106.05(d)(II) – Storing and retrieving information in memory.) Claims 6 and 17 do not recite any other additional elements, and for the same reasons as above with regard to integration into a practical application and whether additional elements amount to significantly more, claims 6 and 17 also fail Step 2A Prong 2 (the claims are directed to the judicial exception, as it has not been integrated into a practical application) and fail Step 2B (as not amounting to significantly more). Therefore, claims 6 and 17 do not recite patent-eligible subject matter under 35 U.S.C. § 101.
Claim 7 (Similarly in claim 17):
“wherein processing the input features to generate the pre-processed input features comprises extracting features based on preferences of a user.” (Insignificant extra-solution activity, MPEP 2106.05(g) – Selecting a particular data source or type of data to be manipulated; WURC in accordance with MPEP 2106.05(d)(II) – Storing and retrieving information in memory.) Claims 7 and 17 do not recite any other additional elements, and for the same reasons as above with regard to integration into a practical application and whether additional elements amount to significantly more, claims 7 and 17 also fail Step 2A Prong 2 (the claims are directed to the judicial exception, as it has not been integrated into a practical application) and fail Step 2B (as not amounting to significantly more). Therefore, claims 7 and 17 do not recite patent-eligible subject matter under 35 U.S.C. § 101.
Claim 8:
“wherein the input features include one or more of row features, page features, video features, or user features.” (Insignificant extra-solution activity, MPEP 2106.05(g) – Selecting a particular data source or type of data to be manipulated; WURC in accordance with MPEP 2106.05(d)(II) – Storing and retrieving information in memory.) Claim 8 does not recite any other additional elements, and for the same reasons as above with regard to integration into a practical application and whether additional elements amount to significantly more, claim 8 also fails Step 2A Prong 2 (the claim is directed to the judicial exception, as it has not been integrated into a practical application) and fails Step 2B (as not amounting to significantly more). Therefore, claim 8 does not recite patent-eligible subject matter under 35 U.S.C. § 101.
Claim 11 (Similarly in claim 19):
“the first model ranks entities within groups of entities; and” (Further reciting abstract idea of mental process – ranking can be performed mentally or with the aid of pen and paper.)
“the second model recommends groups of entities to display to a user.” (Further reciting abstract idea of mental process – recommending groups can be performed mentally or with the aid of pen and paper.) Claims 11 and 19 do not recite any other additional elements, and for the same reasons as above with regard to integration into a practical application and whether additional elements amount to significantly more, claims 11 and 19 also fail Step 2A Prong 2 (the claims are directed to the judicial exception, as it has not been integrated into a practical application) and fail Step 2B (as not amounting to significantly more). Therefore, claims 11 and 19 do not recite patent-eligible subject matter under 35 U.S.C. § 101.
Claim 12:
“wherein the entities correspond to media content items.” (Insignificant extra-solution activity, MPEP 2106.05(g) – Selecting a particular data source or type of data to be manipulated; WURC in accordance with MPEP 2106.05(d)(II) – Storing and retrieving information in memory.) Claim 12 does not recite any other additional elements, and for the same reasons as above with regard to integration into a practical application and whether additional elements amount to significantly more, claim 12 also fails Step 2A Prong 2 (the claim is directed to the judicial exception, as it has not been integrated into a practical application) and fails Step 2B (as not amounting to significantly more). Therefore, claim 12 does not recite patent-eligible subject matter under 35 U.S.C. § 101.
The claims are not patent eligible.
Viewed as a whole, these additional claim element(s) do not provide meaningful limitation(s) to transform the abstract idea into a patent eligible application of the abstract idea such that the claim(s) amounts to significantly more than the abstract idea itself. Therefore, the claim(s) are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6-8, and 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over Pardeshi (US Patent Application Publication No. US 20220012858 A1 – hereinafter Reference1), in view of Li (US Patent Application Publication No. US 20230385686 A1).
Regarding claim 1, Reference1 teaches a computer-implemented method for generating recommendations using a hierarchical model, the method comprising: processing input features to generate pre-processed input features; (See Reference1 [0050-0053] “an image 302 can be provided as input, where that image includes one or more objects to be transformed through such a time-lapse generation process… features [e.g. input features] for these objects can be transformed [Thus, processing input features] into one or more feature vectors that adhere to a specified schema [e.g. to generate processed input features]… In at least one embodiment, one or more feature vectors generated by CNN 308 for one or more detected classes of objects are provided as input to at least one autoencoder, such as a variational autoencoder (VAE) 312… autoencoder selection can utilize a hierarchical mixture of experts (MoE)-based approach [e.g. using a hierarchical model]”)
processing the input features and the pre-processed input features using a plurality of expert models to generate expert outputs; (See Reference1 [0052-0055] “In at least one embodiment, one or more feature vectors generated by CNN 308 for one or more detected classes of objects are provided as input [Thus, processing the input features and the pre-processed input features] to at least one autoencoder [e.g. using a plurality of expert models], such as a variational autoencoder (VAE) 312… a gating network 310 can be use a mixture-of-experts (MoE) approach to select a VAE 312 for each class of object to be transformed… autoencoder selection can utilize a hierarchical mixture of experts (MoE)-based approach… an encoder that is trained for a class of object (specifically or as a set of classes) can be utilized, where that encoder can encode features [e.g. preprocessing features] for that class of object into a latent space…. these VAEs can be considered experts for different objects classes in a mixture of experts-based approach… this can include having multiple VAEs encode and recreate portions of these images and select one or more VAEs that produce a best result, or most accurate recreation [Thus, using a plurality of expert models to generate expert outputs]… a first expert model might specialize in objects of a first class, such as vehicles, while a second expert model might specialize in objects of a second class, such as dogs or animals.”)
Examiner notes that in a Mixture of Experts model, each "expert" is a model. (For more information, see https://www.datacamp.com/blog/mixture-of-experts-moe.)
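For further illustration only (hypothetical code, not part of Pardeshi's disclosure), in a mixture of experts a gating network scores the expert models for a given input and either selects one expert or weights their outputs:

```python
import numpy as np

rng = np.random.default_rng(1)

class Expert:
    """Each "expert" in a mixture of experts is itself a model (here, linear)."""
    def __init__(self, dim):
        self.w = rng.normal(size=dim)
    def __call__(self, x):
        return float(self.w @ x)

def gating_network(wg, x):
    # scores each expert for the given input; softmax yields weights in (0, 1)
    s = wg @ x
    e = np.exp(s - s.max())
    return e / e.sum()

x = rng.normal(size=4)                  # feature vector (e.g. produced upstream)
experts = [Expert(4) for _ in range(3)]
wg = rng.normal(size=(3, 4))
weights = gating_network(wg, x)
best = int(np.argmax(weights))          # hard selection of one expert per input
output = sum(w * ex(x) for w, ex in zip(weights, experts))  # soft mixture
```

This mirrors, at a schematic level, Pardeshi's use of a gating network to select a VAE (expert) for each class of object.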
mixing the input features and the expert outputs to generate mixed expert outputs; (See Reference1 [0053, 0062] “autoencoder selection can utilize a hierarchical mixture of experts (MoE)-based approach… In at least one embodiment, a single set of VAEs may be used for sampling and for processing [e.g. expert outputs], while in at least one embodiment there can be a first set of VAEs for sampling [e.g. input features] and a second set of VAEs for processing, where those sets may be equivalent but used at different stages in a processing pipeline, or may be configured for high performance for sampling and high accuracy for latent space encoding… this latent space can be provided as input to a generative model, such as a generative adversarial network (GAN), to serve as a constraint for generation of an output image 320 [e.g. generate mixed expert output]… there may be at least two sets of encoders that analyze different aspects [e.g. input features] of an input image… a generator can then sample from outputs from both classes of autoencoders [e.g. expert outputs], in order to consider objects in an image as well as context for an overall scene represented in that image.”)
Reference1 does not explicitly disclose processing the mixed expert outputs using a first model of the hierarchical model to generate intermediate outputs;
However, Li teaches mixing the input features and the expert outputs to generate mixed expert outputs; processing the mixed expert outputs using a first model of the hierarchical model to generate intermediate outputs in more details. (See Li [0035, 0068] “The multiple experts being combined using the scheme as disclosed herein may include homogeneous and/or heterogeneous experts… the experts being combined may include conventional experts and/or augmented experts created based on some given existing experts using the augmentation scheme as disclosed herein… To further augment the model capacity and obtain a deep representation of the data… the neural multi-mixture of experts' architecture may be adopted that learns an ensemble of individual experts in an end-to-end fashion.” See also Li [0042] "when an input, e.g., a feature vector, is received by the expert hierarchy, the input [e.g. input features] is sent to all experts in the hierarchy [e.g. hierarchical model] and each expert may then act on the input and generate its respective output [e.g. generate expert outputs].” See also Li [0073] “experts [e.g. expert models] in the expert hierarchy… take input training data (e.g., feature vectors) as input and generate their respective expert outputs (some experts need to generate their outputs based on expert outputs from experts from lower levels of the expert hierarchy) [Thus, generate mixed expert outputs]. These expert outputs are then fed to the HEI model learning engine 410 as inputs. To learn the values of the learnable parameters 440, in each iteration, the HEI model learning engine 410 takes expert outputs from experts in the expert hierarchy” See also Li [0042] “Some of the expert outputs are further sent to augmented experts [e.g. first model of the hierarchical model] at a higher level as additional input to these augmented experts so that augmented experts in the hierarchy also generate their respective outputs [e.g. 
generate intermediate outputs] based on outputs [e.g. mixed expert outputs] of other experts [Thus, processing the mixed expert outputs using a first model of the hierarchical model to generate intermediate outputs].” See also Li Fig. 3B, [0049] “all expert outputs from the initial expert layer are provided to all augmented experts as input in order for developing augmented experts”
[Li, Fig. 3B (media_image1.png)]
Thus, the augmented expert (e.g. first model) obtains mixed expert outputs from the initial experts and generates an output [e.g. intermediate outputs].)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Reference1, which uses a hierarchical mixture of experts (MoE)-based approach for processing input features and producing outputs in which outputs of different classes of experts are considered, to incorporate the teachings of Li of using an expert hierarchy in a neural multi-mixture of experts architecture to combine and augment heterogeneous experts in a nonlinear manner via learning.
One would be motivated to do so to optimize learning and obtain a deep representation of the data [Li 0055].
Reference1 further in view of Li [hereinafter Reference1-Li] additionally discloses processing the mixed expert outputs and the intermediate outputs using a second model of the hierarchical model to generate a final output. (See Li [0042] “augmented experts in the hierarchy also generate their respective outputs based on outputs of other experts [Thus, processing the mixed expert outputs]. These multiple expert outputs are further combined to generate an integrated expert output [e.g. processing the mixed expert outputs and the intermediate outputs using a second model of the hierarchical model to generate a final output] as the integrated expert prediction of the expert hierarchy.” See also Li [0072] “FIG. 4A depicts an exemplary high level system framework 400 for learning a nonlinear model 420 for integrating experts' outputs using a neural network, in accordance with an exemplary embodiment of the present teaching. In this framework 400, a heterogeneous expert integration (HEI) model 420 [e.g. using a second model of the hierarchical model]… configured to operate via a set of learnable model parameters 440… is capable of combining, in a nonlinear manner, outputs from individual experts to generate integrated expert decisions [Thus, to generate a final output]”)
Regarding claim 2, Reference1-Li teaches all limitations and motivations of claim 1, wherein the plurality of expert models include first expert models for processing input features for the first model, second expert models for processing input features for the second model, and third expert models for processing input features for both the first model and the second model. (See Li [0036-0037] “The expert layers include an initial expert layer 210 and one or more layers 220 of augmented experts. The initial expert layer 210 may include a plurality of experts [e.g. plurality of expert models], including initial expert 1 210-1, initial expert 2 210-2, . . . , initial expert k 210-3 [e.g. first, second, third expert models]… For instance, augmented expert 1 220-1, augmented expert 2 220-2, . . . , augmented expert j 220-j may be at the first augmented expert layer 1 and are created via learning from both training data and by leveraging the expertise of previously trained experts at the initial expert layer 210.” See also Li [0073] “experts in the expert hierarchy… take input training data (e.g., feature vectors) as input and generate their respective expert outputs (some experts need to generate their outputs [Thus, processing input features] based on expert outputs from experts from lower levels of the expert hierarchy).” See Li [0045] “the initial experts 310-1 and 310-2 are trained first using, e.g., training data set T1… Once the initial experts [e.g. first, second, third expert models] are trained [Thus, processing input features], they may be used to create augmented experts [e.g. first model (Thus, for the first model)].” See also Li Fig. 3B, [0047] “once the training of augmented experts 21 320-1 and 22 320-2 are completed, the nonlinear integration modeling unit 240 may be trained based on training data T3. 
The training data in T3 is sent to all experts in the expert hierarchy, i.e., initial experts 11 310-1 and initial expert 12 310-2 as well as augmented expert 21 320-1 and augmented expert 22 320-2. These trained experts, reacting to the training data in T3 [Thus, processing input features], generate their respective expert outputs… All these expert outputs are then all sent to the nonlinear integration modeling unit 240 so that the nonlinear expert integration model 260 [e.g. second model (Thus, for the second model)] can be trained”)
[Image: media_image1.png (Li Fig. 3B, 570 × 663, greyscale)]
Regarding claim 3, Reference1-Li teaches all limitations and motivations of claim 2, wherein mixing the input features and the expert outputs comprises: combining the input features and outputs from the first expert models to generate first inputs for the first model; (See Li [0073] “experts in the expert hierarchy… take input training data (e.g., feature vectors) as input and generate their respective expert outputs (some experts need to generate their outputs based on expert outputs from experts from lower levels of the expert hierarchy).” See Li [0045-0046] “the initial experts 310-1 and 310-2 are trained first using, e.g., training data set T1… Once the initial experts [e.g. first expert models] are trained [Thus, processing input features], they may be used to create augmented experts” See also Li [0040] “The configured models for the augmented experts [e.g. first model] are trained, at 225, based on training data [e.g. input features] and the outputs of the initial experts [e.g. combining the input features and outputs from the first expert models to generate first inputs for the first model]… the training data in T2 are also provided to the initial experts 310-1 and 310-2 so that they produce expert outputs o11 and o12, both of which are fed to the augmented experts 320-1 and 320-2 as input [e.g. first inputs].”)
combining the input features and outputs from the second expert models to generate second inputs for the second model; and (See Li [0048] “The nonlinear integration modeling unit 240 is provided for training the nonlinear expert integration model 260 [e.g. second model]… takes input data [e.g. second inputs] (including training data T3 [e.g. input features] as well as expert outputs [e.g. from second expert models] o11, o12, o21, and o22 generated based on the same training data T3) and learns various parameters that define the nonlinear expert integration model 260 [Thus, combining the input features and outputs from the second expert models to generate second inputs for the second model]”)
combining the input features and outputs from the third expert models to generate third inputs for the first model and fourth inputs for the second model. (See Li [0046, 0048] “the training data in T2 are also provided to the initial experts [e.g. third expert models] 310-1 and 310-2 so that they produce expert outputs o11 and o12, both of which [Thus, combining the input features and outputs] are fed to the augmented experts [e.g. for the first model] 320-1 and 320-2 as input [e.g. third input]. That is, the training of augmented experts 320-1 and 320-2 are based on both input from the training data T2 [e.g. input features] as well as the input expert outputs from the initial experts… The nonlinear integration modeling unit 240 is provided for training the nonlinear expert integration model 260 [e.g. second model]… takes input data [e.g. fourth inputs] (including training data T3 [e.g. input features] as well as expert outputs [e.g. from third expert models] o11, o12, o21, and o22 generated based on the same training data T3) and learns various parameters that define the nonlinear expert integration model 260 [Thus, combining the input features and outputs from the third expert models to generate fourth inputs for the second model]” See also Li Fig. 3B disclosing that training data [e.g. input features] and output [e.g. o11, o12, o13, etc.] from the initial experts [e.g. third expert models] are sent as input [e.g. third inputs] to the augmented experts [e.g. first model] and sent as input [e.g. fourth inputs] to the nonlinear expert integration model [e.g. second model]”)
[Image: media_image1.png (Li Fig. 3B, 570 × 663, greyscale)]
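For illustration only, the combining operation mapped above (training data and expert outputs both supplied as input to a downstream model, per Li [0040], [0046], [0048]) may be sketched as follows. The code and its names (e.g. `combine`) are the Examiner's hypothetical illustration, not the cited reference's implementation:

```python
# Hypothetical sketch: "combining the input features and outputs from the
# expert models" modeled as concatenation of the raw input features with
# each expert's output vector, forming the input for a downstream model.

def combine(input_features, expert_outputs):
    """Concatenate input features with outputs from a set of experts to
    form the input vector for a downstream (augmented/integration) model."""
    combined = list(input_features)
    for output in expert_outputs:
        combined.extend(output)
    return combined

# Example: two input features mixed with outputs o11 and o12 from two
# initial experts, producing the first inputs for the first model.
features = [0.5, -1.2]
o11, o12 = [0.9], [0.1]
first_inputs = combine(features, [o11, o12])  # -> [0.5, -1.2, 0.9, 0.1]
```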
Regarding claim 4, Reference1-Li teaches all limitations and motivations of claim 3, wherein combining the input features and outputs from the first expert models comprises generating gating weights. (See Li [0055, 0068-0069] “The framework as disclosed herein, can flexibly incorporate adaptive expert combination modules and deep representation learning module from original input to augmenting the heterogeneous experts [e.g. combining the input features and outputs from the first expert models]… To further augment the model capacity and obtain a deep representation of the data, a complementary expert module hcomp may be incorporated with hypothesis space Hcomp that allows flexible modulation of information flow while respecting the simplicity of network design. To that end, the neural multi-mixture of experts' architecture may be adopted that learns an ensemble of individual experts [e.g. from first expert models] in an end-to-end fashion… All experts in the inner expert neural submodules are called InnerExpertt, 1≤t≤E. There may also be certain gating network Gatei, that projects an input from the original data representation… The prediction of the final complementary expert maps a concept vector representation… which can be expressed as:
[Equation image: media_image2.png (greyscale)]
Here, the intermediate representation v_s may correspond to a weighted sum by a shallow network
[Equation image: media_image3.png (greyscale)]
[Thus, generating gating weights] after normalizing into unit simplex via softmax(·)… The final model prediction… may then be produced using an additional layer of weighted sums over all possible experts.”)
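For illustration only, the gating mechanism mapped above (raw gating scores normalized into the unit simplex via softmax(·), then used to form a weighted sum over expert outputs, per Li [0068-0069]) may be sketched as follows. The code and its names (`softmax`, `gate_and_mix`) are hypothetical, not Li's implementation:

```python
import math

# Hypothetical sketch: gating weights generated by normalizing gating-network
# scores with softmax, then combining expert outputs as a weighted sum.

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gate_and_mix(gating_scores, expert_outputs):
    """Weight each expert's (scalar) output by its softmax gating weight."""
    weights = softmax(gating_scores)
    return sum(w * o for w, o in zip(weights, expert_outputs))

# Equal gating scores give equal weights, so the mix is the plain average.
mixed = gate_and_mix([0.0, 0.0], [1.0, 3.0])  # -> 2.0
```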
Regarding claim 6, Reference1-Li teaches all limitations and motivations of claim 1, wherein processing the input features to generate the pre-processed input features comprises extracting features for rows or groups of content that are most likely to be of interest to a user. (See Reference1 [0050-0052] “an image 302 can be provided as input, where that image includes one or more objects to be transformed through such a time-lapse generation process… a user may specify a point, period, or era in time [e.g. based on specified user interest] through an interface or application, which can generate this time vector 304 for input… input image 302 can be provided to at least one classifying network, such as a convolutional neural network 308, that is trained to detect one or more objects in a scene represented in this image. In at least one embodiment, this CNN 308 can also attempt to classify an era, period, or point in time for each of these objects… features for these objects can be transformed into one or more feature vectors that adhere to a specified schema [Thus, wherein processing the input features to generate the pre-processed input features comprises extracting features based on specified user interest]” See also Reference1 [0066] “a generative model can also be trained to link or correlate various objects… a single autoencoder or expert can be used to learn these object groupings or associations… a k-means (or other clustering) approach can be utilized to identify objects that may be related [e.g. groups of content that are most likely to be of interest to a user]… this grouping may be encoded into latent space [Thus, extracting features for groups of content that are most likely to be of interest to a user] for consideration by a generator network.”)
Regarding claim 7, Reference1-Li teaches all limitations and motivations of claim 1, wherein processing the input features to generate the pre-processed input features comprises extracting features based on preferences of a user. (See Reference1 [0050-0052] “an image 302 can be provided as input, where that image includes one or more objects to be transformed through such a time-lapse generation process… a user may specify a point, period, or era in time [e.g. preferences of a user] through an interface or application, which can generate this time vector 304 for input… input image 302 can be provided to at least one classifying network, such as a convolutional neural network 308, that is trained to detect one or more objects in a scene represented in this image. In at least one embodiment, this CNN 308 can also attempt to classify an era, period, or point in time for each of these objects… features for these objects can be transformed into one or more feature vectors that adhere to a specified schema [Thus, wherein processing the input features to generate the pre-processed input features comprises extracting features based on preferences of a user]”)
Regarding claim 8, Reference1-Li teaches all limitations and motivations of claim 1, wherein the input features include video features. (See Reference1 [0355] “input data may be representative of one or more images, video [e.g. video features], and/or other data representations generated by one or more imaging devices”)
Regarding claim 13, Reference1-Li teaches all of the elements of claim 1 in method form rather than computer-readable media form. Therefore, the supporting rationale of the rejection to claim 1 applies equally as well to those elements of claim 13.
Regarding claim 14, Reference1-Li teaches all of the elements of claim 2 in method form rather than computer-readable media form. Therefore, the supporting rationale of the rejection to claim 2 applies equally as well to those elements of claim 14.
Regarding claim 15, Reference1-Li teaches all of the elements of claim 3 in method form rather than computer-readable media form. Therefore, the supporting rationale of the rejection to claim 3 applies equally as well to those elements of claim 15.
Regarding claim 16, Reference1-Li teaches all of the elements of claim 4 in method form rather than computer-readable media form. Therefore, the supporting rationale of the rejection to claim 4 applies equally as well to those elements of claim 16.
Regarding claim 17, Reference1-Li teaches all of the elements of claims 6-7 in method form rather than computer-readable media form. Therefore, the supporting rationale of the rejection to claims 6-7 applies equally as well to those elements of claim 17.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Reference1-Li, in view of Zhao (US Patent Application Publication No. US 20230385686 A1).
Regarding claim 5, Reference1-Li teaches all limitations and motivations of claim 4.
Reference1-Li does not explicitly disclose generating attention scores.
However, Zhao teaches wherein generating the gating weights comprises generating attention scores for the outputs from the first expert models in more detail. (See Zhao [0041-0045, 0051] “The neural network 102 includes a gating subsystem 110 that is configured to select, based on respective weights computed [Thus, generating gating weights] for each of one or more of the plurality of the expert neural networks… the gating subsystem 110 combines the expert outputs generated by the selected expert neural networks [e.g. first expert models] by weighting the expert output generated by each of the selected expert neural networks… the gating subsystem 110 applies a softmax function to (i) a first set of gating parameters having first learned values and (ii) the MoE subnetwork input to generate a respective softmax score [e.g. generating attention scores] for each of one or more of the plurality of expert neural networks.”)
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Reference1-Li, which generates gating weights via a gating network that projects an input from the original data representation, to incorporate the teachings of Zhao of applying a softmax function to a set of gating parameters having values learned through training to generate a respective softmax score and weights for each of one or more of the plurality of expert neural networks.
One would be motivated to do so to improve probabilistic interpretation of data [Zhao 0016].
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Reference1-Li, in view of Wu (US Patent Application Publication No. US 20220188517 A1).
Regarding claim 10, Reference1-Li teaches all limitations and motivations of claim 1.
Reference1-Li does not explicitly disclose using positive training features included in a log and a subset of negative training features included in the log.
However, Wu teaches wherein the hierarchical model is trained using positive training features included in a log and a subset of negative training features included in the log. (See Wu [0023, 0026] “ FIG. 2 illustrates a hierarchical machine learning system 200 [e.g. hierarchical model] including components to collaborate between real-time AI engines 101 and master AI engine 102 according to an implementation of the disclosure… each real-time AI engine 101 may capture operator's expert knowledge using a unique machine learning algorithm.” See also Wu [0036-0043] “processing device 2 may execute a real-time AI engine 300… responsive to receiving a document [e.g. log] for training purpose, identify tokens from a DOM tree associated with the document 302… processing device 2 may determine non-tagged N tokens surrounding a labeled strong positive token (e.g., N nearest neighbors, where N is greater than one) and label these N tokens as weak negative tokens because the operator fails to select them in the labeling process. Thus, a weak negative token means a potential negative token determined implicitly based on the inaction of the operator. In one implementation, tokens that are identical to the positive tokens but unlabeled may also be treated as weak negative tokens. The training process may be carried out progressively in one or more iterations. In each iteration, the intermediate machine learning model [e.g. hierarchical model] generated during the prior iteration may be used to label tokens in the documents…. the strong (positive and negative) [e.g. using positive training features included in a log] and weak (positive and negative [e.g. a subset of negative training features included in the log]) may be associated with different weightings in the training process. Thus, the different levels of tokens may impact the training process differently.”)
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Reference1-Li, in which a model generates media content items for presentation to a user [Reference1 0063], to incorporate the teachings of Wu of training a hierarchical machine learning model using strongly-labeled positive tokens from a training document together with weak negative tokens determined implicitly from the operator's inaction, with the different levels of tokens weighted differently in the training process.
One would be motivated to do so to reduce the manual labeling burden on the operator by deriving negative training examples implicitly [Wu 0036-0043].
Claims 11-12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Reference1-Li, in view of Li (US Patent Application Publication No. US 20220019878 A1 – hereinafter Reference3).
Regarding claim 11, Reference1-Li teaches all limitations and motivations of claim 1.
Reference1-Li does not explicitly disclose the first model ranks entities within groups of entities.
However, Reference3 teaches the first model ranks entities within groups of entities; and
the second model recommends groups of entities to display to a user. (See Reference3 [0002, 0030] “ Deep neural networks have achieved great successes in many domains, such as computer vision, natural language processing, recommender systems, etc. …the present disclosure proposes embodiments to construct a user recommendation (e.g., video recommendation) framework based on SAC with the multiple actions learned with multi-gate mixture of experts” See also [0026-0027] “ranking strategy is directly related to user behavior and thus plays an essential role in keeping users watching videos. The ranking strategy firstly attempts to attract users to click a short video and then, after user finishing watching it, attracts them to click recommended videos [e.g. groups of entities] to continue [Thus, display to a user]. In the ranking stage, a recommender [e.g. second model] has multiple candidates [e.g. entities within groups of entities] retrieved via candidate generation and applies a large-capacity model [e.g. first model] to rank [Thus, the first model ranks entities within groups of entities]. Finally, it recommends the user to the top one or few videos [e.g. groups of entities] to select [Thus, recommends groups of entities to display to a user]… users are often presented with slates of multiple items [i.e. groups of entities]”
Examiner notes that based on the Specification paragraph [0149] entities correspond to media content items.)
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Reference1-Li, in which a model generates media content items for presentation to a user [Reference1 0063], to incorporate the teachings of Reference3 of using a framework based on a multi-gate mixture of experts which applies a model to rank multiple candidate videos and uses a recommender to recommend top-ranked videos to the user.
One would be motivated to do so to improve user engagement and satisfaction.
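For illustration only, the two-stage mapping above (a first model ranks entities within groups and a second model recommends top-ranked groups to display to a user, per Reference3 [0026-0027]) may be sketched as follows. The code and its names (`rank_entities`, `recommend_groups`) are hypothetical, not Reference3's implementation:

```python
# Hypothetical sketch: the first model ranks entities within each group;
# the second model recommends the groups whose best entity scores highest.

def rank_entities(group, score):
    """First model: order the entities in one group by a scoring function."""
    return sorted(group, key=score, reverse=True)

def recommend_groups(groups, score, top_n=1):
    """Second model: rank each group internally, then recommend the top_n
    groups ordered by the score of their best-ranked entity."""
    ranked = [rank_entities(g, score) for g in groups]
    ranked.sort(key=lambda g: score(g[0]), reverse=True)
    return ranked[:top_n]

# Example with scalar "entities" scored by their own value.
groups = [[0.2, 0.9, 0.4], [0.7, 0.1]]
best = recommend_groups(groups, score=lambda e: e)  # -> [[0.9, 0.4, 0.2]]
```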
Regarding claim 12, Reference1-Li further in view of Reference3 teaches all of the elements of claim 11 in method form. Therefore, the supporting rationale of the rejection to claim 11 applies equally as well to those elements of claim 12.
Regarding claim 19, Reference1-Li further in view of Reference3 teaches all of the elements of claim 11 in method form rather than computer-readable media form. Therefore, the supporting rationale of the rejection to claim 11 applies equally as well to those elements of claim 19.
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Li (US Patent Application Publication No. US 20230385686 A1), in view of Jordan (non-patent literature: Jordan (1993), “Hierarchical mixtures of experts and the EM algorithm,” Proceedings of 1993 International Joint Conference on Neural Networks, pages 1339-1344).
Regarding claim 20, Li teaches a recommendation system comprising: a hierarchical model comprising a first model and a second model, wherein output from the first model is provided to the second model; a plurality of expert models preprocessing input features for the hierarchical model, the plurality of expert models including first expert models for preprocessing input features for the first model, second expert models for preprocessing input features for the second model, and third expert models for preprocessing input features for both the first model and the second model; (See Li [0036-0037] “The expert layers include an initial expert layer 210 and one or more layers 220 of augmented experts. The initial expert layer 210 may include a plurality of experts [e.g. plurality of expert models], including initial expert 1 210-1, initial expert 2 210-2, . . . , initial expert k 210-3 [e.g. first, second, third expert models]… For instance, augmented expert 1 220-1, augmented expert 2 220-2, . . . , augmented expert j 220-j may be at the first augmented expert layer 1 and are created via learning from both training data and by leveraging the expertise of previously trained experts at the initial expert layer 210.” See also Li [0073] “experts in the expert hierarchy… take input training data (e.g., feature vectors) as input [e.g. preprocessing input features] and generate their respective expert outputs (some experts need to generate their outputs based on expert outputs from experts from lower levels of the expert hierarchy).” See Li [0045] “the initial experts 310-1 and 310-2 are trained first using, e.g., training data set T1… Once the initial experts [e.g. first, second, third expert models] are trained [Thus, preprocessing input features], they may be used to create augmented experts [e.g. first model (Thus, for the first model)].” See also Li Fig. 
3B, [0047] “once the training of augmented experts 21 320-1 and 22 320-2 are completed, the nonlinear integration modeling unit 240 may be trained based on training data T3. The training data in T3 is sent to all experts in the expert hierarchy, i.e., initial experts 11 310-1 and initial expert 12 310-2 [e.g. second, third expert models] as well as augmented expert 21 320-1 and augmented expert 22 320-2. These trained experts, reacting to the training data in T3 [Thus, processing input features], generate their respective expert outputs… All these expert outputs [Thus, output from the first model is provided to the second model] are then all sent to the nonlinear integration modeling unit 240 so that the nonlinear expert integration model 260 [e.g. second model (Thus, for the second model)] can be trained”)
Li teaches the use of a gating network. (See Li [0068] “the neural multi-mixture of experts' architecture may be adopted that learns an ensemble of individual experts [e.g. from first expert models] in an end-to-end fashion… All experts in the inner expert neural submodules are called InnerExpertt, 1≤t≤E. There may also be certain gating network Gatei, that projects an input from the original data representation”)
Li does not explicitly disclose second, third, and fourth gating networks.
However, Jordan teaches a first gating network combining outputs from the first expert models to generate input for the first model; a second gating network combining the input features and outputs from the third expert models to generate input for the first model; a third gating network combining the input features and outputs from the second expert models to generate input for the second model; and a fourth gating network combining outputs from the third expert models to generate input for the second model. (See Jordan pages 1339-1340, 2 HIERARCHICAL MIXTURES OF EXPERTS “We propose to solve nonlinear supervised learning problems by dividing the input space into a nested set of regions and fitting simple surfaces to the data that fall in these regions. The regions have “soft” boundaries, meaning that data points may lie simultaneously in multiple regions. The boundaries between regions are themselves simple parameterized surfaces that are adjusted by the learning algorithm. The hierarchical mixture-of-experts (HME) architecture is shown in Figure 1. The architecture is a tree in which the gating networks sit at the nonterminals of the tree. These networks receive the vector x as input and produce scalar outputs that are a partition of unity at each point in the input space. The expert networks sit at the leaves of the tree. Each expert [e.g. expert models] produces an output vector… for each input vector [e.g. input features]. These output vectors proceed up the tree [e.g. generate input for a first model]
, being multiplied by the gating network outputs and summed at the nonterminals.”
[Image: media_image4.png (Jordan Figure 1, greyscale)]
[Thus, combining the input features and outputs from expert models at the leaves of the tree]
See also Jordan page 1343, 2.5.1 Simulation Results, Fig. 2 “We generated 15,000 data points for training and 5,000 points for testing… The hierarchy was a four-level hierarchy [e.g. first, second models] with 16 expert networks [e.g. first, second, third expert models] and 15 gating networks [e.g. first, second, third, fourth gating networks]. Each expert network had 4 output units and each gating network had 1 output unit [Thus, generating input for models up in the hierarchy].”)
[Image: media_image5.png (Jordan Fig. 2, greyscale)]
Fig. 2 discloses an architecture that includes 16 experts in a four-level hierarchy with 15 gating networks, where each expert [e.g. expert models] produces an output vector for each input vector [e.g. input features] and these output vectors proceed up the tree. Thus, at the second level, each gating network [e.g. first, second, third, fourth gating networks] combines the corresponding expert outputs from the expert models [e.g. first, second, third expert models] to generate input for expert networks at the third level [e.g. first, second models].
Thus, Jordan teaches a first gating network combining outputs from the first expert models to generate input for the first model, a second gating network combining the input features and outputs from the third expert models to generate input for the first model, a third gating network combining the input features and outputs from the second expert models to generate input for the second model, and a fourth gating network combining outputs from the third expert models to generate input for the second model.
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Li, which teaches the use of a gating network that projects an input of the original data representation in a neural multi-mixture of experts architecture, to incorporate the teachings of Jordan of using a hierarchical mixture of experts with a multiple-level hierarchy tree architecture with multiple gating networks at the nonterminal nodes of the tree.
One would be motivated to do so to improve accuracy through specialization by routing inputs to relevant experts at different levels of the hierarchy.
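For illustration only, the hierarchical mixture-of-experts operation mapped above (experts at the leaves, gating networks at the nonterminals, expert outputs multiplied by gating weights and summed going up the tree, per Jordan) may be sketched as a two-level reduction. The code is the Examiner's hypothetical illustration, not the paper's implementation:

```python
import math

# Hypothetical sketch of a two-level hierarchical mixture of experts:
# each branch's gating network mixes its leaf experts, and a top gating
# network mixes the branch outputs.

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    t = sum(exps)
    return [e / t for e in exps]

def hme_output(top_scores, leaf_scores, leaf_outputs):
    """leaf_outputs[i][j] is expert j's scalar output under branch i;
    leaf_scores[i] are that branch's gating scores; top_scores gate the
    branches themselves."""
    branch_outputs = []
    for scores, outputs in zip(leaf_scores, leaf_outputs):
        w = softmax(scores)
        branch_outputs.append(sum(wi * oi for wi, oi in zip(w, outputs)))
    top_w = softmax(top_scores)
    return sum(wi * bi for wi, bi in zip(top_w, branch_outputs))

# Uniform gating everywhere reduces to a plain average of the four experts.
y = hme_output([0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]],
               [[1.0, 2.0], [3.0, 4.0]])  # -> 2.5
```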
Allowable Subject Matter
Claims 9 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base and any intervening claims. After sufficient search and analysis, Examiner concluded that the claimed invention has been recited in such a manner that claims 9 and 18 are not taught by any prior reference found through search. The primary reason for allowance of the claims in this case is the inclusion of the limitations “caching the intermediate outputs to generate cached intermediate outputs; and in response to determining that second input features correspond to the cached intermediate outputs, processing the mixed expert outputs and the cached intermediate outputs using a replica of the second model to generate a second final output,” which are not found in the prior art of record. Overcoming the 35 USC 101 rejections and incorporating claims 9 and 18 into independent claims would put the claims in condition for allowance.
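For illustration only, the allowed caching limitation quoted above may be sketched as follows. The code is hypothetical, uses stand-in models, and illustrates only the recited reuse of cached intermediate outputs, not Applicant's implementation:

```python
# Hypothetical sketch: intermediate outputs are cached keyed by the input
# features; when later input features correspond to a cached entry, the
# cached intermediate outputs are reused by a replica of the second model
# instead of being recomputed by the first model.

cache = {}

def first_model(mixed_expert_outputs):
    # Stand-in for the first model of the hierarchical model.
    return [2 * x for x in mixed_expert_outputs]

def second_model(mixed_expert_outputs, intermediate_outputs):
    # Stand-in for the second model (and its replica).
    return sum(mixed_expert_outputs) + sum(intermediate_outputs)

def final_output(input_features, mixed_expert_outputs):
    key = tuple(input_features)
    if key in cache:                       # second input features match
        intermediate = cache[key]          # reuse cached intermediate outputs
    else:
        intermediate = first_model(mixed_expert_outputs)
        cache[key] = intermediate          # cache the intermediate outputs
    return second_model(mixed_expert_outputs, intermediate)

y1 = final_output([1, 2], [0.5, 0.5])  # computes and caches intermediates
y2 = final_output([1, 2], [0.5, 0.5])  # reuses the cached intermediates
```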
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OSCAR WEHOVZ whose telephone number is (571)272-3362. The examiner can normally be reached 8:00am - 5:00pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, APU M MOFIZ can be reached at (571) 272-4080. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OSCAR WEHOVZ/Examiner, Art Unit 2161
/APU M MOFIZ/Supervisory Patent Examiner, Art Unit 2161