DETAILED ACTION
This action is in response to the claims filed 12/01/2025 for Application number 17/849,506. Claims 1, 5-6, 8-9, 11, 14-16, and 18 have been amended. Thus, claims 1-20 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1,
Step 1 Analysis: Claim 1 is directed to a process, which falls within one of the four statutory categories.
Step 2A Prong 1 Analysis: Claim 1 recites, in part, the limitations of:
determining a set of data, the set of data comprising a plurality of numerical ranges, a plurality of embeddings, and a plurality of attributes, wherein a numerical range of the plurality of numerical ranges is associated with at least one embedding of the plurality of embeddings and at least one attribute of the plurality of attributes can be considered to be an evaluation in the human mind,
sampling the numerical range to obtain a plurality of sample values, wherein a sample value of the plurality of sample values is sampled from the numerical range and is associated with the at least one embedding and the at least one attribute can be considered to be an evaluation in the human mind;
determining, [by the trained neural network prediction model], an output based on a set of input data, the set of input data comprising at least one input embedding and at least one input attribute, wherein the output is a predicted range of values based on an output mean and an output standard deviation can be considered to be an evaluation in the human mind.
These limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind or with the aid of pen and paper, which falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of “training a neural network prediction model by minimizing a loss of an output of the neural network prediction model based on a set of sample value training data…”. This element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Please see MPEP 2106.05(f). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a neural network prediction model to perform the steps of the claimed process amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
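For illustrative context only (not part of the claims, the specification, or the cited art; all values and names below are hypothetical), the recited steps of sampling a numerical range and producing a predicted range from an output mean and output standard deviation can be sketched as:

```python
import random
import statistics

# Hypothetical "set of data": a numerical range paired with an embedding
# and an attribute. These values are invented for illustration only.
numerical_range = (40_000.0, 60_000.0)
embedding = [0.1, 0.3, 0.5]       # stand-in for "at least one embedding"
attribute = "title:engineer"      # stand-in for "at least one attribute"

# Sampling the numerical range to obtain a plurality of sample values.
random.seed(0)
sample_values = [random.uniform(*numerical_range) for _ in range(100)]

# In place of a trained neural network, the output is shown simply as a
# predicted range of values based on an output mean and standard deviation.
output_mean = statistics.mean(sample_values)
output_std = statistics.stdev(sample_values)
predicted_range = (output_mean - output_std, output_mean + output_std)

print(predicted_range)
```

The sketch shows only the arithmetic character of the recited steps; it does not represent any party's actual implementation.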
Regarding claim 2, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein determining, by the trained neural network prediction model, the output further comprises
determining an output uncertainty. This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
and
the method further comprises:
causing the output to be presented on a computing device in response to determining that the output uncertainty exceeds a threshold. This limitation is an insignificant extra-solution activity and thus the judicial exception is not integrated into a practical application.
The claim does not include any additional elements that amount to significantly more than the judicial exception. The limitation of “causing the output to be presented on a computing device in response to determining that the output uncertainty exceeds a threshold” is just a nominal or tangential addition to the claim, and is also well-understood, routine and conventional as evidenced by MPEP §2106.05(d)(II)(I), “receiving or transmitting data over a network”. This limitation therefore remains insignificant extra-solution activity even upon reconsideration, and does not amount to significantly more. Even when considered in combination, this additional element represents an insignificant extra-solution activity which cannot provide an inventive concept. The claim is not patent eligible.
Regarding claim 3, the rejection of claim 2 is further incorporated, and further, the claim recites: wherein determining the output uncertainty further comprises: determining a coefficient of variation based on the output mean and the output standard deviation. This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
and
wherein determining whether the output uncertainty satisfies a threshold uncertainty comprises determining whether the coefficient of variation satisfies the threshold uncertainty. This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.
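For illustrative context only (all numbers below are hypothetical and not drawn from the record), the recited coefficient-of-variation determination and threshold comparison amount to the following arithmetic:

```python
import statistics

# Hypothetical model outputs, invented for illustration.
output_values = [98.0, 102.0, 100.0, 99.0, 101.0]
output_mean = statistics.mean(output_values)
output_std = statistics.stdev(output_values)

# Coefficient of variation: standard deviation relative to the mean.
coefficient_of_variation = output_std / output_mean

# Determining whether the output uncertainty satisfies a threshold uncertainty.
threshold_uncertainty = 0.05
satisfies = coefficient_of_variation <= threshold_uncertainty
print(coefficient_of_variation, satisfies)
```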
Regarding claim 4, the rejection of claim 3 is further incorporated, and further, the claim recites: wherein the neural network prediction model comprises a plurality of ensemble models, an ensemble model of the plurality of ensemble models has a random initialization of weights different from that of other ensemble models of the plurality of ensemble models. This limitation amounts to no more than generally linking the elements to the judicial exception. Please see MPEP 2106.05(h).
determining the output uncertainty further comprises:
calculating a variance of a plurality of outputs of the plurality of ensemble models; This limitation recites a mathematical concept in addition to the judicial exception identified in the rejection of claim 1.
and wherein determining whether the output uncertainty satisfies a threshold uncertainty further comprises determining whether the variance satisfies the threshold uncertainty. This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.
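For illustrative context only (the tiny linear "models" and every value below are hypothetical, not any party's implementation), an ensemble with differing random weight initializations and a variance-based uncertainty can be sketched as:

```python
import random
import statistics

# Each ensemble member gets a different random initialization of weights,
# controlled here by a distinct seed (all names are illustrative).
def make_model(seed):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(3)]
    def model(x):
        return sum(w * xi for w, xi in zip(weights, x))
    return model

ensemble = [make_model(seed) for seed in range(5)]
x = [1.0, 2.0, 3.0]

# Output uncertainty as the variance of the ensemble's outputs.
outputs = [model(x) for model in ensemble]
output_variance = statistics.pvariance(outputs)

# Determining whether the variance satisfies a threshold uncertainty.
threshold_uncertainty = 1.0
satisfies = output_variance <= threshold_uncertainty
print(output_variance, satisfies)
```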
Regarding claim 5, the rejection of claim 2 is further incorporated, and further, the claim recites: wherein the set of data further comprises a plurality of fixed values, and the method further comprises:
generating a set of fixed value training data comprising a plurality of fixed value training vectors, wherein a fixed value training vector of the plurality of fixed value training vectors is based on a fixed value of the plurality of fixed values, not sampled from a numerical range, at least one associated fixed value embedding, and at least one associated fixed value attribute; This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
and
wherein training the neural network prediction model further comprises applying the prediction model to the set of fixed value training data. This limitation amounts to mere instructions to apply the judicial exception using a generic computer component. Please see MPEP 2106.05(f).
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.
Regarding claim 6, the rejection of claim 5 is further incorporated, and further, the claim recites: wherein training the neural network prediction model further comprises:
transforming the set of sample value training data from the plurality of sample value training vectors into a first tensor; This limitation recites a mathematical concept in addition to the judicial exception identified in the rejection of claim 1.
and
transforming the set of fixed value training data from the plurality of fixed value training vectors into a second tensor, wherein the first tensor and the second tensor are of a same order. This limitation recites a mathematical concept in addition to the judicial exception identified in the rejection of claim 1.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.
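For illustrative context only (hypothetical data; an order-2 tensor is represented here simply as a list of equal-length rows), the recited transformation of training vectors into two tensors of the same order can be sketched as:

```python
# Invented example vectors standing in for the two sets of training data.
sample_value_training_vectors = [[1.0, 0.2], [2.0, 0.4], [3.0, 0.6]]
fixed_value_training_vectors = [[5.0, 0.1], [6.0, 0.3]]

def to_tensor(vectors):
    # "Transforming" a set of vectors into a rectangular order-2 tensor.
    width = len(vectors[0])
    assert all(len(v) == width for v in vectors), "rows must be equal length"
    return [list(v) for v in vectors]

def order(tensor):
    # Order (number of axes): depth of nesting along the first element chain.
    depth = 0
    while isinstance(tensor, list):
        depth += 1
        tensor = tensor[0]
    return depth

first_tensor = to_tensor(sample_value_training_vectors)
second_tensor = to_tensor(fixed_value_training_vectors)
print(order(first_tensor), order(second_tensor))  # both are order 2
```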
Regarding claim 7, the rejection of claim 5 is further incorporated, and further, the claim recites: wherein generating the set of fixed value training data further comprises, for the fixed value training vector of the plurality of fixed value training vectors:
determining the fixed value, the at least one associated fixed value embedding, and the at least one associated fixed value attribute, wherein the fixed value, the at least one associated fixed value embedding, and the at least one associated fixed value attribute are encrypted; This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
and
decrypting the encrypted fixed value, the encrypted at least one associated fixed value embedding, and the encrypted at least one associated fixed value attribute. This limitation recites a mathematical concept (transforming data using mathematical calculations) in addition to the judicial exception identified in the rejection of claim 1.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.
Regarding claim 8, the rejection of claim 7 is further incorporated, and further, the claim recites:
training the first expert model by updating parameters of the first expert model to minimize a loss of an output of the first expert model based on a first subset of the set of sample value training data; This limitation amounts to mere instructions to apply the judicial exception using a generic computer component. Please see MPEP 2106.05(f).
training the second expert model by updating parameters of the second expert model to minimize a loss of an output of the second expert model based on a second subset of the set of sample value training data, wherein the first subset and the second subset include a different number of attributes of the plurality of attributes; This limitation amounts to mere instructions to apply the judicial exception using a generic computer component. Please see MPEP 2106.05(f).
determining, by the first expert model, a first expert output based on the set of input data; This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
determining, by the second expert model, a second expert output based on the set of input data; This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
and wherein determining the output further comprises:
determining, by a gating network, a first output probability associated with the first expert output; This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
determining, by the gating network, a second output probability associated with the first expert output; This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
and
determining the output based on the first expert output, the second expert output, the first output probability, and the second output probability. This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.
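For illustrative context only (the expert functions and gating scores below are invented, not any party's models), the recited combination of two expert outputs weighted by gating-network probabilities can be sketched as:

```python
import math

def first_expert(x):   # stand-in "first expert model"
    return 2.0 * x

def second_expert(x):  # stand-in "second expert model"
    return x + 1.0

def gating_network(x):
    # Softmax over two hypothetical gating scores -> output probabilities.
    scores = [0.5 * x, 1.0]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

x = 3.0
first_output = first_expert(x)
second_output = second_expert(x)
p_first, p_second = gating_network(x)

# Output based on both expert outputs and both output probabilities.
output = p_first * first_output + p_second * second_output
print(output)
```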
Regarding claim 9, the rejection of claim 1 is further incorporated, and further, the claim recites:
wherein determining the set of data comprising the plurality of attributes further comprises: Determining the plurality of attributes, wherein the at least one attribute of the plurality of attributes comprises two or more of entity, title, location, industry, and skills. This limitation recites an additional mental step in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.
Regarding claim 10, the rejection of claim 1 is further incorporated, and further, the claim recites:
wherein determining the set of data comprising the plurality of embeddings further comprises:
determining the plurality of embeddings, wherein the embedding of the plurality of embeddings comprises a vector measuring similarity between separate attributes of the plurality of attributes. This limitation recites a mathematical concept in addition to the judicial exception identified in the rejection of claim 1.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.
Claim 11 recites features similar to claim 1 and is rejected for at least the same reasons therein. Claim 11 additionally requires A system comprising: at least one memory device; and a processing device, operatively coupled with the at least one memory device; however, these additional elements amount to mere instructions to apply the judicial exception using a generic computer component. Please see MPEP 2106.05(f).
Regarding Claims 12-20, they recite features similar to claims 2-10 and are rejected for at least the same reasons therein.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 9-13, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chow et al. ("US 20220327373 A1", hereinafter "Chow") in view of Fong et al. ("Accelerated PSO Swarm Search Feature Selection for Data Stream Mining Big Data", hereinafter "Fong") and further in view of Fawaz et al. ("US 20210012267 A1", hereinafter "Fawaz").
Regarding claim 1, Chow teaches A method comprising:
determining a set of data, the set of data comprising a plurality of embeddings, and a plurality of attributes, wherein a numerical range of the plurality of numerical ranges is associated with at least one embedding of the plurality of embeddings and at least one attribute of the plurality of attributes (“The system identifies relationships between the user and the navigational targets based on the embeddings…” [Abstract]; “The machine learning engine selects pairs of users/navigational targets as a data set for training a machine learning model. In FIG. 4A, a first user, Sue, is selected and the machine learning engine obtains the attributes 405 for Sue. The user attributes include “Job Title,” “New Hire?”, “Age,” “Location,” “Work Group,” and “Salary Level.” Additional user attributes are indicated by the ellipses in FIG. 4A” [¶0077; see further below: the prior art of Fawaz discloses “salary range”, thus when combined with Chow would teach the limitation as claimed.]);
a set of sample value training data comprising a plurality of sample value training vectors, wherein a sample value training vector of the plurality of sample value training vectors is based on the sample value, the at least one associated embedding, and the at least one associated attribute (“In one or more embodiments, the user attributes and navigational target attributes include textual and alphanumerical values that must be converted into numerical values to be input into the two neural networks. For example, for some textual attributes the system may perform a hash-vectorizer or count-vectorizer operation to generate numerical values. In order to incorporate the contextual and semantic meaning of textual attributes such as job name, city, or country, a pre-trained word2vec model converts the textual attribute values to numerical vector values.” [¶0023]);
and
determining, by the trained neural network prediction model, an output based on a set of input data, the set of input data comprising at least one input embedding and at least one input attribute (“A set of user attributes, converted to numerical values, is provided to the first neural network 142 as an input data set to generate an output user embedding 135.” [¶0037; See further: “Information describing user navigation 131, user attributes 132, navigational target attributes 133, training data sets 134, user embeddings 135, and navigational target embeddings 136 may be implemented across any of components within the system 100. [¶0036]]),
However, Chow fails to explicitly teach wherein the output is a predicted range of values based on an output mean and an output standard deviation.
Fong teaches wherein the output is a predicted range of values based on an output mean and an output standard deviation. (“CCV is founded on a basic belief that a good attribute in a training dataset should have its data vary sufficiently wide across a range of values, so that it is significant to characterize a useful prediction model. The coefficient of variation (CV) is expressed as a real number from −1 to +1 and it describes the standard deviation of a set of numbers relative to their mean” [pg. 39, left col, ¶2])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chow’s teachings by using the coefficient of variation technique as taught by Fong. One would have been motivated to make this modification to balance the prediction model induction between generalization and overfitting. [pg. 39, left col, ¶2, Fong]
However, Chow/Fong fails to explicitly teach the set of data comprising a plurality of numerical ranges,
sampling the numerical range to obtain a plurality of sample values, wherein a sample value of the plurality of sample values is sampled from the numerical range and is associated with the at least one embedding and the at least one attribute
training a neural network prediction model to minimize a loss of an output of the neural network prediction model based on a set of sample value training data…
Fawaz teaches the set of data comprising a plurality of numerical ranges, (“For example, the features include the candidate's title, function, skills, education, seniority, industry, location, and/or other professional and/or demographic attributes. The features also include job features such as the job's title, industry, function, seniority, desired or required skill and experience, salary range, and/or location.” [¶0068])
sampling the numerical range to obtain a plurality of sample values, wherein a sample value of the plurality of sample values is sampled from the numerical range and is associated with the at least one embedding and the at least one attribute (“Sampling apparatus 204 additionally uses rules 222 to filter candidate-job features 224 and labels 212 in training data for filtering model 208, so that labels 212 match outcomes 226 specified in rules 222. As described above, each rule includes one or more candidate attributes of a candidate, as well as one or more job attributes of a job.”)
training a neural network prediction model to minimize a loss of an output of the neural network prediction model based on a set of sample value training data… (“Model-creation apparatus 210 then uses a training technique and/or one or more hyperparameters to update parameters values of filtering model 208 so that scores 214 outputted by filtering model 208 better reflect labels 212 for the corresponding candidate-job features 224.” [¶0057; note: “to minimize a loss of an output of the neural network prediction model” is merely an intended use or result of the training thus carries little to no patentable weight. The updating of the parameters disclosed by Fawaz would be based on the sampled data.])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chow’s/Fong’s teachings in order to train a neural network model to minimize a loss based on sampled data taught by Fawaz. One would have been motivated to make this modification in order to improve computer systems, applications, user experiences, tools, and/or technologies related to user recommendations, training machine learning models, employment, recruiting, and/or hiring. [¶0018, Fawaz]
Regarding claim 2, Chow/Fong/Fawaz teaches The method of claim 1, where Fong teaches wherein determining, by the trained neural network prediction model, the output further comprises
determining an output uncertainty (“It can be used to compare variability even when the units are not the same. (corresponds to output “uncertainty”) In general CV informs us about the extent of variation relative to the size of the observation, and it has the advantage that the coefficient of variation is independent of the units of observation” [pg. 39, left col, ¶2]), and the method further comprises:
[causing the output to be presented on a computing device] in response to determining that the output uncertainty exceeds a threshold. (“The subsequent step required in CCV after calculating the CV is to find a threshold in order to decide which features and how many features are to be retained” [pg. 39, right col; note: Fong doesn’t display/present the output on a computing device; however, Chow explicitly discloses displaying outputs on a computing device (See ¶0056), therefore the combination of Chow/Fong teaches the limitation as recited.])
Same motivation to combine the teachings of Chow/Fong/Fawaz as in claim 1.
Regarding claim 3, Chow/Fong/Fawaz teaches The method of claim 2, where Fong teaches wherein determining the output uncertainty further comprises:
determining a coefficient of variation based on the output mean and the output standard deviation (“CCV is founded on a basic belief that a good attribute in a training dataset should have its data vary sufficiently wide across a range of values, so that it is significant to characterize a useful prediction model. The coefficient of variation (CV) is expressed as a real number from −1 to +1 and it describes the standard deviation of a set of numbers relative to their mean” [pg. 39, left col, ¶2]); and
wherein determining whether the output uncertainty satisfies a threshold uncertainty comprises determining whether the coefficient of variation satisfies the threshold uncertainty. (“The subsequent step required in CCV after calculating the CV is to find a threshold in order to decide which features and how many features are to be retained” [pg. 39, right col])
Same motivation to combine the teachings of Chow/Fong/Fawaz as in claim 1.
Regarding claim 9, Chow/Fong/Fawaz teaches The method of claim 1, wherein determining the set of data comprising the plurality of attributes further comprises: where Chow teaches Determining the plurality of attributes, wherein the at least one attribute of the plurality of attributes comprises two or more of entity, title, location, industry, and skills. (“One neural network is trained on user attributes, such as a user's job type, time employed, and location” [¶0020])
Regarding claim 10, Chow/Fong/Fawaz teaches The method of claim 1, wherein determining the set of data comprising the plurality of embeddings further comprises:
Chow teaches determining the plurality of embeddings (“The system identifies relationships between the user and the navigational targets based on the embeddings.” [¶0022]), wherein the embedding of the plurality of embeddings comprises a vector measuring similarity between separate attributes of the plurality of attributes. (“In one or more embodiments, the similarity value used for training the neural networks is formed by performing a dot-product or a cosine similarity function on a user attribute embedding and a navigational target embedding.” [¶0026])
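For illustrative context only (the vectors below are invented; cosine similarity is named in the cited portion of Chow, ¶0026, but this sketch is not Chow's implementation), a vector similarity measure between two embeddings amounts to:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical attribute embeddings, invented for illustration.
user_attribute_embedding = [0.2, 0.8, 0.1]
navigational_target_embedding = [0.25, 0.7, 0.05]
similarity = cosine_similarity(user_attribute_embedding,
                               navigational_target_embedding)
print(similarity)  # near 1.0, since the vectors point in similar directions
```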
Regarding claims 11-13 and 19-20, they are substantially similar to claims 1-3 and 9-10, respectively, and are rejected in the same manner, with the same art and reasoning applying.
Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Chow in view of Fong and Fawaz and further in view of Chen et al. ("Checkpoint Ensembles: Ensemble Methods from a Single Training Process", hereinafter "Chen").
Regarding claim 4, Chow/Fong/Fawaz teaches The method of claim 3, where Fong further teaches determining the output uncertainty further comprises:
calculating a variance of a plurality of outputs of the plurality of ensemble models (“As we shall see, there is a tradeoff between bias and variance, with very flexible models (which can possibly overfit) having low bias and high variance, and relatively rigid models (under fit) having high bias and low variance.” [pg. 39, right col]); and
wherein determining whether the output uncertainty satisfies a threshold uncertainty further comprises determining whether the variance satisfies the threshold uncertainty. (“The subsequent step required in CCV after calculating the CV is to find a threshold in order to decide which features and how many features are to be retained… The underlying concept behind this task is the Bias-Variance dilemma.” [pg. 39, right col])
However, Chow/Fong/Fawaz fails to explicitly teach
wherein the neural network prediction model comprises a plurality of ensemble models, an ensemble model of the plurality of ensemble models has a random initialization of weights different from other ensemble models of the plurality of ensemble models
Chen teaches wherein the neural network prediction model comprises a plurality of ensemble models, an ensemble model of the plurality of ensemble models has a random initialization of weights different from other ensemble models of the plurality of ensemble models, (“we use random initialization ensembles (RIE) for comparison as well. For RIE, run models with different random initializations…” [pg. 2, bottom left col – top right col])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chow’s/Fawaz’s/Fong’s teachings to run an ensemble of models with different random initializations as taught by Chen. One would have been motivated to make this modification in order to reduce overfitting through training ensemble of models. [Abstract, Chen]
Regarding claim 14, it is substantially similar to claim 4 and is rejected in the same manner, with the same art and reasoning applying.
Claims 5-7 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Chow in view of Fong and Fawaz and further in view of Chen ("US 20190370496 A1", hereinafter "Chen2").
Regarding claim 5, Chow/Fong/Fawaz teaches The method of claim 2, however fails to explicitly teach wherein the set of data further comprises a plurality of fixed values, and the method further comprises:
generating a set of fixed value training data comprising a plurality of fixed value training vectors, wherein a fixed value training vector of the plurality of fixed value training vectors is based on a fixed value of the plurality of fixed values, not sampled from a numerical range, at least one associated fixed value embedding, and at least one associated fixed value attribute; and
wherein training the neural network prediction model further comprises applying the prediction model to the set of fixed value training data.
Chen2 teaches generating a set of fixed value training data comprising a plurality of fixed value training vectors, wherein a fixed value training vector of the plurality of fixed value training vectors is based on a fixed value of the plurality of fixed values, not sampled from a numerical range (“The confidential data, along with the identification of the user, may be stored in a submission table by the confidential data backend 106 in a confidential information database 108. In some example embodiments, this submission table may be encrypted in order to ensure security of the information in the submission table.” [¶0027; note: In light of the specification, the examiner is interpreting encrypting confidential information (i.e. salary) to correspond to “fixed value training data”]), at least one associated fixed value embedding (¶0062, “Organization embeddings”), and at least one associated fixed value attribute (Abstract, ¶0084, “title, region”); and
wherein training neural network prediction model further comprises applying the prediction model to the set of fixed value training data (“In an example embodiment, the (title, region) component leverages a regression model based prediction approach, while the organization component is modeled via a Bayesian model where an organization is smoothed with peer organization confidential data if there are enough submissions for the peer organizations (regardless of which (title, region) they are from), and smoothed by global information of all submitted confidential data otherwise.” [¶0085]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chow’s/Fong’s/Fawaz’s teachings to encrypt and decrypt confidential data (i.e., using fixed data values) as taught by Chen2. One would have been motivated to make this modification in order to utilize confidential data to provide statistical analysis for large organizations. [¶0002-¶0004, Chen2]
Regarding claim 6, Chow/Fong/Fawaz/Chen2 teaches The method of claim 5, and Chow further teaches wherein training the neural network prediction model further comprises:
transforming the set of sample value training data from the plurality of sample value training vectors into a first tensor (“In an initial iteration of the user attribute input vector, the machine learning engine may assign a default value to the weights as in the entries in the matrix or tensor. For example, the weights may be assigned a random value between 0 and 1.” [¶0066]); and
transforming the set of fixed value training data from the plurality of fixed value training vectors into a second tensor, wherein the first tensor and the second tensor are of a same order. (“In one or more embodiments, the hidden layers of the user attribute neural network and the navigational target neural network comprise sets of neurons, each comprising a matrix or tensor to perform a transform using outputs from neurons of a previous layer.” [¶0066; note: Chow explicitly discloses transforming training data into tensors while Chen2 discloses the fixed value training data, thus when combined would teach the limitation as currently recited.])
The same motivation to combine the teachings of Chow/Fong/Fawaz/Chen2 applies as set forth for claim 5.
Regarding claim 7, Chow/Fong/Fawaz/Chen2 teaches The method of claim 5, and Chen2 further teaches wherein
generating the set of fixed value training data further comprises, for the fixed value training vector of the plurality of fixed value training vectors:
determining the fixed value, the at least one associated fixed value embedding, and the at least one associated fixed value attribute, wherein the fixed value, the at least one associated fixed value embedding, and the at least one associated fixed value attribute are encrypted (“The confidential data, along with the identification of the user, may be stored in a submission table by the confidential data backend 106 in a confidential information database 108. In some example embodiments, this submission table may be encrypted in order to ensure security of the information in the submission table.” [¶0027]); and
decrypting the encrypted fixed value, the encrypted at least one associated fixed value embedding, and the encrypted at least one associated fixed value attribute. (“In an example embodiment, the confidential information is stored in encrypted format in the confidential information database 108 when the databus listener 110 sends it to the backend queue 112. As such, one function of the ETL backend 114 is to decrypt the confidential information.” [¶0039])
The same motivation to combine the teachings of Chow/Fong/Fawaz/Chen2 applies as set forth for claim 5.
Regarding claims 15-17, they are substantially similar to claims 5-7 respectively, and are rejected in the same manner, with the same art and reasoning applying.
Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Chow in view of Fong, Fawaz and Chen2 and further in view of Hu et al. ("Graph Classification by Mixture of Diverse Experts", hereinafter "Hu").
Regarding claim 8, Chow/Fong/Fawaz/Chen2 teaches The method of claim 7, and Chow further teaches wherein the neural network prediction model comprises at least a first expert model and a second expert model (“As described in FIG. 2, once the neural networks are trained, resulting in a pair of machine learning models, the machine learning engine propagates user attributes and navigational target attributes through the models to generate a user embedding for each user and a navigational target embedding for each navigational target.” [¶0074]), and the method further comprises:
training the first expert model by updating parameters of the first expert model to minimize a loss of an output of the first expert model based on a first subset of the set of sample value training data (“The machine learning engine uses the feedback value to update the neural networks. The machine learning engine back propagates the feedback value through the pair of neural networks to adjust the weights in the neural network matrix or tensors by a gradient of the loss function.” [¶0071]);
training the second expert model by updating parameters of the second expert model to minimize a loss of an output of the first expert model based on a second subset of the set of sample value training data (“The machine learning engine uses the feedback value to update the neural networks. The machine learning engine back propagates the feedback value through the pair of neural networks to adjust the weights in the neural network matrix or tensors by a gradient of the loss function.” [¶0071]),
wherein the first subset and the second subset include a different number of attributes of the plurality of attributes (“As described in FIG. 2, once the neural networks are trained, resulting in a pair of machine learning models, the machine learning engine propagates user attributes and navigational target attributes through the models to generate a user embedding for each user and a navigational target embedding for each navigational target.” [¶0074]);
determining, by the first expert model, a first expert output based on the set of input data (“A system propagates sets of user attributes through one neural network and sets of navigational target attributes through another neural network. The neural networks are configured to generate, as outputs, vectors mapped to a same vector space.” [Abstract]);
determining, by the second expert model, a second expert output based on the set of input data (“A system propagates sets of user attributes through one neural network and sets of navigational target attributes through another neural network. The neural networks are configured to generate, as outputs, vectors mapped to a same vector space.” [Abstract]);
However, Chow/Fong/Fawaz/Chen2 fails to explicitly teach wherein determining the output further comprises:
determining, by a gating network, a first output probability associated with the first expert output;
determining, by the gating network, a second output probability associated with the first expert output; and
determining the output based on the first expert output, the second expert output, the first output probability, and the second output probability.
Hu teaches determining, by a gating network, a first output probability associated with the first expert output (“p(y|z, x; Θ) represents output distribution of the z-th expert.” [pg. 4]);
determining, by the gating network, a second output probability associated with the first expert output (“p(y|z, x; Θ) represents output distribution of the z-th expert.” [pg. 4]); and
determining the output based on the first expert output, the second expert output, the first output probability, and the second output probability. (“p(z|x; Θ) is the output of the gating network, indicating the prior probability to assign x to the z-th expert. p(y|z, x; Θ) represents output distribution of the z-th expert.” [pg. 4, left col; note: output distribution would include a first/second… probability associated with their respective outputs.])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chow’s/Fong’s/Fawaz’s/Chen2’s teachings in order to implement the gating network of Hu to determine the output probabilities of the expert models. One would have been motivated to make this modification in order to make the learning process easier for each expert model. [pg. 4, left col, Mixture of diverse experts, Hu]
Regarding claim 18, it is substantially similar to claim 8 and is rejected in the same manner, with the same art and reasoning applying.
Response to Arguments
Applicant's arguments filed 12/01/2025 have been fully considered but they are not persuasive.
Regarding the 35 U.S.C. §101 Rejection:
Applicant appears to assert the claim limitations of “training a neural network prediction model by updating parameters of the neural network prediction model…” and “determining, by the trained neural network prediction model, an output based on a set of input data…” encompass AI in a way that cannot be practically performed in the human mind. While the examiner agrees that training a neural network prediction model cannot be practically performed in the human mind, this limitation is being interpreted as an additional element under Step 2A Prong 2. Additionally, the limitation of “determining, by the trained neural network prediction model, an output based on a set of input data…” can be practically performed in the human mind, as it merely performs a prediction where the output is a predicted range of values based on a mean and a standard deviation. The mere fact that this step is being performed by a neural network model does not mean the step is not practically performable as a mental process. The neural network model is merely used as a tool to perform the abstract idea. Therefore, the examiner asserts the claims do recite an abstract idea.
Applicant appears to assert that training a neural network prediction model to minimize a loss of an output of the neural network prediction model based on a set of sample data would integrate the abstract idea into a practical application because the claim improves the functioning of a computer or a technical field. Examiner respectfully disagrees. Specifically, applicant asserts the improvement is recited in ¶0012 of the instant specification. While ¶0012 does set forth an improvement, it does so in a conclusory manner without any particular details on any specific training steps of the model that would lead to such an improvement. The claim merely states that the model/neural network is being trained on sampled data, which amounts to operating a computer or other machinery in its ordinary capacity and amounts to mere instructions to apply the judicial exception using a generic computer component. Please see MPEP 2106.05(f). Therefore, the claims fail to reflect any improvement over existing computer technology or a technical field.
Applicant further asserts the claims amount to significantly more than the judicial exception because the claims recite an improvement to the technical field of training machine learning models. Examiner respectfully disagrees. As noted above, the claim recites training the machine learning model in a broad and generic manner such that it amounts to merely operating the computer or other machinery in its ordinary capacity and amounts to mere instructions to apply the judicial exception using a generic computer component. Please see MPEP 2106.05(f).
Regarding the 35 U.S.C. §103 Rejection:
Applicant’s arguments that the claimed combination of Chow/Fong, Chen, Chen2, and Hu does not suggest at least “the set of data comprising a plurality of ranges” have been considered but are moot because these limitations and the newly amended limitations are now taught by the newly provided art of Fawaz. Please see the updated 103 rejection above.
Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive, as they rely upon the allowability of the independent claims.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL H HOANG/PRIMARY EXAMINER, Art Unit 2122