Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
The present application is being examined based on the claims filed on March 14, 2022.
Claims 1-20 are pending.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on November 3, 2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claims 2, 4-5, 7, and 19 are objected to because of the following informalities:
In claims 2, 4, and 7, “comprising” should read “further comprising” to clarify the addition of new elements in the dependent claims.
Claim 5 is objected to for inheriting the deficiencies of claim 4.
In claim 5, “the method comprises” should read “the method further comprises”.
In claim 19, “the machine learning architecture includes; one or more” should read “the machine learning architecture includes: one or more”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding Claim 5:
Claim 5 recites “a first machine learning architecture”. Claim 4, from which claim 5 depends, also recites “a first machine learning architecture”. Because both limitations recite “a first machine learning architecture”, it is unclear whether claim 5’s limitation refers back to the architecture of claim 4 or whether Applicant intends a separate first machine learning architecture. For examination purposes, the Examiner interprets the machine learning architecture of claim 5 as referring to that of claim 4. Applicant is advised to amend the limitation to recite “the first machine learning architecture”.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Claims 1-7 are directed to a method [process]. Claims 8-16 are directed to a computing system [machine]. Claims 17-20 are directed to non-transitory computer-readable storage media [machine].
Regarding Claim 1:
Step 2A, Prong 1: The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind or with pen and paper (including an observation, evaluation, judgment, or opinion).
(a) analyzing…the user input to determine a content actions group that corresponds to the user input, the content actions group including a plurality of actions that users of the client application perform in relation to content accessible by the client application
(b) determining…a machine learning architecture that corresponds to the content actions group, the machine learning architecture including a feature extraction layer and one or more computational experts models
(c) determining…based on the content actions group, input data for the machine learning architecture, the input data including profile data of the user
(d) …determine output data of the feature extraction layer
(e) …determine first probabilities of the user performing the plurality of actions with respect to a first content item
(f) …determine second probabilities of the user performing the plurality of actions with respect to a second content item
(g) determining…that the first probabilities are greater than the second probabilities
As drafted, and under their broadest reasonable interpretation (BRI) in view of the specification, the above limitations cover concepts that can be performed in the human mind (observation, evaluation, judgment, or opinion). Given a sufficiently small set of data, nothing in the claim precludes this process from being performed mentally or with pen and paper.
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to integrate the judicial exception into a practical application.
(a) …by the computing system…
(b) …by the computing system…
(c) …by the computing system…
(d) executing, by the computing system, the feature extraction layer based on the input data to…
(e) executing, by the computing system and based on the output data of the feature extraction layer, the one or more computational experts models to…
(f) executing, by the computing system, the one or more computational experts models to…
(g) …by the computing system…
The following additional elements are directed to insignificant extra-solution activity appended to the judicial exception [see MPEP 2106.05(g)].
receiving, by a computing system that includes one or more processors and memory, user input corresponding to content accessible to a user of a client application
causing, by the computing system, the first content item to be accessible to the user via the client application
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to amount to significantly more than the judicial exception.
(a) …by the computing system…
(b) …by the computing system…
(c) …by the computing system…
(d) executing, by the computing system, the feature extraction layer based on the input data to…
(e) executing, by the computing system and based on the output data of the feature extraction layer, the one or more computational experts models to…
(f) executing, by the computing system, the one or more computational experts models to…
(g) …by the computing system…
The following additional elements are directed to receiving or transmitting data over a network. The courts (as per Intellectual Ventures v. Symantec, 838 F.3d 1307, 1321; 120 USPQ2d 1353, 1362 (Fed. Cir. 2016)) have recognized receiving or transmitting data over a network as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity to the judicial exception [see MPEP 2106.05(d) II.].
receiving, by a computing system that includes one or more processors and memory, user input corresponding to content accessible to a user of a client application
The following additional elements are directed to presenting content to a user via a client application at a high level of generality.
Nashed (US 20060111974) discloses in para. [0017] “The communications network sites 4 and 5 are preferably websites having communications capabilities, and include servers and associated processors that perform data processing operations and provide for the exchange of information over the network 3, such as communicating information with web browsers operating on a user terminal over the communication network 3, as well known in the art. The sites 4 and 5 can include such applications as a search engine, content links and advertisement bars, and also have transaction capabilities that permit a user accessing the site via the browser on the user terminal to purchase products and services and also to download content, as well known and conventional in the art.”
Nashed discloses it is well known in the art for users to communicate information through websites/web browsers. Additionally, Nashed discloses it is well known and conventional for these websites to include information such as content links and advertisement bars. Nashed has recognized presenting content to a user via a client application as a well-understood, routine, and conventional activity previously known in the industry [see MPEP 2106.05(d)].
causing, by the computing system, the first content item to be accessible to the user via the client application
Regarding Claim 2:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 1. Additionally,
The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgment, or opinion).
(a) determining…one or more characteristics of the content accessible to the user
(b) determining…a number of candidate advertising content items to make accessible to the user based on the one or more characteristics, wherein the first content item is a first advertising content item of the number of candidate advertising content items and the second content item is a second advertising content item of the number of candidate advertising content items
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to integrate the judicial exception into a practical application.
(a) …by the computing system…
(b) …by the computing system…
The following additional elements are directed to insignificant extra-solution activity appended to the judicial exception [see MPEP 2106.05(g)].
causing, by the computing system, the first advertising content item to be displayed in conjunction with the content
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to amount to significantly more than the judicial exception.
(a) …by the computing system…
(b) …by the computing system…
The following additional elements are directed to displaying advertising content with other content at a high level of generality.
Nashed discloses in para. [0017] “The communications network sites 4 and 5 are preferably websites having communications capabilities, and include servers and associated processors that perform data processing operations and provide for the exchange of information over the network 3, such as communicating information with web browsers operating on a user terminal over the communication network 3, as well known in the art. The sites 4 and 5 can include such applications as a search engine, content links and advertisement bars, and also have transaction capabilities that permit a user accessing the site via the browser on the user terminal to purchase products and services and also to download content, as well known and conventional in the art.”
Nashed discloses it is well known in the art for users to communicate information through websites/web browsers. Additionally, Nashed discloses it is well known and conventional for these websites to include information such as content links and advertisement bars. Nashed has recognized displaying advertising content with other content as a well-understood, routine, and conventional activity previously known in the industry [see MPEP 2106.05(d)].
causing, by the computing system, the first advertising content item to be displayed in conjunction with the content
Regarding Claim 3:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 1. Additionally,
The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgment, or opinion).
(b of claim 1) wherein:
the machine learning architecture is one of a plurality of machine learning architectures
the content actions group is one of a plurality of content actions groups that are associated with content items
individual machine learning architectures of the plurality of machine learning architectures corresponding to an individual content actions group of the plurality of content action groups
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
Regarding Claim 4:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 3.
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to integrate the judicial exception into a practical application.
performing, by the computing system, a first training process of a first machine learning architecture of the plurality of machine learning architectures using a first set of training data, the first set of training data including one or more first characteristics of profile data of users of the client application
performing, by the computing system, a second training process of a second machine learning architecture of the plurality of machine learning architectures using a second set of training data, the second set of training data including one or more second characteristics of profile data of users of the client application, the one or more second characteristics being different from the one or more first characteristics
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to amount to significantly more than the judicial exception.
performing, by the computing system, a first training process of a first machine learning architecture of the plurality of machine learning architectures using a first set of training data, the first set of training data including one or more first characteristics of profile data of users of the client application
performing, by the computing system, a second training process of a second machine learning architecture of the plurality of machine learning architectures using a second set of training data, the second set of training data including one or more second characteristics of profile data of users of the client application, the one or more second characteristics being different from the one or more first characteristics
Regarding Claim 5:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 4. Additionally,
The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgment, or opinion).
(b of claim 1) wherein the machine learning architecture is a first machine learning architecture
(a) extracting…the one or more first characteristics from profile data of the user from a database
(b) analyzing…values of the one or more first characteristics included in the profile data of the user to determine the first probabilities and the second probabilities
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to integrate the judicial exception into a practical application.
(a) …by the computing system…
(b) …by the computing system and using the first machine learning architecture…
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to amount to significantly more than the judicial exception.
(a) …by the computing system…
(b) …by the computing system and using the first machine learning architecture…
Regarding Claim 6:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 1. Additionally,
The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgment, or opinion).
(c of claim 1) wherein the input data includes first data that corresponds to continuous data, second data that corresponds to discrete values, and third data that corresponds to sparse data, the sparse data corresponding to a set of data values with a majority of the set of data values being zero
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
Regarding Claim 7:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 6. Additionally,
The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgment, or opinion).
(a) performing…a first normalization process with respect to the second data to produce modified second data
(b) performing…a second normalization process with respect to the third data to produce modified third data
(c) combining…the first data, the modified second data, and the modified third data to produce modified input data
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to integrate the judicial exception into a practical application.
(a) …by the computing system…
(b) …by the computing system…
(c) …by the computing system…
providing, by the computing system, the modified input data to the feature extraction layer
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to amount to significantly more than the judicial exception.
(a) …by the computing system…
(b) …by the computing system…
(c) …by the computing system…
providing, by the computing system, the modified input data to the feature extraction layer
Regarding Claim 8:
Step 2A, Prong 1: The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind or with pen and paper (including an observation, evaluation, judgment, or opinion).
analyzing the user input to determine a content actions group that corresponds to the user input, the content actions group including a plurality of actions that users of the client application perform in relation to content accessible by the client application
determining a machine learning architecture that corresponds to the content actions group, the machine learning architecture including a feature extraction layer and one or more computational experts models
determining, based on the content actions group, input data for the machine learning architecture, the input data including profile data of the user
(a) …determine output data of the feature extraction layer
(b) …determine probabilities of the user performing the plurality of actions with respect to one or more content items
determining, based on the probabilities, a content item of the one or more content items to make accessible to the user via the client application
As drafted, and under their broadest reasonable interpretation (BRI) in view of the specification, the above limitations cover concepts that can be performed in the human mind (observation, evaluation, judgment, or opinion). Given a sufficiently small set of data, nothing in the claim precludes this process from being performed mentally or with pen and paper.
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to integrate the judicial exception into a practical application.
A computing system comprising: one or more hardware processors; and one or more non-transitory computer-readable storage media including computer-readable instructions that, when executed by the one or more hardware processors, cause the one or more hardware processors to perform operations comprising:
(a) executing the feature extraction layer based on the input data to…
(b) executing, based on the output data of the feature extraction layer, the one or more computational experts models to…
The following additional elements are directed to insignificant extra-solution activity appended to the judicial exception [see MPEP 2106.05(g)].
receiving user input corresponding to content accessible to a user of a client application
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to amount to significantly more than the judicial exception.
A computing system comprising: one or more hardware processors; and one or more non-transitory computer-readable storage media including computer-readable instructions that, when executed by the one or more hardware processors, cause the one or more hardware processors to perform operations comprising:
(a) executing the feature extraction layer based on the input data to…
(b) executing, based on the output data of the feature extraction layer, the one or more computational experts models to…
The following additional elements are directed to receiving or transmitting data over a network. The courts (as per Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362) have recognized receiving or transmitting data over a network as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity to the judicial exception [see MPEP 2106.05(d) II.].
receiving user input corresponding to content accessible to a user of a client application
Regarding Claim 9:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 8.
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to integrate the judicial exception into a practical application.
wherein the feature extraction layer includes a deep and cross network having a plurality of cross layers coupled to a deep network (executing the feature extraction layer from claim 8)
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
Regarding Claim 10:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 9. Additionally,
The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgment, or opinion).
performing one or more normalization processes with respect to data output from individual cross layers of the plurality of cross layers
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to integrate the judicial exception into a practical application.
wherein the one or more non-transitory computer-readable storage media including additional computer-readable instructions that, when executed by the one or more hardware processors, cause the one or more hardware processors to perform additional operations comprising:
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional elements merely add the words “apply it” (or an equivalent) to the judicial exception, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to amount to significantly more than the judicial exception.
wherein the one or more non-transitory computer-readable storage media including additional computer-readable instructions that, when executed by the one or more hardware processors, cause the one or more hardware processors to perform additional operations comprising:
Regarding Claim 11:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 8. Additionally,
The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgment, or opinion).
wherein the machine learning architecture includes one or more extraction layers, the one or more extraction layers including the one or more computational experts models (determining a machine learning architecture [from claim 8] with one or more layers/one or more models is mentally performable)
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
Regarding Claim 12:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 11. Additionally,
The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgment, or opinion).
wherein the one or more extraction layers include:
a first extraction layer having one or more first computational experts models that correspond to a first content action of the plurality of actions, one or more second computational experts models that correspond to a second content action of the plurality of actions, and one or more shared computational experts models; and
a second extraction layer having one or more first additional computational experts models that correspond to the first content action, one or more second additional computational experts models that correspond to the second content action, and one or more additional shared computational experts models
(determining a machine learning architecture [from claim 11] with multiple layers/multiple models is mentally performable)
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
Regarding Claim 13:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 12. Additionally,
The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgment, or opinion).
wherein:
the first extraction layer includes a first gating network coupled to the one or more first computational experts models, a second gating network coupled to the one or more second computational experts models, and a third gating network coupled to the one or more shared computational experts models
the second extraction layer includes a first additional gating network coupled to the one or more first additional computational experts models and a second additional gating network coupled to the one or more second additional computational experts model
(determining a machine learning architecture [from claim 12] with multiple layers/multiple models coupled to gating networks is mentally performable)
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
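For illustration of the structure recited in claims 12-13 (per-action and shared computational experts models in an extraction layer, each group coupled to its own gating network), the following is a minimal sketch. It is offered for context only; all names, dimensions, and the single-dense-layer experts are the examiner's assumptions, not Applicant's disclosed implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def expert(x, w):
    # A single "computational expert": here, one dense layer with ReLU.
    return np.maximum(x @ w, 0.0)

def gated_mix(x, expert_ws, gate_w):
    # Gating network: softmax weights over its coupled experts,
    # then a weighted sum of those experts' outputs.
    outs = np.stack([expert(x, w) for w in expert_ws])  # (n_experts, d_out)
    gates = softmax(x @ gate_w)                         # (n_experts,)
    return gates @ outs                                 # (d_out,)

d_in, d_out, n = 8, 4, 2
x = rng.normal(size=d_in)  # hypothetical input feature vector

# Expert groups of the first extraction layer: experts for a first content
# action, experts for a second content action, and shared experts.
experts_a1 = [rng.normal(size=(d_in, d_out)) for _ in range(n)]
experts_a2 = [rng.normal(size=(d_in, d_out)) for _ in range(n)]
experts_sh = [rng.normal(size=(d_in, d_out)) for _ in range(n)]

# One gating network per expert group (first, second, third gating networks).
g1, g2, g3 = (rng.normal(size=(d_in, n)) for _ in range(3))

out_action1 = gated_mix(x, experts_a1, g1)
out_action2 = gated_mix(x, experts_a2, g2)
out_shared = gated_mix(x, experts_sh, g3)
```

A second extraction layer of the same form would take these outputs as its inputs, mirroring the claimed "additional" experts models and gating networks.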
Regarding Claim 14:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 13. Additionally,
The following limitations are/remain directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgement, or opinion).
wherein the machine learning architecture includes:
a first additional computational layer coupled to the first additional gating network to modify output of the first additional gating network; and
a second additional computational layer coupled to the second additional gating network to modify output of the second additional gating network
(determining a machine learning architecture [from claim 13] with multiple layers coupled by gating networks is mentally performable)
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
Regarding Claim 15:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 14. Additionally,
The following limitations are/remain directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgement, or opinion).
(a) …applies one or more first linear transforms to the output of the first additional gating network to determine first probabilities corresponding to the first content action
(b) …applies one or more second linear transforms to the output of the second additional gating network to determine second probabilities corresponding to the second content action
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional elements merely add the words "apply it" (or an equivalent) to the judicial exception, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to integrate the judicial exception into a practical application.
(a) the first additional computational layer…
(b) the second additional computational layer…
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional elements merely add the words "apply it" (or an equivalent) to the judicial exception, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to amount to significantly more than the judicial exception.
(a) the first additional computational layer…
(b) the second additional computational layer…
Regarding Claim 16:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 15. Additionally,
The following limitations are/remain directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgement, or opinion).
wherein the one or more first linear transforms produce one or more first logit values
the one or more second linear transforms produce one or more second logit values
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
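For illustration of the structure recited in claims 15-16 (an additional computational layer applying a linear transform to a gating network's output to produce logit values, from which probabilities are determined), the following is a minimal sketch. The sigmoid activation, names, and dimensions are the examiner's assumptions, not Applicant's disclosed implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

gate_out = rng.normal(size=4)   # stand-in for the additional gating network's output
W = rng.normal(size=(4, 3))     # a linear transform (additional computational layer)
b = np.zeros(3)

logits = gate_out @ W + b       # logit values (claim 16)
probs = sigmoid(logits)         # probabilities corresponding to the content action (claim 15)
```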
Regarding Claim 17:
Step 2A, Prong 1: The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind or with pen and paper (including an observation, evaluation, judgement, or opinion).
analyzing the user input to determine a content actions group that corresponds to the user input, the content actions group including a plurality of actions that users of the client application perform in relation to content accessible by the client application
determining a machine learning architecture that corresponds to the content actions group, the machine learning architecture including a feature extraction layer and one or more computational experts models
determining, based on the content actions group, input data for the machine learning architecture, the input data including profile data of the user
(a) …determine output data of the feature extraction layer
(b) …based on the output data of the feature extraction layer…determine probabilities of the user performing the plurality of actions with respect to one or more content items
determining, based on the probabilities, a content item of the one or more content items to make accessible to the user via the client application
As drafted, under their broadest reasonable interpretation (BRI), in view of the specification, the above limitations cover concepts performed in the human mind (observation, evaluation, judgement, or opinion). Given a sufficiently small set of data, nothing in the claim prohibits this process from being performed mentally or with pen and paper.
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional elements merely add the words "apply it" (or an equivalent) to the judicial exception, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to integrate the judicial exception into a practical application.
One or more non-transitory computer-readable storage media including computer-readable instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to perform operations comprising:
(a) executing the feature extraction layer based on the input data to…
(b) executing…the one or more computational experts models to…
The following additional elements are directed to insignificant extra-solution activity to the judicial exception [see MPEP 2106.05(g)].
receiving user input corresponding to content accessible to a user of a client application
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional elements merely add the words "apply it" (or an equivalent) to the judicial exception, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea [see MPEP 2106.05(f)], and therefore fail to amount to significantly more than the judicial exception.
One or more non-transitory computer-readable storage media including computer-readable instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to perform operations comprising:
(a) executing the feature extraction layer based on the input data to…
(b) executing…the one or more computational experts models to…
The following additional elements are directed to receiving or transmitting data over a network. The courts (as per Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362) have recognized receiving or transmitting data over a network as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity to the judicial exception [see MPEP 2106.05(d) II.].
receiving user input corresponding to content accessible to a user of a client application
Regarding Claim 18:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 17. Additionally,
The following limitations are/remain directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgement, or opinion).
wherein:
the one or more computational experts models include one or more feed forward neural networks
the one or more computational experts models are coupled to one or more gating networks
the one or more gating networks include a plurality of softmax layers
(determining a machine learning architecture [from claim 17] with feed forward networks coupled with gating networks/softmax layers is mentally performable)
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
Regarding Claim 19:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 17. Additionally,
The following limitations are/remain directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgement, or opinion).
wherein the machine learning architecture includes
one or more extraction layers that include the one or more computational experts models and one or more gating networks coupled to the one or more computational experts models; and
one or more additional computational layers that are coupled to the one or more gating networks, wherein the one or more additional computational layers determine the probabilities based on output obtained from the one or more gating networks
(determining a machine learning architecture [from claim 17] with layers, models, and gating networks coupled to the models is mentally performable)
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
Regarding Claim 20:
Step 2A, Prong 1: This claim recites the same abstract ideas as in claim 17. Additionally,
The following limitations are/remain directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claim recites mental processes that are concepts performed in the human mind (including an observation, evaluation, judgement, or opinion).
wherein:
the input data includes at least one of information indicating content viewing history of the user or demographic information of the user (determining input data [from claim 17] is mentally performable)
the content item includes advertising content related to an item available for purchase via the client application (determining content item [from claim 17] is mentally performable)
the plurality of actions includes viewing a page related to the item, purchasing the item, adding the item to a cart of the user for a potential future purchase of the item, and performing a sign up action with regard to the item (determining the probabilities of the user performing the plurality of actions [from claim 17] is mentally performable)
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-5, 8, 11-12, 17, and 20 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Xiong et al. (US 20190114528), hereinafter Xiong.
Regarding Claim 1:
Xiong discloses:
A method comprising:…by a computing system that includes one or more processors and memory
Xiong, [0015], “FIG. 2 is a system environment 200 of an online system 240…”
[0068], “Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.”
In para. 15, Xiong discloses their online system [computing system], and para. 68 further discloses that the embodiments are implemented with computer program code contained on a computer-readable medium [memory] and executed by a computer processor [one or more processors] to perform the steps, operations, or processes described [method].
receiving…user input corresponding to content accessible to a user of a client application
Xiong, [0029], “The action logger 245 receives communications about user actions internal to and/or external to the online system 240, populating the action log 250 with information about user actions.”
[0025], “A user profile in the user profile store 242 may also maintain references to actions by the corresponding user performed on content items in the content store 243 and stored in the action log 250.”
[0016], “In one embodiment, a client device 210 executes an application allowing a user of the client device 210 to interact with the online system 240. For example, a client device 210 executes a browser application to enable interaction between the client device 210 and the online system 240 via the network 220. In another embodiment, a client device 210 interacts with the online system 240 through an application programming interface (API) running on a native operating system of the client device 210…”
In para. 29, Xiong discloses that the action logger of the online system receives communications about user actions [receiving…user input], and para. 25 specifies that the actions correspond to user actions performed on content items [user input corresponding to content accessible to a user]. Lastly, para. 16 discloses a client device executing an application that allows a user of the client device to interact with the online system [client application].
analyzing, by the computing system, the user input to determine a content actions group that corresponds to the user input
Xiong, [0030], “The action log 250 may be used by the online system 240 to track user actions on the online system 240…Users may interact with various objects on the online system 240, and information describing these interactions are stored in the action log 250…Additionally, the action log 250 may record a user's interactions with advertisements on the online system 240 as well as with other applications operating on the online system 240.”
[0031], “The action log 250 may also store user actions taken on an external system 230, such as an external website, and communicated to the online system 240.”
[0050], “Tasks A1 and A2 are tasks in a task domain D1 460 that is a category of tasks performed by a user.”
[0003], “For various tasks (or actions)…”
In paras. 30 and 31, Xiong discloses that information describing users' interactions is stored in the action log, based on users interacting with objects of the online system, with advertisements and/or applications of the online system, or with systems external to the online system [analyzing…the user input to determine a content actions group that corresponds to the user input]. In view of this, para. 50 discloses an example in which tasks (Xiong uses "tasks" and "actions" interchangeably, e.g., para. 3) A1 and A2 belong to the task domain D1 [content actions group].
the content actions group including a plurality of actions that users of the client application perform in relation to content accessible by the client application
Xiong, [0030], “Examples of interactions with objects include: commenting on posts, sharing links, and checking-in to physical locations via a mobile device, accessing content items, and any other interactions.”
[0031], “Hence, the action log 250 may record information about actions users perform on the external system 230, including webpage viewing histories, advertisements that were engaged, purchases made, and other patterns from shopping and buying.”
In paras. 30 and 31, Xiong discloses examples of user interactions with objects and with systems external to the online system [a plurality of actions that users of the client application perform in relation to content accessible by the client application].
determining, by the computing system, a machine learning architecture that corresponds to the content actions group
Xiong, [0050], “FIG. 4A is an example of a first flexible multi-task neural network prediction model 400A applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment…Alternatively (not shown in FIG. 4A), the prediction model 400A includes one or more additional separate layers associated with other suitable tasks.”
[0055], “FIG. 4B is an example of a second flexible multi-task neural network prediction model 400B applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment.”
[0057], “FIG. 4C is an example of a third multi-task neural network prediction model 400C applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment…In various embodiments (not shown), the prediction model 400C includes more separate layers in the domain D1 460 and the domain D2 470.”
In paras. 50, 55, and 57, Xiong discloses various examples of a flexible multi-task neural network prediction model. Specifically, in para. 57, Xiong discloses that the model may include more layers [determining…a machine learning architecture] for the domains [corresponds to the content actions group].
the machine learning architecture including a feature extraction layer and one or more computational experts models
Xiong, [0035], “The flexible multi-task neutral network prediction model predicts interactions between the viewing user and content items presented by one or more posters (or received from the external systems 230) based on at least three types of layers. The three types of layers include shared layers, independent layers, and separate layers. The shared layers extract common features that are shared across tasks by sharing layers among the prediction of the various tasks. Each independent layer extracts features for a specific task, and the extracted features are not shared across various tasks. The separate layers predict a likelihood of the viewing user performing a specific task associated with the content items.”
In para. 35, Xiong discloses their flexible multi-task neural network prediction model includes three types of layers: (1) shared layers, (2) independent layers, and (3) separate layers. In particular, the (1) shared layers and (2) independent layers extract common task features and specific task features [the machine learning architecture including a feature extraction layer]. Lastly, the (3) separate layers [one or more computational experts models] predict the likelihood of the user performing a specific task. The separate layers are construed as corresponding to the computational experts models because both perform computations to make predictions.
determining, by the computing system and based on the content actions group, input data for the machine learning architecture, the input data including profile data of the user
Xiong, [0051], “A feature vector 410 associated with the content item 110 is generated. The feature vector 410 includes features associated with characteristics of the poster Lucy Hall (e.g., information included in Lucy Hall's user profile, Lucy Hall's current location), features associated with characteristics of the content item 110, features associated with characteristics of the viewing user (e.g., information included in the viewing user's user profile and the viewing user's location), and features associated with relationships among the poster, the viewing user and the content item 110.”
[0052], “The feature vector 410 is an input to the shared layers 420A, an input to the independent low layers 430A, and an input to the independent low layer 430B.”
[0050], “Examples of tasks in the task domain D1 460 include liking a content item, or sharing a content item.”
In para. 51, Xiong discloses generating a feature vector associated with the content item [determining], and para. 52 further discloses that the feature vector is input into the shared layers [input data for the machine learning architecture]. Also in para. 51, the feature vector is disclosed to include features associated with characteristics of the viewing user's user profile [the input data including profile data of the user]. Lastly, in view of para. 50, the tasks related to the content item belong to a task domain [based on the content actions group].
executing, by the computing system, the feature extraction layer based on the input data to determine output data of the feature extraction layer
As cited above in para. 35, Xiong discloses their (1) shared layers and (2) independent layers extract common task features and specific task features. Also cited above in para. 52, Xiong discloses inputting the feature vector into the shared layers and independent layers to extract features [executing…the feature extraction layer based on the input data to determine output data of the feature extraction layer].
executing, by the computing system and based on the output data of the feature extraction layer, the one or more computational experts models to determine first probabilities of the user performing the plurality of actions with respect to a first content item
Xiong, [0054], “In various embodiments (not shown), the prediction model 400A doesn't include the independent middle layers 440A and 440B. The features outputted from the shared layers 420A and features outputted from the independent low layers 430A are inputs to the separate layer 450A for predicting how likely the viewing user 105 will perform the task A1. The features outputted from the shared layers 420A and features outputted from the independent low layers 430B are inputs to the separate layer 450B for predicting how likely the viewing user 105 will perform the task A2.”
In para. 54, Xiong discloses that the separate layers 450A and 450B take the features outputted from the shared layers and the independent layers [based on the output data of the feature extraction layer] and compute a prediction [executing…the one or more computational experts models] of how likely the viewing user will perform task A1 [determine first probabilities of the user performing the plurality of actions with respect to a first content item].
executing, by the computing system, the one or more computational experts models to determine second probabilities of the user performing the plurality of actions with respect to a second content item
As cited above in para. 54, Xiong discloses that the features output from the shared layers and the independent layers are input into another separate layer to compute a prediction [executing…the one or more computational experts models] of how likely the viewing user will perform task A2 [determine second probabilities of the user performing the plurality of actions with respect to a second content item].
determining, by the computing system, that the first probabilities are greater than the second probabilities
Xiong, [0065], “The online system 240 applies each feature vector to the retrieved prediction model and predicts likelihood of each task. The online system 240 scores 640 each content item based on predicted likelihood of each task. The online system 240 ranks 650 the plurality of content items based on the scoring, as described above with respect to the content ranking module 340 of FIG. 3.”
In para. 65, Xiong discloses that each content item is scored and ranked based on the predicted likelihood of each task [determining…that the first probabilities are greater than the second probabilities]. In other words, ranking one content item above another based on their scores corresponds to determining that the first content item's probabilities are greater than the second content item's probabilities.
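For illustration of the scoring and ranking Xiong describes in para. 65, the following is a minimal sketch. The item names, likelihood values, and sum-based scoring are the examiner's assumptions, not Xiong's disclosed scoring function.

```python
# Hypothetical predicted likelihoods per task for two candidate content items.
first_probs = [0.8, 0.6]    # first content item
second_probs = [0.3, 0.4]   # second content item

# Score each content item from its predicted likelihoods (sum is assumed here).
scores = {
    "first_item": sum(first_probs),
    "second_item": sum(second_probs),
}

# Rank the items by score; the top-ranked item is delivered to the user.
ranked = sorted(scores, key=scores.get, reverse=True)
delivered = ranked[0]
```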
causing, by the computing system, the first content item to be accessible to the user via the client application
Xiong, [0014], “The content item ranking first is delivered to the viewing user 105 for the opportunity.”
In para. 14, Xiong discloses delivering the highest-ranking content item [causing…the first content item to be accessible to the user via the client application].
Regarding Claim 2:
As discussed above, Xiong teaches [the] method of claim 1, and further discloses:
comprising: determining, by the computing system, one or more characteristics of the content accessible to the user
Xiong, [0043], “The feature extractor 310 generates feature vectors for each content item. A feature vector associated with a content item describes characteristics of the content item, characteristics of a poster who posts the content item, characteristics of a viewing user whom the content item is presented to, relationships among the characteristics of the content item, the poster, and the viewing user. Examples of characteristics of the content item may include textual content, topics associated with the content item (e.g., derived from the textual content), posted time, a posted location, an activity (e.g., attending an event, making a purchase, and following on one or more additional users), content delivery strategies associated with the content item, interactions between the content item and additional user (e.g., an additional user likes, clicks on or purchases the content item).”
In para. 43, Xiong discloses characteristics of the content item that is viewed by the viewing user [determining…one or more characteristics of the content accessible to the user].
determining, by the computing system, a number of candidate advertising content items to make accessible to the user based on the one or more characteristics
Xiong, [0014], “When an opportunity 150 arises to present a content item (e.g., an advertisement, not shown) to the viewing user 105, a group of content items is identified based on the flexible multi-task neural network prediction model that predicts how likely the viewing user 105 will interact with each content item.”
[0052], “The feature vector 410 is an input to the shared layers 420A, an input to the independent low layers 430A, and an input to the independent low layer 430B.”
In para. 14, Xiong discloses that the group of content items identified to be presented to the viewing user are advertisements [determining…a number of candidate advertising content items to make accessible to the user]. As cited above with para. 43, Xiong also discloses characteristics of the content item as a feature vector. In view of paras. 14 and 52, Xiong further specifies that the feature vector is input into the model and used to make predictions [based on the one or more characteristics].
wherein the first content item is a first advertising content item of the number of candidate advertising content items and the second content item is a second advertising content item of the number of candidate advertising content items
As cited above in para. 14, Xiong discloses a group of identified content items for the viewing user [wherein the first content item is a first advertising content item of the number of candidate advertising content items and the second content item is a second advertising content item of the number of candidate advertising content items].
causing, by the computing system, the first advertising content item to be displayed in conjunction with the content
Xiong, [0014], “The group of content items is ranked based on their predictions, and a content item ranks first indicating the viewing user 105 is most likely to perform actions on the content item (e.g., clicking on the content item, visiting a website via clicking on the content item, placing the content item in a virtual shopping cart, or purchasing the content item). The content item ranking first is delivered to the viewing user 105 for the opportunity.”
Xiong discloses that the highest-ranking content item (advertisement) is delivered to the viewing user [causing…the first advertising content item to be displayed in conjunction with the content].
Regarding Claim 3:
As discussed above, Xiong teaches [the] method of claim 1, and further discloses:
the machine learning architecture is one of a plurality of machine learning architectures
Xiong, [0050], “FIG. 4A is an example of a first flexible multi-task neural network prediction model 400A applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment…Alternatively (not shown in FIG. 4A), the prediction model 400A includes one or more additional separate layers associated with other suitable tasks.”
[0055], “FIG. 4B is an example of a second flexible multi-task neural network prediction model 400B applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment.”
[0057], “FIG. 4C is an example of a third multi-task neural network prediction model 400C applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment…In various embodiments (not shown), the prediction model 400C includes more separate layers in the domain D1 460 and the domain D2 470.”
In paras. 50, 55, and 57, Xiong discloses various examples of a flexible multi-task neural network prediction model [the machine learning architecture is one of a plurality of machine learning architectures]. Additionally, in para. 57, Xiong discloses that the model may include more layers for the domains.
the content actions group is one of a plurality of content actions groups that are associated with content items; and
Xiong, [0050], “Tasks A1 and A2 are tasks in a task domain D1 460 that is a category of tasks performed by a user. As shown in FIG. 4A, the task domain D1 460 is for feed ranking. Examples of tasks in the task domain D1 460 include liking a content item, or sharing a content item.”
[0056], “Examples of the tasks in the domain D2 470 include clicking on an advertisement inserted in the opportunity 150, or making a purchase on the advertisement inserted in the opportunity 150.”
In para. 50, Xiong discloses a task domain D1, which is a category of tasks performed by the user [content actions group]. Para. 56 further discloses another task domain, D2 [is one of a plurality of content actions groups]. Both paragraphs disclose examples of tasks associated with the content item [associated with content items].
individual machine learning architectures of the plurality of machine learning architectures corresponding to an individual content actions group of the plurality of content action groups
Xiong, [0057], “The independent low layers 430C extracts features for the domain D2 470, and the extracted features are not shared by the domain D1 460…In various embodiments (not shown), the prediction model 400C includes more separate layers in the domain D1 460 and the domain D2 470.”
Xiong discloses separate independent layers that extract features pertaining to one domain and not the other. Additionally, there are separate layers for each domain, and, as discussed above, each domain has tasks associated with content items [individual machine learning architectures of the plurality of machine learning architectures corresponding to an individual content actions group of the plurality of content action groups].
Regarding Claim 4:
As discussed above, Xiong teaches [the] method of claim 3, and further discloses:
performing, by the computing system, a first training process of a first machine learning architecture of the plurality of machine learning architectures using a first set of training data
Xiong, [0046], “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers, independent layers associated with the specific task, and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors…For a next specific task, the model module 330 selects a corresponding training set to train shared layers, independent layers associated with the next specific task, and a separate layer associated with the next specific task.”
Xiong discloses training their model with multiple layers. For a first specific task, the corresponding layers are trained [performing…a first training process of a first machine learning architecture of the plurality of machine learning architectures] using a training set [using a first set of training data].
the first set of training data including one or more first characteristics of profile data of users of the client application
Xiong, [0044], “The training set module 320 generates a positive set and a negative set for training a model using feature vectors extracted by the feature extractor 310 for each content item.”
[0043], “The feature extractor 310 generates feature vectors for each content item. A feature vector associated with a content item describes characteristics of the content item, characteristics of a poster who posts the content item, characteristics of a viewing user whom the content item is presented to…Examples of characteristics of the viewing user may include the viewing user's user profile…”
In para. 44, Xiong discloses generating a training set for training the model using feature vectors. Para. 43 further specifies the feature vectors are generated by characteristics such as the viewing user’s user profile. Putting it together with para. 46 above, Xiong discloses a first training set with the viewing user’s user profile characteristics for a first specific task.
performing, by the computing system, a second training process of a second machine learning architecture of the plurality of machine learning architectures using a second set of training data
Xiong, [0046], “In some embodiments, during the multi-task learning, for a specific task, the model module 330 trains shared layers, independent layers associated with the specific task, and a separate layer associated with the specific task included in the prediction model, using the training set by weighting the various features in each feature vectors…For a next specific task, the model module 330 selects a corresponding training set to train shared layers, independent layers associated with the next specific task, and a separate layer associated with the next specific task.”
[0047], “In various embodiments, for a specific task, the model module 330 only trains independent layers associated with the specific task and a separate layer associated with the specific task. For a next specific task, the model module 330 only trains independent layers associated with the next specific task and a separate layer associated with the next specific task.”
As cited above in para. 46 and further in view of para. 47, Xiong further discloses training another corresponding set of layers for another specific task [performing…a second training process of a second machine learning architecture of the plurality of machine learning architectures], using a corresponding training set [using a second set of training data].
the second set of training data including one or more second characteristics of profile data of users of the client application
Xiong, [0044], “The training set module 320 generates a positive set and a negative set for training a model using feature vectors extracted by the feature extractor 310 for each content item.”
[0043], “The feature extractor 310 generates feature vectors for each content item. A feature vector associated with a content item describes characteristics of the content item, characteristics of a poster who posts the content item, characteristics of a viewing user whom the content item is presented to…Examples of characteristics of the viewing user may include the viewing user's user profile…”
In para. 44, Xiong discloses generating a training set for training the model using feature vectors. Para. 43 further specifies the feature vectors are generated by characteristics such as the viewing user’s user profile. Putting it together with paras. 46-47 above, Xiong discloses a second training set with the viewing user’s user profile characteristics for a next specific task.
the one or more second characteristics being different from the one or more first characteristics
As cited above in para. 47, the specific task and the next specific task are different from each other and are trained using different corresponding training sets [the one or more second characteristics being different from the one or more first characteristics].
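For illustration only, the two training processes discussed above (each using a training set built from a different subset of user profile characteristics) can be sketched as follows. This is a minimal, simplified sketch; the profile fields, labels, and perceptron-style update rule are hypothetical and do not appear in Xiong.

```python
def extract_characteristics(profile, keys):
    # Build a feature vector from a chosen subset of profile characteristics.
    return [float(profile[k]) for k in keys]

def train(training_profiles, labels, keys, epochs=10, lr=0.1):
    # Minimal perceptron-style training loop over one task's training set.
    weights = [0.0] * len(keys)
    for _ in range(epochs):
        for profile, label in zip(training_profiles, labels):
            x = extract_characteristics(profile, keys)
            pred = 1.0 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0.0
            err = label - pred
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
    return weights

# First training process: first characteristics (hypothetical field names).
# w1 = train(profiles, labels_task1, ["age", "connection_count"])
# Second training process: different, second characteristics.
# w2 = train(profiles, labels_task2, ["location_match", "recent_activity"])
```

Each call to `train` corresponds to one training process over one set of characteristics, so the two resulting weight vectors are learned from different feature subsets.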
Regarding Claim 5:
As discussed above, Xiong teaches [the] method of claim 4, and further discloses:
wherein the machine learning architecture is a first machine learning architecture, and the method comprises:
Xiong, [0050], “FIG. 4A is an example of a first flexible multi-task neural network prediction model 400A applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment…Alternatively (not shown in FIG. 4A), the prediction model 400A includes one or more additional separate layers associated with other suitable tasks.”
Xiong discloses a first flexible multi-task neural network prediction model [the machine learning architecture is a first machine learning architecture].
extracting, by the computing system, the one or more first characteristics from profile data of the user from a database
Xiong, [0043], “The feature extractor 310 generates feature vectors for each content item. A feature vector associated with a content item describes characteristics of the content item, characteristics of a poster who posts the content item, characteristics of a viewing user whom the content item is presented to…Examples of characteristics of the viewing user may include the viewing user's user profile, and the viewing user's current location.”
[0025], “Each user of the online system 240 is associated with a user profile, which is stored in the user profile store 242.”
In para. 43, Xiong discloses extracting features to generate feature vectors [extracting] associated with characteristics of a content item such as characteristics of a viewing user, which may be the viewing user’s user profile [the one or more first characteristics from profile data of the user]. Para. 25 further specifies that user profiles are stored in the user profile store [from a database].
analyzing, by the computing system and using the first machine learning architecture, values of the one or more first characteristics included in the profile data of the user to determine the first probabilities and the second probabilities
Xiong, [0054], “In various embodiments (not shown), the prediction model 400A doesn't include the independent middle layers 440A and 440B. The features outputted from the shared layers 420A and features outputted from the independent low layers 430A are inputs to the separate layer 450A for predicting how likely the viewing user 105 will perform the task A1. The features outputted from the shared layers 420A and features outputted from the independent low layers 430B are inputs to the separate layer 450B for predicting how likely the viewing user 105 will perform the task A2.”
In para. 54, Xiong discloses that the separate layers 450A and 450B take as inputs the features outputted from the shared layers and the independent layers [analyzing…using the first machine learning architecture, values of the one or more first characteristics included in the profile data of the user] and compute a prediction of how likely the viewing user will perform task A1 or task A2 [determine the first probabilities and the second probabilities].
Regarding Claim 8:
Xiong discloses:
A computing system comprising: one or more hardware processors; and one or more non-transitory computer-readable storage media including computer-readable instructions that, when executed by the one or more hardware processors, cause the one or more hardware processors to perform operations comprising:
Xiong, [0015], “FIG. 2 is a system environment 200 of an online system 240…”
[0068], “Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.”
In para. 15, Xiong discloses their online system [computing system], and para. 68 further discloses the embodiments are implemented with a computer-readable medium [one or more non-transitory computer-readable storage media including computer-readable instructions] executed by a computer processor [one or more hardware processors] for performing the steps, operations, or processes described.
receiving user input corresponding to content accessible to a user of a client application
Xiong, [0029], “The action logger 245 receives communications about user actions internal to and/or external to the online system 240, populating the action log 250 with information about user actions.”
[0025], “A user profile in the user profile store 242 may also maintain references to actions by the corresponding user performed on content items in the content store 243 and stored in the action log 250.”
[0016], “In one embodiment, a client device 210 executes an application allowing a user of the client device 210 to interact with the online system 240. For example, a client device 210 executes a browser application to enable interaction between the client device 210 and the online system 240 via the network 220. In another embodiment, a client device 210 interacts with the online system 240 through an application programming interface (API) running on a native operating system of the client device 210…”
In para. 29, Xiong discloses that the action logger of the online system receives communications about user actions [receiving user input], and para. 25 specifies the actions correspond to user actions performed on content items [user input corresponding to content accessible to a user]. Lastly, para. 16 discloses a client device executing an application that allows a user of the client device to interact with the online system [client application].
analyzing the user input to determine a content actions group that corresponds to the user input
Xiong, [0030], “The action log 250 may be used by the online system 240 to track user actions on the online system 240…Users may interact with various objects on the online system 240, and information describing these interactions are stored in the action log 250…Additionally, the action log 250 may record a user's interactions with advertisements on the online system 240 as well as with other applications operating on the online system 240.”
[0031], “The action log 250 may also store user actions taken on an external system 230, such as an external website, and communicated to the online system 240.”
[0050], “Tasks A1 and A2 are tasks in a task domain D1 460 that is a category of tasks performed by a user.”
[0003], “For various tasks (or actions)…”
In paras. 30 and 31, Xiong discloses that information describing users' interactions is stored in the action log, whether the users interact with objects of the online system, with advertisements or other applications on the online system, or with systems external to the online system [analyzing the user input to determine a content actions group that corresponds to the user input]. In view of this, para. 50 discloses an example where tasks (Xiong uses "tasks" and "actions" interchangeably, e.g., para. 3) A1 and A2 belong to the task domain D1 [content actions group].
the content actions group including a plurality of actions that users of the client application perform in relation to content accessible by the client application
Xiong, [0030], “Examples of interactions with objects include: commenting on posts, sharing links, and checking-in to physical locations via a mobile device, accessing content items, and any other interactions.”
[0031], “Hence, the action log 250 may record information about actions users perform on the external system 230, including webpage viewing histories, advertisements that were engaged, purchases made, and other patterns from shopping and buying.”
In paras. 30 and 31, Xiong discloses examples of user interactions with objects of the online system and with systems external to the online system [a plurality of actions that users of the client application perform in relation to content accessible by the client application].
determining a machine learning architecture that corresponds to the content actions group
Xiong, [0050], “FIG. 4A is an example of a first flexible multi-task neural network prediction model 400A applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment…Alternatively (not shown in FIG. 4A), the prediction model 400A includes one or more additional separate layers associated with other suitable tasks.”
[0055], “FIG. 4B is an example of a second flexible multi-task neural network prediction model 400B applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment.”
[0057], “FIG. 4C is an example of a third multi-task neural network prediction model 400C applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment…In various embodiments (not shown), the prediction model 400C includes more separate layers in the domain D1 460 and the domain D2 470.”
In paras. 50, 55, and 57, Xiong discloses various examples of a flexible multi-task neural network prediction model. Specifically, in para. 57, Xiong discloses that the model may include more layers [determining a machine learning architecture] for the domains [that corresponds to the content actions group].
the machine learning architecture including a feature extraction layer and one or more computational experts models
Xiong, [0035], “The flexible multi-task neutral network prediction model predicts interactions between the viewing user and content items presented by one or more posters (or received from the external systems 230) based on at least three types of layers. The three types of layers include shared layers, independent layers, and separate layers. The shared layers extract common features that are shared across tasks by sharing layers among the prediction of the various tasks. Each independent layer extracts features for a specific task, and the extracted features are not shared across various tasks. The separate layers predict a likelihood of the viewing user performing a specific task associated with the content items.”
In para. 35, Xiong discloses their flexible multi-task neural network prediction model includes three types of layers: (1) shared layers, (2) independent layers, and (3) separate layers. In particular, the (1) shared layers and (2) independent layers extract common task features and specific task features [the machine learning architecture including a feature extraction layer]. Lastly, the (3) separate layers [one or more computational expert models] predict the likelihood of the user performing a task. The separate layers are construed as corresponding to computational expert models because both refer to performing computations to make predictions.
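For illustration only, the three-layer-type structure described in para. 35 (shared layers, independent per-task layers, and separate prediction layers) can be sketched as follows. This is a simplified sketch; the layer sizes, weights, and activation choices are hypothetical and do not appear in Xiong.

```python
import math

def dense(vec, weights):
    # One fully-connected layer: weighted sums passed through a ReLU.
    return [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in weights]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(feature_vector, shared_w, independent_w, separate_w):
    # Shared layers extract common features shared across all tasks.
    shared = dense(feature_vector, shared_w)
    predictions = {}
    for task, indep_w in independent_w.items():
        # Independent layers extract task-specific features, not shared across tasks.
        indep = dense(feature_vector, indep_w)
        # The separate layer predicts the likelihood of the user performing this task.
        logit = sum(w * x for w, x in zip(separate_w[task], shared + indep))
        predictions[task] = sigmoid(logit)
    return predictions
```

In this sketch the shared and independent layers play the role of the feature extraction layer, and each per-task separate layer plays the role of a computational expert model producing a per-task probability.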
determining, based on the content actions group, input data for the machine learning architecture, the input data including profile data of the user
Xiong, [0051], “A feature vector 410 associated with the content item 110 is generated. The feature vector 410 includes features associated with characteristics of the poster Lucy Hall (e.g., information included in Lucy Hall's user profile, Lucy Hall's current location), features associated with characteristics of the content item 110, features associated with characteristics of the viewing user (e.g., information included in the viewing user's user profile and the viewing user's location), and features associated with relationships among the poster, the viewing user and the content item 110.”
[0052], “The feature vector 410 is an input to the shared layers 420A, an input to the independent low layers 430A, and an input to the independent low layer 430B.”
[0050], “Examples of tasks in the task domain D1 460 include liking a content item, or sharing a content item.”
In para. 51, Xiong discloses generating a feature vector associated with the content item [determining], and para. 52 further discloses the feature vector is input into the shared layers [input data for the machine learning architecture]. Also in para. 51, the feature vector is said to include features associated with characteristics of the viewing user's user profile [the input data including profile data of the user]. Lastly, in view of para. 50, the tasks associated with the content item belong to a task domain [based on the content actions group].
executing the feature extraction layer based on the input data to determine output data of the feature extraction layer
As cited above in para. 35, Xiong discloses that the (1) shared layers and (2) independent layers extract common task features and specific task features. As also cited above in para. 52, Xiong discloses inputting the feature vector into the shared layers and independent layers to extract features [executing the feature extraction layer based on the input data to determine output data of the feature extraction layer].
executing, based on the output data of the feature extraction layer, the one or more computational experts models to determine probabilities of the user performing the plurality of actions with respect to one or more content items
Xiong, [0054], “In various embodiments (not shown), the prediction model 400A doesn't include the independent middle layers 440A and 440B. The features outputted from the shared layers 420A and features outputted from the independent low layers 430A are inputs to the separate layer 450A for predicting how likely the viewing user 105 will perform the task A1. The features outputted from the shared layers 420A and features outputted from the independent low layers 430B are inputs to the separate layer 450B for predicting how likely the viewing user 105 will perform the task A2.”
In para. 54, Xiong discloses that the separate layers 450A and 450B take as inputs the features outputted from the shared layers and the independent layers [based on the output data of the feature extraction layer] and compute a prediction [executing the one or more computational experts models] of how likely the viewing user will perform task A1 or task A2 [determine probabilities of the user performing the plurality of actions with respect to one or more content items].
determining, based on the probabilities, a content item of the one or more content items to make accessible to the user via the client application
Xiong, [0065], “The online system 240 applies each feature vector to the retrieved prediction model and predicts likelihood of each task. The online system 240 scores 640 each content item based on predicted likelihood of each task. The online system 240 ranks 650 the plurality of content items based on the scoring, as described above with respect to the content ranking module 340 of FIG. 3.”
[0014], “The content item ranking first is delivered to the viewing user 105 for the opportunity.”
In para. 65, Xiong discloses that each content item is scored and ranked based on the predicted likelihood of each task [determining, based on the probabilities, a content item of the one or more content items], and para. 14 discloses delivering the highest-ranking content item [to make accessible to the user via the client application].
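For illustration only, the scoring and ranking step described in paras. 65 and 14 (score each content item from its per-task predicted likelihoods, rank the items, and deliver the top-ranked item) can be sketched as follows. The per-task weights and item identifiers are hypothetical and do not appear in Xiong.

```python
def score_item(task_probs, task_weights):
    # Score one content item from the predicted likelihood of each task.
    return sum(task_weights[task] * p for task, p in task_probs.items())

def rank_items(item_probs, task_weights):
    # item_probs maps each content item to its per-task probabilities.
    # The top-ranked item is the one delivered to the viewing user.
    return sorted(item_probs,
                  key=lambda item: score_item(item_probs[item], task_weights),
                  reverse=True)
```

Under this sketch, `rank_items(...)[0]` corresponds to the content item made accessible to the user.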
Regarding Claim 11:
As discussed above, Xiong teaches [the] computing system of claim 8, and further discloses:
wherein the machine learning architecture includes one or more extraction layers, the one or more extraction layers including the one or more computational expert models
Xiong, [0035], “The shared layers extract common features that are shared across tasks by sharing layers among the prediction of the various tasks. Each independent layer extracts features for a specific task, and the extracted features are not shared across various tasks.”
[0046], “the model module 330 trains shared layers, independent layers associated with the specific task”
In para. 35, Xiong discloses shared layers and independent layers that extract features [the machine learning architecture includes one or more extraction layers]. These layers are construed as including one or more computational expert models because para. 46 discloses that they are trained [the one or more extraction layers including the one or more computational expert models].
Regarding Claim 12:
As discussed above, Xiong teaches [the] computing system of claim 11, and further discloses:
a first extraction layer having one or more first computational experts models that correspond to a first content action of the plurality of actions, one or more second computational experts models that correspond to a second content action of the plurality of actions, and one or more shared computational experts models; and
Xiong, [0050], “As shown in FIG. 4A, the prediction model 400A includes one or more shared layers 420A, one or more independent low layers 430A associated with a task A1, one or more independent low layers 430B associated with a task A2, one or more independent middle layers 440A associated with the task A1, one or more independent middle layers 440B associated with the task A2…”
[0035], “The shared layers extract common features that are shared across tasks by sharing layers among the prediction of the various tasks. Each independent layer extracts features for a specific task, and the extracted features are not shared across various tasks. The separate layers predict a likelihood of the viewing user performing a specific task associated with the content items.”
In para. 50, Xiong discloses shared layers [one or more shared computational experts models], an independent layer associated with a task A1 [a first extraction layer having one or more first computational experts models that correspond to a first content action of the plurality of actions], and another independent layer associated with another task A2 [one or more second computational experts models that correspond to a second content action of the plurality of actions].
a second extraction layer having one or more first additional computational experts models that correspond to the first content action, one or more second additional computational experts models that correspond to the second content action, and one or more additional shared computational experts models.
Xiong, [0050], “Alternatively (not shown in FIG. 4A), the prediction model 400A includes one or more additional separate layers associated with other suitable tasks.”
Xiong discloses that the prediction model may include additional separate layers associated with other suitable tasks [a second extraction layer], paralleling the structure discussed above [one or more first additional computational experts models that correspond to the first content action, one or more second additional computational experts models that correspond to the second content action, and one or more additional shared computational experts models].
Regarding Claim 17:
Xiong discloses:
One or more non-transitory computer-readable storage media including computer-readable instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to perform operations comprising:
Xiong, [0068], “Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.”
In para. 68, Xiong discloses the embodiments are implemented with a computer-readable medium [one or more non-transitory computer-readable storage media including computer-readable instructions] executed by a computer processor [one or more hardware processors] for performing the steps, operations, or processes described.
receiving user input corresponding to content accessible to a user of a client application
Xiong, [0029], “The action logger 245 receives communications about user actions internal to and/or external to the online system 240, populating the action log 250 with information about user actions.”
[0025], “A user profile in the user profile store 242 may also maintain references to actions by the corresponding user performed on content items in the content store 243 and stored in the action log 250.”
[0016], “In one embodiment, a client device 210 executes an application allowing a user of the client device 210 to interact with the online system 240. For example, a client device 210 executes a browser application to enable interaction between the client device 210 and the online system 240 via the network 220. In another embodiment, a client device 210 interacts with the online system 240 through an application programming interface (API) running on a native operating system of the client device 210…”
In para. 29, Xiong discloses that the action logger of the online system receives communications about user actions [receiving user input], and para. 25 specifies the actions correspond to user actions performed on content items [user input corresponding to content accessible to a user]. Lastly, para. 16 discloses a client device executing an application that allows a user of the client device to interact with the online system [client application].
analyzing the user input to determine a content actions group that corresponds to the user input
Xiong, [0030], “The action log 250 may be used by the online system 240 to track user actions on the online system 240…Users may interact with various objects on the online system 240, and information describing these interactions are stored in the action log 250…Additionally, the action log 250 may record a user's interactions with advertisements on the online system 240 as well as with other applications operating on the online system 240.”
[0031], “The action log 250 may also store user actions taken on an external system 230, such as an external website, and communicated to the online system 240.”
[0050], “Tasks A1 and A2 are tasks in a task domain D1 460 that is a category of tasks performed by a user.”
[0003], “For various tasks (or actions)…”
In paras. 30 and 31, Xiong discloses that information describing users' interactions is stored in the action log, whether the users interact with objects of the online system, with advertisements or other applications on the online system, or with systems external to the online system [analyzing the user input to determine a content actions group that corresponds to the user input]. In view of this, para. 50 discloses an example where tasks (Xiong uses "tasks" and "actions" interchangeably, e.g., para. 3) A1 and A2 belong to the task domain D1 [content actions group].
the content actions group including a plurality of actions that users of the client application perform in relation to content accessible by the client application
Xiong, [0030], “Examples of interactions with objects include: commenting on posts, sharing links, and checking-in to physical locations via a mobile device, accessing content items, and any other interactions.”
[0031], “Hence, the action log 250 may record information about actions users perform on the external system 230, including webpage viewing histories, advertisements that were engaged, purchases made, and other patterns from shopping and buying.”
In paras. 30 and 31, Xiong discloses examples of user interactions with objects of the online system and with systems external to the online system [a plurality of actions that users of the client application perform in relation to content accessible by the client application].
determining a machine learning architecture that corresponds to the content actions group
Xiong, [0050], “FIG. 4A is an example of a first flexible multi-task neural network prediction model 400A applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment…Alternatively (not shown in FIG. 4A), the prediction model 400A includes one or more additional separate layers associated with other suitable tasks.”
[0055], “FIG. 4B is an example of a second flexible multi-task neural network prediction model 400B applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment.”
[0057], “FIG. 4C is an example of a third multi-task neural network prediction model 400C applied to the content item 110 for predicting one or more specific tasks between the viewing user 105 and the content item 110, in accordance with an embodiment…In various embodiments (not shown), the prediction model 400C includes more separate layers in the domain D1 460 and the domain D2 470.”
In paras. 50, 55, and 57, Xiong discloses various examples of a flexible multi-task neural network prediction model. Specifically, in para. 57, Xiong discloses that the model may include more layers [determining a machine learning architecture] for the domains [that corresponds to the content actions group].
the machine learning architecture including a feature extraction layer and one or more computational experts models
Xiong, [0035], “The flexible multi-task neutral network prediction model predicts interactions between the viewing user and content items presented by one or more posters (or received from the external systems 230) based on at least three types of layers. The three types of layers include shared layers, independent layers, and separate layers. The shared layers extract common features that are shared across tasks by sharing layers among the prediction of the various tasks. Each independent layer extracts features for a specific task, and the extracted features are not shared across various tasks. The separate layers predict a likelihood of the viewing user performing a specific task associated with the content items.”
In para. 35, Xiong discloses their flexible multi-task neural network prediction model includes three types of layers: (1) shared layers, (2) independent layers, and (3) separate layers. In particular, the (1) shared layers and (2) independent layers extract common task features and specific task features [the machine learning architecture including a feature extraction layer]. Lastly, the (3) separate layers [one or more computational expert models] predict the likelihood of the user performing a task. The separate layers are construed as corresponding to computational expert models because both refer to performing computations to make predictions.
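For illustration only, the layer arrangement described above can be sketched as follows (hypothetical dimensions, weights, and task names; not Xiong's actual model): a shared trunk extracts common features, per-task independent layers extract task-specific features, and each separate head outputs a per-task likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical dimensions: 8-feature input vector, 4-unit layers.
W_shared = rng.normal(size=(8, 4))                            # shared layers: features common to all tasks
W_indep = {t: rng.normal(size=(8, 4)) for t in ("A1", "A2")}  # independent layers: one per task
w_sep = {t: rng.normal(size=8) for t in ("A1", "A2")}         # separate (prediction) layers

def predict(x):
    shared = relu(x @ W_shared)                      # common features shared across tasks
    probs = {}
    for task in ("A1", "A2"):
        indep = relu(x @ W_indep[task])              # features extracted for this task only
        head_in = np.concatenate([shared, indep])    # shared + task-specific features
        probs[task] = sigmoid(head_in @ w_sep[task]) # likelihood of performing the task
    return probs

probs = predict(rng.normal(size=8))
```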
determining, based on the content actions group, input data for the machine learning architecture, the input data including profile data of the user
Xiong, [0051], “A feature vector 410 associated with the content item 110 is generated. The feature vector 410 includes features associated with characteristics of the poster Lucy Hall (e.g., information included in Lucy Hall's user profile, Lucy Hall's current location), features associated with characteristics of the content item 110, features associated with characteristics of the viewing user (e.g., information included in the viewing user's user profile and the viewing user's location), and features associated with relationships among the poster, the viewing user and the content item 110.”
[0052], “The feature vector 410 is an input to the shared layers 420A, an input to the independent low layers 430A, and an input to the independent low layer 430B.”
[0050], “Examples of tasks in the task domain D1 460 include liking a content item, or sharing a content item.”
In para. 51, Xiong discloses generating a feature vector associated with the content item [determining], and para. 52 further discloses that the feature vector is input into the shared layers [input data for the machine learning architecture]. In para. 51, the feature vector is also said to include features associated with characteristics of the viewing user’s user profile [the input data including profile data of the user]. Lastly, in view of para. 50, the content item is disclosed as being associated with tasks in a task domain [based on the content actions group].
executing the feature extraction layer based on the input data to determine output data of the feature extraction layer
As cited above in para. 35, Xiong discloses that the (1) shared layers and (2) independent layers extract common task features and specific task features. Also cited above in para. 52, Xiong discloses inputting the feature vector into the shared layers and independent layers to extract features [executing the feature extraction layer based on the input data to determine output data of the feature extraction layer].
executing, based on the output data of the feature extraction layer, the one or more computational experts models to determine probabilities of the user performing the plurality of actions with respect to one or more content items
Xiong, [0054], “In various embodiments (not shown), the prediction model 400A doesn't include the independent middle layers 440A and 440B. The features outputted from the shared layers 420A and features outputted from the independent low layers 430A are inputs to the separate layer 450A for predicting how likely the viewing user 105 will perform the task A1. The features outputted from the shared layers 420A and features outputted from the independent low layers 430B are inputs to the separate layer 450B for predicting how likely the viewing user 105 will perform the task A2.”
In para. 54, Xiong discloses that the separate layers 450A and 450B take the features outputted from the shared layers and the independent layers [based on the output data of the feature extraction layer] and compute a prediction [executing the one or more computational expert models] of how likely the viewing user will perform task A1 or task A2 [determine probabilities of the user performing the plurality of actions with respect to one or more content items].
determining, based on the probabilities, a content item of the one or more content items to make accessible to the user via the client application
Xiong, [0065], “The online system 240 applies each feature vector to the retrieved prediction model and predicts likelihood of each task. The online system 240 scores 640 each content item based on predicted likelihood of each task. The online system 240 ranks 650 the plurality of content items based on the scoring, as described above with respect to the content ranking module 340 of FIG. 3.”
[0014], “The content item ranking first is delivered to the viewing user 105 for the opportunity.”
In para. 65, Xiong discloses each content item is scored and ranked based on the predicted likelihood of each task [determining, based on the probabilities, a content item of the one or more content items], and para. 14 discloses delivering the highest ranking content item [to make accessible to the user via the client application].
Regarding Claim 20:
As discussed above, Xiong teaches [the] one or more non-transitory computer-readable storage media of claim 17, and further discloses:
the input data includes at least one of information indicating content viewing history of the user or demographic information of the user
Xiong, [0025], “Each user of the online system 240 is associated with a user profile, which is stored in the user profile store 242. Examples of information stored in a user profile include biographic, demographic, and other types of descriptive information…”
As discussed above with respect to claim 17, Xiong discloses that the user profile is part of the input data, and the user profile is disclosed to include user demographic information.
the content item includes advertising content related to an item available for purchase via the client application
Xiong, [0056], “Examples of the tasks in the domain D2 470 include clicking on an advertisement inserted in the opportunity 150, or making a purchase on the advertisement inserted in the opportunity 150.”
Xiong discloses a task in a domain, such as making a purchase on the advertisement inserted in the opportunity [advertising content related to an item available for purchase via the client application].
the plurality of actions includes
viewing a page related to the item,
Xiong, [0003], “For various tasks (or actions)… Examples of tasks may include…visiting a website via clicking on a content item…”
purchasing the item,
Xiong, [0003], “For various tasks (or actions)… Examples of tasks may include… purchasing a content item.”
adding the item to a cart of the user for a potential future purchase of the item, and
Xiong, [0003], “For various tasks (or actions)… Examples of tasks may include…placing a content item in a virtual shopping cart…”
performing a sign up action with regard to the item.
Xiong, [0003], “For various tasks (or actions)… Examples of tasks may include… following on a content item…”
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Jha et al. (US 20220114644), hereinafter Jha.
Regarding Claim 6:
As discussed above, Xiong teaches [the] method of claim 1, and Xiong further discloses:
wherein the input data includes first data that corresponds to continuous data
Xiong, [0043], “A feature vector associated with a content item describes characteristics of the content item…Examples of characteristics of the content item may include…posted time…”
Xiong discloses posted time as a characteristic of the content item [includes first data that corresponds to continuous data].
second data that corresponds to discrete values
Xiong, [0051], “A feature vector 410 associated with the content item 110 is generated. The feature vector 410 includes features associated with characteristics of the poster Lucy Hall… Examples of characteristics of the content item 110 may include…interactions between the content item 110 and additional users (e.g., the number of ‘likes,’ the number of ‘comments’, and the number of ‘shares’).”
Xiong discloses characteristics such as the number of likes, number of comments, and number of shares [second data that corresponds to discrete values].
Xiong does not explicitly disclose:
third data that corresponds to sparse data, the sparse data corresponding to a set of data values with a majority of the set of data values being zero
However, in the same field, analogous art Jha teaches:
third data that corresponds to sparse data, the sparse data corresponding to a set of data values with a majority of the set of data values being zero
Jha, [0042], “FIG. 3 shows an example of an encoder applied to a symbol in a set of sparse category features…The sparse binary representation 320 may be an array having a specified width d. As such, the sparse binary representation 320 may include a number of positions from [0 to d−1]. In this example, the sparse binary representation 320 may be initialized to values of zero. The encoder may be applied to the input symbol 300 to identify a number of indices to set to a value of one in the sparse binary representation 320. Each of the positions in the sparse binary representation 320 corresponding to the indices specified by the encoder may then be set to a value of one.”
Jha teaches sparse category features [third data that corresponds to sparse data]. As depicted in FIG. 3, the sparse binary representation predominantly contains values of zero [a set of data values with a majority of the set of data values being zero].
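For illustration only, the encoding described in Jha's FIG. 3 can be sketched as follows (hypothetical encoder using hashing to pick the indices; Jha does not specify this particular index-selection mechanism): an all-zero array of width d has a small number of positions set to one, leaving the majority of values zero.

```python
import hashlib

def encode_sparse(symbol: str, width: int = 16, k: int = 3) -> list[int]:
    """Hypothetical sketch of Jha's encoder: initialize an array of width
    `width` to zero, then set up to `k` hashed index positions to one."""
    bits = [0] * width
    for i in range(k):
        h = hashlib.sha256(f"{symbol}:{i}".encode()).hexdigest()
        bits[int(h, 16) % width] = 1  # set the selected position to one
    return bits

rep = encode_sparse("movie:inception")
```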
Xiong, Jha, and the instant application are analogous art because they are all directed to recommendation systems and machine learning.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Xiong with Jha in order to fix the size of data and create a more robust system. “Each set of sparse features may represent a different category of items, for example movies, books, content items, etc., or different interactions by users with these categories. In general, the symbols of the sparse category features 204 are processed by an encoder 220 to generate a sparse binary representation 222. Where the sparse category features 204 may describe the sparse features as a list of symbols (e.g., a subset from the alphabet of symbols), the sparse binary representation 222 represents the subset of symbols as a binary representation, such as a binary array, thus converting the list of an unknown size to a binary representation, typically of a fixed size” (Jha, [0038]). Jha states that using sparse binary representation allows the conversion of a list with an unknown (possibly large size) into a known, fixed size. This creates a robust system that deals with known, fixed lengths of data compared to unknown, dynamic lengths of data.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Jha, as applied to claim 6 above, further in view of Bragdon (US 10841257), and further in view of Riesa et al. (US 20190347323), hereinafter Riesa.
Regarding Claim 7:
As discussed above, Xiong in view of Jha (hereinafter Xiong and Jha) teach [the] method of claim 6, and Xiong further teaches:
providing, by the computing system, the…input data to the feature extraction layer
Xiong, [0052], “The feature vector 410 is an input to the shared layers 420A, an input to the independent low layers 430A, and an input to the independent low layer 430B.”
[0035], “The shared layers extract common features that are shared across tasks by sharing layers among the prediction of the various tasks. Each independent layer extracts features for a specific task, and the extracted features are not shared across various tasks.”
In para. 52, Xiong discloses inputting the feature vectors into the layers [providing the input data]. Para. 35 specifies the layers extract the features [the feature extraction layer].
Jha further teaches:
performing, by the computing system, a second normalization process with respect to the third data to produce modified third data;
Jha, [0042], “FIG. 3 shows an example of an encoder applied to a symbol in a set of sparse category features…The sparse binary representation 320 may be an array having a specified width d. As such, the sparse binary representation 320 may include a number of positions from [0 to d−1]. In this example, the sparse binary representation 320 may be initialized to values of zero. The encoder may be applied to the input symbol 300 to identify a number of indices to set to a value of one in the sparse binary representation 320. Each of the positions in the sparse binary representation 320 corresponding to the indices specified by the encoder may then be set to a value of one.”
Jha teaches using an encoder to create a sparse binary representation for sparse category features [performing…a second normalization process with respect to the third data to produce modified third data].
combining, by the computing system…the modified third data to produce modified input data
Jha, [0040], “The prediction model 230 may include one or more aggregation layers (not shown) that combine the dense vector representation 212 and category vector representation 226. In one example, the aggregation layer concatenates the dense vector representation 212 with the category vector representation 226.”
Jha teaches an aggregation layer for combining and concatenating vector representations.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Xiong with Jha in order to fix the size of data and create a more robust system. “Each set of sparse features may represent a different category of items, for example movies, books, content items, etc., or different interactions by users with these categories. In general, the symbols of the sparse category features 204 are processed by an encoder 220 to generate a sparse binary representation 222. Where the sparse category features 204 may describe the sparse features as a list of symbols (e.g., a subset from the alphabet of symbols), the sparse binary representation 222 represents the subset of symbols as a binary representation, such as a binary array, thus converting the list of an unknown size to a binary representation, typically of a fixed size” (Jha, [0038]). Jha states that using sparse binary representation allows the conversion of a list with an unknown (possibly large size) into a known, fixed size. This creates a robust system that deals with known, fixed lengths of data compared to unknown, dynamic lengths of data.
Xiong and Jha do not explicitly disclose:
performing, by the computing system, a first normalization process with respect to the second data to produce modified second data
combining, by the computing system, the first data, the modified second data, and the modified third data to produce modified input data
However, in the same field, analogous art Bragdon teaches:
performing, by the computing system, a first normalization process with respect to the second data to produce modified second data
Bragdon, [39],“A sparse calibration layer of the neural network system obtains the discretized feature values and generates refined feature values by normalizing the discretized feature values to generate normalized feature values…”
Bragdon teaches normalizing discretized feature values [performing…a first normalization process with respect to the second data] to generate normalized feature values [to produce modified second data].
Xiong, Jha, Bragdon, and the instant application are analogous art because they are all directed to digital engagement and machine learning.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Xiong and Jha with Bragdon in order to increase the robustness of the system. “A sparse calibration layer of the neural network system obtains the discretized feature values and generates refined feature values by normalizing the discretized feature values to generate normalized feature values and applying a type-specific bias value to each normalized feature value. This layer has two main extras compared to other sparse layers out there, namely an online normalization scheme that prevents gradients from exploding, and a per-feature bias to distinguish between the absence of a feature and the presence of a zero-valued feature” (Bragdon, [39]). Bragdon teaches that their normalization scheme prevents gradients from exploding and aids to distinguish features that are absent versus features that are zero-valued.
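For illustration only, the normalization-with-bias idea cited from Bragdon can be sketched as follows (hypothetical min-max formula and bias value; Bragdon's actual online normalization scheme is not specified here): discretized values are scaled to a unit range, and a per-feature bias is added only where the feature is present, so an absent feature (zero) is distinguishable from a present zero-valued feature.

```python
import numpy as np

def normalize_with_bias(values, present, bias):
    """Sketch of the idea in Bragdon [39] (hypothetical formula): scale
    discretized feature values to [0, 1], then add a per-feature bias
    only where the feature is actually present."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    normed = (v - v.min()) / span if span > 0 else np.zeros_like(v)
    return normed + bias * np.asarray(present, dtype=float)

out = normalize_with_bias([0, 5, 10], present=[1, 1, 1], bias=0.1)
```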
Xiong and Jha in view of Bragdon do not explicitly disclose:
combining, by the computing system, the first data, the modified second data, and the modified third data to produce modified input data
However, in the same field, analogous art Riesa teaches:
combining…the first data [and] the modified second data…to produce modified input data
Riesa, [0049], “Lastly, the embedding layer 302 is configured to concatenate each learned embedding matrix Eg corresponding to each received feature input 210 (e.g., feature group g) to form the embedding layer h0=vec[XgEg|∀g]. A final size of the embedding layer 302 may include a sum of all embedded feature sizes. The feed-forward neural network model 300 may use both discrete and continuous features.”
Xiong, Jha, Bragdon, Riesa, and the instant application are analogous art because they are all directed to feature vectors for machine learning.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Xiong, Jha, and Bragdon with Riesa in order to allow the neural network model to use multiple types of input data. “Lastly, the embedding layer 302 is configured to concatenate each learned embedding matrix Eg corresponding to each received feature input 210 (e.g., feature group g) to form the embedding layer h0=vec[XgEg|∀g]. A final size of the embedding layer 302 may include a sum of all embedded feature sizes. The feed-forward neural network model 300 may use both discrete and continuous features” (Riesa, [0049]). Riesa states that the neural network model may be able to use both discrete and continuous features by concatenating their embeddings. This allows the neural network model to work with more types of data rather than being confined to a singular type of data.
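For illustration only, the concatenation cited from Riesa [0049] can be sketched as follows (hypothetical embedding table and feature sizes): a discrete feature is looked up in a learned embedding matrix and concatenated with the continuous features to form a single input layer h0.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a learned embedding table for a discrete feature
# (e.g. a category id) and raw continuous features, concatenated to
# form a single input layer h0, in the spirit of Riesa [0049].
E = rng.normal(size=(10, 4))            # embedding matrix for 10 category ids

def build_h0(category_id: int, continuous: np.ndarray) -> np.ndarray:
    embedded = E[category_id]           # look up the discrete feature's embedding
    return np.concatenate([embedded, continuous])  # h0 = [embedding | continuous]

h0 = build_h0(3, np.array([0.5, -1.2]))
```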
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Ma et al. (US 20220358347), hereinafter Ma.
Regarding Claim 9:
As discussed above, Xiong teaches [the] computing system of claim 8, but does not explicitly disclose:
wherein the feature extraction layer includes a deep and cross network having a plurality of cross layers coupled to a deep network
However, in the same field, analogous art Ma teaches:
wherein the feature extraction layer includes a deep and cross network having a plurality of cross layers coupled to a deep network
Ma, Figure 4 (media_image1.png, greyscale; elements 425 and 431):
Figure 4 of Ma depicts a deep and cross network, with a cross network coupled in parallel with a deep network.
Xiong, Ma, and the instant application are analogous art because they are all directed to recommendation systems.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Xiong with Ma in order to increase the robustness of feature extraction of a content object. “As discussed herein, and below in relation to FIG. 6, framework 400 receives input features 402, which can be a set of features identified, extracted and/or derived from a content object(s), as discussed below in more detail in relation to FIG. 6 (e.g., 212 hand-crafted features xo). These features 402 are fed into two sub-networks—cross network model 404 and deep network model 406. As discussed below, the cross network 404 learns feature crossings in an efficient manner, while the deep network 406 implicitly learns useful features. The output of these features are then concatenated used to train the DCN model 408” (Ma, [0070]).
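For illustration only, a cross layer of a deep & cross network can be sketched with the standard DCN update (Ma's exact parameterization may differ): each layer computes x_{l+1} = x0 · (wᵀx_l) + b + x_l, so stacking layers raises the degree of the learned feature crossings.

```python
import numpy as np

def cross_layer(x0, x_l, w, b):
    """One cross layer of a deep & cross network (standard DCN form;
    Ma's exact parameterization may differ):
        x_{l+1} = x0 * (w . x_l) + b + x_l
    which learns explicit feature crossings of bounded degree."""
    return x0 * (w @ x_l) + b + x_l

x0 = np.array([1.0, 2.0])
w = np.array([0.5, -0.5])
b = np.zeros(2)
x1 = cross_layer(x0, x0, w, b)   # first cross layer applied to the input
x2 = cross_layer(x0, x1, w, b)   # stacking layers raises the crossing degree
```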
Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Zhao et al. (US 20220253680), hereinafter Zhao.
Regarding Claim 18:
As discussed above, Xiong teaches [the] one or more non-transitory computer-readable storage media of claim 17, but does not explicitly disclose:
the one or more computational experts models include one or more feed forward neural networks
the one or more computational experts models are coupled to one or more gating networks
the one or more gating networks include a plurality of softmax layers
However, in the same field, analogous art Zhao teaches:
the one or more computational experts models include one or more feed forward neural networks
Zhao, [0039], “the expert neural networks can have identical architectures, e.g., a feed-forward network architecture, but with different parameters.”
Zhao teaches that the expert neural networks [one or more computational expert models] can have a feed-forward network architecture [include one or more feed forward neural networks].
the one or more computational experts models are coupled to one or more gating networks
Zhao, [0041], “The neural network 102 includes a gating subsystem 110 that is configured to select, based on respective weights computed for each of one or more of the plurality of the expert neural networks, only a proper subset of the expert neural networks.”
[0043], “the gating subsystem 110 can include multiple gating engines, e.g., each associated with a respective task. That is, the gating subsystem 110 can use different gating engines to select different subsets of expert neural networks for use in generating a respective MoE output for each of the multiple different tasks that the neural network 102 is configured to perform.”
In para. 41, Zhao teaches that there is a gating subsystem which selects one or more of the expert neural networks [one or more computational expert models are coupled to one or more gating networks]. Para. 43 further specifies the gating subsystem has multiple gating engines for selecting expert neural networks.
the one or more gating networks include a plurality of softmax layers
Zhao, [0056], “The system applies, by using the gating subsystem, a softmax function to a set of gating parameters having adjustable (i.e., learnable) values to generate a respective softmax score for each of one or more of the plurality of expert neural networks (step 204).”
In view of para. 43 above, which teaches multiple gating engines, para. 56 teaches that the gating subsystem applies a softmax function. Since there are multiple gating engines, each applying its own softmax function, Zhao is construed as teaching a plurality of softmax layers [the one or more gating networks include a plurality of softmax layers].
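For illustration only, the per-engine softmax cited from Zhao [0056] can be sketched as follows (hypothetical gating parameter values): each task's gating engine applies a softmax to its own set of learnable gating parameters to produce weights over the expert networks.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

# Hypothetical gating parameters: one learnable score per expert network,
# with a separate gating engine (and hence a separate softmax) per task.
gating_params = {"taskA": [2.0, 0.5, -1.0], "taskB": [0.0, 1.0, 1.0]}

expert_weights = {task: softmax(p) for task, p in gating_params.items()}
```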
Xiong, Zhao, and the instant application are analogous art because they are all directed to machine learning tasks.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Xiong with Zhao in order to increase accuracy of predictions while only running required neural networks. “In some examples, multi-task learning can be applied to train the neural network on multiple tasks simultaneously by using a shared MoE subnetwork (and possibly other components of the neural network). Generally, multi-task learning is aimed at leveraging the information of multiple, mutually related tasks to make more accurate predictions for the individual tasks. In these examples, the gating subsystem can be configured to implement a respective gating engine for each task which adaptively determines whether to share expert neural networks with other tasks” (Zhao, [0083]).
Regarding Claim 19:
As discussed above, Xiong teaches [the] one or more non-transitory computer-readable storage media of claim 17, and further discloses:
one or more extraction layers that include the one or more computational experts models
Xiong, [0035], “The shared layers extract common features that are shared across tasks by sharing layers among the prediction of the various tasks. Each independent layer extracts features for a specific task, and the extracted features are not shared across various tasks.”
[0046], “the model module 330 trains shared layers, independent layers associated with the specific task”
In para. 35, Xiong discloses shared layers and independent layers that extract features, which are construed as one or more computational expert models since para. 46 discloses that they are trained [one or more extraction layers that include the one or more computational experts models].
Xiong does not explicitly disclose:
one or more gating networks coupled to the one or more computational experts models
one or more additional computational layers that are coupled to the one or more gating networks
wherein the one or more additional computational layers determine the probabilities based on output obtained from the one or more gating networks
However, in the same field, analogous art Zhao teaches:
one or more gating networks coupled to the one or more computational experts models
Zhao, [0041], “The neural network 102 includes a gating subsystem 110 that is configured to select, based on respective weights computed for each of one or more of the plurality of the expert neural networks, only a proper subset of the expert neural networks.”
[0043], “the gating subsystem 110 can include multiple gating engines, e.g., each associated with a respective task. That is, the gating subsystem 110 can use different gating engines to select different subsets of expert neural networks for use in generating a respective MoE output for each of the multiple different tasks that the neural network 102 is configured to perform.”
In para. 41, Zhao teaches that there is a gating subsystem which selects one or more of the expert neural networks [one or more gating networks coupled to the one or more computational expert models]. Para. 43 further specifies the gating subsystem has multiple gating engines for selecting expert neural networks.
one or more additional computational layers that are coupled to the one or more gating networks
Zhao, [0041], “The gating subsystem 110 combines the expert outputs generated by the selected expert neural networks in accordance with the respective weights for the selected expert neural networks to generate one or more MoE outputs.”
[0048], “In some implementations, the gating subsystem 110 then provides the MoE output 132 (or MoE output 133) as input to the second neural network layer 106 (or the third neural network layer 107) in the neural network for further processing so as to generate the output of the neural network 120 for a corresponding task from the network input 101.”
In paras. 41 and 48, Zhao discloses a second neural network layer (or third) [one or more additional computational layers] that are connected to the gating subsystem [coupled to the one or more gating networks].
wherein the one or more additional computational layers determine the probabilities based on output obtained from the one or more gating networks
As cited above in paras. 41 and 48, Zhao teaches that the gating subsystem provides the MoE output [output obtained from the one or more gating networks] to the second neural network layer to continue processing and generate an output [the one or more additional computational layers determine the probabilities].
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Xiong with Zhao in order to increase accuracy of predictions while only running required neural networks. “In some examples, multi-task learning can be applied to train the neural network on multiple tasks simultaneously by using a shared MoE subnetwork (and possibly other components of the neural network). Generally, multi-task learning is aimed at leveraging the information of multiple, mutually related tasks to make more accurate predictions for the individual tasks. In these examples, the gating subsystem can be configured to implement a respective gating engine for each task which adaptively determines whether to share expert neural networks with other tasks” (Zhao, [0083]).
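For illustration only, the flow cited from Zhao [0041] and [0048] can be sketched as follows (hypothetical expert outputs, gating weights, and head weights): the gating weights combine the expert outputs into an MoE output, which an additional computational layer maps to a probability.

```python
import numpy as np

# Hypothetical illustration of Zhao [0041]/[0048]: gating weights combine
# expert outputs into an MoE output; a further layer produces a probability.
expert_outputs = np.array([[0.2, 0.8],    # expert 1 output
                           [0.6, 0.4],    # expert 2 output
                           [0.9, 0.1]])   # expert 3 output
gate_weights = np.array([0.5, 0.3, 0.2])  # from the gating engine's softmax

moe_output = gate_weights @ expert_outputs          # weighted combination

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w_head = np.array([1.0, -1.0])                      # additional computational layer
probability = sigmoid(w_head @ moe_output)          # final probability
```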
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEVEN PHUNG whose telephone number is (703) 756-1499. The examiner can normally be reached Monday-Thursday: 9:00AM-4:00PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAMRAN AFSHAR can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/STEVEN PHUNG/Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125