Prosecution Insights
Last updated: April 19, 2026
Application No. 18/166,415

SEQUENTIAL MODEL FOR DETERMINING USER REPRESENTATIONS

Status: Non-Final OA (§101, §102, §103, §112)

Filed: Feb 08, 2023
Examiner: LEE, WILLIAM MICHAEL
Art Unit: 2145
Tech Center: 2100 — Computer Architecture & Software
Assignee: Pinterest Inc.
OA Round: 1 (Non-Final)

Grant Probability: Favorable
Estimated OA Rounds: 1-2
Estimated Time to Grant: 3y 3m

Examiner Intelligence

Career Allow Rate: 0% (grants only 0% of cases; 0 granted / 0 resolved; -55.0% vs TC average)
Interview Lift: +0.0% (minimal lift, with vs. without interview, across resolved cases with an interview)
Typical Timeline: 3y 3m average prosecution
Career History: 8 total applications across all art units; 8 currently pending

Statute-Specific Performance

§101: 23.3% (-16.7% vs TC avg)
§102: 3.3% (-36.7% vs TC avg)
§103: 46.7% (+6.7% vs TC avg)
§112: 26.7% (-13.3% vs TC avg)

Tech Center averages are estimates; based on career data from 0 resolved cases.
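The per-statute deltas above are mutually consistent: each pair implies the same Tech Center average. A quick check (figures copied from the panel; the convention that delta = examiner rate minus TC average is an assumption about how the dashboard computes it):

```python
# Recover the implied Tech Center average from each (rate, delta) pair
# shown in the Statute-Specific Performance panel above.
rates = {          # statute: (examiner rate %, delta vs TC avg %)
    "101": (23.3, -16.7),
    "102": (3.3, -36.7),
    "103": (46.7, +6.7),
    "112": (26.7, -13.3),
}

for statute, (rate, delta) in rates.items():
    tc_avg = round(rate - delta, 1)   # delta = rate - tc_avg  =>  tc_avg = rate - delta
    print(f"§{statute}: examiner {rate}% vs implied TC avg {tc_avg}%")
```

Every statute implies the same 40.0% Tech Center average, which suggests the panel compares against one pooled TC baseline rather than per-statute baselines.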

Office Action

Rejections under §101, §102, §103, §112
DETAILED ACTION

The action is in response to the original filing on February 8, 2023. Claims 1-20 are pending and have been considered below. Claims 1, 5, and 16 are independent claims.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on February 8, 2023 is being considered by the examiner.

Drawings

The drawings are objected to because of the following minor informality: in Fig. 2B, reference character “216” is not mentioned in the description. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 11 recites the limitation “the context aware user embedding” in line 9. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the context aware user embedding” will be read as referring to the context aware user embedding determined in claim 9.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1:

Step 1 – Claim 1 is directed to a method: “A computer-implemented method, comprising…”

Step 2A, Prong 1 – A judicial exception is recited in this claim, as it recites mathematical concepts (see MPEP 2106.04(a)(2)(I)): “determining… based at least in part on the sequence of actions, a first user embedding associated with the user that is representative of the user and is configured to predict a plurality of predicted user actions associated with the user” and “determining… based at least in part on the first user embedding and the contextual information, a second user embedding configured to predict a plurality of recommended content items for the user.” To “determine” an embedding is to calculate a low-dimensional vector that is representative of higher-dimensional data.
Hence, determining, based at least in part on a sequence of actions, a first user embedding that is representative of the user and is configured to predict a plurality of predicted user actions associated with the user is a mathematical calculation. Furthermore, determining, based at least in part on the first user embedding and contextual information, a second user embedding configured to predict a plurality of recommended content items for the user is a mathematical calculation.

Step 2A, Prong 2 – The following limitations are additional elements without significantly more than the abstract idea: providing a first sequence of actions associated with a user to a first trained machine learning model as a first input to the first trained machine learning model; using the first trained machine learning model; providing the first user embedding to a second trained machine learning model as a first input to the second trained machine learning model; providing contextual information as a second input to the second trained machine learning model; and using the second trained machine learning model. First and second trained machine learning models used as mere tools to apply an exception are generic elements for performing or applying the abstract idea using a generic computing environment (see MPEP 2106.05(f)). Furthermore, providing a first sequence of user actions as an input to the first trained machine learning model, and the first user embedding and contextual information as inputs to the second trained machine learning model, amounts to insignificant extra-solution activity of data gathering that does not add a meaningful limitation to the “computer-implemented method” (see MPEP 2106.05(g)).
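The rejection's characterization of “determining an embedding” as calculating a low-dimensional vector from higher-dimensional data can be pictured concretely. A minimal sketch (NumPy; the table size, embedding dimension, and mean-pooling are illustrative assumptions, not the application's actual model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned lookup table: 1,000 possible action types, each
# mapped to an 8-dimensional vector (the "low-dimensional" space).
action_table = rng.normal(size=(1000, 8))

def user_embedding(action_ids):
    """Reduce a variable-length, categorical action sequence (the
    higher-dimensional data) to one fixed 8-dim vector by mean-pooling
    the learned vectors of the actions the user took."""
    return action_table[action_ids].mean(axis=0)

seq = [17, 42, 42, 901]        # a user's recent action IDs (made up)
emb = user_embedding(seq)
print(emb.shape)               # (8,)
```

Whatever architecture actually produces the vector, the output is a fixed-length numeric summary of the action history, which is the sense in which the examiner treats the step as a mathematical calculation.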
Step 2B – As discussed above, the additional elements using the first trained machine learning model and using the second trained machine learning model amount to mere instructions to apply the judicial exception using a generic computing environment and are not indicative of significantly more. The additional elements providing a first sequence of actions associated with a user to a first trained machine learning model as a first input to the first trained machine learning model, providing the first user embedding to a second trained machine learning model as a first input to the second trained machine learning model, and providing contextual information as a second input to the second trained machine learning model amount to well-understood, routine, and conventional (WURC) activity similar to “receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information)” (see MPEP 2106.05(d)). These limitations, taken alone or in combination, fail to provide an inventive concept. Thus, the claim is not patent eligible.

Claims 2-4 recite limitations which further narrow the abstract idea of claim 1 by specifying more details of the mathematical concepts that occur.

Regarding claim 2, describing the first user embedding as being determined offline, in batch, is an attempt to limit the field of use without significantly more (see MPEP 2106.05(h)).

Regarding claim 3, this claim further limits the abstract idea of claim 1 to be based on a mathematical concept: incrementally determining an updated user embedding for the user based at least in part on a subset of the first sequence of user actions and the second sequence of user actions. Incrementally determining an updated user embedding for the user based at least in part on a subset of the first and second sequences of user actions is a mathematical calculation.
Furthermore, obtaining a second sequence of user actions associated with the user since the first user embedding was determined is still insignificant extra-solution activity, as discussed in Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754 (see MPEP 2106.05(g)).

Regarding claim 4, describing the first and second trained machine learning models as a single, end-to-end learned model is an attempt to limit the field of use without significantly more (see MPEP 2106.05(h)).

Regarding claim 5:

Step 1 – Claim 5 is directed to a system comprising a processor (a manufacture, see MPEP 2106.03): “A computing system, comprising: one or more processors…”

Step 2A, Prong 1 – A judicial exception is recited in this claim, as it recites mathematical concepts (see MPEP 2106.04(a)(2)(I)): “determine, for each user action of the first sequence of user actions, a corresponding embedding… determine a plurality of embeddings from the corresponding embeddings determined for each user action of the first sequence of user actions… determine, for each of the plurality of embeddings, a corresponding predicted action… and determine, based at least in part on the corresponding predicted actions, a user embedding that is representative of the user and is configured to predict a plurality of user actions over a defined timeframe.” To “determine” an embedding is to calculate a low-dimensional vector that is representative of higher-dimensional data. Hence, determining a corresponding embedding for each user action of the first sequence of user actions, and determining, based at least in part on the corresponding predicted actions, a user embedding that is representative of the user and is configured to predict a plurality of user actions over a defined timeframe, are mathematical calculations. Furthermore, determining a plurality of embeddings from the corresponding embeddings involves calculating a subset of embeddings, which is a mathematical concept. Finally, determining, for each of the plurality of embeddings, a corresponding predicted action involves calculating a likelihood of a predicted action using data from each of the plurality of embeddings, or low-dimensional vectors, which is a mathematical concept.

Step 2A, Prong 2 – The following limitations are additional elements without significantly more than the abstract idea: “one or more processors; and a memory storing program instructions that, when executed by the one or more processors, cause the one or more processors at least: receive a first sequence of user actions associated with a user…” One or more processors and a memory storing program instructions, used as mere tools to apply an exception, are generic elements for performing or applying the abstract idea using a generic computing environment (see MPEP 2106.05(f)). Furthermore, receiving a first sequence of user actions amounts to insignificant extra-solution activity of data gathering that does not add a meaningful limitation to the “computing system” (see MPEP 2106.05(g)).

Step 2B – As discussed above, the additional elements one or more processors and a memory storing program instructions are mere tools to apply the judicial exception using a generic computing environment and are not indicative of significantly more. The additional element receive a first sequence of user actions associated with a user amounts to well-understood, routine, and conventional (WURC) activity similar to “receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information)” (see MPEP 2106.05(d)). These limitations, taken alone or in combination, fail to provide an inventive concept. Thus, the claim is not patent eligible.
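The four “determine” steps the rejection maps out for claim 5 (a per-action embedding, a selected plurality of embeddings, a predicted action per embedding, and an aggregated user embedding) can be sketched end to end. Everything here, including the dimensions, the linear prediction head, and argmax-based aggregation, is an illustrative assumption rather than the claimed system:

```python
import numpy as np

rng = np.random.default_rng(1)
EMB_DIM, N_ACTION_TYPES = 8, 50

# Hypothetical parameters: an embedding row per action type, and a
# linear head that scores each possible next action from an embedding.
emb_table = rng.normal(size=(N_ACTION_TYPES, EMB_DIM))
pred_head = rng.normal(size=(EMB_DIM, N_ACTION_TYPES))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

actions = [3, 17, 8, 44]                 # "first sequence of user actions"
embs = emb_table[actions]                # (1) a corresponding embedding per action
plurality = embs[-2:]                    # (2) a plurality selected from them
preds = [softmax(e @ pred_head) for e in plurality]   # (3) a predicted-action
                                                      #     distribution per embedding
top_actions = [int(np.argmax(p)) for p in preds]
user_emb = emb_table[top_actions].mean(axis=0)        # (4) user embedding from the
                                                      #     corresponding predictions
print(user_emb.shape)   # (8,)
```

Each step is a vector or matrix operation, which is precisely why the Office Action frames the claim as reciting mathematical concepts under Step 2A, Prong 1.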
Claims 6-15 recite limitations which further narrow the abstract idea of claim 5 by specifying more details of the mathematical concepts that occur.

Regarding claim 6, this claim further limits the abstract idea of claim 5 to be based on a mathematical concept: incrementally determine an updated embedding for the user based at least in part on a subset of the first sequence of user actions and the second sequence of user actions. Incrementally determining an updated embedding for the user based at least in part on a subset of the first and second sequences of user actions is a mathematical calculation. Furthermore, receiving a second sequence of user actions associated with the user since the user embedding was determined is still insignificant extra-solution activity, as discussed in Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754 (see MPEP 2106.05(g)).

Regarding claim 7, specifying wherein the user embedding is further configured to predict a classification associated with the user does not overcome the rejection of claim 5, as modifying the embedding does not alter the mathematical nature of the “determining” step.

Regarding claim 8, this claim further limits the abstract idea of claim 6 to be based on a mathematical concept: prior to incrementally determining the updated embedding, determine that a number of actions included in the second sequence of user actions exceeds a threshold value. Determining whether a number of actions exceeds a threshold value is a mathematical calculation.

Regarding claim 9, this claim further limits the abstract idea of claim 5 to be based on a mathematical concept: determine a context aware user embedding based at least in part on the user embedding and the contextual information. Determining a context aware user embedding based at least in part on the user embedding and the contextual information is a mathematical calculation. Furthermore, receiving contextual information associated with the user is still insignificant extra-solution activity, as discussed in Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754 (see MPEP 2106.05(g)).

Regarding claim 10, specifying wherein the contextual information includes at least one of: a query submitted by the user; an interest associated with the user; or a content item with which the user has interacted does not overcome the rejection of claim 9, as modifying the contextual information does not alter the mathematical nature of the “determining” step.

Regarding claim 11, this claim further limits the abstract idea of claim 5 to be based on a mental process: identify, based at least in part on the context aware user embedding, one or more content items from a corpus of content items to present to the user in response to a request for content items. For example, given a simple enough, low-dimensional context aware user embedding and a small enough corpus of content items, a human could reasonably identify one or more content items to present to the user in response to a request for content items (see MPEP 2106.04(a)(2)(III)).

Regarding claim 12, describing wherein the user embedding is generated offline in batch and the context aware user embedding is generated in real-time is an attempt to limit the field of use without significantly more (see MPEP 2106.05(h)).

Regarding claim 13, this claim further limits the abstract idea of claim 5 to be based on a mathematical concept: wherein a causal mask is applied to the first sequence of user actions. Applying a causal mask to a first sequence of user actions involves performing matrix operations to form a lower-triangular matrix, which is a mathematical calculation.
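The causal mask of claim 13 is indeed the standard lower-triangular matrix operation used in self-attention over sequences: position i may attend only to positions j ≤ i, so the model cannot “look ahead” at future actions. A minimal sketch with made-up attention scores:

```python
import numpy as np

L = 5                                                   # sequence length
scores = np.random.default_rng(2).normal(size=(L, L))   # raw attention scores

# Lower-triangular causal mask: True where attention is allowed.
mask = np.tril(np.ones((L, L), dtype=bool))
masked = np.where(mask, scores, -np.inf)                # block future positions

# Row-wise softmax; exp(-inf) = 0, so future positions get zero weight.
weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(np.allclose(np.triu(weights, k=1), 0.0))   # True
```

This is consistent with the examiner's characterization: the mask is constructed and applied entirely through elementwise matrix operations.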
Regarding claim 14, specifying wherein the predicted plurality of user actions includes representations of content items with which the user is expected to engage does not overcome the rejection of claim 5, as modifying the predicted plurality of user actions does not alter the mathematical nature of the “determining” step.

Regarding claim 15, specifying wherein the first sequence of user actions includes representations of content items with which the user has engaged does not overcome the rejection of claim 5, as modifying the first sequence of user actions does not alter the mathematical nature of the “determining” step.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 and 4 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Zhao et al. (US 20210256366 A1, hereinafter Zhao).
Regarding claim 1, Zhao teaches a computer-implemented method (¶6 “Certain embodiments provide a method”), comprising:

providing a first sequence of actions associated with a user to a first trained machine learning model as a first input to the first trained machine learning model (Fig. 1 – 104, 112, ¶28 “The first model 104 and the second model 106 are deep-learning models… that have been trained,” ¶30 “time dependent data 112 stored in the user database 108 is routinely accessed by the first model 104… The time dependent data 112 includes a set of time segments… The time dependent data 112 is generated by identifying user activity for a time segment (e.g., the most recent 24 hours),” ¶33 “For example, the time dependent data 112 can include user search keywords entered by the user in the past 24 hour time segment,” wherein “time dependent data” including “user search keywords entered… in the past 24 hour time segment” encompasses a first sequence of actions associated with a user).

Zhao teaches determining, using the first trained machine learning model and based at least in part on the sequence of actions (Fig. 1 – 104, 112, 124, ¶29 “user data includes time dependent data 112,” ¶32 “Using both the user data and the application data, the first model 104 of the recommendation engine 102 generates a relevance score 124 for each available third-party application by transforming the user data and application data into vectors,” wherein generating a “relevance score” using “the user data,” which includes “time dependent data,” encompasses determining, using the first trained machine learning model and based at least in part on the sequence of actions), a first user embedding associated with the user that is representative of the user (Fig. 1 – 104, 126, ¶35 “Once vector representation(s) are generated for each vector, the first model 104 concatenates the vectors generated and passes the concatenated vector through an activation function (e.g., Softmax) to generate a relevance score for each available third-party application. The relevance score indicates whether the third-party application is of relevance or interest to the user… Based on the relevance scores of the third-party applications, the first model 104 determines those third-party applications that are relevant applications 126 to the user,” ¶36 “Once the first model 104 identifies the relevant applications 126, the relevant applications are arranged by row… from highest ranking application of relevance to the lowest ranking application of relevance,” wherein the “relevant applications… arranged by row… from highest… to the lowest” encompass a first user embedding associated with the user that is representative of the user, given the broadest reasonable interpretation of an embedding, which includes a one-dimensional vector representation of data, or in this case, how “relevant” applications are to a user) and is configured to predict a plurality of predicted user actions associated with the user (¶23 “a relevance score for each available third-party application, indicating whether a user is predicted to find relevance (or interest) in that third-party application,” wherein “a relevance score… indicating whether a user is predicted to find relevance (or interest)” encompasses a predicted user action associated with the user; hence a row of applications arranged by “relevance score” is configured to predict a plurality of user actions).

Zhao teaches providing the first user embedding to a second trained (Fig. 1 – 106, ¶28) machine learning model as a first input to the second trained machine learning model (Fig. 1 – 106, 126, ¶36 “The relevant applications 126, arranged by row from highest ranking application of relevance to the lowest ranking application of relevance, is input to the second model 106 to generate a connection score 130 for the relevant applications 126”).

Zhao teaches providing contextual information as a second input to the second trained machine learning model (Fig. 1 – 106, 118, 120, ¶37 “second model 106 accesses unstructured data such as user search history 118, application topic 120, user reviews of an application, tag data, application description, etc.”).

Zhao teaches determining, using the second trained machine learning model and based at least in part on the first user embedding and the contextual information (Fig. 1 – 106, 126, 130, ¶37 “In order to generate the connection scores 130 for each relevant application 126, the second model 106 accesses… unstructured data…,” wherein the “second model” using “unstructured data” to generate “connection scores” for “each relevant application” encompasses determining, using the second trained machine learning model and based at least in part on the first user embedding and the contextual information), a second user embedding configured to predict a plurality of recommended content items for the user (Fig. 1 – 106, 126, 130-134, ¶38 “The second model 106 generates an engagement score 132 for each relevant third-party application based on combining the connection score 130 of the relevant third-party application with the corresponding relevance score 124 (e.g., multiplying the relevance score and the connection score). Based on the engagement score 132, the recommendation engine 102 determines the top-X third-party applications to recommend to the user,” wherein “the top-X third-party applications to recommend” encompass a second user embedding configured to predict a plurality of recommended content items for the user, given the broadest reasonable interpretation of an embedding as discussed above).
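Zhao's two-stage scoring, as cited (¶32-¶38), reduces to a relevance score and a connection score multiplied into an engagement score, with the top-X items by engagement recommended. A toy sketch with invented application names and scores:

```python
# Sketch of Zhao's scheme: the first model yields a relevance score per
# application, the second a connection score; engagement is their
# product (¶38), and the top-X by engagement are recommended.
relevance  = {"app_a": 0.9, "app_b": 0.6, "app_c": 0.3}   # made-up scores
connection = {"app_a": 0.2, "app_b": 0.8, "app_c": 0.9}

engagement = {app: relevance[app] * connection[app] for app in relevance}
top_x = sorted(engagement, key=engagement.get, reverse=True)[:2]
print(top_x)   # ['app_b', 'app_c']
```

Note how the product reorders the candidates: app_a leads on relevance alone but falls behind once its weak connection score is factored in, which is the point of Zhao's second stage.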
Regarding claim 4, Zhao teaches the computer-implemented method of claim 1 (and thus the rejection of claim 1 is incorporated). Zhao further teaches wherein the first trained machine learning model and the second trained machine learning model are implemented as a single, end-to-end learned model (Fig. 1 – 102, 104, 106, Fig. 7 – 702-704, 716, ¶28 “The recommendation engine 102 includes a first model 104 and a second model 106. The first model 104 and the second model 106 are deep-learning models (e.g., recurrent neural networks) that have been trained to generate values indicative of a prediction of user interest in and retention of, respectively, third-party applications,” ¶55, ¶58, ¶65. Figure 1 depicts a single RECOMMENDATION ENGINE with a FIRST MODEL linked to a SECOND MODEL. An end-to-end model, when given its broadest reasonable interpretation, is a model trained to map raw inputs into desired outputs; one of ordinary skill in the art would recognize that the RECOMMENDATION ENGINE, which maps “user data” and “application data” to “personalized recommendations to the user,” encompasses a single, end-to-end learned model).

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Batal et al. (“Multi-Channel Sequential Behavior Networks for User Modeling in Online Advertising,” hereinafter Batal).

Regarding claim 2, Zhao teaches the computer-implemented method of claim 1 (and thus the rejection of claim 1 is incorporated). Zhao fails to teach wherein the first user embedding is determined offline, in batch. However, Batal, in the same field of endeavor, teaches this limitation (Page 3, Col. 2, ¶4 “Once the entire MC-SBN model is trained end-to-end from logs, its user encoder network gets applied in an offline batch mode to keep updating user vectors as events are streamed through the system,” Page 6, Col. 2, ¶3 “All compared models, baselines and MC-SBN, use 128 vectors to represent the user and ad embeddings,” wherein “user… embeddings” that are represented by “user vectors” updated in an “offline batch mode” encompass wherein the first user embedding is determined offline, in batch). Zhao and Batal are analogous to the claimed invention, as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a method that combined the teachings of Zhao and Batal.
The motivation to do so, as stated by Batal, is to design a model “to be practical for large-scale and latency-sensitive serving systems… which serves tens of millions of users, processes hundreds of thousands of user events per seconds, while ensures that serving latency is within tens of milliseconds” (Batal, Page 1, Col. 2, ¶3).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Wu et al. (“Rethinking Lifelong Sequential Recommendation with Incremental Multi-Interest Attention,” hereinafter Wu).

Regarding claim 3, Zhao teaches the computer-implemented method of claim 1 (and thus the rejection of claim 1 is incorporated). Zhao fails to teach obtaining a second sequence of user actions associated with the user since the first user embedding was determined. However, Wu, in the same field of endeavor, teaches this limitation (Page 3, Col. 2, ¶1 “we have the updated sequence [𝑣1, 𝑣2, · · · , 𝑣𝐿, 𝑣𝐿+1]. We then use this sequence to predict the next item (or the next set of items) 𝑢 might click… we compute a preference score of user 𝑢 for each item 𝑣… The preference scores are then sorted to retrieve the top-𝑘 items as the candidate set for 𝑢… When the user 𝑢 generates more actions 𝑣𝐿+2, 𝑣𝐿+3, · · · , they are appended to the user’s historical sequence,” wherein the “preference scores” for each item encompass the first user embedding when given its broadest reasonable interpretation, as discussed above with respect to the “relevant applications… arranged by row” of Zhao).

Wu teaches incrementally determining an updated embedding for the user based at least in part on a subset of the first sequence of user actions and the second sequence of user actions (Page 3, Col. 2, ¶1 “When the user 𝑢 generates more actions 𝑣𝐿+2, 𝑣𝐿+3, · · · , they are appended to the user’s historical sequence, and the same recommendation procedure is repeated for online inference with respect to the user’s real-time evolving behaviors,” Page 4, Col. 1, ¶3 “The recommender should be able to continuously (incrementally) update the user representation to adapt to the new click behaviors generated by the user,” wherein “the same recommendation procedure” is implied to encompass determining an updated embedding for the user). Zhao and Wu are analogous to the claimed invention, as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a method that combined the teachings of Zhao and Wu. The motivation to do so, as stated by Wu, is to design “a novel incremental self-attention based method for lifelong sequential recommendation, which goes beyond the limitations of RNN-based memory networks, while also possesses their ability to incrementally update the user representation for online inference” (Wu, Page 2, Col. 2, ¶2).

Claims 5, 7, 9-11, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Ulanov et al. (US 20240020345 A1, hereinafter Ulanov), and further in view of Fujimura et al. (US 20230341831 A1, hereinafter Fujimura).

Regarding claim 5, Zhao teaches a computing system, comprising: one or more processors (¶7 “a processor of a computing system”). Zhao teaches a memory storing program instructions (¶88 “the functions may be stored or transmitted over as one or more instructions or code on a computer-readable medium… Computer-readable media include… computer storage media”) that, when executed by the one or more processors (¶85 “the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware… including… a circuit… or processor,” ¶89 “The computer-readable media may comprise a number of software modules.
The software modules include instructions that, when executed by an apparatus such as a processor, cause the processing system to perform various functions”), cause the one or more processors at least: receive a first sequence of user actions associated with a user (Fig. 1 – 102, 112, ¶30 “the recommendation engine 102 can retrieve time dependent data 112,” ¶33).

Zhao fails to teach determine, for each user action of the first sequence of user actions, a corresponding embedding. However, Ulanov, in the same field of endeavor, teaches this limitation (Fig. 3 – 310-350, ¶36 “a user interacts with content items on webpages 310. For example, when considering products for purchase, a user might view a webpage 310 with one or more products and accompanying textual content 320,” ¶37 “Word embeddings 330 for some or all of the words from the textual content 320 are obtained,” ¶38 “only a subset of the words of the textual content 320 are used to generate a content embedding 350,” wherein “a content embedding” generated from “textual content” viewed by a user encompasses a corresponding embedding for each user action).

Ulanov teaches determine a plurality of embeddings from the corresponding embeddings determined for each user action of the first sequence of user actions (Fig. 3 – 350, Fig. 4A – 410, 420A-C, 450A-C, ¶41 “the content embeddings 350 that are used to generate a user embedding 450 are limited by timeframe in addition to being selected by category. For example, a user embedding 450B may be generated using content embeddings 350 for content that the user 410 added to a cart 420B within a prior time period, e.g., within the last two weeks,” wherein using “content embeddings” that are “limited by timeframe in addition to being selected by category” encompasses determining a plurality of embeddings from the corresponding embeddings).

Zhao teaches determine, for each of the plurality of embeddings, a corresponding predicted action (Fig. 1 – 104, 126, ¶34 “the application identification of a third-party application is converted to a vector representation by the first model 104,” wherein a “vector representation” encompasses an embedding when given its broadest reasonable interpretation, ¶35 “Once vector representation(s) are generated for each vector, the first model 104 concatenates the vectors… to generate a relevance score for each available third-party application. The relevance score indicates whether the third-party application is of relevance or interest to the user… Based on the relevance scores of the third-party applications, the first model 104 determines those third-party applications that are relevant applications 126 to the user,” wherein “vector representation(s)” encompass a plurality of embeddings, and a “relevance score for each… third-party application” encompasses a corresponding predicted action, given that a relevance score predicts “whether the third-party application is of relevance or interest to the user,” or how likely a user is to use an application).

Zhao teaches determine, based at least in part on the corresponding predicted actions, a user embedding that is representative of the user and is configured to predict a plurality of user actions (Fig. 1 – 104, 126, ¶36 “Once the first model 104 identifies the relevant applications 126, the relevant applications are arranged by row… from highest ranking application of relevance to the lowest ranking application of relevance,” ¶29 “The first model 104… determines whether a third-party application is relevant or of interest to the user (e.g., determining whether the user is likely to click on or connect to the third-party application if recommended by the recommendation engine),” wherein “the relevant applications… arranged by row” encompass a user embedding that is representative of the user, given the broadest reasonable interpretation of an embedding, and a row of applications “of relevance” that determines “whether the user is likely to click on or connect to the third-party application” encompasses predicting a plurality of user actions).

However, Zhao fails to teach to predict a plurality of user actions over a defined timeframe. Fujimura, in the same field of endeavor, teaches a model configured to predict a plurality of user actions over a defined timeframe (Fig. 5, ¶83 “predicted state changes may be information indicating user actions… and occurrence probabilities at every predetermined period of time (at every estimated occurrence time). The predicted state changes shown in FIG. 5 show the occurrence probability within a time span of less than 30 seconds from a predetermined timing… the occurrence probability within a time span of 30 seconds or more and less than 3 minutes from the predetermined timing, and the occurrence probability within a time span of 1 minute or more and less than 5 minutes from the predetermined timing”).

Zhao, Ulanov, and Fujimura are analogous to the claimed invention, as all are from the same field of endeavor of machine learning.
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a system that combined the plurality of embeddings of Ulanov and the predicted plurality of user actions over a defined timeframe of Fujimura with the plurality of embeddings and user embedding configured to predict a plurality of user actions of Zhao, respectively. The motivation to do so is “to better tailor content and product recommendations to the interests of a user” (Ulanov, ¶4) and to increase the probability that predicted future actions match the user’s intention (Fujimura ¶39 “operations to be performed are determined based on the plurality of user’s future actions and the predicted times at which the plurality of user’s future actions are predicted to be performed, and it is therefore possible to increase the possibility that the appliance control performed automatically matches the user’s intention”). Regarding claim 7, Zhao in view of Ulanov and further in view of Fujimura teaches the computing system of claim 5 (and thus the rejection of claim 1 is incorporated). Zhao teaches wherein the user embedding is further configured to predict a classification associated with the user (Fig. 1 – 104, 126 ¶35 “The relevance score indicates whether the third-party application is of relevance or interest to the user. For example, if the application data regarding a rarely used application by the user was entered into the first model 104, the relevance score would be low for that third-party application (as well as other third-party applications associated with the same topic), reflecting the user's disinterest in the application. Based on the relevance scores of the third-party applications, the first model 104 determines those third-party applications that are relevant applications 126 to the user,” wherein “interest to the user” or “the user’s disinterest” encompasses a classification associated with the user). 
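For illustration only (no part of the rejection turns on it), the time-span prediction quoted from Fujimura ¶83 can be sketched as below. The bucket labels and non-overlapping boundaries are simplifying assumptions, since Fujimura's quoted spans overlap; all names are hypothetical.

```python
def time_bucket(seconds_ahead):
    """Map a predicted occurrence time to a coarse time-span label
    (simplified, non-overlapping stand-ins for Fujimura's spans)."""
    if seconds_ahead < 30:
        return "<30s"
    if seconds_ahead < 180:
        return "30s-3m"
    if seconds_ahead < 300:
        return "3m-5m"
    return ">=5m"

def predicted_state_changes(predictions):
    """Group (action, seconds_ahead, probability) triples by time bucket,
    yielding user actions with occurrence probabilities per period."""
    out = {}
    for action, t, p in predictions:
        out.setdefault(time_bucket(t), []).append((action, p))
    return out
```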
Regarding claim 9, Zhao in view of Ulanov and further in view of Fujimura teaches the computing system of claim 5 (and thus the rejection of claim 5 is incorporated). Zhao teaches wherein the program instructions, when executed by the one or more processors, further cause the one or more processors at least: receive contextual information associated with the user (Fig. 1 – 106, 118, 120, 130, ¶37 “In order to generate the connection scores 130 for each relevant application 126, the second model… accesses additional types of application data… such as user search history 118, application topic 120, user reviews of an application, tag data, application description, etc.”). Zhao teaches and determine a context aware user embedding based at least in part on the user embedding and the contextual information (Fig. 1 – 102, 106, 126, 130-134, ¶38 “The second model 106 generates an engagement score 132 for each relevant third-party application based on combining the connection score 130 of the relevant third-party application with the corresponding relevance score 124 (e.g., multiplying the relevance score and the connection score). Based on the engagement score 132, the recommendation engine 102 determines the top-X third-party applications to recommend to the user,” wherein “the top-X third-party applications to recommend” based on the “connection score” for “each relevant application” generated from “additional types of application data” encompasses a context aware user embedding based at least in part on the user embedding and the contextual information, given the broadest reasonable interpretation of an embedding as discussed above). Regarding claim 10, Zhao in view of Ulanov and further in view of Fujimura teaches the computing system of claim 9 (and thus the rejection of claim 9 is incorporated). Zhao teaches wherein the contextual information includes at least one of: a query submitted by the user (Fig. 
1 – 106, 118, ¶37 “the second model 106 accesses unstructured data such as user search history 118”); an interest associated with the user (¶37 “user reviews of an application”); or a content item with which the user has interacted (Fig. 1 – 120, ¶37 “user reviews” implies a user has already interacted with an application or content item). Regarding claim 11, Zhao in view of Ulanov and further in view of Fujimura teaches the computing system of claim 5 (and thus the rejection of claim 5 is incorporated). Zhao teaches wherein the program instructions, when executed by the one or more processors, further cause the one or more processors at least: identify, based at least in part on the context aware user embedding, one or more content items from a corpus of content items to present to the user in response to a request for content items (Fig. 1 – 102, 132, 134, ¶38 “Based on the engagement score 132, the recommendation engine 102 determines the top-X third-party applications to recommend to the user. The recommended applications 134 are displayed to the user. In some cases, the recommended application 134 are displayed upon user request”). Regarding claim 14, Zhao in view of Ulanov and further in view of Fujimura teaches the computing system of claim 5 (and thus the rejection of claim 5 is incorporated). Zhao teaches wherein the predicted plurality of user actions includes representations of content items with which the user is expected to engage (Fig. 
1 – 104, 124-126, ¶29 “The first model 104… determines whether a third-party application is relevant or of interest to the user (e.g., determining whether the user is likely to click on or connect to the third-party application if recommended by the recommendation engine),” ¶32 “the first model 104 of the recommendation engine 102 generates a relevance score 124 for each available third-party application by transforming the user data and application data into vectors,” ¶35 “Once vector representation(s) are generated for each vector, the first model 104 concatenates the vectors generated and passes the concatenated vector through an activation function (e.g., Softmax) to generate a relevance score for each available third-party application”). Regarding claim 15, Zhao in view of Ulanov and further in view of Fujimura teaches the computing system of claim 5 (and thus the rejection of claim 5 is incorporated). Zhao teaches a first sequence of user actions (Fig. 1 – 104, 112, ¶30, ¶33 “For example, the time dependent data 112 can include user search keywords entered by the user in the past 24 hour time segment of the time dependent data 112”). However, Zhao fails to teach wherein the first sequence of user actions includes representations of content items with which the user has engaged. Ulanov teaches representations of content items with which the user has engaged (Fig. 1 – 140, Fig. 3 – 310-350, Fig. 5 – 510, ¶36 “a user interacts with content items on webpages 310. For example, when considering products for purchase, a user might view a webpage 310 with one or more products and accompanying textual content 320. Examples of textual content 320 include product descriptions, user reviews, articles, or any other text accompanying or associated with a content item.
Textual content 320 comprises text that is associated with a content item,” ¶37 “Word embeddings 330 for some or all of the words from the textual content 320 are obtained,” wherein “word embeddings” of “textual content” of a “webpage” that a user has interacted with encompasses representations of content items with which the user has engaged). Zhao and Ulanov are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a system that combined the first sequence of user actions of Zhao with the representations of content items with which the user has engaged of Ulanov. The motivation to do so is “to better tailor content and product recommendations to the interests of a user” (Ulanov, ¶4). Claims 6 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Ulanov and further in view of Fujimura, and further in view of Wu. Regarding claim 6, Zhao in view of Ulanov and further in view of Fujimura teaches the computing system of claim 5 (and thus the rejection of claim 5 is incorporated). The combination of Zhao, Ulanov, and Fujimura fails to teach wherein the program instructions, when executed by the one or more processors, further cause the one or more processors at least: receive a second sequence of user actions associated with the user since the user embedding was determined. Wu teaches this limitation (Page 2, Col. 2, ¶1 “to address the computational and storage inefficiency of the vanilla attention in online inference, we design a novel multi-interest extraction module under our incremental attention framework,” wherein a “module” designed to address “computational and storage inefficiency” implies program instructions, when executed by the one or more processors, Page 3, Col. 2, ¶1 “we have the updated sequence [𝑣1, 𝑣2, · · · , 𝑣𝐿, 𝑣𝐿+1]. 
We then use this sequence to predict the next item (or the next set of items) 𝑢 might click… we compute a preference score of user 𝑢 for each item 𝑣… The preference scores are then sorted to retrieve the top-𝑘 items as the candidate set for 𝑢… When the user 𝑢 generates more actions 𝑣𝐿+2, 𝑣𝐿+3, · · · , they are appended to the user’s historical sequence,” wherein “preference scores” for each item encompasses the first user embedding when given its broadest reasonable interpretation). Wu teaches incrementally determine an updated embedding for the user based at least in part on a subset of the first sequence of user actions and the second sequence of user actions (Page 3, Col. 2, ¶1 “When the user 𝑢 generates more actions 𝑣𝐿+2, 𝑣𝐿+3, · · · , they are appended to the user’s historical sequence, and the same recommendation procedure is repeated for online inference with respect to the user’s real-time evolving behaviors,” Page 4, Col. 1, ¶3 “The recommender should be able to continuously (incrementally) update the user representation to adapt to the new click behaviors generated by the user,” wherein “the same recommendation procedure” is implied to encompass incrementally determine an updated embedding for the user). Zhao and Wu are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a system that combined the teachings of Wu with the processor of Zhao. The motivation to do so, as stated by Wu, is to design “a novel incremental self-attention based method for lifelong sequential recommendation, which goes beyond the limitations of RNN-based memory networks, while also possesses their ability to incrementally update the user representation for online inference” (Wu, Page 2, Col. 2, ¶2). 
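As a purely illustrative sketch of the incremental updating attributed to Wu above: the user representation is updated in place as each new action arrives, without revisiting the full historical sequence. A running mean stands in for Wu's incremental self-attention, and all names are hypothetical.

```python
import numpy as np

class IncrementalUserEmbedding:
    """Running user representation updated as new action embeddings
    arrive; each update is O(1) in the length of the history."""

    def __init__(self, dim):
        self._sum = np.zeros(dim)
        self._count = 0

    def update(self, action_embedding):
        # Append-and-update step for a newly observed action.
        self._sum += np.asarray(action_embedding, dtype=float)
        self._count += 1

    def embedding(self):
        return self._sum / max(self._count, 1)
```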
Regarding claim 8, Zhao in view of Ulanov and further in view of Fujimura and further in view of Wu teaches the computing system of claim 6 (and thus the rejection of claim 6 is incorporated). Wu teaches wherein the program instructions, when executed by the one or more processors, further cause the one or more processors at least: prior to incrementally determining the updated embedding, determine that a number of actions included in the second sequence of user actions exceeds a threshold value (Page 8, Col. 1, ¶1 “we extract the subset of users whose behavior sequences has length greater or equal a threshold 𝑙, we compute the performance of different methods only on this subset of users. We then observe how does the performance vary when we increase the threshold 𝑙,” wherein the number of actions included in the second sequence of user actions is implied to exceed a threshold value so that the total number of actions in a user’s sequence exceeds “a threshold 𝑙”). Zhao and Wu are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a system that combined the teachings of Wu and Zhao. The motivation to do so, as stated by Wu, is to design “a novel incremental self-attention based method for lifelong sequential recommendation, which goes beyond the limitations of RNN-based memory networks, while also possesses their ability to incrementally update the user representation for online inference” (Wu, Page 2, Col. 2, ¶2). Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Ulanov and further in view of Fujimura, and further in view of Batal. Regarding claim 12, Zhao in view of Ulanov and further in view of Fujimura teaches the computing system of claim 9 (and thus the rejection of claim 9 is incorporated). 
Zhao teaches and the context aware user embedding is generated in real-time (Fig. 1 – 106, 132-134, ¶20 “The third-party application(s) recommended to the user include those third-party applications the recommendation engine has determined to be of relevance to the user and that the user will connect and use. In some cases, the recommendation can be displayed to the user upon request or automatically (e.g., in real time),” ¶38 “The second model 106 generates an engagement score 132 for each relevant third-party application… Based on the engagement score 132, the recommendation engine 102 determines the top-X third-party applications to recommend to the user. The recommended applications 134 are displayed to the user. In some cases, the recommended application 134 are displayed upon user request. In other cases, the recommended application 134 are displayed automatically in the user interface,” wherein the “top-X third party applications to recommend” being displayed “automatically (e.g., in real time)” implies that the context aware user embedding must be generated in real-time). However, the combination of Zhao, Ulanov, and Fujimura fails to teach wherein the user embedding is generated offline in batch. Batal, in the same field of endeavor, teaches this limitation (Page 3, Col. 2, ¶4 “Once the entire MC-SBN model is trained end-to-end from logs, its user encoder network gets applied in an offline batch mode to keep updating user vectors as events are streamed through the system,” Page 6, Col. 2, ¶3 “All compared models, baselines and MC-SBN, use 128 vectors to represent the user and ad embeddings,” wherein “user… embeddings” that are represented by “user vectors” updated in an “offline batch mode” encompasses wherein the user embedding is determined offline in batch). Zhao and Batal are analogous to the claimed invention as both are from the same field of endeavor of machine learning. 
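For illustration only, the offline batch mode quoted from Batal can be sketched as a single pass over logged events that updates stored per-user vectors outside the serving path. A running-mean update stands in for the MC-SBN user encoder network; the function and record names are assumptions.

```python
import numpy as np

def batch_update_user_vectors(user_vectors, counts, event_log, encode):
    """One offline pass over streamed events: update each user's stored
    vector in batch rather than at request time."""
    for user, event in event_log:
        vec = encode(event)
        if user not in user_vectors:
            user_vectors[user] = np.zeros_like(vec)
            counts[user] = 0
        counts[user] += 1
        # Incremental mean update: mean += (x - mean) / n
        user_vectors[user] += (vec - user_vectors[user]) / counts[user]
    return user_vectors
```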
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a system that combined the teachings of Zhao and Batal. The motivation to do so, as stated by Batal, is to design a model “to be practical for large-scale and latency-sensitive serving systems… which serves tens of millions of users, processes hundreds of thousands of user events per seconds, while ensures that serving latency is within tens of milliseconds” (Batal, Page 1, Col. 2, ¶3). Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Ulanov and further in view of Fujimura, and further in view of Raffel et al. (“Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer,” hereinafter Raffel). Regarding claim 13, Zhao in view of Ulanov and further in view of Fujimura teaches the computing system of claim 5 (and thus the rejection of claim 5 is incorporated). Zhao teaches a first sequence of user actions (Fig. 1 – 112, ¶30, ¶33). However, the combination of Zhao, Ulanov, and Fujimura fails to teach wherein a causal mask is applied to the first sequence of user actions. Raffel, in the same field of endeavor, teaches wherein a causal mask is applied to a sequence of inputs (Fig. 3, Page 16, ¶2 “When producing the ith entry of the output sequence, causal masking prevents the model from attending to the jth entry of the input sequence for j > i. This is used during training so that the model can’t “see into the future” as it produces its output”). Zhao and Raffel are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a system that combined the causal mask of Raffel with the first sequence of user actions of Zhao. 
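The causal masking quoted from Raffel can be written down directly; this is a generic sketch of the standard construction, not code from any cited reference.

```python
import numpy as np

def causal_mask(n):
    """Lower-triangular attention mask: when producing output position i,
    input positions j > i are masked (1 = may attend, 0 = masked), so the
    model cannot "see into the future" during training."""
    return np.tril(np.ones((n, n), dtype=int))
```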
The motivation to do so is to provide a simple way of training a single flexible model on a wide variety of tasks (Raffel, Pages 2-3 “the text-to-text framework allows us to directly apply the same model, objective, training procedure, and decoding process to every task we consider. We leverage this flexibility by evaluating performance on a wide variety of English-based NLP problems, including question answering, document summarization, and sentiment classification, to name a few”). Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Grbovic et al. (“E-commerce in Your Inbox: Product Recommendations at Scale,” hereinafter Grbovic) in view of Lomada et al. (US 20200107072 A1, hereinafter Lomada). Regarding claim 16, Grbovic teaches a computer-implemented method for training a sequential machine learning model, comprising: obtaining a first sequence of user actions (Page 5, Col. 1, Section 4.1, ¶1 “Our data sets included e-mail receipts sent to users…” Page 5, Col. 2, Section 4.1, ¶3 “More formally, data set Dp… was derived by forming e-mail receipt sequences… for each user… along with their timestamps,” wherein “e-mail receipts” implies a purchase for every receipt, which encompasses a first sequence of user actions). Grbovic teaches determining a point in time within the first sequence of user actions (Page 2, Col. 1, ¶3 “The model was evaluated on a held-out month, where we tested the effectiveness of recommendations”). Grbovic teaches dividing the first sequence of user actions into a first plurality of user actions that were performed prior to the point in time and a second plurality of user actions that were performed after the point in time (Page 5, Col. 2, ¶3 “Predictions were evaluated on a held-out month of user purchases Dp^ts formed in the same manner as Dp,” Fig. 7 depicts “Lookback” days which occur prior to the point in time and “Lookahead days” which occur after the point in time).
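The point-in-time division mapped to Grbovic (lookback versus lookahead) amounts to a simple split of a timestamped sequence; the function name is illustrative.

```python
def split_at_cutoff(actions, cutoff):
    """Divide (timestamp, action) pairs into actions performed before the
    chosen point in time (training inputs) and at or after it (held-out
    labeled positives)."""
    lookback = [a for t, a in actions if t < cutoff]
    lookahead = [a for t, a in actions if t >= cutoff]
    return lookback, lookahead
```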
Grbovic teaches providing the first plurality of user actions to the sequential machine learning model as training inputs (Fig. 7, Page 5, Col. 2, ¶3 “In particular, we measured the number of product purchases that were correctly predicted… For all models, the accuracy was measured separately on each day, based on the recommendations calculated using prior days,” wherein the models described encompass a sequential machine learning model given its broadest reasonable interpretation of a model that receives a sequence of user actions as input, Page 6, Col. 2, ¶1 “In Figure 7 we give results for popular products calculated in the previous 5, 10, 20, and 30 days of training data Dp”). Grbovic teaches providing the second plurality of user actions to the sequential machine learning model as labeled positive training data (Fig. 7 – “Lookahead days,” Page 6, Col. 2, ¶1 “evaluated on the first 1, 3, 7, 15, and 30 days of test data Dp^ts,” wherein “test data” is implied to encompass labeled positive training data, as the model is tested for accuracy, or how many predictions were accurate with respect to training data labeled as “positive”). Grbovic teaches training the sequential machine learning model using the training inputs and the labeled positive training data (Fig. 9, Page 7, Col. 1, Section 4.4, ¶2 “prod2vec-topK was trained using data set Dp…” Page 7, Col. 2, “Results” section, “we evaluated performance of prod2vec for different values of decay factors. In Figure 9 we show prediction accuracy on test data Dp^ts when looking 1, 3, 7, 15, and 30 days ahead. Initial prod2vec predictions were based on the last user purchase in the training data set Dp.”) to predict a set of user actions over a period of time for each corresponding user (Fig. 1, Page 9, Col. 1, Section 5.1, ¶2 “Popular products were used as back-fill for users who do not have prior purchase history.
Following the experimental results, we recalculated popular products every 3 days, with a lookback of 5 days…” Page 9, Col. 2, ¶4 “Given predictions for a certain user, once the user logs into the e-mail client we show a new recommendation after every user action,” wherein the model is trained to predict a set of user actions, or “popular products” the user might purchase, over a period of “every 3 days”). However, Grbovic fails to teach to generate user embeddings that are representative of corresponding users and are configured to predict a set of user actions over a period of time for each corresponding user. Lomada, in the same field of endeavor, teaches training a model using a sequence of user actions (¶32 “users manifest a user trait when interacting with content”) to generate user embeddings that are representative of corresponding users (Fig. 1 – 104, Fig. 2 – 206-208, ¶61 “FIG. 2 also includes the user embeddings system 104 training 206 an LSTM autoencoder model using the user trait sequences. In one or more embodiments, the user embeddings system 104 creates an LSTM autoencoder model… The user embeddings system 104 utilizes the user trait sequences to train the LSTM autoencoder model in a semi-supervised manner… until the LSTM autoencoder model generates learned user embeddings”) and are configured to predict a user action for each corresponding user (Fig. 1 – 104 ¶64 “the user embeddings system 104 can utilize the user embeddings for a user to determine how likely the user is to perform a future action (e.g., click on a URL link) based on the actions of similar users as determined by the user embeddings”). Grbovic teaches generating an executable sequential machine learning model from the trained sequential machine learning model (Page 9, Col. 2, Section 6, ¶1 “Several variants of the prediction models were tested offline and the best candidate was chosen for an online bucket test. 
Following the encouraging bucket test results, we launched the system in production”). Grbovic and Lomada are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a method that combined the generated user embeddings of Lomada with the trained sequential machine learning model of Grbovic. The motivation to do so, as stated by Lomada, is to “more accurately predict the outcome of a task than conventional systems” (Lomada, ¶30). Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Grbovic in view of Lomada and further in view of Ulanov and further in view of White et al. (US 11068935 B1, hereinafter White). Regarding claim 17, Grbovic in view of Lomada teaches the computer-implemented method of claim 16 (and thus the rejection of claim 16 is incorporated). Grbovic teaches wherein training the sequential machine learning model includes: generating, by the sequential machine learning model, a plurality of embeddings that correspond to the first plurality of user actions provided to the sequential machine learning model (Page 2, Col. 1, ¶2 “we propose an approach that embeds products into real-valued, low-dimensional vector space using a neural language model applied to a time series of user purchases. As a result, products with similar contexts (i.e., their surrounding purchases) are mapped to vectors that are nearby in the embedding space,” Fig. 2, Page 3, Col. 2, Section 3.1, ¶1 “The prod2vec model involves learning vector representations of products from e-mail receipt logs”). The combination of Grbovic and Lomada fails to teach determining a subset of the plurality of embeddings. However, Ulanov teaches this limitation (Fig. 3 – 350, Fig. 
4A – 410, 420A-C, 450A-C, ¶41 “the content embeddings 350 that are used to generate a user embedding 450 are limited by timeframe in addition to being selected by category. For example, a user embedding 450B may be generated using content embeddings 350 for content that the user 410 added to a cart 420B within a prior time period, e.g., within the last two weeks,” ¶45 “Content embeddings 350 may be generated for content that a user has interacted with in the past,” wherein the “content embeddings,” which encompass a plurality of embeddings that correspond to user actions, being “limited by timeframe in addition to being selected by category” encompasses determining a subset of the plurality of embeddings). The combination of Grbovic, Lomada, and Ulanov teaches training the sequential learning model (Grbovic, Page 7 Section 4.4) and each embedding of the subset of the plurality of embeddings (Ulanov, Fig. 3 – 350, ¶41). However, the combination of Grbovic, Lomada, and Ulanov fails to teach training the sequential machine learning model to predict a respective user action for each embedding of the subset of the plurality of embeddings. White, in the same field of endeavor, teaches training a machine learning model to predict a respective user action for each embedding (Fig. 7 – 920-980, Col. 17 Lines 21-23, Col. 18 Lines 11-25, Col. 19 Lines 15-30, wherein “a value indicative of a likelihood of a conversion event occurring (e.g., the likelihood of a purchase of a good and/or service)” based on “The position of the website in the 128-dimensional space” encompasses to predict a respective user action for each embedding). Grbovic, Lomada, Ulanov, and White are all analogous to the claimed invention as all are from the same field of endeavor of machine learning.
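For illustration only, the timeframe-and-category selection quoted from Ulanov ¶41 can be sketched as below; the record layout and the mean-pooling step are assumptions, not Ulanov's disclosed implementation.

```python
import numpy as np

def user_embedding_from_subset(content_embeddings, category, window, now):
    """Average only those content embeddings whose interaction matches the
    chosen category and falls within the recent timeframe [now-window, now]."""
    subset = [np.asarray(e["vec"], dtype=float) for e in content_embeddings
              if e["category"] == category and e["time"] >= now - window]
    return np.mean(subset, axis=0) if subset else None
```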
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a method that combined the subset of the plurality of embeddings of Ulanov and the predicted user action for each embedding of White with the plurality of embeddings of Grbovic. The motivation to do so is “to better tailor content and product recommendations to the interests of a user” (Ulanov, ¶4) and to develop a method that “can facilitate delivery of targeted content to user devices in situations in which historic tracking data (e.g., cookie data) is generally unavailable and/or unreliable” (White, Abstract). Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Grbovic in view of Lomada and further in view of Rang et al. (“Data Life Aware Model Updating Strategy for Stream-Based Online Deep Learning”). Regarding claim 18, Grbovic in view of Lomada teaches the computer-implemented method of claim 16 (and thus the rejection of claim 16 is incorporated). Lomada teaches a second sequence of user actions (Lomada, Fig. 1 – 104, Fig. 3 – 310, ¶71 “the user embeddings system 104 updates the user trait data table 310 as information is received. For example, when a user interacts with content or provides user information exhibiting a user trait, the user embeddings system 104 includes the detected user trait in the user trait data table 310,” wherein updating “the user trait data table as information is received” including “the detected user trait” such as “when a user interacts with content” encompasses a second sequence of user actions). However, the combination of Grbovic and Lomada fails to teach further comprising: updating the sequential machine learning model using a second sequence of user actions by using the second sequence of user actions to re-train an initially trained sequential machine learning model to generate a first updated sequential machine learning model. 
Rang, in the same field of endeavor, teaches updating the sequential machine learning model (Page 5, Col. 1, Section 3.2.3, ¶1 “Let m data samples arrive in a sequence,” a machine learning model that receives a sequence of data encompasses a sequential learning model, given its broadest reasonable interpretation) using a second sequence of actions by using the second sequence of actions to re-train an initially trained sequential machine learning model to generate a first updated sequential machine learning model (Fig. 6, Page 7, Col. 1, Section 3.4, ¶1 “Fig. 6 depicts the detailed workflow of our proposed architecture. Three training model stages are included: Batch0, Batch1, and Batch2… During the model training procedure with data sample a0, some new data also arrived at the same time. If the current training is finished and its training metrics are profiled, a new data sample a1… for the next training procedure Batch1 is supposed to be built…” Page 3, Col. 2, ¶1 “The next model training Batch1 is triggered with a newly built data sample and model from the previous training procedure,” Fig. 6 depicts the same “Trained Model” from “Batch 0” being trained again or updated in “Batch 1” with a “Data Sample” containing “New Data” which encompasses using a second sequence of actions to re-train an initially trained sequential machine learning model, wherein to generate a first updated sequential machine learning model is implicit after retraining the “Trained Model”). Lomada teaches a third sequence of user actions (Fig. 1 – 104, Fig. 3A – 310-318, ¶71 “to determining the presence of user traits at timestamp intervals, the user embeddings system 104 updates the user trait data table 310 as information is received,” Fig. 3A depicts “User 2” having different values for “Traits At Timestamp 1,” “Traits At Timestamp 2,” and “Traits At Timestamp 3,” implying a third sequence of user actions that updated the “Traits At Timestamp 3”).
However, Lomada fails to teach and subsequently updating the first updated sequential machine learning model using a third sequence of user actions by using the third sequence of user actions to re-train the initially trained sequential machine learning model to generate a second updated sequential machine learning model. Rang teaches subsequently updating the first updated sequential machine learning model using a third sequence of actions by using the third sequence of actions to re-train the initially trained sequential machine learning model to generate a second updated sequential machine learning model (Fig. 6, Page 3, Col. 2, ¶1 “Batch1 trains the model and prepares data sample a2 for the next training stage Batch2. This training loop would continue until no more new data arrives or no improvement in model quality,” Fig. 6 depicts subsequently updating the first updated sequential machine learning model, in this case the “Trained Model” in “Batch 1,” by using a new “Data Sample” in “Batch 2” to train the same “Trained Model” as depicted in “Batch 0” and “Batch 1,” which encompasses using a third sequence of actions to re-train the initially trained sequential machine learning model, wherein generating a second updated sequential machine learning model is implicit after retraining the “Trained Model” again on a different “Data Sample” from Batch 2). Grbovic, Lomada, and Rang are analogous to the claimed invention as all are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a method that combined the second and third sequences of user actions of Lomada and the first and second updated sequential machine learning models of Rang with the sequential machine learning model of Grbovic.
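For context, the iterative workflow Rang describes (Batch0, Batch1, and Batch2, each stage re-training the model produced by the previous stage on newly arrived data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions only; the function and variable names are hypothetical and appear in neither Rang nor the claims.

```python
# Minimal sketch of an online retraining loop in the style of Rang's
# Batch0 -> Batch1 -> Batch2 workflow (Fig. 6). All names are hypothetical.

def train(model, batch):
    """Train (or re-train) the model on one batch of sequential data."""
    for sample in batch:
        model["updates"] += 1            # stand-in for one optimization step
        model["seen"].append(sample)
    return model

model = {"updates": 0, "seen": []}

# Batch0: initial training on the first sequence of user actions
model = train(model, ["a0_1", "a0_2"])   # initially trained model

# Batch1: a second sequence re-trains the same model
model = train(model, ["a1_1", "a1_2"])   # first updated model

# Batch2: a third sequence re-trains it again
model = train(model, ["a2_1"])           # second updated model
```

In Rang's words, this loop "would continue until no more new data arrives or no improvement in model quality."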
The motivation to do so, as stated by Rang, is to design an online learning method that “improves model quality with continually generated data” (Rang, Page 1, Col. 2, Section 2.2, ¶1).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Grbovic in view of Lomada and further in view of Blanco-Mallo et al. (“On the effectiveness of convolutional autoencoders on image-based personalized recommender systems,” hereinafter Blanco-Mallo).

Regarding claim 19, Grbovic in view of Lomada teaches the computer-implemented method of claim 16 (and thus the rejection of claim 16 is incorporated). Grbovic teaches further comprising: determining a plurality of parameters associated with a plurality of users associated with the sequence of user actions (Page 5, Col. 2, Section 4.2, ¶1 “To get deeper insights into behavior of online users based on their demographic background and geographic location, we segregated users into cohorts based on their age, gender, and location, and looked at their purchasing habits,” wherein “age, gender, and location” encompass a plurality of parameters associated with a plurality of users associated with the sequence of user actions). The combination of Grbovic and Lomada fails to teach determining, based at least in part on the plurality of parameters, that the sequence of user actions is unbalanced with respect to at least one parameter of the plurality of parameters. However, Blanco-Mallo, in the same field of endeavor, teaches this limitation (Page 4, Col. 1, Section A, ¶1 “The data used in this work were collected in 2018 and 2019 from the TripAdvisor reviews published by users about restaurants in cities of different sizes,” wherein “TripAdvisor reviews published by users” encompasses a sequence of user actions, Table III, Page 5, Col. 1, ¶1-2 “Table III shows the number of images per partition for the three datasets, including the ratios between positive and negative samples.
As the datasets are highly unbalanced, a strategy must be applied to reduce its impact on the model performance,” wherein “Positive samples” or “Negative samples” for “cities of different sizes” encompasses based at least in part on the plurality of parameters). Blanco-Mallo teaches and at least one of up-sampling or down-sampling user actions of at least some of the plurality of users based at least in part on the at least one parameter, so as to balance the sequence of user actions with respect to the at least one parameter (Page 5, Col. 1, ¶2-3 “image data augmentation involves expanding the size of the train set by creating modified versions of the original images. The objective of this technique is not only to increase the amount of data available, but also their variability, thus improving the robustness of the learning models. Data augmentation can be applied to all the samples in the train set but… we only over-sampled the minority class…” Page 5, Col. 1, ¶1 “As a result, the imbalance problem is alleviated and the ratios between the positive and the negative classes are very close to 1:1 (1.2:1 for Santiago de Compostela, 1.02:1 for Barcelona, and 1.39:1 for New York)”). Grbovic and Blanco-Mallo are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a method that combined the plurality of parameters of Grbovic with the up-sampling or down-sampling of Blanco-Mallo. The motivation to do so, as stated by Blanco-Mallo, is to design a recommender system that “makes use of the context of the problem, (2) it works better than standard approaches that use a pre-trained convolutional neural network (CNN), and (3) it is less computationally expensive that integrating and fine-tuning a CNN” (Blanco-Mallo, Page 2, Col. 1, ¶1).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Grbovic in view of Lomada and further in view of Solano Gomez (US 20220224683 A1, hereinafter Solano Gomez).

Regarding claim 20, Grbovic in view of Lomada teaches the computer-implemented method of claim 16 (and thus the rejection of claim 16 is incorporated). The combination of Grbovic and Lomada fails to teach further comprising: obtaining a plurality of labeled negative training data. However, Solano Gomez, in the same field of endeavor, teaches this limitation (Fig. 4 – 404, ¶71 “a plurality of positive and a plurality of negative pairs of biometric data are obtained from the set of biometric data associated with a plurality of users performing login sessions”). Solano Gomez teaches and providing the plurality of labeled negative training data to the sequential machine learning model (Fig. 3 – 306, ¶69 “the Siamese network may be trained with pairs of behavior data (e.g., to train for pairwise similarity) of prior users performing logins to a computer system,” ¶71 “training data is obtained from a set of biometric data associated with a plurality of users performing login sessions… a plurality of negative pairs of biometric data are obtained”). Solano Gomez teaches wherein: training the sequential machine learning model is further based on the plurality of labeled negative training data (Fig. 3 – 306, ¶60 “the machine learning model includes a Siamese neural network… the Siamese network may be trained by positive pairs of training data and negative pairs of training data”). Grbovic teaches the second plurality of user actions (Page 6 “test data D_p^ts”). However, Grbovic fails to teach and the plurality of negative training data includes a portion of the second plurality of user actions that were a positive engagement for a different respective user.
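As background for this limitation, the pair-labeling scheme at issue can be sketched as follows: two actions by the same user form a positive pair, while an action that was a positive engagement for one user is paired with a different user's action as negative training data. This is an illustrative sketch only; the function name and data are hypothetical and come from neither Grbovic nor Solano Gomez.

```python
# Hypothetical sketch of building labeled positive/negative pairs for
# Siamese-style training from per-user action sequences.

def build_pairs(actions_by_user):
    """Return (action_a, action_b, label) triples: 1 = same user, 0 = different users."""
    pairs = []
    users = sorted(actions_by_user)
    for u in users:
        acts = actions_by_user[u]
        # positive pairs: consecutive actions by the same user
        for i in range(len(acts) - 1):
            pairs.append((acts[i], acts[i + 1], 1))
    for u in users:
        for v in users:
            # negative pair: one user's action vs. another user's positive
            # engagement (labeled 0 even though it was a "positive"
            # engagement for that other user)
            if u != v and actions_by_user[u] and actions_by_user[v]:
                pairs.append((actions_by_user[u][0], actions_by_user[v][0], 0))
    return pairs

pairs = build_pairs({"u1": ["click", "save"], "u2": ["pin"]})
```

Here u2's "pin" (a positive engagement for u2) appears in pairs labeled 0 when matched against u1's actions, mirroring the claimed use of another user's positive engagements as negative training data.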
Solano Gomez teaches the plurality of negative training data includes a portion of user actions that were a positive engagement for a different respective user (¶60 “a positive pair of biometric data may be a pair of two login behaviors associated with the same user during different sessions to login an account. A negative pair of biometric data may be a pair of two login behaviors associated with two different users during different sessions to login accounts,” wherein “login behaviors associated with two different users” encompasses positive engagement, given its broadest reasonable interpretation of user engagement labeled as “positive” training data for a different user). Grbovic and Solano Gomez are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement a method that combined the second plurality of user actions that were a positive engagement for a different respective user of Grbovic with the negative training data of Solano Gomez. The motivation to do so is to design a machine learning model that is more efficient to train (Solano Gomez, ¶21 “Trained with a large set of prior other users' behaviors associated with logging into computer systems to learn similarity based on the features specific to the domain of user's login behaviors, the machine learning model does away with the conventionally required training with a particular user's historic login behavior to authenticate new login behaviors”).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM MICHAEL LEE whose telephone number is (571) 272-4761. The examiner can normally be reached Monday-Thursday: 8am-5pm, every other Friday 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cesar Paula, can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/W.M.L./
Examiner, Art Unit 2145

/CESAR B PAULA/
Supervisory Patent Examiner, Art Unit 2145

Prosecution Timeline

Feb 08, 2023
Application Filed
Jan 21, 2026
Non-Final Rejection — §101, §102, §103
Mar 30, 2026
Interview Requested
Apr 07, 2026
Applicant Interview (Telephonic)
Apr 09, 2026
Examiner Interview Summary


Prosecution Projections

1-2
Expected OA Rounds
Grant Probability
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
