DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This office action is in response to the amendment filed on 01/20/2026. Claims remain pending in the application. Claims 1, 9, and 17 are independent.
Drawings
Applicant's amendment to the specification corrects previous objections; therefore, the previous objections are withdrawn.
Claim Objections
Applicant's amendment to the claims corrects some of the previous objections; therefore, those objections are withdrawn. The remaining objections are shown below.
Claims 4 and 12 are objected to because of the following informalities:
in Claim 4, lines 1-2; and Claim 12, lines 1-2, "… wherein the label is selected from two or more of the following …" appears to be "… wherein the label is selected from two or more of following …".
Appropriate correction is required.
Claim Rejections - 35 USC § 112
Applicant's amendment to the claims corrects the previous rejections; therefore, the previous rejections are withdrawn.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-4, 7, 9, 11-12, 15, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Flanagan in view of Wu et al. ("Intent-aware Multi-source Contrastive Alignment for Tag-enhanced Recommendation", ARXIV ID: 2211.06370, Nov. 12, 2022), hereinafter Wu.
Independent Claims 1, 9, and 17
Flanagan discloses a method of training a model (Flanagan, ¶¶ [0040] and [0047] with FIG. 2: the application of Differential Privacy encoding applied to the model updates ΔXi, sent from the client or user equipment 100 back to the server 200 and the training of a machine learning model in federated mode with the application of differential privacy to the model updates ΔXi, as applied to the Federated Collaborative Filter (FCF)), comprising:
obtaining, at a client device (Flanagan, ¶ [0030] with 100 in FIG. 1: user equipment or device 100; ¶ [0039] with FIG. 2: one or more user equipment or device 100a-100m, also referred to as client devices) from a server (Flanagan, ¶ [0030] with 200 in FIG. 1: backend server 200; ¶ [0039] with FIG. 2: the backend server 200), a first version of a machine learning model for media content recommendation (Flanagan, ¶¶ [0006]-[0009], [0013]-[0014], and [0017]: download a master machine learning model for generating a user recommendation related to one or more of a use or interaction with an application of the user equipment; generate the user recommendation related to the use of the application (e.g., a video service running on the user equipment) based on the downloaded master machine learning model; the user uses the personalized recommendations that propose video choices to the user based on video preference selections, user demographic and/or gender data, or videos they have previously selected and/or watched through the service; the master machine learning model is downloaded from a backend server associated with an application service to a user equipment; ¶¶ [0031]-[0032] with FIG. 1: download a master machine learning model for generating a user recommendation related to use of an application of the user equipment 100; the master machine learning model can be downloaded from the backend server 200 to the user equipment 100; the user recommendation can provide one or more different options, or recommendations, to the user related to the use of the application or service, wherein the application is a video service; ¶¶ [0040]-[0041] with FIG. 2: the Master model Y for generating a recommendation in the Federated Learning (FL) mode is distributed to all of the user devices 100a-100m from the backend server 200; the Master model Y will be stored locally on the user equipment 100a-100m as Xi; ¶ [0048] with FIGS. 
1-2 and 302 in FIG. 3: a master machine learning model is downloaded 302 to a particular user device, such as user equipment 100 shown in FIG. 1; the master machine learning model can be downloaded from the server 200; as described above with respect to FIG. 2, this can be the master model Y);
recommending, based on local information of the client device, a first set of media contents according to the first version of the machine learning model (Flanagan, ¶¶ [0008]-[0009] and [0017]: generate the user recommendation related to the use of the application based on the downloaded master machine learning model and the data related to one or more of the user of the user equipment or the user interaction with the user equipment; minimize the risk of exposing user data by generating the recommendations on the user equipment; provide a high level of user privacy when the user uses the personalized recommendations that propose video choices to the user based on video preference selections, user demographic and/or gender data, or videos they have previously selected and/or watched through the service; ¶¶ [0031]-[0033] with FIG. 1: generating a user recommendation related to use of an application of the user equipment 100; the user recommendation can provide one or more different options, or recommendations, to the user related to the use of the application or service; when the application is a video service, the data can include information pertaining to a video watched by the user in the video service; another form of data can include information about the user; e.g., the data can include any form of user demographic data; the data can include meta data such as location of the user and the user equipment, a type of the user equipment, user gender, or user age, or any combination thereof; this data is obtained by and stored locally in the user equipment 100; ¶ [0042] with FIG. 
2: using a combination of the locally stored master model Xi, and local user data, such as for example videos the user has previously watched, a set of personalized recommendations for the user of the user equipment 100a-100m can be generated; ¶ [0051]: Huawei video service provides an application to users to run on their mobile device that allows them to watch videos through the service; the service backend is hosted in a cloud service; the video service would like to offer users a personalized recommendation service to propose video choices to users based on videos they have previously watched through the service, as well as other user specific preferences and demographics; the video service would like to provide the highest level of user privacy they can when the user uses the personalized recommendations; the video service decides to use a Collaborative Filter (CF) recommendation algorithm/model and use a Federated Learning mode to build and update the CF model);
determining an update to the machine learning model based on respective interactions of a user with the first set of media contents (Flanagan, ¶¶ [0006], [0013], and [0031]-[0034] with FIG. 1: calculate a model update for the master machine learning model based on the master machine learning model and data related to one or more of the user of the user equipment or the user's interaction with the user equipment; the data, also referred to as user data, can have different types; e.g., the data can include data obtained or recorded from the user's interaction with the application or service; this can include data recorded based on a user's selection of an item or option of the application, or selection of one or more items being recommended; e.g., when the application is a video service, the data can include information pertaining to a video watched by the user in the video service; the data can include user behavioral data and/or user meta data, or any combination thereof; the calculated model update is encoded using an ε-differential privacy mechanism, and the ε-differential privacy encoded model update is then transmitted; ¶¶ [0039] and [0042]-[0043] with FIG. 2: model updates ΔXi are generated from the model Xi, and user data; the model updates ΔXi, to "learn" the model Y are then calculated in the user equipment 100a-100m for each user or client, such as Client 1-Client M, respectively, from the master model Xi, stored locally on a specific user equipment 100a-100m, and the corresponding local user data; Differential Privacy encoding is applied to the model update ΔXi, of a particular user equipment 100a-100m to give E(ΔXi) the DP encoded updates; ¶¶ [0048]-[0049] with FIG. 2 and 304 and 306 in FIG. 3: the machine learning model update is calculated 304, such as for example the model update ΔXi, described above with reference to FIG. 2; the model update ΔXi, is encoded 306 by applying ε-Differential Privacy); and
providing the update to the server (Flanagan, ¶¶ [0006], [0010], [0012]-[0013], [0015]-[0016], and [0018]-[0019]: the model updates uploaded from the user equipment to the backend server; the server apparatus receives a plurality of ε-differential privacy encoded model updates for a master machine learning model; use ε-Differential Privacy to encode the model updates sent from a user equipment to the backend in such a way that it is impossible or very difficult for any agent (including the backend itself) to intercept or view the encoded updates to reverse engineer the encoded updates to extract any useful information about the user data; ¶¶ [0034]-[0035] with FIG. 1: the encoded model update is transmitted to the apparatus 200, referred to herein as the server, or backend server; use ε-Differential Privacy (DP) to encode a model update on the user equipment or device 100 and decode an aggregation of user model updates on the backend server 200; hashing-randomization process is applied to the model updates (which are numbers) and instead of transferring the plain model updates from the user device 100 to the backend server 200, the encoded version is transferred from the user device 100 to the backend server 200; ¶¶ [0044]-[0047] with FIG. 2: the encoded model updates E(ΔXi) are transferred back to the back end server 200 and are aggregated on the server 200 as E(ΔY)=ΣiΔXi; a decoding is applied to E(ΔY) to give an approximation to ΔY; the master model Y is updated as Y=Y+η ΔŶ; the process can continue with the distribution of the updated master model Y, as described above; ¶¶ [0049]-[0050] and [0052] with FIGS. 1-2, 308 in FIG. 3, and FIG. 4: the encoded model update, referred to as E(ΔXi) is then sent 308 to the backend server, such as the backend server 200 of FIGS. 1 and 2; a plurality of encoded model updates E(ΔXi) are received 310 at the backend server, such as backend server 200 illustrated in FIGS. 
1 and 2; the plurality of encoded model updates E(ΔXi) will be for a given master model Y; the plurality of encoded model updates E(ΔXi) will be aggregated 312 and decoded 314, generally as described with respect to FIG. 2; the given master model Y will be updated 316; the video service applies ε-Differential Privacy to encode the model updates; the encoded model updates are then sent, via the cloud service, to the service backend; the service backend aggregates the encoded model updates and decodes the resulting aggregate to calculate an estimate of the actual model updates; in this manner, the privacy of the user is enhanced since the model updates cannot be decoded individually to learn anything about the user),
wherein determining the update to the machine learning model comprises: (Flanagan, ¶¶ [0006], [0013], and [0031]-[0034] with FIG. 1: calculate a model update for the master machine learning model based on the master machine learning model and data related to one or more of the user of the user equipment or the user's interaction with the user equipment; the data, also referred to as user data, can have different types; e.g., the data can include data obtained or recorded from the user's interaction with the application or service; this can include data recorded based on a user's selection of an item or option of the application, or selection of one or more items being recommended; e.g., when the application is a video service, the data can include information pertaining to a video watched by the user in the video service; the data can include user behavioral data and/or user meta data, or any combination thereof; the calculated model update is encoded using an ε-differential privacy mechanism, and the ε-differential privacy encoded model update is then transmitted; ¶¶ [0039] and [0042]-[0043] with FIG. 2: model updates ΔXi are generated from the model Xi, and user data; the model updates ΔXi, to "learn" the model Y are then calculated in the user equipment 100a-100m for each user or client, such as Client 1-Client M, respectively, from the master model Xi, stored locally on a specific user equipment 100a-100m, and the corresponding local user data; Differential Privacy encoding is applied to the model update ΔXi, of a particular user equipment 100a-100m to give E(ΔXi) the DP encoded updates; ¶¶ [0048]-[0049] with FIG. 2 and 304 and 306 in FIG. 3: the machine learning model update is calculated 304, such as for example the model update ΔXi, described above with reference to FIG. 2; the model update ΔXi, is encoded 306 by applying ε-Differential Privacy).
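For illustration only, the federated round that the cited passages of Flanagan describe (a client computes a model update ΔXi from its local model and local user data, encodes it, and sends it to the server, which aggregates the encoded updates and applies Y = Y + η ΔŶ) may be sketched as follows. This is a minimal sketch under stated assumptions and is not Flanagan's implementation: the function names, the linear model, the squared-error gradient step, and the Laplace-noise encoding are all hypothetical stand-ins. In particular, Flanagan refers to a hashing-randomization process for the ε-Differential Privacy encoding, which is not reproduced here.

```python
import math
import random


def local_update(local_model, interactions, lr=0.1):
    """Hypothetical client-side step: one averaged gradient step on squared error.

    local_model  -- list of weights (stands in for the local copy Xi)
    interactions -- list of (features, label) pairs from local user data
    Returns the model update (stands in for delta-Xi).
    """
    grad = [0.0] * len(local_model)
    for features, label in interactions:
        pred = sum(w * x for w, x in zip(local_model, features))
        err = pred - label
        for k, x in enumerate(features):
            grad[k] += err * x
    n = max(len(interactions), 1)
    return [-lr * g / n for g in grad]


def encode_update(update, scale, rng):
    """Assumed stand-in for DP encoding: add zero-mean Laplace noise per entry."""
    encoded = []
    for v in update:
        u = rng.random() - 0.5
        # Inverse-CDF sampling of Laplace(0, scale); the clamp avoids log(0).
        noise = -scale * math.copysign(1.0, u) * math.log(max(1.0 - 2.0 * abs(u), 1e-12))
        encoded.append(v + noise)
    return encoded


def server_aggregate(master, encoded_updates, eta=1.0):
    """Sum the encoded updates and apply Y = Y + eta * (aggregate).

    With zero-mean additive noise, the sum itself serves as the estimate
    of the true aggregate, so no separate decoding step is modelled.
    """
    total = [sum(col) for col in zip(*encoded_updates)]
    return [y + eta * d for y, d in zip(master, total)]
```

In this sketch the server never sees an individual client's raw data, only the (noised) updates; the noise scale trades accuracy of the aggregate against how much any single update reveals.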
Flanagan further discloses an electronic device (Flanagan, ¶ [0030] with 100 in FIG. 1: the user equipment or device 100; ¶ [0053] with 100/200 in FIG. 1 and 1000 in FIG. 5: the apparatus 1000 is appropriate for use in a wireless network and can be implemented in one or more of the user equipment apparatus 100 or the backend server apparatus 200), comprising a computer processor (Flanagan, ¶ [0030] with 102 in FIG. 1: user equipment 100, includes one or more processors 102; ¶¶ [0054]-[0055] with 1002 in FIG. 5: processor or computing hardware 1002) coupled to a computer-readable memory unit (Flanagan, ¶ [0030] with 108 in FIG. 1: connected or coupled to one or more memory devices 108; ¶¶ [0054] and [0056] with 1004 in FIG. 5: coupled to a memory 1004), the memory unit comprising instructions that when executed by the computer processor implements a method for the media content recommendation described above (Flanagan, ¶ [0030] with FIG. 1: the processor 102 is configured to execute non-transitory machine readable program instructions; ¶¶ [0056]-[0057] with FIG. 5: the memory 1004 is configured to store computer program instructions that may be accessed and executed by the processor 1002 to cause the processor 1002 to perform a variety of desirable computer implemented processes or methods such as the methods as described herein).
Flanagan also discloses a computer program product, the computer program product comprising a non-transitory computer readable storage medium (Flanagan, ¶ [0030] with 108 in FIG. 1: one or more memory devices 108; ¶¶ [0054] and [0056] with 1004 in FIG. 5: a memory 1004) having program instructions embodied therewith (Flanagan, ¶ [0057]: the program instructions stored in memory 1004), the program instructions executable by an electronic device (Flanagan, ¶ [0030] with 100 in FIG. 1: the user equipment or device 100; ¶ [0053] with 100/200 in FIG. 1 and 1000 in FIG. 5: the apparatus 1000 is appropriate for use in a wireless network and can be implemented in one or more of the user equipment apparatus 100 or the backend server apparatus 200) to cause the electronic device to perform a method for media content recommendation described above (Flanagan, ¶¶ [0056]-[0057] with FIG. 5: the memory 1004 is configured to store computer program instructions that may be accessed and executed by the processor 1002 to cause the processor 1002 to perform a variety of desirable computer implemented processes or methods such as the methods as described herein).
Flanagan fails to explicitly disclose for a given media content in the first set of media contents, assigning, to the given media content, a label corresponding to an interaction of the user with the given media content, the label indicating a degree of interest of the user in the given media content; determining a difference between the label and a prediction of the given media content.
Wu teaches a system and a method related to recommendation service (Wu, Abstract), wherein for a given media content in the first set of media contents, assigning, to the given media content, a label corresponding to an interaction of the user with the given media content, the label indicating a degree of interest of the user in the given media content; determining a difference between the label and a prediction of the given media content (Wu, Abstract in Page 1: use a self-supervision signal to pair users with the auxiliary information (tags) associated with the items they have interacted with before; for a given item, the model predicts which is the correct pairing between the representations obtained from the users that have interacted with this item and the tags assigned to it; provide an efficient solution, using the auxiliary information (tags) directly to enhance the quality of user and item embeddings; user behavior in recommendation systems is driven by the complex interactions of many factors behind the users’ decision-making processes; to make the pairing process more fine-grained and avoid embedding collapse, propose a user intent-aware self-supervised pairing process where split the user embeddings into multiple sub-embedding vectors; each sub-embedding vector captures a specific user intent via self-supervised alignment with a particular cluster of tags; integrate designed framework with various recommendation models, demonstrating its flexibility and compatibility; Section I in Pages 1-2 with FIG. 
1 in Page 1: recommendation systems are primarily interested in using the user-item interaction history to predict the users’ interests and thereby recommend potential satisfactory items to users; to alleviate the cold-start problem and improve the recommendation quality, auxiliary information (e.g., tags of items, reviews of items, profiles of users) is usually introduced into the item recommendation process to enrich the modeling of user-item interactions; the intent here means the motivation behind a user’s interaction with an item; e.g., a user’s intent behind visiting a restaurant may be to experience good "service" or to "taste" good food, and thus, "service" and "taste" are related to two kinds of user intents; a user may like a restaurant because she likes certain attributes (tags) of it (e.g., good service, wonderful taste, etc.), and it is unnecessary for her to like all the attributes of the restaurant; i.e., a review to a restaurant received from a reviewer with words only related to "good service" indicates that the reviewer has higher degree of interest in the "service" of the restaurant than the degree of interest in the food "taste" of the restaurant; similarly, a review to a restaurant received from a reviewer with words only related to "good taste" indicates that the reviewer has higher degree of interest in the food "taste" of the restaurant than the degree of interest in the "service" of the restaurant; focusing on tag-enhanced recommendation due to the ubiquity and accessibility of tags; propose a method to efficiently bridge the collaborative filtering (CF) signal and the auxiliary semantic information; the core idea is to refine the learned representations through contrastive objectives; in addition to modeling the user-item interaction using Bayesian Personalized Ranking (BPR), construct a self-supervised learning (SSL) task to conduct alignment between multiple sources (i.e., users, items, and tags); employ Intent-aware Representation 
Modeling (IRM) to decompose user and item embeddings into multiple components, where each component captures a specific intent whose semantic meaning is identified by a corresponding tag cluster, derived using a self-supervised end-to-end clustering method; meanwhile, enforce independence of different intents, ensuring that intents are effectively disentangled; introduce an Intent-aware Multi-source Contrastive Alignment (IMCA) module; for each intent and item, first aggregate associated users and tags; then, employ contrastive learning to optimize the alignment of the aggregated representations; also align user and item representations; propose an Intent-aware Set-to-set Alignment (ISA) module to improve the performance of IMCA on cold-start users and long-tail items; for each intent (tag cluster), identify whether two items are similar by evaluating the Jaccard index between the items’ tag sets, limiting our attention to tags in the cluster; then extend the intent-aware contrastive alignment in IMCA to optimize the alignment of aggregated user and tag representations derived from the sets of similar items, rather than individual items; interpret this as an intent-aware augmentation of the user-item interactions; it leads to an augmented interaction graph with a more uniform degree distribution, mitigating the problem of high-degree nodes exerting too much influence and improving learning significantly for low-degree nodes; proposed method are called Intent-aware Multi-source Contrastive Alignment for Tag-enhanced recommendation (IMCAT); Section II in Pages 3-4 and Section V.C in Pages 8-9: summarize the related works from three perspectives in the field of recommender systems: (i) the use of auxiliary tags, (ii) the use of knowledge graph, and (iii) the use of self-supervised learning technique; CFA represents the users by the tags they have interacted with, and then uses a deep neural network to extract the features layer-by-layer to predict the final score; 
DSPR makes use of a Multilayer Perceptron (MLP) to translate tag-based user and item profiles into an abstract embedding space, and then maximizes the similarity between the user representations and the relevant items; HDLPR leverages an autoencoder to compress tag-based user and item profiles into a low-rank feature space; TGCN builds a unified graph containing user nodes, item nodes, and tag nodes; it allows the model to leverage the contextual semantics of multi-hop neighbors in the user-tag-item graph through a message passing paradigm; focus on using tags since they are more accessible in practice; the KG-enhanced recommendation methods can be straightforwardly adapted and used in the tag-enhanced recommendation scenario; for applying these methods, treat the tags and items as entities, and each connection to a specific tag as a unique relation; most works for Self-supervised learning (SSL) use the contrastive learning method, which maximizes the similarity of the representation of a target sample with the representations of corresponding positive samples (mutations of the target sample) and minimizes the similarity with representations of negative samples (samples known to be different); SQN combines SSL with reinforcement learning to capture long-term user interests in sequential recommendation; incorporate auxiliary information via GNNs, construct self-supervised objectives from multiple sources to refine the representations for collaborative filtering, naturally bringing the tag information into training (i.e., using tags as training data); this reduces the time complexity and makes it compatible with a wide variety of recommendation models; make the positive sampling pairing process more fine-grained to avoid embedding collapse; the method is not restricted to a specific type of recommendation task (e.g., sequential recommendation), recommendation model (e.g., GNNs), or form of auxiliary knowledge (e.g., knowledge graph); the only auxiliary information 
required is tags, which are usually easy to obtain; this provides the strategy with a high level of compatibility; Sections III-IV in Pages 4-7 with FIG. 2 in Page 6, FIG. 3 in Page 7, and FIG. 4 in Page 8: denote the sets of all users, items, and tags as U, V, and T, respectively; for each user u ∈ U, the user preference data is represented by a set of items she has interacted with as Iu+ := { i ∈ I | Yu,i = 1 } where Yu,i ∈ R|U|×|T| is the binary user-item rating matrix (i.e., indicating degree of user's interest in item); analogously, use Y'u,i ∈ R|I|×|T| to represent the labelling history between items and tags; then split Iu+ into a training set Su+ and a test set Tu+; then the tag-enhanced top-N recommendation task is formulated as: given the training item set Su+, and the non-empty test item set Tu+ for user u, train a model to recommend an ordered set of N items Xu such that Xu ∩ Su+ = ∅ and |Xu| = N; the model should learn from the collaborative filtering signal Y and the auxiliary tag information Y'; the recommendation quality is evaluated by a matching metric between Xu and Tu+ such as Recall@N; Bayesian Personalized Ranking (BPR) is one of the most widely studied methods in recommendation systems for learning the user preference from the implicit user-item interaction history; the core idea of BPR is to maximize the ranking of an item that the user has accessed (treated as a positive sample; i.e., a set of items she has interacted in Iu+) relative to a randomly sampled item (treated as a negative sample) (i.e., a user has higher degree of interest for items in positive samples than items in negative samples); this goal is achieved via a carefully designed loss function LUV as in eqn. 
(1), where (u, v+, v–) is a training triplet with a positive item v+ and a negative item v– for user u; ỹuv+ refers to the relevance score between u and v+; adopt a similar formulation for learning the relations between items and tags; this can be viewed as recommending tags to items based on the previous item-tag pairing history; the loss function LVT for this task can be formulated as eqn. (2), where (v, t+, t–) is a training triplet with a positive tag t+ and a negative tag t– for item v; use u, v, t to represent the individual embeddings of a user, item, and tag, respectively; to model the user-item interaction behavior in a more fine-grained manner, decompose the representation of each user and item into K components; each component (sub-embedding) aims to represent a distinct user preference; conduct the intent-aware initialization for users and items as represented in eqn. (3), where K denotes the number of user intents; multiple tags with similar semantic meanings can be regarded as a common factor that biases a user to interact with items with similar traits; e.g., consider a restaurant recommendation scenario; a tag cluster "delicious food, yummy, amazing dessert" may correspond to the same factor that the users like the restaurant due to the "taste" of the food, while a tag cluster "feel at home, friendly waiter" can be used to explain another intent corresponding to a desire for good "service"; focus on how to cluster tags so that the kth tag cluster can be properly aligned with the kth intent embedding for users and items; iteratively apply the K-means algorithm on the learned tag embeddings as the training procedure proceeds; the tag embeddings can be trained through the objective LUV + α LVT, where α is a scaling factor; employ an end-to-end self-supervised clustering approach to adaptively obtain the tag clusters; use a Student’s t-distribution to model the probability of assigning the tag tl to the kth cluster as eqn. 
(4); construct a target distribution which strives to push the representations closer to cluster centers, strengthening the cohesion of the clusters; the target distribution is defined as eqn. (5); construct a self-supervised loss objective LKL for the end-to-end clustering as the Kullback–Leibler (KL)-divergence between the above two matrices as eqn. (6); use a hard allocation to assign each item to one tag cluster; the assigned tag cluster index is determined by argmaxk(Qlk) for tag tl; introduce a new formulation by combining the two modalities of information (user-item collaborative signals Y and item-tag auxiliary information Y') into a common user-item-tag space using a contrastive learning paradigm; treat the items as a middle ground to link the information coming from the other two sources because items are present in both user-item interactions and item-tag labels; for a given item, conduct an aggregation on those users who previously interacted with the item, but design this aggregation to be intent-aware; use an arithmetic average on each intent component over the user embeddings; the tag clusters are obtained through self-supervised training, where each cluster is related to a user intent; now, for a given item vj, compute its tag cluster embedding for each cluster through aggregating only over those tags assigned to vj; conduct this across all K clusters and all |V| items; different items may have varying degrees of relatedness to distinct tag clusters and user intents; e.g., if an item has 10 tags related to intent 1 and only 1 tag related to intent 2, the item is more closely related to intent 1 than intent 2 (i.e., degree of user's interest on item for intent 1 (e.g., "taste") is higher than degree of user's interest on item for intent 2 (e.g., "service")); use a vector mj ∈ RK to store the relatedness of vj with respect to all intents; mj is computed based on the number of vj’s tags in each cluster, and the kth entry can be written 
as eqn. (9); define M = [m1, m2, …, m|V|]T ∈ R|V|×K, where each row contains the relatedness of an item to all intents (i.e., matrix M indicating degree of user's interest on item for different intents obtained from user-item interactions Yu,i and item-tag labels Y'u,i of the training data); use this matrix for re-weighting the contrastive loss; aim to maximize the alignment of the pairs of positive samples from the sources of users and tags; first use a linear layer to transform the tag aggregation to make it share the same dimension as the user aggregation; propose to merge the user-tag and user-item intent-aware alignments into a single unified alignment task; maximize the alignment between the aggregated user representation for intent k and the sum of the item embedding and its corresponding aggregated tag embedding for intent k when j = j', and minimize it when j ≠ j'; adopt the commonly used InfoNCE loss formulation to maximize the cosine similarity of the correct pairings of user representations and item/tag representations in each training batch while minimizing the cosine similarity of the embeddings of the incorrect pairings; adopt a bidirectional contrastive alignment loss formulation LCA to ensure the pairing process across the multiple sources can be jointly exploited as eqn. (11); the user to item-tag (u2it) alignment under the kth intent is formalized as eqn. (12); use the predefined matrix M here to capture the degree of alignment for each item with respect to each intent based on the corresponding relatedness; Mj,k refers to the entry of M located at the jth row and the kth column, which denotes the relatedness of item vj to the kth cluster and intent (i.e., indicating degree of user's interest on item vj for kth intent); N(vj) is the set of negative samples of vj; treat all items other than j as candidate negatives; analogously, the item-tag to user (it2u) alignment under the kth intent can be formalized as eqn. 
(13); introduce a learnable nonlinear transformation between the representations and the contrastive loss, which further improves the quality of the learned representations; design more diverse positive sample pairs by aligning users with the tags not only from the items they have interacted with but also the tags from other similar items; this serves to enrich the representations of the cold start users and items; compute the similarity metric as shown in eqn. (15) based on the Jaccard index for items j and j' for the kth intent; then treat any pair of items larger than a predefined threshold as similar items and regard them within the same set under the kth user intent factor; obtain updated loss function LCA* as eqns. (16)-(17) from eqn. (11)-(13); adapt the model to allow forward and backward propagation for mini-batches of data; overall training objective can be formulated as L = LUV + α LVT + β LCA* + γ LKL in eqn. (18), where α, β, and γ are scaling factors; Section V.A in Pages 7-8: evaluate our proposed method on seven real-world datasets with different domains and sparsity, where the first three datasets are all released in HetRec 2011; HetRec-MV is a movie recommendation dataset; it links movies in the MovieLens dataset with their corresponding Internet Movie Database (IMDb) web pages and Rotten Tomatoes movie reviews, where each movie is assigned with tags provided by users; HetRec-FM is an artist recommendation dataset obtained from Last.fm; it contains social networks, music tags, and music-artist listening histories of users; HetRec-Del is gathered from the Delicious social bookmarking system, which contains social relations, bookmarks, and tags from users; AMZBook-Tag is a real-world online product recommendation dataset derived from the Amazon review datasets; to be consistent with the implicit feedback setting, for the datasets with explicit ratings, retain any ratings no less than four (out of five) as positive feedback and treat all other 
ratings as missing entries).
Flanagan and Wu are analogous art because they are from the same field of endeavor, namely systems and methods related to recommendation services. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Wu to Flanagan. The motivation for doing so would have been to (1) improve the recommendation quality with superior training efficiency; and (2) significantly improve the performance of the recommendation task, especially for cold-start users and long-tail items (Wu, Section I in Pages 1-2).
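For illustration only, the implicit-feedback conversion cited above from Wu, Section V.A (ratings no less than four out of five retained as positive feedback, all other ratings treated as missing entries) can be sketched as follows. The function name, the tuple layout, and the sample ratings are hypothetical and do not appear in Wu.

```python
def to_implicit_feedback(ratings, threshold=4):
    """Keep (user, item) pairs whose explicit rating is >= threshold.

    Pairs below the threshold are dropped, i.e. treated as missing
    entries rather than as explicit negatives.
    """
    return {(user, item) for user, item, rating in ratings if rating >= threshold}


# Hypothetical sample data: (user, item, rating out of five).
sample = [("u1", "v1", 5), ("u1", "v2", 3), ("u2", "v1", 4), ("u2", "v3", 1)]
positives = to_implicit_feedback(sample)
```

In this sketch a rating of exactly four is retained, consistent with the "no less than four" wording of the cited passage.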
Claims 3 and 11
Flanagan in view of Wu discloses all the elements as stated in Claims 1 and 9 respectively and further discloses wherein the label is selected from one or more of following: a first label corresponding to a positive user interaction, and a second label corresponding to a negative user interaction (Wu, Sections III-IV in Pages 4-7 with FIG. 2 in Page 6, FIG. 3 in Page 7, and FIG. 4 in Page 8: denote the sets of all users, items, and tags as U, V, and T, respectively; for each user u ∈ U, the user preference data is represented by a set of items she has interacted with as Iu+ := { i ∈ I | Yu,i = 1 } where Yu,i ∈ R|U|×|T| is the binary user-item rating matrix; analogously, use Y'u,i ∈ R|I|×|T| to represent the labelling history between items and tags; then split Iu+ into a training set Su+ and a test set Tu+; then the tag-enhanced top-N recommendation task is formulated as: given the training item set Su+, and the non-empty test item set Tu+ for user u, train a model to recommend an ordered set of N items Xu such that Xu ∩ Su+ = ∅; and |Xu| = N; the model should learn from the collaborative filtering signal Y and the auxiliary tag information Y'; the recommendation quality is evaluated by a matching metric between Xu and Tu+ such as Recall@N; Bayesian Personalized Ranking (BPR) is one of the most widely studied methods in recommendation systems for learning the user preference from the implicit user-item interaction history; the core idea of BPR is to maximize the ranking of an item that the user has accessed (treated as a positive sample) relative to a randomly sampled item (treated as a negative sample); this goal is achieved via a carefully designed loss function LUV as in eqn. 
(1), where (u, v+, v–) is a training triplet with a positive item v+ and a negative item v– for user u; ỹuv+ refers to the relevance score between u and v+; adopt a similar formulation for learning the relations between items and tags; this can be viewed as recommending tags to items based on the previous item-tag pairing history; the loss function LVT for this task can be formulated as eqn. (2), where (v, t+, t–) is a training triplet with a positive tag t+ and a negative tag t– for item v; use u, v, t to represent the individual embeddings of a user, item, and tag, respectively; to model the user-item interaction behavior in a more fine-grained manner, decompose the representation of each user and item into K components; each component (sub-embedding) aims to represent a distinct user preference; conduct the intent-aware initialization for users and items as represented in eqn. (3), where K denotes the number of user intents; focus on how to cluster tags so that the kth tag cluster can be properly aligned with the kth intent embedding for users and items; iteratively apply the K-means algorithm on the learned tag embeddings as the training procedure proceeds; the tag embeddings can be trained through the objective LUV + α LVT, where α is a scaling factor; employ an end-to-end self-supervised clustering approach to adaptively obtain the tag clusters; use a Student’s t-distribution to model the probability of assigning the tag tl to the kth cluster as eqn. (4) construct a target distribution which strives to push the representations closer to cluster centers, strengthening the cohesion of the clusters; the target distribution is defined as eqn. (5); construct a self-supervised loss objective LKL for the end-to-end clustering as the Kullback–Leibler (KL)-divergence between the above two matrices as eqn. 
(6); use a hard allocation to assign each item to one tag cluster; the assigned tag cluster index is determined by argmaxk(Qlk) for tag tl; introduce a new formulation by combining the two modalities of information (user-item collaborative signals Y and item-tag auxiliary information Y') into a common user-item-tag space using a contrastive learning paradigm; treat the items as a middle ground to link the information coming from the other two sources because items are present in both user-item interactions and item-tag labels; for a given item, conduct an aggregation on those users who previously interacted with the item, but design this aggregation to be intent-aware; use an arithmetic average on each intent component over the user embeddings; the tag clusters are obtained through self-supervised training, where each cluster is related to a user intent; now, for a given item vj, compute its tag cluster embedding for each cluster through aggregating only over those tags assigned to vj; conduct this across all K clusters and all |V| items; different items may have varying degrees of relatedness to distinct tag clusters and user intents; e.g., if an item has 10 tags related to intent 1 and only 1 tag related to intent 2, the item is more closely related to intent 1 than intent 2; use a vector mj ∈ RK to store the relatedness of vj with respect to all intents; mj is computed based on the number of vj’s tags in each cluster, and the kth entry can be written as eqn. 
(9); define M = [m1, m2, …, m|V|]T ∈ R|V|×K, where each row contains the relatedness of an item to all intents; use this matrix for re-weighting the contrastive loss; aim to maximize the alignment of the pairs of positive samples from the sources of users and tags; first use a linear layer to transform the tag aggregation to make it share the same dimension as the user aggregation; propose to merge the user-tag and user-item intent-aware alignments into a single unified alignment task; maximize the alignment between the aggregated user representation for intent k and the sum of the item embedding and its corresponding aggregated tag embedding for intent k when j = j', and minimize it when j ≠ j'; adopt the commonly used InfoNCE loss formulation to maximize the cosine similarity of the correct pairings of user representations and item/tag representations in each training batch while minimizing the cosine similarity of the embeddings of the incorrect pairings; adopt a bidirectional contrastive alignment loss formulation LCA to ensure the pairing process across the multiple sources can be jointly exploited as eqn. (11); the user to item-tag (u2it) alignment under the kth intent is formalized as eqn. (12); use the predefined matrix M here to capture the degree of alignment for each item with respect to each intent based on the corresponding relatedness; Mj,k refers to the entry of M located at the jth row and the kth column, which denotes the relatedness of item vj to the kth cluster and intent; N(vj) is the set of negative samples of vj; analogously, the item-tag to user (it2u) alignment under the kth intent can be formalized as eqn. 
(13); introduce a learnable nonlinear transformation between the representations and the contrastive loss, which further improves the quality of the learned representations; design more diverse positive sample pairs by aligning users with the tags not only from the items they have interacted with but also the tags from other similar items; this serves to enrich the representations of the cold start users and items; compute the similarity metric as shown in eqn. (15) based on the Jaccard index for items j and j' for the kth intent; then treat any pair of items larger than a predefined threshold as similar items and regard them within the same set under the kth user intent factor; obtain updated loss function LCA* as eqns. (16)-(18) from eqn. (11)-(13); adapt the model to allow forward and backward propagation for mini-batches of data; overall training objective can be formulated as L = LUV + α LVT + β LCA* + γ LKL in eqn. (18), where α, β, and γ are scaling factors).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Wu to Flanagan. The motivation for doing so would have been to (1) improve the recommendation quality with superior training efficiency; and (2) significantly improve the performance of the recommendation task, especially for cold-start users and long-tail items (Wu, Section I in Pages 1-2).
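For illustration only, the positive-sample versus negative-sample treatment that the cited BPR passage of Wu describes can be sketched with the standard BPR loss for one triplet (u, v+, v–), namely the negative log-sigmoid of the score difference. Wu's eqn. (1) is not reproduced in this action, so the exact form used here, the dot-product relevance score, and the sample embeddings are assumptions of this sketch.

```python
import math


def bpr_triplet_loss(score_pos, score_neg):
    """Standard BPR loss for one (u, v+, v-) triplet: -ln(sigmoid(s+ - s-))."""
    diff = score_pos - score_neg
    # -ln(sigmoid(d)) = ln(1 + exp(-d)); branch on sign for numerical stability.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def dot(a, b):
    """Dot product, used here as an assumed relevance score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical embeddings for a user, an accessed item, and a sampled item.
user = [0.5, 1.0]
item_pos = [1.0, 1.0]
item_neg = [-1.0, 0.0]
loss = bpr_triplet_loss(dot(user, item_pos), dot(user, item_neg))
```

The loss shrinks as the accessed (positive) item is scored further above the randomly sampled (negative) item, which is the ranking behavior the cited passage attributes to BPR.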
Claims 4 and 12
Flanagan in view of Wu discloses all the elements as stated in Claims 1 and 9 respectively and further discloses wherein the label is selected from two or more of following: a third label corresponding to tagging the given media content positively, a fourth label corresponding to spreading of the given media content, a fifth label corresponding to commenting the given media content, a sixth label corresponding to ignoring the given media content, and a seventh label corresponding to tagging the given media content negatively (Wu, Sections III-IV in Pages 4-7 with FIG. 2 in Page 6, FIG. 3 in Page 7, and FIG. 4 in Page 8: denote the sets of all users, items, and tags as U, V, and T, respectively; for each user u ∈ U, the user preference data is represented by a set of items she has interacted with as Iu+ := { i ∈ I | Yu,i = 1 } where Yu,i ∈ R|U|×|T| is the binary user-item rating matrix; analogously, use Y'u,i ∈ R|I|×|T| to represent the labelling history between items and tags; then split Iu+ into a training set Su+ and a test set Tu+; then the tag-enhanced top-N recommendation task is formulated as: given the training item set Su+, and the non-empty test item set Tu+ for user u, train a model to recommend an ordered set of N items Xu such that Xu ∩ Su+ = ∅; and |Xu| = N; the model should learn from the collaborative filtering signal Y and the auxiliary tag information Y'; the recommendation quality is evaluated by a matching metric between Xu and Tu+ such as Recall@N; Bayesian Personalized Ranking (BPR) is one of the most widely studied methods in recommendation systems for learning the user preference from the implicit user-item interaction history; the core idea of BPR is to maximize the ranking of an item that the user has accessed (treated as a positive sample) relative to a randomly sampled item (treated as a negative sample); this goal is achieved via a 
carefully designed loss function LUV as in eqn. (1), where (u, v+, v–) is a training triplet with a positive item v+ and a negative item v– for user u; ỹuv+ refers to the relevance score between u and v+; adopt a similar formulation for learning the relations between items and tags; this can be viewed as recommending tags to items based on the previous item-tag pairing history; the loss function LVT for this task can be formulated as eqn. (2), where (v, t+, t–) is a training triplet with a positive tag t+ and a negative tag t– for item v; use u, v, t to represent the individual embeddings of a user, item, and tag, respectively; to model the user-item interaction behavior in a more fine-grained manner, decompose the representation of each user and item into K components; each component (sub-embedding) aims to represent a distinct user preference; conduct the intent-aware initialization for users and items as represented in eqn. (3), where K denotes the number of user intents; focus on how to cluster tags so that the kth tag cluster can be properly aligned with the kth intent embedding for users and items; iteratively apply the K-means algorithm on the learned tag embeddings as the training procedure proceeds; the tag embeddings can be trained through the objective LUV + α LVT, where α is a scaling factor; employ an end-to-end self-supervised clustering approach to adaptively obtain the tag clusters; use a Student’s t-distribution to model the probability of assigning the tag tl to the kth cluster as eqn. (4) construct a target distribution which strives to push the representations closer to cluster centers, strengthening the cohesion of the clusters; the target distribution is defined as eqn. (5); construct a self-supervised loss objective LKL for the end-to-end clustering as the Kullback–Leibler (KL)-divergence between the above two matrices as eqn. 
(6); use a hard allocation to assign each item to one tag cluster; the assigned tag cluster index is determined by argmaxk(Qlk) for tag tl; introduce a new formulation by combining the two modalities of information (user-item collaborative signals Y and item-tag auxiliary information Y') into a common user-item-tag space using a contrastive learning paradigm; treat the items as a middle ground to link the information coming from the other two sources because items are present in both user-item interactions and item-tag labels; for a given item, conduct an aggregation on those users who previously interacted with the item, but design this aggregation to be intent-aware; use an arithmetic average on each intent component over the user embeddings; the tag clusters are obtained through self-supervised training, where each cluster is related to a user intent; now, for a given item vj, compute its tag cluster embedding for each cluster through aggregating only over those tags assigned to vj; conduct this across all K clusters and all |V| items; different items may have varying degrees of relatedness to distinct tag clusters and user intents; e.g., if an item has 10 tags related to intent 1 and only 1 tag related to intent 2, the item is more closely related to intent 1 than intent 2; use a vector mj ∈ RK to store the relatedness of vj with respect to all intents; mj is computed based on the number of vj’s tags in each cluster, and the kth entry can be written as eqn. 
(9); define M = [m1, m2, …, m|V|]T ∈ R|V|×K, where each row contains the relatedness of an item to all intents; use this matrix for re-weighting the contrastive loss; aim to maximize the alignment of the pairs of positive samples from the sources of users and tags; first use a linear layer to transform the tag aggregation to make it share the same dimension as the user aggregation; propose to merge the user-tag and user-item intent-aware alignments into a single unified alignment task; maximize the alignment between the aggregated user representation for intent k and the sum of the item embedding and its corresponding aggregated tag embedding for intent k when j = j', and minimize it when j ≠ j'; adopt the commonly used InfoNCE loss formulation to maximize the cosine similarity of the correct pairings of user representations and item/tag representations in each training batch while minimizing the cosine similarity of the embeddings of the incorrect pairings; adopt a bidirectional contrastive alignment loss formulation LCA to ensure the pairing process across the multiple sources can be jointly exploited as eqn. (11); the user to item-tag (u2it) alignment under the kth intent is formalized as eqn. (12); use the predefined matrix M here to capture the degree of alignment for each item with respect to each intent based on the corresponding relatedness; Mj,k refers to the entry of M located at the jth row and the kth column, which denotes the relatedness of item vj to the kth cluster and intent; N(vj) is the set of negative samples of vj; analogously, the item-tag to user (it2u) alignment under the kth intent can be formalized as eqn. 
(13); introduce a learnable nonlinear transformation between the representations and the contrastive loss, which further improves the quality of the learned representations; design more diverse positive sample pairs by aligning users with the tags not only from the items they have interacted with but also the tags from other similar items; this serves to enrich the representations of the cold start users and items; compute the similarity metric as shown in eqn. (15) based on the Jaccard index for items j and j' for the kth intent; then treat any pair of items larger than a predefined threshold as similar items and regard them within the same set under the kth user intent factor; obtain updated loss function LCA* as eqns. (16)-(18) from eqn. (11)-(13); adapt the model to allow forward and backward propagation for mini-batches of data; overall training objective can be formulated as L = LUV + α LVT + β LCA* + γ LKL in eqn. (18), where α, β, and γ are scaling factors; Section V.A in Pages 7-8: evaluate our proposed method on seven real-world datasets with different domains and sparsity, where the first three datasets are all released in HetRec 2011; HetRec-MV is a movie recommendation dataset; it links movies in the MovieLens dataset with their corresponding Internet Movie Database (IMDb) web pages and Rotten Tomatoes movie reviews, where each movie is assigned with tags provided by users; HetRec-FM is an artist recommendation dataset obtained from Last.fm; it contains social networks, music tags, and music-artist listening histories of users; HetRec-Del is gathered from the Delicious social bookmarking system, which contains social relations, bookmarks, and tags from users).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Wu to Flanagan. The motivation for doing so would have been to (1) improve the recommendation quality with superior training efficiency; and (2) significantly improve the performance of the recommendation task, especially for cold-start users and long-tail items (Wu, Section I in Pages 1-2).
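For illustration only, the item-to-intent relatedness vector mj described in the cited passage of Wu (computed from the number of an item's tags in each cluster) can be sketched as follows. Wu's eqn. (9) is not reproduced in this action; normalizing the per-cluster tag counts to proportions, the function name, and the 0-based cluster indices are assumptions of this sketch.

```python
from collections import Counter


def relatedness_vector(tag_clusters, num_intents):
    """Return the share of an item's tags that falls in each of the K clusters.

    tag_clusters: cluster index (0-based) assigned to each of the item's tags.
    Normalizing counts to proportions is an assumption of this sketch.
    """
    counts = Counter(tag_clusters)
    total = len(tag_clusters)
    if total == 0:
        return [0.0] * num_intents
    return [counts.get(k, 0) / total for k in range(num_intents)]


# The example from the cited passage: 10 tags in intent 1, 1 tag in intent 2.
m_j = relatedness_vector([0] * 10 + [1], num_intents=2)
```

With the 10-tag versus 1-tag example from the cited passage, the first entry of m_j exceeds the second, matching the statement that the item is more closely related to intent 1 than to intent 2.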
Claims 7, 15, and 20
Flanagan in view of Wu discloses all the elements as stated in Claims 1, 9, and 17 respectively and further discloses obtaining, from the server, a second version of the machine learning model, the second version being updated from the first version at least based on the update and a further update provided by a further client device (Flanagan, ¶¶ [0039]-[0047] with FIG. 2: the Master model Y in the Federated Learning (FL) mode is distributed to all of the user devices 100a-100m from the backend server 200; model updates ΔXi are generated from the model Xi and user data; the model updates ΔXi to "learn" the model Y are then calculated in the user equipment 100a-100m for each user or client, such as Client 1-Client M, respectively, from the master model Xi, stored locally on a specific user equipment 100a-100m, and the corresponding local user data; Differential Privacy encoding is applied to the model update ΔXi of a particular user equipment 100a-100m to give E(ΔXi), the DP encoded updates; the encoded model updates E(ΔXi) are transferred back to the backend server 200 and are aggregated on the server 200 as E(ΔY)=ΣiΔXi; a decoding is applied to E(ΔY) to give an approximation to ΔY; the master model Y is updated as Y=Y+η ΔŶ; the process can continue with the distribution of the updated master model Y, as described above; i.e., the distribution of the updated master model Y to all of the user devices 100a-100m from the backend server 200).
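For illustration only, the server-side step summarized above from Flanagan (aggregate the client updates, then update the master model as Y = Y + η ΔŶ and redistribute it) can be sketched as follows. The Differential Privacy encoding and decoding steps are omitted, and the function names, the list-of-floats model representation, and the sample values are hypothetical.

```python
def aggregate_updates(client_updates):
    """Element-wise sum of the client model updates."""
    return [sum(values) for values in zip(*client_updates)]


def apply_update(master, aggregated, learning_rate):
    """Return the updated master model Y + eta * delta_Y."""
    return [y + learning_rate * d for y, d in zip(master, aggregated)]


# Hypothetical two-parameter master model and updates from two clients.
master_v1 = [1.0, 2.0]
updates = [[0.5, -1.0], [1.5, 1.0]]
master_v2 = apply_update(master_v1, aggregate_updates(updates), learning_rate=0.5)
```

Here master_v2 plays the role of the second version of the model: it depends on the update from one client and on the further update from the other client.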
Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Flanagan in view of Wu as applied to Claims 1 and 9 respectively above, and further in view of Gong et al. ("Real-time Short Video Recommendation on Mobile Devices", ARXIV ID: 2208.09577, Aug. 20, 2022, pp. 1-10), hereinafter Gong.
Claims 5 and 13
Flanagan in view of Wu discloses all the elements as stated in Claims 1 and 9 respectively and further discloses wherein determining the difference is in response to determining the update utilizing the machine learning model (Flanagan, ¶¶ [0006], [0013], and [0031]-[0034] with FIG. 1: calculate a model update for the master machine learning model based on the master machine learning model and data related to one or more of the user of the user equipment or the user's interaction with the user equipment; the data, also referred to as user data, can have different types; e.g., the data can include data obtained or recorded from the user's interaction with the application or service; this can include data recorded based on a user's selection of an item or option of the application, or selection of one or more items being recommended; e.g., when the application is a video service, the data can include information pertaining to a video watched by the user in the video service; the data can include user behavioral data and/or user meta data, or any combination thereof; the calculated model update is encoded using an ε-differential privacy mechanism, and the ε-differential privacy encoded model update is then transmitted; ¶¶ [0039] and [0042]-[0043] with FIG. 2: model updates ΔXi are generated from the model Xi and user data; the model updates ΔXi to "learn" the model Y are then calculated in the user equipment 100a-100m for each user or client, such as Client 1-Client M, respectively, from the master model Xi, stored locally on a specific user equipment 100a-100m, and the corresponding local user data; Differential Privacy encoding is applied to the model update ΔXi of a particular user equipment 100a-100m to give E(ΔXi), the DP encoded updates; ¶¶ [0048]-[0049] with FIG. 2 and 304 and 306 in FIG. 3: the machine learning model update is calculated 304, such as for example the model update ΔXi, described above with reference to FIG. 
2; the model update ΔXi is encoded 306 by applying ε-Differential Privacy) (Wu, Sections III-IV in Pages 4-7 with FIG. 2 in Page 6, FIG. 3 in Page 7, and FIG. 4 in Page 8: denote the sets of all users, items, and tags as U, V, and T, respectively; for each user u ∈ U, the user preference data is represented by a set of items she has interacted with as Iu+ := { i ∈ I | Yu,i = 1 } where Yu,i ∈ R|U|×|T| is the binary user-item rating matrix; analogously, use Y'u,i ∈ R|I|×|T| to represent the labelling history between items and tags; then split Iu+ into a training set Su+ and a test set Tu+; then the tag-enhanced top-N recommendation task is formulated as: given the training item set Su+, and the non-empty test item set Tu+ for user u, train a model to recommend an ordered set of N items Xu such that Xu ∩ Su+ = ∅; and |Xu| = N; the model should learn from the collaborative filtering signal Y and the auxiliary tag information Y'; the recommendation quality is evaluated by a matching metric between Xu and Tu+ such as Recall@N; Bayesian Personalized Ranking (BPR) is one of the most widely studied methods in recommendation systems for learning the user preference from the implicit user-item interaction history; the core idea of BPR is to maximize the ranking of an item that the user has accessed (treated as a positive sample) relative to a randomly sampled item (treated as a negative sample); this goal is achieved via a carefully designed loss function LUV as in eqn. (1), where (u, v+, v–) is a training triplet with a positive item v+ and a negative item v– for user u; ỹuv+ refers to the relevance score between u and v+; adopt a similar formulation for learning the relations between items and tags; this can be viewed as recommending tags to items based on the previous item-tag pairing history; the loss function LVT for this task can be formulated as eqn. 
(2), where (v, t+, t–) is a training triplet with a positive tag t+ and a negative tag t– for item v; use u, v, t to represent the individual embeddings of a user, item, and tag, respectively; to model the user-item interaction behavior in a more fine-grained manner, decompose the representation of each user and item into K components; each component (sub-embedding) aims to represent a distinct user preference; conduct the intent-aware initialization for users and items as represented in eqn. (3), where K denotes the number of user intents; focus on how to cluster tags so that the kth tag cluster can be properly aligned with the kth intent embedding for users and items; iteratively apply the K-means algorithm on the learned tag embeddings as the training procedure proceeds; the tag embeddings can be trained through the objective LUV + α LVT, where α is a scaling factor; employ an end-to-end self-supervised clustering approach to adaptively obtain the tag clusters; use a Student’s t-distribution to model the probability of assigning the tag tl to the kth cluster as eqn. (4) construct a target distribution which strives to push the representations closer to cluster centers, strengthening the cohesion of the clusters; the target distribution is defined as eqn. (5); construct a self-supervised loss objective LKL for the end-to-end clustering as the Kullback–Leibler (KL)-divergence between the above two matrices as eqn. 
(6); use a hard allocation to assign each item to one tag cluster; the assigned tag cluster index is determined by argmaxk(Qlk) for tag tl; introduce a new formulation by combining the two modalities of information (user-item collaborative signals Y and item-tag auxiliary information Y') into a common user-item-tag space using a contrastive learning paradigm; treat the items as a middle ground to link the information coming from the other two sources because items are present in both user-item interactions and item-tag labels; for a given item, conduct an aggregation on those users who previously interacted with the item, but design this aggregation to be intent-aware; use an arithmetic average on each intent component over the user embeddings; the tag clusters are obtained through self-supervised training, where each cluster is related to a user intent; now, for a given item vj, compute its tag cluster embedding for each cluster through aggregating only over those tags assigned to vj; conduct this across all K clusters and all |V| items; different items may have varying degrees of relatedness to distinct tag clusters and user intents; e.g., if an item has 10 tags related to intent 1 and only 1 tag related to intent 2, the item is more closely related to intent 1 than intent 2; use a vector mj ∈ RK to store the relatedness of vj with respect to all intents; mj is computed based on the number of vj’s tags in each cluster, and the kth entry can be written as eqn. 
(9); define M = [m1, m2, …, m|V|]T ∈ R|V|×K, where each row contains the relatedness of an item to all intents; use this matrix for re-weighting the contrastive loss; aim to maximize the alignment of the pairs of positive samples from the sources of users and tags; first use a linear layer to transform the tag aggregation to make it share the same dimension as the user aggregation; propose to merge the user-tag and user-item intent-aware alignments into a single unified alignment task; maximize the alignment between the aggregated user representation for intent k and the sum of the item embedding and its corresponding aggregated tag embedding for intent k when j = j', and minimize it when j ≠ j'; adopt the commonly used InfoNCE loss formulation to maximize the cosine similarity of the correct pairings of user representations and item/tag representations in each training batch while minimizing the cosine similarity of the embeddings of the incorrect pairings; adopt a bidirectional contrastive alignment loss formulation LCA to ensure the pairing process across the multiple sources can be jointly exploited as eqn. (11); the user to item-tag (u2it) alignment under the kth intent is formalized as eqn. (12); use the predefined matrix M here to capture the degree of alignment for each item with respect to each intent based on the corresponding relatedness; Mj,k refers to the entry of M located at the jth row and the kth column, which denotes the relatedness of item vj to the kth cluster and intent; N(vj) is the set of negative samples of vj; analogously, the item-tag to user (it2u) alignment under the kth intent can be formalized as eqn. 
(13); introduce a learnable nonlinear transformation between the representations and the contrastive loss, which further improves the quality of the learned representations; design more diverse positive sample pairs by aligning users with the tags not only from the items they have interacted with but also the tags from other similar items; this serves to enrich the representations of the cold start users and items; compute the similarity metric as shown in eqn. (15) based on the Jaccard index for items j and j' for the kth intent; then treat any pair of items larger than a predefined threshold as similar items and regard them within the same set under the kth user intent factor; obtain updated loss function LCA* as eqns. (16)-(18) from eqn. (11)-(13); adapt the model to allow forward and backward propagation for mini-batches of data; overall training objective can be formulated as L = LUV + α LVT + β LCA* + γ LKL in eqn. (18), where α, β, and γ are scaling factors).
Flanagan in view of Wu fails to explicitly disclose wherein determining the difference (i.e., update) is in response to turning off a media content provision application utilizing the machine learning model.
Gong teaches a system and a method relating to video recommendation (Gong, TITLE and ABSTRACT in Page 1), wherein determining the difference (i.e., update) is in response to turning off a media content provision application utilizing the machine learning model (Gong, ABSTRACT in Page 1: traditionally, recommender systems deployed at server side return a ranked list of videos for each request from client; thus it cannot adjust the recommendation results according to the user’s real-time feedback before the next request; however, as users continue to watch videos and feedback, the changing context leads the ranking of the server-side recommendation system inaccurate; propose to deploy a short video recommendation framework on mobile devices to solve these problems; specifically, design and deploy a tiny on-device ranking model to enable real-time re-ranking of server-side recommendation results; improve its prediction accuracy by exploiting users’ real-time feedback of watched videos and client-specific real-time features; Section 1 with FIG. 
1 in Pages 1-3: it is very important for the recommender system to be both more accurate and more sensitive to user’s real-time feedback; traditionally, recommender systems are deployed at the server side, and are generally consisted of multiple stages, such as retrieval, ranking, and re-ranking etc.; since it is such a complicated system, the client usually sends pagination requests to the recommender system to fetch a page of results at once, and display them one by one to the user, in the order decided by the recommender system; after the user finishes watching one page of videos, the client sends another request to fetch the next page, and so on; due to the pagination request mechanism, the recommender system can only interact with client when a new request is sent to the server; it is impossible to adjust the content order according to the real-time feedback, even if there may exist some videos that match the user’s current interest on the client side; i.e., adjust the content order (i.e., updating ranking model based on new feedback) only occurs when a new request is sent from the client after finishes watching one page of videos; with the rapid increase of computational power and storage capability on mobile devices, it is possible to offload part of DNN model inference and even training on these devices; a natural benefit is that some lightweight models can be deployed on mobile devices, to provide real-time ranking capability, thus solve the above two problems; it can react immediately to users’ implicit (such as watching a video longer than a threshold) or explicit (such as liking or sharing a video) feedback, to make adjustment to remaining candidates accordingly; it is also able to make use of real-time features and client-specific features without any latency, to keep track of the changing context and improve model prediction accuracy; the key advantage of client-side recommendation is that utilize users’ real-time behaviors and some other signals that 
are not available at the server; by feeding those real-time and complementary signals only along with server-side predictions (such as predicted rates of effective views, likes, follows etc.) into a very lightweight edge-side model, user engagement metrics get substantially improved; to get better ranking result, consider not only immediate reward of the current candidate video, but also its influence on subsequent videos; listwise re-ranking approaches provide promising solutions to search for the best possible permutation of candidates with optimal total reward; on mobile devices, users usually can only see a very limited number 𝑛 of videos at a time (𝑛 = 1 in our immersive scenario as in Figure 1(a)), so the edge-side re-ranking only needs to determine the next 𝑛 videos; once the user finishes watching these videos, another re-ranking process can be triggered to order the following 𝑛 videos; finding a partially ordered list that approximates the optimal one, and propose an efficient technique for context-aware re-ranking, which uses novel adaptive beam search to reduce the searching complexity; present an edge-side re-ranking solution for short-video recommendation to leverage valuable real-time signals only available on mobile devices and overcome intrinsic limitations of traditional server-side recommender systems; propose a novel context-aware re-ranking algorithm specifically tailored for edge scenarios; Sections 3-5 with FIGS. 
2-4 and Algorithm 1 in Pages 3-7: the whole framework can be divided into three modules, as shown in Figure 2; the first module is a traditional recommender system deployed on the server side; it is consisted of retrieval, ranking, and re-ranking stages; the second module is a model training system; it first generates training samples from collected data; then use distributed training to train the ranking model in an incremental way; the third module is a recommendation system deployed on the client-side, which can be further divided into two parts: (a) Feature collection which collects features from both server side and client side, then joins them together to form complete input feature set to be sent to ranking model; and (b) Context-aware re-ranking: when the user swipes to watch next video, or likes/shares a video, the system will trigger the model on device to re-rank the candidates according to the user’s behavior; these triggers are configurable in the system, and in production, only swipe is currently used to trigger re-rank; when the re-rank process is triggered, the client first generates input features from both watched video list and candidate set, then feed the input to re-ranking model, and a context-aware ranking method is used to sequentially generate an ordered list with largest ListReward as defined in Equation 4; after re-ranking, client inserts the top-ranked video at the next position; design a small but self-contained model for mobile devices, which is a complement of server-side model, in the sense that it mainly takes advantage of client-side user real-time feedback to improve prediction accuracy; ranking model on server side has compressed most of the information into the final prediction scores, so use this as input to avoid redundant computation, and make the model small enough; carefully choose the most important input features to keep the model as small as possible; these features can be classified into 3 categories: (a) server-side 
prediction – server-side ranking model is good at capturing user’s long term interest; (b) video static attributes – only use video category and duration attributes; and (c) client-side features – during running, client will collect many important features, such as user feedback, video watch time, etc., and focus on real-time feature; to enhance the influence of real-time features, add crossing features derived from them in model inputs; specifically, add the following crossing features: (a) 𝑝𝑋𝑇𝑅 diff, which is calculated as 𝑝𝑋𝑇𝑅 −𝑝𝑋𝑇𝑅ℎ, and with 𝑝𝑋𝑇𝑅 diff as input feature, the model can perceive user’s preference shift in real-time; i.e., instead of using a static score as anchor (such as average 𝑝𝑋𝑇𝑅 of user engaged videos in the past), using 𝑝𝑋𝑇𝑅 of recently watched videos can automatically adapt to user’s real-time interests; (b) time since last impression, which is calculated as 𝑣𝑡 − 𝑣𝑡ℎ , which is to capture the temporal importance of previously watched videos; and generally, the more recent an impression is, the more influential it will be; and (c) impression position gap between videos, which is calculated as 𝑣𝑝𝑜𝑠 − 𝑣𝑝𝑜𝑠ℎ, and this is similar to temporal diff, but it only considers impression position, which will be more stable if the user consumes videos at varying speed; in recommendation, mutual influence among items will lead to different user preferences for different ordering of the same set of candidates, which has been shown in previous work; the architecture of our mobile ranking model is presented in Figure 3, which has 4 types of inputs: (a) real-time watch history sequence which is the client-side maintained real-time watched video list and includes both video information and corresponding user feedback; (b) ordered candidates list is added to help model the interactions between ordered candidates and target video, in order to facilitate the context-aware planning method; (c) target video which is the video to be predicted; and (d) other 
features are mainly contextual features such as impression position and network condition etc.; the watch history sequence is modeled using a multi-head attention (MHA) module with target attention, calculated as Equation (2), where 𝑸, 𝑲, 𝑽 are the query, key and value, respectively, and 𝑑 is the embedding dimension; to explicitly model influence of already ordered candidates on current target video to be predicted, use another MHA module with target attention, in which the key 𝑲 and value 𝑽 are projected from features of ordered candidates; there are many targets to consider, including watch time, user interactions (e.g., like, share, comment), etc.; 3 targets are chosen, which are "has_next", "effective_view", and "like"; "has_next" is defined as the user continues to watch videos after current one; in immersive scenario where video will automatically start playing in full screen mode, there is no “click” operation, so we define an "effective_view" label as user watches a video longer than a threshold (e.g., 5 seconds), and videos in different duration intervals have different thresholds; "like" is defined as the user likes current video by clicking the like button or double tapping/long pressing screen; train the model in a multi-task learning fashion; the loss function is defined as the sum of log losses of each target, averaged by the number of training samples as shown in Equation (3); optimize the set of model parameters Θ by minimizing the loss function L(Θ) through gradient descent (i.e., calculating the difference for updating model); once a user finishes watching a video and generates new real-time ranking signals, responsively update our client-side model predictions and trigger a new re-ranking process; find the optimal permutation P of the candidate set C, which leads to maximum ListReward (LR) defined as shown in Equations (4)-(5); beam search is a commonly used approximation solution to such problem, which reduces the time complexity to 𝑂(𝑘𝑚²), where 
𝑘 is the beam size; propose a novel beam search strategy to choose an adaptive search step 𝑛 ≤ 𝑙 ≪ 𝑚 and further reduce the searching time complexity to 𝑂(𝑘𝑙𝑚); the algorithm is sketched in Algorithm 1, and it is implemented inside the exported TFLite execution graph).
Flanagan in view of Wu, and Gong are analogous art because they are from the same field of endeavor, a system and a method relating to video recommendation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Gong to Flanagan in view of Wu. Motivation for doing so would improve prediction accuracy by exploiting users’ real-time feedback of watched videos and client-specific real-time features (Gong, ABSTRACT in Page 1).
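For illustration only, and forming no part of the grounds of rejection, the three crossing features summarized above from Gong, Sections 3-5, may be sketched as follows. The class name Impression, the function name crossing_features, and all field names are hypothetical and do not appear in Gong; the sketch assumes a single candidate video compared against a single recently watched video.

```python
from dataclasses import dataclass


@dataclass
class Impression:
    """Hypothetical record for one video impression (names are assumptions)."""
    pxtr: float       # server-side predicted rate for the video
    timestamp: float  # impression time, in seconds
    position: int     # impression position in the feed


def crossing_features(candidate: Impression, watched: Impression) -> dict:
    """Differences between a candidate and a recently watched video."""
    return {
        # pXTR diff: candidate score relative to a recently watched video
        "pxtr_diff": candidate.pxtr - watched.pxtr,
        # time since last impression: more recent history yields a smaller gap
        "time_since_last_impression": candidate.timestamp - watched.timestamp,
        # impression position gap: unaffected by how fast the user swipes
        "position_gap": candidate.position - watched.position,
    }


features = crossing_features(
    Impression(pxtr=0.75, timestamp=130.0, position=12),
    Impression(pxtr=0.5, timestamp=100.0, position=10),
)
```

In this sketch each feature is a plain difference, consistent with the summary above that the model perceives a preference shift relative to recently watched videos rather than relative to a static anchor.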
Claims 6, 8, 14, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Flanagan in view of Wu as applied to Claims 1, 9, and 17 respectively above, and further in view of AMAD-UD-DIN et al. (US 2022/0012601 A1, pub. date: 01/13/2022; filed on 09/24/2021), hereinafter AMAD-UD-DIN'601.
Claims 6, 14, and 19
Flanagan in view of Wu discloses all the elements as stated in Claims 1, 9, and 17 respectively and further discloses wherein recommending the first set of media contents comprises: obtaining, from the server, content information concerning a first number of candidate media contents (Flanagan, ¶ [0051]: Huawei video service provides an application to users to run on their mobile device that allows them to watch videos through the service; the service backend is hosted in a cloud service; the video service would like to offer users a personalized recommendation service to propose video choices to users based on videos they have previously watched through the service, as well as other user specific preferences and demographics; the video service would like to provide the highest level of user privacy they can when the user uses the personalized recommendations; the video service decides to use a Collaborative Filter (CF) recommendation algorithm/model and use a Federated Learning mode to build and update the CF model; ¶¶ [0008] and [0031]-[0033] with FIG. 1: generate the user recommendation related to the use of the application based on the downloaded master machine learning model and the data related to one or more of the user of the user equipment or the user interaction with the user equipment; the user recommendation can provide one or more different options, or recommendations, to the user related to the use of the application or service; the data, also referred to as user data, can have different types; e.g. 
the data can include data obtained or recorded from the user's interaction with the application or service; this can include data recorded based on a user's selection of an item or option of the application, or selection of one or more items being recommended; e.g., when the application is a video service, the data can include information pertaining to a video watched by the user in the video service; another form of data can include information about the user; e.g. the data can include any form of user demographic data; the data can include meta data such as location of the user and the user equipment, a type of the user equipment, user gender, or user age, or any combination thereof; the data can include user behavioral data and/or user meta data, or any combination thereof; this data is obtained by and stored locally in the user equipment 100; this type of data is obtained in any suitable manner and stored on any suitable storage medium accessible by the user equipment 100; ¶¶ [0009] and [0017]: provide a high level of user privacy when the user uses the personalized recommendations that propose video choices to the user based on video preference selections, user demographic and/or gender data, or videos they have previously selected and/or watched through the service; ¶ [0042] with FIG. 1: using a combination of the locally stored master model Xi, and local user data, such as for example videos the user has previously watched, a set of personalized recommendations for the user of the user equipment 100a-100m can be generated).
Flanagan in view of Wu fails to explicitly disclose wherein the first number of candidate media contents being selected from a second number of candidate media contents.
AMAD-UD-DIN'601 teaches a system and a method relating to Federated Recommendation (AMAD-UD-DIN'601, ¶ [0002]), wherein the first number of candidate media contents being selected from a second number of candidate media contents (AMAD-UD-DIN'601, ¶¶ [0041]-[0044] and [0048]-[0049] with FIGS. 2-3: the server side 202 of the recommendation system 300 is composed of one or more processors running two algorithms operating in Federated Learning mode; a Collaborative Filter (CF) 312 is used to generate a user specific candidate set of video recommendations; a Predictive Model (PM) 314 is used to score each video in the candidate set and to generate the final video recommendations; a client on the client side 200, also referred to herein as a client side device or client side devices, is also composed of one or more processors running two algorithms operating in Federated Learning mode; a Collaborative Filter (CF) 322 on the client side 200 is used to receive and generate a user specific candidate set 325 of video recommendations; a Predictive Model (PM) 324 on the client side 200 is used to score 326 each video in the candidate set and to generate the final set 327 of video recommendations; each client 200a-200n on the client side will generate a final set 327 of video recommendations; the Collaborative Filter 322 generates the candidate set 325 based on a user's video watch event or behavioral data; the Predictive Model 324 re-scores 326 the candidate set 325 based on the user's personal data; the candidate set 325 is seen as a sub-set of the total number of videos, filtered based on the user's watching behavior; this filtered set is then re-scored 326 such that the videos which have high probability of being liked by the user get a high score and are recommended; each of the master models and metrics described above are distributed to each of Huawei Video services user devices on the client side 200; the master models along with the metrics from the server side 202, 
referred to as the local master models on the client side 200, now reside on the user devices, such as the user devices 200a-200n shown in FIG. 2, and have the same hyper-parameter configurations as the master models on the servers 202; the local master models that now reside on the client side 200 are generally configured to generate recommendations, update and train and evaluate; the local master model of the collaborative filter 322 is used to generate a candidate set 325 of videos for the user using the local user data which can include, but is not limited to, the videos watched by the user on that device; the generated candidate set 325 of videos is scored 326 by the local predictive module 324 based on user personal data which can include for example, but is not limited to other applications used by the user, date of birth stored on the user device, location of the device etc.; the result of the scoring 326 is the final list or set 327 of videos, which is generated or provided as a personal set of video recommendations to the user; the locally generated video recommendations can then be shown or otherwise presented to the user on the device; in this manner, the user of a particular client device 200 is encouraged to select one or more of the video recommendations from this personalized set 327 for watching).
Flanagan in view of Wu, and AMAD-UD-DIN'601 are analogous art because they are from the same field of endeavor, a system and a method relating to Federated Recommendation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of AMAD-UD-DIN'601 to Flanagan in view of Wu. Motivation for doing so would maximize the client's privacy, improve the quality of recommendations, and reduce the computational complexity at the same time (AMAD-UD-DIN'601, ¶¶ [0002], [0004], [0006], [0009], [0012]-[0014], [0017], [0032], [0040], and [0074]).
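For illustration only, and forming no part of the grounds of rejection, the two-stage selection summarized above from AMAD-UD-DIN'601 (a candidate set filtered from the total number of videos, then re-scored to produce the final set) may be sketched as follows. The function name two_stage_recommend, its parameters, and the example scores are hypothetical; the two scoring callables merely stand in for the collaborative filter and the predictive model and are not the reference's implementation.

```python
from typing import Callable, Iterable, List


def two_stage_recommend(
    all_videos: Iterable[str],
    filter_score: Callable[[str], float],  # stand-in for the collaborative filter
    final_score: Callable[[str], float],   # stand-in for the predictive model
    candidate_size: int,
    final_size: int,
) -> List[str]:
    """Select a candidate subset, then re-score it to get the final set."""
    # Stage 1: the candidate set is a subset of the total number of videos.
    candidates = sorted(all_videos, key=filter_score, reverse=True)[:candidate_size]
    # Stage 2: only the candidates are re-scored; the top ones are recommended.
    return sorted(candidates, key=final_score, reverse=True)[:final_size]


# Hypothetical scores for four videos.
watch_affinity = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
personal_fit = {"a": 0.2, "b": 0.6, "c": 0.99, "d": 0.5}

result = two_stage_recommend(
    list(watch_affinity), watch_affinity.get, personal_fit.get, 3, 2
)
```

Video "c" has the highest second-stage score in this example but is never recommended, because it was not selected into the candidate set; that is the sense in which the first number of candidates is selected from a second, larger number.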
Claims 8 and 16
Flanagan in view of Wu discloses all the elements as stated in Claims 7 and 15 respectively and further discloses recommending, based on the local information, a second set of media contents according to the second version of the machine learning model (Flanagan, ¶¶ [0039]-[0047] with FIG. 2: the Master model Y in the Federated Learning (FL) mode is distributed to all of the user devices 100a-100m from the backend server 200; using a combination of the locally stored master model Xi, and local user data, such as for example videos the user has previously watched, a set of personalized recommendations for the user of the user equipment 100a-100m can be generated; the process can continue with the distribution of the updated master model Y, as described above; i.e., the distribution of the updated master model Y to all of the user devices 100a-100m from the backend server 200, and generate new recommendations based on the updated master model Y).
Flanagan in view of Wu fails to explicitly disclose wherein determining a metric for evaluating the machine learning model based on respective interactions of the user with the second set of media contents; and providing the metric to the server.
AMAD-UD-DIN'601 teaches a system and a method relating to Federated Recommendation (AMAD-UD-DIN'601, ¶ [0002]), wherein determining a metric for evaluating the machine learning model based on respective interactions of the user with the second set of media contents; and providing the metric to the server (AMAD-UD-DIN'601, ¶¶ [0017] and [0035]: the only information that is required from the clients, without knowing their identities, is the validation set performances, also referred to as accuracy metrics; ¶ [0038] with FIG. 2: the hyper-parameter optimizer 204 receives as an input 212 from the Federated Learning Server master model 202, current hyper-parameter configuration and performance metrics; the Federated Learning Server master model 202 collects and stores the current configuration and performance metrics as part of model updates sent by clients; ¶¶ [0041]-[0057] with FIGS. 2-4: a Collaborative Filter (CF) 322 on the client side 200 is used to receive and generate a user specific candidate set 325 of video recommendations; a Predictive Model (PM) 324 on the client side 200 is used to score 326 each video in the candidate set and to generate the final set 327 of video recommendations; the Collaborative Filter 322 generates the candidate set 325 based on a user's video watch event or behavioral data; the Predictive Model 324 re-scores 326 the candidate set 325 based on the user's personal data; the candidate set 325 is seen as a sub-set of the total number of videos, filtered based on the user's watching behavior. 
This filtered set is then re-scored 326 such that the videos which have high probability of being liked by the user get a high score and are recommended; Huawei Video Service initializes validation set performance metrics namely Root Mean Squared Error (RMSE) and log-loss on its server 202, one for the Collaborative Filter 312 and one for the Predictive Model 314, respectively; Huawei Video Service creates two master models on its server 202, one for the Collaborative Filter 312 and one for the Predictive Model 314; the two master models are initialized with the respective hyper-parameters suggested by the hyper-parameter optimizer 204; each of the master models and metrics described above are distributed to each of Huawei Video services user devices on the client side 200; the master models along with the metrics from the server side 202, referred to as the local master models on the client side 200, now reside on the user devices, such as the user devices 200a-200n shown in FIG. 2, and have the same hyper-parameter configurations as the master models on the servers 202; the local master models that now reside on the client side 200 are generally configured to generate recommendations, update and train and evaluate; based on the user's viewing of different videos in the Huawei video service, the different videos are randomly divided into training, validation and test sets; using the training set, the local master model of the respective collaborative filter 322 is updated; the local master model updates for each user, or client side device 200, are different and independent; using the training set and based on the user's personal data, such as for example, the user's uses of other services on the device, the user's age and gender, the local predictive model 324 is updated; on the client side 200, using the local data, the validation set 325 and training set 327 video recommendations are generated for each user independently; the training set 327 is used to update 
the local model; the validation set 325 is used to evaluate the local model and compute the validation set performance metrics; the validation set recommendations are evaluated to update the validation set performance metrics; the validation set performance metrics updates for the local collaborative filter 322 and predictive model 324 models are transferred back to the Federated Learning Server, or in this example, the Huawei video service server 202, where the Federated Learning Master Model is residing; the collaborative filter model updates received from the client side devices 200 are aggregated 402 to update the collaborative filter 312 master model; the collaborative filter validation set performance metric updates received from each client side device 200 are averaged to create a new updated collaborative filter metric, referred to as RMSE; the server 202 is also configured to aggregate the predictive model updates obtained from each client 200 and update the master model of the predictive model 314 of the server 202. The validation set performance metric updates received from each client for the predictive model 324 are averaged to create a new updated predicted model metric, generally referred to herein as log-loss).
Flanagan in view of Wu, and AMAD-UD-DIN'601 are analogous art because they are from the same field of endeavor, a system and a method relating to Federated Recommendation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of AMAD-UD-DIN'601 to Flanagan in view of Wu. Motivation for doing so would maximize the client's privacy.
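For illustration only, and forming no part of the grounds of rejection, the metric flow summarized above from AMAD-UD-DIN'601 (each client computes a validation set performance metric locally and transfers it to the server, where the client metrics are averaged) may be sketched as follows. The function names client_rmse and server_average and the example values are hypothetical; the sketch assumes an unweighted mean, which the reference summary describes only as "averaged".

```python
import math
from typing import List, Sequence


def client_rmse(predictions: Sequence[float], labels: Sequence[float]) -> float:
    """Root Mean Squared Error over one client's local validation set."""
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def server_average(client_metrics: List[float]) -> float:
    """Unweighted mean of the metrics received from the clients (assumption)."""
    return sum(client_metrics) / len(client_metrics)


# Hypothetical validation data for two clients; only the metrics leave the device.
metric_a = client_rmse([3.0, 3.0], [0.0, 0.0])
metric_b = client_rmse([1.0, 1.0], [0.0, 0.0])
updated_master_metric = server_average([metric_a, metric_b])
```

Only the scalar metrics are passed to server_average in this sketch; the per-video predictions and labels stay on the client, which matches the privacy rationale stated above.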
Response to Arguments
Applicant's arguments filed 01/20/2026 have been fully considered but they are not persuasive.
Applicant argues on Pages 10-16 of the Remarks that Flanagan and Wu fail to disclose or suggest at least "wherein determining the update to the machine learning model comprises: for a given media content in the first set of media contents, assigning, to the given media content, a label corresponding to an interaction of the user with the given media content, the label indicating a degree of interest of the user in the given media content; determining a difference between the label and a prediction of the given media content by the first version of the machine learning model; and determining the update based on respective differences determined for the first set of media contents," as now recited in claim 1.
In response, examiner respectfully disagrees. Flanagan discloses in ¶¶ [0006], [0013], [0031]-[0034], [0039], [0042]-[0049] with FIGS. 1-3 that (1) calculate a model update for the master machine learning model based on the master machine learning model and data related to one or more of the user of the user equipment or the user's interaction with the user equipment, wherein the data include data obtained or recorded from the user's interaction with the application or service (e.g., video service), data recorded based on a user's selection of an item (e.g., video) or option of the application, or selection of one or more items (e.g., videos) being recommended, user behavioral data and/or user meta data, or any combination thereof; (2) the calculated model update is encoded using an ε-differential privacy mechanism; (3) model updates ΔXi are generated from the model Xi, and user data, wherein the model updates ΔXi, to "learn" the model Y are then calculated in the user equipment 100a-100m for each user or client, such as Client 1-Client M, respectively, from the master model Xi, stored locally on a specific user equipment 100a-100m, and the corresponding local user data; and (4) the application of Differential Privacy encoding applied to the model updates ΔXi sent from the client or user equipment 100 back to the server 200 and the training of a machine learning model in federated mode with the application of differential privacy to the model updates v as applied to the FCF; i.e., Flanagan teaches "wherein determining the update to the machine learning model comprises: determining a difference for media contents interacted in each user equipment by the first version of the machine learning model (e.g., calculate delta of parameters ΔXi in a machine learning model to correct errors in the machine learning model during local training); and determining the update based on respective differences determined for the first set of media contents (e.g., updating the machine 
learning model globally by aggregating each model update ΔXi determined for media contents interacted in each user equipment)". Wu teaches in Abstract and Sections I-IV, and V.A. with FIGS. 1-4 that (1) use a self-supervision signal to pair users with the auxiliary information (tags) associated with the items they have interacted with before, wherein provide an efficient solution, using the auxiliary information (tags) directly to enhance the quality of user and item embeddings; e.g., for a given item, the model predicts which is the correct pairing between the representations obtained from the users that have interacted with this item and the tags assigned to it (i.e., generating labeled pairs (users, tags, and items) based on user interactions); (2) to make the pairing process more fine-grained and avoid embedding collapse, propose a user intent-aware self-supervised pairing process where split the user embeddings into multiple sub-embedding vectors; each sub-embedding vector captures a specific user intent via self-supervised alignment with a particular cluster of tags; (3) recommendation systems are primarily interested in using the user-item interaction history to predict the users’ interests and thereby recommend potential satisfactory items to users; (4) to alleviate the cold-start problem and improve the recommendation quality, auxiliary information (e.g., tags of items, reviews of items, profiles of users) is usually introduced into the item recommendation process to enrich the modeling of user-item interactions, wherein the intent here means the motivation behind a user’s interaction with an item; e.g., a user’s intent behind visiting a restaurant may be to experience good "service" or to "taste" good food, and thus, "service" and "taste" are related to two kinds of user intents, and a user may like a restaurant because she likes certain attributes (tags) of it (e.g., good service, wonderful taste, etc.), and it is unnecessary for her to like all the 
attributes of the restaurant (i.e., a review to a restaurant received from a reviewer with words only related to "good service" indicates that the reviewer has higher degree of interest in the "service" of the restaurant than the degree of interest in the food "taste" of the restaurant; similarly, a review to a restaurant received from a reviewer with words only related to "good taste" indicates that the reviewer has higher degree of interest in the food "taste" of the restaurant than the degree of interest in the "service" of the restaurant); (5) employ Intent-aware Representation Modeling (IRM) to decompose user and item embeddings into multiple components, where each component captures a specific intent whose semantic meaning is identified by a corresponding tag cluster, derived using a self-supervised end-to-end clustering method, and enforce independence of different intents, ensuring that intents are effectively disentangled; (5) for each intent (tag cluster), identify whether two items are similar by evaluating the Jaccard index between the items’ tag sets, limiting our attention to tags in the cluster, and then extend the intent-aware contrastive alignment in IMCA to optimize the alignment of aggregated user and tag representations derived from the sets of similar items, rather than individual items; (6) most works for Self-supervised learning (SSL) use the contrastive learning method, which maximizes the similarity of the representation of a target sample with the representations of corresponding positive samples (mutations of the target sample) and minimizes the similarity with representations of negative samples (samples known to be different); (7) construct self-supervised objectives from multiple sources to refine the representations for collaborative filtering, naturally bringing the tag information into training (i.e., using tags as training data); (8) for each user u ∈ U, (a) the user preference data is represented by a set of items 
she has interacted with as Iu+ := { i ∈ I | Yu,i = 1 } where Yu,i ∈ R|U|×|T| is the binary user-item rating matrix (i.e., indicating degree of user's interest in item); (b) analogously, use Y'u,i ∈ R|I|×|T| to represent the labelling history between items and tags; (c) then split Iu+ into a training set Su+ and a test set Tu+; (d) then the tag-enhanced top-N recommendation task is formulated as: given the training item set Su+, and the non-empty test item set Tu+ for user u, train a model to recommend an ordered set of N items Xu such that Xu ∩ Su+ = ∅; and |Xu| = N; (e) the model should learn from the collaborative filtering signal Y and the auxiliary tag information Y'; (9) the recommendation quality is evaluated by a matching metric between Xu and Tu+ such as Recall@N; (10) Bayesian Personalized Ranking (BPR) is one of the most widely studied methods in recommendation systems for learning the user preference from the implicit user-item interaction history; the core idea of BPR is to maximize the ranking of an item that the user has accessed (treated as a positive sample; i.e., a set of items she has interacted in Iu+) relative to a randomly sampled item (treated as a negative sample) (i.e., a user has higher degree of interest for items in positive samples than items in negative samples); (11) this goal is achieved via a carefully designed loss function LUV as in eqn. (1) based on difference between ỹuv+ and ỹuv-, where (u, v+, v–) is a training triplet with a positive item v+ and a negative item v– for user u; and ỹuv+ refers to the relevance score between u and v+; (12) adopt a similar formulation for learning the relations between items and tags, which can be viewed as recommending tags to items based on the previous item-tag pairing history; i.e., the loss function LVT for this task can be formulated as eqn. 
(2) based on difference between ỹvt+ and ỹvt-, where (v, t+, t–) is a training triplet with a positive tag t+ and a negative tag t– for item v; (13) use u, v, t to represent the individual embeddings of a user, item, and tag, respectively; (14) to model the user-item interaction behavior in a more fine-grained manner, decompose the representation of each user and item into K components, wherein each component (sub-embedding) aims to represent a distinct user preference and K denotes the number of user intents; (15) multiple tags with similar semantic meanings can be regarded as a common factor that biases a user to interact with items with similar traits; e.g., consider a restaurant recommendation scenario; a tag cluster "delicious food, yummy, amazing dessert" may correspond to the same factor that the users like the restaurant due to the "taste" of the food, while a tag cluster "feel at home, friendly waiter" can be used to explain another intent corresponding to a desire for good "service"; (16) focus on how to cluster tags so that the kth tag cluster can be properly aligned with the kth intent embedding for users and items; (17) employ an end-to-end self-supervised clustering approach to adaptively obtain the tag clusters using a Student’s t-distribution to model the probability of assigning the tag tl to the kth cluster as eqn. (4); (18) to build a self-supervised signal in the clustering task for an end-to-end learning, construct a target distribution which strives to push the representations closer to cluster centers, strengthening the cohesion of the clusters, wherein the target distribution is defined as eqn. (5); (19) construct a self-supervised loss objective LKL for the end-to-end clustering as the Kullback–Leibler (KL)-divergence between the above two matrices as eqn. 
(6); (20) use a hard allocation to assign each item to one tag cluster, wherein the assigned tag cluster index is determined by argmaxk(Qlk) for tag tl; (21) introduce a new formulation by combining the two modalities of information (user-item collaborative signals Y and item-tag auxiliary information Y') into a common user-item-tag space using a contrastive learning paradigm by treating the items as a middle ground to link the information coming from the other two sources because items are present in both user-item interactions and item-tag labels; (22) for a given item, conduct an aggregation on those users who previously interacted with the item, but design this aggregation to be intent-aware; (23) for a given item vj, compute its tag cluster embedding for each cluster through aggregating only over those tags assigned to vj; conduct this across all K clusters and all |V| items; (24) different items may have varying degrees of relatedness to distinct tag clusters and user intents; e.g., if an item has 10 tags related to intent 1 and only 1 tag related to intent 2, the item is more closely related to intent 1 than intent 2 (i.e., degree of user's interest on item for intent 1 (e.g., "taste") is higher than degree of user's interest on item for intent 2 (e.g., "service")); (25) use a vector mj ∈ RK to store the relatedness of vj with respect to all intents, wherein mj is computed based on the number of vj’s tags in each cluster, and the kth entry can be written as eqn. 
(9); (26) define M = [m1, m2, …, m|V|]T ∈ R|V|×K, where each row contains the relatedness of an item to all intents (i.e., matrix M indicating degree of user's interest on item for different intents obtained from user-item interactions Yu,i and item-tag labels Y'u,i of the training data); (27) aim to maximize the alignment of the pairs of positive samples from the sources of users and tags; (28) maximize the alignment between the aggregated user representation for intent k and the sum of the item embedding and its corresponding aggregated tag embedding for intent k when j = j', and minimize it when j ≠ j'; (29) adopt the commonly used InfoNCE loss formulation to maximize the cosine similarity of the correct pairings (i.e., correct labeling) of user representations and item/tag representations in each training batch while minimizing the cosine similarity of the embeddings of the incorrect pairings (i.e., incorrect labeling); (30) adopt a bidirectional contrastive alignment loss formulation LCA to ensure the pairing process across the multiple sources can be jointly exploited as eqn. (11); (31) the user to item-tag (u2it) alignment under the kth intent is formalized as eqn. (12) which uses the predefined matrix M to capture the degree of alignment for each item with respect to each intent based on the corresponding relatedness; Mj,k refers to the entry of M located at the jth row and the kth column, which denotes the relatedness of item vj to the kth cluster and intent (i.e., indicating degree of user's interest on item vj for kth intent); (32) analogously, the item-tag to user (it2u) alignment under the kth intent can be formalized as eqn. 
(13); (33) design more diverse positive sample pairs by aligning users with the tags not only from the items they have interacted with but also the tags from other similar items, which serves to enrich the representations of the cold start users and items; and adapt the model to allow forward and backward propagation for mini-batches of data; the overall training objective can be formulated as L = LUV + α LVT + β LCA* + γ LKL in eqn. (18), where α, β, and γ are scaling factors; (34) HetRec-MV is a movie recommendation dataset which links movies in the MovieLens dataset with their corresponding Internet Movie Database (IMDb) web pages and Rotten Tomatoes movie reviews, where each movie is assigned with tags provided by users; and (35) to be consistent with the implicit feedback setting, for the datasets with explicit ratings, retain any ratings no less than four (out of five) as positive feedback and treat all other ratings as missing entries. In other words, Wu DOES teach that for a given media content in the first set of media contents (e.g., a given item/restaurant/movie in a set of items/restaurants/movies v), assigning, to the given media content, a label corresponding to an interaction of the user with the given media content (e.g., assigning the tag tl to the kth cluster, assign each item to one tag cluster; assigning the tag tl to the kth cluster, tags assigned to vj, each movie is assigned with tags provided by users, and pairing between the representations obtained from the users that have interacted with this item and the tags assigned to it (i.e., generating labeled pairs (users u, tags t, and items v) based on training data from user-item interaction history and auxiliary information (e.g., tags of items, reviews of items, profiles of users)), the label indicating a degree of interest of the user in the given media content (e.g., "delicious food, yummy, amazing dessert", "feel at home, friendly waiter", if an item has 10 tags related to intent 1 and 
only 1 tag related to intent 2, the item is more closely related to intent 1 than intent 2 (i.e., degree of user's interest on the item for intent 1 (e.g., "taste" for a restaurant/"story" for a movie) is higher than degree of user's interest on item for intent 2 (e.g., "service" for a restaurant/"actor" for a movie), and explicit ratings – ratings four out of five); determining a difference between the label and a prediction of the given media content by the first version of the machine learning model (e.g., matching metric between Xu and Tu+, maximize correct pairing (i.e., correct labeling) and minimize incorrect pairing (i.e., incorrect labeling or errors) using forward and backward propagation for mini-batches of data to perform model training and updates). Therefore, the combination of Flanagan and Wu DOES disclose or suggest at least "wherein determining the update to the machine learning model comprises: for a given media content in the first set of media contents, assigning, to the given media content, a label corresponding to an interaction of the user with the given media content, the label indicating a degree of interest of the user in the given media content; determining a difference between the label and a prediction of the given media content by the first version of the machine learning model; and determining the update based on respective differences determined for the first set of media contents," as now recited in Claims 1, 9, and 17.
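For illustration only, the three quantities relied upon above can be sketched as follows: the per-intent relatedness vector mj of item (25), a pairwise loss computed from the difference between a positive and a negative relevance score as in item (11), and the Recall@N matching metric of item (9). This is a minimal sketch and is not taken from Wu. The function names are hypothetical. Normalizing mj by the item's total tag count is an assumption, because eqn. (9) is not reproduced in this action. The loss uses the standard BPR form, -ln sigmoid(ỹuv+ - ỹuv-), and whether Wu's eqn. (1) matches it term for term is likewise an assumption.

```python
import math


def relatedness(tag_cluster_ids, num_intents):
    """Share of an item's tags that fall in each tag cluster (intent).

    tag_cluster_ids holds one cluster index per tag assigned to the item.
    Dividing by the total tag count is an assumed normalization.
    """
    counts = [0] * num_intents
    for k in tag_cluster_ids:
        counts[k] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def bpr_loss(score_pos, score_neg):
    """Standard BPR pairwise loss: -ln(sigmoid(score_pos - score_neg)).

    Written in a numerically stable softplus form.
    """
    diff = score_pos - score_neg
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def recall_at_n(recommended, test_items, n):
    """Fraction of the (non-empty) test item set found in the top-n list."""
    hits = len(set(recommended[:n]) & set(test_items))
    return hits / len(test_items)


# An item with 10 tags in intent 0 and 1 tag in intent 1, as in item (24).
m_j = relatedness([0] * 10 + [1], num_intents=2)
```

In this sketch m_j[0] exceeds m_j[1], mirroring the statement that the item is more closely related to intent 1 than to intent 2, and the loss shrinks as the positive score rises above the negative score.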
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Miao et al. (US 2017/0091651 A1, pub. date on 03/30/2017) discloses in Abstract that (1) performing version control for asynchronous distributed machine learning; (2) transmit a first global version of a statistical model to a set of client computer systems; (3) obtain, from a first subset of the client computer systems, a first set of updates to the first global version; (4) merge the first set of updates into a second global version of the statistical model; and (5) transmit the second global version to the client computer systems asynchronously from receiving a second set of updates to the first and/or second global versions from a second subset of the client computer systems. Miao further discloses in ¶¶ [0055]-[0066] with FIGS. 1 and 3 that (1) the regularization of model adaptation for in-session recommendations; (2) shows the personalization of a statistical model (e.g., statistical model 108) to a user during a user session 310 with the user on a client (e.g., client 1 104, client y 106 of FIG. 1); (3) the client may obtain a global version 302 of the statistical model from a server, such as server 102 of FIG. 
1; e.g., the client may download global version 302 from the server at the beginning of a first user session 310 with the user; (4) the client may use the statistical model to interact with the user during user session 310; (5) during interaction with the user in user session 310, the client may use global version 302 to output one or more recommendations 318 to the user; (6) the client may also use user feedback 314 from the user to create a personalized version 306 of the statistical model during user session 310; (7) the client may create personalized version 306 from global version 302 at the beginning of user session 310 and use the session identifier for user session 310 as the update identifier for personalized version 306; (8) the client may track the user's clicks, views, searches, applications to job listings, and/or other activity with the job search tool as user feedback 314 from the user; (9) each piece of user feedback 314 may be provided as training data that is used to create or update personalized version 306 during user session 310; (10) the output of personalized version 306 may be adapted to the user's real-time behavior or preferences during user session 310; (11) training of personalized version 306 is affected by the quality 308 of user feedback 314; (12) quality 308 may be based on the context of and/or one or more attributes associated with user feedback 314; (13) to train personalized version 306 based on quality 308 of user feedback 314, each piece of user feedback 314 may be assigned a weight that reflects quality 308; (14) personalized version 306 is adapted more to the higher quality user feedback 314 than the lower quality user feedback 314; (15) after each piece of user feedback 314 is assigned a weight, the weight may be provided as additional training data to personalized version 306; (16) alternatively, the weight may be used to scale a value representing the corresponding user feedback 314 before the value is inputted as training 
data to personalized version 306; (17) similarly, user feedback 314 may be labeled before user feedback 314 is provided as training data to personalized version 306; e.g., user feedback 314 may be labeled as positive or negative feedback, with positive feedback representing positive user actions (e.g., clicks, views, searches, conversions, likes, shares, upvotes, follows, etc.) and negative feedback representing a lack of user action (e.g., non-clicks or ignores) or negative user actions (e.g., downvotes, dislikes, hides, unfollows, etc.); (18) the labels may also include weights associated with quality 308; e.g., strong positive labels for user feedback 314 may be associated with longer viewing times and/or lengthier user interaction, while weak positive labels for user feedback 314 may be associated with short viewing times and/or an immediate return to a previous screen of the application; (19) once personalized version 306 is adapted from global version 302 based on user feedback 314 during user session 310, personalized version 306 may be used to output one or more additional recommendations 320 to the user; (20) recommendations 320 from personalized version 306 may be based on both user feedback 314 and previously outputted recommendations 318; (21) recommendations 320 may be selected based on a similarity 330 to content associated with user feedback 314; (22) at the same time, because personalized version 306 is adapted from global version 302 instead of created only from relatively small amounts of user feedback 314 in user session 310, overfitting of personalized version 306 to user feedback 314 may be averted; (23) recommendations 318 and user feedback 314 may be used by personalized version 306 to avoid including previously outputted recommendations 318 in newer recommendations 320; (24) each time a recommendation is selected and shown to the user without receiving positive user feedback 314 (e.g., a click) associated with the recommendation, the 
importance of the recommendation is discounted; (25) if the user continues to ignore the recommendation, the frequency with which the recommendation is selected and/or shown may continue to decrease until the recommendation is no longer outputted to the user; (26) personalized version 306 may be used to output new recommendations, which may be more relevant and/or interesting to the user; (27) at the end of user session 310, the client may transmit an update 322 containing a difference between personalized version 306 and global version 302 to the server; (28) once update 322 is provided to the server, the client may discard personalized version 306; (29) the server may use update 322 and/or other updates to global version 302 or other previous global versions of the statistical model from other clients to produce a new global version 304 of the statistical model; e.g., the server may use version control to merge update 322 and/or the other updates into global version 304 asynchronously from receiving the updates from the clients; (30) the server may then transmit global version 304 to the clients, and the clients may adapt global version 304 into personalized versions during individual user sessions with a set of users; (31) as a result, the statistical model may be continuously updated through the creation of per-session personalized versions of the statistical model from global versions of the statistical model on the clients and the subsequent merging of the personalized versions into new global versions of the statistical model on the server; (32) personalized version 306 and update 322 are created based on one or more parameters 324 associated with regularized in-session adaptation of the statistical model; (33) parameters 324 may include an optimization parameter and/or a regularization parameter; (34) the optimization parameter may be used by the server to adjust the rate of convergence of global versions 302, 304 of the statistical model; (35) The 
regularization parameter may be used by the client to control the amount of personalization of the statistical model to the user during user session 310; (36) the regularization parameter may be adapted to user feedback 314 and/or other user behavior or characteristics; e.g., the regularization parameter may initially be set to a large value to prevent overfitting of the statistical model to limited user feedback 314 during user session 310; (37) as additional user feedback 314 is collected, the regularization parameter may be decreased to adapt personalized version 306 to the behavior of the user; (38) if the behavior of the user diverges from that of other users (e.g., based on aggregated user feedback from the other users used to create global version 302), the regularization parameter may continue to be decreased until the regularization parameter reaches 0; and (39) different values of the regularization parameter may be used with different user sessions with the user, and the value of the regularization parameter with the best performance may be selected for the user.
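For illustration only, the session flow summarized above for Miao (adapting a personalized version from a global version using weighted feedback, transmitting the difference as an update, and merging updates on the server) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Miao's implementation. The linear model, the squared-error gradient step, the penalty pulling parameters toward the global version, and averaging as the merge rule are all assumptions, and every function and parameter name (personalize, client_update, merge_updates, lr, reg, server_rate) is hypothetical.

```python
def personalize(global_params, feedback, lr=0.1, reg=1.0):
    """Adapt a copy of the global parameters to one session's feedback.

    feedback is a list of (features, label, weight) tuples, where weight
    stands in for the quality of that piece of feedback. A larger reg keeps
    the personalized version closer to the global version (assumed form).
    """
    params = list(global_params)
    for features, label, weight in feedback:
        pred = sum(p * x for p, x in zip(params, features))
        err = pred - label  # difference between prediction and label
        params = [
            p - lr * (weight * err * x + reg * (p - g))
            for p, x, g in zip(params, features, global_params)
        ]
    return params


def client_update(personalized, global_params):
    """Update sent to the server: personalized minus global, per parameter."""
    return [p - g for p, g in zip(personalized, global_params)]


def merge_updates(global_params, updates, server_rate=1.0):
    """Produce a new global version by averaging client updates (assumed rule)."""
    n = len(updates)
    return [
        g + server_rate * sum(u[i] for u in updates) / n
        for i, g in enumerate(global_params)
    ]


# One session with two pieces of positively labeled feedback.
global_v1 = [0.0]
session = [([1.0], 1.0, 1.0), ([1.0], 1.0, 1.0)]
update = client_update(personalize(global_v1, session), global_v1)
global_v2 = merge_updates(global_v1, [update])
```

In this sketch a larger reg yields a smaller update for the same feedback, which corresponds to the described use of a large initial regularization value to limit overfitting to sparse in-session feedback.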
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HWEI-MIN LU whose telephone number is (313)446-4913. The examiner can normally be reached Mon - Fri: 9:00 AM - 6:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D. Reyes, can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HWEI-MIN LU/Primary Examiner, Art Unit 2142