Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 9-15-2025 have been fully considered but they are not persuasive.
Please see the updated rejections below; the amended claims remain rejected under 35 U.S.C. 101 and 103.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-4, 6-10, 12-16, and 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1: Step 1: the claim is directed to a statutory category.
Step 2A Prong 1: The claim recites the following limitations:
randomly sampling a policy from a distribution of policies to obtain a sampled policy (randomly sampling a policy is, at a high level, an observation, evaluation, judgment, or opinion, i.e., a mental process that can reasonably be performed in one's mind with the aid of pencil and paper);
generating a candidate media-content-item recommendation using the sampled policy (generating a recommendation is, at a high level, an observation, evaluation, judgment, or opinion, i.e., a mental process that can reasonably be performed in one's mind with the aid of pencil and paper);
measuring a quality of the generated candidate media-content-item recommendation based on a predefined quality criteria (measuring quality is, at a high level, an observation, evaluation, judgment, or opinion, i.e., a mental process that can reasonably be performed in one's mind with the aid of pencil and paper), to determine if the candidate media-content-item recommendation is valid (determining validity is, at a high level, an observation, evaluation, judgment, or opinion, i.e., a mental process that can reasonably be performed in one's mind with the aid of pencil and paper); and
based on the measuring of the quality of the generated candidate media-content-item recommendation, adjusting the distribution of policies to generate an updated distribution of policies that are predicted to generate valid media-content-item recommendations (adjusting and updating the distribution is, at a high level, an observation, evaluation, judgment, or opinion, i.e., a mental process that can reasonably be performed in one's mind with the aid of pencil and paper; a reinforcement learning algorithm is a mathematical concept);
using the updated distribution of policies as a basis to generate at least one media-content-item recommendation (generating a recommendation using the updated distribution is, at a high level, an observation, evaluation, judgment, or opinion, i.e., a mental process that can reasonably be performed in one's mind with the aid of pencil and paper).
The claim recites an abstract idea.
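For context only, the recited sample-generate-measure-adjust loop resembles a distribution-based policy search (e.g., a cross-entropy-style update). A minimal sketch, assuming a one-dimensional Gaussian distribution of policies and a hypothetical quality criterion (none of the names, values, or the scoring function below are Applicant's disclosed implementation), is:

```python
import random

def refine_policy_distribution(rounds=200, target=0.8, seed=0):
    """Sketch of the recited loop (hypothetical names and criteria):
    sample a policy, generate a candidate, measure its quality, and
    adjust the distribution toward policies yielding valid output."""
    rng = random.Random(seed)
    mean, spread = 0.0, 1.0                 # the "distribution of policies"
    for _ in range(rounds):
        policy = rng.gauss(mean, spread)    # randomly sample a policy
        candidate = policy                  # stand-in recommendation generator
        score = -abs(candidate - target)    # measure quality vs. a criterion
        if score > -0.1:                    # candidate deemed "valid"
            mean += 0.1 * (policy - mean)   # adjust the distribution
            spread = max(0.05, spread * 0.9)
    return mean, spread

# Use the updated distribution as a basis to generate a recommendation
mean, spread = refine_policy_distribution()
recommendation = random.Random(1).gauss(mean, spread)
```

This is offered only to illustrate why the sampling, measuring, and adjusting steps, taken at this level of generality, can each be performed as evaluations and judgments on paper.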
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application. The claim recites the following additional elements:
A computing system, comprising: at least one processor; non-transitory storage; program instructions stored in the non-transitory data storage and executable by the at least one processor, to cause the at least one processor to (amounts to a generic computer component to perform a computer function as discussed in MPEP 2106.05(f)) to carry out operations for controlling successive presentation of media-content-items for playout by a media playback device (Examiner Note: intended use), the operations including:
using a trained parameter model (amounts to a generic computer component to perform a computer function as discussed in MPEP 2106.05(f));
wherein each policy in the distribution of policies comprises a respective set of actions to generate a media-content-item recommendation (amounts to generally linking the abstract idea to the technological environment or field of use as discussed in MPEP 2106.05(h));
using the at least one generated media-content-item recommendation as a basis to control presentation of a list of successive media-content-items for playout by the media playback device (amounts to insignificant extra-solution activity of merely applying the abstract idea, as discussed in MPEP 2106.05(g)).
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The judicial exceptions are not integrated into a practical application.
A computing system, comprising: at least one processor; non-transitory storage; program instructions stored in the non-transitory data storage and executable by the at least one processor, to cause the at least one processor to (amounts to a generic computer component to perform a computer function as discussed in MPEP 2106.05(f)) to carry out operations for controlling successive presentation of media-content-items for playout by a media playback device (Examiner Note: intended use), the operations including:
using a trained parameter model (amounts to a generic computer component to perform a computer function as discussed in MPEP 2106.05(f));
wherein each policy in the distribution of policies comprises a respective set of actions to generate a media-content-item recommendation (amounts to generally linking the abstract idea to the technological environment or field of use as discussed in MPEP 2106.05(h));
using the at least one generated media-content-item recommendation as a basis to control presentation of a list of successive media-content-items for playout by the media playback device (amounts to insignificant extra-solution activity of merely applying the abstract idea, as discussed in MPEP 2106.05(g), which is well-understood, routine, and conventional activity of applying it, as recognized in MPEP 2106.05(d)).
The claim is not patent eligible.
Claim 2: Step 1: the claim is directed to a statutory category.
Step 2A Prong 1: The claim recites the abstract idea of claim 1.
2. The computing system according to claim 1, wherein the generating, measuring, and adjusting cooperatively form at least part of a reinforcement learning (RL) algorithm (reinforcement learning is a mathematical concept).
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application.
The claim recites no additional elements.
Step 2B: As shown above, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The judicial exceptions are not integrated into a practical application.
The claim is not patent eligible.
Claim 3: Step 1: the claim is directed to a statutory category.
Step 2A Prong 1: The claim recites the abstract idea of claim 1.
Furthermore, defining the distribution of policies based on an action space (defining a distribution of policies is, at a high level, an observation, evaluation, judgment, or opinion, i.e., a mental process that can reasonably be performed in one's mind with the aid of pencil and paper);
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application.
The claim recites no additional elements.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The judicial exceptions are not integrated into a practical application.
The claim is not patent eligible.
Claim 4: Step 1: the claim is directed to a statutory category.
Step 2A Prong 1: The claim recites the abstract idea of claim 1.
sampling a predetermined number (K) of policies from the updated distribution of policies, thereby obtaining a predetermined number (K) of sampled policies, wherein K is the number of media-content-item recommendations to be generated (sampling a predetermined number (K) of policies is, at a high level, an observation, evaluation, judgment, or opinion, i.e., a mental process that can reasonably be performed in one's mind with the aid of pencil and paper); and
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application.
The claim recites the following additional elements:
wherein using the updated distribution of policies as a basis to generate at least one media-content-item recommendation comprises:
obtaining a plurality of environment settings (amounts to mere data gathering, an insignificant extra-solution activity as discussed in MPEP 2106.05(g), which is well-understood, routine, and conventional activity of receiving or gathering data as recognized by the courts in MPEP 2106.05(d));
passing the plurality of environment settings to the predetermined number (K) of sampled policies (amounts to insignificant extra-solution activity as discussed in MPEP 2106.05(g), which is well-understood, routine, and conventional activity of presenting offers or gathering statistics, as recognized in MPEP 2106.05(d)); and
using the predetermined number (K) of sampled policies, based on the plurality of environment settings, to generate K media-content-item recommendations (amounts to insignificant extra-solution activity of merely applying the abstract idea, as discussed in MPEP 2106.05(g), which is well-understood, routine, and conventional activity of applying it, as recognized in MPEP 2106.05(d)).
Step 2B: As shown above, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The judicial exceptions are not integrated into a practical application.
The claim is not patent eligible.
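For context only, the K-policy sampling recited in claim 4 can be sketched as follows. This is a minimal illustration with hypothetical names; the environment settings and the way a policy is applied are stand-ins, not Applicant's disclosed implementation:

```python
import random

def sample_k_policies(mean, spread, k, seed=0):
    # "Sampling a predetermined number (K) of policies" from the
    # updated distribution (here a 1-D Gaussian, for illustration)
    rng = random.Random(seed)
    return [rng.gauss(mean, spread) for _ in range(k)]

def recommend(policy, environment):
    # Hypothetical stand-in: condition the sampled policy on an
    # environment setting to produce one recommendation score
    return policy * environment.get("weight", 1.0)

K = 3  # K = number of media-content-item recommendations to generate
environment_settings = [{"weight": 0.5}, {"weight": 1.0}, {"weight": 2.0}]
policies = sample_k_policies(mean=0.8, spread=0.1, k=K)
# Pass the environment settings to the K sampled policies to
# generate K media-content-item recommendations
recommendations = [recommend(p, e)
                   for p, e in zip(policies, environment_settings)]
```

The sketch shows why, at this level of generality, obtaining settings and passing them to sampled policies amounts to data gathering plus application of the abstract sampling idea.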
Claim 6: Step 1: the claim is directed to a statutory category.
Step 2A Prong 1: The claim recites the abstract idea of claim 1.
Step 2A Prong 2: The judicial exceptions are not integrated into a practical application.
The claim recites the following additional elements:
6. The computing system according to claim 1, wherein using the at least one generated media-content-item recommendation as a basis to control presentation of a list of successive media-content-items for playout by the media playback device comprises:
selecting from a database of media-content-items at least one media-content-item in accordance with the at least one generated media-content-item recommendation (amounts to mere data gathering, an insignificant extra-solution activity as discussed in MPEP 2106.05(g), which is well-understood, routine, and conventional activity of receiving or gathering data as recognized by the courts in MPEP 2106.05(d)); and
communicating the at least one selected media-content-item to the playback device for playback (amounts to insignificant extra-solution activity as discussed in MPEP 2106.05(g), which is well-understood, routine, and conventional activity of presenting offers or gathering statistics, as recognized in MPEP 2106.05(d)).
Step 2B: As shown above, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The judicial exceptions are not integrated into a practical application.
The claim is not patent eligible.
Claims 7-10 and 12 are method claims having limitations similar to those of claims 1-4 and 6 and are rejected under the same rationale.
Claims 13-16 and 18 are non-transitory computer-readable medium claims having limitations similar to those of claims 1-4 and 6 and are rejected under the same rationale. The additional elements in claims 13-16 and 18 are: A non-transitory computer-readable medium having stored thereon one or more sequences of instructions for causing one or more processors to carry out operations for controlling successive presentation of media-content-items for playout by a media playback device, the operations including (amounts to performing the generic computer function of executing stored instructions, as discussed in MPEP 2106.05(f)). Accordingly, the additional elements do not integrate the abstract idea into a practical application and are not sufficient to amount to significantly more than the abstract idea. Therefore, the claims are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 6-8, 12-14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Malik et al. (US 2021/0117780 A1) in view of Cruz Huertas et al. (US 2019/0318008 A1) and further in view of Marino et al. ("On the Design of Variational RL Algorithms," 2019).
Claim 1. Malik discloses: A computing system, comprising: at least one processor; non-transitory storage; program instructions stored in the non-transitory data storage and executable by the at least one processor, to cause the at least one processor to ([0156] In particular embodiments, computer system 1600 includes a processor 1602, memory 1604, storage 1606, an input/output (I/O) interface 1608, a communication interface 1610, and a bus 1612. Although this disclosure describes and illustrates a particular computer system having a particular number of particular components in a particular arrangement, this disclosure contemplates any suitable computer system having any suitable number of any suitable components in any suitable arrangement.):
randomly sampling a policy from a distribution of policies to obtain a sampled policy ([0065] In particular embodiments, the action execution module 230 may call local agents to execute tasks. A local agent on the client system 130 may be able to execute simpler tasks compared to an agent on the server-side. As an example and not by way of limitation, multiple device-specific implementations (e.g., real-time calls for a client system 130 or a messaging application on the client system 130) may be handled internally by a single agent. Alternatively, these device-specific implementations may be handled by multiple agents associated with multiple domains. In particular embodiments, the action execution module 230 may additionally perform a set of general executable dialog actions. The set of executable dialog actions may interact with agents, users and the assistant system 140 itself. These dialog actions may comprise dialog actions for slot request, confirmation, disambiguation, agent execution, etc. The dialog actions may be independent of the underlying implementation of the action selector or dialog policy. Both tree-based policy and model-based policy may generate the same basic dialog actions, with a callback function hiding any action selector specific implementation details. [0095] In particular embodiments, for a given training round t, a server may use a random selection process to select a subset of m client systems S.sub.m.sup.t⊂U to train the machine-learning model, such that a complement subset of (j−m) client system .sub.j−m.sup.t=U.sub.j\S.sub.m.sup.t is not selected to train the model. The server may then send a current version of the model having current model parameters w.sup.t to each of the selected client systems S.sub.m.sup.t. In particular embodiments, the random selection process may be an independent and uniform random sampling method, or any other suitable random selection method.);
wherein each policy in the distribution of policies comprises a respective set of actions to generate a media-content-item recommendation (… the general policy 346 may be used for actions that are not specific to individual tasks. The general policy 346 may comprise handling low confidence intents, internal errors, unacceptable user response with retries, skipping or inserting confirmation based on ASR or NLU confidence scores, etc. The general policy 346 may also comprise the logic of ranking dialog state update candidates from the dialog state tracker 337 output and pick the one to update (such as picking the top ranked task intent). In particular embodiments, the assistant system 140 may have a particular interface for the general policy 346, which allows for consolidating scattered cross-domain policy/business-rules, especial those found in the dialog state tracker 337, into a function of the action selector 341. The interface for the general policy 346 may also allow for authoring of self-contained sub-policy units that may be tied to specific situations or clients, e.g., policy functions that may be easily switched on or off based on clients, situation, etc. The interface for the general policy 346 may also allow for providing a layering of policies with back-off, i.e. multiple policy units, with highly specialized policy units that deal with specific situations being backed up by more general policies 346 that apply in wider circumstances. In this context the general policy 346 may alternatively comprise intent or task specific policy. In particular embodiments, a task policy 347 may comprise the logic for action selector 341 based on the task and current state. In particular embodiments, there may be the following four types of task policies 347: 1) manually crafted tree-based dialog plans; 2) coded policy that directly implements the interface for generating actions; 3) configurator-specified slot-filling tasks; and 4) machine-learning model based policy learned from data.
In particular embodiments, the assistant system 140 may bootstrap new domains with rule-based logic and later refine the task policies 347 with machine-learning models. In particular embodiments, a dialog policy 345 may a tree-based policy, which is a pre-constructed dialog plan. Based on the current dialog state, a dialog policy 345 may choose a node to execute and generate the corresponding actions. As an example and not by way of limitation, the tree-based policy may comprise topic grouping nodes and dialog action (leaf) nodes. [0151] In particular embodiments, privacy settings may allow a first user to specify whether particular objects or information associated with the first user may be accessed from particular client systems 130 or third-party systems 170. The privacy settings may allow the first user to opt in or opt out of having objects or information accessed from a particular device (e.g., the phone book on a user's smart phone), from a particular application (e.g., a messaging app), or from a particular system (e.g., an email server). The social-networking system 160 or assistant system 140 may provide default privacy settings with respect to each device, system, or application, and/or the first user may be prompted to specify a particular privacy setting for each context. As an example and not by way of limitation, the first user may utilize a location-services feature of the social-networking system 160 or assistant system 140 to provide recommendations for restaurants or other places in proximity to the user. The first user's default privacy settings may specify that the social-networking system 160 or assistant system 140 may use location information provided from a client device 130 of the first user to provide the location-based services, but that the social-networking system 160 or assistant system 140 may not store the location information of the first user or provide it to any third-party system 170. 
The first user may then update the privacy settings to allow location information to be used by a third-party image-sharing application in order to geo-tag photos.);
generating a candidate media-content-item recommendation using the sampled policy (… generate a personalized set of content objects to display to a user, such as a newsfeed of aggregated stories of other users connected to the user. [0066] … In particular embodiments, the CU composer may check privacy constraints associated with the user to make sure the generation of the communication content follows the privacy policies. ..);
measuring a quality of the generated candidate media-content-item recommendation based on a predefined quality criteria (… in which k indicates a knowledge source, c indicates a communicative goal, u indicates a user model, and d indicates a discourse model. In particular embodiments, the CU composer may comprise a natural-language generation (NLG) module and a user interface (UI) payload generator. The natural-language generator may generate a communication content based on the output of the action execution module 226 using different language models and/or language templates. ..The content determination component may determine the communication content based on the knowledge source, communicative goal, and the user's expectations. [0080] … In particular embodiments, the dialog state tracker 337 may update/rank the dialog state of the current dialog session. As an example and not by way of limitation, the dialog state tracker 337 may update the dialog state as “completed” if the dialog session is over. As another example and not by way of limitation, the dialog state tracker 337 may rank the dialog state based on a priority associated with it..), to determine if the candidate media-content-item recommendation is valid (… generate a communication content based on the output of the action execution module 226 using different language models and/or language templates. ..The content determination component may determine the communication content based on the knowledge source, communicative goal, and the user's expectations. Examiner Note: any content determined to be satisfactory to the user's goal/expectation is considered valid content); and
based on the measuring of the quality of the generated candidate media-content-item recommendation, adjusting the distribution of policies to generate an updated distribution of policies that are predicted to generate valid media-content-item recommendations (… the dialog manager may use reinforcement learning for dialog optimization. Assistant state tracking aims to keep track of a state that changes over time as a user interacts with the world and the assistant system 140 interacts with the user. As an example and not by way of limitation, assistant state tracking may track what a user is talking about, whom the user is with, where the user is, what tasks are currently in progress, and where the user's gaze is at, etc., subject to applicable privacy policies. In particular embodiments, the dialog manager may use a set of operators to track the dialog state. The operators may comprise the necessary data and logic to update the dialog state. Each operator may act as delta of the dialog state after processing an incoming request. In particular embodiments, the dialog manager may further comprise a dialog state tracker and an action selector. In alternative embodiments, the dialog state tracker may replace the entity resolution component and resolve the references/mentions and keep track of the state. [0009] In particular embodiments, a client system may receive, from one or more remote servers, a current version of a neural network model comprising a plurality of model parameters. The client system may then train the neural network model on a plurality of examples retrieved from a local data store to generate a plurality of updated model parameters. ... [0084] In particular embodiments, the action execution module 226 may call different agents 350 for task execution. An agent 350 may select among registered content providers to complete the action. The data structure may be constructed by the dialog manager 335 based on an intent and one or more slots associated with the intent.
A dialog policy 345 may further comprise multiple goals related to each other through logical operators. In particular embodiments, a goal may be an outcome of a portion of the dialog policy and it may be constructed by the dialog manager 335. A goal may be represented by an identifier (e.g., string) with one or more named arguments, which parameterize the goal. As an example and not by way of limitation, a goal with its associated goal argument may be represented as {confirm_artist, args: {artist: “Madonna”}}. In particular embodiments, a dialog policy may be based on a tree-structured representation, in which goals are mapped to leaves of the tree. In particular embodiments, the dialog manager 335 may execute a dialog policy 345 to determine the next action to carry out. The dialog policies 345 may comprise generic policy 346 and domain specific policies 347, both of which may guide how to select the next system action based on the dialog state. In particular embodiments, the task completion component 340 of the action execution module 226 may communicate with dialog policies 345 comprised in the dialog arbitrator 216 to obtain the guidance of the next system action. In particular embodiments, the action selection component 341 may therefore select an action based on the dialog intent, the associated content objects, and the guidance from dialog policies 345);
using the updated distribution of policies as a basis to generate at least one media-content-item recommendation ([0055] In particular embodiments, the dialog manager may conduct dialog optimization and assistant state tracking. Dialog optimization is the problem of using data to understand what the most likely branching in a dialog should be. As an example and not by way of limitation, with dialog optimization the assistant system 140 may not need to confirm who a user wants to call because the assistant system 140 has high confidence that a person inferred based on dialog optimization would be very likely whom the user wants to call. In particular embodiments, the dialog manager may use reinforcement learning for dialog optimization. Assistant state tracking aims to keep track of a state that changes over time as a user interacts with the world and the assistant system 140 interacts with the user. As an example and not by way of limitation, assistant state tracking may track what a user is talking about, whom the user is with, where the user is, what tasks are currently in progress, and where the user's gaze is at, etc., subject to applicable privacy policies. In particular embodiments, the dialog manager may use a set of operators to track the dialog state. The operators may comprise the necessary data and logic to update the dialog state. Each operator may act as delta of the dialog state after processing an incoming request. In particular embodiments, the dialog manager may further comprise a dialog state tracker and an action selector. In alternative embodiments, the dialog state tracker may replace the entity resolution component and resolve the references/mentions and keep track of the state. [0009] In particular embodiments, a client system may receive, from one or more remote servers, a current version of a neural network model comprising a plurality of model parameters.
The client system may then train the neural network model on a plurality of examples retrieved from a local data store to generate a plurality of updated model parameters. ... [0084] In particular embodiments, the action execution module 226 may call different agents 350 for task execution. An agent 350 may select among registered content providers to complete the action. The data structure may be constructed by the dialog manager 335 based on an intent and one or more slots associated with the intent. A dialog policy 345 may further comprise multiple goals related to each other through logical operators. In particular embodiments, a goal may be an outcome of a portion of the dialog policy and it may be constructed by the dialog manager 335. A goal may be represented by an identifier (e.g., string) with one or more named arguments, which parameterize the goal. As an example and not by way of limitation, a goal with its associated goal argument may be represented as {confirm_artist, args: {artist: “Madonna”}}. In particular embodiments, a dialog policy may be based on a tree-structured representation, in which goals are mapped to leaves of the tree. In particular embodiments, the dialog manager 335 may execute a dialog policy 345 to determine the next action to carry out. The dialog policies 345 may comprise generic policy 346 and domain specific policies 347, both of which may guide how to select the next system action based on the dialog state. In particular embodiments, the task completion component 340 of the action execution module 226 may communicate with dialog policies 345 comprised in the dialog arbitrator 216 to obtain the guidance of the next system action. In particular embodiments, the action selection component 341 may therefore select an action based on the dialog intent, the associated content objects, and the guidance from dialog policies 345);
While Malik discloses a smart assistant using RL for restaurant recommendation ([0051]), Malik fails to explicitly disclose media playlist recommendation. Specifically, Malik fails to explicitly disclose carrying out operations for controlling successive presentation of media-content-items for playout by a media playback device, and using the at least one generated media-content-item recommendation as a basis to control presentation of a list of successive media-content-items for playout by the media playback device.
However, Cruz Huertas discloses RL (thereby in the same field of endeavor) and further discloses carrying out operations for controlling successive presentation of media-content-items for playout by a media playback device, using the at least one generated media-content-item recommendation as a basis to control presentation of a list of successive media-content-items for playout by the media playback device ([0003] Embodiments of the present invention disclose a method, computer system, and a computer program product for selecting a media playlist based on learning past behaviors of a user. The present invention may include receiving a plurality of current user data associated with the user from a user device, wherein the received plurality of current user data associated with the user includes a plurality of user reaction data to a plurality of media selections corresponding with the user. The present invention may then include receiving a plurality of current external conditions data associated with the user from the user device. The present invention may also include enriching a plurality of current raw data associated with the received plurality of current user data, the received plurality of user reactions to the plurality of media selections and the received plurality of current external conditions data. The present invention may further include determining the received plurality of current user data exceeds a threshold associated with the user. The present invention may also include, in response to determining that the received plurality of current user data exceeds the threshold, creating a dataset based on the determined plurality of current user data based on the exceeded threshold associated with the user.
The present invention may then include retrieving, from a records of learned user preferences and behaviors database and a combination of external digital devices, a media playlist based on the determined plurality of current user data exceeding the threshold associated with the user, wherein the retrieved media playlist alters the received plurality of current user data. The present invention may further include sending the retrieved media playlist to a media device associated with the user. [0023] According to at least one embodiment, the media customization program may be able to learn user behaviors based on media played according to the context (e.g., location, user activity, weather, date/time, cognitive state) and based on context may intelligently propose songs liked by the user or similar to the media element liked by the user depending on the detected context. The provided functions of the media customization program save the user from continuously shifting between songs and creating playlists. The present embodiment may include the optimization of the streaming of media files for those songs that are more probable to be played based on the context.)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to apply the RL smart assistant of Malik to the field of media playlist recommendation in light of the teaching of Cruz Huertas.
Given that a smart assistant can be applied to various areas (Malik’s [0003] An assistant system can provide information or services on behalf of a user based on a combination of user input, location awareness, and the ability to access information from a variety of online sources (such as weather conditions, traffic congestion, news, stock prices, user schedules, retail prices, etc.). The user input may include text (e.g., online chat), especially in an instant messaging application or other applications, voice, images, motion, or a combination of them. The assistant system may perform concierge-type services (e.g., making dinner reservations, purchasing event tickets, making travel arrangements) or provide information based on the user input. The assistant system may also perform management or data-handling tasks based on online information and events without user initiation or interaction. Examples of those tasks that may be performed by an assistant system may include schedule management (e.g., sending an alert to a dinner date that a user is running late due to traffic conditions, update schedules for both parties, and change the restaurant reservation time). The assistant system may be enabled by the combination of computing devices, application programming interfaces (APIs), and the proliferation of applications on user devices.), one having ordinary skill in the art would have been motivated to make this obvious modification with a predictable result.
Claim 2. Malik discloses the computing system according to claim 1, wherein the generating, measuring, and adjusting cooperatively form at least part of a reinforcement learning (RL) algorithm ([0055] In particular embodiments, the dialog manager may conduct dialog optimization and assistant state tracking. Dialog optimization is the problem of using data to understand what the most likely branching in a dialog should be. As an example and not by way of limitation, with dialog optimization the assistant system 140 may not need to confirm who a user wants to call because the assistant system 140 has high confidence that a person inferred based on dialog optimization would be very likely whom the user wants to call. In particular embodiments, the dialog manager may use reinforcement learning for dialog optimization. Assistant state tracking aims to keep track of a state that changes over time as a user interacts with the world and the assistant system 140 interacts with the user. As an example and not by way of limitation, assistant state tracking may track what a user is talking about, whom the user is with, where the user is, what tasks are currently in progress, and where the user's gaze is at, etc., subject to applicable privacy policies. In particular embodiments, the dialog manager may use a set of operators to track the dialog state. The operators may comprise the necessary data and logic to update the dialog state. Each operator may act as delta of the dialog state after processing an incoming request. In particular embodiments, the dialog manager may further comprise a dialog state tracker and an action selector. In alternative embodiments, the dialog state tracker may replace the entity resolution component and resolve the references/mentions and keep track of the state).
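For context, the generate/measure/adjust loop that claim 2 characterizes as reinforcement learning can be illustrated with a minimal sketch. The item names, quality criterion, and update rule below are hypothetical illustrations, not drawn from Malik or the claims:

```python
import random

random.seed(0)  # deterministic for illustration

ITEMS = ["song_a", "song_b", "song_c"]

def sample_policy(weights):
    # A "policy" here is simply a preferred item drawn from a weighted distribution.
    return random.choices(ITEMS, weights=weights, k=1)[0]

def measure_quality(candidate, liked):
    # Quality criterion: reward 1.0 if the candidate is in the user's liked set.
    return 1.0 if candidate in liked else 0.0

def adjust(weights, candidate, reward, lr=0.5):
    # Shift probability mass toward items that produced valid candidates.
    weights[ITEMS.index(candidate)] += lr * reward
    return weights

weights = [1.0, 1.0, 1.0]
liked = {"song_b"}
for _ in range(200):
    candidate = sample_policy(weights)            # generate
    reward = measure_quality(candidate, liked)    # measure
    weights = adjust(weights, candidate, reward)  # adjust

best = ITEMS[weights.index(max(weights))]
```

After the loop, the distribution concentrates on the liked item, which is the role the adjust step plays in the claimed RL formulation.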
Claim 6. Cruz Huertas discloses the computing system according to claim 1, wherein using the at least one generated media-content-item recommendation as a basis to control presentation of a list of successive media-content-items for playout by the media playback device comprises:
selecting from a database of media-content-items at least one media-content-item in accordance with the at least one generated media-content-item recommendation ([0003] Embodiments of the present invention disclose a method, computer system, and a computer program product for selecting a media playlist based on learning past behaviors of a user. The present invention may include receiving a plurality of current user data associated with the user from a user device, wherein the received plurality of current user data associated with the user includes a plurality of user reaction data to a plurality of media selections corresponding with the user. The present invention may then include receiving a plurality of current external conditions data associated with the user from the user device. The present invention may also include enriching a plurality of current raw data associated with the received plurality of current user data, the received plurality of user reactions to the plurality of media selections and the received plurality of current external conditions data. The present invention may further include determining the received plurality of current user data exceeds a threshold associated with the user. The present invention may also include, in response to determining that the received plurality of current user data exceeds the threshold, creating a dataset based on the determined plurality of current user data based on the exceeded threshold associated with the user. The present invention may then include retrieving, from a records of learned user preferences and behaviors database and a combination of external digital devices, a media playlist based on the determined plurality of current user data exceeding the threshold associated with the user, wherein the retrieved media playlist alters the received plurality of current user data. The present invention may further include sending the retrieved media playlist to a media device associated with the user. 
[0070] Alternatively, the media customization program 110a, 110-b may create a dataset based on the user reaction to a particular media selection. The Customized Playlist Selection module may be utilized to query the records of learned user preferences and behaviors database 212 for the media corresponding to the created dataset associated with the change in the anatomical functions or systems associated with the user, to retrieve media that may alleviate the change to the user's anatomical functions or systems); and
communicating the at least one selected media-content-item to the playback device for playback ([0023] According to at least one embodiment, the media customization program may be able to learn user behaviors based on media played according to the context (e.g., location, user activity, weather, date/time, cognitive state) and based on context may intelligently propose songs liked by the user or similar to the media element liked by the user depending on the detected context. The provided functions of the media customization program save the user from continuously shifting between songs and creating playlists. The present embodiment may include the optimization of the streaming of media files for those songs that are more probable to be played based on the context. [0072] Then, at 310, the resulting media playlist is retrieved by the media customization program 110a, 110-b. After the media customization program 110a, 110-b searches the records of learned user preferences and behaviors database 212 and analyzes a combination of external factors (e.g., positioning activities) from digital devices (e.g., smartwatch) that may describe the activity the individual is performing, media titles suitable for the user's current situation may be returned. The media titles may then be sent to the extension, which may deliver the retrieved resulting media playlist (i.e., media playlist) to a device associated with the user (e.g., media player, a preferred application, preferred method of choice to listen to media that already contains a predetermination of the user's preferred types of media). The device associated with the user may then stream the retrieved media playlist to the user.).
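The two steps of claim 6 (selecting items from a media database in accordance with the recommendation, then communicating them to the playback device) can be sketched as follows. The database schema, field names, and matching rule are illustrative assumptions, not taken from Cruz Huertas:

```python
# Hypothetical media database; each entry stands in for a media-content-item.
MEDIA_DB = [
    {"id": 1, "genre": "jazz"},
    {"id": 2, "genre": "rock"},
    {"id": 3, "genre": "jazz"},
]

def select_items(db, recommendation):
    # Step 1: select items from the database matching the recommendation.
    return [m for m in db if m["genre"] == recommendation["genre"]]

def communicate(items, playback_queue):
    # Step 2: send the selected items to the playback device's queue.
    playback_queue.extend(m["id"] for m in items)
    return playback_queue

queue = communicate(select_items(MEDIA_DB, {"genre": "jazz"}), [])
# queue now holds the successive items for playout.
```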
Claims 7-8, 12-14, 18 are method and medium claims having similar limitations as claims 1-2, 6 and are rejected under the same rationale. (See Malik [0012] The embodiments disclosed herein are only examples, and the scope of this disclosure is not limited to them. Particular embodiments may include all, some, or none of the components, elements, features, functions, operations, or steps of the embodiments disclosed herein. Embodiments according to the invention are in particular disclosed in the attached claims directed to a method, a storage medium, a system and a computer program product, wherein any feature mentioned in one claim category, e.g. method, can be claimed in another claim category, e.g. system, as well).
Claim(s) 3-4, 9-10, 15-16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Malik et al (US 2021/0117780 A1) in view of Cruz Huertas et al (US 20190318008 A1), and further in view of Marino et al (“On the Design of Variational RL Algorithms” 2019).
Claim 3. While Malik discloses RL, Malik fails to explicitly disclose defining a distribution of policies based on an action space.
However, Marino discloses RL (thereby in the same field of endeavor) and further discloses defining a distribution of policies based on an action space (see section 3.2.1).
[Image: media_image1.png, greyscale]
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have the RL of Malik define a distribution of policies based on an action space in light of the teaching of Marino.
Given that standard RL involves defining a distribution of policies based on an action space (see Marino section 3.2.1), one having ordinary skill in the art would have been motivated to make this obvious modification with a predictable result.
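As a point of reference, "a distribution of policies based on an action space" can be sketched as drawing softmax policies over a fixed action space, with the policies themselves distributed by sampling their logits from a Gaussian. This is an illustrative reading; the action names and the Gaussian parameterization below are assumptions, not taken from Marino:

```python
import math
import random

ACTION_SPACE = ["play", "skip", "repeat"]

def sample_policy(mean_logits, std=1.0, rng=random):
    # Draw one policy: perturb the mean logits, then softmax over the action space.
    logits = [m + rng.gauss(0.0, std) for m in mean_logits]
    z = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - z) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]  # one probability per action

random.seed(1)
policy = sample_policy([0.0, 0.0, 0.0])
# Each sampled policy is a valid probability distribution over the action space.
```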
Claim 4. Marino discloses the computing system according to claim 1, wherein using the updated distribution of policies as a basis to generate at least one media-content-item recommendation comprises:
obtaining a plurality of environment settings (see section 2.1 on environment state transitions);
[Image: media_image2.png, greyscale]
sampling a predetermined number (K) of policies from the distribution of policies, thereby obtaining a predetermined number (K) of sampled policies (see sections 3.2 and 3.2.4 on M sampled trajectories), wherein K is the number of media-content-item recommendations to be generated; and passing the plurality of environment settings to the predetermined number (K) of sampled policies (see section 3.2.4).
[Image: media_image3.png, greyscale]
[Image: media_image4.png, greyscale]
Cruz Huertas discloses using the predetermined number (K) of sampled policies, based on the plurality of environment settings, to generate K media-content-item recommendations ([0022] Therefore, it may be advantageous to, among other things, assess a user's interests and media selections based on past behaviors and foreign data from sources (e.g., GPS, sensor movement). Specifically, the media customization program may learn characteristics of when a user prefers certain genres of music, then offer music to match those genres when those conditions are replicated. For instance, the media customization program may base key decisions on reinforcement learning and K-nearest neighbors that can adapt to items such as location-based services, accelerometer readings, or other parameters, to better match a user's tastes and change the user status.).
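The claim 4 sequence (sample K policies from the distribution, pass the environment settings to each, obtain K recommendations, one per sampled policy) can be sketched as follows. The item names, environment settings, and scoring rule are hypothetical, not drawn from Marino or Cruz Huertas:

```python
import random

ITEMS = ["podcast", "upbeat_mix", "calm_mix"]

def sample_policies(k, rng):
    # Each sampled "policy" is a weighting over the candidate items.
    return [[rng.random() for _ in ITEMS] for _ in range(k)]

def recommend(policy_weights, env):
    # Apply one policy to the environment settings to pick one item.
    boost = 1.0 if env.get("activity") == "workout" else 0.0
    scored = [w + (boost if item == "upbeat_mix" else 0.0)
              for item, w in zip(ITEMS, policy_weights)]
    return ITEMS[scored.index(max(scored))]

rng = random.Random(0)
K = 3  # number of recommendations to generate
env_settings = {"activity": "workout", "time": "morning"}
recommendations = [recommend(p, env_settings) for p in sample_policies(K, rng)]
# One recommendation per sampled policy, K in total.
```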
Claims 9-10, 15-16 are method and medium claims having similar limitations as claims 3-4 and are rejected under the same rationale.
Pertinent Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Nakada et al (US 20190244133 A1) disclose reinforcement learning. See e.g. abstract.
Kaiser et al (“MODEL BASED REINFORCEMENT LEARNING FOR ATARI” 2020) disclose model-based reinforcement learning for Atari games. See e.g. abstract.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LUT WONG whose telephone number is (571)270-1123. The examiner can normally be reached M-F 10am-6pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached at (571) 270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LUT WONG/Primary Examiner, Art Unit 2127