DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in reply to the communications filed on October 27, 2025. The Applicant’s Amendment and Request for Reconsideration has been received and entered.
Claims 1-20 are currently pending. Claims 10-20 have been withdrawn in response to the restriction requirement. Claims 1 and 5 have been amended. Claims 1-9 have been examined in this application.
Response to Arguments
Applicant’s amendments necessitated the new grounds of rejection.
The previous rejection of claim 5 under 35 USC 112(b) has been withdrawn in view of Applicant’s amendments.
Regarding the rejection of claims 1-9 under 35 USC 101, Applicant’s arguments have been fully considered but they are not persuasive for the reasons set forth infra.
Additionally, Applicant argues that the invention solves the technical challenges of redundant storage and inefficient search. The Examiner respectfully argues that the claims merely disclose updating items listed at each listing platform based on the new distribution of items to each listing platform, which does not necessarily or inherently support “reducing redundant data storage.” The Examiner further argues that “optimizing item distribution and search ranking rules across multiple listing platforms using reinforcement learning” is not an improvement to another technology, but rather is an improvement to business operations. Indeed, as per Applicant’s specification, “Each of these listing platforms maintains a database storing information regarding available items and provides interfaces that enable users to access item information and otherwise interact with the listing platform, for instance, to purchase, rent, download, or stream items. Each listing platform also typically provides a search engine to facilitate users finding items on the listing platform.” (App. Spec. [0001]). Thus, optimizing item distribution and search ranking rules across multiple listing platforms using reinforcement learning improves the facilitation of users finding items on the listing platform to purchase, rent, download, or stream -- an improvement to a business operation rather than to a technology or technical field.
Applicant’s remaining arguments have been fully considered but they are not persuasive. Particularly, Applicant’s arguments are directed to the instantly amended claims, and are thus moot in view of the new grounds of rejection.
Still, to offer further transparency as to the Examiner’s interpretation of the claims, the Examiner offers the following supplemental explanation:
The limitation “initializing, by at least one of the one or more servers of the item distribution and ranking system, a reinforcement learning agent using the item interaction data” is taught by the combination of Round and Zhu. Specifically, Round teaches initializing, by at least one of one or more servers of an item distribution and ranking system, a . . . learning agent using the item interaction data (Round: Fig. 1; [0057]; [0073]; [0086]-[0087]; [0102]-[0104]), and Zhu teaches that this learning agent may be a reinforcement learning agent (Zhu: [0047]; Fig. 3; [0050]-[0057]; [0109]-[0126]). Similarly, it is the combination of Round/Zhu which teaches “deploying, by at least one of the one or more servers of the item distribution and ranking system, the reinforcement learning agent to use a function to select an action at each of a plurality of epochs and update the function at each epoch, the action selected by the function at each epoch changing a current distribution of items to each listing platform from the plurality of listing platforms and current search ranking rules for each listing platform to a new distribution of items to each listing platform and new search ranking rules for each listing platform,” as found infra. In response to Applicant’s arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). The Examiner respectfully notes that newly applied reference Thadani (US PGP 2020/0401881) also teaches “initializing, by at least one of the one or more servers of the item distribution and ranking system, a reinforcement learning agent using the item interaction data” (Thadani: Fig. 1; [0030] (employs reinforcement learning); [0037]-[0041] (reinforcement learning model); Fig. 2; [0042]-[0044] (The modeling component 204 is configured to execute modeling workflows including data collection 206 to obtain training data, and model training 208 to train (or re-train) the neural network using the training data. . . . For purposes of reinforcement learning, each training datapoint consists of a comment ranking shown to a user (which can be represented by a comment feature matrix) and the corresponding reward (e.g. dwell time) for that ranking. . . . When a user sees the ranking, the front-end (e.g. content server 232) logs the dwell time the user spent on that ranking, which is the reward for the reinforcement learning.)). Thadani further teaches “deploying, by at least one of the one or more servers of the item distribution and ranking system, the reinforcement learning agent to use a function to select an action at each of a plurality of epochs and update the function at each epoch, the action selected by the function at each epoch changing a current distribution of items to each listing platform from the plurality of listing platforms and current search ranking rules for each listing platform to a new distribution of items to each listing platform and new search ranking rules for each listing platform” (Thadani: Fig. 1; [0031]-[0032] (FIG. 1 conceptually illustrates a method for using a reinforcement learning model to rank comments for serving to users, in accordance with implementations of the disclosure. . . . Given the feature vectors, the system ranks the comments and presents them to the user.
The user views the comments in their ranked order, and reacts in some way that is measurable as a scalar reward (e.g. dwell time) for the particular ranking. This reward is processed and used to update the ranking mechanism in order to maximize the reward.); [0039]-[0041] (For a given generated ranking, comments are served in accordance with the order defined by the generated ranking, and the reward (e.g. dwell time) is captured. This information is used to update/optimize the scoring model. To update the scoring model, an implementation of a policy gradient algorithm is applied. The policy gradient (PG) algorithm converts the policy search problem into an optimization problem. It works by repeatedly estimating the gradient of the policy's performance (reward) with respect to its parameters followed by gradient ascent to find parameters that can increase the expected rewards.); Fig. 2; [0042]-[0046] (The result of the training is to update the parameters of the neural network (e.g. weights, biases), which are stored to a model document 210. By implementing using a model document to store the model parameters, to update the model the system can feed new weights to that document, and when the model is loaded at runtime for each query, then the next ranking call will always load the latest model parameters. The updated model is thus fed to the comments serving component 212, which uses it to provide optimized presentation of comments, e.g. to maximize dwell time while simultaneously providing for explorative learning. . . . While ideal reinforcement learning updates the model after every datapoint, data may not be received that quickly due to processing time. Thus, in some implementations, training data is collected in batches, which are then used to re-train the neural network. By way of example without limitation, training data can be collected in one or two-hour batches, or any other predefined time period. To carry out model re-training, the current model is loaded, and an iteration of reinforcement learning is run on the (e.g. two-hour) batch of training data. Given training data, the neural network parameters (e.g. weights) are updated.))
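For illustration only, the epoch-based cycle recited in the claims and described above -- the agent’s function selects an action, a scalar reward (e.g., a dwell-time-like metric) is observed, and the function is updated -- can be sketched with the following simplified, bandit-style stand-in. All names and values are hypothetical; this is not code from any reference of record.

```python
# Illustrative, simplified sketch only; hypothetical names and toy rewards.
import random

ACTIONS = ["distribution_a", "distribution_b", "distribution_c"]
value = {a: 0.0 for a in ACTIONS}   # the "function": estimated value per action
counts = {a: 0 for a in ACTIONS}

def observe_reward(action):
    # Stand-in for logged user interactions (e.g., dwell time) during an epoch.
    mean = {"distribution_a": 1.0, "distribution_b": 2.0, "distribution_c": 0.5}
    return mean[action] + random.gauss(0, 0.1)

for epoch in range(100):
    # Select an action: mostly exploit the current function, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(value, key=value.get)
    reward = observe_reward(action)
    counts[action] += 1
    # Update the function with an incremental average of observed rewards.
    value[action] += (reward - value[action]) / counts[action]

print(max(value, key=value.get))  # best-performing action after the epochs
```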
The Examiner welcomes Applicant to contact the Examiner for a telephonic interview to further prosecution.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-9 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Step 1. When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter.
Step 2A – Prong One. If the claims fall within one of the statutory categories, it must then be determined whether the claims recite an abstract idea, law of nature, or natural phenomenon.
Step 2A – Prong Two. If the claims recite an abstract idea, law of nature, or natural phenomenon, it must then be determined whether the claims recite additional elements that integrate the judicial exception into a practical application. If the claims do not recite additional elements that integrate the judicial exception into a practical application, then the claims are directed to a judicial exception.
Step 2B. If the claims are directed to a judicial exception, it must be evaluated whether the claims recite additional elements that amount to an inventive concept (i.e. “significantly more”) than the recited judicial exception.
In the instant case, claims 1-9 are directed to a manufacture. It is noted that claims 1-9 recite “one or more computer storage media” and Applicant’s specification explicitly discloses “[c]omputer storage media does not comprise signals per se.” (App. Spec. [0054])
A claim “recites” an abstract idea if there are identifiable limitations that fall within at least one of the groupings of abstract ideas enumerated in MPEP 2106. In the instant case, claim 1 recites the steps of:
Determining, by an item distribution and ranking, item interaction data using historical listing data for at least one listing from a plurality of listing; initializing, by the item distribution and ranking, a learning using the item interaction data; and deploying, by the item distribution and ranking, the learning to use a function to select an action at each of a plurality of epochs and update the function at each epoch, the action selected by the function at each epoch changing a current distribution of items to each listing from the plurality of listing and current search ranking rules for each listing to a new distribution of items to each listing and new search ranking rules for each listing; and at an epoch of the plurality of epochs, causing each listing from the plurality of listing to update items listed at each listing based on the new distribution of items to each listing and to update search ranking rules used by each listing based on the new search ranking rules for each listing -- these claim limitations set forth certain methods of organizing human activity, particularly commercial interactions including advertising, marketing, and sales activities/behaviors.
Additionally, these steps set forth mental processes, particularly concepts performed in the human mind, including, inter alia, the observation and evaluation of information.
Further, the limitations of the claims are not indicative of integration into a practical application. Taking the independent claim elements separately, the additional elements of performing the steps using a machine learning model, by at least one of one or more servers of a system, at least one platform from a plurality of platforms, and a reinforcement learning agent merely implement the abstract idea in a computer environment. Additionally, taking the dependent claim elements separately, the additional elements of performing the steps using a Markov decision process also merely implement the abstract idea in a computer environment. Considered in combination, the steps of Applicant’s method add nothing that is not already present when the steps are considered separately.
Thus, claims 1-9 are directed to an abstract idea.
Regarding the claims, the technical elements of performing the steps by at least one of one or more servers of a system and using at least one platform from a plurality of platforms merely implement the abstract idea in a computer environment. Additionally, the Examiner notes that while the claims recite a machine learning model, a reinforcement learning agent, and a Markov decision process, these limitations are recited at a high level of generality and thus do not amount to significantly more.
When considering the elements and combinations of elements, the claim(s), as a whole, do not amount to significantly more than the abstract idea itself. This is because the claims do not amount to an improvement to another technology or technical field; the claims do not amount to an improvement to the functioning of a computer itself; the claims do not move beyond a general link of the use of an abstract idea to a particular technological environment; the claims merely amount to the application of, or instructions to apply, the abstract idea on a computer; and the claims amount to nothing more than requiring a generic computer to perform generic computer functions that are well-understood, routine, and conventional activities previously known to the industry.
The analysis above applies to all statutory categories of invention. Accordingly, claims 1-9 are rejected as ineligible for patenting under 35 USC 101 based upon the same rationale.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6, 8, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Round (US PGP 2011/0288930) in view of Zhu (US PGP 2018/0165745), and further in view of Thadani (US PGP 2020/0401881).
As per claim 1, Round teaches [o]ne or more computer storage media storing instructions that, when used by one or more processors, cause the one or more processors to perform operations, the operations comprising:
determining, . . . by at least one of one or more servers of an item distribution and ranking system, item interaction data using historical listing data for at least one listing platform from a plurality of listing platforms; (Round: Fig. 1; [0057]; [0090]-[0091] (In the first step 98 of the indicator item identification process 92, the process 92 identifies a set of one or more source items. As described above, source items are items that are directly associated with the user's 12 current online activity or recent purchase history or other items that are assumed by the list system 26 to be indicators of the user's 12 current interests. For example, as described above, when a user clicks on a link to view a detail page for a given item, that item can be identified as a source item. When a user performs a search of the items available on the website 10, the items returned as search results can be identified as source items. When a user 12 first logs on to a new session at the website 10 and has no current online activity, the most recently purchased items can be identified as source items.); [0179] (In one embodiment, the list system is a component or service of the website in which the lists are displayed, and is provided or controlled by the operator of that website. In another embodiment, the list system is part of a separate metadata service, such as the Alexa Internet service, in which website metadata is displayed to users through a browser plug-in. Specifically, users of the metadata service are given an option, through the browser plug-in and/or a website of the metadata service provider, to create lists of favorite products or other items. These lists are uploaded to a metadata server operated by the metadata service provider. As a user of the metadata service browses the website of a merchant or other entity, information about the items being viewed by the user is reported by the browser plug-in from the user's computer to the metadata server. This information is in turn used by the metadata server to select lists to display to the user. The selected lists (or links to such lists) are then displayed to the user by the plug-in. With this method, the lists may be presented to users across many different websites of many different merchants or other entities.))
initializing, by at least one of one or more servers of an item distribution and ranking system, a . . . learning agent using the item interaction data; and (Round: Fig. 1; [0057]; [0073] (In order to increase the likelihood that a user will make a purchase during a session, in one embodiment, the list selection process 32 can also refer to a “list-effectiveness” table 36 that ranks each list based on the comparative frequency with which users who viewed the list went on to make a purchase or to place an item in their shopping cart. In one embodiment, the list effectiveness table 36 is based on clickstream and purchase history information collected during the normal course of operations for the website 100.); [0086]-[0087] (As illustrated by FIG. 5, in the first step 92 of the list selection process 32, the list system 26 identifies one or more indicator items. Indicator items are items that the system 26 assumes exemplify the current online interests of the user 12 and that the system 26 will use to help identify lists that it assumes will be of interest to the user 12. Once the list selection process 32 has identified indicator items, the process 32 moves on to step 94 where the process 32 scores and ranks lists in the list repository 30 in order to select those lists that the process 32 identifies as being of the greatest potential interest to the user 12. The process 94 of scoring and ranking lists from the repository 30 will be described in greater detail with reference to FIG. 8.); [0102]-[0104] (FIG. 8 shows the general sequence of steps for a process 94 to score and rank lists from the lists repository 30 based on the presence of indicator items in the lists and on the relative SimStrength scores of those items. In step 130, the process 94 calculates a ListStrength score for each list. In one embodiment, the ListStrength score for each list is calculated by identifying all indicator items on the list and by adding the SimStrength score for each. In one embodiment, a bonus is added to the ListStrength score for each indicator item that is also a source item. When the process has calculated all ListStrength scores in step 130, it moves on to step 132, where the process 94 ranks the lists according to their ListStrength scores.))
deploying, by at least one of one or more servers of an item distribution and ranking system, the reinforcement learning agent to use a function to select an action at each of a plurality of epochs and update the function at each epoch, the action selected by the function at each epoch changing a current distribution of items to each listing platform from the plurality of listing platforms and current search ranking rules for each listing platform to a new distribution of items to each listing platform and new search ranking rules for each listing platform. (Round: Fig. 1; [0057]; [0062] (The table is preferably generated periodically, for example, daily or weekly, using a most recent set of purchase history data, product viewing history data, and/or other types of historical browsing data reflecting users' item interests.); [0073] (Periodically, the “Yes/No” or other similar ratings can be compiled, and associated effectiveness ratings can be applied to the lists.); [0086]-[0087]; [0102]-[0104] (FIG. 8 shows the general sequence of steps for a process 94 to score and rank lists from the lists repository 30 based on the presence of indicator items in the lists and on the relative SimStrength scores of those items. In step 130, the process 94 calculates a ListStrength score for each list. In one embodiment, the ListStrength score for each list is calculated by identifying all indicator items on the list and by adding the SimStrength score for each. In one embodiment, a bonus is added to the ListStrength score for each indicator item that is also a source item. When the process has calculated all ListStrength scores in step 130, it moves on to step 132, where the process 94 ranks the lists according to their ListStrength scores. As an optional step, the process can execute step 134, in which the top lists that have already been ranked [are re-ranked and displayed to a] viewing user 12 in the order of their ranking. Thus, the top-ranked list will be displayed in a first, and, therefore presumably most prominent, position. In one embodiment, the set of lists selected for display is supplemented by at least one randomly chosen list. Re-ordering, or re-ranking, the selected set of lists can affect the positional prominence of each list, and if only a top subset of the re-ranked lists is chosen for display, re-ranking can also affect the set of lists selected for display. In one embodiment, a list re-ranking process 134 re-ranks the top ranked lists in order to give additional prominence to lists that exhibit some preferred characteristic. In one embodiment, the list re-ranking process 134 draws upon information that is periodically gathered and stored by the system 26 to generate a list effectiveness rating (LER) for each list. The LER of a list is based on information gleaned from the browsing histories of users who have viewed the list. The LER is a measure of the list's comparative performance in encouraging users 12 to make purchases at the website 10 or to place items in their shopping carts soon after having viewed the list or to exhibit some other user action.); [0112]-[0115]; [0116]-[0118] (The re-ranking process 134 begins by identifying a block of top-ranked lists to re-rank. In the example shown in FIG. 11, the block size B is three. Therefore, the top three lists are identified for re-ranking. In this example, the top three lists 148 are List6, List2, and List3.
Referring to the list effectiveness table 36, which ranks the lists based on their power to encourage users to purchase, or to consider purchasing, items from the product catalog, the list effectiveness ratings (LERs) 37 for these three lists are 4973, 0684, and 1096, respectively. In the example shown in FIG. 11, an LER score of 0001 is the highest possible score, and a LER score of 9999 is the lowest. Therefore, re-ranking the three “current” lists 148 according to their LER scores produces the new ordering 149: List2, List3, List6. These re-ranked lists 149 are now the top three lists that will be displayed to the user. Thus, among these top three lists, the order of presentation has been altered slightly to give extra prominence to the lists that historically have been shown to encourage viewers to buy or to consider buying products. In the example shown in FIG. 11, the number of lists needed for display, D, is six. At this point, the process 134 has only re-ranked three lists. Therefore, the process identifies the next block 150 of three lists to re-rank. This time the lists are List5, List4, and List1. Based on their LER scores 37, the block of lists 150 will be re-ranked to a new ordering of: List5, List1, List4, and these three lists 151 will be added to the set of lists 152 selected for display to the user 12.); [0179] (In one embodiment, the list system is a component or service of the website in which the lists are displayed, and is provided or controlled by the operator of that website. In another embodiment, the list system is part of a separate metadata service, such as the Alexa Internet service, in which website metadata is displayed to users through a browser plug-in. Specifically, users of the metadata service are given an option, through the browser plug-in and/or a website of the metadata service provider, to create lists of favorite products or other items. These lists are uploaded to a metadata server operated by the metadata service provider. As a user of the metadata service browses the website of a merchant or other entity, information about the items being viewed by the user is reported by the browser plug-in from the user's computer to the metadata server. This information is in turn used by the metadata server to select lists to display to the user. The selected lists (or links to such lists) are then displayed to the user by the plug-in. With this method, the lists may be presented to users across many different websites of many different merchants or other entities.)))
. . . , causing one or more servers of each listing platform from the plurality of listing platforms to update items listed at each listing platform based on the new distribution of items to each listing platform and . . . by each listing platform . . . for each listing platform. (Round: Fig. 1; [0057]; [0062] (The table is preferably generated periodically, for example, daily or weekly, using a most recent set of purchase history data, product viewing history data, and/or other types of historical browsing data reflecting users' item interests.); [0073] (Periodically, the “Yes/No” or other similar ratings can be compiled, and associated effectiveness ratings can be applied to the lists.); [0086]-[0087]; [0102]-[0104] (FIG. 8 shows the general sequence of steps for a process 94 to score and rank lists from the lists repository 30 based on the presence of indicator items in the lists and on the relative SimStrength scores of those items. In step 130, the process 94 calculates a ListStrength score for each list. In one embodiment, the ListStrength score for each list is calculated by identifying all indicator items on the list and by adding the SimStrength score for each. In one embodiment, a bonus is added to the ListStrength score for each indicator item that is also a source item. When the process has calculated all ListStrength scores in step 130, it moves on to step 132, where the process 94 ranks the lists according to their ListStrength scores. As an optional step, the process can execute step 134, in which the top lists that have already been ranked [are re-ranked and displayed to a] viewing user 12 in the order of their ranking. Thus, the top-ranked list will be displayed in a first, and, therefore presumably most prominent, position. In one embodiment, the set of lists selected for display is supplemented by at least one randomly chosen list. Re-ordering, or re-ranking, the selected set of lists can affect the positional prominence of each list, and if only a top subset of the re-ranked lists is chosen for display, re-ranking can also affect the set of lists selected for display. In one embodiment, a list re-ranking process 134 re-ranks the top ranked lists in order to give additional prominence to lists that exhibit some preferred characteristic. In one embodiment, the list re-ranking process 134 draws upon information that is periodically gathered and stored by the system 26 to generate a list effectiveness rating (LER) for each list. The LER of a list is based on information gleaned from the browsing histories of users who have viewed the list. The LER is a measure of the list's comparative performance in encouraging users 12 to make purchases at the website 10 or to place items in their shopping carts soon after having viewed the list or to exhibit some other user action.); [0112]-[0115]; [0116]-[0118] (The re-ranking process 134 begins by identifying a block of top-ranked lists to re-rank. In the example shown in FIG. 11, the block size B is three. Therefore, the top three lists are identified for re-ranking. In this example, the top three lists 148 are List6, List2, and List3. Referring to the list effectiveness table 36, which ranks the lists based on their power to encourage users to purchase, or to consider purchasing, items from the product catalog, the list effectiveness ratings (LERs) 37 for these three lists are 4973, 0684, and 1096, respectively. In the example shown in FIG. 11, an LER score of 0001 is the highest possible score, and a LER score of 9999 is the lowest.
Therefore, re-ranking the three “current” lists 148 according to their LER scores produces the new ordering 149: List2, List3, List6. These re-ranked lists 149 are now the top three lists that will be displayed to the user. Thus, among these top three lists, the order of presentation has been altered slightly to give extra prominence to the lists that historically have been shown to encourage viewers to buy or to consider buying products. In the example shown in FIG. 11, the number of lists needed for display, D, is six. At this point, the process 134 has only re-ranked three lists. Therefore, the process identifies the next block 150 of three lists to re-rank. This time the lists are List5, List4, and List1. Based on their LER scores 37, the block of lists 150 will be re-ranked to a new ordering of: List5, List1, List4, and these three lists 151 will be added to the set of lists 152 selected for display to the user 12.); [0179] (In one embodiment, the list system is a component or service of the website in which the lists are displayed, and is provided or controlled by the operator of that website. In another embodiment, the list system is part of a separate metadata service, such as the Alexa Internet service, in which website metadata is displayed to users through a browser plug-in. Specifically, users of the metadata service are given an option, through the browser plug-in and/or a website of the metadata service provider, to create lists of favorite products or other items. These lists are uploaded to a metadata server operated by the metadata service provider. As a user of the metadata service browses the website of a merchant or other entity, information about the items being viewed by the user is reported by the browser plug-in from the user's computer to the metadata server. This information is in turn used by the metadata server to select lists to display to the user. The selected lists (or links to such lists) are then displayed to the user by the plug-in. With this method, the lists may be presented to users across many different websites of many different merchants or other entities.)))
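For illustration only, the block re-ranking described in Round [0116]-[0118] can be sketched as follows. The code reproduces Round's worked example (block size B = 3; LERs of 4973, 0684, and 1096 for List6, List2, and List3, with a lower LER being better); the LER values assigned to List5, List4, and List1 are hypothetical, chosen only to reproduce the re-ordering stated in Round. This is not code from the reference.

```python
# Illustrative sketch; not code from Round. Re-ranks lists in blocks of B by
# list effectiveness rating (LER), where 0001 is the best score and 9999 the worst.
ranked_lists = ["List6", "List2", "List3", "List5", "List4", "List1"]
ler = {"List6": 4973, "List2": 684, "List3": 1096,   # LERs given in Round
       "List5": 1000, "List4": 9000, "List1": 5000}  # hypothetical values

B = 3  # block size, per Round's example
reranked = []
for i in range(0, len(ranked_lists), B):
    block = ranked_lists[i:i + B]
    reranked.extend(sorted(block, key=lambda name: ler[name]))  # lower LER first

print(reranked)  # ['List2', 'List3', 'List6', 'List5', 'List1', 'List4']
```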
Round does not explicitly state the following known techniques which are taught by Zhu:
. . . using a machine learning model . . . ; (Zhu: [0047] (reinforcement learning model); Fig. 3; [0050]-[0057]; [0109]-[0126])
initializing a reinforcement learning agent . . . (Zhu: [0047] (reinforcement learning model); Fig. 3; [0050]-[0057] (the MDP involves two entities, i.e., an agent 302 and an environment 304, that interact with each other. The Agent is an entity that makes decisions. The environment is an entity for information feedback. For example, in the application scenario of product recommendation technology, the Agent may be set as the main subject for making product recommendation decisions . . . (1) S is a State Space, which contain a set of environmental states that the Agent may perceive. (2) A is an Action Space, which contain the set of actions the Agent may take on each state of the environment. (3) R is a Rewarding Function, and R (s, a, s′) represents the reward that the Agent obtains from the environment when the action a is performed on the state s and the state is changed to state s′ (4) T is the State Transition Function, and T (s, a, s′) can represent the probability of executing action a on state s and moving to state s′. As shown in FIG. 3, in the process of interaction between the Agent and the environment in the MDP, the Agent senses that the environment state at time t is st. Based on the environment state st, the Agent may select an action at from the action space A to execute. After the environment receives the action selected by Agent, it returns corresponding reward signal feedback rt+1 to the Agent and transfers to new environment state st+1, and waits for Agent to make a new decision.); [0109]-[0126])
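For clarity of the record, the MDP that Zhu describes may be written in standard notation. The following is the textbook formulation and is not a formula reproduced from Zhu; the discount factor is conventional and is not recited in the quoted passages.

```latex
% Standard MDP notation; illustrative only, not reproduced from Zhu.
\[
  \text{MDP} = (S, A, R, T), \qquad
  R(s, a, s') = \text{reward for taking action } a \text{ in state } s
  \text{ and transitioning to } s'
\]
\[
  \pi^{*} \;=\; \arg\max_{\pi}\;
  \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \,\middle|\, \pi\right],
  \qquad 0 \le \gamma < 1
\]
```

Here the optimal strategy obtains the largest long-term cumulative reward from any state s and any time step t, consistent with Zhu's description at [0050]-[0057].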
This known technique is applicable to the method of Round as they both share characteristics and capabilities, namely, they are directed to displaying items to users based on user historical data.
One of ordinary skill in the art at the time of filing would have recognized that applying the known technique of Zhu would have yielded predictable results and resulted in an improved method. It would have been recognized that applying the technique of Zhu to the teachings of Round would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such machine learning model and reinforcement learning agent features into similar methods. Further, applying the machine learning model and reinforcement learning agent to the learning agents of Round would have been recognized by those of ordinary skill in the art as resulting in an improved method that would allow modeling the changes of a smart decision maker, ultimately progressively optimizing long-term goals and learning the optimal recommendation strategy step by step (Zhu: Para [0040], [0057]).
Round/Zhu does not explicitly state the following known techniques which are taught by Thadani:
and at an epoch of the plurality of epochs, . . . update search ranking rules used . . . based on the new search ranking rules . . . . (Thadani: Fig. 1; [0031]-[0032] (FIG. 1 conceptually illustrates a method for using a reinforcement learning model to rank comments for serving to users, in accordance with implementations of the disclosure. . . . Given the feature vectors, the system ranks the comments and presents them to the user. The user views the comments in their ranked order, and reacts in some way that is measurable as a scalar reward (e.g. dwell time) for the particular ranking. This reward is processed and used to update the ranking mechanism in order to maximize the reward.); [0039]-[0041] (For a given generated ranking, comments are served in accordance with the order defined by the generated ranking, and the reward (e.g. dwell time) is captured. This information is used to update/optimize the scoring model. To update the scoring model, an implementation of a policy gradient algorithm is applied. The policy gradient (PG) algorithm converts the policy search problem into an optimization problem. It works by repeatedly estimating the gradient of the policy's performance (reward) with respect to its parameters followed by gradient ascent to find parameters that can increase the expected rewards.); Fig. 2; [0042]-[0046] (The result of the training is to update the parameters of the neural network (e.g. weights, biases), which are stored to a model document 210. By implementing using a model document to store the model parameters, to update the model the system can feed new weights to that document, and when the model is loaded at runtime for each query, then the next ranking call will always load the latest model parameters. The updated model is thus fed to the comments serving component 212, which uses it to provide optimized presentation of comments, e.g. to maximize dwell time while simultaneously providing for explorative learning. . . . While ideal reinforcement learning updates the model after every datapoint, data may not be received that quickly due to processing time. Thus, in some implementations, training data is collected in batches, which are then used to re-train the neural network. By way of example without limitation, training data can be collected in one or two-hour batches, or any other predefined time period. To carry out model re-training, the current model is loaded, and an iteration of reinforcement learning is run on the (e.g. two-hour) batch of training data. Given training data, the neural network parameters (e.g. weights) are updated.))
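For illustration only, the policy gradient mechanism Thadani describes -- repeatedly estimating the gradient of the policy's performance (reward) with respect to its parameters, followed by gradient ascent -- can be sketched as a toy REINFORCE-style update for a two-action softmax policy. All names and reward values are hypothetical; this is not Thadani's code.

```python
# Illustrative REINFORCE-style sketch; hypothetical toy data, not Thadani's code.
import math
import random

theta = [0.0, 0.0]            # policy parameters, one per action
MEAN_REWARD = [0.2, 1.0]      # hidden mean reward per action (toy stand-in for
                              # an observed metric such as dwell time)

def softmax(params):
    exps = [math.exp(p) for p in params]
    total = sum(exps)
    return [e / total for e in exps]

learning_rate = 0.1
for step in range(2000):
    probs = softmax(theta)
    action = 0 if random.random() < probs[0] else 1
    reward = MEAN_REWARD[action] + random.gauss(0, 0.1)
    # For a softmax policy, d/d(theta_i) log pi(action) = 1[i == action] - probs[i].
    for i in range(len(theta)):
        grad_log = (1.0 if i == action else 0.0) - probs[i]
        theta[i] += learning_rate * reward * grad_log  # gradient ascent on reward

print(softmax(theta))  # probability mass shifts toward the higher-reward action
```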
This known technique is applicable to the method of Round/Zhu as they both share characteristics and capabilities, namely, they are directed to ranking.
One of ordinary skill in the art at the time of filing would have recognized that applying the known technique of Thadani would have yielded predictable results and resulted in an improved method. It would have been recognized that applying the technique of Thadani to the teachings of Round/Zhu would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such update features into similar methods. Further, applying the updating, at an epoch of the plurality of epochs, of the search ranking rules used based on the new search ranking rules to the learning agents of Round/Zhu would have been recognized by those of ordinary skill in the art as resulting in an improved method that would provide a ranking system that overcomes the limitations of existing ranking schemes through continuously learning from recent user interactions, thus adapting to changing data and user preferences (Thadani: Para [0029]-[0030]).
As per claim 2, Round/Zhu/Thadani teach wherein the historical listing data for the at least one listing platform comprises historical user behavior information for the at least one listing platform, the historical user behavior information comprising at least one selected from the following: user views of items; time lengths of item views; and user purchases of items. (Round: [0090]-[0091] (In the first step 98 of the indicator item identification process 92, the process 92 identifies a set of one or more source items. As described above, source items are items that are directly associated with the user's 12 current online activity or recent purchase history or other items that are assumed by the list system 26 to be indicators of the user's 12 current interests. For example, as described above, when a user clicks on a link to view a detail page for a given item, that item can be identified as a source item. When a user performs a search of the items available on the website 10, the items returned as search results can be identified as source items. When a user 12 first logs on to a new session at the website 10 and has no current online activity, the most recently purchased items can be identified as source items.))
As per claim 3, Round/Zhu/Thadani teach wherein the historical listing data for the at least one listing platform comprises metadata for the at least one listing platform, the metadata for the at least one listing platform comprising at least one selected from the following: whether the at least one listing platform is internal or external; a size of the at least one listing platform; and a type of the at least one listing platform. (Round: [0179] (In another embodiment, the list system is part of a separate metadata service, such as the Alexa Internet service, in which website metadata is displayed to users through a browser plug-in. Specifically, users of the metadata service are given an option, through the browser plug-in and/or a website of the metadata service provider, to create lists of favorite products or other items. These lists are uploaded to a metadata server operated by the metadata service provider. As a user of the metadata service browses the website of a merchant or other entity, information about the items being viewed by the user is reported by the browser plug-in from the user's computer to the metadata server.))
Examiner Note: While prior art has been applied, the Examiner notes that what is included in the metadata is merely nonfunctional descriptive material and is not functionally involved in the steps recited. The claimed steps would be performed in the same manner regardless of the type of information included in the metadata. This descriptive material will not distinguish the claimed invention from the prior art in terms of patentability, see In re Gulack, 703 F.2d 1381, 1385, 217 USPQ 401 (Fed. Cir. 1983); In re Lowry, 32 F.3d 1579, 32 USPQ2d 1031 (Fed. Cir. 1994).
As per claim 4, Round/Zhu/Thadani teach wherein initializing the reinforcement learning agent using the item interaction data comprises using the item interaction data to set at least one selected from the following: an initial function, an initial distribution of items to each listing platform, and an initial search ranking rules for each listing platform. (Round: [0073] (In order to increase the likelihood that a user will make a purchase during a session, in one embodiment, the list selection process 32 can also refer to a “list-effectiveness” table 36 that ranks each list based on the comparative frequency with which users who viewed the list went on to make a purchase or to place an item in their shopping cart. In one embodiment, the list effectiveness table 36 is based on clickstream and purchase history information collected during the normal course of operations for the website 100.); [0086]-[0087] (As illustrated by FIG. 5, in the first step 92 of the list selection process 32, the list system 26 identifies one or more indicator items. Indicator items are items that the system 26 assumes exemplify the current online interests of the user 12 and that the system 26 will use to help identify lists that it assumes will be of interest to the user 12. Once the list selection process 32 has identified indicator items, the process 32 moves on to step 94 where the process 32 scores and ranks lists in the list repository 30 in order to select those lists that the process 32 identifies as being of the greatest potential interest to the user 12. The process 94 of scoring and ranking lists from the repository 30 will be described in greater detail with reference to FIG. 8.); [0102]-[0104] (FIG. 8 shows the general sequence of steps for a process 94 to score and rank lists from the lists repository 30 based on the presence of indicator items in the lists and on the relative SimStrength scores of those items. In step 130, the process 94 calculates a ListStrength score for each list. In one embodiment, the ListStrength score for each list is calculated by identifying all indicator items on the list and by adding the SimStrength score for each. In one embodiment, a bonus is added to the ListStrength score for each indicator item that is also a source item. When the process has calculated all ListStrength scores in step 130, it moves on to step 132, where the process 94 ranks the lists according to their ListStrength scores.))
As per claim 5, Round/Zhu/Thadani teach wherein the reinforcement learning agent operates based on a Markov decision process to update the function over the plurality of epochs. (Zhu: [0047]; [0133]-[0135] (The data analysis server performs learning processing on the key operation behaviors by using a reinforcement learning method to obtain a product recommendation strategy for the user. Optionally, in an example embodiment of the present disclosure, the learning processing the key operation behavior by using the reinforcement learning method to obtain the product recommendation strategy for the user may include: based on a Markov Decision Making Process (MDP), using, as a status, page feature information and/or product feature information corresponding to one or more key operation behaviors before the key operation behavior); [0168]; Fig. 3; [0050]-[0057] (model of MDP))
The motivation for applying the known techniques of Zhu to the teachings of Round is the same as that set forth above, in the rejection of Claim 1.
As per claim 6, Round/Zhu/Thadani teach wherein a first new search ranking rule increases or decreases a ranking of a first item at a first listing platform from the plurality of listing platforms. (Round: [0109] (In one embodiment, the list system 26 displays the top ranked lists, or a randomly chosen subset thereof, to a viewing user 12 in the order of their ranking. Thus, the top-ranked list will be displayed in a first, and, therefore presumably most prominent, position. In one embodiment, the set of lists selected for display is supplemented by at least one randomly chosen list. Re-ordering, or re-ranking, the selected set of lists can affect the positional prominence of each list, and if only a top subset of the re-ranked lists is chosen for display, re-ranking can also affect the set of lists selected for display. In one embodiment, a list re-ranking process 134 re-ranks the top ranked lists in order to give additional prominence to lists that exhibit some preferred characteristic.); [0117]-[0118] (Therefore, the process identifies the next block 150 of three lists to re-rank. This time the lists are List5, List4, and List1. Based on their LER scores 37, the block of lists 150 will be re-ranked to a new ordering of: List5, List1, List4, and these three lists 151 will be added to the set of lists 152 selected for display to the user 12.))
As per claim 8, Round/Zhu/Thadani teach wherein the reinforcement learning agent adjusts the function at each epoch . . . (Round: [0062] (The table is preferably generated periodically, for example, daily or weekly, using a most recent set of purchase history data, product viewing history data, and/or other types of historical browsing data reflecting users' item interests.); [0073] (Periodically, the “Yes/No” or other similar ratings can be compiled, and associated effectiveness ratings can be applied to the lists.); [0086]-[0087]; [0102]-[0104] (FIG. 8 shows the general sequence of steps for a process 94 to score and rank lists from the lists repository 30 based on the presence of indicator items in the lists and on the relative SimStrength scores of those items. In step 130, the process 94 calculates a ListStrength score for each list. In one embodiment, the ListStrength score for each list is calculated by identifying all indicator items on the list and by adding the SimStrength score for each. In one embodiment, a bonus is added to the ListStrength score for each indicator item that is also a source item. When the process has calculated all ListStrength scores in step 130, it moves on to step 132, where the process 94 ranks the lists according to their ListStrength scores. As an optional step, the process can execute step 134, in which the top lists that have already been ranked [are re-ranked and displayed to a] viewing user 12 in the order of their ranking. Thus, the top-ranked list will be displayed in a first, and, therefore presumably most prominent, position. In one embodiment, the set of lists selected for display is supplemented by at least one randomly chosen list. Re-ordering, or re-ranking, the selected set of lists can affect the positional prominence of each list, and if only a top subset of the re-ranked lists is chosen for display, re-ranking can also affect the set of lists selected for display. In one embodiment, a list re-ranking process 134 re-ranks the top ranked lists in order to give additional prominence to lists that exhibit some preferred characteristic. In one embodiment, the list re-ranking process 134 draws upon information that is periodically gathered and stored by the system 26 to generate a list effectiveness rating (LER) for each list. The LER of a list is based on information gleaned from the browsing histories of users who have viewed the list. The LER is a measure of the list's comparative performance in encouraging users 12 to make purchases at the website 10 or to place items in their shopping carts soon after having viewed the list or to exhibit some other user action.); [0112]-[0115]; [0116]-[0118] (The re-ranking process 134 begins by identifying a block of top-ranked lists to re-rank. In the example shown in FIG. 11, the block size B is three. Therefore, the top three lists are identified for re-ranking. In this example, the top three lists 148 are List6, List2, and List3. Referring to the list effectiveness table 36, which ranks the lists based on their power to encourage users to purchase, or to consider purchasing, items from the product catalog, the list effectiveness ratings (LERs) 37 for these three lists are 4973, 0684, and 1096, respectively. In the example shown in FIG. 11, an LER score of 0001 is the highest possible score, and a LER score of 9999 is the lowest. Therefore, re-ranking the three “current” lists 148 according to their LER scores produces the new ordering 149: List2, List3, List6.
These re-ranked lists 149 are now the top three lists that will be displayed to the user. Thus, among these top three lists, the order of presentation has been altered slightly to give extra prominence to the lists that historically have been shown to encourage viewers to buy or to consider buying products. In the example shown in FIG. 11, the number of lists needed for display, D, is six. At this point, the process 134 has only re-ranked three lists. Therefore, the process identifies the next block 150 of three lists to re-rank. This time the lists are List5, List4, and List1. Based on their LER scores 37, the block of lists 150 will be re-ranked to a new ordering of: List5, List1, List4, and these three lists 151 will be added to the set of lists 152 selected for display to the user 12.);
wherein the reinforcement learning agent adjusts the function at each epoch based at least in part on a reward provided in response to the action selected for the epoch. (Zhu: Fig. 3; [0050]-[0057] ((1) S is a State Space, which contain a set of environmental states that the Agent may perceive. (2) A is an Action Space, which contain the set of actions the Agent may take on each state of the environment. (3) R is a Rewarding Function, and R (s, a, s′) represents the reward that the Agent obtains from the environment when the action a is performed on the state s and the state is changed to state s′ (4) T is the State Transition Function, and T (s, a, s′) can represent the probability of executing action a on state s and moving to state s′. As shown in FIG. 3, in the process of interaction between the Agent and the environment in the MDP, the Agent senses that the environment state at time t is st. Based on the environment state st, the Agent may select an action at from the action space A to execute. After the environment receives the action selected by Agent, it returns corresponding reward signal feedback rt+1 to the Agent and transfers to new environment state st+1, and waits for Agent to make a new decision. In the process of interacting with the environment, the goal of Agent is to find an optimal strategy π* such that π* obtains the largest long-term cumulative reward in any state s and any time step t.); [0109]-[0126] (The Q function estimates are arranged in descending order and the nine candidate products with the highest Q function estimation are presented as recommended products according to the method steps shown in S1208, which displays candidate products when corresponding reward values meet the preset condition.))
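For context, a Q function of the kind Zhu references is conventionally updated from the observed reward. The following is the standard textbook Q-learning update, offered for clarity only; Zhu's exact update is not reproduced in the cited passages.

```latex
% Standard Q-learning update; textbook form, not reproduced from Zhu.
\[
  Q(s_t, a_t) \;\leftarrow\; Q(s_t, a_t)
  + \alpha\Big(r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\Big)
\]
```

Here alpha is a learning rate, gamma a discount factor, and r_{t+1} the reward signal returned by the environment for the selected action; candidates are then presented in descending order of their Q estimates, consistent with Zhu [0126].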
The motivation for applying the known techniques of Zhu to the teachings of Round is the same as that set forth above, in the rejection of Claim 1.
As per claim 9, Round/Zhu/Thadani teach wherein the reward is based on a key performance indicator. (Zhu: [0043] (For example, the reward that the recommendation system obtains from the change of the states (such as jumping from one page to another page) is based on the optimization goal. For instance, if the optimization goal is that the user purchases the recommended product, a positive reward is assigned to the recommendation system when the user makes purchases at the order page. For instance, the reward value may be the transaction amount of the purchased product. As the frequency of purchase is not high, in another example, a positive reward is assigned to the recommendation system when the user clicks the recommended content provided by the recommendation system.); Fig. 3; [0050]-[0057] ((3) R is a Rewarding Function, and R (s, a, s′) represents the reward that the Agent obtains from the environment when the action a is performed on the state s and the state is changed to state s′. As shown in FIG. 3, in the process of interaction between the Agent and the environment in the MDP, the Agent senses that the environment state at time t is st. Based on the environment state st, the Agent may select an action at from the action space A to execute. After the environment receives the action selected by Agent, it returns corresponding reward signal feedback rt+1 to the Agent and transfers to new environment state st+1, and waits for Agent to make a new decision. In the process of interacting with the environment, the goal of Agent is to find an optimal strategy π* such that π* obtains the largest long-term cumulative reward in any state s and any time step t.); [0126] (The Q function estimates are arranged in descending order and the nine candidate products with the highest Q function estimation are presented as recommended products according to the method steps shown in S1208, which displays candidate products when corresponding reward values meet the preset condition.); [0109]-[0126])
The motivation for applying the known techniques of Zhu to the teachings of Round is the same as that set forth above, in the rejection of Claim 1.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Round/Zhu/Thadani as applied to claim 1 above, and further in view of Poznanski (US PGP 2013/0198174).
As per claim 7, Round/Zhu/Thadani teach wherein the reinforcement learning agent is further initialized using one or more . . . search ranking rules. (Round: [0073] (In order to increase the likelihood that a user will make a purchase during a session, in one embodiment, the list selection process 32 can also refer to a “list-effectiveness” table 36 that ranks each list based on the comparative frequency with which users who viewed the list went on to make a purchase or to place an item in their shopping cart. In one embodiment, the list effectiveness table 36 is based on clickstream and purchase history information collected during the normal course of operations for the website 100.); [0086]-[0087] (As illustrated by FIG. 5, in the first step 92 of the list selection process 32, the list system 26 identifies one or more indicator items. Indicator items are items that the system 26 assumes exemplify the current online interests of the user 12 and that the system 26 will use to help identify lists that it assumes will be of interest to the user 12. Once the list selection process 32 has identified indicator items, the process 32 moves on to step 94 where the process 32 scores and ranks lists in the list repository 30 in order to select those lists that the process 32 identifies as being of the greatest potential interest to the user 12. The process 94 of scoring and ranking lists from the repository 30 will be described in greater detail with reference to FIG. 8.); [0102]-[0104] (FIG. 8 shows the general sequence of steps for a process 94 to score and rank lists from the lists repository 30 based on the presence of indicator items in the lists and on the relative SimStrength scores of those items. In step 130, the process 94 calculates a ListStrength score for each list. In one embodiment, the ListStrength score for each list is calculated by identifying all indicator items on the list and by adding the SimStrength score for each. In one embodiment, a bonus is added to the ListStrength score for each indicator item that is also a source item. When the process has calculated all ListStrength scores in step 130, it moves on to step 132, where the process 94 ranks the lists according to their ListStrength scores.))
. . . using one or more user-provided search ranking rules. (Poznanski: Fig. 4-8; [0050]-[0056] (After a start operation, the process 400 flows to operation 410, where a GUI is displayed that displays options for allowing a user to configure a ranking rule (See FIGS. 5-8 for exemplary GUI displays). Moving to operation 420, a user selects an option to add a ranking rule. In response to selecting the option, a user configures a new rule. Flowing to operation 430, a user defines the rule by specifying a match type, a match value for the specified match type and a re-ranking action to perform on any search results that meet the specified match type and match value. Transitioning to operation 440, the ranking rule may be created once the specified values are received.); Fig. 3; [0046] (Flowing to operation 330, any ranking rules are applied to the search results.))
This known technique is applicable to the method of Round/Zhu/Thadani as they share characteristics and capabilities, namely, they are directed to displaying items based on user historical data.
One of ordinary skill in the art at the time of filing would have recognized that applying the known technique of Poznanski would have yielded predictable results and resulted in an improved method. It would have been recognized that applying the technique of Poznanski to the teachings of Round/Zhu/Thadani would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such user-provided search ranking rules features into similar methods. Further, applying the user-provided search ranking rules to the search ranking rules of Round/Zhu/Thadani would have been recognized by those of ordinary skill in the art as resulting in an improved method that would allow users to rank results differently if desired and build a better ranking model (Poznanski: Para [0040], [0057]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Hu, Yujing, et al. 2018. Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18). Association for Computing Machinery, New York, NY, USA, 368–377. -- reinforcement learning to rank in e-commerce search engine.
Burhani (US PGP 2019/0370649) -- reward for the reinforcement learning neural network reflecting a difference between the second performance metric and the first performance metric is computed and provided to the reinforcement learning neural network to train the automated agent.
Zhuang (US Pat No 8,744,978) -- user-customized ranking criteria.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JENNIFER V LEE whose telephone number is (571)272-4778. The examiner can normally be reached Monday - Friday 9AM - 5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JEFFREY A. SMITH can be reached at (571)272-6763. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JENNIFER V LEE/Examiner, Art Unit 3688
/Jeffrey A. Smith/Supervisory Patent Examiner, Art Unit 3688