Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 11/26/2024 and 11/21/2023 have been considered by the examiner.
Specification
The specification and the drawing submitted on 07/07/2023 have been considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bathe et al. (US 20220245530 A1, hereinafter "Bathe") in view of WANG et al. (US 20220245393 A, hereinafter "WANG").
As per claim 1: Bathe discloses a method for managing a prediction model ([0018] generating one or more feature vectors for a user, the one or more feature vectors at least comprising transaction-based features and slot-based features; generating, using a machine learning architecture, a repurchase prediction for the user based, at least in part, on the one or more feature vectors; generating, using the machine learning architecture, a time slot prediction for the user based, at least in part, on the one or more feature vectors, the time slot prediction predicting a time slot desired by the user for an upcoming transaction; and executing a reservation function that facilitates reserving of the time slot for the user), comprising:
obtaining gradient information associated with the prediction model based on sample data for a time slot in a predetermined time period ([0066] The machine learning architecture 350 can be configured to analyze the historical data 320 (including the slot selection data 420) to generate repurchase predictions 351 indicating whether or not the users 305 are likely to place or initiate new transactions 321 within an upcoming time period 455, and time slot predictions 352 indicating the time slots 370 the users 305 will prefer in connection with placing or initiating these transactions 321. The machine learning architecture 350 can utilize various machine learning models to generate these predictions. Exemplary machine learning models can include a random forest decision tree model 451, a logistic regression model 452, and gradient boosted tree model 453);
acquiring an offset of the time slot in the predetermined time period ([0085] The random forest decision tree model 451 can be configured for multi-class prediction to generate the time slot predictions 352. The machine learning architecture 350 may additionally, or alternatively, use a heuristics approach to generate the time slot predictions 352. For each user, the heuristics approach can identify a frequency parameter (e.g., indicating the most frequent slot chosen by the user based on the historical data 320 stored for the user) and a recency parameter (e.g., indicating the most recent time slot chosen by the user based on the historical data 320 stored for user). These parameters may then be used as baselines to generate the time slot predictions 352. Note that the frequency parameter and the recency parameter are used as baselines (i.e., the offset) to generate the time slot predictions); and
updating a parameter of the prediction model based on the gradient information, the offset, and historical gradient information that is determined based on historical sample data for a group of historical time slots before the time slot ([0072] The random forest decision tree model 451, logistic regression model 452, and/or gradient boosted tree model 453 can be trained on feature vectors 460 (including the transaction-based features 461 and slot-based features 462) generated from several months of historical data 320 for a large number of users 305 (e.g., thousands or millions of users 305). The performance of these models can be evaluated based on the purchases of the same set of users 305 in a subsequent time period (e.g., k days or weeks) following the several months of data used for training. The performance of these models can be evaluated against different frequency and recency-based baseline heuristics, e.g., which may indicate the most frequent time slot 370 chosen by the users 305 in the historical data 320 and the most recent time slot 370 chosen by the users 305).
Bathe does not explicitly disclose determining a step size for the updated parameter. WANG, in analogous art, however, discloses determining a step size for the updated parameter ([0041] The dynamic interval function increases (increasing sampling frequency) as the gradient increases. The sampling frequency can decrease when the gradient decreases. The degree (gain) to which the sample frequency changes as a function of the gradient can be arbitrarily defined depending on the needs of a particular application. Moreover, the change in sampling can be rough (predefined steps) or smooth (arbitrary steps) based on the slope of the gradient. [0045] A revision metric is calculated based on the gradient and the deviation at each time step: Revise(t)=revise(d(t), g.sub.t). The revision metric can evaluate to “true” or “false”, indicating that the gradient is enough (e.g., sharply negative but just above the L threshold; sharply positive but just below the H threshold, etc.) to trigger a judgment adjustment. [0049] Having multiple data sets “k” against which deviations are calculated, the compare module 212 can include further functionality, such as a precheck module 600 that orders the data sets in terms of importance prior to passing those real data-prediction data pairs to a compare block 602. FIG. 6 shows a precheck module 600 that can be incorporated within or prior to the compare module 212. Deviations D.sub.1, D.sub.2, . . . , D.sub.k for k data sets can be reordered using an importance precheck 604 (also referred to as an importance-weighted order) into the list D.sub.i, D.sub.l, D.sub.b, . . . , D.sub.s. In some embodiments of the invention, ordering of the k data sets can be by magnitude of the individual deviations (d.sub.i) for each time step. For example, the deviations can be reordered such that the most deviant data set is first (most critical) and the least deviant data set is last (least critical). 
Only the real data-predicted data pairs for the top “x” deviations in terms of magnitude are passed to a compare block 602, reducing the computational load and time latency of the system. Each real-prediction pair (R.sub.i-P.sub.i) passed through to the compare block 602 is evaluated to determine whether the deviation d for the pair is >H, <L, or L<d<H. If a current real-prediction pair (R.sub.N-P.sub.N) for a dataset “N” is within the L and H thresholds, the next real-prediction pair (R.sub.N+1-P.sub.N+1) for the next dataset “N+1” is checked until a pair is found having a deviation >H or <L. Once found, the corresponding dataset (here, N+1) is deemed most critical for a next judgment, and the associated real-prediction pair is passed to the compare module 212 and/or the compare block 502 (FIG. 5). This process of selectively making judgments on only the most critical data sets (which dynamically change over time depending on each data set's deviation value at each time step) or only the “x” most critical data sets can continue indefinitely, and the number of “x” data sets chosen for judgments can be adjusted based on the computational availability or time latency constraints of a given application. For example, in applications having strict time latency constraints (judgments must be made as quickly as possible) the number “x” can be lowered. The value of “x” can be dynamically adjusted, such as, for example, lowered when the system cannot achieve predetermined time latency constraints at the current value for “x”. In this manner a tradeoff can be made between computational rigor and platform-specific timing requirements). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Bathe to include determining a step size for the updated parameter, as taught by WANG.
This modification would have been obvious because a person having ordinary skill in the art would have been motivated by the desire to dynamically evaluate the acceptability of a model by generating a plurality of predictions that each define a plurality of future inputs for the model. A deviation curve can be generated by determining a distance between each prediction of the plurality of predictions and a respective known data point of a plurality of known data points. One or more points in the deviation curve are sampled and the sampled points are compared to a low threshold and a high threshold. A judgment is determined for each prediction to determine whether the respective prediction will be accepted or denied as an input to the model. The future inputs for the model are modified based on the judgments, as suggested by WANG ([0001]-[0004]).
As per claim 2: Bathe in view of WANG discloses the method according to claim 1, wherein determining the step size comprises:
determining a weight for the historical gradient information based on the offset (WANG [0034] the prediction module 208 is implemented as one or more neural network(s) which can be trained on historical data (e.g., real-time data history 228) to generate weights for one or more internal (hidden) layers of the neural network. [0050] deviations D.sub.1, D.sub.2, . . . , D.sub.k for k data sets can be reordered using an importance precheck 604 (also referred to as an importance-weighted order) into the list D.sub.i, D.sub.l, D.sub.b, . . . , D.sub.s); and
generating the step size based on the gradient information, the historical gradient information, and the weight for the historical gradient information (WANG [0020] the sampling interval itself can be dynamically adjusted according to calculated changes in a gradient (slope) at a sampling point in the deviation curve. For example, a large gradient can indicate an acceleration or deceleration of the gap at a future sampling point. This gradient data can be leveraged to make improved judgments (e.g., denying a prediction that would otherwise be acceptable because the gradient at that sampling point indicates that the gap is accelerating)).
As per claim 3: Bathe in view of WANG discloses the method according to claim 2, wherein the weight is within a predefined area and increases with the offset (WANG [0051] an example workflow 700 for evaluating model acceptability in accordance with one or more embodiments. As shown in FIG. 7, a system for evaluating model acceptability can include the prediction module 208, the sampling module 210, and the compare module 212 discussed previously. FIG. 7 further illustrates the data flows over time when evaluating model acceptability. In particular, judgments at times t−n, t−1, t, t+n−1, and t+n are provided for scenarios Deny, Recheck, and Accept. In FIG. 7, “m” is the sampling interval, “n” is the prediction interval, “f(x)” is the sampling function, “D” is the deviation, “L” is the Low deviation threshold, “H” is the High deviation threshold, “R.sub.p” is the prediction data result, and “R.sub.r” is the real data result).
As per claim 4: Bathe in view of WANG discloses the method according to claim 2, wherein generating the step size comprises:
determining an intermediate parameter associated with the time slot based on the gradient information and a weighted historical gradient information that is determined based on the historical gradient information and the weight (WANG [0040] The sampling module 210 is configured to sample the deviation values (Di) at one or more times “t.sub.0 . . . t.sub.N”. In some embodiments of the invention, the sampling frequency (time between samples) can be predetermined (e.g., every 5 seconds, 10 minutes, hour, etc.). In some embodiments of the invention, the sampling frequency can be dynamically calculated. For example, timestamp t.sub.i+1 for a next sample can occur at timestamp t.sub.i+a dynamic interval m(g.sub.i), resulting in the sampling interval formula: f.sub.i+1=sample(t.sub.i+m(g.sub.i)) where t.sub.i is the last sampling timestamp, g.sub.i is the last sampling point gradient, and m(g.sub.i) is the sampling interval function defined as m(g.sub.i)=dyn_interval(g.sub.i−1, g.sub.i) where “g” is the gradient of the deviation curve calculated at the time t.sub.i); and
creating the step size based on the intermediate parameter and the gradient information (WANG [0041] the dynamic interval function increases (increasing sampling frequency) as the gradient increases. Likewise, the sampling frequency can decrease when the gradient decreases. The degree (gain) to which the sample frequency changes as a function of the gradient can be arbitrarily defined depending on the needs of a particular application. Moreover, the change in sampling can be rough (predefined steps) or smooth (arbitrary steps) based on the slope of the gradient; WANG [0042] The Deviation Curve 402 illustrated in FIG. 4 provides example dynamic sampling points taken along the sampling lines “f” along the curve “d” and their associated calculated gradients at times t.sub.0, t.sub.1, t.sub.2, t.sub.i+1, etc. As shown in FIG. 4, the sampling rate increased (the dynamic interval decreased) during the interval t.sub.i, t.sub.i+1 due to the relatively high gradient g.sub.i−1 at time t.sub.i−1. It should be understood that the exact gradient slope required for a given change in sampling frequency can be arbitrarily defined. In some embodiments of the invention, the dynamic interval function itself is predefined. For example, each 5% increase in the gradient can result in a 5% (or 10%, etc.) increase in sample frequency, although a linear function is merely used for illustration and non-linear functions are also within the scope of the invention).
As per claim 5: Bathe in view of WANG discloses the method according to claim 4, wherein creating the step size comprises:
obtaining an attenuation factor for the intermediate parameter based on the offset (WANG [0037] the sampling module 210 is configured to continuously or periodically calculate a deviation value (D.sub.i) for each prediction (P.sub.i) based on the known real data (R.sub.i) at the time corresponding to each prediction. In some embodiments of the invention, the deviation values (D.sub.i) are taken as the error between the prediction (P.sub.i) and the known real data (R.sub.i) according to the error formula: (R.sub.i−P.sub.i)/R.sub.i, although it is understood that any other suitable deviation measure can be used (absolute difference, percent difference, etc.). In some embodiments of the invention, the deviation values (D.sub.i) are plot over time to define a deviation curve, such as the Deviation Curve 402 illustrated in FIG. 4 and corresponding to the Sampling Curve 400 discussed previously); and
determining the step size based on the gradient information and an attenuated intermediate parameter that is determined based on the intermediate parameter and the attenuation factor (WANG [0040] In some embodiments of the invention, the sampling module 210 is configured to sample the deviation values (Di) at one or more times “t.sub.0 . . . t.sub.N”. In some embodiments of the invention, the sampling frequency (time between samples) can be predetermined (e.g., every 5 seconds, 10 minutes, hour, etc.). In some embodiments of the invention, the sampling frequency can be dynamically calculated. For example, timestamp t.sub.i+1 for a next sample can occur at timestamp t.sub.i+a dynamic interval m(g.sub.i), resulting in the sampling interval formula: f.sub.i+1=sample(t.sub.i+m(g.sub.i)) where t.sub.i is the last sampling timestamp, g.sub.i is the last sampling point gradient, and m(g.sub.i) is the sampling interval function defined as m(g.sub.i)=dyn_interval(g.sub.i−1, g.sub.i) where “g” is the gradient of the deviation curve calculated at the time t.sub.i).
As per claim 6: Bathe in view of WANG discloses the method according to claim 1, wherein obtaining the gradient information comprises:
obtaining a prediction for a label portion in the sample data based on a data portion in the sample data and the prediction model (Bathe [0073] The prediction for a new sample X (e.g., which may include the feature vector associated with a user) is obtained by taking the majority vote of the n predictions from n trees. Each leaf corresponding to X contains the majority vote of the labels from the training samples which belong to that leaf);
determining a loss between the prediction for the label portion and the label portion (Bathe [0082] The machine learning architecture 350 can additionally, or alternatively, utilize a gradient boosted tree model 453 to generate the repurchase predictions 351 and/or time slot predictions 352. The gradient boosted tree model 453 can include an ensemble of weak classifiers and may be implemented with decision trees in some cases); and
acquiring the gradient information based on a gradient of the loss and the parameter of the prediction model (Bathe [0083] During training, the gradient boosted tree model 453 receives the feature vector 460 for the users and corresponding labels y, and learns a sequence of K decision trees via a boosting method. The parameters of the gradient boosted tree model 453, which include tree depth and K number of decision trees, can be declared at the beginning of the training process).
As per claim 7: Bathe in view of WANG discloses the method according to claim 6, wherein the data portion represents features associated with a user and an object, the label portion represents an event between the user and the object, and the predetermined time period has a length of one or more days (Bathe [0080] the random forest decision tree model 451 can be configured for multi-class prediction using random forests. Unlike a binary prediction problem such as the purchase prediction model above (e.g., which predicts whether or not the user will purchase in the next k days in a binary fashion), the random forest decision tree model 451 can be configured to address a multi-class problem that chooses between multiple outcomes or classes to generate time slot predictions. In this scenario, each time slot (e.g., 1-2 PM on Date 1, 2-3 PM on Date 1, etc.) may represent a class. The same or similar historical data and feature vectors described above may be used to train the model and to generate the time slot predictions. For example, exemplary feature vectors that are used to train the model can include slot-based features indicating any or all of the following features: a number of times each time slot was chosen; number of times each time slot was chosen as a ratio of total number of orders; the last time or occurrence each time slot was chosen; whether a user prefers time slots on weekends or weekdays; and/or whether the user prefers a morning or evening time slot).
As per claim 8: Bathe in view of WANG discloses the method according to claim 1, further comprising determining the historical gradient information by:
obtaining respective gradient information based on respective historical sample data for the group of historical time slots before the time slot (Bathe [0084] The output of the gradient boosted tree model 453 can include a repurchase prediction 351 that indicates the probability and/or likelihood that each user will repurchase items and/or place a new transaction 321 in the next k days. The output can be a value between 0 and 1. The gradient boosted tree model 453 can then learn thresholds for the probabilities based on the distribution of the users and their transactions with the goal of maximizing the confidence in the predictions. The particular threshold chosen can vary, but can include any number between 0 and 1 that reflects whether users are likely to repurchase in the next k days. In some exemplary cases, the threshold may be set to 0.5, 0.7, 0.8, or 0.9); and
acquiring the historical gradient information based on the obtained respective gradient information (Bathe [0085] the random forest decision tree model 451 can be configured for multi-class prediction to generate the time slot predictions 352. The machine learning architecture 350 may additionally, or alternatively, use a heuristics approach to generate the time slot predictions 352. For each user, the heuristics approach can identify a frequency parameter (e.g., indicating the most frequent slot chosen by the user based on the historical data 320 stored for the user) and a recency parameter (e.g., indicating the most recent time slot chosen by the user based on the historical data 320 stored for user). These parameters may then be used as baselines to generate the time slot predictions 352).
As per claim 9: Bathe in view of WANG discloses the method according to claim 8, wherein acquiring the historical gradient information comprises:
determining respective squares of respective gradient information associated with the respective historical time slots in the group of historical time slots, the group of historical time slots being within the predefined time period (Bathe [0058] The machine learning architecture 350 (or other component of the system 300) may initially analyze the historical data 320 to identify a subset of users who routinely and/or regularly utilize the electronic platform 330 and/or who routinely and/or regularly conduct transactions 321 for particular types of items (e.g. groceries and/or household items). The machine learning architecture 350 and scheduling system 360 can receive this list of active users and execute the functions described herein (e.g., related to generating repurchase predictions 351, generating time slot predictions, and executing reservation functions 361) for these users 305);
determining the historical gradient information based on a sum of the respective squares (Bathe [0083] During training, the gradient boosted tree model 453 receives the feature vector 460 for the users and corresponding labels y, and learns a sequence of K decision trees via a boosting method. The parameters of the gradient boosted tree model 453, which include tree depth and K number of decision trees, can be declared at the beginning of the training process. During inference, prediction ŷ of each individual tree i in an ensemble of size K is computed based on the feature vector 460. In this algorithm, the predictions can be updated such that the sum of our residuals is close to 0 (or minimum) and predicted values are sufficiently close to the actual values. The gradient boosted tree model 453 can use multiple weak learners to reduce the prediction error before it outputs the class for the repurchase predictions 351 indicating whether the customer will purchase in the next k days).
As per claim 10: Bathe in view of WANG discloses the method according to claim 1, further comprising: updating the parameter of the prediction model with the step size (Bathe [0085] the random forest decision tree model 451 can be configured for multi-class prediction to generate the time slot predictions 352. The machine learning architecture 350 may additionally, or alternatively, use a heuristics approach to generate the time slot predictions 352. For each user, the heuristics approach can identify a frequency parameter (e.g., indicating the most frequent slot chosen by the user based on the historical data 320 stored for the user) and a recency parameter (e.g., indicating the most recent time slot chosen by the user based on the historical data 320 stored for user). These parameters may then be used as baselines to generate the time slot predictions 352).
As per claims 11-19: Claims 11-19 are directed to an electronic device, comprising a computer processor coupled to a computer-readable memory unit, the memory unit comprising instructions that when executed by the computer processor implement a method for managing a prediction model, the method having substantially similar corresponding limitations of claims 1-6, 7 and 10, and 8-9, respectively, and therefore claims 11-19 are rejected with the same rationale given above to reject corresponding limitations of claims 1-6, 7 and 10, and 8-9, respectively.
As per claim 20: Claim 20 is directed to a non-transitory computer program product, the non-transitory computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by an electronic device to cause the electronic device to perform a method for managing a prediction model, the method having substantially similar corresponding limitations of claim 1, and therefore claim 20 is rejected with the same rationale given above to reject corresponding limitations of claim 1.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TECHANE GERGISO whose telephone number is (571)272-3784. The examiner can normally be reached from 9:30am to 6:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, LINGLAN EDWARDS can be reached at (571) 270-5440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TECHANE GERGISO/ Primary Examiner, Art Unit 2408