DETAILED ACTION
This communication is a Final Office Action rejection on the merits. Claims 22-41 are currently pending and have been addressed below.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed on 12/17/2025 (related to the 103 Rejection) have been fully considered but are moot in view of the new grounds of rejection. Applicant's amendments necessitated the new ground(s) of rejection presented in this Office action. A rejection based on the newly cited reference(s) follows.
Applicant's arguments filed on 12/17/2025 (related to the 101 Rejection) have been fully considered but they are not persuasive.
Applicant states, on pages 8-10, that claim 22 describes a specific AI architecture that includes (i) a recurrent neural network (RNN) layer stack to output distribution parameters per change-event type from sequences of timestamped static and dynamic features, (ii) for each change-event type, fully-connected neural-network (FCNN) layer groups configured to produce probability density functions using one or more distributions with weights with a predetermined constraint, and (iii) operations that cause a server to update database entries in persistent storage prior to or at a predicted time-to-event. Applicant argues that these claim elements collectively amount to a technical solution to a technical problem (modeling event timing at scale from heterogeneous telemetry) and integrate any alleged abstract idea into a practical application. See MPEP § 2106 and the 2019 PEG (and October 2019 Update).
Examiner respectfully disagrees with Applicant. These claim elements are considered to be abstract ideas because they are directed to “certain methods of organizing human activity” which include “commercial or legal interactions.” In this case, a method for determining one or more subscription change events and a time-to-event for each change based on historical data is a form of marketing or sales activities or behaviors (e.g., analyzing customer behaviors over time). If a claim limitation, under its broadest reasonable interpretation, covers commercial or legal interactions, then it falls within the “certain methods of organizing human activity” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
The mere nominal recitation of generic computer components does not take the claim out of the methods of organizing human activity grouping. The additional element of a recurrent neural network (RNN) is merely used to model changes in customer demand for product bundles based on customer activity over time (Paragraph 0055). The FCNN is merely used to approximate the density (distribution) of an event time (Paragraph 0007). In this case, the model includes inputs (e.g., subscription data to identify one or more change events) and outputs (e.g., a probability density function). Although the model further receives feedback over time to update a database entry (Paragraph 0065), the claim and specification do not include any specific details about how the trained model operates or how the sequence is generated, which is merely claiming the idea of a solution or outcome (see MPEP 2106.05(a)). Rather, “receiving data/feedback to improve accuracy of the machine learning” is just an inherent characteristic used to train a machine learning model over time. See Example 47 of the July 2024 AI Subject Matter Eligibility Guidance Update. Therefore, the model is recited at a high level of generality, which results in “apply it” (see MPEP 2106.05(f)).
Also, the combination of an RNN in conjunction with an FCNN is a “well-known” algorithm used to output a probability for classification in a time series (see 103 Rejection). Further, the step of “automatically training/updating a database” is considered a well-understood, routine, and conventional function since it is just “performing repetitive calculations” and “storing information in a memory” (MPEP 2106.05(d)).
The claim fails to recite any improvements to another technology or technical field, improvements to the functioning of the computer itself, use of a particular machine, effecting a transformation or reduction of a particular article to a different state or thing, adding unconventional steps that confine the claim to a particular useful application, and/or meaningful limitations beyond generally linking the use of an abstract idea to a particular environment. See 84 Fed. Reg. 55. Viewed individually or as a whole, these additional claim element(s) do not provide meaningful limitation(s) to transform the abstract idea into a patent eligible application of the abstract idea such that the claim(s) amounts to significantly more than the abstract idea itself. Thus, the claim is not patent eligible.
Independent claims 35 and 41 recite similar features and therefore are rejected for the same reasons as independent claim 22. Claims 23-33 and 36-40 are rejected for having the same deficiencies as those set forth with respect to the claims from which they depend, independent claims 22, 35, and 41.
Examiner recommends further specifying the inputs provided to the model, the deep learning architecture used to output a prediction of a bundle change, and the specific bundle change events predicted by the model (see Figure 7 and Paragraphs 0059-0067: deep learning architecture 700 comprises a separate FCNN layer group for each type of predicted bundle change; in the present example, the three possible change events are upgrade, downgrade, and termination, so there are three FCNN layer groups 704, 706, 708, one for each type of change event). Also, specify how the deep learning architecture improves the predictions of the model.
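For illustration only (this sketch is not part of the record; the layer sizes, the exponential mixture components, and the softmax constraint are assumptions by the editor, not taken from the specification), the level of specificity described above, an RNN stack consuming a timestamped feature sequence and feeding one FCNN layer group per change-event type, each emitting constrained mixture weights and distribution parameters, could be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_layer(x_seq, h_dim=8):
    """Minimal recurrent layer: consumes a (T, F) sequence of static and
    dynamic features and returns the final hidden state for the account."""
    T, F = x_seq.shape
    Wx = rng.normal(scale=0.1, size=(F, h_dim))
    Wh = rng.normal(scale=0.1, size=(h_dim, h_dim))
    h = np.zeros(h_dim)
    for t in range(T):
        h = np.tanh(x_seq[t] @ Wx + h @ Wh)
    return h

def fcnn_group(h, n_components=3):
    """Per-event-type fully connected head: emits mixture weights
    (softmax-constrained to sum to 1) and positive rate parameters."""
    W = rng.normal(scale=0.1, size=(h.shape[0], 2 * n_components))
    out = h @ W
    logits, raw_rates = out[:n_components], out[n_components:]
    weights = np.exp(logits) / np.exp(logits).sum()  # predetermined constraint
    rates = np.exp(raw_rates)                        # positive rates
    return weights, rates

def density(t, weights, rates):
    """Mixture of exponential densities over the time-to-event t."""
    return float(sum(w * r * np.exp(-r * t) for w, r in zip(weights, rates)))

# one account: 12 timesteps of 5 features; one FCNN head per change event
h = rnn_layer(rng.normal(size=(12, 5)))
heads = {ev: fcnn_group(h) for ev in ("upgrade", "downgrade", "termination")}
p0 = density(1.0, *heads["termination"])
for ev, (w, r) in heads.items():
    assert abs(w.sum() - 1.0) < 1e-9  # mixture weights sum to one
```

The softmax here is one common way to realize a “predetermined constraint” on mixture weights (non-negative, summing to one); the application's actual constraint and choice of distributions may differ.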
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 22-41 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e. an abstract idea) without reciting significantly more.
Independent Claim 22
Step One - First, pursuant to step 1 in the January 2019 Revised Patent Subject Matter Eligibility Guidance (“2019 PEG”) at 84 Fed. Reg. 53, claim 22 is directed to an apparatus, which is a statutory category.
Step 2A, Prong One - Claim 22 recites: A system, comprising to: identify one or more change events corresponding to one or more bundles of services for one or more account identifiers; determine metrics indicative of an amount of activity corresponding to the one or more bundles of services associated with the one or more account identifiers; to output, for each change-event type, one or more distribution parameters from a sequence of timestamps of static and dynamic features; and for each change-event type, a layer group configured to produce a probability density function using one or more distributions with weights with a predetermined constraint; determine, using the one or more change events and the metrics input into the model, one or more subscription change events and a time-to-event for each change to the one or more bundles of services according to the probability density functions; and generate data to cause to update, for the one or more account identifiers, a database entry associated with a service of the one or more bundles of services, the update stored prior to or at the time-to-event. These claim elements are considered to be abstract ideas because they are directed to “certain methods of organizing human activity” which include “commercial or legal interactions.” In this case, a method for determining one or more subscription change events and a time-to-event for each change based on historical data is a form of marketing or sales activities or behaviors (e.g., analyzing customer behaviors over time). If a claim limitation, under its broadest reasonable interpretation, covers commercial or legal interactions, then it falls within the “certain methods of organizing human activity” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A, Prong Two - The judicial exception is not integrated into a practical application. Claim 22 includes additional elements: one or more processors, coupled with memory; one or more client devices via one or more connections; an artificial intelligence model comprising: a recurrent neural-network (RNN) layer stack; a fully-connected neural-network (FCNN); via a network; a server; and a persistent storage.
The processor is merely used to execute instructions (Paragraph 0081). The memory is merely used to store instructions (Paragraph 0081). The client devices via one or more connections are merely used to provide information (Paragraph 0026). The recurrent neural network (RNN) is merely used to model changes in customer demand for product bundles based on customer activity over time (Paragraph 0055). The FCNN is merely used to approximate the density (distribution) of an event time (Paragraph 0007). The network is merely used to provide communications links between various devices and computers (Paragraph 0025). The server is merely used to connect the network along with a storage unit (Paragraph 0026). The persistent storage is merely used to store information (Paragraph 0082). Merely stating that the step is performed by a computer component results in “apply it” on a computer (MPEP 2106.05(f)). These elements of “processor,” “memory,” “client devices,” “RNN,” “FCNN,” “network,” “server,” and “persistent storage” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer elements. Further, the processor and client device are also considered “insignificant extra-solution activity” since they perform “mere data gathering” (MPEP 2106.05(g)) used for predicting changes in customer demand. Accordingly, alone and in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claims are directed to an abstract idea.
Step 2B - The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the claims describe how to generally “apply” the concept of predicting types and timing of changes in customer bundle subscriptions based on historical data. The specification shows that the processor is merely used to execute instructions (Paragraph 0081). The memory is merely used to store instructions (Paragraph 0081). The client devices via one or more connections are merely used to provide information (Paragraph 0026). The recurrent neural network (RNN) is merely used to model changes in customer demand for product bundles based on customer activity over time (Paragraph 0055). The FCNN is merely used to approximate the density (distribution) of an event time (Paragraph 0007). The network is merely used to provide communications links between various devices and computers (Paragraph 0025). The server is merely used to connect the network along with a storage unit (Paragraph 0026). The persistent storage is merely used to store information (Paragraph 0082). In this case, the model includes inputs (e.g., subscription data to identify one or more change events) and outputs (e.g., a probability density function). Although the model further receives feedback over time to update a database entry (Paragraph 0065), the claim and specification do not include any specific details about how the trained model operates or how the sequence is generated, which is merely claiming the idea of a solution or outcome (see MPEP 2106.05(a)). Also, the processor, client device, and persistent storage are merely used to perform conventional functions such as “receiving and transmitting over a network,” “storing information in a memory,” and “determining an estimated outcome” (see MPEP 2106.05(d)).
Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.
Independent claim 35 is directed to a method at Step 1, which is a statutory category. Claim 35 recites similar limitations as claim 22 and is rejected for the same reasons at Step 2A, Prong One; Step 2A, Prong Two; and Step 2B. Thus, the claim is ineligible.
Independent claim 41 is directed to an article of manufacture at Step 1, which is a statutory category. Claim 41 recites similar limitations as claim 22 and is rejected for the same reasons at Step 2A, Prong One; Step 2A, Prong Two; and Step 2B. Claim 41 further recites: a computer storage medium, which is treated as just an explicit “processor/computer” for executing and storing the operations and is treated under MPEP 2106.05(f) in the same manner as claim 22. Accordingly, this additional element of “computer storage medium” is viewed as “apply it on a computer” at Step 2A, Prong Two and Step 2B. Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.
Dependent claims 23-29, 31-34 and 36-40 do not recite any additional claim elements. Rather, these claims offer further descriptive limitations of the inputs and outputs of the RNN - such as: wherein the output is for each change-event type; wherein the inputs include metrics corresponding to the one or more bundles of services of the one or more account identifiers; wherein the inputs include a static feature and a dynamic feature; wherein the inputs include one or more change events based on the subscription data; wherein the time-to-event for each bundle subscription change comprises a normalized risk score; wherein accounts associated with the account identifiers are grouped according to a number of shared static features; wherein the one or more change events include an upgrade event, a downgrade event and a termination event; wherein the one or more change events and time-to-event comprises a density approximation for the metrics, wherein the density approximation is formed by determining a weighted average of the probability density functions; and wherein predicting a type and timing of change in bundle subscription for a particular account is based on past activities of that account and past activities of a number of other accounts sharing specified static features. These descriptive elements merely state the inputs and outputs of the RNN. However, the claims and specification do not include any specific details about how the trained model operates or how the sequence is generated, which is merely claiming the idea of a solution or outcome (see MPEP 2106.05(a)). See Example 47 of the July 2024 AI Subject Matter Eligibility Guidance Update. Therefore, the model is recited at a high level of generality, which results in “apply it” (see MPEP 2106.05(f)).
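For illustration only (the exponential densities and the fixed weights below are assumptions by the editor, not taken from the claims or the specification), the recited “weighted average of the probability density functions” and a normalized risk score could be sketched as:

```python
import math

# hypothetical per-event probability density functions over time-to-event t
# (exponential densities are chosen purely for illustration)
def exp_pdf(rate):
    return lambda t: rate * math.exp(-rate * t)

pdfs = {"upgrade": exp_pdf(0.2), "downgrade": exp_pdf(0.5), "termination": exp_pdf(1.0)}
weights = {"upgrade": 0.5, "downgrade": 0.3, "termination": 0.2}  # sum to 1

def density_approximation(t):
    """Weighted average of the per-event PDFs, as the dependent claims recite."""
    return sum(weights[ev] * pdfs[ev](t) for ev in pdfs)

def normalized_risk_score(event, t, steps=3000):
    """Illustrative normalized risk: probability mass of `event` occurring
    before time t (numeric integration of the PDF), clipped to [0, 1]."""
    dt = t / steps
    mass = sum(pdfs[event](i * dt) * dt for i in range(steps))
    return min(mass, 1.0)

d0 = density_approximation(0.0)   # 0.5*0.2 + 0.3*0.5 + 0.2*1.0 = 0.45
risk = normalized_risk_score("termination", 5.0)
```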
Dependent claim 30 is directed to additional elements such as: a neural network stack. The neural network stack is merely used to learn deep representations of multiple different change events (Paragraph 0057). Merely stating that the step is performed by a computer component (e.g., a neural network stack) results in “apply it” on a computer (MPEP 2106.05(f)) being applicable at both Step 2A, Prong 2 and Step 2B. Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.
Examiner recommends further specifying the inputs provided to the model, the deep learning architecture used to output a prediction of a bundle change, and the specific bundle change events predicted by the model (see Figure 7 and Paragraphs 0059-0067: deep learning architecture 700 comprises a separate FCNN layer group for each type of predicted bundle change; in the present example, the three possible change events are upgrade, downgrade, and termination, so there are three FCNN layer groups 704, 706, 708, one for each type of change event). Also, specify how the deep learning architecture improves the predictions of the model.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 22-40 are rejected under 35 U.S.C. 103 as being unpatentable over Korte et al. (US 2020/0322662 A1), in view of Yu (Yu, R., Zheng, Y., Zhang, R., Jiang, Y. and Poon, C.C., 2019. Using a multi-task recurrent neural network with attention mechanisms to predict hospital mortality of patients. IEEE journal of biomedical and health informatics, 24(2), pp.486-492).
Regarding claim 22 (New), Korte et al. discloses a system, comprising: one or more processors, coupled with memory, to (Paragraph 0003, method and system may input telemetry into a data structure. The telemetry may comprise a plurality of measurement results or other data collected from remote network nodes. Subscriber use metrics may be determined from the telemetry. From the use metrics and habits of the subscriber, user characteristics may be determined. A likelihood of service cancellation, based on the user characteristics, may be estimated in an embodiment; See provisional application No. 62/830,072, Paragraph 0002):
identify one or more change events corresponding to one or more bundles of services for one or more account identifiers (Paragraph 0066, Data mining applications are good candidates for prediction tasks. Trying to determine future outcomes based on estimations from historic data can be as simple as guessing the classification of the next inputs. One of the practical reasons to exercise Data Mining techniques might be to identify customers who are not presently enjoying their service and to predict the possibility of them cancelling their subscription; Paragraph 0067, Data Mining may be defined as the process of discovering patterns in data, either automatically or semi-automatically. This process is supported by tools and practical techniques, also known as Machine Learning, which are used to identify the underlying structure of the data. This structure is then represented in a comprehensible manner for use in Data Mining applications. Pattern descriptions are considered the output of a learning process; Paragraph 0083, The Model makes a decision as to “At Risk” or “Not at Risk”. Later, new data is augmented by actual subscriber behavior (whether he/she cancelled the service within certain period, i.e. one month, or not); See provisional application No. 62/830,072, Paragraphs 0065-0066 & 0143);
determine metrics indicative of an amount of activity corresponding to the one or more bundles of services utilized by one or more client devices via one or more connections associated with the one or more account identifiers (Paragraph 0145, KPI and KQI performance values alone may not be enough to predict subscriber behavior. For instance, some subscribers may call the support center or leave the service while other subscribers with the same KQI do nothing and maintain their service. Subscribers should be classified by risk factor and subscribers in the high-risk group may be addressed first followed by lower-risk subscriber groups. Geolocation, for example, subscribers in a building and the presence of competitors may be considered as factors. Other subscriber behavior factors including history of complaints, purchases, viewing habits may also be considered; See provisional application No. 62/830,072, Paragraph 0127);
select an artificial intelligence model comprising: a recurrent neural-network (RNN) layer stack configured to output, for each change-event type, one or more distribution parameters from a sequence of timestamps of static and dynamic features (Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; As stated in Paragraph 0062 of Applicant’s specification, the categories of bundle events are upgrade, downgrade, and termination. Therefore, based on Applicant’s definition, Korte et al. discloses a change-event type since it can predict a termination/cancelation of the service. See provisional application No. 62/830,072, Paragraph 0124-0125, & 0130);
and for each change-event type, [machine learning] configured to produce a probability density function using one or more distributions with weights with a predetermined constraint (Paragraph 0081, In data mining, the path to a solution is nonlinear. The process includes iteratively exploring, building and tuning many models. The process typically starts with feature extraction from the source data based on the domain knowledge and ends with an evaluation of the model. During the training stage in machine learning, the model's weights are updated based on input and ground truth data; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; See provisional application No. 62/830,072, Paragraphs 0078 & 0130);
determine, using the one or more change events and the metrics input into the artificial intelligence model, one or more subscription change events and a time-to-event for each change to the one or more bundles of services according to the probability density functions (Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; See provisional application No. 62/830,072, Paragraph 0124-0125, & 0130);
and generate, via a network, data to cause a server to update, for the one or more account identifiers, a database entry associated with a service of the one or more bundles of services, the update stored in persistent storage prior to or at the time-to-event (Paragraph 0083, A plurality of data sources may be used as input to a prediction model. Historical data (Data Source 1 . . . N) may be used as an initial training data set. After preprocessing 1401, cleaned training data 1402 may be used to train 1403 the learning Prediction Model. After learning is completed, new data goes to the model input. The Model makes a decision as to “At Risk” or “Not at Risk”. Later, new data is augmented by actual subscriber behavior (whether he/she cancelled the service within certain period, i.e. one month, or not). Then this data may be used as training data to further refine the model. This process makes constant adjustments (updates) to the model improving prediction accuracy. Dynamic updates allow the model to automatically adapt to changing environmental conditions like changing subscriber taste, tolerance to issues, equipment changes, appearance of competitors, etc.; See provisional application No. 62/830,072, Paragraph 0143).
Although Korte et al. discloses a machine learning model configured to produce a probability density function using one or more distributions with weights with a predetermined constraint (see Paragraphs 0081 & 0148), Korte et al. does not specifically disclose wherein each event type includes an FCNN layer group.
However, Yu discloses for each change-event type, a fully-connected neural-network (FCNN) layer group configured to produce a probability density function using one or more distributions with weights with a predetermined constraint (see Figure 1, A new multi-task learning RNN model; Page 486, II. Related Work; Considering predicting different types of outcomes as tasks, hospital mortality, physiologic decomposition, ICU LOS and phenotype classification can be simultaneously predicted using a multi-task learning model [12]; Page 488, D. Classification With Attention, For classification, an attention mechanism [15] was applied to set different level of importance to every time step in the time series, instead of relying on any single time or considering them all equally. The representation vector r of this time series is formed by a weighted sum of all hidden state vectors h based on a vector of classification attention probability α: see equation 4.1 and 4.2, where ω is a trainable parameter vector, and ωT is the transpose of ω. The final representation used for classification is: see equation 4.3. 4.3. Two fully-connected layers were connected to these features h∗ to output the probability of mortality for classification; Page 488, E. Multi-Task Learning Model, As shown in Fig. 1, the multi-task learning model combined the sequence reconstruction and classification tasks under the hard parameter sharing scheme [16]. The LSTM encoder was shared between these two tasks. During the model training for classification, a cross-entropy loss function was used to measure the difference between the predicted mortality probability (p) and the corresponding mortality outcome (y) for each sample).
[Yu, Figure 1 (media_image1.png, greyscale): the multi-task learning RNN model.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning used for predicting, for each change-event type, a probability density function using one or more distributions with weights with a predetermined constraint of the invention of Korte et al. to further specify wherein each event type includes an FCNN layer of the invention of Yu, because doing so would allow the method to use a multi-task learning model, with the LSTM feature learning layer and two fully-connected layers to output a probability (see Yu, Page 489, C. Baseline Models). Further, the claimed invention is merely a simple substitution of one known element for another to obtain predictable results (e.g., substitution of a machine learning model that uses an RNN to learn a sequence for a multi-task learning RNN, because the multi-task learning RNN achieves better prediction performance (see Yu, Page 490, V. Results)).
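For illustration only (the dimensions and the tanh/sigmoid nonlinearities are assumptions by the editor, not Yu's exact configuration), the cited mechanism, attention pooling over the LSTM hidden states followed by two fully-connected layers producing a classification probability (Yu, eqs. 4.1-4.3), can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(1)

def attention_pool(H, w):
    """Attention over time steps (Yu, eqs. 4.1-4.2): weights each hidden
    state h_t by softmax(h_t . w) and returns the weighted sum r."""
    scores = H @ w
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()               # attention probabilities sum to one
    return alpha @ H, alpha

def two_fc_classifier(r, hidden=8):
    """Two fully connected layers mapping the pooled representation to a
    mortality (or, by analogy, churn) probability via a sigmoid."""
    W1 = rng.normal(scale=0.1, size=(r.shape[0], hidden))
    W2 = rng.normal(scale=0.1, size=hidden)
    z = np.tanh(r @ W1)
    return 1.0 / (1.0 + np.exp(-(z @ W2)))

H = rng.normal(size=(10, 6))  # 10 time steps of LSTM hidden states
w = rng.normal(size=6)        # trainable attention vector (omega)
r, alpha = attention_pool(H, w)
p = two_fc_classifier(r)      # probability in (0, 1) for the event class
```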
Regarding claim 35 (New), Korte et al. discloses a method, comprising (Paragraph 0003, method and system may input telemetry into a data structure. The telemetry may comprise a plurality of measurement results or other data collected from remote network nodes. Subscriber use metrics may be determined from the telemetry. From the use metrics and habits of the subscriber, user characteristics may be determined. A likelihood of service cancellation, based on the user characteristics, may be estimated in an embodiment; See provisional application No. 62/830,072, Paragraph 0002):
identifying, by one or more processors coupled with memory, one or more change events that correspond to one or more bundles of services for one or more account identifiers (Paragraph 0066, Data mining applications are good candidates for prediction tasks. Trying to determine future outcomes based on estimations from historic data can be as simple as guessing the classification of the next inputs. One of the practical reasons to exercise Data Mining techniques might be to identify customers who are not presently enjoying their service and to predict the possibility of them cancelling their subscription; Paragraph 0067, Data Mining may be defined as the process of discovering patterns in data, either automatically or semi-automatically. This process is supported by tools and practical techniques, also known as Machine Learning, which are used to identify the underlying structure of the data. This structure is then represented in a comprehensible manner for use in Data Mining applications. Pattern descriptions are considered the output of a learning process; Paragraph 0083, The Model makes a decision as to “At Risk” or “Not at Risk”. Later, new data is augmented by actual subscriber behavior (whether he/she cancelled the service within certain period, i.e. one month, or not); See provisional application No. 62/830,072, Paragraphs 0065-0066 & 0143);
determining, by the one or more processors, metrics indicative of an amount of activity corresponding to the one or more bundles of services utilized by one or more client devices, via one or more connections associated with the one or more account identifiers (Paragraph 0145, KPI and KQI performance values alone may not be enough to predict subscriber behavior. For instance, some subscribers may call the support center or leave the service while other subscribers with the same KQI do nothing and maintain their service. Subscribers should be classified by risk factor and subscribers in the high-risk group may be addressed first followed by lower-risk subscriber groups. Geolocation, for example, subscribers in a building and the presence of competitors may be considered as factors. Other subscriber behavior factors including history of complaints, purchases, viewing habits may also be considered; See provisional application No. 62/830,072, Paragraph 0127);
selecting, by the one or more processors, an artificial intelligence model comprising a recurrent neural-network (RNN) layer stack configured to output, for each change-event type, one or more distribution parameters from a sequence of timestamps of static and dynamic features (Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; As stated in Paragraph 0062 of Applicant’s specification, the categories of bundle events are upgrade, downgrade, and termination. Therefore, based on Applicant’s definition, Korte et al. discloses a change-event type since it can predict a termination/cancelation of the service. See provisional application No. 62/830,072, Paragraph 0124-0125, & 0130) and for each change-event type, a [machine learning] configured to produce a probability density function using one or more distributions with weights with a predetermined constraint (Paragraph 0081, In data mining, the path to a solution is nonlinear. The process includes iteratively exploring, building and tuning many models. 
The process typically starts with feature extraction from the source data based on the domain knowledge and ends with an evaluation of the model. During the training stage in machine learning, the model's weights are updated based on input and ground truth data; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; See provisional application No. 62/830,072, Paragraphs 0078 & 0130);
determining, by the one or more processors, using the one or more change events and the metrics input into the artificial intelligence model, one or more subscription change events and a time-to-event for each change to the one or more bundles of services according to the probability density functions (Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; See provisional application No. 62/830,072, Paragraph 0124-0125, & 0130);
and generating, by the one or more processors, via a network, data to cause a server to update, for the one or more account identifiers, a database entry associated with a service of the one or more bundles of services, the update stored in persistent storage prior to or at the time-to-event (Paragraph 0083, A plurality of data sources may be used as input to a prediction model. Historical data (Data Source 1 . . . N) may be used as an initial training data set. After preprocessing 1401, cleaned training data 1402 may be used to train 1403 the learning Prediction Model. After learning is completed, new data goes to the model input. The Model makes a decision as to “At Risk” or “Not at Risk”. Later, new data is augmented by actual subscriber behavior (whether he/she cancelled the service within certain period, i.e. one month, or not). Then this data may be used as training data to further refine the model. This process makes constant adjustments (updates) to the model improving prediction accuracy. Dynamic updates allow the model to automatically adapt to changing environmental conditions like changing subscriber taste, tolerance to issues, equipment changes, appearance of competitors, etc.; See provisional application No. 62/830,072, Paragraph 0143).
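For context on the claimed "probability density function using one or more distributions with weights with a predetermined constraint" limitation, this phrasing is consistent with a mixture-density output head in which a softmax constrains the mixture weights to be non-negative and sum to one. The following is a minimal illustrative sketch only, not a disclosure of either reference; all names and values (component count, logits, means, scales) are hypothetical:

```python
import numpy as np

def softmax(z):
    # Predetermined constraint: weights are non-negative and sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_pdf(t, logits, means, sds):
    # Weighted sum of Gaussian densities; one such head per change-event type.
    w = softmax(logits)
    comps = np.exp(-0.5 * ((t - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))
    return float(np.sum(w * comps))

# Hypothetical head outputs for one event type (e.g. a termination event):
logits = np.array([0.0, 1.0])    # unconstrained scores mapped to softmax weights
means = np.array([30.0, 90.0])   # days-to-event component locations
sds = np.array([10.0, 20.0])     # component scales

weights = softmax(logits)
density = mixture_pdf(45.0, logits, means, sds)  # density at t = 45 days
```

The softmax is one conventional way to realize the claimed "predetermined constraint" on the weights; the claim language itself does not mandate this particular mechanism.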
Although Korte et al. discloses a machine learning model configured to produce a probability density function using one or more distributions with weights with a predetermined constraint (see Paragraphs 0081 & 0148), Korte et al. does not specifically disclose wherein each change-event type includes an FCNN layer group.
However, Yu discloses for each change-event type, a fully-connected neural-network (FCNN) layer group configured to produce a probability density function using one or more distributions with weights with a predetermined constraint (see Figure 1, A new multi-task learning RNN model; Page 486, II. Related Work; Considering predicting different types of outcomes as tasks, hospital mortality, physiologic decomposition, ICU LOS and phenotype classification can be simultaneously predicted using a multi-task learning model [12]; Page 488, D. Classification With Attention, For classification, an attention mechanism [15] was applied to set different level of importance to every time step in the time series, instead of relying on any single time or considering them all equally. The representation vector r of this time series is formed by a weighted sum of all hidden state vectors h based on a vector of classification attention probability α: see equation 4.1 and 4.2, where ω is a trainable parameter vector, and ωT is the transpose of ω. The final representation used for classification is: see equation 4.3. Two fully-connected layers were connected to these features h∗ to output the probability of mortality for classification; Page 488, E. Multi-Task Learning Model, As shown in Fig. 1, the multi-task learning model combined the sequence reconstruction and classification tasks under the hard parameter sharing scheme [16]. The LSTM encoder was shared between these two tasks. During the model training for classification, a cross-entropy loss function was used to measure the difference between the predicted mortality probability (p) and the corresponding mortality outcome (y) for each sample).
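The attention mechanism quoted from Yu (equations 4.1 through 4.3) reduces to a softmax over per-timestep scores ωᵀh_t followed by a weighted sum of the hidden states. The following sketch restates those equations only; the dimensions and random values are hypothetical and not taken from Yu:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 4                       # timesteps and hidden size (hypothetical)
h = rng.normal(size=(T, d))       # hidden state vectors h_t from the LSTM encoder
omega = rng.normal(size=d)        # trainable parameter vector omega

scores = h @ omega                # omega^T h_t per timestep (cf. eq. 4.1)
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()              # classification attention probabilities (cf. eq. 4.2)
r = alpha @ h                     # weighted sum of hidden states (cf. eq. 4.3)
```

In Yu, the representation r is then fed to two fully-connected layers to output the mortality probability.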
[Yu, Figure 1: a new multi-task learning RNN model (media_image1.png, greyscale)]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning used for predicting, for each change-event type, a probability density function using one or more distributions with weights with a predetermined constraint of the invention of Korte et al. to further specify wherein each change-event type includes an FCNN layer group of the invention of Yu because doing so would allow the method to use a multi-task learning model, with the LSTM feature-learning layer and two fully-connected layers to output a probability (see Yu, Page 489, C. Baseline Models). Further, the claimed invention is merely a simple substitution of one known element for another to obtain predictable results (e.g., substitution of a machine learning model that uses an RNN to learn a sequence for a multi-task learning RNN, because the multi-task learning RNN achieves better prediction performance (see Yu, Page 490, V. Results)).
Regarding claims 23 and 36 (New), which are dependent of claims 22 and 35, the combination of Korte et al. and Yu discloses all the limitations in claims 22 and 35. Korte et al. further discloses wherein the RNN layer stack outputs, for each change-event type, at least one distribution parameter selected from a location, scale, or shape parameter (Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0144, Subscriber metadata may include, but is not limited to Profile creation date, Location and sub_location, Last authorization date, Subscription plan (price, included channels, options, etc.), history of additional purchases (VOD, upgrades, etc.), Presence of Internet service in addition to TV service, Account type (residential or business), and Network type (FTTB, DSL, etc.) Table 3 provides example telemetry data; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; It can be noted that the claim language is written in alternative form. The limitation taught by Korte et al. is based on “distribution parameter selected from a location." See provisional application No. 
62/830,072, Paragraph 0124-0126, & 0130).
Regarding claims 24 and 37 (New), which are dependent of claims 23 and 36, the combination of Korte et al. and Yu discloses all the limitations in claims 23 and 36. Korte et al. further discloses wherein the RNN is trained to use the metrics to predict subscription change events for the one or more account identifiers according to the activity corresponding to the one or more bundles of services of the one or more account identifiers (Table 3, service account number; Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0144, Subscriber metadata may include, but is not limited to Profile creation date, Location and sub_location, Last authorization date, Subscription plan (price, included channels, options, etc.), history of additional purchases (VOD, upgrades, etc.), Presence of Internet service in addition to TV service, Account type (residential or business), and Network type (FTTB, DSL, etc.) Table 3 provides example telemetry data; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; See provisional application No. 
62/830,072, Paragraph 0124-0126, 0130, & Table 3, service account number).
Regarding claims 25 and 38 (New), which are dependent of claims 22 and 35, the combination of Korte et al. and Yu discloses all the limitations in claims 22 and 35. Korte et al. further discloses wherein the one or more processors further use a static feature that remains unchanged across a plurality of timestamps and a dynamic feature that varies across the plurality of timestamps to determine a subscription-change event and the time-to-event for the one or more bundles of services (Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0144, Subscriber metadata may include, but is not limited to Profile creation date, Location and sub_location, Last authorization date, Subscription plan (price, included channels, options, etc.), history of additional purchases (VOD, upgrades, etc.), Presence of Internet service in addition to TV service, Account type (residential or business), and Network type (FTTB, DSL, etc.) Table 3 provides example telemetry data; Paragraph 0145, KPI and KQI performance values alone may not be enough to predict subscriber behavior. For instance, some subscribers may call the support center or leave the service while other subscribers with the same KQI do nothing and maintain their service. Subscribers should be classified by risk factor and subscribers in the high-risk group may be addressed first followed by lower-risk subscriber groups. Geolocation, for example, subscribers in a building and the presence of competitors may be considered as factors. 
Other subscriber behavior factors including history of complaints, purchases, viewing habits may also be considered; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; Examiner notes that location is a static feature. Also, purchases and viewing habits are dynamic features. See provisional application No. 62/830,072, Paragraph 0124-0127, & 0130).
Regarding claims 26 and 39 (New), which are dependent of claims 25 and 38, the combination of Korte et al. and Yu discloses all the limitations in claims 25 and 38. Korte et al. further discloses wherein the static feature comprises at least one of an industry type, a sector, a geographic location, or a business partner type, and the dynamic feature comprise at least one of a login, a button click, a use of a feature, a page visit, a use of an email or a chat feature (Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0144, Subscriber metadata may include, but is not limited to Profile creation date, Location and sub_location, Last authorization date, Subscription plan (price, included channels, options, etc.), history of additional purchases (VOD, upgrades, etc.), Presence of Internet service in addition to TV service, Account type (residential or business), and Network type (FTTB, DSL, etc.) Table 3 provides example telemetry data; Paragraph 0145, KPI and KQI performance values alone may not be enough to predict subscriber behavior. For instance, some subscribers may call the support center or leave the service while other subscribers with the same KQI do nothing and maintain their service. Subscribers should be classified by risk factor and subscribers in the high-risk group may be addressed first followed by lower-risk subscriber groups. Geolocation, for example, subscribers in a building and the presence of competitors may be considered as factors. 
Other subscriber behavior factors including history of complaints, purchases, viewing habits may also be considered; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; It can be noted that the claim language is written in alternative form. The limitation taught by Korte et al. is based on a static feature of a geographic location and a dynamic feature of a use of a feature. See provisional application No. 62/830,072, Paragraph 0124-0127, & 0130).
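The static-versus-dynamic distinction at issue in claims 25-26 and 38-39 can be illustrated by how such features are conventionally assembled into a model input sequence: a static feature (e.g., a geographic location, as the Examiner notes for Korte et al.) is repeated at every timestamp, while a dynamic feature (e.g., purchases or viewing habits) varies per timestamp. The sketch below is purely illustrative; the feature names and values are hypothetical:

```python
import numpy as np

timestamps = ["2025-01", "2025-02", "2025-03"]
location_id = 7                                   # static: unchanged across timestamps
viewing_minutes = np.array([310.0, 140.0, 35.0])  # dynamic: varies per timestamp

# Broadcast the static feature alongside the dynamic one to form the
# (timesteps x features) sequence an RNN layer stack would consume.
static_col = np.full(len(timestamps), float(location_id))
sequence = np.stack([static_col, viewing_minutes], axis=1)
```

The resulting array holds one row per timestamp, with the static column constant and the dynamic column varying, matching the claimed "remains unchanged across a plurality of timestamps" and "varies across the plurality of timestamps" language.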
Regarding claims 27 and 40 (New), which are dependent of claims 22 and 35, the combination of Korte et al. and Yu discloses all the limitations in claims 22 and 35. Korte et al. further discloses wherein the one or more processors further: collect subscription data for one or more accounts over a time interval, wherein each account of the one or more accounts is subscribed to at least one bundle of services of the one or more bundles of services; and identify the one or more change events based on the subscription data (Table 3, service account number; Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0144, Subscriber metadata may include, but is not limited to Profile creation date, Location and sub_location, Last authorization date, Subscription plan (price, included channels, options, etc.), history of additional purchases (VOD, upgrades, etc.), Presence of Internet service in addition to TV service, Account type (residential or business), and Network type (FTTB, DSL, etc.) Table 3 provides example telemetry data; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. 
Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; See provisional application No. 62/830,072, Paragraph 0124-0126, 0130, & Table 3, service account number).
Regarding claim 28 (New), which is dependent of claim 27, the combination of Korte et al. and Yu discloses all the limitations in claim 27. Korte et al. further discloses wherein the one or more processors further: determine the metrics capturing the amount of activity based at least one of a number of clicks while using the bundle of services, a duration of using the bundle of services, and a number of page visits while using the bundle of services (Paragraph 0145, KPI and KQI performance values alone may not be enough to predict subscriber behavior. For instance, some subscribers may call the support center or leave the service while other subscribers with the same KQI do nothing and maintain their service. Subscribers should be classified by risk factor and subscribers in the high-risk group may be addressed first followed by lower-risk subscriber groups. Geolocation, for example, subscribers in a building and the presence of competitors may be considered as factors. Other subscriber behavior factors including history of complaints, purchases, viewing habits may also be considered; It can be noted that the claim language is written in alternative form. The limitation taught by Korte et al. is based on a duration of using the bundle of services. See provisional application No. 62/830,072, Paragraph 0127).
Regarding claim 29 (New), which is dependent of claim 22, the combination of Korte et al. and Yu discloses all the limitations in claim 22. Korte et al. further discloses wherein the RNN predicts subscription change events from timestamped activity associated with a client account (Table 3, service account number; Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0144, Subscriber metadata may include, but is not limited to Profile creation date, Location and sub_location, Last authorization date, Subscription plan (price, included channels, options, etc.), history of additional purchases (VOD, upgrades, etc.), Presence of Internet service in addition to TV service, Account type (residential or business), and Network type (FTTB, DSL, etc.) Table 3 provides example telemetry data; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; See provisional application No. 
62/830,072, Paragraph 0124-0126, 0130, & Table 3, service account number) and the [machine learning] compute probability density functions and a weighted average thereof to determine the time-to-event (Paragraph 0081, In data mining, the path to a solution is nonlinear. The process includes iteratively exploring, building and tuning many models. The process typically starts with feature extraction from the source data based on the domain knowledge and ends with an evaluation of the model. During the training stage in machine learning, the model's weights are updated based on input and ground truth data; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; see provisional application No. 62/830,072, Paragraphs 0078 & 0130).
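Claim 29's "weighted average thereof to determine the time-to-event" limitation can be read as taking the expectation of the mixture: with mixture weights w_k and per-distribution means μ_k, the expected time-to-event is Σ w_k μ_k. A small numeric sketch under that reading (the weights and means are hypothetical, not from either reference):

```python
import numpy as np

weights = np.array([0.6, 0.3, 0.1])    # mixture weights; constrained to sum to 1
means = np.array([20.0, 60.0, 180.0])  # per-distribution mean days-to-event

# Weighted average of the component PDF means:
# 0.6*20 + 0.3*60 + 0.1*180 = 48 days
time_to_event = float(weights @ means)
```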
Although Korte et al. discloses a machine learning model configured to produce a probability density function using one or more distributions with weights with a predetermined constraint (see Paragraphs 0081 & 0148), Korte et al. does not specifically disclose wherein each change-event type includes an FCNN layer group.
However, Yu discloses wherein the RNN predicts subscription change events from timestamped activity associated with a client … and the FCNN layer groups compute probability density functions and a weighted average thereof to determine the time-to-event (see Figure 1, A new multi-task learning RNN model; Page 486, II. Related Work; Considering predicting different types of outcomes as tasks, hospital mortality, physiologic decomposition, ICU LOS and phenotype classification can be simultaneously predicted using a multi-task learning model [12]; Page 488, D. Classification With Attention, For classification, an attention mechanism [15] was applied to set different level of importance to every time step in the time series, instead of relying on any single time or considering them all equally. The representation vector r of this time series is formed by a weighted sum of all hidden state vectors h based on a vector of classification attention probability α: see equation 4.1 and 4.2, where ω is a trainable parameter vector, and ωT is the transpose of ω. The final representation used for classification is: see equation 4.3. Two fully-connected layers were connected to these features h∗ to output the probability of mortality for classification; Page 488, E. Multi-Task Learning Model, As shown in Fig. 1, the multi-task learning model combined the sequence reconstruction and classification tasks under the hard parameter sharing scheme [16]. The LSTM encoder was shared between these two tasks. During the model training for classification, a cross-entropy loss function was used to measure the difference between the predicted mortality probability (p) and the corresponding mortality outcome (y) for each sample).
[Yu, Figure 1: a new multi-task learning RNN model (media_image1.png, greyscale)]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning used for predicting, for each change-event type, a probability density function using one or more distributions with weights with a predetermined constraint of the invention of Korte et al. to further specify wherein each change-event type includes an FCNN layer group of the invention of Yu because doing so would allow the method to use a multi-task learning model, with the LSTM feature-learning layer and two fully-connected layers to output a probability (see Yu, Page 489, C. Baseline Models). Further, the claimed invention is merely a simple substitution of one known element for another to obtain predictable results (e.g., substitution of a machine learning model that uses an RNN to learn a sequence for a multi-task learning RNN, because the multi-task learning RNN achieves better prediction performance (see Yu, Page 490, V. Results)).
Regarding claim 30 (New), which is dependent of claim 22, the combination of Korte et al. and Yu discloses all the limitations in claim 22. Korte et al. further discloses wherein a [machine learning] calculates a probability density function of the probability density functions for each type of subscription change event predicted by a neural network stack (Paragraph 0081, In data mining, the path to a solution is nonlinear. The process includes iteratively exploring, building and tuning many models. The process typically starts with feature extraction from the source data based on the domain knowledge and ends with an evaluation of the model. During the training stage in machine learning, the model's weights are updated based on input and ground truth data; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; see provisional application No. 62/830,072, Paragraphs 0078 & 0130).
Although Korte et al. discloses a machine learning model configured to produce a probability density function using one or more distributions with weights with a predetermined constraint (see Paragraphs 0081 & 0148), Korte et al. does not specifically disclose wherein each change-event type includes an FCNN layer group.
However, Yu discloses wherein a fully connected neural network calculates a probability density function of the probability density functions for each type of … change event predicted by a neural network stack (see Figure 1, A new multi-task learning RNN model; Page 486, II. Related Work; Considering predicting different types of outcomes as tasks, hospital mortality, physiologic decomposition, ICU LOS and phenotype classification can be simultaneously predicted using a multi-task learning model [12]; Page 488, D. Classification With Attention, For classification, an attention mechanism [15] was applied to set different level of importance to every time step in the time series, instead of relying on any single time or considering them all equally. The representation vector r of this time series is formed by a weighted sum of all hidden state vectors h based on a vector of classification attention probability α: see equation 4.1 and 4.2, where ω is a trainable parameter vector, and ωT is the transpose of ω. The final representation used for classification is: see equation 4.3. Two fully-connected layers were connected to these features h∗ to output the probability of mortality for classification; Page 488, E. Multi-Task Learning Model, As shown in Fig. 1, the multi-task learning model combined the sequence reconstruction and classification tasks under the hard parameter sharing scheme [16]. The LSTM encoder was shared between these two tasks. During the model training for classification, a cross-entropy loss function was used to measure the difference between the predicted mortality probability (p) and the corresponding mortality outcome (y) for each sample).
[Yu, Figure 1: a new multi-task learning RNN model (media_image1.png, greyscale)]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning used for predicting, for each change-event type, a probability density function using one or more distributions with weights with a predetermined constraint of the invention of Korte et al. to further specify wherein each change-event type includes an FCNN layer group of the invention of Yu because doing so would allow the method to use a multi-task learning model, with the LSTM feature-learning layer and two fully-connected layers to output a probability (see Yu, Page 489, C. Baseline Models). Further, the claimed invention is merely a simple substitution of one known element for another to obtain predictable results (e.g., substitution of a machine learning model that uses an RNN to learn a sequence for a multi-task learning RNN, because the multi-task learning RNN achieves better prediction performance (see Yu, Page 490, V. Results)).
Regarding claim 31 (New), which is dependent of claim 22, the combination of Korte et al. and Yu discloses all the limitations in claim 22. Korte et al. further discloses wherein the time-to-event for each bundle subscription change comprises a normalized risk score (Paragraph 0083, A plurality of data sources may be used as input to a prediction model. Historical data (Data Source 1 . . . N) may be used as an initial training data set. After preprocessing 1401, cleaned training data 1402 may be used to train 1403 the learning Prediction Model. After learning is completed, new data goes to the model input. The Model makes a decision as to “At Risk” or “Not at Risk”. Later, new data is augmented by actual subscriber behavior (whether he/she cancelled the service within certain period, i.e. one month, or not). Then this data may be used as training data to further refine the model. This process makes constant adjustments (updates) to the model improving prediction accuracy. Dynamic updates allow the model to automatically adapt to changing environmental conditions like changing subscriber taste, tolerance to issues, equipment changes, appearance of competitors, etc; See provisional application No. 62/830,072, Paragraph 0143).
Regarding claim 32 (New), which depends from claim 22, the combination of Korte et al. and Yu discloses all the limitations in claim 22. Korte et al. further discloses wherein accounts associated with the account identifiers are grouped according to a number of shared static features (Paragraph 0116, In an embodiment, KQIs may be calculated using data from subscribers and networks that use cable vs. fiber optic connections. This calculation may show a difference in reliability and quality between these two technologies. It can lead to decision whether to perform a distribution network update; Paragraph 0119, In an embodiment, KQIs and KPIs may be calculated using historical data from subscribers who cancelled the service and who had or did not have access to a competitor ISP. This calculation may shed light on how availability of one or more competing ISPs may change a subscriber's tolerance of service quality. There are countless ways data can be calculated depending on a needed perspective; Paragraph 0144, Subscriber metadata may include, but is not limited to Profile creation date, Location and sub_location, Last authorization date, Subscription plan (price, included channels, options, etc.), history of additional purchases (VOD, upgrades, etc.), Presence of Internet service in addition to TV service, Account type (residential or business), and Network type (FTTB, DSL, etc.) Table 3 provides example telemetry data; See provisional application No. 62/830,072, Paragraph 0107, 0110, & 0126), wherein the one or more change events include an upgrade event, a downgrade event and a termination event (Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; It can be noted that the claim language is written in the alternative. The limitation taught by Korte et al. is based on a "termination event." See provisional application No. 62/830,072, Paragraph 0130).
Regarding claim 33 (New), which depends from claim 22, the combination of Korte et al. and Yu discloses all the limitations in claim 22. Korte et al. further discloses wherein the one or more change events and time-to-event comprises a density approximation for the metrics, wherein the density approximation is formed by determining a weighted average of the probability density functions (Paragraph 0081, In data mining, the path to a solution is nonlinear. The process includes iteratively exploring, building and tuning many models. The process typically starts with feature extraction from the source data based on the domain knowledge and ends with an evaluation of the model. During the training stage in machine learning, the model's weights are updated based on input and ground truth data; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; See provisional application No. 62/830,072, Paragraphs 0078 & 0130).
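For illustration only (this sketch is not part of the record, and the function names are the editor's hypothetical shorthand), the claimed density approximation, i.e., a weighted average of probability density functions whose mixture weights satisfy a predetermined constraint, can be sketched as follows, assuming Gaussian components and a softmax (sum-to-one) constraint on the weights:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def softmax(logits):
    """Map unconstrained values to weights that are positive and sum to 1,
    one common form of 'predetermined constraint' on mixture weights."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

def mixture_density(x, logits, components):
    """Density approximation: a weighted average of the component densities."""
    weights = softmax(logits)
    return sum(w * gaussian_pdf(x, mu, sigma)
               for w, (mu, sigma) in zip(weights, components))

# Toy example: two components; the softmax weights sum to 1 by construction.
logits = [0.2, -0.4]
components = [(4.0, 1.0), (9.0, 2.0)]
d = mixture_density(5.0, logits, components)
```

Under this constraint the approximation remains a valid probability density, since a convex combination of densities integrates to one.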
Regarding claim 34 (New), which depends from claim 32, the combination of Korte et al. and Yu discloses all the limitations in claim 32. Korte et al. further discloses wherein predicting a type and timing of change in bundle subscription for a particular account is based on past activities of that account and past activities of a number of other accounts sharing specified static features (Paragraph 0116, In an embodiment, KQIs may be calculated using data from subscribers and networks that use cable vs. fiber optic connections. This calculation may show a difference in reliability and quality between these two technologies. It can lead to decision whether to perform a distribution network update; Paragraph 0119, In an embodiment, KQIs and KPIs may be calculated using historical data from subscribers who cancelled the service and who had or did not have access to a competitor ISP. This calculation may shed light on how availability of one or more competing ISPs may change a subscriber's tolerance of service quality. There are countless ways data can be calculated depending on a needed perspective; Paragraph 0144, Subscriber metadata may include, but is not limited to Profile creation date, Location and sub_location, Last authorization date, Subscription plan (price, included channels, options, etc.), history of additional purchases (VOD, upgrades, etc.), Presence of Internet service in addition to TV service, Account type (residential or business), and Network type (FTTB, DSL, etc.) Table 3 provides example telemetry data; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; See provisional application No. 62/830,072, Paragraph 0107, 0110, 0126, & 0130).
Claim 41 is rejected under 35 U.S.C. 103 as being unpatentable over Korte et al. (US 2020/0322662 A1), in view of Yu (Yu, R., Zheng, Y., Zhang, R., Jiang, Y. and Poon, C.C., 2019. Using a multi-task recurrent neural network with attention mechanisms to predict hospital mortality of patients. IEEE Journal of Biomedical and Health Informatics, 24(2), pp. 486-492), and further in view of Gray et al. (US 2020/0267580 A1).
Regarding claim 41 (New), Korte et al. discloses a [system] storing instructions which, when executed by one or more processors, cause the one or more processors to (Paragraph 0003, method and system may input telemetry into a data structure. The telemetry may comprise a plurality of measurement results or other data collected from remote network nodes. Subscriber use metrics may be determined from the telemetry. From the use metrics and habits of the subscriber, user characteristics may be determined. A likelihood of service cancellation, based on the user characteristics, may be estimated in an embodiment; See provisional application No. 62/830,072, Paragraph 0002):
identify one or more change events that correspond to one or more bundles of services for one or more account identifiers (Paragraph 0066, Data mining applications are good candidates for prediction tasks. Trying to determine future outcomes based on estimations from historic data can be as simple as guessing the classification of the next inputs. One of the practical reasons to exercise Data Mining techniques might be to identify customers who are not presently enjoying their service and to predict the possibility of them cancelling their subscription; Paragraph 0067, Data Mining may be defined as the process of discovering patterns in data, either automatically or semi-automatically. This process is supported by tools and practical techniques, also known as Machine Learning, which are used to identify the underlying structure of the data. This structure is then represented in a comprehensible manner for use in Data Mining applications. Pattern descriptions are considered the output of a learning process; Paragraph 0083, The Model makes a decision as to “At Risk” or “Not at Risk”. Later, new data is augmented by actual subscriber behavior (whether he/she cancelled the service within certain period, i.e. one month, or not); See provisional application No. 62/830,072, Paragraphs 0065-0066 & 0143);
determine metrics indicative of an amount of activity corresponding to the one or more bundles of services utilized by one or more client devices via one or more connections associated with the one or more account identifiers (Paragraph 0145, KPI and KQI performance values alone may not be enough to predict subscriber behavior. For instance, some subscribers may call the support center or leave the service while other subscribers with the same KQI do nothing and maintain their service. Subscribers should be classified by risk factor and subscribers in the high-risk group may be addressed first followed by lower-risk subscriber groups. Geolocation, for example, subscribers in a building and the presence of competitors may be considered as factors. Other subscriber behavior factors including history of complaints, purchases, viewing habits may also be considered; See provisional application No. 62/830,072, Paragraph 0127);
select an artificial intelligence model comprising: a recurrent neural-network (RNN) layer stack configured to output, for each change-event type, one or more distribution parameters from a sequence of timestamps of static and dynamic features (Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; As stated in Paragraph 0062 of Applicant’s specification, the categories of bundle events are upgrade, downgrade, and termination. Therefore, based on Applicant’s definition, Korte et al. discloses a change-event type since it can predict a termination/cancelation of the service. See provisional application No. 62/830,072, Paragraph 0124-0125, & 0130);
and for each change-event type, [machine learning] configured to produce a probability density function using one or more distributions with weights with a predetermined constraint (Paragraph 0081, In data mining, the path to a solution is nonlinear. The process includes iteratively exploring, building and tuning many models. The process typically starts with feature extraction from the source data based on the domain knowledge and ends with an evaluation of the model. During the training stage in machine learning, the model's weights are updated based on input and ground truth data; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; See provisional application No. 62/830,072, Paragraphs 0078 & 0130);
determine, using the one or more change events and the metrics input into the artificial intelligence model, one or more subscription change events and a time-to-event for each change to the one or more bundles of services according to the probability density functions (Paragraph 0142, Data from many (thousands) of users, both which churned and stayed, covering different locations, payment plans, and account age should be used for the ML model training. The more data which is used for training purposes, the more accurate prediction will be; Paragraph 0143, Neural networks may be employed to handle time components. Some neural network types may include Recurrent Neural Networks (RNNs) and Convolution Neural Networks (CCNs). An implementation may be performed using tensorflow/keras; Paragraph 0148, Once the telemetry data is collected, the data may be processed using an AI predictive model 2130. In one embodiment, the AI predictive model 2130 may use a gradient boosting machines 2131 method for making a composition of decision trees. In another embodiment, neural networks 2132 may be used for handling time components. Results 2140 may indicate a probability of service cancellation 2141 and a probability of a particular customer calling the support center 2142; See provisional application No. 62/830,072, Paragraph 0124-0125, & 0130);
and generate, via a network, data to cause a server to update, for the one or more account identifiers, a database entry associated with a service of the one or more bundles of services, the update stored in persistent storage prior to or at the time-to-event (Paragraph 0083, A plurality of data sources may be used as input to a prediction model. Historical data (Data Source 1 . . . N) may be used as an initial training data set. After preprocessing 1401, cleaned training data 1402 may be used to train 1403 the learning Prediction Model. After learning is completed, new data goes to the model input. The Model makes a decision as to “At Risk” or “Not at Risk”. Later, new data is augmented by actual subscriber behavior (whether he/she cancelled the service within certain period, i.e. one month, or not). Then this data may be used as training data to further refine the model. This process makes constant adjustments (updates) to the model improving prediction accuracy. Dynamic updates allow the model to automatically adapt to changing environmental conditions like changing subscriber taste, tolerance to issues, equipment changes, appearance of competitors, etc.; See provisional application No. 62/830,072, Paragraph 0143).
Although Korte et al. discloses machine learning configured to produce a probability density function using one or more distributions with weights with a predetermined constraint (see Paragraphs 0081 & 0148), Korte et al. does not specifically disclose wherein each event type includes an FCNN layer.
However, Yu discloses for each change-event type, a fully-connected neural-network (FCNN) layer group configured to produce a probability density function using one or more distributions with weights with a predetermined constraint (see Figure 1, A new multi-task learning RNN model; Page 486, II. Related Work; Considering predicting different types of outcomes as tasks, hospital mortality, physiologic decomposition, ICU LOS and phenotype classification can be simultaneously predicted using a multi-task learning model [12]; Page 488, D. Classification With Attention, For classification, an attention mechanism [15] was applied to set different level of importance to every time step in the time series, instead of relying on any single time or considering them all equally. The representation vector r of this time series is formed by a weighted sum of all hidden state vectors h based on a vector of classification attention probability α: see equation 4.1 and 4.2, where ω is a trainable parameter vector, and ωT is the transpose of ω. The final representation used for classification is: see equation 4.3. 4.3. Two fully-connected layers were connected to these features h∗ to output the probability of mortality for classification; Page 488, E. Multi-Task Learning Model, As shown in Fig. 1, the multi-task learning model combined the sequence reconstruction and classification tasks under the hard parameter sharing scheme [16]. The LSTM encoder was shared between these two tasks. During the model training for classification, a cross-entropy loss function was used to measure the difference between the predicted mortality probability (p) and the corresponding mortality outcome (y) for each sample).
[Yu, Figure 1 (media_image1.png, greyscale): the multi-task learning RNN model]
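For illustration only (this sketch is not part of the record; the helper names and vectors are hypothetical toy values), the attention pooling described by Yu, in which each hidden state h is scored by a trainable vector ω, the scores are normalized into attention probabilities α, and the representation r is the α-weighted sum of the hidden states (cf. Yu, equations 4.1-4.3), can be sketched as:

```python
import math

def softmax(scores):
    """Normalize scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attention_pool(hidden_states, omega):
    """Score each time step with the trainable vector omega, convert the
    scores to attention probabilities alpha, and return the weighted sum
    of hidden states (the representation used for classification)."""
    scores = [sum(wi * hi for wi, hi in zip(omega, h)) for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    r = [sum(a * h[j] for a, h in zip(alphas, hidden_states)) for j in range(dim)]
    return r, alphas

# Toy sequence of three hidden states of dimension 2:
h = [[0.1, 0.3], [0.5, -0.2], [0.2, 0.4]]
r, alphas = attention_pool(h, omega=[1.0, 0.5])
```

Because the α values form a probability distribution over time steps, r is a convex combination of the hidden states rather than a reliance on any single time step.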
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the machine learning used for predicting, for each change-event type, a probability density function using one or more distributions with weights with a predetermined constraint of the invention of Korte et al. to further specify wherein each event type includes an FCNN layer, as taught by Yu, because doing so would allow the method to use a multi-task learning model, with the LSTM feature learning layer and two fully-connected layers, to output a probability (see Yu, Page 489, C. Baseline Models). Further, the claimed invention is merely a simple substitution of one known element for another to obtain predictable results (e.g., substitution of machine learning that uses an RNN to learn a sequence for a multi-task learning RNN, because the multi-task learning RNN achieves better prediction performance (see Yu, Page 490, V. Results)).
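For illustration only (this sketch is not part of the record), the architecture arrived at by the proposed combination, a shared recurrent encoder over a feature sequence feeding one fully-connected head per change-event type, with each head emitting softmax-constrained mixture weights and distribution parameters, can be sketched as follows. The class names and dimensions are hypothetical, and a plain Elman-style RNN stands in for the LSTM of Yu:

```python
import math
import random

random.seed(0)  # deterministic toy weights

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class SimpleRNN:
    """Shared recurrent encoder over a sequence of feature vectors."""
    def __init__(self, in_dim, hid_dim):
        self.Wx = rand_matrix(hid_dim, in_dim)
        self.Wh = rand_matrix(hid_dim, hid_dim)
        self.hid_dim = hid_dim

    def encode(self, sequence):
        h = [0.0] * self.hid_dim
        for x in sequence:  # one step per timestamped feature vector
            h = tanh_vec(vadd(matvec(self.Wx, x), matvec(self.Wh, h)))
        return h

class EventHead:
    """Per-event-type fully connected head: maps the shared hidden state to
    softmax-constrained mixture weights and component distribution parameters."""
    def __init__(self, hid_dim, n_components):
        self.W_weights = rand_matrix(n_components, hid_dim)
        self.W_params = rand_matrix(n_components, hid_dim)

    def forward(self, h):
        weights = softmax(matvec(self.W_weights, h))  # constrained: sum to 1
        params = matvec(self.W_params, h)             # e.g., component means
        return weights, params

encoder = SimpleRNN(in_dim=3, hid_dim=4)
heads = {etype: EventHead(4, 2)
         for etype in ("upgrade", "downgrade", "termination")}
seq = [[0.1, 0.0, 1.0], [0.2, 1.0, 0.0]]  # toy static/dynamic feature sequence
shared = encoder.encode(seq)
outputs = {etype: head.forward(shared) for etype, head in heads.items()}
```

The hard-parameter-sharing scheme described by Yu corresponds here to the single encoder reused by every head, while each head remains free to specialize per event type.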
Although the combination of Korte et al. and Yu discloses a system, the combination of Korte et al. and Yu does not specifically disclose a non-transitory computer readable media.
However, Gray et al. discloses a non-transitory computer readable media storing instructions which, when executed by one or more processors, cause the one or more processors to (Paragraph 0119, Computer program products containing mechanisms to effectuate the systems and methods in accordance with the presently described technology may reside in the data storage devices 1304 and/or the memory devices 1306, which may be referred to as machine-readable media. It will be appreciated that machine-readable media may include any tangible non-transitory medium that is capable of storing or encoding instructions to perform any one or more of the operations of the present disclosure for execution by a machine or that is capable of storing or encoding data structures and/or modules utilized by or associated with such instructions).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the machine learning used for predicting, for each change-event type, a probability density function using one or more distributions with weights with a predetermined constraint of the invention of Korte et al. to further specify that the instructions are stored on a non-transitory computer readable media, as taught by Gray et al., because the claimed invention is merely a combination of old elements, in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 22-41 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3, 5-8, 10, 12-15, 17, and 19-21 of U.S. Patent No. 11,966,927 B2. The claims presented here are substantially the same: claims 1, 8, and 15 of the '927 patent also recite subscription data corresponding to the one or more bundles of services, analyzing a sequence of the subscription data using an RNN, and determining one or more subscription change events and a time-to-event for each change to the one or more bundles of services according to the probability density functions. The claims of the '927 patent are narrower in that they recite the structure of the multimodal multi-task learning (e.g., comprising at least one recurrent neural network (RNN) including three layers and a plurality of fully connected neural networks (FCNNs) each associated with a plurality of bundle subscription change events). The differences in claim language are made obvious in light of the prior art rejections under 35 U.S.C. 103 above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Keng et al. (US 2021/0125073 A1) – discloses predicting a time to the future event using a distribution function that is determined using a recurrent neural network, the distribution function including a learned density with peaks that approximate the times of the historical events in the historical data (see at least Abstract & Para 0046).
Li (Li, Y., Wang, J., Ye, J. and Reddy, C.K., 2016, August. A multi-task learning formulation for survival analysis. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1715-1724)) – discloses an "MTLSA" model, which stands for "Multi-Task Learning model for Survival Analysis". We formulate the original survival time prediction problem into a multi-task learning problem. The primary motivation of using multi-task learning is because of its ability to learn a shared representation across related tasks and reduce the prediction error of each task. Thus, the model can provide a more accurate estimation of whether an event occurs or not at the beginning of each time interval which will thus provide an accurate estimation of the survival time for each instance. Another advantage of using multi-task learning for survival time estimation is because it translates the regression problem into a series of related binary classification problems, and at each time interval the corresponding classifier only focuses on modeling the local problem and hence provides a more accurate estimation than the regression models which aim at modeling the entire problem at once (see at least Page 1716).
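For illustration only (this sketch is not part of the record), the reformulation described by Li, translating a survival-time regression target into a series of related binary classification labels, one per time interval, can be sketched as follows; the label semantics (1 = event-free at the start of the interval, None = censored/unknown) are an assumption for purposes of the sketch:

```python
def survival_to_multitask_labels(event_time, n_intervals, observed=True):
    """Translate one survival-time target into per-interval binary labels:
    1 while the instance has not yet experienced the event, 0 afterwards,
    and None after the censoring time when the outcome is unobserved."""
    labels = []
    for t in range(n_intervals):
        if t < event_time:
            labels.append(1)      # still event-free at interval t
        elif observed:
            labels.append(0)      # event has occurred by interval t
        else:
            labels.append(None)   # censored: outcome unknown
    return labels

# An instance whose event occurs at interval 3, over 5 intervals:
labels = survival_to_multitask_labels(3, 5)        # -> [1, 1, 1, 0, 0]
censored = survival_to_multitask_labels(2, 4, observed=False)
```

Each interval's labels can then be fed to its own binary classifier, which is the multi-task structure Li describes.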
Ruder (Ruder, S., 2017. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098) – discloses Multi-Task Learning in Deep Neural Networks (see at least Introduction).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARJORIE PUJOLS-CRUZ whose telephone number is (571) 272-4668. The examiner can normally be reached Monday-Thursday, 7:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Patricia H Munson can be reached at (571)270-5396. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.P./Examiner, Art Unit 3624 /PATRICIA H MUNSON/Supervisory Patent Examiner, Art Unit 3624