Prosecution Insights
Last updated: April 19, 2026
Application No. 18/228,401

HEURISTIC DATA PIPELINE SCHEDULING

Non-Final OA §103

Filed: Jul 31, 2023
Examiner: CASTANEDA, IVAN ALEXANDER
Art Unit: 2195
Tech Center: 2100 - Computer Architecture & Software
Assignee: Kyndryl Inc.
OA Round: 1 (Non-Final)

Grant Probability: 67% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 9m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 67% (above average; +11.7% vs TC avg), 2 granted / 3 resolved
Interview Lift: +100.0% (strong), among resolved cases with interview
Typical Timeline: 3y 9m average prosecution
Career History: 37 total applications across all art units, 34 currently pending

Statute-Specific Performance

§101: 14.7% (-25.3% vs TC avg)
§103: 52.8% (+12.8% vs TC avg)
§102: 6.9% (-33.1% vs TC avg)
§112: 18.6% (-21.4% vs TC avg)
TC avg = Tech Center average estimate • Based on career data from 3 resolved cases

Office Action

§103
DETAILED ACTION

This Office Action is in response to claims filed on 07/31/2023. Claims 1-20 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 U.S.C. § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6-15, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Gupta et al., Pub. No. US 2020/0279173 A1 (hereinafter Gupta), in view of Foster, II et al., Pub. No. US 2022/0050728 A1 (hereinafter Foster). Gupta was cited in the IDS filed 07/31/2023.

With regard to claim 1, Gupta teaches a computer-implemented method, comprising ([0018], The computer readable instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer implemented process): receiving, by a computing device, a data pipeline job request ([0037], At 202, the data pipeline is submitted to a cognitive rules engine. A data pipeline (i.e., a job) may be transformed and moved to a data lake (e.g., a data repository) for analytics.); generating, by the computing device, a complexity score for the data pipeline job request; determining, by the computing device, a predicted execution time for the data pipeline job request; generating, by the computing device, a priority score for the data pipeline job request ([0043], At 204, the cognitive rules engine determines a priority of a data pipeline (i.e., a job) based on a series of learned metrics.) based on the [learned metrics] predicted execution time and the generated complexity score ([0043], The learned metrics may take into account factors such as the existence of a service level agreement (SLA), specific user information, and server load, among other factors … The data pipeline prioritization program 110a, 110b may thereafter use the machine learning model to determine a priority of a data pipeline (i.e., a job).); and comparing, by the computing device, the priority score for the data pipeline job request to another priority score to determine a schedule for the data pipeline job request ([0055], At 206, the data pipeline is reprioritized based on the determined priority at 204. At prioritization of jobs (i.e., data pipelines) determined by the data pipeline prioritization program 110a, 110b, based on the data analytics performed while the data pipeline was located within the data lake (i.e., the data repository), the machine learning algorithm may determine a priority of scheduling for the data pipeline (i.e., the job). Based on the determined priority, the data pipeline may be executed).
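Read as an algorithm, the claim 1 recitation is a score-then-compare scheduling loop: score each incoming job on complexity and predicted runtime, derive a priority from both, and order jobs by comparing priorities. The Python sketch below is a minimal, hypothetical rendering of that flow; the class, the weighted formula, and the heap-based ordering are illustrative assumptions, not teachings of the application or of Gupta or Foster.

```python
import heapq
from dataclasses import dataclass

@dataclass
class JobRequest:
    name: str
    complexity_score: float      # generated by static/runtime analysis (claims 2-3)
    predicted_runtime_s: float   # output of a trained estimator (claim 7)

def priority_score(complexity: float, predicted_runtime_s: float,
                   w_complexity: float = 0.6, w_runtime: float = 0.4) -> float:
    # Illustrative weighted combination; the claim only requires that the priority
    # be "based on" both inputs, not this particular formula. Here, lower complexity
    # and shorter predicted runtime yield a higher score.
    return 1.0 / (1.0 + w_complexity * complexity + w_runtime * predicted_runtime_s)

def schedule(jobs: list[JobRequest]) -> list[JobRequest]:
    # Compare each job's priority score against the others to fix an execution order.
    heap = [(-priority_score(j.complexity_score, j.predicted_runtime_s), i, j)
            for i, j in enumerate(jobs)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Example: the short, simple job is scheduled ahead of the long, complex one.
order = schedule([JobRequest("nightly_etl", 12.0, 3600.0),
                  JobRequest("ad_hoc_report", 2.0, 120.0)])
```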
However, Gupta does not explicitly teach generating, by the computing device, a complexity score for the data pipeline job request; determining, by the computing device, a predicted execution time for the data pipeline job request; or a priority based on the predicted execution time and the complexity score.

Foster teaches generating, by the computing device, a complexity score for the data pipeline job request ([0032], At operation 204, the workload orchestration system (e.g., complexity determination unit 106 of workload orchestration system 103, etc.) computes complexity scores for respective portions of the received workload. The complexity scores can be based on data complexity of the workload (e.g., data types, size, volume, variety, etc.) and can be indicative of the predicted processing time and/or memory requirements of the respective portions of the workload. In some embodiments, the workload orchestration system (e.g., complexity determination unit 106, etc.) can apply rules, policies, statistics, and/or the like in computing complexity scores for respective portions of the received workload.); determining, by the computing device, a predicted execution time for the data pipeline job request ([0023], In some embodiments, identifying which workload portions may take more processing resources than other portions can be done using machine learning to learn patterns in the workload information data and predict complexity scores indicative of effort (processing time and required memory) necessary for processing respective portions of the workload based on the data complexity.); and a priority score for the data pipeline job request based on the predicted execution time and complexity score ([0037], As illustrated in FIG. 3, dynamic workflow orchestration 300 provides for generating and dispatching workload portion types from heterogeneous workload data streams using policy driven dispatching rules for workload type specific portioning based on data complexity. In some embodiments, a workload type policy may contain rules to describe and control the workload type extraction/transform description and dispatch behaviors. Rules may have access to run time collected traffic information used to contribute operational statistics of the running environment as symbolic names to use in the policy rule scripts or markup.).

It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Foster with the teachings of Gupta in order to provide a method that teaches data pipeline job request scheduling associated with a complexity score and predicted execution time. The motivation for applying Foster's teaching with Gupta's teaching is to provide a method that allows for assignment of different workloads to corresponding compute resources based on their respective complexity scores, thereby enabling performance optimization by matching workload complexity to compute capabilities and efficient allocation of computing resources ([0035], Foster). Gupta and Foster are analogous art directed towards allocation of computing resources. Therefore, it would have been obvious for one of ordinary skill in the art to combine Foster with Gupta to teach the claimed invention in order to provide a complexity metric for associating a workload with scheduling a particular set of computing resources.
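Foster's complexity scoring, as quoted, is driven by data characteristics (types, size, volume, variety) applied through rules or policies. A hypothetical rule-based scorer in that spirit might look like the following; the specific features and weights are assumptions for illustration only, not Foster's implementation.

```python
import math

def data_complexity_score(size_bytes: int, n_columns: int,
                          n_distinct_types: int, has_nested_data: bool) -> float:
    # Hypothetical rule-based scoring: larger, wider, more varied data is assumed
    # to predict longer processing time and higher memory requirements.
    score = math.log10(max(size_bytes, 1))       # volume
    score += 0.1 * n_columns                     # variety (width)
    score += 0.5 * n_distinct_types              # variety (data types)
    score += 2.0 if has_nested_data else 0.0     # structural complexity
    return score

# Example: a wide, nested 10 GB dataset scores higher than a small flat one.
big = data_complexity_score(10 * 1024**3, n_columns=120, n_distinct_types=6, has_nested_data=True)
small = data_complexity_score(50 * 1024**2, n_columns=8, n_distinct_types=2, has_nested_data=False)
```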
With regard to claim 2, Gupta teaches wherein the [final priority score] complexity score of the data pipeline job request includes a static analysis score ([0049], The cognitive rule engine may compute the priority of a data pipeline (i.e., a job) by aggregating the above factors, among other factors affecting the priority of a data pipeline (i.e., a job), to determine a data pipeline's (i.e., a job's) final priority score.). However, Gupta does not explicitly teach that the complexity score includes a runtime analysis score. Foster teaches the complexity score (Abstract, using an orchestration engine to assign the portions of the workload to corresponding compute resources, based on their respective complexity score) of the data pipeline job request includes … a runtime analysis score ([0022], The complexity determination unit 106 can retrieve information from the database 110 (e.g., policies, rules, run time collected traffic information, operational statistics of the run time environment, etc.) for use in computing complexity scores). It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Foster with the teachings of Gupta in order to provide a method that teaches a complexity score including a runtime analysis score. The motivation for applying Foster's teaching with Gupta's teaching is to provide a method that allows for integration of dynamic and static metrics that inform the complexity of a data pipeline job, and by extension its scheduling priority, thereby improving resource utilization by accounting for current system resources and predicted job impact. Gupta and Foster are analogous art directed towards allocation of computing resources ([0026], Foster). Therefore, it would have been obvious for one of ordinary skill in the art to combine Foster with Gupta to teach the claimed invention in order to provide runtime analysis when calculating job complexity and its scoring in association with priority scheduling.

With regard to claim 3, Gupta teaches wherein the static analysis score includes a count of significant patterns and weighting of a subset of the significant patterns of the data pipeline job request ([0049], Prior to determining a final priority score, each of the factors may be multiplied by a weight corresponding to the importance of the factor.).

With regard to claim 6, Gupta teaches wherein the significant patterns of the data pipeline job include at least one of: a shuffling or sorting of data in the data pipeline job, a de-duplication of the data, a regular expression search of text in the data, input-output and/or read-write of the data, aggregation of the data, and a serverless function on the data ([0069], For example, if a data pipeline (i.e., a job) follows the extract, load, and transform (ELT) data processing approach, this may mean that data is first extracted from the pipeline, loaded into a target database, and transformed and integrated into a desired format (Examiner notes: Such that the significant patterns of the ELT pipeline include input-output operations of data and aggregation of data into a data lake).).
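Claims 3 and 6, as mapped above, describe a static analysis score built from a count of significant patterns (shuffle/sort, de-duplication, regex search, I/O, aggregation, serverless functions) with weights applied to a subset of them. The sketch below is one hypothetical way to compute such a score over a pipeline's source text; the regexes and weight values are illustrative assumptions, not from the application.

```python
import re

# Illustrative weights; claim 3 only requires counting significant patterns
# and weighting a subset of them, not these particular values.
PATTERN_WEIGHTS = {
    "shuffle_or_sort": 3.0,
    "deduplication": 2.0,
    "regex_search": 2.5,
    "io_read_write": 1.0,
    "aggregation": 1.5,
    "serverless_function": 1.0,
}

# Hypothetical regexes over a pipeline definition (e.g., a SQL or Spark script).
PATTERN_REGEXES = {
    "shuffle_or_sort": re.compile(r"\b(ORDER BY|sortBy|repartition)\b", re.I),
    "deduplication": re.compile(r"\b(DISTINCT|dropDuplicates)\b", re.I),
    "regex_search": re.compile(r"\b(REGEXP|rlike)\b", re.I),
    "io_read_write": re.compile(r"\b(read|write|load|save)\b", re.I),
    "aggregation": re.compile(r"\b(GROUP BY|agg|reduceByKey)\b", re.I),
    "serverless_function": re.compile(r"\b(invoke_lambda|cloud_function)\b", re.I),
}

def static_analysis_score(pipeline_source: str) -> float:
    # Count each significant pattern in the pipeline definition, then apply weights.
    score = 0.0
    for name, regex in PATTERN_REGEXES.items():
        score += PATTERN_WEIGHTS[name] * len(regex.findall(pipeline_source))
    return score

# Example: one GROUP BY, one ORDER BY, and two writes score 1.5 + 3.0 + 2.0 = 6.5.
score = static_analysis_score("SELECT k, count(*) FROM t GROUP BY k ORDER BY k; write out; write log")
```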
With regard to claim 7, Gupta teaches wherein the predicted execution time is determined by utilizing a machine learning model, the machine learning model is trained with pipeline information of historical data pipeline jobs ([0068], Prior to scheduling a data pipeline (i.e., a job), the data pipeline prioritization program 110a, 110b may check whether any required computational resources are available. The data pipeline prioritization program 110a, 110b may determine a current priority based on an assigned historical priority. Historical usage data may be determined based on information stored within, or accessed by, the data pipeline prioritization program 110a, 110b).

With regard to claim 8, Gupta teaches wherein the pipeline information comprises information of a data pipeline of the historical data pipeline jobs including at least one of ([0068], Prior to scheduling a data pipeline (i.e., a job), the data pipeline prioritization program 110a, 110b may check whether any required computational resources are available. The data pipeline prioritization program 110a, 110b may determine a current priority based on an assigned historical priority. Historical usage data may be determined based on information stored within, or accessed by, the data pipeline prioritization program 110a, 110b): a pipeline name of a data pipeline, a timestamp of the data pipeline, a type of the data pipeline, a resource requirement needed to execute the data pipeline, a time to live, an initial timestamp of a data pipeline job, a predicted execution time of the data pipeline, and the complexity score of the data pipeline ([0069], For example, if a data pipeline (i.e., a job) follows the extract, load, and transform (ELT) data processing approach (Examiner notes: A historical type of data pipeline), this may mean that data is first extracted from the pipeline, loaded into a target database, and transformed and integrated into a desired format. The transformation processing in an ELT pipeline may take place within the target database, requiring little to no additional resources. In this case, the resource consumption analysis may reveal that since fewer resources are required for the handling of the ELT pipeline, more pipelines can be scheduled simultaneously with the ELT pipeline. That is to say that the ELT pipeline may be given a higher priority as it may not be competing for resources with other pipelines (Examiner notes: Wherein historical processing using the ELT pipeline informs the priority of present job requests)).

With regard to claim 9, Gupta teaches wherein the priority score is further based on a resource requirement needed to execute the data pipeline of the data pipeline job request ([0035], The present embodiment may automatically assign priorities to incoming data pipelines based on who is expected to consume the results of the data pipeline, the overall resources needed to run the job, and/or when the job needs to be triggered based on a service level agreement (SLA) for the job).

With regard to claim 10, Gupta teaches wherein the priority score is further based on a service level agreement (SLA) time for the data pipeline job request to prevent an SLA breach ([0035], The present embodiment may automatically assign priorities to incoming data pipelines based on who is expected to consume the results of the data pipeline, the overall resources needed to run the job, and/or when the job needs to be triggered based on a service level agreement (SLA) for the job).
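Claims 7-10 tie the predicted execution time to a machine learning model trained on historical pipeline information, and fold the resource requirement and SLA timing into the priority. Below is a hypothetical scikit-learn sketch of that training-and-prediction step; the toy data, feature set, and SLA adjustment are assumptions for illustration, not the application's actual model.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical history mirroring a few of the claim 8 "pipeline information" fields.
history = pd.DataFrame({
    "pipeline_type":     ["ELT", "ETL", "ELT", "ETL"],
    "resource_req_cpus": [2, 8, 2, 16],
    "complexity_score":  [4.5, 12.0, 5.0, 20.0],
    "execution_time_s":  [120, 900, 150, 2400],   # observed target
})

features = pd.get_dummies(history.drop(columns="execution_time_s"))
model = GradientBoostingRegressor().fit(features, history["execution_time_s"])

def predicted_execution_time(pipeline_type: str, cpus: int, complexity: float) -> float:
    # Encode a new job with the same columns as the training frame, then predict.
    row = pd.get_dummies(pd.DataFrame([{"pipeline_type": pipeline_type,
                                        "resource_req_cpus": cpus,
                                        "complexity_score": complexity}]))
    row = row.reindex(columns=features.columns, fill_value=0)
    return float(model.predict(row)[0])

def sla_adjusted_priority(base_priority: float, seconds_to_sla_deadline: float,
                          predicted_runtime_s: float) -> float:
    # Claim 10 flavor: boost priority as the predicted runtime approaches the SLA deadline.
    slack = seconds_to_sla_deadline - predicted_runtime_s
    return base_priority + (100.0 if slack <= 0 else 10.0 / slack)
```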
With regard to claim 11, Gupta teaches providing the data pipeline job request to a reinforcement learning model for the schedule ([0040], A machine learning algorithm (e.g., a neural network, among other machine learning algorithms) may be used by the cognitive rules engine to determine the priority of a data pipeline (i.e., a job), among various competing data pipelines, without using explicit instructions, and instead relying on models and inference to make a determination).

With regard to claim 12, Gupta teaches wherein the computing device includes software providing a cloud-based service ([0034], As will be discussed with reference to FIG. 4, server computer 112 may include internal components 902a and external components 904a, respectively, and client computer 102 may include internal components 902b and external components 904b, respectively. Server computer 112 may also operate in a cloud computing service model, such as Software as a Service (SaaS), Platform as a Service (PaaS), or Infrastructure as a Service (IaaS). Server 112 may also be located in a cloud computing deployment model, such as private cloud, community cloud, public cloud, or hybrid cloud).

With regard to claim 13, Gupta teaches a computer program product comprising one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to ([0012], The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention): … wherein the predicted execution time is determined by utilizing a machine learning model trained with pipeline information of historical data pipeline jobs ([0021], The machine learning algorithm may be trained on previous job runs, an analysis of dependencies on other pipelines, resource consumption data, and historical priorities). Claim 13 is a computer program product having similar limitations to claim 1. Thus, claim 13 is rejected for the same rationale as applied to claim 1.

With regard to claim 14, it is a computer program product having similar limitations to claim 2. Thus, claim 14 is rejected for the same rationale as applied to claim 2.

With regard to claim 15, it is a computer program product having similar limitations to claim 3. Thus, claim 15 is rejected for the same rationale as applied to claim 3.

With regard to claim 17, Gupta teaches a system comprising ([0012], The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration): a processor, a computer readable memory, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to ([0033], Referring to FIG. 1, an exemplary networked computer environment 100 in accordance with one embodiment is depicted. The networked computer environment 100 may include a computer 102 with a processor 104 and a data storage device 106 (Examiner notes: computer readable memory) that is enabled to run a software program 108 and a data pipeline prioritization program 110a). Claim 17 is a computer system having similar limitations to claim 1.
Thus, claim 17 is rejected for the same rationale as applied to claim 1.

With regard to claim 18, it is a system having similar limitations to claim 2. Thus, claim 18 is rejected for the same rationale as applied to claim 2.

With regard to claim 19, it is a system having similar limitations to claim 3. Thus, claim 19 is rejected for the same rationale as applied to claim 3.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Gupta in view of Foster as applied to claim 1 above, and further in view of Qiu et al., Pub. No. US 2020/0334616 A1 (hereinafter Qiu).

With regard to claim 4, Qiu teaches wherein the complexity score of the data pipeline job request is based on a dimensionality reduction technique ([0011], In some examples, preprocessing the live data comprises applying one or more natural dimensionality reduction techniques to the live data). It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Qiu with the teachings of Gupta and Foster in order to provide a method that teaches generation of complexity score values based on dimensionality reduction techniques. The motivation for applying Qiu's teaching with the Gupta and Foster combination is to provide a method that enables feature engineering allowing selection of the features to be consumed by a prediction model for training and generating predictions, wherein operating on a reduced feature set improves the model's computational efficiency ([0030], Qiu). Gupta, Foster, and Qiu are analogous art directed towards machine learning. Therefore, it would have been obvious for one of ordinary skill in the art to combine Qiu with Gupta and Foster to teach the claimed invention in order to provide a predictive machine learning model that uses dimensionality reduction techniques.

Claims 5, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gupta in view of Foster as applied to claim 2 above, and further in view of Zhu et al., Pub. No. US 2023/0376800 A1 (hereinafter Zhu).

With regard to claim 5, Zhu teaches wherein the runtime analysis score is calculated at an end of an execution of the data pipeline job request ([0054], As previously indicated, prediction of runtime distributions may be based on understanding and predicting variations in runtimes over repeated runs of jobs. Repeated job runs may be assembled into job groups. Runtime variation may refer to recurring jobs (e.g., a sample size greater than one job run).), and the runtime analysis score includes a count of dominant data pipeline execution profiles ([0082], Clusterer 130 is configured to characterize (e.g., group or cluster) the historic runtime distributions derived by featurizer 128. Clusterer 130 may output, for example, one or more sets of runtime distribution classes, such as a set of runtime distributions for ratio normalization) and actual spent resources ([0055], Historical job info 122 may indicate sources of runtime variation that may be useful to predict sources of runtime variation in proposed jobs. Runtimes of job instances within each job group may vary, for example, due to one or more of the following: intrinsic characteristics, resource allocation, physical cluster environment, etc.; [0075], Resource allocation features may (e.g., also) be extracted for historic job instances of the same job group.
Resource allocation features may include, for example, resource utilization (e.g., min, max, and average token usage) and/or historic statistics (e.g., historic average and standard deviation). A historic average may be a variable for spare tokens (Examiner notes: Wherein a unit of resource allocation is referred to as a token, see [0057]).). It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Zhu with the teachings of Gupta in view of Foster in order to provide a method that teaches a runtime analysis score comprising a count of dominant data pipeline profiles and associated spent resources. The motivation for applying Zhu's teaching with Gupta in view of Foster is to provide a method that allows for the capability to identify sources of runtime variation and a capability to adjust proposed computing jobs and provide resources for sources of runtime variation ([0020], Zhu). Gupta, Foster, and Zhu are analogous art directed towards machine learning. Therefore, it would have been obvious for one of ordinary skill in the art to combine Zhu with Gupta in view of Foster to teach the claimed invention in order to provide data pipeline execution profile counts and spent-resource analysis to classify and identify data pipeline job requests.

With regard to claim 16, it is a computer program product having similar limitations to claim 5. Thus, claim 16 is rejected for the same rationale as applied to claim 5.

With regard to claim 20, it is a system having similar limitations to claim 5. Thus, claim 20 is rejected for the same rationale as applied to claim 5.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to IVAN A CASTANEDA whose telephone number is (571) 272-0465. The examiner can normally be reached Monday-Friday 9:30AM-5:30PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Aimee Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/I.A.C./
Examiner, Art Unit 2195

/Aimee Li/
Supervisory Patent Examiner, Art Unit 2195
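For the claim 5 limitation addressed above (a runtime analysis score computed at the end of execution from a count of dominant execution profiles plus the resources actually spent), one hypothetical post-execution computation is sketched below; the profile labels, dominance threshold, and weights are illustrative assumptions rather than anything taught by the application or Zhu.

```python
from collections import Counter

def runtime_analysis_score(stage_profiles: list[str], cpu_seconds: float,
                           peak_mem_gb: float, dominance_threshold: float = 0.2) -> float:
    # Computed after the job finishes: count the execution profiles that dominated
    # the run, then fold in the resources actually spent.
    counts = Counter(stage_profiles)                   # e.g. "scan", "shuffle", "agg"
    total = max(sum(counts.values()), 1)
    dominant = [p for p, c in counts.items() if c / total >= dominance_threshold]
    # Illustrative combination; the claim only requires both components be reflected.
    return len(dominant) + 0.01 * cpu_seconds + 0.5 * peak_mem_gb

# Example: fed back into the next complexity/priority calculation for this pipeline.
score = runtime_analysis_score(["scan", "shuffle", "shuffle", "agg", "shuffle"],
                               cpu_seconds=340.0, peak_mem_gb=12.5)
```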

Prosecution Timeline

Jul 31, 2023 - Application Filed
Jan 09, 2026 - Non-Final Rejection (§103)
Mar 30, 2026 - Applicant Interview (Telephonic)
Mar 30, 2026 - Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585483
MANAGING DEPLOYMENT AND MIGRATION OF VIRTUAL COMPUTING INSTANCES
Granted Mar 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on the 1 most recent grant.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 67%
With Interview: 99% (+100.0%)
Median Time to Grant: 3y 9m
PTA Risk: Low

Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
