DETAILED ACTION
This office action is in response to the communication filed on September 10, 2025. Claims 1, 3-7, 10-12, 15-17, and 19-20 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed on September 10, 2025 have been fully considered but they are not persuasive for the following reasons:
On pages 7-10 of the Remarks, Applicant argues that the amended independent claims 1, 12, and 20 recite additional elements and that those elements integrate the abstract idea into a practical application because the claims improve parallel processing technology.
Examiner respectfully disagrees.
It is important to note that the judicial exception alone cannot provide the improvement; the improvement must be provided by one or more additional elements (MPEP 2106.05(a)).
Independent claims 1, 12, and 20 cover several steps, such as the determining, inserting, and repeating steps, that recite an abstract idea within the “Mental Processes” grouping of abstract ideas, because a person can perform the limitations recited in those steps mentally or using a pen and paper, as discussed in detail in the current 101 rejection below.
The claims do not provide any limitations directed to a specific improvement in computer technology, because the steps the applicant argues are directed to such an improvement are all recited in the claims as limitations that have been identified as abstract ideas.
The remaining steps in the claims that are identified as reciting additional elements, such as the step of executing the query, merely add insignificant extra-solution activity to the judicial exception and are recognized as well-understood, routine, and conventional activity within the field of computer functions, which is not sufficient to amount to significantly more than the judicial exception and is not directed to any specific improvement in computer technology.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Applicant in Pages 10-14 of the Remarks argues that Molka and Kauvlya do not teach or even suggest the features “determining whether an operation within a sequence of operations comprising a query pipeline for executing a query is a workload skewing operation, wherein a size of an output dataset of the workload skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the workload skewing operation and wherein the query pipeline comprises one or more first operations following a first scheduling operation that is performed to determine a first task size for the one or more first operations based on a quantity of time required to perform the one or more first operations on a first portion of the output dataset” and “in response to determining that an operation is a workload skewing operation, inserting, in the query pipeline, a reparallelization point subsequent to the workload skewing operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation in the query pipeline based on a second quantity of time required to perform the one or more second operations on a portion of the output dataset smaller that the first portion of the output dataset”, as recited in amended independent claim 1 and similarly in amended independent claims 12 and 20.
Examiner respectfully disagrees. The cited prior art alone and/or in combination discloses the argued features.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
Molka in [0057] and [0066] discloses processing heavily memory-intensive query workloads in a parallel fashion; a query planner analyzes the query structure and creates a job execution plan, which is forwarded to buffers; query plan parallelism is processed by one or several worker threads that perform tasks in parallel; and workload placement includes parameterization and identifying parameters for optimization.
Molka in [0070] and [0071] discloses query plan parallelism processed by one or more worker threads, the use of buffers for query processing and scheduling, and a query job split into several tasks that are processed by worker threads in parallel.
Molka in [0079] and [0087] discloses estimating the performance of a query workload operation; based on execution-time activity, the execution process of a query is divided into a plurality of processing phases.
Molka in [0094] and [0142] discloses query workloads executed in sequential order, workloads based on differently sized datasets, and the times in the sequential execution order of the query sets used for the workload.
Molka does not explicitly disclose determining a skewing operation, wherein a size of an output dataset of the skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the skewing operation, or task sizing based on a quantity of time required, but the Kauvlya reference discloses these features.
Therefore, Molka discloses determining whether an operation within a sequence of operations comprising a query pipeline for executing a query is a workload operation and wherein the query pipeline comprises one or more first operations following a first scheduling operation that is performed to determine a first task size for the one or more first operations to perform the one or more first operations on a first portion of the output dataset.
Molka in [0057] and [0066] discloses processing heavily memory-intensive query workloads in a parallel fashion; a query planner analyzes the query structure and creates a job execution plan, which is forwarded to buffers; query plan parallelism is processed by one or several worker threads that perform tasks in parallel; and workload placement includes parameterization and identifying parameters for optimization.
Molka in [0079] and [0087] discloses estimating the performance of a query workload operation; based on execution-time activity, the execution process of a query is divided into a plurality of processing phases.
Molka in [0094] and [0142] discloses query workloads executed in sequential order and workloads based on differently sized datasets.
Molka does not explicitly disclose a skewing operation whose input dataset and output dataset exhibit an above-threshold difference in size, or task sizing based on a quantity of time required, but the Kauvlya reference discloses these features.
Therefore, Molka discloses, in response to determining that an operation is a workload operation, inserting, in the query pipeline, a reparallelization point subsequent to the workload operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation in the query pipeline to perform the one or more second operations on a second portion of the output dataset smaller than the first portion of the output dataset.
Molka in [0057] and [0066] discloses processing heavily memory-intensive query workloads in a parallel fashion; a query planner analyzes the query structure and creates a job execution plan, which is forwarded to buffers; query plan parallelism is processed by one or several worker threads that perform tasks in parallel; and workload placement includes parameterization and identifying parameters for optimization.
Molka in [0079] and [0087] discloses estimating the performance of a query workload operation; based on execution-time activity, the execution process of a query is divided into a plurality of processing phases.
Molka in [0094] and [0142] discloses query workloads executed in sequential order and workloads based on differently sized datasets.
Molka does not explicitly disclose skewing operations, but the Kauvlya reference discloses the feature.
Therefore, Molka discloses repeating, at each of multiple subsequent points of the query pipeline, the determining and the inserting to insert one or more additional reparallelization points in the query pipeline in response to determining that one or more additional operations is a workload operation.
Molka in [0042], [0068], and [0070] discloses receiving, processing, and executing query workloads; a query planning and execution stage; creating a job execution plan; and query plan parallelism processed by one or several worker threads.
Therefore, Molka discloses executing the query by at least performing the sequence of operations comprising the query pipeline.
Molka discloses parallelizing operations in a query pipeline by partitioning a workload into multiple tasks that are executed in parallel; however, Molka does not explicitly disclose:
determining whether an operation…is a…skewing operation, wherein a size of an output dataset of the…skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the…skewing operation and…determine a first task size for the one or more operations based on a first quantity of time required to perform the one or more first operations on a first portion of the output dataset;
in response to determining that an operation is a…skewing operation, inserting, a reparallelization point subsequent to the…skewing operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation…based on a second quantity of time required to perform the one or more second operations on a second portion of the output dataset…;
Kauvlya in [0002] and [0041] discloses that cost is based on processing time and on a size being larger than a capacity.
Kauvlya in [0010] and [0011] discloses identifying and moving skewed keys; employing partitioning to divide skewed data on joins by breaking jobs into tasks and distributing them to multiple nodes, where they are processed in parallel; and identifying partitions that evenly balance the workload.
Kauvlya in [0018] and [0024] and in Figure 2 discloses estimating the cost of processing a join operation; a job such as a skewed join is processed in parallel by task executors in nodes of a distributed environment; task distribution is based on the size of the dataset and the capabilities of the task executors; determining, based on the size of the input dataset, that the size of the output dataset will spill; estimating cost and selecting the join and partitioning strategy with the lowest cost, represented in terms of latency or throughput; determining that the probability of the output dataset of a skewed join operation not fitting in memory is greater than a threshold; and splitting input datasets so that records are evenly distributed across multiple partitions.
Therefore, Kauvlya discloses determining whether an operation is a skewing operation, wherein a size of an output dataset of the skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the skewing operation and determine a first task size for the one or more operations based on a first quantity of time required to perform one or more first operations on a first portion of an output dataset and in response to determining that an operation is a skewing operation, inserting, a reparallelization point subsequent to the skewing operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation based on a second quantity of time required to perform the one or more second operations on a second portion of the output dataset.
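For purposes of illustration only, and not as part of the claim language or the record, the cost-based selection described in the Kauvlya citations above can be sketched as follows; the function name, strategy names, and cost values are hypothetical assumptions:

```python
# Hypothetical sketch: estimate a cost (e.g., latency) for each candidate
# join/partitioning strategy and select the strategy with the lowest cost.
def pick_strategy(strategy_costs):
    """strategy_costs: dict mapping a strategy name to its estimated cost."""
    return min(strategy_costs, key=strategy_costs.get)

# Illustrative costs only; a real optimizer would derive these from dataset
# sizes and the capabilities of the task executors.
costs = {"broadcast_join": 120.0, "skewed_partition_join": 45.0, "sort_merge_join": 80.0}
print(pick_strategy(costs))  # prints "skewed_partition_join"
```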
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Molka and Kauvlya, to have combined Molka and Kauvlya. The motivation to combine Molka and Kauvlya would have been to provide optimized joins in Big Data systems by optimizing skewed joins.
For the above reasons, Examiner maintains that the rejections set forth in the current Office action are proper.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 3-7, 10-12, 15-17, and 19-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
At step 1:
Independent claims 1, 12, and 20 recite a system, a method, and a non-transitory computer readable medium, respectively, which are directed to the statutory categories of a machine, a process, and an article of manufacture.
At step 2A, prong one:
Independent claim 1, and similarly independent claims 12 and 20, recites the limitations:
“determining whether an operation within a sequence of operations comprising a query pipeline for executing a query is a workload skewing operation, wherein a size of an output dataset of the workload skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the workload skewing operation and wherein the query pipeline comprises one or more first operations following a first scheduling operation that is performed to determine a first task size for the one or more first operations based on a quantity of time required to perform the one or more first operations on a first portion of the output dataset”;
A person can mentally or using a pen and paper determine whether an operation within a sequence of operations comprising a query pipeline for executing a query is a workload skewing operation, by mentally or using a pen and paper determining that a size of an output dataset of the operation is more than a threshold larger or more than a threshold smaller than an input dataset of the operation, and by mentally or using a pen and paper determining a task size for one or more operations based on a quantity of time required to perform the operations.
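For purposes of illustration only, and not as part of the claim language or the record, the threshold comparison described above can be sketched as follows; the function name, the ratio-based expression of the thresholds, and the example sizes are hypothetical assumptions:

```python
def is_workload_skewing(input_size, output_size, grow_threshold=4.0, shrink_threshold=0.25):
    """Return True when the output dataset is more than a threshold larger,
    or more than a threshold smaller, than the input dataset. Expressing the
    thresholds as ratios is an illustrative assumption."""
    if input_size == 0:
        return output_size > 0  # any output from an empty input counts as expanding
    ratio = output_size / input_size
    return ratio > grow_threshold or ratio < shrink_threshold

# A selective join shrinking 1,000,000 rows to 50,000 is skewing;
# a roughly size-preserving operation is not.
print(is_workload_skewing(1_000_000, 50_000))   # prints True
print(is_workload_skewing(1_000_000, 900_000))  # prints False
```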
“in response to determining that an operation is a workload skewing operation, inserting, in the query pipeline, a reparallelization point subsequent to the workload skewing operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation in the query pipeline based on a second quantity of time required to perform the one or more second operations on a portion of the output dataset smaller that the first portion of the output dataset”;
A person can mentally or using a pen and paper determine that an operation is a workload skewing operation, and in response to the determination the person can mentally or using a pen and paper insert a reparallelization point subsequent to the workload skewing operation in the query pipeline for a corresponding scheduling operation to be performed to determine a first task size for one or more subsequent operations in the query pipeline based on a quantity of time required to perform the one or more subsequent operations on a portion of the output dataset from preceding operations in the query pipeline.
“repeating, at each of multiple subsequent points of the query pipeline, the determining and the inserting to insert one or more additional reparallelization points in the query pipeline in response to determining that one or more additional operations is a workload skewing operation”;
A person can mentally or using a pen and paper determine that one or more additional operations at each of multiple subsequent points of a query pipeline is a workload skewing operation, and in response to the determination the person can mentally or using a pen and paper repeat a process of determining and inserting to insert one or more additional reparallelization points in the query pipeline.
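Again for purposes of illustration only, and not as part of the claim language or the record, the repeated determine-and-insert steps over a query pipeline can be sketched as follows; the pipeline representation, operation names, and marker string are hypothetical:

```python
def insert_reparallelization_points(pipeline, is_skewing):
    """Walk a pipeline given as (operation, (input_size, output_size)) pairs,
    and insert a reparallelization marker after each operation determined to
    be workload skewing, repeating the determination at each point."""
    result = []
    for op, (in_size, out_size) in pipeline:
        result.append(op)
        if is_skewing(in_size, out_size):
            # Here a second scheduling operation would determine a new
            # (smaller) task size for the operations that follow.
            result.append("REPARALLELIZE")
    return result

pipeline = [
    ("scan", (1_000_000, 1_000_000)),
    ("selective_join", (1_000_000, 10_000)),  # shrinking output: skewing
    ("aggregate", (10_000, 10_000)),
    ("expanding_join", (10_000, 500_000)),    # expanding output: skewing
]
skew = lambda i, o: o > 4 * i or o < i / 4
print(insert_reparallelization_points(pipeline, skew))
```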
The limitations, as recited above, are processes that, under their broadest reasonable interpretation, cover steps that can be performed in the human mind or by a human using a pen and paper, but for the recitation of generic computer components.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
At step 2A, prong two:
This judicial exception is not integrated into a practical application.
Independent claim 1, and similarly independent claims 12 and 20, recites the limitations:
“executing the query by at least performing the sequence of operations comprising the query pipeline”, which is a step of executing a query to perform operations, and merely adds the words “apply it” (or an equivalent) to the abstract idea.
The additional elements “a system, comprising: at least one data processor; and at least one memory storing instructions which, when executed by the at least one data processor, cause operations comprising:” in the steps in claim 1 are recited at a high level of generality, such that they amount to no more than mere instructions to apply the exception using generic computer components.
The additional element “a computer-implemented method” in the steps in claim 12 is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
The additional elements “a non-transitory computer readable medium storing instructions, which when executed by at least one data processor, result in operations comprising:” in the steps in claim 20 are recited at a high level of generality, such that they amount to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
At step 2B:
Independent claims 1, 12, and 20 recite the same additional elements as identified in step 2A prong two above. These additional elements are not sufficient to amount to significantly more than the judicial exception.
Independent claim 1, and similarly independent claims 12 and 20, recites the limitations:
“executing the query by at least performing the sequence of operations comprising the query pipeline”, which is a step of executing a query to perform operations, and merely adds the words “apply it” (or an equivalent) to the abstract idea.
Accordingly, the additional limitations are not sufficient to amount to significantly more than the judicial exception. Therefore, the claims are directed to an abstract idea and are not patent eligible.
Dependent claim 3 recites additional limitations, such as:
“wherein the second scheduling operation is configured to gather the output dataset of the workload skewing operation into one or more buffers”, which is a step of gathering or retrieving data.
At step 2A prong two, the step is recited at a high level of generality, and amounts to mere data gathering, which is a form of insignificant extra-solution activity (MPEP 2106.05(g)).
At step 2B, the step is recognized as a well understood, routine, and conventional activity within the field of computer functions as an element of storing and retrieving information in memory (MPEP 2106.05(d)(II)(iv)).
The additional element “one or more buffers” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 4 recites additional limitations, such as:
“wherein the second scheduling operation is further configured to schedule, based at least on the second task size, one or more corresponding tasks for performance by a corresponding quantity of threads once the one or more buffers are full, and wherein data associated with each task is pushed to the one or more second operations following the workload skewing operation”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper perform scheduling operations to schedule tasks, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
The additional element “one or more buffers” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 5 recites additional limitations, such as:
“wherein the second scheduling operation is configured to avoid parallelization based at least on a first time required to gather the output dataset of the workload skewing operation into one or more buffers exceeding the second quantity of time required to perform the one or more second operations following the workload skewing operation”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper perform scheduling operations, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
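For purposes of illustration only, and not as part of the claim language or the record, the avoid-parallelization conditions recited in claims 5 and 6 can be sketched together as follows; the function name, parameter names, and the default threshold are hypothetical assumptions:

```python
def should_parallelize(gather_time, processing_time, num_tasks, min_tasks=2):
    """Avoid parallelization when gathering the output into buffers takes
    longer than performing the following operations (the claim 5 condition),
    or when there are fewer than a threshold quantity of tasks (the claim 6
    condition)."""
    if gather_time > processing_time:
        return False  # buffering dominates; parallelizing would not pay off
    if num_tasks < min_tasks:
        return False  # too few tasks to benefit from parallel scheduling
    return True

print(should_parallelize(10.0, 4.0, 8))  # prints False (gathering too slow)
print(should_parallelize(2.0, 9.0, 8))   # prints True
```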
The additional element “one or more buffers” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 6 recites additional limitations, such as:
“wherein the second scheduling operation is configured to avoid parallelization based at least on there being less than a threshold quantity of tasks of the second task size”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper perform scheduling operations, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 7 recites additional limitations, such as:
“wherein the one or more buffers are configured to accommodate data for multiple tasks, and wherein the second scheduling operation is configured to schedule the multiple tasks in parallel”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper perform scheduling operations, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
The additional element “wherein the one or more buffers are configured to accommodate data for multiple tasks” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 10 recites additional limitations, such as:
“wherein the workload skewing operation comprises a selective join operation, an expanding join operation, or a selective table scan operation”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper determine workload skewing operations comprising various operations, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 11 recites additional limitations, such as:
“wherein the workload skewing operation is identified based on a compile time estimation or a runtime estimation of a first size of the input dataset and a second size of the output dataset”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper identify workload skewing operations based on various estimations, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 14 recites additional limitations, such as:
“wherein the second scheduling operation is configured to gather the output dataset of the workload skewing operation into one or more buffers”, which is a step of gathering or retrieving data.
At step 2A prong two, the step is recited at a high level of generality, and amounts to mere data gathering, which is a form of insignificant extra-solution activity (MPEP 2106.05(g)).
At step 2B, the step is recognized as a well understood, routine, and conventional activity within the field of computer functions as an element of storing and retrieving information in memory (MPEP 2106.05(d)(II)(iv)).
“wherein the second scheduling operation is further configured to schedule, based at least on the second task size, one or more corresponding tasks for performance by a corresponding quantity of threads once the one or more buffers are full, and wherein data associated with each task is pushed to the one or more second operations following the workload skewing operation”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 12, because a person can mentally or using a pen and paper perform scheduling operations to schedule tasks, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
The additional element “one or more buffers” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 15 recites additional limitations, such as:
“wherein the second scheduling operation is configured to avoid parallelization based at least on a first time required to gather the output dataset of the workload skewing operation into one or more buffers exceeding the second quantity of time required to perform the one or more second operations following the workload skewing operation”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 12, because a person can mentally or using a pen and paper perform scheduling operations, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
The additional element “one or more buffers” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 16 recites additional limitations such as:
“wherein the second scheduling operation is configured to avoid parallelization based at least on there being less than a threshold quantity of tasks of the second task size”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 12, because a person can mentally or using a pen and paper perform scheduling operations, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 17 recites additional limitations such as:
“wherein the one or more buffers are configured to accommodate data for multiple tasks, and wherein the second scheduling operation is configured to schedule the multiple tasks in parallel”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 12, because a person can mentally or using a pen and paper perform scheduling operations, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
The additional element “wherein the one or more buffers are configured to accommodate data for multiple tasks” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 19 recites additional limitations, such as:
“wherein the workload skewing operation is identified based on a compile time estimation or a runtime estimation of a first size of the input dataset and a second size of the output dataset”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 12, because a person can mentally or using a pen and paper identify workload skewing operations based on various estimations, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Accordingly, dependent claims 3-7, 10, 11, 14-17, and 19 are also directed to an abstract idea without significantly more and are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3, 4, 7-12, 14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Molka (US Pub 2016/0328273) in view of Kauvlya (US Pub 2017/0185648).
With respect to claim 1, Molka discloses a system (Molka in [0048] and [0059] and in Figure 1A discloses a system), comprising:
at least one data processor (Molka in [0048] and [0059] and in Figure 1A discloses a processor); and
at least one memory storing instructions which, when executed by the at least one data processor, cause operations (Molka in [0059] and [0060] and in Figure 1A discloses a memory storing instructions executed by a processor) comprising:
determining whether an operation within a sequence of operations comprising a query pipeline for executing a query is a workload…operation…and wherein the query pipeline comprises one or more first operations following a first scheduling operation that is performed to determine a first task size for the one or more first operations…to perform the one or more first operations on a first portion of the output dataset (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or more several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0070] and [0071] discloses query plan parallelism processed by one or more worker threads, using buffers for query processing and scheduling, query job split into several tasks and processed by worker threads in parallel; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets, times in sequential execution order of query sets used for workload; here Molka does not explicitly disclose determining a skewing operation, wherein a size of an output dataset of the skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the skewing operation and based on quantity of time required, but the Kauvlya reference discloses the feature, as discussed below);
in response to determining that an operation is a workload…operation, inserting, in the query pipeline, a reparallelization point…to the workload…operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation in the query pipeline…to perform the one or more second operations on a second portion of the output dataset smaller than the first portion of the output dataset (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets; here Molka does not explicitly disclose a skewing operation whose input dataset and output dataset exhibit an above-threshold difference in size and based on quantity of time required, but the Kauvlya reference discloses the feature, as discussed below);
repeating, at each of multiple subsequent points of the query pipeline, the determining and the inserting to insert one or more additional reparallelization points in the query pipeline in response to determining that one or more additional operations is a workload…operation (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets; here Molka does not explicitly disclose skewing operations, but the Kauvlya reference discloses the feature, as discussed below); and
executing the query by at least performing the sequence of operations comprising the query pipeline (Molka in [0042], [0068], and [0070] discloses receiving, processing, and executing query workloads, query planning and execution stage, create job execution plan, query plan parallelism processed by one or several worker threads).
Molka discloses parallelizing operations in a query pipeline by partitioning workload into multiple tasks which are executed in parallel, however, Molka does not explicitly disclose:
determining whether an operation…is a…skewing operation, wherein a size of an output dataset of the…skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the…skewing operation and…determine a first task size for the one or more operations based on a first quantity of time required to perform the one or more first operations on a first portion of the output dataset;
in response to determining that an operation is a…skewing operation, inserting a reparallelization point subsequent to the…skewing operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation…based on a second quantity of time required to perform the one or more second operations on a second portion of the output dataset smaller than the first portion of the output dataset;
The Kauvlya reference discloses determining whether an operation is a skewing operation, wherein a size of an output dataset of the skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the skewing operation and determine a first task size for the one or more operations based on a first quantity of time required to perform one or more first operations on a first portion of an output dataset (Kauvlya in [0002] and [0041] discloses cost is based on processing time and size being larger than capacity; Kauvlya in [0010] and [0011] discloses identifying and moving skewed keys, employ partitioning to divide skewed data on joins by breaking down jobs into tasks and distribute to multiple nodes, which is processed in parallel, identify partitions that evenly balance workload; Kauvlya in [0018] and [0024] and in Figure 2 discloses estimating cost of processing a join operation, a job such as a skewed join processed in parallel in nodes in a distributed environment by task executors, task distribution based on size of the dataset and the capabilities of the task executors, determining based on the size of the input dataset that the size of the output dataset will spill, optimized estimating cost and selecting join and partitioning strategy with lowest cost represented in terms of latency or throughput, determining that the probability that the output dataset of skewed join operation not fitting the memory is greater than a threshold, splitting input datasets so that records are evenly distributed across multiple partitions);
in response to determining that an operation is a skewing operation, inserting, a reparallelization point subsequent to the skewing operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation based on a second quantity of time required to perform the one or more second operations on a second portion of the output dataset (Kauvlya in [0002] and [0041] discloses cost is based on processing time and size being larger than capacity; Kauvlya in [0010] and [0011] discloses identifying and moving skewed keys, employ partitioning to divide skewed data on joins by breaking down jobs into tasks and distribute to multiple nodes, which is processed in parallel, identify partitions that evenly balance workload; Kauvlya in [0018] and [0024] and in Figure 2 discloses estimating cost of processing a join operation, a job such as a skewed join processed in parallel in nodes in a distributed environment by task executors, task distribution based on size of the dataset and the capabilities of the task executors, determining based on the size of the input dataset that the size of the output dataset will spill, optimized estimating cost and selecting join and partitioning strategy with lowest cost represented in terms of latency or throughput, determining that the probability that the output dataset of skewed join operation not fitting the memory is greater than a threshold, splitting input datasets so that records are evenly distributed across multiple partitions).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Molka and Kauvlya before them, to have combined Molka and Kauvlya. The motivation to combine Molka and Kauvlya would have been to provide optimized joins in Big Data systems by optimizing skewed joins (Kauvlya: [0001]).
With respect to claim 3, Molka in view of Kauvlya discloses the system of claim 1, wherein the second scheduling operation is configured to gather the output dataset of the workload skewing operation into one or more buffers (Molka in [0070] discloses query job execution plans forwarded to buffer; Molka in [0071] discloses query processing and scheduling relying on buffers; Molka in [0099] discloses datasets related to workloads loaded in memory; Kauvlya in [0024] and [0025] discloses result dataset stored in memory; Kauvlya in [0002] and [0041] discloses cost is based on processing time and size being larger than capacity).
With respect to claim 4, Molka in view of Kauvlya discloses the system of claim 3, wherein the second scheduling operation is further configured to schedule, based at least on the second task size, one or more corresponding tasks for performance by a corresponding quantity of threads once the one or more buffers are full, and wherein data associated with each task is pushed to the one or more second operations following the workload skewing operation (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or more several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets; Molka in [0070] and [0071] discloses query plan parallelism processed by one or more worker threads, using buffers for query processing and scheduling, query job split into several tasks and processed by worker threads in parallel).
With respect to claim 7, Molka in view of Kauvlya discloses the system of claim 4, wherein the one or more buffers are configured to accommodate data for multiple tasks, and wherein the second scheduling operation is configured to schedule the multiple tasks in parallel (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or more several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets; Molka in [0070] and [0071] discloses query plan parallelism processed by one or more worker threads, using buffers for query processing and scheduling, query job split into several tasks and processed by worker threads in parallel).
With respect to claim 10, Molka in view of Kauvlya discloses the system of claim 1, wherein the workload skewing operation comprises a selective join operation, an expanding join operation, or a selective table scan operation (Kauvlya in [0010] and [0011] discloses employ partitioning to divide skewed data on joins by breaking down jobs into tasks and distribute to multiple nodes; Kauvlya in [0018] and [0024]-[0026] and in Figure 2 discloses partitions derived by identifying skewed keys and splitting input datasets so that records are evenly distributed across multiple partitions, partitioned data placed in queues prior to subsequent state of the join, storing the result of the join, model joins as queues if the input datasets are in excess of a specified threshold size).
With respect to claim 11, Molka in view of Kauvlya discloses the system of claim 1, wherein the workload skewing operation is identified based on a compile time estimation or a runtime estimation of a first size of the input dataset and a second size of the output dataset (Molka in [0070] and [0071] discloses query job execution plans forwarded to buffer, query processing and scheduling relying on buffers; Molka in [0099] discloses datasets related to workloads loaded in memory; Kauvlya in [0002] and [0041] discloses cost is based on processing time and size being larger than capacity; Kauvlya in [0024] and [0025] discloses result dataset stored in memory).
With respect to claim 12, Molka discloses a computer-implemented method (Molka in [0042] discloses a computer implemented method), comprising:
determining whether an operation within a sequence of operations comprising a query pipeline for executing a query is a workload…operation…and wherein the query pipeline comprises one or more first operations following a first scheduling operation that is performed to determine a first task size for the one or more first operations…to perform the one or more first operations on a first portion of the output dataset (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or more several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0070] and [0071] discloses query plan parallelism processed by one or more worker threads, using buffers for query processing and scheduling, query job split into several tasks and processed by worker threads in parallel; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets, times in sequential execution order of query sets used for workload; here Molka does not explicitly disclose determining a skewing operation, wherein a size of an output dataset of the skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the skewing operation and based on quantity of time required, but the Kauvlya reference discloses the feature, as discussed below);
in response to determining that an operation is a workload…operation, inserting, in the query pipeline, a reparallelization point…to the workload…operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation in the query pipeline…to perform the one or more second operations on a second portion of the output dataset smaller than the first portion of the output dataset (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets; here Molka does not explicitly disclose a skewing operation whose input dataset and output dataset exhibit an above-threshold difference in size and based on quantity of time required, but the Kauvlya reference discloses the feature, as discussed below);
repeating, at each of multiple subsequent points of the query pipeline, the determining and the inserting to insert one or more additional reparallelization points in the query pipeline in response to determining that one or more additional operations is a workload…operation (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or more several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets; here Molka does not explicitly disclose skewing operations, but the Kauvlya reference discloses the feature, as discussed below); and
executing the query by at least performing the sequence of operations comprising the query pipeline (Molka in [0042], [0068], and [0070] discloses receiving, processing, and executing query workloads, query planning and execution stage, create job execution plan, query plan parallelism processed by one or several worker threads).
Molka discloses parallelizing operations in a query pipeline by partitioning workload into multiple tasks which are executed in parallel, however, Molka does not explicitly disclose:
determining whether an operation…is a…skewing operation, wherein a size of an output dataset of the…skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the…skewing operation and…determine a first task size for the one or more operations based on a first quantity of time required to perform the one or more first operations on a first portion of the output dataset;
in response to determining that an operation is a…skewing operation, inserting a reparallelization point subsequent to the…skewing operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation…based on a second quantity of time required to perform the one or more second operations on a second portion of the output dataset smaller than the first portion of the output dataset;
The Kauvlya reference discloses determining whether an operation is a skewing operation, wherein a size of an output dataset of the skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the skewing operation and determine a first task size for the one or more operations based on a first quantity of time required to perform one or more first operations on a first portion of an output dataset (Kauvlya in [0002] and [0041] discloses cost is based on processing time and size being larger than capacity; Kauvlya in [0010] and [0011] discloses identifying and moving skewed keys, employ partitioning to divide skewed data on joins by breaking down jobs into tasks and distribute to multiple nodes, which is processed in parallel, identify partitions that evenly balance workload; Kauvlya in [0018] and [0024] and in Figure 2 discloses estimating cost of processing a join operation, a job such as a skewed join processed in parallel in nodes in a distributed environment by task executors, task distribution based on size of the dataset and the capabilities of the task executors, determining based on the size of the input dataset that the size of the output dataset will spill, optimized estimating cost and selecting join and partitioning strategy with lowest cost represented in terms of latency or throughput, determining that the probability that the output dataset of skewed join operation not fitting the memory is greater than a threshold, splitting input datasets so that records are evenly distributed across multiple partitions);
in response to determining that an operation is a skewing operation, inserting, a reparallelization point subsequent to the skewing operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation based on a second quantity of time required to perform the one or more second operations on a second portion of the output dataset (Kauvlya in [0002] and [0041] discloses cost is based on processing time and size being larger than capacity; Kauvlya in [0010] and [0011] discloses identifying and moving skewed keys, employ partitioning to divide skewed data on joins by breaking down jobs into tasks and distribute to multiple nodes, which is processed in parallel, identify partitions that evenly balance workload; Kauvlya in [0018] and [0024] and in Figure 2 discloses estimating cost of processing a join operation, a job such as a skewed join processed in parallel in nodes in a distributed environment by task executors, task distribution based on size of the dataset and the capabilities of the task executors, determining based on the size of the input dataset that the size of the output dataset will spill, optimized estimating cost and selecting join and partitioning strategy with lowest cost represented in terms of latency or throughput, determining that the probability that the output dataset of skewed join operation not fitting the memory is greater than a threshold, splitting input datasets so that records are evenly distributed across multiple partitions).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Molka and Kauvlya before them, to have combined Molka and Kauvlya. The motivation to combine Molka and Kauvlya would have been to provide optimized joins in Big Data systems by optimizing skewed joins (Kauvlya: [0001]).
With respect to claim 14, Molka in view of Kauvlya discloses the method of claim 12, wherein the second scheduling operation is configured to gather the output dataset of the workload skewing operation into one or more buffers, wherein the second scheduling operation is further configured to schedule, based at least on the second task size, one or more corresponding tasks for performance by a corresponding quantity of threads once the one or more buffers are full, and wherein data associated with each task is pushed to the one or more second operations following the workload skewing operation (Molka in [0070] and [0071] discloses query job execution plans forwarded to buffer, query processing and scheduling relying on buffers; Molka in [0099] discloses datasets related to workloads loaded in memory; Kauvlya in [0002] and [0041] discloses cost is based on processing time and size being larger than capacity; Kauvlya in [0024] and [0025] discloses result dataset stored in memory).
With respect to claim 17, Molka in view of Kauvlya discloses the method of claim 14, wherein the one or more buffers are configured to accommodate data for multiple tasks, and wherein the second scheduling operation is configured to schedule the multiple tasks in parallel (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or more several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets; Molka in [0070] and [0071] discloses query plan parallelism processed by one or more worker threads, using buffers for query processing and scheduling, query job split into several tasks and processed by worker threads in parallel).
With respect to claim 19, Molka in view of Kauvlya discloses the method of claim 12, wherein the workload skewing operation is identified based on a compile time estimation or a runtime estimation of a first size of the input dataset and a second size of the output dataset (Molka in [0070] and [0071] discloses query job execution plans forwarded to buffer, query processing and scheduling relying on buffers; Molka in [0099] discloses datasets related to workloads loaded in memory; Kauvlya in [0002] and [0041] discloses cost is based on processing time and size being larger than capacity; Kauvlya in [0024] and [0025] discloses result dataset stored in memory).
With respect to claim 20, Molka discloses a non-transitory computer readable medium storing instructions, which when executed by at least one data processor, result in operations (Molka in [0060] and [0062] and in Figure 1A discloses a non-transitory medium storing instructions executed by a processor) comprising:
determining whether an operation within a sequence of operations comprising a query pipeline for executing a query is a workload…operation…and wherein the query pipeline comprises one or more first operations following a first scheduling operation that is performed to determine a first task size for the one or more first operations…to perform the one or more first operations on a first portion of the output dataset (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or more several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0070] and [0071] discloses query plan parallelism processed by one or more worker threads, using buffers for query processing and scheduling, query job split into several tasks and processed by worker threads in parallel; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets, times in sequential execution order of query sets used for workload; here Molka does not explicitly disclose determining a skewing operation, wherein a size of an output dataset of the skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the skewing operation and based on quantity of time required, but the Kauvlya reference discloses the feature, as discussed below);
in response to determining that an operation is a workload…operation, inserting, in the query pipeline, a reparallelization point…to the workload…operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation in the query pipeline…to perform the one or more second operations on a second portion of the output dataset smaller than the first portion of the output dataset (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets; here Molka does not explicitly disclose a skewing operation whose input dataset and output dataset exhibit an above-threshold difference in size and based on quantity of time required, but the Kauvlya reference discloses the feature, as discussed below);
repeating, at each of multiple subsequent points of the query pipeline, the determining and the inserting to insert one or more additional reparallelization points in the query pipeline in response to determining that one or more additional operations is a workload…operation (Molka in [0057] and [0066] discloses processing heavily memory intensive query workloads in a parallel fashion, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0079] and [0087] discloses estimating performance of a query workload operation, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets; here Molka does not explicitly disclose skewing operations, but the Kauvlya reference discloses the feature, as discussed below); and
executing the query by at least performing the sequence of operations comprising the query pipeline (Molka in [0042], [0068], and [0070] discloses receiving, processing, and executing query workloads, query planning and execution stage, create job execution plan, query plan parallelism processed by one or several worker threads).
Molka discloses parallelizing operations in a query pipeline by partitioning a workload into multiple tasks which are executed in parallel; however, Molka does not explicitly disclose:
determining whether an operation…is a…skewing operation, wherein a size of an output dataset of the…skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the…skewing operation and…determine a first task size for the one or more operations based on a first quantity of time required to perform the one or more first operations on a first portion of the output dataset;
in response to determining that an operation is a…skewing operation, inserting, a reparallelization point subsequent to the…skewing operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation…based on a second quantity of time required to perform the one or more second operations on a second portion of the output dataset smaller than the first portion of the output dataset;
The Kauvlya reference discloses determining whether an operation is a skewing operation, wherein a size of an output dataset of the skewing operation is more than a threshold larger or more than a threshold smaller than an input dataset of the skewing operation and determine a first task size for the one or more operations based on a first quantity of time required to perform one or more first operations on a first portion of an output dataset (Kauvlya in [0002] and [0041] discloses cost is based on processing time and size being larger than capacity; Kauvlya in [0010] and [0011] discloses identifying and moving skewed keys, employ partitioning to divide skewed data on joins by breaking down jobs into tasks and distribute to multiple nodes, which is processed in parallel, identify partitions that evenly balance workload; Kauvlya in [0018] and [0024] and in Figure 2 discloses estimating cost of processing a join operation, a job such as a skewed join processed in parallel in nodes in a distributed environment by task executors, task distribution based on size of the dataset and the capabilities of the task executors, determining based on the size of the input dataset that the size of the output dataset will spill, optimized estimating cost and selecting join and partitioning strategy with lowest cost represented in terms of latency or throughput, determining that the probability that the output dataset of skewed join operation not fitting the memory is greater than a threshold, splitting input datasets so that records are evenly distributed across multiple partitions);
in response to determining that an operation is a skewing operation, inserting, a reparallelization point subsequent to the skewing operation, wherein, at the reparallelization point, a second scheduling operation is performed to determine a second task size for one or more second operations following the second scheduling operation based on a second quantity of time required to perform the one or more second operations on a second portion of the output dataset smaller than the first portion of the output dataset (Kauvlya in [0002] and [0041] discloses cost is based on processing time and size being larger than capacity; Kauvlya in [0010] and [0011] discloses identifying and moving skewed keys, employ partitioning to divide skewed data on joins by breaking down jobs into tasks and distribute to multiple nodes, which is processed in parallel, identify partitions that evenly balance workload; Kauvlya in [0018] and [0024] and in Figure 2 discloses estimating cost of processing a join operation, a job such as a skewed join processed in parallel in nodes in a distributed environment by task executors, task distribution based on size of the dataset and the capabilities of the task executors, determining based on the size of the input dataset that the size of the output dataset will spill, optimized estimating cost and selecting join and partitioning strategy with lowest cost represented in terms of latency or throughput, determining that the probability that the output dataset of skewed join operation not fitting the memory is greater than a threshold, splitting input datasets so that records are evenly distributed across multiple partitions).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Molka and Kauvlya, to have combined Molka and Kauvlya. The motivation to combine Molka and Kauvlya would be to provide optimized joins in Big Data systems by optimizing skewed joins (Kauvlya: [0001]).
Claim(s) 5, 6, 15, and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Molka (US Pub 2016/0328273) in view of Kauvlya (US Pub 2017/0185648) and in further view of White (US Pub 2019/0384843).
With respect to claim 5, Molka in view of Kauvlya discloses the system of claim 4, wherein the second scheduling operation is configured to avoid parallelization based at least on a first time required to gather the output dataset of the workload skewing operation into one or more buffers exceeding the second quantity of time required to perform the one or more second operations following the workload skewing operation (Molka in [0057] and [0066] discloses processing heavily memory intensive workloads in a parallel fashion, sequential execution of query used for each workload, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or several worker threads, perform tasks in parallel, workload placement including parameterization, identifying parameters for optimization; Molka in [0078], [0079], and [0087] discloses introduce query thread level parallelism, estimate queue length based on query parallelism, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0094] and [0142] discloses query workloads executed in a sequential order, workloads based on differently sized datasets; Molka in [0070] and [0071] discloses query plan parallelism processed by one or more worker threads, using buffers for query processing and scheduling, query job split into several tasks and processed by worker threads in parallel).
Molka discloses a scheduling operation for parallelization of a workload into one or more buffers to perform at least a first operation, and Kauvlya discloses parallelization of a workload skewing operation to perform an operation following the workload skewing operation; however, Molka and Kauvlya do not explicitly disclose:
avoid parallelization based at least on a first time required to gather the output dataset of the operation exceeding a second time required to perform at least the first operation.
The White reference discloses avoid parallelization based at least on a first time required to gather an output dataset of an operation exceeding a second time required to perform at least a first operation (White in [0017] and [0018] discloses selecting operators to be parallelized during query optimization, determine if the operators are a good candidate for parallelization based on satisfying a threshold, determine cardinality high enough for two threads of control to be used for parallelization, determine whether the size of an operator's input data stream is greater than a threshold because queries over small amounts of data execute very quickly without parallelization and can even be impacted negatively by the additional overhead of setting up and tearing down parallelism).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Molka, Kauvlya, and White, to have combined Molka, Kauvlya, and White. The motivation to combine Molka, Kauvlya, and White would be to maximize performance by deciding how much parallelism should be given to each operator by determining the number of separate threads of control to be allocated to the operators (White: [0005]).
With respect to claim 6, Molka in view of Kauvlya discloses the system of claim 4, wherein the second scheduling operation is configured to avoid parallelization based at least on there being less than a threshold quantity of tasks of the second task size (Molka in [0066], [0070], [0075], and [0094] discloses big data systems processing heavily memory intensive workloads in a parallel fashion, sequential execution of query used for each workload, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or several worker threads, perform tasks in parallel; Molka in [0078], [0079], and [0087] discloses introduce query thread level parallelism, estimate queue length based on query parallelism, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0071], [0075], and [0142] discloses query jobs split into several parallel tasks, workloads and operations based on differently sized datasets).
Molka and Kauvlya disclose parallelization based at least on there being a quantity of tasks; however, Molka and Kauvlya do not explicitly disclose:
avoid parallelization based at least on there being less than a threshold quantity of tasks of the first task size.
The White reference discloses avoid parallelization based at least on there being less than a threshold quantity of tasks of a first task size (White in [0017] and [0018] discloses selecting operators to be parallelized during query optimization, determine if the operators are a good candidate for parallelization based on satisfying a threshold, determine cardinality high enough for two threads of control to be used for parallelization, determine whether the size of an operator's input data stream is greater than a threshold because queries over small amounts of data execute very quickly without parallelization and can even be impacted negatively by the additional overhead of setting up and tearing down parallelism).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Molka, Kauvlya, and White, to have combined Molka, Kauvlya, and White. The motivation to combine Molka, Kauvlya, and White would be to maximize performance by deciding how much parallelism should be given to each operator by determining the number of separate threads of control to be allocated to the operators (White: [0005]).
With respect to claim 15, Molka in view of Kauvlya discloses the method of claim 14, wherein the second scheduling operation is configured to avoid parallelization based at least on a first time required to gather the output dataset of the workload skewing operation into one or more buffers exceeding the second quantity of time required to perform the one or more second operations following the workload skewing operation (Molka in [0066], [0070], [0075], and [0094] discloses big data systems processing heavily memory intensive workloads in a parallel fashion, sequential execution of query used for each workload, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or several worker threads, perform tasks in parallel; Molka in [0078], [0079], and [0087] discloses introduce query thread level parallelism, estimate queue length based on query parallelism, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0071], [0075], and [0142] discloses query jobs split into several parallel tasks, workloads and operations based on differently sized datasets).
Molka discloses a scheduling operation for parallelization of a workload into one or more buffers to perform at least a first operation, and Kauvlya discloses parallelization of a workload skewing operation to perform an operation following the workload skewing operation; however, Molka and Kauvlya do not explicitly disclose:
avoid parallelization based at least on a first time required to gather the output dataset of the operation exceeding a second time required to perform at least the first operation.
The White reference discloses avoid parallelization based at least on a first time required to gather an output dataset of an operation exceeding a second time required to perform at least a first operation (White in [0017] and [0018] discloses selecting operators to be parallelized during query optimization, determine if the operators are a good candidate for parallelization based on satisfying a threshold, determine cardinality high enough for two threads of control to be used for parallelization, determine whether the size of an operator's input data stream is greater than a threshold because queries over small amounts of data execute very quickly without parallelization and can even be impacted negatively by the additional overhead of setting up and tearing down parallelism).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Molka, Kauvlya, and White, to have combined Molka, Kauvlya, and White. The motivation to combine Molka, Kauvlya, and White would be to maximize performance by deciding how much parallelism should be given to each operator by determining the number of separate threads of control to be allocated to the operators (White: [0005]).
With respect to claim 16, Molka in view of Kauvlya discloses the method of claim 14, wherein the second scheduling operation is configured to avoid parallelization based at least on there being less than a threshold quantity of tasks of the second task size (Molka in [0066], [0070], [0075], and [0094] discloses big data systems processing heavily memory intensive workloads in a parallel fashion, sequential execution of query used for each workload, query planner analyzes query structure and creates job execution plan, which is forwarded to buffers, query plan parallelism processed by one or several worker threads, perform tasks in parallel; Molka in [0078], [0079], and [0087] discloses introduce query thread level parallelism, estimate queue length based on query parallelism, based on execution time activity the execution process of a query is divided into plurality of processing phases; Molka in [0071], [0075], and [0142] discloses query jobs split into several parallel tasks, workloads and operations based on differently sized datasets).
Molka and Kauvlya disclose parallelization based at least on there being a quantity of tasks; however, Molka and Kauvlya do not explicitly disclose:
avoid parallelization based at least on there being less than a threshold quantity of tasks of the first task size.
The White reference discloses avoid parallelization based at least on there being less than a threshold quantity of tasks of a first task size (White in [0017] and [0018] discloses selecting operators to be parallelized during query optimization, determine if the operators are a good candidate for parallelization based on satisfying a threshold, determine cardinality high enough for two threads of control to be used for parallelization, determine whether the size of an operator's input data stream is greater than a threshold because queries over small amounts of data execute very quickly without parallelization and can even be impacted negatively by the additional overhead of setting up and tearing down parallelism).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Molka, Kauvlya, and White, to have combined Molka, Kauvlya, and White. The motivation to combine Molka, Kauvlya, and White would be to maximize performance by deciding how much parallelism should be given to each operator by determining the number of separate threads of control to be allocated to the operators (White: [0005]).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to REZWANUL MAHMOOD whose telephone number is (571)272-5625. The examiner can normally be reached M-F 9-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo can be reached at 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/R.M/Examiner, Art Unit 2159
/ANN J LO/Supervisory Patent Examiner, Art Unit 2159