Prosecution Insights
Last updated: April 17, 2026
Application No. 19/286,313

Big Data processing platform and method

Status: Non-Final OA (§103)
Filed: Jul 31, 2025
Examiner: CHU JOY, JORGE A
Art Unit: 2195
Tech Center: 2100 — Computer Architecture & Software
Assignee: unknown
OA Round: 1 (Non-Final)

Grant Probability: 77% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 1m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 77% — above average (314 granted / 408 resolved; +22.0% vs TC avg)
Interview Lift: +37.3% — strong, measured across resolved cases with vs. without an interview
Typical Timeline: 3y 1m average prosecution; 41 applications currently pending
Career History: 449 total applications across all art units
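The card figures above are simple ratios over the examiner's resolved cases. A minimal sketch of the arithmetic follows; note that the page does not show the per-cohort counts behind the interview lift, so the with/without-interview rates below are hypothetical values chosen only to illustrate how such a lift would be computed.

```python
# Career allow rate from the counts shown on the card:
# 314 granted out of 408 resolved cases.
granted, resolved = 314, 408
allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.0%}")  # 77%

# Interview lift is the gap between the allow rate of resolved cases
# that had an examiner interview and those that did not. The cohort
# rates below are HYPOTHETICAL (not shown on the page); they are
# picked only so the gap matches the card's +37.3% figure.
rate_with_interview, rate_without_interview = 0.950, 0.577
lift = rate_with_interview - rate_without_interview
print(f"Interview lift: {lift:+.1%}")  # +37.3%
```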

Statute-Specific Performance

§101: 11.0% (-29.0% vs TC avg)
§103: 55.3% (+15.3% vs TC avg)
§102: 3.2% (-36.8% vs TC avg)
§112: 19.6% (-20.4% vs TC avg)

Black line = Tech Center average estimate • Based on career data from 408 resolved cases
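Each statute line above pairs the examiner's rate with its delta against the Tech Center average, so the implied TC baseline can be recovered by subtracting the delta from the rate (assuming the deltas are expressed in percentage points, which the page does not state explicitly). Notably, every statute backs out to the same 40.0% baseline:

```python
# Per-statute rates and deltas vs the Tech Center average, as shown above.
# Values are percentages; deltas are assumed to be percentage points.
stats = {
    "101": (11.0, -29.0),
    "103": (55.3, +15.3),
    "102": (3.2, -36.8),
    "112": (19.6, -20.4),
}

for statute, (rate, delta) in stats.items():
    tc_avg = rate - delta  # baseline implied by the reported delta
    print(f"§{statute}: examiner {rate}% vs TC avg estimate {tc_avg:.1f}%")
```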

Office Action — §103
DETAILED ACTION

Claims 1-20 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 07/31/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Drawings

The drawings are objected to because, with regard to figure 1, lakehouse 122 is shown in the App layer instead of the storage layer 105, while paragraph [0015] states "The storage layer 106 typically comprises one or more datamarts and lakehouse 122". Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as "amended." If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either "Replacement Sheet" or "New Sheet" pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification

The disclosure is objected to because of the following informalities: throughout the specification, the terms "datamart" and "data mart" are both used. The examiner respectfully suggests that the applicant consider amending the specification for consistent term use. Appropriate correction is required.

Claim Objections

Claims are objected to because of the following informalities: claim 1, line 3 recites "A computer layer implemented on hardware and software"; the "A" should be lower case. A similar issue exists with "In lieu" in claim 20. Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 10, 11 and 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Garcia Calatrava et al. (US 2023/0418800 A1), hereinafter "GC", in view of Graham et al. (US 11,016,946 B1).

Regarding claim 1, GC teaches a computing system operating in association with a cloud-based environment (Fig. 1, shows sensor readings from a cloud to a system; [0077] system), comprising: a compute layer implemented on hardware and software and configured with multi-executor orchestration to provide ingestion, transformation and collation of time-series event data ([0055] a computer comprising hardware and software means adapted to perform a method according to any of the preceding claims; Fig. 6 Abstraction Layers; [0011] time-series record; [0025] Data (for instance, sensor readings) is ingested via the data pool(s) of the first step and keeps eventually cascading from one data pool to another, until reaching the last one. Each data pool is designed to keep the data that it holds in a specific manner, thanks to the data falls that move (and alter) the data from one data pool to another. Those data falls group, sort, or reduce the data that is being cascaded.; [0029] The different data fall operations are: [0030] Grouping; [0031] Sorting; [0032] Reducing); and a storage layer comprising a first data store, and a second data store, the first data store providing a system of record for harmonized data derived from the time-series event data, the harmonized data stored in an immutable, row-oriented data format, and the second data store providing an analytic environment and storing time-series event data in a column-oriented data format accessible utilizing columnar metadata (Abstract: A method is used for managing a flow of data in at least one database, wherein said database is configured with at least two data models of data storage. In said method, during a first period of time, a first data flow portion is received in a computer, and the first data flow portion is then stored in a first data pool of the database according to the first data model. Then, after the first period of time, a transformation is made on the first portion of the data and the transformed first data is assigned to a second data model, and the first data flow portion is then transferred from the first data pool to a second data pool.; [0007] Data models organize elements of data and define how they relate to each-other. Each data model has its own specific properties, performance, and may be preferred for different use cases. As data models vary, their properties and performance do too. Although the actual implementation might differ from one database to another, each data model follows some shared principles. Some of the most relevant data models, related to this time-series, are: [0009] Row oriented. A row, or tuple, represents a single data structure composed of multiple related data, such as sensor readings. Each row contains all the existing attributes that are closely related to the row primary key, the attribute that uniquely identifies the row. This makes it efficient to retrieve all attributes for a given primary key. All rows typically follow the same structure. [0010] Column oriented. Data is organized following a column fashion. Each column contains all the existing value related to the column identifier, for instance a sensorID. Column oriented data models are greatly efficient when performing historical queries.; [0048] Abstraction layers receive this query metadata, which is, in fact, a part of the query itself, known as where clause in traditional databases. Thanks to this hint, the abstraction layers evaluate which data should be selected and transformed, fitting the abstraction layers as much as possible with respect to the requested data.); and wherein multi-executor orchestration splits a workload into a series of tasks ([0029] The different data fall operations are: [0030] Grouping; [0031] Sorting; [0032] Reducing), and receives and aggregates results from execution of the tasks by the set of executors ([0030-31]; [0032] Reducing: Alters the data granularity by either filtering or aggregating data, being the origin source of another cascade-flow).

GC teaches a workload split into different tasks, such as MapReduce tasks, but does not explicitly teach "assigns each task in the set of tasks to an executor of a set of executors". However, Graham teaches assigning each task in the set of tasks to an executor of a set of executors (Col. 7, lines 32-43: Hadoop MR is a software framework to facilitate developing applications that process vast amounts of data (e.g., multi-terabyte datasets) in parallel on large clusters (e.g., thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. An MR job determines how an input dataset should be split independent partitions (sometimes referred to as "splits") which can be processed by so-called "map tasks" in a parallel manner. The MR framework sorts the outputs of the map tasks, which are then input to so-called "reduce tasks." The final output of the job is based on the output of the reduce tasks.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Graham with the teachings of GC to divide tasks and allocate them across a plurality of worker nodes. The modification would have been motivated by the desire of increasing processing speeds by using parallelization.

Regarding claim 2, Graham teaches wherein the row-oriented data format is Avro, and wherein the column-oriented data format is Parquet (Col. 2, lines 4-11: Generating the metadata record from the corresponding object metadata may include generating a record in Apache Avro format, Apache Thrift format, Apache Parquet format, Simple Key/Value format, JSON format, Hadoop SequenceFile format, or Google Protocol Buffer format.).

Regarding claim 3, GC teaches wherein the cloud-based environment comprises a set of compute nodes, and cloud storage, the set of compute nodes accessible via a set of network connections (Fig. 1A and Fig. 2).
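The orchestration pattern the claim 1 rejection turns on (split a workload into tasks, assign each task to an executor, then receive and aggregate the results) is the standard scatter-gather idiom. A minimal sketch follows; the workload and the per-task function are invented for illustration and are not the application's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(chunk):
    # Stand-in for ingestion/transformation of one slice of
    # time-series event data (hypothetical: just sums the slice).
    return sum(chunk)

def orchestrate(workload, n_executors=4):
    # Split the workload into a series of tasks...
    size = -(-len(workload) // n_executors)  # ceiling division
    tasks = [workload[i:i + size] for i in range(0, len(workload), size)]
    # ...assign each task to an executor of a set of executors,
    # then receive and aggregate the results.
    with ThreadPoolExecutor(max_workers=n_executors) as pool:
        results = list(pool.map(run_task, tasks))
    return sum(results)

print(orchestrate(list(range(100))))  # 4950
```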
Regarding claim 4, GC teaches wherein transformation and collation of time-series event data occurs following on-demand transfer of time-series event data from the first data store to the at least one compute node (Abstract; [0025] Data (for instance, sensor readings) is ingested via the data pool(s) of the first step and keeps eventually cascading from one data pool to another, until reaching the last one. Each data pool is designed to keep the data that it holds in a specific manner, thanks to the data falls that move (and alter) the data from one data pool to another. Those data falls group, sort, or reduce the data that is being cascaded.). In addition, Graham teaches on a first compute node of the set of compute nodes (Col. 4, lines 15-24: Any one of the data nodes 102 may be capable of receiving and processing client requests.).

Regarding claim 10, GC teaches wherein time-series event data is stored in a hybrid row-columnar format ([0082] "row" of short columns).

Regarding claim 11, GC teaches wherein the hybrid row-columnar format stores segmented embedded vectors ([0082] In this step, the transformed data is store in Pool 3.1. It receives the data grouped in the shape of daily columns, as the previous cascade is just sorting them, not altering its data structure. Sensor daily-columns or groups are sorted sensor-wise, as the example data fall is set to, meaning that all data from a given sensor, from the previous data pool flush, is stored adjacently in disk, forming an in-disk larger column. It is the stream data pool, as it is the last data pool. On this example, it will keep all historical data, except if manually removed. The horizontal "row" of short columns (or long column) represents, in this case, one month of data for a given sensor. As there are 4 sensors (3 sensors+1 synthetic timestamp), each for "row of groups" represents the total amount of data coming from the last data fall.).

Regarding claim 14, it is a method-type claim having similar limitations as claim 1 above. Therefore, it is rejected under the same rationale above, as the citations encompass the limitations as claimed. The additional features are taught by GC, wherein processing includes at least one executor applying a data processing primitive that is one of: a homogeneous collation primitive, a heterogeneous collation primitive for sort-merge joins ([0029] The different data fall operations are: [0030] Grouping; [0031] Sorting; [0032] Reducing; [0055] a computer comprising hardware and software means adapted to perform a method according to any of the preceding claims; Fig. 6 Abstraction Layers), a presorted record merge primitive, and a geospatial processing primitive.

Regarding claim 15, it is a method-type claim having similar limitations as claim 2 above. Therefore, it is rejected under the same rationale above.

Regarding claim 16, the combination teaches wherein the primitive is one of a set of primitives, the set of primitives are implemented across the one or more compute nodes using concurrent threads executing up to thousands of concurrent tasks per compute node (GC's Abstract and [0052] This framework is typically intended to perform operation such as aggregations (MIN, AVG, etc.), but it is also able to alter the shape of data, or its structure, even being able to convert data from one data model to another, by using its powerful tools, such as aggregation pipelines or operations following the map-reduce paradigm, taking advantage of parallelism. Graham further teaches Col. 7, lines 28-42: A Hadoop cluster may provide both distributed processing capabilities and distributed data storage functionality. Distributed processing is typically provided by a MapReduce (MR) engine, which manages execution of MR jobs submitted by a job client (e.g., a user application). Hadoop MR is a software framework to facilitate developing applications that process vast amounts of data (e.g., multi-terabyte datasets) in parallel on large clusters (e.g., thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. An MR job determines how an input dataset should be split independent partitions (sometimes referred to as "splits") which can be processed by so-called "map tasks" in a parallel manner. The MR framework sorts the outputs of the map tasks, which are then input to so-called "reduce tasks." The final output of the job is based on the output of the reduce tasks.).

Regarding claim 17, GC teaches wherein data stored in the given row-based file format and given column-based file format are accessed according to requests that conform to a procedural object-oriented language ([0047] In another preferred embodiment of the invention, wherein the first or the second data pool is configured with at least two polyglot abstraction layers. Polyglot abstraction layers consist of several data representations from which the user can access the very same data, but in different ways. This polyglot approach provides two additional main benefits: hybrid queries, since the abstraction layers enable data-model coexistence, and, thus, users are able to retrieve data independently from which data models it is stored in and are able to choose from which abstraction layer to query from, minimizing the data model transformation costs and final format consciousness, meaning that the method of the invention prevents a data transformation overhead, becoming more efficient and more resource-saving, accommodating itself to the final data format needed by the user.).

Claims 5-8 are rejected under 35 U.S.C. 103 as being unpatentable over GC and Graham, as applied to claim 1, in further view of Parcha et al. (US 2024/0061717 A1).
Regarding claim 5, neither GC nor Graham expressly teaches wherein the first compute node of the set of compute nodes is a host machine on which a given number of virtual cores are configurable for execution. However, Parcha teaches wherein the first compute node of the set of compute nodes is a host machine on which a given number of virtual cores are configurable for execution ([0002] Various cloud-based services may be scaled, as needed, to handle variable processing demands of preset processes. These scalable cloud-based services provide increased processing power to accommodate intervals when the demands are high, while reducing the available processing power when demands are low. To accommodate the fluctuating demand, scalable services may dynamically adjust a quantity of virtual processors available to existing quantity of processes.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Parcha with the teachings of GC and Graham to scale the instances dynamically based on workload requirements. The modification would have been motivated by the desire of ensuring the number of processors is sufficient to meet processing demands, thereby maintaining processing throughput.

Regarding claim 6, Parcha teaches wherein an executor executes on a virtual core of the host machine ([0002]; the examiner notes that virtual processors are hosted in machines).

Regarding claim 7, Parcha teaches wherein multi-executor orchestration scales up one or more virtual cores on the first compute node as necessary to execute the workload ([0002]).

Regarding claim 8, Parcha teaches wherein multi-executor orchestration scales out to a second compute node when the given number of virtual cores in the first compute node are not sufficient to execute the workload ([0002]).

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over GC and Graham, as applied to claim 1, in further view of Srinivasan et al. (US 2020/0195571 A1).

Regarding claim 9, neither GC nor Graham explicitly teaches wherein the multi-execution orchestration splits the workload across a swarm of serverless functions associated with the cloud-based environment. However, Srinivasan teaches wherein the multi-execution orchestration splits the workload across a swarm of serverless functions associated with the cloud-based environment ([0041] In some cases, the execution environment 300 may be a specially defined computational system deployed in a cloud platform. In some cases, the parameters defining the execution environment may be specified in a manifest for cloud deployment… The framework describes a series of the serverless tasks controlled via scripts. The serverless tasks overlap in execution to maintain continuity across the tasks. The computational task in divided into chunks that may be handled by individual serverless tasks. Accordingly, a complex analytic process, such as those describe in this disclosure, may be divided into chunks and executed over one or more overlapping serverless tasks.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Srinivasan with the teachings of GC and Graham to split tasks and execute them using serverless functions. The modification would have been motivated by the desire of faster development and ease of scaling.

Regarding claim 19, it is a method-type claim having similar limitations as claim 9 above. Therefore, it is rejected under the same rationale above.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over GC and Graham, as applied to claim 1, in further view of Fan et al. (US 2022/0230227 A1).
Regarding claim 18, GC and Graham do not expressly teach wherein the procedural object-oriented language is GoLang, and wherein GoLang goroutines provide threads consuming less than ten (10) kilobytes of memory per thread to enable concurrent processing of data streams from the cloud storage without thread-switching overhead. However, Fan teaches wherein the procedural object-oriented language is GoLang, and wherein GoLang goroutines provide threads consuming less than ten (10) kilobytes of memory per thread to enable concurrent processing of data streams from the cloud storage without thread-switching overhead ([0130] The Golang-based POI service for the techniques disclosed herein will be described in details below; [0131] Go has goroutines that consume around 2 KB memory so that millions of goroutines can be spinned. Besides, goroutines come with channels to communicate safely between themselves and avoid the usage of mutex locking.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Fan with the teachings of GC and Graham to utilize GoLang, as it is very light-weighted and easy to start and stop, which makes GoLang outperform other languages in terms of stability as a backend language for various embodiments. See at least [0128].

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Graham et al. (US 11,016,946 B1) in further view of Lee et al. (US 2017/0244650 A1).

Regarding claim 20, Graham teaches a method associated with a cloud-based infrastructure comprising a plurality of compute nodes, cloud storage distinct from the plurality of compute nodes, and network connections between the plurality of compute nodes and the cloud storage (Col. 3, lines 23-54: The phrases "computer," "computing system," "computing environment," "processing platform," "data memory and storage system," and "data memory and storage system environment" as used herein with respect to various embodiments are intended to be broadly construed, so as to encompass, for example, private or public cloud computing or storage systems, or parts thereof, as well as other types of systems comprising distributed virtual infrastructure and those not comprising virtual infrastructure…As used herein, the term "storage device" refers to any non-volatile memory (NVM) device, including hard disk drives (HDDs), flash devices (e.g., NAND flash devices), and next generation NVM devices, any of which can be accessed locally and/or remotely (e.g., via a storage attached network (SAN)). The term "storage device" can also refer to a storage array comprising one or more storage devices.), the method comprising:

receiving event data to be processed by a first processing stage in the plurality of compute nodes, mapping operations on the event data using a plurality of executors of a set of executors (Col. 7, lines 28-43: A Hadoop cluster may provide both distributed processing capabilities and distributed data storage functionality. Distributed processing is typically provided by a MapReduce (MR) engine, which manages execution of MR jobs submitted by a job client (e.g., a user application). Hadoop MR is a software framework to facilitate developing applications that process vast amounts of data (e.g., multi-terabyte datasets) in parallel on large clusters (e.g., thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. An MR job determines how an input dataset should be split independent partitions (sometimes referred to as "splits") which can be processed by so-called "map tasks" in a parallel manner. The MR framework sorts the outputs of the map tasks, which are then input to so-called "reduce tasks." The final output of the job is based on the output of the reduce tasks.);

In lieu of transferring data between processing stages in the plurality of compute nodes, writing intermediate shuffle data generated by the plurality of executors directly to cloud object storage organized by one or more partition keys (Col. 7, lines 28-43: The MR framework sorts the outputs of the map tasks; Col. 7, lines 53-60: Typically, both a job's input and output are stored within a distributed file system. Intermediate results data (i.e., the output of individual map-reduce tasks) may be also be stored in a distributed file system. A single MR job can use multiple different file systems. For example, a job's input dataset could be read from an external HCFS, whereas the job's output data could be written to an HDFS instance local to the cluster 202.; Col. 5, line 65 through Col. 6, line 4: To track the location of object data across the plurality of storage devices 112 and nodes 102, the system 100 may include mapping between object keys and storage locations, referred to herein as the "primary index." In various embodiments, a primary index 104a is implemented using a key-value store 104 stored within the storage devices 112.); and

executing reduce operations by reading the intermediate shuffle data from the cloud object storage into a second processing stage distinct from the first processing stage (Col. 7, lines 28-43: The MR framework sorts the outputs of the map tasks, which are then input to so-called "reduce tasks." The final output of the job is based on the output of the reduce tasks; Col. 7, lines 53-60: Typically, both a job's input and output are stored within a distributed file system. Intermediate results data (i.e., the output of individual map-reduce tasks) may be also be stored in a distributed file system. A single MR job can use multiple different file systems. For example, a job's input dataset could be read from an external HCFS, whereas the job's output data could be written to an HDFS instance local to the cluster 202.);

wherein the cloud object storage serves as a persistent shuffle medium enabling job recovery without re-execution of completed mapping tasks (Col. 7, lines 28-60 discuss that output obtained from mapping tasks is sorted and held in storage for reduce tasks; therefore, the mapping tasks are not re-executed).

Graham teaches mapping, sorting and reducing, but does not explicitly teach shuffle. However, Lee teaches shuffle ([0006] A MapReduce processing process involves a map calculation operation, a reduce calculation operation, and a shuffle and sort operation (referred to as a "shuffle operation" below) of moving data to reduce map calculation results. A map calculator reads a record and calculates a record having a new key and value by filtering the read record or converting the read record into another value. The calculated record is referred to as intermediate data and is stored in a local disk of the map calculator. A reduce calculator groups result values output through a map process based on the new key and then outputs a result of executing an aggregation operation. A shuffle operation involves a process of dividing a key bandwidth through partitioning, sorting and storing a map calculation result in the local disk, and then transferring the map calculation result as input data of reduce calculators via a network.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Lee with the teachings of Graham, as it is well known in the art that MapReduce processing includes steps such as mapping, shuffling/sorting and reduce. The modification would have been motivated by the desire of combining known elements to yield predictable results.
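The claim 20 flow at issue (map output written to a persistent object store under partition keys, then read back by a distinct reduce stage, so a failed reducer never forces completed map tasks to re-run) can be simulated with a dict standing in for cloud object storage. This is a minimal sketch of the pattern, not the application's actual implementation; the key scheme and aggregation are invented for illustration:

```python
from collections import defaultdict

object_store = {}  # stand-in for cloud object storage


def map_stage(events):
    # First stage: partition map output by key and persist each
    # partition as an "object". Because the shuffle data survives the
    # mappers, a failed reduce stage can re-read it without
    # re-executing completed map tasks.
    partitions = defaultdict(list)
    for key, value in events:
        partitions[key].append(value)
    for key, values in partitions.items():
        object_store[f"shuffle/{key}"] = values


def reduce_stage():
    # Second, distinct stage: read the intermediate shuffle data back
    # from the store and aggregate per partition key.
    return {k.split("/", 1)[1]: sum(v) for k, v in object_store.items()}


map_stage([("a", 1), ("b", 2), ("a", 3)])
print(reduce_stage())  # {'a': 4, 'b': 2}
```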
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JORGE A CHU JOY-DAVILA, whose telephone number is (571) 270-0692. The examiner can normally be reached Monday-Friday, 6:00am-5:00pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Aimee J Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JORGE A CHU JOY-DAVILA/
Primary Examiner, Art Unit 2195

Prosecution Timeline

Jul 31, 2025 — Application Filed
Dec 19, 2025 — Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology:

- Patent 12602244 — Offloading Processing Tasks to Decoupled Accelerators for Increasing Performance in a System on a Chip (2y 5m to grant; granted Apr 14, 2026)
- Patent 12596565 — User Assigned Network Interface Queues (2y 5m to grant; granted Apr 07, 2026)
- Patent 12591821 — Dynamic Adjustment of Well Plan Schedules on Different Hierarchical Levels Based on Subsystems Achieving a Desired State (2y 5m to grant; granted Mar 31, 2026)
- Patent 12585490 — Migrating Virtual Machines While Performing Middlebox Service Operations at a PNIC (2y 5m to grant; granted Mar 24, 2026)
- Patent 12579065 — Lightweight Kernel Driver for Virtualized Storage (2y 5m to grant; granted Mar 17, 2026)

Study what changed to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 77% (99% with interview, a +37.3% lift)
Median Time to Grant: 3y 1m
PTA Risk: Low

Based on 408 resolved cases by this examiner. Grant probability derived from career allow rate.
