Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Effective Filing Date
The effective filing date is 11/23/2022.
Information Disclosure Statement
The information disclosure statement(s) submitted on 05/28/2024 is/are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement(s) is/are being considered by the examiner.
Status of Claims
The present application is being examined based on the claims filed on 02/03/2026.
Claims 1, 2, 4-11, and 13-23 are pending. Claims 1, 2, 4-11, and 13-23 are rejected.
Response to Arguments - 103
As noted in the interview conducted on 12/23/2025, the amendments overcome the prior art of record, and further search and consideration was required. Further search and consideration has since been conducted, and the amended limitations are taught by the Eldar reference. Refer to the updated claim mappings in this document.
Prior Art References
Short Name: Reference
Novak: Novak, J., Kasera, S.K. and Stutsman, R., 2020, October. Auto-scaling cloud-based memory-intensive applications. In 2020 IEEE 13th International Conference on Cloud Computing (CLOUD) (pp. 229-237). IEEE.
Crabtree: US 20210092160 A1 - DATA SET CREATION WITH CROWD-BASED REINFORCEMENT
Verbraeken: Verbraeken, J., Wolting, M., Katzy, J., Kloppenburg, J., Verbelen, T. and Rellermeyer, J.S., 2020. A survey on distributed machine learning. ACM Computing Surveys (CSUR), 53(2), pp.1-33.
Nikolaev: Nikolaev, R., 2019. A scalable, portable, and memory-efficient lock-free FIFO queue. arXiv preprint arXiv:1908.04511.
Eldar: US 20220374273 A1 - COMPUTING RESOURCE AUTOSCALING BASED ON PREDICTED METRIC BEHAVIOR
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 10, 11, 19, and 20-23 are rejected under 35 U.S.C. 103 as being unpatentable over Novak in view of Crabtree in further view of Verbraeken in further view of Eldar.
Claims 4-9, and 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Novak in view of Crabtree in further view of Verbraeken in further view of Eldar in further view of Nikolaev.
In reference to claim 1.
Novak teaches:
“1. An auto-scalable synthetic data generation system comprising a plurality of computers and a plurality of storage devices storing instructions that are operable, when executed by the computers, to cause the computers to perform operations comprising: maintaining a plurality of [synthetic data generator] replicas [that are each configured to generate synthetic training examples for training a machine learning model to perform a particular task];” (Novak Fig. 5, “Consumers”, Examiner notes that the consumers are the replicas)
[Image: media_image1.png (greyscale)]
“maintaining a plurality of [machine learning training] workers [that are each configured to obtain synthetic training examples generated by one or more of the synthetic data generator replicas and to use the synthetic training examples to concurrently perform operations required to update the machine learning model];” (Novak Fig. 5, “Producer”, Examiner notes that the figure only depicts one producer, but the reference notes that the same architecture could also have multiple producers pushing jobs to the queue.)
“in response to determining that the size associated with the queue of each of the one or more of the synthetic data generator replicas is below the threshold size, determining, by the autoscaler (Novak Fig. 5, “Controller”, The controller is the autoscaler), that a number of [synthetic data generator] replicas is insufficient to service a current demand level of the plurality of [machine learning training] workers (Novak 233, “A scale out event begins when the queue is growing (i.e., there are not enough resources to meet demand)”);”
“and in response to determining that the number of [synthetic data generator] replicas is insufficient, deploying, by the autoscaler, an additional [synthetic data generator] replica in the [synthetic data generation] system.” (Novak 233, “To change the system memory size, the controller selects a set of VMs to stop (set A) and creates a set of new VMs to launch (set B) to meet the computed memory size.”)
Crabtree teaches:
“[maintaining a plurality of] synthetic data generator [replicas] each synthetic data generator replica configured to generate synthetic training examples [and to store the synthetic training examples in a queue associated with the synthetic data generator replica] for training a machine learning model to perform a particular task;” (Crabtree [0056], “The synthetic data generator generates synthetic data”)
“[maintaining a plurality of] machine learning training [workers] that are each configured to obtain synthetic training examples generated by one or more of the synthetic data generator replicas [from a queue associated with each of the one or more of the synthetic data generator replicas] and to use the synthetic training examples [to concurrently perform operations required to update the machine learning model];” (Crabtree [0025], “a synthetic data generator may be used to produce additional training data from these curated high quality data sets.”)
“and to store the synthetic training examples in a queue associated with the synthetic data generator replica;” (Crabtree [0056], “Data from the synthetic data generator is fed into the verification queue.”, Crabtree further teaches pushing to the queue for additional iterations, Crabtree [0032], “Techniques and mechanisms described or referenced herein will sometimes be described in singular form for clarity. However, it should be appreciated that particular aspects may include multiple iterations of a technique or multiple instantiations of a mechanism unless noted otherwise.”)
Motivation to combine Novak, Crabtree.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Novak and Crabtree. Novak discloses a system for dynamically scaling arbitrary distributed systems. Crabtree discloses the generation of synthetic training data for machine learning algorithms. One would be motivated to combine these references because Novak provides an obvious way to scale up the system of Crabtree. Novak discusses a specific architecture that supports horizontally scaling arbitrary distributed systems and one of ordinary skill in the art would be motivated to combine Novak with Crabtree in order to increase the throughput of the system described in Crabtree. Further, MPEP 2143 sets forth the Supreme Court rationales for obviousness including: (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
Verbraeken teaches:
“[maintaining a plurality of machine learning training workers that are each configured to obtain synthetic training examples generated by one or more of the synthetic data generator replicas from a queue associated with each of the one or more of the synthetic data generator replicas and to use the synthetic training examples] to concurrently perform operations required to update the machine learning model;” (Verbraeken 30:5, “In the Model-Parallel approach, exact copies of the entire datasets are processed by the worker nodes that operate on different parts of the model. The model is therefore the aggregate of all model parts.”)
Motivation to combine Novak, Crabtree, Verbraeken.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Novak, Crabtree, and Verbraeken. Novak discloses a system for dynamically scaling arbitrary distributed systems. Verbraeken discloses model-parallelism wherein multiple compute nodes perform machine learning for updating a single model. One would be motivated to combine these references because Novak provides an obvious way to scale up the system of Verbraeken. Novak discusses a specific architecture that supports horizontally scaling arbitrary distributed systems and one of ordinary skill in the art would be motivated to combine Novak with Verbraeken in order to increase the throughput of the system described in Verbraeken. Further, MPEP 2143 sets forth the Supreme Court rationales for obviousness including: (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
Eldar teaches:
“determining, by an autoscaler of the synthetic data generation system, that a size of the queue associated with each of the one or more of the synthetic data generator replicas is below a threshold size;” (Eldar [0001], “Other solutions perform reactive autoscaling, where resource allocation is triggered based on detecting an increase in resource consumption”)
Motivation to combine Novak, Crabtree, Verbraeken, Eldar.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Novak, Crabtree, Verbraeken, and Eldar. Novak, Crabtree, and Verbraeken disclose a system for dynamically scaling arbitrary distributed systems. Eldar discloses a specific mechanism (reactive autoscaling) for scaling the same kinds of systems described in Novak, Crabtree, and Verbraeken. One would be motivated to combine these references because Eldar provides an obvious mechanism by which to perform the scaling in Novak, Crabtree, and Verbraeken. Further, MPEP 2143 sets forth the Supreme Court rationales for obviousness including: (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
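For illustration, the claimed reactive scale-out trigger (queue sizes falling below a threshold indicating that generator replicas cannot keep pace with the training workers, prompting the autoscaler to deploy an additional replica) can be sketched in Python. All names here are hypothetical and illustrative; this is not code from any cited reference.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class GeneratorReplica:
    # Queue of synthetic training examples awaiting consumption by workers.
    queue: deque = field(default_factory=deque)

def scale_out_needed(replicas, threshold):
    """Claimed trigger: the queue of each monitored replica has drained
    below the threshold, i.e. workers consume examples faster than the
    current replicas can generate them."""
    return all(len(r.queue) < threshold for r in replicas)

def autoscale(replicas, threshold):
    # If the trigger fires, the autoscaler deploys an additional replica.
    if scale_out_needed(replicas, threshold):
        replicas.append(GeneratorReplica())
    return replicas

replicas = [GeneratorReplica(deque(["ex1", "ex2"])), GeneratorReplica()]
replicas = autoscale(replicas, threshold=5)
print(len(replicas))  # 3: both queues were below the threshold, so one replica was added
```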
In reference to claim 2.
Novak teaches:
“2. The system of claim 1, wherein the operations further comprise: receiving, by a load balancer (Novak Fig. 5, “Job Queue or Load Balancer”) for the plurality of machine learning training workers, a request (Novak Fig. 5, “J” arrow from “Producer”, Examiner notes the J stands for job) [for synthetic training examples];”
“providing, by the load balancer, the data fetch request to the additional [synthetic data generator] replica deployed in the [synthetic data generation] system (Novak Fig. 5, “J” arrow pointing to “Consumers”, Examiner notes the J stands for job);”
“and obtaining, by at least one of the plurality of [machine learning training] workers, [synthetic training] examples generated by the additional [synthetic data generator] replica deployed in the distributed [synthetic data generation] system.” (Novak 232, “It supports a wide variety of applications including […] request/response (req/resp) web servers.”, Examiner notes that the request from the workers is sent to the replica(s) and in turn the worker(s) requesting receive a response)
Crabtree teaches:
“synthetic training examples”, “synthetic data generator”, “synthetic data generation”, “machine learning training”, “synthetic training” (Crabtree [0025], “a synthetic data generator may be used to produce additional training data from these curated high quality data sets.”)
In reference to claim 4.
Nikolaev teaches:
“4. The system of claim 1, wherein each synthetic data generator replica is configured to pause generation of synthetic training examples when the associated queue is filled.” (Nikolaev 4, “Note that enqueue does not need to check if a queue is full. It is only called when an available entry (out of n) exists.”, Examiner notes data is only enqueued if there is space available to store it. If enqueue is not being called, the queue is filled and data generation is paused.)
Motivation to combine Novak, Crabtree, Verbraeken, Eldar, Nikolaev.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Novak, Crabtree, Verbraeken, Eldar, and Nikolaev. Novak, Crabtree, Verbraeken, and Eldar disclose a system for dynamically scaling arbitrary distributed systems. Nikolaev discloses an efficient implementation of a first-in-first-out data structure (a queue). One would be motivated to combine these references because queueing is ubiquitous in distributed system architectures, and one of ordinary skill in the art architecting such a system would be motivated to increase the efficiency of the system by utilizing an efficient queue implementation. Further, MPEP 2143 sets forth the Supreme Court rationales for obviousness including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results.
In reference to claim 5.
Nikolaev teaches:
“5. The system of claim 4, wherein each synthetic data generator replica is configured to resume generation of synthetic training examples when the associated queue is no longer filled.” (Nikolaev 4, “Note that enqueue does not need to check if a queue is full. It is only called when an available entry (out of n) exists.”, Examiner notes data is only enqueued if there is space available to store it)
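The pause/resume behavior recited in claims 4 and 5 can be sketched as a bounded queue that rejects generation while full and accepts it again once a worker frees a slot. This is a simplified illustrative sketch (not the lock-free structure of Nikolaev); the class and method names are hypothetical.

```python
from collections import deque

class BoundedGeneratorQueue:
    """Bounded queue for a generator replica: generation pauses when the
    queue is filled and resumes when space becomes available again."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def generate_step(self, make_example):
        # Claim 4: pause generation while the queue is filled.
        if len(self.items) >= self.capacity:
            return False  # paused; nothing generated
        # Claim 5: resume generation as soon as space is available.
        self.items.append(make_example())
        return True

q = BoundedGeneratorQueue(capacity=1)
assert q.generate_step(lambda: "ex1") is True   # space available: generate
assert q.generate_step(lambda: "ex2") is False  # queue filled: paused
q.items.popleft()                               # a worker consumes an example
assert q.generate_step(lambda: "ex3") is True   # space freed: resumed
```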
In reference to claim 6.
Novak teaches:
“6. The system of claim 4, wherein determining, by the autoscaler of the synthetic data generation system, that a number of synthetic data generator replicas is insufficient to service a current demand level of the plurality of machine learning training workers comprises computing a utilization metric based on the fullness of queues for the synthetic data generation replicas.” (Novak TABLE I and Algorithm 1, Examiner notes that the utilization metric maps to the scaling metric p in the reference and it is tied to the amount of memory available across the system. Under the broadest reasonable interpretation of the queue data structure being claimed, a utilization metric for the system memory would also be a metric on the fullness of the queues being utilized.)
[Image: media_image2.png (greyscale)]
[Image: media_image3.png (greyscale)]
In reference to claim 7.
Novak teaches:
“7. The system of claim 4, wherein determining, by the autoscaler of the synthetic data generation system, that a number of synthetic data generator replicas is insufficient to service a current demand level of the plurality of machine learning training workers comprises computing a target number of synthetic data generator replicas based on queue sizes.” (Novak 233, “To change the system memory size, the controller selects a set of VMs to stop (set A) and creates a set of new VMs to launch (set B) to meet the computed memory size.”)
In reference to claim 8.
Novak teaches:
“8. The system of claim 7, wherein computing the target number of synthetic data generator replicas comprises computing a ratio of a target queue size to an observed queue size multiplied by a number of synthetic data generation replicas.” (Novak 233, “We treat this as a knapsack problem with weights and values of VMs equal to the memory size of the VM”, Examiner notes that the claimed formula for computing the number of target replicas is a trivial formulation of the knapsack problem wherein all VMs have the same amount of memory. I.e., the target number of VMs is equal to the current number of VMs multiplied by some ratio of desired utilization and actual utilization of memory.)
In reference to claim 9.
Novak teaches:
“9. The system of claim 8, wherein the target number of synthetic data generator replicas target num replicas is given by: target num replicas = ceil((target queue size / observed queue size) * num replicas),” (Novak 233, “We treat this as a knapsack problem with weights and values of VMs equal to the memory size of the VM”, Examiner notes that the claimed formula for computing the number of target replicas is a trivial formulation of the knapsack problem wherein all VMs have the same amount of memory. I.e., the target number of VMs is equal to the current number of VMs multiplied by some ratio of desired utilization and actual utilization of memory.)
“where target queue size is a predetermined queue fullness metric (Novak TABLE I, Examiner notes that l is a flag that gets set if available VM memory (i.e. queue size) falls below a certain limit. This limit is the predetermined queue fullness metric.), observed queue size is a metric representing a current level of queue fullness (Novak TABLE I, “mi, mi+1 - existing, new memory in system”), and num replicas is the current number of synthetic data generation replicas (Novak 233, “To change the system memory size, the controller selects a set of VMs to stop (set A) and creates a set of new VMs to launch (set B) to meet the computed memory size.”, Examiner notes that the target number of replicas is the cardinality of set B and the current number of replicas is the cardinality of set A.).”
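For illustration, the formula recited in claim 9 can be computed directly; the function name below is hypothetical and the numbers are arbitrary example values, not figures from the claims or references.

```python
import math

def target_num_replicas(target_queue_size, observed_queue_size, num_replicas):
    """Claim 9 formula:
    target num replicas = ceil((target queue size / observed queue size) * num replicas)

    If the observed queue is smaller than the target (workers are draining
    examples faster than they are generated), the ratio exceeds 1 and the
    replica count scales up; ceil rounds to a whole number of replicas.
    """
    return math.ceil((target_queue_size / observed_queue_size) * num_replicas)

print(target_num_replicas(100, 40, 4))   # ceil(2.5 * 4) = 10: scale up
print(target_num_replicas(100, 100, 4))  # ceil(1.0 * 4) = 4: no change
```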
In reference to claim 21.
Verbraeken teaches:
“21. (New) The system of claim 1, wherein the synthetic training examples are synthetic images, and wherein the machine learning model is an image processing model configured to process an image to generate an output that characterizes the image.” (Verbraeken 30:7, “ML algorithms can be used for a wide variety of purposes, such as classifying an image or predicting the probability of an event.”)
The following claims are substantially similar: 10 and 19 to 1; 11 and 20 to 2; 13 to 4; 14 to 5; 15 to 6; 16 to 7; 17 to 8; 18 to 9; and 22 and 23 to 21. Thus, the claims are rejected using the same prior art.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CODY RYAN GILLESPIE whose telephone number is (571)272-1331. The examiner can normally be reached M-F, 8 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker A. Lamardo, can be reached at 571-270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CODY RYAN GILLESPIE/Examiner, Art Unit 2147
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147