Last updated: May 29, 2026

Application No. 18/193,762

METHOD AND SYSTEM FOR GENERATING AND MANAGING MACHINE LEARNING MODEL TRAINING DATA STREAMS

Non-Final OA §103

Filed

Mar 31, 2023

Examiner

SEYE, ABDOU K

Art Unit

2198

Tech Center

2100 — Computer Architecture & Software

Assignee

DELL PRODUCTS, L.P.

OA Round

1 (Non-Final)

Interview Optional

+27.5% interview lift. An interview has already been conducted in this application's prosecution history. This examiner has an 82% grant rate with a +27.5% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.

Based on 583 resolved cases, 2023–2026

Examiner Intelligence

SEYE, ABDOU K

Grants 82% — above average

Career Allowance Rate

480 granted / 583 resolved

+27.3% vs TC avg

Strong +28% interview lift

+27.5%

Interview Lift

resolved cases with vs. without an interview

Typical timeline

3y 3m

Avg Prosecution

17 currently pending

Career history

622

Total Applications

across all art units

Statute-Specific Performance

§101

5.6%

-34.4% vs TC avg

§103

89.7%

+49.7% vs TC avg

§102

1.4%

-38.6% vs TC avg

§112

2.0%

-38.0% vs TC avg

Tech Center averages are estimates. Based on career data from 583 resolved cases.

Office Action

§103

DETAILED ACTION
Statement of claims
The present application includes:
Claims 1-20 remain pending in the application. Claims 1-20 are being considered on the merits.
 
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 07/30/2024. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Krishna et al. (US 2022/0245505, “Krishna” hereinafter) in view of Nikoli Dryden et al. (“Clairvoyant Prefetching for Distributed Machine Learning I/O”, 2021-11-14, “Nikoli” hereinafter).

As to claim 1, Krishna teaches a method for managing training data (e.g., see FIG. 4B, para 68, “data manager 338 in communication with a training container”, “training data”), comprising:
obtaining a first stream request, wherein the first stream request comprises a stream creation request and a stream specification (e.g., see FIG. 3, para 47, “the ingress request 302 is generated in response to input received from a user via user interface 205”, “an ingress request 302, which identifies a job payload (e.g., a hyper-parameter optimization processing task)”, “an HTTP POST request that includes the job payload”; and see FIG. 6, para 72, “request is received that identifies a payload for execution”, “a plurality of tasks are generated based on the payload”.
Thus, the “request” represents the first stream request, the “payload” represents the stream specification, and “ingress request 302 is generated” coupled with “tasks are generated” includes the stream creation request);
in response to obtaining the stream creation request (e.g., para 43, “ingress request 302 is generated”):
generating a new stream entry in a stream database (e.g., “306”, FIG. 3; see FIG. 3, para 47, wherein “Entry point service 304 receives ingress request 302”, “a checksum based on the received job payload”, “stores checkpoint data within object store 324”, and “Entry point service 304 places the job payload in processing queue 306.”; and “checkpoint data may include a checksum (e.g., an ID), a timestamp” in para 52.
Thus, the “processing queue 306” coupled with “object store 324” includes the stream database, and the “checksum”, “ID”, and “timestamp” include the new stream entry);
loading training data specified by the stream specification into a cache (e.g., “data Queue”, FIG. 4B, “training data Stream”, FIG. 4B, and para 68, wherein “receives the training data stream from object store 324”, FIG. 4B, “456”. Thus, the “data queue” includes the cache);
generating augmented training data using the training data and the stream specification (e.g., FIG. 4B, “460”, para 68, “the shuffled training data to augmentation 460. Augmentation 460 augments the shuffled training data”);
generating a mini-batch using the augmented training data and the stream specification (e.g., para 69, “Batching module 458 batches the training data into batch sizes”. Thus, one of the “batch sizes” includes a mini-batch);
creating a mini-batch queue and a stream endpoint (e.g., “470 training batch queue”, FIG. 4B, para 69, “batch sizes”, “batches) within data queue 456”, “the batches within training batch queue 470”, “processing unit, such as GPU 472”, and “checkpoint data may include a checksum (e.g., an ID)” in para 52.
Thus, “470 training batch queue” represents a mini-batch queue, and the “checksum (e.g., an ID)” coupled with “processing unit, such as GPU 472” includes a stream endpoint); and
training the mini-batch 
However, Krishna does not teach a mini-batch sequence or a mini-batch sequence queue.
Nikoli teaches generating a mini-batch sequence using the augmented training data (e.g., “Mini-batch 1”, “Mini-batch 2”, “Mini-batch 3”, “Mini-batch 4”, Figure 2, and “2.2 Machine Learning I/O Frameworks”, “data augmentation, and finally collating them into a mini-batch for training (see Fig. 2)”. The “Mini-batch 1”, “Mini-batch 2”, “Mini-batch 3”, “Mini-batch 4” include the mini-batch sequence); creating a mini-batch sequence queue (e.g., “Staging buffer”, Figure 5 on page 5; also see FIG. 6, page 6, “the staging buffer, which is filled in a circular manner”, “a producer/consumer queue” for “Externa data augmentation”, “data augmentation, and finally collating them into a mini-batch for training”. Thus, the “Staging buffer” coupled with the “producer/consumer queue” includes the mini-batch sequence queue), wherein the mini-batch sequence is used by a training environment to train a machine learning model (e.g., see page 2, “2.2 Machine Learning I/O Frameworks I/O for training deep neural networks”, “reading samples from storage”; see Figure 6).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the method of Krishna by adopting the teachings of Nikoli to include generating a mini-batch sequence using the augmented training data and the stream specification; creating a mini-batch sequence queue and a stream endpoint; and streaming the mini-batch sequence using the mini-batch sequence queue and the stream endpoint, wherein the mini-batch sequence is used by a training environment to train a machine learning model, since doing so “reduces I/O times and improves end to-end training” (see Nikoli, abstract) or provides a “powerful interface that can be used in existing training pipelines to improve their I/O performance and reduce overall runtime” (see Nikoli, conclusion).
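
For orientation, the sequence of steps recited in claim 1 (load into a cache, augment, build a mini-batch sequence, place it on a queue) can be sketched in a few lines of Python. This is only an illustrative reading of the claim language; it is not the applicant's implementation and does not reproduce Krishna or Nikoli. The function names, the `spec` keys, and the use of a plain `queue.Queue` in place of a stream endpoint are all assumptions made for the example.

```python
import queue
import random
from typing import Dict, List


def load_into_cache(spec: Dict, cache: Dict[str, List[float]]) -> List[float]:
    """Load the data named by the (hypothetical) spec into an in-memory cache."""
    # Stand-in for reading real training data from storage.
    data = [float(i) for i in range(spec["num_examples"])]
    cache[spec["name"]] = data
    return data


def augment(data: List[float], spec: Dict) -> List[float]:
    """Keep the original examples and append shifted copies as augmented examples."""
    return data + [x + spec["shift"] for x in data]


def make_mini_batch_sequence(data: List[float], spec: Dict) -> List[List[float]]:
    """Shuffle once per epoch and cut the shuffled data into mini-batches."""
    rng = random.Random(spec["seed"])
    size = spec["batch_size"]
    sequence: List[List[float]] = []
    for _ in range(spec["epochs"]):
        shuffled = data[:]
        rng.shuffle(shuffled)
        sequence.extend(shuffled[i:i + size] for i in range(0, len(shuffled), size))
    return sequence


def create_stream(spec: Dict, cache: Dict[str, List[float]]) -> "queue.Queue":
    """Run the steps in order and return the queue a trainer would read from."""
    data = load_into_cache(spec, cache)
    augmented = augment(data, spec)
    stream_queue: "queue.Queue" = queue.Queue()
    for batch in make_mini_batch_sequence(augmented, spec):
        stream_queue.put(batch)
    return stream_queue


# Hypothetical stream specification: 4 examples, doubled by augmentation to 8.
spec = {"name": "demo", "num_examples": 4, "shift": 0.5,
        "batch_size": 3, "epochs": 2, "seed": 0}
cache: Dict[str, List[float]] = {}
stream_queue = create_stream(spec, cache)
```

With 8 augmented examples and a batch size of 3, each epoch yields three mini-batches, so the queue holds six mini-batches for the two epochs. A training loop would consume them with `stream_queue.get()`.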

As to claim 2, Krishna does not explicitly teach wherein the augmented training data comprises training data examples of the training data and additional augmented training data examples. However, Nikoli teaches wherein the augmented training data comprises training data examples of the training data and additional augmented training data examples (e.g., see page 2, “reading samples from storage”, “data augmentation, and finally collating them into a mini-batch for training (see Fig. 2)”.
Thus, the “samples” represent the examples).

As to claim 3, Krishna does not teach wherein the mini-batch sequence comprises: a plurality of mini-batches; end of epoch messages; and an end of stream message. However, Nikoli teaches wherein the mini-batch sequence comprises: a plurality of mini-batches; end of epoch messages; and an end of stream message (e.g., page 4, wherein “The mini-batch size is 𝐵 and there are 𝐸 epochs.”, “a batch 𝐵ℎ ⊆ {1, . . . , 𝐹 }”, “local batch 𝐵ℎ,𝑖 ⊆ 𝐵ℎ. We write 𝑏𝑖 = |𝐵ℎ,𝑖 |”, and “Access stream 𝑅 = (⋯, 7, 4, 5, 8,⋯)”, Figure 5. Thus, wherein the mini-batch sequence comprises: a plurality of mini-batches; end of epoch messages; and an end of stream message would have been inherent).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the method of Krishna by adopting the teachings of Nikoli such that the mini-batch sequence comprises: a plurality of mini-batches; end of epoch messages; and an end of stream message, since doing so “reduces I/O times and improves end to-end training” (see Nikoli, abstract) or provides a “powerful interface that can be used in existing training pipelines to improve their I/O performance and reduce overall runtime” (see Nikoli, conclusion).
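
To make the claim 3 limitation concrete, the following is a minimal Python sketch of a mini-batch sequence that interleaves mini-batches with explicit end-of-epoch messages and closes with a single end-of-stream message. It illustrates the claim language only and takes no position on whether either reference discloses such messages. The marker strings `END_OF_EPOCH` and `END_OF_STREAM` and the function name are hypothetical.

```python
from typing import Iterator, List, Union

# Hypothetical marker values; the application does not specify a message format.
END_OF_EPOCH = "END_OF_EPOCH"
END_OF_STREAM = "END_OF_STREAM"

Message = Union[List[int], str]


def mini_batch_sequence(examples: List[int], batch_size: int,
                        epochs: int) -> Iterator[Message]:
    """Yield mini-batches, one marker after each epoch, then one final marker."""
    for _ in range(epochs):
        for start in range(0, len(examples), batch_size):
            yield examples[start:start + batch_size]
        yield END_OF_EPOCH
    yield END_OF_STREAM


messages = list(mini_batch_sequence([1, 2, 3, 4, 5], batch_size=2, epochs=2))
```

A consumer can distinguish the three message kinds by type and value: lists are mini-batches, and the two marker strings signal epoch and stream boundaries.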

As to claim 4, Krishna does not teach wherein a mini-batch of the mini-batch sequence comprises a randomly sampled portion of at least one of the augmented training data and the training data. However, Nikoli teaches wherein a mini-batch of the mini-batch sequence comprises a randomly sampled portion of at least one of the augmented training data and the training data (e.g., see page 4, “Random aggregate read throughput of the PFS, as a function of the number of readers 𝛾”, Figure 6, “RandomSampler”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the method of Krishna by adopting the teachings of Nikoli such that a mini-batch of the mini-batch sequence comprises a randomly sampled portion of at least one of the augmented training data and the training data, since doing so “reduces I/O times and improves end to-end training” (see Nikoli, abstract) or provides a “powerful interface that can be used in existing training pipelines to improve their I/O performance and reduce overall runtime” (see Nikoli, conclusion).


As to claim 5, Krishna teaches wherein the stream entry comprises: a stream identifier; the stream specification; and a stream status (e.g., see FIG. 3, para 66 and 68, “data”, “data stream”, and “a job (e.g., payload”, and “status information” in para 55 and 56).

As to claim 6, Krishna teaches wherein the stream specification comprises: stream metadata associated with the stream; training data access information associated with the training data; mini-batch parameters; and augmentation parameters (e.g., para 68, “training data stream”, “training data”, “Augmentation 460 augments the shuffled training data”, “the shuffled training data to batching model 458”, “the training data into batch sizes”, “batches”).

As to claim 7, Krishna teaches wherein the method further comprises: obtaining a second stream request, wherein the second stream request comprises a stream status request and a stream identifier (e.g., para 56, “obtain the status information” for “requests after the failure” in para 65); in response to obtaining the second request: obtaining a stream status from a stream entry in the stream database; and providing the stream status to a client associated with the second stream request (e.g., para 56, “A user may view the job status”, “displayed via a user interface 205.”).
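
The stream entry of claim 5 and the status request of claim 7 amount to a keyed record and a lookup. A minimal Python sketch, under the assumption that the stream database can be modeled as an in-memory dictionary, is shown below. The class names, field names, and the `"created"` status value are hypothetical and are not taken from the application or from Krishna.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StreamEntry:
    """One record per stream: identifier, specification, and status (claim 5)."""
    stream_id: str
    specification: Dict[str, object]
    status: str = "created"  # hypothetical status value


class StreamDatabase:
    """In-memory stand-in for the claimed stream database."""

    def __init__(self) -> None:
        self._entries: Dict[str, StreamEntry] = {}

    def add(self, entry: StreamEntry) -> None:
        self._entries[entry.stream_id] = entry

    def status_of(self, stream_id: str) -> Optional[str]:
        """Answer a status request for a stream identifier (claim 7)."""
        entry = self._entries.get(stream_id)
        return entry.status if entry is not None else None


db = StreamDatabase()
db.add(StreamEntry("stream-1", {"batch_size": 32}))
status = db.status_of("stream-1")   # status of a known stream
missing = db.status_of("stream-2")  # unknown identifier yields None
```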

As to claim 8, Krishna further teaches wherein the method further comprises: obtaining a second stream request, wherein the second stream request comprises a duplicate stream request and a parent stream identifier associated with a parent stream; in response to obtaining the second request: creating a new stream entry associated with the parent stream in the stream database; creating a new stream endpoint (e.g., para 47, “Entry point service 304 receives ingress request 302, and validates ingress request 302, such as by verifying a checksum.”, “checksum to a received checksum”, “checksums”, and “mirrored (e.g., duplicated)” in para 49. Thus, obtaining a second stream request, wherein the second stream request comprises a duplicate stream request and a parent stream identifier associated with a parent stream; in response to obtaining the second request: creating a new stream entry associated with the parent stream in the stream database; and creating a new stream endpoint would have been inherent). However, Krishna does not teach regenerating a mini-batch sequence associated with the parent stream; and streaming the mini-batch sequence using the mini-batch sequence queue and the stream endpoint. Nikoli teaches regenerating a mini-batch sequence associated with the parent stream; and streaming the mini-batch sequence using the mini-batch sequence queue and the stream endpoint (see rejection of claim 1 above).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the method of Krishna by adopting the teachings of Nikoli to include obtaining a second stream request, wherein the second stream request comprises a duplicate stream request and a parent stream identifier associated with a parent stream; in response to obtaining the second request: creating a new stream entry associated with the parent stream in the stream database; regenerating a mini-batch sequence associated with the parent stream; creating a new stream endpoint; and streaming the mini-batch sequence using the mini-batch sequence queue and the stream endpoint, since doing so “reduces I/O times and improves end to-end training” (see Nikoli, abstract) or provides a “powerful interface that can be used in existing training pipelines to improve their I/O performance and reduce overall runtime” (see Nikoli, conclusion).


As to claim 9, Krishna teaches wherein the method further comprises: obtaining a second stream request, wherein the second stream request comprises a stream save request and a stream identifier associated with the stream (e.g., see FIG. 4B, para 68 and 69, wherein “receives the training data stream from object store 324,”, “stores the batched training data (i.e., batches) within data queue 456.”. Thus, obtaining a second stream request, wherein the second stream request comprises a stream save request and a stream identifier associated with the stream); in response to obtaining the second request: saving entries associated with the stream in the stream database, a training data database (e.g., “456”, FIG. 4B), and a mini-batch database in a log file (e.g., “470”, FIG. 4B); and storing the log file in a storage (e.g., para 69, “The API client 468 stores the batches within training batch queue 470. A processing unit, such as GPU 472, obtains the batches from the training batch queue, and trains a machine learning model with the batches. Although only one GPU 472 is illustrated, a worker pod 332, 334 may execute on multiple GPUs”).

As to claim 10, Krishna further teaches wherein the method further comprises: obtaining a third stream request, wherein the third stream request comprises a restore stream request and the stream identifier associated with the stream; in response to obtaining the third request: creating a new stream entry associated with the stream in the stream database; obtaining the log file from the storage (e.g., para 53, “Assuming the processing task is interrupted (e.g., fails), a worker pod 332, 334 that is reassigned the same processing task may obtain the checkpoint data from the object store 324, and determine where in a given training batch to begin applying the processing task”. The “reassigned the same processing task” includes a restore stream request); and creating a stream endpoint (see rejection of claim 1 above). However, Krishna does not teach regenerating the mini-batch sequence associated with the stream using the log file; creating a mini-batch sequence queue; and streaming the mini-batch sequence using the mini-batch sequence queue and the stream endpoint. Nikoli teaches regenerating the mini-batch sequence associated with the stream using the log file; creating a mini-batch sequence queue and a stream endpoint; and streaming the mini-batch sequence using the mini-batch sequence queue (see rejection of claim 1 above).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the method of Krishna by adopting the teachings of Nikoli to include obtaining a third stream request, wherein the third stream request comprises a restore stream request and the stream identifier associated with the stream; in response to obtaining the third request: creating a new stream entry associated with the stream in the stream database; obtaining the log file from the storage; regenerating the mini-batch sequence associated with the stream using the log file; creating a mini-batch sequence queue and a stream endpoint; and streaming the mini-batch sequence using the mini-batch sequence queue and the stream endpoint, since doing so “reduces I/O times and improves end to-end training” (see Nikoli, abstract) or provides a “powerful interface that can be used in existing training pipelines to improve their I/O performance and reduce overall runtime” (see Nikoli, conclusion).
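
The save and restore limitations of claims 9 and 10 can be pictured as writing a stream's state to a log file and later rebuilding the same mini-batch sequence from that file. The Python sketch below is one possible reading, assuming a JSON log on local disk. The file layout, key names, and function names are hypothetical, and the sketch is not drawn from the application or from either reference.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_stream(path: str, entry: Dict[str, str],
                mini_batches: List[List[int]]) -> None:
    """Write the stream entry and its mini-batches to a single log file."""
    with open(path, "w", encoding="utf-8") as log_file:
        json.dump({"entry": entry, "mini_batches": mini_batches}, log_file)


def restore_stream(path: str) -> List[List[int]]:
    """Read the log file back and regenerate the mini-batch sequence."""
    with open(path, "r", encoding="utf-8") as log_file:
        log = json.load(log_file)
    return log["mini_batches"]


# A temporary directory stands in for the claimed storage.
log_path = os.path.join(tempfile.mkdtemp(), "stream-1.log.json")
original = [[1, 2], [3, 4], [5]]
save_stream(log_path, {"stream_id": "stream-1", "status": "saved"}, original)
restored = restore_stream(log_path)
```

The restored sequence matches the saved one in both content and order, which is the property a restored stream would need in order to resume training deterministically.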

As to claim 11, Krishna further teaches obtaining a second stream request, wherein the second stream request comprises a stream termination request and a stream identifier associated with the stream; in response to obtaining the second request: deleting the stream endpoint and the mini-batch queue associated with the stream; delete cached data associated with the stream; and updating a stream status to indicate that the stream is terminated (e.g., para 5, “when the task is interrupted, from a last “checkpoint,” rather than starting the task from the beginning. The master, worker pods, and work and results queues may be deleted upon completion of the job” and “the completion of a job (e.g., payload has been completely processed), completion manager 312 deletes master 316 and the associated worker pods 332, 334, as well as the work queue 320 and the results queue 322.” in para 55. Thus, “deleted upon completion of the job” includes deleting the stream endpoint and the mini-batch queue associated with the stream). However, Krishna does not teach a mini-batch sequence queue. Nikoli teaches a mini-batch sequence queue (see rejection of claim 1 above).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the method of Krishna by adopting the teachings of Nikoli to include obtaining a second stream request, wherein the second stream request comprises a stream termination request and a stream identifier associated with the stream; in response to obtaining the second request: deleting the stream endpoint and the mini-batch sequence queue associated with the stream; delete cached data associated with the stream; and updating a stream status to indicate that the stream is terminated, since doing so “reduces I/O times and improves end to-end training” (see Nikoli, abstract) or provides a “powerful interface that can be used in existing training pipelines to improve their I/O performance and reduce overall runtime” (see Nikoli, conclusion).


As to claim 12, see rejection of claim 1 above. Krishna further teaches a system for managing training data, comprising: a client; and a training data stream manager (TDSM), comprising a processor and memory, programmed (see FIG. 4B).

As to claims 13-16, see rejection of claims 2-5 above.

As to claim 17, see rejection of claim 1 above. Krishna further teaches a non-transitory computer readable medium comprising computer readable program code, which when executed by a computer processor enables the computer processor to (e.g., see FIG. 2).

As to claims 18-20, see rejection of claims 2-4 above. 
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Cmielowski et al. discloses a continuous machine learning system that includes a data generator module, a pipeline search module, a pipeline refinement module, and a pipeline training module. The data generator module obtains raw training data defining a total data size and generates a plurality of data batches from the raw training data. The pipeline search module obtains an initial data batch from among the plurality of data batches and determines a best machine learning model pipeline among a plurality of machine learning model pipelines based on the initial data batch. The pipeline refinement module receives the best machine learning model pipeline and refines the best machine learning model pipeline to generate a refined pipeline that consumes the plurality of data batches. The pipeline training module incrementally trains the refined pipeline using remaining data batches among the plurality of data batches generated after the initial data batch.

Chen et al. (US 11,113,244) discloses an integrated data pipeline that can take advantage of a streaming service, which can handle tasks such as automated redelivery, as well as a processing service, which can allocate workers on a task- or event-specific basis. Event data is aggregated and compressed for delivery by the streaming service. The streaming service can deliver the data asynchronously to the processing service, which can disaggregate and decompress the data to obtain the original data records. The type of event for each record can be determined to determine whether the data should be processed using online and/or offline processing. For online processing the appropriate fields are determined and data extracted to be passed to the online processing services. For offline processing the record data is concatenated sequentially into mini-batches, then compacted into larger batch files that are stored for subsequent offline processing.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDOU K SEYE whose telephone number is (571)270-1062. The examiner can normally be reached M-F 9-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Vital, can be reached at 571-272-4215. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ABDOU K SEYE/Examiner, Art Unit 2198         


/PIERRE VITAL/Supervisory Patent Examiner, Art Unit 2198

Prosecution Timeline

Mar 31, 2023

Application Filed

Jan 15, 2026

Non-Final Rejection mailed — §103

Apr 06, 2026

Interview Requested

Apr 10, 2026

Examiner Interview Summary

Apr 10, 2026

Applicant Interview (Telephonic)

Apr 13, 2026

Response Filed

Precedent Cases

Applications granted by this same examiner with similar technology

18/405,550

Patent 12639140

REAL-TIME DATA PROCESSING PIPELINE AND PACING CONTROL SYSTEMS AND METHODS

2y 4m to grant Granted May 26, 2026

17/392,297

Patent 12632272

ADAPTIVE VIRTUAL DESKTOP SESSION PLACEMENT ON HOST SERVERS VIA USER LOGOFF PREDICTION

4y 9m to grant Granted May 19, 2026

18/610,083

Patent 12598527

Real-Time Any-G SON

2y 0m to grant Granted Apr 07, 2026

17/683,713

Patent 12587456

MACHINE LEARNING BASED EVENT MONITORING

4y 0m to grant Granted Mar 24, 2026

19/171,788

Patent 12585512

CUSTOMIZED SOCKET APPLICATION PROGRAMMING INTERFACE FUNCTIONS

11m to grant Granted Mar 24, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

82%

Grant Probability

99%

With Interview (+27.5%)

3y 3m (~1m remaining)

Median Time to Grant

Low

PTA Risk

Based on 583 resolved cases by this examiner. Grant probability derived from career allowance rate.
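
As a quick consistency check, the 82% figure matches the counts reported earlier on this page (480 granted out of 583 resolved). The short Python snippet below reproduces that arithmetic. The page does not state how the 99% "With Interview" figure is derived; since 82% plus 27.5 points exceeds 100%, it appears to be capped, but that is an assumption and is not computed here.

```python
# Counts as reported on this page.
granted = 480
resolved = 583

# Career allowance rate as a fraction of resolved cases.
allowance_rate = granted / resolved

# Rounded to a whole percentage, as displayed in the dashboard.
allowance_percent = round(allowance_rate * 100)
print(f"Allowance rate: {allowance_rate:.1%}")
```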