DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on November 10, 2025, has been entered.
In response to the amendment filed on November 10, 2025, claims 1-21 are now pending for examination in the application.
Response to Arguments
This Office action is in response to the amendment filed 11/10/2025. In this action, claim(s) 1-2, 4-6, 8-10, 12-16, and 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Okamoto (US Pub. No. 20220083914) and Serebryakov et al. (US Pub. No. 20220067577) in further view of Lambert et al. (US Pub. No. 20210334630). The Serebryakov et al. reference has been added to address the amendment of reallocating, based on sizes of the portion of the data files loaded to the multiple processors of a same processor group, among the plural processor groups, the loaded corresponding portion of the plurality of data files to the same processor group to reduce a deviation in size of the corresponding portion of the plurality of data files.
Applicant’s arguments:
In regards to claim 1 on Pages 9-10, applicant argues “performing the distributed training of a neural network model using a result of the reallocating and the multiple processors corresponding to a set of processors in a same server, such that the multiple processors only exchange the loaded data files with each other within the same server and are refrained from communicating with processors in a different server among the plurality of processors, thereby achieving a reduction in cross-server communication overhead. Accordingly, Applicants respectfully submit that the above-noted claimed features of independent claim 1 are not, and/or would/could not be, practically performed in the human mind and/or correspond to mental process or manual activities,” as recited in claim 1.
Examiner’s Reply:
The claims have been evaluated as a whole, and when considered in their entirety they still amount to distributing data in a neural network using a size determination and allocating/reallocating data. The additional element of model training using multiple processors does not add meaningful limitations beyond the abstract idea.
Applicant’s arguments:
In regards to claim 1 on Page 12, applicant argues “That is, the claims are directed to a specific implementation for improvements to technology or technical field of performing the distributed training of a neural network model using a result of the reallocating and the multiple processors corresponding to a set of processors in a same server, such that the multiple processors only exchange the loaded data files with each other within the same server and are refrained from communicating with processors in a different server among the plurality of processors, thereby achieving a reduction in cross-server communication overhead, as also further discussed below,” as recited in claim 1.
Examiner’s Reply:
Choosing how to efficiently train a neural network is not a technological improvement. The claims are silent with respect to any new training technique. The claims merely determine how and when to distribute data for processing. This determination and division of data files is a computer-implemented abstract mental process.
Applicant’s arguments:
In regards to claim 1 on Page 14, applicant argues “Further, since the difference in computation time required for each processor to process its respective data file is small, the invention is capable of reducing synchronization overhead that would otherwise occur while waiting for other processors to complete their operations,” as recited in claim 1.
Examiner’s Reply:
Machine learning (e.g., training a neural network) is well-understood, routine, and conventional. The additional elements merely allow a user to determine the most efficient way to train a neural network given a certain amount of processing resources.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-21 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. Claims 1 and 15 contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. There is no support for “such that the multiple processors only exchange the loaded data files with each other within the same server and are refrained from communicating with processors in a different server among the plurality of processors, thereby achieving a reduction in cross-server communication overhead ….”
Dependent claims 2-14 and 16-21 are also rejected for inheriting the deficiencies of the independent claims from which they depend.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or, for pre-AIA applications, the applicant regards as the invention.
Claim(s) 1 and 15 recite “such that the multiple processors only exchange the loaded data files with each other within the same server and are refrained from communicating with processors in a different server among the plurality of processors, thereby achieving a reduction in cross-server communication overhead,” which is an intended result, thus rendering the claims indefinite.
Claims 2-14 and 16-21 are also rejected for incorporating the same indefiniteness of their respective base claims.
Claim(s) 1 and 15 recite “reallocating, based on sizes of the portion of the data files loaded to the multiple processors of a same processor group, among the plural processor groups, the loaded corresponding portion of the plurality of data files to the same processor group to reduce a deviation in size of the corresponding portion of the plurality of data files”. The term “to reduce a deviation” is merely an intended use as recited in the claim.
Claims 2-14 and 16-21 are also rejected for incorporating the same indefiniteness of their respective base claims.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
The judicial exception is not integrated into a practical application. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The eligibility analysis in support of these findings is provided below, in accordance with the 2019 Revised Patent Subject Matter Eligibility Guidance, hereinafter 2019 PEG.
Step 1. In accordance with Step 1 of the eligibility inquiry (as explained in MPEP 2106), it is first noted that the claimed method (claims 1-14) and apparatus (claims 15-21) are directed to one of the eligible categories of subject matter and therefore satisfy Step 1.
Step 2A. In accordance with Step 2A, prong one of the 2019 PEG, it is noted that the independent claims recite an abstract idea falling within the Mental Processes grouping of abstract ideas enumerated in the 2019 PEG. Examiner is of the position that independent claims 1 and 15 are directed towards the Mental Processes grouping of abstract ideas.
Independent claim(s) 1, 14, and 15 recite the following limitations directed towards Mental Processes and Mathematical Concepts:
determining a data file size range corresponding to each of a plurality of subsets of a training data set, based on a distribution of sizes of a plurality of data files included in the training data set (The limitation recites a mental process of observation and/or evaluation capable of being performed by the human mind by determining a size range);
dividing the training data set into the plurality of subsets based on the data file size range (The limitation recites a mathematical concept of dividing training data);
reallocating, based on sizes of the portion of the data files loaded to the multiple processors of a same processor group, among the plural processor groups, the loaded corresponding portion of the plurality of data files to the same processor group to reduce a deviation in size of the corresponding portion of the plurality of data files (The limitation recites a mental process of observation and/or evaluation capable of being performed by the human mind by reallocating data).
Step 2A. In accordance with Step 2A, prong two of the 2019 PEG, the judicial exception is not integrated into a practical application because of the recitation in claim(s) 1, 14, and 15:
loading, for each of the plurality of subsets, a corresponding portion of the plurality of data files in a corresponding subset to the plurality of processors based on a proportion of a total number of the plurality of data files that are in the corresponding subset, and based on a batch size of distributed training (recites insignificant extra solution activity that amounts to loading data);
performing the distributed training of a neural network model using a result of the reallocating and the multiple processors corresponding to a set of processors in a same server,
such that the multiple processors only exchange the loaded data files with each other within the same server and are refrained from communicating with processors in a different server among the plurality of processors, thereby achieving a reduction in cross-server communication overhead (recites insignificant extra solution activity that amounts to training a neural network).
Step 2B. Similar to the analysis under 2A Prong Two, the claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Because the additional elements of the independent claims amount to insignificant extra solution activity and/or mere instructions, the additional elements do not add significantly more to the judicial exception such that the independent claims as a whole would be patent eligible.
Therefore, independent claims 1, 14, and 15 are rejected under 35 U.S.C. 101.
With respect to claim(s) 2 and 16:
Step 2A, prong one of the 2019 PEG:
performing the separating, based on the data file size range, of the training data set into the plurality of subsets corresponding to predetermined intervals,
wherein, with respect to the separated training data, each of the plurality of subsets includes a respective data file, having a corresponding size, belonging to a corresponding interval among the predetermined intervals, with each of the predetermined intervals having a predetermined size and each of the predetermined intervals corresponding to a respective portion of data file size range corresponding to the training data (The limitation recites a mathematical concept of dividing training data).
Step 2A Prong Two Analysis:
This judicial exception is not integrated into a practical application because there are no additional elements to provide a practical application.
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to claim(s) 3 and 17:
Step 2A, prong one of the 2019 PEG:
performing the separating, based on the sizes of the data files, of the training data set into the plurality of subsets by dividing the training data set into a predetermined number of subsets based on a cumulative distribution function (CDF) for the sizes of the data files such that each of the plurality of subsets comprises a same number of data files (The limitation recites a mathematical concept of dividing training data).
Step 2A Prong Two Analysis:
This judicial exception is not integrated into a practical application because there are no additional elements to provide a practical application.
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to claim(s) 4 and 18:
Step 2A, prong one of the 2019 PEG:
sorting the portion of the data files loaded to the multiple processors of the same processor group in an order of sizes (The limitation recites a mental process of observation and/or evaluation capable of being performed by the human mind by sorting data files);
distributing the sorted data files to the multiple processors of the same processor group in a predetermined order (The limitation recites a mental process of observation and/or evaluation capable of being performed by the human mind by distributing data files).
Step 2A Prong Two Analysis:
This judicial exception is not integrated into a practical application because there are no additional elements to provide a practical application.
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to claim(s) 5 and 19:
Step 2A, prong one of the 2019 PEG:
sorting the portion of the data files loaded to the multiple processors of the same processor group in an order of sizes (The limitation recites a mental process of observation and/or evaluation capable of being performed by the human mind by sorting data files);
distributing, to the multiple processors in the same processor group, a portion of the sorted data files in a first order determined in advance and another portion of the sorted data files in a second order that is a reverse order of the first order (The limitation recites a mental process of observation and/or evaluation capable of being performed by the human mind by distributing data files).
Step 2A Prong Two Analysis:
This judicial exception is not integrated into a practical application because there are no additional elements to provide a practical application.
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to claim(s) 6 and 20:
Step 2A, prong one of the 2019 PEG:
wherein the distributing in the first order and the distributing in the second order is repetitively performed within the batch size (The limitation recites a mental process of observation and/or evaluation capable of being performed by the human mind by distributing data files).
Step 2A Prong Two Analysis:
This judicial exception is not integrated into a practical application because there are no additional elements to provide a practical application.
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to claim(s) 7 and 21:
Step 2A, prong one of the 2019 PEG:
determining a number of data files to be extracted from the subset based on the proportion of the number of data files of the plurality of subsets in the subset and the batch size (The limitation recites a mental process of observation and/or evaluation capable of being performed by the human mind by determining files to be extracted).
Step 2A Prong Two Analysis:
This judicial exception is not integrated into a practical application because there are no additional elements to provide a practical application.
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to claim(s) 8:
Step 2A, prong one of the 2019 PEG:
a number of data files extracted from the first subset among the plurality of data files loaded to the first processor is equal to a number of data files extracted from the first subset among data files loaded to the second processor (The limitation recites a mental process of observation and/or evaluation capable of being performed by the human mind by extracting files).
Step 2A Prong Two Analysis:
the plurality of processors comprises a first processor and a second processor, the plurality of subsets comprises a first subset (i.e., as a generic processor/component performing a generic computer function).
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to claim(s) 9:
Step 2A, prong one of the 2019 PEG:
wherein a number of the plurality of subsets is determined based on any one or any combination of any two or more of a number of the plurality of processors, the batch size, and an input of a user (The limitation recites a mental process of observation and/or evaluation capable of being performed by the human mind by determining a number of subsets).
Step 2A Prong Two Analysis:
This judicial exception is not integrated into a practical application because there are no additional elements to provide a practical application.
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to claim(s) 10:
Step 2A, prong one of the 2019 PEG:
Examiner is of the position the dependent claim is directed toward additional elements.
Step 2A Prong Two Analysis:
wherein the same processor group comprises a set of processors in a same server (i.e., as a generic processor/component performing a generic computer function).
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to claim(s) 11:
Step 2A, prong one of the 2019 PEG:
Examiner is of the position the dependent claim is directed toward additional elements.
Step 2A Prong Two Analysis:
natural language text data for training a natural language processing (NLP) model (recites insignificant extra solution activity that amounts to loading data); and
speech data for training the NLP model (recites insignificant extra solution activity that amounts to loading data).
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to claim(s) 12:
Step 2A, prong one of the 2019 PEG:
Examiner is of the position the dependent claim is directed toward additional elements.
Step 2A Prong Two Analysis:
wherein the multiple processors comprise a graphics processing unit (GPU) (i.e., as a generic processor/component performing a generic computer function).
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to claim(s) 13:
Step 2A, prong one of the 2019 PEG:
Examiner is of the position the dependent claim is directed toward additional elements.
Step 2A Prong Two Analysis:
performing, using the multiple processors of the same processor group, one or more training operations of a deep learning model based on the reallocated data files (recites insignificant extra solution activity that amounts to training a model).
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-2, 4-6, 8-10, 12-16, 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Okamoto (US Pub. No. 20220083914) and Serebryakov et al. (US Pub. No. 20220067577) in further view of Lambert et al. (US Pub. No. 20210334630).
With respect to claim 1, Okamoto teaches a processor-implemented method for a data loading system for a distributed training of a neural network model that includes a plurality of processors with each of plural processor groups of the plurality of processors having multiple processors, the method comprising:
determining a data file size range corresponding to each of a plurality of subsets of a training data set, based on a distribution of sizes of a plurality of data files included in the training data set (Paragraph 111 discloses determine which training data among the data set is to be used for the actual learning, and executes the third optimization of optimizing the buffer size in which shuffle is performed);
dividing the training data set into the plurality of subsets based on the data file size range (Paragraph 111 discloses when the learning data set is divided into several subsets, the best performance model is not always trained when all the subsets are used for training the model);
loading, for each of the plurality of subsets, a corresponding portion of the plurality of data files in a corresponding subset to the plurality of processors based on a proportion of a total number of the plurality of data files that are in the corresponding subset, and based on a batch size of distributed training (Paragraph 245 discloses the first data control unit 133 divides the training data group files into data files, namely, “File #1”, “File #2”, “File #3”, “File #4”, “File #5”, “File #6”, “File #7”, “File #8”, “File #9”, “File #10”, and “File #11”, each of which is obtained corresponding to each of the sets, and Paragraph 52 discloses the order in which each of subsets obtained by batch processing of the learning data set is to be learned is considered to contribute to the performance of the model). Okamoto does not disclose reallocating, based on sizes of the portion of the data files loaded to the multiple processors of a same processor group, among the plural processor groups, the loaded corresponding portion of the plurality of data files to the same processor group to reduce a deviation in size of the corresponding portion of the plurality of data files.
However, Serebryakov et al. teaches reallocating, based on sizes of the portion of the data files loaded to the multiple processors of a same processor group, among the plural processor groups, the loaded corresponding portion of the plurality of data files to the same processor group to reduce a deviation in size of the corresponding portion of the plurality of data files (Paragraph 69 discloses the nodes can be configured to exchange data within each node such that data files originally assigned to working set A may at some point be reallocated to working set B and vice versa); and
performing the distributed training of a neural network model using a result of the reallocating and the multiple processors corresponding to a set of processors in a same server (Paragraph 56 discloses Hardware processor 312 for each node may execute instruction 344 to cause the node to shuffle data in the first working set (e.g., working set A) while training the neural network using data from the second working set (e.g., working set B), and Paragraph 40 discloses Computing component 180 may be, for example, a server computer, a controller, or any other similar computing component capable of processing data).
Therefore, it would have been obvious before the effective filing date of the invention to a person having ordinary skill in the art to modify Okamoto with Serebryakov et al. This would have facilitated data parallelism. See Serebryakov et al. Paragraph(s) 1-7.
Okamoto as modified by Serebryakov et al. does not explicitly disclose such that the multiple processors only exchange the loaded data files with each other within the same server and are refrained from communicating with processors in a different server among the plurality of processors, thereby achieving a reduction in cross-server communication overhead.
However, Lambert et al. teaches such that the multiple processors only exchange the loaded data files with each other within the same server and are refrained from communicating with processors in a different server among the plurality of processors, thereby achieving a reduction in cross-server communication overhead (Paragraph 125 discloses DRAM, SRAM, flash memory or some other storage type may depend on available storage on-chip versus off-chip, latency requirements of training and/or inferencing functions being performed, batch size of data used in inferencing and/or training of a neural network, or some combination of these factors, and Paragraph 326 discloses GPGPU 1730 can be linked directly to other instances of GPGPU 1730 to create a multi-GPU cluster to improve training speed for deep neural networks).
Therefore, it would have been obvious before the effective filing date of the invention to a person having ordinary skill in the art to modify Okamoto and Serebryakov et al. with Lambert et al. This would have facilitated data parallelism. See Lambert et al. Paragraph 2.
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 2, Okamoto teaches the method of claim 1, further comprising performing the separating, based on the data file size range, of the training data set into the plurality of subsets corresponding to predetermined intervals,
wherein, with respect to the separated training data, each of the plurality of subsets includes a respective data file, having a corresponding size, belonging to a corresponding interval among the predetermined intervals, with each of the predetermined intervals having a predetermined size and each of the predetermined intervals corresponding to a respective portion of data file size range corresponding to the training data (Paragraph 256 discloses the user can use various hyperparameters such as upper limit (maxValue), lower limit (minValue), minimumUnit, or the like to designate details of division, that is, how the training data group included in “File #X” will be divided. In other words, the user can designate the shuffle buffer size using the above hyperparameters or the like).
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 4, Okamoto teaches the method of claim 1, wherein the reallocating of the loaded data files comprises:
sorting the portion of the data files loaded to the multiple processors of the same processor group in an order of sizes (Paragraph 399 discloses the first data control unit 133 divides the training data group sorted so that the included pieces of training data are arranged in chronological order, into a predetermined number of sets); and
distributing the sorted data files to the multiple processors of the same processor group in a predetermined order (Paragraph 402 discloses dividing the training data group generated by the first data control unit 133 as a process of generating training data having a size equal to the size of the shuffle buffer).
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 5, Okamoto teaches the method of claim 1, wherein the reallocating of the loaded data files comprises:
sorting the portion of the data files loaded to the multiple processors in the same processor group in an order of sizes (Paragraph 399 discloses the first data control unit 133 divides the training data group sorted so that the included pieces of training data are arranged in chronological order, into a predetermined number of sets); and
distributing, to the multiple processors in the same processor group, a portion of the sorted data files in a first order determined in advance and another portion of the sorted data files in a second order that is a reverse order of the first order (Paragraph 402 discloses dividing the training data group generated by the first data control unit 133 as a process of generating training data having a size equal to the size of the shuffle buffer).
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 5. With respect to claim 6, Lambert et al. teaches the method of claim 5, wherein the distributing in the first order and the distributing in the second order is repetitively performed within the batch size (Paragraph 252 discloses when training a model in deep learning, proper batch processing of the data set and iterative learning on the model are considered important in order to improve the accuracy of the model. In addition, the order in which each of subsets obtained by batch processing of the learning data set is to be learned is considered to contribute to the performance of the model. The third optimization algorithm is an optimization process that has been realized based on such a premise). The motivation to combine statement previously provided in the rejection of independent claim 1 above, combining the Okamoto reference and the Lambert et al. reference, is applicable to dependent claim 6.
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 8, Serebryakov et al. teaches the method of claim 1, wherein the plurality of processors comprises a first processor and a second processor, the plurality of subsets comprises a first subset, and a number of data files extracted from the first subset among the plurality of data files loaded to the first processor is equal to a number of data files extracted from the first subset among data files loaded to the second processor (Paragraph 35 discloses that a simple API can be implemented for distributed training, test and feature extraction, which allows easy integration with existing data processing pipelines). The motivation to combine statement previously provided in the rejection of independent claim 1 above, combining the Okamoto reference and the Serebryakov et al. reference, is applicable to dependent claim 8.
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 9, Okamoto teaches the method of claim 1, wherein a number of the plurality of subsets is determined based on any one or any combination of any two or more of a number of the plurality of processors, the batch size, and an input of a user (Paragraph 244 discloses dividing the training data group, in a state where the included training data is sorted, into a predetermined number of sets (step S132). For example, the first data control unit 133 can divide the training data group into a predetermined number of sets so that a predetermined number of pieces of training data (for example, a number designated by the user) is equally included in one set. Furthermore, the first data control unit 133 may divide the training data group into a predetermined number of sets so that one set includes a number of pieces of training data within a predetermined range).
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 10, Okamoto teaches the method of claim 1, wherein the same processor group comprises a set of processors in a same server (Paragraph 354 discloses that a user desires to operate a model having performance improved by fine tuning by the information processing device 100 described above, in a production environment (for example, a server or an edge device)).
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 12, Okamoto teaches the method of claim 1, wherein the multiple processors comprise a graphics processing unit (GPU) (Paragraph 36 discloses that only one of a GPU and a CPU is defined as the arithmetic unit as an execution target to execute the process, for each of the plurality of processes executed as a model).
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 13, Okamoto teaches the method of claim 1, further comprising performing, using the multiple processors of the same processor group, one or more training operations of a deep learning model based on the reallocated data files (Paragraph 252 discloses When training a model in deep learning, proper batch processing of the data set and iterative learning on the model are considered important in order to improve the accuracy of the model).
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 14, Okamoto teaches a non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, configure the one or more processors to perform the method of claim 1 (Paragraph 37 discloses a non-transitory computer-readable storage medium having stored therein a learning program, a classification apparatus, and a classification method).
With respect to claim 15, Okamoto teaches an apparatus for a data loading system, the apparatus comprising:
a plurality of processors, with each of plural processor groups of the plurality of
processors having multiple processors (Paragraph 360 discloses that the GPU executes the process A1 and the CPU executes the process A2), the plurality of processors configured to:
determine a data file size range corresponding to each of a plurality of subsets of a training data set, based on a distribution of sizes of a plurality of data files included in the training data set (Paragraph 111 discloses determining which training data among the data set is to be used for the actual learning, and executing the third optimization of optimizing the buffer size in which shuffle is performed);
divide the training data set into the plurality of subsets based on the data file size range (Paragraph 111 discloses that when the learning data set is divided into several subsets, the best performance model is not always trained when all the subsets are used for training the model); and
load, for each of the plurality of subsets, a corresponding portion of the plurality of data files in a corresponding subset to the plurality of processors based on a proportion of a total number of the plurality of data files that are in the corresponding subset, and based on a batch size of distributed training (Paragraph 245 discloses that the first data control unit 133 divides the training data group into data files, namely, “File #1”, “File #2”, “File #3”, “File #4”, “File #5”, “File #6”, “File #7”, “File #8”, “File #9”, “File #10”, and “File #11”, each of which is obtained corresponding to each of the sets, and Paragraph 52 discloses that the order in which each of the subsets obtained by batch processing of the learning data set is to be learned is considered to contribute to the performance of the model). Okamoto does not disclose reallocating, based on sizes of the portion of the data files loaded to the multiple processors of a same processor group, among the plural processor groups, the loaded corresponding portion of the plurality of data files to the same group to reduce a deviation in size of the corresponding portion of the plurality of data files.
However, Serebryakov et al. teaches reallocate, based on sizes of the portion of the data files loaded to the multiple processors of a same processor group, among the plural processor groups, the loaded corresponding portion of the plurality of data files to the same group to reduce a deviation in size of the corresponding portion of the plurality of data files (Paragraph 69 discloses that the nodes can be configured to exchange data within each node such that data files originally assigned to working set A may at some point be reallocated to working set B and vice versa); and
perform the distributed training of a neural network model using a result of the reallocating and the multiple processors corresponding to a set of processors in a same server (Paragraph 56 discloses that hardware processor 312 for each node may execute instruction 344 to cause the node to shuffle data in the first working set (e.g., working set A) while training the neural network using data from the second working set (e.g., working set B), and Paragraph 40 discloses that computing component 180 may be, for example, a server computer, a controller, or any other similar computing component capable of processing data).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the invention, to modify Okamoto with Serebryakov et al. This would have facilitated data parallelism. See Serebryakov et al., Paragraphs 1-7.
Okamoto as modified by Serebryakov et al. does not explicitly disclose such that the multiple processors only exchange the loaded data files with each other within the same server and are refrained from communicating with processors in a different server among the plurality of processors, thereby achieving a reduction in cross-server communication overhead.
However, Lambert et al. teaches such that the multiple processors only exchange the loaded data files with each other within the same server and are refrained from communicating with processors in a different server among the plurality of processors, thereby achieving a reduction in cross-server communication overhead (Paragraph 125 discloses that DRAM, SRAM, flash memory or some other storage type may depend on available storage on-chip versus off-chip, latency requirements of training and/or inferencing functions being performed, batch size of data used in inferencing and/or training of a neural network, or some combination of these factors, and Paragraph 326 discloses that GPGPU 1730 can be linked directly to other instances of GPGPU 1730 to create a multi-GPU cluster to improve training speed for deep neural networks).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the invention, to modify Okamoto and Serebryakov et al. with Lambert et al. This would have facilitated data parallelism. See Lambert et al., Paragraph 2.
With respect to claim 16, it is rejected on grounds corresponding to above rejected claim 2, because claim 16 is substantially equivalent to claim 2.
With respect to claim 18, it is rejected on grounds corresponding to above rejected claim 4, because claim 18 is substantially equivalent to claim 4.
With respect to claim 19, it is rejected on grounds corresponding to above rejected claim 5, because claim 19 is substantially equivalent to claim 5.
With respect to claim 20, it is rejected on grounds corresponding to above rejected claim 6, because claim 20 is substantially equivalent to claim 6.
Claim(s) 3, 7, 17, and 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Okamoto (US Pub. No. 20220083914), Serebryakov et al. (US Pub. No. 20220067577), and Lambert et al. (US Pub. No. 20210334630) in further view of Mayer et al. (US Pub. No. 20210287089).
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 3, Okamoto as modified by Serebryakov et al. and Lambert et al. does not disclose a cumulative distribution function.
However, Mayer et al. teaches the method of claim 1, further comprising performing the separating, based on the sizes of the data files, of the training data set into the plurality of subsets by dividing the training data set into a predetermined number of subsets based on a cumulative distribution function (CDF) for the sizes of the data files, such that each of the plurality of subsets comprises a same number of data files (Paragraph 30 discloses that the training data can include tabular data having a plurality of rows and columns; transforming the column of numerical values can include performing a ridit transformation or a cumulative distribution function transformation; the transformed numerical values can fall within a specified numerical range; each row of the column of transformed numerical values can correspond to a respective row of the column of numerical values; and each row of the column of identifiers can correspond to a respective row of the column of numerical values).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the invention, to modify Okamoto, Serebryakov et al., and Lambert et al. with Mayer et al. This would have facilitated data parallelism. See Mayer et al., Paragraphs 3-36.
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 7, Okamoto as modified by Serebryakov et al. and Lambert et al. does not disclose determining a number of data files to be extracted from the subset based on the proportion of the number of data files of the plurality of subsets in the subset and the batch size.
However, Mayer et al. teaches the method of claim 1, wherein the loading, from each of the plurality of subsets, of the portion of data files in the subset to the plurality of processors comprises:
determining a number of data files to be extracted from the corresponding subset based
on the proportion of the total number of the plurality of data files of the plurality of subsets in the corresponding subset and the batch size (Paragraph 15 discloses determining, based on a size of the training data, one or more first hyperparameters including at least one of a mini-batch size or a dropout rate); and
arbitrarily extracting the determined number of data files from the corresponding subset and loading the extracted data files to the plurality of processors (Paragraph 80 discloses Neural network models are flexible and allow for inclusion or composition of arbitrary functions).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the invention, to modify Okamoto, Serebryakov et al., and Lambert et al. with Mayer et al. This would have facilitated data parallelism. See Mayer et al., Paragraphs 3-36.
With respect to claim 17, it is rejected on grounds corresponding to above rejected claim 3, because claim 17 is substantially equivalent to claim 3.
With respect to claim 21, it is rejected on grounds corresponding to above rejected claim 7, because claim 21 is substantially equivalent to claim 7.
Claim(s) 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Okamoto (US Pub. No. 20220083914) and Serebryakov et al. (US Pub. No. 20220067577) and Lambert et al. (US Pub. No. 20210334630) in further view of Chen et al. (US Pub. No. 20210042620).
The Okamoto reference as modified by Serebryakov et al. and Lambert et al. teaches all the limitations of claim 1. With respect to claim 11, Okamoto as modified by Serebryakov et al. and Lambert et al. does not disclose natural language processing.
However, Chen et al. teaches the method of claim 1, wherein the training data set comprises either one or both of:
natural language text data for training a natural language processing (NLP) model (Paragraph 24 discloses natural language processing); and
speech data for training the NLP model (Paragraph 37 discloses a natural language processing or understanding task, e.g., an entailment task, a paraphrase task, a textual similarity task, a sentiment task, a sentence completion task, a grammaticality task, and so on, that operates on a sequence of text in some natural language).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the invention, to modify Okamoto, Serebryakov et al., and Lambert et al. with Chen et al. This would have facilitated data parallelism. See Chen et al., Paragraphs 2-18.
Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US Patent No. 11157812 is directed to Systems And Methods For Tuning Hyperparameters Of A Model And Advanced Curtailment Of A Training Of The Model: [Column 17 Lines 1-14] It shall be recognized that, while intervals and/or a frequency for implementing a checkpoint evaluation may be set based on epochs (i.e., an epoch-based interval), S230 may function to set a frequency and/or an interval for checkpoint evaluations based on any suitable training timing measure including based on a completion of training a subject model on a predetermined number of batches or the like. In such embodiments, a full training dataset for a training cycle or epoch may be divided into distinct batches of training data (i.e., a subset of the full training dataset) and thus, intervals for checkpoint evaluations may be set according to a completion of training a subject model on a certain number of batches of training data (e.g., 4 out of 8 total batches of training data) of the full training dataset.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS E ALLEN whose telephone number is (571)270-3562. The examiner can normally be reached Monday through Thursday, 8:30-6:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Boris Gorney can be reached at (571) 270-5626. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/N.E.A/Examiner, Art Unit 2154
/BORIS GORNEY/Supervisory Patent Examiner, Art Unit 2154