Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-2, 5-12, and 15-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhao et al. (US 2019/0312772), hereafter Zhao.
1. Zhao discloses a data processing method comprising:
sending, by a first processor of a data processing device, a first search message to a second processor, wherein the first search message comprises first data and is for searching for an embedding parameter of the first data (para 42-47, For a distributed DL training operation, each GPU device GPU0, GPU1, GPU2, GPU3 has access to an entire dataset (current minibatch data set), and each GPU device partitions the entire dataset into small chunks. In particular, for the ScatterReduce process, each GPU device GPU0, GPU1, GPU2, GPU3 in the logical communication ring will partition the dataset into N smaller chunks, where N is the number of GPUs in the ring. The GPUs will then perform N−1 iterations of the ScatterReduce process, where in each iteration, each GPU will send one of its data chunks to its right neighbor, and will receive a chunk from its left neighbor and accumulate the corresponding data chunks [the first search message is part of one iteration of the ScatterReduce process]; para 45, The data chunk that is sent and received by each GPU is different in each iteration. At the end of the ScatterReduce stage, each GPU device GPU0, GPU1, GPU2, GPU3 will have one complete data chunk which comprises an accumulation of all final values in that chunk (i.e., the one complete data chunk includes the contribution from all the GPU devices GPU0, GPU1, GPU2, GPU3 [the embedding parameter is the contribution of each GPU device])), and the second processor is a next-hop processor of the first processor in a ring communication architecture in which the first processor is located (para 44, a plurality of GPU devices in a ring communication configuration to implement all-reduce operations for distributed DL training); and
receiving, by the first processor, a second search message from a third processor, wherein the second search message comprises second data and is for searching for an embedding parameter of the second data, and the third processor is a previous-hop processor of the first processor in the ring communication architecture (para 45, each GPU will send one of its data chunks to its right neighbor, and will receive a chunk from its left neighbor and accumulate the corresponding data chunks),
wherein the first processor, the second processor, and the third processor are among multiple processors comprised in a data training system, wherein multiple processors communicate with each other by using the ring communication architecture, and in the ring communication architecture each processor of the multiple processors receives a message only from a previous-hop processor of said each processor and sends a message only to a next-hop processor of said each processor (para 44, a plurality of GPU devices in a ring communication configuration to implement all-reduce operations for distributed DL training; para 45, For a distributed DL training).
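For illustration only, the ring ScatterReduce and AllGather stages cited from Zhao at para 45 can be sketched as a single-process simulation. This is not code from Zhao or from the application: the function name ring_all_reduce, the list-of-lists chunk layout, and the particular chunk-index schedule are assumptions made for the sketch. Each simulated GPU sends only to its right neighbor and receives only from its left neighbor, and each stage runs for N−1 iterations.

```python
def ring_all_reduce(per_gpu_chunks):
    """Simulate ring all-reduce (hypothetical helper, not from Zhao).

    per_gpu_chunks[i][c] is chunk c (a list of numbers) held by GPU i.
    Each of the N GPUs must hold N chunks of matching shapes.
    Returns the per-GPU state after ScatterReduce followed by AllGather.
    """
    n = len(per_gpu_chunks)
    # Copy the input so the caller's data is left untouched.
    state = [[list(chunk) for chunk in gpu] for gpu in per_gpu_chunks]

    # ScatterReduce: N-1 iterations. In iteration `step`, GPU i sends
    # chunk (i - step) mod N to its right neighbor, which accumulates it.
    for step in range(n - 1):
        # Snapshot outgoing payloads first so all sends are simultaneous.
        outgoing = []
        for i in range(n):
            c = (i - step) % n
            outgoing.append((i, c, list(state[i][c])))
        for sender, c, payload in outgoing:
            right = (sender + 1) % n
            state[right][c] = [a + b for a, b in zip(state[right][c], payload)]

    # After ScatterReduce, GPU i holds the complete chunk (i + 1) mod N.
    # AllGather: N-1 iterations. In iteration `step`, GPU i forwards
    # chunk (i + 1 - step) mod N, and the right neighbor overwrites its copy.
    for step in range(n - 1):
        outgoing = []
        for i in range(n):
            c = (i + 1 - step) % n
            outgoing.append((i, c, list(state[i][c])))
        for sender, c, payload in outgoing:
            right = (sender + 1) % n
            state[right][c] = payload

    return state
```

In this sketch, four simulated GPUs that each start with four chunks end with identical, fully accumulated chunks, which corresponds to the end state para 45 describes for the AllGather process.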
2. Zhao discloses the method according to claim 1, further comprising: when embedding parameters of some or all data in the second data are found based on the second search message, adding, by the first processor, the embedding parameters of the some or all data to the second search message to obtain a third search message; and sending the third search message to the second processor; or when an embedding parameter of the second data is not found based on the second search message, sending, by the first processor, the second search message to the second processor (para 39-41).
5. Zhao discloses the method according to claim 1, further comprising: receiving, by the first processor, a fourth search message from the third processor, wherein the fourth search message comprises third data and an embedding parameter to which a first part of data in the third data is mapped, and the fourth search message is for searching for an embedding parameter to which data other than the first part of data in the third data is mapped; and when an embedding parameter of a second part of data in the third data is found based on the fourth search message, adding, by the first processor, the embedding parameter of the second part of data to the fourth search message to obtain a fifth search message; and sending the fifth search message to the second processor; or when an embedding parameter of the third data is not found based on the fourth search message, sending, by the first processor, the fourth search message to the second processor (para 44-45).
6. Zhao discloses the method according to claim 1, further comprising: receiving, by the first processor, a sixth search message from the third processor, wherein the sixth search message comprises the first data and the embedding parameter of the first data (para 44-45).
7. Zhao discloses a data processing method comprising:
sending, by a first processor of a data processing device, a first notification message to a second processor, wherein the first notification message comprises first data and a first gradient (para 45, AllReduce: comprises ScatterReduce stage and AllGather process; para 47, ) and is for propagating the first gradient to a first target processor, the first gradient corresponds to an embedding parameter of the first data, and the second processor is a next-hop processor of the first processor in a ring communication architecture in which the first processor is located (para 44, a plurality of GPU devices in a ring communication configuration to implement all-reduce operations for distributed DL training); and
receiving, by the first processor, a second notification message from a third processor, wherein the second notification message comprises second data and a second gradient (para 45, in each iteration, each GPU will send one of its data chunks to its right neighbor, and will receive a chunk from its left neighbor and accumulate the corresponding data chunks [after the AllScatter process]; para 46-47, AllReduce protocol, gradients are shared around the ring [follows the AllGather step]) and is for propagating the second gradient to a second target processor, the second gradient corresponds to an embedding parameter of the second data, and the third processor is a previous-hop processor of the first processor in the ring communication architecture (para 45, ScatterReduce process will receive a chunk from its left neighbor and accumulate the corresponding data chunks; para 45, the GPUs perform an AllGather process to exchange those data chunks, so that, at the completion of the AllGather process, each GPU GPU0, GPU1, GPU2, GPU3 will have the fully accumulated values for the entire dataset; para 46-47, AllReduce with gradient exchange),
wherein the first processor, the second processor, and the third processor are among multiple processors comprised in a data training system, wherein the multiple processors communicate with each other by using the ring communication architecture, and in the ring communication architecture each processor of the multiple processors receives a message only from a previous-hop processor of said each processor and sends a message only to a next-hop processor of said each processor (para 44, a plurality of GPU devices in a ring communication configuration to implement all-reduce operations for distributed DL training; para 45, For a distributed DL training).
8. Zhao discloses the method according to claim 7, further comprising: when the second notification message comprises a first target gradient, obtaining, by the first processor, the first target gradient from the second notification message; and sending the second notification message to the second processor, wherein the first target gradient is of an embedding parameter in a first embedding table maintained by the first processor, and there is a one-to-one mapping relationship between data and an embedding parameter in the first embedding table; or when the second notification message does not comprise the first target gradient, sending, by the first processor, the second notification message to the second processor (para 39-41).
9. Zhao discloses the method according to claim 8, wherein the step of obtaining the first target gradient from the second notification message when the second notification message comprises the first target gradient comprises: determining, by the first processor, that some or all data in the second data is the data in the first embedding table; and obtaining, by the first processor, the first target gradient from the second notification message based on the some or all data (para 44-45, see above).
10. Zhao discloses the method according to claim 9, further comprising: receiving, by the first processor, a third notification message from the third processor, wherein the third notification message comprises third data and a third gradient and is for propagating the third gradient to a third target processor, and the third gradient corresponds to an embedding parameter of the third data; and when the third notification message comprises a second target gradient, obtaining, by the first processor, the second target gradient from the third notification message; and sending the third notification message to the second processor, wherein the second target gradient is of an embedding parameter in the first embedding table maintained by the first processor, and the first embedding table comprises a mapping relationship between data and an embedding parameter of the data; or when the third notification message does not comprise the second target gradient, sending, by the first processor, the third notification message to the second processor (para 44-45, see above).
Claims 11-12, 15-20 are similar in scope to claims 1-2, 5-10 and are rejected under similar rationale.
Allowable Subject Matter
Claims 3, 4, 13, 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES R TURCHEN whose telephone number is (571)270-1378. The examiner can normally be reached Monday-Friday: 7-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Luu Pham, can be reached at 571-270-5002. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JAMES R TURCHEN/ Primary Examiner, Art Unit 2439