Prosecution Insights
Last updated: May 29, 2026
Application No. 18/758,347

FAULT IDENTIFICATION AND RECOVERY FOR DISTRIBUTED TRAINING

Non-Final OA §102
Filed: Jun 28, 2024
Examiner: LIN, KATHERINE Y
Art Unit: 2113
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Lemon Inc.
OA Round: 2 (Non-Final)
Grant Probability: 91% (Favorable)
OA Rounds: 2-3
Est. Remaining: 4m
With Interview: 98%

Examiner Intelligence

Grants 91% — above average
Career Allowance Rate: 91% (322 granted / 353 resolved; +36.2% vs TC avg)
Interview Lift: +7.0% (moderate; resolved cases with interview)
Avg Prosecution: 2y 3m (typical timeline; 18 currently pending)
Total Applications: 384 (career history, across all art units)

Statute-Specific Performance

§101: 19.5% (-20.5% vs TC avg)
§103: 49.1% (+9.1% vs TC avg)
§102: 18.9% (-21.1% vs TC avg)
§112: 4.4% (-35.6% vs TC avg)
Tech Center average is an estimate. Based on career data from 353 resolved cases.

Office Action

§102
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1-20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Jiang et al. (MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs).

Jiang discloses:

1. A method comprising:
  obtaining, during a distributed training task performed across a plurality of computing nodes, at least one heartbeat message from the plurality of computing nodes, each computing node including multiple graphics processing unit (GPU) workers; (p 5: 3.4; p 6: 4.1)
  detecting, based on the at least one heartbeat message, an abnormal status of the distributed training task; (p 6: 4.1)
  commanding the plurality of computing nodes to run at least one self-check diagnostics test; (p 6: 4.1)
  identifying, based on results of the at least one self-check diagnostics test, at least one faulty node from the plurality of computing nodes; and (p 6: 4.1)
  replacing the at least one faulty node with an equivalent number of healthy computing nodes that have passed the at least one self-check diagnostics test. (p 6: 4.1)

2. The method of claim 1, wherein the at least one heartbeat message includes at least one of:
  output and error logs of a training process running on a corresponding computing node; and (p 6: fig 5; p 7: 4.2)
  a Remote Direct Memory Access (RDMA) traffic metric indicating network utilization and efficiency among the plurality of computing nodes. (p 7: 4.2)

3. The method of claim 1, wherein detecting the abnormal status of the distributed training task comprises:
  performing first monitoring to assess an overall health status and to rule out common configuration impacts on the distributed training task; and (p 7: 4.2)
  performing second monitoring to determine whether there is network congestion among the plurality of computing nodes and whether a data transfer speed of data parallelism and pipe parallelism has reached its physical limit. (p 7: 4.2)

4. The method of claim 1, wherein the at least one self-check diagnostics test comprises at least one of:
  a first test to diagnose potential bottlenecks associated with RDMA network interface cards (RNICs) in an intra-host network of a computing node; or (p 7: 4.3)
  a second test to identify potential faults in GPU communication within a single computing node and among the plurality of computing nodes. (p 7: 4.3)

5. The method of claim 1, further comprising:
  suspending, upon detection of the abnormal status of the distributed training task, the distributed training task across the plurality of computing nodes. (p 6: 4.1)

6. The method of claim 1, wherein replacing the at least one faulty node with an equivalent number of healthy computing nodes that have passed the at least one self-check diagnostics test comprises:
  evicting the at least one faulty node from the distributed training task; and (p 6: 4.1)
  loading model weights and optimizer states from the most recent checkpoint into the healthy computing nodes. (p 7: 4.4)

7. The method of claim 6, further comprising:
  at a checkpoint, cause each GPU worker of a computing node to write its on-chip states including the model weights and the optimizer states into a memory of the computing node; and (p 7: 4.4)
  cause the computing node to asynchronously transfer the on-chip states from the memory to a distributed file system. (p 7: 4.4)

8. The method of claim 7, wherein loading model weights and optimizer states from the most recent checkpoint into the healthy computing nodes comprises:
  for a group of GPU workers that share a same state partition of the distributed file system, designating a single GPU worker in the group to read the shared state partition from the distributed file system; and (p 7: 4.4)
  causing the single GPU worker to broadcast the shared state partition to all other GPU workers in the group. (p 7: 4.4)

9. The method of claim 1, further comprising:
  collecting data regarding execution time of a code segment on a set of GPU workers; and (fig 7; p 8: 5.1; p 9: 5.2)
  identifying a computing node, by visualizing the collected data, that includes a GPU worker with slower performance as a faulty node. (fig 7; p 8: 5.1; p 9: 5.2)

10. The method of claim 9, wherein visualizing the collected data comprises:
  generating a heat map that shows time consumption differences between the set of GPU workers. (p 8: 5.1)

11. The method of claim 9, wherein visualizing the collected data comprises:
  generating an event timeline on the set of GPU workers in a trace format. (p 8: 5.1)

12. The method of claim 9, wherein identifying at least one GPU worker with slower performance comprises:
  displaying a logical topology of the GPU workers with respect to at least one of data parallelism, pipeline parallelism, or tensor parallelism. (p 9: 5.2)

Claim(s) 13-19 is/are rejected as being the device implemented by the method of claim(s) 1-4, 6-8, and is/are rejected on the same grounds.

Claim(s) 20 is/are rejected as being the medium implemented by the method of claim(s) 1, and is/are rejected on the same grounds.

Response to Remarks

The amendments overcome the objections/rejections to the claim(s) under informalities and 112(b).
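For readers less used to claim language, the sequence recited in claim 1 (heartbeats, abnormality detection, self-check, one-for-one replacement) can be pictured with a small sketch. It is illustrative only and is not taken from the application or from Jiang; the `Heartbeat` record, the `HEARTBEAT_TIMEOUT_S` threshold, the spare-node pool, and the `run_self_check` callback are all hypothetical names introduced here.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT_S = 30.0  # hypothetical threshold, not stated in the source


@dataclass
class Heartbeat:
    node: str
    timestamp: float     # seconds, time the message was received
    error_log: str = ""  # claim 2 mentions output and error logs


def detect_abnormal(heartbeats, expected_nodes, now):
    """Return True if any node is silent past the timeout or reports an error."""
    latest = {}
    for hb in heartbeats:
        if hb.node not in latest or hb.timestamp > latest[hb.node].timestamp:
            latest[hb.node] = hb
    for node in expected_nodes:
        hb = latest.get(node)
        if hb is None or now - hb.timestamp > HEARTBEAT_TIMEOUT_S or hb.error_log:
            return True
    return False


def recover(nodes, spares, run_self_check):
    """Self-check every node, evict failures, backfill with spares that pass."""
    faulty = [n for n in nodes if not run_self_check(n)]
    healthy_spares = [s for s in spares if run_self_check(s)]
    if len(healthy_spares) < len(faulty):
        raise RuntimeError("not enough healthy spare nodes")
    kept = [n for n in nodes if n not in faulty]
    # An equal number of healthy nodes replaces the evicted ones.
    return kept + healthy_spares[: len(faulty)], faulty


if __name__ == "__main__":
    nodes = ["node-0", "node-1", "node-2"]
    beats = [
        Heartbeat("node-0", 100.0),
        Heartbeat("node-1", 60.0),  # stale: last seen 41 s before `now`
        Heartbeat("node-2", 99.0),
    ]
    if detect_abnormal(beats, nodes, now=101.0):
        new_nodes, evicted = recover(
            nodes, ["spare-0", "spare-1"], lambda n: n != "node-1"
        )
        print(new_nodes, evicted)
```

In this toy run the self-check is a stand-in lambda that fails only `node-1`; a real diagnostic (for example the RNIC and GPU communication tests named in claim 4) would take its place.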
Conclusion

Applicant's submission of an information disclosure statement under 37 CFR 1.97(c) with the timing fee set forth in 37 CFR 1.17(p) on 7-7-2025 prompted the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 609.04(b). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATHERINE LIN whose telephone number is (571)431-0706. The examiner can normally be reached Monday-Friday; 8 a.m. - 5 p.m. EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bryce Bonzo, can be reached at (571) 272-3655. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KATHERINE LIN/
Primary Examiner, Art Unit 2113

Prosecution Timeline

Jun 28, 2024: Application Filed
Jul 09, 2025: Non-Final Rejection mailed (§102)
Oct 09, 2025: Response Filed
Jan 16, 2026: Final Rejection mailed (§102)
Mar 16, 2026: Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12625760: System and method for machine-to-machine re-imaging (2y 7m to grant; granted May 12, 2026)
Patent 12619486: Mechanism of Enabling Fault Handling with PCIe Re-timer (2y 9m to grant; granted May 05, 2026)
Patent 12613772: MEMORY DEVICE AND OPERATING METHOD THEREOF (1y 6m to grant; granted Apr 28, 2026)
Patent 12608292: MANAGEMENT METHOD AND APPARATUS AND ATE TEST SYSTEM (1y 6m to grant; granted Apr 21, 2026)
Patent 12596953: QUANTUM ERROR CORRECTION USING NEURAL NETWORKS (2y 7m to grant; granted Apr 07, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 91%
With Interview (+7.0%): 98%
Median Time to Grant: 2y 3m (~4m remaining)
PTA Risk: Moderate
Based on 353 resolved cases by this examiner. Grant probability derived from career allowance rate.
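The headline figures can be reproduced from the counts shown on this page. The sketch below is a rough check, assuming the interview lift is added to the allowance rate as percentage points (an assumption, though it matches 91% + 7.0% = 98% as displayed); the function names are hypothetical.

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a fraction of resolved cases."""
    return granted / resolved


def with_interview(rate: float, lift: float) -> float:
    """Add the interview lift in percentage points, capped at 100% (assumed model)."""
    return min(1.0, rate + lift)


if __name__ == "__main__":
    rate = allowance_rate(322, 353)        # counts from the examiner summary
    boosted = with_interview(rate, 0.07)   # +7.0% interview lift
    print(f"{rate:.0%}")     # 91%
    print(f"{boosted:.0%}")  # 98%
```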
