Prosecution Insights
Last updated: April 19, 2026
Application No. 18/146,295

METHODS, SYSTEMS, ARTICLES OF MANUFACTURE AND APPARATUS TO IMPROVE DISTRIBUTED MACHINE LEARNING EFFICIENCY

Status: Non-Final OA — §103
Filed: Dec 23, 2022
Examiner: JAYAKUMAR, CHAITANYA R
Art Unit: 2128
Tech Center: 2100 — Computer Architecture & Software
Assignee: Intel Corporation
OA Round: 1 (Non-Final)
Grant Probability: 26% (At Risk)
Expected OA Rounds: 1-2
Time to Grant: 4y 6m
Grant Probability With Interview: 48%

Examiner Intelligence

Career Allow Rate: 26% (13 granted / 51 resolved; -29.5% vs TC avg)
Interview Lift: +22.5% across resolved cases with an interview (26% without vs 48% with)
Typical Timeline: 4y 6m average prosecution; 18 applications currently pending
Career History: 69 total applications across all art units

Statute-Specific Performance

§101: 29.1% (-10.9% vs TC avg)
§103: 45.6% (+5.6% vs TC avg)
§102: 8.7% (-31.3% vs TC avg)
§112: 13.8% (-26.2% vs TC avg)
Deltas are relative to a Tech Center average estimate • Based on career data from 51 resolved cases
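As a sanity check, the per-statute deltas above are mutually consistent: each one is the examiner's rate minus a single Tech Center baseline. A minimal sketch back-derives that baseline from the stated figures (the baseline is inferred from this page's numbers, not independently sourced):

```python
# Back-derive the implied TC 2100 baseline for each statute:
# delta = examiner_rate - tc_avg, so tc_avg = examiner_rate - delta.
examiner_rate = {"101": 29.1, "103": 45.6, "102": 8.7, "112": 13.8}
delta_vs_tc = {"101": -10.9, "103": 5.6, "102": -31.3, "112": -26.2}

implied_tc_avg = {s: round(examiner_rate[s] - delta_vs_tc[s], 1)
                  for s in examiner_rate}

# All four statutes imply the same 40.0% Tech Center average estimate.
assert set(implied_tc_avg.values()) == {40.0}
```

That all four rows resolve to one 40.0% baseline suggests the chart drew a single Tech Center average line rather than per-statute averages.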

Office Action — §103

DETAILED ACTION

This action is in response to the submission filed 23 December 2022 for application 18/146,295. Claims 24-30 are currently canceled. Claims 1-23 are pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

The title is objected to because it includes the word "improve". Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-3, 8-10, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Sapio et al. (Scaling Distributed Machine Learning with In-Network Aggregation, 2021) in view of Lao et al. (ATP: In-network Aggregation for Multi-tenant Learning, 2021).

Regarding claim 1: Sapio teaches: An apparatus to accelerate processing iterations, comprising: train management circuitry to cause a first vector to be sent from a worker node to an in-network-aggregator (INA) after completion of a first processing iteration requested by a parameter server ([Page 786, Column 2, Paragraph 4] The parameter server (PS) approach. In this approach, workers compute model updates and send them to parameter servers. [Page 787, Column 1, Section 3, In-network aggregation] We propose an alternative approach to model update exchange for ML workloads: in-network aggregation. In this approach, workers send their model updates over the network, where an aggregation primitive in the network sums the updates. [Page 788, Column 1, Figure 1] Example of in-network aggregation of model updates. Ui is the model update computed by worker i.
Workers stream pieces of model updates in a coordinated fashion); and permit the second processing iteration when (a) an acknowledgement (ACK) from the INA corresponding to the first vector is received ([Page 790, Column 2, Paragraph 1] The worker consumes the result carried in the packet, copying that packet's vector into the aggregated model update A at the offset carried in the packet (p.off). The worker then sends a new packet with the next piece of update to be aggregated. This reuses the same pool slot as the one just received, but contains a new set of k parameters, determined by advancing the previous offset by k·s. Note: Also see Algorithm 2, showing a for loop corresponding to iterations).

However, Sapio does not explicitly disclose: and protocol configuration circuitry to: prohibit a second processing iteration when an availability status of the INA is false; and (b) the availability status of the INA is true.

Lao teaches, in an analogous system: and protocol configuration circuitry to: prohibit a second processing iteration when an availability status of the INA is false ([Page 745, Column 2, Paragraph 1] The collision flag is marked by a switch when it forwards a gradient packet onward due to the aggregator not being available because it is in use by a different job. This flag helps the PS choose another aggregator to avoid collision in the next round. Note: The aggregator not being available corresponds to the availability status of the INA being false); and (b) the availability status of the INA is true ([Page 749, Column 1, Last Paragraph] Control logic. This is responsible for checking whether an aggregator is available).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus of Sapio to incorporate the teachings of Lao to use protocol configuration circuitry to prohibit a second processing iteration when an availability status of the INA is false and to permit it when the availability status of the INA is true. One would have been motivated to make this modification because doing so would give the benefit of choosing another aggregator to avoid collision in the next round, as taught by Lao [Page 745, Column 2, Paragraph 1].

Regarding claim 2: The system of Sapio and Lao teaches: The apparatus as defined in claim 1 (as shown above). Sapio further teaches: further including resource location circuitry to select the worker node based on a proximity to the INA ([Page 786, Column 2, Paragraph 5] The workers communicate over an overlay network. A ring topology [6], where each worker communicates to the next neighboring worker on the ring).

Regarding claim 3: The system of Sapio and Lao teaches: The apparatus as defined in claim 2 (as shown above). Sapio further teaches: wherein the resource location circuitry is to determine the proximity based on at least one of a physical distance metric or a node hop metric ([Page 786, Column 2, Paragraph 5] The workers communicate over an overlay network. A ring topology [6], where each worker communicates to the next neighboring worker on the ring. Note: The next neighboring worker corresponds to a physical distance metric).

Regarding claim 8: Sapio teaches: An apparatus to facilitate distributed machine learning, comprising: memory; machine readable instructions ([Page 793, Column 2, Section 6, Paragraph 2] Testbed. We conduct most of our experiments on a testbed of 8 machines, each with 1 NVIDIA P100 16 GB GPU, dual 10-core Intel Xeon E5-2630v4 CPUs at 2.20 GHz, 128 GB of RAM, and 3 × 1 TB disks for storage (as a single RAID)).
The claim recites substantially the same limitations as claim 1 except for the above limitations and is therefore rejected for the same rationale as claim 1.

Regarding claim 9: The claim recites substantially the same limitations as claim 2 and is therefore rejected for the same rationale as claim 2.

Regarding claim 10: The claim recites substantially the same limitations as claim 3 and is therefore rejected for the same rationale as claim 3.

Regarding claim 17: Sapio teaches: A non-transitory machine readable storage medium comprising instructions that, when executed, cause processor circuitry to at least: complete a first processing iteration requested by an orchestrator computing device ([Page 796, Column 1, Paragraph 1] This is due to our use of x86 SSE/AVX instructions. When we use float16, performance doubles, as expected. However, these overheads become more relevant as data rates increase, requiring offloading of type conversion operations to the GPU at 100 Gbps and scaling up the number of cores; RDMA alleviates the pressure for CPU cycles used for I/O (see the gap between DPDK- and RDMA-based performance in Figure 3)). The claim recites substantially the same limitations as claim 1 except for the above limitations and is therefore rejected for the same rationale as claim 1.

Regarding claim 18: The claim recites substantially the same limitations as claim 2 and is therefore rejected for the same rationale as claim 2.

Regarding claim 19: The claim recites substantially the same limitations as claim 3 and is therefore rejected for the same rationale as claim 3.

Claims 4-7, 11-16, and 20-23 are rejected under 35 U.S.C. 103 as being unpatentable over Sapio et al. (Scaling Distributed Machine Learning with In-Network Aggregation, 2021) in view of Lao et al. (ATP: In-network Aggregation for Multi-tenant Learning, 2021) and further in view of Gebara et al. (IN-NETWORK AGGREGATION FOR SHARED MACHINE LEARNING CLUSTERS, 2021).
Regarding claim 4: The system of Sapio and Lao teaches: The apparatus as defined in claim 1 (as shown above). However, the system of Sapio and Lao does not explicitly disclose: further including resource determination circuitry to form an aggregation tree between the parameter server, a plurality of worker nodes, and a plurality of INAs.

Gebara teaches, in an analogous system: further including resource determination circuitry to form an aggregation tree between the parameter server, a plurality of worker nodes, and a plurality of INAs ([Page 4, Column 1, Paragraph 2] When the PANAMA controller selects in-network aggregation for a job, it initializes all the PSwitches belonging to the spanning trees connecting the workers. Note: Also see Figure 3, showing trees).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sapio and Lao to incorporate the teachings of Gebara to further include resource determination circuitry to form an aggregation tree between the parameter server, a plurality of worker nodes, and a plurality of INAs. One would have been motivated to make this modification because doing so would give the benefit of higher network efficiency and lower congestion, as taught by Gebara [Page 4, Column 1, Paragraph 2].

Regarding claim 5: The system of Sapio and Lao teaches: The apparatus as defined in claim 4 (as shown above). However, Sapio does not explicitly disclose: wherein the protocol configuration circuitry is to prevent resource stalling by permitting the second processing iteration before the parameter server receives the first vector.
Lao teaches, in an analogous system: wherein the protocol configuration circuitry is to prevent resource stalling by permitting the second processing iteration before the parameter server receives the first vector ([Page 742, Column 1, Section 2.1, Last Paragraph, continuing into Column 2, Paragraph 1] as shown in Figure 1 enables data-parallel training, where training data is partitioned and distributed to workers. There are two phases: gradient computation, where workers locally compute gradients; and gradient aggregation, where workers' gradients are transmitted over the network to be aggregated (which involves the addition of gradients) at one or more end-hosts called parameter servers (PSs). Note: Figure 1 also shows calculations occurring before the parameter server receives the gradients).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus of Sapio to incorporate the teachings of Lao wherein the protocol configuration circuitry is to prevent resource stalling by permitting the second processing iteration before the parameter server receives the first vector. One would have been motivated to make this modification because doing so would give the benefit of enabling data-parallel training, as taught by Lao [Page 742, Column 1, Section 2.1, Last Paragraph].

Regarding claim 6: The system of Sapio and Lao teaches: The apparatus as defined in claim 1 (as shown above). However, the system of Sapio and Lao does not explicitly disclose: wherein the protocol configuration circuitry is to cause a first model to be sent from the parameter server to the worker node.

Gebara teaches, in an analogous system: wherein the protocol configuration circuitry is to cause a first model to be sent from the parameter server to the worker node ([Page 4, Column 1] Note: Figure 3 shows the PS in step 2, corresponding to the parameter server, and workers in step 3, corresponding to sending from the parameter server to the worker node).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sapio and Lao to incorporate the teachings of Gebara wherein the protocol configuration circuitry is to cause a first model to be sent from the parameter server to the worker node. One would have been motivated to make this modification because doing so would give the benefit of higher network efficiency and lower congestion, as taught by Gebara [Page 4, Column 1, Paragraph 2].

Regarding claim 7: The system of Sapio and Lao teaches: The apparatus as defined in claim 6 (as shown above). However, the system of Sapio and Lao does not explicitly disclose: wherein the protocol configuration circuitry is to cause the worker node to calculate gradient data based on the first model, the worker node to send the gradient data to the INA as the first vector.

Gebara teaches, in an analogous system: wherein the protocol configuration circuitry is to cause the worker node to calculate gradient data based on the first model, the worker node to send the gradient data to the INA as the first vector ([Page 4, Column 2, Last Paragraph] Workers distribute the gradient packets to different trees in a round-robin fashion. For instance, in the topology in Fig. 3, with four aggregation trees AggTree_i, i = 1, ..., 4, and eight aggregation packets with IDs p_j, j = 1, ..., 8, assuming workers start by sending a single packet to each of the trees).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Sapio and Lao to incorporate the teachings of Gebara wherein the protocol configuration circuitry is to cause the worker node to calculate gradient data based on the first model, the worker node to send the gradient data to the INA as the first vector.
One would have been motivated to make this modification because doing so would give the benefit of higher network efficiency and lower congestion, as taught by Gebara [Page 4, Column 1, Paragraph 2].

Regarding claim 11: The claim recites substantially the same limitations as claim 4 and is therefore rejected for the same rationale as claim 4.

Regarding claim 12: The claim recites substantially the same limitations as claim 5 and is therefore rejected for the same rationale as claim 5.

Regarding claim 13: The system of Sapio and Lao teaches: The apparatus as defined in claim 11 (as shown above). However, Sapio does not explicitly disclose: wherein the processor circuitry is to prevent INA stalling by permitting the second processing iteration when an indication of INA availability is detected.

Lao teaches, in an analogous system: wherein the processor circuitry is to prevent INA stalling by permitting the second processing iteration when an indication of INA availability is detected ([Page 745, Column 2, Paragraph 1] The collision flag is marked by a switch when it forwards a gradient packet onward due to the aggregator not being available because it is in use by a different job. This flag helps the PS choose another aggregator to avoid collision in the next round).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Sapio to incorporate the teachings of Lao wherein the processor circuitry is to prevent INA stalling by permitting the second processing iteration when an indication of INA availability is detected. One would have been motivated to make this modification because doing so would give the benefit of avoiding collision in the next round, as taught by Lao [Page 745, Column 2, Paragraph 1].

Regarding claim 14: The system of Sapio and Lao teaches: The apparatus as defined in claim 13 (as shown above).
However, Sapio does not explicitly disclose: wherein the processor circuitry is to permit the second processing iteration before data corresponding to the first processing iteration has propagated from the computing resource to the parameter server.

Lao teaches, in an analogous system: wherein the processor circuitry is to permit the second processing iteration before data corresponding to the first processing iteration has propagated from the computing resource to the parameter server ([Page 742, Column 1, Section 2.1, Last Paragraph, continuing into Column 2, Paragraph 1] as shown in Figure 1 enables data-parallel training, where training data is partitioned and distributed to workers. There are two phases: gradient computation, where workers locally compute gradients; and gradient aggregation, where workers' gradients are transmitted over the network to be aggregated (which involves the addition of gradients) at one or more end-hosts called parameter servers (PSs). Note: Figure 1 also shows calculations occurring before the parameter server receives the gradients).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Sapio to incorporate the teachings of Lao wherein the processor circuitry is to permit the second processing iteration before data corresponding to the first processing iteration has propagated from the computing resource to the parameter server. One would have been motivated to make this modification because doing so would give the benefit of avoiding collision in the next round, as taught by Lao [Page 745, Column 2, Paragraph 1].

Regarding claim 15: The claim recites substantially the same limitations as claim 6 and is therefore rejected for the same rationale as claim 6.

Regarding claim 16: The claim recites substantially the same limitations as claim 7 and is therefore rejected for the same rationale as claim 7.
Regarding claim 20: The claim recites substantially the same limitations as claim 4 and is therefore rejected for the same rationale as claim 4.

Regarding claim 21: The claim recites substantially the same limitations as claim 5 and is therefore rejected for the same rationale as claim 5.

Regarding claim 22: The claim recites substantially the same limitations as claim 6 and is therefore rejected for the same rationale as claim 6.

Regarding claim 23: The claim recites substantially the same limitations as claim 7 and is therefore rejected for the same rationale as claim 7.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zhang et al. (Is Network the Bottleneck of Distributed Training?, 2020) discloses a first-principles approach to measuring and analyzing the network performance of distributed training: "As expected, our measurement confirms that communication is the component that blocks distributed training from linear scale-out. However, contrary to the common belief, we find that the network is running at low utilization and that if the network can be fully utilized, distributed training can achieve a scaling factor of close to one." Abdelmoniem et al. (DC2: Delay-aware Compression Control for Distributed Machine Learning, 2021) advocates a more controlled use of compression and proposes DC2, a delay-aware compression control mechanism. DC2 couples compression control and network delays in applying compression adaptively. DC2 not only compensates for network variations but can also strike a better trade-off between training speed and accuracy. DC2 is implemented as a drop-in module to the communication library used by the ML toolkit and can operate in a variety of network settings.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAITANYA RAMESH JAYAKUMAR, whose telephone number is (571) 272-3369. The examiner can normally be reached Mon-Fri, 9am-1pm.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Omar Fernandez Rivas, can be reached at (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/C.R.J./ Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/ Supervisory Patent Examiner, Art Unit 2128
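Stripped of the legalese, independent claim 1 describes a simple gate: a worker may begin iteration N+1 only after (a) the INA has acknowledged the vector it sent for iteration N and (b) the INA's availability status is true. A minimal Python sketch of that gate follows; all names are hypothetical illustrations of the claim language, not any party's actual implementation:

```python
from dataclasses import dataclass


@dataclass
class INAStatus:
    """State a worker tracks about its in-network aggregator (hypothetical)."""
    available: bool        # the claim's "availability status of the INA"
    acked_iteration: int   # last iteration whose vector the INA has ACKed


def may_start_next_iteration(status: INAStatus, completed_iteration: int) -> bool:
    """Permit iteration N+1 only if the ACK for iteration N's vector was
    received (limitation (a)) and the INA reports available (limitation (b))."""
    ack_received = status.acked_iteration >= completed_iteration
    return ack_received and status.available


# Prohibited while the INA is busy serving another job...
assert not may_start_next_iteration(INAStatus(available=False, acked_iteration=1), 1)
# ...and while the ACK for the first vector is still outstanding...
assert not may_start_next_iteration(INAStatus(available=True, acked_iteration=0), 1)
# ...but permitted once both conditions hold.
assert may_start_next_iteration(INAStatus(available=True, acked_iteration=1), 1)
```

In the examiner's mapping, Sapio's per-packet ACK stream supplies condition (a) and Lao's collision/availability flag supplies condition (b), which is why the rejection combines the two references.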

Prosecution Timeline

Dec 23, 2022: Application Filed
Feb 16, 2023: Response after Non-Final Action
Mar 03, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12293260: GENERATING AND DEPLOYING PACKAGES FOR MACHINE LEARNING AT EDGE DEVICES (granted May 06, 2025; 2y 5m to grant)
Patent 12147915: SYSTEMS AND METHODS FOR MODELLING PREDICTION ERRORS IN PATH-LEARNING OF AN AUTONOMOUS LEARNING AGENT (granted Nov 19, 2024; 2y 5m to grant)
Patent 11770571: Matrix Completion and Recommendation Provision with Deep Learning (granted Sep 26, 2023; 2y 5m to grant)
Patent 11769074: COLLECTING OBSERVATIONS FOR MACHINE LEARNING (granted Sep 26, 2023; 2y 5m to grant)
Patent 11741693: SYSTEM AND METHOD FOR SEMI-SUPERVISED CONDITIONAL GENERATIVE MODELING USING ADVERSARIAL NETWORKS (granted Aug 29, 2023; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 26%
With Interview: 48% (+22.5%)
Median Time to Grant: 4y 6m
PTA Risk: Low
Based on 51 resolved cases by this examiner. Grant probability is derived from the career allow rate.
