Prosecution Insights
Last updated: April 19, 2026
Application No. 18/258,523

IMPROVED DISTRIBUTED TRAINING OF GRAPH-EMBEDDING NEURAL NETWORKS

Status: Non-Final OA (§103)
Filed: Jun 20, 2023
Examiner: FIGUEROA, KEVIN W
Art Unit: 2124
Tech Center: 2100 — Computer Architecture & Software
Assignee: Orange
OA Round: 1 (Non-Final)
Grant Probability: 70% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 4y 0m
With Interview: 91%

Examiner Intelligence

Career Allow Rate: 70% (above average; 252 granted / 362 resolved; +14.6% vs TC avg)
Interview Lift: +21.0% (strong; based on resolved cases with interview)
Avg Prosecution: 4y 0m (typical timeline; 20 currently pending)
Total Applications: 382 (career history, across all art units)

Statute-Specific Performance

§101: 24.4% (-15.6% vs TC avg)
§103: 52.0% (+12.0% vs TC avg)
§102: 9.1% (-30.9% vs TC avg)
§112: 7.1% (-32.9% vs TC avg)
Based on career data from 362 resolved cases; deltas are measured against the Tech Center average estimate.

Office Action (§103)

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

"A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made."

Claims 1-15 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng, Da, et al., "DistDGL: Distributed graph neural network training for billion-scale graphs," in view of Zhu, Rong, et al., "AliGraph: A comprehensive graph neural network platform" [herein Ali].

Regarding claims 1 and 14, Zheng teaches "a computer-implemented method for distributed training of a graph-embedding neural network, the method performed at a first server and comprising" (abstract: "DistDGL distributes the graph and its associated data (initial features and embeddings) across the machines and uses this distribution to derive a computational decomposition by following an owner-compute rule.
DistDGL follows a synchronous training approach and allows ego-networks forming the mini-batches to include non-local nodes"); "computing, based on a first input data sample, first model data and first embedding data of a first graph neural network, the first graph neural network corresponding to a first set of nodes of a graph that are visible to the first server" (pg. 2, left col., ¶1: "It distributes graph data (both graph structure and the associated data, such as node and edge features) across all machines and run trainers, sampling servers (for sampling subgraphs to generate mini-batches) and in-memory KVStore servers (for serving node data and edge data) all on the same set of machines"); "sharing the first model data and the first embedding data with a second server" (pg. 3, left col.: "A number of trainers that compute the gradients of the model parameters over a mini-batch. At each iteration, they first fetch the mini-batch graphs from the samplers and the corresponding vertex/edge features from the KVStore. They then run the forward and backward computation on their own mini-batches in parallel to compute the gradients," i.e., sharing).

While Zheng generally teaches the remaining limitations, Ali, in the same field of endeavor, teaches "receiving second embedding data from a third server" (Ali, pg. 3, left col.: "If the neighbors of a vertex are not cached, a call to remote graph server is needed. When getting the context of a batch of vertices, we first partition the vertices into sub-batches, and the context of each sub-batch will be stitched together after being returned from the corresponding graph server"), and "the second embedding data comprising embedding data of a second graph neural network corresponding to a second set of nodes of the graph that are invisible to the first server" (pg. 5, §3.3: "Recall that, GNN algorithms rely on aggregating neighborhood information to generate embeddings of each vertex.
However, the degree distribution of real-world graphs is often skewed [48], which makes the convolution operation hard to operate. To tackle this, existing GNNs usually adopt various sampling strategies to sample a subset of neighbors with aligned sizes"); and "computing second model data of the first graph neural network based on a second input data sample and the embedding data of the second graph neural network" (Ali, previous citation: "existing GNNs usually adopt various sampling strategies to sample a subset of neighbors with aligned sizes," i.e., using the other data to compute the current data).

It would have been obvious to one having ordinary skill in the art at the time that the invention was effectively filed to combine the teachings of Zheng with that of Ali since "AliGraph runs 40%-50% faster with the novel caching strategy and demonstrates around 12 times speed up with the improved runtime. In addition, our in-house developed GNN models all showcase their statistically significant superiorities in terms of both effectiveness and efficiency (e.g., 4.12%–17.19% lift by F1 scores)." (Ali, abstract). That is, by combining the two references, one would have faster distributed GNN training.

Note that independent claim 14 recites the same substantial subject matter as independent claim 1, differing only in embodiment. The difference in embodiment, a computer-implemented method as opposed to a computer server executing the method, is an obvious variation of the other. The additional limitations of a processor and memory are inherent components of any computing system, such as the system of Zheng and Ali.

Regarding claim 2, the Zheng and Ali references have been addressed above. Ali further teaches "computing third embedding data of the first graph neural network based on the second input data sample and the second embedding data" (Ali, pg. 5, §3.3: "Recall that, GNN algorithms rely on aggregating neighborhood information to generate embeddings of each vertex.
However, the degree distribution of real-world graphs is often skewed [48], which makes the convolution operation hard to operate. To tackle this, existing GNNs usually adopt various sampling strategies to sample a subset of neighbors with aligned sizes"; the system is not limited to any particular number of servers/data, and therefore the functionality is the same whether it is the third, fourth, fifth data, etc.); and Zheng teaches "sharing the third embedding data with the second server" (pg. 8, right col.: "GNN models are composed of multiple operators organized into multiple graph convolution network layers shared among all nodes and edges").

Regarding claim 3, the Zheng and Ali references have been addressed above. Ali further teaches "wherein the embedding data of the second graph neural network is computed by a fourth server" (Ali, pg. 5, §3.3: "Recall that, GNN algorithms rely on aggregating neighborhood information to generate embeddings of each vertex. However, the degree distribution of real-world graphs is often skewed [48], which makes the convolution operation hard to operate. To tackle this, existing GNNs usually adopt various sampling strategies to sample a subset of neighbors with aligned sizes"; the system is not limited to any particular number of servers/data, and therefore the functionality is the same whether it is the third, fourth, fifth data, etc.).

Regarding claim 4, the Zheng and Ali references have been addressed above. Zheng further teaches "wherein the third server is a parameter server that receives the embedding data of the second graph neural network from the fourth server" (pg. 3, left col.: "A number of trainers that compute the gradients of the model parameters over a mini-batch. At each iteration, they first fetch the mini-batch graphs from the samplers and the corresponding vertex/edge features from the KVStore.").

Regarding claim 5, the Zheng and Ali references have been addressed above.
Zheng further teaches "wherein the third server is the fourth server" (abstract: "we develop DistDGL, a system for training GNNs in a mini-batch fashion on a cluster of machines," i.e., servers/machines can be the same or not).

Regarding claim 6, the Zheng and Ali references have been addressed above. Zheng further teaches "wherein the second server is different than the fourth server" (abstract: "we develop DistDGL, a system for training GNNs in a mini-batch fashion on a cluster of machines," i.e., servers/machines can be the same or not, and fig. 3, which shows two distinct machines/servers).

Regarding claim 7, the Zheng and Ali references have been addressed above. Zheng further teaches "wherein sharing the third embedding data with the second server comprises sharing the computed third embedding data and the second embedding data received from the third server" (pg. 8, right col.: "GNN models are composed of multiple operators organized into multiple graph convolution network layers shared among all nodes and edges").

Regarding claim 8, the Zheng and Ali references have been addressed above. Ali further teaches "wherein the third server combines the first embedding data and the embedding data of the second graph neural network to form the second embedding data" (Ali, pg. 3, Algorithm 1, which shows combination of vertices/embeddings).

Regarding claim 9, the Zheng and Ali references have been addressed above. Zheng further teaches "further comprising sharing the second model data of the first graph neural network with the second server" (pg. 8, right col.: "GNN models are composed of multiple operators organized into multiple graph convolution network layers shared among all nodes and edges").

Regarding claim 10, the Zheng and Ali references have been addressed above.
Zheng further teaches "further comprising receiving third model data, comprising a model of the graph-embedding neural network, from the third server, said third model data being used when computing said second model data" (pg. 2, right col.: "This sampling strategy forms a computation graph for passing messages on. Figure 1b depicts such a graph for computing representation of one target vertex when the GNN has two layers. The sampled graph and together with the extracted features are called a mini-batch in GNN training," which can be and is received by any of the servers and used by any).

Regarding claim 11, the Zheng and Ali references have been addressed above. Zheng further teaches "wherein said third model data comprises aggregate model data obtained by aggregating, at the third server, a plurality of model data received from different servers" (pg. 3, left col.: "A dense model update component for aggregating dense GNN parameters to perform synchronous SGD. DistDGL reuses the existing components depending on DGL's backend deep learning frameworks").

Regarding claim 12, the Zheng and Ali references have been addressed above. Zheng further teaches "further comprising aggregating the third model data with the first model data to produce aggregate model data; and using the aggregate model data when computing the second model data" (pg. 3, left col.: "A dense model update component for aggregating dense GNN parameters to perform synchronous SGD. DistDGL reuses the existing components depending on DGL's backend deep learning frameworks," and pg. 6, left col.: "This hybrid approach is potentially more advantageous than the multiprocessing approach for synchronous SGD because we need to aggregate gradients of model parameters from all trainer processes and broadcast new model parameters to all trainers. More trainer processes result in more communication overhead for model parameter updates").

Regarding claim 13, the Zheng and Ali references have been addressed above.
Zheng further teaches "wherein computing the second model data of the first graph neural network comprises integrating the embedding data of the second graph neural network into the first graph neural network beginning at a first convolutional layer of the first graph neural network" (pg. 2, right col.: "Similar to convolutional neural networks (CNNs), a GNN model iteratively applies Equations (1) to generate vertex representations for multiple layers.").

Regarding claim 15, the Zheng and Ali references have been addressed above. Zheng further teaches "the computer server of claim 14; and at least one server, connected to the computer server" (abstract: "we develop DistDGL, a system for training GNNs in a mini-batch fashion on a cluster of machines"; machines, i.e., servers). Ali teaches "said at least one server configured to receive model data and embedding data from the computer server and to return aggregate model data and aggregate embedding data to the computer server" (pg. 3, right col.: "Specifically, we apply the SAMPLE function to fetch a subset S of vertices based on the neighbor set Nb(v) of vertex v, aggregate the embeddings of all vertices u ∈ S by the AGGREGATE function to obtain a vector h'_v, and combine h'_v with h_v^(k-1) to generate the embedding vector h_v^(k) by the COMBINE function. After processing all vertices, the embedding vectors are normalized. Finally, after k_max hops, h_v^(k_max) is returned as the embedding result h_v of vertex v").

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN W FIGUEROA, whose telephone number is (571) 272-4623. The examiner can normally be reached Monday-Friday, 10AM-6PM EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, MIRANDA HUANG, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

KEVIN W FIGUEROA
Primary Examiner
Art Unit 2124

/Kevin W Figueroa/
Primary Examiner, Art Unit 2124
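The two technical mechanisms the examiner leans on, AliGraph's SAMPLE/AGGREGATE/COMBINE loop (cited for claim 15) and DistDGL's synchronous-SGD gradient aggregation (cited for claims 11-12), can be illustrated with a minimal sketch. This is not either reference's actual API: the function names, the mean aggregator, and the elementwise-average combiner are illustrative assumptions; the papers leave the concrete operators open.

```python
import random

def khop_embedding(v, neighbors, h0, k_max, sample_size):
    """Sketch of the AliGraph-style loop quoted for claim 15:
    SAMPLE a fixed-size neighbor subset, AGGREGATE its embeddings,
    COMBINE with the previous hop's embedding, then normalize."""
    h = dict(h0)            # hop-0 embeddings for every vertex
    vertices = list(h0)
    for _ in range(k_max):
        h_next = {}
        for u in vertices:
            # SAMPLE: fixed-size subset of Nb(u), taming skewed degrees
            s = random.sample(neighbors[u], min(sample_size, len(neighbors[u])))
            # AGGREGATE: mean of the sampled neighbors' embeddings
            agg = [sum(x) / len(s) for x in zip(*(h[w] for w in s))] if s else h[u]
            # COMBINE: merge h'_u with h_u^(k-1) (elementwise average here)
            h_next[u] = [(a + b) / 2 for a, b in zip(agg, h[u])]
        # normalize each embedding vector (L2)
        for u, vec in h_next.items():
            n = sum(x * x for x in vec) ** 0.5 or 1.0
            h_next[u] = [x / n for x in vec]
        h = h_next
    return h[v]             # h_v^(k_max), the embedding result for v

def aggregate_gradients(per_trainer_grads):
    """Sketch of the synchronous-SGD step quoted for claims 11-12:
    average gradients from all trainers before one shared model update."""
    n = len(per_trainer_grads)
    return [sum(gs) / n for gs in zip(*per_trainer_grads)]
```

The point of the sketch for prosecution purposes: in this decomposition the "sharing" and "aggregating" steps are generic to any parameter-server design, which is why distinguishing the claims will likely turn on the visible/invisible node partition rather than on aggregation per se.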

Prosecution Timeline

Jun 20, 2023: Application Filed
Mar 21, 2026: Non-Final Rejection under §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586093: SYSTEMS AND METHODS FOR FACILITATING NETWORK CONTENT GENERATION VIA A DYNAMIC MULTI-MODEL APPROACH. Granted Mar 24, 2026 (2y 5m to grant).
Patent 12573477: MOLECULAR STRUCTURE ACQUISITION METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM. Granted Mar 10, 2026 (2y 5m to grant).
Patent 12570281: METHOD FOR EVALUATING DRIVING RISK LEVEL IN TUNNEL BASED ON VEHICLE BUS DATA AND SYSTEM THEREFOR. Granted Mar 10, 2026 (2y 5m to grant).
Patent 12554964: CIRCUIT FOR HANDLING PROCESSING WITH OUTLIERS. Granted Feb 17, 2026 (2y 5m to grant).
Patent 12547873: METHOD AND APPARATUS WITH NEURAL NETWORK INFERENCE OPTIMIZATION IMPLEMENTATION. Granted Feb 10, 2026 (2y 5m to grant).
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 70%
With Interview: 91% (+21.0%)
Median Time to Grant: 4y 0m
PTA Risk: Low
Based on 362 resolved cases by this examiner. Grant probability derived from career allow rate.
