Office Action Analysis: 18154523 — MULTI-GRAPH CONVOLUTION COLLABORATIVE FILTERING

Office Action

§101 §102 §103
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


	Claims 1-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea (mental process) without significantly more. 
Regarding claim 1, in Step 1 of the 101 analysis set forth in MPEP 2106, the claim recites a system that processes a bipartite graph. A system is one of the four statutory categories of invention.
In Step 2a Pong 1 of the 101 analysis set forth in the MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for recitation of generic computer components:
generating a target first node embedding for a target first node based on features of second nodes and first nodes that are within a multi-hop first node neighbourhood of the target first node, the target first node being selected from the plurality of first nodes of the first node type; (a person can mentally generate an embedding for nodes based on data as a process of simply evaluating data and making a judgement based on the observed data. (MPEP 2106))
  generating a target second node embedding for a target second node based on features of first nodes and second nodes that are within a multi-hop second node neighbourhood of the target second node, the target second node being selected from the plurality of second nodes of the second node type (a person can mentally generate an embedding for nodes based on data as a process of simply evaluating data and making a judgement based on the observed data. (MPEP 2106))
and determining a relationship between the target first node and the target second node based on the target first node embedding and the target second node embedding. (a person can mentally determine a relationship between data based on data as a process of simply evaluating data and making a judgement based on the observed data. (MPEP 2106))
If claim limitations, under their broadest reasonable interpretation, covers performance of the limitations as a mental process but for the recitation of generic computer components, then it 
falls within the mental process grouping of abstract ideas. According, the claim “recites” an abstract idea.
In Step 2a Prong 2 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
A computer implemented method for processing a bipartite graph that comprises a plurality of first nodes of a first node type, and a plurality of second nodes of a second type, comprising: (Generally linking the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h))).
Since the claim does not contain any other additional elements that are indicative of integration into a practical application, the claim is “directed” to an abstract idea.
In Step 2b of the 101 analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional element (ii) recites generally linking the use of the judicial exception to a particular technological environment or field of use, which is not indicative of significantly more. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claim 2, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 2 recites The method of claim 1 wherein: generating the target first node embedding comprises: for each of a plurality of second nodes included within the first node neighbourhood: (This limitation generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).) aggregating features for first nodes that are direct neighbours of the second node within the first node neighbourhood, and mapping the aggregated features to a respective second node embedding; (In step 2A pong 2, aggregating data is a mere application of a computer tool (machine learning model) aggregating the second node embeddings for the plurality of second nodes included within the first node neighbourhood; (In step 2A pong 2, aggregating data is a mere application of a computer tool (machine learning model)  and mapping the aggregated second node embeddings to generate the target first node embedding; (In step 2A pong 2, mapping data is a mere application of a computer tool (machine learning model) and generating the target second node embedding comprises: for each of a plurality of first nodes included within the second node neighbourhood: aggregating features for second nodes that are direct neighbours of the first node within the second node neighbourhood, and mapping the aggregated features to a respective first node embedding; (In step 2A pong 2, aggregating data is a mere application of a computer tool (machine learning model) aggregating the first node embeddings for the plurality of first nodes included within the second node neighbourhood; (In step 2A pong 2, aggregating data is a mere application of a computer tool (machine learning model) and mapping the aggregated first node embeddings to generate the target second node embedding. (In step 2A pong 2, mapping data is a mere application of a computer tool (machine learning model)
Regarding claim 3, it is dependent upon claim 2, and thereby incorporates the limitations of, and corresponding analysis applied to claim 2. Further, claim 3 recites The method of claim 2 wherein each aggregating and mapping is performed using a respective function that is defined by a respective set of learnable parameters, wherein the aggregating and mapping is performed iteratively in respect of the target first node and the target second node and the learnable parameters updated to optimize an objective function calculated based on the target first node embedding and the target second node embedding. (This limitation generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).)
Regarding claim 4, it is dependent upon claim 3, and thereby incorporates the limitations of, and corresponding analysis applied to claim 3. Further, claim 4 recites The method of claim 3 wherein the functions are implemented within a graphic convolution network (GCN) and the respective sets of learnable parameters are weight matrices.  (This limitation generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).)
Regarding claim 5, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 5 recites The method of claim 1, comprising: defining the first node neighbourhood of the target first node by randomly sampling the bipartite graph to: select a second node subset from second nodes that are direct neighbours of the target first node, and select respective subsets of first nodes from first nodes that are direct neighbours of each of the second nodes of the second node subset; (In step 2A, prong 1, this recites a mental process but for recitation of generic computer components which is not indicative of integration into a practical application). and defining the second node neighbourhood of the target second node by randomly sampling the bipartite graph to: select a first node subset from first nodes that are direct neighbours of the target second node, and select respective subsets of second nodes from second nodes that are direct neighbours of each of the first nodes of the first node subset. (In step 2A, prong 1, this recites a mental process but for recitation of generic computer components which is not indicative of integration into a practical application).
Regarding claim 6, it is dependent upon claim 5, and thereby incorporates the limitations of, and corresponding analysis applied to claim 5. Further, claim 6 recites The method of claim 5 wherein respective predefined hyper-parameters define respective sizes of the second node subset, the respective subsets of first nodes, the first node subset and the respective subsets of second nodes. (This limitation generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).)
Regarding claim 7, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 7 recites The method of claim 1, comprising: determining, based on the bipartite graph, first node to first node relationship information and constructing a first node graph that includes first nodes, including the target first node, from the bipartite graph and the first node to first node relationship information; (In step 2A, prong 1, this recites a mental process but for recitation of generic computer components which is not indicative of integration into a practical application). generating a first node-first node embedding for the target first node based on the first node graph; (In step 2A pong 2, generating data is a mere application of a computer tool (machine learning model) determining, based on the bipartite graph, second node to second node relationship information and constructing a second node graph that includes second nodes, including the target second node, from the bipartite graph and the second node to second node relationship information; (In step 2A, prong 1, this recites a mental process but for recitation of generic computer components which is not indicative of integration into a practical application). generating a second node-second node embedding for the target second node based on the second node graph; (In step 2A pong 2, generating data is a mere application of a computer tool (machine learning model) wherein the relationship between the target first node and the target second node is determined also based on the first node-first node embedding and the second node-second node embedding. (This limitation generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).)
Regarding claim 8, it is dependent upon claim 7, and thereby incorporates the limitations of, and corresponding analysis applied to claim 7. Further, claim 8 recites The method of claim 7 wherein: determining the first node to first node relationship information comprises determining the presence or absence of a direct neighbor relationship between respective pairs of the first nodes based on calculating pairwise cosine similarities between the respective pairs of the first nodes, and determining the second node to second node relationship information comprises determining the presence or absence of a direct neighbor relationship between respective pairs of the second nodes based on calculating pairwise cosine similarities between the respective pairs of second nodes. (In step 2A, prong 1, this recites a mental process but for recitation of generic computer components which is not indicative of integration into a practical application).
Regarding claim 9, it is dependent upon claim 8, and thereby incorporates the limitations of, and corresponding analysis applied to claim 8. Further, claim 9 recites The method of claim 8 wherein: generating the first node-first node embedding for the target first node comprises using a first node-first node aggregating function having learnable parameters to aggregate features of the first nodes that are direct neighbours of the target first node in the first node graph, (In step 2A pong 2, aggregating data is a mere application of a computer tool (machine learning model) and generating the second node-second node embedding for the target second node comprises using a second node-second node aggregating function having learnable parameters to aggregate features of the second nodes that are direct neighbours of the target second node in the second node graph. (In step 2A pong 2, generating data is a mere application of a computer tool (machine learning model)
Regarding claim 10, it is dependent upon claim 7, and thereby incorporates the limitations of, and corresponding analysis applied to claim 7. Further, claim 10 recites The method of claim 7 comprising: generating, using a first skip connection transformation function having learnable parameters, a target first node skip connection embedding based on an initial target first node embedding; (In step 2A pong 2, generating data is a mere application of a computer tool (machine learning model) and generating, using a second skip connection transformation function having learnable parameters, a target second node skip connection embedding based on an initial target second node embedding, (In step 2A pong 2, generating data is a mere application of a computer tool (machine learning model) wherein the relationship between the target first node and the target second node is determined also based on the target first node skip connection and the target second node skip connection. (This limitation generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).)
Regarding claim 11, it is dependent upon claim 10, and thereby incorporates the limitations of, and corresponding analysis applied to claim 10. Further, claim 11 recites The method of claim 10 comprising: determining a first node embedding by fusing the target first node embedding, the first node-first node embedding and the first node skip connection embedding; (In step 2A, prong 1, this recites a mental process but for recitation of generic computer components which is not indicative of integration into a practical application). determining a second node embedding by fusing the target second node embedding, the second node-second node embedding and the second node skip connection embedding; (In step 2A, prong 1, this recites a mental process but for recitation of generic computer components which is not indicative of integration into a practical application). the relationship between the target first node and the target second node being determined based on the first node embedding and the second node embedding. (In step 2A, prong 1, this recites a mental process but for recitation of generic computer components which is not indicative of integration into a practical application).
Regarding claim 12, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 12 recites The method of claim 1 wherein the first nodes represent users and the second nodes represent items, the bipartite graph includes historical user-item interaction data, and the method further comprising determining an item recommendation for a user represented by the target node based on the determined relationship between the target first node and the target second node. (This limitation generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-4 and 13-14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by HE “Bipartite Graph Neural Networks for Efficient Node Representation Learning” .
Regarding claim 1 , HE teaches the invention substantially as claimed, including:
A computer implemented method for processing a bipartite graph that comprises a plurality of first nodes of a first node type, and a plurality of second nodes of a second type, comprising: generating a target first node embedding for a target first node based on features of second nodes and first nodes that are within a multi-hop first node neighbourhood of the target first node, the target first node being selected from the plurality of first nodes of the first node type; ((page 3, section 3, paragraph 3) Here we introduce the overall frame work of our model for better readability. In general, bipartite graph representation learning is to learn the embedding representations Hu ∈ RP′(using notation, this is a vector of a vector of a matrix, which would mean a singular node.) and Hv ∈ RQ′ for nodes in group U and V (two node groups/neighborhoods), respectively. Let femb be a general bipartite graph embedding model with parametersθ, then the representation of Hu and Hv is defined as follow: Hu,Hv = femb(Xu,Bu,Xv,Bv;θ)) generating a target second node embedding for a target second node based on features of first nodes and second nodes that are within a multi-hop second node neighbourhood of the target second node, the target second node being selected from the plurality of second nodes of the second node type; ((page 3, section 3, paragraph 3)Here we introduce the overall frame work of our model for better readability. In general, bipartite graph representation learning is to learn the embedding representations Hu ∈ RP′ and Hv ∈ RQ′ for nodes in group U and V, respectively. Let femb be a general bipartite graph embedding model with parametersθ, then the representation of Hu and Hv is defined as follow: Hu,Hv = femb(Xu,Bu,Xv,Bv;θ) (this formula shows that the node representations are based on the other group’s nodes))  and determining a relationship between the target first node and the target second node based on the target first node embedding and the target second node embedding. ((page 2, figure 2 description) Given the inputs of two domains Xu and Xv, we obtain their unsupervised-embedded representations as Hu and Hv via inter-domain message passing and inter-domain distribution alignment. To enable multi-hop neighbor information aggregation, we stack multiple layers to formulate a deep BGNN whose layers are trained in a cascaded manner. In this paper, we develop the BGNN model for bipartite graph node representation learning. BGNN is expressive, unsupervised and resource-economic. We proposed Interdomain Message Passing(IDMP) as the encoder and Intradomain Alignment(IDA) by adversarial learning to address the node feature inconsistency issue in bipartite graphs. We further design the cascaded architecture to capture the multi hop relationship in local bipartite structure as well as to improve the scalability. The extensive experiments confirm the effectiveness and efficiency of our models.)

	HE further teaches:
The method of claim 1 wherein: generating the target first node embedding comprises: for each of a plurality of second nodes included within the first node neighbourhood: aggregating features for first nodes that are direct neighbours of the second node within the first node neighbourhood, and mapping the aggregated features to a respective second node embedding; ((page 2, figure 2 description) Given the inputs of two domains Xu and Xv (first and second neighborhoods), we obtain their unsupervised-embedded representations as Hu and Hv via inter-domain message passing and inter-domain distribution alignment. To enable multi-hop neighbor information aggregation, we stack multiple layers to formulate a deep BGNN whose layers are trained in a cascaded manner. (cascaded training means each node in a row is aggregated into the next meaning each node is aggregated into their neighbor)) aggregating the second node embeddings for the plurality of second nodes included within the first node neighbourhood; ((page 2, figure 2 description) Given the inputs of two domains Xu and Xv, we obtain their unsupervised-embedded representations as Hu and Hv via inter-domain message passing and inter-domain distribution alignment. To enable multi-hop neighbor information aggregation, we stack multiple layers to formulate a deep BGNN whose layers are trained in a cascaded manner.) and mapping the aggregated second node embeddings to generate the target first node embedding; ((page 2, figure 2 description) Given the inputs of two domains Xu and Xv, we obtain their unsupervised-embedded representations as Hu and Hv via inter-domain message passing and inter-domain distribution alignment. To enable multi-hop neighbor information aggregation, we stack multiple layers to formulate a deep BGNN whose layers are trained in a cascaded manner.) and generating the target second node embedding comprises: for each of a plurality of first nodes included within the second node neighbourhood: aggregating features for second nodes that are direct neighbours of the first node within the second node neighbourhood, and mapping the aggregated features to a respective first node embedding; ((page 2, figure 2 description) Given the inputs of two domains Xu and Xv, we obtain their unsupervised-embedded representations as Hu and Hv via inter-domain message passing and inter-domain distribution alignment. To enable multi-hop neighbor information aggregation, we stack multiple layers to formulate a deep BGNN whose layers are trained in a cascaded manner.) aggregating the first node embeddings for the plurality of first nodes included within the second node neighbourhood; and mapping the aggregated first node embeddings to generate the target second node embedding.  ((page 2, figure 2 description) Given the inputs of two domains Xu and Xv, we obtain their unsupervised-embedded representations as Hu and Hv via inter-domain message passing and inter-domain distribution alignment. To enable multi-hop neighbor information aggregation, we stack multiple layers to formulate a deep BGNN whose layers are trained in a cascaded manner.)

	HE further teaches:
The method of claim 2 wherein each aggregating and mapping is performed using a respective function that is defined by a respective set of learnable parameters, wherein the aggregating and mapping is performed iteratively in respect of the target first node and the target second node ((page 2, figure 2 description) Given the inputs of two domainsXu andXv, we obtaintheir unsupervised-embeddedrepresen tations as Hu and Hv via inter-domain message passing and inter-domain distribution alignment. To enable multi-hop neighbor information aggregation, we stack multiple layers to formulate a deep BGNN whose layers are trained in a cascaded manner. Adversarial training loss of cascaded architecture. The x axis denotes iteration numbers in each layer. (as seen, the cascaded agreggation is a part of an iterative training system. As such, the aggregation itself is iterative.)) and the learnable parameters updated to optimize an objective function calculated based on the target first node embedding and the target second node embedding.  ((page 4, section 3.3, paragraph 1) In this section, we present the cascaded architecture design for our proposed BGNN model. In Figure 3(a), we depict a detailed diagram to illustrate our cascaded training process to compare with the conventional end-to-end training paradigm. We regard the previous IDMP with IDA as one depth. Each depth (layer) is trained one-hop embedding in Eepochsat a time. Then its final learned embeddingis used as the input for the later depth training. This is in contrast to the conventional end-to-end training paradigm which prop agates through multiple depths for a fixed number of hops to train the final embedding (Shown in Figure 3(b))
    PNG
    media_image1.png
    110
    451
    media_image1.png
    Greyscale
)
	
HE further teaches:
The method of claim 3 wherein the functions are implemented within a graphic convolution network (GCN) ((section 2, page 3) Our work is also relevant to those GCN based methods that deal with the scalability issue on large scale graphs, like the node-wise sampling method Graph SAGE (Hamilton, Ying, and Leskovec, 2017) and the ayer wise sampling method AS-GCN (Huang et al., 2018). Dif fer from these methods, our BGNN has better scale perfor mance by proposing a cascaded training architecture.) and the respective sets of learnable parameters are weight matrices. ((page 2, paragraph 1) Suppose the two domains to be U and V, respectively. In each layer (depth) of BGNN, we for mulate two kinds of information flows, one from U to V and the other from V to U, each of which is equipped with different weight filter. In this way, we attain inter-domain representation for each node.)

Regarding claims 13-14, they comprise of limitations similar to those of claims 1-2 and is therefore rejected for similar rationale.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over HE (“Bipartite Graph Neural Networks for Efficient Node Representation Learning”) in view of HADELI (US 20130235764 A1).

	Regarding claim 5, while HE does teach claim 1, which claim 5 is dependent upon, it does not specifically teach:
	The method of claim 1, comprising: defining the first node neighbourhood of the target first node by randomly sampling the bipartite graph to: select a second node subset from second nodes that are direct neighbours of the target first node, and select respective subsets of first nodes from first nodes that are direct neighbours of each of the second nodes of the second node subset; and defining the second node neighbourhood of the target second node by randomly sampling the bipartite graph to: select a first node subset from first nodes that are direct neighbours of the target second node, and select respective subsets of second nodes from second nodes that are direct neighbours of each of the first nodes of the first node subset.  
	However, in analogous art that similarly teaches managing a node neighborhood, HADELI teaches:
The method of claim 1, comprising: defining the first node neighbourhood of the target first node by randomly sampling the bipartite graph to: select a second node subset from second nodes that are direct neighbours of the target first node, and select respective subsets of first nodes from first nodes that are direct neighbours of each of the second nodes of the second node subset; ([0054] The repeater module R is configured to select a direct neighbour node in the data set including discovered dataflow information F and to direct the collector module C to update the data set including discovered dataflow information F using the selected direct neighbour node.) and defining the second node neighbourhood of the target second node by randomly sampling the bipartite graph to: select a first node subset from first nodes that are direct neighbours of the target second node, and select respective subsets of second nodes from second nodes that are direct neighbours of each of the first nodes of the first node subset.  ([0054] The repeater module R is configured to select a direct neighbour node in the data set including discovered dataflow information F and to direct the collector module C to update the data set including discovered dataflow information F using the selected direct neighbour node.)
It would have been obvious to a person skilled in the art before the effective filing date of the invention to have combined HADELI’s teaching of node subset selection, and HE’s teaching of processing of a bipartite graph and system,to realize, with a reasonable expectation of success, a system that selects node subsets for  defining a neighborhood, as in HADELI, during the processing of a bipartite graph, as in HE. A person of ordinary skill would have been motivated to make this combination to improve dataflow.(HADELI [0005])

Regarding claim 16, it comprises of limitations similar to those of claim 5 and is therefore rejected for similar rationale

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over HE (“Bipartite Graph Neural Networks for Efficient Node Representation Learning”), HADELI (U.S. Pub. No. US 20130235764 A1) in further view of BENGIO (U.S. Pub. No. US 20150178596 A1)

	Regarding claim 6, while HE, as modified by HADELI, does teach claim 5, which claim 6 is dependent upon, it does not explicitly teach:
The method of claim 5 wherein respective predefined hyper-parameters define respective sizes of the second node subset, the respective subsets of first nodes, the first node subset and the respective subsets of second nodes
	However, in analogous art that similarly processes node, BENGIO teaches:
The method of claim 5 wherein respective predefined hyper-parameters define respective sizes of the second node subset, the respective subsets of first nodes, the first node subset and the respective subsets of second nodes. ([0032] N corresponds to a hyper-parameter used to determine the size of the subset of nodes with a consistency above a threshold)
It would have been obvious to a person skilled in the art before the effective filing date of the invention to have combined BENGIO’s teaching of defining nodes, and HE’s, as modified by HADELI, teaching of processing of a bipartite graph and system, to realize, with a reasonable expectation of success, a system that defines nodes with predefined parameters, as in HADELI, during the processing of a bipartite graph, as in HE, as modified by HADELI. A person of ordinary skill would have been motivated to make this combination to improve node functionality.(BENIGO [0032])

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over HE (“Bipartite Graph Neural Networks for Efficient Node Representation Learning”), in view of SMYTH (U.S. Pub. No. US 9107595 B1)

	Regarding claim 7, while HE does teach claim 1, which claim 7 is dependent upon, it does not explicitly teach:
The method of claim 1, comprising: determining, based on the bipartite graph, first node to first node relationship information and constructing a first node graph that includes first nodes, including the target first node, from the bipartite graph and the first node to first node relationship information;
However, in analogous art that similarly handles node neighbors in a graph, SMYTH teaches:
The method of claim 1, comprising: determining, based on the bipartite graph, first node to first node relationship information and constructing a first node graph that includes first nodes, including the target first node, from the bipartite graph and the first node to first node relationship information; ((column 23, lines 21-33) The degree of a node (vertex) is the number of connections (edges) with other nodes in a network. The degree of the network is the average across all nodes. The connections are determined from a binary cross-correlation matrix computed from the original using a threshold of 0.817 (mean of 0.778+1.96*std dev of 0.019); nodes are connected if the correlation exceeds this value. The result is a set of sub-graph matrices for the direct neighbor nodes of each sub-graph index node. Table 1 of FIG. 33 lists the node degrees by event class.) 
It would have been obvious to a person skilled in the art before the effective filing date of the invention to have combined SMYTH’s relationship information gathering, and HE’s teaching of processing of a bipartite graph and system, to realize, with a reasonable expectation of success, a system gathers node relationship information, as in SMYTH, for the processing of a bipartite graph, as in HE. A person of ordinary skill would have been motivated to make this combination to improve result accuracy.(SMYTH column 1, line 60-column 2, line 18)
HE further teaches:
generating a first node-first node embedding for the target first node based on the first node graph; ((page 2, figure 2 description) Given the inputs of two domains Xu and Xv, we obtain their unsupervised-embedded representations as Hu and Hv via inter-domain message passing and inter-domain distribution alignment. To enable multi-hop neighbor information aggregation, we stack multiple layers to formulate a deep BGNN whose layers are trained in a cascaded manner.) 
SMYTH further teaches:
determining, based on the bipartite graph, second node to second node relationship information and constructing a second node graph that includes second nodes, including the target second node, from the bipartite graph and the second node to second node relationship information; ((column 23, lines 21-33) The degree of a node (vertex) is the number of connections (edges) with other nodes in a network. The degree of the network is the average across all nodes. The connections are determined from a binary cross-correlation matrix computed from the original using a threshold of 0.817 (mean of 0.778+1.96*std dev of 0.019); nodes are connected if the correlation exceeds this value. The result is a set of sub-graph matrices for the direct neighbor nodes of each sub-graph index node. Table 1 of FIG. 33 lists the node degrees by event class. )
HE further teaches:
 generating a second node-second node embedding for the target second node based on the second node graph; wherein the relationship between the target first node and the target second node is determined also based on the first node-first node embedding and the second node-second node embedding.  ((page 2, figure 2 description) Given the inputs of two domains Xu and Xv, we obtain their unsupervised-embedded representations as Hu and Hv via inter-domain message passing and inter-domain distribution alignment. To enable multi-hop neighbor information aggregation, we stack multiple layers to formulate a deep BGNN whose layers are trained in a cascaded manner.)

Regarding claim 17, it comprises of limitations similar to those of claim 7 and is therefore rejected for similar rationale.

Claims 8-9 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over HE (“Bipartite Graph Neural Networks for Efficient Node Representation Learning”), SMYTH (U.S. Pub. No. US 9107595 B1) in further view of PRASAD TANNIRU (U.S. Pub. No. US 9107595 B1)

	While HE, as modified by SMYTH, does teach claim 7, which claim 8 is dependent upon, it does not explicitly teach:
The method of claim 7 wherein: determining the first node to first node relationship information comprises determining the presence or absence of a direct neighbor relationship between respective pairs of the first nodes based on calculating pairwise cosine similarities between the respective pairs of the first nodes
However, in analogous art that similarly handles node relationships, PRASAD TANNIRU teaches:
The method of claim 7 wherein: determining the first node to first node relationship information comprises determining the presence or absence of a direct neighbor relationship between respective pairs of the first nodes based on calculating pairwise cosine similarities between the respective pairs of the first nodes, ([0028] In some implementations, the knowledge platform may group textual data into topics in child nodes based on a cosine similarity; may allow a user to edit and/or override topics determined by the knowledge platform, relationships between nodes determined by the knowledge platform)
It would have been obvious to a person skilled in the art before the effective filing date of the invention to have combined PRASAD TANNIRU ‘s teaching determining node relationships, and HE’s, as modified by SMYTH, teaching of processing of a bipartite graph and system, to realize, with a reasonable expectation of success, a system that gathers node relationship information, as in PRASAD TANNIRU, during the processing of a bipartite graph, as in HE, as modified by HADELI. A person of ordinary skill would have been motivated to make this combination to improve node relationship accuracy.( PRASAD TANNIRU [0028])
 and determining the second node to second node relationship information comprises determining the presence or absence of a direct neighbor relationship between respective pairs of the second nodes based on calculating pairwise cosine similarities between the respective pairs of second nodes. ([0028] In some implementations, the knowledge platform may group textual data into topics in child nodes based on a cosine similarity; may allow a user to edit and/or override topics determined by the knowledge platform, relationships between nodes determined by the knowledge platform)   

	Regarding claim 8, HE further teaches:	
The method of claim 8 wherein: generating the first node-first node embedding for the target first node comprises using a first node-first node aggregating function having learnable parameters to aggregate features of the first nodes that are direct neighbours of the target first node in the first node graph, ((section 3.1 paragraph 3) where as mentioned in Eq. (2)-(3), Hk v→u ∈ RM× Q′ (resp. H(k) u→v ∈ RN×P′ ) are hidden features of the nodes in set U (resp. V ) aggregated from the features in V (resp. U), k indicates the depth index (note that when k = 0, H(0) u = Xu,H(0) v =Xvareactually input features). As wecansee fromEq.7, there are two important charac teristics that differ IDMP fromconventionalGCNs: 1.IDMP only performs aggregation Chaoyang He) and generating the second node-second node embedding for the target second node comprises using a second node-second node aggregating function having learnable parameters to aggregate features of the second nodes that are direct neighbours of the target second node in the second node graph. ((section 3.1 paragraph 3) where as mentioned in Eq. (2)-(3), Hk v→u ∈ RM× Q′ (resp. H(k) u→v ∈ RN×P′ ) are hidden features of the nodes in set U (resp. V ) aggregated from the features in V (resp. U), k indicates the depth index (note that when k = 0, H(0) u = Xu,H(0) v =Xv are actually input features). As we can see fromEq.7, there are two important characteristics that differ IDMP from conventional GCNs: 1.IDMP only performs aggregation)

Regarding claims 18-19, they comprise of limitations similar to those of claims 8-9 and are therefore rejected for similar rationale.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over HE (“Bipartite Graph Neural Networks for Efficient Node Representation Learning”), SMYTH (U.S. Pub. No. US 9107595 B1) in further view of ZHOU (U.S. Pub. No. US 20200380695 A1)

Regarding claim 10, while HE, as modified by SMYTH, does teach claim 7, which claim 10 is dependent upon, it does not explicitly teach:
The method of claim 7 comprising: generating, using a first skip connection transformation function having learnable parameters, a target first node skip connection embedding based on an initial target first node embedding;
However, in analogous art that similarly embeds nodes, ZHOU teaches:
The method of claim 7 comprising: generating, using a first skip connection transformation function having learnable parameters, a target first node skip connection embedding based on an initial target first node embedding; ([0032] Note that, in some embodiments, the U-Net.sup.e shown in FIG. 2C can benefit from knowledge sharing, because each U-Net within the U-Net.sup.e partially shares the same encoder. However, note that, because the decoders in the U-Net.sup.e are disconnected, deeper U-Nets in the U-Net.sup.e do not offer a supervision signal to decoders of shallower U-Nets in the aggregate U-Net. In particular, note that, referring to FIG. 2C, node X.sup.0,1 is not connected to node X.sup.0,2, node X.sup.0,2 is not connected to node X.sup.0,3, and node X.sup.0,3 is not connected to node X.sup.0,4 via skip connections (dashed lines). ) and generating, using a second skip connection transformation function having learnable parameters, a target second node skip connection embedding based on an initial target second node embedding, ([0032] Note that, in some embodiments, the U-Net.sup.e shown in FIG. 2C can benefit from knowledge sharing, because each U-Net within the U-Net.sup.e partially shares the same encoder. However, note that, because the decoders in the U-Net.sup.e are disconnected, deeper U-Nets in the U-Net.sup.e do not offer a supervision signal to decoders of shallower U-Nets in the aggregate U-Net. In particular, note that, referring to FIG. 2C, node X.sup.0,1 is not connected to node X.sup.0,2, node X.sup.0,2 is not connected to node X.sup.0,3, and node X.sup.0,3 is not connected to node X.sup.0,4 via skip connections (dashed lines). ) wherein the relationship between the target first node and the target second node is determined also based on the target first node skip connection and the target second node skip connection.  ([0032] Note that, in some embodiments, the U-Net.sup.e shown in FIG. 2C can benefit from knowledge sharing, because each U-Net within the U-Net.sup.e partially shares the same encoder. However, note that, because the decoders in the U-Net.sup.e are disconnected, deeper U-Nets in the U-Net.sup.e do not offer a supervision signal to decoders of shallower U-Nets in the aggregate U-Net. In particular, note that, referring to FIG. 2C, node X.sup.0,1 is not connected to node X.sup.0,2, node X.sup.0,2 is not connected to node X.sup.0,3, and node X.sup.0,3 is not connected to node X.sup.0,4 via skip connections (dashed lines). That is, in the U-Net.sup.e network, feature maps from a U-Net of a particular depth are not combined with feature maps from U-Nets of other depths. For example, feature maps from U-Net 202 with depth 1 are not combined with feature maps from U-Nets of depths 2, 3, or 4 (that is, with feature maps from U-Nets 204, 206, and 208).)
It would have been obvious to a person skilled in the art before the effective filing date of the invention to have combined ZHOU ‘s using skip connections to find a relationship, and HE’s, as modified by SMYTH, teaching of processing of a bipartite graph and system, to realize, with a reasonable expectation of success, a system that uses skip connections to find relationships between nodes, as in ZHOU, during the processing of a bipartite graph, as in HE, as modified by SMYTH. A person of ordinary skill would have been motivated to make this combination to reduce resources and time strain.(ZHOU [0005])

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over HE (“Bipartite Graph Neural Networks for Efficient Node Representation Learning”), SMYTH (U.S. Pub. No. US 9107595 B1) (U.S. Pub. No. US 20200380695 A1) in further view of GARCIA (U.S. Pub. No. US 20180247224 A1)

	While HE, as modified by SMYTH and ZHOU, does teach claim 10, which claim 11 is dependent upon, it does not explicitly teach:
 The method of claim 10 comprising: determining a first node embedding by fusing the target first node embedding, the first node-first node embedding and the first node skip connection embedding; determining a second node embedding by fusing the target second node embedding, the second node-second node embedding and the second node skip connection embedding;
However, in analogous art that similarly embeds nodes, GARCIA teaches:
The method of claim 10 comprising: determining a first node embedding by fusing the target first node embedding, the first node-first node embedding and the first node skip connection embedding; ([0074] GEMP-B jointly learns embeddings of several attributes in an unsupervised manner and combines these into a node embedding.) determining a second node embedding by fusing the target second node embedding, the second node-second node embedding and the second node skip connection embedding; ([0074] GEMP-B jointly learns embeddings of several attributes in an unsupervised manner and combines these into a node embedding.) 
It would have been obvious to a person skilled in the art before the effective filing date of the invention to have combined GARCIA ‘s node embedding combination, and HE’s, as modified by SMYTH and ZHOU, teaching of processing of a bipartite graph and system, to realize, with a reasonable expectation of success, a system that combines embeddings to make a single node embedding, as in GARCIA, using the first and second node embeddings, as in HE, as modified by SMYTH and ZHOU. A person of ordinary skill would have been motivated to make this combination to improve functionality.(GARCIA [0007])
HE further teaches:
the relationship between the target first node and the target second node being determined based on the first node embedding and the second node embedding.  (Given the inputs of two domains Xu and Xv, we obtain their unsupervised-embedded representations as Hu and Hv via inter-domain message passing and inter-domain distribution alignment. To enable multi-hop neighbor information aggregation, we stack multiple layers to formulate a deep BGNN whose layers are trained in a cascaded manner.)

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over HE (“Bipartite Graph Neural Networks for Efficient Node Representation Learning”), in view of ARORA (U.S. Pub. No. US 11763349 B2) 

Regarding claim 12, while HE does teach claim 1, which claim 12 is dependent upon it does not explicitly teach:
The method of claim 1 wherein the first nodes represent users and the second nodes represent items, the bipartite graph includes historical user-item interaction data, and the method further comprising determining an item recommendation for a user represented by the target node based on the determined relationship between the target first node and the target second node.
However in analogous art that similarly handles user data, ARORA teaches:
The method of claim 1 wherein the first nodes represent users and the second nodes represent items, the bipartite graph includes historical user-item interaction data, ((column 13, lines 19-27) The user factor p.sub.u and item factor q.sub.i may be determined by a minimizing loss function operating on user transaction data. For example, the factors may be determined by the loss function given below, which minimizes the difference between the actual user-item interaction value (r.sub.ui) from the user transaction data and predicted user-item interaction value ({circumflex over (r)}.sub.ui) through gradient descent. Once training is complete, the user and item factors can be used to obtain the user-item affinity scores for all (user, item) pairs.) and the method further comprising determining an item recommendation for a user represented by the target node based on the determined relationship between the target first node and the target second node. ((column 13, lines 45-62) Digital advertisement computing device 102 may then determine the ranking of the promoted items for each candidate user based on the user-item affinity scores. For example, digital advertisement computing device 102 may rank promoted items with higher user-item affinity scores higher than those with lower user-item affinity scores. Web server 104 may then display one or more of the promoted items on a website to the corresponding candidate user when that candidate user is browsing the website.
(57) In some examples, when a user is purchasing items at store 109, a message is transmitted to digital advertisement computing device 102 requesting recommended items for one or more campaigns for that user.)
It would have been obvious to a person skilled in the art before the effective filing date of the invention to have combined ARORA’s user-item information, and HE’s bipartite graph and nodes, to realize, with a reasonable expectation of success, a system that gathers user-item information to make recommendations, as in ARORA, where the user and items are nodes in two node neighborhoods, as in HE. A person of ordinary skill would have been motivated to make this combination to improve recommended advertisements.(ARORA column 1, lines 37-52)

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over HE (“Bipartite Graph Neural Networks for Efficient Node Representation Learning”) in view of PALANISAMY (U.S. Pub. No. US 20200293041 A1)
While HE does teach claim 14, which claim 15 is dependent upon, it does not explicitly teach:
The GCN of claim 14 wherein GCN is configured to implement an objective loss calculation based on the target first node embedding and the target second node embedding, the functions are each defined by a respective set of learnable parameters, and the CGN is configured to learn the parameters iteratively to determine learnable parameters that optimize the objective loss calculation.
However, in analogous art that similarly handles embeddings, PALANISAMY teaches:
The GCN of claim 14 wherein GCN is configured to implement an objective loss calculation based on the target first node embedding and the target second node embedding, the functions are each defined by a respective set of learnable parameters, and the CGN is configured to learn the parameters iteratively to determine learnable parameters that optimize the objective loss calculation. ([0073] The constrained embedding module 206 normalizes the low-dimensional embeddings so that they can be combined, which can include constraining the low-dimensional embeddings (or the output of the encoder modules 204-1 to 204-N) using an objective or loss function to produce a constrained embedding space Zc)
It would have been obvious to a person skilled in the art before the effective filing date of the invention to have combined PALANISAMY’s teaching of using a loss function on embeddings, and HE’s teaching of processing of a bipartite graph and system, to realize, with a reasonable expectation of success, a system generates a loss for embeddings, as in PALANISAMY, for the node embeddings of the bipartite graph, as in HE. A person of ordinary skill would have been motivated to make this combination to increase information accuracy.(PALANISAMY [0002])

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over ARORA (U.S. Pub. No. US 11763349 B2)  in view of HE (“Bipartite Graph Neural Networks for Efficient Node Representation Learning”) in view of SMYTH (U.S. Pub. No. US 9107595 B1) in view of GARCIA (U.S. Pub. No. US 20180247224 A1)
	Regarding claim 20, ARORA substantially teaches the claimed invention, including:
A multi-graph convolution collaborative filtering system implemented in multiple layers of a multi-layer graph convolution neural network for learning about user-item preferences from a bipartite graph that includes user nodes, item nodes and interaction data about historical interactions between user nodes and item nodes, the system comprising: (((53) The user factor p.sub.u and item factor q.sub.i may be determined by a minimizing loss function operating on user transaction data. For example, the factors may be determined by the loss function given below, which minimizes the difference between the actual user-item interaction value (r.sub.ui) from the user transaction data and predicted user-item interaction value ({circumflex over (r)}.sub.ui) through gradient descent. Once training is complete, the user and item factors can be used to obtain the user-item affinity scores for all (user, item) pairs.)) 
While ARORA does teach user-item information data, it does not explicitly teach:
a bipartite-graph convolution network configured to independently generate a user embedding for a target user node and an item embedding for a target item node based on the bipartite graph;
However, in analogous art that similarly handles node relationships, HE teaches:
a bipartite-graph convolution network configured to independently generate a user embedding for a target user node and an item embedding for a target item node based on the bipartite graph; (Here we introduce the overall frame work of our model for better readability. In general, bipartite graph representation learning is to learn the embedding representations Hu ∈ RP′ and Hv ∈ RQ′ for nodes in group U and V, respectively. Let femb be a general bipartite graph embedding model with parametersθ, then the representation of Hu and Hv is defined as follow: Hu,Hv = femb(Xu,Bu,Xv,Bv;θ)) 
It would have been obvious to a person skilled in the art before the effective filing date of the invention to have combined HE’s bipartite graph network, and ARORA’s user-item information, to realize, with a reasonable expectation of success, a system that gathers user-item information to make recommendations, as in ARORA, where the user and items are nodes in two node neighborhoods, as in HE. A person of ordinary skill would have been motivated to make this combination to improve training speed and memory usage.(HE abstract)
While ARORA, as modified by HE, does teach gathering user-item data and making a bipartite gnn, it does not explicitly teach:
a multi-graph encoder configured to: construct a user-user graph representing similarities between user nodes included in the bipartite graph and generate an user- user embedding for the target user node based on the user-user graph; and construct an item-item graph representing similarities between item nodes included in the bipartite graph and generate an item-item embedding for the target item node based on the item-item graph;
However, in analogous art that similarly handles node data, SMYTH teaches:
 a multi-graph encoder configured to: construct a user-user graph representing similarities between user nodes included in the bipartite graph and generate an user- user embedding for the target user node based on the user-user graph; ((146) The degree of a node (vertex) is the number of connections (edges) with other nodes in a network. The degree of the network is the average across all nodes. The connections are determined from a binary cross-correlation matrix computed from the original using a threshold of 0.817 (mean of 0.778+1.96*std dev of 0.019); nodes are connected if the correlation exceeds this value. The result is a set of sub-graph matrices for the direct neighbor nodes of each sub-graph index node. Table 1 of FIG. 33 lists the node degrees by event class. ) and construct an item-item graph representing similarities between item nodes included in the bipartite graph and generate an item-item embedding for the target item node based on the item-item graph; ((146) The degree of a node (vertex) is the number of connections (edges) with other nodes in a network. The degree of the network is the average across all nodes. The connections are determined from a binary cross-correlation matrix computed from the original using a threshold of 0.817 (mean of 0.778+1.96*std dev of 0.019); nodes are connected if the correlation exceeds this value. The result is a set of sub-graph matrices for the direct neighbor nodes of each sub-graph index node. Table 1 of FIG. 33 lists the node degrees by event class. )  
It would have been obvious to a person skilled in the art before the effective filing date of the invention to have combined SMYTH’s node-node embeddings, and ARORA’s, as modified by HE, user-item information in a bipartite graph network, to realize, with a reasonable expectation of success, a system that gathers user-item information to make recommendations, as in ARORA, as modified by HE, and constructs user-user and item-item embeddings, as in SMYTH. A person of ordinary skill would have been motivated to make this combination to improve result accuracy.(SMYTH column 1, line 60-column 2, line 18)
While ARORA, as modified by HE and SMYTH, does teach generating a bipartite user-item graph as well as node-node embeddings, it does not explicitly teach:
and a fusing module configured to perform a first fusion operation to fuse the user embedding, user-user embedding, and to perform a second fusion operation to fuse the item embedding and item-item embedding to output information that represents a relationship between the target user node and the target item node.
However, in analogous art that similarly handles node embeddings, GARCIA teaches:
and a fusing module configured to perform a first fusion operation to fuse the user embedding, user-user embedding, and to perform a second fusion operation to fuse the item embedding and item-item embedding to output information that represents a relationship between the target user node and the target item node. ([0074] GEMP-B jointly learns embeddings of several attributes in an unsupervised manner and combines these into a node embedding.)
It would have been obvious to a person skilled in the art before the effective filing date of the invention to have combined GARCIA’s node embedding fusion, and ARORA’s, as modified by HE and SMYTH, user-item information and node-node embedding in a bipartite graph network, to realize, with a reasonable expectation of success, a system that gathers user-item information to make recommendations, as in ARORA, as modified by HE and SMYTH, and fuses user-item and node-node embeddings to output relationship information, as in GARCIA. A person of ordinary skill would have been motivated to make this combination to improve functionality.(GARCIA [0007])

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SKIELER A KOWALIK whose telephone number is (571)272-1850. The examiner can normally be reached 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D Reyes can be reached at (571)270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SKIELER ALEXANDER KOWALIK/               Examiner, Art Unit 2142                                                                                                                                                                                         /Mariela Reyes/Supervisory Patent Examiner, Art Unit 2142
Read full office action
MULTI-GRAPH CONVOLUTION COLLABORATIVE FILTERING

This examiner grants 27% of cases after interview

Examiner Intelligence

Statute-Specific Performance

Office Action

Prosecution Timeline

Strategy Recommendation AI-generated — please review before filing

Prosecution Projections

Ready to respond to this office action?

MULTI-GRAPH CONVOLUTION COLLABORATIVE FILTERING

This examiner grants 27% of cases after interview

Examiner Intelligence

Statute-Specific Performance

Office Action

Prosecution Timeline

Strategy Recommendation AI-generated — please review before filing

Prosecution Projections

Ready to respond to this office action?

Sign in with your work email