So before we jump into specific algorithms, let's first introduce a bit of notation. A graph is represented by the letter G; it consists of a set of vertices V, a set of edges E, and a set of features. Usually those features live on the nodes and are denoted X, and sometimes you also have edge features, which would be represented by a different letter. The graph structure is also encoded in the adjacency matrix A. That's just an n-by-n matrix, where n is the number of nodes, and every element of A indicates whether there is an edge between a given pair of nodes, so it's a binary matrix. X, as I said, holds the node features and encodes some external information about those nodes. For example, if you have a molecule graph, then the chemical properties of the individual atoms, which are the nodes, could be the features. If you don't have features, or you just have a graph, that's also fine: you can simply use a one-hot encoding as the features for each node. Each node will then have a different feature vector, but the algorithm still works pretty much the same. So that's the input.

Here are some key ideas behind graph neural networks. One idea is aggregation: we want to aggregate information from the neighbors of a node. First, we define the neighbors of a given node; typically they are just the nodes directly connected to it. For example, this red node connects to the green nodes inside this circle. That's the neighborhood of the node, and we aggregate information from those neighbors in order to learn a better embedding for the node, and we do that for every node. So that's the aggregation idea.

The second idea is layering. The idea here is that you can recursively, or repeatedly, apply the aggregation step so that you aggregate information not only from the immediate one-hop neighbors but also from two-hop neighbors. In general, with k layers you aggregate information from k-hop neighbors. That's very powerful, because you can iterate to gradually refine and improve the embedding of each node.

So here's one example of aggregation in a graph neural network: we can average the neighboring embeddings. Here's some notation: h_i^(l+1) is the embedding of node i at layer l+1. How is it computed? By aggregating over the neighbors of i. The aggregation operation takes the neighbor embeddings h_j^(l), where j ranges over N(i), the set of neighbors of i, applies a linear transformation, and sums them up. We also take the embedding of the node itself from the previous layer and aggregate it as well. Then we pass the whole sum through a nonlinear activation function, for example a ReLU or a sigmoid. So that's the aggregation operation with a concrete example. Here, the W matrices are the weight parameters. In this particular case they are the same, which means the neighbors and the node itself are aggregated in the same way. In other formulations they may be different; there could be two different W matrices. So that's aggregation and layering.
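To make the aggregation and layering ideas concrete, here is a minimal NumPy sketch. The function name gnn_layer, the toy adjacency matrix, and the choice of averaging the neighbor embeddings with a single shared weight matrix W are illustrative assumptions, not a particular library's API.

```python
import numpy as np

def gnn_layer(A, H, W):
    """One aggregation layer: average the neighbors' embeddings, apply the
    shared linear transformation W, add the node's own transformed embedding
    from the previous layer, and pass the sum through a ReLU."""
    deg = A.sum(axis=1, keepdims=True)   # number of neighbors of each node
    deg[deg == 0] = 1                    # guard against isolated nodes
    neighbor_avg = (A @ H) / deg         # mean of the neighbor embeddings
    return np.maximum(neighbor_avg @ W + H @ W, 0)

# Toy graph: 4 nodes, binary adjacency matrix A, one-hot node features X.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8)) * 0.1
W2 = rng.normal(size=(8, 8)) * 0.1

H1 = gnn_layer(A, X, W1)    # layer 1: information from 1-hop neighbors
H2 = gnn_layer(A, H1, W2)   # layer 2: information now reaches 2-hop neighbors
print(H1.shape, H2.shape)   # (4, 8) (4, 8)
```

Stacking more such layers extends the reach to k-hop neighborhoods, which is exactly the layering idea described above.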
Then finally, to train this in a neural-network fashion using backpropagation, we need to define a loss function. There are many different ways to do that; here's just one example, for node classification. We can put a cross-entropy loss on each node and then sum them up. The loss function sums over all the nodes i, and y_i indicates whether node v_i has the property of interest or not. For example, in a patient network it could mean this patient is healthy or sick; that's the class label for node v_i. Then h_i is the embedding of that node, and we just apply a linear transformation and pass the result through a sigmoid function to do the classification (there's a small code sketch of this at the end of the section). So that's the standard cross-entropy loss.

Putting all this together, you have the general principle for designing a graph neural network. First, define a neighborhood aggregation function: how do you aggregate information from the neighbors of each node? Then define a loss function on the embeddings so that you can do backpropagation to learn the right parameters. Then train on a set of nodes using these operations. Finally, once you have the model, you can generate embeddings for any node, as long as you are given its node features. For example, those node features could be the chemical properties of atoms; then you immediately get some benefit by applying the graph neural network even to nodes you haven't seen or haven't trained on. That's what's called inductive learning. So that's the general graph neural network design.
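As a rough sketch of the node-classification loss described above, the following Python function applies a linear transformation and a sigmoid to each labeled node's embedding and sums the cross-entropy terms. The names node_loss and theta are placeholders for the classifier parameters, and the random embeddings simply stand in for the output of the aggregation layers sketched earlier.

```python
import numpy as np

def node_loss(H, y, theta, labeled):
    """Cross-entropy summed over the labeled training nodes. Each node's
    embedding h_i is linearly transformed by theta, passed through a sigmoid,
    and compared against its binary label y_i (e.g. healthy vs. sick)."""
    p = 1.0 / (1.0 + np.exp(-(H[labeled] @ theta)))   # predicted probability per node
    y_l = y[labeled]
    eps = 1e-12                                       # numerical safety inside the logs
    return -np.sum(y_l * np.log(p + eps) + (1 - y_l) * np.log(1 - p + eps))

# Toy usage: pretend these embeddings came out of the aggregation layers above.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))          # final-layer embeddings for 4 nodes
y = np.array([1, 0, 1, 0])           # binary node labels
theta = rng.normal(size=8) * 0.1     # classifier weights
print(node_loss(H, y, theta, labeled=[0, 1, 3]))
```

In practice this loss would be backpropagated through the aggregation layers (for instance with an autodiff framework) to learn both the W matrices and theta; once trained, the same model can be applied to new nodes from their features alone, which is the inductive setting mentioned above.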