Basic Graph Operations
This section introduces the basic concepts and graph operations of GLT.
Graph and Feature
GLT describes graph topology data in CSR or CSC
format with an instance of graphlearn_torch.data.graph.Topology.
graphlearn_torch.data.graph.Graph takes a
graphlearn_torch.data.graph.Topology as input and stores the data
in either CPU memory or GPU memory, depending on the mode.
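To make the CSR layout concrete, the following pure-Python sketch converts a COO edge list into the two arrays a CSR topology conceptually holds: a row pointer array and a neighbor index array. The helper name coo_to_csr is hypothetical and is not part of GLT; the real Topology class works on tensors and its internals may differ.

```python
def coo_to_csr(rows, cols, num_nodes):
    """Convert a COO edge list into CSR arrays (indptr, indices).

    Hypothetical helper for illustration only; not part of GLT.
    """
    # Count the out-degree of every source node.
    degree = [0] * num_nodes
    for r in rows:
        degree[r] += 1

    # indptr[i]:indptr[i + 1] delimits the neighbors of node i.
    indptr = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        indptr[i + 1] = indptr[i] + degree[i]

    # Scatter each destination node into its source node's slot.
    indices = [0] * len(cols)
    cursor = indptr[:-1].copy()
    for r, c in zip(rows, cols):
        indices[cursor[r]] = c
        cursor[r] += 1
    return indptr, indices


# A three-node graph with directed edges 0->1, 0->2, 1->0, 2->0.
indptr, indices = coo_to_csr([0, 0, 1, 2], [1, 2, 0, 0], num_nodes=3)
print(indptr)   # [0, 2, 3, 4]
print(indices)  # [1, 2, 0, 0]
```

CSC is the same construction with the roles of rows and columns swapped, so it groups edges by destination node instead of source node.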
Node features are described by an instance of
graphlearn_torch.data.feature.Feature, which takes a PyTorch CPU tensor
as input and stores the data in GPU, pinned, or CPU memory according to
the arguments split_ratio and device_group_list. split_ratio is the
fraction of the feature data stored in GPU memory, with the remainder kept
in pinned memory, and device_group_list describes the groups of GPUs with
peer-to-peer access across which the GPU part of the data is placed.
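The effect of split_ratio can be pictured with a small pure-Python sketch. It assumes that the leading fraction of feature rows is the part placed on the GPU and that the row count is rounded down; both the helper split_feature_rows and these rounding and ordering details are assumptions made for illustration, not confirmed GLT behavior.

```python
def split_feature_rows(num_rows, split_ratio):
    """Return (gpu_rows, pinned_rows) as ranges of row indices.

    Assumption: the leading `split_ratio` fraction of rows goes to GPU
    memory and the rest stays in pinned memory. Hypothetical helper,
    not part of the GLT API.
    """
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be in [0, 1]")
    num_gpu_rows = int(num_rows * split_ratio)  # rounds down
    return range(0, num_gpu_rows), range(num_gpu_rows, num_rows)


gpu_rows, pinned_rows = split_feature_rows(num_rows=10, split_ratio=0.2)
print(list(gpu_rows))     # [0, 1]
print(list(pinned_rows))  # [2, 3, 4, 5, 6, 7, 8, 9]
```

Under this assumption, a very small feature table can end up with no rows on the GPU at all, since the GPU row count rounds down to zero.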
The following simple example builds a graph with three nodes and four edges, where each node has two features. It assumes that at least one GPU is available. The graph is stored in pinned memory, and the feature data is split between GPU and pinned memory.
import torch
from graphlearn_torch.data import Topology, DeviceGroup, Feature, Graph
# graph topology:
# 0
# | \
# 1 2
# The graph is stored in pinned memory.
edge_index = torch.tensor([[0, 0, 1, 2],
                           [1, 2, 0, 0]], dtype=torch.long)
x = torch.tensor([[0.0, 1.0], [0.1, 1.1], [0.2, 1.2]], dtype=torch.float)
csr_topo = Topology(edge_index, layout='CSR')
graph = Graph(csr_topo, mode='ZERO_COPY', device=0)
# 20% of the feature data is stored on GPU 0 and the remaining 80% is in
# pinned memory.
feature = Feature(x, split_ratio=0.2,
                  device_group_list=[DeviceGroup(0, [0])], device=0)
Neighbor Sampling
Sampling is essential for training on large-scale graphs.
GLT uses an instance of
graphlearn_torch.sampler.neighbor_sampler.NeighborSampler
as its neighbor sampler. It is similar to PyG’s neighbor sampler and has the same input and output.
The following 1-hop example samples up to 2 neighbors for each input node.
import torch
from graphlearn_torch.data import Topology, Graph
from graphlearn_torch.sampler import NeighborSampler
# 0
# | \
# 1 2
edge_index = torch.tensor([[0, 0, 1, 2],
                           [1, 2, 0, 0]], dtype=torch.long)
csr_topo = Topology(edge_index, layout='CSR')
graph = Graph(csr_topo, mode='ZERO_COPY', device=0)
sampler = NeighborSampler(graph, [2])
input_seeds = torch.tensor([0, 1, 2], dtype=torch.long)
sample_out = sampler.sample_from_nodes(input_seeds)
print(sample_out.node, sample_out.row, sample_out.col)
>>> tensor([0, 1, 2], device='cuda:0')
tensor([1, 2, 0, 0], device='cuda:0')
tensor([0, 0, 1, 2], device='cuda:0')
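As a reference for how this output can be read, here is a pure-Python sketch of 1-hop neighbor sampling over CSR arrays. It is not the GLT implementation; the function sample_one_hop is hypothetical, and it assumes that row holds the sampled neighbors and col the seeds, both as local indices into the returned node list, which is consistent with the output above.

```python
import random


def sample_one_hop(indptr, indices, seeds, fanout, rng=random):
    """Sample up to `fanout` neighbors per seed from a CSR graph.

    Returns (nodes, row, col). `nodes` lists the seeds first, followed by
    newly discovered neighbors. `row` and `col` hold local indices into
    `nodes`, with each edge pointing from a sampled neighbor (row) to its
    seed (col). Seeds are assumed to be unique.
    Hypothetical reference code, not the GLT implementation.
    """
    nodes = list(seeds)
    local_id = {n: i for i, n in enumerate(nodes)}
    row, col = [], []
    for seed in seeds:
        neighbors = indices[indptr[seed]:indptr[seed + 1]]
        # Keep every neighbor when the degree does not exceed the fanout.
        if len(neighbors) > fanout:
            neighbors = rng.sample(neighbors, fanout)
        for nbr in neighbors:
            if nbr not in local_id:
                local_id[nbr] = len(nodes)
                nodes.append(nbr)
            row.append(local_id[nbr])
            col.append(local_id[seed])
    return nodes, row, col


# CSR arrays of the three-node example graph.
indptr, indices = [0, 2, 3, 4], [1, 2, 0, 0]
nodes, row, col = sample_one_hop(indptr, indices, seeds=[0, 1, 2], fanout=2)
print(nodes, row, col)  # [0, 1, 2] [1, 2, 0, 0] [0, 0, 1, 2]
```

The result is deterministic here only because no node has more than 2 neighbors; with a smaller fanout, the selected neighbors vary from run to run.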
Dataset and DataLoader
To unify the management of graph and feature data, GLT defines
the class graphlearn_torch.data.dataset.Dataset.
For compatibility with PyG/PyTorch training, it also defines the class graphlearn_torch.loader.neighbor_loader.NeighborLoader, which is
similar to PyG’s NeighborLoader.
import torch
from graphlearn_torch.data import Dataset, DeviceGroup
from graphlearn_torch.loader import NeighborLoader
# 0
# | \
# 1 2
edge_index = torch.tensor([[0, 0, 1, 2],
                           [1, 2, 0, 0]], dtype=torch.long)
x = torch.tensor([[0.0, 1.0], [0.1, 1.1], [0.2, 1.2]], dtype=torch.float)
dataset = Dataset()
dataset.init_graph(edge_index=edge_index)
dataset.init_node_features(node_feature_data=x, split_ratio=0.2,
                           device_group_list=[DeviceGroup(0, [0])])
input_seeds = torch.tensor([0, 1, 2], dtype=torch.long)
loader = NeighborLoader(dataset, [2], input_seeds)
for data in loader:
    print(data.x, data.edge_index)
>>> tensor([[0.0000, 1.0000],
[0.1000, 1.1000],
[0.2000, 1.2000]], device='cuda:0') tensor([[1, 2],
[0, 0]], device='cuda:0')
tensor([[0.1000, 1.1000],
[0.0000, 1.0000]], device='cuda:0') tensor([[1],
[0]], device='cuda:0')
tensor([[0.2000, 1.2000],
[0.0000, 1.0000]], device='cuda:0') tensor([[1],
[0]], device='cuda:0')
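The per-batch output can be reproduced conceptually with a pure-Python sketch of a mini-batch loop. It assumes a batch size of 1, which is consistent with the three batches printed above, and it takes the first neighbors deterministically instead of sampling at random. The function iterate_batches is hypothetical and only mirrors the idea; it is not how NeighborLoader is implemented.

```python
def iterate_batches(indptr, indices, features, seeds, fanout, batch_size=1):
    """Yield (x, edge_index) for each mini-batch of seed nodes.

    Node IDs are relabeled per batch: seeds come first, then their sampled
    neighbors. `edge_index` is [sources, targets] in local IDs, with edges
    pointing from neighbor to seed.
    Hypothetical sketch: takes the first `fanout` neighbors rather than
    sampling randomly, so the result is deterministic.
    """
    for start in range(0, len(seeds), batch_size):
        batch = seeds[start:start + batch_size]
        nodes = list(batch)
        local_id = {n: i for i, n in enumerate(nodes)}
        src, dst = [], []
        for seed in batch:
            for nbr in indices[indptr[seed]:indptr[seed + 1]][:fanout]:
                if nbr not in local_id:
                    local_id[nbr] = len(nodes)
                    nodes.append(nbr)
                src.append(local_id[nbr])
                dst.append(local_id[seed])
        # Gather the feature rows of every node in this batch.
        x = [features[n] for n in nodes]
        yield x, [src, dst]


# CSR arrays and features of the three-node example graph.
indptr, indices = [0, 2, 3, 4], [1, 2, 0, 0]
features = [[0.0, 1.0], [0.1, 1.1], [0.2, 1.2]]
for x, edge_index in iterate_batches(indptr, indices, features,
                                     [0, 1, 2], fanout=2):
    print(x, edge_index)
```

Because node IDs are local to each batch, the edge [[1], [0]] in the second batch means "from the second row of x to the first row of x", that is, from node 0 to seed node 1.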
Training GNNs
GLT’s data loaders produce output in the same format as PyG’s,
so PyG models can be used directly.
We provide examples of node classification and
link prediction based on PyG,
as well as a simple distributed training example.
The complete examples can be found in the examples/ directory of GLT’s GitHub repository.