# Basic Graph Operations


This section introduces the basic concepts and graph operations of GLT.


## Graph and Feature

GLT describes graph topology in CSR or CSC
format with an instance of [`graphlearn_torch.data.graph.Topology`](graphlearn_torch.data.graph.Topology).
A [`graphlearn_torch.data.graph.Graph`](graphlearn_torch.data.graph.Graph) takes a
[`graphlearn_torch.data.graph.Topology`](graphlearn_torch.data.graph.Topology) as input and stores the data
in either CPU or GPU memory, depending on its `mode`.
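
To make the CSR layout concrete, the following pure-Python sketch converts a COO edge list (the same three-node graph used in the examples on this page) into CSR arrays. The helper `coo_to_csr` and the names `indptr` and `indices` are illustrative assumptions and are not part of GLT's API; `Topology` performs its own conversion internally.

```python
def coo_to_csr(rows, cols, num_nodes):
    """Convert a COO edge list into CSR arrays (indptr, indices)."""
    # Count the out-degree of every source node.
    indptr = [0] * (num_nodes + 1)
    for r in rows:
        indptr[r + 1] += 1
    # Prefix-sum the counts so indptr[v]:indptr[v + 1] spans v's neighbors.
    for v in range(num_nodes):
        indptr[v + 1] += indptr[v]
    # Scatter each destination into the next free slot of its source node.
    indices = [0] * len(cols)
    cursor = indptr[:-1].copy()
    for r, c in zip(rows, cols):
        indices[cursor[r]] = c
        cursor[r] += 1
    return indptr, indices

rows = [0, 0, 1, 2]
cols = [1, 2, 0, 0]
indptr, indices = coo_to_csr(rows, cols, num_nodes=3)
print(indptr)   # [0, 2, 3, 4]
print(indices)  # [1, 2, 0, 0]
```

The neighbors of node `v` are `indices[indptr[v]:indptr[v + 1]]`, so looking up a node's neighbors needs only two reads of `indptr` followed by a contiguous slice. CSC is the same structure built over incoming rather than outgoing edges.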

Node features are described by an instance of
[`graphlearn_torch.data.feature.Feature`](graphlearn_torch.data.feature.Feature), which takes a PyTorch CPU tensor
as input and stores the data in GPU, pinned, or CPU memory according to
the arguments `split_ratio` and `device_group_list`. `split_ratio` is the
fraction of feature data stored in GPU memory, with the remainder kept in
pinned memory, and `device_group_list` describes the groups of GPUs with
peer-to-peer access across which the GPU portion of the data is placed.
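
As a rough illustration of what a split ratio means, the sketch below partitions feature rows into a GPU portion and a pinned-memory portion. The helper `split_feature_rows` is hypothetical, and the choice to assign the leading rows to the GPU and to round down is an assumption made for this sketch, not a statement about how `Feature` is implemented.

```python
def split_feature_rows(num_rows, split_ratio):
    """Return (gpu_rows, pinned_rows) index ranges for a given split ratio."""
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be in [0, 1]")
    # Assumption: the leading rows go to the GPU, the rest to pinned memory.
    num_gpu_rows = int(num_rows * split_ratio)
    return range(0, num_gpu_rows), range(num_gpu_rows, num_rows)

gpu_rows, pinned_rows = split_feature_rows(num_rows=10, split_ratio=0.2)
print(list(gpu_rows))     # [0, 1]
print(list(pinned_rows))  # [2, 3, 4, 5, 6, 7, 8, 9]
```

Under this scheme, placing the most frequently accessed rows first would let the GPU portion serve most lookups, while the remaining rows are still reachable from the GPU through pinned memory.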


The following example builds a graph with three nodes and four edges.
Each node has two features, and at least one GPU is assumed to be available.
The graph is stored in pinned memory, and the feature data is split between
GPU and pinned memory.

``` python
import torch
from graphlearn_torch.data import Topology, DeviceGroup, Feature, Graph
# graph topology:
# 0
# | \
# 1  2
# The graph is stored in pinned memory.
edge_index = torch.tensor([[0, 0, 1, 2],
                           [1, 2, 0, 0]], dtype=torch.long)
x = torch.tensor([[0.0, 1.0], [0.1, 1.1], [0.2, 1.2]], dtype=torch.float)
csr_topo = Topology(edge_index, layout='CSR')
graph = Graph(csr_topo, mode='ZERO_COPY', device=0)
# 20% of the feature data is stored in GPU 0 and the remaining 80% in
# pinned memory.
feature = Feature(x,
  split_ratio=0.2, device_group_list=[DeviceGroup(0, [0])], device=0)
```

## Neighbor Sampling

Sampling is essential for training on large-scale graphs.
GLT uses an instance of
[`graphlearn_torch.sampler.neighbor_sampler.NeighborSampler`](graphlearn_torch.sampler.neighbor_sampler.NeighborSampler)
as its neighbor sampler, which is similar to PyG's neighbor sampler and has the same input and output.

The following example builds a 1-hop neighbor sampler that samples up to 2
neighbors for each input node.

``` python
import torch
from graphlearn_torch.data import Topology, Graph
from graphlearn_torch.sampler import NeighborSampler
# 0
# | \
# 1  2
edge_index = torch.tensor([[0, 0, 1, 2],
                           [1, 2, 0, 0]], dtype=torch.long)
csr_topo = Topology(edge_index, layout='CSR')
graph = Graph(csr_topo, mode='ZERO_COPY', device=0)
sampler = NeighborSampler(graph, [2])
input_seeds = torch.tensor([0, 1, 2], dtype=torch.long)
sample_out = sampler.sample_from_nodes(input_seeds)
print(sample_out.node, sample_out.row, sample_out.col)

>>> tensor([0, 1, 2], device='cuda:0')
    tensor([1, 2, 0, 0], device='cuda:0')
    tensor([0, 0, 1, 2], device='cuda:0')
```
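
To show what 1-hop sampling with a fanout does conceptually, here is a pure-Python sketch over CSR arrays for the same graph. The function `sample_one_hop` is hypothetical and runs on the CPU; it is not GLT's sampler. The convention of putting sampled neighbors in `rows` and seeds in `cols` is inferred from the output above.

```python
import random

def sample_one_hop(indptr, indices, seeds, fanout, rng=random):
    """Sample up to `fanout` neighbors per seed from a CSR graph."""
    rows, cols = [], []
    for seed in seeds:
        neighbors = indices[indptr[seed]:indptr[seed + 1]]
        # Keep every neighbor when there are no more than `fanout` of them.
        if len(neighbors) > fanout:
            neighbors = rng.sample(neighbors, fanout)
        for nbr in neighbors:
            rows.append(nbr)   # sampled neighbor
            cols.append(seed)  # seed it was sampled for
    return rows, cols

# CSR arrays for the edges 0->1, 0->2, 1->0, 2->0.
indptr = [0, 2, 3, 4]
indices = [1, 2, 0, 0]
rows, cols = sample_one_hop(indptr, indices, seeds=[0, 1, 2], fanout=2)
print(rows)  # [1, 2, 0, 0]
print(cols)  # [0, 0, 1, 2]
```

No node in this graph has more than two neighbors, so a fanout of 2 returns every edge and the result is deterministic. With a smaller fanout the sampled neighbors would vary between runs.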

## Dataset and DataLoader

To unify the management of graph and feature data, GLT defines
the class [`graphlearn_torch.data.dataset.Dataset`](graphlearn_torch.data.dataset.Dataset).
For compatibility with PyG/PyTorch training, it also defines the class [`graphlearn_torch.loader.neighbor_loader.NeighborLoader`](graphlearn_torch.loader.neighbor_loader.NeighborLoader), which is
similar to PyG's `NeighborLoader`.


```python
import torch
from graphlearn_torch.data import Dataset, DeviceGroup
from graphlearn_torch.loader import NeighborLoader
# 0
# | \
# 1  2
edge_index = torch.tensor([[0, 0, 1, 2],
                           [1, 2, 0, 0]], dtype=torch.long)
x = torch.tensor([[0.0, 1.0], [0.1, 1.1], [0.2, 1.2]], dtype=torch.float)
dataset = Dataset()
dataset.init_graph(edge_index=edge_index)
dataset.init_node_features(node_feature_data=x, split_ratio=0.2,
                           device_group_list=[DeviceGroup(0, [0])])
input_seeds = torch.tensor([0, 1, 2], dtype=torch.long)
loader = NeighborLoader(dataset, [2], input_seeds)
for data in loader:
  print(data.x, data.edge_index)

>>> tensor([[0.0000, 1.0000],
            [0.1000, 1.1000],
            [0.2000, 1.2000]], device='cuda:0') tensor([[1, 2],
            [0, 0]], device='cuda:0')
    tensor([[0.1000, 1.1000],
            [0.0000, 1.0000]], device='cuda:0') tensor([[1],
            [0]], device='cuda:0')
    tensor([[0.2000, 1.2000],
            [0.0000, 1.0000]], device='cuda:0') tensor([[1],
            [0]], device='cuda:0')
```
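
The `edge_index` in each batch refers to positions within that batch's `data.x` rather than to global node ids. The sketch below shows one way such relabeling can be done, with seeds numbered first. The helper `relabel` is hypothetical and only mirrors the convention visible in the output above; it is not taken from GLT's implementation.

```python
def relabel(seeds, sampled_edges):
    """Map global node ids to batch-local indices, numbering seeds first."""
    local_id = {}
    nodes = []
    # Visit seeds first, then edge endpoints in order of appearance.
    for node in list(seeds) + [n for edge in sampled_edges for n in edge]:
        if node not in local_id:
            local_id[node] = len(nodes)
            nodes.append(node)
    local_rows = [local_id[src] for src, _ in sampled_edges]
    local_cols = [local_id[dst] for _, dst in sampled_edges]
    return nodes, [local_rows, local_cols]

# Seed node 1 with one sampled edge (neighbor 0 -> seed 1).
nodes, edge_index = relabel(seeds=[1], sampled_edges=[(0, 1)])
print(nodes)       # [1, 0]
print(edge_index)  # [[1], [0]]
```

Here global node 1 becomes local index 0 and global node 0 becomes local index 1, which is why the feature rows for that batch appear in the order node 1, node 0.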

## Training GNNs

Because GLT's dataloaders produce output in the same format as PyG's,
PyG models can be used directly.
We provide examples of [node classification](node_class) and
[link prediction](link_pred) based on PyG, as well as
a simple [distributed training example](dist_train).
The complete examples can be found in the `examples/` directory of GLT's GitHub repository.