# Graph Operators

Graph-Learn_torch (GLT) optimizes the end-to-end training throughput of GNN
models by accelerating graph sampling and feature collection.
GLT currently implements the vertex-based
[`graphlearn_torch.sampler.NeighborSampler`](graphlearn_torch.sampler.NeighborSampler)
and 
[`graphlearn_torch.sampler.RandomNegativeSampler`](graphlearn_torch.sampler.RandomNegativeSampler).
Edge-based and subgraph-based samplers will be added in the next release.
This tutorial describes the design of these graph operators in GLT.

## NeighborSampler
Similar to PyG, GLT wraps the configuration, initialization and execution
of samplers inside data loaders. The following code shows an example
of [`graphlearn_torch.loader.NeighborLoader`](graphlearn_torch.loader.NeighborLoader)
for single-machine training.

``` python
# graphlearn_torch NeighborLoader
train_loader = glt.loader.NeighborLoader(glt_dataset,
                                         [15, 10, 5],
                                         split_idx['train'],
                                         batch_size=1024,
                                         shuffle=True,
                                         drop_last=True,
                                         device=device,
                                         as_pyg_v1=True)
```
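As a rough illustration of what `batch_size`, `shuffle` and `drop_last` mean for the seed nodes handed to the loader, the pure-Python sketch below splits a list of seed ids into mini-batches. The helper `iter_seed_batches` is hypothetical and assumes the usual PyTorch data loader semantics; it is not part of GLT and does not show how GLT batches seeds internally.

```python
import random
from typing import Iterator, List, Optional, Sequence


def iter_seed_batches(seeds: Sequence[int],
                      batch_size: int,
                      shuffle: bool = True,
                      drop_last: bool = True,
                      rng: Optional[random.Random] = None) -> Iterator[List[int]]:
  """Yield mini-batches of seed node ids (illustrative helper, not a GLT API)."""
  order = list(seeds)
  if shuffle:
    (rng or random).shuffle(order)  # visit the seeds in random order
  for start in range(0, len(order), batch_size):
    batch = order[start:start + batch_size]
    if drop_last and len(batch) < batch_size:
      break  # discard the final, incomplete batch
    yield batch


# 10 seeds with batch_size=4: two full batches, the 2 leftover seeds are dropped.
batches = list(iter_seed_batches(range(10), batch_size=4, rng=random.Random(0)))
```

Each batch of seeds would then be expanded by the sampler into a sampled subgraph.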

During the initialization of the neighbor loader, an instance of 
[`graphlearn_torch.sampler.NeighborSampler`](graphlearn_torch.sampler.NeighborSampler)
is created.

``` python
class NeighborSampler(BaseSampler):
  r""" Neighbor Sampler.
  """
  def __init__(self,
               graph: Union[Graph, Dict[str, Graph]],
               num_neighbors: NumNeighbors,
               device: torch.device=torch.device('cuda', 0),
               with_edge: bool=False,
               strategy: str = 'random'):
```

To be compatible with PyG, ``NeighborSampler`` in GLT inherits from
[`torch_geometric.sampler.BaseSampler`](https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py).
Both homogeneous and heterogeneous graphs are supported.
The sampler instance is created with user-specified parameters:
``num_neighbors``, which gives the number of hops and the number of neighbors
sampled at each hop, and ``device``, the device on which sampling is
performed. GLT supports both CPU and GPU sampling.
Setting ``with_edge`` to ``True`` includes edge ids in the sampled results;
these ids can be used to extract edge features. The default sampling
strategy is ``random``.
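
To make the role of the per-hop neighbor counts concrete, here is a minimal pure-Python sketch of multi-hop uniform neighbor sampling on an adjacency-list graph. The function `sample_neighbors` and the toy graph are assumptions made for illustration; GLT's own sampler runs in native CPU/GPU kernels and may differ in details such as how the frontier is expanded.

```python
import random
from typing import Dict, List, Tuple


def sample_neighbors(adj: Dict[int, List[int]],
                     seeds: List[int],
                     num_neighbors: List[int],
                     seed: int = 0) -> Tuple[List[int], List[Tuple[int, int]]]:
  """Return visited nodes (seeds first) and sampled (node, neighbor) pairs."""
  rng = random.Random(seed)
  nodes = list(dict.fromkeys(seeds))  # de-duplicate, keep order
  seen = set(nodes)
  pairs: List[Tuple[int, int]] = []
  frontier = list(nodes)
  for fanout in num_neighbors:  # one entry per hop
    next_frontier = []
    for node in frontier:
      nbrs = adj.get(node, [])
      # Keep all neighbors if there are at most `fanout` of them.
      picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
      for nbr in picked:
        pairs.append((node, nbr))
        if nbr not in seen:  # only newly discovered nodes are expanded next
          seen.add(nbr)
          nodes.append(nbr)
          next_frontier.append(nbr)
    frontier = next_frontier
  return nodes, pairs


adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 4], 4: [1, 3]}
# Two hops: up to 2 neighbors of the seed, then up to 1 neighbor of each new node.
nodes, pairs = sample_neighbors(adj, seeds=[0], num_neighbors=[2, 1])
```

With ``num_neighbors=[2, 1]`` the sketch samples two neighbors of node 0 and then one neighbor for each node found in the first hop.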

GLT's ``NeighborSampler`` uses the input and output formats of the PyG
sampler directly. The input format of
[`graphlearn_torch.sampler.NeighborSampler`](graphlearn_torch.sampler.NeighborSampler)
is
[`torch_geometric.sampler.NodeSamplerInput`](https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py).

```python
@dataclass
class NodeSamplerInput(CastMixin):
  r""" The sampling input of
  :meth:`~graphlearn_torch.sampler.BaseSampler.sample_from_nodes`.

  This class corresponds to :class:`~torch_geometric.sampler.NodeSamplerInput`:
  https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py

  Args:
    node (torch.Tensor): The indices of seed nodes to start sampling from.
    input_type (str, optional): The input node type (in case of sampling in
      a heterogeneous graph). (default: :obj:`None`).
  """
  node: torch.Tensor
  input_type: Optional[NodeType] = None
```

The output format of sampling results on homogeneous graphs is
[`torch_geometric.sampler.SamplerOutput`](https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py).

```python
@dataclass
class SamplerOutput(CastMixin):
  r""" The sampling output of a :class:`~graphlearn_torch.sampler.BaseSampler` on
  homogeneous graphs.

  Args:
    node (torch.Tensor): The sampled nodes in the original graph.
    row (torch.Tensor): The source node indices of the sampled subgraph.
      Indices must be re-indexed to :obj:`{ 0, ..., num_nodes - 1 }`
      corresponding to the nodes in the :obj:`node` tensor.
    col (torch.Tensor): The destination node indices of the sampled subgraph.
      Indices must be re-indexed to :obj:`{ 0, ..., num_nodes - 1 }`
      corresponding to the nodes in the :obj:`node` tensor.
    edge (torch.Tensor, optional): The sampled edges in the original graph.
      This tensor is used to obtain edge features from the original
      graph. If no edge attributes are present, it may be omitted.
    batch (torch.Tensor, optional): The vector to identify the seed node
      for each sampled node. Can be present in case of disjoint subgraph
      sampling per seed node. (default: :obj:`None`).
    device (torch.device, optional): The device that all data of this output
      resides in. (default: :obj:`None`).
    metadata: (Any, optional): Additional metadata information.
      (default: :obj:`None`).
  """
  node: torch.Tensor
  row: torch.Tensor
  col: torch.Tensor
  edge: Optional[torch.Tensor]  = None
  batch: Optional[torch.Tensor] = None
  device: Optional[torch.device] = None
  metadata: Optional[Any] = None
```

The output format of sampling results on heterogeneous graphs is
[`torch_geometric.sampler.HeteroSamplerOutput`](https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py).

```python
@dataclass
class HeteroSamplerOutput(CastMixin):
  r""" The sampling output of a :class:`~graphlearn_torch.sampler.BaseSampler` on
  heterogeneous graphs.

  Args:
    node (Dict[str, torch.Tensor]): The sampled nodes in the original graph
      for each node type.
    row (Dict[Tuple[str, str, str], torch.Tensor]): The source node indices
      of the sampled subgraph for each edge type. Indices must be re-indexed
      to :obj:`{ 0, ..., num_nodes - 1 }` corresponding to the nodes in the
      :obj:`node` tensor of the source node type.
    col (Dict[Tuple[str, str, str], torch.Tensor]): The destination node
      indices of the sampled subgraph for each edge type. Indices must be
      re-indexed to :obj:`{ 0, ..., num_nodes - 1 }` corresponding to the nodes
      in the :obj:`node` tensor of the destination node type.
    edge (Dict[Tuple[str, str, str], torch.Tensor], optional): The sampled
      edges in the original graph for each edge type. This tensor is used to
      obtain edge features from the original graph. If no edge attributes are
      present, it may be omitted. (default: :obj:`None`).
    batch (Dict[str, torch.Tensor], optional): The vector to identify the
      seed node for each sampled node for each node type. Can be present
      in case of disjoint subgraph sampling per seed node.
      (default: :obj:`None`).
    edge_types: (List[Tuple[str, str, str]], optional): The list of edge types
      of the sampled subgraph. (default: :obj:`None`).
    device (torch.device, optional): The device that all data of this output
      resides in. (default: :obj:`None`).
    metadata: (Any, optional): Additional metadata information.
      (default: :obj:`None`)
  """
  node: Dict[NodeType, torch.Tensor]
  row: Dict[EdgeType, torch.Tensor]
  col: Dict[EdgeType, torch.Tensor]
  edge: Optional[Dict[EdgeType, torch.Tensor]] = None
  batch: Optional[Dict[NodeType, torch.Tensor]] = None
  edge_types: Optional[List[EdgeType]] = None
  device: Optional[torch.device] = None
  metadata: Optional[Any] = None
```
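Both output formats require `row` and `col` to be re-indexed into positions of the `node` tensor. The sketch below shows this re-indexing for the homogeneous case using plain Python lists instead of tensors. The helper `relabel` and the example ids are hypothetical and are not taken from GLT or PyG.

```python
from typing import Dict, List, Sequence, Tuple


def relabel(seeds: Sequence[int],
            edges: Sequence[Tuple[int, int]]
            ) -> Tuple[List[int], List[int], List[int]]:
  """Map global node ids in `edges` to local indices 0..num_nodes-1.

  `edges` holds (source, destination) pairs of global ids. Returns
  (node, row, col), where node[i] is the global id of local index i.
  """
  node: List[int] = []
  local: Dict[int, int] = {}

  def index_of(gid: int) -> int:
    if gid not in local:  # first time we see this global id
      local[gid] = len(node)
      node.append(gid)
    return local[gid]

  for s in seeds:  # seeds get the lowest local indices
    index_of(s)
  row = [index_of(src) for src, _ in edges]
  col = [index_of(dst) for _, dst in edges]
  return node, row, col


node, row, col = relabel(seeds=[42], edges=[(7, 42), (13, 42), (99, 7)])
# node == [42, 7, 13, 99], row == [1, 2, 3], col == [0, 0, 1]
```

Indexing `node` with `row` or `col` recovers the original ids, which is how features stored under global ids can be matched to the sampled subgraph.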

## Negative Sampler
GLT also supports negative sampling on both GPU and CPU. The code below
shows the [`graphlearn_torch.sampler.RandomNegativeSampler`](graphlearn_torch.sampler.RandomNegativeSampler) in GLT.

```python
class RandomNegativeSampler(object):
  r""" Random negative Sampler.

  Args:
    graph: A ``graphlearn_torch.data.Graph`` object.
    mode: Execution mode of sampling, 'CUDA' means sampling on
      GPU, 'CPU' means sampling on CPU.
  """
  def __init__(self, graph, mode='CUDA'):
    self._mode = mode
    if mode == 'CUDA':
      self._sampler = pywrap.CUDARandomNegativeSampler(graph.graph_handler)
    else:
      self._sampler = pywrap.CPURandomNegativeSampler(graph.graph_handler)
```

The ``sample`` method of ``RandomNegativeSampler`` takes three inputs:
``req_num``, the maximum number of negative samples requested;
``trials_num``, the maximum number of trials used to generate enough
negative samples; and ``padding``, which specifies whether, if fewer than
``req_num`` negative samples have been found after ``trials_num`` trials,
the output is filled up to ``req_num`` with randomly generated samples
(which may be positive or negative).
 
```python
  def sample(self, req_num, trials_num=5, padding=False):
    r""" Negative sampling.

    Args:
      req_num: The requested (maximum) number of negative samples.
      trials_num: The number of trials for negative sampling.
      padding: Whether to pad the negative sampling results to req_num.
        If True and the number of true negative samples is still less than
        req_num after trials_num trials, randomly sampled edges (non-strict
        negatives) are added as negative samples.

    Returns:
      negative edge_index(non-strict when padding is True).
    """
    rows, cols = self._sampler.sample(req_num, trials_num, padding)
    return torch.stack([rows, cols], dim=0)
```
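The interplay of ``req_num``, ``trials_num`` and ``padding`` can be illustrated with a small pure-Python sketch. The function `sample_negative_edges`, its `edges` and `num_nodes` arguments, and the per-trial strategy are assumptions made for illustration; the actual logic lives in the `pywrap` CPU/CUDA samplers and may be organized differently.

```python
import random
from typing import List, Set, Tuple


def sample_negative_edges(edges: Set[Tuple[int, int]],
                          num_nodes: int,
                          req_num: int,
                          trials_num: int = 5,
                          padding: bool = False,
                          seed: int = 0) -> Tuple[List[int], List[int]]:
  """Draw up to `req_num` node pairs that are not existing edges."""
  rng = random.Random(seed)
  rows: List[int] = []
  cols: List[int] = []
  for _trial in range(trials_num):
    missing = req_num - len(rows)
    if missing <= 0:
      break
    # Each trial draws `missing` candidates and keeps only true negatives.
    for _draw in range(missing):
      src, dst = rng.randrange(num_nodes), rng.randrange(num_nodes)
      if (src, dst) not in edges:
        rows.append(src)
        cols.append(dst)
  if padding:
    # Non-strict fill: these pairs are not checked against `edges`.
    while len(rows) < req_num:
      rows.append(rng.randrange(num_nodes))
      cols.append(rng.randrange(num_nodes))
  return rows, cols


sparse = {(0, 1), (1, 2)}
rows, cols = sample_negative_edges(sparse, num_nodes=100, req_num=8)
```

On a dense graph, strict sampling can return fewer than ``req_num`` pairs; with ``padding=True`` the output always has ``req_num`` pairs, at the cost of possibly including real edges.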