Graph Operators
Graph-Learn_torch (GLT) optimizes the end-to-end training throughput of GNN
models by boosting the performance of graph sampling and feature collection.
In GLT, we have implemented the vertex-based
graphlearn_torch.sampler.NeighborSampler and
graphlearn_torch.sampler.RandomNegativeSampler.
Edge-based and subgraph-based samplers will be added in the next release.
In this tutorial, we introduce the detailed designs of these graph-related operators in GLT.
NeighborSampler
Similar to PyG, GLT wraps the configuration, initialization, and execution
of samplers inside data loaders. The following code shows an example
of graphlearn_torch.loader.NeighborLoader for single-machine training.
# graphlearn_torch NeighborLoader
import graphlearn_torch as glt

train_loader = glt.loader.NeighborLoader(glt_dataset,
                                         [15, 10, 5],         # neighbors per hop (3 hops)
                                         split_idx['train'],  # seed nodes
                                         batch_size=1024,
                                         shuffle=True,
                                         drop_last=True,
                                         device=device,
                                         as_pyg_v1=True)
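To make the batch_size, shuffle, and drop_last arguments concrete, the sketch below shows how a loader might split the seed node ids into mini-batches before passing each batch to the sampler. It is plain Python written for illustration, not GLT code; the helper name iter_seed_batches and its rng parameter are hypothetical.

```python
import random
from typing import Iterator, List, Optional


def iter_seed_batches(seeds: List[int],
                      batch_size: int,
                      shuffle: bool = True,
                      drop_last: bool = True,
                      rng: Optional[random.Random] = None) -> Iterator[List[int]]:
    """Yield mini-batches of seed node ids (hypothetical helper, not a GLT API)."""
    rng = rng or random.Random()
    order = list(seeds)
    if shuffle:
        rng.shuffle(order)  # visit the seeds in a random order
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            break  # skip the final, incomplete batch
        yield batch


# 10 seeds with batch_size=4 and drop_last=True give two full batches;
# the remaining 2 seeds are dropped.
batches = list(iter_seed_batches(list(range(10)), batch_size=4,
                                 rng=random.Random(0)))
```

With drop_last=False the same call would yield a third batch holding the two leftover seeds.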
During the initialization of the neighbor loader, an instance of
graphlearn_torch.sampler.NeighborSampler is created.
class NeighborSampler(BaseSampler):
    r""" Neighbor Sampler.
    """
    def __init__(self,
                 graph: Union[Graph, Dict[str, Graph]],
                 num_neighbors: NumNeighbors,
                 device: torch.device=torch.device('cuda', 0),
                 with_edge: bool=False,
                 strategy: str = 'random'):
To be compatible with PyG, NeighborSampler in GLT inherits from the class
torch_geometric.sampler.BaseSampler in PyG. Both homogeneous and
heterogeneous graphs are supported.
The sampler instance is created with user-specified parameters:
the number of hops, the number of neighbors to sample at each hop, and the
device on which sampling is performed. GLT supports both CPU and GPU sampling.
When with_edge is set to True, edge ids are included in the sampled results;
these edge ids can be used to look up edge features. The default sampling
strategy is random sampling.
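The following is a minimal sketch of what multi-hop random neighbor sampling does conceptually, assuming a toy adjacency-list graph and uniform sampling without replacement. It is not GLT's CPU or CUDA implementation; the function sample_neighbors, the Adjacency type, and the rng parameter are all hypothetical names chosen for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple

# Toy graph as adjacency lists: node -> list of (neighbor, edge_id).
Adjacency = Dict[int, List[Tuple[int, int]]]


def sample_neighbors(adj: Adjacency,
                     seeds: List[int],
                     num_neighbors: List[int],
                     with_edge: bool = False,
                     rng: Optional[random.Random] = None):
    """Multi-hop uniform neighbor sampling (illustrative, not GLT's kernel).

    Returns (src, dst, edge_ids); edge_ids is None unless with_edge is True.
    """
    rng = rng or random.Random()
    src, dst, eids = [], [], []
    frontier = list(seeds)
    for fanout in num_neighbors:  # one iteration per hop
        next_frontier = []
        for node in frontier:
            nbrs = adj.get(node, [])
            # Keep every neighbor if there are at most `fanout` of them,
            # otherwise draw `fanout` of them without replacement.
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for nbr, eid in picked:
                src.append(node)
                dst.append(nbr)
                eids.append(eid)
                next_frontier.append(nbr)
        # Deduplicate so each node is expanded at most once per hop.
        frontier = list(dict.fromkeys(next_frontier))
    return src, dst, (eids if with_edge else None)


adj = {
    0: [(1, 0), (2, 1), (3, 2)],
    1: [(4, 3)],
    2: [(4, 4), (5, 5)],
    3: [],
}
# Two hops: up to 2 neighbors of the seed, then up to 1 neighbor of each.
src, dst, eids = sample_neighbors(adj, seeds=[0], num_neighbors=[2, 1],
                                  with_edge=True, rng=random.Random(0))
```

The length of num_neighbors fixes the number of hops, and the returned edge ids are what a caller would use to gather edge features.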
We directly use the input and output formats of the PyG sampler for the
NeighborSampler in GLT. The input format of
graphlearn_torch.sampler.NeighborSampler is
torch_geometric.sampler.NodeSamplerInput.
@dataclass
class NodeSamplerInput(CastMixin):
    r""" The sampling input of
    :meth:`~graphlearn_torch.sampler.BaseSampler.sample_from_nodes`.
    This class corresponds to :class:`~torch_geometric.sampler.NodeSamplerInput`:
    https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py
    Args:
      node (torch.Tensor): The indices of seed nodes to start sampling from.
      input_type (str, optional): The input node type (in case of sampling in
        a heterogeneous graph). (default: :obj:`None`).
    """
    node: torch.Tensor
    input_type: Optional[NodeType] = None
The output format of sampling results on homogeneous graphs is torch_geometric.sampler.SamplerOutput.
@dataclass
class SamplerOutput(CastMixin):
    r""" The sampling output of a :class:`~graphlearn_torch.sampler.BaseSampler` on
    homogeneous graphs.
    Args:
      node (torch.Tensor): The sampled nodes in the original graph.
      row (torch.Tensor): The source node indices of the sampled subgraph.
        Indices must be re-indexed to :obj:`{ 0, ..., num_nodes - 1 }`
        corresponding to the nodes in the :obj:`node` tensor.
      col (torch.Tensor): The destination node indices of the sampled subgraph.
        Indices must be re-indexed to :obj:`{ 0, ..., num_nodes - 1 }`
        corresponding to the nodes in the :obj:`node` tensor.
      edge (torch.Tensor, optional): The sampled edges in the original graph.
        This tensor is used to obtain edge features from the original
        graph. If no edge attributes are present, it may be omitted.
      batch (torch.Tensor, optional): The vector to identify the seed node
        for each sampled node. Can be present in case of disjoint subgraph
        sampling per seed node. (default: :obj:`None`).
      device (torch.device, optional): The device that all data of this output
        resides in. (default: :obj:`None`).
      metadata (Any, optional): Additional metadata information.
        (default: :obj:`None`).
    """
    node: torch.Tensor
    row: torch.Tensor
    col: torch.Tensor
    edge: Optional[torch.Tensor] = None
    batch: Optional[torch.Tensor] = None
    device: Optional[torch.device] = None
    metadata: Optional[Any] = None
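The re-indexing requirement on row and col can be illustrated with a small sketch. It uses plain Python lists instead of tensors, and the helper relabel is hypothetical (neither GLT nor PyG exposes it under this name); it only shows how global node ids map to positions in node.

```python
from typing import Dict, List, Tuple


def relabel(seeds: List[int], edges: List[Tuple[int, int]]):
    """Map global (src, dst) pairs to local indices into `node`.

    Illustrative helper, not a GLT or PyG function. Seeds come first in
    `node`, followed by the remaining nodes as they are first encountered
    (all sources are visited before destinations).
    """
    local: Dict[int, int] = {}
    node: List[int] = []

    def index_of(global_id: int) -> int:
        if global_id not in local:
            local[global_id] = len(node)
            node.append(global_id)
        return local[global_id]

    for s in seeds:
        index_of(s)
    row = [index_of(s) for s, _ in edges]
    col = [index_of(d) for _, d in edges]
    return node, row, col


# Sampled edges in global ids: 7 -> 42, 13 -> 42, 99 -> 7, with seed 42.
node, row, col = relabel(seeds=[42], edges=[(7, 42), (13, 42), (99, 7)])
# node == [42, 7, 13, 99], row == [1, 2, 3], col == [0, 0, 1]
```

Indexing node with row or col recovers the original global ids, which is how features of the sampled nodes are gathered from the full graph.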
The output format of sampling results on heterogeneous graphs is torch_geometric.sampler.HeteroSamplerOutput.
@dataclass
class HeteroSamplerOutput(CastMixin):
    r""" The sampling output of a :class:`~graphlearn_torch.sampler.BaseSampler` on
    heterogeneous graphs.
    Args:
      node (Dict[str, torch.Tensor]): The sampled nodes in the original graph
        for each node type.
      row (Dict[Tuple[str, str, str], torch.Tensor]): The source node indices
        of the sampled subgraph for each edge type. Indices must be re-indexed
        to :obj:`{ 0, ..., num_nodes - 1 }` corresponding to the nodes in the
        :obj:`node` tensor of the source node type.
      col (Dict[Tuple[str, str, str], torch.Tensor]): The destination node
        indices of the sampled subgraph for each edge type. Indices must be
        re-indexed to :obj:`{ 0, ..., num_nodes - 1 }` corresponding to the nodes
        in the :obj:`node` tensor of the destination node type.
      edge (Dict[Tuple[str, str, str], torch.Tensor], optional): The sampled
        edges in the original graph for each edge type. This tensor is used to
        obtain edge features from the original graph. If no edge attributes are
        present, it may be omitted. (default: :obj:`None`).
      batch (Dict[str, torch.Tensor], optional): The vector to identify the
        seed node for each sampled node for each node type. Can be present
        in case of disjoint subgraph sampling per seed node.
        (default: :obj:`None`).
      edge_types (List[Tuple[str, str, str]], optional): The list of edge types
        of the sampled subgraph. (default: :obj:`None`).
      device (torch.device, optional): The device that all data of this output
        resides in. (default: :obj:`None`).
      metadata (Any, optional): Additional metadata information.
        (default: :obj:`None`).
    """
    node: Dict[NodeType, torch.Tensor]
    row: Dict[EdgeType, torch.Tensor]
    col: Dict[EdgeType, torch.Tensor]
    edge: Optional[Dict[EdgeType, torch.Tensor]] = None
    batch: Optional[Dict[NodeType, torch.Tensor]] = None
    edge_types: Optional[List[EdgeType]] = None
    device: Optional[torch.device] = None
    metadata: Optional[Any] = None
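In the heterogeneous case each node type has its own local index space, so row is relative to the source type's node list and col to the destination type's. The sketch below illustrates that layout with plain Python dictionaries and lists; relabel_hetero and the example node and edge type names are hypothetical and are not part of GLT or PyG.

```python
from typing import Dict, List, Tuple

NodeType = str
EdgeType = Tuple[str, str, str]  # (source type, relation, destination type)


def relabel_hetero(edges: Dict[EdgeType, List[Tuple[int, int]]]):
    """Relabel global ids per node type (illustrative, not a GLT function)."""
    node: Dict[NodeType, List[int]] = {}
    local: Dict[NodeType, Dict[int, int]] = {}

    def index_of(ntype: NodeType, global_id: int) -> int:
        ids = local.setdefault(ntype, {})
        if global_id not in ids:
            ids[global_id] = len(ids)
            node.setdefault(ntype, []).append(global_id)
        return ids[global_id]

    row: Dict[EdgeType, List[int]] = {}
    col: Dict[EdgeType, List[int]] = {}
    for etype, pairs in edges.items():
        src_type, _, dst_type = etype
        # Source indices are local to src_type, destinations to dst_type.
        row[etype] = [index_of(src_type, s) for s, _ in pairs]
        col[etype] = [index_of(dst_type, d) for _, d in pairs]
    return node, row, col


node, row, col = relabel_hetero({
    ("user", "buys", "item"): [(10, 500), (11, 500), (10, 501)],
    ("item", "similar_to", "item"): [(501, 502)],
})
# node == {"user": [10, 11], "item": [500, 501, 502]}
```

Note that item 501 keeps the same local index (1) in both edge types, because all edge types that touch the "item" type share its node list.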
Negative Sampler
GLT also supports both GPU and CPU execution for negative sampling. The code
below shows the graphlearn_torch.sampler.RandomNegativeSampler in GLT.
class RandomNegativeSampler(object):
    r""" Random negative Sampler.
    Args:
      graph: A ``graphlearn_torch.data.Graph`` object.
      mode: Execution mode of sampling, 'CUDA' means sampling on
        GPU, 'CPU' means sampling on CPU.
    """
    def __init__(self, graph, mode='CUDA'):
        self._mode = mode
        if mode == 'CUDA':
            self._sampler = pywrap.CUDARandomNegativeSampler(graph.graph_handler)
        else:
            self._sampler = pywrap.CPURandomNegativeSampler(graph.graph_handler)
The inputs of the sample method in RandomNegativeSampler are:
req_num, the maximum number of negative samples requested; trials_num, the
maximum number of trials allowed to generate enough negative samples; and
padding, which specifies whether, if fewer than req_num negative samples have
been found after trials_num trials, randomly generated samples (which may be
positive or negative) are used to fill the output up to req_num.
def sample(self, req_num, trials_num=5, padding=False):
    r""" Negative sampling.
    Args:
      req_num: The number of request(max) negative samples.
      trials_num: The number of trials for negative sampling.
      padding: Whether to patch the negative sampling results to req_num.
        If True, after trying trials_num times, if the number of true negative
        samples is still less than req_num, just random sample edges(non-strict
        negative) as negative samples.
    Returns:
      negative edge_index(non-strict when padding is True).
    """
    rows, cols = self._sampler.sample(req_num, trials_num, padding)
    return torch.stack([rows, cols], dim=0)
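The interplay of req_num, trials_num, and padding can be sketched in plain Python as follows. This is not the CPU or CUDA kernel behind pywrap; the function sample_negative_edges, its edges and num_nodes arguments, and the assumption that each trial draws only as many candidates as are still missing are all illustrative choices.

```python
import random
from typing import List, Optional, Set, Tuple


def sample_negative_edges(edges: Set[Tuple[int, int]],
                          num_nodes: int,
                          req_num: int,
                          trials_num: int = 5,
                          padding: bool = False,
                          rng: Optional[random.Random] = None
                          ) -> Tuple[List[int], List[int]]:
    """Random negative sampling sketch (not GLT's CPU/CUDA kernel)."""
    rng = rng or random.Random()
    rows: List[int] = []
    cols: List[int] = []
    for _ in range(trials_num):
        # Assumption: each trial draws as many candidates as are still missing.
        for _ in range(req_num - len(rows)):
            r, c = rng.randrange(num_nodes), rng.randrange(num_nodes)
            if (r, c) not in edges:  # keep only strict negatives
                rows.append(r)
                cols.append(c)
        if len(rows) == req_num:
            break
    if padding:
        # Fill up with unchecked random pairs; these may be existing edges.
        while len(rows) < req_num:
            rows.append(rng.randrange(num_nodes))
            cols.append(rng.randrange(num_nodes))
    return rows, cols


edges = {(0, 1), (1, 2), (2, 0)}
rows, cols = sample_negative_edges(edges, num_nodes=4, req_num=5,
                                   rng=random.Random(0))
```

Without padding the result may hold fewer than req_num pairs but never contains an existing edge; with padding=True the result always has exactly req_num pairs, at the cost of possibly including positive edges.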