graphlearn_torch.sampler

base

class EdgeIndex(edge_index: Tensor, e_id: Tensor | None, size: Tuple[int, int])[source]

Bases: tuple

PyG’s EdgeIndex, as used by the legacy NeighborSampler data loader: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/loader/neighbor_sampler.py

edge_index: Tensor

Alias for field number 0

e_id: Tensor | None

Alias for field number 1

size: Tuple[int, int]

Alias for field number 2

to(*args, **kwargs)[source]
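
Because EdgeIndex derives from tuple, each field can be read either by name or by position, which is what the "Alias for field number" entries describe. The following is only a minimal stand-in sketch: EdgeIndexSketch is a hypothetical name, and plain Python lists are assumed in place of tensors.

```python
from typing import List, NamedTuple, Optional, Tuple


class EdgeIndexSketch(NamedTuple):
    """Hypothetical stand-in for EdgeIndex; lists replace torch.Tensor."""
    edge_index: List[List[int]]   # field number 0
    e_id: Optional[List[int]]     # field number 1
    size: Tuple[int, int]         # field number 2


adj = EdgeIndexSketch(edge_index=[[0, 1], [1, 2]], e_id=None, size=(3, 3))

# Positional access and named access refer to the same objects.
first_field = adj[0]
# Tuple unpacking follows the field order.
edge_index, e_id, size = adj
```
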
class NodeSamplerInput(node: Tensor, input_type: str | None = None)[source]

Bases: CastMixin

The sampling input of sample_from_nodes().

This class corresponds to NodeSamplerInput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py

Parameters:
  • node (torch.Tensor) – The indices of seed nodes to start sampling from.

  • input_type (str, optional) – The input node type (in case of sampling in a heterogeneous graph). (default: None).

node: Tensor
input_type: str | None = None
share_memory()[source]
to(device: device)[source]
class NegativeSamplingMode(value)[source]

Bases: Enum

Enumeration of the supported negative sampling modes.

binary = 'binary'
triplet = 'triplet'
class NegativeSampling(mode: NegativeSamplingMode | str, amount: int | float = 1, weight: Tensor | None = None)[source]

Bases: CastMixin

The negative sampling configuration of a BaseSampler when calling sample_from_edges().

Parameters:
  • mode (str) – The negative sampling mode ("binary" or "triplet"). If set to "binary", will randomly sample negative links from the graph. If set to "triplet", will randomly sample negative destination nodes for each positive source node.

  • amount (int or float, optional) – The ratio of sampled negative edges to the number of positive edges. (default: 1)

  • weight (torch.Tensor, optional) – A node-level vector determining the sampling of nodes. Does not necessarily need to sum up to one. If not given, negative nodes will be sampled uniformly. (default: None)
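
The interaction of mode, amount, and weight can be illustrated with a small pure-Python sketch. It is not the library's implementation: sample_negatives_sketch is a hypothetical helper, the rounding of amount with ceil is an assumption, and the sketch does not check candidates against existing edges.

```python
import math
import random


def sample_negatives_sketch(pos_edges, num_nodes, mode="binary",
                            amount=1, weight=None, seed=0):
    """Hypothetical illustration of the two negative sampling modes."""
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    if mode == "binary":
        # Assumption: the negative count is the ratio times the number
        # of positive edges, rounded up.
        num_neg = math.ceil(len(pos_edges) * amount)
        # random.choices accepts unnormalized weights, mirroring the note
        # that `weight` does not need to sum up to one.
        src = rng.choices(nodes, weights=weight, k=num_neg)
        dst = rng.choices(nodes, weights=weight, k=num_neg)
        return list(zip(src, dst))
    if mode == "triplet":
        # Assumption: each positive source gets ceil(amount) negative
        # destination nodes.
        per_src = math.ceil(amount)
        return [(s, rng.choices(nodes, weights=weight, k=per_src))
                for s, _ in pos_edges]
    raise ValueError(f"unknown mode: {mode}")
```

With weight=None the draw is uniform; a node whose weight is zero is never selected.
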

mode: NegativeSamplingMode
weight: Tensor | None = None
amount: int | float = 1
is_binary() → bool[source]
is_triplet() → bool[source]
share_memory()[source]
to(device: device)[source]
class EdgeSamplerInput(row: Tensor, col: Tensor, label: Tensor | None = None, input_type: Tuple[str, str, str] | None = None, neg_sampling: NegativeSampling | None = None)[source]

Bases: CastMixin

The sampling input of sample_from_edges().

This class corresponds to EdgeSamplerInput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py

Parameters:
  • row (torch.Tensor) – The source node indices of seed links to start sampling from.

  • col (torch.Tensor) – The destination node indices of seed links to start sampling from.

  • label (torch.Tensor, optional) – The label for the seed links. (default: None).

  • input_type (Tuple[str, str, str], optional) – The input edge type (in case of sampling in a heterogeneous graph). (default: None).

row: Tensor
col: Tensor
label: Tensor | None = None
input_type: Tuple[str, str, str] | None = None
neg_sampling: NegativeSampling | None = None
share_memory()[source]
to(device: device)[source]
class SamplerOutput(node: Tensor, row: Tensor, col: Tensor, edge: Tensor | None = None, batch: Tensor | None = None, device: device | None = None, metadata: Any | None = None)[source]

Bases: CastMixin

The sampling output of a BaseSampler on homogeneous graphs.

This class corresponds to SamplerOutput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py

Parameters:
  • node (torch.Tensor) – The sampled nodes in the original graph.

  • row (torch.Tensor) – The source node indices of the sampled subgraph. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor.

  • col (torch.Tensor) – The destination node indices of the sampled subgraph. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor.

  • edge (torch.Tensor, optional) – The sampled edges in the original graph. This tensor is used to obtain edge features from the original graph. If no edge attributes are present, it may be omitted.

  • batch (torch.Tensor, optional) – The vector to identify the seed node for each sampled node. Can be present in case of disjoint subgraph sampling per seed node. (default: None).

  • device (torch.device, optional) – The device that all data of this output resides in. (default: None).

  • metadata (Any, optional) – Additional metadata information. (default: None).
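
The re-indexing requirement on row and col can be shown with a short sketch. relabel_sketch is a hypothetical helper, and plain lists are assumed in place of tensors; it only demonstrates the relationship between node, row, and col, not how the library computes them.

```python
def relabel_sketch(global_row, global_col, seeds):
    """Map global node ids to local indices {0, ..., num_nodes - 1}."""
    node = []    # local index -> global id (plays the role of `node`)
    local = {}   # global id -> local index

    def local_id(g):
        if g not in local:
            local[g] = len(node)
            node.append(g)
        return local[g]

    # Seeds come first so that they occupy the leading local indices.
    for s in seeds:
        local_id(s)
    row = [local_id(g) for g in global_row]
    col = [local_id(g) for g in global_col]
    return node, row, col


node, row, col = relabel_sketch([10, 42, 42], [7, 7, 10], seeds=[7])
# Indexing `node` with `row` or `col` recovers the global ids.
global_row = [node[i] for i in row]
```
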

node: Tensor
row: Tensor
col: Tensor
edge: Tensor | None = None
batch: Tensor | None = None
device: device | None = None
metadata: Any | None = None
class HeteroSamplerOutput(node: Dict[str, Tensor], row: Dict[Tuple[str, str, str], Tensor], col: Dict[Tuple[str, str, str], Tensor], edge: Dict[Tuple[str, str, str], Tensor] | None = None, batch: Dict[str, Tensor] | None = None, edge_types: List[Tuple[str, str, str]] | None = None, input_type: str | Tuple[str, str, str] | None = None, device: device | None = None, metadata: Any | None = None)[source]

Bases: CastMixin

The sampling output of a BaseSampler on heterogeneous graphs.

This class corresponds to HeteroSamplerOutput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py

Parameters:
  • node (Dict[str, torch.Tensor]) – The sampled nodes in the original graph for each node type.

  • row (Dict[Tuple[str, str, str], torch.Tensor]) – The source node indices of the sampled subgraph for each edge type. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor of the source node type.

  • col (Dict[Tuple[str, str, str], torch.Tensor]) – The destination node indices of the sampled subgraph for each edge type. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor of the destination node type.

  • edge (Dict[Tuple[str, str, str], torch.Tensor], optional) – The sampled edges in the original graph for each edge type. This tensor is used to obtain edge features from the original graph. If no edge attributes are present, it may be omitted. (default: None).

  • batch (Dict[str, torch.Tensor], optional) – The vector to identify the seed node for each sampled node for each node type. Can be present in case of disjoint subgraph sampling per seed node. (default: None).

  • edge_types (List[Tuple[str, str, str]], optional) – The list of edge types of the sampled subgraph. (default: None).

  • input_type (Union[NodeType, EdgeType], optional) – The input type of seed nodes or edge_label_index. (default: None).

  • device (torch.device, optional) – The device that all data of this output resides in. (default: None).

  • metadata (Any, optional) – Additional metadata information. (default: None).
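
In the heterogeneous case, row indexes into the node list of the source node type and col into the node list of the destination node type. A minimal sketch of that lookup, assuming plain lists in place of tensors; the "user"/"buys"/"item" types and to_global_sketch are made up for illustration.

```python
# Hypothetical sampled output for one edge type.
node = {"user": [5, 9], "item": [100, 101, 102]}
edge_type = ("user", "buys", "item")
row = {edge_type: [0, 0, 1]}   # local indices into node["user"]
col = {edge_type: [2, 0, 1]}   # local indices into node["item"]


def to_global_sketch(node, row, col, edge_type):
    """Translate local (row, col) pairs back to global node ids."""
    src_type, _, dst_type = edge_type
    return [(node[src_type][r], node[dst_type][c])
            for r, c in zip(row[edge_type], col[edge_type])]


global_edges = to_global_sketch(node, row, col, edge_type)
```
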

node: Dict[str, Tensor]
row: Dict[Tuple[str, str, str], Tensor]
col: Dict[Tuple[str, str, str], Tensor]
edge: Dict[Tuple[str, str, str], Tensor] | None = None
batch: Dict[str, Tensor] | None = None
edge_types: List[Tuple[str, str, str]] | None = None
input_type: str | Tuple[str, str, str] | None = None
device: device | None = None
metadata: Any | None = None
get_edge_index()[source]
class NeighborOutput(nbr: Tensor, nbr_num: Tensor, edge: Tensor | None)[source]

Bases: CastMixin

The output of sampled neighbor results for a single hop sampling.

Parameters:
  • nbr (torch.Tensor) – A 1D tensor of all sampled neighborhood node ids.

  • nbr_num (torch.Tensor) – A 1D tensor that identifies the number of sampled neighbors for each source node. Must have the same length as the source nodes of this sampling hop.

  • edge (torch.Tensor, optional) – The edge ids corresponding to the sampled edges (from each source node to its sampled neighbor). Should have the same length as nbr if provided.
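
The flat nbr tensor and the per-source counts in nbr_num together encode one neighbor list per source node. The sketch below splits them apart, assuming that nbr stores the neighbors concatenated in source-node order; split_neighbors_sketch is a hypothetical helper and plain lists stand in for tensors.

```python
from itertools import accumulate


def split_neighbors_sketch(nbr, nbr_num):
    """Split a flat neighbor list into one list per source node."""
    # Assumption: neighbors are concatenated in source-node order, so the
    # counts must add up to the total number of sampled neighbors.
    if sum(nbr_num) != len(nbr):
        raise ValueError("nbr_num must sum to len(nbr)")
    offsets = [0] + list(accumulate(nbr_num))
    return [nbr[offsets[i]:offsets[i + 1]] for i in range(len(nbr_num))]


# Three source nodes with 2, 0 and 3 sampled neighbors respectively.
per_source = split_neighbors_sketch([4, 5, 6, 7, 8], [2, 0, 3])
```
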

nbr: Tensor
nbr_num: Tensor
edge: Tensor | None
to(device: device)[source]
class SamplingType(value)[source]

Bases: Enum

Enum class for sampling types.

NODE = 0
SUBGRAPH = 2
RANDOM_WALK = 3
class SamplingConfig(sampling_type: SamplingType, num_neighbors: List[int] | Dict[Tuple[str, str, str], List[int]] | None, batch_size: int, shuffle: bool, drop_last: bool, with_edge: bool, collect_features: bool, with_neg: bool)[source]

Bases: object

Configuration info for sampling.

sampling_type: SamplingType
num_neighbors: List[int] | Dict[Tuple[str, str, str], List[int]] | None
batch_size: int
shuffle: bool
drop_last: bool
with_edge: bool
collect_features: bool
with_neg: bool
class BaseSampler[source]

Bases: ABC

A base class that initializes a graph sampler and provides sample_from_nodes() and sample_from_edges() routines.

This class corresponds to BaseSampler: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py

abstract sample_from_nodes(inputs: NodeSamplerInput, **kwargs) → HeteroSamplerOutput | SamplerOutput[source]

Performs sampling from the nodes specified in inputs, returning a sampled subgraph (ego-graph) in the specified output format.

Parameters:

inputs (NodeSamplerInput) – The input data with node indices to start sampling from.

abstract sample_from_edges(inputs: EdgeSamplerInput, **kwargs) → HeteroSamplerOutput | SamplerOutput[source]

Performs sampling from the edges specified in inputs, returning a sampled subgraph (ego-graph) in the specified output format.

Parameters:

inputs (EdgeSamplerInput) – The input data for sampling from edges including the (1) source node indices, the (2) destination node indices, the (3) optional edge labels and the (4) input edge type.

abstract subgraph(inputs: NodeSamplerInput) → SamplerOutput[source]

Induce an enclosing subgraph based on inputs and their neighbors (if num_neighbors is not None).

Parameters:

inputs (NodeSamplerInput) – The input data with node indices to induce subgraph from.

Returns:

The sampled unique nodes, relabeled rows and cols, original edge_ids, and a mapping from indices in inputs to new indices in output nodes, i.e. nodes[mapping] = inputs.
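
The return contract, and in particular the identity nodes[mapping] = inputs, can be illustrated with a pure-Python sketch. It covers only the case without neighbor expansion (num_neighbors is None), uses plain lists instead of tensors, and assumes an edge's id is its position in the edge list; induce_subgraph_sketch is a hypothetical helper.

```python
def induce_subgraph_sketch(inputs, edges):
    """Induce the subgraph spanned by `inputs` from a global edge list."""
    nodes = sorted(set(inputs))                  # unique sampled nodes
    local = {g: i for i, g in enumerate(nodes)}  # global id -> local index
    rows, cols, eids = [], [], []
    for eid, (src, dst) in enumerate(edges):
        # Keep only edges whose two endpoints are both in the node set.
        if src in local and dst in local:
            rows.append(local[src])
            cols.append(local[dst])
            eids.append(eid)
    # Position of every input node inside `nodes`.
    mapping = [local[g] for g in inputs]
    return nodes, rows, cols, eids, mapping


inputs = [8, 3, 5]
edges = [(3, 5), (5, 8), (8, 1), (3, 8)]
nodes, rows, cols, eids, mapping = induce_subgraph_sketch(inputs, edges)
```

The edge (8, 1) is dropped because node 1 is not part of the input set.
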

negative_sampler

class RandomNegativeSampler(graph, mode='CUDA')[source]

Bases: object

Random negative sampler.

Parameters:
  • graph – A graphlearn_torch.data.Graph object.

  • mode – Execution mode of sampling, ‘CUDA’ means sampling on GPU, ‘CPU’ means sampling on CPU.

sample(req_num, trials_num=5, padding=False)[source]

Negative sampling.

Parameters:
  • req_num – The requested (maximum) number of negative samples.

  • trials_num – The number of trials for negative sampling.

  • padding – Whether to pad the negative sampling results up to req_num. If True and the number of true negative samples is still less than req_num after trials_num trials, the remainder is filled with randomly sampled edges (non-strict negatives).

Returns:

The negative edge_index (non-strict when padding is True).
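
The roles of req_num, trials_num, and padding can be sketched in pure Python. This is only an illustration of the documented behaviour, not the library's CUDA or CPU implementation; random_negative_sample_sketch and its edges, num_nodes, and seed parameters are hypothetical.

```python
import random


def random_negative_sample_sketch(edges, num_nodes, req_num,
                                  trials_num=5, padding=False, seed=0):
    """Draw up to `req_num` node pairs that are not existing edges."""
    rng = random.Random(seed)
    existing = set(edges)
    rows, cols = [], []

    def draw():
        return rng.randrange(num_nodes), rng.randrange(num_nodes)

    for _ in range(trials_num):
        missing = req_num - len(rows)
        if missing <= 0:
            break
        for _ in range(missing):
            src, dst = draw()
            # Strict negatives: reject pairs that are real edges.
            if (src, dst) not in existing:
                rows.append(src)
                cols.append(dst)
    if padding:
        # Non-strict fill: accept any random pair until req_num is reached.
        while len(rows) < req_num:
            src, dst = draw()
            rows.append(src)
            cols.append(dst)
    return [rows, cols]
```

Without padding the result may contain fewer than req_num pairs, for example on a very dense graph.
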

neighbor_sampler

class NeighborSampler(graph: Graph | Dict[Tuple[str, str, str], Graph], num_neighbors: List[int] | Dict[Tuple[str, str, str], List[int]] | None = None, device: device = device(type='cuda', index=0), with_edge: bool = False, with_neg: bool = False, strategy: str = 'random')[source]

Bases: BaseSampler

Neighbor Sampler.

property subgraph_op
lazy_init_sampler()[source]
lazy_init_neg_sampler()[source]
lazy_init_subgraph_op()[source]
sample_one_hop(input_seeds: Tensor, req_num: int, etype: Tuple[str, str, str] | None = None) → NeighborOutput[source]
sample_from_nodes(inputs: NodeSamplerInput, **kwargs) → HeteroSamplerOutput | SamplerOutput[source]

Performs sampling from the nodes specified in inputs, returning a sampled subgraph (ego-graph) in the specified output format.

Parameters:

inputs (NodeSamplerInput) – The input data with node indices to start sampling from.
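
Node-seeded neighbor sampling can be pictured as a hop-by-hop expansion in which num_neighbors gives the fanout of each hop. The sketch below is a simplified pure-Python illustration under that assumption, not the sampler's actual code path: multi_hop_sample_sketch and the adjacency dictionary adj are hypothetical, and it covers the homogeneous case only.

```python
import random


def multi_hop_sample_sketch(adj, seeds, num_neighbors, seed=0):
    """Sample at most `fanout` neighbors per frontier node, hop by hop."""
    rng = random.Random(seed)
    frontier = list(seeds)
    hops = []
    for fanout in num_neighbors:
        nbr, nbr_num = [], []
        for src in frontier:
            candidates = adj.get(src, [])
            # Sample without replacement, capped by the available neighbors.
            picked = rng.sample(candidates, min(fanout, len(candidates)))
            nbr.extend(picked)
            nbr_num.append(len(picked))
        hops.append((frontier, nbr, nbr_num))
        # The next hop starts from the de-duplicated neighbors of this hop.
        frontier = list(dict.fromkeys(nbr))
    return hops


adj = {0: [1, 2, 3], 1: [4], 2: [4, 5], 3: []}
hops = multi_hop_sample_sketch(adj, seeds=[0], num_neighbors=[2, 1])
```

Each hop yields a flat neighbor list together with per-source counts, the same layout that NeighborOutput describes for a single hop.
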

sample_from_edges(inputs: EdgeSamplerInput, **kwargs) → HeteroSamplerOutput | SamplerOutput[source]

Performs sampling from an edge sampler input, leveraging a sampling function of the same signature as node_sample.

Currently, we support the out-edge sampling manner, so we reverse the direction of src and dst for the output so that features of the sampled nodes during training can be aggregated from k-hop to (k-1)-hop nodes.

sample_pyg_v1(ids: Tensor)[source]

Sample multi-hop neighbors and organize results to PyG’s EdgeIndex.

Parameters:

ids – The input ids, as a 1D tensor.

Returns:

The sampled results, in the same format as those of PyG's NeighborSampler.

subgraph(inputs: NodeSamplerInput) → SamplerOutput[source]

Induce an enclosing subgraph based on inputs and their neighbors (if num_neighbors is not None).

Parameters:

inputs (NodeSamplerInput) – The input data with node indices to induce subgraph from.

Returns:

The sampled unique nodes, relabeled rows and cols, original edge_ids, and a mapping from indices in inputs to new indices in output nodes, i.e. nodes[mapping] = inputs.

sample_prob(inputs: NodeSamplerInput, node_cnt: int | Dict[str, int]) → Tensor | Dict[str, Tensor][source]

Get the probability of each node being sampled.

get_inducer(input_batch_size: int)[source]
create_inducer(input_batch_size: int)[source]