graphlearn_torch.sampler

base

class EdgeIndex(edge_index: Tensor, e_id: Tensor | None, size: Tuple[int, int])

Bases: tuple

PyG’s EdgeIndex, as used by PyG’s old data loader NeighborSampler: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/loader/neighbor_sampler.py

edge_index: Tensor: Alias for field number 0

e_id: Tensor | None: Alias for field number 1

size: Tuple[int, int]: Alias for field number 2

to(*args, **kwargs)

class NodeSamplerInput(node: Tensor, input_type: str | None = None)

Bases: CastMixin

The sampling input of sample_from_nodes().

This class corresponds to NodeSamplerInput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py

Parameters:

node (torch.Tensor) – The indices of seed nodes to start sampling from.
input_type (str, optional) – The input node type (when sampling in a heterogeneous graph). (default: None).

node: Tensor

input_type: str | None = None

share_memory()

to(device: device)

class NegativeSamplingMode(value)

Bases: Enum

An enumeration of the supported negative sampling modes.

binary = 'binary'

triplet = 'triplet'

class NegativeSampling(mode: NegativeSamplingMode | str, amount: int | float = 1, weight: Tensor | None = None)

Bases: CastMixin

The negative sampling configuration of a BaseSampler when calling sample_from_edges().

Parameters:

mode (NegativeSamplingMode or str) – The negative sampling mode ("binary" or "triplet"). If set to "binary", negative links are sampled randomly from the graph. If set to "triplet", negative destination nodes are sampled randomly for each positive source node.
amount (int or float, optional) – The ratio of sampled negative edges to the number of positive edges. (default: 1)
weight (torch.Tensor, optional) – A node-level vector of weights that determines how nodes are sampled. It does not need to sum to one. If not given, negative nodes are sampled uniformly. (default: None)

mode: NegativeSamplingMode

weight: Tensor | None = None

amount: int | float = 1

is_binary() → bool

is_triplet() → bool

share_memory()

to(device: device)
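The following is a stdlib-only sketch of what mode, amount and weight control, assuming plain Python lists in place of tensors. Mode, num_negatives and triplet_negatives are hypothetical names, not graphlearn_torch APIs, and rounding amount up is an assumption; the library may round differently.

```python
import math
import random
from enum import Enum
from typing import List, Optional, Sequence


class Mode(Enum):
    # Hypothetical mirror of NegativeSamplingMode.
    BINARY = "binary"
    TRIPLET = "triplet"


def num_negatives(num_pos: int, amount: float) -> int:
    # Assumption: the ratio is rounded up, so any positive ratio yields >= 1 negative.
    return math.ceil(num_pos * amount)


def triplet_negatives(src: List[int], num_nodes: int, amount: int = 1,
                      weight: Optional[Sequence[float]] = None,
                      seed: int = 0) -> List[List[int]]:
    # Draw `amount` negative destination nodes for every positive source node.
    # random.choices normalizes `weight` itself, so it need not sum to one;
    # weight=None means uniform sampling over all nodes.
    rng = random.Random(seed)
    nodes = range(num_nodes)
    return [rng.choices(nodes, weights=weight, k=amount) for _ in src]


mode = Mode("triplet")          # a plain string resolves to the enum member by value
print(mode is Mode.TRIPLET)     # True
print(num_negatives(3, 1.5))    # 5
# Only nodes 2 and 3 carry non-zero weight, so only they can be drawn.
print(triplet_negatives([0, 1], num_nodes=4, amount=2, weight=[0, 0, 3, 1]))
```

Resolving the mode from a string is meant to mirror the mode parameter above, which accepts either a NegativeSamplingMode or a str.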

class EdgeSamplerInput(row: Tensor, col: Tensor, label: Tensor | None = None, input_type: Tuple[str, str, str] | None = None, neg_sampling: NegativeSampling | None = None)

Bases: CastMixin

The sampling input of sample_from_edges().

This class corresponds to EdgeSamplerInput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py

Parameters:

row (torch.Tensor) – The source node indices of seed links to start sampling from.
col (torch.Tensor) – The destination node indices of seed links to start sampling from.
label (torch.Tensor, optional) – The label for the seed links. (default: None).
input_type (Tuple[str, str, str], optional) – The input edge type (when sampling in a heterogeneous graph). (default: None).
neg_sampling (NegativeSampling, optional) – The negative sampling configuration. (default: None).

row: Tensor

col: Tensor

label: Tensor | None = None

input_type: Tuple[str, str, str] | None = None

neg_sampling: NegativeSampling | None = None

share_memory()

to(device: device)
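In "binary" mode, the positive seed links in row and col are typically joined by randomly sampled negative links, with a label that tells the two apart. The stdlib-only sketch below assumes plain lists for tensors, labels positives 1 and negatives 0, and does not check candidates against the graph; binary_seed_links is a hypothetical helper, and graphlearn_torch's actual labeling convention may differ.

```python
import math
import random
from typing import List, Tuple


def binary_seed_links(row: List[int], col: List[int], num_nodes: int,
                      amount: float = 1.0, seed: int = 0
                      ) -> Tuple[List[int], List[int], List[int]]:
    # Hypothetical helper: append random (src, dst) pairs as negative links.
    # Candidates are not checked against the graph, so a few may be false negatives.
    assert len(row) == len(col), "row and col must describe the same seed links"
    rng = random.Random(seed)
    num_neg = math.ceil(len(row) * amount)
    neg_row = [rng.randrange(num_nodes) for _ in range(num_neg)]
    neg_col = [rng.randrange(num_nodes) for _ in range(num_neg)]
    # Assumed labeling: 1 for positive seed links, 0 for sampled negatives.
    label = [1] * len(row) + [0] * num_neg
    return row + neg_row, col + neg_col, label


r, c, label = binary_seed_links([0, 1], [1, 2], num_nodes=5)
print(label)   # [1, 1, 0, 0]
```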

class SamplerOutput(node: Tensor, row: Tensor, col: Tensor, edge: Tensor | None = None, batch: Tensor | None = None, device: device | None = None, metadata: Any | None = None)

Bases: CastMixin

The sampling output of a BaseSampler on homogeneous graphs.

This class corresponds to SamplerOutput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py

Parameters:

node (torch.Tensor) – The sampled nodes in the original graph.
row (torch.Tensor) – The source node indices of the sampled subgraph. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor.
col (torch.Tensor) – The destination node indices of the sampled subgraph. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor.
edge (torch.Tensor, optional) – The sampled edges in the original graph. This tensor is used to obtain edge features from the original graph. If no edge attributes are present, it may be omitted. (default: None).
batch (torch.Tensor, optional) – The vector identifying the seed node of each sampled node. Present in the case of disjoint subgraph sampling per seed node. (default: None).
device (torch.device, optional) – The device that all data of this output resides in. (default: None).
metadata (Any, optional) – Additional metadata information. (default: None).

node: Tensor

row: Tensor

col: Tensor

edge: Tensor | None = None

batch: Tensor | None = None

device: device | None = None

metadata: Any | None = None
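A stdlib-only sketch of the re-indexing rule for row and col, assuming plain lists instead of tensors; relabel is a hypothetical helper, not part of graphlearn_torch.

```python
from typing import Dict, List, Tuple


def relabel(node: List[int], src: List[int], dst: List[int]) -> Tuple[List[int], List[int]]:
    # Map each global node id to its position in `node`, i.e. into {0, ..., num_nodes - 1}.
    local: Dict[int, int] = {n: i for i, n in enumerate(node)}
    row = [local[s] for s in src]
    col = [local[d] for d in dst]
    return row, col


node = [10, 3, 7]                        # sampled nodes, global ids
row, col = relabel(node, src=[3, 7], dst=[10, 3])
print(row, col)                          # [1, 2] [0, 1]
# Indexing `node` with the local ids recovers the original endpoints.
print([node[r] for r in row])            # [3, 7]
```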

class HeteroSamplerOutput(node: Dict[str, Tensor], row: Dict[Tuple[str, str, str], Tensor], col: Dict[Tuple[str, str, str], Tensor], edge: Dict[Tuple[str, str, str], Tensor] | None = None, batch: Dict[str, Tensor] | None = None, edge_types: List[Tuple[str, str, str]] | None = None, input_type: str | Tuple[str, str, str] | None = None, device: device | None = None, metadata: Any | None = None)

Bases: CastMixin

The sampling output of a BaseSampler on heterogeneous graphs.

This class corresponds to HeteroSamplerOutput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py

Parameters:

node (Dict[str, torch.Tensor]) – The sampled nodes in the original graph for each node type.
row (Dict[Tuple[str, str, str], torch.Tensor]) – The source node indices of the sampled subgraph for each edge type. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor of the source node type.
col (Dict[Tuple[str, str, str], torch.Tensor]) – The destination node indices of the sampled subgraph for each edge type. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor of the destination node type.
edge (Dict[Tuple[str, str, str], torch.Tensor], optional) – The sampled edges in the original graph for each edge type. This tensor is used to obtain edge features from the original graph. If no edge attributes are present, it may be omitted. (default: None).
batch (Dict[str, torch.Tensor], optional) – The vector identifying the seed node of each sampled node, for each node type. Present in the case of disjoint subgraph sampling per seed node. (default: None).
edge_types (List[Tuple[str, str, str]], optional) – The list of edge types of the sampled subgraph. (default: None).
input_type (Union[NodeType, EdgeType], optional) – The input type of seed nodes or edge_label_index. (default: None).
device (torch.device, optional) – The device that all data of this output resides in. (default: None).
metadata (Any, optional) – Additional metadata information. (default: None)

node: Dict[str, Tensor]

row: Dict[Tuple[str, str, str], Tensor]

col: Dict[Tuple[str, str, str], Tensor]

edge: Dict[Tuple[str, str, str], Tensor] | None = None

batch: Dict[str, Tensor] | None = None

edge_types: List[Tuple[str, str, str]] | None = None

input_type: str | Tuple[str, str, str] | None = None

device: device | None = None

metadata: Any | None = None

get_edge_index()
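For heterogeneous outputs each node type has its own index space: row of edge type (src, rel, dst) indexes into node[src], and col indexes into node[dst]. A stdlib-only sketch, assuming plain lists instead of tensors; relabel_hetero is a hypothetical helper.

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]


def relabel_hetero(node: Dict[str, List[int]],
                   edges: Dict[EdgeType, Tuple[List[int], List[int]]]):
    # One local index space per node type.
    local = {t: {n: i for i, n in enumerate(ids)} for t, ids in node.items()}
    row: Dict[EdgeType, List[int]] = {}
    col: Dict[EdgeType, List[int]] = {}
    for etype, (src, dst) in edges.items():
        src_type, _, dst_type = etype
        # row indexes into node[src_type], col into node[dst_type].
        row[etype] = [local[src_type][s] for s in src]
        col[etype] = [local[dst_type][d] for d in dst]
    return row, col


node = {"user": [5, 9], "item": [100, 42, 7]}
buys = ("user", "buys", "item")
row, col = relabel_hetero(node, {buys: ([9, 5], [42, 7])})
print(row[buys], col[buys])   # [1, 0] [1, 2]
```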

class NeighborOutput(nbr: Tensor, nbr_num: Tensor, edge: Tensor | None)

Bases: CastMixin

The sampled neighbor results of a single sampling hop.

Parameters:

nbr (torch.Tensor) – A 1D tensor of all sampled neighborhood node ids.
nbr_num (torch.Tensor) – A 1D tensor that identifies the number of sampled neighbors for each source node. Must have the same length as the source nodes of this sampling hop.
edge (torch.Tensor, optional) – The edge ids corresponding to the sampled edges (from each source node to its sampled neighbor). Should have the same length as nbr if provided.

nbr: Tensor

nbr_num: Tensor

edge: Tensor | None

to(device: device)
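A stdlib-only sketch of how nbr_num partitions the flat nbr tensor among the source nodes of a hop, assuming plain lists instead of tensors; expand_one_hop is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def expand_one_hop(seeds: List[int], nbr: List[int], nbr_num: List[int],
                   edge: Optional[List[int]] = None) -> List[Tuple[int, int]]:
    # nbr_num[i] neighbors of seeds[i] are stored contiguously in nbr.
    assert len(nbr_num) == len(seeds), "one count per source node"
    assert sum(nbr_num) == len(nbr), "counts must cover every sampled neighbor"
    assert edge is None or len(edge) == len(nbr), "one edge id per neighbor"
    # Repeat each source node once per sampled neighbor, then pair them up.
    src = [s for s, n in zip(seeds, nbr_num) for _ in range(n)]
    return list(zip(src, nbr))


print(expand_one_hop([0, 1, 2], nbr=[4, 5, 6, 7], nbr_num=[2, 0, 2]))
# [(0, 4), (0, 5), (2, 6), (2, 7)]
```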

class SamplingType(value)

Bases: Enum

Enum class for sampling types.

NODE = 0

LINK = 1

SUBGRAPH = 2

RANDOM_WALK = 3

class SamplingConfig(sampling_type: SamplingType, num_neighbors: List[int] | Dict[Tuple[str, str, str], List[int]] | None, batch_size: int, shuffle: bool, drop_last: bool, with_edge: bool, collect_features: bool, with_neg: bool)

Bases: object

Configuration info for sampling.

sampling_type: SamplingType

num_neighbors: List[int] | Dict[Tuple[str, str, str], List[int]] | None

batch_size: int

shuffle: bool

drop_last: bool

with_edge: bool

collect_features: bool

with_neg: bool
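Assuming batch_size, shuffle and drop_last carry their usual data-loader meaning for the seed nodes or edges (this page does not document them), a stdlib-only sketch of the batching they imply; iter_batches is a hypothetical helper.

```python
import random
from typing import Iterator, List


def iter_batches(seeds: List[int], batch_size: int, shuffle: bool,
                 drop_last: bool, seed: int = 0) -> Iterator[List[int]]:
    order = list(seeds)
    if shuffle:
        # Shuffle a copy so the caller's seed list is left untouched.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            break  # drop the trailing partial batch
        yield batch


print(list(iter_batches([0, 1, 2, 3, 4], 2, shuffle=False, drop_last=False)))  # [[0, 1], [2, 3], [4]]
print(list(iter_batches([0, 1, 2, 3, 4], 2, shuffle=False, drop_last=True)))   # [[0, 1], [2, 3]]
```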

class BaseSampler

Bases: ABC

A base class that initializes a graph sampler and provides sample_from_nodes() and sample_from_edges() routines.

This class corresponds to BaseSampler: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py

abstract sample_from_nodes(inputs: NodeSamplerInput, **kwargs) → HeteroSamplerOutput | SamplerOutput

Performs sampling from the nodes specified in inputs, returning a sampled subgraph (ego-graph) in the specified output format.

Parameters:

inputs (NodeSamplerInput) – The input data with node indices to start sampling from.

abstract sample_from_edges(inputs: EdgeSamplerInput, **kwargs) → HeteroSamplerOutput | SamplerOutput

Performs sampling from the edges specified in inputs, returning a sampled subgraph (ego-graph) in the specified output format.

Parameters:

inputs (EdgeSamplerInput) – The input data for sampling from edges, including (1) the source node indices, (2) the destination node indices, (3) the optional edge labels, and (4) the input edge type.

abstract subgraph(inputs: NodeSamplerInput) → SamplerOutput

Induces an enclosing subgraph based on inputs and, if num_neighbors is not None, their neighbors.

Parameters:

inputs (NodeSamplerInput) – The input data with node indices to induce a subgraph from.

Returns:

The sampled unique nodes, relabeled rows and cols, original edge_ids, and a mapping from indices in inputs to new indices in output nodes, i.e. nodes[mapping] = inputs.
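The mapping in the return value can be illustrated with a stdlib-only sketch for the case where num_neighbors is None, assuming the graph is a list of (src, dst, edge_id) triples and that unique nodes are sorted for determinism; induce_subgraph is hypothetical and not the sampler's implementation.

```python
from typing import List, Tuple


def induce_subgraph(inputs: List[int], edges: List[Tuple[int, int, int]]):
    # edges: (src, dst, edge_id) in global ids. No neighbor expansion here,
    # which corresponds to the num_neighbors-is-None case.
    nodes = sorted(set(inputs))                    # unique nodes
    local = {n: i for i, n in enumerate(nodes)}
    rows, cols, eids = [], [], []
    for s, d, e in edges:
        if s in local and d in local:              # keep edges inside the node set
            rows.append(local[s])
            cols.append(local[d])
            eids.append(e)                         # original edge id
    mapping = [local[n] for n in inputs]           # nodes[mapping] == inputs
    return nodes, rows, cols, eids, mapping


inputs = [7, 3, 7]
edges = [(3, 7, 0), (7, 9, 1), (7, 3, 2)]
nodes, rows, cols, eids, mapping = induce_subgraph(inputs, edges)
print(nodes, rows, cols, eids, mapping)  # [3, 7] [0, 1] [1, 0] [0, 2] [1, 0, 1]
```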

negative_sampler

class RandomNegativeSampler(graph, mode='CUDA')

Bases: object

Random negative sampler.

Parameters:

graph – A graphlearn_torch.data.Graph object.
mode – The execution mode of sampling: ‘CUDA’ samples on the GPU, ‘CPU’ samples on the CPU. (default: ‘CUDA’)

sample(req_num, trials_num=5, padding=False)

Performs negative sampling.

Parameters:

req_num – The maximum number of requested negative samples.
trials_num – The number of trials for negative sampling. (default: 5)
padding – Whether to pad the negative sampling results up to req_num. If True and fewer than req_num true negative samples have been found after trials_num trials, the remainder is filled with randomly sampled edges (non-strict negatives). (default: False)

Returns:

The negative edge_index (which may contain non-strict negatives when padding is True).
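A stdlib-only sketch of the trials-and-padding behavior described above, assuming the graph is given as a set of (src, dst) pairs over node ids 0..num_nodes-1 and returning a list of pairs rather than an edge_index tensor. random_negative_sample is hypothetical; the real sampler runs on CUDA or CPU inside graphlearn_torch and may draw candidates differently.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def random_negative_sample(edges: Set[Edge], num_nodes: int, req_num: int,
                           trials_num: int = 5, padding: bool = False,
                           seed: int = 0) -> List[Edge]:
    rng = random.Random(seed)
    found: List[Edge] = []
    for _ in range(trials_num):
        if len(found) >= req_num:
            break
        # Each trial draws as many candidates as are still missing and keeps
        # only pairs that are not existing edges (strict negatives).
        for _ in range(req_num - len(found)):
            cand = (rng.randrange(num_nodes), rng.randrange(num_nodes))
            if cand not in edges:
                found.append(cand)
    if padding:
        # Fill the remainder with random pairs that may be real edges
        # (non-strict negatives).
        while len(found) < req_num:
            found.append((rng.randrange(num_nodes), rng.randrange(num_nodes)))
    return found


full = {(i, j) for i in range(3) for j in range(3)}   # every pair is an edge
print(random_negative_sample(full, 3, req_num=4))                    # []
print(len(random_negative_sample(full, 3, req_num=4, padding=True)))  # 4
```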

neighbor_sampler

class NeighborSampler(graph: Graph | Dict[Tuple[str, str, str], Graph], num_neighbors: List[int] | Dict[Tuple[str, str, str], List[int]] | None = None, device: device = device(type='cuda', index=0), with_edge: bool = False, with_neg: bool = False, strategy: str = 'random')

Bases: BaseSampler

Neighbor Sampler.

property subgraph_op

lazy_init_sampler()

lazy_init_neg_sampler()

lazy_init_subgraph_op()

sample_one_hop(input_seeds: Tensor, req_num: int, etype: Tuple[str, str, str] | None = None) → NeighborOutput

sample_from_nodes(inputs: NodeSamplerInput, **kwargs) → HeteroSamplerOutput | SamplerOutput

Performs sampling from the nodes specified in inputs, returning a sampled subgraph (ego-graph) in the specified output format.

Parameters:

inputs (NodeSamplerInput) – The input data with node indices to start sampling from.

sample_from_edges(inputs: EdgeSamplerInput, **kwargs) → HeteroSamplerOutput | SamplerOutput

Performs sampling from an edge sampler input, leveraging a sampling function with the same signature as node_sample.

Currently, only out-edge sampling is supported, so the direction of src and dst is reversed in the output; this lets the features of the sampled nodes be aggregated from k-hop to (k-1)-hop nodes during training.
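A stdlib-only sketch of why the reversal matters, assuming plain lists and a sum aggregation chosen only for illustration; reverse_for_aggregation and sum_aggregate are hypothetical helpers, not graphlearn_torch functions.

```python
from typing import List, Tuple


def reverse_for_aggregation(row: List[int], col: List[int]) -> Tuple[List[int], List[int]]:
    # Out-edge sampling records seed -> neighbor; swapping yields neighbor -> seed,
    # so messages flow from k-hop nodes toward (k-1)-hop nodes.
    return col, row


def sum_aggregate(x: List[float], row: List[int], col: List[int], num_targets: int) -> List[float]:
    # Add the feature of each source node (row) into its target node (col).
    out = [0.0] * num_targets
    for r, c in zip(row, col):
        out[c] += x[r]
    return out


# Local ids: 0 is the seed, 1 and 2 are its sampled 1-hop neighbors.
out_row, out_col = [0, 0], [1, 2]          # as sampled: seed -> neighbors
row, col = reverse_for_aggregation(out_row, out_col)
x = [1.0, 2.0, 3.0]
print(sum_aggregate(x, row, col, num_targets=1))   # [5.0]
```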

sample_pyg_v1(ids: Tensor)

Samples multi-hop neighbors and organizes the results into PyG’s EdgeIndex format.

Parameters:

ids – The input ids, a 1D tensor.

Returns:

The sampled results, in the same format as PyG’s NeighborSampler.
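PyG's old NeighborSampler yields the batch size, the ids of all involved nodes (n_id), and per-hop EdgeIndex tuples ordered from the outermost hop inward, where the target nodes of each hop are a prefix of its source nodes. A stdlib-only sketch of that layout, assuming an adjacency dict, unique input ids, and plain lists instead of tensors; sample_v1 and EdgeIndexSketch are hypothetical, e_id is left as None, and this sketch always returns a list of hops.

```python
import random
from typing import Dict, List, NamedTuple, Optional, Tuple


class EdgeIndexSketch(NamedTuple):
    edge_index: Tuple[List[int], List[int]]  # (row, col) as local ids
    e_id: Optional[List[int]]                # left as None in this sketch
    size: Tuple[int, int]                    # (num source nodes, num target nodes)


def sample_v1(adj: Dict[int, List[int]], ids: List[int], sizes: List[int],
              seed: int = 0):
    rng = random.Random(seed)
    n_id = list(ids)                          # targets of each hop stay a prefix
    index = {n: i for i, n in enumerate(n_id)}
    adjs = []
    for fanout in sizes:
        num_targets = len(n_id)
        row, col = [], []
        for t in range(num_targets):
            nbrs = adj.get(n_id[t], [])
            for v in rng.sample(nbrs, min(fanout, len(nbrs))):
                if v not in index:            # first time we see v: append it
                    index[v] = len(n_id)
                    n_id.append(v)
                row.append(index[v])          # message source: sampled neighbor
                col.append(t)                 # message target: node sampled from
        adjs.append(EdgeIndexSketch((row, col), None, (len(n_id), num_targets)))
    return len(ids), n_id, adjs[::-1]         # outermost hop first


adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
batch_size, n_id, adjs = sample_v1(adj, [0], sizes=[2, 2])
print(batch_size, sorted(n_id), [a.size for a in adjs])  # 1 [0, 1, 2, 3] [(4, 3), (3, 1)]
```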

subgraph(inputs: NodeSamplerInput) → SamplerOutput

Induces an enclosing subgraph based on inputs and, if num_neighbors is not None, their neighbors.

Parameters:

inputs (NodeSamplerInput) – The input data with node indices to induce a subgraph from.

Returns:

The sampled unique nodes, relabeled rows and cols, original edge_ids, and a mapping from indices in inputs to new indices in output nodes, i.e. nodes[mapping] = inputs.

sample_prob(inputs: NodeSamplerInput, node_cnt: int | Dict[str, int]) → Tensor | Dict[str, Tensor]

Gets the probability of each node being sampled.

get_inducer(input_batch_size: int)

create_inducer(input_batch_size: int)