graphlearn_torch.sampler
base
- class EdgeIndex(edge_index: Tensor, e_id: Tensor | None, size: Tuple[int, int])[source]
Bases:
tuple
PyG’s EdgeIndex used in old data loader NeighborSampler: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/loader/neighbor_sampler.py
- edge_index: Tensor
Alias for field number 0
- e_id: Tensor | None
Alias for field number 1
- size: Tuple[int, int]
Alias for field number 2
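Because EdgeIndex is a named tuple, its fields can be read by name, by position, or by unpacking. The following is a minimal sketch of that behavior using a stand-in class; the name `EdgeIndexSketch`, the plain Python lists in place of tensors, and the sample values are illustrative assumptions, not part of the library.

```python
from typing import List, NamedTuple, Optional, Tuple


class EdgeIndexSketch(NamedTuple):
    """Stand-in mirroring the three documented fields (lists replace tensors)."""
    edge_index: List[List[int]]  # field number 0
    e_id: Optional[List[int]]    # field number 1, may be None
    size: Tuple[int, int]        # field number 2, a pair of ints


adj = EdgeIndexSketch(edge_index=[[0, 1, 2], [0, 0, 1]], e_id=None, size=(3, 2))

# Access by name and access by position refer to the same field.
same_field = adj.edge_index is adj[0]

# Tuple unpacking yields the fields in declaration order.
edge_index, e_id, size = adj
```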
- class NodeSamplerInput(node: Tensor, input_type: str | None = None)[source]
Bases:
CastMixin
The sampling input of sample_from_nodes(). This class corresponds to NodeSamplerInput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py
- Parameters:
node (torch.Tensor) – The indices of seed nodes to start sampling from.
input_type (str, optional) – The input node type (in case of sampling in a heterogeneous graph). (default: None)
- node: Tensor
- input_type: str | None = None
- class NegativeSamplingMode(value)[source]
Bases:
Enum
An enumeration.
- binary = 'binary'
- triplet = 'triplet'
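Each member's value is its own name as a string, so a plain string can be turned into a member through standard Enum value lookup. A small sketch with a stand-in enum follows; `ModeSketch` is a hypothetical name, and whether the library performs exactly this lookup internally is an assumption.

```python
from enum import Enum


class ModeSketch(Enum):
    """Stand-in with the same two members as documented above."""
    binary = 'binary'
    triplet = 'triplet'


# Looking up by value converts a plain string into the enum member.
mode = ModeSketch('triplet')
is_triplet = mode is ModeSketch.triplet

# An unknown string raises ValueError instead of passing silently.
try:
    ModeSketch('random')
    rejected = False
except ValueError:
    rejected = True
```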
- class NegativeSampling(mode: NegativeSamplingMode | str, amount: int | float = 1, weight: Tensor | None = None)[source]
Bases:
CastMixin
The negative sampling configuration of a BaseSampler when calling sample_from_edges().
- Parameters:
mode (str) – The negative sampling mode ("binary" or "triplet"). If set to "binary", will randomly sample negative links from the graph. If set to "triplet", will randomly sample negative destination nodes for each positive source node.
amount (int or float, optional) – The ratio of sampled negative edges to the number of positive edges. (default: 1)
weight (torch.Tensor, optional) – A node-level vector determining the sampling of nodes. Does not necessarily need to sum up to one. If not given, negative nodes will be sampled uniformly. (default: None)
- mode: NegativeSamplingMode
- weight: Tensor | None = None
- amount: int | float = 1
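The interplay of amount and weight can be illustrated with a short standalone sketch. It assumes the number of negatives is the ratio times the number of positives, rounded up, and uses plain lists instead of tensors; the helper name `sample_negative_nodes` and the rounding rule are assumptions for illustration, not the library's implementation.

```python
import math
import random
from typing import List, Optional


def sample_negative_nodes(num_pos: int, num_nodes: int, amount: float = 1,
                          weight: Optional[List[float]] = None,
                          seed: int = 0) -> List[int]:
    """Hypothetical helper: draw ceil(amount * num_pos) negative node ids."""
    rng = random.Random(seed)
    num_neg = math.ceil(amount * num_pos)  # rounding up is an assumption
    # random.choices normalizes weights internally, so they need not sum to
    # one; weights=None gives uniform sampling.
    return rng.choices(range(num_nodes), weights=weight, k=num_neg)


uniform = sample_negative_nodes(num_pos=4, num_nodes=10, amount=1.5)
weighted = sample_negative_nodes(num_pos=4, num_nodes=3, amount=2,
                                 weight=[5.0, 0.0, 3.0])
```

In the weighted call, node 1 has weight zero and therefore never appears among the sampled negatives.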
- class EdgeSamplerInput(row: Tensor, col: Tensor, label: Tensor | None = None, input_type: Tuple[str, str, str] | None = None, neg_sampling: NegativeSampling | None = None)[source]
Bases:
CastMixin
The sampling input of sample_from_edges(). This class corresponds to EdgeSamplerInput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py
- Parameters:
row (torch.Tensor) – The source node indices of seed links to start sampling from.
col (torch.Tensor) – The destination node indices of seed links to start sampling from.
label (torch.Tensor, optional) – The label for the seed links. (default: None)
input_type (Tuple[str, str, str], optional) – The input edge type (in case of sampling in a heterogeneous graph). (default: None)
neg_sampling (NegativeSampling, optional) – The negative sampling configuration. (default: None)
- row: Tensor
- col: Tensor
- label: Tensor | None = None
- input_type: Tuple[str, str, str] | None = None
- neg_sampling: NegativeSampling | None = None
- class SamplerOutput(node: Tensor, row: Tensor, col: Tensor, edge: Tensor | None = None, batch: Tensor | None = None, device: device | None = None, metadata: Any | None = None)[source]
Bases:
CastMixin
The sampling output of a BaseSampler on homogeneous graphs. This class corresponds to SamplerOutput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py
- Parameters:
node (torch.Tensor) – The sampled nodes in the original graph.
row (torch.Tensor) – The source node indices of the sampled subgraph. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor.
col (torch.Tensor) – The destination node indices of the sampled subgraph. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor.
edge (torch.Tensor, optional) – The sampled edges in the original graph. This tensor is used to obtain edge features from the original graph. If no edge attributes are present, it may be omitted.
batch (torch.Tensor, optional) – The vector to identify the seed node for each sampled node. Can be present in case of disjoint subgraph sampling per seed node. (default: None)
device (torch.device, optional) – The device that all data of this output resides in. (default: None)
metadata (Any, optional) – Additional metadata information. (default: None)
- node: Tensor
- row: Tensor
- col: Tensor
- edge: Tensor | None = None
- batch: Tensor | None = None
- device: device | None = None
- metadata: Any | None = None
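The re-indexing described for row and col means each entry is a position in node rather than an id in the original graph. A minimal sketch of mapping the local indices back to original ids is shown here; it uses plain lists in place of tensors, and the helper name `to_global_edges` and the sample values are hypothetical.

```python
from typing import List, Tuple


def to_global_edges(node: List[int], row: List[int],
                    col: List[int]) -> List[Tuple[int, int]]:
    """Map local (row, col) pairs back to original node ids via `node`."""
    return [(node[r], node[c]) for r, c in zip(row, col)]


node = [10, 42, 7]  # sampled nodes, as ids in the original graph
row = [1, 2]        # local source indices, each in {0, ..., len(node) - 1}
col = [0, 0]        # local destination indices, same range

global_edges = to_global_edges(node, row, col)
```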
- class HeteroSamplerOutput(node: Dict[str, Tensor], row: Dict[Tuple[str, str, str], Tensor], col: Dict[Tuple[str, str, str], Tensor], edge: Dict[Tuple[str, str, str], Tensor] | None = None, batch: Dict[str, Tensor] | None = None, edge_types: List[Tuple[str, str, str]] | None = None, input_type: str | Tuple[str, str, str] | None = None, device: device | None = None, metadata: Any | None = None)[source]
Bases:
CastMixin
The sampling output of a BaseSampler on heterogeneous graphs. This class corresponds to HeteroSamplerOutput: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py
- Parameters:
node (Dict[str, torch.Tensor]) – The sampled nodes in the original graph for each node type.
row (Dict[Tuple[str, str, str], torch.Tensor]) – The source node indices of the sampled subgraph for each edge type. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor of the source node type.
col (Dict[Tuple[str, str, str], torch.Tensor]) – The destination node indices of the sampled subgraph for each edge type. Indices must be re-indexed to { 0, ..., num_nodes - 1 } corresponding to the nodes in the node tensor of the destination node type.
edge (Dict[Tuple[str, str, str], torch.Tensor], optional) – The sampled edges in the original graph for each edge type. This tensor is used to obtain edge features from the original graph. If no edge attributes are present, it may be omitted. (default: None)
batch (Dict[str, torch.Tensor], optional) – The vector to identify the seed node for each sampled node for each node type. Can be present in case of disjoint subgraph sampling per seed node. (default: None)
edge_types (List[Tuple[str, str, str]], optional) – The list of edge types of the sampled subgraph. (default: None)
input_type (Union[NodeType, EdgeType], optional) – The input type of seed nodes or edge_label_index. (default: None)
device (torch.device, optional) – The device that all data of this output resides in. (default: None)
metadata (Any, optional) – Additional metadata information. (default: None)
- node: Dict[str, Tensor]
- row: Dict[Tuple[str, str, str], Tensor]
- col: Dict[Tuple[str, str, str], Tensor]
- edge: Dict[Tuple[str, str, str], Tensor] | None = None
- batch: Dict[str, Tensor] | None = None
- edge_types: List[Tuple[str, str, str]] | None = None
- input_type: str | Tuple[str, str, str] | None = None
- device: device | None = None
- metadata: Any | None = None
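In the heterogeneous case the local indices are resolved per edge type: row entries index into the node list of the source type and col entries into the node list of the destination type. The sketch below illustrates this with plain dictionaries and lists; the node types, edge type, and ids are made-up examples.

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]

node: Dict[str, List[int]] = {'user': [5, 9], 'item': [100, 200, 300]}
row: Dict[EdgeType, List[int]] = {('user', 'buys', 'item'): [0, 1, 1]}
col: Dict[EdgeType, List[int]] = {('user', 'buys', 'item'): [2, 0, 1]}

global_edges: Dict[EdgeType, List[Tuple[int, int]]] = {}
for etype, local_rows in row.items():
    src_type, _, dst_type = etype
    # Rows index into the source type's node list, cols into the
    # destination type's node list.
    global_edges[etype] = [
        (node[src_type][r], node[dst_type][c])
        for r, c in zip(local_rows, col[etype])
    ]
```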
- class NeighborOutput(nbr: Tensor, nbr_num: Tensor, edge: Tensor | None)[source]
Bases:
CastMixin
The output of sampled neighbor results for a single hop sampling.
- Parameters:
nbr (torch.Tensor) – A 1D tensor of all sampled neighborhood node ids.
nbr_num (torch.Tensor) – A 1D tensor that identifies the number of neighborhood nodes for each source node. Must be the same length as the source nodes of this sampling hop.
edge (torch.Tensor, optional) – The edge ids corresponding to the sampled edges (from source node to the sampled neighborhood node). Should be the same length as nbr if provided.
- nbr: Tensor
- nbr_num: Tensor
- edge: Tensor | None
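Together, nbr and nbr_num describe a flat, variable-length layout: nbr concatenates the sampled neighbors and nbr_num gives the count per source node. Assuming the neighbors of each source node are stored contiguously in source-node order (an assumption, as is the helper name `split_neighbors`), the per-source groups can be recovered as in this sketch with plain lists:

```python
from typing import List


def split_neighbors(nbr: List[int], nbr_num: List[int]) -> List[List[int]]:
    """Split the flat `nbr` list into one neighbor list per source node."""
    if sum(nbr_num) != len(nbr):
        raise ValueError('nbr_num must account for every entry of nbr')
    groups, start = [], 0
    for count in nbr_num:
        groups.append(nbr[start:start + count])
        start += count
    return groups


nbr = [4, 7, 2, 9, 9, 1]  # all sampled neighbors, concatenated
nbr_num = [2, 0, 3, 1]    # one count per source node of this hop
per_source = split_neighbors(nbr, nbr_num)
```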
- class SamplingType(value)[source]
Bases:
Enum
Enum class for sampling types.
- NODE = 0
- LINK = 1
- SUBGRAPH = 2
- RANDOM_WALK = 3
- class SamplingConfig(sampling_type: SamplingType, num_neighbors: List[int] | Dict[Tuple[str, str, str], List[int]] | None, batch_size: int, shuffle: bool, drop_last: bool, with_edge: bool, collect_features: bool, with_neg: bool)[source]
Bases:
object
Configuration info for sampling.
- sampling_type: SamplingType
- num_neighbors: List[int] | Dict[Tuple[str, str, str], List[int]] | None
- batch_size: int
- shuffle: bool
- drop_last: bool
- with_edge: bool
- collect_features: bool
- with_neg: bool
- class BaseSampler[source]
Bases:
ABC
A base class that initializes a graph sampler and provides sample_from_nodes() and sample_from_edges() routines. This class corresponds to BaseSampler: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/sampler/base.py
- abstract sample_from_nodes(inputs: NodeSamplerInput, **kwargs) HeteroSamplerOutput | SamplerOutput[source]
Performs sampling from the nodes specified in inputs, returning a sampled subgraph (egograph) in the specified output format.
- Parameters:
inputs (NodeSamplerInput) – The input data with node indices to start sampling from.
- abstract sample_from_edges(inputs: EdgeSamplerInput, **kwargs) HeteroSamplerOutput | SamplerOutput[source]
Performs sampling from the edges specified in inputs, returning a sampled subgraph (egograph) in the specified output format.
- Parameters:
inputs (EdgeSamplerInput) – The input data for sampling from edges, including (1) the source node indices, (2) the destination node indices, (3) the optional edge labels, and (4) the input edge type.
- abstract subgraph(inputs: NodeSamplerInput) SamplerOutput[source]
Induces an enclosing subgraph based on inputs and their neighbors (if num_neighbors is not None).
- Parameters:
inputs (NodeSamplerInput) – The input data with node indices to induce subgraph from.
- Returns:
The sampled unique nodes, relabeled rows and cols, original edge_ids, and a mapping from indices in inputs to new indices in output nodes, i.e. nodes[mapping] = inputs.
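The relation nodes[mapping] = inputs can be made concrete with a small node-induced subgraph sketch. It covers only the case without neighbor expansion, uses plain lists and an edge list in place of the library's graph, and orders the unique nodes by sorting; the helper name `induce_subgraph` and all of these choices are assumptions for illustration.

```python
from typing import List, Tuple


def induce_subgraph(inputs: List[int], edges: List[Tuple[int, int]]):
    """Hypothetical helper: node-induced subgraph over `inputs` only."""
    nodes = sorted(set(inputs))                  # unique nodes
    local = {n: i for i, n in enumerate(nodes)}  # original id -> new index
    rows, cols, edge_ids = [], [], []
    for eid, (src, dst) in enumerate(edges):
        if src in local and dst in local:        # keep edges with both ends inside
            rows.append(local[src])
            cols.append(local[dst])
            edge_ids.append(eid)
    mapping = [local[n] for n in inputs]         # nodes[mapping[i]] == inputs[i]
    return nodes, rows, cols, edge_ids, mapping


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
inputs = [3, 1, 3]
nodes, rows, cols, edge_ids, mapping = induce_subgraph(inputs, edges)
```

Here only the edge (1, 3) has both endpoints among the inputs, so it is the single edge kept, and its original position in the edge list is preserved as the edge id.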
negative_sampler
- class RandomNegativeSampler(graph, mode='CUDA')[source]
Bases:
object
Random negative sampler.
- Parameters:
graph – A graphlearn_torch.data.Graph object.
mode – Execution mode of sampling: ‘CUDA’ means sampling on GPU, ‘CPU’ means sampling on CPU.
- sample(req_num, trials_num=5, padding=False)[source]
Negative sampling.
- Parameters:
req_num – The requested (maximum) number of negative samples.
trials_num – The number of trials for negative sampling.
padding – Whether to pad the negative sampling results to req_num. If True and the number of true negative samples is still less than req_num after trials_num attempts, edges are sampled at random (non-strict negatives) to fill the remainder.
- Returns:
The negative edge_index (non-strict when padding is True).
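The roles of req_num, trials_num, and padding can be sketched as a simple rejection sampler. This is a pure-Python illustration under stated assumptions, not the library's CUDA or CPU kernel: the helper name `sample_negatives`, the edge-set representation, and the per-trial batching are all hypothetical.

```python
import random
from typing import List, Set, Tuple


def sample_negatives(edges: Set[Tuple[int, int]], num_nodes: int, req_num: int,
                     trials_num: int = 5, padding: bool = False,
                     seed: int = 0) -> List[Tuple[int, int]]:
    """Hypothetical rejection sampler following the documented parameters."""
    rng = random.Random(seed)
    negatives: List[Tuple[int, int]] = []
    for _ in range(trials_num):
        missing = req_num - len(negatives)
        if missing <= 0:
            break
        # Draw candidate pairs and keep only those that are not existing edges.
        for _ in range(missing):
            pair = (rng.randrange(num_nodes), rng.randrange(num_nodes))
            if pair not in edges:
                negatives.append(pair)
    # With padding, top up with unchecked random pairs (non-strict negatives).
    while padding and len(negatives) < req_num:
        negatives.append((rng.randrange(num_nodes), rng.randrange(num_nodes)))
    return negatives


# A fully connected 2-node graph has no true negatives at all.
full = {(0, 0), (0, 1), (1, 0), (1, 1)}
strict = sample_negatives(full, num_nodes=2, req_num=3)
padded = sample_negatives(full, num_nodes=2, req_num=3, padding=True)

# A sparse graph yields strict negatives only.
sparse = {(0, 1)}
found = sample_negatives(sparse, num_nodes=50, req_num=4)
```

Without padding the result may hold fewer than req_num pairs, which is why req_num is a maximum; with padding the length is always req_num, at the cost of possibly including existing edges.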
neighbor_sampler
- class NeighborSampler(graph: Graph | Dict[Tuple[str, str, str], Graph], num_neighbors: List[int] | Dict[Tuple[str, str, str], List[int]] | None = None, device: device = device(type='cuda', index=0), with_edge: bool = False, with_neg: bool = False, strategy: str = 'random')[source]
Bases:
BaseSampler
Neighbor Sampler.
- property subgraph_op
- sample_one_hop(input_seeds: Tensor, req_num: int, etype: Tuple[str, str, str] | None = None) NeighborOutput[source]
- sample_from_nodes(inputs: NodeSamplerInput, **kwargs) HeteroSamplerOutput | SamplerOutput[source]
Performs sampling from the nodes specified in inputs, returning a sampled subgraph (egograph) in the specified output format.
- Parameters:
inputs (NodeSamplerInput) – The input data with node indices to start sampling from.
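How a per-hop num_neighbors list drives multi-hop sampling from seed nodes can be sketched in plain Python. This is an illustrative fanout sampler over an adjacency dictionary, not the library's implementation: the helper name `sample_multi_hop`, sampling without replacement, and the source-to-neighbor edge orientation are assumptions, and the library's actual orientation and deduplication may differ.

```python
import random
from typing import Dict, List


def sample_multi_hop(adj: Dict[int, List[int]], seeds: List[int],
                     num_neighbors: List[int], seed: int = 0):
    """Hypothetical fanout sampler: num_neighbors[k] neighbors per node at hop k."""
    rng = random.Random(seed)
    node = list(dict.fromkeys(seeds))           # unique seeds, order preserved
    local = {n: i for i, n in enumerate(node)}  # original id -> local index
    row, col = [], []
    frontier = list(node)
    for fanout in num_neighbors:
        next_frontier = []
        for src in frontier:
            nbrs = adj.get(src, [])
            # Sample without replacement, capped by the neighbors available.
            for dst in rng.sample(nbrs, min(fanout, len(nbrs))):
                if dst not in local:
                    local[dst] = len(node)
                    node.append(dst)
                    next_frontier.append(dst)
                row.append(local[src])
                col.append(local[dst])
        frontier = next_frontier
    return node, row, col


adj = {0: [1, 2, 3], 1: [4], 2: [4, 5], 3: [], 4: [0], 5: []}
node, row, col = sample_multi_hop(adj, seeds=[0], num_neighbors=[2, 1])
```

The returned row and col are local indices into node, matching the re-indexing convention of SamplerOutput.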
- sample_from_edges(inputs: EdgeSamplerInput, **kwargs) HeteroSamplerOutput | SamplerOutput[source]
Performs sampling from an edge sampler input, leveraging a sampling function of the same signature as node_sample.
Currently, only out-edge sampling is supported, so the direction of src and dst is reversed in the output so that features of the sampled nodes during training can be aggregated from k-hop to (k-1)-hop nodes.
- sample_pyg_v1(ids: Tensor)[source]
Sample multi-hop neighbors and organize results to PyG’s EdgeIndex.
- Parameters:
ids – Input ids, 1D tensor.
- Returns:
The sampled results, in the same form as those of PyG’s NeighborSampler.
- subgraph(inputs: NodeSamplerInput) SamplerOutput[source]
Induces an enclosing subgraph based on inputs and their neighbors (if num_neighbors is not None).
- Parameters:
inputs (NodeSamplerInput) – The input data with node indices to induce subgraph from.
- Returns:
The sampled unique nodes, relabeled rows and cols, original edge_ids, and a mapping from indices in inputs to new indices in output nodes, i.e. nodes[mapping] = inputs.
- sample_prob(inputs: NodeSamplerInput, node_cnt: int | Dict[str, int]) Tensor | Dict[str, Tensor][source]
Get the probability of each node being sampled.