GBP

Introduction

Title: Scalable Graph Neural Networks via Bidirectional Propagation

Authors: Ming Chen, Zhewei Wei, Bolin Ding, Yaliang Li, Ye Yuan, Xiaoyong Du, Ji-Rong Wen

Abstract: Graph Neural Networks (GNN) is an emerging field for learning on non-Euclidean data. Recently, there has been increased interest in designing GNN that scales to large graphs. Most existing methods use “graph sampling” or “layer-wise sampling” techniques to reduce training time. However, these methods still suffer from degrading performance and scalability problems when applying to graphs with billions of edges. This paper presents GBP, a scalable GNN that utilizes a localized bidirectional propagation process from both the feature vectors and the training/testing nodes. Theoretical analysis shows that GBP is the first method that achieves sub-linear time complexity for both the precomputation and the training phases. An extensive empirical study demonstrates that GBP achieves state-of-the-art performance with significantly less training/testing time. Most notably, GBP can deliver superior performance on a graph with over 60 million nodes and 1.8 billion edges in less than half an hour on a single machine. The codes of GBP can be found at https://github.com/chennnM/GBP .

Running with XGCN

Configuration template for GBP:

# config/GBP-config.yaml
# Dataset/Results root
data_root: ""
results_root: ""

# Trainer configuration
epochs: 200
use_validation_for_early_stop: 1
val_freq: 1
key_score_metric: r100
convergence_threshold: 20
val_method: ""
val_batch_size: 256
file_val_set: ""

# Testing configuration
test_method: ""
test_batch_size: 256
file_test_set: ""

# DataLoader configuration
Dataset_type: NodeListDataset
num_workers: 1
NodeListDataset_type: LinkDataset
pos_sampler: ObservedEdges_Sampler
neg_sampler: RandomNeg_Sampler
num_neg: 1
BatchSampleIndicesGenerator_type: SampleIndicesWithReplacement
train_batch_size: 2048
str_num_total_samples: num_edges
epoch_sample_ratio: 0.1

# Model configuration
model: GBP
seed: 1999
device: 'cuda:0'

from_pretrained: 1
file_pretrained_emb: ''
freeze_emb: 1

alpha: 0.1
walk_length: 6
rmax_ratio: 0.01
dnn_arch: "[nn.Linear(64, 1024), nn.Tanh(), nn.Linear(1024, 64)]"
dnn_lr: 0.001
L2_reg_weight: 0.0

loss_fn: bpr

Run GBP from command line:

Note that pretrained embeddings are needed, to run the script below, please run Node2vec first.

# script/examples/facebook/run_GBP.sh
all_data_root='/home/sxr/code/XGCN_and_data/XGCN_data'
config_file_root='/home/sxr/code/XGCN_and_data/XGCN_library/config'

dataset=facebook
model=GBP
seed=0
device='cuda:1'

data_root=$all_data_root/dataset/instance_$dataset
results_root=$all_data_root/model_output/$dataset/$model/[seed$seed]

# pretrained embeddings are needed
file_pretrained_emb=$all_data_root/model_output/$dataset/Node2vec/[seed$seed]/model/out_emb_table.pt

python -m XGCN.main.run_model --seed $seed \
    --config_file $config_file_root/$model-config.yaml \
    --data_root $data_root --results_root $results_root \
    --val_method one_pos_k_neg \
    --file_val_set $data_root/val-one_pos_k_neg.pkl \
    --key_score_metric r20 \
    --test_method multi_pos_whole_graph \
    --file_test_set $data_root/test-multi_pos_whole_graph.pkl \
    --file_pretrained_emb $file_pretrained_emb \
    --device $device \