SGC
Introduction
Title: Simplifying Graph Convolutional Networks
Authors: Felix Wu, Tianyi Zhang, Amauri Holanda de Souza Jr., Christopher Fifty, Tao Yu, Kilian Q. Weinberger
Abstract: Graph Convolutional Networks (GCNs) and their variants have experienced significant attention and have become the de facto methods for learning graph representations. GCNs derive inspiration primarily from recent deep learning approaches, and as a result, may inherit unnecessary complexity and redundant computation. In this paper, we reduce this excess complexity through successively removing nonlinearities and collapsing weight matrices between consecutive layers. We theoretically analyze the resulting linear model and show that it corresponds to a fixed low-pass filter followed by a linear classifier. Notably, our experimental evaluation demonstrates that these simplifications do not negatively impact accuracy in many downstream applications. Moreover, the resulting model scales to larger datasets, is naturally interpretable, and yields up to two orders of magnitude speedup over FastGCN.
Running with XGCN
XGCN implements two version of SGC: (1) SGC: freeze node embeddings (such as embeddings pretrained by node2vec) as fixed features;
(2) SGC_learnable_emb: has learnable node embeddings.
SGC
Configuration template for SGC:
# config/SGC-config.yaml
# Dataset/Results root
data_root: ""
results_root: ""
# Trainer configuration
epochs: 200
use_validation_for_early_stop: 1
val_freq: 1
key_score_metric: r100
convergence_threshold: 20
val_method: ""
val_batch_size: 256
file_val_set: ""
# Testing configuration
test_method: ""
test_batch_size: 256
file_test_set: ""
# DataLoader configuration
Dataset_type: NodeListDataset
num_workers: 1
NodeListDataset_type: LinkDataset
pos_sampler: ObservedEdges_Sampler
neg_sampler: RandomNeg_Sampler
num_neg: 1
BatchSampleIndicesGenerator_type: SampleIndicesWithReplacement
train_batch_size: 1024
str_num_total_samples: num_edges
epoch_sample_ratio: 0.1
# Model configuration
model: SGC
seed: 1999
device: 'cuda:0'
from_pretrained: 1
file_pretrained_emb: ''
freeze_emb: 1
L2_reg_weight: 0.0
dnn_lr: 0.001
num_gcn_layers: 2
loss_fn: bpr
Run SGC from command line:
Note that pretrained embeddings are needed, to run the script below, please run Node2vec first.
# script/examples/facebook/run_SGC.sh
# set to your own path:
all_data_root='/home/sxr/code/XGCN_and_data/XGCN_data'
config_file_root='/home/sxr/code/XGCN_and_data/XGCN_library/config'
dataset=facebook
model=SGC
seed=0
device='cuda:1'
data_root=$all_data_root/dataset/instance_$dataset
results_root=$all_data_root/model_output/$dataset/$model/[seed$seed]
# pretrained embeddings are needed
file_pretrained_emb=$all_data_root/model_output/$dataset/Node2vec/[seed$seed]/model/out_emb_table.pt
python -m XGCN.main.run_model --seed $seed \
--config_file $config_file_root/$model-config.yaml \
--data_root $data_root --results_root $results_root \
--val_method one_pos_k_neg \
--file_val_set $data_root/val-one_pos_k_neg.pkl \
--key_score_metric r20 \
--test_method multi_pos_whole_graph \
--file_test_set $data_root/test-multi_pos_whole_graph.pkl \
--file_pretrained_emb $file_pretrained_emb \
--device $device \
SGC_learnable_emb
Configuration template for SGC_learnable_emb:
# config/SGC_learnable_emb-config.yaml
# Dataset/Results root
data_root: ""
results_root: ""
# Trainer configuration
epochs: 200
use_validation_for_early_stop: 1
val_freq: 1
key_score_metric: r100
convergence_threshold: 20
val_method: ""
val_batch_size: 256
file_val_set: ""
# Testing configuration
test_method: ""
test_batch_size: 256
file_test_set: ""
# DataLoader configuration
Dataset_type: BlockDataset
num_workers: 0
num_gcn_layers: 2
train_num_layer_sample: "[10, 10]"
NodeListDataset_type: LinkDataset
pos_sampler: ObservedEdges_Sampler
neg_sampler: RandomNeg_Sampler
num_neg: 1
BatchSampleIndicesGenerator_type: SampleIndicesWithReplacement
train_batch_size: 2048
str_num_total_samples: num_edges
epoch_sample_ratio: 0.1
# Model configuration
model: SGC_learnable_emb
seed: 1999
graph_device: "cuda:0"
emb_table_device: "cuda:0"
gnn_device: "cuda:0"
out_emb_table_device: "cuda:0"
forward_mode: sample
emb_dim: 64
emb_lr: 0.005
gnn_lr: 0.001
emb_init_std: 0.1
use_sparse: 0
freeze_emb: 0
from_pretrained: 0
file_pretrained_emb: ''
L2_reg_weight: 0.0
loss_type: bpr
Run SGC_learnable_emb from command line:
# script/examples/facebook/run_SGC_learnable_emb.sh
# set to your own path:
all_data_root='/home/sxr/code/XGCN_and_data/XGCN_data'
config_file_root='/home/sxr/code/XGCN_and_data/XGCN_library/config'
dataset=facebook
model=SGC_learnable_emb
seed=0
device="cuda:1"
graph_device=$device
emb_table_device=$device
gnn_device=$device
out_emb_table_device=$device
data_root=$all_data_root/dataset/instance_$dataset
results_root=$all_data_root/model_output/$dataset/$model/[seed$seed]
# file_pretrained_emb=$all_data_root/model_output/$dataset/Node2vec/[seed$seed]/model/out_emb_table.pt
python -m XGCN.main.run_model --seed $seed \
--config_file $config_file_root/$model-config.yaml \
--data_root $data_root --results_root $results_root \
--val_method one_pos_k_neg \
--file_val_set $data_root/val-one_pos_k_neg.pkl \
--key_score_metric r20 \
--test_method multi_pos_whole_graph \
--file_test_set $data_root/test-multi_pos_whole_graph.pkl \
--graph_device $graph_device --emb_table_device $emb_table_device \
--gnn_device $gnn_device --out_emb_table_device $out_emb_table_device \
# --from_pretrained 1 --file_pretrained_emb $file_pretrained_emb \