RingBDStack/DyGFM

DyGFM

DyGFM (Dynamic Graph Foundation Model) is a three-stage pipeline for temporal graph learning: static GCN pre-training → dynamic TGAT pre-training (two phases) → downstream fine-tuning with MoE routing and conditional prompts.

Supported downstream datasets: genre, mooc, reddit, wikipedia.

Cross-domain rule: when you set --dataset X, pre-training uses the other three domains; fine-tuning and evaluation run on target domain X.
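
The cross-domain rule can be sketched as a small helper. This is only an illustration: `pretrain_domains` is a hypothetical name, not a function from the repository, and only the four dataset names come from this README.

```python
from typing import List

# Dataset names as listed in this README; the helper itself is hypothetical.
DOMAINS = ("genre", "mooc", "reddit", "wikipedia")


def pretrain_domains(target: str) -> List[str]:
    """Return the domains used for pre-training when `target` is held out."""
    if target not in DOMAINS:
        raise ValueError(f"unknown dataset {target!r}; expected one of {DOMAINS}")
    # Every domain except the target takes part in pre-training.
    return [d for d in DOMAINS if d != target]


print(pretrain_domains("wikipedia"))  # ['genre', 'mooc', 'reddit']
```

The target domain never appears in its own pre-training set, which is what makes the evaluation cross-domain.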



Repository layout

DyGFM/                          # project root (all paths resolve from here via paths.py)
├── paths.py                    # central path constants
├── prepare_data.py             # verify inputs + build static/data/*.pt
├── processed/                  # node & edge feature matrices (.npy)
├── graph_structure/            # static edge_index (.npy)
├── downstream_data/            # temporal interaction CSVs
├── static/
│   ├── pretrain.py
│   ├── embedding.py            # export static routing tokens
│   ├── rebuild_static_pt.py
│   ├── data/{dataset}.pt
│   └── save_model/
├── dynamic/
│   ├── pretrain.py             # phase 1
│   ├── pretrain_phase2.py      # phase 2
│   ├── data/features|normal_time_pt|edge_feature/
│   ├── checkpoints/
│   └── save_model/
└── fine_tune/
    ├── fine_tune.py            # link prediction (main)
    ├── fine_tune_node.py       # node classification
    ├── fine_tune_genre.py      # genre-specific link prediction
    ├── sentenc_branch/         # static weights + tokens
    └── time_branch/            # dynamic weights + tokens

Environment

| Component | Recommended |
| --- | --- |
| Python | 3.8+ |
| PyTorch | 1.10+ (CUDA build if using GPU) |
| PyTorch Geometric | 2.5.x (for static/rebuild_static_pt.py and PyG Data) |
| Other | numpy, pandas, scipy, scikit-learn, tqdm |

Example install (adjust CUDA wheel URL for your torch version):

pip install torch numpy pandas scipy scikit-learn tqdm
pip install torch-geometric==2.5.3
# If needed for static GCN:
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-1.10.0+cu113.html

Optional: some training scripts import swanlab for experiment logging. Install it if you want logging; otherwise disable the import in those scripts.


Data preparation

1. Required initial files (not generated by training)

Place these under the DyGFM repository root. {ds} ∈ {genre, mooc, reddit, wikipedia}.

| Path | Description |
| --- | --- |
| processed/ml_{ds}_node.npy | Node features (172-dim), row i = node id i |
| processed/ml_{ds}.npy | Edge features for downstream loader |
| graph_structure/{ds}_edge_index.npy | Static graph edges [2, E] (ground-truth topology) |
| downstream_data/{ds}/ds_{ds}.csv | Temporal edges: u, i, ts, label, idx, ... |
| downstream_data/genre/ds_genre_{1..5}.csv | Genre node-class splits (optional for genre tasks) |
| dynamic/data/features/{ds}.pt | Dict/tensor with node features x |
| dynamic/data/normal_time_pt/{ds}.pt | Temporal graph bundle for TGAT |
| dynamic/data/edge_feature/{ds}.pt | Edge feature tensor for dynamic pre-training |

Approximate total size for all four datasets: ~5 GB (dominated by processed/ml_*.npy and edge_feature/*.pt).
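
As a rough illustration of what checking these inputs involves, the sketch below lists the required files that are absent under a given root. It is not the logic of prepare_data.py, which is not reproduced here; `missing_inputs` is a hypothetical helper, and the optional genre split files are deliberately left out.

```python
from pathlib import Path
from typing import List, Sequence

DOMAINS = ("genre", "mooc", "reddit", "wikipedia")

# Path templates copied from the required-files list; {ds} is the dataset name.
REQUIRED_TEMPLATES = (
    "processed/ml_{ds}_node.npy",
    "processed/ml_{ds}.npy",
    "graph_structure/{ds}_edge_index.npy",
    "downstream_data/{ds}/ds_{ds}.csv",
    "dynamic/data/features/{ds}.pt",
    "dynamic/data/normal_time_pt/{ds}.pt",
    "dynamic/data/edge_feature/{ds}.pt",
)


def missing_inputs(root: str, datasets: Sequence[str] = DOMAINS) -> List[str]:
    """Return the required input files (relative paths) not found under `root`."""
    base = Path(root)
    missing = []
    for ds in datasets:
        for template in REQUIRED_TEMPLATES:
            rel = template.format(ds=ds)
            if not (base / rel).is_file():
                missing.append(rel)
    return missing
```

Calling `missing_inputs(".")` from the repository root returns an empty list once all seven files are in place for each of the four datasets.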

2. Generated locally (no external copy needed)

| Path | How to create |
| --- | --- |
| static/data/{ds}.pt | python prepare_data.py or python static/rebuild_static_pt.py --dataset {ds} |
| static/data/cache/* | Created automatically during static pre-training if caching is enabled |

3. Produced by pre-training (for fine-tuning)

After static / dynamic training, copy or export artifacts as follows.

Static branch (target domain T):

| Artifact | Typical source | Destination |
| --- | --- | --- |
| GCN checkpoint | static/save_model/T_*.pt (best run) | fine_tune/sentenc_branch/saved_model/T.pt |
| Static routing tokens | static/embedding.py → static/data/embeddings/T/{src}_embeddings.pt | fine_tune/sentenc_branch/embeddings/T/{src}_embeddings.pt |

{src} = each of the three pre-training domains (e.g. for T=wikipedia: genre, mooc, reddit).

Dynamic branch (target domain T):

| Artifact | Typical source | Destination |
| --- | --- | --- |
| TGAT checkpoint | dynamic/checkpoints/overall_best_model_T.pt (phase 1) or overall_best_model_T_phase2.pt (phase 2) | fine_tune/time_branch/saved_model/T.pt |
| Dynamic routing tokens | Export / training pipeline (shape [N, 64] per source domain) | fine_tune/time_branch/embeddings/T/{src}_embeddings.pt |

Fine-tuning will fail if saved_model/T.pt or cross-domain *_embeddings.pt files are missing.
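
To catch a missing artifact before launching a long run, the destinations described above can be enumerated and checked up front. The sketch below assumes the layout exactly as described in this section; `finetune_artifacts` and `missing_artifacts` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path
from typing import List

DOMAINS = ("genre", "mooc", "reddit", "wikipedia")


def finetune_artifacts(target: str) -> List[str]:
    """Relative paths fine-tuning expects for `target` in both branches."""
    sources = [d for d in DOMAINS if d != target]
    paths = []
    for branch in ("sentenc_branch", "time_branch"):
        # One checkpoint per branch, named after the target domain.
        paths.append(f"fine_tune/{branch}/saved_model/{target}.pt")
        # One routing-token file per source (pre-training) domain.
        paths += [
            f"fine_tune/{branch}/embeddings/{target}/{src}_embeddings.pt"
            for src in sources
        ]
    return paths


def missing_artifacts(root: str, target: str) -> List[str]:
    """Subset of `finetune_artifacts(target)` that does not exist under `root`."""
    base = Path(root)
    return [p for p in finetune_artifacts(target) if not (base / p).is_file()]
```

For any target this yields eight paths: two checkpoints and six token files.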


Quick check

From the repository root:

cd /path/to/DyGFM
python prepare_data.py

Expected output: an [OK] line for every check, plus "Saved static/data/{ds}.pt" for each dataset (or [OK] if the file already exists).


End-to-end workflow

Replace T with your target domain (e.g. wikipedia). Use smoke-test epoch counts first, then scale to paper settings.

Step 0 — Verify data

cd /path/to/DyGFM
python prepare_data.py

Step 1 — Static pre-training

Pre-trains on the three domains other than T.

cd static
python pretrain.py --dataset T --gpu 0 --nb_epochs 5000 --patience 200 --lr 1e-5

Smoke test:

python pretrain.py --dataset wikipedia --gpu 0 --nb_epochs 2 --patience 50

Copy the best checkpoint:

cp save_model/wikipedia_<timestamp>.pt \
   ../fine_tune/sentenc_branch/saved_model/wikipedia.pt
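
If you would rather not type the timestamp by hand, the copy can be scripted. The sketch below picks the most recently modified checkpoint matching `{target}_*.pt`, which is an assumption: the newest file is not necessarily the best run, so check the training log when several runs exist. `copy_latest_checkpoint` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def copy_latest_checkpoint(static_dir: str, target: str) -> Path:
    """Copy the newest `{target}_*.pt` from static/save_model into fine_tune/."""
    static = Path(static_dir).resolve()
    candidates = sorted(
        (static / "save_model").glob(f"{target}_*.pt"),
        key=lambda p: p.stat().st_mtime,  # oldest first, newest last
    )
    if not candidates:
        raise FileNotFoundError(
            f"no checkpoint matching {target}_*.pt in {static / 'save_model'}"
        )
    # Destination follows the layout in this README, relative to the repo root.
    dest = static.parent / "fine_tune" / "sentenc_branch" / "saved_model" / f"{target}.pt"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(candidates[-1], dest)
    return dest
```

Run from static/, `copy_latest_checkpoint(".", "wikipedia")` has the same effect as the cp command above applied to the newest file.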

Generate static routing tokens (run from static/, model path = your checkpoint):

python embedding.py --dataset wikipedia --model_path save_model/wikipedia_<timestamp>.pt --gpu 0
mkdir -p ../fine_tune/sentenc_branch/embeddings/wikipedia
cp data/embeddings/wikipedia/*_embeddings.pt \
   ../fine_tune/sentenc_branch/embeddings/wikipedia/

Step 2 — Dynamic pre-training (phase 1 + phase 2)

Phase 1 — train shared backbone, adapters frozen:

cd ../dynamic
python pretrain.py --dataset T --gpu 0 --freeze_adapter --epochs_per_domain 30 --alternating_cycles 2

Checkpoint: dynamic/checkpoints/overall_best_model_T.pt

Smoke test:

python pretrain.py --dataset wikipedia --gpu 0 --freeze_adapter --epochs_per_domain 1 --alternating_cycles 1

Phase 2 — freeze backbone, fine-tune adapters:

python pretrain_phase2.py --dataset T --gpu 0 \
  --phase1_model_path checkpoints/overall_best_model_T.pt \
  --epochs_per_domain 50

Copy TGAT weights for fine-tuning:

cp checkpoints/overall_best_model_T_phase2.pt \
   ../fine_tune/time_branch/saved_model/T.pt
# Or use phase-1 weights if phase 2 is skipped:
# cp checkpoints/overall_best_model_T.pt ../fine_tune/time_branch/saved_model/T.pt

Place dynamic routing tokens under fine_tune/time_branch/embeddings/T/ (one {src}_embeddings.pt per source domain). These are produced by your dynamic export / pre-training pipeline.

Step 3 — Fine-tuning

cd ../fine_tune
python fine_tune.py --dataset T --gpu 0 --epochs 1000 --num_runs 100 --patience 100 --val_freq 5

Smoke test (requires weights + tokens from steps 1–2):

python fine_tune.py --dataset wikipedia --gpu 0 --epochs 1 --num_runs 1 --val_freq 1
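
For repeated smoke tests across targets, the commands above can be assembled programmatically. The sketch below only uses scripts and flags that appear in this README; `smoke_test_commands` and `run_all` are hypothetical helpers. It does not perform the checkpoint copies or token exports between stages, so the fine-tuning command will still fail unless those artifacts are already in place.

```python
import os
import subprocess
from typing import List, Tuple

Command = Tuple[str, List[str]]  # (working directory relative to repo root, argv)


def smoke_test_commands(target: str, gpu: int = 0) -> List[Command]:
    """Smoke-test invocations from this README, in pipeline order."""
    g = str(gpu)
    return [
        (".", ["python", "prepare_data.py"]),
        ("static", ["python", "pretrain.py", "--dataset", target, "--gpu", g,
                    "--nb_epochs", "2", "--patience", "50"]),
        ("dynamic", ["python", "pretrain.py", "--dataset", target, "--gpu", g,
                     "--freeze_adapter", "--epochs_per_domain", "1",
                     "--alternating_cycles", "1"]),
        ("fine_tune", ["python", "fine_tune.py", "--dataset", target, "--gpu", g,
                       "--epochs", "1", "--num_runs", "1", "--val_freq", "1"]),
    ]


def run_all(commands: List[Command], repo_root: str = ".") -> None:
    """Run each command in its directory, stopping at the first failure."""
    for cwd, argv in commands:
        subprocess.run(argv, cwd=os.path.join(repo_root, cwd), check=True)
```

Keeping the command list separate from `run_all` makes it easy to print the commands for review before anything is executed.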

Fine-tuning experiments

All commands assume cd fine_tune and that sentenc_branch/ + time_branch/ artifacts exist for the target domain.

Link prediction (main)

python fine_tune.py \
  --dataset wikipedia \
  --gpu 0 \
  --epochs 10000 \
  --num_runs 100 \
  --patience 100 \
  --val_freq 5 \
  --prefix exp_wiki

Metrics: transductive / inductive AP and AUC on validation and test splits (see console and fine_tune/results/).

Node classification

Uses downstream_data/genre/ds_genre_{class}.csv when --dataset genre, where {class} is the split number (1 to 5) given by --genre_class.

python fine_tune_node.py \
  --dataset genre \
  --genre_class 1 \
  --gpu 0 \
  --epochs 10000 \
  --num_runs 100 \
  --patience 100 \
  --train_shot_num 3 \
  --val_shot_num 3

Smoke test:

python fine_tune_node.py --dataset genre --genre_class 1 --gpu 0 --epochs 5 --num_runs 1 --patience 1000

Genre link prediction (stratified sampling)

python fine_tune_genre.py --dataset genre --gpu 0 --epochs 300 --num_runs 1

Reload / time-slice node test

python reload_test_node.py --dataset mooc --gpu 0 --task_num 5

Outputs and checkpoints

| Stage | Location | Naming |
| --- | --- | --- |
| Static pre-train | static/save_model/ | {target}_{timestamp}.pt |
| Dynamic phase 1 | dynamic/checkpoints/ | overall_best_model_{target}.pt |
| Dynamic phase 2 | dynamic/checkpoints/ | overall_best_model_{target}_phase2.pt |
| Dynamic phase 1/2 saves | dynamic/save_model/ | tgat_{target}_{timestamp}_phase{1,2}.pt |
| Fine-tune run logs | fine_tune/results/ | {dataset}_{prefix}_run_*.pkl, *_final_summary.pkl |
| Fine-tune checkpoints | fine_tune/checkpoints/{edge,node}/ | |
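
The run-log naming above makes it straightforward to collect the per-run files of one experiment. The sketch below only locates the files; it makes no assumption about what the pickles contain, and `run_logs` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def run_logs(results_dir: str, dataset: str, prefix: str) -> List[Path]:
    """Per-run result files named `{dataset}_{prefix}_run_*.pkl`, sorted by name."""
    return sorted(Path(results_dir).glob(f"{dataset}_{prefix}_run_*.pkl"))
```

For the link-prediction example with `--prefix exp_wiki`, `run_logs("fine_tune/results", "wikipedia", "exp_wiki")` would return the files for that experiment only. Note that the sort is lexicographic, so `run_10` sorts before `run_2`.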

Path resolution is centralized in paths.py; scripts do not rely on ../ relative paths.


About

This repository is the official implementation of "Decoupled and Divergence-Conditioned Prompt for Multi-domain Dynamic Graph Foundation Models".
