paper-ai-diffraction

Paper-focused reproducibility repository for "Attention Is Not All You Need for Diffraction".

This repo reproduces the paper-facing table rows and figure layer for the mixed-curriculum results:

- benchmark summary rows from bundled JSON artifacts
- supplemental positional-ablation benchmark rows from bundled JSON artifacts
- topology-distance figure
- topology-flow figure set

The repo does not bundle model checkpoints or benchmark HDF5 files. Those come from:

- Zenodo checkpoints: see reproducibility/checkpoint_manifest.csv
- external benchmark/trainready datasets: see reproducibility/dataset_manifest.csv

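Because the upstream RRUFF-derived benchmark files are not redistributed here, this repo publishes the benchmark-construction algorithms instead (scripts/reconstruct_rruff_473.py and scripts/build_rruff_325_from_473.py; see Regenerate Paper Outputs).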

The repo does bundle compact reviewer-facing artifacts:

- two example diffraction CSVs derived from the paper benchmark
- their paired JSON metadata
- SG/EG lookup CSVs
- a compact prior JSON/CSV
- a compact precomputed RRUFF-325 summary JSON

Those are sufficient for the shipped notebook walkthrough without Box or the full RRUFF benchmark.

Reviewer-facing notebook support is documented in:

Supported notebook usage paths:

- local machine with the train/eval environment and a released checkpoint
- TACC TAP on Stampede3 with the same repo checkout and checkpoint placement

Google Colab is plausible for the lightweight checkpoint-only reviewer demo, but it is not the primary validated path.

Zenodo archival package:

DOI: 10.5281/zenodo.19558452
Record: zenodo.org/records/19558452

Current archive split:

GitHub repo:
- code
- notebooks
- benchmark-construction scripts
- paper-facing wrappers and docs
Zenodo:
- checkpoints
- compact result JSONs
- configs
- launchers
- short archival manifests/notes
reviewer compact assets:
- packaged separately in reviewer_compact_assets.tar.gz

Install

For paper tables and figures:

conda env create -f environment.yml
conda activate paper-ai-diffraction
pip install -e .

For checkpoint evaluation or training reruns:

conda env create -f environment-train-eval.yml
conda activate paper-ai-diffraction-train-eval
pip install -e .

TACC-specific notes are in:

TACC_ENV.md

Checkpoints And Data

Checkpoints are downloaded from Zenodo and should be placed under:

external/checkpoints/                         # core paper checkpoints (flat)
external/checkpoints/                         # supplemental ViT checkpoints (Tables S3–S7, Figs S3/S5)

The exact filenames and expected local paths are listed in:

reproducibility/checkpoint_manifest.csv

This manifest includes every checkpoint explicitly named in the manuscript main text or supplement. Exploratory checkpoints are still archived, but are labeled accordingly in the notes column.
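Before running evaluation, it can help to confirm that every checkpoint listed in the manifest is actually in place. The following is a minimal sketch, not part of the repo: the function name `missing_checkpoints` and the column name `local_path` are assumptions made for illustration, so check the header row of reproducibility/checkpoint_manifest.csv and pass the real column name.

```python
import csv
from pathlib import Path


def missing_checkpoints(manifest_csv, path_column="local_path", root="."):
    """Return manifest paths that do not exist under `root`.

    `path_column` is a hypothetical column name; replace it with the
    column that holds the expected local path in the real manifest.
    """
    root = Path(root)
    with open(manifest_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        if path_column not in (reader.fieldnames or []):
            raise KeyError(f"column {path_column!r} not in {reader.fieldnames}")
        # Tolerate short or blank rows by treating them as empty paths.
        paths = [(row.get(path_column) or "").strip() for row in reader]
    return [p for p in paths if p and not (root / p).exists()]


if __name__ == "__main__":
    manifest = Path("reproducibility/checkpoint_manifest.csv")
    if manifest.exists():
        for path in missing_checkpoints(manifest):
            print("missing:", path)
```

Run from the repo root, this prints one line per checkpoint that still needs to be downloaded from Zenodo and prints nothing when all listed files are present.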

External benchmark and trainready datasets are not redistributed in this repo. Their required environment variables and example source paths are listed in:

reproducibility/dataset_manifest.csv

Regenerate Paper Outputs

Table rows from bundled paper JSONs:

python scripts/make_main_tables.py

Topology-distance figure from bundled failure JSONs:

./scripts/make_topology_distance_figure.sh

Topology-flow figure set from bundled failure JSON plus external canonical CSV:

export CANONICAL_CSV=/path/to/canonical_extinction_to_space_group.csv
./scripts/make_topology_flow_figure.sh

Calibration sweep figure from the bundled Stage-2c sweep JSON:

./scripts/make_calibration_figure.sh

Curriculum holdout figure from bundled paper values:

python scripts/make_curriculum_real_holdout.py

RRUFF-473 decoder-tradeoff figure from bundled paper values:

python scripts/make_stage_decoder_tradeoffs_rruff473.py

Physics-PE supplementary ruler figure from the bundled checkpoint-curve JSON:

python scripts/make_physics_pe_q2_ruler.py
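The table and figure commands above can also be driven from one place. The sketch below is an illustration, not a script shipped with the repo: `plan_commands`, `run_all`, and the `--run` flag are hypothetical names. It assumes the steps are independent of each other, and it schedules the topology-flow step last and only when `CANONICAL_CSV` points at an existing file.

```python
import os
import subprocess
import sys
from pathlib import Path

# Commands copied from the steps above; these need only bundled inputs.
BUNDLED_ONLY = [
    [sys.executable, "scripts/make_main_tables.py"],
    ["./scripts/make_topology_distance_figure.sh"],
    ["./scripts/make_calibration_figure.sh"],
    [sys.executable, "scripts/make_curriculum_real_holdout.py"],
    [sys.executable, "scripts/make_stage_decoder_tradeoffs_rruff473.py"],
    [sys.executable, "scripts/make_physics_pe_q2_ruler.py"],
]
# This step additionally needs the external canonical CSV.
TOPOLOGY_FLOW = ["./scripts/make_topology_flow_figure.sh"]


def plan_commands(env=None):
    """Return the commands to run, given an environment mapping."""
    env = os.environ if env is None else env
    commands = list(BUNDLED_ONLY)
    canonical = env.get("CANONICAL_CSV", "")
    if canonical and Path(canonical).is_file():
        commands.append(TOPOLOGY_FLOW)
    return commands


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in plan_commands():
        print("+", " ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Hypothetical flag: pass --run to execute instead of only printing.
    run_all(dry_run="--run" not in sys.argv)
```

By default this only prints the plan, which makes it easy to see whether the topology-flow step will be skipped because `CANONICAL_CSV` is unset.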

Reconstruct the frozen RRUFF-473 benchmark from an upstream manifest plus raw XY files:

python scripts/reconstruct_rruff_473.py --manifest-json /path/to/rruff_cukalpha_manifest.json --xy-dir /path/to/xy_raw --reference-manifest-json /path/to/option1_metadata_manifest.json --output-json results/rruff473_reconstruction_summary.json

Rebuild RRUFF-325 deterministically from frozen RRUFF-473:

python scripts/build_rruff_325_from_473.py --input-h5 /path/to/RRUFF_option1_473_with_buckets_maxnorm.hdf5 --output-h5 /path/to/RRUFF_usable_plus_recoverable_325_with_labels_maxnorm.hdf5

Reviewer-support artifact generation:

python scripts/export_prior_asset.py --prior-h5 /path/to/trainready.hdf5 --output-csv results/reviewer/ext_group_priors.csv --output-json results/reviewer/ext_group_priors.json
python scripts/export_rruff_examples.py --benchmark-h5 /path/to/RRUFF_usable_plus_recoverable_325_with_labels_maxnorm.hdf5 --failure-json results/mixed2500k_compare_325_failure_modes_655279.json --output-dir assets/reviewer_examples
python scripts/precompute_benchmark_inference.py --checkpoint external/checkpoints/xrd_model_82ept35h_best.pth --config configs/final_mixed_2500k_dualsource.json --benchmark-h5 /path/to/RRUFF_usable_plus_recoverable_325_with_labels_maxnorm.hdf5 --prior-h5 /path/to/trainready.hdf5 --output-json results/reviewer/rruff325_precomputed_inference.json

If results/reviewer/rruff325_precomputed_inference.json is present, the reviewer notebook can browse the full paper-backed 325-example summary directly instead of recomputing it inside Jupyter.
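The "use it if present" behaviour described above can be sketched as follows. This is an assumption about how such a check might look, not the notebook's actual code: `load_precomputed` is a hypothetical helper, and nothing is assumed about the JSON schema beyond it being valid JSON.

```python
import json
from pathlib import Path

PRECOMPUTED = Path("results/reviewer/rruff325_precomputed_inference.json")


def load_precomputed(path=PRECOMPUTED):
    """Return the parsed summary, or None when the file is absent."""
    path = Path(path)
    if not path.is_file():
        return None
    with path.open() as handle:
        return json.load(handle)


summary = load_precomputed()
if summary is None:
    print("No precomputed summary found; inference would need to be recomputed.")
else:
    print(f"Loaded precomputed summary ({type(summary).__name__}) from {PRECOMPUTED}")
```

Returning `None` rather than raising keeps the fallback decision with the caller, which can then choose between recomputing and browsing only the bundled examples.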

Repo Contract

- results/ contains only compact paper-backed JSON artifacts.
- results/figures/ is generated output and is not tracked.
- scripts/ contains the canonical paper-facing wrappers.
- scripts/tacc_archive/ contains preserved historical campaign launchers for provenance only.
