- Makefile with targets for running table/figure generation
- README with usage instructions and data download link
- requirements.txt with dependencies
- download_data.py script for fetching pre-generated graphs
- generate_benchmark_tables.py: standard PGD metrics with VUN
- generate_mmd_tables.py: Gaussian TV and RBF MMD metrics

Both scripts use the polygraph-benchmark API:
- StandardPGDInterval for PGD computation
- GaussianTVMMD2BenchmarkInterval / RBFMMD2BenchmarkInterval for MMD
- Proper graph format conversion for DIGRESS tensor outputs
Computes PGD metrics using Logistic Regression classifier instead of TabPFN, with standard descriptors (orbit counts, degree, spectral, clustering, GIN). Uses PolyGraphDiscrepancyInterval with sklearn LogisticRegression for classifier-based evaluation.
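The classifier-based evaluation can be sketched as a simple two-sample test (a minimal illustration only — the function name and the accuracy-to-discrepancy mapping below are simplifying assumptions, not the PolyGraphDiscrepancyInterval API):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def classifier_discrepancy(ref_feats, gen_feats, seed=0):
    """Rough sketch: 0 = indistinguishable distributions, 1 = fully separable."""
    X = np.vstack([ref_feats, gen_feats])
    y = np.concatenate([np.zeros(len(ref_feats)), np.ones(len(gen_feats))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    acc = clf.score(X_te, y_te)
    # Held-out accuracy of 0.5 (chance) maps to 0; perfect separation maps to 1.
    return max(0.0, 2.0 * acc - 1.0)
```

The same descriptor features are fed to the classifier; swapping LogisticRegression for TabPFN only changes which model produces `acc`.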
Compares standard PGD (max over individual descriptors) vs concatenated PGD (all descriptors combined into a single feature vector). Features:
- ConcatenatedDescriptor class with PCA dimensionality reduction
- Handles the TabPFN 500-feature limit via PCA to 100 components
- Uses LogisticRegression for concatenated features
- Optimized subset mode for faster testing
- generate_model_quality_figures.py: training/denoising curves
- generate_perturbation_figures.py: metric sensitivity to edge perturbations
- generate_phase_plot.py: PGD vs VUN training dynamics
- generate_subsampling_figures.py: bias-variance tradeoff analysis

All scripts use StandardPGDInterval from the polygraph-benchmark API. The phase plot gracefully handles missing VUN values (VUN requires graph_tool).
Rename [project] to [workspace] per updated pixi schema and correct the pypi-dependencies package name from polygraph to polygraph-benchmark.
Move the expected data path from polygraph_graphs/ to data/polygraph_graphs/ to keep generated data under the gitignored data/ directory.
Documentation is consolidated into the main README and dependencies are managed through pyproject.toml extras.
Add a submitit-based cluster module for distributing reproducibility workloads across SLURM nodes. Includes YAML-configurable job parameters, job metadata tracking, and result collection helpers.
- cluster.py: shared wrapper with SlurmConfig, submit_jobs, collect_results
- configs/: default CPU and GPU SLURM configurations
- pyproject.toml: new [cluster] optional dependency group (submitit, pyyaml)
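A YAML-driven SLURM config might look roughly like the sketch below. The field names and defaults are illustrative assumptions, not the actual cluster.py schema; the keys returned by `submitit_parameters` are real submitit `update_parameters` keywords:

```python
from dataclasses import dataclass

@dataclass
class SlurmConfig:
    # Illustrative fields only; the real SlurmConfig may differ.
    partition: str = "cpu"
    timeout_min: int = 240
    cpus_per_task: int = 8
    gpus_per_node: int = 0
    mem_gb: int = 32

    @classmethod
    def from_dict(cls, raw):
        """Build from a dict parsed out of a YAML config file, ignoring extras."""
        known = {k: v for k, v in raw.items() if k in cls.__dataclass_fields__}
        return cls(**known)

    def submitit_parameters(self):
        """Shape the config as keyword args for submitit's update_parameters."""
        return {
            "slurm_partition": self.partition,
            "timeout_min": self.timeout_min,
            "cpus_per_task": self.cpus_per_task,
            "gpus_per_node": self.gpus_per_node,
            "mem_gb": self.mem_gb,
        }
```

In the real module this dict would be passed to `submitit.AutoExecutor(...).update_parameters(**cfg.submitit_parameters())` before submitting jobs.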
Add --slurm-config, --local, and --collect CLI options to all four table generation scripts for distributing computation across SLURM nodes. Each script gains a standalone task function suitable for submitit, result reshaping helpers, and three execution modes (local, submit, collect). Also updates DATA_DIR paths and adds tables-submit/tables-collect Make targets.
Document the full reproducibility workflow including data download, script overview, Make targets, hardware requirements, SLURM cluster submission, and troubleshooting tips.
Include LaTeX tables and PDF figures produced by the reproducibility scripts so reviewers can verify outputs without re-running computation.
- Replace monolithic generate_*.py scripts with modular 01-08 experiment directories, each with compute.py, plot.py, and/or format.py
- Add Hydra configs for all experiments with SLURM launcher support
- Fix the sparse-feature OOM in GKLR (Bug 12), the package name in graph_storage, the TabPFN CPU-limit workaround, and stale cache issues
- Add a kernel logistic regression module and an async results I/O utility
- Regenerate all tables with correct PGD values, subscores, and GKLR graph kernel metrics (PM/SP/WL)
- Regenerate all figures, including new subsampling, perturbation, model quality, and phase plot visualizations
- Include all JSON result files for full reproducibility
Ensure consistent float64 dtype in kernel diagonal computation and normalization to prevent precision issues with sparse matrix outputs.
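The fix amounts to casting before taking the diagonal, along these lines (a sketch; the actual KernelLogisticRegression code may be structured differently):

```python
import numpy as np

def normalize_kernel(K):
    # Sparse-matrix products can come back as float32 or np.matrix;
    # cast to a float64 ndarray first so sqrt/division keep full precision.
    K = np.asarray(K, dtype=np.float64)
    d = np.sqrt(np.diag(K))
    # Cosine-style normalization: K[i, j] / sqrt(K[i, i] * K[j, j])
    return K / np.outer(d, d)
```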
The ego dataset has 757 graphs (an odd number), causing unequal reference/perturbed splits, which fails the equal-count requirement. Use half = len // 2 and slice [half : 2*half] to guarantee equal sizes. The same fix is applied to the proteins split for consistency.
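The split fix is small enough to show in full (a sketch of the slicing logic described above):

```python
def equal_halves(graphs):
    # With an odd count (e.g. ego's 757 graphs), len - half != half;
    # slicing [half : 2 * half] drops the last element so both sides match.
    half = len(graphs) // 2
    return graphs[:half], graphs[half : 2 * half]
```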
TabPFN v2.0.9 raises ValueError when its encoder produces NaN from near-constant features after StandardScaler normalization (see github.com/PriorLabs/TabPFN/issues/108). This caused lobster/GRAN n=32 PGD subsampling to crash completely. Wrap classifier fit/predict in try/except in both the CV fold loop and the refit section. On failure, treat as indistinguishable distributions (score=0), matching the existing constant-feature fallback semantics.
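The fallback pattern looks roughly like this (a sketch with a hypothetical helper name; the accuracy-to-score mapping is a simplifying assumption about how a fold score is derived):

```python
def fit_score_or_zero(clf, X_tr, y_tr, X_te, y_te):
    """Score one CV fold; on classifier failure, report 'indistinguishable'."""
    try:
        clf.fit(X_tr, y_tr)
        acc = clf.score(X_te, y_te)
    except ValueError:
        # e.g. TabPFN's encoder producing NaN on near-constant features:
        # fall back to score 0, matching the constant-feature fallback semantics.
        return 0.0
    return max(0.0, 2.0 * acc - 1.0)
```

The same wrapper is applied in both places a classifier is fit: the CV fold loop and the final refit.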
Match original polygraph CombinedDescriptor behavior: per-descriptor StandardScaler + PCA, both fit on reference data only. Fix subsample size calculation to use 50% of min subset capped at 2048, matching the original experiment configuration.
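The corrected pipeline is roughly the following (a sketch: sklearn's StandardScaler and PCA are the real classes, but the function names and block layout here are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def fit_concat_pipeline(ref_blocks, n_components=10):
    """Fit one StandardScaler + PCA per descriptor block, on reference data only."""
    pipes = []
    for block in ref_blocks:
        scaler = StandardScaler().fit(block)
        pca = PCA(n_components=n_components).fit(scaler.transform(block))
        pipes.append((scaler, pca))
    return pipes

def transform_concat(pipes, blocks):
    """Apply the reference-fit pipelines to any split, then concatenate."""
    return np.hstack([
        pca.transform(scaler.transform(b))
        for (scaler, pca), b in zip(pipes, blocks)
    ])
```

Fitting only on the reference split avoids leaking generated-sample statistics into the scaling and projection.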
Increase reference graph count from 512 to 4096 to match original experiment. Fix subsample size to 50% of min subset capped at 2048, consistent with the 2x requirement of PolyGraphDiscrepancyInterval.
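The size calculation reduces to a one-liner (a sketch; my reading of the "2x requirement" is that each subsample must be at most half the smaller split so the interval estimator has headroom):

```python
def subsample_size(n_ref, n_gen, cap=2048):
    # 50% of the smaller split, capped at 2048.
    return min(min(n_ref, n_gen) // 2, cap)
```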
Add submitit launcher config for p.hpcl94g partition with H100 GPUs, enabling faster TabPFN computation for PGD experiments.
Recompute all experiments after fixing:
- KernelLogisticRegression float64 precision
- Ego/proteins unequal graph splits
- TabPFN NaN handling for near-constant features
- The concatenation PCA pipeline
- The GKLR reference graph count and subsample sizes

PGD subsampling: 117/120 results (3 ESGG n=4096 runs infeasible due to dataset size). All values are within bootstrap variance of the paper. Perturbation: 25/25 results, including the ego dataset. Benchmark, concatenation, and GKLR tables: all 16/16 regenerated.
Includes TabPFN v6 classifier updates, plotting and formatting improvements across all reproducibility experiments, and added backoff/tabpfn dependencies.
Regenerate all reproducibility tables and figures using TabPFN weights v2.5 for camera-ready preparation. Add --results-suffix support to 03_model_quality/format.py. Include comparison and merge utility scripts.
Needed for PDF-to-image conversion in diff report generation.
Refactor the VUN metric to support multiprocessing for novelty and validity checks, and add a per-pair SIGALRM timeout on isomorphism to prevent hangs on pathological graph pairs. Extract shared VUN helpers into reproducibility/utils/vun.py for reuse across experiments.
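The per-pair timeout can be sketched with the standard-library `signal` module (a sketch; SIGALRM only works on Unix in the main thread, and the "timed out means not isomorphic" fallback is an assumption about the desired semantics):

```python
import signal

class IsomorphismTimeout(Exception):
    pass

def _on_alarm(signum, frame):
    raise IsomorphismTimeout

def check_with_timeout(is_isomorphic, g1, g2, seconds=5):
    """Run an isomorphism check, aborting pathological pairs after `seconds`."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return is_isomorphic(g1, g2)
    except IsomorphismTimeout:
        # Pathological pair: treat as non-isomorphic rather than hang.
        return False
    finally:
        signal.alarm(0)                      # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

`is_isomorphic` here stands in for whatever checker the experiments use (e.g. a networkx call).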
Replace ad-hoc if/else branching on weights version with a version_map dict that raises on unknown versions instead of silently falling back. Applied consistently across all five compute scripts.
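The pattern is the usual dict-dispatch with a loud failure (the version strings and checkpoint names below are hypothetical placeholders, not the scripts' actual mapping):

```python
# Hypothetical mapping; the real scripts map weight versions to checkpoints.
VERSION_MAP = {
    "v2.0.9": "tabpfn-v2.0.9-classifier.ckpt",
    "v2.5": "tabpfn-v2.5-classifier.ckpt",
}

def resolve_weights(version):
    try:
        return VERSION_MAP[version]
    except KeyError:
        # Fail loudly instead of silently falling back to a default checkpoint.
        raise ValueError(f"Unknown TabPFN weights version: {version!r}") from None
```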
Add dedicated scripts to compute VUN (Valid-Unique-Novel) metrics for denoising-iteration checkpoints and benchmark results. These patch existing result JSONs with VUN values using parallel isomorphism checking.
Add CPU-only (hpcl94c) and GPU (hpcl93) SLURM launcher configs for Hydra multirun. Add experiment 09 that computes train-vs-test reference PGD values to establish metric baselines per dataset.
Add bold/underline formatting for best/second-best values per row in correlation and benchmark tables. Scale correlation values by 100 for readability. Add VUN column support in denoising PGS table. Add subscore ranking in benchmark table. Rename orbit_pgs to orbit4_pgs.
Add a new CLI subcommand for generating perturbation metric-vs-noise figures for a single dataset (e.g. SBM-only plots), supporting both single-perturbation and all-perturbation layouts.
Updated with TabPFN weights v2.5, improved table formatting (bold/underline ranking, values scaled by 100), new SBM perturbation plots, and additional versioned table snapshots for comparison.
Add helper scripts used during the camera-ready recomputation: PGD diff checking, environment validation, pickle inspection, HTML diff report generation, SLURM recompute wrappers, and rerun notes documenting the process.
These are only used by the diff report generator script, not the core library. Move them from top-level pixi.toml dependencies into the dev extras in pyproject.toml so they're pulled in via the existing extras = ["dev", "cluster"] configuration.
These are generated artifacts that should be reproduced from the scripts, not tracked in version control.
Set of improvements for the camera-ready version