greenrace666/protfold

mfold

An attempt to reproduce MiniFold in a compact, script-first form.

This project is a lightweight reimplementation inspired by MiniFold: Simple, Fast, and Accurate Protein Structure Prediction. It uses a pretrained language model and a MiniFold checkpoint to predict protein structures from amino-acid sequences and export them as PDB files.

Note: this repository is a reproduction attempt, not the official MiniFold codebase.


Features

  • Single-sequence structure prediction from a FASTA input file
  • Automatic checkpoint download from Hugging Face on first run
  • CPU-friendly inference with PyTorch
  • PDB export with atom coordinates and residue-level confidence scores
  • Simple workflow: drop in a prot.fasta, run the script, get output.pdb

System Requirements

Runtime

  • Python 3
  • PyTorch
  • huggingface_hub
  • esm

Recommended environment

  • A machine with at least 24 GB of RAM to load the model checkpoint
  • Internet access on first run so the checkpoint can be downloaded

Optional tuning

The script respects these environment variables:

  • MFOLD_NUM_THREADS
  • MFOLD_NUM_INTEROP_THREADS

Example (Windows Command Prompt syntax; on Linux or macOS, use export instead of set):

set MFOLD_NUM_THREADS=8
set MFOLD_NUM_INTEROP_THREADS=1
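For readers wondering how such variables typically reach PyTorch, here is a minimal sketch. It is not taken from ownmfold.py: the helper names read_thread_settings and apply_thread_settings are hypothetical, and the assumption that an unset variable means "keep PyTorch's default" is ours. Only torch.set_num_threads and torch.set_num_interop_threads are real PyTorch calls.

```python
import os


def read_thread_settings(env=None):
    """Return (num_threads, num_interop_threads); None means 'keep the default'.

    Hypothetical helper: the real script may parse these variables differently.
    """
    env = os.environ if env is None else env

    def parse(name):
        raw = env.get(name, "").strip()
        if not raw:
            return None
        value = int(raw)  # raises ValueError on non-numeric input
        if value < 1:
            raise ValueError(f"{name} must be a positive integer, got {raw!r}")
        return value

    return parse("MFOLD_NUM_THREADS"), parse("MFOLD_NUM_INTEROP_THREADS")


def apply_thread_settings(num_threads, num_interop_threads):
    """Pass the parsed values to PyTorch's CPU thread pools."""
    import torch  # imported lazily so the parsing helper works without PyTorch

    if num_threads is not None:
        torch.set_num_threads(num_threads)  # intra-op parallelism
    if num_interop_threads is not None:
        # Must be called before any inter-op parallel work has started.
        torch.set_num_interop_threads(num_interop_threads)
```

Calling apply_thread_settings(*read_thread_settings()) at the very start of a script, before any tensor work, would apply both values.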

Installation

Install the required Python packages with uv:

uv sync

Usage

  1. Create a file named prot.fasta in the project directory.
  2. Put your protein sequence in standard FASTA format.
  3. Run the predictor:
python ownmfold.py

On the first run, the model checkpoint is downloaded automatically.


Input

The script expects a file named prot.fasta in the working directory.

Required format

  • Line 1: FASTA header, for example >protein_name
  • Line 2: the amino-acid sequence

Example:

>example_protein
MKTAYIAKQRQISFVKSHFSRQDILD

Important notes

  • The sequence is read from the second line of prot.fasta
  • Keep the sequence on a single line
  • Use the standard 20 amino-acid letters
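Because the script reads only the second line, a multi-line or mistyped FASTA file can fail in confusing ways. The sketch below is an optional pre-flight check you could run before the predictor. It is not part of this repository: the function names are hypothetical, and rejecting lowercase letters is our assumption about what counts as valid input.

```python
from pathlib import Path

# The 20 standard amino-acid one-letter codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_two_line_fasta(text):
    """Validate header-plus-single-line FASTA text; return (name, sequence)."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("expected a header line followed by a sequence line")

    header = lines[0].strip()
    sequence = lines[1].strip()

    if not header.startswith(">"):
        raise ValueError("line 1 must be a FASTA header starting with '>'")
    if not sequence:
        raise ValueError("line 2 must contain the amino-acid sequence")

    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"non-standard residue letters: {''.join(invalid)}")

    # Anything after line 2 would not be read, so flag wrapped sequences.
    if any(line.strip() for line in lines[2:]):
        raise ValueError("sequence must be on a single line (extra lines found)")

    return header[1:], sequence


def check_prot_fasta(path="prot.fasta"):
    """Read the file from disk and run the same checks."""
    return parse_two_line_fasta(Path(path).read_text())
```

For the example file above, check_prot_fasta() would return the name example_protein together with the 26-residue sequence.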

Output

The script writes output.pdb in the current directory.

Output details

  • Atomic coordinates are written in PDB format
  • Residue confidence is stored in the B-factor column as a pLDDT-style score
  • The output currently uses a single chain A
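To make the B-factor convention concrete, here is a small sketch of how one fixed-width PDB ATOM record carries a confidence value in columns 61-66, and how to read it back. It follows the standard PDB column layout rather than this project's writer: format_atom_line and read_plddt are hypothetical names, the occupancy of 1.00 is an assumption, and the 0-100 scale in the usage note is illustrative only.

```python
def format_atom_line(serial, atom_name, res_name, res_seq,
                     x, y, z, plddt, element, chain_id="A"):
    """Build one fixed-width PDB ATOM record with pLDDT in the B-factor field."""
    # Atom names shorter than 4 characters conventionally start in column 14.
    name_field = f" {atom_name:<3s}" if len(atom_name) < 4 else atom_name
    return "".join([
        "ATOM  ",                      # columns 1-6: record name
        f"{serial:5d}",                # 7-11: atom serial number
        " ",                           # 12: blank
        name_field,                    # 13-16: atom name
        " ",                           # 17: alternate location (unused)
        f"{res_name:>3s}",             # 18-20: residue name
        " ",                           # 21: blank
        chain_id,                      # 22: chain identifier
        f"{res_seq:4d}",               # 23-26: residue sequence number
        " " * 4,                       # 27: insertion code, 28-30: blank
        f"{x:8.3f}{y:8.3f}{z:8.3f}",   # 31-54: coordinates in angstroms
        f"{1.0:6.2f}",                 # 55-60: occupancy (assumed 1.00)
        f"{plddt:6.2f}",               # 61-66: B-factor column, holding pLDDT
        " " * 10,                      # 67-76: blank
        f"{element:>2s}",              # 77-78: element symbol
    ])


def read_plddt(atom_line):
    """Extract the confidence value from the B-factor columns (61-66)."""
    return float(atom_line[60:66])
```

For example, read_plddt(format_atom_line(1, "CA", "MET", 1, 11.104, 6.134, -6.504, 87.5, "C")) recovers the confidence value written for that atom. The same slice can be applied to each ATOM line of output.pdb to collect per-residue scores.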

Project Notes

  • The implementation is intentionally minimal and focused on inference.
  • The checkpoint is loaded from the Hugging Face repository jwohlwend/minifold.
  • The code runs inference with gradient tracking disabled and executes on the CPU by default.

Citation

If you use this project or compare against it, please cite the MiniFold paper:

@article{wohlwend2025minifold,
  title={MiniFold: Simple, Fast, and Accurate Protein Structure Prediction},
  author={Jeremy Wohlwend and Mateo Reveiz and Matt McPartlon and Axel Feldmann and Wengong Jin and Regina Barzilay},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=1p9hQTbjgo},
  note={Featured Certification}
}

License

This project is licensed under the MIT License. See LICENSE for the full text.


Acknowledgements

This project is inspired by the MiniFold paper and its released checkpoint. Thanks to the authors for making the model available.
