Google Colab pipeline to design and evaluate de novo binders against any protein target, integrating P2Rank, RFdiffusion, ProteinMPNN and AlphaFold2.
ColabMiniprot is an end-to-end workflow that:
- Takes a target protein structure (PDB/mmCIF).
- Predicts surface pockets with P2Rank.
- Identifies hotspot residues in a chosen pocket using per-residue SASA.
- Trims the target around those hotspots to define a binding site.
- Uses RFdiffusion (backbones) + ProteinMPNN (sequence design) to generate candidate binders.
- Evaluates target–binder complexes with AlphaFold2 and interface metrics.
All the heavy lifting (installations, paths, file conversions, etc.) is automated inside the Colab notebook, and all outputs are saved to your Google Drive.
The notebook is organized into logical blocks:
- Project setup (Google Drive)
  - Asks for a `project_name`.
  - Creates a working directory in your Drive: `MyDrive/<project_name>/`
  - Subfolders: `01_target`, `02_p2rank`, `03_hotspots`, `04_trimming`, `05_rfdiffusion`, `06_sequences`, `07_alphafold`, `08_metrics`
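The project setup step can be sketched as below. This is a minimal illustration, not the notebook's actual code: `make_project_dirs` and `SUBFOLDERS` are hypothetical names, and the Drive root is passed in as a plain path.

```python
from pathlib import Path

# Subfolder names as listed for the project setup step.
SUBFOLDERS = [
    "01_target", "02_p2rank", "03_hotspots", "04_trimming",
    "05_rfdiffusion", "06_sequences", "07_alphafold", "08_metrics",
]


def make_project_dirs(drive_root, project_name):
    """Create <drive_root>/<project_name>/ and its numbered subfolders."""
    project_dir = Path(drive_root) / project_name
    created = []
    for name in SUBFOLDERS:
        sub = project_dir / name
        sub.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(sub)
    return project_dir, created
```

In Colab, `drive_root` would typically be `/content/drive/MyDrive` once Drive has been mounted with `google.colab.drive.mount("/content/drive")`.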
- Upload target structure 🎯
  - You upload a PDB or (mm)CIF file.
  - The notebook:
    - Copies it into `01_target/`.
    - Converts CIF → PDB if needed.
    - Standardizes the name to `target_input.pdb` (both locally and in Drive).
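A possible shape for the copy/convert step, assuming Biopython handles the CIF → PDB conversion (the notebook may do this differently). `standardize_target` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def standardize_target(src, target_dir):
    """Copy or convert `src` to <target_dir>/target_input.pdb; return that path."""
    src = Path(src)
    out_path = Path(target_dir) / "target_input.pdb"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    suffix = src.suffix.lower()
    if suffix in (".pdb", ".ent"):
        shutil.copyfile(src, out_path)
    elif suffix in (".cif", ".mmcif"):
        # Biopython is imported lazily so PDB-only runs do not need it.
        from Bio.PDB import MMCIFParser, PDBIO

        structure = MMCIFParser(QUIET=True).get_structure("target", str(src))
        writer = PDBIO()
        writer.set_structure(structure)
        writer.save(str(out_path))
    else:
        raise ValueError(f"Unsupported structure format: {src.name}")
    return out_path
```

Note that the PDB format only allows single-character chain IDs, so mmCIF files with longer chain names may need renaming before they can be written out.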
- Pocket prediction with P2Rank 🔍
  - Downloads and installs P2Rank (if not present).
  - Runs P2Rank on `target_input.pdb`.
  - Stores results in `p2rank_output/target_input.pdb_predictions.csv`.
  - Loads and cleans the CSV into a `pandas` DataFrame.
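The "cleaning" mostly means stripping whitespace padding from headers and values. The sketch below uses the standard-library `csv` module to stay dependency-free; with pandas, the equivalent for headers is `df.columns = df.columns.str.strip()`. The column names (`rank`, `score`, `residue_ids`) and the space-separated `CHAIN_RESNUM` residue format are assumptions about the P2Rank output and should be checked against your own file.

```python
import csv


def load_pockets(handle):
    """Parse a P2Rank-style predictions CSV from an open text handle."""
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]  # headers are space-padded
    pockets = []
    for row in reader:
        if not row:
            continue
        record = {key: value.strip() for key, value in zip(header, row)}
        record["rank"] = int(record["rank"])
        record["score"] = float(record["score"])
        # Assumed format: "A_79 A_145 ..." (chain, underscore, residue number).
        record["residue_ids"] = record["residue_ids"].split()
        pockets.append(record)
    return pockets


def residues_for_rank(pockets, pocket_rank):
    """Return the residue IDs of the pocket with the given rank."""
    for pocket in pockets:
        if pocket["rank"] == pocket_rank:
            return pocket["residue_ids"]
    raise KeyError(f"No pocket with rank {pocket_rank}")
```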
- Pocket selection & hotspot detection 💥
  - You choose:
    - `pocket_rank` (e.g. 1 = top-ranked pocket),
    - `sasa_cutoff`,
    - residue filters:
      - `none`
      - `aromatic_only`
      - `hydrophobic_only`
      - `aromatic_or_hydrophobic`
  - The notebook:
    - Extracts `residue_ids` for the chosen pocket.
    - Computes per-residue SASA (Biopython + Shrake–Rupley).
    - Ranks pocket residues by exposure / filters them by chemistry.
    - Prints the top hotspot residues.
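The ranking and filtering logic might look like the following sketch. It takes per-residue SASA values as input (in Biopython these can be obtained with `Bio.PDB.SASA.ShrakeRupley().compute(model, level="R")`, which sets `residue.sasa`). The residue class memberships, the "at or above cutoff" rule, and the names `pick_hotspots`, `AROMATIC`, `HYDROPHOBIC` and `top_n` are assumptions made for illustration.

```python
# Assumed residue classes; the notebook's exact sets may differ.
AROMATIC = {"PHE", "TYR", "TRP", "HIS"}
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PRO"}

FILTERS = {
    "none": lambda resname: True,
    "aromatic_only": lambda resname: resname in AROMATIC,
    "hydrophobic_only": lambda resname: resname in HYDROPHOBIC,
    "aromatic_or_hydrophobic": lambda resname: resname in AROMATIC | HYDROPHOBIC,
}


def pick_hotspots(residues, sasa_cutoff, residue_filter="none", top_n=5):
    """Rank pocket residues by SASA and return labels such as 'A79'.

    residues: iterable of (chain, resnum, resname, sasa) tuples.
    """
    keep = FILTERS[residue_filter]
    selected = [r for r in residues if r[3] >= sasa_cutoff and keep(r[2])]
    selected.sort(key=lambda r: r[3], reverse=True)  # most exposed first
    return [f"{chain}{resnum}" for chain, resnum, _, _ in selected[:top_n]]
```

The returned labels use the same `A79` style as `hotspots_str` in the trimming step.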
- Binding-site trimming ✂️
  - Input:
    - `chain_id` (e.g. `"A"`)
    - `hotspots_str` (e.g. `"A79, A145, A173"`)
    - `flank` (±N residues around min/max hotspot index)
  - The notebook:
    - Parses the hotspot list.
    - Extracts a contiguous sequence window from `target_input.pdb`.
    - Writes `target_trimmed_seqwin.pdb` for downstream modeling.
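A rough sketch of the trimming logic, working directly on PDB text lines with the fixed-column layout (chain ID in column 22, residue number in columns 23-26). `parse_hotspots` and `trim_window` are hypothetical names, and dropping everything except `ATOM` records is an assumption.

```python
import re


def parse_hotspots(hotspots_str):
    """Turn 'A79, A145' into [('A', 79), ('A', 145)]."""
    hotspots = []
    for token in hotspots_str.split(","):
        token = token.strip()
        if not token:
            continue
        match = re.fullmatch(r"([A-Za-z])(-?\d+)", token)
        if match is None:
            raise ValueError(f"Cannot parse hotspot: {token!r}")
        hotspots.append((match.group(1), int(match.group(2))))
    return hotspots


def trim_window(pdb_lines, chain_id, hotspots_str, flank):
    """Keep ATOM records of `chain_id` within min/max hotspot index ± flank."""
    numbers = [num for chain, num in parse_hotspots(hotspots_str) if chain == chain_id]
    if not numbers:
        raise ValueError(f"No hotspots on chain {chain_id}")
    first, last = min(numbers) - flank, max(numbers) + flank
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM") or line[21] != chain_id:
            continue
        if first <= int(line[22:26]) <= last:
            kept.append(line)
    return (first, last), kept
```

For example, hotspots `A79` and `A145` with `flank=5` give the window 74 to 150 on chain A.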
- RFdiffusion setup (binder backbone design)
  - Installs:
    - `RFdiffusion` repo.
    - Required Python packages.
    - Schedules and checkpoints.
  - Downloads AlphaFold2 parameters into `params/`.
  - Prepares everything to:
    - Generate de novo binder backbones.
    - Or use existing backbone batches from Drive (e.g. `MyDrive/RFdiffusion_CDA/`).
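For orientation, a binder-design run with the upstream RFdiffusion script takes the target window and hotspots as Hydra-style overrides. The sketch below only builds the argument list; it assumes the upstream `scripts/run_inference.py` entry point, and the notebook may call RFdiffusion through a different wrapper. `build_rfdiffusion_args` and its default values (binder length 70-100, 10 designs) are illustrative choices, not the notebook's settings.

```python
def build_rfdiffusion_args(
    target_pdb,
    chain_id,
    first_res,
    last_res,
    hotspots,
    binder_len=(70, 100),
    num_designs=10,
    output_prefix="outputs/binder",
):
    """Build an argument list for RFdiffusion's run_inference.py (binder mode)."""
    min_len, max_len = binder_len
    # "<target segment>/0 <binder length range>": '/0' marks a chain break.
    contig = f"{chain_id}{first_res}-{last_res}/0 {min_len}-{max_len}"
    return [
        "python",
        "scripts/run_inference.py",
        f"inference.input_pdb={target_pdb}",
        f"inference.output_prefix={output_prefix}",
        f"inference.num_designs={num_designs}",
        f"contigmap.contigs=[{contig}]",
        f"ppi.hotspot_res=[{','.join(hotspots)}]",
    ]
```

Passing the list to `subprocess.run(args, check=True)` from inside the cloned repository avoids shell quoting problems with the bracketed overrides.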
- Backbone discovery & configuration
  - Automatically scans common locations in Drive and local paths for PDB backbones:
    - `MyDrive/RFdiffusion_CDA/*.pdb`
    - `MyDrive/RFdiffusion_CDA/CDA_run_b*/*.pdb`
    - `outputs/*.pdb`, etc.
  - Chooses a default `BACKBONE_GLOB` pattern (first pattern with hits).
  - Prints examples of detected backbone files.
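The "first pattern with hits" rule can be sketched with the standard `glob` module. `choose_backbone_glob` is a hypothetical helper; the patterns passed in would be the candidate locations listed above.

```python
import glob


def choose_backbone_glob(patterns):
    """Return (pattern, files) for the first glob pattern that matches anything."""
    for pattern in patterns:
        hits = sorted(glob.glob(pattern))
        if hits:
            return pattern, hits
    return None, []  # nothing found in any candidate location
```

The returned pattern would become the default `BACKBONE_GLOB`, and the first few entries of the file list can be printed as examples.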
- Sequence design with ProteinMPNN + AF2 prediction 🧬 → 🧩
  - For each backbone:
    - Runs ProteinMPNN to generate multiple binder sequences (`NUM_SEQS`).
    - Optionally uses an initial guess sequence.
    - Removes specific amino acids from the design alphabet (e.g. `RM_AA="C"`).
  - For each designed sequence:
    - Builds a target–binder complex model using AlphaFold2:
      - Monomer or Multimer mode (`USE_MULTIMER`).
      - Configurable `NUM_RECYCLES`.
  - Saves outputs (PDBs, JSONs…) under:
    - `06_sequences/`
    - `07_alphafold/`
- Visualization & interactive selection 🧿
  - Uses py3Dmol to:
    - Visualize target + binder.
    - Switch between best designs via dropdown.
  - Loads best-scoring models (e.g., `best_designX.pdb`).
- Scoring, metrics & downloads 📊📦
  - Additional packages:
    - `biopython`
    - `freesasa`
    - `mdanalysis`
    - `pandas`, `numpy`
    - (optionally) `pyrosetta` via `pyrosetta-help`
  - Can compute:
    - Interface / buried SASA.
    - Contact counts.
    - Simple quality metrics.
    - Optional PyRosetta-based energy terms.
  - Packs results:
    - Zips all outputs into `<run_name>.result.zip`.
    - Provides a direct download via Colab.
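As an illustration of the contact-count metric, the sketch below counts residue pairs across two chains that have any pair of non-hydrogen atoms within a distance cutoff. It parses PDB lines directly and uses a brute-force loop, which is fine for single complexes. The 4.0 Å default and the function names are assumptions; the notebook may define contacts differently.

```python
import math


def parse_atoms(pdb_lines):
    """Return (chain, resseq, x, y, z) for each non-hydrogen ATOM record."""
    atoms = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        element = line[76:78].strip() if len(line) >= 78 else ""
        if element == "H":
            continue  # skip hydrogens when the element column is present
        atoms.append((
            line[21],
            int(line[22:26]),
            float(line[30:38]),
            float(line[38:46]),
            float(line[46:54]),
        ))
    return atoms


def count_contacts(pdb_lines, chain_a, chain_b, cutoff=4.0):
    """Count residue pairs (one per chain) with any atoms within `cutoff` Å."""
    atoms = parse_atoms(pdb_lines)
    side_a = [a for a in atoms if a[0] == chain_a]
    side_b = [a for a in atoms if a[0] == chain_b]
    pairs = set()
    for _, res_a, *xyz_a in side_a:
        for _, res_b, *xyz_b in side_b:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                pairs.add((res_a, res_b))
    return len(pairs)
```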
- Open the notebook in Google Colab (GPU runtime recommended).
- Run the cells from top to bottom, in order:
  - Define `project_name` and create folders.
  - Upload your target structure.
  - Run pocket prediction with P2Rank.
  - Select a pocket and compute hotspots.
  - Trim the binding site.
  - Set up RFdiffusion (and/or provide backbones).
  - Run ProteinMPNN + AlphaFold2.
  - Inspect, score and download designs.
- Check your Google Drive:
  - All intermediate files and results are organized in `MyDrive/<project_name>/01_target ... 08_metrics`.
Because everything runs inside Colab, you don’t need to pre-install tools locally. You only need:
- A Google account and Google Drive.
- Colab runtime with:
  - Python 3
  - GPU (recommended for RFdiffusion & AlphaFold2).
The notebook itself takes care of:
- Installing P2Rank.
- Cloning and configuring RFdiffusion.
- Installing ProteinMPNN, `biopython`, `freesasa`, `mdanalysis`, etc.
- Downloading AlphaFold2 parameters.
MyDrive/
└── <project_name>/
    ├── 01_target/
    ├── 02_p2rank/
    ├── 03_hotspots/
    ├── 04_trimming/
    ├── 05_rfdiffusion/
    ├── 06_sequences/
    ├── 07_alphafold/
    └── 08_metrics/