ColabDNMP v1.0

Google Colab pipeline to design and evaluate de novo binders against any protein target, integrating P2Rank, RFdiffusion, ProteinMPNN and AlphaFold2.

🧠 What is ColabDNMP?

ColabDNMP is an end-to-end workflow that:

Takes a target protein structure (PDB/mmCIF).
Predicts surface pockets with P2Rank.
Identifies hotspot residues in a chosen pocket using per-residue SASA.
Trims the target around those hotspots to define a binding site.
Uses RFdiffusion (backbones) + ProteinMPNN (sequence design) to generate candidate binders.
Evaluates target–binder complexes with AlphaFold2 and interface metrics.

The Colab notebook automates installation, path handling and file conversion, and writes all outputs to your Google Drive.

🧬 Pipeline overview

The notebook is organized into logical blocks:

Project setup (Google Drive)
- Asks for a project_name.
- Creates a working directory in your Drive:
  MyDrive/<project_name>/
- Subfolders:
  - 01_target
  - 02_p2rank
  - 03_hotspots
  - 04_trimming
  - 05_rfdiffusion
  - 06_sequences
  - 07_alphafold
  - 08_metrics
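
A minimal sketch of how the folder layout above could be created. The function name `create_project_dirs` and the `drive_root` argument are illustrative assumptions, not the notebook's actual code; in Colab, `drive_root` would typically be `/content/drive/MyDrive` after mounting Drive.

```python
from pathlib import Path

# Subfolder names taken from the list above.
SUBFOLDERS = [
    "01_target", "02_p2rank", "03_hotspots", "04_trimming",
    "05_rfdiffusion", "06_sequences", "07_alphafold", "08_metrics",
]


def create_project_dirs(drive_root, project_name):
    """Create <drive_root>/<project_name>/ and its numbered subfolders.

    Returns a dict mapping each subfolder name to its Path.
    Hypothetical helper: the notebook may organize this differently.
    """
    project_dir = Path(drive_root) / project_name
    paths = {}
    for name in SUBFOLDERS:
        path = project_dir / name
        # exist_ok=True makes the cell safe to re-run on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        paths[name] = path
    return paths
```

Because the directories are created with `exist_ok=True`, re-running the setup cell on an existing project leaves its contents untouched.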
Upload target structure 🎯
- You upload a PDB or (mm)CIF file.
- The notebook:
  - Copies it into 01_target/.
  - Converts CIF → PDB if needed.
  - Standardizes the name to:
    target_input.pdb (both locally and in Drive).
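
The copy-and-convert step might look roughly like the sketch below. It assumes Biopython handles the CIF → PDB conversion (the README names Biopython only for the SASA step, so this is a guess), and `standardize_target` is a hypothetical helper name.

```python
import shutil
from pathlib import Path

CIF_SUFFIXES = {".cif", ".mmcif"}


def standardize_target(uploaded_path, target_dir):
    """Store an uploaded structure as <target_dir>/target_input.pdb.

    CIF files are converted with Biopython (assumed here); PDB files are
    copied unchanged. Returns the path of the standardized file.
    """
    uploaded_path = Path(uploaded_path)
    out_path = Path(target_dir) / "target_input.pdb"
    out_path.parent.mkdir(parents=True, exist_ok=True)

    if uploaded_path.suffix.lower() in CIF_SUFFIXES:
        # Imported lazily so the PDB branch works without Biopython.
        from Bio.PDB import MMCIFParser, PDBIO

        structure = MMCIFParser(QUIET=True).get_structure(
            "target", str(uploaded_path)
        )
        writer = PDBIO()
        writer.set_structure(structure)
        writer.save(str(out_path))
    elif uploaded_path.resolve() != out_path.resolve():
        shutil.copyfile(uploaded_path, out_path)
    return out_path
```

Note that the PDB format only supports single-character chain IDs, so mmCIF files with longer chain names may need their chains renamed before conversion.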
Pocket prediction with P2Rank 🔍
- Downloads and installs P2Rank (if not present).
- Runs P2Rank on target_input.pdb.
- Stores results in p2rank_output/target_input.pdb_predictions.csv.
- Loads and cleans the CSV into a pandas DataFrame.
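
As an illustration of the loading and cleaning step, the sketch below assumes the usual P2Rank CSV layout: space-padded column names and values, a `rank` column, and a `residue_ids` column holding space-separated `<chain>_<number>` entries. The function names are hypothetical, and the column layout should be checked against your own P2Rank output.

```python
import pandas as pd


def load_p2rank_predictions(csv_source):
    """Read a P2Rank predictions CSV and strip its padding whitespace."""
    # skipinitialspace drops the spaces that follow each comma.
    df = pd.read_csv(csv_source, skipinitialspace=True)
    # Trailing spaces can still remain in headers and string cells.
    df.columns = df.columns.str.strip()
    for col in ("name", "residue_ids"):
        df[col] = df[col].astype(str).str.strip()
    return df.sort_values("rank").reset_index(drop=True)


def pocket_residues(df, pocket_rank):
    """Return the residue IDs (e.g. 'A_79') of the pocket with this rank."""
    rows = df.loc[df["rank"] == pocket_rank]
    if rows.empty:
        raise ValueError(f"No pocket with rank {pocket_rank}")
    return rows.iloc[0]["residue_ids"].split()
```

With the cleaned table, `pocket_residues(df, 1)` gives the residue list of the top-ranked pocket, which is the input to the hotspot step.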
Pocket selection & hotspot detection 💥
- You choose:
  - pocket_rank (e.g. 1 = top-ranked pocket),
  - sasa_cutoff,
  - residue filters:
    - none
    - aromatic_only
    - hydrophobic_only
    - aromatic_or_hydrophobic
- The notebook:
  - Extracts residue_ids for the chosen pocket.
  - Computes per-residue SASA (Biopython, Shrake–Rupley algorithm).
  - Ranks pocket residues by solvent exposure, applying sasa_cutoff and the selected residue filter.
  - Prints the top hotspot residues.
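
The selection logic can be sketched as follows. The residue classes (which amino acids count as aromatic or hydrophobic), the "SASA at or above the cutoff" rule, and all function names are assumptions made for illustration; only the use of Biopython's Shrake–Rupley implementation comes from the description above.

```python
# Assumed residue classes; the notebook may define them differently.
AROMATIC = {"PHE", "TYR", "TRP", "HIS"}
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PRO"}

FILTERS = {
    "none": lambda resname: True,
    "aromatic_only": lambda resname: resname in AROMATIC,
    "hydrophobic_only": lambda resname: resname in HYDROPHOBIC,
    "aromatic_or_hydrophobic":
        lambda resname: resname in AROMATIC | HYDROPHOBIC,
}


def residue_sasa(pdb_path):
    """Return {(chain, resnum): (resname, sasa)} for the first model."""
    # Lazy import: only this function needs Biopython.
    from Bio.PDB import PDBParser
    from Bio.PDB.SASA import ShrakeRupley

    model = PDBParser(QUIET=True).get_structure("target", pdb_path)[0]
    ShrakeRupley().compute(model, level="R")  # sets residue.sasa
    return {
        (res.get_parent().id, res.id[1]): (res.get_resname(), res.sasa)
        for res in model.get_residues()
    }


def rank_hotspots(sasa_by_residue, pocket_ids, sasa_cutoff,
                  residue_filter="none", top_n=5):
    """Keep exposed pocket residues passing the filter, most exposed first."""
    keep = FILTERS[residue_filter]
    hits = []
    for key in pocket_ids:
        if key not in sasa_by_residue:
            continue
        resname, sasa = sasa_by_residue[key]
        if sasa >= sasa_cutoff and keep(resname):
            hits.append((key, resname, sasa))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits[:top_n]
```

Here `pocket_ids` is a list of `(chain, residue_number)` tuples for the chosen pocket, and `residue_sasa("target_input.pdb")` supplies the exposure values.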
Binding-site trimming ✂️
- Input:
  - chain_id (e.g. "A")
  - hotspots_str (e.g. "A79, A145, A173")
  - flank (number of residues N to keep before the lowest and after the highest hotspot residue number)
- The notebook:
  - Parses the hotspot list.
  - Extracts a contiguous sequence window from target_input.pdb.
  - Writes target_trimmed_seqwin.pdb for downstream modeling.
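
A dependency-free sketch of the trimming logic, working directly on the fixed-width columns of PDB `ATOM` records. The function names are hypothetical, and the notebook may use Biopython for this step instead.

```python
import re


def parse_hotspots(hotspots_str):
    """Turn 'A79, A145' into [('A', 79), ('A', 145)]."""
    hotspots = []
    for token in hotspots_str.split(","):
        token = token.strip()
        if not token:
            continue
        match = re.fullmatch(r"([A-Za-z])(-?\d+)", token)
        if match is None:
            raise ValueError(f"Cannot parse hotspot {token!r}")
        hotspots.append((match.group(1), int(match.group(2))))
    return hotspots


def trim_window(pdb_lines, chain_id, hotspots_str, flank):
    """Keep ATOM records of chain_id within [min - flank, max + flank]."""
    numbers = [num for chain, num in parse_hotspots(hotspots_str)
               if chain == chain_id]
    if not numbers:
        raise ValueError(f"No hotspots given for chain {chain_id!r}")
    start, end = min(numbers) - flank, max(numbers) + flank

    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # PDB columns: chain ID at 22, residue number at 23-26 (1-based).
        if line[21] == chain_id and start <= int(line[22:26]) <= end:
            kept.append(line)
    return kept
```

For `hotspots_str = "A79, A145, A173"` and `flank = 5`, the window covers residues 74 to 178 of chain A; writing the returned lines followed by `END` gives a file in the role of target_trimmed_seqwin.pdb.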
RFdiffusion setup (binder backbone design)
- Installs:
  - RFdiffusion repo.
  - Required Python packages.
  - Schedules and checkpoints.
- Downloads AlphaFold2 parameters into params/.
- Prepares everything to:
  - Generate de novo binder backbones.
  - Or use existing backbone batches from Drive (e.g. MyDrive/RFdiffusion_CDA/).
Backbone discovery & configuration
- Automatically scans common locations in Drive and local paths for PDB backbones:
  - MyDrive/RFdiffusion_CDA/*.pdb
  - MyDrive/RFdiffusion_CDA/CDA_run_b*/*.pdb
  - outputs/*.pdb, etc.
- Chooses a default BACKBONE_GLOB pattern (first pattern with hits).
- Prints examples of detected backbone files.
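
The "first pattern with hits" rule can be expressed in a few lines. This is a sketch under assumptions: `choose_backbone_glob` is a hypothetical name, and the exact list and order of patterns the notebook scans are not reproduced here.

```python
import glob


def choose_backbone_glob(patterns):
    """Return (pattern, files) for the first glob pattern that matches.

    Patterns are tried in order; (None, []) means nothing was found.
    """
    for pattern in patterns:
        hits = sorted(glob.glob(pattern))
        if hits:
            return pattern, hits
    return None, []


# Example call (paths assume Drive is mounted at /content/drive):
# BACKBONE_GLOB, files = choose_backbone_glob([
#     "/content/drive/MyDrive/RFdiffusion_CDA/*.pdb",
#     "/content/drive/MyDrive/RFdiffusion_CDA/CDA_run_b*/*.pdb",
#     "outputs/*.pdb",
# ])
```

Ordering the patterns from most to least specific keeps the default predictable when backbones exist in several locations.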
Sequence design with ProteinMPNN + AF2 prediction 🧬 → 🧩
- For each backbone:
  - Runs ProteinMPNN to generate multiple binder sequences (NUM_SEQS).
  - Optionally uses an initial guess sequence.
  - Removes specific amino acids from the design alphabet (e.g. RM_AA="C").
- For each designed sequence:
  - Builds a target–binder complex model using AlphaFold2:
    - Monomer or Multimer mode (USE_MULTIMER).
    - Configurable NUM_RECYCLES.
- Saves outputs (PDBs, JSONs…) under:
  - 06_sequences/
  - 07_alphafold/
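
To make the effect of RM_AA concrete, the sketch below derives the remaining design alphabet and checks a designed sequence against it. It is only a post-hoc sanity check written for illustration: it assumes RM_AA is a comma-separated string of one-letter codes, and it does not reproduce how ProteinMPNN itself excludes residues during sampling.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def design_alphabet(rm_aa=""):
    """Return the one-letter codes left after removing those in rm_aa.

    rm_aa is assumed to be comma-separated, e.g. "C" or "C,M".
    """
    removed = {aa.strip().upper() for aa in rm_aa.split(",") if aa.strip()}
    unknown = removed - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"Unknown amino acid codes: {sorted(unknown)}")
    return "".join(aa for aa in AMINO_ACIDS if aa not in removed)


def alphabet_violations(sequence, rm_aa):
    """List (position, residue) pairs, 1-based, that use a removed residue."""
    allowed = set(design_alphabet(rm_aa))
    return [(i + 1, aa) for i, aa in enumerate(sequence.upper())
            if aa not in allowed]
```

An empty list from `alphabet_violations` means the designed sequence respects the restricted alphabet.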
Visualization & interactive selection 🧿
- Uses py3Dmol to:
  - Visualize target + binder.
  - Switch between best designs via dropdown.
- Loads best-scoring models (e.g., best_designX.pdb).
Scoring, metrics & downloads 📊📦
- Additional packages:
  - biopython
  - freesasa
  - mdanalysis
  - pandas, numpy
  - (optionally) pyrosetta via pyrosetta-help
- Can compute:
  - Interface / buried SASA.
  - Contact counts.
  - Simple quality metrics.
  - Optional PyRosetta-based energy terms.
- Packs results:
  - Zips all outputs into <run_name>.result.zip.
  - Provides a direct download via Colab.
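
As an example of the simplest metric listed above, a contact count between target and binder can be computed from atomic coordinates alone. The sketch below counts residue pairs with any two atoms within a distance cutoff; the 5.0 Å default, the chain IDs and the function names are illustrative assumptions rather than the notebook's settings.

```python
import math


def read_atoms(pdb_lines):
    """Return (chain, residue_number, (x, y, z)) for every ATOM record."""
    atoms = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: chain 22, residue 23-26, x/y/z 31-54 (1-based).
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((line[21], int(line[22:26]), xyz))
    return atoms


def count_interface_contacts(pdb_lines, target_chain, binder_chain,
                             cutoff=5.0):
    """Count residue pairs across the interface within `cutoff` angstroms."""
    atoms = read_atoms(pdb_lines)
    target = [a for a in atoms if a[0] == target_chain]
    binder = [a for a in atoms if a[0] == binder_chain]

    contacts = set()
    # Brute-force all-pairs loop: fine for a sketch, slow for large complexes.
    for _, t_res, t_xyz in target:
        for _, b_res, b_xyz in binder:
            if math.dist(t_xyz, b_xyz) <= cutoff:
                contacts.add((t_res, b_res))
    return len(contacts)
```

For large complexes, a neighbor search (for example Biopython's `NeighborSearch`) would be a faster replacement for the nested loop.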

🚀 How to use

Open the notebook in Google Colab (GPU runtime recommended).
Run the cells from top to bottom, in order:
- Define project_name and create folders.
- Upload your target structure.
- Run pocket prediction with P2Rank.
- Select a pocket and compute hotspots.
- Trim the binding site.
- Set up RFdiffusion (and/or provide backbones).
- Run ProteinMPNN + AlphaFold2.
- Inspect, score and download designs.
Check your Google Drive:
- All intermediate files and results are stored under MyDrive/<project_name>/, in the folders 01_target through 08_metrics.

📦 Requirements

Because everything runs inside Colab, you don’t need to pre-install tools locally. You only need:

A Google account and Google Drive.
Colab runtime with:
- Python 3
- GPU (recommended for RFdiffusion & AlphaFold2).

The notebook itself takes care of:

Installing P2Rank.
Cloning and configuring RFdiffusion.
Installing ProteinMPNN, biopython, freesasa, mdanalysis, etc.
Downloading AlphaFold2 parameters.

📁 Folder structure (in Drive)

MyDrive/
└── <project_name>/
    ├── 01_target/
    ├── 02_p2rank/
    ├── 03_hotspots/
    ├── 04_trimming/
    ├── 05_rfdiffusion/
    ├── 06_sequences/
    ├── 07_alphafold/
    └── 08_metrics/
