AI-powered professor discovery for PhD applicants
Find the right professors to cold-email — automatically.
You tell it a region, a school, and your research interests. It does the rest:
- Fetches the full faculty list from CSRankings (live data, 7 regions supported)
- Scrapes every professor's homepage — in parallel
- Summarizes each professor's research with an LLM
- Generates a ranked list of professors you should actually email, with a personalized tip for each
📚 Found 87 professors (Carnegie Mellon University)
⚡ Processing in parallel (HTTP×20 / LLM×8)
✅ [ 1/50] 2.0% Yonatan Bisk
✅ [ 2/50] 4.0% Graham Neubig
✅ [ 3/50] 6.0% Maarten Sap
...
### 1. Graham Neubig
**Research focus**: Multilingual NLP, low-resource languages, code generation with LLMs
**Why it fits**: Directly works on LLM-based code generation and multilingual transfer — exact overlap with your stated interests
**Email tip**: Mention his EMNLP 2024 paper on cross-lingual prompting; ask about the xCodeEval benchmark
git clone https://github.com/yourname/ProfRadar.git
cd ProfRadar
pip install -r requirements.txt
cp .env.example .env
# Add your API key (OpenAI / Claude / Gemini / Azure)
python main.py --limit 50
Your choices are remembered: on the next run, just press Enter to reuse the same region, school, and interests.
Requirements: Python 3.11+
pip install -r requirements.txt
Dependencies:
| Package | Purpose |
|---|---|
| `aiohttp` | Parallel homepage scraping |
| `beautifulsoup4` + `lxml` | HTML parsing |
| `openai` / `anthropic` / `google-genai` | LLM providers |
| `python-dotenv` | API key management |
Copy .env.example to .env and fill in one or more API keys:
# OpenAI
OPENAI_API_KEY=sk-...
# Anthropic (Claude)
ANTHROPIC_API_KEY=sk-ant-...
# Google Gemini
GEMINI_API_KEY=AIza...
# Azure OpenAI
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://YOUR_RESOURCE.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-01
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
# Default provider
DEFAULT_PROVIDER=openai
# Agent provider, used by `python run_agent.py`
AGENT_PROVIDER=deepseek
SCREENING_MODEL_PROVIDER=deepseek
DEEPSEEK_API_KEY=...
DEEPSEEK_MODEL=deepseek-v4-flash
DEEPSEEK_BASE_URL=https://api.deepseek.com
# Or use Codex agent mode after `codex login`
CODEX_MODEL=gpt-5.4
CODEX_BASE_URL=https://chatgpt.com/backend-api/codex
# Basic (uses DEFAULT_PROVIDER from .env)
python main.py
# Limit to 50 professors (recommended for first run)
python main.py --limit 50
# Choose provider and model
python main.py --provider openai --model gpt-4o-mini --limit 50
python main.py --provider anthropic --limit 50
python main.py --provider gemini --model gemini-2.0-flash --limit 50
python main.py --provider azure --limit 50
# Refresh CSRankings data
python main.py --update
# Clear saved school/interests (re-enter everything)
python main.py --reset
| Provider | Recommended Model | Notes |
|---|---|---|
| OpenAI | gpt-4o-mini | Best cost/quality balance |
| Anthropic | claude-opus-4-6 | Highest quality summaries |
| Google Gemini | gemini-2.0-flash | Fast and cheap |
| Azure OpenAI | your deployment | Enterprise/private deployments |
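As a rough illustration of how the flags and `.env` settings above could fit together, here is a minimal sketch using `argparse`. The flag names and environment variable names come from this README; the function names (`build_parser`, `check_api_key`) and the fallback to `openai` are assumptions for illustration and may not match what `main.py` actually does.

```python
import argparse
import os

# Environment variable each provider needs; names taken from the .env example above.
REQUIRED_KEY = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "azure": "AZURE_OPENAI_API_KEY",
}


def build_parser(env=None):
    """Build a parser for the documented flags (hypothetical reconstruction)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="main.py")
    # --provider falls back to DEFAULT_PROVIDER; "openai" as last resort is an assumption.
    parser.add_argument("--provider", choices=sorted(REQUIRED_KEY),
                        default=env.get("DEFAULT_PROVIDER", "openai"))
    parser.add_argument("--model", default=None, help="provider-specific model name")
    parser.add_argument("--limit", type=int, default=None, help="max professors to process")
    parser.add_argument("--update", action="store_true", help="refresh CSRankings data")
    parser.add_argument("--reset", action="store_true", help="clear saved choices")
    return parser


def check_api_key(provider, env=None):
    """Return the name of the missing key, or None if the provider is usable."""
    env = os.environ if env is None else env
    key = REQUIRED_KEY[provider]
    return None if env.get(key) else key


if __name__ == "__main__":
    args = build_parser({"DEFAULT_PROVIDER": "anthropic"}).parse_args(["--limit", "50"])
    print(args.provider, args.limit)
```

Checking for the key before any scraping starts means a missing credential fails fast instead of after the homepages have already been fetched.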
All results are saved under output/{school}/:
output/
└── Carnegie_Mellon_University/
├── Graham_Neubig.md # Individual professor profile
├── Yonatan_Bisk.md
├── ...
├── index.md # Full school directory (table)
└── match_report_2026-04-14.md # Your personalized recommendations
Each professor profile contains:
- Research areas (from CSRankings publication data)
- 60-word quick summary (used for matching)
- 200-word full research profile
The match report ranks the top 8 professors by fit and gives you a concrete cold-email tip for each.
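The file layout above suggests a simple naming rule: spaces become underscores, and the match report carries an ISO date. The sketch below reproduces that rule. The helper names (`slug`, `profile_path`, `match_report_path`) are hypothetical, and the real tool may sanitize other characters differently.

```python
from datetime import date
from pathlib import Path


def slug(name: str) -> str:
    """Replace whitespace runs with single underscores, as in the tree above."""
    return "_".join(name.split())


def profile_path(school: str, professor: str, root: str = "output") -> Path:
    """Path of one professor's profile, e.g. output/<school>/<professor>.md."""
    return Path(root) / slug(school) / f"{slug(professor)}.md"


def match_report_path(school: str, day: date, root: str = "output") -> Path:
    """Dated report path, so a re-run on another day does not overwrite it."""
    return Path(root) / slug(school) / f"match_report_{day.isoformat()}.md"


if __name__ == "__main__":
    print(profile_path("Carnegie Mellon University", "Graham Neubig").as_posix())
    print(match_report_path("Carnegie Mellon University", date(2026, 4, 14)).as_posix())
```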
CSRankings GitHub ──► data_manager.py ──► professor list
│
┌───────────────┤
│ asyncio.gather (parallel)
│
┌─────────▼──────────┐
│ aiohttp fetch │ ← up to 20 concurrent
│ homepage HTML │
└─────────┬──────────┘
│
┌─────────▼──────────┐
│ LLM summarize │ ← up to 8 concurrent
│ 60w + 200w │
└─────────┬──────────┘
│
┌─────────▼──────────┐
│ write .md files │
└─────────┬──────────┘
│
┌─────────▼──────────┐
│ matcher: rank │ ← 1 LLM call with all summaries
│ top 8 by fit │
└────────────────────┘
Fetching and summarizing run concurrently, capped at 20 HTTP requests and 8 LLM calls in flight, so 50 professors typically finish in 2–4 minutes.
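The two concurrency caps in the diagram can be sketched with a pair of `asyncio.Semaphore` objects around `asyncio.gather`. This is a minimal sketch, not the code in `professor_pipeline.py`: `fetch_homepage` and `summarize` are stand-ins that sleep instead of calling `aiohttp` or an LLM, and all function names here are assumptions.

```python
import asyncio

HTTP_LIMIT = 20  # matches "HTTP×20" in the progress output
LLM_LIMIT = 8    # matches "LLM×8"


async def fetch_homepage(name: str) -> str:
    # Stand-in for an aiohttp request; sleeps to simulate network latency.
    await asyncio.sleep(0.01)
    return f"<html>{name}</html>"


async def summarize(html: str) -> str:
    # Stand-in for an LLM call.
    await asyncio.sleep(0.01)
    return f"summary of {len(html)} chars"


async def process(name, http_sem, llm_sem):
    async with http_sem:  # at most HTTP_LIMIT fetches in flight
        html = await fetch_homepage(name)
    async with llm_sem:   # at most LLM_LIMIT summaries in flight
        summary = await summarize(html)
    return name, summary


async def run_pipeline(names):
    http_sem = asyncio.Semaphore(HTTP_LIMIT)
    llm_sem = asyncio.Semaphore(LLM_LIMIT)
    tasks = [process(n, http_sem, llm_sem) for n in names]
    return await asyncio.gather(*tasks)  # results keep input order


if __name__ == "__main__":
    results = asyncio.run(run_pipeline(["Yonatan Bisk", "Graham Neubig", "Maarten Sap"]))
    for name, summary in results:
        print(name, "->", summary)
```

Using separate semaphores lets a professor's summary start as soon as that homepage arrives, rather than waiting for every fetch to finish first.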
- Start with `--limit 50` to keep costs and time manageable. A full department (100+ professors) can be processed in one shot once you know the tool works.
- Re-run freely: existing `.md` files are overwritten, but the match report gets a date stamp, so old reports are preserved.
- Switch schools without losing interests: just press `c` at the school prompt, and your interests stay saved.
- gpt-4o-mini is the cheapest option and works well for the summarization step. The final matching call is just one request regardless of professor count.
ProfRadar/
├── main.py # Entry point & orchestration
├── data_manager.py # CSRankings data fetching & caching
├── school_selector.py # Interactive school picker
├── professor_pipeline.py # Async parallel scrape + summarize + write
├── matcher.py # Final LLM matching & ranking
├── providers/
│ ├── base.py # LLMProvider interface
│ ├── anthropic_provider.py
│ ├── openai_provider.py
│ ├── azure_openai_provider.py
│ └── gemini_provider.py
├── data/ # CSRankings CSV cache (auto-created)
├── output/ # Generated profiles & reports (auto-created)
├── requirements.txt
└── .env.example
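The `providers/` package implies a shared interface that each backend implements. The sketch below shows one way such an interface could look. The method name `complete`, the `EchoProvider` class, and the prompt text are all hypothetical, and the actual `LLMProvider` in `providers/base.py` may differ (for example, it may be async).

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Hypothetical interface; the real one in providers/base.py may differ."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Return the model's text response to a prompt."""


class EchoProvider(LLMProvider):
    """Offline stand-in for exercising the pipeline without API keys."""

    def complete(self, prompt: str) -> str:
        # Returns the first 60 characters of the prompt instead of calling a model.
        return prompt[:60]


def summarize(provider: LLMProvider, page_text: str) -> str:
    # The caller depends only on the interface, so backends are swappable.
    return provider.complete(f"Summarize this research page: {page_text}")


if __name__ == "__main__":
    print(summarize(EchoProvider(), "Multilingual NLP and code generation"))
```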
| Region | Country Code |
|---|---|
| United States | US |
| China | CN |
| United Kingdom | GB |
| Canada | CA |
| Australia | AU |
| Switzerland | CH |
| Singapore | SG |
Region is selected interactively at startup and remembered for future runs.
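One plausible way to remember choices between runs is a small JSON file, with an empty answer at the prompt meaning "reuse the saved value". The sketch below assumes that approach. The function names and the storage format are assumptions for illustration, not a description of how `school_selector.py` stores its state.

```python
import json
from pathlib import Path


def load_choices(path: Path) -> dict:
    """Return saved choices, or an empty dict if nothing usable was saved."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_choices(path: Path, choices: dict) -> None:
    """Persist the current choices for the next run."""
    path.write_text(json.dumps(choices, indent=2), encoding="utf-8")


def resolve(answer: str, saved: dict, key: str) -> str:
    """Empty input (just pressing Enter) reuses the saved value for `key`."""
    answer = answer.strip()
    return answer if answer else saved.get(key, "")
```

With this shape, `--reset` would amount to deleting the file, after which `load_choices` returns an empty dict and every prompt must be answered again.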
Faculty data comes from CSRankings by Emery Berger, a widely used metrics-based ranking of CS research institutions. The raw CSV files are fetched directly from the CSRankings GitHub repository and cached locally. Run `python main.py --update` to refresh.
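The fetch-once, refresh-on-demand behavior described above can be sketched as follows. The download itself is passed in as a `fetch` callable so the example stays offline. `load_cached` and its parameters are hypothetical names, and `data_manager.py` may organize this differently.

```python
from pathlib import Path
from typing import Callable


def load_cached(cache_file: Path, fetch: Callable[[], str], update: bool = False) -> str:
    """Return cached text, downloading only if missing or if update is requested."""
    if update or not cache_file.exists():
        # Create the cache directory on first use (the README notes data/ is auto-created).
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(fetch(), encoding="utf-8")
    return cache_file.read_text(encoding="utf-8")
```

Here `update=True` plays the role of the `--update` flag: it forces a new download even when a cached copy exists.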
PRs welcome. Some ideas:
- Cache scraped homepages to avoid re-fetching on re-runs
- Filter professors by CSRankings area before scraping (e.g., only ML/NLP profs)
- Export match report as PDF
- Support non-US schools (done: US, China, UK, Canada, Australia, Switzerland, Singapore)
- Add a web UI
MIT