Awesome AI Models Matrix 🧠

License: CC BY-NC 4.0

Research-based list of AI models, development tools, and automation resources. Use it to compare releases, pricing, benchmarks, and deployment options from official sources.

Document Version: 3.0 Last Updated: 2026-05-04 21:38 UTC Repository: https://github.com/ReadyPixels/AI_Models_Matrix

Contents

Models 🧠

Comprehensive documentation of Large Language Models (LLMs), Small Language Models (SLMs), and specialized AI models available today.

Frontier Models 🚀

State-of-the-art proprietary AI models with cutting-edge capabilities from leading AI labs.

Model Company Context GPQA Diamond Arena Elo SWE-bench Verified AIME 2025 Pricing Verified
GPT-5.5 OpenAI 1M 93.2% — — — $5.00 / $30.00 2026-04-26
GPT-5.5 Pro OpenAI 1M 95.1% — 92.3% 98.5% $15.00 / $60.00 2026-04-24
Claude Opus 4.7 Anthropic 1M 94.2% — 87.6% ~95% $5 / $25 2026-04-26
Claude Sonnet 4.6 Anthropic 1M 89.9% ~1438 (Text) / 1523 (Code) 79.6% ~95% $3 / $15 2026-04-26
GPT-5.3-Codex OpenAI 400K 91.5% — 85.0% — $1.75 / $14.00 2026-04-26
Gemini 3.1 Pro Google 1M 94.3% 1494 (Text) / 1455 (Code) 80.6% 100% $2 / $12 2026-04-26
Gemini 3 Deep Think Google 1M+ ~97% — ~58% — Ultra subscription 2026-04-26
GLM-5 Zhipu AI 200K 82.0% ~1451 (Text) / 1445 (Code) 77.8% 92.7% $1.00 / $3.20 2026-04-26
GLM-5.1 Zhipu AI 200K — — ~80.4% (est.) — $1.00 / $3.20 2026-04-26
MiniMax-M2.5 MiniMax 200K 85.2% — 80.2% 86.3% $0.30 / $1.20 2026-04-26
Kimi K2.6 Moonshot AI 256K 90.5% — 80.2% 96.4% $0.60 / $3.00 2026-04-26
DeepSeek-V4 DeepSeek 1M — — — — $0.30 / $0.50 2026-04-26
DeepSeek-V3.2 DeepSeek 164K 87.1% — 67.8% 89.3% $0.28 / $0.42 2026-04-26
Qwen3.5-Max Alibaba 128K 89.3% — 76.4% 91.3% Pay-per-token 2026-04-26
Gemini 3 Pro Google 1M+ 91.9% 1486 (Text) / 1438 (Code) 76.2% 98–100% Tiered pricing 2026-04-26
Gemini 3 Flash Google 10M 90.4% 1474 (Text) / 1438 (Code) 78.0% — $0.30 / $2.50 2026-04-26
Gemini 3.1 Flash-Lite Google 1M 86.9% 1432 — — $0.25 / $1.50 2026-04-26
GPT-5.4 OpenAI 1M 92.0% 1484 (Text) / 1457 (Code) ~80% 88% $2.50 / $15.00 2026-04-26
GPT-5.4 mini OpenAI 400K 87.5% — — — $0.75 / $4.50 2026-04-26
GPT-5.4 nano OpenAI 400K — — — — $0.20 / $1.25 2026-04-26
Step-3.5-Flash StepFun 256K 83.1% — 74.4% 97.3% Pay-per-token 2026-04-26
Mistral Large 3 Mistral AI 128K 43.9% — — — $0.50 / $1.50 2026-04-26
Claude Sonnet 4.5 Anthropic 200K 83.4% — 77.2% 87% $3 / $15 2026-04-26
Llama 4 Scout Meta 10M 57.2% — — — Free (self-host) 2026-04-26
Llama 4 Maverick Meta 128K 69.8% — — — Free (self-host) 2026-04-26
Grok 4 xAI 128K ~91.5% ~1493 (Text) — 100% $3 / $15 2026-04-26
Grok 4 Fast xAI 128K — — — — $0.20 / $1.50 2026-04-26
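Pricing in the table reads input / output in USD per million tokens (e.g. "$2.50 / $15.00"). A minimal sketch of estimating per-request cost from those two numbers — the token counts and model below are illustrative, not benchmarks:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Estimate the USD cost of one request.

    price_in / price_out are USD per 1M tokens, as listed in the
    Pricing column of the table above.
    """
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example: a 20K-token prompt with a 2K-token reply on GPT-5.4
# ($2.50 in / $15.00 out per 1M tokens).
cost = request_cost(20_000, 2_000, 2.50, 15.00)
print(f"${cost:.4f}")  # $0.0800
```

Output tokens usually dominate for long generations, which is why the output price matters more than the headline input price for chat and coding workloads.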

Top Models by Category

Category #1 #2 #3
Coding Claude Opus 4.7 GPT-5.5 Pro GPT-5.5
Reasoning Gemini 3 Deep Think GPT-5.5 Pro Qwen3-Max-Thinking
Open Source DeepSeek-V4 Qwen3.5-Max Llama 4
Cost Efficiency DeepSeek-V3.2 Grok 4 Fast GLM-4.7-FlashX
Context Window Gemini 3 Flash (10M) Llama 4 Scout (10M) Claude Opus 4.6 (1M)

Model Specifications 📋

Detailed technical specifications, pricing, and capabilities for all frontier models. Data as of April 2026.

Output Token Limits

Maximum output tokens per single API request.

Model Max Output Context Window Notes
Claude Opus 4.6 128K (300K via beta) 1M Extended output via output-128k-2025-02-19 beta header
Claude Opus 4.7 128K (300K via beta) 1M Extended output via output-128k-2025-02-19 beta header
Claude Sonnet 4.6 64K 1M —
Claude Sonnet 4.5 64K 200K —
GPT-5.4 128K 1.05M —
GPT-5.4 mini 128K 400K —
GPT-5.4 nano 128K 400K —
GPT-5.3-Codex 128K 400K —
Gemini 3.1 Pro 64K 1M —
Gemini 3 Pro 64K 2M —
Gemini 3 Flash 64K 1M —
Gemini 3.1 Flash-Lite 32K 1M —
DeepSeek-V4 — 1M Not publicly specified
DeepSeek-V3.2 8K / 64K (reasoner) 128K Reasoner mode unlocks 64K output
Qwen3.5-Max 65K 1M —
GLM-5 128K 200K —
GLM-5.1 131K 200K —
MiniMax-M2.5 131K 1M —
Kimi K2.6 — 256K Not publicly specified
Step-3.5-Flash 66K 256K —
Grok 4 — 256K Not publicly specified
Grok 4 Fast 30K 2M —
Mistral Large 3 32K 128K —
Llama 4 Scout 16K 10M —
Llama 4 Maverick 16K 1M —
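The extended-output beta noted for the Claude Opus rows is enabled per request via an `anthropic-beta` header carrying the token from the table. A hedged sketch of assembling such a request — the header/payload layout follows Anthropic's Messages API shape, and the model id is a hypothetical placeholder; verify both against current Anthropic docs:

```python
def build_extended_output_request(prompt: str, max_tokens: int) -> dict:
    """Build headers + JSON body for a long-output Claude request.

    The beta token "output-128k-2025-02-19" comes from the table above.
    """
    headers = {
        "x-api-key": "YOUR_API_KEY",                 # placeholder
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "output-128k-2025-02-19",  # unlocks extended output
        "content-type": "application/json",
    }
    body = {
        "model": "claude-opus-4-6",  # hypothetical id for the table's Opus 4.6
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return {"headers": headers, "json": body}

req = build_extended_output_request("Summarize this repository.", 200_000)
```

Without the beta header, requests asking for more than the standard 128K output limit would be rejected, so the header is the only change needed to an otherwise ordinary Messages call.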

Cached & Batch Pricing

Discounted pricing tiers for high-volume usage. All prices in USD per million tokens.

Model Standard Input Cached Input Batch Discount Notes
Claude Opus 4.6 $5.00 $0.50 (hit) / $6.25 (5m write) 50% off Batch: $2.50 in / $12.50 out
Claude Sonnet 4.6 $3.00 $0.30 (hit) / $3.75 (5m write) 50% off Batch: $1.50 in / $7.50 out
Claude Sonnet 4.5 $3.00 $0.30 (hit) / $3.75 (5m write) 50% off Batch: $1.50 in / $7.50 out
GPT-5.4 $2.50 $0.25 50% off Data residency +10%
GPT-5.4 mini $0.75 $0.075 50% off —
GPT-5.4 nano $0.20 $0.02 50% off —
GPT-5.3-Codex $1.75 $0.175 50% off —
Gemini 3.1 Pro $2.00 $0.20–$0.40 + $4.50/hr storage 50% off Tiered by input length
Gemini 3 Flash $0.50 $0.05 + $1.00/hr storage 50% off —
Gemini 3.1 Flash-Lite $0.025 $0.0025 + $0.25/hr storage 50% off Most affordable Google model
DeepSeek-V4 $0.30 $0.03 (90% off) Off-peak 50% off 75% discount until 2026-05-05: ~$0.035 in
DeepSeek-V3.2 $0.28 $0.028 — No formal batch API
Qwen3.5-Max $0.40 Available 50% off —
GLM-5 / GLM-5.1 $1.00 $0.20 — —
Grok 4 $3.00 $0.75 — —
Grok 4 Fast $0.20 $0.05 — —
Mistral Large 3 $0.50 — — No formal batch/cache tier
Step-3.5-Flash $0.10 — — —
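Cached-input discounts compound quickly for prompts with a large shared prefix (system prompts, few-shot examples, repo context). A quick sketch of the blended input price given a cache hit rate — this is simple expectation arithmetic over the table's prices, not any provider's actual billing logic:

```python
def blended_input_price(standard: float, cached: float, hit_rate: float) -> float:
    """Expected USD per 1M input tokens when `hit_rate` of tokens hit the cache."""
    assert 0.0 <= hit_rate <= 1.0
    return hit_rate * cached + (1.0 - hit_rate) * standard

# GPT-5.4: $2.50 standard, $0.25 cached. With 80% of input tokens cached:
print(f"${blended_input_price(2.50, 0.25, 0.8):.2f}")  # $0.70
```

At an 80% hit rate the effective input price drops to less than a third of list price, which is why agent loops that replay the same context benefit so much from caching.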

Speed & Latency

Output throughput and time-to-first-token from Artificial Analysis and provider benchmarks.

Model Output Speed (tok/s) TTFT Notes
Gemini 3.1 Flash-Lite ~250 ~2.1s Fastest budget Google model
Step-3.5-Flash 85–350 — Variable by provider; peak ~350 tok/s
Gemini 3 Flash ~193 ~4.16s —
MiniMax-M2.5 Lightning ~100 — Faster tier
GPT-5.3-Codex ~86 ~77.86s High TTFT due to extended reasoning
Grok 4 ~56 ~8.96s —
MiniMax-M2.5 Standard ~50 — —

Most frontier models (Claude Opus/Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro, etc.) have not yet been benchmarked on Artificial Analysis as of April 2026.

Training Data Cutoffs

Knowledge cutoff dates: the point after which a model has no training data.

Model Training Cutoff Notes
Claude Sonnet 4.6 Jan 2026 Most recent cutoff among frontier models
Claude Opus 4.6 Aug 2025 Reliable knowledge: May 2025
GPT-5.4 / mini / nano Aug 31, 2025 —
GPT-5.3-Codex Aug 31, 2025 —
Grok 4 Fast Jul 2025 —
DeepSeek-V4 May 2025 —
Gemini 3.1 Flash-Lite Jan 2025 —
Gemini 3.1 Pro / 3 Pro / 3 Flash Jan 2025 —
Grok 4 ~Nov–Dec 2024 Approximate
DeepSeek-V3.2 Jul 2024 —
Llama 4 Scout / Maverick Aug 2024 —
DeepSeek-R1 ~Oct 2023 Based on base model

Models not listed (Qwen, GLM, MiniMax, Kimi, Step, Mistral): training cutoff not publicly disclosed.

Multilingual Support

Model Languages Details
Qwen3.5-Max 201 Largest language coverage
Llama 4 Scout 200 Pre-training languages
Qwen3-Max-Thinking 119 Qwen3 series
Gemini 3 Flash 100 91.8% MMMLU score across 100 languages
Gemini 3.1 Pro / 3 Pro 100+ —
Gemini 3.1 Flash-Lite 100 91.3% MMMLU score
Llama 4 Maverick 12 Output languages
Claude (all) Many English-optimized; broad multilingual
GPT-5.4 (all) Many Broad multilingual coverage
DeepSeek (all) Many Chinese + English focused
Grok (all) Many —
GLM-5 / GLM-5.1 Many 28.5T token training data

Structured Output & Function Calling

All frontier models support structured JSON output and function/tool calling except where noted.

Capability Supported Models Not Supported
Structured Output (JSON mode) All models listed in Frontier table Gemini 3 Deep Think (no API)
Function Calling / Tool Use All models listed in Frontier table Gemini 3 Deep Think (no API)

Gemini 3 Deep Think is available only via Gemini's in-app Think mode; there is no API access for structured output or function calling.
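Function calling works the same way across the API-accessible models: you pass a JSON-Schema description of each tool and the model returns a structured call. A minimal sketch in the widely used OpenAI-style tool format — the function name and fields here are invented for illustration:

```python
import json

# Hypothetical tool definition; only the outer structure
# ("type" / "function" / "parameters" holding a JSON Schema) is the
# standard OpenAI-style shape that most providers accept or mirror.
get_pricing_tool = {
    "type": "function",
    "function": {
        "name": "get_model_pricing",
        "description": "Look up input/output pricing for a model.",
        "parameters": {
            "type": "object",
            "properties": {
                "model": {
                    "type": "string",
                    "description": "Model name, e.g. 'GPT-5.4'",
                },
            },
            "required": ["model"],
        },
    },
}

# The definition is plain JSON, so it can be serialized for any provider SDK:
print(json.dumps(get_pricing_tool, indent=2))
```

Anthropic and Google use slightly different envelope keys (`input_schema`, `function_declarations`), but the inner JSON Schema describing parameters is the portable part.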

Regional Availability

Provider API Availability Cloud Partners Notes
Anthropic Global AWS Bedrock, GCP Vertex AI US-only inference at 1.1x via inference_geo
OpenAI Global Azure OpenAI Data residency endpoints +10% (post-3/5/26)
Google Global Google AI Studio, Vertex AI Some regional restrictions per Google terms
DeepSeek Global Azure (R1 only, select regions) China-based servers
Alibaba (Qwen) Global Alibaba Cloud Model Studio China-based; globally accessible
Zhipu AI (GLM) Global Z.AI API MIT license enables self-hosting anywhere
MiniMax Global MiniMax API β€”
Moonshot AI (Kimi) Global platform.kimi.ai MIT open-weight
xAI (Grok) US-focused Oracle OCI (East/Midwest/West) Limited non-US availability
Mistral Global Azure AI Foundry, AWS, GCP β€”
Meta (Llama) Global (self-host) All major cloud providers Llama 4 Community License
StepFun Global HuggingFace Apache 2.0 open-source

Free-Source Models 🆓

Self-hostable models with permissive licenses or open weights for privacy, cost control, and customization.

Model Company Params Context License
DeepSeek-V4 DeepSeek 1.6T / 49B active (MoE) 1M Open Weight
Qwen3.5-Max Alibaba 397B / 17B active (MoE) 262K Apache 2.0
Qwen3-Max-Thinking Alibaba 1T+ 128K Apache 2.0
Qwen3.6-27B Alibaba 27B dense 262K Apache 2.0
Qwen3.5-122B Alibaba 397B / 17B active (MoE) 262K Apache 2.0
Qwen3.5-27B Alibaba 27B dense 262K Apache 2.0
Mistral Large 3 Mistral AI 123B 128K Apache 2.0
Llama 4 Scout Meta 109B 10M Community
Llama 4 Maverick Meta 400B 128K Community
GPT-OSS-120B OpenAI 117B 128K Apache 2.0
GPT-OSS-20B OpenAI 21B 128K Apache 2.0
Qwen3-Coder Alibaba 480B 262K Apache 2.0
GLM-5.1 Zhipu AI 754B / 40B active (MoE) 200K MIT
GLM-4.7 Zhipu AI 400B+ MoE 205K Open Weight
Gemma 4 31B Google 31B dense 256K Apache 2.0
Gemma 4 27B Google 27B MoE (4B active) 256K Apache 2.0
Gemma 4 E4B Google 4B dense 256K Apache 2.0
Gemma 4 E2B Google 2B dense 256K Apache 2.0
Qwen3-Coder 7B Alibaba 7B dense 128K Apache 2.0
Qwen 2.5 Coder 32B Alibaba 32B dense 128K Apache 2.0
DeepSeek Coder-V2 DeepSeek 236B / 2.4B active 128K MIT
Step-3.5-Flash StepFun 196B / 11B active (MoE) 256K Open Weight
Yi-Coder 01.AI 9B / 1.5B 128K Apache 2.0
Lizzy-7B Flower Labs 7B — MIT
MiMo-V2.5 Xiaomi 309B / 15B active 262K MIT
MiMo-V2.5-Pro Xiaomi 1.02T / 42B active 1M MIT

Deployment Options

Local Inference Tools:

  • Ollama - Easy local deployment
  • LM Studio - User-friendly GUI
  • llama.cpp - Efficient CPU inference
  • vLLM - High-throughput serving
  • SGLang - Structured generation

Cloud Deployment:

  • Hugging Face Inference - Managed deployment
  • AWS SageMaker - Full control
  • Google Cloud Vertex - Integrated
  • RunPod - GPU rental
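Most of the local tools above (Ollama, LM Studio, vLLM, SGLang) expose an OpenAI-compatible `/v1/chat/completions` endpoint, so one small client covers all of them. A hedged sketch — the port shown is Ollama's default, and the model tag is a hypothetical example; check each tool's docs for its own defaults:

```python
import json
import urllib.request


def chat_payload(model: str, prompt: str) -> dict:
    """OpenAI-style chat body accepted by vLLM, Ollama, LM Studio, SGLang, etc."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }


def ask_local(prompt: str, model: str = "llama4-scout",
              base_url: str = "http://localhost:11434/v1") -> str:
    """Send the payload to a locally running OpenAI-compatible server.

    11434 is Ollama's default port; "llama4-scout" is a placeholder tag.
    Requires a server to actually be running.
    """
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# ask_local("Hello")  # uncomment with a local server running
```

Because the wire format is shared, switching between a local Ollama model and a hosted endpoint is usually just a change of `base_url` and `model`.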

Coding Models 💻

Specialized AI models optimized for software development tasks.

SWE-bench Verified Leaderboard

Rank Model Company SWE-bench Verified
🥇 #1 GPT-5.5 Pro OpenAI 92.3%
🥈 #2 GPT-5.5 OpenAI 88.5%
🥉 #3 Claude Opus 4.7 Anthropic 87.6%
#4 Claude Opus 4.6 Anthropic 80.8%
#5 Gemini 3.1 Pro Google 80.6%
#6 MiniMax-M2.5 MiniMax 80.2%
#7 Kimi K2.6 Moonshot AI 80.2%
#8 GPT-5.4 OpenAI ~80%
#9 GPT-5.2 OpenAI 80.0%
#10 Claude Sonnet 4.6 Anthropic 79.6%
#11 Gemini 3 Flash Google 78.0%
#12 GLM-5 Zhipu AI 77.8%
#13 Claude Sonnet 4.5 Anthropic 77.2%

Commercial Coding Models

Model Developer Pricing Best For
Claude Opus 4.6 Anthropic $5 / $25 per 1M Agentic coding, complex tasks
GPT-5.5 Pro OpenAI $15.00 / $60.00 per 1M Highest benchmark coding
GPT-5.3-Codex OpenAI $1.75 / $14.00 per 1M Agentic coding, 7+ hour autonomy
Claude Haiku 4.5 Anthropic $1 / $5 per 1M Low-latency coding, sub-agents, computer use
GLM-5-Code Zhipu AI $1.20 / $5.00 per 1M Code generation, refactoring
MiniMax-M2.5 MiniMax $0.30 / $1.20 per 1M Code generation, refactoring
Claude Sonnet 4.5 Anthropic $3 / $15 per 1M Code review, refactoring
Codestral Mistral AI $0.30 / $0.90 per 1M Real-time completion
Grok 4 Fast xAI $0.20 / $1.50 per 1M Most used (50% share)

Open-Source Coding Models

Model Developer License Hardware
GPT-OSS-120B OpenAI Apache 2.0 80-160 GB VRAM
Qwen3-Coder Alibaba Apache 2.0 160-320 GB VRAM
DeepSeek-Coder-V2 DeepSeek MIT 48-80 GB VRAM
GLM-4.6 Zhipu AI Open Weight 80-160 GB VRAM
Phi-4 Microsoft MIT 24-48 GB VRAM

Reasoning Models 🧠

Models optimized for step-by-step reasoning, mathematical problem-solving, and complex logical inference.

AIME 2025 Leaderboard

Rank Model AIME 2025 ARC-AGI-2 Notes
🥇 #1 GPT-5.5 Pro 100% 78.5% Highest combined
🥈 #2 Gemini 3.1 Pro 100% 77.1% Highest combined reasoning
🥉 #3 GPT-5.2 100% 52.9% No tools needed
#4 Grok 4 100% — First-principles reasoning
#5 Claude Opus 4.6 99.8% 68.8% Near-perfect AIME
#6 Gemini 3 Pro 98–100% 31.1–45.1% With code execution
#7 Step-3.5-Flash 97.3% — Best efficiency ratio
#8 Kimi K2.6 96.4% — Strong multimodal reasoning
#9 Claude Sonnet 4.6 ~95% 58.3% Near-Opus performance
#10 GLM-5 92.7% — Thinking mode

Reasoning Model Details

Model Type Context Pricing
Gemini 3 Deep Think Reasoning 1M+ Ultra subscription
Qwen3-Max-Thinking Reasoning/Coding 128K $1.20 / $6.00
o3 / o1-Pro Reasoning 128K $2-150 / $8-600
GPT-5.5 Pro Reasoning 1M $15.00 / $60.00
Gemini 3 Pro General/Multimodal 1M+ $2 / $12
DeepSeek-R1 Reasoning 128K $0.50 / $2.15
Claude Sonnet 4.5 Hybrid 200K $3 / $15
GPT-Rosalind Life Sciences Reasoning 128K Pay-per-token (Research Preview)

Use Cases

  • Mathematical Problem Solving: Qwen3-Max-Thinking, GPT-5.5 Pro, Gemini 3 Pro
  • Scientific Analysis: Claude Opus 4.6, GPT-5.5, Gemini 3 Pro
  • Strategic Planning: o3/o1-Pro, Claude Sonnet 4.5, DeepSeek-R1
  • Code Debugging: Claude Sonnet 4.5, GPT-5.3-Codex, DeepSeek-V3.2

Multimodal Models 🎨

Models capable of processing and generating multiple types of content: text, images, audio, and video.

Leading Multimodal Models

Model Developer Context Key Features
GPT-5.4 OpenAI 1M Unified multimodal, audio
Gemini 3 Pro Google 1M+ Native multimodal, video
Claude Sonnet 4.5 Anthropic 200K Document understanding
Llama 4 Maverick Meta 128K Open multimodal
Nemotron 3 Nano Omni NVIDIA 30B (3B active) Vision, audio, language unified, 9x throughput

Vision Capabilities

Model MMMU / MMMU-Pro MathVista DocVQA
Gemini 3.1 Pro 95% (MMMU-Pro) — —
GPT-5.4 94% (MMMU-Pro) — —
Gemini 3 Pro 81% (MMMU-Pro) — —
Gemini 3 Flash 80% (MMMU-Pro) — —
Claude Sonnet 4.5 77.8% (MMMU) — —
Llama 4 Maverick 73.4% (MMMU) — —

Audio & Video

Model Speech-to-Text Text-to-Speech Video Input
Gemini 3 Pro ✅ ✅ ✅
GPT-5 ✅ ✅ ⚠️
Whisper v3 ✅ ❌ ✅

Image Generation

Model Developer License Best For
MAI-Image-2-Efficient Microsoft Proprietary Production-ready quality, 41% lower cost
Flux.1 Black Forest Labs Apache 2.0 High-fidelity art
Stable Diffusion 3.5 Stability AI Community License Fine-tuning
GLM-Image Zhipu AI (Z.ai) API Fast image generation
CogView-4 Zhipu AI (Z.ai) API Creative image generation
Firefly AI Assistant Adobe Public Beta (2026-04-27) Creative agent, 60+ tools, Photoshop/Premiere integration

Hardware Requirements 🖥️

Comprehensive hardware specifications for self-hosting AI models.

Quick Reference by Model Size

Model Params Q4 Size Min VRAM Rec VRAM Min RAM
Phi-4 14B 8 GB 24 GB 48 GB 32 GB
GPT-OSS-20B 21B 12 GB 24 GB 48 GB 32 GB
Llama 4 Scout 109B 66 GB 48 GB 80 GB 96 GB
GPT-OSS-120B 117B 70 GB 80 GB 160 GB 128 GB
DeepSeek-Coder-V2 236B 143 GB 48 GB 80 GB 192 GB
Llama 4 Maverick 400B 242 GB 160 GB 320 GB 320 GB
DeepSeek-V4 671B 404 GB 80 GB 320 GB 512 GB
Qwen3-Max-Thinking 1T+ 600+ GB 160 GB 640 GB 768 GB

By Hardware Tier

Consumer/Entry Level (24-48 GB VRAM):

  • Phi-4, GPT-OSS-20B, Yi-Coder, Qwen2.5-Coder
  • Recommended GPUs: RTX 3090 (24GB), RTX 4090 (24GB)

Professional (80-160 GB VRAM):

  • Llama 4 Scout, GPT-OSS-120B, DeepSeek-Coder-V2
  • Recommended GPUs: A100 80GB, 2x A100 40GB

Enterprise (320+ GB VRAM):

  • Llama 4 Maverick, GLM-4.7, DeepSeek-V4, Qwen3-Max-Thinking
  • Recommended GPUs: 4x A100 80GB, 8x A100 80GB

Quantization Explained

Level Bits Size vs FP16 Quality Use Case
FP16/BF16 16 100% Best Training
Q8_0 8 ~50% Excellent High-quality inference
Q4_K_M 4 ~25% Good Recommended for deployment
Q3_K_M 3 ~19% Fair Limited resources
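The Q4 sizes in the hardware table follow directly from parameter count times bits per weight. A back-of-the-envelope estimator — it ignores quantization overhead and mixed-precision layers, so real GGUF files run somewhat larger, as the table's numbers show:

```python
def quantized_size_gb(params_billion: float, bits: float) -> float:
    """Approximate weight size in decimal GB at a given quantization bit-width."""
    bytes_total = params_billion * 1e9 * bits / 8
    return bytes_total / 1e9

# Llama 4 Scout (109B) at 4 bits per weight:
print(round(quantized_size_gb(109, 4), 1))  # 54.5
# The table lists 66 GB for Scout at Q4: formats like Q4_K_M average
# closer to ~4.8 bits/weight once scales and higher-precision layers
# are included, which accounts for the gap.
```

The same arithmetic explains the FP16/Q8/Q4 size ratios in the quantization table above: halving the bit-width roughly halves the file.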

Comprehensive Benchmark Reference 📈

Detailed benchmark scores across all major evaluations. Scores are percentages (%) unless noted. Arena Elo scores are integers. — = not publicly reported. Data as of April 2026.

Full Benchmark Table

Model GPQA Diamond MMLU-Pro Arena Elo (Text) HLE SWE-bench Verified SWE-bench Pro LiveCodeBench AIME 2025 ARC-AGI-2 MMMU-Pro IFEval FrontierMath
Claude Opus 4.6 91.3% — 1500 40.0–53.0% 80.8% — — 99.8% 68.8% — — —
GPT-5.5 93.2% — 1495 42.1–55.0% 88.5% — — 99.9% 71.2% — — 52%
GPT-5.5 Pro 95.1% 96% 1520 48.5–62.0% 92.3% — — 100% 78.5% — 97% 58%
Claude Sonnet 4.6 89.9% — ~1438 33.2–49.0% 79.6% — — ~95% 58.3% — — —
Claude Sonnet 4.5 83.4% 88.0% — — 77.2% — — 87–100% — — — —
GPT-5.4 92.0% 94% 1484 36.6–41.6% ~80% 57.7% 84–88% 88% 73.3% 94% — 50% (Pro)
GPT-5.4 mini 87.5% — — — — 54.4% — — — — — —
GPT-5.3-Codex 91.5% — — — — 56.8% 85% — — — — —
GPT-5.2 92.4% — 1479 35.2% 80.0% 55.6% — 100% 52.9% — 95.6% ~40.3%
Gemini 3.1 Pro 94.3% 92% 1494 44.4–51.4% 80.6% 54.2–72% 71% 100% 77.1% 95% 95% —
Gemini 3 Pro 91.9–93.8% 83% 1486 37.5% 76.2% 43.3% 49% 98–100% 31.1–45.1% 81% 88% 38%
Gemini 3 Flash 90.4% 72% 1474 33.7% 78.0% 44% — — — 80% 85% —
Gemini 3 Deep Think ~97% 81% — 48.4% ~58% 63% 58% — 84.6% — — —
DeepSeek-V3.2 87.1% 85.0% — 25.1% 67.8% — — 89.3% — — — —
DeepSeek-R1 71.5% 84.0% — 8.5% 49.2% — 63.5% 70.0% — — — —
Qwen3.5-Max 89.3% — — — 76.4% — — 91.3% — 79% — —
Qwen3-Max-Thinking 86.1% — — 26.2% — — — — — — — —
GLM-5 82.0% — ~1451 10.4% 77.8% — — 92.7% — — — —
GLM-5.1 — — — — ~80.4% (est.) — — — — — — —
Kimi K2.6 90.5% 87.1% — 31.5–50.2% 80.2% — 85.0% 96.4% — 78.5% — —
MiniMax-M2.5 85.2% — — — 80.2% 55.4% — 86.3% — — — —
Step-3.5-Flash 83.1% — — — 74.4% — 86.4% 97.3% — — — —
Grok 4 ~91.5% 91.5% ~1493 50.7% — — — 100% — — — —
Llama 4 Maverick 69.8% 80.5% — — — — 43.4% — — — — —
Llama 4 Scout 57.2% 74.3% — — — — 32.8% — — — — —

FrontierMath Scores

FrontierMath is a benchmark of 350 original, exceptionally challenging mathematics problems created by expert mathematicians (Epoch AI). Problems span number theory, analysis, algebraic geometry, and category theory. Tier 4 problems can take research mathematicians multiple days.

Model Tiers 1–3 Tier 4 Source
GPT-5.4 Pro 50% ~36–38% Epoch AI
GPT-5.2 Pro ~40.3% 31% Epoch AI
Gemini 3 Pro 38% 19% Epoch AI
GPT-5.1 Thinking ~25% — llm-stats

Benchmark Glossary

Benchmark Description Source
GPQA Diamond Graduate-level science questions (PhD difficulty) Google Research
MMLU-Pro Extended multi-task language understanding (harder than MMLU) TIGER-Lab
Arena Elo Crowdsourced human preference ranking lmarena.ai
HLE Humanity's Last Exam (expert-level questions) Scale AI
SWE-bench Verified Real GitHub issue resolution (human-verified subset) SWE-bench
SWE-bench Pro More challenging subset of SWE-bench SWE-bench
LiveCodeBench Live competitive programming problems (not in training data) LiveCodeBench
AIME 2025 American Invitational Mathematics Examination MAA
ARC-AGI-2 Abstract reasoning challenge (fluid intelligence) ARC Prize
MMMU / MMMU-Pro Multi-discipline multimodal understanding MMMU
IFEval Instruction-following evaluation Google Research
FrontierMath Expert-level research mathematics (Epoch AI) Epoch AI

Development Tools 🛠️

AI-powered tools for software development, from IDEs and CLI tools to API providers and IDE extensions.

IDEs 💻

Integrated Development Environments with built-in AI capabilities.

Agentic IDEs

IDE Platform Version Release Date Pricing Key Features GitHub
Firebase Studio Web - - Free (3 workspaces, up to 30 with Google Developer Program) Cloud-based, Gemini, MCP 🔗
Lingma IDE (通义灵码) Windows, macOS - - Free (download) Built-in agent, MCP tool use, terminal command execution ❌
Tonkotsu Windows, macOS - - Free (during early access) Team of agents, workflow 🔗
OpenCode Windows, macOS, Linux - - Free (OSS) Terminal, desktop, IDE extension, multi-provider 🔗
Codex app Windows - 2026-03-04 00:00 UTC Included with Codex plans Multiple agents, isolated worktrees, reviewable diffs, CLI and IDE interop 🔗
Visual Studio Windows, macOS 17.14.12+, 18.1.0+ 2026-01-06 00:00 UTC Free / $250/yr Gemini 3 Flash integration, faster performance, zero-migration upgrades, real-time profiler agent ❌
IntelliJ IDEA Windows, macOS, Linux 2025.3.2 2026-01 Free / $149/yr Java 24 support, Kotlin K2 mode, performance and memory improvements ❌
IBM Bob Cross-platform GA (April 28, 2026) 2026-04-28 Free trial + Enterprise plans Multi-model orchestration, full SDLC, 45% productivity gain 🔗
PolyAI ADK PolyAI GA (April 22, 2026) 2026-04-22 Enterprise CX AI-native dev, Cursor/Claude Code integration 🔗
JAT Windows, macOS, Linux - 2026-04-14 Free (MIT) Self-contained agentic IDE, 20+ parallel agents, task management, unified environment 🔗

Native AI Editors

Editor Platform Version Release Date Pricing Key Features GitHub
Zed macOS, Windows, Linux 0.226.3 2026-03-03 00:00 UTC Free (OSS) + Copilot $10/mo Fast, collaboration, Gemini and Claude, Zeta AI, agent thread history, edit prediction providers, self-hosted OpenAI-compatible servers 🔗
Dyad Windows, macOS, Linux - - Free (OSS) Local generation, BYO keys 🔗
Memex macOS, Windows - - Freemium (Free + $10/mo) Agentic, browser↔desktop 🔗

VS Code Forks

IDE Platform Version Release Date Pricing Autonomous MCP GitHub
Cursor Windows, macOS, Linux 3.2 (May 1, 2026) 2026-05-01 00:00 UTC Freemium (Free + Pro $19/mo or $39/mo) ✅ ❌ ❌
Windsurf Windows, macOS, Linux 2.0.0 (May 3, 2026) 2026-05-03 00:00 UTC Freemium (Free + Pro) ✅ ✅ ❌
Trae macOS, Windows - - Free ❌ ❌ 🔗
PearAI Windows, macOS, Linux - - Free (OSS) ✅ ❌ 🔗
Void Windows, macOS, Linux - - Free (OSS) ✅ ✅ 🔗
Kiro Windows, macOS, Linux - - Free (Preview) ✅ ✅ 🔗
VS Code Agents Windows, macOS, Linux Insiders 2026-04-21 Free ✅ ✅ 🔗

Web-Based IDEs

IDE Platform Version Release Date Pricing Self-Hostable Best For GitHub
Replit 3 Web - - Free Starter, Core $20/mo, Pro $100/mo ❌ Learning/Prototyping ❌
Bolt.new Web - - Free, Pro $20-25/mo, Teams $30/user/mo ❌ Quick apps ❌
Bolt.diy Self-hosted - - Free (MIT), bring your own API ✅ Self-hosted 🔗
Lovable Web - - Free (5 credits/day), Pro $25/mo, Business $50/mo ❌ UI/Full-stack ❌
v0 Web - - Free ($5 credits/mo), Premium $20/mo, Teams $30/user ❌ React components ❌
Gitpod Web - - Free + Paid ❌ Cloud dev environments ❌
Rork Web - - Free & Paid (credits) ❌ Mobile apps (iOS/Android) ❌
Google Stitch Web - 2026-03 Free (Google account, 550 gen/mo) ❌ UI design, Figma/React export ❌
Google Antigravity Web - - Google AI Pro / Ultra Agent-first development with Gemini-powered coding ❌
Jules Web - 2025-05-20 00:00 UTC Free beta, higher limits on Google AI Pro / Ultra Async repo agent, reviewable diffs, GitHub integration ❌

CLI Tools 🖥️

Command-line AI tools for autonomous coding and terminal enhancement.

Autonomous Coding Agents

Tool Platform Pricing Key Features GitHub
Aider Windows, macOS, Linux Free Gold standard, Architect mode, thinking tokens 🔗
Claude Code 2.2.1+ macOS, Linux, Windows Free + API Fast mode for Opus 4.7, simple mode file editing, multi-session support 🔗
Codex CLI Windows, macOS, Linux Included Sandbox, approval modes 🔗
Junie CLI Windows, macOS, Linux Free (BYOK) LLM-agnostic, JetBrains IDE integration, MCP 🔗
Goose Windows, macOS, Linux Free (Apache-2.0) MCP, extensible, desktop app, 25+ providers 🔗
GPT-Pilot Windows, macOS, Linux Free Full dev team simulation 🔗
OpenHands Windows, macOS, Linux Free Cloud agents, MCP 🔗
Mentat Windows, macOS, Linux Free Multi-file coordination 🔗
SERA Linux, macOS Free (Apache 2.0) Open-source coding agent, 200K synthetic trajectories 🔗
AI Dev Kit Cross-platform Free 59 skills, 33 agents, TDD, security audit, CI/CD 🔗

Assisted CLI Tools

Tool Developer Pricing Best For
Gemini CLI Google Free Google ecosystem
Cursor CLI Cursor Free tier Terminal + IDE bridge
Qwen Code Alibaba Free Qwen optimization
Qodo CLI Qodo Free tier Testing and review

CLI Tools by Programming Language

AI coding CLI tools categorized by their primary language support. All tools below accept plain English prompts.

Tool Primary Languages Multi-Language Local LLM Cloud API Pricing GitHub
Aider Python, JS, TS, Go, Rust, Ruby, Java, C/C++ ✅ (100+ langs) ✅ (Ollama, LM Studio) ✅ Free (OSS) 🔗
Claude Code All (polyglot) ✅ ❌ Claude API ($3–$15/M) Free tool / API cost ❌
Codex CLI Python, JS, TS, Bash ✅ ❌ OpenAI API Free OSS / API cost 🔗
OpenHands Python, JS, TS, Go, Rust, Java ✅ ✅ ✅ Free (OSS) 🔗
Goose (Block) Polyglot (25+ providers) ✅ ✅ (Ollama, LM Studio) ✅ Free (OSS) 🔗
Continue Polyglot (VS Code, JetBrains) ✅ ✅ (Ollama, LM Studio) ✅ Free (OSS) 🔗
Qwen Code Python, JS, TS, Go, Java ✅ ✅ (Qwen models) ✅ Free (OSS) 🔗
Devstral CLI Python, JS, TS, Go, Rust ✅ ✅ Mistral API Free OSS model / API cost ❌
OpenCode Polyglot ✅ ✅ ✅ Free (OSS) 🔗
Mentat Python, JS, TS, Go ✅ ❌ OpenAI API Free (OSS) 🔗
Amp (Sourcegraph) All ✅ ❌ ✅ Free / Enterprise ❌

CLI for Programming Languages & Multiple Use

Purpose-built CLI tools for coding across specific languages or polyglot multi-stack workflows.

Tool Language Focus Platform Pricing Key Features GitHub
Aider Polyglot (Python, JS, TS, Go, Rust, any) All Free (BYOK) Git-native multi-file edits, Architect mode, repo maps, thinking tokens 🔗
Claude Code Polyglot All Free + API Computer use, sub-agents, CLAUDE.md skills, Opus 4.7, multi-session 🔗
Codex CLI Python, JS, TS All Free (OpenAI account) Sandbox execution, approval modes, OpenAI models 🔗
OpenHands Python, JS, TS, Go, Rust All Free (OSS) Full SDLC agent, MCP, local LLM via Ollama 🔗
Goose Polyglot All Free (Apache-2.0) 25+ providers, MCP extensions, desktop app 🔗
Continue Polyglot All Free (OSS) VS Code + JetBrains, custom models via Ollama/LM Studio 🔗
Qwen Code Python, JS, TS, Go All Free Optimized for Qwen3-Coder 480B, Apache 2.0 ❌
Mentat Polyglot All Free Multi-file coordination, context-aware diffs 🔗
AI Dev Kit Polyglot All Free 59 skills, 33 agents, TDD, security audit, CI/CD pipeline 🔗
Devstral CLI Python, JS, TS, Go All Free (Mistral free tier) Mistral's open coding model, OpenRouter free access ❌
Junie CLI Polyglot All Free (BYOK) LLM-agnostic, JetBrains IDE integration, MCP 🔗
SERA Python, JS, TS Linux, macOS Free (Apache 2.0) Open-source coding agent, 200K synthetic trajectories 🔗

Terminal Enhancers

Tool Platform Pricing Key Features
Warp Terminal macOS, Linux, Windows Free AI Agents, workflow sharing
Fig macOS, Linux Free Autocomplete, AI suggestions

IDE Add-ons 🧩

Extensions and plugins that add AI capabilities to existing IDEs.

Universal (Cross-Platform)

Add-on Platform Pricing Context Best For GitHub
GitHub Copilot VS Code, JetBrains, Vim Free / $10/mo / $39/mo Large General coding ❌
Supermaven VS Code, JetBrains, Neovim Free / $10/mo 1M Large codebases ❌
Codeium VS Code, JetBrains, Vim Free / $15/mo / $60/mo Medium Free alternative ❌
Continue VS Code, JetBrains Free (OSS) Custom Self-hosted 🔗
Cody VS Code, JetBrains, Web Free (discontinued) / Enterprise Starter $19/mo / Enterprise $59/mo Enterprise Code search 🔗
Tabnine VS Code, JetBrains, VS, Eclipse Free / $39/mo Local Privacy ❌
Tabby VS Code, JetBrains, Vim, Neovim Free (OSS) Self-hosted Self-hosted code completion 🔗

VS Code Specific

Add-on Pricing Autonomous MCP Best For GitHub
Codex Free (with ChatGPT Plus $20/mo or Pro $200/mo) ✅ ✅ OpenAI's official coding agent 🔗
Cline Free ✅ ✅ Full agent 🔗
GitHub Copilot (Agent Mode) $0 / $10 / $39/mo ⚠️ ❌ Guided agent workflows ❌
RooCode Free/Pro ⚠️ ❌ Complex tasks 🔗
Keploy OSS/Enterprise ❌ ❌ Testing ❌

JetBrains Specific

Add-on Pricing Claude Agent Best For
JetBrains AI Assistant $10/mo (Pro), $249/yr (Ultimate) ✅ Deep IDE integration
JetBrains Claude Agent Included in subscription ✅ Native agent

API Providers 🔌

Services for accessing AI models via API.

Model Labs (Direct)

Provider Models Pricing
OpenAI GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, o3, Codex Pay-per-token
Anthropic Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 Pay-per-token
Alibaba Cloud Qwen3.5-Max, Qwen3-Coder, Qwen3.6-27B Pay-per-token / Coding Plan $50/mo
Gemini (Google) Gemini 3.1 Pro, 3 Pro, 3 Flash Pay-per-token
Z.ai (Zhipu AI) GLM-5, GLM-5.1, GLM-4.7, GLM-5-Code Pay-per-token
MiniMax MiniMax-M2.5/M2.7/M2 Pay-per-token
Cohere Command, Embed, Rerank Pay-per-token
AI21 Labs Jamba Pay-per-token
Perplexity Sonar / Sonar Pro / Sonar Reasoning Pro Pay-per-token + request fees
Moonshot AI Kimi (kimi-k2.5, kimi-k2-thinking) Pay-per-token
ByteDance (Volcengine) Doubao, Seed 1.6/2.0 Pay-per-token
Tencent (Hunyuan) Hunyuan, Hunyuan-a13b Pay-per-token
StepFun Step-3.5-Flash, Step-3.5 Pay-per-token (OpenRouter free)
PaleBlueDot AI Unified platform, 100+ models Token-based pricing
Osirus AI Unified platform + Agent Studio Free tier + paid plans
Logic Spec-driven managed agents Free tier + $49/mo
DeepSeek DeepSeek-V4/R1 Pay-per-token
Mistral AI Mistral Large 3, Codestral Pay-per-token
xAI Grok-4 Pay-per-token

Unified APIs & Aggregators

Provider Models Key Features
OpenRouter 200+ Crypto/fiat, rankings
Hugging Face Thousands Serverless inference

Inference Clouds

Provider Specialization Speed
Together AI Llama/Qwen/Mistral Fast
Fireworks AI FireAttention Low-latency, 6 free models
Groq LPU >500 T/s
Cerebras Wafer-Scale >2000 T/s
NVIDIA NIM 91 free endpoints, DGX Cloud Claimed up to 20× speedup on DGX Cloud

GPU Clouds

Provider Type Best For
RunPod GPU Rental Flexibility, cost-effective fine-tuning & inference
Replicate Model-as-a-Service Quick deployment, serverless inference
Vultr Global Cloud Hourly GPU instances
Hyperbolic Decentralized Crypto/Fiat payments
Cerebrium Serverless GPU Python-native ML inference & fine-tuning
Together AI AI-Native Cloud Fast, cost-effective inference & fine-tuning for open models
Modal Labs Serverless GPU Fine-tuning with LoRA, distributed training
Fireworks AI Inference & Fine-tuning Fast inference, RFT for model shaping
Databricks Mosaic AI Integrated ML Platform Enterprise fine-tuning, governed serving, RAG
NVIDIA DGX Cloud Managed AI Training Co-engineered clusters, maximum ROI for training
Vast.ai GPU Marketplace Serverless endpoints, diverse GPU options
DigitalOcean GPU Droplets Simple fine-tuning workflows, scalable GPU infrastructure

Automation 🤖

AI-powered tools for automating browser and desktop tasks.

Browser Automation 🌐

Tools and frameworks for AI-powered browser automation.

Standalone AI Browsers

Browser Platform Pricing Open Source Local AI Agent/Computer Use API Access Multi-Agent Parallel Sessions Best For GitHub
Perplexity Comet Windows, macOS, iOS, Android Free / Pro $20/mo ❌ ❌ ✅ ❌ ❌ ❌ Research + background tasks, voice mode, Computer Max agent ❌
ChatGPT Agent Mode Web, iOS, Android Plus $20/mo, Pro $200/mo ❌ ❌ ✅ ❌ ❌ ❌ Full computer use: browse, code, fill forms, book travel ❌
Dia macOS (M1+ / macOS 14+) Free / Pro $20/mo ❌ ❌ ⚠️ ❌ ❌ ❌ Tab intelligence, Skills, browsing history AI context ❌
Google Chrome (Auto Browse) Windows, macOS, Linux, ChromeOS Free / Gemini Pro $19.99/mo ❌ ❌ ✅ ❌ ❌ ❌ Gemini 3 built-in, auto browse agentic tasks (enterprise) ❌
Microsoft Edge (Copilot Agent) Windows, macOS, iOS, Android Free / Copilot Pro $20/mo ❌ ❌ ✅ ❌ ❌ ❌ Cross-tab context, voice commands, form automation, bookings ❌
Genspark Web, iOS, Android Free / Plus $25/mo / Pro $249/mo ❌ ✅ (169 local models) ✅ ❌ ✅ ✅ Super Agent, AI slides, AI websites, deep research, Call For Me ❌
Brave Leo (AI Browser) Windows, macOS, Linux, iOS, Android Free / Premium $14.99/mo ✅ (Chromium) ✅ (Leo local) ⚠️ ❌ ❌ ❌ Privacy-first, zero-log AI, Skills, Memories, local models ❌
SigmaOS (Airis) macOS Free / Pro (subscription) ❌ ❌ ⚠️ ❌ ❌ ❌ NL commands: "Book Airbnb in Iceland", cross-tab AI, YC-backed ❌
Opera Neon Windows, macOS $19.90/mo ❌ ❌ ✅ ❌ ❌ ❌ Agentic browsing, Aria assistant, built-in AI tools ❌
Opera One (Aria) Windows, macOS, Linux, iOS, Android Free ❌ ❌ ⚠️ ❌ ❌ ❌ Built-in Aria AI assistant, sidebar AI tools ❌
Firefox (AI Sidebar) Windows, macOS, Linux, iOS, Android Free ✅ ❌ ⚠️ ❌ ❌ ❌ AI Controls dashboard (v148+), ChatGPT/Claude/Mistral sidebars ❌
BrowserOS Linux, macOS Free ✅ ✅ ✅ ❌ ✅ ✅ Privacy-focused, built-in MCP, agentic 🔗
Manus AI Web (Cloud) Free 300 credits/day / Plus $20/mo / Pro $200/mo ❌ ❌ ✅ ❌ ✅ ✅ Cloud agent, full computer: code, deploy, files, search ❌
Sigma AI Browser Windows, macOS, Linux Free / Pro $29/mo ❌ ✅ ✅ ❌ ❌ ❌ Built-in local AI agent, offline, no tracking 🔗
Fellou Windows, macOS Free 4 tasks/day / Pro $20/mo ❌ ❌ ✅ ❌ ❌ ❌ Complex multi-step automation, agentic tasks 🔗
Arc Max macOS, Windows Free ❌ ❌ ⚠️ ❌ ❌ ❌ AI-enhanced browsing, pinch-to-summarize, Ask on Page ❌
Maxthon Windows, macOS, iOS, Android Free / Premium ❌ ❌ ⚠️ ❌ ❌ ❌ MaxAsk AI answers, built-in VPN, ad-blocker, resource sniffer ❌
ChatGPT Atlas macOS Free (with ChatGPT subscription) ❌ ❌ ✅ ❌ ❌ ❌ OpenAI integration, macOS computer use overlay 🔗
AnythingLLM Windows, macOS, Linux Free (OSS) ✅ ✅ ⚠️ ✅ (local API) ❌ ❌ All-in-one desktop AI, document chat, local + API 🔗
BrowserGPT iOS, Android Free / Premium ❌ ❌ ⚠️ ❌ ❌ ❌ Mobile-first AI browser ❌
Sidekick Browser Windows, macOS, Linux Free / Pro $10/mo ❌ ❌ ✅ ❌ ❌ ❌ AI assistant, natural language tab management, summarize, automate tasks ❌

Browser Extensions

Extension Pricing Free Multi-Agent Best For GitHub
Monica.im Freemium (Free + ~$9/mo) ❌ βœ… Chrome extension, no browser switch ❌
Harpa AI Free βœ… ❌ Automation recipes πŸ”—
MultiOn Free/Paid ⚠️ βœ… Complex tasks πŸ”—
NanoBrowser Free βœ… βœ… Local control, Ollama πŸ”—
Neobrowser Free (OSS) βœ… ❌ Local LLMs via Ollama, privacy-first, Chrome/Edge ❌
Open Operator Free βœ… ❌ Browserbase-powered, open NL browser control πŸ”—
Openator Free (OSS) βœ… ❌ Docker-based headless NL browser agent πŸ”—

Developer Libraries

Library Language Pricing Best For API Access Multi-Agent Parallel Sessions GitHub
Chrome DevTools MCP TypeScript Free (OSS) AI web debugging, 29 DevTools ❌ ❌ ❌ πŸ”—
Cloudflare Browser Run Cloud API Free Workers / $5+/mo CDP + MCP, WebMCP, Live View βœ… ❌ βœ… πŸ”—
Browser-use Python Free OSS / Cloud $29/mo Agentic automation, Workflow Use βœ… βœ… βœ… πŸ”—
Stagehand TypeScript/Python Free (OSS) Hybrid deterministic + AI, action caching βœ… ❌ βœ… πŸ”—
LaVague Python Free (OSS) NL to code ❌ ❌ βœ… πŸ”—
Skyvern Python Free tier / $29–$149/mo CV-based automation, Ollama support βœ… βœ… βœ… πŸ”—
Notte Python/Cloud Free tier / $29/mo+ Deterministic replay, demoβ†’script βœ… ❌ βœ… πŸ”—
Firecrawl Python / CLI Free tier / $49/mo+ LLM-powered crawling & scraping βœ… ❌ βœ… πŸ”—
Playwright MCP TypeScript Free (OSS) Cross-browser automation, VS Code βœ… ❌ βœ… πŸ”—
Langflow Python Free (OSS) / Cloud $29/mo Visual multi-agent & RAG workflows βœ… βœ… βœ… πŸ”—
LlamaIndex Python Free (OSS) / Cloud $29/mo Document-heavy RAG, retrieval quality βœ… βœ… βœ… πŸ”—
Haystack Python Free (OSS) / Cloud $49/mo Regulated deployments, structured pipelines βœ… βœ… βœ… πŸ”—
AgentQL TypeScript/Python Free (1K req/mo) / $49/mo / $149/mo Natural language web querying/automation βœ… βœ… βœ… πŸ”—
ScrapeGraphAI Python Free OSS / Cloud $29/mo Natural language web scraping βœ… βœ… βœ… πŸ”—
WebVoyager Python Free (OSS) Autonomous web browsing research ❌ ❌ βœ… πŸ”—

Cloud Automation

Service Platform Pricing Best For GitHub
ChatGPT agent ChatGPT Plus / Pro / Team Guided browser tasks, research, forms, and spreadsheets ❌
Project Mariner Google AI Ultra Included with Google AI Ultra Multi-step browser tasks, shopping, and reservations ❌
Skyvern Cloud Cloud API Paid Resilient automation πŸ”—
Browserbase Cloud API Paid Stealth mode, session recording ❌

Autonomous Agents — Plain English Prompts 🤖

Control a computer or cloud sandbox using plain English text — no coding required. Just describe the task and the agent handles everything: clicking, typing, navigating, running code, and completing multi-step workflows.

Legend: 🖥️ = runs on your physical computer | ☁️ = cloud/sandbox computer | 🌐 = controls a browser | 🔁 = multi-agent/parallel | 💬 = simple English prompt | 🔓 = free/open-source | 💰 = paid


☁️ Cloud Sandbox Computer Use (English Prompts)

These services run in a cloud sandbox (virtual Linux/Windows desktop), control the computer for you, and are driven purely by natural language instructions.

Agent Interface Pricing Multi-Agent Parallel Sessions Local LLM English Prompt GitHub
Manus AI Web dashboard Free (300 credits/day) / Plus $20/mo / Pro $200/mo βœ… βœ… ❌ βœ… ❌
ChatGPT Agent ChatGPT Web/App Plus $20/mo / Pro $200/mo ❌ ❌ ❌ βœ… ❌
Gemini Computer Use API / AI Studio Gemini Pro $19.99/mo / API metered ❌ ❌ ❌ βœ… ❌
Devin Web dashboard Core $20/mo ($2.25/ACU) / Team $500/seat/mo ❌ ❌ ❌ βœ… ❌
OpenHands Web UI / CLI Free OSS / Cloud Individual free βœ… βœ… βœ… (any API) βœ… πŸ”—
E2B Desktop Sandbox API / SDK Hobby free / Pro $150/mo ❌ βœ… (via code) βœ… βœ… πŸ”—
Cua (trycua) CLI / Python SDK Free (OSS) ❌ βœ… βœ… βœ… πŸ”—
Airtop Web dashboard / API Starter $26/mo (3 sessions) / Pro $80/mo (30 sessions) ❌ βœ… ❌ βœ… ❌
Skyvern Cloud Web dashboard / API Free 1K credits / Hobby $29/mo / Pro $149/mo ❌ βœ… βœ… (Ollama) βœ… πŸ”—
Convergence Proxy Web / API Free tier / Pro $20/mo (acquired by Salesforce) ❌ ❌ ❌ βœ… ❌
Amazon Nova Act API (AWS) Pay-per-use (AWS pricing) ❌ βœ… ❌ βœ… ❌
Project Mariner Google AI Ultra Included ($249.99/mo Ultra plan) ❌ ❌ ❌ βœ… ❌
Perplexity Computer Web dashboard Perplexity Pro $20/mo ❌ ❌ ❌ βœ… ❌
OpenAI Computer Use (API) API / ChatGPT $15/M input, $60/M output βœ… βœ… ❌ βœ… ❌

🖥️ Local Machine / Physical Computer Use

These agents run on your own machine, see your screen, and control your keyboard/mouse — no cloud required.

Agent Windows macOS Linux Dashboard/UI CLI API/LLM Multi-Agent Parallel Sessions Pricing GitHub
Claude Computer Use βœ… βœ… βœ… ❌ βœ… (API) Claude API ($3–$15/M) ❌ ❌ Claude API ($3–$15/M tokens) Commercial
Agent TARS (ByteDance) βœ… βœ… βœ… βœ… Web UI βœ… npx @agent-tars/cli@latest Any LLM ❌ βœ… Free (OSS) πŸ”—
UI-TARS Desktop (ByteDance) βœ… βœ… βœ… βœ… Desktop app ❌ UI-TARS-2 model ❌ ❌ Free (OSS) πŸ”—
Open Interpreter βœ… βœ… βœ… βœ… Web βœ… interpreter Any (OpenAI, Claude, local) ❌ ❌ Free (OSS) πŸ”—
Open-Interface βœ… βœ… βœ… ❌ βœ… GPT-4V / any vision LLM ❌ ❌ Free (OSS) πŸ”—
Agent S / S2 βœ… βœ… βœ… ❌ βœ… Any LLM API ❌ ❌ Free (OSS) πŸ”—
UFO (Microsoft) βœ… ❌ ❌ βœ… UI βœ… GPT-4V / Azure ❌ ❌ Free (OSS) πŸ”—
Windows-Use βœ… ❌ ❌ ❌ βœ… Any vision LLM ❌ ❌ Free (OSS) πŸ”—
Bytebot ❌ ❌ βœ… βœ… (Docker) βœ… Any LLM ❌ ❌ Free (OSS) πŸ”—
OpenCUA βœ… βœ… βœ… ❌ βœ… Any ❌ ❌ Free (OSS) πŸ”—
Khoj βœ… βœ… βœ… βœ… Web UI βœ… Any (Ollama, LM Studio, OpenAI) ❌ ❌ Free (OSS) / Cloud $10/mo πŸ”—

🌐 Browser-Only Agents

Control a browser with natural language — click, fill forms, scrape, automate. No script writing needed.

Agent Type Pricing Dashboard CLI Multi-Agent Parallel Sessions Local LLM GitHub
Browser-use OSS Python lib + Cloud Free OSS / Cloud: Free 3 sessions / Dev $29/mo / Business $299/mo βœ… Cloud ❌ βœ… βœ… βœ… (Ollama) πŸ”—
Stagehand OSS TypeScript Free (OSS) ❌ βœ… ❌ βœ… βœ… πŸ”—
NanoBrowser Chrome extension Free (OSS) βœ… Extension ❌ βœ… ❌ βœ… (Ollama) πŸ”—
Skyvern Python / Cloud Free tier / $29–$149/mo βœ… Cloud βœ… βœ… βœ… βœ… (Ollama) πŸ”—
Openator Python Free (OSS) ❌ βœ… ❌ βœ… βœ… πŸ”—
Open Operator Web UI Free βœ… ❌ ❌ ❌ ❌ πŸ”—
Airtop Web / API $26–$80/mo βœ… ❌ βœ… βœ… ❌ ❌
MultiOn API / Chrome ext Free / Paid βœ… ❌ βœ… ❌ ❌ πŸ”—

πŸ” Multi-Agent / Parallel Agent Platforms (Plain English Orchestration)

Coordinate multiple AI agents in parallel to complete complex workflows — driven by plain English goals.

Platform Type Dashboard CLI Cloud Local LLM Parallel Pricing GitHub
CrewAI Multi-agent OSS + Cloud βœ… AMP Studio βœ… crewai βœ… AMP βœ… βœ… Free OSS / Starter $99/mo / Pro $299/mo / Enterprise custom πŸ”—
AutoGen (Microsoft) Multi-agent conversations ⚠️ βœ… Python βœ… Azure βœ… βœ… Free (OSS) / Azure pay-per-token πŸ”—
LangGraph Stateful agent graphs βœ… LangSmith βœ… Python βœ… Cloud βœ… βœ… Free OSS / Professional $99/mo πŸ”—
OpenHands Dev-focused multi-agent βœ… Web UI βœ… βœ… Cloud βœ… βœ… Free (OSS + Cloud free tier) πŸ”—
OWL (Camel-AI) Distributed multi-agent ❌ βœ… Python ❌ βœ… βœ… Free (OSS) πŸ”—
Manus AI Cloud multi-agent βœ… Web ❌ βœ… ❌ βœ… Free 300 credits/day / $20–$200/mo ❌
n8n Workflow + AI agents βœ… Visual canvas βœ… n8n βœ… Cloud βœ… (Ollama node) βœ… Free OSS / Starter $24/mo / Pro $60/mo πŸ”—
Devin Software engineering βœ… Web ❌ βœ… ❌ ❌ Core $20/mo ($2.25/ACU) / Team $500/seat ❌
Smolagents (HuggingFace) Lightweight code agents ❌ βœ… Python ❌ βœ… ⚠️ Free (OSS) πŸ”—
Dify Visual LLM platform βœ… Web UI βœ… βœ… Cloud βœ… βœ… Free OSS / Cloud plans πŸ”—

Multi-Agent & Parallel Execution Summary

Tools supporting parallel agent orchestration (✅) vs single-agent only (❌):

Category Supports Parallel Agents Tools
Cloud Sandbox ✅ Manus AI, OpenHands, E2B Desktop Sandbox, Cua (trycua), Airtop, Skyvern Cloud, Amazon Nova Act, Perplexity Computer, OpenAI Computer Use (API)
Cloud Sandbox ❌ ChatGPT Agent, Gemini Computer Use, Devin, Convergence Proxy, Project Mariner
Local Machine ✅ Agent TARS, E2B Desktop Sandbox, Cua (trycua)
Local Machine ❌ Claude Computer Use, UI-TARS Desktop, Open Interpreter, Open-Interface, Agent S/S2, UFO, Windows-Use, Bytebot, OpenCUA, Khoj
Browser-Only ✅ Browser-use, Skyvern, Airtop, MultiOn
Browser-Only ❌ Stagehand, NanoBrowser, Openator, Open Operator
Developer Libraries ✅ Browser-use, Skyvern, Cloudflare Browser Run, Langflow, LlamaIndex, Haystack, AgentQL, ScrapeGraphAI, WebVoyager
Developer Libraries ❌ Chrome DevTools MCP, Stagehand, LaVague, Notte, Firecrawl, Playwright MCP
Multi-Agent Platforms ✅ CrewAI, AutoGen, LangGraph, OpenHands, OWL, Manus AI, n8n, Smolagents, Dify
Multi-Agent Platforms ❌ Devin

AI Infrastructure 🏗️

Tools, frameworks, and specialized models for building production AI systems — from embeddings and video generation to safety, evaluation, and model routing.

Embedding & Reranking Models 🧲

Specialized models for converting text (or images) into dense vector representations and for reranking retrieval results. Essential infrastructure for RAG pipelines and semantic search. Prices as of April 2026.

Embedding Models

Model Developer Dimensions Max Tokens Pricing Best For GitHub
text-embedding-3-small OpenAI 1,536 8,191 $0.02/1M tokens Cost-effective English embeddings β€”
text-embedding-3-large OpenAI 3,072 8,191 $0.13/1M tokens Highest-quality English retrieval β€”
Embed v4 Cohere 1,536 128K $0.12/1M (text), $0.47/1M (image) Multimodal text + image RAG β€”
voyage-3-large Voyage AI 256–2,048 (flex) 32K ~$0.18/1M tokens Highest-quality retrieval, long context β€”
jina-embeddings-v3 Jina AI 32–1,024 (flex) 8,192 API pay-per-use Multilingual, task-adaptive (LoRA heads) πŸ”—
BGE-M3 BAAI 1,024 8,192 Free (open-source) Multi-functional: dense + sparse + ColBERT πŸ”—
Nomic Embed v2 (MoE) Nomic AI 256–768 (flex) 512 Free (open-source) Multilingual, MoE efficiency (305M active) πŸ”—
text-embedding-005 Google (Vertex AI) 768 2,048 $0.10/1M tokens GCP-native semantic search β€”
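Whichever model produces the vectors, semantic search reduces to comparing them, most commonly with cosine similarity. A dependency-free sketch (the three-dimensional toy vectors below are illustrative stand-ins, not real model output):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedding-model output.
query = [0.1, 0.9, 0.2]
docs = {
    "doc_a": [0.1, 0.8, 0.3],
    "doc_b": [0.9, 0.1, 0.0],
}
# Rank documents by similarity to the query.
best = max(docs, key=lambda d: cosine_similarity(query, docs[d]))
print(best)  # doc_a
```

In a real pipeline the same comparison runs over thousands of stored vectors, usually inside a vector database; a reranker then re-scores the shortlist with a cross-encoder for higher precision.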

Reranking Models

Model Developer Max Tokens Pricing Best For GitHub
Rerank 4.0 Pro Cohere 32K $1.00/1K queries High-accuracy domain-specific reranking β€”
Rerank 4.0 Fast Cohere 32K $0.50/1K queries Low-latency production reranking β€”
rerank-2.5 Voyage AI 32K API pay-per-use Instruction-following, multilingual β€”
BGE Reranker v2-m3 BAAI 8,192 Free (open-source) Open-source cross-encoder reranking πŸ”—
Jina Reranker v2 Jina AI 8,192 API pay-per-use Multilingual, long-context reranking β€”

Video Generation Models 🎬

Text-to-video and image-to-video generation models for creating short clips from prompts. The field is moving rapidly — resolutions, durations, and pricing change frequently. Specs as of April 2026.

Model Developer Resolution Duration Pricing Open Source Best For GitHub
Sora 2 OpenAI Up to 1080p Up to 20s (Pro) $20–$200/mo via ChatGPT No Cinematic quality, long clips β€”
Veo 3 Google DeepMind 720p–1080p Up to 8s (extendable) ~$0.20–$0.40/s No Native audio + video, realistic physics β€”
Runway Gen-4 / Gen-4.5 Runway Up to 4K Up to 16s $12–$76/mo No Professional creative workflows β€”
Kling 2.0 Kuaishou 1080p Up to 10s Free / $5.99–$66/mo No Budget production, fast turnaround β€”
Pika 2.0 Pika Labs 1080p Up to 5s Free / $8–$58/mo No Social media, creative effects β€”
MiniMax Video-01 MiniMax 720p Up to 6s ~$0.40/video No Strong text-motion responsiveness β€”
HunyuanVideo Tencent 720p–2K Up to 16s Free (self-host; ~60GB VRAM) Yes (Apache 2.0) High per-frame fidelity, long clips πŸ”—
Wan 2.2 (14B) Alibaba 480p–1080p Up to 10s ~$0.10–$0.30/clip (API) Yes (Apache 2.0) Motion quality, VBench #1 benchmark πŸ”—
Mochi 1 Genmo 480p Up to 5.4s @ 30fps Free (open-source) Yes (Apache 2.0) High-quality open text-to-video πŸ”—
LTX Video Lightricks 720p Variable Free (open-source) Yes Fast generation, ComfyUI-native πŸ”—
CogVideoX Zhipu AI / Tsinghua 720p ~6s Free (open-source) Yes (Apache 2.0) Image-to-video quality, LoRA fine-tuning πŸ”—

Speech & TTS Models 🔊

Text-to-speech (TTS) and speech-to-text (STT / ASR) models for voice generation, transcription, and real-time audio. Prices as of April 2026.

Text-to-Speech (TTS)

Model Developer Languages Real-time Open Source Pricing Best For GitHub
ElevenLabs Turbo v2.5 ElevenLabs 29+ Yes No Free – $1,320/mo Best quality (4.8 MOS), instant voice cloning β€”
OpenAI TTS / TTS HD OpenAI 57 Yes No $15 / $30 per 1M chars Enterprise, seamless GPT integration β€”
Sesame CSM Sesame AI Labs English Yes Yes Free Conversational, emotionally expressive (4.7 MOS) πŸ”—
Kokoro-82M Hexgrad Multilingual Yes Yes (Apache 2.0) Free Tiny (82M params), CPU-runnable, near-commercial quality πŸ”—
Fish Audio S1 Fish Audio Multilingual Yes Yes Free / $0.016/1K chars (API) Voice cloning, multilingual fluency πŸ”—
Parler-TTS HuggingFace English No Yes (Apache 2.0) Free Style-controllable via text descriptions πŸ”—
XTTS v2 Coqui AI 17 Yes Yes (MPL 2.0) Free Best open-source multilingual, 6s voice cloning πŸ”—
Bark Suno AI 13+ No Yes (MIT) Free Expressive, non-verbal sounds, long-form audio πŸ”—

Speech-to-Text (STT / ASR)

Model Developer Languages Real-time Open Source Pricing Best For GitHub
Whisper large-v3 OpenAI 100+ No Yes (MIT) $0.006/min (API) Open-source multilingual baseline πŸ”—
GPT-4o Transcribe OpenAI 50+ Yes No $0.006/min High-accuracy managed STT β€”
Deepgram Nova-3 Deepgram 36+ Yes No $0.0043/min Ultra-low latency, production STT β€”
AssemblyAI Universal-2 AssemblyAI Multilingual Yes No $0.0025/min Accurate, feature-rich transcription β€”

AI Safety & Guardrails 🛡️

Tools and frameworks for detecting unsafe content, preventing prompt injection, validating outputs, and enforcing policy compliance in LLM-powered applications. As of April 2026.

Tool Developer Type Open Source Pricing Best For GitHub
Llama Guard 3 Meta Safety classifier (8B LLM) Yes (Meta license) Free / ~$0.02/1M tokens (API) Input/output safety classification, 8 languages πŸ”—
NeMo Guardrails NVIDIA Programmable guardrail toolkit (Colang DSL) Yes (Apache 2.0) Free Dialog safety, policy enforcement, LangChain-native πŸ”—
OpenAI Privacy Filter OpenAI PII detection & redaction Yes (Apache 2.0) Free (OSS) Detects & redacts personal info in text πŸ”—
Guardrails AI Guardrails AI Python validator framework Yes Free (OSS) Output validation, PII detection, hallucination guards πŸ”—
Amazon Bedrock Guardrails AWS Managed safety layer No Pay-per-use (AWS) AWS-native, zero-ops compliance and content filtering β€”
ShieldGemma 2 Google Safety classifier (open weights) Yes (open weights) Free Text safety (2B/9B/27B), image safety (4B) β€”
Rebuff Protect AI Prompt injection detector Yes Free Self-hardening anti-injection using vector memory πŸ”—
Lakera Guard Lakera Managed LLM security API No Free tier + Enterprise Runtime LLM security, <50ms latency, PII + injection β€”
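Most guardrail stacks layer model-based classifiers on top of simple pattern rules. A minimal sketch of the pattern-rule layer for PII redaction (toy regexes covering only emails and US-style phone numbers; this illustrates the technique, not any listed tool's API):

```python
import re

# Deliberately narrow toy patterns; production tools cover many more PII types.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace matched PII spans with [TYPE] placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact_pii("Reach me at jane.doe@example.com or 555-867-5309."))
# Reach me at [EMAIL] or [PHONE].
```

Regex rules are fast and deterministic but brittle; that is why tools like Llama Guard and ShieldGemma add an LLM classifier for context-dependent categories such as toxicity or jailbreak attempts.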

RAG Frameworks 🗂️

Frameworks and libraries for building Retrieval-Augmented Generation (RAG) pipelines — connecting LLMs to external knowledge sources. As of April 2026.

Framework Developer Language Key Features Open Source GitHub
LlamaIndex LlamaIndex Python 160+ data connectors, hybrid search, multi-agent support Yes (MIT) πŸ”—
LangChain LangChain AI Python / JS Chains, agents, memory, 50K+ integrations, LangGraph Yes (MIT) πŸ”—
RAGFlow InfiniFlow Python Visual workflow builder, deep document parsing (PDF/tables) Yes (Apache 2.0) πŸ”—
Haystack deepset Python Modular pipelines, enterprise-grade, built-in monitoring Yes (Apache 2.0) πŸ”—
Verba Weaviate Python No-code UI, Weaviate-native vector search Yes πŸ”—
Mem0 Mem0 AI Python / JS Persistent memory layer, graph memory, session recall Yes (Apache 2.0) πŸ”—
txtai NeuML Python All-in-one semantic search + workflow automation Yes (Apache 2.0) πŸ”—
R2R SciPhi Python Lightweight, low-latency, REST API, production-first Yes (MIT) πŸ”—
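Whatever the framework, the core RAG loop is the same: score documents against the query, keep the top-k, and prepend them to the prompt. A dependency-free sketch using word overlap in place of real embeddings (an illustration of the loop, not a production retriever):

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_rag_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Retrieve the top-k documents and pack them into an LLM prompt."""
    top = sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]
    context = "\n".join(f"- {d}" for d in top)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The warranty covers parts for two years.",
    "Shipping takes three to five business days.",
    "Returns are accepted within thirty days.",
]
print(build_rag_prompt("How long does shipping take?", docs, k=1))
```

The frameworks above replace the toy `score` with embedding similarity (see the embedding models section) and add chunking, caching, reranking, and evaluation around this loop.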

Fine-tuning Platforms ⚙️

Tools and platforms for adapting pre-trained LLMs to specific tasks or domains via supervised fine-tuning, RLHF, LoRA/QLoRA, and related methods. Prices as of April 2026.

Platform Type Supported Models Pricing Best For GitHub
Unsloth OSS library Llama, Mistral, Gemma, Qwen, Phi, + more Free 2–5× faster training, 80% VRAM reduction via custom kernels πŸ”—
Axolotl OSS framework Most Hugging Face models Free Config-as-code (YAML), reproducibility, multi-GPU training πŸ”—
OpenAI Fine-tuning Managed API GPT-4o, GPT-4o-mini, GPT-3.5 Turbo GPT-4o-mini: $0.30/1M training tokens Managed, no infra, direct production deployment β€”
Google Vertex AI Managed cloud Gemini 2.5 Pro/Flash, Gemma 3 Gemini 2.5 Pro: $25/1M training tokens GCP-native, Gemini model access β€”
Predibase / LoRAX Cloud + OSS server Llama, Mistral, 50+ HF models Free tier + per-GPU pricing Multi-adapter serving: many LoRA adapters on one GPU πŸ”—
PEFT Hugging Face All Hugging Face models Free LoRA, QLoRA, prefix tuning, prompt tuning — full HF ecosystem πŸ”—
LLaMA-Factory Community 100+ models Free Web UI, low-code interface, beginner-friendly fine-tuning πŸ”—
torchtune PyTorch Llama, Gemma, Mistral, Phi Free PyTorch-native, composable training recipes πŸ”—

Evaluation & Observability 📊

Tools for tracing LLM calls, evaluating output quality, debugging RAG pipelines, and monitoring production AI systems. Prices as of April 2026.

Tool Developer Type Open Source Pricing Best For GitHub
LangSmith LangChain AI Tracing + evaluation platform No (enterprise self-host) Free (5K traces/mo), paid plans LangChain apps, chain + agent debugging β€”
Braintrust Braintrust Data Eval-first platform Partial (AI proxy OSS) Free (1M spans), enterprise CI/CD evals, dataset management, LLM-as-judge β€”
Helicone Helicone Proxy-based observability Yes Free tier, usage-based Cost tracking, request caching, drop-in API proxy πŸ”—
Arize Phoenix Arize AI OSS tracing + evaluation Yes Free (OSS); Arize Cloud paid RAG debugging, LLM-as-judge, local dev πŸ”—
Langfuse Langfuse Tracing + evaluation Yes (MIT) Free / self-host; cloud paid Open-source, 19K+ GitHub stars, OpenTelemetry πŸ”—
Ragas Ragas RAG evaluation framework Yes Free RAG-specific metrics: faithfulness, recall, precision πŸ”—
DeepEval Confident AI LLM evaluation framework Yes Free (OSS); cloud paid 14+ built-in metrics, pytest-style eval runner πŸ”—

MCP Ecosystem 🔌

The Model Context Protocol (MCP) is an open standard by Anthropic for connecting LLMs to external tools and data sources via a unified JSON-RPC 2.0 interface. It supports STDIO and Streamable HTTP transports. The community directory mcp.so lists 2,000+ servers.

MCP Clients: Claude Desktop, Claude Code, Cursor, Windsurf, VS Code (Copilot), Continue.dev, Zed, LibreChat, and more.
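Under the hood, every MCP interaction is a JSON-RPC 2.0 message. A representative request a client might send to enumerate a server's tools (message shape follows the MCP specification; the `id` value is arbitrary):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list",
  "params": {}
}
```

The server replies with a `result.tools` array describing each tool's name, description, and input schema, which the client then exposes to the model.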

Popular MCP Servers

Tool / Server Developer Category Open Source Best For GitHub
MCP Filesystem Anthropic / Community File I/O Yes (MIT) Read/write local files from any MCP client πŸ”—
MCP GitHub GitHub / Anthropic Code & DevOps Yes Repo management, issues, PRs, code search πŸ”—
MCP Slack Community Messaging Yes Slack workspace read/write interaction πŸ”—
MCP PostgreSQL Community Database Yes Read-only SQL queries against Postgres πŸ”—
MCP Google Drive Community Storage Yes Drive file access and search πŸ”—
MCP Docker Community DevOps Yes Container management and inspection πŸ”—
MCP Brave Search Brave Search Yes Web + local search via Brave API πŸ”—
MCP AWS AWS Labs Cloud Yes (Apache 2.0) AWS service integration πŸ”—
MCP Notion Community Productivity Yes Notion page and database access πŸ”—
FastMCP Community Framework Yes Python framework for building MCP servers fast πŸ”—
Context7 Upstash Dev Tools Yes Up-to-date library docs for AI coding assistants πŸ”—

Agent Skills & Registries 🎯

Modular capability packages that extend AI agents with specialized knowledge, workflows, and procedural instructions β€” without bloating model context.

skills.sh

skills.sh is the primary registry and package manager for Agent Skills — an open standard developed by Anthropic for packaging and distributing reusable agent capabilities. Skills follow a progressive disclosure pattern: agents load only a skill's name and description at startup, then pull full instructions only when a task matches, keeping context overhead minimal.

Install a skill in one command:

npx skills add owner/repo
Feature Detail
Standard Agent Skills (open, SKILL.md format) — developed by Anthropic, hosted on GitHub
Registry URL skills.sh
Total installs 90,989+ all-time
Compatible agents Claude Code, Cursor, Windsurf, VS Code Copilot, Continue.dev, Zed, and any MCP-compatible agent
License Open (skills are author-licensed; spec is open standard)
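A skill is essentially a directory with a SKILL.md file whose YAML frontmatter holds the name and description the agent loads up-front. A minimal hypothetical example (the skill name, body, and steps are invented for illustration; see the Agent Skills spec for exact frontmatter fields):

```markdown
---
name: changelog-writer
description: Drafts CHANGELOG entries from recent git commits in Keep a Changelog style.
---

# Changelog Writer

When the user asks for a changelog entry:

1. Run `git log --oneline` since the last tag.
2. Group commits into Added / Changed / Fixed.
3. Draft the entry and ask for confirmation before writing to CHANGELOG.md.
```

Only the frontmatter counts against startup context; the body is pulled in when a matching task arrives.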

Top Skills by Category

Skill Publisher Category Installs
find-skills vercel-labs/skills Discovery 1.3M
vercel-react-best-practices vercel-labs/agent-skills Frontend 366K
frontend-design anthropics/skills Design 361K
web-design-guidelines vercel-labs/agent-skills Design 291K
microsoft-foundry microsoft/azure-skills Cloud/Azure 286K
azure-ai microsoft/azure-skills AI/Cloud 276K
agent-browser vercel-labs/agent-browser Browser 229K
skill-creator anthropics/skills Meta 180K
browser-use browser-use/browser-use Automation 71.6K
systematic-debugging obra/superpowers Dev 78.5K
test-driven-development obra/superpowers Dev 68.0K
seo-audit coreyhaines31/marketingskills Marketing 95.4K
supabase-postgres-best-practices supabase/agent-skills Database 138K
playwright-best-practices currents-dev/playwright Testing 34.2K

Notable Publisher Ecosystems

Publisher Skills Count Focus
microsoft/azure-skills 19+ Azure cloud, AI, Kubernetes, cost optimization
vercel-labs/agent-skills 15+ React, Next.js, Tailwind, deployment
anthropics/skills 15+ Design, docs, coding, web artifacts
coreyhaines31/marketingskills 20+ SEO, marketing, content, analytics
obra/superpowers 12+ Dev workflows, parallel agents, TDD
firebase/agent-skills 10+ Firebase, Firestore, GenKit
larksuite/cli 13+ Lark workspace automation
pbakaus/impeccable 10+ Design polish, code quality

Model Routers & Load Balancers 🔀

Tools for routing LLM requests across multiple providers, models, and deployments — optimizing for cost, latency, quality, or reliability. Prices as of April 2026.

Tool Developer Key Features Open Source Pricing GitHub
LiteLLM BerriAI 100+ provider support, proxy server, load balancing, fallbacks, spend tracking Yes (MIT) Free (OSS) / $99/mo cloud πŸ”—
Portkey Portkey 250+ LLMs, AI gateway, guardrails, observability, virtual keys Yes (Apache 2.0) Free tier / $49/mo+ πŸ”—
OpenRouter OpenRouter 200+ model catalog, unified API, pay-per-use credit system No ~5% markup on provider cost β€”
RouteLLM LMSys Open-source router (strong vs. weak model) using classifier or matrix factorization Yes Free πŸ”—
Not Diamond Not Diamond Pre-trained + custom task-specific routers, cost/quality tradeoff No Free tier + enterprise β€”
Unify AI Unify Quality / cost / latency-aware routing across 100+ model deployments No Usage-based β€”
Semantic Router Aurelio AI Embedding-based semantic intent routing for agents and pipelines Yes Free πŸ”—
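All of these tools implement some variant of the same pattern: try a preferred model, fall back on failure, and record what happened. A provider-agnostic sketch of that fallback logic (the stub functions are placeholders, not any specific SDK):

```python
from typing import Callable

def route_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # real routers match specific error/timeout types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Stubs standing in for real SDK calls.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("upstream timeout")

def cheap_fallback(prompt: str) -> str:
    return f"echo: {prompt}"

provider, answer = route_with_fallback(
    "hello", [("primary", flaky_primary), ("fallback", cheap_fallback)]
)
print(provider, answer)  # fallback echo: hello
```

Production routers like LiteLLM and Portkey add the hard parts on top of this loop: rate-limit-aware retries, per-key spend tracking, and health-based load balancing across deployments.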

Small Language Models (SLMs) 📱

Compact models designed for on-device inference, edge deployment, low-latency APIs, and resource-constrained environments. Generally defined as models under ~15B parameters. Specs as of April 2026.

Model Developer Params Context License Best For
Phi-4 Microsoft 14B 16K MIT Reasoning, math, code — STEM benchmark leader at class size
Phi-4-mini Microsoft 3.8B 128K MIT On-device STEM reasoning with long context
Phi-4-multimodal Microsoft 5.6B 128K MIT Vision + audio + text multimodal, edge deployment
Gemma 3 27B Google 27B 128K Apache 2.0 Top open model, multilingual (140+ languages)
Gemma 3 4B Google 4B 128K Apache 2.0 CPU inference, 140+ languages, mobile-friendly
Gemma 3 1B Google 1B 32K Apache 2.0 On-device, embedded, ultra-lightweight
SmolLM3 Hugging Face 3B 128K Apache 2.0 Efficient, tool use, multilingual, reasoning
Qwen2.5 3B Alibaba 3B 128K Apache 2.0 Asian and multilingual tasks, coding
Qwen2.5 7B Alibaba 7B 128K Apache 2.0 Strong multilingual baseline, function calling
Llama 3.2 3B Meta 3B 128K Llama 3.2 license General-purpose, on-device, Meta ecosystem
Llama 3.2 1B Meta 1B 128K Llama 3.2 license Lightweight edge inference, distillation target
Granite 3.3 8B IBM 8B 128K Apache 2.0 Enterprise tasks, tool use, business-domain
MiniCPM 3.0 ModelBest / Tsinghua 4B 32K Apache 2.0 Compact yet capable, mobile and edge
Danube 3 500M H2O.ai 500M 8K Apache 2.0 Ultra-lightweight on-device, IoT


Guides 📚

Tutorials, how-tos, and in-depth guides for getting the most out of AI models and tools.

Getting Started 🚀

A beginner-friendly introduction to AI models and how to start using them effectively.

Understanding LLMs

Concept Description
Parameters Model size in billions (B); more parameters generally means more capable
Context Window How much text the model can process at once (128K is standard)
Tokens Basic units of text (~0.75 words per token)
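The ~0.75 words-per-token rule of thumb makes quick capacity math easy. A rough estimator (heuristic only; real tokenizers such as OpenAI's tiktoken give exact counts):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~0.75 words per token => tokens ~= words / 0.75."""
    words = len(text.split())
    return round(words / 0.75)

# Will a 90,000-word document fit in a 128K-token context window?
doc_tokens = estimate_tokens("word " * 90_000)
print(doc_tokens, doc_tokens <= 128_000)  # 120000 True
```

The ratio varies by language and content type (code tokenizes denser than prose), so treat this as a sizing check, not a billing calculation.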

Accessing AI Models

Method Best For Setup Difficulty
Web Interfaces Quick experiments Easiest
API Access Building applications Easy
Self-Hosting Privacy, no API costs Medium-Hard
IDE Integration Daily coding Easy

Model Recommendations by Task

Task Free Option Premium Option
Chat Llama 4 (self-hosted) GPT-5.4, Claude Opus 4.6
Coding DeepSeek-Coder-V2 Claude Opus 4.6
Reasoning DeepSeek-R1 Gemini 3 Deep Think, o3
Long docs Llama 4 Scout Gemini 3 Flash
Vision Llama 4 Maverick GPT-5.4, Gemini 3 Pro

Free Models & APIs for Vibe Coding 💻

Vibe coding — describing what you want in natural language and letting AI generate the code — has exploded in 2026. The ecosystem splits into two tracks: free AI APIs you plug into your own editor/agent, and free vibe coding IDEs/platforms that bundle everything together.

Free AI APIs for Coding

These are the raw API endpoints you can use in tools like Cursor (BYOK), Cline, or any agent framework.

Provider Free Models Daily Limit Best For
Google Gemini API Gemini 2.5 Pro (100 req/day), Gemini 2.5 Flash (250 req/day), Gemini 2.5 Flash-Lite (1,000 req/day) Per-project limits Prototyping, large context (1M tokens), multimodal
Groq Cloud Llama 4 Scout, DeepSeek R1, Qwen3, GPT-OSS ~1,000-14,400 req/day Fast iteration, agentic workflows
OpenRouter 28+ free models including Qwen3 Coder 480B, Devstral 2, MiMo-V2-Flash, DeepSeek R1, GPT-OSS 120B, Llama 3.3 70B Varies by model Experimenting with many models
Cerebras Llama 3.3 70B, Qwen3 32B/235B, GPT-OSS 120B 1M tokens/day Batch tasks, raw speed (20× faster than GPUs)
Mistral AI Codestral-2508, Devstral, Mistral Large, Pixtral 1B tokens/month Code completion, FIM tasks
NVIDIA NIM 91 free endpoints including Chinese models Varies Production inference on DGX Cloud
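Most of these free endpoints are OpenAI-compatible, so one request shape works across them. A sketch that builds the JSON body for a chat completion (the model slug is illustrative; check each provider's catalog, and note you still need an API key when you actually send the request):

```python
import json

def build_chat_payload(model: str, prompt: str, temperature: float = 0.2) -> dict:
    """Build an OpenAI-compatible /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# POST this to e.g. https://openrouter.ai/api/v1/chat/completions
# with an "Authorization: Bearer <key>" header.
payload = build_chat_payload("deepseek/deepseek-r1:free", "Write FizzBuzz in Go.")
print(json.dumps(payload, indent=2))
```

Swapping providers usually means changing only the base URL, the key, and the model slug — which is exactly what makes BYOK setups in Cursor or Cline practical.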

Free Vibe Coding IDEs & Platforms

Tool Type Key Features Best For
Cursor AI IDE Agent Mode, Composer 2, multi-agent workspace Professional development
Cline VS Code Extension Open-source, BYOK/Ollama, MCP tools Self-hosted, unlimited local LLM
Windsurf AI IDE Cascade agent, live browser preview IDE with browser integration
OpenHands Docker Agent Self-hosted, local LLM support, full SDLC Unlimited local development
bolt.diy Browser IDE 19+ LLM providers, Ollama, full-stack apps Free web app building
Open Interpreter CLI Natural language → code, local LLM Simple local automation

Chrome DevTools MCP - Game Changer for Web Dev

Google's Chrome DevTools MCP connects AI agents directly to Chrome for debugging, profiling, and automation:

  • 29 tools across 6 categories (input, navigation, emulation, performance, network, debugging)
  • Run Lighthouse audits, capture performance traces, inspect network requests
  • Works with Claude Code, Cursor, Copilot via MCP
  • Supports BYOK/local LLMs through MCP clients
  • GitHub | 37,783+ stars
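Wiring it into an MCP client is typically a one-entry config. A hypothetical client configuration (the package name `chrome-devtools-mcp` and the exact config keys vary by client; check the repo's README for your editor):

```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}
```

After a restart, the client lists the DevTools tools and the agent can open pages, capture traces, and inspect network requests on demand.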

Cloudflare Browser Run

Managed browser infrastructure for AI agents:

  • Chrome DevTools Protocol (CDP) direct endpoint
  • MCP client support (Claude, Cursor, OpenCode)
  • Session recordings, Live View, WebMCP
  • Free Workers plan / $5+/mo paid
  • Browser Run

Recommendations by Use Case

Free + Local LLM: Cline + Ollama, OpenHands + Qwen3 Coder, bolt.diy + Ollama

Fast API Iteration: Groq (speed) + Cerebras (high limits)

Web Development: Chrome DevTools MCP + Cline (zero-cost debugging)

Many Models: OpenRouter (unified API)

Production Inference: NVIDIA NIM, Cerebras

💡 Pro Tip: Combine Chrome DevTools MCP with a local LLM (Ollama) via Cline for completely free, unlimited AI-powered web development and debugging.

Self-Hosting 🏠

A comprehensive guide to running AI models on your own hardware.

Benefits

Benefit Description
Privacy Data never leaves your infrastructure
Cost Control No per-token API costs for unlimited usage
Customization Fine-tune models for specific needs
No Rate Limits Process as much as hardware allows
Offline Access Work without internet

Quick Start with Ollama

For installation and usage instructions, refer to the official Ollama documentation.

Local GPU Quick Guide

Recommended apps (local-first):

  • Ollama - Simple local runtime with a local HTTP API
  • LM Studio - Desktop UI for downloading and running models locally
  • llama.cpp - Fast local inference (CPU/GPU), great for quantized models
  • Open WebUI - Optional local web UI (pairs well with local runtimes)

If you want “server-style” hosting (advanced):

  • vLLM - High-throughput serving for NVIDIA GPUs
  • SGLang - Structured generation and serving workflows

Practical setup tips:

  1. Install the latest NVIDIA drivers (enable GPU acceleration in your chosen app)
  2. Start with smaller quantized models (Q4 is a common “best default”)
  3. Keep context windows realistic for local hardware (lower context = faster, less memory)
  4. Watch VRAM first, then system RAM; reduce model size or quantization if either saturates
  5. Prefer running locally on localhost and only expose to LAN if you understand firewall rules
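A back-of-the-envelope way to apply tip 4: model weights take roughly (parameters × bits-per-weight) / 8 bytes, plus runtime overhead. A rough calculator (heuristic only; actual usage varies with KV cache size, context length, runtime, and quantization scheme — the 1.2 overhead factor is an assumption):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Very rough VRAM estimate: weight bytes times an overhead factor, in GB."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)

# A 14B model at 4-bit (Q4) quantization:
print(estimate_vram_gb(14, 4))   # 8.4 -> fits a 24 GB consumer GPU
# The same model unquantized at 16-bit:
print(estimate_vram_gb(14, 16))  # 33.6 -> needs a pro GPU
```

This matches the hardware table below: Q4-quantized 7B–14B models sit comfortably in 24 GB of VRAM, while 70B-class models need pro or multi-GPU setups even when quantized.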

Example hardware configurations:

| Hardware | Good starting point | Notes |
| --- | --- | --- |
| Consumer GPU (24 GB VRAM) | 7B–14B quantized | e.g., RTX 4090, RTX 3090 — great for chat/coding |
| Pro GPU (48–80 GB VRAM) | 14B–70B quantized | e.g., A6000, A100 — coding agents, longer contexts |
| Multi-GPU (160+ GB VRAM) | 70B+ quantized | e.g., 2×A100 — larger open-source models |
| CPU-only (32–64 GB RAM) | 7B–14B quantized | Slower but viable for offline chat; keep context moderate |

Deployment Options

| Option | Best For | Pros | Cons |
| --- | --- | --- | --- |
| Local Machine | Personal use | Simple, no latency | Limited hardware |
| Dedicated Server | Team use | Full control | Maintenance |
| Cloud GPU Rental | Experimentation | On-demand | Hourly costs |
| Kubernetes | Enterprise | Scalable | Complex |

Cost Analysis 💰

Comprehensive pricing comparisons and cost calculations.

Pricing Tiers

| Tier | Price Range | Models |
| --- | --- | --- |
| 🆓 Free | $0 | Self-hosted, free tiers |
| 💸 Budget | $0.025 - $0.50/1M | Gemini 3.1 Flash-Lite, GLM-4.7-FlashX, GPT-5.4 nano, Grok 4 Fast |
| 💰 Mid-range | $0.60 - $15.00/1M | GPT-5.4 mini, Claude Haiku 4.5, Kimi K2.5, Sonar, GLM-5, GPT-5.4, Claude Sonnet |
| 💎 Premium | $15.00 - $600.00/1M | GPT-5.4 Pro, Claude Opus, o1-Pro |

Subscription Pricing (Monthly, USD)

AI chat apps

| Product | Plans (USD) | Notes | Official Source |
| --- | --- | --- | --- |
| ChatGPT | Go $8, Plus $20, Pro $200, Business $25/seat (annual) or $30/seat (monthly), Enterprise (contact sales) | Consumer prices are US-listed; Go is localized in some markets | 🔗 |
| Claude | Pro $20, Max $100 (5×) or $200 (20×), Team/Enterprise (see pricing) | Prices shown exclude applicable taxes; availability varies by region | 🔗 |
| Google AI (Gemini) | Plus $7.99, Pro $19.99, Ultra $249.99 | US pricing; some regions/local pricing differ | 🔗 |

Coding assistants

| Tool | Plans (USD) | Notes | Official Source |
| --- | --- | --- | --- |
| GitHub Copilot | Free $0, Pro $10, Pro+ $39, Business $19/user, Enterprise $39/user | Annual options available for Pro/Pro+ | 🔗 |

Model Pricing Comparison

| Model | Input ($/1M) | Output ($/1M) | Cached Input | Best For |
| --- | --- | --- | --- | --- |
| GLM-4.7-FlashX | $0.07 | $0.40 | — | Fast budget tasks |
| Step-3.5-Flash | $0.10 | $0.30 | — | Ultra-fast reasoning (85–350 tok/s) |
| GLM-4-32B-0414-128K | $0.10 | $0.10 | — | Budget chat/coding |
| Llama 4 Maverick | $0.15 | $0.60 | — | Open multimodal (self-host: $0) |
| GPT-5.4 nano | $0.20 | $1.25 | $0.02 | Classification and lightweight subagents |
| Grok 4 Fast | $0.20 | $0.50 | $0.05 | Fast Grok reasoning |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $0.025 | Budget multimodal, fastest Google model |
| DeepSeek-V3.1 | $0.27 | $0.41 | — | General-purpose value across tasks |
| DeepSeek-V3.2 | $0.28 | $0.42 | $0.028 | Budget workhorse, reasoning |
| DeepSeek-V4 | $0.30 | $0.50 | $0.03 | Engram memory, coding (off-peak 50% off) |
| Gemini 3 Flash | $0.30 | $2.50 | $0.05 + $1/hr | Long context |
| MiniMax-M2.5 | $0.30 | $1.20 | Auto (included) | Coding, long context |
| Mistral Large 3 | $0.50 | $1.50 | — | Strong open-source frontier model |
| Kimi K2.5 | $0.60 | $3.00 | Auto (included) | Multimodal + agent tasks |
| GPT-5.4 mini | $0.75 | $4.50 | $0.075 | Fast coding and multimodal tasks |
| Claude Haiku 4.5 | $1.00 | $5.00 | — | Low-latency coding and sub-agents |
| GLM-5 | $1.00 | $3.20 | $0.20 | Agentic engineering |
| Perplexity Sonar | $1.00 | $1.00 | — | Web-grounded chat (request fees apply) |
| GPT-5.3-Codex | $1.75 | $14.00 | $0.175 | Agentic coding, 7+ hour autonomy |
| Gemini 3.1 Pro | $2.00 | $12.00 | $0.20–$0.40 + $4.50/hr | Frontier reasoning |
| Perplexity Sonar Reasoning Pro | $2.00 | $8.00 | — | Reasoning + search (request fees apply) |
| GPT-5.4 | $2.50 | $15.00 | $0.25 | Frontier coding and professional work |
| Grok 4 | $3.00 | $15.00 | $0.75 | First-principles reasoning |
| Perplexity Sonar Pro | $3.00 | $15.00 | — | Higher quality + search (request fees apply) |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $0.30 (hit) | Best coding |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 (hit) | Near-Opus performance |
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 (hit) | Agentic coding |
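Per-1M-token prices translate into per-request cost as follows. The prices in the sketch are taken from the table above (GPT-5.4: $2.50 input / $15.00 output / $0.25 cached input); the token counts are made-up examples.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float,
                 cached_tokens: int = 0, cached_price: float = 0.0) -> float:
    """USD cost of one request, with prices quoted per 1M tokens.

    `cached_tokens` is the portion of the input billed at the cached rate.
    """
    fresh = input_tokens - cached_tokens
    usd = (fresh * in_price + cached_tokens * cached_price
           + output_tokens * out_price) / 1_000_000
    return round(usd, 6)

# GPT-5.4 prices from the table: $2.50 in, $15.00 out, $0.25 cached input
print(request_cost(10_000, 2_000, 2.50, 15.00))               # 0.055
print(request_cost(10_000, 2_000, 2.50, 15.00, 8_000, 0.25))  # 0.037
```

Note how an 80% cache-hit rate on the input cuts this example request's cost by about a third, which is why cached-input pricing matters for agentic loops that resend the same context.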

Self-Hosting vs API (Monthly)

| Usage Level | Self-Host (A100) | API (GPT-5) | Winner |
| --- | --- | --- | --- |
| Light (1M tokens) | $300 (rental) | $10 | API |
| Medium (100M tokens) | $300 | $1,000 | Self-host |
| Heavy (1B tokens) | $300 | $10,000 | Self-host |
| Enterprise (10B+ tokens) | $2,000 (owned) | $100,000+ | Self-host |
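The break-even point behind this table can be computed directly. Assuming a flat monthly hosting cost (the $300/month A100 rental above) and a blended API price per 1M tokens (the table's GPT-5-class figure of $10/1M), self-hosting wins once monthly volume passes `hosting_cost / price_per_million`:

```python
def breakeven_tokens_millions(monthly_hosting_usd: float,
                              api_price_per_million: float) -> float:
    """Monthly token volume (millions) at which self-hosting equals API cost."""
    return monthly_hosting_usd / api_price_per_million

# $300/month A100 rental vs. an API billed at a blended $10 per 1M tokens
print(breakeven_tokens_millions(300, 10))  # 30.0
```

So under these assumptions the crossover sits around 30M tokens/month, consistent with the table: API wins at 1M, self-hosting wins at 100M.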

Reference 📖

Reference materials including glossary, comparison tables, and data sources.

Glossary 📖

Definitions of common terms used throughout the documentation.

A-E

| Term | Definition |
| --- | --- |
| Agent | AI system that autonomously performs tasks and interacts with environments |
| API | Interface for programmatically accessing AI models |
| Attention Mechanism | Neural network component focusing on relevant input parts |
| Benchmark | Standardized test measuring model performance |
| Chain-of-Thought (CoT) | Prompting technique showing step-by-step reasoning |

F-L

| Term | Definition |
| --- | --- |
| Fine-Tuning | Adapting a pre-trained model to specific tasks |
| Frontier Model | State-of-the-art proprietary model |
| GPU | Hardware accelerator essential for ML |
| LLM | Large Language Model |
| LoRA | Low-Rank Adaptation, an efficient fine-tuning method |

M-R

| Term | Definition |
| --- | --- |
| MCP | Model Context Protocol for tool interaction |
| MMLU | Massive Multitask Language Understanding benchmark |
| MoE | Mixture of Experts architecture |
| Multimodal | Processing multiple input types |
| RAG | Retrieval-Augmented Generation |

S-Z

| Term | Definition |
| --- | --- |
| Self-Hosting | Running models on your own infrastructure |
| SLM | Small Language Model |
| SWE-bench | Benchmark for real GitHub issue resolution |
| Token | Basic unit of text processing |
| VRAM | GPU memory for model storage |

Comparison Tables 📊

Side-by-side comparisons of AI models sorted by various criteria.

Sort by Latest Update (Default)

| 🏢 Company | 🤖 Model | 📦 Version | 📅 Release Date | 🔄 Latest Updated | 💻 Coding | 📊 Benchmarks | 💰 Price | 🖥️ Self-Host | 🔗 Official Site |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 🤖 OpenAI | GPT-5 | 5.4 mini | 2026-03-17 00:00 UTC | 2026-03-17 00:00 UTC | ⭐ ✅ | GPQA 87.5% | $0.75 / $4.50 | ❌ | 🔗 |
| 🤖 OpenAI | GPT-5 | 5.4 | 2026-03-05 00:00 UTC | 2026-03-05 00:00 UTC | ⭐ ✅ | GPQA 92.0%, SWE-bench ~80% | $2.50 / $15.00 | ❌ | 🔗 |
| 🌐 Google DeepMind | Gemini | 3.1 Flash-Lite | 2026-03-03 00:00 UTC | 2026-03-03 00:00 UTC | ⭐ ✅ | — | $0.25 / $1.50 | ❌ | 🔗 |
| 🔬 DeepSeek | DeepSeek | V4 | 2026-02-17 00:00 UTC | 2026-02-17 00:00 UTC | ✅ | No public benchmarks | $0.30 / $0.50 | ✅ | 🔗 |
| 🌐 Google DeepMind | Gemini | 3 Deep Think | 2026-02-12 00:00 UTC | 2026-02-12 00:00 UTC | ⭐ ✅ | GPQA ~97%, ARC-AGI-2 84.6%, HLE 48.4% | Ultra subscription | ❌ | 🔗 |
| 🇨🇳 Zhipu AI | GLM | 5 | 2026-02-12 00:00 UTC | 2026-02-12 00:00 UTC | ⭐ ✅ | GPQA 82.0%, SWE-bench 77.8% | $1.00 / $3.20 | ✅ | 🔗 |
| 🤖 Anthropic | Claude | Opus 4.6 | 2026-02-05 00:00 UTC | 2026-02-05 00:00 UTC | ⭐ ✅ | GPQA 91.3%, SWE-bench 80.8% | $5 / $25 | ❌ | 🔗 |
| 🤖 OpenAI | GPT-5 | 5.3-Codex | 2026-02-05 00:00 UTC | 2026-02-05 00:00 UTC | ⭐ ✅ | GPQA 91.5%, SWE-bench Pro 56.8% | $1.75 / $14.00 | ❌ | 🔗 |
| 🌙 Moonshot AI | Kimi | K2.5 | 2026-01-29 00:00 UTC | 2026-02-02 00:00 UTC | ⭐ ✅ | GPQA 87.6%, SWE-bench 76.8% | $0.60 / $3.00 | ❌ | 🔗 |

Release Windows (Month-level)

| 🏢 Company | 🤖 Model | 📅 Release Window | Notes | 🔗 Official Site |
| --- | --- | --- | --- | --- |
| 🧠 MiniMax | MiniMax M2.5 | 2026-02 | $0.30 / $1.20 | 🔗 |
| 🇨🇳 Alibaba/Qwen | Qwen 3.5-Max | 2026-02 | Open-source release window | 🔗 |
| 🌐 Google DeepMind | Gemini 3.1 Flash-Lite | 2026-02 | Budget Gemini model | 🔗 |
| 🌐 Google DeepMind | Gemini 3 Pro | 2026-01 | Tiered pricing | 🔗 |
| 🤖 OpenAI | GPT-5.4 family | 2026-03 | GPT-5.4, GPT-5.4 mini, GPT-5.4 nano | 🔗 |
| 🇫🇷 Mistral AI | Mistral Large 3 | 2025-11 | Apache 2.0 open-source, 123B params | 🔗 |

Sort by Price (Cheapest)

| Rank | Model | Input | Output | License / Access |
| --- | --- | --- | --- | --- |
| 1 | Self-hosted | $0 | $0 | Various |
| 2 | GLM-4.7-Flash | $0 | $0 | Free |
| 3 | GLM-4.7-FlashX | $0.07 | $0.40 | API |
| 4 | GLM-4-32B-0414-128K | $0.10 | $0.10 | API |
| 5 | Yi-Lightning | $0.14 | $0.42 | Apache 2.0 |
| 6 | GPT-5.4 nano | $0.20 | $1.25 | Proprietary |
| 7 | Gemini 3.1 Flash-Lite | $0.25 | $1.50 | Proprietary |
| 8 | DeepSeek-V3.1 | $0.27 | $0.41 | MIT |
| 9 | Gemini 3 Flash | $0.30 | $2.50 | Proprietary |
| 10 | MiniMax-M2.5 | $0.30 | $1.20 | Proprietary |

Sort by Performance (Coding)

| Rank | Model | HumanEval | Self-Host |
| --- | --- | --- | --- |
| 1 | Claude Sonnet 4.5 | ~92% | ❌ |
| 2 | GPT-OSS-120B | ~89% | ✅ |
| 3 | DeepSeek-Coder-V2 | ~92% | ✅ |
| 4 | Qwen3-Coder | ~92% | ✅ |
| 5 | DeepSeek-V3.1 | 82%+ | ✅ |

Sort by Context Window

| Rank | Model | Context | Best For |
| --- | --- | --- | --- |
| 1 | Gemini 3 Flash | 10M | Entire libraries |
| 2 | Llama 4 Scout | 10M | Long-document RAG |
| 3 | Gemini 3 Pro | 1M+ | Research papers |
| 4 | Kimi K2.5 | 256K | Large codebases |

Data Sources 📚

Attribution, verification sources, and methodology.

Primary Sources

| Company | Source | URL |
| --- | --- | --- |
| OpenAI | Official Documentation | openai.com |
| OpenAI | ChatGPT agent release notes | help.openai.com |
| OpenAI | Model release notes | help.openai.com |
| OpenAI | API pricing | platform.openai.com |
| OpenAI | March 2026 model news | openai.com |
| OpenAI | ChatGPT subscriptions (Go/Plus/Pro) | openai.com |
| OpenAI | ChatGPT Business pricing | help.openai.com |
| Anthropic | Claude Documentation | anthropic.com |
| Anthropic | Claude Haiku 4.5 announcement | anthropic.com |
| Anthropic | Claude Pro pricing | anthropic.com |
| Anthropic | Max plan pricing | anthropic.com |
| Google | Gemini Documentation | deepmind.google |
| Google | Gemini API models (Flash-Lite pricing) | ai.google.dev |
| Google | Project Mariner | deepmind.google |
| Google | Google AI plans | one.google.com |
| Google | Google AI Plus pricing | blog.google |
| Google | Google AI Pro pricing | one.google.com |
| Google | Google AI Ultra pricing | blog.google |
| GitHub | Copilot plans & pricing | github.com |
| Zhipu AI (Z.ai) | Developer Documentation | docs.z.ai |
| MiniMax | Developer Documentation | platform.minimax.io |
| MiniMax | Pricing (Pay-as-you-go) | platform.minimax.io |
| Moonshot AI | Developer Documentation | platform.moonshot.ai |
| Moonshot AI | Models & Pricing | platform.moonshot.ai |
| Cohere | Developer Documentation | docs.cohere.com |
| AI21 Labs | Developer Documentation | docs.ai21.com |
| Perplexity | Developer Documentation | docs.perplexity.ai |
| ByteDance (Volcengine) | Developer Documentation | volcengine.com |
| Tencent (Hunyuan) | Cloud Documentation | cloud.tencent.com |
| Baidu (ERNIE) | AI Studio Documentation | ai.baidu.com |
| DeepSeek | Official Website | deepseek.com |
| Meta | Llama Documentation | llama.meta.com |

Benchmark Sources

| Benchmark | Source | Description |
| --- | --- | --- |
| GPQA Diamond | Rein et al. (NYU) | Graduate-level science questions (PhD difficulty) |
| MMLU-Pro | TIGER-Lab | Extended multi-task language understanding |
| Arena Elo | lmarena.ai | Crowdsourced human preference ranking |
| HLE | CAIS / Scale AI | Humanity's Last Exam — expert-level questions |
| SWE-bench Verified | Princeton | Real GitHub issue resolution (human-verified) |
| SWE-bench Pro | Scale AI | More challenging benchmark in the SWE-bench style |
| LiveCodeBench | LiveCodeBench | Live competitive programming problems |
| AIME 2025 | MAA | American Invitational Mathematics Examination |
| ARC-AGI-2 | ARC Prize | Abstract reasoning challenge (fluid intelligence) |
| MMMU / MMMU-Pro | MMMU | Multi-discipline multimodal understanding |
| IFEval | Google Research | Instruction-following evaluation |
| FrontierMath | Epoch AI | Expert-level research mathematics |
| HumanEval | OpenAI | 164 Python programming problems |

Verification Methodology

  1. Primary Source Review - Check official documentation
  2. Cross-Validation - Compare multiple sources
  3. Timestamp Verification - All data includes verification date
  4. Update Tracking - Monitor official channels

Last Updated: 2026-05-04 16:36 UTC
Maintained by: ReadyPixels LLC


Made with ❤️ by ReadyPixels LLC

Star on GitHub
