Awesome AI Models Matrix π§
Research-based list of AI models, development tools, and automation resources. Use it to compare releases, pricing, benchmarks, and deployment options from official sources.
Document Version: 3.0
Last Updated: 2026-05-04 21:38 UTC
Repository: https://github.com/ReadyPixels/AI_Models_Matrix
Comprehensive documentation of Large Language Models (LLMs), Small Language Models (SLMs), and specialized AI models available today.
State-of-the-art proprietary AI models with cutting-edge capabilities from leading AI labs.
Model
Company
Context
GPQA Diamond
Arena Elo
SWE-bench Verified
AIME 2025
Pricing
Verified
GPT-5.5
OpenAI
1M
93.2%
β
β
β
$5.00 / $30.00
2026-04-26
GPT-5.5 Pro
OpenAI
1M
95.1%
β
92.3%
98.5%
$15.00 / $60.00
2026-04-24
Claude Opus 4.7
Anthropic
1M
94.2%
β
87.6%
~95%
$5 / $25
2026-04-26
Claude Sonnet 4.6
Anthropic
1M
89.9%
~1438 (Text) / 1523 (Code)
79.6%
~95%
$3 / $15
2026-04-26
GPT-5.3-Codex
OpenAI
400K
91.5%
β
85.0%
β
$1.75 / $14.00
2026-04-26
Gemini 3.1 Pro
Google
1M
94.3%
1494 (Text) / 1455 (Code)
80.6%
100%
$2 / $12
2026-04-26
Gemini 3 Deep Think
Google
1M+
~97%
β
~58%
β
Ultra subscription
2026-04-26
GLM-5
Zhipu AI
200K
82.0%
~1451 (Text) / 1445 (Code)
77.8%
92.7%
$1.00 / $3.20
2026-04-26
GLM-5.1
Zhipu AI
200K
β
β
~80.4% (est.)
β
$1.00 / $3.20
2026-04-26
MiniMax-M2.5
MiniMax
200K
85.2%
β
80.2%
86.3%
$0.30 / $1.20
2026-04-26
Kimi K2.6
Moonshot AI
256K
90.5%
β
80.2%
96.4%
$0.60 / $3.00
2026-04-26
DeepSeek-V4
DeepSeek
1M
β
β
β
β
$0.30 / $0.50
2026-04-26
DeepSeek-V3.2
DeepSeek
164K
87.1%
β
67.8%
89.3%
$0.28 / $0.42
2026-04-26
Qwen3.5-Max
Alibaba
128K
89.3%
β
76.4%
91.3%
Pay-per-token
2026-04-26
Gemini 3 Pro
Google
1M+
91.9%
1486 (Text) / 1438 (Code)
76.2%
98β100%
Tiered pricing
2026-04-26
Gemini 3 Flash
Google
10M
90.4%
1474 (Text) / 1438 (Code)
78.0%
β
$0.30 / $2.50
2026-04-26
Gemini 3.1 Flash-Lite
Google
1M
86.9%
1432
β
β
$0.25 / $1.50
2026-04-26
GPT-5.4
OpenAI
1M
92.0%
1484 (Text) / 1457 (Code)
~80%
88%
$2.50 / $15.00
2026-04-26
GPT-5.4 mini
OpenAI
400K
87.5%
β
β
β
$0.75 / $4.50
2026-04-26
GPT-5.4 nano
OpenAI
400K
β
β
β
β
$0.20 / $1.25
2026-04-26
Step-3.5-Flash
StepFun
256K
83.1%
β
74.4%
97.3%
Pay-per-token
2026-04-26
Mistral Large 3
Mistral AI
128K
43.9%
β
β
β
$0.50 / $1.50
2026-04-26
Claude Sonnet 4.5
Anthropic
200K
83.4%
β
77.2%
87%
$3 / $15
2026-04-26
Llama 4 Scout
Meta
10M
57.2%
β
β
β
Free (self-host)
2026-04-26
Llama 4 Maverick
Meta
128K
69.8%
β
β
β
Free (self-host)
2026-04-26
Grok 4
xAI
128K
~91.5%
~1493 (Text)
β
100%
$3 / $15
2026-04-26
Grok 4 Fast
xAI
128K
β
β
β
β
$0.20 / $1.50
2026-04-26
Category
#1
#2
#3
Coding
Claude Opus 4.7
GPT-5.5 Pro
GPT-5.5
Reasoning
Gemini 3 Deep Think
GPT-5.5 Pro
Qwen3-Max-Thinking
Open Source
DeepSeek-V4
Qwen3.5-Max
Llama 4
Cost Efficiency
DeepSeek-V3.2
Grok 4 Fast
GLM-4.7-FlashX
Context Window
Gemini 3 Flash (10M)
Llama 4 Scout (10M)
Claude Opus 4.6 (1M)
Model Specifications π
Detailed technical specifications, pricing, and capabilities for all frontier models. Data as of April 2026.
Maximum output tokens per single API request.
Model
Max Output
Context Window
Notes
Claude Opus 4.6
128K (300K via beta)
1M
Extended output via output-128k-2025-02-19 beta header
Claude Opus 4.7
128K (300K via beta)
1M
Extended output via output-128k-2025-02-19 beta header
Claude Sonnet 4.6
64K
1M
β
Claude Sonnet 4.5
64K
200K
β
GPT-5.4
128K
1.05M
β
GPT-5.4 mini
128K
400K
β
GPT-5.4 nano
128K
400K
β
GPT-5.3-Codex
128K
400K
β
Gemini 3.1 Pro
64K
1M
β
Gemini 3 Pro
64K
2M
β
Gemini 3 Flash
64K
1M
β
Gemini 3.1 Flash-Lite
32K
1M
β
DeepSeek-V4
DeepSeek
1M
β
DeepSeek-V3.2
8K / 64K (reasoner)
128K
Reasoner mode unlocks 64K output
Qwen3.5-Max
65K
1M
β
GLM-5
128K
200K
β
GLM-5.1
131K
200K
β
MiniMax-M2.5
131K
1M
β
Kimi K2.6
β
256K
Not publicly specified
Step-3.5-Flash
66K
256K
β
Grok 4
β
256K
Not publicly specified
Grok 4 Fast
30K
2M
β
Mistral Large 3
32K
128K
β
Llama 4 Scout
16K
10M
β
Llama 4 Maverick
16K
1M
β
Discounted pricing tiers for high-volume usage. All prices in USD per million tokens.
Model
Standard Input
Cached Input
Batch Discount
Notes
Claude Opus 4.6
$5.00
$0.50 (hit) / $6.25 (5m write)
50% off
Batch: $2.50 in / $12.50 out
Claude Sonnet 4.6
$3.00
$0.30 (hit) / $3.75 (5m write)
50% off
Batch: $1.50 in / $7.50 out
Claude Sonnet 4.5
$3.00
$0.30 (hit) / $3.75 (5m write)
50% off
Batch: $1.50 in / $7.50 out
GPT-5.4
$2.50
$0.25
50% off
Data residency +10%
GPT-5.4 mini
$0.75
$0.075
50% off
β
GPT-5.4 nano
$0.20
$0.02
50% off
β
GPT-5.3-Codex
$1.75
$0.175
50% off
β
Gemini 3.1 Pro
$2.00
$0.20β$0.40 + $4.50/hr storage
50% off
Tiered by input length
Gemini 3 Flash
$0.50
$0.05 + $1.00/hr storage
50% off
β
Gemini 3.1 Flash-Lite
$0.025
$0.0025 + $0.25/hr storage
50% off
Most affordable Google model
DeepSeek-V4
$0.30
$0.03 (90% off)
Off-peak 50% off
75% discount until 2026-05-05: ~$0.035 in
DeepSeek-V3.2
$0.28
$0.028
β
No formal batch API
Qwen3.5-Max
$0.40
Available
50% off
β
GLM-5 / GLM-5.1
$1.00
$0.20
β
β
Grok 4
$3.00
$0.75
β
β
Grok 4 Fast
$0.20
$0.05
β
β
Mistral Large 3
$0.50
β
β
No formal batch/cache tier
Step-3.5-Flash
$0.10
β
β
β
Output throughput and time-to-first-token from Artificial Analysis and provider benchmarks.
Model
Output Speed (tok/s)
TTFT
Notes
Gemini 3.1 Flash-Lite
~250
~2.1s
Fastest budget Google model
Step-3.5-Flash
85β350
β
Variable by provider; peak ~350 tok/s
Gemini 3 Flash
~193
~4.16s
β
MiniMax-M2.5 Lightning
~100
β
Faster tier
GPT-5.3-Codex
~86
~77.86s
High TTFT due to extended reasoning
Grok 4
~56
~8.96s
β
MiniMax-M2.5 Standard
~50
β
β
Most frontier models (Claude Opus/Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro, etc.) have not yet been benchmarked on Artificial Analysis as of April 2026.
Knowledge cutoff dates β the point after which a model has no training data.
Model
Training Cutoff
Notes
Claude Sonnet 4.6
Jan 2026
Most recent cutoff among frontier models
Claude Opus 4.6
Aug 2025
Reliable knowledge: May 2025
GPT-5.4 / mini / nano
Aug 31, 2025
β
GPT-5.3-Codex
Aug 31, 2025
β
Grok 4 Fast
Jul 2025
β
DeepSeek-V4
May 2025
β
Gemini 3.1 Flash-Lite
Jan 2025
β
Gemini 3.1 Pro / 3 Pro / 3 Flash
Jan 2025
β
Grok 4
~NovβDec 2024
Approximate
DeepSeek-V3.2
Jul 2024
β
Llama 4 Scout / Maverick
Aug 2024
β
DeepSeek-R1
~Oct 2023
Based on base model
Models not listed (Qwen, GLM, MiniMax, Kimi, Step, Mistral): training cutoff not publicly disclosed.
Model
Languages
Details
Qwen3.5-Max
201
Largest language coverage
Llama 4 Scout
200
Pre-training languages
Qwen3-Max-Thinking
119
Qwen3 series
Gemini 3 Flash
100
91.8% MMMLU score across 100 languages
Gemini 3.1 Pro / 3 Pro
100+
β
Gemini 3.1 Flash-Lite
100
91.3% MMMLU score
Llama 4 Maverick
12
Output languages
Claude (all)
Many
English-optimized; broad multilingual
GPT-5.4 (all)
Many
Broad multilingual coverage
DeepSeek (all)
Many
Chinese + English focused
Grok (all)
Many
β
GLM-5 / GLM-5.1
Many
28.5T token training data
Structured Output & Function Calling
All frontier models support structured JSON output and function/tool calling except where noted.
Capability
Supported Models
Not Supported
Structured Output (JSON mode)
All models listed in Frontier table
Gemini 3 Deep Think (no API)
Function Calling / Tool Use
All models listed in Frontier table
Gemini 3 Deep Think (no API)
Gemini 3 Deep Think is available only via Gemini's in-app Think mode β no API access for structured output or function calling.
Provider
API Availability
Cloud Partners
Notes
Anthropic
Global
AWS Bedrock, GCP Vertex AI
US-only inference at 1.1x via inference_geo
OpenAI
Global
Azure OpenAI
Data residency endpoints +10% (post-3/5/26)
Google
Global
Google AI Studio, Vertex AI
Some regional restrictions per Google terms
DeepSeek
Global
Azure (R1 only, select regions)
China-based servers
Alibaba (Qwen)
Global
Alibaba Cloud Model Studio
China-based; globally accessible
Zhipu AI (GLM)
Global
Z.AI API
MIT license enables self-hosting anywhere
MiniMax
Global
MiniMax API
β
Moonshot AI (Kimi)
Global
platform.kimi.ai
MIT open-weight
xAI (Grok)
US-focused
Oracle OCI (East/Midwest/West)
Limited non-US availability
Mistral
Global
Azure AI Foundry, AWS, GCP
β
Meta (Llama)
Global (self-host)
All major cloud providers
Llama 4 Community License
StepFun
Global
HuggingFace
Apache 2.0 open-source
Self-hostable models with permissive licenses or open weights for privacy, cost control, and customization.
Model
Company
Params
Context
License
DeepSeek-V4
DeepSeek
1.6T / 49B active (MoE)
1M
Open Weight
Qwen3.5-Max
Alibaba
397B / 17B active (MoE)
262K
Apache 2.0
Qwen3-Max-Thinking
Alibaba
1T+
128K
Apache 2.0
Qwen3.6-27B
Alibaba
27B dense
262K
Apache 2.0
Qwen3.5-122B
Alibaba
397B / 17B active (MoE)
262K
Apache 2.0
Qwen3.5-27B
Alibaba
27B dense
262K
Apache 2.0
Mistral Large 3
Mistral AI
123B
128K
Apache 2.0
Llama 4 Scout
Meta
109B
10M
Community
Llama 4 Maverick
Meta
400B
128K
Community
GPT-OSS-120B
OpenAI
117B
128K
Apache 2.0
GPT-OSS-20B
OpenAI
21B
128K
Apache 2.0
Qwen3-Coder
Alibaba
480B
262K
Apache 2.0
GLM-5.1
Zhipu AI
754B / 40B active (MoE)
200K
MIT
GLM-4.7
Zhipu AI
400B+ MoE
205K
Open Weight
Gemma 4 31B
Google
31B dense
256K
Apache 2.0
Gemma 4 27B
Google
27B MoE (4B active)
256K
Apache 2.0
Gemma 4 E4B
Google
4B dense
256K
Apache 2.0
Gemma 4 E2B
Google
2B dense
256K
Apache 2.0
Qwen3-Coder 7B
Alibaba
7B dense
128K
Apache 2.0
Qwen 2.5 Coder 32B
Alibaba
32B dense
128K
Apache 2.0
DeepSeek Coder-V2
DeepSeek
236B / 2.4B active
128K
MIT
Step-3.5-flash
StepFun
196B / 11B active (MoE)
256K
Open Weight
Yi-Coder
01.AI
9B/1.5B
128K
Apache 2.0
Lizzy-7B
Flower Labs
7B
β
MIT
MiMo-V2.5
Xiaomi
309B / 15B active
262K
MIT
MiMo-V2.5-Pro
Xiaomi
1.02T / 42B active
1M
MIT
Lizzy-7B
Flower Labs
7B
β
MIT
MiMo-V2.5
Xiaomi
310B (15B active)
1M
MIT
MiMo-V2.5-Pro
Xiaomi
1.02T (42B active)
1M
MIT
Local Inference Tools:
Ollama - Easy local deployment
LM Studio - User-friendly GUI
llama.cpp - Efficient CPU inference
vLLM - High-throughput serving
SGLang - Structured generation
Cloud Deployment:
Hugging Face Inference - Managed deployment
AWS SageMaker - Full control
Google Cloud Vertex - Integrated
RunPod - GPU rental
Specialized AI models optimized for software development tasks.
SWE-bench Verified Leaderboard
Rank
Model
Company
SWE-bench Verified
π₯ #1
GPT-5.5 Pro
OpenAI
92.3%
π₯ #2
GPT-5.5
OpenAI
88.5%
π₯ #3
Claude Opus 4.7
Anthropic
87.6%
#4
Claude Opus 4.6
Anthropic
80.8%
#5
Gemini 3.1 Pro
Google
80.6%
#6
MiniMax-M2.5
MiniMax
80.2%
#7
GPT-5.4
OpenAI
~80%
#8
GPT-5.2
OpenAI
80.0%
#9
Claude Sonnet 4.6
Anthropic
79.6%
#10
Gemini 3 Flash
Google
78.0%
#11
GLM-5
Zhipu AI
77.8%
#12
Claude Sonnet 4.5
Anthropic
77.2%
#13
Kimi K2.6
Moonshot AI
80.2%
Model
Developer
Pricing
Best For
Claude Opus 4.6
Anthropic
$5 / $25 per 1M
Agentic coding, complex tasks
GPT-5.5 Pro
OpenAI
$15.00 / $60.00 per 1M
Highest benchmark coding
GPT-5.3-Codex
OpenAI
$1.75 / $14.00 per 1M
Agentic coding, 7+ hour autonomy
Claude Haiku 4.5
Anthropic
$1 / $5 per 1M
Low-latency coding, sub-agents, computer use
GLM-5-Code
Zhipu AI
$1.20 / $5.00 per 1M
Code generation, refactoring
MiniMax-M2.5
MiniMax
$0.30 / $1.20 per 1M
Code generation, refactoring
Claude Sonnet 4.5
Anthropic
$3 / $15 per 1M
Code review, refactoring
Codestral
Mistral AI
$0.30 / $0.90
Real-time completion
Grok 4 Fast
xAI
$0.20 / $1.50
Most used (50% share)
Open-Source Coding Models
Model
Developer
License
Hardware
GPT-OSS-120B
OpenAI
Apache 2.0
80-160 GB VRAM
Qwen3-Coder
Alibaba
Apache 2.0
160-320 GB VRAM
DeepSeek-Coder-V2
DeepSeek
MIT
48-80 GB VRAM
GLM-4.6
Zhipu AI
Open Weight
80-160 GB VRAM
Phi-4
Microsoft
MIT
24-48 GB VRAM
Models optimized for step-by-step reasoning, mathematical problem-solving, and complex logical inference.
Rank
Model
AIME 2025
ARC-AGI-2
Notes
π₯ #1
GPT-5.5 Pro
100%
78.5%
Highest combined
π₯ #2
Gemini 3.1 Pro
100%
77.1%
Highest combined reasoning
π₯ #3
GPT-5.2
100%
52.9%
No tools needed
#4
Grok 4
100%
β
First-principles reasoning
#5
Claude Opus 4.6
99.8%
68.8%
Near-perfect AIME
#6
Gemini 3 Pro
98β100%
31.1β45.1%
With code execution
#7
Step-3.5-Flash
97.3%
β
Best efficiency ratio
#8
Kimi K2.6
96.4%
β
Strong multimodal reasoning
#9
Claude Sonnet 4.6
~95%
58.3%
Near-Opus performance
#10
GLM-5
92.7%
β
Thinking mode
Model
Type
Context
Pricing
Gemini 3 Deep Think
Reasoning
1M+
Ultra subscription
Qwen3-Max-Thinking
Reasoning/Coding
128K
$1.20 / $6.00
o3 / o1-Pro
Reasoning
128K
$2-150 / $8-600
GPT-5.5 Pro
Reasoning
1M
$15.00 / $60.00
Gemini 3 Pro
General/Multimodal
1M+
$2 / $12
DeepSeek-R1
Reasoning
128K
$0.50 / $2.15
Claude Sonnet 4.5
Hybrid
200K
$3 / $15
GPT-Rosalind
Life Sciences Reasoning
128K
Pay-per-token (Research Preview)
Mathematical Problem Solving : Qwen3-Max-Thinking, GPT-5.5 Pro, Gemini 3 Pro
Scientific Analysis : Claude Opus 4.6, GPT-5.5, Gemini 3 Pro
Strategic Planning : o3/o1-Pro, Claude Sonnet 4.5, DeepSeek-R1
Code Debugging : Claude Sonnet 4.5, GPT-5.3-Codex, DeepSeek-V3.2
Models capable of processing and generating multiple types of content: text, images, audio, and video.
Leading Multimodal Models
Model
Developer
Context
Key Features
GPT-5.4
OpenAI
1M
Unified multimodal, audio
Gemini 3 Pro
Google
1M+
Native multimodal, video
Claude Sonnet 4.5
Anthropic
200K
Document understanding
Llama 4 Maverick
Meta
128K
Open multimodal
Nemotron 3 Nano Omni
NVIDIA
30B (3B active)
Vision, audio, language unified, 9x throughput
Model
MMMU / MMMU-Pro
MathVista
DocVQA
Gemini 3.1 Pro
95% (MMMU-Pro)
β
β
GPT-5.4
94% (MMMU-Pro)
β
β
Gemini 3 Pro
81% (MMMU-Pro)
β
β
Gemini 3 Flash
80% (MMMU-Pro)
β
β
Claude Sonnet 4.5
77.8% (MMMU)
β
β
Llama 4 Maverick
73.4% (MMMU)
β
β
Model
Speech-to-Text
Text-to-Speech
Video Input
Gemini 3 Pro
β
β
β
GPT-5
β
β
β οΈ
Whisper v3
β
β
β
Model
Developer
License
Best For
MAI-Image-2-Efficient
Microsoft
Proprietary
Production-ready quality, 41% lower cost
Flux.1
Black Forest Labs
Apache 2.0
High-fidelity art
Stable Diffusion 3.5
Stability AI
Community License
Fine-tuning
GLM-Image
Zhipu AI (Z.ai)
API
Fast image generation
CogView-4
Zhipu AI (Z.ai)
API
Creative image generation
Firefly AI Assistant
Adobe
Public Beta (2026-04-27)
Creative agent, 60+ tools, Photoshop/Premiere integration
Hardware Requirements π₯οΈ
Comprehensive hardware specifications for self-hosting AI models.
Quick Reference by Model Size
Model
Params
Q4 Size
Min VRAM
Rec VRAM
Min RAM
Phi-4
14B
8 GB
24 GB
48 GB
32 GB
GPT-OSS-20B
21B
12 GB
24 GB
48 GB
32 GB
Llama 4 Scout
109B
66 GB
48 GB
80 GB
96 GB
GPT-OSS-120B
117B
70 GB
80 GB
160 GB
128 GB
DeepSeek-Coder-V2
236B
143 GB
48 GB
80 GB
192 GB
Llama 4 Maverick
400B
242 GB
160 GB
320 GB
320 GB
DeepSeek-V4
671B
404 GB
80 GB
320 GB
512 GB
Qwen3-Max-Thinking
1T+
600+ GB
160 GB
640 GB
768 GB
Consumer/Entry Level (24-48 GB VRAM):
Phi-4, GPT-OSS-20B, Yi-Coder, Qwen2.5-Coder
Recommended GPUs : RTX 3090 (24GB), RTX 4090 (24GB)
Professional (80-160 GB VRAM):
Llama 4 Scout, GPT-OSS-120B, DeepSeek-Coder-V2
Recommended GPUs : A100 80GB, 2x A100 40GB
Enterprise (320+ GB VRAM):
Llama 4 Maverick, GLM-4.7, DeepSeek-V4, Qwen3-Max-Thinking
Recommended GPUs : 4x A100 80GB, 8x A100 80GB
Level
Bits
Size vs FP16
Quality
Use Case
FP16/BF16
16
100%
Best
Training
Q8_0
8
~50%
Excellent
High-quality inference
Q4_K_M
4
~25%
Good
Recommended for deployment
Q3_K_M
3
~19%
Fair
Limited resources
Comprehensive Benchmark Reference π
Detailed benchmark scores across all major evaluations. Scores are percentages (%) unless noted. Arena Elo scores are integers. β = not publicly reported. Data as of April 2026.
Model
GPQA Diamond
MMLU-Pro
Arena Elo (Text)
HLE
SWE-bench Verified
SWE-bench Pro
LiveCodeBench
AIME 2025
ARC-AGI-2
MMMU-Pro
IFEval
FrontierMath
Claude Opus 4.6
91.3%
β
1500
40.0β53.0%
80.8%
β
β
99.8%
68.8%
β
β
β
GPT-5.5
93.2%
β
1495
42.1β55.0%
88.5%
β
β
99.9%
71.2%
β
β
52%
GPT-5.5 Pro
95.1%
96%
1520
48.5β62.0%
92.3%
β
β
100%
78.5%
β
97%
58%
Claude Sonnet 4.6
89.9%
β
~1438
33.2β49.0%
79.6%
β
β
~95%
58.3%
β
β
β
Claude Sonnet 4.5
83.4%
88.0%
β
β
77.2%
β
β
87β100%
β
β
β
β
GPT-5.4
92.0%
94%
1484
36.6β41.6%
~80%
57.7%
84β88%
88%
73.3%
94%
β
50% (Pro)
GPT-5.4 mini
87.5%
β
β
β
β
54.4%
β
β
β
β
β
β
GPT-5.3-Codex
91.5%
β
β
β
β
56.8%
85%
β
β
β
β
β
GPT-5.2
92.4%
β
1479
35.2%
80.0%
55.6%
β
100%
52.9%
β
95.6%
~40.3%
Gemini 3.1 Pro
94.3%
92%
1494
44.4β51.4%
80.6%
54.2β72%
71%
100%
77.1%
95%
95%
β
Gemini 3 Pro
91.9β93.8%
83%
1486
37.5%
76.2%
43.3%
49%
98β100%
31.1β45.1%
81%
88%
38%
Gemini 3 Flash
90.4%
72%
1474
33.7%
78.0%
44%
β
β
β
80%
85%
β
Gemini 3 Deep Think
~97%
81%
β
48.4%
~58%
63%
58%
β
84.6%
β
β
β
DeepSeek-V3.2
87.1%
85.0%
β
25.1%
67.8%
β
β
89.3%
β
β
β
β
DeepSeek-R1
71.5%
84.0%
β
8.5%
49.2%
β
63.5%
70.0%
β
β
β
β
Qwen3.5-Max
89.3%
β
β
β
76.4%
β
β
91.3%
β
79%
β
β
Qwen3-Max-Thinking
86.1%
β
β
26.2%
β
β
β
β
β
β
β
β
GLM-5
82.0%
β
~1451
10.4%
77.8%
β
β
92.7%
β
β
β
β
GLM-5.1
β
β
β
β
~80.4% (est.)
β
β
β
β
β
β
β
Kimi K2.6
90.5%
87.1%
β
31.5β50.2%
80.2%
β
85.0%
96.4%
β
78.5%
β
β
MiniMax-M2.5
85.2%
β
β
β
80.2%
55.4%
β
86.3%
β
β
β
β
Step-3.5-Flash
83.1%
β
β
β
74.4%
β
86.4%
97.3%
β
β
β
β
Grok 4
~91.5%
91.5%
~1493
50.7%
β
β
β
100%
β
β
β
β
Llama 4 Maverick
69.8%
80.5%
β
β
β
β
43.4%
β
β
β
β
β
Llama 4 Scout
57.2%
74.3%
β
β
β
β
32.8%
β
β
β
β
β
FrontierMath is a benchmark of 350 original, exceptionally challenging mathematics problems created by expert mathematicians (Epoch AI). Problems span number theory, analysis, algebraic geometry, and category theory. Tier 4 problems can take research mathematicians multiple days.
Benchmark
Description
Source
GPQA Diamond
Graduate-level science questions (PhD difficulty)
Google Research
MMLU-Pro
Extended multi-task language understanding (harder than MMLU)
TIGER-Lab
Arena Elo
Crowdsourced human preference ranking
lmarena.ai
HLE
Humanity's Last Exam β expert-level questions
Scale AI
SWE-bench Verified
Real GitHub issue resolution (human-verified subset)
SWE-bench
SWE-bench Pro
More challenging subset of SWE-bench
SWE-bench
LiveCodeBench
Live competitive programming problems (not in training data)
LiveCodeBench
AIME 2025
American Invitational Mathematics Examination
MAA
ARC-AGI-2
Abstract reasoning challenge (fluid intelligence)
ARC Prize
MMMU / MMMU-Pro
Multi-discipline multimodal understanding
MMMU
IFEval
Instruction-following evaluation
Google Research
FrontierMath
Expert-level research mathematics (Epoch AI)
Epoch AI
Development Tools π οΈ
AI-powered tools for software development, from IDEs and CLI tools to API providers and IDE extensions.
Integrated Development Environments with built-in AI capabilities.
IDE
Platform
Version
Release Date
Pricing
Key Features
GitHub
Firebase Studio
Web
-
-
Free (3 workspaces, up to 30 with Google Developer Program)
Cloud-based, Gemini, MCP
π
Lingma IDE (ιδΉη΅η )
Windows, macOS
-
-
Free (download)
Built-in agent, MCP tool use, terminal command execution
β
Tonkotsu
Windows, macOS
-
-
Free (during early access)
Team of agents, workflow
π
OpenCode
Windows, macOS, Linux
-
-
Free (OSS)
Terminal, desktop, IDE extension, multi-provider
π
Codex app
Windows
-
2026-03-04 00:00 UTC
Included with Codex plans
Multiple agents, isolated worktrees, reviewable diffs, CLI and IDE interop
π
Visual Studio
Windows, macOS
17.14.12+, 18.1.0+
2026-01-06 00:00 UTC
Free / $250/yr
Gemini 3 Flash integration, faster performance, zero-migration upgrades, real-time profiler agent
β
IntelliJ IDEA
Windows, macOS, Linux
2025.3.2
2026-01
Free / $149/yr
Java 24 support, Kotlin K2 mode, performance and memory improvements
β
IBM Bob
Cross-platform
GA (April 28, 2026)
2026-04-28
Free trial + Enterprise plans
Multi-model orchestration, full SDLC, 45% productivity gain
π
PolyAI ADK
PolyAI
GA (April 22, 2026)
2026-04-22
Enterprise CX
AI-native dev, Cursor/Claude Code integration
π
JAT
Windows, macOS, Linux
-
2026-04-14
Free (MIT)
Self-contained agentic IDE, 20+ parallel agents, task management, unified environment
π
Editor
Platform
Version
Release Date
Pricing
Key Features
GitHub
Zed
macOS, Windows, Linux
0.226.3
2026-03-03 00:00 UTC
Free (OSS) + Copilot $10/mo
Fast, collaboration, Gemini and Claude, Zeta AI, agent thread history, edit prediction providers, self-hosted OpenAI-compatible servers
π
Dyad
Windows, macOS, Linux
-
-
Free (OSS)
Local generation, BYO keys
π
Memex
macOS, Windows
-
-
Freemium (Free + $10/mo)
Agentic, browserβdesktop
π
IDE
Platform
Version
Release Date
Pricing
Autonomous
MCP
GitHub
Cursor
Windows, macOS, Linux
3.2 (May 1, 2026)
2026-05-01 00:00 UTC
Freemium (Free + Pro $19/mo or $39/mo)
β
β
β
Windsurf
Windows, macOS, Linux
2.0.0 (May 3, 2026)
2026-05-03 00:00 UTC
Freemium (Free + Pro)
β
β
β
Trae
macOS, Windows
-
-
Free
β
β
π
PearAI
Windows, macOS, Linux
-
-
Free (OSS)
β
β
π
Void
Windows, macOS, Linux
-
-
Free (OSS)
β
β
π
Kiro
Windows, macOS, Linux
-
-
Free (Preview)
β
β
π
VS Code Agents
Windows, macOS, Linux
Insiders
2026-04-21
Free
β
β
π
IDE
Platform
Version
Release Date
Pricing
Self-Hostable
Best For
GitHub
Replit 3
Web
-
-
Free Starter, Core $20/mo , Pro $100/mo
β
Learning/Prototyping
β
Bolt.new
Web
-
-
Free, Pro $20-25/mo, Teams $30/user/mo
β
Quick apps
β
Bolt.diy
Self-hosted
-
-
Free (MIT), bring your own API
β
Self-hosted
π
Lovable
Web
-
-
Free (5 credits/day), Pro $25/mo, Business $50/mo
β
UI/Full-stack
β
v0
Web
-
-
Free ($5 credits/mo), Premium $20/mo, Teams $30/user
β
React components
β
Gitpod
Web
-
-
Free + Paid
β
Cloud dev environments
β
Rork
Web
-
-
Free & Paid (credits)
β
Mobile apps (iOS/Android)
β
Google Stitch
Web
-
2026-03
Free (Google account, 550 gen/mo)
β
UI design, Figma/React export
β
Google Antigravity
Web
-
-
Google AI Pro / Ultra
Agent-first development with Gemini-powered coding
β
Jules
Web
-
2025-05-20 00:00 UTC
Free beta, higher limits on Google AI Pro / Ultra
Async repo agent, reviewable diffs, GitHub integration
β
Command-line AI tools for autonomous coding and terminal enhancement.
Tool
Platform
Pricing
Key Features
GitHub
Aider
Windows, macOS, Linux
Free
Gold standard, Architect mode, thinking tokens
π
Claude Code 2.2.1+
macOS, Linux, Windows
Free + API
Fast mode for Opus 4.7, simple mode file editing, multi-session support
π
Codex CLI
Windows, macOS, Linux
Included
Sandbox, approval modes
π
Junie CLI
Windows, macOS, Linux
Free (BYOK)
LLM-agnostic, JetBrains IDE integration, MCP
π
Goose
Windows, macOS, Linux
Free (Apache-2.0)
MCP, extensible, desktop app, 25+ providers
π
GPT-Pilot
Windows, macOS, Linux
Free
Full dev team simulation
π
OpenHands
Windows, macOS, Linux
Free
Cloud agents, MCP
π
Mentat
Windows, macOS, Linux
Free
Multi-file coordination
π
SERA
Linux, macOS
Free (Apache 2.0)
Open-source coding agent, 200K synthetic trajectories
π
AI Dev Kit
Cross-platform
Free
59 skills, 33 agents, TDD, security audit, CI/CD
π
Tool
Developer
Pricing
Best For
Gemini CLI
Google
Free
Google ecosystem
Cursor CLI
Cursor
Free tier
Terminal + IDE bridge
Qwen Code
Alibaba
Free
Qwen optimization
Qodo CLI
Qodo
Free tier
Testing and review
CLI Tools by Programming Language
AI coding CLI tools categorized by their primary language support. All tools below accept plain English prompts.
Tool
Primary Languages
Multi-Language
Local LLM
Cloud API
Pricing
GitHub
Aider
Python, JS, TS, Go, Rust, Ruby, Java, C/C++
β
(100+ langs)
β
(Ollama, LM Studio)
β
Free (OSS)
π
Claude Code
All (polyglot)
β
β
Claude API ($3β$15/M)
Free tool / API cost
β
Codex CLI
Python, JS, TS, Bash
β
β
OpenAI API
Free OSS / API cost
π
OpenHands
Python, JS, TS, Go, Rust, Java
β
β
β
Free (OSS)
π
Goose (Block)
Polyglot (25+ providers)
β
β
(Ollama, LM Studio)
β
Free (OSS)
π
Continue
Polyglot (VS Code, JetBrains)
β
β
(Ollama, LM Studio)
β
Free (OSS)
π
Qwen Code
Python, JS, TS, Go, Java
β
β
(Qwen models)
β
Free (OSS)
π
Devstral CLI
Python, JS, TS, Go, Rust
β
β
Mistral API
Free OSS model / API cost
β
OpenCode
Polyglot
β
β
β
Free (OSS)
π
Mentat
Python, JS, TS, Go
β
β
OpenAI API
Free (OSS)
π
Amp (Sourcegraph)
All
β
β
β
Free / Enterprise
β
CLI for Programming Languages & Multiple Use
Purpose-built CLI tools for coding across specific languages or polyglot multi-stack workflows.
Tool
Language Focus
Platform
Pricing
Key Features
GitHub
Aider
Polyglot (Python, JS, TS, Go, Rust, any)
All
Free (BYOK)
Git-native multi-file edits, Architect mode, repo maps, thinking tokens
π
Claude Code
Polyglot
All
Free + API
Computer use, sub-agents, CLAUDE.md skills, Opus 4.7, multi-session
π
Codex CLI
Python, JS, TS
All
Free (OpenAI account)
Sandbox execution, approval modes, OpenAI models
π
OpenHands
Python, JS, TS, Go, Rust
All
Free (OSS)
Full SDLC agent, MCP, local LLM via Ollama
π
Goose
Polyglot
All
Free (Apache-2.0)
25+ providers, MCP, extensible extensions, desktop app
π
Continue
Polyglot
All
Free (OSS)
VS Code + JetBrains, custom models via Ollama/LM Studio
π
Qwen Code
Python, JS, TS, Go
All
Free
Optimized for Qwen3-Coder 480B, Apache 2.0
β
Mentat
Polyglot
All
Free
Multi-file coordination, context-aware diffs
π
AI Dev Kit
Polyglot
All
Free
59 skills, 33 agents, TDD, security audit, CI/CD pipeline
π
Devstral CLI
Python, JS, TS, Go
All
Free (Mistral free tier)
Mistral's open coding model, OpenRouter free access
β
Junie CLI
Polyglot
All
Free (BYOK)
LLM-agnostic, JetBrains IDE integration, MCP
π
SERA
Python, JS, TS
Linux, macOS
Free (Apache 2.0)
Open-source coding agent, 200K synthetic trajectories
π
Tool
Platform
Pricing
Key Features
Warp Terminal
macOS, Linux, Windows
Free
AI Agents, workflow sharing
Fig
macOS, Linux
Free
Autocomplete, AI suggestions
Extensions and plugins that add AI capabilities to existing IDEs.
Universal (Cross-Platform)
Add-on
Platform
Pricing
Context
Best For
GitHub
GitHub Copilot
VS Code, JetBrains, Vim
Free / $10/mo / $39/mo
Large
General coding
β
Supermaven
VS Code, JetBrains, Neovim
Free / $10/mo
1M
Large codebases
β
Codeium
VS Code, JetBrains, Vim
Free / $15/mo / $60/mo
Medium
Free alternative
β
Continue
VS Code, JetBrains
Free (OSS)
Custom
Self-hosted
π
Cody
VS Code, JetBrains, Web
Free (discontinued) / Enterprise Starter $19/mo / Enterprise $59/mo
Enterprise
Code search
π
Tabnine
VS Code, JetBrains, VS, Eclipse
Free / $39/mo
Local
Privacy
β
Tabby
VS Code, JetBrains, Vim, Neovim
Free (OSS)
Self-hosted
Self-hosted code completion
π
Add-on
Pricing
Autonomous
MCP
Best For
GitHub
Codex
Free (with ChatGPT Plus $20/mo or Pro $200/mo)
β
β
OpenAI's official coding agent
π
Cline
Free
β
β
Full agent
π
GitHub Copilot (Agent Mode)
$0 / $10 / $39/mo
β οΈ
β
Guided agent workflows
β
RooCode
Free/Pro
β οΈ
β
Complex tasks
π
Keploy
OSS/Enterprise
β
β
Testing
β
Add-on
Pricing
Claude Agent
Best For
JetBrains AI Assistant
$10/mo (Pro), $249/yr (Ultimate)
β
Deep IDE integration
JetBrains Claude Agent
Included in subscription
β
Native agent
Services for accessing AI models via API.
Provider
Models
Pricing
OpenAI
GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, o3, Codex
Pay-per-token
Anthropic
Claude Opus 4.6, Sonnet 4.6, Haiku 4.5
Pay-per-token
Alibaba Cloud
Qwen3.5-Max, Qwen3-Coder, Qwen3.6-27B
Pay-per-token / Coding Plan $50/mo
Gemini (Google)
Gemini 3.1 Pro, 3 Pro, 3 Flash
Pay-per-token
Z.ai (Zhipu AI)
GLM-5, GLM-5.1, GLM-4.7, GLM-5-Code
Pay-per-token
MiniMax
MiniMax-M2.5/M2.7/M2
Pay-per-token
Cohere
Command, Embed, Rerank
Pay-per-token
AI21 Labs
Jamba
Pay-per-token
Perplexity
Sonar / Sonar Pro / Sonar Reasoning Pro
Pay-per-token + request fees
Moonshot AI
Kimi (kimi-k2.5, kimi-k2-thinking)
Pay-per-token
ByteDance (Volcengine)
Doubao, Seed 1.6/2.0
Pay-per-token
Tencent (Hunyuan)
Hunyuan, Hunyuan-a13b
Pay-per-token
StepFun
Step-3.5-Flash, Step-3.5
Pay-per-token (OpenRouter free)
PaleBlueDot AI
Unified platform, 100+ models
Token-based pricing
Osirus AI
Unified platform + Agent Studio
Free tier + paid plans
Logic
Spec-driven managed agents
Free tier + $49/mo
DeepSeek
DeepSeek-V4/R1
Pay-per-token
Mistral AI
Mistral Large 3, Codestral
Pay-per-token
xAI
Grok-4
Pay-per-token
Unified APIs & Aggregators
Provider
Models
Key Features
OpenRouter
200+
Crypto/fiat, rankings
Hugging Face
Thousands
Serverless inference
Provider
Specialization
Speed
Together AI
Llama/Qwen/Mistral
Fast
Fireworks AI
FireAttention
Low-latency, 6 free models
Groq
LPU
>500 T/s
Cerebras
Wafer-Scale
>2000 T/s
NVIDIA NIM
91 free endpoints, DGX Cloud
20Γ faster than NVIDIA GPU
Provider
Type
Best For
RunPod
GPU Rental
Flexibility, cost-effective fine-tuning & inference
Replicate
Model-as-a-Service
Quick deployment, serverless inference
Vultr
Global Cloud
Hourly GPU instances
Hyperbolic
Decentralized
Crypto/Fiat payments
Cerebrium
Serverless GPU
Python-native ML inference & fine-tuning
Together AI
AI-Native Cloud
Fast, cost-effective inference & fine-tuning for open models
Modal Labs
Serverless GPU
Fine-tuning with LoRA, distributed training
Fireworks AI
Inference & Fine-tuning
Fast inference, RFT for model shaping
Databricks Mosaic AI
Integrated ML Platform
Enterprise fine-tuning, governed serving, RAG
NVIDIA DGX Cloud
Managed AI Training
Co-engineered clusters, maximum ROI for training
Vast.ai
GPU Marketplace
Serverless endpoints, diverse GPU options
DigitalOcean
GPU Droplets
Simple fine-tuning workflows, scalable GPU infrastructure
AI-powered tools for automating browser and desktop tasks.
Tools and frameworks for AI-powered browser automation.
Browser
Platform
Pricing
Open Source
Local AI
Agent/Computer Use
API Access
Multi-Agent
Parallel Sessions
Best For
GitHub
Perplexity Comet
Windows, macOS, iOS, Android
Free / Pro $20/mo
β
β
β
β
β
β
Research + background tasks, voice mode, Computer Max agent
β
ChatGPT Agent Mode
Web, iOS, Android
Plus $20/mo, Pro $200/mo
β
β
β
β
β
β
Full computer use: browse, code, fill forms, book travel
β
Dia
macOS (M1+ / macOS 14+)
Free / Pro $20/mo
β
β
β οΈ
β
β
β
Tab intelligence, Skills, browsing history AI context
β
Google Chrome (Auto Browse)
Windows, macOS, Linux, ChromeOS
Free / Gemini Pro $19.99/mo
β
β
β
β
β
β
Gemini 3 built-in, auto browse agentic tasks (enterprise)
β
Microsoft Edge (Copilot Agent)
Windows, macOS, iOS, Android
Free / Copilot Pro $20/mo
β
β
β
β
β
β
Cross-tab context, voice commands, form automation, bookings
β
Genspark
Web, iOS, Android
Free / Plus $25/mo / Pro $249/mo
β
β
(169 local models)
β
β
β
β
Super Agent, AI slides, AI websites, deep research, Call For Me
β
Brave Leo (AI Browser)
Windows, macOS, Linux, iOS, Android
Free / Premium $14.99/mo
β
(Chromium)
β
(Leo local)
β οΈ
β
β
β
Privacy-first, zero-log AI, Skills, Memories, local models
β
SigmaOS (Airis)
macOS
Free / Pro (subscription)
β
β
β οΈ
β
β
β
NL commands: "Book Airbnb in Iceland", cross-tab AI, YC-backed
β
Opera Neon
Windows, macOS
$19.90/mo
β
β
β
β
β
β
Agentic browsing, Aria assistant, built-in AI tools
β
Opera One (Aria)
Windows, macOS, Linux, iOS, Android
Free
β
β
β οΈ
β
β
β
Built-in Aria AI assistant, sidebar AI tools
β
Firefox (AI Sidebar)
Windows, macOS, Linux, iOS, Android
Free
β
β
β οΈ
β
β
β
AI Controls dashboard (v148+), ChatGPT/Claude/Mistral sidebars
β
BrowserOS
Linux, macOS
Free
β
β
β
β
β
β
Privacy-focused, built-in MCP, agentic
π
Manus AI
Web (Cloud)
Free 300 credits/day / Plus $20/mo / Pro $200/mo
β
β
β
β
β
β
Cloud agent, full computer: code, deploy, files, search
β
Sigma AI Browser
Windows, macOS, Linux
Free / Pro $29/mo
β
β
β
β
β
β
Built-in local AI agent, offline, no tracking
π
Fellou
Windows, macOS
Free 4 tasks/day / Pro $20/mo
β
β
β
β
β
β
Complex multi-step automation, agentic tasks
π
Arc Max
macOS, Windows
Free
β
β
β οΈ
β
β
β
AI-enhanced browsing, pinch-to-summarize, Ask on Page
β
Maxthon
Windows, macOS, iOS, Android
Free / Premium
β
β
β οΈ
β
β
β
MaxAsk AI answers, built-in VPN, ad-blocker, resource sniffer
β
ChatGPT Atlas
macOS
Free (with ChatGPT subscription)
β
β
β
β
β
β
OpenAI integration, macOS computer use overlay
π
AnythingLLM
Windows, macOS, Linux
Free (OSS)
β
β
β οΈ
β
(local API)
β
β
All-in-one desktop AI, document chat, local + API
π
BrowserGPT
iOS, Android
Free / Premium
β
β
β οΈ
β
β
β
Mobile-first AI browser
β
Sidekick Browser
Windows, macOS, Linux
Free / Pro $10/mo
β
β
β
β
β
β
AI assistant, natural language tab management, summarize, automate tasks
β
Extension
Pricing
Free
Multi-Agent
Best For
GitHub
Monica.im
Freemium (Free + ~$9/mo)
β
β
Chrome extension, no browser switch
β
Harpa AI
Free
β
β
Automation recipes
π
MultiOn
Free/Paid
β οΈ
β
Complex tasks
π
NanoBrowser
Free
β
β
Local control, Ollama
π
Neobrowser
Free (OSS)
β
β
Local LLMs via Ollama, privacy-first, Chrome/Edge
β
Open Operator
Free
β
β
Browserbase-powered, open NL browser control
π
Openator
Free (OSS)
β
β
Docker-based headless NL browser agent
π
Library
Language
Pricing
Best For
API Access
Multi-Agent
Parallel Sessions
GitHub
Chrome DevTools MCP
TypeScript
Free (OSS)
AI web debugging, 29 DevTools
β
β
β
π
Cloudflare Browser Run
Cloud API
Free Workers / $5+/mo
CDP + MCP, WebMCP, Live View
β
β
β
π
Browser-use
Python
Free OSS / Cloud $29/mo
Agentic automation, Workflow Use
β
β
β
π
Stagehand
TypeScript/Python
Free (OSS)
Hybrid deterministic + AI, action caching
β
β
β
π
LaVague
Python
Free (OSS)
NL to code
β
β
β
π
Skyvern
Python
Free tier / $29β$149/mo
CV-based automation, Ollama support
β
β
β
π
Notte
Python/Cloud
Free tier / $29/mo+
Deterministic replay, demoβscript
β
β
β
π
Firecrawl
Python / CLI
Free tier / $49/mo+
LLM-powered crawling & scraping
β
β
β
π
Playwright MCP
TypeScript
Free (OSS)
Cross-browser automation, VS Code
β
β
β
π
Langflow
Python
Free (OSS) / Cloud $29/mo
Visual multi-agent & RAG workflows
β
β
β
π
LlamaIndex
Python
Free (OSS) / Cloud $29/mo
Document-heavy RAG, retrieval quality
β
β
β
π
Haystack
Python
Free (OSS) / Cloud $49/mo
Regulated deployments, structured pipelines
β
β
β
π
AgentQL
TypeScript/Python
Free (1K req/mo) / $49/mo / $149/mo
Natural language web querying/automation
β
β
β
π
ScrapeGraphAI
Python
Free OSS / Cloud $29/mo
Natural language web scraping
β
β
β
π
WebVoyager
Python
Free (OSS)
Autonomous web browsing research
β
β
β
π
Service
Platform
Pricing
Best For
GitHub
ChatGPT agent
ChatGPT
Plus / Pro / Team
Guided browser tasks, research, forms, and spreadsheets
β
Project Mariner
Google AI Ultra
Included with Google AI Ultra
Multi-step browser tasks, shopping, and reservations
β
Skyvern Cloud
Cloud API
Paid
Resilient automation
π
Browserbase
Cloud API
Paid
Stealth mode, session recording
β
Autonomous Agents β Plain English Prompts π€
Control a computer or cloud sandbox using plain English text β no coding required. Just describe the task and the agent handles everything: clicking, typing, navigating, running code, and completing multi-step workflows.
Legend: π₯οΈ = runs on your physical computer | βοΈ = cloud/sandbox computer | π = controls a browser | π = multi-agent/parallel | π¬ = simple English prompt | π = free/open-source | π° = paid
βοΈ Cloud Sandbox Computer Use (English Prompts)
These services run in a cloud sandbox (virtual Linux/Windows desktop), control the computer for you, and are driven purely by natural language instructions.
Agent
Interface
Pricing
Multi-Agent
Parallel Sessions
Local LLM
English Prompt
GitHub
Manus AI
Web dashboard
Free (300 credits/day) / Plus $20/mo / Pro $200/mo
β
β
β
β
β
ChatGPT Agent
ChatGPT Web/App
Plus $20/mo / Pro $200/mo
β
β
β
β
β
Gemini Computer Use
API / AI Studio
Gemini Pro $19.99/mo / API metered
β
β
β
β
β
Devin
Web dashboard
Core $20/mo ($2.25/ACU) / Team $500/seat/mo
β
β
β
β
β
OpenHands
Web UI / CLI
Free OSS / Cloud Individual free
β
β
β
(any API)
β
π
E2B Desktop Sandbox
API / SDK
Hobby free / Pro $150/mo
β
β
(via code)
β
β
π
Cua (trycua)
CLI / Python SDK
Free (OSS)
β
β
β
β
π
Airtop
Web dashboard / API
Starter $26/mo (3 sessions) / Pro $80/mo (30 sessions)
β
β
β
β
β
Skyvern Cloud
Web dashboard / API
Free 1K credits / Hobby $29/mo / Pro $149/mo
β
β
β
(Ollama)
β
π
Convergence Proxy
Web / API
Free tier / Pro $20/mo (acquired by Salesforce)
β
β
β
β
β
Amazon Nova Act
API (AWS)
Pay-per-use (AWS pricing)
β
β
β
β
β
Project Mariner
Google AI Ultra
Included ($249.99/mo Ultra plan)
β
β
β
β
β
Perplexity Computer
Web dashboard
Perplexity Pro $20/mo
β
β
β
β
β
OpenAI Computer Use (API)
API / ChatGPT
$15/M input, $60/M output
β
β
β
β
β
π₯οΈ Local Machine / Physical Computer Use
These agents run on your own machine, see your screen, and control your keyboard/mouse β no cloud required.
Agent
Windows
macOS
Linux
Dashboard/UI
CLI
API/LLM
Multi-Agent
Parallel Sessions
Pricing
GitHub
Claude Computer Use
β
β
β
β
β
(API)
Claude API ($3β$15/M)
β
β
Claude API ($3β$15/M tokens)
Commercial
Agent TARS (ByteDance)
β
β
β
β
Web UI
β
npx @agent-tars/cli@latest
Any LLM
β
β
Free (OSS)
π
UI-TARS Desktop (ByteDance)
β
β
β
β
Desktop app
β
UI-TARS-2 model
β
β
Free (OSS)
π
Open Interpreter
β
β
β
β
Web
β
interpreter
Any (OpenAI, Claude, local)
β
β
Free (OSS)
π
Open-Interface
β
β
β
β
β
GPT-4V / any vision LLM
β
β
Free (OSS)
π
Agent S / S2
β
β
β
β
β
Any LLM API
β
β
Free (OSS)
π
UFO (Microsoft)
β
β
β
β
UI
β
GPT-4V / Azure
β
β
Free (OSS)
π
Windows-Use
β
β
β
β
β
Any vision LLM
β
β
Free (OSS)
π
Bytebot
β
β
β
β
(Docker)
β
Any LLM
β
β
Free (OSS)
π
OpenCUA
β
β
β
β
β
Any
β
β
Free (OSS)
π
Khoj
β
β
β
β
Web UI
β
Any (Ollama, LM Studio, OpenAI)
β
β
Free (OSS) / Cloud $10/mo
π
Control a browser with natural language β click, fill forms, scrape, automate. No script writing needed.
Agent
Type
Pricing
Dashboard
CLI
Multi-Agent
Parallel Sessions
Local LLM
GitHub
Browser-use
OSS Python lib + Cloud
Free OSS / Cloud: Free 3 sessions / Dev $29/mo / Business $299/mo
β
Cloud
β
β
β
β
(Ollama)
π
Stagehand
OSS TypeScript
Free (OSS)
β
β
β
β
β
π
NanoBrowser
Chrome extension
Free (OSS)
β
Extension
β
β
β
β
(Ollama)
π
Skyvern
Python / Cloud
Free tier / $29β$149/mo
β
Cloud
β
β
β
β
(Ollama)
π
Openator
Python
Free (OSS)
β
β
β
β
β
π
Open Operator
Web UI
Free
β
β
β
β
β
π
Airtop
Web / API
$26β$80/mo
β
β
β
β
β
β
MultiOn
API / Chrome ext
Free / Paid
β
β
β
β
β
π
π Multi-Agent / Parallel Agent Platforms (Plain English Orchestration)
Coordinate multiple AI agents in parallel to complete complex workflows β driven by plain English goals.
Platform
Type
Dashboard
CLI
Cloud
Local LLM
Parallel
Pricing
GitHub
CrewAI
Multi-agent OSS + Cloud
β
AMP Studio
β
crewai
β
AMP
β
β
Free OSS / Starter $99/mo / Pro $299/mo / Enterprise custom
π
AutoGen (Microsoft)
Multi-agent conversations
β οΈ
β
Python
β
Azure
β
β
Free (OSS) / Azure pay-per-token
π
LangGraph
Stateful agent graphs
β
LangSmith
β
Python
β
Cloud
β
β
Free OSS / Professional $99/mo
π
OpenHands
Dev-focused multi-agent
β
Web UI
β
β
Cloud
β
β
Free (OSS + Cloud free tier)
π
OWL (Camel-AI)
Distributed multi-agent
β
β
Python
β
β
β
Free (OSS)
π
Manus AI
Cloud multi-agent
β
Web
β
β
β
β
Free 300 credits/day / $20β$200/mo
β
n8n
Workflow + AI agents
β
Visual canvas
β
n8n
β
Cloud
β
(Ollama node)
β
Free OSS / Starter $24/mo / Pro $60/mo
π
Devin
Software engineering
β
Web
β
β
β
β
Core $20/mo ($2.25/ACU) / Team $500/seat
β
Smolagents (HuggingFace)
Lightweight code agents
β
β
Python
β
β
β οΈ
Free (OSS)
π
Dify
Visual LLM platform
β
Web UI
β
β
Cloud
β
β
Free OSS / Cloud plans
π
Multi-Agent & Parallel Execution Summary
Tools supporting parallel agent orchestration (β
) vs single-agent only (β):
Category
Supports Parallel Agents
Tools
Cloud Sandbox
β
Manus AI, OpenHands, E2B Desktop Sandbox, Cua (trycua), Airtop, Skyvern Cloud, Amazon Nova Act, Perplexity Computer, OpenAI Computer Use (API)
Cloud Sandbox
β
ChatGPT Agent, Gemini Computer Use, Devin, Convergence Proxy, Project Mariner
Local Machine
β
Agent TARS, E2B Desktop Sandbox, Cua (trycua)
Local Machine
β
Claude Computer Use, UI-TARS Desktop, Open Interpreter, Open-Interface, Agent S/S2, UFO, Windows-Use, Bytebot, OpenCUA, Khoj
Browser-Only
β
Browser-use, Skyvern, Airtop, MultiOn
Browser-Only
β
Stagehand, NanoBrowser, Openator, Open Operator
Developer Libraries
β
Browser-use, Skyvern, Cloudflare Browser Run, Langflow, LlamaIndex, Haystack, AgentQL, ScrapeGraphAI, WebVoyager
Developer Libraries
β
Chrome DevTools MCP, Stagehand, LaVague, Notte, Firecrawl, Playwright MCP
Multi-Agent Platforms
β
CrewAI, AutoGen, LangGraph, OpenHands, OWL, Manus AI, n8n, Smolagents, Dify
Multi-Agent Platforms
β
Devin
AI Infrastructure ποΈ
Tools, frameworks, and specialized models for building production AI systems β from embeddings and video generation to safety, evaluation, and model routing.
Embedding & Reranking Models π§²
Specialized models for converting text (or images) into dense vector representations and for reranking retrieval results. Essential infrastructure for RAG pipelines and semantic search. Prices as of April 2026.
Model
Developer
Dimensions
Max Tokens
Pricing
Best For
GitHub
text-embedding-3-small
OpenAI
1,536
8,191
$0.02/1M tokens
Cost-effective English embeddings
β
text-embedding-3-large
OpenAI
3,072
8,191
$0.13/1M tokens
Highest-quality English retrieval
β
Embed v4
Cohere
1,536
128K
$0.12/1M (text), $0.47/1M (image)
Multimodal text + image RAG
β
voyage-3-large
Voyage AI
256β2,048 (flex)
32K
~$0.18/1M tokens
Highest-quality retrieval, long context
β
jina-embeddings-v3
Jina AI
32β1,024 (flex)
8,192
API pay-per-use
Multilingual, task-adaptive (LoRA heads)
π
BGE-M3
BAAI
1,024
8,192
Free (open-source)
Multi-functional: dense + sparse + ColBERT
π
Nomic Embed v2 (MoE)
Nomic AI
256β768 (flex)
512
Free (open-source)
Multilingual, MoE efficiency (305M active)
π
text-embedding-005
Google (Vertex AI)
768
2,048
$0.10/1M tokens
GCP-native semantic search
β
Model
Developer
Max Tokens
Pricing
Best For
GitHub
Rerank 4.0 Pro
Cohere
32K
$1.00/1K queries
High-accuracy domain-specific reranking
β
Rerank 4.0 Fast
Cohere
32K
$0.50/1K queries
Low-latency production reranking
β
rerank-2.5
Voyage AI
32K
API pay-per-use
Instruction-following, multilingual
β
BGE Reranker v2-m3
BAAI
8,192
Free (open-source)
Open-source cross-encoder reranking
π
Jina Reranker v2
Jina AI
8,192
API pay-per-use
Multilingual, long-context reranking
β
Video Generation Models π¬
Text-to-video and image-to-video generation models for creating short clips from prompts. The field is moving rapidly β resolutions, durations, and pricing change frequently. Specs as of April 2026.
Model
Developer
Resolution
Duration
Pricing
Open Source
Best For
GitHub
Sora 2
OpenAI
Up to 1080p
Up to 20s (Pro)
$20β$200/mo via ChatGPT
No
Cinematic quality, long clips
β
Veo 3
Google DeepMind
720pβ1080p
Up to 8s (extendable)
~$0.20β$0.40/s
No
Native audio + video, realistic physics
β
Runway Gen-4 / Gen-4.5
Runway
Up to 4K
Up to 16s
$12β$76/mo
No
Professional creative workflows
β
Kling 2.0
Kuaishou
1080p
Up to 10s
Free / $5.99β$66/mo
No
Budget production, fast turnaround
β
Pika 2.0
Pika Labs
1080p
Up to 5s
Free / $8β$58/mo
No
Social media, creative effects
β
MiniMax Video-01
MiniMax
720p
Up to 6s
~$0.40/video
No
Strong text-motion responsiveness
β
HunyuanVideo
Tencent
720pβ2K
Up to 16s
Free (self-host; ~60GB VRAM)
Yes (Apache 2.0)
High per-frame fidelity, long clips
π
Wan 2.2 (14B)
Alibaba
480pβ1080p
Up to 10s
~$0.10β$0.30/clip (API)
Yes (Apache 2.0)
Motion quality, VBench #1 benchmark
π
Mochi 1
Genmo
480p
Up to 5.4s @ 30fps
Free (open-source)
Yes (Apache 2.0)
High-quality open text-to-video
π
LTX Video
Lightricks
720p
Variable
Free (open-source)
Yes
Fast generation, ComfyUI-native
π
CogVideoX
Zhipu AI / Tsinghua
720p
~6s
Free (open-source)
Yes (Apache 2.0)
Image-to-video quality, LoRA fine-tuning
π
Text-to-speech (TTS) and speech-to-text (STT / ASR) models for voice generation, transcription, and real-time audio. Prices as of April 2026.
Model
Developer
Languages
Real-time
Open Source
Pricing
Best For
GitHub
ElevenLabs Turbo v2.5
ElevenLabs
29+
Yes
No
Free β $1,320/mo
Best quality (4.8 MOS), instant voice cloning
β
OpenAI TTS / TTS HD
OpenAI
57
Yes
No
$15 / $30 per 1M chars
Enterprise, seamless GPT integration
β
Sesame CSM
Sesame AI Labs
English
Yes
Yes
Free
Conversational, emotionally expressive (4.7 MOS)
π
Kokoro-82M
Hexgrad
Multilingual
Yes
Yes (Apache 2.0)
Free
Tiny (82M params), CPU-runnable, near-commercial quality
π
Fish Audio S1
Fish Audio
Multilingual
Yes
Yes
Free / $0.016/1K chars (API)
Voice cloning, multilingual fluency
π
Parler-TTS
HuggingFace
English
No
Yes (Apache 2.0)
Free
Style-controllable via text descriptions
π
XTTS v2
Coqui AI
17
Yes
Yes (MPL 2.0)
Free
Best open-source multilingual, 6s voice cloning
π
Bark
Suno AI
13+
No
Yes (MIT)
Free
Expressive, non-verbal sounds, long-form audio
π
Speech-to-Text (STT / ASR)
Model
Developer
Languages
Real-time
Open Source
Pricing
Best For
GitHub
Whisper large-v3
OpenAI
100+
No
Yes (MIT)
$0.006/min (API)
Open-source multilingual baseline
π
GPT-4o Transcribe
OpenAI
50+
Yes
No
$0.006/min
High-accuracy managed STT
β
Deepgram Nova-3
Deepgram
36+
Yes
No
$0.0043/min
Ultra-low latency, production STT
β
AssemblyAI Universal-2
AssemblyAI
Multilingual
Yes
No
$0.0025/min
Accurate, feature-rich transcription
β
AI Safety & Guardrails π‘οΈ
Tools and frameworks for detecting unsafe content, preventing prompt injection, validating outputs, and enforcing policy compliance in LLM-powered applications. As of April 2026.
Tool
Developer
Type
Open Source
Pricing
Best For
GitHub
Llama Guard 3
Meta
Safety classifier (8B LLM)
Yes (Meta license)
Free / ~$0.02/1M tokens (API)
Input/output safety classification, 8 languages
π
NeMo Guardrails
NVIDIA
Programmable guardrail toolkit (Colang DSL)
Yes (Apache 2.0)
Free
Dialog safety, policy enforcement, LangChain-native
π
OpenAI Privacy Filter
OpenAI
PII detection & redaction
Yes (Apache 2.0)
Free (OSS)
Detects & redacts personal info in text
π
Guardrails AI
Guardrails AI
Python validator framework
Yes
Free (OSS)
Output validation, PII detection, hallucination guards
π
Amazon Bedrock Guardrails
AWS
Managed safety layer
No
Pay-per-use (AWS)
AWS-native, zero-ops compliance and content filtering
β
ShieldGemma 2
Google
Safety classifier (open weights)
Yes (open weights)
Free
Text safety (2B/9B/27B), image safety (4B)
β
Rebuff
Protect AI
Prompt injection detector
Yes
Free
Self-hardening anti-injection using vector memory
π
Lakera Guard
Lakera
Managed LLM security API
No
Free tier + Enterprise
Runtime LLM security, <50ms latency, PII + injection
β
Frameworks and libraries for building Retrieval-Augmented Generation (RAG) pipelines β connecting LLMs to external knowledge sources. As of April 2026.
Framework
Developer
Language
Key Features
Open Source
GitHub
LlamaIndex
LlamaIndex
Python
160+ data connectors, hybrid search, multi-agent support
Yes (MIT)
π
LangChain
LangChain AI
Python / JS
Chains, agents, memory, 50K+ integrations, LangGraph
Yes (MIT)
π
RAGFlow
InfiniFlow
Python
Visual workflow builder, deep document parsing (PDF/tables)
Yes (Apache 2.0)
π
Haystack
deepset
Python
Modular pipelines, enterprise-grade, built-in monitoring
Yes (Apache 2.0)
π
Verba
Weaviate
Python
No-code UI, Weaviate-native vector search
Yes
π
Mem0
Mem0 AI
Python / JS
Persistent memory layer, graph memory, session recall
Yes (Apache 2.0)
π
txtai
NeuML
Python
All-in-one semantic search + workflow automation
Yes (Apache 2.0)
π
R2R
SciPhi
Python
Lightweight, low-latency, REST API, production-first
Yes (MIT)
π
Fine-tuning Platforms βοΈ
Tools and platforms for adapting pre-trained LLMs to specific tasks or domains via supervised fine-tuning, RLHF, LoRA/QLoRA, and related methods. Prices as of April 2026.
Platform
Type
Supported Models
Pricing
Best For
GitHub
Unsloth
OSS library
Llama, Mistral, Gemma, Qwen, Phi, + more
Free
2β5Γ faster training, 80% VRAM reduction via custom kernels
π
Axolotl
OSS framework
Most Hugging Face models
Free
Config-as-code (YAML), reproducibility, multi-GPU training
π
OpenAI Fine-tuning
Managed API
GPT-4o, GPT-4o-mini, GPT-3.5 Turbo
GPT-4o-mini: $0.30/1M training tokens
Managed, no infra, direct production deployment
β
Google Vertex AI
Managed cloud
Gemini 2.5 Pro/Flash, Gemma 3
Gemini 2.5 Pro: $25/1M training tokens
GCP-native, Gemini model access
β
Predibase / LoRAX
Cloud + OSS server
Llama, Mistral, 50+ HF models
Free tier + per-GPU pricing
Multi-adapter serving: many LoRA adapters on one GPU
π
PEFT
Hugging Face
All Hugging Face models
Free
LoRA, QLoRA, prefix tuning, prompt tuning β full HF ecosystem
π
LLaMA-Factory
Community
100+ models
Free
Web UI, low-code interface, beginner-friendly fine-tuning
π
torchtune
PyTorch
Llama, Gemma, Mistral, Phi
Free
PyTorch-native, composable training recipes
π
Evaluation & Observability π
Tools for tracing LLM calls, evaluating output quality, debugging RAG pipelines, and monitoring production AI systems. Prices as of April 2026.
Tool
Developer
Type
Open Source
Pricing
Best For
GitHub
LangSmith
LangChain AI
Tracing + evaluation platform
No (enterprise self-host)
Free (5K traces/mo), paid plans
LangChain apps, chain + agent debugging
β
Braintrust
Braintrust Data
Eval-first platform
Partial (AI proxy OSS)
Free (1M spans), enterprise
CI/CD evals, dataset management, LLM-as-judge
β
Helicone
Helicone
Proxy-based observability
Yes
Free tier, usage-based
Cost tracking, request caching, drop-in API proxy
π
Arize Phoenix
Arize AI
OSS tracing + evaluation
Yes
Free (OSS); Arize Cloud paid
RAG debugging, LLM-as-judge, local dev
π
Langfuse
Langfuse
Tracing + evaluation
Yes (MIT)
Free / self-host; cloud paid
Open-source, 19K+ GitHub stars, OpenTelemetry
π
Ragas
Ragas
RAG evaluation framework
Yes
Free
RAG-specific metrics: faithfulness, recall, precision
π
DeepEval
Confident AI
LLM evaluation framework
Yes
Free (OSS); cloud paid
14+ built-in metrics, pytest-style eval runner
π
The Model Context Protocol (MCP) is an open standard by Anthropic for connecting LLMs to external tools and data sources via a unified JSON-RPC 2.0 interface. It supports STDIO and Streamable HTTP transports. The official MCP registry at mcp.so lists 2,000+ servers.
MCP Clients: Claude Desktop, Claude Code, Cursor, Windsurf, VS Code (Copilot), Continue.dev, Zed, LibreChat, and more.
Tool / Server
Developer
Category
Open Source
Best For
GitHub
MCP Filesystem
Anthropic / Community
File I/O
Yes (MIT)
Read/write local files from any MCP client
π
MCP GitHub
GitHub / Anthropic
Code & DevOps
Yes
Repo management, issues, PRs, code search
π
MCP Slack
Community
Messaging
Yes
Slack workspace read/write interaction
π
MCP PostgreSQL
Community
Database
Yes
Read-only SQL queries against Postgres
π
MCP Google Drive
Community
Storage
Yes
Drive file access and search
π
MCP Docker
Community
DevOps
Yes
Container management and inspection
π
MCP Brave Search
Brave
Search
Yes
Web + local search via Brave API
π
MCP AWS
AWS Labs
Cloud
Yes (Apache 2.0)
AWS service integration
π
MCP Notion
Community
Productivity
Yes
Notion page and database access
π
FastMCP
Community
Framework
Yes
Python framework for building MCP servers fast
π
Context7
Upstash
Dev Tools
Yes
Up-to-date library docs for AI coding assistants
π
Agent Skills & Registries π―
Modular capability packages that extend AI agents with specialized knowledge, workflows, and procedural instructions β without bloating model context.
skills.sh is the primary registry and package manager for Agent Skills β an open standard developed by Anthropic for packaging and distributing reusable agent capabilities. Skills follow a progressive disclosure pattern: agents load only a skill's name and description at startup, then pull full instructions only when a task matches, keeping context overhead minimal.
Install a skill in one command:
npx skills add owner/repo
Feature
Detail
Standard
Agent Skills (open, SKILL.md format) β developed by Anthropic, hosted on GitHub
Registry URL
skills.sh
Total installs
90,989+ all-time
Compatible agents
Claude Code, Cursor, Windsurf, VS Code Copilot, Continue.dev, Zed, and any MCP-compatible agent
License
Open (skills are author-licensed; spec is open standard)
Skill
Publisher
Category
Installs
find-skills
vercel-labs/skills
Discovery
1.3M
vercel-react-best-practices
vercel-labs/agent-skills
Frontend
366K
frontend-design
anthropics/skills
Design
361K
web-design-guidelines
vercel-labs/agent-skills
Design
291K
microsoft-foundry
microsoft/azure-skills
Cloud/Azure
286K
azure-ai
microsoft/azure-skills
AI/Cloud
276K
agent-browser
vercel-labs/agent-browser
Browser
229K
skill-creator
anthropics/skills
Meta
180K
browser-use
browser-use/browser-use
Automation
71.6K
systematic-debugging
obra/superpowers
Dev
78.5K
test-driven-development
obra/superpowers
Dev
68.0K
seo-audit
coreyhaines31/marketingskills
Marketing
95.4K
supabase-postgres-best-practices
supabase/agent-skills
Database
138K
playwright-best-practices
currents-dev/playwright
Testing
34.2K
Notable Publisher Ecosystems
Publisher
Skills Count
Focus
microsoft/azure-skills
19+
Azure cloud, AI, Kubernetes, cost optimization
vercel-labs/agent-skills
15+
React, Next.js, Tailwind, deployment
anthropics/skills
15+
Design, docs, coding, web artifacts
coreyhaines31/marketingskills
20+
SEO, marketing, content, analytics
obra/superpowers
12+
Dev workflows, parallel agents, TDD
firebase/agent-skills
10+
Firebase, Firestore, GenKit
larksuite/cli
13+
Lark workspace automation
pbakaus/impeccable
10+
Design polish, code quality
Model Routers & Load Balancers π
Tools for routing LLM requests across multiple providers, models, and deployments β optimizing for cost, latency, quality, or reliability. Prices as of April 2026.
Tool
Developer
Key Features
Open Source
Pricing
GitHub
LiteLLM
BerriAI
100+ provider support, proxy server, load balancing, fallbacks, spend tracking
Yes (MIT)
Free (OSS) / $99/mo cloud
π
Portkey
Portkey
250+ LLMs, AI gateway, guardrails, observability, virtual keys
Yes (Apache 2.0)
Free tier / $49/mo+
π
OpenRouter
OpenRouter
200+ model catalog, unified API, pay-per-use credit system
No
~5% markup on provider cost
β
RouteLLM
LMSys
Open-source router (strong vs. weak model) using classifier or matrix factorization
Yes
Free
π
Not Diamond
Not Diamond
Pre-trained + custom task-specific routers, cost/quality tradeoff
No
Free tier + enterprise
β
Unify AI
Unify
Quality / cost / latency-aware routing across 100+ model deployments
No
Usage-based
β
Semantic Router
Aurelio AI
Embedding-based semantic intent routing for agents and pipelines
Yes
Free
π
Small Language Models (SLMs) π±
Compact models designed for on-device inference, edge deployment, low-latency APIs, and resource-constrained environments. Generally defined as models under ~15B parameters. Specs as of April 2026.
Model
Developer
Params
Context
License
Best For
Phi-4
Microsoft
14B
16K
MIT
Reasoning, math, code β STEM benchmark leader at class size
Phi-4-mini
Microsoft
3.8B
128K
MIT
On-device STEM reasoning with long context
Phi-4-multimodal
Microsoft
5.6B
128K
MIT
Vision + audio + text multimodal, edge deployment
Gemma 3 27B
Google
27B
128K
Apache 2.0
Top open model, multilingual (140+ languages)
Gemma 3 4B
Google
4B
128K
Apache 2.0
CPU inference, 140+ languages, mobile-friendly
Gemma 3 1B
Google
1B
32K
Apache 2.0
On-device, embedded, ultra-lightweight
SmolLM3
Hugging Face
3B
128K
Apache 2.0
Efficient, tool use, multilingual, reasoning
Qwen2.5 3B
Alibaba
3B
128K
Apache 2.0
Asian and multilingual tasks, coding
Qwen2.5 7B
Alibaba
7B
128K
Apache 2.0
Strong multilingual baseline, function calling
Llama 3.2 3B
Meta
3B
128K
Llama 3.2 license
General-purpose, on-device, Meta ecosystem
Llama 3.2 1B
Meta
1B
128K
Llama 3.2 license
Lightweight edge inference, distillation target
Granite 3.3 8B
IBM
8B
128K
Apache 2.0
Enterprise tasks, tool use, business-domain
MiniCPM 3.0
ModelBest / Tsinghua
4B
32K
Apache 2.0
Compact yet capable, mobile and edge
Danube 3 500M
H2O.ai
500M
8K
Apache 2.0
Ultra-lightweight on-device, IoT
Notable GitHub repos:
Tutorials, how-tos, and in-depth guides for getting the most out of AI models and tools.
A beginner-friendly introduction to AI models and how to start using them effectively.
Concept
Description
Parameters
Size of model (B = billions). More = more capable
Context Window
How much text model can process (128K standard)
Tokens
Basic units of text (~0.75 words per token)
Method
Best For
Setup Difficulty
Web Interfaces
Quick experiments
Easiest
API Access
Building applications
Easy
Self-Hosting
Privacy, no API costs
Medium-Hard
IDE Integration
Daily coding
Easy
Model Recommendations by Task
Task
Free Option
Premium Option
Chat
Llama 4 (self-hosted)
GPT-5.4, Claude Opus 4.6
Coding
DeepSeek-Coder-V2
Claude Opus 4.6
Reasoning
DeepSeek-R1
Gemini 3 Deep Think, o3
Long docs
Llama 4 Scout
Gemini 3 Flash
Vision
Llama 4 Maverick
GPT-5.4, Gemini 3 Pro
Free Models & APIs for Vibe Coding π»
Vibe coding β describing what you want in natural language and letting AI generate the code β has exploded in 2026. The ecosystem splits into two tracks: free AI APIs you plug into your own editor/agent, and free vibe coding IDEs/platforms that bundle everything together.
These are the raw API endpoints you can use in tools like Cursor (BYOK), Cline, or any agent framework.
Provider
Free Models
Daily Limit
Best For
Google Gemini API
Gemini 2.5 Pro (100 req/day), Gemini 2.5 Flash (250 req/day), Gemini 2.5 Flash-Lite (1,000 req/day)
Per-project limits
Prototyping, large context (1M tokens), multimodal
Groq Cloud
Llama 4 Scout, DeepSeek R1, Qwen3, GPT-OSS
~1,000-14,400 req/day
Fast iteration, agentic workflows
OpenRouter
28+ free models including Qwen3 Coder 480B, Devstral 2, MiMo-V2-Flash, DeepSeek R1, GPT-OSS 120B, Llama 3.3 70B
Varies by model
Experimenting with many models
Cerebras
Llama 3.3 70B, Qwen3 32B/235B, GPT-OSS 120B
1M tokens/day
Batch tasks, raw speed (20Γ faster than GPUs)
Mistral AI
Codestral-2508, Devstral, Mistral Large, Pixtral
1B tokens/month
Code completion, FIM tasks
NVIDIA NIM
91 free endpoints including Chinese models
Varies
Production inference on DGX Cloud
Free Vibe Coding IDEs & Platforms
Tool
Type
Key Features
Best For
Cursor
AI IDE
Agent Mode, Composer 2, multi-agent workspace
Professional development
Cline
VS Code Extension
Open-source, BYOK/Ollama, MCP tools
Self-hosted, unlimited local LLM
Windsurf
AI IDE
Cascade agent, live browser preview
IDE with browser integration
OpenHands
Docker Agent
Self-hosted, local LLM support, full SDLC
Unlimited local development
bolt.diy
Browser IDE
19+ LLM providers, Ollama, full-stack apps
Free web app building
Open Interpreter
CLI
Natural language β code, local LLM
Simple local automation
Chrome DevTools MCP - Game Changer for Web Dev
Google's Chrome DevTools MCP connects AI agents directly to Chrome for debugging, profiling, and automation:
29 tools across 6 categories (input, navigation, emulation, performance, network, debugging)
Run Lighthouse audits, capture performance traces, inspect network requests
Works with Claude Code, Cursor, Copilot via MCP
Supports BYOK/local LLMs through MCP clients
GitHub | 37,783+ stars
Managed browser infrastructure for AI agents:
Chrome DevTools Protocol (CDP) direct endpoint
MCP client support (Claude, Cursor, OpenCode)
Session recordings, Live View, WebMCP
Free Workers plan / $5+/mo paid
Browser Run
Recommendations by Use Case
Free + Local LLM: Cline + Ollama, OpenHands + Qwen3 Coder, bolt.diy + Ollama
Fast API Iteration: Groq (speed) + Cerebras (high limits)
Web Development: Chrome DevTools MCP + Cline (zero-cost debugging)
Many Models: OpenRouter (unified API)
Production Inference: NVIDIA NIM, Cerebras
π‘ Pro Tip: Combine Chrome DevTools MCP with a local LLM (Ollama) via Cline for completely free, unlimited AI-powered web development and debugging.
A comprehensive guide to running AI models on your own hardware.
Benefit
Description
Privacy
Data never leaves your infrastructure
Cost Control
No per-token API costs for unlimited usage
Customization
Fine-tune models for specific needs
No Rate Limits
Process as much as hardware allows
Offline Access
Work without internet
For installation and usage instructions, refer to the official Ollama documentation .
Recommended apps (local-first):
Ollama - Simple local runtime with a local HTTP API
LM Studio - Desktop UI for downloading and running models locally
llama.cpp - Fast local inference (CPU/GPU), great for quantized models
Open WebUI - Optional local web UI (pairs well with local runtimes)
If you want βserver-styleβ hosting (advanced):
vLLM - High-throughput serving for NVIDIA GPUs
SGLang - Structured generation and serving workflows
Practical setup tips:
Install the latest NVIDIA drivers (enable GPU acceleration in your chosen app)
Start with smaller quantized models (Q4 is a common βbest defaultβ)
Keep context windows realistic for local hardware (lower context = faster, less memory)
Watch VRAM first, then system RAM; reduce model size or quantization if either saturates
Prefer running locally on localhost and only expose to LAN if you understand firewall rules
Example hardware configurations:
Hardware
Good starting point
Notes
Consumer GPU (24 GB VRAM)
7Bβ14B quantized
e.g., RTX 4090, RTX 3090 β great for chat/coding
Pro GPU (48β80 GB VRAM)
14Bβ70B quantized
e.g., A6000, A100 β coding agents, longer contexts
Multi-GPU (160+ GB VRAM)
70B+ quantized
e.g., 2ΓA100 β larger open-source models
CPU-only (32β64 GB RAM)
7Bβ14B quantized
Slower but viable for offline chat; keep context moderate
Option
Best For
Pros
Cons
Local Machine
Personal use
Simple, no latency
Limited hardware
Dedicated Server
Team use
Full control
Maintenance
Cloud GPU Rental
Experimentation
On-demand
Hourly costs
Kubernetes
Enterprise
Scalable
Complex
Comprehensive pricing comparisons and cost calculations.
Tier
Price Range
Models
π Free
$0
Self-hosted, free tiers
πΈ Budget
$0.025 - $0.50/1M
Gemini 3.1 Flash-Lite, GLM-4.7-FlashX, GPT-5.4 nano, Grok 4 Fast
π° Mid-range
$0.60 - $15.00/1M
GPT-5.4 mini, Claude Haiku 4.5, Kimi K2.5, Sonar, GLM-5, GPT-5.4, Claude Sonnet
π Premium
$15.00 - $600.00/1M
GPT-5.4 Pro, Claude Opus, o1-Pro
Subscription Pricing (Monthly, USD)
AI chat apps
Product
Plans (USD)
Notes
Official Source
ChatGPT
Go $8 , Plus $20 , Pro $200 , Business $25/seat (annual) or $30/seat (monthly), Enterprise (contact sales)
Consumer prices are US-listed; Go is localized in some markets
π
Claude
Pro $20 , Max $100 (5Γ) or $200 (20Γ), Team/Enterprise (see pricing)
Prices shown exclude applicable taxes; availability varies by region
π
Google AI (Gemini)
Plus $7.99 , Pro $19.99 , Ultra $249.99
US pricing; some regions/local pricing differ
π
Coding assistants
Tool
Plans (USD)
Notes
Official Source
GitHub Copilot
Free $0 , Pro $10 , Pro+ $39 , Business $19/user , Enterprise $39/user
Annual options available for Pro/Pro+
π
Model
Input
Output
Cached Input
Best For
GLM-4.7-FlashX
$0.07
$0.40
β
Fast budget tasks
Step-3.5-Flash
$0.10
$0.30
β
Ultra-fast reasoning (85β350 tok/s)
GLM-4-32B-0414-128K
$0.10
$0.10
β
Budget chat/coding
Llama 4 Maverick
$0.15
$0.60
β
Open multimodal (self-host: $0)
GPT-5.4 nano
$0.20
$1.25
$0.02
Classification and lightweight subagents
Grok 4 Fast
$0.20
$0.50
$0.05
Fast Grok reasoning
Gemini 3.1 Flash-Lite
$0.025
$1.50
$0.0025
Budget multimodal, fastest Google model
DeepSeek-V3.1
$0.27
$0.41
β
Everything
DeepSeek-V3.2
$0.28
$0.42
$0.028
Budget workhorse, reasoning
DeepSeek-V4
$0.30
$0.50
$0.03
Engram memory, coding (off-peak 50% off)
Gemini 3 Flash
$0.30
$2.50
$0.05 + $1/hr
Long context
MiniMax-M2.5
$0.30
$1.20
Auto (included)
Coding, long context
Mistral Large 3
$0.50
$1.50
β
Strong open-source frontier model
Kimi K2.5
$0.60
$3.00
Auto (included)
Multimodal + agent tasks
GPT-5.4 mini
$0.75
$4.50
$0.075
Fast coding and multimodal tasks
Claude Haiku 4.5
$1.00
$5.00
β
Low-latency coding and sub-agents
GLM-5
$1.00
$3.20
$0.20
Agentic engineering
Perplexity Sonar
$1.00
$1.00
β
Web-grounded chat (request fees apply)
GPT-5.3-Codex
$1.75
$14.00
$0.175
Agentic coding, 7+ hour autonomy
Gemini 3.1 Pro
$2.00
$12.00
$0.20β$0.40 + $4.50/hr
Frontier reasoning
Perplexity Sonar Reasoning Pro
$2.00
$8.00
β
Reasoning + search (request fees apply)
GPT-5.4
$2.50
$15.00
$0.25
Frontier coding and professional work
Grok 4
$3.00
$15.00
$0.75
First-principles reasoning
Perplexity Sonar Pro
$3.00
$15.00
β
Higher quality + search (request fees apply)
Claude Sonnet 4.5
$3.00
$15.00
$0.30 (hit)
Best coding
Claude Sonnet 4.6
$3.00
$15.00
$0.30 (hit)
Near-Opus performance
Claude Opus 4.6
$5.00
$25.00
$0.50 (hit)
Agentic coding
Self-Hosting vs API (Monthly)
Usage Level
Self-Host (A100)
API (GPT-5)
Winner
Light (1M tokens)
$300 (rental)
$10
API
Medium (100M tokens)
$300
$1,000
Self-host
Heavy (1B tokens)
$300
$10,000
Self-host
Enterprise (10B+ tokens)
$2,000 (owned)
$100,000+
Self-host
Reference materials including glossary, comparison tables, and data sources.
Definitions of common terms used throughout the documentation.
Term
Definition
Agent
AI system that autonomously performs tasks and interacts with environments
API
Interface for programmatically accessing AI models
Attention Mechanism
Neural network component focusing on relevant input parts
Benchmark
Standardized test measuring model performance
Chain-of-Thought (CoT)
Prompting technique showing step-by-step reasoning
Term
Definition
Fine-Tuning
Adapting pre-trained model to specific tasks
Frontier Model
State-of-the-art proprietary model
GPU
Hardware accelerator essential for ML
LLM
Large Language Model
LoRA
Efficient fine-tuning method
Term
Definition
MCP
Model Context Protocol for tool interaction
MMLU
Massive Multitask Language Understanding benchmark
MoE
Mixture of Experts architecture
Multimodal
Processing multiple input types
RAG
Retrieval-Augmented Generation
Term
Definition
Self-Hosting
Running models on own infrastructure
SLM
Small Language Model
SWE-bench
Benchmark for real GitHub issue resolution
Token
Basic unit of text processing
VRAM
GPU memory for model storage
Side-by-side comparisons of AI models sorted by various criteria.
Sort by Latest Update (Default)
π’ Company
π€ Model
π¦ Version
π
Release Date
π Latest Updated
π» Coding
π Benchmarks
π° Price
π₯οΈ Self-Host
π Official Site
π€ OpenAI
GPT-5
5.4 mini
2026-03-17 00:00 UTC
2026-03-17 00:00 UTC β
β
GPQA 87.5%
$0.75 / $4.50
β
π
π€ OpenAI
GPT-5
5.4
2026-03-05 00:00 UTC
2026-03-05 00:00 UTC β
β
GPQA 92.0%, SWE-bench ~80%
$2.50 / $15.00
β
π
π Google DeepMind
Gemini 3.1
Flash-Lite
2026-03-03 00:00 UTC
2026-03-03 00:00 UTC β
β
β
$0.25 / $1.50
β
π
π¬ DeepSeek
DeepSeek
V4
2026-02-17 00:00 UTC
2026-02-17 00:00 UTC
β
No public benchmarks
Pay-per-token
β
π
π Google DeepMind
Gemini 3
Deep Think
2026-02-12 00:00 UTC
2026-02-12 00:00 UTC β
β
GPQA ~97%, ARC-AGI-2 84.6%, HLE 48.4%
Ultra subscription
β
π
π¨π³ Zhipu AI
GLM
5
2026-02-12 00:00 UTC
2026-02-12 00:00 UTC β
β
GPQA 82.0%, SWE-bench 77.8%
$1.00 / $3.20
β
π
π€ Anthropic
Claude
Opus 4.6
2026-02-05 00:00 UTC
2026-02-05 00:00 UTC β
β
GPQA 91.3%, SWE-bench 80.8%
$5 / $25
β
π
π€ OpenAI
GPT-5
5.3-Codex
2026-02-05 00:00 UTC
2026-02-05 00:00 UTC β
β
GPQA 91.5%, SWE-bench Pro 56.8%
TBD
β
π
π Moonshot AI
Kimi
K2.5
2026-01-29 00:00 UTC
2026-02-02 00:00 UTC β
β
GPQA 87.6%, SWE-bench 76.8%
$0.60 / $3.00
β
π
Release Windows (Month-level)
π’ Company
π€ Model
π
Release Window
Notes
π Official Site
π§ MiniMax
MiniMax M2.5
2026-02
$0.30 / $1.20
π
π¨π³ Alibaba/Qwen
Qwen 3.5-Max
2026-02
Open-source release window
π
π Google DeepMind
Gemini 3.1 Flash-Lite
2026-02
Budget Gemini model
π
π Google DeepMind
Gemini 3 Pro
2026-01
Tiered pricing
π
π€ OpenAI
GPT-5.4 family
2026-03
GPT-5.4, GPT-5.4 mini, GPT-5.4 nano
π
π«π· Mistral AI
Mistral Large 3
2025-11
Apache 2.0 open-source, 123B params
π
Rank
Model
Input
Output
License
1
Self-hosted
$0
$0
Various
2
GLM-4.7-Flash
$0
$0
Free
3
GLM-4.7-FlashX
$0.07
$0.40
API
4
GLM-4-32B-0414-128K
$0.10
$0.10
API
5
Yi-Lightning
$0.14
$0.42
Apache 2.0
6
GPT-5.4 nano
$0.20
$1.25
Proprietary
7
Gemini 3.1 Flash-Lite
$0.025
$1.50
Proprietary
8
DeepSeek-V3.1
$0.27
$0.41
MIT
9
Gemini 3 Flash
$0.30
$2.50
Proprietary
10
MiniMax-M2.5
$0.30
$1.20
Proprietary
Sort by Performance (Coding)
Rank
Model
HumanEval
Self-Host
1
Claude Sonnet 4.5
~92%
β
2
GPT-OSS-120B
~89%
β
3
DeepSeek-Coder-V2
~92%
β
4
Qwen3-Coder
~92%
β
5
DeepSeek-V3.1
82%+
β
Rank
Model
Context
Best For
1
Gemini 3 Flash
10M
Entire libraries
2
Llama 4 Scout
10M
Long-document RAG
3
Gemini 3 Pro
1M+
Research papers
4
Kimi K2.5
256K
Large codebases
Attribution, verification sources, and methodology.
Benchmark
Source
Description
GPQA Diamond
Google Research
Graduate-level science questions (PhD difficulty)
MMLU-Pro
TIGER-Lab
Extended multi-task language understanding
Arena Elo
lmarena.ai
Crowdsourced human preference ranking
HLE
Scale AI
Humanity's Last Exam β expert-level questions
SWE-bench Verified
Princeton
Real GitHub issue resolution (human-verified)
SWE-bench Pro
Princeton
More challenging subset of SWE-bench
LiveCodeBench
LiveCodeBench
Live competitive programming problems
AIME 2025
MAA
American Invitational Mathematics Examination
ARC-AGI-2
ARC Prize
Abstract reasoning challenge (fluid intelligence)
MMMU / MMMU-Pro
MMMU
Multi-discipline multimodal understanding
IFEval
Google Research
Instruction-following evaluation
FrontierMath
Epoch AI
Expert-level research mathematics
HumanEval
OpenAI
164 Python programming problems
Primary Source Review - Check official documentation
Cross-Validation - Compare multiple sources
Timestamp Verification - All data includes verification date
Update Tracking - Monitor official channels
Last Updated: 2026-05-04 16:36 UTC
Maintained by: ReadyPixels LLC
Made with β€οΈ by ReadyPixels LLC