| GPT-4.1 | OpenAI | Closed | General reasoning and coding | Premium cost | API access |
| GPT-4o | OpenAI | Closed multimodal | Fast assistant UX and multimodal tasks | Cost at high scale | API access |
| GPT-4o mini | OpenAI | Closed small | Cost-sensitive high-volume automation | Lower ceiling on hard reasoning | API access |
| o1 | OpenAI | Reasoning-first | Complex multi-step logic | Latency and cost per hard query | API access |
| o3-mini | OpenAI | Reasoning-efficient | Technical Q&A and coding workflows | Can require prompt tuning | API access |
| Claude 3.7 Sonnet | Anthropic | Closed | Long-context writing and analysis | Conservative tone in some flows | API access |
| Claude 3.5 Sonnet | Anthropic | Closed | Balanced quality and stability | Cost in very large traffic spikes | API access |
| Claude 3.5 Haiku | Anthropic | Closed small | Fast responses and triage | Less robust on deepest tasks | API access |
| Claude 3 Opus | Anthropic | Closed flagship | High-stakes synthesis | Throughput economics | API access |
| Gemini 2.0 Pro | Google | Closed | Reasoning and multimodal enterprise apps | Task variance across prompt styles | API access |
| Gemini 2.0 Flash | Google | Closed fast | Low-latency assistant endpoints | Lower quality than premium tier | API access |
| Gemini 1.5 Pro | Google | Closed long-context | Very long document workflows | Price/performance depends on load | API access |
| Gemini 1.5 Flash | Google | Closed fast | Efficient summarization and extraction | Reasoning depth can be limited | API access |
| Llama 3.1 405B Instruct | Meta | Open weight | Top-end open deployment quality | Heavy infrastructure requirements | Download · 70B · 8B |
| Llama 3.1 70B Instruct | Meta | Open weight | Strong self-hosted quality/cost balance | Needs good inference stack | Download · 405B · 8B |
| Llama 3.1 8B Instruct | Meta | Open weight small | Edge and low-cost deployments | Lower performance on complex tasks | Download · 70B · 405B |
| Llama 3.2 11B Vision | Meta | Open multimodal | Private vision-text pipelines | Requires evals for OCR-heavy cases | Download · 90B |
| Llama 3.2 90B Vision | Meta | Open multimodal | High-capacity multimodal inference | Infrastructure complexity | Download · 11B |
| Mistral Large | Mistral AI | Closed | High-quality enterprise assistants | Smaller ecosystem vs hyperscalers | API access |
| Mistral Medium | Mistral AI | Closed | Balanced production usage | Benchmark carefully vs peers | API access |
| Mistral Small | Mistral AI | Closed small | Fast cost-efficient chat | Limited depth on advanced reasoning | API access |
| Mixtral 8x22B | Mistral AI | Open MoE | Strong open-weight generation quality | Operational complexity | Download · 8x7B |
| Mixtral 8x7B | Mistral AI | Open MoE | Efficient self-hosting | Can trail latest closed models | Download · 8x22B |
| Codestral | Mistral AI | Code-specialized | Code generation and completion | Narrower general language strength | Download |
| Qwen2.5 72B Instruct | Alibaba | Open weight | Reasoning and multilingual tasks | Compliance checks for some regions | Download · 32B · 14B · 7B |
| Qwen2.5 32B Instruct | Alibaba | Open weight | Strong quality with lower infra cost | Prompt tuning often needed | Download · 72B · 14B · 7B |
| Qwen2.5 14B Instruct | Alibaba | Open weight | Balanced private deployment | Less robust on hardest tasks | Download · 72B · 32B · 7B |
| Qwen2.5 7B Instruct | Alibaba | Open weight small | High-throughput low-cost inference | Lower reasoning depth | Download · 14B · 32B · 72B |
| QwQ-32B | Alibaba | Open reasoning | Reasoning-focused private usage | Evals needed for stability | Download |
| DeepSeek V3 | DeepSeek | Open weight | General reasoning and coding value | Governance review in enterprise | Download |
| DeepSeek R1 | DeepSeek | Reasoning-focused | Difficult multi-step reasoning tasks | Latency on complex outputs | Download |
| DeepSeek Coder V2 | DeepSeek | Code-specialized | Developer assistants and code review | General writing less strong | Download |
| Command R+ | Cohere | Closed enterprise | RAG and enterprise knowledge use | Compare against top general models | API access |
| Command R | Cohere | Closed | Fast retrieval-grounded responses | Not always best for deep coding | API access |
| DBRX Instruct | Databricks | Open weight | Data platform integrated workloads | Requires platform maturity | Download |
| Phi-3 Medium | Microsoft | Small model | Compact deployments and edge use | Limited on very complex tasks | Download · Mini |
| Phi-3 Mini | Microsoft | Small model | On-device and constrained inference | Lower accuracy ceiling | Download · Medium |
| Yi-34B Chat | 01.AI | Open weight | Multilingual experimentation | Needs thorough evaluation before prod | Download |