Model Inference

Choose the right model, get to the perfect prompt.

52 models on 12 inference providers. New models are added every week.

Model creators: Agentica, Anthropic AI, DeepSeek, Google, Infly-AI, Meta, Microsoft, Mistral AI, Nous Research, OpenAI, Qwen, Salesforce, Others
Meta meta-llama/Llama-3.2-1B
May 2025
text → text
Inference by Baseten
$0.00
Qwen Qwen/Qwen2.5-1.5B
May 2025
text → text
Inference by Baseten
$0.00
Qwen Qwen/Qwen3-0.6B
Apr 2025
Inference by Baseten
$0.00
Agentica DeepCoder 14B Preview
Apr 2025
text → text
DeepCoder-14B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distilled-Qwen-14B using distributed reinforcement learning (RL) to scale up to long context lengths.
Inference by Bytez
$0.00
Qwen QwQ 32B
Mar 2025
Excels in complex reasoning tasks like math and coding. Uses reinforcement learning to match larger models' performance while remaining efficient.
Inference by Fireworks AI
Input: $0.90 / Output: $0.90
Meta meta-llama/Llama-3.2-1B-Instruct
Mar 2025
meta-llama/Llama-3.2-1B-Instruct
Inference by Baseten
$0.00
Microsoft Phi 4 Mini Instruct
Mar 2025
Phi-4-mini-instruct is a lightweight open model built on synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4 model family and supports a 128K token context length.
Inference by Microsoft
Input: $0.90 / Output: $0.90
DeepSeek DeepSeek V3
Dec 2024
Efficient Mixture-of-Experts model with 37B active parameters per token, excelling in coding, math, and reasoning tasks while maintaining 128K token context length.
Inference by Fireworks AI
Input: $0.90 / Output: $0.90
Microsoft microsoft/Phi-3-mini-4k-instruct
Dec 2024
text → text
microsoft/Phi-3-mini-4k-instruct
Inference by Bytez
$0.00
Salesforce BLIP Captions
Dec 2024
image → text
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Inference by Bytez
$0.00
Meta Llama 3.3 70B Speculative Decoding
Dec 2024
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in, text out).
Inference by Groq
Input: $0.59 / Output: $0.59
Infly-AI infly/OpenCoder-1.5B-Instruct
Nov 2024
text → text
OpenCoder is an open and reproducible code LLM family that includes 1.5B and 8B base and chat models, supporting both English and Chinese.
Inference by Bytez
$0.00
Anthropic AI claude-3-5-sonnet-20241022
Oct 2024
Anthropic's best combination of performance and speed for efficient, high-throughput tasks.
Inference by Anthropic
Input: $3.00 / Output: $15.00
OpenAI Dall-e 3
Oct 2024
text → image
Given a prompt and/or an input image, the model will generate a new image.
Inference by OpenAI
Input: $0.04 / Output: $0.04
Mistral AI ministral-3b-latest
Oct 2024
Handles a range of natural language processing tasks, offering strong text generation, translation, summarization, and code capabilities across multiple domains.
Inference by Mistral AI
Input: $0.04 / Output: $0.04
Qwen Qwen2.5 1.5B Instruct
Sep 2024
text → text
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
Inference by Bytez
$0.00
Meta Llama 3.2 3B Instruct
Sep 2024
Inference by Lambda Labs
Input: $0.02 / Output: $0.02
Google Gemini 1.5 Flash
Sep 2024
Fast, lightweight model.
Inference by Google
Input: $0.08 / Output: $0.30
OpenAI o1 Preview
Sep 2024
Excels at multi-step reasoning, advanced math, coding, and science by spending more time thinking and leveraging chain-of-thought rationales for complex tasks.
Inference by OpenAI
Input: $15.00 / Output: $60.00
OpenAI o1 Mini
Sep 2024
A specialized, cost-efficient AI model focused on STEM reasoning, delivering strong performance in mathematics and coding at faster speeds.
Inference by OpenAI
Input: $3.00 / Output: $12.00
Meta Llama 3.1 405B Instruct Turbo
Jul 2024
Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture.
Inference by Together.ai
Input: $3.50 / Output: $3.50
Meta Llama 3.1 8B
Jul 2024
Llama 3.1 8B
Inference by Cerebras
Input: $0.10 / Output: $0.10
OpenAI GPT-4o Mini
Jul 2024
Delivers strong text and vision capabilities with low cost and latency, making it well suited to real-time, large-context, or multitask AI applications.
Inference by OpenAI
Input: $0.15 / Output: $0.60
Qwen Qwen2 VL 72B Instruct
Jun 2024
image → text
The 72B variant of the latest iteration of Qwen-VL model from Alibaba, representing nearly a year of innovation.
Inference by Fireworks AI
Input: $0.90 / Output: $0.90
Google gemini-1.5-pro
May 2024
Inference by Google
Input: $1.25 / Output: $5.00
Google Text Embedding 004
May 2024
text → embeddings
Text embeddings
Inference by Google
Input: $0.02 / Output: $0.02
OpenAI GPT-4o
May 2024
GPT-4o is our most advanced multimodal model that’s faster and cheaper than GPT-4 Turbo with stronger vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.
Inference by OpenAI
Input: $2.50 / Output: $10.00
Nous Research Hermes 3 70B
Apr 2024
Hermes 3 is the latest version of the flagship Hermes series of LLMs by Nous Research.
Inference by Lambda Labs
Input: $0.20 / Output: $0.20
Qwen test
Nov 1111
text → text
test
Inference by Baseten
$1.00
Mistral AI codestral-2405
Oct 2024
State-of-the-art Mistral model trained specifically for code tasks.
Inference by Mistral AI
Input: $0.20 / Output: $0.60
Mistral AI mistral-small-2409
Oct 2024
Cost-efficient, fast, and reliable option for use cases such as translation, summarization, and sentiment analysis.
Inference by Mistral AI
Input: $0.20 / Output: $0.60
Mistral AI pixtral-12b
Oct 2024
Vision-capable small model.
Inference by Mistral AI
Input: $0.15 / Output: $0.15
Mistral AI ministral-8b-latest
Oct 2024
Powerful model for on-device use cases.
Inference by Mistral AI
Input: $0.10 / Output: $0.10
Mistral AI mistral-nemo
Oct 2024
A 12B general-purpose model built in collaboration with NVIDIA.
Inference by Mistral AI
Input: $0.15 / Output: $0.15
Meta Llama 3.1 8B Instruct
Nov 2024
Inference by Lambda Labs
Input: $0.20 / Output: $0.20
Meta Llama 3.2 11B Vision (Preview)
Nov 2024
image → text
A powerful multimodal model capable of processing both text and image inputs that supports multilingual, multi-turn conversations, tool use, and JSON mode.
Inference by Groq
Input: $0.18 / Output: $0.18
Meta Llama-3 8B-instruct
Oct 2024
A fast 8B model
Inference by Fireworks AI
Input: $0.20 / Output: $0.20
Meta Llama 3.2 90B Vision (Preview)
Dec 2024
image → text
A powerful multimodal model capable of processing both text and image inputs that supports multilingual, multi-turn conversations, tool use, and JSON mode.
Inference by Groq
Input: $0.90 / Output: $0.90
Nous Research Hermes 3 405B
Nov 2024
Hermes 3 405B is the latest flagship model in the Hermes series of LLMs by Nous Research, and the first full-parameter fine-tune since the release of Llama-3.1 405B.
Inference by Lambda Labs
Input: $0.20 / Output: $0.20
OpenAI Text Embeddings Small v3
Nov 2024
text → embeddings
OpenAI’s text embeddings measure the relatedness of text strings.
Inference by OpenAI
Input: $0.02 / Output: $0.02
Mistral AI open-mixtral-8x7b
Oct 2024
A 7B sparse Mixture-of-Experts (SMoE). Uses 12.9B active parameters out of 45B total.
Inference by Mistral AI
Input: $0.70 / Output: $0.70
Meta meta-llama/Meta-Llama-3-8B-Instruct-Turbo
Oct 2024
meta-llama/Meta-Llama-3-8B-Instruct-Turbo
Inference by Together.ai
Input: $0.18 / Output: $0.18
broken_model
Oct 2024
This is broken, just for testing
Inference by Together.ai
Input: $0.00 / Output: $0.00
BAAI/bge-m3
Dec 2024
text → embeddings
The BAAI/bge-m3 is an advanced multilingual embedding model supporting over 100 languages, designed for high-performance semantic similarity and retrieval tasks. By generating dense vector representations that capture semantic meaning across diverse linguistic contexts, the model enables sophisticated cross-lingual text understanding and comparison.
Inference by Bytez
$0.00
Mistral AI open-mistral-7b
Oct 2024
A 7B transformer model, fast-deployed and easily customisable.
Inference by Mistral AI
Input: $0.25 / Output: $0.25
Qwen Qwen/Qwen2-0.5B-Instruct
Dec 2024
text → text
Qwen2-0.5B-Instruct is a compact 500 million parameter language model from Alibaba's Qwen series, designed for efficient instruction-following and conversational tasks with a lightweight architecture optimized for resource-constrained environments.
Inference by Bytez
$0.00
Qwen Qwen/Qwen2-7B-Instruct
Dec 2024
text → text
Qwen2-7B-Instruct is an advanced 7 billion parameter large language model from Alibaba's Qwen series, optimized for instructional and conversational tasks with strong multilingual capabilities and improved performance across reasoning, mathematics, and coding benchmarks. The model builds on the Qwen series' reputation for high-quality open-source AI models, offering enhanced instruction-following abilities and efficient text generation across multiple domains.
Inference by Bytez
$0.00
Mistral AI open-mixtral-8x22b
Oct 2024
Mixtral 8x22B is currently the most performant open model. A 22B sparse Mixture-of-Experts (SMoE). Uses only 39B active parameters out of 141B.
Inference by Mistral AI
Input: $2.00 / Output: $6.00
Qwen Qwen 2.5 Coder 32B Instruct
Mar 2025
Specializes in code generation, reasoning, and fixing with 128K token context, open-source licensing, and local deployment capabilities.
Inference by Fireworks AI
Input: $0.90 / Output: $0.90
Anthropic AI Claude 3.7 Sonnet
Feb 2025
Inference by Anthropic
Input: $3.00 / Output: $15.00
Mistral AI Mistral Large
Oct 2024
Excels at complex multilingual reasoning, code generation, and precise instruction following, with native fluency in five languages and advanced function calling.
Inference by Mistral AI
Input: $2.00 / Output: $6.00
thenlper/gte-large
Dec 2024
text → text
General Text Embeddings (GTE) model, introduced in "Towards General Text Embeddings with Multi-stage Contrastive Learning". The GTE models are trained by Alibaba DAMO Academy and are mainly based on the BERT framework.
Inference by Bytez
$0.00