Model Inference
Choose the right model, get to the perfect prompt.
52 models on 12 inference providers. New models added every week.
Agentica
Anthropic AI
DeepSeek
Google
Infly-AI
Meta
Microsoft
Mistral AI
Nous Research
OpenAI
Qwen
Salesforce
Others


meta-llama/Llama-3.2-1B
May 2025
text → text
Inference by Baseten
$0.00
Qwen/Qwen2.5-1.5B
May 2025
text → text
Inference by Baseten
$0.00
Qwen/Qwen3-0.6B
Apr 2025
$0.00
DeepCoder 14B Preview
Apr 2025
text → text
DeepCoder-14B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distilled-Qwen-14B using distributed reinforcement learning (RL) to scale up to long context lengths.

$0.00
QwQ 32B
Mar 2025
Excels in complex reasoning tasks like math and coding. Uses reinforcement learning to match larger models' performance while remaining efficient.

Input: $0.90 / Output: $0.90
meta-llama/Llama-3.2-1B-Instruct
Mar 2025
$0.00
Phi 4 Mini Instruct
Mar 2025
Phi-4-mini-instruct is a lightweight open model built on synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4 model family and supports a 128K token context length.
Inference by Microsoft
Input: $0.90 / Output: $0.90
DeepSeek V3
Dec 2024
Efficient Mixture-of-Experts model with 37B active parameters per token, excelling in coding, math, and reasoning tasks while maintaining 128K token context length.

Input: $0.90 / Output: $0.90
microsoft/Phi-3-mini-4k-instruct
Dec 2024
text → text

$0.00
BLIP Captions
Dec 2024
image → text
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

$0.00
Llama 3.3 70B Speculative Decoding
Dec 2024
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in, text out).

Input: $0.59 / Output: $0.59
infly/OpenCoder-1.5B-Instruct
Nov 2024
text → text
OpenCoder is an open and reproducible code LLM family which includes 1.5B and 8B base and chat models, supporting both English and Chinese languages.

$0.00
claude-3-5-sonnet-20241022
Oct 2024
Anthropic's best combination of performance and speed for efficient, high-throughput tasks.
Input: $3.00 / Output: $15.00
DALL-E 3
Oct 2024
text → image
Given a prompt and/or an input image, the model will generate a new image.
Input: $0.04 / Output: $0.04
ministral-3b-latest
Oct 2024
Handles a range of natural language processing tasks, offering strong text generation, translation, summarization, and code capabilities across multiple domains.
Input: $0.04 / Output: $0.04
Qwen2.5 1.5B Instruct
Sep 2024
text → text
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

$0.00
Llama 3.2 3B Instruct
Sep 2024
Input: $0.02 / Output: $0.02
Gemini 1.5 Flash
Sep 2024
Input: $0.08 / Output: $0.30
o1 Preview
Sep 2024
Excels at multi-step reasoning, advanced math, coding, and science by spending more time thinking and leveraging chain-of-thought rationales for complex tasks.
Input: $15.00 / Output: $60.00
o1 Mini
Sep 2024
A specialized, cost-efficient AI model focused on STEM reasoning, delivering strong performance in mathematics and coding at faster speeds.
Input: $3.00 / Output: $12.00
Llama 3.1 405B Instruct Turbo
Jul 2024
Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture.
Input: $3.50 / Output: $3.50
Llama 3.1 8B
Jul 2024
Input: $0.10 / Output: $0.10
GPT-4o Mini
Jul 2024
Delivers strong text and vision capabilities with low cost and latency, making it well suited to real-time, large-context, or multitask AI applications.
Input: $0.15 / Output: $0.60
Qwen2 VL 72B Instruct
Jun 2024
image → text
The 72B variant of the latest iteration of the Qwen-VL model from Alibaba, representing nearly a year of innovation.

Input: $0.90 / Output: $0.90
gemini-1.5-pro
May 2024
Input: $1.25 / Output: $5.00
Text Embedding 004
May 2024
text → embeddings
Text embeddings

Input: $0.02 / Output: $0.02
GPT-4o
May 2024
GPT-4o is our most advanced multimodal model that’s faster and cheaper than GPT-4 Turbo with stronger vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.
Input: $2.50 / Output: $10.00
Hermes 3 70B
Apr 2024
Hermes 3 is the latest version of the flagship Hermes series of LLMs by Nous Research.
Input: $0.20 / Output: $0.20
test
Nov 1111
text → text
test
Inference by Baseten
$1.00
codestral-2405
Oct 2024
Input: $0.20 / Output: $0.60
mistral-small-2409
Oct 2024
Cost-efficient, fast, and reliable option for use cases such as translation, summarization, and sentiment analysis.
Input: $0.20 / Output: $0.60
pixtral-12b
Oct 2024
Input: $0.15 / Output: $0.15
ministral-8b-latest
Oct 2024
Input: $0.10 / Output: $0.10
mistral-nemo
Oct 2024
Input: $0.15 / Output: $0.15
Llama 3.1 8B Instruct
Nov 2024
Input: $0.20 / Output: $0.20
Llama 3.2 11B Vision (Preview)
Nov 2024
image → text
A powerful multimodal model capable of processing both text and image inputs that supports multilingual, multi-turn conversations, tool use, and JSON mode.

Input: $0.18 / Output: $0.18
Llama-3 8B-instruct
Oct 2024
Input: $0.20 / Output: $0.20
Llama 3.2 90B Vision (Preview)
Dec 2024
image → text
A powerful multimodal model capable of processing both text and image inputs that supports multilingual, multi-turn conversations, tool use, and JSON mode.

Input: $0.90 / Output: $0.90
Hermes 3 405B
Nov 2024
Hermes 3 405B is the latest flagship model in the Hermes series of LLMs by Nous Research, and the first full parameter finetune since the release of Llama-3.1 405B.
Input: $0.20 / Output: $0.20
Text Embeddings Small v3
Nov 2024
text → embeddings
OpenAI’s text embeddings measure the relatedness of text strings.
Input: $0.02 / Output: $0.02
open-mixtral-8x7b
Oct 2024
A 7B sparse Mixture-of-Experts (SMoE). Uses 12.9B active parameters out of 45B total.
Input: $0.70 / Output: $0.70
meta-llama/Meta-Llama-3-8B-Instruct-Turbo
Oct 2024
Input: $0.18 / Output: $0.18
broken_model
Oct 2024
Input: $0.00 / Output: $0.00
BAAI/bge-m3
Dec 2024
text → embeddings
BAAI/bge-m3 is a multilingual embedding model supporting over 100 languages, designed for high-performance semantic similarity and retrieval tasks. It generates dense vector representations that capture semantic meaning across languages, enabling cross-lingual text understanding and comparison.

$0.00
open-mistral-7b
Oct 2024
Input: $0.25 / Output: $0.25
Qwen/Qwen2-0.5B-Instruct
Dec 2024
text → text
Qwen2-0.5B-Instruct is a compact 500 million parameter language model from Alibaba's Qwen series, designed for efficient instruction-following and conversational tasks with a lightweight architecture optimized for resource-constrained environments.

$0.00
Qwen/Qwen2-7B-Instruct
Dec 2024
text → text
Qwen2-7B-Instruct is an advanced 7 billion parameter large language model from Alibaba's Qwen series, optimized for instructional and conversational tasks with strong multilingual capabilities and improved performance across reasoning, mathematics, and coding benchmarks. The model builds on the Qwen series' reputation for high-quality open-source AI models, offering enhanced instruction-following abilities and efficient text generation across multiple domains.

$0.00
open-mixtral-8x22b
Oct 2024
Mixtral 8x22B is currently the most performant open model. A 22B sparse Mixture-of-Experts (SMoE). Uses only 39B active parameters out of 141B.
Input: $2.00 / Output: $6.00
Qwen 2.5 Coder 32B Instruct
Mar 2025
Specializes in code generation, reasoning, and fixing with 128K token context, open-source licensing, and local deployment capabilities.

Input: $0.90 / Output: $0.90
Claude 3.7 Sonnet
Feb 2025
Input: $3.00 / Output: $15.00
Mistral Large
Oct 2024
Excels at complex multilingual reasoning, code generation, and precise instruction following, with native fluency in five languages and advanced function calling.
Input: $2.00 / Output: $6.00
thenlper/gte-large
Dec 2024
text → text
General Text Embeddings (GTE) model, introduced in "Towards General Text Embeddings with Multi-stage Contrastive Learning". The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework

$0.00
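
The listing gives separate input and output prices for most models but does not state the billing unit. The sketch below compares what a fixed workload might cost across a few of the listed models, assuming the prices are US dollars per one million tokens (a common convention, not confirmed by this page). The names `estimate_cost`, `PRICES`, and `TOKENS_PER_PRICE_UNIT` are hypothetical and exist only for this illustration; the prices themselves are copied from the entries above.

```python
# Assumption: prices are USD per 1,000,000 tokens; the listing does not state the unit.
TOKENS_PER_PRICE_UNIT = 1_000_000

# (input price, output price) copied from the listing above.
PRICES = {
    "GPT-4o Mini": (0.15, 0.60),
    "claude-3-5-sonnet-20241022": (3.00, 15.00),
    "Llama 3.3 70B Speculative Decoding": (0.59, 0.59),
}


def estimate_cost(input_price, output_price, input_tokens, output_tokens):
    """Return the estimated USD cost of one workload (hypothetical helper)."""
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Example workload: 200k prompt tokens and 50k generated tokens.
    for name, (inp, out) in PRICES.items():
        cost = estimate_cost(inp, out, input_tokens=200_000, output_tokens=50_000)
        print(f"{name}: ${cost:.2f}")
```

Under that assumption, the example workload comes to roughly $0.06 on GPT-4o Mini and roughly $1.35 on claude-3-5-sonnet-20241022. If the actual billing unit differs, only `TOKENS_PER_PRICE_UNIT` needs to change.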