Evaluations
Run models against your data
Introducing Evaluations, a feature designed to let you effortlessly test and compare AI models against your own datasets.
Whether you're fine-tuning models or benchmarking performance, Oxen Evaluations simplify the process, letting you quickly run a prompt through an entire dataset.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
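The core workflow is simple: a prompt template with `{column}` placeholders is rendered once per dataset row, sent to a model, and the responses are collected into a new column. Here is a minimal sketch of that loop in Python — the function names and the stub model are illustrative, not Oxen's actual API:

```python
import pandas as pd

def render_prompt(template: str, row: dict) -> str:
    # Fill {column} placeholders with values from the dataset row.
    return template.format(**row)

def run_evaluation(df: pd.DataFrame, template: str, model_fn) -> pd.DataFrame:
    # Render the prompt for every row, call the model, and store the
    # response in a new column -- one prompt run over a whole dataset.
    out = df.copy()
    out["response"] = [
        model_fn(render_prompt(template, row))
        for row in df.to_dict(orient="records")
    ]
    return out

# Stub model for illustration; a real run would call an LLM API here.
def echo_model(prompt: str) -> str:
    return "answered: " + prompt.splitlines()[-1]

df = pd.DataFrame({"problem": ["2 + 2 = ?", "10 / 5 = ?"]})
result = run_evaluation(df, "Answer the math word problem\n\n{problem}", echo_model)
result.to_csv("results.csv", index=False)  # write the output to a new file
```

Swapping `echo_model` for a real API client (and `to_csv` for a commit to a branch) gives the same shape of workflow the feature automates for you.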
Some example evaluations run by user `ox`:

| Model | Prompt | Status | Rows | Tokens | Cost | Iterations | Created |
|---|---|---|---|---|---|---|---|
| Google/Gemini 1.5 Flash | "Answer the math word problem `{problem}`" | completed | 5-row sample | 2,227 | $0.0006 | 1 | 8 months ago |
| OpenAI/GPT-4o Mini | "Simplify the text below `{solution}`" | completed | 100 rows | 45,813 | $0.0167 | 2 | 8 months ago |
| OpenAI/GPT-4o | "Take a deep breathe and think step by step through the problem. `{problem}`" | completed | 5-row sample | 4,386 | — | 1 | 9 months ago |
| OpenAI/GPT-4o (CoT_step_by_step) | "Let's think step by step `{problem}`" | completed | 100 rows | 62,998 | — | 2 | 9 months ago |