Evaluations
Run models against your data
Evaluations let you test and compare a selection of AI models against your datasets.
Whether you are fine-tuning models or evaluating performance metrics, Oxen Evaluations run your prompts through an entire dataset.
Once you are happy with the results, save the output dataset to a new file, to another branch, or directly as a new commit.
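The prompts listed below reference dataset values through `{column}` placeholders such as `{title}`, `{role}`, and `{prompt}`. The following is a minimal sketch of how such a template could be filled for a single row. It assumes that each placeholder names a column and that unknown placeholders are left as written. `render_prompt` is a hypothetical helper and is not part of Oxen's API.

```rust
use std::collections::HashMap;

/// Hypothetical helper: fill `{column}` placeholders in a prompt template
/// with values from one dataset row (assumes placeholders name columns).
/// Placeholders with no matching column are kept exactly as written.
fn render_prompt(template: &str, row: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find('{') {
        // Copy everything before the opening brace verbatim.
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];
        match after.find('}') {
            Some(end) => {
                let key = &after[..end];
                match row.get(key) {
                    Some(value) => out.push_str(value),
                    // Unknown placeholder: keep it unchanged.
                    None => {
                        out.push('{');
                        out.push_str(key);
                        out.push('}');
                    }
                }
                rest = &after[end + 1..];
            }
            None => {
                // No closing brace: copy the remainder as-is.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let template = "Write a body for this title\n\n{title}";
    // Hypothetical row; the real column values come from your dataset.
    let mut row = HashMap::new();
    row.insert("title", "Getting started with Oxen");
    println!("{}", render_prompt(template, &row));
}
```

Running the same template over every row in this way is what turns one prompt into one model call per row.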
aea13857-4b44-4e87-9ddd-08222f59f84a
DeepSeek/DeepSeek V3 · text → text
ox · 2 months ago
{uuid} You are a {role} named {name} who is using an iPhone with an intermediate level of experience. Write a {descriptor} question about the product that a customer support agent might answer. Only write the question and nothing else.
completed · 100 rows · 11,711 tokens · $0.0105 · 4 iterations
9978f9bf-a5db-4e4d-9dce-d184b35d4ac9
Microsoft/Phi 4 Mini Instruct · text → text
ox · 4 months ago
Write a body for this title

{title}
completed · 5-row sample · 0 tokens · $0.0000 · 1 iteration
777a0c92-b2e1-4182-9257-eb215c87868b
Qwen/Qwen 2.5 Coder 32B Instruct · text → text
ox · 4 months ago
Imagine you are {role} named {name}, who is also a {experience} Rust programmer. Describe a Rust function that you want written in order to help with your job. The description should detail how the code should work but not the code itself. The code needed to solve the problem should have something to do with {problem}.

Keep the description {descriptor}. Answer with only the description of what you want, nothing else. Ways that you can start the query are:

* Write a function that
* I need a function
* code that
* write a rust program that
completed · 5-row sample · 1,228 tokens · $0.0011 · 1 iteration
2256c202-a80d-40f9-a8ff-9cc2b3eebb50
Qwen/Qwen 2.5 Coder 32B Instruct · text → text
ox · 4 months ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
    x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{prompt}
completed · 100 rows · 100,397 tokens · $0.0904 · 3 iterations
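The prompt above asks the model to reply with a single Rust code fence that contains a function and its tests. To compile or test those replies, the code first has to be pulled out of each response. The following is a minimal sketch of that step. It uses a hypothetical `extract_rust_block` helper, which is not an Oxen feature, and it rejects any response that does not contain exactly one such fence.

```rust
/// Hypothetical helper: return the body of the single `rust` code fence
/// in a model response, or `None` if there is not exactly one complete fence.
fn extract_rust_block(response: &str) -> Option<&str> {
    let open = "```rust";
    // The prompt asks for exactly one Rust block, so reject anything else.
    if response.matches(open).count() != 1 {
        return None;
    }
    let after_open = response.find(open)? + open.len();
    // Skip the remainder of the opening fence line.
    let body_start = after_open + response[after_open..].find('\n')? + 1;
    // The body ends at the next closing fence.
    let body_len = response[body_start..].find("```")?;
    Some(response[body_start..body_start + body_len].trim_end())
}

fn main() {
    let response = "```rust\nfn add_nums(x: i32, y: i32) -> i32 {\n    x + y\n}\n```";
    match extract_rust_block(response) {
        Some(code) => println!("{code}"),
        None => eprintln!("response did not contain exactly one Rust block"),
    }
}
```

You could then write the extracted code to `src/lib.rs` in a scratch library crate and run `cargo test` to exercise the three assertions the prompt asks for. This sketch does not handle code that itself contains a triple-backtick sequence.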
9a35ceba-b873-4eac-ae17-de5f5ba01ab7
Qwen/Qwen 2.5 Coder 32B Instruct · text → text
ox · 4 months ago
Imagine you are {role} named {name}, who is also a {experience} Rust programmer. Describe a Rust function that you want written in order to help with your job. The description should detail how the code should work but not the code itself. The code needed to solve the problem should have something to do with {problem}.

Keep the description {descriptor}. Answer with only the description of what you want, nothing else. Ways that you can start the query are:

* Write a function that
* I need a function
* code that
* write a rust program that
completed · 100 rows · 23,372 tokens · $0.0210 · 2 iterations
d0eb1e04-eb88-4d35-978f-7ab711354b29
Qwen/Qwen 2.5 Coder 32B Instruct · text → text
ox · 4 months ago
Imagine you are {role} named {name}, who is also a {experience} Rust programmer. Describe a Rust function that you want written in order to help with your job. The description should detail how the code should work but not the code itself. The code needed to solve the problem should have something to do with {problem}.

Keep the description as brief as possible. Answer with only the description of what you want, nothing else. Ways that you can start the query are:

* Write a function that
* I need a function
* code that
* write a rust program that
completed · 5-row sample · 938 tokens · $0.0008 · 4 iterations
fa546d1d-4dd0-46e3-9602-84961b916af8
Qwen/QwQ 32B · text → text
ox · 4 months ago
Imagine you are {role} named {name}, who is also a {experience} Rust programmer. Describe a Rust function that you want written in order to help with your job. The description should detail how the code should work but not the code itself. The code needed to solve the problem should have something to do with {problem}.

Keep the description as brief as possible. Answer with only the description of what you want, nothing else. Ways that you can start the query are:

* Write a function that
* I need a function
* code that
* 
completed · 5-row sample · 5,736 tokens · $0.0052 · 1 iteration
b939c3b6-1545-43c0-89d9-176968eae2ed
Meta/Llama 3.3 70B Speculative Decoding · text → text
ox · 4 months ago
Imagine you are {role} named {name}, who is also a advanced Rust programmer. Describe a Rust function that you want written in order to help with your job. The description should detail how the code should work but not the code itself. The code needed to solve the problem should have something to do with {problem}.

Keep the description as brief as possible. Answer with only the description of what you want, nothing else. Ways that you can start the query are:

* Write a function that
* I need a function
* code that
* 
completed · 5-row sample · 993 tokens · $0.0006 · 12 iterations
a0789374-d27e-4d53-b4c8-47a3f37c9057
Anthropic AI/claude-3-5-sonnet-20241022 · text → text
ox · 4 months ago
Answer the following question

{prompt}
started · running (waiting...) · 0 tokens · $0.0000 · 1 iteration
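Each status line reports the total tokens and cost for a run, which means an effective price per million tokens can be derived from it. The following is a short sketch that assumes the displayed cost covers all tokens in the run. Costs are shown rounded to four decimal places, so results for small runs are only approximate. The `dollars_per_million` function is illustrative only.

```rust
/// Illustrative helper: effective price in dollars per million tokens,
/// computed from the totals shown on a run's status line.
fn dollars_per_million(cost_usd: f64, tokens: u64) -> Option<f64> {
    if tokens == 0 {
        return None; // nothing billed yet, e.g. a run that is still waiting
    }
    Some(cost_usd / tokens as f64 * 1_000_000.0)
}

fn main() {
    // (model, tokens, displayed cost) copied from some of the completed runs listed above.
    let runs = [
        ("DeepSeek/DeepSeek V3", 11_711_u64, 0.0105),
        ("Qwen/Qwen 2.5 Coder 32B Instruct", 100_397, 0.0904),
        ("Qwen/QwQ 32B", 5_736, 0.0052),
        ("Meta/Llama 3.3 70B Speculative Decoding", 993, 0.0006),
    ];
    for (model, tokens, cost) in runs {
        match dollars_per_million(cost, tokens) {
            Some(p) => println!("{model}: ~${p:.2} per 1M tokens"),
            None => println!("{model}: no tokens yet"),
        }
    }
}
```

For the DeepSeek and Qwen runs, the displayed totals work out to roughly $0.90 per million tokens. The Llama sample works out to about $0.60, but its displayed cost of $0.0006 is too coarse for that figure to be precise.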