Build World-Class
AI Datasets. Together.

Open-source tools to track, iterate, collaborate on, and discover multi-modal data in any format.
Image, Audio, Video, Tabular, Text, More...

Trusted by a community of engineers and researchers at companies like
University of California, Berkeley
Disney
Google
IBM
Johns Hopkins University
JP Morgan Chase & Co
Live Nation
Massachusetts Institute of Technology
NVIDIA
Stanford University
University of Cambridge
University of Michigan
Model Inference

Run Models On Your Data

Oxen makes it easy to choose the right model, get to the perfect prompt, or kick off the data flywheel needed to improve state-of-the-art AI.

Public Datasets

Explore Datasets

Oxen’s public and private datasets allow you to iterate on data within your organization or share them with the world.

ox/Flowers

Classify some flowers

3.7K image files > 99%
4 text files < 1%
1 tabular file < 1%
Updated: 7/17/2024
14
ox/WikiSQL

A large crowd-sourced dataset for developing natural language interfaces for relational databases.

1 text file 50.0%
1 tabular file 50.0%
Updated: 1/16/2024
42
3 tabular files > 99%
Updated: 6/5/2024
32
Measure Performance

Better Datasets.
Better AI.

AI is only as good as the datasets you feed it. Gain visibility into the data that goes in and out of your model.

Version Control

Find the changes that matter

Datasets change every day. Oxen’s version control allows you to quickly narrow down the most important changes that affect your model.

Scalability & Versatility

Thousands of hours of audio?
Millions of images?
Billions of rows in your CSV?
No problem.

Oxen’s data version control is built to handle data of any shape or size.

Performance

Built for speed

Oxen.ai saves your engineers hours of syncing data across training, testing, and evaluation. With fast data syncing and none of the push/pull bottlenecks of traditional version control systems, Oxen.ai was built for machine learning datasets and workflows.

Data Visibility

Goodbye Messy Blob Storage.
Hello Oxen.

Oxen’s data version control turns your unstructured data into beautifully rendered datasets that evolve over time. Dive into any version of the dataset at any point in time and see exactly what changed.

Command Line Tooling

Powered by industrial-strength version control

Oxen.ai has reimagined version control for data. At the core are the same principles that have made Git so powerful, but Oxen has optimized everything, down to the Merkle trees, hashing, and network protocols, to work effortlessly with large-scale datasets.
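
To make the Merkle tree idea concrete, here is a minimal Python sketch of how hashing a dataset bottom-up lets a version control tool skip unchanged folders and find only the files that differ. It is an illustration of the general technique, not Oxen's actual implementation: the function names (`hash_bytes`, `merkle_root`, `changed_paths`), the choice of SHA-256, and the in-memory dict layout are all assumptions made for this example. A real system would also cache hashes rather than recompute them.

```python
import hashlib


def hash_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(tree: dict) -> str:
    """Hash a nested dict of {name: bytes | dict} bottom-up.

    Files (bytes) are hashed by content; directories (dicts) are hashed
    from the sorted names and hashes of their children.
    """
    entries = []
    for name in sorted(tree):
        child = tree[name]
        child_hash = merkle_root(child) if isinstance(child, dict) else hash_bytes(child)
        entries.append(f"{name}:{child_hash}")
    return hash_bytes("\n".join(entries).encode("utf-8"))


def changed_paths(old: dict, new: dict, prefix: str = "") -> list:
    """List paths whose content differs, skipping subtrees with equal hashes."""
    changes = []
    for name in sorted(set(old) | set(new)):
        path = f"{prefix}{name}"
        a, b = old.get(name), new.get(name)
        if a is None or b is None:
            changes.append(path)  # added or removed
        elif isinstance(a, dict) and isinstance(b, dict):
            # Only descend when the directory hashes differ.
            if merkle_root(a) != merkle_root(b):
                changes.extend(changed_paths(a, b, prefix=f"{path}/"))
        elif isinstance(a, dict) or isinstance(b, dict):
            changes.append(path)  # file replaced by a directory, or vice versa
        elif hash_bytes(a) != hash_bytes(b):
            changes.append(path)
    return changes


# Hypothetical two versions of a tiny dataset; only tulip.jpg changes.
v1 = {"images": {"rose.jpg": b"\x01\x02", "tulip.jpg": b"\x03"}, "labels.csv": b"a,b\n"}
v2 = {"images": {"rose.jpg": b"\x01\x02", "tulip.jpg": b"\x04"}, "labels.csv": b"a,b\n"}
print(changed_paths(v1, v2))  # ['images/tulip.jpg']
```

Because a directory's hash depends only on its children, two versions with identical folders produce identical hashes, and the comparison never needs to open the files inside them.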

Collaboration

Collaborate with your team

Oxen.ai allows all your stakeholders to share, review, and edit data together. ML Engineering, Data Science, Product, Legal, Auditing, and Community can all contribute. The more eyes the better.

Features that matter to you and your team

Ease of Use, Performance, Collaboration, Open Source, Data Visibility, Scalability, Any Data Format, Compare & Diff
Hugging Face***
Neptune.ai
LakeFS
DVC
GitLFS
Project Nessie
*Hugging Face is migrating its data versioning from Git LFS to the closed-source XetHub. As Hugging Face is not open source, XetHub is expected to remain closed source.
**While small datasets have a preview and are visible in Hugging Face, large datasets cannot be viewed on its website.
Community

Join the growing Herd

Oxen.ai has developed a strong and growing community of individuals focused on furthering machine learning and artificial intelligence, from academic researchers training the next generation of models to full-stack developers leveraging existing APIs to build amazing products. Every Friday we get together to read research papers, discuss them, and apply them to our own work.