Dataset Ninja

Here at Supervisely, we got tired of sifting through garbage datasets on platforms that offer no visualization or insights. So we built what we needed: a curated catalog of high-quality computer vision datasets, featuring:

  • No worthless datasets of 10 images
  • Tools to explore labels and statistics
  • Single JSON format with a download link
  • Free and commercial datasets with a clear license
  • Convenient search across thousands of datasets and labels
  • Regular updates with new data

How is this different from other catalogs?

They have virtually no quality control and focus on quantity rather than quality. Our goal is to collect only datasets you can actually train production models on, and to prove it with descriptive visualizations.

Why datasets?

Data is a crucial factor in machine learning, especially in computer vision. State-of-the-art (SotA) neural network architectures are freely available for most tasks, but the data to train them almost never is.

I want my dataset to be here

Yay, that’s the spirit! Please create a new dataset proposal, and don’t forget to include a brief description, the license, and a download link for your dataset.

I don’t want my dataset to be here

Sorry to hear that. 😢 Send us a message at inbox@datasetninja.com from an email address that lets us verify you as the owner, and we will remove it quickly.

Are those datasets free?

Most datasets here are distributed under open licenses, but always check the License section of every dataset you plan to use: some may be commercial or require attribution.

I have an idea or feedback

Awesome! Post your suggestions here and we will figure something out.

Why ninjas?

In two words: why not?

We aim to enhance datasets with cool visualizations and user-friendly analytical tools that are readily accessible. Picture yourself as a Master of Datasets (aka Dataset Ninja 🥷🏿), proficient in the art of machine learning datasets.