Dataset Ninja: CLVOS23 Dataset

Tag: general
Task: semantic segmentation
Release year: 2023
License: CC BY-NC-SA 4.0
Download size: 857 MB

Introduction #

Released 2023-11-04 · Amir Nazemi, Zeyad Moustafa, Paul Fieguth

The authors proposed CLVOS23: A Long Video Object Segmentation Dataset for Continual Learning, a new long-video object segmentation dataset for continual learning that poses a realistic and significantly greater challenge for testing VOS (Video Object Segmentation) methods on long videos. The frames for the dataset were taken from the Long Videos dataset (the rat, dressage, and blueboy videos) and from YouTube (the car, dog, parkour, skating, skiing, and skiing-long videos).

Motivation

The goal of Video Object Segmentation (VOS) is to accurately extract a target object at the pixel level from each frame of a given video. VOS solutions are generally divided into two categories: semi-supervised (or one-shot) VOS, where ground-truth masks of the target objects are provided for at least one frame during inference, and unsupervised VOS, where the model has no prior knowledge about the objects.

In semi-supervised VOS, online approaches update part of the VOS model based on evaluated frames and estimated masks. The idea is that videos contain relevant information beyond just the current frame’s mask, which a model can leverage by learning during the evaluation process. However, online model learning raises questions about how effectively the model adapts from frame to frame, especially when new frames differ significantly from the initial ground-truth frame. This challenge falls under the domain of continual learning, a type of machine learning where a model is trained on a sequence of tasks and is expected to continuously improve its performance on each new task while maintaining its ability to perform well on previously learned tasks.

Current state-of-the-art semi-supervised and online VOS methods excel on short videos, typically a few seconds or up to 100 frames long, as seen in datasets like DAVIS16, DAVIS17, and YouTube-VOS18. However, these methods often struggle to maintain performance on long videos, such as those found in the Long Videos dataset. This issue has not been thoroughly investigated or addressed within the VOS field, particularly through the lens of continual learning.

Continual learning methods are usually evaluated on classification datasets like MNIST, CIFAR10, and ImageNet, or on datasets specifically designed for continual learning, such as Core50. In these scenarios, the classification dataset is presented to the model as a sequential data stream in online continual learning methods. Unlike these datasets and testing scenarios, long video object segmentation has numerous real-world applications, including video summarization, human-computer interaction, and autonomous vehicles, which necessitate robust performance over extended sequences.

Dataset description

In the ideal case, where the samples in a video sequence are independent and identically distributed (i.i.d.), machine learning problems become significantly easier, since there is then no need to handle distributional drift and temporal dependency in VOS. However, the i.i.d. assumption does not hold for video data.

The Long Videos dataset consists of three long sequences with a total of 7411 frames. The i.i.d. assumption is invalid, for example, for the “dressage” video due to the significant distribution drifts that occur, which align more closely with the non-i.i.d. assumptions of continual learning. This continual learning-based interpretation of long video sequences is discussed for the first time in the context of VOS and continual learning. The Long Videos dataset selects evaluation label masks uniformly, which fails to adequately test how well a VOS solution handles sudden shifts in the target’s appearance. The authors propose an alternative approach: annotating frames for evaluation based on the distribution drifts occurring in each video sequence.
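The page does not describe an automated procedure for locating these drifts (the sub-chunk boundaries in the figure below were identified by the authors). Purely as an illustration of the idea, a hedged sketch could flag a drift whenever the colour-histogram distance between consecutive frames exceeds a threshold; the directory layout, bin count, and threshold value are assumptions.

```python
import glob
import os

import cv2
import numpy as np

def colour_histogram(frame, bins=32):
    """Per-channel colour histogram, normalised to sum to 1."""
    hists = [cv2.calcHist([frame], [c], None, [bins], [0, 256]) for c in range(3)]
    h = np.concatenate(hists).ravel()
    return h / h.sum()

def find_drifts(frames_dir, threshold=0.4):
    """Return frame indices whose histogram distance to the previous frame
    exceeds `threshold`, as rough candidates for sub-chunk boundaries."""
    paths = sorted(glob.glob(os.path.join(frames_dir, "*.jpg")))
    drifts, prev = [], None
    for i, path in enumerate(paths):
        hist = colour_histogram(cv2.imread(path))
        # total-variation distance between consecutive frame histograms
        if prev is not None and 0.5 * np.abs(hist - prev).sum() > threshold:
            drifts.append(i)
        prev = hist
    return drifts

# drift_frames = find_drifts("dressage_frames")  # placeholder directory
```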


A subset of frames from the “dressage” video of the Long Videos dataset. The video consists of 23 sub-chunks separated from each other by significant distributional drifts or discontinuities. The lower (sparse) row in each set shows the annotated frames. The annotations already provided by the Long Videos dataset are shown without a border, whereas the annotated masks added by this paper, and made available via the CLVOS23 dataset, are shown with blue borders. The four sub-chunks that are missing from the Long Videos dataset are encircled in red.

The image above shows the 23 sub-chunks of the “dressage” video of the Long Videos dataset. Each sub-chunk is separated from its previous and next sub-chunks based on the distribution drifts. Such drifts are common in media-provided videos, for example when an online or offline event, such as a sports competition, is recorded with multiple cameras. As a result, in the proposed dataset the authors first use the following strategy to select candidate frames for annotation and evaluation (a minimal sketch of the selection rule follows the list).

  • They select the first frame of each sub-chunk. It is interesting to see how VOS models handle the distribution drift that happens in the sequence, which corresponds to the arrival of a new task in continual learning.
  • The last frame of each sequence is also selected. The ground-truth label mask of the first frame is given to the model, as is standard in the semi-supervised VOS scenario.
  • One frame from the middle of each sub-chunk is also selected for annotation.
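
As referenced above, a minimal sketch of this selection rule, assuming the sub-chunk boundaries are already known as frame indices (the example values are hypothetical):

```python
def select_annotation_frames(boundaries, num_frames):
    """Pick candidate frames for annotation given sub-chunk start indices.

    boundaries: sorted frame indices where each sub-chunk begins (first is 0).
    num_frames: total number of frames in the video.
    """
    selected = set()
    ends = boundaries[1:] + [num_frames]          # exclusive end of each sub-chunk
    for start, end in zip(boundaries, ends):
        selected.add(start)                        # first frame of the sub-chunk
        selected.add(start + (end - start) // 2)   # one frame from the middle
    selected.add(num_frames - 1)                   # last frame of the sequence
    return sorted(selected)

# Hypothetical example: a 300-frame video with sub-chunks starting at 0, 120, 210.
print(select_annotation_frames([0, 120, 210], 300))
# -> [0, 60, 120, 165, 210, 255, 299]
```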

For CLVOS23, in addition to the 3 videos from the Long Videos dataset, the authors added 6 further videos from YouTube. All frames of the 6 newly added videos are extracted at a rate of 15 frames per second (FPS). To ensure that all distribution drifts are captured, the authors annotate only the first frame of each sub-chunk in the Long Videos dataset and add these to the uniformly selected annotated frames already provided. The proposed dataset has the following advantages over the Long Videos dataset.

  • It adds 5951 frames to the 7411 frames of the Long Videos dataset.
  • CLVOS23 increases the number of annotated frames from 63 in the Long Videos dataset to 284.
  • It increases the number of videos from 3 to 9.
  • The annotated frames are chosen based on the distribution drifts that occur in the videos (sub-chunks) rather than being selected uniformly.
| Video name | #Sub-chunks (tasks) | #Frames | #Annotated frames |
|---|---|---|---|
| dressage | 23 | 3589 | 43 |
| blueboy | 27 | 2406 | 47 |
| rat | 22 | 1416 | 42 |
| car | 18 | 1109 | 37 |
| dog | 12 | 891 | 25 |
| parkour | 24 | 1578 | 49 |
| skating | 5 | 778 | 11 |
| skiing | 5 | 692 | 11 |
| skiing-long | 9 | 903 | 19 |

Each video sequence’s specifications in the proposed CLVOS23 dataset. The first three videos (dressage, blueboy, and rat) are taken directly from the Long Videos dataset; the authors added additional annotated ground-truth frames to each of them to make them more appropriate for continual learning.

It is worth noting that, for a long VOS dataset, it is very expensive and often unnecessary to annotate every frame for evaluation. The authors used the Toronto Annotation Suite to annotate the selected frames. The frames of the 6 new videos were resized to a height of 480 pixels, with the width of each frame scaled proportionally.
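
The exact preprocessing pipeline is not published here; a hedged sketch of the two stated steps for the new videos (sampling at 15 FPS and resizing to a height of 480 px with proportional width) could look as follows, assuming opencv-python is installed and using placeholder paths.

```python
import os

import cv2

def extract_frames(video_path, out_dir, target_fps=15, target_height=480):
    """Sample frames at roughly `target_fps` and resize them to `target_height`."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
    step = max(1, round(src_fps / target_fps))      # keep every `step`-th frame
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            h, w = frame.shape[:2]
            new_w = round(w * target_height / h)    # width proportional to height
            frame = cv2.resize(frame, (new_w, target_height))
            cv2.imwrite(os.path.join(out_dir, f"{saved:05d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()

# extract_frames("video.mp4", "frames_out")  # placeholder paths
```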

Dataset links: Homepage · Research Paper · Kaggle

Summary #

CLVOS23: A Long Video Object Segmentation Dataset for Continual Learning is a dataset for semantic segmentation, object detection, and semi-supervised learning tasks. It is applicable or relevant across various domains.

The dataset consists of 13362 images with 284 labeled objects belonging to 5 different classes: person, dressage, rat, car, and dog.

Images in the CLVOS23 dataset have pixel-level semantic segmentation annotations. There are 13078 unlabeled images (98% of the total, i.e. without annotations). There are no pre-defined train/val/test splits in the dataset. Alternatively, the dataset can be split into 9 videos by name: dressage (3589 images), blueboy (2406 images), parkour boy (1578 images), rat (1416 images), car (1109 images), skiing slalom (903 images), dog (891 images), skating (778 images), and skiing (692 images). The dataset was released in 2023 by the University of Waterloo, Canada.
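
Since there are no predefined splits, one practical way to obtain the per-video split is to group image files by their name prefix; the objects table below shows names such as rat_05559.jpg and parkour_boy_0085.jpg, i.e. <video>_<frame index>.jpg. A minimal sketch, assuming the images sit in a single flat directory:

```python
import re
from collections import defaultdict
from pathlib import Path

def group_by_video(image_dir):
    """Map video name -> sorted list of frame paths, using <video>_<index>.jpg names."""
    groups = defaultdict(list)
    for path in Path(image_dir).glob("*.jpg"):
        match = re.match(r"(.+)_\d+$", path.stem)   # strip the trailing frame index
        if match:
            groups[match.group(1)].append(path)
    return {video: sorted(paths) for video, paths in groups.items()}

# videos = group_by_video("images")  # e.g. keys: 'dressage', 'blueboy', 'parkour_boy', ...
```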

Dataset Poster

Explore #

The CLVOS23 dataset has 13362 images. Open the "Explore" tool anytime you need to view dataset images with annotations. This tool has extended visualization capabilities like zoom, translation, an objects table, custom filters and more. Hover the mouse over the images to hide or show annotations.


Class balance #

There are 5 annotation classes in the dataset. Find the general statistics and balances for every class in the table below. Click any row to preview images that have labels of the selected class. Sort by column to find the most rare or prevalent classes.

| Class | Shape | Images | Objects | Count on image (avg) | Area on image (avg) |
|---|---|---|---|---|---|
| person | mask | 137 | 137 | 1 | 8.22% |
| dressage | mask | 43 | 43 | 1 | 9.77% |
| rat | mask | 42 | 42 | 1 | 5.56% |
| car | mask | 37 | 37 | 1 | 13.74% |
| dog | mask | 25 | 25 | 1 | 3.96% |
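
The "Area on image" column is the fraction of image pixels covered by the class mask, averaged over the images that contain that class. A minimal sketch of the per-image quantity, assuming the annotation is available as a binary mask array:

```python
import numpy as np

def area_on_image(mask):
    """Fraction of image pixels covered by a binary object mask, as a percentage."""
    mask = np.asarray(mask, dtype=bool)
    return 100.0 * mask.sum() / mask.size

# Hypothetical example: a 20x30 object in a 100x100 image covers 6% of it.
toy_mask = np.zeros((100, 100), dtype=bool)
toy_mask[10:30, 40:70] = True
print(area_on_image(toy_mask))  # 6.0
```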

Co-occurrence matrix #

The co-occurrence matrix is an extremely valuable tool that shows, for every pair of classes, how many images contain objects of both classes at the same time. If you click any cell, you will see those images. For your convenience, every cell has a tooltip with an explanation; just hover the mouse over a cell to preview the description.
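
As a minimal sketch of how such a matrix can be computed from per-image class lists (note that in CLVOS23 every annotated frame carries a single object, so off-diagonal counts are expected to be zero; the toy lists below are hypothetical):

```python
import numpy as np

def co_occurrence(image_classes, class_names):
    """Count, for every pair of classes, how many images contain both."""
    index = {name: i for i, name in enumerate(class_names)}
    matrix = np.zeros((len(class_names), len(class_names)), dtype=int)
    for classes in image_classes:
        present = sorted({index[c] for c in classes})
        for i in present:
            for j in present:
                matrix[i, j] += 1
    return matrix

class_names = ["person", "dressage", "rat", "car", "dog"]
toy_images = [["person"], ["rat"], ["person", "dog"]]   # hypothetical per-image class lists
print(co_occurrence(toy_images, class_names))
```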

Images #

Explore every single image in the dataset with respect to the number of annotations of each class it has. Click a row to preview the selected image. Sort by any column to find anomalies and edge cases. Use horizontal scroll if the table has many columns for a large number of classes in the dataset.

Object distribution #

The interactive heatmap chart for every class shows how many images in the dataset contain a certain number of objects of a specific class. Users can click a cell to see the list of all corresponding images.

Class sizes #

The table below gives various size properties of objects for every class. Click a row to see the image with annotations of the selected class. Sort columns to find classes with the smallest or largest objects or understand the size differences between classes.

| Class | Shape | Object count | Avg area | Max area | Min area | Min height | Max height | Avg height | Min width | Max width |
|---|---|---|---|---|---|---|---|---|---|---|
| person | mask | 137 | 8.21% | 31.31% | 0.08% | 25px (5.21%) | 480px (100%) | 273.04px (56.88%) | 31px (3.63%) | 567px (66.47%) |
| dressage | mask | 43 | 9.77% | 75.5% | 0.35% | 38px (10.56%) | 360px (100%) | 194.02px (53.9%) | 23px (4.79%) | 428px (89.17%) |
| rat | mask | 42 | 5.55% | 25.89% | 0.12% | 41px (8.54%) | 420px (87.5%) | 174.81px (36.42%) | 27px (2.81%) | 471px (49.06%) |
| car | mask | 37 | 13.74% | 81.51% | 0.12% | 17px (3.54%) | 399px (83.12%) | 129.32px (26.94%) | 39px (4.57%) | 853px (100%) |
| dog | mask | 25 | 3.96% | 24.58% | 0.1% | 29px (6.04%) | 406px (84.58%) | 147.84px (30.8%) | 17px (1.99%) | 573px (67.17%) |

Spatial Heatmap #

The heatmaps below give the spatial distributions of all objects for every class. These visualizations provide insights into the most probable and rare object locations on the image. It helps analyze objects' placements in a dataset.
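
A minimal sketch of the idea behind such a heatmap: resize every binary mask of a class to a common grid and average them, so bright regions mark the most probable object locations. The mask list, grid size, and use of opencv-python are assumptions.

```python
import cv2
import numpy as np

def spatial_heatmap(masks, size=(256, 256)):
    """Average a list of binary masks (H x W arrays) on a common grid."""
    acc = np.zeros(size, dtype=np.float64)
    for mask in masks:
        resized = cv2.resize(mask.astype(np.float32), (size[1], size[0]),
                             interpolation=cv2.INTER_LINEAR)
        acc += resized
    return acc / max(len(masks), 1)   # per-pixel frequency of the class

# heatmap = spatial_heatmap(person_masks)  # person_masks: list of binary mask arrays
```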

Spatial Heatmap

Objects #

The table contains all 284 objects. Click a row to preview an image with annotations, and use search or pagination to navigate. Sort columns to find outliers in the dataset.

Rows 1-10 of 284:

| Object ID | Class | Shape | Image name | Image size (h × w) | Height | Width | Area |
|---|---|---|---|---|---|---|---|
| 1 | rat | mask | rat_05559.jpg | 480 × 960 | 167px (34.79%) | 251px (26.15%) | 6.65% |
| 2 | person | mask | parkour_boy_0085.jpg | 480 × 853 | 164px (34.17%) | 97px (11.37%) | 1.66% |
| 3 | rat | mask | rat_01746.jpg | 480 × 960 | 95px (19.79%) | 63px (6.56%) | 0.99% |
| 4 | person | mask | skiing_0546.jpg | 480 × 853 | 42px (8.75%) | 41px (4.81%) | 0.18% |
| 5 | dressage | mask | dressage_12255.jpg | 360 × 480 | 114px (31.67%) | 81px (16.88%) | 2.33% |
| 6 | rat | mask | rat_04902.jpg | 480 × 960 | 198px (41.25%) | 323px (33.65%) | 8.66% |
| 7 | person | mask | skiing_slalom_0617.jpg | 480 × 853 | 82px (17.08%) | 112px (13.13%) | 0.83% |
| 8 | person | mask | blueboy_00909.jpg | 480 × 853 | 480px (100%) | 332px (38.92%) | 24.19% |
| 9 | car | mask | car_0090.jpg | 480 × 853 | 61px (12.71%) | 98px (11.49%) | 1.2% |
| 10 | person | mask | blueboy_01188.jpg | 480 × 853 | 406px (84.58%) | 203px (23.8%) | 13.32% |

License #

CLVOS23: A Long Video Object Segmentation Dataset for Continual Learning is under CC BY-NC-SA 4.0 license.

Source

Citation #

If you make use of the CLVOS23 data, please cite the following reference:

@dataset{CLVOS23,
  author={Amir Nazemi and Zeyad Moustafa and Paul Fieguth},
  title={CLVOS23: A Long Video Object Segmentation Dataset for Continual Learning},
  year={2023},
  url={https://github.com/Amir4g/CLVOS23}
}

Source

If you are happy with Dataset Ninja and use provided visualizations and tools in your work, please cite us:

@misc{ visualization-tools-for-clvos23-dataset,
  title = { Visualization Tools for CLVOS23 Dataset },
  type = { Computer Vision Tools },
  author = { Dataset Ninja },
  howpublished = { \url{ https://datasetninja.com/clvos23 } },
  url = { https://datasetninja.com/clvos23 },
  journal = { Dataset Ninja },
  publisher = { Dataset Ninja },
  year = { 2024 },
  month = { sep },
  note = { visited on 2024-09-08 },
}

Download #

The CLVOS23 dataset can be downloaded in Supervisely format:

As an alternative, it can be downloaded with the dataset-tools package:

pip install --upgrade dataset-tools

… using the following Python code:

import dataset_tools as dtools

dtools.download(dataset='CLVOS23', dst_dir='~/dataset-ninja/')

Make sure not to overlook the Python code example available on the Supervisely Developer Portal. It will give you a clear idea of how to effortlessly work with the downloaded dataset.

The data in original format can be downloaded here.

. . .

Disclaimer #

Our gal from the legal dep told us we need to post this:

Dataset Ninja provides visualizations and statistics for some datasets that can be found online and can be downloaded by general audience. Dataset Ninja is not a dataset hosting platform and can only be used for informational purposes. The platform does not claim any rights for the original content, including images, videos, annotations and descriptions. Joint publishing is prohibited.

You take full responsibility when you use datasets presented at Dataset Ninja, as well as other information, including visualizations and statistics we provide. You are in charge of compliance with any dataset license and all other permissions. You are required to navigate datasets homepage and make sure that you can use it. In case of any questions, get in touch with us at hello@datasetninja.com.