Existing image segmentation tasks mainly focus on segmenting objects with specific characteristics, e.g., salient, camouflaged, or meticulous objects, or objects of specific categories. Most of them share the same input/output formats, and their models rarely use mechanisms designed exclusively for their targets, which means almost all of these tasks are dataset-dependent. It is therefore promising to formulate a category-agnostic dichotomous image segmentation (DIS) task that accurately segments objects of varying structural complexity, regardless of their characteristics. Compared with semantic segmentation, the proposed DIS task usually focuses on images with a single target or a few targets, which makes it more feasible to obtain rich, accurate details of each target.
To build the highly accurate Dichotomous Image Segmentation dataset (DIS5K), we first manually collected over 12,000 images from Flickr based on our pre-designed keywords. We then selected 5,470 images covering 22 groups and 225 categories from these 12,000 images according to the structural complexities of the objects. Each image was then manually labeled with pixel-wise accuracy using GIMP. The labeled targets in DIS5K are mainly the “objects of the images defined by the pre-designed keywords (categories)”, regardless of their characteristics, e.g., salient, common, camouflaged, meticulous, etc. The average per-image labeling time is ∼30 minutes, and some images took up to 10 hours.
Training Set (3,000)
The training set (DIS-TR) contains 3,000 images paired with accurately labeled binary masks.
Validation Set (470)
The validation set (DIS-VD) contains 470 images paired with accurately labeled binary masks.
Test Set (2,000)
The test set (DIS-TE) contains 2,000 images, which are split into four subsets (DIS-TE1, DIS-TE2, DIS-TE3, and DIS-TE4) of 500 images each, based on the shape complexities of the labeled masks.
Benchmark
Qualitative Comparisons Against SOTAs.
Citation
Xuebin Qin, Hang Dai, Xiaobin Hu, Deng-Ping Fan, Ling Shao, and Luc Van Gool. Highly Accurate Dichotomous Image Segmentation. ECCV, 2022. [arXiv] [Chinese] [GitHub] [DIS5K Dataset]
Bibtex
@InProceedings{qin2022,
  author={Xuebin Qin and Hang Dai and Xiaobin Hu and Deng-Ping Fan and Ling Shao and Luc Van Gool},
  title={Highly Accurate Dichotomous Image Segmentation},
  booktitle={ECCV},
  year={2022}
}
Image Background Removal
Image background removal.
Art Design
Art design based on background-removed images.
Simulate View Motion
Combine one or more background-removed images with a background image to simulate camera view motion.
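One possible reading of this compositing step is plain alpha blending with a shifting offset. The sketch below is only an illustration under that assumption: the function names `composite` and `simulate_view_motion`, the grayscale list-of-rows image format, and the horizontal-shift model of view motion are all hypothetical and are not taken from the demo itself.

```python
def composite(background, foreground, alpha, offset_x=0):
    """Alpha-blend a foreground onto a copy of the background.

    All inputs are lists of rows of floats (grayscale, hypothetical format).
    alpha is the segmentation mask in [0, 1]: 1 keeps the foreground pixel,
    0 keeps the background pixel. offset_x shifts the foreground to the right.
    """
    out = [row[:] for row in background]  # leave the input background untouched
    for y, (fg_row, a_row) in enumerate(zip(foreground, alpha)):
        if y >= len(out):
            break
        for x, (fg, a) in enumerate(zip(fg_row, a_row)):
            tx = x + offset_x
            if 0 <= tx < len(out[y]):  # skip pixels shifted outside the frame
                out[y][tx] = a * fg + (1.0 - a) * out[y][tx]
    return out


def simulate_view_motion(background, foreground, alpha, offsets):
    """Return one composited frame per horizontal offset (assumed motion model)."""
    return [composite(background, foreground, alpha, dx) for dx in offsets]


if __name__ == "__main__":
    bg = [[0.0, 0.0, 0.0, 0.0]]
    fg = [[1.0, 1.0]]
    mask = [[1.0, 0.5]]
    for frame in simulate_view_motion(bg, fg, mask, offsets=[0, 1, 2]):
        print(frame)
```

Blending with a soft mask rather than a hard 0/1 mask is what lets fine boundary details survive the composite, which is why accurate masks matter for this kind of application.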
AR Application
This video presents the process of moving an object from cyberspace into the real world.
AR Application
Demo generated by overlaying background-removed videos on an image. This demo tries to bridge cyberspace and the real world.
3D Video Production
The 3D model shown in this video is built by extruding a highly accurate segmentation mask. Its texture is the background-removed image. The whole process requires almost no human intervention.
Groups and Categories
Our highly accurate Dichotomous Image Segmentation dataset (DIS5K) contains 5,470 images of 22 groups and 225 categories, manually collected from Flickr based on our pre-designed keywords.
Number of Images:
5,470
Number of Groups:
22
Number of Categories:
225
Quantitative Complexities
These plots illustrate the comparisons of different dichotomous datasets in terms of correlations between dataset scale (D), the isoperimetric inequality quotient (IPQ), the number of object contours (Cnum) and the number of dominant points (Pnum).
D:
Higher is better
IPQ:
Higher is better
Cnum:
Higher is better
Pnum:
Higher is better
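As a rough illustration of the IPQ metric named above, the sketch below computes it for a tiny binary mask. It assumes the convention IPQ = L²/(4πA), with L the object perimeter and A the region area; this formula is not stated in this section and should be checked against the paper. The perimeter approximation (counting exposed pixel edges) and the function names are hypothetical choices made for illustration only.

```python
import math


def perimeter_and_area(mask):
    """Return (perimeter, area) of the foreground in a binary mask.

    mask is a list of rows of 0/1 values. The perimeter is approximated as
    the number of unit pixel edges that separate a foreground pixel from a
    background pixel or from the image border (an assumption of this sketch).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    area = 0
    perimeter = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            area += 1
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = ny < 0 or ny >= height or nx < 0 or nx >= width
                if outside or not mask[ny][nx]:
                    perimeter += 1
    return perimeter, area


def ipq(mask):
    """Assumed convention: IPQ = L^2 / (4 * pi * A); larger means more complex."""
    perimeter, area = perimeter_and_area(mask)
    if area == 0:
        raise ValueError("mask has no foreground pixels")
    return perimeter ** 2 / (4.0 * math.pi * area)


if __name__ == "__main__":
    square = [[1, 1], [1, 1]]
    thin_line = [[1, 1, 1, 1]]
    # The thin line has the same area as the square but a longer boundary,
    # so its IPQ is larger under the assumed convention.
    print(ipq(square), ipq(thin_line))
```

Under this convention, compact shapes score low and thin or intricate shapes score high, which is consistent with treating a higher IPQ as a more challenging dataset.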
Qualitative Complexities
Our DIS5K dataset contains diversified targets from different categories, which have very different structure complexities.
Compared with other dichotomous segmentation datasets, our DIS5K provides more detailed and accurate ground truth masks and has larger intra-categorical image diversities.
Image Characteristics
This figure shows five typical samples from our DIS5K, which have characteristics similar to those of existing dichotomous segmentation tasks, such as salient object detection (SOD), salient objects in clutter (SOC), camouflaged object detection (COD), thin object segmentation (TOS), and meticulous object segmentation (MOS).
Intra- and Inter-category Similarities
Our DIS5K dataset provides relatively richer samples for studying the intra-category and inter-category similarities and dissimilarities. More qualitative and quantitative studies will be helpful to diversified vision tasks, such as image (shape) classification, segmentation, etc.
Keywords:
Intra- and inter-category similarities.
Dataset Splitting
We split the 5,470 images in DIS5K into three subsets: DIS-TR (3,000), DIS-VD (470), and DIS-TE (2,000) for training, validation, and testing, respectively. Since the object shapes and structural complexities in our dataset are diverse, the 2,000 images of DIS-TE are further split into four subsets with ascending shape complexity for a more comprehensive evaluation. Specifically, we first rank the 2,000 testing images in ascending order of the product (IPQ × Pnum) of their structure complexity IPQ and boundary complexity Pnum. DIS-TE is then split into four subsets (i.e., DIS-TE1∼DIS-TE4) of 500 images each, representing four levels of testing difficulty.
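The ranking-and-splitting procedure described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `split_by_complexity`, the `(name, ipq, pnum)` tuple layout, and the toy values are all hypothetical.

```python
def split_by_complexity(samples, num_subsets=4):
    """Rank samples by IPQ * Pnum (ascending) and cut them into equal subsets.

    samples: list of (name, ipq, pnum) tuples (hypothetical layout).
    Returns a list of num_subsets lists of names, easiest subset first.
    """
    if len(samples) % num_subsets != 0:
        raise ValueError("number of samples must be divisible by num_subsets")
    ranked = sorted(samples, key=lambda s: s[1] * s[2])
    size = len(ranked) // num_subsets
    return [
        [name for name, _, _ in ranked[i * size:(i + 1) * size]]
        for i in range(num_subsets)
    ]


if __name__ == "__main__":
    # Toy values for illustration only; they are not taken from DIS5K.
    toy = [
        ("a", 1.0, 10), ("b", 5.0, 100), ("c", 2.0, 20), ("d", 3.0, 50),
        ("e", 1.5, 5), ("f", 10.0, 200), ("g", 4.0, 30), ("h", 2.5, 8),
    ]
    for level, names in enumerate(split_by_complexity(toy), start=1):
        print(f"subset {level}: {names}")
```

With 2,000 ranked test images and `num_subsets=4`, each subset would hold 500 images, matching the DIS-TE1 to DIS-TE4 sizes stated above.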