Highly Accurate
Dichotomous Image Segmentation


Image Background Removal
Art Design
Simulate View Motion
AR application based on images
AR application based on videos
3D Video Production


Existing image segmentation tasks mainly focus on segmenting objects with specific characteristics, e.g., salient, camouflaged, or meticulous objects, or objects of specific categories. Most of them share the same input/output formats, and their models rarely use mechanisms designed exclusively for their segmentation targets, which means almost all of these tasks are dataset-dependent. It is therefore promising to formulate a category-agnostic dichotomous image segmentation (DIS) task for accurately segmenting objects of varying structural complexity, regardless of their characteristics. Compared with semantic segmentation, the proposed DIS task usually focuses on images with a single target or a few targets, which makes it more feasible to obtain rich, accurate details for each target.

To build the highly accurate Dichotomous Image Segmentation dataset (DIS5K), we first manually collected over 12,000 images from Flickr based on our pre-designed keywords. From these 12,000 images, we then selected 5,470 images covering 22 groups and 225 categories according to the structural complexities of the objects. Each image was then manually labeled with pixel-wise accuracy using GIMP. The labeled targets in DIS5K are mainly the “objects of the images defined by the pre-designed keywords (categories)”, regardless of their characteristics, e.g., salient, common, camouflaged, meticulous, etc. The average per-image labeling time is ∼30 minutes, and some images took up to 10 hours.


Groups and Categories


Quantitative Complexities


Qualitative Complexities


Image Characteristics


Intra- and Inter-category Similarities


Dataset Splitting

Download (Newly Released Jul. 16, 2022)

DIS5K Dataset (5,470)


Training Set (3,000)

The training set (DIS-TR) contains 3,000 images paired with accurately labeled binary masks.


Validation Set (470)

The validation set (DIS-VD) contains 470 images paired with accurately labeled binary masks.


Test Set (2,000)

The test set contains 2,000 images, which are split into four subsets (DIS-TE1, DIS-TE2, DIS-TE3 and DIS-TE4, each containing 500 images) based on the shape complexities of the labeled masks.
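The split sizes above can be checked with a short script. The following is a minimal sketch, not the authors' tooling: it records the stated split sizes, confirms that they add up to the 5,470 images of DIS5K, and pairs images with masks by file stem. The helper names (`total_images`, `pair_by_stem`), the example paths, and the convention that a mask shares its image's file stem are assumptions made for illustration; adjust them to the layout of the downloaded archive.

```python
from pathlib import PurePosixPath

# Split sizes as stated in the text above.
SPLIT_SIZES = {
    "DIS-TR": 3000,   # training set
    "DIS-VD": 470,    # validation set
    "DIS-TE1": 500,   # test subsets, grouped by mask shape complexity
    "DIS-TE2": 500,
    "DIS-TE3": 500,
    "DIS-TE4": 500,
}


def total_images(split_sizes):
    """Return the total number of images across all splits."""
    return sum(split_sizes.values())


def pair_by_stem(image_paths, mask_paths):
    """Pair each image with the mask that has the same file stem.

    Assumption (hypothetical naming convention): 'x.jpg' is labeled by 'x.png'.
    Returns (pairs, missing), where 'missing' lists images without a mask.
    """
    masks = {PurePosixPath(p).stem: p for p in mask_paths}
    pairs, missing = [], []
    for img in sorted(image_paths):
        stem = PurePosixPath(img).stem
        if stem in masks:
            pairs.append((img, masks[stem]))
        else:
            missing.append(img)
    return pairs, missing


if __name__ == "__main__":
    # The six splits should account for every image in DIS5K.
    assert total_images(SPLIT_SIZES) == 5470

    # Hypothetical file names, used only to demonstrate the pairing.
    images = ["DIS-TR/im/a.jpg", "DIS-TR/im/b.jpg"]
    masks = ["DIS-TR/gt/a.png"]
    pairs, missing = pair_by_stem(images, masks)
    print(pairs)
    print(missing)
```

Reporting unmatched images separately, rather than silently dropping them, makes an incomplete download or a naming mismatch visible before training or evaluation starts.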



Qualitative Comparisons Against SOTAs.


Xuebin Qin, Hang Dai, Xiaobin Hu, Deng-Ping Fan, Ling Shao, and Luc Van Gool. Highly Accurate Dichotomous Image Segmentation. ECCV, 2022. [arXiv] [Chinese version] [GitHub] [DIS5K Dataset]


               author={Xuebin Qin and Hang Dai and Xiaobin Hu and Deng-Ping Fan and Ling Shao and Luc Van Gool},
               title={Highly Accurate Dichotomous Image Segmentation},