Butterfly-200 Dataset

Description

Butterfly-200 is a dataset for multi-granularity image recognition.

  • 25,279 high-resolution butterfly images, including natural images with the butterfly in its natural living environment and standard images with the butterfly in specimen form
  • Covering 200 common species, 116 genera, 23 subfamilies, and 5 families
  • All images are annotated with four-level category labels (see the sketch after this list)
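
Since every image carries one label at each of the four taxonomy levels, a per-image record can be kept very simple. Below is a minimal Python sketch of such a record; the class name, field names, and example values are hypothetical, not the dataset's released annotation format.

```python
from dataclasses import dataclass

@dataclass
class ButterflyLabel:
    """One image's four-level label; all field names are hypothetical."""
    species: int    # one of 200 species
    genus: int      # one of 116 genera
    subfamily: int  # one of 23 subfamilies
    family: int     # one of 5 families

# Hypothetical record for a single image.
label = ButterflyLabel(species=42, genus=17, subfamily=5, family=2)
print(label)
```
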
Citation

    “Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding”, Tianshui Chen, Wenxi Wu, Yuefang Gao, Le Dong, Xiaonan Luo, Liang Lin, in Proceedings of the ACM International Conference on Multimedia (ACM MM), 2018.

    Downloads

    SYSU-Clothes Dataset

    Description

    The SYSU-Clothes dataset is a new clothing database of elaborately annotated clothing items.

    • 2,098 high-resolution street fashion photos with 59 tags in total
    • Wide range of styles, accessories, garments, and poses
    • All images have image-level annotations
    • 1,000+ images have pixel-level annotations

    Citation

    “Clothes Co-Parsing via Joint Image Segmentation and Labeling with Application to Clothing Retrieval”, Xiaodan Liang, Liang Lin*, Wei Yang, Ping Luo, Junshi Huang, and Shuicheng Yan, IEEE Transactions on Multimedia (T-MM), 18(6): 1175-1186, 2016. (A shorter previous version was published in CVPR 2014.)

    Downloads


    Kinect2 Human Pose Dataset

    Description

    Kinect2 Human Gesture Dataset (K2HGD) includes about 100K depth images with various human poses under challenging scenarios. The dataset covers 19 body joints of 30 subjects under ten different challenging scenes. Each subject was asked to perform both normal daily poses and unusual poses. The named body joints are: Head, Neck, MiddleSpine, RightShoulder, RightElbow, RightHand, LeftShoulder, LeftElbow, LeftHand, RightHip, RightKnee, RightFoot, LeftHip, LeftKnee, LeftFoot. The ground-truth body joints were first estimated via the Kinect SDK and then refined by active users.
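
    The joint listing above translates directly into a small constant. Below is a minimal Python sketch of storing the named joints and one pose per depth frame; note that only 15 joint names are listed although the description mentions 19 joints, and the (x, y) array layout is an assumption rather than the dataset's released format.

    ```python
    import numpy as np

    # The 15 joint names listed in the description.
    K2HGD_JOINTS = [
        "Head", "Neck", "MiddleSpine",
        "RightShoulder", "RightElbow", "RightHand",
        "LeftShoulder", "LeftElbow", "LeftHand",
        "RightHip", "RightKnee", "RightFoot",
        "LeftHip", "LeftKnee", "LeftFoot",
    ]

    # One pose per frame as a (num_joints, 2) array of (x, y) pixel
    # coordinates -- an assumed layout, not the released file format.
    pose = np.zeros((len(K2HGD_JOINTS), 2), dtype=np.float32)
    print(pose.shape)  # (15, 2)
    ```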

    Citation

    Keze Wang, Shengfu Zhai, Hui Cheng, Xiaodan Liang, and Liang Lin. Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning. In Proceedings of the ACM International Conference on Multimedia (ACM MM), 2016.

     

    Downloads


    Object Extraction Dataset

    Description

    This newly collected Object Extraction dataset contains 10,183 images with ground-truth segmentation masks. We selected images from the PASCAL [1], iCoseg [2], and Internet [3] datasets, as well as other data from the web (mostly people and clothes). We randomly split the dataset into 8,230 images for training and 1,953 images for testing.
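
    The stated 8,230 / 1,953 split can be reproduced with a fixed-seed shuffle, as in the sketch below. The directory name and the seed are hypothetical, and any official split files that ship with the dataset should take precedence.

    ```python
    import random
    from pathlib import Path

    # Hypothetical layout; adjust the path to the released archive.
    images = sorted(Path("ObjectExtraction/images").glob("*.jpg"))

    rng = random.Random(0)  # fixed seed so the split is reproducible
    rng.shuffle(images)
    train, test = images[:8230], images[8230:]
    print(len(train), len(test))  # expected: 8230 1953 (of 10,183 images)
    ```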

    Citation

    Xiaolong Wang, Liliang Zhang, Liang Lin*, Zhujin Liang, Wangmeng Zuo, “Deep Joint Task Learning for Generic Object Extraction”, NIPS 2014.

     

    Downloads


    SYSU-Shape Dataset

    Description

    The SYSU-Shapes dataset is a new shape database of elaborately annotated shape contours. Compared with existing shape databases, it includes more realistic challenges in shape detection and localization, e.g., cluttered backgrounds, large intra-class variations, and different poses/views; some of the instances were originally used for appearance-based object detection.

    There are 5 categories, i.e., airplanes, boats, cars, motorbikes, and bicycles, and each category contains 200-500 images.

    Citation

    Liang Lin, Xiaolong Wang, Wei Yang, and JianHuang Lai, Discriminatively Trained And-Or Graph Models for Object Shape Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), DOI: 10.1109/TPAMI.2014.2359888, 2014.

    Downloads


    Taobao Commodity Dataset (TCD)

    Description

    TCD contains 800 commodity images (dresses, jeans, T-shirts, shoes, and hats) from shops on the Taobao website. The ground-truth masks were obtained by inviting Taobao sellers to annotate their own commodities, i.e., to mask the salient objects they want to show in their listings. The images cover many kinds of commodities, with and without human models, and thus feature complex backgrounds and scenes with highly complex foregrounds. Pixel-accurate ground-truth masks are given.
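
    Because pixel-accurate masks are provided, a predicted saliency map can be scored directly against them. The sketch below uses mean absolute error (MAE), a standard saliency metric; the file names are hypothetical and the masks are assumed to be single-channel images with values in 0-255.

    ```python
    import numpy as np
    from PIL import Image

    # Hypothetical paths; masks assumed single-channel, 0..255.
    pred = np.asarray(Image.open("pred/0001.png").convert("L"), dtype=np.float64) / 255.0
    gt = np.asarray(Image.open("TCD/masks/0001.png").convert("L"), dtype=np.float64) / 255.0

    mae = np.abs(pred - gt).mean()  # lower is better
    print(f"MAE: {mae:.4f}")
    ```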

    Citation

     Keze Wang, Liang Lin, Jiangbo Lu, Chenglong Li, Keyang Shi. “PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Edge-Preserving Coherence.”, in IEEE Trans. Image Process., vol. 24, no. 10, pp. 3019-3033, 2015. (A shorter previous version was published in CVPR 2013.)

    Downloads


    SYSU-FLL-CEUS Dataset

    Description

    The dataset consists of contrast-enhanced ultrasound (CEUS) data of focal liver lesions (FLLs) of three types: 186 HCC, 109 HEM, and 58 FNH instances (i.e., 186 malignant and 167 benign instances). The equipment used was an Aplio SSA-770A (Toshiba Medical System), and all videos included in the dataset were collected from pre-operative scans.

    Citation

    Xiaodan Liang, Liang Lin, Qingxing Cao, Rui Huang, Yongtian Wang, “Recognizing Focal Liver Lesions in CEUS with Dynamically Trained Latent Structured Models”, IEEE Transactions on Medical Imaging (T-MI), 2015.

    Downloads


    HumanParsing-Dataset

    Description

    This human parsing dataset includes detailed pixel-wise annotations for fashion images. It was introduced in our T-PAMI paper “Deep Human Parsing with Active Template Regression” and our ICCV 2015 paper “Human Parsing with Contextualized Convolutional Neural Network”. The dataset contains 7,700 images: 6,000 for training, 1,000 for testing, and 700 for validation.
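
    A common way to store pixel-wise parsing annotations is a single-channel indexed PNG with one class index per pixel; the sketch below reads one such file. Both the file name and this storage format are assumptions, so check the released annotations for the actual layout.

    ```python
    import numpy as np
    from PIL import Image

    # Hypothetical file name; assumed to be an indexed, single-channel PNG.
    label_map = np.asarray(Image.open("annotations/0001.png"))
    print(label_map.shape)       # (H, W)
    print(np.unique(label_map))  # class indices present in this image
    ```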

    Citation

    Xiaodan Liang, Si Liu, Xiaohui Shen, Jianchao Yang, Luoqi Liu, Jian Dong, Liang Lin, Shuicheng Yan, “Deep Human Parsing with Active Template Regression”, IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), in press, 2015.

    Downloads


    CUHK-SYSU Dataset

    Description

    The dataset is a large-scale benchmark for person search, containing 18,184 images and 8,432 identities.
    The dataset can be divided into two parts according to the image source: street snaps and movies. Street-snap images were collected with hand-held cameras across hundreds of scenes, aiming to include as much variation in viewpoint, lighting, resolution, occlusion, and background as possible. Movies and TV dramas were chosen as the other source because they provide more diversified scenes and more challenging viewpoints.

    Citation

    Tong Xiao*, Shuang Li*, Bochao Wang, Liang Lin, Xiaogang Wang. Joint Detection and Identification Feature Learning for Person Search. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Spotlight, 2017

    Downloads


    SYSU-CT and SYSU-US Datasets

    Description

    SYSU-CT and SYSU-US are both provided by the First Affiliated Hospital of Sun Yat-sen University. The SYSU-CT dataset consists of seven CT volumetric images of liver tumors from different patients; all patients were scanned using a 64-detector-row CT machine (Aquilion 64, Toshiba Medical System). The SYSU-US dataset consists of 20 abdominal US image sequences with liver tumors.

    Citation

    Liang Lin, Wei Yang, Chenglong Li, Jin Tang, and Xiaochun Cao. Inference with Collaborative Model for Interactive Tumor Segmentation in Medical Image Sequences. IEEE Transactions on Cybernetics (T-Cybernetics), 2015

    Downloads


    CAMPUS-Human Dataset

    Description

    This database includes general and realistic challenges for people re-identification in surveillance. We recorded videos using 3 cameras from different views and extracted individuals as well as video shots from the videos. We also annotated the body-part configuration for each query instance, and the ID and location (bounding box) for each target in every video shot. In total, there are 370 reference images (normalized to 175 pixels in height) of 74 individuals, with IDs and locations provided. We extracted 214 shots (640 x 360) containing 1,519 target individuals. Note that the targets often appear with diverse poses/views or are occluded by other people within the scenarios.

    Citation

    Yuanlu Xu, Liang Lin*, Wei-Shi Zheng, and Xiaobai Liu, “Human Re-identification by Matching Compositional Template with Cluster Sampling”, Proc. of IEEE International Conference on Computer Vision (ICCV), 2013

     

    Downloads


    Office Activity (OA) Dataset

    Description

    The Office Activity (OA) dataset collected by us is a more complex activity dataset covering common daily activities in offices. It is a large dataset with 1,180 RGB-D activity sequences. To capture human activities from multiple views, we set up three RGB-D sensors at different viewpoints for recording, and each subject was asked to perform each activity twice. To increase the variability of the activities, we recorded them in two different scenes, i.e., two different offices. More importantly, we consider not only single-person activities but also activities involving more than one person.

    Citation

    Liang Lin, Keze Wang, Wangmeng Zuo, Meng Wang, Jiebo Luo, and Lei Zhang, “A Deep Structured Model with Radius-Margin Bound for 3D Human Activity Recognition”, International Journal of Computer Vision (IJCV), 118(2): 256-273, 2016.

     

    Downloads


    Grayscale-Thermal Foreground Detection Dataset

    Description

    Multi-modal moving object detection urgently needs study, since single-modality videos are often inadequate, yet there are almost no complete, high-quality multi-modal datasets to use. We therefore propose a multi-modal moving object detection dataset; the details are as follows. The dataset mainly considers seven challenges: intermittent motion, low illumination, bad weather, intense shadow, dynamic scenes, background clutter, and thermal crossover.

    The following main aspects were taken into account in creating the grayscale-thermal videos:

    1. Scene category: laboratory rooms, campus roads, playgrounds, water pools, etc.

    2. Object category: rigid and non-rigid objects, such as vehicles, pedestrians, and animals.

    3. Intermittent motion.

    4. Shadow effect.

    5. Illumination condition.

    6. Background factor.

     

    Citation

    Chenglong Li, Xiao Wang, Lei Zhang, Jin Tang, Hejun Wu, Liang Lin*, “WELD: Weighted Low-rank Decomposition for Robust Grayscale-Thermal Foreground Detection”, IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), DOI: 10.1109/TCSVT.2016.2556586, 2016.

     

    Downloads


    Grayscale-Thermal Object Tracking (GTOT) Benchmark

    Description

    We collected 50 grayscale-thermal video clips under different scenarios and conditions, e.g., office areas, public roads, and water pools. Each grayscale video is paired with one thermal video. We manually annotated them with ground-truth bounding boxes. All annotations were done by a single full-time annotator to guarantee consistency.
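
    A tracking loop over one clip only needs the paired frame lists and the annotated boxes. The sketch below assumes one directory per clip with separate grayscale and thermal frame folders and a ground-truth file holding one "x y w h" box per line; none of these names or formats come from the benchmark's documentation.

    ```python
    from pathlib import Path

    # Hypothetical per-clip layout.
    seq = Path("GTOT/sequence01")
    gray = sorted((seq / "grayscale").glob("*.png"))
    thermal = sorted((seq / "thermal").glob("*.png"))
    boxes = [tuple(map(float, line.split()))
             for line in (seq / "groundtruth.txt").read_text().splitlines()]

    for g_path, t_path, (x, y, w, h) in zip(gray, thermal, boxes):
        pass  # load the paired frames and evaluate a tracker on the box
    ```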

     

    Citation

    Chenglong Li, Hui Cheng, Shiyi Hu, Xiaobai Liu, Jin Tang, and Liang Lin, “Learning Collaborative Sparse Representation for Grayscale-Thermal Tracking”, IEEE Transactions on Image Processing, 2016.

     

    Downloads