List of datasets in computer vision and image processing

This is a list of datasets for computer vision and image processing, used in machine learning research. It is part of the broader list of datasets for machine-learning research. These datasets consist primarily of images or videos for tasks such as object detection, facial recognition, and multi-label classification.

Object detection and recognition

Dataset Name Brief description Preprocessing Instances Format Default Task Created (updated) Reference Creator
MNIST Database of grayscale handwritten digits. 60,000 image, label classification 1994 [1] LeCun et al.
Extended MNIST Database of grayscale handwritten digits and letters. 810,000 image, label classification 2010 [2] NIST
NYU Object Recognition Benchmark (NORB) Stereoscopic pairs of photos of toys in various orientations. Centering, perturbation. 97,200 image pairs (50 uniform-colored toys under 36 angles, 9 azimuths, and 6 lighting conditions) Images Object recognition 2004 [3][4] LeCun et al.
80 Million Tiny Images 80 million 32×32 images labelled with 75,062 non-abstract nouns. 80,000,000 image, label 2008 [5] Torralba et al.
Street View House Numbers (SVHN) 630,420 digits with bounding boxes in house numbers captured in Google Street View. 630,420 image, label, bounding boxes 2011 [6][7] Netzer et al.
JFT-300M Dataset internal to Google Research. 303M images with 375M labels in 18291 categories 303,000,000 image, label 2017 [8][9][10] Google Research
JFT-3B Internal to Google Research. 3 billion images, annotated with ~30k categories in a hierarchy. 3,000,000,000 image, label 2021 [11] Google Research
Places 10+ million images in 400+ scene classes, with 5000 to 30,000 images per class. 10,000,000 image, label 2018 [12] Zhou et al.
Ego 4D A massive-scale, egocentric dataset and benchmark suite collected across 74 worldwide locations and 9 countries, with over 3,670 hours of daily-life activity video. Object bounding boxes, transcriptions, labeling. 3,670 video hours video, audio, transcriptions Multimodal first-person task 2022 [13] K. Grauman et al.
Wikipedia-based Image Text Dataset 37.5 million image-text examples with 11.5 million unique images across 108 Wikipedia languages. 11,500,000 image, caption Pretraining, image captioning 2021 [14] Srinivasan et al., Google Research
Visual Genome Images and their description 108,000 images, text Image captioning 2016 [15] R. Krishna et al.
Berkeley 3-D Object Dataset 849 images taken in 75 different scenes. About 50 different object classes are labeled. Object bounding boxes and labeling. 849 labeled images, text Object recognition 2014 [16][17] A. Janoch et al.
Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500) 500 natural images, explicitly separated into disjoint train, validation and test subsets + benchmarking code. Based on BSDS300. Each image segmented by five different subjects on average. 500 Segmented images Contour detection and hierarchical image segmentation 2011 [18] University of California, Berkeley
Microsoft Common Objects in Context (MS COCO) Complex everyday scenes of common objects in their natural context. Object highlighting, labeling, and classification into 91 object types. 2,500,000 Labeled images, text Object recognition, image segmentation, keypointing, image captioning 2015 [19][20][21] T. Lin et al.
ImageNet Labeled object image database, used in the ImageNet Large Scale Visual Recognition Challenge Labeled objects, bounding boxes, descriptive words, SIFT features 14,197,122 Images, text Object recognition, scene recognition 2009 (2014) [22][23][24] J. Deng et al.
SUN (Scene UNderstanding) Very large scene and object recognition database. Places and objects are labeled. Objects are segmented. 131,067 Images, text Object recognition, scene recognition 2014 [25][26] J. Xiao et al.
LSUN (Large SUN) 10 scene categories (bedroom, etc) and 20 object categories (airplane, etc) Images and labels. ~60 million Images, text Object recognition, scene recognition 2015 [27][28][29] Yu et al.
LVIS (Large Vocabulary Instance Segmentation) segmentation masks for over 1000 entry-level object categories in images 2.2 million segmentations, 164K images Images, segmentation masks. image segmentation masking 2019 [30]
Open Images A large set of images listed as having CC BY 2.0 license with image-level labels and bounding boxes spanning thousands of classes. Image-level labels, Bounding boxes 9,178,275 Images, text Classification, Object recognition 2017 (V7: 2022) [31]
TV News Channel Commercial Detection Dataset TV commercials and news broadcasts. Audio and video features extracted from still images. 129,685 Text Clustering, classification 2015 [32][33] P. Guha et al.
Statlog (Image Segmentation) Dataset The instances were drawn randomly from a database of 7 outdoor images and hand-segmented to create a classification for every pixel. Many features calculated. 2310 Text Classification 1990 [34] University of Massachusetts
Caltech 101 Pictures of objects. Detailed object outlines marked. 9146 Images Classification, object recognition 2003 [35][36] F. Li et al.
Caltech-256 Large dataset of images for object classification. Images categorized and hand-sorted. 30,607 Images, Text Classification, object detection 2007 [37][38] G. Griffin et al.
COYO-700M Image–text-pair dataset 10 billion pairs of alt-text and image sources in HTML documents in CommonCrawl 746,972,269 Images, Text Classification, Image-Language 2022 [39]
SIFT10M Dataset SIFT features of Caltech-256 dataset. Extensive SIFT feature extraction. 11,164,866 Text Classification, object detection 2016 [40] X. Fu et al.
LabelMe Annotated pictures of scenes. Objects outlined. 187,240 Images, text Classification, object detection 2005 [41] MIT Computer Science and Artificial Intelligence Laboratory
PASCAL VOC Dataset Images in 20 categories and localization bounding boxes. Labeling, bounding box included 500,000 Images, text Classification, object detection 2010 [42][43] M. Everingham et al.
CIFAR-10 Dataset Many small, low-resolution images of 10 classes of objects. Classes labelled, training set splits created. 60,000 Images Classification 2009 [23][44] A. Krizhevsky et al.
CIFAR-100 Dataset Like CIFAR-10, above, but 100 classes of objects are given. Classes labelled, training set splits created. 60,000 Images Classification 2009 [23][44] A. Krizhevsky et al.
CINIC-10 Dataset A unified contribution of CIFAR-10 and ImageNet with 10 classes, and 3 splits. Larger than CIFAR-10. Classes labelled, training, validation, test set splits created. 270,000 Images Classification 2018 [45] Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey
Fashion-MNIST An MNIST-like fashion product database Classes labelled, training set splits created. 60,000 Images Classification 2017 [46] Zalando SE
notMNIST Some publicly available fonts and extracted glyphs from them to make a dataset similar to MNIST. There are 10 classes, with letters A–J taken from different fonts. Classes labelled, training set splits created. 500,000 Images Classification 2011 [47] Yaroslav Bulatov
Linnaeus 5 dataset Images of 5 classes of objects. Classes labelled, training set splits created. 8000 Images Classification 2017 [48] Chaladze & Kalatozishvili
11K Hands 11,076 hand images (1600 x 1200 pixels) of 190 subjects, of varying ages between 18 – 75 years old, for gender recognition and biometric identification. None 11,076 hand images Images and (.mat, .txt, and .csv) label files Gender recognition and biometric identification 2017 [49] M Afifi
CORe50 Designed specifically for continuous/lifelong learning and object recognition; a collection of more than 500 videos (30fps) of 50 domestic objects belonging to 10 different categories. Classes labelled, training set splits created based on a 3-way, multi-runs benchmark. 164,866 RGB-D images images (.png or .pkl) and (.pkl, .txt, .tsv) label files Classification, Object recognition 2017 [50] V. Lomonaco and D. Maltoni
OpenLORIS-Object A lifelong/continual robotic vision dataset collected by real robots mounted with multiple high-resolution sensors. It includes 121 object instances (first version of the dataset: 40 categories of daily-necessity objects under 20 scenes). The dataset rigorously considers 4 environment factors under different scenes (illumination, occlusion, object pixel size, and clutter) and explicitly defines the difficulty levels of each factor. Classes labelled, training/validation/testing set splits created by benchmark scripts. 1,106,424 RGB-D images images (.png and .pkl) and (.pkl) label files Classification, Lifelong object recognition, Robotic Vision 2019 [51] Q. She et al.
THz and thermal video data set This multispectral data set includes terahertz, thermal, visual, near infrared, and three-dimensional videos of objects hidden under people's clothes. images and 3D point clouds More than 20 videos. The duration of each video is about 85 seconds (about 345 frames). AP2J Experiments with hidden object detection 2019 [52][53] Alexei A. Morozov and Olga S. Sushkova
TomatoMAP A large-scale, annotated RGB image dataset of tomato plants designed for fine-grained phenotyping. Labeling, bounding box included 720,938 images Image classification, object detection, semantic segmentation, instance segmentation 2026 [54] Y. Zhang et al.

3D Objects

See (Calli et al., 2015)[55] for a review of 33 datasets of 3D objects as of 2015. See (Downs et al., 2022)[56] for a review of more datasets as of 2022.

Dataset Name Brief description Preprocessing Instances Format Default Task Created (updated) Reference Creator
Princeton Shape Benchmark 3D polygonal models collected from the Internet 1814 models in 92 categories 3D polygonal models, categories shape-based retrieval and analysis 2004 [57][58] Shilane et al.
Berkeley 3-D Object Dataset (B3DO) Depth and color images collected from crowdsourced Microsoft Kinect users. Annotated in 50 object categories. 849 images, in 75 scenes color image, depth image, object class, bounding boxes, 3D center points Predict bounding boxes 2011, updated 2014 [59] Janoch et al.
ShapeNet 3D models. Some are classified into WordNet synsets, like ImageNet. Partially classified into 3,135 categories. 3,000,000 models, 220,000 of which are classified. 3D models, class labels Predict class label. 2015 [60] Chang et al.
ObjectNet3D Images, 3D shapes, and objects 100 categories. 90127 images, 201888 objects, 44147 3D shapes images, 3D shapes, object bounding boxes, category labels recognizing the 3D pose and 3D shape of objects from 2D images 2016 [61][62] Xiang et al.
Common Objects in 3D (CO3D) Video frames from videos capturing objects from 50 MS-COCO categories, filmed by people on Amazon Mechanical Turk. 6 million frames from 40000 videos multi-view images, camera poses, 3D point clouds, object category Predict object category. Generate objects. 2021, updated 2022 as CO3Dv2 [63][64] Meta AI
Google Scanned Objects Scanned objects in SDF format. over 10 million 2022 [56] Google AI
Objaverse-XL 3D objects over 10 million 3D objects, metadata novel view synthesis, 3D object generation 2023 [65] Deitke et al.
OmniObject3D Scanned objects, labelled in 190 daily categories 6,000 textured meshes, point clouds, multiview images, videos robust 3D perception, novel-view synthesis, surface reconstruction, 3D object generation 2023 [66][67] Wu et al.
UnCommon Objects in 3D (uCO3D) 1,070 categories in the LVIS 2025 [68][69] Meta AI

Object detection and recognition for autonomous vehicles

Dataset Name Brief description Preprocessing Instances Format Default Task Created (updated) Reference Creator
Cityscapes Dataset Stereo video sequences recorded in street scenes, with pixel-level annotations. Metadata also included. Pixel-level segmentation and labeling 25,000 Images, text Classification, object detection 2016 [70] Daimler AG et al.
German Traffic Sign Detection Benchmark Dataset Images from vehicles of traffic signs on German roads. These signs comply with UN standards and therefore are the same as in other countries. Signs manually labeled 900 Images Classification 2013 [71][72] S. Houben et al.
KITTI Vision Benchmark Dataset Autonomous vehicles driving through a mid-size city captured images of various areas using cameras and laser scanners. Many benchmarks extracted from data. >100 GB of data Images, text Classification, object detection 2012 [73][74][75] A. Geiger et al.
FieldSAFE Multi-modal dataset for obstacle detection in agriculture including stereo camera, thermal camera, web camera, 360-degree camera, lidar, radar, and precise localization. Classes labelled geographically. >400 GB of data Images and 3D point clouds Classification, object detection, object localization 2017 [76] M. Kragh et al.
Daimler Monocular Pedestrian Detection dataset It is a dataset of pedestrians in urban environments. Pedestrians are box-wise labeled. Labeled part contains 15560 samples with pedestrians and 6744 samples without. Test set contains 21790 images without labels. Images Object recognition and classification 2006 [77][78][79] Daimler AG
CamVid The Cambridge-driving Labeled Video Database (CamVid) is a collection of videos. The dataset is labeled with semantic labels for 32 semantic classes. over 700 images Images Object recognition and classification 2008 [80][81][82] Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, Roberto Cipolla
RailSem19 RailSem19 is a dataset for understanding scenes for vision systems on railways. The dataset is labeled semantically and box-wise. 8500 Images Object recognition and classification, scene recognition 2019 [83][84] Oliver Zendel, Markus Murschitz, Marcel Zeilinger, Daniel Steininger, Sara Abbasi, Csaba Beleznai
BOREAS BOREAS is a multi-season autonomous driving dataset. It includes data from a Velodyne Alpha-Prime (128-beam) lidar, a FLIR Blackfly S camera, a Navtech CIR304-H radar, and an Applanix POS LV GNSS-INS. The data is annotated with 3D bounding boxes. 350 km of driving data Images, Lidar and Radar data Object recognition and classification, scene recognition 2023 [85][86] Keenan Burnett, David J. Yoon, Yuchen Wu, Andrew Zou Li, Haowei Zhang, Shichen Lu, Jingxing Qian, Wei-Kang Tseng, Andrew Lambert, Keith Y.K. Leung, Angela P. Schoellig, Timothy D. Barfoot
Bosch Small Traffic Lights Dataset It is a dataset of traffic lights. The labeling includes bounding boxes of traffic lights together with their state (active light). 5000 images for training and a video sequence of 8334 frames for evaluation Images Traffic light recognition 2017 [87][88] Karsten Behrendt, Libor Novak, Rami Botros
FRSign It is a dataset of French railway signals. The labeling includes bounding boxes of railway signals together with their state (active light). more than 100000 Images Railway signal recognition 2020 [89][90] Jeanine Harb, Nicolas Rébéna, Raphaël Chosidow, Grégoire Roblin, Roman Potarusov, Hatem Hajri
GERALD It is a dataset of German railway signals. The labeling includes bounding boxes of railway signals together with their state (active light). 5000 Images Railway signal recognition 2023 [91][92] Philipp Leibner, Fabian Hampel, Christian Schindler
Multi-cue pedestrian Multi-cue onboard pedestrian detection dataset is a dataset for detection of pedestrians. The dataset is labeled box-wise. 1092 image pairs with 1776 boxes for pedestrians Images Object recognition and classification 2009 [93] Christian Wojek, Stefan Walk, Bernt Schiele
RAWPED RAWPED is a dataset for detection of pedestrians in the context of railways. The dataset is labeled box-wise. 26000 Images Object recognition and classification 2020 [94][95] Tugce Toprak, Burak Belenlioglu, Burak Aydın, Cuneyt Guzelis, M. Alper Selver
OSDaR23 OSDaR23 is a multi-sensory dataset for detection of objects in the context of railways. The dataset is labeled box-wise. 16874 frames Images, Lidar, Radar and Infrared Object recognition and classification 2023 [96][97] Roman Tilly, Rustam Tagiew, Pavel Klasek (DZSF); Philipp Neumaier, Patrick Denzler, Tobias Klockau, Martin Boekhoff, Martin Köppel (Digitale Schiene Deutschland); Karsten Schwalbe (FusionSystems)
Argoverse Argoverse is a multi-sensory dataset for detection of objects in the context of roads. The dataset is annotated box-wise. 320 hours of recording Data from 7 cameras and LiDAR Object recognition and classification, object tracking 2022 [98][99] Argo AI, Carnegie Mellon University, Georgia Institute of Technology
Rail3D Rail3D is a LiDAR dataset for railways recorded in Hungary, France, and Belgium The dataset is annotated semantically 288 million annotated points LiDAR Object recognition and classification, object tracking 2024 [100] Abderrazzaq Kharroubi, Ballouch Zouhair, Rafika Hajji, Anass Yarroudh, and Roland Billen; University of Liège and Hassan II Institute of Agronomy and Veterinary Medicine
WHU-Railway3D WHU-Railway3D is a LiDAR dataset for urban, rural, and plateau railways recorded in China The dataset is annotated semantically 4.6 billion annotated data points LiDAR Object recognition and classification, object tracking 2024 [101] Bo Qiu, Yuzhou Zhou, Lei Dai; Bing Wang, Jianping Li, Zhen Dong, Chenglu Wen, Zhiliang Ma, Bisheng Yang; Wuhan University, University of Oxford, Hong Kong Polytechnic University, Nanyang Technological University, Xiamen University and Tsinghua University
RailFOD23 A dataset of foreign objects on railway catenary The dataset is annotated boxwise 14,615 images Images Object recognition and classification, object tracking 2024 [102] Zhichao Chen, Jie Yang, Zhicheng Feng, Hao Zhu; Jiangxi University of Science and Technology
ESRORAD A dataset of images and point clouds for urban road and rail scenes from Le Havre and Rouen The dataset is annotated boxwise 2,700 k virtual images and 100,000 real images Images, LiDAR Object recognition and classification, object tracking 2022 [103] Redouane Khemmar, Antoine Mauri, Camille Dulompont, Jayadeep Gajula, Vincent Vauchey, Madjid Haddad and Rémi Boutteau; Le Havre Normandy University and SEGULA Technologies
RailVID Data recorded by AT615X infrared thermography from InfiRay in diverse railway scenarios, including carport, depot, and straight. The dataset is annotated semantically 1,071 images infrared images Object recognition and classification, object tracking 2022 [104] Hao Yuan, Zhenkun Mei, Yihao Chen, Weilong Niu, Cheng Wu; Soochow University
RailPC LiDAR dataset in the context of railways The dataset is annotated semantically 3 billion data points LiDAR Object recognition and classification, object tracking 2024 [105] Tengping Jiang, Shiwei Li, Qinyu Zhang, Guangshuai Wang, Zequn Zhang, Fankun Zeng, Peng An, Xin Jin, Shan Liu, Yongjun Wang; Nanjing Normal University, Ministry of Natural Resources, Eastern Institute of Technology, Tianjin Key Laboratory of Rail Transit Navigation Positioning and Spatio‐temporal Big Data Technology, Northwest Normal University, Washington University in St. Louis and Ningbo University of Technology
RailCloud-HdF LiDAR dataset in the context of railways The dataset is annotated semantically 8060.3 million data points LiDAR Object recognition and classification, object tracking 2024 [106] Mahdi Abid, Mathis Teixeira, Ankur Mahtani and Thomas Laurent; Railenium
RailGoerl24 RGB and LiDAR dataset in the context of railways The dataset is annotated boxwise 12205 HD RGB frames and 383922305 LiDAR colored cloud points RGB, LiDAR Person recognition and classification 2025 [107] Rustam Tagiew, Ilkay Wunderlich, Philipp Zanitzer, Mark Sastuba, Carsten Knoll, Kilian Göller, Haadia Amjad, Steffen Seitz
MRSI RGB and Infrared dataset in the context of railways The dataset is annotated boxwise and pixelwise, eleven classes including background 23000 RGB images and 4000 infrared images RGB, Infrared Object recognition and classification 2022 [108] Yihao Chen, Ning Zhu, Qian Wu, Cheng Wu, Weilong Niu and Yiming Wang
RailDriVE February 2019 Data Set for Rail Vehicle Positioning Experiments The dataset is not annotated 26:46 min back and forward driving on a 1.2 km track segment GNSS, IMU, Speed/distance sensors (Radar, optical, odometer), RGB Localization and mapping 2019 [109] Hanno Winter, Michael Helmut Roth

Facial recognition

In computer vision, face images have been used extensively to develop facial recognition systems, face detection, and many other projects that use images of faces. See [110] for a curated list of datasets, focused on the pre-2005 period.

Dataset name Brief description Preprocessing Instances Format Default task Created (updated) Reference Creator
Labeled Faces in the Wild (LFW) Images of named individuals obtained by Internet search. frontal face detection, bounding box cropping 13233 images of 5749 named individuals images, labels unconstrained face recognition 2008 [111][112] Huang et al.
Aff-Wild 298 videos of 200 individuals, ~1,250,000 manually annotated images: annotated in terms of dimensional affect (valence-arousal); in-the-wild setting; color database; various resolutions (average = 640x360) the detected faces, facial landmarks and valence-arousal annotations ~1,250,000 manually annotated images video (visual + audio modalities) affect recognition (valence-arousal estimation) 2017 CVPR[113] IJCV[114] D. Kollias et al.
Aff-Wild2 558 videos of 458 individuals, ~2,800,000 manually annotated images: annotated in terms of i) categorical affect (7 basic expressions: neutral, happiness, sadness, surprise, fear, disgust, anger); ii) dimensional affect (valence-arousal); iii) action units (AUs 1,2,4,6,12,15,20,25); in-the-wild setting; color database; various resolutions (average = 1030x630) the detected faces, detected and aligned faces and annotations ~2,800,000 manually annotated images video (visual + audio modalities) affect recognition (valence-arousal estimation, basic expression classification, action unit detection) 2019 BMVC[115] FG[116] D. Kollias et al.
FERET (facial recognition technology) 11338 images of 1199 individuals in different positions and at different times. None. 11,338 Images Classification, face recognition 2003 [117][118] United States Department of Defense
Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) 7,356 video and audio recordings of 24 professional actors. 8 emotions each at two intensities. Files labelled with expression. Perceptual validation ratings provided by 319 raters. 7,356 Video, sound files Classification, face recognition, voice recognition 2018 [119][120] S.R. Livingstone and F.A. Russo
SCFace Color images of faces at various angles. Location of facial features extracted. Coordinates of features given. 4,160 Images, text Classification, face recognition 2011 [121][122] M. Grgic et al.
Yale Face Database Faces of 15 individuals in 11 different expressions. Labels of expressions. 165 Images Face recognition 1997 [123][124] J. Yang et al.
Cohn-Kanade AU-Coded Expression Database Large database of images with labels for expressions. Tracking of certain facial features. 500+ sequences Images, text Facial expression analysis 2000 [125][126] T. Kanade et al.
JAFFE Facial Expression Database 213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models. Images are cropped to the facial region. Includes semantic ratings data on emotion labels. 213 Images, text Facial expression cognition 1998 [127][128] Lyons, Kamachi, Gyoba
FaceScrub Images of public figures scrubbed from image searching. Name and m/f annotation. 107,818 Images, text Face recognition 2014 [129][130] H. Ng et al.
BioID Face Database Images of faces with eye positions marked. Manually set eye positions. 1521 Images, text Face recognition 2001 [131] BioID
Skin Segmentation Dataset Randomly sampled color values from face images. B, G, R, values extracted. 245,057 Text Segmentation, classification 2012 [132][133] R. Bhatt.
Bosphorus 3D Face image database. 34 action units and 6 expressions labeled; 24 facial landmarks labeled. 4652 Images, text Face recognition, classification 2008 [134][135] A Savran et al.
UOY 3D-Face neutral face, 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised. labeling. 5250 Images, text Face recognition, classification 2004 [136][137] University of York
CASIA 3D Face Database Expressions: Anger, smile, laugh, surprise, closed eyes. None. 4624 Images, text Face recognition, classification 2007 [138][139] Institute of Automation, Chinese Academy of Sciences
CASIA NIR Expressions: Anger Disgust Fear Happiness Sadness Surprise None. 480 Annotated Visible Spectrum and Near Infrared Video captures at 25 frames per second Face recognition, classification 2011 [140] Zhao, G. et al.
BU-3DFE neutral face, and 6 expressions: anger, happiness, sadness, surprise, disgust, fear (4 levels). 3D images extracted. None. 2500 Images, text Facial expression recognition, classification 2006 [141] Binghamton University
Face Recognition Grand Challenge Dataset Up to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D Data. None. 4007 Images, text Face recognition, classification 2004 [142][143] National Institute of Standards and Technology
Gavabdb Up to 61 samples for each subject. Expressions neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images. None. 549 Images, text Face recognition, classification 2008 [144][145] King Juan Carlos University
3D-RMA Up to 100 subjects, expressions mostly neutral. Several poses as well. None. 9971 Images, text Face recognition, classification 2004 [146][147] Royal Military Academy (Belgium)
SoF 112 persons (66 males and 46 females) wearing glasses under different illumination conditions. A set of synthetic filters (blur, occlusions, noise, and posterization) with different levels of difficulty. 42,592 (2,662 original images × 16 synthetic images) Images, Mat file Gender classification, face detection, face recognition, age estimation, and glasses detection 2017 [148][149] Afifi, M. et al.
IMDb-WIKI IMDb and Wikipedia face images with gender and age labels. None 523,051 Images Gender classification, face detection, face recognition, age estimation 2015 [150] R. Rothe, R. Timofte, L. V. Gool

Action recognition

Dataset name Brief description Preprocessing Instances Format Default Task Created (updated) Reference Creator
AVA-Kinetics Localized Human Actions Video Annotated 80 action classes from keyframes from videos from Kinetics-700. 1.6 million annotations. 238,906 video clips, 624,430 keyframes. Annotations, videos. Action prediction 2020 [151][152] Li et al. from the Perception Team of Google AI.
TV Human Interaction Dataset Videos from 20 different TV shows for predicting social actions: handshake, high five, hug, kiss and none. None. 6,766 video clips video clips Action prediction 2013 [153] Patron-Perez, A. et al.
Berkeley Multimodal Human Action Database (MHAD) Recordings of a single person performing 12 actions MoCap pre-processing 660 action samples 8 PhaseSpace Motion Capture, 2 Stereo Cameras, 4 Quad Cameras, 6 accelerometers, 4 microphones Action classification 2013 [154] Ofli, F. et al.
THUMOS Dataset Large video dataset for action classification. Actions classified and labeled. 45M frames of video Video, images, text Classification, action detection 2013 [155][156] Y. Jiang et al.
MEXAction2 Video dataset for action localization and spotting Actions classified and labeled. 1000 Video Action detection 2014 [157] Stoian et al.

Handwriting and character recognition

Dataset name Brief description Preprocessing Instances Format Default Task Created (updated) Reference Creator
Artificial Characters Dataset Artificially generated data describing the structure of 10 capital English letters. Coordinates of lines drawn given as integers. Various other features. 6000 Text Handwriting recognition, classification 1992 [158] H. Guvenir et al.
Letter Dataset Upper-case printed letters. 17 features are extracted from all images. 20,000 Text OCR, classification 1991 [159][160] D. Slate et al.
CASIA-HWDB Offline handwritten Chinese character database. 3755 classes in the GB 2312 character set. Gray-scaled images with background pixels labeled as 255. 1,172,907 Images, Text Handwriting recognition, classification 2009 [161] CASIA
CASIA-OLHWDB Online handwritten Chinese character database, collected using Anoto pen on paper. 3755 classes in the GB 2312 character set. Provides the sequences of coordinates of strokes. 1,174,364 Images, Text Handwriting recognition, classification 2009 [162][161] CASIA
Character Trajectories Dataset Labeled samples of pen tip trajectories for people writing simple characters. 3-dimensional pen tip velocity trajectory matrix for each sample 2858 Text Handwriting recognition, classification 2008 [163][164] B. Williams
Chars74K Dataset Character recognition in natural images of symbols used in both English and Kannada 74,107 Character recognition, handwriting recognition, OCR, classification 2009 [165] T. de Campos
EMNIST dataset Handwritten characters from 3600 contributors Derived from NIST Special Database 19. Converted to 28x28 pixel images, matching the MNIST dataset.[166] 800,000 Images character recognition, classification, handwriting recognition 2016 EMNIST dataset[167] Documentation[168] Gregory Cohen, et al.
UJI Pen Characters Dataset Isolated handwritten characters Coordinates of pen position as characters were written given. 11,640 Text Handwriting recognition, classification 2009 [169][170] F. Prat et al.
Gisette Dataset Handwriting samples from the often-confused 4 and 9 characters. Features extracted from images, split into train/test, handwriting images size-normalized. 13,500 Images, text Handwriting recognition, classification 2003 [171] Yann LeCun et al.
Omniglot dataset 1623 different handwritten characters from 50 different alphabets. Hand-labeled. 38,300 Images, text, strokes Classification, one-shot learning 2015 [172][173] American Association for the Advancement of Science
MNIST database Database of handwritten digits. Hand-labeled. 60,000 Images, text Classification 1994 [174][175] National Institute of Standards and Technology
Optical Recognition of Handwritten Digits Dataset Normalized bitmaps of handwritten data. Size normalized and mapped to bitmaps. 5620 Images, text Handwriting recognition, classification 1998 [176] E. Alpaydin et al.
Pen-Based Recognition of Handwritten Digits Dataset Handwritten digits on electronic pen-tablet. Feature vectors extracted to be uniformly spaced. 10,992 Images, text Handwriting recognition, classification 1998 [177][178] E. Alpaydin et al.
Semeion Handwritten Digit Dataset Handwritten digits from 80 people. All handwritten digits have been normalized for size and mapped to the same grid. 1593 Images, text Handwriting recognition, classification 2008 [179] T. Srl
HASYv2 Handwritten mathematical symbols All symbols are centered and of size 32px x 32px. 168233 Images, text Classification 2017 [180] Martin Thoma
Noisy Handwritten Bangla Dataset Includes Handwritten Numeral Dataset (10 classes) and Basic Character Dataset (50 classes), each dataset has three types of noise: white gaussian, motion blur, and reduced contrast. All images are centered and of size 32x32. Numeral Dataset: 23330, Character Dataset: 76000 Images, text Handwriting recognition, classification 2017 [181][182] M. Karki et al.

Aerial images

Dataset name Brief description Preprocessing Instances Format Default Task Created (updated) Reference Creator
iSAID: Instance Segmentation in Aerial Images Dataset Precise instance-level annotation carried out by professional annotators, cross-checked and validated by expert annotators complying with well-defined guidelines. 655,451 (15 classes) Images, jpg, json Aerial Classification, Object Detection, Instance Segmentation 2019 [183][184] Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai
Aerial Image Segmentation Dataset 80 high-resolution aerial images with spatial resolution ranging from 0.3 to 1.0. Images manually segmented. 80 Images Aerial Classification, object detection 2013 [185][186] J. Yuan et al.
KIT AIS Data Set Multiple labeled training and evaluation datasets of aerial images of crowds. Images manually labeled to show paths of individuals through crowds. ~ 150 Images with paths People tracking, aerial tracking 2012 [187][188] M. Butenuth et al.
Wilt Dataset Remote sensing data of diseased trees and other land cover. Various features extracted. 4899 Images Classification, aerial object detection 2014 [189][190] B. Johnson
MASATI dataset Maritime scenes of optical aerial images from the visible spectrum. It contains color images in dynamic marine environments; each image may contain one or multiple targets in different weather and illumination conditions. Object bounding boxes and labeling. 7389 Images Classification, aerial object detection 2018 [191][192] A.-J. Gallego et al.
Forest Type Mapping Dataset Satellite imagery of forests in Japan. Image wavelength bands extracted. 326 Text Classification 2015 [193][194] B. Johnson
Overhead Imagery Research Data Set Annotated overhead imagery. Images with multiple objects. Over 30 annotations and over 60 statistics that describe the target within the context of the image. 1000 Images, text Classification 2009 [195][196] F. Tanner et al.
SpaceNet SpaceNet is a corpus of commercial satellite imagery and labeled training data. GeoTiff and GeoJSON files containing building footprints. >17533 Images Classification, Object Identification 2017 [197][198][199] DigitalGlobe, Inc.
UC Merced Land Use Dataset These images were manually extracted from large images from the USGS National Map Urban Area Imagery collection for various urban areas around the US. This is a 21 class land use image dataset meant for research purposes. There are 100 images for each class. 2,100 Image chips of 256x256, 30 cm (1 foot) GSD Land cover classification 2010 [200] Yi Yang and Shawn Newsam
SAT-4 Airborne Dataset Images were extracted from the National Agriculture Imagery Program (NAIP) dataset. SAT-4 has four broad land cover classes: barren land, trees, grassland, and a class that consists of all land cover classes other than the above three. 500,000 Images Classification 2015 [201][202] S. Basu et al.
SAT-6 Airborne Dataset Images were extracted from the National Agriculture Imagery Program (NAIP) dataset. SAT-6 has six broad land cover classes: barren land, trees, grassland, roads, buildings, and water bodies. 405,000 Images Classification 2015 [201][202] S. Basu et al.

Underwater images

Dataset name Brief description Preprocessing Instances Format Default Task Created (updated) Reference Creator
SUIM Dataset The images have been rigorously collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants. Images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, sea-floor, and background (waterbody). 1,635 Images Segmentation 2020 [203] Md Jahidul Islam et al.
LIACI Dataset Images have been collected during underwater ship inspections and annotated by human domain experts. Images with pixel annotations for ten object categories: defects, corrosion, paint peel, marine growth, sea chest gratings, overboard valves, propeller, anodes, bilge keel and ship hull. 1,893 Images Segmentation 2022 [204] Waszak et al.

Other images

Dataset name Brief description Preprocessing Instances Format Default Task Created (updated) Reference Creator
Kodak Lossless True Color Image Suite RGB images for testing image compression. None 24 Image Image compression 1999 [205] Kodak
NRC-GAMMA A novel benchmark gas meter image dataset None 28,883 Image, Label Classification 2021 [206][207] A. Ebadi, P. Paul, S. Auer, & S. Tremblay
The SUPATLANTIQUE dataset Images of scanned official and Wikipedia documents None 4908 TIFF/pdf Source device identification, forgery detection, classification, ... 2020 [208] C. Ben Rabah et al.
Density functional theory quantum simulations of graphene Labelled images of raw input to a simulation of graphene Raw data (in HDF5 format) and output labels from density functional theory quantum simulation 60744 test and 501473 training files Labeled images Regression 2019 [209] K. Mills & I. Tamblyn
Quantum simulations of an electron in a two dimensional potential well Labelled images of raw input to a simulation of 2D quantum mechanics Raw data (in HDF5 format) and output labels from quantum simulation 1.3 million images Labeled images Regression 2017 [210] K. Mills, M.A. Spanner, & I. Tamblyn
MPII Cooking Activities Dataset Videos and images of various cooking activities. Activity paths and directions, labels, fine-grained motion labeling, activity class, still image extraction and labeling. 881,755 frames Labeled video, images, text Classification 2012 [211][212] M. Rohrbach et al.
FAMOS Dataset 5,000 unique microstructures, all samples have been acquired 3 times with two different cameras. Original PNG files, sorted per camera and then per acquisition. MATLAB datafiles with one 16384 times 5000 matrix per camera per acquisition. 30,000 Images and .mat files Authentication 2012 [213] S. Voloshynovskiy, et al.
PharmaPack Dataset 1,000 unique classes with 54 images per class. Class labeling, many local descriptors, like SIFT and AKAZE, and local feature aggregators, like Fisher Vector (FV). 54,000 Images and .mat files Fine-grain classification 2017 [214] O. Taran and S. Rezaeifar, et al.
Stanford Dogs Dataset Images of 120 breeds of dogs from around the world. Train/test splits and ImageNet annotations provided. 20,580 Images, text Fine-grain classification 2011 [215][216] A. Khosla et al.
StanfordExtra Dataset 2D keypoints and segmentations for the Stanford Dogs Dataset. 2D keypoints and segmentations provided. 12,035 Labelled images 3D reconstruction/pose estimation 2020 [217] B. Biggs et al.
The Oxford-IIIT Pet Dataset 37 categories of pets with roughly 200 images of each. Breed labeled, tight bounding box, foreground-background segmentation. ~ 7,400 Images, text Classification, object detection 2012 [216][218] O. Parkhi et al.
Corel Image Features Data Set Database of images with features extracted. Many features including color histogram, co-occurrence texture, and color moments. 68,040 Text Classification, object detection 1999 [219][220] M. Ortega-Binderberger et al.
Online Video Characteristics and Transcoding Time Dataset Transcoding times for various videos and video properties. Video features given. 168,286 Text Regression 2015 [221] T. Deneke et al.
Microsoft Sequential Image Narrative Dataset (SIND) Dataset for sequential vision-to-language Descriptive caption and storytelling given for each photo, and photos are arranged in sequences 81,743 Images, text Visual storytelling 2016 [222] Microsoft Research
Caltech-UCSD Birds-200-2011 Dataset Large dataset of images of birds. Part locations for birds, bounding boxes, 312 binary attributes given 11,788 Images, text Classification 2011 [223][224] C. Wah et al.
YouTube-8M Large and diverse labeled video dataset YouTube video IDs and associated labels from a diverse vocabulary of 4800 visual entities 8 million Video, text Video classification 2016 [225][226] S. Abu-El-Haija et al.
YFCC100M Large and diverse labeled image and video dataset Flickr Videos and Images and associated description, titles, tags, and other metadata (such as Exif and geotags) 100 million Video, Image, Text Video and Image classification 2016 [227][228] B. Thomee et al.
Discrete LIRIS-ACCEDE Short videos annotated for valence and arousal. Valence and arousal labels. 9800 Video Video emotion elicitation detection 2015 [229] Y. Baveye et al.
Continuous LIRIS-ACCEDE Long videos annotated for valence and arousal while also collecting Galvanic Skin Response. Valence and arousal labels. 30 Video Video emotion elicitation detection 2015 [230] Y. Baveye et al.
MediaEval LIRIS-ACCEDE Extension of Discrete LIRIS-ACCEDE including annotations for violence levels of the films. Violence, valence and arousal labels. 10900 Video Video emotion elicitation detection 2015 [231] Y. Baveye et al.
Leeds Sports Pose Articulated human pose annotations in 2000 natural sports images from Flickr. Rough crop around single person of interest with 14 joint labels 2000 Images plus .mat file labels Human pose estimation 2010 [232] S. Johnson and M. Everingham
Leeds Sports Pose Extended Training Articulated human pose annotations in 10,000 natural sports images from Flickr. 14 joint labels via crowdsourcing 10000 Images plus .mat file labels Human pose estimation 2011 [233] S. Johnson and M. Everingham
MCQ Dataset 6 different real multiple choice-based exams (735 answer sheets and 33,540 answer boxes) to evaluate computer vision techniques and systems developed for multiple choice test assessment systems. None 735 answer sheets and 33,540 answer boxes Images and .mat file labels Development of multiple choice test assessment systems 2017 [234][235] Afifi, M. et al.
Surveillance Videos Real surveillance videos covering a long surveillance period (7 days with 24 hours each). None 19 surveillance videos (7 days with 24 hours each). Videos Data compression 2016 [236] Taj-Eddin, I. A. T. F. et al.
LILA BC Labeled Information Library of Alexandria: Biology and Conservation. Labeled images that support machine learning research around ecology and environmental science. None ~10M images Images Classification 2019 [237] LILA working group
Can We See Photosynthesis? 32 videos for eight live and eight dead leaves recorded under both DC and AC lighting conditions. None 32 videos Videos Liveness detection of plants 2017 [238] Taj-Eddin, I. A. T. F. et al.
Mathematical Mathematics Memes Collection of 10,000 memes on mathematics. None ~10,000 Images Visual storytelling, object detection. 2021 [239] Mathematical Mathematics Memes
Flickr-Faces-HQ Dataset Collection of images containing a face each, crawled from Flickr Pruned with "various automatic filters", cropped and aligned to faces; images of statues, paintings, or photos of photos were removed via crowdsourcing 70,000 Images Face Generation 2019 [240] Karras et al.
Fruits-360 dataset Collection of images containing 170 fruits, vegetables, nuts, and seeds. 100x100 pixels, white background. 115499 Images (jpg) Classification 2017–2025 [241] Mihai Oltean
RVL-CDIP Scanned documents from the Truth Tobacco Industry Documents library. Maximum dimension is 1000 pixels. 400,000 Images Classification 2015 [242] A. Harley et al.
