Dataset Archives - AIDA - AI Doctoral Academy

AUW Dataset

The sample dataset, called the AUTH-Unreal-Wildfire (AUW) dataset, is a synthetic collection created to advance deep learning for wildfire segmentation. It addresses the critical challenge of obtaining accurately annotated training data in natural disaster management by using a novel, open-source pipeline built with the AirSim simulator. This pipeline uniquely integrates a custom particle segmentation camera and Procedural Content Generation (PCG) tools to produce photorealistic wildfire images paired with precise pixel-level segmentation masks—a feature previously difficult to achieve since fire assets are typically particle-based without a defined 3D mesh. The dataset consists of 1,500 training and 200 test images and was specifically designed to train and evaluate state-of-the-art segmentation models like PIDNet, both on its own and as a data augmentation resource to enhance performance on real-world wildfire imagery.

For a comprehensive explanation of the methodology and tools used to create this synthetic dataset, please refer to the full conference paper. This work is formally published and should be cited as follows: E. Spatharis, C. Papaioannidis, V. Mygdalis and I. Pitas, “UNREALFIRE: A synthetic dataset creation pipeline for annotated fire imagery in Unreal Engine”, IEEE International Conference on Image Processing (ICIP), Workshop on Bridging the Gap: Advanced Data Processing for Natural Disaster Management – Integrating Visual and Non-Visual Insights, Anchorage, Alaska, USA, 13-17 September, 2025. The paper is available at: https://aiia.csd.auth.gr/wp-content/uploads/2025/12/SPATHARIS_ICIP_2025.pdf and at https://zenodo.org/records/18198757 .

In order to access the AUW Dataset created/assembled by Aristotle University of Thessaloniki, please complete and sign the license agreement. Subsequently, email it to Prof. Ioannis Pitas (using “TEMA – Blaze Dataset availability” as e-mail subject) so as to receive FTP credentials for downloading.

TEMA AIIA_flood Dataset

The dataset for the flood binary segmentation task comprises 720 images consolidated from two sources: D_Mallian_1 and BRK_1 trials. It is structured into distinct directories for each source (d_mallian_1 and brk_1), each containing standard train and val splits with separate folders for images (.jpg) and labels (.png). The annotation masks are binary, where pixels are labeled as 0 for background and 1 for floodwater. The total split consists of 428 training images (198 from D_Mallian_1 and 230 from BRK_1) and 292 validation images (140 from D_Mallian_1 and 152 from BRK_1), resulting in an approximate 60% – 40% training-validation distribution.
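The layout and split figures above can be sketched in Python as follows. This is only an illustrative sketch: the per-source counts are taken from the description, but the sub-folder names `images` and `labels`, the root path, and the assumption that a mask shares its image's file stem are guesses that should be checked against the downloaded archive.

```python
from pathlib import Path

# Per-source image counts as stated in the dataset description.
SPLIT_COUNTS = {
    "d_mallian_1": {"train": 198, "val": 140},
    "brk_1": {"train": 230, "val": 152},
}


def split_totals(counts):
    """Sum the per-source counts into overall train/val totals."""
    totals = {"train": 0, "val": 0}
    for per_split in counts.values():
        for split, n in per_split.items():
            totals[split] += n
    return totals


def pair_images_with_masks(root, source, split,
                           image_dir="images", label_dir="labels"):
    """Return sorted (image, mask) path pairs matched by file stem.

    ASSUMPTION: the folder names `images` and `labels` and the stem-based
    matching are hypothetical; the description only states that images are
    .jpg files and labels are .png files in separate folders.
    """
    base = Path(root) / source / split
    pairs = []
    for image_path in sorted((base / image_dir).glob("*.jpg")):
        mask_path = base / label_dir / (image_path.stem + ".png")
        if mask_path.exists():  # skip images that have no mask
            pairs.append((image_path, mask_path))
    return pairs


if __name__ == "__main__":
    totals = split_totals(SPLIT_COUNTS)
    n_images = sum(totals.values())
    train_share = round(100 * totals["train"] / n_images, 1)
    print(totals, n_images, train_share)
```

Summing the stated counts gives 428 training and 292 validation images out of 720, which is where the approximate 60% – 40% figure comes from.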

In order to access the TEMA AIIA_flood dataset created/assembled by Aristotle University of Thessaloniki, please complete and sign the license agreement. Subsequently, email it to Prof. Ioannis Pitas (using “TEMA – AIIA_flood Dataset availability” as e-mail subject) so as to receive FTP credentials for downloading.

TEMA AIIA_wildfire Dataset

The TEMA AIIA_wildfire dataset is a collection of 2,237 natural disaster images designed for semantic segmentation, focusing on burnt areas, smoke, and fire. It aggregates and standardizes images from three distinct sources: the BLAZE classification dataset (a subset of which we annotated), KAHY trials, and RAS. The dataset is organized by source (BLAZE1, KAHY, RAS), each with standard train/val splits containing .jpg images and corresponding .png label masks. Labels follow a four-class hierarchy (0: background, 1: burnt, 2: smoke, 3: fire). The final composition is 985 images from BLAZE (655 annotated; see https://aiia.csd.auth.gr/blaze-fire-classification-segmentation-dataset/), 584 from KAHY, and 668 from RAS, split into 1,528 training and 655 validation images, roughly a 70% – 30% split.
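As a rough illustration of the four-class labelling, the sketch below counts the pixels of each class in one mask. The class IDs and names come from the description; the function name and the representation of a mask as rows of integers are assumptions made for this example (in practice the .png mask would first be decoded with an image library).

```python
from collections import Counter

# Label IDs and names as given in the dataset description.
CLASS_NAMES = {0: "background", 1: "burnt", 2: "smoke", 3: "fire"}


def class_histogram(mask):
    """Count pixels per class name in a 2D mask given as rows of integer IDs.

    ASSUMPTION: `mask` is an already-decoded label image; this helper is
    hypothetical and not part of the dataset's tooling.
    """
    counts = Counter()
    for row in mask:
        counts.update(row)
    unknown = set(counts) - set(CLASS_NAMES)
    if unknown:
        raise ValueError(f"unexpected label IDs: {sorted(unknown)}")
    return {name: counts.get(idx, 0) for idx, name in CLASS_NAMES.items()}


if __name__ == "__main__":
    toy_mask = [
        [0, 0, 1],
        [2, 3, 3],
    ]
    print(class_histogram(toy_mask))
```

A check of this kind is a quick way to confirm that a mask contains only the four expected label values before training.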

In order to access the TEMA AIIA_wildfire dataset created/assembled by Aristotle University of Thessaloniki, please complete and sign the license agreement. Subsequently, email it to Prof. Ioannis Pitas (using “TEMA – AIIA_wildfire Dataset availability” as e-mail subject) so as to receive FTP credentials for downloading.

3D-Flood Dataset

The Aristotle University of Thessaloniki (hereinafter, AUTH) created the dataset ‘3D-Flood’, within the context of the project TEMA that was funded by the European Commission-European Union [Grant Agreement number: 101093003; start date: 01/12/2022; end date: 30/11/2026].

General description of the dataset

The dataset will be used to construct a 3D model of the district of Agios Thomas in Larisa, Greece, after the flood events of 2023. It comprises 795 UAV video frames taken from 4 publicly available videos. We provide links to the public videos, along with the numbers of the frames that we kept.

To advance research in the relevant field, AUTH made the dataset publicly available for research purposes via the AIIA lab. All requests for access to/use of the dataset must be submitted in writing by researchers. In order to access the 3D-Flood dataset created/assembled by Aristotle University of Thessaloniki, please complete and sign this license agreement. Subsequently, email it to Prof. Ioannis Pitas (using “TEMA – 3D – Flood Dataset availability” as e-mail subject) so as to receive FTP credentials for downloading.

Flood Master Database

The Aristotle University of Thessaloniki (hereinafter, AUTH) created the following dataset, entitled ‘Flood Master Database’, within the context of the project TEMA that was funded by the European Commission-European Union [Grant Agreement number: 101093003; start date: 01/12/2022; end date: 30/11/2026].

General description of the dataset

The Flood Master Database consists of flood images selected from different publicly available datasets. The origin of each image is specified in the “sources.csv” file. The train-val split was made using a 3:1 ratio. For images taken from the “Flood Area Segmentation” dataset and the “Water Dataset” (URLs are given in “sources.csv”), we normalized the binary masks to the values {0, 1}, so we provide our own version of the masks. For images taken from the “Roadway Flooding Image Dataset”, we use the annotation paths from the original dataset, since those masks are already normalized to the required range.
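The mask normalization mentioned above might look like the following minimal sketch. It assumes the source masks mark water with some non-zero value (for example 255) and background with 0; the description does not state the original value range, so the threshold and the function names here are hypothetical.

```python
def normalize_binary_mask(mask, threshold=0):
    """Map every pixel above `threshold` to 1 and every other pixel to 0.

    ASSUMPTION: the original masks use 0 for background and a non-zero
    value (e.g. 255) for water; this is not stated in the description.
    """
    return [[1 if value > threshold else 0 for value in row] for row in mask]


def is_normalized(mask):
    """Return True if the mask contains only the values 0 and 1."""
    return all(value in (0, 1) for row in mask for value in row)


if __name__ == "__main__":
    raw_mask = [
        [0, 255, 255],
        [0, 0, 255],
    ]
    normalized = normalize_binary_mask(raw_mask)
    print(normalized, is_normalized(normalized))
```

With an array library the same step is usually a single vectorized comparison; the list-based version is used here only to keep the sketch free of dependencies.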

In addition, the test videos contain frames from real flooding scenarios in Greece and Italy, so that the trained model can be tested on real-world data. The frames extracted from each test video are named according to our convention “frame.jpg”, with the frame number as extracted from the video. We provide the segmentation annotations for these frames, with the naming convention “frameIds.png”. The Greek video is too large to annotate in full, so we selected and annotated 567 frames from it. For the Italian video, we annotated all 1406 frames.

Dataset Structure

Inside each of the train, val, and test folders, there is a CSV file that lists, for each image, the real image path (in the source dataset), the annotation path (in our database), and the source of the image.
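A per-folder CSV index of this kind could be read along the following lines. This is a sketch under stated assumptions: the column names `image_path`, `annotation_path`, and `source`, as well as the sample rows and paths, are invented for illustration and should be replaced with the actual header of the distributed files.

```python
import csv
import io
from collections import Counter


def read_index(csv_file):
    """Read an index CSV (an open text file) into a list of row dictionaries."""
    return list(csv.DictReader(csv_file))


def images_per_source(rows, source_column="source"):
    """Count how many images each source dataset contributes.

    ASSUMPTION: the column name `source` is hypothetical.
    """
    return Counter(row[source_column] for row in rows)


if __name__ == "__main__":
    # Made-up rows for demonstration only; not taken from the real database.
    sample = io.StringIO(
        "image_path,annotation_path,source\n"
        "src/img_001.jpg,masks/img_001.png,Flood Area Segmentation\n"
        "src/img_002.jpg,masks/img_002.png,Flood Area Segmentation\n"
        "src/img_003.jpg,orig/img_003.png,Roadway Flooding Image Dataset\n"
    )
    rows = read_index(sample)
    print(images_per_source(rows))
```

For a real file, the same functions can be called on a handle opened with `open(path, newline="")`, which is the mode recommended for the `csv` module.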

To advance research in the relevant field, AUTH made the dataset publicly available for research purposes via the AIIA lab. All requests for access to/use of the dataset must be submitted in writing by researchers. In order to access the Flood Master Database created/assembled by Aristotle University of Thessaloniki, please complete and sign this license agreement. Subsequently, email it to Prof. Ioannis Pitas (using “TEMA – Flood Master Database availability” as e-mail subject) so as to receive FTP credentials for downloading.

Blaze Fire Classification – Segmentation Dataset

The Aristotle University of Thessaloniki (hereinafter, AUTH) created the following dataset, entitled ‘Blaze’, within the context of the project TEMA that was funded by the European Commission-European Union [Grant Agreement number: 101093003; start date: 01/12/2022; end date: 30/11/2026].

General description of the dataset

The dataset will be used for wildfire image classification and burnt area segmentation tasks for Unmanned Aerial Vehicles. It comprises 5,408 frames of aerial views taken from 56 videos and 2 public datasets: 829 photographs from the D-Fire public dataset and 34 images from the Burned Area UAV public dataset. For the classification task, there are 5 classes (‘Burnt’, ‘Half-Burnt’, ‘Non-Burnt’, ‘Fire’, ‘Smoke’). For the segmentation task, 404 segmentation masks have been created for a subset of the images; each mask assigns every pixel of the image the class ‘burnt’ or the class ‘non-burnt’.

Dataset Structure

CSV files are provided containing the frames taken from every video, the class that has been assigned to each frame, the path to the respective segmentation mask (along with the mask itself) for the segmentation subset, and the related links to the public videos and the 2 public datasets.

More details on the dataset are available in the following papers:

de Venâncio, P.V.A.B., Lisboa, A.C. & Barbosa, A.V. An automatic fire detection system based on deep convolutional neural networks for low-power, resource-constrained devices. Neural Comput & Applic 34, 15349–15368 (2022).
Ribeiro, T.F.R., Silva, F., Moreira, J. & Costa, R.L. de C. Burned area semantic segmentation: A novel dataset and evaluation using convolutional networks. ISPRS Journal of Photogrammetry and Remote Sensing 202, 565–580 (2023). ISSN 0924-2716.

In order to access the Blaze Dataset created/assembled by Aristotle University of Thessaloniki, please complete and sign this license agreement. Subsequently, email it to Prof. Ioannis Pitas (using “TEMA – Blaze Dataset availability” as e-mail subject) so as to receive FTP credentials for downloading.
