Datasets#

autrainer provides a number of different audio-specific datasets, base datasets for different tasks, and toy datasets for testing purposes. To ensure consistency across different data formats and manage multiple data types, all datasets should follow a standardized structure.

Tip

To create custom datasets, refer to the custom datasets tutorial.

In addition to common attributes such as id and _target_, the dataset configuration file should include the following attributes:

Structure and Loading

path: Directory path containing the features_subdir directory and corresponding CSV files (such as train.csv, dev.csv, and test.csv).
features_subdir: The subdirectory within the dataset path where (extracted) features are stored.
- If no preprocessing is used (e.g., for raw audio), it should be default.
- For preprocessing transforms (e.g., log-Mel spectrograms with log_mel_16k), it should match the transform’s name, and the processed features are saved in this subdirectory after preprocessing.
index_column: Column in the CSV files containing the file paths, relative to the features_subdir directory.
target_column: Column in the CSV files containing the corresponding targets or labels for each file.
file_type: Specifies the type of files to be loaded (e.g., wav, npy, etc.).
file_handler: The file handler used for loading the files.

This results in a directory structure like the following:

{path}/{features_subdir}/optional/subdirs/some.file

For instance, a file in the index_column might be optional/subdirs/some.file, where some.file is an audio or a feature file.
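As a minimal sketch of how these attributes combine into a file location, the following joins the root, the features subdirectory, and an entry from the index_column. The helper name resolve_file and the example values are hypothetical and not part of autrainer; the optional features_path fallback mirrors the parameter description of AbstractDataset further down.

```python
from pathlib import Path
from typing import Optional


def resolve_file(
    path: str,
    features_subdir: str,
    index_value: str,
    features_path: Optional[str] = None,
) -> Path:
    """Join the root, the features subdirectory, and an index entry.

    Hypothetical helper for illustration; not an autrainer function.
    """
    # If no separate features root is given, fall back to the dataset path.
    root = Path(features_path) if features_path is not None else Path(path)
    return root / features_subdir / index_value


# Example values are made up for illustration.
resolved = resolve_file("data/MyDataset", "log_mel_16k", "optional/subdirs/some.npy")
print(resolved.as_posix())
```

With the values above, the resolved location follows the {path}/{features_subdir}/... pattern shown earlier.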

To load custom dataset splits that do not follow the standard train.csv, dev.csv, and test.csv convention, the df_train, df_dev, and df_test properties of the dataset class can be overridden (see the custom datasets tutorial).
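The override pattern can be sketched with a simplified stand-in class. StandInDataset is not autrainer's AbstractDataset, the file name training_split.csv is hypothetical, and lists of dictionaries replace the pandas DataFrames that the real properties return, so that the sketch runs without extra dependencies.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


class StandInDataset:
    """Simplified stand-in for a dataset base class (not AbstractDataset)."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def _read_csv(self, name: str) -> List[Dict[str, str]]:
        # The real properties return pandas DataFrames; plain rows keep
        # this sketch dependency-free.
        with open(self.path / name, newline="") as handle:
            return list(csv.DictReader(handle))

    @property
    def df_train(self) -> List[Dict[str, str]]:
        return self._read_csv("train.csv")  # standard convention


class CustomSplitDataset(StandInDataset):
    """Overrides df_train to read a non-standard split file."""

    @property
    def df_train(self) -> List[Dict[str, str]]:
        return self._read_csv("training_split.csv")  # hypothetical file name


# Demonstration with a temporary directory and a single made-up row.
with tempfile.TemporaryDirectory() as root:
    with open(Path(root) / "training_split.csv", "w", newline="") as handle:
        csv.writer(handle).writerows([["file", "label"], ["a.wav", "cat"]])
    rows = CustomSplitDataset(root).df_train

print(rows)
```

The same pattern would apply to df_dev and df_test; the custom datasets tutorial remains the reference for the actual base class.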

Training and Evaluation

criterion: The criterion to use for training.
metrics: A list of metrics to evaluate the model.
tracking_metric: The metric to track for early stopping and model selection.
transform: The online transforms to apply to the data and the output type of the dataset.
train_loader_kwargs, dev_loader_kwargs, and test_loader_kwargs: Additional keyword arguments for the DataLoader, such as num_workers and prefetch_factor. These keyword arguments can also be specified globally in the main configuration file, in which case they are passed to all datasets. Dataset-specific keyword arguments override the global ones.
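The precedence between global and dataset-specific loader keyword arguments can be illustrated with a plain dictionary merge. This is only a sketch of the described behavior; the function name merge_loader_kwargs and the example values are assumptions, not autrainer's implementation.

```python
from typing import Any, Dict, Optional


def merge_loader_kwargs(
    global_kwargs: Optional[Dict[str, Any]],
    dataset_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Merge loader kwargs so that dataset-specific values win.

    Hypothetical helper for illustration; not an autrainer function.
    """
    merged = dict(global_kwargs or {})
    merged.update(dataset_kwargs or {})  # dataset-specific keys take precedence
    return merged


global_train = {"num_workers": 4, "prefetch_factor": 2}  # main configuration
dataset_train = {"num_workers": 8}  # dataset configuration
print(merge_loader_kwargs(global_train, dataset_train))
# {'num_workers': 8, 'prefetch_factor': 2}
```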

Note

The following attributes are automatically passed to the dataset during initialization and determined at runtime:

train_transform, dev_transform, and test_transform: The SmartCompose transformation pipelines (which may include possible online transforms or augmentations).
seed: The random seed for reproducibility during training.

The transform attribute in the configuration is not passed to the dataset during initialization. Instead, it specifies the type of data the dataset provides as well as any online transforms to be applied to the data at runtime.

To avoid race conditions when using Launcher Plugins that may run multiple training jobs in parallel, datasets are downloaded and preprocessed before training, using either the autrainer fetch and autrainer preprocess CLI commands or the fetch() and preprocess() CLI wrapper functions.

Note

All datasets that are provided by autrainer can be automatically downloaded as well as optionally preprocessed using the autrainer fetch and autrainer preprocess CLI commands or the fetch() and preprocess() CLI wrapper functions.

Abstract Dataset#

All datasets inherit from the AbstractDataset class.

class autrainer.datasets.AbstractDataset(path, features_subdir, seed, task, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)[source]#

Abstract dataset class.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to audio subdirectory, which is default for the standard format, but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
task (str) – Task of the dataset in TASKS.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (Union[str, List[str]]) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

property audio_subdir: str#

Subfolder containing audio data.

Defaults to default for our standard format. Should be overridden for datasets that do not conform to it.

abstract property target_transform: AbstractTargetTransform#

Get the transform to apply to the target.

Returns:: Target transform.

property output_dim: int#

Get the output dimension of the dataset.

Returns:: Output dimension.

property df_train: DataFrame#

Dataframe for the training set, loaded from train.csv by default.

Returns:: Training dataframe.

property df_dev: DataFrame#

Dataframe for the development set, loaded from dev.csv by default.

Returns:: Development dataframe.

property df_test: DataFrame#

Dataframe for the test set, loaded from test.csv by default.

Returns:: Test dataframe.

property train_dataset: DatasetWrapper#

Get the training dataset.

Returns:: Training dataset.

property dev_dataset: DatasetWrapper#

Get the development dataset.

Returns:: Development dataset.

property test_dataset: DatasetWrapper#

Get the test dataset.

Returns:: Test dataset.

static download(path)[source]#

Download the dataset. Can be implemented by subclasses, but is not required.

Parameters:: path (str) – Path to download the dataset to.
Return type:: None

Base Datasets#

Base datasets that can be used for training without the need for creating custom datasets.

class autrainer.datasets.BaseClassificationDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)[source]#

Base classification dataset.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to audio subdirectory, which is default for the standard format, but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

property target_transform: LabelEncoder#

Get the transform to apply to the target.

Returns:: Target transform.
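To illustrate what a label-encoding target transform does conceptually, the following sketch maps class names to integer indices and back. ToyLabelEncoder, its methods, and the sorted ordering of classes are assumptions made for illustration; autrainer's LabelEncoder may differ in interface and behavior.

```python
from typing import List


class ToyLabelEncoder:
    """Illustrative label encoder (not autrainer's LabelEncoder)."""

    def __init__(self, labels: List[str]) -> None:
        # Assumption: classes are the sorted unique labels.
        self.classes = sorted(set(labels))
        self._to_index = {label: i for i, label in enumerate(self.classes)}

    def encode(self, label: str) -> int:
        """Map a class name to its integer index."""
        return self._to_index[label]

    def decode(self, index: int) -> str:
        """Map an integer index back to its class name."""
        return self.classes[index]


encoder = ToyLabelEncoder(["dog", "cat", "dog"])
print(encoder.classes, encoder.encode("dog"), encoder.decode(0))
```

In such a scheme, the number of distinct classes would correspond to the output dimension of a classification model.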

class autrainer.datasets.BaseMLClassificationDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, threshold=0.5)[source]#

Base multi-label classification dataset.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to audio subdirectory, which is default for the standard format, but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (List[str]) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
threshold (float) – Threshold for classification. Defaults to 0.5.

property target_transform: MultiLabelEncoder#

Get the transform to apply to the target.

Returns:: Target transform.
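A minimal sketch of how a threshold could turn per-label probabilities into binary multi-label predictions. The function name and the strict greater-than comparison are assumptions; autrainer's MultiLabelEncoder may handle the boundary case differently.

```python
from typing import List, Sequence


def apply_threshold(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binarize per-label probabilities.

    Assumption: a label is active only if its probability is strictly
    greater than the threshold.
    """
    return [1 if p > threshold else 0 for p in probabilities]


print(apply_threshold([0.1, 0.5, 0.9]))  # [0, 0, 1]
```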

class autrainer.datasets.BaseRegressionDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)[source]#

Base regression dataset.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to audio subdirectory, which is default for the standard format, but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

property target_transform: MinMaxScaler#

Get the transform to apply to the target.

Returns:: Target transform.
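Conceptually, min-max scaling maps a target from its original range to [0, 1] and back. The sketch below shows the arithmetic only; ToyMinMaxScaler and its method names are hypothetical, and autrainer's MinMaxScaler may expose a different interface.

```python
class ToyMinMaxScaler:
    """Illustrative min-max scaler (not autrainer's MinMaxScaler)."""

    def __init__(self, minimum: float, maximum: float) -> None:
        if maximum <= minimum:
            raise ValueError("maximum must be greater than minimum")
        self.minimum = minimum
        self.maximum = maximum

    def encode(self, value: float) -> float:
        """Scale a target from [minimum, maximum] to [0, 1]."""
        return (value - self.minimum) / (self.maximum - self.minimum)

    def decode(self, scaled: float) -> float:
        """Map a scaled value back to the original range."""
        return scaled * (self.maximum - self.minimum) + self.minimum


scaler = ToyMinMaxScaler(minimum=0.0, maximum=10.0)
print(scaler.encode(5.0), scaler.decode(0.5))  # 0.5 5.0
```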

class autrainer.datasets.BaseMTRegressionDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)[source]#

Base multi-target regression dataset.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to audio subdirectory, which is default for the standard format, but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (List[str]) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

property target_transform: MultiTargetMinMaxScaler#

Get the transform to apply to the target.

Returns:: Target transform.

Toy Datasets#

A toy dataset for testing purposes.

Note

To easily test implementations, multiple toy dataset configurations across modalities and tasks are provided: ToyAudio-... for audio, ToyImage-... for image, and ToyTabular-... for tabular data. For each modality, the task is indicated by a suffix: -R for regression, -C for classification, -MLC for multi-label classification, and -MTR for multi-target regression.

class autrainer.datasets.ToyDataset(task, size, num_targets, feature_shape, dev_split, test_split, seed, metrics, tracking_metric, dtype='float32', train_transform=None, dev_transform=None, test_transform=None)[source]#

Toy dataset for testing purposes.

Parameters:

task (str) – Task of the dataset in [“classification”, “regression”, “ml-classification”, “mt-regression”].
size (int) – Size of the dataset.
num_targets (int) – Number of targets.
feature_shape (Union[int, List[int]]) – Shape of the features.
dev_split (float) – Proportion of the dataset to use for the development set.
test_split (float) – Proportion of the dataset to use for the test set.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

Audio Datasets#

We provide a number of different audio-specific datasets.

class autrainer.datasets.AIBO(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, aibo_task='2cl')[source]#

FAU AIBO dataset.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to audio subdirectory, which is default for the standard format, but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
aibo_task (str) – Task to load in [“2cl”, “5cl”]. Defaults to “2cl”.

Default Configurations

conf/dataset/AIBO-eGeMAPS-llds.yaml#

# Important: should be used with inference_batch_size: 1
id: AIBO-eGeMAPS-llds
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: eGeMAPSv02-llds
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics: 
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: tabular
  base:
    - autrainer.transforms.Expand:
        size: 200
  train:
    - autrainer.transforms.RandomCrop:
        size: 200

conf/dataset/AIBO-IS16-llds.yaml#

# Important: should be used with inference_batch_size: 1
id: AIBO-IS16-llds
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: ComParE_2016-llds
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics: 
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: tabular
  base:
    - autrainer.transforms.Expand:
        size: 200
  train:
    - autrainer.transforms.RandomCrop:
        size: 200

conf/dataset/AIBO-mel-16k.yaml#

# Important: should be used with inference_batch_size: 1
id: AIBO-mel-16k
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: log_mel_16k
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics: 
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 301
        axis: -2
  train:
    - autrainer.transforms.RandomCrop:
        size: 301
        axis: -2

conf/dataset/AIBO-mel-32k.yaml#

# Important: should be used with inference_batch_size: 1
id: AIBO-mel-32k
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: log_mel_32k
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics: 
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 301
        axis: -2
  train:
    - autrainer.transforms.RandomCrop:
        size: 301
        axis: -2

conf/dataset/AIBO-wav-pad.yaml#

# Important: should be used with inference_batch_size: 1
id: AIBO-wav-pad
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: default
index_column: file
target_column: class
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics: 
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw 
  base:
    - autrainer.transforms.Expand:
        size: 48000
        axis: -1
  train:
    - autrainer.transforms.RandomCrop:
        size: 48000
        axis: -1

conf/dataset/AIBO-wav.yaml#

# Important: should be used with inference_batch_size: 1
id: AIBO-wav
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: default
index_column: file
target_column: class
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics: 
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw

property audio_subdir: str#: Subfolder containing audio data.

property df_train: DataFrame#

Dataframe for the training set, loaded from train.csv by default.

Returns:: Training dataframe.

property df_dev: DataFrame#

Dataframe for the development set, loaded from dev.csv by default.

Returns:: Development dataframe.

property df_test: DataFrame#

Dataframe for the test set, loaded from test.csv by default.

Returns:: Test dataframe.

static download(path)[source]#

Download the FAU AIBO dataset.

As the AIBO dataset is private, this method does not download the dataset but rather prepares the file structure expected by the preprocessing routines.

In the specified path, the following directories and files are expected:

default/: Directory containing .wav files.
chunk_labels_2cl_corpus.txt: File containing the file names and corresponding labels for the 2-class classification task.
chunk_labels_5cl_corpus.txt: File containing the file names and corresponding labels for the 5-class classification task.

For more information on the dataset and dataset split, see: https://doi.org/10.1109/ICME51207.2021.9428217

Parameters:: path (str) – Path to the directory to download the dataset to.
Return type:: None

class autrainer.datasets.AudioSet(path, features_subdir, seed, metrics, tracking_metric, index_column, file_type, file_handler, target_column=None, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, threshold=0.5, use_unbalanced=False, include=None, exclude=None)[source]#

AudioSet dataset.

Warning

AudioSet changes constantly as videos are removed from YouTube over time. Results are not reproducible across different snapshots of the data.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to audio subdirectory, which is default for the standard format, but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (Optional[List[str]]) – Target column of the dataframe. If None, defaults to the least common subset available in the data. Defaults to None.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
batch_size – Batch size.
inference_batch_size – Inference batch size. If None, defaults to batch_size. Defaults to None.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
threshold (float) – Threshold for classification. Defaults to 0.5.
use_unbalanced (bool) – Flag to allow the use of the unbalanced set during training. Defaults to False.
include (Optional[List[str]]) – List of categories to include. If set, only instances tagged with at least one category from the list will be included in the final dataset. Defaults to None.
exclude (Optional[List[str]]) – List of categories to exclude. If set, all instances tagged with at least one category from the list will be excluded from the final dataset. Defaults to None.
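The include and exclude semantics described above can be sketched as a simple predicate over the set of categories an instance is tagged with. The function name keep_instance and the order in which the two filters are combined are assumptions, not autrainer's implementation.

```python
from typing import List, Optional, Set


def keep_instance(
    tags: Set[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> bool:
    """Decide whether an instance with the given category tags is kept.

    Hypothetical helper; applying include before exclude is an assumption.
    """
    # Keep only instances tagged with at least one included category.
    if include is not None and not tags & set(include):
        return False
    # Drop instances tagged with at least one excluded category.
    if exclude is not None and tags & set(exclude):
        return False
    return True


print(keep_instance({"Speech", "Music"}, include=["Speech"]))  # True
print(keep_instance({"Speech", "Music"}, exclude=["Music"]))   # False
```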

property audio_subdir: str#

Subfolder containing audio data.

Data is assumed to be in the root folder under the name of the respective partition, which is prepended once the CSV file is loaded. Available partitions are (with original number of instances):
- balanced_train_segments (n=22,160)
- unbalanced_train_segments (n=2,041,789)
- eval_segments (n=20,371)

map_to_classes(df)[source]#

Map the dataframe as loaded to an autrainer-compatible format.

Assigns all AudioSet IDs to category names. Recursively assigns parent nodes to every instance where a child is present. Optionally filters dataframes to include/exclude instances. Creates columns with final labels.

Parameters:: df (DataFrame) – dataframe loaded from CSV.
Return type:: DataFrame
Returns:: Adapted dataframe.
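The parent assignment described above can be sketched on a toy ontology. The entries below follow the id, name, and child_ids layout of the AudioSet ontology file, but the IDs are made up and the helper names are hypothetical. The sketch uses an iterative traversal in place of recursion and is not autrainer's implementation.

```python
from typing import Dict, Iterable, List, Set

# Toy ontology; the IDs are invented for illustration.
TOY_ONTOLOGY = [
    {"id": "toy/sounds", "name": "Human sounds", "child_ids": ["toy/voice"]},
    {"id": "toy/voice", "name": "Human voice", "child_ids": ["toy/speech"]},
    {"id": "toy/speech", "name": "Speech", "child_ids": []},
    {"id": "toy/music", "name": "Music", "child_ids": []},
]


def build_parent_map(ontology: List[dict]) -> Dict[str, List[str]]:
    """Invert the child_ids lists into a child -> parents mapping."""
    parents: Dict[str, List[str]] = {entry["id"]: [] for entry in ontology}
    for entry in ontology:
        for child in entry["child_ids"]:
            parents.setdefault(child, []).append(entry["id"])
    return parents


def with_ancestors(label_ids: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the given labels together with all of their ancestors."""
    result: Set[str] = set()
    stack = list(label_ids)
    while stack:
        current = stack.pop()
        if current in result:
            continue
        result.add(current)
        stack.extend(parents.get(current, []))
    return result


names = {entry["id"]: entry["name"] for entry in TOY_ONTOLOGY}
parents = build_parent_map(TOY_ONTOLOGY)
labels = sorted(names[i] for i in with_ancestors(["toy/speech"], parents))
print(labels)  # ['Human sounds', 'Human voice', 'Speech']
```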

property ontology: Dict#: Reads AudioSet ontology.

property df_train: DataFrame#

Dataframe for the training set, loaded from train.csv by default.

Returns:: Training dataframe.

property df_dev: DataFrame#

Dataframe for the development set, loaded from dev.csv by default.

Returns:: Development dataframe.

property df_test: DataFrame#

Dataframe for the test set, loaded from test.csv by default.

Returns:: Test dataframe.

static download(path)[source]#

Download AudioSet.

The data must be downloaded manually from https://research.google.com/audioset/download.html. We do not implement an automatic download as this is a very costly process that the user should handle outside of autrainer.

Parameters:: path (str) – Path to the directory to download the dataset to.
Return type:: None

class autrainer.datasets.DCASE2016Task1(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, fold=1)[source]#

TUT Acoustic scenes 2016 Task 1 (DCASE2016Task1) dataset.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to audio subdirectory, which is default for the standard format, but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
fold (int) – Fold to use in [1, 2, 3, 4]. Defaults to 1.

property df_train: DataFrame#

Dataframe for the training set, loaded from train.csv by default.

Returns:: Training dataframe.

property df_dev: DataFrame#

Dataframe for the development set, loaded from dev.csv by default.

Returns:: Development dataframe.

property df_test: DataFrame#

Dataframe for the test set, loaded from test.csv by default.

Returns:: Test dataframe.

static download(path)[source]#

Download the TUT Acoustic scenes 2016 Task 1 (DCASE2016Task1) dataset.

For more information on the dataset and dataset split, see: https://dcase.community/challenge2016/task-acoustic-scene-classification

Parameters:: path (str) – Path to the directory to download the dataset to.
Return type:: None

class autrainer.datasets.DCASE2018Task3(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, dev_split=0.0, dev_split_seed=None)[source]#

DCASE 2018 Task 3 dataset.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to audio subdirectory, which is default for the standard format, but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
dev_split (float) – Fraction of the training set to use as the development set. Defaults to 0.0.
dev_split_seed (Optional[int]) – Seed for the development split. If None, seed is used. Defaults to None.
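A minimal sketch of how dev_split and dev_split_seed could interact, assuming a seeded shuffle followed by slicing. The function name, the rounding, and the absence of stratification are simplifications made for illustration; only the fallback from dev_split_seed to seed is taken from the parameter description above.

```python
import random
from typing import List, Optional, Tuple


def split_train_dev(
    items: List[str],
    dev_split: float,
    seed: int,
    dev_split_seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Split training items into train and dev parts.

    Hypothetical helper; unstratified and simplified.
    """
    if dev_split <= 0.0:
        return list(items), []
    # Fall back to the global seed if no dedicated split seed is given.
    effective_seed = seed if dev_split_seed is None else dev_split_seed
    shuffled = list(items)
    random.Random(effective_seed).shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_split)
    return shuffled[n_dev:], shuffled[:n_dev]


files = [f"file_{i}.wav" for i in range(10)]  # made-up file names
train, dev = split_train_dev(files, dev_split=0.2, seed=1)
print(len(train), len(dev))  # 8 2
```

Fixing dev_split_seed keeps the development split identical across runs that use different training seeds.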

property df_train: DataFrame#

Dataframe for the training set, loaded from train.csv by default.

Returns:: Training dataframe.

property df_dev: DataFrame#

Dataframe for the development set, loaded from dev.csv by default.

Returns:: Development dataframe.

static download(path)[source]#

Download the DCASE 2018 Task 3 dataset.

For the train dataset, the following subsets are used:

Field recordings, worldwide (“freefield1010”)
Remote monitoring flight calls, USA (“BirdVox-DCASE-20k”)

For the test dataset, the following subset is used:

Crowdsourced dataset, UK (“warblrb10k”)

Both the training and test datasets are taken from the development set, as no labels are provided for the evaluation set. For more information on the dataset, see: https://dcase.community/challenge2018/task-bird-audio-detection

Parameters:: path (str) – Path to the directory to download the dataset to.
Return type:: None

class autrainer.datasets.DCASE2020Task1A(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, dev_split=0.0, dev_split_seed=None, scene_category=None, exclude_cities=None)[source]#

TAU Urban Acoustic Scenes 2020 Mobile Task 1 Subtask A (DCASE2020Task1A) dataset.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to audio subdirectory, which is default for the standard format, but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
dev_split (float) – Fraction of the training set to use as the development set. Defaults to 0.0.
dev_split_seed (Optional[int]) – Seed for the development split. If None, seed is used. Defaults to None.
scene_category (Optional[str]) – Scene category in [“indoor”, “outdoor”, “transportation”]. Defaults to None.
exclude_cities (Optional[List[str]]) – List of cities to exclude from the dataset. Defaults to None.
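
As a rough illustration of how scene_category and exclude_cities narrow the dataset, the following sketch filters metadata rows. It is not autrainer's implementation: plain dictionaries stand in for dataframe rows, the helper filter_rows is hypothetical, and the grouping of scene labels into the three categories is an assumption based on the three-class grouping used in DCASE 2020 Task 1 Subtask B.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]

# Assumed grouping of the ten scene labels into three categories.
SCENE_CATEGORIES: Dict[str, List[str]] = {
    "indoor": ["airport", "shopping_mall", "metro_station"],
    "outdoor": ["park", "public_square", "street_pedestrian", "street_traffic"],
    "transportation": ["bus", "metro", "tram"],
}


def filter_rows(
    rows: List[Row],
    scene_category: Optional[str] = None,
    exclude_cities: Optional[List[str]] = None,
) -> List[Row]:
    """Keep rows of one scene category and drop rows from excluded cities."""
    if scene_category is not None:
        if scene_category not in SCENE_CATEGORIES:
            raise ValueError(f"Unknown scene category: {scene_category}")
        allowed = set(SCENE_CATEGORIES[scene_category])
        rows = [row for row in rows if row["scene_label"] in allowed]
    if exclude_cities:
        excluded = set(exclude_cities)
        rows = [row for row in rows if row["city"] not in excluded]
    return list(rows)


# Hypothetical metadata rows using the scene_label and city columns.
rows = [
    {"filename": "a.wav", "scene_label": "airport", "city": "lisbon"},
    {"filename": "b.wav", "scene_label": "bus", "city": "vienna"},
    {"filename": "c.wav", "scene_label": "metro_station", "city": "vienna"},
    {"filename": "d.wav", "scene_label": "park", "city": "helsinki"},
]
indoor = filter_rows(rows, scene_category="indoor", exclude_cities=["vienna"])
print([row["filename"] for row in indoor])  # ['a.wav']
```

With both arguments left at None, the sketch returns all rows unchanged.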

Default Configurations

conf/dataset/DCASE2020Task1A-16k.yaml#

id: DCASE2020Task1A-16k
_target_: autrainer.datasets.DCASE2020Task1A

dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device

path: data/DCASE2020Task1A
features_subdir: log_mel_16k
index_column: filename
target_column: scene_label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale

conf/dataset/DCASE2020Task1A-32k.yaml#

id: DCASE2020Task1A-32k
_target_: autrainer.datasets.DCASE2020Task1A

dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device

path: data/DCASE2020Task1A
features_subdir: log_mel_32k
index_column: filename
target_column: scene_label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale

conf/dataset/DCASE2020Task1A-wav-16k.yaml#

id: DCASE2020Task1A-wav-16k
_target_: autrainer.datasets.DCASE2020Task1A

dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device

path: data/DCASE2020Task1A
features_subdir: default
index_column: filename
target_column: scene_label
file_type: wav
file_handler:
  autrainer.datasets.utils.AudioFileHandler:
    target_sample_rate: 16000

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw

conf/dataset/DCASE2020Task1A-wav.yaml#

id: DCASE2020Task1A-wav
_target_: autrainer.datasets.DCASE2020Task1A

dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device

path: data/DCASE2020Task1A
features_subdir: default
index_column: filename
target_column: scene_label
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw
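
All four default configurations hold out 10% of the training set as a development set (dev_split: 0.1), stratified on scene_label, city, and device with a fixed dev_split_seed. The sketch below shows one way such a stratified split could work. It is an illustration under stated assumptions, not autrainer's code: stratified_dev_split is a hypothetical helper, rows are plain dictionaries rather than dataframe rows, and the per-group rounding is a choice made here.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def stratified_dev_split(
    rows: Sequence[Row],
    stratify: Sequence[str],
    dev_split: float,
    seed: int,
) -> Tuple[List[Row], List[Row]]:
    """Move a fraction of every stratification group into the dev set."""
    rng = random.Random(seed)
    groups: Dict[Tuple[str, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in stratify)].append(row)

    train: List[Row] = []
    dev: List[Row] = []
    for key in sorted(groups):  # sorted keys keep the result reproducible
        members = list(groups[key])
        rng.shuffle(members)
        n_dev = round(len(members) * dev_split)  # assumed rounding rule
        dev.extend(members[:n_dev])
        train.extend(members[n_dev:])
    return train, dev


# Hypothetical metadata: 2 scenes x 2 cities x 1 device, 20 files each.
rows = [
    {
        "filename": f"{scene}-{city}-{i}.npy",
        "scene_label": scene,
        "city": city,
        "device": "a",
    }
    for scene in ("airport", "bus")
    for city in ("lisbon", "vienna")
    for i in range(20)
]
stratify_columns = ["scene_label", "city", "device"]
train, dev = stratified_dev_split(rows, stratify_columns, dev_split=0.1, seed=0)
print(len(train), len(dev))  # 72 8
```

Because each group contributes its own share, the development set keeps roughly the same scene, city, and device proportions as the training set.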

property df_train: DataFrame#

Dataframe for the training set, loaded from train.csv by default.

Returns:: Training dataframe.

property df_dev: DataFrame#

Dataframe for the development set, loaded from dev.csv by default.

Returns:: Development dataframe.

property df_test: DataFrame#

Dataframe for the test set, loaded from test.csv by default.

Returns:: Test dataframe.

static download(path)[source]#

Download the TAU Urban Acoustic Scenes 2020 Mobile Task 1 Subtask A (DCASE2020Task1A) dataset.

As no labels are provided for the evaluation set, the training and test sets are created from the provided training and test split of the development set. This download therefore does not include the evaluation set.

For more information on the dataset, see: https://dcase.community/challenge2020/task-acoustic-scene-classification

Parameters:: path (str) – Path to the directory to download the dataset to.
Return type:: None

class autrainer.datasets.EDANSA2019(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, threshold=0.5)[source]#

EDANSA 2019 dataset.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to the audio subdirectory, which is default for the standard format but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (List[str]) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
threshold (float) – Threshold for classification. Defaults to 0.5.
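
EDANSA2019 takes a list of target columns together with a threshold, which suggests a multi-label setup. The following is a minimal sketch of how such a threshold is commonly applied, assuming the model emits one logit per target column and that a sigmoid precedes the comparison. The column names are hypothetical, and whether the library compares with > or >= is not stated in the source; strict > is used here.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_multilabel(logits: List[float], threshold: float = 0.5) -> List[int]:
    """Mark every label whose probability exceeds the threshold as present."""
    return [int(sigmoid(z) > threshold) for z in logits]


target_columns = ["label_a", "label_b", "label_c"]  # hypothetical column names
logits = [2.0, -1.0, 0.3]
preds = predict_multilabel(logits)
print(dict(zip(target_columns, preds)))  # {'label_a': 1, 'label_b': 0, 'label_c': 1}
```

Raising the threshold makes the sketch more conservative: with threshold=0.6, the third label (probability of about 0.57) is no longer predicted.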

static download(path)[source]#

Download the EDANSA 2019 dataset.

For more information on the dataset, see: https://zenodo.org/doi/10.5281/zenodo.6824271

Parameters:: path (str) – Path to the directory to download the dataset to.
Return type:: None

class autrainer.datasets.EmoDB(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, train_speakers=None, dev_speakers=None, test_speakers=None)[source]#

EmoDB dataset for the task of Speech Emotion Recognition.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to the audio subdirectory, which is default for the standard format but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
train_speakers (Optional[List[int]]) – List of speaker IDs (int) to use for training. If None, speakers 3, 8, 9, 10, 11, and 12 are used. Defaults to None.
dev_speakers (Optional[List[int]]) – List of speaker IDs (int) to use for validation. If None, speakers 13 and 14 are used. Defaults to None.
test_speakers (Optional[List[int]]) – List of speaker IDs (int) to use for testing. If None, speakers 15 and 16 are used. Defaults to None.
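
The speaker lists above define a speaker-independent split. A minimal sketch of that idea follows, assuming the speaker ID is read from the first two characters of the file name, as in the EmoDB naming scheme. The helper names and example file names are hypothetical, and autrainer may derive the speaker differently.

```python
from typing import Dict, List, Optional

# Default speaker IDs as listed in the parameter descriptions.
DEFAULT_SPEAKERS: Dict[str, List[int]] = {
    "train": [3, 8, 9, 10, 11, 12],
    "dev": [13, 14],
    "test": [15, 16],
}


def speaker_id(filename: str) -> int:
    """Read the speaker ID from the first two characters of the file name."""
    return int(filename[:2])


def split_by_speaker(
    filenames: List[str],
    train_speakers: Optional[List[int]] = None,
    dev_speakers: Optional[List[int]] = None,
    test_speakers: Optional[List[int]] = None,
) -> Dict[str, List[str]]:
    """Assign each file to the split that contains its speaker."""
    chosen = {"train": train_speakers, "dev": dev_speakers, "test": test_speakers}
    speakers = {
        name: DEFAULT_SPEAKERS[name] if ids is None else ids
        for name, ids in chosen.items()
    }
    splits: Dict[str, List[str]] = {name: [] for name in speakers}
    for filename in filenames:
        for name, ids in speakers.items():
            if speaker_id(filename) in ids:
                splits[name].append(filename)
    return splits


# Hypothetical file names following the EmoDB naming pattern.
files = ["03a01Fa.wav", "08a02Wb.wav", "13b03Na.wav", "15a04Lc.wav", "16b10Wb.wav"]
splits = split_by_speaker(files)
print(splits["train"], splits["dev"], splits["test"])
```

Splitting by speaker rather than by file keeps every speaker's recordings within a single split.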

property df_train: DataFrame#

Dataframe for the training set, loaded from train.csv by default.

Returns:: Training dataframe.

property df_dev: DataFrame#

Dataframe for the development set, loaded from dev.csv by default.

Returns:: Development dataframe.

property df_test: DataFrame#

Dataframe for the test set, loaded from test.csv by default.

Returns:: Test dataframe.

static download(path)[source]#

Download the EmoDB dataset.

For more information on the dataset, see: http://emodb.bilderbar.info/docu/

Parameters:: path (str) – Path to the directory to download the dataset to.
Return type:: None

class autrainer.datasets.MSPPodcast(path, seed, metrics, tracking_metric, target_column, file_type, file_handler, index_column='FileName', features_subdir=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, categories=None)[source]#

MSP-Podcast dataset.

Warning

There are multiple versions available for this dataset. We recommend always using the latest one (v1.11 at the time of writing), but our code is set up to work with all versions (at least up to v1.11).

Note

After v1.7, the dataset features two test sets. We only use Test1, as Test2 was found to be biased with respect to gender. See https://doi.org/10.21437/Interspeech.2019-1708.

Note

Unlike other datasets, which support only classification or only regression, MSP-Podcast supports both. The task is determined by the chosen target column: EmoClass corresponds to categorical emotion classification, whereas EmoAct, EmoVal, and EmoDom correspond to dimensional emotion regression for activation (arousal), valence, and dominance, respectively.

Parameters:

path (str) – Root path to the dataset.
features_subdir (Optional[str]) – Subdirectory containing the features.
seed (int) – Seed for reproducibility.
metrics (List[str]) – List of metrics to calculate.
tracking_metric (str) – Metric to track.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
index_column (str) – Index column of the dataframe. Defaults to FileName, as in the original data.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
categories (Optional[List[str]]) – Emotion categories to restrict the dataset to. Useful for training on a subset of the data or classes, such as the classic [“A”, “H”, “N”, “S”] 4-class problem found in the literature. Defaults to None.
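
The interplay of target_column and categories can be sketched as follows. This is an illustration only: infer_task and keep_categories are hypothetical helpers, plain dictionaries stand in for dataframe rows, the file names are invented, and categories is assumed to list the classes to keep, as the 4-class example suggests.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]

REGRESSION_TARGETS = {"EmoAct", "EmoVal", "EmoDom"}


def infer_task(target_column: str) -> str:
    """Map the target column to the task it implies."""
    if target_column == "EmoClass":
        return "classification"
    if target_column in REGRESSION_TARGETS:
        return "regression"
    raise ValueError(f"Unsupported target column: {target_column}")


def keep_categories(rows: List[Row], categories: Optional[List[str]] = None) -> List[Row]:
    """Keep only rows whose EmoClass is among the given categories."""
    if categories is None:
        return list(rows)
    allowed = set(categories)
    return [row for row in rows if row["EmoClass"] in allowed]


# Hypothetical metadata rows.
rows = [
    {"FileName": "clip_0001.wav", "EmoClass": "A", "EmoAct": 5.2},
    {"FileName": "clip_0002.wav", "EmoClass": "X", "EmoAct": 3.1},
    {"FileName": "clip_0003.wav", "EmoClass": "N", "EmoAct": 4.0},
]
big4 = keep_categories(rows, ["A", "H", "N", "S"])
print(infer_task("EmoClass"), [row["FileName"] for row in big4])
```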

Default Configurations

conf/dataset/MSPPodcast-EmoAct-wav.yaml#

# Important: should be used with inference_batch_size: 1
id: MSPPodcast-EmoAct-wav
_target_: autrainer.datasets.MSPPodcast

path: data/MSPPodcast
index_column: FileName
target_column: EmoAct
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.CCC

transform:
  type: raw
  base:
    - autrainer.transforms.Expand:
        size: 48000
        axis: -1
  train:
    - autrainer.transforms.RandomCrop:
        size: 48000
        axis: -1

conf/dataset/MSPPodcast-EmoClass-all-wav.yaml#

id: MSPPodcast-EmoClass-all-wav
_target_: autrainer.datasets.MSPPodcast

path: data/MSPPodcast
index_column: FileName
target_column: EmoClass
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics: 
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw
  base:
    - autrainer.transforms.Expand:
        size: 48000
        axis: -1
  train:
    - autrainer.transforms.RandomCrop:
        size: 48000
        axis: -1

conf/dataset/MSPPodcast-EmoClass-big4-wav.yaml#

# Important: should be used with inference_batch_size: 1
id: MSPPodcast-EmoClass-big4-wav
_target_: autrainer.datasets.MSPPodcast

path: data/MSPPodcast
index_column: FileName
target_column: EmoClass
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics: 
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

categories: [A,H,N,S]

transform:
  type: raw
  base:
    - autrainer.transforms.Expand:
        size: 48000
        axis: -1
  train:
    - autrainer.transforms.RandomCrop:
        size: 48000
        axis: -1

conf/dataset/MSPPodcast-EmoDom-wav.yaml#

# Important: should be used with inference_batch_size: 1
id: MSPPodcast-EmoDom-wav
_target_: autrainer.datasets.MSPPodcast

path: data/MSPPodcast
index_column: FileName
target_column: EmoDom
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.CCC

transform:
  type: raw
  base:
    - autrainer.transforms.Expand:
        size: 48000
        axis: -1
  train:
    - autrainer.transforms.RandomCrop:
        size: 48000
        axis: -1

conf/dataset/MSPPodcast-EmoVal-wav.yaml#

# Important: should be used with inference_batch_size: 1
id: MSPPodcast-EmoVal-wav
_target_: autrainer.datasets.MSPPodcast

path: data/MSPPodcast
index_column: FileName
target_column: EmoVal
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.CCC

transform:
  type: raw
  base:
    - autrainer.transforms.Expand:
        size: 48000
        axis: -1
  train:
    - autrainer.transforms.RandomCrop:
        size: 48000
        axis: -1
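
The configurations above combine a base Expand transform with a training-only RandomCrop, both with size 48000. One plausible reading is that every clip is brought up to at least 48000 samples, and that training clips are then cropped to exactly 48000 samples, while development and test clips keep their full length. That would explain the inference_batch_size: 1 comment, since clips of different lengths cannot be stacked into one batch. The sketch below illustrates this reading with hypothetical stand-in functions on Python lists. It does not use autrainer's transforms, and zero-padding at the end of the signal is an assumption.

```python
import random
from typing import List, Optional


def expand(signal: List[float], size: int) -> List[float]:
    """Zero-pad the signal at the end until it has at least `size` samples."""
    if len(signal) >= size:
        return list(signal)
    return list(signal) + [0.0] * (size - len(signal))


def random_crop(
    signal: List[float], size: int, rng: Optional[random.Random] = None
) -> List[float]:
    """Cut a random window of `size` samples out of a longer signal."""
    if len(signal) <= size:
        return list(signal)
    rng = rng or random.Random()
    start = rng.randint(0, len(signal) - size)  # inclusive upper bound
    return list(signal[start : start + size])


SIZE = 48000
short_clip = [0.1] * 30000  # shorter than SIZE
long_clip = [0.1] * 80000  # longer than SIZE
rng = random.Random(0)

# Training: base transform followed by the train-only crop.
train_short = random_crop(expand(short_clip, SIZE), SIZE, rng)
train_long = random_crop(expand(long_clip, SIZE), SIZE, rng)
# Evaluation: base transform only, so long clips keep their length.
eval_long = expand(long_clip, SIZE)

print(len(train_short), len(train_long), len(eval_long))  # 48000 48000 80000
```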

property audio_subdir: str#

Subfolder containing audio data.

Defaults to Audios for MSP-Podcast.

static download(path)[source]#

Download the MSP-Podcast dataset.

As this dataset is not publicly available, please obtain it manually by contacting Prof. Carlos Busso: https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html

This function therefore does nothing.

For more information on the data, see: https://doi.org/10.1109/TAFFC.2017.2736999

Return type:: None

property df_train#

Dataframe for the training set, loaded from train.csv by default.

Returns:: Training dataframe.

property df_dev#

Dataframe for the development set, loaded from dev.csv by default.

Returns:: Development dataframe.

property df_test#

Dataframe for the test set, loaded from test.csv by default.

Returns:: Test dataframe.

property target_transform: AbstractTargetTransform#

Get the target transform.

Determined automatically based on the type of task.

Returns:: Target transform.

class autrainer.datasets.SpeechCommands(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, features_path=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)[source]#

Speech Commands (v0.02) dataset.

Parameters:

path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features. If None, defaults to the audio subdirectory, which is default for the standard format but can be overridden in the dataset specification.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
features_path (Optional[str]) – Root path to features. Useful when features need to be extracted and stored in a different folder than the root of the dataset. If None, will be set to path. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

static download(path)[source]#

Download the Speech Commands (v0.02) dataset from torchaudio.

For more information on the dataset, see: https://doi.org/10.48550/arXiv.1804.03209

Parameters:: path (str) – Path to the directory to download the dataset to.
Return type:: None
