Datasets

autrainer provides a number of different audio-specific datasets, base datasets for different tasks, and toy datasets for testing purposes. To ensure consistency across different data formats and manage multiple data types, all datasets should follow a standardized structure.

Tip

To create custom datasets, refer to the custom datasets tutorial.

In addition to common attributes such as id and _target_, the dataset configuration file should include the following attributes:

Structure and Loading

  • path: Directory path containing the features_subdir directory and corresponding CSV files (such as train.csv, dev.csv, and test.csv).

  • features_subdir: The subdirectory within the dataset path where (extracted) features are stored.

    • If no preprocessing is used (e.g., for raw audio), it should be default.

    • For preprocessing transforms (e.g., log-Mel spectrograms with log_mel_16k), it should match the transform’s name, and the processed features are saved in this subdirectory after preprocessing.

  • index_column: Column in the CSV files containing the file paths, relative to the features_subdir directory.

  • target_column: Column in the CSV files containing the corresponding targets or labels for each file.

  • file_type: Specifies the type of files to be loaded (e.g., wav, npy, etc.).

  • file_handler: The file handler used for loading the files.

This results in a directory structure like the following:

{path}/{features_subdir}/optional/subdirs/some.file

For instance, a file in the index_column might be optional/subdirs/some.file, where some.file is an audio or a feature file.
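
As a minimal sketch of how these attributes combine, the snippet below joins path, features_subdir, and one index_column entry into a full file location. The helper name resolve_feature_file and the example values are illustrative and not part of autrainer's API; whether index entries already carry a file extension depends on the dataset.

```python
from pathlib import PurePosixPath


def resolve_feature_file(path: str, features_subdir: str, index_entry: str) -> PurePosixPath:
    """Join the dataset root, the features subdirectory, and an index entry."""
    return PurePosixPath(path) / features_subdir / index_entry


# Hypothetical values mirroring the layout described above.
resolved = resolve_feature_file(
    "data/MyDataset", "log_mel_16k", "optional/subdirs/some.npy"
)
print(resolved)  # data/MyDataset/log_mel_16k/optional/subdirs/some.npy
```

Because the index entries are relative to features_subdir, the same CSV files can in principle be reused for raw audio (default) and for any preprocessed feature set.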

To load custom dataset splits that do not follow the standard train.csv, dev.csv, and test.csv convention, override the load_dataframes() method (see the custom datasets tutorial).

Training and Evaluation

  • criterion: The criterion to use for training.

  • metrics: A list of metrics to evaluate the model.

  • tracking_metric: The metric to track for early stopping and model selection.

  • transform: The online transforms to apply to the data and the output type of the dataset.
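
The attributes listed in the two groups above can be checked mechanically before launching a run. The following is a small sketch under stated assumptions: the helper missing_attributes and the EXPECTED_KEYS set are hypothetical and not part of autrainer, and the example values are copied from the DCASE2016Task1-16k default configuration. Toy dataset configurations use a different set of attributes and would not pass this check.

```python
# Attributes named in this section; "id" and "_target_" are the common ones.
EXPECTED_KEYS = {
    "id", "_target_",
    "path", "features_subdir", "index_column", "target_column",
    "file_type", "file_handler",
    "criterion", "metrics", "tracking_metric", "transform",
}


def missing_attributes(config: dict) -> list:
    """Return the sorted names of expected attributes absent from a config."""
    return sorted(EXPECTED_KEYS - set(config))


# Values taken from the DCASE2016Task1-16k default configuration.
config = {
    "id": "DCASE2016Task1-16k",
    "_target_": "autrainer.datasets.DCASE2016Task1",
    "path": "data/DCASE2016",
    "features_subdir": "log_mel_16k",
    "index_column": "filename",
    "target_column": "scene_label",
    "file_type": "npy",
    "file_handler": "autrainer.datasets.utils.NumpyFileHandler",
    "criterion": "autrainer.criterions.BalancedCrossEntropyLoss",
    "metrics": ["autrainer.metrics.Accuracy", "autrainer.metrics.UAR"],
    "tracking_metric": "autrainer.metrics.Accuracy",
    "transform": {"type": "grayscale"},
}

print(missing_attributes(config))  # []
```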

Note

The following attributes are automatically passed to the dataset during initialization and determined at runtime:

  • train_transform, dev_transform, and test_transform: The SmartCompose transformation pipelines, which may include online transforms or augmentations.

  • seed: The random seed for reproducibility during training.

  • batch_size, inference_batch_size: The batch sizes for training and inference (dev, test).

The transform attribute in the configuration is not passed to the dataset during initialization. Instead, it specifies the type of data the dataset provides and any online transforms to apply to the data at runtime.

To avoid race conditions when Launcher Plugins run multiple training jobs in parallel, datasets are downloaded and preprocessed before training, using either the autrainer fetch and autrainer preprocess CLI commands or the fetch() and preprocess() CLI wrapper functions.

Note

All datasets that are provided by autrainer can be automatically downloaded as well as optionally preprocessed using the autrainer fetch and autrainer preprocess CLI commands or the fetch() and preprocess() CLI wrapper functions.

Abstract Dataset

All datasets inherit from the AbstractDataset class.

class autrainer.datasets.AbstractDataset(path, features_subdir, seed, task, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)

Abstract dataset class.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • task (str) – Task of the dataset in TASKS.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (Union[str, List[str]]) – Target column of the dataframe.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

abstract property target_transform: AbstractTargetTransform

Get the transform to apply to the target.

Returns:

Target transform.

property output_dim: int

Get the output dimension of the dataset.

Returns:

Output dimension.

load_dataframes()

Load the dataframes.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]

Returns:

Dataframes for training, development, and testing.

property train_dataset: DatasetWrapper

Get the training dataset.

Returns:

Training dataset.

property dev_dataset: DatasetWrapper

Get the development dataset.

Returns:

Development dataset.

property test_dataset: DatasetWrapper

Get the test dataset.

Returns:

Test dataset.

property train_loader: DataLoader

Get the training loader.

Returns:

Training loader.

property dev_loader: DataLoader

Get the development loader.

Returns:

Development loader.

property test_loader: DataLoader

Get the test loader.

Returns:

Test loader.

get_evaluation_data()

Get the evaluation data.

Return type:

Tuple[DataFrame, DataFrame, List[str], AbstractTargetTransform]

Returns:

Dataframes for development and testing, columns to stratify on, and the target transform.

static download(path)

Download the dataset. Can be implemented by subclasses, but is not required.

Parameters:

path (str) – Path to download the dataset to.

Return type:

None

Base Datasets

Base datasets that can be used for training without the need for creating custom datasets.

class autrainer.datasets.BaseClassificationDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)

Base classification dataset.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (str) – Target column of the dataframe.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

property target_transform: LabelEncoder

Get the transform to apply to the target.

Returns:

Target transform.
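
Conceptually, a label encoder maps class labels to integer indices for training and back to labels for reporting. The class below is a generic stand-in to illustrate that round trip; it is not autrainer's LabelEncoder, and the encode and decode method names as well as the sorted label order are assumptions.

```python
class SimpleLabelEncoder:
    """Illustrative label-to-index mapping (not the library implementation)."""

    def __init__(self, labels):
        # Sort the unique labels so the mapping is deterministic.
        self.labels = sorted(set(labels))
        self._index = {label: i for i, label in enumerate(self.labels)}

    def encode(self, label: str) -> int:
        return self._index[label]

    def decode(self, index: int) -> str:
        return self.labels[index]


encoder = SimpleLabelEncoder(["park", "bus", "office", "bus"])
print(encoder.encode("office"))  # 1
print(encoder.decode(2))  # park
```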

class autrainer.datasets.BaseMLClassificationDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, threshold=0.5)

Base multi-label classification dataset.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (List[str]) – Target columns of the dataframe.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

  • threshold (float) – Threshold for classification. Defaults to 0.5.
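
To illustrate the role of threshold, the sketch below turns per-label probabilities into binary decisions. It assumes the model outputs have already been mapped to probabilities (for example with a sigmoid) and that a label counts as active when its probability strictly exceeds the threshold; the library may handle the boundary case differently. The function name binarize is hypothetical.

```python
def binarize(probabilities, threshold=0.5):
    """Map per-label probabilities to 0/1 decisions using a fixed threshold."""
    # Assumption: strictly greater than the threshold means "label present".
    return [1 if p > threshold else 0 for p in probabilities]


print(binarize([0.9, 0.2, 0.7]))  # [1, 0, 1]
print(binarize([0.9, 0.2, 0.7], threshold=0.8))  # [1, 0, 0]
```

Raising the threshold trades recall for precision, which is why it is exposed as a dataset-level parameter rather than fixed.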

property target_transform: MultiLabelEncoder

Get the transform to apply to the target.

Returns:

Target transform.

class autrainer.datasets.BaseRegressionDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)

Base regression dataset.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (str) – Target column of the dataframe.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

property target_transform: MinMaxScaler

Get the transform to apply to the target.

Returns:

Target transform.
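
Min-max scaling conventionally maps regression targets into the range [0, 1] using the minimum and maximum target values, and inverts that mapping when reporting predictions. The class below is a generic stand-in for this idea, not autrainer's MinMaxScaler; the constructor arguments and the encode and decode method names are assumptions.

```python
class SimpleMinMaxScaler:
    """Illustrative min-max scaling of targets (not the library implementation)."""

    def __init__(self, minimum: float, maximum: float):
        if maximum <= minimum:
            raise ValueError("maximum must be greater than minimum")
        self.minimum = minimum
        self.maximum = maximum

    def encode(self, value: float) -> float:
        # Scale a raw target into [0, 1].
        return (value - self.minimum) / (self.maximum - self.minimum)

    def decode(self, value: float) -> float:
        # Map a scaled value back to the original target range.
        return value * (self.maximum - self.minimum) + self.minimum


scaler = SimpleMinMaxScaler(minimum=0.0, maximum=10.0)
print(scaler.encode(5.0))  # 0.5
print(scaler.decode(0.5))  # 5.0
```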

class autrainer.datasets.BaseMTRegressionDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)

Base multi-target regression dataset.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (List[str]) – Target columns of the dataframe.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

property target_transform: MultiTargetMinMaxScaler

Get the transform to apply to the target.

Returns:

Target transform.

Toy Datasets

A toy dataset for testing purposes.

Note

To make implementations easy to test, multiple toy dataset configurations are provided across modalities and tasks: ToyAudio-... for audio, ToyImage-... for image, and ToyTabular-... for tabular data. For each modality, the suffix denotes the task: -C for classification, -MLC for multi-label classification, -R for regression, and -MTR for multi-target regression.
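
The naming scheme can be enumerated in a few lines. This sketch only builds the configuration IDs from the modality prefixes and task suffixes named above; it does not touch autrainer itself.

```python
from itertools import product

MODALITIES = ["ToyAudio", "ToyImage", "ToyTabular"]
# Suffix -> value of the `task` attribute used in the toy configurations.
TASK_SUFFIXES = {
    "C": "classification",
    "MLC": "ml-classification",
    "MTR": "mt-regression",
    "R": "regression",
}

config_ids = [f"{modality}-{suffix}" for modality, suffix in product(MODALITIES, TASK_SUFFIXES)]
print(len(config_ids))  # 12
print(config_ids[0])  # ToyAudio-C
```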

class autrainer.datasets.ToyDataset(task, size, num_targets, feature_shape, dev_split, test_split, seed, metrics, tracking_metric, batch_size, dtype='float32', inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None)

Toy dataset for testing purposes.

Parameters:
  • task (str) – Task of the dataset in [“classification”, “regression”, “ml-classification”, “mt-regression”].

  • size (int) – Size of the dataset.

  • num_targets (int) – Number of targets.

  • feature_shape (Union[int, List[int]]) – Shape of the features.

  • dev_split (float) – Proportion of the dataset to use for the development set.

  • test_split (float) – Proportion of the dataset to use for the test set.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • batch_size (int) – Batch size.

  • dtype (str) – Data type of the generated features. Defaults to 'float32'.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
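
As a rough illustration of how size, dev_split, and test_split relate, the sketch below derives split sizes from the proportions, assigning the remainder to training. This is an assumption about the arithmetic only; the actual rounding and sampling used by ToyDataset may differ, and the helper split_sizes is hypothetical.

```python
def split_sizes(size: int, dev_split: float, test_split: float):
    """Return (train, dev, test) sizes for the given proportions."""
    if not 0 <= dev_split + test_split < 1:
        raise ValueError("dev_split + test_split must be in [0, 1)")
    dev = int(size * dev_split)
    test = int(size * test_split)
    # Whatever is not held out for dev or test is used for training.
    return size - dev - test, dev, test


# Values used by the default toy configurations.
print(split_sizes(1000, 0.2, 0.2))  # (600, 200, 200)
```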

Default Configurations

ToyAudio

conf/dataset/ToyAudio-C.yaml
id: ToyAudio-C
_target_: autrainer.datasets.ToyDataset

task: classification
size: 1000
num_targets: 10
feature_shape: [1, 48000]
dev_split: 0.2
test_split: 0.2

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw

conf/dataset/ToyAudio-MLC.yaml
id: ToyAudio-MLC
_target_: autrainer.datasets.ToyDataset

task: ml-classification
size: 1000
num_targets: 10
feature_shape: [1, 48000]
dev_split: 0.2
test_split: 0.2

criterion: torch.nn.BCEWithLogitsLoss
metrics:
  - autrainer.metrics.MLAccuracy
  - autrainer.metrics.MLF1Micro
  - autrainer.metrics.MLF1Macro
  - autrainer.metrics.MLF1Weighted
tracking_metric: autrainer.metrics.MLF1Weighted

transform:
  type: raw

conf/dataset/ToyAudio-MTR.yaml
id: ToyAudio-MTR
_target_: autrainer.datasets.ToyDataset

task: mt-regression
size: 1000
num_targets: 10
feature_shape: [1, 48000]
dev_split: 0.2
test_split: 0.2

criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC

transform:
  type: raw

conf/dataset/ToyAudio-R.yaml
id: ToyAudio-R
_target_: autrainer.datasets.ToyDataset

task: regression
size: 1000
num_targets: 1
feature_shape: [1, 48000]
dev_split: 0.2
test_split: 0.2

criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC

transform:
  type: raw

ToyImage

conf/dataset/ToyImage-C.yaml
id: ToyImage-C
_target_: autrainer.datasets.ToyDataset

task: classification
size: 1000
num_targets: 10
feature_shape: [3, 64, 64]
dev_split: 0.2
test_split: 0.2
dtype: uint8

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: image
  base:
    - autrainer.transforms.ScaleRange
    - autrainer.transforms.Normalize:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]

conf/dataset/ToyImage-MLC.yaml
id: ToyImage-MLC
_target_: autrainer.datasets.ToyDataset

task: ml-classification
size: 1000
num_targets: 10
feature_shape: [3, 64, 64]
dev_split: 0.2
test_split: 0.2
dtype: uint8

criterion: torch.nn.BCEWithLogitsLoss
metrics:
  - autrainer.metrics.MLAccuracy
  - autrainer.metrics.MLF1Micro
  - autrainer.metrics.MLF1Macro
  - autrainer.metrics.MLF1Weighted
tracking_metric: autrainer.metrics.MLF1Weighted

transform:
  type: image
  base:
    - autrainer.transforms.ScaleRange
    - autrainer.transforms.Normalize:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]

conf/dataset/ToyImage-MTR.yaml
id: ToyImage-MTR
_target_: autrainer.datasets.ToyDataset

task: mt-regression
size: 1000
num_targets: 10
feature_shape: [3, 64, 64]
dev_split: 0.2
test_split: 0.2
dtype: uint8

criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC

transform:
  type: image
  base:
    - autrainer.transforms.ScaleRange
    - autrainer.transforms.Normalize:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]

conf/dataset/ToyImage-R.yaml
id: ToyImage-R
_target_: autrainer.datasets.ToyDataset

task: regression
size: 1000
num_targets: 1
feature_shape: [3, 64, 64]
dev_split: 0.2
test_split: 0.2
dtype: uint8

criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC

transform:
  type: image
  base:
    - autrainer.transforms.ScaleRange
    - autrainer.transforms.Normalize:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]

ToyTabular

conf/dataset/ToyTabular-C.yaml
id: ToyTabular-C
_target_: autrainer.datasets.ToyDataset

task: classification
size: 1000
num_targets: 10
feature_shape: 64
dev_split: 0.2
test_split: 0.2

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: tabular

conf/dataset/ToyTabular-MLC.yaml
id: ToyTabular-MLC
_target_: autrainer.datasets.ToyDataset

task: ml-classification
size: 1000
num_targets: 10
feature_shape: 64
dev_split: 0.2
test_split: 0.2

criterion: torch.nn.BCEWithLogitsLoss
metrics:
  - autrainer.metrics.MLAccuracy
  - autrainer.metrics.MLF1Micro
  - autrainer.metrics.MLF1Macro
  - autrainer.metrics.MLF1Weighted
tracking_metric: autrainer.metrics.MLF1Weighted

transform:
  type: tabular

conf/dataset/ToyTabular-MTR.yaml
id: ToyTabular-MTR
_target_: autrainer.datasets.ToyDataset

task: mt-regression
size: 1000
num_targets: 10
feature_shape: 64
dev_split: 0.2
test_split: 0.2

criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC

transform:
  type: tabular

conf/dataset/ToyTabular-R.yaml
id: ToyTabular-R
_target_: autrainer.datasets.ToyDataset

task: regression
size: 1000
num_targets: 10
feature_shape: 64
dev_split: 0.2
test_split: 0.2

criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC

transform:
  type: tabular

Audio Datasets

We provide a number of different audio-specific datasets.

class autrainer.datasets.AIBO(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, standardize=False, aibo_task='2cl')

FAU AIBO dataset.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (str) – Target column of the dataframe.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

  • standardize (bool) – Whether to standardize the data. Defaults to False.

  • aibo_task (str) – Task to load in [“2cl”, “5cl”]. Defaults to “2cl”.

Default Configurations
conf/dataset/AIBO-eGeMAPS-llds.yaml
# Important: should be used with inference_batch_size: 1
id: AIBO-eGeMAPS-llds
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: eGeMAPSv02-llds
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: tabular
  base:
    - autrainer.transforms.Expand:
        size: 200
  train:
    - autrainer.transforms.RandomCrop:
        size: 200

conf/dataset/AIBO-IS16-llds.yaml
# Important: should be used with inference_batch_size: 1
id: AIBO-IS16-llds
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: ComParE_2016-llds
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: tabular
  base:
    - autrainer.transforms.Expand:
        size: 200
  train:
    - autrainer.transforms.RandomCrop:
        size: 200

conf/dataset/AIBO-mel-16k.yaml
# Important: should be used with inference_batch_size: 1
id: AIBO-mel-16k
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: log_mel_16k
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 301
        axis: -2
  train:
    - autrainer.transforms.RandomCrop:
        size: 301
        axis: -2

conf/dataset/AIBO-mel-32k.yaml
# Important: should be used with inference_batch_size: 1
id: AIBO-mel-32k
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: log_mel_32k
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 301
        axis: -2
  train:
    - autrainer.transforms.RandomCrop:
        size: 301
        axis: -2

conf/dataset/AIBO-wav-pad.yaml
# Important: should be used with inference_batch_size: 1
id: AIBO-wav-pad
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: default
index_column: file
target_column: class
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw
  base:
    - autrainer.transforms.Expand:
        size: 48000
        axis: -1
  train:
    - autrainer.transforms.RandomCrop:
        size: 48000
        axis: -1

conf/dataset/AIBO-wav.yaml
# Important: should be used with inference_batch_size: 1
id: AIBO-wav
_target_: autrainer.datasets.AIBO

aibo_task: 2cl

path: data/AIBO
features_subdir: default
index_column: file
target_column: class
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw

load_dataframes()

Load the dataframes.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]

Returns:

Dataframes for training, development, and testing.

static download(path)

Download the FAU AIBO dataset.

As the AIBO dataset is private, this method does not download the dataset but rather prepares the file structure expected by the preprocessing routines.

In the specified path, the following directories and files are expected:

  • default/: Directory containing .wav files.

  • chunk_labels_2cl_corpus.txt: File containing the file names and corresponding labels for the 2-class classification task.

  • chunk_labels_5cl_corpus.txt: File containing the file names and corresponding labels for the 5-class classification task.

Produces the following splits for both tasks (2cl and 5cl):

  • train_{task}.csv: Training split of all speakers of the Ohm-Gymnasium with the exception of the last two speakers.

  • dev_{task}.csv: Development split of the last two speakers of the Ohm-Gymnasium.

  • test_{task}.csv: Test split of all speakers of the Montessori-Schule.

For more information on the dataset and dataset split, see: https://doi.org/10.1109/ICME51207.2021.9428217
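
The speaker-independent split described above can be sketched as follows. The speaker IDs are made up for illustration, the helper split_speakers is hypothetical, and what counts as the "last" two speakers (here: list order) is an assumption; the actual split is defined by the dataset and the referenced paper.

```python
def split_speakers(ohm_speakers, n_dev=2):
    """Hold out the last n_dev speakers for development, keep the rest for training."""
    if n_dev < 1 or len(ohm_speakers) <= n_dev:
        raise ValueError("need more speakers than held-out development speakers")
    return ohm_speakers[:-n_dev], ohm_speakers[-n_dev:]


# Hypothetical speaker IDs for the Ohm-Gymnasium recordings.
train_speakers, dev_speakers = split_speakers(
    ["Ohm_01", "Ohm_02", "Ohm_03", "Ohm_04", "Ohm_05"]
)
print(train_speakers)  # ['Ohm_01', 'Ohm_02', 'Ohm_03']
print(dev_speakers)  # ['Ohm_04', 'Ohm_05']
```

Splitting by speaker rather than by file keeps every speaker in exactly one partition, so development and test scores reflect generalization to unseen speakers.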

Parameters:

path (str) – Path to the directory to download the dataset to.

Return type:

None

class autrainer.datasets.DCASE2016Task1(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, fold=1)

TUT Acoustic scenes 2016 Task 1 (DCASE2016Task1) dataset.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (str) – Target column of the dataframe.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

  • fold (int) – Fold to use in [1, 2, 3, 4]. Defaults to 1.

Default Configurations
conf/dataset/DCASE2016Task1-16k.yaml
id: DCASE2016Task1-16k
_target_: autrainer.datasets.DCASE2016Task1

fold: 1

path: data/DCASE2016
features_subdir: log_mel_16k
index_column: filename
target_column: scene_label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale

conf/dataset/DCASE2016Task1-32k.yaml
id: DCASE2016Task1-32k
_target_: autrainer.datasets.DCASE2016Task1

fold: 1

path: data/DCASE2016
features_subdir: log_mel_32k
index_column: filename
target_column: scene_label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale

conf/dataset/DCASE2016Task1-wav-stm.yaml
id: DCASE2016Task1-wav-stm
_target_: autrainer.datasets.DCASE2016Task1

fold: 1

path: data/DCASE2016
features_subdir: default
index_column: filename
target_column: scene_label
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw
  base:
    - autrainer.transforms.StereoToMono

conf/dataset/DCASE2016Task1-wav.yaml
id: DCASE2016Task1-wav
_target_: autrainer.datasets.DCASE2016Task1

fold: 1

path: data/DCASE2016
features_subdir: default
index_column: filename
target_column: scene_label
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw

load_dataframes()

Load the dataframes.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]

Returns:

Dataframes for training, development, and testing.

static download(path)

Download the TUT Acoustic Scenes 2016 Task 1 (DCASE2016Task1) dataset.

For more information on the dataset and dataset split, see: https://dcase.community/challenge2016/task-acoustic-scene-classification

Parameters:

path (str) – Path to the directory to download the dataset to.

Return type:

None

class autrainer.datasets.DCASE2018Task3(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, dev_split=0.0, dev_split_seed=None)

DCASE 2018 Task 3 dataset.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (str) – Target column of the dataframe.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

  • dev_split (float) – Fraction of the training set to use as the development set. Defaults to 0.0.

  • dev_split_seed (Optional[int]) – Seed for the development split. If None, seed is used. Defaults to None.
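
The interplay of dev_split, dev_split_seed, seed, and stratify can be illustrated with a small standard-library sketch. It is not autrainer's implementation: the helper name split_dev and the row dictionaries are hypothetical, and the rounding of the per-group development size is an assumption.

```python
import random
from collections import defaultdict


def split_dev(rows, dev_split, seed, dev_split_seed=None, stratify=None):
    """Move a fraction of the training rows into a development set."""
    if not 0.0 <= dev_split < 1.0:
        raise ValueError("dev_split must be in [0, 1)")
    if dev_split == 0.0:
        # Nothing to split off: keep the training set untouched.
        return list(rows), []
    # Fall back to the global seed when no dedicated split seed is given.
    rng = random.Random(seed if dev_split_seed is None else dev_split_seed)

    # Group rows by the stratification columns (a single group if none are given).
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in stratify) if stratify else ()
        groups[key].append(row)

    train, dev = [], []
    for key in sorted(groups):
        members = groups[key][:]
        rng.shuffle(members)
        # Take the same fraction from every group (rounding is an assumption).
        n_dev = round(len(members) * dev_split)
        dev.extend(members[:n_dev])
        train.extend(members[n_dev:])
    return train, dev


# Hypothetical rows using the column names of the default configuration.
rows = [{"filename": f"{i}.wav", "hasbird": i % 2} for i in range(100)]
train, dev = split_dev(rows, dev_split=0.1, seed=1, dev_split_seed=0, stratify=["hasbird"])
```

Because the random generator is seeded, repeating the call with the same arguments yields the same development set, which is the purpose of dev_split_seed.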

Default Configurations
conf/dataset/DCASE2018Task3-wav-16k.yaml
id: DCASE2018Task3-wav-16k
_target_: autrainer.datasets.DCASE2018Task3

dev_split: 0.1
dev_split_seed: 0

path: data/DCASE2018Task3
features_subdir: default
index_column: filename
target_column: hasbird
file_type: wav
file_handler:
  autrainer.datasets.utils.AudioFileHandler:
    target_sample_rate: 16000

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw
conf/dataset/DCASE2018Task3-wav.yaml
id: DCASE2018Task3-wav
_target_: autrainer.datasets.DCASE2018Task3

dev_split: 0.1
dev_split_seed: 0

path: data/DCASE2018Task3
features_subdir: default
index_column: filename
target_column: hasbird
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw
load_dataframes()

Load the dataframes.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]

Returns:

Dataframes for training, development, and testing.

static download(path)

Download the DCASE 2018 Task 3 dataset.

For the train dataset, the following subsets are used:

  • Field recordings, worldwide (“freefield1010”)

  • Remote monitoring flight calls, USA (“BirdVox-DCASE-20k”)

For the test dataset, the following subset is used:

  • Crowdsourced dataset, UK (“warblrb10k”)

Both the training and test datasets are taken from the development set, as no labels are provided for the evaluation set. For more information on the dataset, see: https://dcase.community/challenge2018/task-bird-audio-detection

Parameters:

path (str) – Path to the directory to download the dataset to.

Return type:

None

class autrainer.datasets.DCASE2020Task1A(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, dev_split=0.0, dev_split_seed=None, scene_category=None, exclude_cities=None)

TAU Urban Acoustic Scenes 2020 Mobile Task 1 Subtask A (DCASE2020Task1A) dataset.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (str) – Target column of the dataframe.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

  • dev_split (float) – Fraction of the training set to use as the development set. Defaults to 0.0.

  • dev_split_seed (Optional[int]) – Seed for the development split. If None, seed is used. Defaults to None.

  • scene_category (Optional[str]) – Scene category in [“indoor”, “outdoor”, “transportation”]. Defaults to None.

  • exclude_cities (Optional[List[str]]) – List of cities to exclude from the dataset. Defaults to None.
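
As a rough illustration of what scene_category and exclude_cities select, the following sketch filters a list of rows. The helper filter_rows and the example rows are hypothetical; the grouping of scene labels follows the three categories defined for DCASE 2020 Task 1 Subtask B, and it is an assumption that autrainer applies the same grouping.

```python
# Grouping of the ten scene labels into three categories, as defined for
# DCASE 2020 Task 1 Subtask B; that autrainer uses the same mapping is assumed.
SCENE_CATEGORIES = {
    "indoor": {"airport", "shopping_mall", "metro_station"},
    "outdoor": {"park", "public_square", "street_pedestrian", "street_traffic"},
    "transportation": {"bus", "metro", "tram"},
}


def filter_rows(rows, scene_category=None, exclude_cities=None):
    """Keep rows of the given scene category that were not recorded in excluded cities."""
    if scene_category is not None and scene_category not in SCENE_CATEGORIES:
        raise ValueError(f"Unknown scene category: {scene_category!r}")
    allowed = SCENE_CATEGORIES.get(scene_category)  # None means "keep all scenes"
    excluded = set(exclude_cities or [])
    kept = []
    for row in rows:
        if allowed is not None and row["scene_label"] not in allowed:
            continue
        if row["city"] in excluded:
            continue
        kept.append(row)
    return kept


# Hypothetical rows using the scene_label and city columns.
rows = [
    {"filename": "a.wav", "scene_label": "airport", "city": "lisbon"},
    {"filename": "b.wav", "scene_label": "park", "city": "helsinki"},
    {"filename": "c.wav", "scene_label": "tram", "city": "lisbon"},
    {"filename": "d.wav", "scene_label": "metro_station", "city": "vienna"},
]
indoor_without_vienna = filter_rows(rows, scene_category="indoor", exclude_cities=["vienna"])
```

With both arguments left at None, the sketch returns every row unchanged, mirroring the documented defaults.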

Default Configurations
conf/dataset/DCASE2020Task1A-16k.yaml
id: DCASE2020Task1A-16k
_target_: autrainer.datasets.DCASE2020Task1A

dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device

path: data/DCASE2020Task1A
features_subdir: log_mel_16k
index_column: filename
target_column: scene_label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale
conf/dataset/DCASE2020Task1A-32k.yaml
id: DCASE2020Task1A-32k
_target_: autrainer.datasets.DCASE2020Task1A

dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device

path: data/DCASE2020Task1A
features_subdir: log_mel_32k
index_column: filename
target_column: scene_label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale
conf/dataset/DCASE2020Task1A-wav-16k.yaml
id: DCASE2020Task1A-wav-16k
_target_: autrainer.datasets.DCASE2020Task1A

dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device

path: data/DCASE2020Task1A
features_subdir: default
index_column: filename
target_column: scene_label
file_type: wav
file_handler:
  autrainer.datasets.utils.AudioFileHandler:
    target_sample_rate: 16000

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw
conf/dataset/DCASE2020Task1A-wav.yaml
id: DCASE2020Task1A-wav
_target_: autrainer.datasets.DCASE2020Task1A

dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device

path: data/DCASE2020Task1A
features_subdir: default
index_column: filename
target_column: scene_label
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw
load_dataframes()

Load the dataframes.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]

Returns:

Dataframes for training, development, and testing.

static download(path)

Download the TAU Urban Acoustic Scenes 2020 Mobile Task 1 Subtask A (DCASE2020Task1A) dataset.

As no labels are provided for the evaluation set, the training and test split provided with the development set is used instead. The download therefore does not include the evaluation set.

For more information on the dataset, see: https://dcase.community/challenge2020/task-acoustic-scene-classification

Parameters:

path (str) – Path to the directory to download the dataset to.

Return type:

None

class autrainer.datasets.EDANSA2019(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, threshold=0.5)

EDANSA 2019 dataset.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (List[str]) – Target columns of the dataframe, one per label.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

  • threshold (float) – Threshold for classification. Defaults to 0.5.
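
For a multi-label task such as this one, a threshold typically turns per-label probabilities into binary decisions. The sketch below shows that idea with the standard library only. The helper predict_labels is hypothetical, the label names are the four target columns of the default configuration, and the use of a sigmoid with a strict "greater than" comparison is an assumption rather than autrainer's documented behavior.

```python
import math

# Target columns of the default EDANSA2019 configuration.
LABELS = ["Anth", "Bio", "Geo", "Sil"]


def sigmoid(x):
    """Map a raw model output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Return the labels whose probability exceeds the threshold."""
    probs = [sigmoid(z) for z in logits]
    return [label for label, p in zip(LABELS, probs) if p > threshold]


logits = [2.0, -1.0, 0.3, -3.0]          # hypothetical model outputs
default_prediction = predict_labels(logits)      # threshold of 0.5
strict_prediction = predict_labels(logits, 0.6)  # stricter threshold
```

Raising the threshold trades recall for precision: here the third label has a probability of roughly 0.57, so it is predicted at 0.5 but dropped at 0.6.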

Default Configurations
conf/dataset/EDANSA2019-wav.yaml
id: EDANSA2019-wav
_target_: autrainer.datasets.EDANSA2019

threshold: 0.5

path: data/EDANSA-2019
features_subdir: default
index_column: Clip Path
target_column:
  - Anth
  - Bio
  - Geo
  - Sil
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: torch.nn.BCEWithLogitsLoss
metrics:
  - autrainer.metrics.MLAccuracy
  - autrainer.metrics.MLF1Micro
  - autrainer.metrics.MLF1Macro
  - autrainer.metrics.MLF1Weighted
tracking_metric: autrainer.metrics.MLF1Weighted

transform:
  type: raw
static download(path)

Download the EDANSA 2019 dataset.

For more information on the dataset, see: https://zenodo.org/doi/10.5281/zenodo.6824271

Parameters:

path (str) – Path to the directory to download the dataset to.

Return type:

None

class autrainer.datasets.EmoDB(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, train_speakers=None, dev_speakers=None, test_speakers=None)

EmoDB dataset for the task of Speech Emotion Recognition.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (str) – Target column of the dataframe.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

  • train_speakers (Optional[List[int]]) – List of speaker IDs to use for training. If None, speakers 3, 8, 9, 10, 11, and 12 are used. Defaults to None.

  • dev_speakers (Optional[List[int]]) – List of speaker IDs to use for validation. If None, speakers 13 and 14 are used. Defaults to None.

  • test_speakers (Optional[List[int]]) – List of speaker IDs to use for testing. If None, speakers 15 and 16 are used. Defaults to None.
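
A speaker-independent split of this kind can be sketched as follows. This is an illustration under stated assumptions, not autrainer's code: the helper split_by_speaker, the "speaker" key, and the example rows are hypothetical, while the default speaker IDs are the ones listed above.

```python
# Default speaker IDs per split, as listed in the parameter descriptions.
DEFAULT_SPEAKERS = {
    "train": [3, 8, 9, 10, 11, 12],
    "dev": [13, 14],
    "test": [15, 16],
}


def split_by_speaker(rows, train_speakers=None, dev_speakers=None, test_speakers=None):
    """Partition rows into train/dev/test lists according to their speaker ID."""
    given = {"train": train_speakers, "dev": dev_speakers, "test": test_speakers}
    # Fall back to the documented defaults wherever None is passed.
    speakers = {
        name: DEFAULT_SPEAKERS[name] if ids is None else ids
        for name, ids in given.items()
    }
    # A speaker must not appear in more than one split, otherwise the
    # evaluation would no longer be speaker-independent.
    all_ids = [s for ids in speakers.values() for s in ids]
    if len(all_ids) != len(set(all_ids)):
        raise ValueError("Speaker IDs overlap between splits")
    return {
        name: [row for row in rows if row["speaker"] in ids]
        for name, ids in speakers.items()
    }


# Hypothetical rows; the "speaker" key is an assumption for this sketch.
rows = [
    {"filename": f"clip_{i}.wav", "speaker": s}
    for i, s in enumerate([3, 8, 13, 15, 16, 12])
]
splits = split_by_speaker(rows)
```

Keeping each speaker in exactly one split prevents the model from being evaluated on voices it has already heard during training.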

Default Configurations
conf/dataset/EmoDB-32k.yaml
id: EmoDB-32k
_target_: autrainer.datasets.EmoDB

train_speakers: [3, 8, 9, 10, 11, 12]
dev_speakers: [13, 14]
test_speakers: [15, 16]

path: data/EmoDB
features_subdir: log_mel_32k
index_column: filename
target_column: emotion
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 901
        axis: -2
conf/dataset/EmoDB-wav.yaml
id: EmoDB-wav
_target_: autrainer.datasets.EmoDB

train_speakers: [3, 8, 9, 10, 11, 12]
dev_speakers: [13, 14]
test_speakers: [15, 16]

path: data/EmoDB
features_subdir: default
index_column: filename
target_column: emotion
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw
  base:
    - autrainer.transforms.Expand:
        size: 144000 # 16 kHz * 9 s
        axis: -1
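
The Expand sizes in the two EmoDB configurations are consistent with a fixed clip length of 9 seconds. The following arithmetic sketch shows one way to arrive at them. The helper names are hypothetical, and the hop lengths (160 samples at 16 kHz, 320 samples at 32 kHz, i.e. 10 ms) together with centered framing are assumptions about the log-Mel preprocessing, not values stated in this section.

```python
def num_samples(sample_rate, seconds):
    """Number of audio samples in a clip of the given duration."""
    return int(sample_rate * seconds)


def num_frames(samples, hop_length):
    """Frame count of a centered short-time transform: one frame per hop, plus one."""
    return samples // hop_length + 1


# Raw audio: 9 seconds at 16 kHz, matching the size in EmoDB-wav.yaml.
wav_size = num_samples(16_000, 9)

# Log-Mel features: 9 seconds at 32 kHz with an assumed hop of 320 samples,
# matching the size in EmoDB-32k.yaml.
frames_32k = num_frames(num_samples(32_000, 9), hop_length=320)

# The same duration at 16 kHz with an assumed hop of 160 samples.
frames_16k = num_frames(num_samples(16_000, 9), hop_length=160)
```

Under these assumptions, wav_size evaluates to 144000 and both frame counts evaluate to 901, so the raw and feature-based configurations describe clips of the same duration.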
load_dataframes()

Load the dataframes.

Return type:

Tuple[DataFrame, DataFrame, DataFrame]

Returns:

Dataframes for training, development, and testing.

static download(path)

Download the EmoDB dataset.

For more information on the dataset, see: http://emodb.bilderbar.info/docu/

Parameters:

path (str) – Path to the directory to download the dataset to.

Return type:

None

class autrainer.datasets.SpeechCommands(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)

Speech Commands (v0.02) dataset.

Parameters:
  • path (str) – Root path to the dataset.

  • features_subdir (str) – Subdirectory containing the features.

  • seed (int) – Seed for reproducibility.

  • metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.

  • tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.

  • index_column (str) – Index column of the dataframe.

  • target_column (str) – Target column of the dataframe.

  • file_type (str) – File type of the features.

  • file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.

  • batch_size (int) – Batch size.

  • inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.

  • train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.

  • dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.

  • test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.

  • stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.

Default Configurations
conf/dataset/SpeechCommands-16k.yaml
id: SpeechCommands-16k
_target_: autrainer.datasets.SpeechCommands

path: data/SpeechCommands
features_subdir: log_mel_16k
index_column: path
target_column: label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 101
        axis: -2
conf/dataset/SpeechCommands-32k.yaml
id: SpeechCommands-32k
_target_: autrainer.datasets.SpeechCommands

path: data/SpeechCommands
features_subdir: log_mel_32k
index_column: path
target_column: label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 101
        axis: -2
conf/dataset/SpeechCommands-wav.yaml
id: SpeechCommands-wav
_target_: autrainer.datasets.SpeechCommands

path: data/SpeechCommands
features_subdir: default
index_column: path
target_column: label
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: raw
  base:
    - autrainer.transforms.Expand:
        size: 16000
        axis: -1
static download(path)

Download the Speech Commands (v0.02) dataset from torchaudio.

For more information on the dataset, see: https://doi.org/10.48550/arXiv.1804.03209

Parameters:

path (str) – Path to the directory to download the dataset to.

Return type:

None