Datasets
autrainer provides a number of different audio-specific datasets, base datasets for different tasks, and toy datasets for testing purposes. To ensure consistency across different data formats and manage multiple data types, all datasets should follow a standardized structure.
Tip
To create custom datasets, refer to the custom datasets tutorial.
In addition to common attributes such as id and _target_, the dataset configuration file should include the following attributes:
Structure and Loading
- path: Directory path containing the features_subdir directory and the corresponding CSV files (such as train.csv, dev.csv, and test.csv).
- features_subdir: The subdirectory within the dataset path where (extracted) features are stored.
  - If no preprocessing is used (e.g., for raw audio), it should be default.
  - For preprocessing transforms (e.g., log-Mel spectrograms with log_mel_16k), it should match the transform's name. The processed features are saved in this subdirectory after preprocessing.
- index_column: Column in the CSV files containing the file paths, relative to the features_subdir directory.
- target_column: Column in the CSV files containing the corresponding targets or labels for each file.
- file_type: Specifies the type of files to be loaded (e.g., wav or npy).
- file_handler: The file handler used for loading the files.
This results in a directory structure like the following:
{path}/{features_subdir}/optional/subdirs/some.file
For instance, an entry in the index_column might be optional/subdirs/some.file, where some.file is an audio or feature file.
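As a minimal sketch of how these attributes combine into a file location (the helper resolve_file is hypothetical and not part of autrainer; it only illustrates the path layout described above):

```python
from pathlib import PurePosixPath


def resolve_file(path: str, features_subdir: str, index_value: str) -> PurePosixPath:
    """Join the dataset root, the features subdirectory, and an index_column entry."""
    return PurePosixPath(path) / features_subdir / index_value


# Hypothetical values: a dataset rooted at data/AIBO with log-Mel features.
print(resolve_file("data/AIBO", "log_mel_16k", "optional/subdirs/some.npy"))
```

PurePosixPath is used here only to keep the separator independent of the operating system.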
To load custom dataset splits that do not follow the standard train.csv, dev.csv, and test.csv convention, the load_dataframes() method can be overridden (see the custom datasets tutorial).
Training and Evaluation
- criterion: The criterion to use for training.
- metrics: A list of metrics to evaluate the model.
- tracking_metric: The metric to track for early stopping and model selection.
- transform: The online transforms to apply to the data and the output type of the dataset.
Note
The following attributes are automatically passed to the dataset during initialization and determined at runtime:
- train_transform, dev_transform, and test_transform: The SmartCompose transformation pipelines (which may include online transforms or augmentations).
- seed: The random seed for reproducibility during training.
- batch_size, inference_batch_size: The batch sizes for training and inference (dev, test).
The transform attribute in the configuration is not passed to the dataset during initialization. Instead, it specifies the type of data the dataset provides as well as any online transforms to be applied to the data at runtime.
To avoid race conditions when using Launcher Plugins that may run multiple training jobs in parallel, the autrainer fetch and autrainer preprocess CLI commands (or the fetch() and preprocess() CLI wrapper functions) are used to download the dataset and preprocess the data before training.
Note
All datasets provided by autrainer can be automatically downloaded and optionally preprocessed using the autrainer fetch and autrainer preprocess CLI commands or the fetch() and preprocess() CLI wrapper functions.
Abstract Dataset
All datasets inherit from the AbstractDataset class.
- class autrainer.datasets.AbstractDataset(path, features_subdir, seed, task, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)
Abstract dataset class.
- Parameters:
path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features.
seed (int) – Seed for reproducibility.
task (str) – Task of the dataset in TASKS.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (Union[str, List[str]]) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
batch_size (int) – Batch size.
inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
- abstract property target_transform: AbstractTargetTransform
Get the transform to apply to the target.
- Returns:
Target transform.
- property output_dim: int
Get the output dimension of the dataset.
- Returns:
Output dimension.
- load_dataframes()
Load the dataframes.
- Return type:
Tuple[DataFrame, DataFrame, DataFrame]
- Returns:
Dataframes for training, development, and testing.
- property train_dataset: DatasetWrapper
Get the training dataset.
- Returns:
Training dataset.
- property dev_dataset: DatasetWrapper
Get the development dataset.
- Returns:
Development dataset.
- property test_dataset: DatasetWrapper
Get the test dataset.
- Returns:
Test dataset.
- property train_loader: DataLoader
Get the training loader.
- Returns:
Training loader.
- property dev_loader: DataLoader
Get the development loader.
- Returns:
Development loader.
- property test_loader: DataLoader
Get the test loader.
- Returns:
Test loader.
- get_evaluation_data()
Get the evaluation data.
- Return type:
Tuple[DataFrame, DataFrame, List[str], AbstractTargetTransform]
- Returns:
Dataframes for development and testing, columns to stratify on, and the target transform.
Base Datasets
Base datasets that can be used for training without the need for creating custom datasets.
- class autrainer.datasets.BaseClassificationDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)
Base classification dataset.
- Parameters:
path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
batch_size (int) – Batch size.
inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
- property target_transform: LabelEncoder
Get the transform to apply to the target.
- Returns:
Target transform.
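The target transform of BaseClassificationDataset is a LabelEncoder. As a rough illustration of what label encoding does in general (this is not autrainer's implementation; the helper names and the sorted class order are assumptions made for this sketch):

```python
from typing import Dict, List


def fit_label_mapping(labels: List[str]) -> Dict[str, int]:
    """Assign each distinct label an integer index (sorted order is an assumption)."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def decode(mapping: Dict[str, int], index: int) -> str:
    """Map an integer index back to its label."""
    inverse = {idx: label for label, idx in mapping.items()}
    return inverse[index]


mapping = fit_label_mapping(["dog", "cat", "dog", "bird"])
encoded = [mapping[label] for label in ["cat", "dog"]]
print(mapping, encoded, decode(mapping, 0))
```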
- class autrainer.datasets.BaseMLClassificationDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, threshold=0.5)
Base multi-label classification dataset.
- Parameters:
path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (List[str]) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
batch_size (int) – Batch size.
inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
threshold (float) – Threshold for classification. Defaults to 0.5.
- property target_transform: MultiLabelEncoder
Get the transform to apply to the target.
- Returns:
Target transform.
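To illustrate the role of the threshold parameter in multi-label classification, the following sketch turns per-label probabilities into binary decisions. It is a generic illustration, not the MultiLabelEncoder code; the helper name binarize and the strict ">" comparison are assumptions.

```python
from typing import List


def binarize(probabilities: List[float], threshold: float = 0.5) -> List[int]:
    """Mark a label as present (1) if its probability exceeds the threshold."""
    # Assumption: strict comparison; a library may treat equality differently.
    return [1 if p > threshold else 0 for p in probabilities]


print(binarize([0.1, 0.7, 0.5]))       # default threshold of 0.5
print(binarize([0.1, 0.7, 0.5], 0.05)) # a lower threshold activates more labels
```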
- class autrainer.datasets.BaseRegressionDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)
Base regression dataset.
- Parameters:
path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
batch_size (int) – Batch size.
inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
- property target_transform: MinMaxScaler
Get the transform to apply to the target.
- Returns:
Target transform.
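The regression dataset uses a MinMaxScaler as its target transform. The general idea of min-max scaling can be sketched as follows; this is the textbook technique under assumed helper names, not a copy of autrainer's class, and fitting on the training targets is an assumption.

```python
from typing import List, Tuple


def fit_min_max(targets: List[float]) -> Tuple[float, float]:
    """Return the minimum and maximum of the (training) targets."""
    lo, hi = min(targets), max(targets)
    if lo == hi:
        raise ValueError("Cannot scale constant targets.")
    return lo, hi


def scale(value: float, lo: float, hi: float) -> float:
    """Map a target from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


def unscale(scaled: float, lo: float, hi: float) -> float:
    """Map a scaled prediction back to the original target range."""
    return scaled * (hi - lo) + lo


lo, hi = fit_min_max([2.0, 4.0, 6.0])
print(scale(4.0, lo, hi), unscale(0.5, lo, hi))
```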
- class autrainer.datasets.BaseMTRegressionDataset(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)
Base multi-target regression dataset.
- Parameters:
path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (List[str]) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
batch_size (int) – Batch size.
inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
- property target_transform: MultiTargetMinMaxScaler
Get the transform to apply to the target.
- Returns:
Target transform.
Toy Datasets
A toy dataset for testing purposes.
Note
To easily test implementations, multiple toy dataset configurations across modalities and tasks are provided.
We offer ToyAudio-... for audio, ToyImage-... for image, and ToyTabular-... for tabular data.
For each modality, the task is indicated by a suffix: -R for regression, -C for classification, -MLC for multi-label classification, and -MTR for multi-target regression.
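The naming scheme can be enumerated with a few lines of Python. The modality prefixes and task suffixes are taken from the note above; the variable names are only illustrative.

```python
from itertools import product

MODALITIES = ["ToyAudio", "ToyImage", "ToyTabular"]
TASK_SUFFIXES = {
    "C": "classification",
    "MLC": "ml-classification",
    "MTR": "mt-regression",
    "R": "regression",
}

# Every modality is combined with every task suffix.
toy_ids = [f"{modality}-{suffix}" for modality, suffix in product(MODALITIES, TASK_SUFFIXES)]
print(len(toy_ids), toy_ids)
```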
- class autrainer.datasets.ToyDataset(task, size, num_targets, feature_shape, dev_split, test_split, seed, metrics, tracking_metric, batch_size, dtype='float32', inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None)
Toy dataset for testing purposes.
- Parameters:
task (str) – Task of the dataset in [“classification”, “regression”, “ml-classification”, “mt-regression”].
size (int) – Size of the dataset.
num_targets (int) – Number of targets.
feature_shape (Union[int, List[int]]) – Shape of the features.
dev_split (float) – Proportion of the dataset to use for the development set.
test_split (float) – Proportion of the dataset to use for the test set.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
batch_size (int) – Batch size.
inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
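As a back-of-the-envelope sketch of how size, dev_split, and test_split relate, the following computes the resulting split sizes. The helper split_sizes and its rounding behaviour are assumptions for illustration; ToyDataset may partition the samples differently.

```python
from typing import Tuple


def split_sizes(size: int, dev_split: float, test_split: float) -> Tuple[int, int, int]:
    """Return (train, dev, test) sample counts for the given proportions."""
    if not 0.0 <= dev_split + test_split < 1.0:
        raise ValueError("dev_split + test_split must be in [0, 1).")
    dev = round(size * dev_split)
    test = round(size * test_split)
    return size - dev - test, dev, test


# Values used by the default toy configurations: size 1000, 0.2 / 0.2 splits.
print(split_sizes(1000, 0.2, 0.2))
```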
Default Configurations
ToyAudio
id: ToyAudio-C
_target_: autrainer.datasets.ToyDataset
task: classification
size: 1000
num_targets: 10
feature_shape: [1, 48000]
dev_split: 0.2
test_split: 0.2
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: raw

id: ToyAudio-MLC
_target_: autrainer.datasets.ToyDataset
task: ml-classification
size: 1000
num_targets: 10
feature_shape: [1, 48000]
dev_split: 0.2
test_split: 0.2
criterion: torch.nn.BCEWithLogitsLoss
metrics:
  - autrainer.metrics.MLAccuracy
  - autrainer.metrics.MLF1Micro
  - autrainer.metrics.MLF1Macro
  - autrainer.metrics.MLF1Weighted
tracking_metric: autrainer.metrics.MLF1Weighted
transform:
  type: raw

id: ToyAudio-MTR
_target_: autrainer.datasets.ToyDataset
task: mt-regression
size: 1000
num_targets: 10
feature_shape: [1, 48000]
dev_split: 0.2
test_split: 0.2
criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC
transform:
  type: raw

id: ToyAudio-R
_target_: autrainer.datasets.ToyDataset
task: regression
size: 1000
num_targets: 1
feature_shape: [1, 48000]
dev_split: 0.2
test_split: 0.2
criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC
transform:
  type: raw
ToyImage
id: ToyImage-C
_target_: autrainer.datasets.ToyDataset
task: classification
size: 1000
num_targets: 10
feature_shape: [3, 64, 64]
dev_split: 0.2
test_split: 0.2
dtype: uint8
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: image
  base:
    - autrainer.transforms.ScaleRange
    - autrainer.transforms.Normalize:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]

id: ToyImage-MLC
_target_: autrainer.datasets.ToyDataset
task: ml-classification
size: 1000
num_targets: 10
feature_shape: [3, 64, 64]
dev_split: 0.2
test_split: 0.2
dtype: uint8
criterion: torch.nn.BCEWithLogitsLoss
metrics:
  - autrainer.metrics.MLAccuracy
  - autrainer.metrics.MLF1Micro
  - autrainer.metrics.MLF1Macro
  - autrainer.metrics.MLF1Weighted
tracking_metric: autrainer.metrics.MLF1Weighted
transform:
  type: image
  base:
    - autrainer.transforms.ScaleRange
    - autrainer.transforms.Normalize:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]

id: ToyImage-MTR
_target_: autrainer.datasets.ToyDataset
task: mt-regression
size: 1000
num_targets: 10
feature_shape: [3, 64, 64]
dev_split: 0.2
test_split: 0.2
dtype: uint8
criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC
transform:
  type: image
  base:
    - autrainer.transforms.ScaleRange
    - autrainer.transforms.Normalize:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]

id: ToyImage-R
_target_: autrainer.datasets.ToyDataset
task: regression
size: 1000
num_targets: 1
feature_shape: [3, 64, 64]
dev_split: 0.2
test_split: 0.2
dtype: uint8
criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC
transform:
  type: image
  base:
    - autrainer.transforms.ScaleRange
    - autrainer.transforms.Normalize:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
ToyTabular
id: ToyTabular-C
_target_: autrainer.datasets.ToyDataset
task: classification
size: 1000
num_targets: 10
feature_shape: 64
dev_split: 0.2
test_split: 0.2
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: tabular

id: ToyTabular-MLC
_target_: autrainer.datasets.ToyDataset
task: ml-classification
size: 1000
num_targets: 10
feature_shape: 64
dev_split: 0.2
test_split: 0.2
criterion: torch.nn.BCEWithLogitsLoss
metrics:
  - autrainer.metrics.MLAccuracy
  - autrainer.metrics.MLF1Micro
  - autrainer.metrics.MLF1Macro
  - autrainer.metrics.MLF1Weighted
tracking_metric: autrainer.metrics.MLF1Weighted
transform:
  type: tabular

id: ToyTabular-MTR
_target_: autrainer.datasets.ToyDataset
task: mt-regression
size: 1000
num_targets: 10
feature_shape: 64
dev_split: 0.2
test_split: 0.2
criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC
transform:
  type: tabular

id: ToyTabular-R
_target_: autrainer.datasets.ToyDataset
task: regression
size: 1000
num_targets: 10
feature_shape: 64
dev_split: 0.2
test_split: 0.2
criterion: autrainer.criterions.MSELoss
metrics:
  - autrainer.metrics.PCC
  - autrainer.metrics.CCC
  - autrainer.metrics.MSE
  - autrainer.metrics.MAE
tracking_metric: autrainer.metrics.PCC
transform:
  type: tabular
Audio Datasets
We provide a number of different audio-specific datasets.
- class autrainer.datasets.AIBO(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, standardize=False, aibo_task='2cl')
FAU AIBO dataset.
- Parameters:
path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
batch_size (int) – Batch size.
inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
standardize (bool) – Whether to standardize the data. Defaults to False.
aibo_task (str) – Task to load in [“2cl”, “5cl”]. Defaults to “2cl”.
Default Configurations
# Important: should be used with inference_batch_size: 1
id: AIBO-eGeMAPS-llds
_target_: autrainer.datasets.AIBO
aibo_task: 2cl
path: data/AIBO
features_subdir: eGeMAPSv02-llds
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: tabular
  base:
    - autrainer.transforms.Expand:
        size: 200
  train:
    - autrainer.transforms.RandomCrop:
        size: 200

# Important: should be used with inference_batch_size: 1
id: AIBO-IS16-llds
_target_: autrainer.datasets.AIBO
aibo_task: 2cl
path: data/AIBO
features_subdir: ComParE_2016-llds
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: tabular
  base:
    - autrainer.transforms.Expand:
        size: 200
  train:
    - autrainer.transforms.RandomCrop:
        size: 200

# Important: should be used with inference_batch_size: 1
id: AIBO-mel-16k
_target_: autrainer.datasets.AIBO
aibo_task: 2cl
path: data/AIBO
features_subdir: log_mel_16k
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 301
        axis: -2
  train:
    - autrainer.transforms.RandomCrop:
        size: 301
        axis: -2

# Important: should be used with inference_batch_size: 1
id: AIBO-mel-32k
_target_: autrainer.datasets.AIBO
aibo_task: 2cl
path: data/AIBO
features_subdir: log_mel_32k
index_column: file
target_column: class
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 301
        axis: -2
  train:
    - autrainer.transforms.RandomCrop:
        size: 301
        axis: -2

# Important: should be used with inference_batch_size: 1
id: AIBO-wav-pad
_target_: autrainer.datasets.AIBO
aibo_task: 2cl
path: data/AIBO
features_subdir: default
index_column: file
target_column: class
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: raw
  base:
    - autrainer.transforms.Expand:
        size: 48000
        axis: -1
  train:
    - autrainer.transforms.RandomCrop:
        size: 48000
        axis: -1

# Important: should be used with inference_batch_size: 1
id: AIBO-wav
_target_: autrainer.datasets.AIBO
aibo_task: 2cl
path: data/AIBO
features_subdir: default
index_column: file
target_column: class
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: raw
- load_dataframes()
Load the dataframes.
- Return type:
Tuple[DataFrame, DataFrame, DataFrame]
- Returns:
Dataframes for training, development, and testing.
- static download(path)
Download the FAU AIBO dataset.
As the AIBO dataset is private, this method does not download the dataset but rather prepares the file structure expected by the preprocessing routines.
In the specified path, the following directories and files are expected:
default/: Directory containing .wav files.
chunk_labels_2cl_corpus.txt: File containing the file names and corresponding labels for the 2-class classification task.
chunk_labels_5cl_corpus.txt: File containing the file names and corresponding labels for the 5-class classification task.
Produces the following splits for both tasks (2cl and 5cl):
train_{task}.csv: Training split of all speakers of the Ohm-Gymnasium with the exception of the last two speakers.
dev_{task}.csv: Development split of the last two speakers of the Ohm-Gymnasium.
test_{task}.csv: Test split of all speakers of the Montessori-Schule.
For more information on the dataset and dataset split, see: https://doi.org/10.1109/ICME51207.2021.9428217
- Parameters:
path (str) – Path to the directory to download the dataset to.
- Return type:
None
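The speaker-based train/dev split described for download() can be pictured with the following sketch. The row layout, the speaker identifiers, and the interpretation of "last two speakers" as the last two in sorted order are all assumptions for illustration; the actual preparation code may differ.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (speaker_id, filename); both values are hypothetical


def hold_out_last_speakers(rows: List[Row], n_dev: int = 2) -> Tuple[List[Row], List[Row]]:
    """Split rows so that the last n_dev speakers (sorted by ID) form the dev set."""
    speakers = sorted({speaker for speaker, _ in rows})
    dev_speakers = set(speakers[-n_dev:])
    train = [row for row in rows if row[0] not in dev_speakers]
    dev = [row for row in rows if row[0] in dev_speakers]
    return train, dev


rows = [
    ("spk01", "a.wav"),
    ("spk02", "b.wav"),
    ("spk03", "c.wav"),
    ("spk04", "d.wav"),
    ("spk03", "e.wav"),
]
train_rows, dev_rows = hold_out_last_speakers(rows)
print(len(train_rows), len(dev_rows))
```

Splitting by speaker rather than by file keeps all recordings of one speaker in a single split.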
- class autrainer.datasets.DCASE2016Task1(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, fold=1)
TUT Acoustic scenes 2016 Task 1 (DCASE2016Task1) dataset.
- Parameters:
path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
batch_size (int) – Batch size.
inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
fold (int) – Fold to use in [1, 2, 3, 4]. Defaults to 1.
Default Configurations
id: DCASE2016Task1-16k
_target_: autrainer.datasets.DCASE2016Task1
fold: 1
path: data/DCASE2016
features_subdir: log_mel_16k
index_column: filename
target_column: scene_label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: grayscale

id: DCASE2016Task1-32k
_target_: autrainer.datasets.DCASE2016Task1
fold: 1
path: data/DCASE2016
features_subdir: log_mel_32k
index_column: filename
target_column: scene_label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: grayscale

id: DCASE2016Task1-wav-stm
_target_: autrainer.datasets.DCASE2016Task1
fold: 1
path: data/DCASE2016
features_subdir: default
index_column: filename
target_column: scene_label
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: raw
  base:
    - autrainer.transforms.StereoToMono

id: DCASE2016Task1-wav
_target_: autrainer.datasets.DCASE2016Task1
fold: 1
path: data/DCASE2016
features_subdir: default
index_column: filename
target_column: scene_label
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: raw
- load_dataframes()
Load the dataframes.
- Return type:
Tuple[DataFrame, DataFrame, DataFrame]
- Returns:
Dataframes for training, development, and testing.
- static download(path)
Download the TUT Acoustic scenes 2016 Task 1 (DCASE2016Task1) dataset.
For more information on the dataset and dataset split, see: https://dcase.community/challenge2016/task-acoustic-scene-classification
- Parameters:
path (str) – Path to the directory to download the dataset to.
- Return type:
None
- class autrainer.datasets.DCASE2018Task3(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, dev_split=0.0, dev_split_seed=None)
DCASE 2018 Task 3 dataset.
- Parameters:
path (str) – Root path to the dataset.
features_subdir (str) – Subdirectory containing the features.
seed (int) – Seed for reproducibility.
metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
index_column (str) – Index column of the dataframe.
target_column (str) – Target column of the dataframe.
file_type (str) – File type of the features.
file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
batch_size (int) – Batch size.
inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
dev_split (float) – Fraction of the training set to use as the development set. Defaults to 0.0.
dev_split_seed (Optional[int]) – Seed for the development split. If None, seed is used. Defaults to None.
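The interplay of dev_split, dev_split_seed, and seed can be sketched as follows. This is a simplified illustration with a hypothetical helper split_train_dev; in particular, the plain random shuffle and the rounding are assumptions, and the dataset class may split differently (for example, in a stratified manner).

```python
import random
from typing import List, Optional, Tuple


def split_train_dev(
    filenames: List[str],
    dev_split: float,
    seed: int,
    dev_split_seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Hold out a fraction of the training files as a development set."""
    # Fall back to the global seed if no dedicated split seed is given.
    effective_seed = seed if dev_split_seed is None else dev_split_seed
    shuffled = list(filenames)
    random.Random(effective_seed).shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_split)
    return shuffled[n_dev:], shuffled[:n_dev]


files = [f"file_{i}.wav" for i in range(10)]  # hypothetical file names
train_files, dev_files = split_train_dev(files, dev_split=0.1, seed=1, dev_split_seed=0)
print(len(train_files), len(dev_files))
```

Fixing dev_split_seed independently of seed keeps the development set identical across training runs that vary only the training seed.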
Default Configurations
id: DCASE2018Task3-wav-16k
_target_: autrainer.datasets.DCASE2018Task3
dev_split: 0.1
dev_split_seed: 0
path: data/DCASE2018Task3
features_subdir: default
index_column: filename
target_column: hasbird
file_type: wav
file_handler:
  autrainer.datasets.utils.AudioFileHandler:
    target_sample_rate: 16000
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: raw

id: DCASE2018Task3-wav
_target_: autrainer.datasets.DCASE2018Task3
dev_split: 0.1
dev_split_seed: 0
path: data/DCASE2018Task3
features_subdir: default
index_column: filename
target_column: hasbird
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: raw
- load_dataframes()[source]#
Load the dataframes.
- Return type:
  Tuple[DataFrame, DataFrame, DataFrame]
- Returns:
  Dataframes for training, development, and testing.
- static download(path)[source]#
Download the DCASE 2018 Task 3 dataset.
For the training dataset, the following subsets are used:
  - Field recordings, worldwide (“freefield1010”)
  - Remote monitoring flight calls, USA (“BirdVox-DCASE-20k”)
For the test dataset, the following subset is used:
  - Crowdsourced dataset, UK (“warblrb10k”)
Both the training and test datasets are taken from the development set, as no labels are provided for the evaluation set. For more information on the dataset, see: https://dcase.community/challenge2018/task-bird-audio-detection
- Parameters:
  - path (str) – Path to the directory to download the dataset to.
- Return type:
  None
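The dev_split and dev_split_seed parameters describe carving a development set out of the training data. The following is a minimal sketch of that idea using only the standard library. It is not autrainer's actual implementation (which may, for example, also honour stratify), and split_dev as well as the file names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_dev(
    train_files: Sequence[str],
    dev_split: float = 0.0,
    seed: int = 0,
    dev_split_seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: move a fraction of the training files to a dev set."""
    if not 0.0 <= dev_split < 1.0:
        raise ValueError("dev_split must be in [0, 1)")
    # Fall back to the general seed if no dedicated split seed is given.
    rng = random.Random(seed if dev_split_seed is None else dev_split_seed)
    shuffled = list(train_files)
    rng.shuffle(shuffled)
    n_dev = int(round(len(shuffled) * dev_split))
    # The first n_dev shuffled files become the dev set, the rest stay in train.
    return shuffled[n_dev:], shuffled[:n_dev]


files = [f"clip_{i:03d}.wav" for i in range(100)]  # hypothetical file names
train, dev = split_dev(files, dev_split=0.1, dev_split_seed=0)
print(len(train), len(dev))  # 90 10
```

Using a dedicated seed for the split keeps the development set fixed even when the general seed is varied across runs.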
- class autrainer.datasets.DCASE2020Task1A(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, dev_split=0.0, dev_split_seed=None, scene_category=None, exclude_cities=None)[source]#
TAU Urban Acoustic Scenes 2020 Mobile Task 1 Subtask A (DCASE2020Task1A) dataset.
- Parameters:
  - path (str) – Root path to the dataset.
  - features_subdir (str) – Subdirectory containing the features.
  - seed (int) – Seed for reproducibility.
  - metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
  - tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
  - index_column (str) – Index column of the dataframe.
  - target_column (str) – Target column of the dataframe.
  - file_type (str) – File type of the features.
  - file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
  - batch_size (int) – Batch size.
  - inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
  - train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
  - dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
  - test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
  - stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
  - dev_split (float) – Fraction of the training set to use as the development set. Defaults to 0.0.
  - dev_split_seed (Optional[int]) – Seed for the development split. If None, seed is used. Defaults to None.
  - scene_category (Optional[str]) – Scene category in [“indoor”, “outdoor”, “transportation”]. Defaults to None.
  - exclude_cities (Optional[List[str]]) – List of cities to exclude from the dataset. Defaults to None.
Default Configurations
id: DCASE2020Task1A-16k
_target_: autrainer.datasets.DCASE2020Task1A
dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device
path: data/DCASE2020Task1A
features_subdir: log_mel_16k
index_column: filename
target_column: scene_label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: grayscale

id: DCASE2020Task1A-32k
_target_: autrainer.datasets.DCASE2020Task1A
dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device
path: data/DCASE2020Task1A
features_subdir: log_mel_32k
index_column: filename
target_column: scene_label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: grayscale

id: DCASE2020Task1A-wav-16k
_target_: autrainer.datasets.DCASE2020Task1A
dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device
path: data/DCASE2020Task1A
features_subdir: default
index_column: filename
target_column: scene_label
file_type: wav
file_handler:
  autrainer.datasets.utils.AudioFileHandler:
    target_sample_rate: 16000
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: raw

id: DCASE2020Task1A-wav
_target_: autrainer.datasets.DCASE2020Task1A
dev_split: 0.1
dev_split_seed: 0
stratify:
  - scene_label
  - city
  - device
path: data/DCASE2020Task1A
features_subdir: default
index_column: filename
target_column: scene_label
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: raw
- load_dataframes()[source]#
Load the dataframes.
- Return type:
  Tuple[DataFrame, DataFrame, DataFrame]
- Returns:
  Dataframes for training, development, and testing.
- static download(path)[source]#
Download the TAU Urban Acoustic Scenes 2020 Mobile Task 1 Subtask A (DCASE2020Task1A) dataset.
As no labels are provided for the evaluation set, the training and test split provided with the development set is used instead. Therefore, this download does not include the evaluation set.
For more information on the dataset, see: https://dcase.community/challenge2020/task-acoustic-scene-classification
- Parameters:
  - path (str) – Path to the directory to download the dataset to.
- Return type:
  None
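The scene_category and exclude_cities parameters of DCASE2020Task1A restrict the data to a subset of scenes and cities. The sketch below illustrates one way such a filter could work on plain dictionaries. The grouping of scene labels into categories, the example rows, and the filter_rows helper are assumptions made for illustration; the column names scene_label and city are taken from the default configurations above.

```python
from typing import Dict, List, Optional

# Assumed grouping of the ten scene labels into the three categories.
SCENE_CATEGORIES: Dict[str, List[str]] = {
    "indoor": ["airport", "shopping_mall", "metro_station"],
    "outdoor": ["park", "public_square", "street_pedestrian", "street_traffic"],
    "transportation": ["bus", "metro", "tram"],
}


def filter_rows(
    rows: List[Dict[str, str]],
    scene_category: Optional[str] = None,
    exclude_cities: Optional[List[str]] = None,
) -> List[Dict[str, str]]:
    """Hypothetical helper: keep rows of one scene category, drop excluded cities."""
    if scene_category is not None:
        if scene_category not in SCENE_CATEGORIES:
            raise ValueError(f"Unknown scene category: {scene_category}")
        allowed = set(SCENE_CATEGORIES[scene_category])
        rows = [r for r in rows if r["scene_label"] in allowed]
    if exclude_cities:
        excluded = set(exclude_cities)
        rows = [r for r in rows if r["city"] not in excluded]
    return list(rows)


# Hypothetical metadata rows.
rows = [
    {"filename": "a.wav", "scene_label": "airport", "city": "lisbon"},
    {"filename": "b.wav", "scene_label": "tram", "city": "helsinki"},
    {"filename": "c.wav", "scene_label": "metro_station", "city": "helsinki"},
    {"filename": "d.wav", "scene_label": "park", "city": "lisbon"},
]
kept = filter_rows(rows, scene_category="indoor", exclude_cities=["helsinki"])
print([r["filename"] for r in kept])  # ['a.wav']
```

With both parameters left at None, all rows are kept, which matches the documented defaults.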
- class autrainer.datasets.EDANSA2019(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, threshold=0.5)[source]#
EDANSA 2019 dataset.
- Parameters:
  - path (str) – Root path to the dataset.
  - features_subdir (str) – Subdirectory containing the features.
  - seed (int) – Seed for reproducibility.
  - metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
  - tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
  - index_column (str) – Index column of the dataframe.
  - target_column (List[str]) – Target column of the dataframe.
  - file_type (str) – File type of the features.
  - file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
  - batch_size (int) – Batch size.
  - inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
  - train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
  - dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
  - test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
  - stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
  - threshold (float) – Threshold for classification. Defaults to 0.5.
Default Configurations
id: EDANSA2019-wav
_target_: autrainer.datasets.EDANSA2019
threshold: 0.5
path: data/EDANSA-2019
features_subdir: default
index_column: Clip Path
target_column:
  - Anth
  - Bio
  - Geo
  - Sil
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler
criterion: torch.nn.BCEWithLogitsLoss
metrics:
  - autrainer.metrics.MLAccuracy
  - autrainer.metrics.MLF1Micro
  - autrainer.metrics.MLF1Macro
  - autrainer.metrics.MLF1Weighted
tracking_metric: autrainer.metrics.MLF1Weighted
transform:
  type: raw
- static download(path)[source]#
Download the EDANSA 2019 dataset.
For more information on the dataset, see: https://zenodo.org/doi/10.5281/zenodo.6824271
- Parameters:
  - path (str) – Path to the directory to download the dataset to.
- Return type:
  None
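EDANSA2019 is configured as a multi-label task with a BCEWithLogitsLoss criterion and a threshold of 0.5. A common way to turn per-label logits into binary decisions is to apply a sigmoid and compare against the threshold, sketched below with the standard library. Whether autrainer applies the threshold exactly this way (for example, strict versus non-strict comparison) is an assumption, and predict_labels is a hypothetical helper.

```python
import math
from typing import Dict, Sequence

# Target columns from the default configuration.
LABELS = ["Anth", "Bio", "Geo", "Sil"]


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> Dict[str, int]:
    """Hypothetical helper: mark each label as present if its probability exceeds the threshold."""
    probs = [sigmoid(v) for v in logits]
    return {label: int(p > threshold) for label, p in zip(LABELS, probs)}


preds = predict_labels([2.0, -1.0, 0.3, -3.0])
print(preds)  # {'Anth': 1, 'Bio': 0, 'Geo': 1, 'Sil': 0}
```

Because each label is thresholded independently, a clip can be assigned several labels at once or none at all.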
- class autrainer.datasets.EmoDB(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None, train_speakers=None, dev_speakers=None, test_speakers=None)[source]#
EmoDB dataset for the task of Speech Emotion Recognition.
- Parameters:
  - path (str) – Root path to the dataset.
  - features_subdir (str) – Subdirectory containing the features.
  - seed (int) – Seed for reproducibility.
  - metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
  - tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
  - index_column (str) – Index column of the dataframe.
  - target_column (str) – Target column of the dataframe.
  - file_type (str) – File type of the features.
  - file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
  - batch_size (int) – Batch size.
  - inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
  - train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
  - dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
  - test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
  - stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
  - train_speakers (Optional[List[int]]) – List of speaker IDs (int) to use for training. If None, 3, 8, 9, 10, 11, 12 are used. Defaults to None.
  - dev_speakers (Optional[List[int]]) – List of speaker IDs (int) to use for validation. If None, 13, 14 are used. Defaults to None.
  - test_speakers (Optional[List[int]]) – List of speaker IDs (int) to use for testing. If None, 15, 16 are used. Defaults to None.
Default Configurations
id: EmoDB-32k
_target_: autrainer.datasets.EmoDB
train_speakers: [3, 8, 9, 10, 11, 12]
dev_speakers: [13, 14]
test_speakers: [15, 16]
path: data/EmoDB
features_subdir: log_mel_32k
index_column: filename
target_column: emotion
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 901
        axis: -2

id: EmoDB-wav
_target_: autrainer.datasets.EmoDB
train_speakers: [3, 8, 9, 10, 11, 12]
dev_speakers: [13, 14]
test_speakers: [15, 16]
path: data/EmoDB
features_subdir: default
index_column: filename
target_column: emotion
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: raw
  base:
    - autrainer.transforms.Expand:
        size: 144000 # 16 kHz * 9 s
        axis: -1
- load_dataframes()[source]#
Load the dataframes.
- Return type:
  Tuple[DataFrame, DataFrame, DataFrame]
- Returns:
  Dataframes for training, development, and testing.
- static download(path)[source]#
Download the EmoDB dataset.
For more information on the dataset, see: http://emodb.bilderbar.info/docu/
- Parameters:
  - path (str) – Path to the directory to download the dataset to.
- Return type:
  None
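EmoDB is split by speaker rather than by random sampling, so that no speaker appears in more than one subset. The sketch below illustrates such a speaker-independent split with the documented default speaker IDs. The split_by_speaker helper, the speaker key, and the example rows are hypothetical; autrainer's own load_dataframes() may derive the speaker differently.

```python
from typing import Dict, List, Optional, Tuple

# Documented defaults used when the corresponding argument is None.
DEFAULT_TRAIN = [3, 8, 9, 10, 11, 12]
DEFAULT_DEV = [13, 14]
DEFAULT_TEST = [15, 16]

Row = Dict[str, object]


def split_by_speaker(
    rows: List[Row],
    train_speakers: Optional[List[int]] = None,
    dev_speakers: Optional[List[int]] = None,
    test_speakers: Optional[List[int]] = None,
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Hypothetical helper: assign each row to a subset based on its speaker ID."""
    train_ids = set(DEFAULT_TRAIN if train_speakers is None else train_speakers)
    dev_ids = set(DEFAULT_DEV if dev_speakers is None else dev_speakers)
    test_ids = set(DEFAULT_TEST if test_speakers is None else test_speakers)
    # A speaker must not leak across subsets.
    if train_ids & dev_ids or train_ids & test_ids or dev_ids & test_ids:
        raise ValueError("speaker sets must be disjoint")

    def select(ids: set) -> List[Row]:
        return [r for r in rows if r["speaker"] in ids]

    return select(train_ids), select(dev_ids), select(test_ids)


# Hypothetical metadata rows.
rows = [
    {"filename": "utt_0.wav", "speaker": 3, "emotion": "anger"},
    {"filename": "utt_1.wav", "speaker": 13, "emotion": "neutral"},
    {"filename": "utt_2.wav", "speaker": 16, "emotion": "anger"},
    {"filename": "utt_3.wav", "speaker": 8, "emotion": "neutral"},
]
train, dev, test = split_by_speaker(rows)
print(len(train), len(dev), len(test))  # 2 1 1
```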
- class autrainer.datasets.SpeechCommands(path, features_subdir, seed, metrics, tracking_metric, index_column, target_column, file_type, file_handler, batch_size, inference_batch_size=None, train_transform=None, dev_transform=None, test_transform=None, stratify=None)[source]#
Speech Commands (v0.02) dataset.
- Parameters:
  - path (str) – Root path to the dataset.
  - features_subdir (str) – Subdirectory containing the features.
  - seed (int) – Seed for reproducibility.
  - metrics (List[Union[str, DictConfig, Dict]]) – List of metrics to calculate.
  - tracking_metric (Union[str, DictConfig, Dict]) – Metric to track.
  - index_column (str) – Index column of the dataframe.
  - target_column (str) – Target column of the dataframe.
  - file_type (str) – File type of the features.
  - file_handler (Union[str, DictConfig, Dict]) – File handler to load the data.
  - batch_size (int) – Batch size.
  - inference_batch_size (Optional[int]) – Inference batch size. If None, defaults to batch_size. Defaults to None.
  - train_transform (Optional[SmartCompose]) – Transform to apply to the training set. Defaults to None.
  - dev_transform (Optional[SmartCompose]) – Transform to apply to the development set. Defaults to None.
  - test_transform (Optional[SmartCompose]) – Transform to apply to the test set. Defaults to None.
  - stratify (Optional[List[str]]) – Columns to stratify the dataset on. Defaults to None.
Default Configurations
id: SpeechCommands-16k
_target_: autrainer.datasets.SpeechCommands
path: data/SpeechCommands
features_subdir: log_mel_16k
index_column: path
target_column: label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 101
        axis: -2

id: SpeechCommands-32k
_target_: autrainer.datasets.SpeechCommands
path: data/SpeechCommands
features_subdir: log_mel_32k
index_column: path
target_column: label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: grayscale
  base:
    - autrainer.transforms.Expand:
        size: 101
        axis: -2

id: SpeechCommands-wav
_target_: autrainer.datasets.SpeechCommands
path: data/SpeechCommands
features_subdir: default
index_column: path
target_column: label
file_type: wav
file_handler: autrainer.datasets.utils.AudioFileHandler
criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy
transform:
  type: raw
  base:
    - autrainer.transforms.Expand:
        size: 16000
        axis: -1
- static download(path)[source]#
Download the Speech Commands (v0.02) dataset from torchaudio.
For more information on the dataset, see: https://doi.org/10.48550/arXiv.1804.03209
- Parameters:
  - path (str) – Path to the directory to download the dataset to.
- Return type:
  None
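The raw-audio configurations above pass a size to autrainer.transforms.Expand (16000 for SpeechCommands, 144000 for EmoDB), which suggests that clips are brought to a common length along the given axis. The sketch below shows the underlying idea with plain Python lists. It assumes that shorter inputs are right-padded with zeros and longer inputs are left unchanged; the actual behaviour of Expand may differ, and expand_to_size is a hypothetical stand-in.

```python
from typing import List


def expand_to_size(samples: List[float], size: int) -> List[float]:
    """Hypothetical stand-in: right-pad a waveform with zeros up to `size` samples."""
    if len(samples) >= size:
        # Assumption: inputs that are already long enough are left unchanged.
        return list(samples)
    return list(samples) + [0.0] * (size - len(samples))


sample_rate = 16000  # samples per second

# A hypothetical 0.75 s clip padded to 1 s, as in the SpeechCommands-wav configuration.
clip = [0.1] * 12000
padded = expand_to_size(clip, size=sample_rate * 1)
print(len(padded))  # 16000

# The EmoDB-wav size corresponds to 9 s of audio at 16 kHz.
print(sample_rate * 9)  # 144000
```

Fixing the length this way allows clips of different durations to be stacked into a single batch.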