Quickstart#
The following quickstart guide provides a short introduction to autrainer and the creation of simple training experiments.
First Experiment#
To get started, create a new directory and navigate to it:
mkdir autrainer_example && cd autrainer_example
Next, create a new empty autrainer project using the following configuration management CLI command:
autrainer create --empty
Alternatively, use the following configuration management CLI wrapper function:
import autrainer.cli # the import is omitted in the following examples for brevity
autrainer.cli.create(empty=True)
This will create the configuration directory structure and the main configuration (conf/config.yaml) file with default values:
defaults:
  - _autrainer_
  - _self_

results_dir: results
experiment_id: default
iterations: 5

hydra:
  sweeper:
    params:
      +seed: 1
      +batch_size: 32
      +learning_rate: 0.001
      dataset: ToyTabular-C
      model: ToyFFNN
      optimizer: Adam
Now, run the following training command to train the model:
autrainer train
Alternatively, use the following training CLI wrapper function:
autrainer.cli.train() # the train function is omitted in the following examples for brevity
This will train the default ToyFFNN feed-forward neural network (FFNN) on the default ToyTabular-C classification dataset with tabular data (ToyDataset) and output the training results to the results/default/ directory.
Custom Model Configuration#
The first experiment uses the default ToyFFNN model with the following configuration, which specifies 2 hidden layers:
id: ToyFFNN
_target_: autrainer.models.FFNN
input_size: 64
hidden_size: 64
num_layers: 2

transform:
  type: tabular
To create another configuration for the FFNN model with 3 hidden layers, create a new configuration file in the conf/model/ directory:
id: Three-Layer-FFNN
_target_: autrainer.models.FFNN
input_size: 64
hidden_size: 64
num_layers: 3 # 3 hidden layers

transform:
  type: tabular
Next, update the main configuration (conf/config.yaml) file to use the new model configuration:
defaults:
  - _autrainer_
  - _self_

results_dir: results
experiment_id: default
iterations: 5

hydra:
  sweeper:
    params:
      +seed: 1
      +batch_size: 32
      +learning_rate: 0.001
      dataset: ToyTabular-C
      model: Three-Layer-FFNN # 3 hidden layers
      optimizer: Adam
Now, run the following training command to train the model with 3 hidden layers:
autrainer train
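The effect of the num_layers switch between the two configurations can be made concrete with a back-of-the-envelope parameter count for a plain fully connected network. This is a hypothetical sketch, not autrainer's FFNN implementation, and the output size of 10 classes is an assumed placeholder:

```python
def ffnn_param_count(input_size: int, hidden_size: int, num_layers: int, output_size: int) -> int:
    """Count weights and biases of a plain fully connected network with
    `num_layers` hidden layers (illustrative sketch, not autrainer's code)."""
    sizes = [input_size] + [hidden_size] * num_layers + [output_size]
    # each linear layer contributes in*out weights plus out biases
    return sum(i * o + o for i, o in zip(sizes, sizes[1:]))

# ToyFFNN (2 hidden layers) vs. Three-Layer-FFNN (3 hidden layers),
# assuming a placeholder output size of 10 classes
two = ffnn_param_count(64, 64, 2, 10)
three = ffnn_param_count(64, 64, 3, 10)
print(two, three)  # the extra hidden layer adds 64*64 + 64 parameters
```

The difference between the two models is exactly one additional 64-by-64 linear layer plus its biases.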
Grid Search Configuration#
To perform a grid search over multiple configurations defined in the params, update the main configuration (conf/config.yaml) to include multiple values separated by commas.
The following configuration performs a grid search over the default FFNN model with 2 and 3 hidden layers as well as 3 different seeds:
defaults:
  - _autrainer_
  - _self_

results_dir: results
experiment_id: default
iterations: 5

hydra:
  sweeper:
    params:
      +seed: 1, 2, 3 # 3 seeds to compare
      +batch_size: 32
      +learning_rate: 0.001
      dataset: ToyTabular-C
      model: ToyFFNN, Three-Layer-FFNN # 2 models to compare
      optimizer: Adam
Now, run the following training command to train the models with 2 and 3 hidden layers and 3 different seeds:
autrainer train
By default, a grid search is performed sequentially. Hydra allows the use of different launcher plugins to perform parallel grid searches.
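The sweep can be reasoned about as a Cartesian product of the swept parameters. The sketch below enumerates the run combinations the sweeper generates for the configuration above; it is an illustration of the combinatorics only, not the plugin's actual code:

```python
from itertools import product

# swept values from the config above; single-value params stay fixed
seeds = [1, 2, 3]
models = ["ToyFFNN", "Three-Layer-FFNN"]

runs = [{"seed": s, "model": m} for s, m in product(seeds, models)]
print(len(runs))  # 3 seeds x 2 models = 6 runs, launched sequentially by default
```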
Note
If a run already exists in the same experiment and has been completed successfully, then it will be skipped. This may be the case for both the default and custom model configurations with seed 1 if they have already been trained in the previous examples.
To compare the results of the individual runs as well as averaged across seeds, run the following postprocessing command:
autrainer postprocess results default --aggregate seed
Alternatively, use the following postprocessing CLI wrapper function:
autrainer.cli.postprocess(
results_dir="results",
experiment_id="default",
aggregate=[["seed"]],
)
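Aggregating over seed conceptually averages each metric across the seed runs of a configuration. The following stand-alone sketch shows that averaging with made-up accuracy values, not actual results or autrainer's postprocessing code:

```python
from statistics import mean

# hypothetical per-seed results for one model configuration
runs = [
    {"model": "ToyFFNN", "seed": 1, "accuracy": 0.91},
    {"model": "ToyFFNN", "seed": 2, "accuracy": 0.89},
    {"model": "ToyFFNN", "seed": 3, "accuracy": 0.93},
]

aggregated = mean(r["accuracy"] for r in runs)
print(round(aggregated, 2))  # 0.91
```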
Spectrogram Classification#
To train a Cnn10 model on an audio dataset such as DCASE2016Task1, update the main configuration (conf/config.yaml) file:
defaults:
  - _autrainer_
  - _self_

results_dir: results
experiment_id: spectrogram
iterations: 5

hydra:
  sweeper:
    params:
      +seed: 1
      +batch_size: 32
      +learning_rate: 0.001
      dataset: DCASE2016Task1-32k
      model: Cnn10-32k-T
      optimizer: Adam
For the Cnn10 model, the following configuration is used:
id: Cnn10-32k-T
_target_: autrainer.models.Cnn10
transfer: https://zenodo.org/records/3987831/files/Cnn10_mAP%3D0.380.pth

transform:
  type: grayscale
  base:
    - autrainer.transforms.Normalize: null
The suffix 32k-T indicates that the model uses transfer learning and has been pretrained with a sample rate of 32 kHz.
Tip
To discover all available default configurations, e.g. for different models, use the configuration management CLI, the configuration management CLI wrapper, or the models documentation.
For the DCASE2016Task1 dataset, the following configuration is used:
id: DCASE2016Task1-32k
_target_: autrainer.datasets.DCASE2016Task1

fold: 1

path: data/DCASE2016
features_subdir: log_mel_32k
index_column: filename
target_column: scene_label
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale
The suffix 32k indicates that the dataset has a sample rate of 32 kHz and provides log-Mel spectrograms instead of raw audio.
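The dataset configuration stores these spectrograms as .npy files (file_type: npy) handled by a NumPy-based file handler. The round trip can be sketched with NumPy directly; the feature shape below is a made-up placeholder, as the real dimensions depend on the extraction settings:

```python
import os
import tempfile

import numpy as np

# a made-up log-Mel spectrogram of shape (mel bins, time frames); real
# shapes depend on the feature-extraction settings and are assumed here
log_mel = np.random.randn(64, 500).astype(np.float32)

# .npy files round-trip losslessly with NumPy's save/load
path = os.path.join(tempfile.mkdtemp(), "example_feature.npy")
np.save(path, log_mel)
loaded = np.load(path)
print(loaded.shape, loaded.dtype)  # (64, 500) float32
```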
To avoid race conditions when using launcher plugins that may run multiple training jobs in parallel, the following preprocessing command is used to fetch and download the model weights and the raw audio files of the dataset beforehand:
autrainer fetch
Alternatively, use the following preprocessing CLI wrapper function:
autrainer.cli.fetch()
As the dataset uses log-Mel spectrograms instead of the raw audio files downloaded in the previous step, the following preprocessing command is used to preprocess and extract the features from the raw audio files:
autrainer preprocess
Alternatively, use the following preprocessing CLI wrapper function:
autrainer.cli.preprocess()
Now, run the following training command to train the model on the audio dataset:
autrainer train
Training Duration & Step-based Training#
By default, autrainer uses epoch-based training, where the iterations correspond to the number of epochs. To change the training duration of the spectrogram classification model, increase the number of iterations in the main configuration (conf/config.yaml) file. To use step-based training instead of epoch-based training, set the training_type to step.
The following configuration trains the spectrogram classification model for a total of 1000 steps with step-based training, evaluating every 100 steps, saving the states every 200 steps, and without displaying a progress bar:
defaults:
  - _autrainer_
  - _self_

results_dir: results
experiment_id: spectrogram_step

training_type: step
iterations: 1000
eval_frequency: 100
save_frequency: 200
progress_bar: false

hydra:
  sweeper:
    params:
      +seed: 1
      +batch_size: 32
      +learning_rate: 0.001
      dataset: DCASE2016Task1-32k
      model: Cnn10-32k-T
      optimizer: Adam
Now, run the following training command to train the model on the audio dataset for 1000 steps:
autrainer train
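With the settings above, evaluation and checkpointing fall on fixed step multiples. The following sketch derives those step indices with plain arithmetic; it illustrates the frequencies in the config, not autrainer's internal scheduling:

```python
iterations = 1000    # total training steps
eval_frequency = 100
save_frequency = 200

# steps at which evaluation and state saving occur, assuming they
# trigger on exact multiples of the configured frequencies
eval_steps = [s for s in range(1, iterations + 1) if s % eval_frequency == 0]
save_steps = [s for s in range(1, iterations + 1) if s % save_frequency == 0]

print(len(eval_steps), len(save_steps))  # 10 evaluations, 5 checkpoints
```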
Filtering Configurations#
By default, autrainer filters out any configurations that have already been trained and exist in the same experiment, using the hydra-filter-sweeper plugin with the following filters that are implicitly set in the _autrainer_.yaml defaults file:
filters:
  - type: exists
    path: metrics.csv
To filter out unwanted configurations and exclude them from training, the hydra-filter-sweeper plugin can be used as the Hydra sweeper plugin. hydra-filter-sweeper allows specifying a list of filters to exclude configurations based on their attributes.
The following configuration expands the grid search configuration and adds a filter that excludes any seed greater than 2 for the Three-Layer-FFNN model:
defaults:
  - _autrainer_
  - _self_

results_dir: results
experiment_id: default
iterations: 5

hydra:
  sweeper:
    params:
      +seed: 1, 2, 3
      +batch_size: 32
      +learning_rate: 0.001
      dataset: ToyTabular-C
      model: ToyFFNN, Three-Layer-FFNN
      optimizer: Adam
    filters:
      - type: exists
        path: metrics.csv
      - type: expr
        expr: model.id == "Three-Layer-FFNN" and seed > 2
Note
If the filters attribute is overridden in the main configuration (conf/config.yaml) file, the default filters are not applied. To still filter out configurations that have already been trained, the following default filter should still be included:
filters:
  - type: exists
    path: metrics.csv
Now, run the following training command to train the ToyFFNN with 3 seeds and the Three-Layer-FFNN with 2 seeds:
autrainer train
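The effect of the expr filter can be checked with a small stand-alone sketch that applies the same condition to all six sweep combinations. This only illustrates the filter logic; it is not hydra-filter-sweeper itself:

```python
from itertools import product

# the six combinations from the grid search above
combos = [
    {"seed": seed, "model": model}
    for seed, model in product([1, 2, 3], ["ToyFFNN", "Three-Layer-FFNN"])
]

# exclude any seed greater than 2 for the Three-Layer-FFNN model,
# mirroring: model.id == "Three-Layer-FFNN" and seed > 2
kept = [
    c for c in combos
    if not (c["model"] == "Three-Layer-FFNN" and c["seed"] > 2)
]
print(len(kept))  # 6 combinations minus 1 filtered = 5 runs
```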
Next Steps#
For more information on creating configurations, refer to the Hydra configurations as well as the Hydra documentation.
To create custom implementations alongside configurations, refer to the tutorials.