Tutorials

autrainer is designed to be flexible and extensible, allowing for the creation of custom …

models
datasets (including metrics, criterions, file handlers, target transforms, and advanced data pipelines)
optimizers
schedulers
transforms (including preprocessing transforms and online transforms)
augmentations
loggers

For each, a tutorial is provided below to demonstrate their implementation and configuration.

For the following tutorials, all Python files should be placed in the project root directory and all configuration files in the corresponding subdirectories of the conf/ directory.

Custom Models

To create a custom model, inherit from AbstractModel and implement the forward() and embeddings() methods. All constructor arguments must be assigned to attributes with the same name, as AbstractModel inherits from audobject.

For example, the following model is a simple CNN that takes a spectrogram as input and has a variable number of hidden CNN layers with a different number of filters each:

spectrogram_cnn.py

from typing import List

import torch

from autrainer.models import AbstractModel


class SpectrogramCNN(AbstractModel):
    def __init__(self, output_dim: int, hidden_dims: List[int]) -> None:
        """Spectrogram CNN model with a variable number of hidden CNN layers.

        Args:
            output_dim: Output dimension of the model.
            hidden_dims: List of hidden dimensions for the CNN layers.
        """
        super().__init__(output_dim, None)  # no transfer learning
        self.hidden_dims = hidden_dims
        layers = []
        input_dim = 1
        for hidden_dim in self.hidden_dims:
            layers.extend(
                [
                    torch.nn.Conv2d(input_dim, hidden_dim, (3, 3), 1),
                    torch.nn.ReLU(),
                    torch.nn.MaxPool2d((2, 2)),
                ]
            )
            input_dim = hidden_dim
        layers.extend(
            [
                torch.nn.AdaptiveAvgPool2d((1, 1)),
                torch.nn.Flatten(),
            ]
        )
        self.backbone = torch.nn.Sequential(*layers)
        self.classifier = torch.nn.Linear(self.hidden_dims[-1], output_dim)

    def embeddings(self, features: torch.Tensor) -> torch.Tensor:
        return self.backbone(features)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.embeddings(features))
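Because each hidden layer applies an unpadded 3x3 convolution followed by 2x2 max pooling, every layer shrinks both spectrogram axes, so the input must be large enough for the chosen hidden_dims. The helper below is a hypothetical, dependency-free sketch of that arithmetic (it is not part of autrainer) and assumes PyTorch's default output-size formulas for Conv2d and MaxPool2d:

```python
from typing import List


def spectrogram_cnn_output_size(size: int, hidden_dims: List[int]) -> int:
    """Size of one spectrogram axis after the conv/pool stack of SpectrogramCNN."""
    for _ in hidden_dims:
        size -= 2  # 3x3 conv, stride 1, no padding: loses one row/column per side
        size //= 2  # 2x2 max pooling with stride 2: floors odd sizes
        if size < 1:
            raise ValueError("Input too small for the number of hidden layers.")
    return size


# e.g., 64 Mel bins and 501 frames with hidden_dims=[32, 64, 128]
print(spectrogram_cnn_output_size(64, [32, 64, 128]))   # 6
print(spectrogram_cnn_output_size(501, [32, 64, 128]))  # 60
```

The AdaptiveAvgPool2d layer then reduces whatever spatial size remains to 1x1, which is why the classifier only depends on hidden_dims[-1].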

Next, create a SpectrogramCNN.yaml configuration file for the model in the conf/model/ directory:

conf/model/SpectrogramCNN.yaml

id: SpectrogramCNN
_target_: spectrogram_cnn.SpectrogramCNN

hidden_dims: [32, 64, 128]

transform:
  type: grayscale

The id should match the name of the configuration file. The _target_ should point to the custom model class via a Python import path (here assuming that the spectrogram_cnn.py file is in the root directory of the project). Each model configuration should include a transform/type attribute specifying the input type the model expects.

Note

The output_dim attribute is automatically passed to the model during initialization and determined by the dataset at runtime.

The transform attribute in the configuration is not passed to the model during initialization and is used to specify the input type of the model and any online transforms to be applied to the data at runtime.
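Conceptually, a configuration like the one above is turned into an object by importing the class named in _target_ (a Hydra-style convention) and calling it with the remaining attributes plus runtime arguments such as output_dim, while the transform attribute is held back. The function below is a simplified, hypothetical sketch of that idea, not autrainer's or Hydra's actual implementation; it uses fractions.Fraction from the standard library as a stand-in for a model class:

```python
import importlib
from typing import Any, Dict


def instantiate_from_config(config: Dict[str, Any], **runtime_kwargs: Any) -> Any:
    """Hypothetical sketch: import the `_target_` class and call it."""
    kwargs = dict(config)
    target = kwargs.pop("_target_")
    kwargs.pop("id", None)  # identifies the configuration, not a constructor argument
    kwargs.pop("transform", None)  # describes the input type, not passed to the model
    module_name, _, class_name = target.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**kwargs, **runtime_kwargs)


config = {
    "id": "ExampleFraction",
    "_target_": "fractions.Fraction",
    "numerator": 3,
    "transform": {"type": "grayscale"},
}
# `denominator` plays the role of a runtime argument such as output_dim
print(instantiate_from_config(config, denominator=4))  # 3/4
```

This is also why the model's module must be importable from the project root: the import path in _target_ is resolved like a regular Python import.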

Custom Datasets

To create a custom dataset, inherit from AbstractDataset and implement the target_transform and output_dim properties.

The train, dev, and test datasets as well as the corresponding loaders are created automatically by the abstract class. However, this requires that the dataset structure follows the standard format outlined in the dataset documentation. If the dataset structure differs or does not rely on dataframes, the df_train, df_dev, df_test, train_dataset, train_loader, etc. properties can be overridden.

autrainer provides base datasets for classification (BaseClassificationDataset), regression (BaseRegressionDataset), and multi-label classification (BaseMLClassificationDataset) tasks. When inheriting from one of these, both the target transform and the output dimension are already implemented and do not need to be overridden.

Tip

To automatically download a custom dataset, implement the download() method. This method is called by the autrainer fetch CLI command as well as the fetch() CLI wrapper function, and receives the path attribute specified in the dataset configuration file as the directory in which to store the downloaded data.

ESC-50 Example

For example, the ESC-50 dataset is an audio classification dataset and can be implemented as follows:

esc_50.py

from functools import cached_property
import os
import shutil
from typing import Any, Dict, List

import pandas as pd

from autrainer.datasets import BaseClassificationDataset
from autrainer.datasets.utils import ZipDownloadManager


FILES = {"ESC-50.zip": "https://github.com/karoldvl/ESC-50/archive/master.zip"}


class ESC50(BaseClassificationDataset):
    def __init__(
        self,
        train_folds: List[int],
        dev_folds: List[int],
        test_folds: List[int],
        **kwargs: Dict[str, Any],  # kwargs only for simplicity in the tutorial
    ) -> None:
        self.train_folds = train_folds
        self.dev_folds = dev_folds
        self.test_folds = test_folds
        super().__init__(**kwargs)

    @cached_property
    def _load_metadata(self) -> pd.DataFrame:
        return pd.read_csv(os.path.join(self.path, "esc50.csv"))

    @cached_property
    def df_train(self) -> pd.DataFrame:
        meta = self._load_metadata
        return meta[meta["fold"].isin(self.train_folds)]

    @cached_property
    def df_dev(self) -> pd.DataFrame:
        meta = self._load_metadata
        return meta[meta["fold"].isin(self.dev_folds)]

    @cached_property
    def df_test(self) -> pd.DataFrame:
        meta = self._load_metadata
        return meta[meta["fold"].isin(self.test_folds)]

    @staticmethod
    def download(path: str) -> None:
        if os.path.exists(os.path.join(path, "default")):
            return

        dl_manager = ZipDownloadManager(FILES, path)
        dl_manager.download(check_exist=["ESC-50.zip"])
        dl_manager.extract(check_exist=["ESC-50-master"])
        shutil.move(
            os.path.join(path, "ESC-50-master", "audio"),
            os.path.join(path, "default"),
        )
        shutil.move(
            os.path.join(path, "ESC-50-master", "meta", "esc50.csv"),
            path,
        )
        shutil.rmtree(os.path.join(path, "ESC-50-master"))

The dataset provides audio files by default (which are moved to the default/ directory in the download() method) and the corresponding metadata of the dataset is stored in the esc50.csv file.

To allow the specification of custom folds, the df_train, df_dev, and df_test properties are overridden to split the esc50.csv metadata into the respective train, dev, and test dataframes. This also enables cross-validation by creating multiple configurations with different folds.
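The fold rotations for such a cross-validation can be generated programmatically. The sketch below is hypothetical (not part of autrainer) and uses only the standard library: it rotates the five ESC-50 folds so that each fold serves as the test fold once, uses the preceding fold for development, and trains on the rest. Each resulting dictionary corresponds to the train_folds, dev_folds, and test_folds attributes of one dataset configuration:

```python
from typing import Dict, List


def rotate_folds(num_folds: int = 5) -> List[Dict[str, List[int]]]:
    folds = list(range(1, num_folds + 1))
    splits = []
    for i, test_fold in enumerate(folds):
        dev_fold = folds[i - 1]  # previous fold (wraps around for the first one)
        train = [f for f in folds if f not in (dev_fold, test_fold)]
        splits.append(
            {"train_folds": train, "dev_folds": [dev_fold], "test_folds": [test_fold]}
        )
    return splits


for split in rotate_folds():
    print(split)
# the last split matches the ESC50-32k configuration:
# {'train_folds': [1, 2, 3], 'dev_folds': [4], 'test_folds': [5]}
```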

To extract log-Mel spectrograms from the audio files, a preprocessing transform can be applied to the data before training. The following configuration creates a new ESC50-32k.yaml dataset in the conf/dataset/ directory with log-Mel spectrograms preprocessed at a sample rate of 32 kHz:

conf/dataset/ESC50-32k.yaml

id: ESC50-32k
_target_: esc_50.ESC50

path: data/ESC50
features_subdir: log_mel_32k
index_column: filename
target_column: category
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler

train_folds: [1, 2, 3]
dev_folds: [4]
test_folds: [5]

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics: 
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale

The dataset can be automatically downloaded and preprocessed using the autrainer fetch and autrainer preprocess CLI commands or the fetch() and preprocess() CLI wrapper functions.

Simple Dataset Example

If the structure of the dataset follows the standard format outlined in the dataset documentation, no implementation is necessary and a new dataset can be created by simply adding a configuration file to the conf/dataset/ directory.

For example, the following configuration file creates a new SpectrogramDataset-32k.yaml classification dataset whose audio is preprocessed into log-Mel spectrograms at a sample rate of 32 kHz:

conf/dataset/SpectrogramDataset-32k.yaml

id: SpectrogramDataset-32k
_target_: autrainer.datasets.BaseClassificationDataset

path: data/SpectrogramDataset # base path to the dataset
features_subdir: log_mel_32k # spectrogram preprocessed features
index_column: path # column in the CSVs containing the file paths relative to features_subdir
target_column: label # column in the CSVs containing the target labels
file_type: npy # file extension of the spectrogram features
file_handler: autrainer.datasets.utils.NumpyFileHandler # file handler for the spectrogram features

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics: 
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: grayscale

This dataset assumes that the data/SpectrogramDataset directory contains the following directories and files:

default/ directory containing the raw audio files. These audio files are preprocessed using the spectrogram preprocessing transform with the autrainer preprocess CLI command or the preprocess() CLI wrapper function and stored in the data/SpectrogramDataset/log_mel_32k directory.
train.csv, dev.csv, and test.csv files containing the file paths relative to the default/ directory in the index_column (path) column and the corresponding labels in the target_column (label) column.
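A minimal sketch of producing the expected CSV files with the standard library follows. The file names, the path and label columns, and the shuffled 60/20/20 split are illustrative assumptions matching the configuration above; the audio files themselves are assumed to already exist in default/:

```python
import csv
import os
import random
import tempfile
from typing import List, Tuple


def write_split_csvs(root: str, items: List[Tuple[str, str]], seed: int = 0) -> None:
    """Shuffle (path, label) pairs and write train/dev/test CSVs (60/20/20)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10
    n_dev = len(items) * 2 // 10
    splits = {
        "train": items[:n_train],
        "dev": items[n_train : n_train + n_dev],
        "test": items[n_train + n_dev :],
    }
    for name, rows in splits.items():
        with open(os.path.join(root, f"{name}.csv"), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["path", "label"])  # index_column, target_column
            writer.writerows(rows)  # paths are relative to default/


root = tempfile.mkdtemp()  # stands in for data/SpectrogramDataset
items = [(f"clip_{i}.wav", "dog" if i % 2 else "cat") for i in range(10)]
write_split_csvs(root, items)
print(sorted(os.listdir(root)))  # ['dev.csv', 'test.csv', 'train.csv']
```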

Custom Metrics

To create a custom metric, inherit from AbstractMetric and implement the starting_metric and suffix properties, as well as the get_best(), get_best_pos(), and compare() static methods.

autrainer provides base classes for ascending (BaseAscendingMetric) and descending (BaseDescendingMetric) metrics that can be inherited from to simplify the implementation.

For example, the following metric implements the Cohen’s Kappa score with either linear or quadratic weights:

cohens_kappa_metric.py

import sklearn.metrics

from autrainer.metrics import BaseAscendingMetric


class CohensKappa(BaseAscendingMetric):
    def __init__(self, weights: str) -> None:
        """Cohen's Kappa metric using `sklearn.metrics.cohen_kappa_score`.

        Args:
            weights: Weighting type for the metric in ["linear", "quadratic"].
        """
        super().__init__(
            name="cohens-kappa",
            fn=sklearn.metrics.cohen_kappa_score,
            weights=weights,
        )

The fn argument is the function that is called automatically in the __call__() method, and the additional weights argument is passed to fn as a keyword argument.
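To make the effect of the weights argument concrete, the sketch below re-implements weighted Cohen's kappa in plain Python. It only illustrates the formula; sklearn.metrics.cohen_kappa_score remains the function the metric actually calls. Disagreements are penalized by |i - j| (linear) or (i - j)^2 (quadratic), where i and j are the positions of the two labels in the sorted label set:

```python
from typing import List, Optional


def weighted_kappa(y1: List[int], y2: List[int], weights: Optional[str] = None) -> float:
    labels = sorted(set(y1) | set(y2))
    idx = {label: k for k, label in enumerate(labels)}
    n, k = len(y1), len(labels)
    observed = [[0.0] * k for _ in range(k)]  # confusion matrix of the two raters
    for a, b in zip(y1, y2):
        observed[idx[a]][idx[b]] += 1
    rows = [sum(r) for r in observed]
    cols = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i: int, j: int) -> float:
        if weights == "linear":
            return abs(i - j)
        if weights == "quadratic":
            return (i - j) ** 2
        return float(i != j)  # unweighted kappa

    num = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    # expected counts under independence: row total * column total / n
    den = sum(w(i, j) * rows[i] * cols[j] / n for i in range(k) for j in range(k))
    return 1.0 - num / den


print(weighted_kappa([0, 1, 2], [0, 2, 2], "linear"))     # approx. 0.667
print(weighted_kappa([0, 1, 2], [0, 2, 2], "quadratic"))  # approx. 0.8
```

Since higher kappa values indicate better agreement, inheriting from BaseAscendingMetric is the appropriate choice.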

As metrics are specified using shorthand syntax in the dataset configuration, the custom metric can be referenced as the tracking_metric of the dataset via its import path relative to the project root:

conf/dataset/ExampleDataset.yaml

...
tracking_metric:
  cohens_kappa_metric.CohensKappa:
     weights: linear # linear or quadratic
...

Custom Criterions

To create a custom criterion, inherit from torch.nn.modules.loss._Loss and implement the forward() method. If the criterion relies on the dataset, an optional criterion setup method can be defined which is called after the dataset is initialized.

Note

The reduction attribute of each criterion is automatically set to "none" during instantiation and the forward() method should return the per-example loss.

For example, the following criterion implements CrossEntropyLoss with an additional scaling factor:

scaled_ce_loss.py

from typing import Any, Dict

import torch


class ScaledCrossEntropyLoss(torch.nn.CrossEntropyLoss):
    def __init__(
        self,
        scaling_factor: float = 1.0,
        *args: Any,
        **kwargs: Dict[str, Any],
    ) -> None:
        """Cross entropy loss with a scaling factor.

        Args:
            scaling_factor: Scaling factor for the loss.
            *args: Positional arguments passed to `torch.nn.CrossEntropyLoss`.
            **kwargs: Keyword arguments passed to `torch.nn.CrossEntropyLoss`.
        """
        super().__init__(*args, **kwargs)
        self.scaling_factor = scaling_factor

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        if y.ndim == 1:
            y = y.long()
        return self.scaling_factor * super().forward(x, y)
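Because the reduction is set to "none", the criterion returns one loss value per example, and the scaling factor multiplies each of them. The following dependency-free sketch reproduces this computation for plain Python lists of logits and class indices; it illustrates the math only and is not a replacement for the PyTorch implementation:

```python
import math
from typing import List


def scaled_cross_entropy(
    logits: List[List[float]], targets: List[int], scaling_factor: float = 1.0
) -> List[float]:
    """Per-example cross-entropy (reduction='none') multiplied by scaling_factor."""
    losses = []
    for row, target in zip(logits, targets):
        max_logit = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in row))
        losses.append(scaling_factor * (log_sum_exp - row[target]))
    return losses


per_example = scaled_cross_entropy([[2.0, 0.0], [0.0, 2.0]], [0, 1], scaling_factor=0.5)
print(per_example)  # two identical values of 0.5 * log(1 + e^-2)
print(sum(per_example) / len(per_example))  # batch mean, as in loss.mean().backward()
```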

As criterions are specified using shorthand syntax in the dataset configuration, the custom criterion can be referenced as the criterion of the dataset via its import path relative to the project root:

conf/dataset/ExampleDataset.yaml

...
criterion:
  scaled_ce_loss.ScaledCrossEntropyLoss:
     scaling_factor: 0.5
...

Or without overriding the default scaling_factor value:

conf/dataset/ExampleDataset.yaml

...
criterion: scaled_ce_loss.ScaledCrossEntropyLoss
...

Custom File Handlers

To create a custom file handler, inherit from AbstractFileHandler and implement the load() and save() methods.

For example, the following file handler loads and saves PyTorch tensors:

torch_file_handler.py

import torch

from autrainer.datasets.utils import AbstractFileHandler


class TorchFileHandler(AbstractFileHandler):
    def load(self, file: str) -> torch.Tensor:
        return torch.load(file)

    def save(self, file: str, data: torch.Tensor) -> None:
        torch.save(data, file)

File handlers are specified using shorthand syntax in the dataset configuration. The following configuration utilizes the TorchFileHandler to load and save PyTorch tensors with the file extension .pt:

conf/dataset/ExampleDataset.yaml

...
file_type: pt
file_handler: torch_file_handler.TorchFileHandler
...

Custom Target Transforms

To create a custom target transform, inherit from AbstractTargetTransform and implement the encode(), decode(), probabilities_inference(), predict_inference(), majority_vote(), and probabilities_to_dict() methods.

For example, the following target transform logarithmically encodes and decodes the targets for regression tasks:

log_target_transform.py

import math
from typing import Dict, List, Union

import torch

from autrainer.datasets.utils import AbstractTargetTransform


class LogTargetTransform(AbstractTargetTransform):
    def __init__(self, target: str, base: int = 10, eps: float = 1e-9) -> None:
        """Logarithmic target transform for regression tasks.

        Args:
            target: Name of the target.
            base: Base of the logarithm. Defaults to 10.
            eps: Small value to avoid taking the logarithm of zero.
                Defaults to 1e-9.
        """
        self.target = target
        self.base = base
        self.eps = eps

    def encode(self, x: float) -> float:
        return math.log(x + self.eps, self.base)

    def decode(self, x: float) -> float:
        return math.pow(self.base, x) - self.eps

    def probabilities_inference(self, x: torch.Tensor) -> torch.Tensor:
        return x

    def predict_inference(self, x: torch.Tensor) -> Union[List[float], float]:
        return x.squeeze().tolist()

    def majority_vote(self, x: List[float]) -> float:
        return sum(x) / len(x)

    def probabilities_to_dict(self, x: torch.Tensor) -> Dict[str, float]:
        return {self.target: x.item()}

A custom target transform is used by returning it from the target_transform property of a dataset implementation.
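The encode() and decode() methods must be inverses of each other so that predictions can be mapped back to the original target scale. The standalone sketch below mirrors the arithmetic of LogTargetTransform without torch or autrainer, checks the round trip, and shows the averaging performed by majority_vote():

```python
import math
from typing import List

BASE, EPS = 10, 1e-9  # same defaults as LogTargetTransform


def encode(x: float) -> float:
    return math.log(x + EPS, BASE)


def decode(x: float) -> float:
    return math.pow(BASE, x) - EPS


def majority_vote(values: List[float]) -> float:
    # for regression, several predictions are aggregated by their mean
    return sum(values) / len(values)


for target in [0.0, 1.0, 250.0]:
    print(target, encode(target), decode(encode(target)))  # round trip up to float error
print(majority_vote([1.0, 2.0, 3.0]))  # 2.0
```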

Advanced Data Pipelines

To create data and model pipelines that go beyond the standard DataItem convention of using only features as input to the model, first create a new DataItem dataclass that includes a new parameter (e.g., meta):

multi_branch_data.py

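from dataclasses import dataclass

import torch

from autrainer.core.structs import AbstractDataBatch, AbstractDataItem


@dataclass
class DataItemMultiBranch(AbstractDataItem):
    features: torch.Tensor
    meta: torch.Tensor  # auxiliary (support) features for the second branch
    target: int
    index: int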

Next, inherit from AbstractDataBatch to create a batch class that includes the new meta parameter:

multi_branch_data.py

@dataclass
class DataBatchMulti(AbstractDataBatch[DataItemMultiBranch]):
    """Data batch class for a batch of data samples.

    Args:
        features: Tensor of input features.
        meta: Tensor of input support features.
        target: Tensor of target values for the input features.
        index: Tensor of indices for the data samples.
    """

    features: torch.Tensor
    meta: torch.Tensor
    target: torch.Tensor
    index: torch.Tensor

    def to(self, device: torch.device) -> None:
        self.features = self.features.to(device)
        self.meta = self.meta.to(device)
        self.target = self.target.to(device)
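Batching combines a list of DataItemMultiBranch instances into one DataBatchMulti by collecting each field across the items. The torch-free sketch below illustrates that field-wise grouping with plain lists; the Item class and collate_fields helper are hypothetical stand-ins, and a tensor-based collate function would stack each list into a tensor instead:

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass
class Item:  # hypothetical stand-in for DataItemMultiBranch
    features: List[float]
    meta: List[float]
    target: int
    index: int


def collate_fields(items: List[Item]) -> Dict[str, List[Any]]:
    """Group every dataclass field across items, keeping the field order."""
    return {f.name: [getattr(item, f.name) for item in items] for f in fields(Item)}


batch = collate_fields(
    [Item([0.1, 0.2], [0.1, 0.2], 3, 0), Item([0.3, 0.4], [0.3, 0.4], 7, 1)]
)
print(batch["meta"])    # [[0.1, 0.2], [0.3, 0.4]]
print(batch["target"])  # [3, 7]
```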

Following that, inherit from DatasetWrapper (here, via ToyDatasetWrapper) to create a torch.utils.data.Dataset that iterates over your data and returns your custom AbstractDataItem (here, the features are simply replicated as the auxiliary meta features):

multi_branch_data.py

class ToyDatasetMultiBranch(ToyDatasetWrapper):
    def __getitem__(self, index: int) -> DataItemMultiBranch:
        data = super().__getitem__(index)
        return DataItemMultiBranch(
            features=data.features,
            target=data.target,
            index=data.index,
            meta=data.features,
        )

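Subsequently, inherit from AbstractDataset (here, via ToyDataset) to create a dataset that instantiates your DatasetWrapper in _init_dataset() and returns the collate function of your custom batch class from default_collate_fn: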

multi_branch_data.py

class ToyMultiBranchData(ToyDataset):
    def _init_dataset(
        self,
        df: pd.DataFrame,
        transform: SmartCompose,
    ) -> ToyDatasetMultiBranch:
        return ToyDatasetMultiBranch(
            df=df,
            target_column=self.target_column,
            feature_shape=self.feature_shape,
            dtype=self.dtype,
            generator=self._generator,
            transform=transform,
            target_transform=self.target_transform,
        )

    @property
    def default_collate_fn(self) -> Callable:
        return DataBatchMulti.collate

Next, create a ToyMultiBranch-C.yaml configuration file for the dataset in the conf/dataset/ directory:

conf/dataset/ToyMultiBranch-C.yaml

id: ToyMultiBranch-C
_target_: multi_branch_data.ToyMultiBranchData

task: classification
size: 1000
num_targets: 10
feature_shape: 64
dev_split: 0.2
test_split: 0.2

criterion: autrainer.criterions.BalancedCrossEntropyLoss
metrics:
  - autrainer.metrics.Accuracy
  - autrainer.metrics.UAR
  - autrainer.metrics.F1
tracking_metric: autrainer.metrics.Accuracy

transform:
  type: tabular

Finally, inherit from AbstractModel and create a model whose embeddings() and forward() signatures include the meta parameter, so that the auxiliary features can be accessed in the forward pass:

multi_branch_model.py

import torch

from autrainer.models import AbstractModel


class ToyMultiBranchModel(AbstractModel):
    def __init__(self, input_dim: int, output_dim: int, hidden_dim: int) -> None:
        super().__init__(output_dim, None)  # no transfer learning
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim

        self.linear1 = torch.nn.Linear(self.input_dim, self.hidden_dim)  # features branch
        self.linear2 = torch.nn.Linear(self.input_dim, self.hidden_dim)  # meta branch
        self.out = torch.nn.Linear(self.hidden_dim * 2, self.output_dim)

    def embeddings(self, features: torch.Tensor, meta: torch.Tensor) -> torch.Tensor:
        # concatenate both branches along the feature dimension
        return torch.cat([self.linear1(features), self.linear2(meta)], dim=1)

    def forward(self, features: torch.Tensor, meta: torch.Tensor) -> torch.Tensor:
        return self.out(self.embeddings(features=features, meta=meta))
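Because forward() now takes meta in addition to features, the training loop has to pass the matching batch fields to the model; the optimizer example in this tutorial does this with create_model_inputs(model, data). The sketch below is a hypothetical, simplified illustration of how such a helper could select arguments by inspecting the signature. It is an assumption for illustration, not autrainer's implementation:

```python
import inspect
from typing import Any, Callable, Dict


def select_inputs(forward: Callable, batch: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the batch fields that appear as parameters of `forward`."""
    params = inspect.signature(forward).parameters
    return {name: batch[name] for name in params if name in batch}


def forward(features: Any, meta: Any) -> Any:  # stands in for ToyMultiBranchModel.forward
    return features, meta


batch = {"features": [1.0], "meta": [2.0], "target": 0, "index": 5}
inputs = select_inputs(forward, batch)
print(inputs)  # {'features': [1.0], 'meta': [2.0]}
print(forward(**inputs))
```

Fields such as target and index are left out because they are not parameters of forward().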

Next, create a ToyMultiBranchModel.yaml configuration file for the model in the conf/model/ directory:

conf/model/ToyMultiBranchModel.yaml

id: ToyMultiBranchModel
_target_: multi_branch_model.ToyMultiBranchModel
input_dim: 64
hidden_dim: 64

transform:
  type: tabular

Custom Optimizers

To create a custom optimizer, inherit from torch.optim.Optimizer and implement the step() method, or inherit from an existing optimizer such as torch.optim.SGD to reuse its step() method.

For example, the following optimizer implements the SGD optimizer with an additional randomly scaled learning rate using a custom step function:

random_scaled_sgd.py

from typing import Any, Callable, Dict, Optional, Tuple

import torch

from autrainer.core.structs import AbstractDataBatch
from autrainer.models import AbstractModel
from autrainer.models.utils import create_model_inputs


class RandomScaledSGD(torch.optim.SGD):
    def __init__(
        self,
        scaling_factor: float = 0.01,
        p: float = 1.0,
        generator_seed: Optional[int] = None,
        *args: Any,
        **kwargs: Dict[str, Any],
    ) -> None:
        """Randomly scaled SGD optimizer. Randomly scales the learning rate.

        Args:
            scaling_factor: Learning rate scaling factor. Defaults to 0.01.
            p: Probability of scaling the learning rate. Defaults to 1.0.
            generator_seed: Seed for the random number generator.
                Defaults to None.
            *args: Positional arguments passed to `torch.optim.SGD`.
            **kwargs: Keyword arguments passed to `torch.optim.SGD`
                (e.g., params and lr).
        """
        super().__init__(*args, **kwargs)
        self.scaling_factor = scaling_factor
        self.p = p
        self.g = torch.Generator()
        if generator_seed is not None:
            self.g.manual_seed(generator_seed)

    def custom_step(
        self,
        model: AbstractModel,  # model
        data: AbstractDataBatch,  # batched input data
        criterion: torch.nn.Module,  # loss function
        probabilities_fn: Callable,  # function to get probabilities from model outputs
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        self.zero_grad()
        output = model(**create_model_inputs(model, data))
        loss = criterion(probabilities_fn(output), data.target)
        loss.mean().backward()
        lr = self.param_groups[0]["lr"]  # current lr (may have been set by a scheduler)
        if torch.rand(1, generator=self.g).item() < self.p:
            self.param_groups[0]["lr"] = lr * self.scaling_factor
        self.step()
        self.param_groups[0]["lr"] = lr  # restore the unscaled learning rate
        return loss, output

The following configuration creates a new RandomScaledSGD.yaml optimizer in the conf/optimizer/ directory and uses the global seed of the main configuration as the generator_seed attribute:

conf/optimizer/RandomScaledSGD.yaml

id: RandomScaledSGD
_target_: random_scaled_sgd.RandomScaledSGD

scaling_factor: 0.001
p: 0.05
generator_seed: ${seed}

Note

The params and lr attributes are automatically passed to the optimizer during initialization and determined at runtime.
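The effect of the scaling_factor and p attributes can be simulated without a model: at each step, with probability p, the learning rate used for that step is multiplied by scaling_factor and then restored. The sketch below is hypothetical and uses Python's random module instead of a torch.Generator, so its random draws will not match those of the optimizer:

```python
import random
from typing import List


def simulate_step_lrs(
    lr: float, scaling_factor: float, p: float, steps: int, seed: int = 0
) -> List[float]:
    """Learning rate applied at each step of a randomly scaled SGD (simulation only)."""
    rng = random.Random(seed)
    return [lr * scaling_factor if rng.random() < p else lr for _ in range(steps)]


lrs = simulate_step_lrs(lr=0.01, scaling_factor=0.001, p=0.05, steps=1000)
scaled = sum(1 for x in lrs if x < 0.01)
print(scaled)  # roughly 5% of the 1000 steps use the scaled learning rate
```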

Custom Schedulers

To create a custom scheduler, inherit from torch.optim.lr_scheduler.LRScheduler and implement the get_lr() method.

For example, the following scheduler implements a simple linear warm-up scheduler:

linear_warm_up_lr.py

from typing import List

import torch
from torch.optim.lr_scheduler import LRScheduler


class LinearWarmUpLR(LRScheduler):
    def __init__(
        self,
        optimizer: torch.optim.Optimizer,
        warmup_steps: int,
        last_epoch: int = -1,
    ) -> None:
        """Linear warm-up learning rate scheduler.

        Args:
            optimizer: Wrapped optimizer.
            warmup_steps: Number of warmup steps.
            last_epoch: The index of last epoch. Defaults to -1.
        """
        self.warmup_steps = warmup_steps
        super().__init__(optimizer, last_epoch)

    def get_lr(self) -> List[float]:
        if self.last_epoch < self.warmup_steps:
            return [
                base_lr * (self.last_epoch + 1) / self.warmup_steps
                for base_lr in self.base_lrs
            ]
        return self.base_lrs
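The learning rates produced by get_lr() follow directly from the formula above. The standalone sketch below recomputes them for a base learning rate of 0.01 and 10 warm-up steps, assuming that, as in PyTorch's LRScheduler, last_epoch is 0 right after construction and increases by one with each scheduler step:

```python
from typing import List


def linear_warm_up(base_lr: float, warmup_steps: int, num_steps: int) -> List[float]:
    """Learning rate at last_epoch = 0, 1, ..., num_steps - 1."""
    return [
        base_lr * (step + 1) / warmup_steps if step < warmup_steps else base_lr
        for step in range(num_steps)
    ]


lrs = linear_warm_up(base_lr=0.01, warmup_steps=10, num_steps=12)
print(lrs[0], lrs[4], lrs[9], lrs[11])  # ramps up linearly, then stays at base_lr
```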

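The following configuration creates a new LinearWarmUpLR.yaml scheduler with a linear warm-up period of 10 scheduler steps in the conf/scheduler/ directory. As step_frequency is set to evaluation, each scheduler step corresponds to an evaluation rather than a single training iteration: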

conf/scheduler/LinearWarmUpLR.yaml

id: LinearWarmUpLR
_target_: linear_warm_up_lr.LinearWarmUpLR

warmup_steps: 10

step_frequency: evaluation

Note

The optimizer attribute is automatically passed to the scheduler during initialization and determined at runtime.

Custom Transforms

To create a custom transform, inherit from AbstractTransform and implement the __call__() method.

For example, the following transform denoises a spectrogram by applying a median filter:

spect_median_filter.py

import scipy.ndimage
import torch

from autrainer.core.structs import AbstractDataItem
from autrainer.transforms import AbstractTransform


class SpectMedianFilter(AbstractTransform):
    def __init__(self, size: int, order: int = 0) -> None:
        """Spectrogram median filter to remove noise.

        Args:
            size: Number of neighboring pixels to consider when filtering.
                Must be odd.
            order: The order of the transform in the pipeline. Larger means
                later in the pipeline. If multiple transforms have the same
                order, they are applied in the order they were added to the
                pipeline. Defaults to 0.
        """
        super().__init__(order=order)
        self.size = size

    def __call__(self, item: AbstractDataItem) -> AbstractDataItem:
        item.features = torch.from_numpy(
            scipy.ndimage.median_filter(
                item.features.cpu().numpy(),
                size=self.size,
            )
        ).to(item.features.device)
        return item

This transform can be used both as a preprocessing transform and as an online transform.
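The order argument determines where a transform is placed in the pipeline: smaller values run earlier, and ties keep the order in which the transforms were added. A stable sort by order reproduces exactly that behavior. The sketch below uses hypothetical (name, order) pairs in place of transform objects and assumes the pipeline behaves as the SpectMedianFilter docstring describes:

```python
from typing import List, Tuple


def arrange(transforms: List[Tuple[str, int]]) -> List[str]:
    """Sort (name, order) pairs by order; sorted() is stable, so ties keep insertion order."""
    return [name for name, _ in sorted(transforms, key=lambda t: t[1])]


pipeline = [("normalize", 1), ("median_filter", 0), ("resize", 1), ("clip", 5)]
print(arrange(pipeline))  # ['median_filter', 'normalize', 'resize', 'clip']
```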

Custom Preprocessing Transforms

To create a custom preprocessing transform, create a new file in the conf/preprocessing/ directory.

For example, the following preprocessing transform extracts log-Mel spectrograms from audio data at a sampling rate of 32 kHz and applies the custom denoising transform to the data:

conf/preprocessing/denoised_log_mel_32k.yaml

file_handler:
  autrainer.datasets.utils.AudioFileHandler:
    target_sample_rate: 32000
pipeline:
  - autrainer.transforms.StereoToMono
  - autrainer.transforms.PannMel:
      sample_rate: 32000
      window_size: 1024
      hop_size: 320
      mel_bins: 64
      fmin: 50
      fmax: 14000
      ref: 1.0
      amin: 1e-10
      top_db: null
  - spect_median_filter.SpectMedianFilter:
      size: 5

Any audio dataset can utilize this preprocessing transform by specifying the features_subdir attribute in the dataset configuration and adjusting the file_type, file_handler, and transform attributes:

conf/dataset/ExampleDataset.yaml

...
features_subdir: denoised_log_mel_32k
file_type: npy
file_handler: autrainer.datasets.utils.NumpyFileHandler
...
transform:
  type: grayscale

Note

The save() method of the file_handler specified in the dataset configuration is used to save the processed data to the features_subdir directory. The load() method of the file_handler is used to load the processed data during training and inference.

Custom Online Transforms

To create a custom online transform, no configuration file is necessary as the transform is applied at runtime and specified in the transform attribute of the model and dataset configurations using shorthand syntax.

For example, the following configuration applies the custom denoising transform to the data at runtime:

conf/dataset/ExampleDataset.yaml

...
transform:
  type: grayscale
  base:
    - spect_median_filter.SpectMedianFilter:
        size: 5

As with the custom preprocessing transform example, the custom denoising transform listed under base is applied to the train, dev, and test datasets.

It may be desirable to only apply a transform to a specific subset of the data. The following configuration applies the custom denoising transform only to the train subset of the data:

conf/dataset/ExampleDataset.yaml

...
transform:
  type: grayscale
  train:
    - spect_median_filter.SpectMedianFilter:
        size: 5
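One way to picture the resulting pipelines is that each subset receives the base transforms followed by its own subset-specific ones. The helper below is a hypothetical sketch of that merging logic on the parsed configuration (plain dictionaries instead of transform objects); it is an assumption for illustration, and autrainer's actual composition, including reordering via the order attribute, may differ:

```python
from typing import Any, Dict, List


def transforms_for_subset(transform_cfg: Dict[str, Any], subset: str) -> List[Any]:
    """Combine the `base` transforms with those listed for `subset` (train/dev/test)."""
    return list(transform_cfg.get("base", [])) + list(transform_cfg.get(subset, []))


cfg = {
    "type": "grayscale",
    "train": [{"spect_median_filter.SpectMedianFilter": {"size": 5}}],
}
for subset in ["train", "dev", "test"]:
    print(subset, transforms_for_subset(cfg, subset))  # only train gets the filter
```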

Custom Augmentations

To create a custom augmentation, inherit from AbstractAugmentation and implement the apply() method.

For example, the following augmentation scales the amplitude of a spectrogram by a random factor in a given range:

amplitude_scale_augmentation.py

from typing import Optional, Tuple

import torch

from autrainer.augmentations import AbstractAugmentation
from autrainer.core.structs import AbstractDataItem


class AmplitudeScale(AbstractAugmentation):
    def __init__(
        self,
        scale_range: Tuple[float, float],
        order: int = 0,
        p: float = 1.0,
        generator_seed: Optional[int] = None,
    ) -> None:
        """Amplitude scaling augmentation. The amplitude is randomly scaled by
        a factor drawn from scale_range.

        Args:
            scale_range: The range of the amplitude scaling factor.
            order: The order of the augmentation in the transformation pipeline.
                Defaults to 0.
            p: The probability of applying the augmentation. Defaults to 1.0.
            generator_seed: The initial seed for the internal random number
                generator drawing the probability. If None, the generator is
                not seeded. Defaults to None.

        Raises:
            ValueError: If p is not in the range [0, 1].
        """
        super().__init__(order, p, generator_seed)
        self.scale_range = scale_range
        self.scale_g = torch.Generator()
        if self.generator_seed is not None:
            self.scale_g.manual_seed(self.generator_seed)

    def apply(self, item: AbstractDataItem) -> AbstractDataItem:
        s0, s1 = self.scale_range
        scale = torch.rand(1, generator=self.scale_g)
        item.features = item.features * (scale * (s1 - s0) + s0)
        return item

The following configuration creates a new AmplitudeScale.yaml augmentation in the conf/augmentation/ directory, scaling the amplitude of the spectrogram by a random factor between 0.8 and 1.2 with a probability p of 0.5:

conf/augmentation/AmplitudeScale.yaml

id: AmplitudeScale
_target_: autrainer.augmentations.AugmentationPipeline

generator_seed: 0

pipeline:
  - amplitude_scale_augmentation.AmplitudeScale:
      scale_range: [0.8, 1.2]
      p: 0.5

As no augmentation in the pipeline specifies a generator_seed attribute, the global generator_seed attribute is broadcast to all augmentations to ensure reproducibility.
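The broadcasting rule can be sketched as follows: every augmentation that does not set its own generator_seed inherits the pipeline's value, while explicitly seeded augmentations keep theirs. This is a hypothetical illustration operating on plain dictionaries (and ignoring nested graphs), not autrainer's AugmentationPipeline code:

```python
import copy
from typing import Any, Dict, List, Optional


def broadcast_seed(
    pipeline: List[Dict[str, Any]], seed: Optional[int]
) -> List[Dict[str, Any]]:
    """Return a copy of the pipeline in which unseeded augmentations use `seed`."""
    result = copy.deepcopy(pipeline)
    for entry in result:
        for kwargs in entry.values():  # {"module.Class": {...keyword arguments...}}
            kwargs.setdefault("generator_seed", seed)
    return result


pipeline = [
    {"amplitude_scale_augmentation.AmplitudeScale": {"scale_range": [0.8, 1.2], "p": 0.5}},
    {"autrainer.augmentations.TimeMask": {"time_mask": 80, "generator_seed": 42}},
]
print(broadcast_seed(pipeline, seed=0))
```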

Custom Augmentation Graphs

Augmentations can be combined into graphs using autrainer.augmentations.Choice and autrainer.augmentations.Sequential. For example, the following configuration creates a new AmplitudeScaleOrTimeFreqMask.yaml augmentation in the conf/augmentation/ directory that applies either the custom amplitude scale augmentation or a sequence of the TimeMask and FrequencyMask augmentations:

conf/augmentation/AmplitudeScaleOrTimeFreqMask.yaml

id: AmplitudeScaleOrTimeFreqMask
_target_: autrainer.augmentations.AugmentationPipeline

generator_seed: 0

pipeline:
  - autrainer.augmentations.Choice:
      weights: [0.2, 0.8]
      choices:
        - amplitude_scale_augmentation.AmplitudeScale:
            scale_range: [0.8, 1.2]
        - autrainer.augmentations.Sequential:
            sequence:
              - autrainer.augmentations.TimeMask:
                  time_mask: 80
              - autrainer.augmentations.FrequencyMask:
                  freq_mask: 10

The custom amplitude scale augmentation is selected with a probability of 0.2, while the sequence of the TimeMask and FrequencyMask augmentations is selected with a probability of 0.8.
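The weights of Choice act as sampling probabilities for its branches. The following sketch uses Python's random.choices as a stand-in for autrainer's own seeded sampling, draws a branch for 10,000 hypothetical samples, and reports how often each branch is selected:

```python
import random
from collections import Counter

rng = random.Random(0)  # seeded for reproducibility, similar in spirit to generator_seed
branches = ["AmplitudeScale", "TimeMask+FrequencyMask"]
counts = Counter(rng.choices(branches, weights=[0.2, 0.8], k=10_000))
print({name: counts[name] / 10_000 for name in branches})  # close to 0.2 and 0.8
```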

Custom Collate Augmentations

To create a custom collate augmentation, inherit from AbstractAugmentation and implement the optional get_collate_fn() method.

The collate function is used to apply the augmentation on the batch level. If the collate function modifies the shape of the inputs or targets, the same change may need to be applied to batches for which the augmentation is skipped, so that all batches share a consistent format (e.g., one-hot encoded targets).
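When CutMix or MixUp is applied, class indices become soft label vectors, which is why batches that are left unaugmented need one-hot encoded targets as well. The following standard-library sketch shows both target formats; the mixing weight lam is a hypothetical fixed value, whereas MixUp samples it from a Beta(alpha, alpha) distribution:

```python
from typing import List


def one_hot(labels: List[int], num_classes: int) -> List[List[float]]:
    return [[1.0 if c == label else 0.0 for c in range(num_classes)] for label in labels]


def mix_targets(a: List[float], b: List[float], lam: float) -> List[float]:
    """MixUp-style soft target: lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


targets = one_hot([0, 2], num_classes=3)
print(targets)  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(mix_targets(targets[0], targets[1], lam=0.75))  # [0.75, 0.0, 0.25]
```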

For example, the following augmentation randomly applies CutMix or MixUp augmentations on the batch level:

cut_mix_up.py

from typing import TYPE_CHECKING, Callable, List, Optional

import torch
from torchvision.transforms import v2

from autrainer.augmentations import AbstractAugmentation
from autrainer.core.structs import AbstractDataBatch, AbstractDataItem


if TYPE_CHECKING:
    from autrainer.datasets import AbstractDataset


class CutMixUp(AbstractAugmentation):
    def __init__(
        self,
        alpha: float = 1.0,
        order: int = 0,
        p: float = 1.0,
        generator_seed: Optional[int] = None,
    ) -> None:
        """Randomly applies CutMix or MixUp augmentations with a probability
        of 0.5 each.

        Args:
            alpha: Hyperparameter of the Beta distribution. Defaults to 1.0.
            order: The order of the augmentation in the transformation pipeline.
                Defaults to 0.
            p: The probability of applying the augmentation. Defaults to 1.0.
            generator_seed: The initial seed for the internal random number
                generator drawing the probability. If None, the generator is
                not seeded. Defaults to None.
        """
        super().__init__(order, p, generator_seed)
        self.alpha = alpha
        self.cut_mix_up_g = torch.Generator()
        if generator_seed is not None:
            self.cut_mix_up_g.manual_seed(generator_seed)

    def get_collate_fn(
        self,
        data: "AbstractDataset",
        default: Callable,
    ) -> Callable:
        self.cutmix = v2.CutMix(num_classes=data.output_dim, alpha=self.alpha)
        self.mixup = v2.MixUp(num_classes=data.output_dim, alpha=self.alpha)

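        def _collate_fn(batch: List[AbstractDataItem]) -> AbstractDataBatch:
            # draw whether to apply the augmentation at all (seeded via self.g)
            probability = torch.rand(1, generator=self.g).item()
            batched: AbstractDataBatch = default(batch)
            if probability < self.p:
                # draw CutMix vs. MixUp independently so each has probability 0.5
                choice = torch.rand(1, generator=self.cut_mix_up_g).item()
                augment = self.cutmix if choice < 0.5 else self.mixup
                features, target = augment(batched.features, batched.target)
                batched.features = features
                batched.target = target
                return batched
            # not applied: one-hot encode the targets so that their shape
            # matches the mixed (soft) targets produced by CutMix and MixUp
            batched.target = torch.nn.functional.one_hot(
                batched.target, data.output_dim
            ).float()
            return batched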

        return _collate_fn

    def apply(self, item: AbstractDataItem) -> AbstractDataItem:
        # no-op as the augmentation is applied in the collate function
        return item
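The reason for the one-hot encoding in the fallback branch can be seen in a stdlib-only sketch of MixUp. The helpers `one_hot`, `mix`, `mixup`, and `collate` are hypothetical and operate on plain lists; pairing each sample with its predecessor in the batch and drawing the mixing weight from Beta(alpha, alpha) follow the common MixUp formulation, not necessarily torchvision's exact code. Mixed targets are soft labels of length `num_classes`, so unmixed targets must be one-hot encoded to keep the batch shape stable.

```python
import random
from typing import List, Sequence, Tuple

Batch = Tuple[List[List[float]], List[List[float]]]


def one_hot(label: int, num_classes: int) -> List[float]:
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mix(a: Sequence[float], b: Sequence[float], lam: float) -> List[float]:
    # convex combination lam * a + (1 - lam) * b
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def mixup(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
    lam: float,
) -> Batch:
    n = len(features)
    mixed_x: List[List[float]] = []
    mixed_y: List[List[float]] = []
    for i in range(n):
        j = (i - 1) % n  # pair sample i with its predecessor (batch rolled by one)
        mixed_x.append(mix(features[i], features[j], lam))
        mixed_y.append(
            mix(one_hot(labels[i], num_classes), one_hot(labels[j], num_classes), lam)
        )
    return mixed_x, mixed_y


def collate(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
    p: float,
    alpha: float,
    rng: random.Random,
) -> Batch:
    if rng.random() < p:
        lam = rng.betavariate(alpha, alpha)  # mixing weight ~ Beta(alpha, alpha)
        return mixup(features, labels, num_classes, lam)
    # not applied: one-hot encode anyway so targets always have shape (batch, num_classes)
    return [list(f) for f in features], [one_hot(y, num_classes) for y in labels]


rng = random.Random(0)
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
labels = [0, 1, 2]
applied = collate(features, labels, num_classes=3, p=1.0, alpha=1.0, rng=rng)
skipped = collate(features, labels, num_classes=3, p=0.0, alpha=1.0, rng=rng)
```

In both branches every target row has `num_classes` entries summing to 1, so downstream code such as the criterion sees the same target format regardless of whether mixing happened.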

Custom Loggers#

To create a custom logger, inherit from AbstractLogger and implement the log_params(), log_metrics(), log_timers(), and log_artifact() methods, as well as the optional setup() and end_run() methods.

All methods are called automatically at the appropriate time during training and inference.

For example, the following logger logs to Weights & Biases:

wandb_logger.py#

import os
from typing import Dict, List, Optional, Union

from omegaconf import DictConfig
import wandb

from autrainer.loggers import AbstractLogger, get_params_to_export
from autrainer.metrics import AbstractMetric


class WandBLogger(AbstractLogger):
    def __init__(
        self,
        exp_name: str,
        run_name: str,
        metrics: List[AbstractMetric],
        tracking_metric: AbstractMetric,
        artifacts: Optional[List[Union[str, Dict[str, str]]]] = None,
        output_dir: str = "wandb",
    ) -> None:
        super().__init__(exp_name, run_name, metrics, tracking_metric, artifacts)
        if not os.path.isabs(output_dir):
            output_dir = os.path.join(os.getcwd(), output_dir)
        os.makedirs(output_dir, exist_ok=True)
        self.output_dir = output_dir

    def log_params(self, params: Union[dict, DictConfig]) -> None:
        wandb.init(
            project=self.exp_name,
            name=self.run_name,
            config=get_params_to_export(params),
            dir=self.output_dir,
        )

    def log_metrics(
        self,
        metrics: Dict[str, Union[int, float]],
        iteration: Optional[int] = None,
    ) -> None:
        wandb.log(metrics, step=iteration)

    def log_timers(self, timers: Dict[str, float]) -> None:
        wandb.log(timers)

    def log_artifact(self, filename: str, path: str = "") -> None:
        artifact = wandb.Artifact(name=filename, type="model")
        artifact.add_file(os.path.join(path, filename))
        wandb.log_artifact(artifact)

    def end_run(self) -> None:
        wandb.finish()

Note that the WandBLogger assumes that wandb is installed, the API key is set, and a project with the same name as the experiment_id of the main configuration exists.

To add the WandBLogger, specify it in the main configuration by adding a list of loggers:

conf/config.yaml#

...
loggers:
  - wandb_logger.WandBLogger:
      output_dir: ${results_dir}/.wandb
...
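When experimenting without a W&B account, it can help to check what a logger receives. The following sketch is a hypothetical in-memory logger: it reuses the method names and signatures of the WandBLogger above but does not inherit from AbstractLogger, so it runs without autrainer installed. To register it with autrainer, it would need to subclass AbstractLogger and pass the constructor arguments to the base class as WandBLogger does. The metric names and values in the usage lines are made up for illustration.

```python
import os
from typing import Any, Dict, List, Optional, Tuple, Union

Number = Union[int, float]


class InMemoryLogger:
    """Hypothetical logger that records every call instead of sending it anywhere."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}
        self.metrics: List[Tuple[Optional[int], Dict[str, Number]]] = []
        self.timers: Dict[str, float] = {}
        self.artifacts: List[str] = []
        self.finished = False

    def log_params(self, params: Dict[str, Any]) -> None:
        self.params.update(params)

    def log_metrics(
        self, metrics: Dict[str, Number], iteration: Optional[int] = None
    ) -> None:
        self.metrics.append((iteration, dict(metrics)))  # copy to avoid later mutation

    def log_timers(self, timers: Dict[str, float]) -> None:
        self.timers.update(timers)

    def log_artifact(self, filename: str, path: str = "") -> None:
        self.artifacts.append(os.path.join(path, filename))

    def end_run(self) -> None:
        self.finished = True

    def history(self, name: str) -> List[Tuple[Optional[int], Number]]:
        """Return (iteration, value) pairs for one metric in logging order."""
        return [(it, m[name]) for it, m in self.metrics if name in m]


# Illustrative usage with made-up values.
logger = InMemoryLogger()
logger.log_params({"learning_rate": 0.001, "batch_size": 32})
logger.log_metrics({"train_loss": 0.9}, iteration=1)
logger.log_metrics({"train_loss": 0.7, "dev_accuracy": 0.6}, iteration=2)
logger.log_artifact("model.pt", path="results/run")
logger.end_run()
```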

Custom Callbacks#

To create a custom callback, implement a class that specifies any of the callback functions defined in CallbackSignature.
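Because a callback only defines the hooks it needs, a trainer can look each hook up by name and skip callbacks that do not provide it. The sketch below illustrates this duck-typed dispatch. `CallbackDispatcher` and `IterationCounter` are hypothetical names, the `cb_` prefix follows the method names used in this tutorial, and `SimpleNamespace` stands in for the trainer; it is not autrainer's actual callback manager.

```python
from types import SimpleNamespace
from typing import Any, List


class CallbackDispatcher:
    """Hypothetical dispatcher: calls a hook only on callbacks that define it."""

    def __init__(self, callbacks: List[Any]) -> None:
        self.callbacks = callbacks

    def callback(self, position: str, **kwargs: Any) -> None:
        for cb in self.callbacks:
            hook = getattr(cb, f"cb_{position}", None)  # e.g. cb_on_iteration_begin
            if callable(hook):
                hook(**kwargs)


class IterationCounter:
    """Defines a single hook; all other positions are silently skipped."""

    def __init__(self) -> None:
        self.count = 0
        self.iterations: List[int] = []

    def cb_on_iteration_begin(self, trainer: Any, iteration: int) -> None:
        self.count += 1
        self.iterations.append(iteration)


trainer = SimpleNamespace()  # stand-in for ModularTaskTrainer
counter = IterationCounter()
dispatcher = CallbackDispatcher([counter])
dispatcher.callback("on_train_begin", trainer=trainer)  # no such hook: no-op
for iteration in range(1, 4):
    dispatcher.callback("on_iteration_begin", trainer=trainer, iteration=iteration)
```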

For example, the following callback tracks learning rate changes at the beginning of each iteration:

lr_tracker_callback.py#

from typing import TYPE_CHECKING


if TYPE_CHECKING:
    from autrainer.training import ModularTaskTrainer


class LRTrackerCallback:
    def cb_on_train_begin(self, trainer: "ModularTaskTrainer") -> None:
        self.lr = trainer.optimizer.param_groups[0]["lr"]

    def cb_on_iteration_begin(
        self,
        trainer: "ModularTaskTrainer",
        iteration: int,
    ) -> None:
        current_lr = trainer.optimizer.param_groups[0]["lr"]
        if current_lr != self.lr:
            print(
                f"Learning rate changed from {self.lr} "
                f"to {current_lr} in iteration {iteration}."
            )
            self.lr = current_lr

To add the LRTrackerCallback, specify it in the main configuration by adding a list of callbacks:

conf/config.yaml#

...
callbacks:
  - lr_tracker_callback.LRTrackerCallback
...

Custom Plotting#

To create a custom plotting configuration, create a new file in the conf/plotting/ directory.

For example, the following configuration uses LaTeX for text rendering with the Palatino font and a legend font size of 9, replaces None values in the run name with ~ for better readability, and adds titles as well as x- and y-axis labels to the plots.

conf/plotting/LaTeX.yaml#

figsize: [10, 5] # figure size in inches
latex: true # use LaTeX for text rendering
filetypes: [png, pdf] # save figures in these formats
pickle: true # save the figure data in a pickle file
context: notebook # seaborn context
palette: colorblind # seaborn color palette
replace_none: true # replace None with ~
add_titles: true
add_xlabels: true
add_ylabels: true

rcParams:
  font.serif: Palatino # LaTeX font
  font.family: serif
  legend.fontsize: 9

To add the LaTeX.yaml plotting configuration, specify it in the main configuration by overriding the plotting attribute:

conf/config.yaml#

defaults:
  - ...
  - override plotting: LaTeX
...
