Augmentations#

Augmentations are optional and not used by default. This is indicated by the absence of the augmentation attribute in the sweeper configuration (implicitly set to a None configuration file). To use an augmentation, specify it in the sweeper configuration file (conf/config.yaml).

Tip

To create custom augmentations, refer to the custom augmentations tutorial.

Augmentations are specified analogously to transforms using shorthand syntax and have an order attribute to define the order of the augmentations. The augmentations are combined with the transform pipeline and sorted based on the order of the augmentations as well as the transforms.
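As a rough illustration of how transforms and augmentations could end up in one ordered pipeline, the sketch below merges two lists and sorts them by an order attribute. The Step dataclass, the merge_by_order helper, and the step names are hypothetical stand-ins rather than autrainer classes, and the tie-breaking rule (stable sort, transforms first) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """Hypothetical stand-in for a transform or an augmentation."""

    name: str
    order: int = 0


def merge_by_order(transforms: List[Step], augmentations: List[Step]) -> List[Step]:
    # sorted() is stable, so steps with equal order keep their relative
    # position (transforms before augmentations in this sketch).
    return sorted(transforms + augmentations, key=lambda step: step.order)


transforms = [Step("ToTensor", order=-10), Step("Normalize", order=5)]
augmentations = [Step("GaussianNoise", order=0), Step("TimeMask", order=2)]
pipeline = merge_by_order(transforms, augmentations)
print([step.name for step in pipeline])
```

In this toy setup both augmentations land between the two transforms, because their order values fall between -10 and 5.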

In addition to its order, each augmentation accepts a probability p of being applied. The optional generator_seed attribute seeds the random number generator that draws this probability.

Augmentation Pipelines#

The AugmentationManager is responsible for building the augmentation pipeline.

class autrainer.augmentations.AugmentationManager(train_augmentation=None, dev_augmentation=None, test_augmentation=None)[source]#

Manage the creation of the augmentation pipelines for train, dev, and test sets.

Parameters:

train_augmentation (Union[DictConfig, Dict, None]) – Train augmentation configuration.
dev_augmentation (Union[DictConfig, Dict, None]) – Dev augmentation configuration.
test_augmentation (Union[DictConfig, Dict, None]) – Test augmentation configuration.

get_augmentations()[source]#

Get augmentation pipelines for train, dev, and test.

Return type:: Tuple[SmartCompose, SmartCompose, SmartCompose]
Returns:: Tuple of augmentation pipelines for train, dev, and test.

The AugmentationPipeline class is used to define the configuration and instantiate the augmentation pipeline.

class autrainer.augmentations.AugmentationPipeline(pipeline, generator_seed=0, increment=True)[source]#

Initialize an augmentation pipeline.

Parameters:

pipeline (List[Union[str, Dict[str, Any]]]) – The list of augmentations to apply.
generator_seed (int) – Seed to pass to each augmentation for reproducibility if the augmentation does not have a seed. Defaults to 0.
increment (bool) – Whether to increment the generator seed for each augmentation that does not define its own seed. Defaults to True.

create_pipeline()[source]#

Create the composed and ordered augmentation pipeline.

Return type:: SmartCompose
Returns:: Composed augmentation pipeline.
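The interplay of generator_seed and increment can be sketched as follows. This is a minimal illustration under assumptions, not the library's implementation: assign_seeds is a hypothetical helper, the configurations are plain dictionaries, and the counter is assumed to advance only for augmentations that receive the pipeline seed.

```python
from typing import Any, Dict, List, Optional


def assign_seeds(
    configs: List[Dict[str, Any]],
    generator_seed: int = 0,
    increment: bool = True,
) -> List[Optional[int]]:
    """Return the seed each augmentation would receive in this sketch."""
    seeds: List[Optional[int]] = []
    next_seed = generator_seed
    for config in configs:
        if config.get("generator_seed") is not None:
            # The augmentation defines its own seed, so it is kept as-is.
            seeds.append(config["generator_seed"])
            continue
        seeds.append(next_seed)
        if increment:
            # Assumption: only augmentations without a seed advance the counter.
            next_seed += 1
    return seeds


configs = [{"p": 0.5}, {"p": 0.5, "generator_seed": 42}, {"p": 0.2}]
print(assign_seeds(configs, generator_seed=7))
print(assign_seeds(configs, generator_seed=7, increment=False))
```

With increment enabled, augmentations sharing the same p no longer draw identical random sequences, which avoids them always being applied or skipped together.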

Abstract Augmentation#

class autrainer.augmentations.AbstractAugmentation(order=0, p=1.0, generator_seed=None, **kwargs)[source]#

Abstract class for an augmentation.

Parameters:

order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.
kwargs – Additional keyword arguments to store in the object.

Raises:

ValueError – If p is not in the range [0, 1].

offset_generator_seed(offset)[source]#

Offset the generator seed used to draw the probability of applying the augmentation. Useful for ensuring reproducibility and randomness of augmentations when using multiple workers.

Parameters:: offset (int) – Offset to add to the generator seed. Usually the worker index.
Return type:: None

__call__(item)[source]#

Call the augmentation apply method with probability p.

Parameters:: item (AbstractDataItem) – The input data item.
Return type:: AbstractDataItem
Returns:: The augmented item if the drawn random value is less than p, otherwise the input item.

abstract apply(item)[source]#

Apply the augmentation to the input data item.

Apply is called with probability p.

Parameters:: item (AbstractDataItem) – The input data item.
Return type:: AbstractDataItem
Returns:: The augmented item.
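The documented behaviour of p, generator_seed, and offset_generator_seed can be approximated with the standard library. The ProbabilisticStep class below is a hypothetical stand-in that operates on plain numbers and uses random.Random; the actual class works on AbstractDataItem objects and may use a different generator. Skipping the offset for unseeded generators is also an assumption.

```python
import random
from typing import Optional


class ProbabilisticStep:
    """Hypothetical stand-in mirroring the documented p / generator_seed logic."""

    def __init__(self, p: float = 1.0, generator_seed: Optional[int] = None):
        if not 0 <= p <= 1:
            raise ValueError(f"p must be in the range [0, 1], got {p}.")
        self.p = p
        self.generator_seed = generator_seed
        # random.Random(None) is seeded from system entropy, i.e. not reproducible.
        self._rng = random.Random(generator_seed)

    def offset_generator_seed(self, offset: int) -> None:
        # Give each worker its own reproducible random stream.
        if self.generator_seed is not None:
            self.generator_seed += offset
            self._rng = random.Random(self.generator_seed)

    def __call__(self, item: float) -> float:
        # random() returns a float in [0.0, 1.0); apply only if it is below p.
        if self._rng.random() < self.p:
            return self.apply(item)
        return item

    def apply(self, item: float) -> float:
        return item + 1  # toy augmentation on a number


first = ProbabilisticStep(p=0.5, generator_seed=3)
second = ProbabilisticStep(p=0.5, generator_seed=3)
print([first(0) for _ in range(8)] == [second(0) for _ in range(8)])
```

Two instances with the same seed make identical apply/skip decisions, while offsetting the seed per worker keeps each worker reproducible without all workers drawing the same sequence.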

Augmentation Wrappers#

For easier access to common augmentation libraries, autrainer provides wrappers for torchaudio, torchvision, audiomentations, torch-audiomentations, and albumentations augmentations.

The underlying augmentation is specified with the name attribute, representing the class name of the augmentation. Any further attributes are passed as keyword arguments to the augmentation constructor.
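The name-based lookup can be pictured as a getattr call on the library namespace followed by a constructor call with the remaining attributes. The sketch below uses a toy namespace and a toy Gain class in place of a real library; build_from_name is a hypothetical helper and not part of autrainer.

```python
from types import SimpleNamespace
from typing import Any


class Gain:
    """Toy augmentation class standing in for a library transform."""

    def __init__(self, factor: float = 1.0):
        self.factor = factor

    def __call__(self, x: float) -> float:
        return x * self.factor


# Hypothetical namespace playing the role of a module such as torchaudio.transforms.
toy_library = SimpleNamespace(Gain=Gain)


def build_from_name(library: Any, name: str, **kwargs: Any) -> Any:
    # Look up the class by its name and forward all further attributes
    # as keyword arguments to its constructor.
    try:
        cls = getattr(library, name)
    except AttributeError as exc:
        raise ValueError(f"'{name}' is not a valid augmentation name.") from exc
    return cls(**kwargs)


augmentation = build_from_name(toy_library, "Gain", factor=2.0)
print(augmentation(1.5))
```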

Note

For each augmentation, the probability p of applying the augmentation is always available, if the underlying augmentation supports it. If not specified, the default value is 1.0, overriding any existing default value of the library.

Both torch-audiomentations and albumentations augmentations are optional and can be installed using the following commands.

pip install autrainer[albumentations]
pip install autrainer[torch-audiomentations]

class autrainer.augmentations.TorchaudioAugmentation(name, order=0, p=1.0, generator_seed=None, **kwargs)[source]#

Wrapper around torchaudio.transforms transforms, which are specified by their class name and keyword arguments.

Important: While the probability of applying the augmentation is deterministic if the generator_seed is set, the actual augmentation applied is not deterministic. This is because the internal random number generator of the augmentation is not seeded.

Parameters:

name (str) – Name of the torchaudio augmentation. Must be a valid torchaudio.transforms transform class name.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.
kwargs – Keyword arguments passed to the torchaudio augmentation.

class autrainer.augmentations.TorchvisionAugmentation(name, order=0, p=1.0, generator_seed=None, **kwargs)[source]#

Wrapper around torchvision.transforms.v2 transforms, which are specified by their class name and keyword arguments.

Functionals are currently not supported.

Important: While the probability of applying the augmentation is deterministic if the generator_seed is set, the actual augmentation applied is not deterministic. This is because the internal random number generator of the augmentation is not seeded.

Parameters:

name (str) – Name of the torchvision augmentation. Must be a valid torchvision.transforms.v2 transform class name.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.
kwargs – Keyword arguments passed to the torchvision augmentation.

class autrainer.augmentations.AudiomentationsAugmentation(name, sample_rate=None, order=0, p=1.0, generator_seed=None, **kwargs)[source]#

Wrapper around audiomentations transforms, which are specified by their class name and keyword arguments.

Audiomentations operates on numpy arrays, so the input tensor is converted to a numpy array before applying the augmentation, and the output numpy array is converted back to a tensor.

Important: While the probability of applying the augmentation is deterministic if the generator_seed is set, the actual augmentation applied is not deterministic. This is because the internal random number generator of the augmentation is not seeded.

Parameters:

name (str) – Name of the audiomentations augmentation. Must be a valid audiomentations transform class name.
sample_rate (Optional[int]) – The sample rate of the audio data. Should be specified for most audio augmentations. If None, the sample rate is not passed to the augmentation. Defaults to None.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.
kwargs – Keyword arguments passed to the audiomentations augmentation.

class autrainer.augmentations.TorchAudiomentationsAugmentation(name, sample_rate, order=0, p=1.0, generator_seed=None, **kwargs)[source]#

Wrapper around torch_audiomentations transforms, which are specified by their class name and keyword arguments.

Important: While the probability of applying the augmentation is deterministic if the generator_seed is set, the actual augmentation applied is not deterministic. This is because the internal random number generator of the augmentation is not seeded.

Parameters:

name (str) – Name of the torch_audiomentations augmentation. Must be a valid torch_audiomentations transform class name.
sample_rate (int) – The sample rate of the audio data.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.
kwargs – Keyword arguments passed to the torch_audiomentations augmentation.

class autrainer.augmentations.AlbumentationsAugmentation(name, order=0, p=1.0, generator_seed=None, **kwargs)[source]#

Wrapper around albumentations transforms, which are specified by their class name and keyword arguments.

Albumentations operates on numpy arrays, so the input tensor is converted to a numpy array before applying the augmentation, and the output numpy array is converted back to a tensor.

Important: While the probability of applying the augmentation is deterministic if the generator_seed is set, the actual augmentation applied is not deterministic. This is because the internal random number generator of the augmentation is not seeded.

Parameters:

name (str) – Name of the albumentations augmentation. Must be a valid albumentations transform class name.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.
kwargs – Keyword arguments passed to the albumentations augmentation.

Augmentation Graphs#

To create more complex augmentation pipelines which may resemble a graph structure, Sequential and Choice can be used.

Tip

To create custom augmentation graphs, refer to the custom augmentation graphs tutorial.

class autrainer.augmentations.Sequential(sequence, order=0, p=1.0, generator_seed=None)[source]#

Create a fixed sequence of augmentations.

The order attributes of the augmentations in the list are not considered. Instead, the whole sequence is placed according to the order of the Sequential augmentation itself. This means that the augmentations are applied in the order in which they are defined in the list, without being interleaved with any other transform.

Augmentations in the list must not have a collate function.

Parameters:

sequence (List[Dict]) – A list of (shorthand syntax) dictionaries defining the augmentation sequence.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.

offset_generator_seed(offset)[source]#

Offset the generator seed used to draw the probability of applying the augmentation. Useful for ensuring reproducibility and randomness of augmentations when using multiple workers.

Parameters:: offset (int) – Offset to add to the generator seed. Usually the worker index.
Return type:: None

apply(item)[source]#

Apply all augmentations in sequence to the input data item.

Parameters:: item (AbstractDataItem) – The input data item.
Return type:: AbstractDataItem
Returns:: The augmented item.
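Conceptually, a fixed sequence is function composition in list order. The make_sequential helper below is a hypothetical, simplified sketch on plain numbers; it ignores probabilities and seeds.

```python
from typing import Callable, List

Augmentation = Callable[[float], float]


def make_sequential(sequence: List[Augmentation]) -> Augmentation:
    """Compose augmentations so they run in list order as one unit."""

    def apply(item: float) -> float:
        for augmentation in sequence:
            item = augmentation(item)
        return item

    return apply


double_then_add = make_sequential([lambda x: x * 2, lambda x: x + 1])
add_then_double = make_sequential([lambda x: x + 1, lambda x: x * 2])
print(double_then_add(3), add_then_double(3))
```

Swapping the list entries changes the result, which is why the list order, rather than any order attribute of the individual entries, determines the outcome.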

class autrainer.augmentations.Choice(choices, weights=None, order=0, p=1.0, generator_seed=None)[source]#

Choose one augmentation from a list of augmentations with a given probability.

The order attributes of the augmentations in the list are not considered. Instead, the chosen augmentation is placed according to the order of the Choice augmentation itself.

Augmentations in the list must not have a collate function.

Parameters:

choices (List[Dict]) – A list of (shorthand syntax) dictionaries defining the augmentations to choose from.
weights (Optional[List[float]]) – A list of weights for each choice. If None, all augmentations are assigned equal weights. Defaults to None.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.

Raises:

ValueError – If choices and weights have different lengths.
ValueError – If any augmentation has a collate function.

offset_generator_seed(offset)[source]#

Offset the generator seed used to draw the probability of applying the augmentation. Useful for ensuring reproducibility and randomness of augmentations when using multiple workers.

Parameters:: offset (int) – Offset to add to the generator seed. Usually the worker index.
Return type:: None

apply(item)[source]#

Choose one augmentation from the list of augmentations based on the given weights.

Parameters:: item (AbstractDataItem) – The input data item.
Return type:: AbstractDataItem
Returns:: The augmented item.
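A weighted choice between augmentations can be sketched with random.Random.choices. The make_choice helper and its seed parameter are hypothetical and only approximate the documented choices and weights arguments; how the library seeds the selection itself is not stated here.

```python
import random
from typing import Callable, List, Optional

Augmentation = Callable[[float], float]


def make_choice(
    choices: List[Augmentation],
    weights: Optional[List[float]] = None,
    seed: Optional[int] = None,
) -> Augmentation:
    if weights is not None and len(weights) != len(choices):
        raise ValueError("choices and weights must have the same length.")
    rng = random.Random(seed)

    def apply(item: float) -> float:
        # weights=None makes every augmentation equally likely.
        (augmentation,) = rng.choices(choices, weights=weights, k=1)
        return augmentation(item)

    return apply


def negate(x: float) -> float:
    return -x


def square(x: float) -> float:
    return x * x


pick = make_choice([negate, square], weights=[0.25, 0.75], seed=0)
print([pick(3.0) for _ in range(5)])
```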

Note

The order of Sequential and Choice can be defined in the configuration file by the order attribute. However, order attributes of the augmentations within the Sequential and Choice are ignored. As the augmentations are applied in a scoped manner, their order is determined by the order of the augmentations in the configuration file.

Spectrogram Augmentations#

class autrainer.augmentations.GaussianNoise(mean=0.0, std=1.0, order=0, p=1.0, generator_seed=None)[source]#

Add Gaussian noise to the input tensor with mean and standard deviation.

Parameters:

mean (float) – The mean of the Gaussian noise. Defaults to 0.0.
std (float) – The standard deviation of the Gaussian noise. Defaults to 1.0.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.
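Additive Gaussian noise amounts to adding an independent normal sample to every element. The sketch below works on a flat list of floats with the standard library; add_gaussian_noise and its seed parameter are hypothetical, whereas the class above operates on tensors.

```python
import random
from typing import List, Optional


def add_gaussian_noise(
    values: List[float],
    mean: float = 0.0,
    std: float = 1.0,
    seed: Optional[int] = None,
) -> List[float]:
    rng = random.Random(seed)
    # Draw one independent sample from N(mean, std^2) per element.
    return [value + rng.gauss(mean, std) for value in values]


clean = [0.0, 0.5, 1.0]
print(add_gaussian_noise(clean, mean=0.0, std=0.1, seed=0))
```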

class autrainer.augmentations.TimeMask(time_mask, axis, replace_with_zero=True, order=0, p=1.0, generator_seed=None)[source]#

Mask a random number of time steps.

Important: While the probability of applying the augmentation is deterministic if the generator_seed is set, the actual augmentation applied is not deterministic. This is because the internal random number generator of the augmentation is not seeded.

Parameters:

time_mask (int) – Maximum number of time steps that will be masked.
axis (int) – Time axis. If the input is a torch Tensor, it is expected to have shape [C, H, W], where H is axis 0 and W is axis 1.
replace_with_zero (bool) – Whether to fill the mask with zeros (True) or with the tensor mean (False). Defaults to True.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.

class autrainer.augmentations.FrequencyMask(freq_mask, axis, replace_with_zero=True, order=0, p=1.0, generator_seed=None)[source]#

Mask a random number of frequency steps.

Important: While the probability of applying the augmentation is deterministic if the generator_seed is set, the actual augmentation applied is not deterministic. This is because the internal random number generator of the augmentation is not seeded.

Parameters:

freq_mask (int) – Maximum number of frequency steps that will be masked.
axis (int) – Frequency axis. If the input is a torch Tensor, it is expected to have shape [C, H, W], where H is axis 0 and W is axis 1.
replace_with_zero (bool) – Whether to fill the mask with zeros (True) or with the tensor mean (False). Defaults to True.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.
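Time and frequency masking share one idea: blank out a contiguous band of random width along one axis. The sketch below operates on a 2D nested list of shape [H, W], with the channel dimension omitted; mask_axis, its seed parameter, and the way the band width and position are drawn are assumptions for illustration and may differ from the actual implementation.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def mask_axis(
    spec: Matrix,
    max_width: int,
    axis: int,
    replace_with_zero: bool = True,
    seed: Optional[int] = None,
) -> Matrix:
    """Mask a random band of at most max_width steps along the given axis."""
    rng = random.Random(seed)
    height, width = len(spec), len(spec[0])
    size = height if axis == 0 else width
    band = rng.randint(0, min(max_width, size))  # number of masked steps
    start = rng.randint(0, size - band)  # first masked index
    if replace_with_zero:
        fill = 0.0
    else:
        fill = sum(map(sum, spec)) / (height * width)  # mean of all values
    masked = [row[:] for row in spec]  # copy so the input stays untouched
    for i in range(height):
        for j in range(width):
            index = i if axis == 0 else j
            if start <= index < start + band:
                masked[i][j] = fill
    return masked


spectrogram = [[1.0] * 6 for _ in range(4)]
for row in mask_axis(spectrogram, max_width=2, axis=1, seed=0):
    print(row)
```

Masking along axis 1 blanks whole columns and masking along axis 0 blanks whole rows, so the same helper covers both the time and the frequency case once the corresponding axis is passed.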

class autrainer.augmentations.TimeShift(axis, time_steps=0, order=0, p=1.0, generator_seed=None)[source]#

Shift the input tensor along the time axis.

Parameters:

axis (int) – Time axis. If the input is a torch Tensor, it is expected to have shape [C, H, W], where H is axis 0 and W is axis 1.
time_steps (int) – Maximum number of time steps by which the tensor will be shifted forward or backward. Defaults to 0.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.

class autrainer.augmentations.TimeWarp(axis, W=10, order=0, p=1.0, generator_seed=None)[source]#

Warp the input tensor along the time axis. A random point on the line passing through the center of the image along the time axis, within the time steps (W, tau - W), where tau is the number of time steps, is warped to the left or right by a distance w drawn from a uniform distribution between 0 and the time warp parameter W.

Parameters:

axis (int) – Time axis. If the input is a torch Tensor, it is expected to have shape [C, H, W], where H is axis 0 and W is axis 1.
W (int) – Bound for squishing/stretching. Defaults to 10.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.

class autrainer.augmentations.SpecAugment(time_mask=10, freq_mask=10, W=50, order=0, p=1.0, generator_seed=None)[source]#

SpecAugment augmentation. A combination of time warp, frequency masking, and time masking.

Important: While the probability of applying the augmentation is deterministic if the generator_seed is set, the actual augmentation applied is not deterministic. This is because the internal random number generator of the augmentation is not seeded.

For more information, see: https://arxiv.org/abs/1904.08779

This implementation differs from PyTorch, which applies TimeStretch instead of TimeWarp. For more information, see: https://pytorch.org/audio/master/tutorials/audio_feature_augmentation_tutorial.html#specaugment

Parameters:

time_mask (int) – Maximum number of time steps that will be masked. Defaults to 10.
freq_mask (int) – Maximum number of frequency steps that will be masked. Defaults to 10.
W (int) – Bound for squishing/stretching the time axis. Defaults to 50.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.

Augmentations with Collate Functions#

For augmentations that require a collate function, an optional get_collate_fn method can be implemented. This method is used to retrieve the collate function from the augmentation if it is present.

Tip

To create custom augmentations with collate functions, refer to the custom augmentations tutorial.

The signature of the get_collate_fn method should be as follows and return a collate function taking in a list of AbstractDataItem objects and returning a single AbstractDataBatch object.

example_augmentation.ExampleCollateAugmentation#

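from typing import Callable

from autrainer.augmentations import AbstractAugmentation

# AbstractDataset and DataBatch must be imported as well.
class ExampleCollateAugmentation(AbstractAugmentation):
    def get_collate_fn(
        self,
        data: "AbstractDataset",
        default: Callable,
    ) -> Callable:
        # Return the function that merges a list of data items into a batch.
        return DataBatch.collate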

Note

Only one collate function can be used in each transform pipeline. If multiple collate functions are defined, the last one in the pipeline (defined by the order of the transforms) is used.
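The "last one wins" rule can be sketched as a loop over the ordered pipeline that overwrites the collate function whenever a step provides one. All classes and the resolve_collate_fn helper below are hypothetical stand-ins; only the get_collate_fn(data, default) signature is taken from the example above.

```python
from typing import Any, Callable, List


def default_collate(items: List[Any]) -> List[Any]:
    return list(items)


class PlainStep:
    """Hypothetical step without a collate function."""

    def __init__(self, order: int = 0):
        self.order = order


class CollateStep(PlainStep):
    """Hypothetical step that brings its own collate function."""

    def __init__(self, tag: str, order: int = 0):
        super().__init__(order)
        self.tag = tag

    def get_collate_fn(self, data: Any, default: Callable) -> Callable:
        # Wrap the default collate function and tag the resulting batch.
        return lambda items: (self.tag, default(items))


def resolve_collate_fn(steps: List[PlainStep], data: Any, default: Callable) -> Callable:
    collate_fn = default
    for step in sorted(steps, key=lambda s: s.order):
        getter = getattr(step, "get_collate_fn", None)
        if callable(getter):
            # A later step overrides any collate function found earlier.
            collate_fn = getter(data, default)
    return collate_fn


steps = [CollateStep("mixup", order=5), PlainStep(order=1), CollateStep("cutmix", order=2)]
collate_fn = resolve_collate_fn(steps, data=None, default=default_collate)
print(collate_fn([1, 2, 3]))
```

Here the step tagged "mixup" has the highest order, so its collate function is the one that ends up being used, regardless of its position in the input list.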

Both CutMix and MixUp augmentations require a collate function and operate on the batch level. This means that the collate function is applied to the batch of samples rather than to individual samples, and the probability of applying the augmentation acts on the batch level as well.

class autrainer.augmentations.CutMix(alpha=1.0, order=0, p=1.0, generator_seed=None)[source]#

CutMix augmentation. As CutMix utilizes a collate function, the probability of applying the augmentation is drawn for each batch.

Parameters:

alpha (float) – Hyperparameter of the Beta distribution. Defaults to 1.0.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.

class autrainer.augmentations.MixUp(alpha=1.0, order=0, p=1.0, generator_seed=None)[source]#

MixUp augmentation. As MixUp utilizes a collate function, the probability of applying the augmentation is drawn for each batch.

Parameters:

alpha (float) – Hyperparameter of the Beta distribution. Defaults to 1.0.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 0.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.
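For intuition, the standard MixUp formulation draws a mixing coefficient from Beta(alpha, alpha) and blends each sample and its target with a randomly chosen partner from the same batch. The sketch below implements that textbook formulation on nested lists; mixup_batch and its seed parameter are hypothetical, and the class above may delegate to a library implementation that differs in detail.

```python
import random
from typing import List, Optional, Tuple

Batch = List[List[float]]


def mixup_batch(
    inputs: Batch,
    targets: Batch,
    alpha: float = 1.0,
    seed: Optional[int] = None,
) -> Tuple[Batch, Batch, float]:
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)  # one mixing coefficient per batch
    partners = list(range(len(inputs)))
    rng.shuffle(partners)  # partner index for every sample

    def mix(a: List[float], b: List[float]) -> List[float]:
        return [lam * x + (1 - lam) * y for x, y in zip(a, b)]

    mixed_inputs = [mix(inputs[i], inputs[j]) for i, j in enumerate(partners)]
    mixed_targets = [mix(targets[i], targets[j]) for i, j in enumerate(partners)]
    return mixed_inputs, mixed_targets, lam


inputs = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # one-hot labels
mixed_inputs, mixed_targets, lam = mixup_batch(inputs, targets, alpha=1.0, seed=0)
print(lam, mixed_targets)
```

Because the coefficient is drawn once per batch, the mixed targets remain valid probability vectors: every row still sums to one.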

Miscellaneous Augmentations#

class autrainer.augmentations.SampleGaussianWhiteNoise(snr_df, snr_col, sample_seed=None, order=101, p=1.0, generator_seed=None)[source]#

Sample-level Gaussian white noise augmentation based on SNR values.

Parameters:

snr_df (str) – Path to a CSV file containing SNR values for each sample. Index of the CSV file must match the index of the dataset.
snr_col (str) – Name of the column containing the SNR values.
sample_seed (Optional[int]) – Seed for the random number generator used for sampling the noise. If a seed is provided, a consistent augmentation is applied to the same sample. Defaults to None.
order (int) – The order of the augmentation in the transformation pipeline. Defaults to 101.
p (float) – The probability of applying the augmentation. Defaults to 1.0.
generator_seed (Optional[int]) – The initial seed for the internal random number generator drawing the probability. If None, the generator is not seeded. Defaults to None.
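Adding white noise at a target SNR comes down to scaling the noise power relative to the signal power. The sketch below assumes the SNR is given in decibels and uses the relation noise_power = signal_power / 10^(SNR / 10); add_white_noise_at_snr is a hypothetical helper on a list of floats, and reading the per-sample SNR from a CSV file is left out.

```python
import math
import random
from typing import List, Optional


def add_white_noise_at_snr(
    signal: List[float],
    snr_db: float,
    seed: Optional[int] = None,
) -> List[float]:
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # Assumption: SNR in dB, so the power ratio is 10^(snr_db / 10).
    noise_power = signal_power / (10 ** (snr_db / 10))
    std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, std) for x in signal]


def measured_snr_db(clean: List[float], noisy: List[float]) -> float:
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


tone = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(20000)]
noisy_tone = add_white_noise_at_snr(tone, snr_db=10.0, seed=0)
print(round(measured_snr_db(tone, noisy_tone), 1))
```

Passing a fixed seed per sample, in the spirit of sample_seed, yields the same noise realisation every time that sample is loaded.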
