Core#
Core provides various utilities and entry points for the autrainer framework.
Entry Point#
The main training entry point for autrainer.
- autrainer.main(config_name, config_path=None, version_base=None)[source]#
Hydra main decorator with additional autrainer configs.
The conf directory in the current working directory is always added to the search path if it exists. The current working directory is also added to the Python path.
- Parameters:
  - config_name (str) – The name of the config (usually the file name without the .yaml extension).
  - config_path (Optional[str]) – The config path, a directory where Hydra will search for config files. If config_path is None, no directory is added to the search path. Defaults to None.
  - version_base (Optional[str]) – Hydra version base. Defaults to None.
- Return type:
Callable[[Callable[[Any],Any]],Any]
Instantiation#
Instantiation functions wrap Hydra object instantiation, adding type safety and Shorthand Syntax support.
- autrainer.instantiate(config, instance_of=None, convert=None, recursive=False, **kwargs)[source]#
Instantiate an object from a configuration Dict or DictConfig.
The config must contain a _target_ field that specifies a relative import path to the object to instantiate. If _target_ is None, returns None.
- Parameters:
  - config (Union[DictConfig,Dict]) – The configuration to instantiate.
  - instance_of (Optional[Type[T]]) – The expected type of the instantiated object. Defaults to None.
  - convert (Optional[HydraConvertEnum]) – The conversion strategy to use, one of HydraConvertEnum. Convert is only used if the config does not have a _convert_ attribute. If None, uses HydraConvertEnum.ALL. Defaults to None.
  - recursive (bool) – Whether to recursively instantiate objects. Recursive is only used if the config does not have a _recursive_ field. Defaults to False.
  - **kwargs (Dict[str,Any]) – Additional keyword arguments to pass to the object.
- Raises:
ValueError – If the config does not have a _target_ field.
ValueError – If instance_of is provided and the instantiated object is not an instance of it.
- Return type:
T
- Returns:
The instantiated object.
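The _target_ mechanism can be illustrated with a small, self-contained sketch. This is not autrainer's implementation (which delegates the real work to Hydra); toy_instantiate and its error message are hypothetical, but the _target_/keyword-argument convention it follows mirrors the description above.

```python
from importlib import import_module

def toy_instantiate(config: dict, instance_of: type = None):
    # The _target_ field names an import path; the remaining keys
    # become keyword arguments for the imported object.
    config = dict(config)
    target = config.pop("_target_")
    if target is None:
        return None
    module, _, name = target.rpartition(".")
    obj = getattr(import_module(module), name)(**config)
    if instance_of is not None and not isinstance(obj, instance_of):
        raise ValueError(f"Expected an instance of {instance_of}.")
    return obj

# collections.Counter stands in for any importable class:
counter = toy_instantiate({"_target_": "collections.Counter", "spam": 3})
```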
- autrainer.instantiate_shorthand(config, instance_of=None, convert=None, recursive=False, **kwargs)[source]#
Instantiate an object from a shorthand configuration.
A shorthand config is either a string or a dictionary with a single key. If config is a string, it should be a Python import path. If config is a dictionary, the key should be a Python import path and the value should be a dictionary of keyword arguments.
- Parameters:
  - config (Union[str,DictConfig,Dict]) – The config to instantiate.
  - instance_of (Optional[Type[T]]) – The expected type of the instantiated object. Defaults to None.
  - convert (Optional[HydraConvertEnum]) – The conversion strategy to use, one of HydraConvertEnum. Convert is only used if the config does not have a _convert_ attribute. If None, uses HydraConvertEnum.ALL. Defaults to None.
  - recursive (bool) – Whether to recursively instantiate objects. Recursive is only used if the config does not have a _recursive_ field. Defaults to False.
  - **kwargs (Dict[str,Any]) – Additional keyword arguments to pass to the object.
- Raises:
ValueError – If the config is empty (None or an empty string/dictionary).
- Return type:
T
- Returns:
The instantiated object.
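The mapping between the two shorthand forms and a full _target_ config can be sketched as follows. normalize_shorthand is a hypothetical helper, not part of the autrainer API, but the normalization it performs mirrors the description above.

```python
def normalize_shorthand(config):
    # Hypothetical helper: turn either shorthand form into a full
    # _target_ config as consumed by instantiate().
    if not config:
        raise ValueError("Empty shorthand config.")
    if isinstance(config, str):
        # "torch.optim.Adam" -> {"_target_": "torch.optim.Adam"}
        return {"_target_": config}
    # {"torch.optim.Adam": {"lr": 1e-3}}
    #   -> {"_target_": "torch.optim.Adam", "lr": 1e-3}
    (path, kwargs), = config.items()
    return {"_target_": path, **(kwargs or {})}
```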
Data Items and Batches#
autrainer provides a set of classes to represent data items and batches of data items.
The DataItem class represents individual data instances.
These classes (or structs) are meant to hold all important attributes needed by AbstractModel
objects that are compatible with a particular dataset.
In addition, they hold attributes needed by autrainer’s utilities, such as the index corresponding to each instance
(where we assume that each AbstractDataset is an ordered set of instances).
- class autrainer.core.structs.AbstractDataItem(features, target, index)[source]#
Abstract data item class for a single sample.
- Parameters:
  - features (Tensor) – Tensor of input features.
  - target (Union[int,float,List[int],List[float],ndarray]) – Target value for the input features.
  - index (int) – Index of the data sample.
- class autrainer.core.structs.DataItem(features, target, index)[source]#
Data item for a single sample.
- Parameters:
  - features (Tensor) – Tensor of input features.
  - target (Union[int,float,List[int],List[float],ndarray]) – Target value for the input features.
  - index (int) – Index of the data sample.
Additionally, autrainer provides a set of classes that hold batches of individual data instances.
They provide an implementation of collate_fn(), which collates the individual data items
into a batch of data items and must be passed to the torch.utils.data.DataLoader.
- class autrainer.core.structs.AbstractDataBatch(features, target, index)[source]#
Abstract data batch class for a batch of samples.
- Parameters:
  - features (Tensor) – Tensor of input features.
  - target (Tensor) – Tensor of target values for the input features.
  - index (Tensor) – Tensor of indices for the data samples.
- abstract to(device, **kwargs)[source]#
Move the features, target, and additional data to a device.
- Parameters:
  - device (device) – Device to move the data to.
  - kwargs (dict) – Additional keyword arguments passed to the to method.
- Return type:
None
- classmethod collate(items)[source]#
Collate a list of data items into a data batch.
- Parameters:
  - items (List[ItemType]) – List of data items to collate, where ItemType is bound to AbstractDataItem.
- Return type:
  AbstractDataBatch[ItemType]
- Returns:
  Collated data batch.
- class autrainer.core.structs.DataBatch(features, target, index)[source]#
Data batch for a batch of samples.
- Parameters:
  - features (Tensor) – Tensor of input features.
  - target (Tensor) – Tensor of target values for the input features.
  - index (Tensor) – Tensor of indices for the data samples.
Warning
features, target,
and index are the three reserved attributes that every object derived from
AbstractDataItem must include.
Moreover, every object derived from AbstractModel must include features
as the first argument (after self) of its forward() method.
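The collation pattern can be sketched with plain Python lists. ToyItem and ToyBatch are illustrative stand-ins for DataItem and DataBatch: the real classes hold torch tensors and stack them per field, rather than collecting them into lists.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ToyItem:
    features: list
    target: int
    index: int

@dataclass
class ToyBatch:
    features: List[list]
    target: List[int]
    index: List[int]

    @classmethod
    def collate(cls, items: List[ToyItem]) -> "ToyBatch":
        # Gather each field across items; with tensors this would be
        # a torch.stack per field instead of a plain list.
        return cls(
            features=[i.features for i in items],
            target=[i.target for i in items],
            index=[i.index for i in items],
        )

batch = ToyBatch.collate([ToyItem([0.1, 0.2], 1, 0), ToyItem([0.3, 0.4], 0, 1)])
```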
Callbacks#
autrainer provides a set of callback functions (cb_on_*()) that are called at various stages of the training loop.
Each callback is automatically invoked at the appropriate time during training.
For more control over the training process, custom callbacks can be defined and added to the trainer by specifying a list of
callback classes using shorthand syntax in the callbacks attribute of the
main configuration file.
Each callback class can specify any number of callback functions following the signatures defined in CallbackMixin.
Tip
To create custom callbacks, refer to the custom callbacks tutorial.
- class autrainer.core.callbacks.CallbackMixin(*args, **kwargs)[source]#
- static order(order)[source]#
Decorator to set the order of the callback in the callback list.
A larger order means the callback will be called later in the list. If multiple callbacks share the same order, they are applied in the order they were registered (i.e., in instantiation order of the objects).
Note: If used in combination with CallbackMixin.chain, the order is determined by the order of the methods in the class hierarchy, with the base class method being registered first by default.
- Parameters:
  - order (int) – The order of the callback in the callback list. Defaults to 0.
- Return type:
  Callable[...,Any]
- static chain()[source]#
Decorator to indicate that the callback should be chained and the base class implementation should also be called.
By default, the base class implementation is called first, followed by the derived (decorated) implementation.
- Return type:
Callable[...,Any]
- cb_on_train_begin(trainer)[source]#
Called at the beginning of the training loop before the first iteration.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
- Return type:
None
- cb_on_train_end(trainer)[source]#
Called at the end of the training loop after the last iteration, validation, and testing are completed.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
- Return type:
None
- cb_on_iteration_begin(trainer, iteration)[source]#
Called at the beginning of each iteration.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
- Return type:
None
- cb_on_iteration_end(trainer, iteration, metrics)[source]#
Called at the end of each iteration including validation.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
  - metrics (Dict[str,float]) – Dictionary of various metrics collected during the iteration.
- Return type:
None
- cb_on_step_begin(trainer, iteration, batch_idx)[source]#
Called at the beginning of a step within an iteration.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
  - batch_idx (int) – Current batch index within the iteration. For epoch-based training, this is the batch index within the epoch. For step-based training, this is the step number modulo the evaluation frequency.
- Return type:
None
- cb_on_step_end(trainer, iteration, batch_idx, loss)[source]#
Called at the end of a step within an iteration.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
  - batch_idx (int) – Current batch index within the iteration. For epoch-based training, this is the batch index within the epoch. For step-based training, this is the step number modulo the evaluation frequency.
  - loss (float) – Reduced loss value for the batch.
- Return type:
None
- cb_on_loader_exhausted(trainer, iteration)[source]#
Called when the training data loader is exhausted.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
- Return type:
None
- cb_on_dev_begin(trainer, iteration)[source]#
Called at the beginning of the validation loop.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
- Return type:
None
- cb_on_dev_end(trainer, iteration, dev_results)[source]#
Called at the end of the validation loop.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
  - dev_results (Dict[str,float]) – Dictionary of validation results for the entire validation loop of the current iteration.
- Return type:
None
- cb_on_dev_step_begin(trainer, batch_idx)[source]#
Called at the beginning of the validation step.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - batch_idx (int) – Current batch index within the validation loop.
- Return type:
None
- cb_on_dev_step_end(trainer, batch_idx, loss)[source]#
Called at the end of the validation step.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - batch_idx (int) – Current batch index within the validation loop.
  - loss (float) – Reduced loss value for the batch.
- Return type:
None
- cb_on_test_begin(trainer)[source]#
Called at the beginning of the testing loop.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
- Return type:
None
- cb_on_test_end(trainer, test_results)[source]#
Called at the end of the testing loop.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - test_results (Dict[str,float]) – Dictionary of test results for the entire testing loop.
- Return type:
None
- cb_on_test_step_begin(trainer, batch_idx)[source]#
Called at the beginning of the testing step.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - batch_idx (int) – Current batch index within the testing loop.
- Return type:
None
- cb_on_test_step_end(trainer, batch_idx, loss)[source]#
Called at the end of the testing step.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - batch_idx (int) – Current batch index within the testing loop.
  - loss (float) – Reduced loss value for the batch.
- Return type:
None
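As a sketch of a custom callback, the following class tracks the best validation loss across iterations. It relies only on the cb_on_dev_end() signature documented above; the "dev_loss" metric key is an assumed name for illustration, and the class would be registered via the callbacks list of the main configuration using shorthand syntax.

```python
class BestLossTracker:
    """Illustrative custom callback: remember the best validation loss.

    Duck-typed against the cb_on_* signatures above; no base class is
    strictly required to receive callback invocations.
    """

    def __init__(self) -> None:
        self.best_loss = float("inf")
        self.best_iteration = None

    def cb_on_dev_end(self, trainer, iteration, dev_results):
        # dev_results is a dict of validation metrics for this iteration;
        # "dev_loss" is an assumed key, not guaranteed by autrainer.
        loss = dev_results.get("dev_loss")
        if loss is not None and loss < self.best_loss:
            self.best_loss = loss
            self.best_iteration = iteration
```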
- class autrainer.core.callbacks.CallbackManager[source]#
Utils#
Utils provide various helpers for I/O, logging, timing, and hardware information.
- class autrainer.core.utils.Bookkeeping(output_directory, file_handler_path=None)[source]#
Bookkeeping to handle general disk operations and interactions.
- Parameters:
  - output_directory (str) – Output directory to save files to.
  - file_handler_path (Optional[str]) – Path to save the log file to. Defaults to None.
- log(message, level=20)[source]#
Log a message.
- Parameters:
  - message (str) – Message to log.
  - level (int) – Logging level. Defaults to logging.INFO.
- Return type:
None
- log_to_file(message, level=20)[source]#
Log a message to the file handler.
- Parameters:
  - message (str) – Message to log.
  - level (int) – Logging level. Defaults to logging.INFO.
- Return type:
None
- create_folder(folder_name, path='')[source]#
Create a new folder in the output directory.
- Parameters:
  - folder_name (str) – Name of the folder to create.
  - path (str) – Subdirectory to create the folder in. Defaults to “”.
- Return type:
None
- save_model_summary(model, shape, device, filename)[source]#
Save a model summary to a file.
- Parameters:
  - model (Module) – Model to summarize.
  - shape (Tuple[int,...]) – Shape of the input to the model.
  - device (device) – Device to run the model summary on.
  - filename (str) – Name of the file to save the summary to.
- Return type:
None
- save_state(obj, filename, path='')[source]#
Save the state of an object.
- Parameters:
  - obj (Union[Module,Optimizer,LRScheduler,None]) – Object to save the state of. If None, do nothing.
  - filename (str) – Name of the file to save the state to.
  - path (str) – Subdirectory to save the state to. Defaults to “”.
- Raises:
TypeError – If the object type is not supported.
- Return type:
None
- load_state(obj, filename, path='')[source]#
Load the state of an object.
- Parameters:
  - obj (Union[Module,Optimizer,LRScheduler]) – Object to load the state into.
  - filename (str) – Name of the file to load the state from.
  - path (str) – Subdirectory to load the state from. Defaults to “”.
- Raises:
TypeError – If the object type is not supported.
FileNotFoundError – If the file is not found.
- Return type:
None
- save_audobject(obj, filename, path='')[source]#
Save an audobject.Object to disk.
- Parameters:
  - obj (Object) – Object to save.
  - filename (str) – Name of the file to save the object to.
  - path (str) – Subdirectory to save the object to. Defaults to “”.
- Raises:
TypeError – If the object type is not supported.
- Return type:
None
- save_results_dict(results_dict, filename, path='')[source]#
Save a results dictionary to disk.
- Parameters:
  - results_dict (Dict[str,float]) – Dictionary of metric names and values to save.
  - filename (str) – Name of the file to save the results to.
  - path (str) – Subdirectory to save the results to. Defaults to “”.
- Return type:
None
- save_results_df(results_df, filename, path='')[source]#
Save a results DataFrame to disk.
- Parameters:
  - results_df (DataFrame) – DataFrame to save.
  - filename (str) – Name of the file to save the results to.
  - path (str) – Subdirectory to save the results to. Defaults to “”.
- Return type:
None
- save_results_np(results_np, filename, path='')[source]#
Save a results numpy array to disk.
- Parameters:
  - results_np (ndarray) – Numpy array to save.
  - filename (str) – Name of the file to save the results to.
  - path (str) – Subdirectory to save the results to. Defaults to “”.
- Return type:
None
- save_best_results(metrics, filename, metric_fns, tracking_metric_fn, path='')[source]#
Save the best results to disk.
- Parameters:
  - metrics (DataFrame) – DataFrame of metrics to save.
  - filename (str) – Name of the file to save the best results to.
  - metric_fns (List[AbstractMetric]) – List of metric functions to get the best results from.
  - tracking_metric_fn (AbstractMetric) – Tracking metric function to get the best iteration from.
  - path (str) – Subdirectory to save the best results to. Defaults to “”.
- Return type:
None
- class autrainer.core.utils.Timer(output_directory, timer_type)[source]#
Timer to measure time of different parts of the training process.
- Parameters:
  - output_directory (str) – Directory to save the timer.yaml file to.
  - timer_type (str) – Name of the timer.
- stop()[source]#
Stop the timer.
- Raises:
ValueError – If the timer was not started.
- Return type:
None
- get_mean_seconds()[source]#
Get the mean time in seconds.
- Return type:
float
- Returns:
Mean time in seconds.
- get_total_seconds()[source]#
Get the total time in seconds.
- Return type:
float
- Returns:
Total time in seconds.
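The bookkeeping Timer performs can be approximated with a few lines of stdlib Python. ToyTimer is an illustrative stand-in, not the real class: it assumes a start() counterpart to stop() (implied by the ValueError above) and omits writing the timer.yaml file.

```python
import time

class ToyTimer:
    """Illustrative stand-in for autrainer.core.utils.Timer: accumulate
    start/stop intervals and report mean and total seconds."""

    def __init__(self) -> None:
        self._durations = []
        self._start = None

    def start(self) -> None:
        self._start = time.perf_counter()

    def stop(self) -> None:
        if self._start is None:
            raise ValueError("Timer was not started.")
        self._durations.append(time.perf_counter() - self._start)
        self._start = None

    def get_mean_seconds(self) -> float:
        return sum(self._durations) / len(self._durations)

    def get_total_seconds(self) -> float:
        return sum(self._durations)
```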
- autrainer.core.utils.get_hardware_info(device)[source]#
Get hardware information of the current system.
- Parameters:
  - device (device) – Device to get the hardware information from.
- Return type:
  dict
- Returns:
  Dictionary containing system and GPU information.
- autrainer.core.utils.save_hardware_info(output_directory, device)[source]#
Save hardware information to a hardware.yaml file.
- Parameters:
  - output_directory (str) – Directory to save the hardware information to.
  - device (device) – Device to get the hardware information from.
- Return type:
None
- autrainer.core.utils.set_seed(seed)[source]#
Set a global seed for random, numpy, and torch.
If CUDA is available, set the seed for CUDA and cuDNN as well.
- Parameters:
  - seed (int) – Seed to set.
- Return type:
None
- autrainer.core.utils.set_reproducibility(reproducible)[source]#
Set the reproducibility of the training process.
- Parameters:
  - reproducible (bool) – Whether to make the training process reproducible. If True, the training process will be deterministic and use only deterministic algorithms. If False, the training process may be non-deterministic and use non-deterministic algorithms.
- Return type:
None
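What set_seed does can be sketched as follows. seed_everything is an illustrative stand-in showing the kind of per-library seeding involved, not autrainer's exact implementation (which also configures CUDA and cuDNN as noted above).

```python
import random

def seed_everything(seed: int) -> None:
    # Illustrative stand-in for autrainer.core.utils.set_seed: seed each
    # source of randomness the training process touches.
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```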
Plotting#
Plotting provides a simple interface to plot metrics of a single run during Training as well as multiple runs during Postprocessing.
Tip
To create custom plotting configurations, refer to the custom plotting configurations tutorial.
By default, training plots are saved as png files for each metric. This can optionally be extended to any format supported by Matplotlib and additionally pickled for further processing.
Note
Plots are fully customizable by providing Matplotlib rcParams in a custom plotting configuration.
- class autrainer.core.plotting.PlotBase(output_directory, training_type, figsize, latex, filetypes, pickle, context, palette, replace_none, add_titles, add_xlabels, add_ylabels, rcParams)[source]#
Base class for plotting.
- Parameters:
  - output_directory (str) – Output directory to save plots to.
  - training_type (str) – Type of training in [“Epoch”, “Step”].
  - figsize (tuple) – Figure size in inches.
  - latex (bool) – Whether to use LaTeX in plots. Requires the latex package. To install all necessary dependencies, run: pip install autrainer[latex].
  - filetypes (list) – Filetypes to save plots as.
  - pickle (bool) – Whether to save additional pickle files of the plots.
  - context (str) – Context for seaborn plots.
  - palette (str) – Color palette for seaborn plots.
  - replace_none (bool) – Whether to replace “None” in labels with “~”.
  - add_titles (bool) – Whether to add titles to plots.
  - add_xlabels (bool) – Whether to add x-labels to plots.
  - add_ylabels (bool) – Whether to add y-labels to plots.
  - rcParams (Dict[str,Any]) – Additional Matplotlib rcParams to set.
- save_plot(fig, name, path='', close=True, tight_layout=True)[source]#
Save a plot to the output directory.
- Parameters:
  - fig (Figure) – Matplotlib figure to save.
  - name (str) – Name of the plot.
  - path (str) – Path to save the plot to relative to the output directory.
  - close (bool) – Whether to close the figure after saving.
  - tight_layout (bool) – Whether to apply tight layout to the plot.
- Return type:
None
- class autrainer.core.plotting.PlotMetrics(output_directory, training_type, figsize, latex, filetypes, pickle, context, palette, replace_none, add_titles, add_xlabels, add_ylabels, rcParams, metric_fns)[source]#
Plot the metrics of one or multiple runs.
- Parameters:
  - output_directory (str) – Output directory to save plots to.
  - training_type (str) – Type of training in [“Epoch”, “Step”].
  - figsize (tuple) – Figure size in inches.
  - latex (bool) – Whether to use LaTeX in plots. Requires the latex package. To install all necessary dependencies, run: pip install autrainer[latex].
  - filetypes (list) – Filetypes to save plots as.
  - pickle (bool) – Whether to save additional pickle files of the plots.
  - context (str) – Context for seaborn plots.
  - palette (str) – Color palette for seaborn plots.
  - replace_none (bool) – Whether to replace “None” in labels with “~”.
  - add_titles (bool) – Whether to add titles to plots.
  - add_xlabels (bool) – Whether to add x-labels to plots.
  - add_ylabels (bool) – Whether to add y-labels to plots.
  - rcParams (Dict[str,Any]) – Additional Matplotlib rcParams to set.
  - metric_fns (List[AbstractMetric]) – List of metrics to use for plotting.
Default Configurations
Default (conf/plotting/Default.yaml):

figsize: [10, 5]
latex: false
filetypes: [png]
pickle: false
context: notebook
palette: colorblind
replace_none: false
add_titles: true
add_xlabels: true
add_ylabels: true

rcParams:
  legend.fontsize: 9
- plot_run(metrics, std_scale=0.1)[source]#
Plot the metrics of a single run.
- Parameters:
  - metrics (DataFrame) – DataFrame containing the metrics.
  - std_scale (float) – Scale factor for the standard deviation. Defaults to 0.1.
- Return type:
None
- plot_metric(metrics, metric, metrics_std=None, std_scale=0.1, max_runs=None)[source]#
Plot a single metric of multiple runs.
- Parameters:
  - metrics (DataFrame) – DataFrame containing the metrics.
  - metric (str) – Metric to plot.
  - metrics_std (Optional[DataFrame]) – DataFrame containing the standard deviations. Defaults to None.
  - std_scale (float) – Scale factor for the standard deviation. Defaults to 0.1.
  - max_runs (Optional[int]) – Maximum number of best runs to plot. If None, all runs are plotted. Defaults to None.
- Return type:
None
- plot_aggregated_bars(metrics_df, metric, subplots_by=0, group_by=1, split_subgroups=True)[source]#
Plot aggregated bar plots for a metric.
Generate bar plots from metrics_df, divided into subplots by the “subplots_by” column and grouped within each subplot according to the “group_by” column. If “split_subgroups” is true, each group is further split into subgroups based on what follows a potential “-” in the “group_by” entry. Finally, the “metric” entries are averaged to create the bars, and the standard deviation is shown as error bars.
- Parameters:
  - metrics_df (DataFrame) – DataFrame containing the metrics.
  - metric (str) – Metric to plot.
  - subplots_by (int) – Column to group the subplots by.
  - group_by (int) – Column to group the data by.
  - split_subgroups (bool) – Whether to split subgroups.
- Return type:
None
Constants#
autrainer provides a set of constant singletons to control naming, training, and export configurations at runtime.
- class autrainer.core.constants.AbstractConstants[source]#
Abstract constants singleton class for managing the configurations of autrainer.
- class autrainer.core.constants.NamingConstants[source]#
Singleton for managing the naming configurations of autrainer.
- property NAMING_CONVENTION: List[str]#
Get the naming convention of runs. Defaults to ["dataset", "model", "optimizer", "learning_rate", "batch_size", "training_type", "iterations", "scheduler", "augmentation", "seed"].
- Returns:
  Naming convention of runs.
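For illustration, a run name can be thought of as the per-run config values joined in this order. sketch_run_name and the underscore separator are hypothetical, not autrainer's exact naming scheme; the field order is the NAMING_CONVENTION above.

```python
# Field order taken from NamingConstants.NAMING_CONVENTION above.
NAMING_CONVENTION = [
    "dataset", "model", "optimizer", "learning_rate", "batch_size",
    "training_type", "iterations", "scheduler", "augmentation", "seed",
]

def sketch_run_name(cfg: dict) -> str:
    # Hypothetical: join the config value of each naming field.
    return "_".join(str(cfg[key]) for key in NAMING_CONVENTION)

name = sketch_run_name({
    "dataset": "SomeDataset", "model": "SomeModel", "optimizer": "Adam",
    "learning_rate": 0.001, "batch_size": 32, "training_type": "Epoch",
    "iterations": 5, "scheduler": "None", "augmentation": "None", "seed": 1,
})
```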
- property INVALID_AGGREGATIONS: List[str]#
Get the invalid aggregations for postprocessing. Defaults to ["training_type"].
- Returns:
  Invalid aggregations for postprocessing.
- property VALID_AGGREGATIONS: List[str]#
Get the valid aggregations for postprocessing. Defaults to ["dataset", "model", "optimizer", "learning_rate", "batch_size", "iterations", "scheduler", "augmentation", "seed"] (the naming convention without the invalid aggregations).
- Returns:
  Valid aggregations for postprocessing.
- property CONFIG_DIRS: List[str]#
Get the configuration directories for Hydra configurations. Defaults to ["augmentation", "dataset", "model", "optimizer", "plotting", "preprocessing", "scheduler"].
- Returns:
  Configuration directories for Hydra configurations.
- class autrainer.core.constants.TrainingConstants[source]#
Singleton for managing the training configurations of autrainer.
- property TASKS: List[str]#
Get the supported training tasks. Defaults to ["classification", "ml-classification", "regression", "mt-regression"].
- Returns:
  Supported training tasks.
- class autrainer.core.constants.ExportConstants[source]#
Singleton for managing the export and logging configurations of autrainer.
- property LOGGING_DEPTH: int#
Get the depth of logging for configuration parameters. Defaults to 2.
- Returns:
  Depth of logging for configuration parameters.
- property IGNORE_PARAMS: List[str]#
Get the ignored configuration parameters for logging. Defaults to ["results_dir", "experiment_id", "model.dataset", "training_type", "save_frequency", "dataset.metrics", "plotting", "model.transform", "dataset.transform", "augmentation.steps", "loggers", "progress_bar", "continue_training", "remove_continued_runs", "save_train_outputs", "save_dev_outputs", "save_test_outputs"].
- Returns:
  Ignored configuration parameters for logging.
- property ARTIFACTS: List[str | Dict[str, str]]#
Get the artifacts to log for runs. Defaults to ["model_summary.txt", "metrics.csv", {"config.yaml": ".hydra"}].
- Returns:
  Artifacts to log for runs.