Core#
Core provides various utilities and entry points for the autrainer framework.
Entry Point#
The main training entry point for autrainer.
- autrainer.main(config_name, config_path=None, version_base=None)[source]#
Hydra main decorator with additional autrainer configs.
The conf directory in the current working directory is always added to the search path if it exists. The current working directory is also added to the Python path.
- Parameters:
  - config_name (str) – The name of the config (usually the file name without the .yaml extension).
  - config_path (Optional[str]) – The config path, a directory where Hydra will search for config files. If config_path is None, no directory is added to the search path. Defaults to None.
  - version_base (Optional[str]) – Hydra version base. Defaults to None.
- Return type:
Callable[[Callable[[Any],Any]],Any]
Instantiation#
Instantiation functions wrap Hydra object instantiation, adding type safety and Shorthand Syntax support.
- autrainer.instantiate(config, instance_of=None, convert=None, recursive=False, **kwargs)[source]#
Instantiate an object from a configuration Dict or DictConfig.
The config must contain a _target_ field that specifies a relative import path to the object to instantiate. If _target_ is None, returns None.
- Parameters:
  - config (Union[DictConfig,Dict]) – The configuration to instantiate.
  - instance_of (Optional[Type[T]]) – The expected type of the instantiated object. Defaults to None.
  - convert (Optional[HydraConvertEnum]) – The conversion strategy to use, one of HydraConvertEnum. Convert is only used if the config does not have a _convert_ attribute. If None, uses HydraConvertEnum.ALL. Defaults to None.
  - recursive (bool) – Whether to recursively instantiate objects. Recursive is only used if the config does not have a _recursive_ field. Defaults to False.
  - **kwargs (Dict[str,Any]) – Additional keyword arguments to pass to the object.
- Raises:
ValueError – If the config does not have a _target_ field.
ValueError – If instance_of is provided and the instantiated object is not an instance of it.
- Return type:
T
- Returns:
The instantiated object.
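The _target_ mechanism can be illustrated with a small, self-contained sketch. This is not autrainer's implementation (which delegates the real work to Hydra); toy_instantiate and its error message are hypothetical, but the _target_/keyword-argument convention it follows mirrors the description above.

```python
from importlib import import_module

def toy_instantiate(config: dict, instance_of: type = None):
    # The _target_ field names an import path; the remaining keys
    # become keyword arguments for the imported object.
    config = dict(config)
    target = config.pop("_target_")
    if target is None:
        return None
    module, _, name = target.rpartition(".")
    obj = getattr(import_module(module), name)(**config)
    if instance_of is not None and not isinstance(obj, instance_of):
        raise ValueError(f"Expected an instance of {instance_of}.")
    return obj

# collections.Counter stands in for any importable class:
counter = toy_instantiate({"_target_": "collections.Counter", "spam": 3})
```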
- autrainer.instantiate_shorthand(config, instance_of=None, convert=None, recursive=False, **kwargs)[source]#
Instantiate an object from a shorthand configuration.
A shorthand config is either a string or a dictionary with a single key. If config is a string, it should be a Python import path. If config is a dictionary, the key should be a Python import path and the value should be a dictionary of keyword arguments.
- Parameters:
  - config (Union[str,DictConfig,Dict]) – The config to instantiate.
  - instance_of (Optional[Type[T]]) – The expected type of the instantiated object. Defaults to None.
  - convert (Optional[HydraConvertEnum]) – The conversion strategy to use, one of HydraConvertEnum. Convert is only used if the config does not have a _convert_ attribute. If None, uses HydraConvertEnum.ALL. Defaults to None.
  - recursive (bool) – Whether to recursively instantiate objects. Recursive is only used if the config does not have a _recursive_ field. Defaults to False.
  - **kwargs (Dict[str,Any]) – Additional keyword arguments to pass to the object.
- Raises:
ValueError – If the config is empty (None or an empty string/dictionary).
- Return type:
T
- Returns:
The instantiated object.
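The mapping between the two shorthand forms and a full _target_ config can be sketched as follows. normalize_shorthand is a hypothetical helper, not part of the autrainer API, but the normalization it performs mirrors the description above.

```python
def normalize_shorthand(config):
    # Hypothetical helper: turn either shorthand form into a full
    # _target_ config as consumed by instantiate().
    if not config:
        raise ValueError("Empty shorthand config.")
    if isinstance(config, str):
        # "torch.optim.Adam" -> {"_target_": "torch.optim.Adam"}
        return {"_target_": config}
    # {"torch.optim.Adam": {"lr": 1e-3}}
    #   -> {"_target_": "torch.optim.Adam", "lr": 1e-3}
    (path, kwargs), = config.items()
    return {"_target_": path, **(kwargs or {})}
```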
Data Items and Batches#
autrainer provides a set of classes to represent data items and batches of data items.
The DataItem class represents individual data instances.
These classes (or structs) are meant to hold all important attributes needed by AbstractModel
objects that are compatible with a particular dataset.
In addition, they hold attributes needed by autrainer’s utilities, such as the index corresponding to each instance
(where we assume that each AbstractDataset is an ordered set of instances).
- class autrainer.core.structs.AbstractDataItem(features, target, index)[source]#
Abstract data item class for a single sample.
- Parameters:
  - features (Tensor) – Tensor of input features.
  - target (Union[int,float,List[int],List[float],ndarray]) – Target value for the input features.
  - index (int) – Index of the data sample.
- class autrainer.core.structs.DataItem(features, target, index)[source]#
Data item for a single sample.
- Parameters:
  - features (Tensor) – Tensor of input features.
  - target (Union[int,float,List[int],List[float],ndarray]) – Target value for the input features.
  - index (int) – Index of the data sample.
Additionally, autrainer provides a set of classes that hold batches of individual data instances.
They provide an implementation of collate_fn(), which collates the individual data items
into a batch of data items and must be passed to the torch.utils.data.DataLoader.
- class autrainer.core.structs.AbstractDataBatch(features, target, index)[source]#
Abstract data batch class for a batch of samples.
- Parameters:
  - features (Tensor) – Tensor of input features.
  - target (Tensor) – Tensor of target values for the input features.
  - index (Tensor) – Tensor of indices for the data samples.
- abstract to(device, **kwargs)[source]#
Move the features, target, and additional data to a device.
- Parameters:
  - device (device) – Device to move the data to.
  - kwargs (dict) – Additional keyword arguments passed to the to method.
- Return type:
None
- classmethod collate(items)[source]#
Collate a list of data items into a data batch.
- Parameters:
  - items (List[ItemType]) – List of data items to collate, where ItemType is bound to AbstractDataItem.
- Return type:
  AbstractDataBatch[ItemType]
- Returns:
  Collated data batch.
- class autrainer.core.structs.DataBatch(features, target, index)[source]#
Data batch for a batch of samples.
- Parameters:
  - features (Tensor) – Tensor of input features.
  - target (Tensor) – Tensor of target values for the input features.
  - index (Tensor) – Tensor of indices for the data samples.
Warning
features, target,
and index are the three reserved attributes that every object derived from
AbstractDataItem must include.
Moreover, every object derived from AbstractModel must include features
as the first argument (after self) of its forward() method.
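The collation pattern can be sketched with plain Python lists. ToyItem and ToyBatch are illustrative stand-ins for DataItem and DataBatch: the real classes hold torch tensors and stack them per field, rather than collecting them into lists.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ToyItem:
    features: list
    target: int
    index: int

@dataclass
class ToyBatch:
    features: List[list]
    target: List[int]
    index: List[int]

    @classmethod
    def collate(cls, items: List[ToyItem]) -> "ToyBatch":
        # Gather each field across items; with tensors this would be
        # a torch.stack per field instead of a plain list.
        return cls(
            features=[i.features for i in items],
            target=[i.target for i in items],
            index=[i.index for i in items],
        )

batch = ToyBatch.collate([ToyItem([0.1, 0.2], 1, 0), ToyItem([0.3, 0.4], 0, 1)])
```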
Callbacks#
autrainer provides a set of callback functions (cb_on_*()) that are called at various stages of the training loop.
Each callback is automatically invoked at the appropriate time during training.
For more control over the training process, custom callbacks can be defined and added to the trainer by specifying a list of
callback classes using shorthand syntax in the callbacks attribute of the
main configuration file.
Each callback class can specify any number of callback functions following the signatures defined in CallbackMixin.
Tip
To create custom callbacks, refer to the custom callbacks tutorial.
- class autrainer.core.callbacks.CallbackMixin(*args, **kwargs)[source]#
- static order(order)[source]#
Decorator to set the order of the callback in the callback list.
A larger order means the callback will be called later in the list. If multiple callbacks share the same order, they are applied in the order they were registered (i.e., in instantiation order of the objects).
Note: If used in combination with CallbackMixin.chain, the order is determined by the order of the methods in the class hierarchy, with the base class method being registered first by default.
- Parameters:
  - order (int) – The order of the callback in the callback list. Defaults to 0.
- Return type:
  Callable[...,Any]
- static chain()[source]#
Decorator to indicate that the callback should be chained and the base class implementation should also be called.
By default, the base class implementation is called first, followed by the derived (decorated) implementation.
- Return type:
Callable[...,Any]
- cb_on_train_begin(trainer)[source]#
Called at the beginning of the training loop before the first iteration.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
- Return type:
None
- cb_on_train_end(trainer)[source]#
Called at the end of the training loop after the last iteration, validation, and testing are completed.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
- Return type:
None
- cb_on_iteration_begin(trainer, iteration)[source]#
Called at the beginning of each iteration.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
- Return type:
None
- cb_on_iteration_end(trainer, iteration, metrics)[source]#
Called at the end of each iteration including validation.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
  - metrics (Dict[str,float]) – Dictionary of various metrics collected during the iteration.
- Return type:
None
- cb_on_step_begin(trainer, iteration, batch_idx)[source]#
Called at the beginning of a step within an iteration.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
  - batch_idx (int) – Current batch index within the iteration. For epoch-based training, this is the batch index within the epoch. For step-based training, this is the step number modulo the evaluation frequency.
- Return type:
None
- cb_on_step_end(trainer, iteration, batch_idx, loss)[source]#
Called at the end of a step within an iteration.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
  - batch_idx (int) – Current batch index within the iteration. For epoch-based training, this is the batch index within the epoch. For step-based training, this is the step number modulo the evaluation frequency.
  - loss (float) – Reduced loss value for the batch.
- Return type:
None
- cb_on_loader_exhausted(trainer, iteration)[source]#
Called when the training data loader is exhausted.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
- Return type:
None
- cb_on_dev_begin(trainer, iteration)[source]#
Called at the beginning of the validation loop.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
- Return type:
None
- cb_on_dev_end(trainer, iteration, dev_results)[source]#
Called at the end of the validation loop.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
  - dev_results (Dict[str,float]) – Dictionary of validation results for the entire validation loop of the current iteration.
- Return type:
None
- cb_on_dev_step_begin(trainer, batch_idx)[source]#
Called at the beginning of the validation step.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - batch_idx (int) – Current batch index within the validation loop.
- Return type:
None
- cb_on_dev_step_end(trainer, batch_idx, loss)[source]#
Called at the end of the validation step.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - batch_idx (int) – Current batch index within the validation loop.
  - loss (float) – Reduced loss value for the batch.
- Return type:
None
- cb_on_test_begin(trainer)[source]#
Called at the beginning of the testing loop.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
- Return type:
None
- cb_on_test_end(trainer, test_results)[source]#
Called at the end of the testing loop.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - test_results (Dict[str,float]) – Dictionary of test results for the entire testing loop.
- Return type:
None
- cb_on_test_step_begin(trainer, batch_idx)[source]#
Called at the beginning of the testing step.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - batch_idx (int) – Current batch index within the testing loop.
- Return type:
None
- cb_on_test_step_end(trainer, batch_idx, loss)[source]#
Called at the end of the testing step.
- Parameters:
  - trainer (ModularTaskTrainer) – Mutable reference to the trainer.
  - batch_idx (int) – Current batch index within the testing loop.
  - loss (float) – Reduced loss value for the batch.
- Return type:
None
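As a sketch of a custom callback, the following class tracks the best validation loss across iterations. It relies only on the cb_on_dev_end() signature documented above; the "dev_loss" metric key is an assumed name for illustration, and the class would be registered via the callbacks list of the main configuration using shorthand syntax.

```python
class BestLossTracker:
    """Illustrative custom callback: remember the best validation loss.

    Duck-typed against the cb_on_* signatures above; no base class is
    strictly required to receive callback invocations.
    """

    def __init__(self) -> None:
        self.best_loss = float("inf")
        self.best_iteration = None

    def cb_on_dev_end(self, trainer, iteration, dev_results):
        # dev_results is a dict of validation metrics for this iteration;
        # "dev_loss" is an assumed key, not guaranteed by autrainer.
        loss = dev_results.get("dev_loss")
        if loss is not None and loss < self.best_loss:
            self.best_loss = loss
            self.best_iteration = iteration
```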
- class autrainer.core.callbacks.CallbackManager[source]#
Utils#
Utils provide various helpers for I/O, logging, timing, and hardware information.
- class autrainer.core.utils.Bookkeeping(output_directory, file_handler_path=None)[source]#
Bookkeeping to handle general disk operations and interactions.
- Parameters:
  - output_directory (str) – Output directory to save files to.
  - file_handler_path (Optional[str]) – Path to save the log file to. Defaults to None.
- log(message, level=20)[source]#
Log a message.
- Parameters:
  - message (str) – Message to log.
  - level (int) – Logging level. Defaults to logging.INFO.
- Return type:
None
- log_to_file(message, level=20)[source]#
Log a message to the file handler.
- Parameters:
  - message (str) – Message to log.
  - level (int) – Logging level. Defaults to logging.INFO.
- Return type:
None
- create_folder(folder_name, path='')[source]#
Create a new folder in the output directory.
- Parameters:
  - folder_name (str) – Name of the folder to create.
  - path (str) – Subdirectory to create the folder in. Defaults to “”.
- Return type:
None
- save_model_summary(model, shape, device, filename)[source]#
Save a model summary to a file.
- Parameters:
  - model (Module) – Model to summarize.
  - shape (Tuple[int,...]) – Shape of the input to the model.
  - device (device) – Device to run the model summary on.
  - filename (str) – Name of the file to save the summary to.
- Return type:
None
- save_state(obj, filename, path='')[source]#
Save the state of an object.
- Parameters:
  - obj (Union[Module,Optimizer,LRScheduler,None]) – Object to save the state of. If None, do nothing.
  - filename (str) – Name of the file to save the state to.
  - path (str) – Subdirectory to save the state to. Defaults to “”.
- Raises:
TypeError – If the object type is not supported.
- Return type:
None
- load_state(obj, filename, path='')[source]#
Load the state of an object.
- Parameters:
  - obj (Union[Module,Optimizer,LRScheduler]) – Object to load the state into.
  - filename (str) – Name of the file to load the state from.
  - path (str) – Subdirectory to load the state from. Defaults to “”.
- Raises:
TypeError – If the object type is not supported.
FileNotFoundError – If the file is not found.
- Return type:
None
- save_audobject(obj, filename, path='')[source]#
Save an audobject.Object to disk.
- Parameters:
  - obj (Object) – Object to save.
  - filename (str) – Name of the file to save the object to.
  - path (str) – Subdirectory to save the object to. Defaults to “”.
- Raises:
TypeError – If the object type is not supported.
- Return type:
None
- save_results_dict(results_dict, filename, path='')[source]#
Save a results dictionary to disk.
- Parameters:
  - results_dict (Dict[str,float]) – Dictionary of metric names and values to save.
  - filename (str) – Name of the file to save the results to.
  - path (str) – Subdirectory to save the results to. Defaults to “”.
- Return type:
None
- save_results_df(results_df, filename, path='')[source]#
Save a results DataFrame to disk.
- Parameters:
  - results_df (DataFrame) – DataFrame to save.
  - filename (str) – Name of the file to save the results to.
  - path (str) – Subdirectory to save the results to. Defaults to “”.
- Return type:
None
- save_results_np(results_np, filename, path='')[source]#
Save a results numpy array to disk.
- Parameters:
  - results_np (ndarray) – Numpy array to save.
  - filename (str) – Name of the file to save the results to.
  - path (str) – Subdirectory to save the results to. Defaults to “”.
- Return type:
None
- save_best_results(metrics, filename, metric_fns, tracking_metric_fn, path='')[source]#
Save the best results to disk.
- Parameters:
  - metrics (DataFrame) – DataFrame of metrics to save.
  - filename (str) – Name of the file to save the best results to.
  - metric_fns (List[AbstractMetric]) – List of metric functions to get the best results from.
  - tracking_metric_fn (AbstractMetric) – Tracking metric function to get the best iteration from.
  - path (str) – Subdirectory to save the best results to. Defaults to “”.
- Return type:
None
- class autrainer.core.utils.Timer(output_directory, timer_type)[source]#
Timer to measure time of different parts of the training process.
- Parameters:
  - output_directory (str) – Directory to save the timer.yaml file to.
  - timer_type (str) – Name of the timer.
- stop()[source]#
Stop the timer.
- Raises:
ValueError – If the timer was not started.
- Return type:
None
- get_mean_seconds()[source]#
Get the mean time in seconds.
- Return type:
float
- Returns:
Mean time in seconds.
- get_total_seconds()[source]#
Get the total time in seconds.
- Return type:
float
- Returns:
Total time in seconds.
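The bookkeeping Timer performs can be approximated with a few lines of stdlib Python. ToyTimer is an illustrative stand-in, not the real class: it assumes a start() counterpart to stop() (implied by the ValueError above) and omits writing the timer.yaml file.

```python
import time

class ToyTimer:
    """Illustrative stand-in for autrainer.core.utils.Timer: accumulate
    start/stop intervals and report mean and total seconds."""

    def __init__(self) -> None:
        self._durations = []
        self._start = None

    def start(self) -> None:
        self._start = time.perf_counter()

    def stop(self) -> None:
        if self._start is None:
            raise ValueError("Timer was not started.")
        self._durations.append(time.perf_counter() - self._start)
        self._start = None

    def get_mean_seconds(self) -> float:
        return sum(self._durations) / len(self._durations)

    def get_total_seconds(self) -> float:
        return sum(self._durations)
```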
- autrainer.core.utils.get_hardware_info(device)[source]#
Get hardware information of the current system.
- Parameters:
  - device (device) – Device to get the hardware information from.
- Return type:
  dict
- Returns:
  Dictionary containing system and GPU information.
- autrainer.core.utils.save_hardware_info(output_directory, device)[source]#
Save hardware information to a hardware.yaml file.
- Parameters:
  - output_directory (str) – Directory to save the hardware information to.
  - device (device) – Device to get the hardware information from.
- Return type:
None
- autrainer.core.utils.set_seed(seed)[source]#
Set a global seed for random, numpy, and torch.
If CUDA is available, set the seed for CUDA and cuDNN as well.
- Parameters:
  - seed (int) – Seed to set.
- Return type:
None
- autrainer.core.utils.set_reproducibility(reproducible)[source]#
Set the reproducibility of the training process.
- Parameters:
  - reproducible (bool) – Whether to make the training process reproducible. If True, the training process will be deterministic and use only deterministic algorithms. If False, the training process may be non-deterministic and use non-deterministic algorithms.
- Return type:
None
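What set_seed does can be sketched as follows. seed_everything is an illustrative stand-in showing the kind of per-library seeding involved, not autrainer's exact implementation (which also configures CUDA and cuDNN as noted above).

```python
import random

def seed_everything(seed: int) -> None:
    # Illustrative stand-in for autrainer.core.utils.set_seed: seed each
    # source of randomness the training process touches.
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```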
Plotting#
Plotting provides a simple interface to plot metrics of a single run during Training as well as multiple runs during Postprocessing.
Tip
To create custom plotting configurations, refer to the custom plotting configurations tutorial.
By default, training plots are saved as png files for each metric. This can optionally be extended to any format supported by Matplotlib and additionally pickled for further processing.
Note
Plots are fully customizable by providing Matplotlib rcParams in a custom plotting configuration.
- class autrainer.core.plotting.PlotBase(output_directory, training_type, figsize, latex, filetypes, pickle, context, palette, replace_none, add_titles, add_xlabels, add_ylabels, rcParams)[source]#
Base class for plotting.
- Parameters:
  - output_directory (str) – Output directory to save plots to.
  - training_type (str) – Type of training in [“Epoch”, “Step”].
  - figsize (tuple) – Figure size in inches.
  - latex (bool) – Whether to use LaTeX in plots. Requires the latex package. To install all necessary dependencies, run: pip install autrainer[latex].
  - filetypes (list) – Filetypes to save plots as.
  - pickle (bool) – Whether to save additional pickle files of the plots.
  - context (str) – Context for seaborn plots.
  - palette (str) – Color palette for seaborn plots.
  - replace_none (bool) – Whether to replace “None” in labels with “~”.
  - add_titles (bool) – Whether to add titles to plots.
  - add_xlabels (bool) – Whether to add x-labels to plots.
  - add_ylabels (bool) – Whether to add y-labels to plots.
  - rcParams (Dict[str,Any]) – Additional Matplotlib rcParams to set.
- save_plot(fig, name, path='', close=True, tight_layout=True)[source]#
Save a plot to the output directory.
- Parameters:
  - fig (Figure) – Matplotlib figure to save.
  - name (str) – Name of the plot.
  - path (str) – Path to save the plot to relative to the output directory.
  - close (bool) – Whether to close the figure after saving.
  - tight_layout (bool) – Whether to apply tight layout to the plot.
- Return type:
None
- class autrainer.core.plotting.PlotMetrics(output_directory, training_type, figsize, latex, filetypes, pickle, context, palette, replace_none, add_titles, add_xlabels, add_ylabels, rcParams, metric_fns)[source]#
Plot the metrics of one or multiple runs.
- Parameters:
  - output_directory (str) – Output directory to save plots to.
  - training_type (str) – Type of training in [“Epoch”, “Step”].
  - figsize (tuple) – Figure size in inches.
  - latex (bool) – Whether to use LaTeX in plots. Requires the latex package. To install all necessary dependencies, run: pip install autrainer[latex].
  - filetypes (list) – Filetypes to save plots as.
  - pickle (bool) – Whether to save additional pickle files of the plots.
  - context (str) – Context for seaborn plots.
  - palette (str) – Color palette for seaborn plots.
  - replace_none (bool) – Whether to replace “None” in labels with “~”.
  - add_titles (bool) – Whether to add titles to plots.
  - add_xlabels (bool) – Whether to add x-labels to plots.
  - add_ylabels (bool) – Whether to add y-labels to plots.
  - rcParams (Dict[str,Any]) – Additional Matplotlib rcParams to set.
  - metric_fns (List[AbstractMetric]) – List of metrics to use for plotting.
Default Configurations
Default (conf/plotting/Default.yaml):

figsize: [10, 5]
latex: false
filetypes: [png]
pickle: false
context: notebook
palette: colorblind
replace_none: false
add_titles: true
add_xlabels: true
add_ylabels: true

rcParams:
  legend.fontsize: 9
- plot_run(metrics, std_scale=0.1)[source]#
Plot the metrics of a single run.
- Parameters:
  - metrics (DataFrame) – DataFrame containing the metrics.
  - std_scale (float) – Scale factor for the standard deviation. Defaults to 0.1.
- Return type:
None
- plot_metric(metrics, metric, metrics_std=None, std_scale=0.1, max_runs=None)[source]#
Plot a single metric of multiple runs.
- Parameters:
  - metrics (DataFrame) – DataFrame containing the metrics.
  - metric (str) – Metric to plot.
  - metrics_std (Optional[DataFrame]) – DataFrame containing the standard deviations. Defaults to None.
  - std_scale (float) – Scale factor for the standard deviation. Defaults to 0.1.
  - max_runs (Optional[int]) – Maximum number of best runs to plot. If None, all runs are plotted. Defaults to None.
- Return type:
None
- plot_aggregated_bars(metrics_df, metric, subplots_by=0, group_by=1, split_subgroups=True)[source]#
Plot aggregated bar plots for a metric.
Generate bar plots from metrics_df, divided into subplots by the “subplots_by” column and grouped within each subplot according to the “group_by” column. If “split_subgroups” is true, each group is further split into subgroups based on what follows a potential “-” in the “group_by” entry. Finally, the “metric” entries are averaged to create the bars, and the standard deviation is shown as error bars.
- Parameters:
  - metrics_df (DataFrame) – DataFrame containing the metrics.
  - metric (str) – Metric to plot.
  - subplots_by (int) – Column to group the subplots by.
  - group_by (int) – Column to group the data by.
  - split_subgroups (bool) – Whether to split subgroups.
- Return type:
None
Constants#
autrainer provides a set of constant singletons to control naming, training, and export configurations at runtime.
- class autrainer.core.constants.AbstractConstants[source]#
Abstract constants singleton class for managing the configurations of autrainer.
- class autrainer.core.constants.NamingConstants[source]#
Singleton for managing the naming configurations of autrainer.
- property NAMING_CONVENTION: List[str]#
Get the naming convention of runs. Defaults to ["dataset", "model", "optimizer", "learning_rate", "batch_size", "training_type", "iterations", "scheduler", "augmentation", "seed"].
- Returns:
  Naming convention of runs.
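For illustration, a run name can be thought of as the per-run config values joined in this order. sketch_run_name and the underscore separator are hypothetical, not autrainer's exact naming scheme; the field order is the NAMING_CONVENTION above.

```python
# Field order taken from NamingConstants.NAMING_CONVENTION above.
NAMING_CONVENTION = [
    "dataset", "model", "optimizer", "learning_rate", "batch_size",
    "training_type", "iterations", "scheduler", "augmentation", "seed",
]

def sketch_run_name(cfg: dict) -> str:
    # Hypothetical: join the config value of each naming field.
    return "_".join(str(cfg[key]) for key in NAMING_CONVENTION)

name = sketch_run_name({
    "dataset": "SomeDataset", "model": "SomeModel", "optimizer": "Adam",
    "learning_rate": 0.001, "batch_size": 32, "training_type": "Epoch",
    "iterations": 5, "scheduler": "None", "augmentation": "None", "seed": 1,
})
```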
- property INVALID_AGGREGATIONS: List[str]#
Get the invalid aggregations for postprocessing. Defaults to ["training_type"].
- Returns:
  Invalid aggregations for postprocessing.
- property VALID_AGGREGATIONS: List[str]#
Get the valid aggregations for postprocessing. Defaults to ["dataset", "model", "optimizer", "learning_rate", "batch_size", "iterations", "scheduler", "augmentation", "seed"] (the naming convention without the invalid aggregations).
- Returns:
  Valid aggregations for postprocessing.
- property CONFIG_DIRS: List[str]#
Get the configuration directories for Hydra configurations. Defaults to ["augmentation", "dataset", "model", "optimizer", "plotting", "preprocessing", "scheduler"].
- Returns:
  Configuration directories for Hydra configurations.
- class autrainer.core.constants.TrainingConstants[source]#
Singleton for managing the training configurations of autrainer.
- property TASKS: List[str]#
Get the supported training tasks. Defaults to ["classification", "ml-classification", "regression", "mt-regression"].
- Returns:
  Supported training tasks.
- class autrainer.core.constants.ExportConstants[source]#
Singleton for managing the export and logging configurations of autrainer.
- property LOGGING_DEPTH: int#
Get the depth of logging for configuration parameters. Defaults to 2.
- Returns:
  Depth of logging for configuration parameters.
- property IGNORE_PARAMS: List[str]#
Get the ignored configuration parameters for logging. Defaults to ["results_dir", "experiment_id", "model.dataset", "training_type", "save_frequency", "dataset.metrics", "plotting", "model.transform", "dataset.transform", "augmentation.steps", "loggers", "progress_bar", "continue_training", "remove_continued_runs", "save_train_outputs", "save_dev_outputs", "save_test_outputs"].
- Returns:
  Ignored configuration parameters for logging.
- property ARTIFACTS: List[str | Dict[str, str]]#
Get the artifacts to log for runs. Defaults to ["model_summary.txt", "metrics.csv", {"config.yaml": ".hydra"}].
- Returns:
  Artifacts to log for runs.