Training
autrainer supports both epoch- and step-based training.
Tip
To create custom training configurations, refer to the custom training configurations quickstart.
Configuring Training
Training is configured in the main configuration file and comprises the following attributes:
- iterations: The number of iterations to train the model for.
- training_type: The type of training, either epoch or step. By default, it is set to epoch.
- eval_frequency: The frequency in terms of iterations to evaluate the model on the development set. By default, it is set to 1.
- save_frequency: The frequency in terms of iterations to save the states of the model, optimizer, and scheduler. By default, it is set to 1.
- inference_batch_size: The batch size to use during inference. By default, it is set to the training batch size.
The following optional attributes can be set to configure the training process:
- progress_bar: Whether to display a progress bar during training and evaluation. By default, it is set to True.
- continue_training: Whether to continue from an already finished run with the same configuration and fewer iterations. By default, it is set to True.
- remove_continued_runs: Whether to remove the runs that have been continued. By default, it is set to True.
- save_train_outputs: Whether to save indices, targets, losses, outputs, and predictions (results) on the training set. By default, it is set to True.
- save_dev_outputs: Whether to save indices, targets, losses, outputs, and predictions (results) on the development set. By default, it is set to True.
- save_test_outputs: Whether to save indices, targets, losses, outputs, and predictions (results) on the test set. By default, it is set to True.
For brevity, all training attributes with default values are outsourced to the _autrainer_.yaml defaults file and imported in the main configuration file.
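As a rough illustration, the training attributes and their documented defaults can be mirrored in a plain Python mapping. The helper `resolve_training_config` and the key `batch_size` are hypothetical names chosen for this sketch and are not part of autrainer; the actual defaults are resolved through the configuration files.

```python
from typing import Any, Dict

# Defaults as listed in the documentation above.
TRAINING_DEFAULTS: Dict[str, Any] = {
    "training_type": "epoch",
    "eval_frequency": 1,
    "save_frequency": 1,
    "inference_batch_size": None,  # falls back to the training batch size
    "progress_bar": True,
    "continue_training": True,
    "remove_continued_runs": True,
    "save_train_outputs": True,
    "save_dev_outputs": True,
    "save_test_outputs": True,
}


def resolve_training_config(user_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user values over the documented defaults (hypothetical helper)."""
    # `iterations` is listed without a default, so it is required here.
    if "iterations" not in user_cfg:
        raise ValueError("'iterations' must be set")
    cfg = {**TRAINING_DEFAULTS, **user_cfg}
    if cfg["training_type"] not in ("epoch", "step"):
        raise ValueError(f"Unsupported training type: {cfg['training_type']}")
    if cfg["inference_batch_size"] is None:
        # "batch_size" is an assumed key name for the training batch size.
        cfg["inference_batch_size"] = cfg.get("batch_size")
    return cfg


cfg = resolve_training_config({"iterations": 5, "batch_size": 32})
print(cfg["training_type"], cfg["eval_frequency"], cfg["inference_batch_size"])
```

Only `iterations` and the assumed `batch_size` are supplied here; every other attribute takes its documented default.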
Note
Throughout the documentation, the term iteration (as well as iterations, eval_frequency, and save_frequency) refers to a full pass over the training set for epoch-based training, and a single optimization step over a batch of the training set for step-based training.
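The difference between the two meanings of an iteration can be made concrete with a little arithmetic. The two helpers below are illustrative and not part of autrainer. The second one assumes 1-based iteration counting and that an event fires whenever the iteration is a multiple of its frequency; the library may treat edge cases differently.

```python
from typing import List


def optimization_steps(training_type: str, iterations: int, batches_per_epoch: int) -> int:
    """Total optimization steps implied by `iterations` (illustrative helper)."""
    if training_type == "epoch":
        # One iteration is a full pass over the training set.
        return iterations * batches_per_epoch
    if training_type == "step":
        # One iteration is a single optimization step over one batch.
        return iterations
    raise ValueError(f"Unsupported training type: {training_type}")


def scheduled_iterations(iterations: int, frequency: int) -> List[int]:
    """Iterations at which an event with the given frequency fires (assumed rule)."""
    return [i for i in range(1, iterations + 1) if i % frequency == 0]


# 5 epochs over 100 batches and 500 single steps amount to the same work.
print(optimization_steps("epoch", iterations=5, batches_per_epoch=100))
print(optimization_steps("step", iterations=500, batches_per_epoch=100))
# With eval_frequency=100, a 500-step run would evaluate five times.
print(scheduled_iterations(iterations=500, frequency=100))
```

Under these assumptions, a step-based run usually needs a much larger eval_frequency and save_frequency than an epoch-based run to evaluate and save equally often.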
Trainer
autrainer.training.ModularTaskTrainer manages the training process.
It instantiates the model, dataset, criterion, optimizer, scheduler, and callbacks, and trains the model on the dataset.
It also logs the training process and saves the model, optimizer, and scheduler states at the end of each epoch.
The cfg of the trainer is the composed main configuration file (e.g. conf/config.yaml) for each training configuration in the sweep.
- class autrainer.training.ModularTaskTrainer(cfg, output_directory, experiment_id=None, run_name=None)
Modular Task Trainer.
- Parameters:
cfg (DictConfig) – Run configuration.
output_directory (str) – Output directory for the run.
experiment_id (Optional[str]) – Experiment ID for the run. If None, the ID is automatically set based on the parent directory of the output directory. Defaults to None.
run_name (Optional[str]) – Run name for the run. If None, the name is automatically set based on the output directory. Defaults to None.
- train()
Train the model.
- Raises:
ValueError – If the training type is not supported.
- Return type:
float
- Returns:
The best value of the tracking metric.
- evaluate(iteration, iteration_folder, loader, df, dev_evaluation=True, save_to='dev', tracker=None)
Evaluate the model on the dev or test set.
- Parameters:
iteration (int) – Current iteration.
iteration_folder – Folder to save the results to.
loader – Dataloader to evaluate on.
df – Groundtruth dataframe.
dev_evaluation – Whether to evaluate on the dev set. Defaults to True.
save_to – Prefix to save the results to. Defaults to “dev”.
tracker – Tracker to save the outputs. Defaults to None.
- Return type:
Dict[str, float]
- Returns:
Dictionary containing the evaluation results.
- property cfg: DictConfig
Return the configuration of the trainer.
- Returns:
Copy of the configuration.
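A minimal sketch of driving the trainer directly, using only the constructor signature and the train() return value documented above. The wrapper `run_training` is a hypothetical helper, and how the composed DictConfig is obtained is left out because it depends on the surrounding setup.

```python
from typing import Optional


def run_training(cfg, output_directory: str, run_name: Optional[str] = None) -> float:
    """Train one configuration and return the best tracking-metric value.

    `cfg` is expected to be the composed DictConfig of the run.
    """
    # Imported lazily so this sketch can be loaded without autrainer installed.
    from autrainer.training import ModularTaskTrainer

    trainer = ModularTaskTrainer(
        cfg=cfg,
        output_directory=output_directory,
        run_name=run_name,  # None: derived from the output directory
    )
    # train() raises ValueError for an unsupported training type.
    return trainer.train()
```

Leaving experiment_id unset relies on the documented behavior of deriving it from the parent directory of the output directory.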
Callbacks
Any dataset, model, optimizer, scheduler, criterion, or logger can specify callback functions which start with cb_on_*().
Each callback is automatically invoked at the appropriate time during training.
The function signature of each callback is defined in CallbackSignature.
For more control over the training process, custom callbacks can be defined and added to the trainer by specifying a list of callback classes using shorthand syntax in the callbacks attribute of the main configuration file.
Each callback class can specify any number of callback functions following the signatures defined in CallbackSignature.
Tip
To create custom callbacks, refer to the custom callbacks tutorial.
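As a small sketch of what such a callback class might look like, the example below implements two of the hooks from CallbackSignature. The class name `IterationLogger` and the metric key `loss` are hypothetical, and the final lines only stand in for the trainer, which invokes the hooks itself during a real run. Registering the class in the callbacks attribute is not shown.

```python
class IterationLogger:
    """Collects the metrics reported at the end of every iteration."""

    def __init__(self) -> None:
        self.history = []

    def cb_on_train_begin(self, trainer) -> None:
        # Reset state so the callback can be reused across runs.
        self.history.clear()

    def cb_on_iteration_end(self, trainer, iteration: int, metrics: dict) -> None:
        # Store a copy so later mutations of `metrics` do not alter the log.
        self.history.append((iteration, dict(metrics)))


# Stand-in driver for illustration only; autrainer calls the hooks itself.
logger = IterationLogger()
logger.cb_on_train_begin(trainer=None)
logger.cb_on_iteration_end(trainer=None, iteration=1, metrics={"loss": 0.9})
logger.cb_on_iteration_end(trainer=None, iteration=2, metrics={"loss": 0.7})
print(logger.history)
```

A class only needs to define the hooks it cares about; the remaining ones can be omitted.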
- class autrainer.training.CallbackSignature
- abstract cb_on_train_begin(trainer)
Called at the beginning of the training loop before the first iteration.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
- Return type:
None
- abstract cb_on_train_end(trainer)
Called at the end of the training loop after the last iteration, validation, and testing are completed.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
- Return type:
None
- abstract cb_on_iteration_begin(trainer, iteration)
Called at the beginning of each iteration.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
- Return type:
None
- abstract cb_on_iteration_end(trainer, iteration, metrics)
Called at the end of each iteration including validation.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
metrics (dict) – Dictionary of various metrics collected during the iteration.
- Return type:
None
- abstract cb_on_step_begin(trainer, iteration, batch_idx)
Called at the beginning of each step within an iteration.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
batch_idx (int) – Current batch index within the iteration. For epoch-based training, this is the batch index within the epoch. For step-based training, this is the step number modulo the evaluation frequency.
- Return type:
None
- abstract cb_on_step_end(trainer, iteration, batch_idx, loss)
Called at the end of each step within an iteration.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
batch_idx (int) – Current batch index within the iteration. For epoch-based training, this is the batch index within the epoch. For step-based training, this is the step number modulo the evaluation frequency.
loss (float) – Reduced loss value for the batch.
- Return type:
None
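To illustrate how the step-level arguments might be used, the sketch below averages the reduced batch losses passed to cb_on_step_end per iteration. The class `MeanStepLoss` and its `mean` method are hypothetical, and the loop at the end merely imitates the trainer with made-up loss values.

```python
from collections import defaultdict


class MeanStepLoss:
    """Averages the reduced batch losses reported for each iteration."""

    def __init__(self) -> None:
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def cb_on_step_end(self, trainer, iteration: int, batch_idx: int, loss: float) -> None:
        # Accumulate the loss under the iteration it belongs to.
        self._sums[iteration] += loss
        self._counts[iteration] += 1

    def mean(self, iteration: int) -> float:
        if self._counts.get(iteration, 0) == 0:
            raise KeyError(f"No losses recorded for iteration {iteration}")
        return self._sums[iteration] / self._counts[iteration]


# Stand-in driver with made-up losses; autrainer calls the hook itself.
tracker = MeanStepLoss()
for batch_idx, loss in enumerate([1.0, 0.5, 0.0]):
    tracker.cb_on_step_end(trainer=None, iteration=1, batch_idx=batch_idx, loss=loss)
print(tracker.mean(1))
```

Grouping by iteration is mainly useful for epoch-based training. In step-based training the iteration is the step number, so each group would hold a single loss.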
- abstract cb_on_loader_exhausted(trainer, iteration)
Called when the training data loader is exhausted.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
- Return type:
None
- abstract cb_on_val_begin(trainer, iteration)
Called at the beginning of the validation loop.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
- Return type:
None
- abstract cb_on_val_end(trainer, iteration, val_results)
Called at the end of the validation loop.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
iteration (int) – Current iteration number. For epoch-based training, this is the epoch number. For step-based training, this is the step number.
val_results (dict) – Dictionary of validation results for the entire validation loop of the current iteration.
- Return type:
None
- abstract cb_on_val_step_begin(trainer, batch_idx)
Called at the beginning of the validation step.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
batch_idx (int) – Current batch index within the validation loop.
- Return type:
None
- abstract cb_on_val_step_end(trainer, batch_idx, loss)
Called at the end of the validation step.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
batch_idx (int) – Current batch index within the validation loop.
loss (float) – Reduced loss value for the batch.
- Return type:
None
- abstract cb_on_test_begin(trainer)
Called at the beginning of the testing loop.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
- Return type:
None
- abstract cb_on_test_end(trainer, test_results)
Called at the end of the testing loop.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
test_results (dict) – Dictionary of test results for the entire testing loop.
- Return type:
None
- abstract cb_on_test_step_begin(trainer, batch_idx)
Called at the beginning of the testing step.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
batch_idx (int) – Current batch index within the testing loop.
- Return type:
None
- abstract cb_on_test_step_end(trainer, batch_idx, loss)
Called at the end of the testing step.
- Parameters:
trainer (ModularTaskTrainer) – Mutable reference to the trainer.
batch_idx (int) – Current batch index within the testing loop.
loss (float) – Reduced loss value for the batch.
- Return type:
None
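The order in which the hooks fire can be pictured with a small simulation. The recorder below logs every cb_on_* call, and `simulate_epoch_run` is a hypothetical driver that imitates an epoch-based run with validation after every iteration, following the descriptions above. It is not autrainer's training loop. The position of the testing loop after the last iteration is an assumption, and cb_on_loader_exhausted is left out because its exact timing is not specified here.

```python
class HookRecorder:
    """Records the name of every cb_on_* hook that gets invoked."""

    def __init__(self) -> None:
        self.calls = []

    def __getattr__(self, name: str):
        # Only reached for attributes that are not found normally.
        if not name.startswith("cb_on_"):
            raise AttributeError(name)

        def hook(*args, **kwargs) -> None:
            self.calls.append(name)

        return hook


def simulate_epoch_run(cb, epochs: int, train_batches: int, eval_batches: int) -> None:
    """Hypothetical driver mimicking the documented hook order."""
    trainer = None  # placeholder for the ModularTaskTrainer instance
    cb.cb_on_train_begin(trainer)
    for epoch in range(1, epochs + 1):
        cb.cb_on_iteration_begin(trainer, epoch)
        for batch_idx in range(train_batches):
            cb.cb_on_step_begin(trainer, epoch, batch_idx)
            cb.cb_on_step_end(trainer, epoch, batch_idx, 0.0)
        cb.cb_on_val_begin(trainer, epoch)
        for batch_idx in range(eval_batches):
            cb.cb_on_val_step_begin(trainer, batch_idx)
            cb.cb_on_val_step_end(trainer, batch_idx, 0.0)
        cb.cb_on_val_end(trainer, epoch, {})
        # The iteration ends only after validation has finished.
        cb.cb_on_iteration_end(trainer, epoch, {})
    cb.cb_on_test_begin(trainer)
    for batch_idx in range(eval_batches):
        cb.cb_on_test_step_begin(trainer, batch_idx)
        cb.cb_on_test_step_end(trainer, batch_idx, 0.0)
    cb.cb_on_test_end(trainer, {})
    # Training ends after the last iteration, validation, and testing.
    cb.cb_on_train_end(trainer)


recorder = HookRecorder()
simulate_epoch_run(recorder, epochs=1, train_batches=1, eval_batches=1)
print(recorder.calls)
```

A recorder like this can also be attached to a real run to check the actual order against these assumptions.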