Quickstart

The following quickstart guide provides a short introduction to aucurriculum and the creation of simple curriculum scoring and training experiments.

Tip

The quickstart example uses epoch-based training. Because the curriculum restricts training to a subset of the data whose size changes over time, the number of training steps differs from epoch to epoch. To use a fixed number of training steps (which makes it easier to align models trained with and without a curriculum), refer to the autrainer step-based training documentation.
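The effect described in the tip can be illustrated with a minimal sketch. The dataset size, the subset fractions, and the helper `steps_per_epoch` are hypothetical and only serve to show how the step count per epoch follows the subset size; they are not taken from aucurriculum.

```python
import math


def steps_per_epoch(num_samples: int, fraction: float, batch_size: int) -> int:
    """Number of optimizer steps in one epoch over a curriculum subset.

    Assumes the last, possibly smaller, batch is kept (no drop_last).
    """
    subset_size = max(1, round(num_samples * fraction))
    return math.ceil(subset_size / batch_size)


# Hypothetical setup: 1000 training samples and a batch size of 32.
# The fractions are illustrative, not values produced by aucurriculum.
fractions = [0.2, 0.4, 0.6, 0.8, 1.0]
steps = [steps_per_epoch(1000, f, 32) for f in fractions]
print(steps)  # [7, 13, 19, 25, 32]
```

With step-based training, each training iteration would instead cover the same number of steps regardless of the current subset size.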

Training without a Curriculum

To get started, create a new directory and navigate to it:

mkdir aucurriculum_example && cd aucurriculum_example

Next, create a new empty aucurriculum project using the following configuration management CLI command:

aucurriculum create --empty

Alternatively, use the following configuration management CLI wrapper function:

import aucurriculum.cli

aucurriculum.cli.create(empty=True)

This will create the configuration directory structure and the main training configuration file (conf/config.yaml) with default values:

conf/config.yaml
defaults:
  - _aucurriculum_train_
  - _self_

results_dir: results
experiment_id: default
iterations: 5

hydra:
  sweeper:
    params:
      +seed: 1
      +batch_size: 32
      +learning_rate: 0.001
      dataset: ToyTabular-C
      model: ToyFFNN
      optimizer: Adam
      curriculum: None
      curriculum/sampling: None
      curriculum/scoring: None
      curriculum/pacing: None
      curriculum.pacing.initial_size: 1
      curriculum.pacing.final_iteration: 0

Alongside the main training configuration file, the main curriculum configuration file (conf/curriculum.yaml), which is used to obtain a difficulty ordering of the samples, is created with default values:

conf/curriculum.yaml
defaults:
  - _aucurriculum_score_
  - _self_

results_dir: results
experiment_id: default

hydra:
  sweeper:
    params:
      curriculum/scoring: None

correlation:
  correlation_matrix: all

To train a model without a curriculum, run the following training CLI command:

aucurriculum train

Alternatively, use the following training CLI wrapper function:

aucurriculum.cli.train()

This will train the default ToyFFNN feed-forward neural network (autrainer.models.FFNN) on the default ToyTabular-C classification dataset with tabular data (autrainer.datasets.ToyDataset) and output the training results to the results/default/ directory.

Obtaining Difficulty Scores

After training, models can be used to obtain difficulty scores for samples in combination with scoring functions and the main curriculum configuration (conf/curriculum.yaml) file.

Most scoring functions have a run_name parameter that specifies the run name or list of run names from which to load the models for scoring. By default, this parameter is a placeholder (indicated by ???) and has to be manually replaced with the run name of the trained model.
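To find the run name to fill in, it can help to list the runs that training produced. The following is a minimal sketch that assumes each run is stored in its own subdirectory and that the runs of the default experiment live under results/default/training. Both the helper `list_run_names` and this directory layout are assumptions, so adjust the path to match the actual results directory.

```python
from pathlib import Path


def list_run_names(training_dir: str) -> list[str]:
    """Return the sorted names of all subdirectories of training_dir."""
    root = Path(training_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Assumed location of the runs of the "default" experiment; adjust as needed.
for name in list_run_names("results/default/training"):
    print(name)
```

Any of the printed names can then be used as the run_name value, or several of them as a list.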

For example, create a local CELoss scoring function configuration file (conf/curriculum/scoring/CELoss.yaml), replacing the run_name with the previously trained run:

conf/curriculum/scoring/CELoss.yaml
id: CELoss
type: CELoss
_target_: aucurriculum.curricula.scoring.CELoss
run_name: ToyTabular-C_ToyFFNN_Adam_0.001_32_epoch_5_None_None_N_N_None_None_1_0_1
criterion: autrainer.criterions.CrossEntropyLoss
stop: best # "best" or "last"
subset: train # train, dev, test

Next, replace the curriculum/scoring parameter in the main curriculum configuration file (conf/curriculum.yaml) with the scoring function ID:

conf/curriculum.yaml
defaults:
  - _aucurriculum_score_
  - _self_

results_dir: results
experiment_id: default

hydra:
  sweeper:
    params:
      curriculum/scoring: CELoss

correlation:
  correlation_matrix: all

To obtain the difficulty scores for the samples in the dataset, run the following curriculum CLI command:

aucurriculum curriculum

Alternatively, use the following curriculum CLI wrapper function:

aucurriculum.cli.curriculum()
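Conceptually, a loss-based scoring function such as CELoss treats the per-sample loss of a trained model as the difficulty of that sample. The following sketch shows this idea in plain Python, assuming that a lower cross-entropy loss means an easier sample. The helpers `cross_entropy` and `rank_by_difficulty` and the example logits are hypothetical, and this is not aucurriculum's implementation.

```python
import math


def cross_entropy(logits: list[float], target: int) -> float:
    """Cross-entropy loss of a single sample, computed from raw logits."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def rank_by_difficulty(all_logits, targets):
    """Return (indices ordered from easiest to hardest, per-sample losses)."""
    losses = [cross_entropy(l, t) for l, t in zip(all_logits, targets)]
    order = sorted(range(len(losses)), key=losses.__getitem__)
    return order, losses


# Hypothetical model outputs for three samples that all belong to class 0.
logits = [[0.0, 4.0], [4.0, 0.0], [0.0, 0.0]]
targets = [0, 0, 0]
order, losses = rank_by_difficulty(logits, targets)
print(order)  # [1, 2, 0]
```

Sample 1 is confidently correct and therefore ranked easiest, sample 2 is undecided, and sample 0 is confidently wrong and ranked hardest.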

Training with a Curriculum

After obtaining the difficulty scores, the scoring function can be used in combination with a pacing function to create a curriculum for training.
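As a rough illustration of how difficulty scores and a pacing function interact, the sketch below selects the easiest share of the samples, where the share would be supplied by the pacing function at the current point in training. Selecting per class is one possible reading of class-balanced sampling (it keeps the class proportions of the full dataset). The helper `easiest_subset` and the example scores are hypothetical, and aucurriculum's own sampling strategies may behave differently.

```python
import math
from collections import defaultdict


def easiest_subset(scores, labels, fraction):
    """Pick the easiest `fraction` of samples within each class.

    A lower score means an easier sample. At least one sample per class
    is always kept.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    selected = []
    for indices in by_class.values():
        indices.sort(key=lambda i: scores[i])  # easiest first
        keep = max(1, math.ceil(len(indices) * fraction))
        selected.extend(indices[:keep])
    return sorted(selected)


# Hypothetical difficulty scores for six samples from two classes.
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
labels = ["a", "a", "a", "b", "b", "b"]
print(easiest_subset(scores, labels, 0.5))  # [1, 2, 3, 5]
```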

For example, create a new main training configuration file (conf/curriculum_training.yaml) with the following parameters:

conf/curriculum_training.yaml
defaults:
  - _aucurriculum_train_
  - _self_

results_dir: results
experiment_id: default
iterations: 5

hydra:
  sweeper:
    params:
      +seed: 1
      +batch_size: 32
      +learning_rate: 0.001
      dataset: ToyTabular-C
      model: ToyFFNN
      optimizer: Adam
      curriculum: Curriculum # sample easiest examples first
      curriculum/sampling: Balanced # ensure class balance in training subsets
      curriculum/scoring: CELoss # the trained scoring function
      curriculum/pacing: Linear, Logarithmic # increase the training dataset size linearly and logarithmically
      curriculum.pacing.initial_size: 0.2 # use 20% of the training set initially
      curriculum.pacing.final_iteration: 0.8 # use all data after 80% of training
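To give an intuition for initial_size and final_iteration, the following sketch implements one common formulation of linear and logarithmic pacing from the curriculum learning literature. The function names and formulas are assumptions made for illustration; the pacing functions shipped with aucurriculum may be defined differently.

```python
import math


def linear_pacing(progress, initial_size, final_iteration):
    """Fraction of the training set used, growing linearly with progress.

    progress and final_iteration are fractions of the total training time.
    """
    if final_iteration <= 0:
        return 1.0  # no pacing: use the full training set from the start
    growth = min(1.0, progress / final_iteration)
    return min(1.0, initial_size + (1.0 - initial_size) * growth)


def logarithmic_pacing(progress, initial_size, final_iteration):
    """Fraction of the training set used, growing quickly early on."""
    if final_iteration <= 0:
        return 1.0
    growth = 1.0 + 0.1 * math.log(progress / final_iteration + math.exp(-10))
    growth = min(1.0, max(0.0, growth))  # clamp to the valid range
    return min(1.0, initial_size + (1.0 - initial_size) * growth)


# Values from the configuration above: start at 20%, reach 100% at 80%.
for progress in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    lin = linear_pacing(progress, 0.2, 0.8)
    log = logarithmic_pacing(progress, 0.2, 0.8)
    print(f"progress={progress:.1f} linear={lin:.2f} logarithmic={log:.2f}")
```

Under this formulation, both functions start at the initial size and reach the full training set at final_iteration, but the logarithmic variant adds most of the data early in training.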

Next, train a model with the scoring function and two different pacing functions using the following training CLI command:

aucurriculum train -cn curriculum_training

Alternatively, use the following training CLI wrapper function:

aucurriculum.cli.train(config_name="curriculum_training")
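The comma-separated value of curriculum/pacing is what produces two runs: Hydra's sweeper treats comma-separated values as alternatives and launches one run per combination. The sketch below only illustrates this grid expansion with a hypothetical helper `expand_sweep`. It is not Hydra's parser, which supports considerably more syntax.

```python
from itertools import product


def expand_sweep(params: dict[str, str]) -> list[dict[str, str]]:
    """Expand comma-separated sweep values into one dict per combination."""
    keys = list(params)
    choices = [[v.strip() for v in str(params[k]).split(",")] for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*choices)]


# A subset of the sweeper parameters from conf/curriculum_training.yaml.
sweep = {
    "curriculum/scoring": "CELoss",
    "curriculum/pacing": "Linear, Logarithmic",
}
runs = expand_sweep(sweep)
for run in runs:
    print(run)
```

Here the expansion yields two configurations that differ only in the pacing function.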

Finally, to compare the run without a curriculum against the runs with a curriculum, both individually and averaged across the pacing functions, run the following postprocessing command:

aucurriculum postprocess results default --aggregate curriculum.pacing

Alternatively, use the following postprocessing CLI wrapper function:

aucurriculum.cli.postprocess(
    results_dir="results",
    experiment_id="default",
    aggregate=[["curriculum.pacing"]],
)

Next Steps

For more information on creating configurations, refer to the autrainer quickstart, Hydra configurations, as well as the Hydra documentation.

To create custom scoring and pacing functions alongside configurations, refer to the tutorials.