Scoring Functions#
Scoring functions compute a difficulty score for each sample in a dataset. Difficulty scores are converted into a difficulty ordering by ranking all samples in ascending order of difficulty.
Tip
To create custom scoring functions, refer to the custom scoring functions tutorial.
Such a difficulty ordering may be used to create a curriculum: a sequence of samples ordered by difficulty that, in combination with a pacing function, can be used to train a model in downstream tasks.
Most scoring functions obtain sample difficulty scores from a single training configuration. aucurriculum additionally allows the automatic creation of ensemble scoring functions, which may combine multiple configurations and obtain the final difficulty ordering by averaging the per-sample difficulty scores across all atomic scoring functions.
Note
aucurriculum currently supports scoring functions exclusively for multi-class classification tasks, both for obtaining sample difficulty scores and for curriculum-based training.
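For illustration, the ensemble averaging described above can be sketched with a few lines of numpy (independent of the library; the score values are placeholders):

import numpy as np

# Hypothetical per-sample difficulty scores from two atomic scoring functions.
scores_a = np.array([0.10, 0.80, 0.40])
scores_b = np.array([0.20, 0.90, 0.30])

# Average the scores across all atomic scoring functions ...
mean_scores = np.mean([scores_a, scores_b], axis=0)

# ... and order the samples by ascending mean difficulty.
difficulty_ordering = np.argsort(mean_scores)
print(difficulty_ordering)  # easiest to hardest: [0 2 1]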
aucurriculum provides model-based, predefined, and random scoring functions to compute sample difficulty scores.
Tip
All scoring function configurations contain placeholder values (indicated by ???) that need to be replaced with the appropriate values.
For more information on how to configure scoring functions, refer to the quickstart guide.
Curriculum Score Manager#
CurriculumScoreManager manages the calculation of scoring functions in three steps:
- preprocess(): Preprocess a scoring function configuration, creating one or more atomic scoring functions.
- run(): Run a single atomic scoring function, producing a difficulty score for each sample.
- postprocess(): Optionally combine multiple atomic scoring functions, creating a single difficulty score for each sample in a dataset.
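A minimal usage sketch of these three steps (assuming the curriculum configuration has already been assembled, e.g. via Hydra; the paths and score ID below are placeholders) could look as follows:

from omegaconf import OmegaConf
from aucurriculum.curricula import CurriculumScoreManager

# Hypothetical curriculum configuration and output directory.
cfg = OmegaConf.load("conf/curriculum.yaml")
manager = CurriculumScoreManager(cfg, output_directory="results/curriculum")

# 1. Create one or more atomic scoring function configurations and run names.
configs, run_names = manager.preprocess()

# 2. Run each atomic scoring function, producing per-sample difficulty scores.
for run_config, run_name in zip(configs, run_names):
    manager.run(run_config, run_name)

# 3. Optionally combine the atomic scoring functions into a single ordering.
manager.postprocess(score_id="MyScore")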
- class aucurriculum.curricula.CurriculumScoreManager(cfg, output_directory)[source]#
Curriculum score manager to control pre-processing, running, and post-processing of scoring functions.
- Parameters:
  - cfg (DictConfig) – Curriculum configuration.
  - output_directory (str) – The output directory to save the results.
- preprocess()[source]#
Preprocess the scoring function, possibly creating one or more configurations and run names.
- Return type: Tuple[list, list]
- Returns: List of configurations and list of run names.
- run(run_config, run_name)[source]#
Run a single scoring function.
- Parameters:
  - run_config (DictConfig) – The run configuration.
  - run_name (str) – The name of the run.
- Return type: None
- postprocess(score_id, correlation=None)[source]#
Postprocess the scoring function and optionally create a correlation matrix.
- Parameters:
  - score_id (str) – The score ID to postprocess.
  - correlation (Optional[DictConfig]) – The correlation matrix configuration. Dictionary of lists of score IDs to include in a single correlation matrix. Defaults to None.
- Return type: None
Abstract Scoring Function#
All scoring functions inherit from the AbstractScore class and implement the run() method, calculating the difficulty scores for each sample in the dataset.
AbstractScore additionally provides common methods that are shared among most scoring functions.
Note
Scoring functions can optionally override the preprocess() and postprocess() methods to perform additional operations before and after the scoring function is run, such as combining multiple atomic scoring functions in a different way than averaging.
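For orientation, a hedged sketch of a custom scoring function is shown below; it is not part of aucurriculum, and the dataset attribute names and output path are assumptions made for illustration:

import numpy as np
from omegaconf import DictConfig

from aucurriculum.curricula.scoring import AbstractScore


class ToyScore(AbstractScore):  # hypothetical example, not part of aucurriculum
    """Toy scoring function assigning an arbitrary difficulty score to each sample."""

    def run(self, config: DictConfig, run_config: DictConfig, run_name: str) -> None:
        # Instantiate the dataset and model of the underlying training run.
        data, model = self.prepare_data_and_model(run_config)

        # A real scoring function would derive scores from model outputs,
        # losses, or training dynamics; here they are simply random.
        num_samples = len(data.df_train)  # attribute name is an assumption
        scores = np.random.default_rng(0).random(num_samples)
        labels = data.df_train["encoded_target"].to_numpy()  # column name is an assumption

        # Assemble and store the per-sample scores using the shared helpers.
        df = self.create_dataframe(scores, labels, data)
        self.save_scores(df, f"{self.output_directory}/{run_name}.csv")  # path layout is an assumption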
- class aucurriculum.curricula.scoring.AbstractScore(output_directory, results_dir, experiment_id, run_name, stop=None, subset='train', reverse_score=False, criterion=None)[source]#
Abstract class for scoring functions.
- Parameters:
  - output_directory (str) – Directory where the scores will be stored.
  - results_dir (str) – The directory where the results are stored.
  - experiment_id (str) – The ID of the grid search experiment.
  - run_name (Union[str, List[str]]) – Name or list of names of the runs to score. Runs can be single runs or aggregated runs.
  - stop (Optional[str]) – Model state dict to load or to stop at in [“best”, “last”]. Defaults to None.
  - subset (str) – Dataset subset to use for scoring in [“train”, “dev”, “test”]. Defaults to “train”.
  - reverse_score (bool) – Whether to reverse the score ranking. Defaults to False.
  - criterion (Optional[str]) – The criterion to use for scoring. If None, no criterion will be used. Defaults to None.
- Raises:
ValueError – If subset is not in [“train”, “dev”, “test”] or if stop is not in [“best”, “last”, None].
- preprocess()[source]#
Preprocess one or multiple runs, creating a list of configurations and a list of run names to score.
- Return type: Tuple[list, list]
- Returns: List of configurations and list of run names to score.
- abstract run(config, run_config, run_name)[source]#
Run the scoring function for a single run and generate the scores.
- Parameters:
  - config (DictConfig) – The configuration of the curriculum scoring function.
  - run_config (DictConfig) – The configuration of the run to score.
  - run_name (str) – The name of the run to score.
- Return type: None
- postprocess(score_id, runs)[source]#
Postprocess the scores and create the final scoring function ordering by averaging the scores of multiple runs and ranking the samples based on the mean score.
- Parameters:
  - score_id (str) – ID of the score to save.
  - runs (list) – List of run names to postprocess and include in the score.
- Return type: None
- split_run_name(run_name)[source]#
Split the full run name into the underlying training run name and the full run name containing the stop iteration (and optional criterion).
- Parameters:
  - run_name (str) – The run name to split.
- Return type: Tuple[str, str]
- Returns: The name of the underlying training run and the full run name.
- create_criterion(data, reduction='none')[source]#
Create the criterion for the scoring function based on the criterion configuration.
- Parameters:
  - data (AbstractDataset) – Dataset to use for criterion setup.
  - reduction (str) – Reduction to use for the criterion. Defaults to “none”.
- Return type: Module
- Returns: Criterion for the scoring function.
- static prepare_data_and_model(cfg)[source]#
Prepare the dataset and model for the scoring function based on the underlying training run configuration.
- Parameters:
  - cfg (DictConfig) – The configuration of the underlying training run.
- Return type: Tuple[AbstractDataset, AbstractModel]
- Returns: The instantiated dataset and model.
- load_model_checkpoint(model, run_name)[source]#
Load the trained model checkpoint based on the run name and run configuration. The model will be loaded from the best checkpoint if stop is set to “best” and from the last checkpoint if stop is set to “last”.
- Parameters:
  - model (Module) – Model to load the checkpoint into.
  - run_name (str) – Name of the run to load the checkpoint from.
- Return type: None
- static forward_pass(model, loader, batch_size, output_map_fn, output_size=None, tqdm_desc='Scoring Forward Pass', disable_progress_bar=True, device=None, timer=None)[source]#
Perform a forward pass through the model and return the outputs and labels.
- Parameters:
  - model (AbstractModel) – Model to perform the forward pass with.
  - loader (DataLoader) – DataLoader to use for the forward pass.
  - batch_size (int) – Batch size to use for the forward pass.
  - output_map_fn (Callable[[Tensor], Tensor]) – Function to map the model outputs to the desired output format.
  - output_size (Optional[int]) – Size of the output tensor. If None, the model output should be a single scalar. Defaults to None.
  - tqdm_desc (str) – Description for the tqdm progress bar. Defaults to “Scoring Forward Pass”.
  - disable_progress_bar (bool) – Whether to disable the progress bar. Defaults to True.
  - device (Optional[device]) – Device to use for the forward pass. If None, the device will be set to “cpu”. Defaults to None.
  - timer (Optional[Timer]) – Timer to time the forward pass. If provided, the timer is started before the forward pass and stopped after the forward pass. Defaults to None.
- Return type: Tuple[ndarray, ndarray]
- Returns: Mapped model outputs and labels.
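As a hedged usage sketch (the dataset attributes and the softmax output mapping are assumptions for illustration), forward_pass could be used inside a scoring function roughly as follows:

import torch
from torch.utils.data import DataLoader

from aucurriculum.curricula.scoring import AbstractScore


def collect_probabilities(data, model, num_classes: int):
    """Illustrative helper returning per-sample class probabilities and labels.

    `data` and `model` are assumed to come from prepare_data_and_model; the
    `train_dataset` attribute name is an assumption for illustration.
    """
    loader = DataLoader(data.train_dataset, batch_size=32, shuffle=False)
    outputs, labels = AbstractScore.forward_pass(
        model=model,
        loader=loader,
        batch_size=32,
        # Map raw logits to class probabilities before they are collected.
        output_map_fn=lambda x: torch.softmax(x, dim=-1),
        output_size=num_classes,
        disable_progress_bar=False,
    )
    return outputs, labels  # numpy arrays that can be turned into difficulty scores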
- create_dataframe(scores, labels, data)[source]#
Create a dataframe from the scores, labels, and dataset.
- Parameters:
  - scores (ndarray) – The score for each sample.
  - labels (ndarray) – The encoded labels for each sample.
  - data (AbstractDataset) – The dataset.
- Return type: DataFrame
- Returns: The dataframe with the scores, encoded labels, and decoded labels.
- rank_and_normalize(df)[source]#
Rank and normalize the scores in the dataframe by ranking the scores using method=”first” and normalizing the ranks to the range [0, 1]. If reverse_score is set to True, the ranks will be reversed.
In the resulting difficulty ordering, lower ranks always indicate easier samples.
- Parameters:
  - df (DataFrame) – The output dataframe with the “mean” column containing the scores.
- Return type: Series
- Returns: The normalized ranks.
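The ranking and normalization can be illustrated with a small, self-contained pandas example (the exact normalization used by the library may differ; this is an illustration of the idea):

import pandas as pd

df = pd.DataFrame({"mean": [0.7, 0.1, 0.4, 0.4]})

# Rank the mean scores; method="first" breaks ties by order of appearance.
ranks = df["mean"].rank(method="first")   # [4, 1, 2, 3]

# Normalize the ranks to the range [0, 1]; lower values indicate easier samples.
normalized = (ranks - 1) / (len(df) - 1)  # [1.0, 0.0, 0.33, 0.67]

# With reverse_score=True, the ranking is inverted instead.
reversed_ranks = 1 - normalized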
- static save_scores(df, path)[source]#
Save the scores dataframe to the specified path.
- Parameters:
  - df (DataFrame) – The scores dataframe.
  - path (str) – The path to save the scores dataframe.
- Return type: None
Model-based Scoring Functions#
Model-based scoring functions obtain a difficulty score for each sample by leveraging trained models, using signals such as training dynamics, model predictions, or losses to determine the difficulty of a sample.
Most model-based scoring functions require a trained model to compute the difficulty scores, which is specified in the scoring function configuration under the run_name parameter.
The run_name should be a run name or a list of run names from which to load the models for scoring, and should exist in the results_dir and experiment_id set in the curriculum scoring configuration file (conf/curriculum.yaml).
It is also possible to specify (lists of) aggregated run names which are automatically resolved to the underlying runs,
effectively creating an ensemble scoring function.
- class aucurriculum.curricula.scoring.CELoss(output_directory, results_dir, experiment_id, run_name, criterion, stop='best', subset='train')[source]#
Cross-Entropy Loss scoring function computing the cross-entropy loss for each sample in the dataset individually. Originally termed bootstrapping, it is implemented as described in: https://arxiv.org/abs/1904.03626
- Parameters:
  - output_directory (str) – Directory where the scores will be stored.
  - results_dir (str) – The directory where the results are stored.
  - experiment_id (str) – The ID of the grid search experiment.
  - run_name (str) – Name or list of names of the runs to score. Runs can be single runs or aggregated runs.
  - criterion (str) – The criterion to use for obtaining the per-example loss. The reduction of the criterion is automatically set to “none”.
  - stop (str) – Model state dict to load or to stop at in [“best”, “last”]. Defaults to “best”.
  - subset (str) – Dataset subset to use for scoring in [“train”, “dev”, “test”]. Defaults to “train”.
Default Configurations
id: CELoss
type: CELoss
_target_: aucurriculum.curricula.scoring.CELoss
run_name: ???
criterion: autrainer.criterions.CrossEntropyLoss
stop: best # "best" or "last"
subset: train # train, dev, test
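The per-example loss underlying this scoring function can be sketched in plain PyTorch (independent of the library): a criterion with reduction="none" yields one loss value per sample, which directly serves as its difficulty score.

import torch

criterion = torch.nn.CrossEntropyLoss(reduction="none")

logits = torch.tensor([[2.0, 0.1, -1.0],   # confident and correct -> low loss (easy)
                       [0.2, 0.3, 0.1]])   # uncertain             -> high loss (hard)
targets = torch.tensor([0, 1])

per_sample_loss = criterion(logits, targets)  # one difficulty score per sample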
- class aucurriculum.curricula.scoring.CumulativeAccuracy(output_directory, results_dir, experiment_id, run_name, stop='best', subset='train')[source]#
Cumulative Accuracy scoring function computing the mean accuracy from the first to the stop epoch for each sample in the dataset individually. The scoring function serves as a computationally less expensive proxy to the C-score as described in: https://arxiv.org/abs/2002.03206
- Parameters:
  - output_directory (str) – Directory where the scores will be stored.
  - results_dir (str) – The directory where the results are stored.
  - experiment_id (str) – The ID of the grid search experiment.
  - run_name (str) – Name or list of names of the runs to score. Runs can be single runs or aggregated runs.
  - stop (str) – Model state dict to load or to stop at in [“best”, “last”]. Defaults to “best”.
  - subset (str) – Dataset subset to use for scoring in [“train”, “dev”, “test”]. Defaults to “train”.
Default Configurations
id: CumulativeAccuracy
type: CumulativeAccuracy
_target_: aucurriculum.curricula.scoring.CumulativeAccuracy
run_name: ???
stop: best # "best" or "last"
subset: train # train, dev, test
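The underlying idea can be illustrated with a small numpy example (independent of the library): per-epoch correctness indicators are averaged from the first to the stop epoch, and a lower mean accuracy indicates a harder sample.

import numpy as np

# Rows: epochs 1-4, columns: samples; 1 = sample predicted correctly in that epoch.
correct = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
])

# Mean accuracy per sample from the first to the stop epoch.
cumulative_accuracy = correct.mean(axis=0)  # [1.0, 0.5, 0.25] -> sample 0 easiest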
- class aucurriculum.curricula.scoring.CVLoss(output_directory, results_dir, experiment_id, splits, setup, criterion, stop='best', subset='train')[source]#
Cross-Validation Loss scoring function computing the cross-entropy loss for each sample in the dataset individually. The dataset is split into splits parts and the loss is computed for each part individually by training on the remaining parts as described in: TODO: add reference once paper is published.
- Parameters:
  - output_directory (str) – Directory where the scores will be stored.
  - results_dir (str) – The directory where the results are stored.
  - experiment_id (str) – The ID of the grid search experiment.
  - splits (int) – Number of splits for the cross-validation.
  - setup (DictConfig) – Configuration for the grid search to perform for each split. Each configuration parameter can be a string or list of strings for multiple configurations. The following parameters are required:
    - filters: Optional list of filters to apply to the runs.
    - dataset: Dataset ID.
    - model: Model ID.
    - optimizer: Optimizer ID.
    - learning_rate: Learning rate.
    - scheduler: Scheduler ID.
    - augmentation: Augmentation ID.
    - seed: Seed.
    - batch_size: Batch size.
    - inference_batch_size: Batch size for inference.
    - plotting: Plotting ID.
    - training_type: Training type.
    - iterations: Number of iterations.
    - eval_frequency: Evaluation frequency.
    - save_frequency: Save frequency.
    - save_train_outputs: Whether to save the training outputs.
    - save_dev_outputs: Whether to save the dev outputs.
    - save_test_outputs: Whether to save the test outputs.
  - criterion (str) – The criterion to use for obtaining the per-example loss. The reduction of the criterion is automatically set to “none”.
  - stop (str) – Model state dict to load or to stop at in [“best”, “last”]. Defaults to “best”.
  - subset (str) – Dataset subset to use for scoring in [“train”, “dev”, “test”]. Defaults to “train”.
- Raises:
ValueError – If the number of splits is less than 2.
Default Configurations
id: CVLoss
type: CVLoss
_target_: aucurriculum.curricula.scoring.CVLoss
splits: 3
setup:
  filters: null
  dataset: ???
  model: ???
  optimizer: ???
  learning_rate: ???
  scheduler: ???
  augmentation: ???
  seed: ???
  batch_size: ???
  inference_batch_size: ???
  plotting: Default
  training_type: epoch
  iterations: 5
  eval_frequency: 1
  save_frequency: 1
  save_train_outputs: true
  save_dev_outputs: true
  save_test_outputs: true
criterion: autrainer.criterions.CrossEntropyLoss
stop: best
subset: train
- class aucurriculum.curricula.scoring.FirstIteration(output_directory, results_dir, experiment_id, run_name, stop='best', subset='train')[source]#
First Iteration scoring function computing, for each sample in the dataset individually, the first iteration from which the model correctly predicts the target in all subsequent iterations, as described in: https://arxiv.org/abs/2012.03107
- Parameters:
  - output_directory (str) – Directory where the scores will be stored.
  - results_dir (str) – The directory where the results are stored.
  - experiment_id (str) – The ID of the grid search experiment.
  - run_name (str) – Name or list of names of the runs to score. Runs can be single runs or aggregated runs.
  - stop (str) – Model state dict to load or to stop at in [“best”, “last”]. Defaults to “best”.
  - subset (str) – Dataset subset to use for scoring in [“train”, “dev”, “test”]. Defaults to “train”.
Default Configurations
id: FirstIteration
type: FirstIteration
_target_: aucurriculum.curricula.scoring.FirstIteration
run_name: ???
stop: best # "best" or "last"
subset: train # train, dev, test
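The idea can be sketched for a single sample with numpy (independent of the library): the score is the first iteration from which the prediction remains correct until the end of training, so larger values indicate harder samples.

import numpy as np

# Per-iteration correctness of one sample (1 = correct prediction).
correct = np.array([0, 1, 0, 1, 1, 1])

# Mark iterations from which the prediction stays correct until the end.
suffix_correct = np.flip(np.cumprod(np.flip(correct)))  # [0, 0, 0, 1, 1, 1]

# First such iteration; if the sample is never learned, fall back to the total count.
first_learned = int(np.argmax(suffix_correct)) if suffix_correct.any() else len(correct)
print(first_learned)  # 3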
- class aucurriculum.curricula.scoring.PredictionDepth(output_directory, results_dir, experiment_id, run_name, probe_placements, max_embedding_size=None, match_dimensions=False, knn_n_neighbors=30, knn_batch_size=1024, save_embeddings=False, stop='best', subset='train')[source]#
Prediction Depth scoring function computing the depth at which the first and all subsequent KNN probes align with the model’s prediction for each sample in the dataset individually as described in: https://arxiv.org/abs/2106.09647
- Parameters:
  - output_directory (str) – Directory where the scores will be stored.
  - results_dir (str) – The directory where the results are stored.
  - experiment_id (str) – The ID of the grid search experiment.
  - run_name (str) – Name or list of names of the runs to score. Runs can be single runs or aggregated runs.
  - probe_placements (Union[List[str], Dict[str, List[str]]]) – Names of the nodes in the traced model graph where the probes should be placed, specified using regex patterns. The input and output of the model are automatically added. If a list is provided, the same placements will be used for all runs. If a dictionary is provided, the placements will be used for the corresponding run names.
  - max_embedding_size (Optional[int]) – Maximum dimensionality of the flattened embeddings. If embeddings exceed this size, they will be pooled. Defaults to None.
  - match_dimensions (bool) – Whether to match the spatial dimensions of the embeddings and create square embeddings. Defaults to False.
  - knn_n_neighbors (int) – Number of neighbors to use for the parallel k-nearest neighbors algorithm. Defaults to 30.
  - knn_batch_size (int) – Batch size for the parallel k-nearest neighbors algorithm. Defaults to 1024.
  - save_embeddings (bool) – Whether to save the embeddings for each probe. Defaults to False.
  - stop (str) – Model state dict to load or to stop at in [“best”, “last”]. Defaults to “best”.
  - subset (str) – Dataset subset to use for scoring in [“train”, “dev”, “test”]. Defaults to “train”.
Default Configurations
id: PredictionDepth
type: PredictionDepth
_target_: aucurriculum.curricula.scoring.PredictionDepth
run_name: ???
probe_placements: ???
max_embedding_size: 65536
match_dimensions: true
stop: best # "best" or "last"
subset: train # train, dev, test
- class aucurriculum.curricula.scoring.TransferTeacher(output_directory, results_dir, experiment_id, model, dataset, subset='train')[source]#
Transfer Teacher scoring function that computes, for each sample in the dataset, the margin to the decision boundary of a support vector machine (SVM) trained on the embeddings of a pre-trained model, as described in: https://arxiv.org/abs/1904.03626
- Parameters:
  - output_directory (str) – Directory where the scores will be stored.
  - results_dir (str) – The directory where the results are stored.
  - experiment_id (str) – The ID of the grid search experiment.
  - model (Union[str, List[str]]) – Model ID or list of model IDs to use for scoring.
  - dataset (str) – Dataset ID to use for scoring.
  - subset (str) – Dataset subset to use for scoring in [“train”, “dev”, “test”]. Defaults to “train”.
Default Configurations
id: TransferTeacher
type: TransferTeacher
_target_: aucurriculum.curricula.scoring.TransferTeacher
model: ???
dataset: ???
subset: train # train, dev, test
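A hedged sketch of the underlying idea, using scikit-learn and random placeholder embeddings (the library's actual implementation may differ): a linear SVM is fit on embeddings from a pre-trained model, and the distance of each sample to the decision boundary of its own class serves as the score, with larger margins indicating easier samples.

import numpy as np
from sklearn.svm import LinearSVC

# Placeholder embeddings of a pre-trained model and their class labels.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 16))
labels = rng.integers(0, 3, size=100)

svm = LinearSVC().fit(embeddings, labels)

# Signed distances to each class boundary, shape (n_samples, n_classes).
decision = svm.decision_function(embeddings)
margins = np.abs(decision[np.arange(len(labels)), labels])

# Larger margins -> easier samples, so the difficulty is the negative margin.
difficulty = -margins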
Predefined Scoring Functions#
Predefined scoring functions determine the difficulty of a sample based on predefined criteria and are specified in a CSV file.
- class aucurriculum.curricula.scoring.Predefined(output_directory, results_dir, experiment_id, file, scores_column, reverse, dataset, subset='train')[source]#
Predefined scoring function using predefined scores from a file.
- Parameters:
  - output_directory (str) – Directory where the scores will be stored.
  - results_dir (str) – The directory where the results are stored.
  - experiment_id (str) – The ID of the grid search experiment.
  - file (str) – Path to the file containing the scores.
  - scores_column (str) – Column name of the scores in the file.
  - reverse (bool) – Whether to reverse the order of the scores.
  - dataset (str) – Dataset ID to use for scoring.
  - subset (str) – Dataset subset to use for scoring in [“train”, “dev”, “test”]. Defaults to “train”.
- Raises:
ValueError – If the file does not exist.
Default Configurations
id: Predefined
type: Predefined
_target_: aucurriculum.curricula.scoring.Predefined
file: ???
scores_column: ???
reverse: ??? # true, false
dataset: ???
subset: train # train, dev, test
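For illustration, a hypothetical score file and the effect of scores_column and reverse could look as follows (the file contents and column names are assumptions):

from io import StringIO

import pandas as pd

# Hypothetical predefined score file.
csv = StringIO(
    "filename,word_frequency\n"
    "sample_0.wav,0.92\n"
    "sample_1.wav,0.13\n"
    "sample_2.wav,0.55\n"
)
df = pd.read_csv(csv)

# scores_column selects which column provides the difficulty scores.
scores = df["word_frequency"]

# reverse=true flips the resulting ordering, e.g. when larger values
# should indicate easier rather than harder samples.
ordering = scores.rank(ascending=False)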
Random Scoring Functions#
Random scoring functions assign a random difficulty score to each sample in the dataset.
- class aucurriculum.curricula.scoring.Random(output_directory, results_dir, experiment_id, dataset, seed, subset='train')[source]#
Random scoring function that assigns random scores to each sample in the dataset.
- Parameters:
  - output_directory (str) – Directory where the scores will be stored.
  - results_dir (str) – The directory where the results are stored.
  - experiment_id (str) – The ID of the grid search experiment.
  - dataset (str) – Dataset ID to use for scoring.
  - seed (int) – Seed to use for random scoring.
  - subset (str) – Dataset subset to use for scoring in [“train”, “dev”, “test”]. Defaults to “train”.
Default Configurations
id: Random
type: Random
_target_: aucurriculum.curricula.scoring.Random
dataset: ???
seed: 1
subset: train # train, dev, test