Postprocessing#
Postprocessing summarizes, aggregates, and groups the results of a grid search. It can be run with the postprocessing CLI commands or the corresponding CLI wrapper functions.
Summarization#
SummarizeGrid summarizes the results of a grid search.
It creates one plot per metric and stores all validation and test results in a DataFrame.
A second DataFrame summarizes the hyperparameters.
- class autrainer.postprocessing.SummarizeGrid(results_dir, experiment_id, summary_dir='summary', training_dir='training', clear_old_outputs=True, training_type=None, max_runs_plot=None, plot_params=None)
Summarize the results of a grid search.
- Parameters:
  - results_dir (str) – The directory where the results are stored.
  - experiment_id (str) – The ID of the grid search experiment.
  - summary_dir (str) – The directory where the grid search summary will be stored. Defaults to “summary”.
  - training_dir (str) – The directory of the training results of the experiment. Defaults to “training”.
  - clear_old_outputs (bool) – Whether to clear existing summary outputs. Defaults to True.
  - training_type (Optional[str]) – The type of training, one of [“Epoch”, “Step”]. If None, it will be inferred from the training results. Defaults to None.
  - max_runs_plot (Optional[int]) – The maximum number of best runs to plot. If None, all runs will be plotted. Defaults to None.
  - plot_params (Union[DictConfig, Dict, None]) – Additional parameters for plotting. Defaults to None.
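As a minimal sketch of how the constructor documented above might be called from Python: the keyword names and defaults are taken from the signature, while the values for results_dir, experiment_id, and max_runs_plot are placeholders, and build_summarizer is a hypothetical helper. Which methods to call on the resulting object is not covered in this section, so the sketch stops at construction.

```python
from typing import Any, Dict

# Keyword arguments mirroring the documented constructor signature.
# "results", "default", and 10 are placeholder values, not library defaults.
summarize_kwargs: Dict[str, Any] = {
    "results_dir": "results",
    "experiment_id": "default",
    "summary_dir": "summary",
    "training_dir": "training",
    "clear_old_outputs": True,
    "training_type": None,  # None: inferred from the training results
    "max_runs_plot": 10,  # plot only the 10 best runs
    "plot_params": None,
}


def build_summarizer(kwargs: Dict[str, Any]):
    """Hypothetical helper: instantiate SummarizeGrid from a kwargs dict.

    Requires autrainer to be installed and the results directory to exist.
    """
    # Imported lazily so this module can be loaded without autrainer.
    from autrainer.postprocessing import SummarizeGrid

    return SummarizeGrid(**kwargs)
```

Keeping the arguments in a dictionary makes it easy to reuse the same settings across several experiment IDs by overriding only experiment_id.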
Aggregation#
AggregateGrid aggregates the results of a grid search over one or more hyperparameters.
- class autrainer.postprocessing.AggregateGrid(results_dir, experiment_id, aggregate_list, aggregate_prefix='agg', training_dir='training', max_runs_plot=None, aggregate_name=None, aggregated_dict=None, plot_params=None)
Aggregate the results of a grid search over one or more parameters.
If loggers have been used for the grid search, the aggregated results will be logged to the same loggers.
- Parameters:
  - results_dir (str) – The directory where the results are stored.
  - experiment_id (str) – The ID of the grid search experiment.
  - aggregate_list (List[str]) – The list of parameters to aggregate over.
  - aggregate_prefix (str) – The prefix for the aggregated experiment ID. Defaults to “agg”.
  - training_dir (str) – The directory of the training results of the experiment. Defaults to “training”.
  - max_runs_plot (Optional[int]) – The maximum number of best runs to plot. If None, all runs will be plotted. Defaults to None.
  - aggregate_name (Optional[str]) – The name of the aggregated experiment. If None, it will be generated from the aggregate_list. Defaults to None.
  - aggregated_dict (Optional[dict]) – A dictionary mapping the aggregated experiment names to the runs to aggregate. If None, the runs will be aggregated based on the aggregate_list. Defaults to None.
  - plot_params (Optional[dict]) – Additional parameters for plotting. Defaults to None.
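To illustrate what aggregating over a hyperparameter means, the following sketch groups runs that differ only in the parameters named in aggregate_list and reports the mean and standard deviation of a metric per group. This is not autrainer's implementation: the run records, the aggregate function, and the choice of mean and standard deviation as statistics are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Any, Dict, List, Tuple

# Hypothetical per-run results: hyperparameters plus a single metric.
runs: List[Dict[str, Any]] = [
    {"model": "cnn", "lr": 0.01, "seed": 1, "accuracy": 0.80},
    {"model": "cnn", "lr": 0.01, "seed": 2, "accuracy": 0.84},
    {"model": "cnn", "lr": 0.001, "seed": 1, "accuracy": 0.70},
    {"model": "cnn", "lr": 0.001, "seed": 2, "accuracy": 0.74},
]


def aggregate(
    runs: List[Dict[str, Any]], aggregate_list: List[str], metric: str
) -> Dict[Tuple, Dict[str, float]]:
    """Group runs that differ only in `aggregate_list` and summarize `metric`."""
    groups: Dict[Tuple, List[float]] = defaultdict(list)
    for run in runs:
        # The group key is every hyperparameter that is NOT aggregated over.
        key = tuple(
            sorted(
                (name, value)
                for name, value in run.items()
                if name not in aggregate_list and name != metric
            )
        )
        groups[key].append(run[metric])
    return {
        key: {
            "mean": mean(values),
            # stdev needs at least two values; fall back to 0.0 otherwise.
            "std": stdev(values) if len(values) > 1 else 0.0,
            "n": len(values),
        }
        for key, values in groups.items()
    }


# Aggregate over the random seed: one entry per (lr, model) combination.
per_config = aggregate(runs, aggregate_list=["seed"], metric="accuracy")
```

Aggregating over the seed is a common choice because it collapses repeated runs of the same configuration into a single, more robust estimate.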
Grouping#
GroupGrid manually groups the results of a grid search using a Hydra configuration file.
The configuration file defines the groups, which can be any combination of runs and can span multiple experiments.
The following example illustrates the structure of the configuration file:
Manual Grouping Example
Manual grouping is done by defining a YAML configuration file as shown below.
Multiple experiments (exp1, exp2, …) can be created to host the grouped runs.
runs is a list of the runs that are created for each experiment.
defaults:
  - _hydra_disable_logging_
  - _self_
  - plotting: Default # Use the default plotting configuration

results_dir: results # Directory to save results
max_runs: null # Maximum number of best runs to include in the summary plots

groupings:
  - experiment_id: exp1 # Experiment ID (will be created if it doesn't exist)
    create_summary: true # Whether to create a summary for the experiment
    dir: null # Optional global directory for all runs
    id: null # Optional global ID for all runs
    states: null # Optional global save states for all runs
    runs:
      - run_name: FirstRun # Run name
        dir: some_results_dir # Directory for the runs to be grouped
        id: some_exp # ID for the runs to be grouped
        states: false # Whether to copy the model states
        combine: # Runs to combine into run_name
          - SomeRun1
          - SomeRun2
      - run_name: SecondRun
        dir: some_results_dir
        id: some_exp
        states: false
        combine:
          - SomeRun3
          - SomeRun4
  - experiment_id: exp2 # Example with global parameters to be more concise
    create_summary: true
    dir: some_results_dir
    id: some_exp
    states: false
    runs:
      - run_name: FirstRun
        combine:
          - SomeRun1
          - SomeRun2
      - run_name: SecondRun
        combine:
          - SomeRun3
          - SomeRun4
- class autrainer.postprocessing.GroupGrid(results_dir, groupings, max_runs=None, plot_params=None)
Group runs of one or more grid search experiments based on the specified groupings.
- Parameters:
  - results_dir (str) – The directory where the results are stored.
  - groupings (Union[ListConfig[DictConfig], List[Dict]]) – A list of experiments to create containing one or more runs to group.
  - max_runs – The maximum number of best runs to plot. If None, all runs will be plotted. Defaults to None.
  - plot_params (Optional[dict]) – Additional parameters for plotting. Defaults to None.
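Because groupings also accepts a plain list of dictionaries, the exp2 grouping from the configuration file above could be written directly in Python, roughly as sketched below. The resolve_run and build_grouper helpers are hypothetical and not part of autrainer. In particular, the rule that a per-run dir, id, or states value takes precedence over the grouping-level value is an assumption inferred from the comments in the configuration file.

```python
from typing import Any, Dict, List, Optional

# Python equivalent of the "exp2" entry in the YAML configuration.
groupings: List[Dict[str, Any]] = [
    {
        "experiment_id": "exp2",
        "create_summary": True,
        "dir": "some_results_dir",  # global directory for all runs
        "id": "some_exp",  # global ID for all runs
        "states": False,  # do not copy model states
        "runs": [
            {"run_name": "FirstRun", "combine": ["SomeRun1", "SomeRun2"]},
            {"run_name": "SecondRun", "combine": ["SomeRun3", "SomeRun4"]},
        ],
    }
]


def resolve_run(grouping: Dict[str, Any], run: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: fill missing per-run values from the grouping.

    Assumes per-run values override the grouping-level (global) values.
    """
    resolved = dict(run)
    for key in ("dir", "id", "states"):
        if resolved.get(key) is None:
            resolved[key] = grouping.get(key)
    return resolved


def build_grouper(
    results_dir: str,
    groupings: List[Dict[str, Any]],
    max_runs: Optional[int] = None,
):
    """Hypothetical helper: instantiate GroupGrid; requires autrainer."""
    # Imported lazily so this module can be loaded without autrainer.
    from autrainer.postprocessing import GroupGrid

    return GroupGrid(results_dir=results_dir, groupings=groupings, max_runs=max_runs)


# Inspect what each run would look like once global values are applied.
resolved_runs = [
    resolve_run(grouping, run) for grouping in groupings for run in grouping["runs"]
]
```

Resolving the runs up front is only a convenience for checking the configuration before any results are copied or combined.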