Postprocessing#

Postprocessing summarizes, aggregates, and groups the results of a grid search. It can be run using the postprocessing CLI commands or the postprocessing CLI wrapper functions.

Summarization#

SummarizeGrid is used to summarize the results of the grid search. For each metric, a plot is created. All validation and test results are stored in a DataFrame. In addition, a DataFrame summarizing the hyperparameters is created.

class autrainer.postprocessing.SummarizeGrid(results_dir, experiment_id, summary_dir='summary', training_dir='training', clear_old_outputs=True, training_type=None, max_runs_plot=None, plot_params=None)[source]#

Summarize the results of a grid search.

Parameters:
  • results_dir (str) – The directory where the results are stored.

  • experiment_id (str) – The ID of the grid search experiment.

  • summary_dir (str) – The directory where the grid search summary will be stored. Defaults to “summary”.

  • training_dir (str) – The directory of the training results of the experiment. Defaults to “training”.

  • clear_old_outputs (bool) – Whether to clear existing summary outputs. Defaults to True.

  • training_type (Optional[str]) – The type of training in [“Epoch”, “Step”]. If None, it will be inferred from the training results. Defaults to None.

  • max_runs_plot (Optional[int]) – The maximum number of best runs to plot. If None, all runs will be plotted. Defaults to None.

  • plot_params (Union[DictConfig, Dict, None]) – Additional parameters for plotting. Defaults to None.

summarize()[source]#

Summarize the results of the grid search.

Return type:

None

plot_metrics()[source]#

Plot the metrics of the grid search.

Return type:

None

plot_aggregated_bars()[source]#

Plot additional aggregated bars for the grid search.

Return type:

None
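
As a minimal sketch of how the class and methods above fit together, the following collects the documented constructor arguments and then calls the three methods in the order they are listed. The values "results" and "default" are placeholders, the helper names are hypothetical, and the call order is an assumption rather than something this page prescribes. The autrainer import is deferred so that the argument helper can be used without the library installed.

```python
from typing import Any, Dict, Optional


def build_summary_kwargs(
    results_dir: str,
    experiment_id: str,
    max_runs_plot: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect SummarizeGrid arguments (hypothetical helper, not part of autrainer)."""
    return {
        "results_dir": results_dir,
        "experiment_id": experiment_id,
        "summary_dir": "summary",    # documented default
        "training_dir": "training",  # documented default
        "clear_old_outputs": True,   # documented default
        "training_type": None,       # None: inferred from the training results
        "max_runs_plot": max_runs_plot,
        "plot_params": None,
    }


def summarize_experiment(**kwargs: Any) -> None:
    """Run the documented SummarizeGrid methods (requires autrainer)."""
    # Imported lazily so the helper above works without autrainer installed.
    from autrainer.postprocessing import SummarizeGrid

    grid = SummarizeGrid(**kwargs)
    grid.summarize()             # summarize the results of the grid search
    grid.plot_metrics()          # plot the metrics of the grid search
    grid.plot_aggregated_bars()  # plot additional aggregated bars


# Placeholder values: replace with an existing results directory and experiment ID.
kwargs = build_summary_kwargs("results", "default", max_runs_plot=5)
# summarize_experiment(**kwargs)  # uncomment once the experiment exists on disk
```

Passing max_runs_plot=5 limits the plots to the five best runs, while the default of None plots all runs.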

Aggregation#

AggregateGrid is used to aggregate the results of the grid search. The results are aggregated over one or more hyperparameters.

class autrainer.postprocessing.AggregateGrid(results_dir, experiment_id, aggregate_list, aggregate_prefix='agg', training_dir='training', max_runs_plot=None, aggregate_name=None, aggregated_dict=None, plot_params=None)[source]#

Aggregate the results of a grid search over one or more parameters.

If loggers have been used for the grid search, the aggregated results will be logged to the same loggers.

Parameters:
  • results_dir (str) – The directory where the results are stored.

  • experiment_id (str) – The ID of the grid search experiment.

  • aggregate_list (List[str]) – The list of parameters to aggregate over.

  • aggregate_prefix (str) – The prefix for the aggregated experiment ID. Defaults to “agg”.

  • training_dir (str) – The directory of the training results of the experiment. Defaults to “training”.

  • max_runs_plot (Optional[int]) – The maximum number of best runs to plot. If None, all runs will be plotted. Defaults to None.

  • aggregate_name (Optional[str]) – The name of the aggregated experiment. If None, it will be generated from the aggregate_list. Defaults to None.

  • aggregated_dict (Optional[dict]) – A dictionary mapping the aggregated experiment names to the runs to aggregate. If None, the runs will be aggregated based on the aggregate_list. Defaults to None.

  • plot_params (Optional[dict]) – Additional parameters for plotting. Defaults to None.

aggregate()[source]#

Aggregate the runs based on the specified parameters.

Return type:

None

summarize()[source]#

Summarize the aggregated results.

Return type:

None
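
To illustrate what aggregating over a hyperparameter means, the toy sketch below groups hypothetical runs by every hyperparameter except those in aggregate_list and averages one metric per group. It is not the library's implementation: the run records, the metric name, and the use of the mean as the aggregation function are all assumptions made for illustration. The second function shows how the documented class might be called, with the autrainer import deferred.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List, Tuple

# Hypothetical runs: hyperparameters plus one metric value per run.
runs = [
    {"model": "cnn", "lr": 0.01, "seed": 1, "accuracy": 0.80},
    {"model": "cnn", "lr": 0.01, "seed": 2, "accuracy": 0.84},
    {"model": "cnn", "lr": 0.1, "seed": 1, "accuracy": 0.70},
    {"model": "cnn", "lr": 0.1, "seed": 2, "accuracy": 0.74},
]

GroupKey = Tuple[Tuple[str, Any], ...]


def aggregate_over(
    runs: List[Dict[str, Any]], aggregate_list: List[str], metric: str
) -> Dict[GroupKey, float]:
    """Average `metric` over all runs that differ only in `aggregate_list`."""
    groups: Dict[GroupKey, List[float]] = defaultdict(list)
    for run in runs:
        # The group key is every hyperparameter that is NOT aggregated over.
        key = tuple(
            sorted(
                (name, value)
                for name, value in run.items()
                if name not in aggregate_list and name != metric
            )
        )
        groups[key].append(run[metric])
    # Assumption: the mean is used here purely as an example aggregation.
    return {key: mean(values) for key, values in groups.items()}


def aggregate_experiment(
    results_dir: str, experiment_id: str, aggregate_list: List[str]
) -> None:
    """Call the documented AggregateGrid methods (requires autrainer)."""
    from autrainer.postprocessing import AggregateGrid

    grid = AggregateGrid(
        results_dir=results_dir,
        experiment_id=experiment_id,
        aggregate_list=aggregate_list,
    )
    grid.aggregate()  # aggregate the runs based on the specified parameters
    grid.summarize()  # summarize the aggregated results


# Aggregating over "seed" leaves one entry per (lr, model) combination.
aggregated = aggregate_over(runs, ["seed"], "accuracy")
```

In this toy data, the four runs collapse into two aggregated entries, one per learning rate, because the seed is the only hyperparameter that is averaged away.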

Grouping#

GroupGrid is used to manually group the results of the grid search using a Hydra configuration file. The configuration file defines the groups, which can be any combination of runs and can span multiple experiments.

The following example illustrates the structure of the configuration file:

Manual Grouping Example

Manual grouping is defined in a YAML configuration file as shown below. Multiple experiments (exp1, exp2, …) can be created to host the grouped runs. For each experiment, runs lists the runs to create.

conf/grouping.yaml#
defaults:
  - _hydra_disable_logging_
  - _self_
  - plotting: Default # Use the default plotting configuration

results_dir: results # Directory to save results
max_runs: null # Maximum number of best runs to include in the summary plots

groupings:
  - experiment_id: exp1 # Experiment ID (will be created if it doesn't exist)
    create_summary: true # Whether to create a summary for the experiment
    dir: null # Optional global directory for all runs
    id: null # Optional global ID for all runs
    states: null # Optional global save states for all runs
    runs:
      - run_name: FirstRun # Run name
        dir: some_results_dir # Directory for the runs to be grouped
        id: some_exp # ID for the runs to be grouped
        states: false # Whether to copy the model states
        combine: # Runs to combine into run_name
          - SomeRun1
          - SomeRun2
      - run_name: SecondRun
        dir: some_results_dir
        id: some_exp
        states: false
        combine:
          - SomeRun3
          - SomeRun4
  - experiment_id: exp2 # Example with global parameters to be more concise
    create_summary: true
    dir: some_results_dir
    id: some_exp
    states: false
    runs:
      - run_name: FirstRun
        combine:
          - SomeRun1
          - SomeRun2
      - run_name: SecondRun
        combine:
          - SomeRun3
          - SomeRun4

class autrainer.postprocessing.GroupGrid(results_dir, groupings, max_runs=None, plot_params=None)[source]#

Group runs of one or more grid search experiments based on the specified groupings.

Parameters:
  • results_dir (str) – The directory where the results are stored.

  • groupings (Union[ListConfig[DictConfig], List[Dict]]) – A list of experiments to create containing one or more runs to group.

  • max_runs (Optional[int]) – The maximum number of best runs to plot. If None, all runs will be plotted. Defaults to None.

  • plot_params (Optional[dict]) – Additional parameters for plotting. Defaults to None.

group_runs()[source]#

Group the runs of the specified experiments based on the groupings.

Return type:

None
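
Since groupings is also accepted as a list of dictionaries, the grouping can be sketched directly in Python instead of YAML. The example below mirrors the exp2 entry of the configuration file above, using its global dir, id, and states keys. The helper make_grouping and all directory, experiment, and run names are hypothetical placeholders, and the autrainer import is deferred so that the groupings structure can be built and inspected without the library installed.

```python
from typing import Any, Dict, List


def make_grouping(
    experiment_id: str,
    source_dir: str,
    source_id: str,
    runs: Dict[str, List[str]],
) -> Dict[str, Any]:
    """Build one groupings entry with global dir/id/states (hypothetical helper)."""
    return {
        "experiment_id": experiment_id,  # created if it does not exist
        "create_summary": True,          # create a summary for the experiment
        "dir": source_dir,               # global directory for all runs
        "id": source_id,                 # global ID for all runs
        "states": False,                 # do not copy the model states
        "runs": [
            {"run_name": name, "combine": list(members)}
            for name, members in runs.items()
        ],
    }


def group_experiments(results_dir: str, groupings: List[Dict[str, Any]]) -> None:
    """Call the documented GroupGrid method (requires autrainer)."""
    from autrainer.postprocessing import GroupGrid

    GroupGrid(results_dir=results_dir, groupings=groupings).group_runs()


# Placeholder names taken from the example configuration file.
groupings = [
    make_grouping(
        "exp2",
        "some_results_dir",
        "some_exp",
        {
            "FirstRun": ["SomeRun1", "SomeRun2"],
            "SecondRun": ["SomeRun3", "SomeRun4"],
        },
    )
]
# group_experiments("results", groupings)  # uncomment once the runs exist on disk
```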