Experiments
Experiments combine Model, Data and possibly Environment modules to run a particular ITP task. There are intentionally very few constraints on Experiments to allow for a large variety of approaches.
The output of one experiment is often used in another. For example, the End-to-End experiment might use a model checkpoint from a pre-training experiment, and the data generated from the End-to-End run might be used in a separate fine-tuning experiment.
Configuration
We use Hydra as our configuration management library. This allows for flexible, minimal changes to configuration files when running experiments. There are several 'levels' of Hydra configuration, which are analogous to class inheritance.
Specific experiments should include a subfolder in the `configs` directory, such as `premise_selection`. In the root of the subfolder, you can implement a configuration file as a base for the experiment, with default configurations for that experiment. For example, `configs/tacticzero/tactic_zero.yaml` defines the specific tactics used in TacticZero, as well as default values such as the maximum number of steps (`max_steps`) and the number of epochs.
This configuration can inherit from `configs/base`, which defines common options such as how directories, checkpoints and logging are managed.
Within a config subdirectory, specific datasets and models can be further configured from the base.
For premise selection, we organise this into `{dataset}/{model}`, whereas other experiments such as TacticZero and HOList, which use a single benchmark/dataset, are organised based only on `{model}`.
These configurations inherit from the base experiment, as well as the default model/data configuration in `configs/data_type`. They are the final configuration in the composition order, and are what should be specified when running an experiment. At a minimum, they should specify `experiment`, `name` and the model to be run.
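To make the composition concrete, here is a minimal sketch of what a final configuration might look like; the file path, group names and values are illustrative placeholders rather than files from the repository:

```yaml
# e.g. configs/premise_selection/hol4/gnn.yaml (hypothetical path)
defaults:
  - /base            # common directory, checkpoint and logging options (placeholder group name)
  - /data_type/graph # default model/data configuration (placeholder group name)
  - _self_

experiment: premise_selection
name: premise_selection_hol4_gnn
```

Hydra merges the entries of the `defaults` list in order, with `_self_` marking where the file's own keys are applied, so values later in the composition override earlier ones.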
Running
Experiments are run from the root directory, with the path to the associated configuration file passed in as the `config-name` parameter.
Parameters can be overridden, added or removed using the Hydra override grammar.
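For example, assuming a top-level entry script such as `runner.py` (the script name here is a placeholder; substitute the project's actual entry point), a run could look like:

```bash
# Launch an experiment by its final configuration
python runner.py --config-name=premise_selection/hol4/gnn

# Hydra override grammar: override a key, add one with '+', remove one with '~'
python runner.py --config-name=premise_selection/hol4/gnn \
    name=debug_run +trainer.fast_dev_run=true ~logging_config.id
```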
Lightning Experiments
Many experiments can be fully contained within PyTorch Lightning modules. In these cases, the experiment (including logging and checkpointing behaviour) must be specified in the associated LightningModule and LightningDataModule.
The class `experiments.lightning_runner` provides a generic interface for running these experiments. It is analogous to the PyTorch Lightning CLI, taking in a DataModule and a LightningModule, with some additional functionality and flexibility through Hydra configurations.
Such experiments currently include Premise Selection, the HOList training approach, and the training and fine-tuning of generative models. We also include some experimental approaches such as Direct Preference Optimisation and Implicit Language Q-Learning.
To use this class, you need to make a corresponding LightningModule and DataModule, and specify their configuration parameters in an associated Hydra configuration file. The Hydra file should specify the LightningModule under the `model` key, with the path to the module in `_target_` and its parameters listed below it. The DataModule is specified in the same way under the `data_module` key.
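For instance, the relevant section of such a file might look like the following sketch; the class paths and constructor parameters are hypothetical placeholders:

```yaml
model:
  _target_: models.premise_selection.PremiseSelectionModule  # hypothetical class path
  embedding_dim: 256
  lr: 1.0e-4

data_module:
  _target_: data.premise_selection.PremiseSelectionDataModule  # hypothetical class path
  batch_size: 32
  num_workers: 4
```

Hydra's `instantiate` utility constructs each object from its `_target_` block, so any constructor argument of the module can be set or overridden directly from the configuration.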
More complicated experiments require a custom experiment module; users can refer to the documentation on our TacticZero, HOList Eval and INT experiments for examples and design patterns.
Logging
Weights and Biases is the default logging platform used in BAIT, and is automatically integrated into all current experiments. This can be changed, if desired, by modifying the logging source defined in the relevant experiment module.
Checkpointing
Checkpointing for all experiments which use a LightningModule is easily configured through the `callbacks` associated with the trainer in the corresponding `yaml` file.
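As a sketch of what this can look like (the monitored metric and filename pattern are illustrative assumptions), a standard Lightning `ModelCheckpoint` callback can be attached to the trainer:

```yaml
trainer:
  callbacks:
    - _target_: pytorch_lightning.callbacks.ModelCheckpoint
      monitor: val_loss        # hypothetical metric logged by the LightningModule
      mode: min
      save_top_k: 3
      save_last: true
      filename: "{epoch}-{val_loss:.3f}"
```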
Resuming Runs
To resume a run, you should add the following fields to the final configuration file:
```yaml
exp_config.resume: true
logging_config.id: { wandb_id }    # where `wandb_id` is the id associated with the run being resumed
exp_config.directory: { base_dir } # where `base_dir` is the root of the directory created by the run being resumed
```
Sweeps
Sweeps can be run using Hydra's multi-run functionality. This sets up multiple runs that vary one or more configuration keys, and is useful for e.g. hyperparameter sweeps.
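As a sketch (using the same placeholder entry point and configuration keys as above), passing `-m`/`--multirun` with comma-separated values launches one run per combination:

```bash
# One run per (lr, batch_size) combination: 2 x 2 = 4 runs in total
python runner.py --config-name=premise_selection/hol4/gnn -m \
    model.lr=1e-3,1e-4 data_module.batch_size=16,32
```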