Authors: Michele Invernizzi

Date: February 28, 2022

Aims

This Masterclass is an introduction to the OPES (On-the-fly Probability Enhanced Sampling) method and its PLUMED implementation.

Objectives

Once this Masterclass is completed, you will be able to:

Perform OPES simulations and analyse the results.
Use OPES_EXPANDED to sample different generalized ensembles
Test the effect of various OPES_METAD input options
Understand the typical use-case for OPES_METAD_EXPLORE

Prerequisites

We assume that you are familiar with PLUMED, metadynamics, umbrella sampling, and replica exchange. If you are not, the 2021 PLUMED Masterclass is a great place to start. In particular we suggest PLUMED Masterclass 21.1: PLUMED syntax and analysis, PLUMED Masterclass 21.3: Umbrella sampling, and PLUMED Masterclass 21.4: Metadynamics. Reading the OPES papers is also recommended [73] [74] [75].

Overview of the theory

The OPES method is an evolution of Metadynamics that also incorporates some of the ideas of the Variationally Enhanced Sampling (VES code) method. OPES is designed to be simple to use and robust with respect to suboptimal collective variables. Compared to metadynamics, it is faster in converging to a quasi-static bias and it handles multi-dimensional collective variables more efficiently.

The theory for the OPES method is presented in three separate papers [73] [74] [75]. A short overview can be found in the PLUMED documentation at OPES (On-the-fly Probability Enhanced Sampling), OPES_EXPANDED, OPES_METAD, and OPES_METAD_EXPLORE.

To learn more: Introduction to the theory

The OPES method aims at sampling a given target distribution over the configuration space, \(p^{\text{tg}}(\mathbf{x})\), different from the equilibrium Boltzmann distribution, \(P(\mathbf{x})\propto e^{-\beta U(\mathbf{x})}\). To do so, it iteratively builds a bias potential \(V(\mathbf{x})\), by estimating on-the-fly the needed probability distributions:

\[ V(\mathbf{x}) = -\frac{1}{\beta}\log\frac{p^{\text{tg}}(\mathbf{x})}{P(\mathbf{x})}\, . \]

The bias quickly becomes quasi-static and the desired properties, such as the free energy, can be calculated with a simple reweighting. For any given observable \(O=O(\mathbf{x})\) one can write the expectation value as:

\[ \langle O \rangle_{P} = \frac{\langle O e^{\beta V}\rangle_{p^{\text{tg}}}}{\langle e^{\beta V}\rangle_{p^{\text{tg}}}}\, . \]

There are two ways to define an OPES target distribution, a metadynamics-like and a replica-exchange-like. A replica-exchange-like target distribution is an expanded ensemble defined as a sum of overlapping probability distributions:

\[ p^{\text{tg}}_{\{\lambda\}}(\mathbf{x})=\frac{1}{N_{\{\lambda\}}}\sum_{\lambda} P_{\lambda}(\mathbf{x})\, . \]

A typical example is the temperature expanded ensemble, where each \( P_{\lambda}(\mathbf{x})\) is the same system, but at a different temperature. This kind of target distribution can be defined via a set of expansion collective variables (ECVs). The action OPES_EXPANDED can be used to sample this kind of target distributions.

A metadynamics-like target distribution, is defined by its marginal over a set of collective variables (CVs), \(\mathbf{s}=\mathbf{s}(\mathbf{x})\). A typical choice for this marginal distribution is the well-tempered one:

\[ p^{\text{WT}}(\mathbf{s})=\left[P(\mathbf{s})\right]^{1/\gamma}\, , \]

where \(P(\mathbf{s})\) is the marginal over the CVs of the Boltzmann distribution, and \(\gamma>1\) is the bias factor. The well-tempered distribution is a smoother version of the original one, and in the limit of \(\gamma=\infty\) it become a uniform distribution. The actions OPES_METAD and OPES_METAD_EXPLORE can be used to sample this kind of target distributions.

Setting up the software

For this Masterclass we will use GROMACS 2020.6 patched with PLUMED 2.8, with the OPES module and MPI enabled. The easiest way to install all the software needed is to use the plumed-masterclass-2022 conda environment, as described here.

Resources

The data needed to complete the exercises of this Masterclass can be found on GitHub. You can clone this repository locally on your machine with the usual command:

git clone https://github.com/invemichele/masterclass-22-03.git

The repository contains all the data to run the simulations, together with the scripts to analyse them. It also contains some annotated Jupyter notebooks, that will guide us through this tutorial. Most of the information can be found there, and this page is mainly meant as an overview of the content of the notebooks.

As done in some previous masterclass, we will play with a toy system, the small alanine dipeptide molecule in vacuum (ala2). It has the advantage of being a well-known system, cheap to simulate and with a higher level of complexity compared to a 2D model potential as the one used in this other OPES_METAD Tutorial: Running and post-processing. Ala2 has two main metastable basins which can be identified by a very efficient collective variable (CV), the \(\phi\) torsional angle. Here is its free energy surface (FES) as a function of the \(\phi\) and \(\psi\) angles:

Alanine dipeptide free energy surface along the Ramachandran angles.

Note: All the exercises have been tested with PLUMED version 2.8.0 and GROMACS 2020.6

Exercise 1: Sampling expanded ensembles with OPES

We start by using OPES to sample expanded ensembles [75]. This probably is not a familiar application of adaptive-bias enhanced sampling for most of PLUMED users, so we will start with a little

1.1 Multithermal simulations

The first example we consider is a multithermal simulation of alanine dipeptide. One can use replica exchange to perform a parallel tempering simulation and sample the two metastable states of alanine dipeptide, as explained for instance in Belfast tutorial: Replica exchange I. Here instead, we want to sample the same multithermal distribution not by combining simulations at different temperatures, but by adding a bias potential to a single simulation.

To learn more: The bias that we need

Enough equations, here is the plumed input file we need to run this multithermal simulation:

Click on the labels of the actions for more information on what each action computes

ene: ENERGY The ENERGY action with label ene calculates a single scalar value
ecv: ECV_MULTITHERMAL ARGcompulsory keyword 
the label of the internal energy of the system. 
=ene TEMP_MINthe minimum of the temperature range 
=300 TEMP_MAXthe maximum of the temperature range 
=1000 TEMP_STEPSthe number of steps in temperature 
=3 The ECV_MULTITHERMAL action with label ecv calculates a single scalar value
opes: OPES_EXPANDED ARGcompulsory keyword 
the label of the ECVs that define the expansion. 
=ecv.ene PACEcompulsory keyword 
how often the bias is updated 
=500 The OPES_EXPANDED action with label opes calculates a single scalar value

We use the OPES_EXPANDED bias action, with a multithermal expansion defined by the ECV_MULTITHERMAL action. The resulting bias is a function of the potential energy, and is updated every 500 MD steps.

In the notebook we show the shape of the resulting bias, how to check for convergence and how to reweight for estimating the free energy at different temperatures. We also explain a simple procedure that is used to automatically set a reasonable number of temperature steps, if TEMP_STEPS is not provided.

1.2 Multiumbrella simulations

Next, we present another expanded ensemble that can also be sampled by replica exchange. We call it multiumbrella and it is composed by all the distributions sampled by a typical multiple windows umbrella sampling simulation. In this kind of umbrella sampling, one chooses a CV and runs several simulations each one with a parabolic bias potential at a different CV location (see PLUMED Masterclass 21.3: Umbrella sampling). We can use OPES to sample an expanded target distribution that is composed of all of these combined.

In multiple windows umbrella sampling, the free energy difference between the windows is estimated during postprocessing (e.g. with WHAM), in OPES this is done on-the-fly at simulation time. The main advantages are that choosing the number and location of the umbrellas becomes easier, and having a single longer simulation can also help with sampling orthogonal slow degrees of freedom.

We choose as CV the \(\phi\) angle. The ECVs we need for this type of expansion are the following:

\[ \Delta u_i = \frac{(\phi-\phi_i)^2}{2\sigma^2}\, , \]

where \(\phi_i\) are the fixed centers of the umbrellas and \(\sigma\) determines their width. Usually in umbrella sampling another parameter is used, \(k=1/\sqrt{\sigma}\), but we will see in the notebook why we made a different choice.

The plumed input we need might look like this:

Click on the labels of the actions for more information on what each action computes

phi: TORSION ATOMSthe four atoms involved in the torsional angle 
=5,7,9,15 The TORSION action with label phi calculates a single scalar value
ecv: ECV_UMBRELLAS_LINE ARGthe input for this action is the scalar output from one or more other actions. 
=phi CV_MINcompulsory keyword 
the minimum of the CV range to be explored 
=-pi CV_MAXcompulsory keyword 
the maximum of the CV range to be explored 
=pi SIGMAcompulsory keyword 
sigma of the umbrella Gaussians 
=0.2 The ECV_UMBRELLAS_LINE action with label ecv calculates a single scalar value
opes: OPES_EXPANDED ARGcompulsory keyword 
the label of the ECVs that define the expansion. 
=ecv.phi PACEcompulsory keyword 
how often the bias is updated 
=500 The OPES_EXPANDED action with label opes calculates a single scalar value

This places umbrellas uniformly along \(\phi\), each one separated by a distance of one \(\sigma\). We can make them less close to each with the keyword SPACING. We can also provide a guess of the maximum free energy BARRIER we need to overcome, it can help speed up convergence.

In the notebook we show what kind of sampling this input provides, and invite you to play around with the parameters, to get a sense of what each option does. We also perform reweighting to obtain a FES estimate. This reweighting is done with kernel density estimation (similarly to plumed histogram function), and is interesting to see the effect of changing the bandwidth parameter.

1.3 Combined multithermal and multiumbrella simulations

Another advantage of using OPES to sample expanded ensembles is that it makes it very easy to combine them. We show this by running a combined multithermal and multiumbrella simulation of alanine dipeptide.

Using the multithermal expansion, we were able to sample both metastable states, but not very efficiently. With the multiumbrella target distribution we have many transitions, but we sample only at 300 K. By combining these two expansions we obtain the best of both, with frequent transitions and an efficient reweighting over a whole range of temperatures.

The drawback? We might have to deal with a lot of ECVs! However, while in a replica exchange scheme each ECV requires its own replica, with OPES we are free to use just one, or choose any number of parallel replica, in a multiple walkers scheme.

The way we combine expansions is to create a grid with all possible combinations of temperature ECVs and umbrella ECVs. So, if we have 10 ECVs for the temperature expansion and 20 for the multiumbrella one, we end up with 200 combined ECVs. This will become clearer once we look at the notebook and the simulation output.

From the point of view of the plumed input things are quite straightforward. For a combined multithermal-multiumbrella simulation it should look similar to this one:

Click on the labels of the actions for more information on what each action computes

# temperature expansion, based on energy
ene: ENERGY The ENERGY action with label ene calculates a single scalar value
ecv_mt: ECV_MULTITHERMAL ARGcompulsory keyword 
the label of the internal energy of the system. 
=ene TEMP_MINthe minimum of the temperature range 
=300 TEMP_MAXthe maximum of the temperature range 
=1000 The ECV_MULTITHERMAL action with label ecv_mt calculates a single scalar value
# multiumbrella expansion, based on phi
phi: TORSION ATOMSthe four atoms involved in the torsional angle 
=5,7,9,15 The TORSION action with label phi calculates a single scalar value
ecv_mu: ECV_UMBRELLAS_LINE ARGthe input for this action is the scalar output from one or more other actions. 
=phi CV_MINcompulsory keyword 
the minimum of the CV range to be explored 
=-pi CV_MAXcompulsory keyword 
the maximum of the CV range to be explored 
=pi SIGMAcompulsory keyword 
sigma of the umbrella Gaussians 
=0.2 The ECV_UMBRELLAS_LINE action with label ecv_mu calculates a single scalar value
# the OPES bias combines them
opes: OPES_EXPANDED ARGcompulsory keyword 
the label of the ECVs that define the expansion. 
=ecv_mt.ene,ecv_mu.phi PACEcompulsory keyword 
how often the bias is updated 
=500 The OPES_EXPANDED action with label opes calculates a single scalar value

To fully appreciate the strength of combining these two different expansions, one can run the same simulations but using the \(\psi\) angle instead of \(\phi\), and see what happens.

Exercise 2: OPES for metadynamics-like simulations

We now use OPES to perform biased simulations very similar to those of metadynamics [73] [74]. We will sample a well-tempered distribution along a chosen set of collective variables.

2.1 Biasing a good CV

We start by biasing the \(\phi\) angle, which is known to be a good CV for alanine dipeptide. Here is how the FES along this CV looks like:

Alanine dipeptide free energy surface along one of its torsional angles.

We use this simple example to get familiar with OPES_METAD, its input parameter and output quantities. We also compare it with well-tempered metadynamics, and we start to see how this two methods differ.

This also gives us the opportunity of using some of the postprocessing scripts for OPES. For OPES, there is not yet a tool similar to the sum_hills, but we can obtain the same result with simple python scripts and by dumping the state of the simulation through the STATE_WFILE keyword. This STATE file contains the compressed kernels used by OPES at a given time to represent the probability distribution, and thus the bias.

2.2 Biasing two good CVs

Next, we move to biasing both the Ramachandran angles of ala2. A minimal plumed input to do this is the following:

Click on the labels of the actions for more information on what each action computes

phi: TORSION ATOMSthe four atoms involved in the torsional angle 
=5,7,9,15 The TORSION action with label phi calculates a single scalar value
psi: TORSION ATOMSthe four atoms involved in the torsional angle 
=7,9,15,17 The TORSION action with label psi calculates a single scalar value
opes: OPES_METAD ARGthe input for this action is the scalar output from one or more other actions. 
=phi,psi PACEcompulsory keyword 
the frequency for kernel deposition 
=500 BARRIERcompulsory keyword 
the free energy barrier to be overcome. 
=40 The OPES_METAD action with label opes calculates the following quantities:

 Quantity    Description  
opes.bias the instantaneous value of the bias potential
opes.rct estimate of c(t). \f$\frac{1}{\beta}\log \langle e^{\beta V} \rangle\f$, should become flat as the simulation converges. Do NOT use for reweighting
opes.zed estimate of Z_n. should become flat once no new CV-space region is explored
opes.neff effective sample size
opes.nker total number of compressed kernels used to represent the bias

This example gives us a nice opportunity to visualize the kernel compression algorithm in action. We can also get an idea of the differences between OPES_METAD and OPES_METAD_EXPLORE methods. The plumed inputs are identical:

Click on the labels of the actions for more information on what each action computes

phi: TORSION ATOMSthe four atoms involved in the torsional angle 
=5,7,9,15 The TORSION action with label phi calculates a single scalar value
psi: TORSION ATOMSthe four atoms involved in the torsional angle 
=7,9,15,17 The TORSION action with label psi calculates a single scalar value
opes: OPES_METAD_EXPLORE ARGthe input for this action is the scalar output from one or more other actions. 
=phi,psi PACEcompulsory keyword 
the frequency for kernel deposition 
=500 BARRIERcompulsory keyword 
the free energy barrier to be overcome. 
=40 The OPES_METAD_EXPLORE action with label opes calculates the following quantities:

 Quantity    Description  
opes.bias the instantaneous value of the bias potential
opes.rct estimate of c(t). \f$\frac{1}{\beta}\log \langle e^{\beta V} \rangle\f$, should become flat as the simulation converges. Do NOT use for reweighting
opes.zed estimate of Z_n. should become flat once no new CV-space region is explored
opes.neff effective sample size
opes.nker total number of compressed kernels used to represent the bias

From here it should be easy to compare both OPES variants to metadynamics, maybe by running some longer simulations. We can also check the way the bias converges by focusing on the estimate of the free energy difference between the basins.

2.3 Biasing more CVs: alanine tetrapeptide

In order to experiment with more than two CVs, we have to introduce a new system, alanine tetrapeptide in vacuum (ala4). It is a bigger brother of ala2, with not one but three \(\phi\) and \(\psi\) torsional angles, that can be used to sample its 8 metastable basins.

We can try to sample ala4 with each of the tree methods, METAD, OPES_METAD, and OPES_METAD_EXPLORE. How will the respective input files look like?

We run only short simulations to see how quickly the CV space is sampled by the various methods, and not for converging a FES. This example can help to appreciate the effect of the \(Z\) term in OPES_METAD. In fact, the more CVs there are, the more it plays an important role in speeding up exploration.

We can also see the effect of changing the bias factor \(\gamma\) on the sampling efficiency.

2.4 Biasing a suboptimal CV

Finally, what is probably the most interesting scenario. We bias again ala2, but this time using a suboptimal collective variable. We don't know much about this CV, it could be for instance the output of a machine learning procedure. Will we be able to calculate the FES as a function of this CV? How should the plumed input files look like? Here it would be nice to also have some error bars to our estimates.

We will discuss together the results of these simulations, and also reveal how this CV was constructed.

If you prefer a more straightforward example of suboptimal CV, you can check out this OPES_METAD Tutorial: Running and post-processing.

Conclusion

Hopefully this masterclass helped you to start playing around with the OPES method and get a sense of what it can be useful for. If you are still interested in testing it out and understanding more about it, then you should check out the PLUMED-NEST. Just search for "opes" and you will find several papers that use the method, together with all the input files necessary to reproduce their simulations.

Quantity	Description
opes.bias	the instantaneous value of the bias potential
opes.rct	estimate of c(t). \f$\frac{1}{\beta}\log \langle e^{\beta V} \rangle\f$, should become flat as the simulation converges. Do NOT use for reweighting
opes.zed	estimate of Z_n. should become flat once no new CV-space region is explored
opes.neff	effective sample size
opes.nker	total number of compressed kernels used to represent the bias