PLUMED Masterclass 21.5: Simulations with multiple replicas
Authors
Giovanni Bussi
Date
March 15, 2021

Aims

In this Masterclass, we will discuss how to perform and analyze multi-replica simulations where different replicas feel a different bias potential. We will also understand how to compute statistical errors on the computed quantities.

Objectives

Once you have completed this Masterclass you will be able to:

  • Use PLUMED and GROMACS to run multiple-replica simulations.
  • Use WHAM to combine multiple simulations performed with different bias potentials.
  • Calculate error bars on free energies and populations, taking into account correlations induced by replica exchanges.

Setting up PLUMED

For this masterclass you will need versions of PLUMED and GROMACS that are compiled using the MPI library. The versions used in the previous masterclasses will thus not work properly. In order to obtain the correct versions, please use the following commands:

conda install --strict-channel-priority -c plumed/label/masterclass-mpi -c conda-forge plumed
conda install --strict-channel-priority -c plumed/label/masterclass-mpi -c conda-forge gromacs

The --strict-channel-priority might be necessary in case your conda install is configured to download packages from the bioconda channel. Indeed, bioconda contains a version of GROMACS that is not patched with PLUMED and would thus not work here. Similarly, the channel plumed/label/masterclass-mpi should receive a priority higher than conda-forge, so as to install the MPI version of PLUMED.

On Linux, the command above should install the following packages:

  gromacs            plumed/label/masterclass-mpi/linux-64::gromacs-2019.6-h3fd9d12_100
  plumed             plumed/label/masterclass-mpi/linux-64::plumed-2.7.0-h3fd9d12_100
  mpi                conda-forge/linux-64::mpi-1.0-openmpi
  openmpi            conda-forge/linux-64::openmpi-4.1.0-h9b22176_1
  [ etc ... ]

The exact versions might be different. Notice however that GROMACS and PLUMED come from the plumed/label/masterclass-mpi channel, whereas the required libraries come from the conda-forge channel. To be sure the installed GROMACS is compiled with MPI patched with PLUMED, try the following shell command:

gmx_mpi mdrun -h 2> /dev/null | grep -q plumed && echo ok

It should print ok. To be sure that PLUMED has been compiled with MPI, try the following shell command:

plumed --has-mpi && echo ok

It should print ok.

Please ensure that you have setup PLUMED and GROMACS on your machine before starting the exercises. Also notice that in order to obtain good performances it is better to compile GROMACS from source on the machine you are running your simulations. You can find out in the PLUMED documention how to patch GROMACS with PLUMED so as to be able to install it from source. For this tutorial, the conda precompiled binaries will be sufficient.

Resources

The data needed to execute the exercises of this Masterclass can be found on GitHub. You can clone this repository locally on your machine using the following command:

git clone https://github.com/plumed/masterclass-21-5.git
Note
All the exercises were tested with PLUMED version 2.7.0 and GROMACS 2019.6

Exercises

Throughout this tutorial we will run simulations of alanine dipeptide in vacuum using GROMACS and PLUMED. Whereas this system is too simple to be considered a proper benchmark for enhanced sampling methods, it is complex enough to be used in learning them. Notice that, although PLUMED has a portable interface, the support for replica-exchange simulations is limited depending on the specific molecular dynamics engine. You should check the documentation of the MD code you are using to know if replica exchange simulations will work correctly with PLUMED.

Warning
At the time of this writing there is a bug in the rendering of the manual for PLUMED 2.7. In particular, all pages containing an example that requires multiple replicas are truncated. Since there is no new features in v2.7 in this sense, you are recommended to switch to the v2.6 manual. To do so, just replace the string doc-v2.7/user-doc with the string doc-v2.6/user-doc in the address bar.

Introduction to replica simulations

Many methods are based on the simultaneous simulation of multiple replicas. In some cases, all the replicas will use the same input file, whereas in other cases a separate input file should be provided for each replica. Notice that using a single input file does not imply that all the replicas will feel the same biasing potential. Indeed, since biasing potentials in PLUMED might be history dependent, and the history of each replica might different from the history of other replicas, the potentials might in the end be different.

PLUMED has been designed so that multiple-replica simulations can be run even if all the replicas are acting in the same directory. In order to avoid clashes in output files, thus, PLUMED will append a suffix corresponding to the index of the replica to the name of each output file (for instance, the command PRINT FILE=colvar.dat will print on a file names colvar.0.dat in the first replica, etc.). Suffixes will be added also to input files, so that if you run a simulation where the input file is plumed.dat, the first replica will open a file named plumed.0.dat, and so on. However, for input files, if the file including the suffix does not exist, PLUMED will look for the file without the suffix (in the example, plumed.dat). This provides maximum flexibility and allows to manage both cases where the input file is the same and cases where specific input files should be used.

In addition to this, it is possible to use a Special replica syntax that allows to differentiate the input of different replicas, even if they are all reading the same plumed.dat file. For instance, the command RESTRAINT ARG=d AT=@replicas:1.0,1.1,1.2 KAPPA=1.0 will apply restraints at different positions for three replicas.

Notice however that starting with GROMACS 2019 replica simulations are forced to run in separate directories. To exploit the possibility to use a single input file, one should put it in the parent directory and refer to is as -plumed ../plumed.dat. Output files will be produced in separate directories by default, but their names will be suffixed. If you want the PLUMED output files to be in the parent directory, just prepend their name with ../ (as in PRINT FILE=../colvar.dat).

In order to use multiple-replica methods, you should run your simulation using MPI. This can be done prefixing your command with mpiexec -np N --oversubscribe, where N is the number of processes that you want to use and the --oversubscribe option is an OpenMPI option that is required to use more processes than the number of available processors. This is typically suboptimal, but we will need it in our lectures to run, e.g., simulations with 32 replicas even if we have a computer with 4 cores.

In brief, to run a GROMACS simulation where the individual replicas are in directories names dir0, dir1, etc and the plumed.dat file is in the parent directory you will need a command such as

mpiexec -np 16 --oversubscribe gmx_mpi mdrun -multidir dir? dir?? -plumed ../plumed.dat

To run the PLUMED driver processing a trajectory with multiple processes you will need a command such as

mpiexec -np 16 --oversubscribe plumed driver -multi 16 -plumed plumed.dat --ixtc traj.xtc

If you have random crashes on MacOS, try to set this environemnt variable:

export OMPI_MCA_btl="self,tcp"

Exercise 1: Multiple-windows umbrella sampling with replica exchange

In Exercise 4: Enhancing conformational transitions with multiple-windows umbrella sampling we have seen how to run a multiple-windows umbrella sampling simulation with independent simulations. Here we will run it using replica exchange. The only differences are that:

  • Simulations should be run at the same time using mpiexec
  • You will have to specify a stride for GROMACS to attempt coordinate exchanges, using the -replex option.

It will be sufficient to use a single plumed.dat file that looks like this:

Click on the labels of the actions for more information on what each action computes
tested on v2.8
# vim:ft=plumed
MOLINFO 
STRUCTURE
compulsory keyword a file in pdb format containing a reference structure.
=../reference.pdb phi: TORSION
ATOMS
the four atoms involved in the torsional angle
=__FILL__ psi: TORSION
ATOMS
the four atoms involved in the torsional angle
=__FILL__ bb: RESTRAINT
ARG
the input for this action is the scalar output from one or more other actions.
=phi
KAPPA
compulsory keyword ( default=0.0 ) specifies that the restraint is harmonic and what the values of the force constants on each of the variables are
=200.0
AT
compulsory keyword the position of the restraint
=@replicas:__FILL__ PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=phi,psi,bb.bias
FILE
the name of the file on which to output these quantities
=../colvar_multi.dat
STRIDE
compulsory keyword ( default=1 ) the frequency with which the quantities of interest should be output
=100

According to the instructions above, you should create 32 directories (one per replica), place the tpr file (for this exercise: topolA.tpr) in each of them, and run the following command

mpiexec -np 32 --oversubscribe gmx_mpi mdrun -multidir dir? dir?? -plumed ../plumed.dat -s topolA.tpr -replex 200 -nsteps 200000

Notice that by omitting the -replex option you will be able to run a non-replica-exchange umbrella sampling simulation, identical to the one you performed in Exercise 4: Enhancing conformational transitions with multiple-windows umbrella sampling. We will now repeat exercises Exercise 4: Enhancing conformational transitions with multiple-windows umbrella sampling and Exercise 6: Effect of initial conditions using replica exchange. We will also test different initial conditions, as in Exercise 6: Effect of initial conditions. Please run the following four simulations:

For the four simulations, perform a WHAM analysis to compute the weights of each frame, and then compute the relative stability of the two minima (as in Exercise 5: Computing populations and errors). To compute weights you need to do the following steps:

  1. Concatenate the trajectories (gmx_mpi trjcat -cat -f dir?/traj_comp.xtc dir??/traj_comp.xtc -o traj_multi.xtc).
  2. Run plumed driver on the concatenated trajectory (mpiexec -np 32 --oversubscribe plumed driver --ixtc traj_multi.xtc --plumed plumed.dat --multi 32 --trajectory-stride 100).
  3. Read the resulting trajectories, perform WHAM, and compute relative population of the two states adapting this script:
    import wham
    kb=0.008314462618
    T=300
    col=[]
    for i in range(32):
        col.append(plumed.read_as_pandas("colvar_multi." + str(i)+".dat"))
    bias=np.zeros((len(col[0]["bb.bias"]),32))
    for i in range(32):
            bias[:,i]=traj[i]["bb.bias"]
    w=wham.wham(bias,T=kBT)
    tr=col[0].phi
    is_in_B=np.int_(np.logical_and(tr>0,tr<2))
    is_in_A=np.int_(tr<0)
    print(np.average(is_in_B,weights=np.exp(w["logW"]))/np.average(is_in_A,weights=np.exp(w["logW"])))
    

Now answer the following questions:

Exercise 2: Demuxing your trajectories

Close to the end of one of the md.log files produced by your simulation you will find a short report of the accepted exchanges. For instance

Repl  average probabilities:
Repl     0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31
Repl      .30  .32  .29  .24  .21  .17  .16  .21  .31  .27  .28  .26  .23  .21  .16  .08  .11  .23  .28  .27  .31  .32  .31  .36  .34  .25  .18  .01  .21  .28  .26

A result like this one will already warn you that there is a bottleneck between replicas 27 and 28 (only 1 percent of the attempted exchanges have been accepted). Anyway, bottlenecks might be not visible in this representation. The full path in the replica-index space of the continuous trajectories ("demuxed") is much more informative.

We will now "demux" our trajectories. For these short trajectories we can use the demux.pl script provided by GROMACS. Notice that for long trajectories and frequent exchanges it could have problems to process correctly the output file. In particular, since the value of time is written by GROMACS with a limited number of digits, the original script might be confused regarding when exchanges happened. At this link you can find a modified script that solves the problem by asking you the value of the time step and computing the value of time from the step number, that is stored as an integer and does not suffer roundoff problems: https://github.com/srnas/demux .

The demux script can be used to produce files named replica_temp.xvg and replica_index.xvg as follows

demux.pl dir0/md.log

The replica_temp.xvg provides, as a function of time, the number of the replica on which each of the continuous simulations is located. For instance, you can follow the migration in the replica ladder of the first replica as follows:

replica_temp=np.loadtxt("replica_temp.xvg")
plt.plot(replica_temp[:,0],replica_temp[:,1])

The column 1 contains time, whereas the number on column i+1 says which is the current replica index (that is: temperature, in a temperature replica exchange simulation; position of the restraint in a replica-exchange umbrella sampling simulation) of the continuous simulations that started at position i. This file is called "replica_temp" because it has been implemented with temperature replica exchange in mind, but here the index actually refers to the position of the restraint.

Now answer the following two questions:

  • Is there any replica that is able to explore the full range of indexes?
  • Are all the continuous replicas able to explore the full range of indexes?

Notice that each row of the replica_temp.xvg file contains a permutation. The replica_index.xvg file just contains the inverse of this permutation. The replica_index.xvg can be used to generate the "demuxed" (continuous) trajectories with the following command:

import subprocess
subprocess.run("gmx_mpi trjcat -cat -f dir?/traj_comp.xtc dir??/traj_comp.xtc -demux replica_index.xvg -o " + ''.join([" "+str(i)+"_trajout.xtc" for i in range(32)]),shell=True)

The resulting trajectories can be visualized or analyzed as usual and, at variance with the original trajectories, will have no jump or discontinuity but will rather be continuous functions of time. For instance, you could use plumed driver to compute phi on the demuxed trajectories.

Now answer the following two questions:

  • Is there any replica that is able to jump from the metastable state at negative phi to the one at positive phi (or viceversa)?
  • Are all the continuous replicas able to do so?
  • Are these two questions related to the two questions above?

Notice that, if you run your simulation long enough, each "demuxed" trajectory is expected to cover uniformly the whole range of replica indexes. Due to the location of the restraints, this will imply that each "demuxed" trajectory is expected to cover in an approximately uniform manner the range of the biased CV. Thus, to some extent, each of these trajectories should behave similarly to a metadynamics simulation (see ref masterclass-21-4). The flatness of the distribution on the biased CV depends however on the specific parameters of the restraints (stiffness and locations).

Exercise 3: Block analysis from demuxed trajectories

Notice that the WHAM analysis does not need to know where each of the frames come from. This implies that when you run WHAM you can do it equivalently using the concatenation of the original trajectories or the concatenation of the "demuxed" trajectories. The advantage of the latter choice is that you can then perform a block analysis on the resulting trajectory where the number of blocks is exactly equal to the number of replicas. These blocks will be independent simulation, with only two small exceptions:

  • the paths in replica space are partly constrained, since when a replica goes up another replica goes down.
  • replicas might be initialized from correlated conformations (e.g., all of them in A), inducing a correlation.

The second factor can be decreased by improving the way replicas are initialized. The first factor is usually impacting correlation much less than the actual exchanges. These blocks are thus optimally suited to perform a bootstrap analysis of the error without incurring in underestimation due to correlations between blocks.

There is a small tricky issue here. In particular, when we perform the bootstrap analysis, we are going to pick each block a different number of times. Since each block (that is: each "demuxed" trajectory) has been spanning the replica indexes by spending a different time at each replica, the bootstrap trajectory will not satisfy anymore the property that it was generated spending the same time in each replica. The included wham script allows to use this information passing an additional option traj_weight. You can adjust the script below to perform the bootstrap analysis:

bias=np.zeros((2001*32,32))
! demux.pl dir0/md.log
replica_temp=np.loadtxt("replica_temp.xvg")
replica_temp=np.int_(replica_temp[:,1:]) # ignore first column (time) and convert to int
for i in range(32):
col=plumed.read_as_pandas("colvar.{}.dat".format(i))
bias[:,i]=col["bb.bias"]
# here is the calculation done using the full trajectory
w0=wham.wham(bias,T=kb*T)
tr=col.phi
is_in_B=np.int_(np.logical_and(tr>0,tr<2))
is_in_A=np.int_(tr<0)
# here is the resulting ratio in the population of the two minima:
print(np.average(is_in_B,weights=np.exp(w0["logW"]))/np.average(is_in_A,weights=np.exp(w0["logW"])))
# now we run the bootstrap analysis
pop=[]
for i in range(200): # will require some time, you can first play with less than 200 iterations
# here we pick the blocks
c=np.random.choice(32,32)
# here we count how much time was spent in each replica for the resulting trajectory
tr_w=np.zeros(32)
for k in range(32):
tr_w+=np.bincount(replica_temp[:,c[k]],minlength=32)
# we then use wham. The traj_weight option can be used to tell to the script
# how much time was spent at each replica
w=wham.wham(bias.reshape((32,-1,32))[c].reshape((-1,32)),T=kb*T,traj_weight=tr_w)
tr=np.array(col.phi).reshape((32,-1))[c].flatten()
is_in_B=np.int_(np.logical_and(tr>0,tr<2))
is_in_A=np.int_(tr<0)
pop.append(np.average(is_in_B,weights=np.exp(w["logW"]))/np.average(is_in_A,weights=np.exp(w["logW"])))
# and here we print average and standard deviation
print(np.average(pop),np.std(pop))

Notice that this approach is not really standard, so use it with care. There are a few papers in the literature discussing similar ideas, but they usually require estimating the autocorrelation time in advance.

Exercise 4: Bias-exchange metadynamics

We will now run a bias-exchange simulation of alanine dipeptide. In bias-exchange simulations, each replica biases a different collective variable. This is a very practical way to enhance sampling for a large number of variables (as many as the replicas that you can afford!). Notice that they will be biased one at a time. As a matter of fact only those that are useful in identifying a transition state will help, but the other ones will not hurt.

Prepare the input file for a simulation with 3 replicas where the following variables are biased:

  • phi
  • psi
  • none of them

Initialize two of them in structure A (using topolA.tpr) and one of them is structure B (using topolB.tpr). You can use a single input file that looks like this:

Click on the labels of the actions for more information on what each action computes
tested on v2.8
# vim:ft=plumed
MOLINFO 
STRUCTURE
compulsory keyword a file in pdb format containing a reference structure.
=../reference.pdb # this is needed to allow arbitrary pairs to try exchanges # in this case, 0<->1, 0<->2, and 1<->2 RANDOM_EXCHANGES phi: TORSION
ATOMS
the four atoms involved in the torsional angle
=__FILL__ psi: TORSION
ATOMS
the four atoms involved in the torsional angle
=__FILL__ # You can use the same parameters that you used in masterclass 21.4 m: METAD ...
ARG
the input for this action is the scalar output from one or more other actions.
=@replicas:phi,psi,phi
SIGMA
compulsory keyword the widths of the Gaussian hills
=@replicas:__FILL__
HEIGHT
the heights of the Gaussian hills.
=__FILL__ # make sure that there is no bias on the third replica!
BIASFACTOR
use well tempered metadynamics and use this bias factor.
=__FILL__
PACE
compulsory keyword the frequency for hill addition
=__FILL__
GRID_MIN
the lower bounds for the grid
=-pi
GRID_MAX
the upper bounds for the grid
=+pi ...

Now run two separate simulations for 1000000 steps per replica. In one of them you will propose exchanges between replicas with a pace 200, in the other you will not propose any exchange (just omit -replex 200 from the command line). The second run will thus be equivalent to running three simulations (free, metadynamics on phi, metadynamics on psi) that you already ran in PLUMED Masterclass 21.4: Metadynamics .

We will now use WHAM to combine the resulting trajectories. We can proceed as we did above, but taking into account that when analyzing a metadynamics simulations the way to compute the weight is slightly different. As discussed in Exercise 3: Reweighting (unbiasing) a metadynamics simulation, one of the possible manners to obtain the weight is to use the final potential computed along the trajectory. This required a further processing step in a simple metadynamics simulation. Here we can compute the final potential while processing the concatenated trajectory. In practice, the only difference with respect to the analysis done in Exercise 1: Multiple-windows umbrella sampling with replica exchange is that here we will have to process our trajectories using a different input file, where PACE has been set to a large number and HEIGHT set to zero. You can then perform WHAM as in Exercise 1: Multiple-windows umbrella sampling with replica exchange and compute the population of the two metastable states.

After you have calculated the relative populations in the two runs (with and without exchanges), answer the following questions:

Notice that the third replica has been simulated without any metadynamics. This is a so-called neutral replica, that is used sometime in bias-exchange simulations. You can compute the relative population of the two metastable states directly using the populations in that replica (no post-processing needed!).

  • Is the result the same as when using WHAM with all replicas?

Now imagine to perform the bias-exchange simulation again usign only two replicas: one of them biasing psi and the other one with no bias. In other words, you would on purpose forget a variable that is very important:

  • How do you expect the resulting population to be?

Exercise 5: Parallel-tempering metadynamics

We will finally learn how to use parallel-tempering metadynamics. In parallel-tempering metadynamics, sampling is enhanced using parallel-tempering (which enhances all degrees of freedom), whereas metadynamics is used to flatten their histogram. If the biased CV contains a relevant bottleneck and is capable to approximately single out the corresponding transition state, the corresponding transition will be enhanced as well. Notice however that if the parallel-tempering side of the algorithm is sufficient to enhance sampling, it is not necessary to bias a CV that can identify the transition state.

First we will need to prepare our input files. We will use 4 replicas, with temperatures taken from a geometric distribution ranging between 300 and 800K. You should be able to generate the corresponding tpr files using the following script

import numpy as np
import re
T=np.geomspace(300,800,4)
for i in range(len(T)):
with open("top/grompp.mdp") as f:
l=f.read()
with open("top/grompp{}.mdp".format(i),"w") as f:
# if you use this script on your input files, make sure that 300 only appears
# on the temperature line! or better replace it with a placeholder string such as __TEMP__
print(re.sub("300",str(T[i]),l),file=f)
subprocess.run("mkdir -p ptmetad_{}".format(i),shell=True)
# we will initialize some replica in A and some replica in B
if i%2==0:
conf="A"
else:
conf="B"
# we use -maxwarn 1 here since the grompp file has been adapted from an old gromacs version.
# in general, only use this option after you have understood that the warning is harmless
subprocess.run("cd top/; gmx_mpi grompp -f grompp{}.mdp -c conf{}.gro -maxwarn 1 -o ../ptmetad_{}/topol.tpr".format(i,conf,i), shell=True)

You will then be able to run a parallel tempering simulation with the following command

mpiexec -np 4 --oversubscribe gmx_mpi mdrun -multi ptmetad_? -replex 200

Notice that the acceptance will be compute by GROMACS taking into account the fact that simulations are running at different temperatures. Also notice that in order to obtain a large enough acceptance given the temperature span, you will need a number of replicas that grows with the square root of the number of atoms in the system. For solvated molecules, you would typically need tens of replicas at least.

We will now add the metadynamics ingredient, by preparing a suitable PLUMED input file. Since parallel-tempering metadynamics is designed to cope with cases where you do not have a good CV available, we will directly use psi rather than phi!

Click on the labels of the actions for more information on what each action computes
tested on v2.8
# vim:ft=plumed
MOLINFO 
STRUCTURE
compulsory keyword a file in pdb format containing a reference structure.
=../reference.pdb phi: TORSION
ATOMS
the four atoms involved in the torsional angle
=__FILL__ psi: TORSION
ATOMS
the four atoms involved in the torsional angle
=__FILL__ # You can use the same parameters that you used in masterclass 21.4 # However, it is recommended to scale HEIGHT with temperature. # You can do it either using replicas: syntax in HEIGHT or specifying TAU # instead of HEIGHT m: METAD ...
ARG
the input for this action is the scalar output from one or more other actions.
=psi
SIGMA
compulsory keyword the widths of the Gaussian hills
=__FILL__
HEIGHT
the heights of the Gaussian hills.
=__FILL__
BIASFACTOR
use well tempered metadynamics and use this bias factor.
=__FILL__
PACE
compulsory keyword the frequency for hill addition
=__FILL__
GRID_MIN
the lower bounds for the grid
=-pi
GRID_MAX
the upper bounds for the grid
=+pi ...

The analysis will be simpler here. We will just analyze the first replica (in ptmetad_0) as if it was generated using a simple metadynamics simulation. Now compute the usual ratio between the populations of the two metastable minima and answer the following questions:

  • Is the result compatible with what you obtained using umbrella sampling?
  • Was biasing psi useful in this case (you can also try to compute the populations from a parallel tempering simulation without metadynamics to answer this question)?

Exercise 6: Parallel-tempering: pathological case

Repeat exercise Exercise 5: Parallel-tempering metadynamics, but now place your replicas in the range between 300 and 310.

  • Is the population of the two states compatible with what you obtained in the other exercises above.
  • If not, which is the correct answer? Draw some conclusion on how to detect this type of problem in a realistic situation.