In this Masterclass, we will discuss how to perform and analyze multi-replica simulations where different replicas feel a different bias potential. We will also understand how to compute statistical errors on the computed quantities.
Once you have completed this Masterclass you will be able to:
For this masterclass you will need versions of PLUMED and GROMACS that are compiled using the MPI library. The versions used in the previous masterclasses will thus not work properly. In order to obtain the correct versions, please use the following commands:
conda install --strict-channel-priority -c plumed/label/masterclass-mpi -c conda-forge plumed conda install --strict-channel-priority -c plumed/label/masterclass-mpi -c conda-forge gromacs
The --strict-channel-priority
might be necessary in case your conda install is configured to download packages from the bioconda
channel. Indeed, bioconda
contains a version of GROMACS that is not patched with PLUMED and would thus not work here. Similarly, the channel plumed/label/masterclass-mpi
should receive a priority higher than conda-forge
, so as to install the MPI version of PLUMED.
On Linux, the command above should install the following packages:
gromacs plumed/label/masterclass-mpi/linux-64::gromacs-2019.6-h3fd9d12_100 plumed plumed/label/masterclass-mpi/linux-64::plumed-2.7.0-h3fd9d12_100 mpi conda-forge/linux-64::mpi-1.0-openmpi openmpi conda-forge/linux-64::openmpi-4.1.0-h9b22176_1 [ etc ... ]
The exact versions might be different. Notice however that GROMACS and PLUMED come from the plumed/label/masterclass-mpi
channel, whereas the required libraries come from the conda-forge
channel. To be sure the installed GROMACS is compiled with MPI patched with PLUMED, try the following shell command:
gmx_mpi mdrun -h 2> /dev/null | grep -q plumed && echo ok
It should print ok
. To be sure that PLUMED has been compiled with MPI, try the following shell command:
plumed --has-mpi && echo ok
It should print ok
.
Please ensure that you have setup PLUMED and GROMACS on your machine before starting the exercises. Also notice that in order to obtain good performances it is better to compile GROMACS from source on the machine you are running your simulations. You can find out in the PLUMED documention how to patch GROMACS with PLUMED so as to be able to install it from source. For this tutorial, the conda precompiled binaries will be sufficient.
The data needed to execute the exercises of this Masterclass can be found on GitHub. You can clone this repository locally on your machine using the following command:
git clone https://github.com/plumed/masterclass-21-5.git
Throughout this tutorial we will run simulations of alanine dipeptide in vacuum using GROMACS and PLUMED. Whereas this system is too simple to be considered a proper benchmark for enhanced sampling methods, it is complex enough to be used in learning them. Notice that, although PLUMED has a portable interface, the support for replica-exchange simulations is limited depending on the specific molecular dynamics engine. You should check the documentation of the MD code you are using to know if replica exchange simulations will work correctly with PLUMED.
doc-v2.7/user-doc
with the string doc-v2.6/user-doc
in the address bar.Many methods are based on the simultaneous simulation of multiple replicas. In some cases, all the replicas will use the same input file, whereas in other cases a separate input file should be provided for each replica. Notice that using a single input file does not imply that all the replicas will feel the same biasing potential. Indeed, since biasing potentials in PLUMED might be history dependent, and the history of each replica might different from the history of other replicas, the potentials might in the end be different.
PLUMED has been designed so that multiple-replica simulations can be run even if all the replicas are acting in the same directory. In order to avoid clashes in output files, thus, PLUMED will append a suffix corresponding to the index of the replica to the name of each output file (for instance, the command PRINT FILE=colvar.dat
will print on a file names colvar.0.dat
in the first replica, etc.). Suffixes will be added also to input files, so that if you run a simulation where the input file is plumed.dat
, the first replica will open a file named plumed.0.dat
, and so on. However, for input files, if the file including the suffix does not exist, PLUMED will look for the file without the suffix (in the example, plumed.dat
). This provides maximum flexibility and allows to manage both cases where the input file is the same and cases where specific input files should be used.
In addition to this, it is possible to use a Special replica syntax that allows to differentiate the input of different replicas, even if they are all reading the same plumed.dat
file. For instance, the command RESTRAINT ARG=d AT=@replicas:1.0,1.1,1.2 KAPPA=1.0
will apply restraints at different positions for three replicas.
Notice however that starting with GROMACS 2019 replica simulations are forced to run in separate directories. To exploit the possibility to use a single input file, one should put it in the parent directory and refer to is as -plumed ../plumed.dat
. Output files will be produced in separate directories by default, but their names will be suffixed. If you want the PLUMED output files to be in the parent directory, just prepend their name with ../
(as in PRINT FILE=../colvar.dat
).
In order to use multiple-replica methods, you should run your simulation using MPI. This can be done prefixing your command with mpiexec -np N --oversubscribe
, where N
is the number of processes that you want to use and the --oversubscribe
option is an OpenMPI option that is required to use more processes than the number of available processors. This is typically suboptimal, but we will need it in our lectures to run, e.g., simulations with 32 replicas even if we have a computer with 4 cores.
In brief, to run a GROMACS simulation where the individual replicas are in directories names dir0
, dir1
, etc and the plumed.dat
file is in the parent directory you will need a command such as
To run the PLUMED driver processing a trajectory with multiple processes you will need a command such as
If you have random crashes on MacOS, try to set this environemnt variable:
export OMPI_MCA_btl="self,tcp"
In Exercise 4: Enhancing conformational transitions with multiple-windows umbrella sampling we have seen how to run a multiple-windows umbrella sampling simulation with independent simulations. Here we will run it using replica exchange. The only differences are that:
mpiexec
-replex
option.It will be sufficient to use a single plumed.dat
file that looks like this:
# vim:ft=plumed MOLINFOSTRUCTURE=../reference.pdb phi: TORSIONcompulsory keyword a file in pdb format containing a reference structure.ATOMS=__FILL__ psi: TORSIONthe four atoms involved in the torsional angleATOMS=__FILL__ bb: RESTRAINTthe four atoms involved in the torsional angleARG=phithe input for this action is the scalar output from one or more other actions.KAPPA=200.0compulsory keyword ( default=0.0 ) specifies that the restraint is harmonic and what the values of the force constants on each of the variables areAT=@replicas:__FILL__ PRINTcompulsory keyword the position of the restraintARG=phi,psi,bb.biasthe input for this action is the scalar output from one or more other actions.FILE=../colvar_multi.datthe name of the file on which to output these quantitiesSTRIDE=100compulsory keyword ( default=1 ) the frequency with which the quantities of interest should be output
According to the instructions above, you should create 32 directories (one per replica), place the tpr file (for this exercise: topolA.tpr
) in each of them, and run the following command
mpiexec -np 32 --oversubscribe gmx_mpi mdrun -multidir dir? dir?? -plumed ../plumed.dat -s topolA.tpr -replex 200 -nsteps 200000
Notice that by omitting the -replex
option you will be able to run a non-replica-exchange umbrella sampling simulation, identical to the one you performed in Exercise 4: Enhancing conformational transitions with multiple-windows umbrella sampling. We will now repeat exercises Exercise 4: Enhancing conformational transitions with multiple-windows umbrella sampling and Exercise 6: Effect of initial conditions using replica exchange. We will also test different initial conditions, as in Exercise 6: Effect of initial conditions. Please run the following four simulations:
-replex
(will be identical to Exercise 4: Enhancing conformational transitions with multiple-windows umbrella sampling)-replex
(will be identical to Exercise 6: Effect of initial conditions)-replex 200
-replex 200
For the four simulations, perform a WHAM analysis to compute the weights of each frame, and then compute the relative stability of the two minima (as in Exercise 5: Computing populations and errors). To compute weights you need to do the following steps:
gmx_mpi trjcat -cat -f dir?/traj_comp.xtc dir??/traj_comp.xtc -o traj_multi.xtc
).mpiexec -np 32 --oversubscribe plumed driver --ixtc traj_multi.xtc --plumed plumed.dat --multi 32 --trajectory-stride 100
).import wham kb=0.008314462618 T=300 col=[] for i in range(32): col.append(plumed.read_as_pandas("colvar_multi." + str(i)+".dat")) bias=np.zeros((len(col[0]["bb.bias"]),32)) for i in range(32): bias[:,i]=traj[i]["bb.bias"] w=wham.wham(bias,T=kBT) tr=col[0].phi is_in_B=np.int_(np.logical_and(tr>0,tr<2)) is_in_A=np.int_(tr<0) print(np.average(is_in_B,weights=np.exp(w["logW"]))/np.average(is_in_A,weights=np.exp(w["logW"])))
Now answer the following questions:
Close to the end of one of the md.log
files produced by your simulation you will find a short report of the accepted exchanges. For instance
Repl average probabilities: Repl 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Repl .30 .32 .29 .24 .21 .17 .16 .21 .31 .27 .28 .26 .23 .21 .16 .08 .11 .23 .28 .27 .31 .32 .31 .36 .34 .25 .18 .01 .21 .28 .26
A result like this one will already warn you that there is a bottleneck between replicas 27 and 28 (only 1 percent of the attempted exchanges have been accepted). Anyway, bottlenecks might be not visible in this representation. The full path in the replica-index space of the continuous trajectories ("demuxed") is much more informative.
We will now "demux" our trajectories. For these short trajectories we can use the demux.pl
script provided by GROMACS. Notice that for long trajectories and frequent exchanges it could have problems to process correctly the output file. In particular, since the value of time is written by GROMACS with a limited number of digits, the original script might be confused regarding when exchanges happened. At this link you can find a modified script that solves the problem by asking you the value of the time step and computing the value of time from the step number, that is stored as an integer and does not suffer roundoff problems: https://github.com/srnas/demux .
The demux script can be used to produce files named replica_temp.xvg
and replica_index.xvg
as follows
demux.pl dir0/md.log
The replica_temp.xvg
provides, as a function of time, the number of the replica on which each of the continuous simulations is located. For instance, you can follow the migration in the replica ladder of the first replica as follows:
replica_temp=np.loadtxt("replica_temp.xvg") plt.plot(replica_temp[:,0],replica_temp[:,1])
The column 1 contains time, whereas the number on column i+1 says which is the current replica index (that is: temperature, in a temperature replica exchange simulation; position of the restraint in a replica-exchange umbrella sampling simulation) of the continuous simulations that started at position i. This file is called "replica_temp" because it has been implemented with temperature replica exchange in mind, but here the index actually refers to the position of the restraint.
Now answer the following two questions:
Notice that each row of the replica_temp.xvg
file contains a permutation. The replica_index.xvg
file just contains the inverse of this permutation. The replica_index.xvg
can be used to generate the "demuxed" (continuous) trajectories with the following command:
import subprocess subprocess.run("gmx_mpi trjcat -cat -f dir?/traj_comp.xtc dir??/traj_comp.xtc -demux replica_index.xvg -o " + ''.join([" "+str(i)+"_trajout.xtc" for i in range(32)]),shell=True)
The resulting trajectories can be visualized or analyzed as usual and, at variance with the original trajectories, will have no jump or discontinuity but will rather be continuous functions of time. For instance, you could use plumed driver
to compute phi on the demuxed trajectories.
Now answer the following two questions:
Notice that, if you run your simulation long enough, each "demuxed" trajectory is expected to cover uniformly the whole range of replica indexes. Due to the location of the restraints, this will imply that each "demuxed" trajectory is expected to cover in an approximately uniform manner the range of the biased CV. Thus, to some extent, each of these trajectories should behave similarly to a metadynamics simulation (see ref masterclass-21-4). The flatness of the distribution on the biased CV depends however on the specific parameters of the restraints (stiffness and locations).
Notice that the WHAM analysis does not need to know where each of the frames come from. This implies that when you run WHAM you can do it equivalently using the concatenation of the original trajectories or the concatenation of the "demuxed" trajectories. The advantage of the latter choice is that you can then perform a block analysis on the resulting trajectory where the number of blocks is exactly equal to the number of replicas. These blocks will be independent simulation, with only two small exceptions:
The second factor can be decreased by improving the way replicas are initialized. The first factor is usually impacting correlation much less than the actual exchanges. These blocks are thus optimally suited to perform a bootstrap analysis of the error without incurring in underestimation due to correlations between blocks.
There is a small tricky issue here. In particular, when we perform the bootstrap analysis, we are going to pick each block a different number of times. Since each block (that is: each "demuxed" trajectory) has been spanning the replica indexes by spending a different time at each replica, the bootstrap trajectory will not satisfy anymore the property that it was generated spending the same time in each replica. The included wham script allows to use this information passing an additional option traj_weight
. You can adjust the script below to perform the bootstrap analysis:
Notice that this approach is not really standard, so use it with care. There are a few papers in the literature discussing similar ideas, but they usually require estimating the autocorrelation time in advance.
We will now run a bias-exchange simulation of alanine dipeptide. In bias-exchange simulations, each replica biases a different collective variable. This is a very practical way to enhance sampling for a large number of variables (as many as the replicas that you can afford!). Notice that they will be biased one at a time. As a matter of fact only those that are useful in identifying a transition state will help, but the other ones will not hurt.
Prepare the input file for a simulation with 3 replicas where the following variables are biased:
phi
psi
Initialize two of them in structure A (using topolA.tpr) and one of them is structure B (using topolB.tpr). You can use a single input file that looks like this:
# vim:ft=plumed MOLINFOSTRUCTURE=../reference.pdb # this is needed to allow arbitrary pairs to try exchanges # in this case, 0<->1, 0<->2, and 1<->2 RANDOM_EXCHANGES phi: TORSIONcompulsory keyword a file in pdb format containing a reference structure.ATOMS=__FILL__ psi: TORSIONthe four atoms involved in the torsional angleATOMS=__FILL__ # You can use the same parameters that you used in masterclass 21.4 m: METAD ...the four atoms involved in the torsional angleARG=@replicas:phi,psi,phithe input for this action is the scalar output from one or more other actions.SIGMA=@replicas:__FILL__compulsory keyword the widths of the Gaussian hillsHEIGHT=__FILL__ # make sure that there is no bias on the third replica!the heights of the Gaussian hills.BIASFACTOR=__FILL__use well tempered metadynamics and use this bias factor.PACE=__FILL__compulsory keyword the frequency for hill additionGRID_MIN=-pithe lower bounds for the gridGRID_MAX=+pi ...the upper bounds for the grid
Now run two separate simulations for 1000000 steps per replica. In one of them you will propose exchanges between replicas with a pace 200, in the other you will not propose any exchange (just omit -replex 200
from the command line). The second run will thus be equivalent to running three simulations (free, metadynamics on phi, metadynamics on psi) that you already ran in PLUMED Masterclass 21.4: Metadynamics .
We will now use WHAM to combine the resulting trajectories. We can proceed as we did above, but taking into account that when analyzing a metadynamics simulations the way to compute the weight is slightly different. As discussed in Exercise 3: Reweighting (unbiasing) a metadynamics simulation, one of the possible manners to obtain the weight is to use the final potential computed along the trajectory. This required a further processing step in a simple metadynamics simulation. Here we can compute the final potential while processing the concatenated trajectory. In practice, the only difference with respect to the analysis done in Exercise 1: Multiple-windows umbrella sampling with replica exchange is that here we will have to process our trajectories using a different input file, where PACE
has been set to a large number and HEIGHT
set to zero. You can then perform WHAM as in Exercise 1: Multiple-windows umbrella sampling with replica exchange and compute the population of the two metastable states.
After you have calculated the relative populations in the two runs (with and without exchanges), answer the following questions:
Notice that the third replica has been simulated without any metadynamics. This is a so-called neutral replica, that is used sometime in bias-exchange simulations. You can compute the relative population of the two metastable states directly using the populations in that replica (no post-processing needed!).
Now imagine to perform the bias-exchange simulation again usign only two replicas: one of them biasing psi and the other one with no bias. In other words, you would on purpose forget a variable that is very important:
We will finally learn how to use parallel-tempering metadynamics. In parallel-tempering metadynamics, sampling is enhanced using parallel-tempering (which enhances all degrees of freedom), whereas metadynamics is used to flatten their histogram. If the biased CV contains a relevant bottleneck and is capable to approximately single out the corresponding transition state, the corresponding transition will be enhanced as well. Notice however that if the parallel-tempering side of the algorithm is sufficient to enhance sampling, it is not necessary to bias a CV that can identify the transition state.
First we will need to prepare our input files. We will use 4 replicas, with temperatures taken from a geometric distribution ranging between 300 and 800K. You should be able to generate the corresponding tpr files using the following script
You will then be able to run a parallel tempering simulation with the following command
mpiexec -np 4 --oversubscribe gmx_mpi mdrun -multi ptmetad_? -replex 200
Notice that the acceptance will be compute by GROMACS taking into account the fact that simulations are running at different temperatures. Also notice that in order to obtain a large enough acceptance given the temperature span, you will need a number of replicas that grows with the square root of the number of atoms in the system. For solvated molecules, you would typically need tens of replicas at least.
We will now add the metadynamics ingredient, by preparing a suitable PLUMED input file. Since parallel-tempering metadynamics is designed to cope with cases where you do not have a good CV available, we will directly use psi
rather than phi
!
# vim:ft=plumed MOLINFOSTRUCTURE=../reference.pdb phi: TORSIONcompulsory keyword a file in pdb format containing a reference structure.ATOMS=__FILL__ psi: TORSIONthe four atoms involved in the torsional angleATOMS=__FILL__ # You can use the same parameters that you used in masterclass 21.4 # However, it is recommended to scale HEIGHT with temperature. # You can do it either using replicas: syntax in HEIGHT or specifying TAU # instead of HEIGHT m: METAD ...the four atoms involved in the torsional angleARG=psithe input for this action is the scalar output from one or more other actions.SIGMA=__FILL__compulsory keyword the widths of the Gaussian hillsHEIGHT=__FILL__the heights of the Gaussian hills.BIASFACTOR=__FILL__use well tempered metadynamics and use this bias factor.PACE=__FILL__compulsory keyword the frequency for hill additionGRID_MIN=-pithe lower bounds for the gridGRID_MAX=+pi ...the upper bounds for the grid
The analysis will be simpler here. We will just analyze the first replica (in ptmetad_0
) as if it was generated using a simple metadynamics simulation. Now compute the usual ratio between the populations of the two metastable minima and answer the following questions:
Repeat exercise Exercise 5: Parallel-tempering metadynamics, but now place your replicas in the range between 300 and 310.