Belfast tutorial: Analyzing CVs

Aims

The aim of this tutorial is to introduce the users to the plumed syntax. We will go through the writing of simple collective variable and we will use them to analyse a trajectory in terms of probabilty distributions and free energy.

Learning Outcomes

Once this tutorial is completed students will:

  • Know how to write a simple plumed input file
  • Know how to analyse a trajectory using plumed

Resources

The tarball for this project contains the following files:

  • trajectory-short.xyz : a (short) trajectory for a 16 residue protein in xyz format. All calculations with plumed driver use this trajectory.
  • template.pdb : a single frame from the trajectory that can be used in conjuction with the MOLINFO command

Instructions

PLUMED2 is a library that can be accessed by multiple codes adding a relatively simple and well documented interface. Once PLUMED is installed you can run a plumed executable that can be used for multiple purposes:

plumed --help 

some of the listed options report about the plumed available functionalities, other can be used to tell plumed to do something: from analysing a trajectory to patch the source code of a MD code and so on. All the commands have further options that can be seen using plumed command –help, i.e.:

plumed driver --help

In the following we are going to see how to write an input file for plumed2 that can be used to analyse a trajectory.

A note on units

By default the PLUMED inputs and outputs quantities in the following units:

  • Energy - kJ/mol
  • Length - nanometers
  • Time - picoseconds

If you want to change these units you can do this using the UNITS keyword.

Introduction to the PLUMED input file

A typical input file for PLUMED input is composed by specification of one or more CVs, the printout frequency and a termination line. Comments are denoted with a # and the termination of the input for PLUMED is marked with the keyword ENDPLUMED. Whatever it follows is ignored by PLUMED. You can introduce blank lines. They are not interpreted by PLUMED.

In the following input we will analyse the DISTANCE between the two terminal carbons of a 16 residues peptide, and we will PRINT the results in file named COLVAR.

#my first plumed input:
DISTANCE ATOMS=2,253 LABEL=e2edist

#printout frequency
PRINT ARG=e2edist STRIDE=1 FILE=COLVAR 

#endofinput 
ENDPLUMED
here I can write what I want it won't be read.

Now we can use this simple input file to analyse the trajectory included in the RESOURCES:

plumed driver --plumed plumed.dat --ixyz trajectory-short.xyz --length-units 0.1

NOTE: –length-units 0.1, xyz files, as well as pdb files, are in Angstrom.

You should have a file COLVAR, if you look at it (i.e. more COLVAR) the first two lines should be:

#! FIELDS time e2edist 
 0.000000 2.5613161

NOTE: the first line of the file COLVAR tells you what is the content of each column.

In PLUMED2 the commands defined in the input files are executed in the same order in which they are written, this means that the following input file is wrong:

#printout frequency
PRINT ARG=cvdist STRIDE=1 FILE=COLVAR 
#my first plumed input:
DISTANCE ATOMS=2,253 LABEL=e2edist
#endofinput 
ENDPLUMED
here I can write what I want it won't be read.

Try to run it.

Sometimes, when calculating a collective variable, you may not want to use the positions of a number of atoms directly. Instead you may wish to use the position of a virtual atom whose position is generated based on the positions of a collection of other atoms. For example you might want to use the center of mass of a group of atoms (COM):

Since PLUMED executes the input in order you need to define the new Virtual Atom before using it:

first: COM ATOMS=1,2,3,4,5,6
last: COM ATOMS=251-256

e2edist: DISTANCE ATOMS=2,253
comdist: DISTANCE ATOMS=first,last

PRINT ARG=e2edist,comdist STRIDE=1 FILE=COLVAR 

ENDPLUMED

NOTE: an action (i.e. COM or DISTANCE here) can be either label using LABEL as we did before or as label: ACTION as we have just done here.

With the above input this is what happen inside PLUMED with a STRIDE=1:

  1. calculates the position of the Virtual Atom 'first' as the COM of atoms from 1 to 6;
  2. calculates the position of the Virtual Atom 'last' as the COM of atoms from 251 to 256;
  3. calculates the distance between atoms 2 and 253 and saves it in 'e2edist';
  4. calculates the distance between the two atoms 'first' and 'last' and saves it in 'comdist';
  5. print the content of 'e2edist' and 'comdist' in the file COLVAR

In the above input we have used to different ways of writing the atoms used in COM calculation:

  1. ATOMS=1,2,3,4,5,6 is the explicit list of the atoms we need
  2. ATOMS=251-256 is the range of atoms needed

ranges of atoms can be defined with a stride which can also be negative:

  1. ATOMS=from,to:by (i.e.: 251-256:2)
  2. ATOMS=to,from:-by (i.e.: 256-251:-2)

Now by plotting the content of the COLVAR file we can compare the behaviour in this trajectory of both the terminal carbons as well as of the centre of masses of the terminal residues.

gnuplot

What do you expect to see now by looking at the trajectory? Let's have a look at it

vmd template.pdb trajectory-short.xyz 

Virtual atoms can be used in place of standard atoms everywhere an atom can be given as input, they can also be used together with standard atoms. So for example we can analyse the TORSION angle for a set of Virtual and Standard atoms:

first: COM ATOMS=1-6
last: COM ATOMS=251-256
cvtor: TORSION ATOMS=first,102,138,last

PRINT ARG=cvtor STRIDE=1 FILE=COLVAR 

ENDPLUMED

The above CV don't look smart to learn something about the system we are looking at. In principle CV are used to reduce the complexity of a system by looking at a small number of properties that could be enough to rationalise its behaviour.

Now try to write a collective variable that measures the Radius of Gyration of the system: GYRATION.

NOTE: if what you need for one or more variables is a long list of atoms and not a virtual atom one can use the keyword GROUP. A GROUP can be defined using ATOMS in the same way we saw before, in addition it is also possible to define a GROUP by reading a GROMACS index file.

ca: GROUP ATOMS=9,16,31,55,69,90,102,114,124,138,160,174,194,208,224,238

Now 'ca' is not a virtual atom but a simple list of atoms.

MULTICOLVAR

Sometimes it can be useful to calculate properties of many similar collective variables at the same time, for example one can be interested in calculating the properties of the distances between a group of atoms, or properties linked to the distribution of the dihedral angles of a chain and so on. In PLUMED2 this kind of collective variables fall under the name of MULTICOLVAR (cf. MultiColvar Documentation.) Here we are going to analyse the distances between CA carbons along the chain:

ca: GROUP ATOMS=9,16,31,55,69,90,102,114,124,138,160,174,194,208,224,238
dd: DISTANCES GROUP=ca MEAN MIN={BETA=50} MAX={BETA=0.02} MOMENTS=2

PRINT ARG=dd.mean,dd.min,dd.max,dd.moment-2 STRIDE=1 FILE=COLVAR 

ENDPLUMED

The above input tells PLUMED to calculate all the distances between CA carbons and then look for the mean distance, the minimum distance, the maximum distance and the variance. In this way we have defined four collective variables that are calculated using the distances. These four collective variables are stored as components of the defined action 'dd': dd.mean, dd.min, dd.max, dd.moment-2.

The infrastracture of multicolvar has been used to develop many PLUMED2 collective variables as for example the set of Secondary Structure CVs (ANTIBETARMSD, PARABETARMSD and ALPHARMSD).

MOLINFO STRUCTURE=template.pdb
abeta: ANTIBETARMSD RESIDUES=all TYPE=DRMSD LESS_THAN={RATIONAL R_0=0.08 NN=8 MM=12} STRANDS_CUTOFF=1

PRINT ARG=abeta.lessthan STRIDE=1 FILE=COLVAR 

ENDPLUMED

We have now seen how to write the input some of the many CVs available in PLUMED. More complex CVs will be discussed in the next workshop, Belfast tutorial: Adaptive variables I.

Analysis of Collective Variables

Collective variables are usually used to visualize the Free Energy of a system. Given a system evolving at fixed temperature, fixed number of particles and fixed volume, it will explore different conformations with a probability

\[ P(q)\propto e^{-\frac{U(q)}{kb_BT}} \]

where \( q \) are the microscopic coordinates and \( k_B \) is the Boltzmann constant.

It is possible to analyse the above probabilty as a function of one or more collective variable \( s(q)\):

\[ P(s)\propto \int dq e^{-\frac{U(q)}{kb_BT}} \delta(s-s(q)) \]

where the \( \delta \) function means that to for a given value \( s\) of the collective variable are counted only those conformations for which the CV is \( s\). The probability can be recast to a free energy by taking its logarithm:

\[ F(s)=-k_B T \log P(s) \]

This means that by estimating the probability distribution of a CV it is possible to know the free energy of a system along that CV. Estimating the probability distribution of the conformations of a system is what is called 'sampling'.

In order to estimate a probability distribution one needs to make HISTOGRAM from the calculated CVs. PLUMED2 includes the possibility of histogramming data both on the fly as well as a posteriori as we are going to do now.

MOLINFO STRUCTURE=template.pdb
abeta: ANTIBETARMSD RESIDUES=all TYPE=DRMSD LESS_THAN={RATIONAL R_0=0.08 NN=8 MM=12} STRANDS_CUTOFF=1
ca: GROUP ATOMS=9,16,31,55,69,90,102,114,124,138,160,174,194,208,224,238
DISTANCES ...
GROUP=ca MEAN MIN={BETA=50} MAX={BETA=0.02} MOMENTS=2 LABEL=dd
... DISTANCES 

PRINT ARG=abeta.lessthan,dd.mean,dd.min,dd.max,dd.moment-2 STRIDE=1 FILE=COLVAR 

HISTOGRAM ...
ARG=abeta.lessthan,dd.mean
USE_ALL_DATA
KERNEL=discrete
GRID_MIN=0,0.8
GRID_MAX=4,1.2
GRID_BIN=40,40
GRID_WFILE=histo
... HISTOGRAM

ENDPLUMED

NOTE: HISTOGRAM ... means that what follow is part of the HISTOGRAM function, the same can be done for any action in PLUMED.

The above input tells PLUMED to accumulate the two collective variables on a GRID. In addition the probability can be converted to a free-energy using the flag FREE-ENERGY and setting the temperature using TEMP (i.e. 300K). Histograms can be accumulated in a smoother way by using a KERNEL function, a kernel is a normalised function, for example a normalised gaussian is the default kernel in PLUMED, that is added to the histogram centered in the position of the data. Estimating a probability density using kernels can in principle give more accurate results, on the other hand in addition to the choice of the binning one has to choose a parameter that is the WIDTH of the kernel function. As a rule of thumb: the grid spacing should be smaller (i.e. one half or less) than the BANDWIDTH and the BANDWIDTH should be smaller (i.e. one order of magnitude) than the variance observed/expected for the variable.

HISTOGRAM ...
ARG=abeta.lessthan,dd.mean
USE_ALL_DATA
GRID_MIN=0,0.8
GRID_MAX=4,1.2
GRID_SPACING=0.04,0.004
BANDWIDTH=0.08,0.008
GRID_WFILE=histo
... HISTOGRAM

ENDPLUMED

If you have time less at the end of the session read the manual and look for alternative collective variables to analyse the trajectory. Furthemore try to play with the HISTOGRAM parameters to see the effect of using KERNEL in analysing data.