PLUMED can be used to analyze trajectories either on the fly during an MD run or via post processing a trajectory using driver. A molecular dynamics trajectory is in essence an ordered set of configurations of atoms. Trajectory analysis algorithms are methods that allow us to extract meaningful information from this extremely high-dimensionality information. In extracting this information much of the information in the trajectory will be discarded and assumed to be irrelevant to the problem at hand. For example, when we calculate a histogram from a trajectory we throw away all information on the order the frames were visited during the trajectory. We instead opt to display a time average that shows the parts of configuration space that were
visited most frequently. There are many situations in which this is a reasonable thing to do as we know that time averages are equivalent to ensemble averages in the long timescale limit and that these average probabilities of being in different parts of configuration space, \(P(s)\), are thus related to the underlying free energy, \(F(s)\), via:
\[ F(s) = - k_B T \ln P(s) \]
About the simplest form of analysis that PLUMED can perform involves printing information to a file. PLUMED can output various different kinds of information to files as described below:
COMMITTOR | Does a committor analysis. |
DUMPATOMS | Dump selected atoms on a file. |
DUMPDERIVATIVES | Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases). |
DUMPFORCES | Dump the force acting on one of a values in a file. |
DUMPMASSCHARGE | Dump masses and charges on a selected file. |
DUMPPDB | Output PDB file. |
DUMPPROJECTIONS | Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases). |
DUMPVECTOR | Print a vector to a file |
Print quantities to a file. | |
PRINT_NDX | Print an ndx file |
SELECT_COMPONENTS | Create a new value to hold a subset of the components that are in a vector or matrix |
SELECT_WITH_MASK | Use a mask to select elements of an array |
UPDATE_IF | Conditional update of other actions. |
The UPDATE_IF action allows you to do more complex things using the above print commands. As detailed in the documentation for UPDATE_IF when you put any of the above actions within an UPDATE_IF block then data will only be output to the file if colvars are within particular ranges. In other words, the above printing commands, in tandem with UPDATE_IF, allow you to identify the frames in your trajectory that satisfy some particular criteria and output information on those frames only.
Another useful command is the COMMITTOR command. As detailed in the documentation for COMMITTOR this command tells PLUMED (and the underlying MD code) to stop the calculation one some criteria is satisfied, alternatively one can use it to keep track of the number of times a criteria is satisfied.
A number of more complicated forms of analysis can be performed that take a number of frames from the trajectory as input. In all these commands the STRIDE keyword is used to tell PLUMED how frequently to collect data from the trajectory. In all these methods the output from the analysis is a form of ensemble average. If you are running with a bias it is thus likely that you may want to reweight the trajectory frames in order to remove the effect the bias has on the static behavior of the system. The following methods can thus be used to calculate weights for the various trajectory frames so that the final ensemble average is an average for the canonical ensemble at the appropriate temperature.
COVARIANCE_MATRIX | Calculate a covariance matix |
REWEIGHT_BIAS | Calculate weights for ensemble averages that negate the effect the bias has on the region of phase space explored |
REWEIGHT_METAD | Calculate the weights configurations should contribute to the histogram in a simulation in which a metadynamics bias acts upon the system. |
REWEIGHT_TEMP_PRESS | Calculate weights for ensemble averages at temperatures and/or pressures different than those used in your original simulation. |
WHAM | Calculate the weights for configurations using the weighted histogram analysis method. |
WHAM_HISTOGRAM | This can be used to output the a histogram using the weighted histogram technique |
WHAM_WEIGHTS | Calculate and output weights for configurations using the weighted histogram analysis method. |
You can then calculate ensemble averages using the following actions.
ACCUMULATE | Sum the elements of this value over the course of the trajectory |
AVERAGE | Calculate the ensemble average of a collective variable |
CUSTOM_GRID | Calculate a function of the grid or grids that are input and return a new grid |
EVALUATE_FUNCTION_FROM_GRID | Calculate the function stored on the input grid at an arbitrary point |
EVALUATE_FUNCTION_FROM_GRID_SCALAR | Calculate the function stored on the input grid at an arbitrary point |
EVALUATE_FUNCTION_FROM_GRID_VECTOR | Calculate the function stored on the input grid for each of the points in the input vector/s |
FIND_GRID_MAXIMUM | Find the point with the highest value of the function on the grid |
FIND_GRID_MINIMUM | Find the point with the lowest value of the function on the grid |
HISTOGRAM | Accumulate the average probability density along a few CVs from a trajectory. |
INTEGRATE_GRID | Calculate the numerical integral of the function stored on the grid |
MATHEVAL_GRID | Calculate a function of the grid or grids that are input and return a new grid |
MULTICOLVARDENS | Evaluate the average value of a multicolvar on a grid. |
REFERENCE_GRID | Setup a constant grid by either reading values from a file or definining a function in input |
SUM_GRID | Sum the values of all the function at the points on a grid |
For many of the above commands data is accumulated on the grids. These grids can be further analyzed using one of the actions detailed below at some time.
CONVERT_TO_FES | Convert a histogram to a free energy surface. |
DUMPCONTOUR | Print the contour |
DUMPCUBE | Output a three dimensional grid using the Gaussian cube file format. |
DUMPGRID | Output the function on the grid to a file with the PLUMED grid format. |
FIND_CONTOUR | Find an isocontour in a smooth function. |
FIND_CONTOUR_SURFACE | Find an isocontour by searching along either the x, y or z direction. |
FIND_SPHERICAL_CONTOUR | Find an isocontour in a three dimensional grid by searching over a Fibonacci sphere. |
FOURIER_TRANSFORM | Compute the Discrete Fourier Transform (DFT) by means of FFTW of data stored on a 2D grid. |
INTERPOLATE_GRID | Interpolate a smooth function stored on a grid onto a grid with a smaller grid spacing. |
As an example the following set of commands instructs PLUMED to calculate the distance between atoms 1 and 2 for every fifth frame in the trajectory and to accumulate a histogram from this data which will be output every 100 steps (i.e. when 20 distances have been added to the histogram).
x: DISTANCEATOMS=1,2 h: HISTOGRAMthe pair of atom that we are calculating the distance between.ARG=xthe quantities that are being used to construct the histogramGRID_MIN=0.0compulsory keyword ( default=auto ) the lower bounds for the gridGRID_MAX=3.0compulsory keyword ( default=auto ) the upper bounds for the gridGRID_BIN=100the number of bins for the gridBANDWIDTH=0.1the bandwidths for kernel density esimtationSTRIDE=5 DUMPGRIDcompulsory keyword ( default=1 ) the frequency with which to store data for averagingGRID=hthe grid you would like to print (can also use ARG for specifying what is being printed)FILE=histocompulsory keyword ( default=density ) the file on which to write the grid.STRIDE=100compulsory keyword ( default=0 ) the frequency with which the grid should be output to the file.
It is important to note when using commands such as the above the first frame in the trajectory is assumed to be the initial configuration that was input to the MD code. It is thus ignored. Furthermore, if you are running with driver and you would like to analyze the whole trajectory (without specifying its length) and then print the result you simply call DUMPGRID (or any of the commands above) without a STRIDE keyword as shown in the example below.
x: DISTANCEATOMS=1,2 h: HISTOGRAMthe pair of atom that we are calculating the distance between.ARG=xthe quantities that are being used to construct the histogramGRID_MIN=0.0compulsory keyword ( default=auto ) the lower bounds for the gridGRID_MAX=3.0compulsory keyword ( default=auto ) the upper bounds for the gridGRID_BIN=100the number of bins for the gridBANDWIDTH=0.1the bandwidths for kernel density esimtationSTRIDE=5 DUMPGRIDcompulsory keyword ( default=1 ) the frequency with which to store data for averagingGRID=hthe grid you would like to print (can also use ARG for specifying what is being printed)FILE=histocompulsory keyword ( default=density ) the file on which to write the grid.
Please note that even with this calculation the first frame in the trajectory is ignored when computing the histogram.
Notice that all the commands for calculating smooth functions described above calculate some sort of average. There are two ways that you may wish to average the data in your trajectory:
PLUMED has a number of diagnostic tools that can be used to check that new Actions are working correctly:
DUMPFORCES | Dump the force acting on one of a values in a file. |
DUMPDERIVATIVES | Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases). |
DUMPMASSCHARGE | Dump masses and charges on a selected file. |
DUMPPROJECTIONS | Dump the derivatives with respect to the input parameters for one or more objects (generally CVs, functions or biases). |
These commands allow you to test that derivatives and forces are calculated correctly within colvars and functions. One place where this is very useful is when you are testing whether or not you have implemented the derivatives of a new collective variables correctly. So for example if we wanted to do such a test on the distance CV we would employ an input file something like this:
d1: DISTANCEATOMS=1,2 d1n: DISTANCEthe pair of atom that we are calculating the distance between.ATOMS=1,2the pair of atom that we are calculating the distance between.NUMERICAL_DERIVATIVESDUMPDERIVATIVES( default=off ) calculate the derivatives for these quantities numericallyARG=d1,d1ncompulsory keyword the labels of the values whose derivatives should be outputFILE=derivativescompulsory keyword the name of the file on which to output the derivatives
The first of these two distance commands calculates the analytical derivatives of the distance while the second calculates these derivatives numerically. Obviously, if your CV is implemented correctly these two sets of quantities should be nearly identical.
All the analysis methods described in previous sections accumulate averages or output diagnostic information on the fly. That is to say these methods calculate something given the instantaneous positions of the atoms or the instantaneous values of a set of collective variables. Many methods (e.g. dimensionality reduction and clustering) will not work like this, however, as information from multiple trajectory frames is required at the point when the analysis is performed. In other words the output from these types of analysis cannot be accumulated one frame at time. When using these methods you must therefore store trajectory frames for later analysis. You can do this storing by using the following action:
COLLECT_FRAMES | Collect and store trajectory frames for later analysis with one of the methods detailed below. |
One of the simplest things that we can do if we have stored a set of trajectory frames using COLLECT_FRAMES is we can calculate the dissimilarity between every pair of frames we have stored. When using the dimensionality reduction algorithms described in the sections that follow the first step is to calculate this matrix. Consequently, within PLUMED the following command will collect the trajectory data as your simulation progressed and calculate the dissimilarities:
EUCLIDEAN_DISSIMILARITIES | Calculate the matrix of dissimilarities between a trajectory of atomic configurations. |
By exploiting the functionality described in Distances from reference configurations you can calculate these dissimilarities in a wide variety of different ways (e.g. you can use RMSD, or you can use a collection of collective variable values see TARGET). If you wish to view this dissimilarity information you can print these quantities to a file using:
PRINT_DISSIMILARITY_MATRIX | Print the matrix of dissimilarities between a trajectory of atomic configurations. |
In addition, if PLUMED does not calculate the dissimilarities you need you can read this information from an external file
READ_DISSIMILARITY_MATRIX | Read a matrix of dissimilarities between a trajectory of atomic configurations from a file. |
N.B. You can only use the two commands above when you are doing post-processing.
Many of the techniques described in the following sections are very computationally expensive to run on large trajectories. A common strategy is thus to use a landmark selection algorithm to pick a particularly-representative subset of trajectory frames and to only apply the expensive analysis algorithm on these configurations. The various landmark selection algorithms that are available in PLUMED are as follows
FARTHEST_POINT_SAMPLING | Select a set of landmarks using farthest point sampling. |
LANDMARK_SELECT_FPS | Select a of landmarks from a large set of configurations using farthest point sampling. |
LANDMARK_SELECT_RANDOM | Select a random set of landmarks from a large set of configurations. |
LANDMARK_SELECT_STRIDE | Select every ith frame from the stored data |
In general most of these landmark selection algorithms must be used in tandem with a dissimilarity matrix object as as follows:
d1: DISTANCEATOMS=1,2 d2: DISTANCEthe pair of atom that we are calculating the distance between.ATOMS=3,4 d3: DISTANCEthe pair of atom that we are calculating the distance between.ATOMS=5,6 data: COLLECT_FRAMESthe pair of atom that we are calculating the distance between.ARG=d1,d2,d3the labels of the values whose time series you would like to collect for later analysisSTRIDE=1 ss1: EUCLIDEAN_DISSIMILARITIEScompulsory keyword ( default=1 ) the frequency with which data should be stored for analysis.USE_OUTPUT_DATA_FROM=data ll2: LANDMARK_SELECT_FPScould not find this keywordUSE_OUTPUT_DATA_FROM=ss1could not find this keywordNLANDMARKS=300 OUTPUT_ANALYSIS_DATA_TO_COLVARcompulsory keyword the numbe rof landmarks you would like to createUSE_OUTPUT_DATA_FROM=ll2could not find this keywordFILE=mylandmarkscould not find this keyword
When landmark selection is performed in this way a weight is ascribed to each of the landmark configurations. This weight is calculated by summing the weights of all the trajectory frames in each of the landmarks Voronoi polyhedron (https://en.wikipedia.org/wiki/Voronoi_diagram). The weight of each trajectory frame is one unless you are reweighting using the formula described in the Reweighting and Averaging to counteract the fact of a simulation bias or an elevated temperature. If you are reweighting using these formula the weight of each of the points is equal to the exponential term in the numerator of these expressions.
Many dimensionality reduction algorithms work in a manner similar to the way we use when we make maps. You start with distances between London, Belfast, Paris and Dublin and then you try to arrange points on a piece of paper so that the (suitably transformed) distances between the points in your map representing each of those cities are related to the true distances between the cities.
Stating this more mathematically MDS endeavors to find an isometry between points distributed in a high-dimensional space and a set of points distributed in a low-dimensional plane.
In other words, if we have \(M\) \(D\)-dimensional points, \(\mathbf{X}\), and we can calculate dissimilarities between pairs them, \(D_{ij}\), we can, with an MDS calculation, try to create \(M\) projections, \(\mathbf{x}\), of the high dimensionality points in a \(d\)-dimensional linear space by trying to arrange the projections so that the Euclidean distances between pairs of them, \(d_{ij}\), resemble the dissimilarities between the high dimensional points. In short we minimize:
\[ \chi^2 = \sum_{i \ne j} w_i w_j \left( F(D_{ij}) - f(d_{ij}) \right)^2 \]
where \(F(D_{ij})\) is some transformation of the distance between point \(X^{i}\) and point \(X^{j}\) and \(f(d_{ij})\) is some transformation of the distance between the projection of \(X^{i}\), \(x^i\), and the projection of \(X^{j}\), \(x^j\). \(w_i\) and \(w_j\) are the weights of configurations \(X^i\) and \(^j\) respectively. These weights are calculated using the reweighting and Voronoi polyhedron approaches described in previous sections. A tutorial on dimensionality reduction and how it can be used to analyze simulations can be found in the tutorial lugano-5 and in the following short video.
Within PLUMED running an input to run a dimensionality reduction algorithm can be as simple as:
d1: DISTANCEATOMS=1,2 d2: DISTANCEthe pair of atom that we are calculating the distance between.ATOMS=3,4 d3: DISTANCEthe pair of atom that we are calculating the distance between.ATOMS=5,6 data: COLLECT_FRAMESthe pair of atom that we are calculating the distance between.STRIDE=1compulsory keyword ( default=1 ) the frequency with which data should be stored for analysis.ARG=d1,d2,d3 ss1: EUCLIDEAN_DISSIMILARITIESthe labels of the values whose time series you would like to collect for later analysisUSE_OUTPUT_DATA_FROM=data mds: CLASSICAL_MDScould not find this keywordUSE_OUTPUT_DATA_FROM=ss1could not find this keywordNLOW_DIM=2compulsory keyword number of low-dimensional coordinates required
Where we have to use the EUCLIDEAN_DISSIMILARITIES action here in order to calculate the matrix of dissimilarities between trajectory frames. We can even throw some landmark selection into this procedure and perform
d1: DISTANCEATOMS=1,2 d2: DISTANCEthe pair of atom that we are calculating the distance between.ATOMS=3,4 d3: DISTANCEthe pair of atom that we are calculating the distance between.ATOMS=5,6 data: COLLECT_FRAMESthe pair of atom that we are calculating the distance between.STRIDE=1compulsory keyword ( default=1 ) the frequency with which data should be stored for analysis.ARG=d1,d2,d3 matrix: EUCLIDEAN_DISSIMILARITIESthe labels of the values whose time series you would like to collect for later analysisUSE_OUTPUT_DATA_FROM=data ll2: LANDMARK_SELECT_FPScould not find this keywordUSE_OUTPUT_DATA_FROM=matrixcould not find this keywordNLANDMARKS=300 mds: CLASSICAL_MDScompulsory keyword the numbe rof landmarks you would like to createUSE_OUTPUT_DATA_FROM=ll2could not find this keywordNLOW_DIM=2 osample: PROJECT_ALL_ANALYSIS_DATAcompulsory keyword number of low-dimensional coordinates requiredUSE_OUTPUT_DATA_FROM=matrixcould not find this keywordPROJECTION=mdscould not find this keyword
Notice here that the final command allows us to calculate the projections of all the non-landmark points that were collected by the action with label matrix.
Dimensionality can be more complicated, however, because the stress function that calculates \(\chi^2\) has to optimized rather carefully using a number of different algorithms. The various algorithms that can be used to optimize this function are described below
ARRANGE_POINTS | Arrange points in a low dimensional space so that the (transformed) distances between points in the low dimensional space match the dissimilarities provided in an input matrix. |
CLASSICAL_MDS | Create a low-dimensional projection of a trajectory using the classical multidimensional scaling algorithm. |
CREATE_MASK | Create a mask vector to use for landmark selection |
PCA | Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input. |
PROJECT_POINTS | Find the projection of a point in a low dimensional space by matching the (transformed) distance between it and a series of reference configurations that were input |
SKETCHMAP | Construct a sketch map projection of the input data |
SKETCHMAP_PROJECTION | Read in a sketch-map projection |
The following methods are available for printing the result output by the various analysis algorithms:
OUTPUT_ANALYSIS_DATA_TO_COLVAR | Output the results from an analysis using the PLUMED colvar file format. |
OUTPUT_ANALYSIS_DATA_TO_PDB | Output the results from an analysis using the PDB file format. |
Using the above commands to output the data from any form of analysis is important as the STRIDE with which you output the data to a COLVAR or PDB file controls how frequently the analysis is performed on the collected data . If you specified no stride on the output lines then PLUMED assumes you want to perform analysis on the entire trajectory.
If you use the above commands to output data from one of the Landmark Selection algorithms then only the second will give you information on the atomic positions in your landmark configurations and their associated weights. The first of these commands will give the values of the colvars in the landmark configurations only. If you use the above commands to output data from one of the Dimensionality Reduction algorithms then OUTPUT_ANALYSIS_DATA_TO_COLVAR will give you an output file that contains the projection for each of your input points. OUTPUT_ANALYSIS_DATA_TO_PDB will give you a PDB that contains the position of the input point, the projections and the weight of the configuration.
A nice feature of plumed is that when you use Landmark Selection algorithms or Dimensionality Reduction algorithms the output information is just a vector of variables. As such you can use HISTOGRAM to construct a histogram of the information generated by these algorithms.