PCA
This is part of the dimred module
It is only available if you configure PLUMED with ./configure --enable-modules=dimred. Furthermore, this feature is still being developed, so take care when using it and report any problems on the mailing list.

Perform principal component analysis (PCA) using either the positions of the atoms or a large number of collective variables as input.

Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables. You can read more about the specifics of this technique here: https://en.wikipedia.org/wiki/Principal_component_analysis

When used with molecular dynamics simulations, the input is either a set of frames taken from the trajectory, \(\{X_i\}\), or the values of a number of collective variables calculated from the trajectory frames. In this second instance the input to the PCA algorithm is thus a set of high-dimensional vectors of collective variables. In both cases, whether collective variables are calculated from the positions of the atoms or the positions are used directly, the assumption is that the input trajectory is a set of possibly correlated (high-dimensional) vectors. After principal component analysis has been performed, the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen. In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory. These output directions are linear combinations of the \(x\), \(y\) and \(z\) positions if the positions were used as input, or linear combinations of the input collective variables if a high-dimensional vector of collective variables was used as input.
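The idea of finding orthogonal directions of largest motion can be illustrated with a minimal sketch in plain Python. This is not PLUMED's implementation: it assumes two-dimensional input, for which the covariance matrix can be diagonalised in closed form, and the function name `pca_2d` and the sample data are hypothetical.

```python
import math

def pca_2d(points):
    """Return ((l1, l2), (v1, v2)): variances and unit directions, largest first."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Elements of the 2x2 covariance matrix (normalised by 1/n).
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the direction that maximises the variance of the projected data.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))  # orthogonal to v1

    def variance_along(v):
        return sxx * v[0] * v[0] + 2.0 * sxy * v[0] * v[1] + syy * v[1] * v[1]

    return (variance_along(v1), variance_along(v2)), (v1, v2)

# Hypothetical data lying exactly on the line y = x.
samples = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
eigenvalues, eigenvectors = pca_2d(samples)
print(eigenvalues, eigenvectors)
```

For this data all of the variance lies along the diagonal, so the first component points along (1, 1) and the second carries essentially no variance. With more than two input coordinates a general symmetric eigensolver would be needed instead of the closed-form angle.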

As explained on the Wikipedia page, you must calculate the average and covariance for each of the input coordinates. In other words, you must calculate the average structure and the amount the system fluctuates around this average structure. The problem in doing so when the \(x\), \(y\) and \(z\) coordinates of a molecule are used as input is that the majority of the changes in the positions of the atoms come from the translational and rotational degrees of freedom of the molecule. The first six principal components will thus, most likely, be uninteresting. To remedy this problem, PLUMED provides the functionality to perform an RMSD alignment of all the structures to be analyzed to the first frame in the trajectory. This can be used to effectively remove translational and/or rotational motions from consideration. The resulting principal components thus describe vibrational motions of the molecule.
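As a rough illustration of why alignment matters, the sketch below removes only the translational part by shifting each frame so that its centroid sits at the origin. It is an assumption-laden toy, not PLUMED's RMSD alignment: the rotational fit is not shown, and the function name `remove_translation` and the three-atom frames are hypothetical.

```python
def remove_translation(frame):
    """Shift a frame (list of (x, y, z) tuples) so that its centroid is at the origin."""
    n = len(frame)
    centroid = tuple(sum(atom[k] for atom in frame) / n for k in range(3))
    return [tuple(atom[k] - centroid[k] for k in range(3)) for atom in frame]

# Two frames of a hypothetical 3-atom molecule. The second frame is the first one
# translated rigidly by (5, -2, 1), so the internal geometry is identical.
frame_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame_b = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in frame_a]

centred_a = remove_translation(frame_a)
centred_b = remove_translation(frame_b)

# Largest coordinate difference between the two frames after centring.
max_diff = max(
    abs(p[k] - q[k]) for p, q in zip(centred_a, centred_b) for k in range(3)
)
print(max_diff)
```

Before centring, every coordinate differs between the two frames, which would show up as spurious variance in the covariance matrix. After centring, the frames coincide to within rounding error.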

If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be used as input for the PCAVARS action.
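Conceptually, projecting a frame on a set of principal components amounts to subtracting a reference configuration and taking dot products with each component. The sketch below shows only this generic operation. It does not reproduce how PCAVARS reads its input or handles alignment, and the function name `project` and all numbers are hypothetical.

```python
def project(frame, reference, components):
    """Project the displacement (frame - reference) on each component vector."""
    displacement = [x - r for x, r in zip(frame, reference)]
    return [sum(d * c for d, c in zip(displacement, comp)) for comp in components]

# Hypothetical reference (average) configuration and two orthonormal components.
reference = [1.0, 2.0, 3.0]
components = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]

# A hypothetical frame displaced from the reference.
frame = [2.0, 2.6, 3.8]
coords = project(frame, reference, components)
print(coords)
```

Each entry of `coords` is the low-dimensional coordinate of the frame along one principal component.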

Examples

The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions of the first 22 atoms. The METRIC=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration. The first two principal components will be output to a file called PCA-comp.pdb. Trajectory frames will be collected on every step and the PCA calculation will be performed at the end of the simulation.

ff: COLLECT_FRAMES ATOMS=1-22 STRIDE=1
pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=OPTIMAL NLOW_DIM=2
OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca FILE=PCA-comp.pdb

The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the six distances d1 to d6. Notice that here the METRIC=EUCLIDEAN keyword is used to indicate that no alignment has to be done when calculating the various elements of the covariance matrix from the input vectors. In this calculation the first two principal components will be output to a file called PCA-comp.pdb. Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps. Consequently, if you run a 2000 step simulation the PCA analysis will be performed twice. The REWEIGHT_BIAS action in this input tells PLUMED that, rather than ascribing a weight of one to each of the frames when calculating averages and covariance matrices, a reweighting should be performed in which each frame's weight is determined by the current value of the instantaneous bias (see REWEIGHT_BIAS).

d1: DISTANCE ATOMS=1,2
d2: DISTANCE ATOMS=1,3
d3: DISTANCE ATOMS=1,4
d4: DISTANCE ATOMS=2,3
d5: DISTANCE ATOMS=2,4
d6: DISTANCE ATOMS=3,4
rr: RESTRAINT ARG=d1 AT=0.1 KAPPA=10
rbias: REWEIGHT_BIAS TEMP=300
ff: COLLECT_FRAMES ARG=d1,d2,d3,d4,d5,d6 LOGWEIGHTS=rbias STRIDE=5
pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=EUCLIDEAN NLOW_DIM=2
OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca STRIDE=100 FILE=PCA-comp.pdb
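The effect of reweighting on the averages and covariances can be sketched in plain Python. The example assumes that each frame receives a weight proportional to \(e^{V/k_B T}\), where \(V\) is the instantaneous bias, that energies are in kJ/mol (so \(k_B\) is about 0.0083144626 kJ/(mol K)), and that the harmonic restraint contributes a bias of \(\frac{1}{2}\kappa(d_1 - 0.1)^2\). The function name `weighted_stats` and the distance values are hypothetical, and the normalisation PLUMED uses internally is not reproduced here.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K); assumes energies in kJ/mol

def weighted_stats(values_a, values_b, biases, temp):
    """Return (weights, mean_a, mean_b, cov_ab) with weights proportional to exp(V / kT)."""
    kt = KB * temp
    # Subtracting the largest bias avoids overflow and cancels on normalisation.
    vmax = max(biases)
    raw = [math.exp((v - vmax) / kt) for v in biases]
    norm = sum(raw)
    weights = [r / norm for r in raw]
    mean_a = sum(w * a for w, a in zip(weights, values_a))
    mean_b = sum(w * b for w, b in zip(weights, values_b))
    cov_ab = sum(
        w * (a - mean_a) * (b - mean_b)
        for w, a, b in zip(weights, values_a, values_b)
    )
    return weights, mean_a, mean_b, cov_ab

# Hypothetical values of two distances in three collected frames.
d1 = [0.10, 0.12, 0.30]
d2 = [0.20, 0.22, 0.25]
# Bias from a harmonic restraint on d1 with AT=0.1 and KAPPA=10.
biases = [0.5 * 10.0 * (d - 0.1) ** 2 for d in d1]

weights, mean_d1, mean_d2, cov_d1_d2 = weighted_stats(d1, d2, biases, 300.0)
print(weights, mean_d1, mean_d2, cov_d1_d2)
```

Frames that feel a larger bias receive a larger weight, which compensates for the fact that the restraint makes them less likely to be visited. When all biases are equal the weights reduce to 1/N and the usual unweighted averages are recovered.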
Glossary of keywords and components
Description of components
Quantity Description
.#!value the projections of the input coordinates on the PCA components that were found from the covariance matrix
Compulsory keywords
ARG the arguments that you would like to make the histogram for
NLOW_DIM number of low-dimensional coordinates required
STRIDE ( default=0 ) the frequency with which to perform this analysis
Options
FILE the file on which to output the low dimensional coordinates
FMT the format to use when outputting the low dimensional coordinates