Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 : Copyright (c) 2016-2023 The plumed team 3 : (see the PEOPLE file at the root of the distribution for a list of names) 4 : 5 : See http://www.plumed.org for more information. 6 : 7 : This file is part of plumed, version 2. 8 : 9 : plumed is free software: you can redistribute it and/or modify 10 : it under the terms of the GNU Lesser General Public License as published by 11 : the Free Software Foundation, either version 3 of the License, or 12 : (at your option) any later version. 13 : 14 : plumed is distributed in the hope that it will be useful, 15 : but WITHOUT ANY WARRANTY; without even the implied warranty of 16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 : GNU Lesser General Public License for more details. 18 : 19 : You should have received a copy of the GNU Lesser General Public License 20 : along with plumed. If not, see <http://www.gnu.org/licenses/>. 21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */ 22 : #include "core/ActionShortcut.h" 23 : #include "core/ActionRegister.h" 24 : #include "core/ActionWithValue.h" 25 : #include "core/ActionPilot.h" 26 : #include "core/ActionAtomistic.h" 27 : #include "core/PlumedMain.h" 28 : #include "core/ActionSet.h" 29 : 30 : //+PLUMEDOC DIMRED PCA 31 : /* 32 : Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input. 33 : 34 : Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of 35 : poorly correlated variables into a set of linearly uncorrelated variables. You can read more about the specifics of this technique 36 : here: https://en.wikipedia.org/wiki/Principal_component_analysis 37 : 38 : When used with molecular dynamics simulations a set of frames taken from the trajectory, \f$\{X_i\}\f$, or the values of 39 : a number of collective variables which are calculated from the trajectory frames are used as input. In this second instance your 40 : input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables. However, if 41 : collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that 42 : this input trajectory is a set of poorly correlated (high-dimensional) vectors. After principal component analysis has been 43 : performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen. 44 : In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory. 45 : These output directions are some linear combination of the \f$x\f$, \f$y\f$ and \f$z\f$ positions if the positions were used as input 46 : or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input. 47 : 48 : As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates. In other words, you must 49 : calculate the average structure and the amount the system fluctuates around this average structure. The problem in doing so when the 50 : \f$x\f$, \f$y\f$ and \f$z\f$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the 51 : atoms comes from the translational and rotational degrees of freedom of the molecule. The first six principal components will thus, most likely, 52 : be uninteresting. Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures 53 : to be analyzed to the first frame in the trajectory. This can be used to effectively remove translational and/or rotational motions from 54 : consideration. The resulting principal components thus describe vibrational motions of the molecule. 55 : 56 : If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be 57 : used as input for the \ref PCAVARS action. 58 : 59 : \par Examples 60 : 61 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions 62 : of the first 22 atoms. The TYPE=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration. 63 : The first two principal components will be output to a file called PCA-comp.pdb. Trajectory frames will be collected on every step and the PCA calculation 64 : will be performed at the end of the simulation. 65 : 66 : \plumedfile 67 : ff: COLLECT_FRAMES ATOMS=1-22 STRIDE=1 68 : pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=OPTIMAL NLOW_DIM=2 69 : OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca FILE=PCA-comp.pdb 70 : \endplumedfile 71 : 72 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the six distances 73 : seen in the previous lines. Notice that here the TYPE=EUCLIDEAN keyword is used to indicate that no alignment has to be done when calculating the various 74 : elements of the covariance matrix from the input vectors. In this calculation the first two principal components will be output to a file called PCA-comp.pdb. 75 : Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps. Consequently, if you run a 2000 step simulation the 76 : PCA analysis will be performed twice. The REWEIGHT_BIAS action in this input tells PLUMED that rather that ascribing a weight of one to each of the frames 77 : when calculating averages and covariance matrices a reweighting should be performed based and each frames' weight in these calculations should be determined based on 78 : the current value of the instantaneous bias (see \ref REWEIGHT_BIAS). 79 : 80 : \plumedfile 81 : d1: DISTANCE ATOMS=1,2 82 : d2: DISTANCE ATOMS=1,3 83 : d3: DISTANCE ATOMS=1,4 84 : d4: DISTANCE ATOMS=2,3 85 : d5: DISTANCE ATOMS=2,4 86 : d6: DISTANCE ATOMS=3,4 87 : rr: RESTRAINT ARG=d1 AT=0.1 KAPPA=10 88 : rbias: REWEIGHT_BIAS TEMP=300 89 : 90 : ff: COLLECT_FRAMES ARG=d1,d2,d3,d4,d5,d6 LOGWEIGHTS=rbias STRIDE=5 91 : pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=EUCLIDEAN NLOW_DIM=2 92 : OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca STRIDE=100 FILE=PCA-comp.pdb 93 : \endplumedfile 94 : 95 : */ 96 : //+ENDPLUMEDOC 97 : 98 : namespace PLMD { 99 : namespace dimred { 100 : 101 : class PCA : public ActionShortcut { 102 : public: 103 : static void registerKeywords( Keywords& keys ); 104 : PCA( const ActionOptions& ); 105 : }; 106 : 107 : PLUMED_REGISTER_ACTION(PCA,"PCA") 108 : 109 4 : void PCA::registerKeywords( Keywords& keys ) { 110 4 : ActionShortcut::registerKeywords( keys ); 111 8 : keys.add("compulsory","ARG","the arguments that you would like to make the histogram for"); 112 8 : keys.add("compulsory","NLOW_DIM","number of low-dimensional coordinates required"); 113 8 : keys.add("compulsory","STRIDE","0","the frequency with which to perform this analysis"); 114 8 : keys.add("optional","FILE","the file on which to output the low dimensional coordinates"); 115 8 : keys.add("optional","FMT","the format to use when outputting the low dimensional coordinates"); 116 8 : keys.setValueDescription("matrix","the projections of the input coordinates on the PCA components that were found from the covariance matrix"); 117 12 : keys.needsAction("LOGSUMEXP"); keys.needsAction("TRANSPOSE"); keys.needsAction("MATRIX_VECTOR_PRODUCT"); 118 16 : keys.needsAction("CONSTANT"); keys.needsAction("COLLECT"); keys.needsAction("OUTER_PRODUCT"); keys.needsAction("CUSTOM"); 119 16 : keys.needsAction("MATRIX_PRODUCT"); keys.needsAction("DIAGONALIZE"); keys.needsAction("VSTACK"); keys.needsAction("DUMPPDB"); 120 4 : } 121 : 122 : 123 2 : PCA::PCA(const ActionOptions&ao): 124 : Action(ao), 125 2 : ActionShortcut(ao) 126 : { 127 : // Find the argument name 128 2 : std::string argn; parse("ARG",argn); 129 2 : ActionShortcut* as = plumed.getActionSet().getShortcutActionWithLabel( argn ); 130 2 : if( !as || as->getName()!="COLLECT_FRAMES" ) error("found no COLLECT_FRAMES action with label " + argn ); 131 : // Get the final weights using the logsumexp trick 132 4 : readInputLine( getShortcutLabel() + "_weights: LOGSUMEXP ARG=" + argn + "_logweights"); 133 : // Now transpose the collected frames 134 4 : readInputLine( getShortcutLabel() + "_dataT: TRANSPOSE ARG=" + argn + "_data"); 135 : // And multiply the transpose by the weights to get the averages 136 4 : readInputLine( getShortcutLabel() + "_mean: MATRIX_VECTOR_PRODUCT ARG=" + getShortcutLabel() + "_dataT," + getShortcutLabel() + "_weights"); 137 : // Make a matrix of averages 138 4 : readInputLine( getShortcutLabel() + "_averages: OUTER_PRODUCT ARG=" + argn + "_ones," + getShortcutLabel() + "_mean"); 139 : // Make a matrix of weights 140 2 : ActionWithValue* av2 = plumed.getActionSet().selectWithLabel<ActionWithValue*>( argn + "_data" ); 141 2 : if( !av2 ) error("count not find data"); 142 2 : unsigned nones = (av2->copyOutput(0))->getShape()[1]; 143 68 : std::string ones="1"; for(unsigned i=1; i<nones; ++i) ones += ",1"; 144 4 : readInputLine( getShortcutLabel() + "_wones: CONSTANT VALUES=" + ones ); 145 4 : readInputLine( getShortcutLabel() + "_wmat: OUTER_PRODUCT ARG=" + getShortcutLabel() + "_weights," + getShortcutLabel() + "_wones"); 146 : // And compute the data substract the mean 147 4 : readInputLine( getShortcutLabel() + "_diff: CUSTOM ARG=" + argn + "_data," + getShortcutLabel() + "_averages FUNC=(x-y) PERIODIC=NO"); 148 4 : readInputLine( getShortcutLabel() + "_wdiff: CUSTOM ARG=" + getShortcutLabel() + "_wmat," + getShortcutLabel() + "_diff FUNC=sqrt(x)*y PERIODIC=NO"); 149 : // And the covariance 150 4 : readInputLine( getShortcutLabel() + "_wdiffT: TRANSPOSE ARG=" + getShortcutLabel() + "_wdiff"); 151 4 : readInputLine( getShortcutLabel() + "_covar: MATRIX_PRODUCT ARG=" + getShortcutLabel() + "_wdiffT," + getShortcutLabel() + "_wdiff"); 152 : // Read the dimensionality of the low dimensional space 153 4 : unsigned ndim; parse("NLOW_DIM",ndim); std::string vecstr="1"; 154 2 : if( ndim<=0 || ndim>nones ) error("cannot generate projection in space of dimension higher than input coordinates"); 155 6 : for(unsigned i=1; i<ndim; ++i) { std::string num; Tools::convert( i+1, num ); vecstr += "," + num; } 156 4 : readInputLine( getShortcutLabel() + "_eig: DIAGONALIZE ARG=" + getShortcutLabel() + "_covar VECTORS=" + vecstr ); 157 : // Now create a matrix to hold the output data 158 2 : std::string outd = "ARG=" + getShortcutLabel() + "_mean"; 159 10 : for(unsigned i=0; i<ndim; ++i) { std::string num; Tools::convert( i+1, num ); outd += "," + getShortcutLabel() + "_eig.vecs-" + num; } 160 4 : readInputLine( getShortcutLabel() + "_pcaT: VSTACK " + outd ); 161 4 : readInputLine( getShortcutLabel() + "_pca: TRANSPOSE ARG=" + getShortcutLabel() + "_pcaT"); 162 : // And output it all 163 6 : std::string filename, pstride; parse("STRIDE",pstride); parse("FILE",filename); 164 2 : if( filename.length()>0 && av2->getName()=="VSTACK" ) { 165 1 : std::vector<std::string> argnames; av2->getMatrixColumnTitles( argnames ); 166 2 : std::string argname_str=argnames[0]; for(unsigned i=1; i<argnames.size(); ++i) argname_str += "," + argnames[i]; 167 3 : std::string fmt; parse("FMT",fmt); if( fmt.length()>0 ) fmt=" FMT=" + fmt; 168 2 : readInputLine("DUMPPDB DESCRIPTION=PCA ARG_NAMES=" + argname_str + " ARG=" + getShortcutLabel() + "_pca FILE=" + filename + " STRIDE=" + pstride + fmt ); 169 1 : } else { 170 1 : if( av2->getName()!="COLLECT" ) error("input data should be VSTACK if list of arguments of COLLECT if atom positions"); 171 1 : ActionAtomistic* rmsdact = plumed.getActionSet().selectWithLabel<ActionAtomistic*>( argn + "_getposx" ); 172 1 : if( !rmsdact ) error("could not find action that gets positions from trajectory for RMSD"); 173 1 : std::vector<AtomNumber> atoms( rmsdact->getAbsoluteIndexes() ); std::string indices; Tools::convert( atoms[0].serial(), indices ); 174 22 : for(unsigned i=1; i<atoms.size(); ++i) { std::string jnum; Tools::convert( atoms[i].serial(), jnum ); indices += "," + jnum; } 175 2 : readInputLine("DUMPPDB DESCRIPTION=PCA ATOM_INDICES=" + indices + " ATOMS=" + getShortcutLabel() + "_pca FILE=" + filename + " STRIDE=" + pstride ); 176 : } 177 4 : outd = "ARG=" + getShortcutLabel() + "_eig.vecs-1"; 178 6 : for(unsigned i=1; i<ndim; ++i) { std::string num; Tools::convert( i+1, num ); outd += "," + getShortcutLabel() + "_eig.vecs-" + num; } 179 4 : readInputLine( getShortcutLabel() + "_eigv: VSTACK " + outd ); 180 4 : readInputLine( getShortcutLabel() + ": MATRIX_PRODUCT ARG=" + getShortcutLabel() + "_diff," + getShortcutLabel() + "_eigv"); 181 2 : } 182 : 183 : } 184 : }