LCOV - code coverage report
Current view: top level - dimred - PCA.cpp (source / functions) Hit Total Coverage
Test: plumed test coverage Lines: 57 57 100.0 %
Date: 2024-10-18 14:00:25 Functions: 2 3 66.7 %

          Line data    Source code
       1             : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
       2             :    Copyright (c) 2016-2023 The plumed team
       3             :    (see the PEOPLE file at the root of the distribution for a list of names)
       4             : 
       5             :    See http://www.plumed.org for more information.
       6             : 
       7             :    This file is part of plumed, version 2.
       8             : 
       9             :    plumed is free software: you can redistribute it and/or modify
      10             :    it under the terms of the GNU Lesser General Public License as published by
      11             :    the Free Software Foundation, either version 3 of the License, or
      12             :    (at your option) any later version.
      13             : 
      14             :    plumed is distributed in the hope that it will be useful,
      15             :    but WITHOUT ANY WARRANTY; without even the implied warranty of
      16             :    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
      17             :    GNU Lesser General Public License for more details.
      18             : 
      19             :    You should have received a copy of the GNU Lesser General Public License
      20             :    along with plumed.  If not, see <http://www.gnu.org/licenses/>.
      21             : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
      22             : #include "core/ActionShortcut.h"
      23             : #include "core/ActionRegister.h"
      24             : #include "core/ActionWithValue.h"
      25             : #include "core/ActionPilot.h"
      26             : #include "core/ActionAtomistic.h"
      27             : #include "core/PlumedMain.h"
      28             : #include "core/ActionSet.h"
      29             : 
      30             : //+PLUMEDOC DIMRED PCA
      31             : /*
      32             : Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input.
      33             : 
      34             : Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of
      35             : poorly correlated variables into a set of linearly uncorrelated variables.  You can read more about the specifics of this technique
      36             : here: https://en.wikipedia.org/wiki/Principal_component_analysis
      37             : 
      38             : When used with molecular dynamics simulations a set of frames taken from the trajectory, \f$\{X_i\}\f$, or the values of
      39             : a number of collective variables which are calculated from the trajectory frames are used as input.  In this second instance your
      40             : input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables.  However, if
      41             : collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that
      42             : this input trajectory is a set of poorly correlated (high-dimensional) vectors.  After principal component analysis has been
      43             : performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen.
      44             : In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory.
      45             : These output directions are some linear combination of the \f$x\f$, \f$y\f$ and \f$z\f$ positions if the positions were used as input
      46             : or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input.
      47             : 
      48             : As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates.  In other words, you must
      49             : calculate the average structure and the amount the system fluctuates around this average structure.  The problem in doing so when the
      50             : \f$x\f$, \f$y\f$ and \f$z\f$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the
      51             : atoms comes from the translational and rotational degrees of freedom of the molecule.  The first six principal components will thus, most likely,
      52             : be uninteresting.  Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures
      53             : to be analyzed to the first frame in the trajectory.  This can be used to effectively remove translational and/or rotational motions from
      54             : consideration.  The resulting principal components thus describe vibrational motions of the molecule.
      55             : 
      56             : If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be
      57             : used as input for the \ref PCAVARS action.
      58             : 
      59             : \par Examples
      60             : 
      61             : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions
      62             : of the first 22 atoms.  The TYPE=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration.
      63             : The first two principal components will be output to a file called PCA-comp.pdb.  Trajectory frames will be collected on every step and the PCA calculation
      64             : will be performed at the end of the simulation.
      65             : 
      66             : \plumedfile
      67             : ff: COLLECT_FRAMES ATOMS=1-22 STRIDE=1
      68             : pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=OPTIMAL NLOW_DIM=2
      69             : OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca FILE=PCA-comp.pdb
      70             : \endplumedfile
      71             : 
      72             : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the six distances
      73             : seen in the previous lines.  Notice that here the TYPE=EUCLIDEAN keyword is used to indicate that no alignment has to be done when calculating the various
      74             : elements of the covariance matrix from the input vectors.  In this calculation the first two principal components will be output to a file called PCA-comp.pdb.
      75             : Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps.  Consequently, if you run a 2000 step simulation the
      76             : PCA analysis will be performed twice.  The REWEIGHT_BIAS action in this input tells PLUMED that rather that ascribing a weight of one to each of the frames
      77             : when calculating averages and covariance matrices a reweighting should be performed based and each frames' weight in these calculations should be determined based on
      78             : the current value of the instantaneous bias (see \ref REWEIGHT_BIAS).
      79             : 
      80             : \plumedfile
      81             : d1: DISTANCE ATOMS=1,2
      82             : d2: DISTANCE ATOMS=1,3
      83             : d3: DISTANCE ATOMS=1,4
      84             : d4: DISTANCE ATOMS=2,3
      85             : d5: DISTANCE ATOMS=2,4
      86             : d6: DISTANCE ATOMS=3,4
      87             : rr: RESTRAINT ARG=d1 AT=0.1 KAPPA=10
      88             : rbias: REWEIGHT_BIAS TEMP=300
      89             : 
      90             : ff: COLLECT_FRAMES ARG=d1,d2,d3,d4,d5,d6 LOGWEIGHTS=rbias STRIDE=5
      91             : pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=EUCLIDEAN NLOW_DIM=2
      92             : OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca STRIDE=100 FILE=PCA-comp.pdb
      93             : \endplumedfile
      94             : 
      95             : */
      96             : //+ENDPLUMEDOC
      97             : 
      98             : namespace PLMD {
      99             : namespace dimred {
     100             : 
     101             : class PCA : public ActionShortcut {
     102             : public:
     103             :   static void registerKeywords( Keywords& keys );
     104             :   PCA( const ActionOptions& );
     105             : };
     106             : 
     107             : PLUMED_REGISTER_ACTION(PCA,"PCA")
     108             : 
     109           4 : void PCA::registerKeywords( Keywords& keys ) {
     110           4 :   ActionShortcut::registerKeywords( keys );
     111           8 :   keys.add("compulsory","ARG","the arguments that you would like to make the histogram for");
     112           8 :   keys.add("compulsory","NLOW_DIM","number of low-dimensional coordinates required");
     113           8 :   keys.add("compulsory","STRIDE","0","the frequency with which to perform this analysis");
     114           8 :   keys.add("optional","FILE","the file on which to output the low dimensional coordinates");
     115           8 :   keys.add("optional","FMT","the format to use when outputting the low dimensional coordinates");
     116           4 :   keys.setValueDescription("the projections of the input coordinates on the PCA components that were found from the covariance matrix");
     117          12 :   keys.needsAction("LOGSUMEXP"); keys.needsAction("TRANSPOSE"); keys.needsAction("MATRIX_VECTOR_PRODUCT");
     118          16 :   keys.needsAction("CONSTANT"); keys.needsAction("COLLECT"); keys.needsAction("OUTER_PRODUCT"); keys.needsAction("CUSTOM");
     119          16 :   keys.needsAction("MATRIX_PRODUCT"); keys.needsAction("DIAGONALIZE"); keys.needsAction("VSTACK"); keys.needsAction("DUMPPDB");
     120           4 : }
     121             : 
     122             : 
     123           2 : PCA::PCA(const ActionOptions&ao):
     124             :   Action(ao),
     125           2 :   ActionShortcut(ao)
     126             : {
     127             :   // Find the argument name
     128           2 :   std::string argn; parse("ARG",argn);
     129           2 :   ActionShortcut* as = plumed.getActionSet().getShortcutActionWithLabel( argn );
     130           2 :   if( !as || as->getName()!="COLLECT_FRAMES" ) error("found no COLLECT_FRAMES action with label " + argn );
     131             :   // Get the final weights using the logsumexp trick
     132           4 :   readInputLine( getShortcutLabel() + "_weights: LOGSUMEXP ARG=" + argn + "_logweights");
     133             :   // Now transpose the collected frames
     134           4 :   readInputLine( getShortcutLabel() + "_dataT: TRANSPOSE ARG=" + argn + "_data");
     135             :   // And multiply the transpose by the weights to get the averages
     136           4 :   readInputLine( getShortcutLabel() + "_mean: MATRIX_VECTOR_PRODUCT ARG=" + getShortcutLabel() + "_dataT," + getShortcutLabel() + "_weights");
     137             :   // Make a matrix of averages
     138           4 :   readInputLine( getShortcutLabel() + "_averages: OUTER_PRODUCT ARG=" + argn + "_ones," + getShortcutLabel() + "_mean");
     139             :   // Make a matrix of weights
     140           2 :   ActionWithValue* av2 = plumed.getActionSet().selectWithLabel<ActionWithValue*>( argn + "_data" );
     141           2 :   if( !av2 ) error("count not find data");
     142           2 :   unsigned nones = (av2->copyOutput(0))->getShape()[1];
     143          68 :   std::string ones="1"; for(unsigned i=1; i<nones; ++i) ones += ",1";
     144           4 :   readInputLine( getShortcutLabel() + "_wones: CONSTANT VALUES=" + ones );
     145           4 :   readInputLine( getShortcutLabel() + "_wmat: OUTER_PRODUCT ARG=" + getShortcutLabel() + "_weights," + getShortcutLabel() + "_wones");
     146             :   // And compute the data substract the mean
     147           4 :   readInputLine( getShortcutLabel() + "_diff: CUSTOM ARG=" + argn + "_data," + getShortcutLabel() + "_averages FUNC=(x-y) PERIODIC=NO");
     148           4 :   readInputLine( getShortcutLabel() + "_wdiff: CUSTOM ARG=" + getShortcutLabel() + "_wmat," + getShortcutLabel() + "_diff FUNC=sqrt(x)*y PERIODIC=NO");
     149             :   // And the covariance
     150           4 :   readInputLine( getShortcutLabel() + "_wdiffT: TRANSPOSE ARG=" + getShortcutLabel() + "_wdiff");
     151           4 :   readInputLine( getShortcutLabel() + "_covar: MATRIX_PRODUCT ARG=" + getShortcutLabel() + "_wdiffT," + getShortcutLabel() + "_wdiff");
     152             :   // Read the dimensionality of the low dimensional space
     153           4 :   unsigned ndim; parse("NLOW_DIM",ndim); std::string vecstr="1";
     154           2 :   if( ndim<=0 || ndim>nones ) error("cannot generate projection in space of dimension higher than input coordinates");
     155           6 :   for(unsigned i=1; i<ndim; ++i) { std::string num; Tools::convert( i+1, num ); vecstr += "," + num; }
     156           4 :   readInputLine( getShortcutLabel() + "_eig: DIAGONALIZE ARG=" + getShortcutLabel() + "_covar VECTORS=" + vecstr );
     157             :   // Now create a matrix to hold the output data
     158           2 :   std::string outd = "ARG=" + getShortcutLabel() + "_mean";
     159          10 :   for(unsigned i=0; i<ndim; ++i) { std::string num; Tools::convert( i+1, num ); outd += "," + getShortcutLabel() + "_eig.vecs-" + num; }
     160           4 :   readInputLine( getShortcutLabel() + "_pcaT: VSTACK " + outd );
     161           4 :   readInputLine( getShortcutLabel() + "_pca: TRANSPOSE ARG=" + getShortcutLabel() + "_pcaT");
     162             :   // And output it all
     163           6 :   std::string filename, pstride; parse("STRIDE",pstride); parse("FILE",filename);
     164           2 :   if( filename.length()>0 && av2->getName()=="VSTACK" ) {
     165           1 :     std::vector<std::string> argnames; av2->getMatrixColumnTitles( argnames );
     166           2 :     std::string argname_str=argnames[0]; for(unsigned i=1; i<argnames.size(); ++i) argname_str += "," + argnames[i];
     167           3 :     std::string fmt; parse("FMT",fmt); if( fmt.length()>0 ) fmt=" FMT=" + fmt;
     168           2 :     readInputLine("DUMPPDB DESCRIPTION=PCA ARG_NAMES=" + argname_str + " ARG=" + getShortcutLabel() + "_pca FILE=" + filename + " STRIDE=" + pstride + fmt );
     169           1 :   } else {
     170           1 :     if( av2->getName()!="COLLECT" ) error("input data should be VSTACK if list of arguments of COLLECT if atom positions");
     171           1 :     ActionAtomistic* rmsdact = plumed.getActionSet().selectWithLabel<ActionAtomistic*>( argn + "_getposx" );
     172           1 :     if( !rmsdact ) error("could not find action that gets positions from trajectory for RMSD");
     173           1 :     std::vector<AtomNumber> atoms( rmsdact->getAbsoluteIndexes() ); std::string indices; Tools::convert( atoms[0].serial(), indices );
     174          22 :     for(unsigned i=1; i<atoms.size(); ++i) { std::string jnum; Tools::convert( atoms[i].serial(), jnum ); indices += "," + jnum; }
     175           2 :     readInputLine("DUMPPDB DESCRIPTION=PCA ATOM_INDICES=" + indices + " ATOMS=" + getShortcutLabel() + "_pca FILE=" + filename + " STRIDE=" + pstride );
     176             :   }
     177           4 :   outd = "ARG=" + getShortcutLabel() + "_eig.vecs-1";
     178           6 :   for(unsigned i=1; i<ndim; ++i) { std::string num; Tools::convert( i+1, num ); outd += "," + getShortcutLabel() + "_eig.vecs-" + num; }
     179           4 :   readInputLine( getShortcutLabel() + "_eigv: VSTACK " + outd );
     180           4 :   readInputLine( getShortcutLabel() + ": MATRIX_PRODUCT ARG=" + getShortcutLabel() + "_diff," + getShortcutLabel() + "_eigv");
     181           2 : }
     182             : 
     183             : }
     184             : }

Generated by: LCOV version 1.16