LCOV - code coverage report
Current view: top level - analysis - PCA.cpp (source / functions) Hit Total Coverage
Test: plumed test coverage Lines: 83 84 98.8 %
Date: 2020-11-18 11:20:57 Functions: 10 13 76.9 %

          Line data    Source code
       1             : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
       2             :    Copyright (c) 2016-2019 The plumed team
       3             :    (see the PEOPLE file at the root of the distribution for a list of names)
       4             : 
       5             :    See http://www.plumed.org for more information.
       6             : 
       7             :    This file is part of plumed, version 2.
       8             : 
       9             :    plumed is free software: you can redistribute it and/or modify
      10             :    it under the terms of the GNU Lesser General Public License as published by
      11             :    the Free Software Foundation, either version 3 of the License, or
      12             :    (at your option) any later version.
      13             : 
      14             :    plumed is distributed in the hope that it will be useful,
      15             :    but WITHOUT ANY WARRANTY; without even the implied warranty of
      16             :    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
      17             :    GNU Lesser General Public License for more details.
      18             : 
      19             :    You should have received a copy of the GNU Lesser General Public License
      20             :    along with plumed.  If not, see <http://www.gnu.org/licenses/>.
      21             : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
      22             : #include "Analysis.h"
      23             : #include "tools/Matrix.h"
      24             : #include "reference/Direction.h"
      25             : #include "reference/MetricRegister.h"
      26             : #include "reference/ReferenceConfiguration.h"
      27             : #include "reference/ReferenceValuePack.h"
      28             : #include "core/ActionRegister.h"
      29             : 
      30             : //+PLUMEDOC DIMRED PCA
      31             : /*
      32             : Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input.
      33             : 
      34             : Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of
      35             : poorly correlated variables into a set of linearly uncorrelated variables.  You can read more about the specifics of this technique
      36             : here: https://en.wikipedia.org/wiki/Principal_component_analysis
      37             : 
      38             : When used with molecular dynamics simulations a set of frames taken from the trajectory, \f$\{X_i\}\f$, or the values of
      39             : a number of collective variables which are calculated from the trajectory frames are used as input.  In this second instance your
      40             : input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables.  However, if
      41             : collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that
      42             : this input trajectory is a set of poorly correlated (high-dimensional) vectors.  After principal component analysis has been
      43             : performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen.
      44             : In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory.
      45             : These output directions are some linear combination of the \f$x\f$, \f$y\f$ and \f$z\f$ positions if the positions were used as input
      46             : or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input.
      47             : 
      48             : As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates.  In other words, you must
      49             : calculate the average structure and the amount the system fluctuates around this average structure.  The problem in doing so when the
      50             : \f$x\f$, \f$y\f$ and \f$z\f$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the
      51             : atoms comes from the translational and rotational degrees of freedom of the molecule.  The first six principal components will thus, most likely,
      52             : be uninteresting.  Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures
      53             : to be analysed to the first frame in the trajectory.  This can be used to effectively remove translational and/or rotational motions from
      54             : consideration.  The resulting principal components thus describe vibrational motions of the molecule.
      55             : 
      56             : If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be
      57             : used as input for the \ref PCAVARS action.
      58             : 
      59             : \par Examples
      60             : 
      61             : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions
      62             : of the first 22 atoms.  The TYPE=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration.
      63             : The first two principal components will be output to a file called pca-comp.pdb.  Trajectory frames will be collected on every step and the PCA calculation
      64             : will be performed at the end of the simulation.
      65             : 
      66             : \plumedfile
      67             : PCA METRIC=OPTIMAL ATOMS=1-22 STRIDE=1 NLOW_DIM=2 OFILE=pca-comp.pdb
      68             : \endplumedfile
      69             : 
      70             : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from chnages in the six distances
      71             : seen in the previous lines.  Notice that here the TYPE=EUCLIDEAN keyword is used to indicate that no alighment has to be done when calculating the various
      72             : elements of the covariance matrix from the input vectors.  In this calculation the first two principal components will be output to a file called pca-comp.pdb.
      73             : Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps.  Consequently, if you run a 2000 step simulation the
      74             : PCA analysis will be performed twice.  The REWEIGHT_BIAS keyword in this input tells PLUMED that rather that ascribing a weight of one to each of the frames
      75             : when calculating averages and covariances a reweighting should be performed based and each frames' weight in these calculations should be determined based on
      76             : the current value of the instantaneous bias (see \ref REWEIGHT_BIAS).
      77             : 
      78             : \plumedfile
      79             : d1: DISTANCE ATOMS=1,2
      80             : d2: DISTANCE ATOMS=1,3
      81             : d3: DISTANCE ATOMS=1,4
      82             : d4: DISTANCE ATOMS=2,3
      83             : d5: DISTANCE ATOMS=2,4
      84             : d6: DISTANCE ATOMS=3,4
      85             : 
      86             : PCA ARG=d1,d2,d3,d4,d5,d6 METRIC=EUCLIDEAN STRIDE=5 RUN=1000 NLOW_DIM=2 REWEIGHT_BIAS OFILE=pca-comp.pdb
      87             : \endplumedfile
      88             : 
      89             : */
      90             : //+ENDPLUMEDOC
      91             : 
      92             : namespace PLMD {
      93             : namespace analysis {
      94             : 
      95             : class PCA : public Analysis {
      96             : private:
      97             :   unsigned ndim;
      98             : /// The position of the reference configuration (the one we align to)
      99             :   ReferenceConfiguration* myref;
     100             : /// The eigenvectors for the atomic displacements
     101             :   Matrix<Vector> atom_eigv;
     102             : /// The eigenvectors for the displacements in argument space
     103             :   Matrix<double> arg_eigv;
     104             :   std::string ofilename;
     105             : public:
     106             :   static void registerKeywords( Keywords& keys );
     107             :   explicit PCA(const ActionOptions&ao);
     108             :   ~PCA();
     109             :   void performAnalysis();
     110           0 :   void performTask( const unsigned&, const unsigned&, MultiValue& ) const { plumed_error(); }
     111             : };
     112             : 
     113        6454 : PLUMED_REGISTER_ACTION(PCA,"PCA")
     114             : 
     115           3 : void PCA::registerKeywords( Keywords& keys ) {
     116           3 :   Analysis::registerKeywords( keys );
     117          12 :   keys.add("compulsory","NLOW_DIM","number of PCA coordinates required");
     118          12 :   keys.add("compulsory","OFILE","the file on which to output the eigenvectors");
     119           3 : }
     120             : 
     121           2 : PCA::PCA(const ActionOptions&ao):
     122           4 :   PLUMED_ANALYSIS_INIT(ao)
     123             : {
     124             :   // Setup reference configuration
     125           6 :   log.printf("  performing PCA analysis using %s metric \n", getMetricName().c_str() );
     126           6 :   myref = metricRegister().create<ReferenceConfiguration>( getMetricName() );
     127           4 :   std::vector<std::string> argnames( getNumberOfArguments() );
     128          10 :   for(unsigned i=0; i<argnames.size(); ++i) {
     129           6 :     if( getArguments()[i]->isPeriodic() ) error("cannot run PCA with periodic variables");
     130           6 :     argnames[i] = getArguments()[i]->getName();
     131             :   }
     132           4 :   myref->setNamesAndAtomNumbers( getAbsoluteIndexes(), argnames );
     133             : 
     134           4 :   parse("NLOW_DIM",ndim);
     135           2 :   if( getNumberOfAtoms()>0 ) atom_eigv.resize( ndim, getNumberOfAtoms() );
     136           2 :   if( getNumberOfArguments()>0 ) arg_eigv.resize( ndim, getNumberOfArguments() );
     137             : 
     138             :   // Read stuff for output file
     139           4 :   parseOutputFile("OFILE",ofilename);
     140           2 :   checkRead();
     141           2 : }
     142             : 
     143           6 : PCA::~PCA() {
     144           2 :   delete myref;
     145           4 : }
     146             : 
     147           2 : void PCA::performAnalysis() {
     148             :   // Align everything to the first frame
     149           6 :   MultiValue myval( 1, getNumberOfArguments() + 3*getNumberOfAtoms() + 9 );
     150           4 :   ReferenceValuePack mypack( getNumberOfArguments(), getNumberOfAtoms(), myval );
     151          46 :   for(unsigned i=0; i<getNumberOfAtoms(); ++i) mypack.setAtomIndex( i, i );
     152             :   // Setup some PCA storage
     153           4 :   data[0]->setupPCAStorage ( mypack ); std::vector<double> displace( getNumberOfAtoms() );
     154           2 :   if( getNumberOfAtoms()>0 ) {
     155           1 :     ReferenceAtoms* at = dynamic_cast<ReferenceAtoms*>( data[0] );
     156           1 :     displace = at->getDisplace();
     157             :   }
     158             : 
     159             :   // Create some arrays to store the average position
     160           2 :   std::vector<double> sarg( getNumberOfArguments(), 0 );
     161           2 :   std::vector<Vector> spos( getNumberOfAtoms() );
     162          46 :   for(unsigned i=0; i<getNumberOfAtoms(); ++i) spos[i].zero();
     163             : 
     164             :   // Calculate the average displacement from the first frame
     165           2 :   double norm=getWeight(0);
     166        2184 :   for(unsigned i=1; i<getNumberOfDataPoints(); ++i) {
     167        5450 :     data[0]->calc( data[i]->getReferencePositions(), getPbc(), getArguments(), data[i]->getReferenceArguments(), mypack, true );
     168             :     // Accumulate average displacement of arguments (Here PBC could do fucked up things - really needs Berry Phase ) GAT
     169        4360 :     for(unsigned j=0; j<getNumberOfArguments(); ++j) sarg[j] += 0.5*getWeight(i)*mypack.getArgumentDerivative(j);
     170             :     // Accumulate average displacement of position
     171       49050 :     for(unsigned j=0; j<getNumberOfAtoms(); ++j) spos[j] += getWeight(i)*mypack.getAtomsDisplacementVector()[j] / displace[j];
     172        1090 :     norm += getWeight(i);
     173             :   }
     174             :   // Now normalise the displacements to get the average and add these to the first frame
     175           2 :   double inorm = 1.0 / norm ;
     176          12 :   for(unsigned j=0; j<getNumberOfArguments(); ++j) sarg[j] = inorm*sarg[j] + data[0]->getReferenceArguments()[j];
     177          90 :   for(unsigned j=0; j<getNumberOfAtoms(); ++j) spos[j] = inorm*spos[j] + data[0]->getReferencePositions()[j];
     178             :   // And set the reference configuration
     179           2 :   std::vector<double> empty( getNumberOfArguments(), 1.0 ); myref->setReferenceConfig( spos, sarg, empty );
     180             : 
     181             :   // Now accumulate the covariance
     182           2 :   unsigned narg=getNumberOfArguments();
     183           6 :   Matrix<double> covar( getNumberOfArguments()+3*getNumberOfAtoms(), getNumberOfArguments()+3*getNumberOfAtoms() ); covar=0;
     184        2188 :   for(unsigned i=0; i<getNumberOfDataPoints(); ++i) {
     185             :     // double d = data[i]->calc( spos, getPbc(), getArguments(), sarg, mypack, true );
     186        5460 :     data[0]->calc( data[i]->getReferencePositions(), getPbc(), getArguments(), data[i]->getReferenceArguments(), mypack, true );
     187        3276 :     for(unsigned jarg=0; jarg<getNumberOfArguments(); ++jarg) {
     188             :       // Need sorting for PBC with GAT
     189        4368 :       double jarg_d = 0.5*mypack.getArgumentDerivative(jarg) + data[0]->getReferenceArguments()[jarg] - sarg[jarg];
     190        5460 :       for(unsigned karg=0; karg<getNumberOfArguments(); ++karg) {
     191             :         // Need sorting for PBC with GAT
     192        8736 :         double karg_d = 0.5*mypack.getArgumentDerivative(karg) + data[0]->getReferenceArguments()[karg] - sarg[karg];
     193        4368 :         covar( jarg, karg ) += 0.25*getWeight(i)*jarg_d*karg_d; // mypack.getArgumentDerivative(jarg)*mypack.getArgumentDerivative(karg);
     194             :       }
     195             :     }
     196       25116 :     for(unsigned jat=0; jat<getNumberOfAtoms(); ++jat) {
     197       84084 :       for(unsigned jc=0; jc<3; ++jc) {
     198      216216 :         double jdisplace = mypack.getAtomsDisplacementVector()[jat][jc] / displace[jat] + data[0]->getReferencePositions()[jat][jc] - spos[jat][jc];
     199     1621620 :         for(unsigned kat=0; kat<getNumberOfAtoms(); ++kat) {
     200     5549544 :           for(unsigned kc=0; kc<3; ++kc) {
     201    14270256 :             double kdisplace = mypack.getAtomsDisplacementVector()[kat][kc] / displace[kat] + data[0]->getReferencePositions()[kat][kc] - spos[kat][kc];
     202     4756752 :             covar( narg+3*jat + jc, narg+3*kat + kc ) += getWeight(i)*jdisplace*kdisplace;
     203             :           }
     204             :         }
     205             :       }
     206             :     }
     207             :   }
     208             :   // Normalise
     209         138 :   for(unsigned i=0; i<covar.nrows(); ++i) {
     210        8788 :     for(unsigned j=0; j<covar.ncols(); ++j) covar(i,j) *= inorm;
     211             :   }
     212             : 
     213             :   // Diagonalise the covariance
     214           4 :   std::vector<double> eigval( getNumberOfArguments()+3*getNumberOfAtoms() );
     215           6 :   Matrix<double> eigvec( getNumberOfArguments()+3*getNumberOfAtoms(), getNumberOfArguments()+3*getNumberOfAtoms() );
     216           2 :   diagMat( covar, eigval, eigvec );
     217             : 
     218             :   // Open an output file
     219           6 :   OFile ofile; ofile.link(*this); ofile.setBackupString("analysis");
     220           2 :   ofile.open( ofilename );
     221             :   // Output the reference configuration
     222           6 :   myref->print( ofile, getOutputFormat(), atoms.getUnits().getLength()/0.1 );
     223             : 
     224             :   // Store and print the eigenvectors
     225           2 :   std::vector<Vector> tmp_atoms( getNumberOfAtoms() );
     226           2 :   std::vector<double> tmp_args( getNumberOfArguments() );
     227           4 :   Direction* tref = metricRegister().create<Direction>( "DIRECTION" );
     228           2 :   tref->setNamesAndAtomNumbers( getAbsoluteIndexes(), argument_names );
     229          10 :   for(unsigned dim=0; dim<ndim; ++dim) {
     230           4 :     unsigned idim = covar.ncols() - 1 - dim;
     231          16 :     for(unsigned i=0; i<getNumberOfArguments(); ++i) tmp_args[i]=arg_eigv(dim,i)=eigvec(idim,i);
     232          92 :     for(unsigned i=0; i<getNumberOfAtoms(); ++i) {
     233         440 :       for(unsigned k=0; k<3; ++k) tmp_atoms[i][k]=atom_eigv(dim,i)[k]=eigvec(idim,narg+3*i+k);
     234             :     }
     235           4 :     tref->setDirection( tmp_atoms, tmp_args );
     236          12 :     tref->print( ofile, getOutputFormat(), atoms.getUnits().getLength()/0.1 );
     237             :   }
     238             :   // Close the output file
     239           2 :   delete tref; ofile.close();
     240           2 : }
     241             : 
     242             : }
     243        4839 : }

Generated by: LCOV version 1.13