LCOV - code coverage report
Current view: top level - dimred - PCA.cpp (source / functions) Hit Total Coverage
Test: plumed test coverage Lines: 97 120 80.8 %
Date: 2024-10-11 08:09:47 Functions: 7 9 77.8 %

          Line data    Source code
       1             : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
       2             :    Copyright (c) 2016-2023 The plumed team
       3             :    (see the PEOPLE file at the root of the distribution for a list of names)
       4             : 
       5             :    See http://www.plumed.org for more information.
       6             : 
       7             :    This file is part of plumed, version 2.
       8             : 
       9             :    plumed is free software: you can redistribute it and/or modify
      10             :    it under the terms of the GNU Lesser General Public License as published by
      11             :    the Free Software Foundation, either version 3 of the License, or
      12             :    (at your option) any later version.
      13             : 
      14             :    plumed is distributed in the hope that it will be useful,
      15             :    but WITHOUT ANY WARRANTY; without even the implied warranty of
      16             :    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
      17             :    GNU Lesser General Public License for more details.
      18             : 
      19             :    You should have received a copy of the GNU Lesser General Public License
      20             :    along with plumed.  If not, see <http://www.gnu.org/licenses/>.
      21             : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
      22             : #include "PCA.h"
      23             : #include "tools/Matrix.h"
      24             : #include "reference/MetricRegister.h"
      25             : #include "reference/ReferenceValuePack.h"
      26             : #include "analysis/ReadAnalysisFrames.h"
      27             : #include "core/ActionRegister.h"
      28             : 
      29             : //+PLUMEDOC DIMRED PCA
      30             : /*
      31             : Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input.
      32             : 
      33             : Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of
      34             : poorly correlated variables into a set of linearly uncorrelated variables.  You can read more about the specifics of this technique
      35             : here: https://en.wikipedia.org/wiki/Principal_component_analysis
      36             : 
      37             : When used with molecular dynamics simulations a set of frames taken from the trajectory, \f$\{X_i\}\f$, or the values of
      38             : a number of collective variables which are calculated from the trajectory frames are used as input.  In this second instance your
      39             : input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables.  However, if
      40             : collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that
      41             : this input trajectory is a set of poorly correlated (high-dimensional) vectors.  After principal component analysis has been
      42             : performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen.
      43             : In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory.
      44             : These output directions are some linear combination of the \f$x\f$, \f$y\f$ and \f$z\f$ positions if the positions were used as input
      45             : or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input.
      46             : 
      47             : As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates.  In other words, you must
      48             : calculate the average structure and the amount the system fluctuates around this average structure.  The problem in doing so when the
      49             : \f$x\f$, \f$y\f$ and \f$z\f$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the
      50             : atoms comes from the translational and rotational degrees of freedom of the molecule.  The first six principal components will thus, most likely,
      51             : be uninteresting.  Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures
      52             : to be analyzed to the first frame in the trajectory.  This can be used to effectively remove translational and/or rotational motions from
      53             : consideration.  The resulting principal components thus describe vibrational motions of the molecule.
      54             : 
      55             : If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be
      56             : used as input for the \ref PCAVARS action.
      57             : 
      58             : \par Examples
      59             : 
      60             : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions
      61             : of the first 22 atoms.  The TYPE=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration.
      62             : The first two principal components will be output to a file called PCA-comp.pdb.  Trajectory frames will be collected on every step and the PCA calculation
      63             : will be performed at the end of the simulation.
      64             : 
      65             : \plumedfile
      66             : ff: COLLECT_FRAMES ATOMS=1-22 STRIDE=1
      67             : pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=OPTIMAL NLOW_DIM=2
      68             : OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca FILE=PCA-comp.pdb
      69             : \endplumedfile
      70             : 
      71             : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the six distances
      72             : seen in the previous lines.  Notice that here the TYPE=EUCLIDEAN keyword is used to indicate that no alignment has to be done when calculating the various
      73             : elements of the covariance matrix from the input vectors.  In this calculation the first two principal components will be output to a file called PCA-comp.pdb.
      74             : Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps.  Consequently, if you run a 2000 step simulation the
      75             : PCA analysis will be performed twice.  The REWEIGHT_BIAS action in this input tells PLUMED that rather that ascribing a weight of one to each of the frames
      76             : when calculating averages and covariance matrices a reweighting should be performed based and each frames' weight in these calculations should be determined based on
      77             : the current value of the instantaneous bias (see \ref REWEIGHT_BIAS).
      78             : 
      79             : \plumedfile
      80             : d1: DISTANCE ATOMS=1,2
      81             : d2: DISTANCE ATOMS=1,3
      82             : d3: DISTANCE ATOMS=1,4
      83             : d4: DISTANCE ATOMS=2,3
      84             : d5: DISTANCE ATOMS=2,4
      85             : d6: DISTANCE ATOMS=3,4
      86             : rr: RESTRAINT ARG=d1 AT=0.1 KAPPA=10
      87             : rbias: REWEIGHT_BIAS TEMP=300
      88             : 
      89             : ff: COLLECT_FRAMES ARG=d1,d2,d3,d4,d5,d6 LOGWEIGHTS=rbias STRIDE=5
      90             : pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=EUCLIDEAN NLOW_DIM=2
      91             : OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca STRIDE=100 FILE=PCA-comp.pdb
      92             : \endplumedfile
      93             : 
      94             : */
      95             : //+ENDPLUMEDOC
      96             : 
      97             : namespace PLMD {
      98             : namespace dimred {
      99             : 
     100       10425 : PLUMED_REGISTER_ACTION(PCA,"PCA")
     101             : 
     102           4 : void PCA::registerKeywords( Keywords& keys ) {
     103          12 :   DimensionalityReductionBase::registerKeywords( keys ); keys.use("ARG"); keys.reset_style("ARG","optional");
     104           8 :   keys.add("compulsory","METRIC","EUCLIDEAN","the method that you are going to use to measure the distances between points");
     105           8 :   keys.add("atoms","ATOMS","the list of atoms that you are going to use in the measure of distance that you are using");
     106           4 : }
     107             : 
     108           3 : PCA::PCA(const ActionOptions&ao):
     109             :   Action(ao),
     110           3 :   DimensionalityReductionBase(ao)
     111             : {
     112             :   // Get the input PDB file from the underlying ReadAnalysisFrames object
     113           3 :   analysis::ReadAnalysisFrames* myframes = dynamic_cast<analysis::ReadAnalysisFrames*>( my_input_data );
     114           3 :   if( !myframes ) error("input to PCA should be ReadAnalysisFrames object");
     115           6 :   parse("METRIC",mtype); std::vector<AtomNumber> atoms;
     116           3 :   log.printf("  performing PCA analysis using %s metric \n", mtype.c_str() );
     117           3 :   if( my_input_data->getNumberOfAtoms()>0 ) {
     118           2 :     parseAtomList("ATOMS",atoms);
     119           1 :     if( atoms.size()!=0 ) {
     120           0 :       mypdb.setAtomNumbers( atoms );
     121           0 :       for(unsigned i=0; i<atoms.size(); ++i) {
     122             :         bool found=false;
     123           0 :         for(unsigned j=0; j<my_input_data->getAtomIndexes().size(); ++j) {
     124           0 :           if( my_input_data->getAtomIndexes()[j]==atoms[i] ) { found=true; break; }
     125             :         }
     126           0 :         if( !found ) {
     127           0 :           std::string num; Tools::convert( atoms[i].serial(), num );
     128           0 :           error("atom number " + num + " is not stored in any action that has been input");
     129             :         }
     130             :       }
     131           0 :       mypdb.addBlockEnd( atoms.size() );
     132           1 :     } else if( getNumberOfArguments()==0 ) {
     133           1 :       mypdb.setAtomNumbers( my_input_data->getAtomIndexes() );
     134           1 :       mypdb.addBlockEnd( my_input_data->getAtomIndexes().size() );
     135           1 :       if( mtype=="EUCLIDEAN" ) mtype="OPTIMAL";
     136             :     }
     137             :   }
     138           3 :   if( my_input_data->getArgumentNames().size()>0 ) {
     139           2 :     if( getNumberOfArguments()==0 && atoms.size()==0 ) {
     140           2 :       std::vector<std::string> argnames( my_input_data->getArgumentNames() );
     141           2 :       mypdb.setArgumentNames( argnames ); requestArguments( my_input_data->getArgumentList() );
     142           2 :     } else {
     143           0 :       std::vector<Value*> myargs( getArguments() );
     144           0 :       std::vector<std::string> inargnames( my_input_data->getArgumentNames() );
     145           0 :       std::vector<std::string> argnames( myargs.size() );
     146           0 :       for(unsigned i=0; i<myargs.size(); ++i) {
     147           0 :         argnames[i]=myargs[i]->getName();
     148             :         bool found=false;
     149           0 :         for(unsigned j=0; j<inargnames.size(); ++j) {
     150           0 :           if( argnames[i]==inargnames[j] ) { found=true; break; }
     151             :         }
     152           0 :         if( !found ) error("input named " + my_input_data->getLabel() + " does not store/calculate quantity named " + argnames[i] );
     153             :       }
     154           0 :       mypdb.setArgumentNames( argnames ); requestArguments( myargs );
     155           0 :     }
     156             :   }
     157           3 :   if( nlow==0 ) error("dimensionality of output not set");
     158           3 :   checkRead();
     159           3 : }
     160             : 
     161           3 : void PCA::performAnalysis() {
     162             :   // Align everything to the first frame
     163           3 :   my_input_data->getStoredData( 0, false ).transferDataToPDB( mypdb );
     164           3 :   auto myconf0=metricRegister().create<ReferenceConfiguration>(mtype, mypdb);
     165           3 :   MultiValue myval( 1, myconf0->getNumberOfReferenceArguments() + 3*myconf0->getNumberOfReferencePositions() + 9 );
     166           3 :   ReferenceValuePack mypack( myconf0->getNumberOfReferenceArguments(), myconf0->getNumberOfReferencePositions(), myval );
     167          25 :   for(unsigned i=0; i<myconf0->getNumberOfReferencePositions(); ++i) mypack.setAtomIndex( i, i );
     168             :   // Setup some PCA storage
     169           3 :   myconf0->setupPCAStorage ( mypack );
     170           3 :   std::vector<double> displace( myconf0->getNumberOfReferencePositions() );
     171           3 :   if( myconf0->getNumberOfReferencePositions()>0 ) {
     172           1 :     ReferenceAtoms* at = dynamic_cast<ReferenceAtoms*>( myconf0.get() );
     173           1 :     displace = at->getDisplace();
     174             :   }
     175             : 
     176             :   // Create some arrays to store the average position
     177           3 :   std::vector<double> sarg( myconf0->getNumberOfReferenceArguments(), 0 );
     178           3 :   std::vector<Vector> spos( myconf0->getNumberOfReferencePositions() );
     179          25 :   for(unsigned i=0; i<myconf0->getNumberOfReferencePositions(); ++i) spos[i].zero();
     180             : 
     181             :   // Calculate the average displacement from the first frame
     182           3 :   double norm=getWeight(0); std::vector<double> args( getNumberOfArguments() );
     183        1638 :   for(unsigned i=1; i<getNumberOfDataPoints(); ++i) {
     184        1635 :     my_input_data->getStoredData( i, false ).transferDataToPDB( mypdb );
     185        3815 :     for(unsigned j=0; j<getArguments().size(); ++j) mypdb.getArgumentValue( getArguments()[j]->getName(), args[j] );
     186        1635 :     myconf0->calc( mypdb.getPositions(), getPbc(), getArguments(), args, mypack, true );
     187             :     // Accumulate average displacement of arguments (Here PBC could do fucked up things - really needs Berry Phase ) GAT
     188        3815 :     for(unsigned j=0; j<myconf0->getNumberOfReferenceArguments(); ++j) sarg[j] += 0.5*getWeight(i)*mypack.getArgumentDerivative(j);
     189             :     // Accumulate average displacement of position
     190       13625 :     for(unsigned j=0; j<myconf0->getNumberOfReferencePositions(); ++j) spos[j] += getWeight(i)*mypack.getAtomsDisplacementVector()[j] / displace[j];
     191        1635 :     norm += getWeight(i);
     192             :   }
     193             :   // Now normalise the displacements to get the average and add these to the first frame
     194           3 :   double inorm = 1.0 / norm ;
     195           7 :   for(unsigned j=0; j<myconf0->getNumberOfReferenceArguments(); ++j) sarg[j] = inorm*sarg[j] + myconf0->getReferenceArguments()[j];
     196          25 :   for(unsigned j=0; j<myconf0->getNumberOfReferencePositions(); ++j) spos[j] = inorm*spos[j] + myconf0->getReferencePositions()[j];
     197             :   // Now accumulate the covariance
     198           3 :   unsigned narg=myconf0->getNumberOfReferenceArguments(), natoms=myconf0->getNumberOfReferencePositions();
     199           3 :   Matrix<double> covar( narg+3*natoms, narg+3*natoms ); covar=0;
     200        1641 :   for(unsigned i=0; i<getNumberOfDataPoints(); ++i) {
     201        1638 :     my_input_data->getStoredData( i, false ).transferDataToPDB( mypdb );
     202        3822 :     for(unsigned j=0; j<getArguments().size(); ++j) mypdb.getArgumentValue( getArguments()[j]->getName(), args[j] );
     203        1638 :     myconf0->calc( mypdb.getPositions(), getPbc(), getArguments(), args, mypack, true );
     204        3822 :     for(unsigned jarg=0; jarg<narg; ++jarg) {
     205             :       // Need sorting for PBC with GAT
     206        2184 :       double jarg_d = 0.5*mypack.getArgumentDerivative(jarg) + myconf0->getReferenceArguments()[jarg] - sarg[jarg];
     207        6552 :       for(unsigned karg=0; karg<narg; ++karg) {
     208             :         // Need sorting for PBC with GAT
     209        4368 :         double karg_d = 0.5*mypack.getArgumentDerivative(karg) + myconf0->getReferenceArguments()[karg] - sarg[karg];
     210        4368 :         covar( jarg, karg ) += 0.25*getWeight(i)*jarg_d*karg_d; // mypack.getArgumentDerivative(jarg)*mypack.getArgumentDerivative(karg);
     211             :       }
     212             :     }
     213       13650 :     for(unsigned jat=0; jat<natoms; ++jat) {
     214       48048 :       for(unsigned jc=0; jc<3; ++jc) {
     215       36036 :         double jdisplace = mypack.getAtomsDisplacementVector()[jat][jc] / displace[jat] + myconf0->getReferencePositions()[jat][jc] - spos[jat][jc];
     216      828828 :         for(unsigned kat=0; kat<natoms; ++kat) {
     217     3171168 :           for(unsigned kc=0; kc<3; ++kc) {
     218     2378376 :             double kdisplace = mypack.getAtomsDisplacementVector()[kat][kc] /displace[kat] + myconf0->getReferencePositions()[kat][kc] - spos[kat][kc];
     219     2378376 :             covar( narg+3*jat + jc, narg+3*kat + kc ) += getWeight(i)*jdisplace*kdisplace;
     220             :           }
     221             :         }
     222             :       }
     223             :     }
     224             :   }
     225             :   // Normalise
     226          73 :   for(unsigned i=0; i<covar.nrows(); ++i) {
     227        4434 :     for(unsigned j=0; j<covar.ncols(); ++j) covar(i,j) *= inorm;
     228             :   }
     229             : 
     230             :   // Diagonalise the covariance
     231           3 :   std::vector<double> eigval( narg+3*natoms );
     232             :   Matrix<double> eigvec( narg+3*natoms, narg+3*natoms );
     233           3 :   diagMat( covar, eigval, eigvec );
     234             : 
     235             :   // Output the reference configuration
     236           3 :   mypdb.setAtomPositions( spos );
     237           7 :   for(unsigned j=0; j<sarg.size(); ++j) mypdb.setArgumentValue( getArguments()[j]->getName(), sarg[j] );
     238             :   // Reset the reference configuration
     239           6 :   myref = metricRegister().create<ReferenceConfiguration>( mtype, mypdb );
     240             : 
     241             :   // Store and print the eigenvectors
     242           3 :   std::vector<Vector> tmp_atoms( natoms );
     243           9 :   for(unsigned dim=0; dim<nlow; ++dim) {
     244           6 :     unsigned idim = covar.ncols() - 1 - dim;
     245          14 :     for(unsigned i=0; i<narg; ++i) mypdb.setArgumentValue( getArguments()[i]->getName(), eigvec(idim,i) );
     246          50 :     for(unsigned i=0; i<natoms; ++i) {
     247         176 :       for(unsigned k=0; k<3; ++k) tmp_atoms[i][k]=eigvec(idim,narg+3*i+k);
     248             :     }
     249           6 :     mypdb.setAtomPositions( tmp_atoms );
     250             :     // Create a direction object so that we can calculate other PCA components
     251          12 :     directions.push_back( Direction(ReferenceConfigurationOptions("DIRECTION")));
     252           6 :     directions[dim].read( mypdb );
     253             :   }
     254           3 : }
     255             : 
     256           0 : void PCA::getProjection( const unsigned& idata, std::vector<double>& point, double& weight ) {
     257           0 :   if( point.size()!=nlow ) point.resize( nlow );
     258             :   // Retrieve the weight
     259           0 :   weight = getWeight(idata);
     260             :   // Retrieve the data point
     261           0 :   getProjection( my_input_data->getStoredData( idata, false ), point );
     262           0 : }
     263             : 
     264         546 : void PCA::getProjection( analysis::DataCollectionObject& myidata, std::vector<double>& point ) {
     265         546 :   myidata.transferDataToPDB( mypdb ); std::vector<double> args( getArguments().size() );
     266        1638 :   for(unsigned j=0; j<getArguments().size(); ++j) mypdb.getArgumentValue( getArguments()[j]->getName(), args[j] );
     267             :   // Create some storage space
     268         546 :   MultiValue myval( 1, 3*mypdb.getPositions().size() + args.size() + 9);
     269         546 :   ReferenceValuePack mypack( args.size(), mypdb.getPositions().size(), myval );
     270         546 :   for(unsigned i=0; i<mypdb.getPositions().size(); ++i) mypack.setAtomIndex( i, i );
     271         546 :   myref->setupPCAStorage( mypack );
     272             :   // And calculate
     273         546 :   myref->calculate( mypdb.getPositions(), getPbc(), getArguments(), mypack, true );
     274        1638 :   for(unsigned i=0; i<nlow; ++i) point[i]=myref->projectDisplacementOnVector( directions[i], getArguments(), args, mypack );
     275        1092 : }
     276             : 
     277             : }
     278             : }

Generated by: LCOV version 1.15