Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 : Copyright (c) 2016-2019 The plumed team
3 : (see the PEOPLE file at the root of the distribution for a list of names)
4 :
5 : See http://www.plumed.org for more information.
6 :
7 : This file is part of plumed, version 2.
8 :
9 : plumed is free software: you can redistribute it and/or modify
10 : it under the terms of the GNU Lesser General Public License as published by
11 : the Free Software Foundation, either version 3 of the License, or
12 : (at your option) any later version.
13 :
14 : plumed is distributed in the hope that it will be useful,
15 : but WITHOUT ANY WARRANTY; without even the implied warranty of
16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 : GNU Lesser General Public License for more details.
18 :
19 : You should have received a copy of the GNU Lesser General Public License
20 : along with plumed. If not, see <http://www.gnu.org/licenses/>.
21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
22 : #include "Analysis.h"
23 : #include "tools/Matrix.h"
24 : #include "reference/Direction.h"
25 : #include "reference/MetricRegister.h"
26 : #include "reference/ReferenceConfiguration.h"
27 : #include "reference/ReferenceValuePack.h"
28 : #include "core/ActionRegister.h"
29 :
30 : //+PLUMEDOC DIMRED PCA
31 : /*
32 : Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input.
33 :
34 : Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of
35 : poorly correlated variables into a set of linearly uncorrelated variables. You can read more about the specifics of this technique
36 : here: https://en.wikipedia.org/wiki/Principal_component_analysis
37 :
38 : When used with molecular dynamics simulations a set of frames taken from the trajectory, \f$\{X_i\}\f$, or the values of
39 : a number of collective variables which are calculated from the trajectory frames are used as input. In this second instance your
40 : input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables. However, if
41 : collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that
42 : this input trajectory is a set of poorly correlated (high-dimensional) vectors. After principal component analysis has been
43 : performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen.
44 : In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory.
45 : These output directions are some linear combination of the \f$x\f$, \f$y\f$ and \f$z\f$ positions if the positions were used as input
46 : or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input.
47 :
48 : As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates. In other words, you must
49 : calculate the average structure and the amount the system fluctuates around this average structure. The problem in doing so when the
50 : \f$x\f$, \f$y\f$ and \f$z\f$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the
51 : atoms comes from the translational and rotational degrees of freedom of the molecule. The first six principal components will thus, most likely,
52 : be uninteresting. Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures
53 : to be analysed to the first frame in the trajectory. This can be used to effectively remove translational and/or rotational motions from
54 : consideration. The resulting principal components thus describe vibrational motions of the molecule.
55 :
56 : If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be
57 : used as input for the \ref PCAVARS action.
58 :
59 : \par Examples
60 :
61 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions
62 : of the first 22 atoms. The TYPE=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration.
63 : The first two principal components will be output to a file called pca-comp.pdb. Trajectory frames will be collected on every step and the PCA calculation
64 : will be performed at the end of the simulation.
65 :
66 : \plumedfile
67 : PCA METRIC=OPTIMAL ATOMS=1-22 STRIDE=1 NLOW_DIM=2 OFILE=pca-comp.pdb
68 : \endplumedfile
69 :
70 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from chnages in the six distances
71 : seen in the previous lines. Notice that here the TYPE=EUCLIDEAN keyword is used to indicate that no alighment has to be done when calculating the various
72 : elements of the covariance matrix from the input vectors. In this calculation the first two principal components will be output to a file called pca-comp.pdb.
73 : Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps. Consequently, if you run a 2000 step simulation the
74 : PCA analysis will be performed twice. The REWEIGHT_BIAS keyword in this input tells PLUMED that rather that ascribing a weight of one to each of the frames
75 : when calculating averages and covariances a reweighting should be performed based and each frames' weight in these calculations should be determined based on
76 : the current value of the instantaneous bias (see \ref REWEIGHT_BIAS).
77 :
78 : \plumedfile
79 : d1: DISTANCE ATOMS=1,2
80 : d2: DISTANCE ATOMS=1,3
81 : d3: DISTANCE ATOMS=1,4
82 : d4: DISTANCE ATOMS=2,3
83 : d5: DISTANCE ATOMS=2,4
84 : d6: DISTANCE ATOMS=3,4
85 :
86 : PCA ARG=d1,d2,d3,d4,d5,d6 METRIC=EUCLIDEAN STRIDE=5 RUN=1000 NLOW_DIM=2 REWEIGHT_BIAS OFILE=pca-comp.pdb
87 : \endplumedfile
88 :
89 : */
90 : //+ENDPLUMEDOC
91 :
92 : namespace PLMD {
93 : namespace analysis {
94 :
95 : class PCA : public Analysis {
96 : private:
97 : unsigned ndim;
98 : /// The position of the reference configuration (the one we align to)
99 : ReferenceConfiguration* myref;
100 : /// The eigenvectors for the atomic displacements
101 : Matrix<Vector> atom_eigv;
102 : /// The eigenvectors for the displacements in argument space
103 : Matrix<double> arg_eigv;
104 : std::string ofilename;
105 : public:
106 : static void registerKeywords( Keywords& keys );
107 : explicit PCA(const ActionOptions&ao);
108 : ~PCA();
109 : void performAnalysis();
110 0 : void performTask( const unsigned&, const unsigned&, MultiValue& ) const { plumed_error(); }
111 : };
112 :
113 6454 : PLUMED_REGISTER_ACTION(PCA,"PCA")
114 :
115 3 : void PCA::registerKeywords( Keywords& keys ) {
116 3 : Analysis::registerKeywords( keys );
117 12 : keys.add("compulsory","NLOW_DIM","number of PCA coordinates required");
118 12 : keys.add("compulsory","OFILE","the file on which to output the eigenvectors");
119 3 : }
120 :
121 2 : PCA::PCA(const ActionOptions&ao):
122 4 : PLUMED_ANALYSIS_INIT(ao)
123 : {
124 : // Setup reference configuration
125 6 : log.printf(" performing PCA analysis using %s metric \n", getMetricName().c_str() );
126 6 : myref = metricRegister().create<ReferenceConfiguration>( getMetricName() );
127 4 : std::vector<std::string> argnames( getNumberOfArguments() );
128 10 : for(unsigned i=0; i<argnames.size(); ++i) {
129 6 : if( getArguments()[i]->isPeriodic() ) error("cannot run PCA with periodic variables");
130 6 : argnames[i] = getArguments()[i]->getName();
131 : }
132 4 : myref->setNamesAndAtomNumbers( getAbsoluteIndexes(), argnames );
133 :
134 4 : parse("NLOW_DIM",ndim);
135 2 : if( getNumberOfAtoms()>0 ) atom_eigv.resize( ndim, getNumberOfAtoms() );
136 2 : if( getNumberOfArguments()>0 ) arg_eigv.resize( ndim, getNumberOfArguments() );
137 :
138 : // Read stuff for output file
139 4 : parseOutputFile("OFILE",ofilename);
140 2 : checkRead();
141 2 : }
142 :
143 6 : PCA::~PCA() {
144 2 : delete myref;
145 4 : }
146 :
147 2 : void PCA::performAnalysis() {
148 : // Align everything to the first frame
149 6 : MultiValue myval( 1, getNumberOfArguments() + 3*getNumberOfAtoms() + 9 );
150 4 : ReferenceValuePack mypack( getNumberOfArguments(), getNumberOfAtoms(), myval );
151 46 : for(unsigned i=0; i<getNumberOfAtoms(); ++i) mypack.setAtomIndex( i, i );
152 : // Setup some PCA storage
153 4 : data[0]->setupPCAStorage ( mypack ); std::vector<double> displace( getNumberOfAtoms() );
154 2 : if( getNumberOfAtoms()>0 ) {
155 1 : ReferenceAtoms* at = dynamic_cast<ReferenceAtoms*>( data[0] );
156 1 : displace = at->getDisplace();
157 : }
158 :
159 : // Create some arrays to store the average position
160 2 : std::vector<double> sarg( getNumberOfArguments(), 0 );
161 2 : std::vector<Vector> spos( getNumberOfAtoms() );
162 46 : for(unsigned i=0; i<getNumberOfAtoms(); ++i) spos[i].zero();
163 :
164 : // Calculate the average displacement from the first frame
165 2 : double norm=getWeight(0);
166 2184 : for(unsigned i=1; i<getNumberOfDataPoints(); ++i) {
167 5450 : data[0]->calc( data[i]->getReferencePositions(), getPbc(), getArguments(), data[i]->getReferenceArguments(), mypack, true );
168 : // Accumulate average displacement of arguments (Here PBC could do fucked up things - really needs Berry Phase ) GAT
169 4360 : for(unsigned j=0; j<getNumberOfArguments(); ++j) sarg[j] += 0.5*getWeight(i)*mypack.getArgumentDerivative(j);
170 : // Accumulate average displacement of position
171 49050 : for(unsigned j=0; j<getNumberOfAtoms(); ++j) spos[j] += getWeight(i)*mypack.getAtomsDisplacementVector()[j] / displace[j];
172 1090 : norm += getWeight(i);
173 : }
174 : // Now normalise the displacements to get the average and add these to the first frame
175 2 : double inorm = 1.0 / norm ;
176 12 : for(unsigned j=0; j<getNumberOfArguments(); ++j) sarg[j] = inorm*sarg[j] + data[0]->getReferenceArguments()[j];
177 90 : for(unsigned j=0; j<getNumberOfAtoms(); ++j) spos[j] = inorm*spos[j] + data[0]->getReferencePositions()[j];
178 : // And set the reference configuration
179 2 : std::vector<double> empty( getNumberOfArguments(), 1.0 ); myref->setReferenceConfig( spos, sarg, empty );
180 :
181 : // Now accumulate the covariance
182 2 : unsigned narg=getNumberOfArguments();
183 6 : Matrix<double> covar( getNumberOfArguments()+3*getNumberOfAtoms(), getNumberOfArguments()+3*getNumberOfAtoms() ); covar=0;
184 2188 : for(unsigned i=0; i<getNumberOfDataPoints(); ++i) {
185 : // double d = data[i]->calc( spos, getPbc(), getArguments(), sarg, mypack, true );
186 5460 : data[0]->calc( data[i]->getReferencePositions(), getPbc(), getArguments(), data[i]->getReferenceArguments(), mypack, true );
187 3276 : for(unsigned jarg=0; jarg<getNumberOfArguments(); ++jarg) {
188 : // Need sorting for PBC with GAT
189 4368 : double jarg_d = 0.5*mypack.getArgumentDerivative(jarg) + data[0]->getReferenceArguments()[jarg] - sarg[jarg];
190 5460 : for(unsigned karg=0; karg<getNumberOfArguments(); ++karg) {
191 : // Need sorting for PBC with GAT
192 8736 : double karg_d = 0.5*mypack.getArgumentDerivative(karg) + data[0]->getReferenceArguments()[karg] - sarg[karg];
193 4368 : covar( jarg, karg ) += 0.25*getWeight(i)*jarg_d*karg_d; // mypack.getArgumentDerivative(jarg)*mypack.getArgumentDerivative(karg);
194 : }
195 : }
196 25116 : for(unsigned jat=0; jat<getNumberOfAtoms(); ++jat) {
197 84084 : for(unsigned jc=0; jc<3; ++jc) {
198 216216 : double jdisplace = mypack.getAtomsDisplacementVector()[jat][jc] / displace[jat] + data[0]->getReferencePositions()[jat][jc] - spos[jat][jc];
199 1621620 : for(unsigned kat=0; kat<getNumberOfAtoms(); ++kat) {
200 5549544 : for(unsigned kc=0; kc<3; ++kc) {
201 14270256 : double kdisplace = mypack.getAtomsDisplacementVector()[kat][kc] / displace[kat] + data[0]->getReferencePositions()[kat][kc] - spos[kat][kc];
202 4756752 : covar( narg+3*jat + jc, narg+3*kat + kc ) += getWeight(i)*jdisplace*kdisplace;
203 : }
204 : }
205 : }
206 : }
207 : }
208 : // Normalise
209 138 : for(unsigned i=0; i<covar.nrows(); ++i) {
210 8788 : for(unsigned j=0; j<covar.ncols(); ++j) covar(i,j) *= inorm;
211 : }
212 :
213 : // Diagonalise the covariance
214 4 : std::vector<double> eigval( getNumberOfArguments()+3*getNumberOfAtoms() );
215 6 : Matrix<double> eigvec( getNumberOfArguments()+3*getNumberOfAtoms(), getNumberOfArguments()+3*getNumberOfAtoms() );
216 2 : diagMat( covar, eigval, eigvec );
217 :
218 : // Open an output file
219 6 : OFile ofile; ofile.link(*this); ofile.setBackupString("analysis");
220 2 : ofile.open( ofilename );
221 : // Output the reference configuration
222 6 : myref->print( ofile, getOutputFormat(), atoms.getUnits().getLength()/0.1 );
223 :
224 : // Store and print the eigenvectors
225 2 : std::vector<Vector> tmp_atoms( getNumberOfAtoms() );
226 2 : std::vector<double> tmp_args( getNumberOfArguments() );
227 4 : Direction* tref = metricRegister().create<Direction>( "DIRECTION" );
228 2 : tref->setNamesAndAtomNumbers( getAbsoluteIndexes(), argument_names );
229 10 : for(unsigned dim=0; dim<ndim; ++dim) {
230 4 : unsigned idim = covar.ncols() - 1 - dim;
231 16 : for(unsigned i=0; i<getNumberOfArguments(); ++i) tmp_args[i]=arg_eigv(dim,i)=eigvec(idim,i);
232 92 : for(unsigned i=0; i<getNumberOfAtoms(); ++i) {
233 440 : for(unsigned k=0; k<3; ++k) tmp_atoms[i][k]=atom_eigv(dim,i)[k]=eigvec(idim,narg+3*i+k);
234 : }
235 4 : tref->setDirection( tmp_atoms, tmp_args );
236 12 : tref->print( ofile, getOutputFormat(), atoms.getUnits().getLength()/0.1 );
237 : }
238 : // Close the output file
239 2 : delete tref; ofile.close();
240 2 : }
241 :
242 : }
243 4839 : }
|