Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 : Copyright (c) 2016-2023 The plumed team
3 : (see the PEOPLE file at the root of the distribution for a list of names)
4 :
5 : See http://www.plumed.org for more information.
6 :
7 : This file is part of plumed, version 2.
8 :
9 : plumed is free software: you can redistribute it and/or modify
10 : it under the terms of the GNU Lesser General Public License as published by
11 : the Free Software Foundation, either version 3 of the License, or
12 : (at your option) any later version.
13 :
14 : plumed is distributed in the hope that it will be useful,
15 : but WITHOUT ANY WARRANTY; without even the implied warranty of
16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 : GNU Lesser General Public License for more details.
18 :
19 : You should have received a copy of the GNU Lesser General Public License
20 : along with plumed. If not, see <http://www.gnu.org/licenses/>.
21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
22 : #include "PCA.h"
23 : #include "tools/Matrix.h"
24 : #include "reference/MetricRegister.h"
25 : #include "reference/ReferenceValuePack.h"
26 : #include "analysis/ReadAnalysisFrames.h"
27 : #include "core/ActionRegister.h"
28 :
29 : //+PLUMEDOC DIMRED PCA
30 : /*
31 : Perform principal component analysis (PCA) using either the positions of the atoms a large number of collective variables as input.
32 :
33 : Principal component analysis is a statistical technique that uses an orthogonal transformation to convert a set of observations of
34 : poorly correlated variables into a set of linearly uncorrelated variables. You can read more about the specifics of this technique
35 : here: https://en.wikipedia.org/wiki/Principal_component_analysis
36 :
37 : When used with molecular dynamics simulations a set of frames taken from the trajectory, \f$\{X_i\}\f$, or the values of
38 : a number of collective variables which are calculated from the trajectory frames are used as input. In this second instance your
39 : input to the PCA analysis algorithm is thus a set of high-dimensional vectors of collective variables. However, if
40 : collective variables are calculated from the positions of the atoms or if the positions are used directly the assumption is that
41 : this input trajectory is a set of poorly correlated (high-dimensional) vectors. After principal component analysis has been
42 : performed the output is a set of orthogonal vectors that describe the directions in which the largest motions have been seen.
43 : In other words, principal component analysis provides a method for lowering the dimensionality of the data contained in a trajectory.
44 : These output directions are some linear combination of the \f$x\f$, \f$y\f$ and \f$z\f$ positions if the positions were used as input
45 : or some linear combination of the input collective variables if a high-dimensional vector of collective variables was used as input.
46 :
47 : As explained on the Wikipedia page you must calculate the average and covariance for each of the input coordinates. In other words, you must
48 : calculate the average structure and the amount the system fluctuates around this average structure. The problem in doing so when the
49 : \f$x\f$, \f$y\f$ and \f$z\f$ coordinates of a molecule are used as input is that the majority of the changes in the positions of the
50 : atoms comes from the translational and rotational degrees of freedom of the molecule. The first six principal components will thus, most likely,
51 : be uninteresting. Consequently, to remedy this problem PLUMED provides the functionality to perform an RMSD alignment of the all the structures
52 : to be analyzed to the first frame in the trajectory. This can be used to effectively remove translational and/or rotational motions from
53 : consideration. The resulting principal components thus describe vibrational motions of the molecule.
54 :
55 : If you wish to calculate the projection of a trajectory on a set of principal components calculated from this PCA action then the output can be
56 : used as input for the \ref PCAVARS action.
57 :
58 : \par Examples
59 :
60 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the positions
61 : of the first 22 atoms. The TYPE=OPTIMAL instruction ensures that translational and rotational degrees of freedom are removed from consideration.
62 : The first two principal components will be output to a file called PCA-comp.pdb. Trajectory frames will be collected on every step and the PCA calculation
63 : will be performed at the end of the simulation.
64 :
65 : \plumedfile
66 : ff: COLLECT_FRAMES ATOMS=1-22 STRIDE=1
67 : pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=OPTIMAL NLOW_DIM=2
68 : OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca FILE=PCA-comp.pdb
69 : \endplumedfile
70 :
71 : The following input instructs PLUMED to perform a principal component analysis in which the covariance matrix is calculated from changes in the six distances
72 : seen in the previous lines. Notice that here the TYPE=EUCLIDEAN keyword is used to indicate that no alignment has to be done when calculating the various
73 : elements of the covariance matrix from the input vectors. In this calculation the first two principal components will be output to a file called PCA-comp.pdb.
74 : Trajectory frames will be collected every five steps and the PCA calculation is performed every 1000 steps. Consequently, if you run a 2000 step simulation the
75 : PCA analysis will be performed twice. The REWEIGHT_BIAS action in this input tells PLUMED that rather that ascribing a weight of one to each of the frames
76 : when calculating averages and covariance matrices a reweighting should be performed based and each frames' weight in these calculations should be determined based on
77 : the current value of the instantaneous bias (see \ref REWEIGHT_BIAS).
78 :
79 : \plumedfile
80 : d1: DISTANCE ATOMS=1,2
81 : d2: DISTANCE ATOMS=1,3
82 : d3: DISTANCE ATOMS=1,4
83 : d4: DISTANCE ATOMS=2,3
84 : d5: DISTANCE ATOMS=2,4
85 : d6: DISTANCE ATOMS=3,4
86 : rr: RESTRAINT ARG=d1 AT=0.1 KAPPA=10
87 : rbias: REWEIGHT_BIAS TEMP=300
88 :
89 : ff: COLLECT_FRAMES ARG=d1,d2,d3,d4,d5,d6 LOGWEIGHTS=rbias STRIDE=5
90 : pca: PCA USE_OUTPUT_DATA_FROM=ff METRIC=EUCLIDEAN NLOW_DIM=2
91 : OUTPUT_PCA_PROJECTION USE_OUTPUT_DATA_FROM=pca STRIDE=100 FILE=PCA-comp.pdb
92 : \endplumedfile
93 :
94 : */
95 : //+ENDPLUMEDOC
96 :
97 : namespace PLMD {
98 : namespace dimred {
99 :
100 10425 : PLUMED_REGISTER_ACTION(PCA,"PCA")
101 :
102 4 : void PCA::registerKeywords( Keywords& keys ) {
103 12 : DimensionalityReductionBase::registerKeywords( keys ); keys.use("ARG"); keys.reset_style("ARG","optional");
104 8 : keys.add("compulsory","METRIC","EUCLIDEAN","the method that you are going to use to measure the distances between points");
105 8 : keys.add("atoms","ATOMS","the list of atoms that you are going to use in the measure of distance that you are using");
106 4 : }
107 :
108 3 : PCA::PCA(const ActionOptions&ao):
109 : Action(ao),
110 3 : DimensionalityReductionBase(ao)
111 : {
112 : // Get the input PDB file from the underlying ReadAnalysisFrames object
113 3 : analysis::ReadAnalysisFrames* myframes = dynamic_cast<analysis::ReadAnalysisFrames*>( my_input_data );
114 3 : if( !myframes ) error("input to PCA should be ReadAnalysisFrames object");
115 6 : parse("METRIC",mtype); std::vector<AtomNumber> atoms;
116 3 : log.printf(" performing PCA analysis using %s metric \n", mtype.c_str() );
117 3 : if( my_input_data->getNumberOfAtoms()>0 ) {
118 2 : parseAtomList("ATOMS",atoms);
119 1 : if( atoms.size()!=0 ) {
120 0 : mypdb.setAtomNumbers( atoms );
121 0 : for(unsigned i=0; i<atoms.size(); ++i) {
122 : bool found=false;
123 0 : for(unsigned j=0; j<my_input_data->getAtomIndexes().size(); ++j) {
124 0 : if( my_input_data->getAtomIndexes()[j]==atoms[i] ) { found=true; break; }
125 : }
126 0 : if( !found ) {
127 0 : std::string num; Tools::convert( atoms[i].serial(), num );
128 0 : error("atom number " + num + " is not stored in any action that has been input");
129 : }
130 : }
131 0 : mypdb.addBlockEnd( atoms.size() );
132 1 : } else if( getNumberOfArguments()==0 ) {
133 1 : mypdb.setAtomNumbers( my_input_data->getAtomIndexes() );
134 1 : mypdb.addBlockEnd( my_input_data->getAtomIndexes().size() );
135 1 : if( mtype=="EUCLIDEAN" ) mtype="OPTIMAL";
136 : }
137 : }
138 3 : if( my_input_data->getArgumentNames().size()>0 ) {
139 2 : if( getNumberOfArguments()==0 && atoms.size()==0 ) {
140 2 : std::vector<std::string> argnames( my_input_data->getArgumentNames() );
141 2 : mypdb.setArgumentNames( argnames ); requestArguments( my_input_data->getArgumentList() );
142 2 : } else {
143 0 : std::vector<Value*> myargs( getArguments() );
144 0 : std::vector<std::string> inargnames( my_input_data->getArgumentNames() );
145 0 : std::vector<std::string> argnames( myargs.size() );
146 0 : for(unsigned i=0; i<myargs.size(); ++i) {
147 0 : argnames[i]=myargs[i]->getName();
148 : bool found=false;
149 0 : for(unsigned j=0; j<inargnames.size(); ++j) {
150 0 : if( argnames[i]==inargnames[j] ) { found=true; break; }
151 : }
152 0 : if( !found ) error("input named " + my_input_data->getLabel() + " does not store/calculate quantity named " + argnames[i] );
153 : }
154 0 : mypdb.setArgumentNames( argnames ); requestArguments( myargs );
155 0 : }
156 : }
157 3 : if( nlow==0 ) error("dimensionality of output not set");
158 3 : checkRead();
159 3 : }
160 :
161 3 : void PCA::performAnalysis() {
162 : // Align everything to the first frame
163 3 : my_input_data->getStoredData( 0, false ).transferDataToPDB( mypdb );
164 3 : auto myconf0=metricRegister().create<ReferenceConfiguration>(mtype, mypdb);
165 3 : MultiValue myval( 1, myconf0->getNumberOfReferenceArguments() + 3*myconf0->getNumberOfReferencePositions() + 9 );
166 3 : ReferenceValuePack mypack( myconf0->getNumberOfReferenceArguments(), myconf0->getNumberOfReferencePositions(), myval );
167 25 : for(unsigned i=0; i<myconf0->getNumberOfReferencePositions(); ++i) mypack.setAtomIndex( i, i );
168 : // Setup some PCA storage
169 3 : myconf0->setupPCAStorage ( mypack );
170 3 : std::vector<double> displace( myconf0->getNumberOfReferencePositions() );
171 3 : if( myconf0->getNumberOfReferencePositions()>0 ) {
172 1 : ReferenceAtoms* at = dynamic_cast<ReferenceAtoms*>( myconf0.get() );
173 1 : displace = at->getDisplace();
174 : }
175 :
176 : // Create some arrays to store the average position
177 3 : std::vector<double> sarg( myconf0->getNumberOfReferenceArguments(), 0 );
178 3 : std::vector<Vector> spos( myconf0->getNumberOfReferencePositions() );
179 25 : for(unsigned i=0; i<myconf0->getNumberOfReferencePositions(); ++i) spos[i].zero();
180 :
181 : // Calculate the average displacement from the first frame
182 3 : double norm=getWeight(0); std::vector<double> args( getNumberOfArguments() );
183 1638 : for(unsigned i=1; i<getNumberOfDataPoints(); ++i) {
184 1635 : my_input_data->getStoredData( i, false ).transferDataToPDB( mypdb );
185 3815 : for(unsigned j=0; j<getArguments().size(); ++j) mypdb.getArgumentValue( getArguments()[j]->getName(), args[j] );
186 1635 : myconf0->calc( mypdb.getPositions(), getPbc(), getArguments(), args, mypack, true );
187 : // Accumulate average displacement of arguments (Here PBC could do fucked up things - really needs Berry Phase ) GAT
188 3815 : for(unsigned j=0; j<myconf0->getNumberOfReferenceArguments(); ++j) sarg[j] += 0.5*getWeight(i)*mypack.getArgumentDerivative(j);
189 : // Accumulate average displacement of position
190 13625 : for(unsigned j=0; j<myconf0->getNumberOfReferencePositions(); ++j) spos[j] += getWeight(i)*mypack.getAtomsDisplacementVector()[j] / displace[j];
191 1635 : norm += getWeight(i);
192 : }
193 : // Now normalise the displacements to get the average and add these to the first frame
194 3 : double inorm = 1.0 / norm ;
195 7 : for(unsigned j=0; j<myconf0->getNumberOfReferenceArguments(); ++j) sarg[j] = inorm*sarg[j] + myconf0->getReferenceArguments()[j];
196 25 : for(unsigned j=0; j<myconf0->getNumberOfReferencePositions(); ++j) spos[j] = inorm*spos[j] + myconf0->getReferencePositions()[j];
197 : // Now accumulate the covariance
198 3 : unsigned narg=myconf0->getNumberOfReferenceArguments(), natoms=myconf0->getNumberOfReferencePositions();
199 3 : Matrix<double> covar( narg+3*natoms, narg+3*natoms ); covar=0;
200 1641 : for(unsigned i=0; i<getNumberOfDataPoints(); ++i) {
201 1638 : my_input_data->getStoredData( i, false ).transferDataToPDB( mypdb );
202 3822 : for(unsigned j=0; j<getArguments().size(); ++j) mypdb.getArgumentValue( getArguments()[j]->getName(), args[j] );
203 1638 : myconf0->calc( mypdb.getPositions(), getPbc(), getArguments(), args, mypack, true );
204 3822 : for(unsigned jarg=0; jarg<narg; ++jarg) {
205 : // Need sorting for PBC with GAT
206 2184 : double jarg_d = 0.5*mypack.getArgumentDerivative(jarg) + myconf0->getReferenceArguments()[jarg] - sarg[jarg];
207 6552 : for(unsigned karg=0; karg<narg; ++karg) {
208 : // Need sorting for PBC with GAT
209 4368 : double karg_d = 0.5*mypack.getArgumentDerivative(karg) + myconf0->getReferenceArguments()[karg] - sarg[karg];
210 4368 : covar( jarg, karg ) += 0.25*getWeight(i)*jarg_d*karg_d; // mypack.getArgumentDerivative(jarg)*mypack.getArgumentDerivative(karg);
211 : }
212 : }
213 13650 : for(unsigned jat=0; jat<natoms; ++jat) {
214 48048 : for(unsigned jc=0; jc<3; ++jc) {
215 36036 : double jdisplace = mypack.getAtomsDisplacementVector()[jat][jc] / displace[jat] + myconf0->getReferencePositions()[jat][jc] - spos[jat][jc];
216 828828 : for(unsigned kat=0; kat<natoms; ++kat) {
217 3171168 : for(unsigned kc=0; kc<3; ++kc) {
218 2378376 : double kdisplace = mypack.getAtomsDisplacementVector()[kat][kc] /displace[kat] + myconf0->getReferencePositions()[kat][kc] - spos[kat][kc];
219 2378376 : covar( narg+3*jat + jc, narg+3*kat + kc ) += getWeight(i)*jdisplace*kdisplace;
220 : }
221 : }
222 : }
223 : }
224 : }
225 : // Normalise
226 73 : for(unsigned i=0; i<covar.nrows(); ++i) {
227 4434 : for(unsigned j=0; j<covar.ncols(); ++j) covar(i,j) *= inorm;
228 : }
229 :
230 : // Diagonalise the covariance
231 3 : std::vector<double> eigval( narg+3*natoms );
232 : Matrix<double> eigvec( narg+3*natoms, narg+3*natoms );
233 3 : diagMat( covar, eigval, eigvec );
234 :
235 : // Output the reference configuration
236 3 : mypdb.setAtomPositions( spos );
237 7 : for(unsigned j=0; j<sarg.size(); ++j) mypdb.setArgumentValue( getArguments()[j]->getName(), sarg[j] );
238 : // Reset the reference configuration
239 6 : myref = metricRegister().create<ReferenceConfiguration>( mtype, mypdb );
240 :
241 : // Store and print the eigenvectors
242 3 : std::vector<Vector> tmp_atoms( natoms );
243 9 : for(unsigned dim=0; dim<nlow; ++dim) {
244 6 : unsigned idim = covar.ncols() - 1 - dim;
245 14 : for(unsigned i=0; i<narg; ++i) mypdb.setArgumentValue( getArguments()[i]->getName(), eigvec(idim,i) );
246 50 : for(unsigned i=0; i<natoms; ++i) {
247 176 : for(unsigned k=0; k<3; ++k) tmp_atoms[i][k]=eigvec(idim,narg+3*i+k);
248 : }
249 6 : mypdb.setAtomPositions( tmp_atoms );
250 : // Create a direction object so that we can calculate other PCA components
251 12 : directions.push_back( Direction(ReferenceConfigurationOptions("DIRECTION")));
252 6 : directions[dim].read( mypdb );
253 : }
254 3 : }
255 :
256 0 : void PCA::getProjection( const unsigned& idata, std::vector<double>& point, double& weight ) {
257 0 : if( point.size()!=nlow ) point.resize( nlow );
258 : // Retrieve the weight
259 0 : weight = getWeight(idata);
260 : // Retrieve the data point
261 0 : getProjection( my_input_data->getStoredData( idata, false ), point );
262 0 : }
263 :
264 546 : void PCA::getProjection( analysis::DataCollectionObject& myidata, std::vector<double>& point ) {
265 546 : myidata.transferDataToPDB( mypdb ); std::vector<double> args( getArguments().size() );
266 1638 : for(unsigned j=0; j<getArguments().size(); ++j) mypdb.getArgumentValue( getArguments()[j]->getName(), args[j] );
267 : // Create some storage space
268 546 : MultiValue myval( 1, 3*mypdb.getPositions().size() + args.size() + 9);
269 546 : ReferenceValuePack mypack( args.size(), mypdb.getPositions().size(), myval );
270 546 : for(unsigned i=0; i<mypdb.getPositions().size(); ++i) mypack.setAtomIndex( i, i );
271 546 : myref->setupPCAStorage( mypack );
272 : // And calculate
273 546 : myref->calculate( mypdb.getPositions(), getPbc(), getArguments(), mypack, true );
274 1638 : for(unsigned i=0; i<nlow; ++i) point[i]=myref->projectDisplacementOnVector( directions[i], getArguments(), args, mypack );
275 1092 : }
276 :
277 : }
278 : }
|