Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 : Copyright (c) 2013-2019 The plumed team
3 : (see the PEOPLE file at the root of the distribution for a list of names)
4 :
5 : See http://www.plumed.org for more information.
6 :
7 : This file is part of plumed, version 2.
8 :
9 : plumed is free software: you can redistribute it and/or modify
10 : it under the terms of the GNU Lesser General Public License as published by
11 : the Free Software Foundation, either version 3 of the License, or
12 : (at your option) any later version.
13 :
14 : plumed is distributed in the hope that it will be useful,
15 : but WITHOUT ANY WARRANTY; without even the implied warranty of
16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 : GNU Lesser General Public License for more details.
18 :
19 : You should have received a copy of the GNU Lesser General Public License
20 : along with plumed. If not, see <http://www.gnu.org/licenses/>.
21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
22 : #include "AnalysisWithLandmarks.h"
23 : #include "ClassicalScaling.h"
24 : #include "reference/PointWiseMapping.h"
25 : #include "core/ActionRegister.h"
26 :
27 : namespace PLMD {
28 : namespace analysis {
29 :
30 : //+PLUMEDOC DIMRED CLASSICAL_MDS
31 : /*
32 : Create a low-dimensional projection of a trajectory using the classical multidimensional
33 : scaling algorithm.
34 :
35 : Multidimensional scaling (MDS) is similar to what is done when you make a map. You start with distances
36 : between London, Belfast, Paris and Dublin and then you try to arrange points on a piece of paper so that the (suitably scaled)
37 : distances between the points in your map representing each of those cities are related to the true distances between the cities.
38 : Stating this more mathematically MDS endeavors to find an <a href="http://en.wikipedia.org/wiki/Isometry">isometry</a>
39 : between points distributed in a high-dimensional space and a set of points distributed in a low-dimensional plane.
40 : In other words, if we have \f$M\f$ \f$D\f$-dimensional points, \f$\mathbf{X}\f$,
41 : and we can calculate dissimilarities between pairs them, \f$D_{ij}\f$, we can, with an MDS calculation, try to create \f$M\f$ projections,
42 : \f$\mathbf{x}\f$, of the high dimensionality points in a \f$d\f$-dimensional linear space by trying to arrange the projections so that the
43 : Euclidean distances between pairs of them, \f$d_{ij}\f$, resemble the dissimilarities between the high dimensional points. In short we minimize:
44 :
45 : \f[
46 : \chi^2 = \sum_{i \ne j} \left( D_{ij} - d_{ij} \right)^2
47 : \f]
48 :
49 : where \f$D_{ij}\f$ is the distance between point \f$X^{i}\f$ and point \f$X^{j}\f$ and \f$d_{ij}\f$ is the distance between the projection
50 : of \f$X^{i}\f$, \f$x^i\f$, and the projection of \f$X^{j}\f$, \f$x^j\f$. A tutorial on this approach can be used to analyse simulations
51 : can be found in the tutorial \ref belfast-3 and in the following <a href="https://www.youtube.com/watch?v=ofC2qz0_9_A&feature=youtu.be" > short video.</a>
52 :
53 : \par Examples
54 :
55 : The following command instructs plumed to construct a classical multidimensional scaling projection of a trajectory.
56 : The RMSD distance between atoms 1-256 have moved is used to measure the distances in the high-dimensional space.
57 :
58 : \plumedfile
59 : CLASSICAL_MDS ...
60 : ATOMS=1-256
61 : METRIC=OPTIMAL-FAST
62 : NLOW_DIM=2
63 : OUTPUT_FILE=rmsd-embed
64 : ... CLASSICAL_MDS
65 : \endplumedfile
66 :
67 : The following section is for people who are interested in how this method works in detail. A solid understanding of this material is
68 : not necessary to use MDS.
69 :
70 : \section dim-sec Method of optimisation
71 :
72 : The stress function can be minimized using a standard optimization algorithm such as conjugate gradients or steepest descent.
73 : However, it is more common to do this minimization using a technique known as classical scaling. Classical scaling works by
74 : recognizing that each of the distances $D_{ij}$ in the above sum can be written as:
75 :
76 : \f[
77 : D_{ij}^2 = \sum_{\alpha} (X^i_\alpha - X^j_\alpha)^2 = \sum_\alpha (X^i_\alpha)^2 + (X^j_\alpha)^2 - 2X^i_\alpha X^j_\alpha
78 : \f]
79 :
80 : We can use this expression and matrix algebra to calculate multiple distances at once. For instance if we have three points,
81 : \f$\mathbf{X}\f$, we can write distances between them as:
82 :
83 : \f{eqnarray*}{
84 : D^2(\mathbf{X}) &=& \left[ \begin{array}{ccc}
85 : 0 & d_{12}^2 & d_{13}^2 \\
86 : d_{12}^2 & 0 & d_{23}^2 \\
87 : d_{13}^2 & d_{23}^2 & 0
88 : \end{array}\right] \\
89 : &=&
90 : \sum_\alpha \left[ \begin{array}{ccc}
91 : (X^1_\alpha)^2 & (X^1_\alpha)^2 & (X^1_\alpha)^2 \\
92 : (X^2_\alpha)^2 & (X^2_\alpha)^2 & (X^2_\alpha)^2 \\
93 : (X^3_\alpha)^2 & (X^3_\alpha)^2 & (X^3_\alpha)^2 \\
94 : \end{array}\right]
95 : + \sum_\alpha \left[ \begin{array}{ccc}
96 : (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\
97 : (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\
98 : (X^1_\alpha)^2 & (X^2_\alpha)^2 & (X^3_\alpha)^2 \\
99 : \end{array}\right]
100 : - 2 \sum_\alpha \left[ \begin{array}{ccc}
101 : X^1_\alpha X^1_\alpha & X^1_\alpha X^2_\alpha & X^1_\alpha X^3_\alpha \\
102 : X^2_\alpha X^1_\alpha & X^2_\alpha X^2_\alpha & X^2_\alpha X^3_\alpha \\
103 : X^1_\alpha X^3_\alpha & X^3_\alpha X^2_\alpha & X^3_\alpha X^3_\alpha
104 : \end{array}\right] \nonumber \\
105 : &=& \mathbf{c 1^T} + \mathbf{1 c^T} - 2 \sum_\alpha \mathbf{x}_a \mathbf{x}^T_a = \mathbf{c 1^T} + \mathbf{1 c^T} - 2\mathbf{X X^T}
106 : \f}
107 :
108 : This last equation can be extended to situations when we have more than three points. In it \f$\mathbf{X}\f$ is a matrix that has
109 : one high-dimensional point on each of its rows and \f$\mathbf{X^T}\f$ is its transpose. \f$\mathbf{1}\f$ is an \f$M \times 1\f$ vector
110 : of ones and \f$\mathbf{c}\f$ is a vector with components given by:
111 :
112 : \f[
113 : c_i = \sum_\alpha (x_\alpha^i)^2
114 : \f]
115 :
116 : These quantities are the diagonal elements of \f$\mathbf{X X^T}\f$, which is a dot product or Gram Matrix that contains the
117 : dot product of the vector \f$X_i\f$ with the vector \f$X_j\f$ in element \f$i,j\f$.
118 :
119 : In classical scaling we introduce a centering matrix \f$\mathbf{J}\f$ that is given by:
120 :
121 : \f[
122 : \mathbf{J} = \mathbf{I} - \frac{1}{M} \mathbf{11^T}
123 : \f]
124 :
125 : where \f$\mathbf{I}\f$ is the identity. Multiplying the equations above from the front and back by this matrix and a factor of a \f$-\frac{1}{2}\f$ gives:
126 :
127 : \f{eqnarray*}{
128 : -\frac{1}{2} \mathbf{J} \mathbf{D}^2(\mathbf{X}) \mathbf{J} &=& -\frac{1}{2}\mathbf{J}( \mathbf{c 1^T} + \mathbf{1 c^T} - 2\mathbf{X X^T})\mathbf{J} \\
129 : &=& -\frac{1}{2}\mathbf{J c 1^T J} - \frac{1}{2} \mathbf{J 1 c^T J} + \frac{1}{2} \mathbf{J}(2\mathbf{X X^T})\mathbf{J} \\
130 : &=& \mathbf{ J X X^T J } = \mathbf{X X^T } \label{eqn:scaling}
131 : \f}
132 :
133 : The fist two terms in this expression disappear because \f$\mathbf{1^T J}=\mathbf{J 1} =\mathbf{0}\f$, where \f$\mathbf{0}\f$
134 : is a matrix containing all zeros. In the final step meanwhile we use the fact that the matrix of squared distances will not
135 : change when we translate all the points. We can thus assume that the mean value, \f$\mu\f$, for each of the components, \f$\alpha\f$:
136 : \f[
137 : \mu_\alpha = \frac{1}{M} \sum_{i=1}^N \mathbf{X}^i_\alpha
138 : \f]
139 : is equal to 0 so the columns of \f$\mathbf{X}\f$ add up to 0. This in turn means that each of the columns of
140 : \f$\mathbf{X X^T}\f$ adds up to zero, which is what allows us to write \f$\mathbf{ J X X^T J } = \mathbf{X X^T }\f$.
141 :
142 : The matrix of squared distances is symmetric and positive-definite we can thus use the spectral decomposition to decompose it as:
143 :
144 : \f[
145 : \Phi= \mathbf{V} \Lambda \mathbf{V}^T
146 : \f]
147 :
148 : Furthermore, because the matrix we are diagonalizing, \f$\mathbf{X X^T}\f$, is the product of a matrix and its transpose
149 : we can use this decomposition to write:
150 :
151 : \f[
152 : \mathbf{X} =\mathbf{V} \Lambda^\frac{1}{2}
153 : \f]
154 :
155 : Much as in PCA there are generally a small number of large eigenvalues in \f$\Lambda\f$ and many small eigenvalues.
156 : We can safely use only the large eigenvalues and their corresponding eigenvectors to express the relationship between
157 : the coordinates \f$\mathbf{X}\f$. This gives us our set of low-dimensional projections.
158 :
159 : This derivation makes a number of assumptions about the how the low dimensional points should best be arranged to minimise
160 : the stress. If you use an interative optimization algorithm such as SMACOF you may thus be able to find a better
161 : (lower-stress) projection of the points. For more details on the assumptions made
162 : see <a href="http://quest4rigor.com/tag/multidimensional-scaling/"> this website.</a>
163 : */
164 : //+ENDPLUMEDOC
165 :
166 : class ClassicalMultiDimensionalScaling : public AnalysisWithLandmarks {
167 : private:
168 : unsigned nlow;
169 : std::string ofilename;
170 : std::string efilename;
171 : PointWiseMapping* myembedding;
172 : public:
173 : static void registerKeywords( Keywords& keys );
174 : explicit ClassicalMultiDimensionalScaling( const ActionOptions& ao );
175 : ~ClassicalMultiDimensionalScaling();
176 : void analyzeLandmarks();
177 : };
178 :
179 6453 : PLUMED_REGISTER_ACTION(ClassicalMultiDimensionalScaling,"CLASSICAL_MDS")
180 :
181 2 : void ClassicalMultiDimensionalScaling::registerKeywords( Keywords& keys ) {
182 2 : AnalysisWithLandmarks::registerKeywords( keys );
183 8 : keys.add("compulsory","NLOW_DIM","number of low-dimensional coordinates required");
184 8 : keys.add("compulsory","OUTPUT_FILE","file on which to output the final embedding coordinates");
185 10 : keys.add("compulsory","EMBEDDING_OFILE","dont output","file on which to output the embedding in plumed input format");
186 2 : }
187 :
188 1 : ClassicalMultiDimensionalScaling::ClassicalMultiDimensionalScaling( const ActionOptions& ao ):
189 : Action(ao),
190 2 : AnalysisWithLandmarks(ao)
191 : {
192 2 : myembedding = new PointWiseMapping( getMetricName(), false );
193 1 : setDataToAnalyze( dynamic_cast<MultiReferenceBase*>(myembedding) );
194 :
195 2 : parse("NLOW_DIM",nlow);
196 1 : if( nlow<1 ) error("dimensionality of low dimensional space must be at least one");
197 2 : std::vector<std::string> propnames( nlow ); std::string num;
198 8 : for(unsigned i=0; i<propnames.size(); ++i) {
199 2 : Tools::convert(i+1,num); std::string lab=getLabel();
200 10 : if(lab.find("@")!=std::string::npos) propnames[i]=getName() + "." + num;
201 0 : else propnames[i]=getLabel() + "." + num;
202 : }
203 1 : myembedding->setPropertyNames( propnames, false );
204 :
205 2 : parseOutputFile("EMBEDDING_OFILE",efilename);
206 2 : parseOutputFile("OUTPUT_FILE",ofilename);
207 1 : }
208 :
209 3 : ClassicalMultiDimensionalScaling::~ClassicalMultiDimensionalScaling() {
210 1 : delete myembedding;
211 2 : }
212 :
213 2 : void ClassicalMultiDimensionalScaling::analyzeLandmarks() {
214 : // Calculate all pairwise diatances
215 6 : myembedding->calculateAllDistances( getPbc(), getArguments(), comm, myembedding->modifyDmat(), true );
216 :
217 : // Run multidimensional scaling
218 2 : ClassicalScaling::run( myembedding );
219 :
220 : // Output the embedding as long lists of data
221 : // std::string gfname=saveResultsFromPreviousAnalyses( ofilename );
222 4 : OFile gfile; gfile.link(*this);
223 4 : gfile.setBackupString("analysis");
224 6 : gfile.fmtField(getOutputFormat()+" ");
225 4 : gfile.open( ofilename.c_str() );
226 :
227 : // Print embedding coordinates
228 604 : for(unsigned i=0; i<myembedding->getNumberOfReferenceFrames(); ++i) {
229 1000 : for(unsigned j=0; j<nlow; ++j) {
230 400 : std::string num; Tools::convert(j+1,num);
231 2000 : gfile.printField( getLabel() + "." + num, myembedding->getProjectionCoordinate(i,j) );
232 : }
233 200 : gfile.printField();
234 : }
235 2 : gfile.close();
236 :
237 : // Output the embedding in plumed format
238 4 : if( efilename!="dont output") {
239 6 : OFile afile; afile.link(*this); afile.setBackupString("analysis");
240 4 : afile.open( efilename.c_str() );
241 8 : myembedding->print( "classical mds", getTime(), afile, getOutputFormat(), atoms.getUnits().getLength()/0.1 );
242 2 : afile.close();
243 : }
244 2 : }
245 :
246 : }
247 4839 : }
|