Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 : Copyright (c) 2015-2023 The plumed team
3 : (see the PEOPLE file at the root of the distribution for a list of names)
4 :
5 : See http://www.plumed.org for more information.
6 :
7 : This file is part of plumed, version 2.
8 :
9 : plumed is free software: you can redistribute it and/or modify
10 : it under the terms of the GNU Lesser General Public License as published by
11 : the Free Software Foundation, either version 3 of the License, or
12 : (at your option) any later version.
13 :
14 : plumed is distributed in the hope that it will be useful,
15 : but WITHOUT ANY WARRANTY; without even the implied warranty of
16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 : GNU Lesser General Public License for more details.
18 :
19 : You should have received a copy of the GNU Lesser General Public License
20 : along with plumed. If not, see <http://www.gnu.org/licenses/>.
21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
22 : #include "core/ActionRegister.h"
23 : #include "tools/KernelFunctions.h"
24 : #include "tools/IFile.h"
25 : #include "multicolvar/MultiColvarBase.h"
26 : #include "multicolvar/AtomValuePack.h"
27 : #include "PammObject.h"
28 :
29 : //+PLUMEDOC MCOLVARF PAMM
30 : /*
31 : Probabilistic analysis of molecular motifs.
32 :
33 : Probabilistic analysis of molecular motifs (PAMM) was introduced in this paper \cite pamm.
34 : The essence of this approach involves calculating some large set of collective variables
35 : for a set of atoms in a short trajectory and fitting this data using a Gaussian Mixture Model.
36 : The idea is that modes in these distributions can be used to identify features such as hydrogen bonds or
37 : secondary structure types.
38 :
39 : The assumption within this implementation is that the fitting of the Gaussian mixture model has been
40 : done elsewhere by a separate code. You thus provide an input file to this action which contains the
41 : means, covariance matrices and weights for a set of Gaussian kernels, \f$\{ \phi \}\f$. The values and
42 : derivatives for the following set of quantities is then computed:
43 :
44 : \f[
45 : s_k = \frac{ \phi_k}{ \sum_i \phi_i }
46 : \f]
47 :
48 : Each of the \f$\phi_k\f$ is a Gaussian function that acts on a set of quantities calculated within
49 : a \ref mcolv . These might be \ref TORSIONS, \ref DISTANCES, \ref ANGLES or any one of the many
50 : symmetry functions that are available within \ref mcolv actions. These quantities are then inserted into
51 : the set of \f$n\f$ kernels that are in the the input file. This will be done for multiple sets of values
52 : for the input quantities and a final quantity will be calculated by summing the above \f$s_k\f$ values or
53 : some transformation of the above. This sounds less complicated than it is and is best understood by
54 : looking through the example given below.
55 :
56 : \warning Mixing \ref mcolv actions that are periodic with variables that are not periodic has not been tested
57 :
58 : \par Examples
59 :
60 : In this example I will explain in detail what the following input is computing:
61 :
62 : \plumedfile
63 : #SETTINGS MOLFILE=regtest/basic/rt32/helix.pdb
64 : MOLINFO MOLTYPE=protein STRUCTURE=M1d.pdb
65 : psi: TORSIONS ATOMS1=@psi-2 ATOMS2=@psi-3 ATOMS3=@psi-4
66 : phi: TORSIONS ATOMS1=@phi-2 ATOMS2=@phi-3 ATOMS3=@phi-4
67 : p: PAMM DATA=phi,psi CLUSTERS=clusters.pamm MEAN1={COMPONENT=1} MEAN2={COMPONENT=2}
68 : PRINT ARG=p.mean-1,p.mean-2 FILE=colvar
69 : \endplumedfile
70 :
71 : The best place to start our explanation is to look at the contents of the clusters.pamm file
72 :
73 : \auxfile{clusters.pamm}
74 : #! FIELDS height phi psi sigma_phi_phi sigma_phi_psi sigma_psi_phi sigma_psi_psi
75 : #! SET multivariate von-misses
76 : #! SET kerneltype gaussian
77 : 2.97197455E-0001 -1.91983118E+0000 2.25029540E+0000 2.45960237E-0001 -1.30615381E-0001 -1.30615381E-0001 2.40239117E-0001
78 : 2.29131448E-0002 1.39809354E+0000 9.54585380E-0002 9.61755708E-0002 -3.55657919E-0002 -3.55657919E-0002 1.06147253E-0001
79 : 5.06676398E-0001 -1.09648066E+0000 -7.17867907E-0001 1.40523052E-0001 -1.05385552E-0001 -1.05385552E-0001 1.63290557E-0001
80 : \endauxfile
81 :
82 : This files contains the parameters of two two-dimensional Gaussian functions. Each of these Gaussian kernels has a weight, \f$w_k\f$,
83 : a vector that specifies the position of its center, \f$\mathbf{c}_k\f$, and a covariance matrix, \f$\Sigma_k\f$. The \f$\phi_k\f$ functions that
84 : we use to calculate our PAMM components are thus:
85 :
86 : \f[
87 : \phi_k = \frac{w_k}{N_k} \exp\left( -(\mathbf{s} - \mathbf{c}_k)^T \Sigma^{-1}_k (\mathbf{s} - \mathbf{c}_k) \right)
88 : \f]
89 :
90 : In the above \f$N_k\f$ is a normalization factor that is calculated based on \f$\Sigma\f$. The vector \f$\mathbf{s}\f$ is a vector of quantities
91 : that are calculated by the \ref TORSIONS actions. This vector must be two dimensional and in this case each component is the value of a
92 : torsion angle. If we look at the two \ref TORSIONS actions in the above we are calculating the \f$\phi\f$ and \f$\psi\f$ backbone torsional
93 : angles in a protein (Note the use of \ref MOLINFO to make specification of atoms straightforward). We thus calculate the values of our
94 : 2 \f$ \{ \phi \} \f$ kernels 3 times. The first time we use the \f$\phi\f$ and \f$\psi\f$ angles in the second residue of the protein,
95 : the second time it is the \f$\phi\f$ and \f$\psi\f$ angles of the third residue of the protein and the third time it is the \f$\phi\f$ and \f$\psi\f$ angles
96 : of the fourth residue in the protein. The final two quantities that are output by the print command, p.mean-1 and p.mean-2, are the averages
97 : over these three residues for the quantities:
98 : \f[
99 : s_1 = \frac{ \phi_1}{ \phi_1 + \phi_2 }
100 : \f]
101 : and
102 : \f[
103 : s_2 = \frac{ \phi_2}{ \phi_1 + \phi_2 }
104 : \f]
105 : There is a great deal of flexibility in this input. We can work with, and examine, any number of components, we can use any set of collective variables
106 : and compute these PAMM variables and we can transform the PAMM variables themselves in a large number of different ways when computing these sums.
107 : */
108 : //+ENDPLUMEDOC
109 :
110 : namespace PLMD {
111 : namespace pamm {
112 :
113 : class PAMM : public multicolvar::MultiColvarBase {
114 : private:
115 : PammObject mypamm;
116 : public:
117 : static void registerKeywords( Keywords& keys );
118 : explicit PAMM(const ActionOptions&);
119 : /// We have to overwrite this here
120 : unsigned getNumberOfQuantities() const override;
121 : /// Calculate the weight of this object ( average of input weights )
122 : using PLMD::multicolvar::MultiColvarBase::calculateWeight;
123 : void calculateWeight( multicolvar::AtomValuePack& myatoms );
124 : /// Actually do the calculation
125 : double compute( const unsigned& tindex, multicolvar::AtomValuePack& myatoms ) const override;
126 : /// This returns the position of the central atom
127 : Vector getCentralAtom();
128 : /// Is the variable periodic
129 16 : bool isPeriodic() override { return false; }
130 : };
131 :
132 10423 : PLUMED_REGISTER_ACTION(PAMM,"PAMM")
133 :
134 3 : void PAMM::registerKeywords( Keywords& keys ) {
135 3 : MultiColvarBase::registerKeywords( keys );
136 6 : keys.add("compulsory","DATA","the multicolvars from which the pamm coordinates are calculated");
137 6 : keys.add("compulsory","CLUSTERS","the name of the file that contains the definitions of all the clusters");
138 6 : keys.add("compulsory","REGULARISE","0.001","don't allow the denominator to be smaller then this value");
139 15 : keys.use("MEAN"); keys.use("MORE_THAN"); keys.use("SUM"); keys.use("LESS_THAN"); keys.use("HISTOGRAM");
140 21 : keys.use("MIN"); keys.use("MAX"); keys.use("LOWEST"); keys.use("HIGHEST"); keys.use("ALT_MIN"); keys.use("BETWEEN"); keys.use("MOMENTS");
141 3 : keys.setComponentsIntroduction("When the label of this action is used as the input for a second you are not referring to a scalar quantity as you are in "
142 : "regular collective variables. The label is used to reference the full set of quantities calculated by "
143 : "the action. This is usual when using \\ref multicolvarfunction. Generally when doing this the set of PAMM variables "
144 : "will be referenced using the DATA keyword rather than ARG.\n\n"
145 : "This Action can be used to calculate the following scalar quantities directly from the underlying set of PAMM variables. "
146 : "These quantities are calculated by employing the keywords listed below and they can be referenced elsewhere in the input "
147 : "file by using this Action's label followed by a dot and the name of the quantity. The particular PAMM variable that should "
148 : "be averaged in a MEAN command or transformed by a switching function in a LESS_THAN command is specified using the COMPONENT "
149 : "keyword. COMPONENT=1 refers to the PAMM variable in which the first kernel in your input file is on the numerator, COMPONENT=2 refers to "
150 : "PAMM variable in which the second kernel in the input file is on the numerator and so on. The same quantity can be calculated "
151 : "multiple times for different PAMM components by a single PAMM action. In this case the relevant keyword must appear multiple "
152 : "times on the input line followed by a numerical identifier i.e. MEAN1, MEAN2, ... The quantities calculated when multiple "
153 : "MEAN commands appear on the input line can be reference elsewhere in the input file by using the name of the quantity followed "
154 : "followed by a numerical identifier e.g. <em>label</em>.lessthan-1, <em>label</em>.lessthan-2 etc. Alternatively, you can "
155 : "customize the labels of the quantities by using the LABEL keyword in the description of the keyword.");
156 3 : keys.remove("ALL_INPUT_SAME_TYPE");
157 3 : }
158 :
159 2 : PAMM::PAMM(const ActionOptions& ao):
160 : Action(ao),
161 2 : MultiColvarBase(ao)
162 : {
163 : // This builds the lists
164 2 : buildSets();
165 : // Check for reasonable input
166 5 : for(unsigned i=0; i<getNumberOfBaseMultiColvars(); ++i) {
167 3 : if( getBaseMultiColvar(i)->getNumberOfQuantities()!=2 ) error("cannot use PAMM with " + getBaseMultiColvar(i)->getName() );
168 : }
169 :
170 2 : bool mixed=getBaseMultiColvar(0)->isPeriodic();
171 2 : std::vector<bool> pbc( getNumberOfBaseMultiColvars() );
172 2 : std::vector<std::string> valnames( getNumberOfBaseMultiColvars() );
173 2 : std::vector<std::string> min( getNumberOfBaseMultiColvars() ), max( getNumberOfBaseMultiColvars() );
174 5 : for(unsigned i=0; i<getNumberOfBaseMultiColvars(); ++i) {
175 3 : if( getBaseMultiColvar(i)->isPeriodic()!=mixed ) warning("mixing of periodic and aperiodic base variables in pamm is untested");
176 3 : pbc[i]=getBaseMultiColvar(i)->isPeriodic();
177 3 : if( pbc[i] ) getBaseMultiColvar(i)->retrieveDomain( min[i], max[i] );
178 3 : valnames[i]=getBaseMultiColvar(i)->getLabel();
179 : }
180 :
181 4 : double regulariser; parse("REGULARISE",regulariser);
182 2 : std::string errorstr, filename; parse("CLUSTERS",filename);
183 2 : mypamm.setup( filename, regulariser, valnames, pbc, min, max, errorstr );
184 2 : if( errorstr.length()>0 ) error( errorstr );
185 4 : }
186 :
187 86 : unsigned PAMM::getNumberOfQuantities() const {
188 86 : return 1 + mypamm.getNumberOfKernels();
189 : }
190 :
191 0 : void PAMM::calculateWeight( multicolvar::AtomValuePack& myatoms ) {
192 : unsigned nvars = getNumberOfBaseMultiColvars();
193 : // Weight of point is average of weights of input colvars?
194 0 : std::vector<double> tval(2); double ww=0;
195 0 : for(unsigned i=0; i<nvars; ++i) {
196 0 : getInputData( i, false, myatoms, tval ); ww+=tval[0];
197 : }
198 0 : myatoms.setValue( 0, ww / static_cast<double>( nvars ) );
199 :
200 0 : if(!doNotCalculateDerivatives() ) {
201 0 : double pref = 1.0 / static_cast<double>( nvars );
202 0 : for(unsigned ivar=0; ivar<nvars; ++ivar) {
203 : // Get the values of derivatives
204 0 : MultiValue& myder=getInputDerivatives( ivar, false, myatoms );
205 0 : for(unsigned j=0; j<myder.getNumberActive(); ++j) {
206 0 : unsigned jder=myder.getActiveIndex(j);
207 0 : myatoms.addDerivative( 0, jder, pref*myder.getDerivative( 0, jder ) );
208 : }
209 : }
210 : }
211 0 : }
212 :
213 28 : double PAMM::compute( const unsigned& tindex, multicolvar::AtomValuePack& myatoms ) const {
214 : unsigned nvars = getNumberOfBaseMultiColvars();
215 28 : std::vector<std::vector<double> > tderiv( mypamm.getNumberOfKernels() );
216 174 : for(unsigned i=0; i<tderiv.size(); ++i) tderiv[i].resize( nvars );
217 28 : std::vector<double> tval(2), invals( nvars ), vals( mypamm.getNumberOfKernels() );
218 :
219 74 : for(unsigned i=0; i<nvars; ++i) {
220 46 : getInputData( i, false, myatoms, tval ); invals[i]=tval[1];
221 : }
222 28 : mypamm.evaluate( invals, vals, tderiv );
223 :
224 : // Now set all values other than the first one
225 : // This is because of some peverse choices in multicolvar
226 146 : for(unsigned i=1; i<vals.size(); ++i) myatoms.setValue( 1+i, vals[i] );
227 :
228 28 : if( !doNotCalculateDerivatives() ) {
229 28 : std::vector<double> mypref( 1 + vals.size() );
230 74 : for(unsigned ivar=0; ivar<nvars; ++ivar) {
231 : // Get the values of the derivatives
232 46 : MultiValue& myder = getInputDerivatives( ivar, false, myatoms );
233 : // And calculate the derivatives
234 318 : for(unsigned i=0; i<vals.size(); ++i) mypref[1+i] = tderiv[i][ivar];
235 : // This is basically doing the chain rule to get the final derivatives
236 46 : splitInputDerivatives( 1, 1, 1+vals.size(), ivar, mypref, myder, myatoms );
237 : // And clear the derivatives
238 46 : myder.clearAll();
239 : }
240 : }
241 :
242 56 : return vals[0];
243 28 : }
244 :
245 0 : Vector PAMM::getCentralAtom() {
246 : // Who knows how this should work
247 0 : plumed_error();
248 : return Vector(1.0,0.0,0.0);
249 : }
250 :
251 : }
252 : }
|