Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 : Copyright (c) 2015-2020 The plumed team 3 : (see the PEOPLE file at the root of the distribution for a list of names) 4 : 5 : See http://www.plumed.org for more information. 6 : 7 : This file is part of plumed, version 2. 8 : 9 : plumed is free software: you can redistribute it and/or modify 10 : it under the terms of the GNU Lesser General Public License as published by 11 : the Free Software Foundation, either version 3 of the License, or 12 : (at your option) any later version. 13 : 14 : plumed is distributed in the hope that it will be useful, 15 : but WITHOUT ANY WARRANTY; without even the implied warranty of 16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 : GNU Lesser General Public License for more details. 18 : 19 : You should have received a copy of the GNU Lesser General Public License 20 : along with plumed. If not, see <http://www.gnu.org/licenses/>. 21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */ 22 : #include "core/ActionRegister.h" 23 : #include "core/ActionShortcut.h" 24 : #include "multicolvar/MultiColvarShortcuts.h" 25 : #include "tools/IFile.h" 26 : #include "core/ActionSetup.h" 27 : 28 : //+PLUMEDOC MCOLVARF PAMM 29 : /* 30 : Probabilistic analysis of molecular motifs. 31 : 32 : Probabilistic analysis of molecular motifs (PAMM) was introduced in this paper \cite pamm. 33 : The essence of this approach involves calculating some large set of collective variables 34 : for a set of atoms in a short trajectory and fitting this data using a Gaussian Mixture Model. 35 : The idea is that modes in these distributions can be used to identify features such as hydrogen bonds or 36 : secondary structure types. 37 : 38 : The assumption within this implementation is that the fitting of the Gaussian mixture model has been 39 : done elsewhere by a separate code. You thus provide an input file to this action which contains the 40 : means, covariance matrices and weights for a set of Gaussian kernels, \f$\{ \phi \}\f$. The values and 41 : derivatives for the following set of quantities is then computed: 42 : 43 : \f[ 44 : s_k = \frac{ \phi_k}{ \sum_i \phi_i } 45 : \f] 46 : 47 : Each of the \f$\phi_k\f$ is a Gaussian function that acts on a set of quantities calculated within 48 : a \ref mcolv . These might be \ref TORSIONS, \ref DISTANCES, \ref ANGLES or any one of the many 49 : symmetry functions that are available within \ref mcolv actions. These quantities are then inserted into 50 : the set of \f$n\f$ kernels that are in the the input file. This will be done for multiple sets of values 51 : for the input quantities and a final quantity will be calculated by summing the above \f$s_k\f$ values or 52 : some transformation of the above. This sounds less complicated than it is and is best understood by 53 : looking through the example given below. 54 : 55 : \warning Mixing \ref mcolv actions that are periodic with variables that are not periodic has not been tested 56 : 57 : \par Examples 58 : 59 : In this example I will explain in detail what the following input is computing: 60 : 61 : \plumedfile 62 : #SETTINGS MOLFILE=regtest/basic/rt32/helix.pdb 63 : MOLINFO MOLTYPE=protein STRUCTURE=M1d.pdb 64 : psi: TORSIONS ATOMS1=@psi-2 ATOMS2=@psi-3 ATOMS3=@psi-4 65 : phi: TORSIONS ATOMS1=@phi-2 ATOMS2=@phi-3 ATOMS3=@phi-4 66 : p: PAMM DATA=phi,psi CLUSTERS=clusters.pamm MEAN1={COMPONENT=1} MEAN2={COMPONENT=2} 67 : PRINT ARG=p.mean-1,p.mean-2 FILE=colvar 68 : \endplumedfile 69 : 70 : The best place to start our explanation is to look at the contents of the clusters.pamm file 71 : 72 : \auxfile{clusters.pamm} 73 : #! FIELDS height phi psi sigma_phi_phi sigma_phi_psi sigma_psi_phi sigma_psi_psi 74 : #! SET multivariate von-misses 75 : #! SET kerneltype gaussian 76 : 2.97197455E-0001 -1.91983118E+0000 2.25029540E+0000 2.45960237E-0001 -1.30615381E-0001 -1.30615381E-0001 2.40239117E-0001 77 : 2.29131448E-0002 1.39809354E+0000 9.54585380E-0002 9.61755708E-0002 -3.55657919E-0002 -3.55657919E-0002 1.06147253E-0001 78 : 5.06676398E-0001 -1.09648066E+0000 -7.17867907E-0001 1.40523052E-0001 -1.05385552E-0001 -1.05385552E-0001 1.63290557E-0001 79 : \endauxfile 80 : 81 : This files contains the parameters of two two-dimensional Gaussian functions. Each of these Gaussian kernels has a weight, \f$w_k\f$, 82 : a vector that specifies the position of its center, \f$\mathbf{c}_k\f$, and a covariance matrix, \f$\Sigma_k\f$. The \f$\phi_k\f$ functions that 83 : we use to calculate our PAMM components are thus: 84 : 85 : \f[ 86 : \phi_k = \frac{w_k}{N_k} \exp\left( -(\mathbf{s} - \mathbf{c}_k)^T \Sigma^{-1}_k (\mathbf{s} - \mathbf{c}_k) \right) 87 : \f] 88 : 89 : In the above \f$N_k\f$ is a normalization factor that is calculated based on \f$\Sigma\f$. The vector \f$\mathbf{s}\f$ is a vector of quantities 90 : that are calculated by the \ref TORSIONS actions. This vector must be two dimensional and in this case each component is the value of a 91 : torsion angle. If we look at the two \ref TORSIONS actions in the above we are calculating the \f$\phi\f$ and \f$\psi\f$ backbone torsional 92 : angles in a protein (Note the use of \ref MOLINFO to make specification of atoms straightforward). We thus calculate the values of our 93 : 2 \f$ \{ \phi \} \f$ kernels 3 times. The first time we use the \f$\phi\f$ and \f$\psi\f$ angles in the second residue of the protein, 94 : the second time it is the \f$\phi\f$ and \f$\psi\f$ angles of the third residue of the protein and the third time it is the \f$\phi\f$ and \f$\psi\f$ angles 95 : of the fourth residue in the protein. The final two quantities that are output by the print command, p.mean-1 and p.mean-2, are the averages 96 : over these three residues for the quantities: 97 : \f[ 98 : s_1 = \frac{ \phi_1}{ \phi_1 + \phi_2 } 99 : \f] 100 : and 101 : \f[ 102 : s_2 = \frac{ \phi_2}{ \phi_1 + \phi_2 } 103 : \f] 104 : There is a great deal of flexibility in this input. We can work with, and examine, any number of components, we can use any set of collective variables 105 : and compute these PAMM variables and we can transform the PAMM variables themselves in a large number of different ways when computing these sums. 106 : */ 107 : //+ENDPLUMEDOC 108 : 109 : namespace PLMD { 110 : namespace pamm { 111 : 112 : class PAMM : public ActionShortcut { 113 : public: 114 : static void registerKeywords( Keywords& keys ); 115 : explicit PAMM(const ActionOptions&); 116 : }; 117 : 118 : PLUMED_REGISTER_ACTION(PAMM,"PAMM") 119 : 120 4 : void PAMM::registerKeywords( Keywords& keys ) { 121 4 : ActionShortcut::registerKeywords( keys ); 122 8 : keys.add("compulsory","ARG","the vectors from which the pamm coordinates are calculated"); 123 8 : keys.add("compulsory","CLUSTERS","the name of the file that contains the definitions of all the clusters"); 124 8 : keys.add("compulsory","REGULARISE","0.001","don't allow the denominator to be smaller then this value"); 125 8 : keys.add("compulsory","KERNELS","all","which kernels are we computing the PAMM values for"); 126 4 : multicolvar::MultiColvarShortcuts::shortcutKeywords( keys ); 127 8 : keys.needsAction("KERNEL"); keys.needsAction("COMBINE"); 128 4 : } 129 : 130 2 : PAMM::PAMM(const ActionOptions& ao) : 131 : Action(ao), 132 2 : ActionShortcut(ao) 133 : { 134 : // Must get list of input value names 135 4 : std::vector<std::string> valnames; parseVector("ARG",valnames); 136 : // Create input values 137 2 : std::string argstr=" ARG=" + valnames[0]; 138 3 : for(unsigned j=1; j<valnames.size(); ++j) argstr += "," + valnames[j]; 139 : 140 : // Create actions to calculate all pamm kernels 141 : unsigned nkernels = 0; double h; 142 2 : std::string fname; parse("CLUSTERS",fname); 143 2 : IFile ifile; ifile.open(fname); ifile.allowIgnoredFields(); 144 : for(unsigned k=0;; ++k) { 145 22 : if( !ifile.scanField("height",h) ) break; 146 : // Create a kernel for this cluster 147 9 : std::string num, wstr, ktype; Tools::convert( k+1, num ); Tools::convert(h,wstr); ifile.scanField("kerneltype",ktype); 148 18 : readInputLine( getShortcutLabel() + "_kernel-" + num + ": KERNEL NORMALIZED" + argstr + " NUMBER=" + num + " REFERENCE=" + fname + " WEIGHT=" + wstr + " TYPE=" + ktype ); 149 9 : nkernels++; ifile.scanField(); 150 9 : } 151 2 : ifile.close(); 152 : 153 : // And add on the regularization 154 4 : std::string regparam; parse("REGULARISE",regparam); 155 : // Now combine all the PAMM objects with the regparam 156 2 : std::string paramstr, cinput = getShortcutLabel() + "_ksum: COMBINE PERIODIC=NO"; 157 11 : for(unsigned k=0; k<nkernels; ++k) { 158 9 : std::string num; Tools::convert( k+1, num ); 159 9 : if( k==0 ) { 160 4 : cinput += " ARG="; paramstr=" PARAMETERS=-" + regparam; 161 : } else { 162 : cinput += ","; paramstr += ",0"; 163 : } 164 18 : cinput += getShortcutLabel() + "_kernel-" + num; 165 : } 166 4 : readInputLine( cinput + paramstr ); 167 : 168 : // And now compute all the pamm kernels 169 4 : std::string kchoice; parse("KERNELS",kchoice); 170 2 : std::map<std::string,std::string> keymap; multicolvar::MultiColvarShortcuts::readShortcutKeywords( keymap, this ); 171 2 : if( kchoice=="all" ) { 172 11 : for(unsigned k=0; k<nkernels; ++k) { 173 9 : std::string num; Tools::convert( k+1, num ); 174 18 : readInputLine( getShortcutLabel() + "-" + num + ": CUSTOM ARG=" + getShortcutLabel() + "_kernel-" + num + "," + getShortcutLabel() + "_ksum FUNC=x/y PERIODIC=NO"); 175 18 : multicolvar::MultiColvarShortcuts::expandFunctions( getShortcutLabel() + "-" + num, getShortcutLabel() + "-" + num, "", keymap, this ); 176 : } 177 : } else { 178 0 : std::vector<std::string> awords=Tools::getWords(kchoice,"\t\n ,"); Tools::interpretRanges( awords ); 179 0 : for(unsigned k=0; k<awords.size(); ++k) { 180 0 : readInputLine( getShortcutLabel() + "-" + awords[k] + ": CUSTOM ARG=" + getShortcutLabel() + "_kernel-" + awords[k] + "," + getShortcutLabel() + "_ksum FUNC=x/y PERIODIC=NO"); 181 0 : multicolvar::MultiColvarShortcuts::expandFunctions( getShortcutLabel() + "-" + awords[k], getShortcutLabel() + "-" + awords[k], "", keymap, this ); 182 : } 183 0 : } 184 4 : } 185 : 186 : } 187 : }