Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 : Copyright (c) 2015-2020 The plumed team
3 : (see the PEOPLE file at the root of the distribution for a list of names)
4 :
5 : See http://www.plumed.org for more information.
6 :
7 : This file is part of plumed, version 2.
8 :
9 : plumed is free software: you can redistribute it and/or modify
10 : it under the terms of the GNU Lesser General Public License as published by
11 : the Free Software Foundation, either version 3 of the License, or
12 : (at your option) any later version.
13 :
14 : plumed is distributed in the hope that it will be useful,
15 : but WITHOUT ANY WARRANTY; without even the implied warranty of
16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 : GNU Lesser General Public License for more details.
18 :
19 : You should have received a copy of the GNU Lesser General Public License
20 : along with plumed. If not, see <http://www.gnu.org/licenses/>.
21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
22 : #include "core/ActionRegister.h"
23 : #include "core/ActionShortcut.h"
24 : #include "multicolvar/MultiColvarShortcuts.h"
25 : #include "tools/IFile.h"
26 : #include "core/ActionSetup.h"
27 :
28 : //+PLUMEDOC MCOLVARF PAMM
29 : /*
30 : Probabilistic analysis of molecular motifs.
31 :
32 : Probabilistic analysis of molecular motifs (PAMM) was introduced in the papers in the bibliography.
33 : The essence of this approach involves calculating some large set of collective variables
34 : for a set of atoms in a short trajectory and fitting this data using a Gaussian Mixture Model.
35 : The idea is that modes in these distributions can be used to identify features such as hydrogen bonds or
36 : secondary structure types.
37 :
38 : The assumption within this implementation is that the fitting of the Gaussian mixture model has been
39 : done elsewhere by a separate code. You thus provide an input file to this action which contains the
40 : means, covariance matrices and weights for a set of Gaussian kernels, $\{\phi\}$. The values and
41 : derivatives for the following set of quantities is then computed:
42 :
43 : $$
44 : s_k = \frac{ \phi_k}{ \sum_i \phi_i }
45 : $$
46 :
47 : Each of the $\phi_k$ is a Gaussian function that acts on a set in quantities calculated that might be calculated
48 : using a [TORSION](TORSION.md), [DISTANCE](DISTANCE.md) or [ANGLE](ANGLE.md) action for example.
49 : These quantities are then inserted into the set of $n$ kernels that are in the the input file. This will be done for multiple sets of values
50 : for the input quantities and a final quantity will be calculated by summing the above $s_k$ values or
51 : some transformation of the above. This sounds less complicated than it is and is best understood by
52 : looking through the example given below, which can be expanded to show the full set of operations that PLUMED is performing.
53 :
54 : \warning Mixing input variables that are periodic with variables that are not periodic has not been tested
55 :
56 : ## Examples
57 :
58 : In this example I will explain in detail what the following input is computing:
59 :
60 : ```plumed
61 : #SETTINGS MOLFILE=regtest/pamm/rt-pamm-periodic/M1d.pdb INPUTFILES=regtest/pamm/rt-pamm-periodic/2D-testc-0.75.pammp
62 : MOLINFO MOLTYPE=protein STRUCTURE=regtest/pamm/rt-pamm-periodic/M1d.pdb
63 : psi: TORSION ATOMS1=@psi-2 ATOMS2=@psi-3 ATOMS3=@psi-4
64 : phi: TORSION ATOMS1=@phi-2 ATOMS2=@phi-3 ATOMS3=@phi-4
65 : p: PAMM ARG=phi,psi CLUSTERS=regtest/pamm/rt-pamm-periodic/2D-testc-0.75.pammp MEAN
66 : PRINT ARG=p-1_mean,p-2_mean FILE=colvar
67 : ```
68 :
69 : The best place to start our explanation is to look at the contents of the `2D-testc-0.75.pammp` file, which you can do
70 : by clicking on the links in the annotated input above. This files contains the parameters of two two-dimensional Gaussian functions.
71 : Each of these Gaussian kernels has a weight, $w_k$, a vector that specifies the position of its center, $\mathbf{c}_k$, and a covariance matrix, $\Sigma_k$.
72 : The $\phi_k$ functions that we use to calculate our PAMM components are thus:
73 :
74 : $$
75 : \phi_k = \frac{w_k}{N_k} \exp\left( -(\mathbf{s} - \mathbf{c}_k)^T \Sigma^{-1}_k (\mathbf{s} - \mathbf{c}_k) \right)
76 : $$
77 :
78 : In the above $N_k$ is a normalization factor that is calculated based on $\Sigma$. The vector $\mathbf{s}$ is a vector of quantities
79 : that are calculated by the input [TORSION](TORSION.md) actions. This vector must be two dimensional and in this case each component is the value of a
80 : torsion angle. If we look at the two TORSION actions in the above we are calculating the $\phi$ and $\psi$ backbone torsional
81 : angles in a protein (Note the use of [MOLINFO](MOLINFO.md) to make specification of atoms straightforward). We thus calculate the values of our
82 : 2 $ \{ \phi \} $ kernels 3 times. The first time we use the $\phi$ and $\psi$ angles in the second residue of the protein,
83 : the second time it is the $\phi$ and $\psi$ angles of the third residue of the protein and the third time it is the $\phi$ and $\psi$ angles
84 : of the fourth residue in the protein. The final two quantities that are output by the print command, p.mean-1 and p.mean-2, are the averages
85 : over these three residues for the quantities:
86 :
87 : $$
88 : s_1 = \frac{ \phi_1}{ \phi_1 + \phi_2 }
89 : $$
90 :
91 : and
92 :
93 : $$
94 : s_2 = \frac{ \phi_2}{ \phi_1 + \phi_2 }
95 : $$
96 :
97 : There is a great deal of flexibility in this input. We can work with, and examine, any number of components, we can use any set of collective variables
98 : and compute these PAMM variables and we can transform the PAMM variables themselves in a large number of different ways when computing these sums. Furthermore,
99 : by expanding the shortcuts in the example above we can obtain insight into how the PAMM method operates.
100 : */
101 : //+ENDPLUMEDOC
102 :
103 : namespace PLMD {
104 : namespace pamm {
105 :
106 : class PAMM : public ActionShortcut {
107 : public:
108 : static void registerKeywords( Keywords& keys );
109 : explicit PAMM(const ActionOptions&);
110 : };
111 :
112 : PLUMED_REGISTER_ACTION(PAMM,"PAMM")
113 :
114 4 : void PAMM::registerKeywords( Keywords& keys ) {
115 4 : ActionShortcut::registerKeywords( keys );
116 4 : keys.add("compulsory","ARG","the vectors from which the pamm coordinates are calculated");
117 4 : keys.add("compulsory","CLUSTERS","the name of the file that contains the definitions of all the clusters");
118 4 : keys.add("compulsory","REGULARISE","0.001","don't allow the denominator to be smaller then this value");
119 4 : keys.add("compulsory","KERNELS","all","which kernels are we computing the PAMM values for");
120 4 : multicolvar::MultiColvarShortcuts::shortcutKeywords( keys );
121 4 : keys.needsAction("KERNEL");
122 4 : keys.needsAction("COMBINE");
123 4 : keys.addDOI("10.1063/1.4900655");
124 4 : keys.addDOI("10.1021/acs.jctc.7b00993");
125 4 : }
126 :
127 2 : PAMM::PAMM(const ActionOptions& ao) :
128 : Action(ao),
129 2 : ActionShortcut(ao) {
130 : // Must get list of input value names
131 : std::vector<std::string> valnames;
132 4 : parseVector("ARG",valnames);
133 : // Create input values
134 2 : std::string argstr=" ARG=" + valnames[0];
135 3 : for(unsigned j=1; j<valnames.size(); ++j) {
136 2 : argstr += "," + valnames[j];
137 : }
138 :
139 : // Create actions to calculate all pamm kernels
140 : unsigned nkernels = 0;
141 : double h;
142 : std::string fname;
143 2 : parse("CLUSTERS",fname);
144 2 : IFile ifile;
145 2 : ifile.open(fname);
146 2 : ifile.allowIgnoredFields();
147 : for(unsigned k=0;; ++k) {
148 22 : if( !ifile.scanField("height",h) ) {
149 : break;
150 : }
151 : // Create a kernel for this cluster
152 : std::string num, wstr, ktype;
153 9 : Tools::convert( k+1, num );
154 9 : Tools::convert(h,wstr);
155 9 : ifile.scanField("kerneltype",ktype);
156 18 : readInputLine( getShortcutLabel() + "_kernel-" + num + ": KERNEL NORMALIZED" + argstr + " NUMBER=" + num + " REFERENCE=" + fname + " WEIGHT=" + wstr + " TYPE=" + ktype );
157 9 : nkernels++;
158 9 : ifile.scanField();
159 9 : }
160 2 : ifile.close();
161 :
162 : // And add on the regularization
163 : std::string regparam;
164 4 : parse("REGULARISE",regparam);
165 : // Now combine all the PAMM objects with the regparam
166 2 : std::string paramstr, cinput = getShortcutLabel() + "_ksum: COMBINE PERIODIC=NO";
167 11 : for(unsigned k=0; k<nkernels; ++k) {
168 : std::string num;
169 9 : Tools::convert( k+1, num );
170 9 : if( k==0 ) {
171 : cinput += " ARG=";
172 4 : paramstr=" PARAMETERS=-" + regparam;
173 : } else {
174 : cinput += ",";
175 : paramstr += ",0";
176 : }
177 18 : cinput += getShortcutLabel() + "_kernel-" + num;
178 : }
179 4 : readInputLine( cinput + paramstr );
180 :
181 : // And now compute all the pamm kernels
182 : std::string kchoice;
183 4 : parse("KERNELS",kchoice);
184 : std::map<std::string,std::string> keymap;
185 2 : multicolvar::MultiColvarShortcuts::readShortcutKeywords( keymap, this );
186 2 : if( kchoice=="all" ) {
187 11 : for(unsigned k=0; k<nkernels; ++k) {
188 : std::string num;
189 9 : Tools::convert( k+1, num );
190 18 : readInputLine( getShortcutLabel() + "-" + num + ": CUSTOM ARG=" + getShortcutLabel() + "_kernel-" + num + "," + getShortcutLabel() + "_ksum FUNC=x/y PERIODIC=NO");
191 18 : multicolvar::MultiColvarShortcuts::expandFunctions( getShortcutLabel() + "-" + num, getShortcutLabel() + "-" + num, "", keymap, this );
192 : }
193 : } else {
194 0 : std::vector<std::string> awords=Tools::getWords(kchoice,"\t\n ,");
195 0 : Tools::interpretRanges( awords );
196 0 : for(unsigned k=0; k<awords.size(); ++k) {
197 0 : readInputLine( getShortcutLabel() + "-" + awords[k] + ": CUSTOM ARG=" + getShortcutLabel() + "_kernel-" + awords[k] + "," + getShortcutLabel() + "_ksum FUNC=x/y PERIODIC=NO");
198 0 : multicolvar::MultiColvarShortcuts::expandFunctions( getShortcutLabel() + "-" + awords[k], getShortcutLabel() + "-" + awords[k], "", keymap, this );
199 : }
200 0 : }
201 4 : }
202 :
203 : }
204 : }
|