Line data Source code
1 : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 : Copyright (c) 2012-2023 The plumed team
3 : (see the PEOPLE file at the root of the distribution for a list of names)
4 :
5 : See http://www.plumed.org for more information.
6 :
7 : This file is part of plumed, version 2.
8 :
9 : plumed is free software: you can redistribute it and/or modify
10 : it under the terms of the GNU Lesser General Public License as published by
11 : the Free Software Foundation, either version 3 of the License, or
12 : (at your option) any later version.
13 :
14 : plumed is distributed in the hope that it will be useful,
15 : but WITHOUT ANY WARRANTY; without even the implied warranty of
16 : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 : GNU Lesser General Public License for more details.
18 :
19 : You should have received a copy of the GNU Lesser General Public License
20 : along with plumed. If not, see <http://www.gnu.org/licenses/>.
21 : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
22 : #include "core/ActionRegister.h"
23 : #include "core/ActionShortcut.h"
24 : #include "core/ActionWithValue.h"
25 : #include "core/ActionWithArguments.h"
26 : #include "core/PlumedMain.h"
27 : #include "core/ActionSet.h"
28 :
29 : //+PLUMEDOC GRIDCALC HISTOGRAM
30 : /*
31 : Accumulate the average probability density along a few CVs from a trajectory.
32 :
33 : When using this method it is supposed that you have some collective variable \f$\zeta\f$ that
34 : gives a reasonable description of some physical or chemical phenomenon. As an example of what we
35 : mean by this suppose you wish to examine the following SN2 reaction:
36 :
37 : \f[
38 : \textrm{OH}^- + \textrm{CH}_3Cl \rightarrow \textrm{CH}_3OH + \textrm{Cl}^-
39 : \f]
40 :
41 : The distance between the chlorine atom and the carbon is an excellent collective variable, \f$\zeta\f$,
42 : in this case because this distance is short for the reactant, \f$\textrm{CH}_3Cl\f$, because the carbon
43 : and chlorine are chemically bonded, and because it is long for the product state when these two atoms are
44 : not chemically bonded. We thus might want to accumulate the probability density, \f$P(\zeta)\f$, as a function of this distance
45 : as this will provide us with information about the overall likelihood of the reaction. Furthermore, the
46 : free energy, \f$F(\zeta)\f$, is related to this probability density via:
47 :
48 : \f[
49 : F(\zeta) = - k_B T \ln P(\zeta)
50 : \f]
51 :
52 : Accumulating these probability densities is precisely what this Action can be used to do. Furthermore, the conversion
53 : of the histogram to the free energy can be achieved by using the method \ref CONVERT_TO_FES.
54 :
55 : We calculate histograms within PLUMED using a method known as kernel density estimation, which you can read more about here:
56 :
57 : https://en.wikipedia.org/wiki/Kernel_density_estimation
58 :
59 : In PLUMED the value of \f$\zeta\f$ at each discrete instant in time in the trajectory is accumulated. A kernel, \f$K(\zeta-\zeta(t'),\sigma)\f$,
60 : centered at the current value, \f$\zeta(t)\f$, of this quantity is generated with a bandwidth \f$\sigma\f$, which
61 : is set by the user. These kernels are then used to accumulate the ensemble average for the probability density:
62 :
63 : \f[
64 : \langle P(\zeta) \rangle = \frac{ \sum_{t'=0}^t w(t') K(\zeta-\zeta(t'),\sigma) }{ \sum_{t'=0}^t w(t') }
65 : \f]
66 :
67 : Here the sums run over a portion of the trajectory specified by the user. The final quantity evaluated is a weighted
68 : average as the weights, \f$w(t')\f$, allow us to negate the effect any bias might have on the region of phase space
69 : sampled by the system. This is discussed in the section of the manual on \ref Analysis.
70 :
71 : A discrete analogue of kernel density estimation can also be used. In this analogue the kernels in the above formula
72 : are replaced by Dirac delta functions. When this method is used the final function calculated is no longer a probability
73 : density - it is instead a probability mass function as each element of the function tells you the value of an integral
74 : between two points on your grid rather than the value of a (continuous) function on a grid.
75 :
76 : Additional material and examples can be also found in the tutorials \ref lugano-1.
77 :
78 : \par A note on block averaging and errors
79 :
80 : Some particularly important
81 : issues related to the convergence of histograms and the estimation of error bars around the ensemble averages you calculate are covered in \ref trieste-2.
82 : The technique for estimating error bars that is known as block averaging is introduced in this tutorial. The essence of this technique is that
83 : the trajectory is split into a set of blocks and separate ensemble averages are calculated from each separate block of data. If \f$\{A_i\}\f$ is
84 : the set of \f$N\f$ block averages that are obtained from this technique then the final error bar is calculated as:
85 :
86 : \f[
87 : \textrm{error} = \sqrt{ \frac{1}{N} \frac{1}{N-1} \sum_{i=1}^N (A_i^2 - \langle A \rangle )^2 } \qquad \textrm{where} \qquad \langle A \rangle = \frac{1}{N} \sum_{i=1}^N A_i
88 : \f]
89 :
90 : If the simulation is biased and reweighting is performed then life is a little more complex as each of the block averages should be calculated as a
91 : weighted average. Furthermore, the weights should be taken into account when the final ensemble and error bars are calculated. As such the error should be:
92 :
93 : \f[
94 : \textrm{error} = \sqrt{ \frac{1}{N} \frac{\sum_{i=1}^N W_i }{\sum_{i=1}^N W_i - \sum_{i=1}^N W_i^2 / \sum_{i=1}^N W_i} \sum_{i=1}^N W_i (A_i^2 - \langle A \rangle )^2 }
95 : \f]
96 :
97 : where \f$W_i\f$ is the sum of all the weights for the \f$i\f$th block of data.
98 :
99 : If we wish to calculate a normalized histogram we must calculate ensemble averages from our biased simulation using:
100 : \f[
101 : \langle H(x) \rangle = \frac{\sum_{t=1}^M w_t K( x - x_t,\sigma) }{\sum_{t=1}^M w_t}
102 : \f]
103 : where the sums runs over the trajectory, \f$w_t\f$ is the weight of the \f$t\f$th trajectory frame, \f$x_t\f$ is the value of the CV for the \f$t\f$th
104 : trajectory frame and \f$K\f$ is a kernel function centered on \f$x_t\f$ with bandwidth \f$\sigma\f$. The quantity that is evaluated is the value of the
105 : normalized histogram at point \f$x\f$. The following ensemble average will be calculated if you use the NORMALIZATION=true option in HISTOGRAM.
106 : If the ensemble average is calculated in this way we must calculate the associated error bars from our block averages using the second of the expressions
107 : above.
108 :
109 : A number of works have shown that when biased simulations are performed it is often better to calculate an estimate of the histogram that is not normalized using:
110 : \f[
111 : \langle H(x) \rangle = \frac{1}{M} \sum_{t=1}^M w_t K( x - x_t,\sigma)
112 : \f]
113 : instead of the expression above. As such this is what is done by default in HISTOGRAM or if the NORMALIZATION=ndata option is used.
114 : When the histogram is calculated in this second way the first of the two formula above can be used when calculating error bars from
115 : block averages.
116 :
117 : \par Examples
118 :
119 : The following input monitors two torsional angles during a simulation
120 : and outputs a continuous histogram as a function of them at the end of the simulation.
121 : \plumedfile
122 : TORSION ATOMS=1,2,3,4 LABEL=r1
123 : TORSION ATOMS=2,3,4,5 LABEL=r2
124 : HISTOGRAM ...
125 : ARG=r1,r2
126 : GRID_MIN=-3.14,-3.14
127 : GRID_MAX=3.14,3.14
128 : GRID_BIN=200,200
129 : BANDWIDTH=0.05,0.05
130 : LABEL=hh
131 : ... HISTOGRAM
132 :
133 : DUMPGRID GRID=hh FILE=histo
134 : \endplumedfile
135 :
136 : The following input monitors two torsional angles during a simulation
137 : and outputs a discrete histogram as a function of them at the end of the simulation.
138 : \plumedfile
139 : TORSION ATOMS=1,2,3,4 LABEL=r1
140 : TORSION ATOMS=2,3,4,5 LABEL=r2
141 : HISTOGRAM ...
142 : ARG=r1,r2
143 : KERNEL=DISCRETE
144 : GRID_MIN=-3.14,-3.14
145 : GRID_MAX=3.14,3.14
146 : GRID_BIN=200,200
147 : LABEL=hh
148 : ... HISTOGRAM
149 :
150 : DUMPGRID GRID=hh FILE=histo
151 : \endplumedfile
152 :
153 : The following input monitors two torsional angles during a simulation
154 : and outputs the histogram accumulated thus far every 100000 steps.
155 : \plumedfile
156 : TORSION ATOMS=1,2,3,4 LABEL=r1
157 : TORSION ATOMS=2,3,4,5 LABEL=r2
158 : HISTOGRAM ...
159 : ARG=r1,r2
160 : GRID_MIN=-3.14,-3.14
161 : GRID_MAX=3.14,3.14
162 : GRID_BIN=200,200
163 : BANDWIDTH=0.05,0.05
164 : LABEL=hh
165 : ... HISTOGRAM
166 :
167 : DUMPGRID GRID=hh FILE=histo STRIDE=100000
168 : \endplumedfile
169 :
170 : The following input monitors two torsional angles during a simulation
171 : and outputs a separate histogram for each 100000 steps worth of trajectory.
172 : Notice how the CLEAR keyword is used here and how it is not used in the
173 : previous example.
174 :
175 : \plumedfile
176 : TORSION ATOMS=1,2,3,4 LABEL=r1
177 : TORSION ATOMS=2,3,4,5 LABEL=r2
178 : HISTOGRAM ...
179 : ARG=r1,r2 CLEAR=100000
180 : GRID_MIN=-3.14,-3.14
181 : GRID_MAX=3.14,3.14
182 : GRID_BIN=200,200
183 : BANDWIDTH=0.05,0.05
184 : LABEL=hh
185 : ... HISTOGRAM
186 :
187 : DUMPGRID GRID=hh FILE=histo STRIDE=100000
188 : \endplumedfile
189 :
190 : */
191 : //+ENDPLUMEDOC
192 :
193 : namespace PLMD {
194 : namespace gridtools {
195 :
196 : class Histogram : public ActionShortcut {
197 : public:
198 : static void registerKeywords( Keywords& keys );
199 : explicit Histogram( const ActionOptions& );
200 : };
201 :
202 : PLUMED_REGISTER_ACTION(Histogram,"HISTOGRAM")
203 :
204 29 : void Histogram::registerKeywords( Keywords& keys ) {
205 58 : ActionShortcut::registerKeywords( keys ); keys.use("UPDATE_FROM"); keys.use("UPDATE_UNTIL");
206 58 : keys.add("compulsory","NORMALIZATION","ndata","This controls how the data is normalized it can be set equal to true, false or ndata. See above for an explanation");
207 58 : keys.addInputKeyword("optional","ARG","scalar/vector/matrix","the quantities that are being used to construct the histogram");
208 58 : keys.add("optional","DATA","an alternative to the ARG keyword");
209 58 : keys.add("compulsory","GRID_MIN","auto","the lower bounds for the grid");
210 58 : keys.add("compulsory","GRID_MAX","auto","the upper bounds for the grid");
211 58 : keys.add("optional","BANDWIDTH","the bandwidths for kernel density esimtation");
212 58 : keys.add("compulsory","KERNEL","GAUSSIAN","the kernel function you are using. More details on the kernels available "
213 : "in plumed plumed can be found in \\ref kernelfunctions.");
214 58 : keys.add("optional","GRID_BIN","the number of bins for the grid");
215 58 : keys.add("optional","GRID_SPACING","the approximate grid spacing (to be used as an alternative or together with GRID_BIN)");
216 58 : keys.add("optional","LOGWEIGHTS","the logarithm of the quantity to use as the weights when calculating averages");
217 58 : keys.add("compulsory","STRIDE","1","the frequency with which to store data for averaging");
218 58 : keys.add("compulsory","CLEAR","0","the frequency with whihc to clear the data that is being averaged");
219 58 : keys.setValueDescription("grid","the estimate of the histogram as a function of the argument that was obtained");
220 87 : keys.needsAction("COMBINE"); keys.needsAction("CUSTOM"); keys.needsAction("ONES");
221 58 : keys.needsAction("KDE"); keys.needsAction("ACCUMULATE");
222 29 : }
223 :
224 22 : Histogram::Histogram( const ActionOptions& ao ):
225 : Action(ao),
226 22 : ActionShortcut(ao)
227 : {
228 44 : std::string normflag; parse("NORMALIZATION",normflag);
229 88 : std::string lw; parse("LOGWEIGHTS",lw); std::string stride, clearstride; parse("STRIDE",stride); parse("CLEAR",clearstride);
230 28 : if( lw.length()>0 && normflag!="ndata" ) {
231 12 : readInputLine( getShortcutLabel() + "_wsum: COMBINE ARG=" + lw + " PERIODIC=NO");
232 12 : readInputLine( getShortcutLabel() + "_weight: CUSTOM ARG=" + getShortcutLabel() + "_wsum FUNC=exp(x) PERIODIC=NO");
233 32 : } else readInputLine( getShortcutLabel() + "_weight: ONES SIZE=1" );
234 :
235 44 : std::vector<std::string> arglist; parseVector("ARG",arglist); if( arglist.size()==0 ) parseVector("DATA",arglist);
236 22 : if( arglist.size()==0 ) error("arguments have not been specified use ARG");
237 22 : std::vector<Value*> theargs; ActionWithArguments::interpretArgumentList( arglist, plumed.getActionSet(), this, theargs );
238 22 : plumed_assert( theargs.size()>0 ); std::string argstr=theargs[0]->getName();
239 33 : for(unsigned i=1; i<theargs.size(); ++i) argstr += "," + theargs[i]->getName();
240 22 : std::string strnum; Tools::convert( theargs[0]->getNumberOfValues(), strnum );
241 22 : if( theargs[0]->getNumberOfValues()==1 ) {
242 : // Create the KDE object
243 38 : readInputLine( getShortcutLabel() + "_kde: KDE ARG=" + argstr + " " + convertInputLineToString() );
244 : } else {
245 : // Create the KDE object
246 6 : readInputLine( getShortcutLabel() + "_kde_u: KDE ARG=" + argstr + " " + convertInputLineToString() );
247 : // Normalise the KDE object
248 6 : readInputLine( getShortcutLabel() + "_kde: CUSTOM ARG=" + getShortcutLabel() + "_kde_u PERIODIC=NO FUNC=x/" + strnum );
249 : }
250 : // Now get the quantity to accumulate
251 44 : readInputLine( getShortcutLabel() + "_kdep: CUSTOM ARG=" + getShortcutLabel() + "_kde," + getShortcutLabel() + "_weight FUNC=x*y PERIODIC=NO");
252 : // And accumulate the average
253 22 : if( normflag=="false" ) {
254 14 : readInputLine( getShortcutLabel() + ": ACCUMULATE ARG=" + getShortcutLabel() + "_kdep STRIDE=" + stride + " CLEAR=" + clearstride + " " + getUpdateLimits() );
255 : } else {
256 30 : readInputLine( getShortcutLabel() + "_u: ACCUMULATE ARG=" + getShortcutLabel() + "_kdep STRIDE=" + stride + " CLEAR=" + clearstride + " " + getUpdateLimits() );
257 30 : readInputLine( getShortcutLabel() + "_nsum: ACCUMULATE ARG=" + getShortcutLabel() + "_weight STRIDE=" + stride + " CLEAR=" + clearstride + " " + getUpdateLimits() );
258 : // And divide by the total weight
259 30 : readInputLine( getShortcutLabel() + ": CUSTOM ARG=" + getShortcutLabel() + "_u," + getShortcutLabel() + "_nsum FUNC=x/y PERIODIC=NO");
260 : }
261 44 : }
262 :
263 : }
264 : }
|