LCOV - code coverage report
Current view: top level - gridtools - Histogram.cpp (source / functions) Hit Total Coverage
Test: plumed test coverage Lines: 43 43 100.0 %
Date: 2024-10-18 14:00:25 Functions: 2 3 66.7 %

          Line data    Source code
       1             : /* +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
       2             :    Copyright (c) 2012-2023 The plumed team
       3             :    (see the PEOPLE file at the root of the distribution for a list of names)
       4             : 
       5             :    See http://www.plumed.org for more information.
       6             : 
       7             :    This file is part of plumed, version 2.
       8             : 
       9             :    plumed is free software: you can redistribute it and/or modify
      10             :    it under the terms of the GNU Lesser General Public License as published by
      11             :    the Free Software Foundation, either version 3 of the License, or
      12             :    (at your option) any later version.
      13             : 
      14             :    plumed is distributed in the hope that it will be useful,
      15             :    but WITHOUT ANY WARRANTY; without even the implied warranty of
      16             :    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
      17             :    GNU Lesser General Public License for more details.
      18             : 
      19             :    You should have received a copy of the GNU Lesser General Public License
      20             :    along with plumed.  If not, see <http://www.gnu.org/licenses/>.
      21             : +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
      22             : #include "core/ActionRegister.h"
      23             : #include "core/ActionShortcut.h"
      24             : #include "core/ActionWithValue.h"
      25             : #include "core/ActionWithArguments.h"
      26             : #include "core/PlumedMain.h"
      27             : #include "core/ActionSet.h"
      28             : 
      29             : //+PLUMEDOC GRIDCALC HISTOGRAM
      30             : /*
      31             : Accumulate the average probability density along a few CVs from a trajectory.
      32             : 
      33             : When using this method it is supposed that you have some collective variable \f$\zeta\f$ that
      34             : gives a reasonable description of some physical or chemical phenomenon.  As an example of what we
      35             : mean by this suppose you wish to examine the following SN2 reaction:
      36             : 
      37             : \f[
      38             :  \textrm{OH}^- + \textrm{CH}_3Cl  \rightarrow \textrm{CH}_3OH + \textrm{Cl}^-
      39             : \f]
      40             : 
      41             : The distance between the chlorine atom and the carbon is an excellent collective variable, \f$\zeta\f$,
      42             : in this case because this distance is short for the reactant, \f$\textrm{CH}_3Cl\f$, because the carbon
      43             : and chlorine are chemically bonded, and because it is long for the product state when these two atoms are
      44             : not chemically bonded.  We thus might want to accumulate the probability density, \f$P(\zeta)\f$, as a function of this distance
      45             : as this will provide us with information about the overall likelihood of the reaction.   Furthermore, the
      46             : free energy, \f$F(\zeta)\f$, is related to this probability density via:
      47             : 
      48             : \f[
      49             : F(\zeta) = - k_B T \ln P(\zeta)
      50             : \f]
      51             : 
      52             : Accumulating these probability densities is precisely what this Action can be used to do.  Furthermore, the conversion
      53             : of the histogram to the free energy can be achieved by using the method \ref CONVERT_TO_FES.
      54             : 
      55             : We calculate histograms within PLUMED using a method known as kernel density estimation, which you can read more about here:
      56             : 
      57             : https://en.wikipedia.org/wiki/Kernel_density_estimation
      58             : 
      59             : In PLUMED the value of \f$\zeta\f$ at each discrete instant in time in the trajectory is accumulated.  A kernel, \f$K(\zeta-\zeta(t'),\sigma)\f$,
      60             : centered at the current value, \f$\zeta(t)\f$, of this quantity is generated with a bandwidth \f$\sigma\f$, which
      61             : is set by the user.  These kernels are then used to accumulate the ensemble average for the probability density:
      62             : 
      63             : \f[
      64             : \langle P(\zeta) \rangle = \frac{ \sum_{t'=0}^t w(t') K(\zeta-\zeta(t'),\sigma) }{ \sum_{t'=0}^t w(t') }
      65             : \f]
      66             : 
      67             : Here the sums run over a portion of the trajectory specified by the user.  The final quantity evaluated is a weighted
      68             : average as the weights, \f$w(t')\f$, allow us to negate the effect any bias might have on the region of phase space
      69             : sampled by the system.  This is discussed in the section of the manual on \ref Analysis.
      70             : 
      71             : A discrete analogue of kernel density estimation can also be used.  In this analogue the kernels in the above formula
      72             : are replaced by Dirac delta functions.   When this method is used the final function calculated is no longer a probability
      73             : density - it is instead a probability mass function as each element of the function tells you the value of an integral
      74             : between two points on your grid rather than the value of a (continuous) function on a grid.
      75             : 
      76             : Additional material and examples can be also found in the tutorials \ref lugano-1.
      77             : 
      78             : \par A note on block averaging and errors
      79             : 
      80             : Some particularly important
      81             : issues related to the convergence of histograms and the estimation of error bars around the ensemble averages you calculate are covered in \ref trieste-2.
      82             : The technique for estimating error bars that is known as block averaging is introduced in this tutorial.  The essence of this technique is that
      83             : the trajectory is split into a set of blocks and separate ensemble averages are calculated from each separate block of data.  If \f$\{A_i\}\f$ is
      84             : the set of \f$N\f$ block averages that are obtained from this technique then the final error bar is calculated as:
      85             : 
      86             : \f[
      87             : \textrm{error} = \sqrt{ \frac{1}{N} \frac{1}{N-1} \sum_{i=1}^N (A_i^2 - \langle A \rangle )^2 } \qquad \textrm{where} \qquad \langle A \rangle = \frac{1}{N} \sum_{i=1}^N A_i
      88             : \f]
      89             : 
      90             : If the simulation is biased and reweighting is performed then life is a little more complex as each of the block averages should be calculated as a
      91             : weighted average.  Furthermore, the weights should be taken into account when the final ensemble and error bars are calculated.  As such the error should be:
      92             : 
      93             : \f[
      94             : \textrm{error} = \sqrt{ \frac{1}{N} \frac{\sum_{i=1}^N W_i }{\sum_{i=1}^N W_i - \sum_{i=1}^N W_i^2 / \sum_{i=1}^N W_i} \sum_{i=1}^N W_i (A_i^2 - \langle A \rangle )^2 }
      95             : \f]
      96             : 
      97             : where \f$W_i\f$ is the sum of all the weights for the \f$i\f$th block of data.
      98             : 
      99             : If we wish to calculate a normalized histogram we must calculate ensemble averages from our biased simulation using:
     100             : \f[
     101             :  \langle H(x) \rangle = \frac{\sum_{t=1}^M w_t K( x - x_t,\sigma) }{\sum_{t=1}^M w_t}
     102             : \f]
     103             : where the sums runs over the trajectory, \f$w_t\f$ is the weight of the \f$t\f$th trajectory frame, \f$x_t\f$ is the value of the CV for the \f$t\f$th
     104             : trajectory frame and \f$K\f$ is a kernel function centered on \f$x_t\f$ with bandwidth \f$\sigma\f$.  The quantity that is evaluated is the value of the
     105             : normalized histogram at point \f$x\f$.  The following ensemble average will be calculated if you use the NORMALIZATION=true option in HISTOGRAM.
     106             : If the ensemble average is calculated in this way we must calculate the associated error bars from our block averages using the second of the expressions
     107             : above.
     108             : 
     109             : A number of works have shown that when biased simulations are performed it is often better to calculate an estimate of the histogram that is not normalized using:
     110             : \f[
     111             : \langle H(x) \rangle = \frac{1}{M} \sum_{t=1}^M w_t K( x - x_t,\sigma)
     112             : \f]
     113             : instead of the expression above.  As such this is what is done by default in HISTOGRAM or if the NORMALIZATION=ndata option is used.
     114             : When the histogram is calculated in this second way the first of the two formula above can be used when calculating error bars from
     115             : block averages.
     116             : 
     117             : \par Examples
     118             : 
     119             : The following input monitors two torsional angles during a simulation
     120             : and outputs a continuous histogram as a function of them at the end of the simulation.
     121             : \plumedfile
     122             : TORSION ATOMS=1,2,3,4 LABEL=r1
     123             : TORSION ATOMS=2,3,4,5 LABEL=r2
     124             : HISTOGRAM ...
     125             :   ARG=r1,r2
     126             :   GRID_MIN=-3.14,-3.14
     127             :   GRID_MAX=3.14,3.14
     128             :   GRID_BIN=200,200
     129             :   BANDWIDTH=0.05,0.05
     130             :   LABEL=hh
     131             : ... HISTOGRAM
     132             : 
     133             : DUMPGRID GRID=hh FILE=histo
     134             : \endplumedfile
     135             : 
     136             : The following input monitors two torsional angles during a simulation
     137             : and outputs a discrete histogram as a function of them at the end of the simulation.
     138             : \plumedfile
     139             : TORSION ATOMS=1,2,3,4 LABEL=r1
     140             : TORSION ATOMS=2,3,4,5 LABEL=r2
     141             : HISTOGRAM ...
     142             :   ARG=r1,r2
     143             :   KERNEL=DISCRETE
     144             :   GRID_MIN=-3.14,-3.14
     145             :   GRID_MAX=3.14,3.14
     146             :   GRID_BIN=200,200
     147             :   LABEL=hh
     148             : ... HISTOGRAM
     149             : 
     150             : DUMPGRID GRID=hh FILE=histo
     151             : \endplumedfile
     152             : 
     153             : The following input monitors two torsional angles during a simulation
     154             : and outputs the histogram accumulated thus far every 100000 steps.
     155             : \plumedfile
     156             : TORSION ATOMS=1,2,3,4 LABEL=r1
     157             : TORSION ATOMS=2,3,4,5 LABEL=r2
     158             : HISTOGRAM ...
     159             :   ARG=r1,r2
     160             :   GRID_MIN=-3.14,-3.14
     161             :   GRID_MAX=3.14,3.14
     162             :   GRID_BIN=200,200
     163             :   BANDWIDTH=0.05,0.05
     164             :   LABEL=hh
     165             : ... HISTOGRAM
     166             : 
     167             : DUMPGRID GRID=hh FILE=histo STRIDE=100000
     168             : \endplumedfile
     169             : 
     170             : The following input monitors two torsional angles during a simulation
     171             : and outputs a separate histogram for each 100000 steps worth of trajectory.
     172             : Notice how the CLEAR keyword is used here and how it is not used in the
     173             : previous example.
     174             : 
     175             : \plumedfile
     176             : TORSION ATOMS=1,2,3,4 LABEL=r1
     177             : TORSION ATOMS=2,3,4,5 LABEL=r2
     178             : HISTOGRAM ...
     179             :   ARG=r1,r2 CLEAR=100000
     180             :   GRID_MIN=-3.14,-3.14
     181             :   GRID_MAX=3.14,3.14
     182             :   GRID_BIN=200,200
     183             :   BANDWIDTH=0.05,0.05
     184             :   LABEL=hh
     185             : ... HISTOGRAM
     186             : 
     187             : DUMPGRID GRID=hh FILE=histo STRIDE=100000
     188             : \endplumedfile
     189             : 
     190             : */
     191             : //+ENDPLUMEDOC
     192             : 
     193             : namespace PLMD {
     194             : namespace gridtools {
     195             : 
     196             : class Histogram : public ActionShortcut {
     197             : public:
     198             :   static void registerKeywords( Keywords& keys );
     199             :   explicit Histogram( const ActionOptions& );
     200             : };
     201             : 
     202             : PLUMED_REGISTER_ACTION(Histogram,"HISTOGRAM")
     203             : 
     204          29 : void Histogram::registerKeywords( Keywords& keys ) {
     205          58 :   ActionShortcut::registerKeywords( keys ); keys.use("UPDATE_FROM"); keys.use("UPDATE_UNTIL");
     206          58 :   keys.add("compulsory","NORMALIZATION","ndata","This controls how the data is normalized it can be set equal to true, false or ndata.  See above for an explanation");
     207          58 :   keys.add("optional","ARG","the quantity that is being averaged");
     208          58 :   keys.add("optional","DATA","an alternative to the ARG keyword");
     209          58 :   keys.add("compulsory","GRID_MIN","auto","the lower bounds for the grid");
     210          58 :   keys.add("compulsory","GRID_MAX","auto","the upper bounds for the grid");
     211          58 :   keys.add("optional","BANDWIDTH","the bandwidths for kernel density esimtation");
     212          58 :   keys.add("compulsory","KERNEL","GAUSSIAN","the kernel function you are using.  More details on  the kernels available "
     213             :            "in plumed plumed can be found in \\ref kernelfunctions.");
     214          58 :   keys.add("optional","GRID_BIN","the number of bins for the grid");
     215          58 :   keys.add("optional","GRID_SPACING","the approximate grid spacing (to be used as an alternative or together with GRID_BIN)");
     216          58 :   keys.add("optional","LOGWEIGHTS","the logarithm of the quantity to use as the weights when calculating averages");
     217          58 :   keys.add("compulsory","STRIDE","1","the frequency with which to store data for averaging");
     218          58 :   keys.add("compulsory","CLEAR","0","the frequency with whihc to clear the data that is being averaged");
     219          29 :   keys.setValueDescription("the estimate of the histogram as a function of the argument that was obtained");
     220          87 :   keys.needsAction("COMBINE"); keys.needsAction("CUSTOM"); keys.needsAction("ONES");
     221          58 :   keys.needsAction("KDE"); keys.needsAction("ACCUMULATE");
     222          29 : }
     223             : 
     224          22 : Histogram::Histogram( const ActionOptions& ao ):
     225             :   Action(ao),
     226          22 :   ActionShortcut(ao)
     227             : {
     228          44 :   std::string normflag; parse("NORMALIZATION",normflag);
     229          88 :   std::string lw; parse("LOGWEIGHTS",lw); std::string stride, clearstride; parse("STRIDE",stride); parse("CLEAR",clearstride);
     230          28 :   if( lw.length()>0 && normflag!="ndata" ) {
     231          12 :     readInputLine( getShortcutLabel() + "_wsum: COMBINE ARG=" + lw + " PERIODIC=NO");
     232          12 :     readInputLine( getShortcutLabel() + "_weight: CUSTOM ARG=" + getShortcutLabel() + "_wsum FUNC=exp(x) PERIODIC=NO");
     233          32 :   } else readInputLine( getShortcutLabel() + "_weight: ONES SIZE=1" );
     234             : 
     235          44 :   std::vector<std::string> arglist; parseVector("ARG",arglist); if( arglist.size()==0 ) parseVector("DATA",arglist);
     236          22 :   if( arglist.size()==0 ) error("arguments have not been specified use ARG");
     237          22 :   std::vector<Value*> theargs; ActionWithArguments::interpretArgumentList( arglist, plumed.getActionSet(), this, theargs );
     238          22 :   plumed_assert( theargs.size()>0 ); std::string argstr=theargs[0]->getName();
     239          33 :   for(unsigned i=1; i<theargs.size(); ++i) argstr += "," + theargs[i]->getName();
     240          22 :   std::string strnum; Tools::convert( theargs[0]->getNumberOfValues(), strnum );
     241          22 :   if( theargs[0]->getNumberOfValues()==1 ) {
     242             :     // Create the KDE object
     243          38 :     readInputLine( getShortcutLabel() + "_kde: KDE ARG=" + argstr + " " + convertInputLineToString() );
     244             :   } else {
     245             :     // Create the KDE object
     246           6 :     readInputLine( getShortcutLabel() + "_kde_u: KDE ARG=" + argstr + " " + convertInputLineToString() );
     247             :     // Normalise the KDE object
     248           6 :     readInputLine( getShortcutLabel() + "_kde: CUSTOM ARG=" + getShortcutLabel() + "_kde_u PERIODIC=NO FUNC=x/" + strnum );
     249             :   }
     250             :   // Now get the quantity to accumulate
     251          44 :   readInputLine( getShortcutLabel() + "_kdep: CUSTOM ARG=" + getShortcutLabel() + "_kde," + getShortcutLabel() + "_weight FUNC=x*y PERIODIC=NO");
     252             :   // And accumulate the average
     253          22 :   if( normflag=="false" ) {
     254          14 :     readInputLine( getShortcutLabel() + ": ACCUMULATE ARG=" + getShortcutLabel() + "_kdep STRIDE=" + stride + " CLEAR=" + clearstride + " " + getUpdateLimits() );
     255             :   } else {
     256          30 :     readInputLine( getShortcutLabel() + "_u: ACCUMULATE ARG=" + getShortcutLabel() + "_kdep STRIDE=" + stride + " CLEAR=" + clearstride + " " + getUpdateLimits() );
     257          30 :     readInputLine( getShortcutLabel() + "_nsum: ACCUMULATE ARG=" + getShortcutLabel() + "_weight STRIDE=" + stride + " CLEAR=" + clearstride + " " + getUpdateLimits() );
     258             :     // And divide by the total weight
     259          30 :     readInputLine( getShortcutLabel() + ": CUSTOM ARG=" + getShortcutLabel() + "_u," + getShortcutLabel() + "_nsum FUNC=x/y PERIODIC=NO");
     260             :   }
     261          44 : }
     262             : 
     263             : }
     264             : }

Generated by: LCOV version 1.16