This is part of the cltools module

benchmark is a lightweight reimplementation of driver focused on running benchmarks.

The main difference with respect to driver is that it generates a trajectory in memory rather than reading it from a file. This makes it possible to time the overhead of the plumed library more accurately, without including the time needed to read the trajectory.

It is also possible to load a separate version of the plumed kernel. This enables running benchmarks against previous plumed versions in a controlled setting, where systematic errors in the comparison are minimized.

Examples

First, you should create a sample plumed.dat file for testing. For instance:

WHOLEMOLECULES ENTITY0=1-10000
p: POSITION ATOM=1
RESTRAINT ARG=p.x KAPPA=1 AT=0

Then you can test the performance of this input with the following command:

plumed benchmark

You can also test a different (older) version of PLUMED with the same input. To do so, you should run

plumed-runtime benchmark --kernel /path/to/lib/libplumedKernel.so

Warning: It is necessary to use the plumed-runtime executable here to avoid conflicts between different plumed versions. You will find it in your path if you are using the non-installed version of plumed, and in $prefix/lib/plumed if you installed plumed in $prefix.

Comparing multiple versions

The best way to compare two versions of plumed on the same input is to pass multiple colon-separated kernels:

plumed-runtime benchmark --kernel /path/to/lib/libplumedKernel.so:/path2/to/lib/libplumedKernel.so:this

Here this means the kernel of the version with which you are running the benchmark. This comparison runs the three instances simultaneously (alternating them) so that systematic differences in the load of your machine will affect them to the same extent.
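The reason alternating helps can be illustrated with a minimal Python sketch. It is not the code used by benchmark: the function name `run_interleaved` and the idea of representing each instance as a plain callable are assumptions made only for illustration. Each step is executed once per instance before moving to the next step, so a temporary change in machine load hits all instances at roughly the same point in the run.

```python
import time

def run_interleaved(instances, nsteps):
    """Time several callables, alternating between them at every step.

    `instances` maps a (hypothetical) label to a zero-argument callable
    that performs one step of work. Returns the per-step timings for
    each label, in seconds.
    """
    timings = {name: [] for name in instances}
    for _ in range(nsteps):
        # One step of every instance before advancing, so that drifts in
        # machine load affect all instances to a similar extent.
        for name, step in instances.items():
            start = time.perf_counter()
            step()
            timings[name].append(time.perf_counter() - start)
    return timings
```

Running the instances one after the other instead would let a slow period of the machine penalize only whichever instance happened to be running at that moment.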

In case the different versions require modified plumed.dat files, or if you simply want to compare two different plumed input files that compute the same thing, you can also use multiple plumed input files:

plumed-runtime benchmark --kernel /path/to/lib/libplumedKernel.so:this --plumed plumed1.dat:plumed2.dat

Similarly, you might want to run two different inputs using the same kernel, which can be obtained with:

plumed-runtime benchmark --plumed plumed1.dat:plumed2.dat

Profiling

If you want to attach a profiler on the fly to the process, you might find it convenient to use --nsteps -1. The simulation will run forever and can be interrupted with CTRL-C. When interrupted, the result of the timers should be displayed anyway. You can also set a maximum run time with --maxtime.
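How the two limits combine can be sketched as follows. This is only an illustration of the documented semantics (a value of -1 disables the corresponding limit); the function `should_stop` is hypothetical and does not reproduce the actual implementation.

```python
def should_stop(step, elapsed, nsteps=2000, maxtime=-1.0):
    """Return True once either limit is reached.

    `step` is the number of completed steps and `elapsed` the wall-clock
    time in seconds. A negative limit (such as the documented -1) is
    treated as "no limit".
    """
    if nsteps >= 0 and step >= nsteps:
        return True
    if maxtime >= 0 and elapsed >= maxtime:
        return True
    return False
```

With both limits set to -1, the function never returns True, which corresponds to a run that only ends when it is interrupted.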

If you run a profiler while testing multiple PLUMED versions, it may be hard to tell which function comes from which version. It is recommended to recompile the separate instances with a separate C++ namespace (-DPLMD=PLUMED_version_1) so that you will be able to distinguish them. In addition, compiling with CXXFLAGS="-g -O3" will make the profiling report more complete, likely including code lines.

MPI runs

You can also emulate a domain decomposition. This is done automatically if plumed has been compiled with MPI and you run with mpirun:

mpirun -np 4 plumed-runtime benchmark

If you load separate PLUMED instances as discussed above, they should all be compiled against the same MPI version. Notice that when using MPI, signals (CTRL-C) might not work.

Since some of the data transfer could happen asynchronously, you might want to use the --sleep option to simulate a lag between the prepareCalc and performCalc actions. This part of the calculation will not contribute to the timers, but will obviously slow down your test.

Output

In the output you will see the usual reports about timing produced by the internal timers of the tested plumed instances. In addition, this tool will monitor the timing externally, using slightly different criteria:

First, the initialization (construction of the input) will be shown with a separate timer, as well as the timing for the first step.
Second, the timer corresponding to the calculation will be split in three parts, reporting execution of the first 20% (warm-up) and the next two blocks of 40% each.
Finally, you might notice some discrepancy because some of the actions that are usually not expensive are not included in the internal timers. The external timer will thus provide a better estimate of the total elapsed time, including everything.
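The 20% / 40% / 40% split described above can be sketched in Python as follows. The function name `split_blocks` and the way the boundaries are rounded are assumptions; the exact boundaries used by the tool may differ.

```python
def split_blocks(step_times):
    """Split per-step timings into warm-up and two equal blocks.

    The first 20% of the steps is the warm-up; the remaining steps are
    divided into two halves (about 40% of the total each). Integer
    division is an assumption about how the boundaries are rounded.
    """
    n = len(step_times)
    warm_end = n // 5                          # end of the first 20%
    half_end = warm_end + (n - warm_end) // 2  # midpoint of the remainder
    return {
        "warm-up": sum(step_times[:warm_end]),
        "first block": sum(step_times[warm_end:half_end]),
        "second block": sum(step_times[half_end:]),
    }
```

Comparing the two 40% blocks gives a quick check that the timings have stabilized after the warm-up.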

The internal timers are still useful to monitor what happens at the different stages and, with DEBUG DETAILED_TIMERS, what happens in each action.

When you run multiple versions, a comparative analysis of the time spent within PLUMED in the various instances will be done, showing the ratio between the total time and the time measured on the first instance, which will act as a reference. Errors will be estimated with bootstrapping. The warm-up phase will be discarded for this analysis.
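As an illustration of how a bootstrap error on such a ratio can be obtained, here is a minimal Python sketch. It is not the code used by benchmark: the function `bootstrap_ratio`, the number of resamples, and the choice of resampling the same step indices for both instances (a paired bootstrap) are all assumptions.

```python
import random
import statistics

def bootstrap_ratio(reference, other, nboot=1000, seed=0):
    """Estimate total(other) / total(reference) with a bootstrap error.

    `reference` and `other` are per-step timings of equal length, with
    the warm-up steps already discarded. Steps are resampled with
    replacement, using the same indices for both instances.
    """
    rng = random.Random(seed)
    n = len(reference)
    ratios = []
    for _ in range(nboot):
        idx = [rng.randrange(n) for _ in range(n)]
        ref_total = sum(reference[i] for i in idx)
        oth_total = sum(other[i] for i in idx)
        ratios.append(oth_total / ref_total)
    # Point estimate from the full data, error from the resample spread.
    return sum(other) / sum(reference), statistics.stdev(ratios)
```

A ratio above 1 means the instance spent more time than the reference; the difference is only meaningful if it is larger than the estimated error.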

Glossary of keywords and components

Compulsory keywords

`--plumed`	( default=plumed.dat ) colon separated path(s) to the input file(s)
`--kernel`	( default=this ) colon separated path(s) to kernel(s)
`--natoms`	( default=100000 ) the number of atoms to use for the simulation
`--nsteps`	( default=2000 ) number of steps of MD to perform (-1 means forever)
`--maxtime`	( default=-1 ) maximum number of seconds (-1 means forever)
`--sleep`	( default=0 ) number of seconds of sleep, mimicking MD calculation
`--atom-distribution`	( default=line ) the kind of possible atomic displacement at each step

Options

`--help/-h`	( default=off ) print this help
`--domain-decomposition`	( default=off ) simulate domain decomposition, implies --shuffled
`--shuffled`	( default=off ) reshuffle atoms
`--dump-trajectory`	dump the trajectory to this file
