This is part of the isdb module.
Calculates the backbone chemical shifts for a protein.
The functional form is that of CamShift [55]. The chemical shifts of the selected nuclei/residues are saved as components. Reference experimental values can also be stored as components. The two sets of components can then be used to calculate either a scoring function as in [76] [47], using the keyword CAMSHIFT, or ensemble-averaged chemical shifts as in [29] [30] (see ENSEMBLE, STATS and RESTRAINT). Finally, they can also be used as input for METAINFERENCE [16]. In the current implementation there is no need to pass the data to METAINFERENCE, because CS2BACKBONE can enable Metainference internally using the keyword DOSCORE.
The CamShift calculation is relatively heavy because it often involves a large number of atoms. To make it faster, it is currently parallelised with OpenMP.
As a general rule, when using CS2BACKBONE or other experimental restraints it is better to increase the accuracy of the constraint algorithm due to the increased strain on the bonded structure. In the case of GROMACS it is safer to use lincs-iter=2 and lincs-order=6.
In general the system for which chemical shifts are calculated must be completely included in ATOMS, and a TEMPLATE pdb file for the same atoms should be provided in the folder DATADIR. The atoms are made whole automatically unless NOPBC is used. In particular, if the system is made of multiple chains it is usually better to use NOPBC and make the molecule whole with WHOLEMOLECULES, selecting an appropriate order.
In addition to a pdb file one needs to provide a list of chemical shifts to be calculated, using one file per nucleus type (CAshifts.dat, CBshifts.dat, Cshifts.dat, Hshifts.dat, HAshifts.dat, Nshifts.dat). All six files should always be present. A chemical shift for a nucleus is calculated if a value greater than 0 is provided. For practical purposes the value can correspond to the experimental value. Residue numbers should go from 1 to N irrespective of the numbers used in the pdb file. The first and last residue of each chain should be preceded by a # character. Termini groups like ACE or NME should be removed from the PDB.
CAshifts.dat:
#1 0.0
2 55.5
3 58.4
.
.
#last 0.0
#last+1 (first) of second chain
.
#last of second chain
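The file layout described above can be generated with a small script. The following is a minimal sketch, not part of PLUMED: the function name `format_shift_file` and the convention of passing `None` for a shift that should not be calculated are assumptions made for illustration. It numbers residues from 1 to N across all chains, marks the first and last residue of each chain with `#`, and writes 0.0 wherever no shift is requested, following the example above.

```python
def format_shift_file(chains):
    """Return the text of one shift file (e.g. CAshifts.dat).

    chains: one list per chain, holding the experimental shift of each
    residue in order, or None when the shift should not be calculated.
    (Hypothetical helper; the input convention is an assumption.)
    """
    lines = []
    resnum = 0  # residues are numbered 1..N regardless of the pdb numbering
    for chain in chains:
        for i, shift in enumerate(chain):
            resnum += 1
            terminal = i == 0 or i == len(chain) - 1
            # first and last residue of each chain are preceded by '#'
            prefix = "#" if terminal else ""
            # 0.0 means "do not calculate"; terminal residues get 0.0 as in the example
            value = 0.0 if (terminal or shift is None) else shift
            lines.append(f"{prefix}{resnum} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # one chain of four residues, shifts known for residues 2 and 3
    print(format_shift_file([[None, 55.5, 58.4, None]]), end="")
```

The same function would be called once per nucleus type, writing the result to the corresponding file in DATADIR.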
The default behaviour is to store the values for the active nuclei in components (ca_#, cb_#, co_#, ha_#, hn_#, nh_# and expca_#, expcb_#, expco_#, expha_#, exphn_#, expnh_#). With NOEXP it is possible to store only the back-calculated values.
A pdb file is needed to generate a simple topology of the protein. For histidines in protonation states different from D the HIE/HSE or HIP/HSP names should be used. GLH and ASH can be used for the alternative protonation of GLU and ASP. Non-standard amino acids and other molecules are not yet supported, but in principle they can be named UNK. If multiple chains are present the chain identifier must be in the standard PDB format, together with the TER keyword at the end of each chain.
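Some of the template requirements above can be checked before starting a run. The sketch below is an illustration only and is not part of PLUMED; the function name `check_template` is hypothetical. It relies on the standard fixed-column PDB layout, in which the residue name occupies columns 18 to 20, to report any ACE or NME capping groups still present and to count the TER records.

```python
def check_template(pdb_lines):
    """Return (capping residue names found, number of TER records).

    Hypothetical pre-flight check: ACE/NME groups should be removed from
    the template, and each chain should end with a TER record.
    """
    caps = set()
    n_ter = 0
    for line in pdb_lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            resname = line[17:20].strip()  # PDB columns 18-20
            if resname in ("ACE", "NME"):
                caps.add(resname)
        elif record == "TER":
            n_ter += 1
    return sorted(caps), n_ter
```

For a usable template one would expect an empty list of capping groups and, for a multi-chain system, one TER record per chain.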
One more standard file is also needed in the folder DATADIR: camshift.db. This file includes all the CamShift parameters and can be found in regtest/isdb/rt-cs2backbone/data/ .
All the above files must be in a single folder that must be specified with the keyword DATADIR.
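Because all the files must sit in one folder, it can be convenient to verify DATADIR before launching a simulation. The following is a minimal sketch under stated assumptions: the helper names `missing_files` and `check_datadir` are hypothetical, and the template name defaults to the documented default of TEMPLATE. It lists which of the six shift files, camshift.db and the template pdb are absent.

```python
import os

# the six per-nucleus files that should always be present
SHIFT_FILES = [
    "CAshifts.dat", "CBshifts.dat", "Cshifts.dat",
    "Hshifts.dat", "HAshifts.dat", "Nshifts.dat",
]


def missing_files(present, template="template.pdb"):
    """Return the required file names that are not in `present`."""
    required = SHIFT_FILES + ["camshift.db", template]
    return [name for name in required if name not in present]


def check_datadir(datadir="data/", template="template.pdb"):
    """Apply missing_files to the contents of an existing folder."""
    return missing_files(set(os.listdir(datadir)), template)
```

An empty list returned by `check_datadir` indicates that every file named in this section was found in the folder.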
Additional material and examples can also be found in the Belfast tutorial: NMR restraints.
The names of the components in this action can be customized by the user in the action's input file. However, in addition to these customizable components, the following quantities will always be output:
Quantity | Description |
sigma | uncertainty parameter |
sigmaMean | uncertainty in the mean estimate |
acceptSigma | MC acceptance |
ha | the calculated Ha hydrogen chemical shifts |
hn | the calculated H hydrogen chemical shifts |
nh | the calculated N nitrogen chemical shifts |
ca | the calculated Ca carbon chemical shifts |
cb | the calculated Cb carbon chemical shifts |
co | the calculated C' carbon chemical shifts |
expha | the experimental Ha hydrogen chemical shifts |
exphn | the experimental H hydrogen chemical shifts |
expnh | the experimental N nitrogen chemical shifts |
expca | the experimental Ca carbon chemical shifts |
expcb | the experimental Cb carbon chemical shifts |
expco | the experimental C' carbon chemical shifts |
In addition the following quantities can be calculated by employing the keywords listed below
Quantity | Keyword | Description |
acceptScale | SCALEDATA | MC acceptance |
weight | REWEIGHT | weights of the weighted average |
biasDer | REWEIGHT | derivatives wrt the bias |
scale | SCALEDATA | scale parameter |
offset | ADDOFFSET | offset parameter |
ftilde | GENERIC | ensemble average estimator |
ATOMS | The atoms to be included in the calculation, e.g. the whole protein. For more information on how to specify lists of atoms see Groups and Virtual Atoms |
NOISETYPE | ( default=MGAUSS ) functional form of the noise (GAUSS,MGAUSS,OUTLIERS,MOUTLIERS,GENERIC) |
LIKELIHOOD | ( default=GAUSS ) the likelihood for the GENERIC metainference model, GAUSS or LOGN |
DFTILDE | ( default=0.1 ) fraction of sigma_mean used to evolve ftilde |
SCALE0 | ( default=1.0 ) initial value of the scaling factor |
SCALE_PRIOR | ( default=FLAT ) either FLAT or GAUSSIAN |
OFFSET0 | ( default=0.0 ) initial value of the offset |
OFFSET_PRIOR | ( default=FLAT ) either FLAT or GAUSSIAN |
SIGMA0 | ( default=1.0 ) initial value of the uncertainty parameter |
SIGMA_MIN | ( default=0.0 ) minimum value of the uncertainty parameter |
SIGMA_MAX | ( default=10. ) maximum value of the uncertainty parameter |
OPTSIGMAMEAN | ( default=NONE ) Set to NONE/SEM to manually set sigma mean, or to estimate it on the fly |
WRITE_STRIDE | ( default=1000 ) write the status to a file every N steps, this can be used for restart/continuation |
DATADIR | ( default=data/ ) The folder with the experimental chemical shifts. |
TEMPLATE | ( default=template.pdb ) A PDB file of the protein system to initialise ALMOST. |
NEIGH_FREQ | ( default=20 ) Period in steps for neighbour list update. |
NRES | Number of residues, corresponding to the number of chemical shifts. |
NUMERICAL_DERIVATIVES | ( default=off ) calculate the derivatives for these quantities numerically |
DOSCORE | ( default=off ) activate metainference |
NOENSEMBLE | ( default=off ) don't perform any replica-averaging |
REWEIGHT | ( default=off ) simple REWEIGHT using the ARG as energy |
SCALEDATA | ( default=off ) Set to TRUE if you want to sample a scaling factor common to all values and replicas |
ADDOFFSET | ( default=off ) Set to TRUE if you want to sample an offset common to all values and replicas |
NOPBC | ( default=off ) ignore the periodic boundary conditions when calculating distances |
CAMSHIFT | ( default=off ) Set to TRUE if you want to calculate a single CamShift score. |
NOEXP | ( default=off ) Set to TRUE if you don't want to have fixed components with the experimental values. |
ARG | the input for this action is the scalar output from one or more other actions. The particular scalars that you will use are referenced using the label of the action. If the label appears on its own then it is assumed that the Action calculates a single scalar value. The value of this scalar is thus used as the input to this new action. If * or *.* appears the scalars calculated by all the preceding actions in the input file are taken. Some actions have multi-component outputs and each component of the output has a specific label. For example a DISTANCE action labelled dist may have three components x, y and z. To take just the x component you should use dist.x; if you wish to take all three components then use dist.*. More information on the referencing of Actions can be found in the section of the manual on the PLUMED Getting Started. Scalar values can also be referenced using POSIX regular expressions as detailed in the section on Regular Expressions. To use this feature you must compile PLUMED with the appropriate flag. You can use multiple instances of this keyword i.e. ARG1, ARG2, ARG3... |
AVERAGING | Stride for calculation of averaged weights and sigma_mean |
SCALE_MIN | minimum value of the scaling factor |
SCALE_MAX | maximum value of the scaling factor |
DSCALE | maximum MC move of the scaling factor |
OFFSET_MIN | minimum value of the offset |
OFFSET_MAX | maximum value of the offset |
DOFFSET | maximum MC move of the offset |
DSIGMA | maximum MC move of the uncertainty parameter |
SIGMA_MEAN0 | starting value for the uncertainty in the mean estimate |
TEMP | the system temperature - this is only needed if the code doesn't pass the temperature to plumed |
MC_STEPS | number of MC steps |
MC_STRIDE | MC stride |
MC_CHUNKSIZE | MC chunksize |
STATUS_FILE | write a file with all the data useful for restart/continuation of Metainference |
SELECTOR | name of selector |
NSELECT | range of values for selector [0, N-1] |
RESTART | allows per-action setting of restart (YES/NO/AUTO) |
In this first example the chemical shifts are used to calculate a scoring function to be used in NMR driven Metadynamics [47] :
whole: GROUP ATOMS=2612-2514:-1,961-1:-1,2466-962:-1,2513-2467:-1
WHOLEMOLECULES ENTITY0=whole
cs: CS2BACKBONE ATOMS=1-2612 NRES=176 DATADIR=../data/ TEMPLATE=template.pdb CAMSHIFT NOPBC
metad: METAD ARG=cs HEIGHT=0.5 SIGMA=0.1 PACE=200 BIASFACTOR=10
PRINT ARG=cs,metad.bias FILE=COLVAR STRIDE=100
In this second example the chemical shifts are used as replica-averaged restraints as in [29] [30].
cs: CS2BACKBONE ATOMS=1-174 DATADIR=data/ NRES=13
encs: ENSEMBLE ARG=(cs\.hn_.*),(cs\.nh_.*)
stcs: STATS ARG=encs.* SQDEVSUM PARARG=(cs\.exphn_.*),(cs\.expnh_.*)
RESTRAINT ARG=stcs.sqdevsum AT=0 KAPPA=0 SLOPE=24
PRINT ARG=(cs\.hn_.*),(cs\.nh_.*) FILE=RESTRAINT STRIDE=100
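To make the arithmetic behind the second example concrete, the sketch below reproduces it outside PLUMED for a handful of numbers. It is an illustration under stated assumptions, not the PLUMED implementation: it assumes ENSEMBLE takes a plain (unweighted) average over replicas, that SQDEVSUM is the sum of squared deviations between those averages and the experimental values, and that RESTRAINT applies kappa/2 (x - at)^2 + slope (x - at). The function names are hypothetical.

```python
def sqdevsum(replica_values, experimental):
    """Sum of squared deviations of replica-averaged shifts from experiment.

    replica_values: one list of back-calculated shifts per replica.
    experimental:   the reference shifts, in the same order.
    """
    n_rep = len(replica_values)
    # unweighted average over replicas for each chemical shift
    averages = [sum(column) / n_rep for column in zip(*replica_values)]
    return sum((a - e) ** 2 for a, e in zip(averages, experimental))


def restraint_bias(x, at=0.0, kappa=0.0, slope=24.0):
    """Harmonic plus linear restraint; with kappa=0 only the linear term remains."""
    d = x - at
    return 0.5 * kappa * d * d + slope * d


if __name__ == "__main__":
    # two replicas, one H and one N shift each (made-up numbers)
    replicas = [[8.0, 120.0], [8.5, 122.0]]
    experiment = [8.25, 120.0]
    x = sqdevsum(replicas, experiment)
    print(x, restraint_bias(x))
```

With KAPPA=0 and AT=0, as in the input above, the bias grows linearly with the summed squared deviation, so SLOPE sets the strength of the restraint.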
This third example shows how to use chemical shifts to calculate a METAINFERENCE score.
cs: CS2BACKBONE ATOMS=1-174 DATADIR=data/ NRES=13 DOSCORE NDATA=24
csbias: BIASVALUE ARG=cs.score
PRINT ARG=(cs\.hn_.*),(cs\.nh_.*) FILE=CS.dat STRIDE=1000
PRINT ARG=cs.score FILE=BIAS STRIDE=100