This is part of the isdb module.
Calculate the fit of a structure or ensemble of structures with a cryo-EM density map.
This action implements the multi-scale Bayesian approach to cryo-EM data fitting introduced in Ref. [60]. This method allows efficient and accurate structural modeling of cryo-electron microscopy density maps at multiple scales, from coarse-grained to atomistic resolution, by addressing the presence of random and systematic errors in the data, sample heterogeneity, data correlation, and noise correlation.
The experimental density map is fit by a Gaussian Mixture Model (GMM), which is provided as an external file specified by the keyword GMM_FILE. We are currently working on a web server to perform this operation. In the meantime, the user can request a stand-alone version of the GMM code at massimiliano.bonomi_AT_gmail.com.
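To make concrete what the GMM in GMM_FILE represents, the sketch below evaluates a mixture of weighted 3D Gaussians at a point. It assumes that each component is a normalized Gaussian defined by a weight, a mean and a full 3x3 covariance matrix; the helper names (det3, inv3, gaussian3, gmm_density) are hypothetical and not part of PLUMED.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (transposed cofactor matrix)."""
    d = det3(m)
    inv = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [k for k in range(3) if k != i]  # rows kept in the minor
            c = [k for k in range(3) if k != j]  # columns kept in the minor
            minor = m[r[0]][c[0]] * m[r[1]][c[1]] - m[r[0]][c[1]] * m[r[1]][c[0]]
            inv[j][i] = (-1) ** (i + j) * minor / d  # transpose of the cofactor
    return inv


def gaussian3(x, mean, cov):
    """Normalized 3D Gaussian density N(x | mean, cov)."""
    d = [x[k] - mean[k] for k in range(3)]
    icov = inv3(cov)
    quad = sum(d[i] * icov[i][j] * d[j] for i in range(3) for j in range(3))
    norm = 1.0 / math.sqrt((2.0 * math.pi) ** 3 * det3(cov))
    return norm * math.exp(-0.5 * quad)


def gmm_density(x, components):
    """Density at point x of a GMM given as [(weight, mean, cov), ...]."""
    return sum(w * gaussian3(x, mu, cov) for w, mu, cov in components)
```

Fitting the GMM itself (choosing the number of components and their parameters) is a separate step that this sketch does not cover.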
When run in single-replica mode, this action allows atomistic, flexible refinement of an individual structure into a density map. Combined with a multi-replica framework (such as the -multi option in GROMACS), the user can model an ensemble of structures using the Metainference approach [21].
In this example, we perform a single-structure refinement based on an experimental cryo-EM map. The map is fit with a GMM, whose parameters are listed in the file GMM_fit.dat. This file contains one line per GMM component in the following format:
```
#! FIELDS Id Weight Mean_0 Mean_1 Mean_2 Cov_00 Cov_01 Cov_02 Cov_11 Cov_12 Cov_22 Beta
     0  2.9993805e+01   6.54628 10.37820 -0.92988 2.078920e-02 1.216254e-03 5.990827e-04 2.556246e-02 8.411835e-03 2.486254e-02  1
     1  2.3468312e+01   6.56095 10.34790 -0.87808 1.879859e-02 6.636049e-03 3.682865e-04 3.194490e-02 1.750524e-03 3.017100e-02  1
```
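A file in this format can be read with a few lines of Python. The sketch assumes whitespace-separated columns named exactly as in the #! FIELDS header and that only the upper triangle of each symmetric covariance matrix is stored (Cov_00, Cov_01, Cov_02, Cov_11, Cov_12, Cov_22), so the full matrix is rebuilt by symmetry. parse_gmm_file is a hypothetical helper, not a PLUMED tool.

```python
def _cov_entry(row, a, b):
    """Read Cov_ab from a row, using symmetry since only a <= b is stored."""
    return float(row["Cov_%d%d" % (min(a, b), max(a, b))])


def parse_gmm_file(text):
    """Parse GMM components from text with a '#! FIELDS Id Weight Mean_0 ...' header.

    Returns a list of dicts with keys id, weight, mean, cov (full 3x3) and beta.
    """
    fields = None
    components = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[:2] == ["#!", "FIELDS"]:
            fields = tokens[2:]  # column names, in file order
            continue
        if tokens[0].startswith("#"):
            continue  # any other comment line
        if fields is None:
            raise ValueError("data line found before the '#! FIELDS' header")
        row = dict(zip(fields, tokens))
        components.append({
            "id": int(row["Id"]),
            "weight": float(row["Weight"]),
            "mean": [float(row["Mean_%d" % k]) for k in range(3)],
            "cov": [[_cov_entry(row, a, b) for b in range(3)] for a in range(3)],
            "beta": int(row["Beta"]),
        })
    return components
```

For example, parse_gmm_file(open("GMM_fit.dat").read()) would return one dictionary per GMM component.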
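To accelerate the computation of the Bayesian score, one can:
- use a neighbor list, controlled by the keywords NL_CUTOFF and NL_STRIDE;
- calculate the forces due to the restraint less often than every step, for example every other step using the STRIDE keyword of BIASVALUE, as in the input file below.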
All the heavy atoms of the system are used to calculate the density map. This list can conveniently be provided using a GROMACS index file.
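GROMACS index files are usually generated with gmx make_ndx. For illustration only, the hypothetical helper below writes a single group in the plain .ndx layout (a "[ name ]" line followed by atom numbers), keeping every atom whose element is not hydrogen. It assumes 1-based atom serial numbers and element symbols are already available, for example from the PDB file.

```python
def write_ndx_group(name, atoms, per_line=15):
    """Return the text of a GROMACS index group containing the non-hydrogen atoms.

    atoms: iterable of (serial, element) pairs with 1-based serial numbers.
    per_line: how many atom numbers to print per line (purely cosmetic).
    """
    heavy = [serial for serial, element in atoms if element.strip().upper() != "H"]
    lines = ["[ %s ]" % name]
    for i in range(0, len(heavy), per_line):
        lines.append(" ".join(str(s) for s in heavy[i:i + per_line]))
    return "\n".join(lines) + "\n"
```

The resulting text can be saved as index.ndx and selected with NDX_GROUP=Protein-H, as in the input file below.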
The input file looks as follows:
```
# include pdb info
MOLINFO STRUCTURE=prot.pdb

# all heavy atoms
protein-h: GROUP NDX_FILE=index.ndx NDX_GROUP=Protein-H

# create EMMI score
gmm: EMMI NOPBC SIGMA_MIN=0.01 TEMP=300.0 NL_STRIDE=100 NL_CUTOFF=0.01 GMM_FILE=GMM_fit.dat ATOMS=protein-h

# translate into bias - apply every 2 steps
emr: BIASVALUE ARG=gmm.scoreb STRIDE=2

PRINT ARG=emr.* FILE=COLVAR STRIDE=500 FMT=%20.10f
```
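With this input, PRINT writes the time and every component of emr to COLVAR every 500 steps. Assuming the usual #! FIELDS header that PLUMED writes at the top of its output files, and that the bias energy appears as the component emr.bias, the file can be loaded with the hypothetical helper below for analysis.

```python
def read_colvar(text):
    """Parse PLUMED-style output text into {column name: [float values]}."""
    names = None
    data = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[:2] == ["#!", "FIELDS"]:
            names = tokens[2:]
            data = {name: [] for name in names}
            continue
        if tokens[0].startswith("#"):
            continue  # other header lines, e.g. '#! SET ...'
        if names is None:
            raise ValueError("data line found before the '#! FIELDS' header")
        for name, value in zip(names, tokens):
            data[name].append(float(value))
    return data
```

For example, read_colvar(open("COLVAR").read())["emr.bias"] would give the time series of the EMMI bias, if the column carries that name.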
By default this Action calculates the following quantities. These quantities can be referenced elsewhere in the input by using this Action's label followed by a dot and the name of the quantity required from the list below.
Quantity | Description |
--- | --- |
scoreb | Bayesian score |
neff | effective number of replicas |
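The exact form of scoreb depends on NOISETYPE and on the priors described in Ref. [60]. Purely as an illustration, the sketch below computes a Gaussian-noise negative log-likelihood in energy units, kT times the sum over GMM components of (ov_model - ov_exp)^2 / (2 sigma^2) + log sigma with additive constants dropped, together with the common Kish estimate (sum w)^2 / sum w^2 of the effective number of replicas. Both functional forms and both function names are assumptions, not EMMI's implementation; kB is given in kJ/(mol K), PLUMED's default energy unit.

```python
import math

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def gaussian_score(ov_model, ov_exp, sigma, temp):
    """Gaussian-noise negative log-likelihood in kJ/mol (constants dropped).

    ov_model, ov_exp: overlaps per GMM component; sigma: per-component uncertainty.
    """
    kt = KB_KJ_PER_MOL_K * temp
    ene = 0.0
    for m, e, s in zip(ov_model, ov_exp, sigma):
        ene += (m - e) ** 2 / (2.0 * s * s) + math.log(s)
    return kt * ene


def kish_neff(weights):
    """Effective number of replicas from (unnormalized) weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)
```

With equal weights on N replicas, kish_neff returns N; if a single replica carries all the weight, it returns 1.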
In addition, the following quantities can be calculated by employing the keywords listed below:
Quantity | Keyword | Description |
--- | --- | --- |
acc | NOISETYPE | MC acceptance for uncertainty |
scale | REGRESSION | scale factor |
accscale | REGRESSION | MC acceptance for scale regression |
enescale | REGRESSION | MC energy for scale regression |
anneal | ANNEAL | annealing factor |
weight | REWEIGHT | weights of the weighted average |
biasDer | REWEIGHT | derivatives with respect to the bias |
sigma | NOISETYPE | uncertainty in the forward models and experiment |
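The acc and sigma components refer to Monte Carlo sampling of the uncertainty, controlled by keywords such as SIGMA0, DSIGMA, SIGMA_MIN and MC_STRIDE. A generic Metropolis move is sketched below under the assumption of a uniform trial displacement in [-DSIGMA, DSIGMA] and rejection of values below SIGMA_MIN; the energy callable is a hypothetical stand-in for the Bayesian score, and EMMI's actual move set may differ.

```python
import math
import random


def mc_sigma_step(sigma, energy, dsigma, sigma_min, kt, rng):
    """One Metropolis move on a scalar uncertainty.

    energy: callable mapping sigma to a score in the same units as kt.
    Returns (new_sigma, accepted).
    """
    trial = sigma + rng.uniform(-dsigma, dsigma)
    if trial < sigma_min:
        return sigma, False  # reject moves below the lower bound
    delta = energy(trial) - energy(sigma)
    # Metropolis criterion: always accept downhill, uphill with probability exp(-delta/kt)
    if delta <= 0.0 or rng.random() < math.exp(-delta / kt):
        return trial, True
    return sigma, False


def run_mc(sigma0, energy, dsigma, sigma_min, kt, nsteps, seed=0):
    """Run nsteps moves; return the final sigma and the acceptance fraction."""
    rng = random.Random(seed)
    sigma, accepted = sigma0, 0
    for _ in range(nsteps):
        sigma, ok = mc_sigma_step(sigma, energy, dsigma, sigma_min, kt, rng)
        accepted += ok
    return sigma, accepted / nsteps
```

The acceptance fraction returned by run_mc plays the role of an MC acceptance such as acc: if it is very low, a smaller step (DSIGMA) is usually worth trying.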
This action accepts the following keywords:
Keyword | Description |
--- | --- |
ATOMS | atoms for which we calculate the density map, typically all heavy atoms. For more information on how to specify lists of atoms see Groups and Virtual Atoms. |
GMM_FILE | file with the parameters of the GMM components |
NL_CUTOFF | The cutoff in overlap for the neighbor list |
NL_STRIDE | The frequency with which we are updating the neighbor list |
SIGMA_MIN | minimum uncertainty |
RESOLUTION | Cryo-EM map resolution |
NOISETYPE | functional form of the noise (GAUSS, OUTLIERS, MARGINAL) |
NUMERICAL_DERIVATIVES | ( default=off ) calculate the derivatives for these quantities numerically |
NOPBC | ( default=off ) ignore the periodic boundary conditions when calculating distances |
NO_AVER | ( default=off ) don't do ensemble averaging in multi-replica mode |
REWEIGHT | ( default=off ) simple REWEIGHT using the ARG as energy |
ARG | the input for this action is the scalar output from one or more other actions. The particular scalars that you will use are referenced using the label of the action. If the label appears on its own then it is assumed that the Action calculates a single scalar value. The value of this scalar is thus used as the input to this new action. If * or *.* appears the scalars calculated by all the preceding actions in the input file are taken. Some actions have multi-component outputs and each component of the output has a specific label. For example a DISTANCE action labelled dist may have three components x, y and z. To take just the x component you should use dist.x; if you wish to take all three components then use dist.*. More information on the referencing of Actions can be found in the section of the manual on the PLUMED Getting Started. Scalar values can also be referenced using POSIX regular expressions as detailed in the section on Regular Expressions. To use this feature you must compile PLUMED with the appropriate flag. You can use multiple instances of this keyword, i.e. ARG1, ARG2, ARG3... |
SIGMA0 | initial value of the uncertainty |
DSIGMA | MC step for uncertainties |
MC_STRIDE | Monte Carlo stride |
ERR_FILE | file with experimental or GMM fit errors |
OV_FILE | file with experimental overlaps |
NORM_DENSITY | integral of the experimental density |
STATUS_FILE | write a file with all the data useful for restart |
WRITE_STRIDE | write the status to a file every N steps; this can be used for restarts |
REGRESSION | regression stride |
REG_SCALE_MIN | regression minimum scale |
REG_SCALE_MAX | regression maximum scale |
REG_DSCALE | regression maximum scale MC move |
SCALE | scale factor |
ANNEAL | Length of annealing cycle |
ANNEAL_FACT | Annealing temperature factor |
TEMP | temperature |
PRIOR | exponent of uncertainty prior |
WRITE_OV_STRIDE | write model overlaps every N steps |
WRITE_OV | write a file with model overlaps |
AVERAGING | Averaging window for weights |
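NL_CUTOFF and NL_STRIDE control a neighbor list over pairs of Gaussians: only pairs whose overlap exceeds the cutoff are kept, and the list is rebuilt every NL_STRIDE steps and reused in between. The sketch below assumes the model density is itself a sum of Gaussians centered on the atoms, uses isotropic Gaussians for brevity (GMM_FILE stores full covariance matrices) and applies an absolute overlap cutoff; whether EMMI normalizes the cutoff differently is not stated here. All function names are hypothetical.

```python
import math


def isotropic_overlap(w1, m1, var1, w2, m2, var2):
    """Overlap integral of two weighted isotropic 3D Gaussians.

    Uses: integral of N(x|m1, v1 I) N(x|m2, v2 I) dx = N(m1 - m2 | 0, (v1 + v2) I).
    """
    v = var1 + var2
    d2 = sum((a - b) ** 2 for a, b in zip(m1, m2))
    return w1 * w2 * math.exp(-0.5 * d2 / v) / (2.0 * math.pi * v) ** 1.5


def build_neighbor_list(model, data, cutoff):
    """For each data component, list the model components with overlap >= cutoff.

    model, data: lists of (weight, mean, variance) tuples.
    """
    nl = []
    for wd, md, vd in data:
        nl.append([j for j, (wm, mm, vm) in enumerate(model)
                   if isotropic_overlap(wd, md, vd, wm, mm, vm) >= cutoff])
    return nl


def update_due(step, nl_stride):
    """True on the steps where the neighbor list should be rebuilt."""
    return step % nl_stride == 0
```

Between rebuilds, only the pairs in the list are evaluated, which is where the speed-up comes from; a larger stride or cutoff is faster but less accurate.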