ISDB: setting up a SAXS post processing and refinement calculation using MARTINI form factors
Authors
Cristina Paissoni

Aims

This tutorial is thought to illustrate how it is possible to compute SAXS intensities from single PDB files or trajectories using PLUMED. In particular, we will show how to compute scattering intensities with the hybrid coarse-grained/atomistic approach that is described in [100] . The tutorial will provide basic instructions to prepare files for the back-calculation of SAXS intensities using the hybrid approach (this process is simplified by the possibility to use an ad-hoc python script). Further, it is explained how to adjust the plumed input file for specific purposes, e.g. to use SAXS data as restraints in MD simulations or to compare SAXS intensities computed with both the atomistic and the hybrid approach.

Objectives

Once this tutorial is completed users will be able to:

Resources

The TARBALL for this project contains the following files:

This tutorial has been tested on PLUMED version 2.5.1

Introduction

Calculations of Small-angle X-ray scattering (SAXS) intensities from a structure of N atoms could be extremely demanding from a computational perspective as it is an \(O(N^2)\) problem. This issue has limited the applicability of SAXS in numerous situations, including their use as restraints in combination with MD simulations. A strategy to reduce the cost of computing SAXS from atomic structure consists in using a coarse grain representation of the structure, represented as a collection of M beads with M<N, each comprising a variable number of atoms. Niebling et al [98] have previously derived the Martini beads form factors for proteins, showing how this approach can be almost 50 times faster than the standard SAXS calculation. More recently, Martini beads form factors for nucleic acids have been also derived and, together with the ones for proteins, have been implemented in PLUMED. Importantly, it has been shown that the computation of scattering intensities using these Martini form factors achieves a good accuracy, as compared to the atomistic ones, for đť‘ž values up to 0.45 Ă…-1.

The Martini form factors can be exploited within a hybrid multi-resolution strategy to speed up SAXS profiles calculations, both for the back-calculation of scattering intensities from atomistic PDB or trajectory files and for the inclusion of SAXS data as restraints in experimental-driven all-atom simulations (e.g. METAINFERENCE). This can be achieved using PLUMED, which computes on the fly the virtual position of the Martini beads from the atomistic 3D-structure and then uses the centres of mass of these beads, along with Martini form factors, to quickly back-calculate SAXS intensities.

Computing SAXS intensities with the hybrid coarse-grained/atomistic approach

Given a PDB file, PLUMED is able to compute SAXS intensities for the molecule in the PDB and to compare these intensities with the experimental ones stored in a data file. It is possible to apply the same procedure to all the conformations visited during a MD trajectory. This is achieved by using the PLUMED driver utility and the SAXS variable of the ISDB module. While computing scattering intensities with a full atomistic representation is quite easy (see the SAXS keyword and later in this tutorial), in order to adopt the hybrid coarse-grained/atomistic approach a more elaborated procedure is needed, requiring the generation of few specific files to be used as input by driver. To facilitate this step, it is possible to use this script (martiniFormFactor_p3.py for python 3) and type in the bash shell:

python martiniFormFactor_p3.py -f start.pdb -dat SASDAB7.dat [-unit Ang/nm -nq 15]

The input files are:

  • filepdb.pdb: a PDB file containing the atomic coordinates of the molecule; in our case it is start.pdb. Note that only one model should be present in the PDB and an ENDMDL statement is expected to appear only at the end of the file. Further, if different chains are present, they are expected to be separated by a TER statement or named differently (this is true also for chain breaks, but it is generally not recommended to use broken molecules for these calculations). If the aim is to analyse a trr/xtc trajectories, the PDB should contain the same atoms of the trajectories and with the same order, including water molecules and ions (you can simply save a sample PDB from the xtc/trr and use it to generate the necessary files).
  • filedat.dat: a file containing the momentum transfer in the first column and the experimental SAXS intensities in the second, with the two columns separated by blanks or commas; in our case it is SASDAB7.dat. Further columns are accepted but they will not be considered. The momentum transfer is expected to be expressed in inverse Angstrom, if it is expressed in inverse nm you can use the option “-unit nm” (note that this is needed in our case!). By default, the python script will select 15 equally separated q values, and the relative intensities, between the first and the last values of the file. If you want to change this default behaviour of the python script you can use the option “-nq” to indicate the number of q values to consider. These values will be used by PLUMED to compute SAXS intensities for the selected momentum transfers and to compare them with the corresponding experimental values provided.

The files generated in this step, to be used later as input for driver, are the following:

  • aacg_template.pdb: a PDB file, which PLUMED uses as a template, in which the atomistic model provided (with the atoms renumbered in sequential order) is concatenated to the coarse-grained representation.
  • plumed_beads.dat: a file that instructs PLUMED about how to compute the centre of mass of each bead and that define a group of atoms (called “martini”) containing all the identified beads. These atoms are the ones that will be used later for the calculation of SAXS intensities (the atom group “martini” is indeed recalled in the SAXS keyword of the plumed.dat file). We suggest to always double-check this file, verifying that the lists of atoms in the last beads correspond to the actual Martini mapping.
  • plumed.dat: a file that tells PLUMED to compute the CENTER of each bead (achieved reading plumed_beads.dat), compute scattering intensities for the selected q-values using the hybrid coarse-grained/atomistic approach and, lastly, print some output files that could be useful for further statistics. We will see later how to modify this file to achieve different goals.

The default files generated should be sufficient to post-process single PDB files or trajectories, by typing:

plumed driver --plumed plumed.dat --mf_pdb start.pdb

or

plumed driver --plumed plumed.dat --mf_xtc samplextc.xtc

The output files generated by this step are:

  • SAXSINT, containing the computed SAXS intensities. Each line of this file contains in the first column a time and in the following columns the scattering intensities for each of the selected q-values. If the file post-processed is a PDB there will be only one line with time 0. Otherwise, each line will correspond to a frame of the input trajectory (by default the time unit is 1.0, but you can set the desired time step using the option “–timestep” of PLUMED driver).
  • ST.SAXS, containing in the first column a time (as above) and in the following column the correlation between experimental and back-calculated SAXS intensities for the selected q-values.

Let's see now the meaning of each line in plumed.dat and how it is possible to modify it for other purposes, e.g. to compare atomistic/coarse-grained scattering intensities and to perform Metainference simulations where SAXS data are used as restraints. Here is a sample of plumed.dat produced by the python script martiniFormFactor_p3.py:

Click on the labels of the actions for more information on what each action computes
tested on v2.9
MOLINFO 
STRUCTURE
compulsory keyword a file in pdb format containing a reference structure.
=aacg_template.pdb # BEADS DEFINITION INCLUDE
FILE
compulsory keyword file to be included
=plumed_beads.dat # SAXS saxsdata: SAXS ...
ATOMS
The atoms to be included in the calculation, e.g.
=martini
MARTINI
( default=off ) calculate SAXS for a Martini model
# You can use SCALEINT keyword to set appropriate scaling factor. # SCALEINT is expected to correspond to the intensity in q=0 # SCALEINT=
QVALUE1
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.0111721000
EXPINT1
Add an experimental value for each q value.
=0.0527250000
QVALUE2
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.0368675000
EXPINT2
Add an experimental value for each q value.
=0.0327126000
QVALUE3
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.0625615000
EXPINT3
Add an experimental value for each q value.
=0.0128316000
QVALUE4
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.0882529000
EXPINT4
Add an experimental value for each q value.
=0.0045545200
QVALUE5
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.1139410000
EXPINT5
Add an experimental value for each q value.
=0.0022799600
QVALUE6
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.1396240000
EXPINT6
Add an experimental value for each q value.
=0.0013048600
QVALUE7
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.1653020000
EXPINT7
Add an experimental value for each q value.
=0.0007215740
QVALUE8
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.1909730000
EXPINT8
Add an experimental value for each q value.
=0.0004340930
QVALUE9
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.2166360000
EXPINT9
Add an experimental value for each q value.
=0.0002717150
QVALUE10
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.2422910000
EXPINT10
Add an experimental value for each q value.
=0.0002574160
QVALUE11
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.2679360000
EXPINT11
Add an experimental value for each q value.
=0.0001878030
QVALUE12
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.2935700000
EXPINT12
Add an experimental value for each q value.
=0.0001592670
QVALUE13
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.3191930000
EXPINT13
Add an experimental value for each q value.
=0.0000811279
QVALUE14
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.3448030000
EXPINT14
Add an experimental value for each q value.
=0.0001110630
QVALUE15
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.3703990000
EXPINT15
Add an experimental value for each q value.
=0.0001264680 # METAINFERENCE # Uncomment the following keywords and adjust parameters to activate METAINFERENCE # DOSCORE NOENSEMBLE SIGMA_MEAN0=0 # REGRES_ZERO=500 # SIGMA0=5 SIGMA_MIN=0.001 SIGMA_MAX=5.00 # NOISETYPE=MGAUSS ... # METAINFERENCE # Uncomment the following keyword to activate METAINF # saxsbias: BIASVALUE ARG=(saxsdata\.score) STRIDE=10 # STATISTICS statcg: STATS
ARG
the input for this action is the scalar output from one or more other actions.
=(saxsdata\.q_.*)
PARARG
the input for this action is the scalar output from one or more other actions without derivatives.
=(saxsdata\.exp_.*) # PRINT # Uncomment the following line to print METAINFERENCE output # PRINT ARG=(saxsdata\.score),(saxsdata\.biasDer),(saxsdata\.weight),(saxsdata\.scale), (saxsdata\.offset),(saxsdata\.acceptSigma),(saxsdata\.sigma.*) STRIDE=500 FILE=BAYES.SAXS # change stride if you are using METAINFERENCE PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=(saxsdata\.q_.*)
STRIDE
compulsory keyword ( default=1 ) the frequency with which the quantities of interest should be output
=1
FILE
the name of the file on which to output these quantities
=SAXSINT PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=statcg.corr
STRIDE
compulsory keyword ( default=1 ) the frequency with which the quantities of interest should be output
=1
FILE
the name of the file on which to output these quantities
=ST.SAXSCG

The first two lines tell PLUMED to use aacg_template.pdb as a template and to read the file plumed_beads.dat (which rules how to compute the CENTER of the beads and defines the “martini” group, containing all the beads). Note that you do not need to prepare these two files, since they both are produced by the python script martiniFormFactor_p[2,3].py. These lines are followed by the SAXS keyword, labelled “saxsdata”. Here are defined the atoms to be used for SAXS calculations (in our case these are all the beads within the group “martini”) and the structure factors to adopt: in this case we are using the Martini form factors (flag MARTINI), alternatively it is possible to use the atomistic ones (flag ATOMISTIC) or define custom form factors using a polynomial expansion to any order (flag PARAMETERS). By default, PLUMED computes a scaling factor to fit experimental and back-calculated intensities, however it is possible to set manually this scaling factor using the flag SCALEINT, which is expected to correspond to the intensity in q=0. The following lines indicate the q-values at which the calculation of scattering intensities is required (QVALUE) and the corresponding intensities (EXPINT). These are the ones selected by default by the python script, but it is possible to manually adjust them and/or to add new values. Lastly, within the SAXS keyword, there are few lines that are needed to activate METAINFERENCE. These are commented since they are not necessary for the back-calculation of SAXS intensities, however you have to uncomment and adjust them if you want to perform Metainference simulations in which SAXS data are used as restraints. The same is true for the line containing the keyword BIASVALUE. In the last part of the file, we ask PLUMED to compute some statistics comparing experimental and back-calculated intensities; finally, we print the computed intensities and statistics into the SAXSINT and ST.SAXSCG files, respectively.

It is easy to modify the plumed.dat for your own purposes, for instance:

  1. it could be used to perform Metainference simulations: to this aim it is sufficient to uncomment the lines indicated in the plumed.dat above and adjust the parameters to make them suitable for the investigated system (see the tutorial ISDB: setting up a Metadynamics Metainference simulation and the SAXS keyword). We further suggest adding (if needed) WHOLEMOLECULES and NOPBC flags.
  2. it could be exploited to compute SAXS intensities with both the atomistic and the hybrid approach to compare them later. A sample plumed.dat to achieve this could be:
Click on the labels of the actions for more information on what each action computes
tested on v2.9
MOLINFO 
STRUCTURE
compulsory keyword a file in pdb format containing a reference structure.
=aacg_template.pdb # BEADS DEFINITION INCLUDE
FILE
compulsory keyword file to be included
=plumed_beads.dat # SAXS saxscg: SAXS ...
ATOMS
The atoms to be included in the calculation, e.g.
=martini
MARTINI
( default=off ) calculate SAXS for a Martini model
QVALUE1
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.010
QVALUE2
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.020 # etc.. ... saxsaa: SAXS ...
ATOMS
The atoms to be included in the calculation, e.g.
=1-11104
ATOMISTIC
( default=off ) calculate SAXS for an atomistic model
QVALUE1
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.010
QVALUE2
Selected scattering lengths in inverse angstroms are given as QVALUE1, QVALUE2, ....
=0.020 # etc.. ... #PRINT PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=(saxscg\.q_.*)
FILE
the name of the file on which to output these quantities
=QVAL_CG PRINT
ARG
the input for this action is the scalar output from one or more other actions.
=(saxsaa\.q_.*)
FILE
the name of the file on which to output these quantities
=QVAL_AA