PLUMED can use the PDB format in several places
The implemented PDB reader expects a file that is correctly formatted according to the PDB standard. In particular, the following columns are read from ATOM records:
columns | content
1-6 | record name (ATOM or HETATM)
7-11 | serial number of the atom (starting from 1)
13-16 | atom name
18-20 | residue name
22 | chain id
23-26 | residue number
31-38 | x coordinate
39-46 | y coordinate
47-54 | z coordinate
55-60 | occupancy
61-66 | beta factor
The PLUMED parser is slightly more permissive than the official PDB format in that the format of real numbers is not fixed: any parsable real number is accepted, and the decimal point can be placed anywhere within the field. However, column boundaries are interpreted strictly. A sample PDB should look like the following:
ATOM      2  CH3 ACE     1      12.932 -14.718  -6.016  1.00  1.00
ATOM      5  C   ACE     1      21.312  -9.928  -5.946  1.00  1.00
ATOM      9  CA  ALA     2      19.462 -11.088  -8.986  1.00  1.00
Notice that serial numbers need not be consecutive. In the three-line example above, only the coordinates of three atoms are provided. This is perfectly legal and tells PLUMED that information is available for these atoms only. This applies both to structural information in MOLINFO, where the other atoms would have no name assigned, and to reference structures used in RMSD, where only the provided atoms would be used to compute the RMSD.
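The column layout described above can be illustrated with a minimal Python sketch. This is not the PLUMED reader: the function name `parse_atom_record` and the returned dictionary keys are hypothetical, and the sketch assumes plain decimal serial numbers. It slices each field at the fixed column positions and lets `float` accept any parsable real number inside the field.

```python
def parse_atom_record(line):
    """Return the fields of one ATOM/HETATM line, or None for other records.

    Hypothetical helper for illustration only; column ranges follow the
    table above (1-based, inclusive), converted to Python slices.
    """
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        return None
    # Pad short lines so that every slice below is well defined.
    line = line.rstrip("\n").ljust(66)
    return {
        "serial": int(line[6:11]),         # columns 7-11 (decimal assumed)
        "name": line[12:16].strip(),       # columns 13-16
        "resname": line[17:20].strip(),    # columns 18-20
        "chain": line[21].strip(),         # column 22
        "resnum": int(line[22:26]),        # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "occupancy": float(line[54:60]),   # columns 55-60
        "beta": float(line[60:66]),        # columns 61-66
    }


# Build the first line of the sample with explicit field widths, so the
# column positions do not depend on how whitespace is rendered here.
sample = (
    f"{'ATOM':<6}{2:>5} {'CH3':>4} {'ACE':>3} {' '}{1:>4}    "
    f"{12.932:8.3f}{-14.718:8.3f}{-6.016:8.3f}{1.0:6.2f}{1.0:6.2f}"
)
atom = parse_atom_record(sample)
print(atom["serial"], atom["name"], atom["resname"], atom["x"])
```

Because the fields are cut by position rather than split on whitespace, a number written as `1.2932e1` in columns 31-38 parses to the same value as `12.932`, while a number shifted into the neighbouring columns would be misread.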
PLUMED also reads the occupancy and beta factor columns, but gives them a special meaning. When the PDB structure is used as a reference for an alignment (as is the case, for instance, in RMSD and in FIT_TO_TEMPLATE), the occupancy column provides the weight of each atom in the alignment. When the displacement between the running coordinates and the provided PDB is computed, possibly after alignment, the beta factors are used as the weights for the displacement. Since setting a weight to zero is the same as not including the atom in the alignment or displacement calculation, the two following reference files are equivalent when used in an RMSD calculation. First file:
ATOM      2  CH3 ACE     1      12.932 -14.718  -6.016  1.00  1.00
ATOM      5  C   ACE     1      21.312  -9.928  -5.946  1.00  1.00
ATOM      9  CA  ALA     2      19.462 -11.088  -8.986  0.00  0.00
Second file:
ATOM      2  CH3 ACE     1      12.932 -14.718  -6.016  1.00  1.00
ATOM      5  C   ACE     1      21.312  -9.928  -5.946  1.00  1.00
However, notice that many extra atoms with zero weight might slow down the calculation, so removing their lines is better than setting their weights to zero. In addition, the weights used for alignment need not be equal to the weights used for displacement.
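The equivalence of a zero weight and an omitted atom can be checked with a small Python sketch. This is not PLUMED's RMSD implementation: the function `weighted_deviation` is hypothetical, it performs no alignment at all, and it assumes that the displacement weights are normalized by their sum. The running coordinates are made-up values used only for the comparison.

```python
import math


def weighted_deviation(positions, reference, weights):
    """Weighted root-mean-square displacement without any alignment.

    Illustrative assumption: weights are normalized by their sum, so an
    atom with zero weight contributes nothing to the result.
    """
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("at least one weight must be positive")
    msd = 0.0
    for pos, ref, w in zip(positions, reference, weights):
        msd += w * sum((p - r) ** 2 for p, r in zip(pos, ref))
    return math.sqrt(msd / total)


# Reference coordinates and beta-factor weights from the first file above.
reference = [
    (12.932, -14.718, -6.016),
    (21.312, -9.928, -5.946),
    (19.462, -11.088, -8.986),
]
betas = [1.0, 1.0, 0.0]

# Made-up running coordinates, for illustration only.
running = [
    (13.0, -14.5, -6.1),
    (21.0, -10.0, -6.0),
    (25.0, -3.0, -1.0),
]

with_zero_weight = weighted_deviation(running, reference, betas)
without_atom = weighted_deviation(running[:2], reference[:2], betas[:2])
print(math.isclose(with_zero_weight, without_atom))
```

The loop still visits the zero-weight atom in the first call, which is the extra cost that removing the line avoids.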
It very likely makes no sense to compute the RMSD, or any other structural deviation, using 100000 atoms or more. However, if the protein for which you want to compute the RMSD has atoms with large serial numbers (e.g. because it is located after the solvent in the sorted list of atoms), you might run into the limitations of the PDB format. Since only 5 columns are available for the atom serial number, this number cannot be larger than 99999. For the same reason, it would be impossible to provide MOLINFO with names for atoms whose serial number is larger than 99999.
Since PLUMED 2.4, the hybrid 36 format can be used to specify atom numbers. This format is not particularly widespread, but it provides a one-to-one mapping between numbers up to approximately 80 million and 5-character strings, and it is backward compatible for numbers smaller than 100000. This is not true for notations such as the hexadecimal notation exported by VMD. Using the hybrid 36 format, the ATOM records for atoms with serial numbers from 99997 to 100002 would read as follows:
ATOM  99997 Ar     X     1      45.349  38.631  15.116  1.00  1.00
ATOM  99998 Ar     X     1      46.189  38.631  15.956  1.00  1.00
ATOM  99999 Ar     X     1      46.189  39.471  15.116  1.00  1.00
ATOM  A0000 Ar     X     1      45.349  39.471  15.956  1.00  1.00
ATOM  A0001 Ar     X     1      45.349  38.631  16.796  1.00  1.00
ATOM  A0002 Ar     X     1      46.189  38.631  17.636  1.00  1.00
Tools are available to translate from integers to strings and back using the hybrid 36 format (a simple Python script can be found here).
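For readers who want to see how the mapping works, the following is a minimal Python sketch of hybrid 36 encoding and decoding for 5-character serial numbers. It is not the script referred to above and not PLUMED code: the function names `hy36_encode` and `hy36_decode` are chosen here for illustration. It assumes the usual hybrid 36 rules: plain decimal below 100000, then upper-case base 36 starting at `A0000`, then lower-case base 36 starting at `a0000`.

```python
UPPER = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = UPPER.lower()


def _to_base36(value, digits):
    """Write a non-negative integer in base 36 using the given digit set."""
    out = ""
    while value:
        value, rem = divmod(value, 36)
        out = digits[rem] + out
    return out or digits[0]


def hy36_encode(serial, width=5):
    """Encode a non-negative integer as a hybrid 36 string of `width` chars."""
    if serial < 0:
        raise ValueError("negative serial numbers are not handled here")
    if serial < 10 ** width:
        return str(serial).rjust(width)      # backward-compatible decimal
    first = 10 * 36 ** (width - 1)           # value of 'A000...' in base 36
    span = 26 * 36 ** (width - 1)            # strings starting with a letter
    offset = serial - 10 ** width
    if offset < span:
        return _to_base36(offset + first, UPPER)
    offset -= span
    if offset < span:
        return _to_base36(offset + first, LOWER)
    raise ValueError("serial number too large for this width")


def hy36_decode(text, width=5):
    """Decode a hybrid 36 string of `width` characters back to an integer."""
    field = text.strip()
    if not field:
        raise ValueError("empty field")
    if field[0].isdigit():
        return int(field)
    first = 10 * 36 ** (width - 1)
    span = 26 * 36 ** (width - 1)
    value = int(field, 36) - first + 10 ** width
    # Lower-case strings continue where the upper-case range ends.
    return value + span if field[0].islower() else value


for serial in range(99997, 100003):
    print(serial, hy36_encode(serial))
```

With a width of 5, the largest value this sketch can encode is 87440031, which is consistent with the "approximately 80 million" limit mentioned above.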