Pteros
2.0
Molecular modeling library for human beings!
|
Pteros treats all particles as atoms. There is no destinction between real atoms and various dummy particles (virtual sites, shell particles, etc.). Each atom in Pteros has the following attributes:
Property | Data type | Description | Comment |
---|---|---|---|
name | string | Atom name | |
resid | integer | Residue id | Unique within single protein or nucleic acid chain. May start at any value defined in the structure file for each chain. |
resindex | integer | Residue index | Unique within the whole system. Chain boundaries are ignored. Starts at 0. Assigned automatically. |
resname | string | Residue name | |
chain | char | Chain identifier | Single character. If no chain information is present defaults to single space. |
tag | string | Arbitrary textual tag | Defaults to empty string |
mass | float | Atomic mass | In atomic units. If not given explicitly in the structure file guessed from the atom name. Defaults to 1.0. |
charge | float | Atomic charge | Electron charge units. Defaults to 0.0. Usually read from topology files. |
beta | float | PDB B-factor | Defaults to 0. |
occupancy | float | PDB occupancy facor | Defaults to 0. |
type | integer | Numerical index of atom type | Defaults to -1, only makes sense in MD topologies. |
type_name | string | Name of atom type | Defaults to empty string. |
element_number | interger | Atomic number of the atom | Defaults to -1. If not defined in the structure or topology file it is not guessed. Element name is not stored and is computed from element number on the fly if needed. |
Atoms are represented by the objects of Atom class.
The system is a container for atoms, coordinate frames and associated force field parameters. The system may contain many coordinate frames which represent different states of the system (MD time steps or NMR structures for example). System is usually read from one or more structure, trajectory or topology files. Systems are represented by the System class.
The frame is representation of the instanteneous state of the system. It contains coordinates of all atoms, time stamp and the periodic box (if the periodic boundary conditions are used). Represented by the Frame class.
Selection is a subset of atoms from particular system. Selection do not contain any data but just points to existing atoms in the system. Selection should be considered as a "handle" to certain group of atoms which allows to query their properties and to manipulate their attributes and coordinates in various ways.
Currently the following file formats are supported:
Molecular data formats contain different information about the system. In Pteros all information stored in the data files is classified into atoms, coordinates, trajectory and topology. Type of particular piece of information is determined by Mol_file_content class.
Mol_file_content::atoms()
. Information about atoms in the system (name, residue name, chain and other attributes) but without coordinates. Mol_file_content::coord()
. Single set of coordinates of atoms (single frame) and the periodic box if periodicity is present. Mol_file_content::traj()
. The number of coordinate frames each containing a set of coordinates, time stamp and the periodic box if periodicity is present. Mol_file_content::top()
. MD topology containing connectivity, atom types, precise masses, partial charges, non-bond Van-der-Waals parameters, etc.Particular file format may contain different information. For example PDB files contain atoms and coordinates. XTC files contain the trajectory but no atoms. TNG files contain atoms and trajectory at the same time, etc. There are two ways of loading data files - simple and advanced.
Loading is performed by either constructor of the System class or by System::load() method. In both cases the same behavior and parameters are used.
When the data file is loaded in Pteros using simple mode the information being loaded depend on the file type and on the current content of the System where the data go. The most "logical" way of adding new data to the system is chosen but there is no fine control about what is read and how. Here is an example:
C++ | Python |
// (1) Initial loading in constructor
System s("somefile.pdb");
// (2) Adding structure file
s.load("other-file.pdb");
// (3) Adding trajectory
s.load("trajectory.xtc",3,20);
// (4) Adding topology
s.load("topol.tpr");
| # Initial loading in constructor
s = System('somefile.pdb')
# (2) Adding structure file
s.load('other-file.pdb')
# (3) Adding trajectory
s.load('trajectory.xtc',3,20)
# (4) Adding topology
s.load('topol.tpr')
|
Since different file types contain different information the logic of adding information to the System becomes rather complicated in non-trivial situations. The following table summarises what is read from different files in different cases.
Initial loading (in constructor or to the empty System)
File type | What is loaded? | Comment |
---|---|---|
PDB, GRO, MOL2 | Atoms and the single frame | MOL2 does not contain periodic box! |
TPR | Atoms, the single frame, full topology | Read only. The file itself should be produced by Gromacs |
TNG | Atoms and multiple frames | |
XTC, TRR, DCD | Fails and raises exception | In Pteros reading trajectory into an empty system is not allowed (in contrast to VMD for example). |
Adding data to the System which is not empty (by System::load())
File type | What is loaded? | Comment |
---|---|---|
PDB, GRO, MOL2 | Single frame | New frame added. The only check is matching number of atoms. |
TPR | Topology | Adds atom types and charges, updates masses, adds other topology information. No coordinates are read, no frames added. |
TNG, XTC, TRR, DCD | Multiple frames | Adds number of new frames. The only check is matching number of atoms. |
In advanced mode the user specifies explicitly what to read and what to store in the system.
In this example we first create three file handlers for corresponding PDB, GRO and XTC files. File handlers are created by static Mol_file::open() method. Handlers could be created for reading (mode 'r') and for writing (mode 'w'). In our case we open them for reading. After that we create an empty system and load data from handlers.
System::load(handler)
reads the single frame. This behavior is intentional and allows reading frames one by one. Mol_file_content().atoms(true).traj(true)
or "Mol_file_content().coord(true).traj(true)" on steps (1) and (2) then trajectory reading will be ignored. This is again intentional to separate frame-by-frame trajectory reading from reading other information. Mol_file_content().atoms(true)
is specified the system is cleared before reading even if it already contained some atoms!Both simple and advanced forms of System::load() support optional user-defined callback function as last parameter. This function takes two argiments: the pointer to parent system and the index of current frame. If the callback returns true the loading of trajctory will continue to the next frame. If it returns false the loading will stop. Such callbacks are useful for organizing simple frame by frame processing of trajectories.
C++ | Python |
bool callback(System* sys, int frame){
cout << "Called for frame " << frame << endl;
}
...
System sys('some-structure.pdb');
sys.load('some-traj.xtc',0,-1,0,&callback)
| def callback(sys,frame):
print 'Called for frame ',frame
...
sys = System('some-structure.pdb')
sys.load('some-traj.xtc', on_frame=callback)
|
C++ | Python |
bool callback(System* sys, int frame){
// Do the job in callback
...
// Delete current frame.
sys->frame_delete(frame,frame);
}
| def callback(sys,frame):
# Do the job in callback
...
# Delete current frame.
sys.frame_delete(frame,frame)
|
Sometimes it is not necessary to read all atoms from the data file. For example if the system contains protein in water quite often only the protein is of interest. However, water may constitute 75% of the whole number of atoms and storing them in the System is redundant. This is especially important if many frames have to be kept in memory. In this case filtering the input may save huge amount of memory.
Pteros has the mechanism of input filtering which helps in such situations. The filter is a special kind of Selection which is set to empty system before any loading occures. All subsequent calls to System::load() will be filtered and only selected atoms will be kept in the system.
Filter is set by System::set_filter() method. It takes the same arguments as ordinary selections with two exceptions: filters can't be coordinate-dependent (they will throw an error during loading) and can't be set by callbacs.
Filter could only be set to empty system (set_filter() throws an error otherwise), thus the correct way of using them is the following:
C++ | Python |
// Create empty system
System sys;
// Set filter
sys.set_filter("name CA");
// All calls to load will use filter now
sys.load("some_protein.pdb");
sys.load("trajectory.xtc");
| # Create empty system
sys = System()
# Set filter
sys.set_filter('name CA')
# All calls to load will use filter now
sys.load('some_protein.pdb')
sys.load('trajectory.xtc')
|
Selections are the most important objects in Pteros which allow manipulating groups of atoms in the System. It is important to understand that selections do not contain any data but merely point to the set of atoms in the System.
In order to select atoms one need to create Selection class in one of the following ways:
C++ | Python |
Selection sel(system, <arguments...>);
| sel = Selection(system, <arguments...>)
|
C++ | Python |
sel = system.select(<arguments...>)
|
C++ | Python |
Selection sel = system(<arguments...>);
// Or using type inference:
auto sel = system(<arguments...>);
| sel = system(<arguments...>)
|
The arguments passed to selection could be the following (only method 2 is used for illustration but any method takes the same variants of arguments):
C++ | Python |
# Defaults to 1st frame in the System
sel = system.select('resname ALA GLY and within 3.0 pbc of resid 23-34 45')
# Explicitly points to frame #3
sel = system.select('resname ALA GLY and within 3.0 pbc of resid 23-34 45', 3)
|
C++ | Python |
ind1 = 10
ind2 = 20
sel = system.select(ind1, ind2)
|
C++ | Python |
vector<int> ind = {1,2,3,56,67,100};
sel = system.select(ind);
| ind = [1,2,3,56,67,100]
sel = system.select(ind)
|
C++ | Python |
void selection_callback(const System& sys, int fr, std::vector<int>& ind);
| def selection_callback(sys, fr, ind)
|
C++ | Python |
// Callback function
void sel_func(const System& sys,int fr,std::vector<int>& ind){
// Just for example we are selecting all atoms with x>5
ind.clear();
for(int i=0;i<sys.num_atoms();++i)
if(sys.XYZ(i,fr)(0)>5.0) ind.push_back(i);
}
...
System s("struct.pdb");
// Callback function is called to fill selection
Selection sel(s, &sel_func);
// The same but for specific frame #3
Selection sel(s, &sel_func, 3);
| # Callback function
def sel_func(sys,fr,ind):
# Just for example we are selecting all atoms with x>5
ind = []
for i in xrange(sys.num_atoms()):
if sys.getXYZ(i,fr)[0]>5.0:
ind.append(i)
...
s = System('struct.pdb')
# Callback function is called to fill selection
sel = s.select(sel_func)
# The same but for specific frame #3
sel = s.select(sel_func, 3)
|
Existing selection objects could be modified in two ways: by changing parent System and by selecting another set of atoms. Parent selection is changed by Selection::set_system() method:
C++ | Python |
System sys1("structure1.pdb");
System sys2("structure2.pdb");
// Create empty selection pointing to sys1
Selection sel(sys1);
// Make it point to sys2
sel.set_system(sys2);
| sys1 = System('structure1.pdb')
sys2 = System('structure2.pdb')
# Create empty selection pointing to sys1
sel = Selection(sys1)
# Make it point to sys2
sel.set_system(sys2)
|
In order to modify content of seelction (the set of selected atoms) the number of Selection::modify() methods is present. They accept the same arguments as corresponding constructors of Selection class. Each method also have the variant where new System is passed as the first argument:
C++ | Python |
# Changes selected atoms
sel.modify('resname ALA GLY')
# The same but also assigns to other system
sel.modify(other_system, 'resname ALA GLY')
# The same but also changes the target frame to frame 3
sel.modify(other_system, 'resname ALA GLY', 3)
|
C++ | Python |
ind1 = 10
ind2 = 20
# Changes selected atoms
sel.modify(ind1, ind2)
# The same but also assigns to other system
sel.modify(other_system, ind1, ind2)
|
C++ | Python |
ind = [1,2,3,56,67,100]
# Changes selected atoms
sel.modify(ind)
# The same but also assigns to other system
sel.modify(other_system, ind)
|
C++ | Python |
// Callback function
void sel_func(const System& sys,int fr,std::vector<int>& ind){
// Just for example we are selecting all atoms with x>5
ind.clear();
for(int i=0;i<sys.num_atoms();++i)
if(sys.x(i,fr)>5.0) ind.push_back(i);
}
...
// Changes selected atoms
sel.modify(sel_func);
// The same but also assigns to other system
sel.modify(other_system, sel_func);
// The same but also changes the target frame to frame 3
sel.modify(other_system, sel_func, 3);
| # Callback function
def sel_func(sys,fr,ind):
# Just for example we are selecting all atoms with x>5
ind = []
for i in xrange(sys.num_atoms()):
if sys[i,fr].x>5.0:
ind.append(i)
...
# Changes selected atoms
sel.modify(sel_func)
# The same but also assigns to other system
sel.modify(other_system, sel_func)
# The same but also changes the target frame to frame 3
sel.modify(other_system, sel_func, 3)
|
C++ | Python |
// Using System::select_all() (the most efficient method)
Selection all = system.select_all();
// Using operator "()" with no arguments (the same as above)
all = system();
// Using textual selection (less efficient)
all = Selection("all");
| # Using System.select_all() (the most efficient method)
all = system.select_all()
# Using operator "()" with no arguments (the same as above)
all = system()
# Using textual selection (less efficient)
all = Selection('all')
|
Selection language in Pteros is similar but not identical to one of VMD. Simple selections could be copy-pasted from VMD to Pteros and vice versa, while more complex selections would be different. Particularly Pteros provides much more advanced selections which include periodic boundary conditions, which are not possible in VMD.
The basic elements of selection language are the following:
The keyword "all" selects all atoms in the system.
Consist of the keyword, which represent property of the atom, followed by one or more values of this property. The values are implicitly combined by logical OR. Depending on the keyword the values are either strings, integers or unsigned integers.
For integers in addition to individual values the ranges are allowed in two forms: 1 to 10
and 1-10
. Both forms are equivalent.
For strings regular expressions could be given in single ('') or double ("") quotes. The systnax of the regular expressions is the same as used in C++11.
Keyword | Type of values | Example |
---|---|---|
name | string | name CA CB "C.*" |
resname | string | resname ALA GLU '.N.*' |
resid | int | resid 1 2 3 4-10 20 to 40 |
resindex | unsigned int | resindex 1 2 3 5 to 100 200-211 |
chain | single character | chain A B C |
tag | string | tag TAG1 TAG2 |
index | unsigned int | index 12 13 45 2-7 |
Consist of a lone keyword, which represent single numeric property of the atom. Numeric properties could be either integer or floats. They could be combined to numerical expressions and compared by numeric comparisons.
Keyword | Property |
---|---|
x | X coordinate of the atom |
y | Y coordinate of the atom |
z | Z coordinate of the atom |
beta | B-factor of the atom |
occupancy | Value of the PDB occupancy field for the atom |
index | Index of the atom |
resindex | Residue index of the atom |
resid | Resid of the atom |
distance | The distance of atom from given point, line or plane. See description of distance selections for details. |
Selection string | Interpretation |
---|---|
"resid 1 2 3 4-10" | interpreted as keyword selection |
"resid > 25" | interpreted as numeric property in numerical expression. |
Numerical expressions consist of numerical properties, integer of float point literals and common arithmetic operators "+", "-", "*", "/". Raising to any integer or fractional power is possible using either "^" or "**" operators. Unary minus could be used to change the sign of expression. Operator precedence could be changed by parentheses. Numerical expressions are only meaningful as operands of numeric comparisons. Single number or single numerical property is also valid numerical expression.
Numerical expressions are compared to each other by the common operators:
Operator | Meaning | Example |
---|---|---|
">" | Greater than | x > 3 |
"<" | Lower than | y < 5.6 |
"=" | Equal | resid = 45 |
"<>" | Not equal | resid <> 14 |
">=" | Greater or equal | index >= 15 |
"<=" | Lower or equal | resindex <= 100 |
The comparisons could be chained to select the ranges:
Selection text | Meaning |
---|---|
3 < x < 10 | Open range |
10 > x >= 3 | Semi-closed range |
100 <= resid <= 200 | Closed range |
Common logical operators "or", "and" and "not" are supported and could be used to construct arbitrary logical expressions:
Within selections allow selecting atoms which are within given cutoff distance around other selection, called "central selection". The syntax of within selections in Pteros is the following:
Distance is in nm.
Optional keywords "pbc" or "periodic" mean that selection takes into account periodic boundary conditions. Keywords "nopbc" or "noperiodic" means that periodicity is switched off. By default periodicity is off.
The optional keyword "self" means that central selection itself is also included into resulting selection. If "noself" is specified the central selection is excluded from result. The default is "self".
Keywords "pbc|nopbc" and "self|noself" can go in any order.
Here are some examples:
Selection text | What is selected |
---|---|
within 3.0 of protein | All atoms within 3.0 nm from any protein atom including the protein itself. Not periodic. |
within 3.0 pbc of protein | Same as above but periodic. |
within 3.0 pbc noself of protein | Same as above but the protein itself is not included. |
The "by residue <central selection>" operator is used to select whole residues, which are referenced by the central selection. This means that if at least one atom of particular residue is present in central selection then the whole residue would be selected.
Distance selections are unique for Pteros and allow selecting atoms by their distance from given reference point, line or plane. Their syntax is the following:
The keyword "dist" and "distance" are synonyms. If "point" is specified than three float point numbers x0, y0, z0 are expected. For "vector" and "distance" six float point numbers x0, y0, z0, x1, y1, z1 are expected.
Optional keywords "pbc" or "periodic" mean that selection takes into account periodic boundary conditions. Keywords "nopbc" or "noperiodic" means that periodicity is switched off. By default periodicity is off.
In the case of "dist vector" (x0,y0,z0) give the point in space and (x1,y1,z1) specify direction of the vector originating from this point (direction is normalized automatically). The distance from the atoms to this vector is evaluated.
In the case of "dist plane" (x0,y0,z0) give the point in space and (x1,y1,z1) specify the normal vector of the plane originating in this point (the normal vector is normalized automatically). The distance from the atoms to this plane is evaluated.
Distance keyword operates on per-atom basis and is treated as numeric property of the atom:
Selection text | What is selected |
---|---|
dist point 12.4 34.3 45.1 < 3.0 | Atoms within 3.0 nm from the point "12.4 34.3 45.1" |
dist point pbc 12.4 34.3 45.1 < 3.0 | The same as above but periodic |
dist vector 2.1 3.3 3.5 1 0 0 < 3.0 | Atoms within the cylinder with radius 3 nm and the axis going from point "2.1 3.3 3.5" along X axis (the direction vector is "1 0 0") |
3.0 < dist plane 2.1 3.3 3.5 1 1 1 < 5.0 | Selects the slabs of atoms which are between 3 and 5 nm from the plane which goes through the point "2.1 3.3 3.5" and has the normal "1 1 1". |
Selections created by means of selection string (text-based selections) are a bit special in Pteros. The set of selected atoms may depends on atomic coordinates (for example for selection x>15
or within 3.0 of y<34
). These cases are recognized automatically and such coordinate-dependent selection are treated in special manner.
Coordinate-dependent selections have to be updated if atomic coordinates change. If the coordinates of atoms were changed manually without changing the current frame this have to be done by hand using Selection::apply() method. If selection is not coordinate-dependent it does nothing.
If the user calls Selection::set_frame() to change the frame selection points to then the coordinate-dependent selection calls apply() automatically to be consistent with new atomic coordinates.
It is also possible to re-parse selection text completely by calling Selection::update(). This is necessary when the number of atoms (not only their coordinate) change. For non-textual selections it does nothing.
In most cases the feature of auto-update is very handy and saves a lot of time allowing not to bother about consistency of the coordinate-dependent selections with the current frame. However in certain cases such behavior is not desirable. For example one selects all water molecules around the protein in starting configuration using "resname SOL and within 3.0 of protein" and want to see how they diffuse in the course of MD simulation. Each update of the coordinate frame made by Selection::set_frame() will select new water shell around protein, which is not what we want. In such case one can call Selection::flatten() method. It "flattens" selection to the plain set of indexes without any auto-update magic.
Text-based status of selection is also lost immediately after any operation which modifies it externally (i.e. not by means of parsing selection string) such as using append(), remove() or invert(), using any logical operators with selections, etc. Current status of selection could be queries by Selection::text_based() and Selection::coord_dependent() methods.
Sub-selections allow selecting atoms inside existing selection (narrowing or refining existing selection in other terms). Sub-selections could be very useful in the following situation. Suppose that we need to create separate selections for N,C,CA and CB atoms of particular protein residue. With "normal" selections the following code could be used:
The problem with this code is that we are looping over all atoms in the system four times, ones in each selection. This is very inefficient since we only need to find our residue with "protein and resid 1" (one loop over all atoms) and then we need to search inside this residue four times (looping over ~10 atoms only). This problem is not apparent for small systems but becomes very painful for the systems with millions of atoms. Subselections solve this problem:
Subselections inherit the system and frame from the parent. The search in sub-selections is performed over selected atoms of the parent only (the only exception from this rule are within selections which involve seacrh over all atoms by design).
Subselections can also accept the pair of local indexes or the vector of local indexes:
Once sub-selection is created it is not distinguishable from the gnormal one. Particularly it could be modifyed as any other selection, reassigned, etc. Sub-selections are only special by the way of their creation.
Selection objects could be combined by logical operations to create new selections:
Logical AND
C++ | Python |
auto new_sel = sel1 & sel2;
| new_sel = sel1 and sel2
|
Logical OR
C++ | Python |
auto new_sel = sel1 | sel2;
| new_sel = sel1 or sel2
|
Logical NOT
C++ | Python |
auto new_sel = ~sel1;
| new_sel = not sel2
|
C++ | Python |
auto new_sel = sel1 - sel2;
| new_sel = sel2 - sel1
|
It is also possible to perform logical operation in place without creating new selections:
In-place operation | Method |
---|---|
Appending one selection to the other | Selection::append() |
Removing atoms from one selection from the other | Selection::remove() |
Invert selection | Selection::invert() |
If selection contains N atoms then they could be accessed by relative indexes from 0 to N-1 inclusive. Relative indexes are NOT the same as atomic indexes. Atom with global index 150 may have relative index 0 in one selection and 23 in the other. Relative indexes are very convenient in accessing properties of selected atoms. For this Selection class contains the set of accessor methods.
Accessor methos in C++ have the same names as the properties of atoms: sel.index(i)
returns global index of atom with relative index i, sel.name(i)
returns name of this atom, sel.resname(i)
returns its residue name, etc.
Accessors x(i)
, y(i)
and z(i)
give access to individual atomic coordinates of the current frame (the frame pointed by selection). The method xyz(i)
returns the vector of all three coordinates. These accessors also accept frame index as optional second parameter: z(i,fr)
, xyz(i,fr)
, etc.
There are also accessors for periodic box associated with current frame pointed by selection (box()
) and the time stamp of the pointed frame (time()
). These methods are not available for arbitrary frames.
There are also special accessor: vdw(i)
which returns the van der Waals radius of i-th atom and element_name(i)
which returns element name based on either element number (if available) or the first name of atom name.
In C++ all of them are accessors inlined l-value functions. This means that they do not add any overhead in accessing atom properties and could be assinged to:
Due to language limitations in Python each accessor have to be prepended by "get" or "set" prefixes for read and write access respectively, which makes them rather clumsy and difficult to use. That is why accessors are not ported to Python and recommended way of accessing atom properties in using indexing (see below).
Selections support indexing by "[]" operator. It returns special Atom_proxy object, which has the same set of atom property accessors as Selection itslef.
In C++ the only difference is that these accessors do not take relative atomic index since Atom_proxy stores it internally.
In Python Atom_proxy objects benefit from the Python properties, which makes the usage of indexing much easier than the direct usage of accessors:
There are two ways to iterate over selected atoms - using relative selection indexes and using iterators.
The first variant uses either accessor methods or indexing operator to get atom properties:
C++ | Python |
Selection sel(system, "name CA CB");
// Usage of accessor methods for each atom property
cout << sel.name(i) << " " << sel.resname(i) << endl;
}
// Usage of indexing operator (slower than property accessors and not recommended in C++!)
cout << sel[i].name() << " " << sel[i].resname() << endl;
}
| sel = Selection(system, 'name CA CB')
# Usage of indexing operator (still not the best way in Python, see below)
for i in xrange(sel.size()):
print sel[i].name, sel[i].resname
|
The second way is using built-in iterators of Selection objects:
C++ | Python |
Selection sel(system, "name CA CB");
// Usage of explicit iterators
Selection::iterator it;
cout << it->name() << " " << it->resname() << endl;
}
// Usage of range-based for loop in C++11 and higher
for(auto& a: sel){
cout << a.name() << " " << a.resname() << endl;
}
| sel = Selection(system, 'name CA CB')
# Recommended way in Python
for a in sel:
print a.name, a.resname
|
There is a set of methods allowing to get particular property of all selected atoms at the same time. These methods have the names "get_<property>()" and "set_<property>()" where property is name, resname, resid, beta, x, y, z, xyz, etc. In C++ getters return either std::vector for strings and char properties or Eigen::Vector for integer and float properties. Setters accept the same vectors as corresponding getters return. It is also possible to call setterss with simgle value which will "fill" all selected atoms. In Python the lists are used for strings and chars and numpy arrays for ints and floats:
C++ | Python |
resids = sel.get_resid()
resids[5] += 4
sel.set_resid(resids)
# Set all atom names to CA
sel.set_name('CA')
|
There are also methods "get_unique_<property>()" which return the vector of unique properties in selection. For example in order to get the list of all residue ids in selection:
C++ | Python |
sel = system.select_all();
auto resids = sel.get_resid();
// May return 1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,...
resids = sel.get_unique_resid();
// Returns 1,2,3,4,5,...
| sel = system.select_all()
resids = sel.get_resid()
#May return 1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,...
resids = sel.get_unique_resid()
# Returns 1,2,3,4,5,...
|
Pteros has reach set of methods for building molecular systems. They are not oriented on particular type of molecules like proteins or nucleic acids but allow manipulating atoms and selections in general.
Adding and deleting atoms in the system is rather expensive operation due to internal data organization. It is much faster to add or delete the list of several atoms at once than doing this by one atom at a time.
In order to add one or more atoms use System::atoms_add() method:
C++ | Python |
vector<Atom> atoms(N);
vector<Vector3f> coords(N);
for(i=0;i<N;++i){
Atom at;
at.name = 'C'
at.resid = i;
atoms.push_back(at);
coords.push_back(Vector3f(x,y,z));
}
Selection added_atoms = system.atoms_add(atoms,ccords);
|
There is a common problem of rearranging the atoms in the system in particular order, which is especially important for preparing MD topologies. In Pteros this is easily acomplished with System::rearrange() method. It takes a vector of selections or the vector of selection strings and arranges the atoms in the order of these selections. The atoms, which do not belong to any of provided selections are placed at the end in their initial order.
Pteros is designed to work with arbitrary triclinic periodic boxes. There are several ways of accounting for periodicity:
Upon reading the data files Pteros sets the periodic box accordingly in the System if it is present in the data file.
Periodic box could be extracted for each frame by using System::Box() method. Here is how to check if the periodic box is present:
C++ | Python |
System s("some_file.pdb");
if( s.Box(0).is_priodic() ) cout << "Box present for frame 0!" << endl;
| s = System('some_file.pdb')
if s.getBox(0).is_periodic():
print 'Box present for frame 0!'
|
An important feature of the Periodic_box is that it stores some internal data (like inverse tranform matrices) and thus individual components of the box are immutable. In order to change the box you have to get it as a 3x3 matrix, edit this matrix and assign it back using Periodic_box::modify() method:
C++ | Python |
System s("some_file.pdb");
// Get a box as matrix. Column 0 is vector a, column 1 is b, column 2 is c.
auto m = s.Box(0).get_matrix();
// Modify individual box element
m(0,1) *= 2;
// Set it back
s.box(0).modify(m);
| s = System('some_file.pdb')
# Get a box as matrix. Column 0 is vector a, column 1 is b, column 2 is c.
box = s.getBox(0).get_matrix()
# Modify individual matrix element
m[0][1] *= 2
# Set modified box object to the System
s.getBox(0).modify(m)
|
You can also get lengths of individual box vectors with Periodic_box::get_extent() or the lengths of all vectors with Periodic_box::get_extents(). It is also possible to extract an individual vector directly with Periodic_box::get_vector().
One common operation is scaling the peridic box without changing the angles between the vectors. This could be achieved with convenience function Periodic_box::scale_vectors():
C++ | Python |
System s("some_file.pdb");
// Scale all box vectors by x2
vector<float> scale = {2,2,2};
s.Box(0).scale_vectors(scale);
| s = System('some_file.pdb')
# Scale all box vectors by x2
s.getBox(0).scale_vectors([2,2,2])
|
It is also possible to get and set the box as a set of vector lengths (a,b,c) and the set of angles between the vectors (a^c, b^c, a^b). This is performed by the Periodic_box::to_vectors_angles() and Periodic_box::from_vectors_angles() methods:
C++ | Python |
System s("some_file.pdb");
vector<float> vecs, angles;
// Get as vectors and angles
s.Box(0).to_vectors_angles(vecs,angles);
// Modify a^b angle
angles[2] *= 2;
// Set it back
s.Box(0).from_vectors_angles(vecs,angles);
| System s('some_file.pdb')
# Get as vectors and angles
vecs,angles = s.getBox(0).to_vectors_angles()
# Modify a^b angle
angles[2] *= 2
# Set it back
s.getBox(0).from_vectors_angles(vecs,angles)
|
There are several ways to wrap atoms or points into the box:
In contrast to wrapping unwrapping is non-trivial operation which could be done in different ways. Pteros supports two unwrapping methods:
In order to compute the periodic distances the following methods could be used:
Periodic_box::shortest_vector() computes the shortest vector between two arbitrary points. Periodic_box::get_closest_image() returns the periodic image of given point, which is closest to the anchor point.
Pteros uses fast grid search algorithms for finding distances between the atoms. There are several variants of the distance seacrh which return either pairs of atoms within given cut-off distance or the list of atoms within given distance from point in space or other set of atoms.
Use this function to search all contacts between the atoms within single selection.
C++ | Python |
// Vector for found pairs
vector<Vector2i> pairs;
// Cut-off distance in nm
float cut_off = 3.5;
// Non-periodic search returning local indexes
search_contacts(cut_off,sel,pairs);
// Periodic search returning global indexes
search_contacts(cut_off,sel,pairs,true,true);
// Same as above but also return distances for each pair
vector<float> distances;
search_contacts(cut_off,sel,pairs,true,true,&distances);
| # Cut-off distance in nm
cut_off = 3.5
# In Python search_contacts() *always*
# returns a tuple (pairs,distances)
# If no distances are requested the 'distances'
# would be equal to None!
# Non-periodic search returning local indexes
# We only take first member of the tuple (the pairs)
pairs = search_contacts(cut_off,sel)[0]
# Periodic search returning global indexes
# We only take first member of the tuple (the pairs)
pairs = search_contacts(cut_off,sel,True,True)[0]
# Same as above but also return distances for each pair
# Passing True in last argument instruct to fill distances with
# values instead of returning it as None.
pairs,distances = search_contacts(cut_off,sel,True,True,True)
|
Use this function to search the contacts between two different selections.
C++ | Python |
// Vector for found pairs
vector<Vector2i> pairs;
// Cut-off distance in nm
float cut_off = 3.5;
// Non-periodic search returning local indexes
search_contacts(cut_off,sel1,sel2,pairs);
// Periodic search returning global indexes
search_contacts(cut_off,sel1,sel2,pairs,true,true);
// Same as above but also return distances for each pair
vector<float> distances;
search_contacts(cut_off,sel1,sel2,pairs,true,true,&distances);
| # Cut-off distance in nm
cut_off = 3.5
# In Python search_contacts() *always*
# returns a tuple (pairs,distances)
# If no distances are requested the 'distances'
# would be equal to None!
# Non-periodic search returning local indexes
# We only take first member of the tuple (the pairs)
pairs = search_contacts(cut_off,sel1,sel2)[0]
# Periodic search returning global indexes
# We only take first member of the tuple (the pairs)
pairs = search_contacts(cut_off,sel1,sel2,True,True)[0]
# Same as above but also return distances for each pair
# Passing True in last argument instruct to fill distances with
# values instead of returning it as None.
pairs,distances = search_contacts(cut_off,sel1,sel2,True,True,True)
|
Use pteros::search_within() function to search all atoms from source selection which are within given distance from any atom of target selection.
C++ | Python |
// Source selection is water
auto source_sel = system("resname SOL");
// Target selection is protein
auto target_sel = system("protein");
// Vector of found atom indexes
// Absolute indexes are returned!
vector<int> result;
float cut_off = 3.5; // in nm
// Find all atoms of source in 3.5 nm shell around target
// not including target itself accounting for periodicity
search_within(cut_off, source_sel, target_sel, result, false, true);
| # Source selection is water
source_sel = system('resname SOL')
# Target selection is protein
target_sel = system('protein')
cut_off = 3.5 # in nm
# Find all atoms of source in 3.nm 5 shell around target
# not including target itself accounting for periodicity
# Absolute indexes are returned!
result = search_within(cut_off, source_sel, target_sel, False, True)
|
There is a common situation when one needs to find, for example, water molecules around multiple protein residues or around head group of each lipid in the membrane. In this scenario the source selection (water) is always the same, while the target selection changes and multiple searches have to be done. Usage of pteros::search_within() function is not optimal in this case bacause it will redundantly distribute source atoms to the same grid multiple times. In this case the Distance_search_within provides much better performance. This class is initialized with source selection and then allows to search around different targets efficiently multple times. It also allows searching around fixed points in space.
C++ | Python |
// Source selection for water
auto water = system("resname SOL");
// Multiple target selections for 20 protein residues
vector<Selection> residues;
for(int i=1; i<20; i++){
residues[i].modify(system,"protein and resid "+to_string(i));
}
// Vector of vectors for results
vector<vector<int>> results(20);
// Initialise for periodic search and to return local indexes
Distance_search_within seacrher(3.5, water, false, true);
// Search for each residue
for(int i=1; i<20; i++){
searcher.seacrh_within(residues[i], results[i]);
}
| # Source selection for water
water = system('resname SOL')
# Multiple target selections for 20 protein residues
residues = []
for i in xrange(20):
sel = system("protein and resid %i" % (i+1))
residues.append(sel);
# Storage for results
results = []
# Initialise for periodic search and to return local indexes
searcher = Distance_search_within(3.5, water, False, True)
# Search for each residue
for i in xrange(20):
res = searcher.search_within(residues[i+1]);
results.append(res);
|
Sometimes one does not need to do a distance searcg but want to distribute the atoms from given selection into spatial grid. This is useful for computing 3D density distributions, occupancy volumetric hystograms, velocity fields, etc. This task is perfromed by Grid class.
Grid cells are of type vector<Grid_element>
and contain information about indexes and coordinates of atoms in each cell.
Pteros is able to compute the short-range non-bond energies if topology file is read for the system. Currently Pteros reads Gromacs tpr run input files to extract topology and some energy evaluation options from them.
Treatment of non-bonded interactions in Gromacs is, unfortunately, complex and rather confusing. Coulomb and van-der-Waals interactions have a type and a modifier. The type is the functional form of interaction and the modifier is a special function, which is used to avoid the discontinuity of derivatives at cut-off distance. In practice different combinations of types and modifiers in Gromacs .mdp files lead to non-trivial and confusing choises of actual interaction functions. Pteros is trying to match these complex rules as close as possible but it is not guaranteed that the results would be numerically identical.
All energy evaluation methods return an object of type Eigen::Vector2f where the first number if Coulomb energy and the second is Van der Waals energy.
Before evaluating non-bond energies the user should always check if the System has correctly set force field parameters by calling System::force_field_ready(). If it returns true
than it is safe to call energy functions.
Selection class provides two higher-level methods of energy evaluation. The first one computes self-energy of selection within given distance cut-off. The second computes the energy between two selection of the same size within given cut-off for given frame.
C++ | Python |
Selection sel1(sys,"name CA");
Selection sel2(sys,"name CB");
// Check if it is safe to compute energies
if(sys.force_field_ready()){
// Energy of all atom pairs within 0.3 nm accounting for periodicity
// Interaction between two selections within default cutoff for frame 3
// and accounting for periodicity
auto en12 = non_bond_energy(sel1, sel2, 0, 3, true);
}
| sel1 = sys("name CA")
sel2 = sys("name CB")
# Check if it is safe to compute energies
if sys.force_field_ready():
# Energy of all atom pairs within 0.3 nm accounting for periodicity
self_en = sel1.non_bond_energy(0.3, True)
# Interaction between two selections within default cutoff for frame 3
# and accounting for periodicity
en12 = non_bond_energy(sel1, sel2, 0, 3, True)
|
Pteros uses DSSP method for determining secondary structure of proteins. Selection::dssp() writes detailed DSSP report to the file or to the stream if file name of stream object are provided as an argument. If called without arguments the DSSP string is returned with letter-code for secondary structure. The encoding is the same as in original DSSP program:
Structure | code |
---|---|
alphahelix | H |
betabridge | B |
strand | E |
helix_3 | G |
helix_5 | I |
turn | T |
bend | S |
loop | ' ' (space) |
Solvent accessible surface area (SASA) is computed in Pteros using two different algorithms: