Pteros
2.0
Molecular modeling library for human beings!
Suppose that the Pteros library is compiled and all linking requirements are satisfied. To start using Pteros in your program you need a single include statement:
This includes the basic classes of the library. All classes and functions of the Pteros library are defined inside the "pteros" namespace. The following line allows you to omit the repetitive "pteros::" prefix in your program:
The fundamental objects in Pteros are systems and selections. A system is what its name suggests: the whole molecular system with atoms and their coordinates (plus optional information about bonds, periodic box and force field parameters). The attributes and the coordinates of atoms are physically stored in the system. Typically the system is loaded from one or several files (such as PDB or GRO). The system can contain several sets of coordinates, called frames, which are typically loaded from trajectory files of MD simulations (such as XTC or DCD). The system can be created in two ways:
The method load has a rich set of additional options and can be called several times to add different pieces of data from several structure and trajectory files. See Advanced loading with file handlers for details.
Atoms and frames stored inside the system are usually not accessed directly. The system is only a container for them, while all manipulations are done by means of selections.
A selection is a subset of atoms in the system. A selection does not hold any data, but merely points to a particular group of atoms. There are several ways of creating a selection: from a textual description, from a sequence of indexes, by using custom selection functions, etc. (see here for details). Every selection is associated with one and only one system. You can't select atoms from several systems simultaneously.
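To make the container/view relationship concrete, here is a minimal standalone sketch. It does not use Pteros: `ToyAtom`, `ToySystem` and `ToySelection` are hypothetical names invented for illustration, and the real classes are considerably richer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration only; NOT the Pteros classes.
struct ToyAtom {
    std::string name;
    int resid;
};

// The system physically owns all atom data.
struct ToySystem {
    std::vector<ToyAtom> atoms;
};

// The selection owns no atom data: it stores a pointer to exactly one
// parent system plus the indexes of the atoms it refers to.
class ToySelection {
public:
    ToySelection(ToySystem& sys, std::vector<std::size_t> indexes)
        : sys_(&sys), indexes_(std::move(indexes)) {
        for (std::size_t i : indexes_) assert(i < sys_->atoms.size());
    }

    std::size_t size() const { return indexes_.size(); }

    // i counts atoms inside the selection, not inside the system.
    ToyAtom& atom(std::size_t i) { return sys_->atoms[indexes_.at(i)]; }

    // Index in the parent system of the i-th selected atom.
    std::size_t system_index(std::size_t i) const { return indexes_.at(i); }

private:
    ToySystem* sys_;
    std::vector<std::size_t> indexes_;
};
```

Because the selection only points into the system, a change made through `atom(i)` is visible to every other selection that refers to the same atom.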
To create a selection from a textual description you must supply the parent system and a selection string. The selection syntax in Pteros is very similar to the one used in VMD.
Instead of giving a long and boring formal description of the selection syntax (available here), let's learn it by example:
Textual selections are "smart" in the sense that the selection text is analysed and optimized in numerous ways before evaluation. If a selection depends on the coordinates of atoms, it updates automatically when the coordinates change:
It is also possible to make a selection from a pair of indexes or a pair of iterators of some integer sequence:
Finally, if you want to implement really complex atom selection logic, you can use a selection with callbacks:
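The idea behind callback-based selection can be sketched without Pteros as follows. The names `Point3` and `select_by_callback`, as well as the callback signature, are assumptions made for this illustration and are not the library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Point3 { double x, y, z; };

// Hypothetical helper: returns the indexes of all points for which the
// user-supplied callback returns true. The callback receives the point
// and its index, so arbitrary selection logic can be plugged in.
std::vector<std::size_t> select_by_callback(
    const std::vector<Point3>& coords,
    const std::function<bool(const Point3&, std::size_t)>& keep) {
    assert(keep != nullptr);  // an empty callback cannot be called
    std::vector<std::size_t> picked;
    for (std::size_t i = 0; i < coords.size(); ++i) {
        if (keep(coords[i], i)) picked.push_back(i);
    }
    return picked;
}
```

A lambda such as `[](const Point3& p, std::size_t) { return p.y > 1.0; }` would then select every point above the plane y = 1.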
You can also create empty selections and populate them later using the modify() methods:
Different modify() methods exist, which correspond to the other types of selections: for a pair of indexes, for a pair of iterators, for a callback function, etc.
Selections can be copied and assigned; in particular, they can be placed in STL containers:
When one selection is assigned to another, a deep copy of the selection (not just a reference!) is created.
Systems are also copyable, but with one important twist: associated selections are not copied with the parent system.
We can do a lot of different things with selections. Let's start by obtaining the residue names of all selected atoms as an STL vector:
We can also obtain any property of a particular selected atom with very simple syntax. For example, let's print the chain and the resid of the first atom in the selection:
Note that the first atom in the selection is not necessarily the first atom in the system! If you want to know the index of this atom in the system you should do:
In fact the code "sel1.Chain(0)" above is equivalent to the verbose expression:
This fragment will not compile because Selection::index is private, but in any case the shorthand function Selection::Chain() simplifies things a lot. Such shorthand functions are inlined, so in principle there is no performance loss. Other atom attributes can also be accessed by means of such functions, whose names coincide with the attribute name but start with a capital letter (Name(i), Chain(i), Index(i), X(i), etc.). The main attributes of the atoms are:
Now let's play with the coordinates of atoms. First of all, let's load a molecular dynamics trajectory into the system:
The trajectory should contain the same number of atoms as the system. The XTC, TRR and DCD trajectory files are currently supported.
It is also possible to read only a certain portion of the trajectory, say between frames 10 and 100:
Selections are always born pointing to frame 0 (frame count starts from zero). Let's make the selection point to frame 3:
Now we can obtain the coordinates of a particular atom i for frame 3:
or the coordinates of all atoms in the selection for frame 3 as:
It is also possible to get the coordinates for any frame by supplying a second parameter:
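The notion of a selection "pointing to" a frame can be illustrated with a small standalone model. This is a sketch under stated assumptions, not Pteros code: `Coord`, `MultiFrameSystem` and `FrameAwareSelection` and their method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Coord { double x, y, z; };

// Hypothetical toy model, not the Pteros classes.
struct MultiFrameSystem {
    // frames[f][a] is the position of atom a in frame f
    std::vector<std::vector<Coord>> frames;
};

class FrameAwareSelection {
public:
    FrameAwareSelection(const MultiFrameSystem& sys,
                        std::vector<std::size_t> indexes)
        : sys_(&sys), indexes_(std::move(indexes)), frame_(0) {}  // starts at frame 0

    // Make the selection point to another frame of the parent system.
    void set_frame(std::size_t fr) {
        assert(fr < sys_->frames.size());
        frame_ = fr;
    }
    std::size_t get_frame() const { return frame_; }

    // Coordinates of selected atom i in the current frame.
    const Coord& xyz(std::size_t i) const { return xyz(i, frame_); }

    // Coordinates of selected atom i in an explicitly given frame.
    const Coord& xyz(std::size_t i, std::size_t fr) const {
        return sys_->frames.at(fr).at(indexes_.at(i));
    }

private:
    const MultiFrameSystem* sys_;
    std::vector<std::size_t> indexes_;
    std::size_t frame_;
};
```

The one-argument accessor uses the frame the selection currently points to, while the two-argument overload reads any stored frame without changing that state.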
Another way of getting the properties of atoms and coordinates in a selection is using the indexing syntax:
This code will output the resid of atom 0 and the coordinates of atom 3 for frame 10. Usually the indexing syntax is less convenient and more verbose; however, it has the big advantage of working in iterator-based or range-based loops:
One can also duplicate frames, copy one frame to another and delete frames. Note that this is done by methods of the System class:
Pteros provides a rich set of geometry transformation functions. A transformation applied to a selection immediately takes effect on all selections that overlap with it. Let's look at some examples:
It is very easy to compute the RMSD between two selections of the same size (they can belong to different systems):
We can also do this for arbitrary frames:
It is possible to compute the RMSD between different frames of the same selection:
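For reference, the quantity involved is the root-mean-square deviation over paired atoms. The following standalone sketch computes it for two coordinate sets of equal size; `Vec3` and `rmsd` are hypothetical names introduced here, and the sketch assumes a plain, unweighted RMSD without prior fitting.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Unweighted RMSD between two equally sized coordinate sets:
// sqrt( (1/N) * sum_i |a_i - b_i|^2 ). No fitting is performed.
double rmsd(const std::vector<Vec3>& a, const std::vector<Vec3>& b) {
    assert(a.size() == b.size());  // atoms are compared pairwise
    assert(!a.empty());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double dx = a[i].x - b[i].x;
        const double dy = a[i].y - b[i].y;
        const double dz = a[i].z - b[i].z;
        sum_sq += dx * dx + dy * dy + dz * dz;
    }
    return std::sqrt(sum_sq / static_cast<double>(a.size()));
}
```

The same function covers all three cases in the text, because it only needs two coordinate arrays: they may come from two systems, from two selections at arbitrary frames, or from two frames of one selection.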
To do RMSD fitting of two selections of the same size it is enough to write:
However, the most common situation is when you are fitting together, say, Ca atoms, but need to rotate the whole protein according to this fitting. This is accomplished by computing the fitting transformation first and then applying it:
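The two-step pattern (compute the transformation from a subset, then apply it to everything) can be sketched in a deliberately simplified planar form. This is not the Pteros implementation: real fitting works in 3D, whereas the sketch below uses the closed-form least-squares rotation available in 2D. All names (`Vec2`, `RigidTransform2D`, `fit_transform`, `apply_transform`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { double x, y; };

// Rotation about the mobile centroid followed by a shift to the reference centroid.
struct RigidTransform2D {
    double cos_t, sin_t;
    Vec2 from_center;  // centroid of the mobile subset
    Vec2 to_center;    // centroid of the reference subset

    Vec2 apply(const Vec2& p) const {
        const double dx = p.x - from_center.x;
        const double dy = p.y - from_center.y;
        return Vec2{cos_t * dx - sin_t * dy + to_center.x,
                    sin_t * dx + cos_t * dy + to_center.y};
    }
};

Vec2 centroid(const std::vector<Vec2>& pts) {
    assert(!pts.empty());
    Vec2 c{0.0, 0.0};
    for (const Vec2& p : pts) { c.x += p.x; c.y += p.y; }
    c.x /= static_cast<double>(pts.size());
    c.y /= static_cast<double>(pts.size());
    return c;
}

// Step 1: least-squares rigid transform mapping the mobile subset onto the reference.
RigidTransform2D fit_transform(const std::vector<Vec2>& mobile,
                               const std::vector<Vec2>& reference) {
    assert(mobile.size() == reference.size());
    const Vec2 cm = centroid(mobile);
    const Vec2 cr = centroid(reference);
    double dot = 0.0, cross = 0.0;
    for (std::size_t i = 0; i < mobile.size(); ++i) {
        const double px = mobile[i].x - cm.x, py = mobile[i].y - cm.y;
        const double qx = reference[i].x - cr.x, qy = reference[i].y - cr.y;
        dot += px * qx + py * qy;
        cross += px * qy - py * qx;
    }
    // The angle maximizing the overlap of the centered point sets.
    const double theta = std::atan2(cross, dot);
    return RigidTransform2D{std::cos(theta), std::sin(theta), cm, cr};
}

// Step 2: apply the transform obtained from the subset to every atom.
void apply_transform(const RigidTransform2D& t, std::vector<Vec2>& all_atoms) {
    for (Vec2& p : all_atoms) p = t.apply(p);
}
```

Separating the two steps is what makes it possible to fit on a few atoms and still move the whole molecule rigidly with them.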
Although Pteros is a C++ library, many molecular analysis tasks require writing simple "throw-away" scripts without the edit-compile-run overhead of C++. Python bindings serve this purpose in Pteros. In addition to this end-user application, the Python bindings are also a vital part of the Very-high-level facilities system.
Bindings are described in a dedicated documentation page.
Although the System and Selection classes already provide quite high-level tools for building custom analysis programs, Pteros contains even more advanced facilities for rapid implementation of complex analysis algorithms. When you build your custom analysis program, it is usually painful to implement the following things:
It is necessary to emphasize the importance of parallel processing. MD trajectories are often huge (up to ~100 GB) and reading them from disk typically takes many minutes, especially if the storage is not local. If you have 5 different analysis tasks that should be applied to the same trajectory, it is very wasteful to run them sequentially and to read the whole trajectory five times. It is much more logical to read the trajectory only once and execute all your tasks in parallel for each frame. By doing this you will also utilize the power of your modern multi-core processor effectively.
All these routine operations in Pteros are encapsulated in the Trajectory_processor class. The logic of using this class is the following. You supply it with a set of options (the trajectory to read, which frames to include in the analysis, etc.). In addition, you create a number of Consumer objects, which represent separate analysis tasks, and connect them to the Trajectory_processor. After that you run the processor. It launches all supplied tasks in separate parallel threads, reads the trajectory frame by frame and passes the frames to each of the tasks for user-defined processing.
Let's write a simple example that computes the average minimal and maximal coordinates of a selection along the trajectory using the power of Trajectory_processor. First of all we need to subclass the Consumer class:
All the logic of our analysis should be implemented in three virtual methods: pre_process(), process_frame() and post_process(). The names are self-explanatory. Let's implement them:
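The three-method life cycle and the "read once, feed every task" idea can be sketched in a standalone form. This is only an illustration under stated assumptions: `ToyConsumer`, `MinMaxAverager`, `ToyFrame` and `run_once` are hypothetical, the method signatures are simplified, frames hold a single coordinate per atom, and the consumers are called sequentially rather than in separate threads as Trajectory_processor does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One frame reduced to a single coordinate per atom (assumption for brevity).
using ToyFrame = std::vector<double>;

// Hypothetical stand-in for a consumer interface; not the Pteros Consumer class.
class ToyConsumer {
public:
    virtual ~ToyConsumer() = default;
    virtual void pre_process() = 0;                        // called once before the first frame
    virtual void process_frame(const ToyFrame& frame) = 0; // called for every frame
    virtual void post_process() = 0;                       // called once after the last frame
};

// Averages the per-frame minimum and maximum over the whole trajectory.
class MinMaxAverager : public ToyConsumer {
public:
    void pre_process() override {
        sum_min_ = 0.0;
        sum_max_ = 0.0;
        n_frames_ = 0;
    }
    void process_frame(const ToyFrame& frame) override {
        assert(!frame.empty());
        const auto mm = std::minmax_element(frame.begin(), frame.end());
        sum_min_ += *mm.first;
        sum_max_ += *mm.second;
        ++n_frames_;
    }
    void post_process() override {
        assert(n_frames_ > 0);
        avg_min = sum_min_ / static_cast<double>(n_frames_);
        avg_max = sum_max_ / static_cast<double>(n_frames_);
    }

    double avg_min = 0.0;
    double avg_max = 0.0;

private:
    double sum_min_ = 0.0;
    double sum_max_ = 0.0;
    std::size_t n_frames_ = 0;
};

// Walks the trajectory exactly once and hands each frame to every consumer.
void run_once(const std::vector<ToyFrame>& trajectory,
              const std::vector<ToyConsumer*>& consumers) {
    for (ToyConsumer* c : consumers) c->pre_process();
    for (const ToyFrame& frame : trajectory) {
        for (ToyConsumer* c : consumers) c->process_frame(frame);
    }
    for (ToyConsumer* c : consumers) c->post_process();
}
```

However many consumers are attached, the outer loop over frames runs a single time, which is the property that avoids re-reading a large trajectory for each task.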
Now we can write a main program for our small analysis utility:
Now three tasks, operating on different selections, will run in parallel while reading the trajectory. But wait, what trajectory are we going to read? This is specified at run time by the command line arguments:
In our case we specify the -f, -b and -e arguments, which are absorbed internally by Trajectory_processor. Trajectory_processor looks at the list of files specified after -f and finds a structure file (some-protein.pdb in our case). This file is loaded into the "system" variable of all our tasks. Then Trajectory_processor reads all trajectories one by one in order of appearance and calls our tasks for frame processing. Processing starts at frame 14 and ends when the time stamp of the current frame becomes larger than 250 ps. All this complex logic is completely encapsulated by the Trajectory_processor class, which saves you a lot of time.
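The windowing rule described above (start at a given frame index, stop once the time stamp exceeds a limit) can be written down as a small standalone sketch. `FrameWindow` and `frames_to_process` are hypothetical names; how Trajectory_processor implements this internally is not shown in this text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of the processing window.
struct FrameWindow {
    std::size_t first_frame;  // index of the first frame to process
    double last_time_ps;      // stop once a frame's time exceeds this value
};

// Returns the indexes of the frames that fall inside the window,
// given the time stamp (in ps) of every frame in reading order.
std::vector<std::size_t> frames_to_process(const FrameWindow& w,
                                           const std::vector<double>& times_ps) {
    std::vector<std::size_t> picked;
    for (std::size_t i = 0; i < times_ps.size(); ++i) {
        // Sanity check for this sketch: time stamps must not go backwards.
        assert(i == 0 || times_ps[i] >= times_ps[i - 1]);
        if (times_ps[i] > w.last_time_ps) break;      // end of window reached
        if (i >= w.first_frame) picked.push_back(i);  // frames before the start are skipped
    }
    return picked;
}
```

With frames written every 10 ps starting at 0 ps, a window of first frame 14 and end time 250 ps selects frames 14 through 25.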
As you noticed, we hard-coded the selection texts in our code. Let's make our program more flexible. We will modify it to take multiple additional arguments like this:
This is surprisingly simple:
In fact we only need three lines to process our additional options!
Pteros provides even higher-level facilities for developing custom trajectory analysis algorithms: the analysis plugins. An analysis plugin is a class with a very simple interface derived from Consumer, which runs in parallel while the trajectory is being read. In contrast to a plain Consumer, analysis plugins are loaded and executed by a dedicated driver program, so you don't need to bother with initialization of Trajectory_processor, passing parameters and other 'housekeeping' code. The most exciting thing about analysis plugins is that they can be written either in C++ or in pure Python using an almost identical API and integrated seamlessly.
C++ analysis plugins run at the same speed as manually written programs that use Trajectory_processor and Consumer: there is no run-time overhead (except the initial searching and loading of plugins, which is usually negligible). The driver script only connects the consumers with Trajectory_processor from the Python side, and after that no Python code is evaluated at all.
Pure Python tasks are, of course, limited by the speed of the Python interpreter, but they are extremely easy to write and to modify. In general, pure Python tasks that mostly call compiled Pteros methods are also very fast.
In the /bin directory of your Pteros installation you can find the pteros_analysis.py executable Python script, which is the driver for analysis plugins. All plugins (both C++ and Python) are stored in the python/pteros_analysis_plugins directory. Any shared library (*.so) or Python file (*.py) that appears there is treated as a plugin and can be loaded by the driver.
The driver script is called like this (split over several lines for clarity):
The driver loads the specified structure file and reads the provided trajectory frame by frame in the given range of frames. On each frame all specified tasks are called.
Thus all Python tasks run in parallel with C++ tasks but sequentially with respect to each other (this is an unfortunate limitation of the Python multithreading model). In practice you are unlikely to run many time-consuming Python tasks simultaneously, so this should not be a serious performance limitation. In any case, if the execution speed becomes a problem it is better to implement the plugin in C++.
Passing the parameter -plugin_file to any task allows loading plugins from any non-standard location by the relative or absolute path.
You are free to run multiple instances of the same task with different parameters; all of them will be executed separately and will not interfere with each other.
Let's write a simple pure Python analysis plugin that computes the center of mass of a given selection. Put the following code into the file example_plugin.py and place it in the directory python/pteros_analysis_plugins of your Pteros installation:
Any Python plugin should define a class Task with three methods: pre_process(), post_process() and process_frame(). The signatures of the methods are evident from the code. Such a class gets the "magic" variables system, label and options. The system variable is a reference to the underlying system object, while label is a textual label "<TaskName>_id<N>", where TaskName is the name of the analysis plugin and N is the unique number of the task. The label is handy when you need a unique name for the output file, which will never clash with the output of other tasks running in parallel.
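The label format described above is simple enough to reproduce in a standalone helper. The function `make_task_label` is hypothetical and only mirrors the "<TaskName>_id<N>" pattern from the text; the driver builds the real label for you.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the "<TaskName>_id<N>" label pattern.
std::string make_task_label(const std::string& task_name, int task_number) {
    assert(!task_name.empty());
    assert(task_number >= 0);
    return task_name + "_id" + std::to_string(task_number);
}
```

Two instances of the same plugin with different numbers therefore get different labels, so output files named after the label do not overwrite each other.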
The variable options contains the options that correspond to this particular task. We use it to extract the options selection and mass_weighted.
The rest of the code is self-explanatory.
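The numerical core of such a plugin, a center computed with or without mass weighting, can be sketched independently of Pteros. `Particle`, `Center` and `compute_center` are hypothetical names; the boolean flag plays the role of the mass_weighted option mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { double x, y, z, mass; };
struct Center { double x, y, z; };

// Weighted average of positions. With mass_weighted == false every atom
// gets weight 1, which gives the geometric center instead of the center of mass.
Center compute_center(const std::vector<Particle>& atoms, bool mass_weighted) {
    assert(!atoms.empty());
    Center c{0.0, 0.0, 0.0};
    double total_weight = 0.0;
    for (const Particle& p : atoms) {
        const double w = mass_weighted ? p.mass : 1.0;
        c.x += w * p.x;
        c.y += w * p.y;
        c.z += w * p.z;
        total_weight += w;
    }
    assert(total_weight > 0.0);  // masses must not sum to zero
    c.x /= total_weight;
    c.y /= total_weight;
    c.z /= total_weight;
    return c;
}
```

For two atoms at x = 0 (mass 1) and x = 2 (mass 3), the mass-weighted center lies at x = 1.5 and the unweighted one at x = 1.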
Let us implement the same plugin in C++. It looks almost the same and has very little syntactic overhead:
We inherit a class from Compiled_plugin_base and override the same methods pre_process(), post_process() and process_frame(). In C++ we need to add an explicit constructor, which initializes the base class. The rest of the code is an almost 1-to-1 translation of the Python example given above.
The crucial point is the macro CREATE_COMPILED_PLUGIN(PLUGIN_NAME), which does all the magic for us. Behind the scenes it creates the code for a compiled Python extension module. After compilation and linking we get the center.so file, which is loaded by the driver program in the same way as our pure Python plugin.
The macro PLUGIN_NAME comes from the CMake build script, but you can define your own name in the code (in this case you should also make sure that the build system produces a shared library with the appropriate name!).
The simplest way to build and install your plugin is to use the template CMake project located in the /template_plugin directory of the Pteros source tree.