Pteros
2.0
Molecular modeling library for human beings!
|
Although System and Selection classes already provide quite high-level tools for building custom analysis programs, Pteros contains even more advanced facilities for rapid implementation of complex analysis algorithms. When you build your custom analysis program, it is usually painful and repetitive to implement the following things:
It is necessary to emphasize an importance of parallel processing. MD trajectories are often huge (up to ~100Gb) and reading them from disk tipically takes many minutes, especially if the storage is slow or even non-local. If you have 5 different anaysis tasks, which should be applied to the same trajectory it is very wasteful to run them sequntially and to read the whole trajectory five times. It is much more logical to read the trajectory only ones and execute all your tasks in parallel for each frame. By doing this you will also utilize the power of you modern multi-core processor effectively.
Another scenario is the analysis task, which does not depend on frame processing order. Typical example is averaging of some property over the frames. In such case it is tempting to spawn several instances of analys task, each running in parallel in its own thread, and dispatch frames to them from the shared queue without any particular order. This again utilizes all cores of your processor and can make the analysis much faster.
All these routine operations in Pteros are incapsulated into the Trajectory_reader class. The logic of using this class is the following. You supply it with the set of options (the trajectory to read, which frames to include into the analysis, etc). In addition you create a number of task objects, which represent separated analysis tasks, and connect them to the Trajectory_reader. After that you run the reader. It launches all supplied tasks in separate parallel threads (or spawns many instances of the parallel task), reads the trajectory frame by frame in yet another thread and passes the frames to each of the tasks for user-defined processing. In such scenario the speed of trajectory processing is limited by either the slowest task or the trajectory file IO (whatever is slower).
Although the framework of trajectory reader and tasks is a high-level concept by itself it is used to provide even higher-level abstractions - the Analysis plugins.
Trajectory_reader class incapsulates all low-level details of reading trajectories including reading intervals of frames or times, skipping frames, etc. Its typical usage is the following:
This program could be called with the following command line (see Processing command-line options for details):
All analysis tasks in Pteros are subdivided into serial and parallel.
The objects of serial tasks exist as single instances, which run in their own thread of execution. The task is "serial" in the sence that inside the task code all operations are sequential and executed one by one in defined order. Trajectory_reader may run as many serial tasks as you want at the same time.
The parallel tasks behave differently. The Trajectory_reader may run only one parallel task and nothing else. The parallel task spawns as many instances of itself as there are cores in your processor. Each trajectory frame is dispatched to one of the instances only with no particular order. At the end all instances merge their results to the master instance, which is combining the results. Parallel tasks work much like parallel for loops over trajectory frames.
Regardless of the task type you have to override three methods: pre_process(), process_frame() and post_process(). For parallel tasks additional methods before_spawn() and collect_data() have to be overriden.
Tasks are created by the macro TASK_SERIAL or TASK_PARALLEL:
The moments when these methods are called are different for serial and parallel tasks:
Method | Serial task | Parallel task |
---|---|---|
pre_process | Called by each task on the first valid trajectory frame | Called on the frame which is consumed first by particular task instance. It is not predictable which frame is this. This is a place to set up selections for this particular task instance. |
process_frame | Called on each valid trajectory frame. For the first frame is called just after pre_process(). Frames are processed in order. | Called by task instances in parallel on different frames. Frames are dispatched to instances with no particular order on the "first come first served" basis. |
post_process | Called after last valid trajectory frame for each task. | Called by each task instance after consuming last available frame for this instance. It is not predictable which frame is this. |
collect_data | N/A | Called by master instance only when the trajectory is complete. All other instances are passed in tasks vector. The purpose is to merge local data of all instances together in the master instance for final processing. |
before_spawn | N/A | Called by master task instance before spawning other instances and before consuming any frames. This is a place to process task options, set up jump remover, etc.
|
Parallel tasks require additional attention when setting up selections. When "normal" serial task is executed it possesses an independent System in the member variable called 'system'. Any selections which are the members of your task class are created based on this system in the pre_process() method. All this is straightforward and easy to understand. When parallel task is executed the things become more complex. When multiple instances of your task class are spawned each of them will have its own 'system' variable. This means that any selection, which was made before spawning will be copyed to all task instances but they will still bind to the 'system' of the master instance! As a results you'll have a lot of fun trying to debug misterious errors and crashes.
Trajectory_reader takes an Options object, which provides mandatory built-in command-line options that specify which trajectory to read and which frames from it to accept. Default options are the following:
Option flag | Meaning | Comment |
---|---|---|
-help | Show help on usage and options | Could be extended for custom options |
-f | List of files to read | \i Example: -f sturcture.pdb traj1.xtc traj2.xtc The list may include
Files may appear in any order, but trajectory files will be processed in the order of their appearance. |
-b | Beginning of processing (starting frame or time) | Default: 0 (fisrt frame of trajectory). Value suffixes accepted. |
-e | End of processing (end frame or time) | Default: -1 (last frame of trajectory). Value suffixes accepted. |
-t0 | Custom starting time | Default: -1 (use value from the first frame). Value suffixes accepted. Useful when trajectory does not contain time stamps or if the starting time is incorrect.
|
-dt | Custom time step | Default: -1 (use value from trajectory). Useful when trajectory does not contain time stamps.
|
-log | Prints logging information on each n-th processed frame | Default: -1 (no logging). |
-buffer | Number of frames, which are kept in memory of Trajectory_reader | Default: 10. You only need to decrease this if individual frames are very large. |
-task | Name of the analysis task | No default. Name of the task is used to load corresponding compiled analysis plugin or python plugin. Options, which follow "-task name" are forwarded to the corresponding task only and will not be visible in other tasks or Trajectory_reader. Many tasks could be specified. |
Options -b, -e and -t0 could be specified with the following convenience suffixes:
Suffix | Example | Meaning |
---|---|---|
(no suffix) | value is in frames | |
fr | 10fr | value is in frames |
t | 10t | value is time in picoseconds (value used as is) |
ps | 15ps | value is time in picoseconds (value used as is) |
ns | 100ns | value is time in nanoseconds (value multiplied by 10^3) |
us | 2us | value is time in microseconds (value multiplied by 10^6) |
ms | 4ms | value is time in milliseconds (value multiplied by 10^9) |
The process_frame() and post_process() methods of analysis tasks are called with Frame_info argument, which contains metadata of the currently processed frame:
Property | Meaning |
---|---|
absolute_frame | Counts all frames (either consumed or skipped) in trajectory starting from 0 |
absolute_time | Current time stamp |
valid_frame | Counts only valid frames (the frames, which fall into specified time and frame range) and are sent to processing. Starts from 0. |
first_time | Time of the first processed valid frame |
last_time | Time of the last processed valid frame. During the processing is set to absolute_time. Meaningfull for post-processing only. |
first_frame | First processed valid frame (an absolute value) |
last_frame | Last processed valid frame (an absolute value). During the processing is set to absolute_frame. Meaningfull for post-processing only. |
Each analysis task is derived from Task_base class under the hood and contains the following "magic variables":
Variable | Meaning |
---|---|
system | the System allocated locally for this task containg currently processing frame as frame #0. Changes in this system are local does not propagate to other serial tasks or instances of parallel tasks. |
log | logger object, which is local for current task. Logger uses unique name formed by the name of task class and its id. |
options | Options object with command-line options provided for this particular task. See Processing command-line options for details. |
jump_remover | Jump_remover object of the current task. See Removing jumps over periodic boundaries for details. |