
The ProFAX project and the protein folding algorithm based on Monte Carlo

Protein molecule structure

PH: hhmi.org

The connection between amino acid sequence and protein structure, known as the protein folding problem, is a central issue in the post-genomic era (Khoury, G. A., Smadbeck, J., Kieslich, C. A., & Floudas, C. A. (2014). Protein folding and de novo protein design for biotechnological applications. Trends in Biotechnology, 32(2), 99-109). Protein structure prediction aims to accurately determine the full 3-dimensional structure of a protein given only its amino acid sequence, exploiting energetic and geometrical features. This knowledge is crucial for the development of new pharmaceutical therapies. Ab initio protein structure prediction, the approach taken by the algorithm we are working on, starts from an amino acid sequence whose 3-dimensional structure is unknown and can produce new usable protein templates, which is important for protein design and for advances in biotechnology and drug discovery. The data size and complexity of this type of algorithm require biologists to work in close collaboration with experts in computational sciences, modeling and statistics. Pharmaceutical and biotech industries can successfully use this computational method, and they should take the initiative to enhance their research and development with state-of-the-art bioinformatics approaches.

Protein folding

Improving the computational approach to protein folding is also crucial to realizing the power of full-genome sequencing. The field of computational genomics has been growing steadily, attracting more research funding for both academic institutions and applied research companies. The results of this approach could equally be applied in agricultural research, biodiversity, conservation, sustainable development, bioremediation, bioengineering and nutrition.

However, despite this potential, companies are slowed down by the algorithm's high computational requirements. A shorter execution time would be crucial to enhancing the productivity of such industries.

ProFAX, Polimi

After our recent presentation of the PrISMA project, led by Gea Bianchi and Fabiola Casasopra, which searches for proteomic biomarkers, it is time for the accelerated implementation of an ab initio protein folding algorithm based on Monte Carlo simulation.

The idea came from two students at Politecnico di Milano, Giulia Guidi and Lorenzo Di Tucci: she is a student of Biomedical Engineering, and he is a student of Computer Engineering. The project is called ProFAX.

WHO THEY ARE

Giulia became interested in bioinformatics after attending a computer science course taught by Prof. Marco Domenico Santambrogio, head of the NECST Lab at Politecnico di Milano. During the summer, she started searching for ideas and topics related to bioinformatics and, in October, she proposed her idea to Prof. Santambrogio, who liked it and decided to support it. She enjoyed working at the NECST Lab so much that she decided to add some computer science exams to her biomedical ones.

Lorenzo became interested in hardware architectures during his 4th year at Politecnico di Milano, while attending the course High Performance Processor and Systems, where he worked on a project on hardware acceleration. He strengthened his knowledge of the topic during a 6-month internship at the Xilinx research labs (the company that invented the FPGA) in Dublin, Ireland, under the guidance and supervision of Dr. Michaela Blott. Following Giulia’s idea, Lorenzo started collaborating with her on the implementation of the algorithm, which resulted in a publication at the RAW conference.

THE PROJECT

“Our goal,” Giulia Guidi and Lorenzo Di Tucci explain to Close-up Engineering, “is to improve the ratio of performance to power consumption by using the Xilinx SDAccel toolchain targeting the Alpha Data board.”

Alpha data board PROFAX, Polimi

The proposed implementation is based on Monte Carlo simulation, which is commonly employed to compute pathways and thermodynamic properties of proteins. A simulation run is a series of random steps in conformation space, each perturbing some degrees of freedom of the molecule. A step is accepted with a probability that depends on the change in the value of an energy function. Typical energy functions sum several terms. The most expensive ones, from a computational point of view, are contributed by atom pairs closer than some cut-off distance. The energy function adopted in the algorithm to search for the minimum-energy conformation is called EEF1 (Lazaridis, T., & Karplus, M. (1999). Effective energy function for proteins in solution. Proteins: Structure, Function, and Bioinformatics, 35(2), 133-152). It is an atom-based function, described by Themis Lazaridis and Martin Karplus, that models the effective energy hypersurface of proteins. It uses the CHARMM19 polar hydrogen potential energy function complemented by a simple Gaussian model for the solvation free energy.
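The accept/reject loop described above can be sketched in a few lines. This is only an illustration, not the ProFAX code: it assumes the standard Metropolis acceptance rule (the article does not name the rule used), and `toy_energy`, `kT`, `max_step` and the torsion-angle representation are hypothetical stand-ins for EEF1 and the real degrees of freedom.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Metropolis rule (assumed here): always accept downhill moves,
    accept uphill moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def toy_energy(angles):
    """Hypothetical stand-in for a real energy function such as EEF1."""
    return sum(1.0 - math.cos(a) for a in angles)

def run_monte_carlo(angles, steps, kT=0.5, max_step=0.3, seed=0):
    """Random walk in conformation space; returns the lowest-energy state seen."""
    rng = random.Random(seed)
    current = list(angles)
    e_current = toy_energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        # Perturb one randomly chosen degree of freedom.
        i = rng.randrange(len(current))
        old = current[i]
        current[i] = old + rng.uniform(-max_step, max_step)
        e_trial = toy_energy(current)
        if metropolis_accept(e_trial - e_current, kT, rng):
            e_current = e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
        else:
            current[i] = old  # rejected: restore the previous conformation
    return best, e_best
```

Calling `run_monte_carlo([2.0, -2.5, 1.0], steps=2000)` returns the best conformation found and its energy. In a real folding code almost all of the time goes into the energy evaluation, which is why that function is the natural candidate for acceleration.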

SDAccel, ProFAX

The functional form of the solvation free energy density was chosen on the basis of statistical mechanical calculations. Ionic side-chains are treated as highly polar but neutral, in order to avoid the problems arising from insufficiently accurate compensation between the very large Coulombic interactions and the solvation free energies of ionic groups. The model uses the linear distance-dependent dielectric constant available in CHARMM.

The proposed code was profiled using the profiling tool Callgrind and the profile data visualization tool KCachegrind.
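To make the distance-dependent dielectric concrete: with a dielectric that grows linearly with distance, ε(r) = r (r in Ångström), the Coulomb term between two charges falls off as 1/r² instead of 1/r. The sketch below shows this term summed over atom pairs within a cut-off. It is a simplified illustration, not the EEF1 or ProFAX implementation: the constant is the conventional conversion factor as we recall it, and the 9 Å hard cut-off and the function names are assumptions (production force fields typically smooth the cut-off rather than truncating it).

```python
import math

# kcal*Angstrom/(mol*e^2); conventional conversion factor, assumed here.
COULOMB_KCAL = 332.0716

def coulomb_rdie(q_i, q_j, r):
    """Coulomb energy with eps(r) = r:
    E = k*q_i*q_j / (eps(r)*r) = k*q_i*q_j / r**2."""
    return COULOMB_KCAL * q_i * q_j / (r * r)

def pair_energy_sum(coords, charges, cutoff=9.0):
    """Sum the electrostatic term over all atom pairs closer than `cutoff`.
    The double loop visits N*(N-1)/2 pairs, which is why pairwise terms
    dominate the cost of an energy evaluation."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            if r < cutoff:
                total += coulomb_rdie(charges[i], charges[j], r)
    return total
```

For example, `pair_energy_sum([(0, 0, 0), (3, 0, 0), (20, 0, 0)], [0.5, -0.5, 1.0])` only counts the first pair, because the third atom lies beyond the cut-off of both others.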

“The aim of this strategy is to find the bottleneck procedures in the program and to identify the most compute-intensive function, which takes up to 60% of the runtime in the current application. The identified function, in this case named computePairEnergy, has been parallelized. Thereafter, a static analysis of the function was carried out, using the Roofline model (Williams, S., Waterman, A., & Patterson, D. (2009). Roofline: an insightful visual performance model for multicore architectures. Communications of the ACM, 52(4), 65-76), to understand whether it is worth moving the function into hardware”.
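As a reminder of what the Roofline model computes: attainable performance is the smaller of the platform's peak compute rate and its memory bandwidth multiplied by the kernel's operational intensity (operations per byte moved). The sketch below is a generic illustration; the function names and the figures in the usage note are hypothetical and are not measurements of computePairEnergy or of any specific board.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

def ridge_point(peak_gflops, bandwidth_gbs):
    """Operational intensity (FLOP/byte) above which a kernel becomes
    compute-bound instead of memory-bound."""
    return peak_gflops / bandwidth_gbs
```

On a hypothetical platform with a 100 GFLOP/s peak and 10 GB/s of memory bandwidth, the ridge point is 10 FLOP/byte: a kernel at 2 FLOP/byte is capped at 20 GFLOP/s by memory traffic, while one at 50 FLOP/byte can in principle reach the full 100 GFLOP/s. A kernel that sits well to the right of the ridge point is the kind that benefits most from extra compute resources.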

Due to the algorithm's complexity, a software implementation is inefficient in terms of both energy and execution time. Moving the implementation to hardware is the best way to optimize this type of algorithm.
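How much the whole application gains from moving one function to hardware is bounded by the share of runtime that function accounts for (Amdahl's law). The sketch below applies it to the 60% share quoted from the profile; the kernel speed-up values are hypothetical inputs, not measured results.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speed-up when only `accelerated_fraction` of the runtime
    is made `kernel_speedup` times faster (Amdahl's law)."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)
```

With 60% of the runtime accelerated, a 10x faster kernel yields about a 2.17x overall speed-up, and even an infinitely fast kernel cannot exceed 1 / (1 - 0.6) = 2.5x, because the remaining 40% still runs in software.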

In this context, the user can choose among different hardware platforms on which to implement the algorithm, such as ASICs (Application Specific Integrated Circuits), GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays).

Generally speaking, ASICs are quite often the best solution in terms of both performance and power consumption. However, the cost of producing such devices is very high and is justified only when a large number of systems are deployed. GPUs, thanks to their architecture, can considerably accelerate certain types of algorithms, but on the other hand their power consumption is substantially higher.

FPGAs, instead, offer a good trade-off between performance and power consumption.
Thanks to their flexibility and lack of a statically predefined hardware architecture, FPGAs can be configured and reconfigured several times by the user to implement different functions. A hardware implementation on an FPGA would give the benefit of a speed-up over a pure software implementation while reducing power consumption and adding runtime hardware adaptability.

Moreover, the cost of developing an application to run on an FPGA is remarkably lower than for an ASIC. For these reasons, the FPGA seems to be the best target for our application. We can, in fact, leverage its reconfigurability in order to create a hardware implementation aimed at optimizing the performance-to-power ratio.
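The performance-to-power ratio has a simple interpretation: for a fixed workload, performance per watt is the inverse of the energy spent per run, so the improvement is the product of the speed-up and the power reduction. The sketch below shows the calculation; the function name is hypothetical and the numbers in the usage note are made-up placeholders, not ProFAX measurements.

```python
def efficiency_gain(t_sw_s, p_sw_w, t_hw_s, p_hw_w):
    """Ratio of performance-per-watt (hardware over software) for the same
    workload. Equivalent to energy_sw / energy_hw, with energy = time * power."""
    energy_sw = t_sw_s * p_sw_w  # joules per run in software
    energy_hw = t_hw_s * p_hw_w  # joules per run in hardware
    return energy_sw / energy_hw
```

For instance, a purely illustrative run taking 10 s at 80 W in software and 4 s at 25 W on an accelerator gives `efficiency_gain(10, 80, 4, 25) == 8.0`: a 2.5x speed-up combined with a 3.2x lower power draw. This is why a platform with a modest speed-up can still come out ahead on this metric if its power consumption is low.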

“A scientific paper,” the two students conclude, “describing our implementation choices and our first results in terms of the speed-up of the hardware implementation over the software one, has been accepted and will be published at the RAW (Reconfigurable Architectures Workshop) conference (raw.necst.it), which will take place in Chicago, Illinois (USA) on May 23-24. During the workshop we will present our work and show a poster, where anyone interested can come and ask questions. (If someone is interested, that would be a great opportunity to learn about our work!)”

Together with other students from the NECSTLab at Politecnico di Milano, they are participating with the protein folding application in the Xilinx Hardware Contest 2016.
