(NEW!) 5/5: Most presentations have been uploaded. Please check this page regularly for updates.
Porting Biological Problems to the Grid
| Introduction to Grid computing | |
Dr Ludek Matyska |
Professor Ludek Matyska will give an overview of Grid computing and its biomedical applications.
Keynote Presentation on EUAsiaGrid (Professor Ludek Matyska)
Biomedical Applications of Grid Computing (PPT)
|
| gLite Grid Computing Hands-on | |
Dr Marco Fargetta, Dr Giuseppe La Rocca |
Professors Marco Fargetta and Giuseppe La Rocca will cover the basic concepts of Grid computing on the EUAsiaGrid and provide hands-on training for participants in using the grid.
|
| Grid-Enabled Applications in Bioinformatics | |
| (NEW!) Grid-Enabled Solutions | |
Dr Marco Fargetta, Dr Giuseppe La Rocca, Jim Ho, Wang HsiKai |
Sample JDL files for submitting bioinformatics applications to the Grid
|
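As a rough illustration of what such a JDL file contains, the sketch below builds a minimal gLite-style job description in Python. The attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox) are standard JDL attributes; the wrapper script `run_protdist.sh`, the input file `hiv_subset.phy` and the output name `outfile` are hypothetical placeholders, not the workshop's actual sample files.

```python
def make_jdl(executable, arguments, input_files, output_files):
    """Return the text of a minimal gLite-style JDL job description."""
    def quoted_list(items):
        # JDL lists are written as {"a", "b"}
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        # The executable is shipped to the worker node with its input files.
        f"InputSandbox = {quoted_list([executable] + list(input_files))};",
        # Standard output and error come back together with the result files.
        f'OutputSandbox = {quoted_list(["std.out", "std.err"] + list(output_files))};',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical wrapper script and input alignment, for illustration only.
jdl_text = make_jdl(
    executable="run_protdist.sh",
    arguments="hiv_subset.phy",
    input_files=["hiv_subset.phy"],
    output_files=["outfile"],
)
print(jdl_text)
```

Saved to a file, a description of this kind is what gets handed to the grid's job submission tools; real jobs usually add further attributes such as requirements on the computing resource.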
| 1. Grid enabling phylogenetic inference | |
Jim Ho, Wang HsiKai, Hu Yong Li, Dr. Chanditha Hapuarachchi, Lee Kim-Sung, Deng Lu |
Phylogenetic analysis of HIV proteomes
Hu Yong Li, Dept of Biochemistry, NUS
Powerpoint presentation of problem scenario (HIV)
Currently, there are nearly 200,000 HIV proteomes in our HIV database. We regularly carry out phylogenetic inference on standalone, non-grid servers. Migrating to a grid/cloud platform would help us speed up our analysis. The software we use for phylogenetic analysis is the publicly available PHYLIP package, in particular the protdist program, and others such as DNAML if we wish to include analysis at the DNA level. If we can grid-enable PHYLIP, which has already been done on many platforms, we can scale up our analysis.
Chikungunya and Dengue virus DNA sequences using BEAST
Currently, our group is analysing full and partial genome sequences of viruses (mainly vector-borne viruses such as dengue and chikungunya) implicated in common infectious diseases in Singapore. The objective is to understand the molecular epidemiology of these diseases. The information will be used to understand the pattern of spread and importation of these viruses in the country. We often use phylogenetic analyses to infer the genetic relatedness between different groups of viruses. This involves multiple alignment and subsequent analysis of the alignments using a variety of bioinformatics tools.
Problem: One of the tools currently in use is the BEAST package, a publicly available, Java-based programme. The package is used to construct trees, to calculate rates of evolution, and to understand the population dynamics and spatial distribution of viruses. The programme involves many analytical steps and consumes a large amount of computing power. For example, the analysis of a 12 kb full-genome dataset with 90 sequences takes us approximately a week to complete with the computing power currently available. Most often, the same analysis has to be repeated several times with different parameter settings, and so takes weeks to months to complete. The problem becomes even worse when two or more analyses have to be run at the same time. Access to external computing resources that support similar types of analyses may be a good solution to this problem. We hope that this workshop will give us some indication of whether such programme packages can be grid-enabled, which would make our analysis much faster and broader. I have attached a mock dataset to test during the workshop.
|
| 2. Grid-enabled parameter sweep for SVM parameter optimization of caspase cleavage sites prediction | |
Jim Ho, Wang HsiKai, Dr Lawrence Wee |
Dr Lawrence Wee, SiGN, Singapore
We regularly use LibSVM for our caspase predictions. A typical run takes about a minute on one machine. However, we need to optimize the SVM prediction, and this is done through a parameter sweep whose parameters depend on the type of SVM prediction used. If we can accelerate the process by running LibSVM many times in parallel to optimize the training of the machine learning model, we can improve our caspase predictions on protein sequences. |
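The parameter sweep described in this scenario can be sketched as a grid over the two parameters of an RBF-kernel SVM. The exponent ranges follow the coarse grid suggested in the LibSVM practical guide, and `-c`, `-g` and `-v` are the svm-train options for cost, gamma and n-fold cross-validation. The training file name `caspase.train`, the 5-fold setting and the batch size of 10 are assumptions made for illustration.

```python
from itertools import product


def sweep_commands(train_file, folds=5):
    """Build one svm-train cross-validation command per (C, gamma) pair."""
    c_values = [2.0 ** e for e in range(-5, 16, 2)]      # 2^-5 ... 2^15
    gamma_values = [2.0 ** e for e in range(-15, 4, 2)]  # 2^-15 ... 2^3
    return [
        f"svm-train -c {c} -g {g} -v {folds} {train_file}"
        for c, g in product(c_values, gamma_values)
    ]


def batches(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


commands = sweep_commands("caspase.train")  # hypothetical file name
jobs = batches(commands, 10)                # assumed: 10 runs per grid job
print(len(commands), "runs in", len(jobs), "jobs")
print(commands[0])
```

At roughly a minute per run, the 110 runs in this grid would take close to two hours on one machine. Split into 11 independent jobs, each job needs about ten minutes of compute, plus whatever scheduling overhead the grid adds.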
| 3. Grid Enabling Genome Search to identify T3SS effectors | |
Jim Ho, Wang HsiKai, Sun Guang Wen |
Dr Sun Guang Wen, Department of Biochemistry, NUS
Powerpoint presentation of problem scenario (T3SS)
Currently there are 1,500 records of experimentally verified or suspected T3 effector proteins. Using getorf in the popular EMBOSS sequence analysis package, it is possible to identify 100,000 open reading frames in a typical bacterial genome such as that of Burkholderia species, of which only 10% may actually be functional. We would like to use BLAST to identify which of these open reading frames code for effector proteins. We wish to carry out this analysis for all bacterial genomes to identify families of effector proteins, starting with Burkholderia. If we can scale up and carry out multiple sequence alignments of similar protein sequences using CLUSTAL, MUSCLE, PROMALS, T-Coffee, etc., we can classify the groups of putative effectors for further analysis. The best group of effectors can then be selected for extracting patterns and motifs, and features extracted to develop Support Vector Machine predictors (e.g. LibSVM, SVMLight) of novel effectors, which we can verify in the laboratory. |
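One common way to spread a BLAST search over many grid jobs is to split the query ORFs into chunks and search each chunk independently. The following is a minimal sketch of that splitting step, assuming the getorf output is a plain FASTA file. The chunk size and the sample records are made up for illustration and are not taken from the Burkholderia data.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def chunk_records(records, chunk_size):
    """Group records into chunks of at most chunk_size, one chunk per job."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def to_fasta(records):
    """Serialise records back to FASTA text, one sequence line per record."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


# Made-up ORFs for illustration only.
sample = """>orf_1
ATGGCTAAA
TGA
>orf_2
ATGCCCTAG
>orf_3
ATGAAATTTTAA
"""
chunks = chunk_records(read_fasta(sample), chunk_size=2)
for n, chunk in enumerate(chunks):
    print(f"chunk {n}: {[h for h, _ in chunk]}")
```

With the 100,000 ORFs mentioned in the scenario and a chunk size of 1,000, this would give 100 independent BLAST jobs per genome. The right chunk size depends on how job start-up cost compares with search time on the grid in question.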
| 4. Grid-enabled Ligand-Receptor Docking | |
Jim Ho, Wang HsiKai, Lam Tze Hau, Dr. Heru Suhartanto |
Ligand Docking to MHC Class I molecules
Lam Tze Hau, I2R, A*STAR, Singapore
Powerpoint presentation of problem scenario (Autodock)
A single MHC-peptide ligand-receptor docking using Autodock or ICM takes 10 to 15 minutes to complete, depending on the constraints. We typically need to adjust multiple constraints and evaluate the results. We currently have thousands of dockings to carry out for different MHC molecules and different binding peptides. If we can grid-enable this process, we can achieve a significant speed-up and evaluate more MHC molecules and more peptides.
Docking with Autodock and Molecular Dynamics analysis with Gromacs: Indonesian Herbal Pharmacological screening in silico
Dr Heru Suhartanto, Universitas Indonesia
Powerpoint presentation of problem scenario (Gromacs) |
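The scale of the MHC-peptide docking workload can be estimated with simple arithmetic. The sketch below uses the 10 to 15 minute per-docking figure from the scenario. The numbers of MHC molecules, peptides and grid slots are hypothetical, chosen only to give a workload in the "thousands of dockings" range.

```python
import math
from itertools import product


def docking_pairs(mhc_molecules, peptides):
    """Every MHC molecule is docked against every peptide: one task per pair."""
    return list(product(mhc_molecules, peptides))


def wall_clock_hours(n_tasks, minutes_per_task, n_slots):
    """Wall-clock estimate for independent tasks spread evenly over n_slots workers."""
    rounds = math.ceil(n_tasks / n_slots)
    return rounds * minutes_per_task / 60


# Hypothetical workload: 20 MHC molecules x 150 peptides = 3000 dockings.
mhc = [f"mhc_{i}" for i in range(20)]
peptides = [f"pep_{j}" for j in range(150)]
pairs = docking_pairs(mhc, peptides)

for slots in (1, 100):
    low = wall_clock_hours(len(pairs), 10, slots)
    high = wall_clock_hours(len(pairs), 15, slots)
    print(f"{slots} slot(s): {low:.1f} to {high:.1f} hours")
```

Because each docking is independent of the others, the estimate scales almost linearly with the number of slots. It ignores queue waiting time and the cost of staging input files, both of which matter on a real grid.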
| 5. Grid-enabled Multiple Sequence Alignment | |
Jim Ho, Wang HsiKai, Lee Hong Kai, Thomas Tay |
Enabling Multiple Sequence Comparison by Log-Expectation (MUSCLE)
Lee Hong Kai and Thomas Tay, NUHS Molecular Diagnostic Center, Singapore
Powerpoint presentation of problem scenario (MUSCLE)
Purpose: We need multiple sequence alignments of norovirus from genogroups I, II and IV for effective primer probe design, phylogeny analysis, SNP analysis and sequence variability analysis. Multiple sequence alignment tools like ClustalW are very time-consuming, taking about 9 hours to run for about 1000 sequences. MUSCLE is much faster but is still limited by memory, so we are seeking ways to grid-enable MUSCLE to speed up multiple sequence alignment. |
| Invited Talk | |
| 1. Grids or Clouds? | |
Simone Brunozzi |
(NEW!)
Presentation slides
Simone Brunozzi, Amazon Web Services Technology Evangelist for APAC, will dispel common misconceptions about Cloud Computing, briefly explain its main features, and show examples of how to use Amazon Web Services to run HPC tasks and MapReduce jobs or, more generally, to tap into Amazon's vast on-demand computing resources to solve Bioinformatics problems and other computational tasks. The talk will cover both technical and economic aspects of Cloud Computing. Attendees will be encouraged to ask questions at the end. |
| Practical Parallel Computing in R | |
Xie Chao, TW Tan |
The R programming environment is frequently used in bioinformatics. For example, the very popular Bioconductor packages for R are widely used by biological researchers and highly cited.
This session will consider how to parallelise applications within the R environment at different scales of granularity. Implementations of a parallelised application on multicore, cluster, grid and cloud platforms are compared to illustrate the issues encountered. (30 mins)
|
| Setting up a Biocloud using BioSlax |
Mark De Silva, KS Lim, TW Tan |
Bioinformatics users need a standardised operating system with a controlled and predictable programming environment, so that they do not have to worry about versions of programming languages and applications. BioSlax is designed as a standard BioLinux platform that is easily adapted to Grid or Cloud environments.
|