
Bioinformatics for everyone

STAR SCIENCE - By Marla A. Endriga, M.Sc.

May 11, 2006 | 12:00am

In 2003, after a 13-year worldwide effort called the Human Genome Project, scientists were able to determine the complete sequence of human DNA. It is composed of some three billion units called bases, represented by the letters G, A, T and C. The sequence of these letters spells out the instructions that make us us, from the curve of our lips to the shape of our eyes, and even some of the diseases we are prone to. However, having the sequence of the human genome is like holding a book written in a foreign language, this one with a four-letter alphabet. The task now is to unlock the meaning of the sequence that has been discovered.

Enter the field of bioinformatics, or the computational branch of biology. It is also referred to as a hybrid of computer science and biology, which may sound strange, since there is nothing biological about computers. Or is there?

Computers are very commonly used by bankers, by students, and by those in the call center industry. They are also very useful for the scientist, and not only for typing up scientific articles. In addition to carrying out experiments within a living organism (in vivo) or in an artificial environment (in vitro), researchers can now also perform experiments on a computer (in silico) to answer biological questions. Hence the union of the fields of computer science and biology.

Aside from human DNA sequence information, an offshoot of the Human Genome Project was the development of advanced technologies that vastly increased the speed of sequencing experiments. Thus, the DNA sequences of other organisms such as the chicken, cow, pufferfish and many others have also been determined (see http://www.ensembl.org/index.html). Managing and organizing strings and strings of letters, or sequences numbering in the hundreds of thousands, so that they can be accessed and used even by those not familiar with programming and database management is no mean task, and this is one of the major areas of bioinformatics. The data generated by the Human Genome Project, for example, are stored in databases such as GenBank (http://www.ncbi.nlm.nih.gov/), the DNA Databank of Japan (http://www.ddbj.nig.ac.jp/) and the database of the European Molecular Biology Laboratory (http://www.ebi.ac.uk/embl/). These three databases contain the same information (scientists like to keep more than one copy of their data) and are scanned every day for errors. The information in these vast repositories is freely available to the public via the Internet, and there is no restriction on its use or redistribution. Scientists from all over the world who elucidate new sequences in their respective labs can likewise submit them for inclusion in the databases, thus increasing the data available for use by the scientific community.

But of what use are these data?

One of the most common uses of sequence information is in the determination of evolutionary relationships among organisms. An undergraduate student, for example, may be curious to find out if he is more genetically similar to a frog, a mouse, or a fruit fly. Or, one may be curious if marine snails that possess similar shell patterns are indeed more genetically similar to one another than to those that have different shell designs.
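To make the idea of comparing sequences concrete, here is a minimal Python sketch that scores how similar a few short DNA fragments are. The fragments are made up for illustration and are not real gene sequences; the function names percent_identity and rank_by_similarity are likewise hypothetical. The sketch also assumes the fragments are already lined up position by position, a step that real studies perform with dedicated alignment software.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def rank_by_similarity(reference: str, others: dict) -> list:
    """Return (name, percent identity) pairs, most similar to the reference first."""
    scores = [(name, percent_identity(reference, seq)) for name, seq in others.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up 20-base fragments, for illustration only (NOT real gene sequences).
toy_human = "ATGGCCTTAGCAGGTCCATA"
toy_others = {
    "frog": "ATGGCATTGGCTGGACCGTA",
    "mouse": "ATGGCCTTAGCTGGTCCGTA",
    "fruit fly": "ATGTCATTGGCTCGACCGAA",
}

for name, score in rank_by_similarity(toy_human, toy_others):
    print(f"{name}: {score:.0f}% identical to the toy human fragment")
```

With these toy fragments, the mouse comes out as the closest match, followed by the frog and then the fruit fly. A simple count of matching letters is only a rough first look; actual evolutionary studies compare much longer sequences and use statistical models of how DNA changes over time.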

There are also databases that contain not DNA sequence information, but protein sequence and other related information [for example, SwissProt (http://www.expasy.org/sprot/) and the Protein Data Bank (http://www.rcsb.org/pdb/)]. Proteins are the other molecules of interest in molecular biology. If DNA is composed of chains of bases, proteins are composed of 20 types of units called amino acids, strung together and folded in a particular way. Examples of protein molecules are the collagen in skin, insulin, which helps the body's cells take up sugar from the blood, and amylase, which aids in the digestion of starch in the mouth. Proteins range in size from a few (about 20) amino acids to several thousand.

Each amino acid in a protein molecule is composed of atoms (say, nitrogen and carbon). Science has reached a stage where the position of every atom of every amino acid is known for tens of thousands of proteins. Using such information, it is now possible to visualize how protein molecules appear in three-dimensional space. Software such as Rasmol and Protein Explorer (http://www.umass.edu/microbio/rasmol/) may be used for such a task. Using these free programs, it is even possible to turn the molecule around with the click of a mouse button, and see the protein from different angles.
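For the curious, turning a molecule around on screen comes down to simple arithmetic on atom positions. The Python sketch below rotates a handful of atoms about one axis. The three atoms and their coordinates are invented for illustration, and the function name rotate_about_z is hypothetical; this is not how any particular viewer is implemented, only the basic geometry that such viewers rely on.

```python
import math


def rotate_about_z(atoms, degrees):
    """Rotate (name, x, y, z) atoms about the z axis by the given angle."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for name, x, y, z in atoms:
        new_x = x * cos_t - y * sin_t  # standard 2-D rotation in the x-y plane
        new_y = x * sin_t + y * cos_t
        rotated.append((name, new_x, new_y, z))  # z is unchanged
    return rotated


# Invented coordinates for three atoms, for illustration only.
toy_atoms = [
    ("N", 1.0, 0.0, 0.0),
    ("CA", 0.0, 2.0, 0.5),
    ("C", -1.0, 0.0, 1.0),
]

# A quarter turn, as if the viewer had swung the molecule around once.
for name, x, y, z in rotate_about_z(toy_atoms, 90):
    print(name, round(x, 2), round(y, 2), round(z, 2))
```

A rotation moves every atom but keeps the distances between atoms the same, which is why the molecule keeps its shape while the viewer sees it from a new angle.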

Armed with protein structure and sequence information, molecular visualization tools and a firm grasp of chemistry and physics, researchers are now able to design and construct their own protein molecules for use as therapeutic drugs.

Designer drugs are an exciting application of bioinformatics techniques. When making tailor-made molecules against diseases, scientists can control the specificity of the molecule. The idea is that the drug, once administered in the body, will react only with the specific molecule it was designed to interact with, thus decreasing side effects and increasing the efficacy of the drug. One example of a work in progress is a set of proposed structures for anti-inflammatory agents called COX-2 inhibitors, which are more potent versions of aspirin, designed at the Department of Physical Sciences and Mathematics at UP Manila. The use of bioinformatics tools in such projects decreases the need for the trial-and-error method traditionally used in screening would-be drug candidates.

One of the logical outcomes of studies at the cellular and molecular level is the eventual simulation of processes that occur simultaneously inside a living cell – the reactions that take place inside it, how it produces energy to perform its functions, how it communicates with other cells around it or some distance from it, what happens when it encounters an intruder, and processes like aging and cell death. This type of work requires knowledge not only of biology and chemistry, but also of advanced physics and a lot of mathematics. Sometime in the future, scientists may well come up with a fully simulated cell, thus increasing our understanding of the basic unit of life. That would certainly lead to more advances in the biological and medical sciences.

As with any scientific field, bioinformatics continues to evolve. Though the initial focus has been on molecular applications, bioinformatics approaches are now also used in organismic and ecosystem-level studies. Aside from UP Manila, other local institutions such as St. Luke’s Medical Center, UP Diliman, Ateneo de Manila University, the Advanced Science and Technology Institute of the Department of Science and Technology, and other universities and research institutions use bioinformatics techniques in many research projects in human health, agriculture and other areas of interest. A number of graduate and undergraduate theses that use bioinformatics techniques have also been completed. Bioinformatics involves not only the biologist and the computer scientist, but also the chemist, physicist, mathematician and statistician. Getting into bioinformatics is not difficult at all, as even undergraduate students can participate. All that’s needed to start is a computer with a good Internet connection, an inquisitive mind, and a healthy dose of imagination.

* * *

Marla A. Endriga is a junior faculty member at the Institute of Biology at the College of Science in UP Diliman. She holds a bachelor’s degree in Molecular Biology and Biotechnology and a master’s degree in Marine Science from the University of the Philippines in Diliman. She did post-graduate studies in Bioinformatics at the University of Cologne in Germany. She is the president of Mensa Philippines.