My View

Richard Gayle

Meeting for a new century

January 24, 2000

I spent the first week of this new year at a scientific meeting (sure glad Y2K problems never had a serious impact). So this won't be a really technical column, since I am still trying to get back up to speed (I am especially happy I remembered all of my passwords, even if I am not too sure what people's names are).

The meeting I attended was The Pacific Symposium on Biocomputing. It attracted a really good mix of scientists from the US, Asia and Europe, from both academia and business. This was the fourth one I have been to and was by far the best. This is truly a meeting that could not be organized in the absence of the Internet. Topics are proposed via the Internet and speakers are lined up. In fact, anyone can propose a topic and, if it is accepted, you get to be the chairperson for the session. Papers are written for each presentation; they are all refereed and indexed in PubMed, so they count as real publications. Consequently, they are all of high quality. Because of the format, PDF files of every paper are available on the Internet, often BEFORE the meeting starts. At registration there is a hard-bound book containing each paper being presented. Very neat.

The meeting is pretty small, making it very approachable and quick on its feet. Sessions are during the day and informal discussion groups are in the evening. Last year we asked for more food (i.e., sit-down opportunities to chat with other meeting goers while consuming mass quantities). They had several such opportunities this year. In response to suggestions, there are now tutorials to get people up to speed on topics that they may not be familiar with. This is really necessary since the meeting attracts a mixed crowd of biologists and computer scientists. There is a real attempt to present topics that are cutting edge and forward-looking. I have found this meeting very useful for getting an idea of where biocomputing, the use of computers to solve biological problems, is going.

The first year I went, 1997, the sessions spent more time on the computing than on the bio. A lot of talks dealt with accessing databases, visualizing data, and algorithms for searching sequences. Here are a few titles:

Towards a Density Functional Treatment of Chemical Reactions in Complex Media
On Some Operations Suggested by Genome Evolution
Test Tube Systems with Cutting/Recombination Operations

Even when they had interesting titles, the math was stomach-turning. I am funny for a biologist. I like math. But the number of equations in this book makes my eyes cross. And many of the presentations themselves were even more difficult. Part of the problem comes down to differences between how a biologist presents data and how a computational scientist does.

First of all, if a slide projector is used, it must be by a biologist. In 4 years I have never seen a computational scientist use a slide projector. They ALL use overheads. I think this derives from the fact that the computational scientists use hardware all the time and are intimately acquainted with the problems that are inherent in using such a complex device as a slide projector, particularly if a wireless controller is to be used to advance the projector. We biologists have this naïve impression that the slide projector WILL work; the computer guys know better. An overhead projector is about as simple as you can get.

And sometimes the computer people do not even seem to trust a printer to handle their overheads. They only trust a blank plastic sheet and their Sharpie pens, which they bring along just in case. It may not be pretty but it will almost ALWAYS work.

But the real difference comes from the content. People who deal in computer algorithms LOVE them and have to show all of us exactly how they are derived. Now, I am sure if I designed search algorithms for a living, I would love them too. But slamming up overhead after overhead of dense mathematical formulae is likely to drive every biologist out of the auditorium. By the time I had remembered what a given symbol meant, they were 3 overheads on. Not only was I quickly lost but, as a biologist, I wanted to know what all this had to do with biology.

There were enough presentations in 1997 dealing with computing solutions to relevant biological problems to make me return. And every year there have been increasing numbers of discussions about "real" biological problems, and fewer dealing with the best way to find a gene in a sequence. One hot idea last year was trying to model cellular systems in a computer. With the coming of "complete" knowledge of the coding sequences of many genomes, we will have to put it all back together. Can we learn things with a computer model of a signalling pathway? Can it help predict what the critical steps are? Can we model what happens in a heart during a myocardial infarction? Well, we can't today, but I am sure we will be able to.

Another topic was how we do protein structure determinations after we know the sequence of every coding region. Knowing the sequence of the protein does not tell us anything about structure. And even knowing the structure of the protein does not tell us about function. Proteins with similar structure can have very different functions and those with the same function can have different structures. This will be an incredibly important problem to solve in the coming years. And, thankfully, I do not think computers will be able to do it alone, meaning that there will still be lots of work for biologists like me.

So, what was this year's meeting like? Well, computational scientists still used overheads, a few behind-the-times biologists used slides, but many of the structural genomics/bioinformatics/guys-with-lots-of-grant-money used computers to present their data. Talk about naïve views of hardware. Yet every one of the talks run from a computer worked fine, except one using Windows and PowerPoint. This tells me an awful lot about the penetration of computers into biology. (One interesting note was the very small number of people who used Windows for their presentations. I would say that over 80% of the speakers who used a computer used an alternative operating system, either Mac or Linux. Maybe they were not so naïve.)

This year's meeting had really interesting talks every day. The tutorials started off the first day. In many cases there was a direct relevance to questions we work on. One tutorial dealt with how protein structure is determined, with a good basic explanation of X-ray crystallography. Another gave the basics of gene arrays, with insights into problems of the technology. The informal nature and small size allowed you to easily ask questions and provided for excellent discussion. This is something I wish all meetings had.

The presentations themselves were enlightening. For example, a real problem today is not how to access databases or to set them up. It is designing ways to easily display the HUGE amount of data in a way that can be understood by biologists, allowing them to spend their time on experiments and not waiting for Web pages to load. The data mining and visualization talks dealt with making complex biological relationships presentable, not with how to find a gene in a sequence of DNA. There were also a lot of great discussions on array data, not only how to collect it but, more importantly, how to interpret it.

And there were some nice "theoretical" papers. Sure, there was lots of math, but they may actually have biological relevance. One looked at the difference in information content between coding and non-coding sequences. But not just simple stuff like identifying splice junctions or such. No, it required high-powered statistical sampling and mathematical manipulations. Lots of math, but it appears that they may have identified something of biological interest, even if it does require a Fourier transform to see it.

Two sessions were held on new areas. One was on cheminformatics and computer-aided combinatorial chemistry. Identifying molecule leads for therapeutic needs (© 2000 Richard Gayle) using a computer has been a dream of many biologists and chemists. There was a nice discussion of searching chemical libraries using knowledge of the active site. Some of the newer approaches look very interesting. The tools and computation power are now present to allow fairly accurate identification of relevant small molecules and their isolation from a combinatorial library.

The other new session looked at the next step in sequence databases -- Single Nucleotide Polymorphisms or SNPs. These are single, random nucleotide changes in the human genome. In a very short time we will have databases that identify single nucleotide differences between many groups of human beings. With complete knowledge of the sequence, we will be able to identify SNPs that are closely linked to genetic disorders, not only for simple systems involving single genes but also for very complex genetic diseases that involve many different genes. Computers will allow us to identify the multiplexed activities of different gene combinations in a variety of disease settings. We will be able to look at migration patterns of human populations over the centuries. There are a lot of ethical questions this work will open up, but the generation of the databases has already begun.

In fact, one thing that really struck me about the meeting this year was how far we have come in answering scientific questions in biology, how many mountains we have climbed and how far we still need to go. We, as scientists, will need to consider the ethical questions that are really starting to arise. Should Celera be able to patent their database containing the complete human genome? Should they be able to make the sequence proprietary? What effect will understanding how genes interact to create disease have on society, for good or ill? If we do not address these questions, someone else will. Jeremy Rifkin has made a living attacking biotechnology. The controversy regarding genetically engineered food is another example. I am not really sure what we could do or how we should do it. Maybe I'll propose that as a topic for next year's meeting. Being a session chair should look good on my c.v.