Richard Gayle
Computer Bugs
July 21, 2000
Last week I discussed the inability of people to accurately predict the future. It is easy to take small amounts of information and extrapolate them into the future, but those extrapolations seldom come true. Remember this quote:
Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers in the future may have only 1,000 vacuum tubes and perhaps weigh 1 1/2 tons.
Popular Mechanics, March 1949.
The original 'computer' was designed by Babbage in the 1800s. It used standard Industrial Revolution technology, with pins and levers to do its calculations. Or I should say it would have used them, because the actual device that could do multiplication and advanced calculations was never built. One problem, besides its cost, was that it was based on the decimal system, making it very difficult to machine parts accurately enough to keep track of all 10 possible digits at each position. The idea died until the introduction of vacuum tubes simplified the engineering. No moving parts!
Now, the first 'modern' computer (defined as a general purpose calculating machine; specialized machines that solved specific types of operations were already in operation in England) was huge and worked very slowly. ENIAC (Electronic Numerical Integrator and Calculator), like the atom bomb, was a child of World War II. The Army needed new ballistics tables in order to properly calculate the flight of a projectile fired from one of its guns. Women 'computers' required 10-40 hours to solve one trajectory in a table that could have up to 4,000 entries!! Obviously, the military was interested in anything that could speed this up.
John Mauchly, at the University of Pennsylvania, proposed an electronic computer that could solve these equations. ENIAC was built in secret, called Project X. Besides having almost 18,000 vacuum tubes (contrary to modern wisdom, properly maintained vacuum tubes were quite stable and were not burning out ALL of the time), it also used 10,000 capacitors and 70,000 resistors, and it took up almost 2,000 square feet.
However, redundancy was not really possible, and a single faulty solder joint could bring the entire computer down (irony engine on: not like our fault-tolerant computers of today). And problems did not just arise from burnt-out tubes or bad solder joints. The term 'bug', with reference to early computers, was a literal fact. An insect in the wrong place would short-circuit the system. (Although, this article shows that the term 'bug' had been in use for quite some time with regard to mechanical breakdowns. It seems that bugs could cause just about any device to malfunction. At least it gave the inventor an excuse for why the product did not work.)
ENIAC was finished too late to help in the war in Europe. One of its first real uses was actually checking some calculations performed by the women 'computers' of the Manhattan Project at Los Alamos. It worked very well. It could solve in 30 seconds what had previously taken 10 hours to calculate. Compared to anything today, it was incredibly primitive. It cost almost $500,000. It had a 'clock' speed of 100 kilohertz. A $40 calculator outshines it. But it was the first effective demonstration of a general-purpose computer, one that could be easily modified to solve almost any form of calculation.
Obviously, computers made of tubes would not go very far. Anyone extrapolating from ENIAC would have predicted a vacuum tube shortage, but necessity is the mother of invention. First the transistor was developed, then integrated by the thousands onto silicon wafers. This allowed low-power, small, failure-resistant (at least in the hardware) devices to be made. When I was in high school in 1972, we still had slide rule competitions. And slide rules could solve very difficult equations, particularly if you knew enough math to make simplifications. Now they are antiques and every child has a calculator.
Today, these programmable, handheld calculators are more powerful than the computers used onboard the space shuttle. We can now compute things that were unthinkable just a few years ago. What we will be able to calculate in just a few years is unknowable. But I'd like to take a whack at it. Because there is something so ego-gratifying about predicting the future, even if I am wrong.
There have always been a lot of similarities between the biotech industry and the computer industry. New technology drives both forward. All you need is one really bright idea and you can start your own company. In recent years, increasing computing power has made substantial inroads into biotechnology. Protein structure determinations used to result in handmade models constructed using half-silvered mirrors and lots of graduate student time. Now, any of us can use a web browser to see such a structure in great detail. Bioinformatics, a term that is only 4 or so years old, applies the tremendous computing power we now have to biological problems.
But computers will allow us to do so much more than just search databases for homologous sequences or to display information. They will help us do actual science. They will be added to the repertoire of research tools. They will be instrumental for our continuing quest to understand the human body.
Earlier this month, a really cool paper was published in Nature (a News and Views discussion is here). It illustrates the ability of computers to increase our knowledge of a system in a way that is unique and incredibly useful. One of the major advances of multicellular life is the complex set of interactions between cells. Proper development of an organism requires timely communication between cells. The complex bodies seen in metazoans would be impossible without this sort of communication. All cells, prokaryotic and eukaryotic alike, have similar intermediary metabolic pathways, but only metazoans have a complex web of protein interactions that controls how the different cells interact.
Segmentation in Drosophila is just such a complex process. Anterior cells express a transcription factor called Cubitus interruptus and secrete a signaling factor called Wingless. Posterior cells express a transcription factor called Engrailed and secrete Hedgehog. This polarity helps set up the proper alignment of a linear array of 14 segments in Drosophila. It is most probably responsible for the multitude of body forms found in arthropods and perhaps in mammals. Any defect will result in abnormal segmentation.
von Dassow et al., right here at UW, have taken all the known data regarding interactions between the known players in this signaling pathway and described them in silico. This required them to solve the kinetics of 136 equations using almost 50 different parameters. And the values of most of these parameters (such as half-lives, diffusion constants, and affinities) are actually unknown.
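To make that concrete, here is a toy sketch, my own hypothetical example in Python rather than anything from the paper, of how a single described interaction ("a Wingless signal turns on engrailed, and the products decay with some half-life") becomes a pair of kinetic equations whose parameters are exactly the kinds of unknowns they had to deal with:

```python
# Toy sketch only: one "signal activates gene, products decay" interaction
# written as kinetic equations. The real model couples 136 such equations.
# All names and parameter values here are hypothetical illustrations.
import numpy as np
from scipy.integrate import solve_ivp

def toy_interaction(t, y, k_syn, K_half, n_hill, half_life):
    en_mrna, en_protein = y
    wg_signal = 1.0                         # pretend the upstream Wingless signal is constant
    k_deg = np.log(2) / half_life           # decay rate computed from the half-life
    activation = wg_signal**n_hill / (K_half**n_hill + wg_signal**n_hill)
    d_mrna = k_syn * activation - k_deg * en_mrna      # transcription minus decay
    d_protein = en_mrna - k_deg * en_protein           # translation minus decay
    return [d_mrna, d_protein]

# Integrate with one made-up parameter set and look at the levels reached.
sol = solve_ivp(toy_interaction, (0, 100), [0.0, 0.0], args=(1.0, 0.5, 2.0, 10.0))
print(sol.y[:, -1])
```

Now imagine 136 of these equations, coupled together across a row of cells, with almost 50 parameters whose values nobody has measured.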
But with high-speed computers, you can do some modeling: start with a set of parameters and work to a solution, then see if your solution matches real life. If it does not, adjust the parameters and try again. They used about 240,000 randomly chosen parameter sets and found over 1,100 solutions. This is about 1 solution per 200 sets, implying that there is about a 90% chance of ANY random value for ANY parameter being found in a solution (if each of the roughly 50 parameters independently had a 90% chance of landing in a workable range, then 0.9^50 comes out to about 1 in 200).
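The search they describe is essentially a Monte Carlo sampling of parameter space. A minimal sketch, assuming a hypothetical score_fn that runs the full network model and reports whether the simulated expression pattern matches the observed segment polarity (the toy criterion below is just a stand-in so the sketch runs on its own):

```python
# Minimal sketch of the Monte Carlo parameter search. score_fn stands in for
# integrating the full network model and asking whether the resulting pattern
# matches the observed segment polarity; names, ranges, and the toy criterion
# are hypothetical, not the authors' code.
import random

def random_parameter_set(n_params=50):
    # Sample each rate constant / affinity / half-life over several orders of magnitude.
    return [10 ** random.uniform(-3, 3) for _ in range(n_params)]

def search(score_fn, n_trials=240_000):
    hits = []
    for _ in range(n_trials):
        params = random_parameter_set()
        if score_fn(params):                # does this parameter set reproduce the pattern?
            hits.append(params)
    return hits

def toy_score(params):
    # Stand-in criterion so the sketch runs; the real test is the full ODE model.
    return all(p > 0.01 for p in params[:3])

hits = search(toy_score, n_trials=10_000)
print(f"{len(hits)} working parameter sets out of 10,000 trials")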
Now, if they had just verified what we already knew, it would be interesting but probably not a Nature paper. What is really exciting about this paper are the things they found that were NOT known before. It turns out that, using only the interactions that had already been reported, they were unable to find any parameters that gave a correct solution. Nothing worked. There was no way the defined network, based on current knowledge, could work. So, they examined their model and found a few logical gaps in the network, ones that had not been noticed before. Theory filled these gaps in, and they were then able to find multiple solutions. So, one thing this work does is present an obvious course of future work: determine whether their logical fixes actually occur in nature. Preliminary investigations indicate that they do.
But even more interesting are the answers they got. On first examination, one might expect that only a narrow range of parameter values would give the proper solution. With such a complex system, the overall control must be fairly tight. However, exactly the opposite was found. In fact, almost EVERY one of the 50-odd parameters can vary over a huge range of values and still result in a correct solution. There is a tremendous amount of redundancy. And each particular solution was pretty robust. They could take a particular solution and vary the value of a single parameter, while keeping the others constant, up to 1000-fold without destroying the segmentation pattern. It appears that the network itself generates the stability, not the strength of the molecular interactions between its components.
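That robustness test is easy to picture in code. A sketch, reusing the hypothetical score_fn idea from above: take one working parameter set, scale a single parameter up or down (here up to 1000-fold) while holding the rest fixed, and record how far it can move before the pattern breaks.

```python
# Sketch of the robustness test: vary one parameter at a time around a known
# working solution and see which fold-changes still give the correct pattern.
# score_fn is the same hypothetical pattern test as in the search sketch.
def robustness_sweep(base_params, score_fn,
                     factors=(0.001, 0.01, 0.1, 1, 10, 100, 1000)):
    tolerated = {}
    for i in range(len(base_params)):
        survives = []
        for f in factors:
            trial = list(base_params)
            trial[i] *= f                   # change one parameter, keep the others constant
            if score_fn(trial):
                survives.append(f)
        tolerated[i] = survives             # fold-changes this parameter can absorb
    return tolerated

# e.g., with the search sketch above:
# print(robustness_sweep(hits[0], toy_score))
```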
Let me repeat that. The network itself is the most important aspect, not the actual affinities between any of its members, or their half-lives, or their binding coefficients!! If one member changes any of its properties, the others can adapt and still produce a proper solution. What a great way to accommodate any mutational changes in protein activity without disrupting the network. If this is a general aspect of molecular networks, and I suspect it is, it has the potential to completely alter the way we think about biology.
Most science, but particularly biology, operates using inductive reasoning. By understanding a specific instance, you can extrapolate to the general. Biology has always been way too complex to understand as a whole. Biology made grand strides as a discipline when scientists were able to examine isolated systems and purified materials. For at least the last 100 years, it has been reductionist in scope. By understanding, in all its detail, how a single isolated protein works, perhaps we could gain some small understanding of how a cell works. The tremendous amount of work done over the last 50 years has produced a wealth of information that has helped us move towards this goal.
The complete sequencing of the human genome, to my mind, signals an end to the purely reductionist approach. We will know the sequence of EVERY protein made in the cell. One goal will be to find out what they do and what their kinetic parameters are. But the real goal now will be to gain further understanding of how they all fit together when they are expressed, which ones influence the production of others, and how the network of cellular communications is constructed. The actual physical values, such as the affinities of proteins for each other and their serum half-lives, will not be as important as simply knowing which proteins interact with which others. The elucidation of the cellular networks will yield more information than knowing the physical properties of their members.
I believe that this is what biologists will do for the next few years. Instead of the bugs outside the computer, it will be the bugs inside, the in silico cells, that will be important. Because the first group that can accurately simulate the complete biology of a human cell will be in a position to rewrite how biological research is done.