Computational biology - with genomics as the new internet

With interests stretching from palaeontology to gastronomy and a background in theoretical and mathematical physics, Nathan Myhrvold is not a man to leave a stone unturned. After working with Stephen Hawking at Cambridge University, he ended up as chief technology officer with Microsoft before retiring in 2000 to set up technology commercialisation firm Intellectual Ventures.

At the recent Cambridge-MIT Institute Distinguished Lecture, his enthusiasm was focused on computational biology. Genomics, he believes, is the new internet. And its boom could be even longer and louder.

Myhrvold began by using Moore's 'law' to draw parallels between the growth in computing and the growth in bioinformation. Gordon Moore was co-founder of Intel and his 'law' is based on his comment in 1965 that the number of transistors per square inch on integrated circuits had doubled every year since such circuits were invented and that this trend would continue for the foreseeable future.

Myhrvold pointed out that the truth of Moore's comment lies in the fact that PCs are one million times more powerful today than they were 20 years ago. And they will be one million times more powerful again in 20 years' time. It's a similar situation, too, when you look at millions of information points on CPUs against cost, and at DRAM and hard disc space.

"This has been brought about by advances in solid state physics," he said. "But these things don't just happen. They need personal motivation. It's a virtuous circle: the number of software varieties increases, their volume grows and the price falls, the variety increases more and so on. So engineers are pushing ideas out in a dramatic way. But you do have to have demand."

Turning then to bioinformation, he proposed a digital biology law: that key metrics will grow exponentially. That means a doubling every 18 months. Over 40 years, this is a factor of one billion. "It sounds fantastic, but this is already happening. Look at the size of Genbank. Its growth rate is 60 percent per year and there is no reason to believe that this won't continue."
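The compounding behind these figures can be checked with a few lines of Python. This is a minimal sketch, not something shown in the lecture: the function names `growth_factor` and `doubling_time` are made up for illustration, and the calculation assumes a constant growth rate over the whole period.

```python
import math


def growth_factor(years: float, doubling_time_years: float) -> float:
    """Total multiplication after `years` if a quantity doubles
    once every `doubling_time_years` (assumed constant)."""
    return 2.0 ** (years / doubling_time_years)


def doubling_time(annual_growth_rate: float) -> float:
    """Years needed to double at a fixed annual growth rate,
    where 0.60 means 60 percent per year."""
    return math.log(2.0) / math.log(1.0 + annual_growth_rate)


if __name__ == "__main__":
    # Doubling every year for 20 years (the PC example).
    print(f"20 years, yearly doubling:   {growth_factor(20, 1.0):.3g}")
    # Doubling every 18 months for 40 years (the digital biology law).
    print(f"40 years, 18-month doubling: {growth_factor(40, 1.5):.3g}")
    # Doubling time implied by 60 percent annual growth (Genbank).
    print(f"60% per year doubles every:  {doubling_time(0.60):.2f} years")
```

Under these assumptions, doubling every year for 20 years gives a factor of about one million, and 60 percent annual growth corresponds to a doubling time of just under 18 months. A strict 18-month doubling comes to roughly 10^8 over 40 years and about 10^9 over 45 years, so the quoted figures are best read as orders of magnitude.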

In fact, the growth rate could even accelerate: "It is based on expensive techniques. We are now getting new approaches such as arrays, real time PCR and direct optical scans. In the case of 2D arrays, for example, the number of genes you get for your dollar is growing at an enormous 384 percent every year."

He regards sequencing of the human genome as a landmark, but only part of the equation. "Now we need to sequence every human, every disease organism, every economic plant and animal and ultimately everything in the biosphere," he said.

Myhrvold believes that there is easily the demand to drive development a billionfold. Take protein pathways as an example: "Understanding protein pathways would be a hell of a tool. It would completely change the way we look at healthcare for one thing."

He went on to identify the 'big data problems' that are facing computational biologists, including genomes, proteomes, pathogens, human flora/biomes, metabolic pathways, the immune system and the nervous system.

"So there is a lot to do once you have got to the gene. But it is easy to get into the idea of counting bits - ideas are just as important as ever. Experimental design will change enormously, as will schema; what data will you collect, what is the structure of the problem? There is also a tremendous amount of work to be done on algorithms."

Myhrvold ended his talk by reminding the audience that computers are not just tools. "They transform the user and there is no escaping them. Digital biology is here to stay."

Converting data into knowledge

Myhrvold was followed by Sydney Brenner, Distinguished Professor at The Salk Institute in La Jolla, California, USA. Formerly Director of the UK's MRC Laboratory of Molecular Biology in Cambridge, he shared the Nobel Prize in 2002 for discoveries concerning genetic regulation of organ development and programmed cell death. Most of his career has been spent at the heart of the work in molecular biology that has made computational biology possible today.

He, too, began by emphasising the speed of progress: "Recently I remarked to a young scientist that there are two epochs in science - the last two years and everything before that."

Brenner believes that the central problem of biology today is how to convert data into knowledge. "We tend to think of atom by atom descriptions now. It's like the taxonomy of molecules, but we have to get away from the idea that description is everything. LinearG is the primary DNA sequence. We can sequence it, but the issue is trying to find a meaning for it. It's a question of interpretation."

Among the many challenges in making such interpretations is the fact that the texts in the DNA sequence contain a lot of noise, or junk.

"I think we can see some way into the future, but what is it that we have to know? There is no science if we are going to describe everything without making predictions. It is a very important part of science that we should have an end to description in order to come to a conclusion."

Eventually, Brenner believes, all biologists will be computational biologists. He illustrated the point with a story about when the National Academy of Sciences changed all its sections a few years ago.

"Some disappeared while other new ones such as neurobiology were added. I couldn't find the molecular biology section - it wasn't there any more. So everyone else asked me to join their sections; it wasn't that molecular biology is unfashionable, just that everyone today is a molecular biologist. So I hope and believe that computational biology will just become part of the science that we do to solve these problems.

"I can see a fantastic future for the life sciences. Technology over the last 50 years has really opened the door, yet we still have no real idea how, for example, the eye or the brain work. We are not even at the beginning of the beginning," he concluded.
