Genetics: Exploring microbial genomics standards

Microbes contribute to human endeavours ranging from bioenergy to agriculture to medicine. Moreover, they drive the Earth's biogeochemical cycles, a prerequisite for all life on the planet. Exceedingly numerous, they are also extremely diverse, accounting for most of Earth's biodiversity. It should therefore come as no surprise that two-thirds of the nearly 5,000 genome projects reported in the Genomes OnLine Database involve microbes. But far more could be done with microbial genomics, according to DOE JGI Genome Biology head Nikos Kyrpides, if researchers would embrace the possibilities that lie beyond the present anthropocentric focus and also institute shared standards for genomic data collection and analysis.

In a perspective piece published in the July issue of the journal Nature Biotechnology, Kyrpides reflects on the role of microbial studies in the genomics revolution of the past decade and considers the factors that have hindered the field's advancement. He notes that although nearly 1,000 microbial genomes have been sequenced over the past 15 years, nearly a quarter of them by DOE JGI, the resulting data have been compromised by the lack of standards for many critical procedures, from simple data exchange to gene finding, function prediction, and metabolic pathway description. Echoing other researchers, most notably DOE JGI's Patrick Chain and Miriam Land at the recent "Sequencing, Finishing, Analysis in the Future" Conference, Kyrpides calls for the development of genome annotation standards and their adoption by sequencing centres around the world, a necessity for meaningful genome comparisons.

Kyrpides offers numerous suggestions for meeting these and other challenges facing genomics research in the decade ahead. For example, the pool of candidate microbial genomes for sequencing, already limited to the roughly one percent of organisms that can be cultured in the lab, has been further skewed by a focus on a few groups of particular importance to human health or activities. As a result, vast realms of biodiversity remain unexplored. Kyrpides applauds the recently launched GEBA (Genomic Encyclopedia of Bacteria and Archaea), an effort to coordinate balanced sampling of the Tree of Life. He also sees a way forward in single-cell genomics, a technique now being pursued in earnest by DOE JGI researcher Tanja Woyke and her colleagues, used in partnership with environmental metagenomics to provide a more holistic understanding of microbial communities and their individual members.

Kyrpides also suggests several innovative approaches for easing the data-processing bottleneck that accompanies the exponential increase in genomic data. All-versus-all gene comparisons, until now common practice, will become infeasible, since the number of pairwise comparisons grows roughly with the square of the number of genes. To reduce the size of the datasets, he proposes a proxy approach in which one protein from each protein family, or one species from each genus, represents the group. Taking this one step further, all the genes from all the sequenced strains of a species (the pan-genome for that species) would constitute the genome representing that species in gene comparisons.
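To make the scaling argument concrete, here is a minimal Python sketch built on toy data. The family, protein, strain and gene names, the rule of picking the first-listed family member as the representative, and all function names are hypothetical illustrations, not Kyrpides's actual method. The sketch counts all-versus-all comparisons (n(n-1)/2 for n items), shows how keeping one representative per family shrinks that count, and builds a pan-genome as the union of genes across strains.

```python
import math

def all_vs_all_pairs(n_items: int) -> int:
    """Distinct pairwise comparisons needed to compare n items all-versus-all."""
    return math.comb(n_items, 2)

def pick_representatives(families: dict[str, list[str]]) -> dict[str, str]:
    """Keep one protein per family as its proxy.
    Hypothetical rule: the first-listed member stands in for the whole family."""
    return {family: members[0] for family, members in families.items() if members}

def pan_genome(strain_genes: dict[str, set[str]]) -> set[str]:
    """Pan-genome: union of the genes found in any sequenced strain of one species."""
    pan: set[str] = set()
    for genes in strain_genes.values():
        pan |= genes
    return pan

# Toy data (hypothetical identifiers).
families = {
    "famA": ["p1", "p2", "p3"],
    "famB": ["p4", "p5"],
    "famC": ["p6"],
}
strains = {
    "strain1": {"geneA", "geneB", "geneC"},
    "strain2": {"geneA", "geneB", "geneD"},
    "strain3": {"geneA", "geneE"},
}

n_proteins = sum(len(members) for members in families.values())
reps = pick_representatives(families)
print(all_vs_all_pairs(n_proteins))  # 15 comparisons over all 6 proteins
print(all_vs_all_pairs(len(reps)))   # 3 comparisons over 3 representatives
print(sorted(pan_genome(strains)))   # ['geneA', 'geneB', 'geneC', 'geneD', 'geneE']
```

The savings grow quadratically: reducing n items to k representatives cuts the comparison count from n(n-1)/2 to k(k-1)/2.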

Sharing his vision for the future of microbial genomics, Kyrpides observes: "The remarkable number of microbes, already estimated to be several orders of magnitude greater than the number of stars in the universe, urgently calls for a transition from random, anecdotal, and small-scale surveys towards a systematic and comprehensive exploration of our planet." With new tools in hand and international initiatives for increased collaboration underway, the field of microbial genomics is poised for a decade of exciting advances.

The U.S. Department of Energy Joint Genome Institute, supported by DOE's Office of Science, is committed to advancing genomics in support of DOE missions related to clean energy generation and environmental characterization and cleanup. DOE JGI, headquartered in Walnut Creek, Calif., provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges.
