DNA sequencing caught in a deluge of data

BGI, based in China, is the world’s largest genomics research institute, with 167 DNA sequencers producing the equivalent of 2,000 human genomes a day.

BGI churns out so much data that it often cannot transmit its results to clients or collaborators over the Internet or other communications lines because that would take weeks. Instead, it sends computer disks containing the data via FedEx.

“It sounds like an analog solution in a digital age,” said Sifei He, the head of cloud computing for BGI, formerly known as the Beijing Genomics Institute.

But for now, he said, there is no better way.

The field of genomics is caught in a data deluge. DNA sequencing is becoming faster and cheaper at a pace far outstripping Moore’s law, the long-standing observation that chip capacity doubles roughly every two years, making computing steadily faster and cheaper.

The result is that the ability to determine DNA sequences is starting to outrun the ability of researchers to store, transmit and especially to analyze the data.

“Data handling is now the bottleneck,” said David Haussler, director of the Center for Biomolecular Science and Engineering at the University of California, Santa Cruz. “It costs more to analyze a genome than to sequence a genome.”

That could delay the day when DNA sequencing is routinely used in medicine. In only a year or two, the cost of determining a person’s complete DNA blueprint is expected to fall below $1,000. But that long-awaited threshold excludes the cost of making sense of that data, which is becoming a bigger part of the total cost as sequencing costs themselves decline.

“The real cost in the sequencing is more than just running the sequencing machine,” said Mark Gerstein, professor of biomedical informatics at Yale. “And now that is becoming more apparent.”

But the data challenges are also creating opportunities. There is demand for people trained in bioinformatics, the convergence of biology and computing. Numerous bioinformatics companies, like SoftGenetics, DNAStar, DNAnexus and NextBio, have sprung up to offer software and services to help analyze the data. EMC, a maker of data storage equipment, has found life sciences a fertile market for products that handle large amounts of information. BGI is starting a journal, GigaScience, to publish data-heavy life science papers.

“We believe the field of bioinformatics for genetic analysis will be one of the biggest areas of disruptive innovation in life science tools over the next few years,” Isaac Ro, an analyst at Goldman Sachs, wrote in a recent report.