With whole genome sequencing (WGS) increasingly commoditized, translating the torrent of new WGS data into usable information is the next major IT challenge. Debates continue at the top genomics centers over whether to build their own datacenters, go to commercial cloud service providers, or use some mixture of both. Still, most researchers are realizing that the practical move to the cloud is only the first step on their path to sustainable genomics analysis capabilities.
The presentations I attended at the recent BioIT World meeting here in Boston offered a number of great insights on IT in the life sciences and the move to the cloud.
As researchers set up their first AWS EC2 instance to begin analyzing their newly created WGS datasets, they are finding that there are multiple ways to proceed depending on the type of data, the size of their datasets, and the complexity of their analyses. Analysis of an exome is different from analysis of RNA-Seq, which is different again from analysis of whole genomes. Is grid computing adequate, or is MapReduce needed? As researchers gain experience, they routinely establish automated workflows so that they can analyze current and future datasets repeatably. Depending on the size and complexity of the data, it becomes possible to optimize further by choosing different HPC cloud configurations, which can shorten the time needed to complete analyses as well as reduce their overall cost. All of these processes and issues are becoming increasingly apparent, and academic consortia and commercial service providers are rising to the challenge of smoothing the path forward for researchers.
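To make the time-versus-cost trade-off a little more concrete, here is a minimal Python sketch of how one might compare cloud configurations for a single analysis. Everything in it is an assumption for illustration: the configuration names, node counts, hourly prices, and parallel-efficiency figures are hypothetical and do not reflect any real provider's pricing or any measured pipeline, and the simple scaling model ignores storage, data transfer, and queueing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ClusterConfig:
    """A hypothetical cloud configuration (all values are illustrative)."""
    name: str
    nodes: int
    cores_per_node: int
    price_per_node_hour: float   # USD per node-hour, made-up figure
    parallel_efficiency: float   # fraction of ideal speedup actually achieved


def estimate(config: ClusterConfig, core_hours: float) -> Tuple[float, float]:
    """Return (wall-clock hours, total cost in USD) for a job of `core_hours`."""
    effective_cores = config.nodes * config.cores_per_node * config.parallel_efficiency
    wall_hours = core_hours / effective_cores
    cost = wall_hours * config.nodes * config.price_per_node_hour
    return wall_hours, cost


def cheapest_within_deadline(
    configs: List[ClusterConfig], core_hours: float, deadline_hours: float
) -> Optional[ClusterConfig]:
    """Pick the lowest-cost configuration that finishes before the deadline."""
    feasible = [c for c in configs if estimate(c, core_hours)[0] <= deadline_hours]
    if not feasible:
        return None
    return min(feasible, key=lambda c: estimate(c, core_hours)[1])


# Hypothetical configurations; wider clusters are assumed to scale less efficiently.
CONFIGS = [
    ClusterConfig("single large node", 1, 32, 1.60, 0.95),
    ClusterConfig("small cluster", 4, 16, 0.80, 0.85),
    ClusterConfig("wide cluster", 16, 16, 0.80, 0.60),
]

if __name__ == "__main__":
    job_core_hours = 1000.0  # assumed size of one analysis
    for cfg in CONFIGS:
        hours, cost = estimate(cfg, job_core_hours)
        print(f"{cfg.name}: {hours:.1f} h, ${cost:.2f}")
    choice = cheapest_within_deadline(CONFIGS, job_core_hours, deadline_hours=24)
    print("Cheapest within 24 h:", choice.name if choice else "none")
```

Under these made-up numbers, the widest cluster finishes soonest but costs the most, while the single node is cheapest but slowest. Which one is "best" depends on the deadline, which is the point: the right configuration follows from the analysis, not the other way around.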
From an individual lab's perspective, the choices can be overwhelming. Most labs cannot afford full-time programmers to help create and modify process workflows, and tight budgets routinely slow progress. While early analyses from just a few years ago (e.g., exome analysis) could routinely be performed on a laptop, WGS analysis will likely require HPC and/or the scalability of the cloud. The $1,000 genome has brought WGS to the masses, and the cloud will eventually take them to the promised land. With robust process workflows in development and commercial service support gaining traction, genomics IT infrastructure is well on the path to maturity. That said, there is value in the old adage, "there are some tasks that are best left to professionals." Genomics IT is likely to be one of those.
As always, comments and alternative opinions are welcome.