While the achievement of the $1000 genome will surely bring a tidal wave of new whole-genome sequencing (WGS) data over the next 3-5 years, the near-term focus of life science big data activities will likely be the mountains of newly available data from completed clinical trials (likely via the Pfizer Clinical Cloud approach), related clinical trial transparency initiatives, and EMR data.
The ability to effectively exploit data from within the organization has been, and continues to be, the focus of significant industry investment in actionable analytics. These efforts contribute directly to the industry's near-term focus on improving operational efficiency and effectiveness, as validated by a measurable near-term ROI. As the efforts at Pfizer are showing (which I will discuss in detail in a report to be published shortly), it is now possible to systematically bring together data from completed trials into a common repository. This enables researchers, product managers, and senior management to extract historical information more effectively and translate it into more real-time tactical and strategic insights. Concurrently, leading efforts at GSK, Roche, Johnson & Johnson, Novartis, and others are bringing completed clinical trial data into public view, with the goal of allowing other researchers to discover new insights. In a similar vein, the advent of the electronic medical record is making real-world patient data available like never before, giving researchers a new window into disease manifestation and progression, comparative drug effectiveness, and the real-world impact of managing concurrent diseases.
Historically, bringing these diverse datasets together has been a formidable challenge for the industry: isolated data silos, incompatible data formats, and regulatory restrictions have limited researchers' ability even to access these data at the same time. The data silo problem is changing as the focus shifts toward accessing data outside organizational firewalls. Aggregation layers are helping to normalize datasets. Careful de-identification of patient data, technology-enabled global data management, and improved informed consent are enabling regulatory-compliant access to data.
Despite much noise to the contrary, the life science industry, unlike other industries such as retail and finance, has not seriously had a big data problem to date. With an increased willingness to adopt the best practices those industries have developed, the life sciences are on the cusp of applying big data technology solutions to an important and increasingly accessible data resource, one that should help accelerate discovery and progress toward knowledge-based medicine. The major impediment to progress remains antiquated regulatory processes and infrastructure, which will need to change to accommodate 21st-century capabilities. As always, the only constant is change and, for the most part, change will be good.
As always, comments and alternative opinions are welcome.