Transforming 300 billion points of data into diagnostics, therapeutics and new insights into disease
INTRODUCTION
Over the past few years, collaboration between the CPNP Research and Program Committees has resulted in high-quality, member-driven research programming at the Annual Meeting. Programming input is requested from all CPNP committees, but this collaboration in particular has led to focused programming that fulfills a need for members who either work primarily in a research setting or want to keep abreast of the ever-changing field of psychopharmacology research. One of the logistical difficulties of this type of collaboration is the CPNP programming timeline: the following year's offerings are determined not long after the Annual Meeting. Thanks to the focus of both committees and their commitment to excellence in programming, this project has been a great success. The collaboration has led to two-hour tracks of programming ranging from ‘Incorporating Health Services Research into Practice’ to this past meeting's ‘Academic Detailing’. An additional hour of research programming this year focused on drug repurposing/repositioning. Atul Butte, MD, a bioinformatician and pediatric endocrinologist from Stanford University's School of Medicine, whose lab uses publicly available molecular measurements to find new uses for drugs, presented a lecture entitled “Transforming 300 Billion Points of Data into Diagnostics, Therapeutics and New Insights into Disease”. Because the offerings at the Annual Meeting are many, CPNP Program Committee leadership thought a brief review of this lecture was in order for all members.
A DATA DELUGE
Dr. Butte began his lecture by discussing the data deluge in which we currently live. He commented on an Economist magazine article from three years ago which reported that the human species generated two zettabytes of data per year, reminding the audience of the metric prefixes from smallest to largest (kilo-, mega-, giga-, tera-, peta-, exa- and zetta-) to emphasize its large size.1 Within the next year, four zettabytes of information are expected to be generated. Though much of this data, such as YouTube videos, is described by some as having little scientific value, some is indeed useful for science. Dr. Butte referenced NASA's Sky Survey, which took so many high-resolution pictures of the sky that there were not enough scientists to evaluate them; the project had to turn to crowdsourcing, using the internet community to look for interesting findings such as nebulae among the many photos.2 He then went on to discuss the data deluge within the life sciences, noting that journals such as Nature, Science and Lancet have made it clear that data should be shared in order to evaluate and use it most effectively.3,4 In fact, the lack of sharing could be seen as an impediment to public health. Some postulate that the data deluge within the life sciences renders the scientific method obsolete, noting that outcomes are already available through data mining and that the ‘magic’ will lie in determining the question to be asked, since the data have already been produced.5
He went on to discuss gene expression microarrays, devices that quantify gene expression across the entire genome, describing them as commodity items: so many are available that the choice one makes often comes down to price. Some journals have taken a leadership role and have refused to accept papers based on microarrays unless the data are shared in repositories. Once deposited, the data can be used by other scientists in other important areas of scientific inquiry. As of August 2012, there were one million publicly available microarrays.6,7 Dr. Butte went on to discuss examples of large databases that are publicly available, including PubChem, noting that if the National Institutes of Health funds a study investigating a pharmacological intervention, the data must be made available to others on the PubChem website. He then proceeded to discuss the state of the art for studying drugs in the preclinical arena. Once a computer prediction has identified a likely drug target, contract research organizations can now provide entire preclinical animal experiments. Various types of diseased mice are available, and with the click of a computer mouse on a website the experiment can be ordered, as Dr. Butte described it, by simply adding it to an online shopping cart; the experiment is then run and the entire data set sent to the investigator.
THE TRANSLATIONAL PIPELINE/GENE SEQUENCING
Dr. Butte then discussed the bench-to-bedside translational pipeline, again describing multiple stages of the pipeline as commodity services. These stages include molecular measurements, statistical and computational methods, and the validation of drugs or biomarkers. The two main types of life science databases, disease and drug, can and should be merged to come up with new uses for existing drugs. He discussed examples outside of the neuropsychiatric arena, including minoxidil for hair loss, topiramate for inflammatory bowel disease,8 and cimetidine for slow-growing lung cancers.9 He then discussed examples within psychiatry and neurology, including bupropion for smoking cessation, fluoxetine for premenstrual dysphoria, rivastigmine for Parkinson's disease dementia, pramipexole for restless legs syndrome, and minocycline and exenatide for Alzheimer's disease.
The lecture ended with a discussion of the excitement around sequencing. Dr. Butte mentioned that companies are on track to sequence the full human genome within 15 minutes and can now offer full genomes for approximately $1000. In the future, he suggested, prices could fall as low as $33. Finally, he presented the case of a patient who came to his doctor's office with his entire genome to determine his risk of sudden death after a family member died of a heart attack. The result was a ‘riskogram’ that detailed the diseases for which he was most at risk and the lifestyle changes that might decrease that risk. Though not a perfect science, it is a clear prototype of what the future holds with respect to genetic sequencing.10
CONCLUSION
The presentation from Dr. Butte highlighted the importance of transforming data into important findings related to diseases and treatments. The take-home points from his lecture were that interpreting results is a much bigger challenge than obtaining and analyzing data, that tools currently exist to enable new, important drug discovery, and that personalized medicine will take more than just knowledge of DNA; it must include other clinical, molecular and environmental measures. For those who would like to hear the presentation in its entirety, look for future notification of its availability at CPNP University via your CPNP Weekly Updates.
Contributor Notes
Editor's Note: Dr. Overman, in addition to his duties at NIMH, also serves as Administrative Chair of the CPNP Program Committee.