Data and the Future of Healthcare

Interview with Phil Bourne

11 November 2016

Phil Bourne is a founding Editor-in-Chief of PLOS Computational Biology and the Associate Director for Data Science (ADDS) at the National Institutes of Health.

His research interests are wide-ranging: algorithms, text mining, machine learning, metalanguages, biological databases, and visualizations applied to problems in systems pharmacology, evolution, cell signaling, apoptosis, immunology and scientific dissemination, coupled with education and public service.

Attendees of ISMB 2016 in Orlando had the opportunity to interact with him at the Workshop on Education in Bioinformatics (WEB), ‘Exploiting Cloud and Virtual Resources for Training: Getting the Best Training in Computational Biology in an Era of Cloud Computing and Big Data’.

In this interview, Phil Bourne gives his opinion on the Big Data era and on how data science has affected, and will continue to affect, biological research and healthcare. He also comments on two sources of friction: between the corporate world and research, and between formal training and citizen science.

The list of your achievements and engagements is very long. Besides your academic ones, you have also founded companies, so you are familiar with the corporate world. With regard to Data Science and Big Data, there seems to be a lot of hype around them in companies right now. Do you think solutions developed for science can compete with the ones developed for big data with big money? Will academic and private Data Science work together, or will companies like Google take a monopoly on data solutions?

Answer: Big data covers a big area: there is a need for government and private funder solutions, for company solutions and for hybrid solutions. Just as the marketplace benefits from government research that then gets translated into commercial products, we will see the same with big data.

Biomedical research and health care are a bit late in the game and have the disadvantage of not being born digital, and so are relatively slow to respond.

Notwithstanding, there is a lot of fundamental research that the private sector is not likely to do, which will be done through government and private foundation funding and will ultimately prove to be of great value. Provenance, standards, data wrangling and security are examples that come to mind.

Biological research is still undergoing digitization. However, a significant number of wet-lab biologists are still reluctant to learn how to use computational tools that would make their research more interesting and more quantitative. What message would you like to convey to them?

Answer: The contributions of computation and other people's data to the biomedical research enterprise cannot be ignored at this point. They should be embraced. At the same time, computational biologists have an obligation to make their methods and tools accessible to a broader audience. We need training programs that prepare students for a future where analytical techniques go hand-in-hand with more experimental approaches. Students need to be equally adept in silico, in vivo and in vitro.

You are the founding Editor-in-Chief of PLOS Computational Biology, and you are enthusiastic about open science, open data and open education (such as MOOCs and NIH training programs like Big Data to Knowledge). Can you explain why these matters are so important to you? Do you think non-scientist citizens should have equal access to scientific research and should actively participate?
For instance, in Paris we had a data challenge (Epidemium) for cancer data, which involved engaging citizens to improve data management (databases) and data treatment. Do you think formal scientific education is necessary to participate in research?

Answer: The future of biomedical research involves a broader spectrum of data types and data which is publicly accessible. This creates unprecedented opportunities and we are already seeing the results. Examples are predicting healthcare outcomes from electronic health records, either alone or in combination with other data, and the use of mobility data to influence behavior. We need to foster the use of these data and the new methods emerging to use such open data. Openness breaks down the traditional academic hierarchy and this is a good thing. We are doing what we can to help new audiences access and make discoveries from these data. The Open Science Prize and the Ideas labs we run with the NSF are examples.

While formal training is not necessary, we must maintain the ethical standards and the rigor of the scientific method.

Do you think we are capable of coping with the amount of data being generated? I have experienced in my own research that dealing even with medium-sized datasets can be problematic, and many questions can be asked when looking at data from different angles. Is there a threat that with big data we could miss important questions? Some leaders of big data businesses say that we should slow the rate of data generation, as data storage is costly and we are not able to analyse all these data. What’s your opinion?

Answer: This question is getting at data sustainability and maintenance and we have much work to do as a community here. We do not have a good handle on the rate of data generation, how to determine what to keep and what to discard, how to make the community aware of the persistence of particular types of data and many more issues. In the end we have to make decisions about cost vs benefit. In business the model is clearer — if we can leverage the data for the benefit of the company keep it, otherwise discard it. In academia there is a legacy of data preservation irrespective of perceived current value and the value cannot necessarily be measured in monetary terms. We need to develop metrics and best practices to define our data and subsequently make preservation decisions based upon these supporting data.

Today Big data, tomorrow…? What will be the next big step for health research?

Answer: The patient will become the center of the health care system as opposed to early health care (the physician) and today (the health care organization). The patient/participant will be empowered by a usable version of their own health record and the biomedical research which has used that record. They will interpret it themselves and whole new industries will appear to help them interpret the data and make decisions with the assistance of health care providers. Healthcare providers and healthcare industries will need to adjust to this new reality. From a research perspective, systems-based approaches will become more accepted and mainstream and be used as predictive tools. Machine learning will figure more prominently. Neuroscience will grow further as we explore the basis of consciousness and other aspects of the human condition. The study of the relationship of health to the environment will be facilitated by the availability of large public environmental datasets and much more. Last but not least is the use of computational techniques and associated data in first response situations, notably concerning pandemics.

Closing remarks

The vision of data and healthcare outlined by Phil Bourne seems to be pretty optimistic. It is easy to see that the world of Data Science is complex, but it is not lacking in opportunities. This provides a lot of hope to researchers and students in the field of computational biology. As a community, our mission is to seek opportunities in order to make the most of this universe, to explore it and to find new solutions for improving healthcare. Now is the ideal time to come forward with any ideas you have been sitting on: the potential benefits of data solutions are more real than ever.

We extend our thanks to Phil Bourne for participating in this interview. Phil Bourne is a founding Editor-in-Chief of PLOS Computational Biology, which manages this collaborative blog.

Disclaimer: Any views expressed are those of the author, not necessarily those of PLOS.

by:
Urszula Czerwinska
(urszula.czerwinska@cri-paris.org)
http://urszulaczerwinska.gihub.io

Data Scientist

PhD in Bio-Mathematics, Data Science & Machine Learning