Big data is a powerful tool, but one that should be used with caution, said presenters at MIT Technology Review's EmTech event in Cambridge, Massachusetts. As data analytics tools grow more powerful, new questions about privacy and security will emerge -- especially in healthcare.
"We've said that this year, the next frontier of big data will be the individual," MIT Technology Review Editor in Chief Jason Pontin said at the event. "From healthcare, to different ways to make data more accessible, data is becoming highly personalized and new services are rolling into the market. But at the same time, new questions are raised as we consider what the implications might be for security as well as privacy."
Kate Crawford, a principal researcher at Microsoft Research, talked about some "myths of big data," often using healthcare use cases as illustrations. The "myth of objectivity" is that big data will always provide an objective view of a population, when in reality it is often skewed toward smartphone, Twitter, or internet users. For instance, after years of accuracy, Google Flu Trends, which uses de-identified search data to map the flu, reported twice as many cases last year as the CDC. The most likely culprit turned out to be public hysteria about the flu, which led healthy people to run flu-related Google searches that skewed the results.
Crawford also spoke about some of the risks involved in de-identified data and the promise of anonymity: as data analytics improve, anonymous data can be re-identified after the fact. In response to a question from MobiHealthNews about de-identified data from health apps, Crawford said the problem is even bigger than data created by apps.
"We're in a really interesting situation with health data," she said. "We have previously had HIPAA as the act that really tightly regulates how health data can move and be shared and that's worked really well for doctors. But think about what happens when you get sick now. You probably type something into a search engine, put your symptoms in and see 'what might be wrong with me?' That data is completely unprotected by HIPAA."
Mitchell Higashi, chief economist at GE Healthcare, pointed out that when quantified self apps create a complex biometric picture of an individual, they also create the possibility of a "quantified clone" of that individual, which could then be accessed by others.
"There's benefits to you getting realtime feedback about your health and then there's the question of, well in this virtual world, as we get more and more precise information on individuals, how does that help us plan as a society for future healthcare burdens?" he said. "It's an important discussion to have and we have to be careful on these things."
In his talk, Higashi described an application of big data that GE is developing to plan the locations of new hospitals in India, one that to some extent skirts privacy questions. Although the Indian state of Andhra Pradesh is the current testing ground, Higashi said he hoped to bring similar technologies to hospitals all around the world.
GE worked with MadPow to create a gamified interface where users can drag and drop hospitals on a model of an Indian state to see how much the hospital would cost and how much it would improve health outcomes. The map overlays the existing health infrastructure, power grid, and clean water availability.
The real "big data" innovation is not just data about the environment, though, but a model of the people in it. To get a handle on that factor, GE Healthcare turned to Argonne National Laboratory, a research laboratory of the U.S. Department of Energy, which has been working on "agent-based modeling."
"In a world of unstructured big data, we go all the way to the extreme of highly structured data, where we're essentially re-creating digital people," said Higashi. "Once your unit of measurement is a person, you can now build into this person code. Specific health behaviors, risks for many diseases, and you can tag them to a virtual location on the map."
Higashi sees a much wider use case for this modeling and technology. If the system could be opened up to more users, it would allow doctors anywhere to develop population-level interventions in their hospitals and communities and begin to test them. Higashi stressed that there are limits to how much can be accomplished with modeling, however.
"All of this is about the design phase," he said. "We still need to implement in the real world to collect data."